
InfiniBand Driver Installation on RHEL 5.8

Date: 2015-03-09 00:36:28

Tags: cluster, hpc, mellanox, infiniband


1      Download the driver


URL: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers


 


Choose the driver that matches your operating system version. The ISO-format driver package is recommended.


 


Note: for RHEL 5 and earlier, choose a 1.5.3-series driver; for RHEL 6 and later, choose a 2.0-series or newer driver.
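The version rule in the note can be sketched as a small shell helper. This is only an illustration: `ofed_series_for_release` is a hypothetical function name, and it assumes the release string has the usual "release N.M" form found in /etc/redhat-release.

```shell
# Sketch: map a RHEL release string to the driver series named in the note.
# ofed_series_for_release is a hypothetical helper, not part of MLNX_OFED.
ofed_series_for_release() {
    # Extract the major version, e.g. "5" from "... release 5.8 (Tikanga)".
    major=$(printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p')
    if [ -z "$major" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$major" -le 5 ]; then
        echo "1.5.3"
    else
        echo "2.0+"
    fi
}

# Typical use on a real host (assumes /etc/redhat-release exists):
#   ofed_series_for_release "$(cat /etc/redhat-release)"
```

The helper only reports which series to look for; the exact package still has to be picked on the download page.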


 



2      Install the driver


2.1  Upload the downloaded driver to the server and mount it on the /public/ofed directory.


[root@node33 sourcecode]# mount -o loop MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso /public/ofed/


[root@node33 sourcecode]# cd


[root@node33 ~]# df -h


Filesystem           Size  Used Avail Use% Mounted on


/dev/sda3            117G  9.8G  101G  9% /


/dev/sda1            494M   17M  452M  4% /boot


tmpfs                5.9G     0  5.9G  0% /dev/shm


/tftpboot/rhel.iso   3.9G  3.9G     0 100% /tftpboot/iso


/public/sourcecode/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.8-x86_64.iso


                      267M  267M    0 100% /public/ofed


[root@node33 ~]#
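Before running the installer it may help to confirm that the mounted ISO actually exposes it. The following is a minimal sketch; `ofed_iso_ready` is a hypothetical helper, and the default path simply follows the mount point used in this article.

```shell
# Sketch: confirm the mounted ISO exposes the installer before running it.
# ofed_iso_ready is a hypothetical helper; the default path follows the article.
ofed_iso_ready() {
    dir=${1:-/public/ofed}
    if [ -x "$dir/mlnxofedinstall" ]; then
        echo "ready: $dir/mlnxofedinstall"
    else
        echo "missing: $dir/mlnxofedinstall" >&2
        return 1
    fi
}

# Example (not executed here):
#   ofed_iso_ready /public/ofed && echo "ISO is mounted"
```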


2.2  Run the installation command to start installing the packages.


[root@node33 ~]# /public/ofed/mlnxofedinstall -y


 


 Usage: /public/ofed/mlnxofedinstall [OPTIONS]

 Options
          -c|--config <packages config_file> Example of the configuration file
                                             can be found under docs
          -n|--net <network config_file>     Example of the network configuration file
                                             can be found under docs
          -k|--kernel-version <kernel version> Use provided kernel version instead of 'uname -r'
          -p|--print-available       Print available packages for current platform
                                     And create corresponding ofed.conf file
          --without-32bit            Skip 32-bit libraries installation
          --without-depcheck         Skip Distro's libraries check
          --without-fw-update        Skip firmware update
          --fw-update-only           Update firmware. Skip driver installation
          --force-fw-update          Force firmware update
          --force                    Force installation
          --all|--hpc|--basic|--msm  Install all, hpc, basic or Mellanox Subnet manager packages
                                     correspondingly
          --vma|--vma-vpi            Install packages required by VMA to support VPI
          --vma-eth                  Install packages required by VMA to work over Ethernet
          -v|-vv|-vvv                Set verbosity level
          --umad-dev-rw              Grant non root users read/write permission for umad devices instead of default
          --hugepages-overcommit     Setting 80% of MAX_MEMORY as overcommit for huge page allocation
          --pfc <0|bitmask>          Priority based Flow Control policy on TX and RX [7:0].
                                     Per priority bit mask (uint). Default 0.
          -q                         Set quiet - no messages will be printed


[root@node33 ~]# echo y | /public/ofed/mlnxofedinstall --basic --msm --umad-dev-rw --hugepages-overcommit
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Do you want to continue?[y/N]:


 


Starting MLNX_OFED_LINUX-1.5.3-4.0.42 installation...


 


Installing mlnx-ofa_kernel RPM


Preparing...               ##################################################


mlnx-ofa_kernel            ##################################################


Installing kmod-mlnx-ofa_kernel RPM


Preparing...                ##################################################


kmod-mlnx-ofa_kernel       ##################################################


Installing kmod-mlnx-ofa_kernel-xen RPM


Preparing...               ##################################################


kmod-mlnx-ofa_kernel-xen   ##################################################


Installing kernel-mft RPM


Preparing...               ##################################################


kernel-mft                 ##################################################


Installing user level RPMs:


Preparing...               ##################################################


mlnxofed-docs              ##################################################


Preparing...               ##################################################


ofed-scripts               ##################################################


Preparing...               ##################################################


libibverbs                 ##################################################


Preparing...               ##################################################


libibverbs                 ##################################################


Preparing...               ##################################################


libibverbs-utils           ##################################################


Preparing...               ##################################################


libmthca                   ##################################################


Preparing...                ##################################################


libmthca                   ##################################################


Preparing...               ##################################################


libmverbs                  ##################################################


Preparing...               ##################################################


libmverbs                  ##################################################


Preparing...               ##################################################


libmlx4                    ##################################################


Preparing...               ##################################################


libmlx4                    ##################################################


Preparing...               ##################################################


libcxgb3                   ##################################################


Preparing...               ##################################################


libcxgb3                    ##################################################


Preparing...               ##################################################


libnes                     ##################################################


Preparing...                ##################################################


libnes                     ##################################################


Preparing...               ##################################################


libipathverbs              ##################################################


Preparing...               ##################################################


libipathverbs              ##################################################


Preparing...               ##################################################


librdmacm                   ##################################################


Preparing...               ##################################################


librdmacm                  ##################################################


Preparing...                ##################################################


librdmacm-utils            ##################################################


Preparing...               ##################################################


mstflint                    ##################################################


Preparing...               ##################################################


libibumad                  ##################################################


Preparing...               ##################################################


libibumad                  ##################################################


Preparing...               ##################################################


libibmad                   ##################################################


Preparing...               ##################################################


libibmad                   ##################################################


Preparing...               ##################################################


mft                         ##################################################


Preparing...               ##################################################


opensm-libs                ##################################################


Preparing...                ##################################################


opensm-libs                ##################################################


Preparing...               ##################################################


infiniband-diags           ##################################################


Preparing...               ##################################################


opensm                     ##################################################


Preparing...               ##################################################


ibutils                    ##################################################


Device (06:00.0):


         06:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
         Link Width: 8x
         PCI Link Speed: 2.5Gb/s


 


 


Installation finished successfully.


 


Programming HCA firmware for /dev/mst/mt26428_pci_cr0 device
Running: mlxburn -d /dev/mst/mt26428_pci_cr0 -fw /public/ofed/firmware/fw-25408/2_9_1000/fw-ConnectX2-rel.mlx -dev_type 25408  -no
-I- Querying device ...
-I- Using auto detected configuration file: /public/ofed/firmware/fw-25408/2_9_1000/MHQH19B-XTR_A1-A3.ini (PSID = MT_0D90110009)


-I- Generating image ...


 


   Current FW version on flash: 2.7.626


   New FW version:              2.9.1000


 


Burning FW image without signatures  - OK 


Restoring signature                  - OK 


-I- Image burn completed successfully.


Configuring /etc/security/limits.conf.


Please reboot your system for the changes to take effect.


[root@node33 ~]#


 


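Note: four installation modes are available: all, hpc, basic, and msm. The standard basic mode is recommended. Management nodes need both the msm and basic modes. The installer force-flashes the HCA firmware during installation, so for HCAs that are not standalone add-in cards, pay close attention to the firmware version.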
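The choice of installation mode per node role can be sketched as follows. `ofed_install_flags` is a hypothetical helper; the flags themselves are taken from the usage text printed by the installer, and combining --basic with --msm mirrors the command used in this article.

```shell
# Sketch: pick installer flags by node role, following the note.
# ofed_install_flags is a hypothetical helper; the flags come from the usage text.
ofed_install_flags() {
    case "$1" in
        management) echo "--basic --msm" ;;
        compute)    echo "--basic" ;;
        *)          echo "usage: ofed_install_flags management|compute" >&2
                    return 2 ;;
    esac
}

# Example (not executed here):
#   echo y | /public/ofed/mlnxofedinstall $(ofed_install_flags compute)
```

If the firmware must not be touched, the usage text also lists --without-fw-update, which could be appended to either flag set.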


2.3  Configure the IP address of the IB interface


[root@node33 ~]# cat <<EOF >> /etc/sysconfig/network-scripts/ifcfg-ib0
> DEVICE=ib0
> BOOTPROTO=none
> ONBOOT=yes
> NETMASK=255.255.255.0
> IPADDR=12.12.12.33
> EOF


[root@node33 ~]#


[root@node33 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0


DEVICE=ib0


BOOTPROTO=none


ONBOOT=yes


NETMASK=255.255.255.0


IPADDR=12.12.12.33


[root@node33 ~]#
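When the same file has to be created on many nodes, generating it from a function avoids hand-typing the address each time. This is a sketch only: `render_ifcfg_ib0` is a hypothetical helper, and the field names are the ones shown in the file listing.

```shell
# Sketch: print an ifcfg-ib0 body for a given address.
# render_ifcfg_ib0 is a hypothetical helper; field names match the article's file.
render_ifcfg_ib0() {
    ipaddr=$1
    netmask=${2:-255.255.255.0}
    cat <<EOF
DEVICE=ib0
BOOTPROTO=none
ONBOOT=yes
NETMASK=$netmask
IPADDR=$ipaddr
EOF
}

# Example (not executed here):
#   render_ifcfg_ib0 12.12.12.33 > /etc/sysconfig/network-scripts/ifcfg-ib0
```

Redirecting with ">" overwrites the file, so running the command twice does not leave duplicate entries the way an append with ">>" would.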


2.4  Start the IB services


[root@node33 ~]# chkconfig --list | grep open


openibd            0:off 1:off 2:on 3:on 4:on 5:on 6:off


opensmd        0:off 1:off 2:off 3:on 4:on 5:on 6:off


[root@node33 ~]# /etc/init.d/openibd restart


Unloading HCA driver:                                      [  OK  ]


Loading HCA driver and Access Layer:                       [  OK  ]


Setting up InfiniBand network interfaces:


Bringing up interface ib0:                                 [  OK  ]


Setting up service network . . .                           [  done  ]


[root@node33 ~]# /etc/init.d/opensmd restart


Stopping IB Subnet Manager.                                [FAILED]


Starting IB Subnet Manager.                                [  OK  ]


[root@node33 ~]# ibstat


CA 'mlx4_0'
         CA type: MT26428
         Number of ports: 1
         Firmware version: 2.9.1000
         Hardware version: b0
         Node GUID: 0x0002c903000cc00e
         System image GUID: 0x0002c903000cc011
         Port 1:
                   State: Active
                   Physical state: LinkUp
                   Rate: 40
                   Base lid: 1
                   LMC: 0
                   SM lid: 1
                   Capability mask: 0x0251086a
                   Port GUID: 0x0002c903000cc00f
                   Link layer: InfiniBand


[root@node33 ~]#


Note: on the management node, start openibd first and then opensmd. Compute nodes only need openibd. After configuration, use ibstat to check the rate and the link state.
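The ibstat check recommended in the note can be automated roughly as follows. `check_ib_port` is a hypothetical helper that reads ibstat-style text on standard input; it assumes the field labels "State:", "Physical state:" and "Rate:" as ibstat prints them, and on a multi-port HCA it only evaluates the last port listed.

```shell
# Sketch: read ibstat-style text on stdin and report whether the port looks healthy.
# check_ib_port is a hypothetical helper; the argument is the expected rate in Gb/s.
check_ib_port() {
    want_rate=${1:-40}
    awk -v want="$want_rate" '
        # Return the text after the first ":" with surrounding blanks removed.
        function val(s) { sub(/^[^:]*:[ \t]*/, "", s); sub(/[ \t]+$/, "", s); return s }
        /^[ \t]*State:/          { state = val($0) }
        /^[ \t]*Physical state:/ { phys  = val($0) }
        /^[ \t]*Rate:/           { rate  = val($0) }
        END {
            if (state == "Active" && phys == "LinkUp" && rate + 0 >= want + 0) {
                print "OK"
            } else {
                printf "FAIL state=%s physical=%s rate=%s\n", state, phys, rate
                exit 1
            }
        }'
}

# Example (not executed here):
#   ibstat | check_ib_port 40
```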


3      Uninstall the IB driver


[root@node33 ~]# echo y | /public/ofed/uninstall.sh


 


This program will uninstall all MLNX_OFED_LINUX-1.5.3-4.0.42 packages on your machine.


 


Do you want to continue?[y/N]:y


 


 


rpm -e --allmatches --nodeps kmod-mlnx-ofa_kernel-xen-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libnes-1.1.1mlnx1-1 libcxgb3-1.3.1-1 libmverbs-0.1.0-3.15.gd28970e libibmad-1.3.8.MLNX_20120424-0.1 libmthca-1.0.6mlnx1-0.1.gbe5eef3 libibumad-1.3.7.MLNX_20130110_ff06102-0.1 libibverbs-1.1.5mlnx2-1 libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1 kernel-mft-2.7.1-2.6.18_308.el5 libmverbs-0.1.0-3.15.gd28970e libipathverbs-1.2mlnx1-1 libibmad-1.3.8.MLNX_20120424-0.1 mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libibverbs-utils-1.1.5mlnx2-1 libcxgb3-1.3.1-1 mstflint-1.4mlnx4-1.21.gd948ddd libmlx4-1.0.2mlnx6-1 librdmacm-1.0.15-1 libmthca-1.0.6mlnx1-0.1.gbe5eef3 libibumad-1.3.7.MLNX_20130110_ff06102-0.1 libibverbs-1.1.5mlnx2-1 librdmacm-utils-1.0.15-1 mlnxofed-docs-1.5.3-4.0.42 libipathverbs-1.2mlnx1-1 kmod-mlnx-ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.rhel5u8 libnes-1.1.1mlnx1-1 kernel-mft-2.7.1-2.6.18_308.el5 ofed-scripts-1.5.3-OFED.1.5.3.4.0.42 mft-2.7.1a-1


Uninstall finished successfully


[root@node33 ~]# rm -rf /etc/infiniband


[root@node33 ~]#


4      Troubleshooting


4.1  Check the IB operating status


[root@node33 ~]# ibstat


CA 'mlx4_0'
         CA type: MT26428
         Number of ports: 1
         Firmware version: 2.9.1000
         Hardware version: b0
         Node GUID: 0x0002c903000cc00e
         System image GUID: 0x0002c903000cc011
         Port 1:
                   State: Active
                   Physical state: LinkUp
                   Rate: 40
                   Base lid: 1
                   LMC: 0
                   SM lid: 1
                   Capability mask: 0x0251086a
                   Port GUID: 0x0002c903000cc00f
                   Link layer: InfiniBand


[root@node33 ~]#


 


4.2  List the hosts


[root@node33 ~]# ibhosts


Ca    :0x0002c903000cc00a ports 1 "node34 HCA-1"


Ca    :0x0002c903000cc00e ports 1 "node33 HCA-1"


[root@node33 ~]#
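On a larger cluster it can be easier to compare the ibhosts output against a list of expected node names than to read it by eye. The sketch below assumes the node description has the form "name HCA-1", as in the output shown here; `missing_ib_hosts` is a hypothetical helper.

```shell
# Sketch: print every expected node name that does not appear in ibhosts output.
# missing_ib_hosts is a hypothetical helper; stdin is ibhosts-style text,
# and the arguments are the expected node names.
missing_ib_hosts() {
    seen=$(cat)
    for h in "$@"; do
        # Match the opening quote plus the name and a space, e.g. "node33 HCA-1".
        printf '%s\n' "$seen" | grep -q "\"$h " || echo "$h"
    done
}

# Example (not executed here):
#   ibhosts | missing_ib_hosts node33 node34
```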


4.3  List the switches


[root@node33 ~]# ibswitches


Switch      :0x0002c9020042bcc0 ports 36 "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0


[root@node33 ~]#


4.4  View the topology


[root@node33 ~]# ibnetdiscover
#
# Topology file: generated on Sun Mar  8 19:53:35 2015
#
# Initiated from node 0002c903000cc00e port 0002c903000cc00f

vendid=0x2c9
devid=0xbd36
sysimgguid=0x2c9020042bcc3
switchguid=0x2c9020042bcc0(2c9020042bcc0)
Switch  36 "S-0002c9020042bcc0"         # "MF0;switch-1140a2:IS5030/U1" enhanced port 0 lid 4 lmc 0
[30]    "H-0002c903000cc00e"[1](2c903000cc00f)          # "node33 HCA-1" lid 1 4xQDR
[31]    "H-0002c903000cc00a"[1](2c903000cc00b)          # "node34 HCA-1" lid 7 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000cc00d
caguid=0x2c903000cc00a
Ca      1 "H-0002c903000cc00a"          # "node34 HCA-1"
[1](2c903000cc00b)      "S-0002c9020042bcc0"[31]                # lid 7 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000cc011
caguid=0x2c903000cc00e
Ca      1 "H-0002c903000cc00e"          # "node33 HCA-1"
[1](2c903000cc00f)      "S-0002c9020042bcc0"[30]                # lid 1 lmc 0 "MF0;switch-1140a2:IS5030/U1" lid 4 4xQDR


[root@node33 ~]#


4.5  View error statistics


[root@node33 ~]# ibdiagnet -Pall=1


Loading IBDIAGNET from: /opt/ibutils/lib64/ibdiagnet1.5.7


-W- Topology file is not specified.


    Reports regarding cluster links will use direct routes.


Loading IBDM from: /opt/ibutils/lib64/ibdm1.5.7


-I- Using port 1 as the local port.


-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.


 


 


-I---------------------------------------------------


-I- Bad Guids/LIDs Info


-I---------------------------------------------------


-I- No bad Guids were found


 


-I---------------------------------------------------


-I- Links With Logical State = INIT


-I---------------------------------------------------


-I- No bad Links (with logical state = INIT) were found


 


-I---------------------------------------------------


-I- General Device Info


-I---------------------------------------------------


 


-I---------------------------------------------------


-I- PM Counters Info


-I---------------------------------------------------


-I- No illegal PM counters values were found


 


-I---------------------------------------------------


-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)


-I---------------------------------------------------


-I-   PKey:0x7fff Hosts:2 full:2 limited:0


 


-I---------------------------------------------------


-I- IPoIB Subnets Check


-I---------------------------------------------------


-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00


-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps


 


-I---------------------------------------------------


-I- Bad Links Info


-I- No bad link were found


-I---------------------------------------------------


----------------------------------------------------------------


-I- Stages Status Report:


   STAGE                                    Errors Warnings


   Bad GUIDs/LIDs Check                    0      0    


   Link State Active Check                 0      0    


   General Devices Info Report             0      0    


   Performance Counters Report             0      0    


   Partitions Check                        0      0    


   IPoIB Subnets Check                     0      1    


 


Please see /tmp/ibdiagnet.log for complete log


----------------------------------------------------------------


 


-I- Done. Run time was 1 seconds.


[root@node33 ~]#


4.6  View detailed fabric-wide error information


[root@node33 ~]# ibqueryerrors


Errors for 0x2c9020042bcc0 "MF0;switch-1140a2:IS5030/U1"
   GUID 0x2c9020042bcc0 port ALL: [PortRcvSwitchRelayErrors == 64] [PortXmitDiscards == 29] [PortXmitWait == 240663]
   GUID 0x2c9020042bcc0 port 0: [PortXmitWait == 1232]
   GUID 0x2c9020042bcc0 port 1: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]
   GUID 0x2c9020042bcc0 port 2: [PortRcvSwitchRelayErrors == 3] [PortXmitDiscards == 3]
   GUID 0x2c9020042bcc0 port 3: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 3]
   GUID 0x2c9020042bcc0 port 4: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
   GUID 0x2c9020042bcc0 port 5: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 6: [PortRcvSwitchRelayErrors == 2] [PortXmitDiscards == 3]
   GUID 0x2c9020042bcc0 port 7: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 8: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 9: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 10: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 11: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 12: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 2]
   GUID 0x2c9020042bcc0 port 13: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
   GUID 0x2c9020042bcc0 port 14: [PortRcvSwitchRelayErrors == 1] [PortXmitDiscards == 1]
   GUID 0x2c9020042bcc0 port 30: [PortXmitWait == 4294967295]
   GUID 0x2c9020042bcc0 port 31: [PortRcvSwitchRelayErrors == 46] [PortXmitWait == 295]
   GUID 0x2c9020042bcc0 port 34: [PortXmitWait == 892]
   GUID 0x2c9020042bcc0 port 36: [PortXmitWait == 238245]

## Summary: 17 nodes checked, 1 bad nodes found
##          53 ports checked, 19 ports have errors beyond threshold


## Thresholds:


## Suppressed:


[root@node33 ~]#
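To pick out the ports that matter from a long ibqueryerrors listing, the per-port lines can be filtered by counter name and threshold. This is a sketch under the assumption that each line has the form "GUID <guid> port <n>: [Counter == value] ..."; `ports_over` is a hypothetical helper, not an infiniband-diags command.

```shell
# Sketch: list ports whose named counter is at or above a threshold,
# reading ibqueryerrors-style text on stdin. Output is "<port> <value>" per line.
# ports_over is a hypothetical helper.
ports_over() {
    counter=$1
    limit=$2
    awk -v c="$counter" -v lim="$limit" '
        # Per-port lines only; the "port ALL:" line is an aggregate and is skipped.
        $1 == "GUID" && $3 == "port" && $4 != "ALL:" {
            port = $4
            sub(/:$/, "", port)
            # Look for "[<counter> == <value>]" on this line.
            pat = "\\[" c " == [0-9]+\\]"
            if (match($0, pat)) {
                v = substr($0, RSTART, RLENGTH)
                sub(/^.*== /, "", v)
                sub(/\]$/, "", v)
                if (v + 0 >= lim + 0) print port, v
            }
        }'
}

# Example (not executed here):
#   ibqueryerrors | ports_over PortXmitDiscards 3
```

In the listing for port 30, the PortXmitWait value 4294967295 equals 2^32 - 1, which suggests the counter has reached its 32-bit maximum rather than recording a meaningful count.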



Original article: http://blog.csdn.net/xztjhs/article/details/44141389
