stanford-rc / ibswinfo
Command-line tool to retrieve information and monitor Mellanox un-managed Infiniband switches
License: GNU General Public License v3.0
Break down FORE bitmasks to get alerted fan id
Line 264 in a1dea19
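A minimal sketch of what breaking down such a bitmask could look like in bash. The function name `alerted_fans` is hypothetical, and the mapping of bit 0 to fan#1 is an assumption; the exact FORE field layout should be checked against the register documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the 1-based position of every set bit in a bitmask.
# Assumption: bit 0 of the mask corresponds to fan#1, bit 1 to fan#2, and so on.
alerted_fans() {
    local mask=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    local id=1
    local out=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            out+=("$id")
        fi
        mask=$(( mask >> 1 ))
        id=$(( id + 1 ))
    done
    echo "${out[*]}"
}

# Example: bits 0 and 2 are set
alerted_fans 0x5
```

With a mask of 0x5 this prints `1 3`, so an alert message could name the affected fans instead of only reporting a generic fan error.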
./ibswinfo.sh -d lid-3
error: Index: module_base was not provided
issue with register MGIR
ibswinfo supports Mellanox SB7790 unmanaged switches running firmware 11.1100.0072 or greater, with one exception: the '-T' flag is unsupported. All other info and vitals are captured.
Thanks!
Mark
...`
consider interfaces counters (tx/rx) in PPCNT
Line 351 in a1dea19
Hello!
We have several MSX6025F-1SF switches in use, as well as MSX6036F-1SF (also SwitchX, but managed).
Is there a chance that they will be supported in the future?
Best regards
fan status | OK
Hi,
The unmanaged EDR Switch-IB based switches (SB7790) appear to mostly work. The PSID is reported incorrectly; it appears to be transposed?
# flint -d /dev/mst/SW_MT52000_SwitchIB_Mellanox_Technologies_lid-0x0005 q | grep PSID
PSID: HP_1880110032
versus
# ./ibswinfo.sh -d lid-5 | grep PSID
PSID | 1_PH108830012
It also only reports 35 instead of 36 ports.
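Comparing the two strings above, the characters look reversed within each group of four, which would be consistent with the register data being read as 32-bit words in the opposite byte order. The sketch below only demonstrates that observation on the reported value; `swap_words` is a hypothetical name and this is not necessarily how the script should be fixed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reverse the characters inside each 4-character group.
swap_words() {
    local in=$1 out="" i j chunk
    for (( i = 0; i < ${#in}; i += 4 )); do
        chunk=${in:i:4}
        # walk the chunk backwards (the last chunk may be shorter than 4)
        for (( j = ${#chunk} - 1; j >= 0; j-- )); do
            out+=${chunk:j:1}
        done
    done
    printf '%s\n' "$out"
}

# PSID as printed by ibswinfo.sh in the report above
swap_words "1_PH108830012"
```

Applied to `1_PH108830012`, this yields `HP_1880110032`, the PSID that flint reports.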
consider cable information in PDDR
idx local_port=0x[portnum],pnat=0x0,page_select=0x3,group_opcode=0x0
for PHYs
Line 353 in a1dea19
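As a rough sketch, the index string above could be generated per port like this. `pddr_indexes` is a hypothetical helper, the device name is only an example, and the loop just prints the commands instead of running them; the mlxreg options are the ones already used elsewhere on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the PDDR index string for a given port number.
pddr_indexes() {
    printf 'local_port=0x%x,pnat=0x0,page_select=0x3,group_opcode=0x0\n' "$1"
}

dev=lid-3   # example device, adjust to your fabric

# Print (not execute) the query command for the first three ports
for port in 1 2 3; do
    echo mlxreg -d "$dev" --reg_name PDDR --get --indexes "$(pddr_indexes "$port")"
done
```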
Hi,
Thank you for this very useful tool!
With our MQM8790-HS2F switches, I have the issue that none of them reports the presence of two PSUs, even though they are physically present. PSU1 reports "ERROR" for all of them (see below).
I wonder whether there is any real problem with them. Do you have any hints to follow up? - Thank you in advance!
Quantum Mellanox Technologies
=================================================
part number | MQM8790-HS2F
serial number | MT2006X1....
ports | 80
GUID | 0x.....
firmware version | 0.0000.0000
-------------------------------------------------
uptime (d-h:m:s) | 389d-18:45:26
-------------------------------------------------
PSU0 status | OK
S/N | MT1951X0....
DC power | OK
fan status | OK
power (W) | 198
PSU1 status | ERROR
DC power | ERROR
fan status | ERROR
When setting a switch description to a string 4 characters or shorter, the script fails to execute:
# ./ibswinfo.sh -d lid-801 -S test
Device: lid-801
Current node description: Quantum Mellanox Technologies
Set node description to : test
>> Confirm? (y/N) y
Setting new node description...
# echo $?
1
Using a longer node description works as expected:
# ./ibswinfo.sh -d lid-801 -S test1
Device: lid-801
Current node description: Quantum Mellanox Technologies
Set node description to : test1
>> Confirm? (y/N) y
Setting new node description...
done!
# echo $?
0
We have found that the ibswinfo.sh script handles HDR (MQM8790-HS2F) and NDR (MQM9790-NS2F) IB switches just fine, except that only 9 fan speeds are reported instead of 14 (6x2 + 2 PSUs) on the HDR switches and 16 (7x2 + 2 PSUs) on the NDR switches. We have not yet been able to determine a cause for this.
set node description in SPZR
mlxreg -d $dev --reg_name SPZR --set "ndm=0x1,node_description[0]=0x666f6f6f" --indexes "swid=0x0"
but there seem to be register size issues: Mellanox/mstflint#329
Line 356 in a1dea19
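For illustration, here is one way the `--set` argument could be built from an arbitrary string, four characters per 32-bit word. The helper name `encode_node_desc` is hypothetical. The byte order follows the `0x666f6f6f` example above; padding the last word with zero bytes is an assumption, and the sketch does not address the register size issue mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: encode an ASCII string as node_description[i] words.
# Assumption: a short final word is padded on the right with 0x00 bytes.
encode_node_desc() {
    local str=$1 out="ndm=0x1" i=0 j chunk hex c
    while (( i * 4 < ${#str} )); do
        chunk=${str:i*4:4}
        hex=""
        for (( j = 0; j < 4; j++ )); do
            c=${chunk:j:1}
            if [[ -n $c ]]; then
                printf -v c '%02x' "'$c"   # character to 2-digit hex code
            else
                c="00"                     # pad missing characters
            fi
            hex+=$c
        done
        out+=",node_description[$i]=0x$hex"
        i=$(( i + 1 ))
    done
    printf '%s\n' "$out"
}

encode_node_desc "fooo"
```

For `fooo` this prints `ndm=0x1,node_description[0]=0x666f6f6f`, matching the value in the command above.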
# mlxreg_ext -d lid-800 --reg_name SPZR --get --indexes swid=0x0
Index: router_entity was not provided'
# mlxreg --version
mlxreg, mft 4.23.0-104, built on Jan 31 2023, 14:40:52. Git SHA Hash: 7d02cc1
FYI, we've been able to run ibswinfo.sh to query information on a CS7500 (648x100G director switch).
From the output we can identify whether a switch unit is acting as a spine or a leaf. However, the PSU status is wrong, as the CS7500 has a number of PSUs that are not bound to specific switch units.
=================================================
MF0;XXXXXX:CS7500/S01/U1
=================================================
part number | MSB7520-E
serial number | MT1533XXXXXX
product name | Barracuda SwitchIB spine
revision | A6
ports | 36
PSID | MT_2090XXXXXX
GUID | 0xXXXXXX0300f87da0
firmware version | 11.2008.3336
-------------------------------------------------
uptime (d-h:m:s) | 208d-19:33:46
-------------------------------------------------
PSU0 status | ERROR
DC power | ERROR
fan status | ERROR
PSU1 status | ERROR
DC power | ERROR
fan status | ERROR
-------------------------------------------------
temperature (C) | 69
max temp (C) | 76
-------------------------------------------------
fan status | OK
fan#1 (rpm) | 9633
fan#2 (rpm) | 8287
fan#3 (rpm) | 9842
fan#4 (rpm) | 8389
-------------------------------------------------
=================================================
MF0;XXXXXX:CS7500/L01/U1
=================================================
part number | MSB7510-E
serial number | MT1646XXXXXX
product name | Barracuda SwitchIB leaf
revision | A9
ports | 36
PSID | MT_2080XXXXXX
GUID | 0xXXXXXX03007ee7c0
firmware version | 11.2008.3336
-------------------------------------------------
uptime (d-h:m:s) | 209d-00:34:22
-------------------------------------------------
PSU0 status | ERROR
DC power | ERROR
fan status | ERROR
PSU1 status | ERROR
DC power | ERROR
fan status | ERROR
-------------------------------------------------
temperature (C) | 69
max temp (C) | 70
-------------------------------------------------
fan status | OK
=================================================
MF0;XXXXXX:CS7500/L02/U1
=================================================
part number | 843193-B21
serial number | IL29XXXXXX
product name | Barracuda SwitchIB leaf
revision | A2
ports | 36
PSID | HP_2080XXXXXX
GUID | 0xXXXXXX03006b2480
firmware version | 15.2010.1202
-------------------------------------------------
uptime (d-h:m:s) | 208d-20:19:43
-------------------------------------------------
PSU0 status | ERROR
DC power | ERROR
fan status | ERROR
PSU1 status | ERROR
DC power | ERROR
fan status | ERROR
-------------------------------------------------
temperature (C) | 72
max temp (C) | 77
-------------------------------------------------
fan status | OK
Hi,
I have updated MFT to version 4.16, and it looks like the FW version is not shown correctly in the output.
[[ $mft_cur =~ 4.15 ]] && add_idx="slot_index=0x0" || add_idx=""
Here is the result; the relevant lines are marked with ====================
[root@n1041 ~]# ./myibcheck -d lid-0x00009 -o inventory
node_desription : SwitchIB Mellanox Technologies
part_number : MSB7790-ES2R
serial : MT1520X02370
fw_version : 0.0000.0000 ====================
psu0.serial : MT151 =====================
[root@n1041 ~]# flint -d lid-0x00047 q
Image type: FS3
FW ISSU Version: 1
FW Version: 15.2008.2402
FW Release Date: 12.2.2021
Description: UID GuidsNumber
[root@n1041 ~]# flint -d lid-0x00009 q
Image type: FS3
FW ISSU Version: 1
FW Version: 11.2008.2402 ====================
FW Release Date: 12.2.2021
Description: UID GuidsNumber Step
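One possible reading of the version test quoted above is that it only matches MFT 4.15, so 4.16 would fall through to an empty index. That is an assumption, not a confirmed diagnosis. A minimal sketch of a "this version or newer" check in plain bash follows; `version_ge` is a hypothetical helper, while `mft_cur` and `add_idx` are the names from the quoted line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Anything after the first '-' (e.g. the "-100" build suffix) is ignored.
version_ge() {
    local IFS=.
    local -a a=(${1%%-*}) b=(${2%%-*})
    local i x y
    for (( i = 0; i < ${#a[@]} || i < ${#b[@]}; i++ )); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if (( 10#$x > 10#$y )); then return 0; fi
        if (( 10#$x < 10#$y )); then return 1; fi
    done
    return 0   # all components equal
}

mft_cur=4.16.0   # example value
if version_ge "$mft_cur" 4.15; then
    add_idx="slot_index=0x0"
else
    add_idx=""
fi
echo "add_idx=$add_idx"
```

With this kind of check, 4.15.1-100 and 4.16.0 would both get `slot_index=0x0`, while 4.14.0-105 would not.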
After updating to RHEL 7.9, the mft package changed from mft-4.14.0-105.x86_64 to mft-4.15.1-100.x86_64 and we began seeing:
error: Index: slot_index was not provided
We added slot_index=0x0 to MTMP and MTCAP as a workaround and it is working fine. We call the script with only -d dev.
Here is the diff between our local copy (with the changes) and the original:
diff /usr/local/sbin/ibswinfo.sh ibswinfo-master/ibswinfo.sh
227,228c227
< rid[MTMP]="sensor_index=0x0,slot_index=0x0"
< rid[MTCAP]="slot_index=0x0"
---
> rid[MTMP]="sensor_index=0x0"