Comments (12)
I read through the change that introduced this special pseudo-device: https://www.smartmontools.org/ticket/252.
This is confusing because the output suggests that /dev/bus/0 is a real device or a real path, but it is neither:
-d megaraid allows to specify devices in pseudo /dev/bus/N format
This is actually handled internally by smartctl and does not refer to a real existing path in the devfs.
However the author of that change (Alex Samorukov) has a good point:
it is possible that drive name is not exists at all (unconfigured RAID)
That is a rare but real situation. You may have a couple of disks attached to your MegaRAID controller that aren't configured yet. In that case the OS sees no logical drive, so there is no block device through which to monitor these drives, except with the pseudo-device /dev/bus/N.
I will make a change in check_smart to accept this pseudo-device/path.
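A minimal sketch of what such a check might look like, written in Python for illustration only (check_smart itself is a Perl script, and this is not its actual code). The function name `is_acceptable_device` and the injectable `exists` callback are hypothetical; the only facts taken from the thread are that /dev/bus/N is handled internally by smartctl and does not exist in devfs.

```python
import os
import re

# Pseudo-device form handled internally by smartctl: /dev/bus/N
PSEUDO_BUS_RE = re.compile(r"/dev/bus/[0-9]+")


def is_acceptable_device(path, exists=os.path.exists):
    """Return True for a /dev/bus/N pseudo-device or for an existing path.

    `exists` is injectable so the function can be exercised without
    touching the real filesystem (hypothetical design choice).
    """
    if PSEUDO_BUS_RE.fullmatch(path):
        # The pseudo-device never appears in devfs, so an existence
        # test would wrongly reject it.
        return True
    return exists(path)
```

With a plain existence test alone, /dev/bus/0 would be rejected on the server discussed here, because `ls -la /dev/bus/0` reports no such file.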
from check_smart.
Can you show the output of ls -la /dev/bus/0 and ls -la /dev/sd* please?
Do you have megacli installed? If yes, it would be interesting to see how the RAID was built (e.g. RAID-5 across these 5 drives?).
What exactly is behind /dev/sda, /dev/sdb, /dev/sdc? Are these single drives or another raid of other drives or again the megaraid devices?
Thanks for your quick reply.
The server has 2 SSDs (sdb/sdc) as software RAID 1 for the OS, plus 1 hardware RAID 6 for data (LVM).
# ls -la /dev/bus/0
ls: cannot access /dev/bus/0: No such file or directory
# ls -la /dev/sd*
brw-rw---- 1 root disk 8, 0 Jun 7 14:11 /dev/sda
brw-rw---- 1 root disk 8, 16 May 31 15:27 /dev/sdb
brw-rw---- 1 root disk 8, 17 May 31 15:27 /dev/sdb1
brw-rw---- 1 root disk 8, 32 May 31 15:27 /dev/sdc
brw-rw---- 1 root disk 8, 33 May 31 15:27 /dev/sdc1
Using storcli instead of megacli, as this is a newer AVAGO MegaRAID SAS 9460-16i:
# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.
CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = Linux 3.10.0-957.12.2.el7.x86_64
Controller = 0
Status = Success
Description = None
Product Name = AVAGO MegaRAID SAS 9460-16i
Serial Number = ***
SAS Address = 500062b204257a40
PCI Address = 00:13:00:00
System Time = 06/07/2019 22:12:39
Mfg. Date = 08/17/18
Controller Time = 06/07/2019 21:12:34
FW Package Build = 50.5.0-1121
BIOS Version = 7.05.02.0_0x07050400
FW Version = 5.050.00-1292
Driver Name = megaraid_sas
Driver Version = 07.705.02.00-rh1
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0x14
SubVendor Id = 0x1000
SubDevice Id = 0x9460
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 19
Device Number = 0
Function Number = 0
Drive Groups = 1
TOPOLOGY :
========
-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type State BT Size PDC PI SED DS3 FSpace TR
-----------------------------------------------------------------------------
0 - - - - RAID6 Optl N 2.181 TB dsbl N N dflt N N
0 0 - - - RAID6 Optl N 2.181 TB dsbl N N dflt N N
0 0 0 134:0 14 DRIVE Onln N 744.687 GB dsbl N N dflt - N
0 0 1 134:1 15 DRIVE Onln N 744.687 GB dsbl N N dflt - N
0 0 2 134:2 16 DRIVE Onln N 744.687 GB dsbl N N dflt - N
0 0 3 134:3 17 DRIVE Onln N 744.687 GB dsbl N N dflt - N
0 0 4 134:4 18 DRIVE Onln N 744.687 GB dsbl N N dflt - N
-----------------------------------------------------------------------------
DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready
Virtual Drives = 1
VD LIST :
=======
--------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
--------------------------------------------------------------
0/0 RAID6 Optl RW Yes RWBD - ON 2.181 TB Raid6
--------------------------------------------------------------
EID=Enclosure Device ID| VD=Virtual Drive| DG=Drive Group|Rec=Recovery
Cac=CacheCade|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
Physical Drives = 5
PD LIST :
=======
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
------------------------------------------------------------------------------
134:0 14 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
134:1 15 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
134:2 16 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
134:3 17 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
134:4 18 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
------------------------------------------------------------------------------
EID=Enclosure Device ID|Slt=Slot No.|DID=Device ID|DG=DriveGroup
DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
UBad=Unconfigured Bad|Onln=Online|Offln=Offline|Intf=Interface
Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info
SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
UGUnsp=Unsupported|UGShld=UnConfigured shielded|HSPShld=Hotspare shielded
CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
UBUnsp=UBad Unsupported
Cachevault_Info :
===============
------------------------------------
Model State Temp Mode MfgDate
------------------------------------
CVPM05 Optimal 26C - 2018/06/07
------------------------------------
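For readers who want to pull the physical drive IDs out of output like the above, here is a hedged Python sketch. The regular expression and the name `parse_pd_list` are illustrative, not part of storcli or check_smart. It relies on the PD LIST row layout shown in this thread (`EID:Slt DID State ...`). In this thread the DID values (14 to 18) coincide with the megaraid,N numbers that smartctl --scan reports; treating DID as N on other controllers is an assumption that should be verified.

```python
import re

# A PD LIST row starts with "EID:Slt DID State", e.g. "134:0 14 Onln 0 ..."
PD_ROW_RE = re.compile(r"^(\d+):(\d+)\s+(\d+)\s+(\S+)\s")


def parse_pd_list(storcli_output):
    """Return one dict per physical drive row found in storcli text."""
    drives = []
    for line in storcli_output.splitlines():
        match = PD_ROW_RE.match(line.strip())
        if match:
            eid, slot, did, state = match.groups()
            drives.append(
                {"eid": int(eid), "slot": int(slot), "did": int(did), "state": state}
            )
    return drives


# Rows copied from the output in this thread; the last line is a TOPOLOGY
# row and is deliberately not matched because it does not start with EID:Slt.
SAMPLE = """\
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
134:0 14 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
134:1 15 Onln 0 744.687 GB SAS SSD N N 512B SDLL1DLR800GCCA1 U -
0 0 0 134:0 14 DRIVE Onln N 744.687 GB dsbl N N dflt - N
"""

drives = parse_pd_list(SAMPLE)
# Assumption: DID maps to the N in "megaraid,N", as observed in this thread.
device_types = [f"megaraid,{drive['did']}" for drive in drives]
print(device_types)
```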
What is /dev/sda then? Is this the raid-6 PV for data?
/dev/sda is RAID6 PV, indeed.
Did you already try ./check_smart.pl -d /dev/sda -i megaraid,1?
That would work, but we roll out NRPE checks automatically via Puppet (based on smartctl --scan) for each physical disk, so we don't know the direct relation between /dev/bus/0 (returned by smartctl --scan) and /dev/sda.
The --scan output is wrong or gives confusing data. The correct approach is to call the real block device (/dev/sda) and use the adapter position behind the block device.
Why wrong/confusing? It just works without guessing the correct block device, and it returns the same results.
This seems to be intended behaviour (man smartctl):
megaraid,N - [Linux only] the device consists of one or more SCSI/SAS disks connected to a MegaRAID controller. The non-negative integer N (in the range of 0 to 127 inclusive) denotes which disk on the controller is monitored. Use syntax such as:
smartctl -a -d megaraid,2 /dev/sda
smartctl -a -d megaraid,0 /dev/sdb
smartctl -a -d megaraid,0 /dev/bus/0
This interface will also work for Dell PERC controllers. It is possible to set RAID device name as /dev/bus/N, where N is a SCSI bus number.
The following entry in /proc/devices must exist:
For PERC2/3/4 controllers: megadevN
For PERC5/6 controllers: megaraid_sas_ioctlN
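The megaraid,N syntax quoted from the man page can be validated before it is handed to smartctl. The following Python sketch is illustrative only; `parse_megaraid_type` is a hypothetical helper, and the 0 to 127 range is taken from the man page text quoted above.

```python
import re


def parse_megaraid_type(device_type):
    """Return N from a 'megaraid,N' string, or raise ValueError."""
    match = re.fullmatch(r"megaraid,([0-9]+)", device_type)
    if match is None:
        raise ValueError(f"not a megaraid device type: {device_type!r}")
    disk = int(match.group(1))
    # Range per the quoted man page: 0 to 127 inclusive.
    if not 0 <= disk <= 127:
        raise ValueError(f"megaraid disk number out of range: {disk}")
    return disk
```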
# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device
/dev/bus/0 -d megaraid,16 # /dev/bus/0 [megaraid_disk_16], SCSI device
/dev/bus/0 -d megaraid,17 # /dev/bus/0 [megaraid_disk_17], SCSI device
/dev/bus/0 -d megaraid,18 # /dev/bus/0 [megaraid_disk_18], SCSI device
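As a rough illustration of the automated rollout described earlier, scan lines like these can be turned into one check invocation per entry. This is a Python sketch under stated assumptions, not the Puppet code used in this thread: `parse_scan_line` and `check_command` are hypothetical names, and the only plugin options used are `-d` and `-i`, as they appear in the check_smart.pl command suggested earlier in the thread.

```python
def parse_scan_line(line):
    """Split one `smartctl --scan` line into (device, device_type)."""
    # Everything after '#' is a human-readable comment.
    spec = line.split("#", 1)[0].split()
    if len(spec) != 3 or spec[1] != "-d":
        raise ValueError(f"unexpected scan line: {line!r}")
    return spec[0], spec[2]


def check_command(device, device_type, plugin="./check_smart.pl"):
    """Build the argument list for one check (plugin path is an assumption)."""
    return [plugin, "-d", device, "-i", device_type]


# Lines copied from the scan output in this thread.
SCAN_OUTPUT = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device
"""

commands = [
    check_command(*parse_scan_line(line))
    for line in SCAN_OUTPUT.splitlines()
    if line.strip()
]
for command in commands:
    print(" ".join(command))
```

Taking the device and type straight from the scan output avoids having to guess which block device sits in front of the controller, which is the point raised above.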
# smartctl -a /dev/bus/0 -d megaraid,14
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.12.2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: SDLL1DLR800GCCA1
Revision: Y150
Compliance: SPC-4
User Capacity: 800,166,076,416 bytes [800 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x5001173101846b40
Serial number: A0469CEF
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Jun 8 19:58:43 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 37 C
Drive Trip Temperature: 70 C
Manufactured in week 12 of year 2018
Specified cycle count over device lifetime: 0
Accumulated start-stop cycles: 526
Elements in grown defect list: 0
Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 0 0 0 0 29090.359 0
write: 0 0 0 0 0 1819.224 0
verify: 0 0 0 0 0 0.034 0
Non-medium error count: 228
No self-tests have been logged
# smartctl -a /dev/sda -d megaraid,14
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.12.2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: SDLL1DLR800GCCA1
Revision: Y150
Compliance: SPC-4
User Capacity: 800,166,076,416 bytes [800 GB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x5001173101846b40
Serial number: A0469CEF
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Jun 8 19:58:51 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 37 C
Drive Trip Temperature: 70 C
Manufactured in week 12 of year 2018
Specified cycle count over device lifetime: 0
Accumulated start-stop cycles: 526
Elements in grown defect list: 0
Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 0 0 0 0 29090.359 0
write: 0 0 0 0 0 1819.224 0
verify: 0 0 0 0 0 0.034 0
Non-medium error count: 228
No self-tests have been logged
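One way to confirm that both invocations address the same physical drive is to compare the identifying fields rather than reading the full output by eye. A small Python sketch, assuming the `Key: value` layout of the information section shown above; `smart_identity` is a hypothetical helper name.

```python
def smart_identity(smartctl_output):
    """Return (serial number, logical unit id) parsed from `smartctl -a` text."""
    fields = {}
    for line in smartctl_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields.get("Serial number"), fields.get("Logical Unit id")


# Excerpts copied from the two outputs in this thread.
VIA_BUS = """\
Logical Unit id: 0x5001173101846b40
Serial number: A0469CEF
Local Time is: Sat Jun 8 19:58:43 2019 CEST
"""
VIA_SDA = """\
Logical Unit id: 0x5001173101846b40
Serial number: A0469CEF
Local Time is: Sat Jun 8 19:58:51 2019 CEST
"""

same_drive = smart_identity(VIA_BUS) == smart_identity(VIA_SDA)
print(same_drive)
```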
I'm doing some research concerning this and will get back to you.
@greno2 Could you please test #38 (https://raw.githubusercontent.com/Napsty/check_smart/pseudo-bus-device/check_smart.pl) before I merge into master? Thx
Works like a charm. Thanks for your quick support.