Napsty commented on May 29, 2024

So I read through the change that introduced this special pseudo-device: https://www.smartmontools.org/ticket/252

It's confusing because the output suggests that /dev/bus/0 is a real device or a real path, but neither is actually the case:

-d megaraid allows to specify devices in pseudo /dev/bus/N format

This is actually handled internally by smartctl and does not refer to a real, existing path in devfs.

However the author of that change (Alex Samorukov) has a good point:

it is possible that drive name is not exists at all (unconfigured RAID)

That's a rare but real situation. You may have a couple of disks attached to your megaraid controller that aren't configured yet. This means the OS sees no logical drive, so you can't monitor these drives, except through this pseudo-device /dev/bus/N.

I will make a change in check_smart to accept this pseudo-device/path.
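A minimal sketch of the kind of check described here, written in Python purely for illustration: check_smart itself is Perl, and this is not the code from the actual change. The helper name is_acceptable_device and the rule that /dev/bus/N is only accepted together with a megaraid,N interface are assumptions. A regular path must exist on the filesystem, while the pseudo-device is accepted without any filesystem lookup.

```python
import os
import re

# Matches smartctl's MegaRAID pseudo-device, e.g. /dev/bus/0.
# This path does not exist in devfs; smartctl resolves it internally.
PSEUDO_BUS_RE = re.compile(r"/dev/bus/\d+")


def is_acceptable_device(path, interface, exists=os.path.exists):
    """Hypothetical validation helper (not check_smart's actual code).

    Accepts a device path if it exists on the filesystem, or if it is a
    /dev/bus/N pseudo-device used together with a megaraid,N interface.
    """
    if PSEUDO_BUS_RE.fullmatch(path):
        # Assumption: the pseudo-device only makes sense with -d megaraid,N.
        return interface.startswith("megaraid")
    # Any other device must be a real path.
    return exists(path)


if __name__ == "__main__":
    print(is_acceptable_device("/dev/bus/0", "megaraid,14"))  # True, no filesystem lookup
    print(is_acceptable_device("/dev/bus/0", "scsi"))         # False
```

Injecting `exists` as a parameter keeps the helper testable on machines without a MegaRAID controller.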

from check_smart.

Napsty commented on May 29, 2024

Can you show the output of ls -la /dev/bus/0 and ls -la /dev/sd* please?
Do you have megacli installed? If yes, it would be interesting to see how the raid was built (e.g. raid-5 between these 5 drives?).
What exactly is behind /dev/sda, /dev/sdb, /dev/sdc? Are these single drives or another raid of other drives or again the megaraid devices?

greno2 commented on May 29, 2024

Thanks for your quick reply.

The server has 2 SSDs (sdb/sdc) as a software RAID 1 for the OS,
plus 1 hardware RAID 6 for data (LVM).

# ls -la /dev/bus/0
ls: cannot access /dev/bus/0: No such file or directory
# ls -la /dev/sd*
brw-rw---- 1 root disk 8,  0 Jun  7 14:11 /dev/sda
brw-rw---- 1 root disk 8, 16 May 31 15:27 /dev/sdb
brw-rw---- 1 root disk 8, 17 May 31 15:27 /dev/sdb1
brw-rw---- 1 root disk 8, 32 May 31 15:27 /dev/sdc
brw-rw---- 1 root disk 8, 33 May 31 15:27 /dev/sdc1

I used storcli, as this is a newer AVAGO MegaRAID SAS 9460-16i:

# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = Linux 3.10.0-957.12.2.el7.x86_64
Controller = 0
Status = Success
Description = None

Product Name = AVAGO MegaRAID SAS 9460-16i
Serial Number = ***
SAS Address =  500062b204257a40
PCI Address = 00:13:00:00
System Time = 06/07/2019 22:12:39
Mfg. Date = 08/17/18
Controller Time = 06/07/2019 21:12:34
FW Package Build = 50.5.0-1121
BIOS Version = 7.05.02.0_0x07050400
FW Version = 5.050.00-1292
Driver Name = megaraid_sas
Driver Version = 07.705.02.00-rh1
Current Personality = RAID-Mode 
Vendor Id = 0x1000
Device Id = 0x14
SubVendor Id = 0x1000
SubDevice Id = 0x9460
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 19
Device Number = 0
Function Number = 0
Drive Groups = 1

TOPOLOGY :
========

-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT       Size PDC  PI SED DS3  FSpace TR 
-----------------------------------------------------------------------------
 0 -   -   -        -   RAID6 Optl  N    2.181 TB dsbl N  N   dflt N      N  
 0 0   -   -        -   RAID6 Optl  N    2.181 TB dsbl N  N   dflt N      N  
 0 0   0   134:0    14  DRIVE Onln  N  744.687 GB dsbl N  N   dflt -      N  
 0 0   1   134:1    15  DRIVE Onln  N  744.687 GB dsbl N  N   dflt -      N  
 0 0   2   134:2    16  DRIVE Onln  N  744.687 GB dsbl N  N   dflt -      N  
 0 0   3   134:3    17  DRIVE Onln  N  744.687 GB dsbl N  N   dflt -      N  
 0 0   4   134:4    18  DRIVE Onln  N  744.687 GB dsbl N  N   dflt -      N  
-----------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 1

VD LIST :
=======

--------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name  
--------------------------------------------------------------
0/0   RAID6 Optl  RW     Yes     RWBD  -   ON  2.181 TB Raid6 
--------------------------------------------------------------

EID=Enclosure Device ID| VD=Virtual Drive| DG=Drive Group|Rec=Recovery
Cac=CacheCade|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 5

PD LIST :
=======

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model            Sp Type 
------------------------------------------------------------------------------
134:0    14 Onln   0 744.687 GB SAS  SSD N   N  512B SDLL1DLR800GCCA1 U  -    
134:1    15 Onln   0 744.687 GB SAS  SSD N   N  512B SDLL1DLR800GCCA1 U  -    
134:2    16 Onln   0 744.687 GB SAS  SSD N   N  512B SDLL1DLR800GCCA1 U  -    
134:3    17 Onln   0 744.687 GB SAS  SSD N   N  512B SDLL1DLR800GCCA1 U  -    
134:4    18 Onln   0 744.687 GB SAS  SSD N   N  512B SDLL1DLR800GCCA1 U  -    
------------------------------------------------------------------------------

EID=Enclosure Device ID|Slt=Slot No.|DID=Device ID|DG=DriveGroup
DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
UBad=Unconfigured Bad|Onln=Online|Offln=Offline|Intf=Interface
Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info
SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
UGUnsp=Unsupported|UGShld=UnConfigured shielded|HSPShld=Hotspare shielded
CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
UBUnsp=UBad Unsupported


Cachevault_Info :
===============

------------------------------------
Model  State   Temp Mode MfgDate    
------------------------------------
CVPM05 Optimal 26C  -    2018/06/07 
------------------------------------

Napsty commented on May 29, 2024

What is /dev/sda then? Is this the raid-6 PV for data?

greno2 commented on May 29, 2024

/dev/sda is RAID6 PV, indeed.

Napsty commented on May 29, 2024

Did you already try ./check_smart.pl -d /dev/sda -i megaraid,1?

greno2 commented on May 29, 2024

That would work, but since we roll out NRPE checks automatically with Puppet for each physical disk (based on smartctl --scan), we don't know the direct relation between /dev/bus/0 (as reported by smartctl --scan) and /dev/sda.
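One way to sidestep the /dev/bus/0 to /dev/sda mapping is to generate the checks straight from the --scan output. The Python sketch below is hypothetical: the function names are made up, and it assumes check_smart accepts the /dev/bus/N pseudo-device and takes the scanned -d type as its -i interface. It only turns each scan line into one check_smart command line.

```python
from typing import List, Tuple


def parse_scan(output: str) -> List[Tuple[str, str]]:
    """Turn `smartctl --scan` output into (device, device_type) pairs."""
    pairs = []
    for raw in output.splitlines():
        # Drop the trailing "# ..." comment that smartctl appends.
        line = raw.split("#", 1)[0].strip()
        tokens = line.split()
        # Expected shape: <device> -d <type>, e.g. "/dev/bus/0 -d megaraid,14"
        if len(tokens) >= 3 and tokens[1] == "-d":
            pairs.append((tokens[0], tokens[2]))
    return pairs


def check_commands(pairs, plugin="./check_smart.pl"):
    """Build one check_smart command line per scanned device (-d device, -i interface)."""
    return [f"{plugin} -d {dev} -i {dtype}" for dev, dtype in pairs]
```

To use it, pass the stdout of smartctl --scan (for example captured with subprocess.run) to parse_scan; the resulting command lines could then be templated into NRPE command definitions by Puppet. Entries such as /dev/sda -d scsi, which is the RAID virtual disk here, may need to be filtered out depending on what you want to monitor.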

Napsty commented on May 29, 2024

The --scan output is wrong, or at least gives confusing data. The correct approach is to call the real block device (/dev/sda) and use the adapter position behind that block device.

greno2 commented on May 29, 2024

Why wrong/confusing? It just works without having to guess the correct block device, and it returns the same results...

This seems to be intended behaviour (from man smartctl):

megaraid,N - [Linux only] the device consists of one or more SCSI/SAS disks connected to a MegaRAID controller. The non-negative integer N (in the range of 0 to 127 inclusive) denotes which disk on the controller is monitored. Use syntax such as:

smartctl -a -d megaraid,2 /dev/sda

smartctl -a -d megaraid,0 /dev/sdb

smartctl -a -d megaraid,0 /dev/bus/0

This interface will also work for Dell PERC controllers. It is possible to set RAID device name as /dev/bus/N, where N is a SCSI bus number.

The following entry in /proc/devices must exist:
For PERC2/3/4 controllers: megadevN
For PERC5/6 controllers: megaraid_sas_ioctlN
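A quick way to check the /proc/devices prerequisite quoted above. This Python sketch is illustrative only: the helper name is hypothetical, and matching an optional trailing number after megadev / megaraid_sas_ioctl is an assumption based on the man page wording.

```python
import re

# Entry names from the smartctl man page. Treating the trailing "N" as an
# optional number is an assumption about how the entries appear.
MEGARAID_IOCTL_RE = re.compile(r"(?:megadev|megaraid_sas_ioctl)\d*")


def megaraid_ioctl_entries(proc_devices_text):
    """Return the MegaRAID ioctl entries found in /proc/devices-style text."""
    found = []
    for line in proc_devices_text.splitlines():
        parts = line.split()
        # Device lines look like "<major> <name>"; section headers are skipped.
        if len(parts) == 2 and parts[0].isdigit() and MEGARAID_IOCTL_RE.fullmatch(parts[1]):
            found.append(parts[1])
    return found
```

On the host, call it with the contents of /proc/devices, e.g. megaraid_ioctl_entries(open("/proc/devices").read()); an empty list would suggest the MegaRAID ioctl interface that smartctl needs is not registered.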

# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device
/dev/bus/0 -d megaraid,16 # /dev/bus/0 [megaraid_disk_16], SCSI device
/dev/bus/0 -d megaraid,17 # /dev/bus/0 [megaraid_disk_17], SCSI device
/dev/bus/0 -d megaraid,18 # /dev/bus/0 [megaraid_disk_18], SCSI device
# smartctl -a /dev/bus/0 -d megaraid,14
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.12.2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              SDLL1DLR800GCCA1
Revision:             Y150
Compliance:           SPC-4
User Capacity:        800,166,076,416 bytes [800 GB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5001173101846b40
Serial number:        A0469CEF
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Jun  8 19:58:43 2019 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Percentage used endurance indicator: 0%
Current Drive Temperature:     37 C
Drive Trip Temperature:        70 C

Manufactured in week 12 of year 2018
Specified cycle count over device lifetime:  0
Accumulated start-stop cycles:  526
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      29090.359           0
write:         0        0         0         0          0       1819.224           0
verify:        0        0         0         0          0          0.034           0

Non-medium error count:      228

No self-tests have been logged

# smartctl -a /dev/sda -d megaraid,14
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.12.2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              SDLL1DLR800GCCA1
Revision:             Y150
Compliance:           SPC-4
User Capacity:        800,166,076,416 bytes [800 GB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5001173101846b40
Serial number:        A0469CEF
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Jun  8 19:58:51 2019 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Percentage used endurance indicator: 0%
Current Drive Temperature:     37 C
Drive Trip Temperature:        70 C

Manufactured in week 12 of year 2018
Specified cycle count over device lifetime:  0
Accumulated start-stop cycles:  526
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      29090.359           0
write:         0        0         0         0          0       1819.224           0
verify:        0        0         0         0          0          0.034           0

Non-medium error count:      228

No self-tests have been logged

Napsty commented on May 29, 2024

I'm doing some research concerning this and will get back to you.

Napsty commented on May 29, 2024

@greno2 Could you please test #38 (https://raw.githubusercontent.com/Napsty/check_smart/pseudo-bus-device/check_smart.pl) before I merge into master? Thx

greno2 commented on May 29, 2024

Works like a charm. Thanks for your quick support.
