

Napsty commented on May 30, 2024

Please run the plugin in debug mode and show the output here. It's important to see what the plugin actually detects.

Just my personal opinion: the -g parameter is meant for a quick check and should rather be avoided; personally, I don't use it. With -g you miss out on performance data, which you can graph over time. This can be quite helpful for predicting a failure (I've done so in the past). Try the same again with the -d parameter instead. I will soon publish an Icinga2 apply rule for SMART drive checks.
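To illustrate why the performance data matters, here is a minimal Python sketch that parses the perfdata part of a plugin output line (everything after `|`, as space-separated `label=value` pairs) and compares two samples of the same drive taken at different times. The helpers `parse_perfdata` and `rising_attributes` are hypothetical, the samples are made up, and the sketch assumes plain integer values as in the output shown later in this thread; any `;warn;crit` suffixes are simply dropped.

```python
def parse_perfdata(line: str) -> dict[str, int]:
    """Parse 'STATUS text|label=value label=value ...' into {label: value}.

    Assumption: values are integers without units; ';warn;crit' fields are ignored.
    """
    _, _, perf = line.partition("|")
    result: dict[str, int] = {}
    for token in perf.split():
        label, sep, value = token.partition("=")
        if not sep:
            continue  # skip anything that is not a label=value pair
        # Keep only the numeric part before any ';' threshold fields.
        result[label] = int(value.split(";")[0])
    return result


def rising_attributes(old: dict[str, int], new: dict[str, int],
                      watch: list[str]) -> dict[str, int]:
    """Return how much each watched attribute increased between two samples."""
    return {
        name: new[name] - old[name]
        for name in watch
        if name in old and name in new and new[name] > old[name]
    }


# Hypothetical perfdata samples of the same drive, taken some time apart.
yesterday = parse_perfdata("|Reallocated_Sector_Ct=0 Current_Pending_Sector=0")
today = parse_perfdata("|Reallocated_Sector_Ct=8 Current_Pending_Sector=0")
print(rising_attributes(yesterday, today,
                        ["Reallocated_Sector_Ct", "Current_Pending_Sector"]))
# prints {'Reallocated_Sector_Ct': 8}
```

A graphing tool does essentially the same comparison visually: a value that keeps climbing is an early hint, even while each single check still returns OK.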

from check_smart.

Reiner030 commented on May 30, 2024

Ah thanks.

I tested it with -g because the "normal" servers have sda/sdb as a Linux software mirror, and I thought this would make it easier to set up a common rule for the Nagios check.
It would be good to also mention in the README that performance data is lost with -g.

I've attached the debug output of your script, run without and with the -s option, as well as the original smartctl checks with the -l error parameter.

Napsty commented on May 30, 2024

Thanks for the files.

Looking at check_smart.pl-MegaRaid.txt, you can see that all the checks are actually OK. In the current version, check_smart evaluates the overall SMART health status, which was OK on all drives, and checks for defective sectors in the "Current Pending Sector" attribute. These were OK, too.
Drive megaraid,17 has 2 defective sectors listed as "Offline_Uncorrectable", which may be detected in a future version of check_smart (I'm planning on that).
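The check logic described above can be sketched roughly as follows. This is a simplified Python illustration, not the plugin's actual Perl code: it assumes the overall health result and the raw attribute values have already been read from smartctl, and the function `evaluate` is hypothetical. The OK and WARNING texts mirror the plugin output quoted later in this thread; the CRITICAL text is made up. It also shows why megaraid,17 passes: its non-zero Offline_Uncorrectable only counts once that attribute is part of the raw list.

```python
def evaluate(health_ok: bool, raw_values: dict[str, int], raw_list: list[str]) -> str:
    """Return a Nagios-style status line for one drive (simplified sketch)."""
    if not health_ok:
        # Hypothetical message; the overall SMART health check failed.
        return "CRITICAL: SMART health check failed"
    # Any attribute in the raw list with a non-zero raw value raises a warning.
    problems = [f"{name} is non-zero ({raw_values[name]})"
                for name in raw_list if raw_values.get(name, 0) > 0]
    if problems:
        return "WARNING: " + ", ".join(problems)
    return "OK: no SMART errors detected."


raw_list = ["Current_Pending_Sector"]  # assumption: the raw check described above
drive_17 = {"Current_Pending_Sector": 0, "Offline_Uncorrectable": 2}
print(evaluate(True, drive_17, raw_list))
# prints OK: no SMART errors detected.
print(evaluate(True, drive_17, raw_list + ["Offline_Uncorrectable"]))
# prints WARNING: Offline_Uncorrectable is non-zero (2)
```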

In check_smart.pl-MegaRaid-with-s.txt, the self-test log check is enabled, and it found errors on drive megaraid,19. The self-test log check is not enabled by default in the plugin, for two reasons:

  1. It would break existing checks set up with older versions of the plugin: alerts could suddenly appear on drives that didn't alert before.
  2. The self-test log is an indicator, but its entries are no guarantee of issues with the drive. I've already seen log entries that later cleared by themselves, and entries that were actually caused by a system reset (power loss) during the self-test and had nothing to do with bad sectors.

So it's up to the user to decide whether or not to include the selftest logs in the check.
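The opt-in behaviour can be sketched as a flag that is off by default, in the spirit of the -s option, so that upgrading the plugin does not change the result of existing checks. In this hypothetical Python sketch, `selftest_errors` stands for the number of error entries found in the self-test log, and treating them as WARNING rather than CRITICAL is an assumption that reflects the "indicator, not proof" reasoning above.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin states


def drive_status(attr_state: int, selftest_errors: int,
                 check_selftest: bool = False) -> int:
    """Combine the attribute-based state with the optional self-test log check.

    The self-test log is only considered when check_selftest is True,
    and it can raise the state but never lower it.
    """
    state = attr_state
    if check_selftest and selftest_errors > 0:
        # Assumption: log entries are an indicator only, so WARNING at most.
        state = max(state, WARNING)
    return state


print(drive_status(OK, selftest_errors=3))                       # 0: log ignored by default
print(drive_status(OK, selftest_errors=3, check_selftest=True))  # 1: WARNING
```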

The drive megaraid,19 doesn't seem to be an actual physical drive, though, as no SMART attributes are shown for it in the first check (without -s). It may be the MegaRAID controller itself; see #33. That's another reason why the -g parameter should be used with caution :-).
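When drives are picked up automatically, as with -g, one defensive option is to skip devices that report no SMART attributes at all, since those may be the controller rather than a disk. A minimal Python sketch, assuming the attributes per device have already been collected into a dictionary; `physical_drives` is a hypothetical helper, not part of check_smart, and the sample data is illustrative.

```python
def physical_drives(attributes_by_device: dict[str, dict[str, int]]) -> list[str]:
    """Keep only devices that expose at least one SMART attribute.

    A device with an empty attribute table (like megaraid,19 here) may not be
    a physical disk, so it is left out of the checks. Input order is preserved.
    """
    return [dev for dev, attrs in attributes_by_device.items() if attrs]


devices = {
    "megaraid,17": {"Current_Pending_Sector": 0, "Offline_Uncorrectable": 2},
    "megaraid,19": {},
}
print(physical_drives(devices))  # ['megaraid,17']
```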

Regarding your suggestion to mention the lost performance data with -g in the README:

Good idea. I've added this to the official documentation page https://www.claudiokuenzler.com/monitoring-plugins/check_smart.php.

Napsty commented on May 30, 2024

@Reiner030 can you check whether check_smart.pl v6.0 changes anything?

Reiner030 commented on May 30, 2024

Hello @Napsty, currently I have no disks with reallocated sectors, but some still have a non-zero UDMA_CRC_Error_Count, and the check seems to work nicely... thanks ;)

# for i in $(seq 4 17); do /usr/lib/nagios/plugins/check_smart-new.pl -d /dev/sda -i megaraid,$i -r Current_Pending_Sector,Reallocated_Sector_Ct,Program_Fail_Cnt_Total,Uncorrectable_Error_Cnt,Offline_Uncorrectable,Runtime_Bad_Block,UDMA_CRC_Error_Count; done
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=445 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25099 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=447 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
WARNING: UDMA_CRC_Error_Count is non-zero (4739)|Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=392 Start_Stop_Count=17 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=28097 Spin_Retry_Count=0 Power_Cycle_Count=13 Power-Off_Retract_Count=1158 Load_Cycle_Count=1158 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=4739
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=452 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=428 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=43 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=453 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=15 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1049 Load_Cycle_Count=1049 Temperature_Celsius=45 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=406 Start_Stop_Count=14 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=17 Power_On_Hours=25099 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1047 Load_Cycle_Count=1047 Temperature_Celsius=43 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=487 Start_Stop_Count=51 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=20891 Spin_Retry_Count=0 Power_Cycle_Count=51 Power-Off_Retract_Count=910 Load_Cycle_Count=910 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=106 Spin_Up_Time=458 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
WARNING: UDMA_CRC_Error_Count is non-zero (13)|Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=452 Start_Stop_Count=16 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=26318 Spin_Retry_Count=0 Power_Cycle_Count=15 Power-Off_Retract_Count=1082 Load_Cycle_Count=1082 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=13
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=0 Start_Stop_Count=2 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=15 Power_On_Hours=172 Spin_Retry_Count=0 Power_Cycle_Count=2 Power-Off_Retract_Count=9 Load_Cycle_Count=9 Temperature_Celsius=47 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=441 Start_Stop_Count=10 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=22028 Spin_Retry_Count=0 Power_Cycle_Count=7 Power-Off_Retract_Count=923 Load_Cycle_Count=923 Temperature_Celsius=48 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=430 Start_Stop_Count=13 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=22028 Spin_Retry_Count=0 Power_Cycle_Count=7 Power-Off_Retract_Count=931 Load_Cycle_Count=931 Temperature_Celsius=47 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=0 Spin_Up_Time=9405 Start_Stop_Count=16 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=0 Power_On_Hours=941 Spin_Retry_Count=0 Power_Cycle_Count=13 G-Sense_Error_Rate=0 Power-Off_Retract_Count=9 Load_Cycle_Count=1669 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Disk_Shift=0 Loaded_Hours=530 Load_Retry_Count=0 Load_Friction=0 Load-in_Time=539 Head_Flying_Hours=0
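For a scheduled job rather than an interactive loop, the same per-drive checks could be driven from Python and the exit codes collected. A minimal sketch, assuming the plugin path, device and raw list from the command above and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); `build_command` and `check_all` are hypothetical helpers.

```python
import subprocess

PLUGIN = "/usr/lib/nagios/plugins/check_smart-new.pl"  # path from the shell loop above
RAW_LIST = ",".join([
    "Current_Pending_Sector", "Reallocated_Sector_Ct", "Program_Fail_Cnt_Total",
    "Uncorrectable_Error_Cnt", "Offline_Uncorrectable", "Runtime_Bad_Block",
    "UDMA_CRC_Error_Count",
])
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}  # plugin exit codes


def build_command(drive_id: int, device: str = "/dev/sda") -> list[str]:
    """Arguments for one check, mirroring the loop: -d, -i megaraid,N and -r."""
    return [PLUGIN, "-d", device, "-i", f"megaraid,{drive_id}", "-r", RAW_LIST]


def check_all(first: int = 4, last: int = 17) -> dict[int, str]:
    """Run the plugin for each megaraid ID and map the ID to its state name."""
    results: dict[int, str] = {}
    for drive_id in range(first, last + 1):  # like `seq 4 17`, inclusive
        proc = subprocess.run(build_command(drive_id), capture_output=True, text=True)
        results[drive_id] = STATES.get(proc.returncode, "UNKNOWN")
    return results
```

On the monitored host, `check_all()` returns one state per megaraid ID; it needs the plugin at that path (and the privileges smartctl requires), otherwise `subprocess.run` raises an error.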

Napsty commented on May 30, 2024

As you can see in #26, I initially planned to add CRC_Error_Count to the default raw list, but according to https://en.wikipedia.org/wiki/S.M.A.R.T. this value is not considered critical, and these errors could also be caused by a faulty cable or connection:

"count of errors in data transfer via the interface cable"

But of course you're free to override the default raw list with -r, as you did. That's what this parameter is for :-). I am therefore closing the ticket.
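Overriding the raw list can be sketched as follows, assuming -r replaces the default list rather than extending it (consistent with the command above repeating the other attributes next to UDMA_CRC_Error_Count). The default list here is an assumption: the attributes from that command minus UDMA_CRC_Error_Count, which Reiner030 describes as the additional one. `effective_raw_list` is a hypothetical helper.

```python
from typing import Optional

# Assumption: default raw list, inferred from the command above.
DEFAULT_RAW_LIST = [
    "Current_Pending_Sector", "Reallocated_Sector_Ct", "Program_Fail_Cnt_Total",
    "Uncorrectable_Error_Cnt", "Offline_Uncorrectable", "Runtime_Bad_Block",
]


def effective_raw_list(r_option: Optional[str]) -> list[str]:
    """Return the attributes to check: the -r value replaces the default list."""
    if r_option:
        return [name.strip() for name in r_option.split(",") if name.strip()]
    return list(DEFAULT_RAW_LIST)  # copy, so callers cannot modify the default


print(effective_raw_list("Current_Pending_Sector,UDMA_CRC_Error_Count"))
# prints ['Current_Pending_Sector', 'UDMA_CRC_Error_Count']
```

Under this assumption, anyone adding UDMA_CRC_Error_Count also has to list the default attributes again, or they silently stop being checked.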

Reiner030 commented on May 30, 2024

Yes, thanks... the additional attribute was mainly for testing purposes ;)
