Comments (7)
Please run the plugin in debug mode and show the output here. It's important to see what the plugin actually detects.
Just my personal opinion: the -g parameter is meant for a quick check and is best avoided; I don't use it myself. With it you miss out on performance data, which you can graph over time. This can be quite helpful for predicting a failure (I have done so in the past). Try the same again with the -d parameter instead. I will soon publish an Icinga2 apply rule for SMART drive checks.
from check_smart.
Ah thanks.
I tested it out with -g because the "normal" servers have sda/sdb as a Linux software mirror, and I thought this would make it easier to set up a common rule for the Nagios check.
It would be good to have your mention about the lost performance data with -g also in the readme.
I attach the debug output of runs of your script without/with the -s option and the original smartctl checks with the -l error parameter:
- check_smart.pl-MegaRaid.txt
- check_smart.pl-MegaRaid-with-s.txt
- smartctl-sat+megaraid,13.txt
- smartctl-sat+megaraid,17.txt
Thanks for the files.
Looking at check_smart.pl-MegaRaid.txt, you can see that all the checks are actually OK. In the current version, check_smart checks the SMART health status, which was OK on all drives, and checks for defective sectors in the "Current Pending Sector" attribute. These were OK, too.
Drive megaraid,17 has 2 defective sectors listed as "Offline_Uncorrectable", which may be detected in a future version of check_smart (I'm planning on that).
In check_smart.pl-MegaRaid-with-s.txt, the selftest log check is enabled. This has found errors on drive megaraid,19. The selftest log check is not enabled by default in the plugin for two reasons:
- It would break existing checks set up with older versions of the plugin. Alerts could suddenly arise on drives which didn't alert before.
- The selftest log is an indicator, but its entries are no guarantee of issues with the drive. I've already seen log entries appear which then cleared by themselves, or log entries which were actually caused by a system reset (power loss) during the self-test and have nothing to do with bad sectors.
So it's up to the user to decide whether or not to include the selftest logs in the check.
The drive megaraid,19 doesn't seem to be an actual physical drive though, as there are no SMART attributes shown in the first check without -s. It may be the megaraid controller itself. See #33. That's another reason why the -g parameter should be used with caution :-).
"It would be good to have your mention about the lost performance data with -g also in the readme."
Good idea. I've added this on the official documentation page https://www.claudiokuenzler.com/monitoring-plugins/check_smart.php.
@Reiner030 can you check if check_smart.pl v 6.0 changes something?
Hello @Napsty, currently I have no disks with reallocated counts, but I still have some with UDMA_CRC_Error_Count, and it seems to be working nicely... thanks ;)
# for i in $(seq 4 17); do /usr/lib/nagios/plugins/check_smart-new.pl -d /dev/sda -i megaraid,$i -r Current_Pending_Sector,Reallocated_Sector_Ct,Program_Fail_Cnt_Total,Uncorrectable_Error_Cnt,Offline_Uncorrectable,Runtime_Bad_Block,UDMA_CRC_Error_Count; done
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=445 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25099 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=447 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
WARNING: UDMA_CRC_Error_Count is non-zero (4739)|Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=392 Start_Stop_Count=17 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=28097 Spin_Retry_Count=0 Power_Cycle_Count=13 Power-Off_Retract_Count=1158 Load_Cycle_Count=1158 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=4739
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=452 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=428 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=43 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=453 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=15 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1049 Load_Cycle_Count=1049 Temperature_Celsius=45 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=406 Start_Stop_Count=14 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=17 Power_On_Hours=25099 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1047 Load_Cycle_Count=1047 Temperature_Celsius=43 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=487 Start_Stop_Count=51 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=20891 Spin_Retry_Count=0 Power_Cycle_Count=51 Power-Off_Retract_Count=910 Load_Cycle_Count=910 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=106 Spin_Up_Time=458 Start_Stop_Count=15 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=25100 Spin_Retry_Count=0 Power_Cycle_Count=12 Power-Off_Retract_Count=1048 Load_Cycle_Count=1048 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
WARNING: UDMA_CRC_Error_Count is non-zero (13)|Raw_Read_Error_Rate=0 Throughput_Performance=104 Spin_Up_Time=452 Start_Stop_Count=16 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=26318 Spin_Retry_Count=0 Power_Cycle_Count=15 Power-Off_Retract_Count=1082 Load_Cycle_Count=1082 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=13
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=0 Start_Stop_Count=2 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=15 Power_On_Hours=172 Spin_Retry_Count=0 Power_Cycle_Count=2 Power-Off_Retract_Count=9 Load_Cycle_Count=9 Temperature_Celsius=47 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=441 Start_Stop_Count=10 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=22028 Spin_Retry_Count=0 Power_Cycle_Count=7 Power-Off_Retract_Count=923 Load_Cycle_Count=923 Temperature_Celsius=48 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=100 Spin_Up_Time=430 Start_Stop_Count=13 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=18 Power_On_Hours=22028 Spin_Retry_Count=0 Power_Cycle_Count=7 Power-Off_Retract_Count=931 Load_Cycle_Count=931 Temperature_Celsius=47 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=0 Spin_Up_Time=9405 Start_Stop_Count=16 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=0 Power_On_Hours=941 Spin_Retry_Count=0 Power_Cycle_Count=13 G-Sense_Error_Rate=0 Power-Off_Retract_Count=9 Load_Cycle_Count=1669 Temperature_Celsius=44 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Disk_Shift=0 Loaded_Hours=530 Load_Retry_Count=0 Load_Friction=0 Load-in_Time=539 Head_Flying_Hours=0
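Each output line above carries the status text before the "|" and the performance data after it. As a minimal sketch of how a single value could be pulled out of such a line, for example to track it over time, the helper below splits on the first "|" and looks up one label. The function name perfdata_value is hypothetical and not part of check_smart; the sample line is shortened from the third output line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_smart): print the value of one
# perfdata label from a plugin output line of the form
#   "STATUS text|label=value label=value ..."
perfdata_value() {
    line=$1
    label=$2
    # Everything after the first "|" is treated as performance data.
    perf=${line#*|}
    for pair in $perf; do
        case $pair in
            "$label="*)
                # Strip the "label=" prefix and print the value.
                printf '%s\n' "${pair#*=}"
                return 0
                ;;
        esac
    done
    # Label not found in the performance data.
    return 1
}

# Shortened sample based on the third output line above.
sample='WARNING: UDMA_CRC_Error_Count is non-zero (4739)|Raw_Read_Error_Rate=0 Temperature_Celsius=42 UDMA_CRC_Error_Count=4739'

perfdata_value "$sample" UDMA_CRC_Error_Count
perfdata_value "$sample" Temperature_Celsius
```

In a real setup the monitoring system usually stores this performance data itself, so a helper like this is mainly useful for ad-hoc checks on the command line.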
As you can see in #26, I first planned on adding CRC_Error_Count into the default raw list, but according to https://en.wikipedia.org/wiki/S.M.A.R.T. this value is not considered critical and these errors could also be caused by a faulty cable or connection:
"count of errors in data transfer via the interface cable"
But of course you're free to override the default raw list with -r, as you did. That's what this parameter is for :-). I am therefore closing the ticket.
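To illustrate the idea of a raw list, here is a rough sketch that reports every attribute from a comma-separated list whose value is non-zero. This is not check_smart's actual implementation; the function name nonzero_raw_attrs is hypothetical, and the sample data is shortened from the drive that reported UDMA_CRC_Error_Count=13 above.

```shell
#!/bin/sh
# Sketch only, NOT check_smart's implementation: print "name=value" for each
# attribute in a comma-separated raw list whose value is non-zero.
nonzero_raw_attrs() {
    perf=$1       # space-separated "label=value" pairs
    raw_list=$2   # comma-separated attribute names
    # Turn the comma-separated list into words for the loop.
    for attr in $(printf '%s' "$raw_list" | tr ',' ' '); do
        for pair in $perf; do
            case $pair in
                "$attr="*)
                    value=${pair#*=}
                    if [ "$value" != 0 ]; then
                        printf '%s=%s\n' "$attr" "$value"
                    fi
                    ;;
            esac
        done
    done
    return 0
}

# Shortened sample values from the drive with 13 CRC errors above.
perf='Reallocated_Sector_Ct=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=13'

# A list without UDMA_CRC_Error_Count reports nothing for this drive.
nonzero_raw_attrs "$perf" 'Current_Pending_Sector,Reallocated_Sector_Ct'

# Adding UDMA_CRC_Error_Count to the list makes the CRC errors show up.
nonzero_raw_attrs "$perf" 'Current_Pending_Sector,Reallocated_Sector_Ct,UDMA_CRC_Error_Count'
```

With the shorter list the drive would pass silently, while the extended list surfaces the CRC errors, which matches the WARNING seen in the output above when -r included UDMA_CRC_Error_Count.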
yes, thanks... the additional attribute was mainly for testing purpose ;)