Comments (3)
I just had a case where check_smart.pl detected pending sectors (which is good), but the results of a self-test clearly show read failures. The raw value of Raw_Read_Error_Rate is 0, however, so that attribute is not a good indicator:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 141 141 054 Pre-fail Offline - 72
3 Spin_Up_Time 0x0007 127 127 024 Pre-fail Always - 180 (Average 180)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 29
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 113 113 020 Pre-fail Offline - 35
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 11032
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 29
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 77
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 77
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Min/Max 22/35)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 10926 962363536
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
But in general you're right. We should be able to see the read errors from the output and alert.
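To illustrate the point with the attribute table above, here is a minimal Python sketch (not code from check_smart.pl) that pulls the raw values out of smartctl attribute lines and reports which sector-related counters are nonzero. The regular expression, the function names and the WATCHED list are assumptions made for this example. The sample lines are copied from the output above.

```python
import re

# Sample lines copied from the smartctl attribute table above.
SAMPLE = """\
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Min/Max 22/35)
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
"""

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
# Only the leading integer of the raw column is captured.
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Hypothetical watch list for this sketch, not the list check_smart.pl uses.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_raw_values(text):
    """Return a dict mapping attribute name to its raw value as an int."""
    raw = {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            raw[match.group(2)] = int(match.group(3))
    return raw


def nonzero_watched(raw):
    """Return the watched attributes whose raw value is greater than zero."""
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}


if __name__ == "__main__":
    raw = parse_raw_values(SAMPLE)
    print("Raw_Read_Error_Rate:", raw["Raw_Read_Error_Rate"])
    print("Nonzero watched attributes:", nonzero_watched(raw))
```

With this sample, Raw_Read_Error_Rate parses as 0 while Current_Pending_Sector is the only watched attribute that shows up as nonzero, which matches what was observed on this drive.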
from check_smart.
By the way, as far as I know, errors like these:
Error 1 occurred at disk power-on lifetime: 4281 hours (178 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
can also occur after a sudden power outage, for example when you simply pull the power plug(s) of your computer/server. The disks won't like it ;-)
On the other hand, this doesn't necessarily mean that the disk is now failing.
I just created PR #25, which introduces a new parameter (-s) that uses the self-test log as an additional check.
In your case, @deric, I recommend running an extended self-test on the drive you believe could have errors (e.g. smartctl -t long /dev/sda) and then running check_smart.pl with the -s parameter, e.g. ./check_smart.pl -i ata -d /dev/sda -s.
Note that as of today it's not fully merged yet. This feature will be released in version 5.10.
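For readers who want to see what a self-test log check could look like, here is a rough Python sketch. It is not the implementation from PR #25; the regular expression, the function names and the rule "any status containing 'failure' is critical" are simplifying assumptions for this example. The exit codes follow the usual monitoring plugin convention (0 = OK, 2 = CRITICAL), and the sample entry is the one from the self-test log posted above.

```python
import re

# Self-test log entry copied from the smartctl output earlier in this thread.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 10926 962363536
"""

# "# N <description and status> NN% <lifetime hours> <LBA or ->"
ENTRY_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(text):
    """Return one dict per self-test log entry found in the text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append({
                "num": int(match.group(1)),
                # Description and status are kept together because the
                # column boundary is not reliable once spacing is collapsed.
                "text": match.group(2),
                "remaining_percent": int(match.group(3)),
                "lifetime_hours": int(match.group(4)),
                "lba_of_first_error": match.group(5),
            })
    return entries


def check_selftest(text):
    """Return (exit_code, message): 2 if any entry reports a failure, else 0.

    Assumption for this sketch: an entry counts as failed when its text
    contains the word "failure".
    """
    failed = [e for e in parse_selftest_log(text) if "failure" in e["text"].lower()]
    if failed:
        first = failed[0]
        return 2, "CRITICAL: self-test #%d: %s (LBA %s)" % (
            first["num"], first["text"], first["lba_of_first_error"])
    return 0, "OK: no failed self-tests in log"


if __name__ == "__main__":
    code, message = check_selftest(SAMPLE_LOG)
    print(message)
    raise SystemExit(code)
```

Note that a check like this only sees failures that a self-test has already recorded, so it depends on self-tests being run (for example with smartctl -t long, as suggested above).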