Comments (4)
Hi @der-michik , thanks for reporting!
That's a very interesting case, and it might even be a misinterpretation of the attributes coming from smartctl (smartmontools). According to https://en.wikipedia.org/wiki/S.M.A.R.T., attribute ID 194 is the one most commonly used to report the current temperature, whereas attribute ID 231 is "Life Left (SSDs) or Temperature". As your drive is an SSD, this is more likely to be "Life Left".
Again from the Wikipedia page:
Indicates the approximate SSD life left, in terms of program/erase cycles or available reserved blocks.[67] A normalized value of 100 represents a new drive, with a threshold value at 10 indicating a need for replacement. A value of 0 may mean that the drive is operating in read-only mode to allow data recovery.[68] Previously (pre-2010) occasionally used for Drive Temperature (more typically reported at 0xC2).
Your SSD reports the value 100, which according to this attribute indicates a perfectly healthy drive.
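To illustrate how the Wikipedia description maps a normalized value to a rough health state, here is a minimal Python sketch. The function name and the returned labels are made up for illustration; only the cut-offs (100 for a new drive, threshold 10, 0 for possible read-only mode) come from the quoted text, and real drives may report this attribute differently.

```python
def classify_life_left(normalized: int, threshold: int = 10) -> str:
    """Map the normalized value of attribute 231 (SSD Life Left) to a rough state.

    Hypothetical helper: the labels are illustrative, the cut-offs follow
    the Wikipedia description of this attribute.
    """
    if not 0 <= normalized <= 100:
        # Outside the range the description covers; do not guess.
        return "unknown"
    if normalized == 0:
        # May mean the drive is in read-only mode for data recovery.
        return "read-only"
    if normalized <= threshold:
        # At or below the threshold: replacement is advised.
        return "replace"
    return "ok"


print(classify_life_left(100))  # prints "ok"
```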
Can you please check the smartctl/smartmontools version on this particular host? We should probably report this upstream.
Update: Seems already fixed in smartmontools, check out: https://github.com/smartmontools/smartmontools/blob/master/smartmontools/drivedb.h#L4082 and smartmontools/smartmontools@160ecb1#diff-5c51af8dba19f3a4f4187af4b46e415f
And the ultimate finding: smartmontools/smartmontools#4
from check_smart.
Ah, interesting, thanks for your research! That explains a lot. I did not think about taking a detailed look at smartmontools, as upgrading it on the affected systems is not really an option for me anyway, whereas patching the script was an easy workaround.
Nevertheless, maybe we should think about a more flexible exclude option. Currently, -e only excludes attributes from failure reporting, and only by name. Names are known to differ somewhat between drive vendors (even if the information in the attributes is correct) and are not always unique, as in my example. IDs would probably be a bit more reliable, and excluding attributes from the performance data as well would be nice to have. Then I could exclude the broken attributes for the affected hosts in the Icinga configuration instead of in the script on the hosts themselves (and that configuration is built from Ansible using the monitored hosts' facts, so I could even detect it automatically).
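The idea of excluding by ID as well as by name could look roughly like the following Python sketch. It is not how check_smart implements -e (the script itself is Perl); `SmartAttribute`, `filter_excluded`, and the sample values are hypothetical and only show why an ID is a more precise key than a name when two attributes share the same name.

```python
from typing import Iterable, List, NamedTuple


class SmartAttribute(NamedTuple):
    """Hypothetical container for one parsed SMART attribute."""
    attr_id: int
    name: str
    value: int


def filter_excluded(attributes: Iterable[SmartAttribute],
                    excludes: Iterable[str]) -> List[SmartAttribute]:
    """Drop attributes whose ID or name matches any exclude entry.

    Purely numeric entries are treated as attribute IDs, everything else
    as attribute names (an assumption made for this sketch).
    """
    excludes = list(excludes)
    exclude_ids = {int(e) for e in excludes if e.isdecimal()}
    exclude_names = {e for e in excludes if not e.isdecimal()}
    return [
        a for a in attributes
        if a.attr_id not in exclude_ids and a.name not in exclude_names
    ]


# Made-up sample: two attributes that share one name but differ by ID.
attrs = [
    SmartAttribute(194, "Temperature_Celsius", 64),
    SmartAttribute(231, "Temperature_Celsius", 100),
]

kept = filter_excluded(attrs, ["231"])
print([a.attr_id for a in kept])  # prints [194]
```

Excluding by name would drop both entries in this sample, while excluding by ID removes only the misreported one. The same filtered list could then feed both the failure checks and the performance data.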
Maybe I will have a look at it and do a pull request tomorrow.
from check_smart.
Currently, -e only excludes attributes from failure reporting and only by name. Names are known to differ somewhat between drive vendors (even if the information in the attributes is correct) and are not always unique like in my example. IDs would probably be a bit more reliable
That was actually my intended answer here (to use -e attribute_id) :D
I somehow forgot that attributes cannot be excluded by ID yet. But that is fairly easy to add.
If you want, I'll let you do the code change and PR. If you don't find the time, let me know.
from check_smart.
As this was an upstream issue that is already solved anyway, and I now have a nice workaround that fits my workflow, I think this can be closed.
from check_smart.