
Comments (4)

Napsty commented on May 29, 2024

Hi @der-michik, thanks for reporting!
That's a very interesting case and it might even be a misinterpretation of the attributes coming from smartctl (smartmontools). According to https://en.wikipedia.org/wiki/S.M.A.R.T., attribute id 194 is the most used attribute to show the current temperature, whereas attribute id 231 is Life Left (SSDs) or Temperature. As your drive is an SSD, this is more likely to be "Life Left".
Again from the Wikipedia page:

Indicates the approximate SSD life left, in terms of program/erase cycles or available reserved blocks.[67] A normalized value of 100 represents a new drive, with a threshold value at 10 indicating a need for replacement. A value of 0 may mean that the drive is operating in read-only mode to allow data recovery.[68] Previously (pre-2010) occasionally used for Drive Temperature (more typically reported at 0xC2).

Your SSD reports the value 100, which according to this attribute indicates a perfectly healthy drive.
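To make the reading of attribute 231 quoted above concrete, here is a minimal Python sketch. It only encodes the three statements from the Wikipedia excerpt (100 is a new drive, a threshold of 10 signals replacement, 0 may mean read-only mode). The function name `classify_life_left` and the returned labels are assumptions made for illustration; they are not taken from check_smart or smartmontools.

```python
def classify_life_left(normalized: int, threshold: int = 10) -> str:
    """Classify a normalized 'SSD Life Left' value (attribute ID 231).

    Hypothetical helper: the labels are illustrative, and the default
    threshold of 10 comes from the Wikipedia description quoted above.
    """
    if normalized <= 0:
        # A value of 0 may mean the drive has switched to read-only mode.
        return "possibly read-only"
    if normalized <= threshold:
        # At or below the threshold the drive should be replaced.
        return "needs replacement"
    # Anything above the threshold is treated as healthy; 100 is a new drive.
    return "healthy"


if __name__ == "__main__":
    print(classify_life_left(100))
```

Under this reading, a value of 100 is the best possible result, so treating it as a temperature of 100 degrees would produce a false alarm.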

Can you please check the smartctl/smartmontools version on this particular host? We should probably report this upstream.

Update: Seems already fixed in smartmontools, check out: https://github.com/smartmontools/smartmontools/blob/master/smartmontools/drivedb.h#L4082 and smartmontools/smartmontools@160ecb1#diff-5c51af8dba19f3a4f4187af4b46e415f

And the ultimate finding: smartmontools/smartmontools#4

from check_smart.

MichiK commented on May 29, 2024

Ah, interesting, thanks for your research! That explains a lot. I did not think of taking a detailed look at smartmontools, as upgrading it on the affected systems is not really an option for me anyway, whereas patching the script was an easy workaround.

Nevertheless, maybe we should think about a more flexible exclude option. Currently, -e only excludes attributes from failure reporting, and only by name. Names are known to differ somewhat between drive vendors (even if the information in the attributes is correct) and are not always unique, as in my example. IDs would probably be a bit more reliable, and excluding attributes from the performance data as well would be nice to have. Then I could exclude the broken attributes for the affected hosts in the Icinga configuration instead of in the script on the hosts themselves (and that configuration is built from Ansible using the monitored hosts' facts, so I could even detect it automatically).

Maybe I will have a look at it and do a pull request tomorrow.
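The idea of excluding by ID as well as by name, and applying the exclusion to the performance data too, could be sketched roughly as follows. This is a Python illustration under stated assumptions, not check_smart's actual code (check_smart is a separate script, and its internals are not shown in this thread). The `Attribute` class, the helper names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical container for one SMART attribute."""
    attr_id: int
    name: str
    raw_value: int


def is_excluded(attr: Attribute, excludes: set[str]) -> bool:
    # An exclude entry matches either the attribute name or its numeric ID.
    return attr.name in excludes or str(attr.attr_id) in excludes


def filter_attributes(attrs: list[Attribute], excludes: set[str]) -> list[Attribute]:
    # The same filtered list would feed both failure checks and perfdata.
    return [a for a in attrs if not is_excluded(a, excludes)]


def perfdata(attrs: list[Attribute]) -> str:
    # Simplified "label=value" performance data string.
    return " ".join(f"{a.name}={a.raw_value}" for a in attrs)


if __name__ == "__main__":
    # Illustrative data: two attributes share a name, so only the ID
    # can tell them apart.
    attrs = [
        Attribute(194, "Temperature_Celsius", 35),
        Attribute(231, "Temperature_Celsius", 100),
    ]
    kept = filter_attributes(attrs, {"231"})
    print(perfdata(kept))
```

Matching on the ID keeps attribute 194 while dropping 231, which a name-based exclude could not do when both attributes carry the same name.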


Napsty commented on May 29, 2024

Currently, -e only excludes attributes from failure reporting, and only by name. Names are known to differ somewhat between drive vendors (even if the information in the attributes is correct) and are not always unique, as in my example. IDs would probably be a bit more reliable

That was actually my intended answer here (to use -e attribute_id) :D
I had forgotten that attributes cannot yet be excluded by ID, but that is fairly easy to add.
If you want, I'll let you do the code change and PR. If you don't find the time, let me know.


MichiK commented on May 29, 2024

As this turned out to be an upstream issue that is already solved, and I now have a nice workaround that fits my workflow, I think this can be closed.

