Comments (7)
I found an interesting Samsung (official) document. https://image-us.samsung.com/SamsungUS/b2b/resource/2016/05/31/WHP-SSD-SSDSMARTATTRIBUTES-APR16J.pdf
The raw value of Wear Leveling Count reports the amount of NAND writes as a function of consumed P/E cycles, meaning that an increment of 1 corresponds to one full drive write. It should be noted that one full drive write in this context means the physical, raw NAND capacity of the drive, so in case of a 960GB SM863 for example, an increase of 1 in Wear Leveling Count translates to 1,024GiB of NAND writes.
This indicates that the Wear_Leveling_Count raw value is the number of full drive writes. So if you have a 500GB drive and the Wear_Leveling_Count raw value is at 19 (as in my case), this would mean that roughly 19 * 500GB has been written to the drive.
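The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration, assuming the raw value really counts full drive writes as the quoted Samsung document describes; the function name `estimated_nand_writes` is made up for this example and is not part of check_smart.

```python
# Rough estimate only: assumes the Wear_Leveling_Count raw value counts
# full drive writes, as described in the quoted Samsung document.

def estimated_nand_writes(wear_leveling_raw: int, capacity: int) -> int:
    """Return the estimated NAND writes in the same unit as `capacity`."""
    return wear_leveling_raw * capacity


# Samsung's own example: a 960GB SM863 has 1,024GiB of raw NAND,
# so one increment of the raw value corresponds to 1,024GiB written.
print(estimated_nand_writes(1, 1024))  # 1024 (GiB)

# Drive from this comment: raw value 19 on a 500GB drive.
# Assumption: the nominal 500GB is used because the physical NAND capacity
# is not stated here; the real figure is likely somewhat larger.
print(estimated_nand_writes(19, 500))  # 9500 (GB)
```

The second result, about 9.5TB, is only as accurate as the capacity figure that goes into it.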
This can help you to calculate an estimated lifetime remaining (see Samsung document for the formula) but it does not indicate a pending failure.
The document mentions the following attributes to be considered critical for drive health:
The four SMART attributes listed in the table below are the most important indicators of drive health. If any of the normalized values drop below the 10% threshold, it's recommended to replace the drive as soon as possible because it's approaching the end of its life and may become unreliable if used longer.
179 Unused Reserved Block Count (Used_Rsvd_Blk_Cnt_Tot)
181 Program Fail Count (Program_Fail_Cnt_Total) -> already part of default raw list
182 Erase Fail Count (Erase_Fail_Count_Total)
183 Runtime Bad Count (Runtime_Bad_Block) -> already part of default raw list
So I suggest adding Erase_Fail_Count_Total to the default raw list.
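The replacement rule quoted from the Samsung document can be expressed as a small check. This is a minimal sketch, not how check_smart.pl works: the input format (a dict of attribute ID to normalized value) and the sample values are hypothetical, and only the four attribute names and the 10% threshold come from the text above.

```python
# Attribute IDs and names as listed in the comment above.
CRITICAL_ATTRIBUTES = {
    179: "Used_Rsvd_Blk_Cnt_Tot",
    181: "Program_Fail_Cnt_Total",
    182: "Erase_Fail_Count_Total",
    183: "Runtime_Bad_Block",
}

# "below the 10% threshold" in the quoted Samsung document.
REPLACE_THRESHOLD = 10


def attributes_below_threshold(normalized):
    """Return names of critical attributes whose normalized value is below 10.

    `normalized` maps attribute ID -> normalized value (hypothetical input
    format). Attributes missing from the dict are skipped.
    """
    return [
        name
        for attr_id, name in CRITICAL_ATTRIBUTES.items()
        if attr_id in normalized and normalized[attr_id] < REPLACE_THRESHOLD
    ]


# Made-up sample values: only attribute 182 has dropped below the threshold.
sample = {179: 100, 181: 100, 182: 9, 183: 100}
print(attributes_below_threshold(sample))  # ['Erase_Fail_Count_Total']
```

Note that this check looks at normalized values, whereas the raw list discussed here compares raw values.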
from check_smart.
@pschonmann https://raw.githubusercontent.com/Napsty/check_smart/6.11.1/check_smart.pl now contains Erase_Fail_Count_Total in the default raw list.
This will be released in the next version, 6.12.0.
Thanks for reporting this. In #36 I tried to determine which attributes should be added to the raw (check) list by default.
It seems that this attribute 177 Wear_Leveling_Count is not used by all SSD models. I can see this attribute on my Samsung (Samsung SSD 850 EVO 500GB) SSDs, but not on SanDisk or Western Digital SSDs.
Now the big question is whether this Wear_Leveling_Count attribute is really a strong/important indicator of pending drive failure. Do you have any official Samsung documentation at hand?
(A good comparison is the Total_Bad_Block attribute, which is used by some SSD models. The name itself sounds alarming yet the values can vary a lot, even for brand new drives, and they don't really show a pending failure).
Also in the superuser link you posted, someone mentions:
All of your drives are at between 95 and 100, and will eventually drop to 0.
I'm not sure that this is correct. The attribute counters shown by smartctl usually start at 0 and increase to 100. This can be seen in Crucial MX SSDs with the Percent_Lifetime_Remain attributes (see https://www.claudiokuenzler.com/blog/1077/when-is-solid-state-drive-ssd-dead-analysis-crucial-mx500-1tb for a detailed analysis). Although the name indicates "remain", the counter actually starts at 0. This could be the same case for the Wear_Leveling_Count attribute (TBV!).
In my own Samsung SSDs, I can see the following values:
ckadm@mintp ~ $ sudo smartctl -a /dev/sda | grep "Wear_Leveling_Count"
177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 19
ckadm@mintp ~ $ sudo smartctl -a /dev/sdb | grep "Wear_Leveling_Count"
177 Wear_Leveling_Count 0x0013 098 098 000 Pre-fail Always - 22
I personally interpret this as 19% and 22% - which would still be largely OK if 100% is the assumed MAX value.
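To make it easier to see which column is which in the smartctl lines above, here is a minimal parsing sketch. It assumes the standard attribute table layout of `smartctl -A` (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, RAW_VALUE); the names `SmartAttribute` and `parse_attribute_line` are made up for this example.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # normalized WORST column
    thresh: int  # THRESH column
    raw: str     # RAW_VALUE column, kept as text (it can contain spaces)


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one attribute row, assuming the standard 10-column layout."""
    # Split into at most 10 fields so a raw value with spaces stays intact.
    fields = line.split(None, 9)
    if len(fields) < 10:
        raise ValueError(f"unexpected attribute line: {line!r}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=fields[9].strip(),
    )


attr = parse_attribute_line(
    "177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 19"
)
print(attr.name, "normalized:", attr.value, "raw:", attr.raw)
# Wear_Leveling_Count normalized: 99 raw: 19
```

For the first drive this gives a normalized value of 99 and a raw value of 19, so the 19 discussed here is the raw column.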
Now the big question is the following: Can we find proof somewhere that Wear_Leveling_Count really is an important indicator of a pending failure? If yes, from which value on should it be considered CRITICAL? Above 90? I see that your drive has a value of 2981, whatever this means.
Until this is discussed and solved, you can use the following workaround (append the raw list):
$ ./check_smart.pl -r "Current_Pending_Sector,Reallocated_Sector_Ct,Program_Fail_Cnt_Total,Uncorrectable_Error_Cnt,Offline_Uncorrectable,Runtime_Bad_Block,Reported_Uncorrect,Reallocated_Event_Count,Wear_Leveling_Count" -d /dev/sda -i ata
WARNING: Drive Samsung SSD 850 EVO 500GB S/N XXX: Wear_Leveling_Count is non-zero (19), |Reallocated_Sector_Ct=0 Power_On_Hours=19456 Power_Cycle_Count=435 Wear_Leveling_Count=19 Used_Rsvd_Blk_Cnt_Tot=0 Program_Fail_Cnt_Total=0 Erase_Fail_Count_Total=0 Runtime_Bad_Block=0 Uncorrectable_Error_Cnt=0 Airflow_Temperature_Cel=34 ECC_Error_Rate=0 CRC_Error_Count=0 POR_Recovery_Count=8 Total_LBAs_Written=24168613381
I found some info
https://web.archive.org/web/20150310051031/http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/whitepaper/whitepaper07.html
This attribute represents the number of media program and erase operations (the number of times a block has been erased). This value is directly related to the lifetime of the SSD. The raw value of this attribute shows the total count of P/E Cycles.
SRC: https://newbedev.com/how-to-check-the-life-left-in-ssd-or-the-medium-s-wear-level
OK, now I run:
check_smart.pl -g '/dev/sd[a-z] /dev/sd[abc][a-z]' -i 'auto' -E Airflow_Temperature_Cel -w 'Reallocated_Sector_Ct=15,Current_Pending_Sector=100,Reallocated_Event_Count=100,Runtime_Bad_Block=100,Uncorrectable_Error_Cnt=100,Wear_Leveling_Count=300,Erase_Fail_Count_Total=1' --debug
But in the debug output I see:
(debug) Erase_Fail_Count_Total not in raw check list (raw value: 0)
Is that value monitored? I have no disk with a value > 0 to test (unfortunately :) )
EDIT:
Oh, I had the old version 6.9.0. I updated and it seems OK now.
Thanks.
And would it be possible to monitor the normalized values of Wear_Leveling_Count? The normalized value decrements from 100 to 0. It would be good to be informed when the last 10% is reached and replacing the disk is recommended.
Unfortunately we cannot. We can only read the raw values from smartctl. Unless you know how to?