disk_failure_prediction

This repository contains the code for the paper Making Disk Failure Predictions SMARTer! (18th USENIX Conference on File and Storage Technologies, FAST '20) by Sidi Lu, Bing Luo, Tirthak Patel, Yongtao Yao, Devesh Tiwari, and Weisong Shi. [pdf] [project page]

This study covers disk and server data measured and collected at a large data center. Overall, the dataset spans 64 data center sites, 10,000 server racks, and 380,000 hard disks over roughly 70 days. This corresponds to roughly 2.6 million device hours. During this period, the data center housed more than two million hard disks, but not all of them are included in our study because we focus only on disks that have logged data in all three aspects: SMART, performance, and location.

Figure 1: SMART attributes of healthy vs. failed disks prior to disk failures.

Figure 2: Performance metrics of healthy vs. failed disks prior to disk failures.

We conducted a field study of HDDs based on a large-scale dataset collected from a leading e-commerce production data center, including SMART attributes, performance metrics, and location markers. We found that performance metrics are good indicators of disk failures and that location markers can improve the accuracy of disk failure prediction. Lastly, we trained machine learning models, including neural network models, to predict disk failures with 0.95 F-measure and 0.95 MCC for a 10-day prediction horizon.

Figure 3: Model prediction quality with different groups of SMART (S), performance (P), and location (L) features.

Figure 4: Model false positive rate (FPR = FP/(FP + TN)) and false negative rate (FNR = FN/(TP + FN)).
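The quality metrics named above (FPR, FNR, F-measure, and MCC) can all be derived from the four confusion-matrix counts. The following is a minimal sketch, not the repository's evaluation code: the function name `confusion_metrics` and the example counts are assumptions made for illustration, and the counts are not results from the paper. A "positive" is assumed to mean a disk predicted to fail.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute FPR, FNR, F-measure, and MCC from binary confusion-matrix counts.

    Hypothetical helper for illustration; a zero denominator yields 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    fpr = ratio(fp, fp + tn)          # FPR = FP / (FP + TN)
    fnr = ratio(fn, tp + fn)          # FNR = FN / (TP + FN)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    # F-measure (F1): harmonic mean of precision and recall.
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {"fpr": fpr, "fnr": fnr, "f_measure": f_measure, "mcc": mcc}


# Made-up counts for illustration only (not taken from the paper).
metrics = confusion_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four counts, so it stays informative when healthy disks vastly outnumber failed ones, a setting in which accuracy alone can be misleading.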

Training Model

Citation

@inproceedings {246172,
author = {Sidi Lu and Bing Luo and Tirthak Patel and Yongtao Yao and Devesh Tiwari and Weisong Shi},
title = {Making Disk Failure Predictions SMARTer!},
booktitle = {18th {USENIX} Conference on File and Storage Technologies ({FAST} 20)},
year = {2020},
isbn = {978-1-939133-12-0},
address = {Santa Clara, CA},
pages = {151--167},
url = {https://www.usenix.org/conference/fast20/presentation/lu},
publisher = {{USENIX} Association},
month = feb,
}

Contact

Sidi Lu, Wayne State University
