md-clustering's Introduction

Performance Grading of Clustering Algorithms on Molecular Dynamics Simulations of Proteins

Research Poster

[Image: research poster]

Abstract

Computational studies continue to play an important role in modeling and understanding protein dynamics in biology. Molecular Dynamics (MD) can model the molecular structures of proteins and simulate their motion over nanosecond-to-microsecond time scales using classical mechanics. MD simulations can reveal insights into the folding process that are beyond present laboratory means. Once MD simulation has generated trajectories of a protein's motion, machine learning algorithms such as k-means, spectral, and subspace clustering help identify the structures and processes that are integral to folding, a task that is difficult to do by eye. We aimed to evaluate the performance of these algorithms, with a special focus on the recent hybrid spectral/subspace method, by comparing their normalized mutual information (NMI) scores over cumulative simulation time. Principal Component Analysis (PCA) was performed to visualize the trajectories and their clustering results. The theory of protein dynamics suggests that, given an infinite amount of time, the sampling space should become increasingly mixed. Algorithms that can still identify distinct structures under these conditions are better suited for clustering MD data. We found that the hybrid spectral/subspace method delivered the best performance overall and provided the most conservative estimate of sampling adequacy.
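To make the NMI comparison concrete, here is a minimal sketch of scoring two cluster labelings over growing prefixes of a trajectory. It is an illustration under stated assumptions, not the project's code: the abstract does not say which labelings were compared or how the NMI was normalized, so the function names (`nmi`, `cumulative_nmi`), the arithmetic-mean normalization, and the toy label sequences are all hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings.

    Assumption: mutual information is divided by the arithmetic mean of
    the two label entropies. The source does not state its normalization.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # I(A;B) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = 0.5 * (entropy(labels_a) + entropy(labels_b))
    # Two single-cluster labelings agree trivially; report 1.0 by convention.
    return 1.0 if denom == 0 else mutual_info / denom


def cumulative_nmi(labels_a, labels_b, step):
    """NMI on growing prefixes [0:step], [0:2*step], ... of two labelings."""
    return [
        (end, nmi(labels_a[:end], labels_b[:end]))
        for end in range(step, len(labels_a) + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical per-frame cluster labels from two methods (toy data).
    method_a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    method_b = [1, 1, 1, 0, 0, 0, 2, 2, 2, 0]
    for end, score in cumulative_nmi(method_a, method_b, step=5):
        print(f"frames 0..{end - 1}: NMI = {score:.3f}")
```

NMI ignores how clusters are named, so two labelings that differ only by a permutation of cluster IDs score 1, while statistically independent labelings score 0. Evaluating on prefixes rather than disjoint windows mirrors the "cumulative simulation time" framing: each score reflects all frames sampled up to that point.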

Introduction and Background

MD simulations are a common tool in computational biology for simulating the dynamic behavior of biomolecular structures such as proteins. Classical mechanics models the forces acting on a protein at the molecular level, and computers simulate the effects, allowing exploration of the protein’s energy landscape and, consequently, its conformational states. Of special interest is the protein’s native conformation, which determines its function. This matters in many areas, including the design of safe and effective drugs, vaccine production, the study of neurodegenerative diseases, and other biomedical research. However, while MD simulation is an effective way to study protein folding, it is computationally expensive. For folding that takes longer than milliseconds, adequate exploration of the possible conformational states can take an infeasible amount of time (months or years). Proteins can also get stuck negotiating energy barriers on the way to their native conformations. Consequently, automated methods that can accurately organize protein conformations are especially useful to computational biologists studying proteins and protein folding. We investigate four clustering algorithms and assess their performance through comparative analysis.

Discussion

For computational biologists, the more accurate the algorithm, the better. The hybrid spectral/subspace method performed well, achieving the highest NMI in the most difficult space.

  • scikit-learn’s ordinary k-means algorithm delivered very respectable performance and often outperformed spectral and subspace clustering. However, manual hyperparameter optimization may have hidden the clustering power of the spectral and subspace algorithms.

  • Different force fields for the same protein may affect the robustness of the clustering algorithms. Further inquiry is required.

md-clustering's People

Contributors

ekim22, jlphillipsphd
