

Mutation Stability Data

Data for mutation effects on protein stability.
In molecular biology, we call an amino acid in a protein sequence a residue.

V1 data

These data were originally collected for my degree thesis research, which aimed to build a model for ∆∆G prediction from a single sequence, because AlphaFold2 was not available at the time. I have since given up on that project, changed my thesis from pure dry-lab work to a "real" protein design project, and developed DDGScan along the way.

For train.csv and test.csv:
PDB: PDB ID code; get the wildtype 3D structure from https://files.rcsb.org/download/{pdb_id}.pdb.
wildtype: Wildtype amino acid.
position: Residue number of mutation in sequence.
mutation: Changed amino acid in mutant_seq.
ddG: Experimentally measured ∆∆G(folding), positive means more stable.
sequence: Wildtype protein sequence.
mutant_seq: Mutated protein sequence.
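
The URL template in the PDB entry above can be wrapped in a small helper. This is only a minimal sketch: the names `pdb_download_url` and `fetch_pdb` are hypothetical, and the check for a 4-character alphanumeric ID is an assumption about the IDs in these files, not something the data guarantees.

```python
from urllib.request import urlopen

# URL template copied from the column description above.
PDB_URL_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Build the download URL for one PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    # Assumption: classic 4-character alphanumeric PDB IDs.
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return PDB_URL_TEMPLATE.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, timeout: float = 30.0) -> str:
    """Download the structure file as text (requires network access)."""
    with urlopen(pdb_download_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The same helper applies to the PDB column of tm.csv and the pdb column of the V2 data, since all three use the same URL template.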

For tm.csv:
PDB: PDB ID code; get the wildtype 3D structure from https://files.rcsb.org/download/{pdb_id}.pdb.
WT: Wildtype amino acid.
position: Residue number of mutation in sequence.
MUT: Changed amino acid in mutant_seq.
dTm: Experimentally measured ∆T(melting), positive means more stable.
sequence: Wildtype protein sequence.
mutant_seq: Mutated protein sequence.

V2 data

The V1 data was first released for the Kaggle competition novozymes-enzyme-stability-prediction. I have cleaned and updated the data for this version and hope it will be helpful.

pdb: PDB ID code; get the wildtype 3D structure from https://files.rcsb.org/download/{pdb_id}.pdb.
wildtype: Wildtype amino acid.
pdb_resseq: Author residue sequence number (resSeq), the 6th column in a PDB file. It does not always start from 1!
seq_index: 0-based index in the sequence string where the single mutation occurs.
mutation: Changed amino acid in mut_seq.
wt_seq: Wildtype protein sequence.
mut_seq: Mutated protein sequence.
ddG: Experimentally measured ∆∆G(folding), positive means more stable.
group: Group label for K-fold cross-validation. test.csv does not have this column.
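
Because seq_index is 0-based, each V2 row can be checked by applying the mutation to wt_seq and comparing the result with mut_seq. The sketch below uses only the standard library. All function names are hypothetical, the file path passed to `load_rows` is an assumption, and one-letter amino acid codes are assumed. `split_by_group` shows one possible way to turn the group column into cross-validation folds, not necessarily how the groups were intended to be used.

```python
import csv
from collections import defaultdict


def apply_mutation(wt_seq: str, seq_index: int, wildtype: str, mutation: str) -> str:
    """Return wt_seq with the residue at 0-based seq_index replaced by mutation."""
    if not 0 <= seq_index < len(wt_seq):
        raise IndexError(f"seq_index {seq_index} outside sequence of length {len(wt_seq)}")
    if wt_seq[seq_index] != wildtype:
        raise ValueError(
            f"expected {wildtype!r} at index {seq_index}, found {wt_seq[seq_index]!r}"
        )
    return wt_seq[:seq_index] + mutation + wt_seq[seq_index + 1:]


def row_is_consistent(row: dict) -> bool:
    """True if mut_seq equals wt_seq with the row's single mutation applied."""
    try:
        expected = apply_mutation(
            row["wt_seq"], int(row["seq_index"]), row["wildtype"], row["mutation"]
        )
    except (IndexError, ValueError):
        return False
    return expected == row["mut_seq"]


def split_by_group(rows: list) -> dict:
    """Map each value of the group column to its rows (one fold per group)."""
    folds = defaultdict(list)
    for row in rows:
        folds[row["group"]].append(row)
    return dict(folds)


def load_rows(path: str) -> list:
    """Read a CSV into a list of dicts; the path is supplied by the caller."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

A typical use would be to load the training file, drop any rows for which `row_is_consistent` returns False, and then hold out one group at a time as the validation fold.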

Sorry for the inconsistent naming of columns. Time has changed me.
The data stored outside v1 and v2 is kept for the Kaggle notebooks that already use it.
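
One way to work around the inconsistent column names is to rename the V1 columns to their V2 equivalents before combining files. The mappings below are a sketch derived from the column lists above; the constant and function names are hypothetical. `position` is deliberately left unmapped, because the descriptions do not say whether it corresponds to pdb_resseq or seq_index.

```python
# Rename maps from the V1 column names to the V2 names.
# `position` is not mapped: its numbering convention is not stated,
# so equating it with pdb_resseq or seq_index would be a guess.
V1_DDG_TO_V2 = {
    "PDB": "pdb",
    "sequence": "wt_seq",
    "mutant_seq": "mut_seq",
}
V1_TM_TO_V2 = {
    "PDB": "pdb",
    "WT": "wildtype",
    "MUT": "mutation",
    "sequence": "wt_seq",
    "mutant_seq": "mut_seq",
}


def rename_columns(row: dict, mapping: dict) -> dict:
    """Return a copy of row with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}
```

Applied to rows read with `csv.DictReader`, `V1_DDG_TO_V2` would cover train.csv and test.csv from V1, and `V1_TM_TO_V2` would cover tm.csv.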

To predict mutation ∆∆G

A tool designed for enzyme stability prediction: DDGScan
