cheminformaticsbook's Introduction

These files are meant to accompany "What are our models really telling us? A practical tutorial on avoiding common mistakes when building predictive models" by W. Patrick Walters.

Please address any questions or corrections to [email protected]

compare_regression.txt - experimental and predicted LogS for a set of compounds using two different predictive models
huuskonen.rdkit - RDKit descriptors for the Huuskonen dataset
huuskonen.sol - experimental solubilities for the Huuskonen dataset
huuskonen_test.rdkit - RDKit descriptors for a random test set created from the Huuskonen dataset
huuskonen_test.smi - SMILES for a random test set built from the Huuskonen dataset
huuskonen_test.sol - experimental solubilities for a random test set created from the Huuskonen dataset
huuskonen_test.txt - similarities between the Huuskonen test set and training set
huuskonen_train.rdkit - RDKit descriptors for a random training set created from the Huuskonen dataset
huuskonen_train.smi - SMILES for a random training set built from the Huuskonen dataset
huuskonen_train.sol - experimental solubilities for a random training set created from the Huuskonen dataset
install_libraries.R - a small R script to install the libraries required by the chapter
jcim.rdkit - RDKit descriptors for the JCIM dataset
jcim.smi - SMILES for the JCIM dataset
jcim.sol - experimental solubilities for the JCIM dataset
listing_1.R - load data and display box plots to compare distributions
listing_2.py - calculate molecular descriptors using the RDKit library
listing_3.R - train and test a random forest model based on the Huuskonen dataset
listing_4.R - train and test a random forest model based on a subset of the Huuskonen dataset
listing_5.R - add simulated error to experimental data to examine the impact of error on correlations
listing_6.py - calculate similarity between pairs of SMILES files and report
listing_7.R - the most similar training set molecule for each test set molecule
listing_8.R - predict activities for 3 test sets with a model built on a training set
listing_9.R - calculate errors for regression and plot Pearson r with associated error bars
pubchem.rdkit - RDKit descriptors for the PubChem dataset
pubchem.smi - SMILES for the PubChem dataset
pubchem.sol - experimental solubilities for the PubChem dataset
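
The idea described for listing_5.R, adding simulated error to experimental data and watching what happens to the correlation, can be illustrated with a minimal Python sketch. This is not the chapter's code: the function names (`pearson_r`, `add_simulated_error`, `correlation_vs_error`) are hypothetical, the error is assumed to be Gaussian with mean zero, and the "experimental" values are synthetic numbers rather than data from the files listed above.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def add_simulated_error(values, error_sd, rng):
    """Return a copy of values with Gaussian noise (mean 0, sd error_sd) added.

    Assumption: the error is modeled as normally distributed.
    """
    return [v + rng.gauss(0.0, error_sd) for v in values]


def correlation_vs_error(values, error_sds, seed=0):
    """Map each error sd to the Pearson r between values and a noisy copy."""
    rng = random.Random(seed)
    return {
        sd: pearson_r(values, add_simulated_error(values, sd, rng))
        for sd in error_sds
    }


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic stand-in for experimental LogS values; NOT read from the repository files.
    synthetic_logs = [data_rng.gauss(-3.0, 2.0) for _ in range(1000)]
    results = correlation_vs_error(synthetic_logs, [0.0, 0.3, 0.6, 1.0])
    for sd, r in results.items():
        print(f"error sd = {sd:.1f}  Pearson r = {r:.3f}")
```

Under these assumptions the correlation between the original values and their noisy copies falls as the simulated error grows, even though the underlying values are the same. In other words, experimental error alone limits the correlation any model could be expected to reach.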
