patwalters/cheminformaticsbook

These files are meant to accompany "What are our models really telling us? A practical tutorial on avoiding common mistakes when building predictive models" by W. Patrick Walters.

Please address any questions or corrections to [email protected]

compare_regression.txt - experimental LogS values for a set of compounds, together with the values predicted by two different models
huuskonen.rdkit - RDKit descriptors for the Huuskonen dataset
huuskonen.sol - experimental solubilities for the Huuskonen dataset
huuskonen_test.rdkit - RDKit descriptors for a random test set created from the Huuskonen dataset
huuskonen_test.smi - SMILES for a random test set built from the Huuskonen dataset
huuskonen_test.sol - experimental solubilities for a random test set created from the Huuskonen dataset
huuskonen_test.txt - similarities between the Huuskonen test set and training set
huuskonen_train.rdkit - RDKit descriptors for a random training set created from the Huuskonen dataset
huuskonen_train.smi - SMILES for a random training set built from the Huuskonen dataset
huuskonen_train.sol - experimental solubilities for a random training set created from the Huuskonen dataset
install_libraries.R - a small R script to install the libraries required by the chapter
jcim.rdkit - RDKit descriptors for the JCIM dataset
jcim.smi - SMILES for the JCIM dataset
jcim.sol - experimental solubilities for the JCIM dataset
listing_1.R - load data and display box plots to compare distributions
listing_2.py - calculate molecular descriptors using the RDKit library
listing_3.R - train and test a random forest model based on the Huuskonen dataset
listing_4.R - train and test a random forest model based on a subset of the Huuskonen dataset
listing_5.R - add simulated error to experimental data to examine the impact of error on correlations
listing_6.py - calculate the similarity between molecules in pairs of SMILES files and report the results
listing_7.R - the most similar training set molecule for each test set molecule
listing_8.R - use a model built from a training set to predict activities for 3 test sets
listing_9.R - calculate regression errors and plot Pearson r with associated error bars
pubchem.rdkit - RDKit descriptors for the PubChem dataset
pubchem.smi - SMILES for the PubChem dataset
pubchem.sol - experimental solubilities for the PubChem dataset
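compare_regression.txt holds experimental and predicted LogS for two models, and listing_9.R calculates regression errors. The sketch below is a minimal, stdlib-only Python illustration of two common error measures, RMSE and MAE, for comparing two models on the same compounds. It is not a translation of listing_9.R, and the column layout of compare_regression.txt is not assumed; the numbers in the usage note are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between experimental and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical experimental LogS and predictions from two models.
    experimental = [-2.0, -3.0, -4.0, -5.0]
    model_a = [-2.5, -3.0, -3.5, -5.0]
    model_b = [-1.0, -3.0, -5.0, -5.0]
    for name, pred in (("model A", model_a), ("model B", model_b)):
        print(name, "RMSE", round(rmse(experimental, pred), 3), "MAE", mae(experimental, pred))
```

With these made-up values, model A has an RMSE of about 0.354 and an MAE of 0.25, while model B has an RMSE of about 0.707 and an MAE of 0.5. RMSE penalizes large errors more heavily than MAE, so reporting both can help when two models have similar correlations.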
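Several of the files above are random training and test sets created from the Huuskonen dataset. The following is a minimal Python sketch of a reproducible random split. It assumes each record pairs a SMILES string with its solubility, so shuffling whole records keeps the pairs together. The function name `random_split`, the 20% default test fraction, and the seed are illustrative assumptions, not the settings used to build the huuskonen_train/huuskonen_test files.

```python
import random


def random_split(records, test_fraction=0.2, seed=42):
    """Randomly split records into (train, test) lists.

    Hypothetical helper: whole records are shuffled, so each SMILES stays
    paired with its own solubility. A fixed seed makes the split reproducible.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Made-up (SMILES, LogS) pairs for illustration only.
    data = [("CCO", 1.1), ("c1ccccc1", -1.6), ("CCCCCC", -3.8), ("CC(=O)O", 1.2), ("CCCl", -1.1)]
    train, test = random_split(data, test_fraction=0.4, seed=1)
    print(len(train), "training records,", len(test), "test records")
```

A purely random split like this tends to put close analogs of training compounds into the test set, which is one reason the chapter's files also include training/test similarity information.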
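listing_5.R adds simulated error to experimental data to examine how that error affects correlations. The Python sketch below illustrates the same idea under stated assumptions; it is not a port of the R listing. It treats the input values as "true" LogS, adds Gaussian noise with standard deviation `sigma` (an assumed error model), and reports the mean Pearson r between the original and noisy values. This approximates the best correlation a perfect model could achieve when the measurements themselves carry that much error. The function names and the synthetic LogS range are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def add_noise(values, sigma, rng):
    """Return a copy of values with Gaussian noise (mean 0, sd sigma) added."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def simulate_max_r(values, sigma, trials=100, seed=42):
    """Mean Pearson r between values and noisy copies of them over many trials.

    The noisy copy plays the role of a perfect model whose agreement with the
    data is limited only by experimental error of size sigma (assumed Gaussian).
    """
    rng = random.Random(seed)
    rs = [pearson_r(values, add_noise(values, sigma, rng)) for _ in range(trials)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    # Synthetic LogS values; a real run would use the experimental data instead.
    rng = random.Random(0)
    logs = [rng.uniform(-8.0, 1.0) for _ in range(500)]
    for sigma in (0.0, 0.3, 0.6, 1.0):
        print(f"sigma={sigma:.1f}  mean r={simulate_max_r(logs, sigma):.3f}")
```

As `sigma` grows, the mean r falls below 1, which shows that experimental error alone places a ceiling on the correlation any model can reach. The narrower the range of the data relative to `sigma`, the lower that ceiling.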
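listing_6.py and listing_7.R deal with the similarity between test and training molecules and the most similar training set molecule for each test set molecule. The sketch below shows that nearest-neighbor lookup using Tanimoto similarity. To stay runnable without RDKit, it assumes each fingerprint is a Python set of on-bit indices. In RDKit, you would typically compute bit-vector fingerprints and use `DataStructs.BulkTanimotoSimilarity` instead. The helper names are hypothetical, and this is not the code in listing_6.py.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices.

    Two empty fingerprints are defined here as having similarity 0.0.
    """
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(test_fps, train_fps):
    """For each test fingerprint, return (index of most similar training fp, similarity).

    If train_fps is empty, every result is (-1, -1.0).
    """
    results = []
    for fp in test_fps:
        best_idx, best_sim = -1, -1.0
        for i, train_fp in enumerate(train_fps):
            sim = tanimoto(fp, train_fp)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        results.append((best_idx, best_sim))
    return results


if __name__ == "__main__":
    # Toy fingerprints; real ones would come from a cheminformatics toolkit.
    train = [{1, 2, 3, 4}, {5, 6, 7}, {1, 2, 8}]
    test = [{1, 2, 3}, {5, 6}]
    for idx, (nn, sim) in enumerate(nearest_neighbors(test, train)):
        print(f"test {idx}: nearest training molecule {nn}, similarity {sim:.2f}")
```

For the toy data, test molecule 0 is closest to training molecule 0 (similarity 0.75), and test molecule 1 is closest to training molecule 1 (about 0.67). Plotting the prediction error against this nearest-neighbor similarity is a simple way to check whether a model does worse on compounds unlike anything in its training set.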
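listing_9.R plots Pearson r with associated error bars. One standard way to put an interval around r is the Fisher z-transformation. In this approach, atanh(r) is treated as approximately normal with standard error 1/sqrt(n - 3), the interval is built on that scale, and tanh maps it back. The Python sketch below implements that approach. It is an assumption that it matches how listing_9.R computes its error bars, since bootstrapping is another common choice. The function name `pearson_r_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r computed from n pairs.

    Uses the Fisher z-transformation: z = atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    for n in (20, 50, 500):
        lo, hi = pearson_r_ci(0.7, n)
        print(f"r=0.70, n={n}: 95% CI {lo:.2f} to {hi:.2f}")
```

For r = 0.7 from 50 compounds, the 95% interval runs from roughly 0.52 to 0.82. With small test sets, two models can therefore report quite different r values even when their intervals overlap heavily, so the difference may not be meaningful.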
