Code Monkey home page Code Monkey logo

andresilvapimentel / endocrine-disruption-explainer Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 9.72 MB

Endocrine Disruption Explainer is a code to generate structural alerts of endocrine disruption of chemcial compounds using Local Interpretable Model-Agnostic Explanations (LIME) of machine learning models from TOX-21, EDC, and EDKB-FDA datasets.

License: MIT License

Jupyter Notebook 100.00%
endocrine-disruptor explainable-artificial-intelligence interpretable-machine-learning machine-learning structural-alerts toxic-alerts toxicology

endocrine-disruption-explainer's Introduction

endocrine-disruption-explainer

drawing

DOI

Endocrine Disruption Explainer is a code to generate structural alerts using Local Interpretable Model-Agnostic Explanations (LIME) of machine learning models from the TOX21, EDC and EDKB-FDA datasets.

The Endocrine Disruption Explainer framework is highly versatile (coded in Google Colab), with options that can be further developed and optimized by the users: it can accept any user-defined datasets (or datasets available in any repository), can use different fingerprints, data splitters, cross-validation methods, and any classification model from DeepChem or scikitlearn library.

There are ten codes that build the model using the curated and resampled TOX-21 dataset (5-fold cross-validated) with Rain Forest classifier and analyze the curated EDC or EDKB-FDA datasets:

  1. tox21_AR_Resampling_EDC.ipynb explains the substructures important to the EDC dataset;
  2. tox21_AR_Resampling_EDKB.ipynb explains the substructures important to the EDK-FDA dataset;
  3. tox21_ER_Resampling_EDC.ipynb explains the substructures important to the EDC dataset;
  4. tox21_ER_Resampling_EDKB.ipynb explains the substructures important to the EDKB-FDA dataset;
  5. tox21_AhR_Resampling_EDC.ipynb explains the substructures important to the EDC dataset;
  6. tox21_AhR_Resampling_EDKB.ipynb explains the substructures important to the EDKB-FDA dataset;
  7. tox21_ARO_Resampling_EDC.ipynb explains the substructures important to the EDC dataset;
  8. tox21_ARO_Resampling_EDKB.ipynb explains the substructures important to the EDKB-FDA dataset;
  9. tox21_PPAR_Resampling_EDC.ipynb explains the substructures important to the EDC dataset;
  10. tox21_PPAR_Resampling_EDKB.ipynb explains the substructures important to the EDKB-FDA dataset.

The original and curated TOX-21, EDC and EDKB datasets are provided here. DeepChem tools may also be used to upload any dataset in MoleculeNet or user-defined dataset. However, the TOX-21, EDC and EDKB-FDA datasets were curated removing duplicate and triplicate compounds, unifying compounds with two lables, and fixing smiles with RDKit issues. The TOX-21, EDC, and EDKB-FDA datasets were cross-validated to get the most robust models.

Endocrine Disruption Explainer was used with the Random Forest classifier, but any scikitlearn or DeepChem classifier may be used with little modification in the source code. And, these models were analyzed with different metrics (precision, accuracy, recall, MCC, and F1 scores) and with the confusion matrix. The models were optimized using hyperparameterization approach to get the best hyper parameters from each model and output the best results.

Installation instructions

Endocrine Disruption Explainer is 100% compatible with Google Colab platform developed in Microsoft Windows using Python version 3.10.

Endocrine Disruption Explainer has the following dependencies: Lime, RDkit, DeepChem, Pandas, Matplotlib, sklearn, mols2grid, IPython and XlsxWriter.

Documentation

The complete documentation about how to run the Endocrine Disruption Explainer protocol and several tutorials is being developed.

Get help

The Endocrine Disruption Explainer is being actively developed and some issues may arise or you may need extra help to run Endocrine Disruption Explainer. In those cases, there are two main ways to get help:

  1. Open a new issue in this repository Or
  2. write an email to André Silva Pimentel ([email protected]) (I will do my best to answer your questions as soon as possible).

License

Endocrine Disruption Explainer is available under MIT License. See license document for more details. URL and DOI: https://github.com/andresilvapimentel/endocrine-disruptor-explainer (https://doi.org/10.5281/zenodo.10963050)

Contributors

This code was written under collaboration: Lucca Caiaffa Santos Rosa (Undergraduate student), Mariam Sarhan (Undergraduate student), and Andre Silva Pimentel (supervisor).

endocrine-disruption-explainer's People

Contributors

andresilvapimentel avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.