AGD: Active learning based generative design for discovery of wide band gap materials

This software package implements AGD, our active-learning-based framework for materials design. This is the official Python repository.

Machine Learning and Evolution Laboratory
Department of Computer Science and Engineering
University of South Carolina

How to cite:
@article{xin2021active, title={Active learning based generative design for the discovery of wide bandgap materials}, author={Xin, Rui and Siriwardane, Edirisuriya and Song, Yuqi and Zhao, Yong and Louis, Steph-Yves and Nasiri, Alireza and Hu, Jianjun}, journal={arXiv preprint arXiv:2103.00608}, year={2021} }

Introduction

The package provides 3 major functions:

  • Perform active-learning-based sampling over the whole design latent space (based on Bayesian Optimization).
  • Train a screening model and evaluate its performance (based on Roost).
  • Generate CIF files for material candidates by element substitution (based on ELMD).

The following paper describes the details of our framework: Active learning based generative design for discovery of wide band gap materials

Installation

Install any of the following packages that are not already installed:

  • Bayesian Optimization (tested on 1.2.0)
  • tensorflow (tested on 2.2.0)
  • GATGNN (see the GATGNN documentation)
  • Roost (see the Roost documentation)
  • Numpy (tested on 1.18.5)
  • Pandas (tested on 1.1.0)
  • Scikit-learn (tested on 0.21.3)
  • Pymatgen (tested on 2020.3.13)

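Bayesian Optimization, Numpy, Pandas, Scikit-learn, and Pymatgen can be installed with the commands below (no Pytorch command is listed here; install it following its own instructions):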

conda install -c conda-forge bayesian-optimization
pip install numpy
pip install pandas
pip install scikit-learn
pip install pymatgen
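
To see which versions actually ended up in your environment, a small check like the one below may help. It is only a sketch: the distribution names are the PyPI names we assume for the packages listed above, the TESTED table simply repeats the versions from that list, and the helper names are hypothetical (they are not part of this repository).

```python
from importlib import metadata

# Versions this README reports as tested.
# Keys are ASSUMED PyPI distribution names, not taken from the repository.
TESTED = {
    "bayesian-optimization": "1.2.0",
    "tensorflow": "2.2.0",
    "numpy": "1.18.5",
    "pandas": "1.1.0",
    "scikit-learn": "0.21.3",
    "pymatgen": "2020.3.13",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(tested=TESTED):
    """Build one human-readable status line per package."""
    lines = []
    for name, have in installed_versions(tested).items():
        want = tested[name]
        if have is None:
            status = "missing"
        elif have == want:
            status = "matches tested version"
        else:
            status = f"installed {have}, tested on {want}"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A version mismatch is not necessarily a problem; the list above only records what the package was tested on.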

Dataset

  1. Download the compressed file of our dataset using this link
  2. Unzip its content (two .csv files and 5 pre-trained models).
  3. Move the .csv files into your AML_Roost directory, so that the data path expected by the scripts exists.
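
As a quick sanity check after step 3, you could list the .csv files that are now in the AML_Roost directory. This is a minimal sketch; the helper name find_csv_files is hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path


def find_csv_files(directory):
    """Return the sorted names of .csv files directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    return sorted(p.name for p in root.glob("*.csv"))


if __name__ == "__main__":
    # "AML_Roost" is the folder named in the steps above,
    # ASSUMED here to be relative to the current working directory.
    try:
        names = find_csv_files("AML_Roost")
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"Found {len(names)} csv file(s): {names}")
```

If the two .csv files from the archive are in place, both should appear in the printed list.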

Usage

Train the autoencoder and decoder for representation learning

Use ae.py, encoder.py and decoder.py in the AML_Roost folder to train the autoencoder and decoder for representation learning.

Generate target property material candidates

Once all the aforementioned requirements are satisfied, you can generate material candidates with the target property by running ALSearch.py in the terminal with the appropriate flags. At a minimum, use --budget to specify the active learning budget, --init to set the number of initial samples, and --kappa to control the balance between exploration and exploitation.

  • Example: start the active-learning process with a given budget, kappa, and number of initial samples.
python ALSearch.py --budget 50 --kappa 100 --init 300 --candidate_out_path path/you/prefer

The generated materials and their predicted properties are written automatically to the specified folder.
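
To give some intuition for --kappa, the sketch below uses the upper confidence bound (UCB) acquisition commonly used in Bayesian Optimization, which scores a candidate as mean + kappa * std. Whether ALSearch.py uses exactly this acquisition is an assumption on our part; the function names and the toy numbers are hypothetical and only serve as illustration.

```python
def ucb(mean, std, kappa):
    """Upper confidence bound: predicted value plus kappa times uncertainty."""
    return mean + kappa * std


def pick_next(candidates, kappa):
    """Pick the label of the candidate with the highest UCB score.

    `candidates` is a list of (label, predicted mean, predicted std) tuples.
    """
    return max(candidates, key=lambda c: ucb(c[1], c[2], kappa))[0]


# Toy candidates (made-up numbers): one with a high, confident prediction
# and one with a lower but very uncertain prediction.
candidates = [
    ("well-explored", 3.0, 0.1),
    ("uncertain", 2.0, 1.0),
]

print(pick_next(candidates, kappa=0.0))    # pure exploitation
print(pick_next(candidates, kappa=100.0))  # strong exploration
```

With kappa = 0 the score is just the predicted mean, so the confident candidate wins; with a large kappa the uncertainty term dominates and the uncertain candidate is sampled instead.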

Training a new screening model

After acquiring active-learning augmented data, you can train a screening model and evaluate its performance using the Roost package and the GAN-generated dataset. The 5 augmented datasets corresponding to Exp1, Exp2_BS, Exp2_AL, Exp3_BS, and Exp3_AL in the paper are in /root_path/roost/roost/examples/prepared_training_data/

The 5 pre-trained models in the figshare link correspond to Exp1, Exp2_BS, Exp2_AL, Exp3_BS, and Exp3_AL.

Under roost/roost/examples, you can train a model and evaluate its performance on a hold-out dataset:

python roost-predict.py --data-path /root_path/roost/roost/examples/prepared_training_data/Exp3_AL_1153.csv --train --evaluate --val-size 0.2  --epochs 200 --run-id 311
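
For readers unfamiliar with hold-out validation, the sketch below shows what a validation fraction of 0.2 means in practice. It is not how roost-predict.py splits the data internally (that is not described here); the function name, the seed, and the row count are hypothetical.

```python
import random


def holdout_split(n_rows, val_size, seed=0):
    """Return (train_indices, val_indices) for a random hold-out split."""
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_val = round(n_rows * val_size)       # rows reserved for validation
    return sorted(indices[n_val:]), sorted(indices[:n_val])


# Illustrative row count only, not the size of any dataset in this repository.
train_idx, val_idx = holdout_split(n_rows=1000, val_size=0.2)
print(len(train_idx), len(val_idx))
```

The model is fitted on the training rows only, and the held-out rows are used to estimate how well it generalizes.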

Evaluating the performance of a model trained on active-learning-augmented data

The independent test dataset is in the folder roost/roost/examples/prepared_training_data/. Under roost/roost/examples, run:

python roost-predict.py --test-path /root_path/roost/roost/examples/prepared_training_data/bd_test_only.csv --regression --evaluate --run-id 3
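
If you want to recompute regression metrics yourself from true and predicted values, a minimal sketch is shown below. Which metrics roost-predict.py reports is not stated here; MAE, RMSE and R2 are simply standard choices, and the function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R2 for two equally long sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values only; replace with targets and predictions from your own run.
print(regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```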

Screening recovery rate test

Input: the .csv file paths of the candidate materials screened out by the Oracle model and by the models trained with active-learning augmented data. Output: the recovery rate, measured as the percentage overlap between each model's candidates and the candidates screened out by the Oracle model. To test the recovery rate:

python screen_recover_rate.py --Oracle_candidates_filepath xxx --Exp2_BS_candidates_filepath xxx --Exp2_AL_candidates_filepath xxx --Exp3_BS_candidates_filepath xxx --Exp3_AL_candidates_filepath xxx
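
The recovery rate described above can be sketched in a few lines. This is an illustration of the overlap calculation, not the code of screen_recover_rate.py; the function names, the "formula" column name, and the toy formulas are assumptions.

```python
import csv


def read_candidates(path, column="formula"):
    """Read one column of a .csv file into a set.

    The column name "formula" is an ASSUMPTION; adjust it to your files.
    """
    with open(path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def recovery_rate(oracle_candidates, model_candidates):
    """Percentage of Oracle-screened candidates also found by the model."""
    oracle = set(oracle_candidates)
    if not oracle:
        raise ValueError("the Oracle candidate set is empty")
    overlap = oracle & set(model_candidates)
    return 100.0 * len(overlap) / len(oracle)


# Toy example with made-up candidate sets.
oracle = {"ZnO", "GaN", "SiC", "AlN"}
model = {"GaN", "AlN", "MgO"}
print(recovery_rate(oracle, model))  # 2 of the 4 Oracle candidates are recovered
```

Note that the denominator is the Oracle set, so candidates found only by the model (MgO in the toy example) do not change the rate.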
