
CHERRY-crispr version

CHERRY is a Python library for predicting the interactions between viral and prokaryotic genomes. CHERRY is based on a deep learning model, which consists of a graph convolutional encoder and a link prediction decoder.

In this program, we provide a simplified version of CHERRY that uses only the CRISPR information in CHERRY's graph for host prediction. This lite version also offers an easier way to predict links between bacterial contigs and phage contigs. A sketch of CHERRY-crispr is shown below:


Input (provided by user):
    1. bacterial contigs from your samples (FASTA files)
    2. phage contigs from your samples (FASTA file)

Output:
    The links (interactions) between bacteria and phages (CSV file)
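The principle behind CRISPR-based host prediction can be shown with a toy example: CRISPR spacers in a bacterial genome are derived from foreign DNA such as phages, so a phage contig that contains one of a bacterium's spacers is evidence of an interaction. The sketch below is illustrative only. It uses exact substring matching on both strands in place of the BLASTN alignment CHERRY-crispr performs, and `crispr_links` and its dictionary inputs are hypothetical, not part of the tool.

```python
from typing import Dict, List, Tuple

# Complement table for upper- and lower-case nucleotides (N maps to N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def crispr_links(spacers: Dict[str, List[str]],
                 phages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Link a phage to a bacterium when one of the bacterium's spacers occurs in it.

    spacers: bacterium ID -> CRISPR spacer sequences (hypothetical input format)
    phages:  phage ID -> phage contig sequence
    Exact matching on both strands stands in for the BLASTN step.
    """
    links = []
    for phage_id, phage_seq in phages.items():
        target = phage_seq.upper()
        for bact_id, spacer_list in spacers.items():
            for spacer in spacer_list:
                s = spacer.upper()
                if not s:
                    continue  # an empty spacer would trivially match everything
                if s in target or reverse_complement(s) in target:
                    links.append((phage_id, bact_id))
                    break  # one matching spacer is enough to call a link
    return links


if __name__ == "__main__":
    spacers = {"bact_1": ["ACGTTGCA"], "bact_2": ["GGGGCCCC"]}
    phages = {"phage_A": "TTTACGTTGCATTT", "phage_B": "AAAAAAAA"}
    print(crispr_links(spacers, phages))  # [('phage_A', 'bact_1')]
```

Real spacer-protospacer matches often contain mismatches, which is why an alignment tool with an identity threshold is used rather than exact matching.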

Required Dependencies

  • Python 3.x
  • Pandas
  • NumPy
  • Biopython
  • NCBI BLAST+

An easier way to install

We suggest installing all the packages with conda (either Miniconda or Anaconda works), using the commands below:

conda create --name cherry_crispr python=3.8
conda activate cherry_crispr

conda install pandas numpy biopython

conda install blast -c bioconda
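After installation, a quick check that the Python packages and the BLAST+ `blastn` program are visible inside the activated environment can save a failed run. The sketch below is a hypothetical helper, not part of CHERRY-crispr. It assumes Biopython is importable as `Bio` (its standard module name) and only looks for `blastn`, since the outputs indicate that BLASTN is used.

```python
import importlib.util
import shutil
from typing import Dict

# Package name -> importable module name for the Python dependencies.
PY_MODULES = {"pandas": "pandas", "numpy": "numpy", "biopython": "Bio"}
# BLAST+ executable expected on PATH (assumed to be the one the script calls).
BLAST_TOOLS = ["blastn"]


def check_dependencies() -> Dict[str, bool]:
    """Report whether each dependency can be found in the active environment."""
    status = {}
    for name, module in PY_MODULES.items():
        status[name] = importlib.util.find_spec(module) is not None
    for tool in BLAST_TOOLS:
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

Run it with the cherry_crispr environment activated; anything reported as MISSING points to the conda command that still needs to be run.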

Usage

Once the required environment is installed, activate it whenever you want to use CHERRY-crispr:

conda activate cherry_crispr

Then, CHERRY-crispr can be run with:

python PATH_TO_CHERRY_CRISPR/cherry_crispr.py --bfolder PATH_TO_BACTERIA --pfile PATH_TO_PHAGE --ident IDENTITY_THRESHOLD --threads NUM_OF_THREAD --rootpth PATH_TO_OUTPUT --dbdir PATH_TO_CHERRY_CRISPR/database

OR

python PATH_TO_CHERRY_CRISPR/cherry_crispr.py --bfile PATH_TO_BACTERIA --pfile PATH_TO_PHAGE --ident IDENTITY_THRESHOLD --threads NUM_OF_THREAD --rootpth PATH_TO_OUTPUT --dbdir PATH_TO_CHERRY_CRISPR/database

There are two ways to provide the bacterial contigs:

  1. --bfolder: a folder containing your bacterial contigs (every file in it should be a FASTA file)

  2. --bfile: a single FASTA file containing your bacterial contigs
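Because --bfolder expects every file in the folder to be FASTA, a pre-flight check can flag stray files (notes, hidden files, compressed archives) before a run. This is a hypothetical helper, not part of CHERRY-crispr. It treats a file as FASTA when its first non-blank line starts with `>`, which is a heuristic rather than full validation.

```python
import os
from typing import List, Tuple


def check_bacteria_folder(folder: str) -> Tuple[List[str], List[str]]:
    """Split the files in `folder` into FASTA-looking and suspicious ones.

    A file is treated as FASTA if its first non-blank line starts with '>'.
    Sub-directories are ignored.
    """
    fasta, suspicious = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="replace") as handle:
            first = next((line for line in handle if line.strip()), "")
        (fasta if first.startswith(">") else suspicious).append(name)
    return fasta, suspicious


if __name__ == "__main__":
    import sys

    ok, bad = check_bacteria_folder(os.path.expanduser(sys.argv[1]))
    print(f"{len(ok)} FASTA file(s)")
    for name in bad:
        print(f"not FASTA: {name}")
```

For example, `python check_folder.py ~/bacteria/` (the script name is hypothetical) lists any file that would not pass the check.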

For example:

python CHERRY_crispr/cherry_crispr.py --bfolder ~/bacteria/ --pfile ~/phage.fa --threads 40 --rootpth ~/test_dir --dbdir CHERRY_crispr/database

OR

python CHERRY_crispr/cherry_crispr.py --bfile ~/bacteria.fa --pfile ~/phage.fa --threads 40 --rootpth ~/test_dir --dbdir CHERRY_crispr/database
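When scripting many runs, it can help to assemble the command line programmatically. The hypothetical helper below mirrors the flags from the usage examples and enforces that exactly one of --bfolder or --bfile is given (an assumption based on the two alternative forms above). It leaves --ident and --threads optional because the examples omit --ident; the script's actual defaults are not documented here.

```python
from typing import List, Optional


def build_cherry_crispr_cmd(script: str, pfile: str, rootpth: str, dbdir: str,
                            bfolder: Optional[str] = None,
                            bfile: Optional[str] = None,
                            ident: Optional[float] = None,
                            threads: Optional[int] = None) -> List[str]:
    """Assemble a CHERRY-crispr command line (hypothetical helper, not part of the tool)."""
    # Assumed: --bfolder and --bfile are alternatives, so exactly one is allowed.
    if (bfolder is None) == (bfile is None):
        raise ValueError("pass exactly one of bfolder or bfile")
    cmd = ["python", script]
    if bfolder is not None:
        cmd += ["--bfolder", bfolder]
    else:
        cmd += ["--bfile", bfile]
    cmd += ["--pfile", pfile]
    if ident is not None:  # optional here; the script's default is not documented
        cmd += ["--ident", str(ident)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd += ["--rootpth", rootpth, "--dbdir", dbdir]
    return cmd


if __name__ == "__main__":
    cmd = build_cherry_crispr_cmd("CHERRY_crispr/cherry_crispr.py", "~/phage.fa",
                                  "~/test_dir", "CHERRY_crispr/database",
                                  bfile="~/bacteria.fa", threads=40)
    print(" ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs CHERRY-crispr and raises `CalledProcessError` on a non-zero exit. Without a shell, `~` is not expanded, so wrap paths in `os.path.expanduser` first.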

Outputs

CHERRY-crispr writes three output files to the directory given by --rootpth PATH_TO_OUTPUT:

  1. CRISPRs.fa: the CRISPRs found in the provided bacterial FASTA input
  2. crispr_align.txt: BLASTN alignments between the CRISPRs and the phage contigs
  3. cherry_crispr_pred.csv: the predictions in CSV format (alignments with identity above --ident IDENTITY_THRESHOLD)
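The identity filter that produces cherry_crispr_pred.csv can be approximated from crispr_align.txt. This sketch assumes the file is BLAST+ tabular output with the 12 default `-outfmt 6` columns and that the threshold is compared against percent identity (`pident`, 0 to 100). Neither the column layout nor the unit of --ident is confirmed here, `filter_hits` is a hypothetical helper, and the demo rows are invented for illustration.

```python
import csv
from typing import Iterable, List, Tuple

# Default BLAST+ tabular (-outfmt 6) columns. Whether crispr_align.txt uses
# exactly this layout is an assumption.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(lines: Iterable[str],
                ident_threshold: float) -> List[Tuple[str, str, float]]:
    """Return (CRISPR ID, phage ID, percent identity) for hits above the threshold."""
    kept = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6, row))
        pident = float(hit["pident"])
        if pident > ident_threshold:  # strictly greater, as in the output description
            kept.append((hit["qseqid"], hit["sseqid"], pident))
    return kept


if __name__ == "__main__":
    sample = [  # invented rows for illustration only
        "crispr_1\tphage_A\t100.000\t30\t0\t0\t1\t30\t501\t530\t1e-10\t56.5",
        "crispr_2\tphage_B\t86.667\t30\t4\t0\t1\t30\t10\t39\t2e-05\t38.2",
    ]
    print(filter_hits(sample, 95))  # [('crispr_1', 'phage_A', 100.0)]
```

To apply it to a real run, open crispr_align.txt from the --rootpth directory in a `with` block and pass the file handle as `lines`. If your run used a fractional threshold such as 0.95, scale it to percent before comparing.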

Citation

If you use this program, please cite the following papers:

  • CHERRY:
Jiayu Shang, Yanni Sun, CHERRY: a Computational metHod for accuratE pRediction of virus–pRokarYotic interactions using a graph encoder–decoder model, Briefings in Bioinformatics, 2022, bbac182, https://doi.org/10.1093/bib/bbac182
  • CRT:
Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007 Jun 18;8(1):209

The original version of CHERRY can be found via: CHERRY
