Absence/Presence Tree script


This script generates a phylogenetic tree for a set of genomes, together with a matrix showing the absence/presence of biosynthetic gene clusters (BGCs) alongside the tree. It takes a list of genome accession numbers, predicts BGCs for them using antiSMASH, groups these BGCs into families with BiG-SCAPE and NetworkX, and visualizes the absence/presence of these families for each organism alongside a phylogenetic tree built with autoMLST.

Prerequisites

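The following programs need to be installed: antiSMASH, BiG-SCAPE (with its Pfam database), and autoMLST. Their locations are configured under Setup, which refers to forks of antiSMASH and autoMLST.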

Additionally, several Python virtual environments are required. Requirements for three of these are given in requirements-ete-env.txt, requirements-general-env.txt, and requirements-networkx-env.txt. Beyond this, virtual environments for autoMLST and antiSMASH are also required; requirements for these can be found in their respective repositories.
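As a rough sketch, the three environments with requirements files in this repository could be created along the following lines. The environment names are taken from the placeholders listed under Setup. The ~/.virtualenvs location, the use of python3 -m venv, and the venv_commands helper are assumptions made for illustration; the Python version each environment needs is not stated here. The helper only prints the commands so they can be reviewed before running.

```shell
# Root directory for the environments; ~/.virtualenvs is an assumption.
VENV_ROOT="${VENV_ROOT:-$HOME/.virtualenvs}"

# Hypothetical helper: print the commands that would create one
# environment and install its requirements file into it.
venv_commands() {
    env_name="$1"
    req_file="$2"
    echo "python3 -m venv $VENV_ROOT/$env_name"
    echo "$VENV_ROOT/$env_name/bin/pip install -r $req_file"
}

venv_commands ete3_env requirements-ete-env.txt
venv_commands base_env requirements-general-env.txt
venv_commands networkx_env requirements-networkx-env.txt
```

Once the printed commands look right, they can be run by hand or piped to a shell from the repository directory.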

Setup

Before running mibig-gbk-to-trees.sh for the first time, several placeholders need to be replaced. These are:

Virtual environments

  • /path/to/.virtualenvs/antismash-dmz_markers/bin: replace with path to antiSMASH virtualenv
  • /path/to/.virtualenvs/automlst_env/bin: replace with path to autoMLST virtualenv
  • /path/to/.virtualenvs/ete3_env/bin: replace with path to virtualenv specified in requirements-ete-env.txt
  • /path/to/.virtualenvs/base_env/bin: replace with path to virtualenv specified in requirements-general-env.txt
  • /path/to/.virtualenvs/networkx_env/bin/: replace with path to virtualenv specified in requirements-networkx-env.txt

Programs

  • /path/to/BiG-SCAPE-master: replace with path to dir containing BiG-SCAPE
  • /path/to/Pfam-A: replace with path to PFAM directory used for BiG-SCAPE (normally, but not always, in BiG-SCAPE dir)
  • /path/to/autoMLST/ziemertlab-automlst-7b2b5a9a8961: replace with path to dir containing autoMLST fork
  • /path/to/antismash-dmz_markers: replace with path to dir containing antiSMASH fork
  • /path/to/absprestree: replace with path to dir containing this script

Directories

  • /path/to/default/dir: replace with path to the directory you wish to run the script in by default

If desired, you can also change curated-accessions-tiny to the name of the accession list you wish to run by default.
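The placeholders can be replaced by hand in an editor, or with sed. The following is a minimal sketch. The replace_placeholder helper, the /opt/venvs target path, and the sample line written to the demo file are all made up for illustration; the demo runs on a throwaway file, not on the real script.

```shell
# Hypothetical helper: replace a placeholder path with a real path in a
# file, in place, keeping a backup with a .bak suffix.
# '|' is the sed delimiter so slashes in paths need no escaping. This
# assumes neither path contains '|', '&' or a backslash. The placeholder
# is treated as a regular expression, so '.' matches any character.
replace_placeholder() {
    sed -i.bak "s|$1|$2|g" "$3"
}

# Demonstration on a throwaway file with an invented sample line.
demo="$(mktemp)"
printf 'source /path/to/.virtualenvs/ete3_env/bin/activate\n' > "$demo"
replace_placeholder "/path/to/.virtualenvs/ete3_env/bin" "/opt/venvs/ete3_env/bin" "$demo"
cat "$demo"
```

To apply this to the real script, call replace_placeholder once per placeholder with mibig-gbk-to-trees.sh as the third argument, then compare the result against the .bak copy.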

Running the script

To run the script with default settings, run bash mibig-gbk-to-trees.sh. The only input required is a plain text file listing the accession numbers of the genomes you want an absence/presence matrix and tree for, one per line. It must be placed in the default base directory under the default file name (curated-accessions-tiny unless changed during setup). The following options override the defaults:

  • -b: base directory to run script in; all files and directories are normally generated here
  • -a: plain text file containing accession numbers of your desired genomes, one per line
  • -t: name (not path!) of the final tree image. Formats supported by ete3 are .png, .pdf and .svg.
  • -g: name (not accession!) of outgroup(s) to be used. Replace spaces with _ and remove all non-alphanumeric characters except - and _. If you wish to use more than one outgroup, separate by spaces and wrap the entire list in double quotes (e.g. "Bacillus_cereus_ATCC_14579 Bacillus_megaterium_NBRC_15308__ATCC_14581")
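The naming rule for -g can be applied mechanically. The sketch below is one way to do it, not part of the script itself. The sanitize_outgroup helper is hypothetical, and the raw organism names are assumed inputs chosen so that the output matches the example above.

```shell
# Hypothetical helper: turn an organism name into the form -g expects.
# Spaces become underscores, then every character other than letters,
# digits, '-' and '_' is dropped.
sanitize_outgroup() {
    printf '%s' "$1" | tr ' ' '_' | LC_ALL=C tr -cd 'A-Za-z0-9_-'
}

# Build a space-separated outgroup list from two assumed raw names.
outgroups="$(sanitize_outgroup 'Bacillus cereus ATCC 14579') $(sanitize_outgroup 'Bacillus megaterium NBRC 15308 = ATCC 14581')"
echo "$outgroups"

# Possible invocation; the directory and file names are placeholders:
# bash mibig-gbk-to-trees.sh -b /path/to/run/dir -a accessions.txt -t tree.pdf -g "$outgroups"
```

Quoting "$outgroups" in the invocation keeps the whole list as a single argument, as the -g description requires.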

Contributors

katsteinke

absprestree's Issues

Install AbsPreTree

Hi,
Do you have a conda or Docker environment that would make these scripts easy to install? Thank you very much!

Li Zhihua
