orfmine's Introduction

Overview

Recent studies attribute a new role to the noncoding genome in the production of novel peptides. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides.

ORFmine is an open-source package that extracts and annotates all Open Reading Frames (ORFs) encoded in a genome (coding and noncoding sequences alike) and characterizes their fold potential and structural properties. ORFmine consists of two independent programs, ORFtrack and ORFold, that can be used together or independently (see here for an example of application).

Both tools are written in Python 3 (version >= 3.6). The install.sh script installs both ORFtrack and ORFold with their dependencies.

Documentation

You will find complete documentation on https://i2bc.github.io/ORFmine/

Installation

1. Download and uncompress the latest release

You can clone the whole project with the following command:

git clone https://github.com/i2bc/ORFmine.git

Alternatively, you can click here to access the latest release.

Then uncompress the archive. If you downloaded:

  • the .zip file: unzip ORFmine-x.x.x.zip
  • the .tar.gz file: tar xzvf ORFmine-x.x.x.tar.gz

2. Create an isolated environment

Although not strictly necessary, this step is highly recommended (it will allow you to work on different projects without having any conflicting library versions). If you do not want to create a virtual environment, please go directly to the install section.

Install virtualenv

python3 -m pip install virtualenv

Create a virtual environment

virtualenv -p python3 orfmine_env

Activate the created environment

source orfmine_env/bin/activate

Once the environment is activated, any Python library you install with pip is installed solely in this isolated environment. You must activate the environment whenever you need the libraries installed in it.
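To confirm that the environment is active before installing anything, one option is to test the VIRTUAL_ENV variable, which the activate script exports. The snippet below is a minimal sketch; in_virtualenv is a hypothetical helper name used only for this example.

```shell
# The activate script exports VIRTUAL_ENV with the path of the environment;
# the variable is unset or empty when no environment is active.
# in_virtualenv is a hypothetical helper name, not part of ORFmine.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "active environment: $VIRTUAL_ENV"
else
    echo "no virtual environment is active"
fi
```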

Once you are done working on your project, simply type deactivate to exit the environment.

Note

To permanently delete your virtual environment, simply remove its directory: rm -r orfmine_env/

Note

Some external packages used in ORFmine (such as Biopython) require Python version >= 3.6. Before creating your virtual environment, make sure that your Python version is up to date.
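As a quick way to check this requirement, the following sketch asks python3 for its version and compares it with the 3.6 minimum. The helper name version_ok is hypothetical and only serves this example.

```shell
# version_ok is a hypothetical helper: succeeds if a "major.minor[.patch]"
# version string is at least 3.6.
version_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

if command -v python3 >/dev/null 2>&1; then
    # Ask the interpreter for its own version, e.g. "3.10".
    current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ok "$current"; then
        echo "python3 $current is recent enough"
    else
        echo "python3 $current is too old; version >= 3.6 is required" >&2
    fi
else
    echo "python3 was not found on PATH" >&2
fi
```

The numeric comparison matters here: a plain string comparison would rank 3.10 below 3.6.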

3. Install ORFmine

Preparation before the Installation

Please note that the root directory of ORFmine is referred to below as ORFmine-x.x.x, where x.x.x is the version downloaded from an archive file (either .zip or .tar.gz). If you cloned the project instead, the root directory is ORFmine.

If you only want to use ORFtrack to annotate all the possible ORFs of a genome, there are no other dependencies to install: simply run the Installation step presented below.

Installing ORFold is a bit more demanding, as some external tools must be downloaded and/or installed before launching the installation.

First, ORFold relies on the HCA method for the calculation of the fold potential. As a result, pyHCA [1] must be installed on your machine before installing ORFold. pyHCA is free to download and can be installed by following its developers' instructions.
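One way to verify that pyHCA is available is to try importing it with the same python3 that will run ORFold. This is only a sketch: has_module is a hypothetical helper, and the import name pyHCA is an assumption to be checked against the pyHCA documentation.

```shell
# has_module is a hypothetical helper: succeeds if python3 can import
# the module named in its first argument.
has_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

# Assumption: the package is importable under the name "pyHCA".
if has_module pyHCA; then
    echo "pyHCA found"
else
    echo "pyHCA not found; install it before installing ORFold" >&2
fi
```

If a virtual environment is used, run the check with the environment activated so that the same interpreter is tested.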

If you are not interested in calculating the disorder and/or aggregation propensities with ORFold and you have already installed pyHCA, you can simply launch the installation presented below.

However, if you want to use IUPred [2][3][4] and/or Tango [5][6][7] with ORFold, you must first contact their developers through the respective links to obtain access to the programs. These two programs are not freely available for non-academic users.

Once you have access to IUPred and Tango, place them in a directory called softwares located at ORFmine-x.x.x/orfold_v1/orfold/. To do so:

  • First create the softwares directory if not already created:
mkdir ORFmine-x.x.x/orfold_v1/orfold/softwares
  • Move the IUPred source code and data (provided by the developer):

      mv iupred2a.py ORFmine-x.x.x/orfold_v1/orfold/softwares
      mv data ORFmine-x.x.x/orfold_v1/orfold/softwares
    
  • Move Tango source code:

    • For macOS:

        mv tango2_3_1 ORFmine-x.x.x/orfold_v1/orfold/softwares
      
    • For Linux:

        mv tango_x86_64_release ORFmine-x.x.x/orfold_v1/orfold/softwares
      
    • For Windows:

        mv Tango.exe ORFmine-x.x.x/orfold_v1/orfold/softwares
      

Note

The calculations of the disorder and aggregation propensities are both optional and complementary to the HCA score. As a result, IUPred and Tango are not mandatory for the installation of ORFold, and neither depends on the other. ORFold will install properly without them, or with only one of them.
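Before launching the installation, it can help to see which of the optional tools are actually in place. The sketch below only looks for the file names used in the mv commands above; report_tool and the SOFTWARES_DIR variable are hypothetical names introduced for this example.

```shell
# Path used in this README; replace x.x.x with the downloaded version,
# or use ORFmine/orfold_v1/orfold/softwares for a cloned project.
SOFTWARES_DIR=${SOFTWARES_DIR:-ORFmine-x.x.x/orfold_v1/orfold/softwares}

# report_tool is a hypothetical helper: given a directory, a label, and one
# or more candidate names, it reports the first candidate that exists.
report_tool() {
    dir=$1
    label=$2
    shift 2
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "$label: found ($name)"
            return 0
        fi
    done
    echo "$label: not found (optional)"
    return 1
}

# A missing tool is not an error, so the return status is ignored here.
report_tool "$SOFTWARES_DIR" IUPred iupred2a.py || true
report_tool "$SOFTWARES_DIR" Tango tango2_3_1 tango_x86_64_release Tango.exe || true
```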

Installation

If you use a virtual environment, make sure that it is activated. Then, whether or not you use one, follow the procedure described below:

cd ORFmine-x.x.x
chmod u+x install.sh
./install.sh

This script will first uninstall ORFmine if it was already installed and will re-install it. In addition, it will install all the dependency packages needed for ORFtrack and ORFold.

References

  1. Bitard-Feildel, T. & Callebaut, I. HCAtk and pyHCA: A Toolkit and Python API for the Hydrophobic Cluster Analysis of Protein Sequences. bioRxiv 249995 (2018).
  2. Dosztányi, Z., Csizmok, V., Tompa, P. & Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. Journal of molecular biology 347, 827–839 (2005).
  3. Dosztányi, Z. Prediction of protein disorder based on IUPred. Protein Science 27, 331–340 (2018).
  4. Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research 46, W329–W337 (2018).
  5. Fernandez-Escamilla, A.-M., Rousseau, F., Schymkowitz, J. & Serrano, L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nature biotechnology 22, 1302–1306 (2004).
  6. Linding, R., Schymkowitz, J., Rousseau, F., Diella, F. & Serrano, L. A comparative study of the relationship between protein structure and β-aggregation in globular and intrinsically disordered proteins. Journal of molecular biology 342, 345–353 (2004).
  7. Rousseau, F., Schymkowitz, J. & Serrano, L. Protein aggregation and amyloidosis: confusion of the kinds? Current opinion in structural biology 16, 118–126 (2006).
