Overview
Recent studies attribute a new role to the noncoding genome in the production of novel peptides. The widespread transcription of noncoding regions and the pervasive translation of the resulting RNAs offer organisms a vast reservoir of novel peptides.
ORFmine is an open-source package that extracts, annotates, and characterizes the fold potential and structural properties of all Open Reading Frames (ORFs) encoded in a genome (including coding and noncoding sequences). ORFmine consists of two independent programs, ORFtrack and ORFold, which can be used together or independently (see here for an example of application).
Both tools are written in Python 3 (version >= 3.6). The install.sh script installs ORFtrack and ORFold together with their dependencies.
Documentation
You will find the complete documentation at https://i2bc.github.io/ORFmine/
Installation
1. Download and uncompress the latest release
You can clone the whole project with the following command:
git clone https://github.com/i2bc/ORFmine.git
Alternatively, you can click here to access the latest release.
Then uncompress the archive. If you downloaded:
- the .zip file:
unzip ORFmine-x.x.x.zip
- the .tar.gz file:
tar xzvf ORFmine-x.x.x.tar.gz
2. Create an isolated environment
Although not strictly necessary, this step is highly recommended (it will allow you to work on different projects without having any conflicting library versions). If you do not want to create a virtual environment, please go directly to the install section.
Install virtualenv
python3 -m pip install virtualenv
Create a virtual environment
virtualenv -p python3 orfmine_env
Activate the created environment
source orfmine_env/bin/activate
Once the environment is activated, any Python library you install with pip is installed solely in this isolated environment. You must activate the environment every time you need the libraries installed in it.
Once you are done working on your project, simply type deactivate to exit the environment.
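To confirm from within Python whether an environment is currently active, one option is to compare the interpreter prefixes. The sketch below is not part of ORFmine; the function name in_virtual_environment is hypothetical, and the check relies only on standard attributes of the sys module.

```python
import sys


def in_virtual_environment(sys_module=sys):
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys_module, "real_prefix"):
        return True
    # venv and recent virtualenv releases make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return base_prefix != sys_module.prefix


if __name__ == "__main__":
    state = "active" if in_virtual_environment() else "not active"
    print("Virtual environment is " + state)
```

Running this after source orfmine_env/bin/activate should report an active environment, and after deactivate it should report the opposite.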
Note
To permanently delete your virtual environment, simply remove its directory with the following command:
rm -r orfmine_env/
Note
Some external packages used in ORFmine (such as Biopython) require Python version >= 3.6. Before creating your virtual environment, make sure that your Python version is recent enough.
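A quick way to check the interpreter version before creating the environment is sketched below. It is an illustration rather than part of ORFmine: the names MIN_PYTHON and python_is_recent_enough are hypothetical, and the only fact taken from this README is the 3.6 minimum.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 6)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    if python_is_recent_enough():
        print("Python %d.%d is recent enough." % current)
    else:
        print("Python %d.%d is too old; %d.%d or later is needed." % (current + MIN_PYTHON))
```

The messages use %-formatting rather than f-strings so that the script itself still parses on interpreters older than 3.6, which is exactly the case it is meant to detect.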
3. Install ORFmine
Preparation before the Installation
Please note that below we refer to the root directory of ORFmine as ORFmine-x.x.x, where x.x.x is the version downloaded from an archive file (either .zip or .tar.gz). If you cloned the project instead, the root directory of ORFmine is simply ORFmine.
If you only want to use ORFtrack to annotate all the possible ORFs of a genome, there are no other dependencies to install: simply launch the installation presented below.
Installing ORFold is a bit more demanding, because some external tools must be downloaded and/or installed before launching the installation.
First, ORFold relies on the HCA method to calculate the fold potential. pyHCA [1] must therefore be installed on your machine before you install ORFold. pyHCA is free to download and can be installed by following its developers' instructions.
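One way to verify this prerequisite before running the installer is to ask Python whether the package can be found. This is a minimal sketch under one assumption that should be checked against the pyHCA documentation: that the package is importable under the module name "pyHCA". The helper module_available is hypothetical and uses only the standard library.

```python
import importlib.util

# Assumption: pyHCA is importable under the top-level module name "pyHCA".
PYHCA_MODULE = "pyHCA"


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available(PYHCA_MODULE):
        print("pyHCA found; ORFold can be installed.")
    else:
        print("pyHCA not found; install it before installing ORFold.")
```

Run it with the same interpreter (and the same virtual environment, if any) that you will use for ORFmine, since the result depends on which environment is active.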
If you are not interested in calculating disorder and/or aggregation propensities with ORFold and you have already installed pyHCA, you can simply launch the installation presented below.
However, if you want to use IUPred [2][3][4] and/or Tango [5][6][7] with ORFold, you must first contact their developers through the respective links to obtain access to the programs. Neither program is freely available to non-academic users.
Once you have access to IUPred and Tango, place them in a directory called softwares located under ORFmine-x.x.x/orfold_v1/orfold/. To do so:
- First create the softwares directory if it does not already exist:
mkdir ORFmine-x.x.x/orfold_v1/orfold/softwares
- Move the IUPred source code and data (provided by the developer):
mv iupred2a.py ORFmine-x.x.x/orfold_v1/orfold/softwares
mv data ORFmine-x.x.x/orfold_v1/orfold/softwares
- Move the Tango source code:
  - For macOS:
  mv tango2_3_1 ORFmine-x.x.x/orfold_v1/orfold/softwares
  - For Linux:
  mv tango_x86_64_release ORFmine-x.x.x/orfold_v1/orfold/softwares
  - For Windows:
  mv Tango.exe ORFmine-x.x.x/orfold_v1/orfold/softwares
Note
Calculating the disorder and aggregation propensities is optional and complementary to the HCA score. IUPred and Tango are therefore not mandatory for installing ORFold, and they do not depend on each other: ORFold installs properly without them, or with only one of them.
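Before launching the installation, it can help to check which optional tools actually ended up in the softwares directory. The sketch below is not ORFmine's own detection logic: the function find_optional_tools and the mapping TANGO_BY_SYSTEM are hypothetical, and only the file names are taken from the mv commands above.

```python
import platform
from pathlib import Path

# File names come from the mv commands in this README; the mapping to
# platform.system() values is an illustrative assumption.
TANGO_BY_SYSTEM = {
    "Darwin": "tango2_3_1",
    "Linux": "tango_x86_64_release",
    "Windows": "Tango.exe",
}


def find_optional_tools(softwares_dir, system=None):
    """Report which optional tools are present in the softwares directory."""
    softwares_dir = Path(softwares_dir)
    system = system or platform.system()
    # IUPred needs both its script and its data to be in place.
    iupred_ok = (softwares_dir / "iupred2a.py").exists() and (softwares_dir / "data").exists()
    # Tango is looked up under the file name expected for this operating system.
    tango_name = TANGO_BY_SYSTEM.get(system)
    tango_ok = tango_name is not None and (softwares_dir / tango_name).exists()
    return {"iupred": iupred_ok, "tango": tango_ok}


if __name__ == "__main__":
    print(find_optional_tools("ORFmine-x.x.x/orfold_v1/orfold/softwares"))
```

Replace ORFmine-x.x.x with your actual root directory. A result with both values set to False is not an error, since both tools are optional.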
Installation
If you use a virtual environment, make sure that it is activated. Then, in either case, follow the procedure described below:
cd ORFmine-x.x.x
chmod u+x install.sh
./install.sh
This script first uninstalls ORFmine if it is already installed, then reinstalls it. It also installs all the dependencies needed for ORFtrack and ORFold.
References
- Bitard-Feildel, T. & Callebaut, I. HCAtk and pyHCA: A Toolkit and Python API for the Hydrophobic Cluster Analysis of Protein Sequences. bioRxiv 249995 (2018).
- Dosztányi, Z., Csizmok, V., Tompa, P. & Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. Journal of molecular biology 347, 827–839 (2005).
- Dosztányi, Z. Prediction of protein disorder based on IUPred. Protein Science 27, 331–340 (2018).
- Mészáros, B., Erdős, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research 46, W329–W337 (2018).
- Fernandez-Escamilla, A.-M., Rousseau, F., Schymkowitz, J. & Serrano, L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nature biotechnology 22, 1302–1306 (2004).
- Linding, R., Schymkowitz, J., Rousseau, F., Diella, F. & Serrano, L. A comparative study of the relationship between protein structure and β-aggregation in globular and intrinsically disordered proteins. Journal of molecular biology 342, 345–353 (2004).
- Rousseau, F., Schymkowitz, J. & Serrano, L. Protein aggregation and amyloidosis: confusion of the kinds? Current opinion in structural biology 16, 118–126 (2006).