miRNAture

Computational detection of microRNA candidates

Description

Detection of miRNAs is a difficult problem. Their small size limits the available information, and current sensitive methods such as blast, nhmmer, or cmsearch are designed to increase sensitivity but inevitably produce a large number of false positives. These can only be detected by detailed analysis of specific features of typical miRNAs and/or conservation patterns in structure-annotated multiple sequence alignments.

The miRNAture pipeline implements a workflow specific to animal microRNAs that automates the homology search and validation steps. The homology search combines two modes: sequence homology, by blast and/or nhmmer using query sequences or hidden Markov models (HMMs), and structural validation, performed by the INFERNAL package using covariance models (CMs). A merging step produces a final list of homology candidates. On those candidates, the Mature annotation stage corrects the position of the mature sequences on the detected precursor and evaluates the structure in terms of minimum free energy (MFE), precursor length, folding, and the anchored, family-specific multiple secondary structure alignment (using MIRfix). Final sanity checks are performed in the Evaluation stage, which reviews the whole mature annotation process, filters out candidates that are invalid at the structure level, and reports valid candidates in GFF3/BED and fasta files, together with a summary file that provides overall information about the detected miRNA candidates and families.
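
To make the idea of a merging step concrete, here is a minimal Python sketch that collapses overlapping hits from different search modes into single loci. It is an illustration only, not miRNAture's actual merging logic: the Hit class, its field names, the overlap rule and the example coordinates are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One homology hit; all field names here are illustrative."""
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # search mode that produced the hit, e.g. "blast"


def merge_hits(hits):
    """Collapse overlapping hits on the same contig and strand into one locus."""
    merged = []
    ordered = sorted(hits, key=lambda h: (h.contig, h.strand, h.start, h.end))
    for hit in ordered:
        last = merged[-1] if merged else None
        if (last is not None
                and last["contig"] == hit.contig
                and last["strand"] == hit.strand
                and hit.start <= last["end"]):
            # Overlap: extend the locus and record the supporting search mode.
            last["end"] = max(last["end"], hit.end)
            last["sources"].add(hit.source)
        else:
            merged.append({"contig": hit.contig, "start": hit.start,
                           "end": hit.end, "strand": hit.strand,
                           "sources": {hit.source}})
    return merged


# Made-up hits: two overlap on the plus strand, one stands alone.
hits = [
    Hit("chr1", 100, 180, "+", "blast"),
    Hit("chr1", 150, 230, "+", "infernal"),
    Hit("chr1", 500, 580, "-", "hmm"),
]
loci = merge_hits(hits)
for locus in loci:
    print(locus)
```

Keeping the set of supporting search modes per locus is a design choice of this sketch; it makes it easy to see which candidates were found by more than one method.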

Installation

The easiest way to install miRNAture is through conda. To do so, please first install conda.

To speed up the installation of dependencies and packages, we suggest using mamba. To install it, run:

conda install mamba -c conda-forge

You can use mamba as drop-in replacement for conda by simply replacing the call to conda with a call to mamba.

Install via Conda

To install miRNAture from conda in a specific mirnature environment simply run:

mamba create -n mirnature mirnature

if mamba is available; otherwise run:

conda create -n mirnature mirnature

Manual install, resolve dependencies via Conda

Create a mirnature conda environment with the file miRNAture.yml:

mamba env create -n mirnature -f miRNAture.yml

Activate the environment containing all dependencies:

conda activate mirnature

followed by the manual steps:

perl Build.PL
./Build
./Build test
./Build install

which will install miRNAture in the mirnature conda environment.
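
After either installation route, it can be useful to confirm that the expected executables are visible in the active environment. The following Python sketch does this with the standard library only. The list of tool names is taken from programs mentioned in this document; whether this is the complete set miRNAture needs is an assumption.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Names mentioned in this README; the full dependency list may differ (assumption).
tools = find_tools(["miRNAture", "makeblastdb", "nhmmer", "cmsearch"])
for name, path in tools.items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

Run it after `conda activate mirnature`; any tool reported as NOT FOUND points to an incomplete environment.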

Input files

The most important input file is a DNA sequence. This can be a multi-fasta file with sequences belonging to a single species (e.g. a complete genome or a group of particular sequences). In addition, before running miRNAture, a pre-calculated dataset (containing default data such as CMs, HMMs, and the files required to perform mature prediction) must be downloaded and passed on the command line with the flag -dataF.
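
Before starting a long run, a quick sanity check of the multi-fasta input can catch malformed files early. This is a minimal sketch, not part of miRNAture: the function name and the checks it performs (unique record identifiers, IUPAC nucleotide symbols only) are assumptions about what is worth verifying.

```python
import io


def summarize_fasta(handle):
    """Return {record_id: sequence_length}; raise ValueError on malformed input."""
    allowed = set("ACGTUNRYSWKMBDHV")  # IUPAC nucleotide codes
    lengths = {}
    current = None
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError(f"empty header at line {line_no}")
            current = parts[0]
            if current in lengths:
                raise ValueError(f"duplicate record id {current!r} at line {line_no}")
            lengths[current] = 0
        else:
            if current is None:
                raise ValueError(f"sequence before first header at line {line_no}")
            bad = set(line.upper()) - allowed
            if bad:
                raise ValueError(f"unexpected symbols {sorted(bad)} at line {line_no}")
            lengths[current] += len(line)
    return lengths


# Made-up two-record example; with a real file use open("genome.fna") instead.
example = io.StringIO(">scaffold_1 test\nACGTN\nacgt\n>scaffold_2\nGGCC\n")
lengths = summarize_fasta(example)
print(lengths)
```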

New in version 1.1: a new dataset containing all miRBase HMMs/CMs and validated mature sequences is recommended as a first approach to identify miRNAs in target species. This dataset can be downloaded from here.

To run miRNAture in its complete mode with default options, run:

# Activate the mirnature environment
conda activate mirnature

# Run miRNAture
./miRNAture -stage complete -dataF <Precalculated_folder> -speG <Target Genome> -speN <Specie_name> -speT <Tag_specie> -w <Output_dir> -m <Mode> (-str <Blast_strategy>) -blastq <Blast_queries_folder> 
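
When miRNAture is called from a script, assembling the arguments as a list avoids quoting problems with paths. The sketch below mirrors the command above and uses only the flags shown there, spelled exactly as written. The function name, its parameter names and all example values are hypothetical.

```python
import shlex


def build_command(data_folder, genome, species_name, species_tag,
                  output_dir, mode, blast_queries, blast_strategy=None,
                  executable="./miRNAture"):
    """Assemble the argument list for a complete-mode run."""
    argv = [executable, "-stage", "complete",
            "-dataF", data_folder,
            "-speG", genome,
            "-speN", species_name,
            "-speT", species_tag,
            "-w", output_dir,
            "-m", mode]
    if blast_strategy is not None:
        # -str is shown in parentheses in the command above, so it is optional here.
        argv += ["-str", blast_strategy]
    argv += ["-blastq", blast_queries]
    return argv


# Placeholder values for illustration only.
argv = build_command(
    data_folder="Data",
    genome="genome.fna",
    species_name="My_species",
    species_tag="Mysp",
    output_dir="mirnature_out",
    mode="blast,hmm,rfam,mirbase,infernal,final",
    blast_queries="blast_queries",
    blast_strategy="5,6",
)
print(shlex.join(argv))
# To execute: import subprocess; subprocess.run(argv, check=True)
```

Passing a list to subprocess.run, rather than a single shell string, means paths with spaces need no extra escaping.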

Output files

The final predicted miRNAs are written to the <Output_dir> indicated with the -w flag. The final candidates are described in the folder Final_miRNA_evaluation/ as follows:

Final_miRNA_evaluation/
├── Fasta/
├── MFE/
├── miRNA_annotation_Lach_accepted_conf.bed
├── miRNA_annotation_Lach_accepted_conf.gff3
├── miRNAture_summary_Lach.txt
└── Tables/

Inside this folder, miRNAture creates three folders with the corresponding results: sequences in fasta format (Fasta/), minimum free energy and lengths of the described sequences (MFE/), and the supporting information for each annotated candidate, organized in tables (Tables/). Additionally, the genomic positions of the miRNA candidates are reported in BED and GFF3 formats, and a summary file, miRNAture_summary_*.txt, provides overall descriptive statistics for the miRNA families found.

For detailed instructions on how to use miRNAture, please refer to the Manual pages.
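
The BED file can be summarized with a few lines of Python, for example to count candidates per contig. This sketch relies only on the standard BED layout (tab-separated, 0-based half-open coordinates). That miRNAture's BED file carries name and strand in columns 4 and 6 is an assumption, and the records below are made up.

```python
import io
from collections import Counter


def read_bed(handle):
    """Yield (chrom, start, end, name, strand) tuples from BED lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        strand = fields[5] if len(fields) > 5 else "."
        yield chrom, start, end, name, strand


# Made-up records; a real file would be opened from Final_miRNA_evaluation/.
example = io.StringIO(
    "scaffold_1\t100\t180\tcandidate_a\t0\t+\n"
    "scaffold_1\t900\t975\tcandidate_b\t0\t-\n"
    "scaffold_7\t10\t95\tcandidate_c\t0\t+\n"
)
records = list(read_bed(example))
per_contig = Counter(chrom for chrom, *_ in records)
lengths = [end - start for _, start, end, _, _ in records]
print(per_contig)
print(lengths)
```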

Pre-calculated datasets

Pre-calculated data, composed of miRNA CMs, HMMs and the input files required to perform mature annotation, have to be downloaded before running the full miRNAture pipeline. Available datasets are listed below:

  • New dataset containing metazoan curated miRBase v.22.1 families. Recommended for use with miRNAture v.1.1. Download here.
  • Required data to re-annotate human miRNAs: includes CMs and HMMs built from miRBase without human sequences. Stored in Zenodo here.

Contributors

cavelandiah, jfallmann

Issues

Warning related with BioPerl module DB

When using makeblastdb to create fasta indexes, this warning appears:

Possible precedence issue with control flow operator at /home/user/anaconda3/envs/mirnature/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.

It seems that this was reported as a fixed bug in BioPerl 1.7.8:
bioperl/bioperl-live@6459750

Attribute (nbitscore_cut) does not pass the type constraint

Hello!

I got the following error with miRNAture:

miRNAture -stage complete -dataF ~/public_db/miRNAture/Dataset_mirnature_Sept21_2022/Data -speG ~/public_db/model_insects/Drosophila_melanogaster_GCF_000001215.4.fna -speN Drosophila_mirna -speT mrna -w ~/pipeline_eval/miRNAture/Drosophila -m blast,hmm,rfam,mirbase,infernal,final -pe 0 -str 5,6 -blstq ~/public_db/miRBase -debug_mode 1
Attribute (nbitscore_cut) does not pass the type constraint because: Validation failed for 'Num' with value  at constructor Mir::ConfigurationFile::new (defined at /home/c/c-liu/miniconda3/envs/mirnature/lib/perl5/site_perl/Mir/ConfigurationFile.pm line 631) line 192
	Mir::ConfigurationFile::new('Mir::ConfigurationFile', 'stage', 'complete', 'species_name', 'Drosophila_mirna', 'species_tag', 'mrna', 'species_genome', '~/public_db/model_insects/Drosophila_melanogaster_GCF_000001215.4.fna', 'output_folder', '~/pipeline_eval/miRNAture/Drosophila', 'data_path', '~/public_db/miRNAture/Dataset_mirnature_Sept21_2022/Data', 'current_folder', '/home/c/c-liu/Hpc/pipeline_eval', 'model_list', undef, 'mirfix_path', '', 'mode', 'blast,hmm,rfam,mirbase,infernal,final', 'repetition_rules', '', 'nbitscore_cut', '', 'blast_strategy', 'ARRAY(0x55dd789bd5f0)', 'blast_queries_path', '~/public_db/miRBase', 'user_folder', '', 'parallel_linux', 0, 'parallel', 0, 'debug_mode', 1, 'evaluation_results_folder', '~/pipeline_eval/miRNAture/Drosophila/Final_miRNA_evaluation', 'mirnanchor_folder', '~/pipeline_eval/miRNAture/Drosophila/miRNA_validation') called at /home/c/c-liu/miniconda3/envs/mirnature/bin/miRNAture line 101

How to fix it?

Sincerely,

Cong
