smithlabcode / ribotricer

A tool for accurately detecting actively translating ORFs from Ribo-seq data

Home Page: http://doi.org/djv4

License: GNU General Public License v3.0

Python 71.20% Makefile 1.29% Shell 2.83% Jupyter Notebook 24.68%

orfs translation ribosome-profiling translation-regulation bioinformatics ribosome ribo-seq ribosome-profiling-data

ribotricer's Introduction

ribotricer: Accurate detection of short and long active ORFs using Ribo-seq data

Online Paper | PDF | Supplementary File | Benchmarking scripts

Installation

We highly recommend that you install ribotricer via conda in a clean environment:

conda create -n ribotricer_env -c bioconda ribotricer
conda activate ribotricer_env
ribotricer --help

To install locally, either download the source code of a release or clone the latest version using git clone. Once you have a copy of the source code, change into the source directory and run:

make install

NOTE: ribotricer will install the following dependencies (If some of these are already present, they might be replaced by the designated version):

pyfaidx>=0.5.0
pysam>=0.11.2.2
numpy>=1.11.0
pandas>=0.20.3
scipy>=0.19.1
matplotlib>=2.1.0
click>=6.0
click-help-colors>=0.3
quicksect>=0.2.0
tqdm>=4.23.4

Workflow of ribotricer

To run ribotricer, you need the following three files:

genome annotation file in GTF format: our implementation handles all variations of GTF files, not only the commonly used ones hosted by GENCODE and Ensembl
reference genome file in FASTA format
alignment file in BAM format

Preparing candidate ORFs

The first step of ribotricer is to take the GTF file and the FASTA file to find all candidate ORFs. In order to generate all candidate ORFs, please run

ribotricer prepare-orfs --gtf {GTF} --fasta {FASTA} --prefix {RIBOTRICER_INDEX_PREFIX}

By default, the command above only includes ORFs longer than 60 nt and only uses 'ATG' as the start codon. You can change these settings with the options --min_orf_length and --start_codons.

Output: {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv
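
To make the notion of a candidate ORF concrete, here is a minimal Python sketch that scans the three forward reading frames of one spliced transcript sequence for start-to-stop stretches. It is an illustration under stated assumptions, not ribotricer's implementation: the function name `find_candidate_orfs`, reporting every in-frame start that shares a stop, and measuring length without the stop codon are all choices made here. Only the defaults ('ATG' as start codon, length longer than 60 nt) come from the text above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(seq, start_codons=("ATG",), min_orf_length=60):
    """Return sorted (start, stop) pairs, 0-based, for ORFs on the forward strand.

    `start` is the index of the first base of the start codon and `stop`
    is the index of the first base of the stop codon, so `stop - start`
    is the ORF length without the stop codon (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # in-frame start codons seen since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in start_codons:
                open_starts.append(pos)
            elif codon in STOP_CODONS:
                for start in open_starts:
                    if pos - start > min_orf_length:
                        orfs.append((start, pos))
                open_starts = []
    return sorted(orfs)


# Hypothetical transcript: ATG, twenty GCC codons, then a stop codon.
print(find_candidate_orfs("ATG" + "GCC" * 20 + "TAA"))
```

ORFs that never reach an in-frame stop codon are dropped in this sketch; whether ribotricer keeps such cases is not stated in this document.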

Detecting translating ORFs

The second step of ribotricer is to take the index file generated by prepare-orfs and the BAM file to detect the actively translating ORFs by assessing the periodicity of all candidate ORFs:

ribotricer detect-orfs \
             --bam {BAM} \
             --ribotricer_index {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv \
             --prefix {OUTPUT_PREFIX}

NOTE: By default, the above command uses a phase-score cutoff of 0.428. Our recommended species-specific cutoffs are as follows:

Species	Cutoff
Arabidopsis	0.330
C. elegans	0.239
Baker's Yeast	0.318
Drosophila	0.181
Human	0.440
Mouse	0.418
Rat	0.453
Zebrafish	0.249
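
As a convenience, the table above can be kept in a small lookup so that the cutoff is passed explicitly. The sketch below only assembles a command line; the helper name `detect_orfs_command` and the dictionary are hypothetical, while the flags (--bam, --ribotricer_index, --prefix, --phase_score_cutoff) are the ones that appear in this document.

```python
# Recommended cutoffs copied from the table above.
SPECIES_CUTOFFS = {
    "Arabidopsis": 0.330,
    "C. elegans": 0.239,
    "Baker's Yeast": 0.318,
    "Drosophila": 0.181,
    "Human": 0.440,
    "Mouse": 0.418,
    "Rat": 0.453,
    "Zebrafish": 0.249,
}
DEFAULT_CUTOFF = 0.428  # ribotricer's documented default


def detect_orfs_command(bam, index, prefix, species=None):
    """Build the detect-orfs argument list, falling back to the default cutoff."""
    cutoff = SPECIES_CUTOFFS.get(species, DEFAULT_CUTOFF)
    return [
        "ribotricer", "detect-orfs",
        "--bam", bam,
        "--ribotricer_index", index,
        "--prefix", prefix,
        "--phase_score_cutoff", str(cutoff),
    ]


# Hypothetical file names, for illustration only.
print(" ".join(detect_orfs_command("sample.bam", "idx_candidate_orfs.tsv", "sample", "Mouse")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` if you prefer to launch ribotricer from Python.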

To assign non-translating or translating status, ribotricer by default uses a cutoff threshold of 0.428. ORFs with a phase score above 0.428 are marked as translating as long as they have at least five codons with a non-zero read count. By default, ribotricer does not otherwise take coverage into account when predicting whether an ORF is translating. However, this behavior can be changed with the following filters:

--min_valid_codons (default=5): Minimum number of codons with non-zero reads for determining active translation
--min_valid_codons_ratio (default=0): Minimum ratio of codons with non-zero reads to total codons for determining active translation
--min_reads_per_codon (default=0): Minimum number of reads per codon for determining active translation
--min_read_density (default=0.0): Minimum read density (total_reads/length) over the ORF's codons for determining active translation

An ORF that fails any of the above filters is marked as non-translating.

For example, to ensure that each ORF has at least 3/4 of its codons non-empty, we can specify --min_valid_codons_ratio to be 0.75:


ribotricer detect-orfs \
             --bam {BAM} \
             --ribotricer_index {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv \
             --prefix {OUTPUT_PREFIX} \
             --min_valid_codons_ratio 0.75
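
The way the cutoff and the filters combine can be sketched in a few lines of Python. This is an illustration of the rule described above, not ribotricer's code: the function name `is_translating` is hypothetical, the comparison operators (strictly above the cutoff, at least the minimum for each filter) are assumptions, and --min_reads_per_codon is left out because its exact definition is not given here. The input keys match the column names of the translating_ORFs.tsv output.

```python
def is_translating(orf, cutoff=0.428, min_valid_codons=5,
                   min_valid_codons_ratio=0.0, min_read_density=0.0):
    """Return True only if the ORF passes the phase-score cutoff and every filter."""
    return (
        orf["phase_score"] > cutoff
        and orf["valid_codons"] >= min_valid_codons
        and orf["valid_codons_ratio"] >= min_valid_codons_ratio
        and orf["read_density"] >= min_read_density
    )


# Values taken from an output row quoted in one of the issues in this document.
orf = {
    "phase_score": 0.5150787536377128,
    "valid_codons": 7,
    "valid_codons_ratio": 0.019553072625698324,
    "read_density": 0.019553072625698324,
}
print(is_translating(orf))                               # default filters
print(is_translating(orf, min_valid_codons_ratio=0.75))  # stricter coverage filter
```

With the default filters this sparsely covered ORF passes on phase score alone, which is why a coverage filter such as --min_valid_codons_ratio can change the call.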

The ORF detection step consists of several smaller steps:

Infer the experimental protocol (strandedness of the reads)
You can assign the strandedness directly with the option --stranded, which can be 'yes', 'no', or 'reverse'. If this option is not provided, ribotricer will automatically infer the experimental protocol by comparing the strand of the reads to the reference.
Output: {OUTPUT_PREFIX}_protocol.txt

Split the BAM file by strand and read length
In this step, all mapped reads are filtered to include only uniquely mapped reads. Reads are split by strand and read length with respect to the strandedness provided or inferred in the previous step. If you only want to include certain read lengths, specify them with the option --read_lengths.
Output: {OUTPUT_PREFIX}_bam_summary.txt

Plot read length distribution
In this step, the read length distribution is plotted to serve as quality control.
Output: {OUTPUT_PREFIX}_read_length_dist.pdf

Calculate metagene profiles
In this step, the metagene profile of all CDS transcripts is calculated for each read length by aligning at the start codon or the stop codon.
Output: {OUTPUT_PREFIX}_metagene_profiles_5p.tsv is the metagene profile aligned at the start codon and {OUTPUT_PREFIX}_metagene_profiles_3p.tsv is the metagene profile aligned at the stop codon

Plot metagene profiles
In this step, metagene plots are made to serve as quality control.
Output: {OUTPUT_PREFIX}_metagene_plots.pdf

Align metagene profiles
If the P-site offsets are not provided, this step uses cross-correlation to find the relative offsets between different read lengths.
Output: {OUTPUT_PREFIX}_psite_offsets.txt

Merge reads from different read lengths based on P-site offsets
This step integrates reads of different read lengths by shifting them by the P-site offsets.

Export WIG file
A WIG file is exported in this step for visualization in a genome browser.
Output: {OUTPUT_PREFIX}_pos.wig for the positive strand and {OUTPUT_PREFIX}_neg.wig for the negative strand.

Export actively translating ORFs
The periodicity of every ORF profile is assessed and the translating ORFs are output. You can output all ORFs regardless of translation status with the option --report_all.
Output: {OUTPUT_PREFIX}_translating_ORFs.tsv
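
The idea behind aligning metagene profiles by cross-correlation and then merging read lengths can be illustrated with a toy example. The sketch below is a generic cross-correlation in plain Python, not ribotricer's implementation; the function names, the `max_lag` search window, and the toy profiles are all assumptions made for illustration.

```python
def cross_correlation(reference, profile, lag):
    """Sum of reference[i + lag] * profile[i] over all positions that overlap."""
    total = 0
    for i, value in enumerate(profile):
        j = i + lag
        if 0 <= j < len(reference):
            total += reference[j] * value
    return total


def best_lag(reference, profile, max_lag=10):
    """Lag (in nt) that best aligns `profile` onto `reference`."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(reference, profile, lag))


def merge_profiles(reference, profile, lag):
    """Add `profile`, shifted by `lag`, onto a copy of `reference`."""
    merged = list(reference)
    for i, value in enumerate(profile):
        j = i + lag
        if 0 <= j < len(merged):
            merged[j] += value
    return merged


# Toy metagene profiles for two read lengths; the second is shifted right by 2 nt.
reference = [0, 0, 5, 0, 0, 1, 0, 0]
shifted = [0, 0, 0, 0, 5, 0, 0, 1]
lag = best_lag(reference, shifted)
print(lag, merge_profiles(reference, shifted, lag))
```

Counts that would be shifted past either end of the reference are discarded in this sketch.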

Definition of ORF types

Ribotricer reports eight different ORF types as defined below:

annotated: CDS annotated in the provided GTF file
super_uORF: upstream ORF of the annotated CDS, not overlapping with any CDS of the same gene (first or most upstream uORF)
super_dORF: downstream ORF of the annotated CDS, not overlapping with any CDS of the same gene (last or most downstream dORF)
uORF: upstream ORF of the annotated CDS, not overlapping with the main CDS
dORF: downstream ORF of the annotated CDS, not overlapping with the main CDS
overlap_uORF: upstream ORF of the annotated CDS, overlapping with the main CDS
overlap_dORF: downstream ORF of the annotated CDS, overlapping with the main CDS
novel: ORF in non-coding genes or in non-coding transcripts of coding genes

Learning cutoff empirically from data

Ribotricer can also learn the cutoff empirically from the data. Given at least one Ribo-seq and one RNA-seq BAM file, ribotricer runs one iteration of the algorithm on the provided files with a prespecified cutoff (--phase_score_cutoff, default: 0.428). It then uses the generated output to find the median difference between the Ribo-seq and RNA-seq phase scores, considering only candidate ORFs with transcript_type set to protein_coding (--filter_by_tx_annotation).

ribotricer learn-cutoff --ribo_bams ribo_bam1.bam,ribo_bam2.bam \
--rna_bams rna_1.bam \
--prefix ribo_rna_prefix \
--ribotricer_index {RIBOTRICER_ANNOTATION}
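
The quantity described above, the median difference between Ribo-seq and RNA-seq phase scores over protein-coding ORFs, can be computed by hand from two detect-orfs outputs generated with --report_all. The sketch below is a rough illustration, not the learn-cutoff implementation: the function names are hypothetical, the files are assumed to be tab-separated with the ORF_ID, phase_score and transcript_type columns shown elsewhere in this document, and how ribotricer combines several BAM files is not covered.

```python
from statistics import median


def load_phase_scores(path, protein_coding_only=True):
    """Map ORF_ID to phase_score from a tab-separated *_translating_ORFs.tsv file."""
    scores = {}
    with open(path) as handle:
        header = handle.readline().rstrip("\n").split("\t")
        for line in handle:
            row = dict(zip(header, line.rstrip("\n").split("\t")))
            if protein_coding_only and row["transcript_type"] != "protein_coding":
                continue
            scores[row["ORF_ID"]] = float(row["phase_score"])
    return scores


def median_phase_score_difference(ribo_scores, rna_scores):
    """Median of (Ribo-seq score - RNA-seq score) over ORFs present in both inputs."""
    shared = ribo_scores.keys() & rna_scores.keys()
    return median(ribo_scores[orf_id] - rna_scores[orf_id] for orf_id in shared)


# Hypothetical scores keyed by made-up ORF identifiers.
ribo = {"orf_a": 0.6, "orf_b": 0.5, "orf_c": 0.9}
rna = {"orf_a": 0.1, "orf_b": 0.2, "orf_d": 0.3}
print(median_phase_score_difference(ribo, rna))
```

If the two inputs share no ORFs, `statistics.median` raises `StatisticsError`, which is a reasonable signal that the two runs did not use the same index.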

Visualizing ribotricer output

Ribotricer generates a de-noised profile of read counts for each ORF. We can visualize the read distribution for any ORF. For an example, see this notebook.
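
Before plotting, a quick numeric check of a profile can be useful. The following sketch sums the read counts falling in each of the three frames. It assumes that the profile column holds one count per nucleotide starting at the first base of the ORF; the function name `framewise_counts` and the example profile are hypothetical.

```python
def framewise_counts(profile):
    """Return total read counts for frame 1, frame 2 and frame 3 of an ORF profile."""
    totals = [0, 0, 0]
    for position, count in enumerate(profile):
        totals[position % 3] += count
    return totals


# Hypothetical profile covering three codons.
print(framewise_counts([5, 1, 0, 4, 0, 1, 6, 0, 0]))
```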

Contacts and bug reports

https://github.com/smithlabcode/ribotricer/issues

If you found a bug or mistake in this project, we would like to know about it. Before you send us the bug report though, please check the following:

Are you using the latest version? The bug you found may already have been fixed.
Check that your input is in the correct format and you have selected the correct options.
Please reduce your input to the smallest possible size that still produces the bug; we will need your input data to reproduce the problem, and the smaller you can make it, the easier it will be.

LICENSE

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

ribotricer's Issues

conda install failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - python 3.7*
  - ribotricer -> python >=3.6,<3.7.0a0 -> readline 6.2
  - ribotricer -> python >=3.6,<3.7.0a0 -> sqlite 3.13.*
  - ribotricer -> python >=3.6,<3.7.0a0 -> tk 8.5.18
Use "conda info <package>" to see the dependencies for each package.

Could not solve for environment specs: click-help-colors >=0.3 , which does not exist (perhaps a missing channel)

I get the following when attempting the conda install instructions:

conda create -n ribotricer_env -c bioconda ribotricer
Channels:
 - bioconda
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides click-help-colors >=0.3 needed by ribotricer-1.0.2-py36_0

Could not solve for environment specs
The following package could not be installed
└─ ribotricer is not installable because it requires
   └─ click-help-colors >=0.3 , which does not exist (perhaps a missing channel).

Need suggestions regarding Ribo-Seq data analysis using Ribotricer

Hi,

I am using Ribotricer to analyze the Human Ribo-Seq data. I have the following queries-

Is Ribotricer suitable for Harringtonine treated Ribo-Seq data?
I want to analyze Ribo-Seq datasets using a custom-defined ORF database. I reformatted my database similar to the Ribotricer reference database format.

The details of my database are as follows-

Biotypes-3'UTR, 5'UTR, annotated and novel
Number of ORFs (excluding annotated ORFs) 463,198
Number of annotated ORFs-19,402
Number of transcripts- 219,286

I was able to execute the tool successfully and generate results. I was wondering whether a custom-generated ORF database will affect my analysis results.

I am looking forward to hearing from you.

Thank you

Kind regards,
Hitesh

Translation initiation analysis

Hi,

I have Ribo-seq data generated from treatments with Cycloheximide, Harringtonine and a no-drug sample.

What do you think is the best strategy to use these data with Ribotricer?

I'm thinking of using only the Chx data (110M reads, post QC, post small RNA filtering) to detect ORFs. To determine the TIS for each detected ORF, I would move outside of Ribotricer and compare the Harr TIS enrichment vs. no drug separately.

To this end, is it possible to generate a merged GTF file containing annotated and novel ORFs?

Thanks!
Colin

Can this program summarize 3-nt periodicity for a specific RF length?

This program is very efficient and ran on my data very smoothly. How can I use the package to summarize 3-nt periodicity for a specific RF length, such as 32 nt or 31 nt, in my data? I know the metagene report gives it, but is there any way to summarize 3-nt periodicity for any specific RF length?

If any report from this pipeline points in that direction, could you give me some indication about it?

For the phase-score cutoff, I use my maize data. Do I need to use only my own Ribo-seq and RNA-seq data to determine it, or do I need to download all the maize Ribo-seq data and run it to get one phase score?

Thank you so much.

About definition of ORFs

Hi, I find Ribotricer really helpful for detecting uORFs/dORFs in Ribo-seq data.

However, I have a question about super_uORFs and uORFs. I have read the manual carefully, but what is the difference between super_uORFs and uORFs? I don't see the difference between the two definitions. Could you give me an example?

Thank you for your kind help!!

ORF_ID naming confusion

Hi saketkc,

I hope all is well.

I want to trace an ORF back to its coordinates on the genome, for which I thought the ORF_ID and chrom would suffice. I thought the ORF_ID was the combination of tx_id, start, end and the ORF length, but I found that the naming does not work that way. May I ask what the naming strategy for ORF_ID is and where I can find the ORF coordinates?
e.g.

ENST00000703342.1_1_130854336_130854428_93| overlap_dORF| translating | 0.8277447| 33| 93| 8| 0.25806452| 1.06451613| ENST00000703342.1_1| protein_coding| ENSG00000153310.22_12| CYRIB| protein_coding| chr8| -|

while the chr8:130854336-130854428 is FAM49B?

Many many thanks,

Tim

What does the ORF_ID represent?

Hello,

I have run Ribotricer version 1.3.3 and created the index using Gencode v35. Below is the result from the test1_translating_ORFs.tsv file:

ORF_ID ORF_type status phase_score read_count length valid_codons valid_codons_ratio read_density transcript_id transcript_type gene_id gene_name gene_type chrom strand start_codon profile
ENST00000420190.6_924432_939291_1074 annotated translating 0.5150787536377128 7 1074 7 0.019553072625698324 0.019553072625698324 ENST00000420190.6 protein_coding ENSG00000187634.11 SAMD11 protein_coding chr1 + ATG [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .......]

What does the ORF_ID represent? Is it composed of the transcript ID, transcript start, transcript end, and ORF length in four parts?
How can I obtain the start and end positions of the ORF in the genome?
Is there any information that can suggest whether the ORF is located in the intergenic region of the genome?

Thanks a lot,

ORF Annotations

Hello,

I was wondering if there is a way to further break down ORF annotations presented by the ribotricer algorithm.

I would like to get a sense for proportion of different non-canonical ORF types such as extended, truncated, polycistronic, or internal ORFs.

Output format question

Hi,

The output format for the ORF ID is: ENST00000327044.7_944672_945119_192. The first part is the transcript ID, while the last the length of the sORF. What are the middle two numbers?

Thanks!
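
For readers who need to split an ORF_ID programmatically, here is a small Python sketch based only on the IDs quoted in the issues above. It splits from the right because a transcript ID can itself contain an underscore (for example ENST00000703342.1_1). Treating the two middle numbers as the start and end positions of the ORF is an assumption that these threads do not confirm, and the function name `parse_orf_id` is hypothetical.

```python
def parse_orf_id(orf_id):
    """Split an ORF_ID into transcript ID, two positions, and ORF length."""
    transcript_id, start, end, length = orf_id.rsplit("_", 3)
    return {
        "transcript_id": transcript_id,
        "start": int(start),   # assumed to be a genomic position
        "end": int(end),       # assumed to be a genomic position
        "length": int(length),
    }


print(parse_orf_id("ENST00000327044.7_944672_945119_192"))
print(parse_orf_id("ENST00000703342.1_1_130854336_130854428_93"))
```

Note that in the first example the span between the two positions is larger than the reported length, so the two positions alone do not give the exon structure of the ORF.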

Can ribotricer infer start codon position?

Queries regarding Ribo-Seq analysis

Hi @saketkc,
I have a few queries regarding Ribo-Seq analysis, which are as follows:-

As most ribosomal footprints are 28-30nt, multi-mapping reads are of major concern. I am curious to know which alignment approach from below is more suitable for typical Ribo-seq data analysis.
i) Genome-based: Aligning the rRNA and tRNA depleted library to the genome using a suitable aligner (i.e. STAR), followed by considering the uniquely mapped reads.
ii) Transcript-based: Aligning the rRNA and tRNA depleted library to the pre-build transcriptome index using a suitable aligner (i.e. STAR). If we consider the uniquely mapped reads, most of the reads aligned on common exons of transcripts isoforms will be filtered.
Please advise in this regard.
How does the RiboTricer assign reads mapped to shared exons of two different transcripts isoforms?
How does the Ribotricer deal with the reads mapped at exon junctions?

Thank you in advance for your valuable suggestions.

Kind regards,
Hitesh

I am confused about the profile column in translating_ORFs.tsv

Hi, thank you very much for providing this software. I am confused about the profile column in translating_ORFs.tsv.

Why are most of the values in the 'profile' column 0? Are the few non-zero positions just start codons? If not, since a read is around 28 nt long, it should cover about 9 codons, and all 9 codons in the profile should have a value. Is there a problem with my understanding?

Allow input of already p-shifted reads (bigwig)

Is your feature request related to a problem? Please describe.
This tool cannot take already P-shifted data as input, which makes it less useful for the community,
since large bigwig files are sometimes used whose contents would not be practical to store in a BAM file.

Describe the solution you'd like
optional input for bigwig (ORFquant supports this with very little tinkering already)

Describe alternatives you've considered
One alternative is either to use another tool,
or I can also just hack a solution by creating a fork that skips to the step
where you have the alignments as p sites

Additional context
For very large Ribo-seq files (What would have been a > 1 TB bam file etc)
With bigwig, it can be loaded basically instantly.

Ribotricer Output

Hello,
This is my first time using ribotricer to interpret the ribosome sequencing dataset.
I was kind of confused by what the proportion means in the metagene profile graph. Is it the number of reads that are 29 nt and 30 nt long divided by the total number of reads in the BAM file that aligned to the translating ORFs?

Also, I was only able to find metagene profiles for reads of these two lengths (29 nt and 30 nt). In the RibORF characterization, I was able to notice some level of periodicity in reads of other lengths as well. I was wondering what might have happened to reads of those lengths.

This is the bam file summary.

Control over minimum coverage + better error messages

Is your feature request related to a problem? Please describe.

Minimum metagene coverage is essentially hard-coded, since a default value is set here, and no interface to that value is provided when calling the function in places like this.

The result is that no reads are selected, and errors appear like:

WARNING: no periodic read length found... using cutoff 0.0

That doesn't really describe the issue- the problem is that no reads passed the coverage threshold.

This also prevents operation in relatively low read counts, which might make sense scientifically, but makes the module hard to use in test scenarios (where scientific validity is not required).

Describe the solution you'd like
Would you be amenable to a detect-orfs CLI parameter which controls this value? Happy to make a PR.

There should also be an error triggered when all reads are removed in the coverage filter, to make it clear that periodicity is not the problem.

Additional context
Attempting to include ribotricer in the nf-core riboseq workflow.

Translation of ORFs

Hi,

I used ribotricer to detect a number of ORFs. The next step of my research is to translate these ORFs into peptides, so I was wondering whether there is a tool you would recommend for translation based on ribotricer's output format.

Thank you for your help!

Best,
Yue
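
Once the spliced nucleotide sequence of an ORF is available (ribotricer's output as shown in this document does not include it, so it has to be extracted separately), translating it needs only the standard genetic code. The sketch below is a minimal stand-alone example rather than a recommendation of a specific tool; the function name `translate` and the use of 'X' for codons containing ambiguous bases are choices made for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf_sequence):
    """Translate a spliced ORF sequence, stopping at the first stop codon."""
    seq = orf_sequence.upper().replace("U", "T")
    peptide = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("ATGGCCTAA"))
```

For ORFs that start at a non-ATG codon, this sketch still reports the literal amino acid of that codon.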

Option to include multimapping reads?

If I understand correctly, the program's default setting for handling multi-mapped reads is to disregard them entirely and only calculate translation behavior from single-mapped reads.
Is there a setting that changes this?

Unable to run learn-cutoff function

ribotricer version:
Python version:
Operating System:

Description

I have tried running the learn-cutoff function to determine the phase score I should be using.

Ribotricer is being run through Conda and I am using the most up-to-date version (updated 10.01.23)

What I Did

Command ran:
ribotricer learn-cutoff --ribo_bams /pathtoriboBAM/Ribosome_Footprint_norna_match.sorted.bam \ --rna_bams /pathtornabam/Ribosome_Footprint_mRNA_norna_match.sorted.bam \ --prefix ribo_rna_prefix \ --ribotricer_index /pathtocandidateORFs/RiboTricer_step1_candidate_orfs.tsv

If there was a crash, please include the traceback here.

Got unexpected extra arguments ( --ribotricer_index /path/RiboTricer_step1_candidate_orfs.tsv  --rna_bams /path/Ribosome_Footprint_mRNA_norna_match.sorted.bam  --prefix ribo_rna_prefix)


I can't find what this unexpected argument is. I have tried adding lots of other options, but it still results in the same error.

Thanks so much for your help!

Ribotricer Output

Hello,

I am currently trying to identify ORFs from two ribosome sequencing samples. It is a sort of test dataset to check whether the sequencing protocol actually works (not many reads, about half a million). The protocol is based on the paper "Transcriptome-wide measurement of translation by ribosome profiling" by Nicholas J McGlincy.

In the analysis workflow, I perform adapter trimming, align to rRNA with Bowtie2, output the non-aligned reads, and then align them to a STAR transcriptome index.

Once I have those reads, I pass the BAM files directly to Ribotricer but obtain no metagenes or translating ORFs (with the default cutoff for human).

I am currently not sure whether the cause is incomplete adapter trimming or the use of the STAR aligner (which might not be optimal for short reads). Could you give me some insight into this output?

STAR Output

                             Started job on |	Jul 13 13:07:09
                         Started mapping on |	Jul 13 13:10:36
                                Finished on |	Jul 13 13:11:08
   Mapping speed, Million of reads per hour |	35.59

                      Number of input reads |	316397
                  Average input read length |	37
                                UNIQUE READS:
               Uniquely mapped reads number |	57426
                    Uniquely mapped reads % |	18.15%
                      Average mapped length |	30.13
                   Number of splices: Total |	10824
        Number of splices: Annotated (sjdb) |	10824
                   Number of splices: GT/AG |	7261
                   Number of splices: GC/AG |	3550
                   Number of splices: AT/AC |	11
           Number of splices: Non-canonical |	2
                  Mismatch rate per base, % |	0.47%
                     Deletion rate per base |	0.00%
                    Deletion average length |	1.69
                    Insertion rate per base |	0.00%
                   Insertion average length |	1.00

Bowtie2 Alignment Output to rRNA (would expect about 30% alignment to rRNA)

526556 reads; of these:
526556 (100.00%) were unpaired; of these:
316397 (60.09%) aligned 0 times
59203 (11.24%) aligned exactly 1 time
150956 (28.67%) aligned >1 times
39.91% overall alignment rate
841821 reads; of these:
841821 (100.00%) were unpaired; of these:
456254 (54.20%) aligned 0 times
112594 (13.38%) aligned exactly 1 time
272973 (32.43%) aligned >1 times
45.80% overall alignment rate

For Adapter Trimming
fastx_clipper -i input -a AGATCGGAAGAGCAC (constant linker sequence) -l 20 -Q33 -c -n -v -o output
cutadapt --report=minimal -u 2 -m 16 -O 8 -a PEA1=NNNNNATCGT -o output input

[BUG] Weird results when using profile from Ribotricer output ?

ribotricer version: 1.3.2
Python version: 3.7.16
Operating System: CentOS7 (Linux)

Description

I wanted to generate a plot of the reads assigned to each frame for several of my ORF candidates, using the profile column in the Ribotricer output file.

For example I tried with this ORF candidate, found on the reverse strand:

What I Did

I used the notebook example you kindly provide: https://github.com/smithlabcode/ribotricer/blob/master/notebooks/Plotting_ribotricer_profile.ipynb

import pandas as pd

profile = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 6, 3, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0]
# plot_framewise_counts is defined in the notebook linked above
plot_framewise_counts(pd.Series(profile, index=range(1, len(profile) + 1)))

I get this plot:

This ORF is predicted as translating (phase_score > human_threshold), but I do not understand how that is possible, since Frame 3 is predominant over Frame 1 at nearly all positions.

I am also wondering whether the profile should be read from right to left when the ORF is on the reverse strand?

Thanks for your help and ribotricer !
Best,
Paul

About ORF nucleotide sequence

Hi, Sir
Thanks for providing excellent tools, but there is a problem that has been bothering me.
The ORF information contained in the results of ribotricer includes the start and end positions of the ORF on the chromosome (if I understand it correctly). The value obtained by subtracting the start position from the end position is much larger than the ORF length; I guess this is because the intron lengths are excluded from the ORF length.
I want to get the ORF nucleotide sequence from the result file generated by ribotricer, but I find this very difficult. I thought of creating a BED file and then using getfasta to get the sequence, but this did not remove the introns. Sir, can you recommend a suitable tool or function that can help me get a specific ORF sequence?
Thanks,
LeeLee

Make sure the metagene plots from the start and stop codons use the same color for the same frame

Otherwise it's confusing

Bed format output needed

Is there an option to output the results in BED format, or a script to convert the existing results to BED format?

Unable to reproduce results from published studies

Hi @saketkc,

Thank you for your prompt reply to my previous issue and for clarifying the doubts.
I am using Ribotricer for my analysis and it is performing well.
I want to assess how accurately it identifies the ORFs detected by published studies. For this, I downloaded the controlled-access data from Heesch et al., Cell, 2019 (https://pubmed.ncbi.nlm.nih.gov/31155234/), in which the authors identified 209 novel ORFs from ribosome profiling data generated for 80 human heart tissues using the RiboTaper approach. I reformatted these ORFs according to the Ribotricer database format (file attached) and carried out the Ribo-seq analysis. Unfortunately, I was unable to detect any of these candidate ORFs.

Here is the command for your reference-
ribotricer detect-orfs --bam hs_lv_001rawsortedByCoord.out.bam --ribotricer_index RiboTricer_customDB.txt --prefix --hs_lv_001 phase_score_cutoff 0.44

Could you please provide a possible explanation for this?

I appreciate your kind help.

Thank you

Best regards,
Hitesh

RiboTricer_customDB_07072021_posControl.txt

Can ribotricer infer translation start position?

Continued from #45

Hi,

I mean that there is some hint of where translation initiated even in normal Ribo-seq without a special inhibitor.
For example, one can tell where translation started in the following situation, because there are no reads, or no periodic reads, before the second start codon:

[BUG]

ribotricer version: 1.3.3
Python version: 3.10.12
Operating System: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)

Description

I have two samples in which to detect ORFs. One finished successfully, but the other produces the following bug.

What I Did


ribotricer detect-orfs --bam ${file} --ribotricer_index ${ribotricer_index_tsv} --prefix ${file/bam/} --report_all --phase_score_cutoff 0
Sep 14 10:25:03 ..... started ribotricer detect-orfs
Sep 14 10:25:03 ... started parsing ribotricer index file
Sep 14 10:25:07 ... started inferring experimental design                                                                               
Sep 14 10:25:07 ... started reading bam file
Sep 14 10:25:10 ... started plotting read length distribution                                                                           
Sep 14 10:25:10 ... started calculating metagene profiles. This may take a long time...
                                    
Sep 14 10:25:10 ... started plotting metagene profiles
Sep 14 10:25:10 ... started inferring P-site offsets
WARNING: no periodic read length found... using cutoff 0.0

The difference between the two files is not significant, but the successful one has more reads than the other.

Is this error caused by the read count?

[BUG] specifying unstranded data ('no') leads to error

ribotricer version: 1.3.3
Operating System: Linux

Description

Just experimenting with --stranded

What I Did

ribotricer detect-orfs \
    --bam SRX11780888_chr20.bam \
    --ribotricer_index homo_sapiens_chr20_candidate_orfs.tsv \
    --stranded no \
    --prefix test \
    --phase_score_cutoff 0.05

Error:

Mar 14 10:56:30 ..... started ribotricer detect-orfs
Mar 14 10:56:30 ... started parsing ribotricer index file
Mar 14 10:56:30 ... started reading bam file                                                                                                                                               
[W::hts_idx_load3] The index file is older than the data file: SRX11780888_chr20.bam.bai
  0%|                                                                                                                                                        | 0/566761 [00:00<?, ?reads/s][W::hts_idx_load3] The index file is older than the data file: SRX11780888_chr20.bam.bai
no
Traceback (most recent call last):                                                                                                                                                         
  File "/path/to/mambaforge/envs/ribotricer/bin/ribotricer", line 10, in <module>
    sys.exit(cli())
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/ribotricer/cli.py", line 244, in detect_orfs_cmd
    detect_orfs(
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/ribotricer/detect_orfs.py", line 400, in detect_orfs
    alignments, read_length_counts = split_bam(bam, protocol, prefix, read_lengths)
  File "/path/to/mambaforge/envs/ribotricer/lib/python3.10/site-packages/ribotricer/bam.py", line 122, in split_bam
    alignments[length][strand][(chrom, pos + 1)] += 1
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Without looking too hard it seems that pos does not get defined for unstranded data here.

Input files:

test_files.zip

RPF counts to TPM conversion

Hi Sanket,
I want to generate TPM counts out of raw RPFs outputted from RiboTricer. I am using following link to convert raw counts into TPM.

https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
Steps followed:

Divided the read counts by the length of each transcript in kilobases.
Summed up all the RPK values in a sample and this number was divided by 1,000,000.
Divided the RPK values by the “per million” scaling factor.

I was wondering if I should divide the raw RPF count by ORF length instead of transcript length in step 1. What would be the right approach? Are there any standard packages to convert raw counts into TPM values?

Any help would be greatly appreciated.
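
The three steps listed above translate directly into a few lines of Python. The sketch below takes whichever feature lengths are supplied, so it does not settle the question of ORF length versus transcript length; the function name `counts_to_tpm` and the example numbers are hypothetical.

```python
def counts_to_tpm(counts, lengths_nt):
    """Convert raw counts to TPM given per-feature lengths in nucleotides."""
    # Step 1: reads per kilobase (RPK) for each feature.
    rpk = [count / (length / 1000) for count, length in zip(counts, lengths_nt)]
    # Step 2: "per million" scaling factor for the sample.
    per_million = sum(rpk) / 1_000_000
    # Step 3: scale each RPK value.
    return [value / per_million for value in rpk]


# Hypothetical counts for two features of 1,000 nt and 2,000 nt.
print(counts_to_tpm([10, 20], [1000, 2000]))
```

By construction the returned values sum to one million for each sample; a sample with no reads at all would raise `ZeroDivisionError` and needs to be handled separately.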

Originally posted by @HiteshKore in #91 (comment)

[BUG]

Line 23 in ribotricer/plotting.py needs to come before matplotlib.pyplot is imported.

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

Internal ORFs

I was wondering why Ribotricer does not have an ORF category called "internal". I was comparing results from ribotricer and RiboCode, and I identified an internal ORF in RiboCode that does not seem to be in the ribotricer index at all. Both are based on the same human reference, GRCh38 from Ensembl version 104.

I looked into the ribotricer code, and I can see that there is an ORF type called "internal", but you do not append it (prepare_orfs.py, lines 258 and 340). Can you help explain the reasoning for this? Perhaps I am just misunderstanding the code.

I have an example of an internal ORF in transcript ENST00000675536 that is part of RiboCode's index, but not Ribotricer's (in relation to that transcript). The amino acid sequence it translates to is found in other transcripts, so the issue is not that the ORF cannot be identified as translating, but that the annotation for this transcript is missing.
I don't know if it is a bug or intended, but I would really like to understand the reasoning. It seems that many ORFs are found in that transcript, so why not that one?
In ribotricer it has the coordinates 89646069_89649410_396 (ENSG00000131165).

Thank you in advance!
I look forward to your answer.
Kind regards,
Anne

Prepare ORFs step

I had a question regarding the preparation of candidate ORFs from annotated transcript regions. I was wondering why only 3-frame translation is conducted to pull out candidate ORFs. If I am trying to generate a database of ORFs from my transcript targets of interest, does it make sense to do 6-frame translation to get all candidate ORFs?
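To make the distinction concrete, here is a small sketch of three-frame ORF scanning on a transcript sequence, with the reverse complement added to obtain the other three frames. It is only an illustration and not ribotricer's implementation: `find_orfs` and `reverse_complement` are hypothetical helper names, and the start and stop codon sets and the minimum length are assumptions. Because an annotated transcript already has a defined strand, the three extra frames correspond to the antisense of that transcript.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=("ATG",), min_len=6):
    """Return (start, end, frame) for ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # start codons seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in start_codons:
                open_starts.append(i)
            elif codon in STOP_CODONS:
                for s in open_starts:
                    if i + 3 - s >= min_len:
                        orfs.append((s, i + 3, frame))
                open_starts = []
    return orfs


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


# Hypothetical toy transcript, already oriented 5' to 3'.
transcript = "CCATGAAATGGTAAGG"
forward_orfs = find_orfs(transcript)                        # sense frames
antisense_orfs = find_orfs(reverse_complement(transcript))  # other three frames
```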

Empty result in first step (Preparing candidate ORFs)

Hi,

ribotricer version:
Python version:
Operating System:

Description

I'm trying to get the short ORFs of the whole human genome, so I downloaded the GTF file from the Ensembl database, along with the FASTA files of cDNA and ncRNA sequences. ribotricer reported warnings such as "Chromosome GL000220.1 does not appear in the fasta", and in the end I got an empty result file.

What I Did

ribotricer prepare-orfs --gtf Homo_sapiens.GRCh38.110.gtf --fasta Homo_sapiens.GRCh38.ncrna.fa --prefix ncrna_small_peptide --start_codons ATG,CTG,GTG,TTG

Output

Thank you so much for your help!

Best,
Yue

ORF Profile

Hello,

I have been using Ribotricer in my project for quite a while now, and I couldn't fully understand the part where ribotricer is agnostic to the frame. A lot of statistical approaches to ORF prediction with Ribo-seq data look at reads that align to Frame 1. In the case below, it seems that reads covering this ORF belong to Frame 3.

Considering that Frame 1 reads align at the 5' end with the start codon, wouldn't Frame 3 reads be considered out of frame with the current ORF (or in frame with some ORF upstream)? Is 3-nt periodicity, or the high-low-low pattern, sufficient to classify translation?
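To illustrate what "reads belong to Frame 3" means here, the sketch below tallies P-site positions by their frame relative to an ORF start. It is a toy example and not ribotricer's scoring method: `frame_counts` is a hypothetical helper, the positions are made up, and frames are 0-based in the code (so index 2 corresponds to "Frame 3").

```python
def frame_counts(psite_positions, orf_start):
    """Tally P-site positions by frame (0, 1, 2) relative to orf_start."""
    counts = [0, 0, 0]
    for pos in psite_positions:
        counts[(pos - orf_start) % 3] += 1
    return counts


# Hypothetical P-site positions: most fall two nucleotides past a codon start.
positions = [102, 105, 108, 111, 114, 100, 103]
counts = frame_counts(positions, orf_start=100)
dominant_frame = counts.index(max(counts))
```

A profile like this is still strongly periodic, since one frame dominates at every codon; it is simply shifted relative to the annotated start.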

Ribotricer Output

Hello,

I am trying to merge two lists of ORFs predicted by the RibORF and Ribotricer algorithms. I am aware that RibORF predicts ORFs at the transcript level while Ribotricer predicts ORFs at the exon level. I would like to create a BED file that encompasses predictions from both outputs. Is there a way to merge two sets of ORFs predicted at different levels (transcript vs. exon)?

Is there a feature that would let me transform the exon-level predictions by RibORF into transcript-level predictions? Also, does it make sense to use an offset-corrected BAM file (corrected by RibORF) as input to ribotricer?

Unstranded Library

Hello,

I am aware that Ribotricer infers the strandedness of the protocol and determines whether it is forward or reverse stranded. I was wondering what happens if the protocol is unstranded. How would Ribotricer handle this case? Thank you!

Question on prepare-orfs "--longest" flag

I understand the stated behavior of the "--longest" flag to choose the 5' most, in-frame start codon. What if this flag is not set? Are all ORFs returned?

About the length of reads

Hi saketkc,
I have a question: when using ribotricer to predict ORFs, what range of read lengths does ribotricer consider?
Thanks,
LeeLee

Trouble installing with bioconda [BUG]

ribotricer version:
Python version:
Operating System: Ubuntu

Description

When I try to install ribotricer through conda it is failing. I also tried with pip and it failed as well. I tried this on both a server (CentOS) and my personal computer (Ubuntu) with the same result.

What I Did

conda create -n ribotricer -c bioconda ribotricer
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: / 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                          

UnsatisfiableError:

Peptide search from ORFs.

This is a follow up question to issue #156.

I have understood the aspect of frames being cyclic in the read profile visualization.

Currently, I am aiming to do a peptide search from these ORFs. In the case where we see Frame 3 periodicity for a specific ORF, can we still use the translated peptide sequences from the "ribotricer orfs-seq" output? Since ribotricer does not identify the TIS, this implies that translation did not start at the start codon for this ORF, but possibly upstream. Should we look at the peptide sequence translated in Frame 3 for that ORF and not Frame 1? My understanding might be flawed, so correct me if I am wrong with this assumption.

Error while processing the Ribo-Seq data using transcriptome based index

Hi @saketkc,

I tried to analyze the BAM file generated by mapping the ribosome footprints to a transcriptome reference. However, I am getting an error while running Ribotricer.

Steps followed-

Transcriptome index generation using STAR:

STAR --runThreadN 15 --runMode genomeGenerate --genomeDir ./gencode_transcripts/ --genomeFastaFiles gencode.v33.transcripts.fa --limitGenomeGenerateRAM 357335127648 --genomeSAindexNbases 13

Ribosomal footprints were aligned to transcriptome reference:

STAR --runThreadN 8 --runMode alignReads --genomeDir ./STAR_merged_transcriptome_index/gencode_transcripts/ --readFilesIn SRR1585486.fastq --alignEndsType EndToEnd --outFileNamePrefix SRR1585486 --outFilterMismatchNmax 2 --outSAMtype BAM SortedByCoordinate --outFilterType BySJout --twopassMode Basic

Ribotricer was used to predict actively translating transcripts:

ribotricer detect-orfs --bam SRR1585486.bam --ribotricer_index ./Riboticer/RiboTricer_customDB_02262021.txt --prefix SRR1585486-RT --phase_score_cutoff 0.44 --read_lengths 27,28,29,30,31,32

I am getting the following error-

Oct 28 17:18:18 ..... started ribotricer detect-orfs
Oct 28 17:18:18 ... started parsing ribotricer index file
Oct 28 17:18:20 ... started inferring experimental design
[E::idx_find_and_load] Could not retrieve index file for 'SRR1585486_riboAligned.sortedByCoord.out.bam'
Oct 28 17:19:38 ... started reading bam file
[E::idx_find_and_load] Could not retrieve index file for 'SRR1585486_riboAligned.sortedByCoord.out.bam'
0%| | 0/68162965 [00:00<?, ?reads/s][E::idx_find_and_load] Could not retrieve index file for 'SRR1585486_riboAligned.sortedByCoord.out.bam'
Oct 28 17:22:21 ... started plotting read length distribution
Oct 28 17:22:22 ... started calculating metagene profiles. This may take a long time...

Oct 28 17:25:31 ... started plotting metagene profiles
Oct 28 17:25:31 ... started inferring P-site offsets
Traceback (most recent call last):
  File "/software/ribotricer/Python-3.7.7-venv-20210111/bin/ribotricer", line 11, in <module>
    load_entry_point('ribotricer==1.3.2', 'console_scripts', 'ribotricer')()
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/click-7.1.2-py3.7.egg/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/click-7.1.2-py3.7.egg/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/click-7.1.2-py3.7.egg/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/click-7.1.2-py3.7.egg/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/click-7.1.2-py3.7.egg/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/ribotricer-1.3.2-py3.7.egg/ribotricer/cli.py", line 255, in detect_orfs_cmd
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/ribotricer-1.3.2-py3.7.egg/ribotricer/detect_orfs.py", line 443, in detect_orfs
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/ribotricer-1.3.2-py3.7.egg/ribotricer/metagene.py", line 281, in align_metagenes
  File "<array_function internals>", line 6, in correlate
  File "/software/ribotricer/Python-3.7.7-venv-20210111/lib/python3.7/site-packages/numpy-1.19.5-py3.7-linux-x86_64.egg/numpy/core/numeric.py", line 713, in correlate
    return multiarray.correlate2(a, v, mode)
ValueError: first array argument cannot be empty

I appreciate your help in resolving this issue.

Thank you

Kind regards,
Hitesh

p-site offsets and metagene profiles

Hi Saket,

Thanks for developing this great tool! I am surprised to get these results in the psite_offsets.txt output file:

relative lag to base: 33
	lag of 28: 0
	lag of 22: 0
	lag of 33: 0
	lag of 27: 0
	lag of 24: 0
	lag of 32: 0
	lag of 34: 0
	lag of 31: 0
	lag of 30: -1
	lag of 25: 0
	lag of 26: 0
	lag of 29: 0
	lag of 35: 0
	lag of 36: 0
	lag of 23: 0
	lag of 21: 0
	lag of 20: 0

How should I interpret that?
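One possible reading, offered as an assumption rather than a description of ribotricer's code: each "lag" is the shift that best aligns the metagene profile of a given read length with the profile of the base read length (33 here), so a lag of 0 would mean the two profiles already line up. The sketch below shows that idea with a plain cross-correlation over a small range of shifts. The function `best_lag`, the profiles, and the sign convention are all hypothetical.

```python
def best_lag(base, other, max_lag=5):
    """Return the shift of `other` that best matches `base`.

    A positive lag means `other` must be shifted right (towards higher
    positions) to line up with `base`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(other):
            j = i + lag
            if 0 <= j < len(base):
                score += value * base[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# Hypothetical metagene profiles: a strong peak followed by a periodic signal.
base_profile = [0, 0, 0, 10, 2, 1, 5, 1, 1, 5, 1, 1]
same_phase = list(base_profile)
shifted = base_profile[1:] + [0]  # same pattern moved one position to the left

lag_same = best_lag(base_profile, same_phase)
lag_shifted = best_lag(base_profile, shifted)
```

Under this reading, the output above would say that every read length except 30 is already aligned with the 33-nt profile, and that 30-nt reads are off by one position.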

And I find my metagene profiles a bit strange. Especially around the stop codons. See this example:

When using RibORF with the same Ribo-seq BAM file, I get this type of profile, showing a P-site offset of 13 nt for both the start and stop codons:

Why do you think this is the case and should I then trust the results of my translating_ORFs.tsv output?

Thanks!
Pierre

About the read density in the output file

Hi, saketkc

I would like to confirm the definition of the 'reads density' column in the final output file of Ribotricer. Are the 'reads' here not normalized, similar to ‘Counts’ in RNA-seq?

Thanks,
LeeLee

[BUG]

ribotricer version: 1.3.2
Python version: 3.7
Operating System: linux

Description

I just downloaded and installed ribotricer using conda install ribotricer. My computer recognizes that it is installed and in the correct place when I call which ribotricer. However, trying to get more info or use the ribotricer commands leads to this error. Is it something to do with pysam being installed incorrectly?

What I Did

$ ribotricer
/homes/jimmx/miniconda3/lib/python3.7/site-packages/tqdm/std.py:668: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  from pandas import Panel
Traceback (most recent call last):
  File "/homes/jimmx/miniconda3/bin/ribotricer", line 33, in <module>
    sys.exit(load_entry_point('ribotricer==1.3.2', 'console_scripts', 'ribotricer')())
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2861, in load_entry_point
    return ep.load()
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2461, in load
    return self.resolve()
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2467, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/ribotricer/cli.py", line 31, in <module>
    from .detect_orfs import detect_orfs
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/ribotricer/detect_orfs.py", line 22, in <module>
    from .infer_protocol import infer_protocol
  File "/homes/jimmx/miniconda3/lib/python3.7/site-packages/ribotricer/infer_protocol.py", line 19, in <module>
    import pysam
  File "/homes/jimmx/.local/lib/python3.7/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
  File "pysam/libchtslib.pyx", line 1, in init pysam.libchtslib
ImportError: libchtslib.cpython-37m-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory

pip install failed

Last login: Sun May  5 21:57:38 on ttys000
Leeyukuang@~$ source activate riboseq
(riboseq) Leeyukuang@~$ conda install ribotricer
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - ribotricer

Current channels:

  - https://repo.continuum.io/pkgs/main/osx-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/osx-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/osx-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/osx-64
  - https://repo.continuum.io/pkgs/pro/noarch


(riboseq) Leeyukuang@~$ conda install -c bioconda ribotricer
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - ribotricer
  - click-help-colors[version='>=0.3']

Current channels:

  - https://conda.anaconda.org/bioconda/osx-64
  - https://conda.anaconda.org/bioconda/noarch
  - https://repo.continuum.io/pkgs/main/osx-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/osx-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/osx-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/osx-64
  - https://repo.continuum.io/pkgs/pro/noarch


(riboseq) Leeyukuang@~$ gc
hpc-cmb@[~]$ conda create -n ribotricer python=3
Fetching package metadata .................
Solving package specifications: .

Package plan for installation in environment /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer:

The following NEW packages will be INSTALLED:

    bzip2:           1.0.6-h14c3975_1002  conda-forge
    ca-certificates: 2019.3.9-hecc5488_0  conda-forge
    certifi:         2019.3.9-py37_0      conda-forge
    libffi:          3.2.1-he1b5a44_1006  conda-forge
    libgcc-ng:       8.2.0-hdf63c60_1                
    libstdcxx-ng:    8.2.0-hdf63c60_1                
    ncurses:         6.1-hf484d3e_1002    conda-forge
    openssl:         1.1.1b-h14c3975_1    conda-forge
    pip:             19.1-py37_0          conda-forge
    python:          3.7.3-h5b0a415_0     conda-forge
    readline:        7.0-hf8c457e_1001    conda-forge
    setuptools:      41.0.1-py37_0        conda-forge
    sqlite:          3.26.0-h67949de_1001 conda-forge
    tk:              8.6.9-h84994c4_1001  conda-forge
    wheel:           0.33.1-py37_0        conda-forge
    xz:              5.2.4-h14c3975_1001  conda-forge
    zlib:            1.2.11-h14c3975_1004 conda-forge

Proceed ([y]/n)? y

ca-certificate 100% |#########################################################################| Time: 0:00:00   3.23 MB/s
libffi-3.2.1-h 100% |#########################################################################| Time: 0:00:00   4.14 MB/s
openssl-1.1.1b 100% |#########################################################################| Time: 0:00:00  12.52 MB/s
tk-8.6.9-h8499 100% |#########################################################################| Time: 0:00:00  13.60 MB/s
sqlite-3.26.0- 100% |#########################################################################| Time: 0:00:00  11.10 MB/s
python-3.7.3-h 100% |#########################################################################| Time: 0:00:01  24.24 MB/s
certifi-2019.3 100% |#########################################################################| Time: 0:00:00   2.28 MB/s
setuptools-41. 100% |#########################################################################| Time: 0:00:00   3.50 MB/s
pip-19.1-py37_ 100% |#########################################################################| Time: 0:00:00   9.62 MB/s
#
# To activate this environment, use:
# > source activate ribotricer
#
# To deactivate an active environment, use:
# > source deactivate
#

hpc-cmb@[~]$ source activate ribotricer
(ribotricer) hpc-cmb@[~]$ conda install -c bioconda ribotricer
Fetching package metadata .................
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:
  - python 3.7*
  - ribotricer -> python >=3.6,<3.7.0a0 -> readline 6.2
  - ribotricer -> python >=3.6,<3.7.0a0 -> sqlite 3.13.*
  - ribotricer -> python >=3.6,<3.7.0a0 -> tk 8.5.18
Use "conda info <package>" to see the dependencies for each package.

(ribotricer) hpc-cmb@[~]$ which python
/home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/bin/python
(ribotricer) hpc-cmb@[~]$ python
Python 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
(ribotricer) hpc-cmb@[~]$ conda list | more
# packages in environment at /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer:
#
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
certifi                   2019.3.9                 py37_0    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 8.2.0                hdf63c60_1  
libstdcxx-ng              8.2.0                hdf63c60_1  
ncurses                   6.1               hf484d3e_1002    conda-forge
openssl                   1.1.1b               h14c3975_1    conda-forge
pip                       19.1                     py37_0    conda-forge
python                    3.7.3                h5b0a415_0    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
setuptools                41.0.1                   py37_0    conda-forge
sqlite                    3.26.0            h67949de_1001    conda-forge
tk                        8.6.9             h84994c4_1001    conda-forge
wheel                     0.33.1                   py37_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zlib                      1.2.11            h14c3975_1004    conda-forge
(ribotricer) hpc-cmb@[~]$ pip install ribotricer
Collecting ribotricer
  Downloading https://files.pythonhosted.org/packages/ad/a3/8c874674612a06a809e9e22826986642d02212d4434b1786ad4a20fd67e3/ribotricer-1.0.2-py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 1.6MB/s 
Collecting matplotlib>=2.1.0 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/83/2a/e47bbd9396af32376863a426baed62d9bf3091f81defd1fe81c5f33b11a3/matplotlib-3.0.3-cp37-cp37m-manylinux1_x86_64.whl (13.0MB)
     |████████████████████████████████| 13.0MB 14.0MB/s 
Collecting click>=6.0 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
     |████████████████████████████████| 81kB 3.0MB/s 
Collecting click-help-colors>=0.3 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/f8/a2/bbd2b60ba4f82048613b480a373339753539f77e243fae74340ac6410384/click_help_colors-0.5-py3-none-any.whl
Collecting tqdm>=4.23.4 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/6c/4b/c38b5144cf167c4f52288517436ccafefe9dc01b8d1c190e18a6b154cd4a/tqdm-4.31.1-py2.py3-none-any.whl (48kB)
     |████████████████████████████████| 51kB 2.0MB/s 
Collecting quicksect>=0.2.0 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/88/a6/d1e8fadda67eb10d633d32b2b4d6232e853d51aa1f793605afbfcd684377/quicksect-0.2.0.tar.gz (65kB)
     |████████████████████████████████| 71kB 2.8MB/s 
Collecting scipy>=0.19.1 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/3e/7e/5cee36eee5b3194687232f6150a89a38f784883c612db7f4da2ab190980d/scipy-1.2.1-cp37-cp37m-manylinux1_x86_64.whl (24.8MB)
     |████████████████████████████████| 24.8MB 26.0MB/s 
Collecting pyfaidx>=0.5.0 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/75/a5/7e2569527b3849ea28d79b4f70d7cf46a47d36459bc59e0efa4e10e8c8b2/pyfaidx-0.5.5.2.tar.gz
Collecting numpy>=1.11.0 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/bb/76/24e9f32c78e6f6fb26cf2596b428f393bf015b63459468119f282f70a7fd/numpy-1.16.3-cp37-cp37m-manylinux1_x86_64.whl (17.3MB)
     |████████████████████████████████| 17.3MB 20.3MB/s 
Collecting pandas>=0.20.3 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/22/e6/2d47835f91eb010036be207581fa113fb4e3822ec1b4bafb0d3d105fede6/pandas-0.24.2-cp37-cp37m-manylinux1_x86_64.whl (10.1MB)
     |████████████████████████████████| 10.1MB 22.5MB/s 
Collecting pysam>=0.11.2.2 (from ribotricer)
  Downloading https://files.pythonhosted.org/packages/20/1a/4fd27da2d19f7d914f757097605709a6509776b4cb63f42bca63f3531058/pysam-0.15.2-cp37-cp37m-manylinux1_x86_64.whl (9.5MB)
     |████████████████████████████████| 9.5MB 25.0MB/s 
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.1.0->ribotricer)
  Downloading https://files.pythonhosted.org/packages/93/f8/518fb0bb89860eea6ff1b96483fbd9236d5ee991485d0f3eceff1770f654/kiwisolver-1.1.0-cp37-cp37m-manylinux1_x86_64.whl (90kB)
     |████████████████████████████████| 92kB 3.6MB/s 
Collecting python-dateutil>=2.1 (from matplotlib>=2.1.0->ribotricer)
  Downloading https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl (226kB)
     |████████████████████████████████| 235kB 23.5MB/s 
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib>=2.1.0->ribotricer)
  Downloading https://files.pythonhosted.org/packages/dd/d9/3ec19e966301a6e25769976999bd7bbe552016f0d32b577dc9d63d2e0c49/pyparsing-2.4.0-py2.py3-none-any.whl (62kB)
     |████████████████████████████████| 71kB 2.7MB/s 
Collecting cycler>=0.10 (from matplotlib>=2.1.0->ribotricer)
  Using cached https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting six (from pyfaidx>=0.5.0->ribotricer)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools>=0.7 in /panfs/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/lib/python3.7/site-packages (from pyfaidx>=0.5.0->ribotricer) (41.0.1)
Collecting pytz>=2011k (from pandas>=0.20.3->ribotricer)
  Downloading https://files.pythonhosted.org/packages/3d/73/fe30c2daaaa0713420d0382b16fbb761409f532c56bdcc514bf7b6262bb6/pytz-2019.1-py2.py3-none-any.whl (510kB)
     |████████████████████████████████| 512kB 25.7MB/s 
Building wheels for collected packages: quicksect, pyfaidx
  Building wheel for quicksect (setup.py) ... error
  ERROR: Complete output from command /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-0zndlqqc/quicksect/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-8cslk_m9 --python-tag cp37:
  ERROR: running bdist_wheel
  running build
  running build_ext
  skipping 'src/quicksect.c' Cython extension (up-to-date)
  building 'quicksect' extension
  creating build
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/src
  gcc -pthread -B /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/include/python3.7m -c src/quicksect.c -o build/temp.linux-x86_64-3.7/src/quicksect.o
  src/quicksect.c: In function ‘__Pyx_PyCFunction_FastCall’:
  src/quicksect.c:9016:5: error: too many arguments to function ‘(struct PyObject * (*)(struct PyObject *, struct PyObject * const*, Py_ssize_t))meth’
       return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
       ^
  src/quicksect.c: In function ‘__Pyx__ExceptionSave’:
  src/quicksect.c:9402:19: error: ‘PyThreadState’ has no member named ‘exc_type’
       *type = tstate->exc_type;
                     ^
  src/quicksect.c:9403:20: error: ‘PyThreadState’ has no member named ‘exc_value’
       *value = tstate->exc_value;
                      ^
  src/quicksect.c:9404:17: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       *tb = tstate->exc_traceback;
                   ^
  src/quicksect.c: In function ‘__Pyx__ExceptionReset’:
  src/quicksect.c:9411:22: error: ‘PyThreadState’ has no member named ‘exc_type’
       tmp_type = tstate->exc_type;
                        ^
  src/quicksect.c:9412:23: error: ‘PyThreadState’ has no member named ‘exc_value’
       tmp_value = tstate->exc_value;
                         ^
  src/quicksect.c:9413:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tmp_tb = tstate->exc_traceback;
                      ^
  src/quicksect.c:9414:11: error: ‘PyThreadState’ has no member named ‘exc_type’
       tstate->exc_type = type;
             ^
  src/quicksect.c:9415:11: error: ‘PyThreadState’ has no member named ‘exc_value’
       tstate->exc_value = value;
             ^
  src/quicksect.c:9416:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tstate->exc_traceback = tb;
             ^
  src/quicksect.c: In function ‘__Pyx__GetException’:
  src/quicksect.c:9471:22: error: ‘PyThreadState’ has no member named ‘exc_type’
       tmp_type = tstate->exc_type;
                        ^
  src/quicksect.c:9472:23: error: ‘PyThreadState’ has no member named ‘exc_value’
       tmp_value = tstate->exc_value;
                         ^
  src/quicksect.c:9473:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tmp_tb = tstate->exc_traceback;
                      ^
  src/quicksect.c:9474:11: error: ‘PyThreadState’ has no member named ‘exc_type’
       tstate->exc_type = local_type;
             ^
  src/quicksect.c:9475:11: error: ‘PyThreadState’ has no member named ‘exc_value’
       tstate->exc_value = local_value;
             ^
  src/quicksect.c:9476:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tstate->exc_traceback = local_tb;
             ^
  src/quicksect.c: In function ‘__Pyx__ExceptionSwap’:
  src/quicksect.c:10574:22: error: ‘PyThreadState’ has no member named ‘exc_type’
       tmp_type = tstate->exc_type;
                        ^
  src/quicksect.c:10575:23: error: ‘PyThreadState’ has no member named ‘exc_value’
       tmp_value = tstate->exc_value;
                         ^
  src/quicksect.c:10576:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tmp_tb = tstate->exc_traceback;
                      ^
  src/quicksect.c:10577:11: error: ‘PyThreadState’ has no member named ‘exc_type’
       tstate->exc_type = *type;
             ^
  src/quicksect.c:10578:11: error: ‘PyThreadState’ has no member named ‘exc_value’
       tstate->exc_value = *value;
             ^
  src/quicksect.c:10579:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
       tstate->exc_traceback = *tb;
             ^
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for quicksect
  Running setup.py clean for quicksect
  Building wheel for pyfaidx (setup.py) ... done
  Stored in directory: /home/rcf-40/wenzhenl/.cache/pip/wheels/54/a2/b4/e242e58d23b2808e191b214067880faa46cd2341f363886e0b
Successfully built pyfaidx
Failed to build quicksect
Installing collected packages: kiwisolver, numpy, six, python-dateutil, pyparsing, cycler, matplotlib, click, click-help-colors, tqdm, quicksect, scipy, pyfaidx, pytz, pandas, pysam, ribotricer
  Running setup.py install for quicksect ... error
    ERROR: Complete output from command /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-0zndlqqc/quicksect/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46w7y6r5/install-record.txt --single-version-externally-managed --compile:
    ERROR: running install
    running build
    running build_ext
    skipping 'src/quicksect.c' Cython extension (up-to-date)
    building 'quicksect' extension
    creating build
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/src
    gcc -pthread -B /home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/include/python3.7m -c src/quicksect.c -o build/temp.linux-x86_64-3.7/src/quicksect.o
    src/quicksect.c: In function ‘__Pyx_PyCFunction_FastCall’:
    src/quicksect.c:9016:5: error: too many arguments to function ‘(struct PyObject * (*)(struct PyObject *, struct PyObject * const*, Py_ssize_t))meth’
         return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
         ^
    src/quicksect.c: In function ‘__Pyx__ExceptionSave’:
    src/quicksect.c:9402:19: error: ‘PyThreadState’ has no member named ‘exc_type’
         *type = tstate->exc_type;
                       ^
    src/quicksect.c:9403:20: error: ‘PyThreadState’ has no member named ‘exc_value’
         *value = tstate->exc_value;
                        ^
    src/quicksect.c:9404:17: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         *tb = tstate->exc_traceback;
                     ^
    src/quicksect.c: In function ‘__Pyx__ExceptionReset’:
    src/quicksect.c:9411:22: error: ‘PyThreadState’ has no member named ‘exc_type’
         tmp_type = tstate->exc_type;
                          ^
    src/quicksect.c:9412:23: error: ‘PyThreadState’ has no member named ‘exc_value’
         tmp_value = tstate->exc_value;
                           ^
    src/quicksect.c:9413:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tmp_tb = tstate->exc_traceback;
                        ^
    src/quicksect.c:9414:11: error: ‘PyThreadState’ has no member named ‘exc_type’
         tstate->exc_type = type;
               ^
    src/quicksect.c:9415:11: error: ‘PyThreadState’ has no member named ‘exc_value’
         tstate->exc_value = value;
               ^
    src/quicksect.c:9416:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tstate->exc_traceback = tb;
               ^
    src/quicksect.c: In function ‘__Pyx__GetException’:
    src/quicksect.c:9471:22: error: ‘PyThreadState’ has no member named ‘exc_type’
         tmp_type = tstate->exc_type;
                          ^
    src/quicksect.c:9472:23: error: ‘PyThreadState’ has no member named ‘exc_value’
         tmp_value = tstate->exc_value;
                           ^
    src/quicksect.c:9473:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tmp_tb = tstate->exc_traceback;
                        ^
    src/quicksect.c:9474:11: error: ‘PyThreadState’ has no member named ‘exc_type’
         tstate->exc_type = local_type;
               ^
    src/quicksect.c:9475:11: error: ‘PyThreadState’ has no member named ‘exc_value’
         tstate->exc_value = local_value;
               ^
    src/quicksect.c:9476:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tstate->exc_traceback = local_tb;
               ^
    src/quicksect.c: In function ‘__Pyx__ExceptionSwap’:
    src/quicksect.c:10574:22: error: ‘PyThreadState’ has no member named ‘exc_type’
         tmp_type = tstate->exc_type;
                          ^
    src/quicksect.c:10575:23: error: ‘PyThreadState’ has no member named ‘exc_value’
         tmp_value = tstate->exc_value;
                           ^
    src/quicksect.c:10576:20: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tmp_tb = tstate->exc_traceback;
                        ^
    src/quicksect.c:10577:11: error: ‘PyThreadState’ has no member named ‘exc_type’
         tstate->exc_type = *type;
               ^
    src/quicksect.c:10578:11: error: ‘PyThreadState’ has no member named ‘exc_value’
         tstate->exc_value = *value;
               ^
    src/quicksect.c:10579:11: error: ‘PyThreadState’ has no member named ‘exc_traceback’
         tstate->exc_traceback = *tb;
               ^
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command "/home/cmb-panasas2/wenzhenl/miniconda3/envs/ribotricer/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-0zndlqqc/quicksect/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-46w7y6r5/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-0zndlqqc/quicksect/

[BUG]

ribotricer version: 1.3.2
Python version: 3.6
Operating System: Ubuntu 18.04

Description

I installed ribotricer with conda using the installation commands you provided, but when I run "ribotricer --help" I get an error.

What I Did

conda create -n ribotricer_env -c bioconda ribotricer
conda activate ribotricer_env
ribotricer --help

The error message is as follows:

Traceback (most recent call last):
  File "miniconda3/envs/ribotricer_env/bin/ribotricer", line 10, in <module>
    sys.exit(cli())
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 696, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 621, in make_context
    self.parse_args(ctx, args)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 1018, in parse_args
    rest = Command.parse_args(self, ctx, args)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 880, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 1404, in handle_parse_result
    self.callback, ctx, self, value)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 78, in invoke_param_callback
    return callback(ctx, param, value)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 809, in show_help
    echo(ctx.get_help(), color=ctx.color)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 496, in get_help
    return self.command.get_help(self)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click_help_colors/core.py", line 53, in get_help
    self.format_help(ctx, formatter)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 843, in format_help
    self.format_usage(ctx, formatter)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click/core.py", line 782, in format_usage
    formatter.write_usage(ctx.command_path, ' '.join(pieces))
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click_help_colors/core.py", line 23, in write_usage
    colorized_prefix = _colorize(prefix, color=self.headers_color)
  File "/home/zsq/miniconda3/envs/ribotricer_env/lib/python3.6/site-packages/click_help_colors/utils.py", line 14, in _colorize
    return '\033[%dm' % (_ansi_colors[color]) + text + _ansi_reset_all
TypeError: tuple indices must be integers or slices, not str
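
The last frame of the traceback looks up a colour name in `_ansi_colors`, and the TypeError says that object is a tuple. One plausible cause, which is an assumption and not confirmed anywhere in this report, is that the installed click and click-help-colors versions disagree about whether that table is a sequence or a mapping. The sketch below reproduces the failure mode with hypothetical stand-in tables; it does not import either library.

```python
# Hypothetical stand-ins for the colour table indexed in the last traceback frame.
ANSI_COLORS_AS_TUPLE = ("black", "red", "green", "yellow")  # sequence-style table
ANSI_COLORS_AS_DICT = {"black": 30, "red": 31, "green": 32, "yellow": 33}  # mapping-style table


def colorize(text, color, table):
    """Mimic the failing line: look the colour name up in `table`."""
    return "\033[%dm" % table[color] + text + "\033[0m"


def try_colorize(table):
    """Return the colourized text, or the TypeError message if the lookup fails."""
    try:
        return colorize("Usage:", "yellow", table)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Indexing a tuple with a string fails, as in the report.
print(try_colorize(ANSI_COLORS_AS_TUPLE))
# The same lookup succeeds when the table maps names to ANSI codes.
print(repr(try_colorize(ANSI_COLORS_AS_DICT)))
```

If this is indeed the cause, comparing the click and click-help-colors versions reported by "conda list" in the affected environment would be a reasonable first check.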

about 3nt periodicity output

Hi Saket,
Thanks for developing this great tool!

I have a question on 3nt periodicity:

I used ribotricer to find many ORFs in lncRNAs. However, I found that adjusting the parameters changes the number of ORFs detected in lncRNAs.

Therefore, I want to plot the periodicity of some lncRNA ORFs to check whether they are false positives.

I don't know how to do this with ribotricer.

Thank you for your kind help!
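
As a rough illustration of what a periodicity check involves, the sketch below sums read counts by reading frame (position modulo 3) along one ORF. It is a generic calculation, not ribotricer's phase score and not part of its command line; the function name and the example counts are hypothetical.

```python
def frame_fractions(counts):
    """Return the fraction of reads falling in frames 0, 1 and 2.

    `counts` holds one read count per nucleotide, starting at the first
    base of the start codon.
    """
    totals = [0, 0, 0]
    for position, count in enumerate(counts):
        totals[position % 3] += count
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0, 0.0, 0.0]
    return [total / grand_total for total in totals]


# Hypothetical per-nucleotide counts for a short ORF (three codons).
example_counts = [9, 1, 2, 8, 0, 1, 10, 2, 1]
fractions = frame_fractions(example_counts)
print(["%.2f" % fraction for fraction in fractions])
```

A profile dominated by one frame is what periodic footprints would look like; roughly equal fractions across the three frames would suggest little or no periodicity for that ORF.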

Question about handling of non-unique alignments

Dear Ribotricer developers,

The documentation recommends including the outFilterMultimapNmax 1 parameter in the STAR alignment to exclude non-unique alignments and reduce noise in downstream analyses.

With the default outFilterMultimapNmax 10 setting, how does Ribotricer handle non-unique alignments? Are they included in ORF detection? Does Ribotricer differentiate between primary and secondary alignment flags when dealing with multi-mapped reads?

Thank you very much!
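
For readers unfamiliar with the flags mentioned in this question, the sketch below shows how primary, secondary and supplementary alignments are distinguished by SAM flag bits, and how an NH tag value of 1 marks a read reported at a single location. It does not describe what ribotricer does internally; the helper names are hypothetical.

```python
SECONDARY = 0x100      # SAM flag bit: not the primary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def is_primary(flag):
    """True if neither the secondary nor the supplementary bit is set."""
    return not (flag & (SECONDARY | SUPPLEMENTARY))


def is_unique(flag, nh):
    """True for a primary alignment whose NH tag reports exactly one hit."""
    return is_primary(flag) and nh == 1


# (flag, NH) pairs: forward unique, reverse unique, primary of a multi-mapper,
# and a secondary alignment of the same multi-mapper.
examples = [(0, 1), (16, 1), (0, 3), (256, 3)]
for flag, nh in examples:
    print(flag, nh, is_primary(flag), is_unique(flag, nh))
```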

Ribotricer with replicates

Hello,

I know this issue is closed, but it seemed like an appropriate place to ask these questions.

I have multiple replicates of 3 different Ribo-seq types (4 reps for 2 experimental conditions for CHX, harringtonine and no-drug-treated Ribo-seq).

What would be the optimal strategy for analysis using Ribotricer? Pool all samples, pool the 3 different Ribo-seq types for each replicate, or run each sample individually?

Thanks,
Colin

Originally posted by @colin986 in #46 (comment)

Confusion about ORF information

Hi saketkc,
Excuse me again, I am a little confused about the ORF information given by ribotricer. I found that the length of some ORFs is not an integer multiple of 3. For example, the following line:
ENST00000600966.1_58350594_58353129_917 | annotated | translating | 0.616830824 | 2987 | 917 | 214 | 0.701639 | 9.793443
I don't quite understand such an ORF. Why does this happen?
Thanks,
LeeLee
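
The arithmetic behind this question can be checked with a short sketch. It parses the row quoted above and splits the reported length into whole codons plus leftover nucleotides. The column meanings and the ORF ID layout are assumptions inferred from the row itself, so verify them against the header of your own output file.

```python
# One row copied from the report above. The column meanings are assumptions
# based on their order; check them against the header of your own output file.
row = (
    "ENST00000600966.1_58350594_58353129_917 | annotated | translating | "
    "0.616830824 | 2987 | 917 | 214 | 0.701639 | 9.793443"
)

fields = [field.strip() for field in row.split("|")]
orf_id = fields[0]
reported_length = int(fields[5])  # assumed to be the ORF length in nt

# Assumed ID layout: <transcript>_<start>_<end>_<length>
transcript, start, end, id_length = orf_id.rsplit("_", 3)

full_codons, leftover_nt = divmod(reported_length, 3)
print(transcript, start, end, id_length)
print("length:", reported_length, "=", full_codons, "codons +", leftover_nt, "nt")
```

For this row the length in the ID and in the length column agree (917), and 917 leaves a remainder when divided by 3, which is the discrepancy being asked about.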
