
circexplorer's Introduction

CIRCexplorer


CIRCexplorer is a combined strategy to identify junction reads from back-spliced exons and intron lariats.

Version: 1.1.10

Last Modified: 2016-7-14

Authors: Xiao-Ou Zhang, Li Yang

Maintainer: Xu-Kai Ma

Download the latest stable version of CIRCexplorer

To see what has changed in recent versions of CIRCexplorer, see the CHANGELOG.

FAQ

After a year of intensive development, we have upgraded and extended CIRCexplorer into a new version, CIRCexplorer2, with many improvements and new features. You are welcome to use and cite it!

NEWS: CIRCpedia is an integrative database that annotates alternative back-splicing and alternative splicing in circRNAs across different cell lines. You are welcome to use it!

[Figure: schematic flow of the CIRCexplorer pipeline]

Notice

CIRCexplorer is now only a circular RNA annotation tool: it parses fusion junction information from the mapping results of other aligners. Its circRNA annotations therefore depend directly on the mapping strategy of the aligner, and different aligners may yield different annotations. CIRCexplorer only annotates circRNA junctions according to the gene annotation; more functions are available in CIRCexplorer2.

Prerequisites

Software / Package

TopHat or STAR

Others

RNA-seq

Poly(A)−/ribo− RNA-seq is recommended. To obtain more circular RNAs, RNase R treatment can also be performed.

Aligner

CIRCexplorer was originally developed as a circular RNA analysis toolkit supporting TopHat & TopHat-Fusion. After version 1.1.0, it also supports STAR.

TopHat & TopHat-Fusion

To obtain junction reads for circular RNAs, a two-step mapping strategy is used:

  • Multiple mapping with TopHat
tophat2 -a 6 --microexon-search -m 2 -p 10 -G knownGene.gtf -o tophat hg19_bowtie2_index RNA_seq.fastq
  • Convert unmapped reads (using bamToFastq from bedtools)
bamToFastq -i tophat/unmapped.bam -fq tophat/unmapped.fastq
  • Unique mapping with TopHat-Fusion
tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search hg19_bowtie1_index tophat/unmapped.fastq
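The three steps above can be chained from a script. The sketch below assumes Python 3.8+ (for shlex.join) and that tophat2 and bamToFastq are on PATH; tophat_two_step_commands and run are hypothetical helpers, not part of CIRCexplorer. Directory names and thread counts mirror the commands above, and by default nothing is executed (dry run).

```python
import shlex
import subprocess


def tophat_two_step_commands(gtf, bt2_index, bt1_index, fastq,
                             tophat_dir="tophat", fusion_dir="tophat_fusion",
                             threads=10, fusion_threads=15):
    """Return the two-step mapping commands as argv lists (hypothetical helper)."""
    unmapped_bam = f"{tophat_dir}/unmapped.bam"
    unmapped_fq = f"{tophat_dir}/unmapped.fastq"
    return [
        # Step 1: multiple mapping with TopHat against the Bowtie2 index
        ["tophat2", "-a", "6", "--microexon-search", "-m", "2",
         "-p", str(threads), "-G", gtf, "-o", tophat_dir, bt2_index, fastq],
        # Step 2: convert the unmapped reads back to FASTQ with bedtools
        ["bamToFastq", "-i", unmapped_bam, "-fq", unmapped_fq],
        # Step 3: unique mapping of those reads with TopHat-Fusion (Bowtie1 index)
        ["tophat2", "-o", fusion_dir, "-p", str(fusion_threads),
         "--fusion-search", "--keep-fasta-order", "--bowtie1",
         "--no-coverage-search", bt1_index, unmapped_fq],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raise on the first failing step


if __name__ == "__main__":
    run(tophat_two_step_commands("knownGene.gtf", "hg19_bowtie2_index",
                                 "hg19_bowtie1_index", "RNA_seq.fastq"))
```

Passing dry_run=False runs the steps in order and stops at the first non-zero exit status, so a failed TopHat run never feeds an incomplete unmapped.bam into the later steps.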

STAR

To detect fusion junctions with STAR, --chimSegmentMin should be set to a positive value.

Example 1 (single reads):

STAR --chimSegmentMin 10 --runThreadN 10 --genomeDir hg19_STAR_index --readFilesIn RNA_seq.fastq

Example 2 (paired-end reads):

STAR --chimSegmentMin 10 --runThreadN 10 --genomeDir hg19_STAR_index --readFilesIn read_1.fastq read_2.fastq

For more details about STAR, please refer to the STAR manual.
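The two STAR examples differ only in how many files are passed to --readFilesIn. The minimal sketch below builds either command; star_chimeric_command is a hypothetical helper, and it rejects a non-positive --chimSegmentMin only because the text above states that a positive value is needed to detect fusion junctions.

```python
def star_chimeric_command(genome_dir, reads, chim_segment_min=10, threads=10):
    """Build a STAR argv that also reports chimeric (fusion) junctions."""
    if chim_segment_min <= 0:
        # fusion junctions are only reported when --chimSegmentMin is positive
        raise ValueError("--chimSegmentMin must be a positive value")
    if len(reads) not in (1, 2):
        raise ValueError("pass one FASTQ (single reads) or two (paired-end reads)")
    return ["STAR", "--chimSegmentMin", str(chim_segment_min),
            "--runThreadN", str(threads), "--genomeDir", genome_dir,
            "--readFilesIn", *reads]


# Example 1 (single reads) and Example 2 (paired-end reads) from above
single = star_chimeric_command("hg19_STAR_index", ["RNA_seq.fastq"])
paired = star_chimeric_command("hg19_STAR_index", ["read_1.fastq", "read_2.fastq"])
print(" ".join(single))
print(" ".join(paired))
```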

Installation

Option 1: using pip

pip install CIRCexplorer

Option 2: via conda

CIRCexplorer is available as a conda package:

conda install circexplorer --channel bioconda

Option 3: in galaxy

If you have access to a Galaxy instance, CIRCexplorer is also available from the Galaxy Tool Shed.

Option 4: from source code

1 Download CIRCexplorer

git clone https://github.com/YangLab/CIRCexplorer.git
cd CIRCexplorer

2 Install required packages

pip install -r requirements.txt

3 Install CIRCexplorer

python setup.py install

Usage

CIRCexplorer.py 1.1.10 -- circular RNA analysis toolkits.

Usage: CIRCexplorer.py [options]

Options:
    -h --help                      Show this screen.
    --version                      Show version.
    -f FUSION --fusion=FUSION      TopHat-Fusion fusion BAM file. (used in TopHat-Fusion mapping)
    -j JUNC --junc=JUNC            STAR Chimeric junction file. (used in STAR mapping)
    -g GENOME --genome=GENOME      Genome FASTA file.
    -r REF --ref=REF               Gene annotation.
    -o PREFIX --output=PREFIX      Output prefix [default: CIRCexplorer].
    --tmp                          Keep temporary files.
    --no-fix                       No-fix mode (useful for species with poor gene annotations)

Example

TopHat & TopHat-Fusion

CIRCexplorer.py -f tophat_fusion/accepted_hits.bam -g hg19.fa -r ref.txt

STAR

  • Convert Chimeric.out.junction to fusion_junction.txt (star_parse.py was adapted from STAR's filterCirc.awk)
star_parse.py Chimeric.out.junction fusion_junction.txt
  • parse fusion_junction.txt
CIRCexplorer.py -j fusion_junction.txt -g hg19.fa -r ref.txt
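To illustrate what selecting back-splice junctions from Chimeric.out.junction involves, here is a minimal sketch. It is NOT star_parse.py and does not write the fusion_junction.txt format; count_back_splices is a hypothetical name. The column meanings are taken from the STAR manual's description of Chimeric.out.junction, and the conversion to 0-based, half-open (BED-style) circle coordinates is our own assumption. A junction is kept only when donor and acceptor lie on the same chromosome and strand and the acceptor lies upstream of the donor in transcript orientation.

```python
from collections import Counter


def count_back_splices(lines):
    """Count reads per candidate back-splice junction (a sketch, not star_parse.py).

    Assumed columns (STAR manual): 1 donor chromosome, 2 first intron base of the
    donor (1-based), 3 donor strand, 4 acceptor chromosome, 5 first intron base of
    the acceptor (1-based), 6 acceptor strand, 7 junction type (-1 = encompassing).
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # comment/summary lines written by some STAR versions
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        d_chr, d_pos, d_strand, a_chr, a_pos, a_strand, jtype = fields[:7]
        try:
            d_pos, a_pos, jtype = int(d_pos), int(a_pos), int(jtype)
        except ValueError:
            continue  # header line, e.g. column names
        if d_chr != a_chr or d_strand != a_strand or jtype < 0:
            continue  # need a spliced chimera on one chromosome and strand
        if d_strand == "+" and a_pos < d_pos:
            # circle spans [a_pos + 1, d_pos - 1] (1-based, inclusive)
            start, end = a_pos, d_pos - 1  # 0-based, half-open (BED-style)
        elif d_strand == "-" and d_pos < a_pos:
            start, end = d_pos, a_pos - 1
        else:
            continue  # ordinary forward chimera, not a back-splice
        counts[(d_chr, start, end, d_strand)] += 1
    return counts
```

Calling it on the lines of Chimeric.out.junction returns a Counter keyed by (chrom, start, end, strand), whose values play the role of junction read counts.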

Note

  • ref.txt is the gene annotation file (one isoform per line), with the following fields:

    Field        Description
    geneName     Name of gene
    isoformName  Name of isoform
    chrom        Reference sequence
    strand       + or - for strand
    txStart      Transcription start position
    txEnd        Transcription end position
    cdsStart     Coding region start
    cdsEnd       Coding region end
    exonCount    Number of exons
    exonStarts   Exon start positions
    exonEnds     Exon end positions

  • hg19.fa is the genome sequence in FASTA format.

  • You can use the fetch_ucsc.py script to download the relevant ref.txt (Known Genes, RefSeq or Ensembl) and the genome FASTA file for hg19, hg38 or mm10 from UCSC.

fetch_ucsc.py hg19/hg38/mm10 ref/kg/ens/fa out

Example (download hg19 RefSeq gene annotation file):

fetch_ucsc.py hg19 ref ref.txt
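A minimal parser for the ref.txt fields listed above, useful for inspecting an annotation before running CIRCexplorer. It assumes whitespace-separated columns in the listed order, comma-separated exonStarts/exonEnds with an optional trailing comma (as in UCSC tables), and UCSC's 0-based, half-open coordinates. Isoform, parse_ref_line and load_ref are hypothetical names, and the (geneName, isoformName, chrom, strand) lookup key is chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Isoform:
    gene: str
    isoform: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    exons: list = field(default_factory=list)  # (start, end) pairs


def _int_list(text):
    """'100,200,' -> [100, 200] (UCSC lists usually end with a comma)."""
    return [int(x) for x in text.rstrip(",").split(",") if x]


def parse_ref_line(line):
    f = line.split()
    if len(f) < 11:
        raise ValueError(f"expected 11 fields, found {len(f)}")
    starts, ends = _int_list(f[9]), _int_list(f[10])
    if not len(starts) == len(ends) == int(f[8]):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return Isoform(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]),
                   list(zip(starts, ends)))


def load_ref(lines):
    """Index isoforms by (geneName, isoformName, chrom, strand)."""
    ref = {}
    for line in lines:
        if line.strip():
            iso = parse_ref_line(line)
            ref[(iso.gene, iso.isoform, iso.chrom, iso.strand)] = iso
    return ref
```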

Output

See details in the example file

Field                  Description
chrom                  Chromosome
start                  Start of junction
end                    End of junction
name                   Circular RNA/Junction reads
score                  Flag to indicate realignment of fusion junctions
strand                 + or - for strand
thickStart             Unused
thickEnd               Unused
itemRgb                0,0,0
exonCount              Number of exons
exonSizes              Exon sizes
exonOffsets            Exon offsets
readNumber             Number of junction reads
circType               'Yes' for ciRNA and 'No' for circRNA (before 1.1.0); 'circRNA' or 'ciRNA' (after 1.1.1)
geneName               Name of gene
isoformName            Name of isoform
exonIndex/intronIndex  Index (starting from 1) of the exon (for circRNA) or intron (for ciRNA) in the given isoform (added in 1.1.6)
flankIntron            Left intron/Right intron

Note: The first 12 columns are in BED12 format.
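Because the first 12 columns are BED12, the genomic blocks of each circRNA can be recovered from exonSizes and exonOffsets in the usual BED12 way: block i starts at start + offset_i and spans size_i bases. The sketch below assumes tab-separated columns in the order listed above; parse_circ_line and the added exons/splicedLength keys are hypothetical, not part of CIRCexplorer.

```python
OUTPUT_FIELDS = ["chrom", "start", "end", "name", "score", "strand",
                 "thickStart", "thickEnd", "itemRgb", "exonCount",
                 "exonSizes", "exonOffsets", "readNumber", "circType",
                 "geneName", "isoformName", "index", "flankIntron"]


def _int_list(text):
    """'100,200,' -> [100, 200] (tolerates a trailing comma)."""
    return [int(x) for x in text.rstrip(",").split(",") if x]


def parse_circ_line(line):
    """Parse one output line; the 'index' key holds exonIndex/intronIndex."""
    rec = dict(zip(OUTPUT_FIELDS, line.rstrip("\n").split("\t")))
    start = int(rec["start"])
    sizes = _int_list(rec["exonSizes"])
    offsets = _int_list(rec["exonOffsets"])
    # BED12 rule: block i spans [start + offset_i, start + offset_i + size_i)
    rec["exons"] = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    rec["splicedLength"] = sum(sizes)  # total length of all blocks
    rec["readNumber"] = int(rec["readNumber"])
    return rec
```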

Citation

Zhang XO, Wang HB, Zhang Y, Lu X, Chen LL and Yang L. Complementary sequence-mediated exon circularization. Cell, 2014, 159: 134-147

License

Copyright (C) 2014-2017 YangLab. See the LICENSE file for license rights and limitations (MIT).


circexplorer's Issues

unsuccessful installation

Hi there,

When I run pip install CIRCexplorer, the installation succeeds. But when I call CIRCexplorer.py, I get the following message:

Traceback (most recent call last):
  File "/home5/jhuang/software/anaconda3/bin/CIRCexplorer.py", line 7, in <module>
    from circ.CIRCexplorer import main
  File "/home5/jhuang/software/anaconda3/lib/python3.6/site-packages/circ/CIRCexplorer.py", line 31, in <module>
    from genomic_interval import Interval
ModuleNotFoundError: No module named 'genomic_interval'

When I try to install through conda, I get the following error and the installation fails:

Fetching package metadata .............
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:
  - circexplorer -> python 3.4* -> xz 5.0.5
  - python 3.6*
Use "conda info <package>" to see the dependencies for each package.

I also tried installing from source, but had no luck.

Any idea how I can fix this?

Thanks in advance!

KeyError CIRCexplorer.py

Hi,
Thanks for sharing your useful tool. I'm currently trying to repeat your example analysis, but I get the following error message:

Traceback (most recent call last):
  File "/usr/local/bin/CIRCexplorer.py", line 495, in <module>
    fix_fusion(ref_f, genome_fa, temp2, output, options['--no-fix'])
  File "/usr/local/bin/CIRCexplorer.py", line 128, in fix_fusion
    fusions, fusion_names, fixed_flag = fix_bed(input_f, ref, fa, no_fix)
  File "/usr/local/bin/CIRCexplorer.py", line 324, in fix_bed
    iso_starts, iso_ends = ref['\t'.join([gene, iso, chrom, strand])]
KeyError: 'Usp9x\tNM_009481\tchrX\t+'

Could you please help me fix this problem?
I really appreciate any help you can provide.

issue about not hashable

Hello,
Thanks for the effort to make the tool available to the bioinformatics community. I am currently replicating the analysis mentioned in your paper to learn this tool. However, I am getting the following error message, and I have no clue how to interpret it:

Start CIRCexplorer 1.0.6
Start to convert fustion reads...
Converted 138505 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "CIRCexplorer-1.0.6/CIRCexplorer.py", line 459, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "CIRCexplorer-1.0.6/CIRCexplorer.py", line 64, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "CIRCexplorer-1.0.6/CIRCexplorer.py", line 203, in parse_ref1
    genes[chrom] = Interval(genes[chrom])
  File "/usr/local/lib/python2.7/dist-packages/interval.py", line 279, in __init__
    raise TypeError("lower_bound is not hashable.")
TypeError: lower_bound is not hashable.

The corresponding commands are:

1. tophat2 -a 6 --microexon-search -m 2 -p 10 -G /home/k/Downloads/Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes/genes.gtf /home/k/Downloads/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome /home/k/Downloads/ncbi/public/sra/SRR901967.fastq

2. bamToFastq -i tophat_out/unmapped.bam -fq tophat_out/unmapped.fastq

3. /home/k/Downloads/tophat-2.0.9.Linux_x86_64/tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search /home/k/Downloads/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex/genome tophat_out/unmapped.fastq

4. CIRCexplorer.py -f tophat_fusion/accepted_hits.bam -g /home/k/Downloads/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -r /home/k/Downloads/Homo_sapiens/UCSC/hg19/Annotation/Archives/archive-2013-03-06-11-23-03/Genes/refFlat.txt

ValueError: need more than 2 values to unpack

Start CIRCexplorer 1.1.10
Start to convert fustion reads...
Converted 0 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "/usr/local/bin/CIRCexplorer.py", line 9, in <module>
    load_entry_point('CIRCexplorer==1.1.10', 'console_scripts', 'CIRCexplorer.py')()
  File "build/bdist.linux-x86_64/egg/circ/CIRCexplorer.py", line 508, in main
  File "build/bdist.linux-x86_64/egg/circ/CIRCexplorer.py", line 68, in annotate_fusion
  File "build/bdist.linux-x86_64/egg/circ/CIRCexplorer.py", line 209, in parse_ref1
ValueError: need more than 2 values to unpack

Could someone please tell me why I am getting 0 fusion reads converted and a ValueError?

Bam file error

I have run the TopHat commands and generated BAM files by running the TopHat2 fusion search on the unmapped FASTQ.
However, running CIRCexplorer on them now gives the error "Please make sure sample is BAM!"

Does the BAM file obtained from the TopHat2 fusion search need to be processed in any other way before passing it to CIRCexplorer?

Thanks
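One quick way to check whether a file really is BAM is sketched below. It relies only on the SAM/BAM format specification: BAM is BGZF-compressed (which is gzip-compatible), and the decompressed stream starts with the magic bytes BAM\1. Whether CIRCexplorer performs this exact check is an assumption, and looks_like_bam is a hypothetical helper.

```python
import gzip


def looks_like_bam(path):
    """True if `path` is gzip/BGZF data whose decompressed stream starts with b'BAM\\x01'."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # not gzip-compressed at all (e.g. plain-text SAM) or truncated
        return False


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, "looks like BAM" if looks_like_bam(p) else "does NOT look like BAM")
```

Running it on tophat_fusion/accepted_hits.bam distinguishes a genuine BAM from, for example, a SAM or truncated file that merely has a .bam extension.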

Lariat search

Hi,
I found CIRCexplorer a very useful tool for detecting circRNAs from RNA-seq. I am interested in detecting circRNAs that fall completely within introns (i.e., lariat sequences). I mapped the paired-end reads with TopHat and ran the program as described on the GitHub page, but I did not find any circRNAs associated with lariats, only ones derived from exons. Could you suggest whether the tool can somehow be made to pick circRNAs only from introns?

Thanks.

problems in running multiple mapping with tophat

Following the instructions, I ran multiple mapping with TopHat using:
tophat2 -a 6 --microexon-search -m 2 -p 10 -G knownGene.gtf -o tophat hg19_bowtie2_index RNA_seq.fastq
The program failed with the following error:
error retrieving prep-reads info

What does that mean, and how can I solve it? Please help!

Unique mapping with TopHat-Fusion

For unique mapping with TopHat-Fusion (tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search hg19_bowtie1_index tophat/unmapped.fastq), can I use Bowtie2 instead of Bowtie1?

problems happened in CIRCexplorer

Hi,
When I run CIRCexplorer at the last step with the standard parameters, something goes wrong. How could this happen?
Thanks!


Start to convert fustion reads...
Converted 30234 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "/home/program/bin/CIRCexplorer.py", line 371, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "/home/program/bin/CIRCexplorer.py", line 53, in annotate_fusion
    gene, isoform = parse_ref1(ref_f)
  File "/home/program/bin/CIRCexplorer.py", line 166, in parse_ref1
    ends = list(map(int, line.split()[10].split(',')[:-1]))
IndexError: list index out of range


I've solved it! The issue occurred intermittently, and the error came from the server, not from the script.
Hope this answer could help anyone else.

circexplorer exon content

Hello,

I was wondering how CIRCexplorer picks an isoform and then deduces which exons are included in the circRNA, given that it only gets information about the back-splice junction (BSJ)?

Thank you!

Drop after "Start to annotate fusion junctions..."

Hello,
I'm doing some tests with a dataset.
I followed the suggested TopHat command lines,
fetched ref.txt with fetch_ucsc.py,
and got:

Start CIRCexplorer 1.1.1
Start to convert fustion reads...
Converted 102717 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "CIRCexplorer.py", line 494, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "CIRCexplorer.py", line 67, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "CIRCexplorer.py", line 206, in parse_ref1
    genes[chrom] = Interval(genes[chrom])
  File "build/bdist.linux-x86_64/egg/interval.py", line 279, in __init__
TypeError: lower_bound is not hashable.

Is there anything I'm doing wrong?

Thanks in advance!

Issue running CIRCexplorer

I am trying to use CIRCexplorer with the STAR mapper, and the error "IndexError: list index out of range" keeps occurring. I can't figure out how to remedy this problem; any help would be appreciated. Here is my command line:
python CIRCexplorer.py -j 0002-surg-funsion-junction.txt --genome=human.hg19.genome --ref=UCSC_Refseq_sno_miRNA_lncipedia_3_0_hg19_11-10-2014.gtf

Start CIRCexplorer 1.1.1
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "CIRCexplorer.py", line 494, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "CIRCexplorer.py", line 67, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "CIRCexplorer.py", line 201, in parse_ref1
    start = starts[0]
IndexError: list index out of range
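One plausible cause, not confirmed in this thread, is that --ref points to a GTF file, whereas the README describes ref.txt as an 11-field gene-prediction table (geneName through exonEnds). The hypothetical check_ref_file below is a diagnostic sketch, not part of CIRCexplorer: it flags lines that look like GTF/GFF, lines with fewer than 11 fields, and lines whose exonCount disagrees with the exon lists.

```python
GTF_FEATURES = {"gene", "transcript", "exon", "CDS", "start_codon", "stop_codon", "UTR"}


def check_ref_file(lines, max_report=5):
    """Return (line_number, problem) pairs for lines that do not fit ref.txt."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        fields = line.split()
        if len(cols) == 9 and cols[2] in GTF_FEATURES:
            # GTF: 9 tab-separated columns with a feature type in column 3
            problems.append((lineno, "looks like GTF/GFF, not ref.txt"))
        elif len(fields) < 11:
            problems.append((lineno, f"only {len(fields)} fields, expected 11"))
        elif not fields[8].isdigit():
            problems.append((lineno, "exonCount is not an integer"))
        else:
            n_starts = len([x for x in fields[9].split(",") if x])
            n_ends = len([x for x in fields[10].split(",") if x])
            if not n_starts == n_ends == int(fields[8]):
                problems.append((lineno, "exonCount disagrees with exonStarts/exonEnds"))
        if len(problems) >= max_report:
            break  # a few examples are enough to diagnose the file
    return problems
```

For example, check_ref_file(open(path)) on the file passed to --ref returns an empty list when every line matches the expected layout.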

Paired-End Data

Hi,
Thanks for providing such a great tool!
I'm wondering whether CIRCexplorer supports paired-end data. TopHat can align paired-end reads, but when I converted the unmapped reads to paired-end FASTQ after sorting and indexing the BAM file, very few reads were converted successfully. However, if I merged the two paired-end FASTQ files (R and L) into a single FASTQ file, CIRCexplorer worked well.
One more thing: can CIRCexplorer do secondary analysis of Chimeric.out.sam (the fusion detection output from RNA-STAR)? I converted Chimeric.out.sam to Chimeric.out.bam and passed the BAM file to CIRCexplorer, but got nothing. I compared the output files from TopHat-Fusion and RNA-STAR; they are very similar to each other, so it is rather strange that CIRCexplorer got nothing from Chimeric.out.bam.
Thanks again for the effort to make the tool available to the bioinformatics community.
