ndaniel / fusioncatcher

Finder of Somatic Fusion Genes in RNA-seq data

License: GNU General Public License v3.0

Python 99.84% Shell 0.16%

rna-seq fusion-genes somatic-fusion-genes

fusioncatcher's Introduction

FusionCatcher

Finder of somatic fusion-genes in RNA-seq data.

Download / Install / Update / Upgrade FusionCatcher

Use this one-line command:

wget http://sf.net/projects/fusioncatcher/files/bootstrap.py -O bootstrap.py && python bootstrap.py -t --download

To have all the questions asked by bootstrap.py answered automatically with yes, add -y to the command above. For more installation options, see:

bootstrap.py --help

On Ubuntu Linux, running this command before installing FusionCatcher with bootstrap.py helps make the installation smoother:

sudo apt-get install wget gawk gcc g++ make cmake automake curl unzip zip bzip2 tar gzip pigz parallel build-essential libncurses5-dev libc6-dev zlib1g zlib1g-dev libtbb-dev libtbb2 python python-dev python-numpy python-biopython python-xlrd python-openpyxl default-jdk

FusionCatcher can also be installed using conda, as follows:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n fusioncatcher fusioncatcher
source activate fusioncatcher
download-human-db.sh

FusionCatcher can also be installed from GitHub, as follows:

git clone https://github.com/ndaniel/fusioncatcher
cd fusioncatcher/tools/
./install_tools.sh
cd ../data
./download-human-db.sh

NOTE: Here it is assumed that Python 2.7.x, BioPython (>v1.5), and Java Runtime Environment 1.8 are already installed.

Description

FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end or single-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq/MiniSeq) from diseased samples.

The aims of FusionCatcher are:

very good detection rate for finding candidate somatic fusion genes (see somatic mutations; using a matched normal sample is optional; several databases of known fusion genes found in healthy samples are used as a list of known false positives; biological knowledge is used, like for example gene fusion between a gene and its pseudogene is filtered out),
very good RT-PCR validation rate of found candidate somatic fusion genes (this is very important for us),
very good detection of challenging fusion genes, like for example IGH fusions, CIC fusions, DUX4 fusions, CRLF2 fusions, TCF3 fusions, etc.,
very easy to use (i.e. no a priori knowledge of bioinformatic databases and bioinformatics is needed in order to run FusionCatcher BUT Linux/Unix knowledge is needed; it allows a very high level of control for expert users),
to be as automatic as possible (i.e. FusionCatcher automatically chooses the best parameters in order to find candidate somatic fusion genes, e.g. finding the adapters automatically, quality trimming of reads, building the exon-exon junctions automatically based on the length of the reads given as input, etc., while also giving full control to expert users) while providing the best possible detection rate for finding somatic fusion genes (with a very low rate of false positives but a very good sensitivity).

Manual

A detailed manual is available here.

Forum

A forum for FusionCatcher is available at Google Groups.

Release history

A complete release history can be found here.

Official releases

Old releases and the latest official release of FusionCatcher are on https://sourceforge.net/projects/fusioncatcher/files/

Citing

D. Nicorici, M. Satalan, H. Edgren, S. Kangaspeska, A. Murumagi, O. Kallioniemi, S. Virtanen, O. Kilkku, FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data, bioRxiv, Nov. 2014, DOI:10.1101/011650

fusioncatcher's Issues

get_synonyms.py fails

fusioncatcher-build.py (v0.99.7b and Ensembl release 88) fails due to get_synonyms.py failing with the following error message:

////////////////////////////////////////////////////////////////////////////////                                                                     
  Running: step = 9   Time: 19:13   Date: 2017-04-01 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/data/group/z/software/fusioncatcher_v0.99.7b/data/ensembl_test'
get_synonyms.py \
--organism canis_familiaris \
--server ftp.ensembl.org \
--output /data/group/z/software/fusioncatcher_v0.99.7b/data/ensembl_test/
--------------------------------------------------------------------------------
+-->EXECUTING...
Downloading the MySQL files of organism 'canis_familiaris' from Ensembl!
230 Login successful.
Downloading: ftp.ensembl.org/pub/current_mysql/canis_familiaris_core_88_31/object_xref.txt.gz
Downloading: ftp.ensembl.org/pub/current_mysql/canis_familiaris_core_88_31/xref.txt.gz
Downloading: ftp.ensembl.org/pub/current_mysql/canis_familiaris_core_88_31/external_synonym.txt.gz
Downloading: ftp.ensembl.org/pub/current_mysql/canis_familiaris_core_88_31/canis_familiaris_core_88_31.sql.gz
Downloading: ftp.ensembl.org/pub/current_mysql/canis_familiaris_core_88_31/gene.txt.gz
Decompressing files ...
Parsing the tables in the SQL file...
Traceback (most recent call last):
  File "/data/group/z/software/fusioncatcher_v0.99.7b/bin/get_synonyms.py", line 195, in <module>
    table[last_name][col] = idx
KeyError: None

Sample names missing in output

Hi Daniel,
Maybe I'm missing something, but I had 5 samples in the input directory and now looking at the output directory I can see a lot of evidence for fusions in the output file, but it doesn't point out which sample they came from.

fusioncatcher-build.py does not match version of configuration.cfg

Hi,

Running fusioncatcher-build to generate data for mouse, prints the following error message:

ERROR: The version of configuration.cfg file does not match the version of the fusioncatcher-build.py! Please, fix this!

The reason is that fusioncatcher-build.py reports 0.99.7a beta while the config file states 0.99.7b beta. The code on GitHub looks OK, so probably the data on SourceForge is outdated (http://sf.net/projects/fusioncatcher/files/bootstrap.py).

Thanks, Daniel

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v287/liftOver does not exist anymore

Hey, was just checking the downloader - apparently they took down
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v287/liftOver

however
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver exists.

Can we use this file with fusion catcher? Will this be okay?

Single-cell, short read advice

The wiki recommends using the default options for reads with a minimum length of 35bp (presumably for paired-end) and 130bp for single-end. Unfortunately I have three single-cell RNA-sequencing datasets that do not meet these criteria: one is 25bp paired-end, the second is 50bp single-end, and the third is 125bp single-end. Do you by chance have any advice on suggested parameters for at least attempting fusion detection using these datasets? Thank you.

Slow downs when running bowtie using '--ff' and '--tryhard'

There are rare cases when FusionCatcher runs very slowly because Bowtie runs very slowly when given the command line parameters "--ff --tryhard". Here is one such example:

bowtie \
-t  \
-a  \
-v 2 \
-p 12 \
--chunkmbs 128 \
--tryhard  \
--best  \
--strata  \
--sam  \
--ff  \
-X 1141505 \
/gpfs/ngs/48/results/gene-gene_split_[star/bowtie2].fa.06_bowtie_bowtie2/ \
-1 /gpfs/ngs/48/results/reads-ids_clip_[star/bowtie2]_psl_r1.fq.6 \
-2 /gpfs/ngs/48/results/reads-ids_clip_[star/bowtie2]_psl_r2.fq.6 \
2> /dev/null \
|  \
LC_ALL=C  \
awk '$3 == "*" { next } { print }' \
> /gpfs/ngs/48/results/split_gene-gene_[star/bowtie].sam.6

bootstrap.py as root fails

I've been running bootstrap.py successfully as a regular user on a VM.

Now I'm trying to install on a docker container (ubuntu), and the program recognizes I'm doing this as root. So instead of
Installing Python module 'python-biopython' locally
I now get
Installing Python package 'python-biopython' as root...

Unfortunately, this part of the code exits with

  File "bootstrap.py", line 1308, in <module>
    install = options.install_all or options.install_all_py
  File "bootstrap.py", line 856, in module
    pythonpath = pythonpath
  File "bootstrap.py", line 583, in install_module
    f,r = cmds([['apt-get']], verbose = False, exit = False)
TypeError: 'list' object is not callable

I had a quick look and wondered if cmds should be replaced with cmd in that code block, but a quick try just got me into more trouble.
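For readers hitting the same message: this TypeError is what Python raises whenever a name bound to a list is called like a function. Below is a minimal sketch of that failure mode. The name `cmds` is only a local stand-in and `call_list_as_function` is made up for illustration; neither is taken from bootstrap.py, whose code is not shown in this thread.

```python
# Hypothetical stand-in: here `cmds` is a plain list of commands,
# not a function that runs them.
cmds = [["apt-get", "--version"]]


def call_list_as_function():
    """Call `cmds` as if it were a function and return the error text."""
    try:
        cmds([["apt-get"]])  # a list has no __call__, so this raises TypeError
    except TypeError as error:
        return str(error)
    return None


message = call_list_as_function()
print(message)
```

If bootstrap.py uses one name for both a helper function and a list, that would be the first thing to check, but this is only a guess from the traceback.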

seqtk dependency

FusionCatcher uses seqtk mergepe, which is not available in older versions of seqtk. This needs either documenting or adding to the dependency check.

generate_chrom_lens.py can't find xml.sax.saxutils when called from fusioncatcher-build

I just downloaded and installed 0.99.7b like so:

wget http://sf.net/projects/fusioncatcher/files/bootstrap.py -O bootstrap.py && python bootstrap.py -t --download -y

Dependencies are found, all messages are happy and everyone feels good. Then I try to run fusioncatcher-build and I receive an error like so:

$ mkdir tmp
$ fusioncatcher-build -g canis_familiaris -o ./tmp
WARNING: Cannot restart automatically because the previous log file '/scratch/godlovedc/tmp/fusioncatcher-build.log' cannot be found!
The workflow will be restarted from the beginning with step 1!
Log of the pipeline:
--------------------------------------------------------------------------------
Starting execution with step 1.
////////////////////////////////////////////////////////////////////////////////
  Running: step = 1   Time: 16:21   Date: 2017-03-21 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/scratch/godlovedc'
python_version.py
--------------------------------------------------------------------------------
+-->EXECUTING...
Python version: 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Python executable: /usr/local/bin/python
--------------------------------------------------------------------------------
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 2   Time: 16:21   Date: 2017-03-21 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/scratch/godlovedc'
biopython_version.py
--------------------------------------------------------------------------------
+-->EXECUTING...
1.68
--------------------------------------------------------------------------------
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 3   Time: 16:21   Date: 2017-03-21 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/scratch/godlovedc'
printf \
"canis_familiaris"  \
> /scratch/godlovedc/tmp/organism.txt
--------------------------------------------------------------------------------
+-->EXECUTING...
--------------------------------------------------------------------------------
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 4   Time: 16:21   Date: 2017-03-21 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/scratch/godlovedc'
get_genome.py \
--organism canis_familiaris \
--server ftp.ensembl.org \
--output /scratch/godlovedc/tmp/
--------------------------------------------------------------------------------
+-->EXECUTING...
Downloading the genome of organism 'canis_familiaris' from Ensembl!
230 Anonymous access granted, restrictions apply
Downloading: /pub/current_fasta//canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.chromosome.MT.fa.gz
Downloading: /pub/current_fasta//canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz
Executing:   gzip -d -f -c '/scratch/godlovedc/tmp/Canis_familiaris.CanFam3.1.dna.chromosome.MT.fa.gz' > '/scratch/godlovedc/tmp/Canis_familiaris.CanFam3.1.dna.chromosome.MT.fa'
Executing:   gzip -d -f -c '/scratch/godlovedc/tmp/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz' > '/scratch/godlovedc/tmp/Canis_familiaris.CanFam3.1.dna.toplevel.fa'
--------------------------------------------------------------------------------
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 56 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 5   Time: 16:22   Date: 2017-03-21 (elapsed time: 0d:0h:1m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/scratch/godlovedc'
generate_chrom_lens.py \
--input_genome /scratch/godlovedc/tmp/genome.fa \
--output /scratch/godlovedc/tmp/
--------------------------------------------------------------------------------
+-->EXECUTING...
Traceback (most recent call last):
  File "/usr/local/apps/fusioncatcher/0.99.7b/fusioncatcher/bin/generate_chrom_lens.py", line 67, in <module>
    import Bio.SeqIO
  File "/usr/local/python/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 374, in <module>
    from . import SeqXmlIO
  File "/usr/local/python/lib/python2.7/site-packages/Bio/SeqIO/SeqXmlIO.py", line 17, in <module>
    from xml.sax.saxutils import XMLGenerator
ImportError: No module named sax.saxutils


ERROR: Workflow execution failed at step 5 while executing:
----------------
   generate_chrom_lens.py \
   --input_genome /scratch/godlovedc/tmp/genome.fa \
   --output /scratch/godlovedc/tmp/
----------------


Executing second time the same step/command in order to capture error messages (i.e. STDERR)...

-------------------------------------------
Traceback (most recent call last):
  File "/usr/local/apps/fusioncatcher/0.99.7b/fusioncatcher/bin/generate_chrom_lens.py", line 67, in <module>
    import Bio.SeqIO
  File "/usr/local/python/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 374, in <module>
    from . import SeqXmlIO
  File "/usr/local/python/lib/python2.7/site-packages/Bio/SeqIO/SeqXmlIO.py", line 17, in <module>
    from xml.sax.saxutils import XMLGenerator
ImportError: No module named sax.saxutils


################################################################################
################################################################################
TOTAL RUNNING TIME: 0 day(s), 0 hour(s), 0 minute(s), and 57 second(s)
################################################################################
################################################################################

So it seems like an error importing Bio.SeqIO because it can't find xml.sax.saxutils. But if I try importing that module myself, I don't have any problems:

$ python
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import Bio.SeqIO
>>> exit()

And even stranger, if I just run generate_chrom_lens.py without the fusioncatcher-build wrapper, I don't have any problems.

$    generate_chrom_lens.py \
>    --input_genome /scratch/godlovedc/tmp/genome.fa \
>    --output /scratch/godlovedc/tmp/
  * chromosome 1 has length 122678785
  * chromosome 10 has length 69331447
  * chromosome 11 has length 74389097
  * chromosome 12 has length 72498081
  * chromosome 13 has length 63241923
  * chromosome 14 has length 60966679
  * chromosome 15 has length 64190966
  * chromosome 16 has length 59632846
  * chromosome 17 has length 64289059
  * chromosome 18 has length 55844845
  * chromosome 19 has length 53741614
  * chromosome 2 has length 85426708
  * chromosome 20 has length 58134056
  * chromosome 21 has length 50858623
  * chromosome 22 has length 61439934
  * chromosome 23 has length 52294480
  * chromosome 24 has length 47698779
  * chromosome 25 has length 51628933
  * chromosome 26 has length 38964690
  * chromosome 27 has length 45876710
  * chromosome 28 has length 41182112
  * chromosome 29 has length 41845238
  * chromosome 3 has length 91889043
  * chromosome 30 has length 40214260
  * chromosome 31 has length 39895921
  * chromosome 32 has length 38810281
  * chromosome 33 has length 31377067
  * chromosome 34 has length 42124431
  * chromosome 35 has length 26524999
  * chromosome 36 has length 30810995
  * chromosome 37 has length 30902991
  * chromosome 38 has length 23914537
  * chromosome 4 has length 88276631
  * chromosome 5 has length 88915250
  * chromosome 6 has length 77573801
  * chromosome 7 has length 80974532
  * chromosome 8 has length 74330416
  * chromosome 9 has length 61074082
  * chromosome MT has length 16727
  * chromosome X has length 123869142
  * chromosome JH373233.1 has length 2660953
  * chromosome JH373234.1 has length 1881673
  * chromosome JH373235.1 has length 1415205
  * chromosome JH373236.1 has length 1067467
[...snip...]
  * chromosome AAEX03026066.1 has length 2046
  * chromosome AAEX03026067.1 has length 1999
  * chromosome AAEX03026068.1 has length 1819
  * chromosome AAEX03026069.1 has length 1790
  * chromosome AAEX03026070.1 has length 1685
  * chromosome AAEX03026071.1 has length 1573
  * chromosome AAEX03026072.1 has length 1115
$

What's going on?
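One way to narrow this down is to compare what the interpreter sees when started by hand with what it sees when started by the wrapper. The sketch below is a diagnostic helper, not part of FusionCatcher; the function name and the idea of running it from both contexts are assumptions. A possible cause worth ruling out is a different interpreter, working directory, or PYTHONPATH making another `xml` package shadow the standard library one, but that is a guess.

```python
import importlib
import os
import sys


def describe_import_environment(module_name="xml"):
    """Collect the facts that decide which copy of a module gets imported."""
    module = importlib.import_module(module_name)
    return {
        "executable": sys.executable,
        "cwd": os.getcwd(),
        "module_file": getattr(module, "__file__", None),
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "sys_path_head": sys.path[:5],
    }


report = describe_import_environment("xml")
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```

Running this once from the shell and once from a step launched by fusioncatcher-build, then comparing the two outputs, should show whether the two runs use the same Python and the same `xml` package.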

Release Tags

Hi,
The last release is from Dec. 3rd 2014. Has there been any new version that is stable enough to use?
If so, could you please tag those versions so that downstream packages can be built?
Thanks,
Martin

Version of data downloadable at SourceForge (v86) does not match latest version (v88)

I am currently installing and setting up data for the latest version of FusionCatcher. I typically prefer to just download the prebuilt set of files from SourceForge, rather than using fusioncatcher-build. However, I noticed that the files at SourceForge are for Ensembl v86, whereas the Ensembl website is currently up to v88. Do you know why this might be, and whether the v86 files will still work OK with the latest version of FusionCatcher?

I also wouldn't mind just running fusioncatcher-build if needed. Just want to get the proper annotation.

STAR aligner crashes with error: "std::bad_alloc"

In rare cases (depending on the Linux distro used, e.g. CentOS 5.1), FusionCatcher v0.99.5a crashes due to STAR v2.5.0c aligner crashing when using:

the command line parameter --alignTranscriptsPerReadNmax 1000000, and
a very large number of threads.

The error message thrown by STAR is

Jan 15 21:30:20 ..... Started STAR run
Jan 15 21:30:20 ..... Loading genome
Jan 15 21:30:21 ..... Started mapping
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

This bug has been submitted further to STAR aligner here.

Until it is fixed, in cases when FusionCatcher has crashed already, the workaround is to restart FusionCatcher using only one thread (if more than one thread has been used) using restart.sh (and adding there --threads 1).

library path not recognized when I use the command in qsub

Hi,

I am having the same problems as before regarding some libraries.

$ echo $LD_LIBRARY_PATH
/cm/shared/apps/uge/8.3.1p6/lib/lx-amd64:/cm/shared/apps/glibc/2.14/lib:/cm/shared/apps/intel-tbb-oss/intel64/current/lib/gcc4.4

$ ls /cm/shared/apps/intel-tbb-oss/intel64/current/lib/gcc4.4
libtbb.so    libtbb_debug.so    libtbb_preview.so    libtbb_preview_debug.so    libtbbmalloc.so    libtbbmalloc_debug.so    libtbbmalloc_proxy.so    libtbbmalloc_proxy_debug.so
libtbb.so.2  libtbb_debug.so.2  libtbb_preview.so.2  libtbb_preview_debug.so.2  libtbbmalloc.so.2  libtbbmalloc_debug.so.2  libtbbmalloc_proxy.so.2  libtbbmalloc_proxy_debug.so.2

Bowtie version:

$ /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie/bowtie --version
bowtie-align version 1.2
64-bit
Built on mint
Tue Dec 27 17:03:06 UTC 2016
Compiler: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 
Options: -O3 -m64  -Wl,--hash-style=both -DWITH_TBB -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1  
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

When I run my command as is (without qsub), it runs fine:

/mnt/isilon/cbmi/variome/bin/fusioncatcher/bin/fusioncatcher.py -p 10 -d /mnt/isilon/cbmi/variome/bin/fusioncatcher/data/current --input /mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/cell_line/CHP212 --output /mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/cell_line/CHP212_out

But when I use qsub to redirect the same command, it cannot find the bowtie path again:

$ head cellline_run.sh
/mnt/isilon/cbmi/variome/bin/fusioncatcher/bin/fusioncatcher.py -p 10 -d /mnt/isilon/cbmi/variome/bin/fusioncatcher/data/current --input /mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/cell_line/CHP212 --output /mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/cell_line/CHP212_out

# piping it into qsub
$ cat cellline_run.sh | xargs -i echo qsub -b y -o output_log  -e error_log -N job_name \"{}\" | sh

This fails as it cannot find the bowtie version:

WARNING: Cannot restart automatically because the previous log file '/mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/CHP212_out/fusioncatcher.log' cannot be found!
The workflow will be restarted from the beginning with step 1!
bowtie-align: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or directory
bowtie-align: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or directory



ERROR: Wrong version of BOWTIE found (/mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie/bowtie)! It should be 'bowtie-align version 1.2'. One may specify the path to the correct version in 'fusioncatcher/etc/configuration.cfg'.
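The loader error above says libtbbmalloc_proxy.so.2 cannot be opened inside the job, which suggests the batch job does not see the same LD_LIBRARY_PATH as the interactive shell. A small check like the following could be run at the top of the job script to confirm that. It is a sketch only: `find_shared_library` is a made-up helper, and it looks only at LD_LIBRARY_PATH, not at the other places the dynamic loader searches.

```python
import os
import tempfile


def find_shared_library(library_name, search_path=None):
    """Return the first match for library_name in a colon-separated path, or None."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, library_name)
        if os.path.exists(candidate):
            return candidate
    return None


# Demo with a throwaway directory standing in for a real library directory.
demo_dir = tempfile.mkdtemp()
demo_lib = os.path.join(demo_dir, "libdemo.so.2")
open(demo_lib, "w").close()
print(find_shared_library("libdemo.so.2", demo_dir))
print(find_shared_library("libmissing.so.2", demo_dir))

# In a real job one would check the library named in the error message:
print(find_shared_library("libtbbmalloc_proxy.so.2"))
```

If the library is found interactively but not inside the job, Grid Engine's qsub has a -V option that exports the submitting shell's environment to the job; whether that is appropriate on this cluster is an assumption.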

OverflowError: size does not fit in an int

I am running the latest version of FusionCatcher (that is v0.99.7a) and was able to install the tool successfully. For the human genome I downloaded the annotation files; for the mouse genome, however, I issued the following command, which raises an error message:

~/fusioncatcher/bin/fusioncatcher-build -g mus_musculus -o fusioncatcher/data/mus_musculus

...

get_genome.py \
--organism mus_musculus \
--server ftp.ensembl.org \
--output /home/gerlacda/fusioncatcher/data/mus_musculus/
--------------------------------------------------------------------------------
+-->EXECUTING...
Downloading the genome of organism 'mus_musculus' from Ensembl!
230 Anonymous access granted, restrictions apply
Downloading: /pub/current_fasta//mus_musculus/dna/Mus_musculus.GRCm38.dna.chromosome.MT.fa.gz
Downloading: /pub/current_fasta//mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz
Traceback (most recent call last):
  File "/home/gerlacda/fusioncatcher/bin/get_genome.py", line 161, in <module>
    file_content = f.read()
  File "/apps/prod/easybuild.el7.x86_64/software/Python/2.7.9-foss-2015a/lib/python2.7/gzip.py", line 261, in read
    self._read(readsize)
  File "/apps/prod/easybuild.el7.x86_64/software/Python/2.7.9-foss-2015a/lib/python2.7/gzip.py", line 320, in _read
    self._add_read_data( uncompress )
  File "/apps/prod/easybuild.el7.x86_64/software/Python/2.7.9-foss-2015a/lib/python2.7/gzip.py", line 336, in _add_read_data
    self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
OverflowError: size does not fit in an int


ERROR: Workflow execution failed at step 4 while executing:
----------------
   get_genome.py \
   --organism mus_musculus \
   --server ftp.ensembl.org \
   --output /home/gerlacda/fusioncatcher/data/mus_musculus/

I found this Python bug fix https://bugs.python.org/issue23306, but was wondering if there would be any other way to fix this, instead of building a completely new version of Python. I am currently running Python 2.7.9 on RedHat 7.3 with the kernel 3.10.0-327.13.1.
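One approach that does not need a rebuilt interpreter is to avoid pulling the whole decompressed genome into one string with f.read(), and to stream it to disk in fixed-size chunks instead. The sketch below shows that idea with the standard library only. `gunzip_in_chunks` is a hypothetical helper, not code from get_genome.py, and whether it sidesteps the OverflowError on the affected Python build has not been tested.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_in_chunks(source_path, target_path, chunk_size=16 * 1024 * 1024):
    """Decompress source_path to target_path, holding at most chunk_size bytes at once."""
    with gzip.open(source_path, "rb") as compressed:
        with open(target_path, "wb") as plain:
            shutil.copyfileobj(compressed, plain, chunk_size)
    return target_path


# Demo on a tiny file standing in for the genome download.
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, "demo.fa.gz")
demo_fa = os.path.join(demo_dir, "demo.fa")
with gzip.open(demo_gz, "wb") as handle:
    handle.write(b">chrDemo\nACGTACGT\n")
gunzip_in_chunks(demo_gz, demo_fa)
with open(demo_fa, "rb") as handle:
    demo_content = handle.read()
print(demo_content)
```

Decompressing with the external gzip program and reading the resulting file would be another option along the same lines.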

error: zipfile.LargeZipFile: Files count would require ZIP64 extensions

In some rare cases, FusionCatcher will throw this error!

zipfile.LargeZipFile: Files count would require ZIP64 extensions

The fix is to add , allowZip64 = True in all lines containing zipfile.ZipFile for all Python scripts of FusionCatcher! Planned to be fixed in the next release!
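As an illustration of that change, here is a minimal sketch of a zipfile.ZipFile call with allowZip64 set. The helper name and file names are made up for the example and are not taken from FusionCatcher's scripts. On Python 2.7 the flag defaults to False, which is why it has to be passed explicitly; from Python 3.4 on it defaults to True.

```python
import os
import tempfile
import zipfile


def write_archive(archive_path, file_paths):
    """Write the given files into a ZIP archive with ZIP64 extensions allowed."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        for path in file_paths:
            archive.write(path, os.path.basename(path))
    return archive_path


# Demo with one small placeholder file.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "fusions.txt")
with open(demo_file, "w") as handle:
    handle.write("geneA\tgeneB\n")
demo_zip = write_archive(os.path.join(demo_dir, "results.zip"), [demo_file])
with zipfile.ZipFile(demo_zip) as archive:
    demo_names = archive.namelist()
print(demo_names)
```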

FusionCatcher on mouse fails because pancreases.txt cannot be found

Was able to (I thought) run fusioncatcher-build successfully for mouse (version v0.99.7b), after manually adjusting the file get_synonyms.py as described in another thread. However, it failed because it was unable to find a file "pancreases.txt" in the data directory. I am trying to run it again after manually creating an empty file with this name in the data directory, but just thought you should know.

Full error:

ERROR: Workflow execution failed at step 246 while executing:

   label_fusion_genes.py \
   --input fusioncatcher/candidate_fusion-genes_70.txt \
   --label pancreases \
   --filter_genes data_dir/pancreases.txt \
   --output_fusion_genes fusioncatcher/candidate_fusion-genes_1000.txt

  * Size 'fusioncatcher/candidate_fusion-genes_70.txt' = 267296 bytes
  * Size 'data_dir/pancreases.txt' = 0 bytes
  * Size 'fusioncatcher/candidate_fusion-genes_1000.txt' = 0 bytes
Executing second time the same step/command in order to capture error messages (i.e. STDERR)...

Traceback (most recent call last):
  File "software_dir/label_fusion_genes.py", line 155, in <module>
    no_proteins=set([line.rstrip('\r\n') for line in file(options.input_filter_genes_filename,'r') if line.rstrip('\r\n')])
IOError: [Errno 2] No such file or directory: 'data_dir/pancreases.txt'
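Creating an empty pancreases.txt by hand amounts to treating a missing filter file as an empty gene list. The same behaviour could be sketched in code as below. This is an illustration under that assumption, not the actual fix in label_fusion_genes.py, and `load_filter_genes` is a hypothetical name.

```python
import os
import tempfile


def load_filter_genes(path):
    """Read one gene identifier per line; a missing file counts as an empty set."""
    if not os.path.isfile(path):
        return set()
    with open(path, "r") as handle:
        return set(line.rstrip("\r\n") for line in handle if line.rstrip("\r\n"))


# Demo: one missing file and one file with a blank line in it.
demo_dir = tempfile.mkdtemp()
missing = load_filter_genes(os.path.join(demo_dir, "pancreases.txt"))
demo_path = os.path.join(demo_dir, "demo_genes.txt")
with open(demo_path, "w") as handle:
    handle.write("ENSG_DEMO_1\n\nENSG_DEMO_2\n")
loaded = load_filter_genes(demo_path)
print(missing, sorted(loaded))
```

Silently accepting a missing file can also hide a broken data directory, so a warning at that point may be preferable in practice.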

Commandline option for limiting coreutils sort

The script fusioncatcher.py does not have a command-line option for limiting the sort memory; it defaults to 80% of the machine's memory. With job managers (Slurm and PBS, for example) this can exceed the memory assigned to the job, or a somewhat ridiculous amount of memory has to be requested (on a 256G node, roughly 205G of RAM). Could you consider changing this?

In the git repo this happens at line 1913

Install via conda

Hi @ndaniel,

I was wondering if there are plans to make FusionCatcher available via Conda (perhaps more appropriately via the bioconda channel). I think it could also simplify the deployment script a bit, since many dependencies could then be installed via conda as well.

Thanks for working on the tool and making it public by the way!

Minor edit: Update --sort-buffer-size description

Hi, thanks for including my request!

The commandline usage manual here still mentions the 80% memory.

It might be better to also mention the newly included limit using something like "Default is '%default' capped at 26G." at here.
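To make the described default concrete, here is a sketch of "80% of memory, capped at 26G" as a small calculation. The function names and the choice to work in whole megabytes are assumptions for illustration, not FusionCatcher's code. The resulting string uses GNU sort's --buffer-size option.

```python
def sort_buffer_size_mb(total_memory_mb, fraction_percent=80, cap_mb=26 * 1024):
    """Return the sort buffer in MB: a fraction of total memory, limited by a cap."""
    wanted_mb = total_memory_mb * fraction_percent // 100
    return min(wanted_mb, cap_mb)


def sort_buffer_option(total_memory_mb):
    """Format the buffer size as an option string for GNU sort."""
    return "--buffer-size=%dM" % sort_buffer_size_mb(total_memory_mb)


print(sort_buffer_option(256 * 1024))  # 256G node: the cap applies
print(sort_buffer_option(16 * 1024))   # 16G machine: 80% is below the cap
```

On the 256G node from the original request, the uncapped 80% would be about 205G, so the cap is what keeps the job within a reasonable allocation.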

Large slowdown

In many cases there is a large slowdown while running FusionCatcher v0.99.4d compared to previous releases.

This bug will be fixed in the next release of FusionCatcher.

Until then, the workaround is to edit manually ...../fusioncatcher/bin/fusioncatcher.py as follows:

line 7388, change from

                                if job.iff(empty(outdir('reads-ids_clip_star_psl_unmapped-'+str(i)+'_1.fq') or eporcrlf2igh == False),id = "#reads-ids_clip_star_psl_unmapped-"+str(i)+"_1#"):

into

                                if job.iff(empty(outdir('reads-ids_clip_star_psl_unmapped-'+str(i)+'_1.fq')) or eporcrlf2igh == False,id = "#reads-ids_clip_star_psl_unmapped-"+str(i)+"_1#"):

line 7812, change from

                            if job.iff(empty(outdir('reads-ids_clip_star_psl_unmapped_1.fq') or eporcrlf2igh == False),id = "#reads-ids_clip_star_psl_unmapped_1#"):

into

                            if job.iff(empty(outdir('reads-ids_clip_star_psl_unmapped_1.fq')) or eporcrlf2igh == False,id = "#reads-ids_clip_star_psl_unmapped_1#"):
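The two versions differ only in where the closing parenthesis of empty(...) sits, but that changes what is tested. In the old form the file path and the flag comparison are joined by `or` before being passed to empty, and since a non-empty path string is truthy, the flag never takes part. The sketch below shows this with stand-ins: this `empty` only imitates what the real helper presumably does, and `flag` plays the role of eporcrlf2igh.

```python
import os
import tempfile


def empty(path):
    """Stand-in helper: True if the file is missing or has size 0."""
    return not os.path.isfile(path) or os.path.getsize(path) == 0


# A non-empty FASTQ file and a flag that is switched off.
demo = tempfile.NamedTemporaryFile(suffix=".fq", delete=False)
demo.write(b"@read1\nACGT\n+\nIIII\n")
demo.close()
path = demo.name
flag = False

# Old form: `path or flag == False` evaluates to `path`, so the flag is ignored.
buggy = empty(path or flag == False)
# New form: the flag is checked whenever the file is not empty.
fixed = empty(path) or flag == False

print(buggy, fixed)
```

With a non-empty file and the flag off, the old condition is False while the corrected one is True, so the two versions take different branches at that step.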

Very high memory usage in remove_adapter.py

I'm seeing very high memory usage in remove_adapter.py on an RNA-seq data set (2x151bp) of ~135M reads (I'm running with 16 threads): 37GB and counting. As adapter trimming essentially requires O(1) memory, I assume that this is a bug (maybe a memory "leak" from never releasing any of the buffers?)
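To illustrate the O(1) claim: a trimmer that reads one FASTQ record at a time only ever holds a single record in memory, regardless of how many reads the file contains. The sketch below is a generic illustration, not remove_adapter.py's algorithm. It matches the adapter exactly, whereas real trimmers tolerate mismatches, and the adapter string is just an example value.

```python
import io


def iter_fastq(handle):
    """Yield (header, sequence, quality) one record at a time from a FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), sequence, quality


def trim_adapter(sequence, quality, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    position = sequence.find(adapter)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


demo = io.StringIO(
    u"@r1\nACGTAGATCGGAAGAGC\n+\nIIIIIIIIIIIIIIIII\n"
    u"@r2\nTTTT\n+\nIIII\n"
)
trimmed = [trim_adapter(seq, qual, "AGATCGGAAGAGC")[0] for _, seq, qual in iter_fastq(demo)]
print(trimmed)
```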

File reads-ids_clip_bowtie2_psl_r1.fq.0 not found

I am running the most recent version of FusionCatcher on a paired-end RNA-seq data set of a cell line with about 50M fragments and a read length of 100bp. Although the test reads were running through successfully, I now see this error message on a real world sample.

////////////////////////////////////////////////////////////////////////////////
  Running: step = 422   Time: 18:07   Date: 2017-03-06 (elapsed time: 0d:5h:10m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/home/dnanexus'
head \
-4 /home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_r1.fq.0 \
> /home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_unmapped_filtered-0_1_t.fq
--------------------------------------------------------------------------------
+-->EXECUTING...
ERROR: Workflow execution failed at step 422 while executing:
----------------
   head \
   -4 /home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_r1.fq.0 \
   > /home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_unmapped_filtered-0_1_t.fq
----------------
  * Size '/home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_r1.fq.0' = 0 bytes
  * Size '/home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_unmapped_filtered-0_1_t.fq' = 0 bytes
Executing second time the same step/command in order to capture error messages (i.e. STDERR)...
-------------------------------------------
head: cannot open '/home/dnanexus/J1055.OCI-Ly3.27232_fusioncatcher/reads-ids_clip_bowtie2_psl_r1.fq.0' for reading: No such file or directory

Cannot allocate memory & threads

In some cases, FusionCatcher v0.99.5a failed to run (when using 4 threads while running find_homolog_genes.py) with the following error message:

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 302, in _handle_workers
    pool._maintain_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 206, in _maintain_pool
    self._repopulate_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 199, in _repopulate_pool
    w.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

It looks like it might be related to this issue:
http://stackoverflow.com/questions/26717120/python-cannot-allocate-memory-using-multiprocessing-pool

The workaround is to run FusionCatcher using only one thread (i.e. -p 1) until this is fixed.
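
The single-thread workaround can be sketched in code as follows. This is only an illustration, not FusionCatcher's implementation: `map_with_fallback` is a hypothetical helper that never forks when one process is requested (mirroring -p 1) and falls back to a serial loop if creating the worker pool raises an OSError. It only catches a failure at pool creation; the traceback above fails later, while the pool is being repopulated, which this sketch does not handle.

```python
import multiprocessing


def map_with_fallback(func, items, processes=1):
    """Apply func to items, using worker processes only when asked and possible."""
    items = list(items)
    if processes <= 1:
        # Serial path: os.fork() is never called, so it cannot fail with errno 12.
        return [func(item) for item in items]
    try:
        pool = multiprocessing.Pool(processes)
    except OSError:
        # Starting the workers failed (e.g. "Cannot allocate memory"): run serially.
        return [func(item) for item in items]
    try:
        return pool.map(func, items)
    finally:
        pool.close()
        pool.join()
```

When more than one process is used, `func` has to be a module-level function so that it can be sent to the workers.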

Could not locate a Bowtie index rtrna_mt_index

Hi Daniel,

I am using FusionCatcher version 0.99.4b. It ran fine on the test samples with the same Ensembl 79 build, but when I ran it on my RNA-seq data it stopped halfway with an error saying it could not locate the rtrna_mt_index Bowtie index. That index was not there when I ran the test samples either.

How can I skip some steps in FusionCatcher, for example the Bowtie alignment against rtrna? That would make it more user friendly.

How and where can I reduce the number of threads used by FusionCatcher? It is currently -p 64.

Here is my Info.txt, since I cannot attach a file:

https://code.google.com/p/fusioncatcher/wiki/Manual_v0_93#FUSIONCATCHER #
Issues can no longer be reported at the above link.

----------------

   bowtie \
   -t  \
   -v 2 \
   -p 64 \
   -k 1 \
   --solexa1.3-quals  \
   --tryhard  \
   --chunkmbs 128 \
   --un /MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads-filtered_temp.fq \
   --max /dev/null \
    /data/current/rtrna_mt_index/ \
    /MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads_acgt.fq \
    /dev/null \
   2> /MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/log_bowtie_reads-filtered-out.stdout.txt
----------------
  * Size '/MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads-filtered_temp.fq' = 0 bytes
  * Size '/data/current/rtrna_mt_index/' = 0 bytes
  * Size '/MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads_acgt.fq' = 27456195340 bytes
  * Size '/MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/log_bowtie_reads-filtered-out.stdout.txt' = 400 bytes
Could not locate a Bowtie index corresponding to basename "/data/current/rtrna_mt_index/"
Overall time: 00:00:00
Command: bowtie -t -v 2 -p 64 -k 1 --solexa1.3-quals --tryhard --chunkmbs 128 --un /MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads-filtered_temp.fq --max /dev/null /data/current/rtrna_mt_index/ /MGMSTAR1/SHARED/ANALYSIS/NGS_P1216/Fusions/MDB_RI_T19/reads_acgt.fq /dev/null 

----------

Found 40405 reads with poly-A/C/G/T/N tails (equal or more 20 repeat nucleotides)
Total number of input reads = 117508227
Total number of reads written in the output = 117495305
Count of all short reads after removing reads due to missing their mate read:
-----------------------------------------------------------------------------
104396180
Adjusted automatically mismatches_psl and trim_psl_3end_keep (105,3) because reads of maximum length of 125 bp were found!.

What needs to be done to continue with the remaining steps?

Thank you
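
Bowtie prints "Could not locate a Bowtie index" when the index files for the given basename are missing, and the log above reports 0 bytes for '/data/current/rtrna_mt_index/'. A quick check before rerunning could look like the sketch below. `bowtie_index_present` is a hypothetical helper, and it assumes the usual Bowtie 1 layout in which the index is six files named basename plus a suffix, ending in .ebwt (or .ebwtl for large indexes).

```python
import os

# Suffixes of the six files that make up a Bowtie 1 index.
INDEX_SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")


def bowtie_index_present(basename):
    """Return True if a complete small (.ebwt) or large (.ebwtl) index exists."""
    for extension in (".ebwt", ".ebwtl"):
        if all(os.path.isfile(basename + s + extension) for s in INDEX_SUFFIXES):
            return True
    return False
```

If the check fails for the basename shown in the log, the data build is probably incomplete and needs to be downloaded or built again.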

fusioncatcher_v0.99.7a: version of the data build does not match the version of this pipeline

I can't run fusioncatcher_v0.99.7b due to the libtbb issue with Bowtie. Thus, I decided to use fusioncatcher_v0.99.7a. The install finished fine. Then, I ran test/test.sh and got a warning:

WARNING: This is an OLD version of FusionCatcher! There is a newer
         version available! Please, update to the newest version!
 - Current version:   0.99.7a beta
 - New version:       0.99.7b beta

Makes sense, but it kept going.

When it gets to the data directory, it throws an error:

....................
ERROR: The version of the data build does not match the version of this pipeline!
Please, run again the 'fusioncatcher-build.py' in order to fix this!
....................
sort: open failed: test_fusioncatcher/summary_candidate_fusions.txt: No such file or directory

What version of the data and the pipeline should it be? My data directory is 86 and the pipeline should be 86 as well. Is there any way I can override that?

padding-fastq.py ValueError: min() arg is an empty sequence

error produced with:

python $EBROOTFUSIONCATCHER/bin/padding-fastq.py    --input infolder/S2_L001_R1_001.fastq    --output outfolder/S2_L001_R1_001.fastq    --size 151 
Traceback (most recent call last):
  File "padding-fastq.py", line 133, in <module>
    qual = min([min(data[j][:-1]) for j in xrange(len(data)) if (j+i)%4 == 3])
ValueError: min() arg is an empty sequence

After a bit of experimenting I found out that changing the buffer size variable solves this problem:

old:

    buffer_size = 10**8

new suggested buffer size:

    buffer_size = 10**7

You ask me why:
I don't know. At least I can tell it is not because of the maximum array length:

>>> import sys
>>> print sys.maxsize
9223372036854775807

PS: for me 6/8 runs fail on this with buffer_size = 10**8 and 2/8 runs fail on this with buffer_size = 10**7
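
The traceback shows min() being called on an empty list, which happens when a buffered chunk contains no line at a quality position. A guarded version of that expression might look like the sketch below. This is not the author's fix: `lowest_quality_char` is a hypothetical helper, and it assumes that `i` in the traceback is the index of the chunk's first line within the file (called `first_line_index` here).

```python
def lowest_quality_char(lines, first_line_index=0):
    """Return the smallest quality character in a chunk of FASTQ lines, or None."""
    quals = []
    for j, line in enumerate(lines):
        text = line.rstrip("\r\n")
        # Every fourth line of a FASTQ record (position 3 of 0..3) holds qualities.
        if (j + first_line_index) % 4 == 3 and text:
            quals.append(min(text))
    # An empty chunk, or one without quality lines, yields None instead of raising.
    return min(quals) if quals else None
```

The caller can then skip a chunk that returns None rather than crash.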

Wrong version of BOWTIE found

Hi,

So I install fusioncatcher and the data separately like this:

wget http://sf.net/projects/fusioncatcher/files/bootstrap.py -O $bin/bootstrap.py
python2.6 $bin/bootstrap.py --install-all --yes -t
bash $bin/fusioncatcher/fusioncatcher_v0.99.7b/bin/download.sh

The installation did not throw any errors but when I went to test the installation:

/mnt/isilon/cbmi/variome/bin/fusioncatcher/test/test.sh

It gave me test not ok:

ERROR: Wrong version of BOWTIE found (/mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie/bowtie)! It should be 'bowtie-align version 1.2'. One may specify the path to the correct version in 'fusioncatcher/etc/configuration.cfg'.

sort: open failed: test_fusioncatcher/summary_candidate_fusions.txt: No such file or directory

The one that is in the etc/configuration.cfg file is:

$ head configuration.cfg 
[paths]
python = /usr/bin
data = /mnt/isilon/cbmi/variome/bin/fusioncatcher/data/current
scripts = /mnt/isilon/cbmi/variome/bin/fusioncatcher/bin
bowtie = /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie
bowtie2 = /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie2

Here are all the bowtie installations:

$ ls /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/
1.0-r82b         biopython-1.68           bowtie-1.2-linux-x86_64.zip     liftover                  setuptools                       sratoolkit.2.6.2-centos_linux64.tar.gz
1.0-r82b.tar.gz  biopython-1.68.tar.gz    bowtie2                         openpyxl                  setuptools-20.9.0                star
2.5.2b           blat                     bowtie2-2.2.9-linux-x86_64      openpyxl-2.4.0-a1         setuptools-20.9.0.tar.gz         xlrd
2.5.2b.tar.gz    bowtie                   bowtie2-2.2.9-linux-x86_64.zip  openpyxl-2.4.0-a1.tar.gz  sratoolkit                       xlrd-0.9.4
biopython        bowtie-1.2-linux-x86_64  fatotwobit                      seqtk                     sratoolkit.2.6.2-centos_linux64  xlrd-0.9.4.tar.gz

$ ls /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie-1.2-linux-x86_64/
AUTHORS                 bowtie-align-s          bowtie-build-l-debug    bowtie-inspect-l        doc/                    MANUAL                  scripts/
bowtie                  bowtie-align-s-debug    bowtie-build-s          bowtie-inspect-l-debug  genomes/                MANUAL.markdown         SeqAn-1.1/
bowtie-align-l          bowtie-build            bowtie-build-s-debug    bowtie-inspect-s        indexes/                NEWS                    TUTORIAL
bowtie-align-l-debug    bowtie-build-l          bowtie-inspect          bowtie-inspect-s-debug  LICENSE                 reads/                  VERSION

However, none of the bowtie installations were ok because when I try to print out the version, it throws me this error:

$ /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie/bowtie -version
bowtie-align: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

$ /mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/bowtie-1.2-linux-x86_64/bowtie --version
bowtie-align: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

Can you suggest a possible solution for this?

fusioncatcher-build failing for non-human organism

fusioncatcher-build v0.99.4c fails for non-human organism!

This is a bug and is due to latest changes in Ensembl release 81. This bug does not exist when Ensembl release 80 is used.

This will be fixed in next release of FusionCatcher (that is v0.99.4d).

The workaround until the next release is made publicly available is:

Manually change line no. 127 in file "get_gtf.py" from:

        list_files = [el for el in list_files if el.lower().startswith(options.organism.lower()) and el.lower().endswith('.gtf.gz') ]

into:

        list_files = [el for el in list_files if el.lower().startswith(options.organism.lower()) and el.lower().endswith('.gtf.gz') and el.lower().find('abinitio') == -1]

Manually change line no. 265 in file "add_custom_gene.py" from:

    if head.startswith("ENSG"):

into:

    if head.startswith("ENS"):
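
The two changes above can be tried in isolation with the sketch below. The function names `select_gtf_files` and `is_ensembl_gene_id` are hypothetical wrappers added for illustration; only the conditions inside them come from the workaround.

```python
def select_gtf_files(list_files, organism):
    """Keep the organism's GTF files, excluding the 'abinitio' ones."""
    organism = organism.lower()
    return [
        el for el in list_files
        if el.lower().startswith(organism)
        and el.lower().endswith(".gtf.gz")
        and el.lower().find("abinitio") == -1
    ]


def is_ensembl_gene_id(head):
    """Accept any 'ENS' prefix (e.g. mouse ENSMUSG...), not only human 'ENSG'."""
    return head.startswith("ENS")
```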

incorrect prediction of fusion junction sequences

In some rare cases the sequence of the fusion junction is predicted incorrectly even though the genomic coordinates of the fusion are correct.

An incorrect sequence of fusion junction (from file final-list_candidate-fusion-genes.txt) looks like this:

AAAACCACAATGAGATACCATCTCACACCAGTTAGAATGGTGATCATTAA*AACCACAATGAGATACCATCTCACACCAGTTAGAATGGTGATCATTAA

which actually looks like a duplicated sequence:

AAAACCACAATGAGATACCATCTCACACCAGTTAGAATGGTGATCATTAA*
  AACCACAATGAGATACCATCTCACACCAGTTAGAATGGTGATCATTAA
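
Such cases can be flagged when post-processing the output. The sketch below is a hypothetical check, not part of FusionCatcher: it assumes the junction is written as two sequences separated by '*' and reports a likely duplication when the shorter side occurs verbatim inside the longer one. The `min_length` threshold is an arbitrary assumption.

```python
def looks_duplicated(junction, min_length=20):
    """Return True if one side of a 'left*right' junction is contained in the other."""
    left, separator, right = junction.partition("*")
    if not separator:
        # No '*' marker found, so there is nothing to compare.
        return False
    shorter, longer = sorted((left, right), key=len)
    # Very short sides could match by chance, so require a minimum length.
    return len(shorter) >= min_length and shorter in longer
```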

test data have a mixture of 33 and 64 offset quality scores

The test data have a mixture of 33 and 64 offset quality scores.
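
One way to see the mixture is to guess the offset for each quality line separately. The sketch below uses a common heuristic and is only an assumption-laden illustration: characters below ';' (code 59) can only occur with offset 33, characters above 'J' (code 74) suggest offset 64, and anything in between is ambiguous. `guess_phred_offset` is a hypothetical helper.

```python
def guess_phred_offset(quality):
    """Guess 33 or 64 from one FASTQ quality string; None if it is ambiguous."""
    codes = [ord(ch) for ch in quality]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot be produced with an offset of 64.
        return 33
    if max(codes) > 74:
        # Characters this high are unusual for offset 33 data.
        return 64
    return None
```

Collecting the guesses over all records of a file shows whether both offsets are present.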

no parallel

I just upgraded to v0.99.4c (the one in bootstrap.py) from v0.99.4a. Running the latest version gives me the following error:

The workflow will be restarted from the beginning with step 1!
which: no parallel in ( [$PATH] )
....................
ERROR: The version of the data build does not match the version of this pipeline!
Please, run again the 'fusioncatcher-build.py' in order to fix this!

Is this the right parallel? Should it be included in tools directory?
http://www.gnu.org/software/parallel/

fusioncatcher-build fails for mouse data

Greetings,

fusioncatcher-build seems to fail on step 12 for mouse data:

ERROR: Workflow execution failed at step 12 while executing:
----------------
   get_exons_positions.py \
   --organism mus_musculus \
   --server www.ensembl.org \
   --threshold_length 150 \
   --output /mnt/disks/sandbox/fusioncatcher/data/v85/
----------------


Executing second time the same step/command in order to capture error messages (i.e. STDERR)...

-------------------------------------------
Traceback (most recent call last):
  File "/mnt/disks/sandbox/fusioncatcher/bin/get_exons_positions.py", line 281, in <module>
    data = set([(line[1],line[8],line[7],line[11],line[12]) for line in data])
IndexError: list index out of range

The full output may be found here. Any ideas? Thank you in advance.

wrong version Bowtie or STAR or ...

In rare cases, FusionCatcher v0.99.6a (and also older versions?) reports "ERROR: wrong version of Bowtie found" (or even wrong version of STAR?).

It looks like this happens, when:

the paths specified in the fusioncatcher/etc/configuration.cfg file have priority over the paths from .bashrc, and
some path other than the bowtie path also contains another version of bowtie (for example, the path for java in fusioncatcher/etc/configuration.cfg also contains an old version of the bowtie executable)

The workaround for now is to remove the older bowtie executable from the paths that collide with the bowtie executable which is needed by FusionCatcher.
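
To find the colliding executables, one can list every copy of bowtie reachable through a set of directories. The sketch below is a hypothetical helper (`find_all_executables` is not part of FusionCatcher); the search path passed to it would be the directories from configuration.cfg and from PATH, joined with the platform's path separator.

```python
import os


def find_all_executables(name, search_path):
    """Return every executable called name found in the given PATH-like string."""
    hits = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Skip empty entries and anything that is not an executable file.
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits
```

For example, `find_all_executables("bowtie", os.environ["PATH"])` lists the copies visible to the shell, in the order in which they would be picked up.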

bowtie-build: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or directory

Hi,

I am running the FusionCatcher installation script on an Ubuntu 14.04 node (DNAnexus) with these packages pre-installed (build-essential, openjdk-7-jre, python2.7, python2.7-dev, gcc, libghc-zlib-dev, python-numpy, python-biopython, python-xlrd, python-openpyxl, dx-toolkit). All dependencies install nicely, however towards the end I see this error message claiming that a shared library is not found. I am now testing whether installing libtbb2 fixes the problem.

bowtie-build: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or director

Daniel

Failure at seqtk due possibly to memory.

I have run FC on a panel of normals / tumors using the following command:

fusioncatcher -d /data/fusioncatcher_library/ --input /data/tumor/ --normal /data/normal/ --start=617 --output /data/somatic/ -z --visualization-psl --assemble --threads=8

After running for about a week (8 cores, 32gb), I received an error:

ERROR: Workflow execution failed at step 617 while executing:

   seqtk \
   subseq \
   /data/fusioncatcher/somatic/tumor/reads-filtered.fq \
   /data//fusioncatcher/somatic/tumor/list-names-reads-filtered_genome.txt \
   > /data/fusioncatcher/somatic/tumor/reads_filtered_unique-mapped-genome.fq

ERROR: Most likely this fails because there is not enough free RAM memory for running SEQTK SUBSEQ tool https://github.com/lh3/seqtk on this computer. Please, try to (i) run it on a server/computer with larger amount of memory, or (ii) using command line option '--no-seqtk-subseq' !

Size '/data/fusioncatcher/somatic/tumor/reads-filtered.fq' = 100998414874 bytes
Size '/data/fusioncatcher/somatic/tumor/list-names-reads-filtered_genome.txt' = 3772297216 bytes
Size '/data/fusioncatcher/somatic/tumor/reads_filtered_unique-mapped-genome.fq' = 0 bytes

get_synonyms.py fails to handle mouse data

Hi,

I tried to download mouse data by using fusioncatcher-build;

fusioncatcher-build -g mus_musculus -o .

however I got error:

////////////////////////////////////////////////////////////////////////////////
  Running: step = 9   Time: 10:48   Date: 2017-06-08 (elapsed time: 0d:0h:8m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/gpfs/gsfs4/users/CCBR/projects/ccbr757/fc'
get_synonyms.py \
--organism mus_musculus \
--server ftp.ensembl.org \
--output /gpfs/gsfs4/users/CCBR/projects/ccbr757/fc/
--------------------------------------------------------------------------------
+-->EXECUTING...
Downloading the MySQL files of organism 'mus_musculus' from Ensembl!
230 Login successful.
ERROR: '/pub/current_mysql/mus_musculus_core_' not found!

The reason is that more than one item starts with mus_musculus_core_89_38.

I fixed line 139 to:

if organism_dir and len(organism_dir) >= 1:

Then there is another bug at line 192, where I added "and parts[0].find('Table structure') == -1".
However, the error KeyError: None still occurs because last_name is empty.

I don't want to change your code too much, so please take a look at this error.

Thanks

Jack

Bowtie needs libtbb2

I ran bootstrap.py last Friday on Ubuntu 16.04, but when I tried it out today like so

./fusioncatcher/fusioncatcher_v0.99.7b/bin/fusioncatcher \
-z \
-i ../../data/sim31_data \
-d ./fusioncatcher/data/ensembl_v86 \
-o ./output

The program exited with

#bowtie-align: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
#ERROR: Wrong version of BOWTIE found (/mnt/fusion_callers/exp/4.fusionCatcher/fusioncatcher/tools/bowtie/bowtie)! It should be 'bowtie-align version 1.2'. One may specify the path to the correct version in 'fusioncatcher/etc/configuration.cfg'

This could be solved with a sudo apt-get install libtbb2

Bowtie info:

>/mnt/fusion_callers/exp/4.fusionCatcher/fusioncatcher/tools/bowtie/bowtie --version
bowtie-align version 1.2
64-bit
Built on mint
Tue Dec 27 17:03:06 UTC 2016
Compiler: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
Options: -O3 -m64  -Wl,--hash-style=both -DWITH_TBB -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
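
Whether a TBB library is visible to the system can be checked before running the pipeline. The sketch below relies on the standard `ctypes.util.find_library` function; `library_available` is a hypothetical wrapper. A positive result only means that some libtbb was found, not necessarily the libtbb.so.2 version that Bowtie asks for in the error message.

```python
import ctypes.util


def library_available(name):
    """Return True if the system can locate a shared library called lib<name>."""
    # find_library returns the library file name, or None when nothing is found.
    return ctypes.util.find_library(name) is not None
```

Calling `library_available("tbb")` before and after installing libtbb2 shows whether the installation made the library visible.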

bowtie-build: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or directory

In some cases Bowtie 1.2 would raise this error:

bowtie-build: error while loading shared libraries: libtbbmalloc_proxy.so.2: cannot open shared object file: No such file or directory

on older RedHat/glibc.

How to get fusion gene fasta sequence for validation

As asked in the title, how to get fusion gene sequence for validation (RT-PCR)?

Error while downloading ensembl_v86.tar.gz.aa

Hi,

I am trying to install fusioncatcher using:

wget http://sf.net/projects/fusioncatcher/files/bootstrap.py -O bootstrap.py
python2.6 bootstrap.py --install-all --yes -t --download

--2017-03-12 14:55:13--  https://superb-sea2.dl.sourceforge.net/project/fusioncatcher/data/ensembl_v86.tar.gz.aa
Resolving superb-sea2.dl.sourceforge.net... 209.160.57.180
Connecting to superb-sea2.dl.sourceforge.net|209.160.57.180|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5169479680 (4.8G) [application/octet-stream]
Saving to: “/home/rathik/tools/fusioncatcher/fusioncatcher/data/ensembl_v86.tar.gz.aa”

     0K .......... .......... .......... .......... ..........  0%  131K 10h40m
    50K .......... .......... .......... .......... ..........  0%  187K 9h5m
   100K .......... .......... .......... .......... ..........  0%  388K 7h15m
   150K .......... .......... .......... .......... ..........  0%  469K 6h11m
   200K .......... .......... .......... .......... ..........  0%  312K 5h51m
...
3832950K .......... .......... .......... .......... .......... 75%  885K 2h28m
3833000K .......... .......... .......... .......... .......... 75%  349K 2h28m
3833050K .......... .......... .......... .......... .......... 75%  372K 2h28m
3833100K .......... .......... .......... .......... .......... 75%  384K 2h28m
3833150K .......... .......... .......... .......... .......... 75%  306K 2h28m
3833200K .......... .......... .......... .......... .......... 75%  387K 2h28m
3833250K .......... .......... .......... .......... .......... 75%  396K 2h28m
3833300K .......... .......... .......... .......... .......... 75%  317K 2h28m
3833350K .......... .......... .......... .......... .......... 75%  374K 2h28m
3833400K .......... .......... ...

But it breaks at this point. I have tried twice, and both times it broke after downloading around 75%. Can you suggest a workaround for this?
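
One possible workaround is to fetch the file separately and resume the partial download instead of starting over. The sketch below is written for Python 3 and is not part of bootstrap.py: `range_header` and `resume_download` are hypothetical helpers, and the approach assumes the server honours HTTP Range requests.

```python
import os
import urllib.request


def range_header(existing_bytes):
    """Build the Range header that asks for everything after existing_bytes."""
    if existing_bytes > 0:
        return {"Range": "bytes=%d-" % existing_bytes}
    return {}


def resume_download(url, dest, chunk_size=1 << 20):
    """Download url to dest, continuing from a partial file when possible."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    with urllib.request.urlopen(request) as response:
        # Status 206 means the server sent only the missing part; else start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                block = response.read(chunk_size)
                if not block:
                    break
                out.write(block)
```

If the file is already complete, the server may answer the Range request with an error, which `urlopen` raises as an exception. On the command line, wget's -c option continues a partial download in a similar way.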

mouse build fails for v0.99.6a

fusioncatcher-build -g mus_musculus -o fusioncatcher/data/mus_musculus

The error log showed this:

--------------------------------------------------------------------------------------------
ERROR: Workflow execution failed at step 50 while executing:
----------------
   shield_genes.py \
   --organism mus_musculus \
   --read-len 500 \
   --output fusioncatcher/data/mus_musculus/ \
   --pseudo-genes-check
----------------


Executing second time the same step/command in order to capture error
messages (i.e. STDERR)...

-------------------------------------------
Traceback (most recent call last):
  File "fusioncatcher/bin/shield_genes.py", line 370, in <module>
    gis = set(gene_ids)
NameError: name 'gene_ids' is not defined

Saccharomyces_cerevisiae not building

$ fusioncatcher-build -g Saccharomyces_cerevisiae -o yeastData

Here is a relevant excerpt from the output of that command:

+-->EXECUTING...
Downloading the genome of organism 'saccharomyces_cerevisiae' from Ensembl!
230 Anonymous access granted, restrictions apply
ERROR: Cannot find the genome and mitochondria files!

ERROR: Workflow execution failed at step 4 while executing:

   get_genome.py \
   --organism Saccharomyces_cerevisiae \
   --server ftp.ensembl.org \
   --output /scratchl/yeastData/

I presume the problem is a missing file ending in .dna.chromosome.mt.fa.gz at
ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/dna/

But, there is a file named Saccharomyces_cerevisiae.R64-1-1.dna.chromosome.Mito.fa.gz
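
A file-name match that accepts both spellings could look like the sketch below. This is only an illustration of the idea, not the logic used by get_genome.py: `MITO_PATTERN` and `find_mito_files` are hypothetical names, and the pattern assumes the mitochondrial chromosome is labelled either "MT" or "Mito", in any letter case.

```python
import re

# Matches names ending in .dna.chromosome.MT.fa.gz or .dna.chromosome.Mito.fa.gz
MITO_PATTERN = re.compile(r"\.dna\.chromosome\.(mt|mito)\.fa\.gz$", re.IGNORECASE)


def find_mito_files(filenames):
    """Return the file names that look like a mitochondrial chromosome FASTA."""
    return [name for name in filenames if MITO_PATTERN.search(name)]
```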

FusionCatcher v0.99.4c installs wrong version of STAR

FusionCatcher v0.99.4c installs an incorrect version of STAR!
STAR 2.4.1d should have been installed instead of STAR 2.4.1c by bootstrap.py!

The error message shown by STAR and FusionCatcher in this case looks approximately like this:

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=50 is not
equal to the value at the genome generation step =100
SOLUTION:

Jun 25 17:56:53 ...... FATAL ERROR, exiting

Fusioncatcher uses both STAR and BLAT

Hi,

Does fusioncatcher use BLAT and STAR both by default? I saw that --skip-blat and --skip-star both are set to FALSE by default. Is there a way to make fusioncatcher run faster than it already does? I am running it on 170 samples with an average file size of 6 GB. Each run is taking a lot of time:

fusioncatcher-batch.py -p 10 -d $fusioncatcher/data/current --input samples.txt --output outdir

Thanks,
Komal

seqtk trimfq error

When I run the test data with default options I get the error message "trimfq: invalid option -- 'B'". If I add the option --5keep 0 to stop trimfq, everything is OK (so far). I believe the following needs amending:

fusioncatcher.py line 2865.
job.add('-B',options.trim_3end_keep,kind='parameter')

should be?
job.add('-b',options.trim_3end_keep,kind='parameter')

fusioncatcher looks for bowtie in wrong place

(root installed) fusioncatcher dies with Wrong version of BOWTIE found due to this:

os.system("bowtie --version | head -1 > '%s'" % (outdir('bowtie_version.txt'),))

This checks if bowtie is in $PATH, not if it is in ~/fusioncatcher/tools/bowtie2, which is where ~/fusioncatcher/fusioncatcher_v0.99.7b/etc/configuration.cfg says it is.

The same is probably true for the system call to fastq-dump and all subsequent os.system calls

I printed sys.path right before the check and found:
/opt/fusioncatcher/fusioncatcher_v0.99.7b/bin:/usr/lib/python2.7:/usr/lib/python2.7/plat-x86_64-linux-gnu:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload:/usr/local/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages:/usr/lib/python2.7/dist-packages/PILcompat:/usr/local/lib/python2.7/dist-packages/numpy:/opt/fusioncatcher/tools/biopython:/opt/fusioncatcher/bin
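
Note that sys.path is Python's module search path; the shell started by os.system looks up bowtie through the PATH environment variable (os.environ["PATH"]) instead. A way to avoid depending on PATH is to build the full executable path from configuration.cfg, as in the sketch below. It is written for Python 3, `tool_executable` is a hypothetical helper, and it assumes that each entry under [paths] is a directory containing an executable of the same name.

```python
import configparser
import os


def tool_executable(config_text, tool):
    """Return <directory from [paths]>/<tool> for the given configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The [paths] section maps a tool name to the directory that holds it.
    return os.path.join(parser.get("paths", tool), tool)
```

The returned path can then be used in the version check instead of the bare command name.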

BiopythonWarning: Import of C module failed

This is more of a warning than an error, but I wanted to know if I can do something about it.

This is my command:

fusioncatcher.py -p 10 -d $bin/fusioncatcher/data/current --input SKNAS --output SKNAS_out

The second warning about Biopython:

WARNING: Cannot restart automatically because the previous log file '/mnt/isilon/maris_lab/target_nbl_ngs/rathik/fusion-catcher/SKNAS_out/fusioncatcher.log' cannot be found!
The workflow will be restarted from the beginning with step 1!
/mnt/isilon/cbmi/variome/bin/fusioncatcher/tools/biopython/Bio/pairwise2.py:993: BiopythonWarning: Import of C module failed. Falling back to pure Python implementation. This may be slooow...
  'implementation. This may be slooow...', BiopythonWarning)

If it's going to be really slow, how can I fix this?

Installation error

Hi Daniel,
I'm trying to install fusioncatcher but am bumping into errors. First of all, our firewall seems to break the ensembl download so I had to resort to downloading files locally and then running the --local install. I get to liftOver and then hit an error:

Checking latest version of Ensembl database that is available...
2015-01-06 15:33:39 URL: ftp://ftp.ensembl.org/pub/ [6774] -> ".listing" [1]
   * Not found! (WARNING: Is the internet connection working?)
Obtaining the absolute path of the Python executable...
  * Ok! '/home/klrl262/miniconda/envs/fusioncatcher/bin/python' found!
Python used for installation of FusionCatcher: '/home/klrl262/miniconda/envs/fusioncatcher/bin/python'
Checking Python version...
  * Ok! Found Python version: 2.7
Checking if this environment is a 64-bit environment...
  * Ok! 64-bit environment found.

Installing FusionCatcher from <http://code.google.com/p/fusioncatcher/>
------------------------------------------------------------------------

Path for installation of FusionCatcher: '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher'
Installing tool 'FusionCatcher (fusion genes finder in RNA-seq data)' at '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/bin' from './fusioncatcher_v0.99.3e.zip'
    # mkdir -p /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/fusioncatcher_v0.99.3e.zip
    # cp ./fusioncatcher_v0.99.3e.zip /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/bin
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/fusioncatcher_v0.99.3e
    # unzip -o /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/fusioncatcher_v0.99.3e.zip -d /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher
    # ln -s /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/fusioncatcher_v0.99.3e /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/bin
    # chmod -R +rx /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/bin
  * Done!
Checking if the Python module named 'NumPy' is installed...
  * Ok! Python module 'NumPy' found at '/home/klrl262/miniconda/envs/fusioncatcher/lib/python2.7/site-packages/numpy'!
Checking if the Python module named 'BioPython' is installed...
  * Ok! Python module 'BioPython' found at '/home/klrl262/miniconda/envs/fusioncatcher/lib/python2.7/site-packages/Bio'!
Checking if the Python module named 'Xlrd' is installed...
  * Ok! Python module 'Xlrd' found at '/home/klrl262/miniconda/envs/fusioncatcher/lib/python2.7/site-packages/xlrd'!
Checking if 'BOWTIE (short read aligner)' is installed...
  * WARNING: Not found!
Installing tool 'BOWTIE (short read aligner)' at '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie' from './bowtie-1.1.1-linux-x86_64.zip'
    # mkdir -p /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64.zip
    # cp ./bowtie-1.1.1-linux-x86_64.zip /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64
    # unzip -o /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64.zip -d /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64
    # mv /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1 /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64
    # ln -s /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie-1.1.1-linux-x86_64 /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie
    # chmod -R +rx /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie
  * Done!
Checking if 'BOWTIE2 (short read aligner)' is installed...
  * WARNING: Not found!
Installing tool 'BOWTIE2 (short read aligner)' at '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2' from './bowtie2-2.2.4-linux-x86_64.zip'
    # mkdir -p /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64.zip
    # cp ./bowtie2-2.2.4-linux-x86_64.zip /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64
    # unzip -o /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64.zip -d /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64
    # mv /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4 /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64
    # ln -s /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2-2.2.4-linux-x86_64 /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2
    # chmod -R +rx /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/bowtie2
  * Done!
Checking if 'NCBI SRA Toolkit (sequence assembler for short reads)' is installed...
  * WARNING: Not found!
Installing tool 'NCBI SRA Toolkit (sequence assembler for short reads)' at '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit' from './sratoolkit.2.3.5-2-centos_linux64.tar.gz'
    # mkdir -p /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit.2.3.5-2-centos_linux64.tar.gz
    # cp ./sratoolkit.2.3.5-2-centos_linux64.tar.gz /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit
    # rm -rf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit.2.3.5-2-centos_linux64
    # tar --overwrite -xvzf /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit.2.3.5-2-centos_linux64.tar.gz -C /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools
    # ln -s /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit.2.3.5-2-centos_linux64 /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit
    # chmod -R +rx /home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat/fusioncatcher/tools/sratoolkit
  * Done!
Checking if 'LiftOver (Batch Coordinate Conversion)' is installed...
  * Found at '/home/klrl262/miniconda/envs/fusioncatcher/bin/fuscat'!
Traceback (most recent call last):
  File "bootstrap.py", line 1256, in <module>
    install = options.install_all or options.install_all_tools)
  File "bootstrap.py", line 791, in tool
    exit = False)
  File "bootstrap.py", line 360, in test_tool
    flag,r = cmd([[[exe,param],False]], exit = False, verbose = False)
  File "bootstrap.py", line 442, in cmd
    p = subprocess.Popen(c0, stdout = subprocess.PIPE, stderr = subprocess.STDOUT, shell = c1)
  File "/home/klrl262/miniconda/envs/fusioncatcher/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/klrl262/miniconda/envs/fusioncatcher/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Could you make the error output more verbose? It would be great to know which command causes the failure.

Further, it looks like bootstrap.py always tries to connect to Ensembl(?) as its very first step, doesn't know how to use a proxy and just waits for a long time before timing out (see first three lines above).

Thanks,
Miika
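
The request for more verbose errors could be met by naming the command whenever it cannot be started. The sketch below is a hypothetical wrapper, not the code in bootstrap.py; it reuses the same subprocess.Popen arguments that appear in the traceback above.

```python
import subprocess


def run_verbose(command):
    """Run command (a list of arguments) and name it if it cannot be started."""
    try:
        process = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
    except OSError as error:
        # Errno 2 alone does not say which executable was missing, so add it.
        raise RuntimeError("could not start %r: %s" % (command, error))
    output, _ = process.communicate()
    return process.returncode, output
```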

NameError running 0.99.4d beta

Hi,

Whilst running fusioncatcher (0.99.4d beta) I got the following error:

Traceback (most recent call last):
  File "/fusioncatcher/bin/fusioncatcher.py", line 3584, in <module>
    if len_reads > 40 and options.trim_wiggle:
NameError: name 'len_reads' is not defined

On line 3235, I find the following:

if os.path.exists(outdir('log_lengths_reads.txt')):
    len_reads = int(file(outdir('log_lengths_reads.txt'),'r').readline().rstrip('\r\n'))
    # reads shorter than this will be skipped from analysis, 34?
    minimum_length_short_read = len_reads

Obviously, log_lengths_reads.txt does not exist and len_reads is not being
set, but what is the real error? Is it an error for the file not to exist (in
which case this should be caught and reported earlier), or is there a sensible
default for len_reads? Either way, there is clearly a code path to the use of
len_reads when the condition isn't met.
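
Either behaviour asked about above can be made explicit. The sketch below is not the author's fix: `read_length_from_log` is a hypothetical helper that returns a caller-supplied default when the log file is missing, or fails immediately with a clear message when no default is given, so that `len_reads` is always defined before it is used.

```python
import os


def read_length_from_log(path, default=None):
    """Read the read length from the first line of path, or use the default."""
    if not os.path.exists(path):
        if default is None:
            # Fail here, with the file name, instead of a later NameError.
            raise IOError("missing read-length log: %s" % path)
        return default
    with open(path, "r") as handle:
        return int(handle.readline().rstrip("\r\n"))
```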

In some cases very large number of fusions reported by FusionCatcher v0.99.4c for reads longer than 130 bp

FusionCatcher v0.99.4c will report, in some cases, a very large number of fusions (e.g. several hundred) when the input contains reads longer than 130 bp.

This will be fixed in the next release.

The workaround is to use the command line option "--sonication=1000".
