mehdiborji / nanoranger
simplified cellranger for long-read data
License: MIT License
When running the 5p10XTCR example in a docker container the pipeline runs until the following point and then fails:
...<lines above cut>...
TRA chains: 156 (55.12%)
TRB chains: 127 (44.88%)
TCR_out/TCR_testrun_bcreads.fasta
TCR_out/TCR_testrun_ref/
Feb 19 22:24:35 ..... started STAR run
Feb 19 22:24:35 ... starting to generate Genome files
genomeGenerate.cpp:150:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file TCR_out/TCR_testrun_ref//genomeParameters.txt
Solution: check that the path exists and you have write permission for this file
It looks like there might be an error in genomeGenerate.cpp that puts an additional / character into the path for the output file.
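On POSIX systems a doubled slash inside a path is treated as a single separator, so `TCR_out/TCR_testrun_ref//genomeParameters.txt` should resolve to the same file as the single-slash form. STAR's own hint (the path exists and is writable) may therefore be the more likely cause, for example when the output directory sits on a read-only or root-owned Docker mount. A minimal Python sketch of that check follows; `check_output_dir` is a hypothetical helper and not part of nanoranger or STAR.

```python
import os


def check_output_dir(path):
    """Return (normalized_path, exists, writable) for a directory path.

    Hypothetical helper for illustration; not part of nanoranger or STAR.
    """
    normalized = os.path.normpath(path)  # collapses "ref//" to "ref"
    exists = os.path.isdir(normalized)
    # Creating a file inside a directory needs write and search permission.
    writable = exists and os.access(normalized, os.W_OK | os.X_OK)
    return normalized, exists, writable


if __name__ == "__main__":
    # Directory taken from the error message above.
    print(check_output_dir("TCR_out/TCR_testrun_ref/"))
```

If the second or third value comes back `False` inside the container, the directory itself is the problem rather than the extra slash.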
Hello,
Thank you for this tool. I have a 5' 10x library sequenced with Nanopore sequencing. I previously used JAFFAL to recover known fusions from single-cell data, which works quite well, and I wanted to run your fusion detection pipeline with a FASTA file to see how it performs. However, I encounter this error message on my own data:
alignment to genome and generation of BC-UMI-Transcript tagged BAM
cores = 20
ref = /home/user/nanoranger/FUSION_SEQUENCE.fa
infile= FUSION_TEST/fusion_deconcat.fastq.gz
outdir = FUSION_TEST
sample = fusion
[M::mm_idx_gen::0.001*1.50] collected minimizers
[M::mm_idx_gen::0.001*5.99] sorted minimizers
[M::main::0.001*5.96] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.001*5.82] mid_occ = 15
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.002*5.70] distinct minimizers: 626 (98.72% are singletons); average occurrences: 1.032; average spacing: 2.913; total length: 1882
[M::worker_pipeline::0.734*16.79] mapped 103327 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -aY --eqx -x splice -t 20 --secondary=no --sam-hit-only /home/user/nanoranger/FUSION_SEQUENCE.fa FUSION_TEST/fusion_deconcat.fastq.gz
[M::main] Real time: 0.738 sec; CPU: 12.330 sec; Peak RSS: 0.053 GB
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
number of genome aligned reads = 4693
10000 barcode candidates processed
20000 barcode candidates processed
30000 barcode candidates processed
40000 barcode candidates processed
50000 barcode candidates processed
60000 barcode candidates processed
70000 barcode candidates processed
80000 barcode candidates processed
number of short UMI reads = 250
20000 Read-BC-UMI-Transcript tuples saved
40000 Read-BC-UMI-Transcript tuples saved
60000 Read-BC-UMI-Transcript tuples saved
rm: cannot remove 'FUSION_TEST/fusion_matching_*': No such file or directory
Surprisingly, I encounter the same error with the test data:
alignment to genome and generation of BC-UMI-Transcript tagged BAM
cores = 8
ref = /home/user/nanoranger/data/RUNX1_RUNX1T1_ABL1_BCR.fa
infile= K562_Kasumi1/fusion_deconcat.fastq.gz
outdir = K562_Kasumi1
sample = fusion
[M::mm_idx_gen::0.001*1.89] collected minimizers
[M::mm_idx_gen::0.001*2.31] sorted minimizers
[M::main::0.001*2.30] loaded/built the index for 7 target sequence(s)
[M::mm_mapopt_update::0.002*2.22] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 7
[M::mm_idx_stat::0.002*2.17] distinct minimizers: 2164 (96.63% are singletons); average occurrences: 1.035; average spacing: 2.973; total length: 6656
[M::worker_pipeline::0.050*6.04] mapped 3152 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -aY --eqx -x splice -t 8 --secondary=no --sam-hit-only /home/user/nanoranger/data/RUNX1_RUNX1T1_ABL1_BCR.fa K562_Kasumi1/fusion_deconcat.fastq.gz
[M::main] Real time: 0.050 sec; CPU: 0.303 sec; Peak RSS: 0.010 GB
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
number of genome aligned reads = 2883
number of short UMI reads = 4
rm: cannot remove 'K562_Kasumi1/fusion_matching_*': No such file or directory
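The final `rm` line in both logs is what a shell prints when a glob such as `fusion_matching_*` matches no files and the literal pattern is handed to `rm`. Whether this is the actual failure or only a cleanup message cannot be determined from the log alone. As a sketch of a cleanup step that tolerates an empty match, assuming the intent is to delete per-chunk intermediate files (the helper name `remove_matching` is hypothetical and not taken from the nanoranger source):

```python
import glob
import os


def remove_matching(pattern):
    """Delete every file matching a glob pattern; return the removed paths.

    An empty match is not an error, unlike a bare `rm pattern` in the shell.
    Hypothetical helper; not taken from the nanoranger source.
    """
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Pattern taken from the log above; an empty list means nothing matched.
    print(remove_matching(os.path.join("FUSION_TEST", "fusion_matching_*")))
```

In a shell script, `rm -f` gives the same tolerance for missing files.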
Here is my working environment
The files present in the output directory for my data so far are:
fusion_barcode_scores.csv fusion_barcode_scores.pdf fusion_bcumi_dedup.csv fusion_BCUMI.fasta.gz fusion_deconcat.fastq.gz fusion_genome_tagged.bam fusion_genome_tagged.bam.bai fusion_knee.pdf fusion_matching.sam fusion_trns_ct.csv
I was hoping for an output file listing the reads, their barcodes, and the presence of the fusion, but I'm not sure any of these files contains that. Do you have a wiki describing the output files and their contents? I guess I need to use fusion_gene.py in the downstream folder under scripts, but I am unsure which arguments it needs.
Also, regarding the scripts you provide, which one performs the extraction of the 10x barcodes? I saw that there are two bash scripts, barcode_align.sh and barcode_ref.sh, so I imagine those are the two that are called, right?
Thank you for your help,
Evan
Hi, thanks for developing such a good tool.
If convenient, I am looking forward to the part for the 10x Genomics Chromium 3' library, and I am very curious about when it will be uploaded.
Thanks!
When I run 5p10XTCR with the example data TCR3.fastq.gz on my macOS system, I get the error:
"Traceback (most recent call last):
  File "/Users/Home/nanoranger/pipeline.py", line 236, in <module>
    utils.process_matching_5p10XTCR(sample,outdir)
  File "/Users/Home/nanoranger/utils.py", line 733, in process_matching_5p10XTCR
    scores=sort_cnt(all_AS[all_AS[:,1]==0][:,0])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed"
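NumPy raises this IndexError when a two-index expression such as `all_AS[:,1]` is applied to a 1-D array. One way this can happen, assuming `all_AS` is built with `np.array` from a list of two-element rows, is that the list is empty, which gives shape `(0,)` rather than `(0, 2)`. A minimal sketch of a guard under that assumption follows; `as_score_table` is a hypothetical helper, and the actual cause inside `utils.py` is not confirmed here.

```python
import numpy as np


def as_score_table(rows, n_cols=2):
    """Convert a list of fixed-width rows to a 2-D array, even when empty.

    Hypothetical helper; assumes every row holds n_cols values.
    """
    return np.asarray(rows).reshape(-1, n_cols)


if __name__ == "__main__":
    empty = np.array([])        # shape (0,): empty[:, 1] raises IndexError
    table = as_score_table([])  # shape (0, 2): two-index slicing is valid
    print(empty.shape, table.shape)
    # Same indexing pattern as the failing line, now an empty selection.
    print(table[table[:, 1] == 0][:, 0])
```

If the array really is empty at that point, the guard only avoids the crash; the upstream question of why no alignment scores were collected on macOS would still need checking.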