star-seqr's Introduction

STAR-SEQR

RNA Fusion Detection and Quantification using STAR.

Post-alignment run times are typically <20 minutes using 4 threads. Development is still ongoing and several features are currently in the works. DNA breakpoint detection is still experimental.

Installation

This package is tested under Linux using Python 2.7, 3.4, 3.5, and 3.6.

You can install from PyPI. Please use a recent version of pip and Cython:

pip install -U pip
pip install -U cython
pip install starseqr

Or build directly from GitHub by cloning the project, changing into the directory, and running:

python setup.py install

Or from Docker:

docker pull eagenomics/starseqr

Or from Bioconda:

conda install -c bioconda starseqr

Additional Requirements

Build a STAR Index

First make sure the dependencies are installed and generate a STAR index for your reference.

RNA Index

STAR --runMode genomeGenerate --genomeFastaFiles hg19.fa --genomeDir STAR_SEQR_hg19gencodeV24lift37_S1_RNA --sjdbGTFfile gencodeV24lift37.gtf --runThreadN 18 --sjdbOverhang 150 --genomeSAsparseD 1

Run STAR-SEQR

STAR-SEQR can perform alignment or use existing outputs from STAR. Note: STAR-SEQR's alignment parameters have been tuned for fusion calling.

Python on OS

starseqr.py -1 RNA_1.fastq.gz -2 RNA_2.fastq.gz -m 1 -p RNA_test -t 12 -i path/STAR_INDEX -g gencode.gtf -r hg19.fa -vv

CWL

Note that --name_prefix must be a string basename in this case.

cwltool ~/path/STAR-SEQR/devtools/cwl/starseqr_v0.6.6.cwl --fq1 /path/UHRR_1_2_5m_L4_1.clipped.fastq.gz --fq2 /path/UHRR_1_2_5m_L4_2.clipped.fastq.gz --star_index_dir /path/gencodev25lift37/STAR_INDEX --name_prefix test_cwl --transcript_gtf /path/gencodev25/gencode.v25lift37.annotation.gtf --genome_fasta /path/gencodev25/GRCh37.primary_assembly.genome.fa --mode 1 --worker_threads 8

DOCKER

Note that -p must be a fully qualified path in this case.

docker run -it -v /mounts:/mounts eagenomics/starseqr:0.6.5 starseqr.py -1 /mounts/path/UHRR_1_2_5m_L4_1.clipped.fastq.gz -2 /mounts/path/UHRR_1_2_5m_L4_2.clipped.fastq.gz -p /mounts/path/test_docker  -i /mounts/path/gencodev25lift37/STAR_INDEX -g /mounts/path/gencodev25/gencode.v25lift37.annotation.gtf  -r /mounts/path/gencodev25/GRCh37.primary_assembly.genome.fa -m 1 -vv

Outputs

A BEDPE file is produced that is compatible with the SMC-RNA DREAM Challenge.

Breakpoints.txt and Candidates.txt have the following columns:

Values Description
NAME Gene Symbols for left and right fusion partners
NREAD_SPANS The number of discordant spanning read pairs that support the fusion
NREAD_JXNLEFT The number of paired reads that are anchored on the left side of the gene fusion
NREAD_JXNRIGHT The number of paired reads that are anchored on the right side of the gene fusion
FUSION_CLASS Classification of fusion based on chromosomal location, distance and strand. [GENE_INTERNAL, TRANSLOCATION, READ_THROUGH, INTERCHROM_INVERTED, INTERCHROM_INTERSTRAND]
SPLICE_TYPE Classification of the fusion breakpoint. CANONICAL if the breakpoint is on an exon boundary, otherwise NON-CANONICAL
BRKPT_LEFT The 0-based genomic position of the fusion breakpoint for the left gene partner
BRKPT_RIGHT The 0-based genomic position of the fusion breakpoint for the right gene partner
LEFT_SYMBOL The left gene symbol
RIGHT_SYMBOL The right gene symbol
ANNOT_FORMAT The description of keys that are used in the ANNOT column. Similar to VCF FORMAT notation.
LEFT_ANNOT The values described in the ANNOT_FORMAT column for the left gene breakpoint
RIGHT_ANNOT The values described in the ANNOT_FORMAT column for the right gene breakpoint
DISTANCE The genomic distance between breakpoints. Empty if a translocation.
ASSEMBLED_CONTIGS The velvet assembly of the supporting chimeric reads
ASSEMBLY_CROSS_JXN A boolean value indicating if the assembly crosses the putative breakpoint
PRIMERS Primers left, right designed against the highest expressing predicted fusion transcript
ID Internal notation of STAR-SEQR breakpoints.
SPAN_CROSSHOM_SCORE Homology score with range of [0-1] to indicate the probability of spanning chimeric reads mapping to both gene partners
JXN_CROSSHOM_SCORE Homology score with range of [0-1] to indicate the probability of junction chimeric reads mapping to both gene partners
OVERHANG_DIVERSITY The number of unique fragments that fall from left anchored split-reads onto the right gene and vice-versa.
MINFRAG20 The number of overhang fragments that have at least 20 bases
MINFRAG35 The number of overhang fragments that have at least 35 bases
TPM_FUSION Expression of the most abundant fusion transcript expressed in transcripts per million
TPM_LEFT Expression of the most abundant left transcript expressed in transcripts per million
TPM_RIGHT Expression of the most abundant right transcript expressed in transcripts per million
MAX_TRX_FUSION Highest expressing fusion transcript. Expression corresponds to TPM_FUSION
DISPOSITION Values to indicate PASS or other specific reasons for failure
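
As a minimal sketch of how these columns might be consumed downstream, the following reads a breakpoints-style file and keeps the rows whose DISPOSITION is PASS. It assumes the file is tab-delimited with a header row matching the column names above, which the README does not state explicitly. The function name, the sample rows, and the failure label are hypothetical.

```python
import csv
import io


def passing_fusions(handle):
    """Yield rows (as dicts) whose DISPOSITION column equals PASS.

    Assumes a tab-delimited file with a header row. This layout is an
    assumption, not something the README confirms.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row.get("DISPOSITION") == "PASS":
            yield row


# Hypothetical two-row example using a subset of the documented columns.
# "SOME_FAILURE_REASON" is a made-up placeholder, not a real STAR-SEQR value.
example = io.StringIO(
    "NAME\tNREAD_SPANS\tDISPOSITION\n"
    "GENEA--GENEB\t12\tPASS\n"
    "GENEC--GENED\t1\tSOME_FAILURE_REASON\n"
)
kept = list(passing_fusions(example))
print([row["NAME"] for row in kept])
```

For a real run, the in-memory example would be replaced by an open file handle on the Breakpoints.txt or Candidates.txt output.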

Feedback

Yes! Please give us your feedback, raise issues, and let us know how the tool is working for you. Pull requests are welcome.

Contributions

This project builds on the groundwork of other public contributions. Namely:

star-seqr's Issues

Salmon quant

As is, starseqr always requires fastqs even if a STAR alignment is provided. If we modify the salmon call to use alignment mode in this scenario, we can bypass the fastq requirement.
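
A rough sketch of what such a switch could look like is below. It only builds the argument list and is not STAR-SEQR's actual code: the function name and its parameters are hypothetical. The salmon options used (`-i`, `-l`, `-1`, `-2`, `-t`, `-a`, `-o`, `-p`) are the documented ones for mapping-based and alignment-based quantification. Note that, as far as the Salmon documentation describes, alignment-based mode expects reads aligned to the transcriptome, so the alignment passed here is assumed to be transcriptome-based.

```python
def build_salmon_cmd(out_dir, threads, fastq1=None, fastq2=None,
                     index_dir=None, alignments=None, transcripts_fa=None):
    """Build a `salmon quant` argument list for fastq or alignment input."""
    cmd = ["salmon", "quant", "-l", "A", "-o", out_dir, "-p", str(threads)]
    if alignments is not None:
        # Alignment-based mode: needs the transcript FASTA the reads were aligned to.
        if transcripts_fa is None:
            raise ValueError("alignment mode needs a transcript FASTA")
        return cmd + ["-t", transcripts_fa, "-a", alignments]
    if fastq1 and fastq2 and index_dir:
        # Mapping-based mode: needs a salmon index and both fastq files.
        return cmd + ["-i", index_dir, "-1", fastq1, "-2", fastq2]
    raise ValueError("provide either alignments or fastqs plus a salmon index")


# Hypothetical file names, for illustration only.
aln_cmd = build_salmon_cmd("quant_out", 4,
                           alignments="sample.transcriptome.bam",
                           transcripts_fa="transcripts.fa")
print(" ".join(aln_cmd))
```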

[STAR-SEQR; Docker] ERROR - Exception: ('too many values to unpack', u'occurred at index **')

System Info:

STAR-SEQR version= 0.6.7
Docker version (Server/client): 20.10.12
OS: Ubuntu 20.04.5 LTS x86_64

I'm running star-seqr via the docker image:
eagenomics/starseqr (Image ID: 0e715fd07246)

First I generated the STAR Reference Genome Index running STAR inside the docker image (v. STAR_2.5.3a_modified). No problem arose.

Then I ran star-seqr, providing as input the fastq.gz files from one of my samples, as follows:

docker run -v $storage_dir:/data \
                -v $reference_dir:/reference \
                -v $temporary_dir:/output \
                eagenomics/starseqr starseqr.py \
                -1 /data/$id'_R1.fastq.gz' \
                -2 /data/$id'_R2.fastq.gz' \
                -p /output/starseqr/$id \
                -i /reference/fusions/Homo_sapiens_assembly38-Index_STARSEQR_ReadLength50 \
                -g /reference/gencode.v27.primary_assembly.annotation.gtf \
                -r /reference/Homo_sapiens_assembly38.fasta \
                -t 16 -m 1 -vv

I have already checked that the provided paths are correct once the variables are resolved.
The STAR-SEQR run ends while applying the function "apply_jxn_strand".

I copy below the content of the LOG file produced by the run to provide more details about the error:

2023-01-30 10:43 - INFO -

2023-01-30 10:43 - INFO - ################################################################################
2023-01-30 10:43 - INFO - #                             0 01/30/23  10:43:06                             #
2023-01-30 10:43 - INFO - ################################################################################
2023-01-30 10:43:06 - starseqr - INFO - ***************STAR-SEQR******************
2023-01-30 10:43:06 - starseqr - INFO - CMD = /opt/conda/bin/starseqr.py -1 /data/606_R1.fastq.gz -2 /data/606_R2.fastq.gz -p /output/starseqr/606 -i /reference/fusions/Homo_sapiens_assembly38-Index_STARSEQR_ReadLength50 -g /reference/gencode.v27.primary_assembly.annotation.gtf -r /reference/Homo_sapiens_assembly38.fasta -t 16 -m 1 -vv
2023-01-30 10:43:06 - starseqr - INFO - STAR-SEQR_version = 0.6.7
2023-01-30 10:43:06 - starseqr - INFO - Starting to work on sample: /output/starseqr/606
2023-01-30 10:43:06 - starseqr - INFO - Found input: /data/606_R1.fastq.gz
2023-01-30 10:43:06 - starseqr - INFO - Found input: /data/606_R2.fastq.gz
2023-01-30 10:43:06 - starseqr - INFO - Found input: /reference/Homo_sapiens_assembly38.fasta
2023-01-30 10:43:06 - starseqr - INFO - Found input: /reference/gencode.v27.primary_assembly.annotation.gtf
2023-01-30 10:43:06 - star_funcs - INFO - Starting STAR Alignment
2023-01-30 10:43:06 - star_funcs - INFO - *STAR Command: STAR --readFilesIn /data/606_R1.fastq.gz /data/606_R2.fastq.gz --readFilesCommand zcat --runThreadN 16 --genomeDir /reference/fusions/Homo_sapiens_assembly38-Index_STARSEQR_ReadLength50 --outFileNamePrefix  /output/starseqr/606_STAR-SEQR/606. --chimScoreJunctionNonGTAG -1 --outSAMtype None --chimOutType SeparateSAMold --alignSJDBoverhangMin 5 --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 5 --outMultimapperOrder Random --outSAMattributes NH HI AS nM ch --chimSegmentMin 10 --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreSeparation 7 --chimSegmentReadGapMax 3 --chimFilter None --twopassMode None --alignSJstitchMismatchNmax 5 -1 5 5 --chimMainSegmentMultNmax 10
2023-01-30 10:46:43 - star_funcs - INFO - Jan 30 10:43:06 ..... started STAR run
Jan 30 10:43:06 ..... loading genome
Jan 30 10:43:43 ..... started mapping
Jan 30 10:46:43 ..... finished successfully

2023-01-30 10:46:43 - star_funcs - INFO - STAR Alignment Finished!
2023-01-30 10:46:43 - core - INFO - Importing junctions
2023-01-30 10:46:44 - core - INFO - Number of candidates removed due to Mitochondria filter: 1500
2023-01-30 10:46:44 - core - INFO - Removing duplicate reads
2023-01-30 10:46:44 - common - INFO - Begin multiprocessing of function apply_cigar_overhang in a pool of 16 workers using map_async protocol
2023-01-30 10:46:44 - common - DEBUG - *The dataframe will be split evenly across the 16 workers
2023-01-30 10:46:44 - common - DEBUG - *Initializing a map_async pool with 16 workers
2023-01-30 10:46:45 - common - DEBUG - *Time to run pandas_parallel on apply_cigar_overhang took 0.464632 seconds
2023-01-30 10:47:05 - starseqr - INFO - Ordering junctions
2023-01-30 10:47:05 - starseqr - INFO - Normalizing junctions
2023-01-30 10:47:05 - common - INFO - Begin multiprocessing of function apply_normalize_jxns in a pool of 16 workers using map_async protocol
2023-01-30 10:47:05 - common - DEBUG - *The dataframe will be split evenly across the 16 workers
2023-01-30 10:47:05 - common - DEBUG - *Initializing a map_async pool with 16 workers
2023-01-30 10:47:05 - common - DEBUG - *Time to run pandas_parallel on apply_normalize_jxns took 0.559973 seconds
2023-01-30 10:47:05 - starseqr - INFO - Getting gene strand and flipping info as necessary
2023-01-30 10:47:05 - common - INFO - Begin multiprocessing of function apply_jxn_strand in a pool of 16 workers using map_async protocol
2023-01-30 10:47:05 - common - DEBUG - *The dataframe will be split evenly across the 16 workers
2023-01-30 10:47:05 - common - DEBUG - *Initializing a map_async pool with 16 workers
2023-01-30 10:47:19 - common - ERROR - Exception: ('too many values to unpack', u'occurred at index 1820')

Moreover, the last file produced in the output directory is: *_STAR-SEQR_breakpoints.txt.
It only contains the header.

Any help about how to fix this error?

Further info:
I previously ran star-seqr successfully on simulated read data.
Given the returned error, the only difference I can think of between the real and simulated read files is the naming scheme of the reads themselves.
I adopted a very basic naming scheme for the simulated data (i.e. reads1000\1 and reads1000\2), while the naming scheme for the real data is the one produced by Illumina sequencers.
Could this difference be the cause of the problem, or is it not relevant?

nested renamer is not supported

Hi,
I am running starseqr on my samples and am stuck on an error.

starseqr.py -1 sample_1.fastq.gz -2 sample_2.fastq.gz -m 1 -p starseqr_test -t 50 -i STAR_FUSION_LIB/ref_genome.fa.star.idx/ -g genomic.gtf -r genomic.fa -vv
2021-06-22 10:13 - INFO - STAR-SEQR***
2021-06-22 10:13 - INFO - CMD = /home/nipgr/software/STAR-SEQR/myenv/bin/starseqr.py -1 sample_1.fastq.gz -2 sample_2.fastq.gz -m 1 -p starseqr_test -t 50 -i STAR_FUSION_LIB/ref_genome.fa.star.idx/ -g genomic.gtf -r genomic.fa -vv
2021-06-22 10:13 - INFO - STAR-SEQR_version = 0.6.7
2021-06-22 10:13 - INFO - Starting to work on sample: /home/nipgr/Documents/chickpea/starseqr_test
2021-06-22 10:13 - INFO - Found input: sample_1.fastq.gz
2021-06-22 10:13 - INFO - Found input: sample_2.fastq.gz
2021-06-22 10:13 - INFO - Found input: genomic.fa
2021-06-22 10:13 - INFO - Found input: genomic.gtf
2021-06-22 10:13 - INFO - Starting STAR Alignment
2021-06-22 10:13 - INFO - *STAR Command: STAR --readFilesIn sample_1.fastq.gz sample_2.fastq.gz --readFilesCommand zcat --runThreadN 50 --genomeDir STAR_FUSION_LIB/ref_genome.fa.star.idx --outFileNamePrefix starseqr_test_STAR-SEQR/starseqr_test. --chimScoreJunctionNonGTAG -1 --outSAMtype None --chimOutType Junctions SeparateSAMold --alignSJDBoverhangMin 5 --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 5 --outMultimapperOrder Random --outSAMattributes NH HI AS nM --chimSegmentMin 10 --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreSeparation 7 --chimSegmentReadGapMax 3 --chimFilter None --twopassMode None --alignSJstitchMismatchNmax 5 -1 5 5 --chimMainSegmentMultNmax 10
2021-06-22 10:14 - INFO - b'Jun 22 10:13:02 ..... started STAR run\nJun 22 10:13:02 ..... loading genome\nJun 22 10:13:04 ..... started mapping\nJun 22 10:14:37 ..... finished mapping\nJun 22 10:14:37 ..... finished successfully\n'
2021-06-22 10:14 - INFO - STAR Alignment Finished!
2021-06-22 10:14 - INFO - Importing junctions
2021-06-22 10:14 - INFO - Number of candidates removed due to Mitochondria filter: 0
2021-06-22 10:14 - INFO - Removing duplicate reads
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_cigar_overhang in a pool of 50 workers using map_async protocol
2021-06-22 10:14 - INFO - Ordering junctions
2021-06-22 10:14 - INFO - Normalizing junctions
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_normalize_jxns in a pool of 50 workers using map_async protocol
2021-06-22 10:14 - INFO - Getting gene strand and flipping info as necessary
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_jxn_strand in a pool of 50 workers using map_async protocol
2021-06-22 10:15 - INFO - Begin multiprocessing of function apply_flip_func in a pool of 50 workers using map_async protocol
2021-06-22 10:15 - INFO - Aggregating junctions
Traceback (most recent call last):
  File "/home/user/software/STAR-SEQR/myenv/bin/starseqr.py", line 622, in <module>
    sys.exit(main())
  File "/home/user/software/STAR-SEQR/myenv/bin/starseqr.py", line 345, in main
    jxn_summary = su.core.count_jxns(jxns)
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/starseqr_utils/core.py", line 123, in count_jxns
    col)), ('counts', 'count')])), ('overhang_len', 'max')])).reset_index()
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/pandas/core/groupby/generic.py", line 940, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/pandas/core/base.py", line 351, in _aggregate
    raise SpecificationError("nested renamer is not supported")
pandas.core.base.SpecificationError: nested renamer is not supported

STAR alignment not working

I am running STAR-SEQR from the fastqs and getting this error:

2023-01-31 22:26 - ERROR - 
EXITING because of fatal PARAMETER error: --outSAMattributes contains ch tag, which requires BAM output.
SOLUTION: re-run STAR with --outSAMtype BAM Unsorted (and/or) SortedByCoordinate option, or without ch tag in --outSAMattributes

Jan 31 22:26:06 ...... FATAL ERROR, exiting

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe

2023-01-31 22:26 - ERROR - Error: STAR failed
None

My version of STAR is 2.7.9a. It seems that STAR doesn't accept the ch tag in --outSAMattributes when --outSAMtype None is specified (although I would have expected it to reject any --outSAMattributes when no BAM is produced). I am wondering what the best way to deal with this is, and whether there are any plans to update STAR-SEQR to work with newer versions of STAR.
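
One possible workaround, following the second option in STAR's own SOLUTION message, is to drop the ch tag whenever no BAM output is requested. The helper below is a hypothetical sketch of that idea and is not part of STAR-SEQR. Whether later STAR-SEQR steps depend on the ch tag is not addressed here and would need checking.

```python
def sam_attributes(out_sam_type, attributes=("NH", "HI", "AS", "nM", "ch")):
    """Return the --outSAMattributes values, omitting 'ch' without BAM output.

    out_sam_type is the value passed to --outSAMtype, for example
    "None" or "BAM Unsorted".
    """
    wants_bam = out_sam_type.strip().startswith("BAM")
    return [attr for attr in attributes if attr != "ch" or wants_bam]


# The attribute list is the one from the STAR command logged by STAR-SEQR.
print("--outSAMattributes", " ".join(sam_attributes("None")))
print("--outSAMattributes", " ".join(sam_attributes("BAM Unsorted")))
```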

networkx version incompatibility

The latest networkx version (2.1rc1) renames several functions.

Specifically:

    adj_mat = nx.to_pandas_dataframe(node_graph)
AttributeError: 'module' object has no attribute 'to_pandas_dataframe'

The function is now called to_pandas_adjacency per:
networkx/networkx#2811

Either specify a version in the install or handle dynamically in the script.
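
A minimal sketch of the "handle dynamically" option is shown below. It looks up whichever of the two function names the installed module provides. The helper name is hypothetical, and the example uses stand-in objects instead of importing networkx so that it runs anywhere; in the script itself the real `nx` module would be passed in.

```python
import types


def get_adjacency_func(nx_module):
    """Return the adjacency-to-DataFrame function under its new or old name."""
    for name in ("to_pandas_adjacency", "to_pandas_dataframe"):
        func = getattr(nx_module, name, None)
        if func is not None:
            return func
    raise AttributeError("no pandas adjacency function found on this module")


# Stand-ins for an older and a newer networkx module (illustration only).
old_nx = types.SimpleNamespace(to_pandas_dataframe=lambda graph: "old call")
new_nx = types.SimpleNamespace(to_pandas_adjacency=lambda graph: "new call")

print(get_adjacency_func(old_nx)(None))
print(get_adjacency_func(new_nx)(None))
```

With the real library this would be used as `adj_mat = get_adjacency_func(nx)(node_graph)`.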

Primer design

Currently primer design takes the assembled contig and tries to make primers on each half. Rather, it should ensure that each primer comes from a side of the junction. Either assemble each side separately and join or pull sequence from indexed fasta for the coordinates.

STAR-SEQR for single end reads

Hey!
I have to detect fusion transcripts on single-end reads using star seqr. However, I always get an error message "starseqr.py: error: the following arguments are required: -2/--fastq2".
I just wanted to know if star seqr works on single-end reads too.

homology

Reads mapping to homologous genes currently cause most false positives. Either pre-populate a list or use BLAST, k-mer, or other alignment approaches to identify homologous regions, then flag and remove the supporting reads.

Use known reads more effectively

Split the supporting read IDs into spanning and junction objects so that we can pass them into the support functions more effectively and gather metrics on the read pairs.

exact tool version .yaml available?

Hi, it seems that STAR-SEQR might no longer be useable with most recent versions of its dependencies. I have installed it via bioconda as per the following website, and keep running into roadblocks with deprecations and updated tools that have changed, and are no longer compatible with "starseqr.py" and its "starseqr_utils" library:

https://bioconda.github.io/recipes/starseqr/README.html

Is there anywhere that I can find a complete listing of all dependencies with exact tool versions for which STAR-SEQR has been tested and completely worked?

Thanks!

Using bam alignment

STAR-SEQR/starseqr.py

Lines 443 to 446 in 8889bc8

elif args.star_bam:
    chimflag = 2048
    star_bam_local = new_prefix + ".Chimeric.out.bam"
    su.common.sam_2_coord_bam(new_prefix + ".Chimeric.out.sam", star_bam_local, args.threads)

I'm trying to use a pre-aligned bam and it errors here because it looks for sam file that doesn't exist. I believe this is a mistake. Do I need to have both the sam and bam file available?

Error reading Chimeric.out.junction (pandas)

Dec 05 09:41:16 ..... loading genome
Dec 05 09:42:08 ..... started mapping
Dec 05 09:53:04 ..... finished mapping
Dec 05 09:53:04 ..... finished successfully

2020-12-05 09:53 - INFO - STAR Alignment Finished!
2020-12-05 09:53 - INFO - Importing junctions
2020-12-05 09:54 - ERROR - There was a problem reading your STAR *Chimeric.out.junction file
2020-12-05 09:54 - ERROR - Exception: could not convert string to float: NreadsUnique 33113327
Traceback (most recent call last):
  File "/usr/bin/starseqr.py", line 622, in <module>
    sys.exit(main())
  File "/usr/bin/starseqr.py", line 286, in main
    rawdf = su.core.import_starjxns(new_prefix + ".Chimeric.out.junction", args.keep_dups, args.keep_mito)
  File "/usr/lib64/python2.7/site-packages/starseqr_utils/core.py", line 55, in import_starjxns
    raise_(ValueError, e, traceback)
  File "/usr/lib64/python2.7/site-packages/starseqr_utils/core.py", line 36, in import_starjxns
    df['pos1'] = df['pos1'].astype(float).astype(int) # this bypasses some strange numbers
  File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 5691, in astype
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals/managers.py", line 531, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals/managers.py", line 395, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals/blocks.py", line 534, in astype
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/pandas/core/internals/blocks.py", line 633, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/usr/lib64/python2.7/site-packages/pandas/core/dtypes/cast.py", line 702, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: NreadsUnique 33113327
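
The message suggests that the second tab-separated field of some line held the text "NreadsUnique 33113327", which would be consistent with a summary line that is not a junction record. A possible pre-filter is sketched below, assuming that such lines start with "#" and that a column-header line, if present, starts with "chr_donorA". Both are assumptions about the STAR version in use, and the function name and the abbreviated sample rows are hypothetical.

```python
import io


def iter_junction_records(handle):
    """Yield tab-split junction rows, skipping comment and header lines.

    Assumes comment lines start with '#' and a header line starts with
    'chr_donorA' (assumptions about newer STAR output, not verified here).
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or line.startswith("chr_donorA"):
            continue
        yield line.split("\t")


# Abbreviated, made-up content: real junction files have more columns.
example = io.StringIO(
    "chr_donorA\tbrkpt_donorA\tstrand_donorA\n"
    "chr1\t1000\t+\n"
    "# Nreads 33113327\tNreadsUnique 33113327\tNreadsMulti 0\n"
)
records = list(iter_junction_records(example))
print(records)
```

Only the genuine record survives, so the later conversion of the position column to a number no longer sees the summary text.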

Probabilistic module

Need a probabilistic module to assign significance. Metrics available currently:

  • Unique vs all reads
  • Number of junctions reads supporting breakpoint broken down by left, right and all orientations
  • Number of discordant spanning reads broken down by all orientations
  • Alignment scores for supporting reads
  • Number of mismatches on supporting reads
  • Non-clipped sequence lengths for overhangs, junction pairs, discordant pairs
  • Base Qualities of all reads
  • Assembly coverage, length, and status on crossing predicted junction
  • Distance between breakpoints
  • Homology score (min, max, median, uq?)
  • Anchor and overhang metrics including lengths, diversity, mismatch rate
  • Expression Values
  • Multimapping

IOError in Docker

I am running using the latest cwl, however am getting the error below. I tried updating the docker container but this did not make a change.

[48006 rows x 6 columns]]'. Reason: 'IOError('bad message length',)'

Error: no SAMPLE.Chimeric.out.junction file

When trying to run STAR-SEQR, I get this error after STAR has finished alignment:
OSError: [Errno 2] No such file or directory: '/path/to/SAMPLE.Chimeric.out.junction'

The reported STAR command line specifies "--chimOutType SeparateSAMold". Reading the STAR manual, in order to get the SAMPLE.Chimeric.out.junction file, I need to specify "--chimOutType Junctions". Is that correct, or should I expect the .junction file to be part of the output from the STAR command as generated by STAR-SEQR?

EDIT: my output folder DOES contain a file named SAMPLE.SJ.out.tab. Is this perhaps the expected junctions file by a different name? The content looks like this:
1 12228 12612 1 1 1 0 1 44
1 12698 13482 1 1 0 1 0 58
1 12722 13220 1 1 1 0 1 64
1 14830 14969 2 2 1 59 361 72

EDIT2: the SAMPLE.SJ.out.tab file is not the same as SAMPLE.Chimeric.out.junction. I tried adding a symlink in the code, but then I get a parsing error due to missing columns in the file.

EDIT3: I tried changing --chimOutType to Junctions, but then I get an error that the SAM file doesn't exist. So I definitely need some way to have STAR output both the SAM and the junctions file. I tried setting --chimOutType to both values (space- and comma-delimited), but STAR wouldn't accept that.

Issue with multi-filtering

While filtering novel junctions, two sequential filters are executed. If the first leaves no junctions, starseqr will die.
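
One way to guard against this is to stop the filter chain as soon as nothing is left. The sketch below illustrates the pattern with two hypothetical stages (they are not STAR-SEQR's real filters), the second of which would raise on empty input if it were called unguarded.

```python
def keep_min_reads(junctions, min_reads=5):
    """Hypothetical first filter: keep junctions with enough reads."""
    return [j for j in junctions if j["reads"] >= min_reads]


def keep_near_top(junctions, fraction=0.5):
    """Hypothetical second filter: keep junctions close to the best one."""
    top = max(j["reads"] for j in junctions)  # raises ValueError on empty input
    return [j for j in junctions if j["reads"] >= fraction * top]


def run_filters(junctions, stages):
    """Apply stages in order, skipping the rest once no junctions remain."""
    remaining = list(junctions)
    for stage in stages:
        if not remaining:
            break  # nothing left to filter; return the empty result
        remaining = stage(remaining)
    return remaining


# The first stage removes everything, so the second stage is never called.
result = run_filters([{"reads": 1}, {"reads": 2}], [keep_min_reads, keep_near_top])
print(result)
```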

ERROR - b'\nBAMoutput.cpp:27:BAMoutput: exiting because of *OUTPUT FILE* error: could not create output file

Running the following command generates an error:
starseqr.py -1 26-R-EMEA1-VA_S8_trim_L001_R1_001.fastq.gz -2 26-R-EMEA1-VA_S8_trim_L001_R2_001.fastq.gz -m 0 -p ./ -t 25 -i ./ -g ../genomes/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf -r ../genomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa

2021-04-18 20:45 - ERROR - b'\nBAMoutput.cpp:27:BAMoutput: exiting because of *OUTPUT FILE* error: could not create output file /home/simonova/fusion_genes_kate_STAR-SEQR/fusion_genes_kate._STARtmp//BAMsort/20/16\nSOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files.\n\nApr 18 20:45:20 ...... FATAL ERROR, exiting\n'
2021-04-18 20:45 - ERROR - Error: STAR failed
NoneType: None
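
STAR's message points at two things to check: write permission on the output path and the open-file limit (`ulimit -n`). The limit can also be inspected from Python on Unix-like systems, as in the sketch below. The helper names are hypothetical, and how many file handles a given STAR run actually needs is not estimated here; the caller supplies that number.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_enough_handles(needed):
    """Check the soft limit against a caller-supplied requirement."""
    soft, _hard = open_file_limits()
    return soft == resource.RLIM_INFINITY or soft >= needed


soft_limit, hard_limit = open_file_limits()
print("soft limit:", soft_limit, "hard limit:", hard_limit)
```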

Runtime far exceeding Wiki-suggested runtime

Hi there,

I've been trying to use this software to analyze single-cell RNA-seq data. I was able to successfully run the STAR Index generation, and starseqr.py does not appear to crash but is taking far longer than ~20 minutes as suggested by the Wiki. The apply_get_rna_support function took ~3 hours using 24 CPUs, and the "Getting fusion homology scores" step failed to complete in ~30 hours.

I am wondering if this behavior has been seen before (and if I should commit 24 CPUs to multiple days), or if this likely indicates an error has occurred.

Thanks in advance!

Error: salmon quant failed

Hi,
I am using STAR-SEQR with the -ss SAM and -sj junction files specified, but I get the error below:

/usr/local/Anaconda/envs/py2.7/lib/python2.7/site-packages/pandas/core/groupby/generic.py:1315: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
2020-05-14 15:07 - ERROR - Error: salmon quant failed
None

Does anyone know how to solve this?

Thank you!

Potential future issue

/pandas/core/groupby.py:3961: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

Fusion Quantification

Apart from using the number of reads supporting a fusion, it would be good to have a better sense of the ratio of the fusion to the two gene partners. RSEM would be ideal but other approaches are still being considered.

Running STAR-SEQR with only star alignment outputs

Hello!
I've been trying to run STARSEQR with reads already aligned (with options for arriba, star-fusion and STARSEQR).
When running with the following options:

starseqr.py -sb -sj -p -t 8 -g -r --v
or
starseqr.py -sj -sb -p -t 8 -g -r --v
or
starseqr.py -sb -p -t 8 -g -r --v
I'm always getting:

starseqr.py: error: the following arguments are required: -1/--fastq1, -2/--fastq2

How can I run the fusion caller only with bam files and chimeric junctions?
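
For illustration, the argument handling could make the fastqs conditionally required instead of always required. The sketch below is not STAR-SEQR's parser: the short flags are the ones used in this thread, while the long option names (other than the `star_bam` attribute seen in the source) and the program name are assumptions.

```python
import argparse


def build_parser():
    """Build a reduced, hypothetical parser with no unconditional requirements."""
    parser = argparse.ArgumentParser(prog="starseqr_sketch")
    parser.add_argument("-1", "--fastq1")
    parser.add_argument("-2", "--fastq2")
    parser.add_argument("-sb", "--star_bam")
    parser.add_argument("-sj", "--star_jxns")  # hypothetical long name
    return parser


def parse_args(argv):
    """Require either both fastqs or both STAR outputs."""
    parser = build_parser()
    args = parser.parse_args(argv)
    have_fastqs = bool(args.fastq1 and args.fastq2)
    have_star = bool(args.star_bam and args.star_jxns)
    if not (have_fastqs or have_star):
        parser.error("provide -1/-2 fastqs or -sb/-sj STAR outputs")
    return args


# Hypothetical file names: STAR outputs only, no fastqs.
demo = parse_args(["-sb", "sample.Chimeric.out.bam", "-sj", "sample.Chimeric.out.junction"])
print(demo.star_bam, demo.star_jxns)
```

Any later step that still reads the fastqs directly, such as the salmon call, would need a matching change.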

gtf location

starseqr assumes the location of the gtf is writable. Please modify this so that it will write to either the output folder or tmp. This makes running docker/cwl tools more friendly.
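
A small sketch of the requested behavior follows: prefer the output folder when it is writable, otherwise fall back to the system temporary directory. The function names and the ".derived" suffix are hypothetical placeholders for whatever files STAR-SEQR derives from the GTF.

```python
import os
import tempfile


def writable_work_dir(out_dir=None):
    """Return out_dir if it exists and is writable, else the system tmp dir."""
    if out_dir and os.path.isdir(out_dir) and os.access(out_dir, os.W_OK):
        return out_dir
    return tempfile.gettempdir()


def derived_path(gtf_path, out_dir=None, suffix=".derived"):
    """Place a GTF-derived file in a writable directory, not next to the GTF."""
    return os.path.join(writable_work_dir(out_dir), os.path.basename(gtf_path) + suffix)


# Hypothetical read-only reference location; the result lands in the tmp dir.
print(derived_path("/reference/gencode.gtf"))
```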

IntervalTree error: version compatibility

If anyone comes across this error:

"AttributeError: 'IntervalTree' object has no attribute 'search'"

try downgrading your version of intervaltree to version 2.1.0.
Intervaltree version 3 was released in late 2018, and the 'search' function used here is no longer supported.
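
As an alternative to pinning the version, a small compatibility wrapper could call whichever API the installed intervaltree provides. In version 3 the `search` method was replaced by `at`, `overlap`, and `envelop`. The wrapper below is a hypothetical sketch; how STAR-SEQR calls `search` (point or range query) is not shown in this thread, so both forms are handled. The stand-in classes exist only so that the example runs without intervaltree installed.

```python
def search_compat(tree, begin, end=None):
    """Query an IntervalTree using the old or the new method names."""
    if hasattr(tree, "search"):
        # intervaltree 2.x style
        return tree.search(begin) if end is None else tree.search(begin, end)
    if end is None:
        return tree.at(begin)  # intervaltree 3.x point query
    return tree.overlap(begin, end)  # intervaltree 3.x range query


class OldStyleTree:
    """Stand-in mimicking the 2.x method name (illustration only)."""
    def search(self, begin, end=None):
        return ("search", begin, end)


class NewStyleTree:
    """Stand-in mimicking the 3.x method names (illustration only)."""
    def at(self, point):
        return ("at", point)

    def overlap(self, begin, end):
        return ("overlap", begin, end)


print(search_compat(OldStyleTree(), 5, 10))
print(search_compat(NewStyleTree(), 5, 10))
print(search_compat(NewStyleTree(), 5))
```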
