Comments (16)
This error comes from featureCounts, which seems unable to parse the GFF file you provided. Please check your GFF file and the "ATTRIBUTE" parameter in config_main.yaml to make sure they agree with each other.
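One way to do that check is to list which attribute keys actually occur in column 9 of the annotation and compare them with ATTRIBUTE. Below is a minimal Python sketch, not part of RASflow. It assumes tab-separated feature lines and recognises both GTF-style (`key "value";`) and GFF3-style (`key=value;`) attributes. The function name `attribute_keys` is hypothetical.

```python
import re
from collections import Counter

# GTF style:  gene_id "g1"; transcript_id "t1";
GTF_KEY = re.compile(r'(?:^|;)\s*([A-Za-z_][\w.]*)\s+"')
# GFF3 style: ID=g1;locus_tag=X_00001;product=hypothetical protein
GFF3_KEY = re.compile(r'(?:^|;)\s*([^=;\s]+)=')


def attribute_keys(lines):
    """Count how many feature lines carry each attribute key in column 9."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, headers and directives such as ##FASTA
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line (e.g. sequence after ##FASTA)
        attrs = fields[8]
        keys = set(GTF_KEY.findall(attrs)) | set(GFF3_KEY.findall(attrs))
        counts.update(keys)
    return counts
```

For example, `attribute_keys(open("027-annot.gff")).most_common()` lists the keys in the file. If ATTRIBUTE is not among them, featureCounts cannot find it either.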
from rasflow.
Hi @zhxiaokang,
I am providing a GFF file taken directly from the Prokka output, and ATTRIBUTE is set to "ID" instead of "gene_id". Nevertheless, I still get an error.
Maybe I should edit the GFF file or convert it to another file type?
Cheers,
Pablo
from rasflow.
Not sure whether this is the issue, but the error says "no features were loaded in format GTF", while you're actually using a GFF. Maybe try a GTF file instead of a GFF?
Also, would you mind sharing the GFF file you're using? Or at least some lines that include the "ATTRIBUTE".
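Another quick check when featureCounts reports that no features were loaded is column 3, the feature type. featureCounts only loads features whose type matches its `-t` option, and the featureCounts call that appears in the log later in this thread passes `-t exon`. In prokaryotic annotations, coding genes are usually CDS (and possibly gene) features rather than exon features, so it is worth counting them. This is a hypothetical Python sketch, not part of RASflow:

```python
from collections import Counter


def feature_types(lines):
    """Count feature types (column 3) over the feature lines of a GTF/GFF3 file."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, headers and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # ignores FASTA sequence lines after ##FASTA
            counts[fields[2]] += 1
    return counts


def has_feature(lines, feature="exon"):
    """True if any feature line has the given type (what featureCounts -t matches)."""
    return feature_types(lines)[feature] > 0
```

If `has_feature(lines, "exon")` is False, featureCounts run with `-t exon` has nothing to count, whatever the attribute settings are.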
from rasflow.
But Prokka doesn't produce a GTF file. Also, isn't GTF just the older flavour of GFF (based on GFF version 2), while Prokka writes GFF3?
I will share the file and the YAML lines.
from rasflow.
This is what the GFF file looks like (these are screenshots, since I cannot upload a GFF file here).
And these are the relevant lines of the config YAML file:
# genome and annotation files
GENOME: data/example/ref/genome/027-annot.fna
ANNOTATION: data/example/ref/annotation/027-annot.gff
ATTRIBUTE: ID # the attribute used in annotation file. It's usually "gene_id", but double check that since it may also be "gene", "ID"...
from rasflow.
Hi, sorry for the bad news, but it seems that both featureCounts and htseq-count are designed for the GTF format: https://help.galaxyproject.org/t/problems-with-attributes-in-featurecounts-gff3-input-instead-of-gtf/3046/2
I tried the example data with a GFF3 file in both tools, and both reported errors similar to yours, saying they couldn't find the correct attribute. You could try converting the GFF3 file into a GTF file first. Here are some tools for the conversion: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gff_to_gtf.md
Hope this helps.
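For reference, the difference these tools trip over is mostly in column 9. GFF3 writes `key=value` pairs separated by `;`, while GTF writes `key "value";` pairs and expects a gene_id attribute. Below is a minimal sketch of rewriting just that column. The helper `gff3_attrs_to_gtf` is hypothetical and does not reproduce what AGAT or the tools linked above do. In particular, it does not rebuild the gene/transcript/exon hierarchy, which is what a real converter is for.

```python
def gff3_attrs_to_gtf(attr_field, gene_key="ID"):
    """Convert a GFF3 column 9 ("key=value;...") into GTF style ('key "value"; ...').

    The value of `gene_key` is repeated first as gene_id, since featureCounts is
    usually run with `-g gene_id`. Comma-separated GFF3 multi-values stay together
    inside ONE quoted string, and percent-encoded characters (e.g. %3B) are left as-is.
    """
    pairs = []
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs.append((key, value.replace('"', "'")))  # no raw quotes inside GTF values
    parts = [f'gene_id "{v}"' for k, v in pairs if k == gene_key][:1]
    parts += [f'{k} "{v}"' for k, v in pairs]
    return "; ".join(parts) + ";" if parts else ""
```

Keeping multi-valued attributes in a single quoted string is a deliberate choice here: it avoids writing several quoted values for one key.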
from rasflow.
Hi @zhxiaokang
I am still having issues, even after converting the GFF file into a GTF file using AGAT.
My GTF file looks like this:
This is the error I obtain:
`ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'gene_id "nbis-gene-95"; transcript_id "CDACBCJP_00095_gene"; ID "nbis-exon-95"; Parent "CDACBCJP_00095_gene"; inference "ab initio prediction:Prodigal:002006" "similar to AA sequence:ISfinder:ISHaha5"; locus_tag "CDACBCJP_00095"; product "IS110 family transposase ISHaha5"; protein_id "gnl|X|CDACBCJP_00095";'
The program has to terminate.
`
`Error in rule featureCount:
jobid: 0
output: output-pva/pva027/genome/countFile/20-0357_count.tsv, output-pva/pva027/genome/countFile/20-0357_count.tsv.summary
RuleException:
CalledProcessError in line 109 of /home/sam/Downloads/ilse/RASflow/workflow/align_count_genome.rules:
Command ' set -euo pipefail; featureCounts -p -T 4 -t exon -g gene_id -a data/example/ref/annotation/027-annot.gtf -o output-pva/pva027/genome/countFile/20-0357_count.tsv data/output/pva027/genome/bamFileSort/20-0357.sort.bam && tail -n +3 output-pva/pva027/genome/countFile/20-0357_count.tsv | cut -f1,7 > temp.20-0357 && mv temp.20-0357 output-pva/pva027/genome/countFile/20-0357_count.tsv ' returned non-zero exit status 255.
File "/home/sam/Downloads/ilse/RASflow/workflow/align_count_genome.rules", line 109, in __rule_featureCount
File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
`
from rasflow.
The GTF file looks good to me, at least the part in the screenshot. Could you share the whole file? I suspect the problem is somewhere else in the file, outside the part shown in the screenshot.
from rasflow.
Do you have an email address I could send the file to? I cannot upload it here on GitHub.
from rasflow.
Hi, after testing with your GTF file, I found that featureCounts throws this error when the "inference" attribute contains two quoted values. In the ERROR message you posted above, it has both "ab initio prediction:Prodigal:002006" and "similar to AA sequence:ISfinder:ISHaha5", and there are more such cases in your GTF file.
I'm trying to post this issue in their Google group but am still waiting to be admitted into the group.
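To see how many lines are affected, here is a small Python sketch that reports GTF lines in which a single attribute has more than one quoted value, like the inference example above. The helper names are hypothetical, and the sketch assumes values never contain escaped quotes.

```python
import re

# A GTF attribute: key, one or more quoted values, then ';'
ATTR = re.compile(r'([A-Za-z_][\w.]*)((?:\s+"[^"]*")+)\s*;')
QUOTED = re.compile(r'"[^"]*"')


def multi_valued_attrs(attr_field):
    """Keys in a GTF column 9 whose value is more than one quoted string."""
    return [key for key, values in ATTR.findall(attr_field)
            if len(QUOTED.findall(values)) > 1]


def problem_lines(lines):
    """Yield (line_number, keys) for feature lines with multi-valued attributes."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        keys = multi_valued_attrs(fields[8])
        if keys:
            yield number, keys
```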
For the time being, you can work around the issue by keeping only the gene_id part of the attribute column, with this command:
cut -d';' -f1 027-annot.gtf > 027-annot_only_gene_id.gtf
Then use 027-annot_only_gene_id.gtf instead. I tested this strategy (keeping only gene_id) on the example data in RASflow, and it produced almost the same counts as the original GTF file.
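The `cut` command keeps everything up to the first `;`, so it only works when gene_id is the first attribute in column 9, as it is in the AGAT output above. Here is a Python sketch of the same workaround that looks gene_id up by name instead. The names are hypothetical and the code is not part of RASflow.

```python
import re

# gene_id at the start of column 9 or right after a ';'
GENE_ID = re.compile(r'(?:^|;)\s*gene_id\s+"([^"]*)"')


def keep_only_gene_id(line):
    """Reduce column 9 of one GTF line to its gene_id attribute.

    Unlike `cut -d';' -f1`, gene_id is looked up by name, so it does not have
    to be the first attribute. Comment lines and lines without a gene_id are
    returned unchanged (minus the trailing newline).
    """
    text = line.rstrip("\n")
    fields = text.split("\t")
    if text.startswith("#") or len(fields) < 9:
        return text
    match = GENE_ID.search(fields[8])
    if match is None:
        return text
    fields[8] = f'gene_id "{match.group(1)}";'
    return "\t".join(fields)


def write_gene_id_only(src_path, dst_path):
    """Apply keep_only_gene_id to every line of src_path and write dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(keep_only_gene_id(line) + "\n")
```

Usage would be `write_gene_id_only("027-annot.gtf", "027-annot_only_gene_id.gtf")`. Like the `cut` version, this drops every other attribute, including transcript_id.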
from rasflow.
hi @zhxiaokang
Thank you for providing the command for fixing the GTF file. It worked perfectly!
However, there seems to be yet another error, now at the DEA/visualization step:
`Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, :
dispersion must be numeric
Calls: DEA ... glmFit -> glmFit.DGEList -> glmFit -> glmFit.default
In addition: Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group, :
No residual df: setting dispersion to NA
Execution halted
[Wed Jan 20 09:27:00 2021]
Error in rule DEA:
jobid: 1
output: output-pva/pva027/genome/dea/countGroup/Untreated_gene_norm.tsv, output-pva/pva027/genome/dea/countGroup/Calcium_gene_norm.tsv, output-pva/pva027/genome/dea/DEA/dea_Untreated_Calcium.tsv, output-pva/pva027/genome/dea/DEA/deg_Untreated_Calcium.tsv
RuleException:
CalledProcessError in line 38 of /home/sam/Downloads/ilse/RASflow/workflow/dea_genome.rules:
Command ' set -euo pipefail; Rscript scripts/dea_genome.R ' returned non-zero exit status 1.
File "/home/sam/Downloads/ilse/RASflow/workflow/dea_genome.rules", line 38, in __rule_DEA
File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job DEA since they might be corrupted:
output-pva/pva027/genome/dea/countGroup/Untreated_gene_norm.tsv, output-pva/pva027/genome/dea/countGroup/Calcium_gene_norm.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/sam/Downloads/ilse/RASflow/.snakemake/log/2021-01-20T092655.885343.snakemake.log
DEA is done!
Start visualization of DEA results!
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 end
1 plot
2
[Wed Jan 20 09:27:00 2021]
rule plot:
input: output-pva/pva027/genome/dea/countGroup, output-pva/pva027/genome/dea/DEA
output: output-pva/pva027/genome/dea/visualization/volcano_plot_Untreated_Calcium.pdf, output-pva/pva027/genome/dea/visualization/heatmap_Untreated_Calcium.pdf
jobid: 1
Loading required package: plotscale
hash-3.0.1 provided by Decision Patterns
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
colnames, colSums, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
rowSums, sapply, setdiff, sort, table, tapply, union, unique,
unsplit, which, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:hash’:
values, values<-
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘AnnotationDbi’
The following objects are masked from ‘package:hash’:
keys, keys<-
Loading required package: ggplot2
Loading required package: ggrepel
Error in file(file, "rt") : cannot open the connection
Calls: plot.volcano.heatmap -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'output-pva/pva027/genome/dea/DEA/dea_Untreated_Calcium.tsv': No such file or directory
Execution halted
[Wed Jan 20 09:27:06 2021]
Error in rule plot:
jobid: 1
output: output-pva/pva027/genome/dea/visualization/volcano_plot_Untreated_Calcium.pdf, output-pva/pva027/genome/dea/visualization/heatmap_Untreated_Calcium.pdf
RuleException:
CalledProcessError in line 53 of /home/sam/Downloads/ilse/RASflow/workflow/visualize.rules:
Command ' set -euo pipefail; Rscript scripts/visualize.R output-pva/pva027/genome/dea/countGroup output-pva/pva027/genome/dea/DEA output-pva/pva027/genome/dea/visualization ' returned non-zero exit status 1.
File "/home/sam/Downloads/ilse/RASflow/workflow/visualize.rules", line 53, in __rule_plot
File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/sam/Downloads/ilse/RASflow/.snakemake/log/2021-01-20T092700.946097.snakemake.log
Visualization is done!
RASflow is done!
`
Could you point me to what I should do to fix it?
Cheers,
Pablo
from rasflow.
Hi Pablo, glad to hear that it works. The new issue actually happens in the DEA step, or more precisely in edgeR. The visualization error is only a consequence: the plot step cannot find dea_Untreated_Calcium.tsv because DEA never wrote it. Searching for the edgeR error message led me to this post: nanoporetech/pipeline-transcriptome-de#6. As mentioned there, the problem is that there are not enough replicates. How many replicates do you have in each group?
from rasflow.
Hi @zhxiaokang
There are two conditions and I have just one replicate for each condition, so only two sets of reads. Does this mean I cannot go any further with the visualization?
Cheers,
Pablo
from rasflow.
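With only one replicate per group, you can't really do differential expression analysis (DEA). Two samples in two groups leave no residual degrees of freedom, so edgeR cannot estimate the dispersion. That is the "No residual df: setting dispersion to NA" warning in your log, and without a dispersion estimate there is no statistical test to run.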
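The arithmetic behind that warning can be sketched as follows. This assumes the DEA script fits a design with only the group factor and no other covariates, so the design matrix has one column per group. The function name is hypothetical.

```python
def residual_df(samples_per_group):
    """Residual degrees of freedom for a one-factor design (~ group).

    The design matrix has one column per group, so df = samples - groups.
    edgeR needs df > 0 to estimate the dispersion.
    """
    n_samples = sum(samples_per_group.values())
    n_coefficients = len(samples_per_group)
    return n_samples - n_coefficients
```

With one sample each for Untreated and Calcium, the result is 0, matching the warning. With three replicates per group, it would be 4.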
from rasflow.
I was afraid that was the case. Thanks a lot for your help!
from rasflow.
Related Issues (20)
- Strange problem with shopt -s extglob (related to issue #13)
- How can I change default FDR treshold in DEA module?
- Issue with environment creation on macOS Big Sur
- BAM sorting error
- DEA error and visualisation.
- Gtf file format error
- Multiple comparison with 2 treatments
- DEA error
- Show top 50 or 100 genes on heatmap
- DEA error
- reverse is reserved for internal use error
- htseq-count test data error
- transcriptome analysis
- Can't run the examples
- Error in fastqc
- raw counts
- Unable to render code block when opening Tutorial.pdf
- process was be killed, ram is not enough.
- multiQC conflicts in conda install
- Cannot get to my files