Thank you for the detailed bug report! Based on what you've described, it does seem like inStrain has a problem reading the reads mapped to some of your scaffolds. This could be due to 1) a mismatch between the scaffold names in the .fasta file and those in the .bam file, 2) an unusual character in the scaffold names of the .fasta or .bam file, or 3) some other bug in pysam (which inStrain uses to load the reads). A couple of questions:
About how many other samples did you run that worked? A few, or many?
Did the other samples use the same .fasta file? Is there anything different about how this sample was processed compared to the others?
Thanks,
Matt
Also, looking at the log file, inStrain reports that it's removing ~50% of reads during filtering. Could you provide the mapping_info.tsv file as well?
Thanks,
Matt
I have a total of 28 samples and had no problems with the other 27! Each sample used its own assembly fasta, so the .fa file used in this command is only used for this analysis. I also generated MAGs from all assemblies (including this one) and cross-mapped for metaBAT, and ran into no issues there.
I just checked my sam and fasta files and the headers look alright. I didn't spot any whitespace characters either.
head 4500m1000m_reads_to_contigs.sam
@HD VN:1.5 SO:unsorted GO:query
@SQ SN:OC1703_4500m_1000m_sens_contig_1483280 LN:1041
@SQ SN:OC1703_4500m_1000m_sens_contig_3485708 LN:1284
head 4500m1000m_contigs_min1000.fa
>OC1703_4500m_1000m_sens_contig_4153184
>OC1703_4500m_1000m_sens_contig_593313
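A minimal Python sketch for automating the header check described above, assuming a plain-text FASTA and a SAM file (for a .bam, the header can be dumped with `samtools view -H`). It compares the names on FASTA `>` lines (text up to the first whitespace, as aligners typically keep) with the `SN:` names on `@SQ` header lines, and flags headers containing whitespace and names outside a conservative character set. The helper names and the "safe" character set are illustrative assumptions, not part of inStrain.

```python
import re
import sys

# Conservative character set for scaffold names (an assumption; the SAM spec allows more).
SAFE_NAME = re.compile(r"[A-Za-z0-9._|-]+")


def sam_reference_names(sam_lines):
    """Collect SN: values from @SQ lines; stop at the first alignment record."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines always come before alignments
        if line.startswith("@SQ"):
            for field in line.split():
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def check_names(fasta_lines, sam_lines):
    """Compare FASTA scaffold names with SAM @SQ names and flag suspicious ones."""
    headers = [line[1:].rstrip("\n") for line in fasta_lines if line.startswith(">")]
    # Aligners usually keep only the text before the first whitespace as the name.
    fasta_ids = {h.split()[0] for h in headers if h.split()}
    sam_ids = set(sam_reference_names(sam_lines))
    return {
        "only_in_fasta": sorted(fasta_ids - sam_ids),
        "only_in_sam": sorted(sam_ids - fasta_ids),
        # Catches spaces/tabs inside headers and stray '\r' from Windows line endings.
        "headers_with_whitespace": [h for h in headers if h != h.strip() or len(h.split()) > 1],
        "unusual_names": sorted(n for n in fasta_ids | sam_ids if not SAFE_NAME.fullmatch(n)),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_names.py contigs.fa reads.sam
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as sam:
        for key, values in check_names(fasta, sam).items():
            print(f"{key}: {len(values)}", values[:5])
```

Empty lists for all four keys would suggest the names themselves are not the problem.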
Here's the mapping_info.tsv file!
Oh, and when I ran coverM I also required 95% ANI, so coverM's read recruitment should be comparable to inStrain's default mapping requirements!
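For comparing thresholds like this, here is a rough sketch of a per-read 95% identity check on a SAM record. It assumes the aligner wrote an NM (edit distance) tag and approximates identity as 1 - NM / aligned bases. This is an illustrative approximation: it scores single alignments and does not reproduce how inStrain or coverM actually compute read identity (for example, whether the two mates of a pair are combined).

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar):
    """Number of read bases aligned to the reference (M, = and X operations)."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "M=X")


def read_ani(cigar, nm):
    """Approximate identity as 1 - NM / aligned bases (NM = mismatches + indel bases)."""
    aligned = aligned_bases(cigar)
    if aligned == 0:
        return 0.0  # unmapped read or CIGAR '*'
    return max(0.0, 1.0 - nm / aligned)


def sam_record_ani(line):
    """Identity of one SAM alignment line, using its CIGAR (column 6) and NM:i: tag."""
    fields = line.rstrip("\n").split("\t")
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:
        raise ValueError("alignment has no NM tag")
    return read_ani(fields[5], nm)


def passes_min_ani(line, min_ani=0.95):
    """True if the alignment meets the identity threshold (95% in this thread)."""
    return sam_record_ani(line) >= min_ani
```

Soft-clipped bases are excluded from the denominator here, which is one of several choices that can make two tools disagree even at the same nominal threshold.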
OK, thanks for this. Looking at your mapping_info.tsv file, it looks like inStrain thinks that all the reads mapping to many of the contigs are singletons (unpaired reads). This could be because they really are, or because of some bug in inStrain / pysam that's making it think they are.
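One way to check this independently is to tally, per scaffold, how many mapped reads have a mapped mate versus how many are singletons, using the SAM FLAG bits. A minimal sketch over SAM text (for example from `samtools view -h`), assuming a singleton means an unpaired read (FLAG 0x1 unset) or a read whose mate is unmapped (0x8 set); this is a simplified definition and not necessarily the exact rule inStrain applies.

```python
from collections import Counter, defaultdict

# SAM FLAG bits (from the SAM specification).
PAIRED = 0x1          # read has a mate (multi-segment template)
UNMAPPED = 0x4        # this read did not map
MATE_UNMAPPED = 0x8   # its mate did not map
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def pairing_counts(sam_lines):
    """Per-scaffold counts of mapped reads whose mate also mapped vs. singletons."""
    counts = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header
        fields = line.split("\t")
        flag, scaffold = int(fields[1]), fields[2]
        # Count each read once: skip unmapped, secondary and supplementary records.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
            counts[scaffold]["paired"] += 1
        else:
            counts[scaffold]["singleton"] += 1  # unpaired read, or mate unmapped
    return counts


def singleton_fraction(counts):
    """Fraction of mapped reads per scaffold that are singletons."""
    return {
        scaffold: c["singleton"] / (c["paired"] + c["singleton"])
        for scaffold, c in counts.items()
    }
```

A singleton fraction near 1.0 on every scaffold would point to reads that were mapped as unpaired rather than to a per-scaffold parsing problem.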
You could confirm one way or the other by trying to filter the .bam file to remove unpaired reads and then running coverM again (or something like that).
Alternatively, you could just add --pairing_filter all_reads to keep the singleton reads rather than filtering them out.
-Matt
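The difference between dropping singletons and keeping them can be modelled as a per-read keep/discard decision on the SAM FLAG. The sketch below is a simplified, hypothetical model: `all_reads` is the option from this thread, while `paired_only` stands in for the pair-requiring default and is an assumed name here; inStrain's real filters (e.g. any checks on insert size or mate placement) are not reproduced. For the first suggestion, filtering the .bam directly, `samtools view -b -f 1 -F 12` keeps only mapped reads whose mate is also mapped (12 = 0x4 | 0x8).

```python
# SAM FLAG bits.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8


def keep_read(flag, mode="paired_only"):
    """Simplified keep/discard rule; 'paired_only' is a hypothetical stand-in name."""
    if flag & UNMAPPED:
        return False  # nothing to count for reads that did not map
    if mode == "all_reads":
        return True  # keep singletons as well as pairs
    if mode == "paired_only":
        return bool(flag & PAIRED) and not (flag & MATE_UNMAPPED)
    raise ValueError(f"unknown mode: {mode!r}")


def filter_sam(sam_lines, mode="paired_only"):
    """Yield header lines unchanged and only the alignment lines that pass keep_read."""
    for line in sam_lines:
        if line.startswith("@") or keep_read(int(line.split("\t")[1]), mode):
            yield line
```

If every read in a sample is a singleton, the pair-requiring mode discards all of them, which matches the symptom in this thread; switching the mode keeps them.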
Okay, I'll give both of those a shot and report back. Thank you!
Running it with --pairing_filter all_reads worked! Based on some samtools digging, I think it's because my reads actually are all singletons, so it's probably not a bug. Thank you so much for helping!