
Comments (7)

MrOlm avatar MrOlm commented on July 17, 2024

Hi @rebeccasophiasalcedo -

Thank you for the detailed bug report! Based on what you've described, it does seem like there's a problem with inStrain reading the reads mapped to some of your scaffolds. This could be due to 1) a mismatch between the scaffold names in the .fasta file and the scaffold names in the .bam file, 2) a weird character in the names of the .fasta file or .bam file, or 3) some other bug with pysam (what inStrain uses to load the reads). A couple of questions-

About how many other samples did you run that worked? A few, or many?

Did the other samples use the same .fasta file? Is there anything different about how this sample was processed compared to the other files?

Thanks,
Matt
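
For anyone hitting a similar problem, possibilities 1) and 2) above can be checked with a short script. This is only a minimal sketch and not part of inStrain: it assumes the .bam has been converted to SAM text, that the mapper kept only the first whitespace-delimited token of each FASTA header as the scaffold name, and that "weird character" means anything outside letters, digits, `_`, `.` and `-`. The function names are hypothetical.

```python
import re

# Assumption: names made only of these characters are considered "safe".
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")


def fasta_names(fasta_lines):
    """Collect scaffold names: first whitespace-delimited token after '>'."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def sam_header_names(sam_lines):
    """Collect the SN: values from @SQ lines in the SAM header."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header is over, alignment records begin
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def compare_names(fasta_lines, sam_lines):
    """Report names present on one side only, plus names with unusual characters."""
    fa = fasta_names(fasta_lines)
    sam = sam_header_names(sam_lines)
    return {
        "only_in_fasta": sorted(fa - sam),
        "only_in_sam": sorted(sam - fa),
        "unusual": sorted(n for n in fa | sam if not SAFE_NAME.match(n)),
    }


# Small in-memory demonstration with made-up scaffold names.
demo_fasta = [">contig_1 len=4\n", "ACGT\n", ">contig_2\n", "GG\n"]
demo_sam = ["@HD\tVN:1.5\tSO:unsorted\n", "@SQ\tSN:contig_1\tLN:4\n"]
print(compare_names(demo_fasta, demo_sam))
```

On real files, the two arguments could simply be open file handles, for example the .fa file and the output of `samtools view -H` saved to a text file.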

from instrain.

MrOlm avatar MrOlm commented on July 17, 2024

Also in looking at the log file, inStrain is reporting that it's removing ~50% of reads during filtering. Could you provide the mapping_info.tsv file as well?

Thanks,
Matt


rebeccasophiasalcedo avatar rebeccasophiasalcedo commented on July 17, 2024

I have a total of 28 samples and had no problems with the other 27! And they all used their respective assembly fasta, so the .fa file used in this command is only ever used here for this analysis. Though I generated MAGs using all assemblies (incl. this one) and cross-mapped for metaBAT and ran into no issues there.

I just checked my sam and fasta and the headers look alright. I didn't catch any whitespace characters either.

head 4500m1000m_reads_to_contigs.sam
@HD VN:1.5 SO:unsorted GO:query
@SQ SN:OC1703_4500m_1000m_sens_contig_1483280 LN:1041
@SQ SN:OC1703_4500m_1000m_sens_contig_3485708 LN:1284

head 4500m1000m_contigs_min1000.fa

OC1703_4500m_1000m_sens_contig_4153184
OC1703_4500m_1000m_sens_contig_593313

Here's the mapping_info.tsv file!


rebeccasophiasalcedo avatar rebeccasophiasalcedo commented on July 17, 2024

Oh, and when I ran coverM I required 95% ANI too, so coverM's read recruitment and inStrain's default mapping requirements are comparable!


MrOlm avatar MrOlm commented on July 17, 2024

OK, thanks for this. Looking at your mapping_info.tsv file, it looks like inStrain thinks that all the reads mapping to many of the contigs are singletons (unpaired reads). This could be because they really are, or because of some sort of bug in inStrain / pysam that's making it think they are.

You could confirm one way or the other by trying to filter the .bam file to remove unpaired reads and then running coverM again (or something like that).

Or you could just add --pairing_filter all_reads to keep the singleton reads instead of filtering them out.

-Matt
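
One rough way to check whether the mapped reads really are singletons is to tally the SAM FLAG bits. The sketch below is an illustration only: it looks at the standard FLAG bits from the SAM specification (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x100 secondary, 0x800 supplementary) and is not how inStrain itself decides pairing. The function name and category labels are hypothetical.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def pairing_summary(sam_lines):
    """Count primary mapped reads by pairing status, based on FLAG bits only."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # only consider primary alignments of mapped reads
        if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
            counts["paired_mate_mapped"] += 1
        elif flag & PAIRED:
            counts["paired_mate_unmapped"] += 1
        else:
            counts["unpaired"] += 1
    return counts


# Made-up records: one paired read, one unpaired read, one unmapped read.
demo = [
    "@HD\tVN:1.5\tSO:unsorted\n",
    "r1\t99\tcontig_1\t1\t60\t4M\t=\t10\t13\tACGT\tIIII\n",
    "r2\t0\tcontig_1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print(dict(pairing_summary(demo)))
```

On a real .bam, the input could be the text output of `samtools view -h`. `samtools flagstat` reports similar totals for the whole file. If nearly every mapped read lands in `unpaired`, the reads were most likely mapped as single-end rather than misread by inStrain.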


rebeccasophiasalcedo avatar rebeccasophiasalcedo commented on July 17, 2024

okay I'll give both of those a shot and report back, thank you!


rebeccasophiasalcedo avatar rebeccasophiasalcedo commented on July 17, 2024

Running it with --pairing_filter all_reads worked! Based on some samtools digging, I think it's because my reads actually are all singletons, so probably not a bug. Thank you so much for helping!

