Comments (4)
The tl;dr of iqbal-lab-org/pandora#294 is that this sample has short reads (~50bp), and all of them have Ns in the middle, so we lose a lot of minimizers. The default minimum size of a cluster of hits in pandora is 10, and we basically never get more than that on a read for this sample (iqbal-lab-org/pandora#295 (comment) sums this up).
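To make the minimizer loss concrete, here is a rough sketch that counts how many k-mers in a 50bp read are free of Ns. It is not pandora's minimizer selection: it only counts the k-mers that minimizers could be drawn from, so the real number of minimizer hits will be lower still. The value k=15, the function name and the read sequence are assumptions for illustration only.

```python
def n_free_kmers(read: str, k: int = 15) -> int:
    """Count the k-mers in `read` that contain no ambiguous base (N).

    k=15 is an assumed value for illustration; use whatever k the index
    was actually built with.
    """
    read = read.upper()
    return sum(
        1
        for i in range(len(read) - k + 1)
        if "N" not in read[i : i + k]
    )


# A made-up 50bp read, and the same read with two Ns in the middle.
clean = "ACGT" * 12 + "AC"
masked = clean[:24] + "NN" + clean[26:]

print(n_free_kmers(clean))   # 36
print(n_free_kmers(masked))  # 20
```

Under these assumptions, two Ns in the middle of the read remove 16 of the 36 candidate k-mers, because every k-mer overlapping either N is unusable.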
So the question is (cc @iqbal-lab), do we
- Reduce the minimum cluster size for Illumina data (in drprg)? From page 45 of Rachel's thesis:
  "When the minimum size of a cluster is set too low, we have more false positive local graphs identified as present in the dataset, and also have to handle more noise downstream when inferring a mosaic sequence and genotyping. When it is set too high, we have less sensitivity to discover loci that are present."
  For the purposes of drprg, we aren't concerned with false positive loci discovery - especially for MTB. So maybe something lower (like 5?) could be better?
- Refuse to analyse samples with 50bp reads? This seems quite brutal, but also solves the issue (unless there are longer reads with lots of ambiguous bases).
from drprg.
Definitely refuse to analyse it!
That does feel a bit sly though given mykrobe and tbprofiler produce good predictions for this sample...
Sorry, I don't mean reject the sample up front if it has a few short reads. But effectively ignoring short reads is fine IMO. Fine if Mykrobe and tbprofiler win on this one. The future is long reads, and we shouldn't contort ourselves over tiny ones.
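For what "effectively ignoring short reads" could look like as an explicit pre-filter, here is a minimal sketch. It is not how drprg or pandora handle this; the function names and the `min_len` cutoff are hypothetical, and it assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    A truncated trailing record is silently dropped.
    """
    it = iter(lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def drop_short_reads(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep only reads whose sequence is at least `min_len` bases long.

    `min_len` is a hypothetical cutoff, not a drprg or pandora option.
    """
    for header, seq, qual in records:
        if len(seq) >= min_len:
            yield header, seq, qual


# Toy input: one short read and one longer read.
fastq = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
]
kept = list(drop_short_reads(parse_fastq(fastq), min_len=5))
print([header for header, _, _ in kept])  # ['@r2']
```

A sample made up entirely of short reads would come out of this filter empty, so the caller would still need to decide how to report that.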