Comments (46)
Yes, and just to keep our minds clean, we detect a minor allele (the genotype is major-REF plus minor-ALT), but this leads to a phenotype of R
from drprg.
"This position seems to always have depth across the various alleles."
Could also be a paralogous kmer from elsewhere?
from drprg.
First eyeball of the results of this feature are that it is way too permissive - there's a lot of minor alleles in each VCF. I'm working on adding in a second layer whereby the the minor allele must also have a GAPS value less than some threshold. Eyeballing some of the "true" minor alleles it seems they all have GAPS < 0.4 (most are 0.0)
from drprg.
Are you looking at illumina or nanopore data?
from drprg.
Illumina. I'm not going to both trying to do minor alleles on nanopore (I don't think there's any minor alleles in the H2H dataset anyway).
from drprg.
I'm surprised you're getting too many calls with illumina and a 10% threshold
from drprg.
Here are the Illumina results after running with the new implementation of minor allele calling (I chaged the markers so that you can see the error bars better). This is detecting a fraction of 0.2 with a maximum GAPS value of 0.4
The first result that sticks out here is the ETO specificity. We basically call all but 4 samples as resistant (out of ~7000 isolates with phenotype for ETO!!)
I obviously didn't look at all 7000 samples, but all of the samples I did look at have a minor resistance call for mutation ethA ACGGTTTAGCGC1479A. The interesting thing is that the proportion of depth on the ref and alt allele is almost exactly the same for all the samples I looked at (e.g, 0.59/0.41 ref/alt). However, the GAPS for all of the samples I looked at was 0.0/0.333 - we have the max. GAPS threshold set at 0.4. I will try lowering it to 0.3 and see what that looks like.
from drprg.
Tb-profiler does manage to have slightly higher sensitivity for nearly all drugs also
from drprg.
Oh that might be because they use threshold 0.1 and we use 0.2
from drprg.
Yeah possibly. I started with 0.2 and was thinking to finetune and then test out 0.1
from drprg.
Here are the results of changing the max. GAPS to 0.3
It fixes the ETO specificity issue above.
from drprg.
I did some digging into the fluoroquinolone (FQ) FNs as minor alleles are super important for these drugs.
There are two issues that have popped up in most of the cases.
-
When the minor allele's coverage is close to the major allele - i.e. fractions >0.4 for both alleles - pandora calls a null genotype. I saw this quite a bit (and also saw it a little for INH). When I switch to using local genotyping in pandora though it fixes these null calls. Given that, what is happening here is the maximum likelihood path is choosing on allele but genotyping is picking another, so we nullify the call. Does anyone forsee issues with switching to local genotyping?
-
There is a silent mutation gyrA S95T (which is susceptible) that kills an assumption I make when doing minor allele calling. My assumption is that if we have an ALT call, then skip checking for a minor allele. Which is fine in nearly all cases as ALTs generally mean resistance; but this allele doesn't mean resistance. So we call an alt, which is just the reference allele with this silent mutation and therefore don't look for minor alleles (which are there and in the same VCF position/site). Here is an example of this site
gyrA 380 a68d4d07 GACAG AACAC,CACAC,GACAC,GCCAC,GGCAC,TACAC,TACAG . PASS VC=PH_SNPs;GRAPHTYPE=NESTED;PDP=0,0,0,0.82392,0.17608,0,0,0;VARID=gyrA_D94G,gyrA_D94C,gyrA_D94A,gyrA_D94N,gyrA_D94H,gyrA_D94Y,gyrA_S95T;PREDICT=.,.,.,.,.,.,U GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 3:0,0,0,124,27,0,0,0:0,0,0,124,26,0,0,0:0,0,0,124,27,0,0,0:0,0,0,124,26,0,0,0:0,0,0,248,55,0,0,0:0,0,0,249,53,0,0,0:1,1,1,0,0,1,1,1:-1862.16,-1862.16,-1862.16,-247.957,-1250.38,-1862.16,-1862.16,-1862.16:1002.43
The silent mutation is allele 3.
So I think I will need to change some of my assumptions as this pops up a fair bit
from drprg.
- When Pandora makes a null call, does it still report depths? If yes, you can just look at the depth and make your own decisioj/call? Local genotyping....I'm not sure.
- DrPrg could just ignore Pandora genotyping entirely, and do it's own using depth on alleles.
Find the R allele with highest coverage
Find the S with highest coverage
If the R allele has >0.1 * the sum of these, you have an R call
from drprg.
- When Pandora makes a null call, does it still report depths? If yes, you can just look at the depth and make your own decisioj/call? Local genotyping....I'm not sure.
Yes, it does.
When I switch to using local genotyping, here is a table fo the changes in FNs and FPs
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | 0 | 0 |
drprg | Capreomycin | 0 | 0 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | 0 | -6 |
drprg | Ethionamide | -5 | 2 |
drprg | Isoniazid | -4 | 0 |
drprg | Kanamycin | 0 | 0 |
drprg | Levofloxacin | -3 | 1 |
drprg | Linezolid | 0 | 0 |
drprg | Moxifloxacin | -5 | 0 |
drprg | Ofloxacin | -1 | 0 |
drprg | Pyrazinamide | -2 | 0 |
drprg | Rifampicin | -2 | 0 |
drprg | Streptomycin | 1 | 0 |
e.g. A value of -2 means 2 less FN/FP
This seems like a strong case for local genotyping
- DrPrg could just ignore Pandora genotyping entirely, and do it's own using depth on alleles.
This seems unnecessary given the above I think?
Find the R allele with highest coverage Find the S with highest coverage If the R allele has >0.1 * the sum of these, you have an R call
This is almost what I do currently, but I just make some naive assumptions about REF and ALT alleles which I will fix and that wiill make it effectively the same as you suggest
from drprg.
Wow those are great results!
from drprg.
- When the minor allele's coverage is close to the major allele - i.e. fractions >0.4 for both alleles - pandora calls a null genotype. I saw this quite a bit (and also saw it a little for INH). When I switch to using local genotyping in pandora though it fixes these null calls. Given that, what is happening here is the maximum likelihood path is choosing on allele but genotyping is picking another, so we nullify the call. Does anyone forsee issues with switching to local genotyping?
Not really, we just switched away from local genotyping because we got worst results in the 20-way dataset. I remember checking a few calls manually, but not as much and not as deep as you are doing here, we were mostly driven by the final results. Would it be possible to have both VCFs from a sample (one with global and the other with local genotyping) and some calls where the local genotyping is better? I'd need that to be able to understand better what is happening. And if it is not an issue, the pandora compare output dir and command would be great too
from drprg.
my guess is michael is just going to go with local genotyping. I think there are two kinds of thing that can happen with local genotyping
- if you have a complicated graph (which happens in e coli but not in tb) you can have two diverged arms of a bubble, both with nested snps, and you can end up with global genotyping (=dynamic prog) chooses differently to local. You need to have a process to look at the local genotype calls and make sure they are actually compatible with each other. I think this just wont happen in michael's use case.
- Michael is dealiin with explicitly non-haploid things (minor alleles), whereas global genotyping is very vey haploid, and will N things.
from drprg.
Would it be possible to have both VCFs from a sample (one with global and the other with local genotyping) and some calls where the local genotyping is better? I'd need that to be able to understand better what is happening.
I don't have both on hand as they have been overridden with the local genotyping one. But could easily regenerate a pre-local one if you want. But what happened was when I look at the pandora consensus VCF the genotype was 0 and then genotyping VCF was trying to call GT 1, so pandora nullifies the call as a result.
And if it is not an issue, the pandora compare output dir and command would be great too
I don't use compare, just discover and map.
I suspect Zam's first point above is the likely situation
from drprg.
I don't use compare, just discover and map.
Fine, can be discover/map output dirs
This example means it was not a correct decision to deprecate the local genotyping for users, and it has its uses.
But what happened was when I look at the pandora consensus VCF the genotype was 0 and then genotyping VCF was trying to call GT 1, so pandora nullifies the call as a result.
This is one example I want to understand better why we have such a divergence, and if we have a better solution than simply nullifying the call. We need to revisit pandora genotyping and improve...
from drprg.
Could you rerun drprg using the most recent make_prg (0.4.0) and pandora (0.10.0-alpha.0) versions to check if results improve in any way?
from drprg.
Could you rerun drprg using the most recent make_prg (0.4.0) and pandora (0.10.0-alpha.0) versions to check if results improve in any way?
Yep, I will do this one at a time and check the change once this PR is complete
from drprg.
Alright, so after implementing the fix for point 2 here, the diff between the results before minor allele feature is
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | -8 | 7 |
drprg | Capreomycin | -5 | 3 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | -9 | 6 |
drprg | Ethionamide | -12 | 22 |
drprg | Isoniazid | -35 | 6 |
drprg | Kanamycin | -13 | 6 |
drprg | Levofloxacin | -21 | 4 |
drprg | Linezolid | 0 | 0 |
drprg | Moxifloxacin | -16 | 13 |
drprg | Ofloxacin | -5 | 1 |
drprg | Pyrazinamide | -2 | 3 |
drprg | Rifampicin | -4 | 0 |
drprg | Streptomycin | -9 | 2 |
The next two things to check before finalising this feature are using a minor allele fraction of 0.1 (results above were for 0.2) and removing the FRS filter
from drprg.
But wait , why suddenly so many more FPs?? Will only get worse if you lower to 0.1 and remove FRS?
from drprg.
More FPs because we are doing minor allele calling now.... This diff is with the results before I implemented minor allele calling. The diff before this was an earlier minor allele implementation before/after using local genotyping
from drprg.
Right but tb-profiler is also doing this without those FPs ? My expectation was that it would work well for fluoroquinolones at least (some precedent in mykrone paper and the new biorxiv from Alice brankin and Phil Fowler). Obviously those two aren't on this dataset tho
from drprg.
Right but tb-profiler is also doing this without those FPs ?
Not necessarily... These results are purely drprg FNs and FPs. We actually have less FPs than tb-profiler for all FQs at the moment
Drug | Tool | FN(R) | FP(S) | Sensitivity (95% CI) | Specificity (95% CI) |
---|---|---|---|---|---|
Amikacin | drprg | 69(485) | 57(6958) | 85.8% (82.4-88.6%) | 99.2% (98.9-99.4%) |
Amikacin | mykrobe | 93(485) | 51(6958) | 80.8% (77.1-84.1%) | 99.3% (99.0-99.4%) |
Amikacin | tbprofiler | 62(485) | 59(6958) | 87.2% (83.9-89.9%) | 99.2% (98.9-99.3%) |
Capreomycin | drprg | 57(235) | 95(2449) | 75.7% (69.9-80.8%) | 96.1% (95.3-96.8%) |
Capreomycin | mykrobe | 72(235) | 88(2449) | 69.4% (63.2-74.9%) | 96.4% (95.6-97.1%) |
Capreomycin | tbprofiler | 54(235) | 96(2449) | 77.0% (71.2-81.9%) | 96.1% (95.2-96.8%) |
Delamanid | drprg | 111(116) | 1(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) |
Delamanid | mykrobe | 111(116) | 2(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) |
Delamanid | tbprofiler | 111(116) | 2(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) |
Ethambutol | drprg | 137(1538) | 742(4936) | 91.1% (89.6-92.4%) | 85.0% (83.9-85.9%) |
Ethambutol | mykrobe | 133(1538) | 747(4936) | 91.4% (89.8-92.7%) | 84.9% (83.8-85.8%) |
Ethambutol | tbprofiler | 118(1538) | 765(4936) | 92.3% (90.9-93.6%) | 84.5% (83.5-85.5%) |
Ethionamide | drprg | 329(1104) | 394(6105) | 70.2% (67.4-72.8%) | 93.5% (92.9-94.1%) |
Ethionamide | mykrobe | 265(1104) | 413(6105) | 76.0% (73.4-78.4%) | 93.2% (92.6-93.8%) |
Ethionamide | tbprofiler | 272(1104) | 414(6105) | 75.4% (72.7-77.8%) | 93.2% (92.6-93.8%) |
Isoniazid | drprg | 327(3900) | 170(4194) | 91.6% (90.7-92.4%) | 95.9% (95.3-96.5%) |
Isoniazid | mykrobe | 333(3900) | 170(4194) | 91.5% (90.5-92.3%) | 95.9% (95.3-96.5%) |
Isoniazid | tbprofiler | 297(3900) | 181(4194) | 92.4% (91.5-93.2%) | 95.7% (95.0-96.3%) |
Kanamycin | drprg | 129(670) | 107(6975) | 80.7% (77.6-83.6%) | 98.5% (98.1-98.7%) |
Kanamycin | mykrobe | 152(670) | 98(6975) | 77.3% (74.0-80.3%) | 98.6% (98.3-98.8%) |
Kanamycin | tbprofiler | 122(670) | 107(6975) | 81.8% (78.7-84.5%) | 98.5% (98.1-98.7%) |
Levofloxacin | drprg | 84(1040) | 101(5454) | 91.9% (90.1-93.4%) | 98.1% (97.8-98.5%) |
Levofloxacin | mykrobe | 88(1040) | 102(5454) | 91.5% (89.7-93.1%) | 98.1% (97.7-98.5%) |
Levofloxacin | tbprofiler | 85(1040) | 109(5454) | 91.8% (90.0-93.3%) | 98.0% (97.6-98.3%) |
Linezolid | drprg | 49(65) | 4(6110) | 24.6% (15.8-36.3%) | 99.9% (99.8-100.0%) |
Linezolid | mykrobe | 48(65) | 4(6110) | 26.2% (17.0-38.0%) | 99.9% (99.8-100.0%) |
Linezolid | tbprofiler | 48(65) | 5(6110) | 26.2% (17.0-38.0%) | 99.9% (99.8-100.0%) |
Moxifloxacin | drprg | 44(603) | 477(5431) | 92.7% (90.3-94.5%) | 91.2% (90.4-91.9%) |
Moxifloxacin | mykrobe | 44(603) | 473(5431) | 92.7% (90.3-94.5%) | 91.3% (90.5-92.0%) |
Moxifloxacin | tbprofiler | 42(603) | 482(5431) | 93.0% (90.7-94.8%) | 91.1% (90.3-91.9%) |
Ofloxacin | drprg | 26(105) | 5(424) | 75.2% (66.2-82.5%) | 98.8% (97.3-99.5%) |
Ofloxacin | mykrobe | 26(105) | 5(424) | 75.2% (66.2-82.5%) | 98.8% (97.3-99.5%) |
Ofloxacin | tbprofiler | 26(105) | 6(424) | 75.2% (66.2-82.5%) | 98.6% (96.9-99.3%) |
Pyrazinamide | drprg | 73(341) | 50(822) | 78.6% (73.9-82.6%) | 93.9% (92.1-95.4%) |
Pyrazinamide | mykrobe | 55(341) | 56(822) | 83.9% (79.6-87.4%) | 93.2% (91.3-94.7%) |
Pyrazinamide | tbprofiler | 45(341) | 62(822) | 86.8% (82.8-90.0%) | 92.5% (90.4-94.1%) |
Rifampicin | drprg | 138(3222) | 166(4586) | 95.7% (95.0-96.4%) | 96.4% (95.8-96.9%) |
Rifampicin | mykrobe | 164(3222) | 169(4586) | 94.9% (94.1-95.6%) | 96.3% (95.7-96.8%) |
Rifampicin | tbprofiler | 102(3222) | 177(4586) | 96.8% (96.2-97.4%) | 96.1% (95.5-96.7%) |
Streptomycin | drprg | 269(1042) | 132(1205) | 74.2% (71.4-76.7%) | 89.0% (87.2-90.7%) |
Streptomycin | mykrobe | 282(1042) | 135(1205) | 72.9% (70.2-75.5%) | 88.8% (86.9-90.5%) |
Streptomycin | tbprofiler | 257(1042) | 136(1205) | 75.3% (72.6-77.9%) | 88.7% (86.8-90.4%) |
These are the results from the most recent run.
from drprg.
Wow that comparison looks great. Moxi and ethionamide stand out for everyone tho, weird
from drprg.
No I did mean moxi , standing out for many FPs, so specificity is low
from drprg.
Frameshit eh
from drprg.
No I did mean moxi , standing out for many FPs, so specificity is low
Ah, right. Well the minor allele calling hasn't really increased the FPs much for it (see table here), but the sensitivity has come up a lot. I guess this is to be expected given Phil's recent paper
I'll take a look at those FPs once I've finalised the minor allele stuff
from drprg.
You are right, and anyway same for all tools
from drprg.
Alright, here is the diff when reducing minor allele calling from depth fraction 0.2 to 0.1
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | -1 | 0 |
drprg | Capreomycin | 0 | 0 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | -15 | 8 |
drprg | Ethionamide | 0 | 1 |
drprg | Isoniazid | -10 | 3 |
drprg | Kanamycin | -1 | 0 |
drprg | Levofloxacin | -3 | 2 |
drprg | Linezolid | 0 | 0 |
drprg | Moxifloxacin | -3 | 3 |
drprg | Ofloxacin | -2 | 0 |
drprg | Pyrazinamide | -1 | 1 |
drprg | Rifampicin | 0 | 1 |
drprg | Streptomycin | -7 | 2 |
I'm quite impressed with this. I was expecting a lot more FPs. So I think 0.1 is the winner!
Here is the latest Illumina results with the 0.1 fraction
Drug | Tool | FN(R) | FP(S) | Sensitivity (95% CI) | Specificity (95% CI) | MCC |
---|---|---|---|---|---|---|
Amikacin | drprg | 68(485) | 57(6958) | 86.0% (82.6-88.8%) | 99.2% (98.9-99.4%) | 0.861 |
Amikacin | mykrobe | 93(485) | 51(6958) | 80.8% (77.1-84.1%) | 99.3% (99.0-99.4%) | 0.836 |
Amikacin | tbprofiler | 62(485) | 59(6958) | 87.2% (83.9-89.9%) | 99.2% (98.9-99.3%) | 0.866 |
Capreomycin | drprg | 57(235) | 95(2449) | 75.7% (69.9-80.8%) | 96.1% (95.3-96.8%) | 0.672 |
Capreomycin | mykrobe | 72(235) | 88(2449) | 69.4% (63.2-74.9%) | 96.4% (95.6-97.1%) | 0.638 |
Capreomycin | tbprofiler | 54(235) | 96(2449) | 77.0% (71.2-81.9%) | 96.1% (95.2-96.8%) | 0.679 |
Delamanid | drprg | 111(116) | 1(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.188 |
Delamanid | mykrobe | 111(116) | 2(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.173 |
Delamanid | tbprofiler | 111(116) | 2(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.173 |
Ethambutol | drprg | 122(1538) | 750(4936) | 92.1% (90.6-93.3%) | 84.8% (83.8-85.8%) | 0.693 |
Ethambutol | mykrobe | 133(1538) | 747(4936) | 91.4% (89.8-92.7%) | 84.9% (83.8-85.8%) | 0.689 |
Ethambutol | tbprofiler | 118(1538) | 765(4936) | 92.3% (90.9-93.6%) | 84.5% (83.5-85.5%) | 0.691 |
Ethionamide | drprg | 329(1104) | 395(6105) | 70.2% (67.4-72.8%) | 93.5% (92.9-94.1%) | 0.622 |
Ethionamide | mykrobe | 265(1104) | 413(6105) | 76.0% (73.4-78.4%) | 93.2% (92.6-93.8%) | 0.658 |
Ethionamide | tbprofiler | 272(1104) | 414(6105) | 75.4% (72.7-77.8%) | 93.2% (92.6-93.8%) | 0.653 |
Isoniazid | drprg | 317(3900) | 173(4194) | 91.9% (91.0-92.7%) | 95.9% (95.2-96.4%) | 0.879 |
Isoniazid | mykrobe | 333(3900) | 170(4194) | 91.5% (90.5-92.3%) | 95.9% (95.3-96.5%) | 0.876 |
Isoniazid | tbprofiler | 297(3900) | 181(4194) | 92.4% (91.5-93.2%) | 95.7% (95.0-96.3%) | 0.882 |
Kanamycin | drprg | 128(670) | 107(6975) | 80.9% (77.7-83.7%) | 98.5% (98.1-98.7%) | 0.805 |
Kanamycin | mykrobe | 152(670) | 98(6975) | 77.3% (74.0-80.3%) | 98.6% (98.3-98.8%) | 0.789 |
Kanamycin | tbprofiler | 122(670) | 107(6975) | 81.8% (78.7-84.5%) | 98.5% (98.1-98.7%) | 0.811 |
Levofloxacin | drprg | 81(1040) | 103(5454) | 92.2% (90.4-93.7%) | 98.1% (97.7-98.4%) | 0.896 |
Levofloxacin | mykrobe | 88(1040) | 102(5454) | 91.5% (89.7-93.1%) | 98.1% (97.7-98.5%) | 0.892 |
Levofloxacin | tbprofiler | 85(1040) | 109(5454) | 91.8% (90.0-93.3%) | 98.0% (97.6-98.3%) | 0.89 |
Linezolid | drprg | 49(65) | 4(6110) | 24.6% (15.8-36.3%) | 99.9% (99.8-100.0%) | 0.441 |
Linezolid | mykrobe | 48(65) | 4(6110) | 26.2% (17.0-38.0%) | 99.9% (99.8-100.0%) | 0.457 |
Linezolid | tbprofiler | 48(65) | 5(6110) | 26.2% (17.0-38.0%) | 99.9% (99.8-100.0%) | 0.447 |
Moxifloxacin | drprg | 41(603) | 480(5431) | 93.2% (90.9-94.9%) | 91.2% (90.4-91.9%) | 0.669 |
Moxifloxacin | mykrobe | 44(603) | 473(5431) | 92.7% (90.3-94.5%) | 91.3% (90.5-92.0%) | 0.669 |
Moxifloxacin | tbprofiler | 42(603) | 482(5431) | 93.0% (90.7-94.8%) | 91.1% (90.3-91.9%) | 0.668 |
Ofloxacin | drprg | 24(105) | 5(424) | 77.1% (68.2-84.1%) | 98.8% (97.3-99.5%) | 0.821 |
Ofloxacin | mykrobe | 26(105) | 5(424) | 75.2% (66.2-82.5%) | 98.8% (97.3-99.5%) | 0.808 |
Ofloxacin | tbprofiler | 26(105) | 6(424) | 75.2% (66.2-82.5%) | 98.6% (96.9-99.3%) | 0.802 |
Pyrazinamide | drprg | 72(341) | 51(822) | 78.9% (74.2-82.9%) | 93.8% (91.9-95.2%) | 0.741 |
Pyrazinamide | mykrobe | 55(341) | 56(822) | 83.9% (79.6-87.4%) | 93.2% (91.3-94.7%) | 0.77 |
Pyrazinamide | tbprofiler | 45(341) | 62(822) | 86.8% (82.8-90.0%) | 92.5% (90.4-94.1%) | 0.782 |
Rifampicin | drprg | 138(3222) | 167(4586) | 95.7% (95.0-96.4%) | 96.4% (95.8-96.9%) | 0.92 |
Rifampicin | mykrobe | 164(3222) | 169(4586) | 94.9% (94.1-95.6%) | 96.3% (95.7-96.8%) | 0.912 |
Rifampicin | tbprofiler | 102(3222) | 177(4586) | 96.8% (96.2-97.4%) | 96.1% (95.5-96.7%) | 0.927 |
Streptomycin | drprg | 262(1042) | 134(1205) | 74.9% (72.1-77.4%) | 88.9% (87.0-90.5%) | 0.647 |
Streptomycin | mykrobe | 282(1042) | 135(1205) | 72.9% (70.2-75.5%) | 88.8% (86.9-90.5%) | 0.629 |
Streptomycin | tbprofiler | 257(1042) | 136(1205) | 75.3% (72.6-77.9%) | 88.7% (86.8-90.4%) | 0.649 |
from drprg.
Thus is much better isn't it!
Questions
- just checking, is Mykrobe pinned to haploid here, or can it call minors?
- tbprofiler still manages to call 36 rif TPs and 20 isoniazid TPs and 27 pyrazinamids TPs that we don't (although drprg beats Mykrobe which is cool!) . So, double checking: this is the same catalogue right? However tbprofiler takes vcf. Maybe some small issue in probe generation, especially where we have dense variation? But racon ought to plug those gaps. Weird. Are they all in the same samples?
from drprg.
just checking, is Mykrobe pinned to haploid here, or can it call minors?
it can call minor alleles
So, double checking: this is the same catalogue right? However tbprofiler takes vcf. Maybe some small issue in probe generation, especially where we have dense variation? But racon ought to plug those gaps. Weird. Are they all in the same samples?
100% the same catalogue for all tools. I'm going to dig into the discrepancies this week
from drprg.
It seems there are a decent number of cases where we don't call a minor allele because we fail the GAPS filer (min 0.3). However, in many of these cases, even the ref fails the GAPS filter. A lot of examples I am seeing have a small difference in GAPS between the ref and alt. (rpoB S450L has a lot of minor allele calls where both the ref and alt have GAPS 0.333). I'll try increasing the GAPS filter to 0.5, but also adding in a GAPS diff filter that sets an allowed max. difference between the ref and alt GAPS value.
from drprg.
The GAPS filter threshold is set in the expectation were looking at a major/haploid allele. I'd ignore it for minor allele detection, or as you're doing, change the threshold
from drprg.
Although ref failing gaps also is interesting
from drprg.
I added in a GAPS diff parameter. So we have a max GAPS value and a max GAPS diff, which is the difference between the major and minor allele's GAPS value (this can actually be negative - I found a case of the minor allele GAPS being lower than the called allele!).
I ran with max GAPS 0.5 and max diff 0.2 and I get the following diff
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | 0 | 0 |
drprg | Capreomycin | 0 | 0 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | -101 | 4135 |
drprg | Ethionamide | -1 | 3 |
drprg | Isoniazid | 0 | 0 |
drprg | Kanamycin | 0 | 0 |
drprg | Levofloxacin | -2 | 2 |
drprg | Linezolid | 0 | 1 |
drprg | Moxifloxacin | -1 | 0 |
drprg | Ofloxacin | 0 | 0 |
drprg | Pyrazinamide | -1 | 1 |
drprg | Rifampicin | -21 | 2 |
drprg | Streptomycin | 1 | 1 |
This is nearly great. We save a lot of RIF FNs. BUT, we end up calling everything EMB resistant by the looks of it. I'll dig into this today and see if I can get to the bottom of that
from drprg.
All of the EMB calls are due to the same position
embA 85 4b6d566c CCTACC CCTACA,CCTATC,TCTACC . PASS VC=PH_SNPs;GRAPHTYPE=SIMPLE;PDP=0.423729,0.169492,0.0847458,0.322034;OGT=0;VARID=embA_C-12T,embA_C-16G,embA_C-16T;PREDICT=.,.,r GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 3:18,8,4,14:7,2,1,5:25,0,0,12:7,0,0,3:90,25,25,59:37,7,7,22:0.4,0.666667,0.833333,0.5:-207.412,-316.464,-362.32,-249.056:41.6434
The minor allele has GAPS 0.5 and the ref has 0.4. This position seems to always have depth across the various alleles. My assumption is there is a shared minimizer between these alleles. For instance, the ref is CCTACC and the alt that always gets called is TCTACC. The last 5 bases of these two alleles are the same, so it is conceiveable that they share a minimizer that covers some/all of those 5 bases and not the first, given the small difference in GAPS.
I'll have a think about the best way to address this without losing the RIF TPs we gained.
from drprg.
If I break this variant into two sites (there is 3bp that match in between the first base and last two bases) this GAPS issue goes away.
So I see two options here
- Reduce the minimum match length to 3 when building the index
- Add in another GAPS filter 'max called gaps' which sets a maximum value for the called allele's GAPS value. If it is above this value, we don't consider minor alleles for that site. This would get rid of all of the bad EMB calls
I'm kind of tempted to try both and see what happens separately for each...
from drprg.
Reducing the lin. match length to 3 gives the following diff (replaces the last diff)
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | 0 | 0 |
drprg | Capreomycin | 0 | 0 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | -8 | 24 |
drprg | Ethionamide | 3 | 2 |
drprg | Isoniazid | 12 | -1 |
drprg | Kanamycin | 0 | 0 |
drprg | Levofloxacin | -1 | 2 |
drprg | Linezolid | 0 | 1 |
drprg | Moxifloxacin | -1 | 0 |
drprg | Ofloxacin | 0 | 0 |
drprg | Pyrazinamide | 0 | 3 |
drprg | Rifampicin | -19 | 2 |
drprg | Streptomycin | 1 | 2 |
Not super happy with this one. Currently trying option 2 above
from drprg.
Bit weird, just this one position always having coverage on both alleles. Your idea of a shared minimizer would effectively mean those 2 alleles collapse to one path in the kmer graph instead of a bubble, or potentially the bubble becomes a bit shorter I suppose. Is there always the same coverage on both?
from drprg.
No, this is only an issue when we pay attention to minor alleles? So we always get a tiny bit of coverage on one allele? So this can't be a shared minimizer can it?
from drprg.
Maybe this is a mad idea, but what happens if you change your hash function?
from drprg.
No, this is only an issue when we pay attention to minor alleles? So we always get a tiny bit of coverage on one allele? So this can't be a shared minimizer can it?
All alleles get some coverage, but not the same amount on each. You're right, it can't be shared minimizers...
When I split the allele into two, I still got some coverage on the minor allele, but the GAPS increased on the minor and decreased on the called allele.
Maybe this is a mad idea, but what happens if you change your hash function?
As in the pandora hash function? I think this is what we currently use https://github.com/rmcolq/pandora/blob/945c0255e99ab32fd5b056dc1c16ffdf611ef478/thirdparty/src/inthash.cpp#L106-L117. Do you think there might be a hash collision?
from drprg.
This is the diff for using the max called GAPS filter (option 2 from above)
Tool | Drug | ΔFN | ΔFP |
---|---|---|---|
drprg | Amikacin | 0 | 0 |
drprg | Capreomycin | 0 | 0 |
drprg | Delamanid | 0 | 0 |
drprg | Ethambutol | 0 | 0 |
drprg | Ethionamide | -1 | 3 |
drprg | Isoniazid | 0 | 0 |
drprg | Kanamycin | 0 | 0 |
drprg | Levofloxacin | -2 | 2 |
drprg | Linezolid | 0 | 0 |
drprg | Moxifloxacin | -1 | 0 |
drprg | Ofloxacin | 0 | 0 |
drprg | Pyrazinamide | -1 | 1 |
drprg | Rifampicin | -19 | 2 |
drprg | Streptomycin | 1 | 1 |
This looks pretty good.
from drprg.
Related Issues (20)
- Run TBProfiler HOT 4
- False positive argmatch results HOT 1
- Run drprg HOT 4
- Add grammar for specifying variant "expert" rules
- Benchmark figures HOT 7
- Update pandora and make_prg HOT 2
- Deal with gene absence HOT 9
- Add some common resistance-conferring mutations that do no exist in population graph HOT 8
- Notice partial gene deletion that spans start codon HOT 8
- Targeted sequencing mode HOT 1
- Disruptive in-frame indels HOT 3
- Lineage calling HOT 1
- Add install instructions to README.md
- How do i pronounce this tool name HOT 1
- Installation_error HOT 2
- Collate_drprg_results HOT 1
- error: unrecognized subcommand 'index' HOT 4
- Consequence of unknown mutations HOT 1
- Run mykrobe
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from drprg.