Code Monkey home page Code Monkey logo

Comments (46)

iqbal-lab avatar iqbal-lab commented on July 20, 2024 1

Yes, and just to keep our minds clean, we detect a minor allele (the genotype is major-REF plus minor-ALT), but this leads to a phenotype of R

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024 1

"This position seems to always have depth across the various alleles."

Could also be a paralogous kmer from elsewhere?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

First eyeball of the results of this feature are that it is way too permissive - there's a lot of minor alleles in each VCF. I'm working on adding in a second layer whereby the the minor allele must also have a GAPS value less than some threshold. Eyeballing some of the "true" minor alleles it seems they all have GAPS < 0.4 (most are 0.0)

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Are you looking at illumina or nanopore data?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Illumina. I'm not going to both trying to do minor alleles on nanopore (I don't think there's any minor alleles in the H2H dataset anyway).

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

I'm surprised you're getting too many calls with illumina and a 10% threshold

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Here are the Illumina results after running with the new implementation of minor allele calling (I chaged the markers so that you can see the error bars better). This is detecting a fraction of 0.2 with a maximum GAPS value of 0.4

illumina

The first result that sticks out here is the ETO specificity. We basically call all but 4 samples as resistant (out of ~7000 isolates with phenotype for ETO!!)

I obviously didn't look at all 7000 samples, but all of the samples I did look at have a minor resistance call for mutation ethA ACGGTTTAGCGC1479A. The interesting thing is that the proportion of depth on the ref and alt allele is almost exactly the same for all the samples I looked at (e.g, 0.59/0.41 ref/alt). However, the GAPS for all of the samples I looked at was 0.0/0.333 - we have the max. GAPS threshold set at 0.4. I will try lowering it to 0.3 and see what that looks like.

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Tb-profiler does manage to have slightly higher sensitivity for nearly all drugs also

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Oh that might be because they use threshold 0.1 and we use 0.2

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Yeah possibly. I started with 0.2 and was thinking to finetune and then test out 0.1

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Here are the results of changing the max. GAPS to 0.3

illumina

It fixes the ETO specificity issue above.

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

I did some digging into the fluoroquinolone (FQ) FNs as minor alleles are super important for these drugs.

There are two issues that have popped up in most of the cases.

  1. When the minor allele's coverage is close to the major allele - i.e. fractions >0.4 for both alleles - pandora calls a null genotype. I saw this quite a bit (and also saw it a little for INH). When I switch to using local genotyping in pandora though it fixes these null calls. Given that, what is happening here is the maximum likelihood path is choosing on allele but genotyping is picking another, so we nullify the call. Does anyone forsee issues with switching to local genotyping?

  2. There is a silent mutation gyrA S95T (which is susceptible) that kills an assumption I make when doing minor allele calling. My assumption is that if we have an ALT call, then skip checking for a minor allele. Which is fine in nearly all cases as ALTs generally mean resistance; but this allele doesn't mean resistance. So we call an alt, which is just the reference allele with this silent mutation and therefore don't look for minor alleles (which are there and in the same VCF position/site). Here is an example of this site

gyrA    380     a68d4d07        GACAG   AACAC,CACAC,GACAC,GCCAC,GGCAC,TACAC,TACAG       .       PASS    VC=PH_SNPs;GRAPHTYPE=NESTED;PDP=0,0,0,0.82392,0.17608,0,0,0;VARID=gyrA_D94G,gyrA_D94C,gyrA_D94A,gyrA_D94N,gyrA_D94H,gyrA_D94Y,gyrA_S95T;PREDICT=.,.,.,.,.,.,U GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      3:0,0,0,124,27,0,0,0:0,0,0,124,26,0,0,0:0,0,0,124,27,0,0,0:0,0,0,124,26,0,0,0:0,0,0,248,55,0,0,0:0,0,0,249,53,0,0,0:1,1,1,0,0,1,1,1:-1862.16,-1862.16,-1862.16,-247.957,-1250.38,-1862.16,-1862.16,-1862.16:1002.43

The silent mutation is allele 3.

So I think I will need to change some of my assumptions as this pops up a fair bit

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024
  1. When Pandora makes a null call, does it still report depths? If yes, you can just look at the depth and make your own decisioj/call? Local genotyping....I'm not sure.
  2. DrPrg could just ignore Pandora genotyping entirely, and do it's own using depth on alleles.

Find the R allele with highest coverage
Find the S with highest coverage
If the R allele has >0.1 * the sum of these, you have an R call

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024
  1. When Pandora makes a null call, does it still report depths? If yes, you can just look at the depth and make your own decisioj/call? Local genotyping....I'm not sure.

Yes, it does.

When I switch to using local genotyping, here is a table fo the changes in FNs and FPs

Tool Drug ΔFN ΔFP
drprg Amikacin 0 0
drprg Capreomycin 0 0
drprg Delamanid 0 0
drprg Ethambutol 0 -6
drprg Ethionamide -5 2
drprg Isoniazid -4 0
drprg Kanamycin 0 0
drprg Levofloxacin -3 1
drprg Linezolid 0 0
drprg Moxifloxacin -5 0
drprg Ofloxacin -1 0
drprg Pyrazinamide -2 0
drprg Rifampicin -2 0
drprg Streptomycin 1 0

e.g. A value of -2 means 2 less FN/FP

This seems like a strong case for local genotyping

  1. DrPrg could just ignore Pandora genotyping entirely, and do it's own using depth on alleles.

This seems unnecessary given the above I think?

Find the R allele with highest coverage Find the S with highest coverage If the R allele has >0.1 * the sum of these, you have an R call

This is almost what I do currently, but I just make some naive assumptions about REF and ALT alleles which I will fix and that wiill make it effectively the same as you suggest

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Wow those are great results!

from drprg.

leoisl avatar leoisl commented on July 20, 2024
  1. When the minor allele's coverage is close to the major allele - i.e. fractions >0.4 for both alleles - pandora calls a null genotype. I saw this quite a bit (and also saw it a little for INH). When I switch to using local genotyping in pandora though it fixes these null calls. Given that, what is happening here is the maximum likelihood path is choosing on allele but genotyping is picking another, so we nullify the call. Does anyone forsee issues with switching to local genotyping?

Not really, we just switched away from local genotyping because we got worst results in the 20-way dataset. I remember checking a few calls manually, but not as much and not as deep as you are doing here, we were mostly driven by the final results. Would it be possible to have both VCFs from a sample (one with global and the other with local genotyping) and some calls where the local genotyping is better? I'd need that to be able to understand better what is happening. And if it is not an issue, the pandora compare output dir and command would be great too

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

my guess is michael is just going to go with local genotyping. I think there are two kinds of thing that can happen with local genotyping

  1. if you have a complicated graph (which happens in e coli but not in tb) you can have two diverged arms of a bubble, both with nested snps, and you can end up with global genotyping (=dynamic prog) chooses differently to local. You need to have a process to look at the local genotype calls and make sure they are actually compatible with each other. I think this just wont happen in michael's use case.
  2. Michael is dealiin with explicitly non-haploid things (minor alleles), whereas global genotyping is very vey haploid, and will N things.

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Would it be possible to have both VCFs from a sample (one with global and the other with local genotyping) and some calls where the local genotyping is better? I'd need that to be able to understand better what is happening.

I don't have both on hand as they have been overridden with the local genotyping one. But could easily regenerate a pre-local one if you want. But what happened was when I look at the pandora consensus VCF the genotype was 0 and then genotyping VCF was trying to call GT 1, so pandora nullifies the call as a result.

And if it is not an issue, the pandora compare output dir and command would be great too

I don't use compare, just discover and map.

I suspect Zam's first point above is the likely situation

from drprg.

leoisl avatar leoisl commented on July 20, 2024

I don't use compare, just discover and map.

Fine, can be discover/map output dirs

This example means it was not a correct decision to deprecate the local genotyping for users, and it has its uses.

But what happened was when I look at the pandora consensus VCF the genotype was 0 and then genotyping VCF was trying to call GT 1, so pandora nullifies the call as a result.

This is one example I want to understand better why we have such a divergence, and if we have a better solution than simply nullifying the call. We need to revisit pandora genotyping and improve...

from drprg.

leoisl avatar leoisl commented on July 20, 2024

Could you rerun drprg using the most recent make_prg (0.4.0) and pandora (0.10.0-alpha.0) versions to check if results improve in any way?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Could you rerun drprg using the most recent make_prg (0.4.0) and pandora (0.10.0-alpha.0) versions to check if results improve in any way?

Yep, I will do this one at a time and check the change once this PR is complete

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Alright, so after implementing the fix for point 2 here, the diff between the results before minor allele feature is

Tool Drug ΔFN ΔFP
drprg Amikacin -8 7
drprg Capreomycin -5 3
drprg Delamanid 0 0
drprg Ethambutol -9 6
drprg Ethionamide -12 22
drprg Isoniazid -35 6
drprg Kanamycin -13 6
drprg Levofloxacin -21 4
drprg Linezolid 0 0
drprg Moxifloxacin -16 13
drprg Ofloxacin -5 1
drprg Pyrazinamide -2 3
drprg Rifampicin -4 0
drprg Streptomycin -9 2

The next two things to check before finalising this feature are using a minor allele fraction of 0.1 (results above were for 0.2) and removing the FRS filter

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

But wait , why suddenly so many more FPs?? Will only get worse if you lower to 0.1 and remove FRS?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

More FPs because we are doing minor allele calling now.... This diff is with the results before I implemented minor allele calling. The diff before this was an earlier minor allele implementation before/after using local genotyping

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Right but tb-profiler is also doing this without those FPs ? My expectation was that it would work well for fluoroquinolones at least (some precedent in mykrone paper and the new biorxiv from Alice brankin and Phil Fowler). Obviously those two aren't on this dataset tho

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Right but tb-profiler is also doing this without those FPs ?

Not necessarily... These results are purely drprg FNs and FPs. We actually have less FPs than tb-profiler for all FQs at the moment

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI)
Amikacin drprg 69(485) 57(6958) 85.8% (82.4-88.6%) 99.2% (98.9-99.4%)
Amikacin mykrobe 93(485) 51(6958) 80.8% (77.1-84.1%) 99.3% (99.0-99.4%)
Amikacin tbprofiler 62(485) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%)
Capreomycin drprg 57(235) 95(2449) 75.7% (69.9-80.8%) 96.1% (95.3-96.8%)
Capreomycin mykrobe 72(235) 88(2449) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%)
Capreomycin tbprofiler 54(235) 96(2449) 77.0% (71.2-81.9%) 96.1% (95.2-96.8%)
Delamanid drprg 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%)
Delamanid mykrobe 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%)
Delamanid tbprofiler 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%)
Ethambutol drprg 137(1538) 742(4936) 91.1% (89.6-92.4%) 85.0% (83.9-85.9%)
Ethambutol mykrobe 133(1538) 747(4936) 91.4% (89.8-92.7%) 84.9% (83.8-85.8%)
Ethambutol tbprofiler 118(1538) 765(4936) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%)
Ethionamide drprg 329(1104) 394(6105) 70.2% (67.4-72.8%) 93.5% (92.9-94.1%)
Ethionamide mykrobe 265(1104) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%)
Ethionamide tbprofiler 272(1104) 414(6105) 75.4% (72.7-77.8%) 93.2% (92.6-93.8%)
Isoniazid drprg 327(3900) 170(4194) 91.6% (90.7-92.4%) 95.9% (95.3-96.5%)
Isoniazid mykrobe 333(3900) 170(4194) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%)
Isoniazid tbprofiler 297(3900) 181(4194) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%)
Kanamycin drprg 129(670) 107(6975) 80.7% (77.6-83.6%) 98.5% (98.1-98.7%)
Kanamycin mykrobe 152(670) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%)
Kanamycin tbprofiler 122(670) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%)
Levofloxacin drprg 84(1040) 101(5454) 91.9% (90.1-93.4%) 98.1% (97.8-98.5%)
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%)
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%)
Linezolid drprg 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%)
Linezolid mykrobe 48(65) 4(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%)
Linezolid tbprofiler 48(65) 5(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%)
Moxifloxacin drprg 44(603) 477(5431) 92.7% (90.3-94.5%) 91.2% (90.4-91.9%)
Moxifloxacin mykrobe 44(603) 473(5431) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%)
Moxifloxacin tbprofiler 42(603) 482(5431) 93.0% (90.7-94.8%) 91.1% (90.3-91.9%)
Ofloxacin drprg 26(105) 5(424) 75.2% (66.2-82.5%) 98.8% (97.3-99.5%)
Ofloxacin mykrobe 26(105) 5(424) 75.2% (66.2-82.5%) 98.8% (97.3-99.5%)
Ofloxacin tbprofiler 26(105) 6(424) 75.2% (66.2-82.5%) 98.6% (96.9-99.3%)
Pyrazinamide drprg 73(341) 50(822) 78.6% (73.9-82.6%) 93.9% (92.1-95.4%)
Pyrazinamide mykrobe 55(341) 56(822) 83.9% (79.6-87.4%) 93.2% (91.3-94.7%)
Pyrazinamide tbprofiler 45(341) 62(822) 86.8% (82.8-90.0%) 92.5% (90.4-94.1%)
Rifampicin drprg 138(3222) 166(4586) 95.7% (95.0-96.4%) 96.4% (95.8-96.9%)
Rifampicin mykrobe 164(3222) 169(4586) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%)
Rifampicin tbprofiler 102(3222) 177(4586) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%)
Streptomycin drprg 269(1042) 132(1205) 74.2% (71.4-76.7%) 89.0% (87.2-90.7%)
Streptomycin mykrobe 282(1042) 135(1205) 72.9% (70.2-75.5%) 88.8% (86.9-90.5%)
Streptomycin tbprofiler 257(1042) 136(1205) 75.3% (72.6-77.9%) 88.7% (86.8-90.4%)

These are the results from the most recent run.

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Wow that comparison looks great. Moxi and ethionamide stand out for everyone tho, weird

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

No I did mean moxi , standing out for many FPs, so specificity is low

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Frameshit eh

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

No I did mean moxi , standing out for many FPs, so specificity is low

Ah, right. Well the minor allele calling hasn't really increased the FPs much for it (see table here), but the sensitivity has come up a lot. I guess this is to be expected given Phil's recent paper

I'll take a look at those FPs once I've finalised the minor allele stuff

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

You are right, and anyway same for all tools

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Alright, here is the diff when reducing minor allele calling from depth fraction 0.2 to 0.1

Tool Drug ΔFN ΔFP
drprg Amikacin -1 0
drprg Capreomycin 0 0
drprg Delamanid 0 0
drprg Ethambutol -15 8
drprg Ethionamide 0 1
drprg Isoniazid -10 3
drprg Kanamycin -1 0
drprg Levofloxacin -3 2
drprg Linezolid 0 0
drprg Moxifloxacin -3 3
drprg Ofloxacin -2 0
drprg Pyrazinamide -1 1
drprg Rifampicin 0 1
drprg Streptomycin -7 2

I'm quite impressed with this. I was expecting a lot more FPs. So I think 0.1 is the winner!

Here is the latest Illumina results with the 0.1 fraction

Drug Tool FN(R) FP(S) Sensitivity (95% CI) Specificity (95% CI) MCC
Amikacin drprg 68(485) 57(6958) 86.0% (82.6-88.8%) 99.2% (98.9-99.4%) 0.861
Amikacin mykrobe 93(485) 51(6958) 80.8% (77.1-84.1%) 99.3% (99.0-99.4%) 0.836
Amikacin tbprofiler 62(485) 59(6958) 87.2% (83.9-89.9%) 99.2% (98.9-99.3%) 0.866
Capreomycin drprg 57(235) 95(2449) 75.7% (69.9-80.8%) 96.1% (95.3-96.8%) 0.672
Capreomycin mykrobe 72(235) 88(2449) 69.4% (63.2-74.9%) 96.4% (95.6-97.1%) 0.638
Capreomycin tbprofiler 54(235) 96(2449) 77.0% (71.2-81.9%) 96.1% (95.2-96.8%) 0.679
Delamanid drprg 111(116) 1(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.188
Delamanid mykrobe 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Delamanid tbprofiler 111(116) 2(8152) 4.3% (1.9-9.7%) 100.0% (99.9-100.0%) 0.173
Ethambutol drprg 122(1538) 750(4936) 92.1% (90.6-93.3%) 84.8% (83.8-85.8%) 0.693
Ethambutol mykrobe 133(1538) 747(4936) 91.4% (89.8-92.7%) 84.9% (83.8-85.8%) 0.689
Ethambutol tbprofiler 118(1538) 765(4936) 92.3% (90.9-93.6%) 84.5% (83.5-85.5%) 0.691
Ethionamide drprg 329(1104) 395(6105) 70.2% (67.4-72.8%) 93.5% (92.9-94.1%) 0.622
Ethionamide mykrobe 265(1104) 413(6105) 76.0% (73.4-78.4%) 93.2% (92.6-93.8%) 0.658
Ethionamide tbprofiler 272(1104) 414(6105) 75.4% (72.7-77.8%) 93.2% (92.6-93.8%) 0.653
Isoniazid drprg 317(3900) 173(4194) 91.9% (91.0-92.7%) 95.9% (95.2-96.4%) 0.879
Isoniazid mykrobe 333(3900) 170(4194) 91.5% (90.5-92.3%) 95.9% (95.3-96.5%) 0.876
Isoniazid tbprofiler 297(3900) 181(4194) 92.4% (91.5-93.2%) 95.7% (95.0-96.3%) 0.882
Kanamycin drprg 128(670) 107(6975) 80.9% (77.7-83.7%) 98.5% (98.1-98.7%) 0.805
Kanamycin mykrobe 152(670) 98(6975) 77.3% (74.0-80.3%) 98.6% (98.3-98.8%) 0.789
Kanamycin tbprofiler 122(670) 107(6975) 81.8% (78.7-84.5%) 98.5% (98.1-98.7%) 0.811
Levofloxacin drprg 81(1040) 103(5454) 92.2% (90.4-93.7%) 98.1% (97.7-98.4%) 0.896
Levofloxacin mykrobe 88(1040) 102(5454) 91.5% (89.7-93.1%) 98.1% (97.7-98.5%) 0.892
Levofloxacin tbprofiler 85(1040) 109(5454) 91.8% (90.0-93.3%) 98.0% (97.6-98.3%) 0.89
Linezolid drprg 49(65) 4(6110) 24.6% (15.8-36.3%) 99.9% (99.8-100.0%) 0.441
Linezolid mykrobe 48(65) 4(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.457
Linezolid tbprofiler 48(65) 5(6110) 26.2% (17.0-38.0%) 99.9% (99.8-100.0%) 0.447
Moxifloxacin drprg 41(603) 480(5431) 93.2% (90.9-94.9%) 91.2% (90.4-91.9%) 0.669
Moxifloxacin mykrobe 44(603) 473(5431) 92.7% (90.3-94.5%) 91.3% (90.5-92.0%) 0.669
Moxifloxacin tbprofiler 42(603) 482(5431) 93.0% (90.7-94.8%) 91.1% (90.3-91.9%) 0.668
Ofloxacin drprg 24(105) 5(424) 77.1% (68.2-84.1%) 98.8% (97.3-99.5%) 0.821
Ofloxacin mykrobe 26(105) 5(424) 75.2% (66.2-82.5%) 98.8% (97.3-99.5%) 0.808
Ofloxacin tbprofiler 26(105) 6(424) 75.2% (66.2-82.5%) 98.6% (96.9-99.3%) 0.802
Pyrazinamide drprg 72(341) 51(822) 78.9% (74.2-82.9%) 93.8% (91.9-95.2%) 0.741
Pyrazinamide mykrobe 55(341) 56(822) 83.9% (79.6-87.4%) 93.2% (91.3-94.7%) 0.77
Pyrazinamide tbprofiler 45(341) 62(822) 86.8% (82.8-90.0%) 92.5% (90.4-94.1%) 0.782
Rifampicin drprg 138(3222) 167(4586) 95.7% (95.0-96.4%) 96.4% (95.8-96.9%) 0.92
Rifampicin mykrobe 164(3222) 169(4586) 94.9% (94.1-95.6%) 96.3% (95.7-96.8%) 0.912
Rifampicin tbprofiler 102(3222) 177(4586) 96.8% (96.2-97.4%) 96.1% (95.5-96.7%) 0.927
Streptomycin drprg 262(1042) 134(1205) 74.9% (72.1-77.4%) 88.9% (87.0-90.5%) 0.647
Streptomycin mykrobe 282(1042) 135(1205) 72.9% (70.2-75.5%) 88.8% (86.9-90.5%) 0.629
Streptomycin tbprofiler 257(1042) 136(1205) 75.3% (72.6-77.9%) 88.7% (86.8-90.4%) 0.649

illumina

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Thus is much better isn't it!
Questions

  • just checking, is Mykrobe pinned to haploid here, or can it call minors?
  • tbprofiler still manages to call 36 rif TPs and 20 isoniazid TPs and 27 pyrazinamids TPs that we don't (although drprg beats Mykrobe which is cool!) . So, double checking: this is the same catalogue right? However tbprofiler takes vcf. Maybe some small issue in probe generation, especially where we have dense variation? But racon ought to plug those gaps. Weird. Are they all in the same samples?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

just checking, is Mykrobe pinned to haploid here, or can it call minors?

it can call minor alleles

So, double checking: this is the same catalogue right? However tbprofiler takes vcf. Maybe some small issue in probe generation, especially where we have dense variation? But racon ought to plug those gaps. Weird. Are they all in the same samples?

100% the same catalogue for all tools. I'm going to dig into the discrepancies this week

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

It seems there are a decent number of cases where we don't call a minor allele because we fail the GAPS filer (min 0.3). However, in many of these cases, even the ref fails the GAPS filter. A lot of examples I am seeing have a small difference in GAPS between the ref and alt. (rpoB S450L has a lot of minor allele calls where both the ref and alt have GAPS 0.333). I'll try increasing the GAPS filter to 0.5, but also adding in a GAPS diff filter that sets an allowed max. difference between the ref and alt GAPS value.

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

The GAPS filter threshold is set in the expectation were looking at a major/haploid allele. I'd ignore it for minor allele detection, or as you're doing, change the threshold

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Although ref failing gaps also is interesting

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

I added in a GAPS diff parameter. So we have a max GAPS value and a max GAPS diff, which is the difference between the major and minor allele's GAPS value (this can actually be negative - I found a case of the minor allele GAPS being lower than the called allele!).

I ran with max GAPS 0.5 and max diff 0.2 and I get the following diff

Tool Drug ΔFN ΔFP
drprg Amikacin 0 0
drprg Capreomycin 0 0
drprg Delamanid 0 0
drprg Ethambutol -101 4135
drprg Ethionamide -1 3
drprg Isoniazid 0 0
drprg Kanamycin 0 0
drprg Levofloxacin -2 2
drprg Linezolid 0 1
drprg Moxifloxacin -1 0
drprg Ofloxacin 0 0
drprg Pyrazinamide -1 1
drprg Rifampicin -21 2
drprg Streptomycin 1 1

This is nearly great. We save a lot of RIF FNs. BUT, we end up calling everything EMB resistant by the looks of it. I'll dig into this today and see if I can get to the bottom of that

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

All of the EMB calls are due to the same position

embA    85      4b6d566c        CCTACC  CCTACA,CCTATC,TCTACC    .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;PDP=0.423729,0.169492,0.0847458,0.322034;OGT=0;VARID=embA_C-12T,embA_C-16G,embA_C-16T;PREDICT=.,.,r       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      3:18,8,4,14:7,2,1,5:25,0,0,12:7,0,0,3:90,25,25,59:37,7,7,22:0.4,0.666667,0.833333,0.5:-207.412,-316.464,-362.32,-249.056:41.6434

The minor allele has GAPS 0.5 and the ref has 0.4. This position seems to always have depth across the various alleles. My assumption is there is a shared minimizer between these alleles. For instance, the ref is CCTACC and the alt that always gets called is TCTACC. The last 5 bases of these two alleles are the same, so it is conceiveable that they share a minimizer that covers some/all of those 5 bases and not the first, given the small difference in GAPS.

I'll have a think about the best way to address this without losing the RIF TPs we gained.

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

If I break this variant into two sites (there is 3bp that match in between the first base and last two bases) this GAPS issue goes away.

So I see two options here

  1. Reduce the minimum match length to 3 when building the index
  2. Add in another GAPS filter 'max called gaps' which sets a maximum value for the called allele's GAPS value. If it is above this value, we don't consider minor alleles for that site. This would get rid of all of the bad EMB calls

I'm kind of tempted to try both and see what happens separately for each...

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

Reducing the lin. match length to 3 gives the following diff (replaces the last diff)

Tool Drug ΔFN ΔFP
drprg Amikacin 0 0
drprg Capreomycin 0 0
drprg Delamanid 0 0
drprg Ethambutol -8 24
drprg Ethionamide 3 2
drprg Isoniazid 12 -1
drprg Kanamycin 0 0
drprg Levofloxacin -1 2
drprg Linezolid 0 1
drprg Moxifloxacin -1 0
drprg Ofloxacin 0 0
drprg Pyrazinamide 0 3
drprg Rifampicin -19 2
drprg Streptomycin 1 2

Not super happy with this one. Currently trying option 2 above

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Bit weird, just this one position always having coverage on both alleles. Your idea of a shared minimizer would effectively mean those 2 alleles collapse to one path in the kmer graph instead of a bubble, or potentially the bubble becomes a bit shorter I suppose. Is there always the same coverage on both?

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

No, this is only an issue when we pay attention to minor alleles? So we always get a tiny bit of coverage on one allele? So this can't be a shared minimizer can it?

from drprg.

iqbal-lab avatar iqbal-lab commented on July 20, 2024

Maybe this is a mad idea, but what happens if you change your hash function?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

No, this is only an issue when we pay attention to minor alleles? So we always get a tiny bit of coverage on one allele? So this can't be a shared minimizer can it?

All alleles get some coverage, but not the same amount on each. You're right, it can't be shared minimizers...

When I split the allele into two, I still got some coverage on the minor allele, but the GAPS increased on the minor and decreased on the called allele.

Maybe this is a mad idea, but what happens if you change your hash function?

As in the pandora hash function? I think this is what we currently use https://github.com/rmcolq/pandora/blob/945c0255e99ab32fd5b056dc1c16ffdf611ef478/thirdparty/src/inthash.cpp#L106-L117. Do you think there might be a hash collision?

from drprg.

mbhall88 avatar mbhall88 commented on July 20, 2024

This is the diff for using the max called GAPS filter (option 2 from above)

Tool Drug ΔFN ΔFP
drprg Amikacin 0 0
drprg Capreomycin 0 0
drprg Delamanid 0 0
drprg Ethambutol 0 0
drprg Ethionamide -1 3
drprg Isoniazid 0 0
drprg Kanamycin 0 0
drprg Levofloxacin -2 2
drprg Linezolid 0 0
drprg Moxifloxacin -1 0
drprg Ofloxacin 0 0
drprg Pyrazinamide -1 1
drprg Rifampicin -19 2
drprg Streptomycin 1 1

This looks pretty good.

from drprg.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.