
Comments (5)

wjwei-handsome commented on June 24, 2024

Hi!

I also encountered the same problem as you :)

Have you tried replacing

    vg call --threads 64 --ploidy 2 --vcf calls.vcf --pack alignments.pack graph.gbz > calls-genotyped.vcf

with

    vg call --threads 64 --ploidy 2 --vcf calls.vcf --pack alignments.pack graph.vg > calls-genotyped.vcf

As the warning messages suggest, graph.gbz and graph.vg may both represent the same graph while differing in the size of their node ID space or in their number of nodes (this is just my guess).

from vg.

glennhickey commented on June 24, 2024

Hi, thanks for raising this issue. The problem is that vg call --vcf only works on the xg file (as created by vg construct --alt-paths followed by vg index --xg-alts --xg-name). Unlike vg call without --vcf, it will not work on the gbz.

If your graph is coming from construct, I think the node id space should be compatible between the xg and gbwt. You can verify this by running vg gbwt -Z graph.gbz --translation graph.trans and making sure the two columns of that file are the same.
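The column comparison could be scripted roughly as follows. This is only a sketch: the helper name count_translation_mismatches is made up for illustration, and it assumes graph.trans is tab-separated with the segment name and the node id as the last two fields of each line (any leading record tag is ignored). Check a few lines of your own file before relying on it.

```shell
# Hypothetical helper, not part of vg.
# Assumption: each line of the translation file is tab-separated and its last
# two fields are the segment name and the node id(s) it maps to.
# The file itself would come from the command quoted above:
#   vg gbwt -Z graph.gbz --translation graph.trans
count_translation_mismatches() {
    # Appending "" forces a string comparison, so "007" and "7" count as different.
    awk -F'\t' 'NF >= 2 && ($(NF-1) "") != ($NF "") { n++ } END { print n + 0 }' "$1"
}
```

Running count_translation_mismatches graph.trans prints the number of lines whose two columns differ; 0 means no differing line was found under the assumed layout.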

If that's the case, you can run vg pack and vg call on the xg instead of the GBZ and it should work. This isn't ideal, but I think it is the only way to proceed barring a substantial rewrite of vg call --vcf. Please let me know whether or not it works.

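Put together, the xg route might look like the sketch below. The function name genotype_on_xg and all file names (graph.xg, alignments.pack, calls-genotyped.vcf) are placeholders, and the -x/-g/-o options of vg pack are not quoted anywhere in this thread, so compare them against vg pack --help for your version. It also assumes the input graph was built with vg construct --alt-paths and that the alignments use the same node ids as the xg.

```shell
# Sketch only: placeholder names, flags to be checked against your vg version.
genotype_on_xg() {
    graph_vg=$1      # graph from vg construct --alt-paths
    gam=$2           # alignments in the same node id space
    vcf=$3           # the VCF that was given to vg construct
    threads=${4:-4}

    # Build an xg index that keeps the alt paths.
    vg index --xg-alts --xg-name graph.xg "$graph_vg" &&
    # Compute read support against the xg rather than the GBZ.
    vg pack --threads "$threads" -x graph.xg -g "$gam" -o alignments.pack &&
    # Genotype the VCF records on the xg.
    vg call --threads "$threads" --ploidy 2 --vcf "$vcf" \
        --pack alignments.pack graph.xg > calls-genotyped.vcf
}
```

For example, genotype_on_xg graph.vg alignments.gam calls.vcf 64 would mirror the thread count used earlier in this thread.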

fabio-cunial commented on June 24, 2024

Thanks a lot, it works now!

Since we are here... I'm slightly confused about the difference between vg call and vg call --vcf when the graph contains only SVs. From what I understood, vg call --vcf is identical to vg call, except that it also returns calls with genotype 0/0, correct? I don't think that vg call is also trying to call new SVs that are supported by discordant read pairs but that are not in the graph, correct?

Finally, an unrelated, minor issue :) As I mentioned, vg call --vcf works now, but it skips 77 calls because there are too many traversals: see this example output.

[VCFTraversalFinder] Warning: Site {"directed_acyclic_net_graph": true, "end": {"node_id": "1326945"}, "start": {"node_id": "1326155"}, "start_end_reachable": true, "type": 1} with 77 variants contains too many traversals (>50000) to enumerate so it will be skipped:
10 42358412 Sniffles2.DEL.2FDS9 CATTCGTGTTTATTCCATTCCATTCCATTCCATTCCATTCCACTCGGGTTCATTGCATTCAGTTCCGTTCCATTCCATTC...TTCCATTCCTTTCCATTCCATTCCATGCCAGTCATGTTGATTCCATTCCATTCCTA C 60 PASS AF=0.077;COVERAGE=19,16,9,14,15;END=42371507;STDEV_LEN=0;STDEV_POS=0;STRAND=-;SUPPORT=1;SVLEN=-13095;SVTYPE=DEL;PRECISE
10 42371080 Sniffles2.INS.DCS9 A ATTGCATTCTATTCCATTCTAATCGGGTTGATTTCATTCCATTCCATTCCATTCTAGTCCATTCCATTCCATTCCGTTCCATTAAATTCCATTCCGTTCCATTCCCCTGGTGATTATTCCAGTCCGTTCCATTCCATTGCATTCCCTTCCACTGGTGTTTTTGGAATCGTGTTGATTCCAATCCATTCAATTACAGTCCAGTCTTTTCCATTCCATTACATTCCACTCGGTTTGTTTCCATTACATTGAATTCCATTGTATTCCATTCCATACCATTGCATTCCATTGCATTCCCATCTTTCCAGTTGATTCCATTTCATTCCATTGCATTCTATTCCATTCAAATCGGGTTTTGGTGCTTATTCCAGTCCGCTCCATTCCATTGCATTCCACATTCCACTCCGTTTGTTTCCATTACATTGAATTACATTGCATTCCATTAAAATA 42 PASS AF=0.143;COVERAGE=16,15,14,15,14;END=42371080;STDEV_LEN=107.48;STDEV_POS=156.271;STRAND=+;SUPPORT=2;SUPPORT_LONG=0;SVLEN=599;SVTYPE=INS;IMPRECISE
10 42370809 Sniffles2.INS.DBS9 C CATTCCATTCTATTCCAATGCATTCCATTAGGGTTGAATTCATTGTCCATTCCTCTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTTTCCATTCTTCCATTCCATTCCATTCCATTCCACTCGTGTTCATTT 60 PASS AF=0.067;COVERAGE=16,15,15,15,15;END=42370809;STDEV_LEN=0;STDEV_POS=0;STRAND=+;SUPPORT=1;SUPPORT_LONG=0;SVLEN=139;SVTYPE=INS;PRECISE
...

I thought that vg giraffe has a heuristic for dealing with complex regions in the alignment stage (choosing a greedy path cover and prioritizing those paths). So why is this region being skipped in the genotyping stage? Isn't genotyping just looking for alignment coverage on nodes and edges?

Thanks a lot for your help and time!

glennhickey commented on June 24, 2024

vg call --vcf outputs variants strictly in terms of the VCF used as input to vg construct. In doing so, it does some exhaustive enumeration of possible allele combinations in large sites and will give up if it gets overwhelmed (hence the warning). It has nothing to do with giraffe.
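To get a feel for why enumeration blows up, here is a back-of-the-envelope sketch. It is not how vg counts traversals: it simply assumes n independent biallelic variants, each either present or absent, which gives 2^n allele combinations as a rough upper bound. The function name is invented for illustration.

```shell
# Illustrative arithmetic only, not vg's actual traversal counting.
# Assumption: n independent biallelic variants give 2^n allele combinations.
# Prints the smallest n for which 2^n exceeds the given limit.
min_variants_over_limit() {
    limit=$1
    n=0
    combos=1
    while [ "$combos" -le "$limit" ]; do
        combos=$((combos * 2))
        n=$((n + 1))
    done
    echo "$n"
}
```

Under that simplification, min_variants_over_limit 50000 prints 16, so a site with 77 variants is far beyond what can be listed exhaustively.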

The newer way to go about this would be to use vg call <graph.gbz> -z. This will limit the possible alleles, even for large sites, to haplotypes from the phasing information of your graph. The drawback is that the resulting VCF may look a little different than your input VCF as it may have changed slightly while roundtripping into and out of the graph.
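As a sketch, the haplotype-restricted call could be wrapped like this. The function name and the output file name are placeholders; the remaining flags are the ones already used in this thread, and the pack file is assumed to have been computed against the same GBZ.

```shell
# Sketch only: placeholder names; -z is the option mentioned above.
call_with_gbz_haplotypes() {
    gbz=$1           # GBZ graph with haplotype (phasing) information
    pack=$2          # read support computed against that same GBZ
    threads=${3:-4}

    vg call "$gbz" -z --threads "$threads" --ploidy 2 \
        --pack "$pack" > calls-haplotypes.vcf
}
```

For example: call_with_gbz_haplotypes graph.gbz alignments.pack 64. Note that no --vcf is passed here, which is why the records may not match the input VCF exactly.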

It would be nice to have a combination of the two: use the haplotypes from the gbz but cast output exactly in terms of the input VCF, but that's not implemented and probably won't be anytime soon...

fabio-cunial commented on June 24, 2024

Thanks a lot Glenn. I think you can close the issue now if you want to.
