Hi!
I also encountered the same problem as you :)
Have you tried replacing `vg call --threads 64 --ploidy 2 --vcf calls.vcf --pack alignments.pack graph.gbz > calls-genotyped.vcf` with `vg call --threads 64 --ploidy 2 --vcf calls.vcf --pack alignments.pack graph.vg > calls-genotyped.vcf`?
As the warning messages say, maybe `graph.gbz` and `graph.vg` both represent the graph, but the node ID namespace or the node sizes differ between them (it's just my dumb guess).
Hi, thanks for raising this issue. The problem is that `vg call --vcf` only works on the xg file (as created with `vg construct --alt-paths`, then `vg index --xg-alts --xg-name`). Unlike `vg call` without `--vcf`, it will not work on the GBZ.
If your graph is coming from `construct`, I think the node ID space should be compatible between the xg and the GBWT. You can verify this by running `vg gbwt -Z graph.gbz --translation graph.trans` and making sure the two columns of that file are the same.
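A minimal Python sketch of that column check, assuming each line of the translation file is tab-separated, optionally starts with a `T` record marker, and then holds the segment name followed by the node ID(s). That layout is an assumption, not something stated here, so adjust the parsing if your vg version writes the file differently. `find_mismatches` is a hypothetical helper name.

```python
"""Check whether a translation file maps every segment to an identical node ID.

Assumption: each non-empty line is tab-separated, optionally beginning with a
"T" record marker, followed by the segment name and the node ID(s). This
layout is a guess; adapt the parsing to what your vg version actually writes.
"""
from typing import Iterable, List, Tuple


def find_mismatches(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (segment, node_ids) pairs whose two columns differ."""
    mismatches = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0] == "T":
            fields = fields[1:]  # drop the assumed record-type marker
        if len(fields) < 2:
            raise ValueError(f"unexpected line: {line!r}")
        segment, node_ids = fields[0], fields[1]
        # Identical strings mean the segment kept its original node ID.
        if segment != node_ids:
            mismatches.append((segment, node_ids))
    return mismatches
```

Usage: `with open("graph.trans") as fh: print(find_mismatches(fh))`. An empty list suggests the xg and GBZ share a node ID space; any entry (for example a segment split into several comma-separated nodes) means they do not.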
If that's the case, you can run `vg pack` and `vg call` on the xg instead of the GBZ and it should work. This isn't ideal, but I think it is the only way to proceed barring a substantial rewrite of `vg call --vcf`. Please let me know whether or not it works.
Thanks a lot, it works now!
Since we are here... I'm slightly confused about the difference between `vg call` and `vg call --vcf` when the graph contains only SVs. From what I understood, `vg call --vcf` is identical to `vg call`, except that it also returns calls with genotype 0/0, correct? And I assume `vg call` is not also trying to call new SVs that are supported by discordant read pairs but are not in the graph, correct?
Finally, an unrelated, minor issue :) As I mentioned, `vg call --vcf` works now, but it skips 77 calls because there are too many traversals. See this example output:
[VCFTraversalFinder] Warning: Site {"directed_acyclic_net_graph": true, "end": {"node_id": "1326945"}, "start": {"node_id": "1326155"}, "start_end_reachable": true, "type": 1} with 77 variants contains too many traversals (>50000) to enumerate so it will be skipped:
10 42358412 Sniffles2.DEL.2FDS9 CATTCGTGTTTATTCCATTCCATTCCATTCCATTCCATTCCACTCGGGTTCATTGCATTCAGTTCCGTTCCATTCCATTC...TTCCATTCCTTTCCATTCCATTCCATGCCAGTCATGTTGATTCCATTCCATTCCTA C 60 PASS AF=0.077;COVERAGE=19,16,9,14,15;END=42371507;STDEV_LEN=0;STDEV_POS=0;STRAND=-;SUPPORT=1;SVLEN=-13095;SVTYPE=DEL;PRECISE
10 42371080 Sniffles2.INS.DCS9 A ATTGCATTCTATTCCATTCTAATCGGGTTGATTTCATTCCATTCCATTCCATTCTAGTCCATTCCATTCCATTCCGTTCCATTAAATTCCATTCCGTTCCATTCCCCTGGTGATTATTCCAGTCCGTTCCATTCCATTGCATTCCCTTCCACTGGTGTTTTTGGAATCGTGTTGATTCCAATCCATTCAATTACAGTCCAGTCTTTTCCATTCCATTACATTCCACTCGGTTTGTTTCCATTACATTGAATTCCATTGTATTCCATTCCATACCATTGCATTCCATTGCATTCCCATCTTTCCAGTTGATTCCATTTCATTCCATTGCATTCTATTCCATTCAAATCGGGTTTTGGTGCTTATTCCAGTCCGCTCCATTCCATTGCATTCCACATTCCACTCCGTTTGTTTCCATTACATTGAATTACATTGCATTCCATTAAAATA 42 PASS AF=0.143;COVERAGE=16,15,14,15,14;END=42371080;STDEV_LEN=107.48;STDEV_POS=156.271;STRAND=+;SUPPORT=2;SUPPORT_LONG=0;SVLEN=599;SVTYPE=INS;IMPRECISE
10 42370809 Sniffles2.INS.DBS9 C CATTCCATTCTATTCCAATGCATTCCATTAGGGTTGAATTCATTGTCCATTCCTCTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTTTCCATTCTTCCATTCCATTCCATTCCATTCCACTCGTGTTCATTT 60 PASS AF=0.067;COVERAGE=16,15,15,15,15;END=42370809;STDEV_LEN=0;STDEV_POS=0;STRAND=+;SUPPORT=1;SUPPORT_LONG=0;SVLEN=139;SVTYPE=INS;PRECISE
...
I thought that `vg giraffe` has a heuristic for dealing with complex regions in the alignment stage (choosing a greedy path cover and prioritizing those paths). So why is this region being skipped in the genotyping stage? Isn't genotyping just looking for alignment coverage on nodes and edges?
Thanks a lot for your help and time!
`vg call --vcf` outputs variants strictly in terms of the VCF used as input to `vg construct`. To do so, it exhaustively enumerates possible allele combinations in large sites and gives up if it gets overwhelmed (hence the warning). It has nothing to do with giraffe.
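To see why a site with 77 variants trips the limit, here is a back-of-the-envelope Python sketch. It assumes, as a simplification, that every combination of alleles across the site's variants is a candidate traversal; vg's `VCFTraversalFinder` may prune some combinations, so this is an upper bound rather than vg's exact count. The 50000 limit is taken from the warning message; the function names are hypothetical.

```python
"""Naive upper bound on the traversals of a site, compared to the warning limit.

Simplifying assumption: every allele combination across the site's variants
is a candidate traversal. vg may prune some, so this is only an upper bound.
"""
from math import prod
from typing import Sequence

# Limit quoted in the warning: "too many traversals (>50000)".
TRAVERSAL_LIMIT = 50_000


def naive_traversal_bound(allele_counts: Sequence[int]) -> int:
    """Product of allele counts (REF + ALTs) over all variants in the site."""
    return prod(allele_counts)


def would_skip(allele_counts: Sequence[int], limit: int = TRAVERSAL_LIMIT) -> bool:
    """True if the naive bound exceeds the limit."""
    return naive_traversal_bound(allele_counts) > limit


# 77 biallelic SVs, as in the warning above.
site = [2] * 77
print(naive_traversal_bound(site))  # 2**77, roughly 1.5e23
print(would_skip(site))             # True
```

Under this bound, even 16 biallelic variants (2**16 = 65536) already exceed the limit, while 15 (32768) do not, so large SV-dense sites hit it quickly.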
The newer way to go about this would be to use `vg call <graph.gbz> -z`. This limits the possible alleles, even for large sites, to haplotypes from the phasing information in your graph. The drawback is that the resulting VCF may look a little different from your input VCF, as it may have changed slightly while round-tripping into and out of the graph.
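A toy Python contrast of the two strategies, as a conceptual illustration only and not vg's implementation: a haplotype is modeled as a tuple of allele indices (0 = REF, 1 = ALT), one per variant in the site, and the haplotype data is made up. Restricting candidates to observed haplotypes bounds them by the number of distinct haplotypes instead of letting them grow exponentially with the number of variants.

```python
"""Toy contrast: all allele combinations vs. combinations seen in haplotypes.

Conceptual illustration of the idea behind haplotype-restricted calling, not
vg's code. A haplotype is a tuple of allele indices, one per variant
(0 = REF, 1 = ALT). The example haplotypes below are invented.
"""
from itertools import product
from typing import List, Sequence, Set, Tuple

Haplotype = Tuple[int, ...]


def all_combinations(allele_counts: Sequence[int]) -> List[Haplotype]:
    """Every allele combination across the site (grows exponentially)."""
    return list(product(*(range(n) for n in allele_counts)))


def haplotype_candidates(haplotypes: Sequence[Haplotype]) -> Set[Haplotype]:
    """Distinct allele combinations actually present in the haplotypes."""
    return set(haplotypes)


# Hypothetical site with 10 biallelic variants and 4 phased haplotypes.
counts = [2] * 10
haps = [
    (0,) * 10,
    (1, 0, 0, 1, 0, 0, 0, 0, 1, 0),
    (1, 0, 0, 1, 0, 0, 0, 0, 1, 0),  # duplicate haplotype collapses
    (0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
]
print(len(all_combinations(counts)))    # 1024
print(len(haplotype_candidates(haps)))  # 3
```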
It would be nice to have a combination of the two (use the haplotypes from the GBZ but cast the output exactly in terms of the input VCF), but that's not implemented and probably won't be anytime soon...
Thanks a lot, Glenn. I think you can close the issue now if you want to.