
Comments (7)

glennhickey commented on September 26, 2024

You have to run vg pack on the same graph you mapped to with vg giraffe. In this case, that would appear to be all.giraffe.gbz.

from vg.

wjwei-handsome commented on September 26, 2024

You have to run vg pack on the same graph you mapped to with vg giraffe. In this case, that would appear to be all.giraffe.gbz.

But I want to use vg call --vcf to do genotyping. As mentioned here, vg call --vcf only works on the xg file, not the gbz file.

My confusion is: shouldn't vg call and vg pack use the same graph?

Thanks a lot for your help and time!


glennhickey commented on September 26, 2024

If you really want the xg, you can make it with vg convert -Hx all.giraffe.gbz > all.giraffe.xg and that will work with vg call / pack.

The important thing is to make sure that the xg and gbz are identical (which doesn't necessarily happen if they are created independently).


wjwei-handsome commented on September 26, 2024

If you really want the xg, you can make it with vg convert -Hx all.giraffe.gbz > all.giraffe.xg and that will work with vg call / pack.

The important thing is to make sure that the xg and gbz are identical (which doesn't necessarily happen if they are created independently).

Hi, I tried converting all.giraffe.gbz to all.giraffe.xg using vg convert -Hx all.giraffe.gbz > all.giraffe.xg, and then I executed:

vg pack --xg all.giraffe.xg --packs-out sample.pack --gam sample.gam

vg call all.giraffe.xg --pack sample.pack --vcf ref.vcf --snarls all.snarls > sample.geno.vcf

But an empty VCF was returned, with this log:

[VCFTraversalFinder] Warning: No alt path (prefix=_alt_80bfe3aa8146b33a31c5b8045e279c732771b806_) found in graph for variant
....

What I can guarantee is that the vcf used in the call and the vcf used when generating gbz are the same.

The snarls file, which I pre-computed, was generated from all.giraffe.xg using: vg snarls all.giraffe.xg > all.snarls.

So I am confused about why the variants cannot be found in the graph. How can I solve this problem?

Thanks for your help!


glennhickey commented on September 26, 2024

Hmm, this issue seems to be coming up a lot for some reason. I think we need to improve the documentation. If you want to use -v with vg call, you cannot use vg autoindex. You must follow this particular recipe exactly:

#3974 (comment)
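
The warning quoted earlier in the thread says that no path with an _alt_ prefix was found for a variant. One rough way to check whether a graph carries any such paths is to list its path names and count those that start with _alt_. The sketch below is a hypothetical helper (count_alt_paths is not part of vg). It only filters names read from stdin, and it assumes that vg paths -L prints alt paths in your vg version.

```shell
# Hypothetical helper, not part of vg: count path names that start with "_alt_".
# Reads one path name per line on stdin.
count_alt_paths() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps
  # that case from aborting scripts that run under "set -e".
  grep -c '^_alt_' || true
}

# Intended use (assumption: "vg paths -L" lists alt paths in your version):
#   vg paths -x all.giraffe.xg -L | count_alt_paths
```

A count of 0 would be consistent with the warning above, since vg call -v would then have no alt paths to match the VCF records against.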


wjwei-handsome commented on September 26, 2024

First of all, thank you very much for your great help, which has eased my anxiety :)

Well, combining the above information, I have two roads to choose from:

  1. Follow the recipe in #3974 (comment) to regenerate the gbz/dist/min index files and rerun giraffe for hundreds of samples. The mapping has already taken me 2 months, so I don't think this is a wise choice.

  2. Use the gbz graph to run vg call with the -a parameter for multiple samples. For this method, my confusion is: how different are the coordinates of such a result from the result of vg call -v ref.vcf? And how do I reconcile the difference? If there is not much difference between the two, is it more recommended to use the gbz, which contains known haplotype information, instead of the vcf?

Do you have any suggestions?

Some proposals about wiki/docs/Q&A

As for upgrading the documentation and wiki, I think it is very necessary.
Giraffe performs very well, so I used the autoindex method that the wiki highly recommends for it. But for the subsequent pack and call steps, the gbz/dist/min index files alone are not enough (if vg call -v ref.vcf is used). These documents confuse users, and some pages have not been updated since 2017. Most importantly, vg is a fast-moving project, so keeping the documents up to date is crucial. For popular bioinformatics analyses, such as population-scale NGS genotyping, I hope there will be some best-practice tutorials, which would greatly help users.

While using vg, I have also opened some issues to help improve it, such as errors in building the distance index (#3884) and a distance index file permission problem (#3865). In this process, I am glad to see that the efficient vg team is constantly building better software, which has given me great help and support.
I have noticed that many similar issues have been opened, which can confuse users. Would it be possible to provide a dedicated online community channel (like Discord) that allows users to get help from the vg team and other users?

Best wishes.


glennhickey commented on September 26, 2024

For your question: I think you are best off just calling with vg call -az. This will use the haplotypes in the GBZ. Your VCF records won't be identical to the input VCF, but they will be equivalent, and it will probably do a better job at complex sites.
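
As a rough sketch of what that could look like across many samples: the helper below only prints the vg pack and vg call command lines for one sample, so they can be reviewed before being piped to sh. The function name genotype_cmds, the sample names, and the per-sample file naming (SAMPLE.gam, SAMPLE.pack, SAMPLE.vcf) are assumptions for illustration. The -s flag for setting the sample name is also an assumption here, so please confirm it with vg call --help for your version.

```shell
# Hypothetical helper: print the pack + call commands for one sample.
# Arguments: $1 = GBZ graph used for giraffe mapping, $2 = sample name.
# Nothing is executed here; pipe the output to "sh" to actually run it.
genotype_cmds() {
  gbz=$1
  sample=$2
  # Pack read support on the same graph the reads were mapped to.
  printf 'vg pack --xg %s --gam %s.gam --packs-out %s.pack\n' \
    "$gbz" "$sample" "$sample"
  # Genotype every snarl (-a) using the haplotypes in the GBZ (-z).
  printf 'vg call %s -a -z --pack %s.pack -s %s > %s.vcf\n' \
    "$gbz" "$sample" "$sample" "$sample"
}

# Example: show the commands for two made-up sample names.
for s in sampleA sampleB; do
  genotype_cmds all.giraffe.gbz "$s"
done
```

Because this reuses all.giraffe.gbz directly, the existing giraffe mappings would not need to be redone.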

Because the vg tools are under development and changing all the time, it has been challenging to maintain the documentation. But I am in complete agreement with you that the docs need some work now. In the meantime, we try to support people the best we can here on GitHub.

