
Comments (4)

jeizenga commented on September 26, 2024

It would be easier for me to diagnose what's happening here if you included the actual commands, but it seems like you're using vg autoindex for a purpose that it's not really designed for. I highly recommend not trying to reach into the inside of one of vg autoindex's indexing workflows. The individual steps can be unintuitive, and we don't try to maintain any consistent interface or guarantees for the intermediate stages. The --keep-intermediate option is only intended as a debugging tool.

I think what might be happening here is that vg autoindex does not accept repeated --gfa arguments. It's probably only retaining the last GFA, in which case you would expect mapping to be very slow since most reads will not have a good mapping target.
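For comparison, here is a rough sketch of running vg autoindex end to end rather than picking apart its intermediate files. It assumes the inputs are a reference FASTA plus one VCF per chromosome; the file names and the output prefix are hypothetical placeholders, and it relies on -v being repeatable, which is worth confirming against vg autoindex --help for your version. The script only prints the command unless vg and the reference file are actually present.

```shell
#!/bin/sh
# Hypothetical input names; replace with your own files.
REF=reference.fa
PREFIX=1KGP_graph

# Collect one -v argument per chromosome VCF (assumed naming scheme).
VCF_ARGS=""
for chr in $(seq 1 22) X Y; do
    VCF_ARGS="$VCF_ARGS -v 1KGP_chr${chr}.vcf.gz"
done

# Let autoindex build everything Giraffe needs in a single workflow.
CMD="vg autoindex --workflow giraffe -r $REF $VCF_ARGS -p $PREFIX"
echo "$CMD"

# Run only when vg is installed and the reference exists; otherwise this is a dry run.
if command -v vg >/dev/null 2>&1 && [ -f "$REF" ]; then
    $CMD
fi
```

The point of going through a single autoindex call is that the indexes it writes are guaranteed to be consistent with each other, which is not true of files pulled out with --keep-intermediate.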


venkatk89 commented on September 26, 2024

Thank you for your quick response.

I was wondering if there is a way to manually create the indices required for the Giraffe mapper from scratch. By collating the different indexing commands present in the wiki, I came up with the following pipeline for creating the 1000 Genomes Project graph:

1. Create VG files with alt paths

(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
"vg construct --alt-paths \
--region-is-chrom --region chr{} \
--reference $hg38_reference_genome \
--vcf 1KGP_chr{}.vcf.gz \
--threads 1 --flat-alts --handle-sv > 1KGP_graph_chr{}.vg"

2. Coordinate Node IDs

vg ids -j -m mapping $(for i in $(seq 1 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)

3. Create GFAs

(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
"time \
vg view -v 1KGP_graph_chr{}.vg \
-g > 1KGP_graph_chr{}.gfa"

4. Get XG index with Alt paths

vg index -L -x 1KGP_graph.xg \
$(for i in $(seq 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)

5. GBWT with greedy path cover

vg gbwt -n 16 -g 1KGP_graph.gg -o 1KGP_graph.gbwt -x 1KGP_graph.xg -P 

6. Build Minimizer Index

vg minimizer -o 1KGP_graph.min -g 1KGP_graph.gbwt 1KGP_graph.xg

7. Get Distance Index

vg snarls --include-trivial 1KGP_graph.xg > 1KGP_graph.trivial.snarls

vg index -j 1KGP_graph.dist -x 1KGP_graph.xg -s 1KGP_graph.trivial.snarls

When I run vg giraffe using the .gbwt, .dist and .min files created above, a .gbz file is created for the first time. Later on, the .gbz file is auto-recognized by giraffe.

But the issue is, when using these indices, the runtime of the giraffe mapper is extremely high. Is the pipeline missing something?


jeizenga commented on September 26, 2024

Nothing jumps out immediately to me, but maybe @xchang1 could comment.


xchang1 commented on September 26, 2024

Try rebuilding the minimizer index with -d to include the distance index. The minimizer index can store some information from the distance index to make giraffe faster.

If that doesn't work, can you try updating your version of vg? It would be easier to debug on the current version.
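As a sketch, the rebuilt minimizer step could look like the following, reusing the file names from the pipeline above. It assumes the distance index (1KGP_graph.dist) has already been built, so this step would have to move after the distance index step, and that -d takes the distance index path in your vg version (check vg minimizer --help). The script only prints the command unless vg and all three inputs are present.

```shell
#!/bin/sh
# File prefix taken from the pipeline above.
GRAPH=1KGP_graph

# Same minimizer command as before, with -d pointing at the distance index.
CMD="vg minimizer -o ${GRAPH}.min -g ${GRAPH}.gbwt -d ${GRAPH}.dist ${GRAPH}.xg"
echo "$CMD"

# Run only when vg is installed and every input exists; otherwise this is a dry run.
if command -v vg >/dev/null 2>&1 \
    && [ -f "${GRAPH}.gbwt" ] && [ -f "${GRAPH}.dist" ] && [ -f "${GRAPH}.xg" ]; then
    vg version | head -n 1   # record which vg build produced the index
    $CMD
fi
```

Printing the vg version alongside the command makes it easier to tell afterwards whether the index came from an outdated build.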

