1. What were you trying to do? I am creating a hum

Extracting GFA with alternate paths from vg autoindex about vg HOT 4 CLOSED

venkatk89 commented on September 26, 2024

from vg.

Comments (4)

jeizenga commented on September 26, 2024

It would be easier for me to diagnose what's happening here if you included the actual commands, but it seems like you're using vg autoindex for a purpose that it's not really designed for. I highly recommend not trying to reach into the inside of one of vg autoindex's indexing workflows. The individual steps can be unintuitive, and we don't try to maintain any consistent interface or guarantees for the intermediate stages. The --keep-intermediate option is only intended as a debugging tool.

I think what might be happening here is that vg autoindex does not accept repeated --gfa arguments. It's probably only retaining the last GFA, in which case you would expect mapping to be very slow since most reads will not have a good mapping target.

venkatk89 commented on September 26, 2024

Thank you for your quick response.

I was wondering if there is a way to manually create the indices required for the Giraffe mapper from scratch. By collating the different indexing commands in the wiki, I came up with the following pipeline for creating the 1000 Genomes Project graph:

1. Create VG files with alt paths

(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
"vg construct --alt-paths \
--region-is-chrom --region chr{} \
--reference $hg38_reference_genome \
--vcf 1KGP_chr{}.vcf.gz \
--threads 1 --flat-alts --handle-sv > 1KGP_graph_chr{}.vg"

2. Coordinate Node IDs

vg ids -j -m mapping $(for i in $(seq 1 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)

3. Create GFAs

(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
"time \
vg view -v 1KGP_graph_chr{}.vg \
-g > 1KGP_graph_chr{}.gfa"

4. Get XG index with Alt paths

vg index -L -x 1KGP_graph.xg \
$(for i in $(seq 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)

5. GBWT with greedy path cover

vg gbwt -n 16 -g 1KGP_graph.gg -o 1KGP_graph.gbwt -x 1KGP_graph.xg -P

6. Build Minimizer Index

vg minimizer -o 1KGP_graph.min -g 1KGP_graph.gbwt 1KGP_graph.xg

7. Get Distance Index

vg snarls --include-trivial 1KGP_graph.xg > 1KGP_graph.trivial.snarls

vg index -j 1KGP_graph.dist -x 1KGP_graph.xg -s 1KGP_graph.trivial.snarls

When I run vg giraffe with the .gbwt, .dist and .min files created above, it creates a .gbz file on the first run. On later runs, giraffe recognizes the .gbz file automatically.

But the issue is, when using these indices, the runtime of the giraffe mapper is extremely high. Is the pipeline missing something?
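
Steps 2 and 4 above each rebuild the same per-chromosome file list inline. As a minimal sketch, assuming a POSIX shell with seq available and the 1KGP_graph_chr<N>.vg naming used in the steps above, the list could be built once and reused. The variable name vg_files is illustrative only and not part of vg.

```shell
# Build the list of per-chromosome graph files once (chr1..chr22, X, Y).
# The 1KGP_graph_chr<N>.vg naming is taken from the pipeline steps above.
vg_files=""
for chrom in $(seq 1 22) X Y; do
    vg_files="$vg_files 1KGP_graph_chr${chrom}.vg"
done
vg_files="${vg_files# }"   # drop the leading space

# Print the list so it can be checked before use.
echo "$vg_files"

# The list can then be passed unquoted so it splits into separate arguments:
# vg ids -j -m mapping $vg_files
# vg index -L -x 1KGP_graph.xg $vg_files
```

Building the list in one place keeps steps 2 and 4 operating on exactly the same set of files.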

jeizenga commented on September 26, 2024

Nothing jumps out immediately to me, but maybe @xchang1 could comment.

xchang1 commented on September 26, 2024

Try rebuilding the minimizer index with -d to include the distance index. The minimizer index can store some information from the distance index to make giraffe faster.
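
Applied to the file names from the pipeline earlier in this thread, the rebuild might look like the sketch below. It assumes the distance index (1KGP_graph.dist, step 7) has already been built, so step 7 would have to run before step 6, and that the installed vg version supports -d on vg minimizer. The command is only printed here so it can be checked before running.

```shell
# Sketch: rebuild the minimizer index with the distance index attached (-d).
# File names follow the pipeline earlier in this thread; adjust as needed.
graph=1KGP_graph
cmd="vg minimizer -o ${graph}.min -g ${graph}.gbwt -d ${graph}.dist ${graph}.xg"

# Print the command for inspection.
echo "$cmd"

# Uncomment to run it once the distance index exists:
# $cmd
```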

If that doesn't work, can you try updating your version of vg? It would be easier to debug on the current version.
