It would be easier for me to diagnose what's happening here if you included the actual commands, but it seems like you're using vg autoindex for a purpose that it's not really designed for. I highly recommend not trying to reach into the inside of one of vg autoindex's indexing workflows. The individual steps can be unintuitive, and we don't try to maintain any consistent interface or guarantees for the intermediate stages. The --keep-intermediate option is only intended as a debugging tool.
I think what might be happening here is that vg autoindex does not accept repeated --gfa arguments. It's probably only retaining the last GFA, in which case you would expect mapping to be very slow, since most reads will not have a good mapping target.
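A toy bash sketch of the suspected behavior follows. It is not vg's actual argument parser: `parse_single` and `parse_multi` are hypothetical functions showing how an option stored in a single variable keeps only its last value when it is repeated, while an option collected into an array keeps every value.

```shell
#!/usr/bin/env bash
# Toy illustration only: NOT vg's argument parser.

# Stores --gfa in one variable, so each repeat overwrites the previous value.
parse_single() {
    local gfa=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --gfa) gfa="$2"; shift 2 || break ;;
            *) shift ;;
        esac
    done
    echo "$gfa"
}

# Appends every --gfa value to an array, so all repeats are kept.
parse_multi() {
    local gfas=()
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --gfa) gfas+=("$2"); shift 2 || break ;;
            *) shift ;;
        esac
    done
    echo "${gfas[*]}"
}

parse_single --gfa chr1.gfa --gfa chr2.gfa --gfa chrY.gfa   # prints: chrY.gfa
parse_multi  --gfa chr1.gfa --gfa chr2.gfa --gfa chrY.gfa   # prints: chr1.gfa chr2.gfa chrY.gfa
```

If a tool behaves like `parse_single`, a graph indexed from repeated --gfa arguments would contain only the last chromosome, which matches the slow-mapping symptom.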
from vg.
Thank you for your quick response.
I was wondering if there is a way to manually create the indices required for the Giraffe mapper from scratch. By collating the different indexing commands in the wiki, I came up with the following pipeline for creating the 1000 Genomes Project graph:
1. Create VG files with alt paths
(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
vg construct --alt-paths \
--region-is-chrom --region chr{} \
--reference $hg38_reference_genome \
--vcf 1KGP_chr{}.vcf.gz \
--threads 1 --flat-alts --handle-sv > 1KGP_graph_chr{}.vg
2. Coordinate Node IDs
vg ids -j -m mapping $(for i in $(seq 1 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)
3. Create GFAs
(seq 1 22; echo X; echo Y) | \
parallel -j 24 \
"time \
vg view -v 1KGP_graph_chr{}.vg \
-g > 1KGP_graph_chr{}.gfa"
4. Get XG index with Alt paths
vg index -L -x 1KGP_graph.xg \
$(for i in $(seq 22; echo X; echo Y); do echo 1KGP_graph_chr$i.vg; done)
5. GBWT with greedy path cover
vg gbwt -n 16 -g 1KGP_graph.gg -o 1KGP_graph.gbwt -x 1KGP_graph.xg -P
6. Build Minimizer Index
vg minimizer -o 1KGP_graph.min -g 1KGP_graph.gbwt 1KGP_graph.xg
7. Get Distance Index
vg snarls --include-trivial 1KGP_graph.xg > 1KGP_graph.trivial.snarls
vg index -j 1KGP_graph.dist -x 1KGP_graph.xg -s 1KGP_graph.trivial.snarls
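Steps 1 to 4 each spell out the chromosome list inline (step 4 uses `seq 22`, which yields the same 1 to 22 as `seq 1 22`). A minimal bash sketch of defining the list once and reusing it; `PREFIX`, `CHROMS` and `VG_FILES` are illustrative names, not anything vg requires.

```shell
#!/usr/bin/env bash
# Define the file prefix and chromosome list once so every step sees
# exactly the same 24 per-chromosome graphs, in the same order.
PREFIX=1KGP_graph
CHROMS=({1..22} X Y)

# Build the per-chromosome .vg file list.
VG_FILES=()
for c in "${CHROMS[@]}"; do
    VG_FILES+=("${PREFIX}_chr${c}.vg")
done

# prints: 24 graphs: 1KGP_graph_chr1.vg ... 1KGP_graph_chrY.vg
echo "${#VG_FILES[@]} graphs: ${VG_FILES[0]} ... ${VG_FILES[${#VG_FILES[@]}-1]}"

# The array and list can then replace the inline loops, e.g.:
#   printf '%s\n' "${CHROMS[@]}" | parallel -j 24 ...
#   vg ids -j -m mapping "${VG_FILES[@]}"
#   vg index -L -x "${PREFIX}.xg" "${VG_FILES[@]}"
```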
When I run vg giraffe using the .gbwt, .dist and .min files created above, a .gbz file is created on the first run; on later runs, giraffe recognizes the .gbz file automatically.
However, with these indices the runtime of giraffe is extremely high. Is the pipeline missing something?
Nothing jumps out immediately to me, but maybe @xchang1 could comment.
Try rebuilding the minimizer index with -d to include the distance index. The minimizer index can store some information from the distance index to make giraffe faster.
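A sketch of the suggested rebuild, assuming the file names from the pipeline above. Because -d reads the distance index, the distance index (step 7) has to exist before the minimizer index is rebuilt. The `RUN` variable is a hypothetical dry-run switch added for illustration; by default the script only prints the command.

```shell
#!/usr/bin/env bash
PREFIX=1KGP_graph

# Same inputs as step 6, plus -d with the distance index from step 7.
cmd=(vg minimizer -d "${PREFIX}.dist" -o "${PREFIX}.min" -g "${PREFIX}.gbwt" "${PREFIX}.xg")

if [ "${RUN:-0}" = 1 ]; then
    "${cmd[@]}"
else
    # Dry run: print the command that would be executed.
    echo "${cmd[*]}"
fi
```

Running it with `RUN=1` executes the command; the new .min file then replaces the one built in step 6.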
If that doesn't work, can you try updating your version of vg? It would be easier to debug on the current version.