Comments (21)
What does your graph contain and how did you build it?
If you run the following commands, what is the output?
vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz
If the last command listed any sample names under tag reference_samples, try running the following command for each of them:
vg paths -S sample -L -x graph.gbz | wc -l
Additionally, is the mapping speed reasonable if you use GAM or GAF as the output format?
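As a sketch of that last step, the reference sample names can be pulled out of the `vg gbwt --tags` output with awk. The tags text below is a hypothetical example of the output format, captured as a string so the sketch is self-contained:

```shell
# Hypothetical `vg gbwt --tags -Z graph.gbz` output, one "tag value" pair per line
tags_output='reference_samples PUMM
source jltsiren/gbwt'

# Keep only the names listed after the reference_samples tag
ref_samples=$(printf '%s\n' "$tags_output" |
    awk '$1 == "reference_samples" { for (i = 2; i <= NF; i++) print $i }')

printf '%s\n' "$ref_samples"
```

Each printed name would then be substituted for sample in `vg paths -S sample -L -x graph.gbz | wc -l`.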
from vg.
Hello, although I am now able to run Giraffe in an array job, the mapping speed is still very slow, ~80 reads per second per thread.
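For scale, a run at that rate can be sized with simple shell arithmetic. The read count below is an assumed example, not a number from this thread:

```shell
reads=100000000        # assumed total number of reads to map (hypothetical)
rate=80                # observed reads per second per thread
threads=32             # threads given to giraffe via -t
seconds=$(( reads / (rate * threads) ))
hours=$(( seconds / 3600 ))
echo "${hours} h (${seconds} s)"
```

At ~80 reads/s/thread even a well-parallelized job spends many hours on a typical short-read dataset, which is why this rate stands out as abnormal.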
I then ran
vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz
which returned
11775451 paths with names, 7 samples with names, 10 haplotypes, 3261 contigs with names
0
reference_samples PUMM
source jltsiren/gbwt
Then I ran
vg paths -S PUMM -L -x graph.gbz | wc -l
which returned 10
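The path count in the `vg gbwt -M` line is itself informative: 11775451 named paths over only 10 haplotypes and 3261 contigs means the haplotypes are stored as many short fragments. A rough sketch of that arithmetic, assuming (as an upper bound) that every haplotype covers every contig:

```shell
paths=11775451       # "paths with names" from `vg gbwt -M -Z graph.gbz`
haplotypes=10
contigs=3261

# Expected path count if each haplotype covered each contig in one piece
unfragmented=$(( haplotypes * contigs ))

# Average number of fragments per haplotype-contig pair (integer division)
avg_fragments=$(( paths / unfragmented ))
echo "${avg_fragments} fragments per haplotype-contig pair on average"
```

Hundreds of fragments per haplotype per contig is far from a contiguous path cover, which is relevant to the mapping-speed question.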
from vg.
Are you specifying the number of mapping threads each Giraffe job should use with the -t / --threads option?
Also, what does vg stats -l graph.gbz say, and how does that compare to the size of the genome?
from vg.
I specified 32 threads, and vg stats -l graph.gbz returns 400249670, which is ~50 Mb larger than the reference genome, although the other genomes used in the graph construction are closer to 400 Mb.
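A quick sanity check of those numbers; the reference length below is an assumed ~350 Mb, inferred from the "50 Mb larger" remark rather than stated in the thread:

```shell
graph_len=400249670      # total sequence length from `vg stats -l graph.gbz`
ref_len=350000000        # assumed linear reference length (hypothetical)
excess=$(( graph_len - ref_len ))
echo "graph holds ${excess} bp beyond the reference"
```

Some excess over the linear reference is expected, since the graph also stores sequence contributed by the other haplotypes.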
from vg.
Can you share the graph and some reads? I think we have ruled out most of the common things that could go wrong.
from vg.
How should I share it?
from vg.
I don't know. What options do you have?
from vg.
I can share it with you on google drive if that works for you
from vg.
Sharing on Google Drive should work. Please let me know once you have uploaded the files.
from vg.
Just shared the folder with your UCSC email address
from vg.
There were no real reads to try, but I managed to map simulated 150 bp and 250 bp reads in a reasonable time on my laptop.
Based on the filenames, the graph you provided is a filter graph from the Minigraph-Cactus pipeline. Filtering removes all nodes used only by a single haplotype from the graph. Because you only have 10 haplotypes to begin with, that results in a large number of short path fragments. That may be a problem, especially if the reads you are trying to map diverge significantly from the haplotypes.
You may get better performance by mapping reads to the default (clip) graph.
from vg.
I met the same problem, did you solve it? @IsaacDiaz026
from vg.
vg v1.36.0 works.
vg v1.40.0 and v1.48.0 have this problem, and mapping seems to never finish (v1.36.0 takes 20 min, while v1.40.0 runs for more than 1 day).
from vg.
I met the same problem, did you solve it? @IsaacDiaz026
I haven't yet, working on rebuilding a new pangenome graph first
from vg.
vg v1.36.0 works. vg v1.40.0 and v1.48.0 have this problem.
I think it's actually that we added these warnings since v1.36.0, not that the mapping got slower. If you see a few of these warnings in a mapping run, it's not such a huge deal. If you're seeing many of them, then there's probably something to troubleshoot with the mapping speed.
from vg.
My graph was constructed with the Minigraph-Cactus pipeline. When I map with vg giraffe, I see lots of the warnings above and the mapping slows down. I want to know what might have gone wrong. @jeizenga
from vg.
Hi, maybe this helps, but from what I can tell, vg giraffe hits these errors when multiple jobs access the same index files?
My situation was that I have a single graph (with three files: graph.dist, graph.giraffe.gbz, graph.min) and multiple short-read samples that I want to align.
When I run a single sample, I get steady progress. When I run my samples in parallel (using a Slurm array job), I get almost no progress and lots of watchdog errors. When I copy the 3 graph files to unique names for each sample in each array job, I get steady progress again.
So it feels like there is some conflict in accessing the three graph files when running vg in parallel? I've got no idea how or why, but having separate graph files for each sample let me run normally. Not something I've seen mentioned?
from vg.
What does your graph contain and how did you build it?
If you run the following commands, what is the output?
vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz
If the last command listed any sample names under tag reference_samples, try running the following command for each of them:
vg paths -S sample -L -x graph.gbz | wc -l
Additionally, is the mapping speed reasonable if you use GAM or GAF as the output format?
Sorry, sir. If my results were like this, what would it reveal?
vg gbwt -M -Z graph.gbz
vg paths -L -G -x graph.gbz | wc -l
vg gbwt --tags -Z graph.gbz
vg paths -S PUMM -L -x graph.gbz | wc -l
returned:
17 paths with names, 17 samples with names, 17 haplotypes, 1 contigs with names
1
reference_samples
source jltsiren/gbwt
0
from vg.
Maybe I found the reason: when I use real short-read data, no problem happens, but when I use simulated data, the warning appears.
from vg.
@Andy-B-123 I am facing exactly the same question as you described. Do you use a cluster job management tool like Slurm to run it in parallel? And I think the reason may be that we should compile vg ourselves?
from vg.
@Andy-B-123 I am facing exactly the same question as you described. Do you use a cluster job management tool like Slurm to run it in parallel? And I think the reason may be that we should compile vg ourselves?
Yes, I use Slurm on our HPC. My workaround is to copy the graph files (graph.dist, graph.giraffe.gbz, graph.min) at the start of each parallel job (e.g. job 1: graph.1.dist, job 2: graph.2.dist, ...), use those for each job, and then delete them once complete. It increases the space requirements for the run but allows it to actually run in parallel, so in my mind it's worth it. Just make sure to include a delete step at the end to clean up the copied graph files.
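A minimal sketch of that workaround, assuming a Slurm array job (SLURM_ARRAY_TASK_ID set per task) and the three index files in the working directory. The touch line only creates empty stand-ins so the sketch runs anywhere; in a real job the indexes already exist:

```shell
# Stand-in index files so the sketch is self-contained (real jobs skip this)
touch graph.dist graph.giraffe.gbz graph.min

id="${SLURM_ARRAY_TASK_ID:-1}"   # per-task index; defaults to 1 outside Slurm

# Give each array task a private copy of every index file
for ext in dist giraffe.gbz min; do
    cp "graph.${ext}" "graph.${id}.${ext}"
done

# ... run `vg giraffe` against the graph.${id}.* copies here ...

# Clean up the per-task copies once the task is done
rm -f "graph.${id}.dist" "graph.${id}.giraffe.gbz" "graph.${id}.min"
```

The per-task names keep concurrent jobs from ever touching the same index files, at the cost of temporary disk space.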
from vg.