Comments (6)
If you want Python code to read vg's formats directly, you can try either the libbdsg Python bindings which can read most .vg files (which are usually now in PackedGraph format), or pystream-protobuf which knows how to decode vg's Protobuf-based formats like .snarls, .gam, and .vg as written straight from vg construct
. If you have a .vg that isn't in PackedGraph format internally and you want to use it with libbdsg, vg convert
can convert it.
If you do want to convert to a text-based format, you have some options in vg:
You can use vg convert --gfa-out graph.vg
to convert graphs to the text-based standard GFA format.
You can use vg convert --gam-to-gaf whatever.gam
to convert a GAM alignment file to the standard text-based GAF format.
The vg view
command also has options to dump miscellaneous formats like Protobuf .snarls to a JSON representation. You can say vg --snarl-in --json whatever.snarls
and it will print out JSON.
from vg.
Thank you, I get it. I want to read gam format and i check out the pystream-protobuf. I found that I need to import stream and vg_pb2 to parse GAM file. I install stream, but I don't know how to use vg_pb2.
from vg.
vg_pb2
is the Python module generated by Protobuf to provide Python language bindings for the Protobuf-defined types that vg uses (such as Alignment
, which represents a GAM read). It doesn't really quite have its own documentation, but you can look at the Protobuf definition of an Alignment message, or the Doxygen docs for Alignment, and at the documentation for how Protobuf messages get their fields and accessors exposed to Python.
It looks like all the non-message fields just become Python fields. So if you want to print out all the read names, you would get the Alignment
type out of vg_pb2
, you would use stream
to loop over the file looking for Alignment
s, and for each alignment a
you would print out a.name
.
from vg.
@adamnovak Thank you! And I have another question: I use vg snarl to get the snarl fie of a variation graph. Can I get the reads that mapping to each snarl?(It means I can get snarl and reads that match to a special snarl) Does vg support it?
from vg.
It looks like you can take your snarls file and provide it to vg chunk
with the --snarls
option. Then you can use --path
or --path-list
to ask for region(s) along a path in the graph that goes through the snarl, and vg chunk
will use the snarl information to pull out complete snarls. And if you use --gam-name
with vg chunk
, it will pull out reads from that (sorted and indexed) GAM instead of the graph.
But we don't have a way to ask vg chunk
to pull a snarl directly by giving its bounding node ends. So you have to do something like:
- Get the node IDs bounding your snarl (
123
and456
) - Get the lists of paths on each of them (
vg find -x graph.vg -n 123 | vg paths --list
) - Pick a path that both are on and get each node's positions on that path (
vg find -x graph.vg -n 123 --position-in pathname
) - Come up with the path interval you are interested in that you think the snarl covers (
pathname:1000-2000
) - Make sure your GAM is sorted and indexed fro random access and has a
.gai
file (vg gamsort reads.gam -i reads.sorted.gam.gai >reads.sorted.gam
) - Fetch out the reads (
vg chunk -x graph.vg -a reads.sorted.gam --path pathname:1000-2000 --snarls whatever.snarls
)
This would work way better if we actually had C++ code somewhere to pull all the nodes in a specified snarl, or in each snarl, and grab the reads touching them.
If you want to do it as a batch process and not just look at one snarl and its reads, you can make it work with pystream-protobuf
and the new snarl decomposition in the libbdsg Python bindings which presents the new SnarlDecomposition API, but that loads snarls in a different format. You would vg index graph.vg -j graph.snarlsonly.dist --snarl-limit 0
to build it, and then load it up:
from bdsg.bdsg import SnarlDistanceIndex
index = SnarlDistanceIndex()
index.deserialize("graph.snarlsonly.dist")
And you would also need the graph loaded in order to traverse the "nets" that the new snarl decomposition API uses:
from bdsg.bdsg import PackedGraph
graph = PackedGraph()
graph.deseriaslize("graph.vg")
But then you can loop over all your reads with pystream-protobuf
, and go through the node_id
in the position
of each entry in mapping
in the Alignment's path
, and then for each of them figure out what snarls the node is in. (Since the snarl tree is a tree, each node can be in several nested snarls, back up to the root.) Something like:
node_handle = graph.get_handle(node_id, False)
node_net_handle = index.get_net(node_handle, graph)
here_net_handle = index.canonical(node_net_handle)
node_snarl_bounds = []
while not index.is_root(here_net_handle):
if index.is_snarl(here_net_handle):
# Make a list of tuples of the node ID and orientation for the nodes before and after the snarl.
bounds = []
def visit(net_handle):
handle = index.get_handle(net_handle, graph)
bounds.append((graph.get_id(handle), graph.get_is_reverse(handle))
# Get the node before the snarl
index.follow_net_edges(here_net_handle, graph, True, visit)
# Get the node after the snarl
index.follow_net_edges(here_net_handle, graph, False, visit)
# Put them in the list of ancestor snarl bound pairs for the node we started at
node_snarl_bounds.append(tuple(bounds))
here_net_handle = index.canonical(index.get_parent(here_net_handle))
Then you can collect all the unique snarls that all the nodes visited by the read appear in, and then take the read and store it somewhere where you can look at it when you want to think about each of those snarls.
from vg.
You might want to skip the trivial snarls that don't have any contents between their bounding nodes. I think you can ID them because for_each_child
on them won't visit anything.
from vg.
Related Issues (20)
- VCF file empty when calling SV on ONT data HOT 9
- vg map errors HOT 5
- Genotyping SVs in a minigraph-cactus graph yields many similar alleles in output vcf HOT 1
- How to align both long and paired end short reads using vg HOT 1
- Augmentation failed on one chromosome, but succesfull on other chromosomes HOT 3
- Can VG simulate the third-generations long reads? HOT 5
- Issue with vg map and vg augment for certain inputs. HOT 2
- Merging multiple graph files HOT 4
- short read giraffe alignment crashes most of the time, works on a few random samples - Signal 11 HOT 2
- Running VG deconstruct failed in the gfa generated from PGGB pipline HOT 3
- ERROR vg autoindex :Tag "transcript_id" not found in attributes (line 145). ERROR: Tag "transcript_id" not found in attributes (line 4). ERROR: No transcripts parsed (remember to set feature type "-y" in vg rna or "-f" in vg autoindex) HOT 1
- Error Exceeded Limit of Size on Disk While Running vg index HOT 1
- Mapping paired reads w/ giraffe, no EOF marker, job stalls, exit code 79 HOT 5
- Construct a generation-level pan-genome HOT 5
- Release vg v1.58.0
- VG autoindex not working with PGGB GFA HOT 3
- vg augment forwardize_breakpoints error HOT 1
- How to quantify the uncore gene expression by using vg rna HOT 1
- vg pack signal 6 error HOT 3
- Remove limit on Dozeu problem cell count from old Giraffe codepath
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vg.