
pybel's Introduction


PyBEL is a pure Python package for parsing and handling biological networks encoded in the Biological Expression Language (BEL).

It facilitates data interchange between data formats like NetworkX, Node-Link JSON, JGIF, CSV, SIF, Cytoscape, CX, INDRA, and GraphDati; database systems like SQL and Neo4J; and web services like NDEx, BioDati Studio, and BEL Commons. It also provides exports for analytical tools like HiPathia, Drug2ways and SPIA; machine learning tools like PyKEEN and OpenBioLink; and others.

Its companion package, PyBEL Tools, contains a suite of functions and pipelines for analyzing the resulting biological networks.

We realize that we have a name conflict with the Python wrapper for the cheminformatics package OpenBabel, which is also called pybel. If you're looking for that wrapper, see the OpenBabel project.

Citation

If you find PyBEL useful for your work, please consider citing:

Installation

PyBEL can be installed easily from PyPI with the following code in your favorite shell:

$ pip install pybel

or from the latest code on GitHub with:

$ pip install git+https://github.com/pybel/pybel.git

See the installation documentation for more advanced instructions. Also, check the change log at CHANGELOG.rst.

Getting Started

More examples can be found in the documentation and in the PyBEL Notebooks repository.

Compiling and Saving a BEL Graph

This example illustrates how a BEL document from the Human Brain Pharmacome project can be loaded and compiled directly from GitHub.

>>> import pybel
>>> url = 'https://raw.githubusercontent.com/pharmacome/conib/master/hbp_knowledge/proteostasis/kim2013.bel'
>>> graph = pybel.from_bel_script_url(url)

Other functions for loading BEL content from many formats can be found in the I/O documentation. Note that PyBEL can handle BEL 1.0 and BEL 2.0+ simultaneously.

After you have a BEL graph, there are numerous ways to save it. The pybel.dump function knows how to output it in many formats based on the file extension you give. For all of the possibilities, check the I/O documentation.

>>> import pybel
>>> graph = ...
>>> # write as BEL
>>> pybel.dump(graph, 'my_graph.bel')
>>> # write as Node-Link JSON for network viewers like D3
>>> pybel.dump(graph, 'my_graph.bel.nodelink.json')
>>> # write as GraphDati JSON for BioDati
>>> pybel.dump(graph, 'my_graph.bel.graphdati.json')
>>> # write as CX JSON for NDEx
>>> pybel.dump(graph, 'my_graph.bel.cx.json')
>>> # write as INDRA JSON for INDRA
>>> pybel.dump(graph, 'my_graph.indra.json')

Summarizing the Contents of the Graph

The BELGraph object has several "dispatches", which are properties that group related functionality. One is the BELGraph.summarize dispatch, which prints summaries to the console.

These examples use the RAS Model from EMMAA, so be sure to install INDRA first with pip install indra. The graph can be acquired and summarized with BELGraph.summarize.statistics() as in:

>>> import pybel
>>> graph = pybel.from_emmaa('rasmodel', date='2020-05-29-17-31-58')  # needs INDRA (pip install indra)
>>> graph.summarize.statistics()
---------------------  -------------------
Name                   rasmodel
Version                2020-05-29-17-31-58
Number of Nodes        126
Number of Namespaces   5
Number of Edges        206
Number of Annotations  4
Number of Citations    1
Number of Authors      0
Network Density        1.31E-02
Number of Components   1
Number of Warnings     0
---------------------  -------------------

The number of nodes of each type can be summarized with BELGraph.summarize.nodes() as in:

>>> graph.summarize.nodes(examples=False)
Type (3)        Count
------------  -------
Protein            97
Complex            27
Abundance           2

The number of nodes with each namespace can be summarized with BELGraph.summarize.namespaces() as in:

>>> graph.summarize.namespaces(examples=False)
Namespace (4)      Count
---------------  -------
HGNC                  94
FPLX                   3
CHEBI                  1
TEXT                   1

The edges can be summarized with BELGraph.summarize.edges() as in:

>>> graph.summarize.edges(examples=False)
Edge Type (12)                       Count
---------------------------------  -------
Protein increases Protein               64
Protein hasVariant Protein              48
Protein partOf Complex                  47
Complex increases Protein               20
Protein decreases Protein                9
Complex directlyIncreases Protein        8
Protein increases Complex                3
Abundance partOf Complex                 3
Protein increases Abundance              1
Complex partOf Complex                   1
Protein decreases Abundance              1
Abundance decreases Protein              1

Grounding the Graph

Not all BEL graphs contain both the name and identifier for each entity. Some even use non-standard prefixes (also called namespaces in BEL). Usually, BEL graphs are validated against controlled vocabularies, so the following demo shows how to add the corresponding identifiers to all nodes.

from urllib.request import urlretrieve

import pybel
import pybel.grounding

# Use the raw file URL; a github.com/.../blob/... URL returns an HTML page instead of the file
url = 'https://github.com/cthoyt/selventa-knowledge/raw/master/selventa_knowledge/large_corpus.bel.nodelink.json.gz'
urlretrieve(url, 'large_corpus.bel.nodelink.json.gz')

graph = pybel.load('large_corpus.bel.nodelink.json.gz')
grounded_graph = pybel.grounding.ground(graph)

Note: you have to install pyobo for this to work and be running Python 3.7+.

Displaying a BEL Graph in Jupyter

After installing jinja2 and ipython, BEL graphs can be displayed in Jupyter notebooks.

>>> from pybel.examples import sialic_acid_graph
>>> from pybel.io.jupyter import to_jupyter
>>> to_jupyter(sialic_acid_graph)

Using the Parser

If you don't want to use the pybel.BELGraph data structure and just want to turn BEL statements into JSON for your own purposes, you can directly use the pybel.parse() function.

>>> import pybel
>>> pybel.parse('p(hgnc:4617 ! GSK3B) regulates p(hgnc:6893 ! MAPT)')
{'source': {'function': 'Protein', 'concept': {'namespace': 'hgnc', 'identifier': '4617', 'name': 'GSK3B'}}, 'relation': 'regulates', 'target': {'function': 'Protein', 'concept': {'namespace': 'hgnc', 'identifier': '6893', 'name': 'MAPT'}}}

This functionality can also be exposed through a Flask-based web application with python -m pybel.apps.parser after installing Flask with pip install flask. Note that the first run takes about 2 seconds to generate the parser, after which each parse is very fast.

Using the CLI

PyBEL also installs a command line interface with the command pybel for simple utilities such as data conversion. In this example, a BEL document is compiled then exported to GraphML for viewing in Cytoscape.

$ pybel compile ~/Desktop/example.bel
$ pybel serialize ~/Desktop/example.bel --graphml ~/Desktop/example.graphml

In Cytoscape, open with Import > Network > From File.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.rst for more information on getting involved.

Acknowledgements

Support

The development of PyBEL has been supported by several projects/organizations (in alphabetical order):

Funding

The PyBEL logo was designed by Scott Colby.

pybel's People

Contributors

amanchoudhri, bgyori, christianebeling, cthoyt, ddomingof, djinnome, lekono, nsoranzo, scolby33, smoe, tehw0lf


pybel's Issues

Additional UNSET syntax

Check the BEL 1.0 Syntax for UNSET at item 5.2.3. There are additional ways to unset, including:

  • UNSET ALL
  • UNSET {list, list2, ...}
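The following is a minimal sketch of how the three UNSET forms could be handled: a single key, `UNSET ALL`, and `UNSET {list}`. It assumes the parser keeps the currently SET annotations in a plain dict. `apply_unset` is a hypothetical regex-based helper for illustration, not PyBEL's parser.

```python
import re
from typing import Dict

# Hypothetical patterns for the three UNSET forms; UNSET ALL is checked first
UNSET_ALL = re.compile(r'^UNSET\s+ALL\s*$')
UNSET_LIST = re.compile(r'^UNSET\s+\{(.*)\}\s*$')
UNSET_ONE = re.compile(r'^UNSET\s+(\w+)\s*$')


def apply_unset(line: str, annotations: Dict[str, str]) -> None:
    """Remove keys from ``annotations`` (in place) according to one UNSET statement."""
    line = line.strip()
    if UNSET_ALL.match(line):
        annotations.clear()
        return
    match = UNSET_LIST.match(line)
    if match:
        for key in match.group(1).split(','):
            annotations.pop(key.strip(), None)
        return
    match = UNSET_ONE.match(line)
    if match:
        annotations.pop(match.group(1), None)
        return
    raise ValueError(f'not an UNSET statement: {line!r}')


state = {'Species': '9606', 'Tissue': 'brain', 'Cell': 'neuron'}
apply_unset('UNSET {Species, Cell}', state)  # state is now {'Tissue': 'brain'}
apply_unset('UNSET ALL', state)              # state is now {}
```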

Improve Documentation for Definitions Cache Manager

I already added an RST page with automodule, so add the docs in the right format inside the classes, their __init__ methods, and the modules so that they get propagated through. You can also add any design notes you want to manager.rst.

Error handling for tloc()

Make a custom error handler for tloc() statements that come in the form tloc(<abundance>) and therefore cannot be handled.

Export to BEL

After loading a BEL document into a graph, output it again as BEL statements. These statements should be canonical BEL 2.0, organized by:

  1. Citation
  2. Evidence
  3. Species
  4. Alphabetical order of remaining annotations
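One way to get this ordering is to sort on a composite key and then emit SET/UNSET blocks group by group. This is only a sketch, with several assumptions:

- `Statement` is a hypothetical flat record, not PyBEL's edge data.
- PubMed citations are written in the two-element `{"PubMed", "<id>"}` form.
- Each group's annotations are cleared with explicit UNSET lines rather than relying on implicit resets.

```python
from itertools import groupby
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Statement(NamedTuple):
    """Hypothetical flat record for one edge (not PyBEL's internal edge data)."""
    pubmed: str
    evidence: str
    annotations: Dict[str, str]
    bel: str


def annotation_key(st: Statement) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """Species first, then the remaining annotations in alphabetical order."""
    others = tuple(sorted((k, v) for k, v in st.annotations.items() if k != 'Species'))
    return st.annotations.get('Species', ''), others


def to_bel_lines(statements: Iterable[Statement]) -> List[str]:
    """Emit statements grouped by citation, evidence, species, then other annotations."""
    ordered = sorted(statements, key=lambda st: (st.pubmed, st.evidence, annotation_key(st), st.bel))
    lines: List[str] = []
    for pubmed, by_citation in groupby(ordered, key=lambda st: st.pubmed):
        lines.append(f'SET Citation = {{"PubMed", "{pubmed}"}}')
        for evidence, by_evidence in groupby(by_citation, key=lambda st: st.evidence):
            lines.append(f'SET Evidence = "{evidence}"')
            for (species, others), group in groupby(by_evidence, key=annotation_key):
                keys = (['Species'] if species else []) + [k for k, _ in others]
                if species:
                    lines.append(f'SET Species = "{species}"')
                lines.extend(f'SET {k} = "{v}"' for k, v in others)
                lines.extend(st.bel for st in group)
                if keys:
                    # Clear this group's annotations so they do not leak into the next group
                    lines.append('UNSET {' + ', '.join(keys) + '}')
            lines.append('UNSET Evidence')
        lines.append('UNSET Citation')
    return lines
```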

Cache Manager Summary

I want to add something like pybel nscache list to the CLI to list all of the URLs that have been stored in the cache. Can you make a function DefinitionCacheManager.ls(stream=sys.stdout) that prints this info to the given stream (defaulting to stdout)?
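Here is a minimal sketch of the requested method. It assumes the cached URLs are already available as a list; the real manager would query its database instead, so the class below is a hypothetical stand-in.

```python
from typing import Iterable, Optional, TextIO


class DefinitionCacheManager:
    """Toy stand-in: the real manager would read cached URLs from its database."""

    def __init__(self, cached_urls: Iterable[str]):
        self._cached_urls = list(cached_urls)  # hypothetical in-memory store

    def ls(self, stream: Optional[TextIO] = None) -> None:
        """Print each cached definition URL on its own line, to stdout by default."""
        for url in sorted(self._cached_urls):
            # print(..., file=None) writes to whatever sys.stdout is at call time
            print(url, file=stream)
```

The default is `None` rather than `sys.stdout`. A `sys.stdout` default would be captured once, when the function is defined, and would ignore later redirection (for example by a test runner). With `None`, the default is looked up on each call.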

Fix the GraphML output

Problem: some characters in BEL documents can end up as invalid GraphML output, for example when running:

pybel convert --path /pathTo/PD.bel --graphml /pathTo/PD.graphml
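One plausible cause is that XML 1.0 forbids most ASCII control characters. A stray `\x00` or `\x0b` in a BEL string would then make the GraphML file invalid. This is an assumption, since the issue does not say which characters break.

A minimal sketch of stripping such characters from node and edge attributes before writing follows. `sanitize_for_graphml` and `sanitize_attributes` are hypothetical helpers. Escaping of `&`, `<`, and `>` is left to the XML writer.

```python
import re
from typing import Any, Dict

# Complement of the XML 1.0 "Char" production: everything not allowed in a document
_INVALID_XML_CHARS = re.compile(r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')


def sanitize_for_graphml(value: str, replacement: str = '') -> str:
    """Replace characters that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub(replacement, value)


def sanitize_attributes(data: Dict[str, Any]) -> Dict[str, Any]:
    """Sanitize every string-valued attribute of a node or edge, leaving other types alone."""
    return {k: sanitize_for_graphml(v) if isinstance(v, str) else v for k, v in data.items()}
```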

Define conventions

We should define some conventions for how to handle things, e.g., the naming of tables in the database.

Create Demo Notebook

Create Jupyter Notebook showcasing a nice use case.

Todo:

  • pick a BEL model
  • choose additional data

Options:

  • implementation of the Reverse Causal Reasoning algorithm.
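If the notebook goes the RCR route, a heavily simplified version of the concordance step could look like the sketch below. It rests on these assumptions:

- Only one-hop downstream nodes are scored.
- Edges are reduced to +1 (increases) or -1 (decreases).
- Only the hypothesis "the upstream node is increased" is tested.
- The p-value is a one-sided binomial test with p = 0.5.

RCR as usually described also scores both directions and computes a richness statistic, and both are omitted here. `rcr_concordance` is a hypothetical name.

```python
from collections import defaultdict
from math import comb
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, int]  # (upstream, downstream, +1 increases / -1 decreases)


def binomial_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def rcr_concordance(edges: Iterable[Edge], observed: Dict[str, int]) -> List[Tuple[str, int, int, float]]:
    """Score each upstream node against observed up (+1) / down (-1) changes."""
    downstream = defaultdict(list)
    for source, target, sign in edges:
        downstream[source].append((target, sign))

    results = []
    for upstream, targets in downstream.items():
        # A prediction is correct if the edge sign matches the observed direction,
        # i.e. under the hypothesis that the upstream node is increased
        hits = [sign == observed[target] for target, sign in targets if target in observed]
        if not hits:
            continue
        correct = sum(hits)
        incorrect = len(hits) - correct
        results.append((upstream, correct, incorrect, binomial_tail(correct, len(hits))))
    # Most significant (smallest p-value) first
    return sorted(results, key=lambda row: row[3])
```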

Graph Queries with __getitem__

Define BELGraph.__getitem__ or BELGraph.query_edge.__getitem__ so queries like this can be made:

>>> import pybel
>>> g = pybel.from_url('http://resource.belframework.org/belframework/20150611/knowledge/small_corpus.bel')
>>> g.query_edge[g.relation == 'decreases']
...
>>> g.query_edge[g.relation == 'decreases' or g.relation == 'directlyDecreases']
...
>>> g.query_node[type='Protein']
...

Hopefully we can recycle the code from AETIONOMY. This can all go into BELGraph, the subclass of nx.MultiDiGraph.
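Python cannot overload `or`, and keyword arguments inside subscripts (`g.query_node[type='Protein']`) are not valid syntax. A working version of this idea would therefore likely combine predicates with `|` and `&`. Below is a hypothetical sketch on a toy graph; `ToyGraph`, `Field`, `Predicate`, and `EdgeQuery` are illustrative names, not PyBEL API.

```python
from typing import Any, Callable, Dict, List, Tuple

EdgeData = Dict[str, Any]
Edge = Tuple[str, str, EdgeData]


class Predicate:
    """Callable filter on edge data that composes with | (either) and & (both)."""

    def __init__(self, func: Callable[[EdgeData], bool]):
        self.func = func

    def __call__(self, data: EdgeData) -> bool:
        return self.func(data)

    def __or__(self, other: 'Predicate') -> 'Predicate':
        return Predicate(lambda data: self(data) or other(data))

    def __and__(self, other: 'Predicate') -> 'Predicate':
        return Predicate(lambda data: self(data) and other(data))


class Field:
    """Edge attribute whose == builds a Predicate instead of returning a bool."""

    def __init__(self, key: str):
        self.key = key

    def __eq__(self, value: Any) -> Predicate:  # type: ignore[override]
        return Predicate(lambda data: data.get(self.key) == value)


class EdgeQuery:
    def __init__(self, edges: List[Edge]):
        self._edges = edges

    def __getitem__(self, predicate: Predicate) -> List[Edge]:
        return [edge for edge in self._edges if predicate(edge[2])]


class ToyGraph:
    """Stand-in for BELGraph with just enough structure for the query syntax."""

    relation = Field('relation')

    def __init__(self, edges: List[Edge]):
        self.query_edge = EdgeQuery(edges)
```

Usage would be `g.query_edge[(g.relation == 'decreases') | (g.relation == 'directlyDecreases')]`. Because `|` binds more tightly than `==`, each comparison needs its own parentheses.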

Add Namespace Mapping Models and Manager

The OpenBEL framework provides equivalence (*.beleq) files in the same format as the namespace (*.belns) and annotation (*.belanno) files. These files map each value to a hash that represents its "equivalence class".

Example: http://resources.openbel.org/belframework/20150611/equivalence/disease-ontology.beleq

  • Using the models in pybel.manager.database_models and I/O class pybel.manager.database_models.DefinitionsCacheManager.py built by @LeKono for the Definitions Cache Manager as a template, build a new sqlalchemy model to store this data.
    • This could be one table to represent "equivalence classes" and one to represent the many-to-one relation between pybel.manager.database_models.Context and the equivalence class
  • Identify relevant equivalence files from the 1.0, 2013, and 2015 releases of the BEL Framework
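The grouping described here can be prototyped without a database. Read the `[Values]` section, then invert the value-to-class mapping into class-to-values. This corresponds to the proposed equivalence-class table plus its many-to-one relation.

The sketch assumes each value line has the form `value|class-id`, as the issue's description suggests. The sample lines and short ids are made up for illustration; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_beleq_values(lines: Iterable[str]) -> Dict[str, str]:
    """Map each value in the [Values] section to its equivalence-class id."""
    mapping: Dict[str, str] = {}
    in_values = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            in_values = line == '[Values]'
            continue
        if in_values:
            value, sep, class_id = line.rpartition('|')
            if sep:  # skip malformed lines without a delimiter
                mapping[value] = class_id
    return mapping


def group_by_class(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert value -> class into class -> [values] (the many-to-one relation)."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for value, class_id in mapping.items():
        classes[class_id].append(value)
    return dict(classes)


# Made-up sample in the assumed layout
sample = ['[Namespace]', 'Keyword=DO', '', '[Values]',
          'alzheimer disease|c1', "Alzheimer's disease|c1", 'parkinson disease|c2']
classes = group_by_class(parse_beleq_values(sample))
# classes == {'c1': ['alzheimer disease', "Alzheimer's disease"], 'c2': ['parkinson disease']}
```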

Add Annotation Cache to DefinitionsCacheManager

In pybel.manager.defaults, add second list called default_annotations with the different iterations of Selventa's default annotations

The data model for namespaces and annotations should be the same, but there are cases when an ontology and a namespace both use the same name (see BRCO in the AD model), so let's keep their data separate for now, even though the logic might be the same.

Validator Plugins

Add hooks for more validator plugins. For example, Reagon asked for statements that don't have a disease state (normal, AD, etc.) to be flagged.

This could either be done on compile time, or the line number for each statement could be included in the graph.
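Here is a minimal sketch of a plugin hook. It assumes each statement is available with its line number and current annotations, which is the issue's second option. Validators are plain functions registered with a decorator. `register_validator`, `run_validators`, and the `Disease` annotation key are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A validator takes (line_number, statement, annotations) and returns a warning or None
Validator = Callable[[int, str, Dict[str, str]], Optional[str]]

_VALIDATORS: List[Validator] = []


def register_validator(func: Validator) -> Validator:
    """Decorator that adds ``func`` to the plugin registry."""
    _VALIDATORS.append(func)
    return func


@register_validator
def require_disease_state(line_number: int, statement: str, annotations: Dict[str, str]) -> Optional[str]:
    # Hypothetical annotation key for the disease state (normal, AD, etc.)
    if 'Disease' not in annotations:
        return f'line {line_number}: no disease state annotated for {statement}'
    return None


def run_validators(statements: List[Tuple[int, str, Dict[str, str]]]) -> List[str]:
    """Run every registered validator on every statement and collect the warnings."""
    warnings = []
    for line_number, statement, annotations in statements:
        for validator in _VALIDATORS:
            message = validator(line_number, statement, annotations)
            if message is not None:
                warnings.append(message)
    return warnings
```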

Exit Code on CLI

Exit Codes:

  • 0 if it was perfect
  • 1 if there was an awful mistake and you should feel bad
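Here is a minimal sketch of mapping compilation results to these two exit codes. It assumes the compiler returns a list of warning messages; `exit_code` and `main` are hypothetical names.

```python
import sys
from typing import List


def exit_code(warnings: List[str]) -> int:
    """0 if the document compiled without warnings, 1 otherwise."""
    return 1 if warnings else 0


def main(warnings: List[str]) -> None:
    # Hypothetical: a real CLI would get ``warnings`` by compiling the BEL document
    for warning in warnings:
        print(warning, file=sys.stderr)
    sys.exit(exit_code(warnings))
```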

Two Way Relations

Automatically add two-way relations with A relation B and B relation A for correlative relations and other mutual relationships.
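Here is a minimal sketch on plain (subject, relation, object) triples. The set of symmetric relations is an assumption, taken from BEL's correlative relationships (`association`, `positiveCorrelation`, `negativeCorrelation`). `add_reverse_edges` is a hypothetical helper.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Assumed set of mutual relations; extend as needed
SYMMETRIC_RELATIONS = {'association', 'positiveCorrelation', 'negativeCorrelation'}


def add_reverse_edges(triples: List[Triple]) -> List[Triple]:
    """Return the triples plus B-rel-A for each symmetric A-rel-B not already present."""
    seen: Set[Triple] = set(triples)
    result = list(triples)
    for subject, relation, obj in triples:
        reverse = (obj, relation, subject)
        if relation in SYMMETRIC_RELATIONS and reverse not in seen:
            seen.add(reverse)
            result.append(reverse)
    return result
```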

Build Definition Hierarchy Database

Caches for prior knowledge about subClassOf relationships for any nodes like diseases in a hierarchy, protein families, components of protein complexes

  • Data model in sqlalchemy that connects data in definitions cache with relationships like subClassOf, partOf, memberOf, etc.
from sqlalchemy import Column, ForeignKey, Integer

# Assumed to already be defined alongside the definitions cache models
from pybel.manager.database_models import Base, CONTEXT_TABLE_NAME

SUBCLASS_TABLE_NAME = 'pybel_cache_subclass'

class Subclass(Base):
    """This table represents the many-to-many subclass relationships between names or annotations"""
    __tablename__ = SUBCLASS_TABLE_NAME

    parent = Column(Integer, ForeignKey('{}.id'.format(CONTEXT_TABLE_NAME)), primary_key=True)
    child = Column(Integer, ForeignKey('{}.id'.format(CONTEXT_TABLE_NAME)), primary_key=True)
  • Addition of function add_heirarchy(list_of_pairs) to DefinitionCacheManager that takes a list of pairs (parent, child) and inserts the appropriate edges to this table
  • Function polymorphism to also accept a nx.DiGraph whose edges are pairs of (parent, child)
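Here is a standalone sketch of the proposed `add_heirarchy` function. It uses the standard library's `sqlite3` instead of the SQLAlchemy session, so it can run on its own. The table mirrors the `Subclass` model above, with a composite primary key on parent and child. `create_subclass_table` and `add_hierarchy` are hypothetical names. Polymorphism is done by duck typing on `.edges()`, which `nx.DiGraph` provides.

```python
import sqlite3
from typing import Any, Iterable, Tuple

Pair = Tuple[int, int]  # (parent context id, child context id)


def create_subclass_table(conn: sqlite3.Connection) -> None:
    """Plain-SQL stand-in for the Subclass model: composite primary key on (parent, child)."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS pybel_cache_subclass ('
        'parent INTEGER NOT NULL, child INTEGER NOT NULL, PRIMARY KEY (parent, child))'
    )


def add_hierarchy(conn: sqlite3.Connection, relations: Any) -> None:
    """Insert (parent, child) edges from a list of pairs or any object with an .edges() method."""
    # Duck typing gives the requested polymorphism: an nx.DiGraph has .edges(), a list does not
    pairs: Iterable[Pair] = relations.edges() if hasattr(relations, 'edges') else relations
    # INSERT OR IGNORE skips pairs that are already stored (they would violate the primary key)
    conn.executemany(
        'INSERT OR IGNORE INTO pybel_cache_subclass (parent, child) VALUES (?, ?)',
        list(pairs),
    )
    conn.commit()
```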

Write-up on Why Not RDF

This issue used to be called 'Export to RDF', but I think we should assemble our thoughts on why RDF is a bad way to do data interchange for this kind of data, and add that to the Design Choices section of the documentation

Parsing of Nested Statements

Parse BEL statements of the form subject_1 predicate_1 (subject_2 predicate_2 object) and load them into the graph.

Example:

p(HGNC:AKT1) -> (p(HGNC:AKT2) -> p(HGNC:PIK3CA))

To Do:

  • add hooks for resolving predicate_1 and predicate_2
  • write extensive testing
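Here is a deliberately small, regex-based sketch of the recursion involved. It assumes terms contain no nested parentheses, so `complex(p(...))` is out of scope, and that relations are a single token. A real implementation would extend the existing grammar instead. `parse_statement` and `Triple` are hypothetical names.

```python
import re
from typing import NamedTuple, Union


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: Union[str, 'Triple']  # either a term or a nested statement


# Simplified grammar: a term is "fn(...)" with no inner parentheses; relations are one token
TERM = r'[A-Za-z]+\([^()]*\)'
NESTED = re.compile(rf'^\s*({TERM})\s+(\S+)\s+\((.*)\)\s*$')
SIMPLE = re.compile(rf'^\s*({TERM})\s+(\S+)\s+({TERM})\s*$')


def parse_statement(text: str) -> Triple:
    """Parse 'subject relation object' where object may be a parenthesized statement."""
    match = NESTED.match(text)
    if match:
        subject, relation, inner = match.groups()
        return Triple(subject, relation, parse_statement(inner))
    match = SIMPLE.match(text)
    if match:
        subject, relation, obj = match.groups()
        return Triple(subject, relation, obj)
    raise ValueError(f'cannot parse statement: {text!r}')
```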

Export to SBML

How can data be interchanged between BEL and other systems biology formats, like SBML?

Check INDRA, NDEx, CausalBioNet, and other data sources to see how they're interchanging data.

Improve default namespaces

In pybel.manager.defaults, include a much more thorough list of namespaces using all 3 Selventa releases, and consider using some of the higher-end BELIEF namespaces (consult the AD and PD AETIONOMY BEL models for what they use)

Use these:

default_namespaces = [
    # 1.0 Release
    'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u133-plus2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u133ab.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-hg-u95av2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-mg-u74abc.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-moe430ab.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-mouse430-2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-mouse430a-2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-rae230ab-2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/affy-rat230-2.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/chebi-ids.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/chebi-names.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/go-biological-processes-accession-numbers.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/go-biological-processes-names.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/go-cellular-component-accession-numbers.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/go-cellular-component-terms.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/hgnc-approved-symbols.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/mesh-biological-processes.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/mesh-cellular-locations.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/mesh-diseases.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/mgi-approved-symbols.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/rgd-approved-symbols.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-legacy-chemical-names.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-legacy-diseases.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-human-complexes.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-human-protein-families.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-mouse-complexes.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-mouse-protein-families.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-rat-complexes.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/selventa-named-rat-protein-families.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/swissprot-accession-numbers.belns',
    'http://resources.openbel.org/belframework/1.0/namespace/swissprot-entry-names.belns',

    # 2013 Release
    'http://resources.openbel.org/belframework/20131211/namespace/affy-probeset-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/chebi-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/chebi.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/disease-ontology-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/disease-ontology.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/entrez-gene-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/go-biological-process-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/go-biological-process.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/go-cellular-component-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/go-cellular-component.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/hgnc-human-genes.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/mesh-cellular-structures.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/mesh-diseases.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/mesh-processes.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/mgi-mouse-genes.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/rgd-rat-genes.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/selventa-legacy-chemicals.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/selventa-legacy-diseases.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/selventa-named-complexes.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/selventa-protein-families.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/swissprot-ids.belns',
    'http://resources.openbel.org/belframework/20131211/namespace/swissprot.belns',

    # 2015 Release
    'http://resource.belframework.org/belframework/20150611/namespace/affy-probeset-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/chebi-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/chebi.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/disease-ontology-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/disease-ontology.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/entrez-gene-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/go-biological-process-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/go-biological-process.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/go-cellular-component-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/go-cellular-component.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/hgnc-human-genes.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-cellular-structures-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-cellular-structures.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-chemicals-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-chemicals.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-diseases-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-diseases.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-processes-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mesh-processes.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/mgi-mouse-genes.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/rgd-rat-genes.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/selventa-legacy-chemicals.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/selventa-legacy-diseases.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/selventa-named-complexes.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/selventa-protein-families.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/swissprot-ids.belns',
    'http://resource.belframework.org/belframework/20150611/namespace/swissprot.belns'
]

default_annotations = [
    # 1.0  Release
    'http://resource.belframework.org/belframework/1.0/annotation/atcc-cell-line.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-body-region.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-cardiovascular-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-cell-structure.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-cell.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-digestive-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-disease.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-embryonic-structure.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-endocrine-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-fluid-and-secretion.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-hemic-and-immune-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-integumentary-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-musculoskeletal-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-nervous-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-respiratory-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-sense-organ.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-stomatognathic-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-tissue.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/mesh-urogenital-system.belanno',
    'http://resource.belframework.org/belframework/1.0/annotation/species-taxonomy-id.belanno',

    # 2013 Release
    'http://resource.belframework.org/belframework/20131211/annotation/anatomy.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/cell-line.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/cell-structure.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/cell.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/disease.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/mesh-anatomy.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/mesh-diseases.belanno',
    'http://resource.belframework.org/belframework/20131211/annotation/species-taxonomy-id.belanno',

    # 2015 Release
    'http://resource.belframework.org/belframework/20150611/annotation/anatomy.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/cell-line.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/cell-structure.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/cell.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/disease.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/mesh-anatomy.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/mesh-diseases.belanno',
    'http://resource.belframework.org/belframework/20150611/annotation/species-taxonomy-id.belanno'
]

Export to BEL

After loading a BEL document into a graph, output it again as BEL statements. Serialize:

  • variants
  • nodes
  • edges
  • document metadata
  • definitions
  • annotations

These statements should be canonical BEL 2.0, organized by:

  1. Citation
  2. Evidence
  3. Species
  4. Alphabetical order of remaining annotations
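For the definitions and document metadata, here is a minimal sketch of the header that would precede the statements. It uses the BEL script forms `SET DOCUMENT`, `DEFINE NAMESPACE ... AS URL`, and `DEFINE ANNOTATION ... AS URL`. It assumes all definitions are URL-based and that values contain no double quotes, which would otherwise need escaping. `header_lines` is a hypothetical helper.

```python
from typing import Dict, List


def header_lines(document: Dict[str, str], namespaces: Dict[str, str], annotations: Dict[str, str]) -> List[str]:
    """Serialize document metadata and URL definitions as the opening lines of a BEL script."""
    lines = [f'SET DOCUMENT {key} = "{value}"' for key, value in sorted(document.items())]
    lines.append('')
    lines.extend(f'DEFINE NAMESPACE {keyword} AS URL "{url}"' for keyword, url in sorted(namespaces.items()))
    lines.append('')
    lines.extend(f'DEFINE ANNOTATION {keyword} AS URL "{url}"' for keyword, url in sorted(annotations.items()))
    return lines
```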

Windows compatibility

Windows may not be the OS of choice for programmers, but for other scientists and students it is often the go-to OS. There is a problem in requests_file with local files on a Windows machine: it cannot handle the Windows os.sep because the URL decoding is implemented with this statement:

path_parts = [unquote(p) for p in url_parts.path.split('/')]

Maybe this is not the most urgent 'bug' to handle. But it is a problem that we should take care of.
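Here is a standard-library sketch of converting between Windows paths and `file://` URLs in both directions, so that separator handling does not depend on `os.sep`. It is not a patch to requests_file. `windows_path_to_file_url` and `file_url_to_windows_path` are hypothetical helpers. They assume absolute drive-letter paths; UNC paths such as `\\server\share` are not handled.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, unquote, urlparse


def windows_path_to_file_url(path: str) -> str:
    """Turn an absolute drive-letter path into a file:/// URL with forward slashes."""
    posix = PureWindowsPath(path).as_posix()  # backslashes become forward slashes
    return 'file:///' + quote(posix, safe='/:')


def file_url_to_windows_path(url: str) -> str:
    """Inverse of the above: decode the URL path, then let PureWindowsPath pick the separators."""
    path = unquote(urlparse(url).path).lstrip('/')  # '/C:/a/b%20c.bel' -> 'C:/a/b c.bel'
    return str(PureWindowsPath(path))


# windows_path_to_file_url(r'C:\Users\me\my graph.bel') returns 'file:///C:/Users/me/my%20graph.bel'
# file_url_to_windows_path('file:///C:/Users/me/my%20graph.bel') returns 'C:\\Users\\me\\my graph.bel'
```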

Multiple annotations

Hello guys,
I am trying to add my thesis to PyBEL and found that only one annotation is saved to each edge, even though there are more than one.
Could you please check that out?

Thanks,
Daniel

Semantic constraints on SET DOCUMENT

While it's not mentioned in the BEL 2.0 Specification, the BEL 1.0 Specification item 5.1.3 states that only the following metadata may be annotated:

  • Authors
  • ContactInfo
  • Copyright
  • Description
  • Disclaimer
  • Licenses
  • Name
  • Version
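Here is a minimal sketch of enforcing this list while parsing. It assumes `SET DOCUMENT Key = "value"` lines with case-sensitive keys. `parse_set_document` and `InvalidMetadataKey` are hypothetical names.

```python
import re
from typing import Tuple

# Keys permitted by BEL 1.0 Specification item 5.1.3, as listed above
ALLOWED_DOCUMENT_KEYS = frozenset({
    'Authors', 'ContactInfo', 'Copyright', 'Description',
    'Disclaimer', 'Licenses', 'Name', 'Version',
})

SET_DOCUMENT = re.compile(r'^SET\s+DOCUMENT\s+(\w+)\s*=\s*"(.*)"\s*$')


class InvalidMetadataKey(ValueError):
    """Raised when SET DOCUMENT uses a key outside the allowed list."""


def parse_set_document(line: str) -> Tuple[str, str]:
    """Return (key, value) for a SET DOCUMENT line, rejecting disallowed keys."""
    match = SET_DOCUMENT.match(line.strip())
    if match is None:
        raise ValueError(f'not a SET DOCUMENT statement: {line!r}')
    key, value = match.groups()
    if key not in ALLOWED_DOCUMENT_KEYS:
        raise InvalidMetadataKey(f'{key!r} is not an allowed document metadata key')
    return key, value
```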

Graph Data Management

This is the monolithic to-do for the data manager portion of the project. It consists of translating the Django models used in the AETIONOMY Knowledgebase to SQLAlchemy.

For the first try, do NOT focus on speed. We need a working implementation first before optimizing. This should all be done in beautiful, idiomatic python, and take full advantage of the SQLAlchemy ORM. This also means that pandas is unacceptable.

Production ready

  • Add model for network metadata. Include the pickled graph. (pybel.manager.models.Network)
  • Input and output of network object in Graph Cache Manager (pybel.manager.graph_cache.GraphCacheManager)

Still needs reinvestigation

  • Add models to pybel.manager
  • Implement pybel.to_database(belgraph, conn_str)
  • Implement pybel.from_database(conn_str)

Reinvestigate all code:

  • Add docs to all models
  • Add ORM information, like relationship() to automatically make joins
  • Use proper session querying, like session.query(models.Namespace).filter(models.Namespace.keyword == 'HGNC').one() and transaction management at the session level

Namespace Cache Manager

This database should be importable to Python as a dictionary of {namespace: {name: (canonical_namespace, canonical_name)}}

To Do:

  • Write script to import and populate data from scratch (HGNC website and other sources)
  • Decide on database schema
  • Build sqlalchemy models

Ultimately, the API should look like:

>>> import sqlite3
>>> import pybel
>>> conn = sqlite3.connect(":memory:")
>>> mapping = pybel.nsdb.load_mapping_db(conn)

Or, when using sqlalchemy:

>>> from sqlalchemy import create_engine
>>> eng = create_engine("sqlite://")
>>> mapping = pybel.nsdb.load_mapping_db(eng)
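Here is a minimal sketch of `load_mapping_db` for the `sqlite3` case. It assumes a hypothetical table `name_mapping(namespace, name, canonical_namespace, canonical_name)`. The schema is still undecided in this issue, so both the table and column names are placeholders.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Tuple

NameMapping = Dict[str, Dict[str, Tuple[str, str]]]


def load_mapping_db(conn: sqlite3.Connection) -> NameMapping:
    """Read {namespace: {name: (canonical_namespace, canonical_name)}} from a mapping table."""
    mapping: Dict[str, Dict[str, Tuple[str, str]]] = defaultdict(dict)
    # Hypothetical table and column names; the real schema is still to be decided
    rows = conn.execute(
        'SELECT namespace, name, canonical_namespace, canonical_name FROM name_mapping'
    )
    for namespace, name, canonical_namespace, canonical_name in rows:
        mapping[namespace][name] = (canonical_namespace, canonical_name)
    return dict(mapping)  # plain dict, so unknown namespaces raise KeyError
```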

Node names in graph

Inside the networkx.MultiDiGraph data structure, is it reasonable to assign bizarre names to nodes based on their type/namespace/name, or is it better to just keep an ID?

Implement:

  1. On handling, canonicalize ParseElement_ to tuples using ParseElement_.asList and pybel.parser.utils.list2tuple and come up with a reasonable sorting method for nodes containing lists, like composite, complex, and for lists of modifiers for protein, gene, and RNA variants
  2. Maintain a two-way dictionary of {node tuple: node id} for quick lookup of graph membership
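Here is a minimal sketch of both steps. It assumes nodes arrive as nested lists, for example from pyparsing's `ParseResults.asList()`. The `list2tuple` below is a re-implementation for illustration, not the one in `pybel.parser.utils`. `canonical_complex` and `NodeIndex` are hypothetical, and sorting only works when members are mutually comparable.

```python
from typing import Any, Dict, Hashable, List, Tuple


def list2tuple(value: Any) -> Any:
    """Recursively turn nested lists into tuples so parsed nodes become hashable."""
    if isinstance(value, list):
        return tuple(list2tuple(item) for item in value)
    return value


def canonical_complex(members: List[Any]) -> Tuple:
    """Sort complex members so the same complex always yields the same tuple."""
    return ('Complex',) + tuple(sorted(list2tuple(member) for member in members))


class NodeIndex:
    """Two-way lookup: {node tuple: node id} plus a list of node tuples indexed by id."""

    def __init__(self) -> None:
        self.tuple_to_id: Dict[Hashable, int] = {}
        self.id_to_tuple: List[Hashable] = []

    def add(self, node: Hashable) -> int:
        """Return the node's id, assigning the next integer if it is new."""
        if node not in self.tuple_to_id:
            self.tuple_to_id[node] = len(self.id_to_tuple)
            self.id_to_tuple.append(node)
        return self.tuple_to_id[node]
```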

Explain Resolving Protein Complex Names

Make a Jupyter Notebook describing where to get named protein complex information from the BEL Framework. Parse this information and use it to post-process a graph. Later, this code can be written as a post-processing tool in either pybel-core or pybel-tools

Same issue as pybel/pybel-notebooks#3

Later:

  • Identify external databases listing named protein complexes and their members
  • Write scripts to automatically download and parse that information
  • Add to hierarchy database #25

Semantic Checks on Namespace Usage

BEL namespace files specify in which abundance functions namespaces can be used. Add a semantic validator (possibly after compilation) for this.
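Here is a minimal sketch. It assumes the usual single-letter encodings on `.belns` value lines (`name|encoding`):

- A: abundance
- B: biological process
- C: complex
- G: gene
- M: microRNA
- O: pathology
- P: protein
- R: RNA

It reads each letter one-to-one; a real validator might treat some letters (such as A) more permissively. The names are hypothetical.

```python
from typing import Dict, Set

# Assumed meaning of each encoding letter, with short and long BEL function names
ENCODING_TO_FUNCTIONS: Dict[str, Set[str]] = {
    'A': {'a', 'abundance'},
    'B': {'bp', 'biologicalProcess'},
    'C': {'complex', 'complexAbundance'},
    'G': {'g', 'geneAbundance'},
    'M': {'m', 'microRNAAbundance'},
    'O': {'path', 'pathology'},
    'P': {'p', 'proteinAbundance'},
    'R': {'r', 'rnaAbundance'},
}


def allowed_functions(encoding: str) -> Set[str]:
    """Union of the functions permitted by every letter in an encoding string like 'GRP'."""
    allowed: Set[str] = set()
    for letter in encoding:
        allowed |= ENCODING_TO_FUNCTIONS.get(letter, set())
    return allowed


def check_usage(function: str, name: str, values: Dict[str, str]) -> bool:
    """Return True if ``name`` may appear inside ``function`` given the namespace's encodings."""
    if name not in values:
        raise KeyError(f'{name!r} is not defined in this namespace')
    return function in allowed_functions(values[name])
```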

Parser Performance Improvements

Use MatchFirst in all lists, and streamline the individual items upon building like self.protein and eventually self.bel_term within pybel.parsers.bel_parser.py

Finally, do memory checks, then hopefully make it possible for Travis to run the whole test suite

Implementation of Neo4j Export

  • How to mangle dictionaries?
  • How to handle py2neo graphs?
  • Add to cli.py

Consider the AETIONOMY API for neo2django, but not too hard. This should be sensical in its own right, and neo2django might be overdue for changes as well.
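On mangling dictionaries: Neo4j node and relationship properties must be primitives or arrays of primitives, not nested maps. Nested edge data (citations, annotations) therefore has to be flattened first. Below is a minimal sketch with a hypothetical `flatten` helper. An underscore separator is ambiguous when keys already contain underscores, so the separator is a parameter.

```python
from typing import Any, Dict


def flatten(data: Dict[str, Any], parent: str = '', sep: str = '_') -> Dict[str, Any]:
    """Flatten nested dicts into one level so they can be stored as Neo4j properties."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        full_key = f'{parent}{sep}{key}' if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```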
