el-grafo's Introduction

Bio4j bioinformatics graph data platform

Bio4j is a bioinformatics graph data platform, integrating most data available in UniProt KB (Swiss-Prot + TrEMBL), Gene Ontology (GO), UniRef (50, 90, 100), NCBI Taxonomy, and ExPASy Enzyme DB.

Bio4j provides a new and powerful framework for querying and managing protein-related information. A graph-based data model makes it possible to store and query data in a way that semantically represents its own structure. Traditional relational models and databases, by contrast, must flatten the data they represent into tables and create artificial ids to connect the different tuples, which can in some cases lead to domain models that have almost nothing to do with the actual structure of the data.

Project structure and overview

Bio4j can look a bit intimidating at first, with all those similarly named repositories; here is a guided tour:

bio4j/bio4j

In the bio4j/bio4j repository you will find the generic Bio4j model and API. Entities, relationships and their properties are modeled using a typed property graph model. For example, there are vertex types for Protein or GoTerm, and a GoAnnotation edge type going from Protein to GoTerm. This graph schema is separated into different graphs, corresponding to the different data sources (UniProt, Go, UniRef, ...) and the connections between them (UniProtGo, UniProtUniRef, ...).

The API, based on bio4j/angulillos, lets you write generic typed traversals over this graph schema:

protein.uniref50Member_outV()
  .map(UniRef50Cluster::uniRef50Member_inV)
  .map(prts -> prts.map(Protein::goAnnotation_outV));

which can later be executed on a particular backend. Generic data import code is also here, which can be used to load the data using any implementation of angulillos.

bio4j/angulillos

You can think of bio4j/angulillos as a strongly typed version of the property graph model. You can describe graph schemas and write generic traversals over them which are guaranteed to be well-typed, in the sense that, for example,

  • you cannot retrieve the outgoing edges of an edge
  • and you can get the tweets that a user tweeted, but not the users that a tweet follows!
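
The idea behind those two bullets can be sketched in plain Java generics. This is only an illustration under stated assumptions, not the angulillos API: every name here (`TypedGraphSketch`, `Vertex`, `Edge`, `User`, `Tweet`, `Tweeted`) is hypothetical and chosen to mirror the tweet example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the angulillos API.
final class TypedGraphSketch {
  interface Vertex {}

  // An edge type fixes its source and target vertex types.
  abstract static class Edge<S extends Vertex, T extends Vertex> {
    final S source;
    final T target;
    Edge(S source, T target) { this.source = source; this.target = target; }
  }

  static final class Tweet implements Vertex {
    final String text;
    Tweet(String text) { this.text = text; }
  }

  static final class Tweeted extends Edge<User, Tweet> {
    Tweeted(User source, Tweet target) { super(source, target); }
  }

  static final class User implements Vertex {
    final String name;
    private final List<Tweeted> tweeted = new ArrayList<>();
    User(String name) { this.name = name; }

    // Adds a Tweeted edge from this user to the given tweet.
    Tweeted tweet(Tweet tweet) {
      Tweeted edge = new Tweeted(this, tweet);
      tweeted.add(edge);
      return edge;
    }

    // Tweeted edges can only end in tweets, so the result is List<Tweet>.
    List<Tweet> tweets() {
      List<Tweet> result = new ArrayList<>();
      for (Tweeted edge : tweeted) result.add(edge.target);
      return result;
    }
  }

  public static void main(String[] args) {
    User user = new User("ada");
    user.tweet(new Tweet("hello graphs"));
    for (Tweet t : user.tweets()) System.out.println(t.text);
    // Neither Edge nor Tweet declares an outgoing-edge method, so "the outgoing
    // edges of an edge" or "the users a tweet follows" would not compile.
  }
}
```

In this sketch the ill-typed traversals are ruled out simply because the corresponding methods do not exist on those types; a real library would likely encode the schema more generically.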

bio4j/bio4j-titan

In bio4j/bio4j-titan you will find a Titan-based Bio4j distribution. This is the default distribution, and we also provide the database binaries, with all data already loaded, through AWS S3. Go there if you want to stop reading and use Bio4j now!

bio4j/angulillos-titan

bio4j/angulillos-titan is an implementation of the angulillos API using Titan.

Documentation

Community and contact

Licensing

Bio4j is an open source platform released under the AGPLv3 license.

el-grafo's People

Contributors

carmen-tm, eparejatobes, laughedelic


el-grafo's Issues

create a sensible project layout

Including

  • examples
  • resources, where you should put things as libs etc
  • separated, completely independent folders for each example/subproject
  • all docs should be in docs
  • ...

look at D3-based libs for graph data manipulation and representation

It would be good to settle on the set of tools and libs we plan to build on before coding starts. One aspect is working with graph-like data structures client-side. Here's a list of interesting libs. @carmen-ooo, I want you to write a draft doc on how and for what these (and/or other) libs will be used in our project.

graph data libs

  • cpettitt/graphlib: this looks like a really nice lib, with implementations of some graph algos that could be really useful for us: Tarjan (strongly connected components), topological sorting, shortest paths, etc.

graph rendering

Representation structure

Appearance, features and interactions, based on a review of the initial material from my submitted GSoC proposal.

A. Intro Stage

Presentation. First approach to the Bio4j Database Representation and main aspects

graphicalrepresentation-introstage sketch

Elements proposed:

  • General info about the project: the Bio4j Database, the Visualization project, description, what it's for, credits...
  • Modules Graphical representation.
    • Textual description of each Module (On hover)
    • Selection of the Modules that will be loaded to the Network Visualization (On click)
  • Dependencies between Modules (open questions on this topic in issue #6)
    • Visual representation within the Modules graphical scheme.
    • Textual description of each dependency.
  • "Access to the Network" button -> The network will be loaded with the selected Modules (the selection could be changed afterwards).

(Optional):

  • Numbers section:
    • Updated section with all numbers re. the project (Nº of nodes, Nº of edges, Nº of properties..), last revision date, etc.
    • Small histogram diagram showing the database update activity history in general, by modules, etc.

B. Main Network Stage

Basic Network exploration through different views
The goal is to represent the domain model network, allowing both a view of the whole picture and a more detailed exploration of a portion of the dataset, as in this example, with fluid transitions and path-drawing actions while exploring the nodes and defining a certain path for a later Bio4j request.

graphicalrepresentation-mainnetworkstage sketch

Elements proposed:

  • FULL NETWORK VIEW

    • Modules loaded diagram as schematic guide: it will highlight the modules loaded on the network (selected in the previous Intro Stage) and will allow adding or removing any desired module in the Network view.
    • Network Visualization of all selected/loaded Modules.
      Two ways of exploring the graph are proposed (open questions on this topic in issue #7)
      • Step by Step -> Path drawing case. The user defines the route while exploring the network, clicking on nodes one after another. Contextual Menu options will allow performing specific graph actions (filtering, neighbors, etc.), as in this dagre+d3 example, to support the exploration.
      • Target defined -> Path finding case. The user already knows what information they are looking for and what it is called. They type it into a Search element (_ _ _ _), as in this case, and the matching nodes/edges are highlighted on the network. Selecting one opens a Contextual Menu that again allows specific graph actions, plus the shortest path to the root and maybe more path alternatives.
    • In both cases, a Highlighted Pathway will stand out on the network (either by selecting nodes step by step or by choosing a path with a defined target). It will be updated as a tree/path diagram on the Route Map, corresponding to it node by node.
    • Re. the graphical aspect of the network, it will be simple, not showing detailed info about the network, as that will be shown in the Detailed Network View next to it. Its purpose is to give the whole picture/context and a path drawing of the route. Maybe no labels on it, since they appear in the Detailed View? To be defined. Re. colours, always by module, persistent across all views.
    • Minimap on top of the network, slightly highlighting the partial set of nodes and edges loaded in the Detailed Zoom view.

    js graph layout and rendering:

    • Not sure. Something simple such as the native d3 force layout or its alternative cola.js could work to represent the network as a whole, but I am not sure whether it is a good idea, code-wise, for this view to use a different layout from the zoomed view, or whether both views should keep the same dagre layout.

    d3-based possible components:

    • d3.behavior: zoom, drag...
    • js events: onClick, onMouseOver, onMouseOut... to allow interacting with the diagram
  • ZOOMED NETWORK VIEW

    • Threshold/Degree of the fragment: Nº of steps/links to be loaded as fragment.
    • Partial Network Visualization of the set of nodes/links specified with the threshold, starting from a selected node.
    • In/Out switch option, to see the part of the network reached forwards from the node or backwards into it. When a node on the route is selected (by clicking here or on the full network view), the focus changes to it.
    • Legend of the graphical aspects of the network represented (arity, always defined, etc). (See here)
    • Contextual Menu options will allow performing specific graph actions (filtering, neighbors, etc.), as in this dagre+d3 example, to support the exploration. NOTE: Not sure if this should happen in both full/zoomed views or just in this one.
    • (Optional): whatever other elements are useful to improve the user's navigation of the network: spring tension, etc.

    js graph algorithms:

    • graphlib Digraph functions: inEdges, outEdges, filterNodes... to returns
    • graphlib alg modules: alg.topsort, alg.tarjan, alg.dijkstra... to sort nodes, find groups of strongly connected components, find the shortest path, etc., for all the Contextual Menu options allowed.

    js graph layout:

    graph rendering: d3 possible components + dagre-d3 lib:

    • d3 selection.data: enter, update, exit... with [transitions](https://github.com/mbostock/d3/wiki/Transitions) to update the partial network as the user interacts with it.
    • d3.behavior: zoom, drag...
    • js events: onClick, onMouseOver, onMouseOut... to allow interacting with the diagram.
  • ROUTE MAP VIEW

    • Protein node as root of the diagram.
    • A small tree path will grow from the root. Each branch will correspond to a different Module.
    • Contextual Menu: allows removing steps and going backwards.
    • Important info accompanying the diagram:
      • Indexes/typing information listed for the route covered (above each step on the diagram)
      • Properties of nodes & edges (below each node and each edge of the diagram)
      • "Copy route" button for the Indexes list, for easy copy/paste into the Bio4j platform later (lateral position)
    • (Optional): When hovering a node/link, it will be highlighted on both the Full Network and the Zoomed Network

    d3-based possible components to be used:

    • d3.layout.tree() to give it the basic hierarchy structure (root, parent...)
    • selection.data: enter, update, exit... with transitions to update the diagram as the user's network selections change.
    • js events: onClick, onMouseOver, onMouseOut... to allow interacting with the diagram
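
For the Zoomed Network View described above, the threshold and the In/Out switch could be read as a bounded breadth-first walk from the selected node. The following Java sketch shows that reading only; `FragmentSketch`, `fragment` and the toy node names are hypothetical, and it does not use graphlib, d3 or the Bio4j API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of graphlib, d3 or Bio4j.
final class FragmentSketch {
  // Returns every node reachable from `start` in at most `threshold` steps.
  // `edges` maps a node to its neighbours in the chosen direction: pass the
  // out-adjacency for "Out" and the in-adjacency for "In".
  static Set<String> fragment(Map<String, List<String>> edges, String start, int threshold) {
    Map<String, Integer> depth = new LinkedHashMap<>();
    Deque<String> queue = new ArrayDeque<>();
    depth.put(start, 0);
    queue.add(start);
    while (!queue.isEmpty()) {
      String node = queue.remove();
      int d = depth.get(node);
      if (d >= threshold) continue; // do not expand past the threshold
      for (String next : edges.getOrDefault(node, List.of())) {
        if (!depth.containsKey(next)) {
          depth.put(next, d + 1);
          queue.add(next);
        }
      }
    }
    return new LinkedHashSet<>(depth.keySet());
  }

  public static void main(String[] args) {
    // Toy out-adjacency; the node names are made up for the example.
    Map<String, List<String>> out = Map.of(
        "Protein", List.of("GoTerm", "Enzyme"),
        "GoTerm", List.of("SubOntology"));
    System.out.println(fragment(out, "Protein", 1)); // [Protein, GoTerm, Enzyme]
  }
}
```

Switching between In and Out then only means passing a different adjacency map; the walk itself stays the same.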

sample json for the GO model

We need a complete JSON sample whose format is exactly what the model service is supposed to return.

Initial material of the project

Compilation of useful stuff for the project from its beginning.

  1. GSoC project initial info:
  2. Bio4j database:
  3. About Networks in general
  4. About graph data-structures formats:
  5. d3 Useful resources:
  6. Other similar cases:

Modules Dependencies

About Uniprot KB, GO, UniRef, RefSeq, NCBI Taxonomy and Expasy Enzyme DB Modules.

Afaik modules aren't totally independent of each other. In the Domain Model picture it seems they are all connected through the central protein node and otherwise stay independent of each other, but they aren't.
DomainModelWithDataSourceView

Scheme showing the important dependencies among the modules from Bio4j page

moduledependencies

Question: What are these dependencies like? I mean, are they specific connections between particular nodes from different modules? Or are they more generic, at a global level?

I just need a basic understanding in order to express this visually on the Intro Stage: dependency info to give a first glance at the database (which modules are included, what the main relationships between them are, if relevant, etc.) in a visual and descriptive way.

InitialStage

midterm evaluation

Hi @carmen-ooo

midterm evaluations are already here :)

Some relevant links are here:

I'd like you to have all this sorted out by tomorrow, 2014-06-25. We will proceed as follows:

  1. your mentors will fill out the evaluation today
  2. you should do the same
  3. today we will have a meeting to discuss them

today at 15:00h?

Ways of Exploring the Network

Open questions about the ways of exploring the Full Network, in order to support them on the Network Stage (Full and Partial Networks).
How would users expect to explore the network, so that we can give them the most useful interface? My take:
waysexploringnetwork sketch

  • Step by Step -> Path drawing case. The user defines the route while exploring the network, clicking on nodes one after another. Contextual Menu options will allow performing specific graph actions (filtering, neighbors, etc.), as in this dagre+d3 example, to support the exploration. That will define a highlighted pathway on the network that will be updated as a tree map/path diagram on the Route Map.
  • Target defined -> Path finding case. The user already knows what information they are looking for and what it is called. They type it into a Search element (_ _ _ _) and the matching nodes/edges are highlighted on the network. Selecting one opens a Contextual Menu that again allows specific graph actions, plus the shortest path to the root and maybe more path alternatives. Again, when clicking/accepting, the network will show a highlighted path drawing that will be updated as a tree map diagram on the Route Map.

Are those 2 cases right? Any other thoughts to keep in mind about what the users would expect or how they would like to interact with the network? Any other case?

(Optional) Making those 2 ways complementary: allowing the user to mix a manual selection (step-by-step case) with finding another node through the text search element (target-defined case), and showing short paths connecting the selections.
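
As a rough illustration of the target-defined case, the "shortest pathway to the root" can be computed with a plain breadth-first search when every link counts as one step. This Java sketch assumes unweighted links and uses hypothetical names (`PathSketch`, `shortestPath`, toy node labels); it is not graphlib's alg.dijkstra, which would be the choice for weighted edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; plain BFS over an adjacency map.
final class PathSketch {
  // Fewest-hops path from `root` to `target`, or an empty list if none exists.
  static List<String> shortestPath(Map<String, List<String>> adjacency, String root, String target) {
    Map<String, String> parent = new HashMap<>(); // node -> node it was reached from
    Deque<String> queue = new ArrayDeque<>();
    parent.put(root, null);
    queue.add(root);
    while (!queue.isEmpty()) {
      String node = queue.remove();
      if (node.equals(target)) {
        // Walk the parent links back to the root, then reverse.
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = parent.get(n)) path.add(n);
        Collections.reverse(path);
        return path;
      }
      for (String next : adjacency.getOrDefault(node, List.of())) {
        if (!parent.containsKey(next)) {
          parent.put(next, node);
          queue.add(next);
        }
      }
    }
    return List.of();
  }

  public static void main(String[] args) {
    // Toy graph; the node names are made up for the example.
    Map<String, List<String>> g = Map.of(
        "Protein", List.of("A", "B"),
        "A", List.of("C"),
        "B", List.of("Target"),
        "C", List.of("Target"));
    System.out.println(shortestPath(g, "Protein", "Target")); // [Protein, B, Target]
  }
}
```

The returned node list is exactly what the Route Map would need in order to draw the highlighted path step by step.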
