networkexpansionpy's Introduction

networkExpansionPy

A Python package for constructing biosphere-level metabolic networks and running network expansion algorithms. It also contains functions to prune biochemical reactions based on thermodynamic constraints.

The package is built from algorithms described in the following papers:

Goldford J.E. et al., Remnants of an ancient metabolism without phosphate. Cell 168, 1–9, March 9, 2017

Goldford J.E. et al., Environmental boundary conditions for the origin of life converge to an organo-sulfur metabolism. Nature Ecol Evol 3(12), 1715–1724, November 11, 2019
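
For orientation, the basic idea of network expansion can be sketched in a few lines of plain Python: starting from a seed set of compounds, repeatedly add every reaction whose substrates are all available, together with its products, until nothing changes. This is only an illustration. The names toy_expand and toy_reactions are hypothetical and are not part of this package's API, which works on stoichiometric matrices instead of Python sets.

```python
def toy_expand(reactions, seed):
    """Expand a seed set of compounds over a reaction network.

    reactions: dict mapping a reaction id to a (substrates, products) pair of sets.
    seed: iterable of compound ids available at the start.
    Returns the set of reachable compounds and the set of reactions that fired.
    """
    compounds = set(seed)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rn, (substrates, products) in reactions.items():
            # A reaction becomes feasible once all of its substrates are present.
            if rn not in fired and set(substrates) <= compounds:
                fired.add(rn)
                compounds |= set(products)
                changed = True
    return compounds, fired


# Hypothetical toy network: r3 can never fire because E is not reachable.
toy_reactions = {
    "r1": ({"A", "B"}, {"C"}),
    "r2": ({"C"}, {"D"}),
    "r3": ({"E"}, {"F"}),
}
compounds, fired = toy_expand(toy_reactions, {"A", "B"})
```

Here r1 fires first, which makes C available and lets r2 fire in the next pass.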

Installation

In a conda or virtual environment, clone the git repository and install it with pip:

git clone https://github.com/jgoldford/networkExpansionPy.git
cd networkExpansionPy
pip install -e .

networkexpansionpy's People

Contributors

hbsmith, jgoldford

networkexpansionpy's Issues

Providing built-in datasets

This includes allowing a user to easily query different versions of KEGG, or possibly other data (such as ATLAS, or subsets of reactions corresponding to different taxa or omics data).

Network expansion visualizations

It would be useful to be able to produce clear figures of network expansion (network graphs) and other associated data, easily and reliably. A default networkx visualization is easy to make, but with default parameters it rarely provides much intuition. Standard options, such as excluding highly connected nodes like H2O or only connecting nodes from adjacent generations, would be a good start. Plots made in seaborn or plotly would also be nice.
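
As a rough sketch of what those two options could look like, the helper below filters an edge list before it is handed to a plotting library. The function filter_edges, its parameters, and the toy data are all hypothetical; nothing here exists in the package yet.

```python
from collections import Counter


def filter_edges(edges, generation, max_degree=None, adjacent_only=False):
    """Filter (u, v) edges before plotting.

    edges: iterable of (u, v) node pairs.
    generation: dict mapping each node to the expansion step where it appeared.
    max_degree: if given, drop edges touching any node with a higher degree.
    adjacent_only: if True, keep only edges whose endpoints are one generation apart.
    """
    edges = list(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    kept = []
    for u, v in edges:
        if max_degree is not None and max(degree[u], degree[v]) > max_degree:
            continue  # touches a hub such as H2O
        if adjacent_only and abs(generation[u] - generation[v]) != 1:
            continue  # endpoints are not in neighbouring generations
        kept.append((u, v))
    return kept


# Hypothetical toy data: H2O is the only node with degree above 3.
edges = [("H2O", "A"), ("H2O", "B"), ("H2O", "C"), ("H2O", "D"),
         ("A", "B"), ("B", "C"), ("A", "D")]
generation = {"H2O": 0, "A": 0, "B": 1, "C": 2, "D": 2}
kept = filter_edges(edges, generation, max_degree=3, adjacent_only=True)
```

The filtered list can then be passed to a graph constructor such as networkx.Graph(kept) for drawing.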

Documentation

Add docstrings (and comments if needed) to all functions

Parallelization

Provide built-ins that make multi-threading or parallel processing easy.
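
One possible shape for such a built-in is a thin wrapper around the standard library's concurrent.futures that runs one expansion per seed set. This is a sketch under assumptions: expand_many and toy_expand are hypothetical names, and toy_expand merely stands in for a real expansion call.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_many(expand_fn, seed_sets, max_workers=4):
    """Apply expand_fn to every seed set concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(expand_fn, seed_sets))


def toy_expand(seed):
    """Stand-in for a real expansion: A and B together yield C."""
    seed = set(seed)
    if {"A", "B"} <= seed:
        seed.add("C")
    return sorted(seed)


results = expand_many(toy_expand, [{"A", "B"}, {"A"}])
```

Threads only help when the underlying work releases the GIL. For CPU-bound pure-Python work, concurrent.futures.ProcessPoolExecutor offers the same map interface, provided the function and its arguments can be pickled.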

pivot_table and fillna are slow

It looks like the slowest parts of calling expand are:

  1. Creating the pivot table
  2. Filling the table with 0s
  3. Sparsifying the data
  4. Building R, P, and b before sparsifying the data

The pivot/fill step also runs twice on every call to expand: once inside initialize_metabolite_vector and once in the main body of expand. The duplicate is easy to remove and should shave roughly 4 seconds off every call.

I have an idea for how to ditch pivot to make this faster, and am working on a PR.

The other good news is that if we want to save S, R, P, b, etc. and just read them in during runs, that should be possible too.

Timer unit: 1 s

Total time: 12.6907 s
File: /.../networkExpansionPy/networkExpansionPy/lib.py
Function: expand at line 221

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   221                                               def expand(self,seedSet,algorithm='naive'):
   222                                                   # constructre network from skinny table and create matricies for NE algorithm
   223         1          4.3      4.3     34.1          x0 = self.initialize_metabolite_vector(seedSet)
   224         1          4.2      4.2     33.1          network = self.network.pivot_table(index='cid',columns = ['rn','direction'],values='s').fillna(0)
   225         1          0.0      0.0      0.0          S = network.values
   226         1          0.4      0.4      3.5          R = (S < 0)*1
   227         1          0.4      0.4      3.5          P = (S > 0)*1
   228         1          0.9      0.9      7.0          b = sum(R)
   229                                           
   230                                                   # sparsefy data
   231         1          1.1      1.1      8.8          R = csr_matrix(R)
   232         1          1.2      1.2      9.1          P = csr_matrix(P)
   233         1          0.0      0.0      0.0          b = csr_matrix(b)
   234         1          0.0      0.0      0.0          b = b.transpose()
   235                                           
   236         1          0.0      0.0      0.0          x0 = csr_matrix(x0)
   237         1          0.0      0.0      0.0          x0 = x0.transpose()
   238         1          0.0      0.0      0.0          if algorithm.lower() == 'naive':
   239         1          0.0      0.0      0.4              x,y = netExp(R,P,x0,b)
   240                                                   elif algorithm.lower() == 'cr':
   241                                                       x,y = netExp_cr(R,P,x0,b)
   242                                                   else:
   243                                                       raise ValueError('algorithm needs to be naive (compound stopping criteria) or cr (reaction/compound stopping criteria)')
   244                                                   
   245                                                   # convert to list of metabolite ids and reaction ids
   246         1          0.0      0.0      0.0          if x.toarray().sum() > 0:
   247         1          0.0      0.0      0.0              cidx = np.where(x.toarray().T[0])[0]
   248         1          0.0      0.0      0.3              compounds = network.iloc[cidx].index.get_level_values(0).tolist()
   249                                                   else:
   250                                                       compounds = []
   251                                                       
   252         1          0.0      0.0      0.0          if y.toarray().sum() > 0:
   253         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   254         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   255         1          0.0      0.0      0.1              reactions = list(network.iloc[:,ridx])
   256                                                   else:
   257                                                       reactions = [];
   258                                                       
   259         1          0.0      0.0      0.0          return compounds,reactions
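
One way to avoid the dense pivot table altogether is to build the sparse matrices directly from the long-format table by converting compound and reaction labels to integer codes. The sketch below assumes the table has the columns cid, rn, direction, and s seen in the pivot_table call above. The function name sparse_from_skinny and the toy data are hypothetical, and this is not necessarily the approach taken in the PR mentioned earlier. One behavioural difference: pivot_table averages duplicate (cid, rn, direction) entries by default, whereas this version sums them.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def sparse_from_skinny(network):
    """Build sparse R, P and the vector b from a long-format stoichiometry table."""
    # Integer row codes for compounds, in sorted order like the pivot index.
    row, cids = pd.factorize(network["cid"], sort=True)
    # Integer column codes for (rn, direction) pairs, also in sorted order.
    col = network.groupby(["rn", "direction"], sort=True).ngroup().to_numpy()
    rxns = (network[["rn", "direction"]]
            .drop_duplicates()
            .sort_values(["rn", "direction"]))

    shape = (len(cids), len(rxns))
    S = coo_matrix((network["s"].to_numpy(), (row, col)), shape=shape).tocsr()

    R = (S < 0).astype(int)  # 1 where a compound is a substrate
    P = (S > 0).astype(int)  # 1 where a compound is a product
    b = np.asarray(R.sum(axis=0)).ravel()  # substrates needed per reaction
    return R, P, b, list(cids), list(map(tuple, rxns.to_numpy()))


# Hypothetical toy table: r1 is A + B -> C, r2 is C -> D.
toy = pd.DataFrame({
    "cid": ["A", "B", "C", "C", "D"],
    "rn": ["r1", "r1", "r1", "r2", "r2"],
    "direction": ["forward"] * 5,
    "s": [-1, -1, 1, -1, 1],
})
R, P, b, cids, rxns = sparse_from_skinny(toy)
```

Because R and P are already sparse, they could also be written to disk once with scipy.sparse.save_npz and read back with scipy.sparse.load_npz instead of being rebuilt on every run.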

Add ability to download new versions of KEGG

Probably through Biopython's TogoWS module, to grab easily parsable JSON files. This is necessary before any additional annotation (e.g. adding information on reaction balancing or free energies).

Testing

Add basic tests for the core network expansion code and for other functions.
