jgoldford / networkexpansionpy

Metabolic network expansion python package

Python 42.70% Jupyter Notebook 57.30%

networkexpansionpy's Introduction

networkExpansionPy

Python package to construct biosphere-level metabolic networks, and run network expansion algorithms. This package contains functions to prune biochemical reactions based on thermodynamic constraints.

The package is built from algorithms described in the following papers:

Goldford J.E. et al., Remnants of an ancient metabolism without phosphate. Cell 168, 1–9, March 9, 2017

Goldford J.E. et al., Environmental boundary conditions for the origin of life converge to an organo-sulfur metabolism. Nature Ecology & Evolution 3(12), 1715–1724, November 11, 2019

Installation

In a conda or virtual environment, clone the git repo and install the package with pip.

git clone https://github.com/jgoldford/networkExpansionPy.git
cd networkExpansionPy
pip install -e .

networkexpansionpy's Issues

write/read zipped files for fold expansion

Writing and reading compressed files would reduce the size of the files created during fold expansion.

Writing: https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data
Reading (pandas can probably handle this directly): https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
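
A minimal sketch of the idea, using only the standard library (gzip plus pickle). The helper names write_zipped and read_zipped, the file name, and the example result dictionary are hypothetical and are not part of the package.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def write_zipped(obj, path):
    """Pickle `obj` and gzip-compress it into `path`."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def read_zipped(path):
    """Load an object previously written by `write_zipped`."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical fold-expansion result, used only to exercise the helpers.
result = {
    "compounds": ["C00001", "C00002"],
    "reactions": [("R00001", "forward")],
}

out_path = Path(tempfile.mkdtemp()) / "fold_expansion_example.pkl.gz"
write_zipped(result, out_path)
restored = read_zipped(out_path)
```

A file written this way should also be readable with pandas.read_pickle, which infers gzip compression from the .gz extension.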

create nx graph objects from metabolism and expansion

Add a feature to create an nx graph object for a metabolism or an expansion. This could later be used for other analytical functions or for graph visualization.
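
One possible shape for this feature is a directed bipartite graph with compound nodes and reaction nodes. The sketch assumes rows in the (cid, rn, direction, s) layout that the package's pivot_table call uses, with s < 0 for substrates and s > 0 for products. The function name to_bipartite_graph, the node and edge attribute names, the toy rows, and the "forward" direction label are all hypothetical.

```python
import networkx as nx

# Hypothetical rows in the skinny-table layout (cid, rn, direction, s).
rows = [
    ("C00001", "R1", "forward", -1.0),
    ("C00002", "R1", "forward", -1.0),
    ("C00003", "R1", "forward", 1.0),
    ("C00003", "R2", "forward", -1.0),
    ("C00004", "R2", "forward", 2.0),
]


def to_bipartite_graph(rows):
    """Build a directed compound/reaction graph from (cid, rn, direction, s) rows."""
    G = nx.DiGraph()
    for cid, rn, direction, s in rows:
        rxn = (rn, direction)  # one node per directed reaction
        G.add_node(cid, kind="compound")
        G.add_node(rxn, kind="reaction")
        if s < 0:
            # substrate: edge points from the compound into the reaction
            G.add_edge(cid, rxn, stoich=abs(s))
        elif s > 0:
            # product: edge points from the reaction to the compound
            G.add_edge(rxn, cid, stoich=s)
    return G


G = to_bipartite_graph(rows)
```

With this representation, standard networkx queries such as nx.has_path(G, "C00001", "C00004") work on the metabolism directly.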

Providing built-in datasets

This includes allowing a user to easily query different versions of KEGG, or possibly other data (such as ATLAS, or subsets of reactions corresponding to different taxa or omics data).

add .lower() to all string kwargs?

Should we make all string-valued kwargs case insensitive?
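
If the answer is yes, a decorator is one way to do it without touching each function body. This is only a sketch: the decorator name lowercase_str_kwargs is hypothetical, and the expand function here is a stand-in that only validates its option, not the package's method.

```python
import functools


def lowercase_str_kwargs(func):
    """Lower-case every string-valued keyword argument before calling `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        normalized = {
            key: value.lower() if isinstance(value, str) else value
            for key, value in kwargs.items()
        }
        return func(*args, **normalized)
    return wrapper


@lowercase_str_kwargs
def expand(seed_set, algorithm="naive"):
    # Hypothetical stand-in for the real method; it only checks the option.
    if algorithm not in ("naive", "cr"):
        raise ValueError("algorithm needs to be 'naive' or 'cr'")
    return algorithm


chosen = expand([], algorithm="CR")
```

A caveat with a blanket approach: it would also lower-case string kwargs whose case matters, such as file paths or compound IDs, so applying it only to option-like arguments may be safer.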

Network expansion visualizations

It would be useful to be able to reliably and easily produce clear figures of network expansion (network graphs) and other associated data. A default networkx drawing is easy to make, but with default parameters it rarely provides intuition. Standard options, such as excluding highly connected nodes like H2O or only connecting nodes from adjacent generations, would be a good start. Plots made in seaborn or plotly would also be nice.
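
The two filtering options mentioned above could look roughly like this. The function name simplify_for_plotting, the max_degree threshold, the toy graph, and the generation mapping (node to the expansion step at which it first appears) are all assumptions made for illustration. Here "adjacent" is taken to mean that the generations differ by at most one.

```python
import networkx as nx


def simplify_for_plotting(G, generation, max_degree=3):
    """Return a copy of G without hub nodes and without generation-skipping edges."""
    H = G.copy()
    # Drop highly connected nodes (for example H2O) that clutter the layout.
    hubs = [node for node, degree in G.degree() if degree > max_degree]
    H.remove_nodes_from(hubs)
    # Keep only edges between nodes of the same or neighbouring generations.
    skipping = [
        (u, v) for u, v in H.edges()
        if abs(generation[u] - generation[v]) > 1
    ]
    H.remove_edges_from(skipping)
    return H


# Hypothetical toy graph and generation labels.
G = nx.Graph()
G.add_edges_from([
    ("H2O", "A"), ("H2O", "B"), ("H2O", "C"), ("H2O", "D"),
    ("A", "B"), ("B", "C"), ("A", "D"),
])
generation = {"H2O": 0, "A": 0, "B": 1, "C": 2, "D": 3}

H = simplify_for_plotting(G, generation)
```

The reduced graph H can then be passed to any drawing routine, for instance nx.draw with a layout that places nodes by generation.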

Documentation

Add docstrings (and comments where needed) to all functions.

Set cobra model as metabolic network

Load and parse a cobra model and set it as the metabolism object (this would be useful for individual genomes).
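
A rough sketch of the parsing step, assuming the target is the same (cid, rn, direction, s) layout used by the package's pivot_table call. The function relies only on attributes that cobra models expose (model.reactions, Reaction.id, Reaction.metabolites, Reaction.reversibility, Metabolite.id). The function name cobra_model_to_rows and the "forward"/"reverse" labels are hypothetical, and the namedtuple classes are stand-ins so the example runs without cobra installed.

```python
from collections import namedtuple


def cobra_model_to_rows(model):
    """Flatten a COBRA-style model into (cid, rn, direction, s) rows."""
    rows = []
    for rxn in model.reactions:
        # rxn.metabolites maps each metabolite to its stoichiometric coefficient
        for met, coeff in rxn.metabolites.items():
            rows.append((met.id, rxn.id, "forward", coeff))
            if rxn.reversibility:
                # the reverse direction swaps substrates and products
                rows.append((met.id, rxn.id, "reverse", -coeff))
    return rows


# Stand-ins for cobra objects. With cobra installed, a real model could be
# loaded with cobra.io.read_sbml_model(path) and passed in instead.
Met = namedtuple("Met", "id")
Rxn = namedtuple("Rxn", "id metabolites reversibility")
Model = namedtuple("Model", "reactions")

toy_model = Model(reactions=[
    Rxn("R1", {Met("a"): -1.0, Met("b"): 1.0}, True),
    Rxn("R2", {Met("b"): -1.0, Met("c"): 1.0}, False),
])

rows = cobra_model_to_rows(toy_model)
```

The sketch treats every non-reversible reaction as running forward; reactions constrained to run only in reverse would need their bounds checked separately.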

Parallelization

Provide built-ins that make multi-threading or parallel processing easy.
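
Since expansions from different seed sets are independent, they can be mapped over a worker pool. The sketch uses concurrent.futures from the standard library. The toy reaction dictionary and this pure-Python expand function are hypothetical stand-ins for the package's matrix-based implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy network: reaction -> (substrates, products).
REACTIONS = {
    "R1": ({"A", "B"}, {"C"}),
    "R2": ({"C"}, {"D"}),
    "R3": ({"E"}, {"F"}),
}


def expand(seed_set):
    """Naive network expansion: add products of feasible reactions until stable."""
    compounds = set(seed_set)
    while True:
        new = set()
        for substrates, products in REACTIONS.values():
            # a reaction can fire once all of its substrates are available
            if substrates <= compounds:
                new |= products - compounds
        if not new:
            return frozenset(compounds)
        compounds |= new


seed_sets = [{"A", "B"}, {"E"}, {"A"}]

# pool.map preserves the order of seed_sets in the returned results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(expand, seed_sets))
```

Threads mainly help when the work releases the GIL. For CPU-bound pure-Python work, ProcessPoolExecutor offers the same map interface, but it requires picklable top-level functions and, on some platforms, an `if __name__ == "__main__":` guard.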

pivot_table and fillna are slow

It looks like the slowest parts of calling expand are:

Creating the pivot table
Filling the table with 0s
Sparsifying the data
Making R,P,b before sparsifying the data

The pivot/fill step also runs twice on every call to expand: once inside initialize_metabolite_vector and once in the main body of expand. That is easy to eliminate and would shave roughly 4 seconds off each call.

I have an idea for how to ditch pivot to make this faster, and I am working on a PR.
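
One way to avoid the pivot (not necessarily the approach the PR takes) is to build R and P as sparse matrices straight from the skinny table, using integer codes for compounds and for (rn, direction) pairs. The toy table is hypothetical, and the sketch assumes each (cid, rn, direction) combination appears at most once, because coo_matrix sums duplicate entries whereas pivot_table averages them.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical skinny table with the columns used by the pivot_table call.
network = pd.DataFrame({
    "cid": ["A", "B", "C", "C", "D"],
    "rn": ["R1", "R1", "R1", "R2", "R2"],
    "direction": ["forward"] * 5,
    "s": [-1.0, -1.0, 1.0, -1.0, 2.0],
})

# Row codes: one integer per compound, in sorted order (as pivot_table would).
row_codes, compounds = pd.factorize(network["cid"], sort=True)
# Column codes: one integer per (rn, direction) pair, in sorted order.
col_codes = network.groupby(["rn", "direction"], sort=True).ngroup().to_numpy()

shape = (len(compounds), int(col_codes.max()) + 1)
s = network["s"].to_numpy()
reactant = s < 0
product = s > 0

# Binary reactant and product matrices, built without a dense intermediate.
R = coo_matrix(
    (np.ones(reactant.sum(), dtype=int), (row_codes[reactant], col_codes[reactant])),
    shape=shape,
).tocsr()
P = coo_matrix(
    (np.ones(product.sum(), dtype=int), (row_codes[product], col_codes[product])),
    shape=shape,
).tocsr()

# Number of reactants per reaction, matching b = sum(R) on the dense matrix.
b = np.asarray(R.sum(axis=0)).ravel()
```

This skips both the dense pivot table and the fillna(0) pass, which are the two largest entries in the profile.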

The other good news is that if we want to save S, R, P, b, etc. and just read them in during runs, that should be possible too.
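
Caching the matrices could be as simple as the following sketch, which uses scipy.sparse.save_npz and load_npz for the sparse matrices and numpy.save for the dense vector. The file names, the temporary directory, and the small example matrix are hypothetical. A real cache would also need to store the compound and reaction labels that go with the rows and columns.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical reactant matrix (compounds x reactions) and reactant counts.
R = csr_matrix(np.array([[1, 0], [1, 0], [0, 1], [0, 0]]))
b = np.asarray(R.sum(axis=0)).ravel()

cache_dir = tempfile.mkdtemp()
r_path = os.path.join(cache_dir, "R.npz")
b_path = os.path.join(cache_dir, "b.npy")

# Write once, after the matrices have been built.
save_npz(r_path, R)
np.save(b_path, b)

# Read back on later runs instead of rebuilding from the skinny table.
R_loaded = load_npz(r_path)
b_loaded = np.load(b_path)
```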

Timer unit: 1 s

Total time: 12.6907 s
File: /.../networkExpansionPy/networkExpansionPy/lib.py
Function: expand at line 221

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   221                                               def expand(self,seedSet,algorithm='naive'):
   222                                                   # constructre network from skinny table and create matricies for NE algorithm
   223         1          4.3      4.3     34.1          x0 = self.initialize_metabolite_vector(seedSet)
   224         1          4.2      4.2     33.1          network = self.network.pivot_table(index='cid',columns = ['rn','direction'],values='s').fillna(0)
   225         1          0.0      0.0      0.0          S = network.values
   226         1          0.4      0.4      3.5          R = (S < 0)*1
   227         1          0.4      0.4      3.5          P = (S > 0)*1
   228         1          0.9      0.9      7.0          b = sum(R)
   229                                           
   230                                                   # sparsefy data
   231         1          1.1      1.1      8.8          R = csr_matrix(R)
   232         1          1.2      1.2      9.1          P = csr_matrix(P)
   233         1          0.0      0.0      0.0          b = csr_matrix(b)
   234         1          0.0      0.0      0.0          b = b.transpose()
   235                                           
   236         1          0.0      0.0      0.0          x0 = csr_matrix(x0)
   237         1          0.0      0.0      0.0          x0 = x0.transpose()
   238         1          0.0      0.0      0.0          if algorithm.lower() == 'naive':
   239         1          0.0      0.0      0.4              x,y = netExp(R,P,x0,b)
   240                                                   elif algorithm.lower() == 'cr':
   241                                                       x,y = netExp_cr(R,P,x0,b)
   242                                                   else:
   243                                                       raise ValueError('algorithm needs to be naive (compound stopping criteria) or cr (reaction/compound stopping criteria)')
   244                                                   
   245                                                   # convert to list of metabolite ids and reaction ids
   246         1          0.0      0.0      0.0          if x.toarray().sum() > 0:
   247         1          0.0      0.0      0.0              cidx = np.where(x.toarray().T[0])[0]
   248         1          0.0      0.0      0.3              compounds = network.iloc[cidx].index.get_level_values(0).tolist()
   249                                                   else:
   250                                                       compounds = []
   251                                                       
   252         1          0.0      0.0      0.0          if y.toarray().sum() > 0:
   253         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   254         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   255         1          0.0      0.0      0.1              reactions = list(network.iloc[:,ridx])
   256                                                   else:
   257                                                       reactions = [];
   258                                                       
   259         1          0.0      0.0      0.0          return compounds,reactions