jgoldford / networkexpansionpy

Metabolic network expansion python package

Python 42.70% Jupyter Notebook 57.30%

networkexpansionpy's Issues

add .lower() to all string kwargs?

Should we make all string-valued kwargs case insensitive?
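One way to do this without scattering .lower() calls is a single helper that normalizes and validates a string option. This is only a sketch: normalize_choice and ALGORITHMS are hypothetical names, not part of the package; the two allowed values are the ones expand currently accepts.

```python
def normalize_choice(value, allowed, name="option"):
    """Return `value` stripped and lower-cased, or raise if it is not allowed."""
    if not isinstance(value, str):
        raise TypeError(f"{name} must be a string, got {type(value).__name__}")
    key = value.strip().lower()
    if key not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return key


# Hypothetical constant: the algorithm names that expand accepts today
ALGORITHMS = {"naive", "cr"}

algorithm = normalize_choice("Naive", ALGORITHMS, name="algorithm")
```

Each function would call the helper once at the top, so the accepted spellings and the error message live in one place.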

Set cobra model as metabolic network

Load and parse a cobra model and set it as the metabolism object (this would be useful for individual genomes).
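A possible starting point is to flatten the model into the same skinny layout (cid, rn, direction, s) that expand pivots on. The sketch below relies only on the attributes a cobra model exposes (model.reactions, reaction.id, reaction.metabolites, reaction.reversibility, metabolite.id). The function name and the 'forward'/'reverse' direction labels are assumptions, and the demo uses small stand-in objects so it runs without cobra installed.

```python
from collections import namedtuple


def model_to_skinny_rows(model):
    """Flatten a cobra-like model into rows with keys cid, rn, direction, s."""
    rows = []
    for rxn in model.reactions:
        # Assumed labels: every reaction gets a 'forward' copy, and
        # reversible reactions also get a sign-flipped 'reverse' copy.
        directions = [("forward", 1)]
        if rxn.reversibility:
            directions.append(("reverse", -1))
        for direction, sign in directions:
            for met, coeff in rxn.metabolites.items():
                rows.append(
                    {"cid": met.id, "rn": rxn.id, "direction": direction, "s": sign * coeff}
                )
    return rows


# Stand-ins that mimic the cobra attributes used above (not cobra itself)
Met = namedtuple("Met", "id")
Rxn = namedtuple("Rxn", "id metabolites reversibility")
Model = namedtuple("Model", "reactions")

a, b, c = Met("A"), Met("B"), Met("C")
toy_model = Model(
    reactions=[
        Rxn("R1", {a: -1, b: 1}, False),  # A -> B
        Rxn("R2", {b: -1, c: 1}, True),   # B <-> C
    ]
)
skinny_rows = model_to_skinny_rows(toy_model)
```

The resulting list of dicts can be passed to pandas.DataFrame to get a table with the columns expand expects.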

Parallelization

Provide built-ins that make multi-threading or parallel processing easy to use.
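As a rough sketch of what such a built-in could look like, the helper below maps one expansion function over many seed sets with concurrent.futures. expand_many and toy_expand are hypothetical names. A thread pool is used here so the example runs anywhere; for CPU-bound expansions, ProcessPoolExecutor offers the same map interface, provided the function and its arguments can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_many(expand_fn, seed_sets, max_workers=4):
    """Run expand_fn on every seed set concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(expand_fn, seed_sets))


def toy_expand(seed_set):
    """Stand-in for a real expansion call: just returns the sorted seeds."""
    return sorted(seed_set)


results = expand_many(toy_expand, [{"C00011", "C00001"}, {"C00014"}])
```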

Command line tool

Add a command line tool to access basic functionality, probably using argparse: https://docs.python.org/3/library/argparse.html
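A minimal argparse sketch of such a tool is below. The program name netexp and the argument names are placeholders rather than an agreed interface; only the two algorithm values come from the existing expand signature.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `netexp` command."""
    parser = argparse.ArgumentParser(
        prog="netexp",
        description="Run a metabolic network expansion from a seed set.",
    )
    parser.add_argument("seeds", nargs="+", help="seed compound IDs")
    parser.add_argument(
        "--algorithm",
        type=str.lower,  # lower-cased before the choices check
        choices=["naive", "cr"],
        default="naive",
        help="stopping criterion passed on to expand",
    )
    parser.add_argument("-o", "--output", default=None, help="optional output file path")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable
args = build_parser().parse_args(["C00001", "C00011", "--algorithm", "CR"])
```

Because type=str.lower runs before the choices check, the option is case insensitive without extra code.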

pivot_table and fillna are slow

It looks like the slowest parts of calling expand are:

Creating the pivot table
Filling the table with 0s
Sparsifying the data
Making R,P,b before sparsifying the data

You also run the pivot/fill step twice on every call to expand: once inside initialize_metabolite_vector and once in expand itself. That's easy enough to get rid of and will shave about 4 seconds off every call.

I have an idea for how to ditch pivot to make this faster and am working on a PR.

The other good news is that if we want to save S, R, P, b, etc. and just read them in during runs, that should be possible too.

Timer unit: 1 s

Total time: 12.6907 s
File: /.../networkExpansionPy/networkExpansionPy/lib.py
Function: expand at line 221

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   221                                               def expand(self,seedSet,algorithm='naive'):
   222                                                   # constructre network from skinny table and create matricies for NE algorithm
   223         1          4.3      4.3     34.1          x0 = self.initialize_metabolite_vector(seedSet)
   224         1          4.2      4.2     33.1          network = self.network.pivot_table(index='cid',columns = ['rn','direction'],values='s').fillna(0)
   225         1          0.0      0.0      0.0          S = network.values
   226         1          0.4      0.4      3.5          R = (S < 0)*1
   227         1          0.4      0.4      3.5          P = (S > 0)*1
   228         1          0.9      0.9      7.0          b = sum(R)
   229                                           
   230                                                   # sparsefy data
   231         1          1.1      1.1      8.8          R = csr_matrix(R)
   232         1          1.2      1.2      9.1          P = csr_matrix(P)
   233         1          0.0      0.0      0.0          b = csr_matrix(b)
   234         1          0.0      0.0      0.0          b = b.transpose()
   235                                           
   236         1          0.0      0.0      0.0          x0 = csr_matrix(x0)
   237         1          0.0      0.0      0.0          x0 = x0.transpose()
   238         1          0.0      0.0      0.0          if algorithm.lower() == 'naive':
   239         1          0.0      0.0      0.4              x,y = netExp(R,P,x0,b)
   240                                                   elif algorithm.lower() == 'cr':
   241                                                       x,y = netExp_cr(R,P,x0,b)
   242                                                   else:
   243                                                       raise ValueError('algorithm needs to be naive (compound stopping criteria) or cr (reaction/compound stopping criteria)')
   244                                                   
   245                                                   # convert to list of metabolite ids and reaction ids
   246         1          0.0      0.0      0.0          if x.toarray().sum() > 0:
   247         1          0.0      0.0      0.0              cidx = np.where(x.toarray().T[0])[0]
   248         1          0.0      0.0      0.3              compounds = network.iloc[cidx].index.get_level_values(0).tolist()
   249                                                   else:
   250                                                       compounds = []
   251                                                       
   252         1          0.0      0.0      0.0          if y.toarray().sum() > 0:
   253         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   254         1          0.0      0.0      0.0              ridx = np.where(y.toarray().T[0])[0]
   255         1          0.0      0.0      0.1              reactions = list(network.iloc[:,ridx])
   256                                                   else:
   257                                                       reactions = [];
   258                                                       
   259         1          0.0      0.0      0.0          return compounds,reactions
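For reference, one way to skip the pivot entirely is to turn the skinny table's labels into integer codes and build the sparse matrices straight from them. This is a sketch of that general idea, not the PR mentioned above. It assumes the table has the columns cid, rn, direction and s used in the pivot_table call, and that each (cid, rn, direction) triple appears at most once, since duplicates would be summed here whereas pivot_table averages them. The name skinny_to_sparse is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def skinny_to_sparse(network):
    """Build sparse S, R, P and the vector b without pivot_table/fillna."""
    row_codes, cids = pd.factorize(network["cid"], sort=True)
    rn_codes, rns = pd.factorize(network["rn"], sort=True)
    dir_codes, dirs = pd.factorize(network["direction"], sort=True)

    # Fold (rn, direction) into one integer key, then renumber the keys
    # densely so that only pairs present in the table become columns.
    pair_keys = rn_codes * len(dirs) + dir_codes
    used_keys, col_codes = np.unique(pair_keys, return_inverse=True)
    columns = list(zip(rns[used_keys // len(dirs)], dirs[used_keys % len(dirs)]))

    s = network["s"].to_numpy(dtype=float)
    shape = (len(cids), len(columns))
    S = coo_matrix((s, (row_codes, col_codes)), shape=shape).tocsr()
    R = (S < 0).astype(int)  # reactant indicator
    P = (S > 0).astype(int)  # product indicator
    b = np.asarray(R.sum(axis=0)).ravel()  # number of reactants per column
    return S, R, P, b, list(cids), columns


# Toy table: R1 is A -> B (both directions), R2 is B -> C
toy_network = pd.DataFrame(
    {
        "cid": ["A", "B", "B", "C", "B", "A"],
        "rn": ["R1", "R1", "R2", "R2", "R1", "R1"],
        "direction": ["forward", "forward", "forward", "forward", "reverse", "reverse"],
        "s": [-1, 1, -1, 1, -1, 1],
    }
)
S, R, P, b, cids, columns = skinny_to_sparse(toy_network)
```

Rows and columns come out sorted by label, so cids and columns can stand in for the pivot table's index and column labels when mapping results back to IDs. The matrices could also be written to disk once, for example with scipy.sparse.save_npz, and reloaded on later runs.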

Network expansion visualizations

It would be good to have a reliable, easy way to produce clear figures of network expansion (network graphs) and other associated data. While it's easy to make a default networkx visualization, the default parameters rarely produce a plot that gives any intuition. Standard options, like being able to exclude highly connected nodes such as H2O, or to connect only nodes from adjacent generations, would be a good start. Plots made in seaborn or plotly would also be nice.

Add functions to annotate default KEGG data

First requires implementation of #9.

For instance, elemental and stoichiometric balancing, or existing thermodynamic data.

Integration with eQuilibrator

Allow querying of thermodynamic parameters through eQuilibrator directly from the networkExpansionPy package: https://gitlab.com/equilibrator/equilibrator-api

Providing built-in datasets

This includes allowing a user to easily query different versions of KEGG, or possibly other data (such as ATLAS, or subsets of reactions corresponding to different taxa/omic data).

write/read zipped files for fold expansion

The goal is to reduce the size of the files being created.

Writing: https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data
Reading (can just use pandas, I think): https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
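A minimal sketch using only the standard library is below; the function names and the demo file name are placeholders. gzip is used here for simplicity, and other codecs such as bz2 or lzma follow the same open/dump/load pattern.

```python
import gzip
import os
import pickle
import tempfile


def write_compressed(obj, path):
    """Pickle obj into a gzip-compressed file."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def read_compressed(path):
    """Load an object written by write_compressed."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip a toy result in a temporary directory
toy_result = {"compounds": ["C00001", "C00011"], "reactions": ["R00001"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "fold_expansion.pkl.gz")
    write_compressed(toy_result, demo_path)
    restored = read_compressed(demo_path)
```

For pandas objects, pandas.read_pickle infers the compression from the file extension by default, so a file ending in .gz should load without extra arguments.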

Add ability to download new versions of KEGG

Probably through Biopython's TogoWS module, to grab easily parsable JSON files. This is needed before any additional annotation (e.g. adding information on reaction balancing or free energies).

create nx graph objects from metabolism and expansion

Add feature to create nx graph object for metabolism or expansion. This can be used down the road for other analytical functions, or graph visualization.
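One possible shape for this feature is a bipartite directed graph with compound and reaction nodes, where reactants point to the reaction and the reaction points to its products. The sketch below is an assumption about the design, not existing package code: build_bipartite_graph is a hypothetical name, the input is a list of (cid, rn, s) tuples, and compound and reaction IDs are assumed not to collide (as with KEGG's C and R prefixes).

```python
import networkx as nx


def build_bipartite_graph(rows, exclude_compounds=()):
    """Build a compound/reaction DiGraph from (cid, rn, s) rows."""
    excluded = set(exclude_compounds)
    graph = nx.DiGraph()
    for cid, rn, s in rows:
        # Skip hub compounds the caller asked to hide, and zero coefficients
        if cid in excluded or s == 0:
            continue
        graph.add_node(cid, kind="compound")
        graph.add_node(rn, kind="reaction")
        if s < 0:
            graph.add_edge(cid, rn)  # reactant -> reaction
        else:
            graph.add_edge(rn, cid)  # reaction -> product
    return graph


# Toy rows; C00001 (water) is excluded as a highly connected node
toy_rows = [("C00001", "R1", -1), ("C00002", "R1", -1), ("C00008", "R1", 1)]
toy_graph = build_bipartite_graph(toy_rows, exclude_compounds={"C00001"})
```

The kind attribute on each node lets later plotting or analysis code treat compounds and reactions differently.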

Documentation

Add docstrings (and comments if needed) to all functions

Testing

Add basic tests for core network expansion code, and other functions
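A simple way to start is to check the package's results against a slow but obvious reference on toy networks. The sketch below is a plain set-based network expansion written for testing, not the package's netExp. reference_expand, TOY and the test names are hypothetical, and reactions are given as a dict mapping a reaction ID to its reactant and product sets.

```python
def reference_expand(reactions, seeds):
    """Iteratively fire every reaction whose reactants are all available."""
    compounds = set(seeds)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rn, (reactants, products) in reactions.items():
            if rn not in fired and reactants <= compounds:
                fired.add(rn)
                compounds |= products
                changed = True
    return compounds, fired


# Toy network: R2 needs both B and C, and C is only made by R3
TOY = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B", "C"}, {"D"}),
    "R3": ({"B"}, {"C"}),
}


def test_full_expansion():
    compounds, fired = reference_expand(TOY, {"A"})
    assert compounds == {"A", "B", "C", "D"}
    assert fired == {"R1", "R2", "R3"}


def test_blocked_reaction():
    without_r3 = {rn: TOY[rn] for rn in ("R1", "R2")}
    compounds, fired = reference_expand(without_r3, {"A"})
    assert compounds == {"A", "B"}
    assert fired == {"R1"}


def test_empty_seed():
    assert reference_expand(TOY, set()) == (set(), set())
```

Functions named test_* are collected automatically by pytest; the same toy networks could then be run through the package's expand and compared against these expected sets.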
