jgoldford / networkexpansionpy
Metabolic network expansion python package
Should we make all string-valued kwargs case insensitive?
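One way this could look is to normalise each string option once at the entry point, in the same spirit as the algorithm.lower() check that expand already performs. The helper name normalize_choice and the ALGORITHMS set are hypothetical and only illustrate the idea; they are not part of the package.

```python
def normalize_choice(value, allowed, name="option"):
    """Return value stripped and lower-cased, or raise if it is not allowed.

    Hypothetical helper, not part of networkExpansionPy.
    """
    if not isinstance(value, str):
        raise TypeError(f"{name} must be a string, got {type(value).__name__}")
    key = value.strip().lower()
    if key not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return key


# Assumed set of valid values, taken from the two branches in expand.
ALGORITHMS = {"naive", "cr"}

algorithm = normalize_choice("  Naive ", ALGORITHMS, name="algorithm")
```

Centralising the check keeps error messages consistent across functions and avoids scattering .lower() calls.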
Load and parse cobra model and set as metabolism object (would be good for individual genomes)
Provide built-ins to easily allow multi-threading, or parallel processing.
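A minimal sketch of what such a built-in might wrap, using only the standard library. The names expand_many and expand_stub are hypothetical; expand_stub stands in for a real expansion call so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_stub(seed_set):
    """Placeholder for a real expansion call; returns the seed set size."""
    return len(seed_set)


def expand_many(seed_sets, expand_fn, max_workers=4):
    """Run expand_fn over many seed sets, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(expand_fn, seed_sets))


seed_sets = [{"C00001"}, {"C00001", "C00011"}, set()]
sizes = expand_many(seed_sets, expand_stub)
```

Threads are used here for simplicity. For CPU-bound expansions, ProcessPoolExecutor offers the same map interface and avoids the GIL, at the cost of requiring the function and its arguments to be picklable.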
Add command line tool to access basic functionality. Probably using argparse https://docs.python.org/3/library/argparse.html
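A possible starting point for the parser, assuming a command called netexp and the option names below (all hypothetical). Only the two algorithm values that expand currently accepts are offered.

```python
import argparse


def build_parser():
    """Build an argument parser for a hypothetical netexp command."""
    parser = argparse.ArgumentParser(
        prog="netexp",  # hypothetical command name
        description="Run a metabolic network expansion from a seed set.",
    )
    parser.add_argument(
        "seeds", nargs="+", help="seed compound IDs, e.g. KEGG C numbers"
    )
    parser.add_argument(
        "--algorithm",
        type=str.lower,  # lower-case first so Naive, NAIVE, ... are accepted
        choices=["naive", "cr"],
        default="naive",
        help="stopping criterion (default: naive)",
    )
    parser.add_argument("--output", default=None, help="optional output file path")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["C00001", "C00011", "--algorithm", "CR"])
```

Because argparse applies type before checking choices, the str.lower conversion also makes the option case-insensitive.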
It looks like the slowest parts of calling expand are the pivot/fill step (pivot_table followed by fillna(0)) and building S, R, P, b before sparsifying the data.
You also call the pivot/fill part twice on every call to expand: once inside initialize_metabolite_vector and once in the main expand loop. That's easy enough to get rid of and will shave 4ish seconds off everything.
I have an idea for how to ditch pivot to make this faster and am working on a PR.
But the other good news is that if we want to save S, R, P, b, etc. and just read them in during runs, that should be possible too.
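The "build once, reuse" idea can be sketched with the standard library alone. MatrixCache and slow_build are hypothetical names; slow_build stands in for the pivot/fill step plus the construction of R, P and b.

```python
from functools import cached_property


class MatrixCache:
    """Hypothetical wrapper that builds the expensive matrices once."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # callable returning the matrices

    @cached_property
    def matrices(self):
        # Runs build_fn on first access only; later accesses reuse the result.
        return self._build_fn()


calls = []


def slow_build():
    """Stand-in for the pivot/fill step; records each invocation."""
    calls.append(1)
    return {"R": [[1, 0]], "P": [[0, 1]], "b": [1, 1]}


cache = MatrixCache(slow_build)
first = cache.matrices
second = cache.matrices  # served from the cache, slow_build is not re-run
```

The cached value would need to be discarded (del cache.matrices) whenever the underlying network table changes.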
Timer unit: 1 s

Total time: 12.6907 s
File: /.../networkExpansionPy/networkExpansionPy/lib.py
Function: expand at line 221

Line #  Hits   Time  Per Hit  % Time  Line Contents
==============================================================
   221                                def expand(self,seedSet,algorithm='naive'):
   222                                    # constructre network from skinny table and create matricies for NE algorithm
   223     1    4.3      4.3    34.1      x0 = self.initialize_metabolite_vector(seedSet)
   224     1    4.2      4.2    33.1      network = self.network.pivot_table(index='cid',columns = ['rn','direction'],values='s').fillna(0)
   225     1    0.0      0.0     0.0      S = network.values
   226     1    0.4      0.4     3.5      R = (S < 0)*1
   227     1    0.4      0.4     3.5      P = (S > 0)*1
   228     1    0.9      0.9     7.0      b = sum(R)
   229
   230                                    # sparsefy data
   231     1    1.1      1.1     8.8      R = csr_matrix(R)
   232     1    1.2      1.2     9.1      P = csr_matrix(P)
   233     1    0.0      0.0     0.0      b = csr_matrix(b)
   234     1    0.0      0.0     0.0      b = b.transpose()
   235
   236     1    0.0      0.0     0.0      x0 = csr_matrix(x0)
   237     1    0.0      0.0     0.0      x0 = x0.transpose()
   238     1    0.0      0.0     0.0      if algorithm.lower() == 'naive':
   239     1    0.0      0.0     0.4          x,y = netExp(R,P,x0,b)
   240                                    elif algorithm.lower() == 'cr':
   241                                        x,y = netExp_cr(R,P,x0,b)
   242                                    else:
   243                                        raise ValueError('algorithm needs to be naive (compound stopping criteria) or cr (reaction/compound stopping criteria)')
   244
   245                                    # convert to list of metabolite ids and reaction ids
   246     1    0.0      0.0     0.0      if x.toarray().sum() > 0:
   247     1    0.0      0.0     0.0          cidx = np.where(x.toarray().T[0])[0]
   248     1    0.0      0.0     0.3          compounds = network.iloc[cidx].index.get_level_values(0).tolist()
   249                                    else:
   250                                        compounds = []
   251
   252     1    0.0      0.0     0.0      if y.toarray().sum() > 0:
   253     1    0.0      0.0     0.0          ridx = np.where(y.toarray().T[0])[0]
   254     1    0.0      0.0     0.0          ridx = np.where(y.toarray().T[0])[0]
   255     1    0.0      0.0     0.1          reactions = list(network.iloc[:,ridx])
   256                                    else:
   257                                        reactions = [];
   258
   259     1    0.0      0.0     0.0      return compounds,reactions
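For readers unfamiliar with what expand computes, the following is a small set-based sketch of network expansion. It is not the package's netExp implementation, which works on the sparse matrices R and P and the vector b (b appears to hold the number of reactants per reaction). The function name expand_sets and the toy network are assumptions for illustration.

```python
def expand_sets(reactions, seed):
    """Iterate network expansion to a fixed point.

    reactions: dict mapping reaction id -> (reactants, products), each a set
    seed: iterable of compound ids available at the start
    Returns (compounds, fired) as sets.
    """
    compounds = set(seed)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rid, (reactants, products) in reactions.items():
            # A reaction can run once all of its reactants are available.
            if rid not in fired and reactants <= compounds:
                fired.add(rid)
                compounds |= products
                changed = True
    return compounds, fired


toy = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B", "C"}, {"D"}),
    "r3": ({"E"}, {"F"}),  # never runs: E is not reachable from the seed
}
found, used = expand_sets(toy, seed={"A", "C"})
```

This version adds products as soon as a reaction fires rather than updating generation by generation, so it does not record generations, but it reaches the same final set because the compound set only ever grows.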
It would be cool to be able to reliably and easily produce clear figures of network expansions (network graphs) and other associated data. While it's easy to make a default networkx visualization, the default parameters rarely yield a plot that provides intuition. Standard options, like being able to exclude highly connected nodes such as H2O, or to connect only nodes from adjacent generations, would be a good start. Plots made in seaborn or plotly would also be nice.
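The two standard options mentioned above could be applied as a filtering step before any plotting library is involved. The function filter_edges and the generation mapping are hypothetical; the sketch assumes each node's expansion generation is already known.

```python
def filter_edges(edges, generation, exclude=(), adjacent_only=True):
    """Keep edges suitable for plotting.

    edges: iterable of (source, target) node pairs
    generation: dict mapping node -> expansion generation (int)
    exclude: nodes to drop entirely, e.g. hubs such as water
    adjacent_only: keep only edges whose endpoints are one generation apart
    """
    excluded = set(exclude)
    kept = []
    for u, v in edges:
        if u in excluded or v in excluded:
            continue
        if adjacent_only and abs(generation[u] - generation[v]) != 1:
            continue
        kept.append((u, v))
    return kept


generation = {"H2O": 0, "A": 0, "B": 1, "C": 2}
edges = [("H2O", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
plot_edges = filter_edges(edges, generation, exclude={"H2O"})
```

The resulting edge list can be handed to networkx via Graph.add_edges_from, keeping the filtering logic independent of the plotting backend.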
First requires implementation of #9.
Including elemental and stoichiometric balancing, or existing thermodynamic data, for instance.
Allow querying of thermodynamic parameters through equilibrator directly from the networkExpansionPy package https://gitlab.com/equilibrator/equilibrator-api .
This includes allowing a user to easily query different versions of KEGG, or possibly other data (such as ATLAS, or subsets of reactions corresponding to different taxa/omic data).
In order to reduce the size of the files being created.
Writing: https://stackoverflow.com/questions/57983431/whats-the-most-space-efficient-way-to-compress-serialized-python-data
Reading (can probably just use pandas): https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
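A minimal sketch of compressed serialization using only the standard library. The function names, the file name and the example data are assumptions for illustration; gzip is one of several possible compressors.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle obj and write it gzip-compressed to path."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read back an object written by save_compressed."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


data = {"compounds": ["C00001", "C00011"], "reactions": ["R00001"]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.pkl.gz")  # hypothetical file name
    save_compressed(data, path)
    restored = load_compressed(path)
```

With a .gz extension, pandas.read_pickle should also be able to read such a file, since its compression argument defaults to inferring the format from the file name. As with any pickle, only files from trusted sources should be loaded.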
Probably through Biopython's TogoWS module to grab easily parsable JSON files. This is necessary prior to additional annotation (e.g. adding information on reaction balancing or free energies).
Add feature to create nx graph object for metabolism or expansion. This can be used down the road for other analytical functions, or graph visualization.
Add docstrings (and comments if needed) to all functions
Add basic tests for core network expansion code, and other functions