mjenior / riptide

Reaction Inclusion by Parsimony and Transcript Distribution

License: MIT License

Python 97.37% Jupyter Notebook 2.63%

cobrapy transcriptomics metabolism pfba flux-sampling flux-variability opencobra

riptide's Introduction

RIPTiDe

Reaction Inclusion by Parsimony and Transcript Distribution

v3.4.79

Transcriptomic analyses of bacteria have become instrumental to our understanding of their responses to changes in their environment. While traditional analyses have been informative, leveraging these datasets within genome-scale metabolic network reconstructions (GENREs) can provide greatly improved context for shifts in pathway utilization and for the downstream and upstream ramifications of changes in metabolic regulation. Many previous techniques for GENRE transcript integration have focused on creating maximum consensus with input datasets, but these approaches have been shown to generate less accurate metabolic predictions than a transcript-agnostic method of flux minimization (pFBA), which identifies the most efficient and economical patterns of metabolism given certain growth constraints. Despite this success, growth conditions are not always easily quantifiable, which highlights the need for novel platforms that build on these findings. RIPTiDe combines these concepts: it minimizes overall flux, weighted by transcriptomic abundance, to identify the most energy-efficient pathways to achieve growth that include more highly transcribed enzymes, without prior insight into extracellular conditions. This platform could be important for revealing context-specific bacterial phenotypes, in line with governing principles of adaptive evolution, that drive disease manifestation or interactions between microbes.

Please cite when using:

Jenior ML, Moutinho Jr TJ, Dougherty BV, & Papin JA. (2020). Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLOS Comp Biol. https://doi.org/10.1371/journal.pcbi.1007099.

Dependencies

>=python-3.6.4
>=cobra-0.15.3
>=pandas-0.24.1
>=symengine-0.4.0
>=scipy-1.3.0

Installation

Install with pip:

$ pip install riptide

Arguments for core RIPTiDe functions:

riptide.read_transcription_file() - Generates dictionary of transcriptomic abundances from a file

REQUIRED
file : string
    User-provided file name which contains gene IDs as rows and associated transcription values as columns per replicate

OPTIONAL
header : boolean
    Defines if read abundance file has a header that needs to be ignored
    Default is no header
sep: string
    Defines what character separates entries on each line
    Defaults to tab (.tsv)
rarefy : bool
    Rarefies rounded transcript abundances to 90% of the smallest replicate
    Default is False
level : int
    Level by which to rarefy samples
    Default is 100000
binning : boolean
    Perform discrete binning of transcript abundances into quantiles
    OPTIONAL, not advised
    Default is False
quant_max : float
    Largest quantile to consider
    Default is 0.9
quant_min : float
    Smallest quantile to consider
    Default is 0.5
step : float
    Step size for parsing quantiles
    Default is 0.125
norm : bool
    Normalize transcript abundances using RPM calculation
    Performed by default
factor : numeric
    Denominator for read normalization calculation
    Default is 1e6 (RPM)
silent  : bool
    Silences std out 
    Default is False
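
As an illustration of what the norm and factor options describe, here is a minimal sketch of a reads-per-million style calculation in plain Python. It uses the standard RPM definition and is not taken from the riptide source; the function name rpm_normalize and the gene IDs are hypothetical.

```python
def rpm_normalize(counts, factor=1e6):
    """Divide each count by (total reads / factor); with factor=1e6 this is RPM."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize: total read count is zero")
    scale = total / factor  # the denominator used for every gene
    return {gene: count / scale for gene, count in counts.items()}


# Hypothetical raw counts for three genes in a single replicate
raw_counts = {"gene_a": 50, "gene_b": 150, "gene_c": 300}
rpm = rpm_normalize(raw_counts)
print(rpm)
```

After normalization the values sum to factor, so replicates with different sequencing depths become comparable.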

riptide.maxfit() - Create context-specific model based on transcript distribution with maximum fit of flux distribution to input transcriptome

REQUIRED
model : cobra.Model
    The model to be contextualized
transcriptome : dictionary
    Dictionary of transcript abundances, output of read_transcription_file()

OPTIONAL
frac_min : float
    Lower bound for range of minimal fractions to test
    Default is 0.25
frac_max : float
    Upper bound for range of minimal fractions to test
    Default is 0.85
frac_step : float
    Starting interval size within fraction range
    Default is 0.1
prune : bool
    Perform pruning step
    Default is True
samples : int 
    Number of flux samples to collect
    Default is 500
silent  : bool
    Silences std out 
    Default is False
minimum : float
    Minimum linear coefficient allowed during weight calculation for pFBA
    Default is False
conservative : bool
    Conservatively remove inactive reactions based on GPR rules (all member reactions must be inactive to prune)
    Default is False
objective : bool
    Sets previous objective function as a constraint with minimum flux equal to user input fraction
    Default is True
additive : bool
    Pool transcription abundances for reactions with multiple contributing gene products
    Default is False
direct : bool
    Assigns both minimization and maximization step coefficients directly, instead of relying on abundance distribution
    Default is False
set_bounds : bool
    Uses flux variability analysis results from constrained model to set new bounds for all reactions
    Default is True
tasks : list
    List of gene or reaction ID strings for forced inclusion in final model (metabolic tasks or essential genes)
task_lb : float
    Minimum flux bound for metabolic task reactions during pruning
    Default is equal to threshold var
exclude : list
    List of reaction ID strings for forced exclusion from final model
gpr : bool
    Determines if GPR rules will be considered during coefficient assignment
    Default is False
threshold : float
    Minimum flux a reaction must achieve in order to avoid pruning during flux sum minimization step
    Default is 1e-8
defined : False or list
    User-defined range of linear coefficients, needs to be defined in a list like [1, 0.5, 0.1, 0.01, 0.001]
    Works best paired with binned abundance categories from riptide.read_transcription_file()
    Default is False

riptide.contextualize() - Create context-specific model based on transcript distribution with user-defined objective flux minimum

REQUIRED
model : cobra.Model
    The model to be contextualized

OPTIONAL
transcriptome : dictionary
    Dictionary of transcript abundances, output of read_transcription_file()
    With default, an artificial transcriptome is generated where all abundances equal 1.0
fraction : float
    Minimum objective fraction used during single run setting
    Default is 0.8

* Other arguments from iterative implementation are carried over (excluding frac_min and frac_max)

riptide.save_output() - Writes RIPTiDe results to files in a new directory

REQUIRED
riptide_obj : RIPTiDe object
    Class object created by riptide.contextualize()

OPTIONAL
path : str
    New directory to write output files
file_type : str
    Type of output file for RIPTiDe model
    Accepts either sbml or json
    Default is JSON
silent  : bool
    Silences std out 
    Default is False

Usage

Comments before starting:

Make sure that the genes in the transcriptome file match those in your model.
Check the example files for proper data formatting.
Binning genes into discrete thresholds for coefficient assignment is available in riptide.read_transcription_file() (not recommended).
Opening the majority of exchange reactions (bounds = +/- 1000) may yield better predictions when extracellular conditions are unknown.
The resulting RIPTiDe object has multiple properties, including the context-specific model and flux analyses; accessing each is described below.

import cobra
import riptide

my_model = cobra.io.read_sbml_model('tests/genre.sbml')

transcript_abundances_1 = riptide.read_transcription_file('tests/transcriptome1.tsv')
transcript_abundances_2 = riptide.read_transcription_file('tests/transcriptome2.tsv') # has replicates

riptide_object_1_a = riptide.contextualize(model=my_model, transcriptome=transcript_abundances_1)
riptide_object_1_b = riptide.contextualize(model=my_model, transcriptome=transcript_abundances_1, tasks=['rxn1'], exclude=['rxn2','rxn3'])

riptide.save_output(riptide_obj=riptide_object_1_a, path='~/Desktop/example_riptide_output')

Example riptide.contextualize() stdout report:


Initializing model and integrating transcriptomic data...
Pruning zero flux subnetworks...
Analyzing context-specific flux distributions...

Running max fit RIPTiDe for objective fraction range: 0.65 to 0.85
Progress: 100%

Testing local fractions to 0.3...
Progress: 100%

Context-specific metabolism fit with 0.35 of optimal objective flux

Reactions pruned to 285 from 1129 (74.76% change)
Metabolites pruned to 285 from 1132 (74.82% change)
Flux through the objective DECREASED to ~54.71 from ~65.43 (16.38% change)
Context-specific metabolism correlates with transcriptome (r=0.149, p=0.011 *)

Max fit RIPTiDe completed in, 4 minutes and 33 seconds
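
The percentages in the report above can be reproduced with a simple relative-change calculation. This is a sketch for checking the arithmetic, assuming the change is measured against the starting value; the helper name percent_change is hypothetical and is not part of riptide.

```python
def percent_change(before, after):
    """Absolute change expressed as a percentage of the starting value."""
    return abs(after - before) / before * 100


# Numbers copied from the example report
print(f"Reactions:   {percent_change(1129, 285):.2f}%")    # 74.76%
print(f"Metabolites: {percent_change(1132, 285):.2f}%")    # 74.82%
print(f"Objective:   {percent_change(65.43, 54.71):.2f}%")  # 16.38%
```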

In the final step, RIPTiDe assesses the fit of transcriptomic data for the calculated network activity through correlation of transcript abundance and median flux value for each corresponding reaction. The Spearman correlation coefficient and associated p-value are then reported, following the fraction of network topology that is pruned during the flux minimization step.
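
To make the correlation step concrete, the following is a minimal pure-Python sketch of Spearman's rank correlation between per-reaction transcript abundance and median flux. The input values are made up for illustration, the function names are hypothetical, and this is not how riptide itself computes the statistic. It returns only the coefficient; scipy.stats.spearmanr returns both the coefficient and a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five reactions, index-aligned
transcript_abundance = [12.0, 250.0, 40.0, 900.0, 75.0]
median_flux = [0.1, 3.2, 0.4, 2.9, 1.5]
rho = spearman(transcript_abundance, median_flux)
print(round(rho, 3))
```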

Max fit RIPTiDe tests all minimum objective flux fractions over the provided range and returns only the model with the best Spearman correlation between context-specific flux for reactions and the associated transcriptomic values. Note that terminating the search when a subsequent iteration has a lower correlation coefficient than the last could stop at a local maximum; this must be considered if an exhaustive analysis is preferred.
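
The local-maximum caveat can be seen on a toy example. The sketch below compares an exhaustive search with one that stops at the first drop in correlation. The fraction-to-correlation values are invented for illustration and the function names are hypothetical; this does not reproduce riptide's actual search logic.

```python
# Hypothetical Spearman coefficients per objective fraction (not real output)
fit_by_fraction = {
    0.25: 0.10, 0.35: 0.15, 0.45: 0.12, 0.55: 0.14,
    0.65: 0.21, 0.75: 0.18, 0.85: 0.11,
}


def best_exhaustive(results):
    """Test every fraction and keep the one with the highest correlation."""
    return max(results, key=results.get)


def best_early_stop(results):
    """Walk fractions in order and stop at the first drop in correlation."""
    fractions = sorted(results)
    best = fractions[0]
    for fraction in fractions[1:]:
        if results[fraction] < results[best]:
            break  # a lower value than the previous step ends the search
        best = fraction
    return best


print(best_exhaustive(fit_by_fraction))  # 0.65
print(best_early_stop(fit_by_fraction))  # 0.35
```

Here the early-stopping search settles on 0.35 because the fit dips at 0.45, while the exhaustive search finds the higher peak at 0.65.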

Resulting RIPTiDe object (class) properties:

The resulting object is a container for the following data structures.

model - Contextualized genome-scale metabolic network reconstruction
transcriptome - Transcriptomic replicate abundances provided by user
percent_of_mapping - Percent of genes in mapping file found in input GENRE
minimization_coefficients - Linear coefficients used during flux sum minimization (based on transcriptome replicates)
maximization_coefficients - Linear coefficients for each reaction used during flux sampling
pruned - Dictionary containing the IDs of genes, reactions, and metabolites pruned by RIPTiDe
flux_samples - Flux samples from constrained model
flux_variability - Flux variability analysis from constrained model
fraction_of_optimum - Minimum specified percentage of optimal objective flux during contextualization
metabolic_tasks - User defined reactions whose activity is saved from pruning
concordance - Spearman correlation results between linear coefficients and median fluxes from sampling
gpr_integration - Whether GPR rules were considered during assignment of linear coefficients
defined_coefficients - Range of linear coefficients RIPTiDe is allowed to utilize provided as a list
included_important - Reactions or Genes included in the final model which the user defined as important
additional_parameters - Dictionary of additional parameters RIPTiDe uses
fraction_bounds - Minimum and maximum values for the range of objective flux minimum fractions tested
maxfit_iters - Objective flux and fit to transcriptome for each minimum flux fraction tested

Examples of accessing components of RIPTiDe output:

context_specific_GENRE = riptide_object.model
context_specific_FVA = riptide_object.flux_variability
context_specific_flux_samples = riptide_object.flux_samples

Additional Information

Thank you for your interest in RIPTiDe!

If you encounter any problems, please file an issue along with a detailed description.

Distributed under the terms of the MIT license, "riptide" is free and open source software

riptide's Issues

Error: Something is wrong with the provided bounds nan and nan in constraint

Dear RIPTiDE team!

Our research group uses a genome-scale metabolic modelling approach to investigate the metabolic systems of pro- and eukaryotic organisms. We have been working on the reconstruction and contextualization of GSM models based on omics data in general, and on transcriptomics data integration in particular. To conduct this research we use the web version of our BioUML platform (10.1093/nar/gkac286), where we would like to run the RIPTiDe tool developed by your team via a corresponding Jupyter notebook. We have an original transcriptomics dataset containing expression data for chicken embryos under several experimental conditions, which we analyzed using the edgeR and DESeq2 tools. We were able to integrate the processed edgeR data into the recently published GSM model for chicken (10.1371/journal.pone.0254270) and run the context-specific model, while the same analysis with the DESeq2 data failed with the error: "Something is wrong with the provided bounds nan and nan in constraint". We double-checked the data to identify the issue with the constraints and checked the file format against an example from your GitHub project. We would really appreciate it if you could help us with this issue or give us some clues as to the cause of the error.
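
One thing that may be worth checking for an error like this is whether the abundance file contains missing or non-finite values (for example NA or nan entries). That this is the cause here is an assumption, not something confirmed in the report. The sketch below scans a tab-separated file with gene IDs in the first column; the function name find_bad_rows and the sample data are hypothetical.

```python
import csv
import io
import math


def find_bad_rows(handle, sep="\t"):
    """Return (line number, gene ID, value) for each non-numeric or non-finite entry."""
    bad = []
    for line_number, row in enumerate(csv.reader(handle, delimiter=sep), start=1):
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        for value in values:
            try:
                number = float(value)
            except ValueError:
                bad.append((line_number, gene, value))  # e.g. "NA" or empty string
                continue
            if not math.isfinite(number):
                bad.append((line_number, gene, value))  # nan or inf
    return bad


# Hypothetical file contents; in practice pass open("your_file.tsv", newline="")
sample = io.StringIO("gene_a\t10.5\t12.0\ngene_b\tNA\t3.0\ngene_c\t4.0\tnan\n")
problems = find_bad_rows(sample)
print(problems)
```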

Error - 'symengine.lib.symengine_wrapper.Symbol' object has no attribute '_index'

Having trouble whilst pruning 0-flux subnetworks.
Using python version 3.10.4

rv = reductor(4) TypeError: can't pickle SwigPyObject objects

Hi,

When I try the latest Human-GEM model, I get an error in the deepcopy step. Thanks in advance.

model = cobra.io.read_sbml_model("Human-GEM_30112021.xml")
modelRiptide = riptide.contextualize(model)
WARNING: No transcriptome provided. Analyzing most parsimonious state

Initializing model and integrating transcriptomic data...
Traceback (most recent call last):
File "", line 1, in
File "/home/agatha/python-virtual-environments/env/lib/python3.6/site-packages/riptide/riptide.py", line 612, in contextualize
riptide_model = copy.deepcopy(model)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 169, in deepcopy
rv = reductor(4)
TypeError: can't pickle SwigPyObject objects

Issue with pip download

Following the pip download a syntax error is given after import is called:

from riptide import *
Traceback (most recent call last):

File "C:\Users\alexa\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "", line 1, in
from riptide import *

File "C:\Users\alexa\anaconda3\lib\site-packages\riptide\__init__.py", line 3, in
from .riptide import *

File "C:\Users\alexa\anaconda3\lib\site-packages\riptide\riptide.py", line 639
def _constrain_and_analyze_model(model, coefficient_dict, fraction, sampling_depth, objective, tasks, minimum_threshold=1e-6, cpus):
^
SyntaxError: non-default argument follows default argument

This line contains "cpus", which is not in the present GitHub version.

After removing cpus or replacing riptide with the GitHub version, the import works fine. I think the GitHub and pip versions need to be matched.
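
For context on the error itself: Python rejects any function signature in which a parameter without a default follows one with a default. The sketch below reproduces the error on a stand-in signature and shows two generic ways to make such a signature valid. The function names and bodies are hypothetical placeholders, and the default cpus=1 is an assumption, not the riptide maintainers' fix.

```python
# A signature shaped like the one in the traceback fails at compile time
broken_source = "def analyze(model, fraction, minimum_threshold=1e-6, cpus): pass"
try:
    compile(broken_source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True  # the exact message text varies between Python versions


# Option 1: give the trailing parameter a default as well (value is an assumption)
def analyze_with_default(model, fraction, minimum_threshold=1e-6, cpus=1):
    return {"fraction": fraction, "minimum_threshold": minimum_threshold, "cpus": cpus}


# Option 2: move the required parameter ahead of the defaulted ones
def analyze_reordered(model, fraction, cpus, minimum_threshold=1e-6):
    return {"fraction": fraction, "minimum_threshold": minimum_threshold, "cpus": cpus}


print(raised)
print(analyze_with_default(None, 0.8))
print(analyze_reordered(None, 0.8, 4))
```

Option 2 changes the positional order, so any existing callers that pass arguments by position would also need updating.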

Issues of output results from riptide

Dear Matt Jenior,

I'm using riptide to build a transcriptome-constrained genome-scale metabolic model. When I use different gene expression datasets, I get similar results, i.e. similar growth rates and flux distributions for each reaction. I do not know how to explain this.
The model information is as follows: the model was constructed using carveme with cplex as its solver (https://carveme.readthedocs.io/en/latest/usage.html ). I used this command: carve 163560.faa -g LB -i LB. The unconstrained model has 1028 genes. The gene expression datasets are from the control group (ctrl_4h) and the antibiotic combination treatment group (comb_4h). There are 197/1028 differentially expressed genes (adjusted p value <0.05, fold change >2).
Attached are the bacterial genome annotation file 163560.faa (for model construction), the model LB163560 I constructed, the gene expression data from the two groups, and the output results from riptide.

I would be very grateful if you could help.
Thank you very much!

Best regards,
Xingchen Bian
Attachments.zip

Essential genes fail to be forcefully included in model

Minor issue:
If gene IDs are input into RIPTiDe as tasks to be forcefully included, the script returns an error because the gene IDs are not converted into reaction IDs. There is a _screen_tasks function defined in riptide.py, but it has not been incorporated into the maxfit or contextualize functions. Doing so would be an easy fix.

Major issue:
Another issue arises after fixing the conversion from gene IDs to reaction IDs. From reading the riptide.py script it seems task reactions are assigned the minimum cost during pruning, but this does not guarantee its inclusion in the final model. I have tried manually assigning a large negative cost to the task reactions, to no avail. I hesitate to manually remove task reactions from the inactive_rxns list, as this might result in blocked reactions in the final model. And since there is no check that tasks have indeed been included in the model, when the _constrain_for_sampling function is reached in contextualize, RIPTiDe crashes due to the missing task reaction. Is there a way to fix this, or is this just an inherent limitation of the method?

issues on samples and fraction

Dear Matt Jenior,

I'm using riptide to build a transcriptome-constrained genome-scale metabolic model. When I set fraction to a higher value such as 0.9, with samples at the default value of 500, I cannot get 500 samples for each reaction in the result file. When I change the fraction to a lower one, e.g. 0.7, it works. Why does this happen?

I would be very grateful if you could help.
Thank you very much!

Best regards,
Xingchen Bian

Riptide context specific model quality and related questions

The gap-filled GMM used as input to riptide had a memote score of 77%, while after integrating transcriptomics data the score fell to 54%. Is this normal? Also, the unbounded flux in default media shows a negative value, which becomes the major cause of the decline in the model consistency score.
Also, should a draft metabolic model or a gap-filled model be supplied to riptide?
Is any further model validation required on a riptide-generated model?

unable to save riptide output

Whatever path I enter, it always throws the same error:

WARNING: Output path already exists, overwriting previous files
Saving results to ripout_20230524_172604_20230524_172604
Traceback (most recent call last):
File "", line 1, in
File "/home/mibiome/.local/lib/python3.10/site-packages/riptide.py", line 88, in save_output
cobra.io.save_json_model(riptide_obj.model, outFile)
File "/home/mibiome/.local/lib/python3.10/site-packages/cobra/io/json.py", line 122, in save_json_model
with open(filename, "w") as file_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'ripout_20230524_172604_20230524_172604/model.json'

riptide.save_output(riptide_obj=riptide_object_1_a)
WARNING: Did not provide an output directory. Using default riptide_files in working directory
WARNING: Output path already exists, overwriting previous files
Saving results to M_GCF_020912005_1_riptide_20230524_172604
Traceback (most recent call last):
File "", line 1, in
File "/home/mibiome/.local/lib/python3.10/site-packages/riptide.py", line 88, in save_output
cobra.io.save_json_model(riptide_obj.model, outFile)
File "/home/mibiome/.local/lib/python3.10/site-packages/cobra/io/json.py", line 122, in save_json_model
with open(filename, "w") as file_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'M_GCF_020912005_1_riptide_20230524_172604/model.json'
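
The traceback shows the JSON writer failing because the target directory does not exist at write time. A possible workaround, sketched below under that assumption, is to create the directory yourself before saving. The helper name prepare_output_dir is hypothetical, and the placeholder JSON write stands in for the real model export; this is not the maintainers' fix.

```python
import json
import os
import tempfile


def prepare_output_dir(path):
    """Expand '~', make the path absolute, and create it (with parents) if missing."""
    full_path = os.path.abspath(os.path.expanduser(path))
    os.makedirs(full_path, exist_ok=True)
    return full_path


# Demonstrated in a temporary location so the example is safe to run anywhere
base = tempfile.mkdtemp()
out_dir = prepare_output_dir(os.path.join(base, "example_riptide_output"))

# Stand-in for the model export: the write succeeds because the directory exists
model_file = os.path.join(out_dir, "model.json")
with open(model_file, "w") as handle:
    json.dump({"placeholder": True}, handle)

print(os.path.isfile(model_file))
```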
