velodyn's Introduction

VeloDyn -- Quantitative analysis of RNA velocity

RNA velocity infers a rate of change for each transcript in an RNA-sequencing experiment based on the ratio of intronic to exonic reads. This inferred velocity vector serves as a prediction for the future transcriptional state of a cell, while the current read counts serve as a measurement of the instantaneous state. Qualitative analysis of RNA velocity has been used to establish the order of gene expression states in a sequence, but quantitative analysis has generally been lacking.

velodyn adopts formalisms from dynamical systems to provide a quantitative framework for RNA velocity analysis. The tools provided by velodyn along with their associated usage are described below. All velodyn tools are designed to integrate with the scanpy ecosystem and anndata structures.

We have released velodyn in association with a recent paper. Please cite our paper if you find velodyn useful for your work.

Differentiation reveals latent features of aging and an energy barrier in murine myogenesis
Jacob C Kimmel, Nelda Yi, Margaret Roy, David G Hendrickson, David R Kelley
Cell Reports 2021, 35 (4); doi: https://doi.org/10.1016/j.celrep.2021.109046

BibTeX

@article{kimmel_latent_2021,
	title = {Differentiation reveals latent features of aging and an energy barrier in murine myogenesis},
	volume = {35},
	issn = {2211-1247},
	url = {https://www.cell.com/cell-reports/abstract/S2211-1247(21)00362-4},
	doi = {10.1016/j.celrep.2021.109046},
	language = {English},
	number = {4},
	urldate = {2021-05-19},
	journal = {Cell Reports},
	author = {Kimmel, Jacob C. and Yi, Nelda and Roy, Margaret and Hendrickson, David G. and Kelley, David R.},
	month = apr,
	year = {2021},
	pmid = {33910007},
	note = {Publisher: Elsevier},
	keywords = {aging, dynamical systems, fibro/adipogenic progenitor, muscle stem cell, myogenesis, RNA-seq, single cell, stem cell}
}

If you have any questions or comments, please feel free to email me.

Jacob C. Kimmel, PhD
[email protected]
Calico Life Sciences, LLC

Installation

git clone https://github.com/calico/velodyn
cd velodyn
pip install .

or

pip install velodyn

Tutorial

We have provided a velodyn tutorial using the Colab computing environment from Google. This notebook allows for execution of a velodyn workflow, end-to-end, all within your web browser.

velodyn tutorial

Gene expression state stability measurements

velodyn can provide a quantitative measure of gene expression state stability based on the divergence of the RNA velocity field. The divergence reflects the net flow of cells to a particular region of state space and is frequently used to characterize vector fields in physical systems. Divergence measures can reveal stable attractor states and unstable repulsor states in gene expression space. For example, we computed the divergence of gene expression states during myogenic differentiation and identified two attractor states, separated by a repulsor state. This repulsor state is unstable, suggesting it represents a decision point where cells decide to adopt one of the attractor states.

Divergence maps of myogenic differentiation. Two attractor states along a one-dimensional manifold are separated by a repulsor state in the center.
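
To make the idea of divergence concrete, here is a minimal sketch using plain NumPy finite differences on a toy two-dimensional vector field. It is not velodyn's implementation: the function name `grid_divergence` and the toy fields are assumptions made for illustration. Under the standard convention, negative divergence marks net inflow (an attractor) and positive divergence marks net outflow (a repulsor).

```python
import numpy as np

def grid_divergence(u, v, dx=1.0, dy=1.0):
    """Finite-difference divergence of a 2D vector field on a regular grid.

    u[i, j] and v[i, j] are the x- and y-components of the field at
    grid row i (y direction) and grid column j (x direction).
    `grid_divergence` is a hypothetical helper, not part of velodyn.
    """
    du_dx = np.gradient(u, dx, axis=1)  # d(u)/dx along columns
    dv_dy = np.gradient(v, dy, axis=0)  # d(v)/dy along rows
    return du_dx + dv_dy

# Toy fields on a 30 x 30 grid
xs = np.linspace(-1.0, 1.0, 30)
ys = np.linspace(-1.0, 1.0, 30)
X, Y = np.meshgrid(xs, ys)  # X varies along axis 1, Y along axis 0
dx, dy = xs[1] - xs[0], ys[1] - ys[0]

# velocity = -position: everything flows toward the origin (attractor)
attractor_div = grid_divergence(-X, -Y, dx=dx, dy=dy)
# velocity = +position: everything flows away from the origin (repulsor)
repulsor_div = grid_divergence(X, Y, dx=dx, dy=dy)

print(attractor_div.shape)
print(attractor_div.mean(), repulsor_div.mean())
```
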

Usage

from velodyn.velocity_divergence import compute_div, plot_div

D = compute_div(
    adata=adata,
    use_rep='pca',
    n_grid_points=30,
)
print(D.shape) # (30, 30,)

fig, ax = plot_div(D)

State transition rate comparisons with phase simulations

Across experimental conditions, the rates of change in gene expression space may differ significantly. However, it is difficult to determine where RNA velocity fields differ across conditions, and what impact any differences may have on the transit time between states. In dynamical systems, phase point analysis is used to quantify the integrated behavior of a vector field. For a review of phase point simulation methods, we highly recommend Nonlinear Dynamics & Chaos by Steven Strogatz.

In brief, a phase point simulation instantiates a particle ("phase point") at some position in a vector field. The position of the particle is updated ("evolved") over a number of timesteps using numerical methods.

For velodyn, we implement our update step using a stochastic weighted nearest neighbors model. We have a collection of observed cells and their associated velocity vectors as the source of our vector field. For each point at each timestep, we estimate the parameters of a Gaussian distribution of possible update steps based on the mean and variance of observed velocity vectors in neighboring cells. We then draw a sample from this distribution to update the position of the phase point. The stochastic nature of this evolution mirrors the stochastic nature of gene expression.
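
The update step described above can be sketched in a few lines of NumPy. This is an illustrative simplification under stated assumptions, not velodyn's code: neighbors are unweighted (velodyn describes a weighted model), and the function name `simulate_phase_point` along with the parameters `k` and `step_scale` are chosen here for illustration only.

```python
import numpy as np

def simulate_phase_point(positions, velocities, start, n_timesteps=50,
                         k=10, step_scale=0.1, rng=None):
    """Evolve one phase point through a field defined by observed cells.

    Hypothetical helper: an unweighted stochastic kNN update.
    positions, velocities: [n_cells, n_dims] arrays.
    Returns a [n_timesteps + 1, n_dims] trajectory.
    """
    rng = np.random.default_rng() if rng is None else rng
    trajectory = np.empty((n_timesteps + 1, positions.shape[1]))
    trajectory[0] = start
    for t in range(n_timesteps):
        # find the k observed cells nearest to the current position
        dists = np.linalg.norm(positions - trajectory[t], axis=1)
        neighbors = np.argsort(dists)[:k]
        # estimate Gaussian parameters from the neighbors' velocities
        mu = velocities[neighbors].mean(axis=0)
        sigma = velocities[neighbors].std(axis=0)
        # draw a stochastic update step and move the phase point
        step = rng.normal(mu, sigma)
        trajectory[t + 1] = trajectory[t] + step_scale * step
    return trajectory

rng = np.random.default_rng(0)
cells = rng.uniform(-1.0, 1.0, size=(500, 2))
# toy field: every cell moves to the right, with small noise
vel = np.array([1.0, 0.0]) + rng.normal(0.0, 0.05, size=(500, 2))
traj = simulate_phase_point(cells, vel, start=np.array([-0.9, 0.0]), rng=rng)
print(traj.shape)  # (51, 2)
```

Repeating this for many starting points and comparing how far the points travel in a fixed number of timesteps is one way to compare transition rates between conditions.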

By applying phase point simulations to RNA velocity fields, velodyn allows for comparisons of state transition rates across experimental conditions. For example, we used phase point simulations to analyze the rate of myogenic differentiation in young and aged muscle stem cells. These analyses revealed that aged cells progress more slowly toward the differentiated state than their young counterparts.

Phase point simulations show the direction and rate of motion in an RNA velocity field.

Usage

from velodyn.velocity_dynsys import PhaseSimulation

simulator = PhaseSimulation(
    adata=adata,
)
# set the velocity basis to use
simulator.set_velocity_field(basis='pca')
# set starting locations for phase points
# using a categorical variable in `adata.obs`
simulator.set_starting_point(
    method='metadata', 
    groupby='starting_points',
    group='forward',
)
# run simulations using the stochastic kNN velocity estimator
trajectories = simulator.simulate_phase_points(
    n_points=n_points_to_simulate,
    n_timesteps=n_timesteps_to_simulate,
    velocity_method='knn',
    velocity_method_attrs={'vknn_method': 'stochastic'},          
    step_scale=float(step_scale),
    multiprocess=True, # use multiple cores
)

print(trajectories.shape)
# [
#     n_points_to_simulate, 
#     n_timesteps, 
#     n_embedding_dims, 
#     (position, velocity_mean, velocity_std),
# ]

Change in pseudotime predictions

Dynamic cell state transitions are often parameterized by a pseudotime curve, as introduced by Cole Trapnell in monocle. Given RNA velocity vectors and pseudotime coordinates, velodyn can predict a "change in pseudotime" for each individual cell. The procedure for predicting a change in pseudotime is fairly simple. velodyn trains a machine learning model to predict pseudotime coordinates from gene expression embedding coordinates (e.g. coordinates in principal component space). The future position of each cell in this embedding is computed as the current position shifted by the RNA velocity vector, and a new pseudotime coordinate is predicted using the trained model. The "change in pseudotime" is then returned as the difference between the pseudotime coordinate for the predicted future point and the pseudotime coordinate for the observed point.

Change in pseudotime is predicted using a machine learning model for each cell.
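
The procedure can be illustrated with a small NumPy sketch. The model here is an ordinary least-squares linear regression, used purely as a stand-in; it is an assumption, not necessarily the model velodyn trains. The function name `predict_dpst_sketch` is hypothetical, and the sketch subtracts the model's prediction at the current position, where subtracting the observed pseudotime would be another reasonable reading of the description.

```python
import numpy as np

def predict_dpst_sketch(embedding, velocity, pseudotime):
    """Change in pseudotime per cell, using a linear stand-in model.

    embedding, velocity: [n_cells, n_dims]; pseudotime: [n_cells].
    Hypothetical helper, not velodyn's `dPseudotime`.
    """
    def design(x):
        # add an intercept column to the embedding coordinates
        return np.column_stack([x, np.ones(len(x))])

    # fit pseudotime ~ embedding coordinates by least squares
    coef, *_ = np.linalg.lstsq(design(embedding), pseudotime, rcond=None)
    # predict pseudotime at the current and velocity-shifted positions
    current = design(embedding) @ coef
    future = design(embedding + velocity) @ coef
    return future - current

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 5))
pst = emb[:, 0]              # toy pseudotime: the first embedding coordinate
vel = np.zeros_like(emb)
vel[:, 0] = 0.5              # every cell moves forward along that coordinate
dpst = predict_dpst_sketch(emb, vel, pst)
print(dpst.shape)  # (200,)
```
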

Usage

from velodyn.velocity_dpst import dPseudotime

DPST = dPseudotime(
    adata=adata,
    use_rep='pca',
    pseudotime_var='dpt_pseudotime',
)
change_in_pseudotime = DPST.predict_dpst()

Velocity confidence intervals

RNA velocity estimates for each cell are incredibly useful, but there is no notion of variance inherent to the inference procedure. If we wish to make comparisons between cells that are moving in different directions in gene expression space, we require confidence intervals on each cell's RNA velocity vector. velodyn introduces a molecular parametric bootstrapping procedure to compute these confidence intervals. Briefly, we parameterize a multinomial distribution across genes using the mRNA profile for each cell. We then parameterize a second multinomial distribution for each gene in each cell based on the observed counts of spliced, unspliced, and ambiguous reads. We sample reads to the observed depth across genes, using the gene-level multinomial to distribute these reads as spliced, unspliced, or ambiguous observations, and repeat this procedure many times for each cell. We then compute RNA velocity vectors for each bootstrap sample and use these vectors to compute RNA velocity confidence intervals.

RNA velocity confidence intervals for each cell.
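
The two-level sampling scheme can be sketched for a single cell as follows. This is a minimal illustration under stated assumptions, not velodyn's implementation: the function name `bootstrap_cell` and the toy counts are hypothetical, the cell is assumed to have at least one read, and the sketch stops at resampled counts rather than recomputing RNA velocity for each bootstrap sample.

```python
import numpy as np

def bootstrap_cell(counts, n_iter=100, rng=None):
    """Parametric bootstrap of one cell's read counts.

    counts: [n_genes, 3] observed (spliced, unspliced, ambiguous) reads.
    Returns an [n_iter, n_genes, 3] array of resampled counts.
    Hypothetical helper, not part of velodyn.
    """
    rng = np.random.default_rng() if rng is None else rng
    gene_totals = counts.sum(axis=1)
    depth = int(gene_totals.sum())
    # first multinomial: how reads are distributed across genes
    gene_probs = gene_totals / depth
    # second multinomial, per gene: spliced / unspliced / ambiguous.
    # Genes with no reads get a placeholder that is never sampled from.
    safe_totals = np.where(gene_totals > 0, gene_totals, 1)
    type_probs = np.where(
        gene_totals[:, None] > 0, counts / safe_totals[:, None], 1.0 / 3.0
    )
    samples = np.zeros((n_iter,) + counts.shape, dtype=np.int64)
    for i in range(n_iter):
        # sample reads to the observed depth across genes
        reads_per_gene = rng.multinomial(depth, gene_probs)
        for g in np.flatnonzero(reads_per_gene):
            # split each gene's reads among the three read types
            samples[i, g] = rng.multinomial(reads_per_gene[g], type_probs[g])
    return samples

rng = np.random.default_rng(0)
observed = np.array([[8, 2, 0], [5, 5, 1], [0, 0, 0], [20, 3, 2]])
boot = bootstrap_cell(observed, n_iter=200, rng=rng)
print(boot.shape)  # (200, 4, 3)
# percentile interval on the spliced count of the first gene
low, high = np.percentile(boot[:, 0, 0], [2.5, 97.5])
```

In the full procedure, each bootstrap sample would be passed through RNA velocity inference, and the percentile interval would be taken over the resulting velocity vectors rather than over raw counts.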

Usage

from velodyn.velocity_ci import VelocityCI

# initialize velocity CI
vci = VelocityCI(
    adata=adata,
)
# sample velocity vectors
# returns [n_iter, Cells, Genes]
velocity_bootstraps = vci.bootstrap_velocity(
    n_iter=n_iter,
    save_counts=out_path,
    embed=adata_embed, # experiment with genes of interest and relevant embedding
)

velodyn's Issues

Deprecated numpy bool resulting in error for simulator.set_velocity_field()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[83], line 17
     13 simulator = PhaseSimulation(
     14     adata=adata_vel,
     15 )
     16 # set the velocity basis to use
---> 17 simulator.set_velocity_field(basis='pca')
     18 # set starting locations for phase points
     19 # using a categorical variable in `adata.obs`
     20 simulator.set_starting_point(
     21     method='metadata', 
     22     groupby='starting_points',
     23     group='forward',
     24 )

File ~/miniconda3/envs/remote_singlecellseq/lib/python3.9/site-packages/velodyn/velocity_dynsys.py:158, in PhaseSimulation.set_velocity_field(self, groupby, group, basis)
    156     bidx = self.adata.obs[groupby] == group
    157 else:
--> 158     bidx = np.ones(self.adata.shape[0]).astype(np.bool)
    160 # get the relevant cells from the grouping
    161 self.vadata = self.adata[bidx, :].copy()

File ~/miniconda3/envs/remote_singlecellseq/lib/python3.9/site-packages/numpy/__init__.py:324, in __getattr__(attr)
    319     warnings.warn(
    320         f"In the future `np.{attr}` will be defined as the "
    321         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    323 if attr in __former_attrs__:
--> 324     raise AttributeError(__former_attrs__[attr])
    326 if attr == 'testing':
    327     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I believe for line 158 of velodyn/velocity_dynsys the np.bool needs to be changed to bool because of Numpy deprecation errors. Thank you!
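
For reference, a minimal sketch of the suggested change. It uses a hypothetical `n_cells` in place of `self.adata.shape[0]` and is not a patch against the actual file:

```python
import numpy as np

n_cells = 5  # hypothetical stand-in for self.adata.shape[0]

# replace the removed `np.bool` alias with the builtin `bool`,
# as the NumPy error message recommends
bidx = np.ones(n_cells).astype(bool)

# equivalent result without the intermediate float array
bidx_direct = np.ones(n_cells, dtype=bool)

print(bidx.dtype)  # bool
```

Either form yields an all-True boolean mask that selects every cell.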
