harmony's Introduction

Harmony

Harmony is a unified framework for data visualization, analysis and interpretation of scRNA-seq data measured across discrete time points. Harmony constructs an augmented affinity matrix by augmenting the kNN graph affinity matrix with mutually nearest neighbors between successive time points. This augmented affinity matrix forms the basis for generating a force-directed layout for visualization and also serves as input for computing the diffusion operator, which can be used for trajectory detection using Palantir.

Installation and dependencies

  1. Harmony has been implemented in Python3 and can be installed using:

     $> pip install harmonyTS
     $> pip install palantir
    
  2. Harmony depends on a number of Python 3 packages available on PyPI. These dependencies are listed in setup.py and are installed automatically by the commands above.

  3. To uninstall:

     $> pip uninstall harmonyTS
    
  4. If you would like to determine gene expression trends, please install the R programming language and the R package GAM. You will also need to install the rpy2 module using:

     $> pip install rpy2
    
  5. If you would like to speed up the analysis of big datasets, you can run the main functions of this package on a CUDA GPU. To do so, please install rapids-0.17 as well as cupy>=9.0.

Usage

A tutorial on Harmony usage and results visualization for single cell RNA-seq data can be found in this notebook: http://nbviewer.jupyter.org/github/dpeerlab/Harmony/blob/master/notebooks/Harmony_sample_notebook.ipynb

The datasets generated as part of the manuscript and harmonized using Harmony are available for exploration at: endoderm-explorer.com

Citations

Harmony was used to harmonize datasets across multiple time points in our manuscript characterizing mouse gut endoderm development. This manuscript is available at Nature. If you use Harmony for your work, please cite our paper.

harmony's People

Contributors

awnimo, katosh, louisfaure, manusetty

harmony's Issues

PyPI release

Hi, now that harmony is in scanpy.ext, it should be available from pip.

It’s no problem that there’s already something called “harmony”. You just have to make the distribution name (the one on PyPI that you can pip install) different from the module name (the one you import).

Installation error

I use pip install harmonyTS and get the following error:

Building wheels for collected packages: fa2
Building wheel for fa2 (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [33 lines of output]
Installing fa2 package (fastest forceatlas2 python implementation)

  >>>> Cython is installed?
  Yes
  
  >>>> Starting to install!
  
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.9-x86_64-cpython-39
  creating build/lib.macosx-10.9-x86_64-cpython-39/fa2
  copying fa2/fa2util.py -> build/lib.macosx-10.9-x86_64-cpython-39/fa2
  copying fa2/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-39/fa2
  copying fa2/forceatlas2.py -> build/lib.macosx-10.9-x86_64-cpython-39/fa2
  running egg_info
  writing fa2.egg-info/PKG-INFO
  writing dependency_links to fa2.egg-info/dependency_links.txt
  writing requirements to fa2.egg-info/requires.txt
  writing top-level names to fa2.egg-info/top_level.txt
  [03/11/24 17:51:10] ERROR    listing git files failed - pretending     git.py:24
                               there aren't any
  reading manifest file 'fa2.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'fa2.egg-info/SOURCES.txt'
  copying fa2/fa2util.c -> build/lib.macosx-10.9-x86_64-cpython-39/fa2
  copying fa2/fa2util.pxd -> build/lib.macosx-10.9-x86_64-cpython-39/fa2
  running build_ext
  Compiling fa2/fa2util.py because it changed.
  [1/1] Cythonizing fa2/fa2util.py
  building 'fa2.fa2util' extension
  error: unknown file type '.pxd' (from 'fa2/fa2util.pxd')
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for fa2
Running setup.py clean for fa2
Failed to build fa2
ERROR: Could not build wheels for fa2, which is required to install pyproject.toml-based projects

GPU accelerated package

Dear Harmony developers,

I really like your tool, as it is working great with my developmental data! To remove any waiting time when applying it to huge datasets, I converted the core as well as the ForceAtlas2 embedding generation functions to CUDA-accelerated ones here: https://github.com/LouisFaure/Harmony-GPU

If you think it would be relevant to integrate this into the original package, I could propose a pull request.

Cheers

Use precomputed pca

Hi dear,
Great job with Harmony! I wonder whether it is possible to extend harmony.core.augmented_affinity_matrix so that the user can input precomputed reduced-dimension data (e.g. PCA, ICA ...)?
Thanks

Ordering of the timepoints

In my AnnData object, I have a field adata.obs['day'], which is categorical. Calling adata.obs['day'].cat.categories yields

Index(['0', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13',
       '15', '21', '36'],
      dtype='object')

So the values are strings in the right order. However, when I call Harmony using the scanpy interface, the timepoint connections are created using

    timepoints = adata.obs[tp].unique().tolist()
    timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)

which permutes my timepoints to a random order. To keep the order, I need to change this to

    timepoints = list(adata.obs[tp].cat.categories)
    timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)

In my opinion, it would be very important to have some information in the docstring about the format the time point annotation needs to have in order to produce the expected results. It would also be good to check the dtype of the passed .obs annotation and to create the timepoint connections accordingly, as this is really critical.
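
The difference described in this issue can be reproduced with plain pandas. In pandas, `unique()` returns values in order of first appearance, while `.cat.categories` returns the declared category order. The sketch below shows both, together with a dtype check of the kind suggested above. The helper name `ordered_timepoints` and the toy data are assumptions for illustration and are not part of harmony or scanpy.

```python
import pandas as pd

def ordered_timepoints(labels):
    """Return time point labels in an explicit, reproducible order.

    Hypothetical helper for illustration only; not part of harmony or scanpy.
    """
    if isinstance(labels.dtype, pd.CategoricalDtype):
        # Categorical annotation: use the category order declared by the user
        return list(labels.cat.categories)
    # Anything else: fall back to sorted unique values.
    # Note that strings sort lexicographically, so '10' sorts before '2'.
    return sorted(labels.unique())

# Toy annotation: cells are not stored in chronological order
days = pd.Series(
    pd.Categorical(["10", "0", "2", "0", "10"], categories=["0", "2", "10"])
)

appearance_order = days.unique().tolist()  # order of first appearance
category_order = ordered_timepoints(days)  # declared category order

print(appearance_order)
print(category_order)
```

With string labels such as '0', '2', '10', relying on the categorical order avoids both the appearance-order problem and the lexicographic-sort problem.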

Find dynamic genes in branches

Hi,

I have used harmony for my data and it works quite well; it is a great tool. I have another question, which may be outside the scope of harmony. I have already found the different branches in my data, but I want to understand why these branches exist. So we want to check which genes are dynamically expressed within one branch, and which genes are differentially expressed between branches. Does harmony have a function for this, or do you have any suggestions on how to do this with other tools? I would really appreciate your help.

Best regards,
Jphe

Is Harmony deterministic?

I have run Harmony several times with the same input and gotten different results. Is Harmony expected to be deterministic? If so, I can try to work up a minimal example of what I am seeing. If not, then that's fine; I just want to know what to expect. Thanks!

Pandas error in sample Harmony code

Hello all,

We're trying out the Harmony code in the sample notebook and it runs fine until it hits the line:

    hvg_genes = harmony.utils.hvg_genes(norm_df)

where it throws the following error:

    Traceback (most recent call last):
        ...
        [snip]
        ...  
        File "/home/.../.local/lib/python3.6/site-packages/pandas/core/indexes/category.py", line 503, in reindex
        raise ValueError("cannot reindex with a non-unique indexer")
    ValueError: cannot reindex with a non-unique indexer

Has anyone seen this before? Alternatively, are there specific library versions we should be running against? Here are the versions of the underlying libs that we're using:

    Library   Version
    Harmony   0.1
    Palantir  0.2.1
    Pandas    0.24.2
    NumPy     1.16.3

We'd really love to use the software - so any ideas or advice is welcome.

Cheers,
Bill
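
The issue does not establish the cause of this error. One common trigger for "cannot reindex with a non-unique indexer" in pandas is duplicated labels, for example repeated gene names in the columns. Assuming that is what happened here (an unconfirmed assumption), a check along these lines may help narrow it down. The toy data frame stands in for `norm_df`.

```python
import pandas as pd

# Toy stand-in for norm_df (cells x genes) with a repeated gene name
norm_df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["cell1", "cell2"],
    columns=["GeneA", "GeneB", "GeneA"],
)

# True if any column label occurs more than once
has_duplicates = not norm_df.columns.is_unique
print(has_duplicates)

# Keep only the first occurrence of each column label
deduplicated = norm_df.loc[:, ~norm_df.columns.duplicated()]
print(list(deduplicated.columns))
```

Whether dropping, summing or renaming duplicated genes is appropriate depends on the dataset; keeping the first occurrence is just the simplest option to show.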

can't import harmony

Hi, my name is Martin and I want to use harmony for my data, but I have a little problem:

    import harmony
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/usr/local/lib/python3.5/dist-packages/harmony/__init__.py", line 1, in <module>
        from . import core
      File "/usr/local/lib/python3.5/dist-packages/harmony/core.py", line 58
        print(f'Constucting affinities between {t1} and {t2}...')
        ^

Can you help me to understand this error?
Do I have the right version of Python to use harmony and palantir?
thanks for your help
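
The line flagged in the traceback uses an f-string, and the paths point to a Python 3.5 installation. F-strings were introduced in Python 3.6, so an older interpreter rejects that line as a syntax error. A small check such as the following can confirm which interpreter is running. The helper name `supports_fstrings` is made up for this example.

```python
import sys

def supports_fstrings(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.6 or newer (f-string support).

    Hypothetical helper for illustration only.
    """
    return tuple(version_info[:2]) >= (3, 6)

print(sys.version_info[:3], supports_fstrings())
```

If this prints `False`, installing the package under Python 3.6 or later should get past this particular error.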

load 10X data

Hi,

I'm trying to use harmony for my own 10X scRNA-seq data. The test data is a csv file, so I'm wondering how I can load 10X data into harmony. Should I convert it into a csv, or is it possible to use output from scanpy?

Thanks,
Jphe

Scanpy Implementation does not expose n_neighbors

The scanpy implementation computes the augmented affinity matrix via

    # compute the augmented and non-augmented affinity matrices
    aug_aff, aff = harmony.core.augmented_affinity_matrix(
        adata.to_df(), adata.obs[tp], timepoint_connections, pc_components=n_components,
    )

Unfortunately, it does not expose the n_neighbors parameter, which would be quite important to be able to choose.

Meaning and function of the `n_components` parameter

Harmony has an n_components parameter, which, according to the docstring:

:param pc_components: Minimum number of principal components to use. Specify `None` to use pre-computed components

That value is used for utils.run_pca, but it's not passed over to scanpy's neighbor computation, see

sc.pp.neighbors(temp, n_pcs=0, n_neighbors=n_neighbors)

So I wonder what the significance of that parameter actually is?

Also, I find the default value of 1000 a bit high, as scanpy's default here is much smaller, 50 I believe

int16 not suitable for fluidigm data

I'm using harmony to analyse fluidigm data. Although it's working brilliantly now, I had to alter the int16 datatype in utils.load_from_csvs, as some of our gene counts are too high to be stored as int16.

The data didn't fail to load initially; it just gave me negative counts in the matrix, so I thought I would flag this up.

Thanks!

Anna
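
The negative counts are consistent with integer wraparound: int16 can hold values up to 32767, and NumPy's `astype` wraps larger values silently instead of raising an error. The sketch below reproduces the effect with made-up counts and shows a simple range check. The helper `fits_in_dtype` is hypothetical, and how a wider dtype would be passed to `utils.load_from_csvs` is not covered here.

```python
import numpy as np

def fits_in_dtype(values, dtype=np.int16):
    """Return True if every value lies within the range of the integer dtype.

    Hypothetical helper for illustration only; not part of harmony.
    """
    info = np.iinfo(dtype)
    return bool(values.min() >= info.min and values.max() <= info.max)

# Made-up gene counts; the last one exceeds the int16 maximum of 32767
counts = np.array([120, 32767, 40000])

# astype does not raise: 40000 wraps around to 40000 - 65536 = -25536
wrapped = counts.astype(np.int16)
print(wrapped)

print(fits_in_dtype(counts, np.int16))  # range check before casting
print(fits_in_dtype(counts, np.int32))
```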

Harmony data pre-processing and time connection

Hi,

When I load the data with

    counts = harmony.utils.load_from_csvs(csv_files, sample_names)

I found that values greater than 32767 were converted to negative values. The default dtype is int16, so the maximum allowed value is 32767. Should the default dtype be widened?

And if I have multiple time points, should the time connection data frame look like this?
       0  1
    0  0  1
    1  1  2
    2  2  3
    3  3  4
    4  4  5
    5  5  6
    6  6  7
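
A data frame of this shape can be built from an ordered list of time points by pairing each one with its successor, using the same expression that the scanpy interface uses (quoted in the "Ordering of the timepoints" issue). The integer labels below are only placeholders. Whether the entries should be integers or sample names is not settled by this issue.

```python
import numpy as np
import pandas as pd

# Placeholder labels for eight successive time points, already in order
timepoints = [0, 1, 2, 3, 4, 5, 6, 7]

# Pair each time point with the next one: (0, 1), (1, 2), ..., (6, 7)
timepoint_connections = pd.DataFrame(
    np.array([timepoints[:-1], timepoints[1:]]).T
)
print(timepoint_connections)
```

Each row is one connection between successive time points, so eight time points give seven rows.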

Computational efficiency of KNN graph computation

I realised that the first step of harmony, the standard KNN graph computation (output: Nearest neighbor computation...) takes really long - around 10 to 20x longer than the same operation takes in scanpy. Would it be possible to use scanpy's KNN graph implementation in the first step of the algorithm? It would speed up things considerably.

I know that harmony uses a different kernel to get the graph connectivities/affinities, but I guess this could be implemented as a method in scanpy.pp.neighbors, besides umap, gauss and rapid.
