# Deep Interpolation

deepinterpolation is a Python library to denoise data by removing independent noise. Importantly, training does NOT require ground truth. This repository currently supports the results of the bioRxiv publication: https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1

# Principle of Deep Interpolation

Figure 1 - Schematic introducing the principles of deep interpolation. A. An interpolation model is trained to predict a noisy block from other blocks with independent noise. The loss is the difference between the predicted data and a new noisy block. B. The interpolation model is used to create a noiseless version of the input data.
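The idea in panel A can be illustrated with a toy one-dimensional trace. The sketch below is not the trained network from this package: it uses a fixed averaging interpolator as a stand-in, and the function names, the sine signal and the window size are all assumptions chosen for illustration. It predicts each sample from its neighbours only (the sample itself is left out). The loss measured against the noisy target stays near the noise level, while the prediction is much closer to the clean signal than the raw data is.

```python
import math
import random


def make_traces(n_samples, noise_std, seed=0):
    """Return a clean sine trace and a copy with independent Gaussian noise."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * i / 200) for i in range(n_samples)]
    noisy = [value + rng.gauss(0.0, noise_std) for value in clean]
    return clean, noisy


def predict_center(noisy, index, n_before, n_after):
    """Predict sample `index` from its neighbours; the sample itself is left out."""
    before = noisy[index - n_before:index]
    after = noisy[index + 1:index + 1 + n_after]
    neighbours = before + after
    return sum(neighbours) / len(neighbours)


def rmse(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


clean, noisy = make_traces(n_samples=2000, noise_std=0.5)
n_before = n_after = 15
# Only samples with enough neighbours on both sides are predicted.
indices = range(n_before, len(noisy) - n_after)

predicted = [predict_center(noisy, i, n_before, n_after) for i in indices]
clean_mid = [clean[i] for i in indices]
noisy_mid = [noisy[i] for i in indices]

loss_vs_noisy_target = rmse(predicted, noisy_mid)  # what training can measure
error_before = rmse(noisy_mid, clean_mid)          # noise level of the raw data
error_after = rmse(predicted, clean_mid)           # residual error of the prediction

print(f"loss against noisy target: {loss_vs_noisy_target:.3f}")
print(f"raw error against clean signal: {error_before:.3f}")
print(f"predicted error against clean signal: {error_after:.3f}")
```

The clean signal is only available here because the data is synthetic; with real recordings, only the first quantity can be computed.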

For more information, consult the associated bioRxiv publication: https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1

# Support

For bugs and issues, please submit issue tickets on this repository. For installation and usage support, we are trying to move to the public discussion forum on this repository (https://github.com/AllenInstitute/deepinterpolation/discussions). Alternatively, you can join the Slack channel, where the past support history is saved (if the invitation has expired, email Jerome): https://join.slack.com/t/deepinterpolation/shared_invite/zt-rkmcw7h1-v8y0Grwe3fZg4m~DiAQVMg

# Installation

In all cases, unless you only want to run on CPU, you will need to install the TensorFlow GPU dependencies (i.e. CUDA drivers). Consult the TensorFlow documentation to enable your GPU.

To install the package, you have 2 options.

  1. Install from PyPI:

Create a new conda environment called 'local_env':

    conda create -n local_env python=3.7

Our integration tests on the CI server currently run with Python 3.7. The package likely works with other versions, but we cannot guarantee it.

    pip install deepinterpolation

This will install the latest deployed stable version and only the core components of the library. You will NOT have access to sample datasets present on this repository.

  2. Install from a clone of this repository.

This will give you access to the latest developments as well as the provided sample data. Our step-by-step examples assume this installation mode, as they depend on the sample datasets.

The small training examples below work on both CPU and GPU architectures (i.e. even on a small MacBook). If you are not familiar with deep learning, we recommend experimenting with smaller datasets first, such as the example Neuropixel data provided.

  • activate environment

    conda activate local_env

  • install necessary packages

    make init

  • install deepinterpolation package

    python setup.py install

# Description and use of the Command Line Interface (CLI)

DeepInterpolation 0.1.3 introduced a refactored interface to use the package. The purpose of this mode is to facilitate deployment of deepinterpolation and provide a consistent API. Example uses of the CLI are provided in the examples/ folder under cli*.

There are two modes that you can use:

  • Scripting mode:

In this mode, you construct a set of parameter dictionaries and feed them to the training, inference or fine-tuning objects within a Python script. This mode is useful for iterating on and improving your jobs. Examples of this mode are provided in the examples/ folder as cli*.py files.

  • Command-line mode:

In this mode, you save the dictionary to a JSON file and provide the path to this file as a parameter on the command line. This mode is useful for deploying your jobs at a larger scale. Typically, your JSON file stays mostly the same from job to job. Examples of this mode are provided in the examples/ folder as cli*.sh and cli*.json files.
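The relationship between the two modes can be sketched as follows: the same dictionary used in a script is serialized to a JSON file for command-line use. This is a minimal sketch using only the standard library. Only start_frame and end_frame are parameter names taken from this README; the section name, the other key and all values are hypothetical placeholders, and the real schema is the one printed by the `--help` output of each CLI module.

```python
import json
import tempfile
from pathlib import Path

# Only start_frame and end_frame are named in this README; the section name,
# the other key and all values below are hypothetical placeholders.
job_params = {
    "generator_params": {
        "start_frame": 100,
        "end_frame": 200,
    },
    "output_folder": "path/to/your/models",
}


def save_params(params, json_path):
    """Write a parameter dictionary to a JSON file and return the path."""
    json_path = Path(json_path)
    json_path.write_text(json.dumps(params, indent=2))
    return json_path


def load_params(json_path):
    """Read a parameter dictionary back from a JSON file."""
    return json.loads(Path(json_path).read_text())


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = save_params(job_params, Path(tmp_dir) / "training_job.json")
    reloaded = load_params(json_path)

# The file carries exactly the dictionary built in the script.
print(reloaded == job_params)
```

Keeping one base dictionary and overriding a few keys per job is one way to get JSON files that are mostly identical from job to job.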

All parameters of the CLI are documented within the schema. To access the documentation, run:

    python -m deepinterpolation.cli.training --help

or

    python -m deepinterpolation.cli.inference --help

or

    python -m deepinterpolation.cli.fine_tuning --help

# General package description

The files in the deepinterpolation folder contain the core classes for training, inference, loss calculation and network generation. These are called 'Collections'. Each collection is essentially a local list of functions that are used to create different types of objects and that can build on one another. For instance, network_collection.py contains a list of networks that can be generated for training. This allows for quick iteration and modification of an architecture while keeping the code organized.
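The collection pattern described above can be illustrated with a small, self-contained sketch. It does not reproduce the contents of network_collection.py: the function names, arguments and lookup helper are invented for illustration, and the "networks" are plain dictionaries rather than real models.

```python
# Hypothetical stand-in for a collection module such as network_collection.py.
# The function names and arguments here are invented for illustration.

def small_network(n_layers=2):
    """Describe a small architecture as a plain dictionary."""
    return {"name": "small_network", "n_layers": n_layers}


def deeper_network(n_layers=2):
    """Build on the small architecture by reusing it with extra layers."""
    description = small_network(n_layers=n_layers + 4)
    description["name"] = "deeper_network"
    return description


# The collection is just a named list of builder functions.
NETWORK_COLLECTION = {
    "small_network": small_network,
    "deeper_network": deeper_network,
}


def build_from_collection(collection, name, **kwargs):
    """Look up a builder by name and call it with the given arguments."""
    try:
        builder = collection[name]
    except KeyError:
        available = sorted(collection)
        raise ValueError(f"unknown entry {name!r}; available: {available}") from None
    return builder(**kwargs)


network = build_from_collection(NETWORK_COLLECTION, "deeper_network", n_layers=3)
print(network)
```

Selecting an entry by name is what makes it possible to switch architectures by changing a single string in a parameter dictionary.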

# FAQ

See here: https://github.com/AllenInstitute/deepinterpolation/tree/master/faq

# Example training

To try out training your own DeepInterpolation network, we recommend starting with this file: https://github.com/AllenInstitute/deepinterpolation/blob/master/examples/cli_example_tiny_ephys_training.py

In this file, you will need to edit the paths to a local folder appropriate to save your models.

Then, activate your conda env called 'local_env':

    conda activate local_env

then run:

    python cli_example_tiny_ephys_training.py

If everything runs correctly, you should see the following in just a few minutes:

    2020-10-19 18:01:03.735098: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    sh: sysctl: command not found
    2020-10-19 18:01:03.749184: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9b1f115860 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-10-19 18:01:03.749202: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
    WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
    Epoch 1/5
    10/10 [==============================] - 19s 2s/step - loss: 0.4597 - val_loss: 0.3987
    Epoch 2/5
    10/10 [==============================] - 20s 2s/step - loss: 0.3796 - val_loss: 0.3785
    Epoch 3/5
    10/10 [==============================] - 22s 2s/step - loss: 0.3646 - val_loss: 0.3709
    Epoch 4/5
    10/10 [==============================] - 21s 2s/step - loss: 0.3797 - val_loss: 0.3698
    Epoch 5/5
    10/10 [==============================] - 21s 2s/step - loss: 0.3835 - val_loss: 0.3675
    Saved model to disk

This is a toy example, but you can increase the number of training frames to improve the quality of the model. All parameters are commented in the file. To adapt it to a larger dataset, change the path parameters and the start_frame and end_frame parameters. Please consult the CLI documentation mentioned above for more details on each parameter.

# Example inference

Raw pre-trained models are available as separate h5 files on Dropbox.

The following models are currently available :

Two-photon Ai93 excitatory line DeepInterpolation network:

Key recording parameters:

Two-photon Ai148 excitatory line DeepInterpolation network:

Key recording parameters:

Neuropixel DeepInterpolation network:

Key recording parameters:

  • Neuropixels Phase 3a probes
  • 374 simultaneous recording sites across 3.84 mm, 10 reference channels
  • Four-column checkerboard site layout with 20 µm spacing between rows
  • 30 kHz sampling rate
  • 500x hardware gain setting
  • 500 Hz high pass filter in hardware, 150 Hz high-pass filter applied offline.
  • Pre-processing: Median subtraction was applied to individual probes to remove signals that were common across all recording sites. Each probe recording was mean-centered and normalized with a single pair of values for all nodes on the probe.
  • Docker hub id : 245412653747/deep_interpolation:allen_neuropixel
  • Dropbox link : https://www.dropbox.com/sh/tm3epzil44ybalq/AACyKxfvvA2T_Lq_rnpHnhFma?dl=0
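The pre-processing listed above for the Neuropixel model can be sketched on a toy recording. This is an interpretation, not the code used for the published model. It assumes the median is taken across sites at each time sample, and that the "single pair of values" is one mean and one standard deviation computed over the whole probe. The function names are hypothetical.

```python
import statistics


def subtract_common_median(recording):
    """Remove, at each time sample, the median across all sites of one probe.

    `recording` is a list of time samples, each a list with one value per site.
    """
    cleaned = []
    for sample in recording:
        common = statistics.median(sample)
        cleaned.append([value - common for value in sample])
    return cleaned


def normalize_probe(recording):
    """Mean-center and scale a probe with one (mean, std) pair for all sites."""
    flat = [value for sample in recording for value in sample]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    normalized = [[(value - mean) / std for value in sample] for sample in recording]
    return normalized, (mean, std)


# Toy recording: 4 time samples x 3 sites, with a shared offset on each sample.
toy_recording = [
    [10.0, 11.0, 12.0],
    [20.0, 22.0, 21.0],
    [5.0, 4.0, 6.0],
    [0.0, 1.0, -1.0],
]
without_common = subtract_common_median(toy_recording)
normalized, (mean, std) = normalize_probe(without_common)
print(without_common[0], mean, round(std, 4))
```

Data fed to a pre-trained model should go through the same pre-processing as the data it was trained on, which is why these steps are listed with the recording parameters.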

fMRI DeepInterpolation network:

Key recording parameters:

To start inference, we recommend starting with this file: https://github.com/AllenInstitute/deepinterpolation/blob/master/examples/cli_example_tiny_ephys_inference.py

In this file, you will need to edit the path strings to fit your local paths.

Then, activate your conda env called 'local_env':

    conda activate local_env

then run:

    python cli_example_tiny_ephys_inference.py

If everything runs correctly, you should see the following in just a few minutes:

    2020-10-20 14:10:37.549061: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    sh: sysctl: command not found
    2020-10-20 14:10:37.564133: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f82ada8a520 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-10-20 14:10:37.564156: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version

This is a toy example, but you can adjust the start_frame and end_frame variables to process larger data.

It is important to keep in mind that this process is easily parallelizable. In practice, we wrapped this code with additional routines to leverage 20 to 100 cluster CPU nodes and accelerate the process. You could use GPU nodes as well; we simply had quick access to a much larger number of CPU machines.
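One way to parallelize inference is to give each node its own range of frames. The sketch below is not the wrapper used by the authors. It is a hypothetical helper that splits a frame range into contiguous chunks, and it assumes a half-open interval (end_frame excluded), which may differ from how the package interprets end_frame.

```python
def split_frame_range(start_frame, end_frame, n_jobs):
    """Split [start_frame, end_frame) into at most n_jobs contiguous chunks."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    total = end_frame - start_frame
    if total <= 0:
        return []
    n_jobs = min(n_jobs, total)  # never create empty chunks
    base, extra = divmod(total, n_jobs)
    chunks = []
    chunk_start = start_frame
    for job_index in range(n_jobs):
        # The first `extra` chunks take one additional frame each.
        chunk_len = base + (1 if job_index < extra else 0)
        chunks.append((chunk_start, chunk_start + chunk_len))
        chunk_start += chunk_len
    return chunks


chunks = split_frame_range(start_frame=100, end_frame=1100, n_jobs=3)
print(chunks)  # [(100, 434), (434, 767), (767, 1100)]
```

Each chunk can then be written into its own parameter file with its own start_frame and end_frame. A real job may also need to read extra frames on either side of each chunk, since the model predicts a frame from its neighbours; that padding is not handled here.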

# Adapting the module to a newer data structure

To adapt DeepInterpolation to a new dataset, you will need to use or recreate a generator in 'generator_collection.py'. These are all constructed from core classes called DeepGenerator and SequentialGenerator.

The CollectorGenerator class lets you group generators if your dataset is distributed across many files, folders or sources. This system was designed to train very large DeepInterpolation models from terabytes of data distributed on a network infrastructure. The CollectorGenerator is not currently supported through the CLI and will be replaced with a simpler API in a future release.
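To give a sense of what a generator has to provide, here is a stand-alone sketch of a batch generator for a new data source. It does not subclass DeepGenerator or SequentialGenerator, and it does not reproduce their interface. The class name, the parameter names and the batching scheme are hypothetical; consult generator_collection.py for the methods a real generator must implement.

```python
class CenterFrameGenerator:
    """Yield (input frames, target frame) batches from an in-memory recording.

    Stand-alone illustration: it does not subclass DeepGenerator or
    SequentialGenerator, and every parameter name here is hypothetical.
    """

    def __init__(self, frames, n_before, n_after, batch_size):
        self.frames = frames
        self.n_before = n_before
        self.n_after = n_after
        self.batch_size = batch_size
        # Only frames with enough neighbours on both sides can be targets.
        self.targets = list(range(n_before, len(frames) - n_after))

    def __len__(self):
        """Number of full batches per epoch."""
        return len(self.targets) // self.batch_size

    def __getitem__(self, batch_index):
        """Return one batch as (list of input windows, list of target frames)."""
        if not 0 <= batch_index < len(self):
            raise IndexError("batch index out of range")
        start = batch_index * self.batch_size
        batch_targets = self.targets[start:start + self.batch_size]
        inputs, outputs = [], []
        for t in batch_targets:
            before = self.frames[t - self.n_before:t]
            after = self.frames[t + 1:t + 1 + self.n_after]
            inputs.append(before + after)  # the target frame itself is omitted
            outputs.append(self.frames[t])
        return inputs, outputs


# Toy data: 12 "frames", each a single number so the slicing is easy to follow.
frames = list(range(12))
generator = CenterFrameGenerator(frames, n_before=2, n_after=2, batch_size=4)
first_inputs, first_outputs = generator[0]
print(len(generator), first_inputs[0], first_outputs)
```

In a real generator, `frames` would be read lazily from your file format rather than held in memory, and each frame would be an array rather than a single number.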

# License

Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

Copyright © 2019. Allen Institute. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Redistributions and use for commercial purposes are not permitted without the Allen Institute’s written permission. For purposes of this license, commercial purposes are the incorporation of the Allen Institute's software into anything for which you will charge fees or other compensation or use of the software to perform a commercial service for a third party. Contact [email protected] for commercial licensing opportunities.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
