
phy's Introduction

phy: interactive visualization and manual spike sorting of large-scale ephys data


phy is an open-source Python library providing a graphical user interface for visualization and manual curation of large-scale electrophysiological data. It is optimized for high-density multielectrode arrays containing hundreds to thousands of recording sites (mostly Neuropixels probes).

phy provides two GUIs:

  • Template GUI (recommended): for datasets sorted with KiloSort and SpyKING CIRCUS,
  • Kwik GUI (legacy): for datasets sorted with klusta and klustakwik2.

phy 2.0b1 screenshot

What's new

  • [5 June 2024] Release of phy 2.0 beta 6, with bug fixes, installation improvements, and deprecation fixes
  • [7 Sep 2021] Release of phy 2.0 beta 5, with installation fixes and bug fixes
  • [7 Feb 2020] Release of phy 2.0 beta 1, with many new views, new features, various improvements, and bug fixes


Hardware requirements

It is recommended to store the data on an SSD for performance reasons.

There are no specific GPU requirements as long as relatively recent graphics and OpenGL drivers are installed on the system.

Installation instructions

Run the following commands in a terminal:

  1. Create a new conda environment with the conda dependencies:

    conda create -n phy2 -y python=3.11 cython dask h5py joblib matplotlib numpy pillow pip pyopengl pyqt pyqtwebengine pytest qtconsole requests responses scikit-learn scipy traitlets
    
  2. Activate the new conda environment with conda activate phy2

  3. Install the development version of phy: pip install git+https://github.com/cortex-lab/phy.git

  4. [OPTIONAL] If you plan to use the Kwik GUI, type pip install klusta klustakwik2

  5. phy should now be installed. Open the GUI on a dataset as follows (the phy2 environment should still be activated):

    cd path/to/my/spikesorting/output
    phy template-gui params.py

  6. If you run into problems with this method, there is also an environment.yml file that allows for automatic installation of the necessary packages. Give that a try.

Dealing with the error ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

In some environments, you might get an error message related to QtWebEngineWidgets. Run the command pip install PyQtWebEngine and try launching phy again. Do not run this command unless the error appears, as it could break the PyQt5 installation.

Upgrading from phy 1 to phy 2

  • Do not install phy 1 and phy 2 in the same Python environment.
  • It is recommended to delete ~/.phy/*GUI/state.json when upgrading.

Developer instructions

To install the development version of phy in a fresh environment, do:

# Install the development version of phy in editable mode:
git clone git@github.com:cortex-lab/phy.git
cd phy
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
cd ..

# Install the development version of phylib, a dependency of phy:
git clone git@github.com:cortex-lab/phylib.git
cd phylib
pip install -e . --upgrade

Troubleshooting

Running phy from a Python script

In addition to launching phy from the terminal with the phy command, you can also launch it from a Python script or an IPython terminal. This may be useful when debugging or profiling. Here's a code example to copy into a new launch.py file within your data directory:

from phy.apps.template import template_gui
template_gui("params.py")
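Then run python launch.py from that directory, with the phy2 environment activated.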

Credits

phy is developed and maintained by Cyrille Rossant.

Contributors to the repository are:

phy's People

Contributors

alejoe91, cgestes, crnolan, czuba, gitter-badger, lshaheen, mspacek, mswallac, nippoo, nsteinme, rossant, samminkowicz, shabnamkadir, szapp, ycanerol, zm711


phy's Issues

Data structure for cluster-dependent information

We need an efficient structure for per-cluster data.

  • Based on 1D, 2D, or 3D NumPy arrays
  • Cluster list along 1 axis (e.g. cluster statistics) or 2 axes (e.g. CCGs)
  • Fast cluster indexing
  • Fast update when the cluster assignments change
  • Arbitrary cluster indices
  • Relabelling

We'll probably need a dynamic array implementation on top of NumPy (inspired by this, for example). For the dual-cluster-axis case (CCGs) we'll need something specific as well.

Ideally, this structure would contain a cluster_map variable with the cluster assignments for all spikes. When this variable is changed, the internal arrays are updated.

cc @nippoo
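A rough sketch of the idea (the ClusterStats name and the spike-count statistic are placeholders, not a proposed API):

import numpy as np

class ClusterStats:
    """Per-cluster statistics with lazy recomputation when the
    spike-to-cluster assignments change (minimal sketch)."""
    def __init__(self, cluster_map):
        self.cluster_map = np.asarray(cluster_map)  # cluster id of each spike
        self._cache = {}  # cluster id -> cached statistic

    def n_spikes(self, cluster):
        # Fast cluster indexing through a lazy cache.
        if cluster not in self._cache:
            self._cache[cluster] = int((self.cluster_map == cluster).sum())
        return self._cache[cluster]

    def update(self, new_cluster_map):
        # Invalidate only the clusters whose assignments changed.
        new_cluster_map = np.asarray(new_cluster_map)
        changed = self.cluster_map != new_cluster_map
        for c in np.unique(np.r_[self.cluster_map[changed],
                                 new_cluster_map[changed]]):
            self._cache.pop(int(c), None)
        self.cluster_map = new_cluster_map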

Probe widget

  • HTML/SVG/d3.js view for a probe
  • Show the layout (channel positions) with discs
  • Equal normalization for x and y axes
  • Display it with _repr_html_() of the MEA class
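A minimal sketch of what the _repr_html_() rendering could look like (the positions attribute and the pixel scaling are assumptions):

class ProbeWidget:
    """Render channel positions as SVG discs in the notebook (sketch)."""
    def __init__(self, positions):
        self.positions = positions  # list of (x, y) channel positions

    def _repr_html_(self):
        # Equal normalization on x and y: one shared scale factor.
        xs, ys = zip(*self.positions)
        x0, y0, scale = min(xs), min(ys), 5
        discs = ''.join(
            '<circle cx="{}" cy="{}" r="8" fill="steelblue"/>'.format(
                (x - x0 + 10) * scale, (y - y0 + 10) * scale)
            for x, y in self.positions)
        width = (max(xs) - x0 + 20) * scale
        height = (max(ys) - y0 + 20) * scale
        return '<svg width="{}" height="{}">{}</svg>'.format(
            width, height, discs)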

Find a dynamic layout library in JavaScript

Should offer the same experience as Qt's docking panels (resizable, drag-and-drop, fullscreen widgets).


We should experiment with a few of these libraries and try to implement a prototype (using PNG screenshots of KlustaViewa's views for example).

Manual clustering Session object

Implement a user-level class with control actions:

class Session:
    def merge(self, clusters): ...
    def move(self, clusters, group): ...

    def undo(self): ...
    def redo(self): ...

    def start_wizard(self): ...
    def pause_wizard(self): ...
    def reset_wizard(self): ...

This class uses Clustering, ClusterMetadata, and Selection instances, and uses a GlobalHistory to track a single undo stack containing both clustering actions (merge, split, etc.) and cluster metadata actions (cluster moved, cluster color changed, etc.).

This class can also update all views through the Selection instance. The different instances communicate with UpdateInfo instances.
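Hypothetical usage (the 'noise' group label is made up for the example):

session = Session()          # wires up Clustering, ClusterMetadata, Selection
session.merge([3, 4])        # clustering action, pushed on the GlobalHistory
session.move([10], 'noise')  # metadata action, same undo stack
session.undo()               # reverts the move
session.redo()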

Trace viewer

Possible starting point. Based on VisPy.

Features

  • Simple paging system.
  • Load the entire page into GPU memory, no dynamic undersampling (first approach).
  • Load and show the previous and next pages.
  • Pan & zoom.
  • Change channel scaling uniformly.
  • Optional automatic page scrolling with a timer.

Inputs

  • NumPy array (or memmap array) of size (nchannels, nsamples)
  • h5py dataset
  • [Optional] spike trains (spike times, neuron indices, masks) to show the spikes within the traces

Options

  • Color of the channels
  • Page size
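A minimal sketch of the paging logic, assuming a (nchannels, nsamples) array (or memmap) and a fixed page size in samples:

import numpy as np

def load_page(traces, page, page_size):
    """Return the samples of one page, clipped to the array bounds."""
    start = page * page_size
    stop = min(start + page_size, traces.shape[1])
    return np.asarray(traces[:, start:stop])  # uploaded whole to the GPU

def neighboring_pages(traces, page, page_size):
    """Preload the previous, current, and next pages."""
    n_pages = -(-traces.shape[1] // page_size)  # ceiling division
    return {p: load_page(traces, p, page_size)
            for p in (page - 1, page, page + 1) if 0 <= p < n_pages}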

phy.cluster.manual.color subpackage

  • Facilities to generate distinct colors
  • Generate a random color
  • Generate a color distinct from a given color

(possibly: to be partially merged into VisPy later)
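A minimal sketch of such facilities; golden-ratio hue stepping is one common trick for distinct colors, not necessarily what the subpackage will use:

import colorsys
import random

GOLDEN_RATIO_CONJUGATE = 0.618033988749895

def random_color():
    """Return a random, reasonably saturated RGB color."""
    return colorsys.hsv_to_rgb(random.random(), 0.8, 0.9)

def distinct_color(hue):
    """Return a color far in hue from the given one, plus the new hue."""
    next_hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return colorsys.hsv_to_rgb(next_hue, 0.8, 0.9), next_hue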

IPython visualization widget with traitlets

Each view for clustering will be an IPython widget exposing specific traitlet attributes:

  • clustering: a Clustering instance
  • selected_spikes: an ndarray of selected spikes (selection used for highlighting or splitting)
  • clusters: a list of selected clusters
  • cluster_order: a string specifying the cluster order (by index, cluster group, size...?)

A base widget will implement those; custom widgets will derive from it.

In the final interface in IPython, we'll link all these traitlets together using IPython's link() function. When a spike selection changes in one widget, it will also change in the others.

To make this work, we'll need to implement specific traitlet types:

  • ndarray (see this)
  • Clustering
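A minimal sketch of the base widget and the linking, using the standalone traitlets package (historically IPython.utils.traitlets); the ndarray trait is stubbed with a generic List here:

import traitlets

class BaseClusterWidget(traitlets.HasTraits):
    # Shared state exposed as traitlets; custom views derive from this.
    selected_spikes = traitlets.List()  # stand-in for an ndarray trait
    clusters = traitlets.List()
    cluster_order = traitlets.Unicode('index')

feature_view = BaseClusterWidget()
waveform_view = BaseClusterWidget()

# Keep the spike selection synchronized across views.
traitlets.link((feature_view, 'selected_spikes'),
               (waveform_view, 'selected_spikes'))
feature_view.selected_spikes = [10, 42, 1337]
assert waveform_view.selected_spikes == [10, 42, 1337]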

Provisional list of clustering widgets:

  • FeatureView
  • WaveformView
  • TraceView
  • GridView
  • CorrelogramsView
  • SimilarityView

Efficient data structures for the features

Benchmarks need to be done in order to find efficient on-disk formats for the features.

  • Features are used for:
    • Feature View (a subset of the spikes, two features x and y)
    • Split action (find all spikes whose features x and y fall within a given polygon)
    • Similarity matrix (a subset of the spikes, but all feature columns)

Example size (high estimate): a (n_spikes, n_features) numerical matrix with:

  • n_spikes = 100,000,000
  • n_features = 10,000
  • about 20 non-null values per spike (sparse array)
  • float32 data type
  • total size (sparse): ~10 GB (100 M spikes × 20 values × 4 bytes = 8 GB for the values alone, plus sparse indices)

Access patterns:

  1. View: arbitrary subset of <10,000 rows, 2 arbitrary columns x and y.
  2. Split: arbitrary subset of several tens of thousands of rows, 2 arbitrary columns x and y.
  3. Matrix: regular subset of ~10,000 rows (strided selection), all columns.

Possibilities:

  • HDF5 (dense, sparse csr, something else)
  • sqlite
  • flat binary

Notes:

  • Possibility to duplicate the data on disk using different structures for different access patterns.
  • Possibility to cache up to X GB of data, with X a user option (1 by default?); the larger X, the better the performance.
  • We can consider SSDs exclusively for benchmarks.

Wizard

  • Keep a list of past actions (history):
    • ('move', [2], 0): move cluster 2 to group 0
    • ('merge', [3, 4], [10]): merge clusters 3 and 4 to cluster 10
  • Public methods:
    • next_best()
    • next_candidate()
    • next(): call next_candidate() or next_best() if there's no candidate left
    • merge(clusters, to): called by the Session controller
    • move(clusters, to): called by the Session controller
  • The Wizard keeps a reference to the similarity matrix.
  • What structure for the matrix? (see #43). Idea: a defaultdict mapping (cl1, cl2) to a similarity value, with a default of 0: when a pair doesn't exist, the structure returns 0. We only have to compute the similarity for clusters that have similar channel masks.
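A minimal sketch of that sparse structure:

from collections import defaultdict

# (cluster_1, cluster_2) -> similarity; missing pairs read as 0.
similarity = defaultdict(float)
similarity[(3, 4)] = 0.87  # only computed for clusters with similar masks

similarity[(3, 4)]   # 0.87
similarity[(1, 99)]  # 0.0, without any computation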

Improve ClusterView

  • Show selected/unselected
  • Allow multiple selection
  • Requires finding the appropriate HTML controls

KwikModel

  • In phy.io.kwik.model
  • Derive from BaseModel.
  • Load data from HDF5.
  • Save data in HDF5.
  • No high-performance feature/waveform loading yet, just read from HDF5.

Raster plot

Based on VisPy.

Features

  • Optional paging system

Inputs

  • Spike times (seconds)
  • Neuron indices

Options

  • Positions of the neurons
  • Marker shape

Add config toolbox

  • File format: key = value pairs
  • Global (user-wide) options in ~/.phy/config.py
  • Local (dataset-wide) options in ~/.phy/filename/config.py
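For instance, a global config file could look like this (the option names are hypothetical):

# ~/.phy/config.py
n_spikes_max = 10000
cache_size_gb = 1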

Structures for time data

We need specific data structures to represent temporal data (like in the file format, but for in-memory structures). To be implemented in a specific package phy.time.

What are the different types of temporal data?

  • time series
  • continuous data
  • epochs
  • ...?

Structures

We could subclass ndarray to represent temporal data.

Time series

one array + metadata:

  • array of times
  • unit (second, samples with sampling rate, ...)

Continuous data

two arrays:

  • array of times (irregularly sampled data) or sampling rate
  • values

Epochs

one array + metadata

  • a (2, N) array with start and end
  • unit

Array of time series

just a Time Series + another array with the indices (e.g. neuron number for every spike)

Routines

(proposed by Adrien Peyrache)

  • Time series: rate, restrict(interval or other time series)
  • Continuous data: thresholdInterval(value), meanInterval(interval)
  • Epochs: union, intersection, duration, dropShort(ShorterThanThisValue), mergeClose(closerThanThisValue)

Ping @kdharris101 @nippoo Adrien.
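A minimal sketch of restrict() for a time series, assuming sorted times and a (start, end) interval in the same unit:

import numpy as np

def restrict(times, interval):
    """Keep only the events falling inside [start, end]."""
    start, end = interval
    i0 = np.searchsorted(times, start, side='left')
    i1 = np.searchsorted(times, end, side='right')
    return times[i0:i1]

spike_times = np.array([0.1, 0.5, 1.2, 3.4, 7.8])
print(restrict(spike_times, (0.4, 3.5)))  # [0.5 1.2 3.4]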

Add simple HDF5 functions

Create an io/h5.py module implementing a simple HDF5 API (on top of h5py).

with open_h5(filename, 'r') as f:
    data = f.read('/path/to/node')
    value = f.read_attr('/path/to/node', 'myattr')

with open_h5(filename, 'w') as f:
    f.write('/path/to/node', data)
    f.write_attr('/path/to/node', 'myattr', value)
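A minimal implementation sketch on top of h5py, following the proposed API above:

import contextlib
import h5py

class H5File:
    """Thin wrapper exposing read/write by node path."""
    def __init__(self, f):
        self._f = f

    def read(self, path):
        return self._f[path][...]

    def read_attr(self, path, name):
        return self._f[path].attrs[name]

    def write(self, path, data):
        self._f.create_dataset(path, data=data)

    def write_attr(self, path, name, value):
        self._f[path].attrs[name] = value

@contextlib.contextmanager
def open_h5(filename, mode='r'):
    f = h5py.File(filename, mode)
    try:
        yield H5File(f)
    finally:
        f.close()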

Improve Waveform view

  • Use VisPy transforms for box placement
  • Use ST instead of PanZoom (optional)
  • Support sparse waveforms
  • Better management of keyboard shortcuts
  • Add depth
  • Unit testing interactivity to increase coverage
  • More interactivity options

Basic FeatureView

  • Just a scatter plot of selected spikes
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Refactor WaveformVisual into a BaseVisual with a baking mechanism

Basic WaveformView

  • Waveforms positioned with a probe geometry
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Implement traitlets so that selected spikes, cluster colors, and probe geometry can be easily changed through an API

Similarity matrix

  • See this.
  • Put in phy.cluster.masked_em._stats.
  • Add many unit tests.

To do later: support sparse structures.

Selector

An object that represents a selection of spikes.

  • Can be instantiated with spike_clusters
  • Selection by specifying list of spikes or clusters (trait attributes)
  • Support a maximum number of spikes, with automatic subselection performed if too many spikes are selected by the user
  • Can be linked with a Reader: when the selection changes, new data may need to be fetched from disk or cache

See the API on the wiki.
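A minimal sketch of the automatic subselection, with a flat cap on the number of selected spikes (the cap and seeding policy are assumptions):

import numpy as np

def subselect(spike_ids, max_n_spikes, seed=0):
    """Return at most max_n_spikes spike ids, uniformly subsampled."""
    spike_ids = np.asarray(spike_ids)
    if len(spike_ids) <= max_n_spikes:
        return spike_ids
    rng = np.random.RandomState(seed)
    keep = rng.choice(len(spike_ids), size=max_n_spikes, replace=False)
    return spike_ids[np.sort(keep)]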

ClusterManager class

A structure that handles:

  • moving clusters into groups
  • changing cluster colors
  • relabelling clusters

ClusterView

  • IPython widget in HTML showing a list of clusters
  • Supporting multiple selection
  • Exposes a traitlet attribute with the list of selected clusters

Proper format for logging, error, warn

We need to standardise:

What format the error, warn, and log messages should take (tense, capitalisation, line breaks, when they should be used, etc.)

Also a standard for breaking out of functions after an error.

First prototype: roadmap

  • KwikExperiment #59
  • Selector class #41
  • ClusterView: display all clusters in an IPython widget (HTML/CSS) #32
  • React to selected clusters (list traitlet attribute in the widget)
  • WaveformView #31
  • Session controller

Undo stack

  • Start from the original clustering
  • Save a stack of all actions: merge, custom spk->clu mapping (= split), move (only forward actions are needed)
  • Write an efficient function that applies a list of actions
  • The undo/redo stack comes for free
  • We can keep a limit to the history length: we save the complete mapping of the oldest item in the history, and apply further changes on it
  • Benchmark: <50 ms to apply 100 successive changes to a 10M-long vector, if we keep in memory a (spike_changed, cluster) tuple for each action (this works for both merge and split, which are actually similar actions)
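A minimal sketch of applying the forward actions, each stored here as a (changed_spike_ids, new_cluster) pair (placeholder names), which covers both merges and splits:

import numpy as np

def apply_actions(spike_clusters, actions):
    """Apply forward actions to a copy of the base spike->cluster vector."""
    out = spike_clusters.copy()
    for changed_spike_ids, new_cluster in actions:
        out[changed_spike_ids] = new_cluster
    return out

base = np.zeros(10_000_000, dtype=np.int32)
actions = [(np.array([0, 1, 2]), 7),  # merge-like relabelling
           (np.array([1]), 8)]        # split-like relabelling
clusters = apply_actions(base, actions)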

Common interface for sorting algorithms

Inspired by scikit-learn:

Spike detection

# We launch the spike detection.
# This will automatically use multiple CPUs if multiple engines
# have been launched with IPython.parallel (c is an IPython.parallel Client).
# This call is asynchronous: the user can continue to work in the notebook,
# and request the task's status.
phy.spikedetect.run(model, algorithm="spikedetekt", ipp_view=c.load_balanced_view())

# Launch clustering.
phy.cluster.run(model, algorithm="klustakwik2", ipp_view=c.load_balanced_view())
