
phy's Introduction

phy: interactive visualization and manual spike sorting of large-scale ephys data


phy is an open-source Python library providing a graphical user interface for visualization and manual curation of large-scale electrophysiological data. It is optimized for high-density multielectrode arrays containing hundreds to thousands of recording sites (mostly Neuropixels probes).

phy provides two GUIs:

  • Template GUI (recommended): for datasets sorted with KiloSort and SpyKING CIRCUS,
  • Kwik GUI (legacy): for datasets sorted with klusta and klustakwik2.

phy 2.0b1 screenshot

What's new

  • [5 June 2024] Release of phy 2.0 beta 6, with bug fixes, installation improvements, and deprecation fixes
  • [7 Sep 2021] Release of phy 2.0 beta 5, with installation fixes and bug fixes
  • [7 Feb 2020] Release of phy 2.0 beta 1, with many new views, new features, various improvements, and bug fixes


Hardware requirements

It is recommended to store the data on an SSD for performance reasons.

There are no specific GPU requirements as long as relatively recent graphics and OpenGL drivers are installed on the system.

Installation instructions

Run the following commands in a terminal:

  1. Create a new conda environment with the conda dependencies:

    conda create -n phy2 -y python=3.11 cython dask h5py joblib matplotlib numpy pillow pip pyopengl pyqt pyqtwebengine pytest qtconsole requests responses scikit-learn scipy traitlets
    
  2. Activate the new conda environment with conda activate phy2

  3. Install the development version of phy: pip install git+https://github.com/cortex-lab/phy.git

  4. [OPTIONAL] If you plan to use the Kwik GUI, type pip install klusta klustakwik2

  5. phy should now be installed. Open the GUI on a dataset as follows (the phy2 environment should still be activated):

    cd path/to/my/spikesorting/output
    phy template-gui params.py

  6. If you run into problems with this method, there is also an environment.yml file that allows for automatic installation of the necessary packages. Give that a try.

Dealing with the error ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

In some environments, you might get an error message related to QtWebEngineWidgets. Run the command pip install PyQtWebEngine and try launching phy again. Do not run this command unless the error appears, as it could break the PyQt5 installation.

Upgrading from phy 1 to phy 2

  • Do not install phy 1 and phy 2 in the same Python environment.
  • It is recommended to delete ~/.phy/*GUI/state.json when upgrading.

Developer instructions

To install the development version of phy in a fresh environment, do:

# Install the development version of phy in editable mode:
git clone git@github.com:cortex-lab/phy.git
cd phy
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
cd ..

# Install the development version of phylib, a dependency of phy:
git clone git@github.com:cortex-lab/phylib.git
cd phylib
pip install -e . --upgrade

Troubleshooting

Running phy from a Python script

In addition to launching phy from the terminal with the phy command, you can also launch it from a Python script or an IPython terminal. This may be useful when debugging or profiling. Here's a code example to copy into a new launch.py file within your data directory:

from phy.apps.template import template_gui
template_gui("params.py")
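Then run python launch.py from that directory, with the phy2 environment activated.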

Credits

phy is developed and maintained by Cyrille Rossant.

Contributors to the repository are:

phy's People

Contributors

alejoe91, cgestes, crnolan, czuba, gitter-badger, lshaheen, mspacek, mswallac, nippoo, nsteinme, rossant, samminkowicz, shabnamkadir, szapp, ycanerol, zm711


phy's Issues

Data structure for cluster-dependent information

We need an efficient structure for per-cluster data.

  • Based on 1D, 2D, or 3D NumPy arrays
  • Cluster list along 1 axis (e.g. cluster statistics) or 2 axes (e.g. CCGs)
  • Fast cluster indexing
  • Fast update when the cluster assignments change
  • Arbitrary cluster indices
  • Relabelling

We'll probably need a dynamic array implementation on top of NumPy (inspired by this, for example). For the dual-cluster-axis case (CCGs) we'll need something specific as well.

Ideally, this structure would contain a cluster_map variable with the cluster assignments for all spikes. When this variable is changed, the internal arrays are updated.

cc @nippoo
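A rough sketch of the idea (the ClusterStats name and the spike-count statistic are placeholders, not a proposed API):

import numpy as np

class ClusterStats:
    """Per-cluster statistics with lazy recomputation when the
    spike-to-cluster assignments change (minimal sketch)."""
    def __init__(self, cluster_map):
        self.cluster_map = np.asarray(cluster_map)  # cluster id of each spike
        self._cache = {}  # cluster id -> cached statistic

    def n_spikes(self, cluster):
        # Fast cluster indexing through a lazy cache.
        if cluster not in self._cache:
            self._cache[cluster] = int((self.cluster_map == cluster).sum())
        return self._cache[cluster]

    def update(self, new_cluster_map):
        # Invalidate only the clusters whose assignments changed.
        new_cluster_map = np.asarray(new_cluster_map)
        changed = self.cluster_map != new_cluster_map
        for c in np.unique(np.r_[self.cluster_map[changed],
                                 new_cluster_map[changed]]):
            self._cache.pop(int(c), None)
        self.cluster_map = new_cluster_map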

Probe widget

  • HTML/SVG/d3.js view for a probe
  • Show the layout (channel positions) with discs
  • Equal normalization for x and y axes
  • Display it with _repr_html_() of the MEA class
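A minimal sketch of what the _repr_html_() rendering could look like (the positions attribute and the pixel scaling are assumptions):

class ProbeWidget:
    """Render channel positions as SVG discs in the notebook (sketch)."""
    def __init__(self, positions):
        self.positions = positions  # list of (x, y) channel positions

    def _repr_html_(self):
        # Equal normalization on x and y: one shared scale factor.
        xs, ys = zip(*self.positions)
        x0, y0, scale = min(xs), min(ys), 5
        discs = ''.join(
            '<circle cx="{}" cy="{}" r="8" fill="steelblue"/>'.format(
                (x - x0 + 10) * scale, (y - y0 + 10) * scale)
            for x, y in self.positions)
        width = (max(xs) - x0 + 20) * scale
        height = (max(ys) - y0 + 20) * scale
        return '<svg width="{}" height="{}">{}</svg>'.format(
            width, height, discs)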

Find a dynamic layout library in JavaScript

Should offer the same experience as Qt's docking panels (resizable, drag-and-drop, fullscreen widgets).


We should experiment with a few of these libraries and try to implement a prototype (using PNG screenshots of KlustaViewa's views for example).

Manual clustering Session object

Implement a user-level class with control actions:

class Session:
    def merge(self, clusters): ...
    def move(self, clusters, group): ...

    def undo(self): ...
    def redo(self): ...

    def start_wizard(self): ...
    def pause_wizard(self): ...
    def reset_wizard(self): ...

This class uses Clustering, ClusterMetadata, and Selection instances, and uses a GlobalHistory to track a single undo stack containing both clustering actions (merge, split, etc.) and cluster metadata actions (cluster moved, cluster color changed, etc.).

This class can also update all views through the Selection instance. The different instances communicate with UpdateInfo instances.
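Hypothetical usage (the 'noise' group label is made up for the example):

session = Session()          # wires up Clustering, ClusterMetadata, Selection
session.merge([3, 4])        # clustering action, pushed on the GlobalHistory
session.move([10], 'noise')  # metadata action, same undo stack
session.undo()               # reverts the move
session.redo()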

Trace viewer

Possible starting point. Based on VisPy.

Features

  • Simple paging system.
  • Load the entire page into GPU memory, no dynamic undersampling (first approach).
  • Load and show the previous and next pages.
  • Pan & zoom.
  • Change channel scaling uniformly.
  • Optional automatic page scrolling with a timer.

Inputs

  • NumPy array (or memmap array) of size (nchannels, nsamples)
  • h5py dataset
  • [Optional] spike trains (spike times, neuron indices, masks) to show the spikes within the traces

Options

  • Color of the channels
  • Page size
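A minimal sketch of the paging logic, assuming a (nchannels, nsamples) array (or memmap) and a fixed page size in samples:

import numpy as np

def load_page(traces, page, page_size):
    """Return the samples of one page, clipped to the array bounds."""
    start = page * page_size
    stop = min(start + page_size, traces.shape[1])
    return np.asarray(traces[:, start:stop])  # uploaded whole to the GPU

def neighboring_pages(traces, page, page_size):
    """Preload the previous, current, and next pages."""
    n_pages = -(-traces.shape[1] // page_size)  # ceiling division
    return {p: load_page(traces, p, page_size)
            for p in (page - 1, page, page + 1) if 0 <= p < n_pages}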

phy.cluster.manual.color subpackage

  • Facilities to generate distinct colors
  • Generate a random color
  • Generate a color distinct from a given color

(possibly: to be partially merged into VisPy later)
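A minimal sketch of such facilities; golden-ratio hue stepping is one common trick for distinct colors, not necessarily what the subpackage will use:

import colorsys
import random

GOLDEN_RATIO_CONJUGATE = 0.618033988749895

def random_color():
    """Return a random, reasonably saturated RGB color."""
    return colorsys.hsv_to_rgb(random.random(), 0.8, 0.9)

def distinct_color(hue):
    """Return a color far in hue from the given one, plus the new hue."""
    next_hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return colorsys.hsv_to_rgb(next_hue, 0.8, 0.9), next_hue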

IPython visualization widget with traitlets

Each view for clustering will be an IPython widget exposing specific traitlet attributes:

  • clustering: a Clustering instance
  • selected_spikes: an ndarray of selected spikes (selection used for highlighting or splitting)
  • clusters: a list of selected clusters
  • cluster_order: a string specifying the cluster order (by index, cluster group, size...?)

A base widget will implement those; custom widgets will derive from it.

In the final interface in IPython, we'll link all these traitlets together using IPython's link() function. When a spike selection changes in one widget, it will also change in the others.

To make this work, we'll need to implement specific traitlet types:

  • ndarray (see this)
  • Clustering
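A minimal sketch of the base widget and the linking, using the standalone traitlets package (historically IPython.utils.traitlets); the ndarray trait is stubbed with a generic List here:

import traitlets

class BaseClusterWidget(traitlets.HasTraits):
    # Shared state exposed as traitlets; custom views derive from this.
    selected_spikes = traitlets.List()  # stand-in for an ndarray trait
    clusters = traitlets.List()
    cluster_order = traitlets.Unicode('index')

feature_view = BaseClusterWidget()
waveform_view = BaseClusterWidget()

# Keep the spike selection synchronized across views.
traitlets.link((feature_view, 'selected_spikes'),
               (waveform_view, 'selected_spikes'))
feature_view.selected_spikes = [10, 42, 1337]
assert waveform_view.selected_spikes == [10, 42, 1337]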

Provisional list of clustering widgets:

  • FeatureView
  • WaveformView
  • TraceView
  • GridView
  • CorrelogramsView
  • SimilarityView

Efficient data structures for the features

Benchmarks need to be done in order to find efficient on-disk formats for the features.

  • Features are used for:
    • Feature View (a subset of the spikes, two features x and y)
    • Split action (find all spikes whose features x and y fall within a given polygon)
    • Similarity matrix (a subset of the spikes, but all feature columns)

Example size (high estimate): a (n_spikes, n_features) numerical matrix with:

  • n_spikes = 100,000,000
  • n_features = 10,000
  • about 20 non-null values per spike (sparse array)
  • float32 data type
  • total size (sparse): ~10 GB (100 M spikes × 20 values × 4 bytes = 8 GB for the values alone, plus sparse indices)

Access patterns:

  1. View: arbitrary subset of <10,000 rows, 2 arbitrary columns x and y.
  2. Split: arbitrary subset of several tens of thousands of rows, 2 arbitrary columns x and y.
  3. Matrix: regular subset of ~10,000 rows (strided selection), all columns.

Possibilities:

  • HDF5 (dense, sparse csr, something else)
  • sqlite
  • flat binary

Notes:

  • Possibility to duplicate the data on disk using different structures for different access patterns.
  • Possibility to cache up to X GB of data, with X a user option (1 by default?); the larger X, the better the performance.
  • We can consider SSDs exclusively for benchmarks.

Wizard

  • Keep a list of past actions (history):
    • ('move', [2], 0): move cluster 2 to group 0
    • ('merge', [3, 4], [10]): merge clusters 3 and 4 to cluster 10
  • Public methods:
    • next_best()
    • next_candidate()
    • next(): call next_candidate() or next_best() if there's no candidate left
    • merge(clusters, to): called by the Session controller
    • move(clusters, to): called by the Session controller
  • The Wizard keeps a reference to the similarity matrix.
  • What structure for the matrix? (see #43). Idea: a defaultdict mapping (cl1, cl2) to a similarity value, with a default of 0: when a pair doesn't exist, the structure returns 0. We only have to compute the similarity for clusters that have similar channel masks.
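A minimal sketch of that sparse structure:

from collections import defaultdict

# (cluster_1, cluster_2) -> similarity; missing pairs read as 0.
similarity = defaultdict(float)
similarity[(3, 4)] = 0.87  # only computed for clusters with similar masks

similarity[(3, 4)]   # 0.87
similarity[(1, 99)]  # 0.0, without any computation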

Improve ClusterView

  • Show selected/unselected
  • Allow multiple selection
  • Requires finding the appropriate HTML controls

KwikModel

  • In phy.io.kwik.model
  • Derive from BaseModel.
  • Load data from HDF5.
  • Save data in HDF5.
  • No high-performance feature/waveform loading yet, just read from HDF5.

Raster plot

Based on VisPy.

Features

  • Optional paging system

Inputs

  • Spike times (seconds)
  • Neuron indices

Options

  • Positions of the neurons
  • Marker shape

Add config toolbox

  • File format: key = value pairs
  • Global (user-wide) options in ~/.phy/config.py
  • Local (dataset-wide) options in ~/.phy/filename/config.py
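For instance, a global config file could look like this (the option names are hypothetical):

# ~/.phy/config.py
n_spikes_max = 10000
cache_size_gb = 1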

Structures for time data

We need specific data structures to represent temporal data (like in the file format, but for in-memory structures). To be implemented in a specific package phy.time.

What are the different types of temporal data?

  • time series
  • continuous data
  • epochs
  • ...?

Structures

We could subclass ndarray to represent temporal data.

Time series

one array + metadata:

  • array of times
  • unit (second, samples with sampling rate, ...)

Continuous data

two arrays:

  • array of times (irregularly sampled data) or sampling rate
  • values

Epochs

one array + metadata

  • a (2, N) array with start and end
  • unit

Array of time series

just a Time Series + another array with the indices (e.g. neuron number for every spike)

Routines

(proposed by Adrien Peyrache)

  • Time series: rate, restrict(interval or other time series)
  • Continuous data: thresholdInterval(value), meanInterval(interval)
  • Epochs: union, intersection, duration, dropShort(ShorterThanThisValue), mergeClose(closerThanThisValue)

Ping @kdharris101 @nippoo Adrien.
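A minimal sketch of restrict() for a time series, assuming sorted times and a (start, end) interval in the same unit:

import numpy as np

def restrict(times, interval):
    """Keep only the events falling inside [start, end]."""
    start, end = interval
    i0 = np.searchsorted(times, start, side='left')
    i1 = np.searchsorted(times, end, side='right')
    return times[i0:i1]

spike_times = np.array([0.1, 0.5, 1.2, 3.4, 7.8])
print(restrict(spike_times, (0.4, 3.5)))  # [0.5 1.2 3.4]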

Add simple HDF5 functions

Create an io/h5.py module implementing a simple HDF5 API (on top of h5py).

with open_h5(filename, 'r') as f:
    data = f.read('/path/to/node')
    value = f.read_attr('/path/to/node', 'myattr')

with open_h5(filename, 'w') as f:
    f.write('/path/to/node', data)
    f.write_attr('/path/to/node', 'myattr', value)
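A minimal implementation sketch on top of h5py, following the proposed API above:

import contextlib
import h5py

class H5File:
    """Thin wrapper exposing read/write by node path."""
    def __init__(self, f):
        self._f = f

    def read(self, path):
        return self._f[path][...]

    def read_attr(self, path, name):
        return self._f[path].attrs[name]

    def write(self, path, data):
        self._f.create_dataset(path, data=data)

    def write_attr(self, path, name, value):
        self._f[path].attrs[name] = value

@contextlib.contextmanager
def open_h5(filename, mode='r'):
    f = h5py.File(filename, mode)
    try:
        yield H5File(f)
    finally:
        f.close()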

Improve Waveform view

  • Use VisPy transforms for box placement
  • Use ST instead of PanZoom (optional)
  • Support sparse waveforms
  • Better management of keyboard shortcuts
  • Add depth
  • Unit testing interactivity to increase coverage
  • More interactivity options

Basic FeatureView

  • Just a scatter plot of selected spikes
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Refactor WaveformVisual into a BaseVisual with a baking mechanism

Basic WaveformView

  • Waveforms positioned with a probe geometry
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Implement traitlets so that selected spikes, cluster colors, and probe geometry can be easily changed through an API

Similarity matrix

  • See this.
  • Put in phy.cluster.masked_em._stats.
  • Add many unit tests.

To do later: support sparse structures.

Selector

An object that represents a selection of spikes.

  • Can be instantiated with spike_clusters
  • Selection by specifying list of spikes or clusters (trait attributes)
  • Support a maximum number of spikes, with automatic subselection performed if too many spikes are selected by the user
  • Can be linked with a Reader: when the selection changes, new data may need to be fetched from disk or cache

See the API on the wiki.
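A minimal sketch of the automatic subselection, with a flat cap on the number of selected spikes (the cap and seeding policy are assumptions):

import numpy as np

def subselect(spike_ids, max_n_spikes, seed=0):
    """Return at most max_n_spikes spike ids, uniformly subsampled."""
    spike_ids = np.asarray(spike_ids)
    if len(spike_ids) <= max_n_spikes:
        return spike_ids
    rng = np.random.RandomState(seed)
    keep = rng.choice(len(spike_ids), size=max_n_spikes, replace=False)
    return spike_ids[np.sort(keep)]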

ClusterManager class

A structure that handles:

  • moving clusters into groups
  • changing cluster colors
  • relabelling clusters

ClusterView

  • IPython widget in HTML showing a list of clusters
  • Supporting multiple selection
  • Exposes a traitlet attribute with the list of selected clusters

Proper format for logging, error, warn

We need to standardise:

What format the error, warn, and log messages should take (tense, capitalisation, line breaks, when they should be used, etc.)

Also a standard for breaking out of functions after an error.

First prototype: roadmap

  • KwikExperiment #59
  • Selector class #41
  • ClusterView: display all clusters in an IPython widget (HTML/CSS) #32
  • React to selected clusters (list traitlet attribute in the widget)
  • WaveformView #31
  • Session controller

Undo stack

  • Start from the original clustering
  • Save a stack of all actions: merge, custom spk->clu mapping (= split), move (only forward actions are needed)
  • Write an efficient function that applies a list of actions
  • The undo/redo stack comes for free
  • We can keep a limit to the history length: we save the complete mapping of the oldest item in the history, and apply further changes on it
  • Benchmark: <50 ms to apply 100 successive changes to a 10M-long vector, if we keep in memory a (spike_changed, cluster) tuple for each action (this works for both merge and split, which are actually similar actions)
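A minimal sketch of applying the forward actions, each stored here as a (changed_spike_ids, new_cluster) pair (placeholder names), which covers both merges and splits:

import numpy as np

def apply_actions(spike_clusters, actions):
    """Apply forward actions to a copy of the base spike->cluster vector."""
    out = spike_clusters.copy()
    for changed_spike_ids, new_cluster in actions:
        out[changed_spike_ids] = new_cluster
    return out

base = np.zeros(10_000_000, dtype=np.int32)
actions = [(np.array([0, 1, 2]), 7),  # merge-like relabelling
           (np.array([1]), 8)]        # split-like relabelling
clusters = apply_actions(base, actions)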

Common interface for sorting algorithms

Inspired by scikit-learn:

Spike detection

# We launch the spike detection.
# This will automatically use multiple CPUs if multiple engines
# have been launched with IPython.parallel (c is an IPython.parallel Client).
# This call is asynchronous: the user can continue to work in the notebook,
# and request the task's status.
phy.spikedetect.run(model, algorithm="spikedetekt", ipp_view=c.load_balanced_view())

# Launch clustering.
phy.cluster.run(model, algorithm="klustakwik2", ipp_view=c.load_balanced_view())
