
ecephys_spike_sorting's Introduction

ecephys spike sorting


https://github.com/AllenInstitute/ecephys_spike_sorting

Modules for processing extracellular electrophysiology data from Neuropixels probes.


Overview

This repository contains code for running spike sorting pipelines for the Allen Brain Observatory. Public datasets that have been processed with ecephys_spike_sorting include Visual Coding - Neuropixels and Visual Behavior - Neuropixels. Electrophysiology that was spike sorted with this code has appeared in a number of publications.

Compatibility

This code is designed to ingest data collected with the Open Ephys GUI. @jenniferColonell from HHMI Janelia Research Campus maintains a fork that is compatible with data recorded by SpikeGLX. For the spike sorting step, both versions rely on Kilosort 2 or 2.5. For more information on Kilosort, please read through the GitHub wiki.

Level of Support

This repository is no longer under development, and we recommend that new users base their spike sorting pipelines on SpikeInterface instead. We believe that even existing ecephys_spike_sorting users would benefit from migrating to SpikeInterface. The Allen Institute has already converted its spike sorting workflows to use SpikeInterface, which is actively maintained, works with a range of modern spike sorters, and includes up-to-date implementations of the most important pre- and post-processing methods. The SpikeInterface syntax needed to reproduce the functionality of ecephys_spike_sorting can be found in each module's README file.

To get started with SpikeInterface, we recommend reading through this tutorial on analyzing Neuropixels data.
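For orientation, a pipeline like the one in this repository might look roughly as follows in SpikeInterface. This is a hedged sketch, not taken from the tutorial: function names and signatures vary between SpikeInterface versions, so treat it as pseudocode and consult the current API documentation.

```python
# Rough sketch only; names such as create_sorting_analyzer assume a recent
# spikeinterface release and may differ in your installed version.
import spikeinterface.full as si

recording = si.read_openephys("/path/to/open-ephys/recording")
recording = si.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording = si.common_reference(recording, operator="median")  # ~ median_subtraction
sorting = si.run_sorter("kilosort2_5", recording, folder="ks_output")
analyzer = si.create_sorting_analyzer(sorting, recording)
analyzer.compute(["random_spikes", "waveforms", "templates", "quality_metrics"])
```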

Modules

The first three modules take data saved by the Open Ephys GUI and prepare it for spike sorting by Kilosort2. Following the spike-sorting step (using the kilosort_helper module), we clean up the outputs and calculate mean waveforms and quality metrics for each unit.

  1. extract_from_npx (deprecated): Calls a binary executable that converts data from compressed NPX format into .dat files (continuous data) and .npy files (event data). The NPX format is no longer used by Open Ephys (or any other software), so this module can be skipped.

  2. depth_estimation: Uses the LFP data to identify the surface channel, which is required by the median subtraction and kilosort modules.

  3. median_subtraction: Calls a binary executable that removes the DC offset and common-mode noise from the AP band continuous file.

  4. kilosort_helper: Generates config files for Kilosort and launches spike sorting via the Matlab engine.

  5. kilosort_postprocessing: Removes putative double-counted spikes from Kilosort output.

  6. noise_templates: Identifies noise units based on their waveform shape and ISI histogram.

  7. mean_waveforms: Extracts mean waveforms from the raw data, given spike times and unit IDs. Also calculates metrics for each waveform.

  8. quality_metrics: Calculates quality metrics for each unit to assess isolation and sorting quality.

(Not used) automerging: Automatically merges templates that belong to the same unit (included in case it's helpful to others).
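To illustrate the idea behind the kilosort_postprocessing step, here is a minimal numpy sketch (not the module's actual implementation, and `remove_double_counted` is a hypothetical helper) of removing putative double-counted spikes within a unit:

```python
import numpy as np

def remove_double_counted(spike_times, spike_clusters, overlap_window=5):
    """Drop spikes that fall within `overlap_window` samples of the
    preceding spike from the same unit (putative double counts).
    Assumes spike_times is sorted, as in Kilosort output."""
    keep = np.ones(spike_times.size, dtype=bool)
    for unit in np.unique(spike_clusters):
        idx = np.where(spike_clusters == unit)[0]
        # flag spikes that follow the previous spike of this unit too closely
        too_close = np.diff(spike_times[idx]) < overlap_window
        keep[idx[1:][too_close]] = False
    return spike_times[keep], spike_clusters[keep]
```

The real module is more careful (it also handles spikes double-counted across units), but the core idea is the same.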

Installation and Usage

These modules require Python 3.5+, and have been tested with Python 3.5, 3.6, and 3.7.

Three of the modules (extract_from_npx, median_subtraction, and kilosort_helper) have non-Python dependencies that will need to be installed prior to use.

We recommend using pipenv to run these modules. From the ecephys_spike_sorting top-level directory, run the following commands from a terminal:

Linux

    $ pip install --user pipenv
    $ export PIPENV_VENV_IN_PROJECT=1
    $ pipenv install
    $ pipenv shell
    (ecephys_spike_sorting) $ pip install .

You can now edit one of the processing scripts found in ecephys_spike_sorting/scripts and run via:

    (ecephys_spike_sorting) $ python ecephys_spike_sorting/scripts/batch_processing.py

See the scripts README file for more information on their usage.

To leave the pipenv virtual environment, simply type:

    (ecephys_spike_sorting) $ exit

macOS

If you don't have it already, install homebrew. Then, type:

    $ brew install pipenv
    $ export PIPENV_VENV_IN_PROJECT=1
    $ pipenv install
    $ pipenv shell
    (ecephys_spike_sorting) $ pip install .

You can now edit one of the processing scripts found in ecephys_spike_sorting/scripts and run via:

    (ecephys_spike_sorting) $ python ecephys_spike_sorting/scripts/batch_processing.py

See the scripts README file for more information on their usage.

To leave the pipenv virtual environment, simply type:

    (ecephys_spike_sorting) $ exit

Windows

    $ pip install --user pipenv
    $ set PIPENV_VENV_IN_PROJECT=1
    $ pipenv install
    $ pipenv shell
    (.venv) $ pip install .

Note: This will work in the standard Command Prompt, but the cmder console emulator has better compatibility with Python virtual environments.

You can now edit one of the processing scripts found in ecephys_spike_sorting\scripts and run via:

    (.venv) $ python ecephys_spike_sorting\scripts\batch_processing.py

See the scripts README file for more information on their usage.

To leave the pipenv virtual environment, simply type:

    (.venv) $ exit

Terms of Use

See Allen Institute Terms of Use

© 2019 Allen Institute for Brain Science

ecephys_spike_sorting's People

Contributors

jiaxx, jingjie-li, jsiegle, mmyros, mochic, nilegraddis


ecephys_spike_sorting's Issues

Implement auto-merging

This module already exists; we just need to clean it up and integrate it into the pipeline.

documentation request

Hey, great tool you've made.

It is, though, very hard to install.
- Can you please mark in the docs all the paths that must be changed in the code to make it work? Or, even better, have them all in only one file? Otherwise, when someone wants to use the tool, they have to change it line by line as error messages pop up, which causes some of us to give up along the way.

Also, the pipenv installation and path registering doesn't work (it puts it in your system's Python, even when specifically installed with a different version), but this is a minor issue.

Thanks,
Shahaf

Move Kilosort parameters into input json

Currently, most of the parameters for Kilosort are hard coded in matlab_file_generator.py. These need to be moved into the schema, so they can be tracked using the input json file.

Marshmallow 3/Argschema 2 compatibility

Marshmallow 3.0 (and consequently, argschema 2.0) have changed things so that schemas are always strict: passing any extra keys or invalid data gets you a validation error.

I know there is a force-downgrade parameter in the install file for this package, but newer versions of different packages (most relevantly, the Allen SDK) require the newest versions of Marshmallow/Argschema.

See Marshmallow upgrade documentation here: https://marshmallow.readthedocs.io/en/stable/upgrading.html

So far, I have only been working in the NPX extraction module.
This causes errors in creating the input JSON file (which can be worked around by creating module-specific files and running them one at a time), and also in the output JSON file, due to mismatched key names between the OutputParameters class and the keys returned by run_npx_extraction in __main__.py.

extract_from_npx/__main__.py:

    return {"execution_time": execution_time, "npx_extractor_commit_date": commit_date, "npx_extractor_commit_hash": commit_hash}

extract_from_npx/_schemas.py:

    class OutputParameters(OutputSchema):
        npx_extractor_execution_time = Float()
        settings_json = String()
        npx_extractor_commit_hash = String()
        npx_extractor_commit_date = String()

Traceback of error:

    Traceback (most recent call last):
      File "C:\Users\saharm\AppData\Local\Continuum\anaconda3\envs\py3\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Users\saharm\AppData\Local\Continuum\anaconda3\envs\py3\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "c:\users\saharm\documents\code\code_package_downloads\ecephys_spike_sorting\ecephys_spike_sorting\modules\extract_from_npx\__main__.py", line 77, in <module>
        main()
      File "c:\users\saharm\documents\code\code_package_downloads\ecephys_spike_sorting\ecephys_spike_sorting\modules\extract_from_npx\__main__.py", line 71, in main
        mod.output(output, indent=2)
      File "C:\Users\saharm\AppData\Local\Continuum\anaconda3\envs\py3\lib\site-packages\argschema\argschema_parser.py", line 234, in output
        output_json = self.get_output_json(d)
      File "C:\Users\saharm\AppData\Local\Continuum\anaconda3\envs\py3\lib\site-packages\argschema\argschema_parser.py", line 204, in get_output_json
        raise(mm.ValidationError(errors))
    marshmallow.exceptions.ValidationError: {'execution_time': ['Unknown field.']}

find_depth needs to be updated for mean waveforms

The find_depth function is failing when there are noisy channels. For low-amplitude units, the channel with maximum peak-to-trough amplitude is often identified as a noisy channel, rather than the channel with the actual peak waveform.

I propose 4 possible solutions:

  • Use the depth from the Kilosort template, rather than the mean waveform. This is convenient, but won't work if there have been any manual merges or splits.
  • Provide an optional mask to find_depth that will prevent noisy channels from being chosen as the peak channel. This can either come from probe_info.json, or be computed directly from the mean waveforms. This requires information other than the mean waveform for a single unit.
  • Require that the highest-amplitude channel be well-correlated with the median spike shape across channels. This is pretty simple, and only requires information from a single unit.
  • Fit a Gaussian to the interpolated waveform at the time of the peak. This should be the most accurate, but also the most complex to compute.

What do you think?
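The second option (masking noisy channels) could be sketched as follows; `find_depth_masked` is a hypothetical helper, and `noise_mask` is assumed to come from probe_info.json or a separate estimate:

```python
import numpy as np

def find_depth_masked(mean_wf, noise_mask):
    """Peak channel of a (channels x samples) mean waveform,
    ignoring channels flagged in `noise_mask`."""
    ptp = np.ptp(mean_wf, axis=1).astype(float)  # peak-to-trough per channel
    ptp[noise_mask] = -np.inf                    # noisy channels can never win
    return int(np.argmax(ptp))
```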

PCmetrics values are not constant for multiple runs

Hello guys, is it normal that when we run the metrics module on exactly the same dataset, the values for isolation distance, l_ratio, d_prime, and the two nearest_neighbors metrics change between runs?

For some clusters the values are indeed pretty similar, but for others, like a cluster I have with 35k spikes, the isolation_distance varied from 361 to 556...

The biggest changes come from clusters with more spikes. Any thoughts about that? Is it normal?

Thank you
@jsiegle

Concatenating multiple runs

Hi,
We had an issue with concatenating recordings from different triggers, with data acquired with SpikeGLX from an NP1.0 probe. We followed the inline comments in 'sglx_multi_run_pipeline.py' using the fork for SpikeGLX data and had 5 different triggers to concatenate.
However, it does not seem that we get the concatenated file from all the runs, since the duration is much shorter than expected, and we get the following log from CatGT.

[Thd 15236 CPU 15 4/04/22 14:24:57.529] Cmdline: CatGT -dir=W:/nobackup/group/user/AH/043_01 -run=concat220327_220401 -g=0 -t=0,5 -prb_fld -prb=0 -ap -lf -ni -apfilter=butter,12,300,10000 -lffilter=butter,12,1,500 -gblcar -SY=0,-1,6,500 -XA=0,0.8,0.2,0 -XA=7,1,0.2,0 -XD=-1,2,500 -XD=-1,4,5 -XD=-1,5,0 -XD=-1,7,5 -BF=0,5,1,5 -dest=W:/nobackup/garber/kanohars/HA/043_01 -out_prb_fld
[Thd 15236 CPU 15 4/04/22 14:25:05.387] Skipping tiny content (olap: 204583110, rem: -7417422, bps: 6) file 'concat220327_220401_g0_t1.nidq.bin'.
[Thd 15236 CPU 15 4/04/22 14:25:05.467] Skipping tiny content (olap: 205072890, rem: -23291088, bps: 6) file 'concat220327_220401_g0_t2.nidq.bin'.
[Thd 15236 CPU 15 4/04/22 14:25:05.547] Skipping tiny content (olap: 205012470, rem: -7134180, bps: 6) file 'concat220327_220401_g0_t3.nidq.bin'.
[Thd 15236 CPU 15 4/04/22 14:25:07.937] Skipping tiny content (olap: 216751854, rem: -67511592, bps: 6) file 'concat220327_220401_g0_t5.nidq.bin'.
[Thd 15236 CPU 15 4/04/22 15:00:58.431] Skipping tiny content (olap: 74353845720, rem: -2695779240, bps: 770) file 'concat220327_220401_g0_t1.imec0.ap.bin'.
[Thd 15236 CPU 15 4/04/22 15:00:58.578] Skipping tiny content (olap: 74531854320, rem: -8464914150, bps: 770) file 'concat220327_220401_g0_t2.imec0.ap.bin'.
[Thd 15236 CPU 15 4/04/22 15:00:58.669] Skipping tiny content (olap: 74509896230, rem: -2592825620, bps: 770) file 'concat220327_220401_g0_t3.imec0.ap.bin'.
[Thd 15236 CPU 15 4/04/22 15:03:28.883] Skipping tiny content (olap: 78776493950, rem: -24536489670, bps: 770) file 'concat220327_220401_g0_t5.imec0.ap.bin'.
[Thd 15236 CPU 15 4/04/22 15:06:30.597] Skipping tiny content (olap: 6196154580, rem: -224648270, bps: 770) file 'concat220327_220401_g0_t1.imec0.lf.bin'.
[Thd 15236 CPU 15 4/04/22 15:06:30.688] Skipping tiny content (olap: 6210988630, rem: -705409320, bps: 770) file 'concat220327_220401_g0_t2.imec0.lf.bin'.
[Thd 15236 CPU 15 4/04/22 15:06:30.757] Skipping tiny content (olap: 6209159110, rem: -216068930, bps: 770) file 'concat220327_220401_g0_t3.imec0.lf.bin'.
[Thd 15236 CPU 15 4/04/22 15:06:40.864] Skipping tiny content (olap: 6564708150, rem: -2044707280, bps: 770) file 'concat220327_220401_g0_t5.imec0.lf.bin'.


We noticed in the documentation of CatGT that there is a mention of supercat, and we were wondering whether this is the command that should be run.
Thanks!

kilosort post-processing module for KS3 data

Hi,

It seems to me at the moment only the quality metric module is compatible with KS3 data. Will there be an update for the kilosort-postprocessing module at some point? Thanks for letting me know.

Cheers

Laurenz

Set up tests

I'm going to set up the infrastructure for unit and integration tests, using ecephys_pipeline as a model. However, this won't be able to run cleanly using CI, since it depends on binary executables and Matlab code.

get_sessions method missing?

Hey, I just tried to run the "quickstart" ipynb and it seems to succeed in importing the code and building the cache, but I get an error right after that. Any idea what I'm doing wrong? Thanks -

pip install fails on Windows

The call to pip install -q -U pip in the tox.ini file is failing on my Windows machine:

ERROR: InvocationError for command 'C:\\Users\\svc_neuropix\\Documents\\GitHub\\ecephys_spike_sorting\\.tox\\py36-test\\Scripts\\pip.EXE install -q -U pip' (exited with code 1)

What is the purpose of this command, and how should this problem be addressed?

All of the other commands in the tox file run fine if I comment out that line.

Improve noise unit classifier

The current classifier only takes into account the waveform shape on the peak channel. The classification will become much better if we look at the waveform across all channels.

Mean waveforms: Value error

Hello,
I have been trying to calculate mean waveforms on my data. I used all three: Kilosort2, Kilosort 2.5 and Kilosort 3 outputs and I always get the same error:

Loading data...
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\mean_waveforms\__main__.py", line 69, in <module>
    main()
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\mean_waveforms\__main__.py", line 59, in main
    output = calculate_mean_waveforms(mod.args)
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\mean_waveforms\__main__.py", line 23, in calculate_mean_waveforms
    data = np.reshape(rawData, (int(rawData.size/args['ephys_params']['num_channels']), args['ephys_params']['num_channels']))
  File "<__array_function__ internals>", line 6, in reshape
  File "D:\.venv\lib\site-packages\numpy\core\fromnumeric.py", line 299, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "D:\.venv\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 22608838945 into shape (59030911,383)

Could you please help me with how I could fix this error?
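One common cause of this ValueError is that the file size is not an integer multiple of the configured channel count (a wrong `num_channels` in ephys_params, or a truncated file). A hedged sketch of a defensive reshape, with `reshape_continuous` as a hypothetical helper rather than the module's own code:

```python
import numpy as np

def reshape_continuous(raw, num_channels):
    """Reshape a flat int16 stream into (samples, channels),
    failing loudly if the sizes do not divide evenly."""
    remainder = raw.size % num_channels
    if remainder != 0:
        raise ValueError(
            f"{raw.size} values do not divide by {num_channels} channels "
            f"(remainder {remainder}); check ephys_params['num_channels'] "
            f"or whether the file is truncated")
    return raw.reshape(raw.size // num_channels, num_channels)
```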

I also tried to run the post processing module, but I get an error:
ecephys spike sorting: kilosort postprocessing module

Loading data...
Removing within-unit overlapping spikes...
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░ 99% Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 76, in <module>
    main()
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 66, in main
    output = run_postprocessing(mod.args)
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 38, in run_postprocessing
    args['ks_postprocessing_params'])
    args['ks_postprocessing_params'])
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\kilosort_postprocessing\postprocessing.py", line 88, in remove_double_counted_spikes
    spikes_to_remove)
  File "D:\.venv\lib\site-packages\ecephys_spike_sorting\modules\kilosort_postprocessing\postprocessing.py", line 219, in remove_spikes
    spike_times = np.delete(spike_times, spikes_to_remove, 0)
  File "<__array_function__ internals>", line 6, in delete
  File "D:\.venv\lib\site-packages\numpy\lib\function_base.py", line 4480, in delete
    keep[obj,] = False
IndexError: arrays used as indices must be of integer (or boolean) type

Lastly, I noticed that in the output from Kilosort3, there are no longer files: pc_feature_ind.npy and pc_feature.npy. Are you planning to switch to Kilosort3 in the future?

Thank you in advance for your help.

Add new metrics

These include:

  • isolation distance/quality, contamination rate, noise overlap

We also need to validate existing metrics (e.g., isi violations) and add the ability to calculate metrics for different time points or stimulus blocks.
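As a reference point for validating the ISI-violation metric mentioned above, a minimal sketch; this is a simplified fraction-of-violations count, not the module's exact estimator, and `isi_violation_rate` is a hypothetical name:

```python
import numpy as np

def isi_violation_rate(spike_times_s, refractory_s=0.0015):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    isis = np.diff(np.sort(spike_times_s))
    return float(np.mean(isis < refractory_s)) if isis.size else 0.0
```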

raise InvalidGitRepositoryError

I have activated the environment and changed the paths in the json, but I still get the error below in the cmd window. Does anyone know why?

python C:\Users\(myname)\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\scripts\batch_processing.py
ecephys spike sorting: kilosort helper module
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\qiushou\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\modules\kilosort_helper\__main__.py", line 138, in <module>
    main()
  File "C:\Users\qiushou\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\modules\kilosort_helper\__main__.py", line 128, in main
    output = run_kilosort(mod.args)
  File "C:\Users\qiushou\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\modules\kilosort_helper\__main__.py", line 21, in run_kilosort
    commit_date, commit_time = get_repo_commit_date_and_hash(args['kilosort_helper_params']['kilosort_repository'])
  File "C:\Users\qiushou\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\common\utils.py", line 403, in get_repo_commit_date_and_hash
    repo = Repo(repo_location)
  File "C:\Users\qiushou\.virtualenvs\ecephys_spike_sorting-master-DMPDJmnf\lib\site-packages\git\repo\base.py", line 224, in __init__
    self.working_dir: Optional[PathLike] = self._working_tree_dir or self.common_dir
  File "C:\Users\qiushou\.virtualenvs\ecephys_spike_sorting-master-DMPDJmnf\lib\site-packages\git\repo\base.py", line 307, in common_dir
    raise InvalidGitRepositoryError()
git.exc.InvalidGitRepositoryError
Traceback (most recent call last):
  File "C:\Users\qiushou\Desktop\mykilo\ecephys_spike_sorting-master\ecephys_spike_sorting\scripts\batch_processing.py", line 41, in <module>
    subprocess.check_call(command.split(' '))
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '-W', 'ignore', '-m', 'ecephys_spike_sorting.modules.kilosort_helper', '--input_json', 'D:\02\mice_recording_raw\control\7-2\20211224_pre1\20211224_pre1-input.json', '--output_json', 'D:\02\mice_recording_raw\control\7-2\20211224_pre1\20211224_pre1-output.json']' returned non-zero exit status 1.

Fix bug in PC feature channel selection

For the outputs of Kilosort2, the channels of the PC features matrix are not ordered consecutively. This violates an assumption of the calculate_pc_metrics function in the quality_metrics module. The symptom is that the PC-based metrics cannot be computed for a subset of units, because the PCs for the unit of interest get excluded from the calculation.

Error running mean_waveforms module from curated kilosort output

Hi, I was trying to use the mean_waveforms module and I get the following error towards the end of the analysis.

peak_channels[cluster_idx],
IndexError: index 539 is out of bounds for axis 0 with size 539

From what I can tell, it results from the fact that template.npy isn't updated after manual curation in phy, so merging and splitting give rise to new cluster ids that are not in the original template.npy. The size of peak_channels depends on the size of template (line 87, extract_waveforms.py), but the cluster_ids are from spike_clusters, which are changed following curation (line 79, extract_waveforms.py). Currently my workaround is to run the module on the original kilosort output and manually map the original clusters to the manually curated clusters. I'm curious if there's any way I can run the module on curated kilosort output directly?

Another thing that I noticed is that on line 195 in utils.py, the cluster_quality is inferred from the 'group' column of cluster_group.tsv. However, it seems Kilosort now outputs cluster_group.tsv with a 'KSLabel' column instead of 'group'.

I really appreciate your work and it has helped a lot with my research, so just want to say thank you!

Module usage

Just writing some documentation on how the modules are intended to be used:

  1. extract_from_npx
  • input: npx file (saved by Open Ephys)
  • output: directory with continuous data and events
  • dependency: NpxExtractor
  2. median_subtraction
  3. depth_estimation
  • input: continuous data file (lfp band)
  • output: surface channel, air channel (in JSON output args)
  4. kilosort_helper
  • input: continuous data file
  • output: .npy files for spike times, spike clusters, etc.
  • dependency: Kilosort2
  5. noise_templates
  • input: Kilosort .npy files
  • output: cluster_group.tsv
  6. automerging
  • input: Kilosort .npy files
  • output: modifies spike_clusters.npy and cluster_group.tsv
  7. mean_waveforms
  • input: Kilosort .npy files, continuous data file
  • output: mean_waveforms.npy
  8. quality_metrics
  • input: Kilosort .npy files, continuous data file
  • output: metrics.npy

Error calculating d_prime

I'm running into the following error when I run the quality_metrics module (using the master branch) on a dataset I collected:

d_prime = lda_metrics(all_pcs, all_labels, cluster_id) (from line 278 in metrics.py)
ValueError: n_components cannot be larger than min(n_features, n_classes - 1).

I've done a little digging and, from what I can tell, it is failing on clusters that do not have any other units_in_range. One way to avoid the problem is to increase the num_channels_to_compare in the quality_metrics_params. Is this a good solution though? I know this will increase the number of neighboring units each cluster is compared to. Another possible solution would be to set a d_prime value (NaN or some high value?) if there are no units_in_range. Does this make sense? I could also be misdiagnosing the problem, any ideas?

in KS2.5 channel_map.npy is now a row vector

Hi,

Just to let you know that I noticed in the new kilosort 2.5 'channel_map.npy' is a row vector instead of a column vector. This crashes the kilosort_postprocessing for me (l.58 in 'postprocessing.py'). You might want to add a line that forces the channel_map to a column vector.
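The normalization suggested above could be sketched like this (`as_column` is a hypothetical helper, not part of the module):

```python
import numpy as np

def as_column(channel_map):
    """Return channel_map as an (N, 1) column vector, whatever its input shape."""
    return np.asarray(channel_map).reshape(-1, 1)
```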

Cheers

Laurenz

Potential synergy between this repository and SpikeInterface

Hello!

I am a developer working on an open-source suite of tools (SpikeInterface) designed to standardize extracellular analysis and spike sorting.

A lot of our work in the past year has gone into standardizing IO for both raw and sorted extracellular file formats, wrapping popular spike sorting algorithms in our framework, and developing comprehensive tools for ground truth analysis of different sorters (along with implementation of some preprocessing and postprocessing tools for spike sorting analysis).

One part of our framework that still needs a lot of work is in the unsupervised metrics that can be used to evaluate the result of a spike sorter on datasets without ground truth. Recently, we discovered this repository and were impressed by the large suite of implemented metrics for this express purpose. We would love to get in touch and even discuss over email/skype about a possible integration of these metrics into SpikeInterface.

If you want to get in touch over email, my email is [email protected].

Best,

Cole Hurwitz

Marshmallow Validation Error

Hello, we are attempting to run the scripts from the extract_from_npx module. When running the following command: python -W ignore -m ecephys_spike_sorting.modules.extract_from_npx --input_json C:\<path to input json> --output_json C:\<path to output json>, we run into the following error:

Traceback (most recent call last):
  File "C:\Users\lihao\anaconda3\envs\npx\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\ecephys_spike_sorting\modules\extract_from_npx\__main__.py", line 71, in <module>
    main()
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\ecephys_spike_sorting\modules\extract_from_npx\__main__.py", line 59, in main
    output_schema_type=OutputParameters)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\argschema\argschema_parser.py", line 175, in __init__
    result = self.load_schema_with_defaults(self.schema, args)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\argschema\argschema_parser.py", line 274, in load_schema_with_defaults
    result = utils.load(schema, args)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\argschema\utils.py", line 418, in load
    results = schema.load(d)
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\marshmallow\schema.py", line 707, in load
    postprocess=True,
  File "C:\Users\lihao\anaconda3\envs\npx\lib\site-packages\marshmallow\schema.py", line 867, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'directories': {'kilosort_output_tmp': ['Unknown field.']}, 'quality_metrics_params': ['Unknown field.'], 'median_subtraction_params': ['Unknown field.'], 'ks_postprocessing_params': ['Unknown field.'], 'waveform_metrics': ['Unknown field.'], 'mean_waveform_params': ['Unknown field.'], 'depth_estimation_params': ['Unknown field.'], 'noise_waveform_params': ['Unknown field.'], 'kilosort_helper_params': ['Unknown field.'], 'ephys_params': ['Unknown field.']}

We are not sure how to proceed, as this seems to be an error in parsing the dictionaries. We are currently using argschema==2.0.2, and we are using a new anaconda environment for this as well. Please let us know if you have any suggestions, thanks!

KeyError: 'mean_waveforms_params'

File "build/bdist.macosx-10.5-x86_64/egg/ecephys_spike_sorting/modules/mean_waveforms/__main__.py", line 41, in calculate_mean_waveforms
KeyError: 'mean_waveforms_params'

PCmetrics not being calculated for some clusters

Hi guys, I was checking out the function in the metrics module that calculates the principal-component-related metrics. The condition on line 286 seems to be working just fine, as I tested it on my own dataset and it doesn't calculate the metrics for clusters with 20 spikes or fewer (also tested with a 100-spike threshold). However, I do have some other clusters in my dataset with more than 20 spikes, including one with 30k-ish spikes, for which metrics are not being calculated (its row in the DataFrame is filled with NaN values). I was wondering if this has to do with the conditions on lines 284 or 285, since I haven't quite fully understood what they accomplish. Any help is appreciated :) @jsiegle

waveform variance

@jsiegle @jiaxx @wbwakeman

In the past we've discussed providing some measure of waveform variance (such as a stdev array) along with the mean waveforms. Is this still planned? I think this means an additional file tracked in LIMS as well as produced by this set of modules.

batch_processing.py fails to execute any modules

When I run batch_processing.py, it returns non-zero exit status 1. I have attempted to run a few different modules (extract_from_npx and kilosort_helper) and it gives the same error. It seems that some of the paths are being set to None (in create_input_json.py), but something down the line doesn't like that and raises a "Field may not be null" error.

l_ratio

Hi guys,

On line 700 in the metrics.py script from the quality_metrics module, you have "mahalanobis_other" in the denominator of the l_ratio metric. I was wondering if this was a mistake and if it should rather be "mahalanobis_self"?

build error on spike median subtraction module

Hello, I am trying to compile the C++ code for the median subtraction module and get the error attached. It seems like it cannot properly find the JUCE .cpp files. Is there something I need to do with those before compiling the code?

Improve mean waveform calculation

For each unit, we need the mean waveform at different time points, as well as additional waveform metrics (e.g. peak channel, spread, and width)

Bug in calculate_silhouette_score

Hi,

I think I found a bug in calculate_silhouette_score(). The number of features in all_pcs is too low, which leads to overlaps of PC features.

PR with a fix will follow.

Best wishes,
Daniel

Add drift metrics

The quality_metrics module should include some measurement of electrode drift, such as the change in the peak channel over time.
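One way the suggested metric could be sketched, assuming per-spike depth estimates are already available (`depth_drift_um` is a hypothetical helper, not the module's implementation):

```python
import numpy as np

def depth_drift_um(spike_times_s, spike_depths_um, bin_s=60.0):
    """Range of a unit's median depth across time bins (in microns)."""
    edges = np.arange(0.0, spike_times_s.max() + bin_s, bin_s)
    which_bin = np.digitize(spike_times_s, edges)
    medians = [np.median(spike_depths_um[which_bin == b])
               for b in np.unique(which_bin)]
    return float(np.max(medians) - np.min(medians))
```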

PC-based quality metrics on data from Kilosort 1 fail

Hello,

I'm trying to run the quality metrics module on data that was sorted with Kilosort 1 (about 2 years ago if that matters). Unfortunately, computation stops at PC-based metrics:

(ecephys_spike_sorting) bash-3.2$ python -m ecephys_spike_sorting.modules.quality_metrics --input_json params.json --output_json output.json

ecephys spike sorting: quality metrics module
Loading data...
Calculating isi violations
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating presence ratio
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating firing rate
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating amplitude cutoff
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating PC-based metrics
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 77, in <module>
    main()
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 67, in main
    output = calculate_quality_metrics(mod.args)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 30, in calculate_quality_metrics
    metrics = calculate_metrics(spike_times, spike_clusters, amplitudes, channel_map, pc_features, pc_feature_ind, args['quality_metrics_params'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 81, in calculate_metrics
    params['n_neighbors'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 228, in calculate_pc_metrics
    peak_channels[cluster_id] = pc_feature_ind[cluster_id, pc_max]
IndexError: index 513 is out of bounds for axis 0 with size 512

cluster_ids contains non-consecutive numbers, which seems to be the cause of the problem. I'm a bit confused that enumerate is used as the generator but the variable idx is never used as the index. Should cluster_id be replaced with idx here?
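A hedged sketch of the fix the reporter is suggesting (names are illustrative, not the module's code): since pc_feature_ind has one row per cluster position, the row index from enumerate, not the raw cluster ID, should be used when cluster IDs are non-consecutive.

```python
import numpy as np

def peak_channels_for_clusters(cluster_ids, pc_feature_ind, pc_max_per_cluster):
    """cluster_ids may be non-consecutive; pc_feature_ind has one row per cluster."""
    peak_channels = {}
    for idx, cluster_id in enumerate(cluster_ids):
        # index by row position (idx), not by the raw cluster ID
        peak_channels[cluster_id] = pc_feature_ind[idx, pc_max_per_cluster[idx]]
    return peak_channels
```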

Postprocessing

Hi. I would like to use the Postprocessing module to remove double detected spikes. I run Kilosort3 from Matlab, then try to run the module in cmd but I get this error message:

ecephys spike sorting: kilosort postprocessing module
Loading data...
Removing within-unit overlapping spikes...
Traceback (most recent call last):
  File "C:\Users\dmagyar\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\dmagyar\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\dmagyar\Documents\Matlab_toolboxes_DM\ecephys_spike_sorting\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 94, in <module>
    main()
  File "C:\Users\dmagyar\Documents\Matlab_toolboxes_DM\ecephys_spike_sorting\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 84, in main
    output = run_postprocessing(mod.args)
  File "C:\Users\dmagyar\Documents\Matlab_toolboxes_DM\ecephys_spike_sorting\ecephys_spike_sorting\modules\kilosort_postprocessing\__main__.py", line 44, in run_postprocessing
    remove_double_counted_spikes(spike_times,
  File "C:\Users\dmagyar\Documents\Matlab_toolboxes_DM\ecephys_spike_sorting\ecephys_spike_sorting\modules\kilosort_postprocessing\postprocessing.py", line 73, in remove_double_counted_spikes
    for idx1, unit_id1 in enumerate(unit_list[order]):
IndexError: index 175 is out of bounds for axis 0 with size 175

To me it seems that unit_list is one element smaller than it should be. If I change line 58 in postprocessing.py from "unit_list = np.arange(np.max(spike_clusters)+1)" to "unit_list = np.arange(np.max(spike_clusters)+2)", the module works.

I don't really understand this problem; I'm pretty new to programming.

Thank you for your help.
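For what it's worth, a hedged alternative to bumping the +1 to +2 (this is a sketch, not the module's code): building unit_list from the cluster IDs that actually occur sidesteps any off-by-one assumption about the maximum ID.

```python
import numpy as np

spike_clusters = np.array([0, 3, 3, 7, 0])  # toy assignments with gaps in the IDs
unit_list = np.unique(spike_clusters)       # only the IDs that actually occur
for idx, unit_id in enumerate(unit_list):
    n_spikes = np.sum(spike_clusters == unit_id)  # per-unit spike count
```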

compiling median_subtraction

Hi,

First of all, thanks for making this entire analysis pipeline open access!

I am trying to compile the median_subtraction code, but I am lost due to my ignorance of C++.
I am following this guide, but I end up with:

Main.cpp c:\users\akino s-900\downloads\ecephys_spike_sorting-master\ecephys_spike_sorting-master\ecephys_spike_sorting\modules\median_subtraction\spikebandmediansubtraction\source\../JuceLibraryCode/JuceHeader.h(18) : fatal error C1083: Cannot open include file: 'juce_audio_basics/juce_audio_basics.h': No such file or directory

Can somebody point me in the right direction?

Thanks!

"Files not available" on Quality Metrics module

Hello, I just started using your modules to process my Kilosort3 output a couple of days ago, and I am mainly interested in running both the Mean Waveforms and the Quality Metrics modules. After some initial installation issues I was able to run the waveforms module, generating both the mean_waveforms.npy and waveform_metrics.csv files. When it comes to the Quality Metrics module, the following appears:
[screenshot of the "files not available" error]
Any idea why it is not able to load the data?
Maybe it's some variable I haven't defined yet, but I don't quite know, since the Mean Waveforms module did work.
None of the info related to the probe was defined, only num_channels=64 (NeuroNexus probe).
In my create_input_json file I have only specified the kilosort_output_directory, where I have the .dat file of the whole 2-hour recording and the .npy and .tsv files that Kilosort3 output. That is, no extracted_data_directory or kilosort_output_tmp filepaths are defined. I don't know whether the error is due to that; I'm just trying to give the most detailed picture of the situation.
Besides that, I have only defined the path to output_file (the directory where the input and output JSON files are stored).

Kind regards :)

test_waveform_metrics

Test the waveform metrics by running batch_processing_test_local.py on remote data and saving the output to a local drive.
