# Build a full ephys pipeline using the canonical pipeline elements
This repository provides demonstrations for:

- Setting up a workflow using different elements (see `workflow_array_ephys/pipeline.py`)
- Ingestion of data/metadata based on:
  - a predefined file/folder structure and naming convention
  - predefined directory lookup methods (see `workflow_array_ephys/paths.py`)
- Ingestion of clustering results (a built-in routine from the ephys element)
The electrophysiology pipeline presented here uses pipeline components from four DataJoint Elements (`element-lab`, `element-animal`, `element-session`, and `element-array-ephys`), assembled together to form a fully functional workflow.
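To illustrate how the elements are assembled, below is a simplified sketch of what `workflow_array_ephys/pipeline.py` does. The schema names and the exact `activate()` signatures are assumptions based on current element versions; treat the actual file as authoritative.

```python
# Simplified sketch only -- see workflow_array_ephys/pipeline.py for the
# authoritative version; activate() signatures may differ between
# element versions.
import datajoint as dj
from element_lab import lab
from element_animal import subject
from element_session import session
from element_array_ephys import probe, ephys

# Schema names are built from the configured prefix
# (see "custom" in dj_local_conf.json below).
db_prefix = dj.config["custom"].get("database.prefix", "")

lab.activate(db_prefix + "lab")
subject.activate(db_prefix + "subject", linking_module=__name__)
session.activate(db_prefix + "session", linking_module=__name__)
ephys.activate(db_prefix + "ephys", db_prefix + "probe", linking_module=__name__)
```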
Clone this repository:

- Launch a new terminal and change directory to where you want to clone the repository, e.g.:
  `cd C:/Projects`
- Clone the repository:
  `git clone https://github.com/datajoint/workflow-array-ephys`
- Change directory to `workflow-array-ephys`:
  `cd workflow-array-ephys`
It is highly recommended (though not strictly required) to create a virtual environment to run the pipeline.

- You can create one with `virtualenv` or `conda`; below are the commands for `virtualenv`.
- If `virtualenv` is not yet installed, run `pip install --user virtualenv`.
- To create a new virtual environment named `venv`, run `virtualenv venv`.
- To activate the virtual environment:
  - On Windows: `.\venv\Scripts\activate`
  - On Linux/macOS: `source venv/bin/activate`
- From the root of the cloned repository directory, run `pip install -e .`

  Note: the `-e` flag installs this repository in editable mode, in case there is a need to modify the code (e.g. the `pipeline.py` or `paths.py` scripts). If no such modification is required, `pip install .` is sufficient.

- Register an IPython kernel with Jupyter:
  `ipython kernel install --name=workflow-array-ephys`
We provide a tutorial notebook, `01-configuration`, to guide the configuration.
At the root of the repository folder, create a new file `dj_local_conf.json` with the following template:

```json
{
  "database.host": "<hostname>",
  "database.user": "<username>",
  "database.password": "<password>",
  "loglevel": "INFO",
  "safemode": true,
  "display.limit": 7,
  "display.width": 14,
  "display.show_tuple_count": true,
  "custom": {
    "database.prefix": "<neuro_>",
    "ephys_root_data_dir": "<C:/data/ephys_root_data_dir>"
  }
}
```
- Specify the database's `hostname`, `username`, and `password` properly.
- Specify a `database.prefix` to create the schemas.
- Set up your data directory (`ephys_root_data_dir`) following the convention described below.

At this point the setup of this workflow is complete.
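To verify the configuration, a quick check from Python using the standard DataJoint API (`dj_local_conf.json` is loaded automatically from the current working directory when `datajoint` is imported):

```python
import datajoint as dj

# dj_local_conf.json in the current working directory is picked up
# automatically when datajoint is imported.
print(dj.config["custom"]["ephys_root_data_dir"])  # the configured data root
dj.conn()  # raises an error if host/user/password are misconfigured
```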
The workflow presented here is designed to work with the following directory structure and file naming convention:

- The `ephys_root_data_dir` is configurable in `dj_local_conf.json`, under the `custom/ephys_root_data_dir` variable.
- The `subject` directory names must match the identifiers of your subjects in the `subjects.csv` file.
- The `session` directories can have any naming convention.
- Each session can have multiple probes; the `probe` directories must match the naming convention `*[0-9]` (where `[0-9]` is a one-digit number specifying the probe number).
- Each `probe` directory should contain:
  - one Neuropixels meta file, with the naming convention `*[0-9].ap.meta`
  - optionally, one Kilosort output folder
```
root_data_dir/
└───subject1/
│   └───session0/
│   │   └───imec0/
│   │   │   │   *imec0.ap.meta
│   │   │   └───ksdir/
│   │   │       │   spike_times.npy
│   │   │       │   templates.npy
│   │   │       │   ...
│   │   └───imec1/
│   │       │   *imec1.ap.meta
│   │       └───ksdir/
│   │           │   spike_times.npy
│   │           │   templates.npy
│   │           │   ...
│   └───session1/
│       │   ...
└───subject2/
    │   ...
```
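The actual directory-lookup logic lives in `workflow_array_ephys/paths.py`; the following is only a hypothetical sketch of how a lookup following the convention above could work (the helper name `find_probe_dirs` is made up for illustration):

```python
import pathlib

import datajoint as dj

def find_probe_dirs(subject_id: str, session_dir: str):
    """Hypothetical helper: yield probe directories (names ending in a
    single digit, e.g. imec0, imec1) for one session."""
    root = pathlib.Path(dj.config["custom"]["ephys_root_data_dir"])
    for path in sorted((root / subject_id / session_dir).iterdir()):
        if path.is_dir() and path.name[-1].isdigit():
            yield path
```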
We provide an example dataset to run through this workflow. Instructions for downloading the data are in the notebook `00-data-download`.
For new users, we recommend using the following two notebooks to run through the workflow. The general instructions are below.

Once you have your data directory configured with the above convention, populating the pipeline with your data amounts to these three steps:
1. Insert meta information (e.g. subjects, sessions, etc.) by modifying:
   - `user_data/subjects.csv`
   - `user_data/sessions.csv`
2. Import session data by running:
   `python workflow_array_ephys/ingest.py`
3. Import clustering data and populate downstream analyses by running:
   `python workflow_array_ephys/populate.py`

- For inserting new subjects, sessions, or new analysis parameters, step 1 needs to be re-executed.
- Rerun steps 2 and 3 every time new sessions or clustering data become available.
- In fact, steps 2 and 3 can be executed as scheduled jobs that will automatically process any data newly placed into the `ephys_root_data_dir` (see the sketch below).
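As an illustration of such a scheduled job, the loop below simply re-runs the two scripts from steps 2 and 3 at a fixed interval. This is only a sketch: a real deployment might use cron or a workflow manager instead, and the polling interval here is arbitrary.

```python
import subprocess
import time

# Re-run ingestion (step 2) and population (step 3) periodically so that
# newly deposited sessions and clustering results are processed
# automatically.
while True:
    subprocess.run(["python", "workflow_array_ephys/ingest.py"], check=True)
    subprocess.run(["python", "workflow_array_ephys/populate.py"], check=True)
    time.sleep(3600)  # arbitrary choice: poll for new data once an hour
```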
For new users, we recommend using our notebook `05-explore` to interact with the pipeline. The general steps are:
- Connect to the database and import tables:

  ```python
  from workflow_array_ephys.pipeline import *
  ```

- View ingested/processed data (see the query sketch after this list):

  ```python
  subject.Subject()
  session.Session()
  ephys.ProbeInsertion()
  ephys.EphysRecording()
  ephys.Clustering()
  ephys.Clustering.Unit()
  ```

- If you need to drop all schemas, the following is the dependency order. Also refer to notebook `06-drop`:

  ```python
  from workflow_array_ephys.pipeline import *

  ephys.schema.drop()
  probe.schema.drop()
  session.schema.drop()
  subject.schema.drop()
  lab.schema.drop()
  ```
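Beyond viewing whole tables, the standard DataJoint query operators apply. Below is a minimal sketch, assuming a subject named `subject1` as in the directory example above (attribute names are those of the element-defined tables and may differ between versions):

```python
from workflow_array_ephys.pipeline import session, ephys

# Restrict a table with a condition string, then fetch rows as dicts.
subject1_sessions = (session.Session & 'subject = "subject1"').fetch(as_dict=True)

# Join recordings with their clustering results for the same subject.
print(ephys.EphysRecording * ephys.Clustering & 'subject = "subject1"')
```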
This method allows you to modify the source code for `workflow-array-ephys`, `element-array-ephys`, `element-animal`, `element-session`, and `element-lab`.

- Launch a new terminal and change directory to where you want to clone the repositories, e.g.:
  `cd C:/Projects`
- Clone the repositories:

  ```
  git clone https://github.com/datajoint/element-lab
  git clone https://github.com/datajoint/element-animal
  git clone https://github.com/datajoint/element-session
  git clone https://github.com/datajoint/element-array-ephys
  git clone https://github.com/datajoint/workflow-array-ephys
  ```

- Install each package with the `-e` option:

  ```
  pip install -e ./workflow-array-ephys
  pip install -e ./element-session
  pip install -e ./element-lab
  pip install -e ./element-animal
  pip install -e ./element-array-ephys
  ```
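A quick sanity check that Python is importing the editable (cloned) copies rather than any previously installed versions; this uses only standard Python, and the printed paths should point into your cloned directories:

```python
# Verify the editable installs resolve to the cloned source trees.
import element_array_ephys
import workflow_array_ephys

print(element_array_ephys.__file__)   # should point into ./element-array-ephys
print(workflow_array_ephys.__file__)  # should point into ./workflow-array-ephys
```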
- Download the test dataset to your local machine (note the directory where the dataset is saved, e.g. `/tmp/testset`).
- Create an `.env` file with the following content (replace `/tmp/testset` with the directory where you downloaded the test dataset):

  ```
  TEST_DATA_DIR=/tmp/testset
  ```

- Run:
  `docker-compose -f docker-compose-test.yaml up --build`