
dlc2nwb's Introduction

Welcome to the DeepLabCut 2 Neurodata Without Borders Repo

Here we provide utilities to convert DeepLabCut (DLC) output to/from Neurodata Without Borders (NWB) format. This repository also proposes a way that pose estimation data can be represented in NWB.

Specifically, this package allows you to convert DLC's predictions on videos (*.h5 files) into NWB format. This is best explained with an example (see below).

NWB pose ontology

The standard is presented here. Our code is based on this NWB extension (PoseEstimationSeries, PoseEstimation) that was developed with Ben Dichter, Ryan Ly and Oliver Ruebel.

Installation:

Simply do (it only depends on ndx-pose and deeplabcut):

pip install dlc2nwb

Example within DeepLabCut

DeepLabCut's h5 data files can be readily converted to NWB format either via the GUI from the Analyze Videos tab or programmatically, as follows:

import deeplabcut

deeplabcut.analyze_videos_converth5_to_nwb(config_path, video_folder)

Note that DLC does not strictly depend on dlc2nwb just yet; when attempting to convert to NWB, the user is asked to run pip install dlc2nwb.

Example use case of this package (directly):

Here is an example for converting DLC data to NWB format (and back). Notice you can also export your data directly from DeepLabCut.

from dlc2nwb.utils import convert_h5_to_nwb, convert_nwb_to_h5

# Convert DLC -> NWB:
nwbfile = convert_h5_to_nwb(
    'examples/config.yaml',
    'examples/m3v1mp4DLC_resnet50_openfieldAug20shuffle1_30000.h5',
)

# Convert NWB -> DLC
df = convert_nwb_to_h5(nwbfile[0])

Example data to run the code is provided in the folder examples. The data is based on a DLC project you can find on Zenodo, originally presented in Mathis et al., Nat. Neuro as well as Mathis et al., Neuron. To limit space, the folder only contains the project file config.yaml and DLC predictions for an example video called m3v1mp4.mp4, which are stored in *.h5 format. The video is available here.
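The predictions in such an *.h5 file are stored as a pandas DataFrame with a three-level column MultiIndex (scorer, bodyparts, coords), where coords holds x, y, and likelihood for each keypoint. A minimal sketch of that layout, with made-up values and bodypart names (only the column structure follows the DLC convention):

```python
import numpy as np
import pandas as pd

scorer = "DLC_resnet50_openfieldAug20shuffle1_30000"
bodyparts = ["snout", "leftear"]
coords = ["x", "y", "likelihood"]

# DLC predictions use a three-level column MultiIndex:
# (scorer, bodypart, coordinate)
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, coords],
    names=["scorer", "bodyparts", "coords"],
)
n_frames = 4
df = pd.DataFrame(np.zeros((n_frames, len(columns))), columns=columns)

# Select the x/y/likelihood block for one bodypart:
snout = df[scorer]["snout"]
print(snout.shape)  # (4, 3)
```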

Funding and contributions:

We gratefully acknowledge the generous support from the Kavli Foundation via a Kavli Neurodata Without Borders Seed Grant.

We also acknowledge feedback, and our collaboration with Ben Dichter, Ryan Ly and Oliver Ruebel.

dlc2nwb's People

Contributors: alexemg, bendichter, cbroz1, codycbakerphd, h-mayorquin, jeylau, mmathislab, saksham20


dlc2nwb's Issues

Having a new release to include append nwbfile mode

After you have merged #10, we were wondering if you could cut a new release. That would let us tell users they can get this new feature via pip install dlc2nwb instead of telling them to install the dev branch from GitHub.

Let us know if there is anything we can do to help make this happen.

Request for new release

Hello all,

The latest release of deeplabcut triggered some minor breaks in downstream packages due to the loosening of tensorflow in the minimal requirements: catalystneuro/neuroconv#268

After checking the latest state of the main branch here, though, it seems like this may have been anticipated as of a few months ago by making the deeplabcut import here safer: https://github.com/DeepLabCut/DLC2NWB/blob/main/dlc2nwb/utils.py#L16-L17

However, it seems like the last release (Jul 29) was a few months before that.

Thus I'd like to request a new release of dlc2nwb so I can pin to the version using this safer import and bypass the need for tensorflow altogether.

Let me know if I'm missing something~

Cheers and happy holidays!

Enable stand-alone conversion without DeepLabCut installed

We recently ran into an issue using this repository because the dependency on deeplabcut crashed our mac workflow (we think this might be related to the following issue: DeepLabCut/DeepLabCut#1430). While state-of-the-art deep learning libraries are a necessity for the powerful analysis that deeplabcut enables, they are also known to have brittle installation processes and to introduce hard dependency-management problems in the ecosystem. In that context, we think it would be useful to enable this repository to work as a standalone post-processing tool in its own right; that is, to run a conversion pipeline in an environment that does not have deeplabcut as a dependency.

To illustrate this need more concretely, consider the following two scenarios:

  • A researcher runs the deeplabcut processing pipeline on a workstation machine but then does the data analysis or paper writing on another computer and wants to modify the writing-to-NWB pipeline quickly.
  • A researcher gets deeplabcut data from a collaborator and wants to integrate it into a pipeline that includes other modalities, but does not have the environment where the initial analysis was carried out.

In the cases above, the results have already been produced and this library's role would just be transforming the data into NWB. Therefore, installing deeplabcut in those scenarios is unnecessary and, as discussed above, potentially brittle.

I am opening an accompanying PR that achieves this with minimal changes.

`opencv-python[-headless]` not installed automatically

In a fresh venv environment, installing dlc2nwb does not automatically install opencv-python or opencv-python-headless. Doing pip show opencv-python or pip show opencv-python-headless shows nothing.

This is the final output for pip install dlc2nwb inside a fresh venv:

Successfully installed attrs-23.1.0 dlc2nwb-0.3 h5py-3.9.0 hdmf-3.8.0 jsonschema-4.18.4 jsonschema-specifications-2023.7.1 ndx-pose-0.1.1 numpy-1.25.1 pandas-2.0.3 pynwb-2.4.0 python-dateutil-2.8.2 pytz-2023.3 referencing-0.30.0 rpds-py-0.9.2 ruamel-yaml-0.17.32 ruamel.yaml.clib-0.2.7 scipy-1.11.1 six-1.16.0 tzdata-2023.3

Add optional parameters to `PoseEstimation`

As described in https://github.com/catalystneuro/neuroconv/issues/915, I would need to pass over a name value to PoseEstimation to support multiple DLC pose estimations in the same nwb file.

I am currently fixing it by piping an optional dictionary of additional kwargs from the args in this way:

def _write_pes_to_nwbfile(
    nwbfile,
    animal,
    df_animal,
    scorer,
    video,  # Expects a tuple; first element is the video path, second is the image extent as "0, width, 0, height"
    paf_graph,
    timestamps,
    exclude_nans,
    **optional_kwargs,
):
    ...

    pe = PoseEstimation(
        pose_estimation_series=pose_estimation_series,
        description="2D keypoint coordinates estimated using DeepLabCut.",
        original_videos=[video[0]],
        # TODO check if this is a mandatory arg in ndx-pose (can skip if video is not found)
        dimensions=[list(map(int, video[1].split(",")))[1::2]],
        scorer=scorer,
        source_software="DeepLabCut",
        source_software_version=deeplabcut_version,
        nodes=[pes.name for pes in pose_estimation_series],
        edges=paf_graph if paf_graph else None,
        **optional_kwargs,
    )

If you would be open to supporting this fix, I can submit a new PR or add this solution to #23 as well, whichever you prefer!
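The `dimensions` line in the snippet above splits the `"0, width, 0, height"` string (the format given in the code comment) and keeps every second value; a small standalone illustration of just that parsing, with made-up width and height:

```python
# The second element of `video` encodes the image extent as "0, width, 0, height"
# (per the comment in the snippet above); values here are made up.
video_shape = "0, 640, 0, 480"

# Convert to ints, then keep every second value: (width, height).
dimensions = [list(map(int, video_shape.split(",")))[1::2]]
print(dimensions)  # [[640, 480]]
```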

Next steps

  • include unit-tests for testing round-trip conversion
  • update usage in docs
  • expand description of DLC data + NWB data

Later:

  • put an example export in DLC cookbook
  • set up CI
  • add DLC multi animal project example

Can't get movie timestamps due to TypeError: object of type 'cv2.VideoCapture' has no len()

It is impossible to retrieve movie timestamps because dlc2nwb.utils.get_movie_timestamps throws TypeError: object of type 'cv2.VideoCapture' has no len()

Steps to reproduce:

from dlc2nwb.utils import get_movie_timestamps
get_movie_timestamps('VID_20240117_165651.mp4')

Expected behaviour:

The timestamps are returned

Actual behaviour:

TypeError: object of type 'cv2.VideoCapture' has no len()

Environment info:

OS: Windows 10 x64
Conda version: 23.3.1
Python version: 3.9.0
opencv-python version: 4.7.0.72
dlc2nwb version: 0.3

Additional info:

This error might depend on the opencv-python version, in which case pinning the DLC2NWB package to whichever opencv-python version added the ability to take len() of a cv2.VideoCapture (or the one before it was removed, if it is an old feature) is the simplest solution.

Alternatively, the first return value of reader.read() is a boolean indicating whether a frame was successfully read, so using this to change the for loop to a while loop, like so:

success, _ = reader.read()
while success:
    timestamps.append(reader.get(cv2.CAP_PROP_POS_MSEC))
    success, _ = reader.read()

fixes the issue. However (again, possibly depending on your opencv-python version), you then run into an AttributeError on line 83, since a cv2.VideoCapture has no attribute fps. This can be fixed by replacing reader.fps on that line with reader.get(cv2.CAP_PROP_FPS)
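The control flow of the proposed while-loop fix can be exercised without a real video by substituting a small stub for `cv2.VideoCapture`. This is only a sketch: `FakeCapture` is a made-up stand-in that yields a fixed number of frames, and a plain constant replaces `cv2.CAP_PROP_POS_MSEC`:

```python
class FakeCapture:
    """Stub standing in for cv2.VideoCapture: yields n_frames frames, then stops."""

    CAP_PROP_POS_MSEC = 0  # stand-in for cv2.CAP_PROP_POS_MSEC

    def __init__(self, n_frames=3, ms_per_frame=33.0):
        self._frame = 0
        self._n = n_frames
        self._ms = ms_per_frame

    def read(self):
        # Mirrors cv2.VideoCapture.read(): returns (success_flag, frame).
        if self._frame < self._n:
            self._frame += 1
            return True, object()
        return False, None

    def get(self, prop):
        # Position of the most recently read frame, in milliseconds.
        return (self._frame - 1) * self._ms


reader = FakeCapture()
timestamps = []

# The proposed fix: loop while read() reports success, instead of
# relying on len(reader), which cv2.VideoCapture does not support.
success, _ = reader.read()
while success:
    timestamps.append(reader.get(FakeCapture.CAP_PROP_POS_MSEC))
    success, _ = reader.read()

print(timestamps)  # [0.0, 33.0, 66.0]
```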

timestamps

This looks really good!

One issue I found is that the timestamps are integer-valued, e.g. (1.0, 2.0, 3.0, ...). These should be the time in seconds relative to the session start time. Does DLC track timing, or does it simply go frame by frame without needing to know the frame times? We may need an additional input argument to support this. We could take just a sampling rate, but I know that videos often have irregular sampling.
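For a video with a constant (nominal) frame rate, converting frame indices to seconds relative to the session start is straightforward; a minimal sketch, assuming a known frame rate (irregular sampling would instead require per-frame timestamps, e.g. from the container metadata):

```python
fps = 30.0  # assumed nominal frame rate (frames per second)
n_frames = 5

# Timestamps in seconds relative to session start, one per frame:
# 0/fps, 1/fps, 2/fps, ...
timestamps = [frame / fps for frame in range(n_frames)]
print(timestamps[0], timestamps[-1])
```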

Another, more minor, thing: if all the timestamp vectors are the same, we can create links between them so the values only need to be stored once in the file and the other timeseries objects can point to it. This is easy to do in PyNWB, but the syntax would be hard to guess. You do:

timeseries1 = TimeSeries(...)
timeseries2 = TimeSeries(..., timestamps=timeseries1)

including skeleton

ndx-pose has a place for "edges", which I think maps to "skeleton" in DLC. However, the example config.yaml does not contain skeleton information and the converter does not handle skeleton info. Including this information would allow us to provide much better visualizations of DLC output in NWB Widgets.
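In ndx-pose, `edges` are pairs of indices into the `nodes` list, while a DLC skeleton in config.yaml is a list of bodypart-name pairs. A small sketch of the mapping between the two (the bodypart names and skeleton here are made up):

```python
# nodes as they would appear in PoseEstimation (order matters):
nodes = ["snout", "leftear", "rightear", "tailbase"]

# A DLC-style skeleton: pairs of bodypart names (hypothetical example).
skeleton = [["snout", "leftear"], ["snout", "rightear"], ["leftear", "tailbase"]]

# Map name pairs to index pairs for the ndx-pose `edges` argument:
index = {name: i for i, name in enumerate(nodes)}
edges = [[index[a], index[b]] for a, b in skeleton]
print(edges)  # [[0, 1], [0, 2], [1, 3]]
```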
