
Perch

A bioacoustics research project.

Installation

We support installation on a generic Linux workstation. A GPU is recommended, especially when working with large datasets. The recipe below is the same one used by our continuous integration tests.

Some users have successfully used our repository with the Windows Subsystem for Linux, or with Docker in a cloud-based virtual machine. Anecdotally, installation on macOS is difficult.

You might need the following dependencies.

# Install Poetry for package management
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies for librosa
sudo apt-get install libsndfile1 ffmpeg

# Install all dependencies specified in the poetry configs
poetry install --with jaxtrain

Running poetry install installs all Perch dependencies into a new virtual environment, in which you can run the Perch code base. To run the tests, use:

poetry run python -m unittest discover -s chirp/tests -p "*test.py"
poetry run python -m unittest discover -s chirp/inference/tests -p "*test.py"

Lightweight Inference

Note that if you only need the Python notebooks for use with pre-trained models, you can install with lighter dependencies:

# Install inference-only dependencies specified in the poetry configs
poetry install

And check that the inference tests succeed:

poetry run python -m unittest discover -s chirp/inference/tests -p "*test.py"

Using a container

Alternatively, you can install and run this project using a container via Docker. To build a container using the tag perch, run:

git clone https://github.com/google-research/perch
cd perch
docker build . --tag perch

After building the container, to run the unit tests, use:

docker run --rm -t perch python -m unittest discover -s chirp/tests -p "*test.py"

BIRB benchmark

Data preparation

To build the BIRB evaluation data, after installing the chirp package, run the following command from the repository's root directory:

poetry run tfds build -i chirp.data.bird_taxonomy,chirp.data.soundscapes \
    soundscapes/{ssw,hawaii,coffee_farms,sierras_kahl,high_sierras,peru}_full_length \
    bird_taxonomy/{downstream_full_length,class_representatives_slice_peaked}

The process should take 36 to 48 hours to complete and use around 256 GiB of disk space.

Benchmark README

For details on setting up the benchmark and evaluation protocol, please refer to the benchmark readme. The evaluation codebase is in perch/chirp/eval.

This is not an officially supported Google product.

Contributors

agentmorris, bartvm, bringingjoy, cdh4696, chiamp, dependabot[bot], elenitriantafillou, hawkinsp, jeffgeoff4, laurenharrell, matt-har-vey, mboudiaf, owahltinez, rchen152, sdenton4, vdumoulin, yilei

perch's Issues

CUDA_ERROR_ILLEGAL_ADDRESS is thrown when running chirp test in CUDA environment

After cloning the repo and installing all required dependencies, all unit tests run correctly on CPU, as shown below.

poetry run python -m unittest discover -s chirp/tests -p "*test.py"

But CUDA_ERROR_ILLEGAL_ADDRESS is thrown when running the chirp tests in a CUDA environment. Here is my CUDA environment:

GPU: Nvidia RTX 3090
Ubuntu: 22.10
Driver version: 525.60.13
Python version: 3.10
cudatoolkit version: 11.8
cudnn version: 8.6
jax version: 0.4.1
jaxlib version: 0.4.1+cuda11.cudnn86
flax version: 0.6.3

The first issue I ran into was an OOM failure, so I followed the instructions on GPU memory allocation to use CPU-only TensorFlow.

tests/sep_train_test.py TrainSeparationTest.test_eval_one_step works fine.

PYTHONPATH=. python chirp/tests/sep_train_test.py TrainSeparationTest.test_eval_one_step

but tests/sep_train_test.py TrainSeparationTest.test_train_one_step consistently reports a CUDA_ERROR_ILLEGAL_ADDRESS error in my CUDA environment.

PYTHONPATH=. python chirp/tests/sep_train_test.py TrainSeparationTest.test_train_one_step

Error message:

I0111 16:41:57.470412 140120040671040 pipeline.py:742] Splitting batch across 1 devices, with local device count 1.
I0111 16:42:07.193832 140120040671040 utils.py:33] Checkpoint.restore_or_initialize() ...
I0111 16:42:07.193934 140120040671040 utils.py:33] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() ...
I0111 16:42:07.194855 140120040671040 checkpoint.py:508] /tmp/tmpl5pjxa1ptrain_dir-0 not in []
I0111 16:42:07.194904 140120040671040 utils.py:43] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() finished after 0.00s.
I0111 16:42:07.194927 140120040671040 checkpoint.py:346] Storing initial version.
I0111 16:42:07.194949 140120040671040 utils.py:33] Checkpoint.save() ...
I0111 16:42:07.195027 140120040671040 checkpoint.py:304] Storing next checkpoint '/tmp/tmpl5pjxa1ptrain_dir-0/ckpt-1'
I0111 16:42:07.219635 140120040671040 utils.py:43] Checkpoint.save() finished after 0.02s.
I0111 16:42:07.219720 140120040671040 utils.py:43] Checkpoint.restore_or_initialize() finished after 0.03s.
/home/ryanz/projects/ml/source-separation/chirp/chirp/models/metrics.py:188: FutureWarning: The sym_pos argument to solve() is deprecated and will be removed in a future JAX release. Use assume_a='pos' instead.
  return scipy.linalg.solve(
2023-01-11 16:42:13.886039: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1032] could not wait stream on event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2023-01-11 16:42:13.886061: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/stream.cc:1112] Error waiting for event in stream: error recording waiting for CUDA event on stream 0x562ab718b7a0; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2023-01-11 16:42:13.886070: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1159] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; host dst: 0x7f6d38002560; GPU src: 0x7f67441dfe00; size: 4=0x4
2023-01-11 16:42:13.886077: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/stream.cc:327] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2023-01-11 16:42:13.886094: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:614] unable to add host callback: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0111 16:42:13.888874 140076977440448 asynclib.py:139] Error in producer thread for AsyncWriter
Traceback (most recent call last):
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/asynclib.py", line 135, in trap_errors
    return fn(*args, **kwargs)
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/metric_writers/logging_writer.py", line 44, in write_scalars
    values = [
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/metric_writers/logging_writer.py", line 45, in <listcomp>
    f"{k}={v:.6f}" if isinstance(v, float) else f"{k}={v}"
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/jax/_src/array.py", line 252, in __format__
    return format(self._value[()], format_spec)
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/jax/_src/array.py", line 487, in _value
    self._npy_value = np.asarray(self._arrays[0])  # type: ignore
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
I0111 16:42:13.892304 140120040671040 utils.py:33] Checkpoint.save() ...
E0111 16:42:13.892458 140076985833152 asynclib.py:139] Error in producer thread for AsyncWriter

I tried to trace the cause by commenting out lines in this function; it is the separator.train() call that causes the error.

  def test_train_one_step(self):
    config = self._get_test_config(use_small_encoder=True)
    ds, _ = self._get_test_dataset(
        'train',
        config,
    )
    model = separator.initialize_model(
        workdir=self.train_dir, **config.init_config)

    separator.train(
        *model, train_dataset=ds, logdir=self.train_dir, **config.train_config)
    ckpt = checkpoint.MultihostCheckpoint(self.train_dir)
    self.assertIsNotNone(ckpt.latest_checkpoint)

It could be something wrong with my CUDA environment, but test_eval_one_step works fine, as does other JAX code.

It would be great if the Chirp team could share their CUDA environment setup.

Thanks,

Ryan

Include the spectrogram in outputs

Oftentimes the best way to debug a bioacoustic classifier is to eyeball the mel spectrogram. Would it be possible to either document some sort of escape hatch that allows me access to the spectrogram that chirp calculates for a window of audio during inference, or just include it in the output dict alongside the logits and embeddings?

Most of the time I use the model from TFHub, which provides fewer access points for this than the chirp Python library.
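For context, the SavedModel-to-JAX snippet later on this page suggests that the Kaggle-hosted version 8 of the model already exposes the computed spectrogram under a 'frontend' output key. A minimal sketch, assuming the version you load has that key:

import numpy as np
import tensorflow_hub as hub

# Assumption: this Kaggle-hosted version returns an output dict with a
# 'frontend' key (as in the conversion snippet below); TFHub versions may not.
model = hub.load(
    'https://www.kaggle.com/models/google/bird-vocalization-classifier/'
    'TensorFlow2/bird-vocalization-classifier/8')

audio = np.zeros([1, 160000], dtype=np.float32)  # 5 s of mono 32 kHz audio
out = model.infer_tf(audio)
spectrogram = out['frontend']  # the mel spectrogram computed during inference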

Support wavs+csv labels for building small classifiers

We currently use a folder-of-folders format for labeled data. If we allow providing a directory of audio files and a CSV of labels, we will be able to support the following (a hypothetical CSV layout is sketched after the list):

a) Evaluation with published fully-annotated datasets,
b) Simpler import of people's pre-existing labeled data,
c) Multi-label audio examples,
d) 'Strong negative' labels (via an extra CSV column)
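A hypothetical layout for such a CSV, purely illustrative (the column names are made up):

filename,label,is_negative
site1/XC12345.wav,amerob,0
site1/XC12345.wav,comrav,0
site2/XC99999.wav,amerob,1

Here the repeated filename rows give a multi-label example, and is_negative=1 marks a 'strong negative' label via the extra column.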

Registers are spilled to local memory on calculating embeddings

I'm trying to find a solution to a register-spill issue. I am running Perch on a few hundred GB of audio. When I run perch/embed_audio.ipynb, I end up with a lot of spills into local memory, and it's not an issue I've had to trace before:

Environment:

  • Python 3.11.9
  • GCC 11.2.0
  • NVIDIA-SMI 535.161.08
  • Driver Version: 535.161.08
  • CUDA Version: 12.2
  • Tensorflow 2.16.1
  • OS: Ubuntu 22.04.4 LTS
  • VM: Azure Standard NC24ads A100 v4
  • RAM 220 GB
  • CPU 24x vCPU AMD EPYC™ 7V13 (Milan)
  • GPU A100 80GB PCIe GPU card

I0000 00:00:1718394013.781000    6517 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_2', 24 bytes spill stores, 24 bytes spill loads

I0000 00:00:1718394013.794912    6502 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_3753', 52 bytes spill stores, 52 bytes spill loads

I0000 00:00:1718394013.866065    6524 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_3753', 220 bytes spill stores, 220 bytes spill loads

I0000 00:00:1718394014.030538    6506 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_2', 472 bytes spill stores, 304 bytes spill loads

This leads to the slow-execution warnings below; the script continues without falling over, but it runs very slowly. Could anyone suggest some pointers to solve this? Many thanks.


2024-06-14 19:40:16.455757: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng0{} for conv (f32[719,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:18.707113: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.251487411s
Trying algorithm eng0{} for conv (f32[719,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:20.156609: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng4{} for conv (f32[719,160,500,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,160,755,1]{3,2,1,0}, f32[160,1,256,1]{3,2,1,0}), window={size=256x1}, dim_labels=bf01_oi01->bf01, feature_group_count=160, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:22.419095: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.262581059s
Trying algorithm eng4{} for conv (f32[719,160,500,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,160,755,1]{3,2,1,0}, f32[160,1,256,1]{3,2,1,0}), window={size=256x1}, dim_labels=bf01_oi01->bf01, feature_group_count=160, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:28.401848: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,32,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,32,249,79]{3,2,1,0}, f32[32,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=32, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:28.672951: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.271200136s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,32,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,32,249,79]{3,2,1,0}, f32[32,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=32, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:37.274820: W external/local_tsl/tsl/framework/bfc_allocator.cc:368] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2024-06-14 19:40:38.274947: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng3{k11=0} for conv (f32[719,96,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,16,249,79]{3,2,1,0}, f32[96,16,1,1]{3,2,1,0}), window={size=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:38.595592: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.320788364s
Trying algorithm eng3{k11=0} for conv (f32[719,96,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,16,249,79]{3,2,1,0}, f32[96,16,1,1]{3,2,1,0}), window={size=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:41.298653: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:41.485093: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.186534591s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:42.485235: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng4{} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:42.517285: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.032144481s
Trying algorithm eng4{} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:47.752097: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:48.225649: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.473644391s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
  0%|          | 6/21119 [01:10<84:44:35, 14.45s/it] W0000 00:00:1718394066.767582    4752 assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert

Pip install is unpredictable and often breaks Colab usage

I discovered that pip install doesn't actually make use of the poetry lock file, and essentially makes up the dependency tree on the fly from the pyproject.toml file. This means it's pretty easy to get into a weird state when we do the Colab pip install: the lock file gives us a specific, CI-tested combination of dependency versions, but we don't have any real way to test what's going on with the pip-installed version.

Ideally, we should have pip install the exact set of dependency versions specified in the lock file, to ensure that our CI testing actually tells us whether the Colab notebooks are working. One possible workaround is sketched below.

There's some pretty extensive discussion here of the problem:
python-poetry/poetry#2778 (comment)
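One possible workaround, assuming a recent Poetry with the poetry-plugin-export plugin available: freeze the lock file into a pinned requirements.txt and have the Colab notebooks pip-install that, so pip reproduces the exact combination that CI tests.

# Export the locked dependency set to a pip-compatible file
poetry export -f requirements.txt --output requirements.txt --without-hashes

# Then, in Colab:
pip install -r requirements.txt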

Very short audio snippets fail to display

Sometimes we get embeddings for very short audio clips, which then cause an error when we try to compute the melspec during audio display.

These should be filtered out during embedding anyway; e.g., the last 12 samples of a one-minute-plus-epsilon file.
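A minimal sketch of the kind of guard this implies; kernel_size here is an assumed frontend analysis-window length (cf. the MelSpectrogram parameters further down this page):

import numpy as np

def long_enough_to_display(audio: np.ndarray, kernel_size: int = 1024) -> bool:
    # A melspec needs at least one full analysis window of samples; e.g. a
    # 12-sample leftover chunk fails this check and should be skipped.
    return audio.shape[-1] >= kernel_size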

Cannot install package: resolution impossible because optax has a new version

I came in this morning and can no longer install the perch package. I tracked it down to this commit in a transitive dependency: google-deepmind/optax@beae523

In short:

  • chirp requires optax < 0.2.0
  • chirp requires scenic
  • scenic requires the latest version of optax, which is 0.2.0.dev, conflicting with chirp's own requirement.

scenic doesn't have tagged releases that perch can pin to, so resolving this will probably require either the scenic authors constraining their own dependency on optax (unlikely?) or perch itself accepting optax 0.2.*.

$ pip install "chirp @ git+https://github.com/google-research/perch.git@77edeff5800be0cc1af81bf8c078f70a1ad82f79"
<snip a bunch of logs>
ERROR: Cannot install chirp and chirp==0.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    chirp 0.1.0 depends on optax<0.2.0 and >=0.1.7
    flax 0.7.4 depends on optax
    scenic 0.0.1 depends on optax 0.2.0.dev0 (from git+https://github.com/google-deepmind/optax.git@main)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

TF SavedModel to Jax conversion

Hello,

I am trying to get the original JAX model by converting the weights of the TF SavedModel; however, the embedding I am getting seems different. I wonder if there are some parameters I should configure when calling the EfficientNet model in JAX.

Here is my code:

import flax
import jax
import librosa
import numpy as np
import tensorflow_hub as hub
from tensorflow.python.trackable.data_structures import _DictWrapper

from chirp.models.efficientnet import EfficientNet, EfficientNetModel

# Load the model.
model = hub.load('https://www.kaggle.com/models/google/bird-vocalization-classifier/TensorFlow2/bird-vocalization-classifier/8')

waveform = librosa.load('....')[0]

out = model.infer_tf(waveform[np.newaxis, :])
spec = out['frontend']
embedding = out['embedding']

# jax model 
backbone = EfficientNet(model=EfficientNetModel(value="b1"))

# Initialize JAX RNGs.
rng = jax.random.PRNGKey(42)
rng, init_rng, dropout_key = jax.random.split(rng, 3)

# Initialize the model with the TF-computed spectrogram as input.
inputs = jax.numpy.asarray(np.expand_dims(spec, axis=-1))
params = backbone.init(init_rng, inputs, train=False)

# --- Transfer weights from the SavedModel into the Flax variable tree --- #
# Flax init returns a frozen dict; unfreeze it so we can assign into it.
params = flax.core.unfreeze(params)

for k1, v1 in model._structured_variables['params']['encoder'].items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            if isinstance(v3, _DictWrapper):
                for k4, v4 in v3.items():
                    params['params'][k1][k2][k3][k4] = jax.numpy.asarray(v4)
            else:
                params['params'][k1][k2][k3] = jax.numpy.asarray(v3)

for k1, v1 in model._structured_variables['batch_stats']['encoder'].items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            if isinstance(v3, _DictWrapper):
                for k4, v4 in v3.items():
                    params['batch_stats'][k1][k2][k3][k4] = jax.numpy.asarray(v4)
            else:
                params['batch_stats'][k1][k2][k3] = jax.numpy.asarray(v3)

# inference with jax 
embedding_jax = backbone.apply(params, inputs, train=False, use_running_average=True, rngs={'dropout': dropout_key})

Is there anything to configure, such as the activation function, head, or stem, when calling the EfficientNet model?

embedding and embedding_jax are very different.

Spectrogram reproduction

Hello,

I am trying to reproduce the spectrogram output (frontend) available within this model: https://www.kaggle.com/models/google/bird-vocalization-classifier/TensorFlow2/bird-vocalization-classifier

So far I have used the MelSpectrogram class from frontend.py and tried adding the normalized_audio function as well, but the results I get are different.

mel = MelSpectrogram(sample_rate=32000, freq_range=(60,10000), kernel_size=1024, features=160, stride=320)

Am I using the right function?
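For reference, here is a sketch of how I am applying it; I am assuming the class lives in chirp.models.frontend and behaves as a Flax module (both assumptions on my part):

import jax
import numpy as np
from chirp.models import frontend

mel = frontend.MelSpectrogram(
    sample_rate=32000, freq_range=(60, 10000),
    kernel_size=1024, features=160, stride=320)

audio = np.zeros([1, 160000], dtype=np.float32)  # 5 s of mono 32 kHz audio
variables = mel.init(jax.random.PRNGKey(0), audio)
spec = mel.apply(variables, audio)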

Memory leak

I have 48,391 audio files totalling 80GB in size.

Ubuntu with Python 3.10.12, 126 GB memory, 8 GB swap, and a 32-core CPU.

When I run the notebook agile_modeling.ipynb multiple times, memory usage always reaches maximum capacity. I suspect these lines of code in the notebook:

audio_iterator = audio_utils.multi_load_audio_window(
    filepaths=[s.filepath for s in new_source_infos],
    offsets=[s.shard_num * s.shard_len_s for s in new_source_infos],
    sample_rate=config.embed_fn_config.model_config.sample_rate,
    window_size_s=config.get('shard_len_s', -1.0),
)
succ, fail = 0, 0  # success/failure counters used below
with tf_examples.EmbeddingsTFRecordMultiWriter(
    output_dir=output_dir, num_files=config.get('tf_record_shards', 1)) as file_writer:
  for source_info, audio in tqdm.tqdm(
      zip(new_source_infos, audio_iterator), total=len(new_source_infos)):
    file_id = source_info.file_id(config.embed_fn_config.file_id_depth)
    offset_s = source_info.shard_num * source_info.shard_len_s
    example = embed_fn.audio_to_example(file_id, offset_s, audio)
    if example is None:
      fail += 1
      continue
    file_writer.write(example.SerializeToString())
    succ += 1
  file_writer.flush()
print(f'\n\nSuccessfully processed {succ} source_infos, failed {fail} times.')

After changing audio_utils.multi_load_audio_window's max_workers argument to 1, the problem still occurs.

After further debugging, the line below is causing the memory issue.

example = embed_fn.audio_to_example(file_id, offset_s, audio)

Could you please take a look at the code and suggest some ways to optimize it to reduce memory usage?
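In case it is useful context for a fix, here is the chunking workaround I am experimenting with (a sketch under my own assumptions, not a verified fix; CHUNK is an arbitrary size I picked):

import gc

CHUNK = 1024  # arbitrary number of source_infos per pass
for start in range(0, len(new_source_infos), CHUNK):
    chunk_infos = new_source_infos[start:start + CHUNK]
    # ... run the embedding loop above on chunk_infos only ...
    gc.collect()  # encourage release of audio buffers between chunks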

More compact display of labels in Agile Modeling workflow

When there are more than a few labels, the vertical label display can get annoying. We should explore packing more buttons into less space. This probably involves creating some more formatted HTML and displaying with the IPython.display module, though I'm not sure how that works with the individual ipywidget buttons.
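A rough sketch of the packing idea using plain ipywidgets (untested; a flex row-wrap Box instead of formatted HTML):

import ipywidgets as widgets
from IPython.display import display

labels = ['amerob', 'comrav', 'unknown']  # placeholder label set
buttons = [
    widgets.Button(description=label, layout=widgets.Layout(width='auto'))
    for label in labels
]
# A flex row-wrap layout packs buttons horizontally and wraps as needed.
box = widgets.Box(
    buttons, layout=widgets.Layout(display='flex', flex_flow='row wrap'))
display(box)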

Bird Vocalization Classifier fails to load from tensorflow hub

Running hub.load('https://tfhub.dev/google/bird-vocalization-classifier/3') worked for me until today. Now, I receive the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 3
      1 # Run the model, check the output.
      2 # waveform: 5 seconds of audio signal as mono 32 kHz waveform samples.
----> 3 model = hub.load('https://tfhub.dev/google/bird-vocalization-classifier/3')
      5 logits, embeddings = model.infer_tf(np.zeros([1,160000])) #succeeds

File ~/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow_hub/module_v2.py:107, in load(handle, tags, options)
    102 saved_model_pbtxt_path = os.path.join(
    103     tf.compat.as_bytes(module_path),
    104     tf.compat.as_bytes(tf.saved_model.SAVED_MODEL_FILENAME_PBTXT))
    105 if (not tf.io.gfile.exists(saved_model_path) and
    106     not tf.io.gfile.exists(saved_model_pbtxt_path)):
--> 107   raise ValueError("Trying to load a model of incompatible/unknown type. "
    108                    "'%s' contains neither '%s' nor '%s'." %
    109                    (module_path, tf.saved_model.SAVED_MODEL_FILENAME_PB,
    110                     tf.saved_model.SAVED_MODEL_FILENAME_PBTXT))
    112 if options:
    113   if not hasattr(getattr(tf, "saved_model", None), "LoadOptions"):

ValueError: Trying to load a model of incompatible/unknown type. '/var/folders/d8/265wdp1n0bn_r85dh3pp95fh0000gq/T/tfhub_modules/3c59b9f74a43d0124967f39277c8a407b5ae7011' contains neither 'saved_model.pb' nor 'saved_model.pbtxt'.

Better detection and signaling for corrupt audio files

Mass embedding commonly fails when it encounters corrupt audio files. We should find better ways to detect and fail descriptively when we encounter corrupt files.

Historically, this has been due to Soundfile failing to surface errors appropriately...
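A sketch of the kind of descriptive failure we want, assuming Soundfile is the decoding backend (per the note above; checked_read is a hypothetical helper):

import soundfile as sf

def checked_read(path: str):
    try:
        audio, sample_rate = sf.read(path)
    except Exception as e:  # Soundfile has historically raised opaque errors.
        raise RuntimeError(f'Corrupt or unreadable audio file: {path}') from e
    if audio.size == 0:
        raise RuntimeError(f'Decoded zero samples from: {path}')
    return audio, sample_rate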

Need more descriptive README

  • Someone new landing on this repository won't be able to understand what the project is about; the About section could be updated with a descriptive explanation of how the project works.
  • Adding a short overview of the project would help newcomers get a lot more out of it!

pyproject.toml not updated?

Hey guys,

I followed the instructions to create the environment necessary to use the repo, but after all the installations, the unit tests keep failing to find some of the necessary modules. I'm guessing either the pyproject.toml or the unit tests are out of date? I installed a bunch of modules manually (list below) but got stuck after a while:

  • tensorflow_datasets
  • tensorflow_io
  • aqt
  • pyqt5
  • pyqt6
  • PyQt6.QtWebEngineCore (could not get past this one)
  • pydub
  • scenic
  • the lib libegl1

I appreciate any guidance regarding the installation.

One of the errors I got stuck on in the unit tests:

======================================================================
ERROR: train_test (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: train_test
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/opt/conda/lib/python3.10/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/home/nnbuainain/perch/chirp/tests/train_test.py", line 25, in <module>
    from chirp.configs import config_globals
  File "/home/nnbuainain/perch/chirp/configs/config_globals.py", line 24, in <module>
    from chirp.models import efficientnet
  File "/home/nnbuainain/perch/chirp/models/efficientnet.py", line 25, in <module>
    from aqt.jax.v2 import aqt_conv_general
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/__init__.py", line 56, in <module>
    from aqt import gui_hooks
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/gui_hooks.py", line 11, in <module>
    from _aqt.hooks import *
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/_aqt/hooks.py", line 18, in <module>
    from aqt.qt import QDialog, QEvent, QMenu, QModelIndex, QWidget, QMimeData
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/qt/__init__.py", line 20, in <module>
    from .qt6 import *
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/qt/qt6.py", line 19, in <module>
    from PyQt6.QtWebEngineCore import *
ModuleNotFoundError: No module named 'PyQt6.QtWebEngineCore'

I already have pyqt6 installed.

Make it easier to save/load linear classifier models

We have some code for wrangling small classifiers, but it's not exposed to users of the Agile Modeling workflow.

The LogitsOutputHead definition is here:
https://github.com/google-research/perch/blob/main/chirp/inference/interface.py#L166
and has 'save_model' and 'from_config_file' methods which should be useful.

There's some example of usage here:
https://github.com/google-research/perch/blob/main/chirp/inference/embed_lib.py#L228
The EmbedFn can pick up a LogitsOutputHead and attach the extra logits to embeddings. This was useful for doing speech+empty filtering on the A2O data, for example.
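A rough usage sketch based on the methods named above; the constructor and save_model arguments here are guesses from reading the interface, not verified:

from chirp.inference import interface

# Assumed fields: a logits key, the trained small classifier, a class list.
output_head = interface.LogitsOutputHead(
    model_path='/tmp/linear_classifier',  # hypothetical path
    logits_key='custom_logits',
    logits_model=my_trained_classifier,   # e.g. a small Keras model
    class_list=my_class_list,
)
output_head.save_model('/tmp/linear_classifier', '')  # second arg assumed

# Later: restore from the written config.
restored = interface.LogitsOutputHead.from_config_file('/tmp/linear_classifier')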
