
Perch

A bioacoustics research project.

Installation

We support installation on a generic Linux workstation. A GPU is recommended, especially when working with large datasets. The recipe below is the same one used by our continuous integration tests.

Some users have successfully used our repository with the Windows Subsystem for Linux, or with Docker in a cloud-based virtual machine. Anecdotally, installation on macOS is difficult.

You might need the following dependencies.

# Install Poetry for package management
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies for librosa
sudo apt-get install libsndfile1 ffmpeg

# Install all dependencies specified in the poetry configs
poetry install --with jaxtrain

Running poetry install installs all Perch dependencies into a new virtual environment, in which you can run the Perch code base. To run the tests, use:

poetry run python -m unittest discover -s chirp/tests -p "*test.py"
poetry run python -m unittest discover -s chirp/inference/tests -p "*test.py"

Lightweight Inference

Note that if you only need the Python notebooks for use with pre-trained models, you can install with lighter dependencies:

# Install inference-only dependencies specified in the poetry configs
poetry install

And check that the inference tests succeed:

poetry run python -m unittest discover -s chirp/inference/tests -p "*test.py"

Using a container

Alternatively, you can install and run this project using a container via Docker. To build a container using the tag perch, run:

git clone https://github.com/google-research/perch
cd perch
docker build . --tag perch

After building the container, to run the unit tests, use:

docker run --rm -t perch python -m unittest discover -s chirp/tests -p "*test.py"

BIRB benchmark

Data preparation

To build the BIRB evaluation data, after installing the chirp package, run the following command from the repository's root directory:

poetry run tfds build -i chirp.data.bird_taxonomy,chirp.data.soundscapes \
    soundscapes/{ssw,hawaii,coffee_farms,sierras_kahl,high_sierras,peru}_full_length \
    bird_taxonomy/{downstream_full_length,class_representatives_slice_peaked}

The process should take 36 to 48 hours to complete and use around 256 GiB of disk space.

Benchmark README

For details on setting up the benchmark and evaluation protocol, please refer to the benchmark readme. The evaluation codebase is in perch/chirp/eval.

This is not an officially supported Google product.

Contributors

agentmorris, bartvm, bringingjoy, cdh4696, chiamp, dependabot[bot], elenitriantafillou, hawkinsp, jeffgeoff4, laurenharrell, matt-har-vey, mboudiaf, owahltinez, rchen152, sdenton4, vdumoulin, yilei

perch's Issues

CUDA_ERROR_ILLEGAL_ADDRESS is thrown when running chirp test in CUDA environment

After cloning the repo and installing all required dependencies, all unit tests run correctly on CPU, as shown below.

poetry run python -m unittest discover -s chirp/tests -p "*test.py"

But CUDA_ERROR_ILLEGAL_ADDRESS is thrown when running the chirp tests in a CUDA environment. Here is my CUDA environment:

GPU: Nvidia RTX 3090
Ubuntu: 22.10
Driver version: 525.60.13
Python version: 3.10
cudatoolkit version: 11.8
cudnn version: 8.6
jax version: 0.4.1
jaxlib version: 0.4.1+cuda11.cudnn86
flax version: 0.6.3

The first issue I ran into was an OOM failure, so I followed the instructions on GPU memory allocation to use CPU-only TensorFlow.

tests/sep_train_test.py TrainSeparationTest.test_eval_one_step works fine.

PYTHONPATH=. python chirp/tests/sep_train_test.py TrainSeparationTest.test_eval_one_step

but tests/sep_train_test.py TrainSeparationTest.test_train_one_step consistently reports a CUDA_ERROR_ILLEGAL_ADDRESS error in my CUDA environment.

PYTHONPATH=. python chirp/tests/sep_train_test.py TrainSeparationTest.test_train_one_step

Error message:

I0111 16:41:57.470412 140120040671040 pipeline.py:742] Splitting batch across 1 devices, with local device count 1.
I0111 16:42:07.193832 140120040671040 utils.py:33] Checkpoint.restore_or_initialize() ...
I0111 16:42:07.193934 140120040671040 utils.py:33] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() ...
I0111 16:42:07.194855 140120040671040 checkpoint.py:508] /tmp/tmpl5pjxa1ptrain_dir-0 not in []
I0111 16:42:07.194904 140120040671040 utils.py:43] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() finished after 0.00s.
I0111 16:42:07.194927 140120040671040 checkpoint.py:346] Storing initial version.
I0111 16:42:07.194949 140120040671040 utils.py:33] Checkpoint.save() ...
I0111 16:42:07.195027 140120040671040 checkpoint.py:304] Storing next checkpoint '/tmp/tmpl5pjxa1ptrain_dir-0/ckpt-1'
I0111 16:42:07.219635 140120040671040 utils.py:43] Checkpoint.save() finished after 0.02s.
I0111 16:42:07.219720 140120040671040 utils.py:43] Checkpoint.restore_or_initialize() finished after 0.03s.
/home/ryanz/projects/ml/source-separation/chirp/chirp/models/metrics.py:188: FutureWarning: The sym_pos argument to solve() is deprecated and will be removed in a future JAX release. Use assume_a='pos' instead.
  return scipy.linalg.solve(
2023-01-11 16:42:13.886039: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1032] could not wait stream on event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2023-01-11 16:42:13.886061: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/stream.cc:1112] Error waiting for event in stream: error recording waiting for CUDA event on stream 0x562ab718b7a0; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2023-01-11 16:42:13.886070: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1159] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; host dst: 0x7f6d38002560; GPU src: 0x7f67441dfe00; size: 4=0x4
2023-01-11 16:42:13.886077: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/stream.cc:327] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2023-01-11 16:42:13.886094: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:614] unable to add host callback: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0111 16:42:13.888874 140076977440448 asynclib.py:139] Error in producer thread for AsyncWriter
Traceback (most recent call last):
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/asynclib.py", line 135, in trap_errors
    return fn(*args, **kwargs)
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/metric_writers/logging_writer.py", line 44, in write_scalars
    values = [
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/clu/metric_writers/logging_writer.py", line 45, in <listcomp>
    f"{k}={v:.6f}" if isinstance(v, float) else f"{k}={v}"
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/jax/_src/array.py", line 252, in __format__
    return format(self._value[()], format_spec)
  File "/home/ryanz/miniconda3/envs/new-tf/lib/python3.10/site-packages/jax/_src/array.py", line 487, in _value
    self._npy_value = np.asarray(self._arrays[0])  # type: ignore
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
I0111 16:42:13.892304 140120040671040 utils.py:33] Checkpoint.save() ...
E0111 16:42:13.892458 140076985833152 asynclib.py:139] Error in producer thread for AsyncWriter

I tried to trace the cause by commenting out lines in this function; it is the separator.train() call that causes the error.

  def test_train_one_step(self):
    config = self._get_test_config(use_small_encoder=True)
    ds, _ = self._get_test_dataset(
        'train',
        config,
    )
    model = separator.initialize_model(
        workdir=self.train_dir, **config.init_config)

    separator.train(
        *model, train_dataset=ds, logdir=self.train_dir, **config.train_config)
    ckpt = checkpoint.MultihostCheckpoint(self.train_dir)
    self.assertIsNotNone(ckpt.latest_checkpoint)

It could be something wrong with my CUDA environment, but test_eval_one_step works fine, as does other JAX code.

It would be great if the Chirp team could share their CUDA environment setup.

Thanks,

Ryan

Include the spectrogram in outputs

Oftentimes the best way to debug a bioacoustic classifier is to eyeball the mel spectrogram. Would it be possible to either document some sort of escape hatch that allows me access to the spectrogram that chirp calculates for a window of audio during inference, or just include it in the output dict alongside the logits and embeddings?

Most of the time I use the model from TFHub, which provides fewer access points for this than the chirp Python library.
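For context, the SavedModel-to-JAX snippet later on this page suggests that the Kaggle-hosted version 8 of the model already exposes the computed spectrogram under a 'frontend' output key. A minimal sketch, assuming the version you load has that key:

import numpy as np
import tensorflow_hub as hub

# Assumption: this Kaggle-hosted version returns an output dict with a
# 'frontend' key (as in the conversion snippet below); TFHub versions may not.
model = hub.load(
    'https://www.kaggle.com/models/google/bird-vocalization-classifier/'
    'TensorFlow2/bird-vocalization-classifier/8')

audio = np.zeros([1, 160000], dtype=np.float32)  # 5 s of mono 32 kHz audio
out = model.infer_tf(audio)
spectrogram = out['frontend']  # the mel spectrogram computed during inference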

Support wavs+csv labels for building small classifiers

We currently use a folder-of-folders format for labeled data. If we allow providing a directory of audio files and a CSV of labels, we will be able to support the following (a hypothetical CSV layout is sketched after the list):

a) Evaluation with published fully-annotated datasets,
b) Simpler import of people's pre-existing labeled data,
c) Multi-label audio examples,
d) 'Strong negative' labels (via an extra CSV column)
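A hypothetical layout for such a CSV, purely illustrative (the column names are made up):

filename,label,is_negative
site1/XC12345.wav,amerob,0
site1/XC12345.wav,comrav,0
site2/XC99999.wav,amerob,1

Here the repeated filename rows give a multi-label example, and is_negative=1 marks a 'strong negative' label via the extra column.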

Registers are spilled to local memory on calculating embeddings

I'm trying to find a solution to a register-spill issue. I am running Perch on a few hundred GB of audio. When I run perch/embed_audio.ipynb, I end up with a lot of spills into local memory, and it's not an issue I've had to trace before:

Environment:

  • Python 3.11.9
  • GCC 11.2.0
  • NVIDIA-SMI 535.161.08
  • Driver Version: 535.161.08
  • CUDA Version: 12.2
  • Tensorflow 2.16.1
  • OS: Ubuntu 22.04.4 LTS
  • VM: Azure Standard NC24ads A100 v4
  • RAM 220 GB
  • CPU 24x vCPU AMD EPYC™ 7V13 (Milan)
  • GPU A100 80GB PCIe GPU card

I0000 00:00:1718394013.781000    6517 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_2', 24 bytes spill stores, 24 bytes spill loads

I0000 00:00:1718394013.794912    6502 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_3753', 52 bytes spill stores, 52 bytes spill loads

I0000 00:00:1718394013.866065    6524 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_3753', 220 bytes spill stores, 220 bytes spill loads

I0000 00:00:1718394014.030538    6506 asm_compiler.cc:369] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_2', 472 bytes spill stores, 304 bytes spill loads

This leads to the slow-execution warnings below; the script continues without falling over, but it runs very slowly. Could anyone suggest some pointers to solve this? Many thanks.


2024-06-14 19:40:16.455757: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng0{} for conv (f32[719,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:18.707113: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.251487411s
Trying algorithm eng0{} for conv (f32[719,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:20.156609: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng4{} for conv (f32[719,160,500,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,160,755,1]{3,2,1,0}, f32[160,1,256,1]{3,2,1,0}), window={size=256x1}, dim_labels=bf01_oi01->bf01, feature_group_count=160, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:22.419095: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.262581059s
Trying algorithm eng4{} for conv (f32[719,160,500,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,160,755,1]{3,2,1,0}, f32[160,1,256,1]{3,2,1,0}), window={size=256x1}, dim_labels=bf01_oi01->bf01, feature_group_count=160, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:28.401848: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,32,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,32,249,79]{3,2,1,0}, f32[32,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=32, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:28.672951: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.271200136s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,32,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,32,249,79]{3,2,1,0}, f32[32,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=32, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:37.274820: W external/local_tsl/tsl/framework/bfc_allocator.cc:368] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2024-06-14 19:40:38.274947: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng3{k11=0} for conv (f32[719,96,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,16,249,79]{3,2,1,0}, f32[96,16,1,1]{3,2,1,0}), window={size=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:38.595592: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.320788364s
Trying algorithm eng3{k11=0} for conv (f32[719,96,249,79]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,16,249,79]{3,2,1,0}, f32[96,16,1,1]{3,2,1,0}), window={size=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:41.298653: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:41.485093: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.186534591s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:42.485235: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng4{} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:42.517285: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.032144481s
Trying algorithm eng4{} for conv (f32[719,96,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,96,249,79]{3,2,1,0}, f32[96,1,3,3]{3,2,1,0}), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=96, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:47.752097: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
2024-06-14 19:40:48.225649: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.473644391s
Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[719,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[719,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0}} is taking a while...
  0%|          | 6/21119 [01:10<84:44:35, 14.45s/it] W0000 00:00:1718394066.767582    4752 assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert

Pip install is unpredictable and often breaks Colab usage

I discovered that pip install doesn't actually make use of the poetry lock file, and essentially makes up the dependency tree on the fly from the pyproject.toml file. This means it's pretty easy to get into a weird state when we do the Colab pip install: the lock file gives us a specific, CI-tested combination of dependency versions, but we don't have any real way to test what's going on with the pip-installed version.

Ideally, we should have pip install the exact set of dependency versions specified in the lock file, to ensure that our CI testing actually tells us whether the Colab notebooks are working. One possible workaround is sketched below.

There's some pretty extensive discussion here of the problem:
python-poetry/poetry#2778 (comment)
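One possible workaround, assuming a recent Poetry with the poetry-plugin-export plugin available: freeze the lock file into a pinned requirements.txt and have the Colab notebooks pip-install that, so pip reproduces the exact combination that CI tests.

# Export the locked dependency set to a pip-compatible file
poetry export -f requirements.txt --output requirements.txt --without-hashes

# Then, in Colab:
pip install -r requirements.txt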

Very short audio snippets fail to display

Sometimes we get embeddings for very short audio clips, which then cause an error when we try to compute the melspec during audio display.

These should be filtered out during embedding anyway; e.g., the last 12 samples of a one-minute-plus-epsilon file.
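A minimal sketch of the kind of guard this implies; kernel_size here is an assumed frontend analysis-window length (cf. the MelSpectrogram parameters further down this page):

import numpy as np

def long_enough_to_display(audio: np.ndarray, kernel_size: int = 1024) -> bool:
    # A melspec needs at least one full analysis window of samples; e.g. a
    # 12-sample leftover chunk fails this check and should be skipped.
    return audio.shape[-1] >= kernel_size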

Cannot install package: resolution impossible because optax has a new version

I came in this morning and can no longer install the perch package. I tracked it down to this commit in a transitive dependency: google-deepmind/optax@beae523

In short:

  • chirp requires optax < 0.2.0
  • chirp requires scenic
  • scenic requires the latest version of optax, which is 0.2.0.dev, conflicting with chirp's own requirement.

scenic doesn't have tagged releases that perch can pin to, so resolving this will probably require either the scenic authors constraining their own dependency on optax (unlikely?) or perch itself accepting optax 0.2.*.

$ pip install "chirp @ git+https://github.com/google-research/perch.git@77edeff5800be0cc1af81bf8c078f70a1ad82f79"
<snip a bunch of logs>
ERROR: Cannot install chirp and chirp==0.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    chirp 0.1.0 depends on optax<0.2.0 and >=0.1.7
    flax 0.7.4 depends on optax
    scenic 0.0.1 depends on optax 0.2.0.dev0 (from git+https://github.com/google-deepmind/optax.git@main)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

TF SavedModel to Jax conversion

Hello,

I am trying to get the original JAX model by converting the weights of the TF SavedModel; however, the embedding I am getting seems different. I wonder if there are some parameters I should configure when calling the EfficientNet model in JAX.

Here is my code:

import flax
import jax
import librosa
import numpy as np
import tensorflow_hub as hub
from tensorflow.python.trackable.data_structures import _DictWrapper

from chirp.models.efficientnet import EfficientNet, EfficientNetModel

# Load the model.
model = hub.load('https://www.kaggle.com/models/google/bird-vocalization-classifier/TensorFlow2/bird-vocalization-classifier/8')

waveform = librosa.load('....')[0]

out = model.infer_tf(waveform[np.newaxis, :])
spec = out['frontend']
embedding = out['embedding']

# jax model 
backbone = EfficientNet(model=EfficientNetModel(value="b1"))

# Initialize JAX RNGs.
rng = jax.random.PRNGKey(42)
rng, init_rng, dropout_key = jax.random.split(rng, 3)

# Initialize the model with the TF-computed spectrogram as input.
inputs = jax.numpy.asarray(np.expand_dims(spec, axis=-1))
params = backbone.init(init_rng, inputs, train=False)

# --- Transfer weights from the SavedModel into the Flax variable tree --- #
# Flax init returns a frozen dict; unfreeze it so we can assign into it.
params = flax.core.unfreeze(params)

for k1, v1 in model._structured_variables['params']['encoder'].items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            if isinstance(v3, _DictWrapper):
                for k4, v4 in v3.items():
                    params['params'][k1][k2][k3][k4] = jax.numpy.asarray(v4)
            else:
                params['params'][k1][k2][k3] = jax.numpy.asarray(v3)

for k1, v1 in model._structured_variables['batch_stats']['encoder'].items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            if isinstance(v3, _DictWrapper):
                for k4, v4 in v3.items():
                    params['batch_stats'][k1][k2][k3][k4] = jax.numpy.asarray(v4)
            else:
                params['batch_stats'][k1][k2][k3] = jax.numpy.asarray(v3)

# inference with jax 
embedding_jax = backbone.apply(params, inputs, train=False, use_running_average=True, rngs={'dropout': dropout_key})

Is there anything to configure, such as the activation function, head, or stem, when calling the EfficientNet model?

embedding and embedding_jax are very different.

Spectrogram reproduction

Hello,

I am trying to reproduce the spectrogram output (frontend) available within this model: https://www.kaggle.com/models/google/bird-vocalization-classifier/TensorFlow2/bird-vocalization-classifier

So far I have used the MelSpectrogram class from frontend.py and tried adding the normalized_audio function as well, but the results I get are different.

mel = MelSpectrogram(sample_rate=32000, freq_range=(60,10000), kernel_size=1024, features=160, stride=320)

Am I using the right function?
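For reference, here is a sketch of how I am applying it; I am assuming the class lives in chirp.models.frontend and behaves as a Flax module (both assumptions on my part):

import jax
import numpy as np
from chirp.models import frontend

mel = frontend.MelSpectrogram(
    sample_rate=32000, freq_range=(60, 10000),
    kernel_size=1024, features=160, stride=320)

audio = np.zeros([1, 160000], dtype=np.float32)  # 5 s of mono 32 kHz audio
variables = mel.init(jax.random.PRNGKey(0), audio)
spec = mel.apply(variables, audio)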

Memory leak

I have 48,391 audio files totalling 80GB in size.

Ubuntu with Python 3.10.12, 126 GB memory, 8 GB swap, and a 32-core CPU.

When I run the notebook agile_modeling.ipynb multiple times, memory usage always reaches maximum capacity. I suspect these lines of code in the notebook:

audio_iterator = audio_utils.multi_load_audio_window(
    filepaths=[s.filepath for s in new_source_infos],
    offsets=[s.shard_num * s.shard_len_s for s in new_source_infos],
    sample_rate=config.embed_fn_config.model_config.sample_rate,
    window_size_s=config.get('shard_len_s', -1.0),
)
succ, fail = 0, 0  # success/failure counters used below
with tf_examples.EmbeddingsTFRecordMultiWriter(
    output_dir=output_dir, num_files=config.get('tf_record_shards', 1)) as file_writer:
  for source_info, audio in tqdm.tqdm(
      zip(new_source_infos, audio_iterator), total=len(new_source_infos)):
    file_id = source_info.file_id(config.embed_fn_config.file_id_depth)
    offset_s = source_info.shard_num * source_info.shard_len_s
    example = embed_fn.audio_to_example(file_id, offset_s, audio)
    if example is None:
      fail += 1
      continue
    file_writer.write(example.SerializeToString())
    succ += 1
  file_writer.flush()
print(f'\n\nSuccessfully processed {succ} source_infos, failed {fail} times.')

After changing audio_utils.multi_load_audio_window's max_workers argument to 1, the problem still occurs.

After further debugging, the line below is causing the memory issue.

example = embed_fn.audio_to_example(file_id, offset_s, audio)

Could you please take a look at the code and suggest some ways to optimize it to reduce memory usage?
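In case it is useful context for a fix, here is the chunking workaround I am experimenting with (a sketch under my own assumptions, not a verified fix; CHUNK is an arbitrary size I picked):

import gc

CHUNK = 1024  # arbitrary number of source_infos per pass
for start in range(0, len(new_source_infos), CHUNK):
    chunk_infos = new_source_infos[start:start + CHUNK]
    # ... run the embedding loop above on chunk_infos only ...
    gc.collect()  # encourage release of audio buffers between chunks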

More compact display of labels in Agile Modeling workflow

When there are more than a few labels, the vertical label display can get annoying. We should explore packing more buttons into less space. This probably involves creating some more formatted HTML and displaying with the IPython.display module, though I'm not sure how that works with the individual ipywidget buttons.
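A rough sketch of the packing idea using plain ipywidgets (untested; a flex row-wrap Box instead of formatted HTML):

import ipywidgets as widgets
from IPython.display import display

labels = ['amerob', 'comrav', 'unknown']  # placeholder label set
buttons = [
    widgets.Button(description=label, layout=widgets.Layout(width='auto'))
    for label in labels
]
# A flex row-wrap layout packs buttons horizontally and wraps as needed.
box = widgets.Box(
    buttons, layout=widgets.Layout(display='flex', flex_flow='row wrap'))
display(box)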

Bird Vocalization Classifier fails to load from tensorflow hub

Running hub.load('https://tfhub.dev/google/bird-vocalization-classifier/3') worked for me until today. Now, I receive the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 3
      1 # Run the model, check the output.
      2 # waveform: 5 seconds of audio signal as mono 32 kHz waveform samples.
----> 3 model = hub.load('https://tfhub.dev/google/bird-vocalization-classifier/3')
      5 logits, embeddings = model.infer_tf(np.zeros([1,160000])) #succeeds

File ~/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow_hub/module_v2.py:107, in load(handle, tags, options)
    102 saved_model_pbtxt_path = os.path.join(
    103     tf.compat.as_bytes(module_path),
    104     tf.compat.as_bytes(tf.saved_model.SAVED_MODEL_FILENAME_PBTXT))
    105 if (not tf.io.gfile.exists(saved_model_path) and
    106     not tf.io.gfile.exists(saved_model_pbtxt_path)):
--> 107   raise ValueError("Trying to load a model of incompatible/unknown type. "
    108                    "'%s' contains neither '%s' nor '%s'." %
    109                    (module_path, tf.saved_model.SAVED_MODEL_FILENAME_PB,
    110                     tf.saved_model.SAVED_MODEL_FILENAME_PBTXT))
    112 if options:
    113   if not hasattr(getattr(tf, "saved_model", None), "LoadOptions"):

ValueError: Trying to load a model of incompatible/unknown type. '/var/folders/d8/265wdp1n0bn_r85dh3pp95fh0000gq/T/tfhub_modules/3c59b9f74a43d0124967f39277c8a407b5ae7011' contains neither 'saved_model.pb' nor 'saved_model.pbtxt'.

Better detection and signaling for corrupt audio files

Mass embedding commonly fails when it encounters corrupt audio files. We should find better ways to detect and fail descriptively when we encounter corrupt files.

Historically, this has been due to Soundfile failing to surface errors appropriately...
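A sketch of the kind of descriptive failure we want, assuming Soundfile is the decoding backend (per the note above; checked_read is a hypothetical helper):

import soundfile as sf

def checked_read(path: str):
    try:
        audio, sample_rate = sf.read(path)
    except Exception as e:  # Soundfile has historically raised opaque errors.
        raise RuntimeError(f'Corrupt or unreadable audio file: {path}') from e
    if audio.size == 0:
        raise RuntimeError(f'Decoded zero samples from: {path}')
    return audio, sample_rate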

Need more descriptive README

  • Someone new landing on this repository won't be able to understand what the project is about; the About section could be updated with a descriptive explanation of how the project works.
  • Adding a short overview of the project would help newcomers get a lot more out of it!

pyproject.toml not updated?

Hey guys,

I followed the instructions to create the environment necessary to use the repo, but after all the installations, the unit tests keep failing to find some of the necessary modules. I'm guessing either the pyproject.toml or the unit tests are out of date? I installed a bunch of modules manually (list below) but got stuck after a while:

  • tensorflow_datasets
  • tensorflow_io
  • aqt
  • pyqt5
  • pyqt6
  • PyQt6.QtWebEngineCore (could not get past this one)
  • pydub
  • scenic
  • the lib libegl1

I appreciate any guidance regarding the installation.

One of the errors I got stuck on in the unit tests:

======================================================================
ERROR: train_test (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: train_test
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/opt/conda/lib/python3.10/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/home/nnbuainain/perch/chirp/tests/train_test.py", line 25, in <module>
    from chirp.configs import config_globals
  File "/home/nnbuainain/perch/chirp/configs/config_globals.py", line 24, in <module>
    from chirp.models import efficientnet
  File "/home/nnbuainain/perch/chirp/models/efficientnet.py", line 25, in <module>
    from aqt.jax.v2 import aqt_conv_general
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/__init__.py", line 56, in <module>
    from aqt import gui_hooks
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/gui_hooks.py", line 11, in <module>
    from _aqt.hooks import *
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/_aqt/hooks.py", line 18, in <module>
    from aqt.qt import QDialog, QEvent, QMenu, QModelIndex, QWidget, QMimeData
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/qt/__init__.py", line 20, in <module>
    from .qt6 import *
  File "/home/nnbuainain/.cache/pypoetry/virtualenvs/chirp-IjcdE_j--py3.10/lib/python3.10/site-packages/aqt/qt/qt6.py", line 19, in <module>
    from PyQt6.QtWebEngineCore import *
ModuleNotFoundError: No module named 'PyQt6.QtWebEngineCore'

I already have pyqt6 installed.

Make it easier to save/load linear classifier models

We have some code for wrangling small classifiers, but it's not exposed to users of the Agile Modeling workflow.

The LogitsOutputHead definition is here:
https://github.com/google-research/perch/blob/main/chirp/inference/interface.py#L166
and has 'save_model' and 'from_config_file' methods which should be useful.

There's some example of usage here:
https://github.com/google-research/perch/blob/main/chirp/inference/embed_lib.py#L228
The EmbedFn can pick up a LogitsOutputHead and attach the extra logits to embeddings. This was useful for doing speech+empty filtering on the A2O data, for example.
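A rough usage sketch based on the methods named above; the constructor and save_model arguments here are guesses from reading the interface, not verified:

from chirp.inference import interface

# Assumed fields: a logits key, the trained small classifier, a class list.
output_head = interface.LogitsOutputHead(
    model_path='/tmp/linear_classifier',  # hypothetical path
    logits_key='custom_logits',
    logits_model=my_trained_classifier,   # e.g. a small Keras model
    class_list=my_class_list,
)
output_head.save_model('/tmp/linear_classifier', '')  # second arg assumed

# Later: restore from the written config.
restored = interface.LogitsOutputHead.from_config_file('/tmp/linear_classifier')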
