ear-team / bambird

Unsupervised classification to improve the quality of a bird song recording dataset. https://doi.org/10.1016/j.ecoinf.2022.101952

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

bioacoustics cluster-analysis data-centric-ai dataset ecoacoustics labeling labeling-tool labelling-tool machine-learning soundscape

bambird's Issues

AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'? in scikit-maad

Calling bambird.query_xc raises the following error, which originates from line 224 of maad/util/xeno_canto.py in scikit-maad: "AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?"

'DataFrame' object has no attribute 'append'

The functions dataset.grab_audio_to_df() and segmentation.multicpu_extract_rois() call DataFrame.append. However, this method was deprecated in pandas 1.4 and removed in pandas 2.0. The package should use pd.concat instead to remain compatible with recent pandas installations.

Here is a working fix.

In dataset.grab_audio_to_df(), replace line 349 with:

df_line = pd.DataFrame({'fullfilename': file,
                        'filename'    : Path(file).parts[-1],
                        'categories'  : categories,
                        'id'          : iden},
                       index=[0])
df_dataset = pd.concat([df_dataset, df_line], ignore_index=True)

In segmentation.multicpu_extract_rois(), replace line 490 with:

df_rois = pd.concat([df_rois, df_rois_temp])

and line 496 with:

df_rois_sorted = pd.concat([df_rois_sorted, df_rois[df_rois["categories"] == categories].sort_index()])

Similar modifications also need to be applied to segmentation.py.
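The replacement pattern can be tried on its own, outside bambird. The sketch below uses made-up rows (the file paths and ids are placeholders, not data from the package) and shows both the drop-in pd.concat form from the fix above and an alternative that builds the dataframe in one call, which is usually cheaper than growing it row by row.

```python
import pandas as pd

# Toy stand-ins for the per-file rows built inside the loop
# (column names follow the fix above; paths and ids are placeholders)
rows = [
    {"fullfilename": "/data/sp_a/rec1.mp3", "filename": "rec1.mp3",
     "categories": "sp_a", "id": "1"},
    {"fullfilename": "/data/sp_b/rec2.mp3", "filename": "rec2.mp3",
     "categories": "sp_b", "id": "2"},
]

# Drop-in replacement for df.append(...): concatenate one-row dataframes
df_dataset = pd.DataFrame()
for row in rows:
    df_line = pd.DataFrame(row, index=[0])
    df_dataset = pd.concat([df_dataset, df_line], ignore_index=True)

# Alternative: collect the rows first and build the dataframe once
df_once = pd.DataFrame(rows)
```

Both forms end up with the same two rows and a fresh 0..n-1 index.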

add species names in df_features saved

It would be useful to add the species name to the saved df_features, so that the features can be re-processed later on.

Manual labeling

While testing the cool functions for segmenting sound units, I thought it could be nice to implement an additional function to ease the manual labeling of the sound units if we want to evaluate the clustering with cluster_eval.

Here is a code snippet that could maybe be useful for doing so.

import maad
import librosa
import bambird
import pandas as pd
import ipywidgets as widgets
import IPython.display as ipd
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter

class ManualLabeling:
    
    def __init__(self, df_cluster, params):

        self.df_cluster = df_cluster.copy()
        self.params = params['PARAMS_EXTRACT']
        self.manual_label = []        

        button1 = widgets.Button(description="Signal")
        button2 = widgets.Button(description="Noise")
        buttons = widgets.HBox(children=[button1, button2])
        button1.on_click(self.signal)
        button2.on_click(self.noise)

        self.all_widgets = widgets.HBox(children=[buttons, widgets.Output()])
        self.roi(0)
        
    def roi(self, i):
        # Positional lookup, so this also works when df_cluster
        # does not have a default 0..n-1 index
        row        = self.df_cluster.iloc[i]
        filename   = row['fullfilename_ts']
        tmin       = row['min_t']
        tmax       = row['max_t']
        fmin       = row['min_f']
        fmax       = row['max_f']
        sr         = self.params["SAMPLE_RATE"]
        n_fft      = self.params["NFFT"]
        
        print(f'Loading region of interest (ROI): {i + 1}/{len(self.df_cluster)}')
        
        # Load and filter the segmented signal
        sig, sr = librosa.load(filename, sr=sr)
        b, a = butter(5, [fmin, fmax], fs=sr, btype='band')       
        sig = lfilter(b, a, sig)

        # Compute spectrogram
        Sxx, tn, fn, ext = maad.sound.spectrogram(sig, sr,nperseg=n_fft, noverlap=n_fft // 2)
        X = maad.util.power2dB(Sxx, db_range=96) + 96
        
        ipd.clear_output(wait=False)
        
        # Display spectrogram
        fig, ax = plt.subplots(figsize=(4,2))
        maad.util.plot_spectrogram(X, log_scale=False, colorbar=False, ax=ax, now=False, extent=ext)
        ax.yaxis.set_label_position("right")
        ax.set_ylim([fmin, fmax])
        ax.yaxis.tick_right()
        plt.show()
        
        # Display audio unit and widgets
        ipd.display(ipd.Audio(sig, rate=sr))
        ipd.display(self.all_widgets)

    def _record(self, value):
        # Store the label, then either show the next ROI or, once every
        # ROI has been labeled, attach the labels to the dataframe
        self.manual_label.append(value)
        ipd.clear_output(wait=False)
        
        if len(self.manual_label) == len(self.df_cluster):
            self.df_cluster['manual_label'] = self.manual_label
            ipd.display(self.df_cluster)
        else:
            self.roi(len(self.manual_label))

    def signal(self, b):
        # Callback of the "Signal" button: label the current ROI as 1.0
        self._record(1.0)

    def noise(self, b):
        # Callback of the "Noise" button: label the current ROI as 0.0
        self._record(0.0)

Creating an instance with ManualLabeling(df_cluster, params) makes it possible to iterate over all the units in df_cluster. For each unit, the user clicks either the Signal or the Noise button, which appends 1.0 or 0.0 to the manual_label list.
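Once the manual_label column is filled, it can be compared with the automatic labels. The sketch below is not bambird's cluster_eval; it is a plain pandas comparison on made-up values, assuming that df_cluster carries an auto_label column with 1 for signal and 0 for noise.

```python
import pandas as pd

# Made-up labels: auto_label from the clustering, manual_label from the buttons
df = pd.DataFrame({
    "auto_label":   [1, 1, 0, 0, 1],
    "manual_label": [1.0, 0.0, 0.0, 0.0, 1.0],
})

auto_signal = df["auto_label"] == 1
manual_signal = df["manual_label"] == 1

tp = int((auto_signal & manual_signal).sum())    # signal in both
fp = int((auto_signal & ~manual_signal).sum())   # auto says signal, manual says noise
fn = int((~auto_signal & manual_signal).sum())   # auto says noise, manual says signal

# Guard against empty denominators
precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall = tp / (tp + fn) if (tp + fn) else float("nan")

print(f"precision={precision:.2f} recall={recall:.2f}")
```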

Problem with the `find_cluster` function

I have found a problem with the find_cluster function. When finding the clusters for each category separately, the function tests whether the category has at least 3 ROIs and, if not, sets every row of the dataframe to noise:

  if len(df_single_categories) < 3:
      df_cluster["cluster_number"] = -1 # noise
      df_cluster["auto_label"] = 0 # noise

This is problematic because it marks the whole dataframe as noise, including the categories that contain 3 or more ROIs. Lines 304 and 305 should therefore be replaced with the following lines of code:

  if len(df_single_categories) < 3:
      df_cluster.loc[df_cluster["categories"] == categories, "cluster_number"] = int(-1) # noise
      df_cluster.loc[df_cluster["categories"] == categories, "auto_label"] = int(0) # noise
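The effect of the category mask can be checked on a toy dataframe. This is a minimal sketch with made-up category names and labels, not the actual find_cluster code: only the category with fewer than 3 ROIs is set to noise, and the other one keeps its labels.

```python
import pandas as pd

# Toy dataframe: sp_a has 4 ROIs, sp_b has only 2
df_cluster = pd.DataFrame({
    "categories":     ["sp_a"] * 4 + ["sp_b"] * 2,
    "cluster_number": [0, 0, 1, 1, 0, 0],
    "auto_label":     [1, 1, 1, 1, 1, 1],
})

for categories in df_cluster["categories"].unique():
    mask = df_cluster["categories"] == categories
    df_single_categories = df_cluster[mask]

    if len(df_single_categories) < 3:
        # Restrict the assignment to the rows of the current category
        df_cluster.loc[mask, "cluster_number"] = -1  # noise
        df_cluster.loc[mask, "auto_label"] = 0       # noise

print(df_cluster)
```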

Extract ROIs audio format

The function multicpu_extract_rois can extract ROIs directly from a directory. However, this does not work when the audio files are in wav format, because the audio_format parameter of grab_audio_to_df defaults to mp3.

It would be cool to add an audio_format parameter to multicpu_extract_rois so that the user can set it to wav when the directory to process contains only wav files.
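To illustrate the idea, here is a minimal sketch of listing the files of a directory by extension. The helper list_audio_files is hypothetical and not part of bambird, and taking the category from the parent folder name is an assumption made for the example only.

```python
from pathlib import Path

import pandas as pd


def list_audio_files(directory, audio_format="mp3"):
    """Hypothetical helper (not part of bambird): list the files of
    `directory` whose extension matches `audio_format`."""
    suffix = "." + audio_format.lower().lstrip(".")
    files = sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )
    return pd.DataFrame({
        "fullfilename": [str(p) for p in files],
        "filename":     [p.name for p in files],
        # Assumption for this example: one sub-folder per category
        "categories":   [p.parent.name for p in files],
    })
```

With such a parameter exposed, calling list_audio_files(directory, audio_format="wav") would pick up the wav files while the mp3 default stays unchanged.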
