
multilingual_kws's People

Contributors

chooper1, colbybanbury, mmaz, morphine00, tejasprabhune, v0xnihili


multilingual_kws's Issues

Some empty directories in MSWC, or in the 16 kHz re-encode?

import os
from pathlib import Path

import tqdm

uhohs = []
mswc_16khz = Path("/media/mark/hyperion/mswc/16khz_wav/en/clips")
keywords = list(sorted(os.listdir(mswc_16khz)))
print(len(keywords))
for keyword in tqdm.tqdm(keywords):
    keyword_samples = list(sorted((mswc_16khz / keyword).glob("*.wav")))
    if len(keyword_samples) == 0:
        uhohs.append(keyword)
print(len(uhohs))
# >>> 24

Where can I find the keywords (760 total) and splits used in the paper?

First of all, thanks a lot for this work! It is incredibly useful for training strong models for keyword spotting.

I would like to train with the same data as mentioned in the paper: where can I download this data, or where can I download a list of the files used for training/validation/testing, etc.? (See the image below for the dataset I am looking for.)

[image: overview of the dataset used in the paper]

For example, the newest version of the dataset on mlcommons.org now has more than 340k keywords:

[image: current dataset listing on mlcommons.org]

So is there such an overview somewhere? I couldn't find it on the mlcommons website or in this repo, but maybe I missed it.

word counts

Really great job on kicking off the wordcount feature, Tejas! Excited to see you making progress so fast. Some suggestions on next steps:

  • It looks like the current script produces a csv of wordcounts for an input list of keywords. I think what we're looking for is, rather, a csv of wordcounts for all words present in the .tsv file (after they have been normalized with clean_and_filter). Let me know if you have questions about this.
  • Excellent to see type annotations! Can you also add docstrings please?
  • use the standard __main__ guard (see the sketch after this list)
    • I think you can omit sys.argv since you're using argparse
  • format with black (https://github.com/psf/black)
  • rename the file to snake_case (I have a bad habit of camelCasing .ipynb files, but I think Python files should be lowercase; eventually we will move several of these functions into a library)
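
For reference, here is a minimal sketch of that pattern; the flag name and the body of main() are hypothetical, not the actual wordcount script:

import argparse

def main() -> None:
    """Produce word counts for an input .tsv file (hypothetical body)."""
    parser = argparse.ArgumentParser(description="word counts")
    # argparse reads sys.argv[1:] by default, so no explicit sys.argv handling
    parser.add_argument("--tsv", required=True, help="path to the input .tsv")
    args = parser.parse_args()
    print(f"counting words in {args.tsv}")

if __name__ == "__main__":
    main()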

Again, great job!! Let me know if you have any questions or if these suggestions don't make sense.

ERROR: Cannot find key when running docker

Hi, I get ERROR: Cannot find key: --keyword
when I run:

docker run --gpus all -p 8080:8080 --rm -u $(id -u):$(id -g) -it \
   -v $(pwd):/demo_data \
   mkws \
   --keyword mask \
   --modelpath /demo_data/xfer_epochs_4_bs_64_nbs_2_val_acc_1.00_target_mask \
   --groundtruth /demo_data/mask_groundtruth_labels.txt \
   --wav /demo_data/mask_stream.wav \
   --transcript /demo_data/mask_full_transcript.json \
   --visualizer

Also, is there an example of the required files (groundtruth, transcript)?

Thanks!

Filter out NaNs from Common Voice TSVs, and distinguish intentional "nan" in a language's vocabulary

In German, 'null' (zero) is being converted to NaN by pandas when it is the only word present in the transcript (due to the single-word-target-segments data).

One option is to use na_filter=False when reading Common Voice TSVs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

However, we should also first check for truly missing values in the sentence transcription column.
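
A minimal sketch of that combination (the TSV path is hypothetical; "sentence" is the Common Voice transcript column):

import pandas as pd

# na_filter=False keeps the literal string "null" (German for zero) instead
# of letting pandas convert it to NaN
df = pd.read_csv("validated.tsv", sep="\t", dtype=str, na_filter=False)
# with na_filter off, truly missing transcripts appear as empty strings
truly_missing = df[df["sentence"].str.strip() == ""]
print(f"{len(truly_missing)} rows have an empty transcript")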

UMAP visualization transitive dependency on old numpy

When running the intro tutorial notebook in the Docker container for tensorflow/tensorflow:latest-gpu-jupyter, the umap library can't be installed because numba only works with numpy <= 1.20.

Installing umap in Colab currently works, but this might cause issues soon. We might want to move the umap visualization to a separate notebook.
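
One possible workaround (untested, and it may conflict with other packages in the image) is to pin numpy before installing umap-learn:

pip install "numpy<=1.20" umap-learn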

OperatorNotAllowedInGraphError during Transfer Learning

Hi!

After creating a conda environment from the provided environment.yml file and additionally installing TensorFlow 2.9.0 as mentioned in the Dockerfile, I tried to run the Jupyter notebook's cells (put together in a main.py file). When calling the transfer_learning.transfer_learn function, I observed the following error:

File "main.py", line 152, in <module>
    main()
  File "main.py", line 104, in main
    _, model, _ = transfer_learning.transfer_learn(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/transfer_learning.py", line 76, in transfer_learn
    init_train_ds = audio_dataset.init_single_target(
  File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 467, in init_single_target
    waveform_ds = waveform_ds.map(self.augment, num_parallel_calls=AUTOTUNE)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1697, in map
    return ParallelMapDataset(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4080, in __init__
    self._map_func = StructuredFunctionWrapper(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
    self._function = wrapper_fn.get_concrete_function()
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2938, in get_concrete_function
    graph_function = self._get_concrete_function_garbage_collected(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3065, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 290, in augment
    self.random_timeshift(audio) if self.max_time_shift_samples > 0 else audio
  File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 261, in random_timeshift
    if time_shift_amount > 0:
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 877, in __bool__
    self._disallow_bool_casting()
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 483, in _disallow_bool_casting
    self._disallow_when_autograph_disabled(
  File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 467, in _disallow_when_autograph_disabled
    raise errors.OperatorNotAllowedInGraphError(
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph is disabled in this function. Try decorating it directly with @tf.function.

Have you noticed this behaviour before? Do you have any suggestions?
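
For context, the traceback points at a Python `if time_shift_amount > 0:` evaluated on a tf.Tensor inside a Dataset.map, which cannot be traced when AutoGraph is disabled. A minimal graph-safe sketch of a random time shift (not the repo's actual augmentation code) replaces the branch with pure tensor ops:

import tensorflow as tf

def random_timeshift(audio: tf.Tensor, max_shift: int) -> tf.Tensor:
    # audio is a 1-D waveform; draw a shift in [-max_shift, max_shift]
    shift = tf.random.uniform([], -max_shift, max_shift + 1, dtype=tf.int32)
    length = tf.shape(audio)[0]
    # pad both sides and slice, so no Python `if` ever runs on a tensor
    padded = tf.pad(audio, [[max_shift, max_shift]])
    start = max_shift - shift
    return padded[start : start + length]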

Reproducing paper results

I'm unable to train a working monolingual embedding model. Using the provided script (train_monolingual_embedding.py) with the top 165 English words yields the following results at the end of training:
loss: 0.7145 - accuracy: 0.7711 - val_loss: 7.6774 - val_accuracy: 0.0586

Based on the paper, I was expecting validation accuracy somewhere in the 70s. Is it dependent on choosing the "right" words?

Could you please post a tutorial or maybe some of the missing files (e.g. train_files.txt, val_files.txt, test_files.txt, commands.txt) for reproducing the embedding?

I also notice that the file references seem to be to Common Voice rather than MSWC. I'm using the English clips download from MSWC, which I'm assuming is the same data. I've converted the clips to 16 kHz, 16-bit WAV files using pydub, which I guess is ffmpeg under the hood.
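
For reference, a minimal sketch of that conversion with pydub (filenames are hypothetical; MSWC distributes clips as .opus):

from pydub import AudioSegment

# pydub shells out to ffmpeg to decode the source clip
audio = AudioSegment.from_file("clip.opus")
# resample to 16 kHz, 16-bit (sample width of 2 bytes), mono
audio = audio.set_frame_rate(16000).set_sample_width(2).set_channels(1)
audio.export("clip.wav", format="wav")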

First time user

Hello,
I've read the article about this project, and it seems great!
But I'm facing problems using it for the first time: what files are needed to build the embedding model? How can I input the keywords to search for?
Any help would be welcome.
