harvard-edge / multilingual_kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
24 of the English keyword directories in the 16 kHz split contain no wav files:

import os
from pathlib import Path

import tqdm

uhohs = []
mswc_16khz = Path("/media/mark/hyperion/mswc/16khz_wav/en/clips")
keywords = list(sorted(os.listdir(mswc_16khz)))
print(len(keywords))
for keyword in tqdm.tqdm(keywords):
    keyword_samples = list(sorted((mswc_16khz / keyword).glob("*.wav")))
    if len(keyword_samples) == 0:  # no samples at all for this keyword
        uhohs.append(keyword)
print(len(uhohs))
>>> 24
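Until the archives are fixed, a possible downstream workaround (a sketch; silently skipping empty keyword directories is my assumption, not a maintainer recommendation):

# build the keyword list while skipping directories with no wav files
keywords = [
    k for k in sorted(os.listdir(mswc_16khz))
    if any((mswc_16khz / k).glob("*.wav"))
]
print(len(keywords))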
Thanks!
First of all, thanks a lot for this work! It is incredibly useful for training strong models for keyword spotting.
I would like to train with the same data as mentioned in the paper: where can I download this data, or where can I download a list of the files used for training/validation/testing, etc. (see the image below for the dataset I am looking for)?
For example, the newest version of the dataset on mlcommons.org now has more than 340k keywords.
So is there such an overview somewhere? I couldn't find it on the mlcommons website or in this repo, but maybe I missed it somewhere.
Impacts certain languages more heavily than others (French, Kinyarwanda, ...)
Add a version.txt containing version 1.0 for all .tar.gz files.
Similar to Common Voice or Speech Commands.
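A minimal sketch of one way to do this when building the archives (the helper name is hypothetical, and writing version.txt at the archive root is an assumption):

import io
import tarfile
import time

def build_archive_with_version(out_path, src_dir, version="1.0"):
    # create a .tar.gz of src_dir that also carries a version.txt at its root
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=".")
        payload = f"{version}\n".encode()
        info = tarfile.TarInfo(name="version.txt")
        info.size = len(payload)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(payload))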
Given two transcripts, 1. [hello is a common greeting] and 2. [she said, “hello”], without punctuation filtering we would treat [hello] and [“hello”] as separate words.
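For illustration, a minimal sketch of such filtering (the regex and function name are my own, not the repo's clean_and_filter):

import re

def strip_punctuation(word):
    # drop leading/trailing punctuation so [hello] and [“hello”] count as one word
    return re.sub(r"^\W+|\W+$", "", word)

words = 'she said, “hello”'.split()
print([strip_punctuation(w) for w in words])  # ['she', 'said', 'hello']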
Really great job on kicking off the wordcount feature Tejas! Excited to see you making progress so fast. Some suggestions on next steps:
- count words per .tsv file (after they have been normalized with clean_and_filter). Let me know if you have questions about this.
- use a __main__ guard (link)
- you don't need sys.argv since you're using argparse
- format the code with black (https://github.com/psf/black)
- capitalized names are fine for .ipynb files, but I think python files should be lowercased (eventually we will move several of these functions into a library)

Again, great job!! Let me know if you have any questions or if these suggestions don't make sense.
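For illustration, a minimal sketch of the __main__-guard-plus-argparse pattern these suggestions point at (the flag names and counting logic are hypothetical, not the actual wordcount script):

import argparse

def count_words(tsv_path):
    # hypothetical stand-in for the real wordcount logic
    counts = {}
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            for word in line.split("\t")[-1].split():
                counts[word] = counts.get(word, 0) + 1
    return counts

def main():
    # argparse reads sys.argv itself, so no manual sys.argv handling is needed
    parser = argparse.ArgumentParser(description="count words in a transcript TSV")
    parser.add_argument("tsv", help="path to the .tsv transcript file")
    args = parser.parse_args()
    print(len(count_words(args.tsv)))

if __name__ == "__main__":
    main()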
Most of our current alignments are for Common Voice 3/4, so re-running the alignments should create a lot more data.
Low priority as of now.
Hi, I get "ERROR: Cannot find key: --keyword" when I run:
docker run --gpus all -p 8080:8080 --rm -u $(id -u):$(id -g) -it \
-v $(pwd):/demo_data \
mkws \
--keyword mask \
--modelpath /demo_data/xfer_epochs_4_bs_64_nbs_2_val_acc_1.00_target_mask \
--groundtruth /demo_data/mask_groundtruth_labels.txt \
--wav /demo_data/mask_stream.wav \
--transcript /demo_data/mask_full_transcript.json \
--visualizer
Also, is there an example of the required files (groundtruth, transcript)?
Thanks!
In German, 'null' (zero) is being converted to NaN by pandas when it is the only word present in the transcript (due to single-word-target-segments data).
One option is to use na_filter=False when reading Common Voice TSVs (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html); however, we should also first check for truly missing values in the sentence transcription column.
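A minimal sketch of that option (the file name and the 'sentence' column follow Common Voice conventions; the empty-string check is one way to find truly missing values):

import pandas as pd

# na_filter=False keeps the literal string "null" instead of parsing it as NaN
df = pd.read_csv("validated.tsv", sep="\t", na_filter=False)

# with NaN parsing disabled, truly missing transcripts show up as empty strings
missing = df[df["sentence"].str.strip() == ""]
print(f"{len(missing)} rows have an empty sentence column")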
When running the intro tutorial notebook in the docker container for tensorflow/tensorflow:latest-gpu-jupyter, the umap library can't be installed because numba only works on numpy <= 1.20.
Installing umap in colab currently works but this might cause issues soon. We might want to move the umap visualization to a separate notebook.
Hi!
Very interesting work! I would like to know if it is possible to test this using the microphone stream as input.
both for mp3s and wav files
Hi!
After creating a conda environment using the provided environment.yml file, followed by additionally installing TensorFlow 2.9.0 as mentioned in the Dockerfile, I tried to run the Jupyter notebook's cells (put together in a main.py file). When calling the transfer_learning.transfer_learn function, I observed the following error:
File "main.py", line 152, in <module>
main()
File "main.py", line 104, in main
_, model, _ = transfer_learning.transfer_learn(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
return func(*args, **kwargs)
File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/transfer_learning.py", line 76, in transfer_learn
init_train_ds = audio_dataset.init_single_target(
File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 467, in init_single_target
waveform_ds = waveform_ds.map(self.augment, num_parallel_calls=AUTOTUNE)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1697, in map
return ParallelMapDataset(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4080, in __init__
self._map_func = StructuredFunctionWrapper(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
self._function = wrapper_fn.get_concrete_function()
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2938, in get_concrete_function
graph_function = self._get_concrete_function_garbage_collected(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3065, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
ret = _wrapper_helper(*args)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
return func(*args, **kwargs)
File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 290, in augment
self.random_timeshift(audio) if self.max_time_shift_samples > 0 else audio
File "/path/to/cioflanc/few_shot_kws/multilingual_kws/multilingual_kws/embedding/input_data.py", line 261, in random_timeshift
if time_shift_amount > 0:
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 877, in __bool__
self._disallow_bool_casting()
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 483, in _disallow_bool_casting
self._disallow_when_autograph_disabled(
File "/path/to/cioflanc/miniconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 467, in _disallow_when_autograph_disabled
raise errors.OperatorNotAllowedInGraphError(
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph is disabled in this function. Try decorating it directly with @tf.function.
Have you noticed this behaviour before? Do you have any suggestions?
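In case it helps: the trace shows a plain Python if on a tf.Tensor inside random_timeshift while AutoGraph is disabled. A common fix is to keep the branch in-graph with tf.cond; below is a minimal sketch (the function body is my reconstruction, not the repo's actual implementation):

import tensorflow as tf

def random_timeshift(audio, max_shift):
    # the shift is a tensor, so `if time_shift_amount > 0:` fails in graph mode;
    # tf.cond keeps both branches inside the graph instead of casting to bool
    shift = tf.random.uniform([], -max_shift, max_shift, dtype=tf.int32)
    return tf.cond(
        shift > 0,
        # shift right: drop the tail, zero-pad the front
        lambda: tf.concat([tf.zeros([shift], audio.dtype), audio[:-shift]], axis=0),
        # shift left (or no-op): drop the head, zero-pad the tail
        lambda: tf.concat([audio[-shift:], tf.zeros([-shift], audio.dtype)], axis=0),
    )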
I'm unable to train a working monolingual embedding model. Using the provided script (train_monolingual_embedding.py) with the top 165 English words yields the following results at the end of training:
loss: 0.7145 - accuracy: 0.7711 - val_loss: 7.6774 - val_accuracy: 0.0586
Based on the paper, I was expecting validation accuracy in the 70s. Is it dependent on choosing the "right" words?
Could you please post a tutorial or maybe some of the missing files (e.g. train_files.txt, val_files.txt, test_files.txt, commands.txt) for reproducing the embedding?
I also notice that the file references seem to be to Common Voice rather than MSWC. I'm using the English clips download from MSWC, which I'm assuming are the same. I've converted these to 16 kHz, 16-bit wav files using pydub, which I guess is ffmpeg under the hood.
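For reference, a conversion along those lines with pydub looks like the sketch below (the paths are placeholders, and the source format depends on the MSWC release):

from pathlib import Path
from pydub import AudioSegment

src = Path("mswc/en/clips/hello/example_clip.mp3")  # placeholder path
out = src.with_suffix(".wav")

audio = AudioSegment.from_file(src)  # pydub shells out to ffmpeg
audio = audio.set_frame_rate(16000)  # 16 kHz
audio = audio.set_sample_width(2)    # 16-bit samples
audio = audio.set_channels(1)        # mono
audio.export(out, format="wav")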
Is multilingual_context_73_0.8011 trained on the full MSWC dataset, or just on English and Spanish?
If so, potentially remove it for the challenge specification.
Hello,
I've read the article about the script, and it seems great! But I'm facing problems using it for the first time: what files are needed to build the embedding model? How can I input the keywords to search for?
Any help would be welcome.