spokenlanguage / platalea
Library for training visually-grounded models of spoken language understanding.
License: Apache License 2.0
Some changes done in relation to #37 have made the CI workflow fail due to the tox job. We should fix this, even if it means removing that job for now.
The last transformer run (crisp-bee-8 on wandb) ran on CPU instead of GPU on Carmine (which has 3 GPUs). Find out why it didn't detect the GPU anymore. Could be some configuration changed in the last reboot (which was around 1 November).
For consistency as we'll also have an audio dir. There is no plural 'audios' :-)
Some options for fine-tuning the schedule that may be promising:
Merge vector-quant and transformer branches before we start working on grounding in videos.
Maybe we should take this as an opportunity to use a uniform scheduler creation procedure. I see that different experiment functions use different code (e.g. `asr.experiment()` has very different rules than `basic.experiment()`). This should include:
- the choice of optimizer (`adadelta`, `adam`, ...)
- the choice of scheduler (`none`, `cyclic`, `noam`; where I would use `none` instead of `constant`, as adaptive optimizers don't use a constant learning rate even without a scheduler)
- renaming `constant_lr` -> `lr`
- a `scheduler.py` module to handle this, called from all `experiment()` functions (a rough sketch follows below)

Originally posted by @bhigy in #51 (comment)
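As a starting point, a hypothetical sketch of what such a shared factory could look like (the option names follow the list above; `lr_scheduler`, `min_lr`, `max_lr` and `warmup_steps` are made-up config fields for illustration, not existing platalea parameters):

```python
import torch

def create_scheduler(config, optimizer, steps_per_epoch):
    name = getattr(config, "lr_scheduler", "none")
    if name == "none":
        # Adaptive optimizers manage their own effective step size,
        # so "none" just keeps the base learning rate untouched.
        return torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)
    if name == "cyclic":
        return torch.optim.lr_scheduler.CyclicLR(
            optimizer, base_lr=config.min_lr, max_lr=config.max_lr,
            step_size_up=steps_per_epoch, cycle_momentum=False)
    if name == "noam":
        # Noam schedule: linear warmup followed by inverse-sqrt decay.
        warmup = config.warmup_steps
        return torch.optim.lr_scheduler.LambdaLR(
            optimizer,
            lambda step: warmup ** 0.5 * min((step + 1) ** -0.5,
                                             (step + 1) * warmup ** -1.5))
    raise ValueError(f"Unknown scheduler: {name}")
```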
Do every step for now; it may cost more, but we need fine-grained info for debugging the Transformer model.
When `platalea.config` is imported in a script, it catches the `-h` (help) parameter before the script's specific parameters are added. As an example, `python utils/evaluate_asr.py -h` doesn't show the script's parameters (`path`, `-b`), only the global ones (e.g. `data_root`, `meta`). Importing config after the script has parsed the parameters would result in the opposite behavior.
@egpbos, any idea how this could be handled better?
https://github.com/spokenlanguage/platalea/blob/510d5ef9270411dc971c19d988c30bed4c1c20d6/platalea/encoders.py#L443
For some inputs this returns negative numbers, which causes failures elsewhere and also logically makes no sense.
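For context, this is the generic Conv1d output-length rule and how it goes negative for very short inputs (the exact expression at the linked line may differ):

```python
import math

def conv1d_output_length(length, kernel_size, stride, padding=0, dilation=1):
    # Generic PyTorch Conv1d length rule; negative results mean the input is
    # shorter than the (dilated) kernel minus the padding.
    return math.floor((length + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1

print(conv1d_output_length(100, kernel_size=6, stride=2))  # 48
print(conv1d_output_length(3, kernel_size=6, stride=2))    # -1 for a very short input
```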
I tried to run the `basic_default` experiment and get the following error:
Traceback (most recent call last):
File "basic_default.py", line 49, in <module>
M.experiment(net, data, run_config)
File "/home/bjrhigy/dev/platalea/platalea/basic.py", line 125, in experiment
wandb_step_output["last_lr"] = scheduler.get_last_lr()[0]
AttributeError: 'LambdaLR' object has no attribute 'get_last_lr'
@egpbos and @cwmeijer, this seems related to the new code for wandb. Did you experience that?
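For reference, `get_last_lr()` was only added in PyTorch 1.4, so an older environment would trigger exactly this error. A small compatibility shim (hypothetical, not currently in platalea) could look like:

```python
def last_learning_rate(scheduler):
    # PyTorch >= 1.4 exposes get_last_lr(); older versions don't.
    if hasattr(scheduler, "get_last_lr"):
        return scheduler.get_last_lr()[0]
    # Fall back to reading the current value directly from the optimizer.
    return scheduler.optimizer.param_groups[0]["lr"]
```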
Default will be 0 (no regularization).
This will hopefully solve the problem mentioned in #30.
Currently, our CI test suite of running experiments is only a "smoke test", i.e. it checks only whether everything runs, but not if it runs correctly. We should add result value checks so that we can see when changes in code change model outcomes.
We want to easily share results with the team. Maybe we should request an academic license or maybe not. We need to find out.
The test suite as set up in #17 takes over 40 minutes. The major bottlenecks are:
For the experiments, would it be an idea if we run them with different hyperparameters so they train faster? If so, what parameters should we target?
If this is not enough, there are also a few possibilities within GitHub Actions to speed the whole test suite up:
Do a couple of runs with a constant learning rate, varied systematically between runs, and find out if we can find some learning rate that consistently reduces loss. After that, we can experiment with more complicated schemes.
Possible sources of inspiration for the visual part:
[1] https://arxiv.org/abs/2006.09199
[2] https://openaccess.thecvf.com/content_ICCV_2019/html/Miech_HowTo100M_Learning_a_Text-Video_Embedding_by_Watching_Hundred_Million_Narrated_ICCV_2019_paper.html
[3] https://openaccess.thecvf.com/content_CVPR_2020/html/Miech_End-to-End_Learning_of_Visual_Representations_From_Uncurated_Instructional_Videos_CVPR_2020_paper.html
We are trying to find out why the Transformer model is performing worse than expected, considering the better performance of the GRU model and of earlier Transformer-based models like ESPnet. In particular, we seem to be memory-bound on the GPU, whereas those other models achieve higher performance with less memory, which is puzzling.
We have started investigating the GRU and Transformer models with the `torchinfo` tool. Below are some reports.
We are using branch 67_torchinfo: https://github.com/spokenlanguage/platalea/tree/67_torchinfo
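A minimal sketch of how `torchinfo` is pointed at a model; the placeholder GRU and input shape below are illustrative, not the actual platalea encoders or batch shapes:

```python
import torch
from torchinfo import summary

# Stand-in for one of our encoders; the real reports use the platalea models.
model = torch.nn.GRU(input_size=39, hidden_size=256, num_layers=2, batch_first=True)
dummy_audio = torch.randn(8, 512, 39)  # (batch, frames, MFCC features) - assumed shape

# Prints parameter counts, per-layer output shapes and estimated memory usage.
print(summary(model, input_data=dummy_audio, depth=3))
```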
There was a hack at codecov. I don't think it really compromises anything important, but we should probably still regenerate the codecov token we use in this project to be sure.
When running e.g. basic-default, I now get the following warnings:
INFO:root:Loading data
/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.preprocessing.label module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.preprocessing. Anything that cannot be imported from sklearn.preprocessing is now part of the private API.
warnings.warn(message, FutureWarning)
/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/sklearn/base.py:313: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.21.3 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
To have something in between the tiny 1D and the large 8K datasets.
There's still one remaining explicit `.cuda()` call in decoders.py, in TextDecoder. It has a `use_cuda` boolean parameter and checks that before calling `.cuda()` on the model. Is the `use_cuda` parameter actually used? Or would it be preferable to remove the parameter and also switch to automatic device detection, as we're using everywhere else now?
In lunar-serenity-26, dark-serenity-27 and worthy-sunset-28, we notice that dropout may help slightly to get validation and training loss closer together, but at a cost to performance. Although it might be worth seeing how dropout models fare when trained for more epochs (only 32 in the mentioned runs), one other promising avenue may be to look into using deeper and wider models, as suggested here (in turn based on this article).
Compared to that paper, we are on the low end of both width (we used d_model = 256 in the above runs) and depth (4 now).
Glancing over their paper, it seems they are getting diminishing returns after d_model = 768 and depth = 12.
Their setting is vastly different: they use multi-million-word text datasets and hence also completely different tasks.
Still, we should look into scaling up these parameters and seeing what it does.
In run dainty-dawn-20 we saw the validation loss increasing again starting from epoch 20, approximately. We should find a way to train that reduces validation loss together with training loss.
We need to adapt `platalea/utils/preprocessing.py` to work with videos from HowTo100M.
What does this software do? Why would anyone want to use it? Why not just use keras/pytorch instead?
In case anyone stumbles upon this repo, what could make this person enthusiastic about it before browsing away?
If we want many users, a logo would also help.
Define actions still required by the software sustainability plan and implement them.
The `soundfile` package is in requirements.txt, but not in setup.py, so it won't be installed by pip install. We should add it there, since it was added to preprocessing in #11.
Related: do we need to keep requirements.txt for something specific? We had a recent discussion about this at the eScience Center: NLeSC/guide#156. Our conclusion, based on a survey of common practices, was that requirements.txt is often superfluous, except for instance when you use it for specifying package repositories/sources. I see that in this case that is actually done for `ursa`, but I'm not sure whether that dependency is actually used at all. It was originally used in the analysis subdirectory, but that was removed from this repo at some point.
Do you agree that we can remove requirements.txt?
When doing `pip install .`, currently the file `label_encoders.pkl` does not get installed, despite it being in the MANIFEST.in file.
I also tried putting it in setup.py, with `package_data={'platalea': ['platalea/label_encoders.pkl']}`, but that doesn't work either.
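A possible fix worth trying (untested here): setuptools interprets `package_data` paths relative to the package directory, so the `platalea/` prefix may be the culprit, and `include_package_data=True` may be needed for MANIFEST.in entries to be picked up. A sketch:

```python
from setuptools import setup, find_packages

setup(
    name="platalea",
    packages=find_packages(),
    include_package_data=True,  # pick up files listed in MANIFEST.in
    # Path is relative to the platalea package directory, not the repo root.
    package_data={"platalea": ["label_encoders.pkl"]},
)
```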
As mentioned here, we can now calculate scoring on CPU so that we have more memory available for training on the GPU. A similar option for the validation step might be useful; it also takes about 2GB extra, so getting rid of that would allow us to further increase model size during training on a single GPU.
The title says "try" because we'll have to see whether it slows down training too much.
While trying to fix warnings, I stumbled upon a new warning that only comes up since pytorch 1.8: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
The warning is triggered in the basic and transformer tests:
platalea/basic.py:39: in cost
speech_enc, image_enc = self.forward(item['audio'], item['audio_len'], item['image'])
platalea/basic.py:34: in forward
speech_enc = self.SpeechEncoder(audio, audio_len)
../../../sw/miniconda3/envs/platalea/lib/python3.8/site-packages/torch/nn/modules/module.py:914: in _call_impl
self._maybe_warn_non_full_backward_hook(input, result, grad_fn)
(Note: I added the forward function in SpeechImage as a test, this call to SpeechEncoder is the actual troublemaker, and it was in SpeechImage.cost() before.)
I frankly have no idea what a backward hook is, let alone a non-full one. Anyone have any clue?
My best guess is that it has something to do with the multiple inputs and multiple outputs and some kind of missing element regarding autograd somewhere. I tried to search the other models/experiments (asr, mtl) for hints in this direction, but couldn't really find anything.
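For what it's worth, my current understanding (a guess, not verified against our stack): something, either our own code or a library such as wandb, registers a module-level backward hook on the encoder via the old `register_backward_hook()` API, which can miss gradients when a module's forward pass spans multiple autograd nodes; PyTorch 1.8 added `register_full_backward_hook()` as the replacement and started warning about the old one. A standalone illustration of the two APIs:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())

def hook(module, grad_input, grad_output):
    # grad_output is a tuple with one gradient per module output.
    print("grad_output norm:", grad_output[0].norm().item())

net.register_full_backward_hook(hook)   # new API, PyTorch >= 1.8
# net.register_backward_hook(hook)      # old API, triggers the deprecation warning

loss = net(torch.randn(2, 4)).sum()
loss.backward()                          # the hook fires during the backward pass
```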
Unfortunately, it seems like something has not gone right in #17 and hence in #4. I misread that the coverage went up to 80%, but it actually went down very slightly. Maybe not all the coverage reports are properly merging, maybe not all files are properly tracked (the experiment files for instance seem to be missing in the report), maybe something else is going on, but it seems very strange that coverage would go down when we now run all these experiments. Something to look into.
Currently, for the transformer experiment, we can only activate dropout on the transformer layers, not on the CNN and other layers. It may make sense to do this, so that we can counter overfitting using those layers as well.
We need to adapt `platalea/dataset.py` to be able to train a model with HowTo100M.
Update pillow to version 7.1.0 and get rid of dependabot alert.
I was working on fixing CI some weeks ago. For that purpose, I had set up interactive SSH debugging with `tmate`. Continue this.
Whenever I try to run an experiment I get the following prompt:
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice:
By default, I would prefer if the user doesn't have to do anything. W&B could be inactive or use choice (3) above.
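One possible way around the prompt is to set `WANDB_MODE` before wandb is initialized (the accepted values have changed between wandb versions, e.g. "dryrun" in older releases and "offline"/"disabled" in newer ones), for example:

```python
import os

# Runs are stored locally and no login is requested.
os.environ.setdefault("WANDB_MODE", "offline")

import wandb
wandb.init(project="platalea")
```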
By doing so, we could use a smaller kernel and stride while keeping the same size or smaller input for the transformer layers.
We're getting different results with the same configuration settings, including seed, so possibly we are not setting all seeds. Check which ones they are. I suspect `torch.cuda.seed()` may be one.
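For reference, the usual checklist of seeds to set in a PyTorch project looks roughly like this (whether platalea already sets all of them is exactly what needs checking):

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # also seeds the CUDA RNGs in recent PyTorch
    torch.cuda.manual_seed_all(seed)   # explicit, for older versions / multi-GPU
    # cuDNN can still introduce non-determinism unless these are set:
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```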
On CI, the ASR experiment saves a file in a directory so that a subsequent test can reuse it. We should just save this file to the repo. This will allow for parallel running of the tests, and we can also check the result of the ASR test against the existing saved output network.
We should take a look at the coverage reports (now that they are fixed in #19) and figure out why some parts of the code are not covered by the experiment runs on CI. It seems for instance that in `encoders.py` there are a lot of unused models. Are they still used in some other dependent package or can we remove them?
Related to #41.
I would add:
We need to check that this is not going to take too much time, but it would cover the main experiments and architectures.
Currently, our custom config class does not fail when it encounters unknown parameters. This is mostly a legacy from the past, when we were using multiple parsing moments. Currently, the `parse_unknown_args` function is still used to do some complicated help-print delaying. I don't think we still need that either. So let's just switch to `parse_args`, so that the parser automatically checks for faulty arguments (see the illustration below).
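Standalone argparse illustration of the difference (our config class builds on argparse-style parsing, so the behavior should carry over):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--data_root")

# parse_known_args() tolerates unknown options and returns them separately...
args, unknown = parser.parse_known_args(["--data_root", "/data", "--typo_flag", "1"])
print(unknown)  # ['--typo_flag', '1'] - silently tolerated

# ...whereas parse_args() rejects them outright:
# parser.parse_args(["--data_root", "/data", "--typo_flag", "1"])
# -> error: unrecognized arguments: --typo_flag 1 (exits with status 2)
```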
The MFCC conversion routines could be checked against the torchaudio.transforms equivalent to see whether they match (possibly after tweaking parameters).
If they don't match, that might be interesting, because it could (a) hint at errors in either our or their implementation, or (b) mean that we have some interesting alternative algorithm to contribute (mentioned in point 3 of #41).
If they do match, that means we could consider replacing our audio folder altogether with the torchaudio implementation. This might be useful if we would want to switch to other available transforms in the torchaudio package in the future.
It could be that all this is not worth the effort, though. @bhigy what do you think?
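For the comparison, the torchaudio side would look roughly like this (parameter values are placeholders and would need to be tweaked to match our audio code):

```python
import torch
import torchaudio

mfcc = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 40},
)
waveform = torch.randn(1, 16000)  # one second of fake audio
features = mfcc(waveform)         # shape: (channel, n_mfcc, frames)
print(features.shape)
```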
See #6 (comment)
We want to log more stuff, like number of layers, nodes per layer, etc.
Found this package, version-query, which may be useful, also for wandb logging.
I just noticed some additions in `basic.py` which rely on two config variables, `validate_on_cpu` and `score_on_cpu`. According to @cwmeijer, this was used to save some memory. It is not clear to me why that would be the case, as memory used during validation should be freed before training resumes. @egpbos, do you have more details on this?
In addition to satisfying my own curiosity, I would like to be sure we are not missing an issue in our memory management.
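A small diagnostic that could help settle the memory question (the numbers depend entirely on the run; note that `memory_reserved` often stays high after validation because of PyTorch's caching allocator, which might be what the CPU options work around):

```python
import torch

def report_gpu_memory(tag):
    # Allocated = tensors currently alive; reserved = memory held by the caching allocator.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

# report_gpu_memory("before validation")
# ... run the validation step ...
# report_gpu_memory("after validation")
```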