Comments (4)
Transformer model run with --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=2 --trafo_dropout=0.4 --trafo_feedforward_dim=512
on the flickr1d dataset (only 30 samples, so batch_size is also set to 30).
python -m platalea.experiments.flickr8k.transformer -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1 --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=2 --trafo_dropout=0.4 --trafo_feedforward_dim=512 --batch_size=30
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json', 'batch_size': 30, 'trafo_d_model': 512, 'trafo_encoder_layers': 4, 'trafo_heads': 2, 'trafo_feedforward_dim': 512, 'trafo_dropout': 0.4, 'score_on_cpu': False, 'validate_on_cpu': False}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory. Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
===============================================================================================
Layer (type:depth-idx) Output Shape Param #
===============================================================================================
├─Conv1d: 1-1 [30, 64, 299] 14,976
├─Linear: 1-2 [299, 30, 512] 33,280
├─TransformerEncoder: 1-3 [299, 30, 512] --
| └─ModuleList: 2 [] --
| | └─TransformerEncoderLayer: 3-1 [299, 30, 512] --
| | | └─MultiheadAttention: 4-1 [299, 30, 512] --
| | | └─Dropout: 4-2 [299, 30, 512] --
| | | └─LayerNorm: 4-3 [299, 30, 512] 1,024
| | | └─Linear: 4-4 [299, 30, 512] 262,656
| | | └─Dropout: 4-5 [299, 30, 512] --
| | | └─Linear: 4-6 [299, 30, 512] 262,656
| | | └─Dropout: 4-7 [299, 30, 512] --
| | | └─LayerNorm: 4-8 [299, 30, 512] 1,024
| | └─TransformerEncoderLayer: 3-2 [299, 30, 512] --
| | | └─MultiheadAttention: 4-9 [299, 30, 512] --
| | | └─Dropout: 4-10 [299, 30, 512] --
| | | └─LayerNorm: 4-11 [299, 30, 512] 1,024
| | | └─Linear: 4-12 [299, 30, 512] 262,656
| | | └─Dropout: 4-13 [299, 30, 512] --
| | | └─Linear: 4-14 [299, 30, 512] 262,656
| | | └─Dropout: 4-15 [299, 30, 512] --
| | | └─LayerNorm: 4-16 [299, 30, 512] 1,024
| | └─TransformerEncoderLayer: 3-3 [299, 30, 512] --
| | | └─MultiheadAttention: 4-17 [299, 30, 512] --
| | | └─Dropout: 4-18 [299, 30, 512] --
| | | └─LayerNorm: 4-19 [299, 30, 512] 1,024
| | | └─Linear: 4-20 [299, 30, 512] 262,656
| | | └─Dropout: 4-21 [299, 30, 512] --
| | | └─Linear: 4-22 [299, 30, 512] 262,656
| | | └─Dropout: 4-23 [299, 30, 512] --
| | | └─LayerNorm: 4-24 [299, 30, 512] 1,024
| | └─TransformerEncoderLayer: 3-4 [299, 30, 512] --
| | | └─MultiheadAttention: 4-25 [299, 30, 512] --
| | | └─Dropout: 4-26 [299, 30, 512] --
| | | └─LayerNorm: 4-27 [299, 30, 512] 1,024
| | | └─Linear: 4-28 [299, 30, 512] 262,656
| | | └─Dropout: 4-29 [299, 30, 512] --
| | | └─Linear: 4-30 [299, 30, 512] 262,656
| | | └─Dropout: 4-31 [299, 30, 512] --
| | | └─LayerNorm: 4-32 [299, 30, 512] 1,024
├─Attention: 1-4 [30, 512] --
| └─Linear: 2-1 [30, 299, 128] 65,664
| └─Linear: 2-2 [30, 299, 512] 66,048
| └─Softmax: 2-3 [30, 299, 512] --
===============================================================================================
Total params: 2,289,408
Trainable params: 2,289,408
Non-trainable params: 0
Total mult-adds (M): 23.66
===============================================================================================
Input size (MB): 2.82
Forward/backward pass size (MB): 675.12
Params size (MB): 9.16
Estimated Total Size (MB): 687.10
===============================================================================================
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
├─Linear: 1-1 [30, 512] 1,049,088
==========================================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
Total mult-adds (M): 1.05
==========================================================================================
Input size (MB): 0.25
Forward/backward pass size (MB): 0.12
Params size (MB): 4.20
Estimated Total Size (MB): 4.56
==========================================================================================
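For reference, the two summaries above look like torchinfo (torch-summary) output for the two halves of the model: the speech encoder (Conv1d → Linear projection → TransformerEncoder → attention pooling) and the image encoder (a single 2048→512 linear transform). A minimal sketch of how such a summary can be generated; the Conv1d settings (39 MFCC channels, kernel size 6, stride 2, no bias) are inferred from the 14,976 conv parameters and the [30, 64, 299] output shape, not taken from the platalea source:

import torch.nn as nn
from torchinfo import summary  # pip install torchinfo

# Hypothetical reconstruction of the speech encoder summarized above
# (attention pooling omitted). Conv1d settings are inferred, see note above.
class SpeechEncoderSketch(nn.Module):
    def __init__(self, d_model=512, layers=4, heads=2, ff_dim=512, dropout=0.4):
        super().__init__()
        self.conv = nn.Conv1d(39, 64, kernel_size=6, stride=2, bias=False)
        self.scale_conv_to_trafo = nn.Linear(64, d_model)  # 33,280 params
        layer = nn.TransformerEncoderLayer(d_model, heads, ff_dim, dropout)
        self.transformer = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):                # x: [batch, 39, time]
        x = self.conv(x)                 # [30, 64, 299]
        x = x.permute(2, 0, 1)           # [299, 30, 64] (seq, batch, feature)
        x = self.scale_conv_to_trafo(x)  # [299, 30, 512]
        return self.transformer(x)       # [299, 30, 512]

summary(SpeechEncoderSketch(), input_size=(30, 39, 602))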
GRU model run with default parameters on the flickr1d dataset (only 30 samples, so the default batch_size of 32 is not filled completely).
python -m platalea.experiments.flickr8k.basic_default -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json'}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory. Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
├─Conv1d: 1-1 [30, 64, 299] 14,976
├─GRU: 1-2 [5780, 2048] 63,356,928
├─Attention: 1-3 [30, 2048] --
| └─Linear: 2-1 [30, 299, 128] 262,272
| └─Linear: 2-2 [30, 299, 2048] 264,192
| └─Softmax: 2-3 [30, 299, 2048] --
==========================================================================================
Total params: 63,898,368
Trainable params: 63,898,368
Non-trainable params: 0
Total mult-adds (M): 68.83
==========================================================================================
Input size (MB): 2.82
Forward/backward pass size (MB): 255.44
Params size (MB): 255.59
Estimated Total Size (MB): 513.86
==========================================================================================
torch.Size([30, 2048])
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
├─Linear: 1-1 [30, 2048] 4,196,352
==========================================================================================
Total params: 4,196,352
Trainable params: 4,196,352
Non-trainable params: 0
Total mult-adds (M): 4.19
==========================================================================================
Input size (MB): 0.25
Forward/backward pass size (MB): 0.49
Params size (MB): 16.79
Estimated Total Size (MB): 17.52
==========================================================================================
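As a sanity check on the GRU table: 63,356,928 parameters is exactly what a 4-layer bidirectional GRU with 64 input features and hidden size 1024 has (the bidirectional output is then 2048 wide, matching the [5780, 2048] shape). The layer count is an assumption, verified only by this arithmetic:

import torch.nn as nn

# 4-layer bidirectional GRU: input size 64 (conv output channels),
# hidden size 1024, so the concatenated output is 2048 wide.
gru = nn.GRU(input_size=64, hidden_size=1024, num_layers=4, bidirectional=True)
print(sum(p.numel() for p in gru.parameters()))  # 63356928, as in the table

# Per layer and direction: weight_ih (3h x input), weight_hh (3h x h),
# plus bias_ih and bias_hh of 3h each.
h = 1024
layer1 = 3 * h * (64 + h + 2)      # first layer sees the 64 conv channels
layerN = 3 * h * (2 * h + h + 2)   # later layers see the 2048-wide output
assert 2 * layer1 + 3 * 2 * layerN == 63356928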
We now also used torchprof to profile memory usage of the Transformer model. Results (with slightly different parameters than above, namely --trafo_heads=4 instead of 2):
python -m platalea.experiments.flickr8k.transformer -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1 --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=4 --trafo_dropout=0.4 --trafo_feedforward_dim=512 --batch_size=30
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json', 'batch_size': 30, 'trafo_d_model': 512, 'trafo_encoder_layers': 4, 'trafo_heads': 4, 'trafo_feedforward_dim': 512, 'trafo_dropout': 0.4, 'score_on_cpu': False, 'validate_on_cpu': False}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory. Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
INFO:root:Saving model in net.1.pt
INFO:root:Calculating and saving epoch score results
Module | Self CPU total | CPU total | Self CPU Mem | CPU Mem | Number of Calls
-------------------------|----------------|-----------|--------------|-----------|----------------
SpeechImage | | | | |
├── SpeechEncoder | | | | |
│├── Conv | 66.873ms | 289.453ms | 6.65 Mb | 36.94 Mb | 2
│├── Transformer | | | | |
││├── layers | | | | |
│││├── 0 | | | | |
││││├── self_attn | | | | |
│││││└── out_proj | 0.000us | 0.000us | | | 0
││││├── linear1 | 83.397ms | 151.518ms | 41.41 Mb | 124.22 Mb | 3
││││├── dropout | 35.445ms | 72.829ms | 52.56 Mb | 227.75 Mb | 3
││││├── linear2 | 58.937ms | 109.521ms | 41.41 Mb | 124.22 Mb | 3
││││├── norm1 | 13.010ms | 26.011ms | 41.47 Mb | 124.61 Mb | 3
││││├── norm2 | 5.002ms | 9.874ms | 41.47 Mb | 124.61 Mb | 3
││││├── dropout1 | 45.915ms | 100.343ms | 52.56 Mb | 227.75 Mb | 3
││││└── dropout2 | 52.156ms | 114.384ms | 52.56 Mb | 227.75 Mb | 3
│││├── 1 | | | | |
││││├── self_attn | | | | |
│││││└── out_proj | 0.000us | 0.000us | | | 0
││││├── linear1 | 86.873ms | 157.309ms | 41.41 Mb | 124.22 Mb | 3
││││├── dropout | 23.239ms | 48.686ms | 52.56 Mb | 227.75 Mb | 3
││││├── linear2 | 60.659ms | 115.586ms | 41.41 Mb | 124.22 Mb | 3
││││├── norm1 | 14.783ms | 29.562ms | 41.47 Mb | 124.61 Mb | 3
││││├── norm2 | 19.597ms | 39.195ms | 41.47 Mb | 124.61 Mb | 3
││││├── dropout1 | 24.864ms | 50.021ms | 52.56 Mb | 227.75 Mb | 3
││││└── dropout2 | 22.024ms | 44.320ms | 52.56 Mb | 227.75 Mb | 3
│││├── 2 | | | | |
││││├── self_attn | | | | |
│││││└── out_proj | 0.000us | 0.000us | | | 0
││││├── linear1 | 50.357ms | 98.688ms | 41.41 Mb | 124.22 Mb | 3
││││├── dropout | 37.660ms | 80.871ms | 52.56 Mb | 227.75 Mb | 3
││││├── linear2 | 56.380ms | 110.692ms | 41.41 Mb | 124.22 Mb | 3
││││├── norm1 | 9.368ms | 18.736ms | 41.47 Mb | 124.61 Mb | 3
││││├── norm2 | 6.272ms | 12.558ms | 41.47 Mb | 124.61 Mb | 3
││││├── dropout1 | 35.492ms | 70.916ms | 52.56 Mb | 227.75 Mb | 3
││││└── dropout2 | 45.605ms | 100.806ms | 52.56 Mb | 227.75 Mb | 3
│││└── 3 | | | | |
│││ ├── self_attn | | | | |
│││ │└── out_proj | 0.000us | 0.000us | | | 0
│││ ├── linear1 | 59.860ms | 116.771ms | 41.41 Mb | 124.22 Mb | 3
│││ ├── dropout | 33.931ms | 70.847ms | 52.56 Mb | 227.75 Mb | 3
│││ ├── linear2 | 49.675ms | 94.920ms | 41.41 Mb | 124.22 Mb | 3
│││ ├── norm1 | 6.048ms | 12.110ms | 41.47 Mb | 124.61 Mb | 3
│││ ├── norm2 | 7.821ms | 15.640ms | 41.47 Mb | 124.61 Mb | 3
│││ ├── dropout1 | 37.179ms | 79.899ms | 52.56 Mb | 227.75 Mb | 3
│││ └── dropout2 | 32.213ms | 64.872ms | 52.56 Mb | 227.75 Mb | 3
│├── scale_conv_to_trafo | 20.789ms | 30.776ms | 26.87 Mb | 83.60 Mb | 2
│├── att | | | | |
││├── hidden | 26.160ms | 56.518ms | 29.86 Mb | 113.46 Mb | 2
││├── out | 10.915ms | 19.699ms | 23.89 Mb | 71.66 Mb | 2
││└── softmax | 24.973ms | 49.969ms | 23.89 Mb | 71.66 Mb | 2
└── ImageEncoder | | | | |
└── linear_transform | 1.649ms | 1.849ms | 64.00 Kb | 64.00 Kb | 2
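For reference, a table like the one above can be produced with a recent torchprof (which added the profile_memory option) roughly as follows; the toy model and input shape here are placeholders, not the actual SpeechImage network:

import torch
import torch.nn as nn
import torchprof  # pip install torchprof

# Stand-in model; replace with the real network to get the table above.
model = nn.Sequential(nn.Conv1d(39, 64, kernel_size=6, stride=2), nn.ReLU())
x = torch.randn(30, 39, 602)  # dummy MFCC batch, shapes as in the runs above

# profile_memory=True adds the "Self CPU Mem" / "CPU Mem" columns;
# every forward pass inside the context adds to "Number of Calls".
with torchprof.Profile(model, use_cuda=False, profile_memory=True) as prof:
    model(x)
print(prof.display(show_events=False))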
We decreased the GPU memory usage by using larger convolution strides. See report https://wandb.ai/spokenlanguage/platalea_transformer/reports/Feb-4-Project-Update-Conv-stride-sweep--Vmlldzo0NDkwMzA.
We also tried to use the extra available memory by adding more layers; a sketch of the stride effect follows below. See report https://wandb.ai/spokenlanguage/platalea_transformer/reports/Feb-23-Project-Update-grid-search-on-trafo-layers-and-heads--Vmlldzo0ODQ5MzE.
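To make the stride effect concrete: the conv front-end fixes the sequence length that enters the transformer, and self-attention stores O(L²) attention weights per head, so doubling the stride roughly quarters that part of the memory. A small sketch, with kernel size 6 and zero padding assumed from the shapes in the first comment (stride 2 reproduces the 299-frame conv output seen there):

def conv1d_out_len(l_in, kernel=6, stride=2, padding=0):
    # Standard Conv1d output-length formula (dilation 1).
    return (l_in + 2 * padding - kernel) // stride + 1

l_in = 602  # input frames per utterance in the runs above
ref = conv1d_out_len(l_in, stride=1)
for stride in (1, 2, 4, 8):
    l_out = conv1d_out_len(l_in, stride=stride)
    # attention-weight memory relative to stride=1 scales like (L/ref)^2
    print(stride, l_out, round((l_out / ref) ** 2, 3))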