
Comments (4)

egpbos commented on September 23, 2024

Transformer model run with --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=2 --trafo_dropout=0.4 --trafo_feedforward_dim=512 on the flickr1d dataset (only 30 samples, so batch_size is set to 30 as well).

python -m platalea.experiments.flickr8k.transformer -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1 --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=2 --trafo_dropout=0.4 --trafo_feedforward_dim=512 --batch_size=30
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json', 'batch_size': 30, 'trafo_d_model': 512, 'trafo_encoder_layers': 4, 'trafo_heads': 2, 'trafo_feedforward_dim': 512, 'trafo_dropout': 0.4, 'score_on_cpu': False, 'validate_on_cpu': False}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory.  Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
├─Conv1d: 1-1                                 [30, 64, 299]             14,976
├─Linear: 1-2                                 [299, 30, 512]            33,280
├─TransformerEncoder: 1-3                     [299, 30, 512]            --
|    └─ModuleList: 2                          []                        --
|    |    └─TransformerEncoderLayer: 3-1      [299, 30, 512]            --
|    |    |    └─MultiheadAttention: 4-1      [299, 30, 512]            --
|    |    |    └─Dropout: 4-2                 [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-3               [299, 30, 512]            1,024
|    |    |    └─Linear: 4-4                  [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-5                 [299, 30, 512]            --
|    |    |    └─Linear: 4-6                  [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-7                 [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-8               [299, 30, 512]            1,024
|    |    └─TransformerEncoderLayer: 3-2      [299, 30, 512]            --
|    |    |    └─MultiheadAttention: 4-9      [299, 30, 512]            --
|    |    |    └─Dropout: 4-10                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-11              [299, 30, 512]            1,024
|    |    |    └─Linear: 4-12                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-13                [299, 30, 512]            --
|    |    |    └─Linear: 4-14                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-15                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-16              [299, 30, 512]            1,024
|    |    └─TransformerEncoderLayer: 3-3      [299, 30, 512]            --
|    |    |    └─MultiheadAttention: 4-17     [299, 30, 512]            --
|    |    |    └─Dropout: 4-18                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-19              [299, 30, 512]            1,024
|    |    |    └─Linear: 4-20                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-21                [299, 30, 512]            --
|    |    |    └─Linear: 4-22                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-23                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-24              [299, 30, 512]            1,024
|    |    └─TransformerEncoderLayer: 3-4      [299, 30, 512]            --
|    |    |    └─MultiheadAttention: 4-25     [299, 30, 512]            --
|    |    |    └─Dropout: 4-26                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-27              [299, 30, 512]            1,024
|    |    |    └─Linear: 4-28                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-29                [299, 30, 512]            --
|    |    |    └─Linear: 4-30                 [299, 30, 512]            262,656
|    |    |    └─Dropout: 4-31                [299, 30, 512]            --
|    |    |    └─LayerNorm: 4-32              [299, 30, 512]            1,024
├─Attention: 1-4                              [30, 512]                 --
|    └─Linear: 2-1                            [30, 299, 128]            65,664
|    └─Linear: 2-2                            [30, 299, 512]            66,048
|    └─Softmax: 2-3                           [30, 299, 512]            --
===============================================================================================
Total params: 2,289,408
Trainable params: 2,289,408
Non-trainable params: 0
Total mult-adds (M): 23.66
===============================================================================================
Input size (MB): 2.82
Forward/backward pass size (MB): 675.12
Params size (MB): 9.16
Estimated Total Size (MB): 687.10
===============================================================================================
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Linear: 1-1                            [30, 512]                 1,049,088
==========================================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
Total mult-adds (M): 1.05
==========================================================================================
Input size (MB): 0.25
Forward/backward pass size (MB): 0.12
Params size (MB): 4.20
Estimated Total Size (MB): 4.56
==========================================================================================
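
As a sanity check, the per-layer parameter counts in the summary above can be reproduced from the run's hyperparameters (d_model=512, feedforward_dim=512). A minimal sketch in plain Python (not platalea code); note that the reported total appears to exclude the MultiheadAttention projection weights, which the summary lists without a parameter count:

d_model = 512
feedforward_dim = 512

linear_ffn = d_model * feedforward_dim + feedforward_dim  # 262,656, matching Linear 4-4, 4-6, ...
layer_norm = 2 * d_model                                  # 1,024 (weight + bias)

# Summing only the modules listed with a count reproduces the reported total,
# i.e. the MultiheadAttention in/out projections are apparently not included:
total = (14976                                    # Conv1d 1-1
         + 33280                                  # Linear 1-2 (64 -> 512)
         + 4 * (2 * layer_norm + 2 * linear_ffn)  # 4 TransformerEncoderLayers
         + 65664 + 66048)                         # Attention 1-4 (hidden + out)
print(linear_ffn, layer_norm, total)              # 262656 1024 2289408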


egpbos commented on September 23, 2024

GRU model run with default parameters on the flickr1d dataset (only 30 samples, so the default batch_size of 32 is not filled up completely).

python -m platalea.experiments.flickr8k.basic_default -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json'}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory.  Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Conv1d: 1-1                            [30, 64, 299]             14,976
├─GRU: 1-2                               [5780, 2048]              63,356,928
├─Attention: 1-3                         [30, 2048]                --
|    └─Linear: 2-1                       [30, 299, 128]            262,272
|    └─Linear: 2-2                       [30, 299, 2048]           264,192
|    └─Softmax: 2-3                      [30, 299, 2048]           --
==========================================================================================
Total params: 63,898,368
Trainable params: 63,898,368
Non-trainable params: 0
Total mult-adds (M): 68.83
==========================================================================================
Input size (MB): 2.82
Forward/backward pass size (MB): 255.44
Params size (MB): 255.59
Estimated Total Size (MB): 513.86
==========================================================================================
torch.Size([30, 2048])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Linear: 1-1                            [30, 2048]                4,196,352
==========================================================================================
Total params: 4,196,352
Trainable params: 4,196,352
Non-trainable params: 0
Total mult-adds (M): 4.19
==========================================================================================
Input size (MB): 0.25
Forward/backward pass size (MB): 0.49
Params size (MB): 16.79
Estimated Total Size (MB): 17.52
==========================================================================================
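
The GRU's 63,356,928 parameters are consistent with a 4-layer bidirectional GRU with hidden size 1024 (hence the 2048-wide output above) on the 64 channels coming out of the Conv1d. A rough check, assuming PyTorch's standard GRU parameterisation; the layer configuration is inferred from the shapes, not read from the platalea code:

def gru_params(input_size, hidden_size, num_layers, bidirectional=True):
    # weight_ih + weight_hh + bias_ih + bias_hh, per layer and per direction
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden_size * directions
        per_direction = 3 * hidden_size * (in_size + hidden_size) + 2 * 3 * hidden_size
        total += per_direction * directions
    return total

print(gru_params(64, 1024, 4))  # 63356928, matching the summary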


egpbos commented on September 23, 2024

I have now also used torchprof to profile the memory usage of the Transformer model. Results (with slightly different parameters than above):

python -m platalea.experiments.flickr8k.transformer -c $DATAPATH/config.yml --flickr8k_root=$DATAPATH --epochs=1 --trafo_d_model=512 --trafo_encoder_layers=4 --trafo_heads=4 --trafo_dropout=0.4 --trafo_feedforward_dim=512 --batch_size=30
INFO:root:Arguments: {'config': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d/config.yml', 'verbose': False, 'silent': False, 'audio_features_fn': 'mfcc_features.pt', 'seed': 123, 'epochs': 1, 'downsampling_factor': None, 'lr_scheduler': 'cyclic', 'cyclic_lr_max': 0.0002, 'cyclic_lr_min': 1e-06, 'constant_lr': 0.0001, 'device': None, 'hidden_size_factor': 1024, 'l2_regularization': 0, 'flickr8k_root': '/Users/pbos/sw/miniconda3/envs/platalea/lib/python3.8/site-packages/flickr1d', 'flickr8k_meta': 'dataset.json', 'flickr8k_audio_subdir': 'flickr_audio/wavs/', 'flickr8k_image_subdir': 'Flickr8k_Dataset/Flicker8k_Dataset/', 'flickr8k_language': 'en', 'librispeech_root': '/home/bjrhigy/corpora/LibriSpeech', 'librispeech_meta': 'metadata.json', 'batch_size': 30, 'trafo_d_model': 512, 'trafo_encoder_layers': 4, 'trafo_heads': 4, 'trafo_feedforward_dim': 512, 'trafo_dropout': 0.4, 'score_on_cpu': False, 'validate_on_cpu': False}
INFO:root:Loading data
INFO:root:Building model
INFO:root:Training
INFO:root:Run 'wandb disabled' if you don't want to use wandb cloud logging.
INFO:wandb:setting login settings: {}
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B is disabled in this directory.  Run `wandb on` to enable cloud syncing.
INFO:root:Setting stepsize of 4
INFO:root:Saving model in net.1.pt
INFO:root:Calculating and saving epoch score results
Module                   | Self CPU total | CPU total | Self CPU Mem | CPU Mem   | Number of Calls
-------------------------|----------------|-----------|--------------|-----------|----------------
SpeechImage              |                |           |              |           |
├── SpeechEncoder        |                |           |              |           |
│├── Conv                | 66.873ms       | 289.453ms | 6.65 Mb      | 36.94 Mb  | 2
│├── Transformer         |                |           |              |           |
││├── layers             |                |           |              |           |
│││├── 0                 |                |           |              |           |
││││├── self_attn        |                |           |              |           |
│││││└── out_proj        | 0.000us        | 0.000us   |              |           | 0
││││├── linear1          | 83.397ms       | 151.518ms | 41.41 Mb     | 124.22 Mb | 3
││││├── dropout          | 35.445ms       | 72.829ms  | 52.56 Mb     | 227.75 Mb | 3
││││├── linear2          | 58.937ms       | 109.521ms | 41.41 Mb     | 124.22 Mb | 3
││││├── norm1            | 13.010ms       | 26.011ms  | 41.47 Mb     | 124.61 Mb | 3
││││├── norm2            | 5.002ms        | 9.874ms   | 41.47 Mb     | 124.61 Mb | 3
││││├── dropout1         | 45.915ms       | 100.343ms | 52.56 Mb     | 227.75 Mb | 3
││││└── dropout2         | 52.156ms       | 114.384ms | 52.56 Mb     | 227.75 Mb | 3
│││├── 1                 |                |           |              |           |
││││├── self_attn        |                |           |              |           |
│││││└── out_proj        | 0.000us        | 0.000us   |              |           | 0
││││├── linear1          | 86.873ms       | 157.309ms | 41.41 Mb     | 124.22 Mb | 3
││││├── dropout          | 23.239ms       | 48.686ms  | 52.56 Mb     | 227.75 Mb | 3
││││├── linear2          | 60.659ms       | 115.586ms | 41.41 Mb     | 124.22 Mb | 3
││││├── norm1            | 14.783ms       | 29.562ms  | 41.47 Mb     | 124.61 Mb | 3
││││├── norm2            | 19.597ms       | 39.195ms  | 41.47 Mb     | 124.61 Mb | 3
││││├── dropout1         | 24.864ms       | 50.021ms  | 52.56 Mb     | 227.75 Mb | 3
││││└── dropout2         | 22.024ms       | 44.320ms  | 52.56 Mb     | 227.75 Mb | 3
│││├── 2                 |                |           |              |           |
││││├── self_attn        |                |           |              |           |
│││││└── out_proj        | 0.000us        | 0.000us   |              |           | 0
││││├── linear1          | 50.357ms       | 98.688ms  | 41.41 Mb     | 124.22 Mb | 3
││││├── dropout          | 37.660ms       | 80.871ms  | 52.56 Mb     | 227.75 Mb | 3
││││├── linear2          | 56.380ms       | 110.692ms | 41.41 Mb     | 124.22 Mb | 3
││││├── norm1            | 9.368ms        | 18.736ms  | 41.47 Mb     | 124.61 Mb | 3
││││├── norm2            | 6.272ms        | 12.558ms  | 41.47 Mb     | 124.61 Mb | 3
││││├── dropout1         | 35.492ms       | 70.916ms  | 52.56 Mb     | 227.75 Mb | 3
││││└── dropout2         | 45.605ms       | 100.806ms | 52.56 Mb     | 227.75 Mb | 3
│││└── 3                 |                |           |              |           |
│││ ├── self_attn        |                |           |              |           |
│││ │└── out_proj        | 0.000us        | 0.000us   |              |           | 0
│││ ├── linear1          | 59.860ms       | 116.771ms | 41.41 Mb     | 124.22 Mb | 3
│││ ├── dropout          | 33.931ms       | 70.847ms  | 52.56 Mb     | 227.75 Mb | 3
│││ ├── linear2          | 49.675ms       | 94.920ms  | 41.41 Mb     | 124.22 Mb | 3
│││ ├── norm1            | 6.048ms        | 12.110ms  | 41.47 Mb     | 124.61 Mb | 3
│││ ├── norm2            | 7.821ms        | 15.640ms  | 41.47 Mb     | 124.61 Mb | 3
│││ ├── dropout1         | 37.179ms       | 79.899ms  | 52.56 Mb     | 227.75 Mb | 3
│││ └── dropout2         | 32.213ms       | 64.872ms  | 52.56 Mb     | 227.75 Mb | 3
│├── scale_conv_to_trafo | 20.789ms       | 30.776ms  | 26.87 Mb     | 83.60 Mb  | 2
│├── att                 |                |           |              |           |
││├── hidden             | 26.160ms       | 56.518ms  | 29.86 Mb     | 113.46 Mb | 2
││├── out                | 10.915ms       | 19.699ms  | 23.89 Mb     | 71.66 Mb  | 2
││└── softmax            | 24.973ms       | 49.969ms  | 23.89 Mb     | 71.66 Mb  | 2
└── ImageEncoder         |                |           |              |           |
 └── linear_transform    | 1.649ms        | 1.849ms   | 64.00 Kb     | 64.00 Kb  | 2
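
For reference, a memory table like the one above can be produced with torchprof roughly as follows (a sketch, assuming torchprof's Profile API with profile_memory enabled; a toy network stands in for platalea's SpeechImage model):

import torch
import torch.nn as nn
import torchprof

# Toy stand-in for the SpeechImage model; replace with the real network and batch.
model = nn.Sequential(nn.Conv1d(39, 64, kernel_size=6, stride=2), nn.ReLU())
x = torch.rand(30, 39, 600)

with torchprof.Profile(model, use_cuda=False, profile_memory=True) as prof:
    model(x)  # a single forward pass is enough to fill the table
print(prof.display(show_events=False))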


cwmeijer commented on September 23, 2024

We decreased the GPU usage by using larger strides. See the report at https://wandb.ai/spokenlanguage/platalea_transformer/reports/Feb-4-Project-Update-Conv-stride-sweep--Vmlldzo0NDkwMzA.
We also tried to use the extra available memory by adding layers. See the report at https://wandb.ai/spokenlanguage/platalea_transformer/reports/Feb-23-Project-Update-grid-search-on-trafo-layers-and-heads--Vmlldzo0ODQ5MzE.
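
The stride effect follows from the shapes in the summaries above: the Conv1d stride determines the sequence length the transformer sees, and self-attention memory grows roughly quadratically with that length. A hypothetical illustration (the input length, kernel size and head count below are placeholders, not platalea's actual configuration):

# Each attention matrix is (seq_len x seq_len) per head and per sample,
# so doubling the Conv1d stride roughly quarters the attention-map memory.
def conv1d_out_len(l_in, kernel, stride, padding=0):
    return (l_in + 2 * padding - kernel) // stride + 1

l_in, kernel, heads, batch = 2048, 6, 4, 30  # placeholder values
for stride in (1, 2, 4):
    l_out = conv1d_out_len(l_in, kernel, stride)
    attn_mb = batch * heads * l_out * l_out * 4 / 2**20  # fp32 bytes -> MB
    print(f"stride={stride}: seq_len={l_out}, ~{attn_mb:.0f} MB of attention maps per layer")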
