freewym / espresso Goto Github PK

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

License: Other

Python 98.20% C++ 0.45% Lua 0.09% Shell 0.09% Makefile 0.06% Cuda 0.82% Cython 0.28%

python pytorch fairseq kaldi end-to-end speech-recognition asr

espresso's Introduction

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

We provide state-of-the-art training recipes for the following speech datasets:

What's New:

September 2022: CTC model training and decoding are supported. Check out a config file example.
February 2022: Conformer encoder is implemented. Simply add one line option in the config file to enable it. See examples: here and here.
December 2021: A suite of Transducer model training and decoding code is added. An illustrative LibriSpeech recipe is here. The training requires torchaudio >= 0.10.0 installed.
April 2021: On-the-fly feature extraction from raw waveforms with torchaudio is supported. A LibriSpeech recipe is released here with no dependency on Kaldi and using YAML files (via Hydra) for configuring experiments.
June 2020: Transformer recipes released.
April 2020: Both E2E LF-MMI (using PyChain) and Cross-Entropy training for hybrid ASR are now supported. WSJ recipes are provided here and here as examples, respectively.
March 2020: SpecAugment is supported and relevant recipes are released.
September 2019: We are in an effort of isolating Espresso from fairseq, resulting in a standalone package that can be directly pip installed.

Requirements and Installation

PyTorch version >= 1.10.0
Python version >= 3.8
For training new models, you'll also need an NVIDIA GPU and NCCL
To install Espresso from source and develop locally:

git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
pip install kaldi_io sentencepiece soundfile
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>

add your Python path to PATH variable in examples/asr_<dataset>/path.sh, the current default is ~/anaconda3/bin.

kaldi_io is required for reading kaldi scp files. sentencepiece is required for subword pieces training/encoding. soundfile is required for reading raw waveform files. Kaldi is required for data preparation, feature extraction, scoring for some datasets (e.g., Switchboard), and decoding for all hybrid systems.

If you want to use PyChain for LF-MMI training, you also need to install PyChain (and OpenFst):

edit PYTHON_DIR variable in espresso/tools/Makefile (default: ~/anaconda3/bin), and then

cd espresso/tools; make openfst pychain

For faster training install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu
            and Shuoyang Ding and Hang Lv and Yiwen Shao
            and Nanyun Peng and Lei Xie and Shinji Watanabe
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}

espresso's People

Contributors

Stargazers

Watchers

Forkers

rohithkodali entn-at lvhang qoboty marvis mowayao shujian2015 xdcesc xzm2004260 xinkez abhishekyana ideaplexus huguanglong wgfi110 desh2608 dragomirradev asrivast13 tchigher cuulee hhy5277 tarsbase barseghyanartur hadryan zqma2 priestd09 lanpn85 ze6 sahanduiuc chenchy caoyuhang hongwen-sun twistedmove wendonggan crosstuck rpersie baozixifan huangziliandy lahiruts 1163710124 alan101-tech srgangireddy by2101 chienlinhuang1116 laurii hopeskair yyttyy wqn628 xiaming9880 mbrukman feddybear fei00wu mahbubnoor jinyiyang-jhu dertilo valentinp72 newhousewhite messiaen dendisuhubdy luyizhou4 1div0 tolysz luweishuang 5l1v3r1 zhiguangzhang agangzz vyraun nonlocal shiyuzh2007 peter05010402 chunchiehchang 200987299 hiyoung-asr joyfish kingfener arendu-zz yfliao 53x rheehot abubakar26 medbar 23lnlx pzelasko markwucl shantanunair mirishkarganesh amirhussein96 donstang aarora8 russ76 jmiller0711 goodbyedk sshuster rxhmdia rakhi-alina pengyizhou shirosweets trendingtechnology mbencherif sciai-ai slikos

espresso's Issues

ASR_WSJ: LM is training but no logging output?

🐛 Bug

I am running the asr_wsj recipe. It is training the word_lm (stage 6) since last night but does not produce any output, logging or otherwise.

When I run nvtop or Nvidia-smi the gpus seem to be busy with my jobs. I am running 4 GPUs in parallel. Early on there were some OOM problems that it tried to recover from. Is it possible it in some sort of weird infinite loop but is doing nothing?

Attached is the screen output - at the top you can see nvidia-smi is run along with the early OOM messages.

no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/condabin/conda
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/conda
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/conda-env
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/activate
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/bin/deactivate
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/profile.d/conda.sh
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/fish/conf.d/conda.fish
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/shell/condabin/Conda.psm1
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/shell/condabin/conda-hook.ps1
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/lib/python3.7/site-packages/xontrib/conda.xsh
no change /misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espnet-may142020/etc/profile.d/conda.csh
no change /home/map22/.bashrc
No action taken.
Tue Dec 8 22:30:53 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36 Driver Version: 440.36 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:02:00.0 Off | N/A |
| 23% 18C P8 9W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:03:00.0 Off | N/A |
| 23% 21C P8 9W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... On | 00000000:82:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... On | 00000000:83:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Stage 3: Text Binarization for LM Training
./run.sh: binarizing word text...
Unable to get 4 GPUs
Stage 6: word LM Training
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:19801
2020-12-08 22:32:29 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:19801
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 3
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 2
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 0
2020-12-08 22:32:39 | INFO | fairseq.distributed_utils | initialized host lion6.cs.nyu.edu as rank 1
2020-12-08 22:32:39 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 1000, 'log_format': 'simple', 'tensorboard_logdir': None, 'wandb_project': None, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': True}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 4, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'tcp://localhost:19801', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'broadcast_buffers': False, 'distributed_wrapper': 'DDP', 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 4, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False, 'distributed_num_procs': 4}, 'dataset': {'_name': None, 'num_workers': 0, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 6400, 'batch_size': 256, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 6400, 'batch_size_valid': 512, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 25, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.001], 'min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'exp/wordlm_lstm', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 1000, 'keep_interval_updates': 5, 'keep_last_epochs': 5, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 4}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': False, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_factor': None, 'subwordlm_weight': 0.8, 'oov_penalty': 0.0001, 'disable_open_vocab': False, 'apply_log_softmax': False, 'state_prior_file': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': Namespace(_name='lstm_wordlm_wsj', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, add_bos_token=False, all_gather_list_size=16384, arch='lstm_wordlm_wsj', batch_size=256, batch_size_valid='512', best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='cross_entropy', curriculum=0, data='data/wordlm_text', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_dropout_in=0.35, decoder_dropout_out=0.35, decoder_embed_dim=1200, decoder_embed_path=None, decoder_freeze_embed=False, decoder_hidden_size=1200, decoder_layers=3, decoder_out_embed_dim=1200, decoder_rnn_residual=False, device_id=0, dict='data/lang/wordlist_65000.txt', disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=4, distributed_wrapper='DDP', dropout=0.35, empty_cache_freq=0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, future_target=False, gen_subset='test', is_wordlm=True, keep_best_checkpoints=-1, keep_interval_updates=5, keep_last_epochs=5, localsgd_frequency=3, log_format='simple', log_interval=1000, lr=[0.001], lr_patience=0, lr_scheduler='reduce_lr_on_plateau', lr_shrink=0.5, lr_threshold=0.0001, max_epoch=25, max_target_positions=None, max_tokens=6400, max_tokens_valid=6400, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=4, num_shards=1, num_workers=0, optimizer='adam', optimizer_overrides='{}', output_dictionary_size=-1, pad=1, past_target=False, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, profile=False, quantization_config_path=None, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=True, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sample_break_mode='eos', save_dir='exp/wordlm_lstm', save_interval=1, save_interval_updates=1000, scoring='bleu', seed=1, self_target=False, sentence_avg=False, shard_id=0, share_embed=True, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, task='language_modeling_for_asr', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=1024, tpu=False, train_subset='train', unk=3, update_freq=[1], use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_init_lr=-1, warmup_updates=0, weight_decay=0.0, zero_sharding='none'), 'task': {'_name': 'language_modeling_for_asr', 'data': 'data/wordlm_text', 'sample_break_mode': 'eos', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': False, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'seed': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'dict': 'data/lang/wordlist_65000.txt'}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': False}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': False, 'tpu': False, 'lr': [0.001]}, 'lr_scheduler': {'_name': 'reduce_lr_on_plateau', 'lr_shrink': 0.5, 'lr_threshold': 0.0001, 'lr_patience': 0, 'warmup_updates': 0, 'warmup_init_lr': -1.0, 'lr': [0.001], 'maximize_best_checkpoint_metric': False}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2020-12-08 22:32:39 | INFO | espresso.tasks.language_modeling_for_asr | dictionary: 65003 types
2020-12-08 22:32:39 | INFO | fairseq.data.data_utils | loaded 503 examples from: data/wordlm_text/valid
2020-12-08 22:32:42 | INFO | fairseq_cli.train | LSTMLanguageModelEspresso(
(decoder): SpeechLSTMDecoder(
(dropout_in_module): FairseqDropout()
(dropout_out_module): FairseqDropout()
(embed_tokens): Embedding(65003, 1200, padding_idx=0)
(layers): ModuleList(
(0): LSTMCell(1200, 1200)
(1): LSTMCell(1200, 1200)
(2): LSTMCell(1200, 1200)
)
)
)
2020-12-08 22:32:42 | INFO | fairseq_cli.train | task: LanguageModelingForASRTask
2020-12-08 22:32:42 | INFO | fairseq_cli.train | model: LSTMLanguageModelEspresso
2020-12-08 22:32:42 | INFO | fairseq_cli.train | criterion: CrossEntropyCriterion)
2020-12-08 22:32:42 | INFO | fairseq_cli.train | num. model params: 112592400 (num. trained: 112592400)
2020-12-08 22:32:43 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 0: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 1: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 2: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | rank 3: capabilities = 6.1 ; total memory = 10.917 GB ; name = GeForce GTX 1080 Ti
2020-12-08 22:32:43 | INFO | fairseq.utils | CUDA enviroments for all 4 workers
2020-12-08 22:32:43 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2020-12-08 22:32:43 | INFO | fairseq_cli.train | max tokens per GPU = 6400 and batch size per GPU = 256
2020-12-08 22:32:43 | INFO | fairseq.trainer | no existing checkpoint found exp/wordlm_lstm/checkpoint_last.pt
2020-12-08 22:32:43 | INFO | fairseq.trainer | loading train data for epoch 1
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
2020-12-08 22:41:58 | INFO | fairseq.data.data_utils | loaded 1662964 examples from: data/wordlm_text/train
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
/misc/vlgscratch4/PichenyGroup/picheny/anaconda3/envs/espresso-dec082020/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:398: UserWarning: The check_reduction argument in DistributedDataParallel module is deprecated. Please avoid using it.
"The check_reduction argument in DistributedDataParallel "
2020-12-08 22:42:06 | INFO | fairseq.trainer | begin training epoch 1
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
/misc/vlgscratch5/PichenyGroup/picheny/espresso/fairseq/utils.py:347: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
"amp_C fused kernels unavailable, disabling multi_tensor_l2norm; "
2020-12-08 22:42:08 | INFO | root | Reducer buckets have been rebuilt in this iteration.
2020-12-08 22:42:14 | WARNING | fairseq.trainer | OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 1.55 GiB (GPU 1; 10.92 GiB total capacity; 7.68 GiB already allocated; 1.37 GiB free; 8.91 GiB reserved in total by PyTorch)
2020-12-08 22:42:14 | WARNING | fairseq.trainer | |===========================================================================|

PyTorch CUDA memory summary, device ID 0
CUDA OOMs: 0
===========================================================================
Metric
---------------------------------------------------------------------------
Allocated memory
from large pool
from small pool
---------------------------------------------------------------------------
Active memory
from large pool
from small pool
---------------------------------------------------------------------------
GPU reserved memory
from large pool
from small pool
---------------------------------------------------------------------------
Non-releasable memory
from large pool
from small pool
---------------------------------------------------------------------------
Allocations
from large pool
from small pool
---------------------------------------------------------------------------
Active allocs
from large pool
from small pool
---------------------------------------------------------------------------
GPU reserved segments
from large pool
from small pool
---------------------------------------------------------------------------
Non-releasable allocs
from large pool
from small pool
===========================================================================