
neural_sp's People

Contributors

elgeish, hirofumi0810, lijianhackthon, sunski, weiwei-ww, zh794390558


neural_sp's Issues

bug: bugs when using multi-GPU training

Hi hiro, I am using multiple GPUs to train an ASR model. It seems torch.nn.parallel.DataParallel does not scatter the dict input. You should override the scatter function in the CustomDataParallel class so that each item in the dict can be scattered across the different devices.
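
A rough sketch of what I mean (just an illustration, not your actual CustomDataParallel), assuming the batch is a dict whose values are per-utterance lists or arrays, and that kwargs can simply be copied per device:

import torch.nn as nn


class CustomDataParallel(nn.DataParallel):
    """DataParallel variant that also splits a dict batch across devices."""

    def scatter(self, inputs, kwargs, device_ids):
        batch = inputs[0]          # assume the dict batch is the first argument
        n = len(device_ids)
        shards = [dict() for _ in range(n)]
        for key, value in batch.items():
            # Split each value (a list/array of per-utterance items) into n chunks.
            size = (len(value) + n - 1) // n
            for i in range(n):
                shards[i][key] = value[i * size:(i + 1) * size]
        # One (batch_shard, *other_args) tuple and one kwargs copy per device.
        scattered_inputs = [(shard,) + tuple(inputs[1:]) for shard in shards]
        scattered_kwargs = [dict(kwargs) for _ in range(n)]
        return scattered_inputs, scattered_kwargs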

SkipThought class is missing

I cannot find this class:
from neural_sp.models.seq2seq.skip_thought import SkipThought

Did I download the wrong version of the code?

Multi-GPU training error on aishell

./run.sh --gpu 0,1,2,3 --stage 4 --stop_stage 4 --conf conf/asr/blstm_las.yaml 
Traceback (most recent call last):
  File "/e2e_asr/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 501, in <module>
    save_path = pr.runcall(main)
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/e2e_asr/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 309, in main
    teacher=teacher, teacher_lm=teacher_lm)
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/e2e_asr/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/e2e_asr/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 248, in forward
    loss, observation = self._forward(batch, task, teacher, teacher_lm)
  File "/e2e_asr/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 258, in _forward
    eout_dict = self.encode(batch['xs'], 'all')
  File "/e2e_asr/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 373, in encode
    xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
  File "/e2e_asr/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 373, in <listcomp>
    xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
  File "/e2e_asr/neural_sp/neural_sp/models/base.py", line 58, in device
    return next(self.parameters()).device
StopIteration

How can I fix this?
I am on commit 66a8865.
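
If it helps, the StopIteration seems to come from the device property in models/base.py calling next(self.parameters()) inside a DataParallel replica, where parameters() can yield nothing on recent PyTorch versions. One workaround I imagine (assuming buffers stay visible inside replicas; I am not sure this is how you would fix it) is to read the device from a registered buffer:

import torch
import torch.nn as nn


class ModelBase(nn.Module):
    def __init__(self):
        super().__init__()
        # Dummy buffer that moves together with the model, so a device is
        # available even when parameters() yields nothing (e.g. in a replica).
        self.register_buffer('_device_flag', torch.zeros(1))

    @property
    def device(self):
        return self._device_flag.device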

Question about the SpecAugment configuration in the aishell example

First, thanks for your great work!

I am new to neural_sp. I have a question about the SpecAugment configuration in the aishell example.
In the aishell README.md:

Transformer + SpecAugment (no LM)
conf: conf/asr/transformer.yaml
decoding parameters
n_average: 10
beam width: 5

Eval Set # Snt # Wrd Corr Sub Del Ins Err S.Err
dev 14326 205341 95.1 4.8 0.1 0.1 5.0 36.0
test 7176 104765 94.7 5.1 0.2 0.1 5.4 37.7


I can't find the 'n_freq_masks' or 'n_time_masks' settings in the config yaml.

Since their default value is 0, SpecAugment will not be applied.

So, would just setting 'n_freq_masks=1' and 'n_time_masks=1' be correct?
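
For example, would a --config2 overlay roughly like the following work? (The key names are the ones mentioned above; the values are only my guess based on the SpecAugment paper, not the settings actually used for the reported numbers.)

n_freq_masks: 2   # number of frequency masks per utterance
n_time_masks: 2   # number of time masks per utterance
# the corresponding mask-width options would presumably also need to be set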

About a bug when decoding with streaming ASR

File "neural_sp/neural_sp/models/seq2seq/decoders/transformer.py", line 968, in beam_search
    rightmost_frame = max(0, aws_last_success[0, :, 0].nonzero()[:, -1].max().item()) + 1
RuntimeError: invalid argument 1: tensor must have one dimension at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:590

I need assistance and would really appreciate any help.
After finishing training with your default streaming config on aishell, I tried to decode the model and this error occurred.

Multi GPU training speed

Hello, I am using neural_sp to train aishell-1 and another task, but I find that when I use 4 GPUs to train the ASR model, it takes 8 hours per epoch, which is slower than with a single GPU, and the time grows further when I add more GPUs. Have you seen this problem, or is there something wrong with my environment?
I used your tools to install the environment: python=3.7.9 and pytorch=10.0, with CUDA 10.1.

[CIF] Is the model available?

Hi, I saw that models/modules includes a CIF implementation. Have you done any testing on relevant data? How well does it work? I didn't find the corresponding conf.
If not, is there any plan to add one?

Looking forward to your reply.
Thank you!

aishell test set decoded by the streaming Transformer: CER cannot reach 6.6

Training config:
conf/asr/transformer_mma/lc_transformer_mma_mono4H_chunk4H_chunk16_from4L_headdrop0.5_subsample8_96_64_32.yaml
Because my GPU does not have enough memory, I changed batch_size from 32 to 16.

Decoding config:
metric=edit_distance
batch_size=1
beam_width=10
min_len_ratio=0.0
max_len_ratio=1.0
length_penalty=2.0
length_norm=false
coverage_penalty=0.0
coverage_threshold=0.0
gnmt_decoding=false
eos_threshold=1.0
lm=
lm_second=
lm_bwd=
lm_weight=0.0
lm_second_weight=0.0
lm_bwd_weight=0.0
ctc_weight=0.0 # 1.0 for joint CTC-attention means decoding with CTC
resolving_unk=false
fwd_bwd_attention=false
bwd_attention=false
reverse_lm_rescoring=false
asr_state_carry_over=false
lm_state_carry_over=true
n_average=10 # for Transformer
oracle=false
chunk_sync=false # for MoChA
mma_delay_threshold=8

The recognition result on the test set is Corr: 68.1%, much worse than in the README (CER: 6.6).

Bug: TypeError: forward() got an unexpected keyword argument 'boundary_rightmost'

Hi,
when I use your recent code in examples/aishell/s5 and run score.sh on the result, I get this error:

  File "/../../../neural_sp/bin/asr/eval.py", line 259, in <module>
    main()
  File "../../../neural_sp/bin/asr/eval.py", line 190, in main
    progressbar=True)
  File "neural_sp/evaluators/word.py", line 82, in eval_word
    ensemble_models=models[1:] if len(models) > 1 else [])
  File "neural_sp/models/seq2seq/speech2text.py", line 752, in decode
    ensmbl_eouts, ensmbl_elens, ensmbl_decs)
  File "neural_sp/models/seq2seq/decoders/transformer.py", line 772, in beam_search
    eps_wait=eps_wait)
  File "tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'boundary_rightmost'
  0%| | 0/400 [00:00<?, ?it/s]

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hi!
When I ran the ASR model training stage (stage 4) with 8 * 1080 Ti GPUs, I got the following error:

Original utterance num: 281241
Removed 54 utterances (threshold)
Original utterance num: 2703
Removed 61 utterances (threshold)
5%|▌         | 15240/281187 [05:13<1:39:01, 44.76it/s]
Traceback (most recent call last):
  File "/asr/neural_sp-master/examples/librispeech/s5/../../../neural_sp/bin/asr/train.py", line 533, in <module>
    save_path = pr.runcall(main)
  File "/asr/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/asr/neural_sp-master/examples/librispeech/s5/../../../neural_sp/bin/asr/train.py", line 367, in main
    teacher=teacher, teacher_lm=teacher_lm)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/asr/neural_sp-master/neural_sp/models/seq2seq/speech2text.py", line 426, in forward
    loss, reporter = self._forward(batch, task, reporter, teacher, teacher_lm)
  File "/asr/neural_sp-master/neural_sp/models/seq2seq/speech2text.py", line 461, in _forward
    enc_outs = self.encode(batch['xs'], 'all', flip=flip)
  File "/asr/neural_sp-master/neural_sp/models/seq2seq/speech2text.py", line 568, in encode
    enc_outs = self.enc(xs, xlens, task.split('.')[0])
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/asr/neural_sp-master/neural_sp/models/seq2seq/encoders/rnn.py", line 300, in forward
    xs = self.padding(xs, xlens, self.rnn[l])
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/asr/neural_sp-master/neural_sp/models/seq2seq/encoders/rnn.py", line 378, in forward
    xs, _ = rnn(xs, hx=None)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/asr/miniconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 182, in forward
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Does anyone know what the problem is?

My environment:
system ubuntu 16.04
gpu NVIDIA GTX 1080 Ti
python 3.7.4
cuda 9.0.176
torch 1.0.0
cudnn 7.0.5

However, it worked when I used only one GPU.
Can anyone help me resolve this issue?
Thank you!

GLU bottleneck

Hi, I have a question regarding your GLU implementation here.

As far as I understood the GCNN paper, the gating mechanism should take place in the bottleneck layer when one is used. In your implementation, however, it happens after the bottleneck is projected back to the output dimension:

layers['conv_out'] = nn.utils.weight_norm(
    nn.Conv2d(in_channels=bottlececk_dim,
              out_channels=out_ch * 2,
              kernel_size=(1, 1)), name='weight', dim=0)

I'm not sure if I really understood the paper correctly, but I would have implemented it like this:

def __init__(...):
    .....
    elif bottlececk_dim > 0:
        layers['conv_in'] = nn.utils.weight_norm(
            nn.Conv2d(in_channels=in_ch,
                      out_channels=bottlececk_dim,
                      kernel_size=(1, 1)), name='weight', dim=0)
        layers['dropout_in'] = nn.Dropout(p=dropout)
        layers['layers'] = nn.utils.weight_norm(
                    nn.Conv2d(in_channels=bottlececk_dim,
                              out_channels=bottlececk_dim * 2,
                              kernel_size=(kernel_size, 1)), name='weight', dim=0)
        layers['dropout'] = nn.Dropout(p=dropout)
        
        self._layers1 = nn.Sequential(layers)
        self._layers2 = nn.Sequential(
               nn.utils.weight_norm(
                   nn.Conv2d(in_channels=bottlececk_dim,
                             out_channels=out_ch,
                             kernel_size=(1, 1)), name='weight', dim=0),
               nn.Dropout(p=dropout)
        )
def forward(...):
    ......#padding, residual etc.
    x = self._layers1(x)
    x = F.glu(x)
    x = self._layers2(x)
    .......

How can I reproduce mocha WER in Librispeech?

I want to reproduce the Librispeech MoChA WER reported in this paper (CTC-synchronous Training for Monotonic Attention Model).
When I use the conf file "lstm_mocha.yaml" in the librispeech example, the WER only drops from 700% at the start to 100% at epoch 35, without an LM.
I only changed the conf in run.sh and trained with 1 GPU.
I would like to know whether anything other than the conf in run.sh needs to be modified to reach the reported WER.

How much does the LM affect the WER in the Librispeech recipe?

Hi,
I'm new to neural_sp and I find it is a really good project. The results shown in the Librispeech recipe README are very good. I'm trying to reproduce them with the recipe, and I want to know how much the LM affects the WER. For example, if I only train a model (Transformer + speed perturbation + SpecAugment) without the LM, what is the performance on the test sets?
I also noticed that the Conformer has been implemented in this project. The result reported in the Conformer paper is better than the compared Transformer. Are there any results in neural_sp with the Conformer?
Thank you very much!

How can I use an external LM and LM fusion?

I will use the script here as an example for my question:
https://github.com/hirofumi0810/neural_sp/blob/master/examples/csj/s5/run.sh

I think that if you run that script as is, the LM trained in stage 3 is not used, is it?
So I stopped after stage 3 and restarted training at stage 4, specifying the language model created in stage 3.

I added the following options as arguments to train.py in run.sh:

external_lm=${model}/lm/train_nodev_all_vocaball_wpbpe10000/lstm1024H0P4L_emb1024_adam_lr0.001_bs64_bptt200_tie_residual_glu_dropI0.2H0.5_ls0.1_3/model.epoch-13

CUDA_VISIBLE_DEVICES=${gpu} ${NEURALSP_ROOT}/neural_sp/bin/asr/train.py \
        --corpus csj \
        --config ${conf} \
        --config2 ${conf2} \
        --n_gpus ${n_gpus} \
        --cudnn_benchmark ${benchmark} \
        --train_set ${data}/dataset/${train_set}_${unit}${wp_type}${vocab}.tsv \
        --dev_set ${data}/dataset/${dev_set}_${unit}${wp_type}${vocab}.tsv \
        --eval_sets ${asr_test_sets} \
        --unit ${unit} \
        --dict ${dict} \
        --wp_model ${wp_model}.model \
        --model_save_dir ${model}/asr \
        --asr_init ${asr_init} \
        --external_lm ${external_lm} \
        --lm_fusion 'cold' \
        --stdout ${stdout} \
        --resume ${resume} || exit 1;

Is this the right approach?

How much GPU memory do we need to run librispeech lstm_mocha.yaml?

I tried to run the lstm_mocha.yaml config in the librispeech example.
But I get out-of-memory errors on a 24 GB GPU when I run the script with the reference batch_size (30).
How much GPU memory is needed for librispeech lstm_mocha.yaml?
Or can I get a similar WER with the accum_grad_n_steps option and a lower batch_size in lstm_mocha.yaml?
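
For example, would something like this keep the effective batch size while lowering peak memory? (Just my guess, not a verified recipe.)

batch_size: 15          # half of the reference value of 30
accum_grad_n_steps: 2   # 15 x 2 = 30 utterances per optimizer update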

Error during training librispeech model, in multi-gpu mode

I got the following error message in stage 3 (LM training) while training the librispeech model in the example directory:
AttributeError: 'numpy.ndarray' object has no attribute 'items'
Checking the corresponding code in data_parallel.py, the variable 'input' turns out to be of type ndarray, but the code calls input.items() even though 'input' is not a dict. Is something wrong, and how can I fix it?
This error only happens when I train with multiple GPUs; single-GPU training seems to work without errors.

============================================================================
LM Training stage (stage:3)

0%| | 0/11000320 [00:00<?, ?it/s]Original utterance num: 281241
Removed 0 utterances (threshold)
Original utterance num: 2703
Removed 0 utterances (threshold)
Original utterance num: 2864
Removed 0 utterances (threshold)
Original utterance num: 2620
Removed 0 utterances (threshold)
Original utterance num: 2939
Removed 0 utterances (threshold)
Traceback (most recent call last):
  File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 340, in <module>
    save_path = pr.runcall(main)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 214, in main
    loss, hidden, observation = model(ys_train, state=hidden)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in scatter
    res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
  File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in <listcomp>
    res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
AttributeError: 'numpy.ndarray' object has no attribute 'items'
  0%| | 0/11000320 [00:00<?, ?it/s]
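
A rough sketch of the kind of type-aware split I have in mind (scatter_batch is just a name I made up, and 'xs' is an assumed dict key; this is not the repo's code):

import numpy as np


def scatter_batch(batch, n_devices):
    """Split one training batch into n_devices shards (sketch only)."""
    if isinstance(batch, dict):
        # ASR batch: a dict of per-utterance lists -> slice every entry evenly.
        size = (len(batch['xs']) + n_devices - 1) // n_devices
        return [{k: v[i * size:(i + 1) * size] for k, v in batch.items()}
                for i in range(n_devices)]
    # LM batch: a plain numpy array of token ids -> split along the batch axis.
    return np.array_split(batch, n_devices)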

Error in ASR Training stage: Missing transformer_transducer

Hi,
Thanks for creating such a great repo!
I tried running the SWBD example and ran into an issue at the ASR training stage. It seems a Python module is missing (transformer_transducer.py).

============================================================================
                       ASR Training stage (stage:4)
============================================================================
Traceback (most recent call last):
  File "/home/user/neural_sp/neural_sp/bin/asr/train.py", line 41, in <module>
    from neural_sp.models.seq2seq.speech2text import Speech2Text
  File "/home/user/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 26, in <module>
    from neural_sp.models.seq2seq.decoders.transformer_transducer import TrasformerTransducer
ModuleNotFoundError: No module named 'neural_sp.models.seq2seq.decoders.transformer_transducer'

Could you add the file, or should I just remove any reference to it in speech2text.py?
Thanks!
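
As a stop-gap I am considering guarding the import in speech2text.py instead of deleting every reference (just my idea, not your recommendation); the class name below is kept exactly as it appears in the traceback:

# near the top of neural_sp/models/seq2seq/speech2text.py
try:
    from neural_sp.models.seq2seq.decoders.transformer_transducer import TrasformerTransducer
except ImportError:
    TrasformerTransducer = None  # Transformer-Transducer decoder not available in this checkout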

transformer LM error in ASR

Hi,
I want to use a Transformer LM in ASR, but decoding fails with "TypeError: list indices must be integers or slices, not str" at models/seq2seq/decoders/transformer.py line 814. How can I solve it?

Thank you very much.

What's the intention of mocha_first_layer?

# self-attention
self.layers = nn.ModuleList([copy.deepcopy(TransformerDecoderBlock(
    d_model, d_ff, attn_type, n_heads, dropout, dropout_att, dropout_layer,
    layer_norm_eps, ffn_activation, param_init,
    src_tgt_attention=False if lth < mocha_first_layer - 1 else True,

Hey, I notice there will be mocha_first_layer - 1 Transformer blocks without encoder-decoder attention; what is the intention of that? Also, I copied the neural_sp Transformer decoder into ESPnet: training does not converge if mocha_first_layer is set to 4 (the config from librispeech), but it is much better when I set mocha_first_layer to 0.

Question on masking in transformer encoder

Hello Mr. Hirofumi, thanks for open-sourcing this excellent repository. I have some questions about the masking mechanism in the Transformer encoder:

In the code here

if self.streaming_type == 'reshape':

there is a note that says: # NOTE: no mask to avoid masking all frames in a chunk. If you would be so kind, could you explain briefly why masking all frames in a chunk should be avoided? Isn't it OK if all frames in a chunk are masked? In that case, I think both the masks and the encoder output would still be correct after reshaping from the chunkwise shape back to the normal shape.

Another question: how do you deal with overlapping chunks when streaming_type == 'mask'?

How many gpus do you use for aishell dataset?

Hi, I see that the accum_grad_n_steps argument is set to 8 in transformer.yaml. I want to know how many GPUs you used when training models with this setting. Should I change it to 2 when I use 4 GPUs to reproduce your results?

aishell run.sh error, segmentation fault

run.sh: line 186: 9762 Segmentation fault (core dumped) CUDA_VISIBLE_DEVICES=${gpu} ${NEURALSP_ROOT}/neural_sp/bin/asr/train.py --corpus aishell1 --config ${conf} --config2 ${conf2} --n_gpus ${n_gpus} --cudnn_benchmark ${benchmark} --train_set ${data}/dataset/${train_set}.tsv --dev_set ${data}/dataset/${dev_set}.tsv --eval_sets ${data}/dataset/${test_set}.tsv --unit ${unit} --dict ${dict} --model_save_dir ${model}/asr --asr_init ${asr_init} --external_lm ${external_lm} --stdout ${stdout} --resume ${resume}

convert_to_sgd usage is not consistent with definition?

Here is the usage in asr/train.py and lm/train.py:

optimizer.convert_to_sgd(model, 'sgd', args.lr, conf['weight_decay'],
                                     decay_type='always', decay_rate=0.5)

while the function definition is:

    def convert_to_sgd(self, model, lr, weight_decay, decay_type, decay_rate):
        self.decay_type = decay_type
        self.decay_rate = decay_rate

        weight_decay = self.optimizer.weight_decay
        self.optimizer = set_optimizer(model, 'sgd', lr, weight_decay)
        logger.info('========== Convert to SGD ==========')

Isn't this a bug?
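
If I read the definition correctly, the string 'sgd' is passed where lr is expected, so lr receives 'sgd' and weight_decay receives args.lr. A call matching the definition above would look like this (unless you prefer to change the signature instead):

optimizer.convert_to_sgd(model, args.lr, conf['weight_decay'],
                         decay_type='always', decay_rate=0.5)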

conv_conformer error

Thank you for your contribution. I want to test the Conformer, but I get this error:
train.py: error: argument --enc_type: invalid choice: 'conv_conformer' (choose from 'blstm', 'lstm', 'bgru', 'gru', 'conv_blstm', 'conv_lstm', 'conv_bgru', 'conv_gru', 'transformer', 'conv_transformer', 'conv', 'tds', 'gated_conv')

bug: AttributeError: 'TransformerEncoder' object has no attribute 'chunk_size_left'

Hi,
when I train a model with the streaming Transformer (unit: word) and decode the result using streaming_score.sh, I get an error:

Original utterance num: 2000
Removed 0 empty utterances
  0%|                                                                                                                         | 0/2000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/../../../neural_sp/bin/asr/eval.py", line 251, in <module>
    main()
  File "/../../../neural_sp/bin/asr/eval.py", line 182, in main
    progressbar=True)
  File "/neural_sp/evaluators/word.py", line 73, in eval_word
    exclude_eos=True)
  File "/neural_sp/models/seq2seq/speech2text.py", line 449, in decode_streaming
    N_l = self.enc.chunk_size_left
  File "/tools/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'TransformerEncoder' object has no attribute 'chunk_size_left'
  0%|  

Yours,
Thanks a lot.
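
As a temporary workaround I am thinking of a guarded lookup in decode_streaming() (just my guess, not your intended fix):

# sketch: fall back to 0 when the encoder does not expose chunk_size_left
N_l = getattr(self.enc, 'chunk_size_left', 0)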

TransformerDecoder object has no attribute 'ctc'?

Hi,
I trained a word-char mixed model. Then I wanted to use /neural_sp/bin/asr/eval.py for decoding. When I set recog_ctc_weight = 0.6, I got the error "TransformerDecoder object has no attribute 'ctc'". How can I solve it? If I set recog_ctc_weight = 0.0, decoding succeeds. The following are my decoding parameters:
--recog_metric edit_distance
--recog_batch_size 1
--recog_ctc_weight 0.6
--recog_beam_width 20
--recog_n_average 5
--recog_lm_weight 0.3

Thank you.

Errors when running the make KALDI=/path/to/kaldi command

I get the following make error related to Bazel:

command -v bazel > /dev/null || echo "SentencePiece requires Bazel, see https://bazel.build/"
cd /home/rohan_doshi96/neural_sp/tools/sentencepiece && bazel build src:all --incompatible_disable_deprecated_attr_
params=false
Extracting Bazel installation...
ERROR: The 'build' command is only supported from within a workspace (below a directory having a WORKSPACE file).
See documentation at https://docs.bazel.build/versions/master/build-ref.html#workspace
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel
shutdown".
Makefile:92: recipe for target 'sentencepiece.done' failed
make: *** [sentencepiece.done] Error 2
