shijie-wu / neural-transducer

This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.
License: MIT License
Hi, thank you for making the code open-source!

I am trying to train a g2p-based model with beam decoding. Unfortunately, I am getting the following error; please refer to the logs below for complete details. FYI, the code works fine with greedy decoding. Kindly advise.
(base) [aagarwal@ip-0A000427 neural-transducer]$ python src/train.py --train data/100hrs-youtube.train --dev data/100hrs-youtube.dev --test data/100hrs-youtube.test --epochs 100 --dataset g2p --arch transformer --model models/v2-beam-search-decoding/v2 --decode beam
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: seed - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: train - ['data/100hrs-youtube.train']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dev - ['data/100hrs-youtube.dev']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: test - ['data/100hrs-youtube.test']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: model - 'models/v2-beam-search-decoding/v2'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: load - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bs - 20
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: epochs - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_steps - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: warmup_steps - 4000
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: total_eval - -1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: optimizer - <Optimizer.adam: 'adam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: scheduler - <Scheduler.reducewhenstuck: 'reducewhenstuck'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: lr - 0.001
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: min_lr - 1e-05
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: momentum - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta1 - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta2 - 0.999
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: estop - 1e-08
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cooldown - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: patience - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: discount_factor - 0.5
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_norm - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: gpuid - []
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: loglevel - 'info'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: saveall - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: shuffle - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cleanup_anyway - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dataset - <Data.g2p: 'g2p'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_seq_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_decode_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: init - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dropout - 0.2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: embed_dim - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_heads - 4
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: label_smooth - 0.0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: tie_trg_embed - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: arch - <Arch.transformer: 'transformer'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_sample - 2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: wid_siz - 11
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: indtag - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: decode - <Decode.beam: 'beam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: mono - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bestacc - False
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab size 45
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab size 44
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', '"b', '"g', '"h', '"i', '"j', '"k', '"m', '"n', '"s', '"z', "'", 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ß', 'ä', 'ö', 'ü']
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', "'", ',"', '-', '.', '\\', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '¨', 'ß', 'ä', 'ç', 'è', 'é', 'ö', 'ü', 'ș']
INFO - 10/18/20 14:31:37 - 0:00:00 - model: Transformer(
(src_embed): Embedding(45, 100, padding_idx=0)
(trg_embed): Embedding(44, 100, padding_idx=0)
(position_embed): SinusoidalPositionalEmbedding()
(encoder): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(linear1): Linear(in_features=100, out_features=200, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
(linear2): Linear(in_features=200, out_features=100, bias=True)
(norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(activation_dropout): Dropout(p=0.2, inplace=False)
)
)
(norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(linear1): Linear(in_features=100, out_features=200, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
(linear2): Linear(in_features=200, out_features=100, bias=True)
(norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(activation_dropout): Dropout(p=0.2, inplace=False)
)
)
(norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
)
(final_out): Linear(in_features=100, out_features=44, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
)
INFO - 10/18/20 14:31:37 - 0:00:00 - number of parameter 216544
INFO - 10/18/20 14:31:37 - 0:00:00 - maximum training 269700 steps (100 epochs)
INFO - 10/18/20 14:31:37 - 0:00:00 - evaluate every 1 epochs
INFO - 10/18/20 14:31:37 - 0:00:00 - At 0-th epoch with lr 0.001000.
100%|| 2697/2697 [01:10<00:00, 38.40it/s]
INFO - 10/18/20 14:32:47 - 0:01:11 - Running average train loss is 1.5452647511058266 at epoch 0
INFO - 10/18/20 14:32:47 - 0:01:11 - At 1-th epoch with lr 0.001000.
100%|| 2697/2697 [01:06<00:00, 40.65it/s]
INFO - 10/18/20 14:33:54 - 0:02:17 - Running average train loss is 1.218658867061779 at epoch 1
100%|| 338/338 [00:02<00:00, 128.70it/s]
INFO - 10/18/20 14:33:56 - 0:02:19 - Average dev loss is 0.9772854196073035 at epoch 1
0%|| 0/6741 [00:00<?, ?it/s]
Exception ignored in: <generator object StandardG2P.read_file at 0x2af3a8d8b3d0>
RuntimeError: generator ignored GeneratorExit
Traceback (most recent call last):
File "src/train.py", line 350, in <module>
main()
File "src/train.py", line 346, in main
trainer.run(start_epoch, decode_fn=decode_fn)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/trainer.py", line 373, in run
eval_res = self.evaluate(DEV, epoch_idx, decode_fn)
File "src/train.py", line 255, in evaluate
decode_fn)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/util.py", line 194, in evaluate_all
pred, _ = decode_fn(model, src)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 64, in __call__
trg_eos=self.trg_eos)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 364, in decode_beam_search
enc_hs = transducer.encode(src_sentence)
TypeError: encode() missing 1 required positional argument: 'src_mask'
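For what it's worth, the traceback shows `decode_beam_search` calling `transducer.encode(src_sentence)` with only the source tensor, while `encode()` also wants a `src_mask`. A minimal sketch of the idea, under the assumption (not confirmed against this repo's API) that `<PAD>` is index 0, matching the vocab dump in the log above, and that the mask marks padded positions:

```python
PAD_IDX = 0  # assumption: '<PAD>' is index 0, as in the src vocab printed above

def make_src_mask(src_ids):
    """Boolean mask over source positions: True where the token is padding.

    This mirrors what a `src_mask` argument to encode() typically carries;
    the exact shape/dtype expected by this repo's Transformer.encode is an
    assumption here.
    """
    return [tok == PAD_IDX for tok in src_ids]

# Hypothetical call-site fix in decode_beam_search (decoding.py):
#     src_mask = make_src_mask(src_sentence)
#     enc_hs = transducer.encode(src_sentence, src_mask)
```

Comparing against how the greedy decoding path calls `encode()` should confirm the exact mask construction to reuse.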
Could you provide the dataset you used in the g2p experiments? I am wondering how you split the dictionary into training, dev, and test sets; that would help me compare performance across different models.
I am trying to run the augmentation method from SIGMORPHON's 2021 shared task, but I get the following error even though I have align installed.

How can I fix the problem?
Hi! I've been playing around with the repo; the code is very nicely organized. But I have a question: I've run the code in example/irregularity-acl19 exactly as shown in the README, and I'm confused by the numbers I'm getting. I ran it on the English UniMorph data following the README, and also on German UniMorph, as I'm working with German right now.

According to the README, the output (i.e. in model/unimorph/large/monotag-hmm/{lang}-{fold}.decode.{split}.tsv) contains p(inflected form|lemma, tags) / len(inflected form). I assume this is the loss column in the TSVs, as that's the only column that makes sense.

Here's the distribution of values I get in that column. N is the number of predicted forms overall, across all folds and dev/test splits, and N(p > 1) is the number of predicted forms where the listed value for p(inflected form|lemma, tags) / len(inflected form) is greater than 1. The results are split by whether the model correctly predicted the target form.
Lang | Prediction correct? | N | N(p > 1) | mean(p) | min(p) | max(p)
---|---|---|---|---|---|---
ENG | Yes | 95861 | 0 | 0.0023 | 1e-7 | 0.2833
ENG | No | 5437 | 1524 | 0.9874 | 5e-2 | 22.3135
DEU | Yes | 318311 | 0 | 0.0027 | 1e-7 | 0.2891
DEU | No | 28423 | 7770 | 0.8299 | 5e-2 | 23.2277
The main thing that confuses me is that the model systematically assigns higher probabilities to forms it gets wrong. (It also looks like there might be a bug somewhere if roughly 28% of the incorrectly predicted forms in each language are assigned a probability greater than one.)
Going by the paper, the degree-of-irregularity metric i should be calculated as -log(p / (1 - p)). Applying that to the results above, the average i for words the model got right is 9.6 (ENG) and 11.0 (DEU), while for words it predicted wrong (excluding forms with p > 1, where i is undefined), the average i is 0.7 (ENG, DEU).
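For reference, the metric from that sentence can be written as a small helper; the function name and the natural-log base are my assumptions here, as the paper may use a different base:

```python
import math

def irregularity(p):
    """Degree of irregularity: i = -log(p / (1 - p)).

    Only defined for 0 < p < 1, which is why forms with a listed
    probability greater than one have to be excluded.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(p / (1.0 - p))

# i decreases as p grows: a high-probability (regular) form gets a low i,
# which is why high i on correctly predicted forms looks suspicious.
```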
This seems completely at odds with the analysis described in the paper. I'm wondering if I've misunderstood something or run the example wrong. Any ideas what's going on here?
Hi,

I want to analyse the actual predictions of a trained model (i.e., the word forms themselves), so I need to save them to a file somehow. Is there an easy way to do that? Sorry if it's something obvious, but I can't find it at a glance.
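In case it helps while waiting for an answer: one straightforward approach, sketched under my own assumptions about the data structure rather than anything this repo provides, is to collect (source, prediction) pairs during evaluation and dump them to a TSV yourself:

```python
import csv

def save_predictions(pairs, path):
    """Write (source, predicted_form) pairs to a TSV file.

    `pairs` is any iterable of 2-tuples of strings; the column names
    are arbitrary choices for this sketch, not the repo's convention.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["source", "prediction"])
        writer.writerows(pairs)
```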
papars -> papers
Hi Shijie,
I got the following error:
C:\research\neural-transducer-master>conda env create --file environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- zlib==1.2.11=h516909a_1010
- mkl_random==1.2.1=py39ha9443f7_2
- mkl-service==2.3.0=py39h27cfd23_1
- virtualenv==20.4.4=py39hf3d152e_0
- ld_impl_linux-64==2.35.1=hea4e1c9_2
- cffi==1.14.5=py39he32792d_0
- libstdcxx-ng==9.3.0=h6de172a_19
- readline==8.1=h46c0cb4_0
- _libgcc_mutex==0.1=conda_forge
- libgcc-ng==9.3.0=h2828fa1_19
- libuv==1.41.0=h7f98852_0
- tbb==2021.2.0=h4bd325d_0
- python==3.9.2=hffdb5ce_0_cpython
- pre-commit==2.12.1=py39hf3d152e_0
- pytorch==1.8.1=py3.9_cuda11.1_cudnn8.0.5_0
- ncurses==6.2=h58526e2_4
- mkl==2021.2.0=h726a3e6_389
- ninja==1.10.2=h4bd325d_0
- _openmp_mutex==4.5=1_llvm
- ca-certificates==2020.12.5=ha878542_0
- numpy-base==1.20.1=py39h7d8b39e_0
- pyyaml==5.4.1=py39h3811e60_0
- cudatoolkit==11.1.1=h6406543_8
- mkl_fft==1.3.0=py39h42c9631_2
- yaml==0.2.5=h516909a_0
- llvm-openmp==11.1.0=h4bd325d_1
- sqlite==3.35.5=h74cdb3f_0
- libffi==3.3=h58526e2_2
- editdistance-s==1.0.0=py39h1a9c180_1
- tk==8.6.10=h21135ba_1
- jedi==0.18.0=py39hf3d152e_2
- xz==5.2.5=h516909a_1
- certifi==2020.12.5=py39hf3d152e_1
- numpy==1.20.1=py39h93e21f0_0
- setuptools==49.6.0=py39hf3d152e_3
- ipython==7.22.0=py39hef51801_0
- openssl==1.1.1k=h7f98852_0
It would be really helpful if you could give me some hints on how to solve this issue. :)
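One common workaround, sketched under the assumption that the failure comes from Linux-specific build strings in environment.yml being resolved on Windows (the `C:\research` path suggests so): strip the build string, the part after the second `=`, from each pinned package and retry `conda env create`. The helper below is a hypothetical convenience, not part of this repo:

```python
import re

def strip_build_strings(yml_text):
    """Turn 'zlib==1.2.11=h516909a_1010' into 'zlib==1.2.11' so conda can
    resolve the environment on a platform other than the one it was
    exported from. Assumes the '==version=build' pin format seen above."""
    return re.sub(r"(==[\w.!+]+)=[\w.]+", r"\1", yml_text)
```

Alternatively, re-exporting the environment on the original machine with `conda env export --no-builds` avoids the problem at the source.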