shijie-wu / neural-transducer

This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.
License: MIT License
Hi, thank you for making the code open-source!

I am trying to train a g2p-based model with beam decoding. Unfortunately, I am getting the following error; please refer to the logs below for complete details. FYI, the code works fine with greedy decoding. Kindly advise.
(base) [aagarwal@ip-0A000427 neural-transducer]$ python src/train.py --train data/100hrs-youtube.train --dev data/100hrs-youtube.dev --test data/100hrs-youtube.test --epochs 100 --dataset g2p --arch transformer --model models/v2-beam-search-decoding/v2 --decode beam
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: seed - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: train - ['data/100hrs-youtube.train']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dev - ['data/100hrs-youtube.dev']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: test - ['data/100hrs-youtube.test']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: model - 'models/v2-beam-search-decoding/v2'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: load - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bs - 20
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: epochs - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_steps - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: warmup_steps - 4000
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: total_eval - -1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: optimizer - <Optimizer.adam: 'adam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: scheduler - <Scheduler.reducewhenstuck: 'reducewhenstuck'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: lr - 0.001
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: min_lr - 1e-05
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: momentum - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta1 - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta2 - 0.999
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: estop - 1e-08
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cooldown - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: patience - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: discount_factor - 0.5
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_norm - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: gpuid - []
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: loglevel - 'info'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: saveall - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: shuffle - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cleanup_anyway - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dataset - <Data.g2p: 'g2p'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_seq_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_decode_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: init - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dropout - 0.2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: embed_dim - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_heads - 4
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: label_smooth - 0.0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: tie_trg_embed - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: arch - <Arch.transformer: 'transformer'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_sample - 2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: wid_siz - 11
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: indtag - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: decode - <Decode.beam: 'beam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: mono - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bestacc - False
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab size 45
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab size 44
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', '"b', '"g', '"h', '"i', '"j', '"k', '"m', '"n', '"s', '"z', "'", 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ß', 'ä', 'ö', 'ü']
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', "'", ',"', '-', '.', '\\', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '¨', 'ß', 'ä', 'ç', 'è', 'é', 'ö', 'ü', 'ș']
INFO - 10/18/20 14:31:37 - 0:00:00 - model: Transformer(
(src_embed): Embedding(45, 100, padding_idx=0)
(trg_embed): Embedding(44, 100, padding_idx=0)
(position_embed): SinusoidalPositionalEmbedding()
(encoder): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(linear1): Linear(in_features=100, out_features=200, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
(linear2): Linear(in_features=200, out_features=100, bias=True)
(norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(activation_dropout): Dropout(p=0.2, inplace=False)
)
)
(norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
)
(linear1): Linear(in_features=100, out_features=200, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
(linear2): Linear(in_features=200, out_features=100, bias=True)
(norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
(activation_dropout): Dropout(p=0.2, inplace=False)
)
)
(norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
)
(final_out): Linear(in_features=100, out_features=44, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
)
INFO - 10/18/20 14:31:37 - 0:00:00 - number of parameter 216544
INFO - 10/18/20 14:31:37 - 0:00:00 - maximum training 269700 steps (100 epochs)
INFO - 10/18/20 14:31:37 - 0:00:00 - evaluate every 1 epochs
INFO - 10/18/20 14:31:37 - 0:00:00 - At 0-th epoch with lr 0.001000.
100%|| 2697/2697 [01:10<00:00, 38.40it/s]
INFO - 10/18/20 14:32:47 - 0:01:11 - Running average train loss is 1.5452647511058266 at epoch 0
INFO - 10/18/20 14:32:47 - 0:01:11 - At 1-th epoch with lr 0.001000.
100%|| 2697/2697 [01:06<00:00, 40.65it/s]
INFO - 10/18/20 14:33:54 - 0:02:17 - Running average train loss is 1.218658867061779 at epoch 1
100%|| 338/338 [00:02<00:00, 128.70it/s]
INFO - 10/18/20 14:33:56 - 0:02:19 - Average dev loss is 0.9772854196073035 at epoch 1
0%|| 0/6741 [00:00<?, ?it/s]
Exception ignored in: <generator object StandardG2P.read_file at 0x2af3a8d8b3d0>
RuntimeError: generator ignored GeneratorExit
Traceback (most recent call last):
File "src/train.py", line 350, in <module>
main()
File "src/train.py", line 346, in main
trainer.run(start_epoch, decode_fn=decode_fn)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/trainer.py", line 373, in run
eval_res = self.evaluate(DEV, epoch_idx, decode_fn)
File "src/train.py", line 255, in evaluate
decode_fn)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/util.py", line 194, in evaluate_all
pred, _ = decode_fn(model, src)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 64, in __call__
trg_eos=self.trg_eos)
File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 364, in decode_beam_search
enc_hs = transducer.encode(src_sentence)
TypeError: encode() missing 1 required positional argument: 'src_mask'
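For what it's worth, the traceback shows `decode_beam_search` calling `transducer.encode(src_sentence)` with only the source tensor, while `encode()` also wants a `src_mask`. A minimal sketch of the idea, under the assumption (not confirmed against this repo's API) that `<PAD>` is index 0, matching the vocab dump in the log above, and that the mask marks padded positions:

```python
PAD_IDX = 0  # assumption: '<PAD>' is index 0, as in the src vocab printed above

def make_src_mask(src_ids):
    """Boolean mask over source positions: True where the token is padding.

    This mirrors what a `src_mask` argument to encode() typically carries;
    the exact shape/dtype expected by this repo's Transformer.encode is an
    assumption here.
    """
    return [tok == PAD_IDX for tok in src_ids]

# Hypothetical call-site fix in decode_beam_search (decoding.py):
#     src_mask = make_src_mask(src_sentence)
#     enc_hs = transducer.encode(src_sentence, src_mask)
```

Comparing against how the greedy decoding path calls `encode()` should confirm the exact mask construction to reuse.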
Could you provide the dataset you used in the g2p experiments? I am wondering how you split the dictionary into training, dev, and test sets; that would help me compare performance across different models.
I am trying to run the augmentation method from SIGMORPHON's 2021 shared task, but I get the following error even though I have align installed.

How can I fix the problem?
Hi! I've been playing around with the repo; the code is very nicely organized. But I have a question: I've run the code in example/irregularity-acl19 exactly as shown in the README, and I'm confused by the numbers I'm getting. I ran it on the English UniMorph data following the README, and also on German UniMorph, as I'm working with German right now.

According to the README, the output (i.e. in model/unimorph/large/monotag-hmm/{lang}-{fold}.decode.{split}.tsv) contains p(inflected form|lemma, tags) / len(inflected form). I assume this is the loss column in the TSVs, as that's the only column that makes sense.

Here's the distribution of values I get in that column. N is the number of predicted forms overall, across all folds and dev/test splits, and N(p > 1) is the number of predicted forms where the listed value for p(inflected form|lemma, tags) / len(inflected form) is greater than 1. The results are split by whether the model correctly predicted the target form.
Lang | Prediction correct? | N | N(p > 1) | mean(p) | min(p) | max(p)
---|---|---|---|---|---|---
ENG | Yes | 95861 | 0 | 0.0023 | 1e-7 | 0.2833
ENG | No | 5437 | 1524 | 0.9874 | 5e-2 | 22.3135
DEU | Yes | 318311 | 0 | 0.0027 | 1e-7 | 0.2891
DEU | No | 28423 | 7770 | 0.8299 | 5e-2 | 23.2277
The main thing that confuses me is that the model systematically assigns higher probabilities to forms it gets wrong. (It also looks like there might be a bug somewhere if roughly 28% of the incorrectly predicted forms in each language are assigned a probability greater than one.)
Going by the paper, the degree-of-irregularity metric i should be calculated as -log(p / (1 - p)). Applying that to the results above, the average i for words the model got right is 9.6 (ENG) and 11.0 (DEU), while for words it predicted wrong (excluding forms with p > 1, where i is undefined), the average i is 0.7 (ENG, DEU).
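For reference, the metric from that sentence can be written as a small helper; the function name and the natural-log base are my assumptions here, as the paper may use a different base:

```python
import math

def irregularity(p):
    """Degree of irregularity: i = -log(p / (1 - p)).

    Only defined for 0 < p < 1, which is why forms with a listed
    probability greater than one have to be excluded.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(p / (1.0 - p))

# i decreases as p grows: a high-probability (regular) form gets a low i,
# which is why high i on correctly predicted forms looks suspicious.
```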
This seems completely at odds with the analysis described in the paper. I'm wondering if I've misunderstood something or run the example wrong. Any ideas what's going on here?
Hi,

I want to analyse the actual predictions of a trained model (i.e., the word forms themselves), so I need to save them to a file somehow. Is there an easy way to do that? Sorry if it's something obvious, but I can't find it at a glance.
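In case it helps while waiting for an answer: one straightforward approach, sketched under my own assumptions about the data structure rather than anything this repo provides, is to collect (source, prediction) pairs during evaluation and dump them to a TSV yourself:

```python
import csv

def save_predictions(pairs, path):
    """Write (source, predicted_form) pairs to a TSV file.

    `pairs` is any iterable of 2-tuples of strings; the column names
    are arbitrary choices for this sketch, not the repo's convention.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["source", "prediction"])
        writer.writerows(pairs)
```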
papars -> papers
Hi Shijie,
I got the following error:
C:\research\neural-transducer-master>conda env create --file environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- zlib==1.2.11=h516909a_1010
- mkl_random==1.2.1=py39ha9443f7_2
- mkl-service==2.3.0=py39h27cfd23_1
- virtualenv==20.4.4=py39hf3d152e_0
- ld_impl_linux-64==2.35.1=hea4e1c9_2
- cffi==1.14.5=py39he32792d_0
- libstdcxx-ng==9.3.0=h6de172a_19
- readline==8.1=h46c0cb4_0
- _libgcc_mutex==0.1=conda_forge
- libgcc-ng==9.3.0=h2828fa1_19
- libuv==1.41.0=h7f98852_0
- tbb==2021.2.0=h4bd325d_0
- python==3.9.2=hffdb5ce_0_cpython
- pre-commit==2.12.1=py39hf3d152e_0
- pytorch==1.8.1=py3.9_cuda11.1_cudnn8.0.5_0
- ncurses==6.2=h58526e2_4
- mkl==2021.2.0=h726a3e6_389
- ninja==1.10.2=h4bd325d_0
- _openmp_mutex==4.5=1_llvm
- ca-certificates==2020.12.5=ha878542_0
- numpy-base==1.20.1=py39h7d8b39e_0
- pyyaml==5.4.1=py39h3811e60_0
- cudatoolkit==11.1.1=h6406543_8
- mkl_fft==1.3.0=py39h42c9631_2
- yaml==0.2.5=h516909a_0
- llvm-openmp==11.1.0=h4bd325d_1
- sqlite==3.35.5=h74cdb3f_0
- libffi==3.3=h58526e2_2
- editdistance-s==1.0.0=py39h1a9c180_1
- tk==8.6.10=h21135ba_1
- jedi==0.18.0=py39hf3d152e_2
- xz==5.2.5=h516909a_1
- certifi==2020.12.5=py39hf3d152e_1
- numpy==1.20.1=py39h93e21f0_0
- setuptools==49.6.0=py39hf3d152e_3
- ipython==7.22.0=py39hef51801_0
- openssl==1.1.1k=h7f98852_0
It would be really helpful if you could give me some hints on how to solve this issue. :)
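One common workaround, sketched under the assumption that the failure comes from Linux-specific build strings in environment.yml being resolved on Windows (the `C:\research` path suggests so): strip the build string, the part after the second `=`, from each pinned package and retry `conda env create`. The helper below is a hypothetical convenience, not part of this repo:

```python
import re

def strip_build_strings(yml_text):
    """Turn 'zlib==1.2.11=h516909a_1010' into 'zlib==1.2.11' so conda can
    resolve the environment on a platform other than the one it was
    exported from. Assumes the '==version=build' pin format seen above."""
    return re.sub(r"(==[\w.!+]+)=[\w.]+", r"\1", yml_text)
```

Alternatively, re-exporting the environment on the original machine with `conda env export --no-builds` avoids the problem at the source.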