kolloldas / torchnlp
Easy to use NLP library built on PyTorch and TorchText
License: Apache License 2.0
This is the result I get after following the installation and run instructions:
>>> train('ner-conll2003', TransformerTagger, conll2003)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "torchnlp/tasks/sequence_tagging/main.py", line 46, in train
dataset = dataset_fn()
File "torchnlp/data/conll.py", line 67, in conll2003_dataset
fields=tuple(fields))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/dataset.py", line 78, in splits
os.path.join(path, train), **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torchtext/datasets/sequence_tagging.py", line 33, in __init__
examples.append(data.Example.fromlist(columns, fields))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/example.py", line 50, in fromlist
setattr(ex, n, f.preprocess(val))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/field.py", line 181, in preprocess
x = Pipeline(six.text_type.lower)(x)
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/pipeline.py", line 37, in __call__
x = pipe.call(x, *args)
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/pipeline.py", line 52, in call
return [self.convert_token(tok, *args) for tok in x]
TypeError: descriptor 'lower' requires a 'unicode' object but received a 'str'
ENV:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
--------------------------------
torch.__version__: 0.4.1
torchtext.__version__: 0.3.1
--------------------------------
python: 2.7.12
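The traceback comes from Python 2's str/unicode split: under Python 2, six.text_type is unicode, and the unbound method unicode.lower refuses a byte str, so dataset lines read as bytes crash the preprocessing pipeline. A minimal sketch of the underlying idea in Python 3 terms (bytes must be decoded before case folding):

```python
# Under Python 2, six.text_type.lower is unicode.lower and rejects a byte
# str. The Python 3 analogue: bytes carry no Unicode case information, so
# they must be decoded to str before lowercasing.
raw = b"CoNLL-2003"          # what a file read in binary/Py2 default mode yields
text = raw.decode("utf-8")   # decode bytes -> unicode text first
lowered = text.lower()
print(lowered)               # conll-2003
```

Running under Python 3 (where str is always unicode) sidesteps the error entirely.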
Can you please update it for PyTorch 0.3.0? PyTorch 0.2.0 is long gone and runs very slowly on many newer GPUs.
Would it be possible to use only the Transformer's encoder part to train word accentuation for the Lithuanian language? In Lithuanian, stressing is somewhat tricky, as it can vary depending on context along with word meaning (e.g. grammatical case).
You've mentioned in your post using only the encoder part for one-to-one mapping. In the case of Lithuanian accentuation, there are three types of accent, and the position of the accent within the word varies a lot. There can also be no accent at all.
Any suggestions?
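One way to frame this is per-token classification: an encoder-only model emits one label per input token (or character), with a label set covering the three accent types plus a "no accent" class. A hedged sketch using PyTorch's built-in TransformerEncoder (module names and hyperparameters here are illustrative, not from this repo):

```python
import torch
import torch.nn as nn

class AccentTagger(nn.Module):
    """Encoder-only tagger: one accent label per input position."""
    def __init__(self, vocab_size, num_labels, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # e.g. num_labels = 3 accent types + 1 "no accent" class
        self.classify = nn.Linear(d_model, num_labels)

    def forward(self, tokens):        # tokens: (batch, seq_len) int ids
        x = self.embed(tokens)
        x = self.encoder(x)           # no causal mask: full bidirectional context
        return self.classify(x)       # (batch, seq_len, num_labels)

model = AccentTagger(vocab_size=100, num_labels=4)
logits = model(torch.randint(0, 100, (2, 7)))
print(logits.shape)                   # torch.Size([2, 7, 4])
```

Training would then be a cross-entropy loss over the per-position logits, exactly as in other sequence-tagging tasks; since accent position within the word matters, character-level inputs may be the more natural granularity.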
Hello, hope you are well, and thank you so much for writing this awesome resource!
I have a question about the training procedure. When I ran it at the command prompt in Ubuntu, the training process was abruptly "killed" (image attached).
This did not look like early stopping, and training had only undergone n = 3 epochs.
I tried to restart the training procedure, and it stopped at the same point.
Do you know what is going on? Thank you in advance!
Hello kolloldas, when installing the project I came across this problem:
(py35) bash-3.2$ python setup.py
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: no commands supplied
setup.py needs a command; the usual one is:
python setup.py install
In the paper "Attention Is All You Need", the kernel size of the convolutions is set to 1, but in this implementation the value is 3. What is the reason for this choice?
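For context: with kernel size 1, the convolution is exactly a position-wise linear layer (the paper's feed-forward network), while kernel size 3 also mixes each position with its two neighbours. A small sketch demonstrating the equivalence of Conv1d with kernel_size=1 and nn.Linear (shapes chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, seq = 8, 16, 5
x = torch.randn(1, seq, d_in)

# kernel_size=1: each position is transformed independently (position-wise FFN)
conv = nn.Conv1d(d_in, d_out, kernel_size=1)
linear = nn.Linear(d_in, d_out)
linear.weight.data = conv.weight.data.squeeze(-1)  # (d_out, d_in, 1) -> (d_out, d_in)
linear.bias.data = conv.bias.data

y_conv = conv(x.transpose(1, 2)).transpose(1, 2)   # Conv1d expects (B, C, L)
y_lin = linear(x)
print(torch.allclose(y_conv, y_lin, atol=1e-6))    # True

# kernel_size=3 with padding=1: same output length, but each output position
# also sees its left and right neighbour
conv3 = nn.Conv1d(d_in, d_out, kernel_size=3, padding=1)
print(conv3(x.transpose(1, 2)).shape)              # torch.Size([1, 16, 5])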
Hi @kolloldas, great job with the Transformer. I was using your model to run a few basic experiments on sequence labeling, and after completing chunking and NER I wanted to move on to POS tagging. From what I understand, I'll need to create a new pos.py file in torchnlp/data/?
Could you give me a heads-up on whether I'm on the right track, or is there an easier workaround?
Thanks! :D
How can I modify the Transformer for time series analysis? In this case, would you also need to use masked attention heads?
Hi~ Could you report your best F1 score on the CoNLL-2003 NER task?
Thank you !
Normalization seems different from the paper "Attention Is All You Need": in the paper, the normalization layer comes after the multi-head attention and feed-forward layers, while in torchnlp it comes before them. I can't find a code piece doing the paper's ordering.
x = inputs
# Layer Normalization
x_norm = self.layer_norm_mha(x)
# Multi-head attention
y = self.multi_head_attention(x_norm, x_norm, x_norm)
# Dropout and residual
x = self.dropout(x + y)
# Layer Normalization
x_norm = self.layer_norm_ffn(x)
# Positionwise Feedforward
y = self.positionwise_feed_forward(x_norm)
# Dropout and residual
y = self.dropout(x + y)
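The snippet above is the "pre-norm" arrangement (normalize before each sublayer); the paper describes "post-norm" (normalize after the residual add). A minimal sketch contrasting the two orderings, using a Linear as a stand-in for the attention or feed-forward sublayer (the stand-in is illustrative, not the repo's module):

```python
import torch
import torch.nn as nn

d_model = 8
norm = nn.LayerNorm(d_model)
sublayer = nn.Linear(d_model, d_model)   # stand-in for MHA or the FFN
dropout = nn.Dropout(p=0.0)
x = torch.randn(2, 5, d_model)

# Post-norm, as in "Attention Is All You Need": LayerNorm(x + Sublayer(x))
post = norm(x + dropout(sublayer(x)))

# Pre-norm, as in the torchnlp code quoted above: x + Sublayer(LayerNorm(x))
pre = x + dropout(sublayer(norm(x)))

print(post.shape, pre.shape)             # both torch.Size([2, 5, 8])
```

Both orderings produce tensors of the same shape; they differ in where the normalization sits relative to the residual connection.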
I have installed torchtext and am using PyTorch 0.4 and Python 3.6.
I am getting the following error while testing one of my files:
File "conll.py", line 7, in <module>
from torchtext.datasets import SequenceTaggingDataset, CoNLL2000Chunking
ImportError: cannot import name 'CoNLL2000Chunking'
Any clues?
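An ImportError like this usually means the installed torchtext simply doesn't export that name (the repo's requirements.txt pins a fork of torchtext from github.com/kolloldas/text rather than the stock package). A small helper for checking what the installed module actually exposes before importing (the helper name is my own, not from the repo):

```python
import importlib

def has_attr(module_name, attr):
    """Check whether `attr` is exported by `module_name` without crashing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# e.g. verify the installed torchtext actually ships the dataset class:
#   has_attr("torchtext.datasets", "CoNLL2000Chunking")
# stdlib sanity check: json.loads exists, json.frobnicate does not
print(has_attr("json", "loads"), has_attr("json", "frobnicate"))
```

If the check comes back False, reinstalling from the repo's pinned requirements.txt (rather than a plain pip install of torchtext) is the thing to try.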
Attempting to install torchnlp on 64-bit Windows 10 with Python 3.8.5 and torch 1.6.0 (CPU-only version; no CUDA). I cloned the repository to a non-system directory off the root of my C: drive and got the following when trying to run the installer:
$ python -m pip install -r requirements.txt
Collecting git+git://github.com/kolloldas/text.git (from -r requirements.txt (line 4))
Cloning git://github.com/kolloldas/text.git to c:\users\xxxxxxx\appdata\local\temp\pip-req-build-pufzml50
ERROR: Command errored out with exit status 128: git clone -q git://github.com/kolloldas/text.git 'C:\Users\xxxxxxx\AppData\Local\Temp\pip-req-build-pufzml50' Check the logs for full command output.
Hi, even when I try changing the hyperparameters like so:
from torchnlp.ner import *
h2 = hparams_transformer_ner()
h2.update(batch_size=10)
train('ner-conll2003-nocrf', TransformerTagger, conll2003, hparams=h2)
batch.batch_size is still 100 (line 167 of train.py; I added the print statement):
for batch in prog_iter:
print(batch.batch_size)
Edit: I can see where the batch size is set to the default of 100: line 41 of torchnlp/ner.py.
conll2003 = partial(conll2003_dataset, 'ner', hparams_tagging_base().batch_size,
root=PREFS.data_root,
train_file=PREFS.data_train,
validation_file=PREFS.data_validation,
test_file=PREFS.data_test)
However, I'm not sure where it's supposed to be updated to a custom value.
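Since conll2003 is built with functools.partial, the batch size is baked in as a positional argument at import time, so updating the hparams dict alone doesn't reach it. A hedged sketch of the pattern with a dummy stand-in for conll2003_dataset (the function and its signature here are illustrative, not the repo's):

```python
from functools import partial

# Stand-in for conll2003_dataset: in torchnlp/ner.py the second positional
# argument is the batch size frozen into the partial at import time.
def dataset_fn(task, batch_size, root="./data"):
    return {"task": task, "batch_size": batch_size, "root": root}

default = partial(dataset_fn, "ner", 100)   # mirrors the line-41 pattern
custom = partial(dataset_fn, "ner", 10)     # rebuild the partial with your value

print(default()["batch_size"], custom()["batch_size"])  # 100 10
```

So in addition to h2.update(batch_size=10), one would rebuild the conll2003 partial with the new batch size and pass that rebuilt callable to train.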