kolloldas / torchnlp
Easy to use NLP library built on PyTorch and TorchText
License: Apache License 2.0
This is the result I get after following the installation and run instructions:
>>> train('ner-conll2003', TransformerTagger, conll2003)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "torchnlp/tasks/sequence_tagging/main.py", line 46, in train
dataset = dataset_fn()
File "torchnlp/data/conll.py", line 67, in conll2003_dataset
fields=tuple(fields))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/dataset.py", line 78, in splits
os.path.join(path, train), **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torchtext/datasets/sequence_tagging.py", line 33, in __init__
examples.append(data.Example.fromlist(columns, fields))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/example.py", line 50, in fromlist
setattr(ex, n, f.preprocess(val))
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/field.py", line 181, in preprocess
x = Pipeline(six.text_type.lower)(x)
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/pipeline.py", line 37, in __call__
x = pipe.call(x, *args)
File "/usr/local/lib/python2.7/dist-packages/torchtext/data/pipeline.py", line 52, in call
return [self.convert_token(tok, *args) for tok in x]
TypeError: descriptor 'lower' requires a 'unicode' object but received a 'str'
ENV:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
--------------------------------
torch.__version__: 0.4.1
torchtext.__version__: 0.3.1
--------------------------------
python: 2.7.12
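The traceback comes from Python 2's str/unicode split: under Python 2, six.text_type is unicode, and the unbound method unicode.lower refuses a byte str, so dataset lines read as bytes crash the preprocessing pipeline. A minimal sketch of the underlying idea in Python 3 terms (bytes must be decoded before case folding):

```python
# Under Python 2, six.text_type.lower is unicode.lower and rejects a byte
# str. The Python 3 analogue: bytes carry no Unicode case information, so
# they must be decoded to str before lowercasing.
raw = b"CoNLL-2003"          # what a file read in binary/Py2 default mode yields
text = raw.decode("utf-8")   # decode bytes -> unicode text first
lowered = text.lower()
print(lowered)               # conll-2003
```

Running under Python 3 (where str is always unicode) sidesteps the error entirely.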
Can you please update it for PyTorch 0.3.0? PyTorch 0.2.0 is long gone and runs very slowly on many newer GPUs.
Would it be possible to use only the Transformer's encoder part to train word accentuation for the Lithuanian language? In Lithuanian, stressing is somewhat tricky, as it can vary depending on context along with word meaning (e.g. grammatical case).
You've mentioned in your post using only the encoder part for one-to-one mapping. In the case of Lithuanian accentuation, there are three types of accent, and the position of the accent within the word varies a lot. There can also be no accent at all.
Any suggestions?
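One way to frame this is per-token classification: an encoder-only model emits one label per input token (or character), with a label set covering the three accent types plus a "no accent" class. A hedged sketch using PyTorch's built-in TransformerEncoder (module names and hyperparameters here are illustrative, not from this repo):

```python
import torch
import torch.nn as nn

class AccentTagger(nn.Module):
    """Encoder-only tagger: one accent label per input position."""
    def __init__(self, vocab_size, num_labels, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # e.g. num_labels = 3 accent types + 1 "no accent" class
        self.classify = nn.Linear(d_model, num_labels)

    def forward(self, tokens):        # tokens: (batch, seq_len) int ids
        x = self.embed(tokens)
        x = self.encoder(x)           # no causal mask: full bidirectional context
        return self.classify(x)       # (batch, seq_len, num_labels)

model = AccentTagger(vocab_size=100, num_labels=4)
logits = model(torch.randint(0, 100, (2, 7)))
print(logits.shape)                   # torch.Size([2, 7, 4])
```

Training would then be a cross-entropy loss over the per-position logits, exactly as in other sequence-tagging tasks; since accent position within the word matters, character-level inputs may be the more natural granularity.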
Hello, hope you are well, and thank you so much for writing this awesome resource!
I have a question about the training procedure. When I ran it at the command prompt in Ubuntu, the training process was abruptly "killed" (image attached).
This did not look like early stopping, and training had only undergone n = 3 epochs.
I tried to restart the training procedure, and it stopped at the same point.
Do you know what is going on? Thank you in advance!
Hello kolloldas, when installing the project I came across this problem:
(py35) bash-3.2$ python setup.py
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: no commands supplied
setup.py needs a command; the usual one is:
python setup.py install
In the paper "Attention Is All You Need", the kernel size of the convolutions is set to 1, but in this implementation the value is 3. What is the reason for this choice?
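For context: with kernel size 1, the convolution is exactly a position-wise linear layer (the paper's feed-forward network), while kernel size 3 also mixes each position with its two neighbours. A small sketch demonstrating the equivalence of Conv1d with kernel_size=1 and nn.Linear (shapes chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, seq = 8, 16, 5
x = torch.randn(1, seq, d_in)

# kernel_size=1: each position is transformed independently (position-wise FFN)
conv = nn.Conv1d(d_in, d_out, kernel_size=1)
linear = nn.Linear(d_in, d_out)
linear.weight.data = conv.weight.data.squeeze(-1)  # (d_out, d_in, 1) -> (d_out, d_in)
linear.bias.data = conv.bias.data

y_conv = conv(x.transpose(1, 2)).transpose(1, 2)   # Conv1d expects (B, C, L)
y_lin = linear(x)
print(torch.allclose(y_conv, y_lin, atol=1e-6))    # True

# kernel_size=3 with padding=1: same output length, but each output position
# also sees its left and right neighbour
conv3 = nn.Conv1d(d_in, d_out, kernel_size=3, padding=1)
print(conv3(x.transpose(1, 2)).shape)              # torch.Size([1, 16, 5])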
Hi @kolloldas, great job with the Transformer. I was using your model to run a few basic experiments on sequence labeling, and after completing chunking and NER I wanted to move on to POS tagging. From what I understand, I'll need to create a new pos.py file in torchnlp/data/?
Could you give me a heads-up on whether I'm on the right track, or is there an easier workaround?
Thanks! :D
How can I modify the Transformer for time series analysis? In this case, would you also need to use masked attention heads?
Hi~ Could you report your best F1 score on the CoNLL-2003 NER task?
Thank you !
Normalization seems different from the paper "Attention Is All You Need": in the paper, the normalization layer comes after the multi-head attention and feed-forward layers, while in torchnlp it comes before them. I can't find a code piece doing the paper's ordering.
x = inputs
# Layer Normalization
x_norm = self.layer_norm_mha(x)
# Multi-head attention
y = self.multi_head_attention(x_norm, x_norm, x_norm)
# Dropout and residual
x = self.dropout(x + y)
# Layer Normalization
x_norm = self.layer_norm_ffn(x)
# Positionwise Feedforward
y = self.positionwise_feed_forward(x_norm)
# Dropout and residual
y = self.dropout(x + y)
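The snippet above is the "pre-norm" arrangement (normalize before each sublayer); the paper describes "post-norm" (normalize after the residual add). A minimal sketch contrasting the two orderings, using a Linear as a stand-in for the attention or feed-forward sublayer (the stand-in is illustrative, not the repo's module):

```python
import torch
import torch.nn as nn

d_model = 8
norm = nn.LayerNorm(d_model)
sublayer = nn.Linear(d_model, d_model)   # stand-in for MHA or the FFN
dropout = nn.Dropout(p=0.0)
x = torch.randn(2, 5, d_model)

# Post-norm, as in "Attention Is All You Need": LayerNorm(x + Sublayer(x))
post = norm(x + dropout(sublayer(x)))

# Pre-norm, as in the torchnlp code quoted above: x + Sublayer(LayerNorm(x))
pre = x + dropout(sublayer(norm(x)))

print(post.shape, pre.shape)             # both torch.Size([2, 5, 8])
```

Both orderings produce tensors of the same shape; they differ in where the normalization sits relative to the residual connection.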
I have installed torchtext and am using PyTorch 0.4 and Python 3.6.
I am getting the following error while testing one of my files:
File "conll.py", line 7, in <module>
from torchtext.datasets import SequenceTaggingDataset, CoNLL2000Chunking
ImportError: cannot import name 'CoNLL2000Chunking'
Any clues?
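An ImportError like this usually means the installed torchtext simply doesn't export that name (the repo's requirements.txt pins a fork of torchtext from github.com/kolloldas/text rather than the stock package). A small helper for checking what the installed module actually exposes before importing (the helper name is my own, not from the repo):

```python
import importlib

def has_attr(module_name, attr):
    """Check whether `attr` is exported by `module_name` without crashing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# e.g. verify the installed torchtext actually ships the dataset class:
#   has_attr("torchtext.datasets", "CoNLL2000Chunking")
# stdlib sanity check: json.loads exists, json.frobnicate does not
print(has_attr("json", "loads"), has_attr("json", "frobnicate"))
```

If the check comes back False, reinstalling from the repo's pinned requirements.txt (rather than a plain pip install of torchtext) is the thing to try.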
Attempting to install torchnlp on 64-bit Windows 10 with Python 3.8.5 and torch 1.6.0 (CPU-only version; no CUDA). I cloned the repository to a non-system directory off the root of my C: drive and got the following when trying to run the installer:
$ python -m pip install -r requirements.txt
Collecting git+git://github.com/kolloldas/text.git (from -r requirements.txt (line 4))
Cloning git://github.com/kolloldas/text.git to c:\users\xxxxxxx\appdata\local\temp\pip-req-build-pufzml50
ERROR: Command errored out with exit status 128: git clone -q git://github.com/kolloldas/text.git 'C:\Users\xxxxxxx\AppData\Local\Temp\pip-req-build-pufzml50' Check the logs for full command output.
Hi, even when I try changing the hyperparameters like so:
from torchnlp.ner import *
h2 = hparams_transformer_ner()
h2.update(batch_size=10)
train('ner-conll2003-nocrf', TransformerTagger, conll2003, hparams=h2)
batch.batch_size is still 100 (line 167 of train.py; I added the print statement):
for batch in prog_iter:
print(batch.batch_size)
Edit: I can see where the batch size is set to the default of 100: line 41 of torchnlp/ner.py.
conll2003 = partial(conll2003_dataset, 'ner', hparams_tagging_base().batch_size,
root=PREFS.data_root,
train_file=PREFS.data_train,
validation_file=PREFS.data_validation,
test_file=PREFS.data_test)
However, I'm not sure where it's supposed to be updated to a custom value.
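Since conll2003 is built with functools.partial, the batch size is baked in as a positional argument at import time, so updating the hparams dict alone doesn't reach it. A hedged sketch of the pattern with a dummy stand-in for conll2003_dataset (the function and its signature here are illustrative, not the repo's):

```python
from functools import partial

# Stand-in for conll2003_dataset: in torchnlp/ner.py the second positional
# argument is the batch size frozen into the partial at import time.
def dataset_fn(task, batch_size, root="./data"):
    return {"task": task, "batch_size": batch_size, "root": root}

default = partial(dataset_fn, "ner", 100)   # mirrors the line-41 pattern
custom = partial(dataset_fn, "ner", 10)     # rebuild the partial with your value

print(default()["batch_size"], custom()["batch_size"])  # 100 10
```

So in addition to h2.update(batch_size=10), one would rebuild the conll2003 partial with the new batch size and pass that rebuilt callable to train.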