Comments (4)
@dsindex That's right! By itself the encoder is weak if we limit feed-forward connections to each time step. Setting the filter size to 3 essentially pulls in context from the neighbouring time steps, as you rightly pointed out. In fact, the folks at Google did the same thing. However, this won't be a problem if we pair the encoder with a decoder. I wrote an article on this issue; please check it out if you haven't read it yet!
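A minimal plain-Python sketch of that point (the `conv1d` helper below is made up for illustration, not taken from torchnlp): with kernel size 1 the feed-forward layer mixes each position only with itself, while with kernel size 3 and "same" padding each output also sees its left and right neighbours, i.e. local context.

```python
def conv1d(seq, kernel):
    """1-D convolution over a sequence of scalars, 'same' zero padding."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

seq = [1.0, 2.0, 3.0, 4.0]

# kernel size 1: a purely position-wise transform, no neighbours involved.
print(conv1d(seq, [2.0]))            # [2.0, 4.0, 6.0, 8.0]

# kernel size 3: information from steps t-1 and t+1 flows into step t.
print(conv1d(seq, [1.0, 1.0, 1.0]))  # [3.0, 6.0, 9.0, 7.0]
```

In the real model the same idea applies channel-wise with learned filters; this scalar version only shows the receptive-field difference.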
from torchnlp.
Sure! I got an F1 of 0.867 on the test set (0.921 on validation) using the BiLSTM-CRF. I haven't done much hyperparameter tuning, so there is probably room for improvement. Would love to know the numbers you got.
Hi @kolloldas,
I am implementing Transformer-based NER by referring to your code:
https://github.com/dsindex/etagger
Here is what I found:
- Without the CRF layer, the performance is around 70%.
  - 5 layers of Transformer blocks
  - feed-forward net with conv1d (kernel size 1)
- With the CRF layer, the performance goes up to 88%.
test precision, recall, f1 (token): without CRF
[0.9940347495376279, 0.847970479704797, 0.7586206896551724, 0.7618694362017804, 0.5936254980079682, 0.8837209302325582, 0.5938914027149321, 0.32207207207207206, 0.22399150743099788, 0.6607466473359913]
[0.9897805675156923, 0.6917519566526189, 0.5954415954415955, 0.6351267779839208, 0.773356401384083, 0.6606714628297362, 0.6287425149700598, 0.6620370370370371, 0.8210116731517509, 0.6741863905325444]
[0.9919030970978516, 0.7619363395225465, 0.6671987230646449, 0.6927487352445193, 0.6716754320060105, 0.7560891938250429, 0.6108202443280977, 0.43333333333333335, 0.3519599666388657, 0.66739886509244]
-> the last column is the overall precision, recall, f1
test precision, recall, f1 (chunk): with CRF
[0.8724561403508772, 0.8804886685552408, 0.87645400070497]
So I suspect that the Transformer encoder alone is weak at collecting context information at the current position (time t).
In your code, you use kernel_size=3 for the feed-forward net.
Is that the key to the performance increase?
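The CRF side of the 70% -> 88% jump can be sketched in plain Python. A CRF scores whole label sequences, so a transition matrix can forbid invalid tag bigrams (e.g. I-PER without a preceding B-PER in BIO tagging), which a per-token softmax cannot. The tags, scores, and transition matrix below are invented for illustration; they are not taken from etagger or torchnlp.

```python
NEG = float("-inf")
tags = ["O", "B-PER", "I-PER"]

# trans[i][j]: score of moving from tag i to tag j.
# -inf forbids the invalid bigram O -> I-PER.
trans = [
    [0.0, 0.0, NEG],   # from O
    [0.0, 0.0, 1.0],   # from B-PER
    [0.0, 0.0, 1.0],   # from I-PER
]

def viterbi(emissions, trans):
    """Best tag sequence under emission + transition scores."""
    n_tags = len(trans)
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev, score, ptr = score, [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: prev[i] + trans[i][j])
            score.append(prev[best_i] + trans[best_i][j] + em[j])
            ptr.append(best_i)
        back.append(ptr)
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Emission scores where a greedy per-token classifier would output the
# invalid bigram O -> I-PER; the CRF-style decode repairs it.
emissions = [[2.0, 1.5, 0.0],   # token 1: softmax alone prefers O
             [0.0, 1.0, 1.2]]   # token 2: softmax alone prefers I-PER

greedy = [tags[max(range(3), key=lambda j: em[j])] for em in emissions]
print(greedy)                                   # ['O', 'I-PER'] - invalid
print([tags[t] for t in viterbi(emissions, trans)])  # ['B-PER', 'I-PER']
```

This only shows decoding; a real CRF layer also trains the transition scores jointly with the encoder via the sequence-level log-likelihood.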
The above problem was fixed after applying kernel_size=3 :)