
Comments (4)

kolloldas commented on June 7, 2024

@dsindex that's right! By itself the encoder is weak if we restrict the feed-forward connections to each time step. Setting the filter size to 3 lets the layer pick up context from the neighboring positions, as you rightly pointed out. In fact, the folks at Google did the same thing. However, this isn't a problem if we pair the encoder with a decoder. I wrote an article on this issue; please check it out if you haven't read it yet!
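For readers landing here, this is a minimal sketch of a position-wise feed-forward block built from 1D convolutions, showing why kernel_size=3 sees neighboring positions while kernel_size=1 does not. The module name and dimension defaults are illustrative, not taken from torchnlp:

```python
import torch
import torch.nn as nn

class ConvFeedForward(nn.Module):
    """Position-wise feed-forward net as 1D convolutions.

    kernel_size=1 mixes features within a single time step only;
    kernel_size=3 (with padding=1) also reads the two neighboring
    positions, giving each step local context.
    """
    def __init__(self, d_model=256, d_ff=1024, kernel_size=3, dropout=0.1):
        super().__init__()
        padding = (kernel_size - 1) // 2  # keep sequence length unchanged
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size, padding=padding)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                 # x: (batch, seq_len, d_model)
        y = x.transpose(1, 2)             # Conv1d expects (batch, channels, seq_len)
        y = self.conv2(self.dropout(torch.relu(self.conv1(y))))
        return y.transpose(1, 2)          # back to (batch, seq_len, d_model)
```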


kolloldas commented on June 7, 2024

Sure! I got an F1 of 0.867 on the test set (0.921 on validation) using the BiLSTM CRF. I haven't done much hyperparameter tuning, so there is probably room for improvement. Would love to know the numbers you got.


dsindex commented on June 7, 2024

Hi @kolloldas,

I am implementing a Transformer-based NER model by referring to your code:

https://github.com/dsindex/etagger

Here is what I found:

  1. If I do not use the CRF layer, the performance is around 70%, with:
  • 5 Transformer blocks
  • a feed-forward net with conv1d (kernel size 1)
  2. But with the CRF layer, the performance goes up to 88% (see the CRF sketch after the numbers below).
Test precision, recall, f1 (token-level), without CRF; one value per class, and the last column is the overall score:

precision: [0.9940347495376279, 0.847970479704797, 0.7586206896551724, 0.7618694362017804, 0.5936254980079682, 0.8837209302325582, 0.5938914027149321, 0.32207207207207206, 0.22399150743099788, 0.6607466473359913]
recall: [0.9897805675156923, 0.6917519566526189, 0.5954415954415955, 0.6351267779839208, 0.773356401384083, 0.6606714628297362, 0.6287425149700598, 0.6620370370370371, 0.8210116731517509, 0.6741863905325444]
f1: [0.9919030970978516, 0.7619363395225465, 0.6671987230646449, 0.6927487352445193, 0.6716754320060105, 0.7560891938250429, 0.6108202443280977, 0.43333333333333335, 0.3519599666388657, 0.66739886509244]

Test scores (chunk-level), with CRF:

[precision, recall, f1] = [0.8724561403508772, 0.8804886685552408, 0.87645400070497]
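For context, here is a minimal sketch of what a CRF layer on top of an encoder looks like, assuming the third-party pytorch-crf package; the class, tag count, and encoder interface are illustrative, not taken from etagger or torchnlp:

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (third-party package)

class EncoderCRFTagger(nn.Module):
    """Encoder emissions decoded with a linear-chain CRF.

    The CRF learns tag-transition scores, so each prediction depends
    on the neighboring tags rather than on per-token emissions alone.
    """
    def __init__(self, encoder, d_model=256, num_tags=9):
        super().__init__()
        self.encoder = encoder                    # any module: (B, T, d_model) -> (B, T, d_model)
        self.emit = nn.Linear(d_model, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, x, tags, mask):
        emissions = self.emit(self.encoder(x))
        return -self.crf(emissions, tags, mask=mask)   # negative log-likelihood

    def decode(self, x, mask):
        emissions = self.emit(self.encoder(x))
        return self.crf.decode(emissions, mask=mask)   # Viterbi best tag paths
```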

So I suspect that the Transformer encoder alone is weak at collecting context information at the current position (time = t).

In your code, you use kernel_size=3 for the feed-forward net.
Is that the key to the performance increase?


dsindex commented on June 7, 2024

The problem above was fixed after applying kernel_size=3 :)

