
Comments (4)

kolloldas commented on June 7, 2024

@dsindex that's right! By itself the encoder is weak if we restrict the feed-forward connections to each time step. Setting the filter size to 3 lets the layer pick up context from the neighboring positions, as you rightly pointed out. In fact, the folks at Google did the same thing. However, this isn't a problem if we pair the encoder with a decoder. I wrote an article on this issue; please check it out if you haven't read it yet!
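For readers landing here, this is a minimal sketch of a position-wise feed-forward block built from 1D convolutions, showing why kernel_size=3 sees neighboring positions while kernel_size=1 does not. The module name and dimension defaults are illustrative, not taken from torchnlp:

```python
import torch
import torch.nn as nn

class ConvFeedForward(nn.Module):
    """Position-wise feed-forward net as 1D convolutions.

    kernel_size=1 mixes features within a single time step only;
    kernel_size=3 (with padding=1) also reads the two neighboring
    positions, giving each step local context.
    """
    def __init__(self, d_model=256, d_ff=1024, kernel_size=3, dropout=0.1):
        super().__init__()
        padding = (kernel_size - 1) // 2  # keep sequence length unchanged
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size, padding=padding)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                 # x: (batch, seq_len, d_model)
        y = x.transpose(1, 2)             # Conv1d expects (batch, channels, seq_len)
        y = self.conv2(self.dropout(torch.relu(self.conv1(y))))
        return y.transpose(1, 2)          # back to (batch, seq_len, d_model)
```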


kolloldas commented on June 7, 2024

Sure! I got an F1 of 0.867 on the test set (0.921 on validation) using the BiLSTM CRF. I haven't done much hyperparameter tuning, so there is probably room for improvement. Would love to know the numbers you got.


dsindex commented on June 7, 2024

Hi @kolloldas,

I am implementing a Transformer-based NER model by referring to your code:

https://github.com/dsindex/etagger

Here is what I found:

  1. If I do not use the CRF layer, the performance is around 70%, with:
  • 5 Transformer blocks
  • a feed-forward net with conv1d (kernel size 1)
  2. But with the CRF layer, the performance goes up to 88% (see the CRF sketch after the numbers below).
Test precision, recall, f1 (token-level), without CRF; one value per class, and the last column is the overall score:

precision: [0.9940347495376279, 0.847970479704797, 0.7586206896551724, 0.7618694362017804, 0.5936254980079682, 0.8837209302325582, 0.5938914027149321, 0.32207207207207206, 0.22399150743099788, 0.6607466473359913]
recall: [0.9897805675156923, 0.6917519566526189, 0.5954415954415955, 0.6351267779839208, 0.773356401384083, 0.6606714628297362, 0.6287425149700598, 0.6620370370370371, 0.8210116731517509, 0.6741863905325444]
f1: [0.9919030970978516, 0.7619363395225465, 0.6671987230646449, 0.6927487352445193, 0.6716754320060105, 0.7560891938250429, 0.6108202443280977, 0.43333333333333335, 0.3519599666388657, 0.66739886509244]

Test scores (chunk-level), with CRF:

[precision, recall, f1] = [0.8724561403508772, 0.8804886685552408, 0.87645400070497]
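For context, here is a minimal sketch of what a CRF layer on top of an encoder looks like, assuming the third-party pytorch-crf package; the class, tag count, and encoder interface are illustrative, not taken from etagger or torchnlp:

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (third-party package)

class EncoderCRFTagger(nn.Module):
    """Encoder emissions decoded with a linear-chain CRF.

    The CRF learns tag-transition scores, so each prediction depends
    on the neighboring tags rather than on per-token emissions alone.
    """
    def __init__(self, encoder, d_model=256, num_tags=9):
        super().__init__()
        self.encoder = encoder                    # any module: (B, T, d_model) -> (B, T, d_model)
        self.emit = nn.Linear(d_model, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, x, tags, mask):
        emissions = self.emit(self.encoder(x))
        return -self.crf(emissions, tags, mask=mask)   # negative log-likelihood

    def decode(self, x, mask):
        emissions = self.emit(self.encoder(x))
        return self.crf.decode(emissions, mask=mask)   # Viterbi best tag paths
```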

So I suspect that the Transformer encoder alone is weak at collecting context information at the current position (time = t).

In your code, you use kernel_size=3 for the feed-forward net.
Is that the key to the performance increase?


dsindex commented on June 7, 2024

The problem above was fixed after applying kernel_size=3 :)

