Named-Entity-Recognition

We have created a dataset of Hindi-English code-mixed social media text (tweets) for the task of Named Entity Recognition (NER). The tweets are pre-processed and annotated with six NER tags plus a seventh Other tag.

NER-Tags

  • B-Per Indicates the beginning of a person's name.
  • I-Per Indicates an intermediate word of a person's name.
  • B-Org Indicates the beginning of an organization's name.
  • I-Org Indicates an intermediate word of an organization's name.
  • B-Loc Indicates the beginning of a location's name.
  • I-Loc Indicates an intermediate word of a location's name.
  • Other Indicates all words that do not fall under any of the above six tags.

Example:

#Word #Tag
Bharat B-Loc
ke Other
2016 Other
ke Other
Demonetization Other
mein Other
kitna Other
kala Other
dhan Other
real Other
mein Other
aaya Other
??? Other
Accha Other
hua Other
ye Other
prashna Other
Miss B-Per
Word I-Per
Chillar I-Per
ko Other
nahi Other
puccha Other
gaya Other
0 Other
#misschillar B-Per
#missworld Other
#Demonetisation Other
#notebandi Other
#modi B-Per
#bjp B-Org
#gujrat B-Loc
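
The two-column layout above can be read back into entity spans with a few lines of Python. This is a minimal sketch, not code from the repository: the function names `parse_tagged_lines` and `extract_entities` are hypothetical, and it assumes one whitespace-separated word/tag pair per line with the `#Word #Tag` header as shown.

```python
from typing import List, Tuple

def parse_tagged_lines(text: str) -> List[Tuple[str, str]]:
    """Turn 'word tag' lines into (word, tag) pairs, skipping the header."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Only the exact header is skipped; hashtags such as '#modi' are data.
        if not line or line == "#Word #Tag":
            continue
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs

def extract_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged words into (entity text, entity type) spans."""
    entities = []
    words, etype = [], None
    for word, tag in pairs:
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [word], tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            words.append(word)
        else:
            # 'Other' (or a stray I- tag with no open entity) ends the span.
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [], None
    if words:
        entities.append((" ".join(words), etype))
    return entities

sample = """#Word #Tag
Bharat B-Loc
ke Other
Miss B-Per
Word I-Per
Chillar I-Per
ko Other
#modi B-Per
#bjp B-Org"""

print(extract_entities(parse_tagged_lines(sample)))
```

On this excerpt the sketch yields one Loc span, two Per spans and one Org span, with the three-word person name kept together as a single entity.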

Contents

  • The TwitterData folder contains the IDs of the scraped tweets inside the Scrapped folder, as well as the processed and annotated data, in files named accordingly.
  • The three Models.py files contain the three ML classification models we used for our research paper.
  • The preprocessing and vector creation scripts have names that indicate their purpose.
  • This dataset is still in development; in the future we will extend it with more tweets to make it a more reliable dataset for this task and others.
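
To give a feel for what a vector creation step might produce, here is an illustrative sketch of per-token features for code-mixed tweets. It is an assumption, not the repository's implementation: the function name `token_features` and the specific features chosen are hypothetical, and the actual scripts may use different ones.

```python
from typing import Dict, List, Union

def token_features(tokens: List[str], i: int) -> Dict[str, Union[str, bool]]:
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "is_title": word.istitle(),        # capitalised words often start names
        "is_digit": word.isdigit(),
        "suffix3": word[-3:].lower(),
        # Neighbouring words give context; sentinels mark tweet boundaries.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tweet = ["Bharat", "ke", "#modi"]
features = [token_features(tweet, i) for i in range(len(tweet))]
```

Feature dicts of this shape can be passed to a CRF library or vectorised for a decision tree; which representation each model in this repository uses is not stated here.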

Outputs

  • The DecisionTree and CRF models have direct score calls that give all the required stats.
  • Keras does not provide the same score stats for the LSTM model, so we built a custom call that computes all the measure values and took the average over all the iterations (here 5).
  • All the models performed well on the given data.
  • The Decision Tree model achieved an F1-score of 0.94.
  • The Conditional Random Field (CRF) model achieved an F1-score of 0.95.
  • The LSTM model achieved an F1-score of 0.95.
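
The custom scoring described for the LSTM can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the function names `precision_recall_f1` and `average_over_runs` are hypothetical, the scoring is token-level for a single tag, and the toy data uses two runs rather than the five mentioned above.

```python
from typing import List, Sequence, Tuple

def precision_recall_f1(gold: Sequence[str], pred: Sequence[str],
                        label: str) -> Tuple[float, float, float]:
    """Token-level precision, recall and F1 for one tag."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def average_over_runs(runs: List[Tuple[Sequence[str], Sequence[str]]],
                      label: str) -> Tuple[float, float, float]:
    """Average the three measures over several (gold, pred) runs."""
    scores = [precision_recall_f1(g, p, label) for g, p in runs]
    n = len(scores)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n,
            sum(s[2] for s in scores) / n)

gold = ["B-Per", "Other", "B-Per", "Other"]
runs = [
    (gold, ["B-Per", "B-Per", "Other", "Other"]),  # one hit, one miss, one false alarm
    (gold, list(gold)),                            # perfect run
]
print(average_over_runs(runs, "B-Per"))  # (0.75, 0.75, 0.75)
```

Averaging per-run scores, rather than pooling all predictions, matches the "average over all the iterations" wording above; the two approaches can give slightly different numbers.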

Authors
  • Vinay Singh
  • Deepanshu Vijay
  • Syed A. Sarfaraz
  • Manish Srivastava

LTRC IIIT-Hyderabad


Citation

Named Entity Recognition for Hindi-English Code-Mixed Social Media Text

Proceedings of the Seventh Named Entities Workshop, 2018, pp. 27-35.
