Tag Transition and Word Emission

I have implemented bigram, trigram, 4-gram, and 5-gram models. The training data set is a Hindi tagged corpus, Hindi-tagged18.txt. The code can calculate bigrams, trigrams, 4-grams, and 5-grams, store them in a file, and compute the perplexity of given sentences. The input for the perplexity calculation must be tagged data.
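
As a rough sketch of the counting step, the training pass might look like the following Python (the function names and the word_TAG parsing are my assumptions, based on the example sentences shown below, not the repository's actual code):

    from collections import Counter

    def read_tagged_sentences(path):
        # Yield one [(word, tag), ...] list per line of a word_TAG corpus.
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Split on the first underscore only, so compound tags
                # such as N_NN and N_NST stay intact.
                yield [tok.split("_", 1) for tok in line.split() if "_" in tok]

    def count_tag_ngrams(sentences, n):
        # Count n-grams over the tag sequence of each sentence.
        counts = Counter()
        for sent in sentences:
            tags = [tag for _, tag in sent]
            for i in range(len(tags) - n + 1):
                counts[tuple(tags[i:i + n])] += 1
        return counts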

TTP (Tag Transition Probabilities):

The tag transition probability is calculated for every tag paired with every other tag recorded in the training data set. Tag pairs with probability 0 are also listed in the submitted TTP text file. The output file contains this data in the format below:

prob(PR | QT) = 0.0110803324099723

prob(PR | N_NN) = 0.06583217431617988

prob(PR | PSP) = 0.06504904491481672
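
A minimal sketch of how such a table could be computed, assuming maximum-likelihood estimates count(t1, t2) / count(t1) (illustrative names, not the repository's actual code):

    from collections import Counter

    def tag_transition_probs(sentences):
        # P(t2 | t1) = count(t1 followed by t2) / count(t1).
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            tags = [tag for _, tag in sent]
            unigrams.update(tags)
            bigrams.update(zip(tags, tags[1:]))
        tagset = sorted(unigrams)
        # Pair every tag with every other recorded tag, so that
        # zero-probability pairs also appear in the output file.
        return {(t1, t2): bigrams[(t1, t2)] / unigrams[t1]
                for t1 in tagset for t2 in tagset}

Writing this dictionary out line by line then yields entries in the prob(t2 | t1) = ... format shown above.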

WEP (Word Emission Probabilities):

The word emission probability is calculated only for (word, tag) pairs that actually occur, to avoid the overhead of computing values for non-existing pairs, whose probability would be 0 anyway. The output file has data in the format below:

prob(वह | PR) = 0.34782608695652173

prob(पचास | QT) = 0.0110803324099723

prob(वर्ष | N_NN) = 0.0037088548910523874

prob(से | PSP) = 0.5534331440371709
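
A corresponding sketch for the emission table, again assuming maximum-likelihood estimates count(word, tag) / count(tag) (illustrative, not the repository's actual code):

    from collections import Counter

    def word_emission_probs(sentences):
        # P(word | tag) = count(word tagged as tag) / count(tag).
        tag_counts, pair_counts = Counter(), Counter()
        for sent in sentences:
            for word, tag in sent:
                tag_counts[tag] += 1
                pair_counts[(word, tag)] += 1
        # Only observed (word, tag) pairs are emitted; every unseen
        # pair would have probability 0 anyway, as noted above.
        return {(w, t): c / tag_counts[t]
                for (w, t), c in pair_counts.items()}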

Perplexity of N-gram Models:

Perplexity is used to evaluate the models. To calculate the perplexity of a given sentence, the training data set is the Hindi tagged data Hindi-tagged-18.txt, and the input sentence is provided as below:

Input sentence: जब_PR मेरे_PR पास_N_NST एक_QT पैसा_N_NN नहीं_RP

Output:

Bigram perplexity = 1.001811704525849

Trigram perplexity = 1.0012386564808766

4-gram perplexity = 1.0010211035763825

5-gram perplexity = 1.0008433370198297

Input sentence: आकाश_N_NN में_PSP भीषण_JJ विस्फोट_N_NN जैसी_N_NN ध्वनि_N_NN के_PSP साथ_PSP हजार_N_NN सूर्यों_N_NN का_PSP उजाला_N_NN फैलता_V_VM देख_V_VM कर_V_VAUX सब_QT चौंक_N_NN गए_V_VM

Output:

Bigram perplexity = 1.0015066218392483

Trigram perplexity = 1.0002859183043158

4-gram perplexity = 1

5-gram perplexity = 1

In the above two examples, the perplexity decreases as the N of the N-gram model increases. However, since the input sentences are short, this trend may not hold in general. Please note that n-grams with probability 0 are not considered in the calculation; they are ignored so the computation runs smoothly without exceptions such as "divide by zero" or raising zero to a negative power.
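
A hedged sketch of the perplexity computation under these conventions (probs is assumed to map an n-gram of tags to its conditional probability; the names are illustrative):

    import math

    def ngram_perplexity(tags, probs, n):
        # Perplexity = exp(-(1/M) * sum of log p_i) over the M n-grams
        # of the tag sequence; zero-probability n-grams are skipped,
        # as noted above.
        log_sum, used = 0.0, 0
        for i in range(len(tags) - n + 1):
            p = probs.get(tuple(tags[i:i + n]), 0.0)
            if p > 0:
                log_sum += math.log(p)
                used += 1
        # With no usable n-grams, the empty product gives perplexity 1.
        return math.exp(-log_sum / used) if used else 1.0

Under this convention, the 4-gram and 5-gram perplexity of 1 in the second example would follow from every higher-order n-gram of that sentence being unseen in the training data, so nothing contributes to the product.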
