
modifiedkneserney's Introduction

ModifiedKneserNey

As part of an independent research project in natural language processing, I implemented modified, interpolated Kneser-Ney smoothing. I could not find an existing implementation online that met my exact needs, so I wrote my own.

What's special about my version:

  1. It has a correction for out-of-vocabulary words, necessary for scoring probabilities for unseen n-grams
  2. It estimates discount values from the training data instead of fixing them at the commonly used value of 0.75
  3. It is super easy to use
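
For readers unfamiliar with point 2, the sketch below shows one way discounts can be estimated from training data, using the closed-form count-of-counts estimates described by Chen and Goodman (1999). It is only an illustration under stated assumptions: the function name `estimate_discounts` and the toy counts are hypothetical and are not taken from this repository, whose actual implementation may differ.

```python
from collections import Counter


def estimate_discounts(ngram_counts):
    """Return (D1, D2, D3+) estimated from count-of-counts statistics.

    ngram_counts maps each distinct n-gram to its training count.
    Hypothetical helper for illustration; not this repository's API.
    """
    # n_k = number of distinct n-grams that occur exactly k times
    count_of_counts = Counter(ngram_counts.values())
    n1, n2, n3, n4 = (count_of_counts[k] for k in (1, 2, 3, 4))
    if min(n1, n2, n3) == 0:
        raise ValueError("need n-grams with counts 1, 2 and 3 to estimate discounts")

    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1       # discount for n-grams seen once
    d2 = 2 - 3 * y * n3 / n2       # discount for n-grams seen twice
    d3_plus = 3 - 4 * y * n4 / n3  # discount for n-grams seen three or more times
    return d1, d2, d3_plus


# Toy counts (made up): four singletons, two doubletons, one count-3, one count-4
toy_counts = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 3, "h": 4}
print(estimate_discounts(toy_counts))  # (0.5, 1.25, 1.0)
```

On a real corpus the counts would come from the n-grams of each order, and each order would typically get its own set of three discounts.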

Example

# let corpus represent a large string of training data
# let sentence represent a string that you wish to score

kn = ModifiedKneserNey()
kn.train(corpus)
kn.log_score_per_ngram(sentence)

# Done!:)

Requirements

  • Python 3, including:
    • nltk
    • numpy

References:

  • Stanley F. Chen and Joshua Goodman (1999), "An empirical study of smoothing techniques for language modeling," in Computer Speech and Language, vol. 13, no. 4, pp. 359-394.

  • P. Taraba (2007), "Kneser-Ney Smoothing With a Correcting Transformation for Small Data Sets," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, pp. 1912-1921.

  • Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn (2013), "Scalable Modified Kneser-Ney Language Model Estimation," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 2, pp. 690-696.

  • Reinhard Kneser and Hermann Ney (1995), "Improved backing-off for M-gram language modeling," in ICASSP.

  • D. Jurafsky and J. H. Martin (2017), "Speech and Language Processing," (Third Edition draft).

modifiedkneserney's People

Contributors

epeake

modifiedkneserney's Issues

any demo results?

Hi epeake,

My results do not look reasonable to me. Could you upload some demo results, such as the training set you used and a few test outputs? Thanks a lot!

Best wishes!
