The goal of this project was to build a data product that uses a Katz Backoff Trigram language model to predict the next word from a series of prior words. It is implemented as a Shiny R web application, accessible from the following link:
https://michael-szczepaniak.shinyapps.io/predictnextkbo/
Input pre-processing filters are not in place yet, but the model appears to be functioning as expected.
The project is broken down into four parts, described below. Each part contains a link to a page on rpubs which describes that part in further detail.
- Part 1 - Overview and Pre-Processing: The main goal of this part was to convert the raw corpus data into a form that could be easily used by the next step, building the n-gram tables, and to perform exploratory data analysis (EDA).
  - Background
  - Project Objectives
  - Acquiring, Partitioning, Preparing the Data
  - Sentence Parsing
  - Non-ASCII Character Filtering
  - Unicode Tag Conversions and Filtering
  - URL Filtering
  - Additional Filtering and EOS Tokenization
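The filtering steps above can be sketched roughly as follows. The project itself is implemented in R; this is an illustrative Python sketch, and the function name, regexes, and `<EOS>` token are assumptions rather than the project's actual code.

```python
import re

def clean_text(text):
    """Sketch of the pre-processing pipeline: remove URLs, drop
    non-ASCII characters, apply additional character filtering,
    then split into sentences and mark each with an EOS token."""
    text = re.sub(r"(https?://|www\.)\S+", " ", text)      # URL filtering
    text = text.encode("ascii", "ignore").decode("ascii")  # non-ASCII filtering
    text = re.sub(r"[^a-z'. !?]", " ", text.lower())       # additional filtering
    sentences = re.split(r"[.!?]+", text)                  # sentence parsing
    # EOS tokenization: each sentence becomes a token list ending in <EOS>.
    return [s.split() + ["<EOS>"] for s in sentences if s.strip()]
```

For example, `clean_text("Visit http://example.com now!")` yields a single tokenized sentence ending in the `<EOS>` marker, with the URL stripped out.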
- Part 2 - N-grams and Exploratory Data Analysis: The main goal of this part was to construct the n-gram tables used by the language model and to do some exploratory analysis of the cleaned-up data.
  - Unigram Singleton Processing
  - Unigram, Bigram, and Trigram Frequency Table Generation
  - Count-of-Counts Plots
  - Top 10 Unigram, Bigram, and Trigram Frequency Plots
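The frequency table generation step amounts to counting adjacent word runs of length one, two, and three. A minimal Python sketch (the project's actual tables are built in R; the function name here is illustrative):

```python
from collections import Counter

def ngram_tables(sentences):
    """Build unigram, bigram, and trigram frequency tables from
    tokenized sentences (lists of word tokens)."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for toks in sentences:
        uni.update(toks)                           # single words
        bi.update(zip(toks, toks[1:]))             # adjacent word pairs
        tri.update(zip(toks, toks[1:], toks[2:]))  # adjacent word triples
    return uni, bi, tri
```

Counting per sentence, rather than over the concatenated corpus, keeps bigrams and trigrams from spanning sentence boundaries.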
- Part 3 - Understanding and Implementing the Model: The main goal of this part was to develop the conceptual framework and the code to implement the Katz Backoff Trigram algorithm as the model used to predict the next word.
  - Deriving the Model
  - Maximum Likelihood Estimate
  - Markov Assumption
  - Discounting
  - Probabilities of Observed N-grams
  - Probabilities of Unobserved N-grams
  - Walk-through of the KBO Trigram Algorithm Calculations
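The core idea of the algorithm can be sketched compactly: observed trigrams get a discounted maximum likelihood estimate, and the probability mass freed by discounting is redistributed over unobserved words in proportion to a similarly discounted bigram distribution. The Python below is a simplified illustration under the assumption of fixed absolute discounts, not the project's R implementation; the function and parameter names are hypothetical.

```python
from collections import Counter

def kbo_trigram_dist(uni, bi, tri, w1, w2, d2=0.5, d3=0.5):
    """Probability distribution over the next word after (w1, w2), via
    Katz backoff. uni/bi/tri are Counters of 1/2/3-gram counts;
    d2 and d3 are the bigram and trigram absolute discounts."""
    vocab = list(uni)
    # Observed trigrams: discounted maximum likelihood estimate.
    c12 = bi[(w1, w2)]
    obs3 = ({w: (tri[(w1, w2, w)] - d3) / c12
             for w in vocab if tri[(w1, w2, w)] > 0} if c12 else {})
    alpha3 = 1.0 - sum(obs3.values())   # trigram mass freed by discounting
    # Backed-off bigram distribution q(w | w2), discounted the same way.
    c2 = uni[w2]
    obs2 = ({w: (bi[(w2, w)] - d2) / c2
             for w in vocab if bi[(w2, w)] > 0} if c2 else {})
    alpha2 = 1.0 - sum(obs2.values())   # bigram mass freed by discounting
    uno2 = [w for w in vocab if w not in obs2]
    denom = sum(uni[w] for w in uno2)
    q_bi = {w: obs2.get(w, alpha2 * uni[w] / denom if denom else 0.0)
            for w in vocab}
    # Unobserved trigrams share alpha3 in proportion to q(w | w2).
    uno3 = [w for w in vocab if w not in obs3]
    norm = sum(q_bi[w] for w in uno3)
    return {w: obs3.get(w, alpha3 * q_bi[w] / norm if norm else 0.0)
            for w in vocab}
```

The returned probabilities sum to one: the observed trigrams keep `1 - alpha3` of the mass, and the backoff terms are normalized to share exactly `alpha3`. The predicted next word is simply the argmax of this distribution.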
- Part 4 - Parameter Selection and Optimization: At the end of Part 3, we had developed the ideas and the algorithm needed to make predictions, but generic values were used for the model's two parameters: the bigram discount rate and the trigram discount rate. In this final part of the series, we use cross-validation to select discount-rate values that improve the accuracy of the model.
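The selection procedure reduces to a grid search over candidate discount pairs, scoring each pair by next-word prediction accuracy on held-out data. A minimal sketch, where `accuracy_fn`, the function name, and the grid values are all illustrative stand-ins for the project's actual cross-validation code:

```python
from itertools import product

def select_discounts(accuracy_fn, grid=(0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3)):
    """Return the (bigram, trigram) discount pair that maximizes
    held-out prediction accuracy. accuracy_fn(d2, d3) is assumed to
    evaluate the model on a validation fold and return its accuracy."""
    return max(product(grid, grid), key=lambda pair: accuracy_fn(*pair))
```

In a k-fold setup, `accuracy_fn` would average accuracy across folds before the maximum is taken.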