This repository contains a language-modeling project that computes bigram and trigram probabilities and conditional probabilities from a corpus, and uses GPT-2, a more sophisticated (though no longer state-of-the-art) neural language model that computes the probabilities of all tokens given their preceding context in a single forward pass.
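The n-gram part can be sketched as follows. This is a minimal illustration using a hypothetical toy corpus and helper names (`bigram_prob`, `trigram_prob`); the repository's actual corpus and implementation in `main.py` may differ.

```python
from collections import Counter

# Hypothetical toy corpus (whitespace-tokenized); stands in for the real one.
corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams, bigrams, and trigrams.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def bigram_prob(w1, w2):
    # Conditional probability P(w2 | w1) = count(w1 w2) / count(w1).
    return bigrams[(w1, w2)] / unigrams[w1]

def trigram_prob(w1, w2, w3):
    # Conditional probability P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2).
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(bigram_prob("the", "cat"))        # 2 of the 3 "the" tokens are followed by "cat"
print(trigram_prob("the", "cat", "sat"))
```

In this toy corpus, P(cat | the) = 2/3, since "the" occurs three times and is followed by "cat" twice.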
To run this code, you will need the following installed:

- Python 3.6 or higher
- NumPy
- Pandas
- Huggingface Transformers
To run the code, simply run the following command:
python main.py
This will run the code and print the results to the console.
The results are as follows:
The bigram and trigram probabilities and the conditional probabilities are computed correctly, and the GPT-2 model predicts the next word in a sentence with high accuracy.