
natural-language-processing's Introduction

Natural-Language-Processing End-to-End Implementation Examples


In recent years, natural language processing (NLP) has seen rapid growth in quality and usability, which has helped drive business adoption of artificial intelligence (AI) solutions. Researchers have been applying newer deep learning methods to NLP, and data scientists have started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms that use language models pretrained on large text corpora.

This repository contains full implementation examples of several natural language processing methods in Python, which can be readily applied to industry datasets. Everything is provided as Jupyter notebooks so that it is easy to read and follow.

The goal of this repo is to provide a comprehensive set of tools and examples that leverage recent advances in NLP algorithms.

When worked through in order, the notebooks guide you through basic NLP concepts and skills using several libraries (Keras, TensorFlow), and eventually help you build production-level systems such as a chatbot or a recommendation system based on language data.




Requirements

  • Python 3
  • TensorFlow 2.x
  • NumPy
  • pandas
  • scikit-learn (sklearn)
  • Transformers (HuggingFace)

Usage

  • Install required packages.
  • Follow along with each Jupyter notebook.
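Before opening the notebooks, it can help to confirm that the packages listed under Requirements are importable. The check below is a minimal sketch: the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this repo, and the import names are assumed from the Requirements list (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names assumed from the Requirements list above.
REQUIRED = ["tensorflow", "numpy", "pandas", "sklearn", "transformers"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each package can be located; it does not verify versions such as TensorFlow 2.x.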

Data

Each notebook's dataset can be downloaded with 'wget'. The full datasets are also available in the 'data' folder of this repo.
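If 'wget' is not available, a download can also be scripted from Python. The helper below is a sketch using only the standard library; `fetch_dataset` is a hypothetical name, and the URL and destination path are placeholders you would replace with the ones given in each notebook.

```python
import urllib.request
from pathlib import Path


def fetch_dataset(url, dest):
    """Download url to dest unless the file already exists, then return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the copy that is already on disk
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest
```

A call would look like `fetch_dataset(url, "data/reviews.txt")`, where both arguments are placeholders for illustration.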


Implementation

We classify news articles by category using a pre-trained embedding model, building the model from scratch in Keras and then training it.
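One common step when using pre-trained embeddings is to build a matrix whose rows line up with the tokenizer's word ids. The sketch below shows that step in plain Python. The function names and the 3-dimensional toy vectors are made up for illustration, and the notebook itself may prepare its embeddings differently.

```python
def build_word_index(texts):
    """Map each distinct lower-cased token to an integer id, reserving 0 for padding."""
    word_index = {}
    for text in texts:
        for token in text.lower().split():
            if token not in word_index:
                word_index[token] = len(word_index) + 1
    return word_index


def build_embedding_matrix(word_index, pretrained, dim):
    """Row i holds the pretrained vector of the word with id i.

    Row 0 (padding) and words missing from `pretrained` stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vector = pretrained.get(word)
        if vector is not None:
            matrix[idx] = list(vector)
    return matrix


# Toy vectors, invented for this example only.
pretrained = {"stocks": [0.9, 0.1, 0.0], "match": [0.0, 0.8, 0.2]}
word_index = build_word_index(["Stocks rally", "Match postponed"])
matrix = build_embedding_matrix(word_index, pretrained, dim=3)
```

A matrix like this is typically used as the initial weights of a Keras embedding layer, often frozen so the pre-trained vectors are not updated during training.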

We perform sentiment analysis on movie review data using Google's BERT model with the TF Keras API. We use the Keras API here because PyTorch examples are already plentiful. We also use a Korean movie review dataset, since analyses of English movie reviews (IMDB) are easy to find online. By the end you should have a firm grasp of how Google's language model 'BERT' works and how to fine-tune it for your own business problems.

The same sentiment analysis with Google's BERT model on the movie data, this time with TensorFlow 2.0. Here we use the transformers 'BertTokenizer' and 'BertModel' classes to load the pieces BERT needs and use them in training.
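To make the tokenizer's role concrete, the sketch below builds by hand the three inputs a BERT model usually expects for a single sentence: input ids, an attention mask, and token type ids. It is a simplified illustration, not the 'BertTokenizer' implementation: the `encode_for_bert` helper and the tiny vocabulary are hypothetical, the ids are invented, and real tokenization also splits words into WordPiece sub-tokens.

```python
def encode_for_bert(tokens, vocab, max_len):
    """Build input_ids, attention_mask and token_type_ids for one sentence."""
    # Reserve two positions for the [CLS] and [SEP] special tokens.
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    input_ids = [vocab.get(token, vocab["[UNK]"]) for token in tokens]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens

    # Pad to max_len; padded positions get attention 0.
    pad = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    attention_mask += [0] * pad

    token_type_ids = [0] * max_len  # single-sentence input: all segment 0
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }


# Toy vocabulary with made-up ids, for illustration only.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "good": 4, "movie": 5}
encoded = encode_for_bert(["good", "movie", "indeed"], vocab, max_len=6)
```

Here "indeed" is not in the toy vocabulary, so it maps to the `[UNK]` id, and the final position is padding.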

Build a SQuAD model using Keras and BERT. The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

This time, we solve the same SQuAD problem with TensorFlow by fine-tuning a pretrained BERT-Large model.
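Because a SQuAD answer is a span of the passage, BERT-style question answering heads commonly produce one start score and one end score per token, and the answer is the valid span with the highest combined score. The sketch below shows that selection step in plain Python. The `best_span` helper, the `max_answer_len` default, and the scores are assumptions for illustration, and handling of unanswerable questions is left out.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best_start, best_end = 0, 0
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_start, best_end, best_score = start, end, score
    return best_start, best_end, best_score


# Made-up scores for a four-token passage.
tokens = ["the", "eiffel", "tower", "stands"]
start, end, score = best_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.5, 3.0, 0.1])
answer = " ".join(tokens[start : end + 1])
```

Restricting the search to `start <= end` matters: picking the best start and best end independently can yield an end that precedes the start.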

Another application of natural language processing: Named Entity Recognition (NER). NER, also called entity identification or entity extraction, is an information extraction technique that automatically identifies named entities in a text and classifies them into predefined categories.
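NER models often emit one tag per token, and a post-processing step groups those tags into entities. The sketch below assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside). The notebook's actual label set is not stated here, so the tags, the example sentence, and the `bio_to_entities` helper are illustrative.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


tokens = ["Barack", "Obama", "visited", "Seoul", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
entities = bio_to_entities(tokens, tags)
```

In this sketch an I- tag that does not match the open entity is treated as outside any entity; other conventions start a new entity there instead.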

Build a simple end-to-end speech-to-text model using the librosa library. The model takes recordings of 10 different classes of words (data from the Kaggle Speech Recognition Challenge), trains a 1D convolutional network, and predicts the spoken word as text.
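For readers new to 1D convolutions, the sketch below shows what one convolutional filter followed by ReLU and max pooling computes on a short signal. It is a plain-Python illustration of the operation, not the notebook's model: the helper names, kernel, and sample values are invented, and a real network learns many such kernels over audio samples.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (no padding, stride 1) and sum the products."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Keep the largest value in each non-overlapping window of `size`."""
    return [max(values[i : i + size]) for i in range(0, len(values) - size + 1, size)]


# Made-up waveform samples and kernel.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
kernel = [1.0, 0.0, -1.0]
features = max_pool1d(relu(conv1d_valid(signal, kernel)), size=2)
```

Like most deep learning libraries, this computes a cross-correlation (the kernel is not flipped), which is what "convolution" layers conventionally do.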

A simple example comparing the different decoding methods provided by the Transformers library. We use GPT-2 specifically to see which decoding method generates the most human-like text for a given prompt.
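Decoding methods differ in how they pick the next token from the model's probability distribution. The sketch below contrasts three commonly compared strategies (greedy, top-k sampling, and top-p or nucleus sampling) on a toy distribution. It is written in plain Python rather than with the Transformers API, the helper names and probabilities are invented, and the notebook may cover a different set of methods.

```python
import random


def greedy(probs):
    """Pick the single most likely token id."""
    return max(range(len(probs)), key=probs.__getitem__)


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalise their probabilities."""
    keep = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}


def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Toy next-token distribution over four token ids.
probs = [0.5, 0.3, 0.15, 0.05]
rng = random.Random(0)
choice_greedy = greedy(probs)
choice_top_k = sample(top_k_filter(probs, k=2), rng)
choice_top_p = sample(top_p_filter(probs, p=0.9), rng)
```

Greedy decoding always returns the same token, while the two sampling strategies trade some likelihood for variety by drawing from a truncated distribution.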


Extra Learning

  1. Novel generator using KoGPT2 and PyTorch Link
  2. Text generation / lyric generation / SQuAD fine-tuning Link
  3. Make a simple chatbot in Korean using Korean language data and the pre-trained KoGPT2 model Link

Reference

https://github.com/kimwoonggon/publicservant_AI
https://github.com/microsoft/nlp-recipes
https://github.com/NirantK/nlp-python-deep-learning
https://github.com/monologg/KoBERT-NER
