Paraphrase Generator with T5

A paraphrase generator built with transformers. It takes an English sentence as input and produces a set of paraphrased sentences, which is an NLP task of conditional text generation. The model is T5ForConditionalGeneration from the Hugging Face transformers library, trained on Google's PAWS dataset. It is published on the Hugging Face model hub under the name Vamsi/T5_Paraphrase_Paws.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Streamlit library
Huggingface transformers library
PyTorch
TensorFlow

Installing

Streamlit

$ pip install streamlit

Huggingface transformers library

$ pip install transformers

TensorFlow

$ pip install --upgrade tensorflow

PyTorch

Head to the docs and install a compatible version
https://pytorch.org/
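
Once the packages are installed, a quick check can confirm that each one is importable before launching either app. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of this repository. Note that PyTorch is imported as `torch`.

```python
from importlib.util import find_spec

# Import names (not pip names) for the prerequisites listed above.
REQUIRED = ["streamlit", "transformers", "torch", "tensorflow"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

This only checks that the modules can be located; it does not verify versions or GPU support.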

Running the web app

Clone the repository

$ git clone [repolink]

Running streamlit app

$ cd Streamlit

$ streamlit run paraphrase.py

Running the flask app

$ cd Server

$ python server.py

The first server call takes some time because it downloads the model parameters. Later calls are faster because the parameters are stored in the cache.

General Usage

PyTorch and TF models are available

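import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
# The model must live on the same device as its input tensors.
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws").to(device)

sentence = "This is something which i cannot understand at all"

# Prepend the "paraphrase: " task prefix used with this model.
text = "paraphrase: " + sentence + " </s>"

# Tokenize and move the tensors to the selected device.
encoding = tokenizer(text, padding=True, return_tensors="pt")
input_ids = encoding["input_ids"].to(device)
attention_mask = encoding["attention_mask"].to(device)

# Sample five candidate paraphrases with top-k / top-p sampling.
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5,
)

# Decode each generated sequence back into text.
for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)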
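
Because the example above samples its outputs (`do_sample=True` with `num_return_sequences=5`), the returned sequences can repeat each other or simply echo the input. A small post-processing step can filter those out. The following is a sketch only; `unique_paraphrases` is a hypothetical helper, not part of this repository or of the transformers library, and the sample strings are made up for illustration rather than real model output.

```python
def unique_paraphrases(candidates, source):
    """Drop duplicates and copies of the source sentence.

    Comparison ignores case and repeated whitespace; the first
    occurrence of each distinct candidate is kept, in order.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    seen = {normalize(source)}  # treat the input itself as already seen
    result = []
    for candidate in candidates:
        key = normalize(candidate)
        if key and key not in seen:
            seen.add(key)
            result.append(candidate.strip())
    return result


if __name__ == "__main__":
    source = "This is something which i cannot understand at all"
    decoded = [
        "I cannot understand this at all.",
        "i cannot  understand this at all.",
        "This is something which i cannot understand at all",
        "This is beyond me.",
    ]
    for line in unique_paraphrases(decoded, source):
        print(line)
```

In practice the list passed in would be the decoded strings collected from the loop over `outputs`.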

Built With

Streamlit - Fastest way for building data apps
Flask - Backend framework
Transformers-Huggingface - On a mission to solve NLP, one commit at a time. Transformers Library.

Authors

Sai Vamsi Alisetti

Acknowledgments

Sampath Kethineedi
