Code Monkey home page Code Monkey logo

prithivirajdamodaran / styleformer Goto Github PK

View Code? Open in Web Editor NEW
472.0 17.0 63.0 1.88 MB

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

License: Apache License 2.0

Python 100.00%
style-transfer slang informal-sentences formal-languages nlp active passive text-style-transfer text-style text-style-transfer-benchmark

styleformer's Introduction

PyPI - License visitors

Styleformer

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more.For instance, understand What makes text formal or casual/informal.

Table of contents

Usecases for Styleformer

Area 1: Data Augmentation

  • Augment training datasets with various fine-grained language styles.

Area 2: Post-processing

  • Apply style transfers to machine generated text.
  • e.g.
    • Refine a Summarised text to active voice + formal tone.
    • Refine a Translated text to more casual tone to reach younger audience.

Area 3: Controlled paraphrasing

  • Formal <=> Casual and Active <=> style transfers adds a notion of control over how we paraphrase when compared to free-form paraphrase where there is control or guarantee over the paraphrases.

Area 4: Assisted writing

  • Integrate this to any human writing interfaces like email clients, messaging tools or social media post authoring tools. Your creativity is your limit to te uses.
  • e.g.
    • Polish an email with business tone for professional uses.

Installation

pip install git+https://github.com/PrithivirajDamodaran/Styleformer.git

Quick Start

python [IMPORTANT] If you are using in notebook, use the below line to login: from huggingface_hub import notebook_login notebook_login() else use: huggingface-cli login

Casual to Formal (Available now !)

from styleformer import Styleformer
import torch
import warnings
warnings.filterwarnings("ignore")

'''
#uncomment for re-producability
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1234)
'''

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 0) 

source_sentences = [
"I am quitting my job",
"Jimmy is on crack and can't trust him",
"What do guys do to show that they like a gal?",
"i loooooooooooooooooooooooove going to the movies.",
"That movie was fucking awesome",
"My mom is doing fine",
"That was funny LOL" , 
"It's piece of cake, we can do it",
"btw - ur avatar looks familiar",
"who gives a crap?",
"Howdy Lucy! been ages since we last met.",
"Dude, this car's dope!",
"She's my bestie from college",
"I kinda have a feeling that he has a crush on you.",
"OMG! It's finger-lickin' good.",
]   

for source_sentence in source_sentences:
    target_sentence = sf.transfer(source_sentence)
    print("-" *100)
    print("[Casual] ", source_sentence)
    print("-" *100)
    if target_sentence is not None:
        print("[Formal] ",target_sentence)
        print()
    else:
        print("No good quality transfers available !")
[Casual]  I am quitting my job
[Formal]  I will be stepping down from my job.
----------------------------------------------------------------------------------------------------
[Casual]  Jimmy is on crack and can't trust him
[Formal]  Jimmy is a crack addict I cannot trust him
----------------------------------------------------------------------------------------------------
[Casual]  What do guys do to show that they like a gal?
[Formal]  What do guys do to demonstrate their affinity for women?
----------------------------------------------------------------------------------------------------
[Casual]  i loooooooooooooooooooooooove going to the movies.
[Formal]  I really like to go to the movies.
----------------------------------------------------------------------------------------------------
[Casual]  That movie was fucking awesome
[Formal]  That movie was wonderful.
----------------------------------------------------------------------------------------------------
[Casual]  My mom is doing fine
[Formal]  My mother is doing well.
----------------------------------------------------------------------------------------------------
[Casual]  That was funny LOL
[Formal]  That was hilarious
----------------------------------------------------------------------------------------------------
[Casual]  It's piece of cake, we can do it
[Formal]  The whole process is simple and is possible.
----------------------------------------------------------------------------------------------------
[Casual]  btw - ur avatar looks familiar
[Formal]  Also, your avatar looks familiar.
----------------------------------------------------------------------------------------------------
[Casual]  who gives a crap?
[Formal]  Who cares?
----------------------------------------------------------------------------------------------------
[Casual]  Howdy Lucy! been ages since we last met.
[Formal]  Hello, Lucy It has been a long time since we last met.
----------------------------------------------------------------------------------------------------
[Casual]  Dude, this car's dope!
[Formal]  I find this car very appealing.
----------------------------------------------------------------------------------------------------
[Casual]  She's my bestie from college
[Formal]  She is my best friend from college.
----------------------------------------------------------------------------------------------------
[Casual]  I kinda have a feeling that he has a crush on you.
[Formal]  I have a feeling that he is attracted to you.
----------------------------------------------------------------------------------------------------
[Casual]  OMG! It's finger-lickin' good.
[Formal]  It is so good, it is delicious.
----------------------------------------------------------------------------------------------------

Formal to Casual (Available now !)

from styleformer import Styleformer
import warnings
warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 1) 
import torch
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
"I would love to meet attractive men in town",
"Please leave the room now",
"It is a delicious icecream",
"I am not paying this kind of money for that nonsense",
"He is on cocaine and he cannot be trusted with this",
"He is a very nice man and has a charming personality",
"Let us go out for dinner",
"We went to Barcelona for the weekend. We have a lot of things to tell you.",
]   

for source_sentence in source_sentences:
    # inference_on = [-1=Regular model On CPU, 0-998= Regular model On GPU, 999=Quantized model On CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=-1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ",target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" *100)        
[Formal]  I would love to meet attractive men in town
[Casual]  i want to meet hot guys in town
----------------------------------------------------------------------------------------------------
[Formal]  Please leave the room now
[Casual]  leave the room now.
----------------------------------------------------------------------------------------------------
[Formal]  It is a delicious icecream
[Casual]  It is a yummy icecream
----------------------------------------------------------------------------------------------------
[Formal]  I am not paying this kind of money for that nonsense
[Casual]  But I'm not paying this kind of money for that crap
----------------------------------------------------------------------------------------------------
[Formal]  He is on cocaine and he cannot be trusted with this
[Casual]  he is on coke and he can't be trusted with this
----------------------------------------------------------------------------------------------------
[Formal]  He is a very nice man and has a charming personality
[Casual]  he is a really nice guy with a cute personality.
----------------------------------------------------------------------------------------------------
[Formal]  Let us go out for dinner
[Casual]  let's hang out for dinner.
----------------------------------------------------------------------------------------------------
[Formal]  We went to Barcelona for the weekend. We have a lot of things to tell you.
[Casual]  hehe..we went to barcelona for the weekend..we got a lot of things to tell ya..
----------------------------------------------------------------------------------------------------

Active to Passive (Available now !)

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 2) 

Passive to Active (Available now !)

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 3) 

Knobs

# inference_on = [-1=Regular model On CPU, 0-998= Regular model On GPU, 999=Quantized model On CPU]
target_sentence = sf.transfer(source_sentence, inference_on=-1, quality_filter=0.95, max_candidates=5)

Models

Model Type Status
prithivida/informal_to_formal_styletransfer Seq2Seq Beta
prithivida/formal_to_informal_styletransfer Seq2Seq Beta
prithivida/active_to_passive_styletransfer Seq2Seq Beta
prithivida/passive_to_active_styletransfer Seq2Seq Beta
prithivida/positive_to_negative_styletransfer Seq2Seq WIP
prithivida/negative_to_positive_styletransfer Seq2Seq WIP

Dataset

  • The casual <=> formal dataset was generated using ideas mentioned in reference paper 1
  • The positive <=> negative dataset was generated using ideas mentioned in reference paper 3
  • Fined tuned on T5 on a Tesla T4 GPU and it took ~2 hours to train each of the above models with batch_size = 16 and epochs = 5.(Will share training args shortly)

Benchmark

  • TBD

Streamlit Demo

pip install streamlit
streamlit run streamlit_app.py

References

Citation

  • TBD

styleformer's People

Contributors

creatorrr avatar ilnarselimcan avatar prithivirajdamodaran avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

styleformer's Issues

cant create a Styleformer(style=n)

it keeps throing the same error(diffrent request id)
OSError: There was a specific connection error when trying to load prithivida/informal_to_formal_styletransfer:
<class 'requests.exceptions.HTTPError'> (Request ID: K9-6-Ks5uMEai7cOcQ3gC)

Dataset & Retraining

If possible could you please share the training script with and the dataset, I would like to finetune this on a flan-t5-base by using specific tokens for specific task, this would reduce the double requirement for the gpu.

Usage of "adequacy_model" in the `def _active_to_passive()` and `def _passive_to_active()`

In the codes, an adequacy model is applied to score and rank the generated sentences.
(1) def _formal_to_casual(): https://github.com/PrithivirajDamodaran/Styleformer/blob/main/styleformer/styleformer.py#L69-L95
(2) def _casual_to_formal(): https://github.com/PrithivirajDamodaran/Styleformer/blob/main/styleformer/styleformer.py#L97-L123

May I know why it is not used in the def _active_to_passive() and def _passive_to_active()?

Thank you very much!

Sentiment Transfer

Love the library!

Was hoping to do sentiment transfer but I see that has not yet been integrated. Any pointers towards off the shelf models that can do that?

OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that....

Hi Prithviraj,

Fantastic work you are doing.

While testing your models, I intended to deploy the model wrapped in a flask app on EC2.

Although the results work on Google Colab, I receive the following error on EC2 -

OSError: Can't load config for 'prithivida/parrot_adequacy_on_BART'. Make sure that:

- 'prithivida/parrot_adequacy_on_BART' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'prithivida/parrot_adequacy_on_BART' is the correct path to a directory containing a config.json file

Can you guide me on how this can be resolved?

Regards,
Paritosh

Code to train the model

Hey,
Can you please share the code, where you train models?
We have tasks similar to issues you solve but in other domains. It might be very helpful for us.
Do you fine-tune only T5 or you make additional changes to T5 fine-tuning?
Thanks

Single model with multiple task prefixes

Just a quick question.

Given that Styleformer is based on the T5 model is it possible to have a single model that uses 4 different task prefixes as an option? Or would that require using a larger model than t5-base to maintain the same level of accuracy?

Trimming long sentences

Following the code snippet for a better understanding of the problem, I am facing.

from styleformer import Styleformer
import torch
import warnings
warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 0) 

source_sentences = [
                   "Corruption in african countries hinders economic, political and social development. It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable. In addition, corruption affects the lives of individuals, families and communities. The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption."
]

for paragraph in source_sentences:
    # sentences = sent_tokenize(paragraph)
    sentences = paragraph.split('. ')
    for source_sentence in sentences:
        target_sentence = sf.transfer(source_sentence)
        print("-" *100)
        print("[Casual] ", source_sentence)
        print("-" *100)
        if target_sentence is not None:
            print("[Formal] ",target_sentence)
            print()
        else:
            print("No good quality transfers available !")

Program Output


[Casual] Corruption in african countries hinders economic, political and social development

[Formal] In African countries, corruption affects economic, political, and social development.


[Casual] It is a major obstacle to economic growth, good governance and fundamental freedoms, such as freedom of speech or the right of citizens to hold governments accountable

[Formal] It's a major obstacle to economic growth, good governance, and fundamental freedoms, such as the freedom of speech or the right of citizens to


[Casual] In addition, corruption affects the lives of individuals, families and communities

[Formal] Additionally, corruption has a negative impact on individuals, families and communities.


[Casual] The 10th global corruption barometer (gcb) program - in africa, shows that while many people in africa feel that corruption is on the rise in their country, many also feel confident that, as citizens, they can make a difference in the fight against corruption.

[Formal] The tenth Global Corruptibility Barometer (GCB) program - in Africa - shows that while many people in Africa feel that corruption

Please help to fix this for longer sentences. Thanks in advance!

Issue with loading saved models

Hi,
I'm trying to save and load the tokenizer and model.
I use the following to save them:

tokenizer = AutoTokenizer.from_pretrained("prithivida/informal_to_formal_styletransfer")
tokenizer.save_pretrained('./data/style_tokenizer')
model = AutoModelForSeq2SeqLM.from_pretrained("prithivida/informal_to_formal_styletransfer")
model.save_pretrained('./data/style_model')

But when I try to load them, from the local path, I get the following error:

OSError: Can't load config for '../data/style_tokenizer'. Make sure that:

- '../data/style_tokenizer' is a correct model identifier listed on 'https://huggingface.co/models'

- or '../data/style_tokenizer' is the correct path to a directory containing a config.json file

This somehow makes sense since saving the vectorizer, no config.json is being created.

Any idea how can I save/load the tokenizer and model?

How to do inferencing using multiple GPU's for styleformer

I am using this model to do inferencing on 1 million data point using A100 GPU's with 4 GPU. I am launching a inference.py code using Googles vertex-ai Container.

How can I make inference code to utilise all 4 GPU's ? So that inferencing is super-fast.

Here is the same code I use in inference.py:

from styleformer import Styleformer
import warnings
warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active etc..]
sf = Styleformer(style = 1) 
import torch
def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
"I would love to meet attractive men in town",
"Please leave the room now",
"It is a delicious icecream",
"I am not paying this kind of money for that nonsense",
"He is on cocaine and he cannot be trusted with this",
"He is a very nice man and has a charming personality",
"Let us go out for dinner",
"We went to Barcelona for the weekend. We have a lot of things to tell you.",
]   

for source_sentence in source_sentences:
    # inference_on = [0=Regular model On CPU, 1= Regular model On GPU, 2=Quantized model On CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ",target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" *100)     

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.