calend.ai's Introduction

The @Calend_AI Twitter bot

This repository contains the source code I used to create @Calend_AI, a small toy Language Model I developed for fun in September 2021.

FAQ

Who is (or what is) @Calend_AI?

It is an automated Italian Twitter profile. It mimics the peculiar writing style and content of the Italian politician Carlo Calenda, former Italian Minister of Economic Development, former candidate for mayor of Rome, and currently an MEP.

How does it work?

It replies to tweets that mention @Calend_AI, as long as they are not themselves replies to other tweets and do not contain links, images, or videos.
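
The filtering rule above can be sketched as a small predicate. This is only an illustration, not the bot's actual code: it assumes that tweets arrive as dictionaries shaped like Twitter API v2 tweet objects (with the `referenced_tweets`, `entities.urls`, and `attachments.media_keys` fields), and `is_eligible` is a hypothetical name.

```python
def is_eligible(tweet: dict) -> bool:
    """Return True if a mentioning tweet should get a reply (hypothetical helper)."""
    # Skip tweets that are themselves replies to another tweet.
    for ref in tweet.get("referenced_tweets", []):
        if ref.get("type") == "replied_to":
            return False
    # Skip tweets that contain links.
    if tweet.get("entities", {}).get("urls"):
        return False
    # Skip tweets with attached media (images or videos).
    if tweet.get("attachments", {}).get("media_keys"):
        return False
    return True
```

A plain mention such as `{"text": "ciao @Calend_AI"}` passes the check, while a tweet carrying a `replied_to` reference, a URL entity, or a media key is skipped.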

Who created it?

I am Marco Roberti, a Ph.D. candidate at the Department of Computer Science, University of Turin. I created this bot partly for fun, partly for practice, and partly inspired by some chats with my old friend Federico.

What technologies does it use?

The bot is a Hugging Face 🤗 Transformers language model, obtained by fine-tuning the Italian T5 model developed by Gabriele Sarti (University of Groningen) on a dataset built specifically from @CarloCalenda's tweets.

HowTo

Requirements

I ran this code on Python 3.9, but it should be compatible with Python 3.8+.

Required libraries are listed in the requirements.txt file. Use one of the following commands to install them, depending on your environment:

pip install -r requirements.txt
# XOR
conda install --file requirements.txt

You'll also need to install and configure Twurl.

Dataset (download and pre-processing)

To download tweets in bulk, you'll need a Twitter developer account.

The following command will download and process tweets from October 18th, 2020 (date of Calenda's official candidacy for mayor of Rome) until today:

BEARER_TOKEN=<bearer_token> python3 tweet_downloader.py

You can change the starting date via the --start_date YYYY-MM-DD argument.
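
As a rough sketch of how such a command line could be handled, the snippet below parses a `--start_date` argument with the default date mentioned above and reads the token from the `BEARER_TOKEN` environment variable. It is not the actual content of tweet_downloader.py: the helper names (`parse_date`, `build_parser`, `get_bearer_token`) are hypothetical, and only the flag name, the environment variable, and the default date come from this README.

```python
import argparse
import os
from datetime import date, datetime

# Default taken from the text above: Calenda's official candidacy date.
DEFAULT_START = date(2020, 10, 18)


def parse_date(value: str) -> date:
    """Parse a YYYY-MM-DD string; argparse reports a clean error on bad input."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the --start_date option (illustrative)."""
    parser = argparse.ArgumentParser(description="Download tweets (illustrative sketch)")
    parser.add_argument("--start_date", type=parse_date, default=DEFAULT_START,
                        help="first day to download, as YYYY-MM-DD")
    return parser


def get_bearer_token(env=os.environ) -> str:
    """Read the API token from the environment, failing early if it is missing."""
    token = env.get("BEARER_TOKEN")
    if not token:
        raise RuntimeError("BEARER_TOKEN environment variable is not set")
    return token
```

Validating the date at parse time means a malformed value is rejected before any request is sent to the API.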

Training

The training script is a modified version of Hugging Face's run_summarization.py example script.

python3 main.py config/train.json
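
The upstream run_summarization.py accepts a single JSON file holding its arguments, so a configuration along the following lines is plausible. This is a sketch under assumptions: the key names are arguments of the upstream script, but every value here (paths, column names, sizes, and the model placeholder) is hypothetical, and the repository's actual config/train.json may differ because main.py is a modified version.

```python
import json

# Hypothetical values throughout; only the key names follow the upstream
# run_summarization.py arguments. The repository's real config may differ.
train_config = {
    "model_name_or_path": "<italian-t5-checkpoint>",  # placeholder, not a real model id
    "train_file": "data/train.json",             # assumed path
    "validation_file": "data/validation.json",   # assumed path
    "text_column": "tweet",                      # assumed column name
    "summary_column": "reply",                   # assumed column name
    "output_dir": "checkpoints/",                # assumed path
    "do_train": True,
    "do_eval": True,
    "predict_with_generate": True,
    "max_source_length": 128,
    "max_target_length": 128,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 3,
}


def dump_config(config: dict) -> str:
    """Serialize the configuration as indented JSON text."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(dump_config(train_config))
```

Here the original tweet plays the role of the source text and the reply plays the role of the summary, which is how a summarization script can be reused for reply generation.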

The model @Calend_AI is currently running is downloadable here. Its Tensorboard training log is available as well.

Interactive generation

To check your model offline and tune the config/generate.json file, you can use one of the interactive_*.py scripts.

Replying to custom tweets that you write on the fly:

python3 interactive_gen.py <checkpoint> config/generate.json -b config/blacklist.txt

Replying to tweets in the test set:

python3 interactive_test.py <checkpoint> config/generate.json --test_file data/test.json -b config/blacklist.txt

The -b config/blacklist.txt argument is optional on both scripts.

Interacting with Twitter

Once you have your optimal model and generation configuration, you can go online!

PYTHONPATH=. BEARER_TOKEN=<bearer_token> TOKENIZERS_PARALLELISM=true python3 bot/server.py <checkpoint> config/generate.json -b config/blacklist.txt
