

franklin-ford-bot's People

Contributors

cyruslk


franklin-ford-bot's Issues

sentence tokenization

Going back to last week's code: something I realised is that we're running the OCR every time we fetch from the URL. That's a lot of wasted computation; instead we should be able to:

1 - Run the OCR once, for all the texts.
2 - Pick a random string from the resulting .txt file using the split(".") operator.
3 - Then do the computation.

Just a small thing: since you're working with Watson, it probably has some kind of sentence tokenization feature. That might be better than simply using the split(".") operator?
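A rough sketch of what I mean (plain Node, no Watson; the helper names are mine): the naive split(".") approach next to a slightly smarter regex tokenizer that also honours "?" and "!", plus the random pick.

```javascript
// Naive approach: split on "." only, losing "?" / "!" boundaries.
function naiveSentences(text) {
  return text.split(".").map(s => s.trim()).filter(Boolean);
}

// Slightly smarter: split after ., ? or ! followed by whitespace,
// keeping the punctuation attached to each sentence.
function regexSentences(text) {
  return text
    .split(/(?<=[.?!])\s+/)
    .map(s => s.trim())
    .filter(Boolean);
}

function randomSentence(sentences) {
  return sentences[Math.floor(Math.random() * sentences.length)];
}

const text = "Advertising must grow. Will it? It will!";
console.log(regexSentences(text));
// → ["Advertising must grow.", "Will it?", "It will!"]
```

A real Watson pipeline would presumably replace regexSentences with whatever sentence boundaries the service returns.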

Q: Do we use the same sentence from Twitter to Reddit?

I assume we do use the same sentence, just computed differently for each platform, but I want to make sure with you. Using the same sentence is more interesting because then we have one platform where the sentence is output without any targeting (Twitter, no #), and then the propagation of that same sentence on Reddit; something we could easily use for visualisation (I'll start thinking about this soon!) on the bot's website.

Q: How do we want to pick a specific subreddit?

How do we go through Reddit's list of subreddits and select a thread where our fragment will be posted? For now I see two techniques we could explore:

  1. Go through the picked sentence, map through all its words, and for each word check whether it matches one of the subreddit names/titles. If the bot returns just "The gathering of advertising to a given trade paper must increase 1/2", the sentence could be posted inside r/thegathering, r/advertising, and others. Recursion and propagation mode.

  2. Go through the picked sentence, run a Watson script to interpret it (Tone Analyzer, Natural Language Understanding...) and see if its interpretation matches one of the subreddit names/titles. Trickier, less brutal; since the sentence might be quasi-gibberish, Watson might have difficulty finding a meaning.
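Technique 1 could be sketched like this (the subreddit list is a made-up placeholder; a real version would come from Reddit's API):

```javascript
// Hypothetical list of candidate subreddit names (lowercase, no "r/").
const subreddits = ["thegathering", "advertising", "history"];

// A subreddit matches if its name contains one of the sentence's
// words; short stopwords ("the", "of", "to"...) are skipped.
function matchingSubreddits(sentence, subs) {
  const words = sentence
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .split(/\s+/)
    .filter(Boolean);
  return subs.filter(sub => words.some(w => w.length > 3 && sub.includes(w)));
}

console.log(
  matchingSubreddits(
    "The gathering of advertising to a given trade paper must increase",
    subreddits
  )
);
// → ["thegathering", "advertising"]
```

The substring match is deliberately loose (so "gathering" hits r/thegathering); exact-word matching would be stricter and miss those composites.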

Questions (and answers about the bot)

> When does it tweet? What's its frequency?
> The bot last tweeted on January 21st. Before that it was tweeting almost every day. It seems quite unstable; is it on purpose?

It tweets at random intervals ranging between 18,000 seconds (5 hours) and 176,400 seconds (49 hours). Sometimes it stops because my Raspberry Pi crashed (for reasons I have not investigated) and I have to restart everything manually (just like I did today, Feb 20, 2019).
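The scheduling described above amounts to something like this (function names are illustrative, not the bot's actual code):

```javascript
const MIN_SECONDS = 18000;  // 5 hours
const MAX_SECONDS = 176400; // 49 hours

// Uniform random delay in [MIN_SECONDS, MAX_SECONDS].
function randomDelaySeconds(min = MIN_SECONDS, max = MAX_SECONDS) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Tweet once, then re-arm the timer with a fresh random delay.
function scheduleNextTweet(tweetFn) {
  const delay = randomDelaySeconds();
  setTimeout(() => {
    tweetFn();
    scheduleNextTweet(tweetFn);
  }, delay * 1000);
}
```

A setTimeout chain like this lives only in memory, which is consistent with the behaviour above: when the Pi crashes, the schedule dies with the process.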

> When does the bot reply to its own posts?
> I see that the bot sometimes replies to its own posts and sometimes doesn't. This seems quite unstable too. Is there a computational procedure behind this? Random(), maybe?

Replies are for longer threads, i.e. sentences that are longer than 140 characters. (The bot proudly sticks to the old 140-character limit; that's quite old-fashioned.)

> Why is the bot splitting its comments in two sections (separated by 1/2 and 2/2)?
> Is there a computational logic/procedure behind this? Is it meaningful to Ford?

Same as the previous answer: the bot just splits sentences that are too long into several parts.
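The splitting could look roughly like this (a sketch with invented names, assuming the part count stays single-digit):

```javascript
// Cut a sentence over the limit into numbered parts ("… 1/2", "… 2/2").
function splitForTwitter(sentence, limit = 140) {
  if (sentence.length <= limit) return [sentence];
  // Reserve room for a " i/n" suffix (assumes n is a single digit).
  const suffixLength = " 0/0".length;
  const chunkSize = limit - suffixLength;
  const chunks = [];
  for (let i = 0; i < sentence.length; i += chunkSize) {
    chunks.push(sentence.slice(i, i + chunkSize));
  }
  return chunks.map((c, i) => `${c} ${i + 1}/${chunks.length}`);
}
```

This slices mid-word; splitting on the last space before the limit would read better but complicates the counting.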

Ford bot x machine learning

Text-generation model?
This is quite interesting. The idea would basically be to train our own model (based on the writings of Ford) to create new sorts of linguistic forms. Here's the ml5 tutorial: https://ml5js.org/docs/training-lstm. Based on inputs entered by the website's user, 'artificial' text coming from the neural network (trained on the writings of Ford) would be displayed. Simple interaction: the user changes a set of buttons -> 'artificial' text appears.

-> These input parameters could also be defined by the machine, randomly or not.

Here is what one of the ml5 default models looks like, trained on the Harry Potter texts. The first step would therefore be to concatenate all the writings of Ford located in the working repository into one single .txt file.

Interactive Text Generation LSTM
In this interactive demo you ask the LSTM: "Starting with the seed text, predict what text might come next based on the pre-trained Ernest Hemingway model." Changing the length changes the number of characters in the resulting predicted text.
This would definitely add something more interactive to the project. The idea is to have an autocomplete field where the user types some words and the machine completes the sentence based on the model (aka the writings of Ford).

I love the idea of a Fordian autocomplete. Or could we find a way to generate new (= predicted) text in order to answer questions (the user types a question, the machine replies)?

Chrome extension = great

> Creating a chrome extension?
> 
> I don't think we talked about this when we met, but it's something I've had in mind recently. I don't really envision this as a main asset of the project but as a fun exploration on its own. The idea is to give the user of this plugin (which you install in your browser) an updated version of all the webpages they visit.
> With chrome extensions, specific (or all) words, images, videos, advertisements (and so on) can be changed. All kinds of content can also be added inside the webpage. This one, for example, replaces the word santa with the word satan. You get the idea ;-)
> Chrome extensions can also operate only on a specific website, or on a list of websites.
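A minimal sketch of such a content script (hypothetical file, using the santa -> satan swap from the extension linked above; a Ford version would plug in its own dictionary):

```javascript
// content.js (hypothetical): swap words in every text node of the page.

// Pure replacement rule, kept separate so it can be tested on its own.
function fordify(text) {
  return text.replace(/\bsanta\b/gi, "satan");
}

// TreeWalker visits only text nodes, so markup, attributes and
// scripts are left untouched.
function replaceInTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    node.nodeValue = fordify(node.nodeValue);
  }
}

// Only run in a browser context; a manifest "content_scripts" entry
// (matching <all_urls> or a site list) would load this on each page.
if (typeof document !== "undefined") {
  replaceInTextNodes(document.body);
}
```

Restricting the manifest's match patterns is how you'd get the "only for a specific website" behaviour mentioned above.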

I want to see this on a news site! E.g. the New York Times frontpage, but with Ford-related content.
