

franklin-ford-bot's People

Contributors

cyruslk


franklin-ford-bot's Issues

sentence tokenization

Going back to last week's code: something I realised is that we're running the OCR every time we fetch from the URL. That's a lot of wasted computation; instead we should be able to:

1 - Run the OCR once, for all the texts.
2 - Pick a random string from the resulting .txt file using the split(".") operator.
3 - Then do the computation.

Just a small thing: since you're working with Watson, it probably has some kind of sentence tokenization feature. That might be better than simply using the split(".") operator?
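A rough sketch of what I mean (plain Node, no Watson; the helper names are mine): the naive split(".") approach next to a slightly smarter regex tokenizer that also honours "?" and "!", plus the random pick.

```javascript
// Naive approach: split on "." only, losing "?" / "!" boundaries.
function naiveSentences(text) {
  return text.split(".").map(s => s.trim()).filter(Boolean);
}

// Slightly smarter: split after ., ? or ! followed by whitespace,
// keeping the punctuation attached to each sentence.
function regexSentences(text) {
  return text
    .split(/(?<=[.?!])\s+/)
    .map(s => s.trim())
    .filter(Boolean);
}

function randomSentence(sentences) {
  return sentences[Math.floor(Math.random() * sentences.length)];
}

const text = "Advertising must grow. Will it? It will!";
console.log(regexSentences(text));
// → ["Advertising must grow.", "Will it?", "It will!"]
```

A real Watson pipeline would presumably replace regexSentences with whatever sentence boundaries the service returns.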

Q: Do we use the same sentence from Twitter to Reddit?

I assume we do use the same sentence, just computed differently for each platform, but I want to make sure with you. Using the same sentence is more interesting because then we have one platform where the sentence is output without any targeting (Twitter, no #), and then the propagation of that same sentence on Reddit; something we could easily use for visualisation (I'll start thinking about this soon!) on the bot's website.

Q: How do we want to pick a specific subreddit?

How do we go through Reddit's list of subreddits and select a thread where our fragment will be posted? For now I see two techniques we could explore:

  1. Go through the picked sentence, map through all its words, and for each word check whether it matches one of the subreddit names/titles. If the bot returns just "The gathering of advertising to a given trade paper must increase 1/2", the sentence could be posted inside r/thegathering, r/advertising, and others. Recursion and propagation mode.

  2. Go through the picked sentence, run a Watson script to interpret it (Tone Analyzer, Natural Language Understanding...) and see if its interpretation matches one of the subreddit names/titles. Trickier, less brutal; since the sentence might be quasi-gibberish, Watson might have difficulty finding a meaning.
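Technique 1 could be sketched like this (the subreddit list is a made-up placeholder; a real version would come from Reddit's API):

```javascript
// Hypothetical list of candidate subreddit names (lowercase, no "r/").
const subreddits = ["thegathering", "advertising", "history"];

// A subreddit matches if its name contains one of the sentence's
// words; short stopwords ("the", "of", "to"...) are skipped.
function matchingSubreddits(sentence, subs) {
  const words = sentence
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .split(/\s+/)
    .filter(Boolean);
  return subs.filter(sub => words.some(w => w.length > 3 && sub.includes(w)));
}

console.log(
  matchingSubreddits(
    "The gathering of advertising to a given trade paper must increase",
    subreddits
  )
);
// → ["thegathering", "advertising"]
```

The substring match is deliberately loose (so "gathering" hits r/thegathering); exact-word matching would be stricter and miss those composites.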

Questions (and answers about the bot)

> When does it tweet? What's its frequency?
> The bot last tweeted on January 21st. Before that it was tweeting almost every day. It seems quite unstable; is it on purpose?

It tweets at random intervals ranging between 18,000 seconds (5 hours) and 176,400 seconds (49 hours). Sometimes it stops because my Raspberry Pi crashed (for reasons I have not investigated) and I have to restart everything manually (just like I did today, Feb 20, 2019).
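The scheduling described above amounts to something like this (function names are illustrative, not the bot's actual code):

```javascript
const MIN_SECONDS = 18000;  // 5 hours
const MAX_SECONDS = 176400; // 49 hours

// Uniform random delay in [MIN_SECONDS, MAX_SECONDS].
function randomDelaySeconds(min = MIN_SECONDS, max = MAX_SECONDS) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Tweet once, then re-arm the timer with a fresh random delay.
function scheduleNextTweet(tweetFn) {
  const delay = randomDelaySeconds();
  setTimeout(() => {
    tweetFn();
    scheduleNextTweet(tweetFn);
  }, delay * 1000);
}
```

A setTimeout chain like this lives only in memory, which is consistent with the behaviour above: when the Pi crashes, the schedule dies with the process.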

> When does the bot reply to its own posts?
> I see that the bot sometimes replies to its own posts and sometimes doesn't. This seems quite unstable too. Is there a computational procedure behind this? Random(), maybe?

Replies are for longer threads, i.e. sentences that are longer than 140 characters. (The bot proudly sticks to the old 140-character limit; that's quite old-fashioned.)

> Why is the bot splitting its comments in two sections (separated by 1/2 and 2/2)?
> Is there a computational logic/procedure behind this? Is it meaningful to Ford?

Same as the previous answer: the bot just splits sentences that are too long into several parts.
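The splitting could look roughly like this (a sketch with invented names, assuming the part count stays single-digit):

```javascript
// Cut a sentence over the limit into numbered parts ("… 1/2", "… 2/2").
function splitForTwitter(sentence, limit = 140) {
  if (sentence.length <= limit) return [sentence];
  // Reserve room for a " i/n" suffix (assumes n is a single digit).
  const suffixLength = " 0/0".length;
  const chunkSize = limit - suffixLength;
  const chunks = [];
  for (let i = 0; i < sentence.length; i += chunkSize) {
    chunks.push(sentence.slice(i, i + chunkSize));
  }
  return chunks.map((c, i) => `${c} ${i + 1}/${chunks.length}`);
}
```

This slices mid-word; splitting on the last space before the limit would read better but complicates the counting.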

Ford bot x machine learning

Text-generation model?
This is quite interesting. The idea would basically be to train our own model (based on the writings of Ford) to create new sorts of linguistic forms. Here's the ml5 tutorial: https://ml5js.org/docs/training-lstm. Based on inputs entered by the website's user, 'artificial' text coming from the neural network (trained on the writings of Ford) would be displayed. Simple interaction: the user changes a set of buttons -> 'artificial' text appears.

-> These input parameters could also be defined by the machine, randomly or not.

Here is what one of the ml5 default models looks like, trained on the Harry Potter texts. The first step would therefore be to concatenate all the writings of Ford located in the working repository into one single .txt file.

Interactive Text Generation LSTM
In this interactive demo you ask the LSTM: "Starting with the seed text, predict what text might come next based on the pre-trained Ernest Hemingway model." Changing the length changes the number of characters in the resulting predicted text.
This would definitely add something more interactive to the project. The idea is to have an autocomplete field where the user types some words and the machine completes the sentence based on the model (aka the writings of Ford).

I love the idea of a Fordian autocomplete. Or could we find a way to generate new (= predicted) text in order to answer questions (the user types a question, the machine replies)?

Chrome extension = great

> Creating a chrome extension?
> 
> I don't think we talked about this when we met, but it's something I've had in mind recently. I don't really envision this as a main asset of the project but as a fun exploration on its own. The idea is to give the user of this plugin (which you install in your browser) an updated version of all the webpages they visit.
> With chrome extensions, specific (or all) words, images, videos, advertisements (and so on) can be changed. All kinds of content can also be added inside the webpage. This one, for example, replaces the word santa with the word satan. You get the idea ;-)
> Chrome extensions can also operate only on a specific website, or on a list of websites.
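A minimal sketch of such a content script (hypothetical file, using the santa -> satan swap from the extension linked above; a Ford version would plug in its own dictionary):

```javascript
// content.js (hypothetical): swap words in every text node of the page.

// Pure replacement rule, kept separate so it can be tested on its own.
function fordify(text) {
  return text.replace(/\bsanta\b/gi, "satan");
}

// TreeWalker visits only text nodes, so markup, attributes and
// scripts are left untouched.
function replaceInTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    node.nodeValue = fordify(node.nodeValue);
  }
}

// Only run in a browser context; a manifest "content_scripts" entry
// (matching <all_urls> or a site list) would load this on each page.
if (typeof document !== "undefined") {
  replaceInTextNodes(document.body);
}
```

Restricting the manifest's match patterns is how you'd get the "only for a specific website" behaviour mentioned above.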

I want to see this on a news site! E.g. the New York Times frontpage, but with Ford-related content.
