twint-ng's Introduction

twint-ng

Twint core component; new lite version due to Twitter Legacy removal

twint-ng's People

Contributors

pielco11

twint-ng's Issues

functional design

I would like to start off by proposing (and agreeing on) a functional design for the new tool. I have already written something up and will post it in this issue over the next day or so.

I think the combination of a package and a command line tool is something we absolutely want to keep.

  • Propose removing databases, ES, and translations. Those can be handled by different tools.
  • Do we really need all the async stuff, especially if we won't be doing database access? I think the code would be much easier to maintain if we use Requests.
  • Python3 only

Below is my braindump. Please respond in the comments with what you think!

Output

Output will be written to stdout (default) or file.

Command line arguments:

-o <filename> output to file
--csv Write as .csv format
--json Write as .jsonlines format (independent json object on every line)
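
As a rough illustration of the two output formats, here is a minimal sketch. The function name `write_records`, its parameters, and the idea that each Tweet is a plain dict are assumptions made for this example, not part of the proposal.

```python
import csv
import json
import sys


def write_records(records, stream=sys.stdout, fmt="json"):
    """Write an iterable of dicts to `stream` as JSON Lines or CSV.

    Hypothetical helper: `fmt` is "json" (one object per line) or "csv".
    """
    records = list(records)
    if fmt == "json":
        for record in records:
            # One independent JSON object on every line.
            stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    elif fmt == "csv":
        if not records:
            return
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(
            stream, fieldnames=list(records[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
    else:
        raise ValueError(f"unknown format: {fmt}")
```

With `-o <filename>`, the same function could be handed an open file object instead of `sys.stdout`.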

Error messages and debugging

Errors and informational messages will be output to stderr (default) or file.

Command line arguments:

-v enable verbose output (loglevel info)
-vv enable debug logging (loglevel debug)
-q disable error output completely (loglevel none)
-l <filename> logfile instead of stderr
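
One possible way to map these flags onto Python's standard `logging` module is sketched below. The function names, the `dest` names, and the default level when no flag is given (errors only) are assumptions for illustration.

```python
import argparse
import logging
import sys


def build_parser():
    """Parser covering only the logging flags listed above."""
    parser = argparse.ArgumentParser(prog="twint-ng")
    # action="count" makes -v yield 1 and -vv yield 2.
    parser.add_argument("-v", dest="verbosity", action="count", default=0)
    parser.add_argument("-q", dest="quiet", action="store_true")
    parser.add_argument("-l", dest="logfile", metavar="FILENAME")
    return parser


def log_level(verbosity, quiet):
    """Translate the parsed flags into a logging level."""
    if quiet:
        # Higher than every standard level, so nothing is emitted.
        return logging.CRITICAL + 1
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.ERROR  # assumed default: errors only


def configure_logging(args):
    """Send log records to the logfile if given, otherwise to stderr."""
    if args.logfile:
        handler = logging.FileHandler(args.logfile)
    else:
        handler = logging.StreamHandler(sys.stderr)
    logging.basicConfig(
        level=log_level(args.verbosity, args.quiet), handlers=[handler]
    )
```

Keeping the level mapping in its own function makes it easy to reuse when the package is imported rather than run from the command line.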

--count Display number of Tweets scraped at the end of session.
--stats Show number of replies, retweets, and likes.

Cloaking and rate limiting options

The tool might need to be able to circumvent most measures taken by Twitter.

  • configurable user agent (not for now)
  • proxy support
  • rate limit handling

Command line arguments:

-ua <user agent>
-uafile <filename> (with ua strings, one per line, tool will rotate through them)

-proxy <proxyurl>
-proxyfile <filename> (with proxyurls, one per line, tool will rotate through them)

TBD rate limiting, for instance backoff exponent, min/max/random wait time
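
To make the rotation and backoff ideas concrete, here is a minimal sketch. Everything in it (the function names, the parameters `base`, `factor`, `max_wait`, `jitter`, and their defaults) is hypothetical, since the rate limiting design is still TBD.

```python
import itertools
import random


def load_rotation(lines):
    """Return an endless iterator over the non-empty lines.

    `lines` can be an open -uafile or -proxyfile, or any iterable of strings.
    """
    entries = [line.strip() for line in lines if line.strip()]
    if not entries:
        raise ValueError("rotation file contains no entries")
    return itertools.cycle(entries)


def backoff_delay(attempt, base=1.0, factor=2.0, max_wait=60.0, jitter=0.0):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay grows exponentially, is capped at `max_wait`, and gets up to
    `jitter` seconds of random extra wait added on top.
    """
    delay = min(base * factor ** attempt, max_wait)
    return delay + random.uniform(0, jitter)
```

A caller would take `next(rotation)` before each request and sleep for `backoff_delay(attempt)` after a rate limit response.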

Searching and filtering

I consolidated all command line args that have to do with searching and filtering. I think we need to keep the search params (i.e. those that send a different request to Twitter) and remove the filters (i.e. those that remove things that are in the output). Filtering can be done by an external program.

Can someone with more internal knowledge maybe split these args into those two groups?

--to USERNAME Search Tweets to a user.
--all USERNAME Search all Tweets associated with a user.
--favorites Scrape Tweets a user has liked.

-nr, --native-retweets Filter the results for retweets only.
--min-likes MIN_LIKES Filter the tweets by minimum number of likes.
--min-retweets MIN_RETWEETS Filter the tweets by minimum number of retweets.
--min-replies MIN_REPLIES Filter the tweets by minimum number of replies.
--links LINKS Include or exclude tweets containing one or more links. If not specified, you will get tweets both with and without links.
--source SOURCE Filter the tweets for a specific source client.
--members-list MEMBERS_LIST Filter the tweets sent by users in a given list.
-fr, --filter-retweets Exclude retweets from the results.
--videos Display only Tweets with videos.
--images Display only Tweets with images.
--media Display Tweets with only images or videos.
--retweets Include user's Retweets (Warning: limited).

--email Filter Tweets that might have email addresses
--phone Filter Tweets that might have phone numbers
--verified Display Tweets only from verified users (Use with -s).
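
To show what "filtering by an external program" could look like, here is a small sketch that filters JSON Lines output on a numeric field, as a stand-in for something like --min-likes. The function name and the field name `likes_count` are assumptions; the real output schema has not been decided.

```python
import json


def filter_min(lines, field, minimum):
    """Yield parsed records whose numeric `field` is at least `minimum`.

    `lines` is any iterable of JSON Lines text, for example sys.stdin.
    Records that lack the field are treated as having a value of 0.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(field, 0) >= minimum:
            yield record
```

A wrapper script could read from `sys.stdin` and print each surviving record with `json.dumps`, so that it can sit at the end of a shell pipeline fed by the tool's --json output.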
