
getoldtweets-r's Introduction

Get Old Tweets Programmatically

A project written in R to retrieve old tweets, bypassing some limitations of the official Twitter API.

Details

The official Twitter API has a bothersome time constraint: you cannot retrieve tweets older than a week. Some tools provide access to older tweets, but most of them require payment. While searching for alternatives, I found "https://github.com/Jefferson-Henrique/GetOldTweets-python", but that application has some limitations: data cannot be fetched reliably at all times, it is difficult to extract information such as geolocation, retweet count, and quotations, and there are problems with some languages (such as Turkish).

Faced with these challenges, I found a way to fetch Twitter data from the past by combining knowledge of the JSON API with Twitter's search calls, taking inspiration from that application.

Prerequisites

library(rtweet)
library(rvest)
library(jsonlite)

You may also need to create an app on the Twitter developer portal to obtain the "consumer_key, consumer_secret, access_token, access_secret" values.

create_token(
  app = "dummy",
  consumer_key = "xxxxxxxxxxxxx",
  consumer_secret = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
  access_token = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz",
  access_secret = "kkkkkkkkkkkkkkkkkkkkkkkkkkkk")

Input parameters

startdate <- "2020-06-01" # A lower bound date to restrict the search.
enddate <- "2020-07-01"   # An upper bound date to restrict the search.
language <- "en"          # Restrict the search to tweets in a specific language.
ntweets <- 1000           # The maximum number of tweets to retrieve.
searchTerm <- "donald trump"
searchTerm <- "(from%3ArealDonaldTrump)" # for a user search

Add source filter

searchTerm <- "donald trump"
searchbox <- URLencode(searchTerm)
# Apply at most one of the following source filters.
# To restrict to tweets sent from the iPhone app:
searchbox <- paste0(searchbox,"%20AND%20source%3A\"Twitter%20for%20iPhone\"")
# To restrict to tweets sent from the Android app:
searchbox <- paste0(searchbox,"%20AND%20source%3A\"Twitter%20for%20Android\"")
# To restrict to tweets sent from the web app:
searchbox <- paste0(searchbox,"%20AND%20source%3A\"Twitter%20Web%20App\"")
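Putting the pieces together, the encoded term and a source filter slot into the same timeline URL that the full example below builds; the dates and language here are illustrative values, and only the string assembly is shown (no request is made):

```r
# Sketch: assemble the search query as in the full example,
# with the iPhone source filter applied.
searchTerm <- "donald trump"
searchbox  <- URLencode(searchTerm)  # spaces become %20
searchbox  <- paste0(searchbox, "%20AND%20source%3A\"Twitter%20for%20iPhone\"")

temp_url <- paste0("https://twitter.com/i/search/timeline?f=tweets&q=",
                   searchbox,
                   "%20since%3A", "2020-06-01",
                   "%20until%3A", "2020-07-01",
                   "&l=", "en", "&src=typd&max_position=")
```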

Examples of R usage

library(rtweet)
library(rvest)
library(jsonlite)

create_token(
  app = "dummy",
  consumer_key = "xxxxxxxxxxxxx",
  consumer_secret = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
  access_token = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz",
  access_secret = "kkkkkkkkkkkkkkkkkkkkkkkkkkkk")

# Input parameters
startdate <- "2014-01-01"
enddate <- "2015-01-01"
language <- "en"
ntweets <- 100
searchTerm <- "donald trump"
# searchTerm <- "(from%3ArealDonaldTrump)" # for a user search
searchbox <- URLencode(searchTerm)
# convert to url
temp_url <- paste0("https://twitter.com/i/search/timeline?f=tweets&q=",searchbox,"%20since%3A",startdate,"%20until%3A",enddate,"&l=",language,"&src=typd&max_position=")
webpage <- fromJSON(temp_url)
if (webpage$new_latent_count > 0) {
  # Extract tweet ids from the returned HTML fragment
  tweet_ids <- read_html(webpage$items_html) %>%
    html_nodes('.js-stream-tweet') %>%
    html_attr('data-tweet-id')
  breakFlag <- FALSE
  # Page through the timeline until there are no more items
  while (webpage$has_more_items) {
    tryCatch({
      min_position <- webpage$min_position
      next_url <- paste0(temp_url, min_position)
      webpage <- fromJSON(next_url)
      next_tweet_ids <- read_html(webpage$items_html) %>%
        html_nodes('.js-stream-tweet') %>%
        html_attr('data-tweet-id')
      next_tweet_ids <- next_tweet_ids[!is.na(next_tweet_ids)]
      tweet_ids <- unique(c(tweet_ids, next_tweet_ids))
      if (length(tweet_ids) >= ntweets) {
        breakFlag <- TRUE
      }
    },
    error = function(cond) {
      message(paste("URL does not seem to exist:", next_url))
      message("Here's the original error message:")
      message(cond)
      breakFlag <<- TRUE
    })

    if (breakFlag) {
      break
    }
  }
  tweets <- lookup_tweets(tweet_ids, parse = TRUE, token = NULL)
  # df <- apply(tweets,2,as.character)
  # write.csv(df, file = "tweets.csv", row.names = F)
} else {
  message("There are no tweets for this search term!")
}
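`lookup_tweets()` returns a data frame, so retweets and replies can be dropped afterwards. A minimal sketch, assuming the pre-1.0 rtweet column names `is_retweet` and `reply_to_status_id`, shown here on a toy data frame standing in for the real result:

```r
# Toy stand-in for the data frame returned by lookup_tweets()
tweets <- data.frame(
  status_id          = c("1", "2", "3"),
  text               = c("original", "a retweet", "a reply"),
  is_retweet         = c(FALSE, TRUE, FALSE),
  reply_to_status_id = c(NA, NA, "99"),
  stringsAsFactors   = FALSE
)

# Keep only original tweets: not retweets and not replies
originals <- tweets[!tweets$is_retweet & is.na(tweets$reply_to_status_id), ]
```

If your rtweet version uses different column names, adjust the filter accordingly.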

getoldtweets-r's People

Contributors

sarikayamehmet


getoldtweets-r's Issues

problem with fromJSON

Hi,

Using your code I get the following error after the line - webpage <- fromJSON(temp_url):
Error in open.connection(con, "rb") : HTTP error 404.

I understand this means that the page doesn't exist. (It happens on two machines running Windows and Ubuntu)

If I use a query that I can see in a browser such as:
temp_url <- "https://twitter.com/search?f=tweets&q=donald%20trump%20since%3A2014-01-01%20until%3A2015-01-01&l=en&src=typd&max_position="

then webpage <- fromJSON(temp_url) gives me this error:
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
<html dir="ltr"
(right here) ------^

I guess the problem in this case is that the browser friendly query is not in json format?
Is it possible that Twitter changed the type of query you used?
Can you please explain how you get the URL or how can we adapt it when Twitter changes it?
Thanks!

Not all tweets gathered

I tried searching for tweets by one user with searchTerm <- "@TwitterName". Apart from receiving a lot of tweets that simply replied to @TwitterName, which I sorted out afterwards, I noticed that not all tweets by @TwitterName were returned. I checked the return limits but was still well below 1000. Any ideas? Or is there another way to search for one user's tweets precisely, without having to filter them after the search?

Need only user tweets - help?

Hello, and thank you for sharing!

The script works fine. The problem is that when I run it, it collects all of a user's tweets, retweets, and comments. When I tried to run the script for a year of data, it got me 1000 tweets, which is fine so far, but I end up with only two months of tweets because it fetches everything (tweets, retweets, and comments).

Is it possible to get only the user's own tweets? How can I do that?

Thanks again
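Twitter's search syntax has -filter:retweets and -filter:replies operators; assuming the timeline endpoint honors them (not verified here), the search term could be extended the same way the README hand-encodes its other operators:

```r
# Hypothetical: exclude retweets and replies via Twitter search operators
# ("-filter:retweets -filter:replies"), URL-encoded by hand as elsewhere
# in this README.
searchTerm <- "(from%3ArealDonaldTrump)"
searchbox  <- paste0(searchTerm, "%20-filter%3Aretweets%20-filter%3Areplies")
```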

no output

Hello,

The code works fine, but when I tried to replicate the example provided here, the tweets file was empty. Could you please help with that issue?

Also, as far as I know there are many libraries for getting old tweets in Python, but this is the first tool I have come across in R; it will be very useful once I can extract old tweets :)

Thanks in advance

Cagla

created_at column values in the csv file and df table

First of all, thanks for your script; it worked perfectly. The problem I'm getting is related to the values in the created_at column. I get values like 1420070245, but I have no idea how to convert this number to a Twitter time string.
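Values like 1420070245 look like Unix epoch seconds; assuming that is what the column holds, base R can convert them to a readable timestamp:

```r
# Interpret created_at as seconds since the Unix epoch (an assumption)
created_at <- 1420070245
timestamp  <- as.POSIXct(created_at, origin = "1970-01-01", tz = "UTC")
format(timestamp)  # "2014-12-31 23:57:25"
```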
