vosonlab / vosonSML
R package for collecting social media data and creating networks for analysis.
Home Page: https://vosonlab.github.io/vosonSML/
License: GNU General Public License v3.0
In Version 0.32.7 (on R 4.2.2 with rtweet 1.1.0), I receive an error when executing a simple Twitter Collect:
twitter_data <- twitterAuth %>%
Collect(searchTerm = "#auspol",
searchType = "recent",
numTweets = 1000,
lang = "en",
includeRetweets = TRUE,
writeToFile = TRUE)
Error message:
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
---
Backtrace:
1. ... %>% ...
4. vosonSML:::Collect.search.twitter(...)
5. vosonSML:::import_rtweet_(df_tweets)
6. dplyr::bind_cols(...)
9. vctrs::vec_cbind(!!!dots, .name_repair = .name_repair)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
---
Backtrace:
▆
1. ├─... %>% ...
2. ├─vosonSML::Collect(...)
3. ├─vosonSML:::Collect.twitter(...)
4. ├─vosonSML:::Collect.search.twitter(...)
5. │ └─vosonSML:::import_rtweet_(df_tweets)
6. │ └─dplyr::bind_cols(...)
7. │ ├─dplyr:::fix_call(vec_cbind(!!!dots, .name_repair = .name_repair))
8. │ │ └─base::withCallingHandlers(...)
9. │ └─vctrs::vec_cbind(!!!dots, .name_repair = .name_repair)
10. └─vctrs::stop_incompatible_size(...)
11. └─vctrs:::stop_incompatible(...)
12. └─vctrs:::stop_vctrs(...)
13. └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
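For reference, the recycling failure can be reproduced outside vosonSML. This is a minimal sketch of the vctrs rule that only size-1 inputs are recycled; the 5- and 3-row frames below stand in for the 1000-tweet and 43-user tables, they are not the package's actual data:

```r
library(dplyr)

# bind_cols() only recycles size-1 inputs; two data frames with
# different row counts cannot be combined, which is what happens
# when the tweet and user tables disagree in length.
a <- data.frame(x = 1:5)
b <- data.frame(y = 1:3)
err <- tryCatch(bind_cols(a, b), error = function(e) class(e))
print("vctrs_error_incompatible_size" %in% err)
```

This suggests the rtweet 1.x return format no longer lines up row-for-row with what `import_rtweet_()` expects.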
In Versions 0.30.6 and 0.29.13, I receive a slightly different error when executing the same code:
Collecting tweets for search query...
Search term: #auspol
Unable to determine rate limit.
Requested 1000 tweets.
Error in `dplyr::filter()`:
! Problem while computing `..1 = .data$status_id %in% ...`.
Caused by error in `.data$status_id`:
! Column `status_id` not found in `.data`.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
`create_token()` was deprecated in rtweet 1.0.0.
ℹ See vignette('auth') for details
ℹ The deprecated feature was likely used in the vosonSML package.
Please report the issue at <https://github.com/vosonlab/vosonSML/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
> rlang::last_error()
<error/rlang_error>
Error in `dplyr::filter()`:
! Problem while computing `..1 = .data$status_id %in% ...`.
Caused by error in `.data$status_id`:
! Column `status_id` not found in `.data`.
The same code works fine on R 4.1.2 with vosonSML 0.29.13 and vosonSML 0.30.6.
On R 4.4.0 with vosonSML 0.34.2, Collect(<youtube>) and Collect(<thread.reddit>) throw the errors below, respectively:
Error in Collect.youtube(yt_auth, videoIDs = video_url, maxComments = 500, :
object 'collect_log' not found
Error in Collect.thread.reddit(Authenticate("reddit"), threadUrls = thread_urls, :
object 'collect_log' not found
Sample:
# YouTube Data Collection --------------------------------------------------
library(vosonSML)
my_api_key <- "XXX"
yt_auth <- Authenticate("youtube", apiKey = my_api_key)
video_url <- c("https://www.youtube.com/watch?v=jNQXAC9IVRw")
yt_data <- yt_auth |> Collect(videoIDs = video_url,
maxComments = 500,
writeToFile = TRUE,
verbose = TRUE)
# Reddit Data Collection --------------------------------------------------
thread_urls <- c(
"https://www.reddit.com/r/AusFinance/comments/ymkrlr/",
"https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
When I load a .rds file and try to create an activity network graph, the following error message appears:
activityNetwork <- twitterData %>%
Create("activity") %>%
Graph(writeToFile = TRUE) %>%
summary()
Creating igraph network graph...Generating twitter activity network...
-------------------------
collected tweets | 370562
quote tweets | 39868
reply tweets | 182792
tweets | 147902
nodes from data | 79492
nodes | 450054
edges | 222660
-------------------------
Done.
Error in igraph::graph_from_data_frame(d = net$edges, directed = directed, :
Duplicate vertex names
The error occurs after running Authenticate("twitter", ...) %>% Collect(searchTerm = "#auspol", language = "en", numTweets = 1000, writeToFile = TRUE).
Tested in vosonSML versions 0.26.2 and 0.29.4; the error occurs in both.
The error does not occur when using rtweet directly; tweets could be retrieved as usual with rtweet (version 0.6.8).
Tested 6 March 2020 ca. 5am UTC. R version 3.5.2.
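For what it's worth, the igraph side of this error is easy to reproduce. A minimal sketch (toy node names, not vosonSML's actual network data) showing that graph_from_data_frame() rejects a vertices frame with repeated names, and that deduplicating first avoids it:

```r
library(igraph)

edges <- data.frame(from = c("a", "b"), to = c("b", "c"))
nodes <- data.frame(name = c("a", "b", "b", "c"))  # "b" appears twice

# Duplicate names trigger the same "Duplicate vertex names" error.
bad <- tryCatch(graph_from_data_frame(edges, vertices = nodes),
                error = function(e) "duplicate")

# Deduplicating the node table first avoids it.
nodes <- nodes[!duplicated(nodes$name), , drop = FALSE]
g <- graph_from_data_frame(edges, vertices = nodes)
print(vcount(g))
```

So the network generation step appears to be emitting the same node ID more than once in its vertex table.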
Collected tweets using the rtweet package. Now trying to generate a Twitter semantic network. Was able to generate an actor network with no problem.
Error now occurs after running
semanticNetwork <- twitterData %>% Create("semantic", removeTermsOrHashtags = c("#auspol"), termFreq = 2, hashtagFreq = 10, verbose = TRUE)
I'm planning on using this package for a project I'm working on. I need to first write a function that will return a vector of all video IDs that match my search parameters (so that I can then use them with CollectDataYoutube()). If I created this function and submitted a pull request, would you be interested in accepting?
When I try to build a network of the actors in a video (people who comment and their replies), I get the following error:
Generating youtube actor network...Error: `x` must be a vector, not a tbl_df/tbl/data.frame/datasource/youtube object.
I have reviewed the code and written it in different ways, but I am still getting this error. How can I fix it? The code is the following:
library(vosonSML)
library(devtools)
apikey <- "xxxxxxx"
key <- Authenticate("youtube",
apiKey=apikey)
video <- c("2GRs1HKaLnA") # Example
actorNetwork <- Authenticate("youtube", apiKey = apikey) %>%
Collect(videoIDs = video) %>%
Create("actor", writeToFile = TRUE)
As I said I tried different ways:
youtubeData <- youtubeAuth %>%
Collect(videoIDs = video, maxComments = 500, writeToFile = TRUE)
#Note that if you read the YouTube dataframe from disk, you will need to modify the class values for the object, before you can use it with vosonSML:
youtubeData <- readRDS("/path/to/data/data.rds")
class(youtubeData) <- append(class(youtubeData), c("datasource", "youtube"))
actorNetwork <- youtubeData %>% Create("actor") %>% AddText(youtubeData)
Could anyone help me, please?
Thank you very much!
Error in `dplyr::mutate()`:
! Problem while computing `url = ifelse(...)`.
Caused by error in `stri_replace_first_regex()`:
! Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN, context=#:~:text=曉藍 (136)-,本週財經看點 (153),-李進 (404$)
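The root cause appears to be that text containing unbalanced parentheses is being fed to a function that interprets it as a regular-expression pattern. A minimal sketch (hypothetical strings, not the package's actual call) of the failure and of the fixed-pattern variant that sidesteps it:

```r
library(stringi)

pat <- "text=(136"  # unbalanced "(" -- invalid as a regex pattern
res <- tryCatch(stri_replace_first_regex("a text=(136 b", pat, "<url>"),
                error = function(e) "U_REGEX_MISMATCHED_PAREN")

# Treating the pattern as a literal string avoids regex parsing entirely.
ok <- stri_replace_first_fixed("a text=(136 b", pat, "<url>")
print(res)
print(ok)
```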
Hello, thanks for creating this lovely package and documenting it so well; I plan to use it in my team's research!
A recent paper in a prominent educational research journal used a different (non-R) version of VOSON that appeared to use Google Analytics to document/collect inbound links to specified websites. Is that functionality from a different version of VOSON, and is anything like that possible now? My understanding is that Google Analytics doesn't support this, so I was surprised to read what they did in that paper. I wonder whether it was once available and has since been phased out, or whether there are other ways to accomplish something similar. Thanks for considering this!
Josh
I'm aware that collecting comments from reddit is limited to 500 comments, but I did not expect the collection to error and exit completely. Is this intentional?
# First we create the authentication token and then we collect the data.
reddit.data <- Authenticate("reddit") |>
Collect(
threadUrls = "https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
results in:
Error: Error in `dplyr::mutate()`:
! Problem while computing `rm = ifelse(...)`.
Caused by error in `ifelse()`:
! dims [product 1] do not match the length of object [574]
7. stop(gsub("^Error:\\s", "", paste0(e)), call. = FALSE)
6. value[[3L]](cond)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
4. tryCatchList(expr, classes, parentenv, handlers)
3. tryCatch({
threads_df <- reddit_build_df(threadUrls, waitTime, ua, verbose,
msg = msg)
}, error = function(e) { ...
2. Collect.reddit(Authenticate("reddit"), threadUrls ="https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
1. Collect(Authenticate("reddit"), threadUrls = "https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
I would expect collection to fail gracefully with either a warning or error message, and return whatever results it could obtain. If that is not possible, a more helpful error message would be nice.
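For context, the "dims ... do not match" message is base R's complaint when an operation mixes a one-element array with a longer vector. A minimal reproduction of the error class (unrelated to vosonSML internals, where the shapes presumably come from the parsed thread data):

```r
# A 1x1 matrix (dims product 1) combined with a length-574 vector
# produces the same "dims do not match" error seen above.
msg <- tryCatch(matrix(1) + 1:574, error = function(e) conditionMessage(e))
print(msg)
```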
Hey Bryan!
I was demoing the tool today and encountered an error when trying to collect a Reddit thread.
Here is the thread URL: https://www.reddit.com/r/pcmasterrace/comments/dc3wbn/brother_laser_printer_airprint_ftw/
Here is the error message:
reddit collection error: Error in gsub(.html_entities[i], .html_entities[i + 1], x, fixed = TRUE): input string 34 is invalid in this locale
I believe it might have something to do with the following issue documented on Stack Overflow: https://stackoverflow.com/questions/41717781/warning-input-string-not-available-in-this-locale
The solution is probably to add this line somewhere early in the code before any character vector manipulation starts to occur:
Sys.setlocale('LC_ALL','C')
I've come across this issue before with other work in R on a Mac, and IIRC this seemed to fix it.
This error occurs when a request to the youtube API to retrieve comments for given videos returns no results. It is most often caused by reaching the youtube API rate-limit or an issue with the API key set by the user.
Recently, users have reported that previously working, correct API keys are returning this error. The cause is unknown, but the problem seems related to some change in the Google API projects to which the keys are attached. Generating new keys attached to the same project as the key that stopped working also fails, even though everything appears correct in the Google API console.
Further information about the problem can be gathered by running the following snippet with the user's API key assigned to youtube_key:
library(httr)
youtube_key <- "xxxxxxxx"
req <- httr::GET("https://www.googleapis.com/youtube/v3/search",
query = list(part = "snippet",
q = "programming",
maxResults = 25,
key = youtube_key))
httr::content(req)
# check the values of $error$code and $error$message
The output from the API request will have more information about the problem in the $error$code and $error$message fields.
The problem with the Google API project described above has been found to produce the following error values, with x's substituted for project IDs:
$error$code
[1] 403
$error$message
[1] "Access Not Configured.
YouTube Data API has not been used in project xxxxx before or it is disabled.
Enable it by visiting
https://console.developers.google.com/apis/api/youtube.googleapis.com/overview?project=xxxxx
then retry.
If you enabled this API recently, wait a few minutes for the action to propagate
to our systems and retry."
The resolution that has worked for all users so far is simply to create a new project and generate a new API key attached to it.
Note: following the instructions in the error message about enabling the API may also work.
The next release of vosonSML will have more informative messages and handling of youtube API errors.
Hi!
I tried to collect some YouTube comment data (e.g., https://www.youtube.com/watch?v=ATYK2svJ6eM). However, I got an error like:
Error in Collect.youtube(., videoIDs = myYoutubeVideoIds, maxComments = 100, : No comments could be collected from the given video Ids: ATYK2svJ6eM
I also asked my colleague to try to do the same thing using a different YouTube video and replicated the same error.
And the error seems similar to this one: vosonlab/SocialMediaLab#14
I'm using Windows 10, R 3.5.1, and vosonSML 0.27.2.
Thanks for your help in advance!
When running the example, there were errors:
twomodeNetwork <- twitterData %>% Create("twomode", removeTermsOrHashtags = c("#auspol"))
Generating twitter 2-mode network...Using to_lower = TRUE with token = 'tweets' may not preserve URLs.
Done.
twomodeNetwork
Network attributes:
Error in if (names(x$gal)[i] == "n") { : argument is of length zero
Are the errors thrown by vosonSML or other packages? Thanks!
On R 4.3.2 with vosonSML 0.34.2, it seems that Collect(<thread.reddit>) fails with a 403 error. However, collecting data for the same threads with RedditExtractoR 3.0.9 succeeds.
Sample:
# vosonSML Data Collection --------------------------------------------------
library(vosonSML)
thread_urls <- c(
"https://www.reddit.com/r/AusFinance/comments/ymkrlr/",
"https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
# RedditExtractoR Data Collection --------------------------------------------------
library(RedditExtractoR)
rex_data <- get_thread_content(thread_urls)
Apologies if I missed something in the code or the documentation, but it seems to me that Create.actor.reddit.R somehow creates extra edges. Using vosonSML version 0.34.1.
Example:
thread_urls <- c("https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
# Remove rows that have 'NA'
rd_data <- rd_data[complete.cases(rd_data), ]
rd_actor_graph <- rd_data |>
Create("actor") |>
AddText(rd_data,
verbose = TRUE) |>
Graph()
# Replace node IDs with actual user names
V(rd_actor_graph)$name <- V(rd_actor_graph)$user
As of the collection date, this returned 1259 rows for rd_data (i.e., comments/replies) and 550 nodes with 10218 edges for rd_actor_graph. The number of nodes matches the number of unique users, so that's fine. However, looking at the edges, I'm a bit puzzled.
For example, if I look at the Reddit thread on the Web, I can see the following comment/replies. (I used the same sorting method: 'Best'.) There is one top-level comment by u/Ferox101, a reply by u/brednoq, and a reply to that by u/Ferox101.
In R, I can see the following corresponding rows:
The first thing I notice is that for each comment/reply that u/Ferox101 made, there are two rows in rd_data. The two rows differ only in their values for structure, post_score, and comment_score.
The second thing I noticed when looking at the graph in Gephi. I imported the graph using "Don't Merge" for parallel edges. A lot of edges seem to have been created for the reply by u/Ferox101:
There is one edge from u/Ferox101 (n498) to u/without_my_remorse (n403, the author of the main post in the thread) representing the top-level comment ("I think you've missed a crucial word..."). You can see that edge just above the blue highlighted area. That one edge is correct.
However, there are 24 edges (highlighted in blue) from u/Ferox101 to u/without_my_remorse representing the reply "Yeah, it's pretty cheeky...". Where do all these edges come from?
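In case it helps as a workaround while the duplication is investigated: parallel edges can be collapsed after Graph() with igraph::simplify(). A minimal sketch with a toy graph (the per-edge weight of 1 is an assumption; vosonSML's actual edge attributes may differ):

```r
library(igraph)

# Toy graph with a duplicated A->B edge standing in for the
# repeated comment edges.
g <- make_graph(c("A", "B", "A", "B", "B", "C"), directed = TRUE)
E(g)$weight <- 1

# Collapse parallel edges, summing their weights so the number of
# original edges is preserved as an attribute.
g2 <- simplify(g, edge.attr.comb = list(weight = "sum"))
print(ecount(g2))
```

This hides the symptom in downstream analysis but doesn't explain where the 24 duplicate edges come from.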
Hello, I tried to run the example in the README and got an error saying object 'collect_log' not found.
# collect a listing of the 25 top threads by upvote of all time
collect_rd_listing <- Authenticate("reddit") |>
Collect(endpoint = "listing", subreddits = subreddits,
sort = "top", period = "all", max = 25,
writeToFile = TRUE, verbose = TRUE)
The following is the error message:
No listings were collected.
Error in Collect.listing.reddit(Authenticate("reddit"), endpoint = "listing", :
object 'collect_log' not found
Recently, the following error will occasionally and temporarily occur when performing youtube Collect operations:
Error in curl::curl_fetch_memory(url, handle = handle) :
Error in the HTTP2 framing layer
This is likely due to some issue with Google servers and/or recent updates to httr and curl.
This is a temporary problem, and simply performing the operation again will often succeed immediately.
However, for the time being this issue seems to be alleviated by using the following workaround before running VOSONDash or vosonSML:
httr::set_config(httr::config(http_version = 0))
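If setting the option globally is undesirable, the same override can be scoped to a single expression with httr::with_config(); the body below is a placeholder for the actual Collect() call:

```r
library(httr)

# with_config() applies the HTTP/1.1 fallback only while the wrapped
# expression runs, then restores the previous configuration.
res <- with_config(config(http_version = 0), {
  "Collect() would run here"
})
print(res)
```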
There seem to be recent new issues with the httr and RCurl packages and reddit GET requests.
In package testing, all requests are returning 403 error codes, indicating that they are being blocked by the reddit server. This occurs for both the default package (httr) and custom user-agent strings under the R environment. It may be related to package or environment changes with R 4.4.
As blocking is determined by a number of conditions, it may not occur for everyone and it may be temporary. It does not appear to be IP related, as reddit JSON could be requested and retrieved using a web browser and curl installed on the same test system with default user-agent strings.
This issue was not occurring a few weeks ago under R 4.3, at the end of April 2024. It now occurs under a fresh R 4.3 install using the latest package versions; previous package versions have not yet been tested.
Hi there,
When I use version 0.29.4, Create doesn't generate a network object (i.e., it returns NULL). This seems to be true regardless of OS (e.g., Windows, Mac, and Linux) and across different versions of R (e.g., 3.5 and 3.6). This isn't an issue in version 0.27.2 if I do the same thing (i.e., collect data and then generate a network using Create).
Hey VosonLab!
First, great tool; I have been exploring the functionalities and I am eager to use it in my project. I just wanted to let you know that the following link (https://vosonlab.github.io/posts/2021-02-11_twitter_vsml_from_rtweet/) is broken on the documentation page. Coincidentally, I was trying to find out how to load my rtweet dataset directly, without having to run Collect through the package. Is that possible?
Also, is it possible to connect to other endpoints of the Twitter API, like getTimeline?
Best,
Lucas
Hello,
I would like to learn network analysis and I came across your blog post. I pulled data from rtweet package and I would like to use your package to create different network data frames. However, when I use the ImportData function I get the following error: Error in if (!type %in% supported_types) { : the condition has length > 1.
Can you please help me figure out what I am doing wrong?
library(dplyr)
library(vosonSML)
# Load in data (rt data frame)
rt = readRDS("Data.rds")
# Convert to work with vosonSML package
class(rt) = append(c("datasource", "twitter"), class(rt))
rt = rt %>% ImportData("twitter")
Thank you for your help!
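For what it's worth, "the condition has length > 1" is the generic base-R error raised when an if() receives a vector, which suggests the type argument is arriving inside ImportData() as a vector rather than a single string. A minimal sketch of the error class (illustrative values, not vosonSML's internals; assumes R >= 4.2, where this is a hard error):

```r
# Since R 4.2, an `if` condition longer than 1 is a hard error; a
# length-2 `type` reproduces the message seen from ImportData().
type <- c("twitter", "twitter")
res <- tryCatch(if (!type %in% c("twitter")) "no" else "yes",
                error = function(e) "condition has length > 1")
print(res)
```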