vosonlab / vosonSML
R package for collecting social media data and creating networks for analysis.
Home Page: https://vosonlab.github.io/vosonSML/
License: GNU General Public License v3.0
In Version 0.32.7 (on R 4.2.2 with rtweet 1.1.0), I receive an error when executing a simple Twitter Collect:
twitter_data <- twitterAuth %>%
Collect(searchTerm = "#auspol",
searchType = "recent",
numTweets = 1000,
lang = "en",
includeRetweets = TRUE,
writeToFile = TRUE)
Error message:
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
---
Backtrace:
1. ... %>% ...
4. vosonSML:::Collect.search.twitter(...)
5. vosonSML:::import_rtweet_(df_tweets)
6. dplyr::bind_cols(...)
9. vctrs::vec_cbind(!!!dots, .name_repair = .name_repair)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/vctrs_error_incompatible_size>
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 1000) to match `..2` (size 43).
---
Backtrace:
▆
1. ├─... %>% ...
2. ├─vosonSML::Collect(...)
3. ├─vosonSML:::Collect.twitter(...)
4. ├─vosonSML:::Collect.search.twitter(...)
5. │ └─vosonSML:::import_rtweet_(df_tweets)
6. │ └─dplyr::bind_cols(...)
7. │ ├─dplyr:::fix_call(vec_cbind(!!!dots, .name_repair = .name_repair))
8. │ │ └─base::withCallingHandlers(...)
9. │ └─vctrs::vec_cbind(!!!dots, .name_repair = .name_repair)
10. └─vctrs::stop_incompatible_size(...)
11. └─vctrs:::stop_incompatible(...)
12. └─vctrs:::stop_vctrs(...)
13. └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
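For reference, the recycling failure can be reproduced outside vosonSML. This is a minimal sketch of the vctrs rule that only size-1 inputs are recycled; the 5- and 3-row frames below stand in for the 1000-tweet and 43-user tables, they are not the package's actual data:

```r
library(dplyr)

# bind_cols() only recycles size-1 inputs; two data frames with
# different row counts cannot be combined, which is what happens
# when the tweet and user tables disagree in length.
a <- data.frame(x = 1:5)
b <- data.frame(y = 1:3)
err <- tryCatch(bind_cols(a, b), error = function(e) class(e))
print("vctrs_error_incompatible_size" %in% err)
```

This suggests the rtweet 1.x return format no longer lines up row-for-row with what `import_rtweet_()` expects.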
In Versions 0.30.6 and 0.29.13, I receive a slightly different error when executing the same code:
Collecting tweets for search query...
Search term: #auspol
Unable to determine rate limit.
Requested 1000 tweets.
Error in `dplyr::filter()`:
! Problem while computing `..1 = .data$status_id %in% ...`.
Caused by error in `.data$status_id`:
! Column `status_id` not found in `.data`.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
`create_token()` was deprecated in rtweet 1.0.0.
ℹ See vignette('auth') for details
ℹ The deprecated feature was likely used in the vosonSML package.
Please report the issue at <https://github.com/vosonlab/vosonSML/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
> rlang::last_error()
<error/rlang_error>
Error in `dplyr::filter()`:
! Problem while computing `..1 = .data$status_id %in% ...`.
Caused by error in `.data$status_id`:
! Column `status_id` not found in `.data`.
The same code works fine on R 4.1.2 with vosonSML 0.29.13 and vosonSML 0.30.6.
On R 4.4.0 with vosonSML 0.34.2, Collect(<youtube>) and Collect(<thread.reddit>) throw the errors below, respectively:
Error in Collect.youtube(yt_auth, videoIDs = video_url, maxComments = 500, :
object 'collect_log' not found
Error in Collect.thread.reddit(Authenticate("reddit"), threadUrls = thread_urls, :
object 'collect_log' not found
Sample:
# YouTube Data Collection --------------------------------------------------
library(vosonSML)
my_api_key <- "XXX"
yt_auth <- Authenticate("youtube", apiKey = my_api_key)
video_url <- c("https://www.youtube.com/watch?v=jNQXAC9IVRw")
yt_data <- yt_auth |> Collect(videoIDs = video_url,
maxComments = 500,
writeToFile = TRUE,
verbose = TRUE)
# Reddit Data Collection --------------------------------------------------
thread_urls <- c(
"https://www.reddit.com/r/AusFinance/comments/ymkrlr/",
"https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
When I load a .rds file and try to create an activity network graph, the following error message appears:
activityNetwork <- twitterData %>%
Create("activity") %>%
Graph(writeToFile = TRUE) %>%
summary()
Creating igraph network graph...Generating twitter activity network...
-------------------------
collected tweets | 370562
quote tweets | 39868
reply tweets | 182792
tweets | 147902
nodes from data | 79492
nodes | 450054
edges | 222660
-------------------------
Done.
Error in igraph::graph_from_data_frame(d = net$edges, directed = directed, :
Duplicate vertex names
The error occurs after running Authenticate("twitter", ...) %>% Collect(searchTerm = "#auspol", language = "en", numTweets = 1000, writeToFile = TRUE).
Tested in vosonSML versions 0.26.2 and 0.29.4; the error occurs in both.
The error does not occur when using rtweet directly; tweets could be retrieved as usual with rtweet (version 0.6.8).
Tested 6 March 2020 ca. 5am UTC. R version 3.5.2.
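For what it's worth, the igraph side of this error is easy to reproduce. A minimal sketch (toy node names, not vosonSML's actual network data) showing that graph_from_data_frame() rejects a vertices frame with repeated names, and that deduplicating first avoids it:

```r
library(igraph)

edges <- data.frame(from = c("a", "b"), to = c("b", "c"))
nodes <- data.frame(name = c("a", "b", "b", "c"))  # "b" appears twice

# Duplicate names trigger the same "Duplicate vertex names" error.
bad <- tryCatch(graph_from_data_frame(edges, vertices = nodes),
                error = function(e) "duplicate")

# Deduplicating the node table first avoids it.
nodes <- nodes[!duplicated(nodes$name), , drop = FALSE]
g <- graph_from_data_frame(edges, vertices = nodes)
print(vcount(g))
```

So the network generation step appears to be emitting the same node ID more than once in its vertex table.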
Collected tweets using the rtweet package. Now trying to generate a Twitter semantic network. Was able to generate an actor network with no problem.
Error now occurs after running
semanticNetwork <- twitterData %>% Create("semantic", removeTermsOrHashtags = c("#auspol"), termFreq = 2, hashtagFreq = 10, verbose = TRUE)
I'm planning on using this package for a project I'm working on. I need to first write a function that will return a vector of all video IDs that match my search parameters (so that I can then use them with CollectDataYoutube()). If I created this function and submitted a pull request, would you be interested in accepting?
When I try to build a network of the actors in a video (people who comment and their replies), I get the following error:
Generating youtube actor network...Error: `x` must be a vector, not a tbl_df/tbl/data.frame/datasource/youtube object.
I have reviewed the code and written it in different ways, but I am still getting this error. How can I fix it? The code is the following:
library(vosonSML)
library(devtools)
apikey <- "xxxxxxx"
key <- Authenticate("youtube",
apiKey=apikey)
video <- c("2GRs1HKaLnA") # Example
actorNetwork <- Authenticate("youtube", apiKey = apikey) %>%
Collect(videoIDs = video) %>%
Create("actor", writeToFile = TRUE)
As I said I tried different ways:
youtubeData <- youtubeAuth %>%
Collect(videoIDs = video, maxComments = 500, writeToFile = TRUE)
#Note that if you read the YouTube dataframe from disk, you will need to modify the class values for the object, before you can use it with vosonSML:
youtubeData <- readRDS("/path/to/data/data.rds")
class(youtubeData) <- append(class(youtubeData), c("datasource", "youtube"))
actorNetwork <- youtubeData %>% Create("actor") %>% AddText(youtubeData)
Could anyone help me, please?
Thank you very much!
Error in `dplyr::mutate()`:
! Problem while computing `url = ifelse(...)`.
Caused by error in `stri_replace_first_regex()`:
! Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN, context=#:~:text=曉藍 (136)-,本週財經看點 (153),-李進 (404$)
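The root cause appears to be that text containing unbalanced parentheses is being fed to a function that interprets it as a regular-expression pattern. A minimal sketch (hypothetical strings, not the package's actual call) of the failure and of the fixed-pattern variant that sidesteps it:

```r
library(stringi)

pat <- "text=(136"  # unbalanced "(" -- invalid as a regex pattern
res <- tryCatch(stri_replace_first_regex("a text=(136 b", pat, "<url>"),
                error = function(e) "U_REGEX_MISMATCHED_PAREN")

# Treating the pattern as a literal string avoids regex parsing entirely.
ok <- stri_replace_first_fixed("a text=(136 b", pat, "<url>")
print(res)
print(ok)
```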
Hello, thanks for creating this lovely package and documenting it so well; I plan to use it in my team's research!
A recent paper in a prominent educational research journal used a different (non-R) version of VOSON that appeared to use Google Analytics to document/collect inbound links to specified websites. Is that functionality from a different version of VOSON, and is anything like that possible now? My understanding is that Google Analytics doesn't support this, so I was surprised to read what they did in that paper. I wonder whether it was once available and has since been phased out, or whether there are other ways to accomplish something similar. Thanks for considering this!
Josh
I'm aware that collecting comments from reddit is limited to 500 comments, but I did not expect the collection to error and exit completely. Is this intentional?
# First we create the authentication token and then we collect the data.
reddit.data <- Authenticate("reddit") |>
Collect(
threadUrls = "https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
results in:
Error: Error in `dplyr::mutate()`:
! Problem while computing `rm = ifelse(...)`.
Caused by error in `ifelse()`:
! dims [product 1] do not match the length of object [574]
7. stop(gsub("^Error:\\s", "", paste0(e)), call. = FALSE)
6. value[[3L]](cond)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
4. tryCatchList(expr, classes, parentenv, handlers)
3. tryCatch({
threads_df <- reddit_build_df(threadUrls, waitTime, ua, verbose,
msg = msg)
}, error = function(e) { ...
2. Collect.reddit(Authenticate("reddit"), threadUrls ="https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
1. Collect(Authenticate("reddit"), threadUrls = "https://www.reddit.com/r/Music/comments/pgm7le/abba_will_release_a_new_album_first_time_since/",
verbose = TRUE)
I would expect collection to fail gracefully with either a warning or error message, and return whatever results it could obtain. If that is not possible, a more helpful error message would be nice.
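For context, the "dims ... do not match" message is base R's complaint when an operation mixes a one-element array with a longer vector. A minimal reproduction of the error class (unrelated to vosonSML internals, where the shapes presumably come from the parsed thread data):

```r
# A 1x1 matrix (dims product 1) combined with a length-574 vector
# produces the same "dims do not match" error seen above.
msg <- tryCatch(matrix(1) + 1:574, error = function(e) conditionMessage(e))
print(msg)
```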
Hey Bryan!
I was demoing the tool today and encountered an error when trying to collect a Reddit thread.
Here is the thread URL: https://www.reddit.com/r/pcmasterrace/comments/dc3wbn/brother_laser_printer_airprint_ftw/
Here is the error message:
reddit collection error: Error in gsub(.html_entities[i], .html_entities[i + 1], x, fixed = TRUE): input string 34 is invalid in this locale
I believe it might have something to do with the following issue documented on Stack Overflow: https://stackoverflow.com/questions/41717781/warning-input-string-not-available-in-this-locale
The solution is probably to add this line somewhere early in the code before any character vector manipulation starts to occur:
Sys.setlocale('LC_ALL','C')
I've come across this issue before with other work in R on a Mac, and IIRC this seemed to fix it.
This error occurs when a request to the youtube API to retrieve comments for given videos returns no results. It is most often caused by reaching the youtube API rate-limit or an issue with the API key set by the user.
Recently, users have reported that previously working, correct API keys are returning this error. The cause is unknown, but the problem seems related to some change in the Google API projects to which the keys are attached. Generating new keys attached to the same project as the key that stopped working also fails, even though everything appears correct in the Google API console.
Further information about the problem can be gathered by running the following snippet with the user's API key assigned to youtube_key:
library(httr)
youtube_key <- "xxxxxxxx"
req <- httr::GET("https://www.googleapis.com/youtube/v3/search",
query = list(part = "snippet",
q = "programming",
maxResults = 25,
key = youtube_key))
httr::content(req)
# check the values of $error$code and $error$message
The output from the API request will have more information about the problem in the $error$code and $error$message fields.
The problem with the Google API project described above has been found to produce the following error values, with x's substituted for project IDs:
$error$code
[1] 403
$error$message
[1] "Access Not Configured.
YouTube Data API has not been used in project xxxxx before or it is disabled.
Enable it by visiting
https://console.developers.google.com/apis/api/youtube.googleapis.com/overview?project=xxxxx
then retry.
If you enabled this API recently, wait a few minutes for the action to propagate
to our systems and retry."
The resolution that has worked for all users so far is simply to create a new project and generate a new API key attached to it.
Note: following the instructions in the error message about enabling the API may also work.
The next release of vosonSML will have more informative messages and handling of youtube API errors.
Hi!
I tried to collect some YouTube comment data (e.g., https://www.youtube.com/watch?v=ATYK2svJ6eM). However, I got an error like:
Error in Collect.youtube(., videoIDs = myYoutubeVideoIds, maxComments = 100, : No comments could be collected from the given video Ids: ATYK2svJ6eM
I also asked my colleague to try to do the same thing using a different YouTube video and replicated the same error.
And the error seems similar to this one: vosonlab/SocialMediaLab#14
I'm using Windows 10, R 3.5.1, and vosonSML 0.27.2.
Thanks for your help in advance!
When running the example, there were errors:
twomodeNetwork <- twitterData %>% Create("twomode", removeTermsOrHashtags = c("#auspol"))
Generating twitter 2-mode network...Using to_lower = TRUE with token = 'tweets' may not preserve URLs.
Done.
twomodeNetwork
Network attributes:
Error in if (names(x$gal)[i] == "n") { : argument is of length zero
Are the errors thrown by vosonSML or other packages? Thanks!
On R 4.3.2 with vosonSML 0.34.2, it seems that Collect(<thread.reddit>) fails with a 403 error. However, collecting data for the same threads with RedditExtractoR 3.0.9 succeeds.
Sample:
# vosonSML Data Collection --------------------------------------------------
library(vosonSML)
thread_urls <- c(
"https://www.reddit.com/r/AusFinance/comments/ymkrlr/",
"https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
# RedditExtractoR Data Collection --------------------------------------------------
library(RedditExtractoR)
rex_data <- get_thread_content(thread_urls)
Apologies if I missed something in the code or the documentation, but it seems to me that Create.actor.reddit.R somehow creates extra edges. Using vosonSML version 0.34.1.
Example:
thread_urls <- c("https://www.reddit.com/r/AusFinance/comments/ugetai/")
rd_data <- Authenticate("reddit") |>
Collect(threadUrls = thread_urls,
sort = "best",
waitTime = c(6, 8),
writeToFile = TRUE,
verbose = TRUE)
# Remove rows that have 'NA'
rd_data <- rd_data[complete.cases(rd_data), ]
rd_actor_graph <- rd_data |>
Create("actor") |>
AddText(rd_data,
verbose = TRUE) |>
Graph()
# Replace node IDs with actual user names
V(rd_actor_graph)$name <- V(rd_actor_graph)$user
As of the collection date, this returned 1259 rows for rd_data (i.e., comments/replies) and 550 nodes with 10218 edges for rd_actor_graph. The number of nodes matches the number of unique users, so that's fine. However, looking at the edges, I'm a bit puzzled.
For example, if I look at the Reddit thread on the Web, I can see the following comment/replies. (I used the same sorting method: 'Best'.) There is one top-level comment by u/Ferox101, a reply by u/brednoq, and a reply to that by u/Ferox101.
In R, I can see the following corresponding rows:
The first thing I notice is that for each comment/reply that u/Ferox101 made, there are two rows in rd_data. The two rows differ only in their values for structure, post_score, and comment_score.
The second thing I noticed when looking at the graph in Gephi. I imported the graph using "Don't Merge" for parallel edges. A lot of edges seem to have been created for the reply by u/Ferox101:
There is one edge from u/Ferox101 (n498) to u/without_my_remorse (n403, the author of the main post in the thread) representing the top-level comment ("I think you've missed a crucial word..."). You can see that edge just above the blue highlighted area. That one edge is correct.
However, there are 24 edges (highlighted in blue) from u/Ferox101 to u/without_my_remorse representing the reply "Yeah, it's pretty cheeky...". Where do all these edges come from?
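In case it helps as a workaround while the duplication is investigated: parallel edges can be collapsed after Graph() with igraph::simplify(). A minimal sketch with a toy graph (the per-edge weight of 1 is an assumption; vosonSML's actual edge attributes may differ):

```r
library(igraph)

# Toy graph with a duplicated A->B edge standing in for the
# repeated comment edges.
g <- make_graph(c("A", "B", "A", "B", "B", "C"), directed = TRUE)
E(g)$weight <- 1

# Collapse parallel edges, summing their weights so the number of
# original edges is preserved as an attribute.
g2 <- simplify(g, edge.attr.comb = list(weight = "sum"))
print(ecount(g2))
```

This hides the symptom in downstream analysis but doesn't explain where the 24 duplicate edges come from.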
Hello, I tried to run the example in the README and got an error saying object 'collect_log' not found.
# collect a listing of the 25 top threads by upvote of all time
collect_rd_listing <- Authenticate("reddit") |>
Collect(endpoint = "listing", subreddits = subreddits,
sort = "top", period = "all", max = 25,
writeToFile = TRUE, verbose = TRUE)
The following is the error message:
No listings were collected.
Error in Collect.listing.reddit(Authenticate("reddit"), endpoint = "listing", :
object 'collect_log' not found
Recently, the following error will occasionally and temporarily occur when performing youtube Collect operations:
Error in curl::curl_fetch_memory(url, handle = handle) :
Error in the HTTP2 framing layer
This is likely due to some issue with Google servers and/or recent updates to httr and curl.
This is a temporary problem, and simply performing the operation again will often succeed immediately.
However, for the time being this issue seems to be alleviated by using the following workaround before running VOSONDash or vosonSML:
httr::set_config(httr::config(http_version = 0))
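If setting the option globally is undesirable, the same override can be scoped to a single expression with httr::with_config(); the body below is a placeholder for the actual Collect() call:

```r
library(httr)

# with_config() applies the HTTP/1.1 fallback only while the wrapped
# expression runs, then restores the previous configuration.
res <- with_config(config(http_version = 0), {
  "Collect() would run here"
})
print(res)
```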
There seem to be recent new issues with the httr and RCurl packages and reddit GET requests.
In package testing, all requests are returning 403 error codes, indicating that they are being blocked by the reddit server. This occurs for both the default package (httr) and custom user-agent strings under the R environment. It may be related to package or environment changes with R 4.4.
As blocking is determined by a number of conditions, it may not occur for everyone and it may be temporary. It does not appear to be IP related, as reddit JSON could be requested and retrieved using a web browser and curl installed on the same test system with default user-agent strings.
This issue was not occurring a few weeks ago under R 4.3, at the end of April 2024. It now occurs under a fresh R 4.3 install using the latest package versions; previous package versions have not yet been tested.
Hi there,
When I use version 0.29.4, Create doesn't generate a network object (i.e., it returns NULL). This seems to be true regardless of OS (e.g., Windows, Mac, and Linux) and across different versions of R (e.g., 3.5 and 3.6). This isn't an issue in version 0.27.2 if I do the same thing (i.e., collect data and then generate a network using Create).
Hey VosonLab!
First, great tool; I have been exploring the functionalities and I am eager to use it in my project. I just wanted to let you know that the following link (https://vosonlab.github.io/posts/2021-02-11_twitter_vsml_from_rtweet/) is broken on the documentation page. Coincidentally, I was trying to find out how to load my rtweet dataset directly, without having to run Collect through the package. Is that possible?
Also, is it possible to connect to other endpoints of the Twitter API, like getTimeline?
Best,
Lucas
Hello,
I would like to learn network analysis and I came across your blog post. I pulled data from rtweet package and I would like to use your package to create different network data frames. However, when I use the ImportData function I get the following error: Error in if (!type %in% supported_types) { : the condition has length > 1.
Can you please help me figure out what I am doing wrong?
library(dplyr)
library(vosonSML)
# Load in data (rt data frame)
rt = readRDS("Data.rds")
# Convert to work with vosonSML package
class(rt) = append(c("datasource", "twitter"), class(rt))
rt = rt %>% ImportData("twitter")
Thank you for your help!
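For what it's worth, "the condition has length > 1" is the generic base-R error raised when an if() receives a vector, which suggests the type argument is arriving inside ImportData() as a vector rather than a single string. A minimal sketch of the error class (illustrative values, not vosonSML's internals; assumes R >= 4.2, where this is a hard error):

```r
# Since R 4.2, an `if` condition longer than 1 is a hard error; a
# length-2 `type` reproduces the message seen from ImportData().
type <- c("twitter", "twitter")
res <- tryCatch(if (!type %in% c("twitter")) "no" else "yes",
                error = function(e) "condition has length > 1")
print(res)
```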