oduwsdl / followercounthistory
Crawler that grabs Twitter follower counts across time via internet archives given account user name
License: MIT License
Getting a few pages in Bengali from the Internet Archive that break the R script because the numbers are not Arabic numerals.
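One possible way to handle this before the data reaches the R script is to normalize the count on the Python side. The following is only a minimal sketch, not the tool's actual code: the function name parse_follower_count is hypothetical, and it assumes the scraped value is a plain digit string with optional separators (it would not handle abbreviated forms such as "1.2K").

```python
def parse_follower_count(text):
    """Parse a follower count written with any Unicode decimal digits.

    Hypothetical helper: assumes the input is a digit string that may
    contain separators such as "," or spaces.
    """
    # Keep only decimal digits; this drops separators and whitespace.
    digits = "".join(ch for ch in text if ch.isdecimal())
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    # int() accepts any Unicode decimal digits, not only ASCII 0-9.
    return int(digits)


print(parse_follower_count("১২,৩৪৫"))  # Bengali digits for 12345
print(parse_follower_count("48,766"))
```

Because the conversion relies on Python's built-in Unicode digit handling, the same function should cover other scripts with decimal digits without a per-language lookup table.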
Tool is accepting redirects as a new memento
Fix axes and labels to make the graph more attractive
Hi all,
First off, thanks for the awesome tool! It was instrumental in my being able to pull Twitter follower statistics for a year-end review for the Monero project.
However, I can't seem to get graphs working properly, no matter what I try.
Commands used:
docker container run --rm -it -v $PWD:/app -u $(id -u):$(id -g) --entrypoint /bin/bash oduwsdl/fch:2.0
I have no name!@054b386f00b1:/app$ ./fch/__main__.py --st=20200418000000 --et=20210418000000 monero | Rscript twitterFollowerCount.R
Error in `$<-.data.frame`(`*tmp*`, MementoTimestamp, value = numeric(0)) :
replacement has 0 rows, data has 7
Calls: $<- -> $<-.data.frame
Execution halted
Rscript version:
Rscript --version
R scripting front-end version 3.5.2 (2018-12-20)
Any help would be greatly appreciated!
Running:
python3 ./FollowerHist.py -e Ocasio2018
echoes the following to stdout:
http://web.archive.org/web/timemap/link/http://twitter.com/Ocasio2018
15 archive points found
20171018033758
2955
20180411044556
20180505060722
19150
20180523034652
20888
20180527074406
22706
20180529221423
24899
20180530045347
25135
20180606211514
33135
20180614141606
36746
20180621005111
43546
20180625203123
48758
20180625203536
48766
20180625203539
48766
20180627025800
85757
20180627165955
I presume the longer numerical strings are 14-digit datetimes of the respective mementos and the other numerical strings are follower counts, but nothing in the output signifies this.
I propose making the stdout results a little more descriptive, even if the ultimate result is written to a file.
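As a rough illustration of what labeled output could look like, here is a sketch that pairs each 14-digit datetime with the count that follows it and leaves the count empty when none was printed. The function name label_output and the column names in the header are assumptions for illustration, not the tool's current behavior.

```python
import re

# A memento datetime is assumed to be exactly 14 digits (YYYYMMDDhhmmss).
TIMESTAMP = re.compile(r"\d{14}")


def label_output(lines):
    """Pair each 14-digit datetime with the count printed after it.

    Returns a list of (datetime, count) tuples; count is None when a
    datetime is not followed by a count. Other lines are ignored.
    """
    rows = []
    pending = None
    for line in lines:
        token = line.strip()
        if TIMESTAMP.fullmatch(token):
            if pending is not None:
                # The previous datetime had no count after it.
                rows.append((pending, None))
            pending = token
        elif token.isdigit() and pending is not None:
            rows.append((pending, int(token)))
            pending = None
    if pending is not None:
        rows.append((pending, None))
    return rows


sample = ["15 archive points found", "20171018033758", "2955",
          "20180411044556", "20180505060722", "19150"]
print("MementoDatetime,FollowerCount")  # hypothetical header
for moment, count in label_output(sample):
    print(f"{moment},{'' if count is None else count}")
```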
The axis label gets overlapped when the numbers on the y-axis get large enough. Remove the label or account for its size dynamically.
It would be useful for some potential users to have the tool available on PyPI so that, instead of requiring them to download the source, they can run a single command like pip install fch or pip install followercounthistory.
We (@ibnesayeed and I) recently did this for cdxjGenerator, which has comparatively trivial code relative to FollowerCountHistory.
I have not had a chance to thoroughly examine the codebase for potential complications, but regardless, it would be worth considering in order to make the tool more accessible to those who would like to use it.
The README provides options to push the last memento to archives. This feature seems beyond the objective of this tool, however useful.
FollowerCountHistory ought to be more functionally cohesive. I am suggesting we remove this option and create another tool that imports FollowerCountHistory with this expanded functionality.
In doing this, FollowerCountHistory could also be adapted to be a Python module, uploaded to PyPI, and used by others' tools without the functional scope creep.
If there are no archives of the person, or only old ones, push to the archive
Running fch joebiden gives the following error, tested on both Ubuntu and macOS:
Fetch Timemap: Error: http://twitter.com/joebiden HTTPSConnectionPool(host='memgator.cs.odu.edu', port=443): Max retries exceeded with url: /timemap/cdxj/http://twitter.com/joebiden (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc868335ee0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Hi! I was wondering if there's a way to specify the dates for which I want data. For instance, if I only wanted data for the year 2022, how would I do so? Thanks in advance!
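The usage text quoted elsewhere on this page lists --st and --et options, and one report passes them 14-digit values, so those may be the intended way to bound the crawl. Independently of the flags, the output can also be filtered after the fact. The sketch below assumes the timestamps are 14-digit YYYYMMDDhhmmss strings; the function name in_range is hypothetical.

```python
from datetime import datetime

FMT = "%Y%m%d%H%M%S"  # 14-digit memento datetime, e.g. 20220101000000


def in_range(timestamp, start, end):
    """Return True if timestamp falls within the half-open range [start, end)."""
    moment = datetime.strptime(timestamp, FMT)
    return datetime.strptime(start, FMT) <= moment < datetime.strptime(end, FMT)


timestamps = ["20211231235959", "20220101000000",
              "20220615120000", "20230101000000"]
only_2022 = [t for t in timestamps
             if in_range(t, "20220101000000", "20230101000000")]
print(only_2022)  # ['20220101000000', '20220615120000']
```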
FollowerCount.py uses nested try-excepts to check for the historical Twitter UIs.
try:
    result = soup.select(".ProfileNav-item--followers")[0]
    try:
        result = result.find("a")['title']
    except:
        result = result.find("a")['data-original-title']
except:
    try:
        result = soup.select(".js-mini-profile-stat")[-1]['title']
    except:
        try:
            result = soup.select(".stats li")[-1].find("strong")['title']
        except:
            try:
                result = soup.select(".stats li")[-1].find("strong").text
            except:
                ...
This nesting goes down to excessive levels but seems to work for the expected logic. However, addressing some stylistic issues would help the programmatic flow.
For example, PEP 20 states "Flat is better than nested." PEP 8 also recommends a maximum line length of 79 characters (72 for comments and docstrings), which is far exceeded due to the nested try-except scoping. PEP 8 also discourages bare excepts because of the implications described there.
try-excepts are the right paradigm to use here, following the Python recommendation of asking for forgiveness (exceptions) over permission (conditionals), i.e., EAFP. However, the implementation could be improved by making the code structure flatter, which should have positive effects on maintainability, among other benefits.
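One way the flatter structure could look is an ordered list of small extractor functions tried in turn, each still relying on exceptions rather than conditionals. This is only a sketch of the idea, not a drop-in patch: the names first_match, followers_nav, and EXTRACTORS are hypothetical, the selectors are the ones quoted above, and the exception tuple is an assumption about what a non-matching layout raises.

```python
def first_match(soup, extractors):
    """Return the result of the first extractor that succeeds, else None."""
    for extract in extractors:
        try:
            return extract(soup)
        except (IndexError, KeyError, TypeError, AttributeError):
            # This layout did not match; try the next one.
            continue
    return None


def followers_nav(soup):
    link = soup.select(".ProfileNav-item--followers")[0].find("a")
    # Some layouts keep the count in 'data-original-title' instead of 'title'.
    try:
        return link["title"]
    except KeyError:
        return link["data-original-title"]


# Ordered as in the nested version quoted above.
EXTRACTORS = [
    followers_nav,
    lambda soup: soup.select(".js-mini-profile-stat")[-1]["title"],
    lambda soup: soup.select(".stats li")[-1].find("strong")["title"],
    lambda soup: soup.select(".stats li")[-1].find("strong").text,
]

# Usage, given a parsed page: result = first_match(soup, EXTRACTORS)
```

Supporting another historical layout then means appending one entry to the list instead of adding another indentation level, and naming the caught exceptions avoids the bare except.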
In the most recent version of fch (1.0.11), the command line flags no longer appear to work correctly. This was not the case in the previous version (1.0.10). I first noticed this when installing from source to verify #21 but was also able to replicate it via the PyPI release.
❯ fch
zsh: command not found: fch
❯ pip install fch==1.0.11
❯ fch
Traceback (most recent call last):
File "/usr/local/bin/fch", line 5, in <module>
from fch.__main__ import main
File "/usr/local/lib/python3.8/site-packages/fch/__main__.py", line 11, in <module>
from fch.core.config.configreader import ConfigurationReader
ModuleNotFoundError: No module named 'fch.core.config.configreader'
❯ pip uninstall -y fch
❯ fch
zsh: command not found: fch
❯ pip install fch==1.0.10
❯ fch
usage: fch [-h] [--st] [--et] [--freq] [-f] thandle
fch: error: the following arguments are required: thandle
❯
Good morning,
I just tried to run a basic query:
fch joebiden
This command results in the following error:
parse_timemap: 'NoneType' object has no attribute 'groupdict'
'NoneType' object is not iterable
Am I missing something? All dependencies are installed.
Thanks in advance for any help!
Currently, the tool just rewrites the data in the CSV file below the previous data, including headers. Change it so that the code keeps the old data and only looks for new data from the archive.
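A possible shape for this change is to read the timestamps already stored in the file and append only rows that are not there yet, writing the header only when the file is new. This is a sketch under assumptions: the function name append_new_rows is hypothetical, and the column names are placeholders that would need to match the tool's real CSV header.

```python
import csv
import os


def append_new_rows(path, rows,
                    fieldnames=("MementoTimestamp", "FollowerCount")):
    """Append rows whose timestamp is not already in the CSV at path.

    The column names are placeholders, not the tool's confirmed header.
    Returns the number of rows written.
    """
    key = fieldnames[0]
    has_data = os.path.exists(path) and os.path.getsize(path) > 0
    seen = set()
    if has_data:
        with open(path, newline="") as handle:
            seen = {row[key] for row in csv.DictReader(handle)}

    new_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])  # also skips duplicates within this batch
            new_rows.append(row)

    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if not has_data:
            writer.writeheader()  # header is written once, for a new file
        writer.writerows(new_rows)
    return len(new_rows)
```

With the stored timestamps in hand, a later step could also restrict the archive lookup to mementos newer than the latest stored one, so that old data is not fetched again.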
I installed fch via pip but wanted to generate plots, so I also ran git clone https://github.com/oduwsdl/FollowerCountHistory in /tmp/.
While my current working directory was /tmp/, I ran fch machawk1 > followers.csv. This created /tmp/followers.csv.
I then moved into the source directory using cd FollowerCountHistory/, ran Rscript twitterFollowerCount.R ../followers.csv, and received an error message with no plots generated:
[1] "Unsupported file type"
Warning message:
In if (ext == "csv") { :
the condition has length > 1 and only the first element will be used
However, running the same command with an absolute path to the CSV file works, i.e., Rscript twitterFollowerCount.R /tmp/followers.csv generates plots without an error. Further, moving the CSV file into the source directory (mv ../followers.csv ./) and running Rscript twitterFollowerCount.R followers.csv works, but adding the relative part of the data path, Rscript twitterFollowerCount.R ./followers.csv, causes the same error as above.
This is likely an issue with the R script trying to detect the file type and choking on anything but an absolute path or a data file in the same directory.
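I have not checked the R source, so the following is a guess at the mechanism, illustrated in Python for brevity. If the script derives the extension by splitting the whole path on "." and dropping the first piece, the leading dots in ./ and ../ would yield more than one element, which would match the "condition has length > 1" warning. Both function names below are hypothetical; in R, tools::file_ext would be the analogous fix.

```python
import os


def naive_extension(path):
    # Assumed reconstruction of the bug: split on every "." in the path
    # and drop the first piece.
    return path.split(".")[1:]


def robust_extension(path):
    # Only look at the final component's suffix.
    return os.path.splitext(path)[1].lstrip(".").lower()


for path in ["followers.csv", "/tmp/followers.csv",
             "./followers.csv", "../followers.csv"]:
    print(path, naive_extension(path), robust_extension(path))
```

Under this assumption, the naive split yields a single "csv" element for the two working invocations and several elements for the two failing ones, while the suffix-based version returns "csv" in all four cases.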