hodgesmr / mastodon_digest
A Python script that aggregates recent popular posts from your Mastodon timeline
License: BSD 3-Clause "New" or "Revised" License
Using the default options I noticed this error:
Traceback (most recent call last):
File "/opt/mastodon_digest/run.py", line 190, in <module>
run(
File "/opt/mastodon_digest/run.py", line 70, in run
threshold_posts = threshold.posts_meeting_criteria(posts, scorer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/mastodon_digest/thresholds.py", line 26, in posts_meeting_criteria
all_post_scores = [p.get_score(scorer) for p in posts]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/mastodon_digest/thresholds.py", line 26, in <listcomp>
all_post_scores = [p.get_score(scorer) for p in posts]
^^^^^^^^^^^^^^^^^^^
File "/opt/mastodon_digest/models.py", line 21, in get_score
return scorer.score(self)
^^^^^^^^^^^^^^^^^^
File "/opt/mastodon_digest/scorers.py", line 75, in score
return super().score(scored_post) * super().weight(scored_post)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/mastodon_digest/scorers.py", line 38, in weight
weight = 1 / sqrt(scored_post.info["account"]["followers_count"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: math domain error
Bug is on this line:
Line 36 in f70a018
By adding debug output for scored_post.info["account"]["followers_count"], I noticed that I am indeed following an account which has hidden its number of followers. For that account followers_count=-1, so the sqrt call fails.
I haven't read the algorithm yet to suggest what's the reasonable thing to do in this case. Perhaps someone will be quicker than me :)
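One possible way to handle this, offered only as a sketch: clamp the follower count to a minimum of 1 before taking the square root. The function name inverse_sqrt_weight and the choice of 1 as the floor are assumptions for illustration, not what the project does; a hidden count could just as reasonably be replaced by a median or the post skipped.

```python
from math import sqrt


def inverse_sqrt_weight(followers_count: int) -> float:
    """Return 1 / sqrt(followers), tolerating hidden or zero counts.

    Assumption: a hidden count (reported as -1) or a count of 0 is
    treated as a single follower, so the weight is capped at 1.0.
    """
    safe_count = max(followers_count, 1)
    return 1 / sqrt(safe_count)
```

With this floor, an account reporting -1 gets the same weight as an account with exactly one follower, which may overweight large accounts that hide their counts.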
Thank you for this really awesome tool, putting the algorithm into the hands (and CI-pipelines ;) of the people. I can think of some use cases for not just focusing on the 'top scorers' but also the overlooked low and medium scorers. I think this is partly the intent of #10 - but I would also like to see the page having several sections, with separate result sets.
Hello! Thinking aloud momentarily, would love your thoughts and gut check on this.
I've been thinking a bit about how to integrate this kind of digest experience more natively into Mastodon. Generating the standalone HTML is a great proof of concept, but user experience and availability would be improved if we had a way to get the results of the digest into Mastodon itself.
In an ideal world (imho, and I'm very biased having worked on this feature at Twitter previously) Mastodon's platform would implement something akin to Twitter's Collections API (aka “Custom Timelines”), which are effectively “lists for statuses”: A named data store into which people can curate posts in an arbitrary order. An application like Mastodon Digest would then add its filtered posts and boosts into a “Mastodon Digest” timeline, which the user could browse through any client.
Since Mastodon Collections don't exist today, I wonder if this could be prototyped by overriding the Mastodon Bookmarks feature. With a few additions for polish, I imagine this:
Obvious caveat: This is a very opinionated use of the Bookmarks feature and I'm sure wouldn't align with how some people already use it. That's fine. It might work for many people and maybe demonstrate the value of arbitrary custom timelines in Mastodon.
You might activate it with a -b option, allowing the existing functionality to be used to preview and refine the filtering before writing anything to production.
Love your thoughts. Thanks.
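A rough sketch of what the Bookmarks prototype could look like, assuming the client is a Mastodon.py instance (which provides status_bookmark) and that each digest entry is a status dict with an "id". The function name bookmark_digest and the dry_run parameter are hypothetical; dry_run stands in for running without the proposed -b option.

```python
def bookmark_digest(client, statuses, dry_run=True):
    """Bookmark every status in the digest and return the ids involved.

    Assumptions: `client` exposes status_bookmark(id), as Mastodon.py
    does, and each status is a dict with an "id" key. With dry_run=True
    nothing is written, which mirrors previewing the digest first.
    """
    status_ids = [status["id"] for status in statuses]
    if not dry_run:
        for status_id in status_ids:
            client.status_bookmark(status_id)
    return status_ids
```

Clearing older digest bookmarks before each run is left out here; doing so would also remove bookmarks the user added by hand.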
Since the user's home instance is not aware of all followers of the author and all boosts, stars, etc, related to the post, the Scorer works with incomplete information when calculating the score for the post based on the information retrieved from the timeline request to the home instance.
It would be more accurate to query the information about the post and the user from their respective home instance.
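As a sketch of the first step, the origin instance's API endpoint can be derived from the status URL. This assumes the origin server is Mastodon itself and uses URLs of the form https://host/@user/ID; other fediverse software uses different URL shapes, and the origin may refuse unauthenticated requests. The function name origin_status_api_url is hypothetical.

```python
from urllib.parse import urlparse, urlunparse


def origin_status_api_url(status_url):
    """Map a Mastodon status URL to the origin instance's status endpoint.

    Returns None when the last path segment is not a numeric id, which
    is the case for many non-Mastodon servers.
    """
    parts = urlparse(status_url)
    status_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not status_id.isdigit():
        return None
    api_path = f"/api/v1/statuses/{status_id}"
    return urlunparse((parts.scheme, parts.netloc, api_path, "", "", ""))
```

Fetching that URL for every post adds one request per status, so a real implementation would probably need caching and rate limiting.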
Skip posts authored by anyone with the #nobot or #noindex tag in their bio
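A minimal sketch of such a filter, assuming each status dict carries the author's bio as HTML in status["account"]["note"] (the field the Mastodon API uses for the bio). The helper name author_opted_out is hypothetical. Tags are stripped first because Mastodon typically renders hashtags with markup between the "#" and the tag text.

```python
import re

# Matches #nobot or #noindex as whole tags, case-insensitively.
OPT_OUT_TAG = re.compile(r"#(nobot|noindex)\b", re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")


def author_opted_out(status):
    """Return True if the author's bio contains #nobot or #noindex."""
    note_html = (status.get("account") or {}).get("note") or ""
    note_text = HTML_TAG.sub("", note_html)
    return bool(OPT_OUT_TAG.search(note_text))
```

It could then be applied as posts = [p for p in posts if not author_opted_out(p)] before scoring.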
I think mastodon_username at Line 48 in 3f2ea4d could be replaced by a call to mastodon.me() (as specified in https://mastodonpy.readthedocs.io/en/stable/15_everything.html#mastodon.Mastodon.me). I removed the username requirement for https://fediview.com by doing this.
There might be a use case where someone wants to log in with one account, but filter out interactions from another account, but that seems like an edge case?
Let me know if you'd like a PR for this and I'd be happy to create one.
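A sketch of the idea, assuming a Mastodon.py client whose me() returns the authenticated account with a "username" field. The function name resolve_username and the optional override are hypothetical; the override keeps the edge case of filtering a different account possible.

```python
def resolve_username(client, configured_username=None):
    """Return the username to filter on.

    An explicitly configured name wins; otherwise the name of the
    logged-in account is looked up through client.me().
    """
    if configured_username:
        return configured_username
    return client.me()["username"]
```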
At some point the embedded posts all started to render for me with a 150px height for each iframe which means I have to go and scroll within each iframe to be able to read the contents of each post. This is running a Docker image built from d91876a (but I also had this problem in version 0.0.12 which I ran until today).
The small, fixed height makes essentially all posts cut off (if they are longer than one line of text). Since I think this happened without me having updated Mastodon Digest, maybe this is caused by some change in cross-origin behavior in more recent Mastodon versions?
In the worst case I can adjust the stylesheet locally to set a height for iframe.mastodon-embed that works for most posts, so I only have to scroll within the overly long ones (at the cost of making short posts unnecessarily tall). But it would be nice if each post had just the height it needs.
Here is the computed layout in two browsers.
Chrome (version 111.0.5563.64 running on version 12.5.1 of MacOS):
The Mastodon.py library gives "mastodonpy" as its default user agent. It is possible to specify a different user agent, which would allow instance operators to block this bot script (or distinguish it from others). PR coming.
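For illustration only, here is one way the client options could be assembled. Mastodon.py's constructor accepts a user_agent keyword; the helper name client_options and the "mastodon_digest/<version>" string are assumptions, not necessarily what the PR uses.

```python
def client_options(base_url, access_token, version):
    """Build keyword arguments for mastodon.Mastodon(...).

    The user agent string is a hypothetical choice that lets instance
    operators tell this script apart from other Mastodon.py clients.
    """
    return {
        "api_base_url": base_url,
        "access_token": access_token,
        "user_agent": f"mastodon_digest/{version}",
    }
```

The result would be passed as Mastodon(**client_options(...)) wherever the client is created.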
I'd suggest that the -o argument should be usable with either a directory or file parameter.
If given a filename, it should have the extension html and should not exist.
If given a directory, it should exist, and the filename should give some information about the run-conditions (like scorer used, timestamp, and time range, for instance).
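The rules above could be sketched as follows. The function name resolve_output_path, its parameters, and the generated filename pattern are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def resolve_output_path(arg, scorer, hours, now=None):
    """Turn the -o argument into the HTML file path to write.

    An existing directory gets a generated filename describing the run;
    anything else must be a not-yet-existing path ending in .html.
    """
    path = Path(arg)
    if path.is_dir():
        now = now or datetime.now(timezone.utc)
        stamp = now.strftime("%Y%m%dT%H%M%SZ")
        return path / f"digest_{scorer}_{hours}h_{stamp}.html"
    if path.suffix != ".html":
        raise ValueError(f"{arg!r} must be an existing directory or end in .html")
    if path.exists():
        raise FileExistsError(f"{arg!r} already exists")
    return path
```

Passing now explicitly is only there to make the generated name predictable in tests.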
I checked out tag 0.3.1 and tried to build and run the Docker container, but the run command did not work.
make build:
docker build -f Dockerfile \
-t hodgesmr/mastodon-digest:0.3.1 \
-t hodgesmr/mastodon-digest:latest . \
--build-arg VERSION=0.3.1 \
--build-arg BUILD_DATE="Fr 14. Apr 08:57:40 UTC 2023" \
--build-arg VCS_REF=be98741 \
--build-arg NAME=mastodon-digest \
--build-arg VENDOR="Matt Hodges" \
--build-arg ORG=hodgesmr \
--build-arg WORKDIR="/opt/mastodon-digest"
[+] Building 19.6s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.05kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.11-slim-bullseye 0.5s
=> [internal] load build context 0.0s
=> => transferring context: 774B 0.0s
=> [1/9] FROM docker.io/library/python:3.11-slim-bullseye@sha256:286f2f1d6f2f730a44108656afb04b131504b610a6cb2f3413918e98dabba67e 0.0s
=> CACHED [2/9] WORKDIR /opt/mastodon-digest 0.0s
=> CACHED [3/9] COPY requirements.txt . 0.0s
=> [4/9] RUN mkdir -p venvs 0.4s
=> [5/9] RUN python3 -m venv venvs/mastodon-digest 4.3s
=> [6/9] RUN venvs/mastodon-digest/bin/pip install --upgrade pip 3.0s
=> [7/9] RUN venvs/mastodon-digest/bin/pip install -r requirements.txt 9.6s
=> [8/9] COPY templates/ ./templates/ 0.1s
=> [9/9] COPY *.py ./ 0.1s
=> exporting to image 1.5s
=> => exporting layers 1.5s
=> => writing image sha256:5ba9c5b3b324ad5501f9bb13a2bfc67c738e4ffacd5ef077c16e00edf83926ef 0.0s
=> => naming to docker.io/hodgesmr/mastodon-digest:0.3.1 0.0s
=> => naming to docker.io/hodgesmr/mastodon-digest:latest
make run:
docker run --env-file .env -it --rm -v "/render:"/opt/mastodon-digest"/render" hodgesmr/mastodon-digest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "venvs/mastodon_digest/bin/python3": stat venvs/mastodon_digest/bin/python3: no such file or directory: unknown.
make: *** [Makefile:51: run] Fehler 127
I also tried:
sudo docker run --env-file .env -it --rm -v "/render":"/opt/mastodon-digest" hodgesmr/mastodon-digest
sudo docker run --env-file .env -it --rm -v /render:/opt/mastodon-digest hodgesmr/mastodon-digest
but got the same error message
In the discussion on Mastodon I saw the idea of providing a user-defined list of accounts to boost in the digest, and I implemented very basic functionality for this using a configuration file. I think the approach could be used for more options. See my fork here: https://github.com/leoluecken/mastodon_digest
@hodgesmr Would that be something that you'd be interested to include here?
I'm on a Mastodon instance running the Hometown fork, and am getting this error:
raise MastodonVersionError("Version check failed (Need version " + version + ")")
mastodon.errors.MastodonVersionError: Version check failed (Need version 2.4.3)
This makes me wonder whether mastodon_digest works with Hometown. My instance runs Hometown v1.0.5+3.5.2 (i.e., Hometown 1.0.5 and Mastodon 3.5.2).
Most URLs in the app are parsed and constructed with string concatenation. Move that work to urlparse and urlunparse.
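A minimal sketch of what that might look like, using only urllib.parse from the standard library. The helper name build_url is hypothetical and not taken from the codebase.

```python
from urllib.parse import urlparse, urlunparse


def build_url(base_url, path, query=""):
    """Join a path (and optional query string) onto a base URL.

    Slashes between the base path and the new path are normalised so
    that a trailing slash on the base does not produce "//".
    """
    parts = urlparse(base_url)
    base_path = parts.path.rstrip("/")
    full_path = f"{base_path}/{path.lstrip('/')}"
    return urlunparse(parts._replace(path=full_path, query=query, fragment=""))
```

For example, build_url("https://mastodon.social/", "@user/123") yields "https://mastodon.social/@user/123" whether or not the base ends in a slash.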
I'm game to add some tests to this. (Ulterior motive: I want to use it as a back end for something like icymi_law.) It will be easier for me to do this if we use poetry for dependency management rather than requirements.txt. I am happy to do this myself. Is that cool with you?
If this ran as a service and implemented enough of the Mastodon login and timeline APIs (https://docs.joinmastodon.org/methods/timelines/), and exposed the digests as timelines, third party clients that support multiple accounts (e.g., Tusky) could integrate with this very easily.
My ideal way of getting this digest would be via email. For self-hosted instances, Zapier email would suffice. I think what would work would be:
mastodon_digest has access to a Zapier webhook key
Kick that off in a cronjob, with a pretty basic Zapier task, and it should just work.
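A sketch of the webhook step, using only the standard library. It builds the POST request without sending it; the function name, the JSON field names "subject" and "html", and the webhook URL format are all assumptions, since the payload shape depends on how the Zapier task is configured.

```python
import json
from urllib.request import Request


def build_digest_webhook_request(webhook_url, html, subject="Mastodon Digest"):
    """Build a JSON POST request carrying the rendered digest HTML.

    The payload keys are hypothetical and must match whatever the
    receiving Zapier task expects.
    """
    payload = json.dumps({"subject": subject, "html": html}).encode("utf-8")
    return Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of passing the request to urllib.request.urlopen from the cron-driven run.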
Running python3 run.py I get the following output:
Building digest from the past 12 hours...
Traceback (most recent call last):
File "/home/[...]/mastodon_digest/run.py", line 159, in <module>
run(
File "/home/[...]/mastodon_digest/run.py", line 53, in run
posts, boosts = fetch_posts_and_boosts(hours, mst, mastodon_username, timeline)
File "/home/[...]/mastodon_digest/api.py", line 20, in fetch_posts_and_boosts
filters = mastodon_client.filters()
File "/home/linuxbrew/.linuxbrew/Cellar/python@3.10/3.10.8/lib/python3.10/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/linuxbrew/.linuxbrew/Cellar/python@3.10/3.10.8/lib/python3.10/site-packages/mastodon/utility.py", line 42, in wrapper
raise MastodonVersionError("Version check failed (Need version " + version + ")")
mastodon.errors.MastodonVersionError: Version check failed (Need version 2.4.3)
My instance is running v4.0.2.
Note: [...] is modified by me
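This is not the main bottleneck, but a simple fix should speed up the filtering in thresholds.py: the current code recomputes each post's score and its percentile rank inside the list comprehension.
Here's the diff - sorry for the inconvenience :)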
diff --git a/thresholds.py b/thresholds.py
index 739d869..1524e69 100644
--- a/thresholds.py
+++ b/thresholds.py
@@ -24,13 +24,8 @@ class Threshold(Enum):
"""Returns a list of ScoredPosts that meet this Threshold with the given Scorer"""
all_post_scores = [p.get_score(scorer) for p in posts]
- threshold_posts = [
- p
- for p in posts
- if stats.percentileofscore(all_post_scores, p.get_score(scorer))
- >= self.value
- ]
-
+ q = stats.scoreatpercentile(all_post_scores, per=self.value)
+ threshold_posts = [p for p, s in zip(posts, all_post_scores) if s >= q]
return threshold_posts
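The same idea can be illustrated without SciPy: compute each score once, sort the scores once, and look up every post's percentile rank with a binary search. This is a standalone sketch; the function name is hypothetical, and the rank used here (share of scores less than or equal to the post's score) is an assumption that may differ from SciPy's default at ties.

```python
from bisect import bisect_right


def posts_meeting_percentile(posts, scores, percentile):
    """Keep posts whose score ranks at or above the given percentile.

    `scores` holds one precomputed score per post, in the same order.
    Sorting costs O(n log n) once; each lookup is then O(log n).
    """
    ordered = sorted(scores)
    total = len(ordered)
    kept = []
    for post, score in zip(posts, scores):
        # Percentage of all scores that are <= this post's score.
        rank = 100.0 * bisect_right(ordered, score) / total
        if rank >= percentile:
            kept.append(post)
    return kept
```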