nanos / fedifetcher

FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.

Home Page: https://blog.thms.uk/fedifetcher?utm_source=github

License: MIT License

Python 99.61% Dockerfile 0.39%

mastodon

fedifetcher's Introduction

FediFetcher for Mastodon

This GitHub repository provides a simple script that can pull missing posts into Mastodon using the Mastodon API. FediFetcher has no further dependencies, and can be run as a GitHub Action, as a scheduled cron job, or from a pre-packaged container. Here is what FediFetcher can do:

It can pull missing remote replies to posts that are already on your server into your server. Specifically, it can
1. fetch missing replies to posts that users on your instance have already replied to,
2. fetch missing replies to the most recent posts in your home timeline,
3. fetch missing replies to your bookmarks,
4. fetch missing replies to your favourites.
It can also backfill profiles on your instance. In particular it can
1. fetch missing posts from users that have recently appeared in your notifications,
2. fetch missing posts from users that you have recently followed,
3. fetch missing posts from users that have recently followed you,
4. fetch missing posts from users that have recently sent you a follow request.

Each part of this script is fully configurable, and you can completely disable parts that you are not interested in.

FediFetcher will store posts and profiles it has already pulled in on disk, to prevent re-fetching the same info in subsequent executions.

Be aware that this script may run for a very long time. This is particularly true the first time it runs, and/or if you enable all parts of the script. You should take steps to prevent multiple overlapping executions, as those will lead to unpleasant results. There are detailed instructions for this below.

For detailed information on the how and why, please read the FediFetcher for Mastodon page.

Supported servers

FediFetcher makes use of the Mastodon API. It'll run against any instance implementing this API, and whilst it was built for Mastodon, it's been confirmed working against Pleroma as well.

FediFetcher will pull in posts and profiles from any servers running the following software: Mastodon, Pleroma, Akkoma, Pixelfed, Hometown, Misskey, Firefish (Calckey), Foundkey, and Lemmy.

Setup

You can run FediFetcher as a GitHub Action, as a scheduled cron job on your local machine/server, or from a pre-packaged container.

1) Get the required access token:

Regardless of how you want to run FediFetcher, you must first get an access token:

If you are an Admin on your instance

In Mastodon go to Preferences > Development > New Application
1. Give it a nice name
2. Enable the required scopes for your options. You could tick read and admin:read:accounts, or see below for a list of which scopes are required for which options.
3. Save
4. Copy the value of Your access token

If you are not an Admin on your Instance

Go to GetAuth for Mastodon
Type in your Mastodon instance's domain
Copy the token.

2) Configure and run FediFetcher

Run FediFetcher as a GitHub Action, a cron job, or a container:

To run FediFetcher as a GitHub Action:

Fork this repository
Add your access token:
1. Go to Settings > Secrets and Variables > Actions
2. Click New Repository Secret
3. Supply the Name ACCESS_TOKEN and provide the Token generated above as Secret
Create a file called config.json with your configuration options in the repository root. Do NOT include the Access Token in your config.json!
Finally, go to the Actions tab and enable the action. The action should now automatically run approximately once every 10 minutes.

Note

Keep in mind that the schedule event can be delayed during periods of high loads of GitHub Actions workflow runs.

To run FediFetcher as a cron job:

Clone this repository.
Install requirements: pip install -r requirements.txt
Create a json file with your configuration options. You may wish to store this in the ./artifacts directory, as that directory is .gitignored
Then simply run this script like so: python find_posts.py -c=./artifacts/config.json.

If desired, all configuration options can be provided as command line flags, instead of through a JSON file. An example script can be found in the examples folder.

When run as a cron job, FediFetcher uses file-based locking to avoid multiple overlapping executions of the script. The timeout period for the lock can be configured using lock-hours.
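To illustrate the idea, here is a minimal sketch of file-based locking with a timeout. It is not FediFetcher's actual implementation: the function names `try_acquire_lock` and `release_lock`, and the choice to judge staleness by the lock file's modification time, are assumptions made for this example.

```python
import os
import time


def try_acquire_lock(lock_file: str, lock_hours: float) -> bool:
    """Return True if the lock was acquired, False if a fresh lock already exists."""
    if os.path.exists(lock_file):
        age_hours = (time.time() - os.path.getmtime(lock_file)) / 3600
        if age_hours < lock_hours:
            # Another run is presumably still in progress.
            return False
        # The lock is older than the timeout, so treat it as stale.
        os.remove(lock_file)
    try:
        # Mode "x" fails if another process created the file in the meantime.
        with open(lock_file, "x", encoding="utf-8") as f:
            f.write(str(os.getpid()))
    except FileExistsError:
        return False
    return True


def release_lock(lock_file: str) -> None:
    """Remove the lock file if it exists."""
    if os.path.exists(lock_file):
        os.remove(lock_file)
```

A run would call `try_acquire_lock` first, exit early if it returns False, and call `release_lock` when it finishes.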

Note

If you are running FediFetcher locally, my recommendation is to run it manually once, before turning on the cron job: The first run will be significantly slower than subsequent runs, and that will help you prevent overlapping during that first run.

To run FediFetcher from a container:

FediFetcher is also available in a pre-packaged container, FediFetcher - Thank you @nikdoof.

Pull the container from ghcr.io, using Docker or your container tool of choice: docker pull ghcr.io/nanos/fedifetcher:latest
Run the container, passing the configuration options as command line arguments: docker run -it ghcr.io/nanos/fedifetcher:latest --access-token=<TOKEN> --server=<SERVER>

Note

The same rules for running this as a cron job apply to running the container: don't overlap any executions.

Persistent files are stored in /app/artifacts within the container, so you may want to map this to a local folder on your system.

An example Kubernetes CronJob for running the container is included in the examples folder.

An example Docker Compose Script for running the container periodically is included in the examples folder.

To run FediFetcher with systemd-timer:

See systemd.md

Configuration options

FediFetcher has quite a few configuration options, so here is my quick configuration advice, which should work for most people:

Warning

Do NOT include your access-token in the config.json when running FediFetcher as GitHub Action. When running FediFetcher as GitHub Action ALWAYS set the Access Token as an Action Secret.

{
  "access-token": "Your access token",
  "server": "your.mastodon.server",
  "home-timeline-length": 200,
  "max-followings": 80,
  "from-notifications": 1
}

If you configure FediFetcher this way, it'll fetch missing remote replies to the last 200 posts in your home timeline. It'll additionally backfill profiles of the last 80 people you followed, and of every account who appeared in your notifications during the past hour.

Advanced Options

Please find the list of all configuration options, including descriptions, below:

Option	Required?	Notes
`log-level`	No	The severity of messages to log. Possible values are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. Defaults to `DEBUG`.
`access-token`	Yes	The access token. If using GitHub action, this needs to be provided as a Secret called `ACCESS_TOKEN`. If running as a cron job or a container, you can supply this option as array, to fetch posts for multiple users on your instance.
`server`	Yes	The domain only of your mastodon server (without `https://` prefix) e.g. `mstdn.thms.uk`.
`home-timeline-length`	No	Provide to fetch remote replies to posts in the API-Key owner's home timeline. Determines how many posts we'll fetch replies for. Recommended value: `200`.
`max-bookmarks`	No	Provide to fetch remote replies to any posts you have bookmarked. Determines how many of your bookmarks you want to get replies to. Recommended value: `80`. Requires an access token with `read:bookmarks` scope.
`max-favourites`	No	Provide to fetch remote replies to any posts you have favourited. Determines how many of your favourites you want to get replies to. Recommended value: `40`. Requires an access token with `read:favourites` scope.
`max-followings`	No	Provide to backfill profiles for your most recent followings. Determines how many of your last followings you want to backfill. Recommended value: `80`.
`max-followers`	No	Provide to backfill profiles for your most recent followers. Determines how many of your last followers you want to backfill. Recommended value: `80`.
`max-follow-requests`	No	Provide to backfill profiles for the API key owner's most recent pending follow requests. Determines how many of your last follow requests you want to backfill. Recommended value: `80`.
`from-notifications`	No	Provide to backfill profiles of anyone mentioned in your recent notifications. Determines how many hours of notifications you want to look at. Requires an access token with `read:notifications` scope. Recommended value: `1`, unless you run FediFetcher less than once per hour.
`reply-interval-in-hours`	No	Provide to fetch remote replies to posts that have received replies from users on your own instance. Determines how far back in time we'll go to find posts that have received replies. You must be administrator on your instance to use this option, and this option is not supported on Pleroma / Akkoma and its forks. Recommended value: `0` (disabled). Requires an access token with `admin:read:accounts`.
`backfill-with-context`	No	Set to `0` to disable fetching remote replies while backfilling profiles. This is enabled by default, but you can disable it, if it's too slow for you.
`backfill-mentioned-users`	No	Set to `0` to disable backfilling any mentioned users when fetching the home timeline. This is enabled by default, but you can disable it, if it's too slow for you.
`remember-users-for-hours`	No	How long between back-filling attempts for non-followed accounts? Defaults to `168`, i.e. one week.
`remember-hosts-for-days`	No	How long should FediFetcher cache host info for? Defaults to `30`.
`http-timeout`	No	The timeout for any HTTP requests to the Mastodon API in seconds. Defaults to `5`.
`lock-hours`	No	Determines after how many hours a lock file should be discarded. Not relevant when running the script as GitHub Action, as concurrency is prevented using a different mechanism. Recommended value: `24`.
`lock-file`	No	Location for the lock file. If not specified, will use `lock.lock` under the state directory. Not relevant when running the script as GitHub Action.
`state-dir`	No	Directory storing persistent files, and the default location for lock file. Not relevant when running the script as GitHub Action.
`on-start`	No	Optionally provide a callback URL that will be pinged when processing is starting. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution. This can be used to monitor your script using a service such as healthchecks.io.
`on-done`	No	Optionally provide a callback URL that will be called when processing is finished. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution. This can be used to monitor your script using a service such as healthchecks.io.
`on-fail`	No	Optionally provide a callback URL that will be called when processing has failed. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution. This can be used to monitor your script using a service such as healthchecks.io.

Multi User support

If you wish to run FediFetcher for multiple users on your instance, you can supply the access-token as an array, with different access tokens for different users. That will allow you to fetch replies and/or backfill profiles for multiple users on your instance.

This is only supported when running FediFetcher as cron job, or container. Multi-user support is not available when running FediFetcher as GitHub Action.
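As a rough sketch of what supplying the access-token as an array means for a script, the snippet below accepts either a single token or a list and always works with a list. The helper `normalize_tokens` and the placeholder token values are hypothetical, not part of FediFetcher.

```python
import json
from typing import List, Union


def normalize_tokens(access_token: Union[str, List[str]]) -> List[str]:
    """Accept a single token or a list of tokens and always return a list."""
    if isinstance(access_token, str):
        return [access_token]
    return list(access_token)


# Hypothetical config with one token per user (placeholder values).
config = json.loads("""
{
  "access-token": ["token-for-user-a", "token-for-user-b"],
  "server": "your.mastodon.server",
  "home-timeline-length": 200
}
""")

tokens = normalize_tokens(config["access-token"])
for index, _token in enumerate(tokens, start=1):
    # A real run would repeat the fetch and backfill steps for each token.
    print(f"would process user {index} of {len(tokens)}")
```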

Required Access Token Scopes

For all actions, your access token must include these scopes:
- read:search
- read:statuses
- read:accounts
If you are supplying reply-interval-in-hours you must additionally enable this scope:
- admin:read:accounts
If you are supplying max-follow-requests you must additionally enable this scope:
- read:follows
If you are supplying max-bookmarks you must additionally enable this scope:
- read:bookmarks
If you are supplying max-favourites you must additionally enable this scope:
- read:favourites
If you are supplying from-notifications you must additionally enable this scope:
- read:notifications
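The scope rules above can be expressed as a small lookup. This is only a sketch for checking a configuration by hand: `required_scopes` is a hypothetical helper, and treating a missing option or a value of `0` as "not supplied" is an assumption.

```python
BASE_SCOPES = {"read:search", "read:statuses", "read:accounts"}

# Extra scope needed when the corresponding option is supplied.
OPTION_SCOPES = {
    "reply-interval-in-hours": "admin:read:accounts",
    "max-follow-requests": "read:follows",
    "max-bookmarks": "read:bookmarks",
    "max-favourites": "read:favourites",
    "from-notifications": "read:notifications",
}


def required_scopes(config: dict) -> set:
    """Return the set of scopes needed for the given configuration options."""
    scopes = set(BASE_SCOPES)
    for option, scope in OPTION_SCOPES.items():
        # Assumption: a missing option or a value of 0 means "not supplied".
        if config.get(option):
            scopes.add(scope)
    return scopes


quick_config = {"home-timeline-length": 200, "max-followings": 80, "from-notifications": 1}
print(sorted(required_scopes(quick_config)))
```

For the quick configuration shown earlier, this yields the three base scopes plus `read:notifications`.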

Acknowledgments

The original inspiration for this script, as well as parts of its implementation, are taken from Abhinav Sarkar. Thank you Abhinav!


fedifetcher's Issues

New version depends on endpoint "api/v1/admin/accounts" which is not available in Pleroma

When attempting to run the latest version of FediFetcher (v4.3.0), using it on Pleroma/Akkoma fails as the new method depends on the endpoint "https://example.net/api/v1/admin" which gives the error

{"error":"Not implemented"}

Logs:

2023-06-13 10:44:37.970590 CST: Starting FediFetcher
2023-06-13 10:44:38.680607 CST: Job failed after 0:00:00.709996.
Traceback (most recent call last):
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f12063f8d00>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/admin/.local/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/admin/.local/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //social.example.net/api/v1/admin/accounts (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f12063f8d00>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/admin/FediFetcher/find_posts.py", line 860, in <module>
    reply_toots = get_all_reply_toots(
  File "/home/admin/FediFetcher/find_posts.py", line 310, in get_all_reply_toots
    reply_toots = list(
  File "/home/admin/FediFetcher/find_posts.py", line 311, in <genexpr>
    itertools.chain.from_iterable(
  File "/home/admin/FediFetcher/find_posts.py", line 278, in get_active_user_ids
    resp = get(url, headers={
  File "/home/admin/FediFetcher/find_posts.py", line 687, in get
    response = requests.get( url, headers= h, timeout=timeout)
  File "/home/admin/.local/lib/python3.9/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/admin/.local/lib/python3.9/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/admin/.local/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/admin/.local/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/admin/.local/lib/python3.9/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //social.example.net/api/v1/admin/accounts (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f12063f8d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
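One detail in the log is worth noting: the final errors report `host='https'` with the path `//social.example.net/api/v1/admin/accounts`. That is what a URL looks like when `https://` is prepended to a value that already starts with `https://`, which would happen if the server option was supplied with the prefix. This is a reading of the log, not a confirmed cause. The sketch below shows the effect and one way to reduce a server setting to a bare domain; `normalize_server` is a hypothetical helper.

```python
from urllib.parse import urlparse


def normalize_server(value: str) -> str:
    """Reduce a server setting to a bare domain, e.g. 'social.example.net'."""
    value = value.strip()
    if "://" not in value:
        # urlparse only fills in netloc when a scheme (or '//') is present.
        value = "https://" + value
    return urlparse(value).netloc


# Prepending the scheme twice makes 'https' look like the host name.
doubled = urlparse("https://" + "https://social.example.net/api/v1/admin/accounts")
print(doubled.hostname, doubled.path)

print(normalize_server("https://social.example.net"))
```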

Properly keep workflow alive

The way this was done in #17 didn't work, because we aren't actually checking out the repo, but simply fetching files (#23).

Need to revisit this properly

Scan lists for accounts and backfill/same for bookmarks

Request 1: Backfill members of lists

Problem 1: In the absence of an AI to figure out what we are interested in, a lot of us use lists to create our very own timeline views of accounts we find important. It can be assumed that many of the accounts on those lists were followed long ago, so they sit far down the followings list and are out of reach of a normal scan.

Desire 1: I propose the script scans lists for members of the list and backfill those accounts, so they are up-to-date.

Request 2: Backfill Bookmarks
Problem 2: We bookmark toot posts we are interested in. It would be nice to have those bookmarks up-to-date when we go and look at them.

Desire: Have the script scan and backfill bookmark toot posts.

Improve configuration options and their documentation

There are now three ways to configure FediFetcher:

If using Action it must be configured using environment variables
Otherwise it can be configured using either command line flags, or a json file.

The problem is to a large part, that each way has its own naming of options. Can we standardise all of this, and then improve the documentation?

Enhancement: Report version number in "Starting" message

Can FediFetcher report its version number/git hash when starting so that can be logged? I'm running locally and it'd be a nice way to verify the correct version is running.

Unclear how to disable backfill

The --backfill options are confusing to disable. The documentation suggests setting them to "0", but actually that enables them:

The bool() function is not recommended as a type converter. All it does is convert empty strings to False and non-empty strings to True. This is usually not what is desired.
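The behaviour described in that quote is easy to reproduce. The sketch below contrasts `bool("0")` with an explicit converter passed to `argparse`; `str_to_bool` and the set of accepted spellings are assumptions for illustration, not how FediFetcher parses its flags.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert common textual truth values; reject anything else."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser()
# type=bool would turn the string "0" into True, because "0" is non-empty.
parser.add_argument("--backfill-with-context", type=str_to_bool, default=True)

args = parser.parse_args(["--backfill-with-context=0"])
print(bool("0"), args.backfill_with_context)  # prints "True False"
```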

FR: Allow configuring the access token in a more secure way

As far as I can tell from the documentation, the --access-token flag is the only way the (required) access token can be passed to FediFetcher. Command lines are usually world readable on Linux systems (via /proc, unless hidepid is in use), and cron/systemd configurations are also often world readable.

Providing an alternative such as e.g. --access-token-file being a path to a file that contains the access token would allow deploying fedifetcher on a shared box without leaking tokens to all other residents.

Thank you!
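A minimal sketch of what reading the token from a file could look like, assuming a POSIX system. The `--access-token-file` flag is only the proposal from this request, and `read_token_file` is a hypothetical helper; neither is an existing FediFetcher feature as far as this page shows.

```python
import os
import stat


def read_token_file(path: str) -> str:
    """Read an access token from a file, refusing files other users can read."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{path} is readable by group or others; try chmod 600")
    with open(path, encoding="utf-8") as f:
        # Strip the trailing newline most editors add.
        return f.read().strip()
```

Because the token never appears on the command line, it would not show up in process listings or in a world-readable cron entry.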

Enhancement: add the ability to parse messages from Peertube, Kbin, and Lemmy

Currently, if a user is following an account on a service that is not a microblog (such as Peertube, Kbin, or Lemmy), the backfill request fails, typically with the error message "Error parsing toot URL". Checking the source code, it seems the application is hardcoded to support only URLs from Mastodon, Pleroma, and PixelFed. What is required to parse URLs from the other services listed above?

Running Directly via Cron Traceback on `KNOWN_FOLLOWINGS_FILE`

I'm running the get_context.py script directly via cron on my Mastodon server and the cron job is ending with the following error.

Traceback (most recent call last):
  File "/home/mastodon/get_context.py", line 668, in <module>
    with open(KNOWN_FOLLOWINGS_FILE, "w", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'artifacts/known_followings'

My cron job call is the following:

*/10 * * * * /home/mastodon/get_context.py <access-token> <domain-name> 24 20 10 <user>

From the cron output it appears that something is working.
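A `FileNotFoundError` from `open(..., "w")` usually means the parent directory is missing, since opening a file for writing creates the file but not its directories. With a relative path such as `artifacts/known_followings`, that depends on the working directory cron starts the script in. The sketch below shows one defensive pattern; `write_known_followings` and the one-entry-per-line format are assumptions, not the script's actual code.

```python
import os

KNOWN_FOLLOWINGS_FILE = "artifacts/known_followings"


def write_known_followings(path: str, followings: list) -> None:
    """Write one entry per line, creating the parent directory if needed."""
    # open(..., "w") creates the file but not missing parent directories.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(followings))
```

Alternatively, the cron entry could change into the script's directory before running it, so the relative path resolves as expected.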

Error can't find release

I get quite some errors like this one https://github.com/quicoto/mastodon-get-replies/actions/runs/4491658020/jobs/7900426746

Not sure if there's something we can do about it

Thank you

TIP when scheduling on Windows via Task Scheduler.

If you copy and rename the script to the .pyw extension instead of .py, you can set it up in Windows Task Scheduler and it will run silently, without opening a console window, unlike regular .py files. Windows will use PYTHONW.EXE instead of PYTHON.EXE.

Implement Dead Man Monitoring

Give an option to ping a url when script is complete, which can then integrate for example with https://healthchecks.io.

This can be configured with these options:

Environment Variable Name (if using GitHub Action)	Command line flag (if using cron, or the container)	Required?	Notes
`ON_START`	`--on-start`	No	Optionally provide a callback URL that will be pinged when processing is starting. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution.
`ON_DONE`	`--on-done`	No	Optionally provide a callback URL that will be called when processing is finished. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution.
`ON_FAIL`	`--on-fail`	No	Optionally provide a callback URL that will be called when processing has failed. A query parameter `rid={uuid}` will automatically be appended to uniquely identify each execution.
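As a sketch of how a `rid` query parameter could be appended to a callback URL without disturbing parameters that are already there: `with_run_id` is a hypothetical helper and `https://example.com/ping` is a placeholder URL, so this is an illustration rather than FediFetcher's implementation.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_run_id(callback_url: str, run_id: str) -> str:
    """Append rid=<run_id> while preserving any existing query parameters."""
    parts = urlsplit(callback_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("rid", run_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# One ID per execution, shared by the start, done, and fail pings.
run_id = str(uuid.uuid4())
print(with_run_id("https://example.com/ping?status=up", run_id))
```

Using the same `rid` for all pings of one execution lets a monitoring service pair each start with its matching done or fail.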

Parse error for "toots" from non-Mastodon fediverse applications

While running FediFetcher we see the following error when it tries to parse URLs which are not specific to Mastodon, e.g. PixelFed:

2023-03-29 18:58:56.773641 CEST: Error parsing toot URL https://pixelfed.de/p/kaffeeringe/545560350894759521

Running with docker image from latest tag from timestamp as logged above.

PS: I'm not sure if this tool is meant to be used for Mastodon-to-Mastodon communication specifically. If so please close this issue.
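To make the parsing problem concrete, here is a small sketch that recognises two URL shapes: the Mastodon form `https://server/@user/id` and the Pixelfed form `https://server/p/user/id` seen in the log line above. The patterns and the `parse_post_url` helper are assumptions for illustration and do not reflect the script's real parser.

```python
import re
from typing import Optional, Tuple

# Assumed URL shapes: Mastodon 'https://server/@user/id' and
# Pixelfed 'https://server/p/user/id'.
PATTERNS = [
    re.compile(r"^https://(?P<server>[^/]+)/@(?P<user>[^/]+)/(?P<id>[^/]+)$"),
    re.compile(r"^https://(?P<server>[^/]+)/p/(?P<user>[^/]+)/(?P<id>[^/]+)$"),
]


def parse_post_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (server, post_id), or None if no known pattern matches."""
    for pattern in PATTERNS:
        match = pattern.match(url)
        if match:
            return match.group("server"), match.group("id")
    return None


print(parse_post_url("https://pixelfed.de/p/kaffeeringe/545560350894759521"))
```

Supporting another kind of server then comes down to adding a pattern for its URL shape, plus whatever API calls that server needs.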

Add the ability to pull likes in

It’d be great to be able to see how many total likes a post has as well in addition to the comments. Not just what your instance has seen.

Process keeps failing

I thought deleting my fork and reforking would fix this.. but the process keeps failing. Here's what's happening right before it fails

2023-04-03 04:32:11.770488 UTC: Added 281 new context toots (with 2 failures)
2023-04-03 04:32:11.770497 UTC: Getting posts from last 80 followings
2023-04-03 04:32:11.917412 UTC: Job failed after 0:04:54.898183.
Error: Process completed with exit code 1.

Docker image name is invalid

The example Docker commands fail with the new name:

$ docker --version
Docker version 20.10.21, build v20.10.21
$ docker pull ghcr.io/nanos/FediFetcher:latest
invalid reference format: repository name must be lowercase

on-done URL not called?

Further to #46 I just had a number of successful runs from the GitHub Action but the ON_DONE URL wasn't called for any of them, I can see it in the command being issued but it seems there's no other mention of it in the logs

2023-06-01T21:05:02.4191495Z ##[group]Run python find_posts.py --lock-hours=0 --access-token=*** --server=social.racf.guru --reply-interval-in-hours=0 --home-timeline-length=200 --max-followings=80 --user= --max-followers=0  --http-timeout=5 --max-follow-requests=0 --on-fail="https://uptime.xxx.com/api/push/xxxxx?status=down&msg=Failed%20run" --on-start="" --on-done="https://uptime.xxx.com/api/push/xxxxx?status=up&msg=OK" --max-bookmarks=0 --remember-users-for-hours=168 --from-notifications=1 --backfill-with-context=1  --backfill-mentioned-users=1  --max-favourites=0

Support instances where LOCAL_DOMAIN≠WEB_DOMAIN.

At the moment, while we'll fetch context just fine, the backfilling of profiles won't work in these cases.

"NoMethodError: undefined method `[]' for nil:NilClass" Showing up in Queues

Since upgrading from 3.0.1 to 4.1.6, suddenly I'm seeing dead/retry jobs in my sidekiq queues of a strange nature that seem to coincide with scheduled runs of FediFetcher

In the ingress queue, I get jobs that show as:

ActivityPub::ProcessingWorker - <18 digit number here>, "null", nil, "Account" - NoMethodError: undefined method `[]' for nil:NilClass

No errors like this before updating the script and changing the arguments to the new style from the old style, not sure what would be causing it but since it started happening exactly as soon as I changed the script and no other changes have been made, I'd suppose it has to do with the script making a bad call?

Curious what's going on here... I see later releases today but nothing in the patch notes that seems to denote this problem?

panzner.net

Hi, as you mentioned on mastodon, bye bogge101

Make 5 second timeout a configuration option

See https://sea-mstdn.social/@patrick/110031235862511034

Fails on exit code 1: not supported between instances of 'str' and 'int'

Since yesterday I have started having an issue with it failing constantly with the error:

2023-06-18 02:24:18.123189 UTC: Job failed after 0:03:53.352363.
File "/home/runner/work/FediFetcher/FediFetcher/find_posts.py", line 885, in
if arguments.backfill_mentioned_users > 0:
TypeError: '>' not supported between instances of 'str' and 'int'
Error: Process completed with exit code 1.

Running as a GitHub action, running the most recent branch

logs from the last run: https://paste.lanofthedead.xyz/?bc97b25835463f1b#2SxgVUpk7hNZt2Nr6qhA5GbY4TPVcTYwYpWWU8phJDZ1

Please let me know if you need any more information
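The `TypeError` in the log is what Python raises when a string is compared with an integer, which suggests the option value reached the comparison as text. One possible remedy, sketched below under that assumption, is to let `argparse` convert the value with `type=int`. The cause is not confirmed by this report.

```python
import argparse

parser = argparse.ArgumentParser()
# Without type=int, a value given on the command line stays a string.
parser.add_argument("--backfill-mentioned-users", type=int, default=1)

args = parser.parse_args(["--backfill-mentioned-users=1"])
if args.backfill_mentioned_users > 0:  # int > int, so no TypeError
    print("backfilling mentioned users")
```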

Enhancement: Add CalcKey Instance Support

CalcKey's API documentation can be found at https://calckey.social/api-doc

For both pulling in posts, but also updating timelines, etc.

Error handling 404

Just ran this for the first time and it seems one of the URLs failed to load or returned 404, and the action failed.

https://github.com/quicoto/mastodon-get-replies/actions/runs/4353317910/jobs/7607196694

Can we catch these to prevent action failure?

Can we also backfill posts from follow requests?

any chance to upgrade that to include people that requested to follow me? i like to see if/what they post before i decide if i accept
https://uff.rip/@rauschen/110021234592495681

Last step action error

Hello,

I'm getting this error on the last step of the action.

Thank you

Possible attack vector

This is a great script, thanks for sharing it! I just have one concern, the workflow currently relies on releases from the parent repo. This creates a possible attack vector, if this repo is compromised, all forks may also be compromised. Ideally forked repo workflows would only reference code from within their own repository. The obvious advantage to the current pattern is getting updates for free.

Error while running fedifetcher: "Error getting user ID" on certain URLs and "KeyError: 'url'"

Hello!

Lately I have observed the following errors while running fedifetcher:

The URLs it complains about work for me in the browser.

2023-04-29 19:30:52.973901 CEST: Error parsing toot URL https://digitalcourage.video/videos/watch/80110bb9-2048-44cc-8a77-0f0b42da900e
2023-04-29 19:30:54.483262 CEST: Error parsing toot URL https://digitalcourage.video/videos/watch/d6a27ca1-9fde-4821-9b8a-ae25c8515609
2023-04-29 19:30:59.137947 CEST: Error getting context for toot None. Status code: 401
2023-04-29 19:32:11.349357 CEST: Error parsing toot URL https://anonsys.net/display/bf69967c-6664-495d-6e3a-a16077737418
2023-04-29 19:32:56.704193 CEST: Error getting user ID for user [email protected]: User digitalcourage.de was not found on server digitalcourage.video/accounts.
2023-04-29 19:32:56.858679 CEST: Error getting user ID for user [email protected]: User bba was not found on server digitalcourage.video/video-channels.
2023-04-29 19:32:56.995373 CEST: Error getting user ID for user [email protected]: User digitalcourage.de was not found on server digitalcourage.video/accounts.
2023-04-29 19:32:57.550275 CEST: Error getting user ID for user [email protected]: Error getting URL https://types.pl/api/v1/accounts/lookup?acct=amy. Status code: 401
2023-04-29 19:32:58.050662 CEST: Error getting user ID for user [email protected]: Expecting value: line 1 column 1 (char 0)
2023-04-29 19:32:58.311605 CEST: Error getting user ID for user [email protected]: Expecting value: line 1 column 1 (char 0)
2023-04-29 19:50:20.165598 CEST: Job failed after 0:02:20.695650.
Traceback (most recent call last):
  File "/app/find_posts.py", line 878, in <module>
    add_user_posts(arguments.server, token, filter_known_users(mentioned_users, all_known_users), recently_checked_users, all_known_users, seen_urls)
  File "/app/find_posts.py", line 75, in add_user_posts
    if post['reblog'] == None and post['url'] != None and post['url'] not in seen_urls:
                                  ~~~~^^^^^^^
KeyError: 'url'

Could you take a look?

Thanks as always for your work!
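The `KeyError: 'url'` at the end of the traceback means a post dictionary had no `url` key at all, which `post['url'] != None` cannot guard against. A hedged sketch of a more tolerant check follows; `is_new_original_post` is a hypothetical helper that mirrors the condition in the traceback, not the fix adopted by the project.

```python
def is_new_original_post(post: dict, seen_urls: set) -> bool:
    """True for non-reblog posts that have a URL we have not processed yet."""
    url = post.get("url")  # .get() returns None instead of raising KeyError
    return post.get("reblog") is None and url is not None and url not in seen_urls


seen = {"https://example.net/@alice/1"}
print(is_new_original_post({"reblog": None}, seen))  # prints "False"
```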

Favourites and Reblogs

As mentioned: https://elk.zone/coldrick.me.uk/@eddie/110273363273834774

Would be great to also fetch favourites and reblogs, if possible with the API :)

Prevent automatically disabling action

GitHub will suspend the scheduled trigger for GitHub action workflows if there is no commit in the repository for the past 60 days.

We should see if we can implement this action to keep it alive:

https://github.com/marketplace/actions/keepalive-workflow

Workflow failing with “IndexError: list index out of range”

It’s entirely possible I am doing something wrong, but I tried to follow the README instructions for setting this up. 😅

Log:

2023-03-15T03:31:10.8031708Z Requested labels: ubuntu-latest
2023-03-15T03:31:10.8031750Z Job defined at: cassidyjames/mastodon_get_replies/.github/workflows/get_context.yml@refs/heads/main
2023-03-15T03:31:10.8031785Z Waiting for a runner to pick up this job...
2023-03-15T03:31:10.9608952Z Job is waiting for a hosted runner to come online.
2023-03-15T03:31:15.7317194Z Job is about to start running on the hosted runner: GitHub Actions 2 (hosted)
2023-03-15T03:31:20.7149283Z Current runner version: '2.302.1'
2023-03-15T03:31:20.7180556Z ##[group]Operating System
2023-03-15T03:31:20.7181174Z Ubuntu
2023-03-15T03:31:20.7181627Z 22.04.2
2023-03-15T03:31:20.7181947Z LTS
2023-03-15T03:31:20.7182310Z ##[endgroup]
2023-03-15T03:31:20.7182621Z ##[group]Runner Image
2023-03-15T03:31:20.7183910Z Image: ubuntu-22.04
2023-03-15T03:31:20.7184308Z Version: 20230305.1
2023-03-15T03:31:20.7184890Z Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20230305.1/images/linux/Ubuntu2204-Readme.md
2023-03-15T03:31:20.7185561Z Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20230305.1
2023-03-15T03:31:20.7186085Z ##[endgroup]
2023-03-15T03:31:20.7186924Z ##[group]Runner Image Provisioner
2023-03-15T03:31:20.7187324Z 2.0.119.1
2023-03-15T03:31:20.7187608Z ##[endgroup]
2023-03-15T03:31:20.7188420Z ##[group]GITHUB_TOKEN Permissions
2023-03-15T03:31:20.7189138Z Contents: read
2023-03-15T03:31:20.7189505Z Metadata: read
2023-03-15T03:31:20.7190055Z Packages: read
2023-03-15T03:31:20.7190572Z ##[endgroup]
2023-03-15T03:31:20.7193565Z Secret source: Actions
2023-03-15T03:31:20.7194102Z Prepare workflow directory
2023-03-15T03:31:20.8210687Z Prepare all required actions
2023-03-15T03:31:20.8450399Z Getting action download info
2023-03-15T03:31:21.1364623Z Download action repository 'actions/setup-python@v4' (SHA:d27e3f3d7c64b4bbf8e4abfb9b63b83e846e0435)
2023-03-15T03:31:21.8307018Z Download action repository 'dawidd6/action-download-artifact@v2' (SHA:5e780fc7bbd0cac69fc73271ed86edf5dcb72d67)
2023-03-15T03:31:22.4156088Z Download action repository 'actions/upload-artifact@v3' (SHA:0b7f8abb1508181956e8e162db84b466c27e18ce)
2023-03-15T03:31:23.0276501Z Complete job name: run
2023-03-15T03:31:23.1399151Z ##[group]Run curl -s https://api.github.com/repos/nanos/mastodon_get_replies/releases/latest | jq .zipball_url | xargs wget -O download.zip
2023-03-15T03:31:23.1399934Z curl -s https://api.github.com/repos/nanos/mastodon_get_replies/releases/latest | jq .zipball_url | xargs wget -O download.zip
2023-03-15T03:31:23.1400373Z unzip -j download.zip
2023-03-15T03:31:23.1400642Z mkdir artifacts
2023-03-15T03:31:23.1400886Z ls -lR
2023-03-15T03:31:23.1463891Z shell: /usr/bin/bash -e {0}
2023-03-15T03:31:23.1464203Z ##[endgroup]
2023-03-15T03:31:23.4598532Z --2023-03-15 03:31:23--  https://api.github.com/repos/nanos/mastodon_get_replies/zipball/v3.0.1
2023-03-15T03:31:23.4612388Z Resolving api.github.com (api.github.com)... 192.30.255.116
2023-03-15T03:31:23.4804263Z Connecting to api.github.com (api.github.com)|192.30.255.116|:443... connected.
2023-03-15T03:31:23.6476701Z HTTP request sent, awaiting response... 302 Found
2023-03-15T03:31:23.6477597Z Location: https://codeload.github.com/nanos/mastodon_get_replies/legacy.zip/refs/tags/v3.0.1 [following]
2023-03-15T03:31:23.6478550Z --2023-03-15 03:31:23--  https://codeload.github.com/nanos/mastodon_get_replies/legacy.zip/refs/tags/v3.0.1
2023-03-15T03:31:23.6508746Z Resolving codeload.github.com (codeload.github.com)... 192.30.255.120
2023-03-15T03:31:23.6697172Z Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
2023-03-15T03:31:23.8523167Z HTTP request sent, awaiting response... 200 OK
2023-03-15T03:31:23.8525178Z Length: 10876 (11K) [application/zip]
2023-03-15T03:31:23.8530014Z Saving to: ‘download.zip’
2023-03-15T03:31:23.8531375Z 
2023-03-15T03:31:23.8545945Z      0K ..........                                            100% 27.7M=0s
2023-03-15T03:31:23.8549115Z 
2023-03-15T03:31:23.8549827Z 2023-03-15 03:31:23 (27.7 MB/s) - ‘download.zip’ saved [10876/10876]
2023-03-15T03:31:23.8550102Z 
2023-03-15T03:31:23.8582667Z Archive:  download.zip
2023-03-15T03:31:23.8583897Z 248542d7a6767113a30085147b22551a81774546
2023-03-15T03:31:23.8591014Z   inflating: get_context.yml         
2023-03-15T03:31:23.8592278Z  extracting: .gitignore              
2023-03-15T03:31:23.8592590Z   inflating: LICENSE.md              
2023-03-15T03:31:23.8593727Z   inflating: README.md               
2023-03-15T03:31:23.8595522Z  extracting: blank                   
2023-03-15T03:31:23.8604326Z   inflating: get_context.py          
2023-03-15T03:31:23.8604670Z   inflating: requirements.txt        
2023-03-15T03:31:23.8635901Z .:
2023-03-15T03:31:23.8636227Z total 64
2023-03-15T03:31:23.8636990Z -rw-r--r-- 1 runner docker  1091 Mar 14 10:23 LICENSE.md
2023-03-15T03:31:23.8637530Z -rw-r--r-- 1 runner docker  5490 Mar 14 10:23 README.md
2023-03-15T03:31:23.8638004Z drwxr-xr-x 2 runner docker  4096 Mar 15 03:31 artifacts
2023-03-15T03:31:23.8638514Z -rw-r--r-- 1 runner docker     0 Mar 14 10:23 blank
2023-03-15T03:31:23.8639023Z -rw-r--r-- 1 runner docker 10876 Mar 15 03:31 download.zip
2023-03-15T03:31:23.8639621Z -rw-r--r-- 1 runner docker 24973 Mar 14 10:23 get_context.py
2023-03-15T03:31:23.8640103Z -rw-r--r-- 1 runner docker  1364 Mar 14 10:23 get_context.yml
2023-03-15T03:31:23.8640647Z -rw-r--r-- 1 runner docker   116 Mar 14 10:23 requirements.txt
2023-03-15T03:31:23.8640913Z 
2023-03-15T03:31:23.8641050Z ./artifacts:
2023-03-15T03:31:23.8641399Z total 0
2023-03-15T03:31:23.9023886Z ##[group]Run actions/setup-python@v4
2023-03-15T03:31:23.9024456Z with:
2023-03-15T03:31:23.9024796Z   python-version: 3.10
2023-03-15T03:31:23.9025088Z   cache: pip
2023-03-15T03:31:23.9025428Z   check-latest: false
2023-03-15T03:31:23.9026034Z   token: ***
2023-03-15T03:31:23.9026671Z   update-environment: true
2023-03-15T03:31:23.9027273Z ##[endgroup]
2023-03-15T03:31:24.3052528Z ##[group]Installed versions
2023-03-15T03:31:24.3193034Z Successfully set up CPython (3.10.10)
2023-03-15T03:31:24.3194333Z ##[endgroup]
2023-03-15T03:31:24.3945599Z [command]/opt/hostedtoolcache/Python/3.10.10/x64/bin/pip cache dir
2023-03-15T03:31:24.9136481Z /home/runner/.cache/pip
2023-03-15T03:31:25.1816491Z pip cache is not found
2023-03-15T03:31:25.2154450Z ##[group]Run pip install -r requirements.txt
2023-03-15T03:31:25.2154876Z pip install -r requirements.txt
2023-03-15T03:31:25.2215357Z shell: /usr/bin/bash -e {0}
2023-03-15T03:31:25.2215632Z env:
2023-03-15T03:31:25.2215946Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:25.2216351Z   PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib/pkgconfig
2023-03-15T03:31:25.2216740Z   Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:25.2217093Z   Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:25.2217454Z   Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:25.2217824Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib
2023-03-15T03:31:25.2219383Z ##[endgroup]
2023-03-15T03:31:26.1950731Z Collecting certifi==2022.12.7
2023-03-15T03:31:26.2949678Z   Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
2023-03-15T03:31:26.3499118Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 2.9 MB/s eta 0:00:00
2023-03-15T03:31:26.5147440Z Collecting charset-normalizer==3.0.1
2023-03-15T03:31:26.5249831Z   Downloading charset_normalizer-3.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (198 kB)
2023-03-15T03:31:26.5388992Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 198.8/198.8 kB 18.1 MB/s eta 0:00:00
2023-03-15T03:31:26.6050751Z Collecting docutils==0.19
2023-03-15T03:31:26.6171138Z   Downloading docutils-0.19-py3-none-any.whl (570 kB)
2023-03-15T03:31:26.6339955Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 570.5/570.5 kB 42.2 MB/s eta 0:00:00
2023-03-15T03:31:26.6880780Z Collecting idna==3.4
2023-03-15T03:31:26.6930875Z   Downloading idna-3.4-py3-none-any.whl (61 kB)
2023-03-15T03:31:26.6995144Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 kB 15.4 MB/s eta 0:00:00
2023-03-15T03:31:26.7766629Z Collecting requests==2.28.2
2023-03-15T03:31:26.7812699Z   Downloading requests-2.28.2-py3-none-any.whl (62 kB)
2023-03-15T03:31:26.7887157Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 14.3 MB/s eta 0:00:00
2023-03-15T03:31:26.8177366Z Collecting six==1.16.0
2023-03-15T03:31:26.8227023Z   Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
2023-03-15T03:31:26.8921064Z Collecting urllib3==1.26.14
2023-03-15T03:31:26.9029935Z   Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
2023-03-15T03:31:26.9104852Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 kB 31.5 MB/s eta 0:00:00
2023-03-15T03:31:27.0687738Z Installing collected packages: charset-normalizer, urllib3, six, idna, docutils, certifi, requests
2023-03-15T03:31:27.8294022Z Successfully installed certifi-2022.12.7 charset-normalizer-3.0.1 docutils-0.19 idna-3.4 requests-2.28.2 six-1.16.0 urllib3-1.26.14
2023-03-15T03:31:28.2025134Z ##[group]Run dawidd6/action-download-artifact@v2
2023-03-15T03:31:28.2025470Z with:
2023-03-15T03:31:28.2025675Z   name: artifacts
2023-03-15T03:31:28.2025922Z   workflow: get_context.yml
2023-03-15T03:31:28.2026427Z   if_no_artifact_found: warn
2023-03-15T03:31:28.2026684Z   path: artifacts
2023-03-15T03:31:28.2027140Z   github_token: ***
2023-03-15T03:31:28.2027383Z   workflow_conclusion: success
2023-03-15T03:31:28.2027693Z   repo: cassidyjames/mastodon_get_replies
2023-03-15T03:31:28.2027977Z   check_artifacts: false
2023-03-15T03:31:28.2028232Z   search_artifacts: false
2023-03-15T03:31:28.2028466Z   skip_unpack: false
2023-03-15T03:31:28.2028713Z env:
2023-03-15T03:31:28.2029005Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.2029394Z   PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib/pkgconfig
2023-03-15T03:31:28.2029763Z   Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.2030154Z   Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.2030512Z   Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.2030862Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib
2023-03-15T03:31:28.2031157Z ##[endgroup]
2023-03-15T03:31:28.3654497Z ==> Repository: cassidyjames/mastodon_get_replies
2023-03-15T03:31:28.3661844Z ==> Artifact name: artifacts
2023-03-15T03:31:28.3662489Z ==> Local path: artifacts
2023-03-15T03:31:28.3663201Z ==> Workflow name: get_context.yml
2023-03-15T03:31:28.3663525Z ==> Workflow conclusion: success
2023-03-15T03:31:28.8778139Z ##[warning]no matching workflow run found with any artifacts?
2023-03-15T03:31:28.8823392Z ##[group]Run ls -lR
2023-03-15T03:31:28.8823701Z ls -lR
2023-03-15T03:31:28.8886041Z shell: /usr/bin/bash -e {0}
2023-03-15T03:31:28.8886289Z env:
2023-03-15T03:31:28.8886596Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.8887002Z   PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib/pkgconfig
2023-03-15T03:31:28.8887394Z   Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.8887745Z   Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.8888114Z   Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.8888481Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib
2023-03-15T03:31:28.8888779Z ##[endgroup]
2023-03-15T03:31:28.8990268Z .:
2023-03-15T03:31:28.8990845Z total 64
2023-03-15T03:31:28.8991818Z -rw-r--r-- 1 runner docker  1091 Mar 14 10:23 LICENSE.md
2023-03-15T03:31:28.8992297Z -rw-r--r-- 1 runner docker  5490 Mar 14 10:23 README.md
2023-03-15T03:31:28.8992807Z drwxr-xr-x 2 runner docker  4096 Mar 15 03:31 artifacts
2023-03-15T03:31:28.8993268Z -rw-r--r-- 1 runner docker     0 Mar 14 10:23 blank
2023-03-15T03:31:28.8993707Z -rw-r--r-- 1 runner docker 10876 Mar 15 03:31 download.zip
2023-03-15T03:31:28.8994154Z -rw-r--r-- 1 runner docker 24973 Mar 14 10:23 get_context.py
2023-03-15T03:31:28.8994623Z -rw-r--r-- 1 runner docker  1364 Mar 14 10:23 get_context.yml
2023-03-15T03:31:28.8995069Z -rw-r--r-- 1 runner docker   116 Mar 14 10:23 requirements.txt
2023-03-15T03:31:28.8995289Z 
2023-03-15T03:31:28.8995388Z ./artifacts:
2023-03-15T03:31:28.8995614Z total 0
2023-03-15T03:31:28.9094923Z ##[group]Run python get_context.py *** mastodon.blaede.family 48 256 256 cassidy 256
2023-03-15T03:31:28.9095586Z python get_context.py *** mastodon.blaede.family 48 256 256 cassidy 256
2023-03-15T03:31:28.9154983Z shell: /usr/bin/bash -e {0}
2023-03-15T03:31:28.9155236Z env:
2023-03-15T03:31:28.9155541Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.9155923Z   PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib/pkgconfig
2023-03-15T03:31:28.9156312Z   Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.9156671Z   Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.9157035Z   Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.10/x64
2023-03-15T03:31:28.9157400Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.10/x64/lib
2023-03-15T03:31:28.9157682Z ##[endgroup]
2023-03-15T03:32:05.5047727Z Traceback (most recent call last):
2023-03-15T03:32:05.5048336Z Getting last 48 hrs of replies, and latest 256 posts in home timeline from mastodon.blaede.family
2023-03-15T03:32:05.5048718Z Found active user: cassidy
2023-03-15T03:32:05.5049088Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024998531138529
2023-03-15T03:32:05.5049546Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024963888568854
2023-03-15T03:32:05.5049977Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024940118314942
2023-03-15T03:32:05.5050404Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024931491621275
2023-03-15T03:32:05.5050813Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024607291708389
2023-03-15T03:32:05.5051236Z Found reply toot: https://mastodon.blaede.family/@cassidy/110024102676236442
2023-03-15T03:32:05.5051666Z Found reply toot: https://mastodon.blaede.family/@cassidy/110023999736356730
2023-03-15T03:32:05.5052092Z Found reply toot: https://mastodon.blaede.family/@cassidy/110023835643309426
2023-03-15T03:32:05.5052518Z Found reply toot: https://mastodon.blaede.family/@cassidy/110023047132592424
2023-03-15T03:32:05.5052925Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022804984013696
2023-03-15T03:32:05.5053363Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022776859257298
2023-03-15T03:32:05.5053788Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022758367052001
2023-03-15T03:32:05.5054204Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022753304958284
2023-03-15T03:32:05.5054606Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022750620366130
2023-03-15T03:32:05.5055024Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022280550682771
2023-03-15T03:32:05.5055455Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022228850343725
2023-03-15T03:32:05.5055879Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022210962430763
2023-03-15T03:32:05.5056299Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022173643432269
2023-03-15T03:32:05.5056712Z Found reply toot: https://mastodon.blaede.family/@cassidy/110022169488045032
2023-03-15T03:32:05.5057137Z Found reply toot: https://mastodon.blaede.family/@cassidy/110019697080949493
2023-03-15T03:32:05.5057563Z Found reply toot: https://mastodon.blaede.family/@cassidy/110017935213871779
2023-03-15T03:32:05.5057984Z Found reply toot: https://mastodon.blaede.family/@cassidy/110017501952428673
2023-03-15T03:32:05.5058390Z Found reply toot: https://mastodon.blaede.family/@cassidy/110017431196196315
2023-03-15T03:32:05.5058712Z Found 23 reply toots
2023-03-15T03:32:05.5059064Z Got context for toot https://mastodon.blaede.family/@cassidy/110024998531138529
2023-03-15T03:32:05.5059506Z Got context for toot https://mastodon.blaede.family/@cassidy/110024963888568854
2023-03-15T03:32:05.5059941Z Got context for toot https://mastodon.blaede.family/@cassidy/110024940118314942
2023-03-15T03:32:05.5060364Z Got context for toot https://mastodon.blaede.family/@cassidy/110024931491621275
2023-03-15T03:32:05.5061267Z Got context for toot https://mastodon.blaede.family/@cassidy/110024607291708389
2023-03-15T03:32:05.5061741Z Got context for toot https://mastodon.blaede.family/@cassidy/110024102676236442
2023-03-15T03:32:05.5062324Z Got context for toot https://mastodon.blaede.family/@cassidy/110023999736356730
2023-03-15T03:32:05.5075484Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 689, in <module>
2023-03-15T03:32:05.5076022Z Got context for toot https://mastodon.blaede.family/@cassidy/110023835643309426
2023-03-15T03:32:05.5076414Z     pull_context(
2023-03-15T03:32:05.5076927Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 40, in pull_context
2023-03-15T03:32:05.5077380Z     add_context_urls(server, access_token, context_urls, seen_urls)
2023-03-15T03:32:05.5077864Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 532, in add_context_urls
2023-03-15T03:32:05.5078251Z     for url in context_urls:
2023-03-15T03:32:05.5078675Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 492, in <genexpr>
2023-03-15T03:32:05.5079076Z     itertools.chain.from_iterable(
2023-03-15T03:32:05.5079507Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 355, in <genexpr>
2023-03-15T03:32:05.5079992Z     get_replied_toot_server_id(server, toot, replied_toot_server_ids, parsed_urls)
2023-03-15T03:32:05.5080914Z Got context for toot https://mastodon.blaede.family/@cassidy/110023047132592424
2023-03-15T03:32:05.5081484Z Got context for toot https://mastodon.blaede.family/@cassidy/110022804984013696
2023-03-15T03:32:05.5082020Z Got context for toot https://mastodon.blaede.family/@cassidy/110022776859257298
2023-03-15T03:32:05.5082559Z Got context for toot https://mastodon.blaede.family/@cassidy/110022758367052001
2023-03-15T03:32:05.5083084Z Got context for toot https://mastodon.blaede.family/@cassidy/110022753304958284
2023-03-15T03:32:05.5083596Z Got context for toot https://mastodon.blaede.family/@cassidy/110022750620366130
2023-03-15T03:32:05.5084123Z Got context for toot https://mastodon.blaede.family/@cassidy/110022280550682771
2023-03-15T03:32:05.5084676Z Got context for toot https://mastodon.blaede.family/@cassidy/110022228850343725
2023-03-15T03:32:05.5085192Z Got context for toot https://mastodon.blaede.family/@cassidy/110022210962430763
2023-03-15T03:32:05.5085710Z Got context for toot https://mastodon.blaede.family/@cassidy/110022173643432269
2023-03-15T03:32:05.5086232Z Got context for toot https://mastodon.blaede.family/@cassidy/110022169488045032
2023-03-15T03:32:05.5087565Z Got context for toot https://mastodon.blaede.family/@cassidy/110019697080949493
2023-03-15T03:32:05.5088014Z Got context for toot https://mastodon.blaede.family/@cassidy/110017935213871779
2023-03-15T03:32:05.5088444Z Got context for toot https://mastodon.blaede.family/@cassidy/110017501952428673
2023-03-15T03:32:05.5089004Z Got context for toot https://mastodon.blaede.family/@cassidy/110017431196196315
2023-03-15T03:32:05.5089351Z Found 31 known context toots
2023-03-15T03:32:05.5089754Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110023703499781145
2023-03-15T03:32:05.5090222Z Got context for toot https://floss.social/@FineFindus/110023703499856524
2023-03-15T03:32:05.5090619Z Added context url https://floss.social/@sonny/110023814015856172
2023-03-15T03:32:05.5091098Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110024934764755193
2023-03-15T03:32:05.5091696Z Got context for toot https://social.opendesktop.org/@justinz/110024934654688230
2023-03-15T03:32:05.5092205Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110024644449119425
2023-03-15T03:32:05.5092687Z Got context for toot https://social.opendesktop.org/@justinz/110024644391728230
2023-03-15T03:32:05.5093445Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110024455190650671
2023-03-15T03:32:05.5093953Z Got context for toot https://fosstodon.org/@communiteatime/110024454995085213
2023-03-15T03:32:05.5094537Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110024065409345789
2023-03-15T03:32:05.5095079Z Error getting context for toot https://goto.vsta.org/@vandys/statuses/01GVH4YDKFE9AFRZ6WRXS0QK5P. Status code: 401
2023-03-15T03:32:05.5095619Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110023757816286497
2023-03-15T03:32:05.5096096Z Got context for toot https://fosstodon.org/@Joseph_of_Earth/110023757826966194
2023-03-15T03:32:05.5096569Z Discovered redirect for URL https://mastodon.blaede.family/@[email protected]/110023008179073203
2023-03-15T03:32:05.5097019Z Got context for toot https://hachyderm.io/@schlink/110023008163576219
2023-03-15T03:32:05.5097400Z Added context url https://hachyderm.io/@schlink/110023085200502690
2023-03-15T03:32:05.5097791Z Added context url https://hachyderm.io/@schlink/110023093857209669
2023-03-15T03:32:05.5098200Z Added context url https://octodon.social/@alienghic/110023013757536494
2023-03-15T03:32:05.5098599Z Added context url https://hachyderm.io/@schlink/110023018822456395
2023-03-15T03:32:05.5098993Z Added context url https://octodon.social/@alienghic/110023054065087700
2023-03-15T03:32:05.5099375Z Added context url https://hachyderm.io/@schlink/110023059265039386
2023-03-15T03:32:05.5099763Z Added context url https://octodon.social/@alienghic/110023062967678202
2023-03-15T03:32:05.5100172Z Added context url https://social.binarydad.com/@ryan/110023021042427958
2023-03-15T03:32:05.5100569Z Added context url https://floss.social/@downey/110023097070802026
2023-03-15T03:32:05.5100936Z Added context url https://hachyderm.io/@schlink/110023101998553219
2023-03-15T03:32:05.5102159Z Added context url https://floss.social/@downey/110023111380778319
2023-03-15T03:32:05.5102566Z Added context url https://mstdn.social/@MrBigEars/110024076748410502
2023-03-15T03:32:05.5102945Z Added context url https://vmst.io/@djwfyi/110024813738792924
2023-03-15T03:32:05.5103312Z Added context url https://fosstodon.org/@atoponce/110024853529562195
2023-03-15T03:32:05.5103708Z Added context url https://hachyderm.io/@schlink/110024854932015522
2023-03-15T03:32:05.5107489Z   File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/get_context.py", line 369, in get_replied_toot_server_id
2023-03-15T03:32:05.5110865Z     mention = [
2023-03-15T03:32:05.5112463Z IndexError: list index out of range
2023-03-15T03:32:05.5400723Z ##[error]Process completed with exit code 1.
2023-03-15T03:32:05.5584291Z Cleaning up orphan processes
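
The traceback ends in a list assigned to `mention`, which suggests that the first element of a filtered list is being taken while the list is empty. That is an assumption, since the full statement is not shown in the log. A pattern that tolerates an empty result is `next()` with a default. The helper name and the `id` field below are hypothetical:

```python
def find_mention(mentions, account_id):
    """Return the first mention whose 'id' matches, or None if none matches.

    Hypothetical helper: next() with a default avoids the IndexError that
    `[m for m in mentions if ...][0]` raises when nothing matches.
    """
    return next((m for m in mentions if m.get('id') == account_id), None)


# Made-up sample data for illustration.
mentions = [{'id': '1', 'acct': 'alice'}, {'id': '2', 'acct': 'bob'}]
found = find_mention(mentions, '2')
missing = find_mention(mentions, '3')
```

A caller can then skip the toot when the result is `None` instead of aborting the whole run.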

Error getting timeline toots: 'next'

I get the error `Error getting timeline toots: 'next'` many, many times for my instance, https://social.harding.dev.
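
The bare `'next'` in the message looks like the string form of a `KeyError`, which is what Python raises when a pagination dictionary is indexed by `'next'` and the server returned no further page. That reading is an assumption, as the relevant code is not quoted here. A minimal sketch of a tolerant lookup, where `links` has the shape of the `links` attribute of a `requests.Response` and `next_page_url` is a hypothetical helper:

```python
def next_page_url(links):
    """Return the URL of the next page, or None when there is no next page.

    `links` is expected to look like requests' Response.links, e.g.
    {'next': {'url': 'https://...', 'rel': 'next'}}.
    """
    return links.get('next', {}).get('url')


# Made-up examples: one response with a next page, one without.
with_next = next_page_url({'next': {'url': 'https://example.social/page2', 'rel': 'next'}})
without_next = next_page_url({})
```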

Convert positional parameters into named ones

Not entirely sure how this works with Python, but I hate that we now have so many configuration options that must be supplied as positional parameters to the script.

I want to use named parameters, so that we can omit unwanted parameters more easily.

But we need to ensure backwards compatibility.
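
One way this could work with `argparse` is to add named options and keep a catch-all positional for the old call style. This is a sketch, not the implemented solution: the option names are taken from command lines quoted elsewhere on this page, the positional order (token, server, reply interval) is assumed from a logged invocation, and only three options are shown.

```python
import argparse

# Order of the old positional values; assumed from a logged command line.
LEGACY_ORDER = ['access_token', 'server', 'reply_interval_in_hours']
DEFAULTS = {'reply_interval_in_hours': 0}


def parse_arguments(argv):
    parser = argparse.ArgumentParser()
    # New style: named options, all optional at the parser level.
    parser.add_argument('--access-token')
    parser.add_argument('--server')
    parser.add_argument('--reply-interval-in-hours', type=int)
    # Old style: any bare values are collected here for backwards compatibility.
    parser.add_argument('legacy', nargs='*', help='deprecated positional values')
    args = parser.parse_args(argv)

    # A named option wins; otherwise fall back to the positional value.
    for name, value in zip(LEGACY_ORDER, args.legacy):
        if getattr(args, name) is None:
            setattr(args, name, value)
    # Finally apply defaults for anything still unset.
    for name, value in DEFAULTS.items():
        if getattr(args, name) is None:
            setattr(args, name, value)
    args.reply_interval_in_hours = int(args.reply_interval_in_hours)
    return args


old_style = parse_arguments(['TOKEN', 'example.social', '48'])
new_style = parse_arguments(['--server=example.social', '--access-token=TOKEN'])
```

Both call styles end up in the same namespace, so the rest of the script would not need to know which one was used.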

[question] fetching replies for all users on instance?

I read all the instructions and your blog page about it, but I am not sure I understood its functionality correctly, I'm afraid.

I am the admin of a small Mastodon instance (10 users) and I want to use your script to fetch replies for all my users. Is this possible or am I fetching only person A's replies when using person A's access token? And if so, is there a way to do this for my whole instance?

Thank you in advance!

How to proceed with update that will cause FediFetcher to be very slow for a few executions?

I'm currently working on a new feature for FediFetcher which will backfill the profiles of any users mentioned in posts on your home timeline. If your home timeline contains posts by accounts you don't follow (e.g. because you are following hashtags), these accounts will also be backfilled.

Due to the large number of posts involved, this can take a lot of time for the first execution. Because FediFetcher caches profiles that it has backfilled, it'll then catch up relatively quickly in subsequent executions (as long as you don't leave overly long gaps between executions), and once it had caught up, I noticed no significant speed difference.

In my testing, when fetching context for the last 400 posts in my home timeline, these were the execution times: 1st run after the update: 4.5 hours. 2nd run: 1.5 hours. 3rd run: 45 min. 4th and subsequent runs: < 5 min.

Obviously your mileage will vary, depending particularly on how many accounts you don't follow appear in your timeline.

Now my question: How should I deal with this update, particularly for users of GitHub Actions, who'll just get the update without any action on their side?

I could either make this new behaviour optional, and disable it on the first run, or I could just enable it for everyone. But for users of GitHub Actions, that would mean that the first run after release would suddenly take several hours, with no obvious reason why.

My preferred option is to just ship it, and let every admin deal with it. I don't really want to make this optional, let alone turn it off by default, because I think it's super helpful.

But I'm also conscious that I don't want to alienate admins who might trust me with this script.

So, particularly if you are using FediFetcher as a GitHub Action, I'd love to hear from you: Should this be enabled by default, or not?

Implement basic locking to prevent overlapping runs

If running the script locally (using cron or another mechanism) there is no overlap protection.

I'm implementing a file-based locking mechanism for this, with a configurable --lock-hours parameter:

If the --lock-hours parameter is provided, the lock file will be discarded if it's older than the number of hours provided. This offers protection against crashes in the script.
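
The behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the shipped implementation: `acquire_lock` and `release_lock` are hypothetical names, and the lock age is taken from the file's modification time.

```python
import os
import time


def acquire_lock(lock_file, lock_hours=None, now=None):
    """Try to take the lock; return True on success, False if it is held."""
    now = time.time() if now is None else now
    if os.path.exists(lock_file):
        age_hours = (now - os.path.getmtime(lock_file)) / 3600
        if lock_hours is None or age_hours < lock_hours:
            return False  # lock is still considered valid
        os.remove(lock_file)  # stale lock, e.g. left behind by a crash
    try:
        # O_EXCL makes creation fail if another process got there first.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(lock_file):
    """Remove the lock file if it exists."""
    if os.path.exists(lock_file):
        os.remove(lock_file)
```

A run would call `acquire_lock` at startup, exit early if it returns `False`, and call `release_lock` in a `finally` block so that a normal failure does not leave the lock behind.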

Can we backfill profiles of people we follow?

See https://holme.social/@Josh/110006268889083737

Add the ability to fetch data from lists

Use the /api/v1/lists/:id/accounts endpoint to get the accounts included in a given list (this requires the read:lists permission)
Backfill the accounts included in these lists
Expose the option, either as from-lists or max-lists in the configuration

Something changed and now takes ages to finish

Something changed about an hour ago, and now the action runs for more than an hour without finishing. I cancelled the run and tried to trigger it again, but it keeps fetching without end.

https://github.com/quicoto/mastodon-get-replies/actions

https://github.com/quicoto/mastodon-get-replies/actions/runs/4573577685

GitHub action fails with unrecognized arguments

Forked repository as of c587e09 fails with the following error message:

`cron` schedule variable

Would you consider adding an environment variable for the cron schedule?

It'd be nice to have the option to run the workflow every 15 or 20 minutes instead of every 10.

[request] fetch new posts with recently followed tags from remote instances

Running a single user instance, I still don't see a lot of posts for hashtags—established and popular ones like #DogsOfMastodon and current events are useless.

I would like to add an option to fetch posts for recently followed tags, perhaps from a specified remote instance (e.g. mastodon.social).

Judging by the tags docs, there doesn't seem to be a great way to identify "newly followed tags" other than, perhaps, by a tag's ID.

The established parameters of number of posts, timeframe et cetera would be useful here as well.

In a way this is a bit like a relay I suppose, but much more specific.

How to use on-done, on-start, on-fail with secret URLs?

I'm using the GitHub Action and would like to use on-done etc. but I don't want the URLs to be visible in the config.json file. Is there any way of achieving this?

Running container with docker compose does not accept configuration from environment file

Putting command line options in as environment variables and launching using docker-compose always returns an error:

You must supply at least a server name and an access token

Docker documentation seems to say that the two methods should be identical. Is FediFetcher reading from environment variables?
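
Whether FediFetcher reads environment variables is not stated here. For reference, a script that wants to accept both forms can use environment values as `argparse` defaults. The sketch below is hypothetical: the mapping from `--home-timeline-length` to `HOME_TIMELINE_LENGTH` mirrors the variable naming used for the GitHub Action, but `build_parser` and the choice of options are assumptions.

```python
import argparse
import os


def build_parser(environ=os.environ):
    """Build a parser whose option defaults come from environment variables."""
    parser = argparse.ArgumentParser()
    for option in ('server', 'access-token', 'home-timeline-length'):
        # e.g. 'home-timeline-length' -> 'HOME_TIMELINE_LENGTH'
        env_name = option.replace('-', '_').upper()
        parser.add_argument('--' + option, default=environ.get(env_name))
    return parser


# Made-up environment: the server comes from the environment,
# the token from the command line, and the third option is unset.
args = build_parser({'SERVER': 'example.social'}).parse_args(['--access-token=TOKEN'])
```

With this pattern an explicit command-line option overrides the environment, and an option set in neither place stays `None`, so the script can still report that a server name and an access token are required.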

Exit code 127 with on-fail specified

I added ON_FAIL and ON_DONE environment variables to my Actions, but now I'm getting a "command not found" error. It seems that on-start also needs to be specified? The error also points at max-bookmarks (note that the URLs have been redacted):

Run python find_posts.py --lock-hours=0 --access-token=*** --server=social.racf.guru --reply-interval-in-hours=0 --home-timeline-length=200 --max-followings=80 --user= --max-followers=0  --http-timeout=5 --max-follow-requests=0 --on-fail=https://uptime.zzz/api/push/xxxxxx?status=down&msg=Failed%20Run&ping= --on-start= --on-done=https://uptime.zzz/api/push/xxxxxx?status=ok&msg=OK&ping= --max-bookmarks=0 --remember-users-for-hours=168 --from-notifications=1 --backfill-with-context=1  --backfill-mentioned-users=1  --max-favourites=0
/home/runner/work/_temp/23caf967-1fa7-49dd-965b-7234c27c4927.sh: line 1: --on-start=: command not found
/home/runner/work/_temp/23caf967-1fa7-49dd-965b-7234c27c4927.sh: line 1: --max-bookmarks=0: command not found
Error: Process completed with exit code 127.
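
The two `command not found` lines are consistent with the shell treating each unquoted `&` in the URLs as a command separator. Everything after `&ping=` is then run as a separate command, so the shell tries to execute `--on-start=` and `--max-bookmarks=0`. If the command line is assembled by a script, one way to avoid this is to quote each value with `shlex.quote`. The sketch below assumes that setup and reuses the redacted URL from the log:

```python
import shlex

on_fail = 'https://uptime.zzz/api/push/xxxxxx?status=down&msg=Failed%20Run&ping='

# Quote the value so the shell passes it through as one argument.
argument = '--on-fail=' + shlex.quote(on_fail)

# shlex.split mimics how a POSIX shell would split the quoted text again.
round_trip = shlex.split(argument)
```

In a workflow file, wrapping the value in quotes on the `run:` line should have the same effect.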

Migrated to config.json but fails

Hello,

I just migrated to the new JSON configuration.

My config file is here: https://github.com/quicoto/mastodon-get-replies/blob/main/config.json

The action fails, saying that config.json does not exist: https://github.com/quicoto/mastodon-get-replies/actions/runs/5277386376/jobs/9545361582

Thank you

New runs failing since merge of 886a0ce

I am seeing this error:

File "/home/runner/work/mastodon_get_replies/mastodon_get_replies/find_posts.py", line 756, in
arguments.lock_file = os.path.join(arguments.state, 'lock.lock')
AttributeError: 'Namespace' object has no attribute 'state'

Can I make it do less?

The GitHub actions take too long and they get cancelled over and over https://github.com/quicoto/mastodon-get-replies/actions

I've set HOME_TIMELINE_LENGTH to 10 but that doesn't help either.

Is there a config I can use to fetch less stuff? At the beginning it used to work just fine. Can I skip profile fetching? I'm mainly interested in threads.

Thank you

Extremely large gaps during rate limit timeout

When the API rate limit is reached for any given toot, the script waits for 6 hours before retrying when run locally. Is that normal? Seems excessive.
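
For context, Mastodon reports the end of the current rate-limit window in the `X-RateLimit-Reset` response header as an ISO 8601 timestamp. A wait time can be derived from it and capped, as in the sketch below. The function name and the `max_wait` cap are hypothetical, the header is assumed to carry a timezone, and this is not a description of what FediFetcher does.

```python
from datetime import datetime, timezone


def seconds_until_reset(reset_header, now=None, max_wait=300):
    """Return how long to sleep before retrying, capped at max_wait seconds."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() rejects a trailing 'Z' before Python 3.11, so normalise it.
    reset = datetime.fromisoformat(reset_header.replace('Z', '+00:00'))
    wait = (reset - now).total_seconds()
    return max(0.0, min(wait, max_wait))


# Made-up timestamps for illustration.
now = datetime(2023, 3, 15, 3, 0, 0, tzinfo=timezone.utc)
short_wait = seconds_until_reset('2023-03-15T03:05:00.000000Z', now=now, max_wait=600)
capped_wait = seconds_until_reset('2023-03-15T09:00:00.000000Z', now=now, max_wait=300)
```

A reset timestamp that is parsed in the wrong timezone would produce a wait that is off by whole hours, which is one thing worth checking when a pause looks excessive.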

Parsing error

I got another build failure when parsing a URL.

Not sure if we can catch these as well; following up on #1.

https://github.com/quicoto/mastodon-get-replies/actions/runs/4362831948/jobs/7628194627

Thank you
