Reddit archiving scripts
- A recent Python version whose bundled SQLite is v3.37.0 or newer. I have Python 3.11.3 and it works just fine.
- gallery-dl
- praw (pip install praw)
- Disable 2FA on your Reddit account
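If you're not sure which SQLite your Python ships with, something like the following should tell you. This is only a sketch: it assumes v3.37.0 is meant as a minimum, and `sqlite_is_recent_enough` is a name I made up for illustration, not part of these scripts.

```python
import sqlite3
import sys

# Minimum SQLite version named in the requirements (assumed to be a lower bound).
MIN_SQLITE = (3, 37, 0)


def sqlite_is_recent_enough(version_info=None, minimum=MIN_SQLITE):
    """Return True if the given (or linked-in) SQLite version meets the minimum."""
    if version_info is None:
        # Version of the SQLite library this Python interpreter is linked against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(minimum)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("SQLite:", sqlite3.sqlite_version)
    print("OK" if sqlite_is_recent_enough() else "SQLite is older than 3.37.0")
```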
Instructions are from @manojkarthick/reddsaver
- Create a new script application at https://www.reddit.com/prefs/apps
- Click on create an app at the bottom of the page
- Input a name for your application, for example: <username>-reddsaver
- Choose "script" as the type of application
- Set "http://localhost:8080" or any other URL as the redirect URL
- Click on "create app" - you should now see the application has been created
- Under your application name, you should see a random string - that is your client ID
- The random string next to the field "secret" is your client secret
- Copy the client ID and client secret information returned
_secrets.py is used by _config.py. Here's an example:
CLIENT_ID = "GJPeNG1ngA3"
CLIENT_SECRET = "sdfJsdfsd_dsfo_sdflksjf02"
USERNAME = "username"
PASSWORD = "password"
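For reference, these four values are what praw needs to log in as a "script" application, together with a user agent string. Below is a minimal sketch of that login; `reddit_kwargs` and `make_reddit` are hypothetical helpers for illustration, and the archiver's own code may be organised differently.

```python
def reddit_kwargs(client_id, client_secret, username, password, user_agent):
    """Collect the keyword arguments praw.Reddit takes for a script app."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "user_agent": user_agent,
    }


def make_reddit(**kwargs):
    """Create a praw.Reddit instance (requires praw to be installed)."""
    import praw  # imported lazily so reddit_kwargs works without praw

    return praw.Reddit(**kwargs)


# Example usage (performs a network request, so it is left commented out):
#
#   from _config import *
#   reddit = make_reddit(**reddit_kwargs(
#       CLIENT_ID, CLIENT_SECRET, USERNAME, PASSWORD, USER_AGENT))
#   print(reddit.user.me())  # should print your username if login worked
```

If the last line raises an authentication error, the client ID, secret, or password is the first thing to double-check.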
The default _config.py looks like this:
from _secrets import *
SN0_DB_PATH = "sn0-archive.sqlite3"
MEDIA_PATH = "sn0-media"
USER_AGENT = "Python:sn0-archiver:v1.0.0 (by /u/coalfa)"
GALLERY_DL_COMMAND = "gallery-dl"
ICON_IMG_THREADS = 8
SUBMISSION_MEDIA_THREADS = 2
# number of MoreComments to unpack. to archive all comments, set to None
MORECOMMENTS_LIMIT = 32
# number of comments per post to archive, set to None for no limit
COMMENT_LIMIT = None
With MORECOMMENTS_LIMIT and COMMENT_LIMIT set to None, sn0-archiver will archive all comments. This, however, is quite time-consuming and I don't recommend it.
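To give a rough idea of what the two limits control, here is a sketch of how they could map onto praw's comment API. It is an assumption about the mechanism, not the archiver's actual code, and `collect_comments` is a hypothetical name.

```python
def collect_comments(submission, morecomments_limit=32, comment_limit=None):
    """Expand MoreComments placeholders, then return a flat list of comments.

    `submission` is expected to behave like a praw Submission.
    """
    # praw resolves up to `limit` MoreComments placeholders, each of which
    # costs an extra API request; limit=None resolves all of them.
    submission.comments.replace_more(limit=morecomments_limit)

    # list() flattens the comment tree into a single list.
    flat = submission.comments.list()

    # comment_limit=None keeps everything; otherwise keep the first N.
    return flat if comment_limit is None else flat[:comment_limit]
```

Each expanded placeholder is a separate request, which is why setting both limits to None gets slow on large threads.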
If you're having difficulty getting gallery-dl to show up on your path, find the site-packages directory that your python (or python3) is using, then add that directory to your $PYTHONPATH. For example:
export PYTHONPATH="$PYTHONPATH:/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages"
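If you don't know where site-packages lives, the interpreter can tell you. A small sketch using only the standard library (`site_packages_dirs` is just an illustrative name); run it with the same python or python3 you use for the scripts:

```python
import site
import sysconfig


def site_packages_dirs():
    """Return candidate site-packages directories for this interpreter."""
    dirs = []
    # "purelib" is where pure-Python packages are installed.
    dirs.append(sysconfig.get_paths()["purelib"])
    # Global site-packages directories, then the per-user one.
    dirs.extend(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    # Drop duplicates while keeping order.
    seen = set()
    return [d for d in dirs if not (d in seen or seen.add(d))]


if __name__ == "__main__":
    for path in site_packages_dirs():
        print(path)
```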
Next, change GALLERY_DL_COMMAND in _config.py to one of the following:
GALLERY_DL_COMMAND = "python -m gallery_dl"
# or
GALLERY_DL_COMMAND = "python3 -m gallery_dl"
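To see which form is likely to work on your machine, you can check both the PATH and the current interpreter. This is a sketch only; `pick_gallery_dl_command` is a hypothetical helper, and it does not know how the scripts actually launch the command (so an interpreter path containing spaces may need quoting).

```python
import importlib.util
import shutil
import sys


def pick_gallery_dl_command(which=shutil.which,
                            find_spec=importlib.util.find_spec):
    """Suggest a GALLERY_DL_COMMAND value, or None if nothing is found."""
    # Case 1: the gallery-dl executable is on PATH.
    if which("gallery-dl"):
        return "gallery-dl"
    # Case 2: the gallery_dl module is importable by this interpreter,
    # so it can be run as a module instead.
    if find_spec("gallery_dl") is not None:
        return f"{sys.executable} -m gallery_dl"
    # Neither form is available from here.
    return None


if __name__ == "__main__":
    print(pick_gallery_dl_command() or "gallery-dl not found")
```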
python 1_provision.py
python 2_archive_reddits.py
Optionally, unsave posts after archiving. This may be necessary since the Reddit API only shows a limited number of saved posts (1000). You can exit with Ctrl+C without losing progress on previously saved posts.
python 2_archive_reddits.py and unsave
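The idea behind unsaving is that once an item is stored locally it can be removed from the saved list, which lets older items come into view on the next run. A rough sketch of that loop follows; it is not the script's actual code, and `archive_and_unsave` and the `archive` callback are hypothetical.

```python
def archive_and_unsave(saved_items, archive, unsave=True):
    """Archive each saved item, optionally unsaving it afterwards.

    `saved_items` could be, for example, reddit.user.me().saved(limit=None)
    from praw; `archive` is whatever stores one item locally.
    Returns the number of items fully processed.
    """
    done = 0
    try:
        for item in saved_items:
            archive(item)      # store first ...
            if unsave:
                item.unsave()  # ... and only then remove it from "saved"
            done += 1
    except KeyboardInterrupt:
        # Ctrl+C: everything counted in `done` is already archived.
        pass
    return done
```

Because each item is archived before it is unsaved, interrupting the loop should never unsave something that has not been stored yet.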
If you mistakenly unsaved things, you can run resave.py to resave all comments and submissions. It won't be in the order you saved them in though.
I recommend running this periodically while 2_archive_reddits.py is running.
python 3_archive_media.py
3_archive_media.py sometimes stops unexpectedly and needs to be restarted.
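If restarting by hand gets tedious, a small wrapper can do it for you. This sketch assumes the script exits with a non-zero code when it stops early and with 0 when it finishes, which I haven't verified; if it hangs instead of exiting, this won't help. `run_until_success` is a hypothetical helper.

```python
import subprocess
import sys
import time


def run_until_success(command, max_restarts=20, delay_seconds=5.0,
                      run=subprocess.run, sleep=time.sleep):
    """Re-run `command` until it exits with code 0.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_restarts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
        # Short pause before restarting, to avoid hammering the network.
        sleep(delay_seconds)
    return None


# Example usage (left commented out because it starts the real script):
#
#   run_until_success([sys.executable, "3_archive_media.py"])
```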