

GrabIt

GrabIt is a tool built to archive self-posts, images, GIFs, and videos from subreddits and users on Reddit. The program runs from the command line and requires Python 3.

Installation

Get your Reddit API credentials.

Install all the dependencies.

pip3 install -r requirements.txt

Add the Reddit API client ID and secret through the terminal as shown below, replacing the strings in quotes with your credentials:

python3 RedditGrabber.py --reddit_id "client_id_here" --reddit_secret "client_secret_here"

If you do not wish to enter them through the terminal, you can also add the client ID and secret to the config.json file in the resources folder.
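
As a rough sketch, the corresponding entries in the config.json file might look like this; the key names are an assumption here, chosen to mirror the CLI flags:

{
    "reddit_id": "client_id_here",
    "reddit_secret": "client_secret_here"
}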

Usage and Arguments

Subreddits, users, or a submission URL are positional arguments and must be entered at the start. Subreddits are entered without any prefix, whereas users must be entered with a "u/" before the username. To download from a single subreddit, in this case r/diy:

python3 RedditGrabber.py diy

You can also pass in a list of subreddits and users as a .txt file, with each subreddit or user on its own line.

python3 RedditGrabber.py subs.txt
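
For example, a subs.txt covering two subreddits and one user (note the "u/" prefix on users) would contain:

diy
DataHoarder
u/GallowBoob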

Below are all the optional arguments that you can use; a combined example follows the list:

-h, --help                      show this help message and exit

-p POSTS, --posts POSTS         Number of posts to grab on each cycle
--search SEARCH                 Search for submissions in a subreddit
--sort SORT                     Sort submissions by "hot", "new", "top", or "controversial"
--time_filter TIME_FILTER       Filter sorted submission by "all", "day", "hour", "month", 
                                "week", or "year"
-w WAIT, --wait WAIT            Wait time between subreddits in seconds
-c CYCLES, --cycles CYCLES      Number of times to repeat after wait time
-o OUTPUT, --output OUTPUT      Set base directory to start download
-t OUTPUT_TEMPLATE, --output_template OUTPUT_TEMPLATE
                                Specify output template for download
--allow_nsfw                    Include nsfw posts too
-v, --verbose                   Enable verbose output
--pushshift                     Only use pushshift to grab submissions
--ignore_duplicate              Ignore duplicate media submissions
--blacklist BLACKLIST           Avoid downloading a user or subreddit
--reddit_id REDDIT_ID           Reddit client ID
--reddit_secret REDDIT_SECRET   Reddit client secret
--imgur_cookie IMGUR_COOKIE     Imgur authautologin cookie
--db_location                   Set location of database file
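
For example, combining several of these flags to grab 50 top posts of the past week from r/diy, waiting 600 seconds between cycles and repeating twice:

python3 RedditGrabber.py diy -p 50 --sort top --time_filter week -w 600 -c 2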

Output Template

By default the program saves files by subreddit, then by author. If you would like to change this, you can specify an output template.

The default can be represented by -t '%(subreddit)s/%(author)s/%(id)s-%(title)s.%(ext)s'. If you would like to save by author only and name each file by its title, you can use -t '%(author)s/%(title)s.%(ext)s'.

Note: if you use this parameter you must specify a template for the filename, including %(ext)s, if you wish the files to save properly. If you only wish to change the output directory, you can use the --output parameter.
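
The template syntax matches Python's %-style string formatting, so you can preview how a given template expands. A minimal sketch, using invented sample values:

# Preview how a %-style output template expands; the tag names mirror
# the table below, but the sample values are made up for illustration.
template = "%(subreddit)s/%(author)s/%(id)s-%(title)s.%(ext)s"
fields = {
    "subreddit": "diy",
    "author": "example_user",
    "id": "abc123",
    "title": "My bookshelf build",
    "ext": "jpg",
}
print(template % fields)  # diy/example_user/abc123-My bookshelf build.jpg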

Below are the available tags:

Tag          Description
author       The author of the submission
subreddit    The subreddit of the submission
id           ID of the submission
created_utc  Time the submission was created
title        Title of the submission
ext          File extension

Blacklist

If you wish to avoid downloading a specific user or subreddit you can blacklist them. Below are examples of how you would blacklist the user u/GallowBoob and the subreddit r/Documentaries.

python3 RedditGrabber.py --blacklist u/GallowBoob
python3 RedditGrabber.py --blacklist r/Documentaries

Search

You can search a subreddit using keywords along with sorting and time filters. Below is an example of a simple search on r/all for "breakfast cereal".

python3 RedditGrabber.py all --search "breakfast cereal"

If you do not use the "--sort" flag, results default to sorting by relevance; otherwise you can use "hot", "top", "new", or "comments". While searching you can also filter results by time using the "--time_filter" flag with "all", "day", "hour", "month", "week", or "year". Below is an example searching r/DataHoarder for "sata fire", sorted by top submissions and retrieving links only from the past year.

python3 RedditGrabber.py DataHoarder --search "sata fire" --sort top --time_filter year

Imgur Cookie

Imgur requires users to log in to view NSFW content on its site. Therefore, if you wish to download such content that has been posted to Reddit, you will need to provide the cookie used to verify an Imgur login.

Using the --imgur_cookie flag, provide the authautologin cookie data. You can find this cookie in your browser's storage inspector (Chrome, Edge, Firefox, Safari).

python3 RedditGrabber.py --imgur_cookie "abcdefghi9876%jklmnop54321qrstu"

The cookie is then stored in the config.json file for future use. If you wish to update the cookie, run the command above with the new value.
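
As a minimal sketch of what the cookie does (not necessarily how GrabIt implements it), a client attaches the authautologin cookie to its Imgur requests; here with the requests library and placeholder values:

import requests

# Placeholder cookie value; substitute your own authautologin data.
cookie_value = "abcdefghi9876%jklmnop54321qrstu"
session = requests.Session()
session.cookies.set("authautologin", cookie_value, domain=".imgur.com")
# Requests made through this session now carry the login cookie.
response = session.get("https://imgur.com/a/some_album_id")  # placeholder URL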


grabit's Issues

Connection timeout on saveAlbum

This may be due to rate limiting; it didn't happen until last month.

Traceback (most recent call last):
  File "/usr/lib/python3.5/urllib/request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/lib/python3.5/http/client.py", line 1252, in connect
    super().connect()
  File "/usr/lib/python3.5/http/client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.5/socket.py", line 711, in create_connection
    raise err
  File "/usr/lib/python3.5/socket.py", line 702, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "RedditGrabber.py", line 165, in <module>
    main(subR, posts)
  File "RedditGrabber.py", line 115, in main
    grabber(subR, direct, posts)
  File "RedditGrabber.py", line 65, in grabber
    saveAlbum(albumId, str(submission.author), str(submission.subreddit), title, direct)
  File "/home/boxy/Documents/RedditImageBackup/handlers/ImgurDownloader2.py", line 53, in saveAlbu$
    urllib.request.urlretrieve(image.link, os.path.join(folder, "(" + str(counter) + ") " + str(im$ge.id) + type))
  File "/usr/lib/python3.5/urllib/request.py", line 188, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/usr/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.5/urllib/request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
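
If rate limiting is the cause, one possible mitigation (a sketch, not GrabIt's actual code) is to wrap the urlretrieve call in a retry loop with exponential backoff:

import time
import urllib.error
import urllib.request

def retrieve_with_retry(url, dest, attempts=4, base_delay=5):
    # Retry on connection failures, doubling the delay each time,
    # which gives a rate limiter room to cool off.
    for attempt in range(attempts):
        try:
            return urllib.request.urlretrieve(url, dest)
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)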

ValueError: Invalid values

Traceback (most recent call last):
  File "user/GrabIt/RedditGrabber.py", line 183, in <module>
    main(parser)
  File "user/GrabIt/RedditGrabber.py", line 125, in main
    feeder(subR, parser)
  File "user/GrabIt/RedditGrabber.py", line 83, in feeder
    submission_queue = Reddit(subR, parser).queue()
  File "user/GrabIt/resources/interfaces/reddit.py", line 65, in queue
    submissions = reddit.subreddit(self.subR).controversial(limit=int(posts), time_filter=self.parser.time_filter)
  File "user/.local/lib/python3.9/site-packages/praw/models/helpers.py", line 313, in __call__
    return Subreddit(self._reddit, display_name=display_name)
  File "user/.local/lib/python3.9/site-packages/praw/models/reddit/subreddit.py", line 561, in __init__
    super().__init__(reddit, _data=_data)
  File "user/.local/lib/python3.9/site-packages/praw/models/listing/mixins/subreddit.py", line 71, in __init__
    super().__init__(reddit, _data=_data)
  File "user/.local/lib/python3.9/site-packages/praw/models/reddit/base.py", line 65, in __init__
    raise ValueError(
ValueError: An invalid value was specified for display_name. Check that the argument for the display_name parameter is not empty.
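
The empty display_name suggests a blank entry reached praw, for example a trailing newline in a subs.txt list. A defensive sketch (illustrative, not GrabIt's actual code) would skip blank lines when reading the file:

def read_sub_list(path):
    # Strip whitespace and drop empty lines so no blank name is
    # ever passed to praw as a subreddit display_name.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]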

praw outdated

The praw package is outdated. Error message:
Version 7.5.0 of praw is outdated. Version 7.6.0 was released 1 day ago
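
Upgrading the package should clear the warning:

pip3 install --upgrade praw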
