vk-scraper's Introduction

VK Scraper

Supports Python 3.5, 3.6, and 3.7. Available on PyPI and the AUR.

vk-scraper is a command-line application written in Python that scrapes and downloads a VK user's or community's data. Use responsibly.

To get a better understanding of how it works, head to the docs.

Features

  • Scrape photos
  • Scrape videos (both uploaded and external)
  • Scrape saved photos
  • Scrape stories

Install

Arch GNU/Linux

For the stable version (vk-scraper):

git clone https://aur.archlinux.org/vk-scraper.git vk-scraper

For the git version (vk-scraper-git):

git clone https://aur.archlinux.org/vk-scraper-git.git vk-scraper

Then build & install:

cd vk-scraper
makepkg -sic

Or use an AUR helper of your choice.

Other distros

For the stable version:

$ pip3 install vk-scraper --upgrade --user

For the git version:

$ pip3 install git+https://github.com/vanyasem/VK-Scraper.git --upgrade --user

Usage

To scrape media:

vk-scraper <username/community> -u <your username> -p <your password>

By default, downloaded media will be placed in <current working directory>/<username>.

To specify multiple users/communities, pass a comma-separated list of names:

vk-scraper username1,community1,username2,username3,community2

You can also supply a file containing a list of users/communities:

vk-scraper -f vk_users.txt
$ cat vk_users.txt
username1
community1
username2
username3
community2
...

Usernames may be separated by newlines, commas, semicolons, or whitespace.
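
The separator rule above can be illustrated with a short Python sketch. It is not taken from vk-scraper's source; the function name `parse_names` and the sample string are assumptions made for illustration only.

```python
import re


def parse_names(text):
    """Split a string of names on newlines, commas, semicolons, or whitespace.

    Hypothetical helper: it mirrors the documented separators, not the
    actual vk-scraper implementation.
    """
    # A run of separator characters counts as a single delimiter.
    # Empty strings left at the edges are dropped.
    return [name for name in re.split(r"[,;\s]+", text) if name]


sample = "username1, community1;username2\nusername3   community2\n"
print(parse_names(sample))
# ['username1', 'community1', 'username2', 'username3', 'community2']
```

Treating a run of separators as one delimiter means a file that mixes styles (for example a trailing comma followed by a newline) still yields clean names.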

Arguments

--help -h             Show help message and exit

--login-user  -u      Your VK username

--login-pass  -p      Your VK password

--filename    -f      Path to a file containing a list of users/communities to scrape

--destination -d      Specify destination folder. By default, media will
                      be downloaded to <current working directory>/<username>

--retain-username -n  Creates a subdirectory for each scraped name when the flag is set

--media-types -t      Specify media types to scrape. Enter as space-separated values.
                      Valid values are image, saved, video, story, wall, or none
                      (defaults to image)

--latest              Scrape only new media since the last scrape. Uses the last modified
                      time of the latest media item in the destination directory for comparison

--quiet       -q      Be quiet while scraping

--maximum     -m      Maximum number of items to scrape

--offset      -o      Offset from which the scrape starts. 0 is from the oldest. (Defaults to 0)
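
The `--latest` description above suggests a comparison against the newest file already on disk. The following is a minimal Python sketch of that idea, assuming each media item carries a Unix timestamp. The names `latest_mtime` and `select_new` are hypothetical and do not come from vk-scraper.

```python
import os


def latest_mtime(directory):
    """Return the newest modification time among files in `directory`.

    Returns 0 when the directory is missing or holds no files, so that
    every item counts as new on a first run.
    """
    newest = 0
    if os.path.isdir(directory):
        for entry in os.scandir(directory):
            if entry.is_file():
                newest = max(newest, entry.stat().st_mtime)
    return newest


def select_new(items, cutoff):
    """Keep only the (name, timestamp) pairs that are newer than `cutoff`."""
    return [(name, ts) for name, ts in items if ts > cutoff]
```

One consequence of this approach is that editing or touching a file in the destination directory moves the cutoff forward, which could cause older items to be skipped on the next run.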

Contribution

  1. Check open issues, or open a new one to start a discussion around your idea or a bug you found
  2. Fork the repository and make your changes
  3. Send a pull request

Futurelog

  • Scrape by hashtag
  • Scrape by location
  • Save metadata to a file (likes, comments, etc.)
  • Sort photos by their albums

vk-scraper's People

Contributors

rsxrwscjpzdzwpxaujrr, vanyasem

vk-scraper's Issues

ERROR: This video was marked as adult content

Hi,

I am unable to download videos tagged as adult content.

Searching -204491214 for videos: 2 videos [00:01, 1.43 videos/s]
Downloading: 0%| | 0/2 [00:00<?, ?it/s]
ERROR: This video was marked as adult content. Embedding adult videos on external websites is prohibited.
Downloading: 50%|████████████████████████████████████ | 1/2 [00:02<00:02, 2.95s/it]
ERROR: This video was marked as adult content. Embedding adult videos on external websites is prohibited.

Does anyone know how to make it work?

Stories are duplicated sometimes

When a user's profile is downloaded several times a day, the name of a story's mp4 file on VK's servers changes, so the file is detected as new media and downloaded again under a different name. This results in a lot of duplicates.
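
One possible workaround for the duplication described above is to compare files by content instead of by name. The Python sketch below is not part of vk-scraper, and `content_digest` and `is_duplicate` are hypothetical names. It also assumes VK serves byte-identical data under the new file name, which the report does not confirm.

```python
import hashlib


def content_digest(data):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data, seen_digests):
    """Report whether these bytes were seen before, recording them if not.

    `seen_digests` is a set of hex digests that the caller keeps
    between downloads.
    """
    digest = content_digest(data)
    if digest in seen_digests:
        return True
    seen_digests.add(digest)
    return False
```

If the server re-encodes a story between requests, the bytes differ and this check would not catch the duplicate.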

Scraping Communities Data?

Hello. The app description states "vk-scraper is a command-line application written in Python that scrapes and downloads VK user's / community's data. Use responsibly", but when I pass a community name on the command line as 'VK-Scraper @vk_args.txt [community_name]', or in a list file as 'VK-Scraper @vk_args.txt -f vk_users.txt', I get the error 'Error getting user details for [community_name]'. Is the feature not working, or am I doing something incorrectly?

Outdated API

vk-scraper uses an outdated version of the VK API (though it is still available for backwards compatibility). This should be addressed in a future release.

IndexError: list index out of range

Hi,

I am getting an IndexError issue using the latest version.
Python 3.6.6 via pyenv, Debian 10

Command: vk-scraper user -u "[email protected]" -p "pass"

Traceback (most recent call last):
  File "/home/sideloading/.local/bin/vk-scraper", line 11, in <module>
    load_entry_point('VK-Scraper==2.0.3', 'console_scripts', 'vk-scraper')()
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_scraper/app.py", line 492, in main
    scraper.scrape()
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_scraper/app.py", line 155, in scrape
    self.login()
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_scraper/app.py", line 92, in login
    self.vk_session.auth()
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_api/vk_api.py", line 177, in auth
    self._auth_cookies(reauth=reauth)
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_api/vk_api.py", line 208, in _auth_cookies
    self._api_login()
  File "/home/sideloading/.local/lib/python3.6/site-packages/vk_api/vk_api.py", line 439, in _api_login
    params = response.url.split('#', 1)[1].split('&')
IndexError: list index out of range
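
The last frame of the traceback indexes `[1]` on the result of `split('#', 1)`. When a string contains no `#`, `split` returns a single-element list, so that index raises `IndexError`. A plausible reading, stated here as an assumption, is that the login response URL carried no fragment. The Python sketch below shows the failure mode and a defensive variant. The function `fragment_params` and the example URLs are made up for illustration, and this is not a patch to `vk_api`.

```python
def fragment_params(url):
    """Return the '&'-separated parts of a URL fragment, or [] if there is none."""
    parts = url.split('#', 1)
    if len(parts) < 2:
        # No '#' in the URL: parts[1] would raise IndexError here.
        return []
    return parts[1].split('&')


# Hypothetical URLs, used only to exercise both branches.
with_fragment = "https://example.com/blank.html#access_token=abc&user_id=1"
without_fragment = "https://example.com/login?error=1"

print(fragment_params(with_fragment))     # ['access_token=abc', 'user_id=1']
print(fragment_params(without_fragment))  # []
```

Returning an empty list only hides the crash. A caller would still need to treat the missing fragment as a failed login.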

Saved pictures are not scraping.

I tried to download photos from my profile, but the scraper didn't download photos from the Saved photos album. I then tried changing my privacy settings, but that didn't help.

Scraping a list of users based on searches

I would like to find a way to gather a list of users' images based on certain criteria (e.g. location, age, sex) using VK's search.
That would make the scraping much easier to do.

Login no longer working

Has anyone got this working recently?

Login is failing for me, even after updating the vk_api module.
