mh_scrapy

Web scraper written in Python with the Scrapy library, inspired by my work in a call center collecting contact data of rental agencies and individuals offering their properties online. Scraping this sort of data with Scrapy was successful mostly only with Czech real-estate websites.

Dependencies:

  • pyenv global 3.7.0; python -m venv venv370; pyenv global system; source venv370/bin/activate
  • (venv370) pip install scrapy
  • (venv370) pip install scrapy-fake-useragent (it has to be enabled in settings.py; see the sketch after this list)
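
scrapy-fake-useragent is not active by default; it is registered in the project's settings.py. A minimal sketch of that configuration, following the library's documented usage (the repository's actual settings may differ):

    # settings.py - sketch of enabling scrapy-fake-useragent
    # (assumed configuration, taken from the library's documented usage)
    DOWNLOADER_MIDDLEWARES = {
        # turn off Scrapy's default User-Agent middleware ...
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        # ... and let scrapy-fake-useragent pick a random User-Agent per request
        'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    }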

Procedure:

0. Activate the venv and 'cd' into the project folder:

  • $ source venv370/bin/activate
  • $ cd mh/mh

Examples:

sreality - fully tested

  • open Firefox, get to the search results
  • select 20, 40 or 60 results per page
  • press CTRL + SHIFT + E to open Firefox's built-in Network Monitor
  • press CTRL + R to reload the page so that the website's API calls are captured
  • copy the Request URL, for example: https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&per_page=60
  • decide how many pages of results to scrape, let's say 10 pages
  • decide on the name of the output file, let's say "sreality.csv" (the .csv extension selects CSV output)
  • pass the URL via the -a spec= argument, the number of pages via the -a pages= argument, and the output filename via the -o argument
  • run the crawl with those arguments, for example (a sketch of such a spider follows after this list):
    $ scrapy crawl sreality -a pages=10 -a spec='https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&per_page=60' -o sreality.csv
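
A minimal sketch of a spider that consumes the -a pages= and -a spec= arguments and walks the paginated JSON API. The '_embedded'/'estates' keys and the yielded field names are assumptions for illustration, not taken from the repository's actual spider:

    # sreality_sketch.py - illustrative only, not the repository's spider
    import json
    import scrapy

    class SrealitySketchSpider(scrapy.Spider):
        name = 'sreality_sketch'

        def __init__(self, pages=1, spec=None, *args, **kwargs):
            # -a pages= and -a spec= arrive here as strings
            super().__init__(*args, **kwargs)
            self.pages = int(pages)
            self.spec = spec

        def start_requests(self):
            # the API is paginated with a &page=N query parameter
            for page in range(1, self.pages + 1):
                yield scrapy.Request(f'{self.spec}&page={page}')

        def parse(self, response):
            data = json.loads(response.text)
            # the JSON layout and field names below are assumptions
            for estate in data.get('_embedded', {}).get('estates', []):
                yield {
                    'name': estate.get('name'),
                    'locality': estate.get('locality'),
                    'price': estate.get('price'),
                }

The dictionaries yielded from parse() become the rows of the file given to -o.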

bezrealitky - tested

  • open Firefox, get to the search results
  • copy the URL
  • pass the URL via the -a spec= argument and the output filename via the -o argument
  • run the crawl with those arguments, for example (a sketch of such a spider follows after this list):
    $ scrapy crawl bezrealitky -a spec='https://www.bezrealitky.cz/vypis/nabidka-prodej/byt' -o bezrealitky.csv
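
Here the -a spec= argument is a regular listing URL rather than an API endpoint. A hedged sketch of how such an HTML spider can take that argument and follow listing and "next page" links; all CSS selectors and field names below are placeholders, not the repository's real ones:

    # bezrealitky_sketch.py - illustrative only; selectors are placeholders
    import scrapy

    class BezrealitkySketchSpider(scrapy.Spider):
        name = 'bezrealitky_sketch'

        def __init__(self, spec=None, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.start_urls = [spec]  # the listing URL passed via -a spec=

        def parse(self, response):
            # open every listing on the results page (selector is an assumption)
            for href in response.css('article a::attr(href)').getall():
                yield response.follow(href, callback=self.parse_detail)
            # keep paginating while a "next page" link exists (selector is an assumption)
            next_page = response.css('a[rel=next]::attr(href)').get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

        def parse_detail(self, response):
            # field selectors are placeholders for the real detail-page markup
            yield {
                'url': response.url,
                'title': response.css('h1::text').get(),
                'phone': response.css('a[href^="tel:"]::text').get(),
            }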

reality_idnes - not fully tested

  • open Firefox, get to the search results
  • copy the URL
  • pass the URL via the -a spec= argument and the output filename via the -o argument
  • run the crawl with those arguments, for example:
    $ scrapy crawl reality_idnes -a spec='https://reality.idnes.cz/s/prodej/byty/' -o reality_idnes.csv
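
All three spiders export whatever fields they yield as columns of the -o output file. For a project that collects contact data, the items might be declared along these lines; the field names are assumptions for illustration, not the repository's actual items.py:

    # items.py - sketch of a contact-data item; field names are assumptions
    import scrapy

    class EstateContactItem(scrapy.Item):
        title = scrapy.Field()        # listing title
        locality = scrapy.Field()     # where the property is offered
        price = scrapy.Field()        # asking price or rent
        seller_name = scrapy.Field()  # agency or private person
        phone = scrapy.Field()        # contact phone number
        email = scrapy.Field()        # contact e-mail, if published
        url = scrapy.Field()          # source listing URL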

Contributors:

  • chicocheco
