Auto Ria Parser

This project uses Scrapy to scrape used-car data from auto.ria.com and saves it to a PostgreSQL database.

  • The parsing code is in autoria_parser/spider/car_parser.py
  • All database actions are handled in autoria_parser/pipelines.py
  • Celery settings for scheduled tasks are stored in autoria_parser/celery.py
  • Logger settings are stored in autoria_parser/log.py

.env

Overview:

  • POSTGRES_HOST=db - hostname of the PostgreSQL database
  • POSTGRES_DB=cars - name of the database
  • POSTGRES_USER=postgres - database username
  • POSTGRES_PASSWORD=postgres - password for that user
  • POSTGRES_PORT=5432 - database port (5432 is the PostgreSQL default)
  • RABBIT_URL=amqp://guest:guest@rabbitmq3:5672/ - RabbitMQ broker URL
  • PAGES=10 - number of pages to scrape
  • PARSE_HOUR=12 - hour of the daily parsing run
  • PARSE_MINUTE=0 - minute of the daily parsing run
  • DUMP_HOUR=12 - hour of the daily database dump
  • DUMP_MINUTE=0 - minute of the daily database dump
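
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object using only the standard library. The names Settings, load_settings, postgres_dsn and parse_schedule are hypothetical and do not come from the project; the defaults simply mirror the sample values listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the .env values listed above."""
    postgres_host: str
    postgres_db: str
    postgres_user: str
    postgres_password: str
    postgres_port: int
    rabbit_url: str
    pages: int
    parse_hour: int
    parse_minute: int
    dump_hour: int
    dump_minute: int

    @property
    def postgres_dsn(self) -> str:
        # Standard PostgreSQL connection URI built from the individual parts.
        return (
            f"postgresql://{self.postgres_user}:{self.postgres_password}"
            f"@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}"
        )

    @property
    def parse_schedule(self) -> dict:
        # Assumption: these keyword arguments could be passed to
        # celery.schedules.crontab(hour=..., minute=...) for the daily run.
        return {"hour": self.parse_hour, "minute": self.parse_minute}


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default).

    Defaults are the sample values from the list above.
    """
    env = os.environ if env is None else env
    return Settings(
        postgres_host=env.get("POSTGRES_HOST", "db"),
        postgres_db=env.get("POSTGRES_DB", "cars"),
        postgres_user=env.get("POSTGRES_USER", "postgres"),
        postgres_password=env.get("POSTGRES_PASSWORD", "postgres"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        rabbit_url=env.get("RABBIT_URL", "amqp://guest:guest@rabbitmq3:5672/"),
        pages=int(env.get("PAGES", "10")),
        parse_hour=int(env.get("PARSE_HOUR", "12")),
        parse_minute=int(env.get("PARSE_MINUTE", "0")),
        dump_hour=int(env.get("DUMP_HOUR", "12")),
        dump_minute=int(env.get("DUMP_MINUTE", "0")),
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"PAGES": "3"}), makes the loader easy to exercise without touching the real environment.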

Starting project locally

To run the project, follow these steps:

  1. Fork the repository

  2. Clone it: git clone <HTTPS link copied from the GitHub repository page>

  3. Create a virtual environment: python3 -m venv venv

  4. Activate the virtual environment:

  • macOS/Linux: source venv/bin/activate
  • Windows: venv\Scripts\activate
  5. Create a .env file:
  • You can copy .env.sample if you are going to use Docker.
  6. Launch the project:
  • With Docker: docker-compose up --build
  • Locally: scrapy crawl car_parser
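
The step of creating .env from .env.sample can also be scripted. The following is a small sketch, not part of the project: ensure_env_file is a hypothetical helper that copies the sample file only when no .env exists yet, so an existing configuration is never overwritten.

```python
import shutil
from pathlib import Path


def ensure_env_file(project_dir: str) -> bool:
    """Copy .env.sample to .env unless .env already exists.

    Returns True if a copy was made, False if .env was already present.
    """
    root = Path(project_dir)
    target = root / ".env"
    sample = root / ".env.sample"
    if target.exists():
        # Keep whatever configuration is already in place.
        return False
    shutil.copyfile(sample, target)
    return True
```

Calling ensure_env_file(".") from the repository root before launching would then leave a usable .env in place for the Docker setup.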

Contributors

lyutillis