
ReciPy

Python Django React PostgreSQL

A crawler framework that scrapes data from famous Greek cuisine sites.

Current target sites:

Installation

Django REST & Scrapy installation from source

This section contains the installation instructions in order to set up a local development environment. The instructions have been validated for Ubuntu 20.04.

First, install all required software with the following command:

sudo apt update
sudo apt install git python3 python3-pip python3-dev postgresql postgresql-contrib 

The project dependencies are managed with pipenv. You can install it with:

pip install --user pipenv

pipenv should now be in your PATH. If not, log out and log in again. Then install all dependencies with:

pipenv install --dev

Then you can activate the Python environment with:

pipenv shell

All commands from this point forward require the Python environment to be active.

Environment variables

The project uses environment variables in order to keep private data like usernames and passwords out of source control. You can either set them at the system level or create a file named .env at the root of the repository. The required environment variables for development are:

  • RECIPY_DATABASE_USER: The database user
  • RECIPY_DATABASE_PASSWORD: The database user password
  • RECIPY_DATABASE_HOST: The database host. For local development use localhost
  • RECIPY_DATABASE_NAME: The database name.
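
For example, a minimal .env file for local development could look like the following (the user and password values are placeholders; adjust them to your setup):

RECIPY_DATABASE_USER=recipy_user
RECIPY_DATABASE_PASSWORD=recipy_password
RECIPY_DATABASE_HOST=localhost
RECIPY_DATABASE_NAME=recipy_development_db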

Local Development

In order to run the project on your workstation, you must create a database whose name matches the RECIPY_DATABASE_NAME environment variable, on the host specified by the RECIPY_DATABASE_HOST environment variable. You can create the database by running:

sudo -u postgres psql
postgres=# CREATE DATABASE recipy_development_db;
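
If the database user referenced by RECIPY_DATABASE_USER does not exist yet, you can also create it and grant it access in the same psql session (the names below are placeholders matching the example .env above):

postgres=# CREATE USER recipy_user WITH PASSWORD 'recipy_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE recipy_development_db TO recipy_user;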

After you create the database, you can populate it with the initial schema by running:

python manage.py migrate

Now you can run the web server, exposing the API:

python manage.py runserver

The API is available at http://127.0.0.1:8000/api/v1/

The Swagger documentation page of the API is available at http://127.0.0.1:8000/api/swagger
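
You can quickly verify that the API is up, for example with curl:

curl http://127.0.0.1:8000/api/v1/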

To populate the database with data, you must also run the crawlers. To do that, simply run the following:

cd crawlers
./deploy.sh

This will spawn a Scrapyd instance and will execute all the crawlers concurrently.

The Scrapyd management page is available at http://127.0.0.1:6800

If you want to run each crawler separately, run:

scrapy crawl <crawler-name>
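
If you are unsure which crawler names are available, Scrapy can list all spiders registered in the project (run this from the crawlers directory):

scrapy list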

Installation using Docker (RECOMMENDED)

First, install Docker Engine and Docker Compose (see their official installation instructions) in order to build the project.

Set up the .env file at the root of the repository with the following variables:

  • RECIPY_DATABASE_USER: The database user
  • RECIPY_DATABASE_PASSWORD: The database user password
  • RECIPY_DATABASE_HOST: The database host. It must be set to db
  • RECIPY_DATABASE_NAME: The database name.
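
For example, a .env file suitable for the Docker setup could look like the following (the credentials are placeholders; only the host value is fixed):

RECIPY_DATABASE_USER=recipy_user
RECIPY_DATABASE_PASSWORD=recipy_password
RECIPY_DATABASE_HOST=db
RECIPY_DATABASE_NAME=recipy_development_db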

Then just execute the following:

docker-compose up --build

After that, the database, the API, the crawlers, and the React frontend client are all up and running!
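
If you prefer to keep the containers running in the background, Compose can also start them detached and stop them later:

docker-compose up --build -d
docker-compose down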

The database is exposed at jdbc:postgresql://localhost:5433/
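
You can connect to it directly with psql, using the credentials from your .env (the user and database names below are placeholders):

psql -h localhost -p 5433 -U recipy_user recipy_development_db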

The API, the Swagger page, and the Scrapyd page are available at the same addresses referred to above. The React client is available at http://127.0.0.1:5000/

Additional Notes

The diagram below shows the structure and the main components of the ReciPy project.

The project is structured mainly by:

  • The Crawlers component, which gathers all the required data from the targeted websites
  • A database in which the collected data are stored
  • An API that can access the data and serve them following the REST architecture
  • And finally a web application, used as the user interface, from which users can search for recipes that exist on one of the targeted websites.

Below is an example of the management console of Scrapyd showing the status of each crawler process:

The following endpoints were implemented in order to serve all the requests of the front-end application:

Below is the diagram of the database schema used to store the Recipes, Sites, and Ingredients retrieved by the crawlers:

Finally, the screenshots below show the frontend application.

Search page:

Recipe detail page:

Contributors

  • vtsiatouras
  • nickmanit
  • teomandi
