
scrapinghub-elasticsearch-loader's Introduction

Load items from Scrapy Cloud to ElasticSearch instance

Installation

Install dependencies:

virtualenv venv
source venv/bin/activate
pip install .

ElasticSearch itself is also required: either install it locally, or install docker and docker-compose and use the docker-compose.yml config from this project.

Usage

Fire up ElasticSearch

If you have a local installation, launch it and make sure that it is running. Otherwise, use the configuration from this project to run ElasticSearch and Kibana with the command:

docker-compose up -d
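To confirm that ElasticSearch is actually accepting connections before loading anything, a check along the following lines may help. This is a minimal sketch using only the Python standard library and is not part of the project; the function name `is_elasticsearch_up` is hypothetical, and the URL assumes the default `localhost:9200` address listed in the help message further down.

```python
import urllib.error
import urllib.request


def is_elasticsearch_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET on the ElasticSearch root URL succeeds.

    Hypothetical helper: the URL is an assumption based on the default
    address documented for the -e option.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures and HTTP error
        # statuses (for example 401 when authentication is enabled).
        return False


if __name__ == "__main__":
    print("ElasticSearch reachable:", is_elasticsearch_up())
```

A freshly started container can take a few seconds to answer, so a `False` result right after `docker-compose up -d` does not necessarily mean the setup is broken.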

Set environment variables

The script needs your Scrapy Cloud API key. Add it to the environment variable SH_APIKEY:

export SH_APIKEY="your_key"
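On the Python side, reading the variable with an explicit check gives a clearer failure than an authentication error later on. The sketch below is an illustration only: `get_api_key` is a hypothetical helper, and how shes.py actually reads SH_APIKEY is not shown in this README.

```python
import os


def get_api_key(environ=os.environ):
    """Return the Scrapy Cloud API key from SH_APIKEY or raise a clear error.

    Hypothetical helper; `environ` is injectable so the check can be
    exercised without touching the real environment.
    """
    key = environ.get("SH_APIKEY", "").strip()
    if not key:
        raise RuntimeError(
            "SH_APIKEY is not set; export it before running the loader"
        )
    return key
```

Calling `get_api_key()` with no argument reads the real process environment, so it reflects whatever was exported in the shell that launched the script.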

Run script

The project has a command line interface, "shes" (ScrapingHub - ElasticSearch). Run it with -h to see the help message:

$ ./shes.py -h
Download items to ElasticSearch.

usage: shes.py -j JOB_ID [-e ELASTICSEARCH_URL] [-i INDEX] [-t DOC_TYPE] [-h]

Download items from Scrapinghub cloud and upload them to ElasticSearch index.

optional arguments:
  -h, --help            show this help message and exit
  -j JOB_ID, --job_id JOB_ID                                Required Scrapy Cloud job identifier
  -e ELASTICSEARCH_URL, --elasticsearch ELASTICSEARCH_URL   URL of ElasticSearch instance, [default: localhost:9200]
  -i INDEX, --index index                                   Index name, defaults to job_id
  -t DOC_TYPE, --type DOC_TYPE                              Document type, [default: product]
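The option defaults described in the help message can be sketched as follows. This is not the project's code: the traceback in the Issues section shows that shes.py imports docopt, whereas this sketch swaps in `argparse` from the standard library so that it runs without extra packages. The names `build_parser` and `parse_args` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help message."""
    parser = argparse.ArgumentParser(
        prog="shes.py",
        description="Download items from Scrapinghub cloud and upload them "
                    "to ElasticSearch index.",
    )
    parser.add_argument("-j", "--job_id", required=True,
                        help="Required Scrapy Cloud job identifier")
    parser.add_argument("-e", "--elasticsearch", default="localhost:9200",
                        help="URL of ElasticSearch instance")
    parser.add_argument("-i", "--index", default=None,
                        help="Index name, defaults to job_id")
    parser.add_argument("-t", "--type", dest="doc_type", default="product",
                        help="Document type")
    return parser


def parse_args(argv):
    """Parse argv and apply the documented fallback of index to job_id."""
    args = build_parser().parse_args(argv)
    if args.index is None:
        args.index = args.job_id
    return args
```

Only `-j` has to be supplied; the other three options fall back to the defaults shown in the help text, with the index name taken from the job id when `-i` is omitted.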

scrapinghub-elasticsearch-loader's People

Contributors

al-serebrov, dependabot[bot]


scrapinghub-elasticsearch-loader's Issues

Dependencies are not installed through setup.py

If the package is installed through setup.py:

➜ pip install .
Processing /home/alex/web/sh-es-loader
Building wheels for collected packages: shes
  Building wheel for shes (setup.py) ... done
  Created wheel for shes: filename=shes-0.5-cp36-none-any.whl size=16746 sha256=2974729c15a020bd3885db7712b69f1eb7983ef9403d435e95b73264c959e8fa
  Stored in directory: /home/alex/.cache/pip/wheels/09/3c/53/0f75f4671470b1e6202e29fc617af498eda6ee8fd8bb0cd053
Successfully built shes
Installing collected packages: shes
Successfully installed shes-0.5

And then run:

➜ python shes.py -j $JOB_ID
Traceback (most recent call last):
  File "shes.py", line 18, in <module>
    from docopt import docopt
ModuleNotFoundError: No module named 'docopt'

The dependencies are not installed. A hotfix (for now) is to install them manually:

➜ pip install -r requirements.txt

Updated README.md accordingly in 3628423.
But we still need to figure out why they are not installed with the package and fix this.

If I add a workaround that installs dependencies from the requirements.txt file, as suggested here, tox starts failing because it is unable to locate the requirements.txt file.
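One common cause of this combination of symptoms is that setup.py opens requirements.txt through a path relative to the current working directory, and that the file is not shipped in the source distribution that tox builds and installs from. Whether this is what happens in this project is an assumption; the setup.py itself is not shown here. Under that assumption, a possible sketch is to resolve the file relative to setup.py and pass the parsed list to `install_requires`, together with an `include requirements.txt` line in MANIFEST.in. The helper name `read_requirements` is hypothetical.

```python
import os


def read_requirements(path):
    """Parse a requirements file into a list usable as install_requires.

    Hypothetical helper. Blank lines, comments and pip option lines
    (such as "-r other.txt" or "--index-url ...") are skipped. A missing
    file raises FileNotFoundError instead of silently yielding no
    dependencies.
    """
    requirements = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Drop trailing inline comments, then surrounding whitespace.
            line = line.split(" #", 1)[0].strip()
            if not line or line.startswith(("#", "-")):
                continue
            requirements.append(line)
    return requirements


# Assumed usage inside setup.py (not taken from the project):
#   HERE = os.path.dirname(os.path.abspath(__file__))
#   setup(..., install_requires=read_requirements(
#       os.path.join(HERE, "requirements.txt")))
```

Listing the dependencies directly in `install_requires` avoids the file lookup altogether, at the cost of keeping two lists in sync if requirements.txt is also kept.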
