Scrapyd-client

Scrapyd-client is a client for Scrapyd. It provides:

Command line tools:

  • scrapyd-deploy, to deploy your project to a Scrapyd server
  • scrapyd-client, to interact with your project once deployed

Python client:

  • ScrapydClient, to interact with Scrapyd from within your Python code

scrapyd-deploy

Deploying your project to a Scrapyd server typically involves two steps:

  1. Eggifying your project. You'll need to install setuptools for this. See Egg Caveats below.
  2. Uploading the egg to the Scrapyd server through the addversion.json endpoint.

The scrapyd-deploy tool automates the process of building the egg and pushing it to the target Scrapyd server.
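As a rough illustration of step 2, the sketch below builds (without sending) a multipart POST for the addversion.json endpoint using only the standard library. The function name build_addversion_request, the filename project.egg and the sample URL are assumptions made for this example; the form fields project, version and egg are the ones Scrapyd's addversion.json API expects. This is not how scrapyd-deploy itself is implemented.

```python
import uuid
from urllib.request import Request


def build_addversion_request(base_url, project, version, egg_bytes):
    """Build (but do not send) a multipart POST for Scrapyd's addversion.json."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("project", project), ("version", version)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The egg itself, sent as a binary file field.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="egg"; filename="project.egg"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + egg_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return Request(
        base_url.rstrip("/") + "/addversion.json",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the upload; in normal use, scrapyd-deploy handles this for you.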

Including Static Files

If the egg needs to include static (non-Python) files, edit the setup.py file in your project. Otherwise, you can skip this step.

If you don't have a setup.py file, create one with:

scrapyd-deploy --build-egg=/dev/null

Then, set the package_data keyword argument in the setup() function call in the setup.py file. Example (note: projectname would be your project's name):

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = projectname.settings']},
    package_data = {'projectname': ['path/to/*.json']}
)

Deploying a Project

First, cd into your project's root. You can then deploy your project with the following:

scrapyd-deploy <target> -p <project>

This will eggify your project and upload it to the target. If you have a setup.py file in your project, it will be used; otherwise, one will be created automatically.

If successful, you should see a JSON response similar to the following:

Deploying myproject-1287453519 to http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "spiders": ["spider1", "spider2"]}
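If you capture this output in a script, a check along the following lines may be useful. It is a minimal sketch: the helper name deployed_spiders is hypothetical, and it only assumes the status and spiders keys shown in the sample response above.

```python
import json


def deployed_spiders(response_body):
    """Return the 'spiders' value from a deploy response, or raise on failure."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"Deploy failed: {payload}")
    # Return whatever the server reported under "spiders", unchanged.
    return payload.get("spiders")


print(deployed_spiders('{"status": "ok", "spiders": ["spider1", "spider2"]}'))
```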

To save yourself from having to specify the target and project, you can set the defaults in the Scrapy configuration file.

Versioning

By default, scrapyd-deploy uses the current timestamp for generating the project version, as shown above. However, you can pass a custom version using --version:

scrapyd-deploy <target> -p <project> --version <version>

The version must be comparable with LooseVersion. Scrapyd will use the greatest version unless another version is specified.
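To give a feel for what "greatest version" means, here is a small sketch of a loose ordering. It is an approximation written for this example, not LooseVersion itself: loose_key and default_version are hypothetical helpers, and sorting numeric components before alphabetic ones is an assumption of this sketch.

```python
import re
import time

_COMPONENT = re.compile(r"\d+|[a-z]+", re.IGNORECASE)


def loose_key(version):
    """Split a version string into comparable numeric and alphabetic parts."""
    return [
        # Tag each part so integers and strings are never compared directly.
        (0, int(part)) if part.isdigit() else (1, part.lower())
        for part in _COMPONENT.findall(str(version))
    ]


def default_version():
    """A timestamp-based version string, similar in shape to 1287453519."""
    return str(int(time.time()))


versions = ["1.2", "1.10", "1.9"]
print(max(versions, key=loose_key))  # 1.10
```

Note that a plain string comparison would rank "1.9" above "1.10", which is why the components are compared numerically.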

If you use Mercurial or Git, you can use HG or GIT respectively as the argument supplied to --version to use the current revision as the version. You can save yourself having to specify the version parameter by adding it to your target's entry in scrapy.cfg:

[deploy]
...
version = HG

Local Settings

You may want to keep certain settings local and not have them deployed to Scrapyd. To accomplish this you can create a local_settings.py file at the root of your project, where your scrapy.cfg file resides, and add the following to your project's settings:

try:
    from local_settings import *
except ImportError:
    pass

scrapyd-deploy doesn't deploy anything outside of the project module, so the local_settings.py file won't be deployed.

Egg Caveats

Some things to keep in mind when building eggs for your Scrapy project:

  • Make sure no local development settings are included in the egg when you build it. The find_packages function may be picking up your custom settings. In most cases you want to upload the egg with the default project settings.

  • Avoid using __file__ in your project code as it doesn't play well with eggs. Consider using pkgutil.get_data instead. Instead of:

    path = os.path.dirname(os.path.realpath(__file__))  # BAD
    open(os.path.join(path, "tools", "json", "test.json"), "rb").read()

    Use:

    import pkgutil
    pkgutil.get_data("projectname", "tools/json/test.json")

  • Be careful when writing to disk in your project, as Scrapyd will most likely be running under a different user which may not have write access to certain directories. If you can, avoid writing to disk and always use tempfile for temporary files.
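For the last caveat, a minimal sketch of writing scratch data through tempfile rather than to a fixed path might look like this. The helper name dump_to_temp_json is made up for the example.

```python
import json
import os
import tempfile


def dump_to_temp_json(data):
    """Write data to a new temporary JSON file and return its path."""
    # mkstemp picks a directory the current user is allowed to write to.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return path


path = dump_to_temp_json({"items": 3})
with open(path, encoding="utf-8") as handle:
    print(json.load(handle))
os.remove(path)  # clean up once the file is no longer needed
```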

Including dependencies

If your project has additional dependencies, you can either install them on the Scrapyd server, or you can include them in the project's egg, in two steps:

  • Create a requirements.txt file at the root of the project

  • Use the --include-dependencies option when building or deploying your project:

    scrapyd-deploy --include-dependencies
    

scrapyd-client

For a reference on each subcommand, invoke scrapyd-client <subcommand> --help.

Where filtering with wildcards is possible, it is done with fnmatch. The --project option can be omitted if a project is found in a scrapy.cfg.
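To preview what a wildcard pattern would match, you can try it against a list of names with the standard fnmatch module. The spider names below are invented for the example.

```python
from fnmatch import filter as fnmatch_filter

# Hypothetical spider names, used only to demonstrate the pattern.
spiders = ["news_daily", "news_weekly", "blog_daily"]

# "*" matches any run of characters, so this keeps names ending in "_daily".
print(fnmatch_filter(spiders, "*_daily"))
```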

deploy

This is a wrapper around scrapyd-deploy.

projects

Lists all projects of a Scrapyd instance:

# lists all projects on the default target
scrapyd-client projects
# lists all projects from a custom URL
scrapyd-client -t http://scrapyd.example.net projects

schedule

Schedules one or more spiders to be executed:

# schedules any spider
scrapyd-client schedule
# schedules all spiders from the 'knowledge' project
scrapyd-client schedule -p knowledge \*
# schedules any spider whose name ends with '_daily', from any project
scrapyd-client schedule -p \* \*_daily
# schedules spider1 in project1 specifying settings
scrapyd-client schedule -p project1 spider1 --arg 'setting=DOWNLOADER_MIDDLEWARES={"my.middleware.MyDownloader": 610}'
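For comparison, the last command roughly corresponds to a POST against Scrapyd's schedule.json endpoint. The sketch below only builds the request; build_schedule_request and the base URL are assumptions for illustration, and the exact request that scrapyd-client sends may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, settings=None):
    """Build (but do not send) a POST for Scrapyd's schedule.json."""
    fields = [("project", project), ("spider", spider)]
    for key, value in (settings or {}).items():
        # Non-string values are serialised as JSON, as in the CLI example.
        if not isinstance(value, str):
            value = json.dumps(value)
        # Each Scrapy setting is passed as a repeated setting=KEY=VALUE field.
        fields.append(("setting", f"{key}={value}"))
    return Request(
        base_url.rstrip("/") + "/schedule.json",
        data=urlencode(fields).encode("ascii"),
        method="POST",
    )


request = build_schedule_request(
    "http://localhost:6800",
    "project1",
    "spider1",
    settings={"DOWNLOADER_MIDDLEWARES": {"my.middleware.MyDownloader": 610}},
)
print(request.full_url)
```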

spiders

Lists spiders of one or more projects:

# lists all spiders
scrapyd-client spiders
# lists all spiders from the 'knowledge' project
scrapyd-client spiders -p knowledge

ScrapydClient

Interact with Scrapyd from within your Python code.

from scrapyd_client import ScrapydClient
client = ScrapydClient()

for project in client.projects():
    print(client.jobs(project=project))

Scrapy configuration file

Targets

You can define a Scrapyd target in your project's scrapy.cfg file. Example:

[deploy]
url = http://scrapyd.example.com/api/scrapyd
username = scrapy
password = secret
project = projectname

You can now deploy your project without the <target> argument or -p <project> option:

scrapyd-deploy

If you have multiple targets, add the target name in the section name. Example:

[deploy:targetname]
url = http://scrapyd.example.com/api/scrapyd

[deploy:another]
url = http://other.example.com/api/scrapyd
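To see how such sections map to target names, the following sketch reads a scrapy.cfg-style string with configparser. The list_targets helper is hypothetical, and labelling the bare [deploy] section as "default" is an assumption of this example.

```python
from configparser import ConfigParser


def list_targets(cfg_text):
    """Map target names to URLs for every [deploy] / [deploy:name] section."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            name = "default"  # assumed label for the unnamed target
        elif section.startswith("deploy:"):
            name = section.split(":", 1)[1]
        else:
            continue  # not a deploy section
        targets[name] = parser.get(section, "url", fallback=None)
    return targets


CFG = """
[deploy:targetname]
url = http://scrapyd.example.com/api/scrapyd

[deploy:another]
url = http://other.example.com/api/scrapyd
"""
print(list_targets(CFG))
```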

If you are working with CD frameworks, you do not need to commit your secrets to your repository. You can use environment variable expansion like so:

[deploy]
url = $SCRAPYD_URL
username = $SCRAPYD_USERNAME
password = $SCRAPYD_PASSWORD

or using this syntax:

[deploy]
url = ${SCRAPYD_URL}
username = ${SCRAPYD_USERNAME}
password = ${SCRAPYD_PASSWORD}
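Both syntaxes behave like ordinary shell-style variable references. As an illustration only (not necessarily how scrapyd-deploy implements it), the sketch below expands them with os.path.expandvars; read_deploy_section and the sample values are invented for the example.

```python
import os
from configparser import ConfigParser


def read_deploy_section(cfg_text, section="deploy"):
    """Read one section and expand $VAR / ${VAR} references from the environment."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Variables that are not set in the environment are left untouched.
    return {key: os.path.expandvars(value) for key, value in parser.items(section)}


# Sample values, set here only so the example is self-contained.
os.environ["SCRAPYD_URL"] = "http://scrapyd.example.com/api/scrapyd"
os.environ["SCRAPYD_USERNAME"] = "scrapy"

CFG_WITH_VARS = """
[deploy]
url = $SCRAPYD_URL
username = ${SCRAPYD_USERNAME}
"""
print(read_deploy_section(CFG_WITH_VARS))
```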

To deploy to one target, run:

scrapyd-deploy targetname -p <project>

To deploy to all targets, use the -a option:

scrapyd-deploy -a -p <project>

To list all available targets, use the -l option:

scrapyd-deploy -l

To list all available projects on one target, use the -L option:

scrapyd-deploy -L example

While your target needs to be defined with its URL in scrapy.cfg, you can use netrc for username and password, like so:

machine scrapyd.example.com
    login scrapy
    password secret
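To check that a netrc entry parses the way you expect, you can read it with Python's standard netrc module. This is a small sketch; netrc_credentials is a hypothetical helper, and the path you pass in is up to you.

```python
import netrc


def netrc_credentials(netrc_path, host):
    """Return (login, password) for host from a netrc file, or None if absent."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    # authenticators() returns (login, account, password); account is unused here.
    login, _account, password = entry
    return login, password
```

For the entry above, calling netrc_credentials with the file's path and "scrapyd.example.com" should return the login and password stored for that machine.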
