
Python Script As A Service


A service (also known as a "daemon") is a process that performs tasks in the background and responds to system events.

Services can be written in any language. We use Python in these examples because it is one of the most versatile languages available.

For more information, be sure to read our blog post on the subject.

Setting Up

To run any of the examples, you will need Python 3. We also recommend using a virtual environment.

python3 -m venv venv
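
To follow along, activate the environment and install the two libraries the example scripts import, requests and beautifulsoup4 (the exact commands below are a suggested setup rather than a step from the original repository):

source venv/bin/activate        # on Windows: venv\Scripts\activate
pip install requests beautifulsoup4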

Create A Systemd Service

First, create a script that scrapes a website. Make sure the script also handles OS signals so that it exits gracefully.

linux_scrape.py:

import json
import re
import signal
from pathlib import Path

import requests
from bs4 import BeautifulSoup

class SignalHandler:
    shutdown_requested = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.request_shutdown)
        signal.signal(signal.SIGTERM, self.request_shutdown)

    def request_shutdown(self, *args):
        print('Request to shutdown received, stopping')
        self.shutdown_requested = True

    def can_run(self):
        return not self.shutdown_requested


# Register SIGINT/SIGTERM handlers so the loop below can stop cleanly
signal_handler = SignalHandler()
urls = [
    'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
    'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
    'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
]

# Cycle through the URLs until a shutdown signal is received
index = 0
while signal_handler.can_run():
    url = urls[index % len(urls)]
    index += 1

    print('Scraping url', url)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    book_name = soup.select_one('.product_main').h1.text
    rows = soup.select('.table.table-striped tr')
    product_info = {row.th.text: row.td.text for row in rows}

    data_folder = Path('./data')
    data_folder.mkdir(parents=True, exist_ok=True)

    json_file_name = re.sub('[\': ]', '-', book_name)
    json_file_path = data_folder / f'{json_file_name}.json'
    with open(json_file_path, 'w') as book_file:
        json.dump(product_info, book_file)
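
Before turning the script into a service, you can run it in the foreground to check that it writes JSON files into ./data. Pressing Ctrl+C sends SIGINT, so you should see the graceful-shutdown message from the handler above (this test run assumes the virtual environment from the setup step is active):

python3 linux_scrape.py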

Then, create a systemd configuration file.

/etc/systemd/system/book-scraper.service:

[Unit]
Description=A script for scraping the book information
After=syslog.target network.target

[Service]
WorkingDirectory=/home/oxylabs/python-script-service/src/systemd
ExecStart=/home/oxylabs/python-script-service/venv/bin/python3 linux_scrape.py

Restart=always
RestartSec=120

[Install]
WantedBy=multi-user.target

Make sure to adjust the paths based on your actual script location.
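
Once the unit file is saved, reload systemd and start the service. These are the standard systemctl and journalctl invocations for a unit named book-scraper.service, matching the file name used above:

sudo systemctl daemon-reload
sudo systemctl enable --now book-scraper
sudo systemctl status book-scraper
journalctl -u book-scraper -f    # follow the service's log output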

A fully working example can be found here.

Create A Windows Service

To create a Windows service, you will need to implement methods such as SvcDoRun and SvcStop and handle events sent by the operating system.

windows_scrape.py:

import sys
import servicemanager
import win32event
import win32service
import win32serviceutil
import json
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup


class BookScraperService(win32serviceutil.ServiceFramework):
    _svc_name_ = 'BookScraperService'
    _svc_display_name_ = 'BookScraperService'
    _svc_description_ = 'Constantly updates the info about books'

    def __init__(self, args):
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.event = win32event.CreateEvent(None, 0, 0, None)

    def GetAcceptedControls(self):
        result = win32serviceutil.ServiceFramework.GetAcceptedControls(self)
        result |= win32service.SERVICE_ACCEPT_PRESHUTDOWN
        return result

    def SvcDoRun(self):
        urls = [
            'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
            'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
            'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
        ]

        index = 0

        while True:
            # Wait up to five seconds; SvcStop sets this event to request shutdown
            result = win32event.WaitForSingleObject(self.event, 5000)
            if result == win32event.WAIT_OBJECT_0:
                break

            url = urls[index % len(urls)]
            index += 1

            print('Scraping url', url)
            response = requests.get(url)

            soup = BeautifulSoup(response.content, 'html.parser')
            book_name = soup.select_one('.product_main').h1.text
            rows = soup.select('.table.table-striped tr')
            product_info = {row.th.text: row.td.text for row in rows}

            data_folder = Path('C:\\Users\\User\\Scraper\\dist\\scrape\\data')
            data_folder.mkdir(parents=True, exist_ok=True)

            json_file_name = re.sub('[\': ]', '-', book_name)
            json_file_path = data_folder / f'{json_file_name}.json'
            with open(json_file_path, 'w') as book_file:
                json.dump(product_info, book_file)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.event)


if __name__ == '__main__':
    if len(sys.argv) == 1:
        servicemanager.Initialize()
        servicemanager.PrepareToHostSingle(BookScraperService)
        servicemanager.StartServiceCtrlDispatcher()
    else:
        win32serviceutil.HandleCommandLine(BookScraperService)

Next, install dependencies and run a post-install script.

PS C:\> cd C:\Users\User\Scraper
PS C:\Users\User\Scraper> .\venv\Scripts\pip install pypiwin32
PS C:\Users\User\Scraper> .\venv\Scripts\python .\venv\Scripts\pywin32_postinstall.py -install
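
The scraper itself also depends on requests and beautifulsoup4, and the bundling step below uses pyinstaller. Assuming none of them are installed in the virtual environment yet:

PS C:\Users\User\Scraper> .\venv\Scripts\pip install requests beautifulsoup4 pyinstaller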

Bundle your script into an executable. The commands below assume the script was saved as scrape.py; adjust the file name if you kept it as windows_scrape.py.

PS C:\Users\User\Scraper> venv\Scripts\pyinstaller --hiddenimport win32timezone -F scrape.py

And finally, install your newly-created service.

PS C:\Users\User\Scraper> .\dist\scrape.exe install
Installing service BookScraper
Changing service configuration
Service updated

PS C:\Users\User\Scraper> .\dist\scrape.exe start
Starting service BookScraper
PS C:\Users\User\Scraper>
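
win32serviceutil.HandleCommandLine also understands the other standard service verbs, so the same executable can stop, restart, remove, or debug (run in the foreground) the service, for example:

PS C:\Users\User\Scraper> .\dist\scrape.exe stop
PS C:\Users\User\Scraper> .\dist\scrape.exe remove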

A fully working example can be found here.

Easier Windows Service Using NSSM

Instead of dealing with the Windows service layer directly, you can use the NSSM (Non-Sucking Service Manager).

Install NSSM by visiting the official website. Extract it to a folder of your choice and add the folder to your PATH environment variable for convenience.
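
For example, to make nssm.exe available in the current PowerShell session only (C:\Tools\nssm\win64 is a placeholder for wherever you extracted the archive):

PS C:\> $env:Path += ';C:\Tools\nssm\win64'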

Once you have NSSM installed, simplify your script by getting rid of all Windows-specific methods and definitions.

simple_scrape.py:

import json
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup

urls = [
    'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
    'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
    'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
]

index = 0

while True:
    url = urls[index % len(urls)]
    index += 1

    print('Scraping url', url)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    book_name = soup.select_one('.product_main').h1.text
    rows = soup.select('.table.table-striped tr')
    product_info = {row.th.text: row.td.text for row in rows}

    data_folder = Path('C:\\Users\\User\\Scraper\\data')
    data_folder.mkdir(parents=True, exist_ok=True)

    json_file_name = re.sub('[\': ]', '-', book_name)
    json_file_path = data_folder / f'{json_file_name}.json'
    with open(json_file_path, 'w') as book_file:
        json.dump(product_info, book_file)

Bundle your script into an executable.

PS C:\Users\User\Scraper> venv\Scripts\pyinstaller -F simple_scrape.py

And finally, install the script using NSSM.

PS C:\> nssm.exe install SimpleScrape C:\Users\User\Scraper\dist\simple_scrape.exe
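
NSSM can then start the service and report its state; start and status are standard nssm verbs:

PS C:\> nssm.exe start SimpleScrape
PS C:\> nssm.exe status SimpleScrape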

A fully working script can be found here.

