Decision Scraper

This repository contains a Python-based web scraping tool designed to extract decision links and their headlines from the IT-Planungsrat website. The scraped data is stored in a JSON file and displayed on a web page with filtering capabilities. This project is useful for gathering and analyzing decision information efficiently.

IT PLR Scraper Online

Key Features

  • Scrapes decision links and headlines from specified URLs.
  • Supports additional suffixes like -al and -al-runde.
  • Stores the extracted data in a structured JSON file.
  • Provides a web interface to view and filter the data.
  • Handles consecutive misses to optimize the scraping process.
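
The suffix and consecutive-miss features can be illustrated with a minimal sketch. It is not taken from scraper.py: the function name `probe_decisions`, the `fetch` callback, the slug pattern `YEAR-NN` and the default of three misses are all assumptions made for illustration. The idea is to try each decision number with every suffix and to stop once several numbers in a row yield nothing.

```python
from typing import Callable, Optional

# Suffixes named in the feature list; "" stands for the plain form.
SUFFIXES = ("", "-al", "-al-runde")


def probe_decisions(
    fetch: Callable[[str], Optional[str]],
    year: int,
    max_misses: int = 3,
) -> list[dict[str, str]]:
    """Probe numbered slugs until `max_misses` numbers in a row yield nothing.

    `fetch` is a hypothetical callback that returns a headline for a slug,
    or None when the page does not exist.
    """
    found = []
    misses = 0
    number = 1
    while misses < max_misses:
        hit = False
        for suffix in SUFFIXES:
            slug = f"{year}-{number:02d}{suffix}"  # hypothetical slug pattern
            headline = fetch(slug)
            if headline is not None:
                found.append({"slug": slug, "headline": headline})
                hit = True
        # A number counts as a miss only if no suffix variant exists.
        misses = 0 if hit else misses + 1
        number += 1
    return found


# Usage with an in-memory stand-in for the website.
fake_site = {"2024-01": "A", "2024-02-al": "B", "2024-05": "C"}
results = probe_decisions(fake_site.get, 2024)
```

Resetting the counter on every hit lets the probe skip over small gaps in the numbering while still ending soon after the last published decision.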

Technologies Used

  • Python
  • BeautifulSoup (for web scraping)
  • JSON (for data storage)
  • HTML, CSS, JavaScript (for web interface)
  • Python's HTTP server (for local development)
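
As a rough illustration of the JSON storage step, the sketch below writes a list of decisions to a file and reads it back. The helper names, the record keys `link` and `headline`, and the file name are assumptions, not the project's actual schema. Passing `ensure_ascii=False` keeps German umlauts readable in the file.

```python
import json
from pathlib import Path


def save_decisions(decisions: list[dict], path: Path) -> None:
    """Write decisions as pretty-printed UTF-8 JSON."""
    path.write_text(
        json.dumps(decisions, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


def load_decisions(path: Path) -> list[dict]:
    """Read the decisions back from the JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A call such as `save_decisions(records, Path("decisions.json"))` would produce a file the web interface could load; the file name here is only a placeholder.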

How to Use

  1. Clone the repository:
git clone https://github.com/samsour/it-plr-decision-scraper.git
  2. Navigate into the project directory:
cd it-plr-decision-scraper
  3. Install the required Python packages:
pip install requests beautifulsoup4
  4. Run the scraper:
python scraper.py
  5. Install the Node dependencies and start the local server:
npm i
npm run start
  6. Open your web browser and navigate to:
http://localhost:9000

Example HTML Content of a Page

<article id="c211" class="ce-module v-main fitkodecisions-details" data-js-module="fitkodecisions-details" data-mk="1">
    <h1>Nutzung eines Online-Dienstes durch die IHK FOSA</h1>
    <h4>AL-Runde | 30.04.2024 | 31. Sitzung AL-Runde | Beschluss 2024/08-AL</h4>
    <div class="rte-container">
        <p class="AufzhlungBulletpoints1FITKO"><strong>Beschluss:</strong></p>
        <p class="AufzhlungBulletpoints1FITKO">Die AL-Runde beschließt, die Kosten für die Nutzung des Online-Dienstes „Anerkennung ausländischer Berufsqualifikationen“, soweit sie durch die Nutzung seitens der IHK FOSA entstehen, zu 100% aus dem Stammbudget der FITKO ausnahmsweise in diesem expliziten Einzelfall zu finanzieren. Der Online-Dienst wird der IHK FOSA durch die FITKO in Form eines (unentgeltlichen) Nachnutzungsvertrages über den FIT-Store zur Verfügung gestellt.</p>
    </div>
    <span class="fitkodecisions-details__back">
        <a class="fitkodecisions-details__back-history" href="#" aria-hidden="false">
            <div class="shortcut-links-element__icon-container">
                <i class="svg-itpl_icon_arrow_20px_blue"></i>
                <i class="svg-itpl_icon_arrow_20px"></i>
            </div>
            <span>Zurück zur Übersicht</span>
        </a>
        <a aria-hidden="true" class="fitkodecisions-details__back-overview" title="Öffnet die Übersichtseite der Beschlüsse" href="/beschluesse-informationen">
            <div class="shortcut-links-element__icon-container">
                <i class="svg-itpl_icon_arrow_20px_blue"></i>
                <i class="svg-itpl_icon_arrow_20px"></i>
            </div>
            <span>Zur Übersicht</span>
        </a>
    </span>
</article>
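
A page like the one above could be reduced to a headline and its metadata roughly as follows. The project itself lists BeautifulSoup for this job; this sketch deliberately swaps in the standard library's `html.parser` so it runs without extra packages. The names `DecisionParser` and `parse_decision` and the key names for the `<h4>` parts are assumptions, and the field order is inferred from this single example.

```python
from html.parser import HTMLParser


class DecisionParser(HTMLParser):
    """Collect the text of the first <h1> and the first <h4> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self._current = None  # tag whose text is currently being collected
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h4") and tag not in self.fields:
            self._current = tag
            self.fields[tag] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


def parse_decision(html: str) -> dict:
    """Return the headline plus the pipe-separated metadata from the <h4>."""
    parser = DecisionParser()
    parser.feed(html)
    parser.close()
    headline = parser.fields.get("h1", "").strip()
    meta = [part.strip() for part in parser.fields.get("h4", "").split("|")]
    # Assumed order, based on the example: body, date, session, decision number.
    keys = ("body", "date", "session", "decision")
    return {"headline": headline, **dict(zip(keys, meta))}


sample = (
    '<article class="fitkodecisions-details">'
    "<h1>Nutzung eines Online-Dienstes durch die IHK FOSA</h1>"
    "<h4>AL-Runde | 30.04.2024 | 31. Sitzung AL-Runde | Beschluss 2024/08-AL</h4>"
    "</article>"
)
decision = parse_decision(sample)
```

With BeautifulSoup the same extraction would likely be shorter, for example by selecting the `article.fitkodecisions-details` element and reading its `h1` and `h4` children.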

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
