Web Scraping Repository

Introduction

Welcome to my Web Scraping Repository! This repository contains a collection of Jupyter Notebook (.ipynb) files, each designed to scrape different websites. The scripts are written in Python and utilize various web scraping libraries and tools. My goal is to provide a useful resource for those looking to learn about web scraping techniques or gather data from a range of online sources.

Repository Structure

Each .ipynb file in the repository is named according to the website it is designed to scrape. Here's a brief overview:

Elmenus_Scraping.ipynb: Script for scraping the Elmenus website.
IMDP.ipynb: Script for scraping the IMDP website.
Pokemon.ipynb: Script for scraping the Pokemon website.
Departments_Scraping.ipynb: Script for scraping departments from the york.ac.uk website.
Institute_Cleaning.ipynb: Script for cleaning institute data.

Features

Multiple Website Scrapers: Each notebook is tailored to extract specific data from a different website.
Error Handling: Scripts include basic error handling to manage common scraping issues.
Data Parsing and Cleaning: Methods to clean and format the scraped data.
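To give a rough idea of what basic error handling and cleaning can look like, here is a minimal sketch. It is not taken from the notebooks: the helper names `retry` and `clean_text`, the retry count, and the delay are all assumptions chosen for illustration.

```python
import re
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts run out (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # wait a little longer after each failure


def clean_text(raw):
    """Collapse runs of whitespace and trim both ends of a scraped string."""
    return re.sub(r"\s+", " ", raw).strip()


if __name__ == "__main__":
    # Example: tidy a value as it might come out of a page element
    print(clean_text("  Pizza \n\t Margherita  "))
```

A scraping call that sometimes fails (a slow page load, for example) could be wrapped as `retry(lambda: load_page(url))`, where `load_page` stands for whatever function does the actual fetching.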

Prerequisites

Python 3.8
Jupyter Notebook
Web scraping library: Selenium
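If you want to confirm your environment before opening a notebook, a quick check along these lines may help. This is only a sketch: it assumes Python 3.8 is a minimum rather than an exact requirement, and that the packages are importable as `selenium` and `notebook`. The function name `check_prerequisites` is made up for this example.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 8), packages=("selenium", "notebook")):
    """Report whether the interpreter and the listed packages are available."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```

Any entry reported as `False` points to something that still needs installing, for instance with `pip install selenium notebook`.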

Installation

To use these scripts, follow these steps:

Clone the Repository

```
git clone https://github.com/eyadshabrawy/web-scraping.git
```

Navigate to the Repository Folder
```
cd web-scraping
```

Usage

Open Jupyter Notebook
- Launch Jupyter Notebook in your environment.
- Navigate to the cloned repository directory.
Select a Notebook
- Open the .ipynb file corresponding to the website you are interested in scraping.
Run the Notebook
- Execute the cells in the notebook to perform web scraping.
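For orientation, a cell in one of these notebooks might look roughly like the sketch below. It is an illustration rather than code copied from the repository: it assumes Selenium 4 with a local Chrome installation, and the function name `scrape_text`, the URL, and the CSS selector are placeholders.

```python
def scrape_text(url, css_selector, headless=True):
    """Return the visible text of every element matching css_selector."""
    # Imported inside the function so that defining it does not require Selenium
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return [element.text for element in elements]
    finally:
        driver.quit()  # always release the browser, even after an error


# Hypothetical usage; the URL and selector are placeholders:
# headings = scrape_text("https://example.com", "h1")
```

Closing the driver in a `finally` block keeps stray browser processes from piling up when a cell is re-run after an error.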

Contributing

Contributions that improve the existing scripts or add new ones are welcome. To contribute:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Make your changes and commit them (git commit -am 'Add some feature').
Push to the branch (git push origin feature-branch).
Create a new Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

Web scraping can be against the Terms of Service of some websites. Always ensure that your scraping activities are legal and ethical.
