Welcome to my Web Scraping Repository! This repository contains a collection of Jupyter Notebook (.ipynb) files, each designed to scrape different websites. The scripts are written in Python and utilize various web scraping libraries and tools. My goal is to provide a useful resource for those looking to learn about web scraping techniques or gather data from a range of online sources.
Each .ipynb file in the repository is named according to the website it is designed to scrape. Here's a brief overview:
- Elmenus_Scraping.ipynb: Script for scraping the Elmenus website.
- IMDP.ipynb: Script for scraping the IMDP website.
- Pokemon.ipynb: Script for scraping the Pokemon website.
- Departments_Scraping.ipynb: Script for scraping departments from the york.ac.uk website.
- Institute_Cleaning.ipynb: Script for cleaning institutes.
- Multiple Website Scrapers: Each notebook is tailored to extract specific data from a different website.
- Error Handling: Scripts include basic error handling to manage common scraping issues.
- Data Parsing and Cleaning: Methods to clean and format the scraped data.
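As a rough illustration of the error handling and cleaning features listed above, here is a minimal, standard-library-only sketch. The helper names (`with_retries`, `clean_text`, `parse_price`) are hypothetical and are not taken from the notebooks; the notebooks may handle these cases differently.

```python
import re
import time


def with_retries(action, attempts=3, delay_seconds=1.0, retry_on=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: `action` is any zero-argument callable, for
    example a function that loads a page and reads one element.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


def clean_text(raw):
    """Collapse runs of whitespace and trim both ends of a scraped string."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_price(raw):
    """Return the first number in a string such as 'EGP 1,250.50', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))
```

In practice you would probably narrow `retry_on` to the specific exceptions your scraper raises, so that genuine bugs are not silently retried.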
- Python 3.8
- Jupyter Notebook
- Web scraping library: Selenium
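For readers new to Selenium, the sketch below shows the general shape of a scraping call with the requirements listed above. It is only an assumption about how a scraper might look, not the code used in the notebooks: the function names, the `https://example.com` URL, and the `h1` selector are placeholders, and `main()` assumes Chrome is installed locally.

```python
def collect_texts(driver, url, css_selector):
    """Load url and return the non-empty text of every matching element."""
    driver.get(url)
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    texts = [element.text.strip() for element in elements]
    return [text for text in texts if text]


def main():
    # Imported here so collect_texts can be reused without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        # Placeholder URL and selector, not taken from the notebooks.
        for text in collect_texts(driver, "https://example.com", "h1"):
            print(text)
    finally:
        driver.quit()  # always release the browser, even after an error
```

Call `main()` from a notebook cell to try it. Keeping the page logic in `collect_texts` means it can be exercised with any object that offers `get` and `find_elements`.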
To use these scripts, follow these steps:
- Clone the Repository
    git clone https://github.com/eyadshabrawy/web-scraping.git
- Navigate to the Repository Folder
    cd web-scraping
- Open Jupyter Notebook
  - Launch Jupyter Notebook in your environment.
  - Navigate to the cloned repository directory.
- Select a Notebook
  - Open the .ipynb file corresponding to the website you are interested in scraping.
- Run the Notebook
  - Execute the cells in the notebook to perform web scraping.
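If you would like a quick overview of the notebooks before opening them, the optional sketch below may help. It relies only on the fact that an .ipynb file is JSON with a top-level `cells` list; the function name `summarize_notebooks` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_notebooks(folder):
    """Return {notebook name: number of code cells} for each .ipynb in folder."""
    summary = {}
    for path in sorted(Path(folder).glob("*.ipynb")):
        with path.open(encoding="utf-8") as handle:
            notebook = json.load(handle)
        cells = notebook.get("cells", [])
        # Count only executable cells; markdown cells are skipped.
        summary[path.name] = sum(
            1 for cell in cells if cell.get("cell_type") == "code"
        )
    return summary
```

Running `summarize_notebooks(".")` from inside the cloned folder should list each notebook with its code-cell count.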
We welcome contributions to improve the existing scripts or add new ones. Please adhere to the following steps for contributing:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit them (git commit -am 'Add some feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Web scraping can be against the Terms of Service of some websites. Always ensure that your scraping activities are legal and ethical.
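One practical signal, though not a substitute for reading a site's Terms of Service, is its robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to check a URL against robots.txt rules; the `is_allowed` helper and the sample rules are hypothetical and purely illustrative.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only, not copied from any real site.
SAMPLE_RULES = "User-agent: *\nDisallow: /private/\n"
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download the file before calling `can_fetch()`.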