Our program scrapes a real estate website (Immoweb) for data about houses and apartments in Belgium. Once the information is fetched, it is cleaned and stored in a CSV file.
We have collected data for houses and apartments for sale in Belgium on ImmoWeb.be.
How did we achieve that?
- We started by creating the class "Property", whose attributes and functions hold every data element of a property for sale.
- We wrote a loop to extract every required element (locality, rooms, etc.) from each property.
- We wrote a loop to extract all the property links from each search page.
- We came up with two ways to automate the scraping of the search result pages one after another, and picked the faster one: 18 min 30 sec for 12,000 links.
- After checking our code, we saved all the data in the .csv file that you can download.
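The link-extraction step can be illustrated with a minimal sketch. The project itself uses Selenium; this sketch swaps in Python's standard-library HTML parser so that it runs without a browser. The names `LinkCollector` and `extract_property_links` are hypothetical, and the `/classified/` marker used to recognise a property page is an assumption about the URL layout, not something taken from the project code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that look like property pages."""

    def __init__(self, marker="/classified/"):
        super().__init__()
        # Assumed URL fragment that identifies a property listing.
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each matching link once, in the order it appears on the page.
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


def extract_property_links(page_html, marker="/classified/"):
    """Return the de-duplicated property links found in one search page."""
    collector = LinkCollector(marker)
    collector.feed(page_html)
    return collector.links
```

With Selenium, the HTML string passed to `extract_property_links` could come from the driver's `page_source` attribute after a search page has loaded.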
- selenium
- webdriver_manager
- geckodriver for Firefox
- Clone the repo
- Install the libraries
- Install Selenium WebDriver
- Run main.py
Please follow the installation manual for Selenium WebDriver carefully.
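Before running main.py, it can help to confirm that the Python libraries are importable. The following is a small sketch, not part of the project; the function name `missing_packages` is hypothetical. It only checks the Python packages and does not verify that geckodriver or Firefox is installed.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["selenium", "webdriver_manager"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```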
First, our program collects all the links from the website using Selenium and stores them in links.txt.
After this, it divides the links into chunks of 500 URLs and scrapes the data for each property. Once all chunks have been scraped, it outputs the data to a CSV file.
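The chunking step could look roughly like the sketch below. The function names are hypothetical, and the sketch assumes that links.txt holds one URL per line, which the project does not state explicitly.

```python
def read_links(path="links.txt"):
    """Read non-empty lines from the links file (assumed: one URL per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def chunk_links(links, size=500):
    """Split the list of links into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [links[start:start + size] for start in range(0, len(links), size)]
```

For example, 1,200 links would be split into three chunks of 500, 500 and 200 links, each of which can then be scraped in turn.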
The data is written to Immoweb_Data_Scraper.csv.
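Writing the scraped records to CSV can be sketched with the standard `csv` module. This is an illustration only: the function name `write_rows` and the column names in `FIELDNAMES` are hypothetical, and only a subset of the columns is shown.

```python
import csv
import io

# Hypothetical column names covering a subset of the scraped fields.
FIELDNAMES = ["locality", "property_type", "price", "rooms", "area"]


def write_rows(rows, handle, fieldnames=FIELDNAMES):
    """Write a header line followed by one CSV line per property dictionary."""
    # restval="" leaves a cell empty when a property lacks that field.
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up sample record, used only to show the output format.
    write_rows([{"locality": "Gent", "price": 250000}], buffer)
    print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` as the `handle` argument.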
Our program collects the following fields for each property: locality, type of property (house or apartment), subtype (bungalow, chalet, mansion, ...), price, type of sale (excluding life sales), number of rooms, area, kitchen type, garden, terrace, and swimming pool availability, as well as some additional properties.
The dataset holds the following columns:
- Locality
- Type of property (House/apartment)
- Subtype of property (Bungalow, Chalet, Mansion, ...)
- Price
- Type of sale (Exclusion of life sales)
- Number of rooms
- Area
- Fully equipped kitchen (Yes/No)
- Furnished (Yes/No)
- Open fire (Yes/No)
- Terrace (Yes/No)
  - If yes: Area
- Garden (Yes/No)
  - If yes: Area
- Surface of the land
- Surface area of the plot of land
- Number of facades
- Swimming pool (Yes/No)
- State of the building (New, to be renovated, ...)
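The columns above map naturally onto a small data class. The sketch below is hypothetical and is not the class in property.py; the attribute names are assumptions chosen to mirror the column list, and every field defaults to `None` so that a value missing from a listing stays empty.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Property:
    """One scraped listing; attribute names are illustrative assumptions."""

    locality: Optional[str] = None
    property_type: Optional[str] = None   # house or apartment
    subtype: Optional[str] = None         # bungalow, chalet, mansion, ...
    price: Optional[int] = None
    sale_type: Optional[str] = None
    rooms: Optional[int] = None
    area: Optional[int] = None
    equipped_kitchen: Optional[bool] = None
    furnished: Optional[bool] = None
    open_fire: Optional[bool] = None
    terrace: Optional[bool] = None
    terrace_area: Optional[int] = None
    garden: Optional[bool] = None
    garden_area: Optional[int] = None
    land_surface: Optional[int] = None
    plot_surface: Optional[int] = None
    facades: Optional[int] = None
    swimming_pool: Optional[bool] = None
    building_state: Optional[str] = None  # new, to be renovated, ...

    def to_row(self):
        """Return the listing as a dictionary keyed by column name."""
        return asdict(self)
```

A dictionary produced by `to_row` has one key per column, which makes it straightforward to pass to a CSV writer.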
- main.py: The complete project
- property.py: Contains the class that stores all the data
- items_urls.py: Code to scrape the links of the different properties
- scrape_house.py: Code to scrape the data for each individual property
- links.txt: Contains all the links scraped by items_urls.py
- Immoweb_Data_Scraper.csv: Contains all the data, formatted as a CSV file with the columns described above
Frederik Frank:
- Properties Class
- Property data collection
Saïf Malkshahi:
- Search page result automation
- ReadMe file
- Alternate scraper
Lelo Tokwaulu:
- Properties links collection
- ReadMe file
The other version, made by Saïf, is available here
Thank you, :)