Project summary: a Scrapy project that scrapes data-analyst job listings from https://www.reed.co.uk and exports the results to JSON and CSV.
1. Installed Visual Studio Code.
   - Downloaded from https://code.visualstudio.com/download
2. Set up the VS Code path and environment.
   - Edited the system environment variables and PATH for the project.
3. Installed Python.
   - Downloaded the latest version of Python from https://www.python.org/downloads/
4. Opened the command prompt terminal and installed the Scrapy framework:
   - pip install scrapy
5. Created a folder named "WebScraping" on the disk.
6. Opened the folder through its path in the command prompt.
7. Created the project from the command line:
   - scrapy startproject project_name, e.g. scrapy startproject reed
   - cd reed, then cd reed again to get inside the project.
8. Generated a spider with the genspider command:
   - scrapy genspider reedjob https://www.reed.co.uk/jobs/data-analyst-jobs
9. Ran the reedjob spider with the crawl command:
   - scrapy crawl reedjob
10. Wrote code to extract the data from a single job card.
11. Wrote code to extract the data from all the cards on the main page.
12. Wrote code to extract the data from 100 pages of the website.
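Covering 100 pages can be done by generating the page URLs up front; the `pageno` query parameter name here is an assumption about how the site paginates, not confirmed from the source:

```python
BASE_URL = "https://www.reed.co.uk/jobs/data-analyst-jobs"

def page_urls(pages=100):
    # Build one URL per page; "?pageno=" is an assumed parameter name.
    return [f"{BASE_URL}?pageno={n}" for n in range(1, pages + 1)]

urls = page_urls()
print(len(urls), urls[0])
```

In a spider these URLs would become the `start_urls` list, or each could be yielded as a `scrapy.Request` from `parse`.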
13. In the project terminal, ran the crawl command with the -o flag to generate the JSON and CSV files for single and multiple pages respectively:
    - scrapy crawl spider_name -o file_name.extension, e.g. scrapy crawl reedjob -o reedjob.json or scrapy crawl reedjob -o reedjob.csv
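As an alternative to passing -o on every run, Scrapy (2.1+) can also write exports automatically via the FEEDS setting in the project's settings.py; a minimal sketch:

```python
# In settings.py: export every crawl to both JSON and CSV.
FEEDS = {
    "reedjob.json": {"format": "json"},
    "reedjob.csv": {"format": "csv"},
}
```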
14. Created a new repository named "WebScraping" in my GitHub account "sanjiv001".
15. Uploaded all the code, the JSON file, and this README to the repository.
16. Finally, submitted the project via email.