This repository contains the materials for the Data Scraping course, Summer 2017. The course is taught by Hrant Davtyan as an elective for first-year MS Economics students at the American University of Armenia (AUA).
During the course, students will learn and use several tools needed to complete a data scraping project, as listed below:
- Sublime Text Editor - A simple yet powerful text editor with a user-friendly interface and fast performance. During the course, Sublime will be used for creating and editing HTML, CSS, XML and JSON documents.
- Selector Gadget - A Google Chrome extension that helps discover CSS selectors for the elements on a webpage.
- JSON formatter - A Google Chrome extension that indents and highlights JSON when it is viewed directly inside the browser.
- Regex search - Another Chrome extension, which makes it possible to search a webpage using regular expressions directly inside the browser.
- Online regex tester - An online tool for testing a regular expression on sample text typed by the user. It also provides a quick reference sheet and an interactive explanation of the expression being tested.
- Anaconda Python 2.7 - A Python-powered open data science platform that comes together with Jupyter notebooks, the Spyder IDE and some of the most popular Python libraries preinstalled. Several Python packages will be used during the course. The list of packages (including those preinstalled by Anaconda) is available below.
- requests
- re
- lxml
- html5lib
- beautifulsoup
- scrapy
- selenium
- numpy
- pandas
- sklearn
- statsmodels
- pandas_datareader
- python-linkedin
- markovbot
- googlemaps
- pafy
- quandl
The packages json, csv, time and urllib2 are also required, but they need no separate installation because they are part of the Python 2.7 standard library.
To install the packages listed above, download requirements.txt to your local directory and run the following command in the command prompt:
pip install -r requirements.txt
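Once the installation finishes, one possible way to confirm that the modules can be imported is a short check like the sketch below. It is only an illustration and not part of the course materials: the helper name missing_modules is hypothetical, and the module names passed to it are examples taken from the lists above. Note that some packages are imported under a name that differs from the name used with pip, so adjust the list as needed.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not available on this Python version).
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example import names only; extend with the other packages as needed.
    to_check = ["json", "csv", "time", "requests", "lxml"]
    not_found = missing_modules(to_check)
    if not_found:
        print("Missing modules: %s" % ", ".join(not_found))
    else:
        print("All checked modules are importable.")
```

Any module reported as missing can then be installed individually with pip.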