
yellow_pages_scraper's Introduction

Yellow Pages Restaurant Scraper

https://www.yellowpages.com/oakland-ca/restaurants

How to run

  • Download the files
  • Locate scraper_v2.py
  • To scrape all the data, change the range in the main loop to range(1, num_pages)
  • Run that file
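
As a rough illustration of what changing the range does, here is a minimal sketch of a page loop. It is not the code from scraper_v2.py: the `page_urls` helper and the `?page=N` query parameter are assumptions made for this example, and no request is actually sent.

```python
# Base listing URL taken from the top of this README.
BASE_URL = "https://www.yellowpages.com/oakland-ca/restaurants"


def page_urls(first_page, stop_page):
    """Build one URL per results page, mirroring range(first_page, stop_page).

    Hypothetical helper: the ?page=N parameter is an assumption about how
    the site paginates, not something confirmed by the scraper itself.
    """
    return [f"{BASE_URL}?page={n}" for n in range(first_page, stop_page)]


# A short sample run: range(1, 4) covers pages 1, 2 and 3.
sample = page_urls(1, 4)
for url in sample:
    print(url)
```

Keep in mind that Python's range excludes its stop value, so range(1, num_pages) visits pages 1 through num_pages - 1.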

Outputs

  • The script will output 2 files: pre_cleaned_output.csv and cleaned.csv
  • As the names suggest, pre_cleaned_output.csv is the scraped data before cleaning and cleaned.csv is the cleaned version
  • The files included here are only samples: I went through just the first 3 pages out of 100 total, so your output will be much longer if you scrape every page

Description of each file

scraper_v1.py

  • My first attempt at a scraper
  • Very unorganized, but I keep it in to show alternate ways of approaching something like this

scraper_v2.py

  • My working scraper
  • Prettier and more functional than scraper_v1 will probably ever be

test1.py

  • I plan to load the output into a relational database and query the data; this script will handle the database connection
  • Currently working with DBeaver
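
The plan above could look something like the following sketch. It is only an illustration under stated assumptions: SQLite (via Python's built-in sqlite3 module) stands in for whatever database ends up being used, and the `restaurants` table, the `name` and `phone` columns, and the `load_rows` helper are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for cleaned.csv; the real columns may differ.
SAMPLE_CSV = (
    "name,phone\n"
    "Example Diner,(510) 555-0100\n"
    "Sample Cafe,(510) 555-0101\n"
)


def load_rows(conn, csv_file):
    """Insert every CSV row into a (hypothetical) restaurants table."""
    conn.execute("CREATE TABLE IF NOT EXISTS restaurants (name TEXT, phone TEXT)")
    rows = [(r["name"], r["phone"]) for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT INTO restaurants (name, phone) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
inserted = load_rows(conn, io.StringIO(SAMPLE_CSV))

# Query the loaded data back out, sorted by name.
names = [row[0] for row in conn.execute("SELECT name FROM restaurants ORDER BY name")]
print(inserted, names)
```

To try it against a real file, the `io.StringIO(SAMPLE_CSV)` argument could be swapped for `open("cleaned.csv", newline="")`, with the column names adjusted to match.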

transform.ipynb

  • Used to work out exactly how I wanted to transform the data before turning it into an actual script
  • Makes it easy to view the transformation step by step

transform.py

  • Just the notebook in script form
  • Outputs the cleaned CSV (cleaned.csv)
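
The actual cleaning steps live in transform.ipynb and transform.py and are not described here, so the following is only a sketch of the general shape such a step might take. The `clean_rows` helper, the sample columns, and the two rules it applies (trim whitespace, drop exact duplicates) are assumptions chosen for illustration.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace in every field and drop exact duplicate rows.

    Hypothetical cleaning rules; the real transform may do something else.
    """
    seen = set()
    cleaned = []
    for row in rows:
        tidy = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(tidy.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


# Hypothetical stand-in for pre_cleaned_output.csv: the second row repeats
# the first once the stray spaces are removed.
raw = io.StringIO(
    "name,phone\n"
    " Example Diner ,(510) 555-0100\n"
    "Example Diner,(510) 555-0100\n"
)
result = clean_rows(csv.DictReader(raw))
print(result)
```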

yellow_pages_scraper's People

Contributors

nicoceresa
