Princeton SEAS Faculty Scraper

This is a web scraper that produces machine-processable JSON and CSV feeds of Princeton University's School of Engineering and Applied Science (SEAS) faculty, sourced from the official, publicly available faculty directory.

You can view the JSON feed here: https://jlumbroso.github.io/princeton-scraper-seas-faculty/feeds/

This feed is updated every week on Saturday. Read on to learn more.

Accessing the static feeds

You can access the (regularly updated) JSON feed directly from this URL:

https://jlumbroso.github.io/princeton-scraper-seas-faculty/feeds/

and the CSV feed is accessible from this URL:

https://jlumbroso.github.io/princeton-scraper-seas-faculty/feeds/index.csv

For example, in Python, you can use the requests package to fetch the JSON feed:

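import requests

# Fetch the feed; the records are stored under the top-level "data" key
r = requests.get("https://jlumbroso.github.io/princeton-scraper-seas-faculty/feeds/")
if r.ok:  # only parse the body when the request succeeded
    data = r.json()["data"]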
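
Once loaded, the records can be processed like any other Python data. As a minimal sketch, assuming `data` is a list of dictionaries shaped as in the "Feed format" section, the hypothetical helper `group_by_affiliation` (not part of this repository) indexes names by affiliation. The two records below are placeholders for illustration, not real directory entries:

```python
from collections import defaultdict


def group_by_affiliation(records):
    """Map each affiliation to the sorted names of the people listing it."""
    groups = defaultdict(list)
    for person in records:
        # "affiliations" is a list, so one person may appear under several keys
        for affiliation in person.get("affiliations", []):
            groups[affiliation].append(person["name"])
    return {aff: sorted(names) for aff, names in groups.items()}


# Placeholder records for illustration only (not real directory entries)
sample = [
    {"name": "Ada Example", "affiliations": ["Computer Science"]},
    {"name": "Bob Sample", "affiliations": ["Computer Science", "Example Department"]},
]
print(group_by_affiliation(sample))
```

With the real feed, you would pass `data` in place of `sample`.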

You can use the comma package to get the CSV feed:

import comma
data = comma.load(
    "https://jlumbroso.github.io/princeton-scraper-seas-faculty/feeds/index.csv",
    force_header=True)

Feed format

This feed provides each person in the directory as a JSON dictionary with the following fields:

    {
      "netid": "lumbroso",
      "email": "[email protected]",
      "name": "J\u00e9r\u00e9mie Lumbroso",
      "first": "J\u00e9r\u00e9mie",
      "last": "Lumbroso",
      "profile-url": "https://engineering.princeton.edu/faculty/j-r-mie-lumbroso",
      "image-url": "https://engineering.princeton.edu/sites/default/files/thumbnails/image/Lumbroso_450x600_0.jpg",
      "website": "https://www.cs.princeton.edu/people/profile/lumbroso",
      "office": "035 Corwin Hall",
      "phone": "609-258-5379",
      "research": "Expertise: Probabilistic algorithms, data streaming, data structures, analysis of algorithms, analytic combinatorics.",
      "rank": "Lecturer",
      "affiliations": [
        "Computer Science"
      ]
    }
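
For code that consumes these records, it can help to give them a typed shape. The following is only a sketch: the `FacultyRecord` class and its `from_dict` helper are hypothetical names, not part of this repository, and the sketch assumes that any field may be missing for some people, so every field has a default.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class FacultyRecord:
    """Hypothetical typed view of one record from the JSON feed."""
    netid: Optional[str] = None
    email: Optional[str] = None
    name: Optional[str] = None
    first: Optional[str] = None
    last: Optional[str] = None
    profile_url: Optional[str] = None
    image_url: Optional[str] = None
    website: Optional[str] = None
    office: Optional[str] = None
    phone: Optional[str] = None
    research: Optional[str] = None
    rank: Optional[str] = None
    affiliations: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        # JSON keys such as "profile-url" contain hyphens, which are not
        # valid in Python attribute names, so map them to underscores.
        known = {f.name for f in fields(cls)}
        cleaned = {key.replace("-", "_"): value for key, value in raw.items()}
        # Ignore any keys this sketch does not know about
        return cls(**{k: v for k, v in cleaned.items() if k in known})


# A subset of the sample record shown above
record = FacultyRecord.from_dict({
    "netid": "lumbroso",
    "name": "J\u00e9r\u00e9mie Lumbroso",
    "profile-url": "https://engineering.princeton.edu/faculty/j-r-mie-lumbroso",
    "rank": "Lecturer",
    "affiliations": ["Computer Science"],
})
print(record.name, record.rank, record.affiliations)
```

Fields absent from the input, such as `email` here, simply keep their default value of `None`.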

The CSV file follows this format (note that it does not contain the "research" field from the JSON format):

netid,email,name,first,last,profile-url,image,website,office,phone,rank,affiliations
lumbroso,[email protected],Jérémie Lumbroso,Jérémie,Lumbroso,https://engineering.princeton.edu/faculty/j-r-mie-lumbroso,https://engineering.princeton.edu/sites/default/files/thumbnails/image/Lumbroso_450x600.jpg,https://www.cs.princeton.edu/people/profile/lumbroso,035 Corwin Hall,609-258-5379,Lecturer,Computer Science
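
If you would rather not install an extra package, this format can also be read with Python's standard `csv` module. This is a minimal sketch under a few assumptions: the text is embedded as a string so that the example runs offline, the email cell is left empty because the address is obscured in the sample row, and the affiliations cell is kept as a plain string since the encoding of multiple affiliations in one cell is not described here.

```python
import csv
import io

CSV_TEXT = (
    "netid,email,name,first,last,profile-url,image,website,office,phone,rank,affiliations\n"
    "lumbroso,,Jérémie Lumbroso,Jérémie,Lumbroso,"
    "https://engineering.princeton.edu/faculty/j-r-mie-lumbroso,"
    "https://engineering.princeton.edu/sites/default/files/thumbnails/image/Lumbroso_450x600.jpg,"
    "https://www.cs.princeton.edu/people/profile/lumbroso,"
    "035 Corwin Hall,609-258-5379,Lecturer,Computer Science\n"
)

# DictReader uses the first row as the header and yields one dict per person
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for row in rows:
    print(row["netid"], row["rank"], row["affiliations"])
```

To read the live feed instead, you could download the CSV text first (for example with requests) and pass it to `io.StringIO` in the same way.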

Acknowledgement & Backstory

This project was put together in August 2020, in preparation for the BSE freshpeople advising period, co-organized with Peter Bogucki and Traci Miller.

I had often felt the need for a programmatically accessible version of the faculty directory, and had made several (failed) attempts to address this need decisively. Often my approach was too ambitious: for instance, I tried to aggregate the information of all faculty on campus, even though this information is housed across many different websites, which follow different formats and break at different intervals.

This time around, I decided to write a more focused project to robustly address this need at least for SEAS faculty. This allowed me to integrate the faculty photos easily within a project meant to make navigation for incoming students easier.

License

This repository is licensed under The Unlicense. This means I have no liability, but you can do absolutely anything you want with it.
