Princeton COS People Scraper

This is a web scraper that produces machine-processable JSON feeds of Princeton University's Department of Computer Science directory, sourced from the official, publicly available directory.

You can see the main JSON feed at https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/.

There are also sub-feeds by category of persons (faculty, grad students, staff, etc.). These feeds are all updated every week on Saturday. Read on to learn more.

Accessing the static feeds

You can access the main (regularly updated) JSON feed directly from this URL:

https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/

Sub-feeds are available for the different categories of people.

For example, in Python, you can use the requests package to fetch the main JSON feed:

import requests

# Fetch the main feed; the people records are stored under the "data" key.
r = requests.get("https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/")
if r.ok:
    data = r.json()["data"]

Feed format

The feed describes each person in the directory as a JSON dictionary. Most entries have the following fields:

    {
        "email": "<email address>",
        "office": "035 Corwin Hall",
        "degree": "Ph.D., Universit\u00e9 Pierre et Marie Curie, 2012",
        "title": "Lecturer",
        "name": "J\u00e9r\u00e9mie Lumbroso",
        "research-interests": "Probabilistic algorithms, data streaming, data structures, analysis of algorithms, analytic combinatorics.",
        "profile-url": "https://www.cs.princeton.edu/people/profile/lumbroso",
        "image-url": "https://www.cs.princeton.edu/sites/all/modules/custom/cs_people/generate_thumbnail.php?id=2488&thumb=",
        "image": "<base 64 encoded JPEG of the image>",
        "netid": "lumbroso",
        "first": "J\u00e9r\u00e9mie",
        "last": "Lumbroso",
        "type": "faculty"
    }
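Since the image field holds the photo as base64 text, it has to be decoded before it can be displayed or saved. The following is a minimal sketch, assuming the field contains plain base64 with no data-URI prefix; the helper name decode_image and the stand-in payload are illustrative and not part of the scraper.

```python
import base64


def decode_image(person):
    """Return the raw bytes stored in a record's "image" field, or None if absent."""
    encoded = person.get("image")
    if not encoded:
        return None
    # Assumption: the field is plain base64 text, without a "data:image/jpeg;base64," prefix.
    return base64.b64decode(encoded)


# Hypothetical record: a few stand-in bytes replace the real photo.
payload = b"\xff\xd8\xff\xe0 stand-in bytes"
sample = {"netid": "lumbroso", "image": base64.b64encode(payload).decode("ascii")}
jpeg_bytes = decode_image(sample)

# To save the photo to disk, named after the netid:
# with open(f"{sample['netid']}.jpg", "wb") as f:
#     f.write(jpeg_bytes)
```

Returning None for records without an image lets callers skip people who have no photo instead of failing on a missing key.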

Other categories of people may have other fields, such as leave, advisers, website, etc.
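Because the set of fields varies by category, code that consumes the feed should probably not assume every key is present. Here is a small sketch that groups records by their type field and reads optional fields with .get(). It assumes the data value is a list of dictionaries; the sample records, the "grad" type value, and the list shape of advisers are made up for illustration.

```python
from collections import defaultdict


def group_by_type(people):
    """Group feed records by their "type" field (e.g. "faculty")."""
    groups = defaultdict(list)
    for person in people:
        # Records without a "type" key are collected under "unknown".
        groups[person.get("type", "unknown")].append(person)
    return dict(groups)


# Hypothetical records; real ones would come from r.json()["data"].
sample = [
    {"netid": "lumbroso", "type": "faculty", "title": "Lecturer"},
    {"netid": "student1", "type": "grad", "advisers": ["lumbroso"]},
]

groups = group_by_type(sample)
for kind, members in groups.items():
    # .get() returns None for fields, such as "advisers", that a category lacks.
    print(kind, [(p["netid"], p.get("advisers")) for p in members])
```

Grouping on the main feed is an alternative to fetching each sub-feed separately when all categories are needed at once.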

Backstory

Previously, I had implemented JSON feeds to programmatically obtain the faculty of Princeton's School of Engineering and Applied Sciences, to build the web portal for the BSE 2024 First Year Advising program.

This time, I needed to access the directory information of the Department of Computer Science graduate students. Unfortunately, as with the SEAS faculty, no programmatically accessible data source also includes important information such as photos; the only source that does is the Department of Computer Science's official website.

Although I had had conversations with @sckarlin about not scraping the contents of the directory, scraping appeared to be the easiest way to obtain up-to-date grad student information.

The first application of this feed will be to configure and provision the user profiles of the CS grad student Slack.

License

This repository is licensed under The Unlicense. This means I have no liability, but you can do absolutely whatever you want with it.
