
imslp-scrape's Introduction

Scraping the International Music Score Library Project

This repository is a collection of tools used to scrape imslp.org. As of May 18, 2012, IMSLP has:

  • 54,374 works
  • 193,537 scores
  • 16,140 recordings
  • 7,202 composers
  • 193 performers

Scripts

get_people_result_page_urls.sh: Generates a list of links to every results page under http://imslp.org/wiki/Category:People. There are 200 results per page, and 9,162 results in total.
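As a quick check of the figures above, 9,162 results at 200 per page comes to 46 result pages. The sketch below does only that arithmetic; the function and constant names are illustrative and are not taken from the script itself.

```python
import math

RESULTS_PER_PAGE = 200  # figure quoted in the description above
TOTAL_RESULTS = 9162    # figure quoted in the description above


def result_page_count(total, per_page):
    """Return how many result pages are needed to list `total` entries."""
    return math.ceil(total / per_page)


print(result_page_count(TOTAL_RESULTS, RESULTS_PER_PAGE))
```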

get_people_page_urls.py: Generates a list of links to every person page (e.g. http://imslp.org/wiki/Category:Aagesen,_Truid).

mk_people.sh: Creates a directory for each person page and downloads its HTML into that directory.
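A minimal Python sketch of the directory-per-person step follows. It is not the shell script itself. It assumes the directory is named after the last URL segment with the `Category:` prefix removed, which is a guess at the convention, and it leaves the download out so that it runs offline. `person_dir_name` and `make_person_dir` are hypothetical helper names.

```python
import os
import tempfile
from urllib.parse import unquote, urlparse


def person_dir_name(url):
    """Derive a directory name from a person page URL (assumed convention)."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    name = unquote(last_segment)
    prefix = "Category:"
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name


def make_person_dir(base_dir, url):
    """Create (if needed) and return the directory for one person page."""
    path = os.path.join(base_dir, person_dir_name(url))
    os.makedirs(path, exist_ok=True)
    return path


# Use a throwaway base directory so the example does not touch real data.
base = tempfile.mkdtemp()
print(make_person_dir(base, "http://imslp.org/wiki/Category:Aagesen,_Truid"))
```

The page HTML would then be fetched and written into the returned directory.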

get_score_page_urls.py: Creates score_id_to_url_map.txt, a text file mapping score IDs to URLs, and downloads the HTML source for each given score page into the people directory.
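The layout of score_id_to_url_map.txt is not documented here. The sketch below assumes one tab-separated `score_id<TAB>url` pair per line, purely for illustration. The IDs, the example.org URLs, and the helper names are all hypothetical.

```python
def format_map(score_urls):
    """Serialize a {score_id: url} dict, one 'id<TAB>url' pair per line (assumed layout)."""
    return "".join(f"{sid}\t{url}\n" for sid, url in sorted(score_urls.items()))


def parse_map(text):
    """Parse text produced by format_map back into a dict."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sid, url = line.split("\t", 1)
        mapping[sid] = url
    return mapping


# Hypothetical IDs and placeholder URLs, not real IMSLP data.
example = {
    "00002": "http://example.org/score/00002",
    "00001": "http://example.org/score/00001",
}
text = format_map(example)
print(text, end="")
```

A plain line-oriented file like this is easy to inspect and to process with standard shell tools.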

get_pdf.py: Downloads all of the PDFs for a given person/composer into the corresponding directory.

mk_meta.py: Generates metadata and moves files to the upload/staging directory.

imslp-scrape's People

Contributors

jjjake

imslp-scrape's Issues

Using this with AWS Lambda and Netlify.

Hey there!
I'm currently developing a web/app-based search tool for IMSLP that can also bypass commercial recordings by using Spotify.
I have no idea how Python works, but I know that it works well, and your scraper seems to be exactly what I'm looking for to easily access IMSLP content.
How would one go about integrating it as a Lambda Function with Netlify?

Thanks in advance,
Josef Leventon

P.S.: I'm already using IMSLP's API to get a composer and work list as JSON data, and I have integrated that as a Lambda function (as seen here).

lwp-request5 not allowed

When I run get_people_result_page_urls.sh, I get the following message:

lwp-request5: LWP-REQUEST5 is not an allowed method
