ce-data-import's Introduction

TROMPA CE Data importer

This is a tool to import metadata to the Trompa Contributor Environment

It imports detailed data from the following sources:

  • CPDL
  • IMSLP
  • MusicBrainz
  • Wikidata
  • Muziekweb

It also imports basic identifiers from the following sources:

  • VIAF
  • Worldcat
  • ISNI
  • Library of Congress

Installation

pip install -r requirements.txt

Running the application

To configure the importer to point to a specific Contributor Environment, set the TROMPACE_CLIENT_CONFIG environment variable to the path of a configuration file for trompace-client:

export TROMPACE_CLIENT_CONFIG=trompace.ini
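Before starting a long import it can help to check that the variable is set and points to a readable file. The following is a minimal sketch of such a check, not part of ceimport itself; the function name resolve_client_config is hypothetical, and only the TROMPACE_CLIENT_CONFIG variable name comes from this README.

```python
import os
from pathlib import Path


def resolve_client_config(env=os.environ):
    """Return the path named by TROMPACE_CLIENT_CONFIG, or raise if unusable.

    Hypothetical helper: it only validates the environment variable
    described above and does not parse the configuration file.
    """
    value = env.get("TROMPACE_CLIENT_CONFIG")
    if not value:
        raise RuntimeError("TROMPACE_CLIENT_CONFIG is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {path}")
    return path


if __name__ == "__main__":
    print(f"Using trompace-client configuration: {resolve_client_config()}")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.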

The main entrypoint is ceimport.cli. Use

python -m ceimport.cli

to get a list of imports that can be performed.

To get detailed documentation about a particular importer, use the --help flag:

$ python -m ceimport.cli cpdl-import-work --help

Usage: python -m ceimport.cli cpdl-import-work [OPTIONS]

  Import the given work (--url x) or file of works (--file f). Works need to
  be wiki titles (no http://.... and no _ to split words.

Options:
  --file TEXT
  --url TEXT
  --help       Show this message and exit.
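Since the importer expects plain wiki titles, a copied page URL has to be converted first. A minimal sketch of that conversion follows. It assumes a MediaWiki-style URL in which the title follows index.php/ (or /wiki/); the helper name to_wiki_title and the example URL are illustrative and not part of ceimport.

```python
from urllib.parse import unquote, urlparse


def to_wiki_title(value: str) -> str:
    """Turn a wiki page URL or underscore-separated name into a plain title."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        path = urlparse(value).path
        # Assumed URL layout: the page title follows "index.php/" or "/wiki/".
        for marker in ("/index.php/", "/wiki/"):
            if marker in path:
                path = path.split(marker, 1)[1]
                break
        # Decode percent-escapes and drop any leading slash left over.
        value = unquote(path).lstrip("/")
    # Wiki titles use spaces, not underscores, to separate words.
    return value.replace("_", " ")


if __name__ == "__main__":
    print(to_wiki_title("https://www.cpdl.org/wiki/index.php/Ave_Maria_(Josquin_des_Prez)"))
```

The resulting string can then be passed to --url, or written one title per line to the file given with --file.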

Muziekweb

To import data from Muziekweb into the Trompa CE, start the import-mw.py script with the necessary parameters. For example, to import a single track use:

python import-mw.py -t JK136417-0003

The importer uses the Trompa CE client settings to connect and identify the responsible account for the import. When using custom settings, you can place a copy of the import.ini from the CE client in the root of this repository and modify the settings in the file.

When importing audio fragments from an album release, the importer acquires data from the Muziekweb API (https://www.muziekweb.nl/Muziekweb/Webservice/WebserviceAPI.php). An account is required to use this API. You can register a free account for the API and enter the account details in a .env file; the .env.example file shows the expected format. You can also supply the account details with the run parameters -mwu and -mwp.
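For reference, a .env file is a list of KEY=VALUE lines. The sketch below reads such a file with the standard library only; it is an illustration, not the importer's own loading code. The key names MW_USER and MW_PASS in the example are hypothetical, and the real names are the ones listed in .env.example.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dictionary."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    settings = read_env_file(".env")
    # Hypothetical key names; check .env.example for the real ones.
    print("user configured:", "MW_USER" in settings)
```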

License

Copyright 2020 Music Technology Group, Universitat Pompeu Fabra
Copyright 2020 Muziekweb

Licensed under the Apache License, Version 2.0. See LICENSE for more information.

ce-data-import's People

Contributors

alastair, lporcaro, pc2752, juansgomez87, caspercdr

ce-data-import's Issues

Join items from different data sources

We need to make sure that we have sameAs relationships (or similar) between records that describe the same item in different sources, e.g. MusicBrainz/Wikipedia/IMSLP.

Where relations exist, we should import them, e.g. if MusicBrainz links to Wikidata and IMSLP, add these links.
If there are no relations, we need to do the matching ourselves. I'm not sure of the best way to do this. For now we could use basic string matching / close matches. If our dataset is small enough, this could be done manually. If we create relations that don't previously exist, we should add them to MusicBrainz.
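One possible form of the basic string matching mentioned above is sketched here with Python's difflib. This is an illustration under stated assumptions, not an agreed approach: the function names, the token-sorting normalization, and the 0.85 threshold are all hypothetical choices, and any match found this way would still need manual review.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest name, or (None, score) below threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


if __name__ == "__main__":
    names_from_other_source = ["Wolfgang Amadeus Mozart", "Johann Sebastian Bach"]
    print(best_match("Bach, Johann Sebastian", names_from_other_source))
```

Sorting the tokens lets "Bach, Johann Sebastian" and "Johann Sebastian Bach" compare as identical, which covers the common case where sources differ only in name order.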

Use PROV-O relationships to indicate the software that loaded data

Related to #8

The CE Data import guidelines, section 4.6 (Provenance properties), describe how the provenance of nodes should be tracked, including stating that a node was generated by something on behalf of a user.
It's unclear whether we should create these nodes ourselves or whether the CE should create them.
Especially if we run the importer as an algorithm, it seems like a good idea to create these nodes, but we should only do this if we have a clear use case for it, to avoid doing work that will go unused.

Add more data to IMSLP composer importer

Some additional data that is sometimes present on IMSLP composer pages that we should import:

  • Nationality
  • Wikipedia link
  • Link to other authorities (sometimes these link directly to the authority, other times they link to wikidata and get the links from there)
  • "Labels" (Spellings of the name in different languages)
  • Partial or approximate dates
  • Biography link
  • Picture + Picture caption
  • Link to CPDL

Run importer as algorithm

It'd be nice to be able to poke the importer to say "import item x" and have it go to all data sources and load as much information as possible.

For example, a score on IMSLP would import the score + artist + links to MusicBrainz/VIAF/Wikidata.
