Code Monkey home page Code Monkey logo

tablinker's Introduction

TabLinker v0.2

Authors: Rinke Hoekstra, Laurens Rietveld, Albert Meroño-Peñuela

Copyright: VU University Amsterdam

License: LGPL v3.0

About TabLinker

TabLinker is experimental software for converting manually annotated Microsoft Excel workbooks to the RDF Data Cube vocabulary. It is used in the context of the Data2Semantics project to investigate the use of Linked Data for humanities research (Dutch census data produced by DANS).

TabLinker was designed for converting Excel or CSV files to RDF (triplification, RDF-izing) that have a complex layout and cannot be handled by fully automatic csv2rdf scripts.

A presentation about Linked Census Data, including TabLinker is available from SlideShare.

Using TabLinker

TabLinker takes annotated Excel files (found using the srcMask option in the config.ini file) and converts them to RDF. This RDF is serialized to the target folder specified using the targetFolder option in config.ini.

Annotations in the Excel file should be done using the built-in style functionality of Excel (you can specify these by hand). TabLinker currently recognises seven styles:

  • TL Title - The cell containing the title of a sheet
  • TL Data - A cell that contains data, e.g. a number for the population size
  • TL ColHeader - Used for the headers of columns
  • TL RowHeader - Used for row headers
  • TL HRowHeader - Hierarchical header, used for multi-column row headers with subsumption/taxonomic relations between the values of the columns
  • TL RowProperty - Typically used for the header cells directly above RowHeader or HierarchicalRowHeader cells. Cell values are the properties that relate Data cells to RowHeader and HierarchicalRowHeader cells.
  • TL RowLabel - Used for cells that contain a label for one of the HierarchicalRowHeader cells in the same row.

An eight style, TL Metadata, is currently ignored (See #3).

An example of such an annotated Excel file is provided in the input directory. There are ways to import the styles defined in that file into your own Excel files.

Tip: If your table contains totals for HierarchicalRowHeader cell values, use a non-TabLink style to mark the cells between the level to which the total belongs, and the cell that contains the name of the total. Have a look at the example annotated Excel file to see how this is done (up to row 428).

Once you're all set, start the TabLinker by cd-ing to the src folder, and running:

python tablinker.py

Configuration file

Editing the config.ini file can help you to get a more customized RDF output. Options are:

general

  • format is the desired RDF serialization format ('n3' by default).

paths

  • srcMask is a mask matching the input XLS files. It works as a directory specifier, as well as wildcard specifier to filter only desired files.
  • targetFolder is the target directory where you want the RDF files to be written to.

debug

  • verbose with value 1 (default) enables verbose mode (0 disables it).

dataCell

  • literalType is a URI specifying the datatype for the data cells in the spreadsheet (e.g. [http://www.w3.org/2001/XMLSchema#int])
  • propertyName is the name of data cell property. Leave blank to use default 'hasValue'.
  • labels are language literals to be added for that property name (if needed).

annotations

  • enabled with value 1 (default) generates an additional graph with the annotations (Excel comments) contained in the XLS files (if any) (0 disables it).
  • model specifies the data model ('oa' by default) for generating the annotations graph.

Requirements

TabLinker was developed under the following environment:

API Documentation

API documentation (generated using pydoc) is available in the doc folder.

tablinker's People

Contributors

albertmeronyo avatar rinkehoekstra avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.