data's Introduction

ICONCLASS

Please use the following to cite:

H. van de Waal, Iconclass, an iconographic classification system. Completed and edited by L.D. Couprie, E. Tholen & G. Vellekoop. (Amsterdam, 1972-1985). online edition by E. Posthumus & J.P.J. Brandhorst, 2024. https://iconclass.org/

A multilingual subject classification system for cultural content. For more information see: http://www.iconclass.org/

Made by Hans Brandhorst & Etienne Posthumus

...with lots of support from a global cast of many, many people since 1972.

Data file

This repository contains the main data files for the ICONCLASS system. It is a collection of simple structured text files, dating back in concept to the late nineties of the previous century.

Structure

The structure is determined by the file notations.txt.

For example, the file looks like:

N 1
C 10
; 11
; 12
; 13
; 14
$
N 10
$

Each chunk of data is separated from the next by a single $ character on its own line. The first part of a line, up to the first space, is the field name. If a field has more than one value, each additional value is listed on its own line, starting with a ; character followed by a space and the field value. The above snippet is roughly equivalent to the following JSON value:

[
    {"N": "1",
     "C": ["10", "11", "12", "13", "14"]},
    {"N": "10"}
]
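
As an illustration of the format described above, here is a minimal parsing sketch in Python. It is not the parser used by the project; the function name parse_chunks is made up for this example, and it assumes every field is stored as a list of values (so single-valued fields such as N become one-element lists).

```python
def parse_chunks(text):
    """Parse '$'-separated chunks into a list of dicts: field name -> list of values."""
    records = []
    current = {}
    last_field = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line == "$":
            # A lone '$' closes the current chunk.
            if current:
                records.append(current)
            current, last_field = {}, None
            continue
        # Everything up to the first space is the field name.
        field, _, value = line.partition(" ")
        if field == ";":
            # Continuation line: one more value for the previous field.
            if last_field is None:
                raise ValueError("continuation line without a preceding field")
            current[last_field].append(value)
        else:
            current.setdefault(field, []).append(value)
            last_field = field
    if current:
        # Tolerate a final chunk that is not closed by '$'.
        records.append(current)
    return records


sample = """N 1
C 10
; 11
; 12
; 13
; 14
$
N 10
$
"""

records = parse_chunks(sample)
print(records)
# [{'N': ['1'], 'C': ['10', '11', '12', '13', '14']}, {'N': ['10']}]
```

Keeping every field as a list avoids special-casing fields that happen to have one value in one chunk and several in another.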

Why not use a standard Knowledge Management System?

You might wonder why we can not simply use a standard system to manage vocabularies or classification systems. If IC has a SKOS version, surely we can just use a SKOS editor?

Alas, no. The "base" ICONCLASS system has around 40K nodes arranged in a tree. But then there are several "sub-trees" that are switched on and off at various parts of the base tree. These so-called "keys" in the IC cause an explosion to more than 1 million nodes in the system, which would make it very tricky to maintain in a traditional system.

[Figure: Keys to 25F]

A further complication is the use of WITH-NAMES placeholders in the tree, also known as bracketed text. These notations look like 11H(...), where the ... can be filled in with any valid entry that makes sense to the user of that particular node in the tree. In the example, 11H(...) are male saints, so that could be 11H(JOHN), but this could be in any language or variant. In the printed volumes for IC, several entries were already filled in as a convenience, and over the years some items have been added to the "official" list.

This also causes a problem when we create static dumps of the IC system, for example in RDF, as it creates very large files.
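
To make the bracketed-text and key notations concrete, the following Python sketch splits a notation into its parts and maps a filled-in name back to its (...) placeholder. The function names and regular expressions are assumptions made for this example, not part of the ICONCLASS tooling, and they assume a key always has the form (+digits) at the end of the notation.

```python
import re

# A key is a trailing "(+digits)" group, e.g. "(+11)".
KEY_RE = re.compile(r"\(\+(\d+)\)$")
# Bracketed text is a parenthesised group that does not start with "+".
NAME_RE = re.compile(r"\((?!\+)([^)]*)\)")


def split_notation(notation):
    """Return (notation without key, bracketed text or None, key digits or None)."""
    key = None
    match = KEY_RE.search(notation)
    if match:
        key = match.group(1)
        notation = notation[:match.start()]
    name = None
    match = NAME_RE.search(notation)
    if match:
        name = match.group(1)
    return notation, name, key


def generalize(notation):
    """Replace the first bracketed text with the '...' placeholder."""
    return NAME_RE.sub("(...)", notation, count=1)


print(split_notation("11H(JOHN)(+11)"))  # ('11H(JOHN)', 'JOHN', '11')
print(generalize("11H(JOHN)(+11)"))      # 11H(...)(+11)
print(generalize("95A(SISYPHUS)681"))    # 95A(...)681
```

Because the text between the brackets is free-form, a lookup for an unknown name such as 11H(FOO) can only be resolved through its generalized form 11H(...).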

data's Issues

Add line number from source data file to import script and database

Add a facility to the make_import.py script to record the line number in the notations.txt file being used, and record that in the resulting iconclass.sqlite file.
The database schema should also be updated to accommodate this, plus the commit hash of the repo being used.

These line numbers and the data repo commit hash can then be used to create links in a UI to display an exact reference to the end-user to invite them to view the sources.
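
One possible shape for this, sketched in Python with the standard sqlite3 module. The table names source_lines and source_meta, the function names, and the placeholder commit hash are all hypothetical; the actual schema of iconclass.sqlite and the internals of make_import.py are not shown here.

```python
import sqlite3


def notation_line_numbers(lines):
    """Yield (notation, line_number) for every 'N' line; line numbers are 1-based."""
    for line_no, line in enumerate(lines, start=1):
        field, _, value = line.rstrip("\n").partition(" ")
        if field == "N":
            yield value, line_no


def store(conn, rows, commit_hash):
    """Record the source line of each notation and the commit hash of the data repo."""
    # Hypothetical tables, created only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_lines "
        "(notation TEXT PRIMARY KEY, line_no INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_meta (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO source_lines VALUES (?, ?)", rows)
    conn.execute(
        "INSERT OR REPLACE INTO source_meta VALUES ('commit', ?)", (commit_hash,)
    )
    conn.commit()


# In practice the lines would come from open("notations.txt").
sample = ["N 1\n", "C 10\n", "; 11\n", "$\n", "N 10\n", "$\n"]

conn = sqlite3.connect(":memory:")
store(conn, notation_line_numbers(sample), commit_hash="0000000")  # placeholder hash
rows = conn.execute(
    "SELECT notation, line_no FROM source_lines ORDER BY line_no"
).fetchall()
print(rows)  # [('1', 1), ('10', 5)]
```

A UI could then combine the stored commit hash and line number into a permanent link to the exact source line.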

Path to WITH-NAMES that are not in system seems incorrect

For example, see:
11H(FOO)(+11)
gives:

1 · Religion and Magic
11 · Christian religion
11H · saints
11H(...) · male saints (with NAME)
11H(...)(+1) · male saints (with NAME) (+ Holy Trinity)

but when we do: 11H(JOHN)(+11) we get:

1 · Religion and Magic
11 · Christian religion
11H · saints
11H(...) · male saints (with NAME)
11H(JOHN) · the apostle John the Evangelist; possible attributes: book, cauldron, chalice with snake, eagle, palm, scroll
11H(JOHN)(+1) · the apostle John the Evangelist; possible attributes: book, cauldron, chalice with snake, eagle, palm, scroll (+ Holy Trinity)

Problems with diacritics in German keywords

There are some issues with the German keywords.

For example, the German word "trauern" (to be sad) is saved as "traürn" (which is not a word in German). Maybe all occurrences of "ue" were replaced with "ü" at some step in the processing?
Other examples are "klaue" => "klaü" and "schauen" => "schaün".

(reported by Kolya Bailly)
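
The suspected cause can be reproduced in a few lines of Python. This is only a reconstruction of the reporter's hypothesis, not the actual processing code; both function names are made up, and the guarded variant is a rough heuristic rather than a fix.

```python
import re


def naive_umlauts(word):
    """Blanket replacement, as suspected in the report."""
    return word.replace("ue", "ü")


def guarded_umlauts(word):
    """Heuristic: leave 'ue' alone when it follows a, e or q (as in 'aue', 'eue', 'que')."""
    return re.sub(r"(?<![aeqAEQ])ue", "ü", word)


for word in ["trauern", "klaue", "schauen"]:
    print(word, "=>", naive_umlauts(word), "|", guarded_umlauts(word))
# trauern => traürn | trauern
# klaue => klaü | klaue
# schauen => schaün | schauen

print(guarded_umlauts("Muenchen"))  # München (illustrative word, not from the data)
```

The guard still misfires on words such as "Duell", so a dictionary check or storing the keywords with real umlauts at the source would be more reliable than any pattern-based conversion.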

Parent-Child relationship in notations.txt errors

The following items in notations.txt have an issue with the parent-child relationship:

11P315(AUGUSTINIANS)
11P315(BENEDICTINES)
11P315(CARMELITES)
11P315(CARTHUSIANS)
11P315(CISTERCIANS)
11P315(DOMINICANS)
11P315(FRANCISCANS)
11P315(PRAEMONSTRATENSIANS)
23Q8
23Q81
25C112
25D12(FLINT)
25H182
25H183
32B311(ENGLISHMEN)
32B311(SLOWAKIANS)
32B313(...)
32B313(MOROCCO)
32B313(SRI LANKA)
32B3213
32B3213(...)
32B3213(INDIA)
32B3213(GREAT BRITAIN)
32B3213(THE NETHERLANDS)
32B332
32B332(AUSTRALIA)
32B333
32B333(EUROPE)
34A111
41D211(ROBE À L’ANGLAISE)
42A4213
45C15(CROSSBOW)
46C11213
49E3933(...)
49E3933(BALNEUM ARENAE)
49E3933(BALNEUM MARIAE)
73C941
73D144
73F421(...)
73G423
91B25(PYRACMON)
95A(SISYPHUS)681
95A(SISYPHUS)6811
95A(SISYPHUS)682
95A(SISYPHUS)6821
95B(THEONOE)1
95B(THEONOE)11
95B(THEONOE)12
95B(THEONOE)2
95B(THEONOE)3
95B(THEONOE)31
95B(THEONOE)32
95B(THEONOE)33
95B(THEONOE)34
95B(THEONOE)35
95B(THEONOE)36
95B(THEONOE)37
95B(THEONOE)4
95B(THEONOE)5
95B(THEONOE)6
95B(THEONOE)68
95B(THEONOE)69
95B(THEONOE)7
95B(THEONOE)78
95B(THEONOE)79
95B(THEONOE)8
95B(TYRO)1
95B(TYRO)11
95B(TYRO)12
95B(TYRO)2
95B(TYRO)3
95B(TYRO)4
95B(TYRO)5
95B(TYRO)6
95B(TYRO)61
95B(TYRO)68
95B(TYRO)69
95B(TYRO)7
95B(TYRO)78
95B(TYRO)79
95B(TYRO)8

Discovered this when doing a recursive in-memory listing of parent-child relationships:

import textbase


def depth_first_find(node, wanted):
    """Return the node whose notation equals `wanted`, or None if it is not in the tree."""
    if node["n"] == wanted:
        return node
    for kid in node.get("c", []):
        possible = depth_first_find(kid, wanted)
        if possible:
            return possible
    return None


# Synthetic root holding the ten top-level divisions 0-9.
tree = {"n": "", "c": [{"n": str(i), "c": []} for i in range(10)]}

# Cache of nodes already located in the tree
# (renamed from `map` to avoid shadowing the built-in).
found = {}

for x in textbase.parse("notations.txt"):
    notation = x["N"][0]
    node = found.get(notation)
    if node is None:
        node = depth_first_find(tree, notation)
        if node is not None:
            found[notation] = node
        else:
            # No earlier chunk has listed this notation as a child: report it.
            print(notation)
            continue
    # Attach the children declared in this chunk.
    node["c"] = [{"n": c} for c in x.get("C", [])]
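
A cheaper, order-independent variant of the same check is sketched below: collect every notation that some chunk lists as a child, then report chunks whose own notation never appears in that set. It assumes records shaped like the ones used above (each field a list of values); the function name and the sample records are invented for illustration. Unlike the script above, it does not care whether a parent appears before or after its child in the file, so the two checks can report different items.

```python
def find_orphans(records, roots=tuple("0123456789")):
    """Return notations that no other record lists as a child (top-level roots excluded)."""
    listed_as_child = set()
    for rec in records:
        listed_as_child.update(rec.get("C", []))
    return [
        rec["N"][0]
        for rec in records
        if rec["N"][0] not in listed_as_child and rec["N"][0] not in roots
    ]


# Invented sample: the last record is not listed as a child anywhere.
records = [
    {"N": ["1"], "C": ["10", "11"]},
    {"N": ["10"]},
    {"N": ["11"], "C": ["11H"]},
    {"N": ["11P315(AUGUSTINIANS)"]},
]

print(find_orphans(records))  # ['11P315(AUGUSTINIANS)']
```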

Seems to have the wrong keywords for 25F23(LION)(+46)

See

>>> o = iconclass.get("25F23(LION)(+46)")
>>> o['kw']
{'it': ['leone'], 'pt': ['leão'], 'nl': ['leeuw'], 'de': ['Loewe', 'Saeugetier'], 'fi': ['leijona'], 'fr': ['lion'], 'es': ['león'], 'en': ['dormir', 'dormire', 'lion', 'schlafen', 'sleeping']}
>>> o['kw']['en']
['dormir', 'dormire', 'lion', 'schlafen', 'sleeping']

The English keywords should only be ['lion', 'sleeping']. It looks like keywords from other languages (French, Italian, German) have been mixed in?
