
cremma-medieval-lat's Introduction

CREMMA Medii Aevi

HTR ground truth for Latin medieval manuscripts from the 12th to the 16th century. The transcription is graphematic.

Transcription guidelines

The transcription guidelines are described in a paper available on HAL and published in the Journal of Open Humanities Data. It details the selection process and the transcription methods and choices, as well as the project's outputs, mainly the Generic CREMMA Model for Medieval Manuscripts (Latin and Old French) for Kraken.

Content

More details about the manuscripts can be found in the data registry files.

Shelfmark Folder Biblissima Pages Type Century Color Script Content
Egerton 821 🔗 4 Medic. 12 Praegothica 54v-56r: Sortes sanctorum. f. 56r: A prayer-charm for a wounded animal. ff. 56r: A charm against fever
Montpellier H318 🔗 5 Medic. 12 Semitextualis Libraria Anonymous, De urinis, Recipes, Constantinus, Liber de coitu, Bartholomeus Salernitanus, Practica
CCCC MSS 236 🔗 5 Lit. 13 Textualis Libraria Martial, Book 1: pr, 3, 4, 6, 15, 8-10, 13-14, 16, 19-20, 18, 21-25, 28, 33-34, 37, 40, 42-48
CLM 13027 🔗 5 Medic. 13 Southern Textualis Libraria Liber minor de Coitu. Galen, De Crisibus
Latin 16195 🔗 4 Medic. 13 Semitextualis Currens Questiones De Coitu
† MsWettF 15 🔗 5 Schol. 13 Textualis Libraria Rothwell, Commentarius in libros Sententiarum
Laur. Plut. 33.31 🔗 5 Lit. 14 Textualis Meridionalis Priapea, 12-45
Arras 861 🔗 5 Lit. 14 Textualis Formata Seneca, Ad Lucilium, 121-122
† BIS 193 🔗 5 Schol. 14 Textualis currens Adam Wodeham, Ordinatio, Liber IV, Quaestio 6.
Phil., Col. of Phys. 10a 135 🔗 5 Medic. 14 Cursiva recentior Tractatus de Sterilitate
† Mazarine Ms. 915 🔗 4 Schol. 14 Textualis Meridionalis Adam Wodeham, Ordinatio
‡ UBL, Ms 758 🔗 15 Eccl. 14 Textualis Libraria In annuntiatione Mariae
Latin 6395 🔗 6 Lit. 14 Semitextualis Libraria Seneca, Medea, 284-
Laur. Plut. 39.34 🔗 5 Lit. 15 Humanistica Cursiva Priapea, 01-16
† Vat. Pal. Lat. 373 🔗 4 Schol. 15 Hybrida Currens Plaoul, De Fide, Lectio 1-2
Laur. Plut. 53.08 🔗 4 Gramm. 15 Personal Humanistica Donatus, In Phormionem Terenti commentum
Laur. Plut. 53.09 🔗 4 Gramm. 15 Humanistica Rotunda Donatus, In Phormionem Terenti commentum
‡ Berlin, Hdschr. 25 🔗 17 Eccl. 15 Textualis Formata Book of Hours
‡ Berlin, Germ. Oct. 511 🔗 6 Eccl. 15 Hybrida formata Psalm. 6
Latin 8236 🔗 5 Lit. 15 Humanistica Cursiva Prudentius, 3.1-3.2, 3.16-3.17, 4.4-4.5; Tibullus, 3.7.74-3.7.114, 3.7.199-3.9.3
† CCCC MSS 165 🔗 5 Schol. 16 Personal Cursive Peter Abelard, Sic et non

Credits

  • For the manuscripts MsWettF 15, BIS 193, Mazarine Ms. 915, Vat. Pal. Lat. 373 and CCCC MSS 165, we used the transcriptions of the Sentences Commentary Text Archive (SCTA) project by Jeffrey C. Witt. In doubtful cases (dubia), additional corrections were made so that the abbreviations are faithfully reproduced. The project's GitHub repository is available at https://github.com/scta-texts and its reading room at https://scta.lombardpress.org/.
  • For the Faithful Transcriptions Data Set: https://zenodo.org/record/5582483
  • For the Donatus manuscripts Laurentianus Pluteus 53.08 and 53.09, the HyperDonat edition by Bruno Bureau & Christian Nicolas was consulted (http://hyperdonat.huma-num.fr/editions/html/index.html), while preserving the readings (lectiones) and errors of the manuscripts.
  • In the same vein, the following critical editions were consulted as references: the edition of the Questiones de coitu for Latin 16195, the edition of the Liber minor de coitu for Montpellier H318 and CLM 13027, and the edition of the Tractatus de sterilitate by Enrique Montero Cartelle for Philadelphia, College of Physicians, 10a 135.

Lines count

Directory Line type Count
./data/Arras-861 HeadingLine 1
DefaultLine 357
./data/BGO-511 DefaultLine 56
./data/BIS-193 DefaultLine 889
./data/CCCC-MSS-165 DefaultLine 151
./data/CCCC-MSS-236 HeadingLine 31
DefaultLine 161
./data/CLM13027 DefaultLine 616
./data/Egerton821 DefaultLine 114
./data/H318 DefaultLine 427
./data/Latin16195 DefaultLine 449
./data/Latin6395 DefaultLine 681
./data/Latin8236 HeadingLine 1
DropCapitalLine 1
DefaultLine 212
./data/LaurentianusPluteus33.31 DefaultLine 204
./data/LaurentianusPluteus39.34 HeadingLine 15
DefaultLine 120
./data/LaurentianusPluteus53.08 DefaultLine 245
./data/LaurentianusPluteus53.09 DefaultLine 158
./data/Mazarine915 DefaultLine 528
./data/PalLat373 DefaultLine 453
./data/Phi_10a135 DefaultLine 142
./data/SBB_PK_Hdschr25 HeadingLine 2
DefaultLine 235
./data/UBL758 HeadingLine 9
DefaultLine 561
./data/WettF0015 DefaultLine 455
----- ----- -----
All All 7274
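The counts above could in principle be regenerated from the transcription files. The following is a minimal Python sketch, not the project's own script. It assumes that each ./data/<manuscript>/ folder contains ALTO XML files in which every TextLine element points, through its TAGREFS attribute, to an OtherTag element whose LABEL is the line type (DefaultLine, HeadingLine, DropCapitalLine, etc.). The function names are hypothetical.

```python
"""Hypothetical sketch: recount line types per folder, as in the "Lines count" table.

Assumption: ALTO XML files whose <TextLine> elements reference a line-type
<OtherTag> (e.g. LABEL="DefaultLine") through their TAGREFS attribute.
"""
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def count_line_types(xml_text: str) -> Counter:
    """Count the TextLine elements of one ALTO document by line type."""
    root = ET.fromstring(xml_text)
    # Map tag IDs to their labels, e.g. {"LT1": "DefaultLine"}.
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if local_name(el.tag) == "OtherTag"
    }
    counts: Counter = Counter()
    for el in root.iter():
        if local_name(el.tag) != "TextLine":
            continue
        refs = (el.get("TAGREFS") or "").split()
        # Use the first reference that resolves to a known tag, else "Untagged".
        line_type = next((labels[r] for r in refs if r in labels), "Untagged")
        counts[line_type] += 1
    return counts


def count_folder(folder: Path) -> Counter:
    """Sum the line-type counts of every *.xml file in one manuscript folder."""
    total: Counter = Counter()
    for path in sorted(folder.glob("*.xml")):
        total.update(count_line_types(path.read_text(encoding="utf-8")))
    return total


def main(data_dir: Path = Path("./data")) -> None:
    """Print one row per folder and line type, then the grand total."""
    grand_total = 0
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        for line_type, n in sorted(count_folder(folder).items()):
            print(folder, line_type, n)
            grand_total += n
    print("All", "All", grand_total)


if __name__ == "__main__" and Path("./data").is_dir():
    main()
```

Lines whose TAGREFS do not resolve are reported as "Untagged" rather than silently dropped, so that the grand total can be checked against the number of lines in the files.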

cremma-medieval-lat's Issues

Add a new letter?

It seems to me that this letter appears fairly often among the abbreviations in Cappelli:
ꝭ (Latin letter for "is").
Screenshot 2022-07-20 at 16.43.35: example in habetis
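Before adding a character to the transcription alphabet, it can help to confirm its Unicode identity and check how often it already occurs in existing transcriptions. This is a minimal Python sketch using the standard unicodedata module. The helper names and the sample string "habetꝭ" (a hypothetical abbreviated spelling of habetis) are illustrative only and are not taken from the dataset.

```python
"""Hypothetical sketch: identify the sign discussed above and count it in plain text."""
import unicodedata

IS_SIGN = "\uA76D"  # the character written ꝭ in the issue


def describe(char: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def count_sign(text: str, sign: str = IS_SIGN) -> int:
    """Count occurrences of the sign in a transcription."""
    return text.count(sign)


if __name__ == "__main__":
    print(describe(IS_SIGN))  # U+A76D LATIN SMALL LETTER IS
    print(count_sign("habet\uA76D"))  # 1
```

Because the sign has its own code point (U+A76D), it can be searched for and counted exactly, unlike a sequence of base letter plus combining mark.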

Minor edits for the paper

Paper itself

  • Explain the parameters with which the model was trained
  • One minor suggested correction: would "axe" on p. 7 be better as "vertical axis"? To me at least, "axis" seems better.
  • Explain the reuse potential:
    • Reviewer A asks us to show how these data have already been used and how they can be used. Cite the DH2019 paper and look for other similar papers.
    • I don't see what else to do, but my first approach does not really expand on "how or whether scholars who are not immediately connected to the project might use this model."

Data

  • Improve the documentation and presentation of the dataset on Zenodo
    • Documentation is nonexistent: import the GitHub README.md
    • Improve this documentation with a link to the Biblissima registry (a bonus, but it sounds like a good idea)
    • Mention the paper in the presentation of the dataset
  • Provide .txt exports on Zenodo / GitHub
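A .txt export like the one requested could be produced along these lines. This is a minimal Python sketch, not an existing project tool. It assumes the transcriptions are ALTO XML with the text of each line stored in the CONTENT attribute of the String elements inside its TextLine. The function names are hypothetical.

```python
"""Hypothetical sketch of a plain-text (.txt) export of ALTO transcriptions.

Assumption: the text of each line is stored in the CONTENT attribute of the
<String> element(s) inside its <TextLine>; one output line per TextLine.
"""
from pathlib import Path
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_text: str) -> str:
    """Return the transcription as plain text, one line per TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        words = [
            s.get("CONTENT", "")
            for s in line.iter()
            if local_name(s.tag) == "String"
        ]
        # Keep empty lines so that line numbering stays aligned with the XML.
        lines.append(" ".join(w for w in words if w))
    return "\n".join(lines) + ("\n" if lines else "")


def export_folder(folder: Path) -> None:
    """Write a .txt file next to every .xml file of one folder."""
    for path in sorted(folder.glob("*.xml")):
        text = alto_to_text(path.read_text(encoding="utf-8"))
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
```

Keeping one output line per TextLine, including empty ones, makes each .txt file line-aligned with its XML source, which simplifies later correction.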

Keep a record of the transcription rules followed

  • take a screenshot of the relevant portion of text, for example
  • state the problem encountered or the rule adopted
  • if the phenomenon is rare, keep track of the pages concerned, to make later correction easier if the rule changes

2 possible solutions:

  • add entries to the FAQ shared by all annotators (this centralizes all the rules)
  • create one issue per rule in the relevant repo and tag me

Example of how to document a transcription rule:

A crossed-out word is not transcribed.

Illustration: in carnets de Nerval/f64.jpg, "14," was transcribed rather than "14,00".
