
Ajax Multi-Commentary

This repository is a hard fork of the Open Commentaries' main server as of fd2b294d1ff89a8d73aaeec53b316d31ce038572.

i18n

This project uses the Gettext support that is built into Phoenix. To add a translation, enclose the default text where you want it to appear in the application in a call to the gettext backend: gettext("My default text."). Then run mix gettext.extract and mix gettext.merge priv/gettext. These commands will find your newly added i18n string and add it to the default.po files for each of the languages that the project supports.

Don't edit the default.pot file at the root of the priv/gettext directory.

Instead, find your newly added string (or any strings whose translations you want to modify) in the default.po file of the language into which you're translating.

A translation looks something like this:

#: lib/text_server_web/components/layouts/app.html.heex:16
#, elixir-autogen, elixir-format
msgid "About"
msgstr "À propos du projet"

The string from the call to gettext/1 is the msgid, and you can add your translation to this string on the msgstr line.
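In application code, strings are wrapped the same way wherever they appear. A minimal sketch, assuming the project's Gettext backend follows the usual Phoenix naming convention (`TextServerWeb.Gettext`; check `lib/` for the actual module name):

```elixir
# Assumed backend module name: TextServerWeb.Gettext (Phoenix convention).
import TextServerWeb.Gettext

gettext("About")
# Bindings are interpolated into the translated string:
gettext("Comments on line %{line}", line: 1034)
# Pluralization picks the correct form per language:
ngettext("1 gloss", "%{count} glosses", 3)
```

After adding calls like these, `mix gettext.extract` picks up the new msgids automatically.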

Environment variables

To start the app locally, you will need to set a few environment variables. In production, the following additional variables are required:

  • DATABASE_URL: For example: postgres://USER:PASS@HOST/DATABASE
  • SECRET_KEY_BASE: For signing cookies etc.
  • PHX_HOST: The hostname of the application. ajmc.unil.ch for now. Note that, even though the Phoenix server will not be talking to the outside world directly (all traffic goes through a proxy), it still needs to know what hostname to expect in requests so that it can respond properly.
  • PORT: The local port for the server. This is where you'll send the proxied requests to, so if the proxy is serving the app at https://ajmc.unil.ch:443, it should proxy requests to something like http://127.0.0.1:4000.
  • SENDGRID_API_KEY: Sign up at sendgrid.com. This API key is needed to send account verification emails for trusted users.
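These variables are typically read at boot in config/runtime.exs. A hedged sketch of what such reads look like (app and module names are assumed from the standard Phoenix layout; the real file may differ):

```elixir
# config/runtime.exs (illustrative sketch, not the actual file)
import Config

if config_env() == :prod do
  # Fail loudly at boot if a required variable is missing.
  config :text_server, TextServer.Repo,
    url: System.fetch_env!("DATABASE_URL")

  config :text_server, TextServerWeb.Endpoint,
    url: [host: System.get_env("PHX_HOST") || "ajmc.unil.ch"],
    # Bind locally; the reverse proxy forwards traffic to this port.
    http: [ip: {127, 0, 0, 1}, port: String.to_integer(System.get_env("PORT") || "4000")],
    secret_key_base: System.fetch_env!("SECRET_KEY_BASE")
end
```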

Deployment

This application is deployed using an Elixir Release that is built and deployed via a Docker container. The container's specification can be found in the Dockerfile. Note the (very simple) Dockerfile.postgres as well: an example of using it can be found in docker-compose.yaml.

(Note that this docker-compose file is not used in production, but is rather a development convenience for debugging deployment issues.)

Production server configuration

All of the configuration for the production Phoenix/Cowboy endpoint can be found in config/runtime.exs. Note that HTTPS is not enforced at the application level. Instead, the expectation is that the application only allows local access, which is brokered to the outside world by a reverse proxy such as nginx. Bear in mind that the proxy needs to allow websocket connections in order for LiveView to work.

Building

The Dockerfile builds a release of the Elixir application in a fairly standard way, but we also need to seed the database with the latest textual data about the Ajax commentaries.

To perform this seeding, entrypoint.sh runs /app/bin/text_server eval "TextServer.Release.seed_database". This function starts the application processes (except for the HTTP server) and calls TextServer.Ingestion.Ajmc.run/0.
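Release eval entry points like this usually follow a standard shape. A sketch under the assumption that the OTP app is named :text_server (the function body below is a guess at the conventional pattern, not copied from the repo):

```elixir
defmodule TextServer.Release do
  @app :text_server # assumed OTP application name

  # Invoked from entrypoint.sh as:
  #   /app/bin/text_server eval "TextServer.Release.seed_database"
  def seed_database do
    # Load the app and start its supervision tree; the real
    # implementation additionally avoids starting the HTTP endpoint.
    Application.load(@app)
    {:ok, _} = Application.ensure_all_started(@app)
    TextServer.Ingestion.Ajmc.run()
  end
end
```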

TextServer.Ingestion.Ajmc.run/0 deletes all of the existing comments and commentaries: the data have the potential to change in difficult-to-reconcile ways, so it's easier just to start fresh, since we store the source files locally (more on that in a second).

TextServer.Ingestion.Ajmc.run/0 then creates the Versions (= editions) of the critical text (Sophocles' Ajax), as detailed in TextServer.Ingestion.Versions.

These Versions are CTS-compliant editions of the text, meaning that they all descend from the same Work, which is identified by the URN urn:cts:greekLit:tlg0011.tlg003. Right now, we're only making one Version, based on Greg Crane's TEI XML encoding of Lloyd-Jones 1994's OCT. Eventually, we will ingest more editions into the same format.

The data structure for representing a text is essentially an ordered list of TextNodes. We need to keep the order (stored internally in the offset property) even though each TextNode also has a location, because locations do not necessarily match textual order: lines can be transposed, for example, so that the reading order of lines 5, 6, and 7 might actually be 6, 5, 7. To take a real example, lines 1028–1039 are bracketed in some editions and arguably should be excluded from the text. That would mean a jump from 1027 to 1040 -- still properly ordered, but irreconcilable across editions without an explicit per-edition ordering.
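The offset/location distinction can be shown with plain data, using the transposed-lines example from above (TextNodes reduced to bare maps for illustration):

```elixir
# TextNodes as bare maps: `location` is the canonical line number,
# `offset` is the reading order within this particular Version.
text_nodes = [
  %{location: 5, offset: 2},
  %{location: 6, offset: 1},
  %{location: 7, offset: 3}
]

# Rendering must sort by offset, not by location:
reading_order =
  text_nodes
  |> Enum.sort_by(& &1.offset)
  |> Enum.map(& &1.location)

# reading_order == [6, 5, 7]
```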

Caveat lector: the following might change

Each TextNode can be broken down further into an ordered list of graphemes. (We use graphemes and not characters in order to simplify handling polytonic Greek combining characters.) Annotations typically refer to lemmata as the range of graphemes that correspond to the word tokens of a given lemma. That means that instead of the CTS standard urn:cts:greekLit:tlg0011.tlg003.ajmc-fin:1034@Ἐρινὺς, we would refer to the grapheme range at urn:cts:greekLit:tlg0011.tlg003.ajmc-fin:1034@7-12.
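Indexing by grapheme rather than codepoint matters because polytonic Greek may be stored in decomposed (NFD) form, where breathings and accents are separate combining codepoints. A quick standard-library illustration (note that the @7-12 range above presumably covers the six graphemes of Ἐρινὺς):

```elixir
word = "Ἐρινὺς"

# Six graphemes, matching the six-position range @7-12:
6 = word |> String.graphemes() |> length()

# In NFD, Ἐ decomposes into Ε + combining smooth breathing, and
# ὺ into υ + combining grave accent, so the codepoint count grows
# while the grapheme count stays stable:
nfd = :unicode.characters_to_nfd_binary(word)
8 = nfd |> String.codepoints() |> length()
6 = nfd |> String.graphemes() |> length()
```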

This approach, however, is likely to change: we plan to decompose each edition into its TextTokens instead. This transition is a work in progress.

Commentaries

Once the Versions have been ingested, we ingest each of the commentaries detailed in commentaries.toml. Their source files can be found with the glob pattern priv/static/json/*_tess_retrained.json. (Nota bene: Eventually we will need to move these files elsewhere, as we can only store public domain content in this repository.)

Each CanonicalCommentary pulls its data from Zotero by mapping the id from the corresponding tess_retrained.json to its accompanying zotero_id.

Each CanonicalCommentary has two kinds of comments: Comments, which have a word-anchor and thus a lemma, and LemmalessComments, which have a scope-anchor (a range of lines).

Each Comment is mapped to its corresponding tokens in urn:cts:greekLit:tlg0011.tlg003.ajmc-lj; each LemmalessComment is mapped to the corresponding lines.

Note that sometimes these mappings will produce nonsensical results: Wecklein, for instance, reorders the words in line 4, so his Comment on that line has a lemma ("ἔνθα Αἴαντος ἐσχάτην τάξιν ἔχει") that does not correspond to the text ("Αἴαντος, ἔνθα τάξιν ἐσχάτην ἔχει") — and this is a relatively minor discrepancy.

This is why it's important that we also allow readers to change the "base" or critical text and to apply the comments in a flexible way.

Rendering comments in the reader

We render the lemma of comments as a heatmap over the critical text in the reading environment, allowing readers to see at a glance when lines have been heavily glossed. To do so, we borrow approaches from the OOXML specification and ProseMirror:

We need to group the graphemes of each text node (line of Ajax) with the elements that should apply (we’re also preserving things like cruces and editorial insertions), including comments.

We start by finding the comments that apply to a given line:

# comment starts with this text node OR
# comment ends on this text node OR
# text node is in the middle of a multi-line comment
comment.start_text_node_id == text_node.id or
  comment.end_text_node_id == text_node.id or
  (comment.start_text_node.offset <= text_node.offset and
    text_node.offset <= comment.end_text_node.offset)

We then check each grapheme to see if one of those comments applies:

cond do
  # comment applies only to this text node
  c.start_text_node == c.end_text_node ->
    i in c.start_offset..(c.end_offset - 1)

  # comment starts on this text_node
  c.start_text_node == text_node ->
    i >= c.start_offset

  # comment ends on this text node
  c.end_text_node == text_node ->
    i <= c.end_offset

  # entire text node is in this comment
  true ->
    true
end

With that information (packaged in an admittedly confusing tuple of graphemes and tags), we can linearly render the text as a series of “grapheme blocks” with their unique tag sets:

<.text_element
  :for={{graphemes, tags} <- @text_node.graphemes_with_tags}
  tags={tags}
  text={Enum.join(graphemes)}
/>

It remains to be determined how we will work with comments that don't match the underlying critical text.

About the schema

We follow the CTS URN spec, which can at times be confusing.

Essentially, every collection (which is roughly analogous to a git repository) contains one or more text_groups. It can be helpful to think of each text_group as an author, but remember that "author" here designates not a person but rather a loose grouping of works related by style, content, and (usually) language. Sometimes the author is "anonymous" or "unknown" --- hence text_group instead of "author".

Each text_group contains one or more works. You might think of these as texts, e.g., "Homer's Odyssey" or "Lucan's Bellum Civile".

A work can be further specified by a version URN component that points to either an edition (in the traditional sense of the word) or a translation.

So in rough database speak:

  • A version has a type indication of one of commentary, edition, or translation
  • A version belongs to a work
  • A work belongs to a text_group
  • A text_group belongs to a collection

In reverse:

  • A collection has many text_groups
  • A text_group has many works
  • A work has many versions, each of which is typed as commentary, edition, or translation

Note that the CTS specification allows for an additional level of granularity known as exemplars. In our experience, creating exemplars mainly introduced unnecessary redundancy with versions, so we have opted not to include them in our API. See also http://capitains.org/pages/vocabulary.
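In rough Ecto terms, the hierarchy above might be sketched as follows (module, table, and field names are illustrative guesses, not the project's actual schemas, which live under lib/text_server/):

```elixir
# Illustrative sketch of the text_group > work > version hierarchy.
defmodule Sketch.Work do
  use Ecto.Schema

  schema "works" do
    field :urn, :string # e.g. "urn:cts:greekLit:tlg0011.tlg003"
    belongs_to :text_group, Sketch.TextGroup
    has_many :versions, Sketch.Version
  end
end

defmodule Sketch.Version do
  use Ecto.Schema

  schema "versions" do
    field :version_type, Ecto.Enum, values: [:commentary, :edition, :translation]
    belongs_to :work, Sketch.Work
  end
end
```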

Running in development

To start your Phoenix server:

  • Install dependencies with mix deps.get
  • Make sure your configuration (./config) is correct
  • Create and migrate your database with mix ecto.setup
  • Start Phoenix endpoint with mix phx.server or inside IEx with iex -S mix phx.server

Now you can visit localhost:4000 from your browser.

Ready to run in production? Please check our deployment guides.

Front-end environment and development

We're leveraging Phoenix LiveView as much as possible for the front-end, but occasionally we need modern niceties for CSS and JS. If you need to install a dependency:

  1. Think very carefully.
  2. Do we really need this dependency?
  3. What happens if it breaks?
  4. Can we just use part of the dependency in the vendor/ directory with proper attribution?
  5. If you really must install a dependency --- like @tailwindcss/forms --- run npm i -D <dependency> from within the assets/ directory.

Acknowledgments

Data and application code in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under an Ambizione grant PZ00P1_186033.

Contributors: Carla Amaya (UNIL), Sven Najem-Meyer (EPFL), Charles Pletcher (UNIL), Matteo Romanello (UNIL), Bruce Robertson (Mount Allison University).

License

Open Commentaries: Collaborative, cutting-edge editions of ancient texts
Copyright (C) 2022 New Alexandria Foundation

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.


ajmc-elixir's Issues

HTML title of pages

This refers to the content of the html > head > title element in the various pages. It's a small detail, but it becomes "visible" whenever pages get bookmarked.

  • default: "Ajax Multi-Commentary"
  • bibliography page: "Ajax Multi-Commentary – Bibliography"
  • about page: "Ajax Multi-Commentary – About"

For links to the ajmc platform that point to a specific glossa, I guess the title could/should reflect the target glossa, but we can leave this, in case, for the minimal comp. version.

behavior of heatmap for long ranging scope anchors

  • a good example of this is Wecklein 1894 ad loc. vv. 1–133.
  • in order to keep the line heatmap informative, only the starting line (here line 1) should be highlighted
  • all other lines (2–133) will not receive any "heat"

Linking to commentary lemmata

As a user, I'd like to have direct links to individual commentary lemmata. This requirement could be satisfied by assigning a CTS URN to each lemma and having the reading environment respond when it receives such a reference in the URL.

serialize commentaries to TEI/XML

Some thoughts:

  • this applies only to public domain commentaries
  • we should keep the link between text and IIIF images (TEI Publisher will then exploit this for display)
  • we don't have paragraphs information, but we do have divisions into glosses (for the commentary part)
  • we do have a ToC for each commentary that we can use
  • canonical references and tagged entities should be translated into corresponding TEI elements

Needs further discussion:

  • should we produce one TEI file per commentary, or split into critical text/commentary/translation (if applicable)?

orphan glossae after applying commentary filter

steps to reproduce:

  • refresh the application
  • select and expand a glossa in the glossa viewer
  • click on show page image
  • now select a different commentary in the comm. filter
  • expected: the glossa disappears; actual: it stays visible

Add a line with status explanation

@pletcher Just an idea, let's discuss
As a user, it'd be useful to know (at a glance), at all times, how many glosses are available for the text section in focus and how many I am currently viewing (e.g., as a result of filtering).

Possible template: Currently displaying X glosses from X commentaries for this section (lines x-y). In total, there are Y glosses (of which Z are lemma-less).

Once #20 is fixed, quite a bit of space will be freed up in the top horizontal part (i.e. above the horizontal grey lines), so this could be easily added.

navigation and main menu

  • ajmc.unil.ch should point directly to the multi-comm., at the very beginning of the text (no landing page)
  • a new page About should be added (with localised versions)
  • the menu will then have: Ajax multi-commentary, Meta-commentary, Available commentaries, About

NB: the Meta-Commentary menu item will be a placeholder for functionalities to come in next releases

broken Wikidata links in bibliography

e.g., in the page http://0.0.0.0:4000/bibliography/DeRomilly1976
https://wikidata.org/Q118976611 should rather be https://wikidata.org/wiki/Q118976611

API for browsing/searching commentaries

We need to define API endpoints for browsing and searching lemmas for commentaries.

We can start with Jebb, as the data are mostly ready to go.

  • /commentaries API endpoint
  • /commentaries/:commentary_urn API endpoint
  • /commentaries/:commentary_urn/comments API endpoint
  • /commentaries/:commentary_urn/lemmas API endpoint
  • Expose search API (/commentaries/:commentary_urn/comments?search=my%20query ?)

add info-buttons

Add some info buttons to provide the users quick explanations of how key interface components work.
Info buttons for:

  • commentary filter
  • glossae viewer
  • critical text selector
  • dynamic apparatus

Explanations attached to info-buttons should be localised.

Pages editable via Markdown

It'd be great to be able to edit some pages via Markdown. For now this applies to the to-come About page but there may be more use cases in the near future.

Basic bibliography section

Bibliography will be one item in the main menu.

In this basic version, I'd like to have displayed bibliographic metadata for the commentaries in our corpus, which should be fetched from this Zotero library. Short labels in the multi-comm. reading environment (e.g. Finglass 2011) should link back to the corresponding record in this section.

display page image number in glossa viewer (for debugging)

For debugging the data, it would be super useful to know, for a given glossa, the corresponding page image number. It could be hidden behind a small button or something.

Even better: make the page number a link to the IIIF page image for a quick check (I know there is the OSD viewer, but it makes the page number transparent, which is instead useful for debugging)

Display lemmata for more (all?) commentaries in the application

In order for this to happen, it's necessary to ingest all available lemmata annotations into the canonical JSONs. So all commentaries other than Jebb will need to be re-canonified so that these annotations can be picked up by the backend API and finally displayed in the application. On the annotation-tool side of things, all available annotations were exported to XMI (see this commit https://github.com/AjaxMultiCommentary/lemma-linkage-corpus/commit/4e949614345443d397c3df6386cc095da214219f).

Synopsis-based navigation

JSON data for the synopsis is here.

Behaviour: the entire text will be loaded, but when clicking on a synopsis section the reading focus (= text window in focus) will ‘jump to’ the desired section.

Linear reading of commentaries

As a user I'd like to:

  • read the introduction in De Romilly’s commentary
  • read the bibliography of Finglass’ commentary

Features: OCR + image display

This can perhaps be offloaded to existing tools, e.g. TEI Publisher

⚠️This feature has a dependency with #39

"Glossae by commentary" filter

some design changes to the filter (as discussed yesterday):

  • put the filter behind a button (when not in use it stays collapsed and takes up less space)
  • move the filter/button to the very top of the glossa viewer panel (column)
  • make the list of available commentaries scrollable
  • add a search field to make the list quickly searchable

PS: can we have in brackets, beside the commentator's name, the total number of glosses existing for the text section in focus?
