I want to take the next step towards designing a good high-level API for STAM.

High-level API design about stam HOT 6 CLOSED

annotation commented on June 14, 2024

High-level API design

from stam.

Comments (6)

proycon commented on June 14, 2024

In order to get a feel for this API, I translated Dirk's example code from https://nbviewer.org/github/ETCBC/bhsa/blob/master/programs/stam-nu.ipynb to it. This is code to find "all phrases where the first and the last words have the same grammatical number":

First the Text Fabric code:

results = []

for p in F.otype.s("phrase"):
    ws = L.d(p, otype="word")
    if len(ws) < 2:
        continue
    fi = ws[0]
    la = ws[-1]
    if F.nu.v(fi) != F.nu.v(la):
        continue
    results.append((p, fi, la))

Now almost exactly the same structure with the new STAM API, using the Python binding (proof of concept, details may still change):

results = []
for phrase in store.annotations_by_data(set="someset", key="type", value="phrase", textual_order=True):
    words = phrase.annotations_by_data_in_targets(set="someset", key="type", value="word", textual_order=True)
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    for data, annotation in firstword.data_about(set="someset",key="nu"):
        if lastword.test_data_about(data):
            results.append((phrase,firstword,lastword))
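
To make the logic shared by both loops concrete without either library, here is a minimal, self-contained sketch over a toy in-memory corpus. The `Word` and `Phrase` classes and the sample data are assumptions for illustration only; they are not part of Text Fabric or STAM.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    nu: str  # grammatical number, e.g. "sg" or "pl"


@dataclass
class Phrase:
    words: list = field(default_factory=list)  # words in textual order


def same_number_phrases(phrases):
    """Return (phrase, first, last) where first and last words agree in number."""
    results = []
    for phrase in phrases:
        if len(phrase.words) < 2:
            continue  # single-word phrases are skipped, as in the code above
        first, last = phrase.words[0], phrase.words[-1]
        if first.nu == last.nu:
            results.append((phrase, first, last))
    return results


# Hypothetical sample data: only the first phrase should match.
corpus = [
    Phrase([Word("w1", "sg"), Word("w2", "sg")]),
    Phrase([Word("w3", "sg"), Word("w4", "pl")]),
    Phrase([Word("w5", "sg")]),
]
matches = same_number_phrases(corpus)
```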

I also reformulated the pseudo-query-code from one of the query proposals in the newly proposed API, using Python. This is a complex query that selects specific noun phrases followed by specific verb phrases within a specific context (book, chapter, sentence):

# Wrapped in a generator function so that `yield` is valid; the function name is illustrative.
def find_noun_verb_pairs(store):
    for book in store.resources_by_data(set="someset",key="name", value=DataOperator.any(DataOperator.equals("genesis"),DataOperator.equals("exodus"))):
        for chapter in book.text_by_data(set="someset",key="type",value="chapter"):
            if chapter.test_data_about(set="someset",key="number",value=2):
                for sentence in chapter.text_by_data(set="someset",key="type", value="sentence"):
                    if chapter.test_related_text(TextSelectionOperator.EMBEDS, sentence):
                        for nn in sentence.related_text(TextSelectionOperator.EMBEDS):
                            if nn.test_data_about(set="someset",key="type",value="word") and\
                               nn.test_data_about(set="someset",key="pos", value="noun") and\
                               nn.test_data_about(set="someset",key="gender",value="feminine") and\
                               nn.test_data_about(set="someset",key="number",value="singular"):
                                for vb in nn.related_text(TextSelectionOperator.PRECEDES):
                                    if sentence.test_related_text(TextSelectionOperator.EMBEDS,vb) and\
                                       vb.test_data_about(set="someset",key="type",value="word") and\
                                       vb.test_data_about(set="someset",key="pos", value="verb") and\
                                       vb.test_data_about(set="someset",key="gender",value="feminine") and\
                                       vb.test_data_about(set="someset",key="number",value="plural"):
                                        yield book, chapter, sentence, nn, vb
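
The `EMBEDS` and `PRECEDES` tests in this query are relations between spans of text. A minimal sketch of how such relations could be defined on half-open character offsets follows. The `Span` class and both functions are hypothetical stand-ins; STAM's actual operators may treat boundary cases differently (for example, whether PRECEDES requires adjacency).

```python
from typing import NamedTuple


class Span(NamedTuple):
    begin: int  # inclusive character offset
    end: int    # exclusive character offset


def embeds(outer: Span, inner: Span) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.begin <= inner.begin and inner.end <= outer.end


def precedes(a: Span, b: Span) -> bool:
    """True if `a` ends at or before the offset where `b` begins."""
    return a.end <= b.begin


# Hypothetical offsets for one sentence containing a noun and a verb.
sentence = Span(0, 40)
noun = Span(4, 9)
verb = Span(10, 16)

# Same shape as the innermost tests of the query: a noun inside the sentence,
# followed by a verb that is inside the same sentence.
hit = embeds(sentence, noun) and precedes(noun, verb) and embeds(sentence, verb)
```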

dirkroorda commented on June 14, 2024

I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you have to repeat the argument set="someset" all the time.

If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say

F.setSet("someset")

before doing many calls to retrieve annotation values.

Then it would be nice if you could say:

fData = F.getData(key="type")

lookup = fData.lookup
support = fData.support
targets = fData.targets

lookup is a function that given a target t delivers the value of an annotation in "someset", with key "type" and target t.

support is a function that given a value v delivers all targets t of annotations in "someset" with key "type" and value v.

targets is a function that given an annotation and a value v delivers all targets t of that annotation provided there is an annotation in "someset" that has target t and key "type" and value v.

It is also handy to assume that textual_order is True by default.

With this, you could shorten the phrase lookup like so:

F.setSet("someset")
tpData = F.getData("type")
tpSupport = tpData.support
tpTargets = tpData.targets

nuData = F.getData("nu")
nuLookup = nuData.lookup

results = []
for phrase in tpSupport("phrase"):
    words = tpTargets(phrase, "word")
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    if nuLookup(firstword) != nuLookup(lastword):
        continue
    results.append((phrase,firstword,lastword))
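
To make the three proposed functions concrete, here is a minimal sketch over a toy in-memory list of annotations. Everything in it (the `Annotation` tuple, the `KeyView` class, and the sample data) is a hypothetical illustration of the semantics described above, not the STAM or Text Fabric API. For simplicity, `targets` takes a list of candidate targets rather than an annotation object.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    target: str  # identifier of the annotated item
    key: str
    value: str


class KeyView:
    """Index over all annotations that share one key (the set is implied)."""

    def __init__(self, annotations, key):
        self._by_target = {}
        self._by_value = defaultdict(list)
        for a in annotations:
            if a.key == key:
                self._by_target[a.target] = a.value
                self._by_value[a.value].append(a.target)

    def lookup(self, target):
        """Value of this key on `target`, or None if there is none."""
        return self._by_target.get(target)

    def support(self, value):
        """All targets carrying `value` for this key, in insertion order."""
        return list(self._by_value.get(value, []))

    def targets(self, candidates, value):
        """Those of `candidates` that carry `value` for this key."""
        return [t for t in candidates if self._by_target.get(t) == value]


# Hypothetical sample data: one phrase and two words.
annotations = [
    Annotation("p1", "type", "phrase"),
    Annotation("w1", "type", "word"),
    Annotation("w2", "type", "word"),
    Annotation("w1", "nu", "sg"),
    Annotation("w2", "nu", "sg"),
]
tp = KeyView(annotations, "type")
nu = KeyView(annotations, "nu")
```

Building one view per key up front is what lets the later calls drop the repeated set and key arguments.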

proycon commented on June 14, 2024

> I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you have to repeat the argument set="someset" all the time.

Yes, the underlying idea is that you have all kinds of objects with distinct methods to travel the edges.

> If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say
> F.setSet("someset")

That would already be possible with the proposed API: it exposes methods to travel edges in almost every direction. If you want to be invariant over the set/key, then just grab a dataset and datakey instance and work from there. So there are often multiple ways of doing things with this API. That does come with the disadvantage that the API is bigger than it could be, but this flexibility should hopefully match the flexibility the model itself provides and leave some freedom of choice to the modeller:

set = store.dataset("someset")
key = set.key("type")  # "phrase" is a value of the "type" key, not of "nu"

for phrase in key.annotations_by_data(value="phrase", textual_order=True):
  ...
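
The idea of a key handle that remembers which set it belongs to can be sketched with a toy model. The class and method names below loosely mirror the proposal but are hypothetical stand-ins written for this example; they are not the stam-python implementation, and targets are plain strings here.

```python
class Store:
    """Toy store holding (set_id, key_id, value, target) records."""

    def __init__(self):
        self.records = []

    def add(self, set_id, key_id, value, target):
        self.records.append((set_id, key_id, value, target))

    def annotations_by_data(self, set_id, key_id, value):
        """Verbose form: the caller names the set and key on every call."""
        return [t for (s, k, v, t) in self.records
                if (s, k, v) == (set_id, key_id, value)]

    def dataset(self, set_id):
        return DataSet(self, set_id)


class DataSet:
    def __init__(self, store, set_id):
        self.store, self.set_id = store, set_id

    def key(self, key_id):
        return DataKey(self.store, self.set_id, key_id)


class DataKey:
    """Handle that remembers its set and key, so callers pass only the value."""

    def __init__(self, store, set_id, key_id):
        self.store, self.set_id, self.key_id = store, set_id, key_id

    def annotations_by_data(self, value):
        return self.store.annotations_by_data(self.set_id, self.key_id, value)


store = Store()
store.add("someset", "type", "phrase", "p1")
store.add("someset", "type", "word", "w1")
store.add("otherset", "type", "phrase", "p9")

type_key = store.dataset("someset").key("type")
phrases = type_key.annotations_by_data("phrase")
```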

> It is also handy to assume that textual_order is True by default.

Yeah, I'm not entirely sure how I'm going to incorporate that parameter yet. If it's gonna have an extra cost (temporary buffer allocation) I don't like doing it by default.
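
The cost concern can be illustrated with a small sketch: results can be streamed lazily in whatever order an index yields them, whereas returning them in textual order means collecting all of them into a temporary buffer before the first one can be handed out. The function names and data below are hypothetical.

```python
def matches_unordered(index):
    """Yield (offset, item) pairs lazily, in index order; no buffer needed."""
    for offset, item in index:
        yield offset, item


def matches_in_textual_order(index):
    """Return pairs sorted by offset; sorted() materialises all results first."""
    return sorted(matches_unordered(index), key=lambda pair: pair[0])


# Hypothetical index whose internal order differs from textual order.
index = [(30, "c"), (0, "a"), (12, "b")]

first_lazy = next(matches_unordered(index))   # available immediately
ordered = matches_in_textual_order(index)     # requires the full result list
```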

proycon commented on June 14, 2024

> lookup is a function that given a target t delivers the value of an annotation in "someset", with key "type" and target t.

That'd be t.find_data_about(set, key, value_test). If you already have a DataKey instance like in my previous example, you should be able to pass it without the set (because it will know what set it belongs to).
You can also do key.find_data(value_test) to get the annotation data, and then data.annotations() to get the annotations referencing that data.

> support is a function that given a value v delivers all targets t of annotations in "someset" with key "type" and value v.

Depending on the type of target you're looking for, that'd be: data.annotations(), data.resources(), data.dataset(), etc.
In STAM it's harder to consider the targets a heterogeneous bunch (also due to it being implemented in a strongly typed language); usually you have to be explicit about which kind of target you want (annotations, textselections, resources, etc.).

> targets is a function that given an annotation and a value v delivers all targets t of that annotation provided there is an annotation in "someset" that has target t and key "type" and value v.

That'd be annotation.annotations_by_data_in_targets(set, key, value_test) for annotations, though I suppose I also need methods for the other target types then. This one feels a bit too contrived still, not to say that it isn't a valid function, but the DIY route where it's split into two calls feels a bit more natural/understandable to me:

for target in annotation.annotations_in_targets():
    if target.test_data_about(set, key, value_test):
        ...

The first method can be replaced with resources(), datasets(), textselections() for the other types.

My method naming style may be a bit more verbose than you're accustomed to, but I feel the names have to be self-documenting to a certain extent so I'd rather be a bit explicit.

proycon commented on June 14, 2024

Over the summer period, this API has been implemented (in git master, both for stam-rust and stam-python, but not released yet).

proycon commented on June 14, 2024

Released in stam-rust 0.8.0
