Comments (6)

proycon commented on June 14, 2024

In order to get a feel for this API, I translated Dirk's example code from https://nbviewer.org/github/ETCBC/bhsa/blob/master/programs/stam-nu.ipynb to it. This is code to find "all phrases where the first and the last words have the same grammatical number":

First the Text Fabric code:

results = []

for p in F.otype.s("phrase"):
    ws = L.d(p, otype="word")
    if len(ws) < 2:
        continue
    fi = ws[0]
    la = ws[-1]
    if F.nu.v(fi) != F.nu.v(la):
        continue
    results.append((p, fi, la))

Now almost exactly the same structure with the new STAM API, using the Python binding (proof of concept; details may still change):

results = []
for phrase in store.annotations_by_data(set="someset", key="type", value="phrase", textual_order=True):
    words = phrase.annotations_by_data_in_targets(set="someset", key="type", value="word", textual_order=True)
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    for data, annotation in firstword.data_about(set="someset",key="nu"):
        if lastword.test_data_about(data):
            results.append((phrase,firstword,lastword))
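
For readers without either library installed, here is a minimal self-contained sketch of the same query over plain Python data. The `phrases` and `nu` dictionaries and the `same_number_phrases` helper are made-up illustrations, not Text Fabric or STAM structures.

```python
# Toy data (hypothetical): each phrase maps to its word ids in textual order,
# and `nu` maps a word id to its grammatical number.
phrases = {
    "p1": ["w1", "w2", "w3"],
    "p2": ["w4"],
    "p3": ["w5", "w6"],
}
nu = {"w1": "sg", "w2": "pl", "w3": "sg", "w4": "sg", "w5": "sg", "w6": "pl"}


def same_number_phrases(phrases, nu):
    """Return (phrase, first, last) where first and last word share a number."""
    results = []
    for phrase, words in phrases.items():
        if len(words) < 2:
            continue  # a one-word phrase has no distinct first and last word
        first, last = words[0], words[-1]
        if nu.get(first) != nu.get(last):
            continue
        results.append((phrase, first, last))
    return results


results = same_number_phrases(phrases, nu)
```

Both versions above follow this shape: select phrases, collect their words in textual order, then compare one feature on the first and last word.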

I also reformulated the pseudo-query-code from one of the query proposals in terms of the newly proposed API, using Python. This is a complex query that selects specific noun phrases followed by specific verb phrases within a specific context (chapter, sentence, book):

for book in store.resources_by_data(set="someset",key="name", value=DataOperator.any(DataOperator.equals("genesis"),DataOperator.equals("exodus"))):
    for chapter in book.text_by_data(set="someset",key="type",value="chapter"):
        if chapter.test_data_about(set="someset",key="number",value=2):
            for sentence in chapter.text_by_data(set="someset",key="type", value="sentence"):
                if chapter.test_related_text(TextSelectionOperator.EMBEDS, sentence):
                    for nn in sentence.related_text(TextSelectionOperator.EMBEDS):
                        if nn.test_data_about(set="someset",key="type",value="word") and\
                           nn.test_data_about(set="someset",key="pos", value="noun") and\
                           nn.test_data_about(set="someset",key="gender",value="feminine") and\
                           nn.test_data_about(set="someset",key="number",value="singular"):
                            for vb in nn.related_text(TextSelectionOperator.PRECEDES):
                                if sentence.test_related_text(TextSelectionOperator.EMBEDS,vb) and\
                                    vb.test_data_about(set="someset",key="type",value="word") and\
                                    vb.test_data_about(set="someset",key="pos", value="verb") and\
                                    vb.test_data_about(set="someset",key="gender",value="feminine") and\
                                    vb.test_data_about(set="someset",key="number",value="plural"):
                                    yield book, chapter, sentence, nn, vb

from stam.

dirkroorda commented on June 14, 2024

I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you have to repeat the argument set="someset" all the time.

If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say

F.setSet("someset")

before doing many calls to retrieve annotation values.

Then it would be nice if you could say:

fData = F.getData(key="type")

lookup = fData.lookup
support = fData.support
targets = fData.targets

lookup is a function that given a target t delivers the value of an annotation in "someset", with key "type" and target t.

support is a function that given a value v delivers all targets t of annotations in
"someset" with key "type" and value v.

targets is a function that given an annotation and a value v delivers all targets t of that annotation provided there is an annotation in "someset" that has target t and key "type" and value v.

It is also handy to assume that textual_order is True by default.

With this, you could shorten the phrase lookup like so:

F.setSet("someset")
tpData = F.getData("type")
tpSupport = tpData.support
tpTargets = tpData.targets

nuData = F.getData("nu")
nuLookup = nuData.lookup

results = []
for phrase in tpSupport("phrase"):
    words = tpTargets(phrase, "word")
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    if nuLookup(firstword) != nuLookup(lastword):
        continue
    results.append((phrase,firstword,lastword))
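
To check that the three proposed functions are enough for this loop, here is a toy in-memory stand-in. `FeatureData`, its `values` and `contains` fields, and the sample nodes are all hypothetical; nothing here is part of STAM or Text Fabric.

```python
from dataclasses import dataclass


@dataclass
class FeatureData:
    """Hypothetical stand-in for what F.getData(key) might return."""
    values: dict    # target -> value, for one key in one set
    contains: dict  # node -> its targets, in textual order

    def lookup(self, target):
        # value of the annotation on `target`, or None if there is none
        return self.values.get(target)

    def support(self, value):
        # all targets annotated with `value`
        return [t for t, v in self.values.items() if v == value]

    def targets(self, node, value):
        # targets of `node` that are themselves annotated with `value`
        return [t for t in self.contains.get(node, [])
                if self.values.get(t) == value]


tpData = FeatureData(
    values={"p1": "phrase", "p2": "phrase",
            "w1": "word", "w2": "word", "w3": "word"},
    contains={"p1": ["w1", "w2"], "p2": ["w3"]},
)
nuData = FeatureData(values={"w1": "sg", "w2": "sg", "w3": "pl"}, contains={})

results = []
for phrase in tpData.support("phrase"):
    words = tpData.targets(phrase, "word")
    if len(words) < 2:
        continue
    firstword, lastword = words[0], words[-1]
    if nuData.lookup(firstword) != nuData.lookup(lastword):
        continue
    results.append((phrase, firstword, lastword))
```

In this sketch the set is implied by how each `FeatureData` is constructed, which is the effect `F.setSet("someset")` is meant to have.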

proycon commented on June 14, 2024

> I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you
> have to repeat the argument set="someset" all the time.

Yes, the underlying idea is that you have all kinds of objects with distinct methods to travel the edges.

> If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say
> F.setSet("someset")

That would already be possible with the proposed API: it exposes methods to travel edges in almost every direction. If you want to be invariant over the set/key, then just grab a dataset and datakey instance and work from there. So there are often multiple ways of doing things with this API. The disadvantage is that the API is bigger than it could be, but this flexibility should hopefully match the flexibility the model itself provides and give the modeller some freedom of choice:

set = store.dataset("someset")
key = set.key("type")

for phrase in key.annotations_by_data(value="phrase", textual_order=True):
  ...
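
The same "bind the set and key once" idea can be sketched independently of STAM with `functools.partial`. The `ANNOTATIONS` tuples and the `targets_by_data` function are hypothetical toy stand-ins, not the library's API.

```python
from functools import partial

# Toy annotations (hypothetical): (set, key, value, target) tuples.
ANNOTATIONS = [
    ("someset", "type", "phrase", "p1"),
    ("someset", "type", "word", "w1"),
    ("someset", "nu", "sg", "w1"),
    ("otherset", "type", "phrase", "p9"),
]


def targets_by_data(annotations, set_id, key, value):
    """Return targets of annotations matching set, key and value."""
    return [t for (s, k, v, t) in annotations
            if (s, k, v) == (set_id, key, value)]


# Bind the store, set and key once; callers then only supply the value.
by_type = partial(targets_by_data, ANNOTATIONS, "someset", "type")

phrases = by_type("phrase")
words = by_type("word")
```

A key object that remembers its set plays the same role as the partially applied function here.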

> It is also handy to assume that textual_order is True by default.

Yeah, I'm not entirely sure how I'm going to incorporate that parameter yet. If it's gonna have an extra cost (temporary buffer allocation) I don't like doing it by default.
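
As a general illustration of that cost (not a description of how stam-rust does it): results in storage order can be streamed lazily, while sorting by text offset has to see every item first and so builds a temporary list. The `Selection` tuple layout and both function names are assumptions made for this sketch.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical shape of a text selection: (begin offset, end offset, label).
Selection = Tuple[int, int, str]


def unordered(selections: Iterable[Selection]) -> Iterator[Selection]:
    """Yield items as they arrive; no extra buffer is needed."""
    yield from selections


def in_textual_order(selections: Iterable[Selection]) -> List[Selection]:
    """Sort by offsets; this materialises all items in a temporary list."""
    return sorted(selections, key=lambda s: (s[0], s[1]))


stored = [(10, 14, "c"), (0, 4, "a"), (5, 9, "b")]

first_lazy = next(unordered(iter(stored)))  # available immediately
ordered = in_textual_order(iter(stored))    # consumes the whole input first
```

Making the ordered variant opt-in keeps the lazy path free of that allocation for callers who do not need textual order.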

proycon commented on June 14, 2024

> lookup is a function that given a target t delivers the value of an annotation in "someset", with key "type" and target t.

That'd be t.find_data_about(set,key,value_test). If you already have a DataKey instance like in my previous example, you should be able to pass it without the set (because it will know what set it belongs to).
You can also do key.find_data(value_test) to get the annotation data, and then data.annotations() to get annotations referencing the data.

> support is a function that given a value v delivers all targets t of annotations in
> "someset" with key "type" and value v.

Depending on the type of target you're looking for, that'd be: data.annotations(), data.resources(), data.dataset(), etc.
In STAM it's harder to consider the targets a heterogeneous bunch (also due to it being implemented in a strongly typed language); usually you have to be explicit about which kind of target you want (annotations, textselections, resources, etc.).

> targets is a function that given an annotation and a value v delivers all targets t of that annotation provided there is an annotation in "someset" that has target t and key "type" and value v.

That'd be annotation.annotations_by_data_in_targets(set, key, value_test) for annotations, though I suppose I also need methods for the other target types then. This one feels a bit too contrived still, not to say that it isn't a valid function, but the DIY route where it's split into two calls feels a bit more natural/understandable to me:

for target in annotation.annotations_in_targets():
    if target.test_data_about(set, key, value_test):
        ...

The first method can be replaced with resources(), datasets(), textselections() for the other types.

My method naming style may be a bit more verbose than you're accustomed to, but I feel the names have to be self-documenting to a certain extent so I'd rather be a bit explicit.

proycon commented on June 14, 2024

Over the summer period, this API has been implemented (in git master, both for stam-rust and stam-python, but not released yet).

proycon commented on June 14, 2024

Released in stam-rust 0.8.0
