Comments (6)
In order to get a feel for this API, I translated Dirk's example code from https://nbviewer.org/github/ETCBC/bhsa/blob/master/programs/stam-nu.ipynb to it. This is code to find "all phrases where the first and the last words have the same grammatical number":
First the Text Fabric code:
results = []
for p in F.otype.s("phrase"):
    ws = L.d(p, otype="word")
    if len(ws) < 2:
        continue
    fi = ws[0]
    la = ws[-1]
    if F.nu.v(fi) != F.nu.v(la):
        continue
    results.append((p, fi, la))
Now almost the exact same structure with the new STAM API, using the Python binding (proof of concept; details may still vary):
results = []
for phrase in store.annotations_by_data(set="someset", key="type", value="phrase", textual_order=True):
    words = phrase.annotations_by_data_in_targets(set="someset", key="type", value="word", textual_order=True)
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    for data, annotation in firstword.data_about(set="someset", key="nu"):
        if lastword.test_data_about(data):
            results.append((phrase, firstword, lastword))
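To make the intent of both snippets concrete without either library, here is a minimal sketch over plain Python data. Everything in it (the `phrases` dictionary, the two function names, the `"sg"`/`"pl"` values) is made up for illustration; it is not BHSA data and it does not call Text Fabric or STAM.

```python
# Toy data: phrase id -> list of (word, grammatical number or None).
# Hypothetical values, chosen only to exercise the two variants below.
phrases = {
    "p1": [("a", "sg"), ("king", "sg")],
    "p2": [("king", "sg"), ("of", None), ("kings", "pl")],
    "p3": [("and", None), ("then", None)],
    "p4": [("alone", "sg")],
}


def same_number_by_comparison(phrases):
    """Mirror the first snippet: skip the phrase when the two values differ."""
    results = []
    for name, words in phrases.items():
        if len(words) < 2:
            continue
        first, last = words[0], words[-1]
        if first[1] != last[1]:
            continue
        results.append((name, first[0], last[0]))
    return results


def same_number_by_shared_value(phrases):
    """Mirror the second snippet: the first word must have a value the last word shares."""
    results = []
    for name, words in phrases.items():
        if len(words) < 2:
            continue
        first, last = words[0], words[-1]
        if first[1] is not None and first[1] == last[1]:
            results.append((name, first[0], last[0]))
    return results


by_comparison = same_number_by_comparison(phrases)
by_shared_value = same_number_by_shared_value(phrases)
```

One thing the toy makes visible: a `!=` comparison treats two missing values as equal, whereas looping over the first word's data finds nothing when it has none, so `p3` appears only in `by_comparison`. Whether this matters in practice depends on how each library and dataset represents missing values, which this sketch does not model.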
I also reformulated the pseudo-query-code from one of the query proposals in the newly proposed API, using Python. This is a complex query for selecting specific noun phrases followed by specific verb phrases within a specific context (chapter, sentence, book):
for book in store.resources_by_data(set="someset", key="name", value=DataOperator.any(DataOperator.equals("genesis"), DataOperator.equals("exodus"))):
    for chapter in book.text_by_data(set="someset", key="type", value="chapter"):
        if chapter.test_data_about(set="someset", key="number", value=2):
            for sentence in chapter.text_by_data(set="someset", key="type", value="sentence"):
                if chapter.test_related_text(TextSelectionOperator.EMBEDS, sentence):
                    for nn in sentence.related_text(TextSelectionOperator.EMBEDS):
                        if nn.test_data_about(set="someset", key="type", value="word") and\
                           nn.test_data_about(set="someset", key="pos", value="noun") and\
                           nn.test_data_about(set="someset", key="gender", value="feminine") and\
                           nn.test_data_about(set="someset", key="number", value="singular"):
                            for vb in nn.related_text(TextSelectionOperator.PRECEDES):
                                if sentence.test_related_text(TextSelectionOperator.EMBEDS, vb) and\
                                   vb.test_data_about(set="someset", key="type", value="word") and\
                                   vb.test_data_about(set="someset", key="pos", value="verb") and\
                                   vb.test_data_about(set="someset", key="gender", value="feminine") and\
                                   vb.test_data_about(set="someset", key="number", value="plural"):
                                    yield book, chapter, sentence, nn, vb
from stam.
I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you have to repeat the argument set="someset" all the time.
If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say
F.setSet("someset")
before doing many calls to retrieve annotation values.
Then it would be nice if you could say:
fData = F.getData(key="type")
lookup = fData.lookup
support = fData.support
targets = fData.targets
lookup is a function that, given a target t, delivers the value of an annotation in "someset" with key "type" and target t.
support is a function that, given a value v, delivers all targets t of annotations in "someset" with key "type" and value v.
targets is a function that, given an annotation and a value v, delivers all targets t of that annotation, provided there is an annotation in "someset" that has target t, key "type" and value v.
It is also handy to assume that textual_order is True by default.
With this, you could shorten the phrase lookup like so:
F.setSet("someset")
tpData = F.getData("type")
tpSupport = tpData.support
tpTargets = tpData.targets
nuData = F.getData("nu")
nuLookup = nuData.lookup
results = []
for phrase in tpSupport("phrase"):
    words = tpTargets(phrase, "word")
    if len(words) < 2:
        continue
    firstword = words[0]
    lastword = words[-1]
    if nuLookup(firstword) != nuLookup(lastword):
        continue
    results.append((phrase, firstword, lastword))
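For illustration, the lookup/support/targets idea can be tried out on a small in-memory model. This is only a sketch under assumptions: `ToyStore`, `KeyData`, `Features`, `annotate` and `contain` are hypothetical names invented here, items are plain string ids, and "targets of an annotation" is reduced to an explicit containment list. It says nothing about how STAM itself would implement this.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory stand-in for an annotation store."""

    def __init__(self):
        self.values = defaultdict(dict)  # (set, key) -> {item: value}
        self.children = {}               # item -> contained items, in textual order

    def annotate(self, set_id, key, item, value):
        self.values[(set_id, key)][item] = value

    def contain(self, item, children):
        self.children[item] = list(children)


class KeyData:
    """View bound to one (set, key) pair; indexes are built once, at creation."""

    def __init__(self, store, set_id, key):
        self._store = store
        self._by_item = store.values[(set_id, key)]
        self._by_value = defaultdict(list)
        for item, value in self._by_item.items():
            self._by_value[value].append(item)

    def lookup(self, item):
        """Value for this key on the given item, or None if absent."""
        return self._by_item.get(item)

    def support(self, value):
        """All items carrying this value for the key."""
        return list(self._by_value.get(value, []))

    def targets(self, item, value):
        """Items contained in `item` that carry this value for the key."""
        return [c for c in self._store.children.get(item, [])
                if self._by_item.get(c) == value]


class Features:
    """Holds the current set so it need not be repeated on every call."""

    def __init__(self, store):
        self._store = store
        self._set_id = None

    def setSet(self, set_id):
        self._set_id = set_id

    def getData(self, key):
        return KeyData(self._store, self._set_id, key)


store = ToyStore()
for item, kind in [("p1", "phrase"), ("p2", "phrase"),
                   ("w1", "word"), ("w2", "word"), ("w3", "word"), ("w4", "word")]:
    store.annotate("someset", "type", item, kind)
for item, nu in [("w1", "sg"), ("w2", "sg"), ("w3", "sg"), ("w4", "pl")]:
    store.annotate("someset", "nu", item, nu)
store.contain("p1", ["w1", "w2"])
store.contain("p2", ["w3", "w4"])

F = Features(store)
F.setSet("someset")
tpData = F.getData("type")
nuLookup = F.getData("nu").lookup

results = []
for phrase in tpData.support("phrase"):
    words = tpData.targets(phrase, "word")
    if len(words) < 2:
        continue
    if nuLookup(words[0]) == nuLookup(words[-1]):
        results.append((phrase, words[0], words[-1]))
```

The design choice this illustrates is that the set and key are bound once and the three functions then take only what varies per call. The price in this toy is that the value index is a snapshot: annotations added after getData is called are not seen by support.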
from stam.
> I am a bit worried by the verbosity. You prefer to work with methods on annotation objects. Then you have to repeat the argument set="someset" all the time.
Yes, the underlying idea is that you have all kinds of objects with distinct methods to travel the edges.
> If you have an object that exposes the higher-level methods independent of the annotations, say F, you could say
> F.setSet("someset")
That would already be possible with the proposed API: it exposes methods to travel edges in almost every direction. If you want to be invariant over the set/key, then just grab a dataset and datakey instance and work from there. So there are often multiple ways of doing things with this API. That does come with the disadvantage that the API is bigger than it could be, but this flexibility should hopefully match the flexibility the model itself provides and give the modeller some freedom of choice:
set = store.dataset("someset")
key = set.key("type")
for phrase in key.annotations_by_data(value="phrase", textual_order=True):
    ...
> It is also handy to assume that textual_order is True by default.
Yeah, I'm not entirely sure how I'm going to incorporate that parameter yet. If it's gonna have an extra cost (temporary buffer allocation) I don't like doing it by default.
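The kind of cost meant here can be pictured in plain Python: results in storage order can be streamed one at a time, whereas textual order requires collecting every result into a temporary list before the first one can be returned. A minimal sketch, assuming selections are just (begin, end) offset tuples; `iter_unordered`, `iter_textual_order` and `source` are hypothetical names, and none of this is taken from the STAM implementation.

```python
def iter_unordered(selections):
    """Yield selections lazily, in whatever order the source provides them."""
    for sel in selections:
        yield sel


def iter_textual_order(selections):
    """Yield selections sorted by (begin, end); needs a temporary buffer."""
    buffer = sorted(selections)  # consumes the whole source before yielding anything
    yield from buffer


consumed = []


def source():
    """Made-up source that records how many items have been pulled from it."""
    for sel in [(5, 9), (0, 4), (0, 2)]:
        consumed.append(sel)
        yield sel


first_unordered = next(iter_unordered(source()))
pulled_for_unordered = len(consumed)

consumed.clear()
first_ordered = next(iter_textual_order(source()))
pulled_for_ordered = len(consumed)
```

Asking for only the first result pulls one item from the source in the unordered case and all three in the ordered case, which is the argument against making the ordering the default.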
from stam.
> lookup is a function that given a target t delivers the value of an annotation in "someset", with key "type" and target t.
That'd be t.find_data_about(set,key,value_test). If you already have a DataKey instance like in my previous example, you should be able to pass it without the set (because it will know what set it belongs to).
You can also do key.find_data(value_test) to get the annotation data, and then data.annotations() to get annotations referencing the data.
> support is a function that given a value v delivers all targets t of annotations in "someset" with key "type" and value v.
Depending on the type of target you're looking for, that'd be: data.annotations(), data.resources(), data.dataset(), etc.
In STAM it's harder to treat the targets as a heterogeneous bunch (also because it is implemented in a strongly typed language); usually you have to be explicit about which kind of target you want (annotations, textselections, resources, etc.).
> targets is a function that given an annotation and a value v delivers all targets t of that annotation provided there is an annotation in "someset" that has target t and key "type" and value v.
That'd be annotation.annotations_by_data_in_targets(set, key, value_test) for annotations, though I suppose I also need methods for the other target types then. This one feels a bit too contrived still, not to say that it isn't a valid function, but the DIY route where it's split into two calls feels a bit more natural/understandable to me:
for target in annotation.annotations_in_targets():
    if target.test_data_about(set, key, value_test):
        ...
The first method can be replaced with resources(), datasets(), textselections() for the other types.
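To show that the combined call and the two-step route amount to the same filter, here is a toy sketch. The method names mirror the ones used in this thread, but `ToyAnnotation` is a hypothetical class written for this example, not stam-python: data is stored directly on each object, and the value test is reduced to plain equality.

```python
class ToyAnnotation:
    """Hypothetical stand-in: an id, some data, and the annotations it targets."""

    def __init__(self, id, data=None, targets=()):
        self.id = id
        self.data = dict(data or {})  # (set, key) -> value
        self.targets = list(targets)  # annotations this one points at

    def annotations_in_targets(self):
        """Step 1 of the two-step route: iterate over the targeted annotations."""
        return iter(self.targets)

    def test_data_about(self, set, key, value):
        """Step 2: does this annotation carry the given value? (simplified to equality)"""
        return self.data.get((set, key)) == value

    def annotations_by_data_in_targets(self, set, key, value):
        """Combined call: nothing more than the two steps above in one method."""
        return [t for t in self.annotations_in_targets()
                if t.test_data_about(set, key, value)]


w1 = ToyAnnotation("w1", {("someset", "type"): "word"})
punct = ToyAnnotation("x1", {("someset", "type"): "punctuation"})
w2 = ToyAnnotation("w2", {("someset", "type"): "word"})
phrase = ToyAnnotation("p1", {("someset", "type"): "phrase"}, targets=[w1, punct, w2])

# Two-step route, written out by hand.
two_step = []
for target in phrase.annotations_in_targets():
    if target.test_data_about("someset", "type", "word"):
        two_step.append(target.id)

# Combined route.
combined = [t.id for t in phrase.annotations_by_data_in_targets("someset", "type", "word")]
```

In this toy both routes return the same ids. The two-step form generalises more easily, since only the first call would need to change for other target types.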
My method naming style may be a bit more verbose than you're accustomed to, but I feel the names have to be self-documenting to a certain extent so I'd rather be a bit explicit.
from stam.
Over the summer period, this API has been implemented (in git master, both for stam-rust and stam-python, but not released yet).
from stam.
Released in stam-rust 0.8.0
from stam.