This is all of my initial understanding. There seems to be a LOT more going on, which we'll track in the opening post of this issue.
[EDIT: PAUSE FOR DISCUSSION. NEEDS MORE INVESTIGATION ON WHAT THE SPEC IS, based on Eric's reply in Translator Slack]
Previously, we implemented QNode `is_set` behavior (original issue, behavior with ID/node-expansion).
- default behavior (property missing or `is_set: false`): each result has 1 KG node bound to each QNode
- but if a QNode has `is_set: true`, a result can have >= 1 KG node bound to that QNode. AKA there's a merging/consolidation.
Feature
In TRAPI 1.5, `is_set` is replaced with `set_interpretation`, which has more explicit rules for results-assembly (PR, lines 881-896). It's an optional property with string values (enum).
- default behavior (property missing or null) == "BATCH": same as before, each result has 1 KG node bound to each QNode
- "MANY": same as previous `is_set: true` behavior.
  - Note: This new specification only seems to cover when QNodes have multiple starting IDs. But I'd like to keep our current use of `is_set: true` / `set_interpretation: MANY` on QNodes with no starting IDs to merge/consolidate results.
- "ALL": new behavior. This should only be set on QNodes that have multiple starting IDs/entities. Similar to the "MANY" behavior, but only keep results that contain all starting IDs/entities.
  - AKA if only some of the starting IDs/entities are in the consolidated result, it should be thrown out (and any KG nodes/edges unique to it should be pruned).
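To make the three modes concrete, here is a minimal Python sketch of how I read the results-assembly rules. It is not BTE's implementation. It assumes a one-hop query where each record is a `(n0 KG node, n1 KG node)` pair and `set_interpretation` is set on n0 only; `assemble_results` and the `EX:` CURIEs are made-up names for illustration.

```python
from collections import defaultdict


def assemble_results(records, starting_ids, set_interpretation=None):
    """Group (n0_id, n1_id) records into results (hypothetical helper).

    Each result is a dict: {"n0": [KG node IDs bound to n0], "n1": n1_id}.
    """
    mode = set_interpretation or "BATCH"  # missing/null is treated as BATCH
    if mode not in ("BATCH", "MANY", "ALL"):
        raise ValueError(f"unknown set_interpretation: {mode}")

    if mode == "BATCH":
        # One KG node bound to n0 per result.
        return [{"n0": [n0], "n1": n1} for n0, n1 in sorted(set(records))]

    # MANY / ALL: consolidate all n0 bindings that share the same n1 node.
    grouped = defaultdict(set)
    for n0, n1 in records:
        grouped[n1].add(n0)

    results = []
    for n1 in sorted(grouped):
        n0_ids = grouped[n1]
        # ALL: drop results that do not contain every starting ID.
        if mode == "ALL" and not set(starting_ids) <= n0_ids:
            continue
        results.append({"n0": sorted(n0_ids), "n1": n1})
    return results


# Toy data (assumption): two starting IDs, three diseases.
starting = ["EX:pheno1", "EX:pheno2"]
records = [
    ("EX:pheno1", "EX:disease1"),
    ("EX:pheno2", "EX:disease1"),
    ("EX:pheno1", "EX:disease2"),
    ("EX:pheno2", "EX:disease3"),
]

batch = assemble_results(records, starting)          # 4 results
many = assemble_results(records, starting, "MANY")   # 3 results
all_ = assemble_results(records, starting, "ALL")    # 1 result (EX:disease1)
```

In this toy case the counts shrink from BATCH to MANY to ALL because only `EX:disease1` is connected to both starting IDs. Pruning of KG nodes/edges unique to dropped results is not shown.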
Examples
All use the same basic query, just setting `set_interpretation` to different values. I used HP IDs with no descendants (because ID/node-expansion triggers an automatic use of `is_set: true`, see #555 (comment)).
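Since the example queries differ only in one property, they could be generated from a single template when testing. A small sketch, assuming nothing about BTE's test setup; `build_query` is a hypothetical helper name.

```python
import json


def build_query(set_interpretation=None):
    """Build the basic example query; omit the property when None."""
    n0 = {
        "categories": ["biolink:PhenotypicFeature"],
        "ids": ["HP:0500041", "HP:0007750"],
    }
    if set_interpretation is not None:
        n0["set_interpretation"] = set_interpretation
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": n0,
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {
                    "eA": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:phenotype_of"],
                    }
                },
            }
        }
    }


# One request body per mode, ready to POST to a TRAPI query endpoint.
bodies = {mode: json.dumps(build_query(mode)) for mode in ("BATCH", "MANY", "ALL")}
```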
set to BATCH (default)
Query:
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:PhenotypicFeature"],
          "ids": ["HP:0500041", "HP:0007750"],
          "set_interpretation": "BATCH"
        },
        "n1": {
          "categories": ["biolink:Disease"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:phenotype_of"]
        }
      }
    }
  }
}
Response should have 39 results: current_default.json. This was generated with current default (not setting `is_set`).
set to MANY
Query:
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:PhenotypicFeature"],
          "ids": ["HP:0500041", "HP:0007750"],
          "set_interpretation": "MANY"
        },
        "n1": {
          "categories": ["biolink:Disease"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:phenotype_of"]
        }
      }
    }
  }
}
Response should have 37 results (rather than 39): current_is_set.json. This was generated with current `is_set: true`.
set to ALL
Query:
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:PhenotypicFeature"],
          "ids": ["HP:0500041", "HP:0007750"],
          "set_interpretation": "ALL"
        },
        "n1": {
          "categories": ["biolink:Disease"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:phenotype_of"]
        }
      }
    }
  }
}
Response should have 2 results (rather than 39 or 37). Only 2 disease entities are connected to both starting entities. See the first two results of current_is_set.json. This was generated with current `is_set: true`.
- MONDO:0009003 (achromatopsia 2)
- MONDO:0013560 (Hermansky-Pudlak syndrome 8)
Complications that need discussion
(1) ID/node expansion
Currently, if we find descendants of a starting ID (ID/node expansion), we set that starting ID's QNode to `is_set: true` (#555 (comment)). Can we remove this behavior?
- the current behavior has unintended consequences, like being completely unable to do `is_set: false` / `set_interpretation: BATCH` for some queries
- but I'm not sure if we depend on this behavior (we want to keep the behavior of "subclass_of edges using different descendant IDs are kept in the same result")
- Context: we implemented this with an old version of representing subclass info (comment). Now we use "constructed edges" + aux-graphs (issue)
Example
I expect 11 results for the following query (not setting `is_set`), but end up with 10 results, which is the same as setting `is_set: true`.
- response to the query below, not setting `is_set`: 10_not_setting_is_set.json
- VS response setting `is_set`: 10_is_set_true.json
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:PhenotypicFeature"],
          "ids": ["HP:0007800", "HP:0025586"]
        },
        "n1": {
          "categories": ["biolink:Disease"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:phenotype_of"]
        }
      }
    }
  }
}
This happens because ID/node-expansion finds a descendant for 1 of the starting IDs. Console logs:
bte:biothings-explorer-trapi:main Expanded ids for node n0: (2 ids -> 3 ids) +0ms
bte:biothings-explorer-trapi:main Added is_set:true to node n0 +1ms
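The interaction in those logs can be sketched roughly as follows. This is a hypothetical illustration, not BTE's actual code: `expand_node`, the `descendants` mapping, the `force_is_set` flag, and the `EX:` CURIEs are all assumptions made for the example.

```python
def expand_node(qnode, descendants, force_is_set=True):
    """Return a copy of qnode with descendant IDs appended.

    descendants: dict mapping a starting ID to its descendant IDs
    (a stand-in for whatever lookup the expansion step uses).
    force_is_set: mimics the current behavior of adding is_set: true
    whenever expansion finds at least one new ID.
    """
    expanded = list(qnode["ids"])
    for curie in qnode["ids"]:
        for child in descendants.get(curie, []):
            if child not in expanded:
                expanded.append(child)

    out = dict(qnode, ids=expanded)
    if force_is_set and len(expanded) > len(qnode["ids"]):
        # The user never asked for set behavior, but it gets switched on here,
        # so results for this QNode are consolidated downstream.
        out["is_set"] = True
    return out


# Toy data (assumption): one of two starting IDs has a single descendant.
qnode = {"categories": ["biolink:PhenotypicFeature"], "ids": ["EX:parent1", "EX:parent2"]}
descendants = {"EX:parent2": ["EX:child"]}

current = expand_node(qnode, descendants)                       # 2 ids -> 3 ids, is_set added
proposed = expand_node(qnode, descendants, force_is_set=False)  # 2 ids -> 3 ids, no is_set
```

With `force_is_set=False` the expansion still happens, so keeping descendant-based subclass_of edges in the same result would have to be handled somewhere else; that part is exactly what needs discussion.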
Note to self with another example
This query has 130 results whether `is_set: true` or not, when it should have >= 134 results when not. It also has some subclass_of edges/aux-graphs, but I'm not sure if it's a good test for seeing if ID/node-expansion becomes wonky after the changes.
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": ["biolink:PhenotypicFeature"],
          "ids": ["HP:0003259", "HP:0000110"]
        },
        "n1": {
          "categories": ["biolink:Disease"]
        }
      },
      "edges": {
        "eA": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:phenotype_of"]
        }
      }
    }
  }
}
(2) Unclear what the KG Node `is_set` property is for
Asked in Translator Slack:
The PR for `set_interpretation` also adds an `is_set` property to KG Nodes (lines 1011-1017). I'm not sure if this is meant to be used, and how (merging KG Nodes??).
Eric's reply in Translator Slack - needs more investigation.
(3) Clarifying an edge case
Asked in Translator Slack:
If `set_interpretation` is set on a QNode with multiple starting IDs, but these IDs all map to the same entity (using NodeNorm), then there isn't any set behavior to do. Is that fine? Does there need to be any log noting this?
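One possible way to detect this edge case, sketched under assumptions: the `canonical` dict below stands in for a NodeNorm lookup (no real NodeNorm call is made), and `distinct_entities`, `has_set_behavior`, and the `EX:` CURIEs are hypothetical. Whether a log is wanted at all is the open question; this only shows what one could look like.

```python
import logging

logger = logging.getLogger("set_interpretation_sketch")


def distinct_entities(ids, canonical):
    """Collapse IDs to their canonical form, preserving first-seen order."""
    seen = []
    for curie in ids:
        canon = canonical.get(curie, curie)  # unmapped IDs are kept as-is
        if canon not in seen:
            seen.append(canon)
    return seen


def has_set_behavior(qnode, canonical):
    """Return False (and log) when MANY/ALL has nothing to consolidate."""
    ids = qnode.get("ids", [])
    mode = qnode.get("set_interpretation")
    entities = distinct_entities(ids, canonical)
    if mode in ("MANY", "ALL") and len(ids) > 1 and len(entities) == 1:
        logger.warning(
            "set_interpretation=%s has no effect: %d starting IDs resolve to %s",
            mode, len(ids), entities[0],
        )
        return False
    return True


# Toy data (assumption): two synonymous IDs that resolve to one entity.
canonical = {"EX:synonymA": "EX:entity1", "EX:synonymB": "EX:entity1"}
same_entity = {"ids": ["EX:synonymA", "EX:synonymB"], "set_interpretation": "ALL"}
two_entities = {"ids": ["EX:synonymA", "EX:other"], "set_interpretation": "ALL"}
```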