teester / entityshape
An API to compare a Wikidata item with an EntitySchema
License: GNU General Public License v3.0
There should be a separate section for properties not allowed in the item. They should be green if missing and red if present.
Currently, ShEx groups are not supported and the grouping is ignored.
Thus the following from E228 will not evaluate correctly:
{
(
pq:P1534 @ +;
pq:P582 xsd:dateTime +;
)* ;
}
This translates as: must have 0 or more occurrences of the following group: 1 or more P1534 and 1 or more P582.
shape.py evaluates it as: must have 1 or more P1534 and 1 or more P582, which is incorrect.
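One way to picture the intended behaviour is a check that searches for a number of group repetitions consistent with the observed property counts. This is only a sketch under simplifying assumptions: `group_matches`, the `Cardinality` tuple and the count dictionaries are hypothetical names, not part of shape.py, and value constraints such as xsd:dateTime are ignored.

```python
from typing import Dict, Optional, Tuple

# (minimum, maximum); a maximum of None means "unbounded".
Cardinality = Tuple[int, Optional[int]]


def group_matches(counts: Dict[str, int],
                  inner: Dict[str, Cardinality],
                  group: Cardinality) -> bool:
    """Return True if some allowed number of group repetitions fits the counts."""
    group_min, group_max = group
    observed = {prop: counts.get(prop, 0) for prop in inner}
    # More repetitions than the largest observed count are never needed.
    highest = max(observed.values(), default=0)
    upper = group_max if group_max is not None else max(group_min, highest)
    for repeats in range(group_min, upper + 1):
        fits = True
        for prop, (low, high) in inner.items():
            count = observed[prop]
            if count < repeats * low:
                fits = False  # too few values for this many repetitions
            if high is not None and count > repeats * high:
                fits = False  # too many values for this many repetitions
            if repeats == 0 and count != 0:
                fits = False  # zero repetitions means nothing from the group
        if fits:
            return True
    return False


# The E228 group: ( P1534 + ; P582 + )*
inner = {"P1534": (1, None), "P582": (1, None)}
group = (0, None)
print(group_matches({}, inner, group))
print(group_matches({"P1534": 1}, inner, group))
print(group_matches({"P1534": 2, "P582": 1}, inner, group))
```

Under this reading, a statement with neither qualifier passes, a statement with both passes, and a statement with only one of them fails.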
Use PyShExC (https://github.com/shexSpec/grammar-python-antlr) to generate shapes in JSON-LD, which is a standard, rather than the bespoke json we are currently generating.
Advantages:
Disadvantages:
With the imminent deployment of a new entityschema data type on wikidata and presumably, the approval of the Shape Expression for Class property, it should become possible to determine what shapes apply to an entity programmatically. Depending on how it is adopted, I could see the api checking items or properties associated with the queried entity for Shape Expression for Class properties and putting together a list of entity schemas to check the entity against. Then get the results of all the shapes and concatenate them so that you get a list of shapes checked and where (if at all) each property and statement fails.
For example: Simon Harris (Q7518922) is a human (E10) and a Member of the Oireachtas (E236). So if human (Q5) had a Shape Expression for Class property of E10 and Oireachtas Member ID (P4690) had a Shape Expression for Class property of E236, the api could detect E10 and E236 and run a check on Q7518922 with E10, then with E236. The script should then parse the results of both. In the summary section, the properties from both schemas should be listed in the appropriate sections.
In cases where the same property is checked in both schemas, the property should appear in the most restrictive section on the summary. i.e. if a property is necessary in one schema and optional in the other, it should appear in the necessary section only. If the property fails in either schema, it should be listed as a fail. On mousing over the properties, the breakdown from each schema should appear as a tooltip. This should be done in a similar way in the tags added to the properties and statements on the page.
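The merging rule could be sketched roughly as follows. The result layout (a `necessity` string and a `passed` flag per property) and the function name `merge_results` are assumptions made for illustration; the api's real response format may differ.

```python
from typing import Dict

# Higher rank means more restrictive (assumed labels).
NECESSITY_RANK = {"optional": 0, "necessary": 1}


def merge_results(per_schema: Dict[str, Dict[str, dict]]) -> Dict[str, dict]:
    """Combine per-schema property results into one summary entry per property."""
    merged: Dict[str, dict] = {}
    for schema_id, properties in per_schema.items():
        for prop, result in properties.items():
            entry = merged.setdefault(
                prop,
                {"necessity": result["necessity"], "passed": True, "breakdown": {}},
            )
            # Keep the most restrictive necessity seen so far.
            if NECESSITY_RANK[result["necessity"]] > NECESSITY_RANK[entry["necessity"]]:
                entry["necessity"] = result["necessity"]
            # A failure in any schema makes the property a fail overall.
            entry["passed"] = entry["passed"] and result["passed"]
            # The per-schema breakdown could feed the tooltip.
            entry["breakdown"][schema_id] = result
    return merged


# Illustrative values only, not real check results.
summary = merge_results({
    "E10": {"P569": {"necessity": "optional", "passed": True}},
    "E236": {"P569": {"necessity": "necessary", "passed": False}},
})
print(summary["P569"]["necessity"], summary["P569"]["passed"])
```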
It should also be possible to check a random schema in the usual way, and also to check multiple schemas from the search box, perhaps using a space or comma as a separator. Checking when there is no input in the search field should trigger automatic schema determination. The UI will need a minor update to make it clear that this will happen.
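Splitting the search box input might look something like this. The function name `parse_schema_input` is hypothetical, and the sketch assumes schema ids always have the form E followed by digits; an empty list would be the signal to fall back to automatic schema determination.

```python
import re
from typing import List


def parse_schema_input(text: str) -> List[str]:
    """Split input on spaces and/or commas and keep tokens that look like schema ids."""
    tokens = [token for token in re.split(r"[,\s]+", text.strip()) if token]
    return [token.upper() for token in tokens if re.fullmatch(r"[Ee]\d+", token)]


print(parse_schema_input("E10, E236"))
print(parse_schema_input(""))
```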
Tasks to complete
Statements containing {0} to describe cardinality translate to 'does not contain'. Currently, these statements are being evaluated as if they are required to contain at least 1 match.
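A minimal sketch of a cardinality check that treats {0} as "must not be present". The function name and the returned labels are assumptions for illustration, not the strings the api currently uses.

```python
from typing import Optional


def check_cardinality(count: int, minimum: int, maximum: Optional[int]) -> str:
    """Classify an observed count against a (minimum, maximum) cardinality.

    A maximum of None means unbounded; a maximum of 0 corresponds to {0}.
    """
    if maximum == 0:
        # {0}: the property must be absent.
        return "correct" if count == 0 else "not allowed"
    if count < minimum:
        return "not enough"
    if maximum is not None and count > maximum:
        return "too many"
    return "correct"


print(check_cardinality(0, 0, 0))
print(check_cardinality(2, 0, 0))
```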
Items in the required section which are missing currently show up as orange. They should show up as red.
Currently, a number of entityschemas fail to parse correctly and return a 500 error, even when the entityschema is valid. Any valid entityschema should return a 200 response along with some sort of result.
The following entityschemas return a 500 response:
- E1 - ShExR
- E2 - Wikimedia
- E3 - Wikidata Item
- E4 - Labels/Descriptions
- E5 - Statement - Blank schema
- E6 - Language mappings - Blank Schema
- E7 - Citation - Blank Schema
- E8 - External RDF - Blank Schema
- E9 - Wikidata-Wikibase - Blank Schema
- E16 - Software Titles
- E37 - human gene
- E38 - human protein
- E39 - Reactome Pathway
- E44 - University Teacher
- E49 - Wikidata prefixes
- E53 - sportsperson
- E55 - programming language
Total: 69/272 failures (~25%)
Total: 53/272 failures (~19.5%)
Total: 39/272 failures (~14.5%)
Total: 17/272 (6.25%)
Total: 10/301 (3.32%)
It should be possible to use the user script on wikidata's mobile site.
EntityShape currently seems to assume it will only ever be run on Wikidata and contains hardcoded URIs in both Python and JS, not only in obvious places like _get_property_name and _get_entity_json, but also in _strip_schema_comments and _compare_statements.
In practice, Wikidata does not contain all data, and in my use-case I wanted to check our Q3 against our EntitySchema:E1 - which as documented in that schema works with the ShEx2 validator once CORS is whitelisted for that domain with CORS Everywhere.
Seeing this script recommended in Wikidata's shape tutorial, I tried to adapt the JS by changing the hardcoded URLs within it and using mw.loader.load() on it. However, this failed. Looking closer, I saw it passes entity and property IDs into the API, not URIs/IRIs.
It would be nice if it could work with a different base URI that was passed into it, or potentially across disparate URIs for entity and schema, to aid use of Wikibase by third-parties and federation between these and Wikidata. Failing that, moving hardcoded URIs into centrally-defined constants would probably make it easier to reuse the code if hosted elsewhere (e.g. by @wbstack).
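As a rough illustration of the "centrally-defined constants" option, the base URLs could live in one small configuration object. `WikibaseConfig` and its field names are hypothetical, and the third-party host below is a placeholder; the defaults assume Wikidata's Special:EntityData and Special:EntitySchemaText pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikibaseConfig:
    """Base URLs for one Wikibase instance (defaults assume Wikidata)."""
    entity_base: str = "https://www.wikidata.org/wiki/Special:EntityData/"
    schema_base: str = "https://www.wikidata.org/wiki/Special:EntitySchemaText/"

    def entity_url(self, entity_id: str) -> str:
        return f"{self.entity_base}{entity_id}.json"

    def schema_url(self, schema_id: str) -> str:
        return f"{self.schema_base}{schema_id}"


wikidata = WikibaseConfig()
# Placeholder host standing in for a third-party Wikibase.
other = WikibaseConfig(
    entity_base="https://wikibase.example/wiki/Special:EntityData/",
    schema_base="https://wikibase.example/wiki/Special:EntitySchemaText/",
)
print(wikidata.entity_url("Q7518922"))
print(other.schema_url("E1"))
```

Passing such an object into the functions that currently build URLs would also allow the entity and the schema to come from different hosts.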
Lexemes currently get the "failed to validate schema" error message.
Support for lexemes needs to be added to the api. This requires determining the important prefixes used by lexemes and translating them into properties, forms and senses in shape.py. The analysis in compareshape.py will presumably also have to take into account forms and senses.
The userscript may also need to be updated if the html element ids and classes are different to the ones that are used in entities.
An initial partial solution may be to ignore certain prefixes and only analyse properties (which the api can currently do) so there's at least partial support available.
The userscript should update to show the new result when the user adds, removes or edits a statement. Currently, the user has to click the "Check" button for a check to occur.
Currently "or" or "|" are ignored when generating a shape so x or y is interpreted as x and y.
Need to decide how to represent "or" in a shape and then how to take "or" into account when comparing the shape to an entity.
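One possible representation, offered only as a starting point for that decision: keep the shape as a nested structure with explicit `all_of` and `one_of` nodes and evaluate it recursively. The key names, the `satisfied` function and the property ids are all illustrative assumptions, and the sketch reads "or" as "at least one alternative is present", which is a simplification.

```python
from typing import Set, Union

# A leaf is a property id; a node is {"all_of": [...]} or {"one_of": [...]}.
Expression = Union[str, dict]


def satisfied(expression: Expression, present: Set[str]) -> bool:
    """Check whether the set of properties present on an entity meets the expression."""
    if isinstance(expression, str):
        return expression in present
    if "all_of" in expression:
        return all(satisfied(part, present) for part in expression["all_of"])
    if "one_of" in expression:
        return any(satisfied(part, present) for part in expression["one_of"])
    raise ValueError(f"unknown expression: {expression!r}")


# "P31 and (P569 or P570)", with placeholder property ids.
shape = {"all_of": ["P31", {"one_of": ["P569", "P570"]}]}
print(satisfied(shape, {"P31", "P570"}))
print(satisfied(shape, {"P31"}))
```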
Visiting https://entityshape.toolforge.org/api I get 500
Would you be willing to fix that?