entityshape's People

Contributors

dependabot[bot], teester, waldyrious


entityshape's Issues

Disallowed section

There should be a separate section for properties not allowed in the item. They should be green if missing and red if present.

Add support for groups

Currently, ShEx groups are not supported and the grouping is ignored.

Thus the following from E228 will not correctly evaluate:

{
(
pq:P1534 @ +;
pq:P582 xsd:dateTime +;
)* ;
}

This translates as: the item may have zero or more occurrences of the following group, where each occurrence consists of one or more P1534 together with one or more P582.

shape.py evaluates it as "must have one or more P1534 and one or more P582", which is incorrect.
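The difference between the two readings can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, its parameters and the idea of working from per-property statement counts are hypothetical and are not taken from shape.py or compareshape.py.

```python
def group_is_satisfied(counts, members, group_optional=True):
    """Check a group such as ( A + ; B + )* against statement counts.

    counts: mapping of property id -> number of matching statements
    members: property ids that must each occur 1 or more times per group match
    group_optional: True when the group itself may match zero times ('*' or '?')
    """
    present = [counts.get(prop, 0) >= 1 for prop in members]
    if all(present):
        return True             # the whole group matched at least once
    if not any(present):
        return group_optional   # the group matched zero times
    return False                # only part of the group is present


members = ["P1534", "P582"]
print(group_is_satisfied({"P1534": 1, "P582": 1}, members))  # True
print(group_is_satisfied({}, members))                       # True
print(group_is_satisfied({"P1534": 1}, members))             # False
```

Under this reading, an item with neither qualifier passes, whereas the current evaluation would report both as missing.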

Use PyShExC to generate shapes in JSON-LD

Use PyShExC (https://github.com/shexSpec/grammar-python-antlr) to generate shapes in JSON-LD, which is a standard, rather than the bespoke json we are currently generating.

Advantages:

  • will translate parts of schemas that we currently do not, such as AND and OR, making it easier to solve #1, #2 and #3
  • can let PyShExC worry about generating json shapes rather than supporting it here
  • adheres to a standard, which may make reuse easier

Disadvantages:

  • requires rewriting compareshape.py to parse JSON-LD instead of the current json.

Process multiple shapes for a single entity

With the imminent deployment of a new entityschema data type on Wikidata and, presumably, the approval of the Shape Expression for Class property, it should become possible to determine programmatically which shapes apply to an entity. Depending on how it is adopted, the api could check the items or properties associated with the queried entity for Shape Expression for Class properties and put together a list of entity schemas to check the entity against. It would then get the results for all the shapes and concatenate them, giving a list of the shapes checked and where (if at all) each property and statement fails.

For example: Simon Harris (Q7518922) is a human (E10) and a Member of the Oireachtas (E236). So if human (Q5) had a Shape Expression for Class property of E10 and Oireachtas Member ID (P4690) had a Shape Expression for Class property of E236, the api could detect E10 and E236 and run a check on Q7518922 with E10, then with E236. The script should then parse the results of both. In the summary section, the properties from both schemas should be listed in the appropriate sections.

In cases where the same property is checked in both schemas, the property should appear in the most restrictive section on the summary. i.e. if a property is necessary in one schema and optional in the other, it should appear in the necessary section only. If the property fails in either schema, it should be listed as a fail. On mousing over the properties, the breakdown from each schema should appear as a tooltip. This should be done in a similar way in the tags added to the properties and statements on the page.
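The merging rule described above could look roughly like the following sketch. The result layout (a "necessity" label and a "passed" flag per schema) and the label names are assumptions made for illustration; they are not the structure the api currently returns.

```python
# Higher rank means more restrictive. The labels are assumed for this sketch.
NECESSITY_RANK = {"optional": 0, "necessary": 1}


def merge_property_results(per_schema):
    """Combine the results for one property across several schemas.

    per_schema: mapping of schema id -> {"necessity": str, "passed": bool}
    """
    necessity = max(
        (result["necessity"] for result in per_schema.values()),
        key=NECESSITY_RANK.__getitem__,
    )
    return {
        "necessity": necessity,                                    # most restrictive section
        "passed": all(r["passed"] for r in per_schema.values()),   # a fail in any schema is a fail
        "breakdown": dict(per_schema),                             # kept for the tooltip
    }


merged = merge_property_results({
    "E10": {"necessity": "optional", "passed": True},
    "E236": {"necessity": "necessary", "passed": False},
})
print(merged["necessity"], merged["passed"])  # necessary False
```

Keeping the per-schema breakdown alongside the merged values means the tooltip can be built without re-running the checks.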

It should also be possible to check a random schema in the usual way, and also to check multiple schemas from the search box, perhaps using a space or comma as a separator. Checking when there is no input in the search field should trigger automatic schema determination. The UI will need a minor update to make it clear that this will happen.
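As a rough sketch of the search box handling, the input could be split on spaces or commas, with an empty result signalling automatic schema determination. The function name is hypothetical, and the code is written in Python purely for illustration, wherever the parsing ends up living.

```python
import re


def parse_schema_ids(text):
    """Split search-box input on spaces and/or commas into schema ids.

    Returns an empty list for empty input, which the caller can treat as
    "determine the schemas automatically".
    """
    ids = [part.upper() for part in re.split(r"[\s,]+", text.strip()) if part]
    invalid = [part for part in ids if not re.fullmatch(r"E\d+", part)]
    if invalid:
        raise ValueError(f"not entity schema ids: {invalid}")
    return ids


print(parse_schema_ids("E10, E236"))  # ['E10', 'E236']
print(parse_schema_ids("e10 e236"))   # ['E10', 'E236']
print(parse_schema_ids("   "))        # []
```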

Tasks to complete

  • get entityshape to check an item against multiple schemas and return the results
  • get the script to display multiple sets of results initially & update the UI to show how to check multiple schemas
  • get the script to concatenate the results - this will allow people to check multiple schemas at the same time
  • once Shape Expression for Class is approved, get the script to autodetect schemas from pages associated with the entity and update the UI to make it clear how this works.

Statements with {0} are marked as required

Statements containing {0} to describe cardinality translate to 'does not contain'. Currently, these statements are being evaluated as if they are required to contain at least 1 match.
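One way to avoid special-casing {0} is to turn every cardinality into a (minimum, maximum) pair and compare the statement count against it. The sketch below assumes the cardinality is available as a string token; the function names are hypothetical and not taken from shape.py.

```python
import re


def parse_cardinality(token):
    """Return (minimum, maximum) for a ShEx cardinality token.

    A maximum of None means unbounded. An empty token means exactly one.
    """
    shorthand = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}
    if token in shorthand:
        return shorthand[token]
    match = re.fullmatch(r"\{(\d+)(,(\d+|\*)?)?\}", token)
    if match is None:
        raise ValueError(f"unrecognised cardinality: {token!r}")
    minimum = int(match.group(1))
    if match.group(2) is None:           # {m}
        return minimum, minimum
    upper = match.group(3)
    if upper is None or upper == "*":    # {m,} or {m,*}
        return minimum, None
    return minimum, int(upper)           # {m,n}


def count_is_allowed(count, token):
    """True when the number of matching statements satisfies the cardinality."""
    minimum, maximum = parse_cardinality(token)
    return count >= minimum and (maximum is None or count <= maximum)


print(parse_cardinality("{0}"))    # (0, 0)
print(count_is_allowed(0, "{0}"))  # True
print(count_is_allowed(1, "{0}"))  # False
```

With this representation, {0} is simply the case where both bounds are zero, so a statement that is present fails and a missing one passes.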

All entityschemas should return a 200 response

Currently, a number of entityschemas fail to parse correctly and return a 500 error, even when the entityschema is valid. Any valid entityschema should return a 200 response along with some sort of result.

The following entityschemas return a 500 response:
- E1 - ShExR
- E2 - Wikimedia
- E3 - Wikidata Item
- E4 - Labels/Descriptions
- E5 - Statement - Blank schema
- E6 - Language mappings - Blank Schema
- E7 - Citation - Blank Schema
- E8 - External RDF - Blank Schema
- E9 - Wikidata-Wikibase - Blank Schema
- E16 - Software Titles
- E37 - human gene
- E38 - human protein
- E39 - Reactome Pathway
- E44 - University Teacher
- E49 - Wikidata prefixes
- E53 - sportsperson
- E55 - programming language

- E59 - evidence and conclusion ontology term
- E70 - Clinical Interpretations of Variants in Cancer
- E72 - pharmaceutical drug
- E74 - pseudogene
- E75 - gene
- E86 - native Wikipathways schema
- E87 - biological pathway in Wikidata
- E89 - Public library branch in The Netherlands
- E90 - Public library organisation in The Netherlands
- E93 - FLOSS emulator
- E96 - dummy - Blank Schema
- E99 - statue
- E100 - city
- E103 - gene variant according to myvariant.info
- E117 - newspaper with direct claim properties only
- E118 - virtual assistant
- E121 - [empty schema] - Blank Schema
- E122 - [empty schema] - Blank Schema
- E123 - Sandbox schema
- E124 - [empty schema] - Blank Schema
- E128 - extrasolar planet
- E129 - one-of-a-kind computer
- E132 - web comic
- E150 - Specific event in figure skating
- E165 - virus gene
- E166 - [empty schema] - Blank Schema
- E169 - virus protein
- E175 - [empty schema] - Blank Schema
- E176 - Chilean astronomers
- E180 - [empty schema] - Blank Schema
- E181 - [empty schema] - Blank Schema
- E182 - [empty schema] - Blank Schema
- E183 - Chilean Women Football Players
- E187 - hospital
- E194 - Complex Portal entity
- E221 - YouTube - Blank Schema
- E226 - Swedish Academy Chair
- E227 - Gender
- E245 - Unicode plane
- E246 - Unicode block
- E247 - Unicode character
- E251 - non-coding RNA
- E252 - non-coding RNA gene
- E258 - Genewiki schema
- E259 - Wikibase property
- E261 - Fredmans Epistel places - Blank Schema
- E262 - Fredman Epistels person - Blank Schema
- E263 - Type specimens of Oxalis
- E265 - Gene Wiki SARS-COV2 primary sources
- E266 - Gene Wiki SARS-COV2 external identifiers
- E269 - monument historique français
- E570 - recently deceased humans - Blank Schema
- E999 - Borked
- E12345 - Sandbox Schema

Total: 69/272 failures (~25%)
Total: 53/272 failures (~19.5%)
Total: 39/272 failures (~14.5%)
Total: 17/272 (6.25%)
Total: 10/301 (3.32%)

Add support for Wikibases other than Wikidata

EntityShape currently seems to assume it will only ever be run on Wikidata and contains hardcoded URIs in both the Python and the JS code, not only in obvious places like _get_property_name and _get_entity_json, but also in _strip_schema_comments and _compare_statements.

In practice, Wikidata does not contain all data, and in my use-case I wanted to check our Q3 against our EntitySchema:E1 - which as documented in that schema works with the ShEx2 validator once CORS is whitelisted for that domain with CORS Everywhere.

Seeing this script recommended in Wikidata's shape tutorial, I tried to adapt the JS by changing the hardcoded URLs within it and using mw.loader.load() on it. However, this failed. Looking closer, I saw it passes entity and property IDs into the API, not URIs/IRIs.

It would be nice if it could work with a different base URI that was passed into it, or potentially across disparate URIs for entity and schema, to aid use of Wikibase by third-parties and federation between these and Wikidata. Failing that, moving hardcoded URIs into centrally-defined constants would probably make it easier to reuse the code if hosted elsewhere (e.g. by @wbstack).
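A minimal sketch of the "centrally-defined constants" option is shown below. The class and field names are hypothetical, and the third-party values use a placeholder domain; real Wikibase installations may use different paths and URI prefixes, so each value would need to be confirmed per instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikibaseConfig:
    """URL templates for one Wikibase instance (names assumed for this sketch)."""
    entity_data_url: str     # template with an {entity_id} placeholder
    schema_text_url: str     # template with a {schema_id} placeholder
    entity_uri_prefix: str   # prefix used for entity URIs in the RDF

    def entity_json_url(self, entity_id):
        return self.entity_data_url.format(entity_id=entity_id)

    def schema_url(self, schema_id):
        return self.schema_text_url.format(schema_id=schema_id)


WIKIDATA = WikibaseConfig(
    entity_data_url="https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json",
    schema_text_url="https://www.wikidata.org/wiki/Special:EntitySchemaText/{schema_id}",
    entity_uri_prefix="http://www.wikidata.org/entity/",
)

# Placeholder values for a third-party instance; not a real site.
EXAMPLE_WIKIBASE = WikibaseConfig(
    entity_data_url="https://wikibase.example/wiki/Special:EntityData/{entity_id}.json",
    schema_text_url="https://wikibase.example/wiki/Special:EntitySchemaText/{schema_id}",
    entity_uri_prefix="https://wikibase.example/entity/",
)

print(EXAMPLE_WIKIBASE.entity_json_url("Q3"))
print(EXAMPLE_WIKIBASE.schema_url("E1"))
```

Passing one such object into the functions that currently build URLs would keep every instance-specific value in a single place, and using separate templates for entities and schemas leaves room for the case where they live on different hosts.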

Add support for lexemes

Lexemes currently get the "failed to validate schema" error message.

Support for lexemes needs to be added to the api. This requires determining the important prefixes used by lexemes and translating them into properties, forms and senses in shape.py. The analysis in compareshape.py will presumably also have to take into account forms and senses.

The userscript may also need to be updated if the html element ids and classes are different to the ones that are used in entities.

An initial partial solution may be to ignore certain prefixes and only analyse properties (which the api can currently do) so there's at least partial support available.

Add support for "or"

Currently, "or" and "|" are ignored when generating a shape, so "x or y" is interpreted as "x and y".

Need to decide how to represent "or" in a shape and then how to take "or" into account when comparing the shape to an entity.
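One possible representation, offered only as a starting point for that decision, is a nested expression in which "and" and "or" nodes hold child expressions and leaf nodes name a property. The dictionary layout and function name below are assumptions, not the shape format entityshape currently generates, and the sketch treats "or" as inclusive; a stricter reading of "|", in which only one branch may match, would need a different check.

```python
def evaluate(expression, counts):
    """Evaluate a nested and/or expression against property statement counts.

    Leaf:   {"property": "P21"}              true when the property is present
    Branch: {"type": "or" | "and", "of": [...]}
    """
    if "property" in expression:
        return counts.get(expression["property"], 0) >= 1
    children = [evaluate(child, counts) for child in expression["of"]]
    if expression["type"] == "or":
        return any(children)    # inclusive: at least one branch holds
    return all(children)        # "and": every branch holds


x_or_y = {"type": "or", "of": [{"property": "P1"}, {"property": "P2"}]}
print(evaluate(x_or_y, {"P1": 1}))  # True
print(evaluate(x_or_y, {}))         # False

x_and_y = {"type": "and", "of": [{"property": "P1"}, {"property": "P2"}]}
print(evaluate(x_and_y, {"P1": 1}))  # False
```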
