operations's Issues
Harvest mapping for each of the 8PRs
Review composite mapping of prefixes and namespaces
- Ferret out any cross-registrations that do not match exactly on prefix but do match on other fields.
- Assess the text-edit distance, or another easy-to-compute similarity measure, across all of the title fields.
- Based on the above, provide two pieces of data for each prefix in the list: 1) a score indicating how likely the prefix in one row is to be related to a prefix in another row (e.g., corresponding to the same dataset or to portions of that dataset, as with KEGG-disease vs. KEGG-protein); 2) the corresponding prefix to investigate.
- Re-arrange the list so that related prefixes are clustered together for easier curation (?)
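The scoring step above could be prototyped roughly as follows. This is only a sketch: the rows are made-up examples rather than harvested registry data, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure is eventually chosen.

```python
from difflib import SequenceMatcher

# Hypothetical (prefix, title) rows; real rows would come from the harvested mappings.
ROWS = [
    ("kegg.disease", "KEGG Disease"),
    ("kegg.protein", "KEGG Protein"),
    ("hgnc", "HGNC gene"),
    ("hgnc.symbol", "HGNC gene symbol"),
    ("go", "Gene Ontology"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two titles, in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(rows):
    """Map each prefix to (score, closest other prefix) based on title similarity."""
    result = {}
    for i, (prefix, title) in enumerate(rows):
        candidates = [
            (similarity(title, other_title), other_prefix)
            for j, (other_prefix, other_title) in enumerate(rows)
            if i != j
        ]
        result[prefix] = max(candidates)
    return result


if __name__ == "__main__":
    for prefix, (score, other) in best_matches(ROWS).items():
        print(f"{prefix}: closest={other} score={score:.2f}")
```

Sorting the rows by their closest prefix would be one simple way to get related entries next to each other for curation.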
Design front end concept
First front end concept is in https://docs.google.com/drawings/d/1O-9VUcUExZBgGf5d-jAVlV0AsaMQdjK6ovooAVONh2k/edit. After evaluating desired functions, consider additional options as appropriate.
Develop workflows that include bi-directional synchronization
Determine which prefixes are cross-registered
Standardize the syntax of example URLs
E.g., with [example-id], as is done in GO, or with #, as is done elsewhere.
Unfortunately, we cannot simply append the ID to the end of the URL, because sometimes more needs to be appended after it (e.g., .html).
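A minimal sketch of why an explicit placeholder helps: the ID can be substituted anywhere in the URL, including before a suffix. The `$id` token and the example.org URL are assumptions for illustration only; no placeholder syntax has been agreed on here.

```python
PLACEHOLDER = "$id"  # assumed token; [example-id] and # are the alternatives mentioned above


def expand(template: str, local_id: str) -> str:
    """Substitute the local ID into a URL template that contains the placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template}")
    return template.replace(PLACEHOLDER, local_id)


if __name__ == "__main__":
    # Hypothetical template with a suffix after the ID.
    print(expand("http://example.org/records/$id.html", "2674"))
```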
Design embeddable widget
[ ] Wireframe
[ ] Get feedback from repositories
HGNC as use case of multiple identifier complexities
HGNC is an example collection with four co-occurring identifier complexities:
1. Ambiguity about what $id even is.
The identifiers.org record above captures the fact that HGNC records exist in 3rd party databases but identifiers.org doesn't have a strong concept of a prefix; consequently it isn't possible to get to both "physical locations" of the entity using a single (equivalent) $id. In one case $id is prefixed, and in the other, it is not. HGNC, mercifully, honors both forms. However:
- Other data providers may not be as forgiving as HGNC is
- More often than not, variation in the local ID pattern is precisely what the data provider relies on to redirect to the right type-specific path.
A stronger notion of prefix is the simplest thing that would help data integrators collapse the following as equivalent HTTP identifiers, since 2674 is the invariant part of the ID.
Given the identifiers.org data model, there is no way to determine whether http://identifiers.org/hgnc/hgnc:2674 points to the same entity as http://identifiers.org/hgnc/2674. This is why I favor developing a bare-CURIE-based resolver like http://n2t.net/hgnc:2674 or, if identifiers.org is interested in doing so, http://identifiers.org/hgnc:2674.
This would allow us to determine that all of these are talking about the same entity:
Authoritative sources:
- http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=hgnc:$localid
- http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$localid
Identifier resolvers:
- http://identifiers.org/hgnc/$localid
- http://identifiers.org/hgnc/hgnc:$localid
- http://identifiers.org/hgnc/HGNC:$localid
- http://n2t.net/HGNC:$localid
Third party content providers
- http://hgnc.bio2rdf.org/describe/?url=http://bio2rdf.org/hgnc:$localid
- https://monarchinitiative.org/resolve/HGNC:$localid
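To illustrate how a stronger notion of prefix lets these forms collapse, here is a rough sketch that reduces each URL pattern listed above to one bare CURIE. Emitting an uppercase `HGNC:` prefix as the canonical form is an assumption, as is the shortcut of treating the last `/`- or `=`-delimited segment as the identifier.

```python
import re

# Optional prefix in any case, followed by the invariant numeric part.
HGNC_ID = re.compile(r"^(?:hgnc:)?(\d{1,5})$", re.IGNORECASE)


def to_curie(identifier: str) -> str:
    """Reduce an HGNC URL or local ID to a bare CURIE such as HGNC:2674."""
    # Assumption: the identifier is the last segment after '/' or '='.
    local = re.split(r"[/=]", identifier)[-1]
    match = HGNC_ID.match(local)
    if match is None:
        raise ValueError(f"not a recognizable HGNC numeric ID: {identifier}")
    return f"HGNC:{match.group(1)}"


if __name__ == "__main__":
    for url in (
        "http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=hgnc:2674",
        "http://identifiers.org/hgnc/2674",
        "http://hgnc.bio2rdf.org/describe/?url=http://bio2rdf.org/hgnc:2674",
        "https://monarchinitiative.org/resolve/HGNC:2674",
    ):
        print(to_curie(url))
```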
2. Multiple entity types (Genes and Gene families)
| Identifiers.org namespace | Regex | URI pattern | Example |
|---|---|---|---|
| hgnc | `^((HGNC\|hgnc):)?\d{1,5}$` | http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$id | 2674 |
| hgnc.family | `^[A-Z0-9-]+(#[A-Z0-9-]+)?$` | http://www.genenames.org/genefamilies/$id | PADI |
| hgnc.symbol | `^[A-Za-z-0-9_]+(@)?$` | http://www.genenames.org/cgi-bin/gene_symbol_report?match=$id | DAPK1 |
3. Multiple identifier types (alphanumeric symbol and numeric ID)
4. Type-specific URL patterns combined with lack of deterministic typing in local ID
Consequently, you have to know what you're looking at before you can know where to resolve it. Note that the lack of deterministic typing in the local ID is not a problem unless you need type-specific URLs the way HGNC does.
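The ambiguity can be checked directly against the three Identifiers.org patterns for hgnc, hgnc.family, and hgnc.symbol. The sketch below uses those regexes with the alternation in the hgnc pattern written as `|`; the helper name `candidate_namespaces` is hypothetical.

```python
import re

# Patterns as registered for the three HGNC namespaces.
PATTERNS = {
    "hgnc": re.compile(r"^((HGNC|hgnc):)?\d{1,5}$"),
    "hgnc.family": re.compile(r"^[A-Z0-9-]+(#[A-Z0-9-]+)?$"),
    "hgnc.symbol": re.compile(r"^[A-Za-z-0-9_]+(@)?$"),
}


def candidate_namespaces(local_id: str) -> list:
    """Return every namespace whose pattern accepts the given local ID."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(local_id)]


if __name__ == "__main__":
    for local_id in ("2674", "PADI", "DAPK1", "HGNC:2674"):
        print(local_id, candidate_namespaces(local_id))
```

In this sketch only the prefixed form `HGNC:2674` matches a single pattern; the bare examples each match more than one, so the local ID alone does not say which URL pattern to use.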
Sorry to bug you @KrisGray, you're listed on the HGNC github; could you comment as to whether there's a single URL that can be used across types of IDs in HGNC? (family, symbol, numeric ID) so that we can address at least number 4 on the list?
Of cross-registered prefixes, determine which refer to the same resources and which are actual collisions
Refine core data model to include flag for isDeprecated
In order to allow us to collapse entries that are based on out-of-date info
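One possible shape for such a flag, as a sketch only: the record type and its field names (`prefix`, `url_template`, `is_deprecated`) are hypothetical and not taken from the existing data model, and the example.org URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixRecord:
    prefix: str
    url_template: str
    is_deprecated: bool = False  # hypothetical flag for out-of-date entries


def collapse(records):
    """Keep one record per prefix, preferring a non-deprecated record when one exists."""
    chosen = {}
    for record in records:
        current = chosen.get(record.prefix)
        if current is None or (current.is_deprecated and not record.is_deprecated):
            chosen[record.prefix] = record
    return list(chosen.values())


if __name__ == "__main__":
    records = [
        PrefixRecord("hgnc", "http://old.example.org/$id", is_deprecated=True),
        PrefixRecord("hgnc", "http://new.example.org/$id"),
    ]
    print(collapse(records))
```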
Tighten up documentation and resolution behavior for OBO ontologies in Identifiers.org
Provenance tracking at level of prefix and URI
Pull in DataCite info too
Evaluate required functions
Review user scenarios at https://docs.google.com/document/d/1DxU2IN56fUoASxaiEOsn2NDtjRGWw5rK_BxRAKcyvwU/edit, evaluate for applicability to current hackathon. Consider how to address in UI (#9).
Document locations of dumps for each of the 8PRs