Comments (4)
I think the length field is being used to order the synonyms when they get grouped by CURIE. But if we put all the names in one document, we wouldn't need to do that (and could therefore drop the field). I think it's worth giving this a shot and seeing whether it works.
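Here is a minimal sketch of what the length field currently provides. It assumes the field is used to sort each CURIE's synonyms shortest-first when per-synonym documents are grouped. The sort direction is an assumption. In the example documents quoted in this thread, length always equals the character count of name. Once all names live in one document, the same order could therefore be computed at load time.

```python
from collections import defaultdict

# The four per-synonym documents for CHEBI:74925 quoted in this thread.
DOCS = [
    {"curie": "CHEBI:74925", "length": 42, "id": 192534,
     "name": "beta-site APP-cleaving enzyme 1 inhibitors"},
    {"curie": "CHEBI:74925", "length": 22, "id": 192535,
     "name": "EC 3.4.23.46 inhibitor"},
    {"curie": "CHEBI:74925", "length": 23, "id": 192536,
     "name": "EC 3.4.23.46 inhibitors"},
    {"curie": "CHEBI:74925", "length": 36, "id": 192537,
     "name": "EC 3.4.23.46 (memapsin 2) inhibitors"},
]

def group_by_curie(docs):
    """Group synonym names by CURIE, ordered by the stored length field.

    Assumption: shorter synonyms should come first.
    """
    groups = defaultdict(list)
    for doc in sorted(docs, key=lambda d: d["length"]):
        groups[doc["curie"]].append(doc["name"])
    return dict(groups)

# The stored length is just len(name), so it could be derived instead of stored.
assert all(d["length"] == len(d["name"]) for d in DOCS)

print(group_by_curie(DOCS))
```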
Of course, the downside here is that it makes the Solr documents even bigger.
The current docs look like this, I think:
{
"curie": "CHEBI:74925",
"length": 42,
"id": 192534,
"name": "beta-site APP-cleaving enzyme 1 inhibitors"
},
{
"curie": "CHEBI:74925",
"length": 22,
"id": 192535,
"name": "EC 3.4.23.46 inhibitor"
},
{
"curie": "CHEBI:74925",
"length": 23,
"id": 192536,
"name": "EC 3.4.23.46 inhibitors"
},
{
"curie": "CHEBI:74925",
"length": 36,
"id": 192537,
"name": "EC 3.4.23.46 (memapsin 2) inhibitors"
},
If we need to add the type and preferred name to every element, then oof.
I wonder if we could transform the docs to something more like
{
"curie": "...",
"names": ["name1", "name2", "name3"],
"type": "",
"preferred name": ""
}
That way, the percentage increase from adding the new info is offset by the reduction in CURIE repetition.
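One rough way to check this trade-off is the Python sketch below. It builds both layouts for the CHEBI:74925 example. The first copies a type and preferred name onto every per-synonym document. The second stores one document per CURIE with a names list. The Biolink type and preferred name are placeholder values, not the real ones. Serialized JSON length is also only a crude proxy for Solr index size.

```python
import json
from collections import defaultdict

NAMES = [
    "beta-site APP-cleaving enzyme 1 inhibitors",
    "EC 3.4.23.46 inhibitor",
    "EC 3.4.23.46 inhibitors",
    "EC 3.4.23.46 (memapsin 2) inhibitors",
]
# Current layout: one document per synonym (ids as quoted in this thread).
FLAT = [
    {"curie": "CHEBI:74925", "length": len(n), "id": 192534 + i, "name": n}
    for i, n in enumerate(NAMES)
]
# Placeholder metadata; not the real values for CHEBI:74925.
INFO = {"CHEBI:74925": {"type": "biolink:ChemicalEntity",
                        "preferred_name": "<preferred name>"}}

def flat_with_extras(docs, info):
    """Option A: repeat type and preferred name on every synonym document."""
    return [{**d, **info[d["curie"]]} for d in docs]

def one_doc_per_curie(docs, info):
    """Option B: a single document per CURIE holding all of its names."""
    names = defaultdict(list)
    for d in docs:
        names[d["curie"]].append(d["name"])
    return [{"curie": c, "names": ns, **info[c]} for c, ns in names.items()]

def json_size(docs):
    """Length of the compact JSON serialization, as a rough size proxy."""
    return len(json.dumps(docs))

option_a = flat_with_extras(FLAT, INFO)
option_b = one_doc_per_curie(FLAT, INFO)
print(json_size(option_a), json_size(option_b))
```

With only four synonyms the grouped layout is already smaller. The gap should widen for CURIEs with many synonyms, because the repeated curie, type, and preferred-name fields are paid once instead of once per name.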
> If we need to add the type and preferred name to every element, then oof.
The other option would be for NameRes to include a Redis table mapping each ID to its canonical name and Biolink type, but that would complicate the backend.
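For comparison, here is a sketch of the side-table alternative. A plain Python dict stands in for the Redis table, and both the table contents and the `enrich` helper are hypothetical. The Solr documents stay lean, but every result needs a second lookup against another service. That extra lookup is where the added backend complexity comes from.

```python
# Hypothetical stand-in for a Redis table: CURIE -> (canonical name, Biolink type).
CANONICAL = {
    "CHEBI:74925": ("<canonical name>", "biolink:ChemicalEntity"),
}

def enrich(hits):
    """Attach canonical name and Biolink type to lean per-synonym Solr hits."""
    enriched = []
    for hit in hits:
        # A missing entry leaves both fields as None rather than raising.
        label, biolink_type = CANONICAL.get(hit["curie"], (None, None))
        enriched.append({**hit, "label": label, "type": biolink_type})
    return enriched

print(enrich([{"curie": "CHEBI:74925", "name": "EC 3.4.23.46 inhibitor"}]))
```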
> I wonder if we could transform the docs to something more like
I like this! NameRes is loaded from the synonym file, which is currently in the format:
id [tab] synonym type [tab] synonym
I think if we modify this to:
id [tab] biolink type [tab] canonical label [tab] synonym type [tab] synonym
We could use Solr updates to load it in the format you suggest. The synonym files would then get much larger, but the Solr database (which is the piece we have to transfer from RENCI to ITRB) might end up smaller because it would contain fewer documents.
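Here is a minimal sketch of such a loader. It assumes the proposed five tab-separated columns and that the synonym type can be left out of the Solr document. The function name and the example rows, including the synonym-type values, are hypothetical. The output is a JSON array of documents, which is a shape Solr's JSON update handler accepts. The actual upload is not shown.

```python
import json

def load_grouped(lines):
    """Turn 5-column synonym rows into one Solr document per CURIE.

    Columns: id, biolink type, canonical label, synonym type, synonym.
    """
    docs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        curie, biolink_type, label, _synonym_type, synonym = line.split("\t")
        doc = docs.setdefault(curie, {
            "curie": curie,
            "names": [],
            "type": biolink_type,
            "preferred_name": label,
        })
        # Keep each synonym once, even if it appears with several synonym types.
        if synonym not in doc["names"]:
            doc["names"].append(synonym)
    return list(docs.values())

rows = [
    "CHEBI:74925\tbiolink:ChemicalEntity\t<canonical label>\texact\tEC 3.4.23.46 inhibitor\n",
    "CHEBI:74925\tbiolink:ChemicalEntity\t<canonical label>\trelated\tEC 3.4.23.46 inhibitors\n",
]
print(json.dumps(load_grouped(rows), indent=2))
```

The sketch takes the type and canonical label from the first row seen for each CURIE. This relies on those two columns being identical across a CURIE's rows, which the proposed format implies but does not enforce.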
Is it okay to drop the length field? Is it currently being used?