snowstorm's Introduction

Snowstorm Terminology Server


Snowstorm is an open source terminology server with special support for SNOMED CT. It is built on top of Elasticsearch, with a focus on performance and enterprise scalability.

SNOMED International is not able to offer commercial support for this product. Support is provided by the community via this repository.

APIs

Snowstorm has two APIs:

  • HL7 FHIR API 🔥
    • Implements the Terminology Module
    • Recommended for implementers
    • Supports SNOMED CT, LOINC, ICD-10, ICD-10-CM and other code systems
  • Specialist SNOMED CT API
    • Supports the management of SNOMED CT code systems
    • Supports the SNOMED CT Browser
    • Supports authoring SNOMED CT editions

Advice for Implementers

SNOMED International recommends that implementers of SNOMED CT use a terminology service, such as Snowstorm, and a standard interface, such as the HL7 FHIR API.

This approach allows loose coupling of applications as well as access to powerful terminology features.

Snowstorm is a good choice for teams who are just getting started or who have terminology and technical support capability. Other terminology servers are also available; some offer commercial support.

SNOMED CT Browser Support

Snowstorm provides the terminology server API for the SNOMED International Browser, including the International Edition and around fourteen national Editions.

Snowstorm can be used in local implementations to query SNOMED CT with the following features:

  • Hosting multiple extensions alongside the International Edition of SNOMED CT
  • Multi-lingual search and content retrieval
  • Full ECL v2.0 compliance
  • Full history (requires a full RF2 import)
  • Read-only FHIR API 🔥

Authoring Use

Snowstorm also provides the terminology server API for the SNOMED International Authoring Platform.

The Authoring Platform is used for the maintenance of the International Edition of SNOMED CT as well as nine national Editions and several community content Extensions.

Documentation

Contributing

We welcome questions, ideas, issues and code contributions to this project.

Use the issues page to get in touch with the community.

If you would like to make a code contribution please fork the repository and create a GitHub pull request to the develop branch.

License

Apache 2.0

See the included LICENSE file for details.

Tools

For Java performance profiling we recommend the JProfiler Java profiler.


snowstorm's Issues

Repeated warnings: number of terms exceeded allowed maximum

I noticed a large number of WARN events (repeated 98 times) being written to the Elasticsearch log while I had an import in progress.

[2018-11-08T23:38:06,216][WARN ][o.e.d.i.q.TermsQueryBuilder] Deprecated: the number of terms [342029] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting.

The same warning was returned to the Elasticsearch RestClient and logged by Snowstorm (12 times).

2018-11-08 23:40:57.115 WARN 155 --- [/O dispatcher 1] org.elasticsearch.client.RestClient : request [GET http://localhost:9200/es-query/query-concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.4.2-04711c2 "Deprecated: the number of terms [384272] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting." "Thu, 08 Nov 2018 23:40:53 GMT"]

What is the status of this warning?

  • Is this an unexpected warning?
  • Or is it expected, but not a concern?
  • Or is it expected, and I should adjust my settings in order to avoid printing the warning?

It looks like this warning will become a hard error in 7.0.
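
If the warning is expected and you simply want to raise the limit, the message names the relevant setting; a hedged sketch of raising it on the affected index (index name es-query taken from the log above, the value is illustrative):

curl -X PUT 'http://localhost:9200/es-query/_settings' -H 'Content-Type: application/json' -d '{ "index.max_terms_count": 500000 }'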

Additional Details

  • Elasticsearch 6.4.2
  • Snowstorm 2.1.0

Addition of read-only option

Some users will want to run Snowstorm in a read-only fashion, in which case parts of the functionality are not needed, including the allocation of SCT identifiers and authentication tokens; removing them would make deployments more straightforward.
The simple recommendation is a startup flag for read-only use that bypasses and disables all write functionality.
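
For illustration, a startup along these lines (the --read-only flag is hypothetical, not an existing option):

java -Xmx4g -jar snowstorm.jar --read-only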

Snowstorm on Google Kubernetes Engine

I was able to run it locally using the latest jar file (3.0.3) and Elasticsearch 6.5.4. I also created Docker images with the help of docker-compose.yml and pushed the Elasticsearch and Snowstorm images to GCP Kubernetes. I was able to run Elasticsearch on GCP after increasing the maximum virtual memory, but the Snowstorm workload is not running; it shows an error while connecting to Elasticsearch.

Error - 2019-06-03 12:28:07.374 ERROR 1 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : UncategorizedExecutionException[Failed execution]; nested: ExecutionException[java.net.ConnectException: Connection refused]; nested: ConnectException[Connection refused];; java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused

Elasticsearch is running on its own IP (load balancer) on port 80, though the target port is 9200. I tried to change the port to 9200 but then it goes into an unhealthy state, so currently Elasticsearch is running with port=80 and target port=9200.
I don't know how to link Snowstorm with the Elasticsearch service running on GCP. Please provide some guidelines or a document for the deployment process.
Thank you!
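
In case it helps others, Snowstorm can be pointed at a remote Elasticsearch at startup, assuming the elasticsearch.urls property in your version's application.properties (the service hostname here is a placeholder for your cluster-internal Elasticsearch service):

java -Xms2g -Xmx4g -jar snowstorm.jar --elasticsearch.urls=http://elasticsearch-service:9200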

^ character not accepted in http requests

When trying to run the following ECL - ^ 733990004 |Nursing activities reference set| - the following exception is thrown by snowstorm:

2018-04-19 09:50:21.334 INFO 1 --- [nio-8080-exec-1] o.apache.coyote.http11.Http11Processor : Error parsing HTTP request header
java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
	at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:476) ~[tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:687) ~[tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
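
This is Tomcat rejecting characters such as ^ and | in the request line, as RFC 7230/3986 require; the usual workaround is to percent-encode the ECL in the URL (^ as %5E, | as %7C, space as %20), for example:

curl 'http://localhost:8080/MAIN/concepts?ecl=%5E%20733990004%20%7CNursing%20activities%20reference%20set%7C'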

Inferred vs. stated relationships for parents

I am seeing something odd when hitting the GET /browser/{branch}/concepts/{conceptId}/parents endpoint between "inferred" versus "stated". It may be my misunderstanding of something in SNOMED.

Working with the veterinary extension the following concept - 354541000009105 (Castrated male) is giving two completely different concepts as parents depending on whether I use "inferred" or "stated" for the form parameter.

For "inferred" it is returning concept 248153007 (Male).
For "stated" it is returning concept 106106004 (Male reproductive finding)

I looked in the Relationship RF2 file in the veterinary extension; both of those is-a relationships are defined there but have different characteristicTypeId values. The relationship to 248153007 has 900000000000011006 (inferred relationship) and the one to 106106004 has 900000000000010007 (stated relationship). So I think that explains why it is showing up the way it is.

So my question is does setting the form parameter to inferred only return the relationships with the explicit inferred relationship and not return any with stated and vice versa? If so, then doesn't this mean that in order to get all parents one would have to call this endpoint twice?
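
If so, a complete picture would indeed take two calls, along these lines (host and branch are placeholders):

curl 'http://localhost:8080/browser/MAIN/concepts/354541000009105/parents?form=inferred'
curl 'http://localhost:8080/browser/MAIN/concepts/354541000009105/parents?form=stated'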

[Feature Request] Create HELM chart for Kubernetes Deployment

It would be very useful to have a public Helm chart containing the Kubernetes descriptors.
A Helm chart would also make it possible to parameterise values for different environments, as well as secrets (for keys).

Once the Helm chart is provided it can be submitted to the main Helm chart repositories.

Incorrectly edited docker-compose?

I was just trying to set up this project with Docker Compose and I noticed that the latest commit seems to delete critical lines of the file that appear unrelated to the change.

Was this deleted by accident?

Also, it seems that packaging requires Elasticsearch:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.rest.ElasticsearchRestClient]: Circular reference involving containing bean 'testConfig' - consider declaring the factory method as static for independence from its containing instance. Factory method 'elasticsearchClient' threw exception; nested exception is pl.allegro.tech.embeddedelasticsearch.EmbeddedElasticsearchStartupException: Failed to start elasticsearch. Check previous logs for details

I was trying to make the Dockerfile do RUN mvn clean package but it fails, as it seems to expect Elasticsearch to be available at package time.
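
If the embedded Elasticsearch is only started by the test phase, one hedged workaround is to skip tests during packaging:

mvn clean package -DskipTests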

Incorrect ECL Ancestor Count After Importing Upgrade RF2 Delta

After importing the July 2018 International Snapshot and then the Jan 2019 International Delta, a handful of concepts have an incorrect ancestor count.
For example, >125021000119107 gives 48 results rather than the expected 49 because ancestor 195967001 has been lost.
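
For anyone reproducing this, the ancestor query can be issued with > percent-encoded as %3E (host is a placeholder):

curl 'http://localhost:8080/MAIN/concepts?ecl=%3E125021000119107'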

Language support in FHIR API

As far as I can see, the FHIR API does not support any languages other than the language imported into the MAIN branch. I am running Snowstorm with the International release in the MAIN branch and the Swedish edition in MAIN/SNOMEDCT-SE. I've tested the FHIR lookup operations on CodeSystem and they always use the MAIN branch, so I am not able to get the Swedish translations through FHIR. The regular REST API works using the Accept-Language HTTP header, so I have verified that the Swedish release has been imported correctly.
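
For reference, this is the kind of request I would expect to honour the language (displayLanguage is a standard $lookup parameter; whether this Snowstorm version supports it, and whether the FHIR endpoint is mounted at /fhir in your deployment, are assumptions):

curl 'http://localhost:8080/fhir/CodeSystem/$lookup?system=http://snomed.info/sct&code=73211009&displayLanguage=sv'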

Parents endpoint seems to provide ancestors in response

Hi all,

The parents endpoint seems to provide the ancestors in its response. As far as I understand the ECL query system, these two queries should give the same response;

  • /browser/MAIN/concepts/19431000/parents?form=inferred
  • /MAIN/concepts?ecl=>!19431000

Am I missing something (i.e. direct parents != inferred parents), or is the parents endpoint misbehaving?

Best regards,
Sander

Incorrect concept search total results size

The concept search endpoint is returning an incorrect total results size when performing a simple logical search like activeFilter=true. The number of results on that page is reported as the total number of results available.

Add Filter semantic Tag

How can I search for a term based on its semantic tag?
For example, I want to search for the term 'heart attack' within the disorder semantic tag.
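
Depending on your Snowstorm version, the description search endpoint may accept a semantic tag filter; check the Swagger UI for the exact parameter. A hypothetical example (the semanticTag parameter name is an assumption):

curl 'http://localhost:8080/browser/MAIN/descriptions?term=heart%20attack&semanticTag=disorder'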

Large number of ES scroll contexts created during import

During the initial snapshot import of the SNOMED CT RF2 file there are up to 10K Elasticsearch scroll contexts open at once. This may be slowing down the import. We should investigate whether these search contexts can be closed more quickly in code, perhaps in the spring-data-elasticsearch layer, rather than relying on the scroll context timeout, which is likely what happens at the moment.

See the open_contexts stat when importing here http://localhost:9200/_nodes/stats

Thanks to @rorydavidson for finding this.
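
The counter can be narrowed to the search stats, where open_contexts is reported per node:

curl -s 'http://localhost:9200/_nodes/stats/indices/search'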

Some ECL results missing

When performing an ECL search using attributes some concepts are not returned as expected.
For example *:363698007=* does not return concept 34020007 |Streptococcal pneumonia| even though this concept has the 363698007 attribute in the inferred form.

Snowstorm induced memory leak in Elasticsearch?

I'm running Snowstorm server in Amazon Web Services, in combination with a hosted Elasticsearch service provided by Amazon Elasticsearch Service. This has been running for about two months now, and today I noticed some undesirable trends in metrics corresponding to our Elasticsearch instance for Snowstorm. In the last 63 days...

  • JVMMemoryPressure increased from 28.5% to 65.9% in a stepwise fashion, with steps occurring approximately every 4 hours. This correlates with a brief spike in DiskQueueDepth, which normally holds at 0.
  • JVMGCYoungCollectionCount and JVMGCYoungCollectionTime are increasing linearly over time, with no apparent connection to the steps shown in JVMMemoryPressure.

(CloudWatch metrics screenshot omitted)

Could Snowstorm be performing some regular, routine process that is leading to the buildup of objects in Elasticsearch and causing a memory leak?

I'm using Snowstorm 2.1.0 and Elasticsearch 6.3.

Quotes not escaped in ECL response

Consider this curl request for the concept "Wallace "69" side-to-end anastomosis":

curl -X GET --header 'Accept-Language: en' 'http://localhost:8080/browser/MAIN/concepts/257751006'

In this case, the quotes in the terms are properly escaped, e.g.:
"Wallace \"69\" side-to-end anastomosis - action (qualifier value)".

However, the same request expressed as an ECL query, like this:

curl 'http://localhost:8080/MAIN/concepts?ecl=%20257751006%20&page=0&limit=1'

will return an invalid JSON response because the quotes are not escaped, e.g.:
"Wallace "69" side-to-end anastomosis - action (qualifier value)"

Unexplained cause of failed import

I'm trying to set up snowstorm for the first time, and I'm running into a bit of trouble. I'm starting up snowstorm and doing the import immediately on launch.

java -Xmx4g -jar /opt/snowstorm-2.1.0.jar --delete-indices --import=/opt/SnomedCT.zip

After running for a while, the import appears to fail.

2018-11-07 18:02:17.624 ERROR 106 --- [pool-5-thread-2] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.625 ERROR 106 --- [ool-5-thread-14] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.752 ERROR 106 --- [pool-5-thread-1] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.757 ERROR 106 --- [ool-5-thread-15] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.788 ERROR 106 --- [pool-5-thread-5] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
[...]
2018-11-07 18:02:17.807 ERROR 106 --- [ main] o.s.s.core.rf2.rf2import.ImportService : Failed RF2 SNAPSHOT import on branch MAIN. ID 1a2bff8f-8a00-4d80-8047-b056b90859fe

I see stack traces for a few occurrences of UncategorizedExecutionException, all of which are caused by java.net.ConnectException (Connection refused). All of this concludes with the Spring application context shutting down.

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2018-11-07 18:02:18.270 ERROR 106 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner

I'm using SnomedCT_RF2Release_CDN_20181031 obtained through Canada Health Infoway.

Shutting down after a failed import seems like a reasonable approach, but none of this output really helps me identify the specific file(s) or line(s) that are causing a problem with import. It would be helpful to provide more information here, perhaps by logging the names of files that are being opened, before they are fully processed.
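
Since the nested cause is Connection refused, it may also be worth checking whether Elasticsearch itself is still reachable when the errors begin, for example:

curl -s 'http://localhost:9200/_cluster/health?pretty'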

Some concepts do not appear in the ECL index

After importing the Jan 2018 RF2 export from the UAT Snow Owl terminology server some concepts are missing from the Snowstorm ECL index.

Searching for descendants and self of the root concept in UAT gives 363509 results but only 354171 in Snowstorm.

Concept 16837005 is an example concept which is missing. This concept can be returned from the browser endpoint but an ECL search with this as the focus concept returns nothing.
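
For reference, the descendants-or-self count can be read from the total field of a query like this (138875005 is the root concept; %3C%3C encodes <<, host is a placeholder):

curl 'http://localhost:8080/MAIN/concepts?ecl=%3C%3C138875005&limit=1'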

Error when using international characters on ECL

Using this content on POST /{branch}/concepts/search search returns an error:

{ "activeFilter": true, "conceptIds": [], "eclFilter": "<<19923001 |catéter (objeto físico)|", "limit": 2, "offset": 0, "statedEclFilter": "", "termFilter": "" }

Returns this result:

{ "error": "INTERNAL_SERVER_ERROR", "message": "Failed to parse ECL '<<19923001 |catéter (objeto físico)|'" }

Removing accented characters from the ECL resolves the issue:

<<19923001 |catéter (objeto físico)| -> <<19923001 |cateter (objeto fisico)|

The term comes from the Spanish Edition. I tested the ECL on the APG parser site and it parses OK with accents.

Thanks

Retrieving descendants - stated versus inferred

Retrieving descendants using either the Swagger UI or a URL does not react to changing the requested state (inferred versus stated).
http://localhost:8080/MAIN/concepts/125589001/descendants?stated=true&offset=0&limit=10000
Results in the same descendants as:
http://localhost:8081/MAIN/concepts/125589001/descendants?stated=false&offset=0&limit=10000
Can anyone confirm this and perhaps help with a workaround?
Best regards

Exit process after import completes

Please consider adding a command line option that causes snowstorm to exit gracefully after finishing an import.

Scenario

I'm running Snowstorm in two ways.

  1. I have snowstorm running as a long-lived, supervised process (webserver mode) that serves responses to client requests.
  2. I have a short-lived snowstorm process that starts up when I need to [re]import the Snomed CT concept database into Elasticsearch. That process runs with options --delete-indices and --import [file]. I would like that version to exit after the import has successfully completed in order to free up system memory, but currently it just continues running indefinitely.

Workaround

Currently my workaround is to set a timeout on the import process so that it is killed after a number of hours, but that's less optimal compared to snowstorm exiting gracefully as soon as the work is done. It means I'm using memory longer than I need to, and it means there's a slight risk the process could be killed before import completes.

Suggestion

I've seen other software use options like --once or --exit. Maybe one of those would fit here?
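
For illustration, the import invocation from my setup combined with such an option (the --exit flag here is hypothetical, just to show the suggestion):

java -Xmx4g -jar /opt/snowstorm-2.1.0.jar --delete-indices --import=/opt/SnomedCT.zip --exit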

Creating a version of an imported extension fails

I am trying to import the SNOMED Veterinary Extension into Snowstorm. I followed the instructions in the "updating-snomed-and-extensions.md" document to load the extension and that went smoothly, except that it didn't create a version. So I tried to create a version using the /codesystems/{shortName}/versions endpoint and that throws an exception with the input:

{
  "description": "SNOMED Veterinary extension April 2019 release",
  "effectiveDate": 20190401
}

Here is the exception:

Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/es-member/_bulk?timeout=1m], status line [HTTP/1.1 413 Request Entity Too Large]

	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:355) ~[elasticsearch-rest-client-6.0.1.jar!/:6.0.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:344) ~[elasticsearch-rest-client-6.0.1.jar!/:6.0.1]
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:123) ~[httpcore-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:439) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:329) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	... 1 common frames omitted
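
The 413 status suggests the bulk request exceeded Elasticsearch's http.max_content_length node setting (default 100mb). A hedged workaround, pending a fix that batches the request, is to raise that limit at node startup or in elasticsearch.yml:

./bin/elasticsearch -E http.max_content_length=200mb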

Snowstorm 2.2.0 Startup Error in AWS when no credentials available

When starting Snowstorm 2.2.0 on an AWS EC2 instance Snowstorm fails to start with the following error:
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain

This only happens if I use an EC2 instance which does not have any credentials set in the environment variables or configured on disk.
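
A crude workaround, assuming Snowstorm does not actually need AWS access in this setup, is to satisfy the SDK's environment-variable credential provider with dummy values before starting Snowstorm:

export AWS_ACCESS_KEY_ID=dummy
export AWS_SECRET_ACCESS_KEY=dummy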

Finding concepts within Indian Refset

Hello,

I had an issue earlier with my SNOMED API server, so I just upgraded to Snowstorm, and I am using the 2017-09-01 version of the SNOMED taxonomy.

I am trying to use the Refsets that were published by India -> https://mlds.ihtsdotools.org/#/viewReleases/viewRelease/194856 and search for concepts within the refset.

I tried using the built-in Swagger API in Snowstorm to search for refset members under the Neurosurgery refset that was published, but I'm still seeing 0 results.

(Swagger screenshots omitted)

Am I doing something wrong? Are there any other steps that I need to do to access a refset's members?

Regards,
Vybhav

Missing concept parents in branch created for extension

I've loaded the veterinary extension, and while searching for concept parents for our species value set I have run into a concept that does not return any parents when I use the branch created for the extension (MAIN/SNOMED-VET) with the findConceptParents endpoint.

The concept is 81260002. When I call the findConceptParents endpoint using the branch MAIN/SNOMED-VET no parents are returned (the list returned is empty). If I specify MAIN as the branch then the parents are returned.

I tried findBrowserConcept with this concept id and the MAIN/SNOMED-VET branch, and the returned value does include the parents in the relationships array.

Allow customizing number of shards

Snowstorm should allow configuring the number of shards per Elasticsearch index, rather than hardcoding the number at 8.

I believe that I would get better performance and a smaller memory footprint if my single-instance Elasticsearch was configured with a smaller number of shards. However, I cannot easily test or validate this assumption because snowstorm does not make this value configurable. (Yes, I could test this if I built snowstorm from source rather than using the release jar.)

I tried assigning a custom index template which specifies index.number_of_shards and index.number_of_replicas, but the hardcoded values in snowstorm take precedence over the index template.

When using the create index API, the settings/mappings defined as part of the create index call will take precedence over any matching settings/mappings defined in the template. - elastic.co
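
For reference, this is roughly the template I assigned (the es-* pattern matches the index names seen elsewhere in these issues; shard and replica values are illustrative), which the explicit settings in Snowstorm's create-index calls then override:

curl -X PUT 'http://localhost:9200/_template/snowstorm-shards' -H 'Content-Type: application/json' -d '{ "index_patterns": ["es-*"], "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 } }'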

Creating new concepts and descriptions in local extension

Hi,

We are currently testing the creation of a local extension with Snowstorm. Ideally, we would like to create the whole basis with the tools provided by Snowstorm.

I tried to create the first module concept with its descriptions and relationships. Apparently it worked, but the created concept has the following key-value: "released": false, no matter what effectiveTime (or releasedEffectiveTime, for relationships) is provided. When I search for the concept with GET /{branch}/concepts, here is the response:

{
  "items": [],
  "total": 1,
  "limit": 50,
  "offset": 0
}

Which, I'm guessing, means that the concept exists but can't be retrieved because it was not released.
Any insight as to why "released" is false and how to solve this?

Note: we do not have RF2 files to import for our extension as the whole test is about starting a new extension from scratch with Snowstorm.

Incomplete swagger API

Upon first running Snowstorm and inspecting Swagger at localhost, I was confused not to find the /import endpoint mentioned in the docs. After importing SNOMED data via the command line, a host of other endpoints appeared: where before there were only GET/retrieve endpoints, afterwards there were many more, including create, delete and the aforementioned import endpoints. Upon returning today after restarting my laptop, the API has reverted to the initial incomplete state. The SNOMED data is still present and the endpoints that are there work correctly.

ECL relationship group 0 mapping

I compared results from Snowstorm 2.1.0 and 2.0.0 as well as the current sct-snapshot-rest-api and OntoServer, and there are some differences. See: https://confluence.ihtsdotools.org/display/SLPG/ECL+and+grouped+attributes?focusedCommentId=78938364#comment-78938364

Copied from above (but formatting seems to have been lost):

  1. << 71388002 | åtgärd | : { 363703001 | har avsikt | = << 129428001 | preventiv avsikt | }
  2. << 71388002 | åtgärd | : 363703001 | har avsikt | = << 129428001 | preventiv avsikt |

implementation | SNOMED CT release | Query 1 | Query 2
sct-snapshot-rest-api, commit 3ce4ab6 | the one I had on my hard drive, likely International 2018-01-31 | 91 | 91
snowstorm, v2.1.0 | SE edition 2018-11-30 | 9 | 523
OntoServer through Shrimp UI, buildid ddd5953f1d34f52fb9f5d79a5d910e5d2f4bfaf4487755d3f8f6a5c7ea12a81c | International 2018-01-31 | 347 | 347
snowstorm through browser, v2.0.0 (https://browser.ihtsdotools.org/ecl/) | International 2018-07-31 | 7 | 347

Expected to have the same results from the two queries.

Can't create RF2 Import via REST API

Importing a snapshot via the command line is okay but creating an import using the REST API is throwing an error. Looks like a Jackson issue - no default constructor in the RF2 config class.

Warn if multiple concept files in RF2 import archive

It has come to my attention that there is one member country SNOMED CT distribution which contains multiple concept files. For example there are two concept snapshot files.

It is not known how Snowstorm will deal with this type of archive. This is not a recommended format for distribution. For now Snowstorm should probably detect this issue and make the import fail.

Ability to import Full history of SNOMED Extensions

Currently the Full import works for an Edition like the International Edition where all concepts are in a single RF2 file. However with an Extension, where the concepts in the RF2 are in addition to the International Edition, there is no easy way to import the full history. Each version in the Extension history should be applied to a different release branch on top of the International Edition content.

Would be a great new feature.

Missing relationships after importing extension

High level summary: after importing the SNOMED International Edition followed by the SNOMED Veterinary Extension, some relationships are missing from the child branch created for the extension when the extension contains inactive relationships with earlier effective times than the International Edition.

Here are the steps I followed

  1. Startup a clean instance of Elasticsearch 6.4.2
  2. Startup snowstorm 2.2.3 using java -Xms2g -Xmx2g -jar target/snowstorm*.jar
  3. Follow the "Loading SNOMED into Snowstorm" guide to import the SNOMED International edition into the MAIN branch
  4. Modify the Veterinary Extension RF2 release files to reformat the effectiveTime in all files to YYYYMMDD format and rezip the release files
  5. Follow the "Loading & updating SNOMED CT with local Extensions or Editions" guide to import the Veterinary Extension into the MAIN/SNOMED-VET branch

After finishing this process the issue is seen by calling the findConceptParents endpoint and specifying the following parameters:

branch = MAIN/SNOMED-VET
conceptId = 81260002
form = inferred
Accept-Language = en-US;q=0.8,en-GB;q=0.6

the response code is 200 and the response body is an empty array ([]).

If I make the same endpoint call but change the branch to MAIN I get one parent returned, conceptId 321351000009104.
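
For anyone reproducing this, the failing call looks like the following (host is a placeholder; depending on your setup the slash in the branch path may need encoding as %2F):

curl -H 'Accept-Language: en-US;q=0.8,en-GB;q=0.6' 'http://localhost:8080/browser/MAIN/SNOMED-VET/concepts/81260002/parents?form=inferred'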

In the SNOMED International edition Relationship file this relationship is present and active with effectiveTime = 20160131:

6412388027	20160131	1	900000000000207008	81260002	321351000009104	0	116680003	900000000000011006	900000000000451002

in the Veterinary extension Relationship file the relationship also exists but is inactive with effectiveTime = 20160130:

739111000009126	20160130	0	332351000009108	81260002	321351000009104	0	116680003	900000000000011006	900000000000451002

I am also attaching the log output from the import of the extension file.

vetext-snowstorm-import-log.txt

Please let me know if there is any other information I can provide or troubleshooting I can help with.

IllegalStateException error "Branch MAIN is already locked" when trying to import new INT release

Hi there,

I'm trying to import the latest International release (20190731) RF2 files into Snowstorm. My understanding was that this should be done by creating a DELTA import job on the MAIN branch. However, once I upload the RF2 zip file, I get an IllegalStateException with the message "Branch MAIN is already locked". I tried to update the MAIN branch to unlock it but, as expected, this branch can't be modified.

Am I missing something?
Thanks

Full copy of the logs: https://pastebin.com/4RwAALbc
And here's the full error:

java.lang.IllegalStateException: Branch MAIN is already locked
at io.kaicode.elasticvc.api.BranchService.lockBranch(BranchService.java:254)
at io.kaicode.elasticvc.api.BranchService.openCommit(BranchService.java:244)
at io.kaicode.elasticvc.api.BranchService.openCommit(BranchService.java:235)
at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.loadingComponentsStarting(ImportComponentFactoryImpl.java:165)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:188)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:159)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.access$100(ReleaseImporter.java:145)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadDeltaReleaseFiles(ReleaseImporter.java:51)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadDeltaReleaseFiles(ReleaseImporter.java:85)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importArchive(ImportService.java:101)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.lambda$importArchiveAsync$1(ImportService.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
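
If the lock is stale (for example left over from a crashed import), some Snowstorm versions expose a branch unlock action over REST; check the Swagger UI for the exact path, which here is an assumption rather than a verified 2.x endpoint:

curl -X POST 'http://localhost:8080/branches/MAIN/actions/unlock'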

CORS Support

Hi Everyone
Is there a way to enable/configure CORS in the server?
Thanks!

International characters: Diacritics normalization in text search

The Elasticsearch index does not normalize diacritics. For example, in the Spanish edition, using the "findConcepts" API to search for "vías resp" and "vias resp" (from "vías respiratorias", "respiratory tract") produces different results.

Example:

https://snowstorm.msal.gov.ar/MAIN/concepts?activeFilter=true&term=v%C3%ADas%20resp&offset=0&limit=1

https://snowstorm.msal.gov.ar/MAIN/concepts?activeFilter=true&term=vias%20resp&offset=0&limit=1

The browser implementation has a diacritics normalization algorithm for index creation and search, and Spanish users expect that writing the word with or without an accent will produce the same results (vía vs via).

Searching the latest Elasticsearch documentation, one way to resolve this is to use multiple fields with different analyzers and a multi_match query with "most_fields":

most_fields
The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.
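
A minimal sketch of that approach against a toy index (this is not Snowstorm's actual mapping; index, field and analyzer names are illustrative). The index defines a folded sub-field using the asciifolding token filter:

curl -X PUT 'http://localhost:9200/folding-demo' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": { "tokenizer": "standard", "filter": ["lowercase", "asciifolding"] }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "term": {
          "type": "text",
          "fields": { "folded": { "type": "text", "analyzer": "folding" } }
        }
      }
    }
  }
}'

A most_fields query over both fields then matches "vias resp" against documents indexed as "vías resp":

curl -X POST 'http://localhost:9200/folding-demo/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "multi_match": {
      "query": "vias resp",
      "type": "most_fields",
      "fields": ["term", "term.folded"]
    }
  }
}'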
