snowstorm's Introduction

Snowstorm Terminology Server


Snowstorm is an open source terminology server with special support for SNOMED CT. It is built on top of Elasticsearch, with a focus on performance and enterprise scalability.

SNOMED International is not able to offer commercial support for this product. Support is provided by the community via this repository.

APIs

Snowstorm has two APIs:

  • HL7 FHIR API 🔥
    • Implements the Terminology Module
    • Recommended for implementers
    • Supports SNOMED CT, LOINC, ICD-10, ICD-10-CM and other code systems
  • Specialist SNOMED CT API
    • Supports the management of SNOMED CT code systems
    • Supports the SNOMED CT Browser
    • Supports authoring SNOMED CT editions

Advice for Implementers

SNOMED International recommends that implementers of SNOMED CT use a terminology service, such as Snowstorm, and a standard interface, such as the HL7 FHIR API.

This approach allows loose coupling of applications as well as access to powerful terminology features.

Snowstorm is a good choice for teams who are just getting started or who have terminology and technical support capability. Other terminology servers are also available; some offer commercial support.

SNOMED CT Browser Support

Snowstorm provides the terminology server API for the SNOMED International Browser, including the International Edition and around fourteen national Editions.

Snowstorm can be used in local implementations to query SNOMED CT with the following features:

  • Hosting multiple extensions alongside the International Edition of SNOMED CT
  • Multi-lingual search and content retrieval
  • Full ECL v2.0 compliance
  • Full history (requires a full RF2 import)
  • Read-only FHIR API 🔥

Authoring Use

Snowstorm also provides the terminology server API for the SNOMED International Authoring Platform.

The Authoring Platform is used for the maintenance of the International Edition of SNOMED CT as well as nine national Editions and several community content Extensions.

Documentation

Contributing

We welcome questions, ideas, issues and code contributions to this project.

Use the issues page to get in touch with the community.

If you would like to make a code contribution please fork the repository and create a GitHub pull request to the develop branch.

License

Apache 2.0

See the included LICENSE file for details.

Tools

For Java performance profiling we recommend the JProfiler Java profiler.


snowstorm's Issues

Repeated warnings: number of terms exceeded allowed maximum

I noticed a large number of WARN events (repeated 98 times) being written to the Elasticsearch log while I had an import in progress.

[2018-11-08T23:38:06,216][WARN ][o.e.d.i.q.TermsQueryBuilder] Deprecated: the number of terms [342029] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting.

The same warning was returned to the Elasticsearch RestClient and logged by Snowstorm (12 times).

2018-11-08 23:40:57.115 WARN 155 --- [/O dispatcher 1] org.elasticsearch.client.RestClient : request [GET http://localhost:9200/es-query/query-concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&scroll=60000ms&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-6.4.2-04711c2 "Deprecated: the number of terms [384272] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting." "Thu, 08 Nov 2018 23:40:53 GMT"]

What is the status of this warning?

  • Is this an unexpected warning?
  • Or is it expected, but not a concern?
  • Or is it expected, and I should adjust my settings in order to avoid printing the warning?

It looks like this warning will become a hard error in 7.0.
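
If the warning is expected and you simply want to raise the limit, the message names the relevant setting; a hedged sketch of raising it on the affected index (index name es-query taken from the log above, the value is illustrative):

curl -X PUT 'http://localhost:9200/es-query/_settings' -H 'Content-Type: application/json' -d '{ "index.max_terms_count": 500000 }'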

Additional Details

  • Elasticsearch 6.4.2
  • Snowstorm 2.1.0

Addition of read-only option

Some users will want to run Snowstorm in a read-only fashion, in which case parts of the functionality are not needed, including the allocation of SCT identifiers and authentication tokens; removing them would make deployments more straightforward.
The simple recommendation is a startup flag for read-only use that bypasses and disables all write functionality.
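
For illustration, a startup along these lines (the --read-only flag is hypothetical, not an existing option):

java -Xmx4g -jar snowstorm.jar --read-only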

Snowstorm on Google Kubernetes Engine

I was able to run it locally using the latest jar file (3.0.3) and Elasticsearch 6.5.4. I also created Docker images with the help of docker-compose.yml and pushed the Elasticsearch and Snowstorm images to GCP Kubernetes. I was able to run Elasticsearch on GCP after increasing the maximum virtual memory, but the Snowstorm workload is not running; it shows an error while connecting to Elasticsearch.

Error - 2019-06-03 12:28:07.374 ERROR 1 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : UncategorizedExecutionException[Failed execution]; nested: ExecutionException[java.net.ConnectException: Connection refused]; nested: ConnectException[Connection refused];; java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused

Elasticsearch is running on its own IP (load balancer) on port 80, though the target port is 9200. I tried to change the port to 9200 but then it goes into an unhealthy state, so currently Elasticsearch is running with port=80 and target port=9200.
I don't know how to link Snowstorm with the Elasticsearch service running on GCP. Please provide some guidelines or a document for the deployment process.
Thank you!
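
In case it helps others, Snowstorm can be pointed at a remote Elasticsearch at startup, assuming the elasticsearch.urls property in your version's application.properties (the service hostname here is a placeholder for your cluster-internal Elasticsearch service):

java -Xms2g -Xmx4g -jar snowstorm.jar --elasticsearch.urls=http://elasticsearch-service:9200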

^ character not accepted in http requests

When trying to run the following ECL - ^ 733990004 |Nursing activities reference set| - the following exception is thrown by snowstorm:

2018-04-19 09:50:21.334 INFO 1 --- [nio-8080-exec-1] o.apache.coyote.http11.Http11Processor : Error parsing HTTP request header
java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
	at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:476) ~[tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:687) ~[tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-8.5.29.jar!/:8.5.29]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
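
This is Tomcat rejecting characters such as ^ and | in the request line, as RFC 7230/3986 require; the usual workaround is to percent-encode the ECL in the URL (^ as %5E, | as %7C, space as %20), for example:

curl 'http://localhost:8080/MAIN/concepts?ecl=%5E%20733990004%20%7CNursing%20activities%20reference%20set%7C'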

Inferred vs. stated relationships for parents

I am seeing something odd when hitting the GET /browser/{branch}/concepts/{conceptId}/parents endpoint between "inferred" versus "stated". It may be my misunderstanding of something in SNOMED.

Working with the veterinary extension the following concept - 354541000009105 (Castrated male) is giving two completely different concepts as parents depending on whether I use "inferred" or "stated" for the form parameter.

For "inferred" it is returning concept 248153007 (Male).
For "stated" it is returning concept 106106004 (Male reproductive finding)

I looked in the Relationship RF2 file in the veterinary extension; both of those is-a relationships are defined there but have different characteristicTypeId values. The relationship to 248153007 has 900000000000011006 (inferred relationship) and the one to 106106004 has 900000000000010007 (stated relationship). So I think that explains why it is showing up the way it is.

So my question is does setting the form parameter to inferred only return the relationships with the explicit inferred relationship and not return any with stated and vice versa? If so, then doesn't this mean that in order to get all parents one would have to call this endpoint twice?
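
If so, a complete picture would indeed take two calls, along these lines (host and branch are placeholders):

curl 'http://localhost:8080/browser/MAIN/concepts/354541000009105/parents?form=inferred'
curl 'http://localhost:8080/browser/MAIN/concepts/354541000009105/parents?form=stated'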

[Feature Request] Create HELM chart for Kubernetes Deployment

It would be very useful to have a public Helm chart containing the Kubernetes descriptors.
A Helm chart would also make it possible to parameterise values for different environments, as well as secrets (for keys).

Once the Helm chart is provided it can be submitted to the main Helm chart repositories.

Incorrectly edited docker-compose?

I was just trying to set up this project with Docker Compose and I noticed that the latest commit seems to delete critical lines of the file that appear unrelated to the change.

Was this deleted by accident?

Also, it seems that packaging requires Elasticsearch:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.rest.ElasticsearchRestClient]: Circular reference involving containing bean 'testConfig' - consider declaring the factory method as static for independence from its containing instance. Factory method 'elasticsearchClient' threw exception; nested exception is pl.allegro.tech.embeddedelasticsearch.EmbeddedElasticsearchStartupException: Failed to start elasticsearch. Check previous logs for details

I was trying to make the Dockerfile do RUN mvn clean package but it fails, as it seems to expect Elasticsearch to be available at package time.
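
If the embedded Elasticsearch is only started by the test phase, one hedged workaround is to skip tests during packaging:

mvn clean package -DskipTests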

Incorrect ECL Ancestor Count After Importing Upgrade RF2 Delta

After importing the July 2018 International Snapshot and then the Jan 2019 International Delta, a handful of concepts have an incorrect ancestor count.
For example, >125021000119107 gives 48 results rather than the expected 49 because ancestor 195967001 has been lost.
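
For anyone reproducing this, the ancestor query can be issued with > percent-encoded as %3E (host is a placeholder):

curl 'http://localhost:8080/MAIN/concepts?ecl=%3E125021000119107'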

Language support in FHIR API

As far as I can see, the FHIR API does not support any languages other than the language imported into the MAIN branch. I am running Snowstorm with the International release in the MAIN branch and the Swedish edition in MAIN/SNOMEDCT-SE. I've tested the FHIR lookup operations on CodeSystem and they always use the MAIN branch, so I am not able to get the Swedish translations through FHIR. The regular REST API works using the Accept-Language HTTP header, so I have verified that the Swedish release has been imported correctly.
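
For reference, this is the kind of request I would expect to honour the language (displayLanguage is a standard $lookup parameter; whether this Snowstorm version supports it, and whether the FHIR endpoint is mounted at /fhir in your deployment, are assumptions):

curl 'http://localhost:8080/fhir/CodeSystem/$lookup?system=http://snomed.info/sct&code=73211009&displayLanguage=sv'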

Parents endpoint seems to provide ancestors in response

Hi all,

The parents endpoint seems to provide the ancestors in its response. As far as I understand the ECL query system, these two queries should give the same response;

  • /browser/MAIN/concepts/19431000/parents?form=inferred
  • /MAIN/concepts?ecl=>!19431000

Am I missing something (i.e. direct parents != inferred parents), or is the parents endpoint misbehaving?

Best regards,
Sander

Incorrect concept search total results size

The concept search endpoint is returning an incorrect total results size when performing a simple logical search like activeFilter=true. The number of results on that page is reported as the total number of results available.

Add Filter semantic Tag

How can I search for a term based on its semantic tag?
For example, I want to search for the term 'heart attack' within the disorder semantic tag.
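
Depending on your Snowstorm version, the description search endpoint may accept a semantic tag filter; check the Swagger UI for the exact parameter. A hypothetical example (the semanticTag parameter name is an assumption):

curl 'http://localhost:8080/browser/MAIN/descriptions?term=heart%20attack&semanticTag=disorder'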

Large number of ES scroll contexts created during import

During the initial snapshot import of the SNOMED CT RF2 file there are up to 10K Elasticsearch scroll contexts open at once. This may be slowing down the import. We should investigate whether these search contexts can be closed more quickly in code, perhaps in the spring-data-elasticsearch layer, rather than relying on the scroll context timeout, which is likely what happens at the moment.

See the open_contexts stat when importing here http://localhost:9200/_nodes/stats

Thanks to @rorydavidson for finding this.
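
The counter can be narrowed to the search stats, where open_contexts is reported per node:

curl -s 'http://localhost:9200/_nodes/stats/indices/search'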

Some ECL results missing

When performing an ECL search using attributes some concepts are not returned as expected.
For example *:363698007=* does not return concept 34020007 |Streptococcal pneumonia| even though this concept has the 363698007 attribute in the inferred form.

Snowstorm induced memory leak in Elasticsearch?

I'm running Snowstorm server in Amazon Web Services, in combination with a hosted Elasticsearch service provided by Amazon Elasticsearch Service. This has been running for about two months now, and today I noticed some undesirable trends in metrics corresponding to our Elasticsearch instance for Snowstorm. In the last 63 days...

  • JVMMemoryPressure increased from 28.5% to 65.9% in a stepwise fashion, with steps occurring approximately every 4 hours. This correlates with a brief spike in DiskQueueDepth, which normally holds at 0.
  • JVMGCYoungCollectionCount and JVMGCYoungCollectionTime are increasing linearly over time, with no apparent connection to the steps shown in JVMMemoryPressure.

(CloudWatch metrics screenshot omitted)

Could Snowstorm be performing some regular, routine process that is leading to the buildup of objects in Elasticsearch and causing a memory leak?

I'm using Snowstorm 2.1.0 and Elasticsearch 6.3.

Quotes not escaped in ECL response

Consider this curl request for the concept "Wallace "69" side-to-end anastomosis":

curl -X GET --header 'Accept-Language: en' 'http://localhost:8080/browser/MAIN/concepts/257751006'

In this case, the quotes in the terms are properly escaped, e.g.:
"Wallace \"69\" side-to-end anastomosis - action (qualifier value)".

However, the same request expressed as an ECL query, like this:

curl 'http://localhost:8080/MAIN/concepts?ecl=%20257751006%20&page=0&limit=1'

will return an invalid JSON response because the quotes are not escaped, e.g.:
"Wallace "69" side-to-end anastomosis - action (qualifier value)"

Unexplained cause of failed import

I'm trying to set up snowstorm for the first time, and I'm running into a bit of trouble. I'm starting up snowstorm and doing the import immediately on launch.

java -Xmx4g -jar /opt/snowstorm-2.1.0.jar --delete-indices --import=/opt/SnomedCT.zip

After running for a while, the import appears to fail.

2018-11-07 18:02:17.624 ERROR 106 --- [pool-5-thread-2] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.625 ERROR 106 --- [ool-5-thread-14] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.752 ERROR 106 --- [pool-5-thread-1] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.757 ERROR 106 --- [ool-5-thread-15] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.788 ERROR 106 --- [pool-5-thread-5] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
[...]
2018-11-07 18:02:17.807 ERROR 106 --- [ main] o.s.s.core.rf2.rf2import.ImportService : Failed RF2 SNAPSHOT import on branch MAIN. ID 1a2bff8f-8a00-4d80-8047-b056b90859fe

I see stack traces for a few occurrences of UncategorizedExecutionException, all of which are caused by java.net.ConnectException (Connection refused). All of this concludes with the Spring application context shutting down.

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2018-11-07 18:02:18.270 ERROR 106 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner

I'm using SnomedCT_RF2Release_CDN_20181031 obtained through Canada Health Infoway.

Shutting down after a failed import seems like a reasonable approach, but none of this output really helps me identify the specific file(s) or line(s) that are causing a problem with import. It would be helpful to provide more information here, perhaps by logging the names of files that are being opened, before they are fully processed.
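
Since the nested cause is Connection refused, it may also be worth checking whether Elasticsearch itself is still reachable when the errors begin, for example:

curl -s 'http://localhost:9200/_cluster/health?pretty'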

Some concepts do not appear in the ECL index

After importing the Jan 2018 RF2 export from the UAT Snow Owl terminology server some concepts are missing from the Snowstorm ECL index.

Searching for descendants and self of the root concept in UAT gives 363509 results but only 354171 in Snowstorm.

Concept 16837005 is an example concept which is missing. This concept can be returned from the browser endpoint but an ECL search with this as the focus concept returns nothing.
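
For reference, the descendants-or-self count can be read from the total field of a query like this (138875005 is the root concept; %3C%3C encodes <<, host is a placeholder):

curl 'http://localhost:8080/MAIN/concepts?ecl=%3C%3C138875005&limit=1'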

Error when using international characters on ECL

Using this content on POST /{branch}/concepts/search search returns an error:

{ "activeFilter": true, "conceptIds": [], "eclFilter": "<<19923001 |catéter (objeto físico)|", "limit": 2, "offset": 0, "statedEclFilter": "", "termFilter": "" }

Returns this result:

{ "error": "INTERNAL_SERVER_ERROR", "message": "Failed to parse ECL '<<19923001 |catéter (objeto físico)|'" }

Removing accented characters from the ECL resolves the issue:

<<19923001 |catéter (objeto físico)| -> <<19923001 |cateter (objeto fisico)|

The term comes from the Spanish Edition. I tested the ECL on the APG parser site and it parses OK with accents.

Thanks

Retrieving descendants - stated versus inferred

Retrieving descendants using either the Swagger UI or a URL does not react to changing the requested state (inferred versus stated).
http://localhost:8080/MAIN/concepts/125589001/descendants?stated=true&offset=0&limit=10000
Results in the same descendants as:
http://localhost:8081/MAIN/concepts/125589001/descendants?stated=false&offset=0&limit=10000
Can anyone confirm this and perhaps help with a workaround?
Best regards

Exit process after import completes

Please consider adding a command line option that causes snowstorm to exit gracefully after finishing an import.

Scenario

I'm running Snowstorm in two ways.

  1. I have snowstorm running as a long-lived, supervised process (webserver mode) that serves responses to client requests.
  2. I have a short-lived snowstorm process that starts up when I need to [re]import the Snomed CT concept database into Elasticsearch. That process runs with options --delete-indices and --import [file]. I would like that version to exit after the import has successfully completed in order to free up system memory, but currently it just continues running indefinitely.

Workaround

Currently my workaround is to set a timeout on the import process so that it is killed after a number of hours, but that's less optimal compared to snowstorm exiting gracefully as soon as the work is done. It means I'm using memory longer than I need to, and it means there's a slight risk the process could be killed before import completes.

Suggestion

I've seen other software use options like --once or --exit. Maybe one of those would fit here?
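
For illustration, the import invocation from my setup combined with such an option (the --exit flag here is hypothetical, just to show the suggestion):

java -Xmx4g -jar /opt/snowstorm-2.1.0.jar --delete-indices --import=/opt/SnomedCT.zip --exit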

Creating a version of an imported extension fails

I am trying to import the SNOMED Veterinary Extension into Snowstorm. I followed the instructions in the "updating-snomed-and-extensions.md" document to load the extension and that went smoothly, except that it didn't create a version. So I tried to create a version using the /codesystems/{shortName}/versions endpoint and that throws an exception with the input:

{
  "description": "SNOMED Veterinary extension April 2019 release",
  "effectiveDate": 20190401
}

Here is the exception:

Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/es-member/_bulk?timeout=1m], status line [HTTP/1.1 413 Request Entity Too Large]

	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:355) ~[elasticsearch-rest-client-6.0.1.jar!/:6.0.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:344) ~[elasticsearch-rest-client-6.0.1.jar!/:6.0.1]
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:123) ~[httpcore-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:439) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:329) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[httpcore-nio-4.4.9.jar!/:4.4.9]
	... 1 common frames omitted
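
The 413 status suggests the bulk request exceeded Elasticsearch's http.max_content_length node setting (default 100mb). A hedged workaround, pending a fix that batches the request, is to raise that limit at node startup or in elasticsearch.yml:

./bin/elasticsearch -E http.max_content_length=200mb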

Snowstorm 2.2.0 Startup Error in AWS when no credentials available

When starting Snowstorm 2.2.0 on an AWS EC2 instance Snowstorm fails to start with the following error:
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain

This only happens if I use an EC2 instance which does not have any credentials set in the environment variables or configured on disk.
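
A crude workaround, assuming Snowstorm does not actually need AWS access in this setup, is to satisfy the SDK's environment-variable credential provider with dummy values before starting Snowstorm:

export AWS_ACCESS_KEY_ID=dummy
export AWS_SECRET_ACCESS_KEY=dummy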

Finding concepts within Indian Refset

Hello,

I had an issue earlier with my SNOMED API server, so I just upgraded to Snowstorm, and I am using the 2017-09-01 version of the SNOMED taxonomy.

I am trying to use the Refsets that were published by India -> https://mlds.ihtsdotools.org/#/viewReleases/viewRelease/194856 and search for concepts within the refset.

I tried using the built-in Swagger API in Snowstorm to search for refset members under the Neurosurgery refset that was published, but I'm still seeing 0 results.

(Swagger screenshots omitted)

Am I doing something wrong? Are there any other steps that I need to do to access a refset's members?

Regards,
Vybhav

Missing concept parents in branch created for extension

I've loaded the veterinary extension, and while searching for concept parents for our species value set I have run into a concept that does not return any parents when I use the branch created for the extension (MAIN/SNOMED-VET) with the findConceptParents endpoint.

The concept is 81260002. When I call the findConceptParents endpoint using the branch MAIN/SNOMED-VET no parents are returned (the list returned is empty). If I specify MAIN as the branch then the parents are returned.

I tried findBrowserConcept with this concept id and the MAIN/SNOMED-VET branch, and the returned value does include the parents in the relationships array.

Allow customizing number of shards

Snowstorm should allow configuring the number of shards per Elasticsearch index, rather than hardcoding the number at 8.

I believe that I would get better performance and a smaller memory footprint if my single-instance Elasticsearch was configured with a smaller number of shards. However, I cannot easily test or validate this assumption because snowstorm does not make this value configurable. (Yes, I could test this if I built snowstorm from source rather than using the release jar.)

I tried assigning a custom index template which specifies index.number_of_shards and index.number_of_replicas, but the hardcoded values in snowstorm take precedence over the index template.

When using the create index API, the settings/mappings defined as part of the create index call will take precedence over any matching settings/mappings defined in the template. - elastic.co
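
For reference, this is roughly the template I assigned (the es-* pattern matches the index names seen elsewhere in these issues; shard and replica values are illustrative), which the explicit settings in Snowstorm's create-index calls then override:

curl -X PUT 'http://localhost:9200/_template/snowstorm-shards' -H 'Content-Type: application/json' -d '{ "index_patterns": ["es-*"], "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 } }'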

Creating new concepts and descriptions in local extension

Hi,

We are currently testing the creation of a local extension with Snowstorm. Ideally, we would like to create the whole basis with the tools provided by Snowstorm.

I tried to create the first module concept with its descriptions and relationships. Apparently it worked, but the created concept has the following key-value: "released": false, no matter what effectiveTime (or releasedEffectiveTime, for relationships) is provided. When I search for the concept with GET /{branch}/concepts, here is the response:

{
  "items": [],
  "total": 1,
  "limit": 50,
  "offset": 0
}

Which, I'm guessing, means that the concept exists but can't be retrieved because it was not released.
Any insight as to why "released" is false and how to solve this?

Note: we do not have RF2 files to import for our extension as the whole test is about starting a new extension from scratch with Snowstorm.

Incomplete swagger API

Upon first running Snowstorm and inspecting Swagger at localhost, I was confused not to find the /import endpoint mentioned in the docs. After importing SNOMED data via the command line, a host of other endpoints appeared: where before there were only GET/retrieve endpoints, afterwards there were many more, including create, delete and the aforementioned import endpoints. Upon returning today after restarting my laptop, the API has reverted to the initial incomplete state. The SNOMED data is still present and the endpoints that are there work correctly.

ECL relationship group 0 mapping

I compared results from Snowstorm 2.1.0 and 2.0.0 as well as the current sct-snapshot-rest-api and OntoServer, and there are some differences. See: https://confluence.ihtsdotools.org/display/SLPG/ECL+and+grouped+attributes?focusedCommentId=78938364#comment-78938364

Copied from above (but formatting seems to have been lost):

  1. << 71388002 | åtgärd | : { 363703001 | har avsikt | = << 129428001 | preventiv avsikt | }
  2. << 71388002 | åtgärd | : 363703001 | har avsikt | = << 129428001 | preventiv avsikt |

implementation | SNOMED CT release | Query 1 | Query 2
sct-snapshot-rest-api, commit 3ce4ab6 | the one I had on my hard drive, likely International 2018-01-31 | 91 | 91
snowstorm, v2.1.0 | SE edition 2018-11-30 | 9 | 523
OntoServer through Shrimp UI, buildid ddd5953f1d34f52fb9f5d79a5d910e5d2f4bfaf4487755d3f8f6a5c7ea12a81c | International 2018-01-31 | 347 | 347
snowstorm through browser, v2.0.0 (https://browser.ihtsdotools.org/ecl/) | International 2018-07-31 | 7 | 347

Expected to have the same results from the two queries.

Can't create RF2 Import via REST API

Importing a snapshot via the command line is okay but creating an import using the REST API is throwing an error. Looks like a Jackson issue - no default constructor in the RF2 config class.

Warn if multiple concept files in RF2 import archive

It has come to my attention that there is one member country SNOMED CT distribution which contains multiple concept files. For example there are two concept snapshot files.

It is not known how Snowstorm will deal with this type of archive. This is not a recommended format for distribution. For now Snowstorm should probably detect this issue and make the import fail.

Ability to import Full history of SNOMED Extensions

Currently the Full import works for an Edition like the International Edition where all concepts are in a single RF2 file. However with an Extension, where the concepts in the RF2 are in addition to the International Edition, there is no easy way to import the full history. Each version in the Extension history should be applied to a different release branch on top of the International Edition content.

Would be a great new feature.

Missing relationships after importing extension

High level summary: after importing the SNOMED International Edition followed by the SNOMED Veterinary Extension, some relationships are missing from the child branch created for the extension when the extension contains inactive relationships with earlier effective times than the International Edition.

Here are the steps I followed

  1. Startup a clean instance of Elasticsearch 6.4.2
  2. Startup snowstorm 2.2.3 using java -Xms2g -Xmx2g -jar target/snowstorm*.jar
  3. Follow the "Loading SNOMED into Snowstorm" guide to import the SNOMED International edition into the MAIN branch
  4. Modify the Veterinary Extension RF2 release files to reformat the effectiveTime in all files to YYYYMMDD format and rezip the release files
  5. Follow the "Loading & updating SNOMED CT with local Extensions or Editions" guide to import the Veterinary Extension into the MAIN/SNOMED-VET branch

After finishing this process the issue is seen by calling the findConceptParents endpoint and specifying the following parameters:

branch = MAIN/SNOMED-VET
conceptId = 81260002
form = inferred
Accept-Language = en-US;q=0.8,en-GB;q=0.6

the response code is 200 and the response body is an empty array ([]).

If I make the same endpoint call but change the branch to MAIN I get one parent returned, conceptId 321351000009104.
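
For anyone reproducing this, the failing call looks like the following (host is a placeholder; depending on your setup the slash in the branch path may need encoding as %2F):

curl -H 'Accept-Language: en-US;q=0.8,en-GB;q=0.6' 'http://localhost:8080/browser/MAIN/SNOMED-VET/concepts/81260002/parents?form=inferred'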

In the SNOMED International edition Relationship file this relationship is present and active with effectiveTime = 20160131:

6412388027	20160131	1	900000000000207008	81260002	321351000009104	0	116680003	900000000000011006	900000000000451002

in the Veterinary extension Relationship file the relationship also exists but is inactive with effectiveTime = 20160130:

739111000009126	20160130	0	332351000009108	81260002	321351000009104	0	116680003	900000000000011006	900000000000451002

I am also attaching the log output from the import of the extension file.

vetext-snowstorm-import-log.txt

Please let me know if there is any other information I can provide or troubleshooting I can help with.

IllegalStateException error "Branch MAIN is already locked" when trying to import new INT release

Hi there,

I'm trying to import the latest International release (20190731) RF2 files into Snowstorm. My understanding was that this should be done by creating a DELTA import job on the MAIN branch. However, once I upload the RF2 zip file, I get an IllegalStateException with the message "Branch MAIN is already locked". I tried to update the MAIN branch to unlock it but, as expected, this branch can't be modified.

Am I missing something?
Thanks

Full copy of the logs: https://pastebin.com/4RwAALbc
And here's the full error:

java.lang.IllegalStateException: Branch MAIN is already locked
at io.kaicode.elasticvc.api.BranchService.lockBranch(BranchService.java:254)
at io.kaicode.elasticvc.api.BranchService.openCommit(BranchService.java:244)
at io.kaicode.elasticvc.api.BranchService.openCommit(BranchService.java:235)
at org.snomed.snowstorm.core.rf2.rf2import.ImportComponentFactoryImpl.loadingComponentsStarting(ImportComponentFactoryImpl.java:165)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:188)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.doLoadReleaseFiles(ReleaseImporter.java:159)
at org.ihtsdo.otf.snomedboot.ReleaseImporter$ImportRun.access$100(ReleaseImporter.java:145)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadDeltaReleaseFiles(ReleaseImporter.java:51)
at org.ihtsdo.otf.snomedboot.ReleaseImporter.loadDeltaReleaseFiles(ReleaseImporter.java:85)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.importArchive(ImportService.java:101)
at org.snomed.snowstorm.core.rf2.rf2import.ImportService.lambda$importArchiveAsync$1(ImportService.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
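
If the lock is stale (for example left over from a crashed import), some Snowstorm versions expose a branch unlock action over REST; check the Swagger UI for the exact path, which here is an assumption rather than a verified 2.x endpoint:

curl -X POST 'http://localhost:8080/branches/MAIN/actions/unlock'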

CORS Support

Hi Everyone
Is there a way to enable/configure CORS in the server?
Thanks!

International characters: Diacritics normalization in text search

The Elasticsearch index does not normalize diacritics. For example, in the Spanish edition, using the "findConcepts" API to search for "vías resp" and "vias resp" (from "vías respiratorias", "respiratory tract") produces different results.

Example:

https://snowstorm.msal.gov.ar/MAIN/concepts?activeFilter=true&term=v%C3%ADas%20resp&offset=0&limit=1

https://snowstorm.msal.gov.ar/MAIN/concepts?activeFilter=true&term=vias%20resp&offset=0&limit=1

The browser implementation has a diacritics normalization algorithm for index creation and search, and Spanish users expect that writing the word with or without an accent will produce the same results (vía vs via).

Searching the latest Elasticsearch documentation, one way to resolve this is to use multiple fields with different analyzers and a multi_match query with "most_fields":

most_fields
The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.
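
A minimal sketch of that approach against a toy index (this is not Snowstorm's actual mapping; index, field and analyzer names are illustrative). The index defines a folded sub-field using the asciifolding token filter:

curl -X PUT 'http://localhost:9200/folding-demo' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": { "tokenizer": "standard", "filter": ["lowercase", "asciifolding"] }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "term": {
          "type": "text",
          "fields": { "folded": { "type": "text", "analyzer": "folding" } }
        }
      }
    }
  }
}'

A most_fields query over both fields then matches "vias resp" against documents indexed as "vías resp":

curl -X POST 'http://localhost:9200/folding-demo/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "multi_match": {
      "query": "vias resp",
      "type": "most_fields",
      "fields": ["term", "term.folded"]
    }
  }
}'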
