Comments (23)
Hey @wardle. I've been doing good! Thanks for asking! Yes, I'll definitely update to the latest version soon. I'll have to check for all the breaking changes first 😬.
Docker is being deprecated as a container runtime by Kubernetes, but all Docker images are OCI compliant to begin with, so they can be run with Kubernetes using containerd or cri-o. Docker is also based on containerd. It's just an implementation detail. I've been using Docker containers as the primary mode of deploying Hermes on multiple managed cloud providers - mostly Google Cloud Run. It is great at scaling the containers up and down (even to zero containers if no one's using it) and I think it's actually good enough to offer Hermes as a managed service! I've more or less automated the whole flow of building and deploying the containers, and it scales really well since they are stateless. Let me know if you want to compete with Ontoserver 😉.
from hermes.
@wardle The reasoning here is that we use the docker images to version both hermes and the release of SNOMED CT. And since the server is useless without the data, we index everything into one image. Using a volume is a valid solution too, and I've used it a lot when testing locally, but while deploying and managing releases, it's just easier to deal with 1 release artifact instead of 2. The database file needs to be versioned and maintained as well.
Keeping the container small in this case does not make much difference because the database file has to be downloaded on every single machine running this container anyway. And while retrieving and storing the container, layer caches are maintained, so the same data is never stored or pulled more than once.
from hermes.
Hi Simon,
Thanks for your kind comments!
I now generally deploy via source code or the runnable uberjar.
Kubernetes has deprecated docker support recently in favour of containerd. The code to build a docker image is more complex than the code to actually run from source or build a runnable jar.
The runnable jar has no dependencies except a working java. You don't need to know clojure.
But I'm happy to have a pull request as long as it doesn't complicate things.
But I'd add one other caveat - I wouldn't docker-ize the software together with the data files - but instead have a shared persistent filesystem on which you have your versioned file-based databases (e.g. for each release) - so you could have multiple load-balanced services using the same data files and minimise the size of your docker image, which otherwise would be unnecessarily large. Once you start down this path, you realise docker-ization or deployment is a separate step from the actual tool. So the docker file simply needs to compile and then build the image containing the jar.
I simply git pull and run from source.
Mark
from hermes.
Also see #7 for a prior version by @sidharthramesh
from hermes.
If you want to try it, without installing clojure, you can just download a release https://github.com/wardle/hermes/releases and it will run as long as you have java installed.
from hermes.
Hey @eatyourpeas I actually have a script to download SNOMED from MLDS and also containerize it after indexing it. It's currently built using the cloudbuild.yaml file, but please feel free to take a look at the Dockerfile. You can change the version of hermes by just changing the .jar file.
Here is the repo: https://github.com/medblocks/mlds
from hermes.
Thanks Sidharth. Hope you are well. I'd definitely upgrade to the latest release if you can.
from hermes.
Sounds good. We could put different releases behind a gateway and offer different versions as well!
from hermes.
Thank you both - I shall nose around both of these ideas first.
from hermes.
Also see this blog post and video. I have a tiny $3.50 lightsail instance running with very low memory just for demonstration purposes with which you can play. See IP address in the post.
from hermes.
Hi @sidharthramesh - I don't think that there are any breaking changes - I should hope all changes are additive only at this point. You should find bind-address is now configurable, and there is a new index for association refsets which enables functionality to include historical items in any value set. That does mean that the file-database version changes to 0.6, so ideally you need to build a new file-based database using the latest tool. You can find the current database version in the manifest file of any existing database. It's not really a breaking change, but the tooling is strict and should refuse to open a database with the wrong version - that approach prevents hard-to-understand bugs in the future. I think that is a good approach because a fundamental idea in hermes is that one builds a new database frequently anyway - but I'm open to discussion on the strictness of compatibility checks. I've erred on the side of strictness here rather than functionality simply failing silently with few clues as to why.
from hermes.
I may be going about this the wrong way in that case:
I am looking to set up a dockerised server that I can send queries to locally for the moment. Within my (python) project I have put the hermes jar in the root folder and set up a docker file with these steps:
FROM openjdk:11-jre-slim
RUN mkdir -p /src
WORKDIR /src
COPY ./hermes-v0.7.5.jar ./hermes.jar
RUN java -jar hermes.jar --db snomed.db download uk.nhs/sct-clinical {TRUD_API_KEY} /tmp/trud
RUN java -jar hermes.jar --db snomed.db compact
RUN java -jar hermes.jar --db snomed.db index
RUN java -jar hermes.jar --db snomed.db --port 8080 serve
I am not sure it is downloading the databases though since it jumps past that step in only 8 seconds. Do you have any insights?
from hermes.
@eatyourpeas make sure you're passing the TRUD_API_KEY correctly? What do the logs during build say? Docker has a very smart way of caching layers, so if it's downloaded the files once for a step, it will skip that step and only execute steps that have changed since then.
I've really not experimented with TRUD, since I usually get the files from MLDS for the Indian Version.
Also, the last step, you probably don't want to execute a RUN, but a CMD to ensure this command runs when the container starts and not when the image is built. You can look here for reference.
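A minimal sketch of that change, reusing the commands from the Dockerfile above unchanged. The `ARG` line is an addition for illustration and not something from this thread; build arguments remain visible in the image history, so treat it as a local-testing convenience only. The download arguments are copied as posted and are not verified here.

```dockerfile
FROM openjdk:11-jre-slim
# WORKDIR creates the directory if it does not exist, so no separate mkdir is needed.
WORKDIR /src
COPY ./hermes-v0.7.5.jar ./hermes.jar

# Assumption: the key is supplied with `docker build --build-arg TRUD_API_KEY=...`.
ARG TRUD_API_KEY

# RUN steps execute once, at build time, and each result is cached as a layer.
RUN java -jar hermes.jar --db snomed.db download uk.nhs/sct-clinical "${TRUD_API_KEY}" /tmp/trud
RUN java -jar hermes.jar --db snomed.db compact
RUN java -jar hermes.jar --db snomed.db index

EXPOSE 8080
# CMD is only recorded at build time; it runs each time a container starts.
CMD ["java", "-jar", "hermes.jar", "--db", "snomed.db", "--port", "8080", "serve"]
```

With `RUN ... serve` the build itself would block on the server process, whereas `CMD` defers it to container start. Depending on the bind address, the server may also need to be told to listen on all interfaces to be reachable from outside the container.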
from hermes.
Thank you @sidharthramesh - I am passing the TRUD_API_KEY for the moment as a parameter just to get it working.
It does seem to pass the step and, as you say, it is cached, so it is skipped the second time round.
The error I am getting is:
no database found at path and operating read-only {:path "snomed.db"}
So either it does not download it or else my paths are all wrong. I have changed the final step to a CMD like you suggested, but it breaks at the previous compact step.
from hermes.
Much more promising thank you both.
If I get it working, would you like me to update your readme?
from hermes.
Sure. Perhaps add a doc directory with a file docker.md and we can link from the main README? This would be a UK-specific thing.
You probably want to specify the database explicitly - and for the UK - you'll want uk.nhs/sct-drug-ext as well usually.
I still think it is simpler to run from source code. All this looks as if it adds complexity and reduces flexibility, but no issue in a how-to I guess! Also think the data shouldn't be in the image - shouldn't containers be small and have a link to a fixed filesystem using the cloud provider's fixed filesystem options - that'd be my preference. But open to documenting this approach, as it might feel easier.
from hermes.
Thanks, I think I have a lot to learn here. I haven't fiddled with memory heaps etc since dealing with emulators in Android Studio. I ran out of memory at the compacting stage and had to play with docker settings to free up 8GB. You are right, this may not be a sensible idea - I will go with it for the moment, and if I find my way through I may send you some markdown of my experience for you to include, if you think it would be helpful.
from hermes.
SNOMED is enormous.
I do mention the need to pass in specific jvm parameters for the compaction phase in the existing documentation. There is no avoiding this need as it is a requirement from a dependency, but you can actually safely leave out compaction if you wish.
I personally wouldn't create a docker image containing the data itself. You can, but I prefer not.
I'd just use the command-line to get it running before you think about docker-ization. Then you'll understand more about what is happening. You only need java to run the jar file. You'll need clojure to run from source, but running using the pre-built uberjar is fine too.
Most SNOMED terminology servers use multiple federated instances of Elasticsearch and have far greater requirements - and are far more complex.
Once you're confident with the process, then you might want to look at the best way to deploy into your production environment - the options for deployment are wide. I've used a range, from an API gateway with multiple instances of the service and shared fixed disks, to running on a single server using systemd, to embedding in a larger server application! Building a small container image containing just the code is trivial. Scaling multiple containers that share a filesystem is the way I'd recommend if you need to satisfy millions of users. I've never had to have more than a single instance. This is in production in our emergency unit so keep it simple!
from hermes.
@eatyourpeas I'm happy to have a call and show you if that helps. I can also share some pre-built datafiles via google drive if you wish.
from hermes.
PS. You didn't run out of memory - the java application ran out of memory - you just need to give it more for the compaction phase or skip that step.
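For the Docker build discussed above, that could look like the fragment below. `-Xmx` is the standard JVM option for the maximum heap size; the `8g` value is only an assumption based on the 8GB mentioned earlier in the thread, and the memory available to Docker itself has to be at least as large. The download line is copied as posted, placeholder included.

```dockerfile
FROM openjdk:11-jre-slim
WORKDIR /src
COPY ./hermes-v0.7.5.jar ./hermes.jar

# Copied from the Dockerfile earlier in the thread; replace the placeholder with a real key.
RUN java -jar hermes.jar --db snomed.db download uk.nhs/sct-clinical {TRUD_API_KEY} /tmp/trud

# Raise the JVM heap limit for the compaction step only (8g is an assumed value).
RUN java -Xmx8g -jar hermes.jar --db snomed.db compact
```

Dropping the `compact` line altogether is the other option.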
from hermes.
Hi @eatyourpeas and @sidharthramesh ,
Shouldn't your docker files use VOLUMEs to share a file system? That means your containers are small.
See https://docs.docker.com/storage/
Couldn't you even copy the data from s3 or other storage if it didn't exist?
You surely don't want lots of read only static data in your container?
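A rough sketch of what that might look like, assuming the database has already been built elsewhere and is mounted at `/data` (a path chosen here purely for illustration). The `serve` arguments mirror those used earlier in this thread.

```dockerfile
FROM openjdk:11-jre-slim
WORKDIR /srv/hermes
COPY ./hermes-v0.7.5.jar ./hermes.jar

# Data lives outside the image; /data is a hypothetical mount point.
VOLUME /data

EXPOSE 8080
# The database path points into the mounted volume rather than the image.
CMD ["java", "-jar", "hermes.jar", "--db", "/data/snomed.db", "--port", "8080", "serve"]
```

The container would then be started with something like `docker run -p 8080:8080 -v /path/to/databases:/data <image>`, so the image holds only the jar.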
from hermes.