
Comments (23)

sidharthramesh commented on September 1, 2024

Hey @wardle. I've been doing good! Thanks for asking! Yes, I'll definitely update to the latest version soon. I'll have to check for all the breaking changes first 😬.

Docker is being deprecated by Kubernetes, but all Docker images are OCI-compliant to begin with, so they can be run on Kubernetes using containerd or CRI-O. Docker itself is based on containerd, so it's just an implementation detail. I've been using Docker containers as the primary way of deploying Hermes on multiple managed cloud providers, mostly Google Cloud Run. It is great at scaling containers up and down (even to zero if no one's using it), and I think it's actually good enough to offer Hermes as a managed service! I've more or less automated the whole flow of building and deploying the containers, and it scales really well since they are stateless. Let me know if you want to compete with Ontoserver 😉.

from hermes.

sidharthramesh commented on September 1, 2024

@wardle The reasoning here is that we use the Docker images to version both hermes and the SNOMED CT release. Since the server is useless without the data, we index everything into one image. Using a volume is a valid solution too, and I've used it a lot when testing locally, but when deploying and managing releases it's just easier to deal with one release artifact instead of two. The database file needs to be versioned and maintained as well.

Keeping the container small does not make much difference in this case, because the database file has to be downloaded onto every machine running the container anyway. And when containers are stored and pulled, layer caches are maintained, so the same data is never stored or pulled more than once.


wardle commented on September 1, 2024


wardle commented on September 1, 2024

Hi Simon,

Thanks for your kind comments!

I now generally deploy via source code or the runnable uberjar.

Kubernetes has deprecated docker support recently in favour of containerd. The code to build a docker image is more complex than the code to actually run from source or build a runnable jar.

The runnable jar has no dependencies except a working java. You don't need to know clojure.

But I'm happy to have a pull request as long as it doesn't complicate things.

But I'd add one other caveat - I wouldn't docker-ize the software together with the data files - but instead have a shared persistent filesystem on which you have your versioned file-based databases (e.g. for each release) - so you could have multiple load-balanced services using the same data files and minimise the size of your docker image, which otherwise would be unnecessarily large. Once you start down this path, you realise docker-ization or deployment is a separate step from the actual tool. So the docker file simply needs to compile and then build the image containing the jar.

I simply git pull and run from source.

Mark


wardle commented on September 1, 2024

Also see #7 for a prior version by @sidharthramesh


wardle commented on September 1, 2024

If you want to try it, without installing clojure, you can just download a release https://github.com/wardle/hermes/releases and it will run as long as you have java installed.


sidharthramesh commented on September 1, 2024

Hey @eatyourpeas I actually have a script to download SNOMED from MLDS and also containerize it after indexing it. It's currently built using the cloudbuild.yaml file, but please feel free to take a look at the Dockerfile. You can change the version of hermes by just changing the .jar file.

Here is the repo: https://github.com/medblocks/mlds


wardle commented on September 1, 2024

Thanks Sidharth. Hope you are well. I'd definitely upgrade to the latest release if you can.


wardle commented on September 1, 2024

Sounds good. We could put different releases behind a gateway and offer different versions as well!


eatyourpeas commented on September 1, 2024

Thank you both - I shall nose around both of these ideas first.


wardle commented on September 1, 2024

Also see this blog post and video. I have a tiny $3.50 lightsail instance running with very low memory just for demonstration purposes with which you can play. See IP address in the post.


wardle commented on September 1, 2024

Hi @sidharthramesh - I don't think there are any breaking changes; they should all be additive at this point. You should find bind-address is now configurable, and there is a new index for association refsets, which enables functionality to include historical items in any value set. That does mean that the file-database version changes to 0.6, so ideally you need to build a new file-based database using the latest tool. You can find the current database version in the manifest file of any existing database. It's not really a breaking change, but the tooling is strict and should refuse to open a database with the wrong version; that approach prevents hard-to-understand bugs in the future. I think that is a good approach because a fundamental idea in hermes is that one builds a new database frequently anyway, but I'm open to discussion on the strictness of compatibility checks. I've erred on the side of strictness here rather than have functionality simply fail silently with few clues as to why.


eatyourpeas commented on September 1, 2024

I may be going about this the wrong way in that case:
I am looking to set up a dockerised server that I can send queries to locally for the moment. Within my (Python) project I have put the hermes jar in the root folder and set up a Dockerfile with these steps:

FROM openjdk:11-jre-slim
RUN mkdir -p /src
WORKDIR /src 
COPY ./hermes-v0.7.5.jar ./hermes.jar
RUN java -jar hermes.jar --db snomed.db download uk.nhs/sct-clinical {TRUD_API_KEY} /tmp/trud
RUN java -jar hermes.jar --db snomed.db compact
RUN java -jar hermes.jar --db snomed.db index
RUN java -jar hermes.jar --db snomed.db --port 8080 serve 

I am not sure it is downloading the databases though since it jumps past that step in only 8 seconds. Do you have any insights?


sidharthramesh commented on September 1, 2024

@eatyourpeas make sure you're passing the TRUD_API_KEY correctly? What do the logs during build say? Docker has a very smart way of caching layers, so if it's downloaded the files once for a step, it will skip that step and only execute steps that have changed since then.

I've really not experimented with TRUD, since I usually get the files from MLDS for the Indian Version.

Also, the last step, you probably don't want to execute a RUN, but a CMD to ensure this command runs when the container starts and not when the image is built. You can look here for reference.
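
As a rough illustration of the RUN versus CMD point above, here is a minimal sketch that reuses the base image, jar name, and serve command from the Dockerfile posted earlier in this thread. The EXPOSE line is an addition, and the build-time steps are left out; none of this has been checked against a particular hermes release.

```dockerfile
FROM openjdk:11-jre-slim
WORKDIR /src
COPY ./hermes-v0.7.5.jar ./hermes.jar

# Build-time steps (download, compact, index) would remain RUN instructions
# and are omitted from this sketch.

# Assumption: documents the port used by the serve command below.
EXPOSE 8080

# CMD runs when a container starts, not while the image is being built.
CMD ["java", "-jar", "hermes.jar", "--db", "snomed.db", "--port", "8080", "serve"]
```

With RUN, the serve command would execute during the build and never return, so the image would not finish building; CMD only records the command as the container's default start-up process.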


eatyourpeas commented on September 1, 2024

Thank you @sidharthramesh - I am passing the TRUD_API_KEY for the moment as a parameter just to get it working.
It does seem to pass the step and, as you say, it is cached and so is skipped the second time round.
The error I am getting is:
no database found at path and operating read-only {:path "snomed.db"}
So either it does not download it or else my paths are all wrong. I have changed the final step to a CMD like you suggested, but it breaks at the previous compact step.


wardle commented on September 1, 2024


eatyourpeas commented on September 1, 2024

Much more promising thank you both.
If I get it working, would you like me to update your readme?



wardle commented on September 1, 2024

Sure. Perhaps add a doc directory with a file docker.md and we can link from the main README? This would be a UK-specific thing.

You probably want to specify the database explicitly - and for the UK - you'll want uk.nhs/sct-drug-ext as well usually.

I still think it is simpler to run from source code. All this looks as if it adds complexity and reduces flexibility, but no issue in a how-to I guess! I also think the data shouldn't be in the image. Shouldn't containers be small and link to a fixed filesystem using the cloud provider's fixed filesystem options? That'd be my preference. But I'm open to documenting this approach, as it might feel easier.


eatyourpeas commented on September 1, 2024

Thanks, I think I have a lot to learn here. I haven't fiddled with memory heaps etc. since dealing with emulators in Android Studio. I ran out of memory at the compacting stage and had to play with the Docker settings to free up 8GB. You are right that this may not be a sensible idea. I will go with it for the moment and, if I find my way through, maybe send you some markdown of my experience to include if you think it helpful.


wardle commented on September 1, 2024

SNOMED is enormous.

I do mention the need to pass in specific jvm parameters for the compaction phase in the existing documentation. There is no avoiding this need as it is a requirement from a dependency, but you can actually safely leave out compaction if you wish.

I personally wouldn't create a docker image containing the data itself. You can, but I prefer not.

I'd just use the command line to get it running before you think about docker-ization. Then you'll understand more about what is happening. You only need java to run the jar file. You'll need clojure to run from source, but running using the pre-built uberjar is fine too.

Most SNOMED terminology servers use multiple federated instances of Elasticsearch and have far greater requirements - and are far more complex.

Once you're confident with the process, then you might want to look at the best way to deploy into your production environment - the options for deployment are wide. I've used a range, from an API gateway with multiple instances of the service and shared fixed disks, to a single server using systemd, to embedding in a larger server application! Building a small container image containing the code is trivial. Scaling multiple containers that share a filesystem is the way I'd recommend if you need to satisfy millions of users. I've never had to have more than a single instance. This is in production in our emergency unit, so keep it simple!


wardle commented on September 1, 2024

@eatyourpeas I'm happy to have a call and show you if that helps. I can also share some pre-built datafiles via google drive if you wish.


wardle commented on September 1, 2024

PS. You didn't run out of memory - the java application ran out of memory - you just need to give it more for the compaction phase or skip that step.
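
A sketch of what giving the JVM more memory for compaction might look like as a step in the Dockerfile from earlier in the thread. -Xmx is the standard JVM option for the maximum heap size; the 8g value is only an assumption for illustration, so prefer whatever the hermes documentation specifies. Docker itself must also be allowed at least that much memory.

```dockerfile
# Fragment only: replaces the compact step of the earlier Dockerfile.
# -Xmx8g is an assumed heap size, not a value taken from the hermes docs.
RUN java -Xmx8g -jar hermes.jar --db snomed.db compact
```

Since compaction is optional, dropping this step entirely is the other way to avoid the out-of-memory failure.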


wardle commented on September 1, 2024

Hi @eatyourpeas and @sidharthramesh ,

Shouldn't your docker files use VOLUMEs to share a file system? That means your containers are small.

See https://docs.docker.com/storage/

Couldn't you even copy the data from s3 or other storage if it didn't exist?

You surely don't want lots of read only static data in your container?
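
To make the suggestion concrete, here is a minimal sketch of an image that contains only the jar and expects the database on a mounted volume. The /data mount point is hypothetical, and the serve command is copied from the Dockerfile earlier in this thread rather than verified against a hermes release.

```dockerfile
FROM openjdk:11-jre-slim
WORKDIR /src
COPY ./hermes-v0.7.5.jar ./hermes.jar

# Hypothetical mount point: the pre-built database lives outside the image.
VOLUME /data

# Assumption: documents the port used by the serve command below.
EXPOSE 8080

CMD ["java", "-jar", "hermes.jar", "--db", "/data/snomed.db", "--port", "8080", "serve"]
```

The database would then be supplied at run time, for example with docker run's -v option, so the same small image can serve any release placed on the shared filesystem.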

