
Comments (9)

kimaina avatar kimaina commented on May 28, 2024

+1 for rewriting using Beam; I think there is much to gain, given that we will not have to manually implement many of the functionalities provided by Beam, e.g., windowing, parallel processing, and micro-batching. Let's go for whichever approach works best between the two.
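For context, the windowing mentioned above is the kind of grouping Beam provides out of the box (e.g. `FixedWindows`). A minimal plain-Java sketch of fixed-window assignment, purely illustrative and not Beam's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of fixed-window assignment; all names here are
// hypothetical, not part of Beam's API.
public class WindowingSketch {

    // Map an event timestamp (epoch millis) to the start of its fixed window.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    public static void main(String[] args) {
        long windowSize = 60_000L; // one-minute windows
        long[] eventTimes = {5_000L, 59_999L, 60_000L, 125_000L};

        // Group events by the window they fall into.
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long t : eventTimes) {
            windows.computeIfAbsent(windowStart(t, windowSize), k -> new ArrayList<>()).add(t);
        }
        System.out.println(windows); // {0=[5000, 59999], 60000=[60000], 120000=[125000]}
    }
}
```

With Beam, this bucketing (plus triggers, late-data handling, and parallel execution) comes for free, which is the point being made above.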

I like the scalability that Kafka can provide! On the issue of architectural complexity, it is possible to run Kafka without ZooKeeper; however, that is going to take a few more years to become production-ready: https://www.confluent.io/blog/removing-zookeeper-dependency-in-kafka/. On the other hand, the same stack (Kafka + ZooKeeper) can be significantly scaled down, depending on server specs; please take a look at this: https://kafka.blog/posts/scaling-down-apache-kafka/

from fhir-data-pipes.

bashir2 avatar bashir2 commented on May 28, 2024

Thanks @kimaina for the notes.

Re. benefits of Beam: Yes, Beam definitely brings many nice features, and our long-term plan should be to switch.

Re. scaled-down Kafka: My concern is not so much about resource usage as about complexity. When things go wrong (and they definitely will), debugging issues is potentially much more complicated, since you need to deal with/understand five pieces of infrastructure instead of two.

BTW, do you care about the scalability of Kafka in the AMPATH use case? I am asking because you have a central installation, and my expectation is that converting data to FHIR would be the bottleneck, not the message passing/processing part.

kimaina avatar kimaina commented on May 28, 2024

We really don't care about scaling Kafka. I think one Kafka broker should be able to handle all incoming streams of data at any given point in time (whether peak or off-peak). During the peak, we should expect an average of 80 records per second.
[Screenshot attached: 2020-03-17, 11:35 AM]
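As a back-of-envelope check of that figure (80 records/second is the rate quoted above; the rest is plain arithmetic):

```java
// Back-of-envelope volume check for the 80 records/second peak rate quoted above.
public class ThroughputEstimate {
    public static void main(String[] args) {
        int peakRecordsPerSecond = 80;
        long perHour = peakRecordsPerSecond * 3_600L; // 288,000 records/hour
        long perDay = perHour * 24L;                  // 6,912,000 records/day at sustained peak
        System.out.println(perHour + " records/hour, " + perDay + " records/day");
    }
}
```

Even if the peak rate were sustained all day, that is under 7 million records/day, comfortably within a single broker's capacity.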

kimaina avatar kimaina commented on May 28, 2024

When things go wrong (and they definitely will), debugging issues is potentially much more complicated, since you need to deal with/understand five pieces of infrastructure instead of two.

I see your point! Let's look at the other option. I am curious to know how long this would take to implement.

mozzy11 avatar mozzy11 commented on May 28, 2024

Implement a custom IO connector for Beam as described here, which includes an UnboundedSource that wraps Debezium. The main drawback of this approach is its custom nature, and the fact that it is not extendable to a distributed environment (for which Kafka is probably the right approach). The main benefit is its architectural simplicity.

@bashir2, have we concluded to take the second option?

I could do some work on this.
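For a rough picture of what the quoted option involves: Beam's real `UnboundedSource`/`UnboundedReader` API (in `org.apache.beam.sdk.io`) also requires checkpointing and watermark methods; the trimmed plain-Java sketch below, with all names hypothetical and a queue standing in for embedded Debezium's change-event feed, only shows the wrapping pattern:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Trimmed stand-in for Beam's UnboundedReader contract; the real interface
// also has checkpointing and watermark methods.
interface UnboundedReaderSketch<T> {
    boolean advance();  // try to move to the next record
    T getCurrent();     // the record we advanced to
}

// Wraps a queue that stands in for embedded Debezium's change-event feed.
class DebeziumReaderSketch implements UnboundedReaderSketch<String> {
    private final BlockingQueue<String> changeEvents;
    private String current;

    DebeziumReaderSketch(BlockingQueue<String> changeEvents) {
        this.changeEvents = changeEvents;
    }

    @Override public boolean advance() {
        current = changeEvents.poll(); // non-blocking: null means no new event yet
        return current != null;
    }

    @Override public String getCurrent() { return current; }
}
```

The runner would repeatedly call `advance()` and emit `getCurrent()` into the pipeline; the custom work is mostly in checkpointing the Debezium offset so the reader can resume after a restart.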

bashir2 avatar bashir2 commented on May 28, 2024

Thanks @mozzy11 for volunteering. I don't think we need to work on this for now. Both of these options are significant endeavors, and the reason I was considering them was #65, for which I have implemented a different (temporary?) solution for now.

I'll add more notes about pros/cons of the two options soon and make a suggestion (based on what I have learnt so far) to see what everyone thinks. But I would say this is not something to do for the MVP.

mozzy11 avatar mozzy11 commented on May 28, 2024

But I would say this is not something to do for the MVP.

Oh sure, that makes sense @bashir2.

bashir2 avatar bashir2 commented on May 28, 2024

Here are some more notes to have a record of my investigation/thoughts and to put this issue on the back burner for now:

Between the two approaches, i.e., using Kafka or embedded Debezium ("Debezium" for short), here is a list of benefits of each:

  • Debezium provides simpler architecture; no need for dealing with Kafka and ZooKeeper.
  • Kafka makes it easier to merge multiple OpenMRS instances into a single Data Warehouse (DW). It also makes it easier to import data from other non-OpenMRS sources.
  • Kafka has standard support in Beam. But I have learnt that there are plans to implement a DebeziumIO in Beam too, which is what I was considering implementing myself (so we can just wait for that standard implementation instead).
  • The Kafka-based approach has an extra pair of serialization/deserialization of DB update messages compared to Debezium.

The main motivation for considering this rewrite at this time was issue #65, which I resolved with a custom windowing implementation to deal with Parquet file issues in streaming mode. So we don't need the Beam rewrite for now. We should wait until DebeziumIO is available and until we have a better sense of whether consolidating multiple data sources into a single DW is a need. If it is, we should consider the Kafka-based approach; if not, we should go with the DebeziumIO-based approach, IMO.
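The custom windowing mentioned above is described in #65. Purely as an illustration of the general idea (this is not the project's actual implementation, and all names are hypothetical), buffering streamed records and finalizing one output file per fixed time window looks roughly like this:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of windowed output for a streaming pipeline: buffer
// records and "close" one file per fixed time window, so files get finalized
// instead of staying open indefinitely. Flushing a batch stands in for
// writing a Parquet file.
public class WindowedBatcher {
    private final long windowMillis;
    private long currentWindowStart = -1;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushedBatches = new ArrayList<>();

    WindowedBatcher(long windowMillis) { this.windowMillis = windowMillis; }

    void accept(long timestampMillis, String record) {
        long windowStart = timestampMillis - (timestampMillis % windowMillis);
        if (currentWindowStart >= 0 && windowStart != currentWindowStart) {
            flush(); // window closed: finalize the "file"
        }
        currentWindowStart = windowStart;
        buffer.add(record);
    }

    void flush() {
        if (!buffer.isEmpty()) {
            flushedBatches.add(new ArrayList<>(buffer)); // stand-in for a Parquet write
            buffer.clear();
        }
    }

    List<List<String>> batches() { return flushedBatches; }
}
```

Beam's `FixedWindows` plus a windowed file sink gives the same effect without hand-rolling the buffering, which is part of the appeal of the eventual Beam rewrite.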

bashir2 avatar bashir2 commented on May 28, 2024

This is obsolete now because of #952.
