Comments (1)
The following improvements have been made to the Flink local execution mode:
- The Flink configuration file is now auto-generated with appropriate values. The parameters are determined on a best-effort basis so that pipelines do not fail even under high load. Refer here for details.
- The number of threads (parallelism) defaults to the number of cores on the machine, but can be overridden over here. In local mode, by default only one worker is created per pipeline, and the parallelism is achieved within that worker. In non-local mode, however, the cluster can distribute the load across workers (TaskManagers) to achieve the needed parallelism.
- The Parquet row group size is now configurable so that the pipeline does not consume too much heap memory; the changes can be found here.
Since resources are somewhat more abundant in non-local execution mode, these properties can be fine-tuned for it. A few changes might be needed to suit the needs of the cluster.
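As a rough illustration of the two knobs described above, here is a minimal sketch in plain Java. It is not code from fhir-data-pipes: the class `LocalModeSizing`, both method names, and the "writers per thread" parameter are hypothetical. It also assumes, as an approximation only, that each open Parquet writer may buffer up to about one row group on the heap before flushing.

```java
/**
 * Hypothetical helper (not part of fhir-data-pipes) illustrating how a
 * core-based parallelism default and a row group size interact with heap usage.
 */
public final class LocalModeSizing {
  private LocalModeSizing() {}

  /** Returns the override when it is positive, otherwise the number of available cores. */
  public static int resolveParallelism(int override) {
    if (override > 0) {
      return override;
    }
    return Runtime.getRuntime().availableProcessors();
  }

  /**
   * Rough heap estimate for Parquet write buffers.
   * Assumption: every open writer holds at most about one row group in memory.
   */
  public static long estimateWriterBufferBytes(
      int parallelism, int openWritersPerThread, long rowGroupSizeBytes) {
    long writers = Math.multiplyExact((long) parallelism, (long) openWritersPerThread);
    return Math.multiplyExact(writers, rowGroupSizeBytes);
  }

  public static void main(String[] args) {
    // 0 means "no override", so the core count of this machine is used.
    int parallelism = resolveParallelism(0);
    // Example values only: 3 open writers per thread, 32 MiB row groups.
    long bytes = estimateWriterBufferBytes(parallelism, 3, 32L * 1024 * 1024);
    System.out.println("parallelism=" + parallelism + ", estimated buffer bytes=" + bytes);
  }
}
```

Under these assumptions, lowering the row group size or the parallelism reduces the estimated buffer size proportionally, which is why both settings matter when a single local worker carries the whole load.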
from fhir-data-pipes.
Related Issues (20)
- Data retrieved from the Hapi JPA Database only includes the latest information, with no historical data being fetched
- Feature Request: Requires Delta Data to be retained in incremental Snapshots
- `text` field is lost when the Hapi object is converted to `Avro`
- FhirEtl with DataflowRunner does not produce Parquet files
- Come up with a common parquet schema for the sql-on-FHIR v1 type
- NPE while trying to find StructureDefinition for a resource extensions that is missing.
- Integrate FHIR Bulk export API with data-pipes
- Add config to change the pipeline controller cron schedule timezone
- Add config to override the default 100 batch size for server queries
- Add last run time and purging schedule to the pipeline controller UI
- Various documentations
- Feature Request: Options to rerun pipelines for selective date ranges
- Feature Request: Pipeline Failure Notifications
- Relook at the `recursiveDepth` parameter
- Fix the number of parquet files generation for Direct Runner
- Create a Util Function for testing Resources in the FHIR store
- Add ability to turn on/off parquet file generation in case of syncing fhir to fhir server
- Support BULK_EXPORT mode of fetching FHIR resources for Incremental Run
- Controller maven module unit tests in `Junit 4` does not report coverage correctly
- Make sure all `DoFn`s are idempotent