A Solution to Replicate data from System A(Snowflake) to System B(Postgres)
- Ordering on the replicated data does not need to be maintained
- The Case Table structure is identical in both databases
- Table only contains new/fresh data. Table will not contain data from the previous scheduled job
Table: case
Column Name | Type |
---|---|
country_id | int |
country | varchar |
Error Queue Message:
{
"country_id": Number,
"country" : String
}
DLQ Message:
{
"country_id": Number,
"country" : String
}
- Country column uses a 3-letter code to represent the country
- Our Case Table has two columns/fields (CountryId, Country)
- CountryId is the primary key of the Case Table
- Utilize Mulesoft and Anypoint MQ
- Use regulare queues (non-FIFO) and Dead Letter Queues (DLQ)
- Retry on failed records
- Scale applications independently
Our system will run on a scheduled job. Schedule and frequency will be set based on business needs. We will have a process application (proc-countries-scheduler-app) which will pull data from the database in System A and insert that data in batches into our database in System B. This application can be scaled horizontally and vertically if needed.
If any errors occur during the process of replicating data over to System B, proc-countries-scheduler-app will publish messages of failed records onto our countries-schedule-error-queue in Anypoint MQ. This allows us to process any failed records asyncronously. This also allows us to have a more sophisticated fallback strategy as we can set a number of retries on those records/messages.
To implement the error stategy using an error queue we will need to have another process application (proc-countries-error-app). This application will subsrcibe to the queue and attempt to insert the failed records into the database in System B. If records fail again after a certain amount of retries, they will be dropped off into our countries-schedule-dlq DLQ.
Our proc-countries-dlq-app will take any messages off of the DLQ and append those failed records on to a failure report file as our lowest state of fallback.
Anypoint will allow us to assign a DLQ to any queue (countries-schedule-error-queue). After our set amount of redeliveries/retries, the message will be dropped into the assign DLQ. DLQs are meant to be subscribed in the same fashion as other queues. The benefit of subscribing to the queue is the application can handle messages in real-time.
- Anypoint MQ supports up to 120,000 in-flight messages per standard queue.
- The amount of resources allocated to our application are limited to Mulesoft.
- Limiting the amount of records pulled in per scheduled job
- Only remove processed records from System A database
- Could move all the logic in proc-countries-error-app into proc-countries-scheduler-app. Should allocate more workers to handle the extra load put onto the app.
- If we want real time data, consider having another application which will publish the new records found onto a seperate queue and then having an application subscribe/listen on that queue. Diagram
- Have a system to process the failure report and determine a strategy to recover or compensate. If customers are impacted, send notifications.