Resilient FIX Engine on AWS

  • Example implementation of an AWS FIX service that provides automated HA (High Availability) failover, with an RTO (Recovery Time Objective) of seconds if the primary engine fails, and an RPO (Recovery Point Objective) of zero, meaning no data loss.
  • The implementation uses the QuickFixJ Java-based FIX engine to create FIX connections, and provides the ability to synchronize its state with a hot standby backup engine using a JDBC database.
  • The provided CloudFormation template and the FIX encoding/decoding libraries from the QuickFix project can be used to quickly stand up the engine and then send and receive messages with any other FIX engine from Java, Python, C++, Ruby, .NET or Go.
  • The solution can be configured either as a FIX server or a FIX client, and creates and configures all the infrastructure required to run, including the primary and backup engine, JDBC database, MSK (Kafka) queue, networking, configuration, security and SDLC components.
  • The solution can be reconfigured to run in asynchronous mode using EFS (with near-real-time replication) to achieve higher throughput at the cost of a weaker (non-zero) RPO.

Vision

FIX (Financial Information eXchange) is a protocol that allows vastly different institutions, technologies and architectures to communicate financial data to each other with minimal coupling and coordination effort.

  • Example of a FIX message : Execution Report (Pipe character is used to represent SOH character): 8=FIX.4.2|9=176|35=8|49=PHLX|56=PERS|52=20071123-05:30:00.000|11=ATOMNOCCC9990900|20=3|150=E|39=E|55=MSFT|167=CS|54=1|38=15|40=2|44=15|58=PHLX EQUITY TESTING|59=0|47=C|32=0|31=0|151=15|14=0|6=0|10=128|
  • The FIX protocol has ushered in an era of straight-through processing in capital markets, replacing faxes and phone calls, and allowing firms to exchange data in real time.
  • To achieve this, the FIX engine must be highly available and resilient against brief outages as well as data center failures, and allow participants to continue exchanging FIX messages.
  • As FIX evolves and is more widely adopted, it holds the promise of allowing real-time processing of all post-trade events such as reconciliations, money movements, settlements, risk calculations and corporate actions.
  • Upgrading the protocol and integrating new clients also requires extensive and labor-intensive testing of the proposed FIX infrastructure without disturbing existing Production connectivity.
  • This requires an engine architecture that's trivial to set up, tear down, scale horizontally and test in parallel.
  • The aim of this project is to provide such a solution.
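
To make the Execution Report example above more concrete, here is a minimal Python sketch (standard library only, not part of this project) that splits the sample message into tag/value pairs. It assumes the pipe character stands in for SOH, as in the example, and the `parse_fix` helper name is hypothetical. In practice the QuickFix libraries do this parsing for you.

```python
SOH = "\x01"  # the real FIX field delimiter; the example above prints it as "|"


def parse_fix(raw, delimiter="|"):
    """Split a FIX string into an ordered list of (tag, value) pairs."""
    fields = []
    for part in raw.split(delimiter):
        if not part:  # skip the empty piece after the trailing delimiter
            continue
        tag, _, value = part.partition("=")
        fields.append((int(tag), value))
    return fields


raw = (
    "8=FIX.4.2|9=176|35=8|49=PHLX|56=PERS|52=20071123-05:30:00.000|"
    "11=ATOMNOCCC9990900|20=3|150=E|39=E|55=MSFT|167=CS|54=1|38=15|40=2|"
    "44=15|58=PHLX EQUITY TESTING|59=0|47=C|32=0|31=0|151=15|14=0|6=0|10=128|"
)

fields = parse_fix(raw)
# A dict is convenient for lookups, but it would drop repeated tags
# (repeating groups); this sample message has none.
message = dict(fields)

print(message[35])  # MsgType; "8" means Execution Report
print(message[55])  # Symbol
print(message[49], "->", message[56])  # SenderCompID -> TargetCompID
```

For messages read from the wire, pass `delimiter=SOH` instead of the pipe character.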

Purpose

  • Build a cloud-native, highly available, resilient FIX engine for the financial services industry (FSI) to demonstrate how AWS makes it easy to deploy and run complex, real-time services.
  • Enable migration of proprietary protocols and low-latency on-prem FSI trading services to AWS.
  • Demonstrate how this design pattern can be deployed and used with AWS products and services, with a clear focus on security, scale and operational excellence.
  • Provide example code that users can adapt to meet this and other similar use cases.

Architecture

  • By combining the open-source QuickFix engine and encoding/decoding libraries with Amazon's multi-region Aurora MySQL database, MSK (managed Kafka) queues and Fargate (managed Docker) containers, resilience and high availability become simple to achieve.
  • Amazon's VPC (Virtual Private Cloud), NLB (Network Load Balancer) and Global Accelerator (intelligent network routing) technologies make it easy to securely expose the FIX service to internal and external users, and route them to the active engine instance.
  • CloudFormation (infrastructure as code) and SSM (Systems Manager) allow users to automatically create every component required to run a FIX server or client in a matter of minutes, and to reconfigure it without having to recreate it.
  • The Java engine is deployed as a container on ECS (Elastic Container Service) with Fargate
  • A Multi-AZ (Availability Zone), multi-master database (Aurora MySQL) keeps the primary and secondary engine states in sync
  • The secondary engine is kept in hot-standby mode (always running but not processing messages until it detects that it's become the primary)
  • Multi-AZ MSK (Managed Kafka) allows seamless client failover
  • Global Accelerator transparently redirects clients to the currently active FIX engine
  • Internal SQL-based heartbeat uses a MySQL database table row as a mutex lock which allows FIX engines to perform leader election without external watchers
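
The SQL-based heartbeat described above can be sketched as a lease held in a single table row. The example below is an illustration under stated assumptions, not the project's implementation: it uses Python's built-in `sqlite3` as a stand-in for Aurora MySQL, and the `engine_lock` table, its columns, the `heartbeat` function and the 5-second lease are all hypothetical. Time is passed in explicitly so the behaviour is easy to follow.

```python
import sqlite3

LEASE_SECONDS = 5.0  # hypothetical: how long a heartbeat stays valid


def init(conn):
    """Create the single-row lock table (SQLite syntax; MySQL would differ slightly)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS engine_lock ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " owner TEXT NOT NULL,"
        " heartbeat REAL NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO engine_lock VALUES (1, '', 0)")
    conn.commit()


def heartbeat(conn, engine, now):
    """Renew or take over the lock. Returns True if `engine` is the leader.

    The UPDATE succeeds only if this engine already owns the row, or if the
    current owner's last heartbeat is older than the lease. Because it is a
    single statement, the database serializes competing engines.
    """
    cur = conn.execute(
        "UPDATE engine_lock SET owner = ?, heartbeat = ? "
        "WHERE id = 1 AND (owner = ? OR heartbeat < ?)",
        (engine, now, engine, now - LEASE_SECONDS),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
init(conn)
print(heartbeat(conn, "primary", 100.0))   # True: the lock was free
print(heartbeat(conn, "failover", 101.0))  # False: primary's lease is still fresh
print(heartbeat(conn, "primary", 102.0))   # True: primary renews its lease
# The primary stops sending heartbeats here.
print(heartbeat(conn, "failover", 106.0))  # False: only 4 seconds since the last heartbeat
print(heartbeat(conn, "failover", 108.0))  # True: lease expired, failover takes over
```

Each engine would call `heartbeat` on a timer and process messages only while it returns True, which is why no external watcher is needed.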

AWS vs. On-Prem Architecture

  • AWS managed services vastly simplify the deployment and maintenance of complex components like databases, queues and compute containers
  • This is much simpler than deploying the same components on traditional on-prem architectures

Pre-requisites

  • An AWS account
  • The FIX port number and (optional for internal testing) DNS name you intend to use (either as a FIX server or client)
  • (Optionally) A domain that you control and a hosted zone, which will be used to create a DNS alias for the FIX server (not required if you are using this solution only as a FIX client).
  • (Optionally) An existing VPC and subnets (one public, one private) where you'd like to run the FIX engine. If you don't have one or would like a new one created, simply run the "VPC" version of the CloudFormation template included with this project.
  • Download/install the QuickFix version appropriate for the language and FIX protocol version that's used for your application from quickfixengine.org/
  • Download/install Kafka producer/consumer library appropriate for the language that's used for your application from cwiki.apache.org/confluence/display/KAFKA/Clients
  • (Optionally) Cloud9 (for building the container image)
  • (Optionally) ECR (for hosting the container image)

Installation

Usage

  • See FixEncoderDecoderDemo.java for an example of how to use QuickFixJ to build an order object, encode it into a FIX string and decode this string back into a new order object
  • Use the QuickFix library for your language (link above) to construct a FIX message object and convert it to a FIX String (in Java this is just calling the toString() method)
  • Connect to the MSK Kafka queue created by CloudFormation and send this string
  • Subscribe to the same MSK Kafka queue to receive FIX reply strings from the target FIX server
  • Use the QuickFix library to parse the retrieved FIX message back into an object (using the appropriate-version XML template in quickfixj-messages-all-2.2.0.jar)
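
The encoding step in the list above is normally handled by the QuickFix library for your language. To show what the resulting FIX string looks like, here is a minimal standard-library Python sketch that builds a New Order Single (35=D) and fills in BodyLength (9) and CheckSum (10) by hand. It is an illustration only: the `encode_fix` and `decode_fix` helpers, the CLIENT and SERVER comp IDs and the ORDER-1 order ID are made up, and session-level fields such as MsgSeqNum (34) and SendingTime (52) are omitted.

```python
SOH = "\x01"  # FIX field delimiter


def checksum(data):
    """FIX CheckSum: sum of all bytes before the 10= field, modulo 256, as 3 digits."""
    return "%03d" % (sum(data) % 256)


def encode_fix(begin_string, body_fields):
    """Build a FIX string from (tag, value) pairs, adding tags 8, 9 and 10."""
    body = "".join(f"{tag}={value}{SOH}" for tag, value in body_fields)
    # BodyLength counts the bytes after the 9= field up to the 10= field.
    header = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    trailer = f"10={checksum((header + body).encode('ascii'))}{SOH}"
    return header + body + trailer


def decode_fix(raw):
    """Split a FIX string back into ordered (tag, value) pairs."""
    parts = [p.partition("=") for p in raw.split(SOH) if p]
    return [(int(tag), value) for tag, _, value in parts]


order = [
    (35, "D"),         # MsgType: New Order Single
    (49, "CLIENT"),    # SenderCompID (hypothetical)
    (56, "SERVER"),    # TargetCompID (hypothetical)
    (11, "ORDER-1"),   # ClOrdID (hypothetical)
    (55, "MSFT"),      # Symbol
    (54, "1"),         # Side: buy
    (38, "15"),        # OrderQty
    (40, "2"),         # OrdType: limit
    (44, "15"),        # Price
]

wire = encode_fix("FIX.4.2", order)
decoded = decode_fix(wire)

print(wire.replace(SOH, "|"))   # printable form, as used elsewhere in this README
print(decoded[2:-1] == order)   # the body fields survive the round trip
```

The `wire` string is what you would publish to the Kafka topic; replies read from Kafka can be passed through `decode_fix` (or, more realistically, the QuickFix parser) in the same way.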

Monitoring

  • Navigate to CloudWatch to look at the logs
  • Click on the ECS Cluster --> Click on the Tasks tab --> Click on the task --> Expand the container, scroll down and click on View logs in CloudWatch
  • Alternatively, go to CloudWatch and filter by stack name, e.g. fixengineonaws-server-1-19
  • The log group will be named like /ecs/fargate/FixEngineOnAws-Client-1-19
  • Within the log group there will be two log streams: one for the primary, e.g. FixEngineOnAws-Client-1-19/Primary/FixEngineOnAws......, and one for the failover, e.g. FixEngineOnAws-Client-1-19/Failover/FixEngineOnAws.......

Testing

  • Create a FIX Server and FIX Client stack
  • Clone this repo
    git clone https://github.com/aws-samples/amazon-resilient-fix-engine-demo
  • Log in to the AWS Console --> CloudFormation --> Create Stack with New resources
  • Select "Template is Ready" and "Upload a template file"
  • Click "Choose File" and browse to the FIX Engine repo folder to select the file "amazon-resilient-fix-engine-demo/cloudformation/FIXEngineVPCApplication.yml"
  • Enter the parameters as follows
    [Screenshot: Fix Server CF 1]
    [Screenshot: Fix Server CF 2]

  • Wait until the CloudFormation stack deployment is successful.

  • Click on the Outputs tab and note down GlobalAcceleratorDNSName and NATGatewayIPAddresses

  • Once the FIX Server stack is deployed, click Update and set the "Fix Client CIDRs" parameter to 10.10.0.0/20, PrimaryNATGatewayEIP/32, FailoverNATGatewayEIP/32 (replace PrimaryNATGatewayEIP and FailoverNATGatewayEIP with the actual IP addresses)
    [Screenshot: Fix Server CF 3]
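
If you want to check the "Fix Client CIDRs" value before pasting it into CloudFormation, a small Python sketch like the following can build and validate it. The `client_cidrs` helper is hypothetical, and the two 203.0.113.x addresses are documentation placeholders standing in for your actual NAT gateway Elastic IPs.

```python
import ipaddress


def client_cidrs(vpc_cidr, nat_gateway_ips):
    """Return the comma-separated CIDR list: the VPC range plus one /32 per NAT gateway IP.

    ipaddress.ip_network raises ValueError for malformed input, so typos are
    caught before the value reaches CloudFormation.
    """
    networks = [ipaddress.ip_network(vpc_cidr)]
    networks += [ipaddress.ip_network(f"{ip}/32") for ip in nat_gateway_ips]
    return ", ".join(str(network) for network in networks)


# Placeholder addresses; substitute the primary and failover NAT gateway EIPs
# noted down from the stack outputs.
value = client_cidrs("10.10.0.0/20", ["203.0.113.10", "203.0.113.20"])
print(value)  # 10.10.0.0/20, 203.0.113.10/32, 203.0.113.20/32
```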

  • Now you will be deploying the FIX client stack

  • Go to CloudFormation --> Create Stack with New resources

  • Select "Template is Ready" and "Upload a template file"

  • Click "Choose File" and browse to the FIX Engine repo folder and select the file "amazon-resilient-fix-engine-demo/cloudformation/FIXEngineApplication.yml"

  • Enter the parameters as follows, using the GlobalAcceleratorDNSName, PrimaryNATGatewayEIP and FailoverNATGatewayEIP noted down previously
    [Screenshot: Fix Server CF 1]
    [Screenshot: Fix Server CF 2]

  • Wait until the CloudFormation stack deployment is successful.

  • For local testing, provision an EC2 instance or an AWS Cloud9 environment in the same VPC where the FIX Server and FIX Client engines are deployed.

  • Install required packages
    sudo yum install telnet
    sudo yum install jq git docker java-1.8.0-openjdk-devel -y
    wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
    tar -xzf kafka_2.12-2.2.1.tgz
    sudo wget -c https://services.gradle.org/distributions/gradle-6.6.1-all.zip
    sudo unzip gradle-6.6.1-all.zip -d /opt
    sudo ln -s /opt/gradle-6.6.1 /opt/gradle

  • Add the following to .bash_profile
    export GRADLE_HOME=/opt/gradle
    export PATH=$PATH:/opt/gradle/bin

  • Get the MSK broker endpoints for both the server and the client MSK clusters. Go to MSK, click the client MSK cluster and note down the broker endpoints
    [Screenshot: MSK Brokers]

  • Repeat the same steps to get the server MSK broker endpoints

  • Edit src/main/resources/config/test-client.cfg to set KafkaBootstrapBrokerString, NoOfMessages and WaitBetweenMessages
    KafkaBootstrapBrokerString=":9092,:9092"
    NoOfMessages=30

  • Edit src/main/resources/config/test-server.cfg to set KafkaBootstrapBrokerString
    KafkaBootstrapBrokerString=":9092,:9092"
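
If you prefer to script these config edits, the following Python sketch shows one way to do it. It assumes the .cfg files are plain key=value text, which matches the lines shown above but has not been checked against the full files; the `set_cfg_value` helper and the WaitBetweenMessages value of 10 are hypothetical.

```python
def set_cfg_value(cfg_text, key, value):
    """Replace the `key=...` line in key=value config text, or append it if missing."""
    lines = cfg_text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so values containing "=" are safe.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Sample text mirroring the two lines shown above (broker hosts left elided).
cfg = 'KafkaBootstrapBrokerString=":9092,:9092"\nNoOfMessages=30\n'

cfg = set_cfg_value(cfg, "NoOfMessages", "100")
cfg = set_cfg_value(cfg, "WaitBetweenMessages", "10")  # hypothetical value
print(cfg)
```

To apply it to a real file, read the file into a string, call `set_cfg_value` for each key, and write the result back.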

  • Create a local build if you are planning to modify the code; otherwise use the prebuilt jar located at build/libs/fixengineonaws.jar
    cd amazon-resilient-fix-engine-demo
    # create a local build; skip this step if you are not modifying the code
    gradle build

  • Open a terminal window and run the test server on the FIX Server side
    cd amazon-resilient-fix-engine-demo
    ./scripts/runtestserver.sh

  • Open a terminal window and run the test client on FIX Client side
    cd amazon-resilient-fix-engine-demo
    ./scripts/runtestclient.sh

  • Open a terminal window and monitor execution reports received back by FIX Client MSK
    export PS1="MSK-Client-1 >"
    cd /home/ec2-user/environment/kafka_2.12-2.2.1/bin
    export BootstrapBrokerString=":9092,:9092"
    ./kafka-topics.sh --list --bootstrap-server $BootstrapBrokerString
    ./kafka-console-consumer.sh --bootstrap-server $BootstrapBrokerString --topic FROM-FIX-ENGINE --from-beginning

  • Open a terminal window and monitor the orders received by the FIX Server MSK
    export PS1="MSK-Server-1 >"
    cd /home/ec2-user/environment/kafka_2.12-2.2.1/bin
    export BootstrapBrokerString=":9092,:9092"
    ./kafka-topics.sh --list --bootstrap-server $BootstrapBrokerString
    ./kafka-console-consumer.sh --bootstrap-server $BootstrapBrokerString --topic FROM-FIX-ENGINE --from-beginning
    [Screenshot: FIX Test Terminals]

Testing for Resilience and Failover

  • Testing Failover of ECS Task/Container

  • Go to the CloudWatch Console --> Log Groups --> Filter by stack name, e.g. FixEngine... --> Select the log group, e.g. /aws/ecs/fargate/

  • You will see two log streams, named /Primary//******* and /Failover//*******, e.g. FixEngineOnAws-Client/Primary/FixEngineOnAws-Client/******** and FixEngineOnAws-Client/Failover/FixEngineOnAws-Client/********
    [Screenshot: FIX ECS CloudWatch]

  • Open the Primary and Failover logs and check whether "IM_AM_THE_ACTIVE_ENGINE" is true or false to determine which FIX engine is active. The active FIX engine logs "IM_AM_THE_ACTIVE_ENGINE: true"

  • Go to the ECS Console --> Click on Clusters --> Select the Active Cluster --> Click on the Tasks tab --> Select the task, click Stop, then click Stop again in the confirmation window.
    [Screenshot: FIX ECS Stop Task]

  • It takes approximately 15-30 seconds for the passive FIX engine to become active.
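
The manual log check above can also be scripted once the log lines have been exported from CloudWatch. The sketch below is a minimal example: only the "IM_AM_THE_ACTIVE_ENGINE: true" text comes from this README, while the helper names, the stream labels and the sample log lines around the flag are made up for illustration.

```python
import re

# Matches the flag as described above; the surrounding log format is not assumed.
ACTIVE_FLAG = re.compile(r"IM_AM_THE_ACTIVE_ENGINE:\s*(true|false)")


def latest_active_flag(log_lines):
    """Return the most recent flag value as a bool, or None if it never appears."""
    state = None
    for line in log_lines:
        match = ACTIVE_FLAG.search(line)
        if match:
            state = match.group(1) == "true"
    return state


def active_engines(streams):
    """Given {stream name: log lines}, return the streams whose latest flag is true."""
    return [name for name, lines in streams.items() if latest_active_flag(lines)]


# Hypothetical log excerpts from before and after stopping the primary task.
before = {
    "Primary": ["startup", "IM_AM_THE_ACTIVE_ENGINE: true"],
    "Failover": ["startup", "IM_AM_THE_ACTIVE_ENGINE: false"],
}
after = {
    "Primary": ["startup", "IM_AM_THE_ACTIVE_ENGINE: true"],  # no new lines: task stopped
    "Failover": ["IM_AM_THE_ACTIVE_ENGINE: false", "IM_AM_THE_ACTIVE_ENGINE: true"],
}

print(active_engines(before))  # ['Primary']
print(active_engines(after))   # ['Primary', 'Failover']
```

Note that a stopped engine's last log line may still say true, as in the `after` sample, so compare log timestamps or check which task is still running before concluding which engine is active.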

Throughput Considerations

  • The RDS, ECS and Kafka configuration below provides approximately 200 TPS (transactions per second) on a single-partition topic with a replication factor of 2. The AWS FIX Engine team is working on increasing throughput through horizontal and vertical scaling in upcoming releases.
  • 2-node Kafka cluster with instance size kafka.m5.large
  • ECS Task: Task CPU (unit) 512 Task memory (MiB): 1024
  • 2 Node Aurora MySQL Multi-Master with instance type db.r4.2xlarge

API Documentation

References

Security

  • The Admin and FIX Service DB passwords are stored in AWS Secrets Manager
  • Access to the FIX engine subnet is limited to the FIX protocol TCP port selected by the user during installation
  • Access to the MSK subnet is limited to the Kafka TCP port created by MSK during installation (see Parameter Store for the Kafka port number and endpoint DNS names)

License

This library is licensed under the Apache 2.0 License. See the LICENSE file.

Support

Contributors

amazon-auto, cartermeyers, hnnagra-aws, winmaxim
