Real-time technical indicators in Python

What is a real-time feature pipeline?

A real-time feature pipeline is a program that constantly transforms

  • raw data, e.g. stock trades from an external web socket, into
  • features, e.g. OHLC (open-high-low-close) data for stock trading

and saves these features into a Feature Store.
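As a concrete, framework-free illustration of the raw-data-to-features step, the sketch below turns a list of trades into OHLC bars. It assumes 60-second tumbling windows and trades that arrive in timestamp order; `Trade`, `OHLC`, and `trades_to_ohlc` are hypothetical names for this sketch, not code from this repository.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Trade:
    timestamp_sec: int  # trade time, in seconds since the epoch
    price: float
    size: float


@dataclass(frozen=True)
class OHLC:
    window_start_sec: int  # start of the window this bar summarizes
    open: float
    high: float
    low: float
    close: float
    volume: float


def trades_to_ohlc(trades: Iterable[Trade], window_sec: int = 60) -> List[OHLC]:
    """Group trades into fixed, non-overlapping (tumbling) windows and
    return one OHLC bar per window, sorted by window start.
    Assumes trades arrive in timestamp order, so the last trade seen in a
    window is its close."""
    bars: Dict[int, OHLC] = {}
    for t in trades:
        start = t.timestamp_sec - t.timestamp_sec % window_sec
        bar = bars.get(start)
        if bar is None:
            # The first trade in a window opens the bar.
            bars[start] = OHLC(start, t.price, t.price, t.price, t.price, t.size)
        else:
            bars[start] = OHLC(
                start,
                bar.open,
                max(bar.high, t.price),
                min(bar.low, t.price),
                t.price,  # the latest trade becomes the close
                bar.volume + t.size,
            )
    return [bars[k] for k in sorted(bars)]


trades = [Trade(0, 100.0, 1.0), Trade(10, 102.0, 0.5), Trade(30, 99.5, 2.0), Trade(65, 101.0, 1.0)]
for bar in trades_to_ohlc(trades):
    print(bar)
```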

Real-time feature pipelines are used for real-time ML problems like fraud detection or cutting-edge recommender systems.

Once the features are in the store, you can fetch them to

  • train an ML model, from the offline feature store or
  • generate predictions with your deployed model, from the online feature store.
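To illustrate the difference between the two stores, here is a toy in-memory stand-in (hypothetical, not the Hopsworks API): the offline side keeps every row so you can slice a training set by time range, while the online side keeps only the newest row per key for fast lookups at prediction time.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyFeatureStore:
    """In-memory stand-in for a feature store (hypothetical; NOT the Hopsworks API).

    offline: append-only history of every feature row -> build training sets
    online:  only the latest row per key               -> low-latency lookups at inference
    """

    def __init__(self) -> None:
        self.offline: Dict[str, List[Tuple[int, Dict[str, Any]]]] = defaultdict(list)
        self.online: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def insert(self, key: str, ts: int, features: Dict[str, Any]) -> None:
        # Every write lands in the offline store...
        self.offline[key].append((ts, features))
        # ...but the online store only keeps the newest row per key.
        current = self.online.get(key)
        if current is None or ts >= current[0]:
            self.online[key] = (ts, features)

    def training_rows(self, key: str, start_ts: int, end_ts: int) -> List[Dict[str, Any]]:
        """Offline read: every row with start_ts <= ts < end_ts, for model training."""
        return [f for ts, f in self.offline[key] if start_ts <= ts < end_ts]

    def latest(self, key: str) -> Dict[str, Any]:
        """Online read: the most recent row, for real-time predictions."""
        return self.online[key][1]


store = ToyFeatureStore()
store.insert("ABC", 0, {"close": 99.5})
store.insert("ABC", 60, {"close": 101.0})
print(store.training_rows("ABC", 0, 120))  # both rows
print(store.latest("ABC"))                 # only the newest row
```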


The problem

To ensure your deployed model's performance matches the test metrics you get at training time, you need to generate features IN THE EXACT SAME WAY.

This is especially tricky for real-time feature pipelines, where

  • live raw data often comes from an external web socket, while
  • historical data comes from external offline storage, like a data warehouse.
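One way to avoid this training/serving skew is to keep the feature logic in a single function and let only the input parsing differ per source. The sketch assumes hypothetical formats for both sources (JSON web socket messages with string-encoded numbers, and already-typed warehouse rows) and uses the close price per 60-second window as the feature; none of these names come from the repository.

```python
import json
from typing import Dict, Iterable, List, Tuple

TradeTuple = Tuple[int, float, float]  # (timestamp_sec, price, size)


def parse_websocket_message(raw: str) -> TradeTuple:
    """Pre-processing for the live path. Hypothetical message format:
    JSON text with string-encoded numbers."""
    msg = json.loads(raw)
    return int(msg["time"]), float(msg["price"]), float(msg["size"])


def parse_warehouse_row(row: Tuple[int, float, float]) -> TradeTuple:
    """Pre-processing for the historical path. Hypothetical row format:
    already-typed (timestamp_sec, price, size) tuples."""
    ts, price, size = row
    return int(ts), float(price), float(size)


def close_prices(trades: Iterable[TradeTuple], window_sec: int = 60) -> List[Tuple[int, float]]:
    """Shared feature logic: the last trade price in each tumbling window.
    Assumes trades arrive in timestamp order. Both paths call this one function."""
    closes: Dict[int, float] = {}
    for ts, price, _size in trades:
        closes[ts - ts % window_sec] = price
    return sorted(closes.items())


live_messages = [
    '{"time": "0", "price": "100.0", "size": "1"}',
    '{"time": "65", "price": "101.5", "size": "2"}',
]
warehouse_rows = [(0, 100.0, 1.0), (65, 101.5, 2.0)]

live_features = close_prices(parse_websocket_message(m) for m in live_messages)
historical_features = close_prices(parse_warehouse_row(r) for r in warehouse_rows)
print(live_features == historical_features)
```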

The solution

We would like to reuse as much code as possible, and only rewrite the pre-processing and post-processing logic, depending on

  • the input source, either web socket or data warehouse, and
  • the output sink, either printing to the console (for debugging), the online feature store (for real-time inference), or the offline feature store (for ML model training).
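The idea of swapping only the source and the sink can be sketched with plain Python callables. The two store sinks below are hypothetical in-memory stand-ins, not the Hopsworks client, and the example processes a finite batch for simplicity, whereas a streaming engine such as Bytewax emits results incrementally.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, float]  # (window_start_sec, close_price)


def transform(trades: Iterable[Tuple[int, float]], window_sec: int = 60) -> List[Row]:
    """Shared core, identical in every mode: last price per tumbling window.
    Assumes (timestamp_sec, price) trades arrive in timestamp order."""
    closes: Dict[int, float] = {}
    for ts, price in trades:
        closes[ts - ts % window_sec] = price
    return sorted(closes.items())


def console_sink(rows: List[Row]) -> None:
    """Debugging: print each feature row."""
    for row in rows:
        print(row)


class OnlineStoreSink:
    """Hypothetical stand-in for the online store: keeps only the latest row."""

    def __init__(self) -> None:
        self.latest: Optional[Row] = None

    def __call__(self, rows: List[Row]) -> None:
        if rows:
            self.latest = rows[-1]


class OfflineStoreSink:
    """Hypothetical stand-in for the offline store: keeps the full history."""

    def __init__(self) -> None:
        self.history: List[Row] = []

    def __call__(self, rows: List[Row]) -> None:
        self.history.extend(rows)


def run_pipeline(source: Iterable[Tuple[int, float]], sink: Callable[[List[Row]], None]) -> None:
    """Only the source and the sink vary; transform() is reused unchanged."""
    sink(transform(source))


trades = [(0, 100.0), (30, 99.5), (65, 101.0)]
run_pipeline(trades, console_sink)

offline = OfflineStoreSink()
run_pipeline(trades, offline)

online = OnlineStoreSink()
run_pipeline(iter(trades), online)
```

Because each sink is just a callable, adding a new output doesn't touch `transform`, which is what keeps features identical across modes. Pairing production mode with the online store and backfill mode with the offline store is an assumption based on the list above.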


Python implementation using Bytewax

Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with JVM-based tools like Apache Spark or Apache Flink.

However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝, which expose a pure Python API on top of a highly efficient language like Rust.

So you get the best of both worlds:

  • Rust's speed and performance, plus
  • Python's rich ecosystem of libraries.

This lets you develop highly performant and scalable real-time pipelines while leveraging top-notch Python libraries.

๐Ÿฆ€ + ๐Ÿ + ๐Ÿ = โšก


Run the whole thing in 5 minutes

  1. Create a Python virtual environment with the project dependencies:

    $ make init
    
  2. Set your Hopsworks API key and project name in set_environment_variables_template.sh, rename the file to set_environment_variables.sh, and source it (sign up for free at hopsworks.ai to get these two values):

    $ . ./set_environment_variables.sh
    
  3. To run the feature pipeline in production mode:

    $ make run
    
  4. To run the feature pipeline in backfill mode, set your PREFECT_API_KEY in set_environment_variables.sh, source the file again, and then run:

    $ from_day=2023-08-01 make backfill
    
  5. To run the feature pipeline in debug mode:

    $ make debug
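Step 4 only shows that `from_day` sets the start of the backfill. Assuming the backfill then covers every calendar day from `from_day` through today, and that the Makefile forwards `from_day` to Python as an environment variable (both assumptions; the README does not specify either), the date range could be computed with a hypothetical helper like this:

```python
import os
from datetime import date, timedelta
from typing import List, Optional


def backfill_days(from_day: str, to_day: Optional[date] = None) -> List[date]:
    """Return every calendar day from `from_day` (YYYY-MM-DD) through `to_day`
    inclusive; `to_day` defaults to today. Hypothetical helper, not the repo's code."""
    start = date.fromisoformat(from_day)
    end = date.today() if to_day is None else to_day
    if start > end:
        raise ValueError(f"from_day {start} is after {end}")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    # Assumption: the Makefile target passes `from_day` through as an environment variable.
    days = backfill_days(os.environ.get("from_day", "2023-08-01"))
    print(f"backfilling {len(days)} day(s): {days[0]} .. {days[-1]}")
```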
    

Wanna learn more about real-time ML?

I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.

โžก๏ธ Subscribe to The Real-World ML Newsletter to access exclusive discounts.

Contributors

paulescu
