Code Monkey home page Code Monkey logo

deepcat's Introduction

DeepCAT+: A Low-Cost and Transferrable Online Configuration Auto-Tuning Approach for Big Data Frameworks(ICPP22,TPDS Revision)

Big data frameworks usually provide a large number of performance-related parameters. Online auto-tuning these parameters based on deep reinforcement learning (DRL) to achieve a better performance has shown their advantages over search-based and machine learning-based approaches. Unfortunately, the time cost during the online tuning phase of conventional DRL-based methods is still heavy, especially for big data applications. To reduce the total online tuning cost and increase the adaptability: 1) DeepCAT+ utilizes the TD3 algorithm instead of DDPG to alleviate value overestimation; 2) DeepCAT+ modifies the conventional experience replay to fully utilize the rare but valuable transitions via a novel reward-driven prioritized experience replay mechanism; 3) DeepCAT+ designs a Twin-Q Optimizer to estimate the execution time of each action without the costly configuration evaluation and optimize the sub-optimal ones to achieve a low-cost exploration-exploitation tradeoff; 4) Furthermore, DeepCAT+ also implements an Online Continual Learner module based on Progressive Neural Networks to transfer knowledge from historical tuning experiences. system overview

New features in DeepCAT+ beyond ICPP22 Paper DeepCAT

Progressive Neural Networks (PNN) based Online Continual Learner to enhance the adaptability for dynamic workloads and hardware environments changes.

  1. Log-based workload features extraction
  2. PNN-based knowledge transfer

pnn

Start

Cluster deployment

  1. Install Hadoop distributed environment and file system.
  2. install the Spark computing framework.
  3. Install and compile the hibench testing framework.
  4. Install Ansible Playbook for batch configuration and automated deployment.

Steps for reproducing DeepCAT+โ€™s Results

  1. Data collection: collect offline exploration data, including cluster metric states, configuration values, rewards. The interaction between Python programs and clusters is conducted through Ansible tools, check target/target_spark/readme.md for more details.
  2. Use the data to form memory pool for offline training and save the model, see offline_train() function in DeepCAT.py.
  3. Use the model to tune configuration for big data frameworks using tune() in DeepCAT.py. Note there are two polcies:
    • if the workload is known, DeepCAT+ will direct conduct optimization, details in DeepCAT.py.
    • if the workload is unknown, DeepCAT+ will use Progressive Neural Networks for continual learning to enhence it's adaptability, details in DeepCAT_with_PNN.py.
  4. Compare DeepCAT with CDBTune, OtterTune and Qtune baselines.

Environment Version

  • Hadoop 2.7.3
  • Spark 2.2.2
  • Hibench 7.0
  • Ansible

Install dependencies (with python 3.8)

pip install -r requirements.txt

Benchmark

we use 9 worklaods with different input data sizes form Hibench The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis

  • WordCount (WC)
  • TeraSort (TS)
  • PageRank (PR)
  • KMeans (KM)
  • Gradient Boosted Trees(GBT)
  • Nweight (NW)
  • Principal Component Analysis (PCA)
  • Aggregation (AGG)
  • WordCount(for streaming)

Baseline

Datasets

  1. The data collected based on the local 3-node Spark cluster includes the execution time of 4 spark workloads under different configuration values in the dataset, check dataset for more details.
  2. For reinforcement learning training, memory pools consist of transitions(s,a,r,s') is in test_kit/ultimate/memory

Configuraiton details

  1. Description of the performance-critical parameters From Spark, YARN and HDFS Description of the performance-critical parameters From Spark, YARN and HDFS
  2. For experiments on Flink, check test_kit/ultimate/flink-experimental/readme.md for more details.

Hardware environments

Seven experimental clusters to extensively evaluate the effectiveness of DeepCAT/DeepCAT+ and its robustness to various hardware environments.

Cluster Nodes Cluster types BD frameworks Evaluation
Cluster_A 3 Physical machines Spark Effectiveness
Cluster_B 3 VMs_1 Spark Adaptability
Cluster_C 6 Physical machines Flink Other BD frameworks
Cluster_D 5 Physical machines + VMs_2 Spark Heterogeneous clusters
Cluster_E 8 VMs_3 Spark Large-scale clusters
Cluster_F 10 VMs_3 Spark Large-scale clusters
Cluster_G 12 VMs_3 Spark Large-scale clusters
  • Physical machines: 8 cores, 16GB memory
  • VMs_1: 8 cores, 8GB memory
  • VMs_2: 12 cores, 8GB memory
  • VMs_3: 8 cores, 16GB memory

deepcat's People

Contributors

wiluen avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.