
gc-ml's Introduction

Performance Tuning using Reinforcement Learning

The goal of this research is to develop an approach that can automatically tune JVM settings in response to the environment dynamics to achieve better performance. The key idea of the system is to use reinforcement learning (RL) techniques (Deep Q-Network) to automate performance configuration tuning on a given benchmark.

We can formulate the task of performance tuning as an RL problem in which the optimal policy corresponds to generating a configuration with higher performance. Each configuration (a set of JVM options) represents an RL state. When performance tuning occurs (i.e., an action is issued to change the configuration), the action receives a reward based on performance measurements. Given sufficient interactions with the environment, the RL agent learns an estimate of how good each action is for the current configuration (i.e., state).
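
As a concrete illustration of this mapping (the names and the helper below are placeholders, not identifiers from this repository):

# A JVM configuration is an RL state; an action changes one option by a fixed step.
state = {"MaxTenuringThreshold": 16, "ParallelGCThreads": 16}

def apply_action(state, flag, delta):
    """Return the next configuration after changing one JVM option."""
    next_state = dict(state)
    next_state[flag] += delta
    return next_state

next_state = apply_action(state, "MaxTenuringThreshold", -3)
# The reward is then computed from a performance measurement (e.g. average GC pause)
# obtained by running the benchmark with `next_state` as JVM options.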

1. Installation

Requirements

  • OpenJDK 11
  • DaCapo benchmark
  • gcviewer
  • Python 3.8+
  • tensorflow==2.13.0
  • tf-agents==0.17.0

Please see the additional requirements in requirements.txt.
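
Assuming requirements.txt sits at the repository root, the Python dependencies can be installed with pip, for example:

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt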

Step-by-step installation

Download the DaCapo benchmark and GCViewer JARs from the command line, or simply click the following links and manually move the downloaded files into the project directory.

cd gc-ml/
wget -O dacapo-bench.jar https://downloads.sourceforge.net/project/dacapobench/9.12-bach-MR1/dacapo-9.12-MR1-bach.jar
wget -O gcviewer-1.36.jar http://sourceforge.net/projects/gcviewer/files/gcviewer-1.36.jar/download

Then, optionally, download an OpenJDK 11 build from the BellSoft website. You can use any Java build you want.

# NOTE: This link is for Linux
wget -O jdk.tar.gz https://download.bell-sw.com/java/11.0.20+8/bellsoft-jdk11.0.20+8-linux-amd64.tar.gz

Unpack jdk.tar.gz and remove the archive.

tar xzf jdk.tar.gz && rm -fv jdk.tar.gz

2. Prepare a dataset

Synthetic Data

If you don't want to run the DaCapo benchmarks during RL training, please use the synthetic data files that imitate the avrora and kafka benchmark results.

Real Data (optional)

NOTE: JVMEnv can be used with both offline synthetic data and online benchmark runs. You do not need to collect real data by yourself to run RL with JVMEnv.

The following steps describe how to run a DaCapo benchmark, collect Garbage Collector logs, process them with GCViewer, and use the resulting data in a Python script.

If you want to run the benchmarks and visualize the data yourself, follow these steps (a small plotting sketch follows step 3).

  1. Get GC logs from the benchmark. Run the DaCapo benchmarks on your machine using the downloaded JDK and dacapo-bench.jar (run_dacapo.sh).
# Training data (avrora benchmark)
bash scripts/run_dacapo.sh jdk-11.0.20.8 dacapo-bench.jar callback gc-logs avrora

# Evaluation data (kafka benchmark)
bash scripts/run_dacapo.sh jdk-11.0.20.8 dacapo-bench.jar callback gc-logs kafka
  2. Get summaries. Preprocess the collected GC logs stored in the gc-logs directory using the downloaded GCViewer JAR (get_summary.sh).
# Training data (avrora benchmark)
bash scripts/get_summary.sh jdk-11.0.20.8 gcviewer-1.36.jar avrora

# Evaluation data (kafka benchmark)
bash scripts/get_summary.sh jdk-11.0.20.8 gcviewer-1.36.jar kafka
  3. Get the goals from the summaries. Parse the files stored in the summaries_avrora and summaries_kafka directories using the get_data_from_csv() method from utils.
# Example from dataset_util.ipynb
target_goal = "avgGCPause"
x, y, z = get_data_from_csv(
    csv_dir="summaries_avrora",
    goal=target_goal,
)
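
For a quick look at the collected data, here is a minimal plotting sketch. It assumes matplotlib is available and that x and y hold the two flag values while z holds the goal metric; the actual return values of get_data_from_csv() may be laid out differently.

# Visualization sketch; the meaning of x, y, z is an assumption.
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z)
ax.set_xlabel("MaxTenuringThreshold")
ax.set_ylabel("ParallelGCThreads")
ax.set_zlabel(target_goal)
plt.show()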

3. Usage

The main code is stored in main_dqn.ipynb.

4. Description

4.1 Environment

  • Java Benchmark (DaCapo benchmark, e.g. Avrora/Kafka) with specific JVM arguments.

  • action_spec: there are two actions (increase and decrease) for each JVM argument, four actions in total.

    Table 1. RL Actions

    Num  Action
    A1   Decrease MaxTenuringThreshold by 3
    A2   Increase MaxTenuringThreshold by 3
    A3   Decrease ParallelGCThreads by 4
    A4   Increase ParallelGCThreads by 4
  • observation_spec: Current JVM argument values.

    Table 2. Configuration Options

    Name                  Type  Range    Constraints
    MaxTenuringThreshold  Int   [1, 16]  -
    ParallelGCThreads     Int   [4, 24]  -
  • reward: A reward is calculated from a performance measurement, for example, the average GC pause. The objective is to find the smallest goal value (average GC pause). We achieve this by using the following reward at the end of a training iteration:

    $$ reward(c, d) = \begin{cases} -1, & \text{if } c > d \\ -(c - d)/c, & \text{otherwise} \end{cases} $$

    where c is the current goal value and d is the default goal value. We first obtain the performance measurement under the current configuration options. The reward is the relative difference between the current goal value and the default one; the normalization puts measurements with a large value range on the same scale. (A Python sketch of this reward follows the list below.)

  • state: A state is encoded as an instance of the subject program's configuration options (JVM arguments). The default JVM arguments are used as the initial state.
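
A minimal Python sketch of this reward, checked against the values in Table 3 (Section 4.4); the function name is illustrative:

def reward(c, d):
    """Reward for the current goal value c vs. the default goal value d (lower is better)."""
    if c > d:
        return -1.0
    return -(c - d) / c

# Checks against Table 3 (default avgGCPause d = 0.00939):
assert reward(0.01032, 0.00939) == -1.0             # S1 -> S2: worse than default
assert round(reward(0.00843, 0.00939), 3) == 0.114  # S2 -> S3
assert round(reward(0.00863, 0.00939), 3) == 0.088  # S3 -> S4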

4.2 Agent

The algorithm used to solve an RL problem is a DQN agent. At the heart of a DQN Agent is a QNetwork, a neural network model that can learn to predict QValues (expected rewards) for all actions, given an observation from the environment.

The network consists of a sequence of Dense layers, where the final layer has one output for each possible action; a construction sketch is shown below.

https://medium.com/@vishnuvijayanpv/deep-reinforcement-learning-value-functions-dqn-actor-critic-method-backpropagation-through-83a277d8c38d
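
A minimal construction sketch with tf-agents. The layer sizes, the learning rate, and the env variable (assumed here to be a TFPyEnvironment wrapping JVMEnv) are illustrative assumptions, not the exact settings used in main_dqn.ipynb.

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import sequential
from tf_agents.utils import common

num_actions = 4  # A1-A4 from Table 1

# Dense layers followed by a final layer with one output per action (the Q-values).
q_net = sequential.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions),
])

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0),
)
agent.initialize()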

4.3 Data Collection & Experience Replay

In order to keep track of the data collected from the environment, we use a replay buffer. It stores experience data when we collect trajectories and is consumed during training. All of the actions and observations that the agent has taken from the beginning (limited by the capacity of the memory, of course) are stored. Then a batch of samples is randomly selected from this memory. This ensures that the batch is ‘shuffled’ and contains enough diversity from older and newer samples to allow the network to learn weights that generalize to all the scenarios that it will be required to handle. Each row of the replay buffer only stores a single observation step. But since the DQN Agent needs both the current and next observation to compute the loss, the dataset pipeline will sample two adjacent rows for each item in the batch.
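
A minimal sketch of this pipeline with tf-agents (buffer capacity and batch size are illustrative; agent and env are assumed from the sketch above):

from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=env.batch_size,
    max_length=10000)

# num_steps=2: each sampled item holds two adjacent rows, i.e. the current and
# next observation that the DQN loss needs.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64,
    num_steps=2,
    num_parallel_calls=3).prefetch(3)
iterator = iter(dataset)

# After trajectories have been collected into the buffer (e.g. by a collect driver):
experience, _ = next(iterator)
train_loss = agent.train(experience).loss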

4.4 Example

The table below shows a small example of the performance-tuning problem with RL on a DaCapo benchmark. Each state is a combination of the current configuration option values. The default JVM argument values are used as the initial state (S1). The "Action" column lists the next action to be performed on the JVM configuration, and the "Reward" column lists the immediate performance reward.

Goal — average GC pause.

Default {MaxTenuringThreshold, ParallelGCThreads} = {16, 16}.

Default {avgGCPause} = 0.00939.

Table 3. RL States

State  MaxTenuringThreshold  ParallelGCThreads  Goal     Action  Reward
S1     16                    16                 0.00939  A1      -1
S2     13                    16                 0.01032  A3      0.114
S3     13                    12                 0.00843  A2      0.088
S4     16                    12                 0.00863  -       -

These rewards are used to compute the Target Q Value.
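
For reference, these rewards enter the standard DQN target

$$ y = r + \gamma \max_{a'} Q_{\text{target}}(s', a') $$

where r is the immediate reward from the table, γ is the discount factor, s' is the next state, and Q_target is the target network's Q-value estimate. This is the generic DQN formulation rather than a quote from the notebook.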

gc-ml's Issues

Add an external information support during the Agent's training

Like many machine learning techniques, RL faces the problem of high-dimensional spaces. As environments become larger, the agent's learning time increases and finding the optimal solution becomes impractical. To successfully scale RL to real-world scenarios, it would be necessary to use information from outside the environment.

Problem
Currently, our RL algorithm relies only on the current state (two flags and performance metric values) and the reward. But what if we fed the agent additional information during training, such as the machine configuration, available RAM, and CPU info?

Source: https://link.springer.com/article/10.1007/s12652-021-03489-y
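
A hedged sketch of what such an extended observation could look like using tf-agents specs; the field names, shapes, and dtypes below are assumptions, not the current JVMEnv layout:

import numpy as np
from tf_agents.specs import array_spec

observation_spec = {
    # Current JVM flag values (ranges from Table 2).
    "jvm_flags": array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.int32, minimum=[1, 4], maximum=[16, 24], name="jvm_flags"),
    # Current performance measurement (e.g. avgGCPause).
    "goal": array_spec.ArraySpec(shape=(1,), dtype=np.float32, name="goal"),
    # External machine information, e.g. available RAM (GB) and CPU core count.
    "machine": array_spec.ArraySpec(shape=(2,), dtype=np.float32, name="machine"),
}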

Questions 21_11

I haven't tried running the code yet. As far as I understand, I don't have all the files needed for that. I had a look at it, and a few questions came up:

  1. Reward normalization
    def _get_reward(self, next_state, current_state):

    It seems more logical to me to normalize by the default value rather than by the current one.
  2. You have 4 actions: decrease/increase one of the 2 values.
    I couldn't figure out how the following code works:
    def _decrease_MaxTenuringThreshold(self):

    If you have checked it, that's fine.
  3. Epsilon = 0.9 seems too large. Or were you deliberately biasing it so that there would be more changes?
  4. lr = 1e-6 seems too small. It may be suitable specifically for these agents, but usually something closer to 0.001 is used.

Add more JVM flags for configuration

The primary objective is to recommend JVM flag values that influence the ParallelGC configuration, based on pertinent metrics extracted from a Garbage Collector logging file.

Our approach involves identifying crucial GC log metrics that provide significant insights into the user's performance tuning objectives. We will focus on tuning JVM flags that can directly impact these identified metrics. Essentially, we aim to analyze the input GC metrics, identify problematic metrics that can be addressed through flag tuning, and propose suitable flag values to align with the user's goals.

Thus, we can frame our task as follows: "Given a set of metrics, adjust the flags to optimize the goal."

1. Metrics

Metrics, extracted from GC logging files, such as those analyzed using GCViewer, provide valuable insights into our application's garbage collection process. They offer a deeper understanding of our application's behavior in relation to garbage collection.

Below are some key metrics that we need to consider:

# Name Description
1 fullGCPause Sum of all pauses due to full collections.
2 avgPause Average length of a GC pause of any kind.
3 fullGcPauseCount Count of all pauses due to full collections.
4 footprint Maximal amount of memory allocated.
5 gcPerformance Performance of minor collections. These are collections that are not full according to the definition above.
6 totalTenuredUsedMax Maximum memory used in the tenured space.
7 avgPromotion Avg promotion shows the average amount of memory that is promoted from young to tenured with each young collection.
8 fullGCPerformance Performance of full collections. Note that all collections that include a collection of the tenured generation or are marked with "Full GC" are considered Full GC.

2. Flags

The JVM offers numerous configuration options, but for our focus on ParallelGC, we'll emphasize essential options.

To focus on ParallelGC, the JVM should be configured with -XX:+UseParallelGC. Here are the other options we'll consider tuning (an example command line follows the table):

# Name Description
1 ParallelGCThreads The number of threads used for parallel garbage collection. This parameter influences the parallelism of the garbage collection process.
2 MaxTenuringThreshold The maximum age threshold for objects in the young generation before they are promoted to the old generation. This parameter influences when objects are promoted to the old generation during garbage collection.
3 Xmx The maximum heap size that the JVM can allocate for the Java application.
4 Xms The initial heap size that the JVM allocates for the Java application.
5 Xmn The initial and maximum size of the heap for the young generation.
6 SurvivorRatio The ratio of eden space to survivor space in the young generation. For example, if SurvivorRatio is 4, each survivor space is one-fourth the size of eden, i.e. eden is two-thirds of the young generation and each survivor space is one-sixth of it.
7 TargetSurvivorRatio The desired survivor space ratio after minor garbage collection. It's used to calculate the desired size of survivor spaces.
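
For illustration, these options could be passed to a DaCapo run roughly as follows; the values are placeholders, not recommendations:

java -XX:+UseParallelGC -XX:ParallelGCThreads=16 -XX:MaxTenuringThreshold=15 \
     -Xms2g -Xmx2g -Xmn512m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=50 \
     -Xlog:gc*:file=gc.log -jar dacapo-bench.jar avrora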

3. Goal

The primary goal is to optimize the user's performance metric, which can include footprint, throughput, totalTime, or any other metric extracted from the GC logging file.

While tuning a single user's objective at a time is the simplest approach, exploring multi-objective tuning is also valuable for future research.

Add multi-objective support

gc-ml/draft.ipynb

Lines 542 to 559 in 4d47000

" if goal_value == None:\n",
" raise Exception(f\"Goal '{self._goal}' was not found in '{summary}'!\")\n",
"\n",
" if os.path.exists(summary):\n",
" os.remove(summary)\n",
"\n",
" return goal_value\n",
"\n",
" def _get_reward(\n",
" self,\n",
" current_state: JVMState,\n",
" previous_state: JVMState,\n",
" lower_is_better: bool = False,\n",
" beta: float = 1.0,\n",
" ):\n",
" \"\"\"\n",
" Get the environment reward. The reward is composed of two terms:\n",
" `reward = reward_ex + beta * reward_in`,\n",

Add support of boolean JVM flags

self._flags_min_values = [each["min"] for each in self._flags]
self._flags_max_values = [each["max"] for each in self._flags]
self._flags_step_values = [each["step"] for each in self._flags]
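
A hedged sketch of how a boolean flag could fit the same min/max/step schema implied by the snippet above (the dictionary layout and the flag choice are assumptions):

# Boolean flags encoded as 0/1 with step 1; integer flags keep their Table 1/2 ranges and steps.
flags = [
    {"name": "MaxTenuringThreshold", "min": 1, "max": 16, "step": 3},
    {"name": "ParallelGCThreads",    "min": 4, "max": 24, "step": 4},
    {"name": "UseParallelGC",        "min": 0, "max": 1,  "step": 1},  # boolean flag as 0/1
]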
