
gc-ml's Introduction

Performance Tuning using Reinforcement Learning

The goal of this research is to develop an approach that can automatically tune JVM settings in response to the environment dynamics to achieve better performance. The key idea of the system is to use reinforcement learning (RL) techniques (Deep Q-Network) to automate performance configuration tuning on a given benchmark.

We can formulate the task of performance tuning as an RL problem in which the optimal policy corresponds to generating a configuration with higher performance. Each configuration (a set of JVM options) represents an RL state. When performance tuning occurs (i.e., an action is issued to change the configuration), the action receives a reward based on performance measurements. Given sufficient interactions with the environment, the RL agent learns an estimate of how good each action is for the current configuration (i.e., state).
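
As a concrete illustration of this mapping (the names and the helper below are placeholders, not identifiers from this repository):

# A JVM configuration is an RL state; an action changes one option by a fixed step.
state = {"MaxTenuringThreshold": 16, "ParallelGCThreads": 16}

def apply_action(state, flag, delta):
    """Return the next configuration after changing one JVM option."""
    next_state = dict(state)
    next_state[flag] += delta
    return next_state

next_state = apply_action(state, "MaxTenuringThreshold", -3)
# The reward is then computed from a performance measurement (e.g. average GC pause)
# obtained by running the benchmark with `next_state` as JVM options.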

1. Installation

Requirements

  • OpenJDK 11
  • DaCapo benchmark
  • gcviewer
  • Python 3.8+
  • tensorflow==2.13.0
  • tf-agents==0.17.0

Please see the additional requirements in requirements.txt.
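
Assuming requirements.txt sits at the repository root, the Python dependencies can be installed with pip, for example:

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt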

Step-by-step installation

Download the DaCapo benchmark and GCViewer JARs from the command line, or simply click the following links and manually move the downloaded files into the project directory.

cd gc-ml/
wget -O dacapo-bench.jar https://downloads.sourceforge.net/project/dacapobench/9.12-bach-MR1/dacapo-9.12-MR1-bach.jar
wget -O gcviewer-1.36.jar http://sourceforge.net/projects/gcviewer/files/gcviewer-1.36.jar/download

Then, optionally, download an OpenJDK 11 build from the BellSoft website. You can use any Java build you want.

# NOTE: This link is for Linux
wget -O jdk.tar.gz https://download.bell-sw.com/java/11.0.20+8/bellsoft-jdk11.0.20+8-linux-amd64.tar.gz

Unpack jdk.tar.gz and remove the archive.

tar xzf jdk.tar.gz && rm -fv jdk.tar.gz

2. Prepare a dataset

Synthetic Data

If you don't want to run the DaCapo benchmarks during RL training, please use the synthetic data files that imitate the avrora and kafka benchmark results.

Real Data (optional)

NOTE: JVMEnv can be used with both offline synthetic data and online benchmark runs. You do not need to collect real data by yourself to run RL with JVMEnv.

The following steps describe how to run a DaCapo benchmark, collect Garbage Collector logs, process them with GCViewer, and use the resulting data in a Python script.

If you want to run the benchmarks and visualize the data yourself, follow these steps (a small plotting sketch follows step 3).

  1. Get GC logs from the benchmark. Run the DaCapo benchmarks on your machine using the downloaded JDK and dacapo-bench.jar (run_dacapo.sh).
# Training data (avrora benchmark)
bash scripts/run_dacapo.sh jdk-11.0.20.8 dacapo-bench.jar callback gc-logs avrora

# Evaluation data (kafka benchmark)
bash scripts/run_dacapo.sh jdk-11.0.20.8 dacapo-bench.jar callback gc-logs kafka
  2. Get summaries. Preprocess the collected GC logs stored in the gc-logs directory using the downloaded GCViewer JAR (get_summary.sh).
# Training data (avrora benchmark)
bash scripts/get_summary.sh jdk-11.0.20.8 gcviewer-1.36.jar avrora

# Evaluation data (kafka benchmark)
bash scripts/get_summary.sh jdk-11.0.20.8 gcviewer-1.36.jar kafka
  3. Get the goals from the summaries. Parse the files stored in the summaries_avrora and summaries_kafka directories using the get_data_from_csv() method from utils.
# Example from dataset_util.ipynb
target_goal = "avgGCPause"
x, y, z = get_data_from_csv(
    csv_dir="summaries_avrora",
    goal=target_goal,
)
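
For a quick look at the collected data, here is a minimal plotting sketch. It assumes matplotlib is available and that x and y hold the two flag values while z holds the goal metric; the actual return values of get_data_from_csv() may be laid out differently.

# Visualization sketch; the meaning of x, y, z is an assumption.
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z)
ax.set_xlabel("MaxTenuringThreshold")
ax.set_ylabel("ParallelGCThreads")
ax.set_zlabel(target_goal)
plt.show()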

3. Usage

The main code is stored in main_dqn.ipynb.

4. Description

4.1 Environment

  • Java Benchmark (DaCapo benchmark, e.g. Avrora/Kafka) with specific JVM arguments.

  • action_spec: there are two actions (increase and decrease) for each JVM argument, four actions in total.

    Table 1. RL Actions

    Num  Action
    A1   Decrease MaxTenuringThreshold by 3
    A2   Increase MaxTenuringThreshold by 3
    A3   Decrease ParallelGCThreads by 4
    A4   Increase ParallelGCThreads by 4
  • observation_spec: Current JVM argument values.

    Table 2. Configuration Options

    Name                  Type  Range    Constraints
    MaxTenuringThreshold  Int   [1, 16]  -
    ParallelGCThreads     Int   [4, 24]  -
  • reward: A reward is calculated from a performance measurement, for example, the average GC pause. The objective is to find the smallest goal value (average GC pause). We achieve this by using the following reward at the end of a training iteration:

    $$ reward(c, d) = \begin{cases} -1, & \text{if } c > d \\ -(c - d)/c, & \text{otherwise} \end{cases} $$

    where c is the current goal value and d is the default goal value. We first obtain the performance measurement under the current configuration options. The reward is the relative difference between the current goal value and the default one; the normalization puts measurements with a large value range on the same scale. (A Python sketch of this reward follows the list below.)

  • state: A state is encoded as an instance of the subject program's configuration options (JVM arguments). The default JVM arguments are used as the initial state.
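
A minimal Python sketch of this reward, checked against the values in Table 3 (Section 4.4); the function name is illustrative:

def reward(c, d):
    """Reward for the current goal value c vs. the default goal value d (lower is better)."""
    if c > d:
        return -1.0
    return -(c - d) / c

# Checks against Table 3 (default avgGCPause d = 0.00939):
assert reward(0.01032, 0.00939) == -1.0             # S1 -> S2: worse than default
assert round(reward(0.00843, 0.00939), 3) == 0.114  # S2 -> S3
assert round(reward(0.00863, 0.00939), 3) == 0.088  # S3 -> S4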

4.2 Agent

The algorithm used to solve an RL problem is a DQN agent. At the heart of a DQN Agent is a QNetwork, a neural network model that can learn to predict QValues (expected rewards) for all actions, given an observation from the environment.

The network consists of a sequence of Dense layers, where the final layer has one output for each possible action; a construction sketch is shown below.

https://medium.com/@vishnuvijayanpv/deep-reinforcement-learning-value-functions-dqn-actor-critic-method-backpropagation-through-83a277d8c38d
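
A minimal construction sketch with tf-agents. The layer sizes, the learning rate, and the env variable (assumed here to be a TFPyEnvironment wrapping JVMEnv) are illustrative assumptions, not the exact settings used in main_dqn.ipynb.

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import sequential
from tf_agents.utils import common

num_actions = 4  # A1-A4 from Table 1

# Dense layers followed by a final layer with one output per action (the Q-values).
q_net = sequential.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions),
])

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0),
)
agent.initialize()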

4.3 Data Collection & Experience Replay

In order to keep track of the data collected from the environment, we use a replay buffer. It stores experience data when we collect trajectories and is consumed during training. All of the actions and observations that the agent has taken from the beginning (limited by the capacity of the memory, of course) are stored. Then a batch of samples is randomly selected from this memory. This ensures that the batch is ‘shuffled’ and contains enough diversity from older and newer samples to allow the network to learn weights that generalize to all the scenarios that it will be required to handle. Each row of the replay buffer only stores a single observation step. But since the DQN Agent needs both the current and next observation to compute the loss, the dataset pipeline will sample two adjacent rows for each item in the batch.
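
A minimal sketch of this pipeline with tf-agents (buffer capacity and batch size are illustrative; agent and env are assumed from the sketch above):

from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=env.batch_size,
    max_length=10000)

# num_steps=2: each sampled item holds two adjacent rows, i.e. the current and
# next observation that the DQN loss needs.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64,
    num_steps=2,
    num_parallel_calls=3).prefetch(3)
iterator = iter(dataset)

# After trajectories have been collected into the buffer (e.g. by a collect driver):
experience, _ = next(iterator)
train_loss = agent.train(experience).loss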

4.4 Example

The table below shows a small example of the performance-tuning problem with RL on a DaCapo benchmark. Each state is a combination of the current configuration option values. The default JVM argument values are used as the initial state (S1). The "Action" column lists the next action to be performed on the JVM configuration, and the "Reward" column lists the immediate performance reward.

Goal — average GC pause.

Default {MaxTenuringThreshold, ParallelGCThreads} = {16, 16}.

Default {avgGCPause} = 0.00939.

Table 3. RL States

State  MaxTenuringThreshold  ParallelGCThreads  Goal     Action  Reward
S1     16                    16                 0.00939  A1      -1
S2     13                    16                 0.01032  A3      0.114
S3     13                    12                 0.00843  A2      0.088
S4     16                    12                 0.00863  -       -

These rewards are used to compute the Target Q Value.
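
For reference, these rewards enter the standard DQN target

$$ y = r + \gamma \max_{a'} Q_{\text{target}}(s', a') $$

where r is the immediate reward from the table, γ is the discount factor, s' is the next state, and Q_target is the target network's Q-value estimate. This is the generic DQN formulation rather than a quote from the notebook.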

gc-ml's Issues

Add an external information support during the Agent's training

Like many machine learning techniques, RL faces the problem of high-dimensional spaces. As environments become larger, the agent's learning time increases and finding the optimal solution becomes impractical. To successfully scale RL to real-world scenarios, it would be necessary to use information from outside the environment.

Problem
Currently, our RL algorithm relies only on the current state (two flags and performance metric values) and the reward. But what if we fed the agent additional information during training, such as the machine configuration, available RAM, and CPU info?

Source: https://link.springer.com/article/10.1007/s12652-021-03489-y
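
A hedged sketch of what such an extended observation could look like using tf-agents specs; the field names, shapes, and dtypes below are assumptions, not the current JVMEnv layout:

import numpy as np
from tf_agents.specs import array_spec

observation_spec = {
    # Current JVM flag values (ranges from Table 2).
    "jvm_flags": array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.int32, minimum=[1, 4], maximum=[16, 24], name="jvm_flags"),
    # Current performance measurement (e.g. avgGCPause).
    "goal": array_spec.ArraySpec(shape=(1,), dtype=np.float32, name="goal"),
    # External machine information, e.g. available RAM (GB) and CPU core count.
    "machine": array_spec.ArraySpec(shape=(2,), dtype=np.float32, name="machine"),
}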

Questions 21_11

I haven't tried running the code yet. As far as I understand, I don't have all the files needed for that. I had a look at it, and a few questions came up:

  1. Reward normalization
    def _get_reward(self, next_state, current_state):

    It seems more logical to me to normalize by the default value rather than by the current one.
  2. You have 4 actions: decrease/increase one of the 2 values.
    I couldn't figure out how the following code works:
    def _decrease_MaxTenuringThreshold(self):

    If you have checked it, that's fine.
  3. Epsilon = 0.9 seems too large. Or were you deliberately biasing it so that there would be more changes?
  4. lr = 1e-6 seems too small. It may be suitable specifically for these agents, but usually something closer to 0.001 is used.

Add more JVM flags for configuration

The primary objective is to recommend JVM flag values that influence the ParallelGC configuration, based on pertinent metrics extracted from a Garbage Collector logging file.

Our approach involves identifying crucial GC log metrics that provide significant insights into the user's performance tuning objectives. We will focus on tuning JVM flags that can directly impact these identified metrics. Essentially, we aim to analyze the input GC metrics, identify problematic metrics that can be addressed through flag tuning, and propose suitable flag values to align with the user's goals.

Thus, we can frame our task as follows: "Given a set of metrics, adjust the flags to optimize the goal."

1. Metrics

Metrics, extracted from GC logging files, such as those analyzed using GCViewer, provide valuable insights into our application's garbage collection process. They offer a deeper understanding of our application's behavior in relation to garbage collection.

Below are some key metrics that we need to consider:

# Name Description
1 fullGCPause Sum of all pauses due to full collections.
2 avgPause Average length of a GC pause of any kind.
3 fullGcPauseCount Count of all pauses due to full collections.
4 footprint Maximal amount of memory allocated.
5 gcPerformance Performance of minor collections. These are collections that are not full according to the definition above.
6 totalTenuredUsedMax Maximum memory used in the tenured space.
7 avgPromotion Avg promotion shows the average amount of memory that is promoted from young to tenured with each young collection.
8 fullGCPerformance Performance of full collections. Note that all collections that include a collection of the tenured generation or are marked with "Full GC" are considered Full GC.

2. Flags

The JVM offers numerous configuration options, but for our focus on ParallelGC, we'll emphasize essential options.

To focus on ParallelGC, the JVM should be configured with -XX:+UseParallelGC. Here are the other options we'll consider tuning (an example command line follows the table):

# Name Description
1 ParallelGCThreads The number of threads used for parallel garbage collection. This parameter influences the parallelism of the garbage collection process.
2 MaxTenuringThreshold The maximum age threshold for objects in the young generation before they are promoted to the old generation. This parameter influences when objects are promoted to the old generation during garbage collection.
3 Xmx The maximum heap size that the JVM can allocate for the Java application.
4 Xms The initial heap size that the JVM allocates for the Java application.
5 Xmn The initial and maximum size of the heap for the young generation.
6 SurvivorRatio The ratio of eden space to survivor space in the young generation. For example, if SurvivorRatio is 4, each survivor space is one-fourth the size of eden, i.e. eden is two-thirds of the young generation and each survivor space is one-sixth of it.
7 TargetSurvivorRatio The desired survivor space ratio after minor garbage collection. It's used to calculate the desired size of survivor spaces.
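
For illustration, these options could be passed to a DaCapo run roughly as follows; the values are placeholders, not recommendations:

java -XX:+UseParallelGC -XX:ParallelGCThreads=16 -XX:MaxTenuringThreshold=15 \
     -Xms2g -Xmx2g -Xmn512m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=50 \
     -Xlog:gc*:file=gc.log -jar dacapo-bench.jar avrora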

3. Goal

The primary goal is to optimize the user's performance metric, which can include footprint, throughput, totalTime, or any other metric extracted from the GC logging file.

While tuning a single user's objective at a time is the simplest approach, exploring multi-objective tuning is also valuable for future research.

Add multi-objective support

gc-ml/draft.ipynb

Lines 542 to 559 in 4d47000

" if goal_value == None:\n",
" raise Exception(f\"Goal '{self._goal}' was not found in '{summary}'!\")\n",
"\n",
" if os.path.exists(summary):\n",
" os.remove(summary)\n",
"\n",
" return goal_value\n",
"\n",
" def _get_reward(\n",
" self,\n",
" current_state: JVMState,\n",
" previous_state: JVMState,\n",
" lower_is_better: bool = False,\n",
" beta: float = 1.0,\n",
" ):\n",
" \"\"\"\n",
" Get the environment reward. The reward is composed of two terms:\n",
" `reward = reward_ex + beta * reward_in`,\n",

Add support of boolean JVM flags

self._flags_min_values = [each["min"] for each in self._flags]
self._flags_max_values = [each["max"] for each in self._flags]
self._flags_step_values = [each["step"] for each in self._flags]
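
A hedged sketch of how a boolean flag could fit the same min/max/step schema implied by the snippet above (the dictionary layout and the flag choice are assumptions):

# Boolean flags encoded as 0/1 with step 1; integer flags keep their Table 1/2 ranges and steps.
flags = [
    {"name": "MaxTenuringThreshold", "min": 1, "max": 16, "step": 3},
    {"name": "ParallelGCThreads",    "min": 4, "max": 24, "step": 4},
    {"name": "UseParallelGC",        "min": 0, "max": 1,  "step": 1},  # boolean flag as 0/1
]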
