eclipse-openj9 / openj9-utils

License: Other

Shell 0.18% Java 19.97% Makefile 0.24% CMake 0.12% C++ 79.25% Smarty 0.24%

openj9-utils's Issues

Link information from JLM and MonitorContended events

JLM (Java Lock Monitor) gives us a summary of activity on monitors, whereas MonitorContended events are triggered for each monitor individually and can also be used to compute the waiting time for a particular monitor operation (something that JLM does not do).
Ideally, we would first use the information from JLM to determine which monitors are expensive, and then use the MonitorContended events to drill deeper and find stack traces for threads waiting on expensive monitors and for threads holding them. To do that, we need some monitor information that is common to the two sources. There are two possibilities:

Use the raw address of the OpenJ9 monitors. This is already printed by JLM, and we need a way to find this address from the MonitorContended events.
Use the hash value of the monitors (actually, of the object we are synchronizing on). This is available in the current version of the code for the MonitorContended events, but is not available in JLM. We would need to modify the OpenJ9 repo to make this change.

perf-tools: assess performance overhead

The performance overhead of this tool should be measured and documented:

under the default configuration
how it varies with the sampling rate
how it varies with the set of data points collected

perf:tool Integrate JLM (Java Lock Monitor) with the perf agent

While there is a JLM tool as part of the tprof collection of perf tools, it would be nice to have that functionality integrated with our own perf agent.

perf-tool: need a unique key name in the `body` of events

Right now, similar events cannot be aggregated because there is no key that binds them together. The only way to match them is to iterate over the events and parse each one to compare thread stacks.

Because each monitorEvent concerns a lock, does it make sense to use the address of the monitor itself as the key? As per @mpirvu, the Java object address is subject to change across GC cycles and is not trustworthy.

This needs some discussion.

perf-tool: send event types to network clients

per #37 (review)

Basically, the additional field that is introduced in #37 needs to be sent to the networking clients as well.

Whoever picks this up may do so after #37 lands.

perf-tools: add a verbose option

Add a verbose option to the tool, guard all the prints under this option, and add more prints at vital control points.

No jitserver binary in Semeru images?

I'm trying to find jitserver binary in the ibm-semeru-runtimes:open-8u332-b09-jre image, as referenced in values.yaml and deployment.yaml, but there appears to be none:

podman run --rm -it ibm-semeru-runtimes:open-8u332-b09-jre bash -c jitserver
bash: jitserver: command not found

Am I using the wrong image?

perf-tool: Consider setting capabilities only if needed

I see that capabilities like can_generate_method_entry_events (and can_generate_method_exit_events in #52) are set in Agent_OnLoad. It might be worth setting these only if they are explicitly requested in the agent options, so that when these capabilities aren't used the JVM won't have to make unnecessary compromises (for example, the optimizations the JIT has to forgo because method enter/exit hooks could be triggered).
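
A minimal sketch of the idea, not the agent's actual code: decide which capabilities to request by parsing the agent options string first. The struct `RequestedCapabilities`, the function `parseRequestedCapabilities`, and the option names `methodEntry` and `methodExit` are all hypothetical names chosen for illustration; a plain struct stands in for the JVMTI `jvmtiCapabilities` bitfields so the example compiles without the JDK headers.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the two jvmtiCapabilities bits discussed above.
struct RequestedCapabilities {
    bool methodEntryEvents = false;  // would map to can_generate_method_entry_events
    bool methodExitEvents = false;   // would map to can_generate_method_exit_events
};

// Parse a comma-separated agent options string and record which
// capabilities were explicitly asked for. Unknown tokens are ignored.
// The option names are assumptions made for this sketch.
RequestedCapabilities parseRequestedCapabilities(const std::string& options) {
    RequestedCapabilities caps;
    std::istringstream stream(options);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (token == "methodEntry") {
            caps.methodEntryEvents = true;
        } else if (token == "methodExit") {
            caps.methodExitEvents = true;
        }
    }
    return caps;
}
```

In Agent_OnLoad, the agent could then copy only the flags that are true into a jvmtiCapabilities structure before calling AddCapabilities, leaving the rest cleared.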

perf-tool: Add verbose to README.md

Please update https://github.com/eclipse/openj9-utils/blob/master/perf-tool/README.md on how to use verbose.

perf-tool: list of all waiters in the monitorEvents

It would be good to capture info on all the waiters in the monitorEvents output, if need be. I suggest implementing this behind a flag, because otherwise it can:

clutter the output
add performance overhead

The info_ptr field in the call to

GetObjectMonitorUsage(jvmtiEnv* env,
            jobject object,
            jvmtiMonitorUsage* info_ptr)

is a structure like this:

typedef struct {
    jthread owner;
    jint entry_count;
    jint waiter_count;
    jthread* waiters;
    jint notify_waiter_count;
    jthread* notify_waiters;
} jvmtiMonitorUsage;

and the fields waiter_count and waiters have the needed info, IIUC.
/cc @mpirvu - am I right?

perf-tools: investigate intermittent crash

@mpirvu reports that the tool crashes occasionally (upon pressing Ctrl+C on a process running with the tool?).

Reproduce it, investigate, and fix the root cause.

perf-tools: Group the information of owner thread info under a heading

For better readability, group the owner thread info under a meaningful heading.

Currently:

"threadID": 34,
"threadName": "Thread-13",
"threadNativeID": 329430

Target:

"OwnerThread": [
      {
        "threadID": 34,
        "threadName": "Thread-13",
        "threadNativeID": 329430
      }
    ],

Discussion on helm-based JITServer operator

As a continuation of the JITServer on-cloud deployment discussion, we would like to implement an OpenJ9 JITServer operator to support OpenShift users.

The initial idea is to take advantage of the existing JITServer helm chart and implement a helm-based operator, then publish it as a community operator (without Red Hat certification).

This issue will be populated as more details become available. A few items to confirm:

Explore the possibility of hosting the operator in a GitHub repo; otherwise we might have to host it on quay.io.
Find out and document the steps required before users can install this operator (operator-source.yaml).
Maintenance for version updates or functionality updates.

In the future, we might want to extend the capabilities of the operator and implement a go-based operator, but this is not our focus right now.

FYI: @mpirvu @keithc-ca @EmanElsaban

perf-tool: compute lock latency

Basically, compute:

how long the lock is held
what the reason for the delay is:
- heavy contention, or
- latency of the code executed under the lock

One approach would be to split the monitor event into monitor enter and monitor exit events, and measure the time between events with matching monitors. Compare it with the contention data to make inferences.

(There could be other means to do this.)
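
The enter/exit matching approach could look roughly like the sketch below. It assumes each event carries a stable monitor key and a nanosecond timestamp, and that the enter event marks the moment the lock is acquired; the types `MonitorEvent` and `MonitorEventKind` and the function `computeHoldTimes` are hypothetical names, not part of the tool. Reentrant acquisition is not handled.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class MonitorEventKind { Enter, Exit };

// Hypothetical event record; field names are assumptions for this sketch.
struct MonitorEvent {
    MonitorEventKind kind;
    std::uint64_t monitorKey;   // stable identifier of the monitor
    std::uint64_t timestampNs;  // event time in nanoseconds
};

// Pair each Exit with the most recent unmatched Enter on the same monitor
// and accumulate the total time each monitor was held.
// Events are expected in timestamp order; unmatched Exits are skipped.
std::unordered_map<std::uint64_t, std::uint64_t>
computeHoldTimes(const std::vector<MonitorEvent>& events) {
    std::unordered_map<std::uint64_t, std::uint64_t> pendingEnter;
    std::unordered_map<std::uint64_t, std::uint64_t> totalHeldNs;
    for (const MonitorEvent& event : events) {
        if (event.kind == MonitorEventKind::Enter) {
            pendingEnter[event.monitorKey] = event.timestampNs;
        } else {
            auto it = pendingEnter.find(event.monitorKey);
            if (it == pendingEnter.end()) {
                continue;  // Exit without a recorded Enter
            }
            totalHeldNs[event.monitorKey] += event.timestampNs - it->second;
            pendingEnter.erase(it);
        }
    }
    return totalHeldNs;
}
```

A monitor whose total hold time is small but whose contention wait time is large points toward heavy contention, while a large hold time points toward slow code under the lock.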

Save info received by network clients to log

When the perf agent sends tracing information to the networking clients, that information is displayed on the screen. With so much text scrolling on the screen, it's difficult to type any new command. It's better to store this information in a log file at the client rather than displaying it on screen.

perf-tool: missing sections in verbose:gc

Example: two consecutive callbacks printed data like the following:

{
  "body": "<exclusive-end id=\"38\" timestamp=\"2021-03-01T22:53:34.040\" durationms=\"6.638\" />\n\n",
  "eventType": "verboseGCEvent",
  "from": "Server",
  "timestamp": 1614668014040320324
},

and

{
  "body": "<sys-start reason=\"explicit\" id=\"40\" timestamp=\"2021-03-01T22:53:34.040\" intervalms=\"6.750\" />\n",
  "eventType": "verboseGCEvent",
  "from": "Server",
  "timestamp": 1614668014040566262
},

The section for exclusive-start is missing. IMO this is not accidental. Given that:

a single GC event triggers multiple callbacks (one for anything starting with an XML tag)
we collect only every nth sample of events here: https://github.com/eclipse/openj9-utils/blob/44ace98053f6516801a57cdd8aecd792064bb4ec/perf-tool/src/verboseLog.cpp#L112

we are not actually skipping coherent blocks of verbose:gc sections, but arbitrary XML blocks.

A solution would be to identify a reasonable eye-catcher (such as exclusive-start and exclusive-end) in the log data, and use it to turn the processing on and off, along with the sampling calculation.
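
One way to sketch the eye-catcher idea, under the assumption that every coherent section runs from an exclusive-start fragment to the next exclusive-end fragment: make the sampling decision once per section, then apply it to every fragment in that section. The class `GcBlockSampler` is a hypothetical name and is not taken from verboseLog.cpp; fragments that arrive outside any section are dropped in this sketch.

```cpp
#include <cstddef>
#include <string>

// Samples whole verbose:gc sections instead of individual XML fragments.
class GcBlockSampler {
public:
    // Keep one section out of every `sampleRate`; a rate of 0 is treated as 1.
    explicit GcBlockSampler(std::size_t sampleRate)
        : sampleRate_(sampleRate == 0 ? 1 : sampleRate) {}

    // Returns true if the fragment belongs to a section that is being kept.
    bool accept(const std::string& fragment) {
        if (fragment.find("<exclusive-start") != std::string::npos) {
            // New section: decide once whether to keep all of it.
            keepCurrent_ = (blockIndex_ % sampleRate_ == 0);
            ++blockIndex_;
            insideBlock_ = true;
        }
        const bool emit = insideBlock_ && keepCurrent_;
        if (fragment.find("<exclusive-end") != std::string::npos) {
            insideBlock_ = false;  // the closing fragment is still part of the section
        }
        return emit;
    }

private:
    std::size_t sampleRate_;
    std::size_t blockIndex_ = 0;
    bool insideBlock_ = false;
    bool keepCurrent_ = false;
};
```

With this shape, a kept section always includes its exclusive-start and exclusive-end fragments, so the output no longer contains half-open sections.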

Security Best Practices

Hi,

I am a member of the Security Team at the Eclipse Foundation. We used the Scorecard and StepSecurity tools to analyze this repo in order to open a pull request that covers some or all of the best practices below:

Apply least privilege principle to GITHUB_TOKEN
Add or fine tune the use of Dependabot
Pin actions to a full length commit SHA

As a result, you will see a PR coming from StepSecurity to help implement the fixes above. It will cover the following points that were detected:

Add or fine tune the use of Dependabot
Pin Actions to a full length commit SHA for files .github/workflows/cpp.yml

Please don't hesitate to reach out if anything above is unclear.

Kind Regards,
Francisco Perez

Readme for JITServer Helm Chart refers to AdoptOpenJDK

The Readme.md file for the JITServer Helm Chart makes many references to AdoptOpenJDK which is obsolete.
We should change that to refer to Semeru builds and containers.

perf-tool: platform support

Right now it is developed and tested on Linux. It would be great to have it ported to the other platforms that Liberty supports (Windows, AIX, Mac, IBM i, and z/OS too). (Originally raised by Felix, WAS dev.)

perf-tool: Update documentation

Development in the perf-tool area has added new fields to the commands and to the output generated by the agent.
Update documentation to reflect these changes.

perf-tool: add event type

When multiple events are enabled, the output JSON offers no easy way to tell which event type a given JSON object was written for.

Helm Chart: Version upgrade for new OpenJ9 releases

The JITServer helm chart version needs to follow every OpenJ9 release to include the latest Adopt release images. This issue keeps track of changes that need to be applied to the helm chart.

There are three files that require updates for every version upgrade.

values.yaml
Chart.yaml
index.yaml

perf-tool: process crashes if starts with non-existent command file

ERROR opening commands file: No such file or directory
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
  what():  [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/02/23 05:40:05 - please wait.

perf-tool: data aggregation

Is it possible / meaningful to produce an aggregate summary view of the monitor data from the JSON output? If so, what aggregations make sense? When should it be performed? How should it be represented?

This needs some discussion.
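
As one candidate to seed the discussion, and only a sketch: group contention samples by a monitor key and report the event count, total wait, and maximum wait per monitor. The types `ContentionSample` and `MonitorSummary`, the function `summarize`, and the presence of a usable key and wait time in each event are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened view of one monitor contention event.
struct ContentionSample {
    std::string monitorKey;  // assumed stable identifier of the monitor
    std::uint64_t waitNs;    // assumed time spent waiting, in nanoseconds
};

// Per-monitor aggregate.
struct MonitorSummary {
    std::uint64_t eventCount = 0;
    std::uint64_t totalWaitNs = 0;
    std::uint64_t maxWaitNs = 0;
};

// Fold individual samples into one summary row per monitor key.
// std::map keeps the rows ordered by key for stable output.
std::map<std::string, MonitorSummary>
summarize(const std::vector<ContentionSample>& samples) {
    std::map<std::string, MonitorSummary> summary;
    for (const ContentionSample& sample : samples) {
        MonitorSummary& entry = summary[sample.monitorKey];
        ++entry.eventCount;
        entry.totalWaitNs += sample.waitNs;
        if (sample.waitNs > entry.maxWaitNs) {
            entry.maxWaitNs = sample.waitNs;
        }
    }
    return summary;
}
```

Such a fold could run either offline over logs.json or incrementally in the agent; which of the two is preferable is part of the open question.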

Catch json parse exceptions and fail gracefully

Currently, if there is a syntax error in the command file,
the json library will throw an exception and the JVM will
generate a core dump.

terminate called after throwing an instance of 'nlohmann::detail::parse_error'
  what():  [json.exception.parse_error.101] parse error at line 7, column 3: syntax error while parsing object key - unexpected '}'; expected string literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/03/22 18:37:19 - please wait.
JVMDUMP032I JVM requested System dump using '/home/mpirvu/CANOSP/Test/core.20210322.183719.14378.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.14401.

It may be nicer to catch such exceptions,
print a message and exit gracefully.
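
A minimal sketch of what catching and exiting gracefully could look like. To stay self-contained it does not include the JSON library: the `parse` callback is a hypothetical stand-in for the real parser call, and the function name `loadCommands` is also an assumption. The catch clause relies on the fact that nlohmann::json exceptions derive from std::exception. The same function also covers the case of a missing commands file.

```cpp
#include <exception>
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Read the commands file and hand its text to `parse`.
// Returns true on success; on any failure it prints a message and
// returns false instead of letting an exception terminate the JVM.
bool loadCommands(const std::string& path,
                  const std::function<void(const std::string&)>& parse) {
    std::ifstream file(path);
    if (!file) {
        std::cerr << "ERROR opening commands file: " << path << '\n';
        return false;  // do not try to parse a file that could not be opened
    }
    std::ostringstream contents;
    contents << file.rdbuf();
    try {
        parse(contents.str());
    } catch (const std::exception& e) {
        // Parser errors end up here as long as they derive from std::exception.
        std::cerr << "ERROR parsing commands file " << path << ": "
                  << e.what() << '\n';
        return false;
    }
    return true;
}
```

The caller can then decide how to proceed when false is returned, for example by running the agent without any initial commands.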

perf-tool: callback on events

Ability to send result to a callback method rather than writing into logs.json (originally raised by Felix, WAS dev)

perf-tool: more granular timestamp

{
  "body": "Server started",
  "from": "Server",
  "timestamp": 1611374479
}

Right now, the timestamp of an event is in seconds since the epoch. It could be in milliseconds by default and made configurable (seconds or microseconds, if need be).
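
A configurable unit could be sketched with std::chrono as below. The enum `TimestampUnit` and the function `toEpochCount` are hypothetical names; the sketch also assumes that std::chrono::system_clock counts from the Unix epoch, which is guaranteed from C++20 onward and holds in practice on common platforms before that.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical configuration value selecting the timestamp granularity.
enum class TimestampUnit { Seconds, Milliseconds, Microseconds };

// Convert a system_clock time point to a count since the clock's epoch
// in the requested unit. Milliseconds is the default branch.
std::int64_t toEpochCount(std::chrono::system_clock::time_point timePoint,
                          TimestampUnit unit) {
    const auto sinceEpoch = timePoint.time_since_epoch();
    switch (unit) {
        case TimestampUnit::Seconds:
            return std::chrono::duration_cast<std::chrono::seconds>(sinceEpoch).count();
        case TimestampUnit::Microseconds:
            return std::chrono::duration_cast<std::chrono::microseconds>(sinceEpoch).count();
        case TimestampUnit::Milliseconds:
        default:
            return std::chrono::duration_cast<std::chrono::milliseconds>(sinceEpoch).count();
    }
}
```

Calling toEpochCount(std::chrono::system_clock::now(), TimestampUnit::Milliseconds) would give the value for the "timestamp" field at the proposed default granularity.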

Eliminate JITServer Helm Chart warning about beta.kubernetes.io/arch

When deploying JITServer with the helm chart I see the following warning

W0513 13:23:01.453840   84779 warnings.go:70] spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key: beta.kubernetes.io/arch is deprecated since v1.14; use "kubernetes.io/arch" instead

We should make the suggested change to eliminate the warning.

perf-tool: owning thread info in the JSON body

It would be nice to have the owning thread ID (both Java and native) in the body section.

As per @mpirvu , the info_ptr field in the call to

GetObjectMonitorUsage(jvmtiEnv* env,
            jobject object,
            jvmtiMonitorUsage* info_ptr)

is a structure like this:

typedef struct {
    jthread owner;
    jint entry_count;
    jint waiter_count;
    jthread* waiters;
    jint notify_waiter_count;
    jthread* notify_waiters;
} jvmtiMonitorUsage;

which has the owner field.
