eclipse-openj9 / openj9-utils
License: Other
JLM (Java Lock Monitor) gives us a summary of activity on monitors, whereas MonitorContended events are triggered for every monitor individually and can also be used to compute the waiting time for a particular monitor operation (something that JLM does not do).
Ideally, we would first use the information from JLM to determine which monitors are expensive, and then use the MonitorContended events to drill deeper and find stack traces for threads waiting on expensive monitors and for threads holding onto expensive monitors. To do that, we need some common monitor information between the two sources. There are two possibilities:
The performance overhead of this tool should be measured and documented.
While there is a JLM tool as part of the tprof collection of perf tools, it would be nice to have that functionality integrated with our own perf agent.
Right now, similar events cannot be aggregated, as there is no key that binds them together. The only way to find matching events is to iterate over them and parse each one to match the thread stack.
Because each monitorEvent is an event on a lock, does it make sense to use the address of the monitor itself as the key? As per @mpirvu, the Java object address is subject to change across GC cycles and is not trustworthy.
This is open for discussion.
per #37 (review)
Basically, the additional field that is introduced in #37 needs to be sent to the networking clients as well.
Those who pick this up, may do so after #37 lands.
Add a verbose option to the tool, guard all the prints under this option, and add more prints at vital control points.
I'm trying to find the jitserver binary in the ibm-semeru-runtimes:open-8u332-b09-jre image, as referenced in values.yaml and deployment.yaml, but there appears to be none:
podman run --rm -it ibm-semeru-runtimes:open-8u332-b09-jre bash -c jitserver
bash: jitserver: command not found
Am I using a wrong image?
I see that capabilities like can_generate_method_entry_events (and can_generate_method_exit_events in #52) are set in Agent_OnLoad. It might be worth setting these only if they are explicitly requested in the agent options, so that if these capabilities aren't used, the JVM won't have to make unnecessary compromises (for example, the optimizations the JIT has to forgo because method enter/exit hooks could be triggered).
Please update https://github.com/eclipse/openj9-utils/blob/master/perf-tool/README.md on how to use verbose.
It would be good to capture all the waiters info in the monitorEvents output, if need be. I suggest this be implemented under a flag, or else can:
The info_ptr field in the call to
GetObjectMonitorUsage(jvmtiEnv* env,
jobject object,
jvmtiMonitorUsage* info_ptr)
is a structure like this:
typedef struct {
jthread owner;
jint entry_count;
jint waiter_count;
jthread* waiters;
jint notify_waiter_count;
jthread* notify_waiters;
} jvmtiMonitorUsage;
and the fields waiter_count and waiters have the needed info, IIUC.
/cc @mpirvu - am I right?
@mpirvu reports that the tool faces occasional crashes (upon pressing Ctrl+C on a running process with the tool?).
Reproduce it, investigate, and fix the root cause.
For better readability, group the owner thread info under a meaningful heading.
Currently,
"threadID": 34,
"threadName": "Thread-13",
"threadNativeID": 329430
Target:
"OwnerThread": [
{
"threadID": 34,
"threadName": "Thread-13",
"threadNativeID": 329430
}
],
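The regrouping above could be prototyped on the consumer side before the agent changes. This is only a sketch in Python (the agent itself is not written in Python); the helper name group_owner_thread is hypothetical, and the field names are taken from the snippets above.

```python
# Sketch only: nest the flat owner-thread fields under an "OwnerThread" list,
# matching the target layout shown above. group_owner_thread is a hypothetical
# helper, not part of the perf agent.
OWNER_FIELDS = ("threadID", "threadName", "threadNativeID")


def group_owner_thread(event):
    """Return a copy of `event` with owner-thread fields grouped together."""
    owner = {k: event[k] for k in OWNER_FIELDS if k in event}
    rest = {k: v for k, v in event.items() if k not in OWNER_FIELDS}
    if owner:
        rest["OwnerThread"] = [owner]
    return rest


flat = {"threadID": 34, "threadName": "Thread-13", "threadNativeID": 329430}
grouped = group_owner_thread(flat)
```

Any other fields in the event are carried over unchanged, so the transformation stays safe for events that have extra keys.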
As a continuation of the JITServer on-cloud deployment discussion, we would like to implement an OpenJ9 JITServer operator to support OpenShift users.
The initial idea is to take advantage of the existing JITServer helm chart and implement a helm-based operator, then publish it as a community operator (without Red Hat certification).
This issue will be populated as more details become available. A few items to confirm:
quay.io
.operator-source.yaml
In the future, we might want to extend the capabilities of the operator and implement a Go-based operator, but this is not our focus right now.
Basically compute
An approach would be to split the monitor event into monitor enter and monitor exit events, and measure the time between enter and exit events with matching monitors. Compare it with the contention data to make inferences.
(There could be other means to do this.)
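The enter/exit pairing could look roughly like the sketch below. It is written in Python for brevity and rests on assumptions that are not in the tool today: each event carries hypothetical type, monitor, thread and timestamp fields, and monitor is some identifier that stays stable for the lifetime of the lock.

```python
# Sketch only: pair monitor enter/exit events and sum the hold time per monitor.
# The event fields ("type", "monitor", "thread", "timestamp") are hypothetical.
from collections import defaultdict


def hold_times(events):
    """Return {monitor: total time held}, handling re-entrant enters."""
    depth = defaultdict(int)   # (monitor, thread) -> current nesting depth
    started = {}               # (monitor, thread) -> timestamp of outermost enter
    totals = defaultdict(int)
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        key = (ev["monitor"], ev["thread"])
        if ev["type"] == "enter":
            if depth[key] == 0:
                started[key] = ev["timestamp"]
            depth[key] += 1
        elif ev["type"] == "exit" and depth[key] > 0:
            depth[key] -= 1
            if depth[key] == 0:
                totals[ev["monitor"]] += ev["timestamp"] - started.pop(key)
    return dict(totals)


events = [
    {"type": "enter", "monitor": "A", "thread": 1, "timestamp": 100},
    {"type": "enter", "monitor": "B", "thread": 1, "timestamp": 120},
    {"type": "enter", "monitor": "B", "thread": 1, "timestamp": 130},  # re-entrant
    {"type": "exit", "monitor": "B", "thread": 1, "timestamp": 140},
    {"type": "exit", "monitor": "B", "thread": 1, "timestamp": 150},
    {"type": "exit", "monitor": "A", "thread": 1, "timestamp": 160},
    {"type": "enter", "monitor": "A", "thread": 2, "timestamp": 160},
    {"type": "exit", "monitor": "A", "thread": 2, "timestamp": 200},
]
totals = hold_times(events)
```

Hold time is only counted when the outermost enter is released, so nested enters on the same monitor by the same thread are not counted twice. Exits without a matching enter are ignored.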
When the perf agent sends tracing information to the networking clients, that information is displayed on the screen. With so much text flushing on the screen it's difficult to type any new command. It's better to store this information to a log/file at the client, rather than displaying it on screen.
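A client could write each received message to a file instead of the terminal. The sketch below is in Python and is not the existing client code; the function names and the one-JSON-object-per-line layout are assumptions chosen for illustration.

```python
# Sketch only: append each received event to a log file as one JSON object per
# line, so the terminal stays free for typing commands. Names are hypothetical.
import json
import os
import tempfile


def append_event(path, event):
    """Append one event to the log file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(path):
    """Read the log back as a list of events (useful for later analysis)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "client-events.log")
append_event(log_path, {"body": "Server started", "from": "Server"})
append_event(log_path, {"body": "second message", "from": "Server"})
logged = read_events(log_path)
```

One object per line keeps the file appendable without rewriting it, and it can still be inspected with standard text tools while the client is running.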
Example: 2 consecutive callbacks printed data like below:
{
"body": "<exclusive-end id=\"38\" timestamp=\"2021-03-01T22:53:34.040\" durationms=\"6.638\" />\n\n",
"eventType": "verboseGCEvent",
"from": "Server",
"timestamp": 1614668014040320324
},
and
{
"body": "<sys-start reason=\"explicit\" id=\"40\" timestamp=\"2021-03-01T22:53:34.040\" intervalms=\"6.750\" />\n",
"eventType": "verboseGCEvent",
"from": "Server",
"timestamp": 1614668014040566262
},
There is a missing section for exclusive-start. IMO this is not accidental. The facts that:
a gc event triggers multiple callbacks (anything starting with an XML tag)
we are not actually skipping coherent blocks of verbose:gc sections, instead some random XML blocks
A solution would be to identify a reasonable eye-catcher (such as exclusive-start and exclusive-end) in the log data, and use that for turning the processing on and off, along with the sampling calculation.
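The eye-catcher idea can be sketched as a small state machine that makes the sampling decision once per block rather than once per callback. This is an illustration in Python, not the agent's code; the class name BlockSampler, the rate parameter and the tag prefixes used for matching are assumptions.

```python
# Sketch only: sample whole verbose:gc blocks delimited by eye-catcher tags.
# The keep/skip decision is made when the start tag is seen and then applies
# to every chunk up to and including the end tag.
class BlockSampler:
    def __init__(self, rate, start_tag="<exclusive-start", end_tag="<exclusive-end"):
        self.rate = rate          # keep one block out of every `rate` blocks
        self.start_tag = start_tag
        self.end_tag = end_tag
        self.blocks_seen = 0
        self.keeping = False

    def accept(self, chunk):
        """Return True if this callback chunk belongs to a sampled block."""
        if self.start_tag in chunk:
            self.keeping = (self.blocks_seen % self.rate == 0)
            self.blocks_seen += 1
        keep = self.keeping
        if self.end_tag in chunk:
            self.keeping = False  # the block is closed after its end tag
        return keep


chunks = [
    '<exclusive-start id="1" />', '<gc-start id="2" />', '<exclusive-end id="3" />',
    '<exclusive-start id="4" />', '<gc-start id="5" />', '<exclusive-end id="6" />',
    '<exclusive-start id="7" />', '<exclusive-end id="8" />',
]
sampler = BlockSampler(rate=2)
kept = [c for c in chunks if sampler.accept(c)]
```

With rate=2 the first and third blocks are kept in full and the second is dropped in full. Chunks that arrive outside any block are dropped in this sketch; whether that is desirable is a separate decision.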
Hi,
As a member of the Security Team from the Eclipse Foundation, I want to let you know that we used the Scorecard and StepSecurity tools to analyze this repo, in order to open a pull request that covers some or all of the best practices below:
As a result, you will see a PR coming from StepSecurity that helps implement those fixes and covers the points identified below:
Please don't hesitate to reach out if anything above is unclear.
Kind Regards,
Francisco Perez
The Readme.md file for the JITServer Helm Chart makes many references to AdoptOpenJDK which is obsolete.
We should change that to refer to Semeru builds and containers.
Right now it is developed and tested on Linux. It would be great to have it ported to the other platforms that Liberty supports (Windows, AIX, Mac, IBM i, and z/OS too) (originally raised by Felix, WAS dev)
Development in the perf-tool area has added new fields to the commands and to the output generated by the agent.
Update documentation to reflect these changes.
When multiple events are enabled, the output JSON offers no easy way to tell which event type a given JSON object was written for.
The JITServer helm chart version needs to follow every OpenJ9 release to include the latest Adopt release images. This issue keeps track of changes that need to be applied to the helm chart.
There are three files that require updates for every version upgrade.
values.yaml
Chart.yaml
index.yaml
ERROR opening commands file: No such file or directory
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
what(): [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/02/23 05:40:05 - please wait.
Is it possible / meaningful to turn the monitor data in JSON format into an aggregate summary view? If so, what aggregations make sense? When should the aggregation be performed? How should it be represented?
This is open for discussion.
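As one input to that discussion, a per-monitor summary could be as simple as the Python sketch below. The fields monitorClass and waitMs are hypothetical (the current JSON output is not assumed to contain them), and count / total / max is just one possible choice of aggregations.

```python
# Sketch only: aggregate monitor events into one summary row per monitor class.
# "monitorClass" and "waitMs" are hypothetical field names used for illustration.
from collections import defaultdict


def summarize(events):
    """Return summary rows sorted by total wait time, most expensive first."""
    stats = defaultdict(lambda: {"count": 0, "totalWaitMs": 0, "maxWaitMs": 0})
    for ev in events:
        row = stats[ev["monitorClass"]]
        row["count"] += 1
        row["totalWaitMs"] += ev["waitMs"]
        row["maxWaitMs"] = max(row["maxWaitMs"], ev["waitMs"])
    rows = [{"monitorClass": name, **row} for name, row in stats.items()]
    return sorted(rows, key=lambda r: r["totalWaitMs"], reverse=True)


events = [
    {"monitorClass": "java/lang/Object", "waitMs": 2},
    {"monitorClass": "java/util/HashMap", "waitMs": 10},
    {"monitorClass": "java/lang/Object", "waitMs": 3},
]
summary = summarize(events)
```

Sorting by total wait puts the most expensive monitors first, which is the order one would want when deciding where to drill deeper.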
Currently, if there is a syntax error in the command file, the json library will throw an exception and the JVM will generate a core dump.
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
what(): [json.exception.parse_error.101] parse error at line 7, column 3: syntax error while parsing object key - unexpected '}'; expected string literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/03/22 18:37:19 - please wait.
JVMDUMP032I JVM requested System dump using '/home/mpirvu/CANOSP/Test/core.20210322.183719.14378.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e" specifies that the core dump is to be piped to an external program. Attempting to rename either core or core.14401.
It may be nicer to catch such exceptions, print a message and exit gracefully.
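The pattern is sketched below in Python purely as an illustration. The agent is C++ and uses nlohmann::json, where the equivalent would be a try/catch around the parse call that catches the parse_error exception; the function name load_commands and the message wording are hypothetical.

```python
# Sketch only: load a commands file without letting a missing file or a JSON
# syntax error propagate as an uncaught exception. load_commands is hypothetical.
import json
import os
import tempfile


def load_commands(path):
    """Return (commands, error). Exactly one of the two is None."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f), None
    except OSError as e:
        return None, f"cannot open commands file {path}: {e.strerror}"
    except json.JSONDecodeError as e:
        return None, f"syntax error in {path} at line {e.lineno}, column {e.colno}: {e.msg}"


# Demonstration with a deliberately malformed file and a missing file.
tmp_dir = tempfile.mkdtemp()
bad_path = os.path.join(tmp_dir, "commands.json")
with open(bad_path, "w", encoding="utf-8") as f:
    f.write('{\n  "a": 1,\n}')

bad_commands, bad_error = load_commands(bad_path)
missing_commands, missing_error = load_commands(os.path.join(tmp_dir, "absent.json"))
```

The caller can then print the error and continue without the commands file or exit cleanly, instead of aborting the JVM.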
Ability to send results to a callback method rather than writing into logs.json (originally raised by Felix, WAS dev)
{
"body": "Server started",
"from": "Server",
"timestamp": 1611374479
}
Right now the timestamp of an event is in seconds from the epoch. This can be changed to milliseconds by default, and made configurable (seconds and microseconds, if need be).
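A configurable unit could be derived from a single nanosecond clock reading, as in this Python sketch. The function name event_timestamp and the unit names "s", "ms" and "us" are assumptions for illustration, not existing agent options.

```python
# Sketch only: produce an epoch timestamp in a configurable unit, defaulting to
# milliseconds. The unit names and the function name are hypothetical.
import time

_NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000}


def event_timestamp(unit="ms", now_ns=None):
    """Return time since the epoch in `unit`; `now_ns` can be injected for tests."""
    if unit not in _NS_PER_UNIT:
        raise ValueError(f"unsupported timestamp unit: {unit!r}")
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns // _NS_PER_UNIT[unit]


sample_ns = 1611374479123456789
in_seconds = event_timestamp("s", sample_ns)
in_millis = event_timestamp("ms", sample_ns)
in_micros = event_timestamp("us", sample_ns)
```

Integer division truncates rather than rounds, so a coarser unit never reports a time later than the actual clock reading.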
When deploying JITServer with the helm chart I see the following warning
W0513 13:23:01.453840 84779 warnings.go:70] spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key: beta.kubernetes.io/arch is deprecated since v1.14; use "kubernetes.io/arch" instead
We should make the suggested change to eliminate the warning.
It would be nice to have the owning thread ID (both java and native) in the body section.
As per @mpirvu, the info_ptr field in the call to
GetObjectMonitorUsage(jvmtiEnv* env,
jobject object,
jvmtiMonitorUsage* info_ptr)
is a structure like this:
typedef struct {
jthread owner;
jint entry_count;
jint waiter_count;
jthread* waiters;
jint notify_waiter_count;
jthread* notify_waiters;
} jvmtiMonitorUsage;
which has the owner field.