openj9-utils's People

Contributors

amyhou, andrewcraik, danheidinga, dsouzai, emanelsaban, gireeshpunathil, keithc-ca, mpirvu, poojadurgad, pshipton, sharon-wang, yathamravali

openj9-utils's Issues

Link information from JLM and MonitorContended events

JLM (Java Lock Monitor) gives us a summary of activity on monitors, whereas MonitorContended events are triggered for every monitor individually and can also be used to compute the waiting time for a particular monitor operation (something that JLM does not do).
Ideally, we would first use the information from JLM to determine which monitors are expensive and then use the MonitorContended events to drill deeper and find stack traces for threads waiting on expensive monitors and for threads holding onto expensive monitors. To do that we need some common monitor information between the two sources. There are two possibilities:

  1. Use the raw address of the OpenJ9 monitors. This is already printed by JLM, and we need a way to find this address from the MonitorContended events.
  2. Use the hash value of the monitors (actually of the object we are synchronizing on). This is available in the current version of the code for the MonitorContended events, but is not available in JLM. We would need to modify the OpenJ9 repo to make this change.

perf-tools: assess performance overhead

The performance overhead of this tool should be measured and documented:

  • under default configurations
  • how it varies with the sampling rate
  • how it varies with the set of data points collected

perf-tool: need a unique key name in the `body` of events

Right now, similar events cannot be aggregated, as there is no key that binds them together. The only way to find matching events is by iterating and parsing to match the thread stack.

Because each monitorEvent relates to a lock, does it make sense to use the address of the monitor itself as the key? As per @mpirvu, Java object addresses are subject to change across GC cycles and are not trustworthy.

This is open for discussion.

No jitserver binary in Semeru images?

I'm trying to find jitserver binary in the ibm-semeru-runtimes:open-8u332-b09-jre image, as referenced in values.yaml and deployment.yaml, but there appears to be none:

podman run --rm -it ibm-semeru-runtimes:open-8u332-b09-jre bash -c jitserver
bash: jitserver: command not found

Am I using a wrong image?

perf-tool: Consider setting capabilities only if needed

I see that capabilities like can_generate_method_entry_events (and can_generate_method_exit_events in #52) are set in Agent_OnLoad. It might be worth setting these only when they are explicitly requested in the agent options, so that when these capabilities aren't used the JVM doesn't have to make unnecessary compromises (for example, the optimizations the JIT has to forgo because method enter/exit hooks could be triggered).

perf-tool: list of all waiters in the monitorEvents

It would be good to capture info on all the waiters in the monitorEvents output when needed. I suggest implementing this under a flag, since otherwise it could:

  • clutter the output
  • add performance overhead

The info_ptr field in the call to

GetObjectMonitorUsage(jvmtiEnv* env,
            jobject object,
            jvmtiMonitorUsage* info_ptr)

is a structure like this:

typedef struct {
    jthread owner;
    jint entry_count;
    jint waiter_count;
    jthread* waiters;
    jint notify_waiter_count;
    jthread* notify_waiters;
} jvmtiMonitorUsage;

and the fields waiter_count and waiters have the needed info, IIUC.
/cc @mpirvu - am I right?

Discussion on helm-based JITServer operator

As a continuation of the JITServer on-cloud deployment discussion, we would like to implement an OpenJ9 JITServer operator to support OpenShift users.

The initial idea is to take advantage of the existing JITServer helm chart and implement a helm-based operator, then publish it as a community operator (without Red Hat certification).

This issue will be populated as more details become available. A few items to confirm:

  • Explore the possibility of hosting the operator in the GitHub repo; otherwise we might have to host it on quay.io.
  • Find out and document the steps required before users can install this operator (operator-source.yaml).
  • Maintenance for version updates or functionality updates.

In the future, we might want to extend the capacity of the operator and implement a go-based operator, but this is not our focus right now.

FYI: @mpirvu @keithc-ca @EmanElsaban

perf-tool: compute lock latency

Basically compute:

  • how long the lock is held,
  • what is the reason for the delay:
    • heavy contention or
    • locked code execution latency

One approach would be to split the monitor event into monitor enter and monitor exit events, and find the time between events with matching monitors. Compare it with the contention data to make inferences.

(there could be other means to do this)
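The enter/exit pairing idea could look roughly like the sketch below. It is only an illustration: the MonitorEvent record, its field names and the computeHoldTimes function are hypothetical and do not reflect the agent's current JSON layout. It assumes each event carries a stable monitor key, a thread id and a nanosecond timestamp.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record; the agent's real JSON fields may differ.
enum class MonitorEventKind { Enter, Exit };

struct MonitorEvent {
    MonitorEventKind kind;
    std::string monitorKey;     // assumed stable identifier for the monitor
    std::uint64_t threadId;     // thread that entered or exited the monitor
    std::uint64_t timestampNs;  // event time in nanoseconds
};

// Sums, per monitor, the time between each Enter and the next Exit by the
// same thread. Events are assumed to be sorted by timestamp.
// Reentrant acquisition is not modelled: a nested Enter is ignored and the
// first Exit closes the interval.
std::map<std::string, std::uint64_t>
computeHoldTimes(const std::vector<MonitorEvent>& events) {
    // (monitor, thread) -> timestamp of the Enter that is still open
    std::map<std::pair<std::string, std::uint64_t>, std::uint64_t> openEnters;
    std::map<std::string, std::uint64_t> heldNs;

    for (const auto& e : events) {
        const auto key = std::make_pair(e.monitorKey, e.threadId);
        if (e.kind == MonitorEventKind::Enter) {
            openEnters.emplace(key, e.timestampNs);  // keeps an existing entry
        } else {
            const auto it = openEnters.find(key);
            if (it == openEnters.end()) {
                continue;  // Exit without a recorded Enter: skip it
            }
            heldNs[e.monitorKey] += e.timestampNs - it->second;
            openEnters.erase(it);
        }
    }
    return heldNs;
}
```

A long total hold time with little contention would point at slow code under the lock, whereas short hold times with long waits would point at heavy contention.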

Save info received by network clients to log

When the perf agent sends tracing information to the networking clients, that information is displayed on the screen. With so much text flushing on the screen it's difficult to type any new command. It's better to store this information to a log/file at the client, rather than displaying it on screen.

perf-tool: missing sections in verbose:gc

Example: 2 consecutive callbacks printed data like below:

{
  "body": "<exclusive-end id=\"38\" timestamp=\"2021-03-01T22:53:34.040\" durationms=\"6.638\" />\n\n",
  "eventType": "verboseGCEvent",
  "from": "Server",
  "timestamp": 1614668014040320324
},

and

{
  "body": "<sys-start reason=\"explicit\" id=\"40\" timestamp=\"2021-03-01T22:53:34.040\" intervalms=\"6.750\" />\n",
  "eventType": "verboseGCEvent",
  "from": "Server",
  "timestamp": 1614668014040566262
},

The exclusive-start section is missing. IMO this is not accidental: we are not actually skipping coherent blocks of verbose:gc sections, but rather arbitrary XML blocks.

A solution would be to identify a reasonable eye-catcher in the log data (such as exclusive-start and exclusive-end) and use it to turn the processing on and off, along with the sampling calculation.
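A rough sketch of that gating, assuming the sampling is "keep every Nth block" and that each callback delivers one chunk of verbose:gc text. The GcBlockSampler class and its names are hypothetical, not existing agent code.

```cpp
#include <string>

// Decides per chunk whether to emit it, so that whole
// exclusive-start ... exclusive-end blocks are kept or dropped together.
class GcBlockSampler {
public:
    // sampleEvery == 1 keeps every block; 0 is treated as 1.
    explicit GcBlockSampler(unsigned sampleEvery)
        : sampleEvery_(sampleEvery == 0 ? 1 : sampleEvery) {}

    // Returns true if the chunk belongs to a block selected for output.
    // Chunks seen outside any exclusive-start/exclusive-end pair are dropped.
    bool shouldEmit(const std::string& chunk) {
        if (chunk.find("<exclusive-start") != std::string::npos) {
            // The sampling decision is taken once, at the start of a block.
            emitting_ = (blockCount_ % sampleEvery_ == 0);
            ++blockCount_;
        }
        const bool emit = emitting_;
        if (chunk.find("<exclusive-end") != std::string::npos) {
            emitting_ = false;  // the block is complete
        }
        return emit;
    }

private:
    unsigned sampleEvery_;
    unsigned long blockCount_ = 0;
    bool emitting_ = false;
};
```

With this shape, the exclusive-end chunk is still emitted for a selected block, and the next block starts from a fresh sampling decision.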

Security Best Practices

Hi,

As a member of the Security Team at the Eclipse Foundation, we used the Scorecard and StepSecurity tools to analyze this repo in order to push a pull request that covers some or all of the following best practices:

As a result, you will see a PR coming from StepSecurity to help implement those fixes, covering the following points that were identified:

Please don’t hesitate and reach out if there is something unclear above.

Kind Regards,
Francisco Perez

perf-tool: platform support

Right now it is developed and tested on Linux. It would be great to have it ported to the other platforms that Liberty supports (Windows, AIX, Mac, IBM i and z/OS too) (originally raised by Felix, WAS dev)

perf-tool: Update documentation

Development in the perf-tool area has added new fields to the commands and to the output generated by the agent.
Update the documentation to reflect these changes.

perf-tool: add event type

When multiple events are enabled, the output JSON has no easy way to tell which event type a given JSON object was written for.

Helm Chart: Version upgrade for new OpenJ9 releases

The JITServer helm chart version needs to follow every OpenJ9 release to include the latest Adopt release images. This issue keeps track of changes that need to be applied to the helm chart.

There are three files that require updates for every version upgrade.

  • values.yaml
  • Chart.yaml
  • index.yaml

perf-tool: process crashes if starts with non-existent command file

ERROR opening commands file: No such file or directory
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
  what():  [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/02/23 05:40:05 - please wait.

perf-tool: data aggregation

Is it possible / meaningful to turn the monitor data from its JSON format into an aggregate summary view? If so, which aggregations make sense? When should they be performed? How should they be represented?

This is open for discussion.

Catch json parse exceptions and fail gracefully

Currently, if there is a syntax error in the command file,
the json library will throw an exception and the JVM will
generate a core dump.

terminate called after throwing an instance of 'nlohmann::detail::parse_error'
  what():  [json.exception.parse_error.101] parse error at line 7, column 3: syntax error while parsing object key - unexpected '}'; expected string literal
JVMDUMP039I Processing dump event "abort", detail "" at 2021/03/22 18:37:19 - please wait.
JVMDUMP032I JVM requested System dump using '/home/mpirvu/CANOSP/Test/core.20210322.183719.14378.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.14401.

It may be nicer to catch such exceptions,
print a message and exit gracefully.
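A minimal sketch of the graceful-failure path, covering both the missing-file and the syntax-error cases. The loadCommandsSafely name and the callback parameter are hypothetical; the parser is passed in as a callback so the sketch does not depend on the JSON library. In the agent this would presumably wrap the existing nlohmann::json parse call, whose parse_error derives from std::exception and is therefore caught here.

```cpp
#include <exception>
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Reads the commands file and hands its text to `parse`.
// Returns false (after printing a message) if the file cannot be opened or
// if `parse` throws; the caller can then continue without commands or shut
// down cleanly instead of letting the exception abort the JVM.
bool loadCommandsSafely(const std::string& path,
                        const std::function<void(const std::string&)>& parse) {
    std::ifstream in(path);
    if (!in.is_open()) {
        std::cerr << "ERROR opening commands file: " << path << '\n';
        return false;  // never reach the parser with an empty stream
    }

    std::ostringstream text;
    text << in.rdbuf();

    try {
        parse(text.str());
    } catch (const std::exception& e) {
        std::cerr << "ERROR parsing commands file " << path << ": "
                  << e.what() << '\n';
        return false;
    }
    return true;
}
```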

perf-tool: callback on events

Ability to send result to a callback method rather than writing into logs.json (originally raised by Felix, WAS dev)

perf-tool: more granular timestamp

{
  "body": "Server started",
  "from": "Server",
  "timestamp": 1611374479
}

Right now the timestamp of an event is in seconds since the epoch. It could be in milliseconds by default, and made configurable (seconds and microseconds, if need be).
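One way this could be done with std::chrono is sketched below. The TimestampUnit enum and both function names are hypothetical, and how the unit would be selected (agent option, command field) is left open.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical configuration value for the timestamp granularity.
enum class TimestampUnit { Seconds, Milliseconds, Microseconds };

// Converts a system_clock time point to a count since the clock's epoch
// (the Unix epoch on common platforms, guaranteed from C++20 on).
std::int64_t toUnit(std::chrono::system_clock::time_point tp,
                    TimestampUnit unit) {
    const auto sinceEpoch = tp.time_since_epoch();
    switch (unit) {
        case TimestampUnit::Seconds:
            return std::chrono::duration_cast<std::chrono::seconds>(sinceEpoch).count();
        case TimestampUnit::Microseconds:
            return std::chrono::duration_cast<std::chrono::microseconds>(sinceEpoch).count();
        case TimestampUnit::Milliseconds:
        default:
            return std::chrono::duration_cast<std::chrono::milliseconds>(sinceEpoch).count();
    }
}

// Timestamp for a new event; milliseconds unless configured otherwise.
std::int64_t currentTimestamp(TimestampUnit unit = TimestampUnit::Milliseconds) {
    return toUnit(std::chrono::system_clock::now(), unit);
}
```

Consumers of logs.json would need to know which unit is in effect, so the chosen unit probably has to be recorded somewhere in the output as well.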

Eliminate JITServer Helm Chart warning about beta.kubernetes.io/arch

When deploying JITServer with the helm chart I see the following warning

W0513 13:23:01.453840   84779 warnings.go:70] spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key: beta.kubernetes.io/arch is deprecated since v1.14; use "kubernetes.io/arch" instead

We should make the suggested change to eliminate the warning.

perf-tool: owning thread info in the JSON body

It would be nice to have the owning thread ID (both java and native) in the body section.

As per @mpirvu , the info_ptr field in the call to

GetObjectMonitorUsage(jvmtiEnv* env,
            jobject object,
            jvmtiMonitorUsage* info_ptr)

is a structure like this:

typedef struct {
    jthread owner;
    jint entry_count;
    jint waiter_count;
    jthread* waiters;
    jint notify_waiter_count;
    jthread* notify_waiters;
} jvmtiMonitorUsage;

which has the owner field.
