aws-observability / aws-otel-java-instrumentation
AWS Distro for OpenTelemetry Java Instrumentation Library
Home Page: https://aws-otel.github.io/
License: Apache License 2.0
GitHub Actions integration test workflow for the Java agent
Any estimate of when the base OTel agent will be updated?
It's currently based on v1.2.0, which includes the out-of-memory issue.
That problem was solved in OTel agent v1.3.0 (1), but the PR to upgrade the agent failed to merge here;
recently v1.4.0 was released, so looking forward to this one :)
Thanks in advance
Hello @mxiamxia and AWS team -
When using OTel with a Spring Boot app, the Spring Boot admin console wraps all actual classes with io.opentelemetry.javaagent.instrumentation.spring.scheduling.SpringSchedulingRunnableWrapper. I would like to see the actual classes, and not have this wrapper obscure them.
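If the goal is just to see the real classes again, one knob that may help, assuming the wrapping comes from the agent's Spring scheduling instrumentation: the javaagent convention for disabling a single instrumentation module is -Dotel.instrumentation.<name>.enabled=false, so something like the following should turn it off (at the cost of losing the scheduling spans):
-Dotel.instrumentation.spring-scheduling.enabled=false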
Describe the bug
We are seeing a more than 50% performance degradation after instrumenting with the OTel agent. Our application, instrumented with OTel, runs on an EKS cluster. The OTel Collector, running as a DaemonSet in the same EKS cluster, collects traces and ingests the data into AWS X-Ray.
Steps to reproduce
This is a Spring 5 project with WebFlux and Spring Cloud Stream support, interacting with SQS, DynamoDB and AWS MSK.
What did you expect to see?
Without the OTel agent, the application could reach up to 250 requests per second with 2Gi of memory.
What did you see instead?
With the OTel agent, we are seeing ~65 requests per second with the same settings. I was expecting some degradation in throughput, but this is more than 50%.
Additional context
We are using aws-opentelemetry-agent-1.1.0 with default settings for the BSP; sampling is set to 100% and the metrics exporter is set to logging.
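Given that sampling is at 100%, one thing worth trying, assuming the throughput hit scales with trace volume rather than with the bytecode instrumentation itself: lower the sampling ratio via the standard autoconfiguration properties, for example
-Dotel.traces.sampler=parentbased_traceidratio -Dotel.traces.sampler.arg=0.05
which samples roughly 5% of root traces while honoring upstream sampling decisions, and helps narrow down whether the overhead comes from exported span volume.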
My issue looks exactly like #66, but we're using version 1.5.1/1.6.0:
{
"@timestamp": "2021-09-23T12:03:26.518+02:00",
"@version": "1",
"message": "my message",
"logger_name": "myloggerr",
"thread_name": "grpc-default-executor-11",
"level": "TRACE",
"level_value": 5000,
"trace_id": "614c50d5522ead27e473e03f06c469c5",
"trace_flags": "01",
"span_id": "ab12a7b5a9a0001c"
}
If I use version 1.4.1, it's there:
{
"@timestamp": "2021-09-23T12:11:15.644+02:00",
"@version": "1",
"message": "mymessage",
"logger_name": "mylogger",
"thread_name": "grpc-default-executor-7",
"level": "TRACE",
"level_value": 5000,
"trace_id": "614c52c132ce415b4ab8604020840209",
"trace_flags": "01",
"span_id": "c5e5f7bf633a1f60",
"AWS-XRAY-TRACE-ID": "1-614c52c1-32ce415b4ab8604020840209@c5e5f7bf633a1f60"
}
Seems to be a regression.
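As a stopgap while the MDC key is missing, the X-Ray formatted id can be reconstructed from the W3C trace and span ids that are still logged. A minimal sketch using the OpenTelemetry API (the class name is illustrative; this is not the agent's internal code):

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;

public final class XrayTraceIdMdc {

  // Builds "1-{first 8 hex chars}-{remaining 24 hex chars}@{span id}",
  // the value the agent used to put into the MDC as AWS-XRAY-TRACE-ID.
  public static String currentXrayTraceId() {
    SpanContext ctx = Span.current().getSpanContext();
    String traceId = ctx.getTraceId(); // 32 lowercase hex characters
    return "1-" + traceId.substring(0, 8) + "-" + traceId.substring(8)
        + "@" + ctx.getSpanId();
  }
}

For the first log entry above this yields 1-614c50d5-522ead27e473e03f06c469c5@ab12a7b5a9a0001c, matching the format of the 1.4.1 output.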
Describe the bug
It takes a very long time to run an AWS Lambda Java function instrumented with the AWS OTel Java agent. For a very simple function that only makes HTTP GET requests, it takes around 40-45 seconds to run the code (cold start), causing some Lambda HTTP response timeouts; later it's smooth.
Steps to reproduce
aws-opentelemetry-agent.jar set as -javaagent
Lambda function code:
package example;

import static java.util.stream.Collectors.toMap;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;

public class HelloLambdaHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

  @Override
  public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
    APIGatewayProxyResponseEvent response = new APIGatewayProxyResponseEvent();
    response.setStatusCode(200);
    // Echo the incoming headers back, prefixed with "received-".
    try {
      response.setHeaders(request.getHeaders().entrySet()
          .stream()
          .map(e -> Map.entry("received-" + e.getKey(), e.getValue()))
          .collect(toMap(Map.Entry::getKey, Map.Entry::getValue)));
    } catch (Exception e) {
      // Ignore: the request may have no headers.
    }
    // Make a few outbound HTTP calls so the agent has something to instrument.
    HttpClient httpclient = HttpClients.createDefault();
    for (int i = 0; i < 3; i++) {
      try {
        HttpResponse httpResponse = httpclient.execute(new HttpGet("http://httpbin.org/"));
        httpResponse.getEntity().getContent().readAllBytes();
      } catch (Exception e) {
        // Ignore failed calls; this handler only exercises the instrumentation.
      }
    }
    // Include the current stack trace in the body for debugging.
    Throwable t = new Throwable();
    StringWriter writer = new StringWriter();
    t.printStackTrace(new PrintWriter(writer));
    response.setBody("I'm lambda!\n" + writer.toString());
    return response;
  }
}
What did you expect to see?
I expect the Lambda to start much faster; with the upstream OpenTelemetry Java agent it takes 9 seconds.
Additional context
After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main
branch for the (spark, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
Hello
Describe the bug
We used the old X-Ray SDK before switching to the otel-collector/javaagent this month. Right now we're missing the "x-amzn-trace-id" header in our API responses.
What did you expect to see?
The x-amzn-trace-id header should be present in HTTP responses.
Additional context
v0.6.0 sidecar and v0.12.1-aws.1 agent.
Is this expected/work in progress, or am I missing something?
Thanks
After the Soak Tests completed, a performance degradation was revealed for commit d1b010b of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
After switching the OpenTelemetry agent from version 1.2.0 to version 1.4.1, metrics exported to New Relic are always 0. I can see the labels that are added on my custom metrics, and I can see agent-reported metrics being sent, yet they always have a value of 0.
Params that I use:
-javaagent:/...somePaths.../aws-opentelemetry-agent.jar -Dotel.javaagent.debug=true -Dotel.metrics.exporter=otlp -Dotel.instrumentation.runtime-metrics.enabled=true
And the environment variables that I use:
OTEL_RESOURCE_ATTRIBUTES=service.name=someappName,service.namespace=somenamespace,service.instance.id=mymachine.local
OTEL_EXPORTER_OTLP_ENDPOINT=http://collectorIp:4317
OTEL_PROPAGATORS=xray
After replacing the path to the jar with the agent in version 1.2.0, everything works just fine.
Is your feature request related to a problem? Please describe.
I am trying to log the end-user username to X-Ray with the OTel agent, using the attribute "enduser.id".
I've verified that the code works locally and that the attribute is added.
The metadata does not contain the attribute in X-Ray:
{
"default": {
"enduser.scope": "ROLE_USER USE_PREMIUM_TTS ROLE_STUDENT",
"thread.name": "http-nio-8180-exec-21",
"thread.id": ""
}
}
Code:
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.aspectj.lang.ProceedingJoinPoint;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;

import java.util.stream.Collectors;

public abstract class AbstractSpringOpenTelemetryInterceptor {

  // See io.opentelemetry.semconv.trace.attributes.SemanticAttributes.
  public static final String ENDUSER_ID = "enduser.id";
  public static final String ENDUSER_SCOPE = "enduser.scope";
  public static String INSTRUMENTATION_APP_PACKAGE = "se.nomp";

  private final Tracer tracer = GlobalOpenTelemetry.getTracer(INSTRUMENTATION_APP_PACKAGE);

  protected Object wrapWithSpan(ProceedingJoinPoint pjp) throws Throwable {
    // Create a custom span named after the intercepted method.
    Span span = tracer.spanBuilder(generateSubsegmentName(pjp)).startSpan();
    try (Scope scope = span.makeCurrent()) {
      var principalName = getPrincipalName();
      if (principalName != null) {
        span.setAttribute(ENDUSER_ID, principalName);
        var authorities = getAuthorities();
        if (authorities != null) {
          span.setAttribute(ENDUSER_SCOPE, authorities);
        }
      }
      return conditionalProceed(pjp);
    } finally {
      span.end();
    }
  }

  protected String generateSubsegmentName(ProceedingJoinPoint pjp) {
    return pjp.getSignature().getDeclaringType().getSimpleName() + "." + pjp.getSignature().getName();
  }

  private static Object conditionalProceed(ProceedingJoinPoint pjp) throws Throwable {
    return pjp.getArgs().length == 0 ? pjp.proceed() : pjp.proceed(pjp.getArgs());
  }

  public static String getPrincipalName() {
    Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
    if (authentication != null) {
      return authentication.getName();
    }
    return null;
  }

  public static String getAuthorities() {
    Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
    if (authentication != null) {
      var authorities = authentication.getAuthorities();
      if (authorities != null) {
        return authorities.stream().map(a -> a.getAuthority()).collect(Collectors.joining(" "));
      }
    }
    return null;
  }
}
Describe the solution you'd like
I'd like to know what I should be doing instead.
Hello,
Describe the bug
We use the Java agent with one of our Spring Boot apps. The agent adds a traceId like 60108d573a343aec0c7a35f4ede1e064 to our log output, but it should have been something like 1-60108d57-3a343aec0c7a35f4ede1e064.
What did you expect to see?
Maybe the proper traceId, copy-and-pasteable into X-Ray? I'm not quite sure what the expected behavior is here.
Additional context
v0.6.0 sidecar and v0.12.1-aws.1 agent.
Is this expected/work in progress, or am I missing something?
Thanks
The log4j dependency has a critical security vulnerability (Log4Shell): https://www.kaspersky.com/blog/log4shell-critical-vulnerability-in-apache-log4j/43124/
The latest commit in master fixes this by updating the log4j dependency to 2.15+.
Could you create a release that includes the critical security fix, so we don't need to build a separate jar from master?
Automated performance test workflow for AWS OTel Java agent
After the Soak Tests completed, a performance degradation was revealed for commit 1e5ce35 of the refs/heads/main
branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
Describe the bug
The smoke test fails intermittently.
Steps to reproduce
What did you expect to see?
What did you see instead?
SpringBootSmokeTest > hello() FAILED
java.lang.AssertionError:
Expecting any element of:
<[trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\001\005\330\037\367\231u\237"
parent_span_id: "\"?\035@\372S\3259"
name: "AppController.backend"
kind: SPAN_KIND_INTERNAL
start_time_unix_nano: 1607097575118242100
end_time_unix_nano: 1607097575155671900
attributes {
key: "thread.id"
value {
int_value: 24
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-4"
}
}
status {
}
,
trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "5e\347F\240S\345}"
parent_span_id: "\367\275\001\230\224\355P\201"
name: "AppController.hello"
kind: SPAN_KIND_INTERNAL
start_time_unix_nano: 1607097575004034200
end_time_unix_nano: 1607097575194917200
attributes {
key: "thread.id"
value {
int_value: 23
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-3"
}
}
status {
}
,
trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\'\354\243\3516\032\267\212"
parent_span_id: "5e\347F\240S\345}"
name: "HTTP GET"
kind: SPAN_KIND_CLIENT
start_time_unix_nano: 1607097575093631200
end_time_unix_nano: 1607097575180852600
attributes {
key: "thread.id"
value {
int_value: 23
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-3"
}
}
attributes {
key: "net.transport"
value {
string_value: "IP.TCP"
}
}
attributes {
key: "http.method"
value {
string_value: "GET"
}
}
attributes {
key: "http.flavor"
value {
string_value: "1.1"
}
}
attributes {
key: "net.peer.name"
value {
string_value: "localhost"
}
}
attributes {
key: "net.peer.port"
value {
int_value: 8080
}
}
attributes {
key: "http.url"
value {
string_value: "http://localhost:8080/backend"
}
}
attributes {
key: "http.status_code"
value {
int_value: 200
}
}
status {
}
,
trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\"?\035@\372S\3259"
parent_span_id: "\'\354\243\3516\032\267\212"
name: "/backend"
kind: SPAN_KIND_SERVER
start_time_unix_nano: 1607097575117008100
end_time_unix_nano: 1607097575160650900
attributes {
key: "thread.id"
value {
int_value: 24
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-4"
}
}
attributes {
key: "net.peer.ip"
value {
string_value: "127.0.0.1"
}
}
attributes {
key: "net.peer.port"
value {
int_value: 47104
}
}
attributes {
key: "http.method"
value {
string_value: "GET"
}
}
attributes {
key: "http.user_agent"
value {
string_value: "Java/11.0.8"
}
}
attributes {
key: "http.url"
value {
string_value: "http://localhost:8080/backend"
}
}
attributes {
key: "http.flavor"
value {
string_value: "HTTP/1.1"
}
}
attributes {
key: "http.client_ip"
value {
string_value: "127.0.0.1"
}
}
attributes {
key: "http.status_code"
value {
int_value: 200
}
}
status {
}
]>
to satisfy the given assertions requirements but none did:
<trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\001\005\330\037\367\231u\237"
parent_span_id: "\"?\035@\372S\3259"
name: "AppController.backend"
kind: SPAN_KIND_INTERNAL
start_time_unix_nano: 1607097575118242100
end_time_unix_nano: 1607097575155671900
attributes {
key: "thread.id"
value {
int_value: 24
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-4"
}
}
status {
}
> error:
Expecting:
<SPAN_KIND_INTERNAL>
to be equal to:
<SPAN_KIND_SERVER>
but was not.
<trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "5e\347F\240S\345}"
parent_span_id: "\367\275\001\230\224\355P\201"
name: "AppController.hello"
kind: SPAN_KIND_INTERNAL
start_time_unix_nano: 1607097575004034200
end_time_unix_nano: 1607097575194917200
attributes {
key: "thread.id"
value {
int_value: 23
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-3"
}
}
status {
}
> error:
Expecting:
<SPAN_KIND_INTERNAL>
to be equal to:
<SPAN_KIND_SERVER>
but was not.
<trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\'\354\243\3516\032\267\212"
parent_span_id: "5e\347F\240S\345}"
name: "HTTP GET"
kind: SPAN_KIND_CLIENT
start_time_unix_nano: 1607097575093631200
end_time_unix_nano: 1607097575180852600
attributes {
key: "thread.id"
value {
int_value: 23
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-3"
}
}
attributes {
key: "net.transport"
value {
string_value: "IP.TCP"
}
}
attributes {
key: "http.method"
value {
string_value: "GET"
}
}
attributes {
key: "http.flavor"
value {
string_value: "1.1"
}
}
attributes {
key: "net.peer.name"
value {
string_value: "localhost"
}
}
attributes {
key: "net.peer.port"
value {
int_value: 8080
}
}
attributes {
key: "http.url"
value {
string_value: "http://localhost:8080/backend"
}
}
attributes {
key: "http.status_code"
value {
int_value: 200
}
}
status {
}
> error:
Expecting:
<SPAN_KIND_CLIENT>
to be equal to:
<SPAN_KIND_SERVER>
but was not.
<trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
span_id: "\"?\035@\372S\3259"
parent_span_id: "\'\354\243\3516\032\267\212"
name: "/backend"
kind: SPAN_KIND_SERVER
start_time_unix_nano: 1607097575117008100
end_time_unix_nano: 1607097575160650900
attributes {
key: "thread.id"
value {
int_value: 24
}
}
attributes {
key: "thread.name"
value {
string_value: "http-nio-8080-exec-4"
}
}
attributes {
key: "net.peer.ip"
value {
string_value: "127.0.0.1"
}
}
attributes {
key: "net.peer.port"
value {
int_value: 47104
}
}
attributes {
key: "http.method"
value {
string_value: "GET"
}
}
attributes {
key: "http.user_agent"
value {
string_value: "Java/11.0.8"
}
}
attributes {
key: "http.url"
value {
string_value: "http://localhost:8080/backend"
}
}
attributes {
key: "http.flavor"
value {
string_value: "HTTP/1.1"
}
}
attributes {
key: "http.client_ip"
value {
string_value: "127.0.0.1"
}
}
attributes {
key: "http.status_code"
value {
int_value: 200
}
}
status {
}
> error:
Expecting:
<"/backend">
to be equal to:
<"/hello">
but was not.
at io.awsobservability.instrumentation.smoketests.runner.SpringBootSmokeTest.hello(SpringBootSmokeTest.java:154)
During Soak Tests execution, a performance degradation was revealed for commit 2a354ab of the refs/heads/main
branch for the (spark, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
Describe the bug
Two Spring Java applications use Kafka: one produces messages and the other consumes them. Running them with the AWS OpenTelemetry javaagent and with the collector installed, traces are generated and I can see them in the AWS X-Ray console, but the producer-consumer graph and traces are not connected.
When I printed the headers, I can see the X-Amzn-Trace-Id header in both producer and consumer. The trace id value is the same in this header, but the trace_id is different when printed in the logs.
E.g., traces from my application:
[9090] 2021-10-07 18:49:18 - c.n.n.k.i.KafkaProducerInterceptor - HEADERS IN PRODUCER**-----for TOPIC XXXX , {traceparent=00-615ef3d419ea6e5838f81274b50e4dde-8816e8faeacd8c8a-01, X-Amzn-Trace-Id=Root=1-615ef3d4-19ea6e5838f81274b50e4dde;Parent=8816e8faeacd8c8a;Sampled=1} trace_id=615ef3d419ea6e5838f81274b50e4dde span_id=8816e8faeacd8c8a trace_flags=01
[9090] 2021-10-07 18:49:18 - c.n.n.k.i.KafkaConsumerInterceptor - HEADERS IN CONSUMER-----for TOPIC XXXX , {nn-api-key-id=null, nn-timestamp=null, traceparent=00-615ef3d419ea6e5838f81274b50e4dde-8816e8faeacd8c8a-01, nn-app-name=null, nn-device-id=null, X-Amzn-Trace-Id=Root=1-615ef3d4-19ea6e5838f81274b50e4dde;Parent=8816e8faeacd8c8a;Sampled=1, nn-trans-id=9090} trace_id=615ef3d63e20955caaa62e7a3bd423e9 span_id=1633f0c818ad6cf1 trace_flags=01
Steps to reproduce
Two microservices: one receives a request and sends a message to Kafka; the other consumes the same message from the same topic.
What did you expect to see?
Traces should be connected in the AWS X-Ray console, from the controller through the producer and on to the consumer.
What did you see instead?
Additional context
I have tested with two sets of applications; the result is the same: not connecting.
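For what it's worth, when the automatic consumer-side linking doesn't kick in, the upstream context can be extracted from the Kafka headers by hand and used as the parent of a manually created consumer span. A minimal sketch against the OpenTelemetry and Kafka client APIs (class and instrumentation-scope names are illustrative, not taken from the report above):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class KafkaTraceLink {

  // Lets the configured propagators (W3C tracecontext, xray, ...) read the record headers.
  private static final TextMapGetter<ConsumerRecord<?, ?>> GETTER =
      new TextMapGetter<>() {
        @Override
        public Iterable<String> keys(ConsumerRecord<?, ?> record) {
          List<String> keys = new ArrayList<>();
          for (Header header : record.headers()) {
            keys.add(header.key());
          }
          return keys;
        }

        @Override
        public String get(ConsumerRecord<?, ?> record, String key) {
          Header header = record.headers().lastHeader(key);
          return header == null ? null : new String(header.value(), StandardCharsets.UTF_8);
        }
      };

  public static void process(ConsumerRecord<String, String> record) {
    // Extract the producer's context (traceparent / X-Amzn-Trace-Id) from the headers.
    Context extracted = GlobalOpenTelemetry.getPropagators()
        .getTextMapPropagator()
        .extract(Context.current(), record, GETTER);

    Tracer tracer = GlobalOpenTelemetry.getTracer("kafka-consumer-demo"); // illustrative scope name
    Span span = tracer.spanBuilder("consume " + record.topic())
        .setSpanKind(SpanKind.CONSUMER)
        .setParent(extracted)
        .startSpan();
    try (Scope ignored = span.makeCurrent()) {
      // ... handle the message ...
    } finally {
      span.end();
    }
  }
}

This at least ties the consumer span to the producer's trace while the automatic linking issue is investigated.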
After the Soak Tests completed, a performance degradation was revealed for commit c258373 of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
During Soak Tests execution, a performance degradation was revealed for commit d1b010b of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The request is to support sampling logic in the ADOT SDK similar to what is currently offered in the X-Ray SDK.
In order to better structure our release history, we should create a running release-notes folder. This would allow us to:
Better structure our release history by creating a release-specific changelog file.
Better focus on release changes, as we can easily reference specific time-stamped release-notes files; a less "monolithic" document also creates more logical space for specific examples and comments in a reader-friendly structure.
Maintain it more easily, with new files per commit.
Use it in conjunction with a changelog generator tool to compile a skeleton that includes all PRs and commits.
AWS Open Source recommends tools such as github-changelog-generator, but we can discuss options :)
During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main
branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
During Soak Tests execution, a performance degradation was revealed for commit a49aa48 of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
Is your feature request related to a problem? Please describe.
We've just migrated from AppD (javaagent) to X-Ray using the OTel agent, v1.1. We're currently using traceidratio sampling with a 1/200 ratio.
This causes high-frequency service calls to be over-represented in the sample set and low-frequency services to be under-represented, or simply not present at all.
Describe the solution you'd like
I'd like a more sophisticated sampler that collects data from all spans every time period, regardless of traffic.
See jaegertracing/jaeger#365 for a problem description and possible inspiration.
It seems like a common and basic need for instrumentation. Is anything planned upstream?
I should also add that we're running a monolithic application, so we can't configure each service with a different sample rate, as they run in the same JVM/javaagent. But even in a microservice architecture this problem would exist.
We do use Spring AOP to create custom spans for the Spring @Service business services.
I'd rather not have to roll my own OTel Java agent to address this issue... Are there any ways to address this from user space?
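One rough illustration of what such a sampler could look like in user space; this is a sketch against the OpenTelemetry SDK Sampler interface, not an existing ADOT feature, and wiring it into the javaagent would additionally require the sampler autoconfiguration SPI. It guarantees each operation (span name) is sampled at least once per interval, regardless of traffic:

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public final class PerOperationRateLimitingSampler implements Sampler {

  private final Map<String, AtomicLong> lastSampledMillis = new ConcurrentHashMap<>();
  private final long intervalMillis;

  public PerOperationRateLimitingSampler(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  @Override
  public SamplingResult shouldSample(
      Context parentContext,
      String traceId,
      String name,
      SpanKind spanKind,
      Attributes attributes,
      List<LinkData> parentLinks) {
    long now = System.currentTimeMillis();
    AtomicLong last = lastSampledMillis.computeIfAbsent(name, n -> new AtomicLong(0));
    long previous = last.get();
    // Sample this operation if it hasn't been sampled within the interval.
    if (now - previous >= intervalMillis && last.compareAndSet(previous, now)) {
      return SamplingResult.recordAndSample();
    }
    return SamplingResult.drop();
  }

  @Override
  public String getDescription() {
    return "PerOperationRateLimitingSampler{intervalMillis=" + intervalMillis + "}";
  }
}

A production version would combine this with a ratio-based sampler (so high-traffic operations still get more than one trace per interval) and bound the name map. ADOT also documents X-Ray remote sampling, which applies centrally configured X-Ray sampling rules and may already cover the per-service rate use case.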
Describe the bug
I'm not quite sure if this ever worked with AWS ADOT, but with the old X-Ray instrumentation it did.
We're running a quite simple and typical setup: an AWS ALB in front of an ECS Fargate task that runs our application.
We use the latest version of the AWS-flavoured Java agent (1.7) and version 0.13.0 of the AWS-flavoured OTel collector as a sidecar.
We turned on LB logs (S3) in order to have more insight into requests. It appears that not a single trace id mentioned in the LB logs is visible within our application logs.
With the former X-Ray setup we could basically search for a trace id in our log frontend and get both the application logs and the corresponding LB log entry.
Apart from the Java agent, we only use the AwsXrayPropagator to inject the trace id back into our response, something along these lines:
import io.opentelemetry.context.Context;
import io.opentelemetry.extension.aws.AwsXrayPropagator; // from the opentelemetry-extension-aws artifact

import org.springframework.web.filter.OncePerRequestFilter;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class XrayTraceIdHeaderFilter extends OncePerRequestFilter {

  @Override
  protected void doFilterInternal(
      HttpServletRequest request, HttpServletResponse response, FilterChain chain)
      throws ServletException, IOException {
    // Write the current trace context into the response as an X-Amzn-Trace-Id header.
    AwsXrayPropagator.getInstance()
        .inject(Context.current(), response, HttpServletResponse::setHeader);
    chain.doFilter(request, response);
  }
}
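For completeness, such a filter is typically registered as a Spring bean; a minimal sketch (the configuration class name is illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class XrayFilterConfig {

  // Registering the filter as a bean is enough for Spring Boot to apply it;
  // a FilterRegistrationBean could be used instead to control its order.
  @Bean
  public XrayTraceIdHeaderFilter xrayTraceIdHeaderFilter() {
    return new XrayTraceIdHeaderFilter();
  }
}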
Is there something else I have to do in order to get this working again?
Additional notes:
After the Soak Tests completed, a performance degradation was revealed for commit c258373 of the refs/heads/main
branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
During Soak Tests execution, a performance degradation was revealed for commit e7dc6a7 of the refs/heads/main
branch for the (spark, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
Describe the bug
The trace_id, span_id and trace_flags are not logged with Log4j2 + Spring Boot, either:
with the javaagent provided by this repository (1.6.0), or
with the javaagent from open-telemetry (1.6.2).
Steps to reproduce
Here is a repository that reproduces the described issue.
Note: there are no tests in the repository, but the application logs show the faulty behaviour.
What did you expect to see?
What did you see instead?
Additional context
The same application works when using the native javaagent from the open-telemetry org.
gradle-q-dependencis.txt
Is your feature request related to a problem? Please describe.
So actually I have a question: is SQS fully supported by the agent?
Currently I have a queue between my services, and when using the Java agent the service map represents only one SQS queue, without any in- or outflow from it to my actual services, while in the Pet Clinic sample app and other places we have a nice representation that shows the queue exactly between the two services pushing to or receiving messages from it.
Describe the solution you'd like
Support for SQS, with the same visualization and representation of SQS as in the samples.
Thanks in advance
Are there plans to upgrade to OpenTelemetry agent v0.10.1 anytime soon? I need it to fix open-telemetry/opentelemetry-java#2052. Thank you!
During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main
branch for the (spark, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
Hello, is anyone else seeing this error with AWS OTel Java agent 1.7 (aws-opentelemetry-agent.jar)? It does not appear with Java agent 1.6.
Stacktrace below:
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.set(ArrayList.java:473)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpClientRequest.headersToList(ApacheHttpClientRequest.java:51)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpClientRequest.getHeader(ApacheHttpClientRequest.java:41)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientHttpAttributesExtractor.requestHeader(ApacheHttpAsyncClientHttpAttributesExtractor.java:31)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientHttpAttributesExtractor.requestHeader(ApacheHttpAsyncClientHttpAttributesExtractor.java:16)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpCommonAttributesExtractor.userAgent(HttpCommonAttributesExtractor.java:92)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpCommonAttributesExtractor.onStart(HttpCommonAttributesExtractor.java:36)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpClientAttributesExtractor.onStart(HttpClientAttributesExtractor.java:44)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.Instrumenter.start(Instrumenter.java:147)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.ClientInstrumenter.start(ClientInstrumenter.java:26)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientInstrumentation$DelegatingRequestProducer.generateRequest(ApacheHttpAsyncClientInstrumentation.java:109)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.start(DefaultClientExchangeHandlerImpl.java:123)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:141)
at com.livevox.proxy.core.GenericRequestProcessor.forward(GenericRequestProcessor.java:145)
at com.livevox.proxy.core.GenericRequestProcessor.run(GenericRequestProcessor.java:90)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
During Soak Tests execution, a performance degradation was revealed for commit e7dc6a7 of the refs/heads/main
branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
Describe the bug
We are using EKS with several namespaces. The problem is that the services are not being recognized correctly, and I think the proper attributes are not being added to the traces correctly.
I'm using the ADOT Java agent (1.1.0) + the ADOT agent as a DaemonSet (v0.11.0) + the ADOT collector (v0.11.0).
Firstly, while all of the services have different names and service.name attributes, they are not being represented; only the connections from those services are, for example the connection from the service to S3 or DocumentDB, like below:
client ---> S3
while the expected behaviour is: client ---> my service (pod) name ---> S3
Secondly, I expected some extra information and attributes from the ADOT Java agent or ADOT collector, which does not seem to be working. For instance, since the pods were being treated as EC2 instances, I tried adding k8s.cluster.name=the-cluster-name manually to the Java agent. This led to some changes in the representation, as follows:
Without setting the k8s cluster attribute manually:
After adding the attribute: it just seems that the representation icon changed, and it became doubled, as I only added this attribute to one of the namespaces this image was running in; the other namespaces are still without the attribute and represented as a single EC2 instance (the same image running in different namespaces).
I thought maybe it was because I'm running the collector as a DaemonSet, not a sidecar, so I tested it with a sidecar, but that didn't improve the situation. I would be glad to hear your thoughts on these issues while I'm trying to solve them.
I created this issue so it also works as documentation; maybe I'm missing some parts, or there are some required attributes that must be set manually when using multiple namespaces on EKS. In any case, I will try to keep my open issues as up to date as possible with my latest findings.
Thank you in advance for your help.
Describe the bug
We updated the AWS-flavoured agent from 1.4.1 to 1.5.0 and started to see exceptions in the log like this:
Caused by: java.lang.NoSuchMethodError: 'io.opentelemetry.sdk.resources.Resource io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.getResource()'
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.premain(AwsAgentBootstrap.java:24)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.agentmain(AwsAgentBootstrap.java:28)
at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:51)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:40)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Additional context
We also updated the OpenTelemetry BOM version to 1.5.0 as suggested, but we're only using the AWS extensions for injecting the trace ID into our HTTP responses.
Any ideas what's going on?
Our application works great with AWS agent 1.4.1 and OpenTelemetry 1.4.0.
After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main
branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing
workflow run on GitHub to view the threshold violation.
Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:
The threshold violation should also be noticeable on our graph of Soak Test average results per commit.
Is it possible for the AWS javaagent to capture AWS-specific resource attributes like region, availability zone, ECS ARN, container memory and CPU limits, etc., as per the OTel cloud semantic conventions? At the moment we're using the OTel javaagent and AOP to inject these attributes as span attributes, but they are truly resource attributes and should be captured as such.
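For reference, the SDK autoconfiguration exposes an SPI for contributing resource attributes, which is roughly what such support would plug into. A minimal sketch; the class is illustrative, and the values are placeholders that a real provider would read from the EC2/ECS metadata endpoints:

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.sdk.autoconfigure.spi.ConfigProperties;
import io.opentelemetry.sdk.autoconfigure.spi.ResourceProvider;
import io.opentelemetry.sdk.resources.Resource;

public final class EcsResourceProvider implements ResourceProvider {

  @Override
  public Resource createResource(ConfigProperties config) {
    // Semantic-convention attribute keys; values below are placeholders.
    return Resource.create(
        Attributes.builder()
            .put("cloud.provider", "aws")
            .put("cloud.region", "eu-west-1") // placeholder
            .put("cloud.availability_zone", "eu-west-1a") // placeholder
            .put("aws.ecs.task.arn", "arn:aws:ecs:...") // placeholder
            .build());
  }
}

Registered via a META-INF/services/io.opentelemetry.sdk.autoconfigure.spi.ResourceProvider file, the autoconfiguration merges the result into the SDK resource, so the attributes ride along with every exported span rather than being set per span via AOP.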
We updated to the latest release (0.17) of aws-otel-java-instrumentation (from 0.15 previously), and now we see exceptions on startup and traces are not reported.
ERROR io.opentelemetry.javaagent.OpenTelemetryAgent
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:64)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.agentmain(AwsAgentBootstrap.java:28)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.premain(AwsAgentBootstrap.java:24)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.startAgent(AgentInitializer.java:44)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:30)
... 13 more
Caused by: java.lang.ExceptionInInitializerError
at io.opentelemetry.sdk.extension.aws.resource.Ec2ResourceProvider.createResource(Ec2ResourceProvider.java:16)
at io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.buildResource(OpenTelemetrySdkAutoConfiguration.java:77)
at io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.<clinit>(OpenTelemetrySdkAutoConfiguration.java:25)
at io.opentelemetry.javaagent.tooling.OpenTelemetryInstaller.installAgentTracer(OpenTelemetryInstaller.java:36)
at io.opentelemetry.javaagent.tooling.OpenTelemetryInstaller.beforeByteBuddyAgent(OpenTelemetryInstaller.java:27)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installComponentsBeforeByteBuddy(AgentInstaller.java:168)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:102)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:86)
... 19 more
Caused by: java.lang.NullPointerException
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.buildResource(Ec2Resource.java:83)
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.buildResource(Ec2Resource.java:49)
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.<clinit>(Ec2Resource.java:31)
... 27 more
Additional info:
We deploy to an EC2-backed ECS cluster, with the AWS OTel collector running as a sidecar to each service.
Describe the bug
When trying to add a span event to a span using the Span.current().addEvent() method in OpenTelemetry, the event is not visible in AWS X-Ray. We can see our custom spans but none of the events.
Steps to reproduce
We are using a Java Spring Boot application with spring-boot-starter-parent version 2.4.5 and the opentelemetry-otlp-exporter-starter 1.4.1-alpha dependency.
We created a Docker image for our application that adds the aws-opentelemetry-agent.jar (v1.4.1) into the image at build time, and run it as a javaagent when the image is deployed. We deploy the application to ECS Fargate and run the aws-otel-collector as a sidecar with the default configuration. A sketch of the kind of call involved follows.
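For context, adding a span event through the OpenTelemetry API looks roughly like this (the event name and attributes are illustrative, not taken from the report):

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;

public final class SpanEventExample {

  public static void recordCheckpoint() {
    // Attaches an event to whatever span is current
    // (e.g. the server span created by the agent).
    Span.current().addEvent(
        "order.validated", // illustrative event name
        Attributes.of(AttributeKey.stringKey("order.id"), "12345"));
  }
}

X-Ray has no first-class equivalent of span events, so whether they surface as annotations or metadata depends on the exporter's mapping, which may be why they don't appear.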
What did you expect to see?
We expected to see our span events either as annotations or metadata in our spans.
What did you see instead?
We just saw our custom spans without any Span Events.
Additional context
We are also experiencing the same behaviour when we use Elastic as the backend for observability. This is due to this issue
Describe the bug
Instrumenting the Java agent on an application interacting with S3 via the aws-java-sdk-s3 library causes a SignatureDoesNotMatch error.
Steps to reproduce
Instrument an application interacting with S3 using the aws-java-sdk-s3 library:
com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch;
Version of Java agent
0.17.0-aws.1
Describe the bug
We are tracing our Spring Boot / Spring Cloud Gateway application with the AWS auto-instrumentation agent. After changing the release from v1.1.0 to v1.4.0, the traceId shows empty, and the AWS-XRAY-TRACE-ID key has disappeared from the MDC context map.
Steps to reproduce
java -Dspring.profiles.active=local -javaagent:/aotel/aws-opentelemetry-agent-1.4.0.jar one.jar
"one.jar" is a SpringBoot application, configuring by logback pattern:
%d{yyyy-MM-dd HH:mm:ss.SSS}\t[%X{AWS-XRAY-TRACE-ID}] - %msg%n
What did you expect to see?
Before the upgrade, using agent 1.1.0:
java -Dspring.profiles.active=local -javaagent:/aotel/aws-opentelemetry-agent-1.1.0.jar one.jar
the log looks like this:
2021-07-29 18:50:09.618 [1-610287e1-4bad020c021ab9b074d04217@a01edea91aca582c] origin request body: ...
What did you see instead?
After upgrading to agent 1.2.0 or 1.4.0, the logging looks like this:
2021-07-29 18:50:09.618 [] origin request body: ...
There is no AWS-XRAY-TRACE-ID in the MDC context when inspected in debug mode:
"mdc": { "appVersion": "", "interface io.opentelemetry.javaagent.shaded.io.opentelemetry.api.trace.Span": "{opentelemetry-trace-span-key=RecordEventsReadableSpan{traceId=610399e3815dc59e58765f785125a19c, spanId=52817ef4208de551, parentSpanContext=ImmutableSpanContext{traceId=610399e3815dc59e58765f785125a19c, spanId=0877b939030ee172, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=FilteringWebHandler.handle, kind=INTERNAL, attributes=AttributesMap{data={thread.name=reactor-http-nio-4, thread.id=43}, capacity=128, totalAddedValues=2}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1627625955919426700, endEpochNanos=0}, opentelemetry-traces-server-span-key=RecordEventsReadableSpan{traceId=610399e3815dc59e58765f785125a19c, spanId=0877b939030ee172, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, name=HTTP POST, kind=SERVER, attributes=AttributesMap{data={net.peer.port=51264, schoolCode=415, http.method=POST, clientType=, http.user_agent=PostmanRuntime/7.28.2, platformVersion=, http.url=http://localhost:8080/v1/aggregate/user/info, platform=, studentId=7592518, http.client_ip=0:0:0:0:0:0:0:1, net.peer.ip=0:0:0:0:0:0:0:1, thread.name=reactor-http-nio-4, deviceName=, appVersion=, thread.id=43, http.flavor=1.1, appName=}, capacity=128, totalAddedValues=17}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1627625955919001800, endEpochNanos=0}}", "appName": "", "requestMethod": "POST", "reactor.onDiscard.local": "reactor.core.publisher.Operators$$Lambda$1192/593986789@222b5946", "requestUri": "/v1/aggregate/user/info", "deviceName": "", "platform": "", "studentId": "7592518", "hostname": "ZT-081201", "clientType": "", "platformVersion": "", "schoolCode": "415", "remoteAddr": "0:0:0:0:0:0:0:1", "trace_id": "610399e3815dc59e58765f785125a19c", "trace_flags": "01", "span_id": "0877b939030ee172" },
I don't know whether the usage has changed or there is a bug here; any suggestions or code would be appreciated.
Is your feature request related to a problem? Please describe.
We run the aws-otel-java-instrumentation agent to export telemetry data to a collector that runs as an ECS daemon service on an EC2 host in an ECS cluster. During a deployment, the collector daemon becomes temporarily unavailable, as it needs to stop the existing task in order to run the new version of the collector task. During this time, telemetry data might be lost when trying to export it to the collector.
Is there a way to configure the agent so that it can retry until the collector becomes available? If not, is there a workaround? Also note that we're using the otlp exporter.
Describe the solution you'd like
The agent should be configurable in a way that allows retrying telemetry export during transient unavailability of the collector.