Code Monkey home page Code Monkey logo

Comments (9)

0xnm avatar 0xnm commented on September 26, 2024 2

Hello @MikeFot!

Thanks for the details. I don't think it is somehow related to the background events, here we have a some non-successful status code returned by the Datadog intake, meaning something is wrong with the payload.

I've added a small change in the referenced PR above, which is now a part of 2.9.0 release, which will add response status code to the error message. Can you please try then version 2.9.0 of our SDK and tell us what is the status code reported, so that we can proceed with investigation?

from dd-sdk-android.

nicbell avatar nicbell commented on September 26, 2024 1

Thanks @0xnm we will try it out and report back.

from dd-sdk-android.

0xnm avatar 0xnm commented on September 26, 2024 1

I think here the issue is that your custom endpoint is replying with status code 200, while Datadog intake replies with status code 202 when request is accepted. In this case, there is a difference between the specification of your custom endpoint and Datadog intake.

Ideally, custom endpoint should act as a proxy and forward directly the communication with Datadog intake as-is (that is why it is advised to use the proxy configuration instead of custom endpoint).

I may suggest you to analyze the communication between your custom endpoint and Datadog intake for these failing requests, and if there is indeed a difference in status codes reported, then to adjust endpoint configuration or switch to the proxy setup if it works for you.

from dd-sdk-android.

0xnm avatar 0xnm commented on September 26, 2024

Hello @nicbell!

In order to better understand the issue, I have the following questions:

  • does it happen only on a particular device/OS version?
  • do you see such errors only for RUM, or, if you are using other features, you see such errors also for, say, Logs, etc.?
  • do you see any successful uploads on this device, or all the RUM uploads are like that?
  • do you see any other errors from Datadog SDK in Logcat?

from dd-sdk-android.

MikeFot avatar MikeFot commented on September 26, 2024

Hello! I can provide provide additional info on the issue as I am working with @nicbell:

General Info

  • This is happening across multiple devices and supported OS versions in our Production Environment
  • We can verify that the events are being tracked correctly by our other Analytics implementations and we can compare the numbers
  • This is mostly happening when the app is in the background e.g. there is a push notification rendered event that we can verify is tracked in another platform 8300 times but only 2300 times in Datadog, even though they are going through the same in-app funnel so we have verified they are submitted to the analytics integrations identically
    • We have also verified that 8300 is the expected number from other sources
  • Success rate for events showing up on the Datadog Dashboards is much higher when the app is in the foreground but there is still a percentage of missing events

We are looking at ~70% of events lost when the app is in the Background and possibly ~10% in the Foreground. Please note that these are Rough estimates as this is an ongoing investigation.

Additional Investigations Performed

  • As @nicbell mentioned, we have tried out 2.6.0, 2.7.0 and 2.8.0
  • Additionally reverted to SDK version 2.1.0 and noticed that background events were being tracked more reliably
    • There were still missing events in general and the aforementioned error is in the logs but not at the scale that we are observing in the tested versions
  • Downgraded our Work Manager to match the version packaged with the Datadog SDK but there was no improvement

Error Log

This is the only Datadog error we are seeing in the logs, and we have tried with all log levels, and the additional sdk log artifact enabled just in case it provided more info. The error is logged multiple times per session.

[_dd.sdk_core.default]: Batch 238f8a16-f0ea-4eef-a7b0-43bf873d65fb [18036 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.

Example of this occurring consecutively within 1 second with timestamps:

11:51:32.467  E  [_dd.sdk_core.default]: Batch 5da558ea-607e-4191-bbf1-6b0b83b67812 [132314 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.
11:51:40.923  E  [_dd.sdk_core.default]: Batch 42bb555e-24ef-4982-b2c9-50c6eb836bf4 [39492 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.
11:51:48.583  E  [_dd.sdk_core.default]: Batch 07a75b5c-5418-42b3-a1eb-feab9648f5d8 [136818 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.
11:51:55.512  E  [_dd.sdk_core.default]: Batch 3671f32d-2b38-4103-bb09-22a41ae758d4 [415 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.
11:52:08.268  E  [_dd.sdk_core.default]: Batch f22c39ac-8c79-4088-a257-78f4be337672 [653 bytes] (RUM Request) failed because of an unknown error; the batch was dropped.

Work Manager

We can occasionally see the Work Manager succeeding. Have not seen any erroneous Worker results in the logs.

Worker result SUCCESS for Work [ id=4df6f445-729c-44c4-9b52-c7c9b6725bad, tags={ com.datadog.android.core.internal.data.upload.UploadWorker, DatadogBackgroundUpload } ]

Artifacts Implemented

  • com.datadoghq:dd-sdk-android-okhttp
  • com.datadoghq:dd-sdk-android-rum
  • com.datadoghq:dd-sdk-android-trace

and tried out com.datadoghq:dd-sdk-android-logs as well for the sake of debugging the issue.

Additional log warnings

The only warnings we are seeing are listed below.

[INFO:CONSOLE(1)] "Unsatisfied version 5.12.0 from one-host of shared singleton module @datadog/browser-rum (required ^4.50.1)

and in some cases

[_dd.sdk_core.default]: A RUM event was detected, but no view is active. To track views automatically, try calling the RumConfiguration.Builder.useViewTrackingStrategy() method. You can also track views manually using the RumMonitor.startView() and RumMonitor.stopView() methods.

from dd-sdk-android.

MikeFot avatar MikeFot commented on September 26, 2024

Thank you very much for adding this log!

Ran the app with the new version today through several scenarios and I can see the status code 200 being reported.

09:33:55.652  E  [_dd.sdk_core.default]: Batch 24f97b16-7bc4-4a69-977c-0fc81202f918 [41625 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:34:06.887  E  [_dd.sdk_core.default]: Batch 58cb65fa-a72b-47b3-9f49-6218e8431f13 [100250 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:34:17.069  E  [_dd.sdk_core.default]: Batch 4dd34e0a-9353-4bb1-b091-dd469e4ea0b2 [59533 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:34:26.265  E  [_dd.sdk_core.default]: Batch 07047f3f-6c01-4eb9-948d-0e847e630337 [48981 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:35:15.274  E  [_dd.sdk_core.default]: Batch 9079f43f-2376-43c7-b5e5-9f17bb283e6c [30183 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:36:34.126  E  [_dd.sdk_core.default]: Batch 2fe7197d-1145-40b9-ad4a-edf335fc0b51 [156802 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:36:53.408  E  [_dd.sdk_core.default]: Batch 318530dc-b14a-4333-bf1f-5a1e2eca6002 [60887 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:37:02.593  E  [_dd.sdk_core.default]: Batch 765d632f-0d38-4da6-9a28-5d64f6516c4c [83955 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:37:10.905  E  [_dd.sdk_core.default]: Batch 7e9f77d8-6785-49c5-84e0-2545bb668fc5 [83834 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:37:18.508  E  [_dd.sdk_core.default]: Batch af856a59-410b-4f7a-88af-5ae13df0e71b [90671 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:37:32.533  E  [_dd.sdk_core.default]: Batch 0a555ffd-85d4-4a5e-a7f4-a7961f55c039 [12886 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.
09:37:46.397  E  [_dd.sdk_core.default]: Batch 5c1db6d5-1be1-41f5-9575-4e76e4cd8f07 [415 bytes] (RUM Request) failed because of an unknown error (status code = 200); the batch was dropped.

from dd-sdk-android.

0xnm avatar 0xnm commented on September 26, 2024

@MikeFot Thanks a lot for a quick return. This log indeed makes things a bit more clear, I have additional questions now:

  • Are you using proxy/custom endpoint in Datadog SDK setup? Or events are sent directly to Datadog intake?
  • Which datacenter do you have configured? Is it US1, EU1, etc.?

from dd-sdk-android.

MikeFot avatar MikeFot commented on September 26, 2024

Hi!

  • RumConfiguration is set up with a custom endpoint
  • We are using DatadogSite.US1 in Configuration

Cheers!

from dd-sdk-android.

MikeFot avatar MikeFot commented on September 26, 2024

Thank you very much! We are going to investigate and let you know 🙂

from dd-sdk-android.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.