
Comments (9)

bvolpato avatar bvolpato commented on May 18, 2024 4

For those who are wondering, the Profiler does not get populated (and profile files are not saved on GCS, either) if you set both properties at the same time (APICurated=true and saveProfilesToGcs={path}).

I removed the saveProfilesToGcs and now profiler works fine for me.
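A guard against that combination could look roughly like the sketch below. It only encodes the observation reported in this comment, not documented SDK behavior, and `ProfilingOptionsCheck` and `hasReportedConflict` are hypothetical names that do not exist in the Dataflow or Beam SDK.

```java
import java.util.Map;

// Hypothetical helper; not part of the Dataflow or Beam SDK.
final class ProfilingOptionsCheck {

    /**
     * Returns true when both settings reported to conflict in this thread are present:
     * a non-empty saveProfilesToGcs path and APICurated=true in the agent configuration.
     */
    static boolean hasReportedConflict(String saveProfilesToGcs, Map<String, ?> agentConfig) {
        boolean gcsSet = saveProfilesToGcs != null && !saveProfilesToGcs.isEmpty();
        boolean apiCurated = agentConfig != null
                && Boolean.TRUE.equals(agentConfig.get("APICurated"));
        return gcsSet && apiCurated;
    }

    public static void main(String[] args) {
        // Example values only; the bucket name is made up.
        Map<String, Object> agent = Map.of("APICurated", true);
        if (hasReportedConflict("gs://example-bucket/profiler", agent)) {
            System.out.println("saveProfilesToGcs and APICurated=true are both set; "
                    + "consider dropping one of them.");
        }
    }
}
```

Calling the check just before launching the pipeline would surface the problem locally instead of after a job has run without producing profiles.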

from dataflowjavasdk.

bjchambers avatar bjchambers commented on May 18, 2024 3

We’re working on a better experience for profiling but there is rudimentary support for profiling available in the SDK.

What you’ll need

  1. An installation of pprof
  2. An installation of graphviz if you’d like to visualize profile information.

How to get profiles

  1. Run your pipeline specifying --saveProfilesToGcs=<gs://your_gcs_bucket>. This will write profiles to the given GCS bucket.
  2. Retrieve the profiles from GCS using gsutil -m cp -r <gs://your_gcs_bucket> <local_dir>.
  3. View the profiles using pprof. Run pprof <local_dir>/*cpu*.gz for CPU profiles (or *wall*.gz for wall-time profiles). From here you can run graphviz to render a calltree, or text or tree for text-based reports. See the pprof docs and pprof --help for more ways to interact with the profiles.

Hope that helps!

Notes and Caveats

  • The profiles will be 10 second samples from every 60 seconds of execution.
  • For a batch job the VM instances are normally torn down after the job completes, and the final trace may not get uploaded to GCS.
  • Multiple ParDo steps may execute together. When this happens, the call to output() in the first step will include the time to execute the later steps. As a result, the inclusive time for these steps will be inflated.
  • If you want the profiles to include information about JNI calls make sure to have any relevant binaries/object files in the directory you run pprof from.
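To get a feel for how much of a run the sampling caveat above actually covers, here is a rough back-of-the-envelope sketch. It assumes each 60-second period starts with its 10-second sample, which the notes do not state, and `ProfileSamplingEstimate` is a hypothetical name used only for illustration.

```java
// Hypothetical estimator; the sample placement within each period is an assumption.
final class ProfileSamplingEstimate {
    static final long SAMPLE_SECONDS = 10; // sample length, from the notes above
    static final long PERIOD_SECONDS = 60; // sampling period, from the notes above

    /** Estimates how many seconds of a run of the given length fall inside a sample. */
    static long estimatedSampledSeconds(long runSeconds) {
        if (runSeconds < 0) {
            throw new IllegalArgumentException("runSeconds must be non-negative");
        }
        long fullPeriods = runSeconds / PERIOD_SECONDS;
        long remainder = runSeconds % PERIOD_SECONDS;
        // Assumption: a partial trailing period contributes at most one sample.
        return fullPeriods * SAMPLE_SECONDS + Math.min(remainder, SAMPLE_SECONDS);
    }

    public static void main(String[] args) {
        long oneHour = 3600;
        System.out.println(estimatedSampledSeconds(oneHour)); // prints 600
    }
}
```

So roughly one sixth of the execution time is profiled, and short-lived work between samples may not show up at all.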


bfabry avatar bfabry commented on May 18, 2024

"If you want the profiles to include information about JNI calls make sure to have any relevant binaries/object files in the directory you run pprof from."

Is there any documentation on how to get the binaries used by dataflow to do this?

EDIT: i.e., I'm seeing a lot of this kind of output:

      flat  flat%   sum%        cum   cum%
885904.05s 93.27% 93.27% 885904.05s 93.27%  [libpthread-2.19.so]
 36963.81s  3.89% 97.16%  36963.81s  3.89%  GC
 11025.24s  1.16% 98.33%  11045.60s  1.16%  [libc-2.19.so]
  5444.52s  0.57% 98.90%   5444.52s  0.57%  Native
   488.93s 0.051% 98.95% 330068.45s 34.75%  [libjvm.so]
    60.95s 0.0064% 98.96% 897579.05s 94.50%  <unknown>

and I would like to understand what is being called inside libpthread.
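For reading a listing like this, a small sketch that splits one row into its columns may help: flat is the time attributed to the entry itself, cum also includes its callees. It assumes all values are in seconds, as in the rows above, and `PprofTopRow` is a hypothetical helper, not something pprof provides.

```java
// Hypothetical parser for one row of the text listing above; assumes values end in "s".
final class PprofTopRow {
    final double flatSeconds; // time attributed to the entry itself
    final double cumSeconds;  // time attributed to the entry plus its callees
    final String name;

    PprofTopRow(double flatSeconds, double cumSeconds, String name) {
        this.flatSeconds = flatSeconds;
        this.cumSeconds = cumSeconds;
        this.name = name;
    }

    /** Parses a row with the columns: flat, flat%, sum%, cum, cum%, name. */
    static PprofTopRow parse(String line) {
        String[] cols = line.trim().split("\\s+", 6);
        if (cols.length != 6) {
            throw new IllegalArgumentException("unexpected row: " + line);
        }
        return new PprofTopRow(seconds(cols[0]), seconds(cols[3]), cols[5]);
    }

    private static double seconds(String value) {
        if (!value.endsWith("s")) {
            throw new IllegalArgumentException("expected a value in seconds: " + value);
        }
        return Double.parseDouble(value.substring(0, value.length() - 1));
    }

    /** Time spent in callees rather than in the entry itself. */
    double calleeSeconds() {
        return cumSeconds - flatSeconds;
    }

    public static void main(String[] args) {
        PprofTopRow row = parse("   488.93s 0.051% 98.95% 330068.45s 34.75%  [libjvm.so]");
        System.out.printf("%s: flat=%.2fs, in callees=%.2fs%n",
                row.name, row.flatSeconds, row.calleeSeconds());
    }
}
```

Read this way, the [libpthread-2.19.so] row has identical flat and cum values, so none of its time is attributed to resolved callees, while almost all of the [libjvm.so] time sits in what it calls.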

/cc @bjchambers


peay avatar peay commented on May 18, 2024

Is there similar support for Beam's Dataflow runner? (edit: never mind, just found DataflowProfilingOptions)


swegner avatar swegner commented on May 18, 2024

Yes, in Apache Beam profiling support is now enabled via --saveProfilesToGcs=<gs://...>, defined inside DataflowProfilingOptions.


bvolpato avatar bvolpato commented on May 18, 2024

I couldn't get it to work.

Even though I am sending:

  saveProfilesToGcs: gs://labs1-carol-internal/profiler
  profilingAgentConfiguration: {APICurated=true}

Through Java code:

        DataflowProfilingOptions profilingOptions = dataflowPipelineOptions.as(DataflowProfilingOptions.class);
        profilingOptions.setSaveProfilesToGcs("gs://" + PipelineHelper.getBucketName(bucket) + "/profiler");

        DataflowProfilingAgentConfiguration agent = new DataflowProfilingAgentConfiguration();
        agent.put("APICurated", true);
        profilingOptions.setProfilingAgentConfiguration(agent);

I don't get any files in the profiler, and this message is printed on Stackdriver:
Profiling Agent not found. Profiles will not be available from this worker.

Any ideas?


lukecwik avatar lukecwik commented on May 18, 2024

Which version of the SDK are you using?

Have you tried contacting Google Cloud support and sharing some job IDs with them?


bvolpato avatar bvolpato commented on May 18, 2024

@lukecwik I tried with both 2.13.0 and 2.14.0. Will try to contact their support, thanks!


bvolpato avatar bvolpato commented on May 18, 2024

Support could not help with this, and I still haven't found a way to get profiles.

On the other hand, I would like to mention that Profiles in Dataflow don't have a Service Level Agreement (SLA) since this is an experimental Alpha feature, and is not recommended for production use cases, as mentioned in [2].
[2] https://cloud.google.com/products/#product-launch-stages

