
staticpgo_example

This project demonstrates how to collect a static profile (PGO, aka Profile-Guided Optimization) for a simple console app in order to make it faster. The profile describes the typical behavior of an app: which parts of methods are hot or cold, the actual types of objects hidden under abstractions, etc. It can be collected dynamically via tiered compilation, or statically, where we build a special version of the app (an "Instrumented Build"), run it, simulate typical workloads, save the resulting profile to a file, and then re-use it in production. Both approaches have pros and cons.

NOTE: The workflow to collect static profiles is not final yet and can be improved/simplified in future daily builds.

What exactly can PGO optimize for us?

  • JIT is able to devirtualize virtual and interface calls (guarded devirtualization):
void DisposeMe(IDisposable d)
{
    d.Dispose();
}

    is optimized into:

void DisposeMe(IDisposable d)
{
    if (d is MyType)           // E.g. Profile states that Dispose here is mostly called on MyType.
        ((MyType)d).Dispose(); // It can be inlined now (e.g. to no-op if MyType::Dispose() is empty)
    else
        d.Dispose();           // a cold fallback, just in case
}

(image: codegen diff for the case where MyType::Dispose is empty)

  • JIT re-orders blocks to keep hot ones closer to each other and pushes cold ones to the end of the method.
void DoWork(int a)
{
    if (a > 0)
        DoWork1();
    else
        DoWork2();
}

    is transformed into:

void DoWork(int a)
{
    // E.g. Profile states that DoWork1 branch was never (or rarely) taken
    if (a <= 0)
        DoWork2();
    else
        DoWork1();
}
  • Some optimizations, such as Loop Cloning, inlined casts, etc., aren't applied in cold blocks
  • Guided AOT: We can prejit only the code that was executed during the test run. It should noticeably reduce the binary size of R2R'd images, as cold methods won't be prejitted at all. For that, pass the --partial flag to crossgen2 along with the actual MIBC data.
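As a sketch, such a Guided-AOT invocation might look like the following. The file names are placeholders, and the exact crossgen2 argument set (reference assemblies, target OS/arch, flag availability in a given daily build) depends on your setup; only --partial and the MIBC input come from the text above:

```
# Hypothetical invocation; MyApp.dll and pgo.mibc are placeholder names
crossgen2 MyApp.dll --out MyApp.r2r.dll --mibc pgo.mibc --partial
```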

DynamicPGO vs StaticPGO

As mentioned above, both approaches have pros and cons.

DynamicPGO

Pros:

  • Easy to use: you just need to set the following environment variables: DOTNET_TC_QuickJitForLoops=1, DOTNET_TieredPGO=1 and DOTNET_ReadyToRun=0
  • Collects actual profile live - you don't need to worry about "Is my static profile still relevant?" or "Does my static profile cover this specific scenario?"
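For reference, the three switches from the first bullet can be set in a POSIX shell like this (the PowerShell equivalents are shown in the performance results below):

```shell
# Enable dynamic PGO: instrument code, ignore prejitted (AOT) code,
# and don't bypass tier0 for methods with loops
export DOTNET_TieredPGO=1
export DOTNET_TC_QuickJitForLoops=1
export DOTNET_ReadyToRun=0
```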

Cons:

  • Noticeably slower start: for better results we need to turn off all the prejitted (AOT) code, and we emit a lot of additional block counters and class probes in tier0.
  • We don't support context-sensitive PGO or de-optimizations yet, so we bake profile data into methods after just 30 calls, and that data will stay there forever for all possible callsites; for some of them it might be less relevant.
  • DOTNET_TC_QuickJitForLoops=1 sometimes leads to performance issues known as "Cold loop - hot body" and requires OSR support in the JIT, which is not finished yet.

StaticPGO

Pros

  • Doesn't affect startup time (it may even improve it)
  • Can be used for Guided-AOT where we prejit only the code that was invoked during the test run. It makes AOT images smaller.
  • Since methods are never promoted to tier1 during the test run, it's able to avoid the "context-sensitive" issue by collecting all possible scenarios for a specific method.

Cons

  • Difficult to set up: it requires special steps to create an instrumented build and simulate typical workloads
  • The profile has to be re-collected whenever the code changes
  • Currently it requires Composite-R2R mode (with compilebubblegenerics) for better results.
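A sketch of what the Composite-R2R requirement might look like at publish time; treat PublishReadyToRunComposite (the SDK property for composite mode) and its interaction with PgoData in the daily builds as assumptions:

```
# Illustrative only: combine composite R2R with the previously collected profile
dotnet publish -c Release -r win-x64 /p:PublishReadyToRun=true /p:PublishReadyToRunComposite=true /p:PgoData=pgo.mibc
```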

Prerequisites

  • The latest daily build of .NET 6.0 from here (should be at least 7/25/2021)
  • The dotnet-pgo tool: dotnet tool install --global dotnet-pgo --version "6.0.0-rc.1.21375.2". See dotnet-pgo.md

How to run the sample

First, we need to build a special version of our sample and run it in order to collect a profile:

dotnet publish -c Release -r win-x64 /p:CollectMibc=true # or linux-x64, osx-arm64, etc..

The console app has a special MSBuild task to do that job. Basically, it runs a fully instrumented build, collects traces, and converts them to a special format (*.mibc) that we can use to optimize our app. Now we can re-publish the app using the PGO data we collected previously:

dotnet publish -c Release -r win-x64 /p:PgoData=pgo.mibc

Let's compare performance for StaticPGO, DynamicPGO and Default modes:

Performance results

  1. Normal run dotnet run -c Release:
Running...
[0/9]: 57 ms.
[1/9]: 56 ms.
[2/9]: 56 ms.
[3/9]: 54 ms.
[4/9]: 54 ms.
[5/9]: 54 ms.
[6/9]: 54 ms.
[7/9]: 54 ms.
[8/9]: 54 ms.
[9/9]: 54 ms.
  2. Run with static PGO (steps from the How to run the sample section above):
Running...
[0/9]: 19 ms.
[1/9]: 19 ms.
[2/9]: 19 ms.
[3/9]: 19 ms.
[4/9]: 19 ms.
[5/9]: 19 ms.
[6/9]: 18 ms.
[7/9]: 18 ms.
[8/9]: 18 ms.
[9/9]: 18 ms.
  3. Run with dynamic PGO (steps from How to run the sample aren't needed; just set the following environment variables in your console):
$env:DOTNET_ReadyToRun=0           # ignore AOT code
$env:DOTNET_TieredPGO=1            # enable dynamic pgo
$env:DOTNET_TC_QuickJitForLoops=1  # don't bypass tier0 for methods with loops
Running...
[0/9]: 164 ms.
[1/9]: 175 ms.
[2/9]: 19 ms.
[3/9]: 18 ms.
[4/9]: 18 ms.
[5/9]: 18 ms.
[6/9]: 18 ms.
[7/9]: 18 ms.
[8/9]: 18 ms.
[9/9]: 18 ms.

Notes

DynamicPGO is easy to use, but you pay for it with a slower start, because we need to disable all the prejitted code and re-compile everything in tier0 with instrumentation - edge counters and class probes. E.g. the following aspnet benchmark demonstrates the difference between Static and Dynamic PGOs:

(image: startup-time comparison of Static vs Dynamic PGO on an aspnet benchmark)

With the static one you only need to collect it in advance.
