sizebench's Introduction

Welcome to the SizeBench repo

This repository contains the SizeBench tool for doing size analysis of PE files (Portable Executables such as DLL, EXE, and SYS files) typically used on Windows. It's intended to help with questions like:

  • Why is this binary so big?
  • What can be done to make it smaller?

If you're a user of the tool

You can get help by opening SizeBench and going to Help > Show Help. Or, in this repo the usage docs are in the EndUserDocs folder.

For an introduction to this tool, see the announcement blog post.

Quick Start

Install SizeBench from the Microsoft Store.

Launch it from the start menu, then select Examine a Binary to pick a PE file and its symbols. Or select Start a diff to pick a "before" PE with its symbols and an "after" PE with its symbols.

On the command line, use sizebench.exe mybinary.exe mysymbols.pdb to analyze a single binary, or sizebench.exe ..\baseline\mybinary.exe ..\baseline\pdb\mybinary.pdb mybinary.exe .\pdb\mybinary.pdb to start a comparison session.

Contributing

We are excited to work alongside you, our amazing community!

BEFORE you start work on a feature/fix, please read & follow our Contributor's Guide to help avoid any wasted or duplicate effort.

Communicating with the Team

The easiest way to communicate with the team is via GitHub issues and GitHub discussions.

Please file new issues, feature requests and suggestions, but DO search for similar open/closed pre-existing issues before creating a new issue.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines.

Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Project Road Maps

The next expected work should be roughly this:

  1. Move to GitHub Actions for CI and PR pipelines.
  2. Create and publish NuGet package for the Analysis Engine.
  3. Support diffing of SourceFiles.
  4. Support diffing of Inline Sites.
  5. Improve support for binaries containing Rust code.

sizebench's People

Contributors

austin-lamb, jonwis, microsoftopensource

sizebench's Issues

Column sorting should be done using stable sort implementation

Describe the bug
I cannot compose column sorting in SizeBench, because it uses unstable sorting for sorting by column.

What you are doing

  1. Open a binary for inspection
  2. Select "Start Exploring By Compilands"
  3. Sort results by specific column, e.g. Size on Disk
  4. Sort results by different column, e.g. Lib Name

Expected behavior
I want the rows to be sorted by Lib Name and, within a single lib, by their size on disk.

What actually happens is that the rows are sorted by Lib Name and otherwise jumbled, so I can't go after the largest obj files in a given lib, which is what I want to do.

Environment Details

  • OS: [type ver at the Windows Command Prompt]: [Version 10.0.19044.1889]
  • SizeBench version number [Go to Help > About SizeBench]: 2.2204.1800.0 (git commit b5c2b2ea)

Additional context
While my original use case is specifically grouping by lib and then sorting by size on disk, using a stable sort when sorting by column would let users compose their own sort orders arbitrarily.
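To illustrate the request, here is a minimal C++ sketch of composing two column sorts with a stable sort. The CompilandRow type and the sortBySizeThenLib function are hypothetical stand-ins for the grid's rows and for two successive column clicks; they are not taken from the SizeBench source.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical row type standing in for one row of the compilands grid.
struct CompilandRow {
    std::string libName;
    std::string objName;
    unsigned long long sizeOnDisk;
};

// Mimics two successive column clicks: "Size on Disk" first, then "Lib Name".
inline void sortBySizeThenLib(std::vector<CompilandRow>& rows) {
    // First click: sort by size on disk, largest first.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const CompilandRow& a, const CompilandRow& b) {
                         return a.sizeOnDisk > b.sizeOnDisk;
                     });
    // Second click: sort by lib name. Because the sort is stable, rows that
    // share a lib name keep the size ordering established by the first click.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const CompilandRow& a, const CompilandRow& b) {
                         return a.libName < b.libName;
                     });
}
```

With an unstable sort such as std::sort in the second step, the relative order of rows within one lib is unspecified, which matches the jumbled result described above.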

Parallelize analysis to help with large binaries

When looking at large PDBs (like msedge.dll.pdb, chrome.dll.pdb), SizeBench can take hours on my heavy-duty dev box. There seems to be an opportunity for performance improvement by utilizing more cores.

For example, the analysis in the attached screenshot has been running for 15 hours.

MicrosoftDebuggingDataModelDbgModelApiXtnPath empty in Directory.build.targets

I did a build, and saw this warning:

  MSBUILD : warning MSB5029: The value "\**\*" of the "Include" attribute in element <ItemGroup> in file "C:\src\github\SizeBench\src\Directory.Build.targets (53,11)" is a wildcard that results in enumerating all files on the drive, which was likely not intended. Check that referenced properties are always defined.

Directory.build.targets has a line that does this:

    <None Include="$(MicrosoftDebuggingDataModelDbgModelApiXtnPath)\**\*" CopyToOutputDirectory="PreserveNewest" Visible="False" Link="%(RecursiveDir)%(FileName)%(Extension)"/>

which means that MicrosoftDebuggingDataModelDbgModelApiXtnPath must be empty.

Protecting it with a check ('$(MicrosoftDebuggingDataModelDbgModelApiXtnPath)' != '') fixed it for me:

  <!-- This is all the stuff necessary to use DbgX at runtime, it's needed in multiple projects, so it's centralized here -->
  <ItemGroup Condition="'$(IncludeDbgXAssets)'=='true' And '$(MicrosoftDebuggingDataModelDbgModelApiXtnPath)' != ''">
    <Content Include="..\ExternalDependencies\DIA\msdia140.dll">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
    <None Include="$(MicrosoftDebuggingDataModelDbgModelApiXtnPath)\**\*" CopyToOutputDirectory="PreserveNewest" Visible="False" Link="%(RecursiveDir)%(FileName)%(Extension)"/>
  </ItemGroup>

I'm not entirely sure why MicrosoftDebuggingDataModelDbgModelApiXtnPath isn't set, but this fixes it for me.

It looks like I don't have permissions to create a Pull Request.

Enum / class in different unnamed namespaces cause Error opening binary

I have found a bug in SizeBench (the version downloaded from the Microsoft Store; I did not try to compile a version from source). The bug happens when compiling an executable or DLL with VC2017 when one translation unit has an unnamed namespace containing a class with a virtual member, and another translation unit has an unnamed namespace containing an enum with the same name. The following is a minimal example. If you rename NameClash in either file to something different, the problem no longer occurs.

-------- virtualstruct.cpp --------

namespace
{
    struct NameClash
    {
        virtual ~NameClash() = default;
    };
    NameClash instance;
}

-------- main.cpp --------

namespace
{
    enum NameClash { FOO };
}

int main(int argc, char ** argv)
{
    NameClash t = NameClash(argc);
    if (t == FOO)
        return 1;
    else
        return 0;
}

When you compile this with Visual Studio 2017 and open the resulting executable with SizeBench, you get a dialog with the error message "Error opening binary! - There was an error opening this binary or PDB", followed by a link to a log and a call stack.

I have found out that this problem does not occur when compiling the same files with Visual Studio 2022.

I'll try to attach a screenshot, the content of the log file, and a zip file with the project file, which can be used with Visual Studio 2017 to compile and reproduce the bug.

sizebencherror

tmpAD61.tmp.sizebenchlog.txt

sizebenchbug.zip

SizeBench's version information needs to be copyable

Describe the bug
If I go to Help > About SizeBench, I can't copy the version. This makes it needlessly annoying to fill out bug reports against SizeBench.

Expected behavior
I can copy the version number just as I can copy the text in "Product details" field.

Environment Details

  • OS: [type ver at the Windows Command Prompt]: [Version 10.0.19044.1889]
  • SizeBench version number [Go to Help > About SizeBench]: 2.2204.1800.0 (git commit b5c2b2ea)

Infinite "Initializing debugging engine"

Describe the bug
When using features that require the debugging engine, SizeBench loops forever in "Initializing debugging engine" state.

What you are doing

  1. Download and install latest release from Windows Store.
  2. Run an analysis on a single binary. In my case this is C++ built with MSVC 2022 17.6.3, 50 MB exe and ~900 MB pdb.
  3. Select template foldability analysis
  4. In the results, try to drill into any record

It also reproduces with other features that require debugging engine.

Expected behavior
Either have everything necessary installed with SizeBench, or provide a meaningful error message with instructions for fixing the issue on the user's side.

Screenshots
image

Environment Details

  • OS: Windows 11 21H2 (22000.2057)
  • SizeBench version number 2.2302.2300.0 (git commit 91c1cfe)
  • Examined binary is built with MSVC 2022 17.6.3 and linked with Microsoft linker

Additional context
Application loops but doesn't hang, i.e. it responds to the input. I can use File and Help menus and open About SizeBench. Modal popup with "Initializing debugging engine" has a clickable X button but nothing happens on click. The longest trial is around 30 min with no progress. SizeBench.GUI.exe uses 0-1% of CPU. Memory consumption doesn't change.

SelectSingleBinaryAndPDBControlViewModel never gets an IBinaryLocator

Describe the bug
Using the "pick pdb" path, the code is supposed to hunt around to find a .dll nearby when the PDB path is chosen. This hunting does not happen.

What you are doing

  1. Start a new single-binary session
  2. Pick a PDB

Expected behavior
After picking a PDB with a DLL next to it, the "Binary path" field should auto-populate.

Environment Details

  • Windows 11
  • Version 1.0.0.0 git commit 2450ddc-dirty

Additional context
The call stack instantiating SelectSingleBinaryAndPDBControlViewModel is in the Windsor module. It ends up synthesizing a zero-length IBinaryLocator parameter to the constructor, causing InferBinaryPathFromPDBPathIfPossible to skip the mapping step.
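The failure mode can be modeled with a small C++17 sketch. Everything here is an assumption made for illustration: BinaryLocator, makeSiblingLocator, and inferBinaryPath are hypothetical names, and the real SizeBench code is C# and may work differently. The point is only that an empty locator list makes the inference silently return nothing.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical locator: maps a PDB path to a binary path if it can find one.
using BinaryLocator =
    std::function<std::optional<std::string>(const std::string& pdbPath)>;

// Builds a locator that swaps a trailing ".pdb" (lowercase only, for brevity)
// for common PE extensions and asks `fileExists` whether the candidate exists.
inline BinaryLocator makeSiblingLocator(
    std::function<bool(const std::string&)> fileExists) {
    return [fileExists](const std::string& pdbPath) -> std::optional<std::string> {
        const std::string suffix = ".pdb";
        if (pdbPath.size() < suffix.size() ||
            pdbPath.compare(pdbPath.size() - suffix.size(), suffix.size(), suffix) != 0) {
            return std::nullopt;  // Not a .pdb path, nothing to infer.
        }
        const std::string stem = pdbPath.substr(0, pdbPath.size() - suffix.size());
        for (const char* ext : {".dll", ".exe", ".sys"}) {
            const std::string candidate = stem + ext;
            if (fileExists(candidate)) return candidate;
        }
        return std::nullopt;
    };
}

// Tries each locator in order. With an empty list the loop body never runs,
// so the result is always empty, which mirrors the symptom reported above.
inline std::optional<std::string> inferBinaryPath(
    const std::string& pdbPath, const std::vector<BinaryLocator>& locators) {
    for (const auto& locate : locators) {
        if (auto found = locate(pdbPath)) return found;
    }
    return std::nullopt;
}
```

Passing the existence check in as a function keeps the sketch testable without touching the file system; a real locator would query the disk instead.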

Provide a "reference list" for a symbol

While tracking down mysterious binary bloat, I found a bunch of new .rdata in my DLL. They are all strings that look like symbols of code, but my binary has RTTI turned off.

It took a bit of digging:

  • Use the RVA from SizeBench
  • Use link -dump -disasm -all -bytes thedll.dll > myfile.asm
  • Search for the RVA in the asm file to get a symbolic name
  • Search for the symbol in the asm file to find references

So two features in one:

Show the RVA symbolic name if it exists

In the view for the item, show both the gross-name and the decoded name if possible.

Provide a clickable link of functions that reference the symbol

This might be very costly to produce, but it would have made my life very easy. If the RVA is referenced in code, provide a link to the function disassembly that uses it. Bonus points for scrolling the listing and highlighting HERE IT IS for easy finding.

Remove LLVM Linker soft lock. It's usable otherwise.

I took SizeBench and removed the "linkerCommandLine is not MSVC_LINK_CommandLine" check in DIAAdapter.cs, then tested one of my projects, which is linked with the LLVM linker.

It then hits the "The gap between COFF Groups ... - a gap this large has not been observed before so it may indicate a bug in SizeBench" check in BinarySection.cs. I'm not sure this is related to LLVM linker differences per se, but I removed that check as well.

Now SizeBench is perfectly usable for my project and provides helpful information.

I propose:

  • Drop the "MSVC Linker" hard requirement, possibly just warn instead.
  • If information that would be helpful for SizeBench is missing from the LLVM linker's output, file an issue on the LLVM project's GitHub.
  • Drop the InvalidOperation exception for large gaps between COFF groups, just warn instead.

"Wasteful virtuals" "virtuals with no overrides" has false positives with multiple levels of inheritance

Describe the bug
I was using the "wasteful virtuals" heuristic to try and optimize a large binary. The "virtuals with no overrides" list specifically is easy gains. However, one of the methods flagged does have overrides and removing the virtual keyword introduced a build break.

On closer inspection, the problem seems to be that there is at least one other derived class between the class declaring the virtual method and the class overriding it.

The hierarchy is roughly as follows:

class A
{
public:
   virtual void DoStuff() { }
};

class B : public A
{ };

class C : public B
{
public:
    void DoStuff() override { }
};

In this case A::DoStuff is considered a wasteful virtual even though C::DoStuff overrides it. Removing the virtual keyword will introduce a build break.

Expected behavior
This method should not be considered wasteful.
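A minimal C++ sketch of the check this implies: look for an override in every transitive descendant, not only in direct children. ClassInfo, Hierarchy, derivesFrom, and hasOverride are hypothetical names for illustration, the model assumes single inheritance with no cycles, and it does not reflect how SizeBench actually stores type information.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model of a class: one optional base and the set of
// method names declared directly in the class.
struct ClassInfo {
    std::string base;                       // Empty if the class has no base.
    std::set<std::string> declaredMethods;  // Methods declared in this class.
};

using Hierarchy = std::map<std::string, ClassInfo>;

// Returns true if `cls` derives from `ancestor` through any number of levels.
inline bool derivesFrom(const Hierarchy& h, const std::string& cls,
                        const std::string& ancestor) {
    auto it = h.find(cls);
    while (it != h.end() && !it->second.base.empty()) {
        if (it->second.base == ancestor) return true;
        it = h.find(it->second.base);  // Walk one level up the chain.
    }
    return false;
}

// Returns true if any transitive descendant of `cls` declares `method`.
inline bool hasOverride(const Hierarchy& h, const std::string& cls,
                        const std::string& method) {
    for (const auto& entry : h) {
        const bool declaresMethod = entry.second.declaredMethods.count(method) > 0;
        if (declaresMethod && derivesFrom(h, entry.first, cls)) return true;
    }
    return false;
}
```

For the A, B, C hierarchy above, a check limited to direct children of A sees only B, which declares nothing, while the transitive walk finds C::DoStuff.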

Screenshots
N/A

Environment Details
OS is Windows 11.
SizeBench is latest version from the Store at this time (on a different device so I don't have the exact version handy).

Additional context
N/A

Template Foldability diff view should ignore file paths

In this view, the first line difference is only in the source file - this DLL pulls in static libs that use the same cppwinrt version, but different runs of its projector, so it's got both lib1/obj/winrt/base.h and lib2/obj/winrt/base.h. The second difference is actually meaningful (comparing IIDs in queryinterface.)

I'd like to fix the diff view to either ignore the file-path difference in the view, or mask the "noncommon source base path" so only the real differences in code are shown.
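One possible way to mask the non-common base path, sketched in C++ under stated assumptions: keep only the trailing path components that two source paths share. splitPath and commonPathSuffix are hypothetical helpers for illustration and are not part of SizeBench.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a path on '/' or '\\' into its non-empty components.
inline std::vector<std::string> splitPath(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/' || c == '\\') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Returns the longest run of trailing components shared by both paths,
// joined with '/'. Returns an empty string if even the file names differ.
inline std::string commonPathSuffix(const std::string& a, const std::string& b) {
    const std::vector<std::string> pa = splitPath(a);
    const std::vector<std::string> pb = splitPath(b);
    std::size_t shared = 0;
    while (shared < pa.size() && shared < pb.size() &&
           pa[pa.size() - 1 - shared] == pb[pb.size() - 1 - shared]) {
        ++shared;
    }
    std::string result;
    for (std::size_t i = pa.size() - shared; i < pa.size(); ++i) {
        if (!result.empty()) result += '/';
        result += pa[i];
    }
    return result;
}
```

For the two paths mentioned above, this keeps obj/winrt/base.h and drops the lib1 and lib2 prefixes, so a diff over the masked paths would show only the real code differences.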

SizeBench Duplicate Data suggestions should vary with project language

Describe the bug
SizeBench duplicate data advice is inaccurate. Inside the duplicate data feature, SizeBench says that "Changing to const 'constexpr' or 'const' or 'extern __declspec(selectany) const' will save copies.". This is only partially true depending on which language the user's project is written in. In Chromium's case, it is written in C++ where both const and constexpr symbols defined inside classes are implied to be static. The only part of the suggestion which works in Chromium's case is '__declspec(selectany) const'.

Expected behavior
SizeBench should give advice based on which language the user's project is written in. At the very least, it should have a warning that the current suggestions aren't for C++. This would prevent users from potentially losing development time on C++ projects.

Screenshots
SizeBench suggestion after opening duplicate data feature
Text:
These chunks of data are marked as 'static const' or in some cases just 'const' and have ended up with multiple copies of their data in the binary. Typically you'll see one copy per translation unit referencing the symbol. Changing to 'extern __declspec(selectany) const' will save copies.
image

SizeBench suggestion after opening a symbol in duplicate data feature
Text:
This symbol is duplicated between multiple compilands - the most common cause of this is that it is marked 'static const' and would be better off marked as 'const' or 'extern __declspec(selectany) const'. With the way it is defined now, it is wasting space in the binary with the same data in multiple locations.
image

Environment Details

  • OS: Microsoft Windows [Version 10.0.19044.2130]
  • SizeBench version number: 1.0.0.0

Additional context
N/A

The incremental linking detection could be made more accurate

Describe the bug
SizeBench will fail to analyze a binary if it thinks incremental linking was used. Some sets of linker options will preclude incremental linking but SizeBench doesn't realize this. The result is that it will reject analysis of a binary that would otherwise be compatible.

This change to BinSkim encodes more of the incremental linking rules in code and could serve as a good reference: microsoft/binskim#667.

The official docs are here: https://docs.microsoft.com/en-us/cpp/build/reference/incremental-link-incrementally?view=msvc-170#remarks.

Text should be selectable and copyable in the disassembly view (e.g. template foldability diff)

Is your feature request related to a problem? Please describe.
The disassembly view in template foldability is useful for seeing how templates are very similar but slightly different. It has the file name and line number in the disassembly. However, the text cannot be selected and therefore cannot be copy/pasted to a text editor to view the source code.

Describe the solution you'd like
The text in this view should be selectable and support copying.
