
go-perfguard's Introduction


perfguard

This tool is a work in progress. It's not fully production-ready yet, but you can try it out.

Overview

perfguard is a Go static analyzer with an emphasis on performance.

It supports two run modes:

  1. perfguard lint finds potential issues, works like traditional static analysis
  2. perfguard optimize uses CPU profiles to improve the analysis precision

perfguard key features:

  • Profile-guided analysis in perfguard optimize mode
  • Most found issues are auto-fixable with the --fix argument (quickfixes)
  • Easy to extend with custom rules (no recompilation needed)
  • Can analyze big projects* even if they have some compilation errors

(*) It doesn't try to load analysis targets into memory all at once.

Here are some examples of what it can do for you:

  • Remove redundant data copying or make it faster
  • Reduce the number of heap allocations
  • Suggest more optimized functions or types from stdlib
  • Recognize expensive operations in hot paths that can be lifted

Installation

Install a perfguard binary under your $(go env GOPATH)/bin:

$ go install -v github.com/quasilyte/go-perfguard/cmd/perfguard@latest

Using perfguard

It's recommended that you collect CPU profiles on realistic workloads.

For a short-lived CLI app that could be a full run. For a long-running app you may want to turn profiling on for a minute or more, then save the profile to a file.

Profiles that are obtained from benchmarks are not representative and may lead to suboptimal results.

Hot spots in the profile may appear in three main places:

  1. The standard Go library and the runtime. We can't apply fixes to that
  2. Your app's (or library's) own code
  3. Your code's dependencies (direct or indirect)

Optimizing your own code is straightforward. Run perfguard on the root of your project:

$ perfguard optimize --heatmap cpu.out ./...

This will only suggest fixes for category (2).

To optimize the code from (3), there are two options:

  1. Optimize the library itself
  2. Optimize the whole code base through an explicit vendor directory

The first option is preferable. You can use the same CPU profile to optimize the library: run perfguard on the library's source code root just like you did with your application.

The second option works for cases where you want to deploy an optimized binary but have no way to fix the dependencies using the first option. Follow these steps:

# Make dependencies easily available for perfguard.
$ go mod vendor
# Run the analysis over the vendor.
# We use --fix argument to immediately apply the suggested changes.
$ perfguard optimize --heatmap cpu.out --fix ./vendor/...
# Build the optimized binary.
$ go build -o bin/app ./cmd/myapp

Then you can revert the changes to the ./vendor or remove it if you're not using vendoring.

go-perfguard's People

Contributors

peakle, quasilyte


go-perfguard's Issues

Recognize inefficient zeroing

var zero = make([]byte, 1024 * 10)

func clear(b []byte) {
  copy(b, zero)
}

=>

func clear(b []byte) {
  for i := range b {
    b[i] = 0
  }
}

The compiler recognizes this loop form and inserts a memclrNoHeapPointers call there.

Suggest to omit []byte(str) conversion for %s format arguments

When doing []byte(str) in fmt arguments, the copy is made eagerly. This means an extra allocation plus memory copying.

If %s is used and []byte is passed as is, no copy is made, bytes are printed as a string to the result.

fmt.Sprintf("foo %s", string(b))
=>
fmt.Sprintf("foo %s", b)

Cover main use cases

Suppose there is an X service running in staging or production with profiling enabled.

go-perfguard should handle these cases:

  • Using a profile to optimize the app code (vendor/go mod imported packages remain unchanged)
  • Using a profile to optimize the library used in X (no app code is available, changes are applied to the library code)

Lift const-like calculations that allocate

An example:

s := strconv.FormatInt(int64(0xffffffff), 10)

This expression always allocates a new string "4294967295".

Replacing the call with that literal is not ideal, as it hurts readability.
But maybe we can do this calculation only once and then use a variable instead?

Or maybe something like this:

s := "4294967295" // folded strconv.FormatInt(int64(0xffffffff), 10)

In any case, it may be worthwhile in hot spots.
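One possible shape of that lift is computing the value once at package initialization (a sketch; formatHeader is a hypothetical caller):

```go
package main

import (
	"fmt"
	"strconv"
)

// Computed once at program startup instead of on every call.
var uint32MaxStr = strconv.FormatInt(int64(0xffffffff), 10)

// formatHeader previously called strconv.FormatInt on every invocation.
func formatHeader() string {
	return "max=" + uint32MaxStr
}

func main() {
	fmt.Println(formatHeader()) // max=4294967295
}
```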

Move allocations after early return checks, closer to the place they're needed

func countUniq(data []int) int {
	set := make(map[int]struct{}, len(data))
	if len(data) == 0 {
		return 0
	}
	for _, x := range data {
		set[x] = struct{}{}
	}
	return len(set)
}

=>

func countUniq(data []int) int {
	if len(data) == 0 {
		return 0
	}
	set := make(map[int]struct{}, len(data))
	for _, x := range data {
		set[x] = struct{}{}
	}
	return len(set)
}

net.IP comparison

xip.String() == yip.String()
=>
xip.Equal(yip) // preferable
or
bytes.Equal([]byte(xip), []byte(yip)) // quirk-by-quirk identical

For local map[T]bool, suggest map[T]struct{} for sets

There is a problem that this would require making several code changes instead of just one:

  1. Initialization of the map
  2. Usages of the map (both reads and writes)

It should be possible to start by reporting the suggestion without applying it, like a warning.
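For illustration, here is the full set of changes on a hypothetical hasDups helper (illustrative names, not from the codebase):

```go
package main

import "fmt"

func hasDupsBool(xs []int) bool {
	seen := map[int]bool{} // one extra byte per entry; the value is never false
	for _, x := range xs {
		if seen[x] {
			return true
		}
		seen[x] = true
	}
	return false
}

func hasDupsSet(xs []int) bool {
	seen := map[int]struct{}{} // zero-size values
	for _, x := range xs {
		if _, ok := seen[x]; ok { // reads change to comma-ok lookups
			return true
		}
		seen[x] = struct{}{} // writes change too
	}
	return false
}

func main() {
	fmt.Println(hasDupsBool([]int{1, 2, 1}), hasDupsSet([]int{1, 2, 3})) // true false
}
```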

Suggest more efficient forms of strings operations

strings.Count(s, ".") == 0
=>
!strings.Contains(s, ".")

And so on.

Same for bytes package.

bytes.Compare(b1, b2) == 0
=>
bytes.Equal(b1, b2)

const idLen = 8
strings.Count(s, "f") + strings.Count(s, "F") == idLen
=>
strings.EqualFold(s, "ffffffff")

strings.Contains(name, " ") || strings.Contains(name, "`")
strings.ContainsRune(name, ' ') || strings.ContainsRune(name, '`')
=>
strings.ContainsAny(name, " `")

Combine calls?

col.Name = strings.Trim(strings.Trim(field, "`[] "), `"`)
=> 
col.Name = strings.Trim(field, "`[] \"")

Add allocsmap index

It should be possible to know whether a given line did any significant allocations or not.

Since we're using only a CPU profile, we should rely on newobject, makeslice, and other allocation function calls in those places.

Other functions that can be interesting:

  • runtime.convTslice (and other conv functions)
  • runtime.growslice

In make+copy idiom, recognize bad patterns

// cap may break the optimization
dst = make([]T, len(src), len(src))
copy(dst, src)

Also, accessing the src slice through an expression like o.src can break the optimization.
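For contrast, a sketch of the shape the make+copy optimization does recognize (using []byte for concreteness; the compiler can then skip zeroing the freshly made slice):

```go
package main

import "fmt"

// cloneBytes uses the recognized make+copy form: no explicit cap
// argument, and src is a plain local variable rather than an o.src field.
func cloneBytes(src []byte) []byte {
	dst := make([]byte, len(src))
	copy(dst, src)
	return dst
}

func main() {
	fmt.Printf("%s\n", cloneBytes([]byte("abc"))) // abc
}
```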

reflect.Type comparison

reflect.TypeOf(x).String() == reflect.TypeOf(y).String()
reflect.TypeOf(x).String() != reflect.TypeOf(y).String()
=>
reflect.TypeOf(x) == reflect.TypeOf(y)
reflect.TypeOf(x) != reflect.TypeOf(y)

Suggest map clear loop idiom when map is reassigned and used later

This analysis can be func-local.

If we see a statement like parser.tab = map[T]K{} and that parser.tab is used below (in the same function), then it could be beneficial to clear the map instead of replacing it with a new map.

Same goes for the statements like:

m = make(map[T]K, len(m))

We can start with some simple ruleguard rules and then implement a proper analysis for this.

We should be careful here: a mapclear is not always better than a realloc.

Update: partially implemented by #98
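The clear-loop idiom in question, which the compiler lowers to a single map-clearing runtime call (a sketch):

```go
package main

import "fmt"

// clearMap empties m in place instead of replacing it with a new map,
// so the already-allocated buckets are reused.
func clearMap(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
}

func main() {
	tab := map[string]int{"a": 1, "b": 2}
	clearMap(tab) // instead of tab = map[string]int{}
	fmt.Println(len(tab)) // 0
}
```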

Add reflect.Value.MethodByName rules in opt_rules.go?

MethodByName uses a linear search, so it may be beneficial to cache the results,
or to avoid using MethodByName in hot paths at all, where possible.

If we add an o2 rule in opt_rules.go, then we can report usages of MethodByName in hot paths.

Lift allocated objects from the loop and reuse them

for _, x := range xs {
  obj := &object{x: x}
  f(obj)
}
// =>
var obj object
for _, x := range xs {
  obj = object{x: x}
  f(&obj)
}

But we need to know (somehow) that this object pointer is not retained inside f.

Another example of this would be:

for _, x := range xs {
  var buf bytes.Buffer
  buf.Write(x.a)
  buf.Write(x.b)
  f(buf.Bytes())
}
// =>
var buf bytes.Buffer
for _, x := range xs {
  buf.Reset()
  buf.Write(x.a)
  buf.Write(x.b)
  f(buf.Bytes())
}

Handle ptr-typed bytes.Buffer in stringsBuilder checker

Use case:

func f() string {
  buf := bytes.NewBuffer(make([]byte, 0, sizehint))
  // use buf...
  return buf.String()
}

// =>

func f() string {
  buf := strings.Builder{}
  buf.Grow(sizehint)
  // use buf...
  return buf.String()
}

Suggest Grow() for strings.Builder and bytes.Buffer

When the writes are unconditional and it's possible to determine their total length, we can produce a pretty good Grow size hint.

func test(s1, s2, s3 string) string {
	var buf strings.Builder
	if s1 != "" {
		buf.WriteString(s1)
	}
	buf.WriteString(s2)
	buf.WriteString(s3)
	return buf.String()
}

// =>

func test(s1, s2, s3 string) string {
	var buf strings.Builder
	buf.Grow(len(s2) + len(s3))
	if s1 != "" {
		buf.WriteString(s1)
	}
	buf.WriteString(s2)
	buf.WriteString(s3)
	return buf.String()
}

Change mapInc to mapOps

Since any <op>= operation avoids the double hashing, not just increment, we should suggest all of them.
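A sketch of the rewrite (incTwice and incOnce are illustrative names):

```go
package main

import "fmt"

// incTwice does a separate lookup and store: the key is hashed twice.
func incTwice(m map[string]int, k string) {
	v := m[k]
	m[k] = v + 2
}

// incOnce uses a compound assignment; any <op>= form qualifies, not only ++.
func incOnce(m map[string]int, k string) {
	m[k] += 2
}

func main() {
	counts := map[string]int{"a": 1}
	incTwice(counts, "a")
	incOnce(counts, "a")
	fmt.Println(counts["a"]) // 5
}
```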

Maybe suggest io.WriteString instead of w.Write([]byte(s))?

Needs investigation.
The constant overhead of calling Write via WriteString should be small enough, I think.
If w happens to have a WriteString method, it could save some time and remove the redundant data copying.

Maybe we can use the profiling info to see whether that call actually involved big data copying and suggest it only in these cases?
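The suggested form, for reference (io.WriteString checks for an io.StringWriter and falls back to Write otherwise; concat is an illustrative helper):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// concat writes strings without the w.Write([]byte(s)) conversion.
func concat(parts []string) string {
	var b strings.Builder
	for _, s := range parts {
		// io.WriteString dispatches to b.WriteString here,
		// so no []byte copy of s is made.
		io.WriteString(&b, s)
	}
	return b.String()
}

func main() {
	fmt.Println(concat([]string{"foo", "bar"})) // foobar
}
```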

Detect eager memory allocations

Sometimes people allocate slices before checking whether they actually need to.

func f(o *object, num int) []item {
  result := make([]item, 0, num)
  if o != nil {
    for _, v := range o.items[:num] {
      result = append(result, v)
    }
  }
  return result
}

// =>

func f(o *object, num int) []item {
  result := []item{} // To avoid returning a nil slice, that could change the API
  if o != nil {
    result = make([]item, 0, num)
    for _, v := range o.items[:num] {
      result = append(result, v)
    }
  }
  return result
}

Add slice prealloc when result size is known

func joinData(x, y []byte) (result []byte) {
  result = append(result, x...)
  result = append(result, y...)
  return result
}
=>
func joinData(x, y []byte) (result []byte) {
  result = make([]byte, 0, len(x) + len(y))
  result = append(result, x...)
  result = append(result, y...)
  return result
}

This is probably something that is easier to do in SSA form.
