
go-perfguard's Introduction


perfguard

This tool is a work in progress. It's not fully production-ready yet, but you can try it out.

Overview

perfguard is a Go static analyzer with an emphasis on performance.

It supports two run modes:

  1. perfguard lint finds potential issues, works like traditional static analysis
  2. perfguard optimize uses CPU profiles to improve the analysis precision

perfguard key features:

  • Profile-guided analysis in perfguard optimize mode
  • Most found issues are auto-fixable with the --fix argument (quickfixes)
  • Easy to extend with custom rules (no recompilation needed)
  • Can analyze big projects* even if they have some compilation errors

(*) It doesn't try to load analysis targets into memory all at once.

Here are some examples of what it can do for you:

  • Remove redundant data copying or make it faster
  • Reduce the number of heap allocations
  • Suggest more optimized functions or types from stdlib
  • Recognize expensive operations in hot paths that can be lifted

Installation

Install a perfguard binary under your $(go env GOPATH)/bin:

$ go install -v github.com/quasilyte/go-perfguard/cmd/perfguard@latest

Using perfguard

It's recommended that you collect CPU profiles on realistic workloads.

For a short-lived CLI app that could be a full run. For a long-running app you may want to turn profiling on for a minute or more, then save the profile to a file.

Profiles that are obtained from benchmarks are not representative and may lead to suboptimal results.

Hot spots in the profile may appear in three main places:

  1. The standard Go library and the runtime. We can't apply fixes to that
  2. Your app's (or library's) own code
  3. Your code's dependencies (direct or indirect)

Optimizing your own code is straightforward. Run perfguard on the root of your project:

$ perfguard optimize --heatmap cpu.out ./...

This will only suggest fixes for category (2).

To optimize the code from (3), there are two options:

  1. Optimize the library itself
  2. Optimize the whole code base through an explicit vendor directory

The first option is preferable. You can use the same CPU profile to optimize the library: run perfguard on the library's source code root just like you did with your application.

The second option works for cases where you want to deploy an optimized binary but have no way to fix the dependencies using the first option. Follow these steps:

# Make dependencies easily available for perfguard.
$ go mod vendor
# Run the analysis over the vendor.
# We use --fix argument to immediately apply the suggested changes.
$ perfguard optimize --heatmap cpu.out --fix ./vendor/...
# Build the optimized binary.
$ go build -o bin/app ./cmd/myapp

Then you can revert the changes to the ./vendor or remove it if you're not using vendoring.

go-perfguard's People

Contributors

peakle, quasilyte


go-perfguard's Issues

Recognize inefficient zeroing

var zero = make([]byte, 1024 * 10)

func clear(b []byte) {
  copy(b, zero)
}

=>

func clear(b []byte) {
  for i := range b {
    b[i] = 0
  }
}

The compiler recognizes this loop form and inserts a memclrNoHeapPointers call there.

Suggest to omit []byte(str) conversion for %s format arguments

When doing []byte(str) in fmt arguments, the copy is made eagerly. This means an extra allocation plus memory copying.

If %s is used and []byte is passed as is, no copy is made, bytes are printed as a string to the result.

fmt.Sprintf("foo %s", string(b))
=>
fmt.Sprintf("foo %s", b)

Cover main use cases

Suppose there is an X service running in staging or production with profiling enabled.

go-perfguard should handle these cases:

  • Using a profile to optimize the app code (vendor/go mod imported packages remain unchanged)
  • Using a profile to optimize the library used in X (no app code is available, changes are applied to the library code)

Lift const-like calculations that allocate

An example:

s := strconv.FormatInt(int64(0xffffffff), 10)

This expression always allocates a new string "4294967295".

Replacing the call with that literal is not ideal, as it hurts readability.
But maybe we can do this calculation only once and then use a variable instead?

Or maybe something like this:

s := "4294967295" // folded strconv.FormatInt(int64(0xffffffff), 10)

In any case, it may be worthwhile in hot spots.
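One possible shape of that lift is computing the value once at package initialization (a sketch; formatHeader is a hypothetical caller):

```go
package main

import (
	"fmt"
	"strconv"
)

// Computed once at program startup instead of on every call.
var uint32MaxStr = strconv.FormatInt(int64(0xffffffff), 10)

// formatHeader previously called strconv.FormatInt on every invocation.
func formatHeader() string {
	return "max=" + uint32MaxStr
}

func main() {
	fmt.Println(formatHeader()) // max=4294967295
}
```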

Move allocations after early return checks, closer to the place they're needed

func countUniq(data []int) int {
	set := make(map[int]struct{}, len(data))
	if len(data) == 0 {
		return 0
	}
	for _, x := range data {
		set[x] = struct{}{}
	}
	return len(set)
}

=>

func countUniq(data []int) int {
	if len(data) == 0 {
		return 0
	}
	set := make(map[int]struct{}, len(data))
	for _, x := range data {
		set[x] = struct{}{}
	}
	return len(set)
}

net.IP comparison

xip.String() == yip.String()
=>
xip.Equal(yip) // preferable
or
bytes.Equal([]byte(xip), []byte(yip)) // quirk-by-quirk identical

For local map[T]bool, suggest map[T]struct{} for sets

There is a problem that this would require making several code changes instead of just one:

  1. Initialization of the map
  2. Usages of the map (both reads and writes)

It should be possible to start by reporting the suggestion without applying it, like a warning.
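For illustration, here is the full set of changes on a hypothetical hasDups helper (illustrative names, not from the codebase):

```go
package main

import "fmt"

func hasDupsBool(xs []int) bool {
	seen := map[int]bool{} // one extra byte per entry; the value is never false
	for _, x := range xs {
		if seen[x] {
			return true
		}
		seen[x] = true
	}
	return false
}

func hasDupsSet(xs []int) bool {
	seen := map[int]struct{}{} // zero-size values
	for _, x := range xs {
		if _, ok := seen[x]; ok { // reads change to comma-ok lookups
			return true
		}
		seen[x] = struct{}{} // writes change too
	}
	return false
}

func main() {
	fmt.Println(hasDupsBool([]int{1, 2, 1}), hasDupsSet([]int{1, 2, 3})) // true false
}
```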

Suggest more efficient forms of strings operations

strings.Count(s, ".") == 0
=>
!strings.Contains(s, ".")

And so on.

Same for bytes package.

bytes.Compare(b1, b2) == 0
=>
bytes.Equal(b1, b2)

const idLen = 8
strings.Count(s, "f") + strings.Count(s, "F") == idLen
=>
strings.EqualFold(s, "ffffffff")

strings.Contains(name, " ") || strings.Contains(name, "`")
strings.ContainsRune(name, ' ') || strings.ContainsRune(name, '`')
=>
strings.ContainsAny(name, " `")

Combine calls?

col.Name = strings.Trim(strings.Trim(field, "`[] "), `"`)
=> 
col.Name = strings.Trim(field, "`[] \"")

Add allocsmap index

It should be possible to know whether a given line did any significant allocations or not.

Since we're using only a CPU profile, we should rely on newobject, makeslice, and other allocation function calls in those places.

Other functions that can be interesting:

  • runtime.convTslice (and other conv functions)
  • runtime.growslice

In make+copy idiom, recognize bad patterns

// cap may break the optimization
dst = make([]T, len(src), len(src))
copy(dst, src)

Also, accessing the src slice through an expression like o.src can break the optimization.
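For contrast, a sketch of the shape the make+copy optimization does recognize (using []byte for concreteness; the compiler can then skip zeroing the freshly made slice):

```go
package main

import "fmt"

// cloneBytes uses the recognized make+copy form: no explicit cap
// argument, and src is a plain local variable rather than an o.src field.
func cloneBytes(src []byte) []byte {
	dst := make([]byte, len(src))
	copy(dst, src)
	return dst
}

func main() {
	fmt.Printf("%s\n", cloneBytes([]byte("abc"))) // abc
}
```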

reflect.Type comparison

reflect.TypeOf(x).String() == reflect.TypeOf(y).String()
reflect.TypeOf(x).String() != reflect.TypeOf(y).String()
=>
reflect.TypeOf(x) == reflect.TypeOf(y)
reflect.TypeOf(x) != reflect.TypeOf(y)

Suggest map clear loop idiom when map is reassigned and used later

This analysis can be func-local.

If we see a statement like parser.tab = map[T]K{} and that parser.tab is used below (in the same function), then it could be beneficial to clear the map instead of replacing it with a new map.

Same goes for the statements like:

m = make(map[T]K, len(m))

We can start with some simple ruleguard rules and then implement a proper analysis for this.

We should be careful here: a mapclear is not always better than a realloc.

Update: partially implemented by #98
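The clear-loop idiom in question, which the compiler lowers to a single map-clearing runtime call (a sketch):

```go
package main

import "fmt"

// clearMap empties m in place instead of replacing it with a new map,
// so the already-allocated buckets are reused.
func clearMap(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
}

func main() {
	tab := map[string]int{"a": 1, "b": 2}
	clearMap(tab) // instead of tab = map[string]int{}
	fmt.Println(len(tab)) // 0
}
```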

Add reflect.Value.MethodByName rules in opt_rules.go?

MethodByName uses a linear search, so it may be beneficial to cache the results,
or to avoid using MethodByName in hot paths at all, where possible.

If we add an o2 rule in opt_rules.go, then we can report usages of MethodByName in hot paths.

Lift allocated objects from the loop and reuse them

for _, x := range xs {
  obj := &object{x: x}
  f(obj)
}
// =>
var obj object
for _, x := range xs {
  obj = object{x: x}
  f(&obj)
}

But we need to know (somehow) that this object pointer is not retained inside f.

Another example of this would be:

for _, x := range xs {
  var buf bytes.Buffer
  buf.Write(x.a)
  buf.Write(x.b)
  f(buf.Bytes())
}
// =>
var buf bytes.Buffer
for _, x := range xs {
  buf.Reset()
  buf.Write(x.a)
  buf.Write(x.b)
  f(buf.Bytes())
}

Handle ptr-typed bytes.Buffer in stringsBuilder checker

Use case:

func f() string {
  buf := bytes.NewBuffer(make([]byte, 0, sizehint))
  // use buf...
  return buf.String()
}

// =>

func f() string {
  buf := strings.Builder{}
  buf.Grow(sizehint)
  // use buf...
  return buf.String()
}

Suggest Grow() for strings.Builder and bytes.Buffer

When the writes are unconditional and it's possible to determine their total length, we can produce a pretty good Grow size hint.

func test(s1, s2, s3 string) string {
	var buf strings.Builder
	if s1 != "" {
		buf.WriteString(s1)
	}
	buf.WriteString(s2)
	buf.WriteString(s3)
	return buf.String()
}

// =>

func test(s1, s2, s3 string) string {
	var buf strings.Builder
	buf.Grow(len(s2) + len(s3))
	if s1 != "" {
		buf.WriteString(s1)
	}
	buf.WriteString(s2)
	buf.WriteString(s3)
	return buf.String()
}

Change mapInc to mapOps

Since any <op>= operation avoids the double hashing, not just increment, we should suggest all of them.
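A sketch of the rewrite (incTwice and incOnce are illustrative names):

```go
package main

import "fmt"

// incTwice does a separate lookup and store: the key is hashed twice.
func incTwice(m map[string]int, k string) {
	v := m[k]
	m[k] = v + 2
}

// incOnce uses a compound assignment; any <op>= form qualifies, not only ++.
func incOnce(m map[string]int, k string) {
	m[k] += 2
}

func main() {
	counts := map[string]int{"a": 1}
	incTwice(counts, "a")
	incOnce(counts, "a")
	fmt.Println(counts["a"]) // 5
}
```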

Maybe suggest io.WriteString instead of w.Write([]byte(s))?

Needs investigation.
The constant overhead of calling Write via WriteString should be small enough, I think.
If w happens to have a WriteString method, it could save some time and remove the redundant data copying.

Maybe we can use the profiling info to see whether that call actually involved big data copying and suggest it only in these cases?
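The suggested form, for reference (io.WriteString checks for an io.StringWriter and falls back to Write otherwise; concat is an illustrative helper):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// concat writes strings without the w.Write([]byte(s)) conversion.
func concat(parts []string) string {
	var b strings.Builder
	for _, s := range parts {
		// io.WriteString dispatches to b.WriteString here,
		// so no []byte copy of s is made.
		io.WriteString(&b, s)
	}
	return b.String()
}

func main() {
	fmt.Println(concat([]string{"foo", "bar"})) // foobar
}
```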

Detect eager memory allocations

Sometimes people allocate slices before checking whether they actually need to.

func f(o *object, num int) []item {
  result := make([]item, 0, num)
  if o != nil {
    for _, v := range o.items[:num] {
      result = append(result, v)
    }
  }
  return result
}

// =>

func f(o *object, num int) []item {
  result := []item{} // To avoid returning a nil slice, that could change the API
  if o != nil {
    result = make([]item, 0, num)
    for _, v := range o.items[:num] {
      result = append(result, v)
    }
  }
  return result
}

Add slice prealloc when result size is known

func joinData(x, y []byte) (result []byte) {
  result = append(result, x...)
  result = append(result, y...)
  return result
}
=>
func joinData(x, y []byte) (result []byte) {
  result = make([]byte, 0, len(x) + len(y))
  result = append(result, x...)
  result = append(result, y...)
  return result
}

This is probably something that is easier to do in SSA form.
