Zig Version 0.12.0-dev.2059+42389cb9c Steps to

For a single const , we can omit the <code class="notr

`@addWithOverflow` is slower than `std.math.add` even though builtins should be preferred over stdlib functions about zig HOT 11 OPEN

notcancername commented on June 18, 2024

`@addWithOverflow` is slower than `std.math.add` even though builtins should be preferred over stdlib functions

from zig.

Comments (11)

mlugg commented on June 18, 2024 2

To be clear, I agree we should improve on this, it would be great to avoid those allocs. I was more just trying to explain the reason for the complexity to Vexu.

from zig.

mlugg commented on June 18, 2024 2

Ah yes, that makes sense. We could set up AstRlAnnotate to require result locations for destructures if there are ever multiple peers - that way we avoid the PTR issues. That might be a little tricky to detect in general, but it would be very easy to at least optimize the trivial case of directly destructuring the result of these builtins - that would be a few lines in AstRlAnnotate and a new result location strategy in AstGen. I'll take a look at this soon.

from zig.

IntegratedQuantum commented on June 18, 2024 1

Part of the difference seems to come from tuple destructuring, which seems to add an extra write to the stack.
But there seems to be an even bigger issue here:
@addWithOverflow seems to generate a lot more code than the equivalent C function (godbolt).
Why is the zig version even writing to the stack? Why is it using so many extra instructions?

from zig.

notcancername commented on June 18, 2024

Part of the difference seems to come from tuple destructuring, which seems to add an extra write to the stack.

Yes, I added the NoDestructure variants to verify this.

Why is the zig version even writing to the stack? Why is it using so many extra instructions?

I blame tuples.

from zig.

Vexu commented on June 18, 2024

@addWithOverflow seems to generate a lot more code than the equivalent C function (godbolt).

Because at -O3 the C version doesn't check the result:

define dso_local i64 @addInC(i64 noundef %a, i64 noundef %b) local_unnamed_addr {
entry:
  %retval.0 = tail call i64 @llvm.uadd.sat.i64(i64 %a, i64 %b)
  ret i64 %retval.0
}

from zig.

Vexu commented on June 18, 2024

And the reason for the write is AstGen.assignDestructureMaybeDecls always generating an alloc for const variables.

I may be missing something but I don't see why destructuring into a const would ever require an alloc.

from zig.

mlugg commented on June 18, 2024

I may be missing something but I don't see why destructuring into a const would ever require an alloc.

Well, firstly, we need one if we want to use RLS, just like with a single const. However, unfortunately, we pretty much need to be consistent and use allocs for all destructures. The reason is that by the current semantics, there is no type T such that all destructurable types can coerce to T. Consider this:

var rt = true;
var a: struct { u64, u64 } = .{ 1, 2 };
var b: [2]u32 = .{ 1, 2 };
_ = .{ &rt, &a, &b };
const c, const d = if (rt) a else b;

If we didn't use allocs here, we would create a block which had breaks for both of these types, and they don't peer resolve. We could restrict the language to disallow this, but that would break the RLS case by default (allowing things that we've disallowed), so we'd need instructions to check this.

from zig.

Vexu commented on June 18, 2024

Oh right. But shouldn't it still be able to use nodes_need_rl?

from zig.

mlugg commented on June 18, 2024

For a single const, we can omit the alloc if we don't need to utilize RLS (based on nodes_need_rl). What I was getting at above is that trying to elide allocs for destructuring here would cause an actual difference in semantics, because of the PTR/coercion issue I described, where we would be trying to peer resolve potentially-incompatible types which might nonetheless be individually destructurable.

Also note that even if it did work (we adjusted the language spec in some way or implemented some weird ZIR instructions to coerce tuples how we want) we would have to deal with the weird scenario of some elements of a destructure having result locations when others don't.

from zig.

andrewrk commented on June 18, 2024

The whole point of switching @addWithOverflow to tuples was so that the backends could lower it more efficiently. It's pretty lame how it turned out.

from zig.

Vexu commented on June 18, 2024

Looking at it closer nodes_need_rl isn't enough but shouldn't combining that with a check for control flow that could result in PTR be enough to remove the allocs in most cases?

from zig.

`@addWithOverflow` is slower than `std.math.add` even though builtins should be preferred over stdlib functions about zig HOT 11 OPEN

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent