We tried this a few years back with a very loaded Go server using SQLite (actual C SQLite, not modernc.org/sqlite) and something made performance tank... likely some difference between glibc and musl. We never had time to hunt it down. We talked to the Zig folks about sponsoring them to work on it, to make performance comparable to glibc-built SQLite, but they were busy at the time.
It was the intrinsics. Some time ago Zig "imported" compiler-rt and wrote pure-Zig variants that lack machine-level optimizations, a lot of the vectorization, and so on. SQLite performance is highly dependent on them.
I'm curious: do you have any specific examples that stress this?
I'd like to see how my Wasm build fares. I assume worse, but it'd be interesting to know by how much.
I've also heard that it's allocator performance under fragmentation, lock contention on the allocator, the mutex implementation itself, etc. Without something to measure, it's hard to know.
I can probably help but what are you looking for?
There are three areas of subject matter here (Go, Zig, SQLite).
Our affected deployment was an unusually large vertical-scale SQLite deployment. The part causing the most concern was hitting >48 fully active cores while struggling to maintain 16k IOPS: a large read workload you can think of as slowly slurping a whole set of tables, with a concurrent write workload updating mostly existing rows. Lots of JSON extension usage.
Something important to add here: Go code is largely, possibly entirely, unaffected by Zig's intrinsics, as it has its own symbols and implementations, but I didn't check whether the linker setup ended up doing anything else fancy/screwy to take over additional stuff.
Sorry for not replying sooner! Tbh, idk.
I just had a user come to me with a particular query that was much slower on my driver than on others, and it unexpectedly turned out to be the way I implemented context cancellation for long-running queries.
Fixing the issue led to a twofold improvement, which I wouldn't have figured out without something to measure.
Now obviously, I can't ask you for the source code of your huge production app. But if you could give hints that would help build a benchmark, that's the best way to improve.
I just wonder which compiler-rt builtins make such a huge difference (and where), and how they end up when going through Wasm and the wazero code generator.
Maybe I could start by compiling speedtest1 with zig cc and see if anything pops up.
So we aren't using mmap (but we are using WAL), so for one thing there's a lot of buffer and cache copy work going on - I'd expect the culprits are the old classics from string.h. Wasm is a high-level VM though, so I'd expect that (provided you pass the relevant flags) it should be using e.g. the VM's memory.copy instruction, which will bottom out in e.g. V8's memcpy.
We also don't use a common driver; ours is here: https://github.com/tailscale/sqlite
SQLite in general should be mostly optimized for the targets it's often built with, so for example it'll expect everything in string.h to be insanely well optimized, but it will also expect malloc to be atrocious and so it manages its own pools.
Musl's allocator has a reputation for being slow. Switching to an alternative allocator like jemalloc or mimalloc should avoid that. (Not sure if that was your problem, but it's a common complaint about Rust+musl.)
This post seems to be missing an additional opportunity: Zig's build can invoke go build! Use Build.addSystemCommand, have it depend on the library step, and have the install step depend on that step. This reduces the build to just 'zig build'.
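Roughly, and with the caveat that the step API drifts between Zig versions (this is the ~0.13/0.14 shape) and that the file paths and go build arguments here are made-up placeholders, such a build.zig could look like:

    const std = @import("std");

    pub fn build(b: *std.Build) void {
        const target = b.standardTargetOptions(.{});
        const optimize = b.standardOptimizeOption(.{});

        // The static library that cgo will link against.
        const lib = b.addStaticLibrary(.{
            .name = "cgo_static_linking",
            .root_source_file = b.path("src/lib.zig"), // placeholder path
            .target = target,
            .optimize = optimize,
        });
        const install_lib = b.addInstallArtifact(lib, .{});

        // Run `go build` as a system command, but only after the
        // library has landed in ./zig-out/lib.
        const go_build = b.addSystemCommand(&.{ "go", "build", "-o", "zig-out/bin/app", "." });
        go_build.step.dependOn(&install_lib.step);

        // Make plain `zig build` drive the whole pipeline.
        b.getInstallStep().dependOn(&go_build.step);
    }

With that wiring, 'zig build' first produces the static archive and then shells out to go build, so one command covers both halves.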
My biggest problem with “zig build” is learning it. Mainly because it still changes from version to version, and getting a clear picture of what the steps are and how they interact is somehow hard for me.
Obviously I will overcome this, but unlike Zig itself, the build file still feels like guesswork to me.
That's my experience as well.
The generated HTML docs can also be hard to read / understand. Zig 0.14.0 changed something about LazyPath (I think that's what it's called) and I had a hard time figuring out how to migrate.
Do you by chance have a link to a working example of this? Sounds interesting...
zigception!
All the magic seems to happen in
and

    /* #cgo LDFLAGS: -L./zig-out/lib -lcgo_static_linking -static
    #include "zig_lib.h" */
Now I can only guess why it works. A bit of an explanation wouldn't hurt. Good post btw
Zig builds a statically linked library called "cgo_static_linking" from the Zig sources. Zig places the artifact at "./zig-out/lib/libcgo_static_linking.a" (the "lib" prefix and ".a" extension are the usual static-archive convention). The library is exported with the C ABI, which the Go (cgo) toolchain supports.
The #cgo line tells the Go toolchain to add custom link flags: the -l flag says you want to link a library called "cgo_static_linking" into the binary. However, the linker can't find it on its own, since it is not in the default library search paths. The -L option adds "./zig-out/lib/" to the search path, allowing "cgo_static_linking" to be found.
The "#include" line inserts the header file's contents as-is into the cgo preamble of the source file (main.go), making the function's declaration available to the compiler. The compiler is then able to compile the Go source and later link it with the other build artifacts to produce the final executable.
The key here is the C ABI, which acts as common ground for both languages. The rest is just scaffolding to let the compilers find and connect the appropriate files.
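To make the Zig side of that common ground concrete: the exported function just needs the C calling convention. A minimal sketch (the real function names and signatures live in the post's zig_lib.h; `add` here is only a made-up example):

    // src/lib.zig - illustrative only
    // `export` gives the function an unmangled symbol and the C calling
    // convention, which is what the declaration in zig_lib.h and the cgo
    // preamble quoted above expect.
    export fn add(a: c_int, b: c_int) c_int {
        return a + b;
    }

On the Go side this becomes callable as C.add(1, 2) once the header is included in the cgo preamble.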
Most of the magic is due to two things:
1. Zig comes with clang and musl (a C library) for lots of architectures, so it can be used for cross-compiling, even for non-Zig projects, since clang can build C/C++ no problem.
2. Musl is designed to be compatible with static linking (glibc is not), so it's the natural choice here too.
Hi HN,
I tried doing this 2 days ago for AWS Lambda arm64 runners, and it did not work. Has anyone successfully built a statically linked golang binary for arm64 runners in Lambda?
Thanks in advance.
What is the upside of this rather than just building cgo with musl? What does Zig add?
Zig allows you to cross-compile as easily as Go does. Setting up a cross-compilation toolchain otherwise is quite painful.
ah okay.
I was building with cgo+musl before and it was not complex at all, but yeah I didn't think about cross-compiling.
It's pretty simple on Debian/Ubuntu.