I like the style of the blog, but a minor nit I'd change is to have a definition of what SSA is right at the top. It discusses SSA for quite a while ("SSA is a property of intermediate representations (IRs)", "it's frequently used") and only 10 paragraphs down actually defines what SSA is:
> SSA stands for “static single assignment”, and was developed in the 80s as a way to enhance the existing three-argument code (where every statement is in the form x = y op z) so that every program was circuit-like, using a very similar procedure to the one described above.
I understand it's one of those "well, if you don't know what it is, the post is not for you" things, but I think it's a nice article that could get people who are not familiar with the details interested in it.
> The reason this works so well is because we took a function with mutation, and converted it into a combinatorial circuit, a type of digital logic circuit that has no state, and which is very easy to analyze.
That's an interesting insight, it made sense to me. I only dealt with SSA when decompiling bytecode or debugging compiler issues, and never knew why it was needed, but that sort of made it click.
Here's a concise explanation of SSA. Regular (imperative) code is hard to optimize because in general statements are not pure -- if a statement has side effects, then it might not preserve the program's behavior to transform that statement by, for example:
1. Removing that statement (dead code elimination)
2. Deduplicating that statement (available expressions)
3. Reordering that statement with other statements (hoisting; loop-invariant code motion)
4. Duplicating that statement (can be useful to enable other optimizations)
All of the above optimizations are very important in compilers, and they are much, much easier to implement if you don't have to worry about preserving side effects while manipulating the program.
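To make the list above concrete, here's a toy before/after (plain Python standing in for an IR; all the names are made up):

    a, b = 2, 3

    # Before SSA: one name, several "versions" -- a pass must first work out
    # which assignment each use of x refers to before it can touch anything.
    x = a + b
    x = x * 2
    y = a + b   # textually the same as the first line, but is it the same value?

    # After SSA renaming: every name is assigned exactly once.
    x0 = a + b
    x1 = x0 * 2
    y0 = a + b  # now trivially the same value as x0, so CSE can rewrite uses
                # of y0 to x0, and DCE can then delete this line outright.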
So the point of SSA is to translate a program into an equivalent program whose statements have as few side effects as possible. The result is often something that looks like a functional program. (See: https://www.cs.princeton.edu/~appel/papers/ssafun.pdf, which is famous in the compilers community.) In fact, if you view basic blocks themselves as functions, phi nodes "declare" the arguments of the basic block, and branches correspond to tailcalling the next basic block with the corresponding values. This has motivated basic block arguments in MLIR.
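Here's a toy rendering of that last idea (ordinary Python functions standing in for basic blocks; just an illustration, not any real IR):

    # the loop `while c < 10: c *= 3`, one function per basic block;
    # the parameter c plays the role of the phi / block argument
    def loop_header(c):
        if c < 10:
            return loop_body(c)    # branch = tail call to the next block
        return exit_block(c)

    def loop_body(c):
        return loop_header(c * 3)  # the back-edge passes the new value of c

    def exit_block(c):
        return c

    print(loop_header(1))          # 27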
The "combinatorial circuit" metaphor is slightly wrong, because most SSA implementations do need to consider state for loads and stores into arbitrary memory, or arbitrary function calls. Also, it's not easy to model a loop of arbitrary length as a (finite) combinatorial circuit. Given that the author works at an AI accelerator company, I can see why he leaned towards that metaphor, though.
> Given that the author works at an AI accelerator company
I actually believe he works at Buf.build and his resume is stale; the author previously posted about their Go Protobuf parser hyperpb, which is a Buf project.
Maybe still "author recently worked at an AI accelerator company" though.
Thank you. That's a great explanation.
This post is frankly one of the most convoluted discussions of SSA I've read. There's lots of info there, but I'd suggest going back and looking at a paper on implementing it. I think I first came across SSA in a paper adding it to Wirth's Oberon compiler, and it was much more accessible.
Edit: It was this paper by Brandis and Mössenböck: https://share.google/QNoV9G8yMBWQJqC82
Thanks for the link. Looks like an interesting paper. Here is the original reference: https://dl.acm.org/doi/10.1145/197320.197331.
And here is a more readable PostScript version: https://web.archive.org/web/20170706013237/ftp://ftp.ssw.uni...
As someone who knows a bit about SSA, what do you make of LLVM's design decision not to use SSA for memory but only registers (i.e. it doesn't have memory SSA)? It has always confused me a bit as to why this was done.
Also, I recommend Bob Morgan's book: https://turbo51.com/download/Building-an-Optimizing-Compile-...
Indeed a great book; I even have a paper copy.
The SSA book is also pretty good: https://web.archive.org/web/20201111210448/https://ssabook.g...
I’ve found the SSA book to be... unforgiving in its difficulty. Not in the sense that I thought it to be a bad book but rather in that I was getting the feeling that a dilettante in compilers like me wasn’t the target audience.
I was involved in making the book. It is very much a book for academics, and came out of an academic conference bringing together people working at the forefront of SSA-based research.
I mean, once again, I’m not really complaining about the book. It’s fairly mathy, sure, but so what. I also actually welcome that it’s a coherent book rather than a bunch of papers in a trenchcoat or a menagerie of neat things people have thought of (*cough* Paxos variants).
It’s rather that it’s a bit unusual in that it’s a coherent book whose prerequisites (on top of an old-timey compilers course, say) I don’t think actually exist in book form (I’d love to be proven wrong, as that’s likely what I need to read). The introductory part does make it self-contained in a sense, but it’s more like those grad-level maths books that include a definitions chapter or three: technically you don’t need to know any of that stuff beforehand, true, but in reality if it does more for you than just fill a few gaps and fix terminology, then you’re not going to have a good time with the rest. Again, just my experience, I don’t know if that’s what you were going for.
If there was a criticism implied in my initial comment, it's that I think that the kind of person that goes looking for literature recommendations in this thread isn't going to have a good time with it, either; so at the very least they should know what they're signing up for. But I don't think you're really disagreeing with that part?..
Like so many compiler books from academia.
This only contains the first 3 chapters.
Google says there's no eBook available, for what that's worth.
https://books.google.com/books/about/Building_an_Optimizing_...
Yet Amazon says it's on Kindle. https://www.amazon.com/Building-Optimizing-Compiler-Bob-Morg...
Honestly, I think it's just something you either like or don't. If all you were trying to do was understand SSA, I agree this blog post is probably inefficient at that particular task, but often blog posts are entertainment as much as education, so meandering through a bunch of different things along the way is part of the deal. Personally I thought there were a lot of pretty interesting insights that I haven't seen a lot of discussion about in other places, though I will admit I mostly learned about SSA from Wikipedia and from people yelling about compilers online.
I learned a bit about SSA in a compiler course. Among many other things, it is crucial for register assignment. You want to know each distinct value that will exist, and the lifetimes of those values, in order to give each a register. Then, if you have more distinct values existing at one time than you have registers, you have to push stuff to the stack.
It is not critical for register assignment -- in fact, SSA makes register assignment more difficult (see the swap problem; the lost copy problem).
Lifetime analysis is important for register assignment, and SSA can make lifetime analysis easier, but plenty of non-SSA compilers (lower-tier JIT compilers often do not use SSA because SSA is heavyweight) are able to register allocate just fine without it.
One nice thing is that it makes register assignment polynomial (coloring the graph of SSA variables is polynomial, since it's not an arbitrary graph).
The motivation and reason it works is also wrong anyway. Like I get it's a gentle intro, but I think there are ways to accomplish that without egregiously rewriting history ;)
Ken Zadeck was my office mate for years, so this is my recollection, but it's also been a few decades, so sorry for any errors :)
The reason why it works is definitely wrong - they weren't even using rewriting forms of SSA, and didn't for a long time. Even the first "fully SSA" compiler (generally considered to be open64) did not rewrite the IR into SSA.
The reason it works so well is because it enables you to perform effective per-variable dataflow. In fact, this is the only problem it solves - the ability to perform unrestricted per-variable dataflow quickly. This was obvious given the historical context at the time, but less obvious now that it is history :)
In its simplest essence, SSA enables you to follow chains of dataflow for a variable very easily and simply, and that's probably what I would have said instead.
It's true that for simple scalar programs, the variable name reuse that breaks these dataflow chains mostly occurs at explicit mutation points, but this is not always correct depending on the compiler IR, and definitely not correct once you extend it to memory, among other things. It's also not correct at all as you extend the thing SSA enables you to do (per-variable dataflow quickly) to other classes of problems (SSU, SSI, etc).
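To illustrate what "per-variable dataflow quickly" buys you, here's a toy sketch (a made-up IR encoded as a Python dict, nothing from a real compiler): every use has exactly one def, so chasing a variable's dataflow is just pointer-following, with no global solving:

    # each SSA value records the single instruction that defines it
    defs = {
        "x0": ("const", 1),
        "x1": ("mul", "x0", "x0"),
        "x2": ("add", "x1", "x0"),
    }

    def trace(value, depth=0):
        op = defs[value]
        print("  " * depth + value, "<-", op)
        for operand in op[1:]:
            if operand in defs:    # literals fall through
                trace(operand, depth + 1)

    trace("x2")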
History wise - this post also sort of implies SSA came out of nowhere and was some revolutionary thing nobody had ever thought about, just sort of developed in the 80's.
In fact, it was a formalization of attempts at per-variable dataflow they had been working on for a while.
I'd probably just say that as a gentle intro, but if you want the rest of history, here you go:
Well before SSA, it was already known that lower bounds on bitvector dataflow (the dominant approach at the time of SSA) were not great. Over the years, it turned out they were much better than initially expected, but in the end, worse than anyone wanted as programs got bigger and bigger. N^2 or N^3 bitvector operations for most problems[1]. Incrementality is basically impossible[2]. They were also hard to understand and debug because of dependencies between related variables, etc.
Attempts at faster/better dataflow existed in two rough forms, neither of which are bound by the same lower bound:
1. Structured dataflow/Interval analysis algorithms/Elimination dataflow - reduce the CFG into various types of regions with a known system of equations, solve the equations, distribute the results to the regions. Only works well on reducible graphs. Can be done incrementally with care. This was mostly studied in parallel to bitvectors, and was thought heavily about before it became known that there was a class of rapid dataflow problems (i.e. before the late 70's). Understanding the region equations well enough to debug them requires a very deep understanding of the basis of dataflow - lattices, etc.
In that sense it was worse than bitvectors to understand for most people. Graph reducibility restrictions were annoying but tractable on forward graphs through node splitting and whatnot (studies showed 90+% of programs at the time had reducible flowgraphs already), but almost all backwards CFG's are not reducible, making backwards dataflow problems quite annoying. In the end, as iterative dataflow became provably faster/etc, and compilers became the province of more than just theory experts, this sort of died[3]. If you ever become a compiler theory nerd, it's actually really interesting to look at IMHO.
2. Per-variable dataflow approaches. This is basically "solve a dataflow problem for single variable or cluster of variables at a time instead of for all variables at once".
There were not actually a ton of people who thought this would ever turn into a practical approach:
a. The idea of solving reaching definitions (for example) one variable at a time seemed like it would be much slower than solving it for all variables at once.
b. It was very non-obvious how to effectively partition variables so that you could compute a problem on one variable (or a cluster of variables) at a time without it devolving into either bitvector-like time bounds or structured-dataflow-like math complexity.
It was fairly obvious at the time that if you could make it work, you could probably get a much faster incremental dataflow solution.
SSA came out of this approach. Kenny's thesis dealt with incremental dataflow and partitioned variable problems, and proved time bounds on various types of partitioned/clustered variable problems. You can even see the basis of SSA in how it thinks about things. Here, I'll quote from a few parts:
"As the density of Def sites per variable increases, the average size of the affected area for each change will decrease..."
His thesis is (among other things) on a method for doing this that is typical for the time (i.e. not SSA):
"The mechanism used to enhance performance is raising the density of Def sites for each variable. The most obvious way to increase the Use and Def density is to attempt a reduction on the program flow graph. This reduction would replace groups of nodes by a single new node. This new node would then be labeled with information that summarized the effects of execution through that group of nodes."
This is because, as I mentioned, figuring out how to do it by partitioning the variables based on dominance relationships was not known - that is literally SSA, and also because Kenny wanted to finish his thesis before the heat death of the sun.
In this sense, they were already aware that if they got "the right number of names" for a variable, the amount of dataflow computation needed for most problems (reaching defs, etc) would become very small, and that changes would be easy to handle. They knew fairly quickly that for the dataflow problems they wanted to solve, they needed each variable to have a single reaching definition (and reaching definitions was well known), etc.
SSA was the incremental culmination of going from there to figuring out a clustering/partitioning of variables that was not based on formally structured control flow regions (which are all-variables things), but instead based on local dataflow for a variable with incorporation of the part of the CFG structure that could actually affect the local result for a given variable.
It was approached systematically - understanding the properties they needed, figuring out algorithms that solved them, etc.
Like a lot of things that turn out to be unexpectedly useful and take the world by storm, history later often tries to represent them as a eureka moment.
[1] Bitvectors are assumed to be fixed size, and thus constant cost, but feel free to add another factor of N here if you want.
[2] This is not true in the sense that we knew how to recompute the result incrementally, and end up with a correct result, but doing so provably faster than solving the problem from scratch was not possible.
[3] They actually saw somewhat of a resurgence in the world of GPUs and more general computation graphs, because it all becomes heavily applicable again. However, we have almost always eventually developed easier-to-understand (even if potentially slower theory-wise) global algorithms and use those instead, because we have the compute power to do it, and this tradeoff is IMHO worth it.
I'm extremely uncertain of my history here, but my recollection is that SSA wasn't seen as practical until the development of the dominance frontier algorithm for inserting phi nodes, which seems to be 1991.
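For reference, the phi-insertion part of that algorithm is small enough to sketch (Python; the dominance frontiers `df` and per-variable `defsites` are assumed precomputed, and all names here are mine):

    def place_phis(df, defsites):
        phis = {}                  # block -> variables needing a phi there
        for var, blocks in defsites.items():
            work = list(blocks)
            while work:
                b = work.pop()
                for d in df.get(b, ()):
                    if var not in phis.setdefault(d, set()):
                        phis[d].add(var)
                        if d not in blocks:    # the phi is a new def of var
                            work.append(d)
        return phis

    # diamond CFG: x is assigned on both sides, so the join needs a phi
    print(place_phis({"left": {"join"}, "right": {"join"}},
                     {"x": {"left", "right"}}))   # {'join': {'x'}}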
Yes, this is a problem I see more and more often. Besides the fact that it's always a good habit to spell out an acronym or abbreviation the first time it's used in an article, there's also the fact that this is, you know, the web, and it's very easy to make a link to a Wikipedia article or something that will explain the term, if you don't want to bother adding your own explanation.
I don't think it really explains what it's for though. Here's my explanation (from my PhD thesis on compilers):
Static single assignment (SSA) form provides an efficient intermediate representation for program analysis [Cytron et al., 1989, 1991]. It was introduced as a way of efficiently representing dataflow analyses.
SSA uses a single key idea: that all variables in the program are renamed so that each variable is assigned a value at a single unique statement in the program. From this key idea, SSA is able to provide a number of advantages over other techniques of dataflow analyses:
Factored use-def chain: With a def-use chain, dataflow results may be propagated directly from the assignment of a variable to all of its uses. However, a def-use chain requires an edge from each definition to each use, which may be expensive when a program has many definitions and uses of the same variable. In practice, this occurs in the presence of switch-statements. SSA factors the def-use chain over a φ-node, avoiding this pathological case.
Flow-sensitivity: A flow-insensitive algorithm performed on an SSA form is much more precise than if SSA form were not used. The flow-insensitive problem of multiple definitions to the same variable is solved by the single assignment property. This allows a flow-insensitive algorithm to approach the precision of a flow-sensitive algorithm.
Memory usage: Without SSA form, an analysis must store information for every variable at every program point. SSA form allows a sparse analysis, where an analysis must store information only for every assignment in the program. With a unique version per assignment, the memory usage of storing the results of an analysis can be considerably lower than using bit-vector or set-based approaches.
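A toy of the sparse point (my own sketch, not from the thesis): a constant propagator over SSA keeps exactly one fact per definition:

    facts = {}   # SSA name -> constant value, or None for "not a constant"

    def eval_def(name, op, args):
        if op == "const":
            facts[name] = args[0]
        elif op == "add" and all(facts.get(a) is not None for a in args):
            facts[name] = sum(facts[a] for a in args)
        else:
            facts[name] = None

    eval_def("x0", "const", [2])
    eval_def("x1", "const", [3])
    eval_def("x2", "add", ["x0", "x1"])
    print(facts)   # {'x0': 2, 'x1': 3, 'x2': 5}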
I like this article a lot but it doesn't answer the question of "Why SSA?".
Sure, a graph representation is nice, but that isn't a unique property of SSA. You can have graph IRs that aren't SSA at all.
And sure, SSA makes some optimizations easy, but it also makes other operations more difficult. When you consider that, plus the fact that going into and out of SSA is quite involved, it doesn't seem like SSA is worth the fuss.
So why SSA?
Well, it turns out compilers have sequencing issues. If you view compilation as a series of small code transformations, your representation goes from A -> B, then B -> C, then C -> D and so on. At least, that's how it works for non-optimizing compilers.
For optimizing compilers however, passes want to loop. Whenever an optimization is found, previous passes should be run again with new inputs... if possible. The easiest way to ensure this is to make all optimizations input and output the same representation. So A -> B is no good. We want A -> A: a singular representation.
So if we want a singular representation, let's pick a good one right? One that works reasonably well for most things. That's why SSA is useful: it's a decently good singular representation we can use for every pass.
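In sketch form (a hypothetical pass API where each pass returns (new_ir, changed); nothing real), the A -> A property is exactly what makes this loop possible:

    def run_pipeline(ir, passes, max_iters=10):
        for _ in range(max_iters):
            changed = False
            for p in passes:
                ir, c = p(ir)          # same representation in and out
                changed = changed or c
            if not changed:            # fixpoint reached
                break
        return ir

    identity = lambda ir: (ir, False)
    print(run_pipeline("some-ir", [identity]))   # some-ir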
While iterating optimizations is nice, I think you missed the main point of SSA.
SSA makes dataflow between operations explicit; it completely eliminates the original (incidental) names from programs. Because of that, all dataflow problems (particularly forward dataflow problems) get vastly simpler.
With SSA you can throw basically all forward dataflow problems (particularly with monotonic transformations) into a single pass and they all benefit each other. Without SSA, you have every single transformation tripping over itself to deal with names from the source program and introducing transformations that might confuse other analyses.
I know we teach different compiler optimizations at different stages, but it's really important to realize that all of them need to work together and that having each as a separate pass is a good way to fail at the phase ordering problem.
And going further with the sea-of-nodes representation just makes them all more powerful; I really do recommend reading Cliff Click's thesis.
You're very correct but I suppose I was really answering why compilers centralize around SSA. It's a bold choice to choose one data structure for everything, and that requires more motivation than, "it makes certain optimizations really easy". Because again, it makes other stuff harder.
>And going further with the sea-of-nodes representation just makes them all more powerful; I really do recommend reading Cliff Click's thesis.
We might have to agree to disagree on this one. I actually found sea of nodes to be a boneheaded idea. It makes one or two optimizations a little more elegant but everything else a huge pain in the ass. At least, that was my experience.
It's funny, I've been working on a CPS (continuation passing style) graph optimizer loosely based on Cliff Click's thesis, and when I show it to the robots they're like "yeah, that's academic fluffery, it'll never fly" to try to inflate my ego a bit, until I explain how it actually works and then they're "wait, that's actually possible ...and a good idea".
His 'optimization as a constraint solving problem' thing is actually pretty powerful, and it just so happens I've been fiddling with a Projective Dynamics constraint solver (which is what the VM is for, to define the constraints) which I can abuse to optimize CPS graphs so... take that, Robots!
Phi-functions and block arguments are just different views of the same thing. Sometimes it is more convenient to use one or the other when thinking of a problem.
If you lay out phi-functions and their parameters on a grid, you'd get a "phi-matrix" where phi-functions are the rows and block arguments are the columns.
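For instance, a made-up join block with two predecessors:

                 from B1   from B2
    x3 = phi(      x1,       x2 )
    y3 = phi(      y1,       y2 )

Each row is one phi-function; each column is the tuple of values that predecessor would pass as block arguments (B1 branches with (x1, y1), B2 with (x2, y2)).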
If you don't do an out-of-SSA transform before register allocation, and effectively treat block parameters like function calls, then you're pushing the complexity to the register allocator.
An out-of-SSA transform before register allocation would coalesce not just registers but also variables in spill slots (thus avoiding memory-memory moves), it would reduce the complexity of parallel moves. A more advanced transform could also hoist moves out from before the hottest branch which could potentially lead to un-splitting previously split critical edges.
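The parallel-moves bit is where the swap problem from upthread shows up; here's a minimal sketch of sequentializing one parallel move with a scratch register (a toy Python helper of my own; real allocators also handle spill slots, sub-registers, etc.):

    # sequentialize a parallel move {dst: src}, all reads conceptually first
    def sequentialize(moves, scratch="tmp"):
        moves = {d: s for d, s in moves.items() if d != s}   # drop no-ops
        seq = []
        while moves:
            ready = [d for d in moves if d not in moves.values()]
            if ready:
                d = ready.pop()
                seq.append((d, moves.pop(d)))        # safe: nobody reads d
            else:
                d = next(iter(moves))                # only cycles remain
                seq.append((scratch, d))             # park d's old value
                moves = {dst: (scratch if s == d else s)
                         for dst, s in moves.items()}
        return seq

    # the classic swap problem needs the scratch:
    print(sequentialize({"r1": "r2", "r2": "r1"}))
    # [('tmp', 'r1'), ('r1', 'r2'), ('r2', 'tmp')]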
Every time I see a clean SSA explainer like this, I’m reminded that the “simplicity” of SSA only exists because we’ve decided mutation is evil. It’s not that SSA is simpler — it’s that we’ve engineered our entire optimization pipeline around pretending state doesn’t exist.
It’s a brilliant illusion that works… until you hit aliasing, memory models, or concurrency, and suddenly the beautiful DAG collapses into a pile of phi nodes and load/store hell.
SSA isn't about saying mutation is evil. It's about trivializing chasing down def-use chains. In the Dragon Book, essentially the first two dataflow analyses introduced are "reaching definitions" and "live variables"; within an SSA-based IR, these algorithms are basically "traverse a few pointers". There are also some ancillary benefits--SSA also makes a flow-insensitive algorithm partially flow-sensitive just by the fact that it's renaming several variables.
Sure, you still need to keep those algorithms in place for being able to reason about memory loads and stores. But if you put effort into kicking memory operations into virtual register operations (where you get SSA for free), then you can also make the compiler faster since you're not constantly rerunning these analyses, but only on demand for the handful of passes that specifically care about eliminating or moving loads and stores.
As a fan of a functional language, immutability doesn't mean state doesn't exist. You keep state with assignment --- in SSA, every piece of state has a new name.
If you want to keep state beyond the scope of a function, you have to return it, or call another function with it (and hope you have tail call elimination). Or, stash it in a mutable escape hatch.
Is it that we think it's evil or that it's easier to deal with while guaranteeing some SSA-like pattern in the code? It's a fairly easy thing to agree to maintain, whereas if you are more jazzy with it you can end up with either very fiddly passes or just constantly running DFA
SSA isn't (primarily) concerned with memory, it's concerned with local variables. It completely virtualizes the storage of local variables--in fact, all intermediate computations. By connecting computations through dataflow edges and not storage, it removes ordering (except that induced by dependence edges) from consideration.
It is, after all, what CPUs do under the hood with register renaming! They are doing dynamic SSA, a trick they stole from us compiler people!
SSA is appealing because you can defer the load/store hell until after a bunch of optimizations, though, and a lot of those optimizations become a lot easier to reason about when you get to pretend state doesn't exist.
You have it backwards. Modern compilers don't use SSA because it's "simpler", we use it because it enables very fast data-flow optimizations (constant prop, CSE, register allocation, etc.) that would otherwise require a lot of state. It doesn't "pretend state doesn't exist", it's actually exactly what makes it possible/practical for the compiler to handle changes in state.
As some evidence to the second point: Haskell is a language that does enforce immutability, but its compiler, GHC, does not use SSA for its main IR -- it uses a "spineless tagless G-machine" graph representation that does, in fact, rely on that immutability. SSA only happens later once it's lowered to a mutating form. If your variables aren't mutated, then you don't even need to transform them to SSA!
Of course, you're welcome to try something else, people certainly have -- take a look at how V8's move to Sea-of-Nodes has gone for them.
To appreciate the "fast" part, nothing beats reading through LuaJIT's lj_opt_fold.c, none of which would work without SSA.
Of course, LuaJIT is cheating, because compared to most compilers it has redefined the problem to handling exactly two control-flow graphs (a line and a line followed by a loop), so most of the usual awkward parts of SSA simply do not apply. But isn’t creatively redefining the problem the software engineer’s main tool?..
The major issue is that approximately one person in the world understands it well enough to make it work in practice, and that kind of tech debt can't really be pushed too far.
The functional programmers have decided mutability is evil. The imperative programmers have not.
We functional programmers do crazy stuff and pretend we're not on real hardware - a new variable instead of mutating (how wasteful!) and infinite registers (what real-world machine supports that!?).
Anyway, there's plenty of room for alternatives to SSA/CPS/ANF. It's always possible to come up with something with more mutation.
Functional programming is actually good for performance.
Clojure uses "Persistent data structure", which has its own Wikipedia page.
You can have a very complex object, like a long and wide JSON with several levels of indentation. If you want to modify a tiny part of this object, you won't waste memory or time. The object won't be copied entirely. The parts that haven't changed will be reused.
This is only possible because the objects are immutable. If they were mutable, you wouldn't be able to trust that the parts that haven't changed won't change in the future. This means you can share these parts. This is good for memory and for parallelism.
If you're building a React app in JS, React has to check if the object has changed deeply. If you're doing a React app with Clojure, this check is actually disabled.
React will only use a single === comparison to know whether two objects are different or not, and this is basically an integer comparison (like a memory address comparison).
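The underlying trick is path copying. Here's a dict-based toy of the idea (real implementations use wide trees such as HAMTs, so each copied level stays tiny):

    def assoc_in(d, path, value):
        if not path:
            return value
        key, rest = path[0], path[1:]
        new = dict(d)              # shallow copy of this level only
        new[key] = assoc_in(d.get(key, {}), rest, value)
        return new

    old = {"a": {"big": list(range(1000))}, "b": {"x": 1}}
    new = assoc_in(old, ["b", "x"], 2)
    print(new["a"] is old["a"])            # True: untouched subtree is shared
    print(old["b"]["x"], new["b"]["x"])    # 1 2: the old version is intact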
SSA form is a state representation. SSA encodes data flow information explicitly which therefore simplifies all other analysis passes. Including alias analysis.
What a ridiculous comment. No one says mutation is “evil.” It's just harder to optimise. Then all this talk of pretending and illusions, as if the compiler doesn't really work and is producing fake outputs. I assure you that other people do not take your strangely moralising tone to compiler optimisations.
> the “simplicity” of SSA only exists because we’ve decided mutation is evil.
Mutation is the result of sloppy thinking about the role of time in computation. Sloppy thinking is a hindrance to efficient and tractable code transformations.
When you "mutate" a value, you're implicitly indexing it on a time offset - the variable had one value at time t_0 and another value at time t_1. SSA simply uses naming to make this explicit. (As do CPS and ANF, which is where that "equivalence" comes from.)
If you don't use SSA, CPS, or ANF for this purpose, you need to do something else to make the time dimension explicit, or you're going to be dealing with some very hairy problems.
"Evil" in this case is shorthand for saying that mutable variables are an unsuitable model for the purpose. That's not a subjective decision - try to achieve similar results without dealing with the time/mutation issue and you'll find out why.
The shocking truth is that SSA is functional! That's right, the compiler for your favourite imperative language actually optimizes functional programs. See, for example, https://www.jantar.org/papers/chakravarty03perspective.pdf. In fact, SSA, continuation passing style, and ANF are basically the same thing.
The essence of functional languages is that names are created by lambdas, lambdas are first class, and names might not alias themselves (within the same scope, two references to X may be referencing two instances of X that have different values).
The essence of SSA is that names must-alias themselves (X referenced twice in the same scope will definitely give the same value).
There are lots of other interesting differences.
But perhaps the most important difference is just that when folks implement SSA, or CPS, or ANF, they end up with things that look radically different with little opportunity for skills transfer (if you're an SSA compiler hacker then you'll feel like a fish out of water in a CPS compiler).
Folks like to write these "cute" papers that say things that sound nice but aren't really true.
The whole ad-hoc mechanism of phi-nodes in SSA can be replaced by local blocks with parameters. A block that can take parameters is not that different conceptually from a lambda.
But even if you used block arguments, it's so very different from a lambda. Lambdas allow dynamic creation of variables, while SSA doesn't. Therefore, in SSA, variables must-alias themselves, while in the lambda calculus they don't. If you think that a block that takes arguments is the same as a lambda because you're willing to ignore such huge differences, then what's the limiting principle that would make you say that two languages really are different from one another?
Remember, all Turing complete languages are "conceptually the same" in the sense that you can compile them to one another
Phi/Upsilon is even more obviously equivalent to blocks with parameters than phi nodes were. It's storing to a "shadow variable" that you "load from" in the phi, i.e. it's exactly the same "store to ABI specified location in caller, load from it in callee" as a function call.
> It does seem to rhyme with how function calls work under the hood.
You're just overindexing on the fact that block "arguments" are called "arguments". In SSA with block arguments, you can pass data to a block without passing it as an argument. The arguments are just a way of expressing Phis.
And in Phi/Upsilon, this is valid:
MyBlock:
    X = Whatever(...)
    Upsilon(X, ^Y)
    Y = Phi()
Note how I'm using an upsilon to set the shadow variable of a Phi that's in the same block. You can't do that with block arguments.
These things just aren't the same at all. Saying that they are the same just shows that you don't get it.
> When people say equivalent, it is usually not in the mathematical sense, which would be meaningless here because all forms are turing complete anyway.
But they're not equivalent even in a mathematical sense.
> Take equivalent as--similarity that helps you understand A given you know B, and perhaps some of the wisdom from B translates to A.
If you want to understand these things, then focus on how they are so very different rather than brain damaging yourself into thinking they are similar
> You're just overindexing on the fact that block "arguments" are called "arguments"
But that's what it is. Briefly stated, your "upsilon" is just picking the 'actual' argument for the 'formal' parameter in a matching "phi". That's exactly what a function call does, and this holds even though you've intentionally decoupled the "upsilon" and "phi" nodes from any control-flow "jump" construct.
> Note how I'm using an upsilon to set the shadow variable of a Phi that's in the same block. You can't do that with block arguments.
Classically, the phi node would sit at the top of a block anyway, and this arrangement helps significantly in computing dominator set properties, renaming and use-def chains etc. etc. Giving up that property makes everything more awkward, including proofs of correctness for transformations, minimality, etc.
In CS-411 we teach both SSA and parameterized blocks, with more emphasis on SSA. IMHO the main advantage is that phis are connected to their "call" sites and argument values. Blocks aren't first class; there's no need to decouple the names of their inputs from the actual inputs. SSA makes those dataflow connections explicit, which is obviously superior.
I've never actually looked at that compiler, so can't comment on it, but have you read Appel's "Compiling with Continuations"? It motivates and explains the process very clearly. Add ANF into the mix and it's a very straightforward, well-defined system - much more so than SSA.
The author of that compiler used that book as a reference in developing it; that book was open on his desk daily.
It had an immutable IR and literally every one of the dozens of passes made basically an entire copy of it. Needless to say, that was slow and memory hungry. It was difficult to follow the transformation passes unless well-versed in CPS.
We replaced that with a clone of the C1 compiler, C1X, which was basically just a Java rewrite. C1X supposedly still exists in MaxineVM, but has been superseded by the Graal compiler in Truffle/Graal, which is a hybrid CFG/sea-of-nodes style compiler based on SSA.
> It had an immutable IR and literally every one of the dozens of passes made basically an entire copy of it.
Yeah, this is what CPS is really all about. It's not like SSA at all.
Also, it's just a dumb way to write a compiler, and so nobody does that to themselves anymore, now that SSA is widely known.
The goofy "hot" takes about how CPS and SSA are the same are usually written by CPS apologists who want to make the rest of us believe that their work on CPS is somehow relevant when it really isn't
Whether they're the same thing I don't know... but a long time ago I remember reading that SSA and CPS were isomorphic, with CPS basically being used for functional languages.
CPS has bookkeeping related to nested structures, whereas SSA has the dominator tree.
CPS is usually higher order and SSA usually first order. As in the continuation you're passing in CPS is probably a closure with some state attached. In SSA you'd have expanded that to pass a function and some explicit state argument, making the allocation explicit.
I think the big thing in favour of CPS was it came first but I could be wrong about that. The lambda people came up with CPS and the imperative people came up with SSA. A while later Appel pointed out that they're views on very similar things. https://www.cs.princeton.edu/~appel/papers/ssafun.pdf is worth reading if you haven't seen it yet.
My experience with SSA is extremely limited, so that might be a stupid question. But does that remain true once memory enters the picture?
The LLVM tutorials I played with (admittedly a long time ago) made it seem like "just alloca everything and trust mem2reg" basically abstracted SSA away pretty completely from a user POV.
I think you're asking if SSA is still a functional language in the presence of mutation of memory. Both Scheme and ML (of Standard ML and OCaml fame, not machine learning) allow mutation, so mutation doesn't seem to disqualify a language from being functional.
This naturally leads to the question "what is a functional language?" I've written my thoughts on what FP is at [1]. I argue that FP is about local reasoning and composition. The former is most relevant here: local reasoning means it's easy to reason about code. This is exactly why SSA is used in compiler: it makes it easy for the compiler to reason about code and therefore which optimizations are valid. This is the same argument given in these comments: https://news.ycombinator.com/item?id=45674568 and https://news.ycombinator.com/item?id=45678483
In MLIR, there are two representations of memory, `tensor` and `memref`, which enables you to do some high-level things[0] in SSA before "bufferizing" to memrefs, which are eventually lowered to LLVM pointers.
If you're hell bent on functional style, you can represent memory writes as ((Address, Value, X) -> X), where X is a compile-time-only linear type representing the current memory state, which can be manipulated like any other SSA variable. It makes some things more elegant, like two reads of the same address naturally being the same value (as long as it's reading from the same memory state). Doesn't help at all with aliasing analysis or write reordering though so I don't think any serious compilers do this.
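A toy of that encoding (Python; the dict plays the role of the linear X token, and all names are mine):

    def store(mem, addr, val):
        new = dict(mem)        # a real compiler wouldn't copy anything;
        new[addr] = val        # the token exists only to order loads/stores
        return new

    def load(mem, addr):
        return mem.get(addr, 0)

    m0 = {}
    m1 = store(m0, 0x10, 7)
    a = load(m1, 0x10)
    b = load(m1, 0x10)    # same address, same memory state -> same value
    print(a, b, load(m0, 0x10))   # 7 7 0: the old state m0 is untouched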
It helps _lots_ with aliasing analysis if you've got multiple instances of X representing disjoint areas of the address space. Could call them "heaps".
The second code snippet doesn't use SSA. It just translates the first loop into IR and mangles the variable names. Here is an SSA version of that in the Scheme language.
(let loop ((c c))
  (if (< c 10)
      (loop (* c 3))
      c))
Notice that this is stateless and also returns the final value of "c" from the loop. People who use the above style have tended to find that it is much easier to reason about for more complicated looping structures.
Considering it's a functional language (bar the memory access bits), and most procedural languages can target this, we can say that a lot of procedural code can be compiled down to functional code - so procedural programming is syntactic sugar on top of a functional framework.
Also, functional programmers have a couple of pet peeves, tail recursion as the way to implement infinite recursion and loops being one of them. SSA uses a completely different paradigm - phi nodes - to achieve looping. I don't particularly like tail recursion: for example, the naive recursive implementation of fibonacci is not tail recursive (and thus dangerous, a property a functional program should never have), and making it tail recursive looks very much like a transformation a compiler should do, which goes against the spirit of functional programming.
I think functional programmers should think of other constructs to achieve infinite recursion or looping; I suspect there are infinitely many possible, so I guess we could discover ones that are less inherently dangerous and easier to reason about as mere mortals.
The transition from one basic block to another is to copy the live values to some common location, jump the instruction counter, then copy the values out of that location to give the results of the phi nodes.
The transition from one function to another is to copy the arguments to some common location, jump the instruction counter, then copy the values out of that location to give the initial values of the parameters.
These are the same thing, except that the branch in the first case is always at the end of the block. In the tail position.
Your preferred looping construct of branching between basic blocks is _identically_ tail calls between functions, except that some of the block arguments are represented as phi nodes. This is because the blocks are functions.
I see your point - tail calls are identical to MLIR-style phi nodes, where the value is not determined by the branch source but is passed explicitly as a function argument.
I still think that tail recursion is too low-level a construct for functional programmers to interact with unless it's wrapped into something like a reduce operation, which allows you to write a fibonacci implementation identical to what you'd write in a procedural language.
That "tail recursion" is a thing in it's own right is more a sign of people getting their implementation wrong than anything else.
Specifically, if calling a function at the end of your current function _doesn't_ clean up the call stack first, what you've got is a space leak. That C's ABI on various architectures gives you this behaviour is a bad thing. The space leak sucks there too. You get functions with a tailcall: label at the top and `goto tailcall;` written in the body as a workaround when functional programmers collide with this.
Related is RAII in C++ - implicitly calling things on leaving scope mostly means the function call in the "tail" position has to do more stuff after it returns, so it isn't in the tail position, and you burn memory on keeping track of the work still to do. This takes the initial mistake from C and really leans into it.
If your language doesn't make that implementation mistake, you just have function calls that work fine. Tree style recursion is still a problem, but it's an out of memory one and at least only one side of the tree is eating memory, and in proportion to the book keeping needed to find the other side of the tree.
In the olden days, _function calls_ in Fortran didn't work, because they put their local state in global memory, not on a stack. So if foo called bar called foo, broken. We don't tolerate that any more, everyone has a call stack. But we still tolerate "ah, let's just leak this stack frame for a while, we've always done it like that" at present.
…ending up in a UAF when FooObject::bar() tries to access the receiver "this". Or any other case of the tail function accessing a pointer to something the caller has put on the stack. Short of some kind of crazy dependency tracking (or shoving stuff onto the heap and using a GC), at the end of the day the programmer will have to explicitly work around the stack frame no longer existing in most return statements. To which the only workaround would be doing some circumlocutions specifically to avoid the tail-call recognition.
The foo object needs to be somewhere that the address of it means something, because C++ passes 'this' by pointer.
The answer to this is to look at the current stack frame, shuffle everything that needs to stay alive to one end of it, move the stack pointer to deallocate the rest and then jump to bar, where bar is now responsible for deallocating N bytes of stack before it continues onwards.
It's a pain, sure, but in the scheme of compiler backends it's fine. They do very similar things on optimisation grounds, e.g. "shrink wrapping" means holding off on allocating stack in case an early return fires and you don't need it after all.
Though, when memory use is a function of escape analysis, which is a function of how much time you gave the compiler to work with, I do start to empathise with the make-the-programmer-do-it as the solution.
SSA is a representation of Cartesian closed categories that looks like normal code. Cartesian closed categories model computation. You don't need SSA, and modern compiler frameworks like MLIR make better choices: they still mostly use SSA but have more sanity when it comes to basic block boundaries (treating them as having arguments instead of phi nodes).
I like the style of the blog but a minor nit I'd change is have a definition what SSA is right at the top. It discusses SSA for quite a while "SSA is a property of intermediate representations (IRs)", "it's frequently used" and only 10 paragraphs down actually defines what SSA is
> SSA stands for “static single assignment”, and was developed in the 80s as a way to enhance the existing three-argument code (where every statement is in the form x = y op z) so that every program was circuit-like, using a very similar procedure to the one described above.
I understand it's one of those "well if you don't know what it is, the post is not for you" but I think it's a nice article and could get people who are not familiar with the details interested in it
> The reason this works so well is because we took a function with mutation, and converted it into a combinatorial circuit, a type of digital logic circuit that has no state, and which is very easy to analyze.
That's an interesting insight, it made sense to me. I only dealt with SSA when decompiling bytecode or debugging compiler issues, and never knew why it was needed, but that sort of made it click.
Here's a concise explanation of SSA. Regular (imperative) code is hard to optimize because in general statements are not pure -- if a statement has side effects, then it might not preserve the behavior to optimize that statement by, for example:
1. Removing that statement (dead code elimination)
2. Deduplicating that statement (available expressions)
3. Reordering that statement with other statements (hoisting; loop-invariant code motion)
4. Duplicating that statement (can be useful to enable other optimizations)
All of the above optimizations are very important in compilers, and they are much, much easier to implement if you don't have to worry about preserving side effects while manipulating the program.
So the point of SSA is to translate a program into an equivalent program whose statements have as few side effects as possible. The result is often something that looks like a functional program. (See: https://www.cs.princeton.edu/~appel/papers/ssafun.pdf, which is famous in the compilers community.) In fact, if you view basic blocks themselves as a function, phi nodes "declare" the arguments of the basic block, and branches correspond to tailcalling the next basic block with corresponding values. This has motivated basic block arguments in MLIR.
The "combinatorial circuit" metaphor is slightly wrong, because most SSA implementations do need to consider state for loads and stores into arbitrary memory, or arbitrary function calls. Also, it's not easy to model a loop of arbitrary length as a (finite) combinatorial circuit. Given that the author works at an AI accelerator company, I can see why he leaned towards that metaphor, though.
> Given that the author works at an AI accelerator company
I actually believe he works at Buf.build and his resume is stale, the author previously posted about their Go Protobuf parser hyperpb which is a Buf project.
Maybe still "author recently worked at an AI accelerator company" though.
Thank you. That’s a great explanation.
This post is frankly one of the most convoluted discussions of SSA I've read. There's lots of info there, but I'd frankly suggest going back and look at a paper on implementing it. I think I first came across SSA in a paper adding it to Wirths Oberon compiler, and it was much more accessible.
Edit: It was this paper by Brandis and Mössenböck: https://share.google/QNoV9G8yMBWQJqC82
Thanks for the link. Looks like an interesting paper. Here is the original reference: https://dl.acm.org/doi/10.1145/197320.197331.
And here is a better readable postscript version: https://web.archive.org/web/20170706013237/ftp://ftp.ssw.uni...
As someone who knows a bit about SSA, what do you make of LLVM's design decision not to use SSA for memory but only registers (i.e. it doesn't have memory SSA)? It has always confused me a bit as to why this was done.
Also I recommend Bob Morgan's book:
https://turbo51.com/download/Building-an-Optimizing-Compile-...
Indeed a great book; I even have a paper copy.
The SSA book is also pretty good: https://web.archive.org/web/20201111210448/https://ssabook.g...
I’ve found the SSA book to be... unforgiving in its difficulty. Not in the sense that I thought it to be a bad book but rather in that I was getting the feeling that a dilettante in compilers like me wasn’t the target audience.
I was involved in making the book. It is very much a book for academics, and came out of an academic conference bringing together people working at the forefront of SSA-based research.
I mean, once again, I’m not really complaining about the book. It’s fairly mathy, sure, but so what. I also actually welcome that it’s a coherent book rather than a bunch of papers in a trenchcoat or a menagerie of neat things people have thought of (*cough* Paxos variants).
It’s rather that it’s a bit unusual in that it’s a coherent book whose prerequisites (on top of an old-timey compilers course, say) I don’t think actually exist in book form (I’d love to be proven wrong, as that’s likely what I need to read). The introductory part does make it self-contained in a sense, but it’s more like those grad-level maths books that include a definitions chapter or three: technically you don’t need to know any of that stuff beforehand, true, but in reality if it does more for you than just fill a few gaps and fix terminology, then you’re not going to have a good time with the rest. Again, just my experience, I don’t know if that’s what you were going for.
If there was a criticism implied in my initial comment, it’s that I think that the kind of person that goes looking for literature recommendations in this thread isn’t going to have a good time with it, either; so at the very least they should know what they’re signing up for. But I don’t think you’re really disagreeing with that part?..
Like so many compiler books from academia.
This only contains the first 3 chapters.
Google says there’s no eBook available, for what that’s worth.
https://books.google.com/books/about/Building_an_Optimizing_...
Yet Amazon says it’s on Kindle. https://www.amazon.com/Building-Optimizing-Compiler-Bob-Morg...
Honestly, I think it's just something you either like or don't. If all you were trying to do was understand SSA, I agree this blog post is probably inefficient at that particular task, but often blog posts are entertainment as much as education, so meandering through a bunch of different things along the way is part of the deal. Personally I thought there were a lot of pretty interesting insights that I haven't seen a lot of discussion about in other places, though I will admit I mostly learned about SSA from Wikipedia and from people yelling about compilers online.
I learned a bit about SSA in a compiler course. Among many other things, it is crucial for register assignment. You want to know each distinct value that will exist, and the lifetimes of those values, in order to give each a register. Then, if have more distinct values existing at one time than you have registers, you have to push stuff to the stack.
It is not critical for register assignment -- in fact, SSA makes register assignment more difficult (see the swap problem; the lost copy problem).
Lifetime analysis is important for register assignment, and SSA can make lifetime analysis easier, but plenty of non-SSA compilers (lower-tier JIT compilers often do not use SSA because SSA is heavyweight) are able to register allocate just fine without it.
One nice thing is that it makes register assignment polynomial (the coloring of SSA variables graph is polynomial since it's not an arbitrary graph).
The motivation and reason it works is also wrong anyway. Like i get it's a gentle intro, but i think there are ways to accomplish that without being egregiously history rewriting ;)
Ken zadeck was my office mate for years, so this is my recollection, but it's also been a few decades, so sorry for any errors :)
The reason of why it works is definitely wrong - they weren't even using rewriting forms of SSA, and didn't for a long time. Even the first "fully SSA" compiler (generally considered to be open64) did not rewrite the IR into SSA.
The reason it works so well is because it enables you to perform effective per-variable dataflow. In fact, this is the only problem it solves - the ability to perform unrestricted per-variable dataflow quickly. This was obvious given the historical context at the time, but less obvious now that it is history :)
In the simplest essence, SSA enables you to follow chains of dataflow for a variable very easily and simply, and that's probably what i would have said instead.
It's true that for simple scalar programs, the variable name reuse that breaks these dataflow chains mostly occur at explicit mutation points, but this is not always correct depending on the compiler IR, and definitely not correct you extend it to memory, among other things. It's also not correct at all as you extend the thing SSA enables you to do (per-variable dataflow quickly) to other classes of problems (SSU, SSI, etc).
History wise - this post also sort of implies SSA came out of nowhere and was some revolutionary thing nobody had ever thought about, just sort of developed in the 80's.
In fact, it was a formalization of attempts at per-variable dataflow they had been working on for a while.
I'd probably just say that as a gentle intro, but if you want the rest of history, here you go:
Well before SSA, it was already known that lower bounds on bitvector dataflow (the dominant approach at the time of SSA) were not great. Over the years, it turned out they were much better than initially expected, but in the end, worse than anyone wanted as programs got bigger and bigger. N^2 or N^3 bitvector operations for most problems. Incrementality is basically impossible[2]. They were also hard to understand and debug because of dependencies between related variables, etc.
Attempts at faster/better dataflow existed in two rough forms, neither of which are bound by the same lower bound:
1. Structured dataflow/Interval analysis algorithms/Elimination dataflow - reduce the CFG into various types of regions with a known system of equations, solve the equations, distribute the results to the regions. Only works well on reducible graphs Can be done incrementally with care. This was mostly studied in parallel to bitvectors, and was thought heavily about before it became known that there was a class of rapid dataflow problems (IE before the late 70's). Understanding the region equations well enough to debug them requires a very deep understanding of the basis of dataflow - lattices, etc.
In that sense it was worse than bitvectors to understand for most people. Graph reducibility restrictions were annoying but tractable on forward graphs through node splitting and whatnot (studies showed 90+% of programs at the time had reducible flowgraphs already), but almost all backwards CFG's are not reducible, making backwards dataflow problems quite annoying. In the end, as iterative dataflow became provably faster/etc, and compilers became the province of more than just theory experts, this sort of died[3]. If you ever become a compiler theory nerd, it's actually really interesting to look at IMHO.
2. Per-variable dataflow approaches. This is basically "solve a dataflow problem for single variable or cluster of variables at a time instead of for all variables at once". There were not actually a ton of people who thought this would ever turn into a practical approach:
a. The idea of solving reaching definitions (for example) one variable at a time seemed like it would be much slower than solving it for all variables at once.
b. It was very non-obvious how to be able to effectively partition variables to be able to compute a problem on one variable (or a cluster of variables) at a time without it devolving into either bitvector-like time bounds or structured dataflow like math complexity.
It was fairly obvious at the time that if you culd make it work, you could probably get much faster incremental dataflow solution.
SSA came out of this approach. Kenny's thesis dealt with incremental dataflow and partitioned variable problems, and proved time bounds on various times of partitioned/clustered variable problems. You can even see the basis of SSA in how it thinks about things. Here, i'll quote from a few parts:
" As the density of Def sites per variable increases, the average size of the affected area for each change will decrease...".
His thesis is (amont other things) on a method for doing this that is typical for the time (IE not SSA):
"The mechanism used to enhance performance is raising the density of Def sites for each variable. The most obvious way to increase the Use and Def density is to attempt a reduction on the program flow graph. This reduction would replace groups of nodes by a single new node. This new node would then be labeled with infcrmation that summarized the effects of execution through that group of nodes."
This is because, as i mentioned, figuring out how to do it by partitioning the variables based on dominance relationships was not known - that is literally SSA, and also because Kenny wanted to finish his thesis before the heat death of the sun.
In this sense, they were already aware that if they got "the right number of names" for a variable, the amount of dataflow computation needed for most problems (reaching defs, etc) would become very small, and that changes would be easy to handle. They knew fairly quickly that for the dataflow problems they wanted to solve, they needed each variable to have a single reaching definition (and reaching definitions was well known), etc.
SSA was the incremental culmination of going from there to figuring out a clustering/partitioning of variables that was based not on formally structured control-flow regions (which are all-variables constructs), but instead on the local dataflow for a variable, incorporating the part of the CFG structure that could actually affect the local result for that variable. It was approached systematically - understanding the properties they needed, figuring out algorithms that satisfied them, etc.
Like a lot of things that turn out to be unexpectedly useful and take the world by storm, history often later represents them as a eureka moment.
[1] Bitvectors are assumed to be fixed size, and thus constant cost, but feel free to add another factor of N here if you want.
[2] This is not strictly true, in the sense that we did know how to recompute the result incrementally and end up with a correct result; what was not possible was doing so provably faster than solving the problem from scratch.
[3] These approaches actually saw somewhat of a resurgence in the world of GPUs and more general computation graphs, where it all becomes heavily applicable again. However, we have almost always eventually developed easier-to-understand (even if theoretically slower) global algorithms and used those instead, because we have the compute power to do it, and this tradeoff is IMHO worth it.
I'm extremely uncertain of my history here, but my recollection is that SSA wasn't seen as practical until the development of the dominance frontier algorithm for inserting phi nodes, which seems to have been 1991.
Yes, this is a problem I see more and more often. Besides the fact that it's always a good habit to spell out an acronym or abbreviation the first time it's used in an article, there's also the fact that this is, you know, the web, and it's very easy to link to a Wikipedia article or something that will explain the term, if you don't want to bother adding your own explanation.
I don't think it really explains what it's for though. Here's my explanation (from my PhD thesis on compilers):
Static single assignment (SSA) form provides an efficient intermediate representation for program analysis [Cytron et al., 1989, 1991]. It was introduced as a way of efficiently representing dataflow analyses.
SSA uses a single key idea: that all variables in the program are renamed so that each variable is assigned a value at a single unique statement in the program. From this key idea, SSA is able to provide a number of advantages over other techniques of dataflow analyses:
Factored use-def chain: With a def-use chain, dataflow results may be propagated directly from the assignment of a variable to all of its uses. However, a def-use chain requires an edge from each definition to each use, which may be expensive when a program has many definitions and uses of the same variable. In practice, this occurs in the presence of switch-statements. SSA factors the def-use chain over a φ-node, avoiding this pathological case.
Flow-sensitivity: A flow-insensitive algorithm performed on SSA form is much more precise than if SSA form were not used. The flow-insensitive problem of multiple definitions of the same variable is solved by the single-assignment property. This allows a flow-insensitive algorithm to approach the precision of a flow-sensitive algorithm.
Memory usage: Without SSA form, an analysis must store information for every variable at every program point. SSA form allows a sparse analysis, where an analysis must store information only for every assignment in the program. With a unique version per assignment, the memory usage of storing the results of an analysis can be considerably lower than using bit-vector or set-based approaches.
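To make the key idea concrete, here is a minimal source-level sketch (hypothetical function and variable names; real SSA construction works on an IR and inserts φ-nodes at control-flow joins):

    // before: two assignments to x, so "which def reaches this use?"
    // requires dataflow analysis
    int before(int n) {
        int x = n * 2;
        x = x + 1;
        return x;
    }

    // after renaming: every name has exactly one static assignment,
    // so each use has exactly one reaching definition by construction
    int after(int n) {
        int x0 = n * 2;
        int x1 = x0 + 1;
        return x1;
    }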
I like this article a lot but it doesn't answer the question of "Why SSA?".
Sure, a graph representation is nice, but that isn't a unique property of SSA. You can have graph IRs that aren't SSA at all.
And sure, SSA makes some optimizations easy, but it also makes other operations more difficult. When you consider that, plus the fact that going into and out of SSA is quite involved, it doesn't seem like SSA is worth the fuss.
So why SSA?
Well, it turns out compilers have sequencing issues. If you view compilation as a series of small code transformations, your representation goes from A -> B, then B -> C, then C -> D and so on. At least, that's how it works for non-optimizing compilers.
For optimizing compilers however, passes want to loop. Whenever an optimization is found, previous passes should be run again with new inputs... if possible. The easiest way to ensure this is to make all optimizations input and output the same representation. So A -> B is no good. We want A -> A: a singular representation.
So if we want a singular representation, let's pick a good one, right? One that works reasonably well for most things. That's why SSA is useful: it's a decently good singular representation we can use for every pass.
While iterating optimizations is nice, I think you missed the main point of SSA.
SSA makes dataflow between operations explicit; it completely eliminates the original (incidental) names from programs. Because of that, all dataflow problems (particularly forward dataflow problems) get vastly simpler.
With SSA you can throw basically all forward dataflow problems (particularly with monotonic transformations) into a single pass and they all benefit each other. Without SSA, you have every single transformation tripping over itself to deal with names from the source program and introducing transformations that might confuse other analyses.
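As a toy illustration of "sparse" here (all names hypothetical): constant facts flow straight along the def-use edges of an SSA value list, with no per-program-point bitvectors, and since defs precede their uses in straight-line SSA, a single pass over the defs suffices:

    #include <cstdio>
    #include <vector>

    struct Value { char op; int a, b; bool isConst; int k; };
    // op 'k': the constant k; op '+': add of the values at indices a and b

    int main() {
        std::vector<Value> v = {
            {'k', 0, 0, true, 2},    // v0 = 2
            {'k', 0, 0, true, 3},    // v1 = 3
            {'+', 0, 1, false, 0},   // v2 = v0 + v1
            {'+', 2, 2, false, 0},   // v3 = v2 + v2
        };
        for (auto &x : v)            // defs dominate uses, so one pass is enough
            if (x.op == '+' && v[x.a].isConst && v[x.b].isConst) {
                x.isConst = true;
                x.k = v[x.a].k + v[x.b].k;
            }
        std::printf("v3 = %d\n", v[3].k);  // prints "v3 = 10"
    }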
I know we teach different compiler optimizations at different stages, but it's really important to realize that all of them need to work together and that having each as a separate pass is a good way to fail at the phase ordering problem.
And going further with the sea-of-nodes representation just makes them all more powerful; I really do recommend reading Cliff Click's thesis.
You're very correct but I suppose I was really answering why compilers centralize around SSA. It's a bold choice to choose one data structure for everything, and that requires more motivation than, "it makes certain optimizations really easy". Because again, it makes other stuff harder.
>And going further with the sea-of-nodes representation just makes them all more powerful; I really do recommend reading Cliff Click's thesis.
We might have to agree to disagree on this one. I actually found sea of nodes to be a boneheaded idea. It makes one or two optimizations a little more elegant but everything else a huge pain in the ass. At least, that was my experience.
It's funny, I've been working on a CPS (continuation passing style) graph optimizer loosely based on Cliff Click's thesis, and when I show it to the robots they're like "yeah, that's academic fluffery, it'll never fly" to try to inflate my ego a bit, until I explain how it actually works and then they're like "wait, that's actually possible ...and a good idea".
His 'optimization as a constraint solving problem' thing is actually pretty powerful, and it just so happens I've been fiddling with a Projective Dynamics constraint solver (which is what the VM is for, to define the constraints) which I can abuse to optimize CPS graphs so... take that, Robots!
In the basic-blocks-with-arguments variation there is no going into and out of SSA.
Phi-functions and block arguments are just different views of the same thing. Sometimes it is more convenient to use one or the other when thinking of a problem.
If you lay out phi-functions and their parameters on a grid, you'd get a "phi-matrix" where phi-functions are rows and block arguments are the columns.
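For example, a hypothetical block with two phis and predecessors b1 and b2, first in LLVM-flavored phi notation and then as the grid:

    x3 = phi [x1, b1], [x2, b2]
    y3 = phi [y1, b1], [y2, b2]

             from b1   from b2
      x3:      x1        x2
      y3:      y1        y2

Each row is one phi; each column is exactly the argument list the corresponding predecessor's branch would pass in the block-arguments view (b1 ends with something like "br b3(x1, y1)").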
If you don't do an out-of-SSA transform before register allocation, and effectively treat block parameters like function calls, then you're pushing the complexity to the register allocator.
An out-of-SSA transform before register allocation would coalesce not just registers but also variables in spill slots (thus avoiding memory-memory moves), it would reduce the complexity of parallel moves. A more advanced transform could also hoist moves out from before the hottest branch which could potentially lead to un-splitting previously split critical edges.
Every time I see a clean SSA explainer like this, I’m reminded that the “simplicity” of SSA only exists because we’ve decided mutation is evil. It’s not that SSA is simpler — it’s that we’ve engineered our entire optimization pipeline around pretending state doesn’t exist.
It’s a brilliant illusion that works… until you hit aliasing, memory models, or concurrency, and suddenly the beautiful DAG collapses into a pile of phi nodes and load/store hell.
SSA isn't about saying mutation is evil. It's about trivializing chasing down def-use rules. In the Dragon Book, essentially the first two dataflow analyses introduced are "reaching definitions" and "live variables"; within an SSA-based IR, these algorithms are basically "traverse a few pointers". There's also some ancillary benefits--SSA also makes a flow-insensitive algorithm partially flow-sensitive just by the fact that it's renaming several variables.
Sure, you still need to keep those algorithms in place for being able to reason about memory loads and stores. But if you put effort into kicking memory operations into virtual register operations (where you get SSA for free), then you can also make the compiler faster since you're not constantly rerunning these analyses, but only on demand for the handful of passes that specifically care about eliminating or moving loads and stores.
> pretending state doesn’t exist.
As a fan of a functional language, immutability doesn't mean state doesn't exist. You keep state with assignment --- in SSA, every piece of state has a new name.
If you want to keep state beyond the scope of a function, you have to return it, or call another function with it (and hope you have tail call elimination). Or, stash it in a mutable escape hatch.
Is it that we think it's evil, or that it's easier to deal with while guaranteeing some SSA-like pattern in the code? It's a fairly easy thing to agree to maintain, whereas if you are jazzier with it you can end up with either very fiddly passes or just constantly re-running DFA.
> we’ve decided mutation is evil.
SSA isn't (primarily) concerned with memory, it's concerned with local variables. It completely virtualizes the storage of local variables--in fact, all intermediate computations. By connecting computations through dataflow edges and not storage, it removes ordering (except that induced by dependence edges) from consideration.
It is, after all, what CPUs do under the hood with register renaming! They are doing dynamic SSA, a trick they stole from us compiler people!
SSA is appealing because you can defer the load/store hell until after a bunch of optimizations, though, and a lot of those optimizations become a lot easier to reason about when you get to pretend state doesn't exist.
You have it backwards. Modern compilers don't use SSA because it's "simpler", we use it because it enables very fast data-flow optimizations (constant prop, CSE, register allocation, etc.) that would otherwise require a lot of state. It doesn't "pretend state doesn't exist", it's actually exactly what makes it possible/practical for the compiler to handle changes in state.
As some evidence for the second point: Haskell is a language that does enforce immutability, but its compiler, GHC, does not use SSA for its main IR -- it uses a "spineless tagless g-machine" graph representation that does, in fact, rely on that immutability. SSA only happens later, once it's lowered to a mutating form. If your variables aren't mutated, then you don't even need to transform them to SSA!
Of course, you're welcome to try something else, people certainly have -- take a look at how V8's move to Sea-of-Nodes has gone for them.
Meanwhile Java HotSpot, where Sea-of-Nodes was originally made mainstream, and GraalVM are doing just fine.
The author of the Sea-of-Nodes approach is quite critical of V8's decision, as one would expect.
https://www.youtube.com/watch?v=Zo801M9E--M
To appreciate the “fast” part, nothing beats reading through LuaJIT’s lj_opt_fold.c, none of which would work without SSA.
Of course, LuaJIT is cheating, because compared to most compilers it has redefined the problem to handling exactly two control-flow graphs (a line and a line followed by a loop), so most of the usual awkward parts of SSA simply do not apply. But isn’t creatively redefining the problem the software engineer’s main tool?..
> take a look at how V8's move to Sea-of-Nodes has gone for them.
Are you implying it hasn't gone well? I thought it bought some performance at least. What are the major issues? Any sources I can follow up on?
V8 blog post from March: "Land ahoy: leaving the Sea of Nodes" https://v8.dev/blog/leaving-the-sea-of-nodes
Feedback from sea of nodes algorithm creator, https://www.youtube.com/watch?v=Zo801M9E--M
Fascinating, thanks!
The major issue is that approximately one person in the world understands it well enough to make it work in practice, and that kind of tech debt can't really be pushed too far.
SSA is not about hiding states.
It's about naming intermediate states so you can refer to them in some way.
> we’ve decided mutation is evil
The functional programmers have decided mutability is evil. The imperative programmers have not.
We functional programmers do crazy stuff and pretend we're not on real hardware - a new variable instead of mutating (how wasteful!) and infinite registers (what real-world machine supports that!?).
Anyway, there's plenty of room for alternatives to SSA/CPS/ANF. It's always possible to come up with something with more mutation.
Functional programming is actually good for performance. Clojure uses "persistent data structures", which have their own Wikipedia page. You can have a very complex object, like a long and wide JSON value with several levels of nesting. If you want to modify a tiny part of this object, you won't waste memory or time. The object won't be copied entirely. The parts that haven't changed will be reused.
This is only possible because the objects are immutable. If they were mutable, you wouldn't be able to trust that the parts that haven't changed won't change in the future. This means you can share these parts. This is good for memory and for parallelism.
If you're building a React app in JS, React has to check whether the object has changed deeply. If you're doing a React app with Clojure, this check is actually disabled. React will only use a single === comparison to know whether two objects are different or not, and this is basically an integer comparison (like a memory address comparison).
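A tiny sketch of the structural-sharing idea in C++ terms (hypothetical types; real persistent maps in Clojure are HAMTs, not linked lists):

    #include <cstdio>
    #include <memory>

    struct Node { int head; std::shared_ptr<const Node> tail; };
    using List = std::shared_ptr<const Node>;

    List cons(int x, List t) { return std::make_shared<const Node>(Node{x, std::move(t)}); }

    int main() {
        List xs = cons(1, cons(2, cons(3, nullptr)));
        List ys = cons(0, xs);   // "modified" list: O(1), shares all of xs
        // xs is immutable, so the shared tail can never change under ys,
        // and "did it change?" is just a pointer comparison:
        std::printf("%s\n", ys->tail == xs ? "shared" : "copied");
    }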
SSA form is a state representation. SSA encodes data flow information explicitly which therefore simplifies all other analysis passes. Including alias analysis.
What a ridiculous comment. No one says mutation is “evil.” It's just harder to optimise. Then all this talk of pretending and illusions, as if the compiler doesn't really work and is producing fake outputs. I assure you that other people do not take your strangely moralising tone to compiler optimisations.
> the “simplicity” of SSA only exists because we’ve decided mutation is evil.
Mutation is the result of sloppy thinking about the role of time in computation. Sloppy thinking is a hindrance to efficient and tractable code transformations.
When you "mutate" a value, you're implicitly indexing it on a time offset - the variable had one value at time t_0 and another value at time t_1. SSA simply uses naming to make this explicit. (As do CPS and ANF, which is where that "equivalence" comes from.)
If you don't use SSA, CPS, or ANF for this purpose, you need to do something else to make the time dimension explicit, or you're going to be dealing with some very hairy problems.
"Evil" in this case is shorthand for saying that mutable variables are an unsuitable model for the purpose. That's not a subjective decision - try to achieve similar results without dealing with the time/mutation issue and you'll find out why.
The shocking truth is that SSA is functional! That's right, the compiler for your favourite imperative language actually optimizes functional programs. See, for example, https://www.jantar.org/papers/chakravarty03perspective.pdf. In fact, SSA, continuation passing style, and ANF are basically the same thing.
No they're not.
The essence of functional languages is that names are created by lambdas, lambdas are first class, and names might not alias themselves (within the same scope, two references to X may be referencing two instances of X that have different values).
The essence of SSA is that names must-alias themselves (X referenced twice in the same scope will definitely give the same value).
There are lots of other interesting differences.
But perhaps the most important difference is just that when folks implement SSA, or CPS, or ANF, they end up with things that look radically different with little opportunity for skills transfer (if you're an SSA compiler hacker then you'll feel like a fish out of water in a CPS compiler).
Folks like to write these "cute" papers that say things that sound nice but aren't really true.
The whole ad-hoc mechanism of phi-nodes in SSA can be replaced by local blocks with parameters. A block that can take parameters is not that different conceptually from a lambda.
Local blocks with parameters is the gross way to do it. The right way to do it is Phi/Upsilon form.
https://gist.github.com/pizlonator/cf1e72b8600b1437dda8153ea...
But even if you used block arguments, it's so very different from a lambda. Lambdas allow dynamic creation of variables, while SSA doesn't. Therefore, in SSA, variables must-alias themselves, while in the lambda calculus they don't. If you think that a block that takes arguments is the same as a lambda because you're willing to ignore such huge differences, then what's the limiting principle that would make you say that two languages really are different from one another?
Remember, all Turing complete languages are "conceptually the same" in the sense that you can compile them to one another
Phi/Upsilon is even more obviously equivalent to blocks with parameters than phi nodes were. It's storing to a "shadow variable" that you "load from" in the phi, i.e. it's exactly the same "store to ABI specified location in caller, load from it in callee" as a function call.
> Phi/Upsilon is even more obviously equivalent to blocks with parameters than phi nodes were
Then you don't understand Phi/Upsilon
It does seem to rhyme with how function calls work under the hood.
When people say equivalent, it is usually not in the mathematical sense, which would be meaningless here because all forms are turing complete anyway.
Take equivalent as--similarity that helps you understand A given you know B, and perhaps some of the wisdom from B translates to A.
> It does seem to rhyme with how function calls work under the hood.
You're just overindexing on the fact that block "arguments" are called "arguments". In SSA with block arguments, you can pass data to a block without passing it as an argument. The arguments are just a way of expressing Phis.
And in Phi/Upsilon, this is valid:
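    // hypothetical reconstruction of the elided snippet, in the
    // Phi/Upsilon notation of the gist above: the Upsilon writes the
    // shadow variable of a Phi that appears later in the same block
    a = 42
    Upsilon(a, ^p)
    p = Phi()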
Note how I'm using an upsilon to set the shadow variable of a Phi that's in the same block. You can't do that with block arguments. These things just aren't the same at all. Saying that they are the same just shows that you don't get it.
> When people say equivalent, it is usually not in the mathematical sense, which would be meaningless here because all forms are turing complete anyway.
But they're not equivalent even in a mathematical sense.
> Take equivalent as--similarity that helps you understand A given you know B, and perhaps some of the wisdom from B translates to A.
If you want to understand these things, then focus on how they are so very different rather than brain damaging yourself into thinking they are similar
> You're just overindexing on the fact that block "arguments" are called "arguments"
But that's what it is. Briefly stated, your "upsilon" is just picking the 'actual' argument for the 'formal' parameter in a matching "phi". That's exactly what a function call does, and this holds even though you've intentionally decoupled the "upsilon" and "phi" nodes from any control-flow "jump" construct.
> Note how I'm using an upsilon to set the shadow variable of a Phi that's in the same block. You can't do that with block arguments.
Classically, the phi node would sit at the top of a block anyway, and this arrangement helps significantly in computing dominator set properties, renaming and use-def chains etc. etc. Giving up that property makes everything more awkward, including proofs of correctness for transformations, minimality, etc.
In CS-411 we teach both SSA and parameterized blocks, with more emphasis on SSA. IMHO the main advantage is that phis are connected to their "call" sites and argument values. Blocks aren't first class, there's no need to decouple the names of their inputs from the actual inputs. SSA makes those dataflow connections explicit, which is obviously superior.
I've never been more confused than working on Maxine VM's CPS optimizing compiler.
I've never actually looked at that compiler, so can't comment on it, but have you read Appel's "Compiling with Continuations"? It motivates and explains the process very clearly. Add ANF into the mix and it's a very straightforward, well-defined system - much more so than SSA.
The author of that compiler used that book as a reference in developing it; that book was open on his desk daily.
It had an immutable IR and literally every one of the dozens of passes made basically an entire copy of it. Needless to say, that was slow and memory hungry. It was difficult to follow the transformation passes unless well-versed in CPS.
We replaced that with a clone of the C1 compiler, C1X, which was basically just a Java rewrite. C1X supposedly still exists in MaxineVM, but has been superseded by the Graal compiler in Truffle/Graal, which is a hybrid CFG/sea-of-nodes style compiler based on SSA.
> It had an immutable IR and literally every one of the dozens of passes made basically an entire copy of it.
Yeah, this is what CPS is really all about. It's not like SSA at all.
Also, it's just a dumb way to write a compiler, and so nobody does that to themselves anymore, now that SSA is widely known.
The goofy "hot" takes about how CPS and SSA are the same are usually written by CPS apologists who want to make the rest of us believe that their work on CPS is somehow relevant when it really isn't
Whether they're exactly the same thing I don't know... but a long time ago, I remember reading that SSA and CPS were isomorphic, with CPS basically being used for functional languages.
edit: actually even discussed on here
CPS is formally equivalent to SSA, is it not? What are advantages of using CPS o... | Hacker News https://share.google/PkSUW97GIknkag7WY
CPS has bookkeeping related to nested structures, whereas SSA has the dominator tree.
CPS is usually higher order and SSA usually first order: the continuation you're passing in CPS is probably a closure with some state attached, whereas in SSA you'd have expanded that to pass a function plus an explicit state argument, making the allocation explicit.
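A toy contrast of those two styles (hypothetical names), defunctionalizing the continuation so its captured state becomes an explicit argument:

    #include <cstdio>
    #include <functional>

    // higher-order CPS: the continuation is a closure, its state is implicit
    void addCPS(int a, int b, const std::function<void(int)> &k) { k(a + b); }

    // first-order, SSA-like: a plain function plus an explicit state argument
    struct KState { int offset; };
    void applyK(KState s, int result) { std::printf("%d\n", result + s.offset); }
    void addFO(int a, int b, void (*k)(KState, int), KState s) { k(s, a + b); }

    int main() {
        int offset = 10;
        addCPS(1, 2, [offset](int r) { std::printf("%d\n", r + offset); });  // closure allocation is implicit
        addFO(1, 2, applyK, KState{10});                                     // the "closure" is now explicit data
    }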
I think the big thing in favour of CPS was that it came first, but I could be wrong about that. The lambda people came up with CPS and the imperative people came up with SSA. A while later Appel pointed out that they're views of very similar things. https://www.cs.princeton.edu/~appel/papers/ssafun.pdf is worth reading if you haven't seen it yet.
My experience with SSA is extremely limited, so that might be a stupid question. But does that remain true once memory enters the picture?
The LLVM tutorials I played with (admittedly a long time ago) made it seem like "just allocate everything and trust mem2reg" basically abstracted SSA away pretty completely from a user's point of view.
I think you're asking if SSA is still a functional language in the presence of mutation of memory. Both Scheme and ML (of Standard ML and OCaml fame, not machine learning) allow mutation, so mutation doesn't seem to disqualify a language from being functional.
This naturally leads to the question "what is a functional language?" I've written my thoughts on what FP is at [1]. I argue that FP is about local reasoning and composition. The former is most relevant here: local reasoning means it's easy to reason about code. This is exactly why SSA is used in compiler: it makes it easy for the compiler to reason about code and therefore which optimizations are valid. This is the same argument given in these comments: https://news.ycombinator.com/item?id=45674568 and https://news.ycombinator.com/item?id=45678483
[1]: https://noelwelsh.com/posts/what-and-why-fp/
In MLIR, there are two representations of memory, `tensor` and `memref`, which enables you to do some high-level things[0] in SSA before "bufferizing" to memrefs, which are eventually lowered to LLVM pointers.
[0]: https://mlir.llvm.org/docs/Dialects/TensorOps/
If you're hell-bent on functional style, you can represent memory writes as ((Address, Value, X) -> X), where X is a compile-time-only linear type representing the current memory state, which can be manipulated like any other SSA variable. It makes some things more elegant, like two reads of the same address naturally being the same value (as long as they read from the same memory state). It doesn't help at all with aliasing analysis or write reordering, though, so I don't think any serious compilers do this.
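A minimal sketch of that style, with a hypothetical Mem type standing in for the linear state X (C++ can only approximate linearity with a move-only type, so "consumed exactly once" is by convention here):

    #include <cstdio>
    #include <map>
    #include <utility>

    struct Mem {                          // stand-in for the linear type X
        std::map<int, int> cells;
        Mem() = default;
        Mem(Mem &&) = default;            // may be consumed (moved) once
        Mem(const Mem &) = delete;        // never duplicated: "linear" by convention
    };

    Mem store(Mem s, int addr, int val) { s.cells[addr] = val; return s; }
    int load(const Mem &s, int addr) {
        auto it = s.cells.find(addr);
        return it == s.cells.end() ? 0 : it->second;
    }

    int main() {
        Mem s0;
        Mem s1 = store(std::move(s0), 16, 42);   // each write threads the state
        // two loads of the same address from the same state are trivially equal:
        std::printf("%d %d\n", load(s1, 16), load(s1, 16));
    }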
It helps _lots_ with aliasing analysis if you've got multiple instances of X representing disjoint areas of the address space. Could call them "heaps".
The sea-of-nodes representation renames memory as just another dependence edge in a somewhat analogous manner.
https://lwn.net/Articles/84888/ - a bit of compiler history on GCC's move to SSA way back when.
Forget compilers, SSA is an immensely valuable readability improvement for humans, too.
Why have
when you could have
Try mem2reg on that to get rid of the loads and stores.
The second code snippet doesn't use SSA. It just translates the first loop into IR and mangles the variable names. Here is an SSA version of that in the Scheme language.
Notice that this is stateless and also returns the final value of “c” from the loop. People who use the below style have tended to find that it is much easier to reason about for more complicated looping structures.
I intended it to be mostly a joke. Many people equate LLVM IR with SSA.
I might even argue that easy to read and easy to reason about are opposites.
For the most part, languages like Python and Ruby can be very easy to read, but it can be difficult to understand precisely what actually happens at runtime.
Things like LLVM IR are much more explicit, making it easier to reason about but extremely difficult to read.
Maybe somewhere between is "pure" side effect free functional programs.
SSA makes me think of a few interesting points:
Considering that SSA is a functional language (bar the memory access bits), and that most procedural languages can target it, we can say that a lot of procedural code can be compiled down to functional code - so procedural programming is syntactic sugar on top of a functional framework.
Also, functional programmers have a couple of pet peeves, tail recursion as the way to implement unbounded recursion and loops being one of them. SSA uses a completely different paradigm - phi nodes - to achieve looping. I don't particularly like tail recursion: the naive recursive implementation of fibonacci, for example, is not tail recursive (and thus dangerous, a property that a functional program should never have), and making it tail recursive looks very much like a transformation a compiler should do for you, which goes against the spirit of functional programming.
I think functional programmers should think of other constructs to achieve unbounded recursion or looping; since I suspect there are infinitely many possible ones, I guess we could discover some that are less inherently dangerous and easier to reason about as mere mortals.
The transition from one basic block to another is to copy the live values to some common location, jump the instruction counter, then copy the values out of that location to give the results of the phi nodes.
The transition from one function to another is to copy the arguments to some common location, jump the instruction counter, then copy the values out of that location to give the initial values of the parameters.
These are the same thing, except that the branch in the first case is always at the end of the block. In the tail position.
Your preferred looping construct of branching between basic blocks is _identically_ tail calls between functions, except that some of the block arguments are represented as phi nodes. This is because the blocks are functions.
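A sketch of that correspondence (hypothetical names): a counting loop written as a self-tail-calling function whose parameters play the role of the loop header's phi nodes / block arguments:

    #include <cstdio>

    static int loop(int i, int acc) {      // the loop-header "block"
        if (i == 0) return acc;            // the exit "block"
        return loop(i - 1, acc + i);       // the back edge: a call in tail position
    }

    int main() { std::printf("%d\n", loop(5, 0)); }  // prints 15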
I see your point - tail calls are identical to MLIR-style block arguments, where the value is not determined by the branch source but passed explicitly as an argument.
I still think that tail recursion is too low-level a construct for functional programmers to interact with directly, unless it's wrapped into something like a reduce operation, which lets you write a fibonacci implementation identical to the one you would write in a procedural language.
That "tail recursion" is a thing in it's own right is more a sign of people getting their implementation wrong than anything else.
Specifically, if calling a function at the end of your current function _doesn't_ clean up the call stack first, what you've got is a space leak. That C's ABI on various architectures gives you this behaviour is a bad thing. The space leak sucks there too. You get functions with a tailcall: label at the top and `goto tailcall;` written in the body as a workaround when functional programmers collide with this.
Related is RAII in C++ - implicitly calling things on leaving scope mostly means the function call in the "tail" position has to do more stuff after it returns, so it isn't in the tail position, and you burn memory on keeping track of the work still to do. This takes the initial mistake from C and really leans into it.
If your language doesn't make that implementation mistake, you just have function calls that work fine. Tree style recursion is still a problem, but it's an out of memory one and at least only one side of the tree is eating memory, and in proportion to the book keeping needed to find the other side of the tree.
In the olden days, _function calls_ in Fortran didn't work, because they put their local state in global memory, not on a stack. So if foo called bar called foo, broken. We don't tolerate that any more, everyone has a call stack. But we still tolerate "ah, let's just leak this stack frame for a while, we've always done it like that" at present.
If you want automatic implicit tail calls for literally everything, then you need a solution for code like this (a hypothetical reconstruction of the elided snippet):
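    struct FooObject { int x = 0; void bar() { x++; } };  // bar() uses "this"

    void f() {
        FooObject foo;
        return foo.bar();  // if compiled as a true tail call, f's frame
                           // (including foo) is gone while bar() runs
    }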
ending up in a UAF when FooObject::bar() tries accessing the receiver "this". Or any other case of the tail function accessing a pointer to something the caller has put on the stack. Short of some kind of crazy dependency tracking (or shoving stuff onto the heap and using a GC), at the end of the day the programmer will have to explicitly work around the stack frame no longer existing in most return statements. To which the only workaround would be doing some circumlocutions specifically to avoid the tail-call recognition.
Good example. It's a little clearer if desugared:
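    // hypothetical desugaring: the member call becomes a free function
    // taking the receiver as an explicit pointer into f's frame
    struct FooObject { int x = 0; };
    void FooObject_bar(FooObject *self) { self->x++; }   // stands for FooObject::bar

    void f() {
        FooObject foo;
        return FooObject_bar(&foo);  // &foo points into the frame that a
                                     // real tail call would already have freed
    }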
The foo object needs to be somewhere where its address means something, because C++ passes 'this' by pointer.
The answer to this is to look at the current stack frame, shuffle everything that needs to stay alive to one end of it, move the stack pointer to deallocate the rest, and then jump to bar, where bar is now responsible for deallocating N bytes of stack before it continues onwards.
It's a pain, sure, but in the scheme of compiler backends it's fine. They do very similar things on optimisation grounds, e.g. "shrink wrapping" means holding off on allocating stack in case an early return fires and you don't need it after all.
Though, when memory use is a function of escape analysis, which is a function of how much time you gave the compiler to work with, I do start to empathise with the make-the-programmer-do-it as the solution.
Smallest of nitpicks: the depicted multiplier is a 2-bit multiplier. A one-bit multiplier is just an AND gate.
Static single assignment (SSA)
That minimap is wild, it duplicates the entire post but every word is surrounded by a span. I thought maybe it was like a bitmap or something but no:
SSA is a representation of Cartesian closed categories that looks like normal code, and Cartesian closed categories model computation. You don't strictly need SSA, and modern compiler frameworks like MLIR make better choices: they still mostly use SSA, but with more sanity when it comes to basic block boundaries (treating blocks as having arguments, instead of using phi nodes).