Tree-shaking is such a bad misnomer. Virgil's compiler calls this "reachability analysis" and it's built into the compilation model. The compiler will parse and typecheck a program's (and libraries' code), and run initializers, but after that the compilation proceeds by exploring from the main entrypoint(s) and only reachable code is analyzed and ends in the final binary. It will happily generate a program (without runtime system) that just consists of a single main function. The runtime system is only necessary for stacktraces and GC, and can be omitted if desired.
The name "treeshaker" for an application delivery tool possibly originated in Lisp. The first time I've found it was in Lucid Common Lisp, a (no longer available) commercial implementation of Common Lisp for UNIX. Lucid CL 4.1 in 1992 included a tool called Treeshaker. Lucid CL was one of those Lisps which have the idea of an image (see for example the "image" feature of Lisp 1 in 1960), a saved memory dump of the current heap of a running Lisp. An application then is an image plus a runtime. The image typically includes almost all code AND data from the memory, which sometimes creates the wish to create smaller images for application delivery. Lucid CL then included a tool, which before saving that image, removes all kinds of unused code and data, depending on some meaning of what „unused“ means. For example the symbol table may have variables, types, functions, etc. which are not referenced anywhere. A treeshaker tool might also get a list of things to remove. Thus in a graph of reachable Lisp data&code, the connections are pruned. Either a GC or a specialized piece of code then collects the garbage, shrinks memory (-> shakes the tree and everything which is cut loose is falling down) and dumps it as a possibly smaller image.
The treeshaker thus was not a compiler tool, but a tool to remove what was determined to be unused code of a Lisp heap. Remember, by default such a Lisp image would also contain a compiler, an interpreter and an implementation of a read-eval-print loop. Thus if we break (-> interrupt a thread which then provides a REPL) a running program into such a read-eval-print loop, we could still use all the code, which is in this heap (which was restored from an image). Thus it would make sense to remove the compiler too, and possibly the read-eval-print loop, too.
I could be wrong, but I've heard that the term originated in Smalltalk. Smalltalk is also image-based, and tree-shaking was a way to produce a smaller image for deploying.
From above: "In contrast, tree shaking uses the approach of eliminating --shaking out-- what is not needed in a static fashion. Programmers specify the requirements of their application and the unneeded parts of the Lisp system are removed using their detailed knowledge of the application. The disadvantages of tree shaking are that it requires programmer intervention and that a function which is shaken out cannot be easily restored."
> Most Lisp development systems, including Lucid’s, provide all the resources of the Lisp system by default, and this in turn leads to a style of development in which the programmer makes use of whatever tool happens to be most convenient. Because much of the basic Lisp system (or any development system built on top of the basic Lisp system) will generally be unused by a given application, it is very worthwhile to have a tool for excising these unused parts. This tool is called the Treeshaker.
> Treeshaker execution occurs in three phases: walking, testing and writing. In the walking phase, the Treeshaker accumulates a set of objects that need to be included in the saved image. After making this set, the treeshaker runs a test of the application to check that all objects which are used in a typical run have been included. The writing phase then generates an executable image which will run the application.
> To a first approximation, the walk phase is just a matter of computing the connected component of the Lisp image (treated as a directed graph in the obvious way) generated by the application’s toplevel function. However, because of the way that Lisp objects are generally connected this usually includes almost the entire Lisp image including the unused subsystems. Therefore the treeshaker uses several techniques to find connections between objects that do not actually need to be followed in the walk."
> "The name Treeshaker is meant to be evocative of the idea of actually shaking a tree to dislodge dead branches or other trash."
On the more funny side, LispWorks has a keyword to the DELIVER function :shake-shake-shake , which invokes the treeshaker during application delivery.
The correct general term is “dead-code elimination” [0]. Reachability analysis is commonly used do determine what code can be removed, but other methods are possible, and the analysis by itself doesn’t remove any code, that’s a subsequent step.
“Tree-shaking” as commonly used implies that the granularity of the removal is functions, whereas dead-code elimination can generally be at arbitrarily fine levels, for example eliminating branches of conditional expressions, and can be based on all kinds of static program analyses.
Although true, in most cases when people talk about dead code elimination, they refer to eliminating code inside a function, whereas tree-shaking unambiguously refers to inter-procedural dead code elimination.
Probably because traditionally, unused 'objects' in static link libraries were ignored in the first place by linkers. But with LTO the term dead-code-elimination makes sense IMHO (since the LTO pass will drop any code and data that ends up being unused).
The reason I didn't like the name when I first came across it is that I think of shaking the tree for harvesting the fruit. The fruit is what you want, not what you want to eliminate.
Dead branches and loose leaves are still connected to a real tree, that's why it's a misnomer. "Raking" would be a better name if you want to keep to the metaphor. Dead code elimination is the most precise term, and is already well established.
dead branches and loose leaves are connected to a real tree in the same way that dead code is connected to the program in a file. if you break off the dead branch, nothing happens to the tree, just like when you remove unused code, nothing happens to the program.
I learned of tree shaking first, and DCE is clearly superior as a term: 1) it's actually descriptive, 2) there's a large literature using this term to look for further information, and 3) as the original poster noted, it's not actually a misnomer. There is literally no advantage to the new term that I can think of.
Tree-shaking is specifically the form of DCE where you remove unreferenced functions/modules.
Especially important is that you can do tree-shaking without analyzing control flow, while by default "DCE" implies you're analyzing control flow.
And I don't think tree-shaking is a misnomer. Depending on how you visualize the metaphor, unreferenced functions are either barely attached or not attached. Shaking them off is simple and sufficiently realistic.
Tree shaking is the established lisp term, but not used for compilers, but packagers, to shrink images. Dead branches are NOT stored in the pruned image.
Dead code elimination came 20 years later with C compilers.
The problem with treeshaking - I wrote my first for my lisp 30 years ago, it was trivial - is the lack of compile-time evaluation. The more the compiler knows, the more it can prune. Every run-time branch, late binding and esp. dynamic call by string kills it. With simple tricks you can eliminate 90% of your code. IO, error handling, the number tree, lots of slack in the stdlib's. With GUI even more. I heard from CL images shrinked from 2GB down to a floppy disk.
DCE is a standard compiler optimization that's been been around since at least the 80s. It's done in multiple different ways, and "tree shaking" is just one more way. Whether the term DCE is known doesn't seem relevant to what is the most descriptive and meaningful term for this optimization.
I can't speak to origins, but currently in the JS world DCE and tree-shaking refer to different things. "Tree-shaking" normally refers to when the bundler omits unreachable code, that a more naive bundler would have included. It's an oft-discussed topic because it wasn't possible to do in some earlier module formats, and some bundlers do it better than others. In this context the "tree" mostly refers to the dependency tree.
In contrast DCE usually refers what the JS engine does at runtime, via whatever means. But DCE isn't much discussed, unless one talking about v8 internals or the like.
DCE has been around at least since 1971 Frances E. Allen's "A catalogue of optimizing trasformations", but I don't have her earlier papers/internal ibm memos to be able to say, but the section on DCE in that paper didn't appear to contain any further references.
So in your mind, a compiler pass is somehow supposed to be similar to the wind blowing, and despite the fact that "tree shaking" makes no reference to the wind or dead branches or leaves, you think anyone seeing this term will immediately make this "completely obvious" connection, and to you, this makes more sense than simply calling an optimization a straightforward, self-explanatory name like "reachability analysis" or "dead code elimination".
It's no wonder naming things is considered one of the hardest things in computer science if this kind of convoluted argument makes sense to people.
Virgil's reachability analysis also applies to initialized objects and their fields. I.e. it will only include the reachable parts of the heap and will remove fields that are never accessed from objects. This analysis happens on the entire program at once.
Proving yet again that there aren’t enough gardeners in computer science.
The metaphor we use for optimization is “low hanging fruit” which no orchard owner would ever do. It’s massively wasteful, be you a programmer or a farmer. It’s what amateurs do. I do tree shaking. Pick a tree (subject matter in the code) and get all of the fruit that’s willing to fall off before moving to the next. It’s more efficient, more effective, and more sustainable. It works with human factors instead of against them.
What we call tree shaking is more like tree pruning. Specifically what you’d call thinning (tracing a misplaced or damaged branch back to the parent and cutting it there).
> The metaphor we use for optimization is “low hanging fruit” which no orchard owner would ever do.
This is just over-extending the metaphor.
The term has existed long before software was a thing, and refers simply to grabbing something that's easy. That's it.
The same goes for tree-shaking. The author does the same thing - over-extends the metaphor. Tree shaking simply means giving everything a jiggle and seeing what comes loose. It's easy to understand and shouldn't be read into any more than that.
I think the idea behind calling it "shaking" is these branches and leaves that are already cut (inaccessible from the root), and just need a strong breeze to shake the tree and make them fall out.
I believe low hanging fruit is not specific to CS.
Regardless, it is a perfect metaphor. You want to eat an apple: which one do you pick? Taking the low-hanging fruit is less work right now and gets you to your immediate goal, but disregards general efficiency. Sure, picking a whole tree is more efficient. But if you want a single apple, taking the low hanging fruit is the fastest approach. The metaphor works because it actually implies that it's not the most efficient approach, just the easiest
Everyone, even the amateurs and complete non–gardeners, know that this is not the best way to pick fruit. The whole point of the phrase is to point out that someone was lazy. It is saying that all they did was the absolute minimum amount of work.
You think of orchard owners, but that's not the image it conjures for me. I remember the neighborhood of my childhood. The fruit trees there were 90% decoration. Usually someone would pick one (1) fruit whenever they felt like getting one. Most of the fruits would never get picked at all. The ground would be full of overripe and rotting fruit, until someone could be bothered to clean it up.