The most difficult part about code deletion is practicing the Chesterton's Fence principle:
> In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

(https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence)
While this tool certainly does the job of proposing code deletions, that's the easier part. The harder part is knowing why the code exists in the first place, which is necessary to know whether it's truly a good idea to remove it. Google, smartly, is leaving that part up to a human (for now).
"This code has been dead for six months" is a very good heuristic that the code is not relevant. I do occasionally reject the sensenmann CLs, but only very very rarely. This isn't weird code that nobody knows why it exists but it is currently doing something. This is code that cannot execute.
Code that only triggers on a yearly holiday, disaster alerts, leap years, or the like would go unused for longer periods and would likely be very problematic if removed. Unless by dead code you mean unreachable code, in which case it shouldn't exist in the first place and I agree it should be removed.
Dead code is shit that has a build rule which no other build rule in the entire repo depends on. Or private functions that have no callers, in an application that isn't built with any kind of reflection capability.
Or code that sits behind `if (false)`, after a `System.exit`, or past a null pointer dereference. Finding those kinds of things is one of the best parts of modern compilers; it's nice that they've extended that to the whole ecosystem.
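Real compilers do this far more thoroughly, but the idea of flagging statements that can never execute can be sketched in a few lines. This is a toy Python pass (using the stdlib `ast` module, with a hypothetical `sample` snippet) that catches code after an unconditional `return`/`raise` and code inside `if False:` blocks:

```python
import ast

def find_dead_statements(source: str) -> list[int]:
    """Return line numbers of statements that can never execute:
    code after an unconditional `return`/`raise`, or inside `if False:`."""
    dead = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Statements following a `return` or `raise` in the same block
        # are unreachable.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            seen_exit = False
            for stmt in body:
                if seen_exit:
                    dead.append(stmt.lineno)
                if isinstance(stmt, (ast.Return, ast.Raise)):
                    seen_exit = True
        # The body of a literal `if False:` can never run.
        if (isinstance(node, ast.If)
                and isinstance(node.test, ast.Constant)
                and node.test.value is False):
            dead.extend(stmt.lineno for stmt in node.body)
    return sorted(dead)

sample = """
def f():
    return 1
    print("never runs")

if False:
    print("also never runs")
"""
print(find_dead_statements(sample))  # [4, 7]
```

A production compiler additionally tracks calls that never return (like `System.exit`) and dataflow facts (like a guaranteed null dereference), which this sketch ignores.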
Yes, the nice thing about blaze/bazel + sensenmann is that you can very accurately say "this code was not built into a binary that has run in the past 6 months".
Sometimes you still want it (e.g. python scripts that are used every once in a while for ad-hoc things and might go months between uses), but usually the right thing to do is productionize stuff like that slightly more (and also test it semi-regularly to make sure it hasn't broken).
Nah, there's stuff that scans the entire repo regularly for all kinds of interesting purposes, and that's ignoring the fact that `atime` isn't available, or a source of truth, in piper.
Conceptually, I believe this could be wrong in both directions: because build artifacts are heavily cached, you can build a transitive dependency of some file without actually reading the file (and potentially keep doing so for a relatively long period, though I don't think that happens in practice), and tooling will regularly look through large swaths of files that aren't necessarily run.
Because piper[1] isn't a normal filesystem. It's often accessed through a FUSE-based API, so it appears like a filesystem to some users some of the time, but it can also be accessed over an RPC API. So the concept of "atime" isn't really a fit, because you access the filesystem from a view based on your workspace (think of the git commit hash being part of the filepath), and the "real" underlying file isn't necessarily on your machine.
So under a reasonable definition of atime, the atime of most files is never, because on any given arbitrary commit, you don't access/build everything.
> you cache an entire file instead of just reading it
You don't cache the file, you cache the file's outputs, keyed by a hash of the file (or of all the files which are deps of a particular output). With bazel, you have a shared build artifact cache that can securely and reliably be shared across every user at a company with tens of thousands of engineers. If I build some target `:foo`, which corresponds to `foo.o`, generated from `foo.c`, but someone built `:foo` five minutes ago, then as long as `foo.c` hasn't changed (and you can check that without reading the file, because the fancy filesystem stores a hash of the file alongside the actual file), I won't actually read `foo.c` or go through the motions of building `:foo`. I can just pull `foo.o` directly from the object cache and use it in any dependencies, which means I can build my output without ever invoking a compiler, which is really cheap and fast.
You could argue that upon reading the hash you should update the atime, but that's extremely expensive since now you have to do writes instead of idempotent reads a bunch of times a second.
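The content-hash-keyed cache described above can be sketched as a toy model (the `ArtifactCache` class and "object code" payload are invented for illustration; real bazel caching is far more elaborate):

```python
import hashlib

class ArtifactCache:
    """Toy content-addressed build cache: outputs are keyed by a hash of
    their input files, so an unchanged input never triggers a rebuild
    (or even a read, if the filesystem already stores the file's hash)."""

    def __init__(self):
        self._store = {}        # hash of inputs -> cached output
        self.compile_calls = 0  # counts actual "compiler" invocations

    def _key(self, sources: dict[str, bytes]) -> str:
        h = hashlib.sha256()
        for name in sorted(sources):
            h.update(name.encode())
            h.update(sources[name])
        return h.hexdigest()

    def build(self, sources: dict[str, bytes]) -> bytes:
        key = self._key(sources)
        if key not in self._store:
            self.compile_calls += 1  # cache miss: actually "compile"
            self._store[key] = (b"object code for "
                                + b",".join(n.encode() for n in sorted(sources)))
        return self._store[key]

cache = ArtifactCache()
srcs = {"foo.c": b"int main() { return 0; }"}
out1 = cache.build(srcs)    # miss: invokes the "compiler"
out2 = cache.build(srcs)    # hit: no compiler invocation at all
print(cache.compile_calls)  # 1
```

The key property is exactly the one the comment describes: the second build never touches a compiler, and in a real system it wouldn't even read `foo.c`, only its stored hash.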
I mean, that's functionally what they do, just more accurately: it isn't read time but the time the code was last built into a binary, since that information is accessible, and since the object cache isn't filesystem-like and doesn't have an "atime".
You'd also still need to map back from a cached output like `foo.o` to every transitive dependency of it, of which there could be thousands. That can be done, but it requires invoking the build system, so it's nothing like just checking a timestamp on a file.
Nah, like conceptually they are both writing serialized bytes and like indexing them so they can conceptually be found easily and like read back. This is probably why you didn't like explain why they are different.
Different industries I guess. The new financial year comes around every 12mo. Good luck explaining to the accountants that you deleted their end-of-year reconciliation reports because they didn’t run them every 6mo.
I would expect that those binaries would be built and deployed constantly, even if they just woke up once a day to check whether it was mid-April or whenever the financial year ended. I would also expect that they would be integrated with other reporting that happened on a daily, weekly, monthly, and quarterly basis.
Do you really have an entire application binary that is only deployed and run once a year?
> Do you really have an entire application binary that is only deployed and run once a year?
I have ones which are run when the need arises. It could be a year, it could be ten. But I know they work and I don't need to touch them, nor would I want to delete them.
At Google there’s not really such a thing as code that never changes. Your dependencies will shift beneath you, even if it’s just the compiler itself. More likely though the libraries you use have been updated, a robot or two have come through and applied automated fixes etc etc.
So if we’re using it once a year, (we do have tools like that, maybe not strictly annually but certainly rarely) then you do indeed want it to be automatically built on some kind of recurring basis so it’s ready to go when you need it. And the tooling makes it (relatively) easy to have that happen.
If your once-a-year binary has any dependencies in common with others, then you need to at the very least be rebuilding and re-running the tests when those shared dependencies change, or else you'll find at the end of the year that it doesn't build, let alone work. And the compiler and stdlib count as shared dependencies.
The fact that you can so very straightforwardly declare that code which hasn't been used in 6 months is irrelevant is extremely revealing about what leads to the deplorable state of "modern" software development.
Google's response to Chesterton's Fence is: "if you liked it, then put a test on it".
I used to update the internal version of numpy at Google, and if people asked me to roll back after I made my update (having fixed all the test failures I could detect) and they didn't have a test, well, that's their problem. The one situation where that rule wouldn't apply is if I somehow managed to break production and we needed an emergency rollback.
I shed a tear when some of my old, unused code was autodeleted at Google, but nowadays my attitude is: HEAD of your version control should only contain things which are absolutely necessary from a functional selection perspective.
At Google, because everything is in the mono repo and nearly everything around it is in some level of flux, if you don’t have a test for your code, it gets broken pretty regularly.
Most teams are pretty understanding when they break you on some untested code, and will work with you to fix it, but the very first step in working together is for you to write a test that shows the issue.
The resultant culture is heavily biased toward writing tests before something bad happens to you.
I don’t think you understand Sensenmann fully based on this post. At Google basically everything in use has a Bazel-like build target. This means the code base is effectively a directed “forest”/tree-like data structure with recognizable sources and sinks. If you can trace through the tree and find included-but-not-used code by analyzing build targets, you can safely delete it. There are even systems (though not covering everything) that sample binaries’ function usage, which you could double-check against.
> why the code exists in the first place
If the code is unreachable, it’s at best a “possibly will be used in the future” and most likely simply something that was used but not deleted when its last use was removed (or a YAGNI liability).
If you can find a piece of code included in build targets but unreachable in all of them, it’s typically safe to delete. And it’s not done without permission generally, automation will send the change to a team member to double check it’s ok to delete/nobody is going to start using it soon.
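The build-graph reachability analysis described above can be sketched as a simple traversal: start from the root targets that actually ship, follow dependency edges, and anything never reached is a deletion candidate. The target names and `deps` map below are hypothetical:

```python
from collections import deque

def unreachable_targets(deps: dict[str, list[str]],
                        roots: list[str]) -> set[str]:
    """Toy build-graph analysis: BFS from the root targets (binaries that
    actually ship) and report every target that is never reached."""
    reached = set(roots)
    queue = deque(roots)
    while queue:
        for dep in deps.get(queue.popleft(), []):
            if dep not in reached:
                reached.add(dep)
                queue.append(dep)
    return set(deps) - reached

# Hypothetical repo: //app:main ships; //old:report has no dependents.
deps = {
    "//app:main": ["//lib:core"],
    "//lib:core": [],
    "//old:report": ["//lib:core"],
}
print(unreachable_targets(deps, roots=["//app:main"]))  # {'//old:report'}
```

Note that `//lib:core` survives even though `//old:report` uses it, because it is also reachable from a shipping binary; only the abandoned target itself is flagged.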
Yep. You would almost always know before the deletion is even committed or the change proposed whether it would cause a build or test breakage. And if the automated safety checks and manual review didn’t catch such a breakage, you’d easily rollback. The deleted code is always “still there”, you have to jump through a ton of justified hoops to do a true deletion, which Sensenmann won’t be doing.
It should also be clear that this article is about "deleting" code from an active project, not about "deleting" it entirely from the version control system. Thus, any code "deleted" through the described process could still easily be restored if necessary.
From reading the comments, I get the impression that there is a much deeper misunderstanding going on. This seems to be not so much (or not at all?) about identifying rarely run code paths, but about build units that have been abandoned on the dependent side.
So if there's a fancy onLocusSwarmAttack() that has never been run outside a test in a module to help applications deal with various kinds of datacenter outages called TenPlagues, it won't be in the crosshairs of Sensenmann. But if one day that module gets a successor (perhaps BeiWeltuntergangScheibeEinschlagen if it's made by the Zurich team?) and all code is supposed to eventually migrate to the new one, Sensenmann will tell you that TenPlagues has finally left the building when (if) it has happened.
Again, just my impression from the article and the discussions here (many seem to think it's about onLocusSwarmAttack()), probably resulting from a linguistic barrier between the language used at Google to talk about their monorepo and more conventional English. I guess they have done some quite deliberate shifts to help people "think monorepo"?
As a counter to Chesterton's Fence: sometimes the fastest way to understand what something does is to remove it and see what complains. You might get only 1 complainer for every 10 fences you take down. Putting that one fence back up takes much longer than taking it down, but the time saved from removing the other 9 unnecessary ones makes it a net win. And this time you can add documentation to the rebuilt fence.
A further counterpoint: if you follow the Fence proponents' logic to its conclusion, you can never remove any code, which is clearly an absurd situation.
I think the real logical flaw is that Fencers (as I will now call them) put the blame on the person who removes an apparently useless fence. But they're wrong. The real blame lies with the person who built the apparently useless fence and didn't put a sign on it explaining why it shouldn't be removed.
> A further counterpoint: if you follow the Fence proponents' logic to its conclusion, you can never remove any code, which is clearly an absurd situation.
No, that would only be the case if one would never understand any code. Chesterton's Fence consists of two parts ("understanding some code" as a precondition to "removing some code"), and leaving one or the other part out makes it some other thing than what Chesterton's Fence means.
> The real blame lies with the person who built the apparently useless fence and didn't put a sign on it explaining why it shouldn't be removed.
Chesterton's Fence is not about blame, or the past in general - it is about how to deal with things that are in the present. (Although I agree that the original fence-builder should have left a note or two!)
The principle says you can’t remove it until you understand why it was there. It’s more about doing due diligence.
I follow the principle when I remove code, and it’s a reason why good code comments are important. “Oh yeah, this was written for [x] which is no longer a thing, we can remove it now”
And not putting up a sign explaining why it's a necessary fence is a bad decision. Avoiding removing all unlabeled fences because they might, maybe, potentially be useful is also likely a bad decision if taken to its conclusion.
Chesterton's Fence is a reminder to actually get up, walk over to the fence and skim the multiple labels and notes the builders left there - because the big problem is usually someone looking at a thing that came before them, and assuming they understand what it was for, without bothering to actually check it.
The article is about the removal of _dead_ code. So, not a "fence across the road" - it's a fence that was moved to the side of the road, already cleared. The question is just whether to dismantle the fence or keep it there just in case.
+1. And, it's in version control forever. It's not as if it entirely disappears. Like one of the sibling comments mentioned, I only rarely reject Sensenmann CLs.
That's worth explaining: it's automated code deletion, but the owner of the code (a committer to that directory hierarchy) must approve it, so it's rare there's ever a false deletion.
You raise a good point, and I would answer it with agree and disagree:
Agree: Yes, you are correct; merely observing that a code path was never executed in the last 6 months is not the same as understanding why the code path was created in the first place. There is the quite real possibility of an infrequent event that appears just once every two years or so (of course, this should also be documented somewhere!).
Disagree: Pragmatically, we do have an answer if the code path was not executed after 6 months in production and test: with a very high probability, the code path was created either by mistake (human factor) or intentionally, for some behavior that is no longer expected of our software. To continue the fence metaphor with Sensenmann: after 6 months we know two things about the fence. 1) It plays no role in keeping out the stuff we want out (that was all done by other fences, which had contact with an animal at least once). 2) It might have been built to keep out flying elephants or whatever, but no such being was observed in the last 6 months (at least the fence made no contact with it, which it then should have!) and probably went away.
That said, having a human in the loop is probably a good idea.