More

jmcomets · 2025-12-03T14:49:46 1764773386

Obvious caveat: pushing this a bit further it can quickly fallback to the default case. The optimizer is a superpower but you still need to try to write efficient code.

    unsigned add_v5(unsigned x, unsigned y) {
      if (x == y) return 2 * x;
      return x + y;
    }

Results in:

    add_v5(unsigned int, unsigned int):
      lsl w8, w0, #1
      add w9, w1, w0
      cmp w0, w1
      csel w0, w8, w9, eq
      ret

(armv8-a clang 21.1.0 with O3)

If compiler folks can chime in, I'm curious why incrementing in a loop can be unrolled and inspected to optimize to an addition, but doubling the number when both operands are equal can't?

jcranmer · 2025-12-03T15:55:34 1764777334

> If compiler folks can chime in, I'm curious why incrementing in a loop can be unrolled and inspected to optimize to an addition, but doubling the number when both operands are equal can't?

Compilers are essentially massive towers of heuristics for which patterns to apply for optimization. We don't throw a general SMT solver at your code because that takes way too long to compile; instead, we look at examples of actual code and make reasonable efforts to improve code.

In the case of the incrementing in a loop, there is a general analysis called Scalar Evolution that recasts expressions as an affine expression of canonical loop iteration variables (i.e., f(x), where x is 0 on the first loop iteration, 1 on the second, etc.). In the loop `while (x--) y++;`, the x variable [at the end of each loop iteration] can be rewritten as x = x₀ + -1*i, while the y variable is y = y₀ + 1*i. The loop trip count can be solved to an exact count, so we can replace the use of y outside the loop with y = y₀ + 1*trip count = y₀ + x, and then the loop itself is dead and can be deleted. These are all optimizations that happen to be quite useful in other contexts, so it's able to easily recognize this form of loop.

In the example you give, the compiler has to recognize the equivalence of two values conditional on control flow. The problem is that this problem really starts to run into the "the time needed to optimize this isn't worth the gain you get in the end." Note that there are a lot of cases where you have conditional joins (these are "phis" in SSA optimizer parlance), most of which aren't meaningfully simplifiable, so you're cutting off the analysis for all but the simplest cases. At a guess, the simplification is looking for all of the input values to be of the same form, but 2 * x (which will actually be canonicalized to x << 1) is not the same form as x + y, so it's not going to see if the condition being used to choose between the same values would be sufficient to make some operation return the same value. There are representations that make this problem much easier (egraphs), but these are not the dominant form for optimizers at present.

DannyBee · 2025-12-03T23:18:03 1764803883

This is all true. Additionally, the payback from optimizing purely scalar arithmetic harder has gone down more and more over time compared to almost anything else.

For example, eliminating an extra load or store is often worth more than eliminating 100 extra arithmetic operations these days.

Someone · 2025-12-03T15:06:21 1764774381

> I'm curious why incrementing in a loop can be unrolled and inspected to optimize to an addition, but doubling the number when both operands are equal can’t?

I expect because the former helps more in optimising real-world code than the latter. It’s not worth the LLVM developer's time to make the compiler better for programs that it won’t see in practice.

It’s not as if the compiler did nothing with that code, though. It replaced the multiplication by a left shift and removed the branch.

scialex · 2025-12-03T15:38:24 1764776304

This sort of pattern can't be found by incremental lowering (and isn't common enough to have more sophisticated analysis written for it) so it ends up in a local maximum.

Basically the idea for most compilers is to do a series of transforms which incrementally improve the program (or at least make it worse in understood and reversible ways). To do this transform you need the optimizer to do the (not always trivial) proof that the 2*x is equivalent to x+y, do the replacement, do the gvn to duplicate the adds and finally do the branch elimination. Each of these steps is however totally separate from one another and the first one doesn't trigger since as far as it's concerned a shift left is faster than an add so why should it do the replacement.

This is all even more complicated since what representation is faster can depend on the target.

AlotOfReading · 2025-12-03T16:04:14 1764777854

I agree, but GCC manages the optimization, and not all optimizations need to take fewer cycles. The single instruction version is obviously better for -Os and it would probably be a win in general.

DullPointer · 2025-12-03T15:29:22 1764775762

I’m not a compiler expert, an assembly expert or an ARM expert, so this may be wildly wrong, but this looks optimized to me.

The trick is that it’s doing both the add and the left shift in parallel then selecting which to use based on a compare of the two values with csel.

(To see this, rather than reading the code sequentially, think of every instruction as being issued at the same time until you hit an instruction that needs a destination register from an earlier instruction)

The add is stored in W9 but only read if the two arguments are unequal.

If the compare succeeds and the lsl retires before the add, the add is never read, so nothing stalls waiting for it and the answer can be returned while the add is still in flight. The result of the add would then be quietly discarded assuming it ever started (maybe there’s some magic where it doesn’t even happen at all?).

It’s not clear to me that this is power efficient, or that on many real cpus there’s a latency difference to exploit between add and lsl, so it may not be faster than just unconditionally doing the addition.

That said, it is definitely faster than the code as it was written which if translated to asm verbatim stalls on the compare before executing either the add or the left shift.

adwn · 2025-12-03T15:57:53 1764777473

> this looks optimized to me.

It's not. Why would lsl+csel or add+csel or cmp+csel ever be faster than a simple add? Or have higher throughput? Or require less energy? An integer addition is just about the lowest-latency operation you can do on mainstream CPUs, apart from register-renaming operations that never leave the front-end.

DannyBee · 2025-12-03T23:14:32 1764803672

In the end, the simple answer is that scalar code is just not worth optimizing harder these days. It's rarer and rarer for compilers to be compiling code where spending more time optimizing purely scalar arithmetic/etc is worth the payback.

This is even true for mid to high end embedded.

DullPointer · 2025-12-03T16:09:06 1764778146

ARM is a big target, there could be cpus where lsl is 1 cycle and add is 2+.

Without knowing about specific compiler targets/settings this looks reasonable.

Dumb in the majority case? Absolutely, but smart on the lowest common denominator.

adwn · 2025-12-03T18:12:02 1764785522

> Without knowing about specific compiler targets/settings this looks reasonable.

But we do, armv8-a clang 21.1.0 with O3, and it doesn't.

> […] but smart on the lowest common denominator.

No, that would be the single add instruction.

jmcomets · 2025-09-17T08:18:10 1758097090

Still is. It's a French/Lebanese corp based in Paris/Beirut. I worked there for a few years early in my career.

jmcomets · on Nov 22, 2023

1) C++: 7+ years; Rust: 3+ years.

2) C++, my employer's choice. Their rationale: easier to hire people with experience, internal codebase is mostly C++ so tooling/people more versed.

3) Rust. For me it's just easier overall to work with. It takes me an hour or so to get my head back into Rust's standard library and idioms/quirks, but afterwards it feels much better to work with: dev. workflow, testing, expressiveness. Plus it catches programming errors for me every once in a while.

4) I wish Rust had wider adoption by the industry. It's always compared to C/C++/Java/Go and the usual argument boil down to "not enough people use it".

5) I wish C++ had better compiler error messages. I have to rely on instinct when it's not scrolling through nested template/SFINAE errors to understand what's happening. Rust tells me "here's what's wrong" 99% of the time.

jmcomets · on Nov 8, 2023

Another fact not mentioned in this article is the WWII historical locations found in the unofficial Catacombs. Back when I lived in Paris, I was lucky enough to know someone who had access to one of the Cataphiles' maps (yearly-updated with notes on entrances and potential police patrols, with closest exits and dangerous passages).

We visited an old school basement, which was used as a bunker for members of the Resistance. The school itself was razed and rebuilt over at some point, but the Catacombs still hold traces of this period. Being there felt very...intimate. Nothing like you'd see in a museum or a documentary, we were in the same place as those back then.

Aromasin · on Nov 8, 2023

I had a very similar experience. I used to do a lot of "urban exploring" in the UK, and as part of that, I visited Paris for it with some friends. We went with a guide I met on a forum to a secret resistance shelter recently uncovered at the time, and there was still litter from the 1940s. Old cigarette and match boxes, wine bottles, old posters and the like. It was exactly as you describe; intimate, in the way that visiting a mausoleum or cathedral is compared to a museum. I'm sure it's been picked clean now, but the memory will stick with me forever I suspect.

jmcomets · on Aug 23, 2023

My pet theory is that programmers are exposed to many business areas during their career, either directly or hearing it from other programmers. This desacralizes these jobs, and since we spend most of our days working though complexity (requirements, bugs, ...), you end up with "what's so hard about X?" statements.

jmcomets · on June 15, 2023

+1, the link to Joel Spolky's post on Unicode is probably the most interesting read I've found this year: https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...

chrismorgan · on June 15, 2023

I was never much impressed with that article (given its title, I say—given its title) from the first time I saw it, probably around 2010, and it has aged poorly. Some of my complaints about it:

• It tells a verbose story, rather than just telling you what you need to know succinctly. (Seriously, the 3,600-word article could be condensed to under a thousand words with no meaningful loss—and the parts that I consider useful could be better expressed in well under 500 words. As a reminder of where I’m coming from in this criticism: the title of the article is “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”, so I expect it to be, y’know, absolute-minimumy.)

• It spends way too much time on history that even in 2003 wasn’t particularly relevant to a lot of people (though yeah, if you were working in the Windows ecosystems he was working in, you would benefit from knowing about some of it), and which is now (twenty years later) completely irrelevant to almost everyone.

• It portrays UCS-2 and UTF-16 as equivalent, which is disastrously wrong. (As in, “if you do that, you stand a good chance of destroying anything beyond U+FFFF”.)

• As far as I can tell (I was a young child at the time), its chronology around UTF-8 and UCS-2/UTF-16 is… well, dubious at best, playing fast and loose with ordering and causality.

• Really, the whole article looks at things from roughly the Windows API perspective, which, although what you’d expect from him (as a developer living in Microsoft ecosystems), just isn’t very balanced or relevant any more, since by now UTF-8 has more or less won.

• It doesn’t talk about the different ways of looking at Unicode text: code units (mostly important because of the UTF-16 mess), scalar values, extended grapheme clusters, that kind of thing. A brief bit on the interactions with font shaping would also be rather useful; a little bidirectional text, a little Indic script, these days even a bit of emoji composition. These days especially, all of this stuff is far more useful than ancient/pre-Unicode history.

• The stuff about browsers was right at the time, but that’s been fixed for I think over a decade now (browser makers agreeing on the precise algorithms to use in sniffing). (He’s absolutely right that Postel’s Law was a terrible idea.)

jmcomets · on June 8, 2023

> Work relationships matter, sharing 3D space allows people to be more creative and collaborative, and companies are recognizing that. Google seems like the last company that would make people come back to the office without doing their homework, there’s clearly data showing that fully remote employees are falling behind.

A couple things I wanted to point out when reading this:

- RTO does not imply sharing a 3D space with those you collaborate with (e.g. distributed teams).

- There is no data that proves that in-person work is more "creative" or "collaborative", simply because it's not measurable.

- You assume Google's intentions to always driven by data rather than appealing to stakeholders.

repeekad · on June 11, 2023

- RTO is entirely about sharing 3D spaces with those you collaborate, agree that if the team you work directly with is distributed there’s little value to RTO to me, direct team in office and other teams distributed seems more common

- there is data: https://www.nature.com/articles/s41586-022-04643-y

- they are not mutually exclusive, I do very much assume google uses data AND appeals to stakeholder value. I don’t hold the view that stakeholder value is purely maximizing productivity, I think google does their homework before making big decisions.

jmcomets · on June 8, 2023

It's a mix of bikeshedding (many people experienced remote work during the COVID lockdowns) and there being no data available for this.

Also consider who would be more likely to respond in this thread: people heavily invested in or against remote work (since there is no actual data). Always-relevant XKCD: https://xkcd.com/386/

jmcomets · on June 8, 2023

2022 headcount was estimated at ~1850[0], so that would be 150 employees (allegedly).

[0]: https://en.wikipedia.org/wiki/HashiCorp

metaphor · on June 8, 2023

That headcount estimate is off by at least -23% relative to this year.

From Hashicorp's latest 10-K[1] filed circa March 2023:

> As of January 31, 2023, we had over 2,400 employees operating across the world.

> As of January 31, 2023, we had more than 1,050 employees in our sales and marketing organizations.

> As of January 31, 2023, we had more than 670 employees in our research and development organization.

[1] https://www.sec.gov/Archives/edgar/data/1720671/000162828023...

jmcomets · on May 25, 2023

FWIW, most DBMS have built-in index usage stats, and it's not too difficult to query. In my previous job we had a Postgres health dashboard that notably showed `log10(index size) / index hits`, making it pretty clear when some recently added index was useless.

My opinion is that indexes should be either logical (ex: used for exclusion constraints) or purely for performance (tradeoff space for better QPS). Query patterns change, specs change, so monitoring your DB's performance is the way to go.