The harmful consequences of the robustness principle (ietf.org)
103 points by signa11 on Feb 14, 2022 | 48 comments


One trend in newer programming languages that I'm a big fan of is pairing strict validation with good error messages (see e.g. Elm or Rust), rather than trying to accept very lax input (in this case code) and make it work (e.g. older versions of PHP or JS).

While there are specific cases where lax or extensible validation is the right way to go, I agree with the IETF here that lax/extensible validation is a poor default and strict validation (+ good error messages!) is a better one.

It is always easier, whether purely from an implementation perspective or from a wider ecosystem perspective, to take strict validation and make it laxer.


> I agree with the IETF here

Note that the status of this document is an (expired) Internet-Draft -- it is a proposed RFC that has not been adopted, so it doesn't necessarily reflect an IETF consensus.

I mention this because I'm the author of some somewhat controversial Internet-Drafts that some people thought had already been adopted because they were hosted on ietf.org and were written in the style of an RFC. It's worth remembering that anyone can write a proposed RFC and it will be hosted on ietf.org in order to facilitate discussion. IETF documents should indicate their status at the top, and more details about their history are available in the IETF Datatracker, so you can see what kind of consensus each document has or has not achieved so far.


This might also go hand in hand with newer programming languages being (1) free and (2) single-implementation languages.

When you have a multiply implemented language, there is an incentive to support nonconforming programs in order to fool naive programmers and their pointy-haired bosses into thinking that a given language implementation is superior, and also to lock users into that implementation.

The existing code out there creates a kind of competition: which implementation will most of it get built with?

The browser wars were an example of this. Rendering broken HTML was a feature; it looked like your browser was better than another one which "broke" on the incorrect web page.

Microsoft tried to co-opt Java with something called Visual J++, pissing off Sun Microsystems badly enough to provoke a lawsuit.


On an episode of Corecursive [1] there's a great discussion of how following the robustness principle can lead to a massive burden of having to develop extremely complex software (like browser engines), because you need to keep supporting everything that was allowed in past versions. They include a quote from Martin Thomson, who observed that:

> "The problem with the robustness principle is a flaw can become entrenched as the defacto standard. Any implementation of a protocol is required to replicate the apparent behavior."

If web browsers had enforced a very strict definition of what is "acceptable" HTML and CSS from the very beginning, they would be a lot simpler to build today. Now it's too late, and because of the complexity of the "robust" input out there, it's almost impossible for anyone other than one of the giants to create a working implementation.

[1] https://corecursive.com/internet-is-duct-tape/#chuck-norris-...


Arguably, though, would the web exist today as it is if not for the robustness principle, the same principle that makes it "almost impossible for anyone other than one of the giants to create a working implementation"?


I would disagree. For example, suppose we enforced XHTML. Even though it's a mostly dead standard that was never very popular, there are plenty of free and very functional validators around. You don't need to be a multi-billion dollar company to close your tags.

Similarly for JS, strict mode did not ruin the internet. Sure, there is a lot of old JS code around that may or may not run in strict mode. That is not because it is any harder to write correct JS in strict mode than in "sloppy" mode; rather, the "robustness principle" has burdened us with a lot of legacy crap that wouldn't otherwise exist. I would argue that the web would be better if JS had been strict from the start.
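
To make that concrete, here is a minimal sketch (TypeScript/Node, purely illustrative) of the kind of bug strict mode surfaces instead of swallowing:

    "use strict";

    const config = Object.freeze({ retries: 3 });

    try {
      // In sloppy mode this assignment to a frozen object fails *silently* and
      // the bug goes unnoticed; in strict mode it throws a TypeError immediately.
      (config as { retries: number }).retries = 5;
    } catch (err) {
      console.error("strict mode surfaced the bug:", err);
    }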


The big differences between XHTML and JS strict mode are:

- strict mode is a much more local opt-in; it's difficult to opt others into strict mode, so the likelihood of it being respected is high

- JS is much less subject to embeddings; with XHTML, even back then, there was a long tail of CMS-type content which would suddenly break the page hard, with no actionable recourse for the viewer

- finally, strict mode came with a bunch of benefits (e.g. new syntax and constructs) for the person opting in, but unless you were an XML-head with an entire XML-based processing pipeline, what did XHTML give you? Additional pain when authoring content is about all.


Explain to me why XHTML did not stay with us.


XHTML never really had a chance because the dominant browser at the time, Internet Explorer, never supported it. Since people were going to have to deal with all of IE's quirks anyway, there was little appetite for putting in the effort to support XHTML too.


It didn't survive because HTML5 came along with very well defined and strict parsing rules.

You don't need closing tags for a specification to be strict. As long as you define the parsing rules rigorously, as HTML5 does, there is little benefit in XML compliance; a well-defined non-XML standard has the advantage of being mostly backwards compatible.


It depends on what you mean by 'XHTML.'

One possible definition is a mode of HTML that was forced to be strictly validating by an XML parser. Adoption of this was hurt by the fact that XHTML-delivered-as-XHTML didn't work correctly in the dominant browser of the time. But HTML5 also delivered a fully specified parser that explained how to handle tag soup correctly, which meant the problem being solved here (having to parse HTML files into a DOM) was solved in a different way.

Another possible definition is XHTML 2.0, which tried to rip out a lot of HTML features and replace them with more semantic definitions, and was an effort largely divorced from reality (to the point of driving the web browser implementors entirely out of the working group).


HTML5's rules didn't solve the problem of parsing HTML files into a DOM. Instead they mandated the existence of HTML files which, when parsed, serialized back to HTML, and parsed again, produce a different tree than the first time around. This harmful property can be used to develop XSS exploits like https://research.securitum.com/mutation-xss-via-mathml-mutat....
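
A hedged sketch of how to observe that property yourself (browser TypeScript, using the standard DOMParser API; the payloads that actually mutate are the MathML/SVG cases catalogued in write-ups like the one linked above):

    // True if parsing, serializing, and re-parsing `markup` is a fixed point.
    function survivesRoundTrip(markup: string): boolean {
      const parser = new DOMParser();
      const first = parser.parseFromString(markup, "text/html").body.innerHTML;
      const second = parser.parseFromString(first, "text/html").body.innerHTML;
      return first === second;
    }

    console.log(survivesRoundTrip("<p>hello</p>")); // true: ordinary markup is stable
    // Mutation-XSS payloads are precisely the inputs for which this returns false,
    // which is what lets "sanitize, serialize, then re-parse" pipelines be bypassed.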


Because nobody cared much for it, and it didn't give us what we actually wanted, which HTML5 did...


I think you could have the same issue with a stricter format.

The "impossibility" is not the quirks and ends of HTML - is adding all the 100s of frameworks on top, like 5 accelerated graphic options (Canvas, WebGL, SVG, CSS Animations, WebGPU), all kinds of type-setting-foo including math rendering, real-time communications, 2 optimized-to-death-by-experts languages (JS, WASM), storage options, sandboxes, audio, video, MIDI, and synthesizer capabilities, and so on...


The link also mentions the (useful) name for this phenomenon, Hyrum's Law: "With a sufficient number of users of an API, it doesn't matter what you promise. Any observable behavior of your system will be depended on by somebody."

Related xkcd variant: "Every change breaks someone's workflow."

https://xkcd.com/1172/


I don't entirely agree, or rather I think we can profitably go further with this.

It's possible to be both liberal and rigorous in what you accept. Let me provide a motivating example.

Let us say you have a tool which is nominally supposed to accept JSON, but for whatever reason you might expect a lot of format errors in the JSON: comments which aren't legal, trailing or missing commas, an equal sign instead of a colon, unquoted symbols as keys.

If you build some kind of shotgun system to try and repair the JSON, you're going to have a bad time. What if, instead, you define a grammar for your liberal superset of JSON? Now there's no ambiguity about what you accept, or what your parser will do with a given string: it's as well-defined as JSON is, it just accepts more formats of the 'same' data.

If your system only ever emits strict JSON, well, you've embodied Postel's law: it is liberal in what it accepts and conservative in what it provides, while still losing none of the rigor of accepting well-defined formats.
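
A rough sketch of that accept-liberally/emit-strictly shape (TypeScript; the regex preprocessing here is a naive stand-in for a real grammar and would, e.g., mangle string values containing "//", which is exactly why defining the superset as a proper grammar beats ad-hoc repair):

    // Accept a *defined* superset of JSON: "//" comments and trailing commas.
    function parseLenient(text: string): unknown {
      const noComments = text.replace(/\/\/[^\n]*/g, "");
      const noTrailingCommas = noComments.replace(/,\s*([}\]])/g, "$1");
      return JSON.parse(noTrailingCommas); // everything else must already be valid JSON
    }

    // Emit only strict JSON: liberal in what we accept, conservative in what we provide.
    function emitStrict(value: unknown): string {
      return JSON.stringify(value);
    }

    console.log(parseLenient('{"a": 1, // comment\n "b": [2, 3,],}'));
    // -> { a: 1, b: [ 2, 3 ] }
    console.log(emitStrict(parseLenient('{"a": 1,}'))); // -> {"a":1}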


The issue is that either your choice of extensions to standard JSON turns out to be useful and used, which erodes the value of having a standard, or they remain unused, in which case your work to support a bigger/wider format was redundant. Both outcomes are to some degree negative.

All the actual JSON extensions you describe are completely reasonable and reflect failings in JSON as a standard, but the solution should be for everyone to agree on a JSON v1.1 standard that fixes the failings, rather than for every JSON v1.0 tool to support its own particular set of extensions, rigorously defined or not.


The problem arises when other people try to implement a compatible tool. Sure, your JSON rules might not be ambiguous, but they are not standard either, so every competitor aiming for compatibility will have to adhere to those specific rules.

This means that they need to use samatman-JSON and cannot rely on JSON libraries for parsing. Or worse, if these tools become popular, we will see "samatman-rules" flags arising in JSON parsing libraries, thereby harming the simplicity that made JSON what it is today.

I think that your example illustrates exactly what makes Postel's law harmful. If you accept JSON, stick to the rules.


> Sure your JSON rules might not be ambiguous, but they are not standard either so every competitor aiming for compatibility will have to adhere to these specific rules.

Every competitor aiming for compatibility with exactly the set of garbage JSON you will accept, no more and no less, will have to adhere to those specific rules. That's... not an especially worthwhile goal for a competitor.

[Edit: Now, if you output something weird, others might have to be compatible. But this was talking about outputting strictly standard JSON.]


> That's... not an especially worthwhile goal for a competitor.

But it is. Because otherwise we're just embracing monopolies. We can't switch to X because it doesn't accept our input as well as the "main" implementation does.

You can only beat BIND, Postfix, or Excel if you at least understand what they understand.


That’s the point of only producing valid JSON. Someone who wants to build something compatible on top of your tool only needs to understand valid JSON.

If they want to build a drop-in replacement for your tool, sure, they need to understand your grammar. Or, assuming open source, they can include your tool as a dependency and run input through it first.


So two people do this and, however principled their grammars, they aren't the same. And when two systems disagree on the meaning of something you get bugs and security holes.


Well, that's the point, isn't it? I don't have to care about the second system's grammar, as long as it accepts at least valid JSON, since that's all I ever intend to emit.

Meanwhile, anyone who would like to check whether e.g. a `// comment \n` statement will be accepted by my parser is invited to check the grammar, which will make all such questions perfectly clear.


I always associated this line of thinking with Perl's "there's more than one way to do it", a principle that sort of sounds great but turns out to be the opposite of what a technology should be striving for.


In the long term, yes. Until then, experimentation is also necessary; otherwise we will be stuck at a local optimum.


"there's more than one way to do it" is more about allowing a more comfortable style for some tasks, or express some concepts in a better way.

So for instance, Perl has a completely ordinary if statement:

    if ($foo) {
        say "Foo is enabled";
    }

However, it also allows doing this:

    say "Processing file $filename" if ($debug);

Used judiciously, this can make code better because you avoid breaking up the flow with a bunch of braces. It's also easy to tack on to the end of an existing statement.

The downside, of course, is the lack of predictability: such features can make code more confusing to read rather than clearer if abused. This kind of thing is just syntactic sugar and not inherently a bad thing. The issues that come up are mostly a result of some people trying to be too clever.


> Used judiciously

Which is exactly the problem. Yes, some people will use it judiciously.

Other people will just use it wherever possible. Yet others will mix it with the alternative style willy-nilly. Some won't use it at all. And as you so correctly point out, some people will try to be too clever with it.

The result is the exact opposite of what was originally intended by syntactic sugar: Instead of making code easier to read, it becomes harder.

The only way to prevent that, at least partially, is for languages to be restrictive in their syntax. Yes, that may be more verbose at times. But I consider that a small price to pay for consistency.


The problem with Perl is that the language is so dense and complex that everything means something (and is very context sensitive), so it's very hard to know what a specific piece of code means and whether it was intended that way by the author.

Perl was intentionally designed (by a linguist) to be like a "human" language, with all the small benefits and large problems that entails.


Syntactic sugar typically leads to diabetes.


Syntactic sugar leads to cancer of the semicolon.


This was a proposed RFC that wasn't adopted. For another take on the Robustness Principle/Postel's Law by a proponent, see this blog post: https://apenwarr.ca/log/20090222


This sort of critique has been around for a while. This is from 2001:

https://datatracker.ietf.org/doc/html/rfc3117#section-4.5


You can even say this is (among other things) the root cause of all evil in computing:

Any wrong at the bottom of the stack spreads far and wide, and by the time you understand how bad it was, you are stuck: you can't fix it without breaking ALL the infected!

This is why C/C++/JS/PHP/etc. cause so MUCH damage and lost money. They are at the bottom: even your fancy Rust must deal with it. Even modern C must deal with it!

That is why it is so important that we, as developers, understand we can't expect real progress until this is improved, and why it is so important that better tools with better defaults become the norm (i.e., invest in them!).


I think robustness is good. But. There needs to be a way to validate your output against a strict spec. It's too easy to say "these 3 programs can read my output so it must be ok." That form of testing isn't valid when everyone is being robust. You gotta have a solid way to verify output.
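
For example, a sketch of what "a solid way to verify output" could look like for a JSON-emitting tool (TypeScript, using the Ajv JSON Schema validator as one option; the schema and field names are made up for illustration):

    import Ajv from "ajv";

    // A strict, machine-checkable statement of what we promise to emit.
    const outputSchema = {
      type: "object",
      properties: {
        id: { type: "integer" },
        name: { type: "string" },
      },
      required: ["id", "name"],
      additionalProperties: false,
    };

    const validate = new Ajv().compile(outputSchema);

    function emit(payload: unknown): string {
      // Fail loudly if our own output drifts from the spec, instead of relying on
      // "these 3 programs can read it" as the test.
      if (!validate(payload)) {
        throw new Error("output violates spec: " + JSON.stringify(validate.errors));
      }
      return JSON.stringify(payload);
    }

    emit({ id: 1, name: "ok" });         // fine
    // emit({ id: "1", name: "bad" });   // would throw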


Until you run into an HTTP/2-to-HTTP/1 tunneling setup where your HTTP/2 frontend thinks it forwarded one well-defined and safe HTTP/1 message to your backend, but the backend sees an additional malformed HTTP/1 header field that changes the message length, and suddenly it sees a second, malicious message. The result: thousands of leaked user authentication tokens for some widely used web services. If robustness is necessary, it should be part of the spec, not some wild and ill-defined outgrowth of implementations that some intern hacked in at the eleventh hour to make things work just before release.
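
A rough, hypothetical illustration of that downgrade problem (TypeScript; the header name and paths are made up, and real attacks are the HTTP/2-to-HTTP/1.1 request-smuggling variants in which CRLF sequences survive the translation):

    // To an HTTP/2 frontend, a header value is length-prefixed binary, so this is
    // "just a string" and passes its (liberal) checks:
    const headerValue =
      "x\r\nContent-Length: 0\r\n\r\nGET /admin HTTP/1.1\r\nHost: backend";

    // A naive downgrade splices it straight into the HTTP/1.1 byte stream:
    const downgraded =
      "GET / HTTP/1.1\r\n" +
      "Host: backend\r\n" +
      `X-Info: ${headerValue}\r\n` +
      "\r\n";

    // The HTTP/1.1 backend, being liberal in what it accepts, parses the injected
    // lines as the end of one request plus a second, attacker-chosen request.
    console.log(downgraded);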


I find this well meaning, but mostly incoherent.

The fact that implementations are essentially required to interoperate with other fielded implementations is pragmatically true, regardless of any principle. Older versions also don't go away even if newer versions are created. We will still need bug-for-bug interoperability even in a world without the "robustness principle".

The main remedies proposed are largely about the setting of the standards themselves, rather than the implementations. Active protocol maintenance, extensibility and virtuous intolerance are things that must happen in the standard. The robustness principle applies to implementations, not the setting of standards.


My suspicion is that this article is looking at the problem through "HTTP"-colored glasses.

Just because that protocol is going through continuous development does not mean all other protocols are doing so as well.

The driving force here is the reluctance of the web community to use ports other than 80/443 for the newer versions of the HTTP protocols. While HTTP/2 and HTTP/3 are superficially similar to HTTP/1, the fact is that the protocols are different and should have been given different port numbers.


Having worked on the email side of things, I can say the problems mentioned here are exactly as painful there as they are for HTTP.

The problem with the robustness principle is that it is heavily misinterpreted in practice. The original intent was to remind implementors to be wary of ambiguity in specifications, but in practice, it has often been taken to mean that implementors must try to scavenge as much understanding as possible from flagrant violations of the specification.


I think that's a consequence of another network principle: default deny. In practice anything that doesn't look like a standard and common protocol gets firewalled out into oblivion at some stage or other: first anything that doesn't look like TCP, and if you are lucky UDP, then block everything but well known ports. Then block anything that doesn't look like HTTP, and so on and on.

So we end up with a matryoshka stack of encapsulated protocols.


To default deny everything other than HTTP is patently absurd and it should not be used as an excuse for poor engineering.

The security argument is poor at best, and dangerous at worst. The fact that one can SSH over HTTP means organizations are no more secure than if they just allowed direct SSH access. Forcing everything over HTTP serves to obfuscate what could be dangerous traffic (SSH-based exfiltration) as innocuous web browsing.

The real way to implement default deny is to block everything except that which is required. If new ports were required for HTTP/2 and HTTP/3, then you would simply open those ports. If you didn't want to support HTTP/1 because it's unencrypted and closes the connection after each request, you would block port 80 and be done with it. Instead, we build super-complex software that changes its behavior based on a version string in a GET request.


I agree completely, but often that's the reality for protocol designers.


It is an RFC, and these tend to be about network protocols…

HTML <= 4 used to follow this principle, and turned out to be a non-interoperable mess. It needed an intervention in the form of HTML5. Although it may seem that HTML5 embraces the robustness principle, it's more of a workaround for it: it exactly defines how to handle any input, even "garbage" input, leaving implementers no room for flexibility, so there are no undefined inputs left to be liberal about (XHTML is also defined for all possible inputs with no ambiguity; the two only differ in what they prescribe for handling the "bad" ones).

Non-network things decay too. For example, there's no useful spec for GIF. Real-world files depend on specific bugs, like misinterpretation of frame rate or spec-incorrect handling of backgrounds.


Browser interpretations of HTML are indeed a hot mess.

However, HTTP and HTML tend to grow together.

By assigning new ports to HTTP/2 and HTTP/3 (and to the versions of HTML prevalent at the time those protocols are released), older browsers that are not receiving updates would be completely unaware of the new ports and would only connect to the ports used for older versions, where the clients could happily exist in their "I'm going to interpret things my way" world.

When connecting to the new ports, browsers would have to agree to abide by the new well-defined and strict specifications.

As support for HTTP/1 (and the associated older versions of HTML) disappeared, the HTTP/1 servers would either be shut down or configured to serve redirects to the HTTP/2 and HTTP/3 content. In the latter case, should the client continue to issue HTTP/1 requests to HTTP/2 servers, the server would send a "Version not supported" message telling the client to upgrade.


Yes, and when your protocol is almost 30 years old and has most of the true quirks and edge cases ironed out because nearly everyone with a computer uses it, I think adopting a stricter approach can make sense.

That said, there's a reason this draft wasn't adopted: it doesn't make sense in most other areas (and certainly wouldn't have made sense for HTTP and some of these protocols even 20 years ago).

The reality of technology is that almost everyone gives zero fucks about how good your standard/protocol is; they want to use it to do something genuinely useful.

---

To summarize: The initial argument of this draft states "The posture this statement advocates promotes interoperability in the short term, but can negatively affect the protocol ecosystem over time."

My rebuttal is: Interoperability in the short term FAR OUTWEIGHS negative consequences in the long term, and should be prioritized for any protocol which has not yet seen wide and robust adoption.


They're only acknowledging this now?

This principle has long been known to be a bad idea; I've articulated this myself and have heard it from others.

At least most lower-level infrastructure pieces should accept only their documented inputs, and loudly diagnose and reject anything else.

As a software engineer, you cover this with test cases: not just happy input test cases but test cases which verify that bad inputs are handled appropriately.
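
A minimal sketch of such a bad-input test (TypeScript/Node, purely for illustration; the same idea carries over to the C assertion tests described below):

    import assert from "node:assert/strict";

    // Stand-in for a component that should accept only its documented inputs.
    function parseStrict(input: string): unknown {
      return JSON.parse(input);
    }

    // Happy-path test:
    assert.deepEqual(parseStrict('{"a": 1}'), { a: 1 });

    // Bad-input test: malformed input must be loudly rejected, not silently "repaired".
    assert.throws(() => parseStrict("{a: 1,}"));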

In my job, I write tests which validate that expected assertions go off in C APIs. The assert framework can be reprogrammed to execute a longjmp, or to terminate the process successfully: two different ways for the test case writer to have the code continue after a failed assertion and report a successful test case (the assertion was expected).

For anything that is multiply implemented and has many dependencies, the robustness principle is deadly in the long run.


I'm starting to discover an interesting contrast with SCADA industrial protocols. In principle, being "liberal in what you accept" when you're writing a controller for an industrial process, well, it's a non-starter.

But the protocols themselves are such design-by-committee monstrosities that in fact most implementations are very deliberately only partial implementations of the spec, with most of the standard rejected right off the bat.


I thought we'd all agreed by now that the robustness principle is actually a terrible idea, and that you should be strict in what you emit and what you accept, and that any flexibility should be explicitly specified by the spec?


So somebody took the "considered harmful" blog trope and IETF'd it. Now I guess someone's going to write a blog post replying to this with why the robustness principle is the best thing since sliced bread. Both of them focusing on a generic phrase found in a 40-year-old protocol spec, rather than just impartially laying out a series of design considerations.



