Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> a contrast between Claude’s modern approach [...] XML, a technology dating back to 1998

Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.

 help



XML has been "spooky old technology" for over a decade now. It's heyday was something like 2002.

Nobody dares advertise the XML capabilities of their product (which back then everybody did), nobody considers it either hot new thing (like back then) or mature - just obsolete enterprise shit.

It's about as popular now as J2EE, except to people that think "10 years ago" means 1999.


XML is used a lot in standards and publishing industries -- JATS, EPUB, ODF, DOCX/XLSX/..., DocBook, etc. are all XML based/use XML.

And I think this makes sense.

XML is really great for text documents with embeds and markup, either semantic (this part of the paper is an abstract) or visual (this part of the document should be 14-point and aligned right). You can do this in JSON, but it's a pain.

JSON is great for representing data. If you have some data structures and two machines trying to exchange them, JSON is great for that.

TOML / yaml / hcl / JSON with comments are great at config. If you have a human writing something that a machine is supposed to understand, you don't want turning completeness and you don't want to deal with the pain of having your own DSL, those are great.


Without being facetious, isn’t HTML a dialect of XML and very widely used?

HTML is actually a dialect of SGML. XHTML was an attempt to move to an XML-based foundation, but XML's strictness in parsing worked against it, and eventually folks just standardized how HTML parsers should interpret ill-formed HTML instead.

I do wish they at least allowed you to make any tag self closing so I can do <div class="my-element" /> without needing to include a </div>

Ah good to know. It’s interesting (to me) how similar they look to each other but you and other commentators below mention how they’re more like distant cousins

I suppose the proof is in the parsing


No, HTML was historically supposed to be a subset of SGML; XML is also an application of SGML. XHTML is the XML version of HTML. As of HTML5, HTML is no longer technically SGML or XML.

HTML is far loosier-goosier in its syntax than XML allows. There was an attempt to nail its syntax down in the pre-HTML 5 days; that's XHTML. When HTML 5 pivoted away from that, that spelled the end of these two things ever coming together.

Really, I think you can trace a lot of the "XML is spooky old technology" mindset to the release of HTML 5. That was when XML stopped being directly relevant to the web, though of course it still lives on in many other domains and legacy web apps.


> There was an attempt to nail its syntax down in the pre-HTML 5 days; that's XHTML. When HTML 5 pivoted away from that, that spelled the end of these two things ever coming together.

Exactly the opposite; WHATWG “Living Standard” HTML (different releases of which were used as the basis for W3C HTML5, 5.1, and 5.2 before the W3C stopped doing that) includes an XML serialization as part of the spec, so now the HTML-in-XML is permanently in sync with and feature-matched with plain HTML.


https://html.spec.whatwg.org/multipage/xhtml.html

“Warning! Using the XML syntax is not recommended, for reasons which include the fact that there is no specification which defines the rules for how an XML parser must map a string of bytes or characters into a Document object, as well as the fact that the XML syntax is essentially unmaintained — in that, it’s not expected that any further features will ever be added to the XML syntax (even when such features have been added to the HTML syntax).”


No, HTML was a specific application profile of SGML (modern HTML, I believe, no longer technically is), XML is a newer (than HTML) application profile of SGML inspired by HTML but aiming for greater generality.

XHTML was an attempt to encode HTML semantics (approximately, each version of XHTML also altered some semantics from HTML and previous XHTML versions) in XML, and the XML serialization of modern, WHATWG HTML exactly encodes HTML semantics in XML.


Yes, there's a handful of niches. Still 1/1000th the momentum it had, or adoption it was expected to get, and nobody under 40 even considers it for new stuff.

It was the blockchain of its day

Also in finance. XBRL and FIXML although I do not know how widely used the latter is.

For me, even when it was first released, I considered obsolete enterprise shit. That view has not diminished as the sorry state of performance and security in that space has just reaffirmed that perception.

I kind of miss SOAP. Ahead of its time? Probably not, but I built some cool things on top of it

Right now I'm writing adapter so people could call one SOAP service using simpler interfaces. That involves implementing WS-Security with non-standard algorithms, that also involves dealing with things like XML escaped into a string and embedded inside another XML.

Let's say I hope for the day I'll miss SOAP. Right now I have too much of it.


atproto's lexicon-based rpc is pretty soap-like

20 years old means 1980!

It's not the hot new thing but when has hype ever mattered for getting shit done? I don't think anyone who considers it obsolete has an informed opinion on the matter.

Typically a more primitive (sorry, minimal) format such as JSON is sufficient in which case there's no excuse to overcomplicate things. But sometimes JSON isn't sufficient and people start inventing half baked solutions such as JSON-LD for what is already a solved problem with a mature tech stack.

XSLT remains an elegant and underused solution. Guile even includes built in XML facilities named SXML.


>It's not the hot new thing but when has hype ever mattered for getting shit done?

People who wanted to "get shit done" had much better alternatives. XML grew out of hype, corporate management forcing it, and bundling to all kinds of third party products and formats just so they can tick the "have this hot new format support" box.


XML is perfectly fine. What are these alternatives?

YAML is just bad. JSON is harder to read for deeply nested structures. TOML and the like don't have enough features.


XML is pretty fantastic for a lot of things that JSON is not up to the task for. And YAML ... has it's own, special issues.

Maybe ASN.1? Although that has an official XML encoding so maybe not.

> It's not the hot new thing but when has hype ever mattered for getting shit done?

But it used to be. And so it was used for a lot of things where it wasn't a great fit. XML works fairly well as a markup format, but for a lot of things, something like json models the data better.

> which case there's no excuse to overcomplicate things.

And that's a problem with xml. It's too complicated. Even if the basic model of xml is a good fit for your data, most of the time you don't need to worry about namespaces and entity definitions, and DTDs, but those are still part of most implementations and can expose more attack surface for vulnerabilities (especially entity definitions). And the APIs of libraries are generally fairly complicated.


I don't think I'd agree that it's a problem with the tool. However you do raise a good point - that there are problems that JSON and similar struggle with where XML would introduce a noticeable amount of unneeded complexity. It's a wide enough gap that a simplified subset of XML is probably be warranted. (I assume it must exist by now and I've just never heard of it?)

> a simplified subset of XML is probably be warranted

There are several. And that's the problem. It isn't hard to find a subset with a library for a single language that uses a slightly different subset from the other subsets. But none of them ever caught on.


It makes me wonder how well an LLM like Opus can generate XSLT which was always the hard part when writing by hand.

Given that the SXML DSL has existed since the early 2000s have ergonomics really been a limiting factor? Of course having LLMs write things for you is also useful.

Obsolete enterprise shit I guess includes podcasting. Impressive for the enterprise.

I’d be very curious what lasting open formats JSON has been used to build.


That the podcast feed format is XML based is an insignificant detail - and a remnant of the past, nobody cares about.

People upload their podcasts to a platform like Apple Music or Spotify or Substack and co, or to some backend connected to their Wordpress/Ghost/etc) and it spits the RSS behind the scenes, with nobody giving a shit about the XML part.

Might as well declare USSR a huge IT success because people still play Tetris.


didn't know html was spooky tech, TIL. /s

HTML predates XML by 5 years.

What's more, the web standards bodies even abandoned a short-lived XML-hype-era plan to make a new version of HTML based on XML in 2009.

That from this touted to the heavens format a handful of uses remain (some companies still using SOAP, the MS Office monster schemas, RSS, EPUB, and so on) is the very opposite of the adoption it was supposed to have. For those that missed the 90s/early 00s, XML was a hugely hyped format, with enormous corporate adoption between 1999–2005, which deflated totally.

Did you also learned those things too today?


thinking for a bit longer, it does make sense. internet came before xml.

XML is still around, but I don't think many people would choose it as a serialization format today for something new.

The use of XML as a data serialization format was always a bad choice. It was designed as a document _markup_ language (it’s in the name), which is exactly the way it’s being used for Claude, and is actually a good use case.

XML is back, everyone is rediscovering the terminal. Soon we’ll discover that object oriented programming is good again.

Unambiguously, though, it is. There's so much trash imperative code in its training data that LLMs tend to vomit out garbage. But if you anchor it with OOP, the quality tends to be higher.

If you think XML is old tech, wait until you hear of EDI, still powering Walmart and Amazon logistics. XML came in like a wrecking ball with its self-documenting promise designed to replace that cryptic pesky payload called EDI. XML promised to solve world hunger. It spawned SOAP, XML over RPC, DOM, DTD, the heyday was beautiful and Microsoft was leading the charge. C# was also right around this time. Consulting firms were bloomed charged with delivering the asynchronous revolution, the loosely coupled messaging promises of XML. I think it succeeded and it’s now quietly in the halls of warehouse having a beer or two with its older cousin the Electronic Data Interchange aka EDI.

EDI is a PITA, but we're trying to solve it Surpass. The underlying architecture is key, there's variability in every element, segment and the overarching golden rule: the issuer gets to define their own interpretation of the standard.

Haha, EDI is such a pita. very efficient for machines I suspect - the first time they tried to take over.

EDI is XML now.

It all brings back nightmares from migrating the older style EDI for healthcare data for what was HL7 XML at the time. XML is widely used still for all kinds of stuff. On some level if JSON was allowed to evolve the same way, eventually you would just wind up with something like XML.

JSON is a bad version of XML.

Imagine the worst data format you can think of.

Then spend the next week making it even more convoluted.

That data format is still better than EDI.


I'm not sure if this is a compliment or insult to my powers of invention.

XML is as old now as the PDP-11 was when XML came out.

I tried following the best practice to use XML tags and the difference was not observable. I honestly believe Anthropic forgot to remove that part of the documentation from Sonnet 3.x days and now people are still writing blogs about this secret sauce

It has a number of security issues which have not been fixed which could be used for really interesting exploitation.

I don't think anybody's proposing to throw recursive entity definitions at Claude. Just a little light informally-defined angle-bracket markup.

XML works great for XMPP. KDL is compatible with it too.

What gets me is going from this structured data to Markdown which doesn’t even have enough features & syntax that the LLMs try to invent or co-opt things like the blockquote for not quoting sources.


The evidence suggests that XML was never that popular though for the general audience, you have to admit.

For Web markup, as an industry we tried XHTML (HTML that was strictly XML) for a while, and that didn't stick, and now we have HTML5 which is much more lenient as it doesn't even require closing tags in some cases.

For data exchange, people vastly prefer JSON as an exchange format for its simplicity, or protobuf and friends for their efficiency.

As a configuration format, it has been vastly overtaken by YAML, TOML, and INI, due to their content-forward syntax.

Having said all this I know there are some popular tools that use XML like ClickHouse, Apple's launchd, ROS, etc. but these are relatively niche compared to (e.g.) HTML


MS Office and Open-/LibreOffice are using zipped xml files (e.g. .docx, .xlsx and .odt). Svg vector graphics is xml, the x in ajax stands for xml (although replaced by json by now). SOAP (probably counts as the predecessor of REST) is xml-based.

XML was definitely popular in the "well used" sense. How popular it was in the "well liked" sense can maybe be up for debate, but it was the best tool for the job at the time for alot of use cases.


Yup. Kids these days...



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: