New UUID formats have been approved (ietf.org)
70 points by delduca on May 9, 2024 | 33 comments


I was a bit confused because this document is a draft and I couldn't find any mention of "approval." But here's a link to the published RFC that obsoletes the original UUID RFC 4122:

https://www.rfc-editor.org/rfc/rfc9562


The metadata about the linked document (https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...) is at https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-... which shows that this is indeed a very old draft.

The linked document is version 01 of the draft. That draft went up to version 04 before being replaced by draft-ietf-uuidrev-rfc4122bis (https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122b...), which then went up to version 14 before finally being approved as RFC 9562.


Thanks, that's what I was looking for.

@dang, could you please update the link to this?


In retrospect it is super obvious that you'd want a time-ordered UUID with a random offset. I wonder how they managed to screw up 5 iterations of UUID that were basically useless to most developers.


It’s the evolution of the internet. When is the last time you heard someone say WAN instead of cloud?

Lexicographically-sortable and universally-unique are the two desirable properties. Indexable would have saved mongo’s fate, if added up-front.

The funny thing is, no one was stopping us from using cuid or blid. I use ulids all the time for primary keys, and the query properties are simply exactly as advertised.

Annoying for monotonic to even be optional.


Why would someone ever use “WAN” and “cloud” in an interchangeable manner?


I wondered if anyone would question that! A touch of snark.

But, I’ve now been in the industry long enough that people have given me the same business requirements for intrawebs that they now use to talk about the cloud.

The idea of a UUID was always to sync pieces of data generated on nodes that might be across the street, or in Munich or Tokyo-023. They don’t have the same synchronous atomic clock.

Same problem, but we have new bandwidth, new compute, new thinkers, and new yellers.

The key is that latency has become low enough for real-time applications to always be on your mind.

Example: artificial neural networks have been around since the 60’s. You couldn’t get paid to work on them until GPUs gave you the compute, and cryptocurrency flooded the market with it. Same thing with fiber and always connected UI lifestyles.


Imho the UUID format is a hopeless cause anyway. You don't want to waste any bits in a 128-bit identifier on version metadata, and UUID uses 4. With the birthday paradox that effectively decreases the collision resistance by a factor of 4 (2^64 vs. 2^62 elements for a collision probability of 0.5).


I don't follow. You (correctly) say UUID is 128 bits, but then talk about 64 bits being prone to collision.


UUID is 124 bits. The collision probability (via the birthday paradox) is 50% after roughly enumerating sqrt(2 * 2^124) elements. [1] Since sqrt(x) = x^(1/2), you can say that an n-bit hash/id has a collision probability of 50% after enumerating 2^((n+1)/2) elements. 2^(n/2) is just the extra handwavey version of that.

1: https://stackoverflow.com/questions/62664761/probability-of-...
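
For anyone who wants to check the arithmetic, here is a rough sketch using the standard birthday approximation n ≈ sqrt(2 * N * ln(1/(1-p))), which is slightly tighter than the sqrt(2 * N) shortcut above. The function name `birthday_bound` is made up for this example. The 122-bit row is the number of random bits left in a v4 UUID once the 2 variant bits are counted along with the 4 version bits.

```python
import math

def birthday_bound(bits: int, p: float = 0.5) -> float:
    """Approximate number of uniformly random IDs drawn from a space of
    2**bits values before a collision occurs with probability p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

for bits in (128, 124, 122):
    n = birthday_bound(bits)
    # Report the result as a power of two so it is comparable to 2^64 / 2^62.
    print(f"{bits} bits: about 2^{math.log2(n):.2f} IDs for a 50% collision chance")
```

Losing 4 bits shrinks the space by a factor of 16, so the bound drops by sqrt(16) = 4, which matches the "factor of 4" claim.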


Thanks!


2^62 things are still a fairly large number for plausible systems.


Not if you want "Globally Unique Identifier"s. These have to work even when every person on earth generates a couple billion of them.


I have no idea why it bothers me so much, but it bugs the shit out of me when people create temporary directories by creating a directory and stapling a UUID on the end of a base name, as opposed to the APIs that exist for that explicit purpose.


What is that API called?


Many languages have built-in support for managing tempdirs; if you're using Linux you can call `mktemp`:

https://www.man7.org/linux/man-pages/man1/mktemp.1.html
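
As one example of the built-in language support, Python's standard `tempfile` module does this. A minimal sketch; the `myapp-` prefix and the `scratch.txt` file name are placeholders, not anything from the thread.

```python
import os
import tempfile

# TemporaryDirectory creates a uniquely named directory that only the
# current user can access, and deletes it (with its contents) on exit.
with tempfile.TemporaryDirectory(prefix="myapp-") as tmpdir:
    scratch = os.path.join(tmpdir, "scratch.txt")
    with open(scratch, "w") as fh:
        fh.write("temporary data\n")
    created = os.path.isdir(tmpdir)

# After the with-block the directory is gone.
removed = not os.path.exists(tmpdir)
print(created, removed)
```

If the directory has to outlive the block, `tempfile.mkdtemp()` creates one the same way but leaves cleanup to the caller.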


Very interesting! Thank you for informing me!


> Basically useless

And yet, implemented across an incredible number of systems! Is it cargo culting? Was it "good enough"? Why do you think these "basically useless" mechanisms are/were so widely adopted and used?


Probably distributed computing was not common at the time, so a truly universally unique identifier was not needed. And UUIDs were not used as database keys so they did not need to be time ordered.

I still think they were quite naive. I mean, using MAC addresses in the UUID? Not great for privacy.


UUIDs were invented for distributed computing. But distributed computing in the original true sense of the term, not in the current sense of getting horizontal scalability for what is in the end one big application.


In my experience, the vast majority of the time UUIDs are "implemented", they are just 16 randomly generated bytes; the important part is that they are unique.
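
To make "16 randomly generated bytes" concrete, here is a small sketch that builds a v4 UUID by hand from `os.urandom` and stamps in the version and variant bits, next to the standard library's `uuid.uuid4()`. The variable names are only for illustration.

```python
import os
import uuid

# Standard library version: random bits with version/variant already set.
u = uuid.uuid4()

# The same thing by hand: 16 random bytes, then overwrite 6 bits.
b = bytearray(os.urandom(16))
b[6] = (b[6] & 0x0F) | 0x40  # high nibble of byte 6 = version 4
b[8] = (b[8] & 0x3F) | 0x80  # top two bits of byte 8 = variant "10"
manual = uuid.UUID(bytes=bytes(b))

print(u, u.version)
print(manual, manual.version)
```

Everything outside those 6 bits is random, which is why a v4 UUID carries 122 bits of randomness rather than 128.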


“5 iterations”.

That’s…not how it works. You’ve literally just taken “UUID6” and made a bunch of incorrect assumptions.


Explain.


They're not "iterations", they're variants. It's not like they made a UUIDv1, then came up with a better idea to make UUIDv2, then improved that and made UUIDv3, etc. They simply have different purposes. UUIDv4, for example, is for when you simply want the UUID to be fully random. UUIDv8 is for when you want to use your own completely custom UUID generation scheme but still want it to fit in to the UUID format. UUIDv7 is for when you want UUIDs which are sortable based on timestamps and have a random component. Other UUID variants work well if you have a known set of machines generating UUIDs and you can give each machine a unique ID to ensure that no two machines will generate the same UUID.

Now some UUID versions do supersede others, for example UUIDv7 is pretty much just a better UUIDv1. But UUIDv4, for example, has not been superseded and remains a good choice for some uses.
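
For a feel of what "sortable based on timestamps with a random component" means, here is a sketch of the UUIDv7 bit layout as I read it in RFC 9562: a 48-bit Unix timestamp in milliseconds, 4 version bits, 12 random bits, 2 variant bits, and 62 more random bits. The function name `uuid7_sketch` is made up, and this version makes no attempt at monotonicity within a single millisecond.

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 field layout (illustrative, not hardened)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_000)
later = uuid7_sketch(2_000)
print(earlier, later, str(earlier) < str(later))
```

Because the timestamp sits in the most significant bits, the usual hex string form sorts by creation time, which is the property that makes these friendlier as database keys than v4.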


But UUIDv1 did exist for many years before UUIDv2, v3, v4, and v5 were all built and many years before this final approval of v6, v7, and v8. Even if the variants don't "supersede" each other, live side-by-side and aren't meant to imply a version number like "semantic versioning" and a "proper" succession, the numbers still represent a progression of decades of real time between variant standards and the question remains valid how the designers of v4 especially missed/under-appreciated the changes/fixes/tweaks in v6 and v7 and didn't make such variants then? Hindsight is 20/20 of course, and it is easy to armchair quarterback how "let's keep the timestamp from v1 but use random data like v4" sounds like an obvious variant if you were in the process of designing v4 to "fix/replace" then usages of v1 knowing most of the trouble was just with MAC addresses and most of the advantages to v1 were time-sorting.


I agree that it should seem obvious to introduce a "UUID with sortable timestamp first then a bunch of random bytes" much earlier, probably alongside v4 or something. But then the complaint is, "why did it take many decades before they made this variant", not "why did it take them 7 iterations".


Sure, you are technically correct (as we all know, the best kind of correct), but I think the point is maybe that the comment far above about "7 iterations" was meant to be more "emotionally correct" than technically correct.


It was, thank you.


Lots of typos in this document:

> The format for the 16-octet, 128-bit UUIDv6 is shown in Figure 2

Figure 2 is a uuidv7

> UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv8

Probably meant this to say "or UUIDv7"

Hopefully this has been fixed...


> UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv8

Did they mean UUIDv7 for the last one?


I'm glad I wasn't the only one who noticed that. As written, it makes no sense.


The clocks of systems using these UUIDs may at some point be synchronized against an external NTP server, which can cause problems unless something is designed to check for it (e.g. timestamps that go backwards, or that overflow).

Take care.
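
One way a generator could guard against the clock stepping backwards is to remember the last timestamp it handed out and never return anything smaller. This is only a sketch of that idea; the class name `MonotonicMillis` and the injectable `clock` parameter are made up for illustration, and it does nothing about timestamp overflow.

```python
import time

class MonotonicMillis:
    """Millisecond timestamps that never go backwards, even if the
    system clock is adjusted (for example by an NTP step)."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self._clock = clock
        self._last = -1

    def next(self) -> int:
        now = self._clock()
        if now < self._last:
            # Clock moved backwards: reuse the last value instead.
            now = self._last
        self._last = now
        return now

# Simulated clock readings, including a step back to 990.
readings = iter([1000, 1001, 990, 990, 1005])
source = MonotonicMillis(clock=lambda: next(readings))
out = [source.next() for _ in range(5)]
print(out)
```

The trade-off is that IDs generated during the rollback window all share one timestamp, so their relative order depends entirely on the random bits until the real clock catches up.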


I'm curious as to why they initially chose the Gregorian calendar for dates and then stuck with it for the v6 implementation.
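
For context on what that choice looks like in practice: the v1 timestamp counts 100-nanosecond intervals since 00:00:00 UTC on 15 October 1582, the date the Gregorian calendar reform took effect. A sketch of converting it back to a normal datetime in Python; the helper name `v1_timestamp_to_datetime` is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Distance between the two epochs, in 100-nanosecond ticks.
OFFSET_100NS = (UNIX_EPOCH - GREGORIAN_EPOCH).days * 86_400 * 10_000_000

def v1_timestamp_to_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a v1 UUID to a UTC datetime."""
    unix_100ns = u.time - OFFSET_100NS
    # timedelta has microsecond resolution, so drop the last decimal digit.
    return UNIX_EPOCH + timedelta(microseconds=unix_100ns // 10)

u = uuid.uuid1()
when = v1_timestamp_to_datetime(u)
print(u, when.isoformat())
```

UUIDv7 drops this epoch and uses Unix time in milliseconds instead.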



