
I'm not advocating you write this, I'm saying people have written it, probably hundreds of thousands of times, and if charAt() becomes O(n) instead of O(1), this code suddenly hangs your CPU for 10 seconds on a long string, thus you can't really swap out UTF-16 for UTF-8 transparently.
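For concreteness, the pattern looks something like this (a made-up Java sketch, not anyone's actual code; countSpaces is a hypothetical name):

    // Hypothetical sketch: an index loop that assumes charAt(i) is O(1).
    // If charAt(i) instead had to scan from the start of a UTF-8 string
    // to find the i-th character, this loop would silently become O(n^2).
    public class ScanDemo {
        static int countSpaces(String s) {
            int count = 0;
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == ' ') count++;
            }
            return count;
        }

        public static void main(String[] args) {
            System.out.println(countSpaces("one two three")); // prints 2
        }
    }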


Your point doesn't stand for UTF-16 either: it isn't a fixed-length encoding, so the same code is just as broken in UTF-16.

It was always O(n).

That's assuming, of course, you aren't using UTF-32, which has its own set of problems (BE or LE byte order) and sees little usage outside of China.


...it's not O(n). Many languages, JS, Java, and C# included, have O(1) access to a character at a given position. You correctly note that it won't work well with international strings, but GP is right that A LOT of code like this was written by Western, ASCII-brained developers.


Haven't used Java in a while, but I believe charAt() returns a UTF-16 code unit and is constant-time access. So something like the above works not only for ASCII but also for the majority of Western languages and special characters you may encounter on a day-to-day basis.
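Roughly like this, if memory serves (a quick sketch under that assumption, not a definitive claim about the implementation):

    public class CharAtDemo {
        public static void main(String[] args) {
            String s = "café";               // all BMP characters
            // charAt() indexes straight into the UTF-16 code units (O(1)),
            // and for BMP text one code unit is one character:
            System.out.println(s.charAt(3)); // é
        }
    }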


It's constant time only if you ignore surrogate pairs, i.e., everything outside the Basic Multilingual Plane. By that logic UTF-8 is constant time too, as long as you ignore anything that isn't ASCII, because most text is in English.

Saying it works fine if you ignore errors and avoid edge cases is just a clever rephrasing of "it worked on my machine."

Plus, emoji such as U+1F600 GRINNING FACE sit above the BMP, so even in Western-language text you are bound to run into such "exceptions".
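A minimal sketch of what goes wrong (assuming Java, to match the charAt() discussion above):

    public class EmojiDemo {
        public static void main(String[] args) {
            String s = "hi 😀";  // U+1F600 GRINNING FACE, outside the BMP
            // length() counts UTF-16 code units, not characters:
            System.out.println(s.length());                    // 5, not 4
            // charAt(3) hands back a lone high surrogate:
            System.out.printf("U+%04X%n", (int) s.charAt(3));  // U+D83D
            // Iterating by code point sees the emoji as one unit:
            s.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp));
        }
    }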



