
You’re correct about algorithms that do “human” things with text, but you need to think of more examples.

That’s how you write hashing algorithms, checksums, and certain trivial parsers.[0]

But most importantly, right or wrong, this code is out there, running today, god knows where, and you do not slow it down from O(n) to O(n^2).
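A minimal sketch of the kind of code presumably at stake (the function and names here are hypothetical): a loop that indexes into a string one position at a time is O(n) only while charAt is O(1). If indexing had to scan from the start of a variable-width encoding, the same loop would quietly become O(n^2) without a single line changing.

    class StringScan {
        // Hypothetical example: a per-character scan that assumes O(1) indexing.
        // If charAt(i) had to walk the string from the beginning (as it would
        // over a variable-width encoding like UTF-8), this loop degrades from
        // O(n) to O(n^2).
        static int countSpaces(String s) {
            int count = 0;
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) == ' ') {
                    count++;
                }
            }
            return count;
        }
    }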



Is such code really going to be ported to WASM though? And does it really matter for the string lengths that a typical web application has to process? WASM really doesn't have to worry about legacy that much.


Hashing algorithms and checksums work on bytes, not characters.


Here is the JDK 7 String#hashCode(), which operates on characters: https://github.com/openjdk-mirror/jdk7u-jdk/blob/f4d80957e89....

That changed in newer versions, because String is now backed by a `byte[]` rather than a `char[]`, but the char-based version was just fine. A hash algorithm can take in bytes, characters, or ints; it doesn't matter.
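For anyone not clicking through, roughly the recurrence that method uses (paraphrased and written against charAt() rather than the private char[] field, so it stands alone; not the JDK source verbatim):

    class StringHash {
        // The classic 31*h + c recurrence, fed with UTF-16 code units.
        // Nothing about the algorithm cares whether the inputs are bytes,
        // chars, or ints.
        static int hash(String s) {
            int h = 0;
            for (int i = 0; i < s.length(); i++) {
                h = 31 * h + s.charAt(i);
            }
            return h;
        }
    }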

In Java, you don't get access to the bytes that make up a string, to preserve the string's immutability. So for many operations where you might operate on bytes in a lower-level language, you end up using characters (unless you're the standard library and can finagle access to the bytes), or else making a byte copy of the entire string.

I admit, checksums computed over characters sound a bit weird, but they should also be perfectly well-defined.
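As an illustration (a made-up, Fletcher-style sum, not any standard checksum): running it over UTF-16 code units is well-defined, because charAt(i) always yields the same 16-bit values for the same string, even though the JVM never exposes the underlying bytes.

    class CharChecksum {
        // Illustrative only: a Fletcher-style running sum over UTF-16
        // code units instead of bytes. Deterministic and well-defined
        // for any Java String.
        static long checksum(String s) {
            long sum1 = 0;
            long sum2 = 0;
            for (int i = 0; i < s.length(); i++) {
                sum1 = (sum1 + s.charAt(i)) % 65535;
                sum2 = (sum2 + sum1) % 65535;
            }
            return (sum2 << 16) | sum1;
        }
    }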


A possible optimization would be to change the internal representation on the fly for longer strings as soon as random accesses are observed. Experiments would be needed to tell where the right thresholds are. JavaScript engines already do internal conversions between string representations.
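A rough sketch of the idea (the class name and the promotion threshold are invented here; real engines use their own heuristics and representations): keep the compact byte form until enough random accesses are seen, then pay one O(n) conversion to an O(1)-indexable char array.

    import java.nio.charset.StandardCharsets;

    // Illustrative sketch only: a string that starts as UTF-8 bytes and
    // switches to an indexable char form after a made-up number of random
    // accesses.
    class AdaptiveString {
        private static final int PROMOTION_THRESHOLD = 8; // invented threshold

        private byte[] utf8;        // compact representation
        private char[] chars;       // indexable representation, built lazily
        private int randomAccesses; // how many charAt calls we've seen

        AdaptiveString(String s) {
            this.utf8 = s.getBytes(StandardCharsets.UTF_8);
        }

        char charAt(int index) {
            if (chars == null && ++randomAccesses >= PROMOTION_THRESHOLD) {
                // One O(n) conversion, after which every access is O(1).
                chars = new String(utf8, StandardCharsets.UTF_8).toCharArray();
            }
            if (chars != null) {
                return chars[index];
            }
            // Still in the compact form: decode on the fly (O(n) per access).
            return new String(utf8, StandardCharsets.UTF_8).charAt(index);
        }
    }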




