Go's a System language. Doesn't that mean it's more suitable for writing a web s...

leon_ · on May 10, 2011

Go turned out to be a pretty nice general-purpose language. You get garbage collection, sane native string handling, and native maps and lists. Mix that with static typing and you have a pretty nice language that enables fast and sane web development.

lars512 · on May 10, 2011

Having done a lot of CJK development, I felt that Go's Unicode strings were pretty kludgy last time I looked. Go's strings are all UTF-8, so unless you're working in the ASCII subset alone, you have to manually iterate over multi-byte runes to get the Unicode code points out. That's really not what I'd call friendly Unicode handling comparable to scripting languages, or even Java.

Please correct me if things have changed. I haven't revisited Go for a little while now, and would be very interested in any updates to its Unicode handling.

dchest · on May 10, 2011

1. Range on a string iterates over Unicode code points (runes):

    s := "Какая-то строка"
    for _, rune := range s {
        // do something with rune 'К', 'а', ...
    }

2. Converting to []int gives you a slice of runes:

    s := "Какая-то строка"
    runes := []int(s)

    sub := string(runes[:8]) // "Какая-то"

However, slicing a string directly slices it by byte:

    s[:8] // "Кака"

3. With package utf8 (http://golang.org/pkg/utf8/) you can manipulate runes manually.

While none of this is intuitive (you have to know what does what), I find it rather easy.
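
For anyone reading this with a newer toolchain: as far as I know, since Go 1 the rune conversion is spelled []rune(s) rather than []int(s), and the utf8 package lives at unicode/utf8. Here is a rough sketch of the three points above under those assumptions. The helper name runeOffsets is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper: it returns the byte offset at which
// each code point of s starts, using a range loop over the string.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Какая-то строка"

	// 1. Range walks the string code point by code point.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}

	// 2. Converting to []rune gives a slice indexed by code point,
	//    while slicing the string itself counts bytes.
	runes := []rune(s)
	fmt.Println(string(runes[:8])) // first eight code points
	fmt.Println(s[:8])             // first eight bytes

	// 3. unicode/utf8 lets you decode and count runes by hand.
	r, size := utf8.DecodeRuneInString(s)
	fmt.Printf("first rune %c occupies %d bytes\n", r, size)
	fmt.Println(len(s), utf8.RuneCountInString(s)) // bytes vs. code points
	fmt.Println(runeOffsets(s)[:3])
}
```

The []rune conversion copies the whole string, so for long strings the range loop or utf8.DecodeRuneInString is usually the cheaper choice.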

supersillyus · on May 10, 2011

Iteration over strings is rune-by-rune in Go. However, (somewhat counter-intuitively) string indexing/slicing is byte-by-byte, so you can't just write "str[0:10]" and get the first 10 code points. Then again, that's true in UTF-16 also, if I'm not mistaken. But if you want an array of runes instead of a UTF-8 encoded string, you can just do "[]int(mystring)" and it'll do the conversion for you.
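
If you only want the first N code points and would rather not convert the whole string, one possible approach is to walk it with range and cut at the byte offset where code point N+1 starts. This is just a sketch; firstRunes is a name I made up, not something from the standard library.

```go
package main

import "fmt"

// firstRunes is a hypothetical helper that returns the prefix of s
// containing at most n code points, without allocating a rune slice.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		// i is the byte offset where the next code point begins.
		if count == n {
			return s[:i]
		}
		count++
	}
	// s has n or fewer code points, so return all of it.
	return s
}

func main() {
	fmt.Println(firstRunes("Какая-то строка", 8)) // Какая-то
}
```

Note that this counts code points, not user-perceived characters, so combining marks still count separately.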

fauigerzigerk · on May 11, 2011

Same in Java, C# and Python. To get the codepoints out or access the nth codepoint you need to iterate over the string if you want your code to be correct and safe.
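
To make the "you need to iterate" point concrete, here is a small Go sketch. It assumes the standard unicode/utf16 package and uses two made-up helpers, nthRune and utf16Units. A code point outside the Basic Multilingual Plane takes two UTF-16 code units, so indexing by code unit in a UTF-16 string has the same pitfall as indexing by byte in a UTF-8 string.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// nthRune is a hypothetical helper that returns the code point at
// position n (zero-based) by iterating, and false if s is too short.
func nthRune(s string, n int) (rune, bool) {
	count := 0
	for _, r := range s {
		if count == n {
			return r, true
		}
		count++
	}
	return 0, false
}

// utf16Units is a hypothetical helper that reports how many 16-bit
// code units s would occupy when encoded as UTF-16.
func utf16Units(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	s := "a𝄞b" // U+1D11E lies outside the BMP

	r, ok := nthRune(s, 2)
	fmt.Printf("%c %v\n", r, ok)

	// Three code points, but more than three UTF-16 code units.
	fmt.Println(len([]rune(s)), utf16Units(s))
}
```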