I think the point being made is that -m does not really count characters; it counts multibyte sequences, or at least tries to. The same Unicode code point can be a very different string of bytes in UTF-8, UTF-16 and UTF-32, and there is no way to tell which one you have unless you know beforehand that you are dealing with UTF-8 or UTF-16. Hence the BOM, but no one likes that.
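To make the byte-versus-character point concrete, here is a small Python sketch (my own illustration, not anything wc does internally). It encodes the same seven code points as UTF-8, UTF-16 and UTF-32, then decodes the UTF-8 bytes under a wrong guess. Latin-1 stands in for "the reader assumed a different encoding". The helper names `byte_lengths` and `chars_if_decoded_as` are made up for the example.

```python
def byte_lengths(text: str) -> dict:
    """Byte length of `text` in each encoding (explicit LE codecs, so no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def chars_if_decoded_as(data: bytes, enc: str) -> int:
    """How many characters the same bytes appear to contain under a guessed encoding."""
    return len(data.decode(enc))


text = "na\u00efve \u20ac"  # "naïve €": 7 code points, all in the BMP
print(len(text))           # 7
print(byte_lengths(text))  # {'utf-8': 10, 'utf-16-le': 14, 'utf-32-le': 28}

data = text.encode("utf-8")
print(chars_if_decoded_as(data, "utf-8"))    # 7
print(chars_if_decoded_as(data, "latin-1"))  # 10: every byte becomes one "character"

# The generic "utf-16" codec prepends a 2-byte BOM so a reader can detect byte order.
print(len(text.encode("utf-16")))  # 16
```

Same text, three byte counts, and two different "character" counts for the same bytes depending on what the reader assumes. That is roughly the position a locale-driven tool like `wc -m` is in when the encoding isn't known up front.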
It's hard. And we may well have to abandon tools like wc when we leave the Latin world.