In most C and C++ libraries, itoa is actually is done recursively like this: Div...

In most C and C++ libraries, itoa is actually is done recursively like this: Divide by 100000 to get the top 5 digits and the bottom 5 digits and do the conversion for each in parallel (relying on the superscalar nature of CPUs for parallelism, no SIMD). For longs, you divide by 100000 twice, and run 4 streams.

String to integer conversion, however, is a lot harder to do in log n time, but each iteration is usually faster. The same trick doesn't work - you can't efficiently do math on the base 10 string, so the equivalent division by 2^16 is very hard. I think it has to be done in linear time, but this expands to O(n log n) for arbitrary word width due to math ops.

However, a lot of what we do for atoi/itoa assumes you have a finite length. Same with the FFTs: the algorithms rely on finite length. Infinite-length bignum libraries have a huge cost on trivial things like this, and it's part of the cost of doing business in python.

There is a very good chance that the bignum library used here is not optimized for things like atoi and itoa - most bignum libraries are written for cryptography and math where these are not done frequently.