
> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.

This is definitely the case. I've been wrestling with bytes and strings constantly while porting a Django application to Python 3 for a customer, and I find myself encoding and decoding response bodies and JSON all the time for now. For reasons I haven't investigated, I don't have to do that in my Ruby and Elixir projects. Everything there seems to be a string, and yet it works.
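A small sketch of the quoted behavior (the sample body is made up for illustration): in Python 3, mixing bytes and str fails immediately with a TypeError instead of silently producing garbage, and decoding once at the boundary makes the problem disappear. For JSON in particular, `json.loads` has accepted bytes directly since Python 3.6, which may remove some of the decode calls.

```python
import json
import re

body = b"status: ok"  # hypothetical raw response body, as bytes
failures = []

# Concatenating bytes and str is rejected outright in Python 3.
try:
    body + " (cached)"
except TypeError as exc:
    failures.append(exc)

# A str regex pattern cannot be matched against bytes either.
try:
    re.search(r"ok", body)
except TypeError as exc:
    failures.append(exc)

# Decoding once, where the data enters the program, yields an ordinary str.
text = body.decode("utf-8")
print(text + " (cached)", re.search(r"ok", text) is not None)

# json.loads has accepted UTF-8/16/32 encoded bytes directly since Python 3.6.
payload = json.loads(b'{"status": "ok"}')
print(payload)
```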



I’ve worked in a variety of Django codebases, and the last time I had trouble with string encoding/decoding was with Python 2. Since moving to Python 3, I have rarely needed to manually encode or decode, and I genuinely can't remember the last time I did.

Perhaps there’s something about a port that requires encoding/decoding bytes/strings?


Encoding/decoding is heavy in codebases that have to run on Python 2 and Python 3 at the same time, where the authors worry about handling Unicode correctly on Python 2.

Ironically, when your Python 2 app doesn't care about Unicode, porting it to Python 3 is actually much easier.
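The kind of shim that tends to accumulate in code targeting both versions looks roughly like this. `ensure_text` is a hypothetical helper written for illustration (the `six` library ships a similar `six.ensure_text`); the `unicode` branch only ever runs on Python 2.

```python
import sys

# Pick the text and binary types for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:  # Python 2: str is bytes, unicode is text
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str


def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %r" % type(value))


print(ensure_text(b"caf\xc3\xa9"), ensure_text(u"caf\xe9"))
```

On Python 3 alone, every call site collapses to plain str handling, which is part of why code written for Python 3 from the start rarely needs this.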


You don't have to do these things in Python 3 either. Your problem was that you had Python 2 code that was already broken, and you started adding encode/decode calls to fix it, which typically makes the problem worse.

If you write code in Python 3 from the start, you rarely need encode() and decode(). Typically, what you want is text, not bytes.

The exceptions are places where you serialize, such as I/O (network or files, although even files are decoded on the fly unless you open them in binary mode).
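To illustrate the file case: text mode encodes and decodes for you, while binary mode hands back raw bytes. The file name is arbitrary, and `encoding` is passed explicitly because the default depends on the platform's locale settings.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")  # arbitrary file name

    # Text mode: Python encodes the str to bytes on the way out...
    with open(path, "w", encoding="utf-8") as f:
        f.write("héllo")

    # ...and decodes the bytes back to str on the way in.
    with open(path, encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode ("rb") skips decoding and returns the raw bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```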


The problem is external APIs that return whatever they want, no matter what they should return. The world is messy.

For example, I just had to write this:

  # Parentheses let the method chain span several lines.
  return (urllib.request
      .urlopen(url, timeout=60)
      .read()
      .decode("utf-8", errors="backslashreplace"))
Then I use that string in a regexp, etc.
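For reference, here is what the `errors=` argument changes when a body is not valid UTF-8. The sample bytes are invented: `\xa3` is the pound sign in Latin-1 but an invalid start byte in UTF-8.

```python
import re

# Made-up body: b"\xa3" is the pound sign in Latin-1 but not valid UTF-8 here.
raw = b"Price: 10\xa3 per unit"

# errors="strict" (the default) raises instead of guessing.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "replace" substitutes U+FFFD; "backslashreplace" keeps the byte visible as \xa3.
replaced = raw.decode("utf-8", errors="replace")
text = raw.decode("utf-8", errors="backslashreplace")

# The decoded result is a plain str, so str regex patterns work on it.
match = re.search(r"Price: (\d+)", text)
print(replaced)
print(text)
print(match.group(1))
```

`backslashreplace` is a reasonable choice when you still want to see which bytes were bad; `replace` loses that information.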

This is the only language where I have to deal with encodings explicitly at such a low level. It makes me not want to use it for my pet projects.


Why is that bad? The result returned from a URL is always binary. In certain situations it can be text, but it doesn't have to be. If the result were an image, you would convert the data to an image; if it were a sound file, the same. You should think of text as just another distinct type.
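A quick way to see why the library can't just hand you text: the same bytes mean different things depending on the encoding you assume, and some payloads (the PNG file signature here) aren't text at all.

```python
# The same two bytes, read with two different encodings.
raw = "é".encode("utf-8")  # b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")      # the intended "é"
as_latin1 = raw.decode("latin-1")  # mojibake: "Ã©"

# Binary data such as a PNG signature is not text in any useful sense;
# 0x89 cannot start a UTF-8 sequence, so strict decoding fails.
png_signature = b"\x89PNG\r\n\x1a\n"
try:
    png_signature.decode("utf-8")
    png_is_utf8 = True
except UnicodeDecodeError:
    png_is_utf8 = False

print(as_utf8, as_latin1, png_is_utf8)
```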

Of course, urllib could have a text() method that does this conversion, but urllib is not requests. It was never user-friendly.
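A rough idea of what such a conversion involves when done by hand: use the charset declared in the Content-Type header and fall back to UTF-8 otherwise. `decode_body` is a hypothetical helper, not part of urllib; it borrows the standard library's `email.message.Message` to parse the header, and the UTF-8 fallback is an assumption (other libraries make different choices).

```python
import codecs
from email.message import Message
from typing import Optional


def decode_body(body: bytes, content_type: Optional[str]) -> str:
    """Decode an HTTP body using the charset from Content-Type, else UTF-8."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset param, or None.
    charset = msg.get_content_charset() or "utf-8"
    try:
        codecs.lookup(charset)
    except LookupError:  # unknown or bogus charset name sent by the server
        charset = "utf-8"
    return body.decode(charset, errors="backslashreplace")


print(decode_body("héllo".encode("latin-1"), "text/html; charset=ISO-8859-1"))
```

With urllib, the response's `headers` object is itself an `email.message.Message` subclass, so `resp.headers.get_content_charset()` can also be called on it directly.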

Edit: personally I use aiohttp; the interface is much nicer: https://aiohttp.readthedocs.io/en/stable/client_reference.ht... If I can't use asyncio, I use requests.



