They shouldn't, but they do tend to have a large region of memory reserved for them, which only gets backed by real pages as the stack grows into it. The kernel uses 4k, 8k, or 16k stacks; a quick test on a Linux x86-64 system suggests that userspace has 8M stacks by default.
Hackish test code to recurse infinitely and print the stack pointer until a segfault:
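A minimal sketch of such a test could look like the following. It is an illustration rather than the exact code used above: the names `probe_stack` and `stack_span` are made up, the address of a local variable stands in for the stack pointer, and the recursion takes a depth limit so it can be run without crashing. Passing a huge depth (for example `ULONG_MAX`) with `verbose` set should reproduce the "recurse until segfault" behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Recurse `depth` more levels and return the address of a local variable in
 * the deepest frame. That address is only an approximation of the stack
 * pointer, but it moves with it. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static uintptr_t probe_stack(unsigned long depth, int verbose)
{
    volatile char pad[256];              /* makes every frame occupy real stack */
    pad[0] = (char)depth;
    uintptr_t here = (uintptr_t)&pad[0];

    if (verbose)
        printf("levels left %lu, sp ~ %p\n", depth, (void *)here);

    if (depth == 0)
        return here;

    uintptr_t deepest = probe_stack(depth - 1, verbose);
    pad[1] = (char)deepest;              /* volatile write after the call, so this is not a tail call */
    return deepest;
}

/* Distance in bytes between a local in the caller and the deepest frame
 * reached after `depth` levels of recursion. */
static size_t stack_span(unsigned long depth)
{
    volatile char top;
    uintptr_t start = (uintptr_t)&top;
    uintptr_t end = probe_stack(depth, 0);
    return (size_t)(start > end ? start - end : end - start);
}
```

Calling `probe_stack(ULONG_MAX, 1)` from `main` and subtracting the last printed address from the first gives a rough figure for how much stack was available before the crash.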
I didn't see this until now. The problem is that when you spawn a new thread, space for its stack needs to be allocated. Since the stack can't easily be reallocated in C/C++, it also needs to be "big enough".
The current implementation allocates several megabytes of stack for each thread. Since memory accounting isn't strict, this doesn't create problems: unused stack doesn't really take up space.
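To see what "several megabytes" means on a given machine, one option is to read the soft `RLIMIT_STACK` limit. On Linux with glibc, the default stack size of new threads is normally taken from that limit unless it is "unlimited" (this is what the pthread_create man page describes; other platforms may differ). The helper names below are made up for this sketch.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Return the soft RLIMIT_STACK limit in bytes, or 0 if the query fails.
 * RLIM_INFINITY is passed through unchanged and means "unlimited". */
static unsigned long long stack_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Print the limit in a human-readable form. */
static void print_stack_limit(void)
{
    unsigned long long lim = stack_soft_limit();
    if (lim == (unsigned long long)RLIM_INFINITY)
        puts("stack limit: unlimited");
    else
        printf("stack limit: %llu KiB\n", lim / 1024);
}
```

This reports the same number as `ulimit -s` in the shell that launched the program. It is the size that gets reserved, not the amount of memory actually touched.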
If you start accounting for memory strictly, you will also have to be stricter about allocating stacks for new threads, limiting yourself to just a page or two wherever possible. This is not an easy task.