I don't have anything for you, but if you have some normally allocated hierarchal data structures in order to free them you'll have to go through their members, chase pointers, etc., to figure out the addresses to free, then call free on them in sequence. That's all going to be a lot more expensive than just memsetting a bunch of data to zero, which you can do at whatever the speed of your cores memory bandwidth is.
Yep. And you often don’t even need to zero the data.
Generally, no paper or SO answer will tell you where your program spends its time. Learn to use profiling tools, and experiment with stuff like this. Try out arenas. Benchmark before and after and see what kind of real performance difference it makes in your own program.