
Why aren't these tuned automatically? Should be pretty easy.


It sounds simple until you actually try doing that. The thing is, reducing the costing to these two parameters is a significantly simplified model of what happens in practice. So you can't just run some I/O benchmark to measure random vs. sequential requests. For example the defaults that worked fine for a long time (seq_page_cost=1 and random_page_cost=4) certainly do not reflect the difference between random and sequential I/O on rotational devices (where the device can easily do 100MB/s in sequential access, but less than 1MB/s in random).


> For example the defaults that worked fine for a long time (seq_page_cost=1 and random_page_cost=4) certainly do not reflect the difference between random and sequential I/O on rotational devices (where the device can easily do 100MB/s in sequential access, but less than 1MB/s in random)

The PostgreSQL documentation explains why. It assumes HDD random access is 40x slower than sequential access but that you'll have a 90% cache hit rate, so random_page_cost=4 reflects 10% of that 40x slowdown.
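Under that reading, the default falls out of simple arithmetic. A sketch (the 40x slowdown and 90% hit rate are the documentation's illustrative assumptions, not measurements):

```python
# Back-of-the-envelope arithmetic behind random_page_cost = 4, per the
# interpretation given in the PostgreSQL docs (numbers are illustrative).
seq_page_cost = 1.0
random_slowdown = 40.0   # assumed: HDD random reads ~40x slower than sequential
cache_hit_ratio = 0.90   # assumed: 90% of "random" page reads hit the cache

# Cache hits are treated as costing nothing extra; only the 10% of reads
# that actually go to disk pay the full 40x penalty.
random_page_cost = seq_page_cost * random_slowdown * (1 - cache_hit_ratio)
print(random_page_cost)  # ~4, matching the long-standing default
```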


Not entirely. The documentation says you can interpret it that way, not that it's how the numbers were determined.

AFAIK it's much more "We're using those numbers as defaults because they seem to be working well," rather than "We did extensive benchmarking and these are the right values!"

You can measure how much slower random I/O is fairly easily. But the question is how to derive PostgreSQL cost parameters from that. Should you use the same assumption about 90% cache hit ratio (why?) or should you use some different value?
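To make that concrete, here's a hypothetical derivation showing how sensitive the resulting parameter is to the assumed cache hit ratio (the 40x device ratio is again illustrative, and the "cached reads are free" model is my simplification):

```python
def derive_random_page_cost(measured_slowdown: float,
                            cache_hit_ratio: float,
                            seq_page_cost: float = 1.0) -> float:
    """Hypothetical model: cached reads cost nothing extra, uncached
    reads pay the full measured random-vs-sequential penalty."""
    return seq_page_cost * measured_slowdown * (1.0 - cache_hit_ratio)

# Same measured 40x device ratio, different cache assumptions:
for hit_ratio in (0.0, 0.50, 0.90, 0.99):
    cost = derive_random_page_cost(40.0, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: random_page_cost = {cost:.1f}")
```

The same device measurement yields anything from 40 down to 0.4 depending on the assumed hit ratio, which is exactly why benchmarking the hardware alone can't pin the parameter down.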



