graveland's comments

graveland · 2025-10-30T15:16:35 1761837395

There's some secret sauce there I don't know if I'm allowed to talk about yet, so I'll just address the existing tech that we didn't use: most things either didn't have a good enough license, cost too much, or would take a TON of ramp-up and expertise we don't currently have to manage and maintain. Generally speaking, though, our own stuff is something we can fully control.

Entirely programmable storage has so far allowed us to try a few different things to make it efficient and give us the features we want. We've been able to try different dedup methods, copy-on-write styles, different compression methods and types, different sharding strategies... All just as a start. We can easily and quickly create new experimental storage backends and see exactly how pg performs with them side-by-side with other backends.

We're a kubernetes shop, and we have our own CSI plugin, so we can also transparently run a pg HA pair with one pg server using EBS and the other running in our new storage layer, and easily bounce between storage types with nothing but a switchover event.

yencabulator · 2025-11-06T17:14:27 1762449267

> would take a TON of ramp-up and expertise we don't currently have to manage and maintain

But you think you have resources to maintain a distributed strongly-consistent replicating block store?

The edge cases in RBD are literally why Ceph takes expertise to manage! Things like failure while recovering from failure while trying to maintain performance are inherently tricky.

adsharma · 2025-10-30T23:02:43 1761865363

Ceph is under LGPL. Cost doesn't seem to be a barrier. It supports k8s through CSI and has observability and documentation.

You can probably hire people to maintain it.

Was it the ramp-up cost or expertise?

graveland · 2025-10-30T14:56:02 1761836162

(I'm on the team that made this)

The raw numbers are one thing, but the overall performance of pg is another. If you check out https://planetscale.com/blog/benchmarking-postgres-17-vs-18 for example, in the average QPS chart, you can see that there isn't a very large difference in QPS between gp3 at 10k IOPS and NVMe at 300k IOPS.

So currently I wouldn't recommend this new storage for the highest end workloads, but it's also a beta project that's still got a lot of room for growth! I'm very enthusiastic about how far we can take this!

samlambert · 2025-10-30T18:04:28 1761847468

it's a 70% difference at lower cost. i know math is hard but c'mon try and be serious.