
> Wrapping S3 with a write-through cache setup can often be the biggest cost win if your egress costs are high. Sometimes caching the entire dataset is worth it, sometimes just a small portion.

Is there an off-the-shelf implementation of this?



I usually just use a suitable Nginx config unless there's a compelling reason not to. It means you "waste" the first read - you let POST/PUT etc. hit S3 directly and only cache reads - but it's easier to get right. It's rare that this matters: if your reads are rare enough relative to writes that avoiding the cost of the first read matters, the savings from this approach are probably questionable anyway - the big benefit comes when reads dominate massively.
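For what it's worth, a minimal sketch of that setup, assuming a public or otherwise pre-authorised bucket - the bucket name, cache size, and TTLs are placeholders, and signing requests for a private bucket takes more work:

    # Sketch: read-through cache in front of S3. GETs/HEADs are cached
    # locally; everything else is proxied straight through to the bucket.
    proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3cache:100m
                     max_size=50g inactive=7d use_temp_path=off;

    server {
        listen 80;

        location / {
            proxy_pass https://my-bucket.s3.amazonaws.com;
            proxy_set_header Host my-bucket.s3.amazonaws.com;

            proxy_cache            s3cache;
            proxy_cache_valid      200 7d;
            proxy_cache_methods    GET HEAD;   # never cache writes
            proxy_cache_key        $request_uri;
            proxy_cache_use_stale  error timeout updating;
            proxy_ignore_headers   Set-Cookie;
        }
    }

Since proxy_cache_methods only covers GET/HEAD, writes pass straight through and never populate the cache, so the first GET of any object is a miss - the "waste the first read" trade-off described above.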


Minio used to support this, but removed the feature a while back. It was called “gateway mode”. Sadly, I know that doesn’t help much now…

https://blog.min.io/deprecation-of-the-minio-gateway/amp/


Every time I contemplate increasing my use of S3 or similar cloud services, I get annoyed at the extremely sorry state of anything on-prem that replicates to S3.

Heck, even confirming that one has uploaded one's files correctly is difficult. You can extract MD5 (!) checksums from most object storage systems, but only if you don't use whatever that particular system calls a multipart upload. You can often get CRC32 (gee thanks). With AWS, but not most competing systems, you can do a single-part upload, opt into "object integrity", and get a real hash in the inventory. You cannot ask for a hash after the fact.
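One partial workaround, for what it's worth: for multipart uploads, S3's ETag is (in practice - it's not a documented contract, and it doesn't hold for SSE-KMS/SSE-C objects) the MD5 of the concatenated per-part MD5 digests plus a part count, so if you know the part size you used, you can recompute it locally and compare. A sketch in Python:

    # Sketch: recompute the ETag S3 reports for a multipart upload, so a
    # local copy can be compared against the stored object after the fact.
    # Assumes the same part size used for the original upload (8 MiB here)
    # and a non-SSE-KMS/SSE-C object (those ETags aren't MD5-based).
    import hashlib

    def multipart_etag(path, part_size=8 * 1024 * 1024):
        part_digests = []
        with open(path, "rb") as f:
            while chunk := f.read(part_size):
                part_digests.append(hashlib.md5(chunk).digest())
        if len(part_digests) == 1:
            # Single-part upload: the ETag is just the hex MD5 of the object.
            return part_digests[0].hex()
        combined = hashlib.md5(b"".join(part_digests)).hexdigest()
        return f"{combined}-{len(part_digests)}"

    # Compare against: s3.head_object(Bucket=..., Key=...)["ETag"].strip('"')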

I understand why computing a conventional cryptographically-secure hash is challenging in a multipart upload (though it's not actually all that bad). But would it kill the providers to have first-class support for something like BLAKE3? BLAKE3 is a tree hash: one can separately hash multiple parts (with offsets known a priori, which is fine for most APIs, though maybe not Google's as-is), combine them into a whole in logarithmic time and memory, and end up with a hash that actually matches what b3sum would have output on the whole file. And one could even store some metadata and thereby allow downloading part of a large file and proving that one got the right data. (And AWS could even charge for that!)
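For reference, the closest thing available today is entirely client-side: stream the object back and hash it yourself, paying egress for every byte you want to verify. A sketch assuming boto3 and the blake3 PyPI package (bucket, key, and the expected hash are placeholders):

    # Sketch: client-side verification by streaming the object back and
    # hashing it with BLAKE3 (the result matches b3sum on the same bytes).
    import boto3
    from blake3 import blake3

    def verify_object(bucket, key, expected_b3_hex):
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
        hasher = blake3()
        for chunk in body.iter_chunks(chunk_size=1024 * 1024):
            hasher.update(chunk)
        return hasher.hexdigest() == expected_b3_hex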

But no, verifying the contents of one's cloud object storage bucket actually sucks, and it's very hard to be resistant to errors that occur at upload time.



