
I often recommend an "embrace, extend, extinguish" approach to AWS: starting there for simplicity is fine, then "wrap" anything bandwidth-intensive with caches elsewhere (every 1TB in egress from AWS will pay for a fleet of Hetzner instances with 5TB included, or one or more dedicated servers).

Gradually shift workloads, leaving anything requiring super-high durability last (optionally keeping S3, or competitors, as a backup storage option) as getting durability right is one of the more difficult things to get confidence in and most dangerous ones to get wrong.

Wrapping S3 with a write-through cache setup can often be the biggest cost win if your egress costs are high. Sometimes caching the entire dataset is worth it, sometimes just a small portion.
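
A back-of-the-envelope way to check whether a cache in front of S3 pays for itself, as a minimal sketch. The function name and every number in the usage line are placeholders, not quoted prices; plug in your own egress volume, hit ratio, and current pricing.

```python
def monthly_saving(egress_tb: float, hit_ratio: float,
                   egress_usd_per_gb: float, cache_usd_per_month: float) -> float:
    """Estimated monthly saving from serving cache hits outside AWS.

    Only cache hits avoid AWS egress; misses (including the first read
    of each object) still leave AWS and are still billed.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    avoided_gb = egress_tb * 1000 * hit_ratio
    return avoided_gb * egress_usd_per_gb - cache_usd_per_month


# Placeholder inputs: 10 TB/month egress, 90% hit ratio,
# 0.09 USD/GB egress, 100 USD/month for the cache box.
print(round(monthly_saving(10, 0.9, 0.09, 100), 2))
```

A negative result means the cache costs more than the egress it avoids, which is the case where reads are too rare for this to be worth doing.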



> Wrapping S3 with a write-through cache setup can often be the biggest cost win if your egress costs are high. Sometimes caching the entire dataset is worth it, sometimes just a small portion.

Is there an off the shelf implementation of this?


I usually just use a suitable Nginx config unless there's a compelling reason. It means you "waste" the first read: you let POST/PUT etc. hit S3 and only cache reads, but it's easier to get right. It's rare that this matters. If your reads are rare enough relative to writes that avoiding the cost of the first read matters, odds are the savings from this are going to be questionable anyway; the big benefit here comes when reads dominate massively.
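
To make the behaviour concrete, here is a rough model of "writes go straight to S3, only reads are cached", written as a Python sketch rather than an actual Nginx config. `ReadCachingStore` and `DictOrigin` are made-up names, the origin is an in-memory stand-in for S3, and dropping the cached copy on write is my assumption; a plain proxy cache would rely on expiry instead.

```python
class DictOrigin:
    """In-memory stand-in for the real object store (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects[key]


class ReadCachingStore:
    """Passes writes through to the origin and caches reads only."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.origin_reads = 0  # counts reads that would be billed as egress

    def put(self, key, body):
        # Writes are not cached: they go to the origin unchanged.
        self.origin.put(key, body)
        # Assumption: drop any stale cached copy so later reads refetch.
        self.cache.pop(key, None)

    def get(self, key):
        if key not in self.cache:
            # The "wasted" first read: fetched from the origin once.
            self.origin_reads += 1
            self.cache[key] = self.origin.get(key)
        return self.cache[key]


store = ReadCachingStore(DictOrigin())
store.put("report.csv", b"v1")
for _ in range(1000):
    store.get("report.csv")
print(store.origin_reads)  # origin was read once for 1000 client reads
```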


Minio used to support this, but removed the feature a while back. It was called “gateway mode”. Sadly, I know that doesn’t help much now…

https://blog.min.io/deprecation-of-the-minio-gateway/amp/


Every time I contemplate increasing my use of S3 or similar cloud services, I get annoyed at the extremely sorry state of anything on-prem that replicates to S3.

Heck, even confirming that one has uploaded one's files correctly is difficult. You can extract MD5 (!) signatures from most object storage systems, but only if you don't use whatever that particular system calls a multipart upload. You can often get CRC32 (gee thanks). With AWS, but not most competing systems, you can do a single-part upload and opt into "object integrity" and get a real hash in the inventory. You cannot ask for a hash after the fact.
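
For anyone who hasn't hit this: a sketch of why the MD5 trick stops working with multipart uploads. It assumes the widely observed S3-style convention where a multipart ETag is the MD5 of the concatenated per-part MD5 digests plus a "-N" part count; treat that as an assumption, since it depends on provider, part size, and encryption settings. The function names are mine.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """Plain MD5 hex digest, as typically reported for single-part uploads."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """MD5-of-part-MD5s with a '-N' suffix (assumed convention, see above)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


data = b"x" * 10
print(single_part_etag(data))
print(multipart_etag(data, part_size=4))
```

The two values differ for the same bytes, and the multipart one changes with the part size, so you can only verify it locally if you know exactly how the upload was split.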

I understand why computing a conventional cryptographically secure hash is challenging in a multipart upload (but it's not actually all that bad). But would it kill the providers to have first-class support for something like BLAKE3? BLAKE3 is a tree hash: one can separately hash multiple parts (with a priori known offsets, which is fine for most APIs, though maybe not Google's as is), assemble them into a whole in logarithmic time and memory, and end up with a hash that actually matches what b3sum would have output on the whole file. And one could even store some metadata and thereby allow downloading part of a large file and proving that one got the right data. (And AWS could even charge for that!)

But no, verifying the contents of one's cloud object storage bucket actually sucks, and it's very hard to be resistant to errors that occur at upload time.
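
To illustrate the tree-hash idea, here is a toy sketch. It is not BLAKE3 and will not match b3sum: it uses SHA-256 with a made-up tree layout, and it keeps every leaf hash in memory instead of combining in logarithmic space. The only point is that parts hashed independently combine into the same root as hashing the whole file, provided the part offsets are known and aligned to the chunk size.

```python
import hashlib

CHUNK = 1024  # toy leaf size (an assumption for this sketch)


def leaf_hashes(data: bytes) -> list:
    """Hash each fixed-size chunk independently, with a leaf prefix."""
    return [
        hashlib.sha256(b"\x00" + data[i:i + CHUNK]).digest()
        for i in range(0, len(data), CHUNK)
    ]


def root(leaves: list) -> bytes:
    """Reduce leaf hashes pairwise until a single root remains."""
    if not leaves:
        return hashlib.sha256(b"\x00").digest()  # root of empty input
    level = leaves
    while len(level) > 1:
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        level = nxt
    return level[0]


def tree_hash(data: bytes) -> bytes:
    """Tree hash of the whole object in one go."""
    return root(leaf_hashes(data))


def tree_hash_from_parts(parts: list) -> bytes:
    """Tree hash built from parts that could be hashed separately."""
    leaves = []
    for index, part in enumerate(parts):
        if index < len(parts) - 1 and len(part) % CHUNK != 0:
            raise ValueError("non-final parts must be chunk-aligned")
        leaves.extend(leaf_hashes(part))  # each part is independent work
    return root(leaves)


data = bytes(range(256)) * 20  # 5120 bytes
parts = [data[:2048], data[2048:4096], data[4096:]]
print(tree_hash_from_parts(parts) == tree_hash(data))
```

With MD5-of-MD5s the result depends on how the upload was split; with a tree hash over fixed chunks, any aligned split gives the same answer as hashing the file locally.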


Well, S3 is hard to beat for our use case. We make heavy use of their various tiers; we store a somewhat large amount of data, but only a minor part ever goes out.

The compute and network heavy stuff we do is still out of AWS.


That's pretty much the one situation where they're competitive, so sounds very reasonable. Some of their competitors (Cloudflare, Backblaze) might compete, but the biggest cost issue with S3 by far is the egress so if not much goes out it might still be best for you to stay there.

Sounds like (unlike most people who use AWS) you've done your homework. It's great to see. I've used AWS a lot, and will again, because it's often convenient, but so often I see people doing it uncritically without modeling their costs even as they skyrocket with scale.


S3 is a decent product with zero competition. You should keep s3, it’s a fair price.


S3 has plenty of competition. It can be a fair price if you rarely read and need its extreme durability, but that leaves plenty of uses it's totally wrong for.



