Great advice for scaling, especially considering the size of Scribd. I will definitely be interested to see the nginx module, but I wonder how this development and the recent news of Rack::Cache will affect each other.
Regarding the article, this statement really freaked me out:
> So, we’ve got an idea - why can’t we place such a server in front of our application and make it cache content for all users in the world?
Seriously? They'd never even _heard_ of gateway / reverse proxy caching? They came up with it independently? I knew HTTP caching wasn't understood very well but this shows that it's so much worse than I imagined.
It's sad that such a fundamental aspect of the web's architecture is just completely off the radar for so many "web developers". That's not a knock on the Scribd folks, it's a systemic problem across our entire industry as far as I can tell.
Of course we've heard of (and even used) caching reverse proxies - I was just describing the flow of our thoughts and words in the chat room during the brainstorming. I mean, it's one thing to know about something, and a completely different thing when all the pieces of the puzzle come together and you see the full picture.
Ahhh, I see. Like telling a joke in the first person because it just works better.
Well, that's definitely good to know. It does read as if you were forced to discover the concept independently, though, which might be a tad confusing for those familiar with the technique. Maybe it's just me.
At any rate, I do like the clear description of the situation and discussion that led to putting a gateway cache in place. It helps make things click.
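For anyone following along, here is a minimal sketch of the gateway cache idea: one shared cache sits in front of the application and serves a stored response to every user until it expires. This is only an illustration, not how Squid or Scribd's setup actually works. The names `GatewayCache`, `parse_max_age`, and the `backend` callable are made up for the example, and it deliberately ignores `s-maxage`, `Vary`, and revalidation.

```python
import time

def parse_max_age(headers):
    """Return max-age in seconds from Cache-Control, or None if a shared
    cache should not store the response (simplified, hypothetical helper)."""
    cache_control = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    # A shared (gateway) cache must not store private or no-store responses.
    # no-cache is treated as "do not store" here to keep the sketch simple.
    if any(d in ("private", "no-store", "no-cache") for d in directives):
        return None
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                return int(directive.split("=", 1)[1])
            except ValueError:
                return None
    return None

class GatewayCache:
    """Toy in-memory gateway cache keyed by URL (assumption: GET requests only)."""

    def __init__(self, backend, clock=time.time):
        self.backend = backend  # callable: url -> (status, headers, body)
        self.clock = clock      # injectable clock, handy for testing
        self.store = {}         # url -> (expires_at, response)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]     # fresh hit: the application is never touched
        response = self.backend(url)  # miss or stale: ask the application
        status, headers, _body = response
        max_age = parse_max_age(headers)
        if status == 200 and max_age is not None and max_age > 0:
            self.store[url] = (now + max_age, response)
        else:
            self.store.pop(url, None)  # drop any stale copy we can't replace
        return response
```

The point is that the application only has to send honest `Cache-Control` headers; everything else happens in front of it, which is why the same headers work with Squid, Varnish, or Rack::Cache.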
Actually, Squid is fast enough for us :-)
And, as I already pointed out in comments to the post:
Before choosing Squid 3.0 I discussed this with the Percona guys (whom I really trust, since I worked there for quite a while), and they shared with me some really weird stories from their experience (on boardreader.com) with Varnish (constant memory leaks, crashes, etc.). So I decided to go with Squid, and I'm pretty happy with the result so far.
Yeah. But it looks like they have Nginx in front of Squid, so Nginx would still be a bottleneck if they used Varnish. It seems like the cache ought to be at the front of the stack.