
Amazon EC2, MongoDB, and S3. The number of EC2 instances scales with how many stale feeds we have, but it is usually fewer than two.
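To make the scaling idea concrete, here is a minimal sketch of a rule that sizes the worker pool from the stale-feed backlog. It is an illustration only, not the actual setup: the function name `instances_needed` and the `feeds_per_instance`, `min_instances`, and `max_instances` values are all hypothetical.

```python
import math


def instances_needed(stale_feeds: int,
                     feeds_per_instance: int = 5000,  # hypothetical capacity per worker
                     min_instances: int = 1,          # hypothetical floor
                     max_instances: int = 4) -> int:  # hypothetical ceiling
    """Return how many worker instances to run for a backlog of stale feeds."""
    if stale_feeds <= 0:
        return min_instances
    # Round up so a partial batch of feeds still gets a worker.
    wanted = math.ceil(stale_feeds / feeds_per_instance)
    # Clamp to the allowed range.
    return max(min_instances, min(max_instances, wanted))
```

With these assumed numbers, a backlog of up to 5,000 stale feeds stays on a single instance, and a second one is only started beyond that.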

Just checked and we have ~25k feeds in the system, though not all of them use deep harvesting, as we call it.

Note that we do a few things beyond just extracting the full content: we also try to pull out images and create a pleasing thumbnail using face detection, etc. That probably slows things down a good deal as well.
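As a rough illustration of how face detection can guide a thumbnail, the sketch below picks a square crop centered on detected faces and falls back to the image center when none are found. It assumes the face boxes come from some separate detector as `(x, y, w, h)` tuples; the function name `thumbnail_crop` and its parameters are hypothetical and do not describe the actual pipeline.

```python
def thumbnail_crop(image_w, image_h, faces, size):
    """Return a (left, top, right, bottom) square crop box.

    faces: list of (x, y, w, h) boxes from any face detector (assumed input).
    size:  desired side length; shrunk if the image is smaller.
    """
    side = min(size, image_w, image_h)
    if faces:
        # Center the crop on the average center of all face boxes.
        cx = sum(x + w / 2 for x, y, w, h in faces) / len(faces)
        cy = sum(y + h / 2 for x, y, w, h in faces) / len(faces)
    else:
        # No faces found: fall back to the image center.
        cx, cy = image_w / 2, image_h / 2
    # Clamp so the crop box stays inside the image.
    left = max(0, min(round(cx - side / 2), image_w - side))
    top = max(0, min(round(cy - side / 2), image_h - side))
    return (left, top, left + side, top + side)
```

The returned box can be passed to an image library's crop call before resizing to the final thumbnail dimensions.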


