The thrust of the post wasn’t meant to be that it is surprising that you can eve...

PaulHoule · on Jan 17, 2015

I had to propagate colors across a graph with a billion or so edges, and it was easy to do in memory in Java on a 32 GB laptop using sub-byte data structures. In fact, it takes more time to serialize and deserialize the data than to do the actual calculation.
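The comment doesn't show code or spell out the propagation rule, so here is a minimal sketch under stated assumptions. It is written in Python for brevity, although the commenter used Java. It assumes at most four colors, so each node's color fits in 2 bits and four nodes share one byte. It also assumes "propagate" means copying a color from a colored endpoint to an uncolored one (color 0) across undirected edges until nothing changes. The names `PackedColors` and `propagate` are hypothetical. The edge list is two flat 32-bit arrays rather than per-node objects. As rough arithmetic, a billion nodes at 2 bits each is about 250 MB, and a billion edges as two int32 arrays is about 8 GB, which fits in 32 GB.

```python
from array import array


class PackedColors:
    """One 2-bit color (0-3) per node, packed four nodes to a byte.

    Color 0 is treated as "uncolored" by propagate() below.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.data = bytearray((num_nodes + 3) // 4)

    def get(self, node):
        shift = (node & 3) * 2  # bit offset of this node inside its byte
        return (self.data[node >> 2] >> shift) & 0b11

    def set(self, node, color):
        if not 0 <= color <= 3:
            raise ValueError("color must fit in 2 bits")
        idx, shift = node >> 2, (node & 3) * 2
        # Clear the node's two bits, then OR in the new color.
        self.data[idx] = (self.data[idx] & ~(0b11 << shift)) | (color << shift)


def propagate(colors, src, dst):
    """Copy colors across undirected edges (src[i], dst[i]) into uncolored
    endpoints until a full pass changes nothing. Returns the number of passes.

    Conflicts are resolved by edge order: the first colored neighbor seen wins.
    """
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for u, v in zip(src, dst):
            cu, cv = colors.get(u), colors.get(v)
            if cu and not cv:
                colors.set(v, cu)
                changed = True
            elif cv and not cu:
                colors.set(u, cv)
                changed = True
    return passes


# Usage: path 0-1-2-3 plus a separate edge 4-5, seeded with two colors.
src = array("i", [0, 1, 2, 4])  # 32-bit node ids, 4 bytes per endpoint
dst = array("i", [1, 2, 3, 5])
colors = PackedColors(6)
colors.set(0, 1)
colors.set(5, 2)
passes = propagate(colors, src, dst)
print([colors.get(n) for n in range(6)])  # [1, 1, 1, 1, 2, 2]
```

The interpreted loop would be far slower than the Java equivalent. The point of the sketch is the memory layout: six nodes' colors occupy two bytes, and nothing per node is boxed.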

I'd say, however, that the trend is toward much bigger data sets, and I frequently break non-scalable tools during data profiling. You can stretch the non-scalable approach by using multiple cores (which is sometimes easy), SIMD instructions, or the GPU. I have sometimes even gone down the rabbit hole of optimizing something non-scalable, only to hit the wall. So I am increasingly using scalable systems.
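One reading of "multiple cores (which is sometimes easy)" is the case where per-edge work is independent and partial results merge cleanly. In that case, the edge list can be cut into chunks that are processed separately and then combined. The sketch below computes node degrees that way. The names `degree_chunk`, `chunked`, and `degrees` are hypothetical, and degree counting is only an illustrative stand-in for whatever pass is being parallelized. The map step is pluggable and defaults to the built-in `map`, so the sketch runs single-core as written.

```python
from collections import Counter


def degree_chunk(edges):
    """Count endpoint occurrences (undirected degree) for one slice of edges."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def degrees(edges, chunk_size=100_000, map_fn=map):
    """Map degree_chunk over independent slices, then merge the partial counts.

    Because no slice depends on another, map_fn can be replaced by a
    parallel map (e.g. ProcessPoolExecutor.map) without changing the result.
    """
    total = Counter()
    for partial in map_fn(degree_chunk, chunked(edges, chunk_size)):
        total.update(partial)  # Counter.update adds counts rather than replacing
    return total
```

To spread the chunks over cores, pass a process pool's `map`, e.g. `with ProcessPoolExecutor() as pool: degrees(edges, map_fn=pool.map)` using `concurrent.futures`. On platforms that start workers with "spawn", put that call under an `if __name__ == "__main__":` guard. `Counter` is used here for readability; at billion-edge scale, a flat per-chunk array of counts would be the memory-conscious choice. The harder cases are passes where one chunk's updates feed another's, as in iterative propagation, which is where this stops being easy.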

scott_s · on Jan 19, 2015

I don't think it's accurate to say it's a different point; rather, it's a weaker version of the same one. And I don't feel that the GraphChi folks have any explaining to do, since I would expect a more general framework to carry some performance penalty compared with an expert's hand-coded solution. What that penalty (hopefully) buys us is an application that is quicker to write and perhaps easier to port elsewhere.

But I'm also not an expert in graph algorithms, so it's difficult for me to evaluate how much domain knowledge you needed to implement yours.

gaalze · on Jan 17, 2015

So you are saying a naive implementation on a single core is showing that people are prematurely scaling graph processing?