Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The thrust of the post wasn’t meant to be that it is surprising that you can even do these things on one core; sorry if it came across that way.

Rather, it’s that when doing the work on one core (with a way simple implementation) you go faster than many of the measurements that popular scalable systems put forward as evidence that they improve on the state of the art. Which sort of invalidates that evidence. Some of the papers end up left with fairly scant evidence, which should be a serious issue for researchers.

GraphChi and X-Stream (http://infoscience.epfl.ch/record/188535) are much more sophisticated implementations, ones you might not be embarrassed to be beaten by, so they don’t make the same point. And they aren’t any faster than the systems above, and so have the same issues (though perhaps less explaining to do).



I had to propagate colors across a graph with a billion or so edges and had an easy time doing it in-memory in Java with a 32GB laptop and sub-byte data structures. In fact, it takes more time to serialize and deserialize the data than it takes to do the actual calculation.

I'd say, however, the trend is towards data sets being much bigger and I break non-scalable tools frequently in the data profiling process; you can extend the non-scalable ways of doing it by using multiple cores (which is sometimes easy) or SIMD instructions or the GPU. I have even sometimes gone down the rabbit hole of optimizing something non-scalable and hitting the wall. So I am using scalable systems increasingly.


I don't think it's accurate to say it's a different point, but it's a weaker version. And I don't feel that the GraphChi folks have any explaining to do, as I would expect that using a more general framework to solve a problem will have some performance penalty over an expert hand-coding a solution. What that performance penalty would buy us (hopefully) is it's quicker to write the application, and perhaps easier to port elsewhere.

But, I'm also not an expert in graph algorithms, so it's difficult for me to evaluate how much domain knowledge you needed to implement yours.


So you are saying a naive implementation on a single core is showing that people are prematurely scaling graph processing?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: