
JOINs are a non-trivial problem with data in the terabyte-to-petabyte range. Apache Pinot had to re-architect itself with a multi-stage query engine in order to handle native query-time JOINs. There was a separate blog released today that dealt with that specifically:

https://startree.ai/blog/query-time-joins-in-apache-pinot-1-...

[Edit: if users just naively throw query-time JOINs at a problem, they might not get results in the time they want: a non-optimized real-time JOIN took upwards of 39 seconds. With predicate pushdowns and partition-aware JOINs, results came back in 3.3 seconds, faster by an order of magnitude. That's still kinda long for a typical page or mobile app refresh, but survivable. And yes, even faster subsecond results were possible with additional compute parallelism, but that has a cloud services cost associated with it.

Pre-ingestion JOINs, such as through Apache Flink, would generally be more performant than query-time JOINs. And that's how most people do them right now anyway. So just because you can do query-time JOINs doesn't mean you should if you haven't thought about it ahead of time. If you can, optimize the data partitioning to ensure best performance. The good news is that users now have a lot more flexibility in how they want to do JOINs.]
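
To make the pushdown and partition-aware points a bit more concrete, here is a rough toy sketch in Python. It is not how Pinot implements anything; the table names (`orders`, `customers`), the columns, and every function below are made up for illustration. The idea it shows: filter one side before the join instead of after it, and split both sides by a hash of the join key so that each partition only has to be joined against its matching partition.

```python
from collections import defaultdict

# Hypothetical toy tables, invented for this sketch.
orders = [{"order_id": i, "customer_id": i % 5, "amount": 10 * i} for i in range(20)]
customers = [{"customer_id": c, "region": "EU" if c % 2 else "US"} for c in range(5)]


def is_large(row):
    """Example predicate that only touches columns from the orders side."""
    return row["amount"] >= 100


def hash_join(left, right, key):
    """Plain in-memory hash join: index the right side, probe with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def partition(rows, key, n):
    """Split rows into n buckets by a hash of the join key."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def naive_join(orders, customers, predicate):
    """Join everything first, then throw away the rows that fail the filter."""
    joined = hash_join(orders, customers, "customer_id")
    return [row for row in joined if predicate(row)]


def pushed_down_partitioned_join(orders, customers, predicate, n=4):
    """Filter before joining, then join matching partitions independently."""
    filtered = [o for o in orders if predicate(o)]  # predicate pushdown
    order_parts = partition(filtered, "customer_id", n)
    customer_parts = partition(customers, "customer_id", n)
    out = []
    # Rows with the same key always land in the same bucket number, so each
    # pair of buckets can be joined on its own (and in parallel in a real
    # engine) without shuffling rows between partitions.
    for o_part, c_part in zip(order_parts, customer_parts):
        out.extend(hash_join(o_part, c_part, "customer_id"))
    return out
```

Both functions return the same rows (possibly in a different order); the second one just does less work per join and never has to compare rows across partitions. In a real system the win depends on the data already being partitioned on the join key at ingestion time, which is the point about optimizing partitioning above.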



> JOINs are a non-trivial problem with data in the terabyte-to-petabyte range.

But that benchmark is 250G. Plus, I think all the established players support joins well; I work closely with BQ and it works very well.

> So just because you can do query-time JOINs doesn't mean you should if you haven't thought about it ahead of time.

Prebuilding a denormalized table can significantly increase your data size, so it is a tradeoff that depends on your data structure and cardinality.
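
A back-of-the-envelope sketch of that tradeoff, in Python. All the numbers and function names here are hypothetical, and it looks at raw uncompressed bytes only; columnar stores usually compress repeated values well, so the real blow-up can be much smaller than this suggests.

```python
def normalized_bytes(n_facts, fact_row_bytes, n_dims, dim_row_bytes):
    """Fact table plus a separate dimension table, joined at query time."""
    return n_facts * fact_row_bytes + n_dims * dim_row_bytes


def denormalized_bytes(n_facts, fact_row_bytes, dim_row_bytes):
    """Every fact row carries its own copy of the dimension columns."""
    return n_facts * (fact_row_bytes + dim_row_bytes)


# Hypothetical sizes: 1M fact rows of 50 bytes, 1,000 dimension rows of 200 bytes.
norm = normalized_bytes(1_000_000, 50, 1_000, 200)
denorm = denormalized_bytes(1_000_000, 50, 200)
blowup = denorm / norm
```

With these made-up inputs the denormalized table is roughly five times the raw size, because a wide dimension row gets repeated across many fact rows. Narrow dimensions, or a fact table not much larger than the dimension table, shrink that ratio a lot.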


Are you comparing BigQuery to Pinot? They are for totally different use cases.


I don't agree with "totally". Many use cases, especially where joins are involved, overlap completely.


Very true.


Exactly.



