
You can't usefully keep a raw stream of logs in an indexed database like ClickHouse.

The volume for any nontrivial organization is too large.



Log storage is a standard use case for ClickHouse and has been for years. Our company (Altinity) currently hosts or supports numerous online services that store and query logs. The standard implementation approach is to store the log message in one column and use the other columns as indexes on interesting properties such as time, service name, transaction ID, host name, etc. You can then build a log viewer that issues slicing & dicing queries to locate interesting messages. ClickHouse is much faster and more cost-efficient than competing solutions like Loki or Elasticsearch.
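A minimal sketch of that approach (table and column names are made up for illustration; codecs and the sort key are typical choices, not a prescription):

```sql
-- Raw message in one column; indexed properties in their own columns.
CREATE TABLE logs
(
    event_time  DateTime CODEC(Delta, ZSTD),
    service     LowCardinality(String),
    host        LowCardinality(String),
    trace_id    String,
    message     String CODEC(ZSTD(3))
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (service, host, event_time);

-- A "slicing & dicing" query: one service's messages from the last hour.
SELECT event_time, host, message
FROM logs
WHERE service = 'checkout'
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY event_time;
```

The sort key is what makes this fast: queries that filter on the leading sort-key columns read only the relevant granules instead of scanning the whole table.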

Log messages often compress very well (> 95%) so storage is not as much of an issue as you might think.
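To see why that compression claim is plausible, here's a quick self-contained sketch using Python's stdlib zlib on synthetic, repetitive log lines (real log data and ClickHouse's ZSTD codec will behave somewhat differently, but the structural redundancy is the same):

```python
import zlib

# Hypothetical log lines: fixed structure, only digits vary.
lines = [
    f"2024-05-01T12:{i % 60:02d}:{(i * 7) % 60:02d} INFO svc=checkout "
    f"request_id={i} completed in {i % 200}ms"
    for i in range(100_000)
]
raw = "\n".join(lines).encode()
packed = zlib.compress(raw, level=9)
ratio = 1 - len(packed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, saved={ratio:.1%}")
```

On data like this the savings are typically well above 90%, because the template text dominates and compresses almost to nothing.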

Disclaimer: I work for Altinity


What's the suggestion to do it efficiently?

And what kind of volume is it that ClickHouse can't handle when Uber can?

https://eng.uber.com/logging/


ClickHouse's data and control planes are well defined, so many folks end up using S3 (or something like it) as a backing store. From what we've heard, this is what ClickHouse Cloud does behind the scenes: https://clickhouse.com/cloud
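For reference, an S3-backed disk is configured in ClickHouse's server config roughly like this (a sketch; the bucket endpoint and credentials are placeholders, and tables opt in with `SETTINGS storage_policy = 's3_main'`):

```xml
<clickhouse>
  <storage_configuration>
    <disks>
      <s3_disk>
        <type>s3</type>
        <!-- placeholder bucket/prefix -->
        <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
        <access_key_id>...</access_key_id>
        <secret_access_key>...</secret_access_key>
      </s3_disk>
    </disks>
    <policies>
      <s3_main>
        <volumes>
          <main>
            <disk>s3_disk</disk>
          </main>
        </volumes>
      </s3_main>
    </policies>
  </storage_configuration>
</clickhouse>
```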



