Hi. I'm the presenter. Thanks for the interest. Opinions here are my own.
I'll put in a TLDR as the presentation is quite long. The other thing I'd like to say is that QCon London impressed me: the organisers spent real time ensuring the quality of the presentations, and the other talks I saw were great. Many conferences I've been to recently are just happy to get someone, or play it safe with known quantities. I first attended QCon London early in my career, so it was interesting coming back after more than a decade to present.
TLDR:
Why did we build our own database? In effort terms, successful quantitative trading is more about good ideas well executed than it is about production trading technology (apart from perhaps HFT). We needed something that helped the quants be as productive as possible with data.
We needed something that was:
- Easy to use (I mean really easy, for beginner-to-moderate programmers). We talk about day-1 productivity for new starters. Python is a tool for quants, not a career.
- Cost effective to run (no large DB infra, easy to maintain, cheap storage, low licensing)
- Performant (traditional SQL DBs don't compare here; we're in the Parquet, ClickHouse, kdb+, etc. space)
- Scalable (large data-science jobs 10K+ cores, on-demand)
This sort of general architecture (store parquet-like files somewhere like s3 and build a metadata database on top) seems reasonably common and gives obvious advantages for storing lots of data, scaling horizontally, and scaling storage and compute separately. I wonder where you feel your advantages are compared to similar systems? Eg is it certain API choices/affordances like the ‘time travel’ feature, or having in-house expertise or some combination of features that don’t usually come together?
A slightly more technical question is what your time series indexes are? Is it about optimising storage, or doing fast random-access lookups, or more for better as-of joins?
We do have a specialist time-series index, optimised for things like tick data. It compresses fairly well, but we generally optimise for read time: not scattered random access, but slicing out date ranges. There are two layers of index: a high-level index over the data objects, and an index inside each object in S3.
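To illustrate the idea (this is a toy sketch with made-up names, not ArcticDB internals): a coarse first layer maps each storage object to the time range it covers, so a date-range query only needs to open a few objects, and a second layer inside each object narrows down to the matching rows.

```python
import bisect

# Hypothetical two-layer index sketch (not the ArcticDB implementation).
# Layer 1: coarse index of (start_ts, end_ts) per storage object.
# Layer 2: sorted row timestamps inside each object.
objects = [
    (0, 9, [0, 3, 5, 9]),
    (10, 19, [10, 12, 18]),
    (20, 29, [20, 25, 29]),
]
starts = [o[0] for o in objects]

def read_range(lo, hi):
    """Return all row timestamps in [lo, hi], touching as few objects as possible."""
    # Layer 1: binary-search the coarse index for the first candidate object.
    first = max(0, bisect.bisect_right(starts, lo) - 1)
    out = []
    for start, end, rows in objects[first:]:
        if start > hi:
            break      # objects are sorted by start; nothing later can match
        if end < lo:
            continue   # object lies entirely before the requested range
        # Layer 2: binary-search rows inside the object.
        i = bisect.bisect_left(rows, lo)
        j = bisect.bisect_right(rows, hi)
        out.extend(rows[i:j])
    return out

print(read_range(5, 20))  # -> [5, 9, 10, 12, 18, 20]
```

The point of the two layers is that the coarse index can live in the metadata database while the fine index ships with each object, so a date-range read never scans objects outside the requested window.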
A built-in as-of join is something we want to build.
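For context, an as-of join matches each row of one series with the most recent row at or before it in another (e.g. each trade with the prevailing quote). A minimal pure-Python sketch over sorted timestamps, not any ArcticDB API:

```python
import bisect

def asof_join(left_ts, right_ts, right_vals):
    """For each left timestamp, take the latest right value at or before it.

    Both timestamp lists are assumed sorted ascending.
    """
    out = []
    for t in left_ts:
        i = bisect.bisect_right(right_ts, t) - 1
        out.append(right_vals[i] if i >= 0 else None)
    return out

# Example: join trade times against the latest quote at or before each trade.
quotes_ts = [1, 4, 7]
quotes_px = [100.0, 101.5, 99.8]
trades_ts = [0, 4, 5, 9]
print(asof_join(trades_ts, quotes_ts, quotes_px))  # -> [None, 101.5, 101.5, 99.8]
```

This is the same operation pandas exposes as `merge_asof`; a database-native version can push the binary search down into the time-series index instead of materialising both sides.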
I feel like ‘exactly’ is doing a lot of work in your comment and I am interested in the reasons that that word may not be quite the right word to describe these situations.
A much shorter 3 min intro from PyQuantNews: https://www.youtube.com/watch?v=5_AjD7aVEEM
GitHub repo (Source-available/BSL): https://github.com/man-group/ArcticDB