
Those are some very big claims with respect to performance. Has anyone outside of the author been able to reproduce them, considering you need to pay $100k/month just to do it?

I also wonder if the commercial version has anti-benchmark clauses like some database vendors. I've always seen claims that K is much faster than anything else out there, but I've never seen an actual independent benchmark with numbers.

Edit: according to https://mlochbaum.github.io/BQN/implementation/kclaims.html, commercial licenses do indeed come with anti-benchmark clauses, which makes it very hard to take the one in this post at face value.



I used to use K professionally inside a hedge fund a few years back. Aside from the terrible user experience (if your code isn’t correct you will often just get ‘error’ or ‘not implemented’ with no further detail), if the performance really was as stellar as claimed, then there wouldn’t need to be a no benchmark clause in the license. It can be fast, if your data is in the right formats, but not crazy fast. And easy to beat if you can run your code on the GPU.


The last point is spot on... Pandas on GPUs (cudf) gets you both the perf and the usability, without having to deal with issues common to stack/array languages (k) and lazy languages (dask, polars). My flow is pandas -> cudf -> dask-cudf, spark, etc.

More recently, we have been working on GFQL (a graph dataframe query language) with users at places like banks, where we translate down to tools like pandas & cudf. A big "aha" is that columnar operations are great -- not far from what array languages focus on -- and that a static/dynamic query planner helps with those optimizations once you hit memory limits. Eg, dask has dynamic DFS reuse of partitions as part of its work stealing. More SQL-y tools like Spark may make plans like that ahead of time. In contrast, that work lands more on the user if they stick with pandas or k, eg, manual tiling.
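As a toy illustration of the columnar style being described here (plain pandas, not GFQL itself; the column names and data are made up):

```python
import pandas as pd

# Toy transaction table. Columnar engines keep each column contiguous,
# so whole-column operations like the groupby-sum below vectorize well.
df = pd.DataFrame({
    "account": ["a", "b", "a", "c", "b"],
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One vectorized pass over the "amount" column, grouped by "account".
totals = df.groupby("account")["amount"].sum()
print(totals["a"])  # 40.0
```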


I've been using kdb/q since 2010. Started at a big bank and have used it ever since.

Kdb/q is like minimalist footwear. But you can run longer and faster with it on. There's a tipping point where you just "get it". It's a fantastic language and platform.

The problem is very few people will pay $100k/month for shakti. I'm not saying nobody will pay or that it can't be a good business. But if you want widespread adoption you need to create an ecosystem. Open sourcing it is a start. Creating libraries and packages comes after. The MongoDB model is the right approach IMO.


Can you elaborate on what Mongo did right? My understanding is that AWS is stealing their business by creating a compatible API.


MongoDB's biggest feat is marketing. Recovering from people comparing your database with a black hole is... something.


Would you recommend K?

Is something else better (if so what)?


I think the main reason to use any of these array languages (for work) is job security. Since it's so hard to find expert programmers, if you can get your employer to sign off on an array language for all the mission-critical stuff, then you can lock in your job for life! How can they possibly replace you if they can't find anyone else who understands the code?

Otherwise, I don't see anything you can do in an array language that you couldn't do in any other language, albeit less verbosely. But I believe in this case a certain amount of verbosity is a feature if you want people to be able to read and understand the code. Array languages and their symbol salad programs are like the modern day equivalent of medieval alchemists writing all their lab notes in a bespoke substitution cipher. Not unbreakable (like modern cryptography) but a significant enough barrier to dissuade all but the most determined investigators.

As an aside, I think the main reason these languages took off among quants is that investing as an industry tends toward the exultation of extremely talented geniuses. Perhaps unintelligible "secret sauce" code has an added benefit of making industrial espionage more challenging (and of course if a rival firm steals all your code they can arbitrage all of your trades into oblivion).


I'm sorry, but you really sound like you judge APLs from an outsider's POV. For sure, it's not job security that's keeping APLs afloat, because APLs are very easy to learn. A pro programmer would never use K or any APL. But pro mathematicians or scientists needing some array programming for their job will.


I have been a programmer, scientist, etc, at various times in my life and I have programmed in J. I don't think there is any compelling reason to use J over Matlab, R, or Python and very many reasons not to. Vector languages are mind expanding, for sure, but they have done a very poor job keeping up with the user experience and network effects of newer languages.

A few years ago I wrote a pipeline in J and then re-implemented it in R. The J code was exactingly crafted and the R code was naive, but the R code was still faster, easier to read and maintain and, frankly, easier to write. J gives a certain perverse frisson, but beyond that I don't really see the use case.


I disagree. For the stuff I'm doing, I've been rewriting the same 1-screen program in different ways many times. J made that easy, R would make that much more tedious since I'd have >250 lines instead of 30. Of course if I'm doing something for which R has libs, I'll do it in R. Additionally, I'll very probably end up rewriting my final program in another language for speed and libs, because by then I'll know precisely what I want to write. IMO, the strength of array languages lies in easy experimentation.


I think there is an element of truth to this, but the context where this ends up being an advantage is personal and idiosyncratic.


Are you a quant, or doing very specialized numerical things that aren't in libs on 2D datasets? Then 10X yes. If not, no. Everything else will be better.


I'm not a quant but having used "data science" languages, ocaml, J, R, etc, I strongly doubt that an array language offers any substantial advantages at this date. I could be wrong, of course, but it seems unlikely.


J is only a data science language in the sense that its core is suited to dataset manipulation, but it's severely lacking in modelling libraries. If you're doing exotic stuff and you know exactly what you're doing, I can see the rationale, but otherwise it's R/Python any day. It does make plenty of sense for custom mathematical models, such as simulations, though.


J and OPFS


The no-benchmark clause sounds like WebGPU.


Quick for building a website? Probably not.

Quick for evaluating some idea you just had if you are a quant? Yes absolutely!

So imagine you have a massive dataset, and an idea.

For testing out your idea you want that data to be in an “online analytical processing” (OLAP) kind of database. These typically store the data by column not row and other tricks to speed up crunching through reads, trading off write performance etc.
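A rough sketch of the row-vs-column layout difference (a numpy toy, not how any particular OLAP engine is implemented):

```python
import numpy as np

n = 1_000_000

# Row-oriented: each record interleaves its fields, so scanning one
# field strides through memory.
rows = np.zeros(n, dtype=[("price", "f8"), ("qty", "i8"), ("id", "i8")])
rows["price"] = 1.5

# Column-oriented: one field is a single contiguous array.
price_col = np.full(n, 1.5)

# Same aggregate either way, but the columnar scan reads contiguous
# memory, which is the access pattern OLAP stores optimize for.
assert rows["price"].sum() == price_col.sum()
```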

There are several big tech choices you could make. Mainstream king is SQL.

Something that was trendy a few years ago in the nosql revolution was to write some scala at the repl.

It is these that K is competing with, and claiming to be faster than.


I would probably use Matlab for that sort of stuff tbh. Is K faster than Matlab?


For using a heap of libs to compute something that has been done a thousand times? No. For writing a completely new data processing pipeline from scratch? Much, much faster! Array langs have decent performance, but that's not why they are used. They are used because you develop faster for the kind of problem they're good at. People always conflate those two aspects. I use J for scientific research.


Seems weird to switch to develop faster and complain about people conflating the two aspects when this thread is clearly talking about runtime performance, triggered by the benchmark claims:

> real-sql(k) is consistently 100 times faster (or more) than redshift, bigquery, snowflake, spark, mongodb, postgres, ..

> same data. same queries. same hardware. anyone can run the scripts.


GP here :)

When talking about speed I was rolling together the time taken to write the query and get it running with the run time itself. The total speed depends on both.

In another thread someone pointed out that being a good quant is also about having the best ideas in the first place.

This is just a general comment on the whole comments section in general that I leave here:

We have this weird situation where the general programming population looks at a small group of the most profitable programmers, who are thinking about their domain problems in languages that are a mapping of mathematical symbols to a qwerty keyboard (in the old days there were APL keyboards). And the mainstream programmers say that is so weird that it must be wrong and must be a lie and so on.

Occam’s razor says that those profitable programmers wouldn’t be buying K if something else gave the same results at a lower TCO, or better results at the same price.

In broader data engineering there has been tech like “I use scala!” that are used to gatekeep and recognise the in crowd. But that is in the faceless corporate end of enterprise data engineering where people are not measured in bottom lines.

Sorry for venting :)


> most profitable programmers

This more than anything demonstrates the hothouse-flower mentality of K stans. Quants have long since stopped being the best-paid or most value-generating engineers, and since K has zero application outside of quant, it's no longer even a particularly lucrative skill to acquire.

It's interesting though that the opacity of the "I make more money than you" argument fits so snugly with other unverifiable and outdated claims of K supremacy, like performance, job security, or expressiveness.


Besides, it has been my experience that the more the programming part of a given quant's job contributes to their profitability, the less they enjoy using K. K is a neat language for research, but I don't know of many who still like it as a language to write code you intend to reuse or maintain.

That said, I would personally rather do research in python, especially now that the performance situation is reversed.


> Seems weird to switch to develop faster and complain about people conflating the two aspects when this thread is clearly talking about runtime performance, triggered by the benchmark claims

It doesn't look to me like GP switched to development speed and then complained of conflation. The switch happened higher up the thread by wood_spirit, and GP just continued the conversation (and called out the tendency to conflate, without calling out a specific person).

On a meta note, I wish this trend of saying "it seems weird" and then calling out some fallacy or error would die. Fallacies are extremely common and not "weird", and it comes off as extremely condescending.

It happens quite frequently on HN (and surely other places, though I don't regularly patronize those). So to be clear, this isn't criticism levelled at you exclusively. (I even include myself as a target for this criticism, as I've used the expression previously on HN as well, before I thought more about it.)

Firstly, in this case and in most cases where that expression is used, it's actually weird to call it weird[1]. Fallacies, logic errors, and other mistakes are extremely natural to humans. Even with significant training and effort, we still make those mistakes routinely.

Secondly, it seems like it's often used as a veiled ad hominem or insult. It's entirely superfluous to add. In this case you could have just said "you complained about people conflating the two aspects and then conflated them yourself." (It still wouldn't have been correct as GP didn't conflate them, but it would have been more direct and clear).

Thirdly, it comes off as condescending[2]. It's sort of like saying, "whoa dude, we're all normal and don't make mistakes, but you're weird and do make mistakes." In reality, we all do it so it's not weird at all.

[1]: https://www.merriam-webster.com/dictionary/weird

  1: of strange or extraordinary character : ODD, FANTASTIC
  2: of, relating to, or caused by witchcraft or the supernatural : MAGICAL
[2]: The irony of this is not lost on me. I can definitely see how this comment might also come off as condescending. I don't intend it to be, but it is a ridiculously long comment for such a simple point. It also included a dictionary check, which is also frequently a characteristic of condescending comments. I don't intend it to be condescending, merely reflective and self-analytical, but as is my theme here, we all make mistakes :-)


You can understand something fully and still call it weird. I’ve used perl for decades, some of the things it does are still just weird. As for fallacies, one of the first things I was taught in logic class was that using an argument’s formal structure (or lack thereof) to determine truth is itself a fallacy. Unsound != Untrue, and throwing around Latin names of fallacies doesn’t actually support an argument.


I'm not switching anything. Just trying to add to the conversation. BTW, for simpler queries I have no doubt the benchmarks are correct. I anticipate it would not hold for more beefy queries.


You came by it honestly! The initial conflation (and therefore context switch) happened further up-thread.


Yeah, to be clear I meant performance-wise. In terms of development speed it looks like it just goes to an insane extreme on the "fast at the beginning / fast in the middle" trade-off. You know how dynamic typing can be faster to develop with when you're writing a one-man 100-line script, but once you get beyond that, the extra effort of adding static types pays off and static typing overtakes it.


Performance-wise I'll admit I don't really know. But I do know that performance is at least decent, in that it lets you experiment easily with pretty much any array problem. Your dynamic/static typing analogy is pretty much on point.


I think some things will be faster and the real difference is more in things that are easy to express in the language. Eg solving differential equations may be easier in matlab and as-of joins in k.
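For anyone unfamiliar with as-of joins (kdb+'s `aj`): for each row on the left, you take the most recent matching row on the right at or before its timestamp. A rough pandas equivalent, with made-up data:

```python
import pandas as pd

trades = pd.DataFrame({"time": [1, 5, 10], "sym": ["A", "A", "A"],
                       "px": [100, 101, 102]})
quotes = pd.DataFrame({"time": [0, 4, 9], "sym": ["A", "A", "A"],
                       "bid": [99, 100, 101]})

# For each trade, pick the latest quote at or before its timestamp.
# Both frames must already be sorted on the join key.
joined = pd.merge_asof(trades, quotes, on="time", by="sym")
print(joined["bid"].tolist())  # [99, 100, 101]
```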


Array languages are very fast for code that fits sensibly into arrays and databases spend a lot of compute time getting correctness right. 100x faster than postgres on arbitrary data sounds unlikely-to-impossible but on specific problems might be doable.


Yes, when I took a look at shakti's database benchmarks before, they seemed entirely reasonable with typical array language implementation methods. I even compared shakti's sum-by benchmarks to BQN group followed by sum-each, which I'd expect to be much slower than a dedicated implementation, and it was around the same order of magnitude (like 3x slower or something) when accounting for multiple cores. I was surprised that something like Polars would do it so slowly, but that's what the h2o benchmark said... I guess they just don't have a dedicated sum-by implementation for each type. I think shakti may have less of an advantage with more complicated queries, and doing the easy stuff blazing fast is nice but they're probably only a small fraction of the typical database workload anyway.
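For concreteness, the kind of dedicated sum-by being discussed amounts to a single pass that drops each value into its group's bucket; a minimal numpy sketch (illustrative only, not shakti's or Polars' implementation):

```python
import numpy as np

keys = np.array([0, 1, 0, 2, 1])               # group ids per row
vals = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# One pass: add each value into the bucket for its group id.
sums = np.bincount(keys, weights=vals)
print(sums.tolist())  # [40.0, 70.0, 40.0]
```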


The comparison posted may be true, but on some level it doesn't make sense. It's like this old awk-vs-hadoop post https://adamdrake.com/command-line-tools-can-be-235x-faster-...

Yes, a very specific implementation will be faster than a generic system which includes network delays and ensures you can handle things larger than your memory in a multi-client system. But the result is meaningless - perl or awk will also be faster here.

If you need a database system, you're not going to replace it with K, if you need super fast in-memory calculations, you're not going to use a database system. Apples and oranges.


They at least used to have free evaluation licenses that were good for a month. Our license was even unlocked for unlimited cores.

I doubt they'd give them out to a random individual or small startup, but maybe still possible for a serious potential customer.



