Lots of great additions. I will just highlight two:
Column selection:
When you have tons of columns, these become useful. ClickHouse takes it to the next level and supports APPLY and COLUMNS in addition to the EXCEPT and REPLACE that DuckDB supports:
- APPLY: apply a function to a set of columns
- COLUMNS: select columns by matching a regular expression (!)
Details here: https://clickhouse.com/docs/en/sql-reference/statements/sele...
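For illustration, a sketch of what those modifiers look like (I'm going from memory of the ClickHouse syntax; the table and column names here are made up):

```sql
-- Sum every column whose name matches a regex:
SELECT COLUMNS('^metric_') APPLY(sum)
FROM daily_stats;

-- The modifiers compose: drop a column, rewrite another,
-- then apply a function to whatever is left.
SELECT * EXCEPT (internal_id) REPLACE (toDate(ts) AS ts) APPLY(max)
FROM daily_stats;
```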
I can't count how many times I've run into a problem with a trailing comma. There's a whole convention that developed to work around this: the prefix-comma convention, where you'd write:
SELECT
    first_column
  , second_column
  , third_column
which lets you easily comment out a line without worrying about trailing-comma errors. That's no longer necessary in DuckDB. Support for trailing commas should be added to the SQL spec.
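Concretely, DuckDB now just accepts the comma (table name made up):

```sql
SELECT
    first_column,
    second_column,
    third_column,  -- trailing comma: a syntax error in standard SQL, fine in DuckDB
FROM my_table;
```

So you can comment out or reorder any line, including the last one, without touching its neighbours.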
Allowing references to columns defined earlier in the same query would make DuckDB competitive for data analytics. Without that, one has to chain WITH statements for even the tiniest operations.
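A small sketch of what I mean (names made up): today even one trivial derived value forces a CTE, where reusing the alias directly would do.

```sql
-- Today: a WITH chain just to reuse one intermediate expression.
WITH step AS (
    SELECT price * quantity AS revenue FROM sales
)
SELECT revenue, revenue * 0.2 AS tax FROM step;

-- With the requested lateral alias reuse, the CTE disappears:
SELECT price * quantity AS revenue,
       revenue * 0.2    AS tax
FROM sales;
```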
(Nothing to do with DuckDB, but...) SQL is complex enough, and allowing this (and acyclically, as mentioned below) would do my $%^& nut to implement.
But I know a user requirement when I hear one, so can you give me a large, real example of where allowing this would make things easier? That would be mega helpful, ta.
No, it is not. I mean, it is, but not in the parts where that could be seen as useful and/or convenient. Acyclic graph traversal and the like is one of the basic tests in a modern interview at any CRUD studio. How come it could do $%^& to any part of yours?
Because just implementing the standard stuff nearly did my $^&% nut. Also, I know about graphs and posets, and it's potentially a little more complex than it seems. The simple case
select x * x as y, 1 as x
is meh, but what about
select
  (select tbl.z from tbl where tbl.y = y) as subq,
  x * yy as y,
  xx + 1 as x,
  subq + yy as zzz
from (
  select xx, yy
  from ... )
Note you can already reference select-list items in GROUP BY, HAVING, and ORDER BY, so it's not that big of an extension.
I've implemented the ability to reference select-list aliases before; it's not that hard to do if implemented basically like a macro expansion. The main problem is user confusion due to ambiguous references, e.g.
select 2 as x, x as `which x?`
from (select 1 as x) t;
We ended up adding a warning for the case where a select-list alias shadowed a table column, suggesting a fully-qualified table name if they actually wanted the table column (t.x in the above example).
IMO only allowing references to previous select list items is a perfectly reasonable restriction; loosening it isn't worth the implementation headache or user confusion. Though we did allow using aliases in the WHERE clause.
> Note you can already reference select list items in GROUP, HAVING, and ORDER BY so it's not that big of an extension.
There you're just looking up symbols in the symbol table; I think it's a big difference!
> IMO only allowing references to previous select list items is a perfectly reasonable...
agreed, see my other post where I say the same.
> Though we did allow using aliases in the WHERE clause
And the SQL standards people didn't go for this, and I'm sure they were very far from stupid. Yet nobody's asking why they didn't allow it, which really bothers me.
Oh, was your objection specifically to allowing references to following (not just preceding) select list items? Then we're in violent agreement. That would be complicated to implement and confuse users. Definitely not worth it.
> suggesting using a fully-qualified table name if they actually wanted the table column (t.x in the above example).
I just realised why this was bothering me. That means 'x' and 't.x' are actually different variables. In standard SQL it's always the case (right?) that an unqualified column reference ('x') is just a convenient shorthand for the fully qualified one ('t.x', or more fully I suppose, '<db>.<schema>.t.x'), and you just broke that.
That's no different from the first snippet, if you aren't parsing it with regexps, of course. The resulting AST identifiers would simply refer not only to column names but also to other defined expressions. This is the case for both snippets. It's either cyclic or not, and when it's not, it is easy to substitute/CSE/etc. as usual. The complexity of these expressions is irrelevant.
@wruza, @wenc: These are both very good answers, and you are of course both right. Check the symbol table; anything you can't find should be defined in the same context (in the select list, as a new expression).
In which case, match each symbol use (e.g. x in x * x as y) to its definition (e.g. 1 as x) to establish a set of dependencies, then do a topological sort, then spit out the results.
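Applied to the simple snippet above, that pipeline would produce something like this (my sketch of the expansion step, not any actual implementation):

```sql
-- Original, with x used before its definition:
select x * x as y, 1 as x;

-- After matching uses to definitions (y depends on x) and sorting,
-- substitution yields an equivalent query with no forward references:
select 1 as x, (1) * (1) as y;
```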
I can do that, I just don't fancy it; and more to the point, nobody is giving me an example of where it would be particularly helpful. So if anyone can, I'm interested.
(Also, consider human factors: although an acyclic definition could be extracted from an unordered expression set, a consistent left-to-right order (in the Western world anyway, matching textual layout), with dependencies introduced on the right and depending only on what came before on the left, might actually be better for us meatsacks.)
My examples are from boring enterprise, not from what we love to create at home. I've read and patched literally metres-long queries in analytics which could have been reduced dramatically by being self-referential and by the other approaches discussed in this thread. Of course these could be refactored into something like "create view/temp/cte", but that requires full control of the DDL, special access rights, and code ownership. Most of the space was taken up by similar case-when-then constructs and the permutations of values they produced. The original code was under official support, so we couldn't just rewrite it, because migrating to the next update would then cost a week instead of an hour.
I could reach out and post a lengthy example, but it's nothing but boring reshuffles really, spiced with 3-level joins of "modelling a DB in the DB to allow user columns".
I agree on the LTR idea, because reading a symbol that hasn't yet been defined may lead to confusion.
It's not trivial, but as someone who has implemented something similar (for an equation-based modeling language), it's not super complicated if you use the right abstractions. It's basically traversing the AST and doing substitutions.
The thing that makes SQL simple for me is that I can think in set operations, devoid of proceduralness. The more sequential we make things, the more it feels like programming rather than a formula.
I've seen quite a few production queries that use indexes in GROUP BY and ORDER BY; it's quite common. Probably partly because linters, code review, etc. are lightweight to nonexistent amongst the analyst/data-science types I tend to work with.
Many dialects already support using aliases in GROUP BY and HAVING too, btw.
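For instance, both of these already work in e.g. Postgres (table and column names made up):

```sql
-- Positional index in GROUP BY / ORDER BY:
SELECT page, count(*)
FROM weblog
GROUP BY 1
ORDER BY 2 DESC;

-- Output-column alias in GROUP BY:
SELECT date_trunc('month', created_at) AS month, count(*)
FROM weblog
GROUP BY month;
```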
IMO it's most useful (though somewhat more difficult to implement) to be able to use the aliases in window functions or large CASE WHEN expressions, something like:
SELECT
  page,
  SUM(clicks) AS total_clicks,
  100. * total_clicks / (SUM(total_clicks) OVER ()) AS click_pct,
  100. * SUM(total_clicks) OVER (ORDER BY total_clicks DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       / (SUM(total_clicks) OVER ()) AS cumulative_click_pct
FROM weblog
GROUP BY page;
JSON is the other place this annoys me, but luckily I rarely hand-write any JSON anymore (and there are semi-solutions for this, like JSON5).
In code I always add trailing commas to anything comma-separated. It makes editing simpler (you can shuffle lines without thinking about commas), and in a diff or blame, adding an item doesn't show up as a change to the previous line.
SQL is the one spot where this doesn't work, and it's a constant foot-gun; I often don't remember until I run the query and get a syntax error.
JSONC allows comments and trailing commas, but adoption seems to be low.
VSCode uses it for configuration, but when I wanted to use it from Python (to add context to source-controlled Elasticsearch schemas), there were only a couple of old, barely-maintained libraries for parsing it.
You can do the same thing with your WHERE clause and ANDs by always starting it with WHERE 1=1.
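That is, the analogous trick for predicates: every real condition becomes an AND line that can be added, removed, or commented out independently (names made up):

```sql
SELECT *
FROM orders
WHERE 1=1
  AND status = 'open'
  -- AND region = 'EU'   -- easy to toggle without touching the other lines
  AND created_at >= DATE '2024-01-01';
```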
>> Allowing for trailing commas should get included in the SQL spec.
So there is no "SQL spec" per se; there's an ANSI specification with decades of convention and vendor-specific customizations piled on top. This support for trailing commas is the best you're going to get.
Thank you for the feedback! I will check those Clickhouse features out. I totally agree on the trailing commas, and I use commas first syntax for that same reason! But maybe not anymore... :-)
> Allowing for trailing commas should get included in the SQL spec
Not just SQL: trailing commas are stupidly useful and convenient, so as far as I'm concerned, every language should have them. To be fair, a decent number of languages have implemented them (I was pleasantly surprised by GCC's C), but there are still notable holdouts (JSON!).
Are leading commas allowed? Because otherwise you've just traded the inability to comment out the last element for the inability to comment out the first. I never understood this convention.
I agree that it's ugly and don't use it myself, but I find that I modify the last item in a list far more frequently than the first. Probably because the grouping columns tend to go first by convention, and these change less.