Because just implementing the standard stuff nearly did my $^&% nut. Also I know about graphs & posets, and it's potentially a little more complex than it seems. The variables
select x * x as y, 1 as x
is meh, but what about
select
(select tbl.z from tbl where tbl.y = y) as subq,
x * yy as y,
xx + 1 as x,
subq + yy as zzz
from (
select xx, yy
from ... )
Note you can already reference select list items in GROUP, HAVING, and ORDER BY so it's not that big of an extension.
I've implemented the ability to reference select-list aliases before; it's not that hard to do if implemented basically like a macro expansion. The main problem is user confusion due to ambiguous references, e.g.
select 2 as x, x as `which x?`
from (select 1 as x) t;
we ended up adding a warning for the case where a select list alias shadowed a table column, suggesting using a fully-qualified table name if they actually wanted the table column (t.x in the above example).
IMO only allowing references to previous select list items is a perfectly reasonable restriction; loosening it isn't worth the implementation headache or user confusion. Though we did allow using aliases in the WHERE clause.
> Note you can already reference select list items in GROUP, HAVING, and ORDER BY so it's not that big of an extension.
You're just looking for symbols in the symbol table, I think it's a big difference!
> IMO only allowing references to previous select list items is a perfectly reasonable...
agreed, see my other post where I say the same.
> Though we did allow using aliases in the WHERE clause
And the SQL standards people didn't go for this, and I'm sure they were very far from stupid. And nobody's asking why they didn't allow this, which really bothers me.
Oh, was your objection specifically to allowing references to following (not just preceding) select list items? Then we're in violent agreement. That would be complicated to implement and confuse users. Definitely not worth it.
> suggesting using a fully-qualified table name if they actually wanted the table column (t.x in the above example).
I just realised why this was bothering me. That means 't' and 't.x' are actually different variables. In standard SQL it's always the case (right?) that an unqualified variable ('t') is just an convenient shorthand for the fully qualified variable ('t.x', or more fully I suppose, '<db>.<schema>.t.x), and you just broke that.
That’s no different than the first snippet, if you aren’t parsing it with regexps, of course. The resulting AST identifiers would simply refer to not only column names, but also to other defined expressions. This is the case for both snippets. It’s either cyclic or not, and when not, it is easy to substitute/cse/etc as usual. The complexity of these expressions is irrelevant.
@wruza, @wenc: These are both very good answers, and you are of course both right. Check the symbol table, anything you can't find should be defined in the same context (in the select list, as a new expr).
In which case, match each symbol use (eg. x in x * x as y) to the definition (eg. 1 as x) to establish a set of dependencies, then do a partial order sort, then spit out the results.
I can do that I just don't fancy it, and more to the point nobody is giving me an example of where it would be particularly helpful. So if anyone can, I'm interested.
(also, consider human factors; although an acyclic definition could be extracted from an unordered expression set, a consistent left to right (in the western world anyway, matching textual layout) with dependencies being introduced on the right and depending only on what came before on the left might actually be better for us meatsacks)
My examples are from boring enterprise, not from what we love to create at home. I’ve read and patched literally meters long queries in analytics, which could be reduced dramatically by being self-referential and by other approaches discussed itt. Of course these could be refactored into something “create view/temp/cte”, but that requires a full control of ddl, special access rights and code ownership. Most space was used by similar case-when-then constructs and permutations of values these produced. The original code was on official support, so we couldn’t just rewrite it, because migrating to the next update would cost a week instead of an hour.
I could reach to and post a lenghty example, but it’s nothing but boring reshuffles really, spiced with 3-level joins of “modelling db in db to allow user columns”.
I agree on the LTR idea, because reading a symbol not yet defined may lead to confusion.
It’s not trivial but as someone who has implemented something similar (for an equation based modeling language) it’s not super complicated if you use the right abstractions. It’s basically traversing the AST and doing substitutions.
Because just implementing the standard stuff nearly did my $^&% nut. Also I know about graphs & posets, and it's potentially a little more complex than it seems. The variables
is meh, but what about I just don't fancy supporting that.