[postgresql] Fix for #4332 -- fix ambiguity with '=' (rewrite Floyd operator precedence grammar) #4334

Closed
wants to merge 1 commit into from

@kaby76 kaby76 commented Nov 19, 2024

This is a fix for #4332.

The issue reported an ambiguity when parsing select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';. The underlying problem, though, was the haphazard implementation of the operator precedence grammar for a_expr.

I rewrote the rules associated with a_expr to use the Antlr style for expressions, which fixes the reported ambiguity. In addition, the parse trees for expressions are much more concise. Some of the alts for a_expr were removed because they overlapped with those in c_expr. I still need to unfold c_expr into a_expr, reorder the alts to match the correct precedence, and remove the ambiguity that the rewrite of a_expr itself introduced. This matters because Antlr does not implement precedence the way Bison does: Antlr takes precedence from the order of the alts, whereas Bison uses separate precedence declarations.

Not only are the trees much smaller, but the parser is also a little faster, taking ~7s for the examples/ test suite vs. ~8s before this PR.

To do: Add expression tests with parse trees.

kaby76 commented Nov 21, 2024

The rewrite added a bunch of ambiguity. I will need to clean it up before submitting.

@kaby76 kaby76 closed this Nov 21, 2024

kaby76 commented Dec 7, 2024

The problem is that Antlr cannot handle operator precedence parsing if one of the symbols used in the RHS of a recursive rule does not follow the standard pattern that Antlr recognizes:

The recursion must be direct, and the rule's own symbol may occur only as the first and/or last symbol of an alt.
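
As a rough illustration of that pattern, here is a small Python sketch that labels an alt by where the rule's own name appears in it. `classify_alt` is a hypothetical helper written for this comment, not part of Antlr, and it only looks at the first and last symbols, so it is a simplification of what the tool actually does.

```python
def classify_alt(rule, symbols):
    """Label one alt of `rule` by where the rule's own name occurs.

    `symbols` is the alt's RHS as a list of strings. This is a simplified,
    hypothetical model of the operator patterns, not Antlr's real code.
    """
    multi = len(symbols) > 1
    starts = multi and symbols[0] == rule   # recursion on the left edge
    ends = multi and symbols[-1] == rule    # recursion on the right edge
    if starts and ends:
        return "binary or ternary operator"
    if starts:
        return "suffix operator"
    if ends:
        return "prefix operator"
    return "primary"


print(classify_alt("f", ["f", "'::'", "f"]))  # binary or ternary operator
print(classify_alt("e", ["e", "'::'", "t"]))  # suffix operator
print(classify_alt("e", ["ID"]))              # primary
```

Under this model, an alt such as e '::' t is not seen as a binary operator at all, because its last symbol is t rather than e.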

Consider this grammar.

grammar xxx;

start: e (';' e)* EOF;

e : e '::' t
 | e '||' e
 | ID
 ;
t : e;
f : f '::' f
 | f '||' f
 | ID
 ;
z : ID;
ID : [a-z]+;
WS : [ \t\r\n] -> skip;

This grammar is ambiguous because the first alt of e ends with the symbol t instead of a direct recursive reference to e. This results in the following ATN for e.

[Image: ATN for rule e]

Now, compare that with the ATN for f (with a proper recursion).

[Image: ATN for rule f]
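
To make the difference between e and f concrete, here is a rough Python model of the precedence loop that a left-recursion rewrite produces. It enumerates every way the input a :: b || c can be matched instead of predicting one path. The precedence numbers in `PREC`, the function names, and the `opaque_rhs` flag are all assumptions made for this sketch; only the relative order of the operators and the fact that t : e restarts at the lowest precedence are meant to mirror the grammar above.

```python
# Hypothetical precedence table: a higher number binds tighter.
PREC = {"::": 2, "||": 1}


def parse_rule(tokens, pos, min_prec, opaque_rhs):
    """Yield (tree, next_pos) for every way the rewritten rule can match."""
    if pos < len(tokens) and tokens[pos].isalpha():  # primary alt: ID
        yield from extend(tokens, tokens[pos], pos + 1, min_prec, opaque_rhs)


def extend(tokens, left, pos, min_prec, opaque_rhs):
    """Model the (...)* operator loop that follows the primary."""
    yield left, pos  # option 1: leave the loop here
    if pos >= len(tokens):
        return
    op = tokens[pos]
    if op not in PREC or PREC[op] < min_prec:  # precedence check fails
        return
    if op == "::" and opaque_rhs:
        rhs_min = 0  # rule e: the RHS is t, and t : e restarts at precedence 0
    else:
        rhs_min = PREC[op] + 1  # rule f: direct recursion, left-associative
    for right, nxt in parse_rule(tokens, pos + 1, rhs_min, opaque_rhs):
        # option 2: consume the operator and keep looping
        yield from extend(tokens, (left, op, right), nxt, min_prec, opaque_rhs)


def complete_parses(tokens, opaque_rhs):
    """Return every tree that consumes the whole token list."""
    return [tree for tree, end in parse_rule(tokens, 0, 0, opaque_rhs)
            if end == len(tokens)]


tokens = ["a", "::", "b", "||", "c"]
print(complete_parses(tokens, opaque_rhs=False))  # rule f: one tree
print(complete_parses(tokens, opaque_rhs=True))   # rule e: two trees
```

In this model, f yields only (a :: b) || c, while e also yields a :: (b || c), because the nested call through t is free to swallow the lower-precedence ||.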

See:

@kaby76 kaby76 reopened this Dec 7, 2024

kaby76 commented Dec 18, 2024

Note: Antlr may not rewrite the left recursion correctly. In that case, I may have to set the precedence explicitly. See https://stackoverflow.com/q/30001506/4779853

kaby76 commented Dec 31, 2024

Closing for now until I have some free time for this.
