@kaby76 kaby76 commented May 16, 2025

This is a fix for #4488.

This PR fixes all ambiguity in the grammar with the addition of explicit actions and predicates. This is likely the first grammar in grammars-v4 with a full-scale symbol table.

The symbol table is needed because the Go language is not context-free: the EBNF grammar in the Go Language Specification is ambiguous. The semantics needed to disambiguate the EBNF grammar are given in the Spec, but not in the EBNF itself. This PR adds that part of the semantics explicitly to the grammar, using actions and predicates in a target-agnostic format.
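
As a rough illustration of the kind of lookup the predicates rely on, here is a minimal sketch of a scoped symbol table in Go. It is not the code in this PR: the names `Kind`, `Scope`, `Define`, and `Resolve` are hypothetical, and a real implementation would also have to track declarations, imports, and exported members.

```go
package main

import "fmt"

// Kind classifies what an identifier denotes. Hypothetical names, for illustration only.
type Kind int

const (
	Unknown Kind = iota
	Package
	Type
	Var
	Func
)

// Scope is one level of a lexically nested symbol table.
type Scope struct {
	parent  *Scope
	symbols map[string]Kind
}

// NewScope opens a scope nested inside parent (nil for the outermost scope).
func NewScope(parent *Scope) *Scope {
	return &Scope{parent: parent, symbols: make(map[string]Kind)}
}

// Define records name in this scope, shadowing any outer definition.
func (s *Scope) Define(name string, k Kind) {
	s.symbols[name] = k
}

// Resolve walks outward from the innermost scope and returns the first match.
func (s *Scope) Resolve(name string) Kind {
	for sc := s; sc != nil; sc = sc.parent {
		if k, ok := sc.symbols[name]; ok {
			return k
		}
	}
	return Unknown
}

func main() {
	universe := NewScope(nil)
	universe.Define("int", Type)
	universe.Define("make", Func)

	file := NewScope(universe)
	file.Define("fs", Package) // as if the file contained: import "io/fs"

	block := NewScope(file)
	block.Define("int", Var) // a local declaration may shadow a predeclared name

	fmt.Println(file.Resolve("fs") == Package)
	fmt.Println(file.Resolve("int") == Type)
	fmt.Println(block.Resolve("int") == Var)
}
```

A predicate in the grammar can then ask questions such as "does this identifier resolve to a type in the current scope?", which the token stream alone cannot answer.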

In addition, some parts of the grammar were simply incorrect, apparently miscopied from the Spec.

This PR is being tested against the Go runtime.

Current status of testing of the Go runtime: 2835 success out of 3058 total .go files.

This is the test suite.
goroot-src.tar.gz

kaby76 commented May 24, 2025

Annoyingly, Antlr does not parse correctly when predicates are placed after a decision point. This occurs when there are two derivations of the same input, as with the input make([]fs.DirEntry, 1) in reader.go in the Go runtime. For this input, there are at least two parse trees:

expressionStmt
└── expression
    └── primaryExpr
        ├── primaryExpr
        │   └── operand
        │       └── operandName
        │           ├── Attribute TERMINATOR Value '\n' chnl:HIDDEN
        │           ├── Attribute WS Value '\t' chnl:HIDDEN
        │           └── IDENTIFIER
        │               └── "make"
        └── arguments
            ├── Attribute OTHER Value '' chnl:HIDDEN
            ├── L_PAREN
            │   └── "("
            ├── expressionList
            │   ├── expression
            │   │   └── primaryExpr
            │   │       └── methodExpr
            │   │           ├── type_
            │   │           │   └── typeLit
            │   │           │       └── sliceType
            │   │           │           ├── L_BRACKET
            │   │           │           │   └── "["
            │   │           │           ├── R_BRACKET
            │   │           │           │   └── "]"
            │   │           │           └── elementType
            │   │           │               └── type_
            │   │           │                   └── typeName
            │   │           │                       ├── Attribute OTHER Value '' chnl:HIDDEN
            │   │           │                       └── IDENTIFIER
            │   │           │                           └── "fs"
            │   │           ├── Attribute OTHER Value '' chnl:HIDDEN
            │   │           ├── DOT
            │   │           │   └── "."
            │   │           └── IDENTIFIER
            │   │               └── "DirEntry"
            │   ├── Attribute OTHER Value '' chnl:HIDDEN
            │   ├── COMMA
            │   │   └── ","
            │   └── expression
            │       └── primaryExpr
            │           └── operand
            │               └── literal
            │                   └── basicLit
            │                       └── integer
            │                           ├── Attribute WS Value ' ' chnl:HIDDEN
            │                           └── DECIMAL_LIT
            │                               └── "1"
            ├── Attribute OTHER Value '' chnl:HIDDEN
            └── R_PAREN
                └── ")"

and

expressionStmt
└── expression
    └── primaryExpr
        ├── primaryExpr
        │   └── operand
        │       └── operandName
        │           ├── Attribute TERMINATOR Value '\n' chnl:HIDDEN
        │           ├── Attribute WS Value '\t' chnl:HIDDEN
        │           └── IDENTIFIER
        │               └── "make"
        └── arguments
            ├── Attribute OTHER Value '' chnl:HIDDEN
            ├── L_PAREN
            │   └── "("
            ├── type_
            │   └── typeLit
            │       └── sliceType
            │           ├── L_BRACKET
            │           │   └── "["
            │           ├── R_BRACKET
            │           │   └── "]"
            │           └── elementType
            │               └── type_
            │                   └── typeName
            │                       └── qualifiedIdent
            │                           ├── Attribute OTHER Value '' chnl:HIDDEN
            │                           ├── IDENTIFIER
            │                           │   └── "fs"
            │                           ├── Attribute OTHER Value '' chnl:HIDDEN
            │                           ├── DOT
            │                           │   └── "."
            │                           └── IDENTIFIER
            │                               └── "DirEntry"
            ├── Attribute OTHER Value '' chnl:HIDDEN
            ├── COMMA
            │   └── ","
            ├── expressionList
            │   └── expression
            │       └── primaryExpr
            │           └── operand
            │               └── literal
            │                   └── basicLit
            │                       └── integer
            │                           ├── Attribute WS Value ' ' chnl:HIDDEN
            │                           └── DECIMAL_LIT
            │                               └── "1"
            ├── Attribute OTHER Value '' chnl:HIDDEN
            └── R_PAREN
                └── ")"

There are two derivations for arguments:

  • Choose the alt that derives expressionList immediately after the L_PAREN;
  • Choose the alt that derives the sentential form type_ ',' expressionList.

If we place the predicate at the end of type_, the first choice should be eliminated: it treats []fs as the receiver type of a methodExpr, but fs is not a type. It is a package name, so it cannot be a receiver.

The second choice is the correct one because the string []fs.DirEntry is a type. However, the predicate in parser rule type_, which checks whether the type is correct, is evaluated too late: AdaptivePredict() makes the choice between the two ambiguous alts incorrectly because it does not evaluate the disambiguation predicate.
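
The check that the disambiguation predicate needs to perform can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Symbols`, `IsTypeName`, `IsQualifiedType`, and `ChooseArgumentsAlt` are hypothetical names, and the table of package exports is assumed to be available from somewhere (for example, from processing the imported packages).

```go
package main

import "fmt"

// Kind classifies an identifier. Hypothetical names, for illustration only.
type Kind int

const (
	Unknown Kind = iota
	Package
	Type
)

// Symbols is a flat stand-in for a real scoped symbol table.
type Symbols struct {
	names   map[string]Kind            // identifiers visible in the file
	exports map[string]map[string]Kind // package name -> exported names (assumed known)
}

// IsTypeName models a predicate on typeName for a bare IDENTIFIER.
func (s Symbols) IsTypeName(id string) bool {
	return s.names[id] == Type
}

// IsQualifiedType models a predicate on typeName for qualifiedIdent (pkg.id).
func (s Symbols) IsQualifiedType(pkg, id string) bool {
	return s.names[pkg] == Package && s.exports[pkg][id] == Type
}

// ChooseArgumentsAlt picks the alt of arguments for input shaped like f([]a.b, ...).
func ChooseArgumentsAlt(s Symbols, a, b string) string {
	switch {
	case s.IsQualifiedType(a, b):
		// []a.b is a type, so it is the leading type_ argument.
		return "type_ ',' expressionList"
	case s.IsTypeName(a):
		// []a is a type, so []a.b is a methodExpr inside expressionList.
		return "expressionList"
	default:
		return "no viable alternative"
	}
}

func main() {
	s := Symbols{
		names:   map[string]Kind{"fs": Package},
		exports: map[string]map[string]Kind{"fs": {"DirEntry": Type}},
	}
	// Models the input make([]fs.DirEntry, 1).
	fmt.Println(ChooseArgumentsAlt(s, "fs", "DirEntry"))
}
```

The difficulty described above is not in writing this check but in getting the parser to consult it at the point where the choice between the two alts is made.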

Parr et al. recommend hoisting the predicate into the calling rule, but this is quite unnatural because we don't have the parse tree for the input at the point where the choice needs to be made. In fact, all we have is a stream of tokens.

There are a number of different hacks to fix this. One hack is to perform the parse at the beginning of arguments, turning off certain alts and forcing the choice after the fact. This is equivalent to the old syntactic predicates of Antlr3.
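
The general shape of such a speculative parse can be sketched with a toy hand-written parser in Go. This is only a model of the idea, not Antlr code and not what this PR implements: `Parser`, `speculate`, `typ`, and `argumentsAlt` are hypothetical names, the `types` set stands in for the symbol table, and only a tiny subset of type_ is recognized.

```go
package main

import "fmt"

// Parser is a toy recursive-descent parser over pre-split tokens.
type Parser struct {
	toks  []string
	pos   int
	types map[string]bool // hypothetical: names known to denote types
}

func (p *Parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return "<EOF>"
}

func (p *Parser) match(t string) bool {
	if p.peek() == t {
		p.pos++
		return true
	}
	return false
}

func isIdent(t string) bool {
	if t == "" {
		return false
	}
	c := t[0]
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// typ parses a tiny subset of type_: '[' ']' typ | IDENT ('.' IDENT)?
// It succeeds only if the parsed name is a known type.
func (p *Parser) typ() bool {
	if p.match("[") {
		return p.match("]") && p.typ()
	}
	name := p.peek()
	if !isIdent(name) {
		return false
	}
	p.pos++
	if p.match(".") {
		if !isIdent(p.peek()) {
			return false
		}
		name += "." + p.peek()
		p.pos++
	}
	return p.types[name]
}

// speculate runs rule and always rewinds, like a syntactic predicate.
func (p *Parser) speculate(rule func() bool) bool {
	saved := p.pos
	ok := rule()
	p.pos = saved
	return ok
}

// argumentsAlt is called with pos just after '(' and picks an alt up front.
func (p *Parser) argumentsAlt() string {
	startsWithType := p.speculate(func() bool {
		if !p.typ() {
			return false
		}
		next := p.peek()
		return next == "," || next == ")"
	})
	if startsWithType {
		return "type_ ',' expressionList"
	}
	return "expressionList"
}

func main() {
	p := &Parser{
		toks:  []string{"[", "]", "fs", ".", "DirEntry", ",", "1", ")"},
		types: map[string]bool{"fs.DirEntry": true},
	}
	fmt.Println(p.argumentsAlt(), "pos =", p.pos)
}
```

The key property is that the speculation consumes no input: the decision is made first, and the real parse of the chosen alt then starts again from the same token.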

kaby76 commented May 25, 2025

It turns out that we should have both grammars: the ambiguous grammar from the Spec without a symbol table, and one with a symbol table. With the strategy I outlined above for implementing disambiguation predicates, the original grammar without predicates is important for identifying the source of the ambiguity. We need to understand where the ambiguity actually comes from so that we can add the "syntactic predicate" hack at the correct pinch points.

@kaby76 kaby76 changed the title from "[golang] Fix for #4469--Add symbol table." to "[golang] Fix for #4469: Add symbol table." on Jun 27, 2025