Skip to content

stone-lang/grammy

Repository files navigation

Grammy

Grammy is a tool for generating parsers. You describe the language with a grammar written in a Ruby DSL, similar to EBNF. Grammy dynamically generates a parser from that description. You can then use the parser to parse strings into a parse tree, and create an AST from the parse tree.

License: MIT Version Gem Version

ToC

Features

Grammy implements a PEG (Parsing Expression Grammar) parser. But that's an implementation detail, possibly subject to change, if something better comes along (LPEG, GLR, etc). PEG parsers are quick (using "packrat" caching) and easy to use. They easily handle ambiguous grammars, left or right recursion, and infinite lookahead.

You can use Grammy to parse complex languages. I'm going to be using it to write a full general-purpose programming language. But you can also use it to parse simpler languages, like:

  • JSON
  • CSV
  • external DSLs
  • configuration files
  • data formats
  • HTTP headers

Basically, if you have an EBNF grammar, it should be easy to use Grammy to parse it.

Usage

require "grammy"
require "arithmetic" # Your grammar file, as below.

input = "1+2*3"
parse_tree = Arithmetic.parse(input)
ast = parse_tree.ast

I'm still experimenting with a few different DSL syntaxes for the grammar.

Class Methods

NOTE: This is the only syntax currently implemented.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  rule(:expression) { term + (plus + term)[0..] }
  rule(:term) { factor + (times + factor)[0..] }
  rule(:factor) { number | parens(expression) }
  terminal(:number) { /\d+/ }  # You can also use `token` instead of `terminal`.
  terminal(:plus) { "+" }      # A terminal rule matches a string or regex, without needing `str` or `reg`.
  terminal(:times) { "*" }     # Other than that, it works like a normal rule.

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Decorated Methods

NOTE: This syntax has not been implemented yet.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  rule def expression = term + (plus + term)[0..]
  rule def term = factor + (times + factor)[0..]
  rule def factor = number | parens(expression)
  terminal def number = /\d+/
  terminal def plus = "+"
  terminal def times = "*"

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Instance Methods

NOTE: This syntax has not been implemented yet.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  def expression = rule { term + (str("+") + term)[0..] }
  def term = rule { factor + (str("*") + factor)[0..] }
  def factor = rule { number | parens(expression) }
  def number = terminal { /\d+/ }

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Combinators

Grammy uses combinators to define the grammar. Combinators are functions that take one or more parsers as arguments and return a new parser.

Only a few primitive combinators are needed.

String

The str combinator is used to match a string; it has an alias of lit (for "literal").

str("return")
lit("return")

Regex

The reg combinator is used to match a regular expression.

reg(/\d+/)

Sequence

You use the seq combinator to specify a sequence of what to match.

This example matches a sequence of a term, a plus sign, and another term.

seq(term, str("+"), term)

Alternatives

The alt combinator is used to specify alternatives (multiple choices) of what to match.

This example matches either a plus sign or a minus sign.

alt(str("+"), str("-"))

Note

In practical use, this would most likely be done with a single reg call:

reg(/[+-]/)

Repetition

The rep combinator is used to specify repetition of what to match. You can specify the minimum and maximum number of repetitions, using a range.

This example matches one or more digits.

rep(reg(/\d+/), 1..)

Operator DSL

You can use +, |, and [] operators in place of the named combinators. The + operator can be used in place of the seq combinator. The | operator can be used in place of the alt combinator. The [] operator can be used in place of the rep combinator.

For example, the following two lines are equivalent:

seq(alt(term, expr), str("+"), rep(term, 0..1))
(term | expr) + lit("+") + term[0..1]

Start/End of Line/File

The eol and sol combinators match the end of a line and the start of a line. The eof and sof combinators match the end of a file and the start of a file.

Whitespace

The wsp combinator matches any whitespace characters, including tab, newline, carriage return, etc.

Parse Tree

The internal nodes of the parse tree are ParseTree objects. The leaves of the parse tree are Token objects.

The parse tree generated by the example above will look like this:

expected_parse_tree = ParseTree.new("expression", [
  ParseTree.new("term", [
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("1")
      ])
    ])
  ]),
  Token.new("+"),
  ParseTree.new("term", [
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("2")
      ]),
    Token.new("*"),
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("3")
      ])
    ])
  ])
])

AST

An AST (Abstract Syntax Tree) can be generated from the parse tree. Your AST base class can be a subclass of Grammy::Tree, and use the tree transformation DSL to derive the AST from the parse tree.

Given the parse tree above, and this tree transformation DSL:

transform(:term) do |node|
  left, op, right = node.children
  AST::BinaryOp.new(op.text, [transform(left), transform(right)])
end
transform(:number) { |token| token.with(value: token.text.to_i) }

You'd end up with the following AST:

expected_ast = AST::Arithmetic.new(
  AST::BinaryOp.new("+", [
    AST::Number.new(1),
    AST::BinaryOp.new("*", [
      AST::Number.new(2),
      AST::Number.new(3)
    ])
  ])
)

Tests

rspec

License

Copyright (c) 2024-2025 by Craig Buchek and BoochTek, LLC.

This code is licensed under the MIT License. See the LICENSE for the full details.

Contributing

PRs and issues are welcome!

Release

To release a new version, update the version in lib/VERSION, then build the gem and publish it to RubyGems.

make publish

About

PEG parser with a BNF-like DSL

Resources

Stars

Watchers

Forks

Packages

No packages published