Grammy is a tool for generating parsers. You describe the language with a grammar written in a Ruby DSL, similar to EBNF. Grammy dynamically generates a parser from that description. You can then use the parser to parse strings into a parse tree, and create an AST from the parse tree.
Grammy implements a PEG (Parsing Expression Grammar) parser. But that's an implementation detail, possibly subject to change, if something better comes along (LPEG, GLR, etc). PEG parsers are quick (using "packrat" caching) and easy to use. They easily handle ambiguous grammars, left or right recursion, and infinite lookahead.
You can use Grammy to parse complex languages. I'm going to be using it to write a full general-purpose programming language. But you can also use it to parse simpler languages, like:
- JSON
- CSV
- external DSLs
- configuration files
- data formats
- HTTP headers
Basically, if you have an EBNF grammar, it should be easy to use Grammy to parse it.
require "grammy"
require "arithmetic" # Your grammar file, as below.
input = "1+2*3"
parse_tree = Arithmetic.parse(input)
ast = parse_tree.ast
I'm still experimenting with a few different DSL syntaxes for the grammar.
NOTE: This is the only syntax currently implemented.
require 'grammy'
class Arithmetic < Grammy::Grammar
# Specify which rule to start with.
start :expression
# Define the rules.
rule(:expression) { term + (plus + term)[0..] }
rule(:term) { factor + (times + factor)[0..] }
rule(:factor) { number | parens(expression) }
terminal(:number) { /\d+/ } # You can also use `token` instead of `terminal`.
terminal(:plus) { "+" } # A terminal rule matches a string or regex, without needing `str` or `reg`.
terminal(:times) { "*" } # Other than that, it works like a normal rule.
# Define any custom combinators.
def parens(exp) = str("(") + expression + str(")")
end
NOTE: This syntax has not been implemented yet.
require 'grammy'
class Arithmetic < Grammy::Grammar
# Specify which rule to start with.
start :expression
# Define the rules.
rule def expression = term + (plus + term)[0..]
rule def term = factor + (times + factor)[0..]
rule def factor = number | parens(expression)
terminal def number = /\d+/
terminal def plus = "+"
terminal def times = "*"
# Define any custom combinators.
def parens(exp) = str("(") + expression + str(")")
end
NOTE: This syntax has not been implemented yet.
require 'grammy'
class Arithmetic < Grammy::Grammar
# Specify which rule to start with.
start :expression
# Define the rules.
def expression = rule { term + (str("+") + term)[0..] }
def term = rule { factor + (str("*") + factor)[0..] }
def factor = rule { number | parens(expression) }
def number = terminal { /\d+/ }
# Define any custom combinators.
def parens(exp) = str("(") + expression + str(")")
end
Grammy uses combinators to define the grammar. Combinators are functions that take one or more parsers as arguments and return a new parser.
Only a few primitive combinators are needed.
The str
combinator is used to match a string; it has an alias of lit
(for "literal").
str("return")
lit("return")
The reg
combinator is used to match a regular expression.
reg(/\d+/)
You use the seq
combinator to specify a sequence of what to match.
This example matches a sequence of a term, a plus sign, and another term.
seq(term, str("+"), term)
The alt
combinator is used to specify alternatives (multiple choices) of what to match.
This example matches either a plus sign or a minus sign.
alt(str("+"), str("-"))
Note
In practical use, this would most likely be done with a single reg
call:
reg(/[+-]/)
The rep
combinator is used to specify repetition of what to match.
You can specify the minimum and maximum number of repetitions, using a range.
This example matches one or more digits.
rep(reg(/\d+/), 1..)
You can use +
, |
, and []
operators in place of the named combinators.
The +
operator can be used in place of the seq
combinator.
The |
operator can be used in place of the alt
combinator.
The []
operator can be used in place of the rep
combinator.
For example, the following two lines are equivalent:
seq(alt(term, expr), str("+"), rep(term, 0..1))
(term | expr) + lit("+") + term[0..1]
The eol
and sol
combinators match the end of a line and the start of a line.
The eof
and sof
combinators match the end of a file and the start of a file.
The wsp
combinator matches any whitespace characters, including tab, newline, carriage return, etc.
The internal nodes of the parse tree are ParseTree objects. The leaves of the parse tree are Token objects.
The parse tree generated by the example above will look like this:
expected_parse_tree = ParseTree.new("expression", [
ParseTree.new("term", [
ParseTree.new("factor", [
ParseTree.new("number", [
Token.new("1")
])
])
]),
Token.new("+"),
ParseTree.new("term", [
ParseTree.new("factor", [
ParseTree.new("number", [
Token.new("2")
]),
Token.new("*"),
ParseTree.new("factor", [
ParseTree.new("number", [
Token.new("3")
])
])
])
])
An AST (Abstract Syntax Tree) can be generated from the parse tree.
Your AST base class can be a subclass of Grammy::Tree
,
and use the tree transformation DSL to derive the AST from the parse tree.
Given the parse tree above, and this tree transformation DSL:
transform(:term) do |node|
left, op, right = node.children
AST::BinaryOp.new(op.text, [transform(left), transform(right)])
end
transform(:number) { |token| token.with(value: token.text.to_i) }
You'd end up with the following AST:
expected_ast = AST::Arithmetic.new(
AST::BinaryOp.new("+", [
AST::Number.new(1),
AST::BinaryOp.new("*", [
AST::Number.new(2),
AST::Number.new(3)
])
])
)
rspec
Copyright (c) 2024-2025 by Craig Buchek and BoochTek, LLC.
This code is licensed under the MIT License. See the LICENSE for the full details.
To release a new version, update the version in lib/VERSION
,
then build the gem and publish it to RubyGems.
make publish