Grammy

Grammy is a tool for generating parsers. You describe the language with a grammar written in a Ruby DSL, similar to EBNF. Grammy dynamically generates a parser from that description. You can then use the parser to parse strings into a parse tree, and create an AST from the parse tree.

Features

Grammy implements a PEG (Parsing Expression Grammar) parser. But that's an implementation detail, possibly subject to change, if something better comes along (LPEG, GLR, etc). PEG parsers are quick (using "packrat" caching) and easy to use. They easily handle ambiguous grammars, left or right recursion, and infinite lookahead.

You can use Grammy to parse complex languages. I'm going to be using it to write a full general-purpose programming language. But you can also use it to parse simpler languages, like:

JSON
CSV
external DSLs
configuration files
data formats
HTTP headers

Basically, if you have an EBNF grammar, it should be easy to use Grammy to parse it.

Usage

require "grammy"
require "arithmetic" # Your grammar file, as below.

input = "1+2*3"
parse_tree = Arithmetic.parse(input)
ast = parse_tree.ast

I'm still experimenting with a few different DSL syntaxes for the grammar.

Class Methods

NOTE: This is the only syntax currently implemented.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  rule(:expression) { term + (plus + term)[0..] }
  rule(:term) { factor + (times + factor)[0..] }
  rule(:factor) { number | parens(expression) }
  terminal(:number) { /\d+/ }  # You can also use `token` instead of `terminal`.
  terminal(:plus) { "+" }      # A terminal rule matches a string or regex, without needing `str` or `reg`.
  terminal(:times) { "*" }     # Other than that, it works like a normal rule.

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Decorated Methods

NOTE: This syntax has not been implemented yet.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  rule def expression = term + (plus + term)[0..]
  rule def term = factor + (times + factor)[0..]
  rule def factor = number | parens(expression)
  terminal def number = /\d+/
  terminal def plus = "+"
  terminal def times = "*"

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Instance Methods

NOTE: This syntax has not been implemented yet.

require 'grammy'

class Arithmetic < Grammy::Grammar
  # Specify which rule to start with.
  start :expression

  # Define the rules.
  def expression = rule { term + (str("+") + term)[0..] }
  def term = rule { factor + (str("*") + factor)[0..] }
  def factor = rule { number | parens(expression) }
  def number = terminal { /\d+/ }

  # Define any custom combinators.
  def parens(exp) = str("(") + expression + str(")")
end

Combinators

Grammy uses combinators to define the grammar. Combinators are functions that take one or more parsers as arguments and return a new parser.

Only a few primitive combinators are needed.

String

The str combinator is used to match a string; it has an alias of lit (for "literal").

str("return")
lit("return")

Regex

The reg combinator is used to match a regular expression.

reg(/\d+/)

Sequence

You use the seq combinator to specify a sequence of what to match.

This example matches a sequence of a term, a plus sign, and another term.

seq(term, str("+"), term)

Alternatives

The alt combinator is used to specify alternatives (multiple choices) of what to match.

This example matches either a plus sign or a minus sign.

alt(str("+"), str("-"))

Note

In practical use, this would most likely be done with a single reg call:

reg(/[+-]/)

Repetition

The rep combinator is used to specify repetition of what to match. You can specify the minimum and maximum number of repetitions, using a range.

This example matches one or more digits.

rep(reg(/\d+/), 1..)

Operator DSL

You can use +, |, and [] operators in place of the named combinators. The + operator can be used in place of the seq combinator. The | operator can be used in place of the alt combinator. The [] operator can be used in place of the rep combinator.

For example, the following two lines are equivalent:

seq(alt(term, expr), str("+"), rep(term, 0..1))
(term | expr) + lit("+") + term[0..1]

Start/End of Line/File

The eol and sol combinators match the end of a line and the start of a line. The eof and sof combinators match the end of a file and the start of a file.

Whitespace

The wsp combinator matches any whitespace characters, including tab, newline, carriage return, etc.

Parse Tree

The internal nodes of the parse tree are ParseTree objects. The leaves of the parse tree are Token objects.

The parse tree generated by the example above will look like this:

expected_parse_tree = ParseTree.new("expression", [
  ParseTree.new("term", [
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("1")
      ])
    ])
  ]),
  Token.new("+"),
  ParseTree.new("term", [
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("2")
      ]),
    Token.new("*"),
    ParseTree.new("factor", [
      ParseTree.new("number", [
        Token.new("3")
      ])
    ])
  ])
])

AST

An AST (Abstract Syntax Tree) can be generated from the parse tree. Your AST base class can be a subclass of Grammy::Tree, and use the tree transformation DSL to derive the AST from the parse tree.

Given the parse tree above, and this tree transformation DSL:

transform(:term) do |node|
  left, op, right = node.children
  AST::BinaryOp.new(op.text, [transform(left), transform(right)])
end
transform(:number) { |token| token.with(value: token.text.to_i) }

You'd end up with the following AST:

expected_ast = AST::Arithmetic.new(
  AST::BinaryOp.new("+", [
    AST::Number.new(1),
    AST::BinaryOp.new("*", [
      AST::Number.new(2),
      AST::Number.new(3)
    ])
  ])
)

Tests

rspec

License

This code is licensed under the MIT License. See the LICENSE for the full details.

Contributing

PRs and issues are welcome!

Release

To release a new version, update the version in lib/VERSION, then build the gem and publish it to RubyGems.

make publish

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.github/workflows		.github/workflows
docs		docs
lib		lib
spec		spec
.gitignore		.gitignore
.markdownlint.yaml		.markdownlint.yaml
.rspec		.rspec
.rubocop.yml		.rubocop.yml
.tool-versions		.tool-versions
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
Makefile		Makefile
README.md		README.md
Rakefile		Rakefile
VERSION		VERSION
bun.lock		bun.lock
grammy.gemspec		grammy.gemspec
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Grammy

ToC

Features

Usage

Class Methods

Decorated Methods

Instance Methods

Combinators

String

Regex

Sequence

Alternatives

Repetition

Operator DSL

Start/End of Line/File

Whitespace

Parse Tree

AST

Tests

License

Contributing

Release

About

Uh oh!

Releases

Packages

Uh oh!

Languages

stone-lang/grammy

Folders and files

Latest commit

History

Repository files navigation

Grammy

ToC

Features

Usage

Class Methods

Decorated Methods

Instance Methods

Combinators

String

Regex

Sequence

Alternatives

Repetition

Operator DSL

Start/End of Line/File

Whitespace

Parse Tree

AST

Tests

License

Contributing

Release

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages