Create a Lark LALR(1) grammar for the BibTeX citation format and use it with SynCode to constrain LLM outputs. The exercise demonstrates how a formal grammar can constrain an LLM so that it generates well-formed structured output.
Estimated time: one week
- Lark parser (`pip install lark-parser`)
- SynCode library (https://github.com/structuredllm/syncode, `pip install syncode`)
- Python 3.8+
Create a Lark LALR(1) grammar that:
- Handles all standard BibTeX entry types
- Supports nested braces in field values
- Correctly processes special characters and LaTeX commands
- Is compatible with SynCode's grammar requirements
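One possible starting point for such a grammar is sketched below. It is a minimal, unverified sketch rather than a reference solution. The rule and terminal names (`entry`, `field`, `braced`, `group`, `qgroup`, `TEXT`, `QTEXT`, and so on) are illustrative choices, and only `@type{key, field = value, ...}` entries are covered (no `@string`, `@comment`, `@preamble`, or parenthesis-delimited entries). The sketch relies on Lark's contextual lexer so that `TEXT` is only tried inside braces. The three structurally identical brace rules are kept separate on purpose: LALR merges states that share the same items, and distinct rule names keep the state after each closing brace tied to a single context. Whether SynCode accepts this grammar unchanged has not been checked.

```python
# Minimal sketch of a BibTeX grammar for Lark's LALR(1) parser.
# All rule and terminal names are illustrative assumptions.
BIBTEX_GRAMMAR = r'''
start: entry*

entry: "@" NAME "{" KEY "," field ("," field)* ","? "}"
field: NAME "=" value

// Pieces of a value may be concatenated with the hash sign.
value: piece ("#" piece)*
piece: braced | quoted | NUMBER | NAME

// Brace nesting is handled by parser recursion, not by a regex.
braced: "{" inner* "}"
quoted: DQUOTE (QTEXT | qgroup)* DQUOTE

// group and qgroup are identical in shape but named separately so that
// the parser state after their closing brace belongs to one context only.
inner: TEXT | group
group: "{" inner* "}"
qgroup: "{" inner* "}"

DQUOTE: /"/
NAME: /[A-Za-z][A-Za-z0-9_-]*/
KEY: /[^\s,{}]+/
NUMBER: /[0-9]+/
// Priority 2 beats the ignored WS, so spaces inside a value stay in the text.
TEXT.2: /[^{}]+/
QTEXT.2: /[^"{}]+/

%import common.WS
%ignore WS
'''

# Hypothetical sample mixing quotes, nested braces, a LaTeX accent and "#".
SAMPLE = r'''
@article{godel1931,
  author = "Kurt G{\"o}del",
  title  = {On {Formally} Undecidable Propositions},
  year   = 1931,
  month  = jan # "~1",
}
'''


def is_valid_bibtex(text):
    """Return True if `text` parses under the sketch grammar above."""
    # Imported lazily so the grammar string can be reused without Lark.
    from lark import Lark
    from lark.exceptions import LarkError

    parser = Lark(BIBTEX_GRAMMAR, parser="lalr", start="start")
    try:
        parser.parse(text)
    except LarkError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_bibtex(SAMPLE))
```

For the deliverable, the grammar text between the triple quotes would be saved on its own as the grammar file; the Python wrapper is only there to make the sketch easy to try.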
A unit test file will be provided to verify your grammar implementation. The tests include various complex BibTeX examples:
- Nested braces in field values
- Special characters and LaTeX commands
- Mixed quote/brace formatting
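To make these three categories concrete, the sketch below lists one hypothetical input per category together with a small standard-library helper that checks brace balance. The fixture strings and the `max_brace_depth` helper are invented for illustration and are not taken from the provided test file. The helper counts every brace literally, including backslash-escaped ones, which is a simplifying assumption.

```python
# Hypothetical fixtures, one per category listed above.
FIXTURES = {
    "nested_braces": r"@article{k1, title = {A {Nested {Deep}} Title}}",
    "latex_commands": r'@book{k2, author = {Kurt G{\"o}del}, note = {50\% \& more}}',
    "mixed_delimiters": r'@misc{k3, title = "Quoted {Braced} Text", year = 2024}',
}


def max_brace_depth(text):
    """Return the deepest brace nesting level; raise ValueError if unbalanced."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without a matching opener")
    if depth != 0:
        raise ValueError("unclosed brace")
    return deepest


if __name__ == "__main__":
    for name, entry in FIXTURES.items():
        print(name, max_brace_depth(entry))
```

A check like this can help tell a malformed test input apart from a genuine grammar bug before debugging the grammar itself.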
Please see a similar ANTLR grammar here: https://github.com/antlr/grammars-v4/tree/master/bibtex (Note: This grammar is not in the Lark format)
Create a script that:
- Uses your grammar with SynCode
- Tests it with at least 3 different prompts asking for BibTeX citations
- Example prompts:
  - "Generate BibTeX entries for 3 recent papers on LLM security"
  - "Create a BibTeX citation for a conference paper by authors Smith and Johnson"
  - "Provide the BibTeX entry for the RL book by Barto and Sutton"
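A script along these lines might look like the sketch below. It is an outline under stated assumptions, not a verified implementation: the `Syncode(model=..., grammar=..., parse_output_only=True)` constructor and the `infer` method follow the usage pattern shown in the SynCode README as recalled here and should be checked against the installed version, the model name `microsoft/phi-2` is only a placeholder, and the helper names `make_syncode_infer` and `collect_citations` are hypothetical. The inference function is passed in as a parameter so the prompt loop can be exercised without loading a model.

```python
from pathlib import Path

# Prompts adapted from the examples above.
PROMPTS = [
    "Generate BibTeX entries for 3 recent papers on LLM security",
    "Create a BibTeX citation for a conference paper by authors Smith and Johnson",
    "Provide the BibTeX entry for the RL book by Barto and Sutton",
]


def make_syncode_infer(grammar_path="bibtex.lark", model_name="microsoft/phi-2"):
    """Build a prompt -> output callable backed by SynCode (assumed API)."""
    # Imported lazily: SynCode loads a model, which is slow and optional here.
    from syncode import Syncode

    grammar_text = Path(grammar_path).read_text(encoding="utf-8")
    # Assumption: these keyword arguments match the installed SynCode version.
    llm = Syncode(model=model_name, grammar=grammar_text, parse_output_only=True)
    return llm.infer


def collect_citations(prompts, infer):
    """Run `infer` on each prompt and map prompt -> generated text."""
    results = {}
    for prompt in prompts:
        output = infer(prompt)
        # Assumption: infer may return a list of completions; keep the first.
        if isinstance(output, (list, tuple)):
            output = output[0] if output else ""
        results[prompt] = output
    return results


if __name__ == "__main__":
    citations = collect_citations(PROMPTS, make_syncode_infer())
    for prompt, citation in citations.items():
        print(f"### {prompt}\n{citation}\n")
```

Separating `collect_citations` from the model setup also makes it straightforward to feed each generated citation back through the grammar as a post-hoc validity check.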
- `bibtex.lark` - Your Lark grammar file
- `bibtex_syncode.py` - Script for SynCode integration