This project is a comprehensive interpreter for a subset of the BASIC (Beginner's All-Purpose Symbolic Instruction Code) programming language. It includes a lexer, a parser, and an interpreter, capable of processing BASIC code from a file or in an interactive mode. The interpreter supports various BASIC language features including variable assignments, control flow (FOR/NEXT, IF/THEN, WHILE, GOSUB/RETURN, GOTO), input/output operations (PRINT, READ, INPUT, DATA), and a range of built-in functions.
This project was developed incrementally, following a series of assignments that built upon each other:
- Assignment 1: The Lexer: Focused on tokenizing the BASIC source code.
- Assignment 2: Enhance the Lexer: Added more token types, including keywords and symbols, using HashMaps for efficient lookup.
- Assignment 3: Start the Parser: Began the process of transforming the token list into an Abstract Syntax Tree (AST), focusing on order of operations for mathematical expressions (Expression-Term-Factor pattern).
- Assignment 4: Expand the Parser: Introduced more AST nodes for statements like PRINT, assignment, and variable handling.
- Assignment 5: Even More in the Parser: Added support for string literals and statements like READ, DATA, and INPUT.
- Assignment 6: Finish the Parser: Completed the parser by adding support for labels, GOSUB/RETURN, FOR/NEXT, IF, WHILE, END, and function invocations.
- Assignment 7: Start the Interpreter: Began building the interpreter, focusing on pre-processing optimizations like handling DATA statements and labels, and implementing built-in functions.
- Assignment 8: Continue the Interpreter: Implemented the evaluation logic for expressions (including mathematical operations and function calls) and started interpreting statement nodes like READ, ASSIGN, INPUT, and PRINT.
- Assignment 9: Finish the Interpreter: Completed the interpreter by adding a 'Next' member to StatementNode for proper execution flow, implementing the remaining statement interpretations (IF, GOSUB, RETURN, FOR, NEXT, END), and setting up the main interpreter loop.
- Lexical Analysis: Converts BASIC source code into a stream of tokens.
- Recognizes keywords, identifiers (variables), numbers (integer and float), string literals, and symbols.
- Handles comments and whitespace.
- Tracks line numbers and character positions for tokens.
- Parsing: Builds an Abstract Syntax Tree (AST) from the token stream.
- Implements Expression-Term-Factor pattern for correct order of operations.
- Supports a variety of statement types including assignments, print statements, control flow, and data manipulation.
- Creates heterogeneous n-ary tree nodes.
- Interpretation: Executes the BASIC program represented by the AST.
- Manages variable storage (integers, floats, strings).
- Handles control flow statements like
IF-THEN,FOR-NEXT,WHILE,GOSUB-RETURN, andGOTO. - Supports
READ/DATAfor program data. - Implements
INPUTfor user interaction. - Provides
PRINTfor outputting values. - Includes built-in functions:
RANDOM,LEFT$,RIGHT$,MID$,NUM$,VAL,VAL%.
- Error Handling: Provides error messages for syntax errors and runtime issues.
- Interactive Mode: Allows users to type and run BASIC commands directly in the interpreter.
- Commands:
RUN,HELP,EXIT,NEW,OUTPUT,SAVE [file],LOAD [file].
- Commands:
- Debug Mode: Offers a debug flag (
-dor-debug) to print tokens and the AST for inspection.
- Variables are typed by their ending character:
$for string (e.g.,myString$)%for float (e.g.,percentOfPeople%)- Any other ending for integer (e.g.,
count)
- Simple variables do not need to be pre-defined.
- All variables are global.
- Programs are a list of commands.
- Labels are used for line referencing (e.g.,
myLabel:). A label must be the first item on a line.
- PRINT: Outputs values to the console. Can print multiple comma-separated values.
PRINT expression1, expression2, "string_literal", ...
- READ: Reads values from
DATAstatements into variables.READ var1, var2$, var3%
- INPUT: Prompts the user for input and assigns it to variables.
INPUT "Prompt string", var1$, var2
- DATA: Defines a list of constant data (strings, integers, floats) to be used by
READ.DATA "string", 123, 45.67
- GOSUB: Transfers control to a subroutine identified by a label.
GOSUB subroutineLabel
- RETURN: Returns control from a subroutine to the statement after the
GOSUBcall. - GOTO: Unconditionally transfers control to a line identified by a label.
GOTO targetLabel
- FOR...TO...STEP...NEXT: Creates a loop.
FOR variable = initialValue TO limit [STEP increment]...statements...NEXT variable- The
STEPpart is optional; if omitted, the increment is 1.
- IF...THEN: Conditional execution. If the expression is true, control jumps to the specified label.
IF boolean_expression THEN label- Boolean expressions are of the form:
Expression operator Expression, where operator can be>,>=,<,<=,<>,=.
- WHILE...label: Loops through statements until an expression is false. The loop body is defined up to the specified end label.
WHILE boolean_expression endLabelName...statements...endLabelName:
- END: Terminates program execution.
- Assignment: Assigns a value to a variable.
variable = expression(e.g.,C = 5*(F-32)/9)
- RANDOM(): Returns a random integer.
- LEFT$(string, int): Returns the leftmost N characters from the string.
- RIGHT$(string, int): Returns the rightmost N characters from the string.
- MID$(string, int, int): Returns a substring starting from the second argument's position for a length specified by the third argument (e.g.,
MID$("Albany",2,3)results in "ban"). - NUM$(int or float): Converts a number to its string representation.
- VAL(string): Converts a string to an integer.
- VAL%(string): Converts a string to a float.
- Java 1.8 or higher (Project
pom.xmlspecifies Java 11, butREADME.mdmentions Java 1.8). It's best to use Java 11 as perpom.xmlconfiguration.- The
pom.xmlhas<maven.compiler.source>1.8</maven.compiler.source>and<maven.compiler.target>1.8</maven.compiler.target>under properties, but themaven-compiler-pluginconfiguration specifies source and target as 11.
- The
- Maven 3.8.x or later.
- Clone the repository:
git clone <repository-url> cd <repository-directory>
- Build the project using Maven:
This will compile the source code, run tests, and create a JAR file in the
mvn clean install -U mvn package
target/directory (e.g.,basic-app-1.0.jar).
- Open IntelliJ IDEA.
- Click "Open", navigate to the project root directory, select the
pom.xmlfile, then click "Open as Project". - Ensure IntelliJ uses the correct Maven version:
- Click "View" -> "Tool Windows" -> "Maven".
- Open Maven Settings for your project.
- Ensure you are using "Bundled (Maven 3)" under "Maven home path" or point to your local Maven 3.8.x+ installation. This will help use the correct Maven version if your system-level Maven is different.
Execute the compiled JAR file, passing the BASIC file name as an argument:
java -jar target/basic-app-1.0.jar "your_basic_file.bas"Run the interpreter in interactive mode:
java -jar target/basic-app-1.0.jar -interactiveor
java -jar target/basic-app-1.0.jar -iThis will start the SimpleBASIC interpreter. You can type BASIC commands directly.
- HELP: Print help options.
- RUN: Compile and run the current program entered in the interactive session.
- NEW: Start a new program, discarding the current one.
- OUTPUT: Output the current program.
- SAVE: Save the current program to output.txt in the current directory.
- SAVE [file]: Save the current program to the specified file in the current directory.
- LOAD [file]: Load a BASIC program from the specified file.
- EXIT: Quit the interpreter.
To see the token stream and the Abstract Syntax Tree (AST) during execution, use the debug flag:
java -jar target/basic-app-1.0.jar "your_basic_file.bas" -debugor
java -jar target/basic-app-1.0.jar "your_basic_file.bas" -dContains classes responsible for lexical analysis.
CodeHandler.java: Manages reading and navigating the source code character by character.Lexer.java: The main lexer class that tokenizes the input source code.Token.java: Represents a token with its type, value (optional), line number, and position.TokenManager.java: Manages the stream of tokens produced by the Lexer, providing peek and match-and-remove functionalities for the Parser.
Contains classes responsible for parsing the token stream into an AST.
Parser.java: The main parser class. Implements recursive descent parsing, including the Expression-Term-Factor pattern.
Contains classes defining the nodes of the Abstract Syntax Tree.
Node.java: Abstract base class for all AST nodes.ProgramNode.java: Root node of the AST, holding a list of statements or expressions.StatementNode.java: Abstract base class for all statement nodes (e.g.,PrintNode,AssignmentNode). Has anextpointer to link statements for sequential execution.ExpressionNode.java,TermNode.java,FactorNode.java: Nodes representing parts of mathematical expressions.- Specific nodes for each language construct:
IntegerNode,FloatNode,StringNode,VariableNode,MathOpNodePrintNode,AssignmentNode,IfNode,ForNode,WhileNode,GosubNodeReturnNode,DataNode,ReadNode,InputNode,FunctionNodeLabeledStatementNode,EndNode, etc.
InterpreterDataType.java: Enum for variable types (STRING,INTEGER,FLOAT).StatementVisitor.java: Interface for the visitor pattern used by the Interpreter for pre-processing and statement execution.BuiltInFunctions.java: Static methods for BASIC built-in functions.
Contains the class responsible for executing the AST.
Interpreter.java: Walks the AST and executes the BASIC program. Manages variable state, control flow, and I/O. ImplementsStatementVisitor.
The main class with the main method, serving as the entry point for the application. Handles command-line arguments and orchestrates the lexing, parsing, and interpretation processes.
Contains JUnit tests for various components.
LexerTest.java,ParserTest.java,InterpreterTest.javaCodeHandlerTest.java,BuiltInFunctionsTest.java,TokenTest.java
Contains sample BASIC files and other resources used for testing.
Maven project configuration file, defining dependencies (JUnit Jupiter for testing), build plugins, and project metadata.
Contains documents detailing the requirements for each assignment, which collectively define the features and evolution of the interpreter.
The project includes a suite of JUnit tests to ensure the correctness of the lexer, parser, and interpreter components.
- Verify tokenization of various inputs, including:
- Keywords
- Identifiers
- Numbers
- String literals (with escapes and multi-lines)
- Symbols
- Labels
- Comments
- Test the functionality of the
CodeHandlerfor reading and navigating source code.
- Validate AST construction for different expressions and statements.
- Ensure proper order of operations is respected in expression parsing.
- Test data and label processing.
- Validate variable storage and retrieval across data types.
- Test mathematical operations and function calls.
- Evaluate
INPUTandPRINTstatements (including a test mode for automated I/O). - Test control flow statements:
IF,FOR,WHILE,GOSUB,GOTO. - Test
READ/DATAfunctionality.
- Test each of the built-in BASIC functions for correctness.
- Validate
Tokenclass methods, includingtoString().
Use the following Maven command:
mvn testWhile the project fulfills the specified requirements, potential future enhancements could include:
- More extensive error recovery in the parser.
- Support for user-defined functions and subroutines with parameters and local variables.
- Additional data structures like arrays.
- File I/O operations beyond
INPUTandPRINT. - A richer set of built-in functions.
- Graphical User Interface (GUI).
- More sophisticated debugging capabilities.