Team Name: Byte Sized Duo
Team Members: Puja Singla (ps3467), Ria Luo (xl3466)
Video demo for parser (Programming 2): https://drive.google.com/file/d/14Eowv9yAQSn401B3Jn-2aXUW_2a-pLG2/view?usp=drive_link
Video demo for Code Generation (Programming 3): https://drive.google.com/file/d/1Cd235Fbfq1xfSEYqohKIjWhuW38NwGNg/view?usp=sharing
ScriptLite is a language designed to simplify file management with a clear, easy-to-understand syntax that abstracts away complex shell commands for users who may be less familiar with shell scripts. It supports basic file operation commands, such as creating new directories, moving files, and copying files, as well as more advanced operations like batch moving files, batch renaming files, backing up files, and syncing files. The goals of this language are to:
- Simplify file management for users unfamiliar with shell scripts, enabling them to perform complex file operations like batch-copying, syncing, and backups without the need for loops or extensive scripting.
- Streamline complex batch file operations with an easy syntax that requires fewer lines of code.
- Provide a natural language-like syntax that minimizes errors and makes tasks easier to understand and execute.
The lexical grammar of ScriptLite defines the syntax rules for recognizing different types of tokens within the language. Below are the token types supported by ScriptLite, along with their definitions and examples.
Keywords are reserved words in the ScriptLite language that have special meanings. They cannot be used as identifiers.
- Token Type: KEYWORD
- Regex: (path|list|define|call|in|to|string|bulk_rename_files|create_directory|copy_files|sync_files|display_files|ends_with|not|where|append|get_files|foreach|create_new_file|add_content)
- Examples: path, list, define, call, in, to, string, bulk_rename_files, create_directory, copy_files, sync_files, display_files, ends_with, not, where, append, get_files, foreach, create_new_file, add_content
Identifiers are names used to identify variables, functions, and other entities in the program. They can be composed of letters, digits, and underscores, but must not begin with a digit.
- Token Type: IDENTIFIER
- Regex: [a-zA-Z_][a-zA-Z0-9_]*
- Rules:
  - Must start with a letter or underscore (_).
  - Can contain letters, digits, and underscores.
- Examples: my_variable, _functionName, data123
Strings are sequences of characters enclosed in double quotes. They can contain any character except a double quote or a newline.
- Token Type: STRING
- Regex: "[^\"]*"
- Rules:
  - Must be enclosed in double quotes (").
  - Supports escaping of double quotes and other special characters (to be implemented as needed).
- Examples: "Hello, World!", "File path: C:\Users\Username"
Operators are symbols that represent computations or operations on operands. ScriptLite supports simple arithmetic and assignment operations.
- Token Type: OPERATOR
- Regex: [+=]
- Examples: +, =
Separators are characters used to separate tokens and denote the structure of the code.
- Token Type: SEPARATOR
- Regex: [();,{}\[\]]
- Examples (space-separated): ( ) { } ; , [ ]
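The token definitions above can be combined into a small regex-driven tokenizer. The following is a minimal sketch rather than the project's actual scanner: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, and the string pattern additionally excludes newlines, which is an assumption based on the prose description of strings.

```python
import re

# Keyword set copied from the KEYWORD regex in the lexical grammar.
KEYWORDS = {
    "path", "list", "define", "call", "in", "to", "string",
    "bulk_rename_files", "create_directory", "copy_files", "sync_files",
    "display_files", "ends_with", "not", "where", "append", "get_files",
    "foreach", "create_new_file", "add_content",
}

# (token type, pattern) pairs; SKIP is a helper class for whitespace (assumption).
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("OPERATOR", r"[+=]"),
    ("SEPARATOR", r"[();,{}\[\]]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token type, lexeme) tuples for a ScriptLite snippet."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        # Keywords share the identifier pattern, so reclassify after matching.
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens
```

Matching keywords through the identifier pattern and reclassifying them afterwards avoids a keyword such as `in` being split off the front of an identifier like `input_dir`.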
The grammar defined below outlines the structure of the language, demonstrating how programs are constructed using various production rules.
Non-terminals: Program, Declarations, Function_Header, Parameter_List, Parameter, Function_Call, Arguments, Block, Block_Statements, Statement, Expression, File_Handling, Files, CP, Foreach_statement, A, String
Terminals (space-separated): list string id = [ ] ; get_files define ( ) call , { } create_directory display_files create_new_file add_content to append bulk_rename_files in + copy_files move_files ends_with foreach "
Program → Declarations Function_Header Function_Call | Statement
Declarations → list id = [ id ]; | string id = String; | list id = get_files id;
Function_Header → define id (Parameter_List) Block
Parameter_List → ε | list id Parameter | string id Parameter
Parameter → ε | , Parameter_List
Function_Call → call id(Arguments);
Arguments → id | ,Arguments | ε
Block → { Block_Statements } | { }
Block_Statements → Statement | Function_Call | Declarations | Foreach_statement
Statement → create_directory A; | display_files A; | create_new_file A; | get_files A | add_content String to A; | File_Handling | append id to id; | bulk_rename_files id in id to Expression;
Expression → A + A
File_Handling → Files id in id CP to id
Files → copy_files | move_files
CP → ends_with String | ε
Foreach_statement → foreach id in id Block
A → String | id
String → "id"
The parser processes tokens from the scanner output and builds an Abstract Syntax Tree (AST) by recursively analyzing the structure of the source code. It matches tokens based on their class (e.g., KEYWORD, IDENTIFIER, etc.), advancing through the tokens while constructing nodes in the AST. Each node represents a syntactic construct like a variable declaration, function call, or block of code. The recursive algorithm closely follows the production rules defined above.
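To illustrate how a recursive-descent parser can mirror the production rules, here is a minimal sketch covering only File_Handling, Files, and CP. The `Parser` class, its method names, and the dictionary shape of the AST node are assumptions made for this example and are not taken from the project's code.

```python
class Parser:
    """Sketch of a recursive-descent parser over (token type, value) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Return the current token, or (None, None) at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def match(self, kind, value=None):
        # Consume the current token if it has the expected class (and value).
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, found {tok_value!r}")
        self.pos += 1
        return tok_value

    def parse_file_handling(self):
        # File_Handling → Files id in id CP to id
        operation = self.parse_files()
        files = self.match("IDENTIFIER")
        self.match("KEYWORD", "in")
        source = self.match("IDENTIFIER")
        suffix = self.parse_cp()
        self.match("KEYWORD", "to")
        dest = self.match("IDENTIFIER")
        return {"type": operation, "files": files, "source": source,
                "ends_with": suffix, "dest": dest}

    def parse_files(self):
        # Files → copy_files | move_files
        kind, value = self.peek()
        if kind == "KEYWORD" and value in ("copy_files", "move_files"):
            self.pos += 1
            return value
        raise SyntaxError(f"expected copy_files or move_files, found {value!r}")

    def parse_cp(self):
        # CP → ends_with String | ε
        if self.peek() == ("KEYWORD", "ends_with"):
            self.pos += 1
            return self.match("STRING")
        return None
```

Each non-terminal becomes one method, and the ε alternative of CP simply returns without consuming a token.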
Error handling is done using SyntaxError exceptions when the expected token doesn't match the current token. This ensures the parser detects and reports syntax errors, such as missing or misplaced tokens (e.g., missing semicolons or unbalanced parentheses). Only the first syntax error will be reported and the parser will exit. Some examples:
- Missing Semicolon: string x = "hello"
- Unexpected Token: list y = ;
- Mismatched Parentheses in Function Call: call print("hello",
- Trailing Comma in Parameter List: define myFunction(string x, string y, )
- Incorrect Block Closure: if condition { doSomething();
More examples can be found in the demo video linked at the top of README.
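The first-error-only behaviour described above can be sketched with a small token-expectation helper. This is an illustrative assumption about how such a check might look: `expect`, `parse_string_declaration`, and `check` are hypothetical names, and the message wording is invented for the example.

```python
def expect(tokens, pos, kind, value=None):
    """Return pos + 1 if tokens[pos] matches; otherwise raise SyntaxError."""
    found = tokens[pos] if pos < len(tokens) else ("EOF", "")
    if found[0] != kind or (value is not None and found[1] != value):
        wanted = value if value is not None else kind
        raise SyntaxError(f"expected {wanted!r} but found {found[1]!r} at token {pos}")
    return pos + 1


def parse_string_declaration(tokens):
    # Declarations → string id = String;
    pos = expect(tokens, 0, "KEYWORD", "string")
    pos = expect(tokens, pos, "IDENTIFIER")
    pos = expect(tokens, pos, "OPERATOR", "=")
    pos = expect(tokens, pos, "STRING")
    expect(tokens, pos, "SEPARATOR", ";")
    return {"type": "StringDeclaration", "name": tokens[1][1], "value": tokens[3][1]}


def check(tokens):
    """Report the first syntax error and stop, mirroring the parser's behaviour."""
    try:
        parse_string_declaration(tokens)
    except SyntaxError as error:
        return f"Syntax error: {error}"
    return "ok"
```

Because the exception propagates out of the first failing `expect` call, later tokens are never examined, which is why only one error is reported per run.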
- Cursor: Keeps track of the current position in the input program string. It increments as characters are processed.
- Tokens: A list that stores recognized tokens as tuples of (token type, token value).
- Errors: A list that records any lexical errors encountered during scanning.
The scanner transitions between different states based on the character being analyzed. The main states can be summarized as follows:
- Transition: If the character is whitespace, the scanner simply advances the cursor to skip over it.
- Transition: When the character is alphanumeric or an underscore, the scanner identifies the beginning of an identifier or keyword.
  - The scanner records the starting position of the potential token and enters a loop to collect all contiguous alphanumeric characters or underscores, forming a complete lexeme.
  - After exiting the loop, the scanner checks if the lexeme is present in the predefined set of keywords. If it is, a KEYWORD token is created. If not, it further checks if the lexeme starts with a digit. If it does, an error is logged, as identifiers cannot start with a number. Otherwise, the lexeme is classified as an IDENTIFIER, and the corresponding token is added.
- Transition: If the character is a double quote ("), the scanner recognizes the start of a string literal.
  - It advances the cursor and enters a loop until it finds the closing quote, collecting characters for the string.
  - If it reaches the end of the line without finding the closing quote, it records a lexical error for an unclosed string.
- Transition: If the character is found in the separators string, it recognizes it as a separator and adds it as a SEPARATOR token.
- Transition: If the character is found in the operators string, it recognizes it as an operator and adds it as an OPERATOR token.
- Transition: If the character does not match any of the above criteria, it’s considered an invalid character. The scanner records an error and advances the cursor.
- The scanner catches three main types of errors: invalid characters that do not conform to any recognized token type, identifiers starting with a digit, and unclosed string literals. When an invalid character is encountered, the scanner logs a lexical error message indicating the character and its position, then continues processing the remainder of the input. For an identifier starting with a digit, it logs an error stating that identifiers cannot start with numbers. For unclosed strings, the scanner detects when a string literal begins with a double quote but lacks a corresponding closing quote by the end of the line, and records this as an error as well. In all cases, the scanner advances the cursor past the offending input after logging the error, so it keeps scanning rather than halting on the first error.
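The state transitions and error cases above can be sketched as a single cursor-driven loop. This is a minimal illustration under assumptions: the function name `scan`, the constant names, and the exact error message wording are hypothetical and may differ from the project's scanner.

```python
KEYWORDS = {
    "path", "list", "define", "call", "in", "to", "string",
    "bulk_rename_files", "create_directory", "copy_files", "sync_files",
    "display_files", "ends_with", "not", "where", "append", "get_files",
    "foreach", "create_new_file", "add_content",
}
SEPARATORS = "();,{}[]"
OPERATORS = "+="


def scan(program):
    """Return (tokens, errors); scanning continues after every error."""
    tokens, errors = [], []
    cursor = 0
    while cursor < len(program):
        char = program[cursor]
        if char.isspace():
            cursor += 1
        elif char.isalnum() or char == "_":
            # Collect a full lexeme, then decide keyword / identifier / error.
            start = cursor
            while cursor < len(program) and (program[cursor].isalnum() or program[cursor] == "_"):
                cursor += 1
            lexeme = program[start:cursor]
            if lexeme in KEYWORDS:
                tokens.append(("KEYWORD", lexeme))
            elif lexeme[0].isdigit():
                errors.append(f"identifier cannot start with a number: {lexeme!r} at position {start}")
            else:
                tokens.append(("IDENTIFIER", lexeme))
        elif char == '"':
            # Read until the closing quote or the end of the line.
            start = cursor
            cursor += 1
            while cursor < len(program) and program[cursor] not in '"\n':
                cursor += 1
            if cursor < len(program) and program[cursor] == '"':
                cursor += 1
                tokens.append(("STRING", program[start:cursor]))
            else:
                errors.append(f"unclosed string starting at position {start}")
        elif char in SEPARATORS:
            tokens.append(("SEPARATOR", char))
            cursor += 1
        elif char in OPERATORS:
            tokens.append(("OPERATOR", char))
            cursor += 1
        else:
            errors.append(f"invalid character {char!r} at position {cursor}")
            cursor += 1
    return tokens, errors
```

For example, `scan('string 9x = "abc;\n@')` yields two tokens and three errors, one for each error category.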
The code generator converts ScriptLite, a custom scripting language, into executable Bash shell scripts. It allows users to write high-level scripts using a predefined grammar, which the program parses, processes, and translates into Bash commands. The tool supports various functionalities, including file management, string and list operations, function definitions, and flow control constructs (e.g., foreach loops).
- Tokenization and Parsing: Uses a Scanner to tokenize input code and a Parser to construct an Abstract Syntax Tree (AST) from the tokens.
- Shell Script Generation: Translates the AST into a valid Bash shell script, preserving the logic and flow of the input program.
- Error Handling
  - Lexical Errors: The Scanner identifies invalid tokens and displays errors with relevant details.
  - Syntax Errors: The Parser reports invalid syntax and halts execution if the input program does not conform to the grammar.
  - Runtime Errors: During script execution, any errors (e.g., invalid directory paths or missing files) are caught and displayed, ensuring robust handling of unexpected issues.
- Supported Operations
  - File operations: creating directories, moving, copying, and renaming files.
  - Variable declarations: supports strings and lists.
  - Function definitions and calls with parameter handling.
  - Flow control: supports foreach loops for iterating over lists.
- Automatic Execution: The generated shell script is automatically executed after creation.
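As a rough illustration of the AST-to-Bash translation step, the sketch below maps three statement kinds to shell commands. Everything here is an assumption made for the example: the node dictionaries, the function names `generate` and `generate_script`, and the choice of `mkdir -p`, `ls`, and `cp` are not confirmed by the project's generator.

```python
import shlex


def generate(node):
    """Translate one hypothetical AST node (a dict) into a Bash command line."""
    kind = node["type"]
    if kind == "create_directory":
        return f"mkdir -p {shlex.quote(node['path'])}"
    if kind == "display_files":
        return f"ls {shlex.quote(node['path'])}"
    if kind == "copy_files":
        # The glob stays outside the quoted path so the shell can expand it.
        pattern = "*" + node.get("ends_with", "")
        return f"cp {shlex.quote(node['source'])}/{pattern} {shlex.quote(node['dest'])}"
    raise ValueError(f"unsupported node type: {kind}")


def generate_script(nodes):
    """Join the generated commands into a complete script with a shebang line."""
    lines = ["#!/bin/bash"] + [generate(node) for node in nodes]
    return "\n".join(lines) + "\n"
```

Quoting paths with `shlex.quote` keeps directory names containing spaces from being split into several shell words.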
To install Python on macOS using Homebrew:
- Install Homebrew (if not already installed):
Open Terminal and run:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Python
brew install python
To install Python on a Debian-based Linux distribution (using apt):
sudo apt update
sudo apt install python3 python3-pip -y
After installation, verify that Python was installed correctly by running:
python3 --version
./make_test_dirs.sh
To make the scripts executable, run the following commands:
Lexer:
chmod +x run_lexer.sh
Parser:
chmod +x ast_parser.sh
Code Generator:
chmod +x run_generator.sh
The following sections describe each sample program and give the commands to run it. The expected scanner output for each program is given in the expected_output.txt file.
This ScriptLite program automates the process of appending a copyright notice to all files within multiple directories. Below is a summary of its functionality:
- Directories List: The program defines a list of directories (/user/Docs, /user/Projects, /user/Reports) where the copyright notice will be appended.
- Copyright Notice: It also defines a copyright_notice string that contains the text © 2024 Your Company. All rights reserved. This notice will be added to each file in the specified directories.
Function Definitions:
- append_copyright_to_file: Appends the copyright_notice string to a single file located at the file_path.
- append_copyright_to_directory: Retrieves all files in a specified directory and iterates through each file, appending the copyright notice by calling append_copyright_to_file.
- append_copyright_to_multiple_directories: Loops through each directory in the directories list and calls append_copyright_to_directory to append the copyright notice to all files in those directories.
Execution:
The program calls the append_copyright_to_multiple_directories function, which processes all the files across the listed directories and appends the copyright notice to them.
./run_lexer.sh add_copyright_to_directories.txt
./run_parser.sh adding_copyright_to_directories.txt
./run_generator.sh adding_copyright_to_directories.txt
This script performs basic file management tasks using the ScriptLite language. It demonstrates the following operations:
- Creating a Directory: The program creates a new directory named tasks.
- Displaying Files: It displays the contents of the tasks directory.
- Defining a Filename: The filename todo_list.txt is assigned to the variable file_name.
- Creating a New File: A new file named todo_list.txt is created in the tasks directory.
- Adding Content to the File: The program adds the content "finish programming assignment" to the todo_list.txt file.
./run_lexer.sh adding_newfile\(errors_included\).txt
./run_parser.sh adding_newfile.txt
./run_generator.sh adding_newfile.txt
This ScriptLite program is designed to facilitate basic file management tasks, specifically creating directories and managing file copies. The script performs the following operations:
- Create a Backup Directory: It initializes a string variable dest_dir with the path /home/backup and creates a directory at this location.
- Display Backup Directory Contents: The program then lists all files present in the /home/backup directory to show the current contents.
- Set Source Directory: It defines another string variable src_dir to point to the directory /home/usr.
- Copy Log Files: The script copies all files ending with the .log extension from the src_dir to the dest_dir.
- Show Updated Backup Directory Contents: Finally, it displays the contents of the dest_dir again to reflect any newly copied files.
./run_lexer.sh backup_log_files.txt
./run_parser.sh backup_log_files\(error\).txt
./run_generator.sh backup_log_files_error.txt
This ScriptLite program is designed to rename files in specified directories by adding a prefix to their original names. It starts by defining two directory paths, directory1 and directory2, which point to locations on a user's computer. It then creates a list named directories that includes these two directories.
The core functionality is encapsulated in the function rename_files_in_dirs, which takes a list of directories and a string prefix as parameters. Inside this function, the command bulk_rename_files is called to rename all files in the specified directories by adding the given prefix to the front of each file name. Finally, the program invokes the rename_files_in_dirs function, passing in the directories list and the desired prefix.
./run_lexer.sh bulk_rename_files\(errors_included\).txt
./run_parser.sh bulk_rename_files\(error\).txt
./run_generator.sh bulk_rename_files.txt
Note that for the last bulk_rename_files.txt, we have added a dead function "not_used" to the code to demonstrate the dead code elimination functionality of the generator.
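One way dead-function elimination could work is a reachability pass over the AST before code generation. The sketch below is an assumption about the approach, not the generator's confirmed implementation: the node dictionaries and the names `calls_in` and `eliminate_dead_functions` are hypothetical.

```python
def calls_in(statements):
    """Yield the names of functions called anywhere inside a list of statements."""
    for statement in statements:
        if statement["type"] == "call":
            yield statement["name"]
        # Recurse into nested blocks such as foreach bodies.
        yield from calls_in(statement.get("body", []))


def eliminate_dead_functions(program):
    """Drop function definitions that are never reachable from top-level code."""
    functions = {node["name"]: node for node in program if node["type"] == "define"}
    top_level = [node for node in program if node["type"] != "define"]
    reachable = set()
    worklist = list(calls_in(top_level))
    while worklist:
        name = worklist.pop()
        if name in reachable or name not in functions:
            continue
        reachable.add(name)
        # Functions called from a live function are live as well.
        worklist.extend(calls_in(functions[name]["body"]))
    return [node for node in program
            if node["type"] != "define" or node["name"] in reachable]
```

With a program that defines rename_files_in_dirs and not_used but only calls the former, the returned list no longer contains the not_used definition.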
This ScriptLite program is designed to organize files in a specified source directory by categorizing them into separate folders based on their file types. The program defines four main functions to handle different file formats:
- organize_jpg_files: Creates a directory for JPEG files and moves all .jpg files from the source directory to this new directory.
- organize_pdf_files: Creates a directory for PDF files and moves all .pdf files from the source directory to this directory.
- organize_docx_files: Creates a directory for DOCX files and moves all .docx files from the source directory to this directory.
- organize_all_files: Calls the previous three functions to ensure that all specified file types are organized into their respective folders.
The program concludes by calling the organize_all_files function, passing the source and target directories as arguments. This automates the file organization process, making it easier for users to manage and find their files.
./run_lexer.sh organize_files_by_extension.txt
./run_parser.sh organize_files_by_extension\(error\).txt
./run_generator.sh organize_files_by_extension.txt