A collection of Node.js command-line utilities for converting data files between a Software AG-style binary format (SAG) and plain text.
These utilities are designed to process data where records in the SAG format are prefixed with a 2-byte little-endian integer specifying the length of the record that follows.
- sag2txt.js: Reads SAG formatted data from stdin and converts it to plain text on stdout, with each record printed as a new line.
- txt2sag.js: Reads plain text from stdin (line by line) and converts it to SAG format on stdout, prefixing each line's content with its length.
The "SAG" format handled by these tools consists of sequential records. Each record has two parts:
- A 2-byte header: This is a 16-bit unsigned integer, stored in little-endian byte order, representing the length (in bytes) of the data record that immediately follows.
- The data record: This is a sequence of bytes of the length specified in the header.
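The record layout described above can be illustrated with a minimal sketch. The function names `encodeRecord` and `decodeRecords` are hypothetical and are not taken from the scripts themselves, and UTF-8 is assumed as the text encoding, which the format description does not specify. Note that a 16-bit header caps a single record at 65535 bytes.

```javascript
// Hypothetical helpers illustrating the record layout; not the scripts' own code.

// Encode one record: 2-byte little-endian length header, then the payload bytes.
function encodeRecord(text) {
  const payload = Buffer.from(text, 'utf8'); // assumption: UTF-8 text
  if (payload.length > 0xffff) {
    throw new RangeError('record exceeds 65535 bytes, the most a 16-bit header can describe');
  }
  const header = Buffer.alloc(2);
  header.writeUInt16LE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Decode a buffer holding zero or more complete records into an array of strings.
function decodeRecords(buffer) {
  const records = [];
  let offset = 0;
  while (offset + 2 <= buffer.length) {
    const length = buffer.readUInt16LE(offset);
    const start = offset + 2;
    if (start + length > buffer.length) {
      throw new Error(`truncated record at offset ${offset}`);
    }
    records.push(buffer.toString('utf8', start, start + length));
    offset = start + length;
  }
  if (offset !== buffer.length) {
    throw new Error('dangling header byte at end of input');
  }
  return records;
}

const encoded = Buffer.concat([encodeRecord('hello'), encodeRecord('world')]);
console.log(encoded.toString('hex'));
console.log(decodeRecords(encoded));
```

The length counts bytes rather than characters, so a line containing multi-byte characters gets a header larger than its character count.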
- Node.js (version 10.x or higher recommended, due to the use of for await...of in txt2sag.js and modern stream APIs)
Both scripts read from standard input and write to standard output.
sag2txt.js converts SAG formatted data to plain text.
Syntax:
node sag2txt.js < input.sag > output.txt
Example:
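If input.sag contains a record representing "hello" (length 5) followed by a record representing "world" (length 5):
0500hello0500world
(the 2-byte length headers are shown in hex, 05 00 being 5 in little-endian order; the payloads are shown as text)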
Running the command:
node sag2txt.js < input.sag
Would produce:
hello
world
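To try this example, a sample input file could be generated with a short script along these lines. This is only a sketch: the temporary file location is an arbitrary choice for illustration, and ASCII payloads are assumed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// 05 00 is the length 5 in little-endian order; each header is followed by the word's bytes.
const sample = Buffer.concat([
  Buffer.from('0500', 'hex'), Buffer.from('hello', 'ascii'),
  Buffer.from('0500', 'hex'), Buffer.from('world', 'ascii'),
]);

// Hypothetical location; any writable path works.
const samplePath = path.join(os.tmpdir(), 'input.sag');
fs.writeFileSync(samplePath, sample);

console.log(`wrote ${sample.length} bytes to ${samplePath}`);
console.log(sample.toString('hex')); // 050068656c6c6f0500776f726c64
```

The resulting file can then be fed to sag2txt.js on stdin as shown in the syntax line.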
txt2sag.js converts plain text (line by line) to SAG formatted data.
Syntax:
node txt2sag.js < input.txt > output.sag
Example:
If input.txt contains:
hello
world
Running the command:
node txt2sag.js < input.txt > output.sag
Would produce a binary file output.sag whose content is equivalent to 0500hello0500world (length headers shown in hex, payloads as text).
Automated tests are highly recommended to ensure the utilities function correctly and to prevent regressions if changes are made in the future.
An automated test script, run_tests.sh, is provided to simplify this process. The script performs the following steps:
- Uses the test1.txt file as the initial input.
- Converts test1.txt to a temporary SAG file (test1_generated.sag) using node txt2sag.js.
- Converts test1_generated.sag back to a temporary text file (test1_output.txt) using node sag2txt.js.
- Compares test1_output.txt with the original test1.txt using diff -u.
- Reports whether the test passed or failed and cleans up temporary files.
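The same round trip could also be driven from Node.js instead of a shell script. The sketch below is not the contents of run_tests.sh; `runScript` and `roundTripMatches` are hypothetical names, and the runner is injectable so the comparison logic can be exercised without the real scripts. Because intermediate data stays in memory, no temporary files need cleaning up.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Run `node <script>` with `input` on stdin and return its stdout as a Buffer.
function runScript(script, input) {
  const result = spawnSync('node', [script], { input });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`${script} exited with status ${result.status}`);
  }
  return result.stdout;
}

// Text -> SAG -> text; true when the final output matches the input byte for byte.
function roundTripMatches(original, run = runScript) {
  const sag = run('txt2sag.js', original);
  const text = run('sag2txt.js', sag);
  return text.equals(original);
}

// Only attempt the real run when the fixture and both scripts are present.
const needed = ['test1.txt', 'txt2sag.js', 'sag2txt.js'];
if (needed.every((file) => fs.existsSync(file))) {
  const passed = roundTripMatches(fs.readFileSync('test1.txt'));
  console.log(passed ? 'PASS' : 'FAIL');
  if (!passed) process.exitCode = 1;
}
```

A byte-for-byte comparison is stricter than eyeballing diff output and also catches differences in trailing newlines.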
To run the tests, simply execute the script from the root of the repository:
./run_tests.sh
Make sure the script has execute permissions (chmod +x run_tests.sh). If diff produces no output and the script reports success, the files are identical, and the test passes.
This codebase has been refactored to incorporate several good software development practices:
- Modern JavaScript Features:
- Utilizes let and const for variable declarations, improving scope management and preventing accidental re-declarations compared to var.
- Employs async/await in txt2sag.js for cleaner asynchronous code with readline.
- Robust Error Handling:
- Implemented event handlers (.on('error', ...)) for input (stdin) and output (stdout) streams to catch and report issues.
- Used try...catch blocks for handling errors during file processing logic.
- Ensures the scripts exit with a non-zero status code on error.
- Code Clarity and Maintainability:
- Improved variable names for better readability (e.g., recordBuffer instead of buf).
- Refactored input reading in sag2txt.js to use Node.js standard stream events ('readable', 'end') instead of manual synchronous reads in a loop, making the I/O operations more idiomatic and non-blocking.
- Added detailed comments to explain the purpose of the scripts, the data format they handle, and complex logic sections.
- Safe Buffer Operations:
- Switched from Buffer.allocUnsafe to Buffer.alloc to prevent potential data leaks by ensuring buffers are zero-filled upon allocation.
- Comprehensive Documentation:
- The README provides a clear description of the utilities, the specific SAG format they work with, prerequisites, detailed usage instructions with examples, and suggestions for testing.
- Input/Output Conventions:
- Scripts consistently use standard input (stdin) for reading data and standard output (stdout) for writing results, adhering to common command-line utility patterns. This allows them to be easily used in pipelines.
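Several of the practices listed above (const/let, 'readable' and 'end' events, stream error handlers, zero-filled buffers) can be combined in one sketch of event-driven SAG decoding. This is an illustration under stated assumptions, not the actual source of sag2txt.js: `createRecordParser` and `convert` are hypothetical names, and UTF-8 output is assumed. The detail worth noticing is that a stream chunk may end in the middle of a header or a record, so unconsumed bytes are carried over to the next chunk.

```javascript
// Incremental parser: feed it chunks of any size; it calls onRecord for each complete record.
function createRecordParser(onRecord) {
  let pending = Buffer.alloc(0); // bytes received but not yet part of a complete record
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 2) {
      const length = pending.readUInt16LE(0);
      if (pending.length < 2 + length) break; // record incomplete, wait for more data
      onRecord(pending.subarray(2, 2 + length));
      pending = pending.subarray(2 + length);
    }
    return pending.length; // number of bytes still waiting for the rest of a record
  };
}

// Wire the parser to a readable input and a writable output using stream events.
function convert(input, output, onDone) {
  const push = createRecordParser((record) => {
    output.write(record.toString('utf8') + '\n'); // assumption: UTF-8 text
  });
  let leftover = 0;
  input.on('readable', () => {
    let chunk;
    while ((chunk = input.read()) !== null) leftover = push(chunk);
  });
  input.on('end', () => {
    onDone(leftover === 0 ? null : new Error(`${leftover} trailing bytes in input`));
  });
  input.on('error', onDone);
  output.on('error', onDone);
}

// Possible usage as a command-line filter:
// convert(process.stdin, process.stdout, (err) => {
//   if (err) {
//     console.error(err.message);
//     process.exitCode = 1; // non-zero exit status on error
//   }
// });
```

Keeping the parser separate from the stream wiring makes the chunk-boundary logic easy to test by feeding it deliberately awkward splits.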