This tool provides automated PDF/UA compliance processing using the PDFix SDK. It performs tagging, validation, and correction of PDF documents to meet PDF/UA or WCAG 2.2 accessibility standards.
- Automatic PDF tagging - Adds proper structure and tags for accessibility
- PDF/UA validation - Checks compliance with PDF/UA standards
- Automated fixes - Corrects common compliance issues
- Validation reporting - Provides detailed XML reports of compliance issues
- Python 3.6+
- PDFix SDK (properly installed and licensed)
- Java Runtime Environment (for the validation tool)
- veraPDF validation tool (greenfield-apps-1.27.0-SNAPSHOT.jar)
- Clone this repository and initialize a Python virtual environment.
  Linux / macOS:
      python3 -m venv .venv
      source .venv/bin/activate
  Windows:
      python -m venv .venv
      .venv\Scripts\activate
- Install dependencies:
      pip install pdfix-sdk
  or
      pip install -r requirements
- Ensure Java is installed and in your PATH.
- Place the validation JAR file in the expected location:
      {project_root}/validation/greenfield-apps-1.27.0-SNAPSHOT.jar
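Before the first run, it can help to confirm that the last two installation steps took effect. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper that only assumes the JAR location given above.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed layout, taken from the installation step: <project_root>/validation/<jar>
JAR_NAME = "greenfield-apps-1.27.0-SNAPSHOT.jar"


def check_prerequisites(project_root: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # shutil.which returns None when no `java` executable is found on PATH.
    if shutil.which("java") is None:
        problems.append("java was not found on PATH")
    jar_path = project_root / "validation" / JAR_NAME
    if not jar_path.is_file():
        problems.append(f"validation JAR is missing: {jar_path}")
    return problems


if __name__ == "__main__":
    # Run from the project root; prints nothing when everything is in place.
    for problem in check_prerequisites(Path.cwd()):
        print("WARNING:", problem)
```

The check only verifies that the files and executables exist; it does not validate the PDFix SDK license.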
python main.py
- Opens the input PDF (pdf/example.pdf)
- Automatically tags the document
- Performs initial validation (saves to pdf/validate.pdf)
- Applies necessary fixes based on validation results
- Performs final validation (saves to pdf/tagged.pdf)
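The control flow of these steps can be sketched as follows. This is an illustration only: `run_pipeline` and its callable parameters are hypothetical stand-ins, and the real script works on a PDFix `PdfDoc` rather than on injected functions.

```python
from typing import Callable, List


def run_pipeline(
    autotag: Callable[[], None],
    validate: Callable[[str], List[str]],
    fix: Callable[[List[str]], None],
    validate_path: str = "pdf/validate.pdf",
    tagged_path: str = "pdf/tagged.pdf",
) -> List[str]:
    """Tag, validate, fix, then validate again; return remaining violations."""
    autotag()
    failed = validate(validate_path)  # initial validation
    if failed:
        fix(failed)  # only the clauses reported as failed are passed on
    return validate(tagged_path)  # final validation


if __name__ == "__main__":
    # Stand-ins for the real PDFix-backed functions, for illustration only.
    outstanding = {"5", "7.1"}
    remaining = run_pipeline(
        autotag=lambda: None,
        validate=lambda path: sorted(outstanding),
        fix=lambda clauses: outstanding.difference_update(clauses),
    )
    print("remaining violations:", remaining)
```

Passing the steps in as callables keeps the sketch runnable without the SDK or Java installed.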
Modify these paths in main.py as needed:
inputPath = "pdf/example.pdf"
validatePath = "pdf/validate.pdf"
taggedPath = "pdf/tagged.pdf"
| File | Description |
|---|---|
| main.py | Main execution script |
| pdf.py | Contains PDF processing functions (autotagPdf, fixUaClauses) |
| validation.py | Handles PDF/UA validation and report parsing |
- autotagPdf(doc: PdfDoc) - Adds accessibility tags to the PDF
- fixUaClauses(doc: PdfDoc, rules: list) - Applies fixes for specific PDF/UA clauses
- validatePdf(doc: PdfDoc, pdfPath: str) - Validates the PDF and returns rule violations
- runJavaValidation(pdfPath) - Executes the Java validation tool
- parseValidationReport(xmlReport: str) - Parses XML validation results
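To illustrate the report-parsing step, here is a rough sketch of how failed clauses might be pulled out of an XML report. It is not the code in validation.py: `parse_failed_clauses` is a hypothetical name, and the sample report is a trimmed-down assumption modeled on veraPDF's machine-readable output, where each `rule` element carries `clause` and `status` attributes.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed, heavily shortened report shape; a real report has more fields.
SAMPLE_REPORT = """<report>
  <jobs>
    <job>
      <validationReport isCompliant="false">
        <details passedRules="1" failedRules="2">
          <rule specification="ISO 14289-1:2014" clause="5" testNumber="1" status="failed"/>
          <rule specification="ISO 14289-1:2014" clause="7.1" testNumber="9" status="failed"/>
          <rule specification="ISO 14289-1:2014" clause="7.2" testNumber="2" status="passed"/>
        </details>
      </validationReport>
    </job>
  </jobs>
</report>"""


def parse_failed_clauses(xml_report: str) -> List[str]:
    """Return the distinct clauses of all failed rules, in document order."""
    root = ET.fromstring(xml_report)
    failed = []
    for rule in root.iter("rule"):
        if rule.get("status") == "failed":
            clause = rule.get("clause")
            if clause and clause not in failed:
                failed.append(clause)
    return failed


if __name__ == "__main__":
    print(parse_failed_clauses(SAMPLE_REPORT))
```

A list of clause strings like this is the kind of value that could be handed to fixUaClauses as its rules argument, assuming that function accepts clause identifiers.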
The tool currently handles fixes for these PDF/UA requirements:
- Clause 5 (PDF/UA identification)
- Clause 7.1 (Document title display)
- Clause 7.2 (Language specification)
For more clauses and accessibility actions, visit https://pdfix.net/products/pdfix-sdk/actions/
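One way to organize clause-specific fixes is a dispatch table that maps each clause to a handler. The sketch below is an assumption about structure, not the actual fixUaClauses implementation: the handlers are hypothetical, and a plain dict stands in for the document so the example runs without the PDFix SDK.

```python
from typing import Callable, Dict, List


def fix_identification(doc: dict) -> None:
    # Clause 5: record the PDF/UA identifier (pdfuaid:part in XMP metadata).
    doc["pdfuaid_part"] = 1


def fix_title_display(doc: dict) -> None:
    # Clause 7.1: ask viewers to display the document title.
    doc["display_doc_title"] = True


def fix_language(doc: dict) -> None:
    # Clause 7.2: set a document language if none is present.
    # "en-US" is an illustrative default, not a value from the tool.
    doc.setdefault("lang", "en-US")


FIXES: Dict[str, Callable[[dict], None]] = {
    "5": fix_identification,
    "7.1": fix_title_display,
    "7.2": fix_language,
}


def apply_fixes(doc: dict, failed_clauses: List[str]) -> List[str]:
    """Apply known fixes and return the clauses that have no handler."""
    unhandled = []
    for clause in failed_clauses:
        handler = FIXES.get(clause)
        if handler is None:
            unhandled.append(clause)
        else:
            handler(doc)
    return unhandled


if __name__ == "__main__":
    document = {}
    print(apply_fixes(document, ["5", "7.1", "7.18.1"]), document)
```

Returning the unhandled clauses makes it visible which violations still need manual attention or an additional handler.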
The script will:
- Exit with code 1 on critical errors
- Print detailed error messages to stderr
- Preserve validation reports for troubleshooting
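The exit-code and stderr behavior described above follows a common pattern, sketched here under the assumption that the whole workflow is wrapped in a single callable. `run` is a hypothetical wrapper, not a function from this repository.

```python
import sys
from typing import Callable


def run(task: Callable[[], None]) -> int:
    """Run the task and translate any exception into exit code 1."""
    try:
        task()
    except Exception as exc:
        # Error details go to stderr so stdout stays clean for reports.
        print(f"ERROR: {exc}", file=sys.stderr)
        return 1
    return 0


# Typical wiring in a script entry point (assumed, shown as a comment so
# that importing this sketch has no side effects):
#     sys.exit(run(main))
```

Because the wrapper never deletes anything, reports written before the failure remain on disk for troubleshooting.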
- Processed PDF files in the pdf/ directory
- Validation reports printed to console
- Final compliance status message