Skip to content

wteja/pdf-converter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Office to PDF Converter

This is a simple HTTP server application written in Go that converts uploaded office files into PDF files using LibreOffice in headless mode. The server listens on port 5000 and exposes a single endpoint for file conversion. It also includes automatic cleanup of temporary files older than one hour.

Features

  • Converts various office files to PDF format using LibreOffice.
  • Handles file uploads via HTTP POST requests.
  • Automatic cleanup of old files from the temporary directory after one hour.
  • Minimal and efficient implementation using Go.

Requirements

  • Go: Ensure Go is installed on your system (Download Go).
  • LibreOffice: LibreOffice must be installed and accessible via the soffice command.

Supported File Formats

This application relies on LibreOffice for file conversion, so it supports any file format LibreOffice can handle. Below are the formats that can be converted to PDF:

Document Formats

  • .doc, .docx (Microsoft Word)
  • .odt, .ott (LibreOffice/OpenDocument Text)
  • .rtf (Rich Text Format)
  • .txt (Plain Text)

Spreadsheet Formats

  • .xls, .xlsx (Microsoft Excel)
  • .ods, .ots (LibreOffice/OpenDocument Spreadsheet)
  • .csv (Comma-Separated Values)

Presentation Formats

  • .ppt, .pptx (Microsoft PowerPoint)
  • .odp, .otp (LibreOffice/OpenDocument Presentation)

Other Formats

  • .svg (Scalable Vector Graphics)
  • .html, .htm (HTML Files)
  • .xml (XML Files)
  • .pdf (for PDF editing and re-exporting)

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd <repository_directory>
  2. Install dependencies: This application does not have external dependencies, but ensure you have LibreOffice installed.

  3. Build the application:

    go build -o pdf-converter .
  4. Run the application:

    ./pdf-converter

The server will start listening on http://localhost:5000.

API Usage

Endpoint: /convert

  • Method: POST
  • Field Name: file
  • File Type: Any supported format (e.g., .xlsx, .docx).

Request Example (Using curl):

curl -X POST -F "[email protected]" http://localhost:5000/convert --output output.pdf

Response:

  • Success: Returns the converted PDF file as a response with the Content-Type set to application/pdf.
  • Error: Returns an appropriate HTTP status code and error message if the conversion fails.

Public Docker Image

You can use the publicly available Docker image for this application:

docker pull wteja/pdf-converter

Run the container:

docker run -p 5000:5000 wteja/pdf-converter

Configuration

  • Temporary files are stored in the ./tmp directory. Ensure the application has write access to this directory.
  • The application automatically removes files older than one hour from the tmp directory.

Code Overview

Main Components

  1. File Upload and Conversion:

    • The /convert endpoint processes file uploads, saves them to the tmp directory, and invokes LibreOffice in headless mode to perform the conversion.
  2. Temporary Directory Management:

    • All uploaded and converted files are stored in the tmp directory.
    • A background goroutine periodically checks and deletes files older than one hour.
  3. Error Handling:

    • Comprehensive error handling for file uploads, conversions, and temporary file management.

Key Functions

  • handleConvert: Handles the HTTP requests, manages file upload and conversion, and returns the resulting PDF.
  • cleanupOldFiles: Periodically deletes old files from the tmp directory.

Future Improvements

  • Support additional file formats (e.g., .pptx, .odg).
  • Add configurable cleanup duration and temporary directory path.
  • Implement better logging and monitoring.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Developed by Weerayut Teja. Contributions and feedback are welcome!