Openize.MarkItDown for Python

Openize.MarkItDown for Python is a package that converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs (OpenAI, Claude, Gemini, and Mistral) for extended processing.

Features

Convert .docx, .pdf, .xlsx, and .pptx to Markdown.
Save Markdown files locally or send them to an LLM for processing (OpenAI, Claude, Gemini, Mistral).
Structured around the Factory and Strategy patterns for scalability.
Works with both Windows and Linux paths.
Command-line interface for easy use.
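
To illustrate the Factory and Strategy structure mentioned above, here is a minimal sketch. It is not the package's actual code: `ConverterStrategy`, `PlainTextConverter`, and `ConverterFactory` are hypothetical names, and the toy `.txt` strategy exists only so the example runs without Aspose.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Type


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one implementation per input format."""

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return Markdown text for the document at `path`."""


class PlainTextConverter(ConverterStrategy):
    """Toy strategy used only to keep this sketch runnable."""

    def convert(self, path: Path) -> str:
        return f"# {path.stem}\n\n{path.read_text(encoding='utf-8')}"


class ConverterFactory:
    """Hypothetical factory that maps file extensions to strategies."""

    _registry: Dict[str, Type[ConverterStrategy]] = {}

    @classmethod
    def register(cls, extension: str, strategy: Type[ConverterStrategy]) -> None:
        cls._registry[extension.lower()] = strategy

    @classmethod
    def create(cls, path: Path) -> ConverterStrategy:
        try:
            return cls._registry[path.suffix.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {path.suffix}") from None


# Supporting a new format means registering one more strategy class.
ConverterFactory.register(".txt", PlainTextConverter)
```

With this layout, callers only ever ask the factory for a converter, so adding a format does not touch the calling code.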

Requirements

This package depends on the Aspose libraries, which are commercial products.

You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.

LLM support requires valid API keys and, depending on the provider, the following dependencies:

openai for OpenAI
anthropic for Claude
requests for Gemini and Mistral REST APIs
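
Before enabling a provider, it can help to check that its client package is importable. The sketch below uses only the standard library. The provider-to-package mapping comes from the list above; the helper names `is_installed` and `missing_package` are illustrative, and the check assumes each package's import name matches its install name.

```python
import importlib.util
from typing import Optional

# Provider-to-package mapping taken from the dependency list above.
PROVIDER_PACKAGES = {
    "openai": "openai",
    "claude": "anthropic",
    "gemini": "requests",
    "mistral": "requests",
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_package(provider: str) -> Optional[str]:
    """Return the package to install for `provider`, or None if it is present."""
    package = PROVIDER_PACKAGES[provider]
    return None if is_installed(package) else package
```

For example, `missing_package("claude")` returns `"anthropic"` when that package is not installed, and `None` otherwise.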

Installation

From TestPyPI

pip install openize-markitdown-python

From Source

git clone https://github.com/openize-com/openize-markitdown-python.git
cd openize-markitdown-python\packages\markitdown
pip install -e . --verbose

Usage

Command Line Interface

You can use the markitdown CLI to convert a single document or a folder of documents into Markdown format. You can also optionally send the converted Markdown to a supported LLM for post-processing.

Convert a Single File

markitdown --input-file d:\test.docx -o d:\output_folder

Convert a Folder of Documents

markitdown --input-dir d:\docs -o d:\output_folder

Use an LLM to Post-process Output

These commands require you to set the appropriate environment variables or provide API keys when prompted:

markitdown --input-file d:\test.docx -o d:\output_folder --llm openai
markitdown --input-file d:\test.docx -o d:\output_folder --llm claude
markitdown --input-file d:\test.docx -o d:\output_folder --llm gemini
markitdown --input-file d:\test.docx -o d:\output_folder --llm mistral
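
When driving the CLI from a script, the commands above can be assembled programmatically. This is a sketch that uses only the flags shown in this README (`--input-file`, `--input-dir`, `-o`, `--llm`). The helper names are hypothetical, and the rule that exactly one input option is passed is an assumption of this helper, not documented CLI behavior.

```python
import subprocess
from typing import List, Optional


def build_markitdown_command(
    output_dir: str,
    input_file: Optional[str] = None,
    input_dir: Optional[str] = None,
    llm: Optional[str] = None,
) -> List[str]:
    """Assemble a markitdown argument list from the flags shown above."""
    # Assumption of this helper: a call targets either one file or one folder.
    if (input_file is None) == (input_dir is None):
        raise ValueError("Pass exactly one of input_file or input_dir")
    command = ["markitdown"]
    if input_file is not None:
        command += ["--input-file", input_file]
    else:
        command += ["--input-dir", input_dir]
    command += ["-o", output_dir]
    if llm is not None:
        command += ["--llm", llm]
    return command


def run_markitdown(command: List[str]) -> int:
    """Run the command and return its exit code (needs markitdown on PATH)."""
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain backslashes or spaces.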

Python API

from openize.markitdown.core import MarkItDown

# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"

# Create MarkItDown instance with desired LLM
converter = MarkItDown(output_dir, llm_client_name="mistral")

# Convert document and send output to LLM
converter.convert_document(input_file)

print("Conversion completed and data sent to LLM.")
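
The example above converts one file. To process a whole folder through the Python API, one option is a small loop around `convert_document`. In this sketch, `convert_folder` is a hypothetical helper rather than part of the package; it accepts any object with a `convert_document(path)` method, such as the `MarkItDown` instance created above.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions listed under Features.
SUPPORTED_EXTENSIONS = (".docx", ".pdf", ".xlsx", ".pptx")


def convert_folder(
    converter,
    input_dir: str,
    extensions: Iterable[str] = SUPPORTED_EXTENSIONS,
) -> List[Path]:
    """Call converter.convert_document() on each supported file in input_dir.

    Only the top level of the folder is scanned. Returns the processed files.
    """
    wanted = {ext.lower() for ext in extensions}
    processed = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            converter.convert_document(str(path))
            processed.append(path)
    return processed
```

Usage would look like `convert_folder(converter, "docs")`, reusing the `converter` from the example above.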

Environment Variables

| Variable | Description |
| --- | --- |
| `ASPOSE_LICENSE_PATH` | Path to Aspose license file (required if using paid features) |
| `OPENAI_API_KEY` | API key for OpenAI integration |
| `OPENAI_MODEL` | (Optional) Model name for OpenAI (default: `gpt-4`) |
| `CLAUDE_API_KEY` | API key for Claude integration |
| `CLAUDE_MODEL` | (Optional) Model name for Claude (default: `claude-v1`) |
| `GEMINI_API_KEY` | API key for Gemini integration |
| `GEMINI_MODEL` | (Optional) Model name for Gemini (default: `gemini-pro`) |
| `MISTRAL_API_KEY` | API key for Mistral integration |
| `MISTRAL_MODEL` | (Optional) Model name for Mistral (default: `mistral-medium`) |
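
The lookup implied by the table above, a required API key plus an optional model with a default, can be sketched as follows. The variable names and default models are copied from the table. The function `resolve_llm_settings` is a hypothetical helper and may not match how the package reads its configuration internally.

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names and default models copied from the table above.
LLM_ENV = {
    "openai": ("OPENAI_API_KEY", "OPENAI_MODEL", "gpt-4"),
    "claude": ("CLAUDE_API_KEY", "CLAUDE_MODEL", "claude-v1"),
    "gemini": ("GEMINI_API_KEY", "GEMINI_MODEL", "gemini-pro"),
    "mistral": ("MISTRAL_API_KEY", "MISTRAL_MODEL", "mistral-medium"),
}


def resolve_llm_settings(
    provider: str,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (api_key, model) for `provider`, raising if the key is unset."""
    env = os.environ if env is None else env
    key_var, model_var, default_model = LLM_ENV[provider]
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} is not set")
    # Fall back to the documented default when the model variable is absent.
    return api_key, env.get(model_var) or default_model
```

Accepting an explicit `env` mapping keeps the helper easy to test without modifying the real process environment.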

Setting Environment Variables

Unix-based systems:

export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"

Windows (PowerShell):

$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"

Contributing

We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:

Fork & Clone – Fork the repository and clone it to your local machine.
Create a Branch – Use a new branch for your contribution.
Sign the Contributor License Agreement (CLA) – Before your first contribution can be accepted, you must sign our CLA via CLA Assistant. You will be prompted to sign it when submitting your first pull request. You can also review the CLA here: https://cla.openize.com/agreement.
Submit a Pull Request (PR) – Once your changes are ready, open a PR with a clear description.
Review & Feedback – Our maintainers will review your PR and provide feedback if needed.

By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.

License

This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed-source.

⚠️ Users must obtain a valid license for Aspose libraries separately. This repository does not include or distribute any proprietary components.
