Openize.MarkItDown for Python is a package that converts documents into Markdown format. It supports multiple file formats, can save the output locally or pass it to an LLM, and integrates with OpenAI, Claude, Gemini, and Mistral for further processing.
- Convert `.docx`, `.pdf`, `.xlsx`, and `.pptx` to Markdown.
- Save Markdown files locally or send them to an LLM for processing (OpenAI, Claude, Gemini, Mistral).
- Structured with the Factory & Strategy Pattern for scalability.
- Works with both Windows and Linux paths.
- Command-line interface for easy use.
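
As a rough illustration of what "Factory & Strategy" usually means for a converter like this, here is a minimal sketch. Every class name in it (`Converter`, `DocxConverter`, `PdfConverter`, `ConverterFactory`) is hypothetical and chosen for this example only; it is not taken from the package's source, and the conversion bodies are placeholders.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Converter(ABC):
    """Strategy interface: one implementation per input format (hypothetical)."""

    @abstractmethod
    def to_markdown(self, path: Path) -> str:
        ...


class DocxConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body; a real converter would parse the file.
        return f"# {path.stem}\n\n(converted from Word)"


class PdfConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Placeholder body; a real converter would parse the file.
        return f"# {path.stem}\n\n(converted from PDF)"


class ConverterFactory:
    """Factory: maps a file extension to the matching strategy class."""

    _registry = {".docx": DocxConverter, ".pdf": PdfConverter}

    @classmethod
    def for_path(cls, path: Path) -> Converter:
        try:
            # Lower-case the suffix so "report.PDF" and "report.pdf" match.
            return cls._registry[path.suffix.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {path.suffix!r}") from None
```

With this layout, supporting a new format means adding one strategy class and one registry entry; callers keep using `ConverterFactory.for_path(path).to_markdown(path)` unchanged.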
This package depends on the Aspose libraries, which are commercial products.
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.
LLM support requires valid API keys and potentially the following dependencies:
- `openai` for OpenAI
- `anthropic` for Claude
- `requests` for Gemini and Mistral REST APIs
pip install openize-markitdown-python
git clone https://github.com/openize-com/openize-markitdown-python.git
cd openize-markitdown-python/packages/markitdown
pip install -e . --verbose
# Convert a file and save locally
markitdown document.docx -o output_folder
# Process with an LLM (requires corresponding API key)
markitdown document.docx -o output_folder --llm openai
markitdown document.docx -o output_folder --llm claude
markitdown document.docx -o output_folder --llm gemini
markitdown document.docx -o output_folder --llm mistral
from openize.markitdown.core import MarkItDown
# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"
# Create MarkItDown instance with desired LLM
converter = MarkItDown(output_dir, llm_client_name="mistral")
# Convert document and send output to LLM
converter.convert_document(input_file)
print("Conversion completed and data sent to LLM.")
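
To convert a whole folder rather than a single file, the same `convert_document` call can be wrapped in a small loop. The sketch below is one possible approach, not part of the package: `convert_folder` and `SUPPORTED` are hypothetical names, and it assumes `convert_document` raises an exception when a file cannot be converted. It accepts any object with a `convert_document(path)` method, so it does not import the package itself.

```python
from pathlib import Path

# Extensions this README lists as supported.
SUPPORTED = {".docx", ".pdf", ".xlsx", ".pptx"}


def convert_folder(converter, folder):
    """Call converter.convert_document() for each supported file in folder.

    Returns (converted, failed): a list of file names that succeeded and a
    dict mapping each failed file name to its error message.
    """
    converted, failed = [], {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and unsupported file types.
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        try:
            converter.convert_document(str(path))
            converted.append(path.name)
        except Exception as exc:
            # Record the error and continue so one bad file does not stop the batch.
            failed[path.name] = str(exc)
    return converted, failed


# Usage with the package (assumes openize-markitdown-python is installed):
#   from openize.markitdown.core import MarkItDown
#   converter = MarkItDown("output_markdown", llm_client_name="mistral")
#   done, errors = convert_folder(converter, "documents")
```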
| Variable | Description |
|---|---|
| `ASPOSE_LICENSE_PATH` | Path to Aspose license file (required if using paid features) |
| `OPENAI_API_KEY` | API key for OpenAI integration |
| `OPENAI_MODEL` | (Optional) Model name for OpenAI (default: `gpt-4`) |
| `CLAUDE_API_KEY` | API key for Claude integration |
| `CLAUDE_MODEL` | (Optional) Model name for Claude (default: `claude-v1`) |
| `GEMINI_API_KEY` | API key for Gemini integration |
| `GEMINI_MODEL` | (Optional) Model name for Gemini (default: `gemini-pro`) |
| `MISTRAL_API_KEY` | API key for Mistral integration |
| `MISTRAL_MODEL` | (Optional) Model name for Mistral (default: `mistral-medium`) |
Unix-based systems:
export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"
Windows (PowerShell):
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"
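
Before running a conversion with an LLM, it can help to confirm that the matching API key is actually set. The following is a minimal sketch using only the standard library; `API_KEY_VARS` and `missing_variables` are hypothetical helper names, and the mapping simply pairs the `--llm` values shown earlier with the variable names documented above.

```python
import os

# Maps each LLM name used in this README to its documented API key variable.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "claude": "CLAUDE_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def missing_variables(llm_name=None, env=None):
    """Return the names of required variables that are unset or blank.

    llm_name: one of the keys in API_KEY_VARS, or None for local-only output.
    env: mapping to inspect; defaults to the process environment.
    """
    env = os.environ if env is None else env
    required = []
    if llm_name is not None:
        try:
            required.append(API_KEY_VARS[llm_name.lower()])
        except KeyError:
            raise ValueError(f"Unknown LLM name: {llm_name!r}") from None
    # Treat whitespace-only values as missing.
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_variables("mistral")` at the start of a script and exiting with a clear message when the list is non-empty avoids discovering a missing key only after the document has been converted.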
We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:
- Fork & Clone – Fork the repository and clone it to your local machine.
- Create a Branch – Use a new branch for your contribution.
- Sign the Contributor License Agreement (CLA) – Before your first contribution can be accepted, you must sign our CLA via CLA Assistant. You will be prompted to sign it when submitting your first pull request. You can also review the CLA here: https://cla.openize.com/agreement.
- Submit a Pull Request (PR) – Once your changes are ready, open a PR with a clear description.
- Review & Feedback – Our maintainers will review your PR and provide feedback if needed.
By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed-source.