Commit 961f910

lpozo and bzaczynski authored
Sample data and code for the article on MarkItDown (#707)
* Sample data and code for the article on MarkItDown
* Fake commit
* Revert
* Trigger CI after workflow removal
* TR updates, second round
* Remove unused imports
* Final QA
* Reformat code

---------

Co-authored-by: Bartosz Zaczyński <[email protected]>
1 parent a454058 commit 961f910

13 files changed: +92 -0 lines changed

python-markitdown/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Python MarkItDown: Convert Documents Into LLM-Ready Markdown
+
+This folder provides the code examples for the Real Python tutorial [Python MarkItDown: Convert Documents Into LLM-Ready Markdown](https://realpython.com/python-markitdown/).
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+from pathlib import Path
+
+from markitdown import MarkItDown
+
+
+def main(
+    input_dir,
+    output_dir="output",
+    target_formats=(".docx", ".xlsx", ".pdf"),
+):
+    input_path = Path(input_dir)
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    md = MarkItDown()
+
+    for file_path in input_path.rglob("*"):
+        if file_path.suffix in target_formats:
+            try:
+                result = md.convert(file_path)
+            except Exception as e:
+                print(f"✗ Error converting {file_path.name}: {e}")
+                continue
+
+            output_file = (
+                output_path / f"{file_path.stem}{file_path.suffix}.md"
+            )
+            output_file.write_text(result.markdown, encoding="utf-8")
+            print(f"✓ Converted {file_path.name} → {output_file.name}")
+
+
+if __name__ == "__main__":
+    main("data", "output")
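
The loop above names each output file `{stem}{suffix}.md`, so `report.docx` and `report.pdf` end up as separate Markdown files instead of overwriting each other. The following is a minimal sketch of that naming and filtering logic using only `pathlib`, without calling MarkItDown. The helper names `is_target` and `output_name` are hypothetical, and the case-insensitive comparison is an assumption that is not part of the commit: the committed script compares `file_path.suffix` as-is, so a file such as `REPORT.PDF` would be skipped there.

```python
from pathlib import Path

TARGET_FORMATS = (".docx", ".xlsx", ".pdf")


def is_target(file_path, target_formats=TARGET_FORMATS):
    """Return True if the extension is a target format, ignoring case.

    Assumption: lowercasing the suffix is an addition for illustration;
    the committed script does a case-sensitive comparison.
    """
    return file_path.suffix.lower() in target_formats


def output_name(file_path):
    """Build the Markdown file name the same way the script does."""
    return f"{file_path.stem}{file_path.suffix}.md"


for name in ("report.docx", "report.pdf", "REPORT.PDF", "notes.txt"):
    path = Path(name)
    if is_target(path):
        print(f"{path.name} -> {output_name(path)}")
    else:
        print(f"{path.name} skipped")
```

Because the script walks the input directory recursively with `rglob("*")` but writes into a single flat output directory, two files with the same name in different subfolders would still map to the same output name. Handling that case is left out of this sketch.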

python-markitdown/convert_files.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+from markitdown import MarkItDown
+
+md = MarkItDown()
+result = md.convert("./data/markdown_syntax.docx")
+print(result.markdown)
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+First Name,Last Name,Department,Position,Start Date
+Alice,Johnson,Marketing,Marketing Coordinator,1/15/2022
+Bob,Williams,Human Resources,HR Generalist,6/1/2021
+Carol,Davis,Engineering,Software Engineer,3/20/2023
+David,Brown,Sales,Sales Representative,9/10/2022
+Eve,Miller,Finance,Financial Analyst,11/5/2021
+Frank,Garcia,Customer Service,Customer Support Specialist,7/1/2023
+Grace,Rodriguez,Research & Development,Research Scientist,4/25/2022
+Henry,Martinez,Operations,Operations Manager,2/14/2021
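
To give a rough idea of what Markdown for tabular sample data like this can look like, here is a minimal sketch that renders CSV text as a Markdown pipe table using only the standard library. It is not MarkItDown's implementation, and the output MarkItDown produces for the same file may differ in detail. The function name `csv_to_markdown` is hypothetical.

```python
import csv
import io


def csv_to_markdown(csv_text):
    """Render CSV text as a Markdown pipe table (first row is the header).

    Simplification: cell values are inserted as-is, so a literal "|"
    inside a cell is not escaped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)


# The first rows of the sample data added in this commit.
sample = (
    "First Name,Last Name,Department,Position,Start Date\n"
    "Alice,Johnson,Marketing,Marketing Coordinator,1/15/2022\n"
    "Bob,Williams,Human Resources,HR Generalist,6/1/2021\n"
)
print(csv_to_markdown(sample))
```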
7.93 KB
Binary file not shown.
12.1 KB
Binary file not shown.
50.4 KB
Binary file not shown.

python-markitdown/data/pep8.docx

33.5 KB
Binary file not shown.
81 KB
397 KB
