added xlsx reader #11287

aantich · 2025-11-09T13:08:17Z

- Reads multiple sheets from a workbook
- Converts spreadsheet tables to pandoc tables
- Basic tests added
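A rough sketch of how sheets could map to pandoc output. It assumes one heading per sheet with the identifier `sheet-N` (the sample output later in the thread shows `Main {#sheet-1}`), followed by a table whose first sheet row is the header. The `Sheet` type, its fields, the heading level 1, and the names `sheetToBlocks`/`sheetsToPandoc` are hypothetical and are not taken from the PR.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import qualified Text.Pandoc.Builder as B
import Text.Pandoc.Definition (Block (..), Inline (..), Pandoc)

-- Hypothetical, simplified sheet representation: a sheet name, a header
-- row, and body rows of plain cell text. The PR's own types may differ.
data Sheet = Sheet
  { sheetName :: Text
  , sheetHeader :: [Text]
  , sheetRows :: [[Text]]
  }

-- One cell becomes a Plain block; B.text splits at whitespace, so spaces
-- end up as Space elements rather than inside a Str.
cellBlocks :: Text -> B.Blocks
cellBlocks = B.plain . B.text

-- Assumption: each sheet gets a level-1 heading with identifier "sheet-N"
-- (N counted from 1), followed by a simple table.
sheetToBlocks :: Int -> Sheet -> B.Blocks
sheetToBlocks n sheet =
  B.headerWith ("sheet-" <> T.pack (show n), [], []) 1 (B.text (sheetName sheet))
    <> B.simpleTable
         (map cellBlocks (sheetHeader sheet))
         (map (map cellBlocks) (sheetRows sheet))

-- Every sheet in the workbook contributes its own heading and table.
sheetsToPandoc :: [Sheet] -> Pandoc
sheetsToPandoc sheets = B.doc (mconcat (zipWith sheetToBlocks [1 ..] sheets))
```

Numbering sheets by position keeps the identifiers stable and unique even when two sheets share a display name.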

jgm · 2025-11-09T14:04:51Z

.gitignore

+/test-docs
+doc/pptx-reader-design-v2.md
+doc/pptx-reader-design.md
+doc/xlsx-reader-design.md


As with the other PR, let's leave off these .gitignore changes.

jgm · 2025-11-09T14:06:44Z

I think the test files may need to be included in extra-source-files in pandoc.cabal.

jgm · 2025-11-09T14:08:00Z

The new format should be added to MANUAL.txt; see the list under --readers.

jgm · 2025-11-09T14:12:48Z

One thing I noticed testing this locally: I got a table with hundreds of empty rows.
Generally when one starts a spreadsheet in Excel, it already comes with lots of rows, only some of which get used. It would probably make sense to strip all trailing empty rows before parsing.

aantich · 2025-11-09T14:20:03Z

We'll fix the trailing empty rows. For empty rows in the middle of the sheet, I believe it makes sense to keep them, since I may eventually want to reference exact cell coordinates. For example, a table like this:

is converted to markdown like this:

Main {#sheet-1}

Person Age Location

Anton Antich 23.0 Switzerland
James Bond 35.0 Moscow

Just a random cell

I believe this is the correct behavior.
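A minimal sketch of the rule described here (strip trailing empty rows, keep interior ones). It assumes a hypothetical row type in which a cell with no value is `Nothing`, and it treats whitespace-only cells as blank; `Row`, `isEmptyRow`, and `stripTrailingEmptyRows` are illustrative names, not the PR's code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List (dropWhileEnd)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical row representation: Nothing for a cell with no value,
-- Just t for a cell that has text (possibly only whitespace).
type Row = [Maybe Text]

-- A row counts as empty when every cell is missing or whitespace-only.
isEmptyRow :: Row -> Bool
isEmptyRow = all blank
  where
    blank Nothing = True
    blank (Just t) = T.null (T.strip t)

-- Drop empty rows only at the end of the sheet. Empty rows between data
-- rows are kept, so row positions still line up with sheet coordinates.
stripTrailingEmptyRows :: [Row] -> [Row]
stripTrailingEmptyRows = dropWhileEnd isEmptyRow
```

Because `dropWhileEnd` only trims from the end, an empty row like the one before "Just a random cell" in the example above survives.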

- MANUAL updated
- trailing empty rows removed

aantich · 2025-11-09T14:53:38Z

I believe we've addressed all comments. I didn't test the empty-row handling rigorously, but it works on a couple of quick files.

jgm · 2025-11-10T11:20:18Z

src/Text/Pandoc/Readers/Xlsx/Sheets.hs

+cellToInlines :: XlsxCell -> [Inline]
+cellToInlines cell =
+  let base = case cellValue cell of
+        TextValue t -> [Str t]


For text values, better to use B.toList (B.text t) where B is Text.Pandoc.Builder. This will convert the string into a list of Str and Space elements. Pandoc expects spaces to be represented as Space and not space characters inside a Str.
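A small illustration of the suggested change, using `Text.Pandoc.Builder` from pandoc-types. The `CellValue` type below is a hypothetical stand-in for the reader's cell values, and rendering numbers with `show` is an assumption (it happens to produce the `23.0` style seen in the sample output earlier).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import qualified Text.Pandoc.Builder as B
import Text.Pandoc.Definition (Inline (..))

-- Hypothetical subset of the reader's cell values; the PR's actual
-- constructors and numeric formatting may differ.
data CellValue
  = TextValue Text
  | NumberValue Double

-- Text cells go through B.text, which splits at whitespace and yields
-- Str and Space elements (or SoftBreak for line breaks) instead of a
-- single Str containing spaces. Numbers are rendered with show.
cellValueToInlines :: CellValue -> [Inline]
cellValueToInlines (TextValue t) = B.toList (B.text t)
cellValueToInlines (NumberValue d) = [Str (T.pack (show d))]
```

With this, a cell containing "Anton Antich" becomes two `Str` elements separated by a `Space`, which is what writers expect.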

jgm · 2025-11-10T11:23:05Z

test/Tests/Readers/Xlsx.hs

+nativeDiff :: FilePath -> Pandoc -> Pandoc -> IO (Maybe String)
+nativeDiff normPath expectedNative actualNative
+  | expectedNative == actualNative = return Nothing
+  | otherwise = Just <$> do
+      expected <- T.unpack <$> runIOorExplode (writeNative def expectedNative)
+      actual <- T.unpack <$> runIOorExplode (writeNative def actualNative)
+      let dash = replicate 72 '-'
+      let diff = getDiff (lines actual) (lines expected)
+      return $ '\n' : dash ++
+               "\n--- " ++ normPath ++
+               "\n+++ " ++ "test" ++ "\n" ++
+               showDiff (1,1) diff ++ dash


Since this seems to be duplicated from the docx reader tests, I wonder if it makes sense to import it from there, or put it in some common place, e.g. Test.Helpers?
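A sketch of what a shared, pure helper might look like if moved to a common test module. It assumes callers render both documents with `writeNative` first and pass in the resulting strings, and it uses `getDiff` from the Diff package (`Data.Algorithm.Diff`), presumably the same one the code above uses. The name `nativeTextDiff` and the simplified `-`/`+` line format are hypothetical; this is not a drop-in copy of the existing `showDiff` output.

```haskell
import Data.Algorithm.Diff

-- Compare two already-rendered native outputs line by line.
-- Returns Nothing when they match, otherwise a readable diff.
nativeTextDiff :: FilePath -> String -> String -> Maybe String
nativeTextDiff normPath expected actual
  | expected == actual = Nothing
  | otherwise =
      Just . unlines $
        ("--- " ++ normPath)
          : "+++ test"
          : map render (getDiff (lines actual) (lines expected))
  where
    -- First/Second/Both exist across Diff versions (0.3 and 0.4+).
    render (First l) = "- " ++ l -- line appears only in the actual output
    render (Second l) = "+ " ++ l -- line appears only in the expected output
    render (Both l _) = "  " ++ l -- line shared by both
```

Keeping the helper free of `IO` and of the `Pandoc` type makes it reusable by any reader's tests, with each test module doing its own rendering.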

added xlsx reader

1da07fd

jgm reviewed Nov 9, 2025

fixed gitignore

74eb75d

addressed PR comments:

df12a75

- MANUAL updated
- trailing empty rows removed

jgm reviewed Nov 10, 2025

2 participants