Skip to content

Conversation

zoltan-vespertool
Copy link

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

Worksheet::toArray now calls these methods:

     $maxCol = $this->getHighestDataColumn();
     $maxRow = $this->getHighestDataRow();

instead of getHighestColumn and getHighestRow. This is to make sure that rows/columns that only have some formatting are not included. Otherwise the export would contain 16384 columns if e.g. column width setting is applied.

@oleibman
Copy link
Collaborator

I assume you can figure out the phpcs failure on your own. Let me know if you need assistance for that.

The failing test is peculiar. The values on the spreadsheet are integers, but the test is comparing to floats. I don't see why your change should have affected anything, but the test is less than optimal anyhow. I would have no objection to your making the following changes:

        $data = $spreadsheet->getActiveSheet()->toArray(formatData: false);
        $ref = [1, null, 3, null, 5, null, 7, null, 9, null];

        for ($i = 0; $i < 10; ++$i) {
            self::assertSame($ref, \array_slice($data[$i], 0, 10, true));
        }

That being said, there has been a lot of discussion in various issues about get... vs getHighest... I need to study those to make sure that the current code does not use getHighest for a reason

@zoltan-vespertool zoltan-vespertool force-pushed the toarray-export-only-data-columns branch from 4058c8c to eb33a0b Compare September 12, 2025 08:11
@zoltan-vespertool
Copy link
Author

Fixed the formatting.

For the test: the test failing was actually related to my change. But it is not about the float/int conversion - in fact, toArray() returns all cells as string|null, so the actual values returned are '1', '3' etc. thus neither 1 as int nor 1.0 as float.

What happens there, is the rightmost column (containing '10') is not loaded because of the filter OddColumnReadFilter. Then all those cells (column J) become empty data-wise. On the other hand, the cells have formatting: libreoffice shows me those cells as "font=Calibri", while the rest of the cells as "font=-apple-system".

The other way to fix the test would be to use rangeToArray instead of toArray.

Worksheet::toArray now calls these methods:

         $maxCol = $this->getHighestDataColumn();
         $maxRow = $this->getHighestDataRow();

instead of getHighestColumn and getHighestRow. This is to make sure
that rows/columns that only have some formatting are not included.
Otherwise the export would contain 16384 columns if e.g. column width
setting is applied.
@zoltan-vespertool zoltan-vespertool force-pushed the toarray-export-only-data-columns branch from eb33a0b to 13e24b2 Compare September 13, 2025 06:35
@zoltan-vespertool
Copy link
Author

I was thinking about this a bit more, and fixed the test in a different way.

So the scope of this test is to see if read filters work. But given that the example XLSX file used in the test has cell formatting beyond data cells (the latter are the 10x10 matrix A1:J10), it was affected by the behavior of toArray() considering formatting and not only data. That was what it was working around with the array_slice() calls.

Thus the fix is to use rangeToArray() in this test, to make it independent of how toArray() works. It then allows for the removal of the array_slice() call. The string/float conversions don't matter in any way, as assertEq() would use == internally, causing PHP to use == for the array elements.

@oleibman
Copy link
Collaborator

Your work on this is exemplary. Unfortunately, as I mentioned was a possibility, I will not be able to merge this change in its present form. See #3982 (comment). The fact that your change broke an unrelated test confirms that it is indeed a breaking change.

Although I cannot merge this change, the idea behind it is worthwhile. I imagine that most users would want your behavior rather than how PhpSpreadsheet handles the situation now. But a breaking change is a breaking change. It is not something we take lightly, especially since PhpSpreadsheet already provides a means to get the result you want:

rangeToArray('A1:' . $sheet->getHighestDataColumn() . $sheet->getHighestDataRow);

However, I would not be averse to adding an extra parameter to toArray which tells it whether to use getHighestDataColumn/Row rather than getHighestColumn/Row. It would have to default to the current behavior, but we could change the default for the next breaking release. If you want to submit a PR which does that, I would be willing to move forward with it.

Thank you again for your work on this change. I am sorry that it has so far led to a disappointing outcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants