fix: Optimize RowType::hashKind #12999

Yuhta · 2025-04-10T17:44:57Z

Summary:
Avoid recursively rehashing `RowType`. This significantly reduces expression
compilation time for flatmap types commonly seen in ML workloads.

Differential Revision: D72803509

…ager_flush (facebookincubator#12975)

Summary: Add a fast path for streaming aggregation when input rows from the same group are located together. For certain functions, we can leverage this property to reduce the number of copy calls and create larger and fewer ranges for copy. This brings a 3x improvement for a specific query shape common in data loading for AI training.

We implement this optimization for `arbitrary` and `array_agg`. For `arbitrary`, if the input is clustered, we just keep a reference to the input vector and the selected index; when we extract values from the container, we group all copies from the same vector into one `copyRange` call so the cost is minimized. For `array_agg`, we do something similar: we only track the range (offset and size) from which the input will be taken for each group, and do the copy in bulk when we extract the value.

There is another optimization to flush the streaming aggregation output whenever a result is available, via a new query config `streaming_aggregation_eager_flush`. This allows us to minimize the memory used by accumulators.

Differential Revision: D72677410

facebook-github-bot · 2025-04-10T17:45:05Z

This pull request was exported from Phabricator. Differential Revision: D72803509

netlify · 2025-04-10T17:45:17Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`41d9504`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/67f8039bd577dc000875eb01

mbasmanova

@Yuhta Are we caching hash for ROW type? Any reason not to compute it in the ctor?

Yuhta · 2025-04-10T22:02:21Z

@mbasmanova Not sure how often this is used. Seems only used for expression deduplication so we don't need to pay for this at runtime.
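For readers following the constructor-versus-lazy question, here is a minimal sketch of lazy hash caching on a nested type. It is an illustration under stated assumptions, not the code in this PR: `ToyRowType`, `ToyTypePtr`, `makeNested`, and `computeCount` are hypothetical names, the hash mixing is arbitrary, and the cache is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

class ToyRowType;
using ToyTypePtr = std::shared_ptr<const ToyRowType>;

// Hypothetical stand-in for a nested row-like type; this is NOT Velox's RowType.
class ToyRowType {
 public:
  explicit ToyRowType(std::vector<ToyTypePtr> children)
      : children_(std::move(children)) {}

  // Computes the structural hash on first use and reuses it afterwards.
  // Assumption: single-threaded use; a real implementation would need
  // synchronization or an atomic for the cached value.
  size_t hashKind() const {
    if (!cachedHash_.has_value()) {
      ++computeCount;
      size_t hash = std::hash<size_t>{}(children_.size());
      for (const auto& child : children_) {
        // Each child walks its own subtree at most once.
        hash = hash * 31 + child->hashKind();
      }
      cachedHash_ = hash;
    }
    return *cachedHash_;
  }

  // Counts how many nodes actually walked their children (illustration only).
  static inline size_t computeCount = 0;

 private:
  std::vector<ToyTypePtr> children_;
  mutable std::optional<size_t> cachedHash_;
};

// Builds a type in which every level refers to the level below it twice,
// so an uncached recursive hash would visit 2^depth leaves.
ToyTypePtr makeNested(int depth) {
  ToyTypePtr node = std::make_shared<ToyRowType>(std::vector<ToyTypePtr>{});
  for (int i = 0; i < depth; ++i) {
    node = std::make_shared<ToyRowType>(std::vector<ToyTypePtr>{node, node});
  }
  return node;
}
```

In this sketch, calling `makeNested(20)->hashKind()` walks each of the 21 distinct nodes once, and later calls return the cached value. Because the work is deferred to the first call, types that are never hashed pay nothing, which matches the reasoning above about the hash being needed only for expression deduplication.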

facebook-github-bot · 2025-04-11T16:05:10Z

This pull request has been merged in 6b5a5f1.

prestodb-ci · 2025-04-11T16:05:24Z

Rebase triggered for oap-project/velox.

Summary: Pull Request resolved: facebookincubator#12999 Avoid recursively rehashing `RowType`. This significantly reduces expression compilation time for flatmap types commonly seen in ML workloads. Reviewed By: mbasmanova Differential Revision: D72803509 fbshipit-source-id: 95f6f0db615b526ac0d349a6a4a3f2155f00e2eb

Yuhta added 2 commits April 10, 2025 10:44

fix: Optimize RowType::hashKind

41d9504

Summary: Avoid recursively rehashing `RowType`. This significantly reduces expression compilation time for flatmap types commonly seen in ML workloads. Differential Revision: D72803509

facebook-github-bot added the CLA Signed label Apr 10, 2025

facebook-github-bot added the fb-exported label Apr 10, 2025

mbasmanova approved these changes Apr 10, 2025

facebook-github-bot closed this in 6b5a5f1 Apr 11, 2025

facebook-github-bot added the Merged label Apr 11, 2025

prestodb-ci mentioned this pull request Apr 11, 2025

Rebase branch velox_pr_rebase (88c1a99) with oss-main (6b5a5f1) oap-project/velox#517

Closed

1 task
