feat: Add Aggregate::addRawClusteredInput and streaming_aggregation_eager_flush #12975
This pull request was exported from Phabricator. Differential Revision: D72677410
@Yuhta Jimmy, thank you for the optimization. Would you update the PR description to share some findings about why this is a useful optimization? 3x improvements for a specific query shape common in data loading for AI training.
Thanks. Added two nits.
…ager_flush (facebookincubator#12975) Summary: Add fast path for streaming aggregation where we have input rows from same group located together. For certain functions, we can leverage this property to reduce the number of copy calls and create larger and fewer ranges for copy. This brings 3x improvements for a specific query shape common in data loading for AI training. We implement this optimization for `arbitrary` and `array_agg`. For `arbitrary`, if the input is clustered, we just keep a reference to the input vector and index that is selected; when we extract values from the container, we group all copies from same vector to one `copyRange` call so the cost is minimized. For `array_agg`, we do similar thing, only track the range (offset and size) where the input will be taken for each group, and do the copy in bulk when we extract value. There is another optimization to flush the streaming aggregation output whenever there is result available, via a new query config `streaming_aggregation_eager_flush`. This allows us to minimize the memory used by accumulators. Differential Revision: D72677410
Thank you, Jimmy.
…ager_flush (facebookincubator#12975) Summary: Add fast path for streaming aggregation where we have input rows from same group located together. For certain functions, we can leverage this property to reduce the number of copy calls and create larger and fewer ranges for copy. This brings 3x improvements for a specific query shape common in data loading for AI training. We implement this optimization for `arbitrary` and `array_agg`. For `arbitrary`, if the input is clustered, we just keep a reference to the input vector and index that is selected; when we extract values from the container, we group all copies from same vector to one `copyRange` call so the cost is minimized. For `array_agg`, we do similar thing, only track the range (offset and size) where the input will be taken for each group, and do the copy in bulk when we extract value. There is another optimization to flush the streaming aggregation output whenever there is result available, via a new query config `streaming_aggregation_eager_flush`. This allows us to minimize the memory used by accumulators. Reviewed By: mbasmanova Differential Revision: D72677410
This pull request has been merged in afb236a.
Rebase triggered for oap-project/velox.
…ager_flush (facebookincubator#12975) Summary: Pull Request resolved: facebookincubator#12975 Add fast path for streaming aggregation where we have input rows from same group located together. For certain functions, we can leverage this property to reduce the number of copy calls and create larger and fewer ranges for copy. This brings 3x improvements for a specific query shape common in data loading for AI training. We implement this optimization for `arbitrary` and `array_agg`. For `arbitrary`, if the input is clustered, we just keep a reference to the input vector and index that is selected; when we extract values from the container, we group all copies from same vector to one `copyRange` call so the cost is minimized. For `array_agg`, we do similar thing, only track the range (offset and size) where the input will be taken for each group, and do the copy in bulk when we extract value. There is another optimization to flush the streaming aggregation output whenever there is result available, via a new query config `streaming_aggregation_eager_flush`. This allows us to minimize the memory used by accumulators. Reviewed By: mbasmanova Differential Revision: D72677410 fbshipit-source-id: eedd664174b13784b47325eb2ab8274445470235
Summary:

Add a fast path for streaming aggregation when input rows from the same group are located together (clustered). For certain functions, we can leverage this property to reduce the number of copy calls and to copy fewer, larger ranges. This brings a 3x improvement for a specific query shape common in data loading for AI training.

We implement this optimization for `arbitrary` and `array_agg`. For `arbitrary`, if the input is clustered, we just keep a reference to the input vector and the selected index; when we extract values from the container, we group all copies from the same vector into one `copyRange` call so the cost is minimized. For `array_agg`, we do a similar thing: we only track the range (offset and size) from which the input will be taken for each group, and do the copy in bulk when we extract values.

There is another optimization to flush the streaming aggregation output whenever a result is available, via a new query config `streaming_aggregation_eager_flush`. This allows us to minimize the memory used by accumulators.

Differential Revision: D72677410
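
For readers who want a concrete picture of the `array_agg` fast path and the eager flush described above, here is a minimal standalone sketch. It does not use Velox types or APIs: `Batch`, `Range`, `ClusteredArrayAccumulator`, `Group`, `addBatch`, and `flushCompleted` are all hypothetical names invented for illustration, and the flush semantics (emit every group except the last one after each batch) are an assumption about what "whenever a result is available" means, not a description of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one input vector (batch) of values.
using Batch = std::vector<int64_t>;

// One contiguous run of rows inside a single input batch.
struct Range {
  std::shared_ptr<const Batch> source;  // keeps the batch alive until extraction
  std::size_t offset;
  std::size_t size;
};

// Hypothetical per-group accumulator for an array_agg-like aggregate.
struct ClusteredArrayAccumulator {
  std::vector<Range> ranges;

  // Records rows [offset, offset + size) of `batch` without copying them.
  void addClustered(const std::shared_ptr<const Batch>& batch,
                    std::size_t offset, std::size_t size) {
    assert(offset + size <= batch->size());
    if (size == 0) {
      return;
    }
    // Merge with the previous range when it is adjacent in the same batch.
    if (!ranges.empty() && ranges.back().source == batch &&
        ranges.back().offset + ranges.back().size == offset) {
      ranges.back().size += size;
      return;
    }
    ranges.push_back({batch, offset, size});
  }

  // Materializes the array with one bulk copy per recorded range.
  std::vector<int64_t> extract() const {
    std::size_t total = 0;
    for (const auto& r : ranges) {
      total += r.size;
    }
    std::vector<int64_t> out;
    out.reserve(total);
    for (const auto& r : ranges) {
      auto first = r.source->begin() + static_cast<std::ptrdiff_t>(r.offset);
      out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(r.size));
    }
    return out;
  }
};

struct Group {
  int64_t key;
  ClusteredArrayAccumulator acc;
};

// Splits a batch with clustered keys into runs of equal keys and records each
// run as a single range on the matching group.
void addBatch(std::vector<Group>& groups, const std::vector<int64_t>& keys,
              const std::shared_ptr<const Batch>& values) {
  assert(keys.size() == values->size());
  std::size_t i = 0;
  while (i < keys.size()) {
    std::size_t j = i + 1;
    while (j < keys.size() && keys[j] == keys[i]) {
      ++j;
    }
    if (groups.empty() || groups.back().key != keys[i]) {
      groups.push_back({keys[i], {}});
    }
    groups.back().acc.addClustered(values, i, j - i);
    i = j;
  }
}

// Eager flush (assumed semantics): clustered input never revisits an earlier
// key, so every group except the last is complete. Emit those groups and drop
// their accumulators so the batches they reference can be released.
std::vector<std::pair<int64_t, std::vector<int64_t>>> flushCompleted(
    std::vector<Group>& groups) {
  std::vector<std::pair<int64_t, std::vector<int64_t>>> out;
  if (groups.size() < 2) {
    return out;
  }
  for (std::size_t g = 0; g + 1 < groups.size(); ++g) {
    out.emplace_back(groups[g].key, groups[g].acc.extract());
  }
  groups.erase(groups.begin(), groups.end() - 1);
  return out;
}
```

Calling `addBatch` followed by `flushCompleted` after every input batch keeps at most one open group in memory. A group that spans two batches ends up with two ranges and therefore two bulk copies at extraction, rather than one copy per row.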