Skip to content

Conversation

@wesmcouch
Copy link

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-875095: Add offset param to DataFrame.limit() #44

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    This allows use of the OFFSET command with a DataFrame fluent API instead of calling sql().

Pre-review checklist

(For Snowflake employees)

  • This change has passed precommit
  • I have reviewed code coverage report for my PR in (Sonarqube)

@wesmcouch
Copy link
Author

@sfc-gh-bli @sfc-gh-rysun any chance I could get a review on this PR? Let me know if there are adjustments needed, I can make the changes. Thank you.

@sfc-gh-bli sfc-gh-bli changed the title Offset Support Dataframe.offset function Mar 21, 2025
Copy link
Collaborator

@sfc-gh-bli sfc-gh-bli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results of Snowflake SQL is unordered unless explicitly invoke "order by".
for example in your test
getSession().sql("select * from values(1), (2), (3), (4), (5) as t(a)").offset(3)
is not guaranteed to return 4 and 5, can be any two numbers from 1,2,3,4,5.

At the same time, Select col1 from table1 order by col1 offset 3 and select * from (select col1 from table1 order by col1) offset 3 are different. the first one is ordered but the second one is unordered.

Snowpark has an optimization on Sort + Limit to make it generateing flatten SQL to keep the order.

object SortPlusLimitPolicy extends SimplificationPolicy {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SNOW-875095: Add offset param to DataFrame.limit()

5 participants