@Copilot Copilot AI commented Aug 26, 2025

This PR adds support for returning PyArrow tables instead of pandas DataFrames when using ResultsFormat.ARROW, enabling more efficient handling of GeoArrow geometries and better integration with the Arrow ecosystem.

Problem

Previously, even when results_format=ResultsFormat.ARROW was specified, the driver would always convert Arrow data to pandas DataFrames via reader.read_pandas(). This prevented users from working directly with Arrow tables and GeoArrow geometries, requiring unnecessary conversions.

# Before: Always returned pandas DataFrames
with connect(
    api_key=api_key,
    results_format=ResultsFormat.ARROW,  # Arrow on wire, but...
    geometry_representation=GeometryRepresentation.WKB,
) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM buildings LIMIT 1000")
    results = cursor.fetchall()  # Always pandas.DataFrame

Solution

Added a new output_format parameter that allows users to specify whether they want Arrow tables or pandas DataFrames returned:

# After: Can return native Arrow tables
with connect(
    api_key=api_key,
    results_format=ResultsFormat.ARROW,
    output_format=OutputFormat.ARROW,  # NEW: Return Arrow tables
    geometry_representation=GeometryRepresentation.WKB,
) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM buildings LIMIT 1000")
    results = cursor.fetchall()  # Now returns pyarrow.Table!

Key Changes

  • Added OutputFormat enum with PANDAS (default) and ARROW values
  • Extended the connection functions to accept an output_format parameter
  • Modified result handling to return either Arrow tables or DataFrames, depending on output_format
  • Updated the cursor implementation to handle both result types
  • Maintained full backward compatibility - existing code works unchanged
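As a rough illustration of the first three changes, the sketch below shows one way the enum and the conditional result handling could fit together. It is not the code from this PR: the function name `materialize` is hypothetical, and `reader` is assumed to behave like a `pyarrow.ipc.RecordBatchStreamReader`, which provides both `read_all()` and `read_pandas()`.

```python
from enum import Enum, auto


class OutputFormat(Enum):
    """What the cursor hands back to the caller (member names follow the PR text)."""

    PANDAS = auto()  # default, so existing callers see no change
    ARROW = auto()


def materialize(reader, output_format=OutputFormat.PANDAS):
    """Hypothetical helper: turn an Arrow stream reader into the requested type.

    Assumes `reader` exposes read_all() (Arrow table) and read_pandas()
    (pandas DataFrame), as pyarrow's IPC stream reader does.
    """
    if output_format is OutputFormat.ARROW:
        # Keep the data in Arrow form; no pandas conversion happens here.
        return reader.read_all()
    # Previous behavior: always convert to a pandas DataFrame.
    return reader.read_pandas()
```

Defaulting the parameter to `OutputFormat.PANDAS` is what keeps existing call sites unchanged in this sketch.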

Benefits

  • Better performance with large datasets: skipping the pandas conversion reduces memory usage and allows zero-copy operations
  • Native GeoArrow support for spatial data
  • Direct integration with the Arrow ecosystem
  • Backward compatible: defaults to the existing pandas behavior
  • Type safe, with metadata extraction for both formats
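The last point, metadata extraction for both formats, might look something like the following sketch. The helper names `column_names` and `row_count` are hypothetical and not taken from this PR; the sketch relies only on attributes that `pyarrow.Table` (`column_names`, `num_rows`) and `pandas.DataFrame` (`columns`, `len()`) are known to provide.

```python
def column_names(result):
    """Return column names for either a pyarrow.Table or a pandas.DataFrame."""
    # Check `column_names` first: pyarrow.Table also has a `columns`
    # attribute, but it holds the column data rather than the names.
    if hasattr(result, "column_names"):
        return list(result.column_names)
    return [str(name) for name in result.columns]


def row_count(result):
    """Return the number of rows for either result type."""
    # pyarrow.Table exposes `num_rows`; a DataFrame supports len().
    if hasattr(result, "num_rows"):
        return result.num_rows
    return len(result)
```

Duck typing like this avoids importing pandas when only Arrow tables are in use, though explicit `isinstance` checks would be an equally reasonable choice.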

The implementation ensures that users can now work directly with Arrow tables when needed while maintaining the familiar pandas interface as the default.

Fixes #45.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • astral.sh
    • Triggering command: curl -LsSf REDACTED (dns block)


Copilot AI commented Aug 26, 2025

@mpetazzoni 👋 This repository doesn't have Copilot instructions. With Copilot instructions, I can understand the repository better, work faster and produce higher quality PRs.

I can generate a .github/copilot-instructions.md file for you automatically. Click here to open a pre-filled issue and assign it to me. I'll write the instructions, and then tag you for review.

@Copilot Copilot AI changed the title [WIP] Option to return Arrow tables with GeoArrow geometries Add option to return Arrow tables with GeoArrow geometries Aug 26, 2025
@Copilot Copilot AI requested a review from mpetazzoni August 26, 2025 20:07
Copilot finished work on behalf of mpetazzoni August 26, 2025 20:07
Development

Successfully merging this pull request may close these issues.

Option to return Arrow tables with GeoArrow geometries