SeaDatabricksClient: Add Metadata Commands #593


Open — wants to merge 109 commits into base: sea-migration

Commits (109)
138c2ae
[squash from exec-sea] bring over execution phase changes
varun-edachali-dbx Jun 9, 2025
3e3ab94
remove excess test
varun-edachali-dbx Jun 9, 2025
4a78165
add docstring
varun-edachali-dbx Jun 9, 2025
0dac4aa
remove exec func in sea backend
varun-edachali-dbx Jun 9, 2025
1b794c7
remove excess files
varun-edachali-dbx Jun 9, 2025
da5a6fe
remove excess models
varun-edachali-dbx Jun 9, 2025
686ade4
remove excess sea backend tests
varun-edachali-dbx Jun 9, 2025
31e6c83
cleanup
varun-edachali-dbx Jun 9, 2025
69ea238
re-introduce get_schema_desc
varun-edachali-dbx Jun 9, 2025
66d7517
remove SeaResultSet
varun-edachali-dbx Jun 9, 2025
71feef9
clean imports and attributes
varun-edachali-dbx Jun 9, 2025
ae9862f
pass CommandId to ExecResp
varun-edachali-dbx Jun 9, 2025
d8aa69e
remove changes in types
varun-edachali-dbx Jun 9, 2025
db139bc
add back essential types (ExecResponse, from_sea_state)
varun-edachali-dbx Jun 9, 2025
b977b12
fix fetch types
varun-edachali-dbx Jun 9, 2025
da615c0
excess imports
varun-edachali-dbx Jun 9, 2025
0da04a6
reduce diff by maintaining logs
varun-edachali-dbx Jun 9, 2025
ea9d456
fix int test types
varun-edachali-dbx Jun 9, 2025
8985c62
[squashed from exec-sea] init execution func
varun-edachali-dbx Jun 9, 2025
d9bcdbe
remove irrelevant changes
varun-edachali-dbx Jun 9, 2025
ee9fa1c
remove ResultSetFilter functionality
varun-edachali-dbx Jun 9, 2025
24c6152
remove more irrelevant changes
varun-edachali-dbx Jun 9, 2025
67fd101
remove more irrelevant changes
varun-edachali-dbx Jun 9, 2025
271fcaf
even more irrelevant changes
varun-edachali-dbx Jun 9, 2025
bf26ea3
remove sea response as init option
varun-edachali-dbx Jun 9, 2025
ed7cf91
exec test example scripts
varun-edachali-dbx Jun 9, 2025
dae15e3
formatting (black)
varun-edachali-dbx Jun 9, 2025
db5bbea
[squashed from sea-exec] merge sea stuffs
varun-edachali-dbx Jun 9, 2025
d5d3699
remove excess changes
varun-edachali-dbx Jun 9, 2025
6137a3d
remove excess removed docstring
varun-edachali-dbx Jun 9, 2025
75b0773
remove excess changes in backend
varun-edachali-dbx Jun 9, 2025
4494dcd
remove excess imports
varun-edachali-dbx Jun 9, 2025
4d0aeca
remove accidentally removed _get_schema_desc
varun-edachali-dbx Jun 9, 2025
7cece5e
remove unnecessary init with sea_response tests
varun-edachali-dbx Jun 9, 2025
8977c06
remove unnecessary changes
varun-edachali-dbx Jun 9, 2025
0216d7a
formatting (black)
varun-edachali-dbx Jun 9, 2025
4cb15fd
improved models and filters from cloudfetch-sea branch
varun-edachali-dbx Jun 9, 2025
dee47f7
filters stuff (align with JDBC)
varun-edachali-dbx Jun 10, 2025
e385d5b
backend from cloudfetch-sea
varun-edachali-dbx Jun 11, 2025
484064e
remove filtering, metadata ops
varun-edachali-dbx Jun 11, 2025
030edf8
raise NotImplementedError for metadata ops
varun-edachali-dbx Jun 11, 2025
30f8266
add metadata commands
varun-edachali-dbx Jun 11, 2025
033ae73
formatting (black)
varun-edachali-dbx Jun 11, 2025
33821f4
add metadata command unit tests
varun-edachali-dbx Jun 11, 2025
3e22c6c
change to valid table name
varun-edachali-dbx Jun 11, 2025
787f1f7
Merge branch 'sea-migration' into sea-test-scripts
varun-edachali-dbx Jun 11, 2025
165c4f3
remove un-necessary changes
varun-edachali-dbx Jun 11, 2025
a6e40d0
simplify test module
varun-edachali-dbx Jun 11, 2025
52e3088
logging -> debug level
varun-edachali-dbx Jun 11, 2025
641c09b
change table name in log
varun-edachali-dbx Jun 11, 2025
8bd12d8
Merge branch 'sea-migration' into exec-models-sea
varun-edachali-dbx Jun 11, 2025
ffded6e
remove un-necessary changes
varun-edachali-dbx Jun 11, 2025
227f6b3
remove un-necessary backend changes
varun-edachali-dbx Jun 11, 2025
68657a3
remove un-needed GetChunksResponse
varun-edachali-dbx Jun 11, 2025
3940eec
remove un-needed GetChunksResponse
varun-edachali-dbx Jun 11, 2025
37813ba
reduce code duplication in response parsing
varun-edachali-dbx Jun 11, 2025
267c9f4
reduce code duplication
varun-edachali-dbx Jun 11, 2025
2967119
more clear docstrings
varun-edachali-dbx Jun 11, 2025
47fd60d
introduce strongly typed ChunkInfo
varun-edachali-dbx Jun 11, 2025
982fdf2
remove is_volume_operation from response
varun-edachali-dbx Jun 12, 2025
9e14d48
add is_volume_op and more ResultData fields
varun-edachali-dbx Jun 12, 2025
be1997e
Merge branch 'exec-models-sea' into exec-phase-sea
varun-edachali-dbx Jun 12, 2025
e8e8ee7
Merge branch 'sea-test-scripts' into exec-phase-sea
varun-edachali-dbx Jun 12, 2025
05ee4e7
add test scripts
varun-edachali-dbx Jun 12, 2025
3ffa898
Merge branch 'exec-models-sea' into metadata-sea
varun-edachali-dbx Jun 12, 2025
2952d8d
Revert "Merge branch 'sea-migration' into exec-models-sea"
varun-edachali-dbx Jun 12, 2025
89e2aa0
Merge branch 'exec-phase-sea' into metadata-sea
varun-edachali-dbx Jun 12, 2025
cbace3f
Revert "Merge branch 'exec-models-sea' into exec-phase-sea"
varun-edachali-dbx Jun 12, 2025
c075b07
change logging level
varun-edachali-dbx Jun 12, 2025
c62f76d
remove un-necessary changes
varun-edachali-dbx Jun 12, 2025
199402e
remove excess changes
varun-edachali-dbx Jun 12, 2025
8ac574b
remove excess changes
varun-edachali-dbx Jun 12, 2025
398ca70
Merge branch 'sea-migration' into exec-phase-sea
varun-edachali-dbx Jun 12, 2025
b1acc5b
remove _get_schema_bytes (for now)
varun-edachali-dbx Jun 12, 2025
ef2a7ee
redundant comments
varun-edachali-dbx Jun 12, 2025
699942d
Merge branch 'sea-migration' into exec-phase-sea
varun-edachali-dbx Jun 12, 2025
af8f74e
remove fetch phase methods
varun-edachali-dbx Jun 12, 2025
5540c5c
reduce code repetition + introduce gaps after multi line pydocs
varun-edachali-dbx Jun 12, 2025
efe3881
remove unused imports
varun-edachali-dbx Jun 12, 2025
36ab59b
move description extraction to helper func
varun-edachali-dbx Jun 12, 2025
1d57c99
formatting (black)
varun-edachali-dbx Jun 12, 2025
df6dac2
add more unit tests
varun-edachali-dbx Jun 12, 2025
ad0e527
streamline unit tests
varun-edachali-dbx Jun 12, 2025
ed446a0
test getting the list of allowed configurations
varun-edachali-dbx Jun 12, 2025
38e4b5c
reduce diff
varun-edachali-dbx Jun 12, 2025
94879c0
reduce diff
varun-edachali-dbx Jun 12, 2025
1809956
house constants in enums for readability and immutability
varun-edachali-dbx Jun 13, 2025
da5260c
add note on hybrid disposition
varun-edachali-dbx Jun 13, 2025
0385ffb
remove redundant note on arrow_schema_bytes
varun-edachali-dbx Jun 16, 2025
349c021
Merge branch 'exec-phase-sea' into metadata-sea
varun-edachali-dbx Jun 17, 2025
6229848
remove irrelevant changes
varun-edachali-dbx Jun 17, 2025
fd52356
remove un-necessary test changes
varun-edachali-dbx Jun 17, 2025
64e58b0
remove un-necessary changes in thrift backend tests
varun-edachali-dbx Jun 17, 2025
0a2cdfd
remove unimplemented methods test
varun-edachali-dbx Jun 17, 2025
90bb09c
Merge branch 'sea-migration' into exec-phase-sea
varun-edachali-dbx Jun 17, 2025
cd22389
remove invalid import
varun-edachali-dbx Jun 17, 2025
82e0f8b
Merge branch 'sea-migration' into exec-phase-sea
varun-edachali-dbx Jun 17, 2025
e64b81b
Merge branch 'exec-phase-sea' into metadata-sea
varun-edachali-dbx Jun 17, 2025
5ab9bbe
better align queries with JDBC impl
varun-edachali-dbx Jun 18, 2025
1ab6e87
line breaks after multi-line PRs
varun-edachali-dbx Jun 18, 2025
f469c24
remove unused imports
varun-edachali-dbx Jun 18, 2025
68ec65f
fix: introduce ExecuteResponse import
varun-edachali-dbx Jun 18, 2025
ffd478e
Merge branch 'sea-migration' into metadata-sea
varun-edachali-dbx Jun 18, 2025
f6d873d
remove unimplemented metadata methods test, un-necessary imports
varun-edachali-dbx Jun 18, 2025
28675f5
introduce unit tests for metadata methods
varun-edachali-dbx Jun 18, 2025
3578659
remove verbosity in ResultSetFilter docstring
varun-edachali-dbx Jun 20, 2025
8713023
remove un-necessary info in ResultSetFilter docstring
varun-edachali-dbx Jun 20, 2025
22dc252
remove explicit type checking, string literals around forward annotat…
varun-edachali-dbx Jun 20, 2025
390f592
house SQL commands in constants
varun-edachali-dbx Jun 20, 2025
160 changes: 160 additions & 0 deletions src/databricks/sql/backend/filters.py
@@ -0,0 +1,160 @@
"""
Client-side filtering utilities for Databricks SQL connector.

This module provides filtering capabilities for result sets returned by different backends.
"""

import logging
from typing import (
List,
Optional,
Any,
Callable,
cast,
)

from databricks.sql.backend.sea.backend import SeaDatabricksClient
from databricks.sql.backend.types import ExecuteResponse

from databricks.sql.result_set import ResultSet, SeaResultSet

logger = logging.getLogger(__name__)


class ResultSetFilter:
"""
A general-purpose filter for result sets.
"""

@staticmethod
def _filter_sea_result_set(
[Review comment — Contributor] this is specific to SEA result set and can't be used for a generic result set class? let's try to make it generic for a result set

[Author reply — Collaborator] I think we need some service specific methods at some point during the filtering process to know what kind of result set to return, since our concrete instances are service specific. I tried to keep the root methods invoked (filter by table type) general, following which they invoke the service specific builders based on the type of the instance passed to them.
result_set: SeaResultSet, filter_func: Callable[[List[Any]], bool]
) -> SeaResultSet:
"""
Filter a SEA result set using the provided filter function.

Args:
result_set: The SEA result set to filter
filter_func: Function that takes a row and returns True if the row should be included

Returns:
A filtered SEA result set
"""

# Get all remaining rows
all_rows = result_set.results.remaining_rows()

# Filter rows
filtered_rows = [row for row in all_rows if filter_func(row)]

# Reuse the command_id from the original result set
command_id = result_set.command_id

# Create an ExecuteResponse with the filtered data
execute_response = ExecuteResponse(
command_id=command_id,
status=result_set.status,
description=result_set.description,
has_been_closed_server_side=result_set.has_been_closed_server_side,
lz4_compressed=result_set.lz4_compressed,
arrow_schema_bytes=result_set._arrow_schema_bytes,
is_staging_operation=False,
)

# Create a new ResultData object with filtered data

from databricks.sql.backend.sea.models.base import ResultData

result_data = ResultData(data=filtered_rows, external_links=None)

from databricks.sql.result_set import SeaResultSet

# Create a new SeaResultSet with the filtered data
filtered_result_set = SeaResultSet(
connection=result_set.connection,
execute_response=execute_response,
sea_client=cast(SeaDatabricksClient, result_set.backend),
buffer_size_bytes=result_set.buffer_size_bytes,
arraysize=result_set.arraysize,
result_data=result_data,
[Review comment — Contributor] could you remind me what is the significance of this result_data param in result set? is this present in the base class? Is this an optional param and is used to create a result set with hard-coded rows?

[Author reply — Collaborator] It is not present in the base class, it is an instance of a ResultData model which represents the results returned during SEA execution. In our case, we set the filtered rows in the data array of this ResultData to effectively create a filtered SeaResultSet.
)

return filtered_result_set

@staticmethod
def filter_by_column_values(
result_set: ResultSet,
column_index: int,
allowed_values: List[str],
case_sensitive: bool = False,
) -> ResultSet:
"""
Filter a result set by values in a specific column.

Args:
result_set: The result set to filter
column_index: The index of the column to filter on
allowed_values: List of allowed values for the column
case_sensitive: Whether to perform case-sensitive comparison

Returns:
A filtered result set
"""

# Convert to uppercase for case-insensitive comparison if needed
if not case_sensitive:
allowed_values = [v.upper() for v in allowed_values]

# Determine the type of result set and apply appropriate filtering
from databricks.sql.result_set import SeaResultSet

if isinstance(result_set, SeaResultSet):
return ResultSetFilter._filter_sea_result_set(
result_set,
lambda row: (
len(row) > column_index
and isinstance(row[column_index], str)
and (
row[column_index].upper()
if not case_sensitive
else row[column_index]
)
in allowed_values
),
)

# For other result set types, return the original (should be handled by specific implementations)
logger.warning(
f"Filtering not implemented for result set type: {type(result_set).__name__}"
)
return result_set

@staticmethod
def filter_tables_by_type(
result_set: ResultSet, table_types: Optional[List[str]] = None
) -> ResultSet:
"""
Filter a result set of tables by the specified table types.

This is a client-side filter that processes the result set after it has been
retrieved from the server. It filters out tables whose type does not match
any of the types in the table_types list.

Args:
result_set: The original result set containing tables
table_types: List of table types to include (e.g., ["TABLE", "VIEW"])

Returns:
A filtered result set containing only tables of the specified types
"""

# Default table types if none specified
DEFAULT_TABLE_TYPES = ["TABLE", "VIEW", "SYSTEM TABLE"]
valid_types = (
table_types if table_types and len(table_types) > 0 else DEFAULT_TABLE_TYPES
)

# Table type is the 6th column (index 5)
return ResultSetFilter.filter_by_column_values(
result_set, 5, valid_types, case_sensitive=True
)
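The row-level predicate inside filter_by_column_values can be illustrated in isolation. The sketch below is not the connector's code: it applies the same length, type, and case checks to plain tuples, and the sample rows are invented for illustration (table type in column index 5, matching the SHOW TABLES layout assumed by filter_tables_by_type).

```python
def filter_rows(rows, column_index, allowed_values, case_sensitive=False):
    # Mirror the predicate used by ResultSetFilter.filter_by_column_values:
    # uppercase the allowed values (and each cell) when case-insensitive,
    # and skip rows that are too short or whose cell is not a string.
    if not case_sensitive:
        allowed_values = [v.upper() for v in allowed_values]
    return [
        row
        for row in rows
        if len(row) > column_index
        and isinstance(row[column_index], str)
        and (row[column_index] if case_sensitive else row[column_index].upper())
        in allowed_values
    ]


# Hypothetical SHOW TABLES-style rows for illustration only.
rows = [
    ("cat", "sch", "t1", "", "", "TABLE"),
    ("cat", "sch", "v1", "", "", "VIEW"),
    ("cat", "sch", "f1", "", "", "FOREIGN"),
]
kept = filter_rows(rows, 5, ["TABLE", "VIEW", "SYSTEM TABLE"], case_sensitive=True)
# kept retains only the TABLE and VIEW rows
```

As in filter_tables_by_type, the defaults are uppercase, so passing case_sensitive=True still matches typical SHOW TABLES output.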
127 changes: 115 additions & 12 deletions src/databricks/sql/backend/sea/backend.py
@@ -10,6 +10,7 @@
ResultDisposition,
ResultCompression,
WaitTimeout,
MetadataCommands,
)

if TYPE_CHECKING:
@@ -40,6 +41,11 @@
GetStatementResponse,
CreateSessionResponse,
)
from databricks.sql.backend.sea.models.responses import (
_parse_status,
_parse_manifest,
_parse_result,
)

logger = logging.getLogger(__name__)

@@ -627,9 +633,22 @@ def get_catalogs(
max_rows: int,
max_bytes: int,
cursor: "Cursor",
):
"""Not implemented yet."""
raise NotImplementedError("get_catalogs is not yet implemented for SEA backend")
) -> "ResultSet":
"""Get available catalogs by executing 'SHOW CATALOGS'."""
result = self.execute_command(
operation=MetadataCommands.SHOW_CATALOGS.value,
session_id=session_id,
max_rows=max_rows,
max_bytes=max_bytes,
lz4_compression=False,
cursor=cursor,
use_cloud_fetch=False,
parameters=[],
async_op=False,
enforce_embedded_schema_correctness=False,
)
assert result is not None, "execute_command returned None in synchronous mode"
return result

def get_schemas(
self,
@@ -639,9 +658,30 @@
cursor: "Cursor",
catalog_name: Optional[str] = None,
schema_name: Optional[str] = None,
):
"""Not implemented yet."""
raise NotImplementedError("get_schemas is not yet implemented for SEA backend")
) -> "ResultSet":
"""Get schemas by executing 'SHOW SCHEMAS IN catalog [LIKE pattern]'."""
if not catalog_name:
raise ValueError("Catalog name is required for get_schemas")

operation = MetadataCommands.SHOW_SCHEMAS.value.format(catalog_name)

if schema_name:
operation += MetadataCommands.LIKE_PATTERN.value.format(schema_name)

result = self.execute_command(
operation=operation,
session_id=session_id,
max_rows=max_rows,
max_bytes=max_bytes,
lz4_compression=False,
cursor=cursor,
use_cloud_fetch=False,
parameters=[],
async_op=False,
enforce_embedded_schema_correctness=False,
)
assert result is not None, "execute_command returned None in synchronous mode"
return result

def get_tables(
self,
@@ -653,9 +693,45 @@
schema_name: Optional[str] = None,
table_name: Optional[str] = None,
table_types: Optional[List[str]] = None,
):
"""Not implemented yet."""
raise NotImplementedError("get_tables is not yet implemented for SEA backend")
) -> "ResultSet":
"""Get tables by executing 'SHOW TABLES IN catalog [SCHEMA LIKE pattern] [LIKE pattern]'."""
if not catalog_name:
raise ValueError("Catalog name is required for get_tables")

operation = (
MetadataCommands.SHOW_TABLES_ALL_CATALOGS.value
if catalog_name in [None, "*", "%"]
else MetadataCommands.SHOW_TABLES.value.format(
MetadataCommands.CATALOG_SPECIFIC.value.format(catalog_name)
)
)

if schema_name:
operation += MetadataCommands.SCHEMA_LIKE_PATTERN.value.format(schema_name)

if table_name:
operation += MetadataCommands.LIKE_PATTERN.value.format(table_name)

result = self.execute_command(
operation=operation,
session_id=session_id,
max_rows=max_rows,
max_bytes=max_bytes,
lz4_compression=False,
cursor=cursor,
use_cloud_fetch=False,
parameters=[],
async_op=False,
enforce_embedded_schema_correctness=False,
)
assert result is not None, "execute_command returned None in synchronous mode"

# Apply client-side filtering by table_types
from databricks.sql.backend.filters import ResultSetFilter

result = ResultSetFilter.filter_tables_by_type(result, table_types)

return result

def get_columns(
self,
@@ -667,6 +743,33 @@
schema_name: Optional[str] = None,
table_name: Optional[str] = None,
column_name: Optional[str] = None,
):
"""Not implemented yet."""
raise NotImplementedError("get_columns is not yet implemented for SEA backend")
) -> "ResultSet":
"""Get columns by executing 'SHOW COLUMNS IN CATALOG catalog [SCHEMA LIKE pattern] [TABLE LIKE pattern] [LIKE pattern]'."""
if not catalog_name:
raise ValueError("Catalog name is required for get_columns")

operation = MetadataCommands.SHOW_COLUMNS.value.format(catalog_name)

if schema_name:
operation += MetadataCommands.SCHEMA_LIKE_PATTERN.value.format(schema_name)

if table_name:
operation += MetadataCommands.TABLE_LIKE_PATTERN.value.format(table_name)

if column_name:
operation += MetadataCommands.LIKE_PATTERN.value.format(column_name)

result = self.execute_command(
operation=operation,
session_id=session_id,
max_rows=max_rows,
max_bytes=max_bytes,
lz4_compression=False,
cursor=cursor,
use_cloud_fetch=False,
parameters=[],
async_op=False,
enforce_embedded_schema_correctness=False,
)
assert result is not None, "execute_command returned None in synchronous mode"
return result
20 changes: 20 additions & 0 deletions src/databricks/sql/backend/sea/utils/constants.py
@@ -45,3 +45,23 @@ class WaitTimeout(Enum):

ASYNC = "0s"
SYNC = "10s"


class MetadataCommands(Enum):
"""SQL commands used in the SEA backend.

These constants are used for metadata operations and other SQL queries
to ensure consistency and avoid string literal duplication.
"""

SHOW_CATALOGS = "SHOW CATALOGS"
SHOW_SCHEMAS = "SHOW SCHEMAS IN {}"
SHOW_TABLES = "SHOW TABLES IN {}"
SHOW_TABLES_ALL_CATALOGS = "SHOW TABLES IN ALL CATALOGS"
SHOW_COLUMNS = "SHOW COLUMNS IN CATALOG {}"

SCHEMA_LIKE_PATTERN = " SCHEMA LIKE '{}'"
TABLE_LIKE_PATTERN = " TABLE LIKE '{}'"
LIKE_PATTERN = " LIKE '{}'"

CATALOG_SPECIFIC = "CATALOG {}"
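Housing the SQL templates in an Enum (per commit 1809956) buys immutability as well as readability: Enum members cannot be rebound at runtime. A minimal sketch with a subset of the members added in this PR:

```python
from enum import Enum


class MetadataCommands(Enum):
    # Subset of the members added in this PR.
    SHOW_CATALOGS = "SHOW CATALOGS"
    SHOW_SCHEMAS = "SHOW SCHEMAS IN {}"
    LIKE_PATTERN = " LIKE '{}'"


# Templates compose via str.format, as the backend methods do.
operation = MetadataCommands.SHOW_SCHEMAS.value.format("main")
operation += MetadataCommands.LIKE_PATTERN.value.format("sil%")
# operation == "SHOW SCHEMAS IN main LIKE 'sil%'"

# Attempting to rebind a member raises AttributeError.
try:
    MetadataCommands.SHOW_CATALOGS = "SHOW DATABASES"
    reassigned = True
except AttributeError:
    reassigned = False
```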