duckdb/011_working_with_apache_arrow.py
6 additions & 8 deletions
@@ -14,7 +14,7 @@
 
 import marimo
 
-__generated_with = "0.14.11"
+__generated_with = "0.14.12"
 app = marimo.App(width="medium")
 
 
@@ -300,17 +300,15 @@ def _(mo):
     ### Key Benefits:
 
     - **Memory Efficiency**: Arrow's columnar format uses 20-40% less memory than traditional DataFrames through compact columnar representation and better compression ratios
-    - **Zero-Copy Operations**: Data can be shared between DuckDB and Arrow-compatible systems (Polars, Pandas) without any data copying, eliminating redundant memory usage
+    - **Zero-Copy Operations**: Data can be shared between DuckDB and Arrow-compatible systems (Polars, Pandas) without any data copying, eliminating redundant memory usage
     - **Query Performance**: 2-10x faster queries compared to traditional approaches that require data copying
-    - **Larger-than-Memory Analysis**: Since both libraries support streaming query results, you can execute queries on data bigger than available memory by processing one batch at a time
+    - **Larger-than-Memory Analysis**: Both DuckDB and Arrow-compatible libraries support streaming query results, allowing you to execute queries on data larger than available memory by processing data in batches.
     - **Advanced Query Optimization**: DuckDB's optimizer can push down filters and projections directly into Arrow scans, reading only relevant columns and partitions
     Let's demonstrate these benefits with concrete examples:
     """
     )
     return
 
-
-
 @app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""### Memory Efficiency Demonstration""")
@@ -529,7 +527,6 @@ def _(mo):
 
 @app.cell
 def _(polars_data, time):
-    import psutil
     import os
     import pyarrow.compute as pc  # Add this import
 
@@ -554,14 +551,14 @@ def _(polars_data, time):
     # Compare with traditional copy-based operations
     latest_start_time = time.time()
 
-    # These operations create copies
+    # These operations may create copies depending on Pandas' Copy-on-Write (CoW) behavior
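
Note on the reworded comment in the last hunk: whether a pandas operation copies data can depend on whether Copy-on-Write is active. The sketch below is not part of the notebook; it is a minimal illustration with a hypothetical helper name (`demo_copy_on_write`) and made-up column names. It assumes that `mode.copy_on_write` is available as an option (pandas 2.x) or that CoW is already the default (expected in pandas 3). With CoW active, `reset_index(drop=True)` is expected to share buffers with the original until the first write; without it, the call copies eagerly.

```python
import numpy as np
import pandas as pd

# Assumption: the option exists on pandas 2.x; other versions may lack it
# or treat it as deprecated, so failures to set it are ignored here.
try:
    pd.set_option("mode.copy_on_write", True)
except (KeyError, AttributeError):
    pass


def demo_copy_on_write():
    """Hypothetical helper: report buffer sharing before and after a write."""
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

    # With CoW active this is a lazy copy; without CoW it is an eager copy.
    df2 = df.reset_index(drop=True)
    shared_before = np.shares_memory(
        df["foo"].to_numpy(), df2["foo"].to_numpy()
    )

    # Writing to df2 forces it to own its data, so df must stay unchanged.
    df2.iloc[0, 0] = 100
    shared_after = np.shares_memory(
        df["foo"].to_numpy(), df2["foo"].to_numpy()
    )

    return shared_before, shared_after, int(df.iloc[0, 0]), int(df2.iloc[0, 0])


if __name__ == "__main__":
    before, after, original_value, modified_value = demo_copy_on_write()
    print(f"shared before write: {before}")
    print(f"shared after write:  {after}")
    print(f"original value: {original_value}, modified value: {modified_value}")
```

The value of `shared before write` is the part that varies with the pandas version and CoW setting, which is why "may create copies" is the more accurate wording for the comment.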