HashAggregation Operator uses more memory in 477 than 436 (same plan / query); query fails with CLUSTER_OUT_OF_MEMORY #27144

xc3p7n3 · 2025-10-29T05:20:19Z

Setup

Working: Trino 436
Failing: Trino 477 (same query, same data, same plan shape)
Fault-tolerant execution (FTE): enabled on both (retry-policy=TASK)
Spilling: disabled
Workers heap: -Xmx ≈ 92 GiB, memory.heap-headroom-per-node=10 GiB → usable user-memory per worker ≈ 82 GiB.
Total workers: 15

Issue

After upgrading from 436 to 477, the same query now fails with “cluster out of memory”. The logical plan is effectively identical across both versions, but in 477 the HashAggregation operators in the final leg allocate tens of GB of memory for relatively small inputs, whereas in 436 the same operators stay in the MB range.

Task Details in 477

Cluster/task peaks:

Peak user memory (cluster): ~1.608 TB
Worst single task peak: ~90.06 GB
Exception (from stack trace):
 Cannot allocate enough memory for task 20251029_044228_00000_mdg5w.1.43.0. Reported peak memory reservation: 89,758,680,952B. Maximum possible reservation: 88,046,829,568B.
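
As a sanity check on the numbers in that exception, the following minimal sketch compares them with the per-worker budget from the setup. It assumes -Xmx is exactly 92 GiB (the setup only gives an approximate value); the variable names are illustrative and are not Trino identifiers.

```python
GIB = 1024 ** 3

heap_gib = 92        # -Xmx, assumed to be exactly 92 GiB
headroom_gib = 10    # memory.heap-headroom-per-node

# Values copied from the exception message above.
max_reservation = 88_046_829_568   # "Maximum possible reservation"
reported_peak = 89_758_680_952     # "Reported peak memory reservation"

usable_bytes = (heap_gib - headroom_gib) * GIB
overshoot_bytes = reported_peak - max_reservation

print("limit equals heap minus headroom:", usable_bytes == max_reservation)
print(f"overshoot: {overshoot_bytes:,} B ({overshoot_bytes / GIB:.2f} GiB)")
```

Under that assumption the reported limit is exactly the 82 GiB usable per worker, so a single task is asking for more than an entire node can provide.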

Problematic operators (final leg):

HashAggregation (planNodeId 63)
Peak memory: ~21.1–21.6 GB
Input: ~735,865 rows, ~129–139 MB
Often WAITING_FOR_MEMORY in operator summary

HashAggregation (planNodeId 9437)
Peak memory: ~10.25 GB
Input: ~6.39 M rows, ~1.02–1.11 GB
Operator counter shows “Input rows processed without partial aggregation enabled” ≈ 4,076,620

Upstream join in the same leg is small in 477:
Hash build side: ~14,225 rows, ~510 KB
Lookup source positions: ~56.9k

Same operators in 436 (works)

HashAggregation (planNodeId 63)
Peak memory: ~0.95 MB
Input: ~5.97 M rows, ~0.94 GB

HashAggregation (planNodeId 8866)
Peak memory: ~28.6 MB
Input: ~6.36 M rows, ~0.94 GB

So, with similar inputs and the same plan shape, 436 stays at MB-scale, while 477’s final HashAggregations expand to ~10–22 GB per operator and push a single task over the per-node user-memory budget.
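
To put the gap in proportion, here is a rough sketch that divides each operator's peak memory by its input row count, using the figures listed above. It assumes decimal units (1 GB = 10^9 bytes) and takes the lower bound wherever a range was reported. Peak memory per input row is only a crude proxy, since hash aggregation memory is driven by the number of groups and the size of their state rather than by input rows alone.

```python
# (version, planNodeId) -> (peak memory in bytes, input rows), from the stats above.
# Decimal units are assumed; ranges use their lower bound.
operators = {
    ("477", 63): (21.1e9, 735_865),
    ("477", 9437): (10.25e9, 6_390_000),
    ("436", 63): (0.95e6, 5_970_000),
    ("436", 8866): (28.6e6, 6_360_000),
}

# Peak memory held per input row, per operator.
bytes_per_row = {key: peak / rows for key, (peak, rows) in operators.items()}

for (version, node_id), value in bytes_per_row.items():
    print(f"Trino {version}, planNodeId {node_id}: {value:,.2f} bytes per input row")
```

On these figures planNodeId 63 holds roughly 28 KB per input row in 477, against well under one byte per row in 436. The reported input row counts for that node also differ between the two versions (about 0.74 M versus 5.97 M), so the comparison is indicative only.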

I need guidance on which 477 knobs best reproduce 436’s behavior for this case. Please let me know if more details are required.

436 - Success Stage

477 - Failure Stage

477 - Failure Stage
Tried with join_distribution_type=PARTITIONED, prefer_partial_aggregation=true, and join_max_broadcast_table_size=0B.

xc3p7n3 · 2025-10-29T05:25:31Z

Author

config.properties

    exchange.compression-codec=LZ4
    
    query.max-stage-count=600
    query.max-execution-time=180m
    query.max-history=200
    query.max-scan-physical-bytes=2TB
    query.remote-task.max-error-duration=1m
    
    catalog.management=static
    
    node-scheduler.include-coordinator=false
    
    task.concurrency=4
    task.max-worker-threads=14
    task.max-drivers-per-task=8
    task.client.timeout=2m
    task.max-partial-aggregation-memory=16MB
    retry-policy=TASK
    task-retry-attempts-per-task=5
    
    fault-tolerant-execution-task-memory=3GB
    
    sql.forced-session-time-zone=UTC
    
    http-server.process-forwarded=true
    http-server.authentication.allow-insecure-over-http=true
    
    query.low-memory-killer.policy=total-reservation-on-blocked-nodes
    task.low-memory-killer.policy=total-reservation-on-blocked-nodes
    pages-index.eager-compaction-enabled=true
    
    shutdown.grace-period=180s
    
    log.format=JSON
    log.console-format=JSON

4 replies

pettyjamesm Oct 29, 2025
Collaborator

Can you share details about the query and specifically, what the aggregation operation involved here is?

xc3p7n3 Oct 29, 2025
Author

Aggregations: max, all inside the all_interactions block in the query below.

WITH
  time_config AS (
   SELECT
     date(date_trunc('month', (now() + INTERVAL '330' MINUTE))) month_start_date
   , date((now() + INTERVAL '330' MINUTE)) day_start_date

) 
, all_interactions AS (
   SELECT
     date(date_trunc('month', created_at_ist)) month_start_date
   , date(date_trunc('day', created_at_ist)) day_start_date
   , i.id interaction_id
   , max(i.created_at_ist) created_at_ist
   , max(i.type) interaction_type
   , max((CASE WHEN (i.type = 'TEXT_ONE') THEN 'TEXT' ELSE 'CALLING' END)) overall_interaction_type
   , max(json_extract_scalar(replace(CAST(i.metadata AS varchar), 'X-APP-TYPE', 'Xapptype'), '$.Xapptype')) app_type
   , COALESCE(max((CASE WHEN (q.tag = 'INTERACTION_STATUS') THEN o.tag END)), max((CASE WHEN (q.tag = 'TERMINAL_INTERACTION_STATUS') THEN o.tag END))) interaction_tag
   , COALESCE(max((CASE WHEN ((q.id IN (184, 200, 218, 299, 291)) OR (q.key IN ('cc-w17s1q1')) OR (q.tag = 'INTERACTION_STATUS')) THEN o.text END)), max((CASE WHEN (q.tag = 'TERMINAL_INTERACTION_STATUS') THEN o.text END))) feedback
   FROM
     (((((
      SELECT
        id
      , type
      , agent_reference_id
      , metadata
      , (from_iso8601_timestamp(created_at) + INTERVAL '330' MINUTE) created_at_ist
      FROM
        customer_support_service.interactions_v2
      WHERE ((date_trunc('month', (from_iso8601_timestamp(created_at) + INTERVAL '330' MINUTE)) >= ((SELECT month_start_date
FROM
  time_config
) - INTERVAL '1' MONTH)) AND (type IN ('PR1', 'PR2', 'PR3')) AND ((__deleted <> 'true') OR (__deleted IS NULL)))
   )  i
   CROSS JOIN time_config config)
   LEFT JOIN (
      SELECT *
      FROM
        customer_support_service.answers
      WHERE (date_trunc('month', (from_iso8601_timestamp(created_at) + INTERVAL '330' MINUTE)) >= ((SELECT month_start_date FROM time_config) - INTERVAL '1' MONTH))
   )  a ON (a.interaction_id = i.id))
   LEFT JOIN customer_support_service.questions q ON (q.id = a.question_id))
   LEFT JOIN customer_support_service.options o ON (o.id = a.option_id))
   GROUP BY 1, 2, 3
) 
SELECT
  ai.*
, fp.interaction_preference
, u.phone_number agent_phone_number
, u.name agent_name
FROM
  ((all_interactions ai
LEFT JOIN static_tables.interaction_priority_order fp ON ((ai.overall_interaction_type = fp.source) AND (ai.interaction_tag = fp.interaction_tag))) -- This join is failing in 477, static_tables.interaction_priority_order will always have 41 rows(1.70KB)
LEFT JOIN (
   SELECT *
   FROM
     agents_service.users
   WHERE ("__deleted" = 'false')
)  u ON (u.reference_id = ai.agent_reference_id))

The stage containing this LEFT JOIN together with the aggregations is the one that fails: all_interactions ai LEFT JOIN static_tables.interaction_priority_order fp ON ((ai.overall_interaction_type = fp.source) AND (ai.interaction_tag = fp.interaction_tag)).

In the screenshot, static_tables.interaction_priority_order is 1.71 KB (41 rows).

Stack Trace

io.trino.spi.TrinoException: Cannot allocate enough memory for task 20251029_173651_00005_mdg5w.1.6.0. Reported peak memory reservation: 90725389855B. Maximum possible reservation: 88046829568B.
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$StageExecution.taskFailed(EventDrivenFaultTolerantQueryScheduler.java:2621)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.onRemoteTaskCompleted(EventDrivenFaultTolerantQueryScheduler.java:1835)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.onRemoteTaskCompleted(EventDrivenFaultTolerantQueryScheduler.java:712)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$RemoteTaskCompletedEvent.accept(EventDrivenFaultTolerantQueryScheduler.java:3450)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.processEvents(EventDrivenFaultTolerantQueryScheduler.java:959)
	at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.run(EventDrivenFaultTolerantQueryScheduler.java:879)
	at io.trino.$gen.Trino_477____20251029_044046_2.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base/java.lang.Thread.run(Thread.java:1447)
	Suppressed: java.lang.RuntimeException: Task 20251029_173651_00005_mdg5w.1.3.0 failed
		at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.lambda$checkComplete$0(EventDrivenFaultTolerantQueryScheduler.java:1077)
		at java.base/java.util.ArrayDeque.forEach(ArrayDeque.java:886)
		at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.checkComplete(EventDrivenFaultTolerantQueryScheduler.java:1077)
		at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.schedule(EventDrivenFaultTolerantQueryScheduler.java:1051)
		at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.run(EventDrivenFaultTolerantQueryScheduler.java:883)
		... 8 more
	Caused by: io.trino.spi.TrinoException: Task killed because the cluster is out of memory.
		at io.trino.memory.ClusterMemoryManager.callOomKiller(ClusterMemoryManager.java:276)
		at io.trino.memory.ClusterMemoryManager.process(ClusterMemoryManager.java:229)
		at io.trino.execution.SqlQueryManager.enforceMemoryLimits(SqlQueryManager.java:347)
		at io.trino.execution.SqlQueryManager.lambda$start$0(SqlQueryManager.java:115)
		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
		at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:369)
		at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:310)
		... 3 more

xc3p7n3 Nov 1, 2025
Author

@pettyjamesm

I ran the same query on version 458, and it’s using a similar amount of memory as version 436.
However, for any version beyond 459 with the same configuration, the query fails due to a “cluster out of memory” error (in both FTE and non-FTE configurations).
This issue isn’t limited to a single query — memory usage appears to be significantly higher for all queries involving aggregations.

Since version 458 has other known issues related to the Hive Metastore and the Hudi connector, I’m particularly interested in testing version 475 or later.
Could you please take a look and let me know if there’s anything I might be missing or doing incorrectly?

pettyjamesm Nov 3, 2025
Collaborator

Yeah, originally I thought this might be related to Trino's migration to FlatHash, which caused memory increases in HashAggregationOperator for versions 427+ and which was fixed to reduce memory usage by #25127 in Trino 473 (but 473 and 474 had bugs and shouldn't be used, so let's say starting with Trino 475).

However, that doesn't seem to be the root cause here since you've got successful queries in 436 and 458 (well after the move to FlatHash) and the issue persists in Trino 477 (which should have ~comparable memory usage to before 427).

I'm wondering if there's something else going on in the plan that has changed between versions. Can you share the result of an EXPLAIN / EXPLAIN ANALYZE for these queries to see whether that's the case?
