Setup
Working: Trino 436
Failing: Trino 477 (same query, same data, same plan shape)
Fault-tolerant execution (FTE): enabled on both (retry-policy=TASK)
Spilling: disabled
Workers heap: -Xmx ≈ 92 GiB, memory.heap-headroom-per-node=10 GiB → usable user-memory per worker ≈ 82 GiB.
Total workers: 15
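For reference, the per-worker budget quoted above is just heap minus headroom. A minimal sketch of that arithmetic, assuming no other limit (for example a per-query cap) applies; the variable names are illustrative and not Trino identifiers:

```python
# Memory budget arithmetic from the Setup list above.
# Assumption: usable user memory = JVM heap - heap headroom, nothing else subtracted.
GIB = 1024 ** 3

heap_bytes = 92 * GIB       # -Xmx, approximately 92 GiB
headroom_bytes = 10 * GIB   # memory.heap-headroom-per-node
workers = 15

usable_per_worker = heap_bytes - headroom_bytes
cluster_usable = usable_per_worker * workers

print(usable_per_worker // GIB)  # per-worker budget in GiB
print(cluster_usable // GIB)     # cluster-wide budget in GiB
```

This gives 82 GiB per worker and 1230 GiB across the 15 workers, under the stated assumption.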
Issue
After upgrading from 436 to 477, the same query now fails with “cluster out of memory”. The logical plan is effectively identical across both versions, but in 477 the HashAggregation operators in the final leg allocate tens of GB of memory for relatively small inputs, whereas in 436 the same operators stay in the MB range.
Task Details in 477
Cluster/task peaks:
Problematic operators (final leg):
Upstream join in the same leg is small in 477:
Hash build side: ~14,225 rows, ~510 KB
Lookup source positions: ~56.9k
Same operators in 436 (works)
So, with similar inputs and the same plan shape, 436 stays at MB-scale, while 477’s final HashAggregations expand to ~10–22 GB per operator and push a single task over the per-node user-memory budget.
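To put those operator sizes next to the per-node budget, here is a rough sketch. It assumes the ~82 GiB budget from the Setup section and treats the reported GB figures as GiB for simplicity; the numbers are only the ones quoted in this post, not measurements from the task itself:

```python
# Rough headroom check: how many HashAggregation operators of a given size
# fit on one worker before the per-node user-memory budget is exceeded.
# Assumption: GB and GiB are treated as equal here, which is only approximate.
budget_gib = 82

fits = {}
for operator_gib in (10, 22):
    fits[operator_gib] = budget_gib // operator_gib
    share = operator_gib / budget_gib
    print(f"{operator_gib} GiB operator: {fits[operator_gib]} fit, "
          f"each uses {share:.0%} of the budget")
```

Under these assumptions only a handful of such operators running concurrently on one worker are enough to exhaust the budget, which matches the failure described above.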
I need guidance on which 477 knobs best reproduce 436’s behavior for this case. Please let me know if more details are required.
436 - Success Stage

477 - Failure Stage

477 - Failure Stage
Tried with join_distribution_type=PARTITIONED, prefer_partial_aggregation=true, and join_max_broadcast_table_size=0B.
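For anyone reproducing this, the settings above can be applied per session with `SET SESSION` statements. A small sketch that builds those statements from the values listed in this post; the helper name `set_session_statements` is hypothetical, and running the statements against a cluster is left to whichever client you use:

```python
def set_session_statements(props):
    """Build Trino SET SESSION statements from a dict of property values.

    Booleans are emitted as bare true/false; everything else is emitted
    as a single-quoted string literal with embedded quotes doubled.
    """
    statements = []
    for name, value in props.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        else:
            literal = "'" + str(value).replace("'", "''") + "'"
        statements.append(f"SET SESSION {name} = {literal}")
    return statements


# Values exactly as listed in the post.
tried = {
    "join_distribution_type": "PARTITIONED",
    "prefer_partial_aggregation": True,
    "join_max_broadcast_table_size": "0B",
}

statements = set_session_statements(tried)
for statement in statements:
    print(statement)
```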