Description
We have occasionally observed workflows gradually accumulating large quantities of memory.
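For anyone who wants to check for the same growth on their own runs, here is a minimal sketch of the kind of external monitoring that would show it, assuming psutil is installed and you have the scheduler PID (e.g. from the workflow's contact file). This is illustrative only, not necessarily how the trace below was produced:

# memory_sampler.py -- illustrative helper, not part of Cylc.
# Samples the resident set size (RSS) of a running scheduler process
# at a fixed interval so growth over time can be plotted.
import sys
import time

import psutil  # assumed to be available

def sample(pid: int, interval_s: float = 60.0) -> None:
    proc = psutil.Process(pid)
    while True:
        rss_mib = proc.memory_info().rss / 1024 ** 2
        print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {rss_mib:.1f} MiB", flush=True)
        time.sleep(interval_s)

if __name__ == "__main__":
    # usage: python memory_sampler.py <scheduler-pid>
    sample(int(sys.argv[1]))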
Reproducible Example
Working with a recent workflow, Dave has managed to extract a reproducible example:
[task parameters]
    origins = ADD, ARN, ATH, ATL, AUH, BAH, BCN, BDA, BER, BEY, BGI, BKK, BLQ, BLR, BOG, BOM, BOS, CAI, CAN, CLT, CMB, CMN, CPH, CPT, CTU, DAC, DEL, DEN, DFW, DOH, DTW, DXB, EWR, EZE, FCO, FNC, GIB, GOT, GRU, GYD, HAN, HEL, HKG, HND, IAD, IAH, ICN, IST, JED, JFK, JNB, KBP, KEF, KUL, KWI, LAX, LIM, LIN, LIS, LOS, MAD, MEX, MIA, MLA, MNL, MUC, NBO, NCE, NQZ, ORD, OSL, OTP, PDL, PEK, PER, PHL, PHX, PRG, PVG, RUH, SCL, SEA, SFO, SIN, SVO, TIA, TLV, TPE, VIE, WAW, YHZ, YUL, YYC, YYZ, ZAG
    mogreps_timesteps = 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96, 102, 108, 114, 120, 126, 132, 138
    ecmwf_timesteps = 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240, 246, 252, 258, 264
    ecmwf_resil_timesteps = 270, 276, 282, 288
    [[templates]]
        origins = _%(origins)s
        mogreps_timesteps = _t+%(mogreps_timesteps)03d
        ecmwf_timesteps = _t+%(ecmwf_timesteps)03d
        ecmwf_resil_timesteps = _t+%(ecmwf_resil_timesteps)03d
[scheduling]
    initial cycle point = previous(T00)
    runahead limit = P0
    [[queues]]
        [[[default]]]
            limit = 5
    [[graph]]
        PT24H = '''
            natseg_res_create => nats_start_00 => nats_start => nats_run_astar_mogreps<origins, mogreps_timesteps>
            nats_start => nats_prepare_ecmwf
            nats_prepare_ecmwf => nats_run_astar_ecmwf<origins, ecmwf_timesteps>
            nats_run_astar_ecmwf<origins, ecmwf_timesteps> & nats_run_astar_mogreps<origins, mogreps_timesteps> => nats_genxml
            nats_run_astar_ecmwf<origins, ecmwf_timesteps> => nats_run_astar_ecmwf_resil<origins, ecmwf_resil_timesteps>
            nats_run_astar_mogreps<origins, mogreps_timesteps> & nats_run_astar_ecmwf_resil<origins, ecmwf_resil_timesteps> => natseg_res_delete
            nats_genxml & nats_run_astar_ecmwf_resil<origins, ecmwf_resil_timesteps> => nats_preprocess_for_archive
            nats_preprocess_for_archive => nats_archiving
            nats_archiving & housekeep[-P1D] => housekeep
            housekeep => archive_logs
        '''
[runtime]
    [[root]]
        run mode = skip
    [[nats_archiving]]
    [[nats_genxml]]
    [[housekeep]]
    [[nats_prepare_ecmwf]]
    [[nats_preprocess_for_archive]]
    [[nats_run_astar_set<origins>]]
    [[nats_run_astar_ecmwf<origins, ecmwf_timesteps>]]
        inherit = nats_run_astar_set<origins>
    [[nats_run_astar_ecmwf_resil<origins, ecmwf_resil_timesteps>]]
        inherit = nats_run_astar_set<origins>
    [[nats_run_astar_mogreps<origins, mogreps_timesteps>]]
        inherit = nats_run_astar_set<origins>
    [[nats_start]]
    [[nats_start_00]]
    [[natseg_res_create]]
    [[natseg_res_delete]]
    [[archive_logs]]

Note: this workflow contains many-to-many triggers, so the number of edges is extreme. It is likely these are triggering the issue (pun intended).
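To put a rough number on "extreme", here is a back-of-the-envelope count of the parameterised tasks and edges generated per cycle, assuming (my reading of Cylc's parameter expansion) that a parameter shared by both sides of a trigger expands in lock-step while unshared parameters expand as a full product; treat the totals as estimates:

# edge_estimate.py -- rough, illustrative numbers only.
N_ORIGINS = 95         # airports listed under [task parameters]
N_MOGREPS = 20         # 24..138 in steps of 6
N_ECMWF = 21           # 144..264 in steps of 6
N_ECMWF_RESIL = 4      # 270..288 in steps of 6

mogreps = N_ORIGINS * N_MOGREPS      # 1900 nats_run_astar_mogreps tasks
ecmwf = N_ORIGINS * N_ECMWF          # 1995 nats_run_astar_ecmwf tasks
resil = N_ORIGINS * N_ECMWF_RESIL    # 380 nats_run_astar_ecmwf_resil tasks

# many-to-many trigger: ecmwf<origins, ecmwf_timesteps> =>
#   ecmwf_resil<origins, ecmwf_resil_timesteps>
# (origins in lock-step, timestep parameters as a full product)
many_to_many = N_ORIGINS * N_ECMWF * N_ECMWF_RESIL   # 7980 edges

# fan-out from nats_start / nats_prepare_ecmwf, plus fan-in to
# nats_genxml, natseg_res_delete and nats_preprocess_for_archive
fan_out = mogreps + ecmwf
fan_in = (ecmwf + mogreps) + (mogreps + resil) + resil

print("parameterised tasks per cycle:", mogreps + ecmwf + resil)          # 4275
print("approx. edges per cycle:", many_to_many + fan_out + fan_in)        # ~18400

So on the order of four thousand tasks and close to twenty thousand edges per cycle, before counting the handful of serial edges.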
Example Results:
Example memory trace:
