Output writers seems not saving anything? #404

Closed
taimoorsohail opened this issue Mar 12, 2025 · 7 comments · Fixed by #410
Labels
bug Something isn't working

Comments

@taimoorsohail
Collaborator

taimoorsohail commented Mar 12, 2025

In multiple examples, the Checkpointer function isn't saving anything beyond iteration 0. A modified near_global_ocean example that demonstrates this is in my branch and PR #401

ocean.output_writers[:checkpoint] = Checkpointer(ocean.model;

But I have been able to recreate this across two examples with both TimeInterval and IterationInterval schedules. @navidcy

@glwagner
Member

The output writer needs to be added to the coupled simulation

@taimoorsohail
Collaborator Author

taimoorsohail commented Mar 12, 2025

Sure, but the code as it's written should work (and was working until yesterday).

I am calling the Checkpointer function from Oceananigans and specifying the ocean model properties only (not the coupled model). So it should be able to checkpoint the ocean properties as usual, which it isn't.

Maybe I'm misunderstanding your diagnosis of the problem!

@glwagner
Member

Ah, but why should the code work? I guess I am proposing that we do not support this. But if we think it is a useful pattern then we can.

Mainly I am concerned that it adds too much complexity to allow output writers attached to "child" simulations. It provides a clearer / simpler view of the whole simulation if all tasks are managed by the "outermost" Simulation.

There could be a "portability" argument for allowing child simulations to carry their own output writers. But I would favor adding this kind of feature later, when we see a clear need.
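To make the parent/child point concrete, here is a minimal toy sketch. It assumes a time-stepping loop that only iterates over the writers of the simulation being run; `ToySimulation` and `toy_time_step!` are hypothetical stand-ins invented for illustration, not Oceananigans or ClimaOcean types. Under that assumption, a writer attached to the child is never called, while one attached to the outermost simulation is.

```julia
# Toy stand-ins for illustration only; these are NOT the Oceananigans/ClimaOcean types.
mutable struct ToySimulation
    name::String
    iteration::Int
    output_writers::Dict{Symbol, Function}
    child::Union{ToySimulation, Nothing}
end

ToySimulation(name; child=nothing) =
    ToySimulation(name, 0, Dict{Symbol, Function}(), child)

# Advance one step. Only writers registered on `sim` itself are called;
# the child is advanced, but its own `output_writers` are never consulted.
function toy_time_step!(sim::ToySimulation)
    if sim.child !== nothing
        sim.child.iteration += 1
    end
    sim.iteration += 1
    for writer in values(sim.output_writers)
        writer(sim)
    end
    return nothing
end

written = String[]

ocean   = ToySimulation("ocean")
coupled = ToySimulation("coupled"; child=ocean)

# One writer on the child, one on the outermost simulation.
ocean.output_writers[:checkpoint]   = sim -> push!(written, "child writer at iteration $(sim.iteration)")
coupled.output_writers[:checkpoint] = sim -> push!(written, "parent writer at iteration $(sim.iteration)")

for _ in 1:3
    toy_time_step!(coupled)
end

foreach(println, written)
```

In this toy, only the "parent writer" lines are recorded even though the child was stepped three times. Whether the real coupled time step should also visit child writers is exactly the design question raised above.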

@glwagner
Member

Ok well I think output is being written at iteration 0 by this line

https://github.com/CliMA/Oceananigans.jl/blob/875a06a0964115c7f6ea1ca14c12ede5272d5ab1/src/Simulations/run.jl#L237

subsequently this should happen in

https://github.com/CliMA/Oceananigans.jl/blob/875a06a0964115c7f6ea1ca14c12ede5272d5ab1/src/Simulations/run.jl#L164-L166

One problem that can arise is if the schedules are not initialized, because

https://github.com/CliMA/Oceananigans.jl/blob/875a06a0964115c7f6ea1ca14c12ede5272d5ab1/src/Utils/schedules.jl#L60

But this should not affect IterationInterval schedules.
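As a rough illustration of why initialization matters for time-based but not iteration-based schedules, here is a toy sketch. The types `ToyClock`, `ToyIterationInterval`, `ToyTimeInterval` and the function `toy_initialize!` are hypothetical simplifications written for this example; they are not the code in `schedules.jl`, and the failure mode shown (firing on every step) is just one possible symptom, not a diagnosis of this issue.

```julia
# Toy clock and schedules; simplified stand-ins, not the Oceananigans source.
mutable struct ToyClock
    time::Float64
    iteration::Int
end

# Stateless: actuation depends only on the current iteration number.
struct ToyIterationInterval
    interval::Int
end
(s::ToyIterationInterval)(clock) = clock.iteration % s.interval == 0

# Stateful: remembers a reference time and how many times it has fired.
mutable struct ToyTimeInterval
    interval::Float64
    first_actuation_time::Float64
    actuations::Int
end
ToyTimeInterval(interval) = ToyTimeInterval(interval, 0.0, 0)

# Anchor the schedule to the clock's current time.
function toy_initialize!(s::ToyTimeInterval, clock)
    s.first_actuation_time = clock.time
    s.actuations = 0
    return nothing
end

function (s::ToyTimeInterval)(clock)
    next_time = s.first_actuation_time + (s.actuations + 1) * s.interval
    if clock.time >= next_time
        s.actuations += 1
        return true
    end
    return false
end

# Step a clock that starts at a nonzero time and count how often `schedule` fires.
function count_actuations(schedule; start_time=100.0, Δt=10.0, steps=4)
    clock = ToyClock(start_time, 0)
    fired = 0
    for _ in 1:steps
        clock.time += Δt
        clock.iteration += 1
        fired += schedule(clock) ? 1 : 0
    end
    return fired
end

n_iteration     = count_actuations(ToyIterationInterval(2))
n_uninitialized = count_actuations(ToyTimeInterval(20))

initialized = ToyTimeInterval(20)
toy_initialize!(initialized, ToyClock(100.0, 0))
n_initialized = count_actuations(initialized)

println((n_iteration, n_uninitialized, n_initialized))
```

With a clock starting at t = 100 and Δt = 10, the iteration-based toy schedule fires twice in four steps regardless of initialization. The initialized time-based schedule also fires twice, but the uninitialized one fires on all four steps because its reference time is still 0.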

@navidcy
Member

navidcy commented Mar 12, 2025

The output writer needs to be added to the coupled simulation

The near-global example adds the output writer to the child ocean simulation:

outputs = merge(ocean.model.tracers, ocean.model.velocities)
ocean.output_writers[:surface] = JLD2OutputWriter(ocean.model, outputs;
                                                  schedule = TimeInterval(1days),
                                                  filename = "near_global_surface_fields",
                                                  indices = (:, :, grid.Nz),
                                                  with_halos = true,
                                                  overwrite_existing = true,
                                                  array_type = Array{Float32})

That's why I was under the impression that

ocean.output_writers[:checkpoint] = Checkpointer(ocean.model;
                                                 schedule = IterationInterval(2),
                                                 prefix = prefix,
                                                 dir = output_dir,
                                                 verbose = true,
                                                 overwrite_existing = true)

should work as well.

Thanks @glwagner for pointing to where the callbacks are initialized/triggered. That's helpful!

@navidcy
Member

navidcy commented Mar 13, 2025

I think the problem is deeper. It seems that none of the output writers are being called anymore.

Here's a MWE:

using ClimaOcean
using Oceananigans
using CFTime
using Dates
using Printf

arch = CPU()

Nx, Ny, Nz = 144, 60, 40

z_faces = exponential_z_faces(; Nz, depth=6000)

grid = LatitudeLongitudeGrid(arch;
                             size = (Nx, Ny, Nz),
                             halo = (7, 7, 7),
                             z = z_faces,
                             latitude  = (-75, 75),
                             longitude = (0, 360))

ocean = ocean_simulation(grid)

radiation = Radiation(arch)

atmosphere = JRA55PrescribedAtmosphere(arch; backend=JRA55NetCDFBackend(41))

coupled_model = OceanSeaIceModel(ocean; atmosphere, radiation)

simulation = Simulation(coupled_model; Δt=10, stop_iteration=10)

wall_time = Ref(time_ns())

function progress(sim)
    ocean = sim.model.ocean
    u, v, w = ocean.model.velocities
    T = ocean.model.tracers.T

    Tmax = maximum(interior(T))
    Tmin = minimum(interior(T))

    umax = (maximum(abs, interior(u)),
            maximum(abs, interior(v)),
            maximum(abs, interior(w)))

    step_time = 1e-9 * (time_ns() - wall_time[])

    msg = @sprintf("Iter: %d, simulation time: %s, atmosphere time: %s, Δt: %s", iteration(sim), prettytime(sim), prettytime(atmosphere.clock.time), prettytime(sim.Δt))
    msg *= @sprintf(", max|u|: (%.2e, %.2e, %.2e) m s⁻¹, extrema(T): (%.2f, %.2f) ᵒC, wall time: %s",
                    umax..., Tmax, Tmin, prettytime(step_time))

    @info msg

    wall_time[] = time_ns()
end

simulation.callbacks[:progress] = Callback(progress, IterationInterval(1))

outputs = merge(ocean.model.tracers, ocean.model.velocities)

ocean.output_writers[:surface] = JLD2OutputWriter(ocean.model, outputs;
                                                  schedule = IterationInterval(2),
                                                  filename = "mwe_surface",
                                                  indices = (:, :, grid.Nz),
                                                  with_halos = true,
                                                  overwrite_existing = true,
                                                  array_type = Array{Float32})


run!(simulation)

ut = FieldTimeSeries("mwe_surface.jld2", "u"; backend = OnDisk())
ut

With these versions of the dependencies:

(ClimaOcean) pkg> st
Project ClimaOcean v0.5.1
Status `~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/ClimaOcean.jl-v1/Project.toml`
  [79e6a3ab] Adapt v4.2.0
  [179af706] CFTime v0.1.4
  [052768ef] CUDA v5.7.0
  [6ba0ff68] ClimaSeaIce v0.2.4
  [9c784101] CubicSplines v0.2.1
  [124859b0] DataDeps v0.7.13
  [787d08f9] ImageMorphology v0.4.5
  [033835bb] JLD2 v0.5.11
  [63c18a36] KernelAbstractions v0.9.34
  [da04e1cc] MPI v0.20.22
  [85f8d34a] NCDatasets v0.14.6
  [9e8cae18] Oceananigans v0.95.26
  [6fe1bfb0] OffsetArrays v1.15.0
  [6c6a2e73] Scratch v1.2.1
  [d496a93d] SeawaterPolynomials v0.3.5
  [90137ffa] StaticArrays v1.9.13
  [49b00bb7] SurfaceFluxes v0.12.0
  [b60c26fb] Thermodynamics v0.12.9
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [de0858da] Printf
  [10745b16] Statistics v1.10.0

I get:

julia> include("mwe.jl")
[ Info: Oceananigans will use 12 threads
┌ Warning: Are you totally, 100% sure that you want to build a simulation on
│ 
│ 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
│ 
│ rather than on an ImmersedBoundaryGrid?
└ @ ClimaOcean.OceanSimulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/ClimaOcean.jl-v1/src/OceanSimulations/ocean_simulation.jl:142
[ Info: Initializing simulation...
[ Info: Iter: 0, simulation time: 0 seconds, atmosphere time: 0 seconds, Δt: 10 seconds, max|u|: (0.00e+00, 0.00e+00, 0.00e+00) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 1.032 minutes
[ Info:     ... simulation initialization complete (17.417 seconds)
[ Info: Executing initial time step...
[ Info: Iter: 1, simulation time: 10 seconds, atmosphere time: 10 seconds, Δt: 10 seconds, max|u|: (3.44e-03, 1.79e-03, 9.13e-06) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 13.145 seconds
[ Info:     ... initial time step complete (13.144 seconds).
[ Info: Iter: 2, simulation time: 20 seconds, atmosphere time: 20 seconds, Δt: 10 seconds, max|u|: (6.85e-03, 2.93e-03, 1.82e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 48.280 ms
[ Info: Iter: 3, simulation time: 30 seconds, atmosphere time: 30 seconds, Δt: 10 seconds, max|u|: (1.02e-02, 4.38e-03, 2.73e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 38.134 ms
[ Info: Iter: 4, simulation time: 40 seconds, atmosphere time: 40 seconds, Δt: 10 seconds, max|u|: (1.35e-02, 5.81e-03, 3.63e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 38.186 ms
[ Info: Iter: 5, simulation time: 50 seconds, atmosphere time: 50 seconds, Δt: 10 seconds, max|u|: (1.67e-02, 7.23e-03, 4.52e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 38.771 ms
[ Info: Iter: 6, simulation time: 1 minute, atmosphere time: 1 minute, Δt: 10 seconds, max|u|: (1.99e-02, 8.63e-03, 5.40e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 38.415 ms
[ Info: Iter: 7, simulation time: 1.167 minutes, atmosphere time: 1.167 minutes, Δt: 10 seconds, max|u|: (2.31e-02, 1.00e-02, 6.26e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 39.706 ms
[ Info: Iter: 8, simulation time: 1.333 minutes, atmosphere time: 1.333 minutes, Δt: 10 seconds, max|u|: (2.61e-02, 1.14e-02, 7.11e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 38.216 ms
[ Info: Iter: 9, simulation time: 1.500 minutes, atmosphere time: 1.500 minutes, Δt: 10 seconds, max|u|: (2.91e-02, 1.27e-02, 7.95e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 37.797 ms
[ Info: Simulation is stopping after running for 0 seconds.
[ Info: Model iteration 10 equals or exceeds stop iteration 10.
[ Info: Iter: 10, simulation time: 1.667 minutes, atmosphere time: 1.667 minutes, Δt: 10 seconds, max|u|: (3.21e-02, 1.40e-02, 8.76e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 96.851 ms
144×60×1×1 FieldTimeSeries{OnDisk} located at (Face, Center, Center) of u at mwe_surface.jld2
├── grid: 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
├── indices: (:, :, 40:40)
├── time_indexing: Linear()
├── backend: OnDisk
├── path: mwe_surface.jld2
└── name: u

julia> ut
144×60×1×1 FieldTimeSeries{OnDisk} located at (Face, Center, Center) of u at mwe_surface.jld2
├── grid: 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
├── indices: (:, :, 40:40)
├── time_indexing: Linear()
├── backend: OnDisk
├── path: mwe_surface.jld2
└── name: u

So only 1 output? And it's even empty if you look at it!

So it seems that no output was written. I don't know why yet; I'm trying to boil it down. I see that PR CliMA/Oceananigans.jl#4096 changed run.jl and in particular:

https://github.com/CliMA/Oceananigans.jl/pull/4096/files#diff-9c7c509c81ca650f05f42a5614d43df2f0eba09198630f6c949c7518e3fcc79cR236

the for writer in values(sim.output_writers) loop within time_step!(simulation).

@navidcy changed the title from "Checkpointer not saving past iteration 0" to "Output writers seems not saving anything?" on Mar 13, 2025
@navidcy
Member

navidcy commented Mar 13, 2025

Using ClimaOcean v0.4.6 and Oceananigans v0.95.20 seems to work OK. I think the Oceananigans version (and the changes in run!?) is the main culprit here.

(ClimaOcean) pkg> st
Project ClimaOcean v0.4.6
Status `~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/ClimaOcean.jl-v1/Project.toml`
  [79e6a3ab] Adapt v4.2.0
  [179af706] CFTime v0.1.4
  [052768ef] CUDA v5.7.0
⌃ [6ba0ff68] ClimaSeaIce v0.2.3
  [9c784101] CubicSplines v0.2.1
  [124859b0] DataDeps v0.7.13
  [787d08f9] ImageMorphology v0.4.5
  [033835bb] JLD2 v0.5.11
  [63c18a36] KernelAbstractions v0.9.34
  [da04e1cc] MPI v0.20.22
  [85f8d34a] NCDatasets v0.14.6
⌃ [9e8cae18] Oceananigans v0.95.20
  [6fe1bfb0] OffsetArrays v1.15.0
  [c2be9673] OrthogonalSphericalShellGrids v0.2.2
  [6c6a2e73] Scratch v1.2.1
  [d496a93d] SeawaterPolynomials v0.3.5
  [90137ffa] StaticArrays v1.9.13
  [49b00bb7] SurfaceFluxes v0.12.0
  [b60c26fb] Thermodynamics v0.12.9
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [de0858da] Printf
  [10745b16] Statistics v1.10.0
Info Packages marked with ⌃ have new versions available and may be upgradable.
julia> include("mwe.jl")
┌ Warning: Are you totally, 100% sure that you want to build a simulation on
│ 
│ 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
│ 
│ rather than on an ImmersedBoundaryGrid?
└ @ ClimaOcean.OceanSimulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/ClimaOcean.jl-v1/src/OceanSimulations/ocean_simulation.jl:142
[ Info: Initializing simulation...
[ Info: Iter: 0, simulation time: 0 seconds, atmosphere time: 0 seconds, Δt: 10 seconds, max|u|: (0.00e+00, 0.00e+00, 0.00e+00) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 31.368 seconds
[ Info:     ... simulation initialization complete (534.174 ms)
[ Info: Executing initial time step...
[ Info: Iter: 1, simulation time: 10 seconds, atmosphere time: 10 seconds, Δt: 10 seconds, max|u|: (3.44e-03, 1.79e-03, 9.13e-06) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 32.678 seconds
[ Info:     ... initial time step complete (32.676 seconds).
[ Info: Iter: 2, simulation time: 20 seconds, atmosphere time: 20 seconds, Δt: 10 seconds, max|u|: (6.85e-03, 2.93e-03, 1.82e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 44.156 ms
[ Info: Iter: 3, simulation time: 30 seconds, atmosphere time: 30 seconds, Δt: 10 seconds, max|u|: (1.02e-02, 4.38e-03, 2.73e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 40.756 ms
[ Info: Iter: 4, simulation time: 40 seconds, atmosphere time: 40 seconds, Δt: 10 seconds, max|u|: (1.35e-02, 5.81e-03, 3.63e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 42.690 ms
[ Info: Iter: 5, simulation time: 50 seconds, atmosphere time: 50 seconds, Δt: 10 seconds, max|u|: (1.67e-02, 7.23e-03, 4.52e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 41.076 ms
[ Info: Iter: 6, simulation time: 1 minute, atmosphere time: 1 minute, Δt: 10 seconds, max|u|: (1.99e-02, 8.63e-03, 5.40e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 42.294 ms
[ Info: Iter: 7, simulation time: 1.167 minutes, atmosphere time: 1.167 minutes, Δt: 10 seconds, max|u|: (2.31e-02, 1.00e-02, 6.26e-05) m s⁻¹, extrema(T): (0.00, 0.00) ᵒC, wall time: 40.524 ms
[ Info: Iter: 8, simulation time: 1.333 minutes, atmosphere time: 1.333 minutes, Δt: 10 seconds, max|u|: (2.61e-02, 1.14e-02, 7.11e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 43.584 ms
[ Info: Iter: 9, simulation time: 1.500 minutes, atmosphere time: 1.500 minutes, Δt: 10 seconds, max|u|: (2.91e-02, 1.27e-02, 7.94e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 40.812 ms
[ Info: Simulation is stopping after running for 0 seconds.
[ Info: Model iteration 10 equals or exceeds stop iteration 10.
[ Info: Iter: 10, simulation time: 1.667 minutes, atmosphere time: 1.667 minutes, Δt: 10 seconds, max|u|: (3.21e-02, 1.40e-02, 8.75e-05) m s⁻¹, extrema(T): (0.01, 0.00) ᵒC, wall time: 43.526 ms
144×60×1×6 FieldTimeSeries{OnDisk} located at (Face, Center, Center) of u at mwe_surface.jld2
├── grid: 144×60×40 LatitudeLongitudeGrid{Float64, Periodic, Bounded, Bounded} on CPU with 7×7×7 halo and with precomputed metrics
├── indices: (:, :, 40:40)
├── time_indexing: Linear()
├── backend: OnDisk
├── path: mwe_surface.jld2
└── name: u

I get a 144×60×1×6 FieldTimeSeries, i.e. 6 outputs, as expected.

@navidcy added the bug label on Mar 13, 2025