Skip to content

Bad interaction between PP file and lazy aggregated_by #6464

Open
@bjlittle

Description

@bjlittle

🐛 Bug Report

The below script shows a difference between lazy aggregated_by and realized (eager) aggregated_by.

With py37, iris=3.0.3 and dask=2021.06.0 real and lazy give the same result.

With py311, iris>=3.9.0 and dask>=2024.5.0 they are very different at the end of the time series. The lazy version is wrong. I have plots generated with the release candidate for Iris 3.4, where there was no problem.

One more thing I tried was to recreate the lazy cube by touching the data, and then making it lazy again i.e., hovmollers.data = hovmollers.lazy_data(). The problem went away.

Thanks,

import datetime

import cf_units
import numpy as np
import iris
import iris.coord_categorisation as cat
import iris.quickplot as qplt

def pp_callback(cube, field, filename):
    """
    Hovmollers saved by IDL as pp have time headers not understood by Iris,
    and wrong units.
    """
    if filename.endswith('.pp'):
        # Figure out times.
        TIME_UNIT = cf_units.Unit('hours since 1970-01-01 00:00:00', 'gregorian')
        first_date = datetime.datetime(field.lbyr, field.lbmon, field.lbdat)
        last_date = datetime.datetime(field.lbyrd, field.lbmond, field.lbdatd)
        first_bound = TIME_UNIT.date2num(first_date)
        last_bound = TIME_UNIT.date2num(last_date)
        n_bounds = (last_date - first_date).days // 5 + 1
        all_bounds = np.linspace(first_bound, last_bound, n_bounds)
        bounds = np.vstack((all_bounds[:-1], all_bounds[1:])).T
        points = bounds.mean(axis=1)

        time_coord = iris.coords.DimCoord(
            points, bounds=bounds, standard_name='time', units=TIME_UNIT)
        cube.add_dim_coord(time_coord, 0)

        # Fix units.
        cube.units = 'mm day-1'

hovmollers = iris.load(
    'gpcp_pentadal_v2.2_1979_2013.pp',
    callback=pp_callback).concatenate_cube()

cat.add_day_of_month(hovmollers, 'time')
cat.add_month_number(hovmollers, 'time')
lazy_stdev = hovmollers.aggregated_by(['month_number', 'day_of_month'], iris.analysis.STD_DEV)
hovmollers.data
real_stdev = hovmollers.aggregated_by(['month_number', 'day_of_month'], iris.analysis.STD_DEV)
diff = lazy_stdev - real_stdev

print(diff.data.max())
qplt.contourf(diff)
qplt.show()

This issue was raised on behalf of @rcomer 👍

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions