elfkuzco
Collaborator

Rationale

The task files schema was typed as dict[str, Any]. As a result, the created_timestamp and uploaded_timestamp values nested inside it, which the custom zimfarm_loads function deserializes into datetime objects, were not picked up by the API datetime serializer:

def serialize_datetime(value: datetime.datetime) -> str:
    """Serialize datetime with 'Z' suffix for naive datetimes"""
    if value.tzinfo is None:
        return value.isoformat(timespec="seconds") + "Z"
    return value.isoformat(timespec="seconds")


class BaseModel(pydantic.BaseModel):
    model_config = ConfigDict(
        use_enum_values=True,
        from_attributes=True,
        populate_by_name=True,
        json_encoders={datetime.datetime: serialize_datetime},
        serialize_by_alias=True,
    )

This in turn meant that when the UI computes the duration between an event and the creation of a file, it compares a naive datetime string (without the Z suffix) to the event's datetime string, which has the Z suffix. This leads to no data being shown for Created After on the UI, as the end time is greater than the start time. This is the cause of #1416 and is evident in other task details too.
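A minimal sketch of the mismatch, using only the standard library. The timestamp and the `parse` helper are made up for illustration; `serialize_datetime` is copied from the snippet above. Python refuses to subtract a naive datetime from an aware one, whereas a browser UI that parses with the built-in JavaScript `Date` would typically read the suffix-less string as local time and silently produce a skewed duration.

```python
import datetime


def serialize_datetime(value: datetime.datetime) -> str:
    """Serialize datetime with 'Z' suffix for naive datetimes (as in the PR)."""
    if value.tzinfo is None:
        return value.isoformat(timespec="seconds") + "Z"
    return value.isoformat(timespec="seconds")


def parse(text: str) -> datetime.datetime:
    """Hypothetical helper: parse an ISO 8601 string, treating 'Z' as UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(text)


# Illustrative naive timestamp, standing in for a value stored in the DB.
created = datetime.datetime(2025, 10, 14, 15, 46, 0)

# A value nested in dict[str, Any] bypasses the custom encoder: no suffix.
nested_value = created.isoformat(timespec="seconds")
# An event timestamp goes through the custom encoder: 'Z' suffix.
event_value = serialize_datetime(created)

start = parse(event_value)  # timezone-aware
end = parse(nested_value)   # naive

mismatch = None
try:
    duration = end - start
except TypeError as exc:
    # Naive and aware datetimes cannot be subtracted from each other.
    mismatch = str(exc)

print(nested_value, event_value, mismatch)
```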

By defining a model for the task files, Pydantic can detect that these fields are datetime objects and serialize them accordingly. On the UI, both datetime strings passed to the formatBetween function are then timezone-aware.

Changes

  • Define a Pydantic schema for Task files, with defaults set to None because the data in the DB may be null and we don't want validation errors.
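The change can be sketched roughly as follows. This is not the schema from the PR: the `TaskFile` fields other than `created_timestamp` and `uploaded_timestamp` are assumptions, the `TypedTask` and `UntypedTask` names are hypothetical, and the sketch swaps the `json_encoders` config for Pydantic v2's `PlainSerializer` so that it stays self-contained.

```python
import datetime
from typing import Annotated, Any, Optional

from pydantic import BaseModel, PlainSerializer


def serialize_datetime(value: datetime.datetime) -> str:
    """Serialize datetime with 'Z' suffix for naive datetimes (as in the PR)."""
    if value.tzinfo is None:
        return value.isoformat(timespec="seconds") + "Z"
    return value.isoformat(timespec="seconds")


# Assumption: attach the serializer to the type instead of using json_encoders.
ZDatetime = Annotated[
    datetime.datetime,
    PlainSerializer(serialize_datetime, return_type=str, when_used="json"),
]


class TaskFile(BaseModel):
    """Hypothetical file entry; every field defaults to None to tolerate gaps."""

    name: Optional[str] = None
    size: Optional[int] = None
    created_timestamp: Optional[ZDatetime] = None
    uploaded_timestamp: Optional[ZDatetime] = None


class TypedTask(BaseModel):
    files: dict[str, TaskFile] = {}


class UntypedTask(BaseModel):
    files: dict[str, Any] = {}


raw_files = {
    "a.zim": {
        "name": "a.zim",
        "created_timestamp": datetime.datetime(2025, 10, 14, 15, 46, 0),
    }
}

typed = TypedTask.model_validate({"files": raw_files}).model_dump(mode="json")
untyped = UntypedTask(files=raw_files).model_dump(mode="json")

print(typed["files"]["a.zim"]["created_timestamp"])
print(untyped["files"]["a.zim"]["created_timestamp"])
```

With the typed model the nested timestamp goes through the custom serializer; with `dict[str, Any]` it falls back to Pydantic's default datetime serialization.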

This fixes #1416

@elfkuzco elfkuzco requested a review from benoit74 October 14, 2025 15:46
@elfkuzco elfkuzco self-assigned this Oct 14, 2025
@elfkuzco elfkuzco linked an issue Oct 14, 2025 that may be closed by this pull request

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 76.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.49%. Comparing base (0447df9) to head (f510ccd).

Files with missing lines Patch % Lines
backend/src/zimfarm_backend/common/external.py 12.50% 7 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                     @@
##           recipe-similarity-data    #1421      +/-   ##
==========================================================
+ Coverage                   82.41%   82.49%   +0.08%     
==========================================================
  Files                          79       79              
  Lines                        3918     3937      +19     
  Branches                      431      431              
==========================================================
+ Hits                         3229     3248      +19     
  Misses                        570      570              
  Partials                      119      119              


@benoit74
Collaborator

Did you validate that this is going to work well on all the tasks we have in the DB? We should try to read all of them and check that we do not get validation errors and still get all files as expected. I'm a bit worried about the impact of this change should some data be wrong in the DB.
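One way to run such a check is sketched below, assuming the task rows have already been loaded from the dump as plain dicts. The `TaskFile` fields, the `TaskFiles` wrapper, the `find_invalid_tasks` helper and the sample rows are all hypothetical, not taken from the PR.

```python
import datetime
from typing import Any, Optional

from pydantic import BaseModel, ValidationError


class TaskFile(BaseModel):
    """Hypothetical file entry with permissive defaults."""

    name: Optional[str] = None
    size: Optional[int] = None
    created_timestamp: Optional[datetime.datetime] = None
    uploaded_timestamp: Optional[datetime.datetime] = None


class TaskFiles(BaseModel):
    files: dict[str, TaskFile]


def find_invalid_tasks(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Return a mapping of task id to number of validation errors."""
    failures: dict[str, int] = {}
    for row in rows:
        try:
            TaskFiles.model_validate({"files": row["files"]})
        except ValidationError as exc:
            failures[str(row["id"])] = len(exc.errors())
    return failures


# Made-up rows standing in for the result of a SELECT on the task table.
rows = [
    {"id": "task-1", "files": {"a.zim": {"name": "a.zim", "size": 10}}},
    {"id": "task-2", "files": None},
    {"id": "task-3", "files": {"b.zim": {"size": "not a number"}}},
]

failures = find_invalid_tasks(rows)
print(failures)
```

Collecting failures instead of stopping at the first one gives a full list of task ids to inspect after a single pass over the dump.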

Do you want a fresh data export with the bug from #1416 to be sure to reproduce the issue?

@elfkuzco
Collaborator Author

Yeah, I did. Tried to default to None for most fields I wasn't sure of and mirrored it closely to the same model on zimit-frontend. You could send me a fresh dump just to double-check.

@benoit74
Collaborator

Just sent you a new dump through Slack

@elfkuzco elfkuzco force-pushed the recipe-similarity-data branch from 4897902 to 0447df9 Compare October 16, 2025 13:38
@elfkuzco
Collaborator Author

Ran the migrations and it works for the files schema of existing tasks in the db

@elfkuzco
Collaborator Author

However, running the model validation against the entries in the dump exposes some issues in the data. The columns files, container and debug should not be nullable, but some rows contain null in them, for example task eefedc7a-e312-455d-8867-661ec0fe05c5.

A SQL script should probably be used to coalesce these values to {} as they violate data integrity.

@benoit74
Collaborator

Can you prepare such an SQL script to coalesce these values to {} as they violate data integrity? And check when the last task with this problem was created (to confirm these are just old tasks and we are not still creating tasks with this problem).

@elfkuzco
Collaborator Author

The new code already sets them to {} when creating the task. Only the old tasks violate this:

task = Task(
    updated_at=requested_task.updated_at,
    events=requested_task.events,
    debug={},
    status=requested_task.status,
    timestamp=requested_task.timestamp,
    requested_by=requested_task.requested_by,
    canceled_by=None,
    container={},
    priority=requested_task.priority,
    config=requested_task.config.model_dump(
        mode="json", context={"show_secrets": True}
    ),
    notification=(
        requested_task.notification.model_dump(mode="json")
        if requested_task.notification
        else {}
    ),
    files={},
    upload=requested_task.upload,
    original_schedule_name=requested_task.original_schedule_name,
    context=requested_task.context,
)

@elfkuzco
Collaborator Author

elfkuzco commented Oct 17, 2025

UPDATE task SET debug = '{}'::jsonb WHERE debug IS NULL OR debug = 'null'::jsonb;
UPDATE task SET container = '{}'::jsonb WHERE container IS NULL OR container = 'null'::jsonb;
UPDATE task SET notification = '{}'::jsonb WHERE notification IS NULL OR notification = 'null'::jsonb;
UPDATE task SET files = '{}'::jsonb WHERE files IS NULL OR files = 'null'::jsonb;

@elfkuzco
Collaborator Author

From the dump, the oldest task that has this fault was last updated in 2023.
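A check of this kind could be sketched as below, assuming the rows were loaded from the dump into dicts in which both SQL NULL and JSON null arrive as None. The `faulty_update_range` helper and the sample rows are hypothetical; the column names come from the UPDATE statements above.

```python
import datetime
from typing import Any, Optional

JSON_COLUMNS = ("debug", "container", "notification", "files")


def faulty_update_range(
    rows: list[dict[str, Any]],
) -> Optional[tuple[datetime.datetime, datetime.datetime]]:
    """Return (oldest, newest) updated_at among rows with a null JSON column."""
    faulty = [
        row["updated_at"]
        for row in rows
        if any(row[column] is None for column in JSON_COLUMNS)
    ]
    if not faulty:
        return None
    return min(faulty), max(faulty)


# Made-up rows standing in for the task table of the dump.
rows = [
    {"updated_at": datetime.datetime(2023, 3, 1), "debug": None,
     "container": {}, "notification": {}, "files": {}},
    {"updated_at": datetime.datetime(2023, 6, 9), "debug": {},
     "container": None, "notification": {}, "files": None},
    {"updated_at": datetime.datetime(2025, 10, 1), "debug": {},
     "container": {}, "notification": {}, "files": {}},
]

update_range = faulty_update_range(rows)
print(update_range)
```

If the newest value in the range is well in the past, that supports the claim that only old tasks are affected.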

@elfkuzco elfkuzco force-pushed the recipe-similarity-data branch 5 times, most recently from 9e8caae to eef27e9 Compare October 17, 2025 15:14
Base automatically changed from recipe-similarity-data to main October 17, 2025 15:20

Development

Successfully merging this pull request may close these issues.

"Created After" value is missing in task details

2 participants