define model for task files #1421
Conversation
Codecov Report: ❌ Patch coverage is …

@@              Coverage Diff               @@
##    recipe-similarity-data    #1421    +/- ##
===============================================
+ Coverage      82.41%    82.49%    +0.08%
===============================================
  Files             79        79
  Lines           3918      3937       +19
  Branches         431       431
===============================================
+ Hits            3229      3248       +19
  Misses           570       570
  Partials         119       119
Did you validate that this is going to work well on all tasks we have in the DB? We should try to read all of them and check that we do not get validation errors and still get all files as expected. I'm a bit worried about the impact of this change should some data be wrong in the DB. Do you want a fresh data export with the bug from #1416 to be sure to reproduce the issue?
Yeah, I did. I tried to default to None for most fields I wasn't sure of and mirrored the model closely on the same one in zimit-frontend. You could send me a fresh dump just to double-check.
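For what it's worth, such a sweep over a dump can be scripted along these lines (a rough sketch under assumed table and field names, not code from this PR):

```python
# Rough sketch of a validation sweep over a DB dump; NOT code from this PR.
# `FileEntry` is a stand-in for the new model and its field names are guesses;
# the query assumes a `task` table with `id` and JSONB `files` columns.
from datetime import datetime

from pydantic import BaseModel, ConfigDict, ValidationError
from sqlalchemy import create_engine, text


class FileEntry(BaseModel):
    """Stand-in per-file model; unknown keys are tolerated."""

    model_config = ConfigDict(extra="allow")

    name: str | None = None
    created_timestamp: datetime | None = None
    uploaded_timestamp: datetime | None = None


def sweep(db_url: str) -> None:
    """Report every file entry that fails validation against FileEntry."""
    engine = create_engine(db_url)
    with engine.connect() as conn:
        for task_id, files in conn.execute(text("SELECT id, files FROM task")):
            for filename, entry in (files or {}).items():
                try:
                    FileEntry.model_validate(entry)
                except ValidationError as exc:
                    print(f"task {task_id}, file {filename}: {exc}")
```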
Just sent you a new dump through Slack.
Force-pushed from 4897902 to 0447df9.
Force-pushed from bb4b6f7 to f510ccd.
Ran the migrations and it works for the files schema of existing tasks in the DB.
Force-pushed from f510ccd to ab4cb00.
However, running the model validation against the entries in the dump exposes some issues in the DB dump: the debug, container, notification and files columns are NULL (or JSON null) for some tasks. A SQL script should probably be used to coalesce these values to {}.
Can you prepare such an SQL script to coalesce these values to {}, since they violate data integrity? And can you check when the last task with this problem was created, to confirm these are just old tasks and we are not still creating tasks with the problem?
New code currently sets them to {}; see zimfarm/backend/src/zimfarm_backend/db/tasks.py, lines 220 to 242 in 19915be.
-- Coalesce SQL NULL and JSON null values to empty JSON objects
UPDATE task SET debug = '{}'::jsonb WHERE debug IS NULL OR debug = 'null'::jsonb;
UPDATE task SET container = '{}'::jsonb WHERE container IS NULL OR container = 'null'::jsonb;
UPDATE task SET notification = '{}'::jsonb WHERE notification IS NULL OR notification = 'null'::jsonb;
UPDATE task SET files = '{}'::jsonb WHERE files IS NULL OR files = 'null'::jsonb;
From the dump, the oldest task with this fault was last updated in 2023.
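That check can be scripted along these lines (a sketch; the `updated_at` column name is an assumption):

```python
# Sketch, not code from this PR: report the most recent update among tasks
# whose JSONB columns are SQL NULL or JSON null, to confirm only old tasks
# are affected. Assumes the `task` table has an `updated_at` column.
from sqlalchemy import create_engine, text

LATEST_AFFECTED = text(
    """
    SELECT max(updated_at)
    FROM task
    WHERE debug IS NULL OR debug = 'null'::jsonb
       OR container IS NULL OR container = 'null'::jsonb
       OR notification IS NULL OR notification = 'null'::jsonb
       OR files IS NULL OR files = 'null'::jsonb
    """
)


def latest_affected_task(db_url: str):
    """Return the latest `updated_at` among tasks with null-ish JSONB columns."""
    with create_engine(db_url).connect() as conn:
        return conn.execute(LATEST_AFFECTED).scalar_one()
```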
Force-pushed from 9e8caae to eef27e9.
Rationale

The task files schema was a dict[str, Any]. This meant that, deep within it, the created_timestamp and uploaded_timestamp fields, which are deserialized into datetime objects by the custom zimfarm_loads function, were not detected by the API datetime serializer (zimfarm/backend/src/zimfarm_backend/common/schemas/__init__.py, lines 8 to 23 in 0b2638b).

This further meant that when we want to compute the duration between when a file was created and a task event, a naive datetime (without the Z suffix) is compared to the event's datetime, which has the Z suffix. This leads to no data being shown for Created After on the UI, as the end time is greater than the start time. This is the cause of #1416 and is evident in other task details too.

By building a model of the task files, Pydantic can detect that the field is a datetime object and serialize it accordingly. This means that on the UI, when the formatBetween function is called, both datetime strings are timezone-aware.
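As an illustration of that point, a model along these lines (a sketch with assumed field names, not necessarily the exact model added here) lets Pydantic emit timezone-aware ISO 8601 timestamps:

```python
# Illustrative sketch only; the model in this PR may differ. Once the
# timestamps are declared as `datetime`, Pydantic serializes them as ISO 8601
# strings instead of passing through whatever the dict[str, Any] contained.
from datetime import datetime, timezone

from pydantic import BaseModel


class TaskFile(BaseModel):
    """One entry of a task's `files` mapping (hypothetical shape)."""

    name: str
    size: int | None = None
    status: str | None = None
    created_timestamp: datetime | None = None
    uploaded_timestamp: datetime | None = None


file = TaskFile(
    name="example.zim",
    created_timestamp=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(file.model_dump_json())
# e.g. {"name":"example.zim","size":null,"status":null,
#       "created_timestamp":"2025-01-01T12:00:00Z","uploaded_timestamp":null}
```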
Changes

This fixes #1416.