Skip to content

Conversation

bruntib
Copy link
Contributor

@bruntib bruntib commented Sep 15, 2025

No description provided.

Separate the previously blocking execution of `massStoreRun()`, which
was done in the context of the "API handler process", into a
"foreground" and a "background" part, exploiting the previously
implemented background task library support.

The foreground part remains executed in the context of the API handler
process, and deals with receiving and unpacking the to-be-stored data,
saving configuration and checking constraints that are cheap to check.
The foreground part can report issues synchronously to the client.

Everything else that was part of the previous `massStoreRun()` pipeline,
as implemented by the `mass_store_run.MassStoreRun` class becomes a
background task, such as the parsing of the uploaded reports and the
storing of data to the database.
This background task, implemented using the new library, executes in a
separate background worker process, and can not communicate directly
with the user.
Errors are logged to the `comments` fields.

The `massStoreRun()` API handler will continue to work as previously,
and block while waiting for the background task to terminate.
In case of an error, it synchronously reports a `RequestFailed` exception,
passing the `comments` field (into which the background process had
written the exception details) to the client.

Due to the inability for most of the exceptions previously caused in
`MassStoreRun` to "escape" as `RequestFailed`s, some parts of the API
had been deprecated and removed.
Namely, `ErrorCode.SOURCE_FILE` and `ErrorCode.REPORT_FORMAT` are no
longer sent over the API.
This does not break existing behaviour and does not cause an
incompatibility with clients: in cases where the request exceptions were
raised earlier, now a different type of exception is raised, but the
error message still precisely explains the problem as it did previously.
Even though commit d915473 introduced a
socket-level TCP keepalive support into the server's implementation,
this was observed multiple times to not be enough to
**deterministically** fix the issues with the `CodeChecker store` client
hanging indefinitely when the server takes a long time processing the
to-be-stored data.
The underlying reasons are not entirely clear and the issue only pops
up sporadically, but we did observe a few similar scenarios (such as
multi-million report storage from analysing LLVM and then storing
between datacentres) where it almost reliably reproduces.
The symptoms (even with a configure `kepalive`) generally include the
server not becoming notified about the client's disconnect, while the
client process is hung on a low-level system call `read(4, ...)`, trying
to get the Thrift response of `massStoreRun()` from the HTTP socket.
Even if the server finishes the storage processing "in time" and sent
the Thrift reply, it never reaches the client, which means it never
exits from the waiting, which means it keeps either the terminal or,
worse, a CI script occupied, blocking execution.

This is the "more proper solution" foreshadowed in
commit 15af7d8.

Implemented the server-side logic to spawn a `MassStoreRun` task and
return its token, giving the `massStoreRunAsynchronous()` API call full
force.

Implemented the client-side logic to use the new `task_client` module
and the same logic as
`CodeChecker cmd serverside-tasks --await --token TOKEN...`
to poll the server for the task's completion and status.
After the introduction of async store, massStoreRun was implemented in
a way that it is polling the task ID in every 5 seconds. This makes the
text execution too long.

In this commit a pipe has been added to the abstract task so it can
directly notify the massStoreRun function when the task is ready. No
other API function should use this pipe.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants