SonicRetry combined PR[CMSSW_15_1_0_pre6] #24
base: master
…r method in TritonClient. Update BuildFile.xml and fix formatting in header files.
…tructor for TritonClient, and update BuildFile.xml to include Catch2 for testing.
…tests; remove old cfg
…lection; remove unused parameters and improve documentation.
if (client_) {
  client_->evaluate();
} else {
  edm::LogError("RetryActionBase") << "Client pointer is null, cannot evaluate.";
This should be an exception rather than a LogError. (It may actually need to be a return false or similar, because the call chain is client->finish() -> action->retry() -> action->eval(), and only finish() should actually emit an exception.)
This comment has not been addressed yet
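A minimal sketch of the suggestion in this thread, assuming that eval() and retry() can be changed to return a bool. MockClient, MockRetryAction, the free function finish(), and demoCallChain() are hypothetical stand-ins written for illustration; they are not the real TritonClient or RetryActionBase interfaces. The null-pointer case is reported upward as false, and only the outermost step throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the client; not the real TritonClient.
class MockClient {
public:
  void evaluate() { ++evaluations_; }
  int evaluations() const { return evaluations_; }

private:
  int evaluations_ = 0;
};

// Hypothetical stand-in for the retry action; not the real RetryActionBase.
class MockRetryAction {
public:
  explicit MockRetryAction(MockClient* client) : client_(client) {}

  // Reports a null client to the caller instead of logging or throwing.
  bool eval() {
    if (!client_) {
      return false;
    }
    client_->evaluate();
    return true;
  }

  // Passes the result of eval() up the chain unchanged.
  bool retry() { return eval(); }

private:
  MockClient* client_;
};

// Stand-in for finish(): the only step in the chain that throws.
inline void finish(MockRetryAction& action) {
  if (!action.retry()) {
    throw std::runtime_error("retry failed: client pointer is null");
  }
}

// Usage example: one successful retry, then a null client.
inline void demoCallChain() {
  MockClient client;
  MockRetryAction good(&client);
  finish(good);
  assert(client.evaluations() == 1);

  MockRetryAction broken(nullptr);
  bool threw = false;
  try {
    finish(broken);
  } catch (const std::runtime_error&) {
    threw = true;
  }
  assert(threw);
}
```

Whether the real finish() should throw a framework exception type rather than std::runtime_error is left open here; the sketch only shows where the failure is surfaced.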
} catch (std::exception& e) {
  edm::LogError("RetryActionDiffServer") << "Failed to retry with alternative server: " << e.what();
} catch (...) {
  edm::LogError("RetryActionDiffServe: rUnknownFailure") << "An unknown exception was thrown";
typo
@@ -0,0 +1,22 @@
#!/bin/bash
this comment has not been addressed
if (client_) {
  client_->evaluate();
} else {
  edm::LogError("RetryActionBase") << "Client pointer is null, cannot evaluate.";
This comment has not been addressed yet
parser.add_argument("--device", default="auto", type=str.lower, choices=allowed_devices, help="specify device for fallback server")
parser.add_argument("--container", default="apptainer", type=str.lower, choices=allowed_containers, help="specify container for fallback server")
parser.add_argument("--tries", default=0, type=int, help="number of retries for failed request")
parser.add_argument("--retryAction", default="same", type=str, choices=["same","diff"], help="retry policy: same server or different server")
this should be added to getParser() in customize.py
parser.add_argument("--noShm", default=False, action="store_true", help="disable shared memory")
parser.add_argument("--compression", default="", type=str, choices=allowed_compression, help="enable I/O compression")
parser.add_argument("--ssl", default=False, action="store_true", help="enable SSL authentication for server communication")
parser.add_argument("--device", default="auto", type=str.lower, choices=allowed_devices, help="specify device for fallback server")
parser.add_argument("--container", default="apptainer", type=str.lower, choices=allowed_containers, help="specify container for fallback server")
parser.add_argument("--tries", default=0, type=int, help="number of retries for failed request")
these are redundant with customize.py and should be removed
Rebased #23 to CMSSW_15_1_0_pre6.
PR description:
- RetryActionDiffServer for SonicTriton using TritonService’s server registry; remove per-action alternate server parameters.
- TritonClient::updateServer(TritonService::Server::fallbackName) to switch servers on retry, per review guidance.
- HeterogeneousCore/SonicTriton/test/test_RetryActionDiffServer.cc (arms → updateServer(fallback) → no-op on second retry → exception path is caught).
- HeterogeneousCore/SonicTriton/test/tritonTest_cfg.py with --retryAction {same,diff} and a verbose confirmation line.
- TestHeterogeneousCoreSonicTritonRetryActionDiff_Log to assert the selected retry policy.
PR validation:
Built and ran unit/integration tests in CMSSW_15_1_0_pre6 area:
- scram b -j 8
TODO/To verify:
- scram b runtests TEST=HeterogeneousCore/SonicTriton (passes)
- Client.Retry is explicitly configured.
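The retry flow named in the PR description (arm, switch to the fallback server once, no-op on a second retry, exception path caught) might look roughly like the sketch below. It is written under stated assumptions and is not the code in this PR: SketchClient, SketchRetryActionDiffServer, start(), the "primary" and "fallback" server names, and demoRetryFlow() are all hypothetical, and only the updateServer name is taken from the description.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the client; only the updateServer name comes from the PR text.
class SketchClient {
public:
  void updateServer(const std::string& name) {
    if (failOnUpdate_) {
      throw std::runtime_error("cannot switch to server: " + name);
    }
    server_ = name;
    ++updates_;
  }
  void setFailOnUpdate(bool fail) { failOnUpdate_ = fail; }
  const std::string& server() const { return server_; }
  int updates() const { return updates_; }

private:
  std::string server_ = "primary";  // placeholder name
  int updates_ = 0;
  bool failOnUpdate_ = false;
};

// Hypothetical retry action that switches servers at most once per request.
class SketchRetryActionDiffServer {
public:
  SketchRetryActionDiffServer(SketchClient& client, std::string fallback)
      : client_(client), fallback_(std::move(fallback)) {}

  // Arm the action at the start of a request.
  void start() { armed_ = true; }

  // Returns true only if the server was actually switched.
  bool retry() {
    if (!armed_) {
      return false;  // a second retry is a no-op
    }
    armed_ = false;
    try {
      client_.updateServer(fallback_);
    } catch (const std::exception&) {
      return false;  // exception path is caught and reported to the caller
    }
    return true;
  }

private:
  SketchClient& client_;
  std::string fallback_;
  bool armed_ = false;
};

// Usage example covering the three cases listed in the PR description.
inline void demoRetryFlow() {
  SketchClient client;
  SketchRetryActionDiffServer action(client, "fallback");
  action.start();
  assert(action.retry());   // first retry switches to the fallback
  assert(client.server() == "fallback");
  assert(!action.retry());  // second retry does nothing
  assert(client.updates() == 1);

  SketchClient failing;
  failing.setFailOnUpdate(true);
  SketchRetryActionDiffServer guarded(failing, "fallback");
  guarded.start();
  assert(!guarded.retry());  // the exception is swallowed inside retry()
  assert(failing.server() == "primary");
}
```

Returning false from retry() instead of logging keeps this sketch in line with the review request that only finish() should throw.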