fix(websocket): Fix websocket client race on abort and memory leak (IDFGH-16555) #924
base: master
67bd7e3 to
46871bf
Compare
| #else | ||
| // When separate TX lock is not configured, we already hold client->lock | ||
| // which protects the transport, so we can send PONG directly | ||
| esp_transport_ws_send_raw(client->transport, WS_TRANSPORT_OPCODES_PONG | WS_TRANSPORT_OPCODES_FIN, data, client->payload_len, |
Check warning
Code scanning / clang-tidy
The value '138' provided to the cast expression is not in the valid range of values for 'ws_transport_opcodes' [clang-analyzer-optin.core.EnumCastOutOfRange] Warning
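The warning above can be reproduced in isolation. The following is a minimal sketch, assuming the PONG opcode is 0x0a and the FIN flag is 0x80 (the values in the WebSocket frame header); the enum and function names are hypothetical stand-ins, not the definitions from the transport header. OR-ing the two yields 138, which matches no named enumerator, and that is what the checker reports when the result is converted back to the enum type.

```c
#include <assert.h>

/* Hypothetical mirror of the opcode enum; the values are assumptions. */
typedef enum {
    SKETCH_OPCODE_PONG = 0x0a,
    SKETCH_OPCODE_FIN  = 0x80,
} sketch_ws_opcode_t;

/* 0x0a | 0x80 == 0x8a == 138: a valid frame header byte, but not an enumerator. */
static_assert((SKETCH_OPCODE_PONG | SKETCH_OPCODE_FIN) == 138,
              "PONG with FIN set is the value named in the warning");

/* Combine the flags as a plain int so no out-of-range enum value is created. */
static int sketch_pong_header(void)
{
    return (int)SKETCH_OPCODE_PONG | (int)SKETCH_OPCODE_FIN;
}
```

Whether the warning is worth silencing depends on how the send function declares its opcode parameter; carrying the combined flags as an integer is one option, shown here only as an illustration.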
46871bf to
5577e03
Compare
ca2956e to
0e58789
Compare
0e58789 to
62925a5
Compare
15dcb35 to
f474654
Compare
a6e4259 to
50e3068
Compare
50e3068 to
22eb17e
Compare
52abfc0 to
30778c0
Compare
This PR is being reviewed by Cursor Bugbot
30778c0 to
d202ae4
Compare
david-cermak
left a comment
LGTM in general, but I would like to double-check the locking order; it doesn't feel right to take one lock while holding another.
d202ae4 to
0f28a4f
Compare
| esp_event_loop_run(client->event_handle, 0); | ||
| if (xSemaphoreTakeRecursive(client->lock, lock_timeout) != pdPASS) { | ||
| ESP_LOGE(TAG, "Failed to re-acquire lock after event loop"); | ||
| break; |
Bug: Lock released without being held after failed reacquisition
When xSemaphoreTakeRecursive fails at line 1211 (after releasing the lock at line 1209), the break statement only exits the switch, not the while loop. Execution continues to line 1311 where xSemaphoreGiveRecursive(client->lock) is called on a mutex that isn't held by the task. In FreeRTOS, calling Give on a mutex not owned by the calling task is undefined behavior and can corrupt the mutex state or cause assertion failures.
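One way to avoid the unbalanced release described above is to track whether the lock is currently held and only give it back when it is. The sketch below is not the PR's code: sketch_lock_t, sketch_take and sketch_give are hypothetical stand-ins for the FreeRTOS recursive mutex calls, with a test hook that makes a later take fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a recursive mutex owned by a single task. */
typedef struct {
    int depth;            /* number of outstanding takes */
    int takes_until_fail; /* test hook: -1 = never fail, 0 = next take fails */
} sketch_lock_t;

static bool sketch_take(sketch_lock_t *l)
{
    if (l->takes_until_fail == 0) {
        return false;     /* simulates a timeout */
    }
    if (l->takes_until_fail > 0) {
        l->takes_until_fail--;
    }
    l->depth++;
    return true;
}

static void sketch_give(sketch_lock_t *l)
{
    assert(l->depth > 0); /* giving a lock that is not held is a bug */
    l->depth--;
}

/* One pass of a task loop; returns false when the loop should stop. */
static bool sketch_loop_iteration(sketch_lock_t *l)
{
    if (!sketch_take(l)) {
        return false;
    }
    bool locked = true;

    /* Drop the lock while event handlers run, then try to get it back. */
    sketch_give(l);
    locked = false;
    /* ... event handlers would run here ... */
    if (sketch_take(l)) {
        locked = true;
    }

    /* Release only if the re-acquire succeeded. */
    if (locked) {
        sketch_give(l);
    }
    return locked;
}
```

With the flag in place, a failed re-acquire ends the iteration without a second give, regardless of whether the failing branch sits inside a switch.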
david-cermak
left a comment
LGTM!
- Add state check in abort_connection to prevent double-close
- Fix memory leak: free errormsg_buffer on disconnect
- Reset connection state on reconnect to prevent stale data
- Implement lock ordering for separate TX lock mode
- Read buffered data immediately after connection to prevent data loss
- Added sdkconfig.ci.tx_lock config
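The first two items of the commit message (the state check against double-close and freeing errormsg_buffer on disconnect) can be sketched roughly as follows. This is an illustration under assumptions, not the client's implementation: the struct, the state names and the close_calls counter are hypothetical, and only the errormsg_buffer field name is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { SKETCH_CONNECTED, SKETCH_CLOSING, SKETCH_CLOSED } sketch_state_t;

/* Hypothetical, heavily reduced client object. */
typedef struct {
    sketch_state_t state;
    char *errormsg_buffer; /* allocated lazily while connected */
    int close_calls;       /* stands in for closing the transport */
} sketch_client_t;

/* Returns true if this call performed the abort, false if it was skipped. */
static bool sketch_abort_connection(sketch_client_t *c)
{
    assert(c != NULL);
    if (c->state == SKETCH_CLOSING || c->state == SKETCH_CLOSED) {
        return false;      /* already aborting or aborted: do not close twice */
    }
    c->state = SKETCH_CLOSING;
    c->close_calls++;

    /* Free the buffer here so a reconnect does not leak the old allocation. */
    free(c->errormsg_buffer);
    c->errormsg_buffer = NULL;

    c->state = SKETCH_CLOSED;
    return true;
}
```

Setting the pointer to NULL after free makes a repeated call harmless even if the state check were bypassed.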
0f28a4f to
f92da56
Compare


TODO: Will remove comments after the review (left them in for easier review)
Description
This PR fixes critical memory leaks and crashes in the ESP WebSocket client that occur during reconnection scenarios (CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK=y).
Changes Made:
Related
#898
Note
Fixes ws-client races and memory leak; corrects lock ordering for separate TX lock; initializes/reset state on connect and handles initial recv; adds CI config.
- esp_websocket_client_abort_connection(...): add safe-state checks (skip if closing/closed), dispatch disconnect, and free errormsg_buffer.
- Initialize/reset state (payload_len/offset, last_fin, last_opcode) on connect; process initial data via esp_websocket_client_recv(...) and abort on failure.
- [...] client->transport_list and client->transport after destroy.
- [...] WEBSOCKET_TX_LOCK_TIMEOUT_MS; enforce lock ordering: release client->lock before taking tx_lock, then re-acquire and state-check.
- [...] recv() and abort logic.
- Add examples/target/sdkconfig.ci.tx_lock enabling separate TX lock with timeout.

Written by Cursor Bugbot for commit f92da56.
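The lock-ordering rule in the summary (release client->lock before taking tx_lock, then re-acquire and check the state) could look roughly like this. The sketch uses hypothetical single-threaded stand-ins (sketch_mutex_t, sk_take, sk_give) instead of FreeRTOS semaphores, and it omits the timeout handling; it only illustrates the order of operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mutex model: just remembers whether it is held. */
typedef struct { bool held; } sketch_mutex_t;

typedef struct {
    sketch_mutex_t lock;    /* protects client state */
    sketch_mutex_t tx_lock; /* serializes senders */
    bool connected;
    int sent;               /* counts frames "sent" */
} sketch_ws_t;

static void sk_take(sketch_mutex_t *m) { assert(!m->held); m->held = true; }
static void sk_give(sketch_mutex_t *m) { assert(m->held);  m->held = false; }

/* Called with ws->lock held; returns with ws->lock held. */
static bool sketch_send_pong(sketch_ws_t *ws)
{
    assert(ws->lock.held);

    sk_give(&ws->lock);     /* never wait for tx_lock while holding lock */
    sk_take(&ws->tx_lock);
    sk_take(&ws->lock);     /* order is always tx_lock first, then lock */

    /* The state may have changed while the lock was released. */
    bool ok = ws->connected;
    if (ok) {
        ws->sent++;         /* stands in for the raw PONG send */
    }

    sk_give(&ws->tx_lock);
    return ok;
}
```

Because every path takes tx_lock before lock, two tasks cannot each hold one lock while waiting for the other; the cost is that the caller must re-validate any state it read before releasing lock.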