
Conversation

@vortigont commented Oct 19, 2025

AsyncAbstractResponse::_ack could allocate a temp buffer with a size larger than the available sock buffer (i.e. to fit headers) and eventually lose the remainder on transfer due to not checking whether the complete data was added to the sock buff.

Refactored the code in favor of a dedicated std::vector object acting as an accumulating buffer, with more careful control over the amount of data actually copied to the sockbuff.

Closes #315
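
A minimal sketch of the approach, assuming AsyncTCP's add()/send() semantics where add() returns the number of bytes actually queued in the socket buffer (names are illustrative, not the exact PR code):

```cpp
#include <AsyncTCP.h>
#include <vector>
#include <cstdint>

// Accumulating buffer: headers/content are appended here first, then drained
// into the socket as space allows, so nothing is lost when the sockbuff is small.
std::vector<uint8_t> _send_buffer;

size_t drainToSocket(AsyncClient *client) {
  // add() queues at most as many bytes as lwIP will accept and returns that count
  size_t written = client->add(reinterpret_cast<const char *>(_send_buffer.data()),
                               _send_buffer.size());
  client->send();
  // keep the unsent remainder for the next _ack/poll instead of dropping it
  _send_buffer.erase(_send_buffer.begin(), _send_buffer.begin() + written);
  return written;
}
```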

@vortigont (Author)

BTW, isn't that the same problem as #242?

@mathieucarbou (Member) commented Oct 19, 2025

BTW, isn't that the same problem as #242?

It looks like the same indeed!

We can ask the user to test with this fix...

--_in_flight_credit; // take a credit
#endif
request->client()->send();
_send_buffer.erase(_send_buffer.begin(), _send_buffer.begin() + written);
@mathieucarbou (Member) Oct 19, 2025

@vortigont: could this call be expensive?

@vortigont (Author)

I'm not sure; depending on the implementation it could be compiler-optimized into something like memmove, but I'm not sure how this is done in Espressif's toolchain. Actually I do not expect this part to run frequently under normal conditions, since the buffer should be aligned with the available space.
Another option could be to add a member variable and do index-offset calculations.
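
A rough sketch of that index-offset alternative (hypothetical member names, not the PR code):

```cpp
#include <AsyncTCP.h>
#include <vector>
#include <cstdint>

std::vector<uint8_t> _send_buffer;
size_t _send_offset = 0;  // hypothetical member: bytes already handed to the socket

size_t drainToSocket(AsyncClient *client) {
  size_t remaining = _send_buffer.size() - _send_offset;
  size_t written = client->add(
      reinterpret_cast<const char *>(_send_buffer.data() + _send_offset), remaining);
  client->send();
  _send_offset += written;
  if (_send_offset == _send_buffer.size()) {  // fully drained: reset without moving bytes
    _send_buffer.clear();
    _send_offset = 0;
  }
  return written;
}
```

This trades erase()'s memmove for a little bookkeeping; the vector only shrinks once everything has been sent.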

@vortigont (Author)

Actually I do not like this buffer approach at all; it is too heavy in general to create a buffer matching the window size, copy data into it, then copy from the buffer to TCP's pcbs. Arduino's default is 5.7k, but with a custom-built lwIP it could become a memory hog. Should think of something else: maybe a small fixed-size circular buffer, or another type of handler for objects that could avoid copying.

@mathieucarbou (Member) Oct 19, 2025

I agree. That's what I saw also: adding a vector field to handle this situation, while wondering if the same thing could be done without it. I was also going to propose a circular buffer, because it cannot hold more than the pcb space anyway, right?

@willmmiles Oct 19, 2025

I don't think there's an alternative solution at the architectural level -- the interface of AsyncAbstractResponse requires that it consume bytes from the implementation only once; and we can't know for sure how many bytes the socket will accept until we send it some; so to be correct, AsyncAbstractResponse is going to have to cache the bytes it couldn't send. Since the API requires it to have a temporary buffer anyways, "just keep the buffer until we've sent it all" is the least bad solution.

Performance-wise, std::vector<> does hurt a bit though - it both (a) insists on zeroing the memory, and (b) doesn't have easy-to-use release/reallocate semantics. I tried using a default_init_allocator<> to speed it up, but it didn't help much. Ultimately, in the solution I put together for the fork I've been maintaining for WLED, I ended up making a more explicit buffer data structure. I also wound up doing some gymnastics to avoid allocating so much memory that LwIP couldn't allocate a packet buffer.
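
For reference, the default_init_allocator<> idiom mentioned above looks roughly like this; it is a standard C++ technique (not code from the fork) that makes resize() default-initialize elements instead of zero-filling them:

```cpp
#include <memory>
#include <utility>
#include <vector>
#include <cstdint>

template <typename T, typename A = std::allocator<T>>
struct default_init_allocator : public A {
  template <typename U>
  struct rebind {
    using other = default_init_allocator<
        U, typename std::allocator_traits<A>::template rebind_alloc<U>>;
  };
  using A::A;

  // default-init (no zeroing) instead of value-init on resize()
  template <typename U>
  void construct(U *ptr) noexcept {
    ::new (static_cast<void *>(ptr)) U;
  }
  template <typename U, typename... Args>
  void construct(U *ptr, Args &&...args) {
    std::allocator_traits<A>::construct(static_cast<A &>(*this), ptr,
                                        std::forward<Args>(args)...);
  }
};

// usage: resize() no longer memsets the newly added bytes
using RawBuffer = std::vector<uint8_t, default_init_allocator<uint8_t>>;
```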

See my attempt at a whole solution here: https://github.com/Aircoookie/ESPAsyncWebServer/blob/39b830c852054444ea12a8b5d7fcb4fa004a89d7/src/WebResponses.cpp#L317

Sorry I'm a bit backlogged on pulling this out and pushing it forward...

Some design notes:

  • I opted to release the assembly buffer as soon as the data was sent, and reallocate every _ack; this keeps the "static" memory usage down and lets it better multiplex between many connections when under memory pressure.
  • If I was doing it again, I'd give serious thought to capping the buffer at TCP_MSS and looping over _fillBufferAndProcessTemplates -> client()->write() (see the sketch after this list). The upside is a guarantee that it'd never buffer more than one packet; the downside is that it would make ArduinoJSON very sad...
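
A hedged sketch of that capped-loop idea; here `fill` stands in for _fillBufferAndProcessTemplates and `write` for client()->write(), both assumptions about the internals rather than the real signatures:

```cpp
#include <cstdint>
#include <cstddef>
#include <functional>

constexpr size_t kTcpMss = 1436;  // typical lwIP TCP_MSS on ESP32

// Consume at most one TCP segment from the response at a time, so the
// response never buffers more than one packet's worth of data.
size_t sendCapped(const std::function<size_t(uint8_t *, size_t)> &fill,
                  const std::function<size_t(const uint8_t *, size_t)> &write) {
  uint8_t buf[kTcpMss];
  size_t total = 0;
  for (;;) {
    size_t len = fill(buf, sizeof(buf));  // next <= TCP_MSS bytes of content
    if (len == 0) {
      break;  // content exhausted
    }
    size_t queued = write(buf, len);
    total += queued;
    if (queued < len) {
      break;  // socket full; the leftover bytes would still need caching
    }
  }
  return total;
}
```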

@mathieucarbou (Member)

@willmmiles: I understand a bit more. I was wondering why we needed to add a buffer instead of just using indexes, but the fact is that, this implementation being in the abstract class like you say, it has to work with and without a content buffer. Thanks!

@mathieucarbou (Member) commented Oct 19, 2025

@vortigont : FYI, I opened PR #317 to add an example in the project that we did not have about large responses.

I was hoping to get the opportunity to reproduce these 2 issues, but no, everything went fine.

Questions:

  1. Were you able to reproduce?
  2. If yes, would you be able to rebase your PR on top of #317 (Added LargeResponse example) and add a handler showing the issue is fixed?

Thanks!

@yoursunny left a comment

I have verified that the bug has been fixed.
I do not understand the code, but I pointed out some typos in the changes.

@vortigont (Author)

I have verified that the bug has been fixed. I do not understand the code, but I pointed out some typos in the changes.

thanks! I'll fix typos :)

@mathieucarbou force-pushed the wresp_315 branch 2 times, most recently from 92da8c7 to 0f6f725 on October 20, 2025 08:12
@mathieucarbou (Member) commented Oct 20, 2025

@vortigont @willmmiles : I did some testing of this PR compared to main.

I am using the new example merged yesterday, LargeResponse, with the second implementation (CustomResponse), which supports concurrent requests. I send 20 requests concurrently (to specifically go over the lwIP limit) and count the received bytes. The result should be 16000.

> for i in {1..20}; do ( curl -s http://192.168.4.1/2 | wc -c ) & done;

main:

=> OK: everything works fine and I receive all 20x 16000 characters.

this PR:

=> CRASH

_send_buffer.resize(std::min(space, _contentLength - _sentLength));

So I tried reducing the concurrency to 16 connections (the lwIP limit):

> for i in {1..16}; do ( curl -s http://192.168.4.1/2 | wc -c ) & done;

And I am not able to reproduce it anymore, except if I keep going and going.

But curl + bash like that do not do as good a job as autocannon... So I am spawning it:

32 requests: 16 threads and 16 concurrent connections (so the lwIP limit)

autocannon -w 16 -c 16 -a 32 http://192.168.4.1/2

=> CRASH

Strangely the crash happens also when using a lower number of connections:

autocannon -w 16 -c 5 -a 32 http://192.168.4.1/2

So as long as the threads are correctly aligned and requests execute at pretty much the same time, the buffer allocations/resizes also happen at pretty much the same time, I think. That explains why it is easier to reproduce with autocannon than with curl.

So that's not good, because it kills the concurrency level of the library.

@willmmiles: how did you solve that in your fork? You might have the same issue if you are buffering? Is that what you are solving with your _safe_allocate_buffer() function?

abort() was called at PC 0x401590e7 on core 1
  #0  0x401590e7 in __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/xtensa-esp-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48



Backtrace: 0x4008b5cc:0x3ffd1350 0x4008b591:0x3ffd1370 0x400918e5:0x3ffd1390 0x401590e7:0x3ffd1410 0x4015911c:0x3ffd1430 0x401591f7:0x3ffd1450 0x40159236:0x3ffd1470 0x400dce6b:0x3ffd1490 0x400dcec1:0x3ffd14b0 0x400dd65a:0x3ffd14d0 0x400d9c7d:0x3ffd1520 0x400d9ca1:0x3ffd1540 0x400d600a:0x3ffd1560 0x400d6495:0x3ffd1590 0x400d6515:0x3ffd15d0 0x400d66b9:0x3ffd15f0 0x4008c3e1:0x3ffd1620
  #0  0x4008b5cc in panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:477
  #1  0x4008b591 in esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/port/esp_system_chip.c:87
  #2  0x400918e5 in abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/src/abort.c:38
  #3  0x401590e7 in __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/xtensa-esp-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
  #4  0x4015911c in std::terminate() at /builds/idf/crosstool-NG/.build/xtensa-esp-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58 (discriminator 1)
  #5  0x401591f7 in __cxa_throw at /builds/idf/crosstool-NG/.build/xtensa-esp-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
  #6  0x40159236 in operator new(unsigned int) at /builds/idf/crosstool-NG/.build/xtensa-esp-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54 (discriminator 2)
  #7  0x400dce6b in std::__new_allocator<unsigned char>::allocate(unsigned int, void const*) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/new_allocator.h:151
      (inlined by) std::allocator<unsigned char>::allocate(unsigned int) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/allocator.h:196
      (inlined by) std::allocator_traits<std::allocator<unsigned char> >::allocate(std::allocator<unsigned char>&, unsigned int) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/alloc_traits.h:478
      (inlined by) std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_allocate(unsigned int) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/stl_vector.h:380
      (inlined by) std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned int) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/vector.tcc:834
  #8  0x400dcec1 in std::vector<unsigned char, std::allocator<unsigned char> >::resize(unsigned int) at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/stl_vector.h:1016 (discriminator 1)
  #9  0x400dd65a in AsyncAbstractResponse::_ack(AsyncWebServerRequest*, unsigned int, unsigned long) at src/WebResponses.cpp:435 (discriminator 1)
  #10 0x400d9c7d in AsyncWebServerRequest::_onPoll() at src/WebRequest.cpp:222
      (inlined by) AsyncWebServerRequest::_onPoll() at src/WebRequest.cpp:218
  #11 0x400d9ca1 in std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#2}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at src/WebRequest.cpp:82
      (inlined by) __invoke_impl<void, AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::<lambda(void*, AsyncClient*)>&, void*, AsyncClient*> at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/invoke.h:61
      (inlined by) __invoke_r<void, AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::<lambda(void*, AsyncClient*)>&, void*, AsyncClient*> at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/invoke.h:111
      (inlined by) _M_invoke at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/std_function.h:290
  #12 0x400d600a in std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /Users/mat/.platformio/packages/toolchain-xtensa-esp-elf/xtensa-esp-elf/include/c++/14.2.0/bits/std_function.h:591
  #13 0x400d6495 in AsyncClient::_poll(tcp_pcb*) at .pio/libdeps/arduino-3/AsyncTCP/src/AsyncTCP.cpp:1117
  #14 0x400d6515 in AsyncTCP_detail::handle_async_event(lwip_tcp_event_packet_t*) at .pio/libdeps/arduino-3/AsyncTCP/src/AsyncTCP.cpp:303
  #15 0x400d66b9 in _async_service_task(void*) at .pio/libdeps/arduino-3/AsyncTCP/src/AsyncTCP.cpp:328
  #16 0x4008c3e1 in vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139
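
The backtrace shows the mechanism: operator new throws std::bad_alloc inside std::vector::resize(), nothing catches it, so __cxa_throw ends in terminate() and abort(). A hedged sketch of one way to fail soft instead of crashing, assuming exceptions are enabled as the backtrace suggests (illustrative only, not the PR's code):

```cpp
#include <vector>
#include <new>  // std::bad_alloc
#include <cstdint>
#include <cstddef>

// Grow the accumulating buffer, but treat allocation failure as
// "send less this round" instead of letting std::bad_alloc reach terminate().
bool tryResize(std::vector<uint8_t> &buf, size_t wanted) {
  try {
    buf.resize(wanted);
    return true;
  } catch (const std::bad_alloc &) {
    return false;  // caller can retry with a smaller size or skip this _ack
  }
}
```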

@mathieucarbou (Member) commented Oct 20, 2025

@vortigont : follow-up from #317 (comment)

I just pushed the MRE in the PR and tested it: 51f4472

> curl -s http://192.168.4.1/3 | grep -o '.' | sort | uniq -c

5760 A
4308 B
5760 C
 172 D

=> 16000 OK

Console:

Filling 'A' @ sent: 0, buflen: 5760
Filling 'B' @ sent: 5760, buflen: 4308
Filling 'C' @ sent: 10068, buflen: 5760
Filling 'D' @ sent: 15828, buflen: 172

On the main branch, I receive only 15572 bytes indeed:

❯  curl http://192.168.4.1/3 | grep -o '.' | sort | uniq -c
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15572    0 15572    0     0  18329      0 --:--:-- --:--:-- --:--:-- 18320
5332 A
4308 B
5760 C
 172 D

=> 15572

Console

Filling 'A' @ sent: 0, buflen: 5760
Filling 'B' @ sent: 5760, buflen: 4308
Filling 'C' @ sent: 10068, buflen: 5760
Filling 'D' @ sent: 15828, buflen: 172

So we have an MRE in the project showing that this is fixed 👍

The only issue now remaining is to fix the crash with concurrent requests...

@vortigont (Author)

that is... unexpected 8-0, not that it crashes on alloc but that it does not drop requests on the main branch

Here are my results, and they are interesting. Yes, this PR version crashes at high concurrency levels, but somehow it is much faster when not crashing.
I use the Apache ab tool.

== this PR

Concurrency Level:      10
Time taken for tests:   10.030 seconds
Complete requests:      346
Failed requests:        0
Total transferred:      5729792 bytes
HTML transferred:       5586528 bytes
Requests per second:    34.50 [#/sec] (mean)
Time per request:       289.895 [ms] (mean)
Time per request:       28.990 [ms] (mean, across all concurrent requests)
Transfer rate:          557.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7   59 173.8     29    1100
Processing:    75  220 123.7    193    1151
Waiting:        7   37  10.5     38      68
Total:         82  279 215.7    229    1522

== main

Concurrency Level:      10
Time taken for tests:   10.002 seconds
Complete requests:      128
Failed requests:        0
Total transferred:      2111312 bytes
HTML transferred:       2057588 bytes
Requests per second:    12.80 [#/sec] (mean)
Time per request:       781.395 [ms] (mean)
Time per request:       78.140 [ms] (mean, across all concurrent requests)
Transfer rate:          206.14 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       11  110 275.1     29    1058
Processing:   389  562 280.2    489    2482
Waiting:      302  378  47.9    368     485
Total:        443  672 387.6    521    2513

So it is more than 2.5 times faster, do not ask me how :)

That crappy autocannon only gives zeroes to me; I have no idea how to interpret it.

❯ autocannon -w 10 -c 10 http://192.168.8.26/2
Running 10s test @ http://192.168.8.26/2
10 connections
10 workers

-
┌─────────┬──────┬──────┬───────┬──────┬──────┬───────┬──────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg  │ Stdev │ Max  │
├─────────┼──────┼──────┼───────┼──────┼──────┼───────┼──────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms  │ 0 ms │ 0 ms │ 0 ms  │ 0 ms │
└─────────┴──────┴──────┴───────┴──────┴──────┴───────┴──────┘
┌───────────┬─────┬──────┬─────┬───────┬─────┬───────┬─────┐
│ Stat      │ 1%  │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────┼──────┼─────┼───────┼─────┼───────┼─────┤
│ Req/Sec   │ 0   │ 0    │ 0   │ 0     │ 0   │ 0     │ 0   │
├───────────┼─────┼──────┼─────┼───────┼─────┼───────┼─────┤
│ Bytes/Sec │ 0 B │ 0 B  │ 0 B │ 0 B   │ 0 B │ 0 B   │ 0 B │
└───────────┴─────┴──────┴─────┴───────┴─────┴───────┴─────┘

Req/Bytes counts sampled once per second.
# of samples: 100

156 requests in 10.02s, 0 B read

I'll think about replacing this std::vector buffer with something more static and controllable, maybe a small circular buffer, but it will take a bit more time of trial and error.

@vortigont (Author)

yeah, I'm using /2 for testing now; it crashes with ab with connections above 10 too.
Memory pressure is much higher with this PR, but so is the performance. Scratching head...

Autocannon does not give me any readable stats at all; it just runs the requests then quits with zeroes in the output. It works, but it is quite useless for any analysis.

➜ autocannon -w 16 -c 16 -a 32 http://192.168.8.26/2
Running 32 requests test @ http://192.168.8.26/2
16 connections
16 workers

/
┌─────────┬──────┬──────┬───────┬──────┬──────┬───────┬──────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg  │ Stdev │ Max  │
├─────────┼──────┼──────┼───────┼──────┼──────┼───────┼──────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms  │ 0 ms │ 0 ms │ 0 ms  │ 0 ms │
└─────────┴──────┴──────┴───────┴──────┴──────┴───────┴──────┘
┌───────────┬─────┬──────┬─────┬───────┬─────┬───────┬─────┐
│ Stat      │ 1%  │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────┼──────┼─────┼───────┼─────┼───────┼─────┤
│ Req/Sec   │ 0   │ 0    │ 0   │ 0     │ 0   │ 0     │ 0   │
├───────────┼─────┼──────┼─────┼───────┼─────┼───────┼─────┤
│ Bytes/Sec │ 0 B │ 0 B  │ 0 B │ 0 B   │ 0 B │ 0 B   │ 0 B │
└───────────┴─────┴──────┴─────┴───────┴─────┴───────┴─────┘

Req/Bytes counts sampled once per second.
# of samples: 31

32 requests in 1.01s, 0 B read

@mathieucarbou (Member)

that is... unexpected 8-0, not that it crashes on alloc but that it does not drop requests on the main branch

@vortigont just to clarify:

examples/LargeResponse

I did my testing:

  • with the handler at /2 of examples/LargeResponse
  • AP mode (using the example)

In main: the /2 handler works (16000 bytes).
In this PR: the /2 handler fails either constantly or randomly, depending on how requests arrive.

I did not use the /3 handler for my testing: it just acts as an MRE to reproduce the issue.

Using ab -c 16 -t 10 http://192.168.4.1/2 I can reproduce the crash. Even with -c 10.

But in both cases it is random: I have to restart ab several times to trigger it, while I reproduce it more easily with autocannon.

examples/PerfTests

For request serving:

This is insanely fast! We were barely reaching 13 req/s before on average! This is more like 5-6 times faster!

❯  autocannon -c 16 -w 16 -d 20 --renderStatusCodes http://192.168.4.1
Running 20s test @ http://192.168.4.1
16 connections
16 workers

\
┌─────────┬────────┬─────────┬──────────┬──────────┬────────────┬────────────┬──────────┐
│ Stat    │ 2.5%   │ 50%     │ 97.5%    │ 99%      │ Avg        │ Stdev      │ Max      │
├─────────┼────────┼─────────┼──────────┼──────────┼────────────┼────────────┼──────────┤
│ Latency │ 206 ms │ 4246 ms │ 11578 ms │ 12129 ms │ 4749.38 ms │ 3246.55 ms │ 14444 ms │
└─────────┴────────┴─────────┴──────────┴──────────┴────────────┴────────────┴──────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%    │ 97.5%  │ Avg    │ Stdev   │ Min    │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec   │ 44     │ 44     │ 69     │ 80     │ 68.16  │ 7.75    │ 44     │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 193 kB │ 193 kB │ 302 kB │ 350 kB │ 298 kB │ 33.9 kB │ 193 kB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 1363  │
└──────┴───────┘

Req/Bytes counts sampled once per second.
# of samples: 320

3k requests in 20.03s, 5.97 MB read
200 errors (0 timeouts)

@mathieucarbou (Member)

Autocannon does not give me any readable stats at all; it just runs the requests then quits with zeroes in the output. It works, but it is quite useless for any analysis.

Yes, it is bad at interpreting the response, I guess because of the way the response is crafted with this subclass. But anyway this is not important: what is important is a tool that triggers concurrent requests with workers/threads.

@vortigont (Author)

But anyway this is not important

it is important to understand how fast it works, otherwise I would not have noticed :)
Anyway, I will use both tools from now on for all side testing.

@mathieucarbou (Member)

why is that request pointer passed to the user's callback, any idea? I mean, is there any use case for it? Otherwise I would have to keep the request ptr until the callback completes.

The request ptr is not passed to the public API: _newClient is a closed API. Only the WS client object is passed.

Some of the useful things are:

  • to identify the client: client->id() or using client ptr directly
  • to call client->ping()
  • to call client->setCloseClientOnQueueFull(false);

I might not understand your question?

@vortigont (Author)

I might not understand your question?

yeah, sorry, I mean here

_handleEvent(&_clients.back(), WS_EVT_CONNECT, nullptr /* request */, NULL, 0);

The 3rd arg was request (here replaced with nullptr), which was then passed as void* to the user's callback to handle the WS connect event. It is in addition to client*; I never used it in my code anywhere, and I do not know if there could be any use for getting the request obj pointer in a websocket-related callback.

@mathieucarbou (Member) commented Oct 31, 2025

I might not understand your question?

yeah, sorry, I mean here

_handleEvent(&_clients.back(), WS_EVT_CONNECT, nullptr /* request */, NULL, 0);

The 3rd arg was request (here replaced with nullptr), which was then passed as void* to the user's callback to handle the WS connect event. It is in addition to client*; I never used it in my code anywhere, and I do not know if there could be any use for getting the request obj pointer in a websocket-related callback.

I see!
This is only in the case of WS_EVT_CONNECT.

Use cases:

  • in order to cast arg to request to get the request query parameters (allowed in WebSockets spec)
  • In order to get request attributes that could have been set by a middleware, looking at the request headers like cookies to add a userID / session in the request attributes.
  • Basically any use case where the app would need to maintain a link between an identified user and a websocket client

This makes me think that I forgot to add it in AsyncWebSocketMessageHandler. The onConnect callback should be:

void onConnect(std::function<void(AsyncWebSocket *server, AsyncWebSocketClient *client, AsyncWebServerRequest *request)> onConnect) {
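
A hedged sketch of the WS_EVT_CONNECT use case above, using the library's AwsEventHandler signature; the token lookup is an illustrative app-side idea, not part of the library:

```cpp
#include <ESPAsyncWebServer.h>

AsyncWebSocket ws("/ws");

void setupWebSocket() {
  ws.onEvent([](AsyncWebSocket *server, AsyncWebSocketClient *client,
                AwsEventType type, void *arg, uint8_t *data, size_t len) {
    if (type == WS_EVT_CONNECT) {
      // Only for WS_EVT_CONNECT does arg carry the HTTP upgrade request.
      auto *request = static_cast<AsyncWebServerRequest *>(arg);
      if (request && request->hasParam("token")) {
        // e.g. link the identified user to client->id() (app-specific logic)
      }
    }
  });
}
```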

@vortigont (Author)

yeah, you're right, that makes sense to get extensions from the headers.
OK, not a big deal, I'll move it. It is for the connect event only, right? We do not need to keep it for the whole WS connection lifetime, do we?

@mathieucarbou (Member)

yeah, you're right, that makes sense to get extensions from the headers. OK, not a big deal, I'll move it. It is for the connect event only, right? We do not need to keep it for the whole WS connection lifetime, do we?

This is only for connect, yes... After that, the arg is cast to an AwsFrameInfo for data events.
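
Continuing the handler sketch above, a hedged illustration of that cast for data events (following the library's documented single-frame pattern):

```cpp
    if (type == WS_EVT_DATA) {
      // For data events, arg points at an AwsFrameInfo describing the frame.
      auto *info = static_cast<AwsFrameInfo *>(arg);
      if (info->final && info->index == 0 && info->len == len &&
          info->opcode == WS_TEXT) {
        // the whole text message arrived in this single frame: data[0..len)
      }
    }
```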

@vortigont (Author)

should be good this way :)

@mathieucarbou (Member)

should be good this way :)

I will have a look, test, and also update the callback to add the request. This class was added recently by me as a way to simplify WS usage, so it is not used a lot, and this is an acceptable API break I think, provided we can bump the next version to 3.9.0 considering all the things that will be released.

@yoursunny

As of commit 55b984b, this PR causes a regression in template processing.
I am running the Templates.ino example, slightly modified to use WiFi STA mode instead of AP mode, on either ESP32 or ESP32S3.
In the console output shown below, the downloaded file dynamic.html has the first octet as 0x00.
However, from the main branch, at commit 37933e3, the first octet is 0x0A.

sunny@sunnyB:~/Downloads$ wget http://192.168.5.85/dynamic.html
--2025-11-02 08:12:32--  http://192.168.5.85/dynamic.html
Connecting to 192.168.5.85:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘dynamic.html’

dynamic.html                      [ <=>                                              ]      77  --.-KB/s    in 0s

2025-11-02 08:12:33 (5.28 MB/s) - ‘dynamic.html’ saved [77]

sunny@sunnyB:~/Downloads$ hexdump -C dynamic.html
00000000  00 3c 21 44 4f 43 54 59  50 45 20 68 74 6d 6c 3e  |.<!DOCTYPE html>|
00000010  0a 3c 68 74 6d 6c 3e 0a  3c 62 6f 64 79 3e 0a 20  |.<html>.<body>. |
00000020  20 20 20 3c 68 31 3e 48  65 6c 6c 6f 2c 20 42 6f  |   <h1>Hello, Bo|
00000030  62 20 31 31 36 35 32 3c  2f 68 31 3e 0a 3c 2f 62  |b 11652</h1>.</b|
00000040  6f 64 79 3e 0a 3c 2f 68  74 6d 6c 3e 0a           |ody>.</html>.|
0000004d

@mathieucarbou (Member)

This PR causes a regression in template processing.

Thanks a lot for participating in this PR's testing!

@vortigont (Author)

good catch @yoursunny,
but I sooo do not want to get into that templates code. In fact I think it should be removed from the webserver entirely; it brings so many complications while providing very limited functionality. There are much better templating engines available that do not need to be embedded into the web server's code.
I'll see if I can spot the problem and fix it without many code changes. This simple issue fix has already gone way out of its initial scope :))

@mathieucarbou (Member)

This simple issue fix has already gone way out of its initial scope :))

You mean it was more impactful than initially thought 😀

I will take time this morning to test the other parts again. So far we have kept discovering bad side effects / crashes, so it's worth making sure that this (complex) fix is well tested and does not crash or impact other things.

@vortigont (Author)

I mean I could have just fixed the problem without further enhancements and regressions :)))
But anyway, it is what it is. Not sure what to do with that templating; I guess it should work as-is since it's just the same buffer, but somehow it places a NULL byte at the beginning.

@mathieucarbou (Member)

I mean I could have just fixed the problem without further enhancements and regressions :))) But anyway, it is what it is. Not sure what to do with that templating; I guess it should work as-is since it's just the same buffer, but somehow it places a NULL byte at the beginning.

I don't understand either why your change impacts templating and not other endpoints. That's what I will try to check.

@mathieucarbou (Member)

@vortigont : ⚠️ Added a commit to fix typos and rebased on main + squashed commits

@mathieucarbou (Member) commented Nov 4, 2025

@vortigont : test results:

❯  autocannon -c 16 -w 16 -d 20 --renderStatusCodes  http://192.168.4.1/
Running 20s test @ http://192.168.4.1/
16 connections
16 workers

/
┌─────────┬────────┬─────────┬──────────┬──────────┬────────────┬────────────┬──────────┐
│ Stat    │ 2.5%   │ 50%     │ 97.5%    │ 99%      │ Avg        │ Stdev      │ Max      │
├─────────┼────────┼─────────┼──────────┼──────────┼────────────┼────────────┼──────────┤
│ Latency │ 220 ms │ 4809 ms │ 11013 ms │ 11514 ms │ 4935.47 ms │ 3134.86 ms │ 12016 ms │
└─────────┴────────┴─────────┴──────────┴──────────┴────────────┴────────────┴──────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬───────┬────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%    │ 97.5%  │ Avg    │ Stdev │ Min    │
├───────────┼────────┼────────┼────────┼────────┼────────┼───────┼────────┤
│ Req/Sec   │ 56     │ 56     │ 68     │ 79     │ 67.66  │ 5.48  │ 56     │
├───────────┼────────┼────────┼────────┼────────┼────────┼───────┼────────┤
│ Bytes/Sec │ 245 kB │ 245 kB │ 298 kB │ 346 kB │ 296 kB │ 24 kB │ 245 kB │
└───────────┴────────┴────────┴────────┴────────┴────────┴───────┴────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 1353  │
└──────┴───────┘

Req/Bytes counts sampled once per second.
# of samples: 320

3k requests in 20.14s, 5.92 MB read
180 errors (0 timeouts)
❯  ab -c 16 -t 20 http://192.168.4.1/
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.4.1 (be patient)
Finished 1377 requests


Server Software:        
Server Hostname:        192.168.4.1
Server Port:            80

Document Path:          /
Document Length:        4272 bytes

Concurrency Level:      16
Time taken for tests:   20.001 seconds
Complete requests:      1377
Failed requests:        0
Total transferred:      6003946 bytes
HTML transferred:       5886816 bytes
Requests per second:    68.85 [#/sec] (mean)
Time per request:       232.398 [ms] (mean)
Time per request:       14.525 [ms] (mean, across all concurrent requests)
Transfer rate:          293.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9  145 429.6     28    5073
Processing:    31   79  22.5     76     226
Waiting:       14   46  18.1     43     197
Total:         46  224 434.2    105    5162

Percentage of the requests served within a certain time (ms)
  50%    105
  66%    118
  75%    131
  80%    139
  90%    177
  95%   1120
  98%   1160
  99%   2149
 100%   5162 (longest request)
for i in {1..16}; do ( count=$(gtimeout 30 curl -s -N -H "Accept: text/event-stream" http://192.168.4.1/events 2>&1 | grep -c "^data:"); echo "Total: $count events, $(echo "$count / 4" | bc -l) events / second" ) & done;
Total: 1284 events, 321.00000000000000000000 events / second
Total: 1398 events, 349.50000000000000000000 events / second
Total: 1398 events, 349.50000000000000000000 events / second
Total: 1311 events, 327.75000000000000000000 events / second
Total: 1398 events, 349.50000000000000000000 events / second
Total: 1481 events, 370.25000000000000000000 events / second
Total: 1481 events, 370.25000000000000000000 events / second
Total: 1334 events, 333.50000000000000000000 events / second
Total: 1334 events, 333.50000000000000000000 events / second
Total: 1329 events, 332.25000000000000000000 events / second
Total: 1334 events, 333.50000000000000000000 events / second
Total: 1481 events, 370.25000000000000000000 events / second
Total: 1399 events, 349.75000000000000000000 events / second
Total: 1333 events, 333.25000000000000000000 events / second
Total: 1398 events, 349.50000000000000000000 events / second
Total: 1481 events, 370.25000000000000000000 events / second
ServerSentEvents.ino => OK
WebSocket.ino => OK
SlowChunkResponse.ino => NOW FAILS (as expected, with the TWDT)

I have updated the comments on the test doing the slow response to explain. Anyway, nobody should do that and stall the async_tcp task.

Templates.ino => OK

❯  curl --output - http://192.168.4.1/dynamic.html | hexdump 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77    0    77    0     0   2287      0 --:--:-- --:--:-- --:--:--  2333
0000000 3c00 4421 434f 5954 4550 6820 6d74 3e6c
0000010 3c0a 7468 6c6d 0a3e 623c 646f 3e79 200a
0000020 2020 3c20 3168 483e 6c65 6f6c 202c 6f42
0000030 2062 3534 3434 3c32 682f 3e31 3c0a 622f
0000040 646f 3e79 3c0a 682f 6d74 3e6c 000a     
000004d

I have also tried other endpoints and I am not able to reproduce @yoursunny's issue.

To me the PR is good to go.

Also, no memory leak. I have monitored the heap.

AsyncAbstractResponse::_ack could allocate a temp buffer with a size larger than
the available sock buffer (i.e. to fit headers) and eventually lose the remainder on transfer
due to not checking whether the complete data was added to the sock buff.

Refactored the code in favor of a dedicated std::vector object acting as an accumulating
buffer, with more careful control over the amount of data actually copied to the sockbuff.

Closes #315

Added back MRE

added overrides

add AsyncWebServerRequest::clientRelease() method
this will explicitly release ownership of the AsyncClient* object.
Make the ownership change clearer for SSE/WebSocket

ci(pre-commit): Apply automatic fixes

AsyncWebSocketResponse - keep request object till WS_EVT_CONNECT event is executed

user code might use HTTP header information from the request

ci(pre-commit): Apply automatic fixes

fix typo

Add comment for slow response

Cleanup wrong log line

ci(pre-commit): Apply automatic fixes
@mathieucarbou (Member)

@vortigont : ⚠️ rebased + squashed again.

@mathieucarbou previously approved these changes Nov 4, 2025
@yoursunny

Templates.ino regression still occurs as of commit 747223f.

@mathieucarbou's test report also shows the regression.

Templates.ino => OK

❯  curl --output - http://192.168.4.1/dynamic.html | hexdump 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77    0    77    0     0   2287      0 --:--:-- --:--:-- --:--:--  2333
0000000 3c00 4421 434f 5954 4550 6820 6d74 3e6c
0000010 3c0a 7468 6c6d 0a3e 623c 646f 3e79 200a
0000020 2020 3c20 3168 483e 6c65 6f6c 202c 6f42
0000030 2062 3534 3434 3c32 682f 3e31 3c0a 622f
0000040 646f 3e79 3c0a 682f 6d74 3e6c 000a     
000004d

hexdump, without flags, renders the input in 2-octet units.
The initial two octets are 0x3c00, which means the first octet is 0x00 and the second octet is 0x3c.
This would be easier to see with the hexdump -C flag.

@mathieucarbou (Member) commented Nov 4, 2025

hexdump, without flags, renders the input in 2-octet units.
The initial two octets are 0x3c00, which means the first octet is 0x00 and the second octet is 0x3c.
This would be easier to see with the hexdump -C flag.

My bad! Sorry you're right! 😓

❯  curl -v --output - http://192.168.4.1/uptime.html | hexdump -C
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 192.168.4.1:80...
* Established connection to 192.168.4.1 (192.168.4.1 port 80) from 192.168.4.2 port 57069 
* using HTTP/1.x
> GET /uptime.html HTTP/1.1
> Host: 192.168.4.1
> User-Agent: curl/8.16.0
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Disposition: inline
< Last-Modified: Thu, 01 Jan 1970 00:13:00 GMT
< Cache-Control: no-cache
< Connection: close
< Accept-Ranges: none
< Transfer-Encoding: chunked
< Content-Type: text/html
< 
{ [98 bytes data]
100    82    0    82    0     0   2326      0 --:--:-- --:--:-- --:--:--  2342
* shutting down connection #0
00000000  00 3c 21 44 4f 43 54 59  50 45 20 68 74 6d 6c 3e  |.<!DOCTYPE html>|
00000010  0a 3c 68 74 6d 6c 3e 0a  3c 62 6f 64 79 3e 0a 20  |.<html>.<body>. |
00000020  20 20 20 3c 68 31 3e 48  65 6c 6c 6f 2c 20 42 6f  |   <h1>Hello, Bo|
00000030  62 20 31 33 20 6d 69 6e  75 74 65 73 3c 2f 68 31  |b 13 minutes</h1|
00000040  3e 0a 3c 2f 62 6f 64 79  3e 0a 3c 2f 68 74 6d 6c  |>.</body>.</html|
00000050  3e 0a                                             |>.|
00000052

And it seems to be only for templates: I did not see that elsewhere.

Wireshark dump: chunk size added: 0052, then the data starts with 003c...

(screenshot: Wireshark capture of the chunked response)

@mathieucarbou dismissed their stale review November 4, 2025 14:17

Still this 0x00 byte for template to fix

…e which was overriding the first template byte by null string terminator
@yoursunny

The regression also affects the ChunkResponse.ino example.
As before, I modified the WiFi connection code to use STA mode instead of AP mode, because it's more convenient in my environment.

main branch, commit c127402:

sunny@sunnyB:~$ curl -fsLS --output - http://192.168.5.85/ | hexdump -C | head -8
curl: (18) transfer closed with outstanding read data remaining
00000000  0a 3c 21 44 4f 43 54 59  50 45 20 68 74 6d 6c 3e  |.<!DOCTYPE html>|
00000010  0a 3c 68 74 6d 6c 3e 0a  3c 68 65 61 64 3e 0a 20  |.<html>.<head>. |
00000020  20 20 20 3c 74 69 74 6c  65 3e 53 61 6d 70 6c 65  |   <title>Sample|
00000030  20 48 54 4d 4c 3c 2f 74  69 74 6c 65 3e 0a 3c 2f  | HTML</title>.</|
00000040  68 65 61 64 3e 0a 3c 62  6f 64 79 3e 0a 20 20 20  |head>.<body>.   |
00000050  20 3c 68 31 3e 48 65 6c  6c 6f 2c 20 57 6f 72 6c  | <h1>Hello, Worl|
00000060  64 21 3c 2f 68 31 3e 0a  20 20 20 20 3c 70 3e 4c  |d!</h1>.    <p>L|
00000070  6f 72 65 6d 20 69 70 73  75 6d 20 64 6f 6c 6f 72  |orem ipsum dolor|

wresp_315 branch, commit 747223f:

sunny@sunnyB:~$ curl -fsLS --output - http://192.168.5.85/ | hexdump -C | head -8
00000000  00 3c 21 44 4f 43 54 59  50 45 20 68 74 6d 6c 3e  |.<!DOCTYPE html>|
00000010  0a 3c 68 74 6d 6c 3e 0a  3c 68 65 61 64 3e 0a 20  |.<html>.<head>. |
00000020  20 20 20 3c 74 69 74 6c  65 3e 53 61 6d 70 6c 65  |   <title>Sample|
00000030  20 48 54 4d 4c 3c 2f 74  69 74 6c 65 3e 0a 3c 2f  | HTML</title>.</|
00000040  68 65 61 64 3e 0a 3c 62  6f 64 79 3e 0a 20 20 20  |head>.<body>.   |
00000050  20 3c 68 31 3e 48 65 6c  6c 6f 2c 20 57 6f 72 6c  | <h1>Hello, Worl|
00000060  64 21 3c 2f 68 31 3e 0a  20 20 20 20 3c 70 3e 4c  |d!</h1>.    <p>L|
00000070  6f 72 65 6d 20 69 70 73  75 6d 20 64 6f 6c 6f 72  |orem ipsum dolor|

@mathieucarbou (Member) commented Nov 4, 2025

Templates.ino regression still occurs as of commit 747223f.

@yoursunny this is fixed in bb0dd46. This was caused by a bug in @vortigont's refactoring where sprintf was used to convert the chunk size into hex characters, but this added a null string terminator into the buffer, which was overwriting the first byte of the template.
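
An illustrative reconstruction of the failure mode and a safe variant (buffer names are hypothetical; this is not the actual library code):

```cpp
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <cstddef>

// Buggy pattern: sprintf() writes the chunk-size header in place, but also
// appends '\0', which lands on the byte right after the header, i.e. the
// first payload byte (the 0x00 seen on the wire).
void writeChunkHeader(uint8_t *buf, size_t chunkLen) {
  sprintf(reinterpret_cast<char *>(buf), "%04x\r\n", static_cast<unsigned>(chunkLen));
  // buf[6] is now '\0', clobbering the first data byte.
}

// Safe variant: format into a scratch array, then copy only the header chars.
void writeChunkHeaderSafe(uint8_t *buf, size_t chunkLen) {
  char hdr[8];
  int n = snprintf(hdr, sizeof(hdr), "%04x\r\n", static_cast<unsigned>(chunkLen));
  memcpy(buf, hdr, static_cast<size_t>(n));  // the terminator is not copied
}
```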

I will let you verify.

Now I have:

❯  curl -v --output - http://192.168.4.1/uptime.html | hexdump -C
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 192.168.4.1:80...
* Established connection to 192.168.4.1 (192.168.4.1 port 80) from 192.168.4.2 port 62281 
* using HTTP/1.x
> GET /uptime.html HTTP/1.1
> Host: 192.168.4.1
> User-Agent: curl/8.16.0
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Disposition: inline
< Last-Modified: Thu, 01 Jan 1970 00:01:00 GMT
< Cache-Control: no-cache
< Connection: close
< Accept-Ranges: none
< Transfer-Encoding: chunked
< Content-Type: text/html
< 
{ [97 bytes data]
100    81    0    81    0     0   1016      0 --:--:-- --:--:-- --:--:--  1025
* shutting down connection #0
00000000  0a 3c 21 44 4f 43 54 59  50 45 20 68 74 6d 6c 3e  |.<!DOCTYPE html>|
00000010  0a 3c 68 74 6d 6c 3e 0a  3c 62 6f 64 79 3e 0a 20  |.<html>.<body>. |
00000020  20 20 20 3c 68 31 3e 48  65 6c 6c 6f 2c 20 42 6f  |   <h1>Hello, Bo|
00000030  62 20 31 20 6d 69 6e 75  74 65 73 3c 2f 68 31 3e  |b 1 minutes</h1>|
00000040  0a 3c 2f 62 6f 64 79 3e  0a 3c 2f 68 74 6d 6c 3e  |.</body>.</html>|
00000050  0a                                                |.|
00000051

@yoursunny

I think it's OK now, for commit bb0dd46:

Templates.ino
ChunkResponse.ino
#315 MRE

@vortigont (Author)

ah! that sprintf :) Should've used snprintf probably, but your solution is pretty interesting per se, @mathieucarbou :)
OK, I think we are good to go finally 👍

@vortigont merged commit 652f70e into main on Nov 5, 2025
31 checks passed


Development

Successfully merging this pull request may close these issues.

  • AsyncAbstractResponse truncates 85 octets in the non-empty segment
  • Lost writes using AsyncCallbackResponse during low memory
