
Performance regression on upgrade to 1.4 #4417


Closed
Zetanova opened this issue May 17, 2020 · 14 comments

Comments

@Zetanova
Contributor

I upgraded all nodes from 1.3.X to 1.4.6. All nodes are working fine, but every node uses 100% of a core while idling.

Even a WebApi node that hosts more or less no custom actors.

I am using Docker 19.03.8 and .NET Core 3.1. Normally I would use procexp to check which thread is using 100%, but under docker/linux I don't know how to do that.

(screenshot: thread list of the WebAPI node)

I would be glad to get tips on how to debug or resolve this.

@Zetanova
Contributor Author

Zetanova commented May 17, 2020

With the help of dotnet-trace I created a CPU perf trace and opened it in VS.

The hot path is marked as "external". How do I load the rest of the debug symbols?


Edit: found the filter "show external code path"
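For anyone trying the same thing, a minimal sketch of how such a trace can be collected from a running container (assuming the dotnet-trace global tool can be installed in the container; PIDs and paths depend on your setup):

```sh
# install the diagnostics tool (inside the container, or in a sidecar sharing the PID namespace)
dotnet tool install --global dotnet-trace

# list traceable .NET processes and note the target PID
dotnet-trace ps

# collect a CPU-sampling trace; stop with Ctrl+C after ~30 seconds of idle load
dotnet-trace collect --process-id <pid> --profile cpu-sampling

# the resulting .nettrace file can be opened in Visual Studio or PerfView
```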

@Zetanova
Contributor Author

One hot path is in DotNetty.Common.dll, around a slim lock.

The second hot path is in Helios.Concurrency.DedicatedThreadPool and is most likely related to the DotNetty WaitHandle.

A third hot path is in HashWheelTimerSchedule.WaitForNextTick().

Diagnosis

All hot paths seem to use the ManualResetEventSlim WaitHandle. It always spins first (effectively a busy-wait) and only then switches to a normal WaitHandle.

ManualResetEventSlim should only be used if the event is assumed to be set most of the time and the wait time is very short. If it is waited on frequently while it is usually still reset, it will burn a lot of CPU cycles spinning.
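To illustrate the point, a minimal sketch (my own illustration, not code from Akka.NET or DotNetty) of how the spin count affects an idle waiter:

```csharp
using System.Threading;

// A default ManualResetEventSlim uses a non-zero spin count: every Wait() on a
// reset event first spins on the CPU before falling back to the kernel wait handle.
var spinningEvent = new ManualResetEventSlim(initialState: false);

// spinCount: 0 makes Wait() block on the kernel handle immediately, trading a
// little wake-up latency for far fewer wasted cycles when the event is usually
// reset (an idle timer tick or thread-pool signal, for example).
var nonSpinningEvent = new ManualResetEventSlim(initialState: false, spinCount: 0);

// Typical consumer loop: if signals are rare, the spinning variant pays the spin
// cost on every iteration even though there is no work to do.
static void WaitForWork(ManualResetEventSlim signal)
{
    while (true)
    {
        signal.Wait();   // spins first (unless spinCount is 0), then blocks
        signal.Reset();
        // ... drain the work queue here ...
    }
}
```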

@Zetanova
Contributor Author

I have now tested the HashWheelTimerSchedule.WaitForNextTick() path and could not reproduce the behavior in a demo project: https://github.com/Zetanova/AkkaTimerTest

It is very strange.

My current status is that if I start 1-5 nodes in debug/docker, they consume around 80% of 2 cores while just idling.

@Zetanova
Contributor Author

Debug or production builds make no difference.

The seed node starts at ~2.5% CPU.
When the second node joins the cluster, both nodes use ~6% CPU while idling.
When the 3rd node joins the cluster, all nodes use ~11% CPU.
When the 4th and 5th nodes join the cluster, all nodes use 12-17% CPU.

5 nodes x 15-17% => ~75-85% CPU

@Aaronontheweb
Member

As we tried to show everyone as loudly as we could in all Akka.NET v1.4 release notes and documentation: https://getakka.net/articles/remoting/performance.html

Turn off remote batching if you're running a low-traffic system.
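For reference, a minimal sketch of the HOCON override described on that page (setting names as documented for Akka.NET 1.4; verify against the linked performance docs before relying on them):

```hocon
akka.remote.dot-netty.tcp {
  batching {
    # disable I/O batching for low-traffic systems
    enabled = false
  }
}
```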

@Aaronontheweb
Member

Please do that and update us with the results.

@Zetanova
Contributor Author

I already tried disabling the DotNetty buffering, without any change.
It is not about the latency, it is about the sudden high CPU consumption.
In a release build it is not that high.

I even changed back to 1.3.6 and have the same issue.

Then I tried to change an image in the k8s cluster:
the idling container/pod jumped from 20-40m to 115-150m CPU.

I am currently trying to run it on older dotnet versions.
Maybe there was something...

@Aaronontheweb
Member

Ok, if it's not an issue with the DotNetty batching system then that's a bit of a mystery - might be that .NET Core changed part of the underlying runtime itself. We didn't touch many of the concurrency primitives, other than moving onto .NET Standard 2.0.

@Zetanova
Contributor Author

I tried going back as far as mcr.microsoft.com/dotnet/core/aspnet:3.1.1-buster-slim and all of them show this new issue.

mcr.microsoft.com/dotnet/core/aspnet:3.1-alpine3.11
mcr.microsoft.com/dotnet/core/aspnet:3.1-alpine3.10
have the same issue, but only use 60MB of memory (debian-buster used 100MB).

I think it's a kernel patch or something: all the distro images got rebuilt 20 days ago, and that triggered MS to rebuild the new and old dotnet versions too.

Maybe someone can confirm the high CPU usage,
but don't forget to pull the latest builds first:
docker pull mcr.microsoft.com/dotnet/core/aspnet:3.1
docker pull mcr.microsoft.com/dotnet/core/sdk:3.1

@Zetanova
Contributor Author

It is very easy to check (a sketch of the commands follows this list):

  1. Start the seed node/container of the cluster.
     docker stats => seed node at ~2.6% CPU
  2. Start a second node/container and let it join the cluster.
     docker stats => both nodes at ~6% CPU each
  3. Start a 3rd node/container and let it join the cluster.
     docker stats => all nodes at ~10-13% CPU each
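A rough sketch of those steps as commands (the image name and network are placeholders for whatever your cluster actually uses):

```sh
# shared network so the containers can reach each other (names are hypothetical)
docker network create akka-net

# 1. start the seed node and check its idle CPU
docker run -d --name seed --network akka-net my-akka-image
docker stats --no-stream seed

# 2. start a second node, let it join via the seed, then compare
docker run -d --name node2 --network akka-net my-akka-image
docker stats --no-stream seed node2

# 3. repeat for a third node and watch the per-container CPU% column climb
docker run -d --name node3 --network akka-net my-akka-image
docker stats --no-stream
```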

@Aaronontheweb
Member

I think it's a kernel patch or something: all the distro images got rebuilt 20 days ago, and that triggered MS to rebuild the new and old dotnet versions too.

So you don't think this is an Akka.NET issue? Just want to be clear.

@Aaronontheweb
Member

Might not be a bad idea to revisit #4032 cc @akkadotnet/contributors

@Zetanova
Contributor Author

Yes, not an Akka issue.

@Aaronontheweb
Member

@Zetanova looks like there's evidence that this is an Akka.NET issue - follow #4434 for updates on it. User added a pretty convincing reproduction sample.
