Skip to content

[BUG] syndic event forwarding broken with 3006.10 #67936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
hurzhurz opened this issue Apr 2, 2025 · 5 comments
Open
3 tasks done

[BUG] syndic event forwarding broken with 3006.10 #67936

hurzhurz opened this issue Apr 2, 2025 · 5 comments
Labels
Bug broken, incorrect, or confusing behavior needs-triage

Comments

@hurzhurz
Copy link
Contributor

hurzhurz commented Apr 2, 2025

Description
Since 3006.10, the forwarding of events from Syndic to the higher level Master doesn't work properly anymore.
In such cases, it even raises an exception: TypeError: Minion._fire_master() got an unexpected keyword argument 'sync'

The cause seems to be this change: f6aa6ff

Removing this line seems to fix the issue:

"sync": True, # Sync needs to be true unless being called from a coroutine

Setup
Ubuntu 24.04.2 LTS with Salt 3006.10 from official repository

  • Master-of-Masters (test-salt-mom): with salt-master and salt-minion
  • Syndic (test-salt-syndic): with salt-master, salt-syndic and salt-minion

Each minion is connected to its local master. And syndic of course to the Master-of-Masters.
No special config in this test setup, just the bare minimum to connect them with another.

  • VM (Virtualbox, KVM, etc. please specify)
  • container (LXC)
  • onedir packaging

Steps to Reproduce the behavior

Let syndic-minion fire event:

root@test-salt-mom:~# salt test-salt-syndic event.fire_master '{"data":"my event data"}' 'tag'

test-salt-syndic:
    True

Syndic-Log with exception:

root@test-salt-syndic:~# tail -f /var/log/salt/syndic

2025-04-02 09:11:29,219 [tornado.application:640 ][ERROR   ][24153] Exception in callback <bound method SyndicManager._forward_events of <salt.minion.SyndicManager object at 0x7ff971b9d3c0>>
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/ioloop.py", line 1028, in _run
    return self.callback()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 3865, in _forward_events
    self._call_syndic(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 3693, in _call_syndic
    getattr(syndic_future.result(), func)(*args, **kwargs)
TypeError: Minion._fire_master() got an unexpected keyword argument 'sync'

Few events received on MoM, actually fired event is missing:

root@test-salt-mom:~# salt-run state.event pretty=True

20250402091128759007    {
    "_stamp": "2025-04-02T09:11:28.759404",
    "minions": [
        "test-salt-syndic"
    ]
}
salt/job/20250402091128759007/new       {
    "_stamp": "2025-04-02T09:11:28.759956",
    "arg": [
        {
            "data": "my event data"
        },
        "tag"
    ],
    "fun": "event.fire_master",
    "jid": "20250402091128759007",
    "minions": [
        "test-salt-syndic"
    ],
    "missing": [],
    "tgt": "test-salt-syndic",
    "tgt_type": "glob",
    "user": "root"
}
salt/job/20250402091128759007/ret/test-salt-syndic      {
    "_stamp": "2025-04-02T09:11:29.726911",
    "fun": "event.fire_master",
    "fun_args": [
        {
            "data": "my event data"
        },
        "tag"
    ],
    "id": "test-salt-syndic",
    "jid": "20250402091128759007",
    "retcode": 0,
    "return": true,
    "success": true
}

Expected behavior

With interims fix (mentioned line removed):

root@test-salt-mom:~# salt test-salt-syndic event.fire_master '{"data":"my event data"}' 'tag'

test-salt-syndic:
    True

All events received on MoM:

root@test-salt-mom:~# salt-run state.event pretty=True

20250402090836106050    {
    "_stamp": "2025-04-02T09:08:36.106334",
    "minions": [
        "test-salt-syndic"
    ]
}
salt/job/20250402090836106050/new       {
    "_stamp": "2025-04-02T09:08:36.107028",
    "arg": [],
    "fun": "test.ping",
    "jid": "20250402090836106050",
    "minions": [
        "test-salt-syndic"
    ],
    "missing": [],
    "tgt": "test-salt-syndic",
    "tgt_type": "glob",
    "user": "root"
}
20250402090836106050    {
    "_stamp": "2025-04-02T09:08:36.200337",
    "minions": [
        "test-salt-syndic"
    ]
}
syndic/test-salt-syndic/20250402090836106050    {
    "_stamp": "2025-04-02T09:08:36.201327",
    "minions": [
        "test-salt-syndic"
    ]
}
salt/job/20250402090836106050/new       {
    "_stamp": "2025-04-02T09:08:36.201811",
    "arg": [],
    "fun": "test.ping",
    "jid": "20250402090836106050",
    "minions": [
        "test-salt-syndic"
    ],
    "missing": [],
    "tgt": "test-salt-syndic",
    "tgt_type": "glob",
    "user": "root"
}
syndic/test-salt-syndic/salt/job/20250402090836106050/new       {
    "_stamp": "2025-04-02T09:08:36.202655",
    "arg": [],
    "fun": "test.ping",
    "jid": "20250402090836106050",
    "minions": [
        "test-salt-syndic"
    ],
    "missing": [],
    "tgt": "test-salt-syndic",
    "tgt_type": "glob",
    "user": "root"
}
salt/job/20250402090836106050/ret/test-salt-syndic      {
    "_stamp": "2025-04-02T09:08:36.207895",
    "fun": "test.ping",
    "fun_args": [],
    "id": "test-salt-syndic",
    "jid": "20250402090836106050",
    "retcode": 0,
    "return": true,
    "success": true
}

Versions Report

salt --versions-report
Salt Version:
          Salt: 3006.10

Python Version:
        Python: 3.10.16 (main, Mar  6 2025, 02:23:15) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.18.1
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: ubuntu 24.04.2 noble
        locale: utf-8
       machine: x86_64
       release: 5.15.167.4-microsoft-standard-WSL2
        system: Linux
       version: Ubuntu 24.04.2 noble
@tkaehn
Copy link

tkaehn commented Apr 4, 2025

I had to downgrade to 3006.9 to be able to use Salt again. Removing "sync": True didn't seem to work.
test.ping was running without any problem. However state.apply always failed with 3006.10.

@hurzhurz
Copy link
Contributor Author

hurzhurz commented Apr 7, 2025

@tkaehn I don't think your problem is related to this event-forwarding-issue.
Even without removing that line, a state.apply should work and show a return, too.
Just some events about the execution would not be forwarded from a syndic to a upstream master.

I think you probably need to open a new issue for your specific problem, with logs etc. ...

@tkaehn
Copy link

tkaehn commented Apr 7, 2025

Thanks for your suggestion. I re-tried by upgrading to 3006.10 and I could see the same error messages in the syndic log:

[...]
TypeError: Minion._fire_master() got an unexpected keyword argument 'sync'

All directly connected minions returned from a state.apply. In contrast to this there was no return from the minions connected to the syndic server. Running state.apply directly on the syndic showed that the minions were connected.

Strangely test.ping and pillar.get are showing returns for all minions.

After removing "sync": True the issue seems to be solved. Maybe I was a little bit too impatient last week.

However, I still receive the following log messages (probably unrelated as it happens on the syndic, too):

2025-04-07 11:06:18,036 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28370] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:18,811 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28365] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:18,814 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28392] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:18,869 [salt.master      :1180][ERROR   ][28351] Requested method not exposed: _ext_nodes
2025-04-07 11:06:18,926 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28349] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:19,068 [salt.master      :1180][ERROR   ][28366] Requested method not exposed: _ext_nodes
2025-04-07 11:06:20,529 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28394] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:20,537 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28391] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:20,545 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28363] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:20,551 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28394] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:20,556 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28364] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:20,595 [salt.master      :1180][ERROR   ][28364] Requested method not exposed: _ext_nodes
2025-04-07 11:06:20,649 [salt.master      :1180][ERROR   ][28351] Requested method not exposed: _ext_nodes
2025-04-07 11:06:27,197 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28371] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,199 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28349] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,200 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28367] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,202 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28350] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,207 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28347] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,212 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28368] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,214 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28370] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,218 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28371] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,221 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28369] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,227 [salt.loaded.int.returner.local_cache:148 ][ERROR   ][28373] An extra return was detected from minion minion####, please verify the minion, this could be a replay attack
2025-04-07 11:06:27,251 [salt.master      :1180][ERROR   ][28349] Requested method not exposed: _ext_nodes
2025-04-07 11:06:27,259 [salt.master      :1180][ERROR   ][28352] Requested method not exposed: _ext_nodes

Those messages appear since a couple of versions and only if a lot of minions are targeted. I've checked that there are no duplicate keys. So the messages are false positives.

@hurzhurz
Copy link
Contributor Author

hurzhurz commented Apr 7, 2025

Hmm... maybe your problems came from a bad connection state of the syndic:
I got the impression that sometimes the syndic doesn't properly reconnect if e.g. the (mom-)master was restarted.
It then can look like the connection is unidirectional: the syndic receives requests/commands, but the master doesn't receive returns. Sometimes maybe also vice versa.

So perhaps while removing "sync": True is the fix for TypeError: Minion._fire_master() got an unexpected keyword argument 'sync', what actually helped you was restarting the syndic-service afterwards?

And about the other log messages:

Requested method not exposed: _ext_nodes: I did a quick look at the code and git history for _ext_nodes.
Looks like this is a very old deprecated function of the master. Do you have any minions with very old salt versions (older than 3003)?

An extra return was detected from minion minion####, please verify the minion, this could be a replay attack: I see that message too from time to time. I think there could be multiple causes. But probably a minion or syndic repeats the returns because it doesn't get the response that it was received within a certain time.
Maybe it is a connection issue like mentioned above. Or maybe the master is overloaded and can't process the returns in time (not enough worker threads or bad IO performance). But that are just guesses.

@tkaehn
Copy link

tkaehn commented Apr 7, 2025

Thanks for you further ideas. Reverting to the original file and restarting the syndic process leads to the same problem as before. state.apply only returns directly connected minions and e.g. test.ping returns for all minions. Even waiting a couple of minutes doesn't help.

The exception looks like this:

2025-04-07 14:02:53,407 [tornado.application:640 ][ERROR   ][50622] Exception in callback <bound method SyndicManager._forward_events of <salt.minion.SyndicManager object at 0x7fca509ec850>>
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/ioloop.py", line 1028, in _run
    return self.callback()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 3865, in _forward_events
    self._call_syndic(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 3693, in _call_syndic
    getattr(syndic_future.result(), func)(*args, **kwargs)
TypeError: Minion._fire_master() got an unexpected keyword argument 'sync'

Generally my impression is that it is best to restart the processes in order master, syndic, minion.

We have indeed older minions which could explain the _ext_nodes message.

Regarding the duplicate returns I tried to increase return_retry_timer and return_retry_timer_max and also tried
to run batches of 50 instead of 100. Both seemed to help a little bit and reduce the amount of messages. I guess I need to figure out a way to increase the performance of the master. However, it's already using 24 worker processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage
Projects
None yet
Development

No branches or pull requests

2 participants