[BUG] 3006.7 Minion makes the pillar-schedules persistent in failover process #67980

the-toxin opened this issue Apr 23, 2025 · 0 comments

Description
The Salt Minion dumps the pillar schedules to the persistent store (_schedule.conf) during the failover process.

Setup
Run two masters and a minion in a multi-master setup.

On the masters, set the pillar schedule:

$ cat /srv/pillar/schedule.sls
schedule:
  get-grains:
    enabled: true
    function: grains.items
    jid_include: true
    maxrunning: 1
    metadata: {get-common-grains: true}
    seconds: 60
    splay: 1
    name: get-grains
    run: true

$ cat /srv/pillar/top.sls
base:
  '*':
    - schedule
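
For reference, one way to confirm that the minion actually receives this pillar schedule is Salt's local caller client; a minimal sketch, assuming it is run as root on the minion with the default config path:

import salt.client

caller = salt.client.Caller()                # reads /etc/salt/minion by default
caller.cmd("saltutil.refresh_pillar")        # pull the latest pillar from the master
print(caller.cmd("pillar.get", "schedule"))  # should print the get-grains job defined above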

Minion multi-master configuration:

$ cat /etc/salt/minion.d/failover.conf
master:
- 10.10.10.1
- 10.10.10.2
master_type: failover
verify_master_pubkey_sign: True
random_master: True
master_alive_interval: 30
master_failback: False
master_failback_interval: 0
retry_dns: 0
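
As a sanity check, the options the minion actually loads (including minion.d/*.conf) can be inspected with Salt's config loader; a rough sketch, assuming the default config location:

import salt.config

opts = salt.config.minion_config("/etc/salt/minion")  # also merges /etc/salt/minion.d/*.conf
print(opts["master_type"])                            # expected: failover
print(opts["master"])                                 # expected: ['10.10.10.1', '10.10.10.2']
print(opts["master_alive_interval"])                  # expected: 30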

Initially, the minion's _schedule.conf includes only the default scheduled jobs:

$ cat /etc/salt/minion.d/_schedule.conf
schedule:
  __master_alive_10.10.10.1:
    enabled: true
    function: status.master
    jid_include: true
    kwargs: {connected: true, master: 10.10.10.1}
    maxrunning: 1
    return_job: false
    seconds: 30
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false, run_on_start: true}

Steps to Reproduce the behavior
1. Shut down the master that the minion is connected to.
2. Wait the time required for the failover process to complete.
3. Ensure that the minion is connected to the new master.
4. Power up the first master and repeat the failover case, but this time stop the second master.
5. Look at /etc/salt/minion.d/_schedule.conf and you will see the schedule from the pillar.

$ cat /etc/salt/minion.d/_schedule.conf | grep "get-grains:" -A 9
  get-grains:
    enabled: true
    function: grains.items
    jid_include: true
    maxrunning: 1
    metadata: {get-common-grains: true}
    name: get-grains
    run: true
    seconds: 60
    splay: 1
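
The same check can be scripted by parsing the persisted file and flagging any job that is not one of the minion's internal __-prefixed entries; a small sketch (the file path and the __ naming convention for internal jobs are assumptions based on the outputs above):

import yaml

with open("/etc/salt/minion.d/_schedule.conf") as fh:
    persisted = yaml.safe_load(fh) or {}

for name in persisted.get("schedule", {}):
    if not name.startswith("__"):                # __master_alive_* and __mine_interval are expected
        print(f"unexpected persisted job: {name}")  # prints 'get-grains' when the bug occurs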

Expected behavior
The /etc/salt/minion.d/_schedule.conf should not contain the pillar schedule:

$ cat /etc/salt/minion.d/_schedule.conf | grep "get-grains:" -A 9
$

Versions Report

salt --versions-report
Salt Version:
          Salt: 3006.7

Python Version:
        Python: 3.10.15 (main, Jan 28 2025, 19:31:30) [GCC 8.5.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: 1.9.0
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: 1.17.0
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.18.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: astra 1.7_x86-64 1.7_x86-64
        locale: utf-8
       machine: x86_64
       release: 6.1.50-1-generic
        system: Linux
       version: Astra Linux 1.7_x86-64 1.7_x86-64

Additional context
The patch that fixes the issue for me:

clear-pillar-schedules-in-failover-schedules-dump.patch
Subject: [PATCH] fix(minion/failover): clear pillar-schedules in schedules
 dump

---
 salt/minion.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/salt/minion.py b/salt/minion.py
index c2e1551137..e2e3c4e6a6 100644
--- a/salt/minion.py
+++ b/salt/minion.py
@@ -2890,7 +2890,7 @@ class Minion(MinionBase):
                         )
 
                         # put the current schedule into the new loaders
-                        self.opts["schedule"] = self.schedule.option("schedule")
+                        self.opts["schedule"] = self.schedule._get_schedule(include_pillar=False, remove_hidden=True)
                         (
                             self.functions,
                             self.returners,
-- 
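
For context, the intent of the change is that, when the loaders are rebuilt during failover, self.opts["schedule"] should receive only the minion's own schedule entries, with pillar-defined jobs excluded and internal bookkeeping keys stripped, so pillar jobs never get written to _schedule.conf. A rough standalone sketch of that filtering (the function and data below are illustrative, not Salt's actual implementation):

def schedule_without_pillar(opts_schedule, pillar_schedule):
    """Return a copy of the minion's own schedule, dropping jobs that were
    defined in pillar and stripping internal keys (those starting with '_')."""
    result = {}
    for name, data in opts_schedule.items():
        if name in pillar_schedule:          # job came from pillar -> do not persist it
            continue
        result[name] = {k: v for k, v in data.items() if not k.startswith("_")}
    return result

# Example with the jobs from this report:
opts_jobs = {
    "__mine_interval": {"function": "mine.update", "_next_fire_time": None},
    "get-grains": {"function": "grains.items", "_next_fire_time": None},
}
pillar_jobs = {"get-grains": {"function": "grains.items"}}
print(schedule_without_pillar(opts_jobs, pillar_jobs))
# -> {'__mine_interval': {'function': 'mine.update'}}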