Skip to content

Conversation

michburk
Copy link
Contributor

@michburk michburk commented Sep 29, 2025

The reproducer role used a hardcoded zuul user and hardcoded /home/zuul path in several places, particularly when configuring controller-0. This patch pulls out this hardcoded data into default vars in the reproducer.
This patch also fixes a few incorrect assumptions that a var like ansible_user_id can be used when delegating to or configuring controller-0. For example, the ansible_user_id will not resolve to controller-0's user, and instead resolve to the host that the task is being delegated from. If there is a mismatch between these two users, this causes problems. This has mostly not been an issue because our ci assumes that there is a zuul user everywhere, but people running local deployments have encountered issues when using the framework on a hypervisor with a non-zuul user.

Copy link
Contributor

openshift-ci bot commented Sep 29, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Sep 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a7738eda2d9e4d6b9ecbdac4b6cf2717

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 31m 14s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 31s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 14m 24s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 39s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 03s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 37s
✔️ build-push-container-cifmw-client SUCCESS in 23m 09s
cifmw-molecule-reproducer FAILURE in 13m 06s

@michburk michburk force-pushed the main branch 6 times, most recently from 663c195 to 9466520 Compare October 14, 2025 18:39
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/34c153c78ef7406991c39d16c100ea72

✔️ cifmw-pod-zuul-files SUCCESS in 4m 45s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 51s
cifmw-pod-pre-commit FAILURE in 8m 01s
✔️ build-push-container-cifmw-client SUCCESS in 16m 46s
cifmw-molecule-reproducer FAILURE in 9m 33s

@michburk michburk force-pushed the main branch 2 times, most recently from 2f16f45 to 4822a60 Compare October 14, 2025 20:43
@michburk michburk changed the title WIP: Allow for non-zuul users in libvirt_manager and reproducer WIP: Allow for non-zuul users in reproducer role Oct 16, 2025
@michburk michburk changed the title WIP: Allow for non-zuul users in reproducer role WIP: Refactor hard-coded zuul user in reproducer role Oct 16, 2025
@michburk michburk marked this pull request as ready for review October 16, 2025 21:15
@michburk michburk requested a review from a team as a code owner October 16, 2025 21:15
# Default repositories we always want to have
cifmw_reproducer_default_repositories:
- src: "https://github.com/openstack-k8s-operators/ci-framework"
dest: "{{ cifmw_project_dir_absolute }}"
Copy link
Contributor Author

@michburk michburk Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using cifmw_project_dir_absolute here does not always work as these dest paths are for controller-0. For example the task reproducer : Clone repository if needed: ci-framework will fail if the hypervisor and controller-0 have different users, as this task will try to create /home/<hypervisor_user_here>/src/github.com/openstack-k8s-operators/ci-framework on controller-0 rather than /home/<controller_user_here>/src/github.com/openstack-k8s-operators/ci-framework

}}
failed_when: false

- name: Discover and expose CI Framework path on remote node
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this block to later in roles/reproducer/tasks/main.yml in order to ensure that controller-0 exists before trying to set this fact. My refactor of cifmw_reproducer_default_repositories in roles/reproducer/vars/main.yml depends on knowing cifmw_reproducer_controller0_user, so setting this fact before controller-0 exists means we can't pull the ansible_ssh_user from controller-0's hostvars and the fact may be incorrect.

# All variables within this role should have a prefix of "cifmw_reproducer"
cifmw_reproducer_user: "{{ ansible_user | default('zuul') }}"
cifmw_reproducer_controller0_user: "{{ hostvars['controller-0']['ansible_ssh_user'] | default('zuul') }}"
cifmw_reproducer_controller0_user_dir: "/home/{{ cifmw_reproducer_controller0_user }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enrique brought up the possibility of using something like $HOME rather than using a long var like cifmw_reproducer_controller0_user_dir - this would work in many cases, but some tasks will fail with this. For example:

TASK [sushy_emulator : Create and start Sushy Emulator container
...                                                                                         
task path: /home/testuser/ci-framework/roles/sushy_emulator/tasks/create_container.yml:26                                                                                    
fatal: [localhost -> controller-0(controller-0.localhost)]: FAILED! =>                                                                                                       
    changed: false                                                                                                                                                           
    msg: Container cifmw-sushy_emulator exited with code 125 when runed                                                                                                      
    stderr: |                                                                                                                                                                
      Error: creating named volume "$HOME/ci-framework-data/artifacts/sushy_emulator/.htpasswd": running volume create option: names must match [a-zA-Z0-9][a-zA-Z0-9_.-]*: invalid argument
    stderr_lines:
    - 'Error: creating named volume "$HOME/ci-framework-data/artifacts/sushy_emulator/.htpasswd":
      running volume create option: names must match [a-zA-Z0-9][a-zA-Z0-9_.-]*: invalid
      argument'

It would be possible to default to using $HOME where possible, but we have to keep track of the controller0 user anyways and I figured it'd be good to be clear and consistent about using controller0_user_dir - open to other arguments though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this task as it's using the module: podman_container it doesn't expand env variables, so I think you need to use the lookup in this way: {{ lookup('env', 'HOME') }}

But I agree with you of consistenly using the same solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor correction: {{ lookup('env', 'HOME') }} would not be looking up HOME in controller-0's env because this task is delegated, so this would lookup HOME on the host that is being delegated from. Something like {{ hostvars['controller-0']['ansible_env']['HOME'] }} works though.

Copy link
Contributor Author

@michburk michburk Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be possible to replace cifmw_reproducer_controller0_user: "{{ hostvars['controller-0']['ansible_ssh_user'] | default('zuul') }}" with cifmw_reproducer_controller0_user: "{{ hostvars['controller-0']['ansible_env']['USER'] | default('zuul') }} if that's in some way better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow I see. Honestly I don't have any other suggestions than what evallesp has already suggested to you.
Do you just want to add a fallback for the _user_dir?

cifmw_reproducer_controller0_user_dir: "{{ hostvars['controller-0']['ansible_env']['HOME'] | default('/home/' ~ cifmw_reproducer_controller0_user) }}"

cifmw_basedir: "{{ ansible_user_dir ~ '/ci-framework-data' }}"
cifmw_basedir: "{{ cifmw_reproducer_controller0_basedir }}"
ansible_user_dir: "{{ cifmw_reproducer_controller0_user_dir }}"
ansible_user_id: "{{ cifmw_reproducer_controller0_user }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for feedback on my decision to overwrite these vars here - This task delegates to controller-0 and ends up calling several other roles that configure things assuming they can use vars like ansible_user_dir and ansible_user_id , but this causes issues when there are different users between localhost and controller-0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) reflection: I'm not opposite to this solution

Copy link
Contributor

@evallesp evallesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, not approving to hold on until more people chime in

# All variables within this role should have a prefix of "cifmw_reproducer"
cifmw_reproducer_user: "{{ ansible_user | default('zuul') }}"
cifmw_reproducer_controller0_user: "{{ hostvars['controller-0']['ansible_ssh_user'] | default('zuul') }}"
cifmw_reproducer_controller0_user_dir: "/home/{{ cifmw_reproducer_controller0_user }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this task as it's using the module: podman_container it doesn't expand env variables, so I think you need to use the lookup in this way: {{ lookup('env', 'HOME') }}

But I agree with you of consistenly using the same solution.

cifmw_basedir: "{{ ansible_user_dir ~ '/ci-framework-data' }}"
cifmw_basedir: "{{ cifmw_reproducer_controller0_basedir }}"
ansible_user_dir: "{{ cifmw_reproducer_controller0_user_dir }}"
ansible_user_id: "{{ cifmw_reproducer_controller0_user }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) reflection: I'm not opposite to this solution

@michburk michburk changed the title WIP: Refactor hard-coded zuul user in reproducer role Refactor hard-coded zuul user in reproducer role Oct 17, 2025
It was already technically possible to create a controller-0 vm with a
non-zuul user. This commit enables configuring controller-0 with a
non-zuul user by using controller-0's `ansible_ssh_user` from
`hostvars` in place of hardcoded `'zuul'`.

It is worth noting that we can't simply use vars like ansible_user_id
or ansible_user_dir with tasks that are delgated to controller-0.
If the host we are running the reproducer from has a different user
from controller-0, these vars will try to use the user from
the machine that is running the reproducer rather than controller-0's
user.

For example, let's say we are running the reproducer from a machine with
the user 'exampleuser' and we have the user 'zuul' on controller-0.
If we have some task that is delegated to controller-0 and
tries to create some file in the controller-0 user's home directory, we
can't simply do:

```
- name: Create some file
  delegate_to: controller-0
  ansible.builtin.file:
    path: "{{ ansible_user_dir }}/some_file.txt"
    state: touch
```

`ansible_user_dir` resolves to `/home/exampleuser`. This path does not
exist on controller-0, so this task would fail.
Previously, the check for `common-requirements.txt` would check if
common requirements exists on localhost at a given path. This path is
based on the home directory on localhost.
If the file did exist, it would use that same path to try to install the
requirements on controller-0. This causes issues when localhost and
controller-0 do not have the same users and home directories.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants