Skip to content

Conversation

@ksuderman
Copy link
Contributor

@ksuderman ksuderman commented Oct 18, 2025

Improvements to the Docker build and some defensive programming to counter a problem with the Ansible Galaxy playbook

When themes are disabled, either by setting galaxy_manage_themes: false or if there are no galaxy_themes_subdomains, the use of the YAML folded block scalar operator (>) causes the variable themes_config_file_by_host to evaluate to the string "{ }" rather than to an empty dictionary. This causes Galaxy's config parser to fail when it tries to call .items() on a string object. While the root cause of the problem is in the Ansible playbook, a misconfiguration should not cause Galaxy to crash which in turn breaks the Docker builds.

Cannot be merged until galaxyproject/galaxy-docker-k8s#40 is merged.

Closes #21089

Friends with:

  1. Disable theme management galaxy-docker-k8s#40
  2. Fix themes_config_file_by_host to always contain a dict ansible-galaxy#234

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@github-actions github-actions bot added this to the 26.0 milestone Oct 18, 2025
@ksuderman ksuderman modified the milestones: 26.0, 25.1 Oct 18, 2025
ARG SERVER_DIR

RUN ansible-playbook -i localhost, playbook.yml -v -e "{galaxy_build_client: false}" -e galaxy_virtualenv_command=virtualenv
RUN ansible-playbook -i localhost, playbook.yml -v -e "{galaxy_build_client: false, galaxy_additional_venv_packages: false, galaxy_virtualenv_command: virtualenv}"
Copy link
Member

@nsoranzo nsoranzo Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use "/usr/bin/python3 -m venv" instead of virtualenv. You could skip the installation of virtualenv at line 54.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why they did what they did the way they did it, but this is a flag being passed to the Ansible Playbook so would require changes to the playbook, which is beyond the scope of this PR. I just want the Docker builds working again before the release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why they did what they did the way they did it, but this is a flag being passed to the Ansible Playbook so would require changes to the playbook

I don't think this would require changes to the playbook, it's a command also there: https://github.com/galaxyproject/galaxy-docker-k8s/blob/master/playbook.yml#L30

which is beyond the scope of this PR. I just want the Docker builds working again before the release.

Fair enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I missed the command on line 30. I can test using /usr/bin/python3 -m venv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to dig into what the ansible-galaxy playbook is doing, but if this value is not overridden with virtualenv then we get a no such file or directory error for /usr/bin/python3

# This can happen when Jinja2 templates with folded scalars evaluate to strings
if isinstance(themes_by_host_value, str):
# If it's an empty/whitespace string or just braces, treat as empty dict
stripped = themes_by_host_value.strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we are moving the burden too much on the Galaxy side to accept broken configurations here. I would just raise an exception if themes_by_host_value is not a dict, with a helpful message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for backwards compatibility and to maintain the existing behavior. Even if the Ansible playbook is not fixed this code will accept its (incorrect) output and allow the Docker build to complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather pin the ansible version if you just want to fix the Docker build until the roles are fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll pin the ansible version and remove the defensive programming changes. I'm not keen on making code changes after the freeze in any case.

ksuderman and others added 2 commits October 18, 2025 08:35

# The data in version.json will be displayed in Galaxy's /api/version endpoint
RUN printf "{\n \"git_commit\": \"$(cat GITREVISION)\",\n \"build_date\": \"$BUILD_DATE\",\n \"image_tag\": \"$IMAGE_TAG\"\n}\n" > version.json \
RUN printf "{\n \"git_commit\": \"$GIT_COMMIT\",\n \"build_date\": \"$BUILD_DATE\",\n \"image_tag\": \"$IMAGE_TAG\"\n}\n" > version.json \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GIT_COMMIT is not guaranteed to be set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GIT_COMMIT is passed in as an argument and is also used on line 142. By relying on the GIT_COMIMIT arg we can avoid copying the sizable .git directory to the build context just to get this one value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it was done this way originally is that when you run docker build . -t galaxy:latest locally, the GIT_COMMIT won't be passed in, and there's no way to determine the commit after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone builds locally then the BUILD_DATE and IMAGE_TAG should also be set, and if those are going to be set on the command line why not also set GIT_COMMIT?

docker build . -t galaxy:latest --build-arg GIT_COMMIT=${git rev-parse HEAD) ...

That saves copying a ~1.3GB directory to the build context to get the value of one variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BUILD_DATE and IMAGE_TAG are useful but optional. The former can be determined from docker hub while the latter is inferable from the GIT_COMMIT. Hence why the GIT_COMMIT is what's really important. If removing .git from the build context makes a substantial enough improvement (what kind of difference do you see?), I'm ok with it, but we should at least document setting the GIT_COMMIT. What do you think @afgane?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you calculate GITREVISION in stage 1, and pass it as ARG to this stage 3?

ARG SERVER_DIR

RUN ansible-playbook -i localhost, playbook.yml -v --tags "galaxy_build_client" -e galaxy_virtualenv_command=virtualenv
RUN ansible-playbook -i localhost, playbook.yml -v --tags "galaxy_build_client" -e "{galaxy_additional_venv_packages: false, galaxy_virtualenv_command: virtualenv}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is galaxy_additional_venv_packages needed? Should it be in the playbook instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it we get the error:

239.3 Conditional result (False) was derived from value of type 'list' at '/tmp/ansible/galaxy-docker/roles/galaxy/defaults/main.yml:502:34'. Conditionals must have a boolean result.
239.3 Origin: /tmp/ansible/galaxy-docker/roles/galaxy/tasks/dependencies.yml:47:13
239.3
239.3 45         PYTHONPATH: null
239.3 46         VIRTUAL_ENV: "{{ galaxy_venv_dir }}"
239.3 47       when: galaxy_additional_venv_packages
239.3                ^ column 13

It should be fixed in the playbook, but I was originally hoping that we could fix this (for now) with only changes to the docker file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, let's move both env vars (galaxy_virtualenv_command and galaxy_additional_venv_packages) into the playbook, so config doesn't get split.

self.themes_by_host[host] = {}
file_path = self._in_dir(resolve_dir_path, file_name)
_load_theme(file_path, self.themes_by_host[host])
themes_by_host_value = self.config_dict["themes_config_file_by_host"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a typing problem with config_dict I believe. config_dict is declared as a dict[str, str], but the code is expecting themes_config_file_by_host to also be a dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe themes_config_file_by_host is a dict no? Doesn't it map a host names to config files? Eg:

themes_config_file_by_host:
  host1.example.com: /path/to/theme_1
  host2.example.com: /path/to/theme_2

Otherwise, why was the code attempting to call .items()? My understanding of the problem is that the playbook is currently producing:

themes_config_file_by_host: "{ }"   # A string. Incorrect

rather than:

themes_config_file_by_host: { }   # An empty dict. Correct

If there is a typing problem with the config_dict I would say that is a separate problem and beyond the scope of this PR.

@ksuderman
Copy link
Contributor Author

Closing in favor of #21131

I will revisit optimizing the Docker build after the 25.1 release.

@ksuderman ksuderman closed this Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker builds are failing

3 participants