-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix the docker build #21103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the docker build #21103
Conversation
| ARG SERVER_DIR | ||
|
|
||
| RUN ansible-playbook -i localhost, playbook.yml -v -e "{galaxy_build_client: false}" -e galaxy_virtualenv_command=virtualenv | ||
| RUN ansible-playbook -i localhost, playbook.yml -v -e "{galaxy_build_client: false, galaxy_additional_venv_packages: false, galaxy_virtualenv_command: virtualenv}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use "/usr/bin/python3 -m venv" instead of virtualenv. You could skip the installation of virtualenv at line 54.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why they did what they did the way they did it, but this is a flag being passed to the Ansible Playbook so would require changes to the playbook, which is beyond the scope of this PR. I just want the Docker builds working again before the release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why they did what they did the way they did it, but this is a flag being passed to the Ansible Playbook so would require changes to the playbook
I don't think this would require changes to the playbook, it's a command also there: https://github.com/galaxyproject/galaxy-docker-k8s/blob/master/playbook.yml#L30
which is beyond the scope of this PR. I just want the Docker builds working again before the release.
Fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I missed the command on line 30. I can test using /usr/bin/python3 -m venv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time to dig into what the ansible-galaxy playbook is doing, but if this value is not overridden with virtualenv then we get a no such file or directory error for /usr/bin/python3
| # This can happen when Jinja2 templates with folded scalars evaluate to strings | ||
| if isinstance(themes_by_host_value, str): | ||
| # If it's an empty/whitespace string or just braces, treat as empty dict | ||
| stripped = themes_by_host_value.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we are moving the burden too much on the Galaxy side to accept broken configurations here. I would just raise an exception if themes_by_host_value is not a dict, with a helpful message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for backwards compatibility and to maintain the existing behavior. Even if the Ansible playbook is not fixed this code will accept its (incorrect) output and allow the Docker build to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather pin the ansible version if you just want to fix the Docker build until the roles are fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll pin the ansible version and remove the defensive programming changes. I'm not keen on making code changes after the freeze in any case.
Co-authored-by: Nicola Soranzo <[email protected]>
|
|
||
| # The data in version.json will be displayed in Galaxy's /api/version endpoint | ||
| RUN printf "{\n \"git_commit\": \"$(cat GITREVISION)\",\n \"build_date\": \"$BUILD_DATE\",\n \"image_tag\": \"$IMAGE_TAG\"\n}\n" > version.json \ | ||
| RUN printf "{\n \"git_commit\": \"$GIT_COMMIT\",\n \"build_date\": \"$BUILD_DATE\",\n \"image_tag\": \"$IMAGE_TAG\"\n}\n" > version.json \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GIT_COMMIT is not guaranteed to be set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GIT_COMMIT is passed in as an argument and is also used on line 142. By relying on the GIT_COMIMIT arg we can avoid copying the sizable .git directory to the build context just to get this one value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason it was done this way originally is that when you run docker build . -t galaxy:latest locally, the GIT_COMMIT won't be passed in, and there's no way to determine the commit after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone builds locally then the BUILD_DATE and IMAGE_TAG should also be set, and if those are going to be set on the command line why not also set GIT_COMMIT?
docker build . -t galaxy:latest --build-arg GIT_COMMIT=${git rev-parse HEAD) ...
That saves copying a ~1.3GB directory to the build context to get the value of one variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BUILD_DATE and IMAGE_TAG are useful but optional. The former can be determined from docker hub while the latter is inferable from the GIT_COMMIT. Hence why the GIT_COMMIT is what's really important. If removing .git from the build context makes a substantial enough improvement (what kind of difference do you see?), I'm ok with it, but we should at least document setting the GIT_COMMIT. What do you think @afgane?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you calculate GITREVISION in stage 1, and pass it as ARG to this stage 3?
| ARG SERVER_DIR | ||
|
|
||
| RUN ansible-playbook -i localhost, playbook.yml -v --tags "galaxy_build_client" -e galaxy_virtualenv_command=virtualenv | ||
| RUN ansible-playbook -i localhost, playbook.yml -v --tags "galaxy_build_client" -e "{galaxy_additional_venv_packages: false, galaxy_virtualenv_command: virtualenv}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is galaxy_additional_venv_packages needed? Should it be in the playbook instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it we get the error:
239.3 Conditional result (False) was derived from value of type 'list' at '/tmp/ansible/galaxy-docker/roles/galaxy/defaults/main.yml:502:34'. Conditionals must have a boolean result.
239.3 Origin: /tmp/ansible/galaxy-docker/roles/galaxy/tasks/dependencies.yml:47:13
239.3
239.3 45 PYTHONPATH: null
239.3 46 VIRTUAL_ENV: "{{ galaxy_venv_dir }}"
239.3 47 when: galaxy_additional_venv_packages
239.3 ^ column 13
It should be fixed in the playbook, but I was originally hoping that we could fix this (for now) with only changes to the docker file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are at it, let's move both env vars (galaxy_virtualenv_command and galaxy_additional_venv_packages) into the playbook, so config doesn't get split.
| self.themes_by_host[host] = {} | ||
| file_path = self._in_dir(resolve_dir_path, file_name) | ||
| _load_theme(file_path, self.themes_by_host[host]) | ||
| themes_by_host_value = self.config_dict["themes_config_file_by_host"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a typing problem with config_dict I believe. config_dict is declared as a dict[str, str], but the code is expecting themes_config_file_by_host to also be a dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe themes_config_file_by_host is a dict no? Doesn't it map a host names to config files? Eg:
themes_config_file_by_host:
host1.example.com: /path/to/theme_1
host2.example.com: /path/to/theme_2
Otherwise, why was the code attempting to call .items()? My understanding of the problem is that the playbook is currently producing:
themes_config_file_by_host: "{ }" # A string. Incorrect
rather than:
themes_config_file_by_host: { } # An empty dict. Correct
If there is a typing problem with the config_dict I would say that is a separate problem and beyond the scope of this PR.
|
Closing in favor of #21131 I will revisit optimizing the Docker build after the 25.1 release. |
Improvements to the Docker build and some defensive programming to counter a problem with the Ansible Galaxy playbook
When themes are disabled, either by setting
galaxy_manage_themes: falseor if there are nogalaxy_themes_subdomains, the use of the YAML folded block scalar operator (>) causes the variablethemes_config_file_by_hostto evaluate to the string"{ }"rather than to an empty dictionary. This causes Galaxy's config parser to fail when it tries to call.items()on a string object. While the root cause of the problem is in the Ansible playbook, a misconfiguration should not cause Galaxy to crash which in turn breaks the Docker builds.Cannot be merged until galaxyproject/galaxy-docker-k8s#40 is merged.
Closes #21089
Friends with:
How to test the changes?
(Select all options that apply)
License