Replies: 3 comments
-
The machines are also running the CrowdStrike agent and Datadog.
-
I think you likely hit a side effect of bad settings while "productionising" your docker-compose. The https://github.com/apache/airflow/blob/main/airflow-core/docs/howto/docker-compose/docker-compose.yaml is clearly described, both in the docs and in the compose file itself, as not suitable for a production environment.
The docs also explain the responsibilities of the "Deployment Manager" (i.e. you) when installing with just docker containers - and that includes adjusting your docker-compose for any production settings. Basically, the docker-compose you get from us is for learning and exploration, and it is entirely up to you to make it work the way your deployment and production requirements expect. This also means you are responsible for appropriate configuration and automation of the whole environment your docker-compose runs in. So diagnosing and solving the problem is on you - you are solely responsible for making the docker-compose configuration suitable for your production environment. However, I can provide some hints.

There are a few steps in the learning docker-compose that are run as root, because the Airflow user (which is used by most processes) cannot physically do them - unless you give it sudo capabilities. This is basic UNIX and container knowledge, nothing specific to Airflow. Those "root" containers only run when the compose is restarted, and they change the owner of the created folders to AIRFLOW_UID - which (as explained in the warning printed if you don't do it) should be set so that the ownership can be changed to the right user. This is also explained in the docs. Possibly your setup caused execution of that init container without AIRFLOW_UID set. But this is (again) part of the deployment and docker-compose expertise you need to have when deciding to create your production setup - to investigate and diagnose when and how it happened (if my hypothesis is right). There might also be other reasons, but only you can tell what happened, looking at your logs and knowing how you run things.

Also - if you find out, it would be great if you reported it back here; that might help other people who run into similar issues with their setup.

Here is the part you should start looking at:

airflow-init:
  <<: *airflow-common
  entrypoint: /bin/bash
  # yamllint disable rule:line-length
  command:
    - -c
    - |
      if [[ -z "${AIRFLOW_UID}" ]]; then
        echo
        echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
        echo "If you are on Linux, you SHOULD follow the instructions below to set "
        echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
        echo "For other operating systems you can get rid of the warning with manually created .env file:"
        echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
        echo
        export AIRFLOW_UID=$$(id -u)
      fi
      one_meg=1048576
      mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
      cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
      disk_available=$$(df / | tail -1 | awk '{print $$4}')
      warning_resources="false"
      if (( mem_available < 4000 )) ; then
        echo
        echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
        echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
        echo
        warning_resources="true"
      fi
      if (( cpus_available < 2 )); then
        echo
        echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
        echo "At least 2 CPUs recommended. You have $${cpus_available}"
        echo
        warning_resources="true"
      fi
      if (( disk_available < one_meg * 10 )); then
        echo
        echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
        echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
        echo
        warning_resources="true"
      fi
      if [[ $${warning_resources} == "true" ]]; then
        echo
        echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
        echo "Please follow the instructions to increase amount of resources available:"
        echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
        echo
      fi
      echo
      echo "Creating missing opt dirs if missing:"
      echo
      mkdir -v -p /opt/airflow/{logs,dags,plugins,config}
      echo
      echo "Airflow version:"
      /entrypoint airflow version
      echo
      echo "Files in shared volumes:"
      echo
      ls -la /opt/airflow/{logs,dags,plugins,config}
      echo
      echo "Running airflow config list to create default config file if missing."
      echo
      /entrypoint airflow config list >/dev/null
      echo
      echo "Files in shared volumes:"
      echo
      ls -la /opt/airflow/{logs,dags,plugins,config}
      echo
      echo "Change ownership of files in /opt/airflow to ${AIRFLOW_UID}:0"
      echo
      chown -R "${AIRFLOW_UID}:0" /opt/airflow/
      echo
      echo "Change ownership of files in shared volumes to ${AIRFLOW_UID}:0"
      echo
      chown -v -R "${AIRFLOW_UID}:0" /opt/airflow/{logs,dags,plugins,config}
      echo
      echo "Files in shared volumes:"
      echo
      ls -la /opt/airflow/{logs,dags,plugins,config}
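For context, the documented way to avoid the fallback to root-owned files is to set AIRFLOW_UID before bringing the stack up. A minimal sketch based on the "setting the right Airflow user" docs linked above, run from the compose project directory (adjust the paths to your layout):

# create the bind-mounted folders and an .env file with the host user's UID,
# so the airflow-init container chowns everything to that UID instead of leaving it root-owned
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env

If AIRFLOW_UID is not set when airflow-init runs, the chown at the end of the script above can leave the shared folders with unexpected ownership - which matches the symptom reported here.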
-
Converted it into a discussion.
-
Apache Airflow version
Airflow 3 version (please specify below)
3.0.4
What happened?
I am running Airflow from the compose file found in the Airflow samples.
I have ~150 virtual machines across 4 AWS accounts running only the airflow-airflow-worker-1 process. The workers are scaled from 1 to N.
In each AWS account I have one main node with the api-server, scheduler, and triggerer.
Today all 3 production and 2 dev environments were not working, because the logs/ folder had root:root ownership instead of airflow:root ownership as before.
Only the QA setup was working, because it was on Airflow version 3.0.6.
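For reference, a quick way to confirm the ownership state described above from the compose project directory - a sketch assuming the reference compose layout where dags/, logs/, config/ and plugins/ are bind-mounted from the project folder:

# numeric owner/group of the bind-mounted folders (expected AIRFLOW_UID:0; the broken state shows 0:0)
ls -ldn ./logs ./dags ./config ./plugins
# check what AIRFLOW_UID docker compose will interpolate (from the shell or from .env)
echo "AIRFLOW_UID=${AIRFLOW_UID:-<unset>}"
grep AIRFLOW_UID .env 2>/dev/null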
What you think should happen instead?
The ownership of the logs/ folder should not be changed.
How to reproduce
No idea.
Operating System
Amazon Linux 2023
Versions of Apache Airflow Providers
The ones from the compose image - apache/airflow:3.0.4
Deployment
Docker-Compose
Deployment details
Using Celery.
On each node, the logs/ folder is mounted in each container.
Anything else?
This setup had been working for 31 days without any changes.
Today, at the same time and with the same error, 5 Airflow installs in 4 different AWS accounts failed with the same problem.
I assume Airflow changed some script that is dynamically pulled on boot of the workers/Airflow.
This started happening at 2025-09-26 00:00:00 EET.
After manually restoring the permissions, everything works fine.
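For anyone hitting the same symptom, "restoring the permissions" is roughly the following on the host - a sketch assuming the reference compose layout and the default airflow UID of the official image (50000); substitute your own AIRFLOW_UID and paths:

# re-apply the ownership the airflow-init container normally sets (AIRFLOW_UID:0)
sudo chown -R "${AIRFLOW_UID:-50000}:0" ./logs ./dags ./config ./plugins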
Are you willing to submit PR?
Code of Conduct