Update docs

michael-bouvy · michael-bouvy · commit 048cbe063cee · 2024-05-27T16:33:02.000+02:00
diff --git a/README.md b/README.md
@@ -1,3 +1,3 @@
 # Deploy Magento on Kubernetes
 
-See https://www.clickandmortar.fr/blog/deploy-magento-2-kubernetes-docker
+➡️ See our complete guide: https://clickandmortar.github.io/magento-kubernetes/
diff --git a/build.sh b/build.sh
diff --git a/docs/.vitepress/config.mts b/docs/.vitepress/config.mts
@@ -5,6 +5,7 @@ import {withMermaid} from "vitepress-plugin-mermaid";
 const config = defineConfig({
   title: "Magento Kubernetes",
   description: "The ultimate guide to deploy Magento on Kubernetes",
+  lastUpdated: true,
   themeConfig: {
     logo: '/images/logo-transp.png',
     nav: [
@@ -43,20 +44,20 @@ const config = defineConfig({
           { text: 'Architecture', link: '/guide/deployment/architecture' },
           { text: 'Resources and scaling', link: '/guide/deployment/resources-scaling' },
           { text: 'Helm chart', link: '/guide/deployment/helm-chart' },
-          { text: 'CI/CD', link: '/guide/deployment/ci-cd' },
+          { text: 'CI/CD 🚧', link: '/guide/deployment/ci-cd' },
         ]
       },
       {
         text: '💡 Advanced',
         collapsed: false,
         items: [
           { text: 'High availability', link: '/guide/advanced/high-availability' },
-          { text: 'Monitoring', link: '/guide/advanced/monitoring' },
-          { text: 'Log management', link: '/guide/advanced/log-management' },
           { text: 'ARM64 architecture', link: '/guide/advanced/arm64-architecture' },
-          { text: 'Read-only filesystem', link: '/guide/advanced/read-only-filesystem' },
-          { text: 'Amazon S3 Media storage', link: '/guide/advanced/amazon-s3-media-storage' },
           { text: 'Spot / preemptible instances', link: '/guide/advanced/spot-preemptible-instances' },
+          { text: 'Monitoring 🚧', link: '/guide/advanced/monitoring' },
+          { text: 'Log management 🚧', link: '/guide/advanced/log-management' },
+          { text: 'Read-only filesystem 🚧', link: '/guide/advanced/read-only-filesystem' },
+          { text: 'Amazon S3 Media storage 🚧', link: '/guide/advanced/amazon-s3-media-storage' },
         ]
       },
       {
diff --git a/docs/guide/advanced/amazon-s3-media-storage.md b/docs/guide/advanced/amazon-s3-media-storage.md
@@ -0,0 +1,8 @@
+---
+title: Amazon S3 media storage
+---
+
+# Amazon S3 media storage
+
+> [!NOTE]
+> This page is a work in progress.
diff --git a/docs/guide/advanced/arm64-architecture.md b/docs/guide/advanced/arm64-architecture.md
@@ -0,0 +1,15 @@
+---
+title: ARM64 architecture
+---
+
+# ARM64 architecture
+
+Over the last few years, notably thanks to Apple's chips, the ARM64 architecture has become increasingly popular.
+
+Amazon Web Services (AWS) has also released its own ARM64-based instances, which are cheaper and more efficient than their x86 (AMD64) counterparts: [the Graviton instances](https://aws.amazon.com/ec2/graviton/).
+
+**Magento / Adobe Commerce runs perfectly on ARM64 architecture**, and you can use the same deployment strategies as for x86 (AMD64) instances.
+
+At the time of writing, all the tools and services used in this guide are compatible with ARM64 architecture.
+
+However, you may need to check the compatibility of some third-party services or tools you use in your Magento / Adobe Commerce stack.
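+
+If your cluster mixes x86 and ARM64 nodes, a minimal way to pin Magento `Pods` to ARM64 nodes is a `nodeSelector` on the well-known `kubernetes.io/arch` label, as sketched below. This assumes your container images are published for `linux/arm64` (for instance as multi-architecture images); otherwise the containers will fail to start on these nodes.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: magento-deployment
+spec:
+  ...
+  template:
+    ...
+    spec:
+      # Only schedule these Pods on ARM64 (e.g. Graviton) nodes
+      nodeSelector:
+        kubernetes.io/arch: arm64
+      containers:
+        ...
+```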
diff --git a/docs/guide/advanced/high-availability.md b/docs/guide/advanced/high-availability.md
@@ -82,16 +82,159 @@ Using a Load Balancer both helps distribute the traffic across the nodes, and en
 
 As mentioned in the [resources and scaling page](/guide/deployment/resources-scaling#workload-placement), `Pods` should be distributed across the nodes and AZs of the Kubernetes cluster.
 
-Additionally, we should define:
+Additionally, we should:
 
-* A `PodDisruptionBudget` to ensure that at least a given number of `Pods` is available at all times
-* A readiness and liveness probe to ensure that the `Pod` is healthy and ready to receive traffic, and to restart it if needed (stale)
+* Define a `PodDisruptionBudget` to ensure that a minimum number of `Pods` remains available during voluntary disruptions (node drains, cluster upgrades, etc.)
+* Define readiness and liveness probes: the readiness probe ensures that a `Pod` only receives traffic once it is ready, and the liveness probe restarts a container that has become unresponsive (stale)
+* Make sure containers are terminated gracefully, to avoid data loss
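+
+For instance, a minimal `PodDisruptionBudget` for the web server `Pods` could look like the sketch below. The `magento-pdb` name, the `app: magento` label and the `minAvailable: 2` value are assumptions to adapt to your own `Deployment`.
+
+```yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: magento-pdb
+spec:
+  # Evictions are refused if they would leave fewer than 2 matching Pods
+  minAvailable: 2
+  selector:
+    matchLabels:
+      app: magento # assumed label, must match the Deployment's Pod template labels
+```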
+
+### Readiness and liveness probes
 
 > [!IMPORTANT]
-> When defining a readiness probe for your PHP FPM container, you shouldn't rely Magento's `health_check.php` endpoint, as it depends on external services to be available.
+> When defining a readiness probe for your PHP FPM container, you shouldn't rely on Magento's `health_check.php` endpoint, as it depends on the availability of external services (database, cache, etc.): if one of them goes down, all `Pods` would be marked as not ready at once.
+
+The following options can be used to define a readiness and liveness probe for the containers:
+
+* `nginx` container: use an `httpGet` probe on the `/nginx-health` endpoint we defined in the vhost
+* `php-fpm` container:
+  * Use a `tcpSocket` probe on port 9000 (the port PHP-FPM listens on)
+  * Use an `exec` probe to run a PHP script (similar to `health_check.php`), without relying on external services
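+
+As a minimal sketch, the `httpGet` and `tcpSocket` options above could be declared as follows. The nginx port (`8080`), periods and thresholds are assumptions to adapt to your own vhost and images; only the `/nginx-health` path and the PHP-FPM port 9000 come from the list above.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: magento-deployment
+spec:
+  ...
+  template:
+    ...
+    spec:
+      containers:
+        - name: nginx
+          ...
+          # Stop routing traffic to the Pod while nginx does not answer
+          readinessProbe:
+            httpGet:
+              path: /nginx-health
+              port: 8080 # assumed nginx listen port
+            periodSeconds: 5
+          # Restart the container after 3 consecutive failures (~30 seconds)
+          livenessProbe:
+            httpGet:
+              path: /nginx-health
+              port: 8080
+            periodSeconds: 10
+            failureThreshold: 3
+        - name: php-fpm
+          ...
+          readinessProbe:
+            tcpSocket:
+              port: 9000
+            periodSeconds: 5
+          livenessProbe:
+            tcpSocket:
+              port: 9000
+            periodSeconds: 10
+            failureThreshold: 3
+```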
+
+### Graceful termination
+
+When a `Pod` is terminated, Kubernetes sends a `SIGTERM` signal to the main process of the container, and waits for a grace period (30 seconds by default) before sending a `SIGKILL` signal to force the container to stop.
+
+To make sure that our `nginx` container finishes serving all the requests before being terminated, we can define a `preStop` command in the `Deployment` manifest:
+
+```yaml{13-16}
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: magento-deployment
+spec:
+  ...
+  template:
+    ...
+    spec:
+      containers:
+        - name: nginx
+          ...
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh", "-c", "sleep 3; nginx -s quit; while killall -0 nginx; do sleep 1; done"]
+          ...
+```
+
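+Kubernetes runs the `preStop` hook before sending the `SIGTERM` signal: the short `sleep` gives time for the `Pod` to be removed from the `Service` endpoints, then `nginx -s quit` gracefully shuts nginx down, and the loop waits until all nginx processes have exited, i.e. until in-flight requests are served.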
+
+> [!TIP]
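+> If, for some (good) reason, some requests take longer than 30 seconds to complete, you can increase the grace period by setting the `terminationGracePeriodSeconds` property in the `Pod` template (`spec.template.spec`) of the `Deployment` manifest. Note that the time spent in the `preStop` hook counts towards this grace period.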
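+
+For example, raising the grace period to 60 seconds (an illustrative value) on the `Deployment` shown above would look like this:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: magento-deployment
+spec:
+  ...
+  template:
+    ...
+    spec:
+      # Set at the Pod level, not per container (default: 30)
+      terminationGracePeriodSeconds: 60
+      containers:
+        - name: nginx
+          ...
+```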
 
 ## Database (MySQL)
 
+High availability for MySQL can _almost_ be achieved by using primary-replica replication with automatic failover:
+
+```mermaid
+%%{init: {"flowchart": {"htmlLabels": false}} }%%
+graph LR
+    P["`**MySQL**
+    Primary`"]
+    R["`**MySQL**
+    Replica`"]
+    
+    P -->|Replication| R
+```
+
+In practice, reliable automatic failover is rarely achieved, as it must be tested regularly and requires a well-defined failover strategy: how is failover triggered? How is the new primary elected? How is traffic redirected to the new primary?
+
+Using a cloud provider's managed database service (RDS, Cloud SQL, etc.) reduces the risk of unavailability, compared to a self-managed database.
+
+In any case, it is a best practice to have a read replica of your database:
+
+* To have a replica in another Availability Zone
+* To have a standby that can be promoted if the primary goes down (it does not replace regular backups)
+* To offload read queries from the primary (although Magento / Adobe Commerce is mostly write-heavy), for instance for reporting purposes
+
+To avoid changing your application configuration when the primary goes down, you can rely on a DNS-based failover solution: Magento always connects to the same hostname, which is updated to point to the promoted replica:
+
+```mermaid
+%%{init: {"flowchart": {"htmlLabels": false}} }%%
+graph TB
+    subgraph B["Failover"]
+        direction TB
+        M2["`**Magento**`"]
+
+        D2["`**DNS**
+            mysql.my-hostname.com`"]
+
+
+        subgraph AZ2
+            R2["`**MySQL**
+            Replica
+            (Promoted as primary)`"]
+        end
+
+        subgraph AZ1
+            P2["`**MySQL**
+            Primary`"]
+        end
+
+        M2 -->|Resolves| D2
+
+        D2 --> AZ2
+
+        style P2 fill:#E7999E,stroke:#ff0000,stroke-dasharray: 5 5
+    end
+    
+    subgraph A["Normal operation"]
+        direction TB
+        M["`**Magento**`"]
+
+        D["`**DNS**
+            mysql.my-hostname.com`"]
+        
+        subgraph AZ21["AZ1"]
+            P["`**MySQL**
+            Primary`"]
+        end
+        
+        subgraph AZ22["AZ2"]
+            R["`**MySQL**
+            Replica`"]
+        end
+        
+        M -->|Resolves| D
+        
+        D --> AZ21
+        P -->|Replication| R
+    end
+```
+
+> [!TIP]
+> [Amazon Aurora DB Clusters](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html) offer a managed database service with automatic failover (using a single DNS record), read-replicas, and more.
+
 ## Cache (Redis)
 
+Achieving high availability for Redis is relatively simple, using either:
+
+* Redis Cluster, with instances distributed across multiple AZs
+* Standalone Redis nodes in multiple AZs, with a DNS-based failover solution
+
+As long as Redis only holds cache (no persistent data), it is safe to use a DNS-based failover solution without any replication: the new node simply starts with an empty cache. If you also store sessions in Redis, failing over to an empty node will log customers out, so consider replication for the sessions instance.
+
 ## Search (Elasticsearch / OpenSearch)
+
+High availability for Elasticsearch / OpenSearch can be achieved by using a cluster with multiple nodes, distributed across multiple AZs.
+
+> [!NOTE]
+> Magento / Adobe Commerce only allows specifying one Elasticsearch / OpenSearch endpoint in the configuration, so you should have some sort of load balancer in front of your Elasticsearch / OpenSearch cluster, or use a DNS-based failover solution.
+
+## Wrapping up
+
+High availability is a complex topic that must be taken into account from the early stages of the project, including when developing modules:
+
+* How to handle and recover from failures?
+* How to ensure that the system remains operational?
+* How to test the high availability of the system?
+* How to monitor the system?
diff --git a/docs/guide/advanced/log-management.md b/docs/guide/advanced/log-management.md
@@ -0,0 +1,8 @@
+---
+title: Log management
+---
+
+# Log management
+
+> [!NOTE]
+> This page is a work in progress.
diff --git a/docs/guide/advanced/monitoring.md b/docs/guide/advanced/monitoring.md
@@ -0,0 +1,8 @@
+---
+title: Monitoring
+---
+
+# Monitoring
+
+> [!NOTE]
+> This page is a work in progress.
diff --git a/docs/guide/advanced/read-only-filesystem.md b/docs/guide/advanced/read-only-filesystem.md
@@ -0,0 +1,8 @@
+---
+title: Read-only filesystem
+---
+
+# Read-only filesystem
+
+> [!NOTE]
+> This page is a work in progress.
diff --git a/docs/guide/advanced/spot-preemptible-instances.md b/docs/guide/advanced/spot-preemptible-instances.md
@@ -0,0 +1,71 @@
+---
+title: Spot and preemptible instances
+---
+
+# Spot and preemptible instances
+
+Running Magento / Adobe Commerce on spot or preemptible instances can save you a lot of money, especially if you have a large number of instances.
+
+Spot (AWS) and preemptible (GCP) instances are spare compute instances that are available at a lower price than on-demand instances. The price of spot instances fluctuates based on supply and demand, usually between 50% and 90% off the on-demand price.
+
+There are however some caveats to using spot and preemptible instances:
+
+* Spot and preemptible instances can be terminated at any time by the cloud provider, with a short notice (2 minutes on AWS, 30 seconds on GCP)
+* Spot and preemptible instances may not be available when you need them, especially during peak times
+
+## Termination handling
+
+When using spot or preemptible instances, you should have a strategy to handle the termination of instances.
+
+Our recommendations are the following:
+
+* Only run web server `Pods` and non-critical workloads (such as consumers or batch jobs that tolerate restarts) on spot or preemptible instances
+* Make sure that your `Pods` are able to handle the termination of instances gracefully (see [graceful termination](/guide/advanced/high-availability.html#graceful-termination))
+* Keep a buffer of on-demand instances (at least 2) to handle the termination of spot or preemptible instances
+
+> [!NOTE]
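+> On Amazon EKS, self-managed node groups require an additional component to handle the termination of spot instances: the [AWS Node Termination Handler](https://github.com/aws/aws-node-termination-handler). EKS managed node groups and Karpenter provide their own interruption handling.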
+
+```mermaid
+%%{init: {"flowchart": {"htmlLabels": false}} }%%
+graph TB
+    subgraph Spot["Spot / Preemptible Pool"]
+        direction TB
+        subgraph N3["Node 3"]
+            P5[Web Server Pod]
+            P6[Web Server Pod]
+            P7[Web Server Pod]
+        end
+        subgraph N4["Node 4"]
+            P1[Web Server Pod]
+            P2[Web Server Pod]
+            P3[Web Server Pod]
+        end
+    end
+    subgraph OnDemand["On-Demand Pool"]
+        direction TB
+        subgraph N1["Node 1"]
+            PO1[Web Server Pod]
+            PO2[Cron Pod]
+            PO3[Consumer Pod]
+        end
+        subgraph N2["Node 2"]
+            PO4[Web Server Pod]
+            PO6[Consumer Pod]
+        end
+    end
+```
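+
+The placement shown in the diagram above can be expressed with node affinity. The sketch below assumes EKS managed node groups, which label nodes with `eks.amazonaws.com/capacityType` (`SPOT` or `ON_DEMAND`); other setups use different labels (e.g. `cloud.google.com/gke-spot` on GKE, `karpenter.sh/capacity-type` with Karpenter). Both documents are excerpts of `Pod` templates (`spec.template.spec`), not complete manifests.
+
+```yaml
+# Web server Pods: prefer spot nodes, but still allow the on-demand buffer
+affinity:
+  nodeAffinity:
+    preferredDuringSchedulingIgnoredDuringExecution:
+      - weight: 100
+        preference:
+          matchExpressions:
+            - key: eks.amazonaws.com/capacityType
+              operator: In
+              values: ["SPOT"]
+---
+# Cron and consumer Pods: on-demand nodes only
+affinity:
+  nodeAffinity:
+    requiredDuringSchedulingIgnoredDuringExecution:
+      nodeSelectorTerms:
+        - matchExpressions:
+            - key: eks.amazonaws.com/capacityType
+              operator: In
+              values: ["ON_DEMAND"]
+```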
+
+> [!TIP]
+> You may also run your **entire non-production cluster on spot or preemptible instances**, and only use on-demand instances for production.<br/>
+> Additionally, consider using [kube-downscaler](https://codeberg.org/hjacobs/kube-downscaler) to automatically scale down / shut down unused non-production workloads, e.g. during nights and weekends.
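+
+With kube-downscaler, schedules are typically declared through annotations. As an illustrative sketch, the annotation below would keep a non-production `Deployment` running on weekdays during office hours only; the hours and the `Europe/Paris` time zone are assumptions, so check the kube-downscaler documentation for the syntax supported by your version.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: magento-deployment
+  annotations:
+    # Outside this window, kube-downscaler scales the Deployment down (to 0 replicas by default)
+    downscaler/uptime: "Mon-Fri 08:00-20:00 Europe/Paris"
+spec:
+  ...
+```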
+
+## Instance types
+
+When using spot or preemptible instances, you should use instance types that are less likely to be terminated.
+
+You should also allow multiple instance types in your node group / node pool configuration, to ensure that the cluster can still operate if one instance type is not available.
+
+For example, you could use a mix of `m6i.large`, `m6i.xlarge`, and `m6i.2xlarge` instances in your node group configuration.
+
+To optimize the instance type selection, you can use [Karpenter](https://karpenter.sh/), with which you can define advanced instance type selection strategies based on the instance type availability and pricing. Karpenter even allows falling back to on-demand instances if no spot instances are available.
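+
+As a sketch, assuming Karpenter's `v1` API on AWS (older releases use `karpenter.sh/v1beta1` with slightly different fields) and an existing `EC2NodeClass` named `default` (hypothetical), a `NodePool` allowing both capacity types and several instance types could look like this. When both `spot` and `on-demand` are allowed, Karpenter favors spot capacity and falls back to on-demand when no spot capacity is available.
+
+```yaml
+apiVersion: karpenter.sh/v1
+kind: NodePool
+metadata:
+  name: web # hypothetical name
+spec:
+  template:
+    spec:
+      requirements:
+        # Allow spot capacity, with on-demand as fallback
+        - key: karpenter.sh/capacity-type
+          operator: In
+          values: ["spot", "on-demand"]
+        # Several instance types increase the chance of finding spot capacity
+        - key: node.kubernetes.io/instance-type
+          operator: In
+          values: ["m6i.large", "m6i.xlarge", "m6i.2xlarge"]
+      nodeClassRef:
+        group: karpenter.k8s.aws
+        kind: EC2NodeClass
+        name: default
+```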
diff --git a/docs/guide/deployment/architecture.md b/docs/guide/deployment/architecture.md
@@ -307,6 +307,15 @@ The following directories might need to be shared between Pods:
 > 
 > <sup>2</sup> : Cache and sessions should be stored in Redis. Logs can be shared to facilitate cross-pod debugging, but may have simultaneous write issues. We'll see further how to get logs printed to standard output of Pods.
 
+If you decide to share the directories between `Pods`, you will need to use `PersistentVolumes` and `PersistentVolumeClaims` with a `ReadWriteMany` compatible `StorageClass`.
+
+> [!INFO]
+> Only a few `ReadWriteMany`-capable storage options are available for Kubernetes: you should rely on the one provided by your cloud provider (e.g. AWS EFS, GCP Filestore, Azure Files), which is usually NFS-based.
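+
+For example, a `PersistentVolumeClaim` for shared media could look like the sketch below. The `magento-media` claim name and the `efs-sc` storage class name are hypothetical and depend on how your cluster's storage driver (e.g. the AWS EFS CSI driver) is configured.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: magento-media # hypothetical name
+spec:
+  # ReadWriteMany lets Pods on different nodes mount the volume read-write
+  accessModes:
+    - ReadWriteMany
+  storageClassName: efs-sc # hypothetical, provided by your cloud provider's CSI driver
+  resources:
+    requests:
+      storage: 10Gi # required by the API; some NFS-based backends such as EFS do not enforce it
+```
+
+Each `Pod` then mounts the claim through a `persistentVolumeClaim` volume, for instance on `pub/media`.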
+
+> [!TIP]
+> You should avoid sharing directories between `Pods` as much as possible (and, more generally, persisting data to disk), as it might lead to performance issues and data corruption.<br/>
+> Whenever possible, prefer external object storage solutions (e.g. S3, GCS) to store persistent data.
+
 ## Configuration and secrets
 
 Configuration and secrets should be stored in `ConfigMaps` and `Secrets` respectively.
diff --git a/docs/guide/deployment/ci-cd.md b/docs/guide/deployment/ci-cd.md
@@ -0,0 +1,8 @@
+---
+title: CI/CD
+---
+
+# CI/CD
+
+> [!NOTE]
+> This page is a work in progress.
diff --git a/docs/guide/deployment/external-services.md b/docs/guide/deployment/external-services.md
@@ -39,7 +39,3 @@ An up-to-date list of supported versions can be found in the [official documenta
 > [!INFO]
 > Whenever possible, we recommend using managed services for databases, caches, and message queues. Managed services are easier to maintain and scale, and they often come with built-in monitoring and backup solutions.
 > Although Adobe only officially supports AWS managed services, **you can use other cloud providers as well**, as long as the service versions are compatible with Magento / Adobe Commerce.
-
-## Database
-
-TODO
diff --git a/docs/guide/deployment/helm-chart.md b/docs/guide/deployment/helm-chart.md
@@ -18,11 +18,12 @@ When deploying our Magento / Adobe Commerce, the process is as follows:
    * `ConfigMap`
    * `Secret`
    * `CronJob`
+   * Etc.
 3. Wait for the new `Pods` to be ready
 4. Flush the cache
 
 > [!NOTE]
-> When using per release Redis ID prefixes, there is no need to flush the cache after each deployment.
+> When [using per release Redis ID prefixes](/guide/preparation/configuration#redis-id-prefix), there is no need to flush the cache after each deployment.
 
 > [!IMPORTANT]
 > We will rely on [Helm hooks](https://helm.sh/docs/topics/charts_hooks/) to run the `bin/magento setup:upgrade` in a Kubernetes `Job` (during the `pre-install` and `pre-upgrade` hooks).<br/>
@@ -48,3 +49,12 @@ chart/
 ├── secrets.yaml
 └── values.yaml
 ```
+
+> [!NOTE]
+> Our sample Magento / Adobe Commerce Helm Chart will soon be available in our [GitHub repository](https://github.com/ClickAndMortar/magento-kubernetes).
+
+To follow Helm's best practices, you should initialize your Helm chart with the following command:
+
+```shell
+helm create mychart
+```
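+
+Once the chart is scaffolded, the `setup:upgrade` hook mentioned above could be sketched as a template such as `templates/setup-upgrade-job.yaml` (hypothetical file name). It assumes the `image.repository` / `image.tag` values and the `mychart.fullname` helper generated by `helm create mychart`, and an image whose working directory is the Magento root.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ include "mychart.fullname" . }}-setup-upgrade
+  annotations:
+    # Run before the chart's resources are installed / upgraded
+    "helm.sh/hook": pre-install,pre-upgrade
+    # Delete the previous Job before creating a new one, and once it succeeds
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+spec:
+  # Fail fast: do not retry setup:upgrade automatically
+  backoffLimit: 0
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: setup-upgrade
+          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
+          # Keep the generated code and static content already built into the image
+          command: ["bin/magento", "setup:upgrade", "--keep-generated"]
+```
+
+Keep in mind that, during `pre-install`, none of the chart's resources exist yet, so on first install the `Job` cannot rely on `ConfigMaps` or `Secrets` defined in the same chart.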
diff --git a/docs/index.md b/docs/index.md
diff --git a/install.sh b/install.sh
diff --git a/vhost.nginx b/vhost.nginx
