
Commit 9aa392b

wip: complete Part 2
1 parent 364c40b commit 9aa392b

10 files changed: +227 -25 lines changed

projects/eks-tf-tutorial/README.md

Lines changed: 81 additions & 7 deletions

@@ -14,6 +14,8 @@ Goals:

## Development

### Create the EKS cluster

To get started:

@@ -24,19 +26,20 @@ terraform init

Plan:

```shell
-AWS_PROFILE=personal terraform plan
+export AWS_PROFILE=personal
+terraform plan
```

Apply:

```shell
-AWS_PROFILE=personal terraform apply
+terraform apply
```

Destroy:

```shell
-AWS_PROFILE=personal terraform destroy
+terraform destroy
```

## Notes

@@ -89,8 +92,79 @@ Part 2: https://www.youtube.com/watch?v=uiuoNToeMFE&list=PLiMWaCMwGJXnKY6XmeifEp

- In this tutorial, we will use the policy managed by Amazon, `AmazonEKSClusterPolicy`, which contains all the permissions the cluster needs. However, this policy assumes you will use a legacy provider that additionally requires load-balancing permissions. These permissions constitute over half of the policy (i.e. the `elasticloadbalancing` permissions). In the following sections, we will install the AWS Load Balancer Controller, which will take on that responsibility. At that point, we will create our own separate role to remove these redundant permissions.
- K8s workers require several permissions of their own:
  - The `AmazonEKSWorkerNodePolicy` policy provides core EC2 functionality permissions, e.g. `ec2:DescribeInstances`.
  - The latest v3 of `AmazonEKSWorkerNodePolicy` also allows "EKS Pod Identity" for fine-grained control of pod permissions, e.g. giving a pod read/write access to a specific S3 bucket (see the sketch after this list).
  - Before we had Pod Identities (`AmazonEKSWorkerNodePolicy` v2), we had to use an OpenID Connect provider and IAM roles for service accounts to achieve the same thing.
  - We will look at the three separate methods for authentication later in the tutorial.
  - We have to grant EKS access to modify the IP address configuration on the worker nodes using the `AmazonEKS_CNI_Policy`.
    - When we create a pod on K8s, it is assigned an IP from the secondary IP address range assigned to the worker node. We're not using overlay networks like Flannel or Calico; we get native AWS IP addresses for each pod. Later in the course, we will create a K8s Service of type `LoadBalancer`, which supports `Instance mode` and `IP mode`: the former routes traffic to a `NodePort` opened on each worker node, while the latter routes traffic directly to pod IPs.
    - Using `NodePort` isn't recommended in production due to security, scalability, and latency issues:
      - `NodePort` opens a port on the underlying worker node, making the node vulnerable to attacks.
      - `NodePort` Services are limited to the 30000-32767 port range (2768 ports).
      - `NodePort` increases latency, as there are more network hops involved.
    - However, `NodePort` is a lot more cost efficient, as `LoadBalancer` Services create a Network Load Balancer per Service. For this reason, it is common to use Istio to handle cluster networking for both ingress (proxying) and load balancing, as well as for the other benefits Istio brings (TLS, centralised network monitoring/tooling).
  - Lastly, the `AmazonEC2ContainerRegistryReadOnly` policy grants the worker nodes permission to pull Docker images from ECR.

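As a rough sketch of the Pod Identity approach described above (not part of this commit): the role name, namespace, service account, and the attached S3 policy below are hypothetical, and the cluster additionally needs the `eks-pod-identity-agent` add-on for the association to take effect.

```hcl
# Hypothetical sketch of EKS Pod Identity: an IAM role that pods can assume,
# associated with a single Kubernetes service account. All names are illustrative.
resource "aws_iam_role" "s3_reader" {
  name = "${local.name_prefix}-${local.eks_name}-s3-reader"

  # Pod Identity uses the pods.eks.amazonaws.com service principal.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

# For brevity, attach a managed read-only S3 policy; a real setup would scope
# this down to a specific bucket.
resource "aws_iam_role_policy_attachment" "s3_reader" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  role       = aws_iam_role.s3_reader.name
}

# Pods running as the "demo-app" service account in the "demo" namespace
# receive the role's permissions.
resource "aws_eks_pod_identity_association" "s3_reader" {
  cluster_name    = aws_eks_cluster.eks.name
  namespace       = "demo"
  service_account = "demo-app"
  role_arn        = aws_iam_role.s3_reader.arn
}
```
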
In the code:

- _7-eks.tf_:
  - We create an IAM role per cluster, e.g. `"${local.name_prefix}-${local.eks_name}-eks-cluster"`, in case we have multiple clusters per environment.
  - We set up the network settings with `endpoint_public_access = true` to make the cluster publicly accessible.
    - Later in the tutorial, we will see how to set up a private EKS cluster using AWS Client VPN and a private DNS that we can use for ingress and for private services that we don't want to expose to the internet, e.g. Grafana dashboards, Temporal UI, etc.
    - Even with a public cluster, the worker nodes are still deployed in the private subnets without public IP addresses.
  - We specified the two private subnets in which to place the worker nodes; at least two AZs are required. EKS will create cross-account elastic network interfaces in these subnets to allow communication between the worker nodes and the K8s control plane.
  - We configured authentication to use `API`.
    - We used to have to manage authentication via the `aws-auth` ConfigMap in the `kube-system` namespace, but this is now deprecated. It was never convenient to manage an existing ConfigMap in K8s using Terraform resources.
    - AWS developed an access-entries API that we can use to add users to the cluster. You can still use the ConfigMap, or even both the ConfigMap and the API, but the API is highly recommended for user management. In the next section, we cover how to add IAM roles and IAM users to access the cluster (a sketch follows this list).
  - We set `bootstrap_cluster_creator_admin_permissions = true` to grant the Terraform user admin privileges. This option defaults to `true` anyway, but we want to be explicit, because we will use Terraform to deploy Helm charts and plain YAML resources.
- _8-nodes.tf_:
  - We create an IAM role for the worker nodes. We use `ec2.amazonaws.com` as the trusted service instead of `eks.amazonaws.com` like we did for the EKS cluster role. In the next section, we'll create an IAM role with a trust policy that allows specific users to assume it.
  - We attached the three IAM policies discussed earlier to the nodes.
  - Finally, we add the node group itself.
    - Behind the scenes it is managed as an EC2 Auto Scaling group.
    - EKS supports three ways to run worker nodes:
      - Self-managed node groups:
        - We create the nodes ourselves, using Terraform and Packer templates to customise the EC2 VM and install any packages we need.
        - Limitation: EKS will not drain the nodes during upgrades; this has to be done semi-manually.
      - Managed node groups:
        - We'll use these in this tutorial. They are a lot easier to manage and upgrade, as this is handled by the EKS control plane.
      - Fargate:
        - Fully managed, "serverless" K8s: you only deploy your containers, and EKS automatically provisions and scales the worker capacity for you.
        - A lot easier to manage but a lot more expensive, with limitations such as no support for EBS volumes.
    - We set the `node_group_name` to `general`.
      - In production, it is common to have different node groups for different workloads, e.g. CPU-optimised, memory-optimised, or even GPU node groups for machine learning.
      - Another common node group is for spot instances. It can be used for fault-tolerant workloads that can handle interruptions, such as batch/streaming jobs and async workflows. Spot instances can save up to 90% compared to on-demand pricing.
    - We add the two private subnets in which to place the worker nodes.
      - In production, if we have data-intensive workloads, such as Kafka with hundreds of services reading from and writing to it across different AZs, cross-AZ data transfer costs can be very expensive, sometimes more expensive than the compute!
      - We could place all the worker nodes in a single AZ; you don't always need a highly available cluster by spreading nodes across zones.
    - For `capacity_type` we chose `ON_DEMAND`, but `SPOT` is also available. We also set the `instance_types`.
      - In a future tutorial, the instructor will show a strategy for choosing the right node size.
    - The `scaling_config` by itself will not autoscale, but it sets the minimum and maximum number of nodes as well as the `desired_size`. We will need to deploy an additional component, the Cluster Autoscaler, to adjust `desired_size` based on load, i.e. how many pending pods we have and their resource requirements.
    - The `update_config` is used for cluster upgrades.
    - The `labels` are used for pod affinity and node selectors. There are built-in labels derived from the node group that serve the same purpose, but in practice, when you migrate applications from one node group to another, custom labels that stay the same are much easier to work with.

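For reference, adding a user through the access-entries API mentioned above might look roughly like the following. This is a sketch and not part of this commit: the principal ARN is a placeholder, and `AmazonEKSClusterAdminPolicy` is just one of the managed access policies available.

```hcl
# Hypothetical sketch: grant an existing IAM user cluster-admin access via EKS
# access entries. The principal ARN below is a placeholder.
resource "aws_eks_access_entry" "developer" {
  cluster_name  = aws_eks_cluster.eks.name
  principal_arn = "arn:aws:iam::123456789012:user/developer"
}

resource "aws_eks_access_policy_association" "developer_admin" {
  cluster_name  = aws_eks_cluster.eks.name
  principal_arn = aws_eks_access_entry.developer.principal_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  # Grant access to the whole cluster rather than specific namespaces.
  access_scope {
    type = "cluster"
  }
}
```
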
At the end of this part:

Check you're connected with the correct user:

```shell
export AWS_PROFILE=personal
aws sts get-caller-identity
```

Next, update the local kubeconfig with the following command:

```shell
aws eks update-kubeconfig --region eu-west-2 --name eks-tutorial-dev-demo
```

Verify you can access the nodes (shows we have admin privileges):

```shell
kubectl get nodes
```

Double-check we have admin privileges in the EKS cluster (should output "yes"):

```shell
kubectl auth can-i "*" "*"
```

projects/eks-tf-tutorial/terraform/0-locals.tf

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@ locals {

```hcl
  project     = "eks-tutorial"
  env         = "dev"
  name_prefix = "${local.project}-${local.env}"

  region = "eu-west-2"
  zone1  = "eu-west-2a"
  zone2  = "eu-west-2b"
```

projects/eks-tf-tutorial/terraform/1-providers.tf

Lines changed: 8 additions & 0 deletions

@@ -1,5 +1,13 @@

```hcl
provider "aws" {
  region = local.region

  default_tags {
    tags = {
      Terraform   = true
      Project     = local.project
      Environment = local.env
    }
  }
}

terraform {
```

projects/eks-tf-tutorial/terraform/2-vpc.tf

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
```

Lines changed: 14 additions & 14 deletions

@@ -1,23 +1,23 @@

```hcl
resource "aws_subnet" "private_zone1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/19"
  availability_zone = local.zone1

  tags = {
    "Name"                            = "${local.name_prefix}-private-${local.zone1}"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/${local.name_prefix}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "private_zone2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = local.zone2

  tags = {
    "Name"                            = "${local.name_prefix}-private-${local.zone2}"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/${local.name_prefix}-${local.eks_name}" = "owned"
  }
}
```

@@ -29,8 +29,8 @@ resource "aws_subnet" "public_zone1" {

```hcl
  map_public_ip_on_launch = true

  tags = {
    "Name"                   = "${local.name_prefix}-public-${local.zone1}"
    "kubernetes.io/role/elb" = "1"
    "kubernetes.io/cluster/${local.name_prefix}-${local.eks_name}" = "owned"
  }
}
```

@@ -42,8 +42,8 @@ resource "aws_subnet" "public_zone2" {

```hcl
  map_public_ip_on_launch = true

  tags = {
    "Name"                   = "${local.name_prefix}-public-${local.zone2}"
    "kubernetes.io/role/elb" = "1"
    "kubernetes.io/cluster/${local.name_prefix}-${local.eks_name}" = "owned"
  }
}
```

projects/eks-tf-tutorial/terraform/5-nat.tf

Lines changed: 1 addition & 1 deletion

@@ -14,5 +14,5 @@ resource "aws_nat_gateway" "nat" {

```hcl
    Name = "${local.name_prefix}-nat"
  }

  depends_on = [aws_internet_gateway.igw]
}
```

projects/eks-tf-tutorial/terraform/6-routes.tf

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@ resource "aws_route_table" "private" {

```hcl
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
```

projects/eks-tf-tutorial/terraform/7-eks.tf

Lines changed: 46 additions & 0 deletions

@@ -0,0 +1,46 @@

```hcl
resource "aws_iam_role" "eks" {
  name = "${local.name_prefix}-${local.eks_name}-eks-cluster"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "eks.amazonaws.com"
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks.name
}

resource "aws_eks_cluster" "eks" {
  name     = "${local.name_prefix}-${local.eks_name}"
  version  = local.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    endpoint_private_access = false
    endpoint_public_access  = true

    subnet_ids = [
      aws_subnet.private_zone1.id,
      aws_subnet.private_zone2.id,
    ]
  }

  access_config {
    authentication_mode                         = "API"
    bootstrap_cluster_creator_admin_permissions = true
  }

  depends_on = [aws_iam_role_policy_attachment.eks]
}
```

projects/eks-tf-tutorial/terraform/8-nodes.tf

Lines changed: 74 additions & 0 deletions

@@ -0,0 +1,74 @@

```hcl
resource "aws_iam_role" "nodes" {
  name = "${local.name_prefix}-${local.eks_name}-eks-nodes"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      }
    }
  ]
}
POLICY
}

// This policy now includes AssumeRoleForPodIdentity for the Pod Identity Agent
resource "aws_iam_role_policy_attachment" "amazon_eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "amazon_ec2_container_registry_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.eks.name
  version         = local.eks_version
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.private_zone1.id,
    aws_subnet.private_zone2.id,
  ]

  capacity_type  = "ON_DEMAND"
  instance_types = ["t3.large"]

  scaling_config {
    desired_size = 1
    max_size     = 10
    min_size     = 0
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [
    aws_iam_role_policy_attachment.amazon_eks_worker_node_policy,
    aws_iam_role_policy_attachment.amazon_eks_cni_policy,
    aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only,
  ]

  // Allow external changes without Terraform plan difference
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```
