Traefik High Availability on Kubernetes with Let’s Encrypt, Cert Manager and AWS Route53
When deploying Traefik with Let's Encrypt on Kubernetes, the need for High Availability (HA) quickly arises. HA requires multiple instances of Traefik running in parallel, which prevents the use of Traefik's built-in Let's Encrypt features. As the Traefik documentation puts it:
it is not possible to run multiple instances of Traefik 2.0 with Let's Encrypt enabled, because there is no way to ensure that the correct instance of Traefik receives the challenge request, and subsequent responses
Traefik recommends using either an external system or Traefik Enterprise:
If you need Let's Encrypt with high availability in a Kubernetes environment, we recommend using Traefik Enterprise which includes distributed Let's Encrypt as a supported feature.
If you want to keep using Traefik Proxy, LetsEncrypt HA can be achieved by using a Certificate Controller such as Cert-Manager.
We'll explore the Cert Manager solution in this post.
Requirements
- Basic knowledge about Kubernetes, Helm, Traefik, Let's Encrypt and AWS
- A Kubernetes cluster supporting LoadBalancer services (such as EKS or K3S)
- A publicly available AWS Route53 Hosted Zone
  - In our example, we'll use Hosted Zone devops.crafteo.io. to deploy an app under whoami.devops.crafteo.io
- Multiple nodes on our Kubernetes cluster (to spread Traefik instances and achieve HA)
- A DNS record pointing to your Kubernetes nodes.
  - In our case, *.devops.crafteo.io. will point to our nodes.
  - This can be achieved using a LoadBalancer service with most Kubernetes providers (EKS, AKS, GKE...) and creating a DNS record pointing to the Load Balancer address.
Install Traefik on Kubernetes
To achieve High Availability with Traefik, multiple instances of Traefik spread across multiple nodes are required. This ensures at least one instance is available to serve requests even if some nodes or instances are down.
Using the Traefik Helm chart, we can use a values file such as:
# We want a highly available Traefik
# Use a Deployment with pod anti-affinity spreading our Pods across the cluster
# We could also use a DaemonSet to have an instance per node
deployment:
  replicas: 3
# Using affinity from Traefik default values example
# This pod anti-affinity forces the scheduler to put traefik pods
# in topology domains where no other traefik pods are scheduled.
# It should be used when hostNetwork: true to prevent port conflicts
# Note: with the zone topologyKey below, pods are spread across zones;
# use kubernetes.io/hostname to spread them across nodes instead.
# Check that the label selector matches the labels set on your Traefik pods.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: failure-domain.beta.kubernetes.io/zone
        labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - traefik
# Automatically redirect http to https
# Not required but handy
ports:
  web:
    redirectTo: websecure
Note that Traefik won't need any Let's Encrypt config as Cert Manager will handle certificate generation.
We won't go into the specifics of how the chart is deployed, as it highly depends on the solution used (Helm CLI, Ansible, Pulumi...)
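The values file above mentions a DaemonSet as an alternative to a Deployment with anti-affinity. A minimal sketch of that variant follows, assuming a Traefik chart version that exposes the `deployment.kind` value (check the `values.yaml` of the chart version you deploy, as keys change between releases):

```yaml
# Variant of the values file: run one Traefik pod on every node.
# Assumption: the chart version in use supports deployment.kind.
deployment:
  # Deployment is the chart default; DaemonSet schedules one pod per node,
  # so replicas and pod anti-affinity are no longer needed.
  kind: DaemonSet

# Same optional http to https redirect as in the Deployment variant
ports:
  web:
    redirectTo: websecure
```

A DaemonSet scales with the number of nodes, which is convenient on small clusters but may run more Traefik instances than needed on large ones.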
Install Cert Manager
Cert Manager is an open-source solution aiming to automate certificate management in cloud native environments. It will allow us to request and manage ACME certificates for our domain using Let's Encrypt and the DNS-01 challenge via AWS Route53.
As we own the devops.crafteo.io. Hosted Zone, we'll be able to generate wildcard certificates for *.devops.crafteo.io
AWS access for Cert Manager
Cert Manager will need AWS permissions in order to access our Route53 Hosted Zone when verifying domains (as part of the DNS-01 challenge, TXT records are created by Cert Manager and then checked by Let's Encrypt).
We'll create an IAM User with enough permissions to manage our Hosted Zone records:
- Create an IAM User with a name such as certManagerIAMUser
- Attach an IAM policy allowing access to our Hosted Zone such as the following (replacing OURHOSTEDZONEID with our Hosted Zone ID):

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "route53:GetChange",
        "Resource": "arn:aws:route53:::change/*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "route53:ChangeResourceRecordSets",
          "route53:ListResourceRecordSets"
        ],
        "Resource": "arn:aws:route53:::hostedzone/OURHOSTEDZONEID"
      },
      {
        "Effect": "Allow",
        "Action": "route53:ListHostedZonesByName",
        "Resource": "*"
      }
    ]
  }

- Generate credentials for our user (AWS Access Key ID and Secret Access Key)
Cert Manager will use a Secret reference holding our Secret Access Key to access AWS. Create the secret such as:
kind: Secret
apiVersion: v1
type: Opaque
metadata:
  name: cert-manager-aws-secret
  # IMPORTANT: secret must be in same namespace as Cert Manager deployment
  namespace: cert-manager
stringData:
  # AWS Secret Access Key generated for our user
  secret-access-key: xxx
  # No need to specify Access Key ID here
  # It'll be specified on Cert Manager Issuer resource
AWS Credentials are now ready to be used by Cert Manager.
_Note: we could have used another method, such as running Cert Manager on Kubernetes nodes that are EC2 instances with an IAM Instance Profile. In that case, we wouldn't have needed an AWS IAM user with credentials._
Deploy Helm chart
Cert Manager can be installed with its Helm chart and values such as:
# Install CRDs such as ClusterIssuer
# We'll need them in order to manage certificate requests
installCRDs: true
# Define default certificate issuer to be used
# In our case, ClusterIssuer cert-manager-acme-issuer should be used by default
ingressShim:
  defaultIssuerGroup: cert-manager.io
  defaultIssuerKind: ClusterIssuer
  defaultIssuerName: cert-manager-acme-issuer
The Cert Manager chart also manages a few Kubernetes CRDs we'll use to manage certificate issuance:
- ClusterIssuer and Issuer: used to issue a certificate when a Certificate resource is created. It will require AWS IAM credentials to access Route53 in order to solve the ACME DNS-01 challenge. See docs for details.
- Certificate: represents a certificate and references the related Secret holding the certificate and its private key.
Let's define a ClusterIssuer using the ACME DNS-01 challenge with our AWS credentials such as:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cert-manager-acme-issuer
  # ClusterIssuer is cluster-scoped, so it takes no namespace.
  # The Secrets it references (such as cert-manager-aws-secret) are looked up
  # in Cert Manager's cluster resource namespace, which defaults to the
  # namespace Cert Manager is deployed in (cert-manager here).
spec:
  acme:
    # Email on which you'll receive notifications for our certificates (expiration and such)
    email: pierre@crafteo.io
    # Name of the secret under which to store the secret key used by acme
    # This secret is managed by ClusterIssuer resource, you don't have to create it yourself
    privateKeySecretRef:
      name: cert-manager-acme-private-key
    # ACME server to use
    # Specify https://acme-v02.api.letsencrypt.org/directory for production
    # Staging server issues staging certificates which won't be trusted by most external parties but can be used for development purposes
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Solvers define how to validate you're the owner of the domain for which to issue a certificate
    # We use DNS-01 challenge with Route53 by providing related AWS credentials (access key and secret key) for an IAM user with proper rights to manage Route53 records
    solvers:
      - dns01:
          route53:
            # AWS Access Key ID for our Secret Key
            accessKeyID: AKIAXXXX
            # AWS region to use
            region: eu-west-3
            # Reference our secret with Secret Key
            secretAccessKeySecretRef:
              key: secret-access-key
              name: cert-manager-aws-secret
            # Optionally specify Hosted Zone
            # As per doc:
            # If set, the provider will manage only this zone in Route53 and will not do a lookup using the route53:ListHostedZonesByName api call.
            # hostedZoneID: xxx
Make sure to use the Access Key ID matching the Secret Access Key stored in our Secret:
accessKeyID: AKIAXXXX
The example above uses the Let's Encrypt staging server to avoid hitting rate limits while testing. For production usage, you'll want to use:
server: https://acme-v02.api.letsencrypt.org/directory
Once created, our ClusterIssuer will appear as Ready (this may take a few seconds while the ACME account is registered with the provided email):
$ kubectl get clusterissuer
NAME                       READY
cert-manager-acme-issuer   True
Cert Manager is now ready to issue certificates with our ClusterIssuer!
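Once staging certificates are issued correctly, a separate production issuer can be declared next to the staging one. The sketch below combines the production server URL with the optional hostedZoneID setting mentioned in the comments above. The names cert-manager-acme-issuer-prod and cert-manager-acme-prod-private-key are hypothetical, chosen for this example, and the dnsZones selector is an addition that restricts the solver to our zone:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  # Hypothetical name, chosen for this sketch
  name: cert-manager-acme-issuer-prod
spec:
  acme:
    email: pierre@crafteo.io
    # Hypothetical Secret name: a separate account key for the production server,
    # as staging and production ACME accounts are distinct
    privateKeySecretRef:
      name: cert-manager-acme-prod-private-key
    # Let's Encrypt production server
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      # Only use this solver for names within our Hosted Zone
      - selector:
          dnsZones:
            - devops.crafteo.io
        dns01:
          route53:
            accessKeyID: AKIAXXXX
            region: eu-west-3
            secretAccessKeySecretRef:
              key: secret-access-key
              name: cert-manager-aws-secret
            # Manage only this zone and skip the ListHostedZonesByName lookup
            hostedZoneID: OURHOSTEDZONEID
```

Keeping both issuers lets a Certificate switch from staging to production by changing only its issuerRef name.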
Generate a certificate for our domain
Once the ClusterIssuer is ready, we can create our Certificate in any namespace such as:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: crafteo-wildcard
spec:
  # Certificate will be valid for these domain names
  dnsNames:
    - devops.crafteo.io
    - '*.devops.crafteo.io'
  # Reference our issuer
  # As it's a ClusterIssuer, it can be referenced from any namespace
  issuerRef:
    kind: ClusterIssuer
    name: cert-manager-acme-issuer
  # Secret that will be created with our certificate and private key
  secretName: crafteo-wildcard-certificate
Cert Manager will then create a few resources to manage our certificate request:
- A CertificateRequest used to request a certificate from an Issuer
- An Order managed by our ClusterIssuer to 'order' a certificate from ACME
- A Challenge for each DNS name being verified
- Finally, a Secret holding our certificate and its private key
Each of these resources has its own lifecycle (check Cert Manager docs for details).
Our certificate will appear as Ready after a few moments (DNS-01 challenge may take a few minutes to complete):
$ kubectl get certificate
NAME               READY   SECRET
crafteo-wildcard   True    crafteo-wildcard-certificate
Along with the related Secret holding the certificate and private key:
apiVersion: v1
kind: Secret
type: kubernetes.io/tls
metadata:
  name: crafteo-wildcard-certificate
data:
  tls.crt: xxx
  tls.key: xxx
In case something goes wrong, checking Cert Manager logs and resource events may give us insight into what happened (such as invalid IAM permissions or another error during the DNS-01 challenge lifecycle):
$ kubectl -n cert-manager logs cert-manager-xxx
$ kubectl describe certificaterequest
$ kubectl describe order
$ kubectl describe challenge
$ kubectl describe certificate
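As an alternative to creating a Certificate by hand, the ingressShim defaults set in the Helm values earlier allow an annotated Ingress to request its own certificate from the default ClusterIssuer. The sketch below assumes those defaults are in place and that a whoami Service exists in the same namespace (such as the one deployed in the next section). The Secret name whoami-tls is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  annotations:
    # Ask Cert Manager's ingress-shim to create a Certificate for the tls
    # section below, using the default issuer from the Helm values.
    # cert-manager.io/cluster-issuer: cert-manager-acme-issuer would name
    # the issuer explicitly instead.
    kubernetes.io/tls-acme: "true"
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    - hosts:
        - whoami.devops.crafteo.io
      # Hypothetical name: Cert Manager creates this Secret in the
      # Ingress namespace once the certificate is issued
      secretName: whoami-tls
  rules:
    - host: whoami.devops.crafteo.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whoami
                port:
                  number: 80
```

This issues one certificate per Ingress rather than reusing the wildcard certificate, so the explicit Certificate approach remains preferable when many hosts share the same domain.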
Deploy an application with Ingress using our certificate
Once the certificate is created, we can deploy an application with an Ingress making use of it. We have multiple possibilities here:
- Use a plain Kubernetes Ingress
- Use Traefik's IngressRoute
We're gonna keep things simple and use a plain Ingress with the whoami Docker image:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  annotations:
    # Required for Traefik to handle HTTPS requests
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    - hosts:
        - whoami.devops.crafteo.io
      secretName: crafteo-wildcard-certificate
  rules:
    - host: whoami.devops.crafteo.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whoami
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  ports:
    - port: 80
  selector:
    app: whoami
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami
          ports:
            - containerPort: 80
Our application should now be available at the specified address!
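For the second option listed above, the same route could be expressed with Traefik's IngressRoute. This is a minimal sketch, assuming the Traefik CRDs are installed and served under the traefik.containo.us/v1alpha1 API group used by Traefik 2.x (more recent releases use traefik.io/v1alpha1), and that the Secret lives in the same namespace as the IngressRoute:

```yaml
# Assumption: Traefik 2.x CRDs; use traefik.io/v1alpha1 on recent versions
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
spec:
  # Only listen on the HTTPS entrypoint
  entryPoints:
    - websecure
  routes:
    - match: Host(`whoami.devops.crafteo.io`)
      kind: Rule
      services:
        # Same whoami Service as in the Ingress example
        - name: whoami
          port: 80
  # Serve the wildcard certificate issued by Cert Manager
  tls:
    secretName: crafteo-wildcard-certificate
```

It would replace the Ingress resource only; the whoami Service and Deployment stay unchanged.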
Full example with Pulumi and K3S
Check out Devops Lifecycle Example GitHub repository for a full example using Pulumi and K3S servers.