Traefik High Availability on Kubernetes with Let’s Encrypt, Cert Manager and AWS Route53


When deploying Traefik with Let's Encrypt on Kubernetes, the need for High Availability (HA) quickly arises. HA requires multiple instances of Traefik running in parallel, which prevents the use of Traefik's built-in Let's Encrypt features:

it is not possible to run multiple instances of Traefik 2.0 with Let's Encrypt enabled, because there is no way to ensure that the correct instance of Traefik receives the challenge request, and subsequent responses

Traefik recommends using either an external system or Traefik Enterprise:

If you need Let's Encrypt with high availability in a Kubernetes environment, we recommend using Traefik Enterprise which includes distributed Let's Encrypt as a supported feature.

If you want to keep using Traefik Proxy, LetsEncrypt HA can be achieved by using a Certificate Controller such as Cert-Manager.

We'll explore the Cert Manager solution in this post.

Requirements

  • Basic knowledge about Kubernetes, Helm, Traefik, Let's Encrypt and AWS
  • A Kubernetes cluster supporting LoadBalancer services (such as EKS or K3S)
  • A publicly available AWS Route53 Hosted Zone
    • In our example, we'll use the devops.crafteo.io. Hosted Zone to deploy an app under whoami.devops.crafteo.io
  • Multiple nodes in our Kubernetes cluster (to spread Traefik instances and achieve HA)
  • A DNS record pointing to your Kubernetes nodes.
    • In our case, *.devops.crafteo.io. will point to our nodes.
    • This can be achieved with most Kubernetes providers (EKS, AKS, GKE...) by using a LoadBalancer service and creating a DNS record pointing to the Load Balancer address.

Install Traefik on Kubernetes

To achieve High Availability with Traefik, multiple instances of Traefik spread across multiple nodes are required. This ensures at least one instance is available to serve requests even if some nodes or instances are down.

With the Traefik Helm chart, we can use a values file such as:

# We want a highly available Traefik
# Use a Deployment with pod anti-affinity spreading our Pods across nodes
# We could also use a DaemonSet to have one instance per node
deployment:
  replicas: 3

# Adapted from the affinity example in Traefik default values
# This pod anti-affinity forces the scheduler to put traefik pods
# on nodes where no other traefik pods are scheduled.
# It should be used when hostNetwork: true to prevent port conflicts
# With a "required" rule, 3 replicas need at least 3 schedulable nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # kubernetes.io/hostname spreads Pods per node;
    # use topology.kubernetes.io/zone to spread them per availability zone
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchExpressions:
        # Must match the labels the chart sets on Traefik Pods
        # (recent charts use app.kubernetes.io/name: traefik)
        - key: app
          operator: In
          values:
          - traefik

# Automatically redirect http to https
# Not required but handy
ports:
  web:
    redirectTo: websecure
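Anti-affinity protects against a node going down, but a voluntary disruption such as a node drain or cluster upgrade could still evict several Traefik Pods at once. A PodDisruptionBudget limits that. As a sketch, assuming the traefik/traefik chart version in use exposes the podDisruptionBudget values (recent charts do, but check your chart's values.yaml), we could add:

```yaml
# Assumption: these keys exist in the traefik/traefik chart version in use
podDisruptionBudget:
  enabled: true
  # With 3 replicas, at most 1 Traefik Pod can be evicted at a time
  # by voluntary disruptions (node drain, cluster upgrade...)
  maxUnavailable: 1
```

This only covers voluntary evictions; unplanned node failures are still handled by spreading replicas across nodes.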

Note that Traefik doesn't need any Let's Encrypt configuration, as Cert Manager handles certificate generation.

We won't go into the specifics of how the chart is deployed, as it depends heavily on the tooling used (Helm CLI, Ansible, Pulumi...).

Install Cert Manager

Cert Manager is an open-source solution aiming to automate certificate management in cloud native environments. It will allow us to request and manage ACME certificates for our domain using Let's Encrypt and the DNS-01 challenge via AWS Route53.

As we own the devops.crafteo.io. Hosted Zone, we'll be able to generate wildcard certificates for *.devops.crafteo.io (Let's Encrypt only issues wildcard certificates through the DNS-01 challenge).

AWS access for Cert Manager

Cert Manager needs AWS permissions to access the Route53 Hosted Zone when verifying domains: as part of the DNS-01 challenge, it creates TXT records (under _acme-challenge.<domain>) which Let's Encrypt then checks.

We'll create an IAM User with enough permissions to manage our Hosted Zone records:

  • Create an IAM User, named for example certManagerIAMUser
  • Attach an IAM policy allowing access to our Hosted Zone, such as the following (replace OURHOSTEDZONEID with our Hosted Zone ID):
    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": "route53:GetChange",
              "Resource": "arn:aws:route53:::change/*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "route53:ChangeResourceRecordSets",
                  "route53:ListResourceRecordSets"
              ],
              "Resource": "arn:aws:route53:::hostedzone/OURHOSTEDZONEID"
          },
          {
              "Effect": "Allow",
              "Action": "route53:ListHostedZonesByName",
              "Resource": "*"
          }
      ]
    }
  • Generate credentials for our user (AWS Access Key ID and Secret Access Key)

Cert Manager reads our Secret Access Key from a Kubernetes Secret to access AWS. Create the Secret such as:

kind: Secret
apiVersion: v1
type: Opaque
metadata:
  name: cert-manager-aws-secret
  # IMPORTANT: secret must be in same namespace as Cert Manager deployment
  namespace: cert-manager
stringData:
  # AWS Secret Access Key generated for our user
  secret-access-key: xxx
  # No need to specify Access Key ID here
  # It'll be specified on Cert Manager Issuer resource

AWS Credentials are now ready to be used by Cert Manager.

_Note: we could have used another method, such as running Cert Manager on EC2 Kubernetes nodes with an IAM Instance Profile. In that case, we wouldn't have needed an IAM user with static credentials._
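With an Instance Profile, the route53 solver can rely on ambient credentials, which Cert Manager allows for ClusterIssuers by default. A sketch of the solvers section of the ClusterIssuer we'll define next, assuming the nodes running Cert Manager have an Instance Profile carrying the IAM policy above:

```yaml
# Excerpt of a ClusterIssuer spec.acme section (sketch)
# Assumption: the EC2 nodes running Cert Manager have an IAM Instance Profile
# with the Route53 policy above attached
solvers:
- dns01:
    route53:
      region: eu-west-3
      # No accessKeyID / secretAccessKeySecretRef: Cert Manager falls back
      # to the AWS SDK default credential chain (here, the Instance Profile)
```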

Deploy Helm chart

Cert Manager can be installed with Helm chart and values such as:

# Install CRDs such as ClusterIssuer
# We'll need them in order to manage certificate requests
installCRDs: true

# Define default certificate issuer to be used
# In our case, ClusterIssuer cert-manager-acme-issuer should be used by default
ingressShim:
  defaultIssuerGroup: cert-manager.io
  defaultIssuerKind: ClusterIssuer
  defaultIssuerName: cert-manager-acme-issuer
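These defaults are used by Cert Manager's ingress-shim: when an Ingress carries the kubernetes.io/tls-acme: "true" annotation, Cert Manager creates a Certificate for each entry of its tls section using the default issuer, so no Certificate has to be written by hand. A sketch, where whoami-tls is a hypothetical Secret name:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  annotations:
    # Ask ingress-shim to create a Certificate with the default issuer
    # configured above (ClusterIssuer cert-manager-acme-issuer)
    kubernetes.io/tls-acme: "true"
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
  - hosts:
    - whoami.devops.crafteo.io
    # Hypothetical name: Cert Manager creates and fills this Secret
    secretName: whoami-tls
  rules:
  - host: whoami.devops.crafteo.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whoami
            port:
              number: 80
```

Alternatively, the cert-manager.io/cluster-issuer annotation names the issuer explicitly and works without these defaults. Creating the Certificate ourselves, as done below, lets several Ingresses share a single wildcard certificate.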

The Cert Manager chart also installs a few Kubernetes CRDs we'll use to manage certificate issuance:

  • ClusterIssuer and Issuer - Used to issue certificates when a Certificate resource is created. An Issuer only serves its own namespace, while a ClusterIssuer serves the whole cluster. Ours will require AWS IAM credentials to access Route53 in order to solve the ACME DNS-01 challenge. See docs for details.
  • Certificate - Represents a certificate and references the Secret holding the certificate and its private key.

Let's define a ClusterIssuer using the ACME DNS-01 challenge with our AWS credentials:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cert-manager-acme-issuer
  # ClusterIssuer is cluster-scoped: no namespace here
  # Secrets it references (such as cert-manager-aws-secret) must live in
  # Cert Manager's cluster resource namespace, by default the namespace
  # Cert Manager is deployed in (cert-manager)
spec:
  acme:
    # Email on which we'll receive notifications about our certificates (expiration and such)
    email: pierre@crafteo.io
    # Name of the Secret in which to store the ACME account private key
    # This Secret is managed by the ClusterIssuer, you don't have to create it yourself
    privateKeySecretRef:
      name: cert-manager-acme-private-key
    # ACME server to use
    # Specify https://acme-v02.api.letsencrypt.org/directory for production
    # The staging server issues staging certificates which won't be trusted by most external parties but can be used for development purposes
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Solvers define how to prove we own the domain for which a certificate is issued
    # We use the DNS-01 challenge with Route53 by providing AWS credentials (access key and secret key) for an IAM user allowed to manage Route53 records
    solvers:
    - dns01:
        route53:
          # AWS Access Key ID for our Secret Key
          accessKeyID: AKIAXXXX
          # AWS region to use 
          region: eu-west-3
          # Reference our secret with Secret Key
          secretAccessKeySecretRef:
            key: secret-access-key
            name: cert-manager-aws-secret
          # Optionally specify Hosted Zone
          # As per doc:
          # If set, the provider will manage only this zone in Route53 and will not do an lookup using the route53:ListHostedZonesByName api call.
          # hostedZoneID: xxx

Make sure to use the Access Key ID matching the Secret Access Key stored in our Secret:

accessKeyID: AKIAXXXX

The example above uses the Let's Encrypt staging server to avoid hitting rate limits while testing. For production usage, you'll want to use:

server: https://acme-v02.api.letsencrypt.org/directory
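Rather than editing the staging issuer in place, one option is to keep it for tests and declare a second ClusterIssuer for production. A sketch, where cert-manager-acme-issuer-prod and cert-manager-acme-prod-private-key are hypothetical names; a Certificate's issuerRef.name would then point to the production issuer:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  # Hypothetical name for the production issuer
  name: cert-manager-acme-issuer-prod
spec:
  acme:
    email: pierre@crafteo.io
    # Production Let's Encrypt directory
    server: https://acme-v02.api.letsencrypt.org/directory
    # Separate ACME account key Secret (hypothetical name),
    # so staging and production accounts don't share state
    privateKeySecretRef:
      name: cert-manager-acme-prod-private-key
    solvers:
    - dns01:
        route53:
          accessKeyID: AKIAXXXX
          region: eu-west-3
          secretAccessKeySecretRef:
            key: secret-access-key
            name: cert-manager-aws-secret
```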

Once created, our ClusterIssuer will appear as Ready (this may take a few seconds while the ACME account is registered with the provided email):

$ kubectl get clusterissuer
NAME                       READY
cert-manager-acme-issuer   True

Cert Manager is now ready to issue certificates with our ClusterIssuer!

Generate a certificate for our domain

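Once the ClusterIssuer is ready, we can create our Certificate in any namespace. As the resulting Secret is namespaced and an Ingress can only use a TLS Secret from its own namespace, create the Certificate in the same namespace as the Ingress that will use it: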

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: crafteo-wildcard
spec:
  # Certificate will be valid for these domain names
  dnsNames:
  - devops.crafteo.io
  - '*.devops.crafteo.io'
  # Reference our issuer
  # As it's a ClusterIssuer, it can be in a different namespace
  issuerRef:
    kind: ClusterIssuer
    name: cert-manager-acme-issuer
  # Secret that will be created with our certificate and private key
  secretName: crafteo-wildcard-certificate
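Cert Manager also renews the certificate automatically before it expires. The renewal point can be set explicitly with the Certificate's renewBefore field. As a sketch, assuming Let's Encrypt's current 90-day certificate lifetime, renewing 30 days ahead would look like:

```yaml
# Excerpt of the Certificate spec above (sketch)
spec:
  # dnsNames, issuerRef and secretName unchanged
  # Start renewing 30 days (720h) before expiry
  # Assumption: Let's Encrypt issues 90-day certificates, as it currently does
  renewBefore: 720h
```

When renewBefore is unset, recent Cert Manager versions renew once about two thirds of the certificate's lifetime has elapsed, which amounts to roughly the same 30 days for a 90-day certificate.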

Cert Manager will then create a few resources to manage our certificate request:

  • A CertificateRequest used to request a certificate from an Issuer
  • An Order managed by our ClusterIssuer to 'order' a certificate from ACME
  • A Challenge for each DNS name being verified
  • Finally, a Secret holding our certificate and its private key

Each of these resources has its own lifecycle (check Cert Manager docs for details).

Our certificate will appear as Ready after a few moments (DNS-01 challenge may take a few minutes to complete):

$ kubectl get certificate
NAME               READY   SECRET                      
crafteo-wildcard   True    crafteo-wildcard-certificate

Along with the related Secret holding the certificate and private key:

apiVersion: v1
kind: Secret
type: kubernetes.io/tls
metadata:
  name: crafteo-wildcard-certificate
data:
  tls.crt: xxx
  tls.key: xxx
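Traefik can also serve this Secret as its default certificate, used by any TLS router that doesn't reference a certificate of its own. A sketch using Traefik's TLSStore CRD, assuming the Traefik CRDs are installed (the chart installs them by default), the Secret lives in the default namespace, and Traefik 2.x where the API group is traefik.containo.us (Traefik v2.10+ and v3 use traefik.io/v1alpha1):

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: TLSStore
metadata:
  # Traefik only uses the TLSStore named "default"
  name: default
  # Assumption: same namespace as the crafteo-wildcard-certificate Secret
  namespace: default
spec:
  defaultCertificate:
    secretName: crafteo-wildcard-certificate
```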

In case something goes wrong, checking Cert Manager logs and resource events may give us insight into what happened (such as invalid IAM permissions or another error during the DNS-01 challenge lifecycle):

$ kubectl -n cert-manager logs cert-manager-xxx
$ kubectl describe certificaterequest
$ kubectl describe order
$ kubectl describe challenge
$ kubectl describe certificate

Deploy an application with Ingress using our certificate

Once the certificate is issued, we can deploy an application with an Ingress making use of it. There are multiple possibilities here; we'll keep things simple and use a plain Ingress with the traefik/whoami Docker image:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  annotations:
    # Required for Traefik to handle HTTPS requests
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
  - hosts:
      - whoami.devops.crafteo.io
    secretName: crafteo-wildcard-certificate
  rules:
  - host: whoami.devops.crafteo.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whoami
            port:
              number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  ports:
    - port: 80
  selector:
    app: whoami
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: traefik/whoami
        ports:
        - containerPort: 80
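Traefik's own IngressRoute CRD is another way to expose the same Service with our certificate. A sketch, assuming Traefik 2.x CRDs (API group traefik.containo.us; use traefik.io/v1alpha1 on Traefik v2.10+ and v3) and the websecure entrypoint defined by the chart:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
spec:
  # HTTPS entrypoint defined by the Traefik Helm chart
  entryPoints:
  - websecure
  routes:
  - match: Host(`whoami.devops.crafteo.io`)
    kind: Rule
    services:
    - name: whoami
      port: 80
  tls:
    # Secret must be in the same namespace as this IngressRoute
    secretName: crafteo-wildcard-certificate
```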

Our application should now be available at https://whoami.devops.crafteo.io!

Full example with Pulumi and K3S

Check out the Devops Lifecycle Example GitHub repository for a full example using Pulumi and K3S servers.
