Automate TLS Credentials rotation in Linkerd with Terraform
Introduction
In today’s world of microservices, adopting a service mesh is becoming standard in any modern infrastructure. The challenge for any service mesh is to provide reliability, security and observability (among other things) without forcing cloud and infrastructure engineers through complex, time-consuming changes to their systems. The same holds for upgrades: a new version of the mesh, whether in the short or the long term, should remain compatible with infrastructure already running a previous one. On top of all this, we want the service mesh layer to be as “unnoticeable” and as decoupled from our solution as possible.
Linkerd is a service mesh that promises all of the above. Lightweight, fully open source and, currently, an incubating project under the Cloud Native Computing Foundation (CNCF), Linkerd has made huge strides towards a small, safe and trustworthy service mesh. For more information about how Linkerd works, please visit the official documentation.
How Linkerd ensures security
One of Linkerd’s many security features is automatic mTLS. To achieve this, a set of TLS credentials is used to generate TLS certificates¹. The TLS credentials consist of 3 parts:
- A trust anchor
- An issuer certificate
- A private key
Linkerd automatically rotates the TLS certificates every 24 hours, but it has no internal mechanism to rotate the credentials that issue them. So if the same TLS credentials stay in use for a long time and an intruder manages to get hold of them, the intruder can create TLS certificates for your infrastructure, resulting in a security breach.
According to the official documentation¹, Linkerd has to rely on an external solution to solve this issue. Specifically, it relies on the well-defined, well-tested and widely used cert-manager to rotate the issuer certificate and the private key automatically. Unfortunately, the trust anchor still needs to be rotated manually from time to time.
And what about Terraform?
The process described by the official documentation is great, but it is not very convenient as it still requires manual steps. In an environment that aims to automate the infrastructure installation fully (or at least as much as possible), these manual steps work against that effort.
One of the many reasons to use Terraform is to achieve full automation, especially when the goal is a single Terraform script that replicates an infrastructure across multiple clusters by changing just a few parameters. In that scenario, manual work outside the script is undesirable. In this tutorial, we will see how the process can be automated through Terraform.
Prerequisites
It is assumed that a Terraform script for creating a Kubernetes cluster already exists. A tutorial on how to install a cluster with Terraform on AWS EKS and other cloud solutions will follow soon, so Terraform and Kubernetes cluster installation are out of the scope of this tutorial.
This tutorial has been tested with the following:
- Kubernetes 1.18.9 on AWS EKS
- Terraform 0.14.8
- cert-manager 1.2.0
- Helm v3
- Linkerd 2.10
Furthermore, the following Terraform providers are required:
- hashicorp/kubernetes
- hashicorp/helm
- hashicorp/kubernetes-alpha
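As a minimal sketch, the three providers could be declared as below. Version constraints are left out on purpose (pin them to whatever you have tested), and the connection settings of each provider are assumed to be configured elsewhere in your cluster script.

```hcl
terraform {
  required_version = ">= 0.14"

  required_providers {
    # Core Kubernetes resources (namespaces, secrets).
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    # Helm releases (cert-manager, linkerd2).
    helm = {
      source = "hashicorp/helm"
    }
    # Arbitrary manifests such as cert-manager custom resources.
    kubernetes-alpha = {
      source = "hashicorp/kubernetes-alpha"
    }
  }
}
```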
The tutorial covers both the Control Plane Credentials¹ and the Webhook TLS Credentials² rotation.
Since we are using Linkerd, we add the annotation "linkerd.io/inject" = "enabled" to all namespaces, which activates the service mesh for all pods in the namespace.
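For illustration, a meshed application namespace could look like the sketch below. The resource label and the namespace name "my-app" are hypothetical placeholders.

```hcl
# Hypothetical application namespace; only the annotation matters here.
resource "kubernetes_namespace" "my_app" {
  metadata {
    name = "my-app"

    annotations = {
      # Tells the Linkerd proxy injector to add the sidecar to every pod
      # created in this namespace.
      "linkerd.io/inject" = "enabled"
    }
  }
}
```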
How to:
Install cert-manager helm package
cert-manager is installed like any other Helm package in Terraform. It is also recommended to create a dedicated namespace and install the package into it:
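The original listing is not reproduced here, so the following is a minimal sketch of that step. It assumes the public Jetstack chart repository and lets the chart install its own CRDs through the installCRDs value; the resource labels are my own choice.

```hcl
# Dedicated namespace for cert-manager.
resource "kubernetes_namespace" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

resource "helm_release" "cert_manager" {
  name       = "cert-manager"
  repository = "https://charts.jetstack.io"
  chart      = "cert-manager"
  version    = "v1.2.0"
  namespace  = kubernetes_namespace.cert_manager.metadata[0].name

  # Install the Issuer/Certificate CRDs together with the chart.
  set {
    name  = "installCRDs"
    value = "true"
  }
}
```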
Linkerd installation with certificate automation
Since version 2.10, Linkerd has split its big Helm chart into smaller charts. For this tutorial we are going to use only the basic linkerd2 chart. Follow the official docs to install any extra package, e.g. linkerd-viz for the Linkerd dashboard.
The Helm chart offers the option to create the namespace. However, experience showed that it is better to create the namespace ourselves, as we are going to need it before the installation of the chart. So as a first step, let’s create the namespace:
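A minimal sketch of the namespace, since the original listing is not included here. The labels and the annotation are an assumption on my side: they mirror what the linkerd2 chart is understood to set when it creates the namespace itself, so verify them against the chart version you install.

```hcl
resource "kubernetes_namespace" "linkerd" {
  metadata {
    name = "linkerd"

    # Assumption: same metadata the chart would apply with installNamespace=true.
    annotations = {
      "linkerd.io/inject" = "disabled"
    }

    labels = {
      "linkerd.io/is-control-plane"          = "true"
      "linkerd.io/control-plane-ns"          = "linkerd"
      "config.linkerd.io/admission-webhooks" = "disabled"
    }
  }
}
```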
Keep in mind the annotation for Linkerd. The Linkerd namespace does not need it, as the pods installed by Linkerd have the proxy sidecar by default. Charts like linkerd-viz, on the contrary, need the annotation just like any other namespace. In addition, in some cases terraform plan will complain that the linkerd-viz namespace is missing some annotations/labels. This happens because linkerd-viz is installed via a Helm chart, which adds metadata that Terraform does not know about. In that case, simply add the annotations/labels the error indicates to the namespace declaration.
Let’s start first with the rotation of the Control Plane TLS credentials.
Firstly, we need to create a signing key pair. There are several ways to do that; the official documentation¹ suggests using step. Once the 2 files (the certificate and its private key) are created, we add them to our project folder structure. Then, we create the Kubernetes secret for the trust anchor as follows:
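A minimal sketch of the secret, assuming the key pair was saved as certs/ca.crt and certs/ca.key inside the module (hypothetical paths) and that the linkerd namespace already exists. The secret name linkerd-trust-anchor follows the official documentation.

```hcl
resource "kubernetes_secret" "linkerd_trust_anchor" {
  metadata {
    name      = "linkerd-trust-anchor"
    namespace = "linkerd" # assumed to exist already
  }

  type = "kubernetes.io/tls"

  # The provider base64-encodes these values itself.
  data = {
    "tls.crt" = file("${path.module}/certs/ca.crt") # hypothetical path
    "tls.key" = file("${path.module}/certs/ca.key") # hypothetical path
  }
}
```

Keep in mind that the private key ends up in the Terraform state, so the state backend should be protected accordingly.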
The next step is to issue the certificate with new TLS credentials. Since we decided to go the Terraform way, we are going to use the kubernetes-alpha provider, which lets us describe any Kubernetes manifest as a Terraform object instead of YAML. Keep in mind that this provider is not a stable release, so you might experience issues with some applications. In our case, it works perfectly. Of course, you can also use other providers that work in a similar way. To issue the certificate we must create 2 manifests of the cert-manager API, an Issuer and a Certificate, as follows:
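The two manifests could be sketched as below. The resource labels are my own choice; the object names, common name and usages follow the official documentation, and the namespace and trust anchor secret are assumed to exist already.

```hcl
# Issuer backed by the trust anchor secret.
resource "kubernetes_manifest" "linkerd_trust_anchor_issuer" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Issuer"
    metadata = {
      name      = "linkerd-trust-anchor"
      namespace = "linkerd"
    }
    spec = {
      ca = {
        secretName = "linkerd-trust-anchor" # secret holding the trust anchor
      }
    }
  }
}

# Certificate that cert-manager keeps renewing for the identity issuer.
resource "kubernetes_manifest" "linkerd_identity_issuer" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata = {
      name      = "linkerd-identity-issuer"
      namespace = "linkerd"
    }
    spec = {
      secretName  = "linkerd-identity-issuer" # written by cert-manager
      duration    = "24h"
      renewBefore = "12h"
      issuerRef = {
        name = "linkerd-trust-anchor"
        kind = "Issuer"
      }
      commonName = "identity.linkerd.cluster.local"
      dnsNames   = ["identity.linkerd.cluster.local"]
      isCA       = true
      privateKey = {
        algorithm = "ECDSA"
      }
      usages = ["cert sign", "crl sign", "server auth", "client auth"]
    }
  }
}
```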
The important keys in the above scripts are the following:
- The Issuer secret name must be the name of the secret we created in the previous step
- On the Certificate, duration is the lifetime of the certificate and renewBefore is how long before expiry cert-manager renews it. In the above scenario, the certificate is valid for 24 hours and is renewed 12 hours before it expires, so in practice we get a new certificate every 12 hours. These values can, of course, be changed according to strategy and needs.
- The issuerRef.name must be the name of the Issuer created
- The secretName is the secret that will be created by the Certificate; it stores the new TLS credentials on every certificate renewal
Let’s continue in a similar way with the Webhook TLS Credentials. For the webhooks we are going to use a separate signing key pair and trust anchor.
So, after creating the signing key pair, we continue as follows:
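A minimal sketch of this step, assuming the webhook key pair was saved as certs/webhook-ca.crt and certs/webhook-ca.key (hypothetical paths). The names webhook-issuer-tls and webhook-issuer follow the official documentation².

```hcl
# Secret holding the webhook signing key pair.
resource "kubernetes_secret" "webhook_issuer_tls" {
  metadata {
    name      = "webhook-issuer-tls"
    namespace = "linkerd" # assumed to exist already
  }

  type = "kubernetes.io/tls"

  data = {
    "tls.crt" = file("${path.module}/certs/webhook-ca.crt") # hypothetical path
    "tls.key" = file("${path.module}/certs/webhook-ca.key") # hypothetical path
  }
}

# Issuer that signs the webhook certificates with the key pair above.
resource "kubernetes_manifest" "webhook_issuer" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Issuer"
    metadata = {
      name      = "webhook-issuer"
      namespace = "linkerd"
    }
    spec = {
      ca = {
        secretName = "webhook-issuer-tls"
      }
    }
  }
}
```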
If we are using the control plane certificate rotation, in our setup we did not need to create another certificate for the proxy injector webhook. If not, we create the proxy injector certificate in the same way as the control plane certificate above, referencing the webhook issuer in its issuerRef. Note that the official documentation² issues a dedicated proxy injector certificate in either case. In any case, we must create another certificate for the sp validator as follows:
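A sketch of the sp validator certificate, using the names from the official documentation². The duration and renewBefore values are my assumption, chosen to match the control plane example; a proxy injector certificate would take the same shape with linkerd-proxy-injector in place of linkerd-sp-validator.

```hcl
resource "kubernetes_manifest" "linkerd_sp_validator" {
  provider = kubernetes-alpha

  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata = {
      name      = "linkerd-sp-validator"
      namespace = "linkerd"
    }
    spec = {
      # Secret name the linkerd2 chart looks for when externalSecret is enabled.
      secretName  = "linkerd-sp-validator-k8s-tls"
      duration    = "24h" # assumption, adjust to your rotation strategy
      renewBefore = "12h" # assumption
      issuerRef = {
        name = "webhook-issuer"
        kind = "Issuer"
      }
      commonName = "linkerd-sp-validator.linkerd.svc"
      dnsNames   = ["linkerd-sp-validator.linkerd.svc"]
      isCA       = false
      privateKey = {
        algorithm = "ECDSA"
      }
      usages = ["server auth"]
    }
  }
}
```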
In this way, new credentials are created on each certificate rotation and stored in the secret defined, which is updated on every renewal. And since we are using Terraform, the script is ready to be used as part of a wider cluster installation script.
The final step is to install the linkerd2 Helm chart as follows:
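A minimal sketch of the release, assuming the stable Linkerd chart repository and the same hypothetical certificate paths as before. Values are passed through yamlencode so that the multi-line PEM content does not need any escaping.

```hcl
resource "helm_release" "linkerd" {
  name       = "linkerd2"
  repository = "https://helm.linkerd.io/stable"
  chart      = "linkerd2"
  namespace  = "linkerd" # created beforehand, so the chart must not create it

  values = [yamlencode({
    installNamespace = false

    # Public part of the control plane trust anchor (hypothetical path).
    identityTrustAnchorsPEM = file("${path.module}/certs/ca.crt")

    identity = {
      issuer = {
        # Read the issuer credentials from the cert-manager managed secret.
        scheme = "kubernetes.io/tls"
      }
    }

    profileValidator = {
      externalSecret = true
      caBundle       = file("${path.module}/certs/webhook-ca.crt") # hypothetical path
    }

    # If a proxy injector certificate is issued as well, a proxyInjector
    # block with the same two keys is needed here.
  })]
}
```

In a real script the release should also wait for the secrets and certificates, for example through depends_on, so that the chart does not start before the credentials exist.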
And that’s it! We installed Linkerd through Terraform and established automatic TLS credentials rotation. We no longer need to run terminal commands against the cluster to ensure that. Furthermore, the scripts are completely reusable and ready to use out of the box: after installing Linkerd in a new cluster, we do not have to do any further work to make sure that our certificates are rotated or renewed before they expire. The only part that still needs manual work is creating new key pairs for our trust anchors from time to time. Unfortunately, this process cannot be automated yet.
Conclusion
A service mesh is a must in a microservices architecture to ensure a secure and trustworthy infrastructure. Linkerd is the new big player in the game, with huge potential and without adding much overhead to your application. Certificate rotation should be automated as much as possible, and in combination with Terraform this can be achieved. I hope this tutorial gave you a good overview of how to do it and brings your infrastructure closer to full cloud automation.
Thank you and stay tuned!