Automating Kubernetes Workflows with Kyverno's Mutating Webhooks

September 17, 2024

Hi there! I’m Rodrigo, a Staff Site Reliability Engineer at Miro. In this article, I’m excited to share how Miro’s Compute team automates complex Kubernetes workflows using Kyverno’s mutating webhooks. Whether you’re a seasoned Kubernetes administrator or just getting started with container orchestration, you’ll learn how Kyverno can streamline your operations, enhance security, and bring a new level of efficiency to your Kubernetes environments. Join me as we explore practical examples and best practices that you can apply to your own infrastructure. Let’s dive in and unlock the power of Kyverno together!

Infrastructure as code and configuration management tools like Terraform, AWS CloudFormation, Pulumi, CDK, and Ansible have revolutionized how engineers define, consume, and configure infrastructure. These tools generally share a common feature: they provide a declarative system, allowing users to define their desired state of affairs from both infrastructure and runtime perspectives.

Kubernetes aligns with this approach. Backed by a powerful, API-driven platform, Kubernetes was built to be declarative from its inception. Kubernetes manifests and template-based solutions built on top of it — such as Helm charts — allow developers to define exactly what they expect and require from the Kubernetes cluster runtime. These manifests, which are structured OpenAPI definitions of Kubernetes native or custom resources, are applied to the cluster. Controllers running in the cluster then take actions to consolidate the intended features, bringing the actual state as close as possible to the declared state in the manifest.
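
As a concrete illustration, the following minimal Deployment manifest declares a desired state of three nginx replicas; the Deployment and ReplicaSet controllers then work continuously to make the cluster match it. The name and image here are arbitrary examples:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3            # desired state: three running copies
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
```

If a pod crashes or a node disappears, the controllers recreate pods until the observed state matches the declared three replicas again.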

Introducing Kubernetes Dynamic Admission Control

While Kubernetes’ declarative nature provides a robust foundation for defining desired states, there are scenarios where dynamic changes or mutations to manifests during admission become necessary. This is where Kubernetes Dynamic Admission Control comes into play, allowing for real-time validations and modifications that can address complex requirements or enforce specific policies.

1. Admission Webhooks

It’s important to note that, in general, it’s preferable to maintain the desired state of a Kubernetes resource in a plain manifest file. This approach allows developers and platform engineers to easily trace the actual state in the cluster back to the defining code. Dynamic Admission Control should be used judiciously, primarily for complex scenarios where a centralized system is necessary to address intricate requirements.

Back in 2017, Kubernetes 1.7 introduced two features that remain among its most important to this day: Custom Resources and Dynamic Admission Control. They significantly expanded Kubernetes’ extensibility, allowing users to define custom objects and implement custom logic during the admission process.

The dynamic admission control system is composed of admission webhooks: HTTP callbacks that receive admission requests and act on them. Users can define two types of admission webhooks, validating and mutating. Mutating admission webhooks are invoked first and can modify objects sent to the API server to enforce custom behavior. After all object modifications are complete, and after the incoming object has been validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: "pod-policy.example.com"
webhooks:
  - name: "pod-policy.example.com"
    rules:
      - apiGroups:   [""]
        apiVersions: ["v1"]
        operations:  ["CREATE"]
        resources:   ["pods"]
        scope:       "Namespaced"
    clientConfig:
      service:
        namespace: "example-namespace"
        name: "example-service"
      caBundle: <CA_BUNDLE>
    admissionReviewVersions: ["v1"]
    sideEffects: None
    timeoutSeconds: 5

Kyverno: A Policy Engine Designed for Kubernetes

Kyverno logo

While Kubernetes’ built-in admission controllers provide a solid foundation for resource management, they often require custom coding and maintenance of separate webhook servers. This can be challenging for teams looking for a more integrated and user-friendly solution. Enter Kyverno, a policy engine designed specifically for Kubernetes that addresses these challenges head-on. Kyverno extends the concept of admission control, offering an accessible yet powerful way to define and enforce policies across your Kubernetes clusters: a generic policy engine whose policies are managed as Kubernetes resources and written without learning a new language.

What is Kyverno?

Kyverno is an open-source policy engine specifically designed for Kubernetes. It allows cluster administrators to manage policies as Kubernetes resources, so it seamlessly integrates with existing Kubernetes workflows and tools.

Key features of Kyverno include:

  1. Policy as code: Policies are defined using Kubernetes-style resources, allowing version control and management like any other Kubernetes object.
  2. No new language to learn: Policies are written in YAML, the same language used for Kubernetes manifests.
  3. Comprehensive policy actions: Kyverno can validate, mutate, generate, and clean up Kubernetes resources.
  4. Software supply chain security: It can verify image signatures and artifacts to enhance the security of your software supply chain.
  5. Admission control: Acts as both a validating and mutating admission controller.
  6. CLI support: Offers a CLI tool for testing policies and validating resources outside the cluster.

Anatomy of a Kyverno Policy

A Kyverno policy is defined as a custom resource in Kubernetes. Here’s a basic structure of a Kyverno policy:

apiVersion: kyverno.io/v1
kind: ClusterPolicy # or Policy (namespaced)
metadata:
  name: policy-name
spec:
  rules:
    - name: rule-name
      match:
        resources:
          kinds:
            - ResourceKind
      validate:
        message: "Validation message"
        pattern:
          spec:
            # Define the pattern to match against
    - name: add-userinfo
      match:
        any:
          - resources:
              kinds:
                - "ConfigMap"
              operations:
                - CREATE
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              kyverno.io/created-by: "{{ request.userInfo | to_string(@) }}"

2. Kyverno policies and rules

Key components of a Kyverno policy include:

  1. Metadata: Defines the name and optional namespace for the policy. A policy is cluster-wide when defined with the ClusterPolicy CRD, or namespaced when defined with the Policy CRD.
  2. Spec: Contains the actual policy definition.
  3. Rules: One or more rules that define when and how the policy is applied.
  4. Match/Exclude: Specifies which resources the policy applies to or excludes.
  5. Validate/Mutate/Generate: Defines the action to be taken (can have multiple).

Kyverno policies can perform various actions:

  • Validate: Check if resources meet specific criteria.
  • Mutate: Modify resources during creation or update.
  • Generate: Create additional resources based on triggering events.
  • Verify images: Check image signatures and other attributes.
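
To make the generate action concrete, here is a sketch modeled on Kyverno's well-known add-networkpolicy sample policy, which creates a default-deny NetworkPolicy in every newly created namespace:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-networkpolicy
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        # Create the NetworkPolicy inside the namespace that triggered the rule
        namespace: "{{ request.object.metadata.name }}"
        # Keep the generated resource in sync with this definition
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```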

By leveraging these powerful features, Kyverno enables organizations to enforce best practices, security policies, and compliance requirements across their Kubernetes clusters in a native and declarative manner.

Automating complex Kubernetes workflows with mutating policies

Now that we’re all on the same page about what Kyverno policies are, I’d like to showcase their real power and demonstrate how users can leverage them to automate complex Kubernetes workflows. While your specific use case may not exactly match the examples provided, my goal is to introduce you to the general concepts and techniques that you can apply in your own policy logic. You can later adapt and reuse these principles to craft policies tailored to your unique requirements.

All the examples discussed here are available in the https://github.com/rodrigorfk/k8s-kyverno-mutating-policies repository. I encourage you to explore the code and follow along as we dive into each example.

Let’s examine four powerful examples that illustrate the versatility and capability of Kyverno mutating policies.

By studying these policies, you’ll gain insights into advanced Kyverno techniques such as:

  • Using ConfigMaps for dynamic configuration
  • Integrating with external services and APIs
  • Leveraging Kubernetes labels and annotations for fine-grained control
  • Managing resource lifecycles across different Kubernetes objects

Remember, while these examples provide powerful starting points, always test thoroughly and adapt policies to your specific needs and security requirements before applying them in production environments.

1. KEDA Prometheus Scaler ServerAddress

This policy demonstrates how to centralize and dynamically manage configuration across multiple resources. Without this policy, updating the Prometheus server address requires modifying each individual ScaledObject. This can be time-consuming and error-prone, especially in environments with many microservices: https://github.com/rodrigorfk/k8s-kyverno-mutating-policies/tree/main/src/examples/02-keda-prometheus-address

Key features:

  • Centralizes the Prometheus server address for KEDA (Kubernetes Event-driven Autoscaling) ScaledObjects
  • Automatically updates ScaledObjects when the central configuration changes
  • Reduces operational overhead in environments with many microservices using KEDA

Use case: Ideal for organizations using KEDA extensively with Prometheus, where maintaining consistent configurations across multiple services is crucial.
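
As a simplified sketch of this idea (not the exact policy from the repository), a rule can read the server address from a ConfigMap via Kyverno's context feature and patch each Prometheus trigger with a foreach loop. The ConfigMap name, namespace, and key below are illustrative assumptions:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: keda-prometheus-address
spec:
  rules:
    - name: set-server-address
      match:
        any:
          - resources:
              kinds:
                - ScaledObject
      context:
        # Hypothetical ConfigMap holding the central Prometheus address
        - name: promCfg
          configMap:
            name: keda-prometheus-config
            namespace: keda
      mutate:
        foreach:
          - list: request.object.spec.triggers
            preconditions:
              all:
                # Only touch Prometheus triggers
                - key: "{{ element.type }}"
                  operator: Equals
                  value: prometheus
            patchesJson6902: |-
              - op: replace
                path: /spec/triggers/{{ elementIndex }}/metadata/serverAddress
                value: "{{ promCfg.data.serverAddress }}"
```

When the ConfigMap changes, new and updated ScaledObjects automatically pick up the new address without any per-service edits.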

2. Default hardware architecture for pods

This policy addresses challenges in heterogeneous Kubernetes clusters with mixed hardware architectures: https://github.com/rodrigorfk/k8s-kyverno-mutating-policies/tree/main/src/examples/03-pod-hardware-arch

Key features:

  • Automatically assigns a default architecture to newly created pods when not specified
  • Allows setting default architecture per namespace
  • Prevents scheduling issues in clusters with mixed node architectures (e.g., amd64 and arm64)

Use case: Perfect for organizations managing clusters with diverse hardware, ensuring workloads are scheduled on compatible nodes.
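
A minimal sketch of such a rule can rely on Kyverno's add-if-not-present anchor, +(...), so that an explicit nodeSelector in the pod spec always wins; the amd64 default below is an assumption:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-pod-arch
spec:
  rules:
    - name: set-default-arch
      match:
        any:
          - resources:
              kinds:
                - Pod
              operations:
                - CREATE
      mutate:
        patchStrategicMerge:
          spec:
            nodeSelector:
              # +() adds the key only when it is not already set
              +(kubernetes.io/arch): amd64
```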

3. Public registry image mutation

This policy enhances security and performance by redirecting public image pulls to private registries: https://github.com/rodrigorfk/k8s-kyverno-mutating-policies/tree/main/src/examples/04-image-registry

Key features:

  • Replaces public image registries with private alternatives
  • Integrates with Amazon ECR, handling cross-region scenarios
  • Adds necessary imagePullSecrets automatically
  • Uses a ConfigMap for flexible registry mapping

Advanced ECR Integration:

  • Leverages Amazon ECR’s image replication and pull-through cache capabilities
  • Optimizes image pulling by redirecting to the same-region ECR repository
  • Significantly reduces pod startup time by pulling images from the same region
  • Utilizes ECR’s caching mechanisms for faster subsequent pulls
  • Eliminates egress costs associated with pulling images from different regions
  • Reduces bandwidth usage and associated costs for repetitive image pulls

Use case: Excellent for organizations wanting to control and optimize container image sources, particularly those using cloud provider-specific registries like Amazon ECR.

By implementing this policy, organizations can seamlessly transition from using public registries to a more secure, efficient, and cost-effective private registry setup, all without requiring changes to existing deployment manifests or developer workflows.
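
A stripped-down sketch of the registry rewrite (the real policy additionally consults a ConfigMap mapping and handles imagePullSecrets) can use a foreach over containers and regex_replace_all to swap the registry host; the ECR hostname below is a placeholder:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: redirect-public-images
spec:
  rules:
    - name: rewrite-registry
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        foreach:
          - list: request.object.spec.containers
            patchStrategicMerge:
              spec:
                containers:
                  - name: "{{ element.name }}"
                    # Replace the registry host (everything before the first
                    # slash) with a placeholder private ECR registry
                    image: "{{ regex_replace_all('^[^/]+', '{{ element.image }}', '123456789012.dkr.ecr.us-east-1.amazonaws.com') }}"
```

Note that a production policy would also need to handle images without an explicit registry (e.g., bare nginx), which this sketch deliberately glosses over.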

4. Sidecar injection

While this example is primarily for demonstration and not intended for production use, it showcases Kyverno’s powerful sidecar injection capabilities: https://github.com/rodrigorfk/k8s-kyverno-mutating-policies/tree/main/src/examples/05-sidecar-inject

Key features:

  • Automatically injects an Nginx sidecar into selected pods
  • Generates and manages TLS certificates for the Nginx sidecar using cert-manager
  • Demonstrates integration with external services (cert-manager) for enhanced functionality
  • Showcases automatic resource cleanup through ownerReferences

Use case: While the Nginx example is for demonstration, the concepts can be applied to inject various sidecars (e.g., logging, monitoring, or security tools) into your deployments.
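
The injection itself boils down to a patchStrategicMerge that appends a container when an opt-in label is present. The label key, image, and port here are hypothetical, and the real example additionally wires in cert-manager TLS material:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-nginx-sidecar
spec:
  rules:
    - name: add-sidecar
      match:
        any:
          - resources:
              kinds:
                - Pod
      preconditions:
        all:
          # Hypothetical opt-in label on the pod
          - key: "{{ request.object.metadata.labels.\"sidecar-inject\" || '' }}"
            operator: Equals
            value: "true"
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # Appended alongside the pod's existing containers
              - name: nginx-sidecar
                image: nginx:1.27
                ports:
                  - containerPort: 8443
```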

Closing remarks

I hope you’ve found this journey through Kyverno’s capabilities as exciting and enlightening as I have in writing about it. Before we conclude, I’d like to emphasize that with great power comes great responsibility. Here are some crucial recommendations for running Kyverno in production environments:

GitOps considerations

When using Kyverno in a GitOps workflow, be aware of potential conflicts between mutating policies and GitOps controllers:

  • Mutated resources may diverge from the state defined in Git, causing reconciliation loops
  • This can lead to increased cluster churn and resource consumption

To mitigate these issues:

  • With Flux: No special configuration needed as Server-Side Apply (SSA) is enabled by default
  • With ArgoCD (version 2.10+): Enable server-side diff using annotations

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/compare-options: ServerSideDiff=true,IncludeMutationWebhook=true

High availability and resource management

  • Kyverno is a privileged component running alongside the Kubernetes API server.
  • Ensure deployment in high-availability mode with proper resource allocation for each replica based on your cluster’s load.
  • Regularly monitor and adjust resources to maintain optimal performance as your cluster grows.

Governance and security

  • Implement strict governance around the creation and modification of ClusterPolicy resources.
  • Use Kubernetes RBAC to limit who can create or modify policies.
  • Consider implementing a review process for new policies before they’re applied to production clusters.
  • Regularly audit your policies to ensure they align with your current security and operational needs.

Performance considerations

  • Avoid creating overly broad policies that select all resource kinds (e.g., using kinds: ["*"] in your match conditions).
  • Such broad policies can significantly increase load on both the Kubernetes API Server and Kyverno components.
    • Instead, be specific about which resources each policy should affect.
  • Use background scans judiciously, as they can impact cluster performance.

Testing and upgrade strategy

  • Kyverno evolves rapidly, with frequent feature additions and releases.
  • Implement comprehensive integration tests that verify your policy behaviors.
  • Test not only with the Kyverno CLI but also in real cluster environments.
  • Use "kubectl apply --dry-run=server -o yaml" to test policy effects without making changes.
  • Maintain a robust quality assurance process for your policies to prevent surprises during Kyverno upgrades.

Monitoring and alerting

  • Set up monitoring for Kyverno components and policy activities.
  • Implement alerts for policy violations and unexpected behaviors.
  • Regularly review Kyverno logs to catch any issues early.

Documentation and knowledge sharing

  • Maintain clear documentation for all your custom policies.
  • Ensure your team understands the purpose and impact of each policy.
  • Regularly review and update policies to align with evolving best practices and your organization’s needs.

Gradual rollout

  • When implementing new policies or major changes, consider a phased approach.
  • Start with audit mode before enforcing to understand the impact.
  • Use namespaces or labels to gradually roll out policies to subsets of your workloads.
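
Concretely, an audit-first rollout means setting the policy's failure action to Audit so violations are only reported, then flipping it to Enforce once the impact is clear:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-policy
spec:
  # Report violations without blocking requests; switch to Enforce later
  validationFailureAction: Audit
```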

By adhering to these best practices, you can harness the full power of Kyverno while maintaining a stable, secure, and efficient Kubernetes environment. Remember, Kyverno is a powerful tool that, when used responsibly, can significantly enhance your Kubernetes automation and governance capabilities.

As you continue your Kyverno journey, stay curious, keep experimenting, and always prioritize the stability and security of your Kubernetes clusters. Happy automating!

Interested in joining the Miro Engineering team? Check out our open positions.

 

______________________________________

 

Image Credits:

  1. Admission Webhooks
  2. Kyverno policies and rules
