Securing Kubernetes with Network Policies and OPA Gatekeeper

Why Kubernetes Is Insecure by Default

Out of the box, every Kubernetes cluster operates with a flat network. Every pod can reach every other pod — across namespaces, across workloads, across trust boundaries. There are no firewalls between your frontend and your database. There is no policy engine stopping an engineer from deploying a container running as root with an image pulled from Docker Hub. The API server will happily accept whatever you send it.

This is not a bug; it is a design choice that favours operability over security. Kubernetes gives you the primitives to lock things down, but it does not lock anything down for you. If you run a production cluster without NetworkPolicies and an admission controller, you are running with the doors wide open.

In this guide we will cover two complementary layers of defence. First, NetworkPolicies — the native Kubernetes resource that controls L3/L4 traffic between pods. Second, OPA Gatekeeper — a policy engine that validates every resource before the API server persists it. Together they give you network segmentation and admission control, the two pillars of a hardened cluster.

Prerequisites: You should have a working Kubernetes cluster (1.25+) with a CNI that supports NetworkPolicies (Calico, Cilium, or Weave Net). For the Gatekeeper sections you will need Helm or the ability to apply raw manifests.

The Default-Deny Foundation

The single most impactful security change you can make to a Kubernetes namespace is to apply a default-deny NetworkPolicy. The philosophy is simple: block everything, then explicitly allow only the traffic your application actually needs. This is the network equivalent of the principle of least privilege.

A default-deny policy selects all pods in the namespace (using an empty podSelector) and declares both Ingress and Egress policy types without providing any allow rules. The result: all inbound and outbound traffic for every pod in that namespace is dropped unless another NetworkPolicy explicitly permits it.

default-deny.yaml

# default-deny.yaml
# Drops ALL ingress and egress traffic for every pod in the namespace.
# Apply this first, then layer allow-rules on top.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: backend
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules — everything is denied.

Apply it with kubectl apply -f default-deny.yaml. Immediately, every pod in the backend namespace loses the ability to send or receive traffic. This will break things — and that is the point. You now have a known-secure baseline from which to add explicit allowances.

Tip: Apply default-deny to every namespace you control, including default. Many teams automate this with a namespace-provisioning controller or a GitOps pipeline that stamps the policy onto every new namespace.

Allowing Specific Traffic

With default-deny in place, you need to punch precise holes for legitimate traffic. NetworkPolicies are additive — each new policy adds allow rules on top of the default deny. They never subtract permissions.

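The following example shows a common three-tier pattern: frontend pods in the frontend namespace are allowed to reach the API pods in the backend namespace on port 8080, and the API pods are allowed to open connections to PostgreSQL in the same namespace on port 5432. Note that the second policy covers only the API pods' egress. Because default-deny also blocks ingress, the PostgreSQL pods need a matching ingress rule that admits the API pods before the connection succeeds.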

allow-frontend-to-backend.yaml

# allow-frontend-to-backend.yaml
# Allows pods labelled app=api in the backend namespace
# to receive traffic from pods labelled app=frontend
# in the frontend namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: frontend
          podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080

allow-api-to-postgres.yaml

# allow-api-to-postgres.yaml
# Allows the API pods to open connections to PostgreSQL.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432

Notice the pattern: every rule names the exact labels on both sides and the exact port. Resist the temptation to use broad selectors or to open entire port ranges. Each rule should read like a sentence: “Allow frontend pods to talk to API pods on port 8080.”
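One half of the API-to-PostgreSQL path is still missing under default-deny: the egress rule above lets the API pods send traffic, but the PostgreSQL pods also need an ingress rule before they will accept it. A minimal sketch of that counterpart follows. It reuses the app: api and app: postgres labels from the examples above; the file name and the policy name allow-postgres-from-api are assumptions chosen for illustration.

```yaml
# allow-postgres-from-api.yaml (hypothetical file name)
# Ingress counterpart to allow-api-to-postgres: lets PostgreSQL pods
# accept connections from API pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-postgres-from-api   # assumed name
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:            # no namespaceSelector: same namespace only
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
```

The same reasoning applies on the frontend side: if the frontend namespace also runs default-deny, its pods need an egress rule toward the API pods on port 8080.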

Do Not Forget DNS

This is the single most common mistake teams make when rolling out default-deny policies. When you block all egress, you also block DNS resolution. Your pods can no longer resolve service names to cluster IPs because they cannot reach kube-dns (CoreDNS) on UDP port 53. Every service call fails with a name-resolution error, not a connection timeout — which makes it look like a different problem entirely.

You must add an explicit egress rule that allows all pods to reach kube-dns in the kube-system namespace on port 53 over both UDP and TCP (TCP is used for responses larger than 512 bytes).

allow-dns.yaml

# allow-dns.yaml
# Permits every pod in the namespace to resolve DNS
# via CoreDNS in kube-system. Apply alongside default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: backend
spec:
  podSelector: {}          # applies to all pods
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Apply this in every namespace that has a default-deny policy. We recommend templating it into your namespace provisioning pipeline so it is never forgotten.

Namespace Isolation for Multi-Environment Clusters

If you run dev, staging, and production workloads on the same cluster (or even on separate clusters that share a mesh), namespace labels and selectors let you enforce hard boundaries between environments. The key is to label your namespaces consistently — for example, environment: production, environment: staging, and environment: dev — and then write policies that reference those labels.
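As a small illustration of the labelling convention, a namespace manifest might look like the sketch below. The environment key is the one suggested above; nothing else about the namespace is assumed. The kubernetes.io/metadata.name label used in the earlier policies is set automatically by Kubernetes and does not need to be declared.

```yaml
# namespace-production.yaml (hypothetical file name)
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production   # referenced by namespaceSelector in policies
```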

isolate-production.yaml

# isolate-production.yaml
# Ensures pods in production can only receive traffic
# from other production namespaces — never from dev or staging.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-ingress-to-production
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production

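With this policy, a pod in dev cannot open a connection to any pod in production, even if it knows the target's pod IP or Service cluster IP. This is especially important when developers have broad RBAC permissions in lower environments: you do not want a misconfigured service in dev accidentally reaching your production database.

Combine environment isolation with the default-deny and service-specific rules above and you have a layered network posture that mirrors traditional network zoning, but defined entirely in version-controlled YAML. Keep in mind that policies are additive: this rule admits traffic from any pod in a production-labelled namespace to every pod in the production namespace, on all ports. Where you need tighter control, put the environment selector inside the service-specific rules instead of applying a namespace-wide allow.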

Introducing OPA Gatekeeper

NetworkPolicies control traffic after pods are running. But what about the resources themselves? What stops an engineer from deploying a container that runs as root, pulls from an untrusted registry, or requests no resource limits? The answer is an admission controller.

Kubernetes admission controllers run inside the API server's request path, after authentication and authorization and before the object is persisted to etcd. Every CREATE or UPDATE request passes through them. OPA Gatekeeper is a validating admission webhook backed by the Open Policy Agent engine. You define policies as two Kubernetes custom resources:

ConstraintTemplate — the reusable policy logic, written in Rego.
Constraint — an instance of a template, scoped to specific resources, namespaces, or labels.

This separation means platform teams write templates once, and application teams (or GitOps pipelines) bind constraints to their namespaces with custom parameters. Gatekeeper complements NetworkPolicies: constraints control what gets deployed, while NetworkPolicies control what talks to what.

Install Gatekeeper with Helm:

install-gatekeeper.sh

# Install Gatekeeper via Helm.
# No --version is given, so Helm installs the latest chart;
# add --version to pin a specific release such as the 3.16 line.
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --set replicas=3 \
  --set audit.replicas=1 \
  --set audit.logLevel=INFO

Your First Constraint: Block Containers Running as Root

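Running containers as UID 0 (root) is one of the most common and dangerous misconfigurations in Kubernetes. A container breakout from a root process gives the attacker root on the node. The following ConstraintTemplate and Constraint reject any pod in which a container or init container does not set runAsNonRoot: true in its own securityContext. The check is container-level only, so a pod that sets runAsNonRoot solely in the pod-level securityContext is rejected as well.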

template-disallow-root.yaml

# template-disallow-root.yaml
# ConstraintTemplate that rejects pods running as root.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowroot
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowroot

        # Deny if any container lacks runAsNonRoot: true
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf(
            "Container '%v' must set securityContext.runAsNonRoot to true.",
            [container.name]
          )
        }

        # Also check init containers
        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf(
            "Init container '%v' must set securityContext.runAsNonRoot to true.",
            [container.name]
          )
        }

constraint-disallow-root.yaml

# constraint-disallow-root.yaml
# Applies the template to all pods in non-system namespaces.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowRoot
metadata:
  name: disallow-root-containers
spec:
  enforcementAction: deny   # change to "dryrun" for audit mode
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
      - cert-manager

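Once applied, any request that creates a pod without runAsNonRoot: true on every container is rejected with an error message naming the offending container. Because the constraint matches Pods, a kubectl apply of a bare Pod fails immediately, whereas a Deployment or Helm release is accepted and the rejection surfaces when its ReplicaSet tries to create pods; look for it in the ReplicaSet's events.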
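For comparison, here is a sketch of a pod that should pass the constraint above. The pod name, image path, and UID are placeholders, not values from a real deployment. The explicit runAsUser is there because the kubelet refuses to start a container with runAsNonRoot: true if the image would otherwise run as UID 0.

```yaml
# compliant-pod.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
metadata:
  name: example-app               # placeholder name
  namespace: backend
spec:
  containers:
    - name: app
      # Placeholder image path; substitute your own registry and tag.
      image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/example-app:1.0.0
      securityContext:
        runAsNonRoot: true        # the field the constraint checks
        runAsUser: 10001          # assumed non-root UID
```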

Blocking Untrusted Image Registries

Supply-chain attacks targeting container images are on the rise. Restricting which registries your cluster pulls from is a critical control. The following template and constraint only allow images from your organisation's ECR registry and from registry.k8s.io.

template-allowed-repos.yaml

# template-allowed-repos.yaml
# ConstraintTemplate that restricts container image sources.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
              description: "List of allowed image registry prefixes."
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not image_allowed(container.image)
          msg := sprintf(
            "Container '%v' uses image '%v' which is not from an allowed registry. Allowed prefixes: %v",
            [container.name, container.image, input.parameters.repos]
          )
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not image_allowed(container.image)
          msg := sprintf(
            "Init container '%v' uses image '%v' which is not from an allowed registry. Allowed prefixes: %v",
            [container.name, container.image, input.parameters.repos]
          )
        }

        image_allowed(image) {
          repo := input.parameters.repos[_]
          startswith(image, repo)
        }

constraint-allowed-repos.yaml

# constraint-allowed-repos.yaml
# Only allow images from your ECR account and official Kubernetes images.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: require-trusted-registries
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    repos:
      - "123456789012.dkr.ecr.us-east-1.amazonaws.com/"
      - "registry.k8s.io/"

Replace the ECR account ID with your own. You can add multiple prefixes — internal Harbor registries, GitHub Container Registry for specific orgs, or gcr.io for GKE system images. The key is that nothing from Docker Hub or an unknown registry makes it into your cluster without an explicit exception.
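Because the template matches with startswith, the exact shape of each prefix matters. The fragment below sketches a parameters block with one extra entry; ghcr.io/example-org/ is a hypothetical organisation used only to show the pattern.

```yaml
# Fragment of a K8sAllowedRepos constraint spec
parameters:
  repos:
    # The trailing slash anchors the match at a registry or org boundary.
    - "123456789012.dkr.ecr.us-east-1.amazonaws.com/"
    - "registry.k8s.io/"
    - "ghcr.io/example-org/"      # hypothetical org
    # Without the trailing slash, "ghcr.io/example-org" would also match
    # "ghcr.io/example-org-evil/image", so always end prefixes with "/".
```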

The Audit-Mode-First Workflow

Deploying Gatekeeper constraints straight into deny mode on a running cluster is a recipe for an incident. Existing workloads that violate the new policy will continue running (Gatekeeper enforces only at admission time; its audit reports existing violations but does not evict anything), but any rollout, scale event, or node migration that recreates a pod will fail. You will page your on-call engineer at 3 AM.

The safe approach is a three-phase rollout:

Phase 1: Deploy in Dry-Run Mode

Set enforcementAction: dryrun on your constraint. Gatekeeper will evaluate every admission request against the policy but will not reject anything. Violations are recorded on the constraint object and emitted as audit results.
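As an illustration, the root-container constraint from earlier would look like this in its dry-run form. Only the enforcementAction value differs from the enforced version; the file name is an assumption.

```yaml
# constraint-disallow-root-dryrun.yaml (hypothetical file name)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowRoot
metadata:
  name: disallow-root-containers
spec:
  enforcementAction: dryrun   # record violations, reject nothing
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
      - cert-manager
```

Keeping the same metadata.name means Phase 3 becomes a one-line change to the same object.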

Phase 2: Review Violations

Run kubectl get constraints and inspect the status.violations field on each constraint. This tells you exactly which existing resources violate the policy, in which namespace, and why. Share this list with application teams and give them a remediation window.

check-violations.sh

# List all constraints and their violation counts
kubectl get constraints

# Inspect violations for a specific constraint
kubectl get K8sDisallowRoot disallow-root-containers -o yaml | \
  yq '.status.violations'

# Example output:
# - enforcementAction: dryrun
#   kind: Pod
#   message: "Container 'app' must set securityContext.runAsNonRoot to true."
#   name: legacy-api-7f8b9c6d4-xk2lm
#   namespace: backend

Phase 3: Flip to Enforce

Once all existing violations are resolved, change enforcementAction from dryrun to deny. From this point on, any new resource that violates the policy is rejected at the API server. Commit this change through your GitOps pipeline so it is tracked and reversible.

Why this matters: We have seen teams skip audit mode and immediately break their CI/CD pipelines, block Horizontal Pod Autoscaler scale-ups, and prevent node-drain rescheduling during maintenance windows. The audit phase costs you a few days; skipping it can cost you an outage.

Monitoring and Observability

Policies are only as good as your ability to observe them. Without monitoring, dropped packets look like application bugs and rejected deployments look like platform failures.

NetworkPolicy Flow Logs

If you use Calico, enable flow logs in your FelixConfiguration to see every allowed and denied connection with source pod, destination pod, port, and action. Ship these to your SIEM or to Elasticsearch for analysis. If you use Cilium, Hubble provides a real-time flow visibility layer with its own CLI and UI. Run hubble observe --verdict DROPPED to see denied traffic in real time.

Gatekeeper Metrics

Gatekeeper exposes Prometheus metrics on port 8888 by default. The key metrics to alert on are:

gatekeeper_violations: the number of violations found by the most recent audit run, broken down by enforcement action (gauge). Set an alert when this rises above zero for the deny action.
gatekeeper_validation_request_duration_seconds: validating webhook latency. If this exceeds your SLO, Gatekeeper is adding unacceptable latency to the API server.
gatekeeper_validation_request_count: total admission requests processed. A sudden drop may indicate Gatekeeper is down and the API server is failing open.

For the kubectl workflow, you can also run kubectl get constraints -o wide in a cron job or CI pipeline and fail the build if any constraint shows violations above a threshold.
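An alert on audit violations could be sketched as a Prometheus rule file like the one below. The alert name, the 15-minute window, and the severity label are arbitrary choices, and the enforcement_action label is assumed to be present on gatekeeper_violations in your Gatekeeper version; check the metric in your own Prometheus before relying on it.

```yaml
# gatekeeper-alerts.yaml (hypothetical file name)
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperDenyViolations      # assumed alert name
        # Fires when audit keeps finding violations of enforced constraints.
        expr: sum(gatekeeper_violations{enforcement_action="deny"}) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Audit found resources violating enforced Gatekeeper constraints."
```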

Common Pitfalls

After helping dozens of teams roll out these controls, we see the same mistakes repeatedly. Here is what to watch for:

Forgetting DNS Egress

We covered this above, but it bears repeating. When you enable default-deny egress and forget the DNS allow rule, every service in the namespace breaks instantly. The failure mode — DNS resolution errors — does not obviously point to a NetworkPolicy problem. Always add the DNS rule in the same commit as the default-deny.

Blocking Kubelet Health Checks

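If your pods define livenessProbe or readinessProbe with httpGet, the kubelet on the node opens a connection to the pod. The Kubernetes NetworkPolicy documentation states that a pod cannot block traffic from its own node, so with a conforming CNI these probes keep working under default-deny ingress. Verify this on your CNI before rollout anyway: if probe traffic is dropped, the probe fails, Kubernetes restarts the pod, the probe fails again, and you enter a crash loop. In that case you need an ingress rule that allows traffic from the node CIDR or the kubelet IP on your probe port.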
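If probes do fail under default-deny on your CNI, an ingress rule along the following lines is one way to admit them. This is a sketch under assumptions: 10.0.0.0/16 stands in for your node CIDR, port 8080 for your probe port, and the policy name is made up for the example.

```yaml
# allow-kubelet-probes.yaml (hypothetical file name)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kubelet-probes      # assumed name
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16     # assumed node CIDR; replace with yours
      ports:
        - protocol: TCP
          port: 8080              # assumed probe port
```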

Overly Broad Label Selectors

A namespaceSelector: {} with no match labels selects every namespace in the cluster, including kube-system. This is the NetworkPolicy equivalent of 0.0.0.0/0 — it defeats the purpose of segmentation. Always use explicit label selectors and audit your policies with kubectl describe networkpolicy to verify what they actually match.
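A related way selectors become broader than intended is YAML list structure. The fragment below contrasts two ingress rules that look almost identical; the labels are the ones used earlier in this guide. In the first rule, both selectors sit in one list item and must both match. In the second, they are separate list items and either one is enough.

```yaml
# Fragment of a NetworkPolicy spec
ingress:
  # Narrow: pods labelled app=frontend AND located in the frontend namespace.
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: frontend
        podSelector:
          matchLabels:
            app: frontend
  # Broad: ANY pod in the frontend namespace, OR any pod labelled
  # app=frontend in the policy's own namespace.
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: frontend
      - podSelector:
          matchLabels:
            app: frontend
```

The only difference is the extra dash before podSelector, which is easy to miss in review.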

Not Testing in a Non-Production Cluster

Both NetworkPolicies and Gatekeeper constraints should be tested in a staging cluster that mirrors production before they reach your live environment. Use a tool like kubectl-np-viewer to visualise your NetworkPolicies and confirm that the allow/deny matrix matches your architecture diagrams.

Ignoring Gatekeeper Failover Behaviour

By default, if the Gatekeeper webhook is unreachable, the API server fails open: it allows all requests. This means a Gatekeeper outage silently disables all your policies. If your threat model requires fail-closed behaviour, set the webhook's failurePolicy to Fail. Be aware that this means a Gatekeeper crash will block all deployments until it recovers. Run at least three replicas behind the webhook service.
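With the Helm install shown earlier, fail-closed behaviour could be expressed as a values file along these lines. Treat the key validatingWebhookFailurePolicy as an assumption about the chart and confirm it against the values.yaml of the chart version you install; replicas is the same value set on the command line above.

```yaml
# values-gatekeeper.yaml (hypothetical file name)
replicas: 3                            # webhook replicas, as in the install command
validatingWebhookFailurePolicy: Fail   # assumed chart key; verify before use
```

Pass it with -f values-gatekeeper.yaml on helm install or helm upgrade, and test a simulated Gatekeeper outage in staging before enabling it in production.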

Putting It All Together

Kubernetes security is not a single tool or a single layer. It is a practice. NetworkPolicies give you network segmentation — controlling which pods can talk to which other pods, on which ports, across which namespaces. OPA Gatekeeper gives you admission control — ensuring that only compliant, trusted, well-configured workloads make it onto your cluster in the first place.

Start with default-deny NetworkPolicies and the DNS allow rule. Layer on service-specific allow rules that mirror your architecture. Then deploy Gatekeeper in audit mode, review violations, remediate, and flip to enforce. Instrument everything with flow logs and Prometheus metrics. Iterate.

The YAML examples in this guide are meant as working starting points. Copy them, adapt them to your labels and namespaces, test them in a non-production cluster, and commit them to your GitOps repository. A hardened cluster is not a weekend project; it is an ongoing discipline. But the first step is always the same: apply the default-deny policy and see what breaks.

Need help hardening your Kubernetes clusters? NubisCore specialises in platform engineering, cloud security, and Kubernetes operations. We help teams implement NetworkPolicies, OPA Gatekeeper, and end-to-end cluster hardening — from initial audit through production enforcement. Get in touch to discuss your security posture.