Service Mesh - Service Discovery - Ingress/API Gateway

Service Mesh vs Ingress/API Gateway

Tasks

They handle different traffic:

  • Service Mesh = Pod-to-Pod (EAST-WEST) inside cluster
  • Ingress = External-to-Pod (NORTH-SOUTH) into cluster

Visual Explanation

═══════════════════════════════════════════════════════════════════════════
                    NORTH-SOUTH vs EAST-WEST TRAFFIC
═══════════════════════════════════════════════════════════════════════════

                         External Users
                         (Internet/Corp Network)
                                 │
                                 │ NORTH-SOUTH
                                 │ (Ingress handles this)
                                 │
                    ┌────────────▼────────────┐
                    │   KUBERNETES CLUSTER    │
                    │                         │
                    │    ┌──────────────┐     │
                    │    │   Ingress    │     │
                    │    │  Controller  │     │
                    │    └──────┬───────┘     │
                    │           │             │
                    │  ┌────────▼────────┐    │
                    │  │  Frontend Pod   │    │
                    │  └────────┬────────┘    │
                    │           │             │
                    │           │ EAST-WEST   │
                    │           │ (Service Mesh handles this)
                    │           │             │
                    │  ┌────────▼────────┐    │
                    │  │   API Pod       │────┼─► Database Pod
                    │  └────────┬────────┘    │
                    │           │             │
                    │  ┌────────▼────────┐    │
                    │  │   Auth Pod      │────┼─► Cache Pod
                    │  └─────────────────┘    │
                    │                         │
                    └─────────────────────────┘

What Each Does

Ingress/API Gateway (NORTH-SOUTH)

Purpose: Get traffic INTO the cluster

Handles:

✅ External → Cluster routing
✅ TLS termination (HTTPS)
✅ Hostname routing (api.mci.local)
✅ Path routing (/users, /orders)
✅ External load balancing
✅ Public IP exposure

Example Flow:

User Browser (Internet)
    ↓
https://api.mci.local/users
    ↓
LoadBalancer IP (192.168.228.200)
    ↓
Ingress Controller
    ↓
Routes to "api-service"
    ↓
API Pod receives request

Service Mesh (EAST-WEST)

Purpose: Manage traffic BETWEEN pods inside the cluster

Handles:

✅ Pod → Pod communication
✅ mTLS (encrypted pod-to-pod)
✅ Service-to-service policies
✅ Retries, timeouts, circuit breakers
✅ Traffic splitting (A/B testing)
✅ Observability between services
✅ Service discovery

Example Flow:

API Pod wants to call Auth Service
    ↓
Service Mesh intercepts
    ↓
Checks: Is API pod allowed to call Auth?
    ↓
Encrypts with mTLS
    ↓
Load balances across Auth pods
    ↓
Tracks latency, errors
    ↓
Auth Pod receives request
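
The retry behavior the mesh applies in this flow can be sketched in application terms. A minimal model of retry-with-exponential-backoff, which a sidecar performs transparently; `flaky_auth_call` is a made-up stand-in for the real RPC:

```python
import time

def call_with_retries(call, attempts: int = 3, base_delay: float = 0.1):
    """Retry a failing call with exponential backoff, as a mesh sidecar
    would do on the app's behalf. `call` is any zero-argument callable."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError as err:  # retry only transient failures
            last_error = err
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise last_error

# Illustrative flaky backend: fails twice, then succeeds.
state = {"calls": 0}
def flaky_auth_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("auth pod unreachable")
    return "200 OK"
```

The point of the mesh is that the API Pod's code never contains this loop; the sidecar adds it for every outbound call according to policy.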

Complete Architecture

═══════════════════════════════════════════════════════════════════════════
              BOTH INGRESS AND SERVICE MESH TOGETHER
═══════════════════════════════════════════════════════════════════════════

External User
    │
    │ 1. NORTH-SOUTH (Ingress)
    ▼
┌────────────────────────────────────────────────────────────────┐
│  Ingress Controller (Cilium Envoy)                             │
│  • TLS termination                                             │
│  • Hostname routing                                            │
│  • External authentication                                     │
└────────────────┬───────────────────────────────────────────────┘
                 │
                 ▼
            Frontend Pod
                 │
                 │ 2. EAST-WEST (Service Mesh)
                 │    • mTLS encryption
                 │    • L7 policies
                 │    • Retry logic
                 │    • Circuit breaker
                 ▼
            API Pod ──────► Auth Pod
                 │              │
                 │              │ 3. EAST-WEST
                 │              ▼
                 │         Database Pod
                 │
                 │ 4. EAST-WEST
                 ▼
            Payment Pod ───► Kafka Pod
                 │
                 │ 5. EAST-WEST
                 ▼
            Notification Pod

Why You Need Both

Scenario Without Service Mesh (Only Ingress)

✅ External traffic reaches cluster
❌ No encryption between pods
❌ No pod-to-pod policies
❌ No retry logic between services
❌ No circuit breakers
❌ Limited observability of internal traffic
❌ No canary deployments

Scenario Without Ingress (Only Service Mesh)

❌ External traffic can't reach cluster
❌ No TLS termination for external clients
❌ No hostname-based routing from outside
✅ Internal pod-to-pod works great

Cilium's Unique Position

Cilium does BOTH Ingress AND Service Mesh!

┌────────────────────────────────────────────┐
│               CILIUM STACK                 │
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Ingress Controller (Cilium Envoy)   │  │
│  │  Handles: NORTH-SOUTH                │  │
│  └──────────────────────────────────────┘  │
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Service Mesh (Cilium + Envoy)       │  │
│  │  Handles: EAST-WEST                  │  │
│  └──────────────────────────────────────┘  │
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Kube-proxy Replacement (eBPF)       │  │
│  │  Handles: Load balancing             │  │
│  └──────────────────────────────────────┘  │
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Network Policies (eBPF + Envoy)     │  │
│  │  Handles: L3/L4/L7 security          │  │
│  └──────────────────────────────────────┘  │
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Observability (Hubble)              │  │
│  │  Handles: All traffic visibility     │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘

This is why Cilium is powerful - it's an all-in-one solution.

Feature Comparison

Feature             Ingress Only     Service Mesh Only   Both (Cilium)
External access     ✅               ❌                  ✅
TLS termination     ✅               ❌                  ✅
Hostname routing    ✅               ❌                  ✅
Pod-to-pod mTLS     ❌               ✅                  ✅
Retries/timeouts    ❌               ✅                  ✅
Circuit breakers    ❌               ✅                  ✅
Traffic splitting   ❌               ✅                  ✅
L7 observability    ⚠️ (edge only)   ✅ (internal)       ✅ (everywhere)
Network policies    ⚠️ (limited)     ✅                  ✅

Real-World Example

E-commerce Application

User Request: https://shop.mci.local/checkout
    │
    │ ┌─────────────────────────────────────────┐
    │ │ INGRESS NEEDED                          │
    │ │ • Accept HTTPS from internet            │
    │ │ • Route /checkout to frontend           │
    │ │ • TLS termination                       │
    │ └─────────────────────────────────────────┘
    ▼
Frontend Pod
    │
    │ Makes API call: POST /api/payment
    │
    │ ┌─────────────────────────────────────────┐
    │ │ SERVICE MESH NEEDED                     │
    │ │ • Encrypt traffic (mTLS)                │
    │ │ • Retry if payment service down         │
    │ │ • Circuit breaker if failing            │
    │ │ • L7 policy: only POST allowed          │
    │ └─────────────────────────────────────────┘
    ▼
Payment Service Pod
    │
    │ Calls: GET /api/inventory
    │
    │ ┌─────────────────────────────────────────┐
    │ │ SERVICE MESH NEEDED                     │
    │ │ • Check authorization                   │
    │ │ • Track latency                         │
    │ │ • Load balance across inventory pods    │
    │ └─────────────────────────────────────────┘
    ▼
Inventory Service Pod

You need both:

  • Ingress to get the user's request into the cluster
  • Service Mesh to manage the internal service calls

When You DON'T Need Service Mesh

Simple applications:

User → Ingress → Single Monolith Pod → Database

If you have:

  • Only 1-3 services
  • No pod-to-pod communication
  • No need for mTLS
  • No complex retry logic
  • Simple architecture

Then skip Service Mesh, just use Ingress.

When You DON'T Need Ingress

Internal-only cluster:

No external access needed
Pods only talk to each other
All traffic is EAST-WEST

Examples:

  • Data processing cluster (Kafka, Spark)
  • Batch job cluster
  • ML training cluster

Then skip Ingress, just use Service Mesh.

Summary Table

Traffic Type         Component      Purpose                   Do You Need It?
External → Cluster   Ingress        Get traffic in            ✅ YES (unless internal-only)
Pod → Pod            Service Mesh   Manage internal traffic   ⚠️ Only if microservices
External APIs        API Gateway    Advanced API features     ❌ Not for internal tools

Bottom Line

Service Mesh ≠ Replacement for Ingress

They're complementary:

  • Ingress = Front door to your cluster
  • Service Mesh = Internal traffic management

For MCI:

  • ✅ Use Cilium Ingress (you need external access)
  • ⚠️ Service Mesh features later (if you build microservices)
  • ❌ Skip API Gateway (internal tools don't need it)

Service Mesh vs Service Discovery

These two concepts are often mentioned together, but operate at very different layers of your cluster. Here's a clear breakdown:

Service Discovery

What it is: The mechanism by which workloads find each other inside the cluster.

In Kubernetes, this is handled natively and automatically through:

  • kube-dns / CoreDNS — Every Service gets a DNS record like my-svc.my-namespace.svc.cluster.local. Pods resolve this to a ClusterIP.
  • kube-proxy — Watches the API server for Endpoint changes and programs iptables/ipvs rules on each node to route traffic to healthy pods behind a Service.
  • Endpoints / EndpointSlices — The control plane continuously reconciles which pod IPs back a given Service, so dead pods are removed automatically.

Think of it as: "How do I find where service B is running right now?"

It answers the WHERE question — address resolution and basic load balancing.
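
The WHERE step can be mimicked as a round-robin pick over a Service's current endpoints. This is a toy stand-in for what kube-proxy (or Cilium's eBPF maps) does per connection; the service name and pod IPs are made up:

```python
import itertools

# Hypothetical EndpointSlice contents for "auth-service": the control plane
# keeps this list reconciled as pods come and go.
endpoints = {"auth-service": ["10.0.1.5", "10.0.2.7", "10.0.3.9"]}

# One round-robin cursor per service, mimicking basic kube-proxy balancing.
cursors = {svc: itertools.cycle(ips) for svc, ips in endpoints.items()}

def resolve(service: str) -> str:
    """Pick the next healthy pod IP backing the service."""
    return next(cursors[service])
```

Successive calls cycle through the pod IPs; when a pod dies, the real control plane simply drops it from the list and the cursor never selects it again.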

Service Mesh

What it is: An infrastructure layer that controls, observes, and secures traffic between services — after discovery has already happened.

It's typically implemented via a sidecar proxy (e.g., Envoy) injected into every pod, plus a control plane (Istio, Linkerd, Cilium Service Mesh, Consul Connect).

A service mesh gives you:

Capability               What it does
mTLS                     Encrypts and authenticates all pod-to-pod traffic automatically
Traffic management       Canary releases, traffic splitting (90/10), retries, circuit breaking, timeouts
Observability            Golden signals (latency, error rate, throughput) per service pair — L7 metrics
Authorization policies   "Service A is allowed to call Service B on /api/* only"
Fault injection          Inject delays or errors for chaos testing

Think of it as: "Now that I found service B, HOW should traffic flow, and is it safe/observable?"

It answers the HOW, WHO, and WHY questions — behavior, security, and visibility.
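
The 90/10 traffic-splitting capability reduces to a weighted choice per request. An illustrative model of what the mesh's proxy does, not any specific mesh API; the version names are invented:

```python
import random

def pick_version(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a backend version with probability proportional to its weight,
    as a mesh proxy does for canary traffic splitting."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]

# Send 90% of requests to v1 and 10% to the v2 canary.
split = {"reviews-v1": 90, "reviews-v2": 10}
```

In a real mesh you declare the weights in a traffic policy and the proxy makes this choice for every request; shifting the canary to 50/50 is a config change, not a redeploy.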

The Core Distinction

Request from Pod A  →  CoreDNS resolves "service-b"  →  kube-proxy routes to a healthy pod
        ↑                        ↑
   [ Service Discovery — built into k8s, always present ]


Request from Pod A  →  Envoy sidecar intercepts  →  applies policy/mTLS/retries  →  Envoy on Pod B
        ↑                             ↑
   [ Service Mesh — optional add-on, operates at L7 ]

                   Service Discovery               Service Mesh
Layer              L3/L4 (IP, TCP)                 L7 (HTTP, gRPC)
Built into k8s?    ✅ Yes (CoreDNS + kube-proxy)   ❌ No, add-on
Scope              Finding & routing to a service  Controlling traffic between services
Observability      None (basic)                    Full L7 metrics, tracing
Security           None beyond network policies    mTLS, RBAC per route
Complexity         Zero (transparent)              High — significant ops overhead

Practical Guidance for Your Stack

Given your KaaS platform works with Cilium:

  • Cilium already replaces kube-proxy for service discovery using eBPF, which is faster and more scalable than iptables.
  • Cilium Service Mesh (with Hubble) can give you a significant portion of service mesh capabilities — L7 visibility, mTLS via WireGuard, network policies — without the sidecar overhead. This is worth considering over Istio for your air-gapped enterprise clusters since it has fewer moving parts.

Service Discovery vs API Gateway

These two solve completely different problems at different boundaries of your system.


The One-Line Mental Model

                    Answers the question...                           Serves traffic from...
Service Discovery   "Where is service B inside the cluster?"         Internal (pod → pod)
API Gateway         "How does the outside world reach my services?"  External (client → cluster)

Service Discovery (recap)

Already covered, but to anchor the comparison:

  • Scope: Internal cluster only
  • Actors: Pod A, finding Pod B
  • Mechanism: CoreDNS + kube-proxy (or Cilium eBPF)
  • Protocol awareness: L3/L4 — it just resolves an IP and routes packets
  • Intelligence: Minimal — round-robin or IPVS-based load balancing
  • Who configures it: Kubernetes itself, automatically

It's fully transparent — your app code doesn't know it exists.
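
The naming scheme behind that transparency is mechanical enough to sketch: a Service name expands to a fully qualified DNS record that CoreDNS answers. A toy illustration of the record format only:

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the DNS record CoreDNS serves for a Kubernetes Service:
    <service>.<namespace>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

A pod in the same namespace can use just `my-svc`; the resolver's search path expands it to the full record shown earlier.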


API Gateway

An API Gateway sits at the edge of your cluster and acts as the single entry point for all external traffic. It's an active, intelligent proxy with full application-layer awareness.

Core responsibilities:

External Client
      │
      ▼
 ┌────────────────────────────────────┐
 │           API Gateway              │
 │  ┌─────────────────────────────┐   │
 │  │  Auth / JWT validation      │   │
 │  │  Rate limiting / quotas     │   │
 │  │  TLS termination            │   │
 │  │  Request routing (L7)       │   │
 │  │  Protocol translation       │   │
 │  │  Request/response transform │   │
 │  └─────────────────────────────┘   │
 └────────────────────────────────────┘
       │              │             │
       ▼              ▼             ▼
   Service A      Service B     Service C
   (orders)       (users)       (billing)

What an API Gateway provides:

Capability               Detail
Routing                  /api/orders → Service A, /api/users → Service B
Auth                     Validate JWT/API keys before traffic reaches your services
Rate limiting            1000 req/min per tenant, per endpoint
TLS termination          Handles HTTPS at the edge; internal traffic can be plain HTTP
Protocol translation     REST → gRPC, WebSocket upgrading
Request transformation   Header injection, payload rewriting
Canary / A/B routing     Route 5% of traffic to v2
Observability            Per-route latency, error rates, quota usage

Common implementations: Kong, Traefik, NGINX, Istio Gateway, Emissary (Ambassador)
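
The rate-limiting row is usually implemented as some variant of a token bucket: tokens refill at a steady rate, each request spends one, and an empty bucket means HTTP 429. A minimal sketch of the scheme, not Kong's actual plugin:

```python
class TokenBucket:
    """Allow up to `rate` requests/second with bursts up to `capacity`,
    as a gateway enforces per tenant or per route."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0             # timestamp of last check, in seconds

    def allow(self, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller answers 429 Too Many Requests
```

A gateway keeps one bucket per (tenant, route) pair, which is how "1000 req/min per tenant, per endpoint" quotas are enforced independently of each other.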


Side-by-Side Comparison

Dimension            Service Discovery                          API Gateway
Traffic direction    East-West (internal)                       North-South (external → internal)
Layer                L3/L4                                      L7
Who uses it          Your microservices talking to each other   External clients (browsers, mobile, partners)
Auth enforcement     ❌ None                                    ✅ Central enforcement point
Rate limiting        ❌ None                                    ✅ Per consumer, per route
TLS                  Handled by mesh (optional)                 ✅ Terminates external TLS
Protocol awareness   IP/TCP only                                HTTP, gRPC, WebSocket, REST
Configured by        Kubernetes automatically                   You (routes, policies, plugins)
Examples             CoreDNS, kube-proxy, Cilium                Kong, Traefik, NGINX

How They Fit Together

They are not alternatives — they work at different boundaries and are both present in a production system:

Internet
   │
   ▼
API Gateway          ← North-South boundary (you control this)
   │
   ▼
Kubernetes Cluster
   │
   ├── Pod A  ──(Service Discovery)──▶  Pod B
   │
   └── Pod C  ──(Service Discovery)──▶  Pod D

A request flows through both:

  1. External client hits the API Gateway → auth checked, rate limit applied, routed to the right service
  2. That service calls another internal service → Service Discovery resolves it, traffic flows pod-to-pod


Practical Guidance for Your KaaS Platform

Since you're running Kong in your stack already:

  • Kong is your API Gateway — it handles all external tenant-facing traffic, auth (JWT/key-auth plugins), and rate limiting per tenant tier.
  • Cilium handles your internal service discovery and east-west routing via eBPF.
  • For your KaaS tiers, you likely want one Kong instance per tenant cluster (or at least per namespace) at the dedicated tier, and a shared Kong at the vCluster entry-level tier to keep costs down.
  • The combination of Kong (north-south) + Cilium (east-west + mesh) covers the full traffic-control picture without needing a heavy Istio deployment on top.
  • For entry-level tiers (vCluster), service discovery alone is usually enough. A full mesh makes more sense at the dedicated cluster tier, where tenants need SLA-grade observability and zero-trust networking.

Service Mesh vs Service Discovery vs API Gateway

The Big Picture First

                        INTERNET / EXTERNAL CLIENTS
                                    │
                                    ▼
                    ┌────────────────────────────────┐
                    │          API GATEWAY           │  ← North-South
                    │  Auth, Rate Limit, TLS, Route  │    (Edge)
                    └────────────────────────────────┘
                                    │
                    ┌────────────────────────────────┐
                    │       KUBERNETES CLUSTER       │
                    │                                │
                    │  Pod A ──────────────► Pod B   │  ← East-West
                    │    │    Service Mesh           │    (Internal)
                    │    │    (Envoy sidecar)        │
                    │    │                           │
                    │    └─ CoreDNS resolves Pod B   │  ← Service
                    │       kube-proxy/Cilium routes │    Discovery
                    └────────────────────────────────┘

They are not alternatives. They are three distinct layers, each solving a different problem.


Individual Identity

1. Service Discovery

"Where is the service?"

  • Who uses it: Pod → Pod (internal, automatic)
  • Layer: L3/L4
  • Configured by: Kubernetes itself — zero effort from you
  • Intelligence level: Minimal — resolve a DNS name, route to a healthy IP
  • Implementations: CoreDNS, kube-proxy, Cilium (eBPF)

2. Service Mesh

"How should traffic behave between services?"

  • Who uses it: Pod → Pod (internal, but with full L7 awareness)
  • Layer: L7
  • Configured by: You — traffic policies, mTLS rules, retry budgets
  • Intelligence level: High — observability, security, traffic shaping east-west
  • Implementations: Istio, Linkerd, Cilium Service Mesh, Consul Connect

3. API Gateway

"How does the outside world reach my services?"

  • Who uses it: External client → Cluster (edge boundary)
  • Layer: L7
  • Configured by: You — routes, auth plugins, rate limits, TLS certs
  • Intelligence level: High — but focused on external consumers, not internal behavior
  • Implementations: Kong, Traefik, NGINX, Emissary

Master Comparison Table

Dimension                    Service Discovery   Service Mesh               API Gateway
Core question                Where is it?        How does it behave?        How do I get in?
Traffic direction            East-West           East-West                  North-South
Layer                        L3/L4               L7                         L7
Scope                        Internal only       Internal only              External → Internal
TLS                          ❌                  ✅ mTLS (pod-to-pod)       ✅ Terminates external TLS
Auth enforcement             ❌                  ✅ Per service identity    ✅ Per consumer (JWT/API key)
Rate limiting                ❌                  ⚠️ Basic                   ✅ Per consumer/route/tier
Observability                ❌                  ✅ L7 golden signals       ✅ Per-route metrics
Retries / Circuit breaking   ❌                  ✅                         ✅
Traffic splitting            ❌                  ✅ (canary, A/B)           ✅ (canary, weighted)
Protocol translation         ❌                  ⚠️ Limited                 ✅ REST↔gRPC, WebSocket
Required by Kubernetes       ✅ Always present   ❌ Optional                ❌ Optional
Operational complexity       Zero                Very high                  Medium
Performance overhead         Zero                Medium (sidecar per pod)   Low (one proxy at edge)

Where Each One Lives in a Request's Journey

Browser / Mobile App / Partner API
           │
           │  HTTPS :443
           ▼
  ┌─────────────────────┐
  │     API Gateway     │  ➜ Validates JWT
  │       (Kong)        │  ➜ Checks rate limit (tenant quota)
  │                     │  ➜ Terminates TLS
  │                     │  ➜ Routes /api/orders → order-svc
  └─────────────────────┘
           │
           │  HTTP (internal)
           ▼
  ┌─────────────────────┐
  │  order-svc Pod      │
  │  ┌───────────────┐  │
  │  │ Envoy sidecar │  │  ➜ Enforces mTLS to downstream
  │  │ (Service Mesh)│  │  ➜ Emits L7 metrics to Prometheus
  │  └───────────────┘  │  ➜ Applies retry policy (3x, 100ms backoff)
  │  [ app container ]  │
  └─────────────────────┘
           │
           │  Calls inventory-svc
           ▼
  ┌─────────────────────┐
  │   CoreDNS resolves  │  ➜ inventory-svc.default.svc.cluster.local
  │   inventory-svc     │  ➜ Returns ClusterIP
  │  (Svc Discovery)    │  ➜ kube-proxy/Cilium routes to a healthy pod
  └─────────────────────┘
           │
           ▼
  ┌─────────────────────┐
  │  inventory-svc Pod  │
  └─────────────────────┘

Every layer fired exactly once in that single request. None replaced the other.


Overlap Zones (Where It Gets Confusing)

API Gateway vs Service Mesh — Traffic Splitting

Both can do canary routing, but:

  • API Gateway canary → splits external traffic between v1 and v2 of your public API
  • Mesh canary → splits internal traffic between two versions of a downstream microservice, invisible to the outside

API Gateway vs Service Mesh — Auth

  • API Gateway auth → validates who the external consumer is (JWT, API key, OAuth)
  • Mesh auth → validates which internal service is allowed to talk to which (SPIFFE identity, mTLS cert)

Service Mesh vs Service Discovery — Routing

  • Service Discovery routes at L4 — it just gets the packet to a pod
  • Service Mesh routes at L7 — it can route based on HTTP headers, gRPC method, cookie values, weight percentages
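
The L4/L7 difference is easy to see in code: L4 sees only addresses and ports, while an L7 proxy can inspect the request itself. A toy header-based routing rule, with a made-up header name and version labels:

```python
def route_l7(headers: dict[str, str]) -> str:
    """Route based on HTTP headers — something only an L7 proxy can do,
    since L4 service discovery never looks past the TCP layer."""
    # Illustrative rule: send flagged beta testers to the canary build.
    if headers.get("x-beta-tester") == "true":
        return "reviews-v2"
    return "reviews-v1"
```

An L4 balancer cannot express this rule at all: by the time it acts, the HTTP headers are just opaque payload bytes inside a TCP stream.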

When Do You Need Each?

Scenario                                  Discovery   Mesh   Gateway
Any Kubernetes cluster                    ✅ Always   —      —
Exposing services externally              ✅          —      ✅
Multi-tenant SaaS (rate limits, auth)     ✅          —      ✅
Zero-trust internal security (mTLS)       ✅          ✅     —
Debugging latency between microservices   ✅          ✅     —
Canary deploy internal service            ✅          ✅     —
Canary deploy public API                  ✅          —      ✅
Full production microservice platform     ✅          ✅     ✅

Recommendation for KaaS Platform

Given your stack (Cilium + Kong + air-gapped Rocky Linux):

Service Discovery → Cilium already replaces kube-proxy here using eBPF — zero extra work, better performance than iptables.

Service Mesh → Use Cilium Service Mesh + Hubble instead of Istio. You get mTLS (via WireGuard), L7 visibility, and network policy enforcement with no sidecars — critical in air-gapped environments where every extra image and moving part is a liability. Reserve full Istio only if a large enterprise tenant specifically demands it.

API Gateway → Kong is already in your stack and is the right call. For your KaaS tiers:

  • vCluster (entry-level): Shared Kong instance with namespace-scoped rate limiting per tenant
  • Dedicated CAPI cluster (enterprise): Dedicated Kong instance per cluster for full isolation and custom plugin sets per tenant

This combination gives you the full three-layer architecture with the least operational overhead in your air-gapped context.