# Service Mesh - Service Discovery - Ingress/API Gateway

## Service Mesh vs Ingress/API Gateway

### Tasks

They handle different traffic:

- Service Mesh = Pod-to-Pod (EAST-WEST) inside the cluster
- Ingress = External-to-Pod (NORTH-SOUTH) into the cluster
### Visual Explanation

```
───────────────────────────────────────────────────
         NORTH-SOUTH vs EAST-WEST TRAFFIC
───────────────────────────────────────────────────

              External Users
          (Internet/Corp Network)
                   │
                   │ NORTH-SOUTH
                   │ (Ingress handles this)
                   ▼
┌────────────────────────────────────────────┐
│             KUBERNETES CLUSTER             │
│                                            │
│   ┌──────────────┐                         │
│   │   Ingress    │                         │
│   │  Controller  │                         │
│   └──────┬───────┘                         │
│          │                                 │
│   ┌──────▼───────┐                         │
│   │ Frontend Pod │                         │
│   └──────┬───────┘                         │
│          │ EAST-WEST                       │
│          │ (Service Mesh handles this)     │
│   ┌──────▼───────┐                         │
│   │   API Pod    │──────▶ Database Pod     │
│   └──────┬───────┘                         │
│   ┌──────▼───────┐                         │
│   │   Auth Pod   │──────▶ Cache Pod        │
│   └──────────────┘                         │
└────────────────────────────────────────────┘
```
### What Each Does

#### Ingress/API Gateway (NORTH-SOUTH)

Purpose: Get traffic INTO the cluster

Handles:

- ✅ External → Cluster routing
- ✅ TLS termination (HTTPS)
- ✅ Hostname routing (api.mci.local)
- ✅ Path routing (/users, /orders)
- ✅ External load balancing
- ✅ Public IP exposure
Example Flow:

```
User Browser (Internet)
  │  http://api.mci.local/users
  ▼
LoadBalancer IP (192.168.228.200)
  │
  ▼
Ingress Controller
  │  routes to "api-service"
  ▼
API Pod receives the request
```
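The flow above maps to a standard Kubernetes Ingress resource. A minimal sketch, assuming a Service named `api-service` on port 80 (the hostname and service name come from the example; the TLS secret name is a placeholder):

```yaml
# Minimal Ingress: route https://api.mci.local/users to api-service.
# The TLS secret name is hypothetical; you would create it yourself.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  ingressClassName: cilium          # use Cilium's built-in ingress controller
  tls:
    - hosts: [api.mci.local]
      secretName: api-tls           # placeholder cert secret
  rules:
    - host: api.mci.local
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
```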
#### Service Mesh (EAST-WEST)

Purpose: Manage traffic BETWEEN pods inside the cluster

Handles:

- ✅ Pod → Pod communication
- ✅ mTLS (encrypted pod-to-pod)
- ✅ Service-to-service policies
- ✅ Retries, timeouts, circuit breakers
- ✅ Traffic splitting (A/B testing)
- ✅ Observability between services
- ✅ Service discovery
Example Flow:

```
API Pod wants to call the Auth Service
  │
  ▼
Service Mesh intercepts
  │  checks: is the API pod allowed to call Auth?
  ▼
Encrypts with mTLS
  │
  ▼
Load balances across Auth pods
  │  tracks latency and errors
  ▼
Auth Pod receives the request
```
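The policy check in this flow ("is the API pod allowed to call Auth?") can be written as a Cilium L7 network policy. A minimal sketch; the pod labels, port, and path regex are illustrative assumptions:

```yaml
# CiliumNetworkPolicy: only pods labeled app=api may reach the auth pods,
# and only via HTTP GET/POST under /auth/. All label values are examples.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-to-auth-only
spec:
  endpointSelector:
    matchLabels:
      app: auth                     # policy applies to the auth pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: api                # only API pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/auth/.*"
              - method: POST
                path: "/auth/.*"
```

Any pod without the `app: api` label is denied at L3/L4, and even the API pods are restricted to the listed HTTP methods and paths at L7.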
### Complete Architecture

```
───────────────────────────────────────────────────
       BOTH INGRESS AND SERVICE MESH TOGETHER
───────────────────────────────────────────────────

External User
      │
      │ 1. NORTH-SOUTH (Ingress)
      ▼
┌──────────────────────────────────────────┐
│    Ingress Controller (Cilium Envoy)     │
│    • TLS termination                     │
│    • Hostname routing                    │
│    • External authentication             │
└────────────────────┬─────────────────────┘
                     │
                     ▼
               Frontend Pod
                     │
                     │ 2. EAST-WEST (Service Mesh)
                     │    • mTLS encryption
                     │    • L7 policies
                     │    • Retry logic
                     │    • Circuit breaker
                     ▼
                  API Pod ──────▶ Auth Pod
                     │                │
                     │                │ 3. EAST-WEST
                     │                ▼
                     │           Database Pod
                     │
                     │ 4. EAST-WEST
                     ▼
               Payment Pod ─────▶ Kafka Pod
                     │
                     │ 5. EAST-WEST
                     ▼
              Notification Pod
```
### Why You Need Both

#### Scenario Without Service Mesh (Only Ingress)

- ✅ External traffic reaches the cluster
- ❌ No encryption between pods
- ❌ No pod-to-pod policies
- ❌ No retry logic between services
- ❌ No circuit breakers
- ❌ Limited observability of internal traffic
- ❌ No canary deployments

#### Scenario Without Ingress (Only Service Mesh)

- ❌ External traffic can't reach the cluster
- ❌ No TLS termination for external clients
- ❌ No hostname-based routing from outside
- ✅ Internal pod-to-pod works great
### Cilium's Unique Position

Cilium does BOTH Ingress AND Service Mesh!

```
┌──────────────────────────────────────────┐
│               CILIUM STACK               │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │ Ingress Controller (Cilium Envoy)  │  │
│  │ Handles: NORTH-SOUTH               │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │ Service Mesh (Cilium + Envoy)      │  │
│  │ Handles: EAST-WEST                 │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │ Kube-proxy Replacement (eBPF)      │  │
│  │ Handles: Load balancing            │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │ Network Policies (eBPF + Envoy)    │  │
│  │ Handles: L3/L4/L7 security         │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │ Observability (Hubble)             │  │
│  │ Handles: All traffic visibility    │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘
```

This is why Cilium is powerful: it's an all-in-one solution.
### Feature Comparison

| Feature | Ingress Only | Service Mesh Only | Both (Cilium) |
|---|---|---|---|
| External access | ✅ | ❌ | ✅ |
| TLS termination | ✅ | ❌ | ✅ |
| Hostname routing | ✅ | ❌ | ✅ |
| Pod-to-pod mTLS | ❌ | ✅ | ✅ |
| Retries/timeouts | ❌ | ✅ | ✅ |
| Circuit breakers | ❌ | ✅ | ✅ |
| Traffic splitting | ❌ | ✅ | ✅ |
| L7 observability | ⚠️ (edge only) | ✅ (internal) | ✅ (everywhere) |
| Network policies | ⚠️ (limited) | ✅ | ✅ |
### Real-World Example

#### E-commerce Application

```
User Request: https://shop.mci.local/checkout
     │
     │  ┌───────────────────────────────────────┐
     │  │ INGRESS NEEDED                        │
     │  │ • Accept HTTPS from the internet      │
     │  │ • Route /checkout to frontend         │
     │  │ • TLS termination                     │
     │  └───────────────────────────────────────┘
     ▼
Frontend Pod
     │
     │  Makes API call: POST /api/payment
     │  ┌───────────────────────────────────────┐
     │  │ SERVICE MESH NEEDED                   │
     │  │ • Encrypt traffic (mTLS)              │
     │  │ • Retry if payment service is down    │
     │  │ • Circuit breaker if it keeps failing │
     │  │ • L7 policy: only POST allowed        │
     │  └───────────────────────────────────────┘
     ▼
Payment Service Pod
     │
     │  Calls: GET /api/inventory
     │  ┌───────────────────────────────────────┐
     │  │ SERVICE MESH NEEDED                   │
     │  │ • Check authorization                 │
     │  │ • Track latency                       │
     │  │ • Load balance across inventory pods  │
     │  └───────────────────────────────────────┘
     ▼
Inventory Service Pod
```

You need both:

- Ingress to get the user's request into the cluster
- Service Mesh to manage the internal service calls
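The "only POST allowed" rule from the payment hop could be expressed in Cilium terms roughly like this (the labels, port, and path are illustrative assumptions, not taken from a real deployment):

```yaml
# Only the frontend may reach the payment service, and only POST /api/payment.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-post-only
spec:
  endpointSelector:
    matchLabels:
      app: payment                  # applies to the payment pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend           # only the frontend may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: POST
                path: "/api/payment"
```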
### When You DON'T Need Service Mesh

Simple applications. If you have:

- Only 1-3 services
- No pod-to-pod communication
- No need for mTLS
- No complex retry logic
- A simple architecture

then skip the Service Mesh and just use Ingress.

### When You DON'T Need Ingress

Internal-only clusters, for example:

- Data processing cluster (Kafka, Spark)
- Batch job cluster
- ML training cluster

Here you can skip Ingress and just use the Service Mesh.
### Summary Table

| Traffic Type | Component | Purpose | Do You Need It? |
|---|---|---|---|
| External → Cluster | Ingress | Get traffic in | ✅ YES (always) |
| Pod → Pod | Service Mesh | Manage internal traffic | ⚠️ Only if microservices |
| External APIs | API Gateway | Advanced API features | ❌ Not for internal tools |
### Bottom Line

Service Mesh ≠ Replacement for Ingress

They're complementary:

- Ingress = front door to your cluster
- Service Mesh = internal traffic management

For MCI:

- ✅ Use Cilium Ingress (you need external access)
- ⚠️ Add Service Mesh features later (if you build microservices)
- ❌ Skip API Gateway (internal tools don't need it)
## Service Mesh vs Service Discovery

These two concepts are often mentioned together, but they operate at very different layers of your cluster. Here's a clear breakdown.
### Service Discovery

What it is: the mechanism by which workloads find each other inside the cluster.

In Kubernetes, this is handled natively and automatically through:

- kube-dns / CoreDNS: every Service gets a DNS record like `my-svc.my-namespace.svc.cluster.local`. Pods resolve this to a ClusterIP.
- kube-proxy: watches the API server for Endpoint changes and programs iptables/ipvs rules on each node to route traffic to healthy pods behind a Service.
- Endpoints / EndpointSlices: the control plane continuously reconciles which pod IPs back a given Service, so dead pods are removed automatically.

Think of it as: "How do I find where service B is running right now?"

It answers the WHERE question: address resolution and basic load balancing.
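For concreteness, the DNS record comes for free from an ordinary Service definition. A minimal sketch (all names are illustrative):

```yaml
# Creating this Service makes CoreDNS answer for
# my-svc.my-namespace.svc.cluster.local with the Service's ClusterIP;
# kube-proxy (or Cilium eBPF) then forwards to pods matching the selector.
apiVersion: v1
kind: Service
metadata:
  name: my-svc
  namespace: my-namespace
spec:
  selector:
    app: my-app                     # pods backing this Service
  ports:
    - port: 80                      # ClusterIP port clients call
      targetPort: 8080              # container port on the pods
```

Nothing else is needed: the DNS name, the endpoint reconciliation, and the basic load balancing all happen automatically.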
### Service Mesh

What it is: an infrastructure layer that controls, observes, and secures traffic between services, after discovery has already happened.

It's typically implemented via a sidecar proxy (e.g., Envoy) injected into every pod, plus a control plane (Istio, Linkerd, Cilium Service Mesh, Consul Connect).

A service mesh gives you:

| Capability | What it does |
|---|---|
| mTLS | Encrypts and authenticates all pod-to-pod traffic automatically |
| Traffic management | Canary releases, traffic splitting (90/10), retries, circuit breaking, timeouts |
| Observability | Golden signals (latency, error rate, throughput) per service pair, i.e. L7 metrics |
| Authorization policies | "Service A is allowed to call Service B on /api/* only" |
| Fault injection | Inject delays or errors for chaos testing |

Think of it as: "Now that I found service B, HOW should traffic flow, and is it safe and observable?"

It answers the HOW, WHO, and WHY questions: behavior, security, and visibility.
### The Core Distinction

```
Request from Pod A ──▶ CoreDNS resolves "service-b" ──▶ kube-proxy routes to a healthy pod
          [ Service Discovery: built into k8s, always present ]

Request from Pod A ──▶ Envoy sidecar intercepts ──▶ applies policy/mTLS/retries ──▶ Envoy on Pod B
          [ Service Mesh: optional add-on, operates at L7 ]
```

| | Service Discovery | Service Mesh |
|---|---|---|
| Layer | L3/L4 (IP, TCP) | L7 (HTTP, gRPC) |
| Built into k8s? | ✅ Yes (CoreDNS + kube-proxy) | ❌ No, add-on |
| Scope | Finding & routing to a service | Controlling traffic between services |
| Observability | None (basic) | Full L7 metrics, tracing |
| Security | None beyond network policies | mTLS, RBAC per route |
| Complexity | Zero (transparent) | High, with significant ops overhead |
### Practical Guidance for Your Stack

Given that your KaaS platform works with Cilium:

- Cilium already replaces kube-proxy for service discovery using eBPF, which is faster and more scalable than iptables.
- Cilium Service Mesh (with Hubble) can give you a significant portion of service mesh capabilities (L7 visibility, mTLS via WireGuard, network policies) without the sidecar overhead. It is worth considering over Istio for your air-gapped enterprise clusters, since it has fewer moving parts.
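As a sketch, the transparent-encryption and visibility pieces mentioned above are Helm values toggles in the Cilium chart (verify the exact keys against the Cilium docs for your version):

```yaml
# Helm values fragment for Cilium: pod-to-pod encryption via WireGuard
# plus Hubble for L7 visibility. Key names follow the upstream chart.
encryption:
  enabled: true
  type: wireguard
hubble:
  enabled: true
  relay:
    enabled: true                   # aggregates flow data cluster-wide
  ui:
    enabled: true                   # service-map UI for east-west traffic
```

No sidecar injection is involved; encryption and flow visibility are applied at the node level by the Cilium agent.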
## Service Discovery vs API Gateway

These two solve completely different problems at different boundaries of your system.

### The One-Line Mental Model

| | Answers the question... | Serves traffic from... |
|---|---|---|
| Service Discovery | "Where is service B inside the cluster?" | Internal (pod → pod) |
| API Gateway | "How does the outside world reach my services?" | External (client → cluster) |
### Service Discovery (recap)

Already covered, but to anchor the comparison:

- Scope: internal cluster only
- Actors: Pod A, finding Pod B
- Mechanism: CoreDNS + kube-proxy (or Cilium eBPF)
- Protocol awareness: L3/L4; it just resolves an IP and routes packets
- Intelligence: minimal; round-robin or IPVS-based load balancing
- Who configures it: Kubernetes itself, automatically

It's fully transparent: your app code doesn't know it exists.
### API Gateway

An API Gateway sits at the edge of your cluster and acts as the single entry point for all external traffic. It's an active, intelligent proxy with full application-layer awareness.

Core responsibilities:

```
          External Client
                 │
                 ▼
┌────────────────────────────────────┐
│            API Gateway             │
│  ┌──────────────────────────────┐  │
│  │ Auth / JWT validation        │  │
│  │ Rate limiting / quotas       │  │
│  │ TLS termination              │  │
│  │ Request routing (L7)         │  │
│  │ Protocol translation         │  │
│  │ Request/response transform   │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘
       │          │          │
       ▼          ▼          ▼
  Service A   Service B   Service C
   (orders)    (users)    (billing)
```
#### What an API Gateway provides

| Capability | Detail |
|---|---|
| Routing | /api/orders → Service A, /api/users → Service B |
| Auth | Validate JWTs/API keys before traffic reaches your services |
| Rate limiting | 1000 req/min per tenant, per endpoint |
| TLS termination | Handles HTTPS at the edge; internal traffic can be plain HTTP |
| Protocol translation | REST ↔ gRPC, WebSocket upgrading |
| Request transformation | Header injection, payload rewriting |
| Canary / A/B routing | Route 5% of traffic to v2 |
| Observability | Per-route latency, error rates, quota usage |

Common implementations: Kong, Traefik, NGINX, Istio Gateway, Emissary (Ambassador)
### Side-by-Side Comparison

| Dimension | Service Discovery | API Gateway |
|---|---|---|
| Traffic direction | East-West (internal) | North-South (external → internal) |
| Layer | L3/L4 | L7 |
| Who uses it | Your microservices talking to each other | External clients (browsers, mobile, partners) |
| Auth enforcement | ❌ None | ✅ Central enforcement point |
| Rate limiting | ❌ None | ✅ Per consumer, per route |
| TLS | Handled by mesh (optional) | ✅ Terminates external TLS |
| Protocol awareness | IP/TCP only | HTTP, gRPC, WebSocket, REST |
| Configured by | Kubernetes automatically | You (routes, policies, plugins) |
| Examples | CoreDNS, kube-proxy, Cilium | Kong, Traefik, NGINX |
### How They Fit Together

They are not alternatives; they work at different boundaries, and both are present in a production system:

```
Internet
   │
   ▼
API Gateway          ◀── North-South boundary (you control this)
   │
   ▼
Kubernetes Cluster
   │
   ├── Pod A ──(Service Discovery)──▶ Pod B
   │
   └── Pod C ──(Service Discovery)──▶ Pod D
```

A request flows through both:

1. The external client hits the API Gateway: auth is checked, the rate limit is applied, and the request is routed to the right service.
2. That service calls another internal service: Service Discovery resolves it, and traffic flows pod-to-pod.
### Practical Guidance for Your KaaS Platform

Since you're running Kong in your stack already:

- Kong is your API Gateway: it handles all external tenant-facing traffic, auth (JWT/key-auth plugins), and rate limiting per tenant tier.
- Cilium handles your internal service discovery and east-west routing via eBPF.
- For your KaaS tiers, you likely want one Kong instance per tenant cluster (or at least per namespace) at the dedicated tier, and a shared Kong at the vCluster entry-level tier to keep costs down.
- The combination of Kong (north-south) and Cilium (east-west + mesh) covers the full traffic-control picture without needing a heavy Istio deployment on top.
- For entry-level tiers (vCluster), service discovery alone is usually enough. A full mesh makes more sense at the dedicated cluster tier, where tenants need SLA-grade observability and zero-trust networking.
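As an example of the per-tenant rate limiting mentioned above, with the Kong Ingress Controller this is a `KongPlugin` resource attached by annotation. A sketch; the namespace, names, and limit are made up for illustration:

```yaml
# Namespace-scoped rate limit: 1000 requests/minute.
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: tenant-rate-limit
  namespace: tenant-a               # hypothetical tenant namespace
plugin: rate-limiting
config:
  minute: 1000
  policy: local                     # counters kept in-memory per Kong node
---
# Attach the plugin to a Service via annotation:
apiVersion: v1
kind: Service
metadata:
  name: orders
  namespace: tenant-a
  annotations:
    konghq.com/plugins: tenant-rate-limit
spec:
  selector:
    app: orders
  ports:
    - port: 80
```

With a shared Kong at the entry-level tier, a per-namespace `KongPlugin` like this is what keeps one tenant's traffic from starving another's.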
## Service Mesh vs Service Discovery vs API Gateway

### The Big Picture First

```
      INTERNET / EXTERNAL CLIENTS
                 │
                 ▼
┌─────────────────────────────────┐
│           API GATEWAY           │ ◀── North-South
│  Auth, Rate Limit, TLS, Route   │     (Edge)
└─────────────────────────────────┘
                 │
┌─────────────────────────────────┐
│       KUBERNETES CLUSTER        │
│                                 │
│   Pod A ──────────────▶ Pod B   │ ◀── East-West
│          Service Mesh           │     (Internal)
│         (Envoy sidecar)         │
│                                 │
│   └─ CoreDNS resolves Pod B     │ ◀── Service
│      kube-proxy/Cilium routes   │     Discovery
└─────────────────────────────────┘
```

They are not alternatives. They are three distinct layers, each solving a different problem.
### Individual Identity

#### 1. Service Discovery

"Where is the service?"

- Who uses it: Pod → Pod (internal, automatic)
- Layer: L3/L4
- Configured by: Kubernetes itself; zero effort from you
- Intelligence level: minimal; resolve a DNS name, route to a healthy IP
- Implementations: CoreDNS, kube-proxy, Cilium (eBPF)

#### 2. Service Mesh

"How should traffic behave between services?"

- Who uses it: Pod → Pod (internal, but with full L7 awareness)
- Layer: L7
- Configured by: you; traffic policies, mTLS rules, retry budgets
- Intelligence level: high; east-west observability, security, and traffic shaping
- Implementations: Istio, Linkerd, Cilium Service Mesh, Consul Connect

#### 3. API Gateway

"How does the outside world reach my services?"

- Who uses it: external client → cluster (edge boundary)
- Layer: L7
- Configured by: you; routes, auth plugins, rate limits, TLS certs
- Intelligence level: high, but focused on external consumers, not internal behavior
- Implementations: Kong, Traefik, NGINX, Emissary
### Master Comparison Table

| Dimension | Service Discovery | Service Mesh | API Gateway |
|---|---|---|---|
| Core question | Where is it? | How does it behave? | How do I get in? |
| Traffic direction | East-West | East-West | North-South |
| Layer | L3/L4 | L7 | L7 |
| Scope | Internal only | Internal only | External → Internal |
| TLS | ❌ | ✅ mTLS (pod-to-pod) | ✅ Terminates external TLS |
| Auth enforcement | ❌ | ✅ Per service identity | ✅ Per consumer (JWT/API key) |
| Rate limiting | ❌ | ⚠️ Basic | ✅ Per consumer/route/tier |
| Observability | ❌ | ✅ L7 golden signals | ✅ Per-route metrics |
| Retries / Circuit breaking | ❌ | ✅ | ❌ |
| Traffic splitting | ❌ | ✅ (canary, A/B) | ✅ (canary, weighted) |
| Protocol translation | ❌ | ⚠️ Limited | ✅ REST↔gRPC, WebSocket |
| Required by Kubernetes | ✅ Always present | ❌ Optional | ❌ Optional |
| Operational complexity | Zero | Very high | Medium |
| Performance overhead | Zero | Medium (sidecar per pod) | Low (one proxy at the edge) |
### Where Each One Lives in a Request's Journey

```
Browser / Mobile App / Partner API
        │
        │ HTTPS :443
        ▼
┌─────────────────────┐
│     API Gateway     │ ◀── Validates JWT
│       (Kong)        │ ◀── Checks rate limit (tenant quota)
│                     │ ◀── Terminates TLS
│                     │ ◀── Routes /api/orders → order-svc
└─────────────────────┘
        │
        │ HTTP (internal)
        ▼
┌─────────────────────┐
│    order-svc Pod    │
│  ┌───────────────┐  │
│  │ Envoy sidecar │  │ ◀── Enforces mTLS to downstream
│  │ (Service Mesh)│  │ ◀── Emits L7 metrics to Prometheus
│  └───────────────┘  │ ◀── Applies retry policy (3x, 100ms backoff)
│  [ app container ]  │
└─────────────────────┘
        │
        │ Calls inventory-svc
        ▼
┌─────────────────────┐
│  CoreDNS resolves   │ ◀── inventory-svc.default.svc.cluster.local
│   inventory-svc     │ ◀── Returns ClusterIP
│   (Svc Discovery)   │ ◀── kube-proxy/Cilium routes to a healthy pod
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│  inventory-svc Pod  │
└─────────────────────┘
```

Every layer fired exactly once in that single request. None replaced the other.
### Overlap Zones (Where It Gets Confusing)

#### API Gateway vs Service Mesh: Traffic Splitting

Both can do canary routing, but:

- API Gateway canary: splits external traffic between v1 and v2 of your public API
- Mesh canary: splits internal traffic between two versions of a downstream microservice, invisible to the outside

#### API Gateway vs Service Mesh: Auth

- API Gateway auth: validates who the external consumer is (JWT, API key, OAuth)
- Mesh auth: validates which internal service is allowed to talk to which (SPIFFE identity, mTLS cert)

#### Service Mesh vs Service Discovery: Routing

- Service Discovery routes at L4: it just gets the packet to a pod
- Service Mesh routes at L7: it can route based on HTTP headers, gRPC method, cookie values, and weight percentages
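The L7 routing described above can be sketched with a Gateway API `HTTPRoute`, which both Cilium and Kong can implement. This is an illustrative example; the gateway and service names are assumptions:

```yaml
# HTTPRoute: header-based routing for canary testers, plus a 90/10
# weighted split for everyone else. All names are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews-split
spec:
  parentRefs:
    - name: internal-gateway        # hypothetical Gateway resource
  rules:
    - matches:
        - headers:
            - name: x-canary        # testers with this header go to v2
              value: "true"
      backendRefs:
        - name: reviews-v2
          port: 8080
    - backendRefs:                  # everyone else: 90/10 split
        - name: reviews-v1
          port: 8080
          weight: 90
        - name: reviews-v2
          port: 8080
          weight: 10
```

The same resource type expresses both the "gateway canary" (external) and "mesh canary" (internal) cases; what differs is which layer's data plane enforces it.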
### When Do You Need Each?

| Scenario | Discovery | Mesh | Gateway |
|---|---|---|---|
| Any Kubernetes cluster | ✅ Always | ❌ | ❌ |
| Exposing services externally | ✅ | ❌ | ✅ |
| Multi-tenant SaaS (rate limits, auth) | ✅ | ❌ | ✅ |
| Zero-trust internal security (mTLS) | ✅ | ✅ | ❌ |
| Debugging latency between microservices | ✅ | ✅ | ❌ |
| Canary deploy of an internal service | ✅ | ✅ | ❌ |
| Canary deploy of a public API | ✅ | ❌ | ✅ |
| Full production microservice platform | ✅ | ✅ | ✅ |
### Recommendation for KaaS Platform

Given your stack (Cilium + Kong + air-gapped Rocky Linux):

Service Discovery → Cilium already replaces kube-proxy here using eBPF: zero extra work, better performance than iptables.

Service Mesh → Use Cilium Service Mesh + Hubble instead of Istio. You get mTLS (via WireGuard), L7 visibility, and network-policy enforcement with no sidecars, which is critical in air-gapped environments where every extra image and moving part is a liability. Reserve full Istio for the case where a large enterprise tenant specifically demands it.

API Gateway → Kong is already in your stack and is the right call. For your KaaS tiers:

- vCluster (entry-level): shared Kong instance with namespace-scoped rate limiting per tenant
- Dedicated CAPI cluster (enterprise): dedicated Kong instance per cluster for full isolation and custom plugin sets per tenant

This combination gives you the full three-layer architecture with the least operational overhead in your air-gapped context.