Understanding OVHcloud Managed Kubernetes architecture
Objective
This guide explains the architecture of OVHcloud Managed Kubernetes Service (MKS) to help you understand how your clusters are deployed, managed, and connected. Understanding this architecture will help you make informed decisions about cluster configuration, troubleshooting, and scaling.
Overview
OVHcloud Managed Kubernetes Service is a CNCF-certified Kubernetes offering that abstracts away the complexity of managing the control plane while giving you full control over your worker nodes and workloads.
Control Plane architecture
The control plane is the brain of your Kubernetes cluster. OVHcloud fully manages this component, which includes:
- API Server: The front-end for the Kubernetes control plane, handling all API requests
- etcd: The distributed key-value store that holds all cluster state and configuration
- Controller Manager: Runs controller processes (node controller, replication controller, etc.)
- Scheduler: Assigns pods to nodes based on resource requirements and constraints
What OVHcloud manages for you
OVHcloud handles all operational aspects of the control plane:
| Component | OVHcloud responsibility |
|---|---|
| API Server | Deployment, scaling, high availability, security patches |
| etcd | Backups, encryption at rest, cluster health |
| Controller Manager | Updates, configuration, monitoring |
| Scheduler | Updates, configuration |
| Kubernetes versions | Availability of new minor versions; security patches (applied according to the user-defined Security Policy parameter) |
About Kubernetes upgrades: OVHcloud makes new Kubernetes minor versions available. You control when to trigger the upgrade. The only exception is when your cluster runs an End-of-Life version; in this case, OVHcloud will force an upgrade to the next supported version after prior notification.
Free vs Standard plan: Control plane differences
The control plane architecture differs significantly between plans:
| Feature | Free Plan | Standard Plan |
|---|---|---|
| Control Plane | Managed, single-zone | Managed, cross-AZ resilient |
| Availability | 99.5% SLO | 99.99% SLA (at GA) |
| etcd storage | Shared, up to 400MB | Dedicated, up to 8GB, distributed across 3 AZs |
| Max cluster size | 100 nodes | 500 nodes |
| Regional availability | Single-zone | 3 Availability Zones |
| CNI | Canal (Flannel + Calico) | Cilium |
Based on these SLA/SLO commitments, a Free plan cluster could experience up to roughly 3.6 hours (216 minutes) of downtime per month in the worst case, while a Standard plan cluster limits this to approximately 4 minutes.
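The downtime figures follow directly from the availability percentages in the table above. A quick sketch of the arithmetic, assuming a 30-day month:

```python
# Maximum monthly downtime implied by an availability commitment,
# assuming a 30-day month (30 * 24 * 60 = 43,200 minutes).
MINUTES_PER_MONTH = 30 * 24 * 60

def max_downtime_minutes(availability_percent: float) -> float:
    """Worst-case monthly downtime still within the commitment."""
    return MINUTES_PER_MONTH * (100 - availability_percent) / 100

print(max_downtime_minutes(99.5))   # Free plan SLO   -> 216.0 minutes (~3.6 h)
print(max_downtime_minutes(99.99))  # Standard SLA    -> ~4.32 minutes
```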
Worker Nodes architecture
Worker nodes are the machines where your containerized applications run. Unlike the control plane, you have direct control over node configuration.
How nodes are provisioned
Worker nodes are based on OVHcloud Public Cloud instances. When you create a node pool:
- OVHcloud provisions Public Cloud instances with your chosen flavor
- The instances are automatically configured with the required Kubernetes components
- Nodes register themselves with the control plane via Konnectivity
The CNI installed on the nodes differs depending on your plan; see the CNI section under Networking architecture below.
Node pools concept
Nodes are organized into node pools, groups of nodes that share the same configuration:
- Flavor: Instance type (b3-8, b3-16, t1-45 for GPU, etc.)
- Autoscaling settings: Min/max nodes, scale-down thresholds
- Anti-affinity: Distribute nodes across different hypervisors
- Billing: Hourly or monthly (for gen2 flavors), Saving Plans for gen3 and above
- Labels and taints: For workload scheduling
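Labels and taints set at the node pool level drive workload placement with standard Kubernetes mechanisms. A minimal sketch of a pod pinned to a GPU pool, assuming the pool was created with the label `nodepool: gpu` and the taint `dedicated=gpu:NoSchedule` (illustrative names, not MKS defaults):

```yaml
# Hypothetical pod targeting a dedicated GPU node pool.
# Assumes the pool carries label "nodepool: gpu" and
# taint "dedicated=gpu:NoSchedule" (illustrative values).
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job
spec:
  nodeSelector:
    nodepool: gpu          # only schedule on nodes with this label
  tolerations:
    - key: dedicated       # tolerate the pool's taint
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative image
      command: ["nvidia-smi"]
```

The taint keeps non-GPU workloads off the expensive nodes; the nodeSelector keeps the GPU workload off everything else.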
Node lifecycle
- Installing: Node is being provisioned and configured
- Ready: Node is healthy and can receive pods
- NotReady: Node has issues (network, resources, etc.)
- Draining: Node is being evacuated (graceful eviction that respects PodDisruptionBudgets, for up to 10 minutes)
- Terminated: Node is deleted
GPU worker nodes (t1 and t2 flavors) may take more than one hour to reach a ready state.
Auto-healing
OVHcloud monitors node health. If a node remains in NotReady state for more than 10 minutes, auto-healing is triggered:
- Free plan: The node is reinstalled in-place
- Standard plan: The node is deleted and a new one is created
This ensures cluster stability but means:
- Do not store important data directly on nodes
- Always use Persistent Volumes for stateful workloads
- Design applications to be resilient to node failures
Node upgrades
When upgrading Kubernetes versions, MKS offers two strategies for updating worker nodes:
In-place upgrades
With in-place upgrades, each worker node is updated directly on its existing Public Cloud instance:
- MKS cordons the node (marks it unschedulable)
- MKS drains the node (evicts all pods, respecting PodDisruptionBudgets)
- Kubernetes components are upgraded on the same instance
- Node becomes Ready again
- Process repeats for the next node (strictly one-by-one)
Characteristics:
- No extra instances required
- Preserves instance identity (same IPs, same billing)
- Slower process (sequential, one node at a time)
- Temporary capacity reduction during each node upgrade
In-place upgrades can lead to resource pressure if your cluster doesn't have enough spare capacity to accommodate pods evicted from the node being upgraded. Ensure your remaining nodes can handle the extra workload.
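Since drains respect PodDisruptionBudgets, defining one for each critical workload is the standard way to keep a minimum number of replicas serving traffic while nodes are upgraded one by one. A minimal sketch, assuming your Deployment's pods carry the label `app: web`:

```yaml
# Minimal PodDisruptionBudget: during a node drain, the eviction API
# refuses to evict pods of this app if fewer than 2 would remain available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web   # assumes your Deployment's pods carry this label
```

Remember that MKS only waits 10 minutes per drain, so size `minAvailable` so evictions can actually succeed with your replica count.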
Use cases:
- Monthly-billed instances (keep the same instance)
- Need to preserve public IP addresses
- Cost-sensitive environments (no extra instance costs)
Rolling upgrades
With rolling upgrades (currently available on Standard plan only), new worker nodes are created with the target Kubernetes version:
- MKS creates a new node running the target version
- MKS cordons and drains an old node
- Workloads migrate to the new node
- Old node is deleted
- Process repeats until all nodes are upgraded
Characteristics:
- Requires temporary extra capacity (new nodes created before old ones deleted)
- Faster upgrades with higher availability
- Clean node state (fresh instances)
- Better handling of workload migration
Rolling upgrades are planned to support Kubernetes-style maxSurge and maxUnavailable settings, controlling how many nodes can be added or taken offline simultaneously.
Use cases:
- Production environments requiring high availability
- Faster upgrade cycles
- When clean node state is preferred
Comparison
| Aspect | In-place | Rolling |
|---|---|---|
| Availability | Free & Standard | Standard only (for now) |
| Extra instances | No | Yes (temporary) |
| Speed | Slower (sequential) | Faster (parallel possible) |
| Instance identity | Preserved | New instances |
| Public IPs | Preserved | Changed |
| Resource pressure | Higher (reduced capacity) | Lower (extra capacity) |
| Best for | Cost optimization | High availability |
Reserved resources
Each worker node reserves resources for Kubernetes system components:
| Resource | Formula |
|---|---|
| CPU | 15% of 1 CPU + 0.5% of all CPU cores |
| RAM | Fixed 1590 MB |
| Storage | log10(total storage in GB) * 10 + 10% of total storage |
Example for b3-16 flavor: 170m CPU, 1.59GB RAM, 30GB storage reserved.
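The formulas can be sketched directly. Note the flavor specifics here are assumptions for illustration (b3-16 taken as 4 vCPUs with a 100 GB disk; check the flavor specification for exact values):

```python
import math

def reserved_cpu_millicores(cores: int) -> int:
    # 15% of one core (150m) plus 0.5% of each core (5m per core)
    return 150 + 5 * cores

def reserved_storage_gb(total_gb: float) -> float:
    # log10(total storage in GB) * 10 + 10% of total storage
    return math.log10(total_gb) * 10 + total_gb / 10

RESERVED_RAM_MB = 1590  # fixed, regardless of flavor

# b3-16 flavor, assuming 4 vCPUs and a 100 GB disk (illustrative figures)
print(reserved_cpu_millicores(4))   # -> 170 millicores
print(reserved_storage_gb(100))     # -> 30.0 GB
```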
Networking architecture
Cluster network overview
Internet access options
MKS clusters support three networking patterns for Internet connectivity:
| Option | Direction | Use case |
|---|---|---|
| Load Balancer | Inbound | Expose services (recommended for production) |
| Node Floating IPs | Inbound & Outbound | Direct node access, preserve source IP |
| Gateway (SNAT) | Outbound only | Default outbound, shared IP for all nodes |
Option 1 - Load Balancer (Octavia):
- Recommended for exposing services to the Internet
- Provides health checking, load distribution
- Single entry point with a Floating IP
- Supports L4 (TCP/UDP) load balancing
Option 2 - Node Floating IPs:
- Each node can have its own Floating IP attached
- Enables direct inbound access (e.g., via NodePort services)
- Preserves source IP addresses for outbound traffic (no SNAT)
- Useful when pods need to reach external services that whitelist IPs
- IP is preserved during in-place upgrades, but changes with rolling upgrades
To attach a Floating IP to a node, a Gateway (OpenStack router) must be configured on the private network subnet. Without a Gateway, Floating IPs cannot be associated with instances in that subnet.
Option 3 - Gateway (SNAT):
- Default for outbound Internet access when no Floating IP is attached
- All nodes share the same outbound IP (Gateway IP)
- Does NOT expose services to the Internet
- Source IP is translated (SNAT)
CNI: Container Network Interface
The CNI plugin differs between plans:
| Plan | CNI | Description |
|---|---|---|
| Free | Canal (Flannel + Calico) | Flannel for overlay network, Calico for network policies |
| Standard | Cilium | eBPF-based, high performance, advanced observability |
Reserved subnets (do not use in your private network):
Free plan:
| Subnet | Purpose |
|---|---|
| 10.2.0.0/16 | Pod network |
| 10.3.0.0/16 | Service network |
| 172.17.0.0/16 | Docker daemon (legacy) |
Standard plan:
| Subnet | Purpose |
|---|---|
| 10.240.0.0/13 | Pod network |
| 10.3.0.0/16 | Service network |
Service exposure options
Kubernetes Services can be exposed in several ways:
Load Balancer integration
Creating a Service of type LoadBalancer automatically provisions an OVHcloud Public Cloud Load Balancer (based on OpenStack Octavia):
For Kubernetes versions >= 1.31, Octavia is the default Load Balancer and no annotation is required.
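A minimal Service manifest is enough to provision the load balancer; names and ports below are illustrative:

```yaml
# Minimal Service of type LoadBalancer; on MKS this provisions an
# OVHcloud Public Cloud Load Balancer (Octavia) with a public IP.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web            # assumes pods labeled app=web
  ports:
    - port: 80          # port exposed on the load balancer
      targetPort: 8080  # container port (illustrative)
      protocol: TCP
```

Once the cloud controller finishes provisioning, the assigned public IP appears in the Service's `status.loadBalancer.ingress` field.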
Private networking options
OVHcloud offers two ways to use private networks with MKS:
1. Public Cloud Private Network (without vRack)
For connecting OVHcloud Public Cloud services within the same region:
- MKS clusters
- Public Cloud instances
- Managed Databases (DBaaS)
- Other Public Cloud services
This is the simplest option when you only need private connectivity between Public Cloud resources.
2. vRack integration
For broader interconnectivity across OVHcloud product universes and regions:
vRack enables:
- Cross-region private connectivity
- Interconnection with Bare Metal servers
- Interconnection with Hosted Private Cloud (VMware)
- Interconnection with other OVHcloud dedicated products
When using a private network (with or without vRack), you will still see a public IPv4 on worker nodes. This IP is not reachable from the Internet and is used exclusively for node administration and control plane communication.
Storage architecture
Persistent Volumes with Cinder CSI
MKS uses the OpenStack Cinder CSI driver for persistent storage:
Storage classes
| Storage Class | Description | IOPS | Recommended for |
|---|---|---|---|
| csi-cinder-high-speed (default) | Fixed-performance SSD | Up to 3,000 | Volumes up to 100GB |
| csi-cinder-high-speed-gen2 | Progressive-performance NVMe | 30 IOPS/GB (max 20,000) | Volumes > 100GB |
| csi-cinder-classic | Traditional spinning disks | 200 guaranteed | Cost-sensitive workloads |
| *-luks variants | Encrypted versions of the above | Same as base class | Sensitive data |
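Requesting a volume is a standard PersistentVolumeClaim; the claim name and size below are illustrative:

```yaml
# PersistentVolumeClaim using the default csi-cinder-high-speed class;
# binds a 10 GiB Cinder volume in ReadWriteOnce mode.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-cinder-high-speed
  resources:
    requests:
      storage: 10Gi
```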
Access modes and limitations
| Access Mode | Support | Description |
|---|---|---|
| ReadWriteOnce (RWO) | Supported | Single node read-write |
| ReadOnlyMany (ROX) | Not supported* | Multi-node read-only |
| ReadWriteMany (RWX) | Not supported* | Multi-node read-write |
*For multi-attach volumes (RWX), use one of these OVHcloud storage solutions:
| Solution | Protocol | Documentation |
|---|---|---|
| File Storage (Public Cloud) | NFS | File Storage documentation |
| Enterprise File Storage | NFS | EFS with MKS |
| NAS-HA | NFS | NAS-HA with MKS |
| Cloud Disk Array | CephFS | CDA with MKS |
A worker node can have a maximum of 100 Cinder persistent volumes attached to it.
Security model
Shared responsibility
Security is shared between OVHcloud and you: OVHcloud secures and operates the control plane and managed components, while you remain responsible for your workloads, access credentials, and network policies. Kubernetes version upgrades follow the policy described earlier: you decide when to upgrade, except for End-of-Life versions, which OVHcloud force-upgrades after prior notification.
Access control mechanisms
- kubeconfig authentication: Downloaded from OVHcloud Control Panel, provides admin access
- OIDC integration: Connect your identity provider for SSO
- API server IP restrictions: Limit access to specific IP ranges
- RBAC: Role-Based Access Control for fine-grained permissions
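With OIDC connected, RBAC objects map identity-provider groups to Kubernetes permissions. A hypothetical read-only setup, assuming your OIDC provider asserts a group named `devs` and a namespace `staging` (both illustrative):

```yaml
# Illustrative read-only RBAC: lets members of group "devs" (as asserted
# by your OIDC provider) view pods in the "staging" namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: staging
rules:
  - apiGroups: [""]            # core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: devs-read-pods
  namespace: staging
subjects:
  - kind: Group
    name: devs                 # group name from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```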
Security features
- Free plan: Network policies via Calico
- Standard plan: Network policies via Cilium (eBPF-based)
- Secrets encryption in transit and at rest
- Node isolation via security groups
- Audit logs available in Control Panel
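Because both Calico and Cilium implement the standard NetworkPolicy API, the same manifest works on either plan. A minimal sketch of a default-deny policy for an illustrative `staging` namespace:

```yaml
# Default-deny ingress: with no ingress rules listed, all inbound
# pod-to-pod traffic in the namespace is blocked until explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: staging   # illustrative namespace
spec:
  podSelector: {}      # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

A common pattern is to apply this first, then add narrowly scoped allow policies per application.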
Component versions
Current software versions (as of the latest Kubernetes releases):
| Component | Free Plan | Standard Plan |
|---|---|---|
| OS | Ubuntu 22.04 LTS | Ubuntu 22.04 LTS |
| Kernel | 5.15-generic | 5.15-generic |
| Container runtime | containerd 2.1.4 | containerd 2.1.4 |
| CNI | Canal (Calico v3.30.1 + Flannel v0.24.4) | Cilium |
| CSI | Cinder CSI v1.29.0 | Cinder CSI v1.29.0 |
| CoreDNS | v1.12.4 | v1.12.4 |
| Metrics Server | v0.8.0 | v0.8.0 |
For the complete version matrix, see Kubernetes Plugins & Software versions.
Architecture diagram: Complete overview
Go further
- Known limits
- Choosing the right plan: Free vs Standard
- Responsibility model
- Software versions and reserved resources
- Using vRack Private Network
- Persistent Volumes
- Expose your applications using Load Balancer
If you need training or technical assistance to implement our solutions, contact your sales representative or click on this link to get a quote and ask our Professional Services experts for a custom analysis of your project.
Join our community of users.