MLOps: Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay
Deploying distributed Python applications in production requires careful orchestration of compute resources, fault tolerance, and scalability. For Ray Serve, the official documentation recommends Kubernetes deployment using the RayService controller from KubeRay.
This integration delivers the scalability and user experience of Ray Serve alongside Kubernetes' operational benefits, including seamless integration with existing Kubernetes applications. RayService simplifies production deployment by automatically managing health checks, status reporting, failure recovery, and rolling updates.
A RayService Custom Resource (CR) encapsulates a multi-node Ray Cluster and Serve application into a single Kubernetes manifest. All deployment, upgrade, and status operations use standard kubectl commands.
This guide covers both local development using Kind and production deployment on AWS EKS. Follow along with the complete sample repository on GitHub.
Local Development with Kind
We'll start by deploying locally using Kind, a tool for running Kubernetes clusters using Docker containers. While Kind was designed for testing Kubernetes itself, it's excellent for local development.
Prerequisites: Install Docker, Kind, kubectl, and Helm for the local setup; the AWS CLI and eksctl are needed later for the EKS deployment.
Creating a Kubernetes Cluster
Create a local Kubernetes cluster with the following command:
kind create cluster --image=kindest/node:v1.23.0
Installing the KubeRay Operator
Install the KubeRay operator using Helm:
$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update
# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0
# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kuberay-operator-68cc555c9-qc7cf 1/1 Running 0 22s
Configuring RayService Custom Resource
A RayService manages two key components: RayClusters and Ray Serve applications. It provides Kubernetes-native support, in-place updates, zero-downtime upgrades, and service high availability.
Custom Docker Image Setup
To deploy your application, extend the official Ray Docker images with your dependencies. The rayproject organization maintains the base images. Our sample application uses this Dockerfile:
# pull official base image
FROM rayproject/ray:nightly-py310-cpu
# install requirements.txt
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# set work directory
WORKDIR /serve_app
# set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# copy app folder into workdir
COPY app /serve_app/app
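For context, a minimal Ray Serve application packaged by such an image might look like the sketch below (the module path app/main.py and the names Hello and deployment are illustrative placeholders, not taken from the sample repo):
# app/main.py - a minimal Ray Serve application (illustrative sketch)
from ray import serve

@serve.deployment
class Hello:
    # Handle HTTP requests routed to this deployment by Ray Serve.
    async def __call__(self, request) -> str:
        return "Hello from Ray Serve!"

# Top-level bound deployment that a Serve config's import_path can point at.
deployment = Hello.bind()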
Deploying the RayService
Deploy your custom Docker image by configuring the RayService as shown in the sample YAML file.
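If you don't have the sample file to hand, a RayService manifest follows this general shape. A trimmed sketch, assuming KubeRay 1.x CRDs (ray.io/v1); the image, import_path, and replica counts are placeholders:
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  # The Serve application(s) to run on the managed RayCluster.
  serveConfigV2: |
    applications:
      - name: sample-app
        import_path: app.main:deployment
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: <your-registry>/serve-app:latest  # the custom image built above
    workerGroupSpecs:
      - groupName: workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: <your-registry>/serve-app:latest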
Apply the RayService configuration:
kubectl apply -f custom.yaml
Local Development Tip: If you prefer not to push images to Docker Hub, you can load local images directly into your Kind cluster.
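For example, assuming the image is tagged serve-app:latest (a placeholder name):
kind load docker-image serve-app:latest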
Monitoring RayService Status
Monitor your deployment using standard kubectl
commands to track health and readiness:
kubectl get rayservice
kubectl describe rayservice rayservice-sample
When Ray Serve applications are healthy, KubeRay automatically creates two services:
- Head service (rayservice-sample-head-svc): Cluster management and dashboard
- Serve service (rayservice-sample-serve-svc): Application queries and requests
Accessing the Application
Set up port forwarding to query your RayService locally:
# Forward application requests
kubectl port-forward service/rayservice-sample-serve-svc 8000
# Forward dashboard access
kubectl port-forward svc/rayservice-sample-head-svc --address 0.0.0.0 8265:8265
Access the Ray dashboard at http://localhost:8265/#/serve to monitor your deployment.
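With the forward in place, you can send a test request to your Serve application (the route and response depend on your app; / is just an example):
curl http://localhost:8000/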
Production Deployment on AWS EKS
Deploying on AWS EKS builds upon the local setup with additional considerations for cloud resources, networking, and permissions.
EKS Cluster Setup
Follow the AWS EKS documentation to complete these prerequisites:
- VPC Configuration: Create an Amazon VPC with public and private subnets meeting EKS requirements
- IAM Roles: Create cluster IAM roles with required EKS managed policies
- Cluster Creation: Deploy the EKS cluster through the AWS console or CLI
EKS clusters require proper IAM permissions to manage AWS resources on your behalf, so ensure all necessary service roles are configured.
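If you prefer the CLI, the whole cluster can be stood up with eksctl. A minimal sketch, using the cluster name and region that appear later in this guide (the Kubernetes version is a placeholder):
eksctl create cluster \
  --name ray-cluster \
  --region eu-west-2 \
  --version 1.28 \
  --without-nodegroup
The --without-nodegroup flag defers node provisioning to the next step, where we create managed node groups explicitly.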
Local Configuration for EKS Access
Configure your local environment to communicate with the EKS cluster:
aws eks update-kubeconfig --region eu-west-2 --name ray-cluster
Verify connectivity by checking the default Kubernetes service:
kubectl get svc
# Expected output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3m45s
Node Group Configuration
EKS offers multiple node types, including Fargate and managed node groups. For Ray workloads, managed node groups offer better control and performance. See the Amazon EKS nodes documentation for detailed comparisons.
Create managed node groups through the EKS console, specifying the subnets and IAM roles from previous steps.
Best Practice: Avoid running GPU workloads on Ray head nodes to prevent resource contention.
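As an alternative to the console, a managed node group can also be created with eksctl. A sketch, assuming the cluster from the previous step (instance type and node counts are illustrative):
eksctl create nodegroup \
  --cluster ray-cluster \
  --region eu-west-2 \
  --name ray-workers \
  --node-type m5.xlarge \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4 \
  --managed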
RayService Deployment
Deploy your RayService using the same configuration from the local deployment section:
kubectl apply -f custom.yaml
External Access with AWS Load Balancer
While port forwarding works for development, production deployments require proper external access through Kubernetes ingress.
Installing AWS Load Balancer Controller
Follow the official installation guide to install the AWS Load Balancer Controller. This controller manages Elastic Load Balancers for Kubernetes clusters.
Important: The controller runs on worker nodes and requires IAM permissions for ALB/NLB API access.
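Once the IAM-enabled service account is in place (see the official guide), the controller itself is a Helm install. A sketch, assuming the cluster name from earlier and a pre-created service account:
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=ray-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller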
Configuring ALB Ingress
Configure ALB Ingress for external Ray API access following the KubeRay ingress documentation. Use this sample ingress.yaml as a starting point:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-cluster-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/subnets: subnet-1, subnet-2
    alb.ingress.kubernetes.io/target-type: ip
    # Health check settings. The ALB needs a health check
    # to route traffic to healthy pods.
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /-/routes
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rayservice-sample-serve-svc # Serve service
                port:
                  number: 8000 # default HTTP port number for serving requests
Key Configuration Requirements
Subnet Configuration (alb.ingress.kubernetes.io/subnets):
- Include at least two subnets across different Availability Zones
- Use public subnets (with "Auto-assign public IPv4 address" enabled)
- Each AZ can contain only one subnet
Service Targeting:
- Point to rayservice-sample-serve-svc for high availability
- This service provides automatic traffic routing and health-aware load balancing
- Maintains availability during upgrades and failure scenarios
Deploying the Ingress
Apply the ingress configuration and monitor its status:
kubectl apply -f ingress.yaml
kubectl describe ingress ray-cluster-ingress
Verify the ALB creation in the AWS Console (EC2 → Load Balancing → Load Balancers). The ALB name follows the pattern k8s-default-<ingress-name>. Use the ALB DNS name to access your Ray API externally.
Log Management and Persistence
Ray doesn't provide native log storage, requiring manual lifecycle management. By default, Ray writes logs to /tmp/ray/session_*/logs on each pod's filesystem, including both application and system logs.
For production deployments, implement log processing using tools like Fluent Bit. The recommended approach uses a log-processing sidecar for each Ray pod, sharing the /tmp/ray directory via volume mounts.
Sidecar Configuration Options:
- Stream logs to sidecar stdout for kubectl access (see the command after this list)
- Export logs to external services (CloudWatch, Elasticsearch, etc.)
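With the stdout option, the sidecar's output is available through the standard log command (the pod name is a placeholder; the container name matches the sidecar defined later in this guide):
kubectl logs <ray-head-pod> -c fluentbit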
Fluent Bit Configuration
Create a ConfigMap with Fluent Bit configuration. This example tails Ray logs and exports to CloudWatch:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name tail
        Path /tmp/ray/session_latest/logs/*
        Tag ray
        Path_Key true
        Refresh_Interval 5
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region us-east-1
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group On
Configuration Notes:
- Output flexibility: Use [OUTPUT] clauses to export to various storage backends supported by Fluent Bit
- File name tracking: Path_Key true includes filenames in log records
- Refresh optimization: Refresh_Interval 5 checks for new files every 5 seconds (vs. the default 60s), improving log visibility since /tmp/ray/session_latest/logs/ is created dynamically
Volume Configuration
Add two volumes to each RayCluster pod template:
volumes:
  - name: ray-logs
    emptyDir: {}
  - name: fluentbit-config
    configMap:
      name: fluentbit-config
Configure the Ray container volume mount:
volumeMounts:
  - mountPath: /tmp/ray
    name: ray-logs
Sidecar Container Configuration
Add the Fluent Bit sidecar to each Ray pod in your RayCluster CR:
- name: fluentbit
  image: fluent/fluent-bit:1.9.6
  # These resource requests for Fluent Bit should be sufficient in production.
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 128Mi
  volumeMounts:
    - mountPath: /tmp/ray
      name: ray-logs
    - mountPath: /fluent-bit/etc/fluent-bit.conf
      subPath: fluent-bit.conf
      name: fluentbit-config
The volume mounts provide:
- ray-logs volume: Sidecar access to Ray's log files
- fluentbit-config volume: Access to logging configuration from ConfigMap
Conclusion
Deploying distributed Ray applications on Kubernetes, especially on AWS EKS, delivers enterprise-grade scalability and operational efficiency. This guide covered the complete journey, from local development with Kind to production deployment with external access and centralized logging.
The integration of Ray with Kubernetes provides essential benefits: robust fault tolerance, seamless scaling, standardized deployment workflows, and comprehensive observability. By leveraging cloud-native patterns and AWS services, teams can focus on application logic while Kubernetes handles infrastructure complexity.
Resources
- Custom Docker Images
- RayCluster Configuration
- Ray Docs: Launching Ray Clusters on AWS
- Scaling AI and Machine Learning Workloads with Ray on AWS
- Deploying Ray Cluster for AI/ML workloads on a Kubernetes Cluster
- Cluster Management CLI
- Ray Serve Production Guide
- Deploy on Kubernetes
- RayService Quickstart
- Getting started with Amazon EKS – AWS Management Console and AWS CLI
- Start Amazon EKS Cluster with GPUs for KubeRay
- ALB configuration
- KubeRay Ingress
- AWS Load Balancer Controller installation
- Configuring a Kubernetes service account to assume an IAM role
- Deploy Ray Serve Applications
- Using Prometheus and Grafana
- KubeRay Autoscaling