MLOps: Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay

Published: November 15, 2023

In this post I walk through the end-to-end process of deploying a distributed Python server using Kubernetes. The Ray Serve docs recommend deploying to production on Kubernetes, using the RayService controller provided as part of KubeRay.

This integration offers the scalability and user experience of Ray Serve with the operational benefits of Kubernetes, including the ability to integrate with existing Kubernetes-based applications. RayService simplifies production deployment by managing health checks, status reporting, failure recovery, and updates for you.

A RayService Custom Resource (CR) encapsulates a multi-node Ray Cluster and a Serve application that runs on top of it into a single Kubernetes manifest. Deploying, upgrading, and getting the status of the application can be done using standard kubectl commands.

This guide covers both local Kubernetes deployment using Kind, as well as a cloud-based deployment on AWS using EKS. You can follow along with my sample repo on my GitHub.

Deployment Locally: Kind

First, we will deploy locally on Kind, a tool for running local Kubernetes clusters using Docker container "nodes". Kind was primarily designed for testing Kubernetes itself, but it is also widely used for local development. You will also need the AWS CLI, kubectl, Helm and eksctl, alongside Docker.

1. Creating a Kubernetes Cluster with Kind

To start, create a local Kubernetes cluster:

kind create cluster --image=kindest/node:v1.23.0

2. Install the KubeRay operator

We then install the KubeRay operator via Helm:

$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update
# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0
# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
kuberay-operator-68cc555c9-qc7cf   1/1     Running   0          22s

3. Setting Up a RayService Custom Resource (CR)

A RayService manages two components: RayClusters and Ray Serve applications, and offers Kubernetes-native support, in-place updates, zero downtime upgrades, and service high availability.

Creating the CR Yaml

To package and serve your code in a custom Docker image, you need to extend the official Ray Docker images with your dependencies and package your Serve application. The rayproject organization maintains Docker images required for Ray. For example, our sample app uses a Dockerfile extending the rayproject/ray image.

# pull official base image
FROM rayproject/ray:nightly-py310-cpu
# install requirements.txt
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# set work directory
WORKDIR /serve_app
# copy app folder into workdir
COPY app /serve_app/app

Custom Docker images can be run in KubeRay by adjusting the RayService configuration as seen in the YAML file for the sample app.
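As a rough sketch of what that adjustment looks like (field values are placeholders for your own image and Serve app import path, and the `apiVersion` may be `ray.io/v1` depending on your KubeRay version), a RayService CR pointing at a custom image has this shape:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: app.main:entrypoint  # placeholder: your Serve app's import path
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: my-registry/my-serve-app:latest  # placeholder: your custom image
    workerGroupSpecs:
      - groupName: worker
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: my-registry/my-serve-app:latest  # same custom image
```

Both the head and worker containers reference the same custom image so that your application code and dependencies are present on every node.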

To manage the Ray Serve application locally, create and update a RayService CR:

kubectl apply -f custom.yaml

Sometimes you may not want to push an image to Docker Hub and instead want to run a pod based on an image you just built on your PC. To do this, load the image into your cluster, then create the RayService CR as before:

kind load docker-image <image-name>
kubectl apply -f custom.yaml

4. Check RayService Status

Once RayService is created, monitor its status via standard kubectl commands, which will provide insights into the health and readiness of the application.

When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource. For example, rayservice-sample-head-svc and rayservice-sample-serve-svc. The latter is the one that can be used to send queries to the Serve application.
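For example (the resource and service names here follow the sample above; yours may differ):

```shell
# Check the overall RayService status and events
kubectl get rayservices
kubectl describe rayservice rayservice-sample

# Once healthy, both the head service and the Serve service should appear
kubectl get services
```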

5. Querying the Application

For querying the RayService locally, set up port forwarding and access the service directly:

kubectl port-forward service/rayservice-sample-serve-svc 8000

Forward the dashboard port to localhost as well, and check the Serve page in the Ray dashboard at http://localhost:8265/#/serve

kubectl port-forward svc/rayservice-sample-head-svc 8265:8265
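With the port-forward in place, you can send a request to the Serve application. The path and payload below are placeholders that depend on your app's routes:

```shell
curl -X POST http://localhost:8000/ \
  -H "Content-Type: application/json" \
  -d '{"input": "example"}'
```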

Deploying on AWS with EKS

The process for deploying on AWS with EKS includes the local deployment steps, but it has some extra considerations for cloud-specific resources and permissions.

1. Cluster Creation on EKS

First follow the first two steps in this AWS documentation to create an Amazon VPC with public and private subnets that meets Amazon EKS requirements. You also need to create a cluster IAM role and attach the required Amazon EKS IAM managed policy to it.

Kubernetes clusters managed by Amazon EKS make calls to other AWS services on your behalf to manage the resources you use with the service, so make sure the required permissions are in place. Finally, add the cluster in the UI.

2. PC Configuration for EKS Communication

Create a kubeconfig file for the EKS cluster. The settings in this file enable the kubectl CLI to communicate with the cluster. Create or update a kubeconfig file for your cluster by running the following command:

aws eks update-kubeconfig --region eu-west-2 --name ray-cluster

Now running kubectl get svc should return:

kubernetes   ClusterIP   <none>        443/TCP   3m45s

3. Node Creation

You can create a cluster with either Fargate or managed node groups. To learn more about each type, see Amazon EKS nodes. For this use case, I create a managed node group, specifying the subnets and node IAM role created in the previous steps.

Create the node groups in the EKS UI. Typically, avoid running GPU workloads on the Ray head.
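If you prefer the CLI over the UI, eksctl can create an equivalent managed node group. The cluster name, node group name, instance type, and sizes below are illustrative:

```shell
eksctl create nodegroup \
  --cluster ray-cluster \
  --name ray-workers \
  --node-type m5.xlarge \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4
```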

4. RayService Custom Resource Deployment

Deploy the Ray service the same way we did when we deployed locally.

5. AWS Load Balancer Controller Installation

When we deployed locally, we used port forwarding, and that was okay for dev. To access EKS from an endpoint, it is better to access it by configuring a Kubernetes ingress. The first step is to follow the installation instructions to set up the AWS Load Balancer controller. AWS Load Balancer Controller is a controller to help manage Elastic Load Balancers for a Kubernetes cluster.

The controller runs on the worker nodes, so it also needs access to the AWS ALB/NLB APIs via IAM permissions.
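Condensed from the AWS installation instructions, the Helm install looks roughly like this, assuming the IAM policy and service account for the controller have already been created (e.g. with eksctl), and using the cluster name from earlier:

```shell
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=ray-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```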

6. AWS Application Load Balancer (ALB) Ingress

Set up the ALB Ingress for external access to the Ray API as detailed in the KubeRay documentation. Refer to the example ingress.yaml in the demo repo for a starting point:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-cluster-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/subnets: subnet-1, subnet-2
    alb.ingress.kubernetes.io/target-type: ip
    # Health Check Settings. Health check is needed for
    # ALB to route traffic to the healthy pod.
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /-/routes
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rayservice-dummy-serve-svc # Serve service
                port:
                  number: 8000 # default HTTP port number for serving requests

Make sure that:

  1. Annotations
    • Include at least two subnets.
    • One Availability Zone (e.g. us-west-2a) can have at most one subnet.
    • In this example, you need to select public subnets (subnets whose "Auto-assign public IPv4 address" setting is Yes in the AWS dashboard).
  2. Set the service name to the Serve service you want to direct traffic to.
    • rayservice-sample-serve-svc is HA in general. It routes traffic among all the workers that have Serve deployments and always tries to point to the healthy cluster, even during upgrades or failures.

Apply it and check the status:

kubectl apply -f ray-service-alb-ingress.yaml
kubectl describe ingress ray-cluster-ingress

You should now be able to check ALB on AWS (EC2 -> Load Balancing -> Load Balancers). The name of the ALB should be along the lines of k8s-default-<name>. Check the ALB DNS Name to interact with the newly deployed Ray API!

7. Log Persistence

Similar to Kubernetes, Ray does not provide a native storage solution for log data. You need to manage the lifecycle of the logs yourself. By default, Ray writes logs to files in the directory /tmp/ray/session_*/logs on each Ray pod's file system, including application and system logs.

There are a number of open source log processing tools available within the Kubernetes ecosystem, but I use Fluent Bit. One way to go about processing logs is by configuring a log-processing sidecar for each Ray pod. Ray containers should be configured to share the /tmp/ray directory with the logging sidecar via a volume mount. You can configure the sidecar to do either of the following:

  • Stream Ray logs to the sidecar's stdout.
  • Export logs to an external service.

First create a ConfigMap with configuration for Fluent Bit. Below is a minimal example, which tells a Fluent Bit sidecar to (1) tail Ray logs and (2) output the logs to CloudWatch.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name tail
        Path /tmp/ray/session_latest/logs/*
        Tag ray
        Path_Key true
        Refresh_Interval 5
    [OUTPUT]
        Name cloudwatch_logs
        Match   *
        region us-east-1
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group On

A few notes on the above config:

  • You can use an [OUTPUT] clause to export logs to a bunch of storage backends supported by Fluent Bit.
  • The Path_Key true line above ensures that file names are included in the log records emitted by Fluent Bit.
  • The Refresh_Interval 5 line asks Fluent Bit to refresh the list of files in the log directory once per 5 seconds, rather than the default 60. The reason is that the directory /tmp/ray/session_latest/logs/ does not exist initially (Ray must create it first). Setting the Refresh_Interval low allows us to see logs in the Fluent Bit container's stdout sooner.

For each pod template in our RayCluster CR, we need to add two volumes: One volume for Ray's logs and another volume to store Fluent Bit configuration from the ConfigMap.

volumes:
  - name: ray-logs
    emptyDir: {}
  - name: fluentbit-config
    configMap:
      name: fluentbit-config

Add the following volume mount to the Ray container’s configuration:

  - mountPath: /tmp/ray
    name: ray-logs

Finally, add the Fluent Bit sidecar container to each Ray pod config in your RayCluster CR:

- name: fluentbit
  image: fluent/fluent-bit:1.9.6
  # These resource requests for Fluent Bit should be sufficient in production.
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 128Mi
  volumeMounts:
    - mountPath: /tmp/ray
      name: ray-logs
    - mountPath: /fluent-bit/etc/fluent-bit.conf
      subPath: fluent-bit.conf
      name: fluentbit-config

Mounting the ray-logs volume gives the sidecar container access to Ray's logs. The fluentbit-config volume gives the sidecar access to logging configuration.
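To verify that the sidecar is picking up logs, tail its stdout (the pod name is a placeholder for one of your Ray pods):

```shell
kubectl logs <ray-head-pod-name> -c fluentbit
```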


Deploying a distributed Ray Python server with Kubernetes, particularly on AWS EKS, offers scalable and efficient server management. This guide provides a step-by-step approach, from setting up a local Kind cluster to deploying on AWS with external access and log management. Integrating Ray with Kubernetes enables robust, scalable management of Ray Serve applications, harnessing the full power of cloud computing as well as container orchestration.