MLOps: Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay

Published: November 15, 2023

In this post I walk through the end-to-end process of deploying a distributed Python server using Kubernetes. The Ray Serve docs recommend deploying to production on Kubernetes, using the RayService controller provided as part of KubeRay.

This integration offers the scalability and user experience of Ray Serve with the operational benefits of Kubernetes, including the ability to integrate with existing Kubernetes-based applications. RayService simplifies production deployment by managing health checks, status reporting, failure recovery, and updates for you.

A RayService Custom Resource (CR) encapsulates a multi-node Ray Cluster and a Serve application that runs on top of it into a single Kubernetes manifest. Deploying, upgrading, and getting the status of the application can be done using standard kubectl commands.

This guide covers both local Kubernetes deployment using Kind, as well as a cloud-based deployment on AWS using EKS. You can follow along with my sample repo on my GitHub.

Deployment Locally: Kind

First, we will deploy locally on Kind, a tool for running local Kubernetes clusters using Docker container "nodes". Kind was primarily designed for testing Kubernetes itself, but it is also widely used for local development. You will also need the AWS CLI, kubectl, Helm and eksctl, alongside Docker.

1. Creating a Kubernetes Cluster with Kind

To start, create a local Kubernetes cluster:

kind create cluster --image=kindest/node:v1.23.0

2. Install the KubeRay operator

We then install the KubeRay operator via Helm:

$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update
# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0
# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
kuberay-operator-68cc555c9-qc7cf   1/1     Running   0          22s

3. Setting Up a RayService Custom Resource (CR)

A RayService manages two components: RayClusters and Ray Serve applications, and offers Kubernetes-native support, in-place updates, zero downtime upgrades, and service high availability.

Creating the CR Yaml

To package and serve your code in a custom Docker image, you need to extend the official Ray Docker images with your dependencies and package your Serve application. The rayproject organization maintains Docker images required for Ray. For example, our sample app uses a Dockerfile extending the rayproject/ray image.

# pull official base image
FROM rayproject/ray:nightly-py310-cpu
# install requirements.txt
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# set work directory
WORKDIR /serve_app
# copy app folder into workdir
COPY app /serve_app/app

Custom Docker images can be run in KubeRay by adjusting the RayService configuration as seen in the YAML file for the sample app.
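As a rough sketch of what that adjustment looks like (field values are placeholders for your own image and Serve app import path, and the `apiVersion` may be `ray.io/v1` depending on your KubeRay version), a RayService CR pointing at a custom image has this shape:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: app.main:entrypoint  # placeholder: your Serve app's import path
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: my-registry/my-serve-app:latest  # placeholder: your custom image
    workerGroupSpecs:
      - groupName: worker
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: my-registry/my-serve-app:latest  # same custom image
```

Both the head and worker containers reference the same custom image so that your application code and dependencies are present on every node.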

To manage the Ray Serve application locally, create and update a RayService CR:

kubectl apply -f custom.yaml

Sometimes you may not want to push an image to Docker Hub and instead want to run a pod based on an image you just built on your PC. To do this, load the image into your cluster, then create the RayService CR as before:

kind load docker-image <image-name>
kubectl apply -f custom.yaml

4. Check RayService Status

Once RayService is created, monitor its status via standard kubectl commands, which will provide insights into the health and readiness of the application.

When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource. For example, rayservice-sample-head-svc and rayservice-sample-serve-svc. The latter is the one that can be used to send queries to the Serve application.
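For example (the resource and service names here follow the sample above; yours may differ):

```shell
# Check the overall RayService status and events
kubectl get rayservices
kubectl describe rayservice rayservice-sample

# Once healthy, both the head service and the Serve service should appear
kubectl get services
```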

5. Querying the Application

For querying the RayService locally, set up port forwarding and access the service directly:

kubectl port-forward service/rayservice-sample-serve-svc 8000

Forward the dashboard port to localhost as well, and check the Serve page in the Ray dashboard at http://localhost:8265/#/serve

kubectl port-forward svc/rayservice-sample-head-svc 8265:8265
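With the port-forward in place, you can send a request to the Serve application. The path and payload below are placeholders that depend on your app's routes:

```shell
curl -X POST http://localhost:8000/ \
  -H "Content-Type: application/json" \
  -d '{"input": "example"}'
```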

Deploying on AWS with EKS

The process for deploying on AWS with EKS includes the local deployment steps, but it has some extra considerations for cloud-specific resources and permissions.

1. Cluster Creation on EKS

First follow the first two steps in this AWS documentation to create an Amazon VPC with public and private subnets that meets Amazon EKS requirements. You also need to create a cluster IAM role and attach the required Amazon EKS IAM managed policy to it.

Kubernetes clusters managed by Amazon EKS make calls to other AWS services on your behalf to manage the resources you use with the service, so make sure the required permissions are in place. Finally, add the cluster in the UI.

2. PC Configuration for EKS Communication

Create a kubeconfig file for the EKS cluster. The settings in this file enable the kubectl CLI to communicate with the cluster. Create or update a kubeconfig file for your cluster by running the following command:

aws eks update-kubeconfig --region eu-west-2 --name ray-cluster

Now running kubectl get svc should return:

kubernetes   ClusterIP   <none>        443/TCP   3m45s

3. Node Creation

You can create a cluster with either Fargate or managed node groups. To learn more about each type, see Amazon EKS nodes. For this use case, I create a managed node group, specifying the subnets and node IAM role created in the previous steps.

Create the node groups in the EKS UI. Typically, avoid running GPU workloads on the Ray head.
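If you prefer the CLI over the UI, eksctl can create an equivalent managed node group. The cluster name, node group name, instance type, and sizes below are illustrative:

```shell
eksctl create nodegroup \
  --cluster ray-cluster \
  --name ray-workers \
  --node-type m5.xlarge \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4
```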

4. RayService Custom Resource Deployment

Deploy the Ray service the same way we did when we deployed locally.

5. AWS Load Balancer Controller Installation

When we deployed locally, we used port forwarding, and that was okay for dev. To access EKS from an endpoint, it is better to access it by configuring a Kubernetes ingress. The first step is to follow the installation instructions to set up the AWS Load Balancer controller. AWS Load Balancer Controller is a controller to help manage Elastic Load Balancers for a Kubernetes cluster.

The controller runs on the worker nodes, so it also needs access to the AWS ALB/NLB APIs via IAM permissions.
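Condensed from the AWS installation instructions, the Helm install looks roughly like this, assuming the IAM policy and service account for the controller have already been created (e.g. with eksctl), and using the cluster name from earlier:

```shell
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=ray-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```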

6. AWS Application Load Balancer (ALB) Ingress

Set up the ALB Ingress for external access to the Ray API as detailed in the KubeRay documentation. Refer to the example ingress.yaml in the demo repo for a starting point:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-cluster-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/subnets: subnet-1, subnet-2
    alb.ingress.kubernetes.io/target-type: ip
    # Health Check Settings. Health check is needed for
    # ALB to route traffic to the healthy pod.
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /-/routes
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rayservice-dummy-serve-svc # Serve service
                port:
                  number: 8000 # default HTTP port number for serving requests

Make sure that:

  1. Annotations
    • Include at least two subnets.
    • One Availability Zone (e.g. us-west-2a) can have at most one subnet.
    • In this example, you need to select public subnets (subnets whose "Auto-assign public IPv4 address" setting is Yes in the AWS dashboard).
  2. Set the service name to the Serve service you want to direct traffic to.
    • rayservice-sample-serve-svc is HA in general. It routes traffic among all the workers that have Serve deployments and always tries to point to the healthy cluster, even during upgrades or failures.

Apply it and check the status:

kubectl apply -f ray-service-alb-ingress.yaml
kubectl describe ingress ray-cluster-ingress

You should now be able to check ALB on AWS (EC2 -> Load Balancing -> Load Balancers). The name of the ALB should be along the lines of k8s-default-<name>. Check the ALB DNS Name to interact with the newly deployed Ray API!

7. Log Persistence

Similar to Kubernetes, Ray does not provide a native storage solution for log data. You need to manage the lifecycle of the logs yourself. By default, Ray writes logs to files in the directory /tmp/ray/session_*/logs on each Ray pod's file system, including application and system logs.

There are a number of open source log processing tools available within the Kubernetes ecosystem, but I use Fluent Bit. One way to go about processing logs is by configuring a log-processing sidecar for each Ray pod. Ray containers should be configured to share the /tmp/ray directory with the logging sidecar via a volume mount. You can configure the sidecar to do either of the following:

  • Stream Ray logs to the sidecar's stdout.
  • Export logs to an external service.

First create a ConfigMap with configuration for Fluent Bit. Below is a minimal example, which tells a Fluent Bit sidecar to (1) tail Ray logs and (2) output the logs to CloudWatch.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name tail
        Path /tmp/ray/session_latest/logs/*
        Tag ray
        Path_Key true
        Refresh_Interval 5
    [OUTPUT]
        Name cloudwatch_logs
        Match   *
        region us-east-1
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group On

A few notes on the above config:

  • You can use an [OUTPUT] clause to export logs to a bunch of storage backends supported by Fluent Bit.
  • The Path_Key true line above ensures that file names are included in the log records emitted by Fluent Bit.
  • The Refresh_Interval 5 line asks Fluent Bit to refresh the list of files in the log directory once per 5 seconds, rather than the default 60. The reason is that the directory /tmp/ray/session_latest/logs/ does not exist initially (Ray must create it first). Setting the Refresh_Interval low allows us to see logs in the Fluent Bit container's stdout sooner.

For each pod template in our RayCluster CR, we need to add two volumes: One volume for Ray's logs and another volume to store Fluent Bit configuration from the ConfigMap.

volumes:
  - name: ray-logs
    emptyDir: {}
  - name: fluentbit-config
    configMap:
      name: fluentbit-config

Add the following volume mount to the Ray container’s configuration:

  - mountPath: /tmp/ray
    name: ray-logs

Finally, add the Fluent Bit sidecar container to each Ray pod config in your RayCluster CR:

- name: fluentbit
  image: fluent/fluent-bit:1.9.6
  # These resource requests for Fluent Bit should be sufficient in production.
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 128Mi
  volumeMounts:
    - mountPath: /tmp/ray
      name: ray-logs
    - mountPath: /fluent-bit/etc/fluent-bit.conf
      subPath: fluent-bit.conf
      name: fluentbit-config

Mounting the ray-logs volume gives the sidecar container access to Ray's logs. The fluentbit-config volume gives the sidecar access to logging configuration.
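To verify that the sidecar is picking up logs, tail its stdout (the pod name is a placeholder for one of your Ray pods):

```shell
kubectl logs <ray-head-pod-name> -c fluentbit
```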


Deploying a distributed Ray Python server with Kubernetes, particularly on AWS EKS, offers scalable and efficient server management. This guide provides a step-by-step approach, from setting up a local Kind cluster to deploying on AWS with external access and log management. Integrating Ray with Kubernetes enables robust, scalable management of Ray Serve applications, harnessing the full power of cloud computing as well as container orchestration.