In this post I go through the end-to-end process of deploying a distributed Python server using Kubernetes. The Ray Serve docs recommend deploying to production on Kubernetes, using the RayService controller that is provided as part of KubeRay.
This integration offers the scalability and user experience of Ray Serve with the operational benefits of Kubernetes, including the ability to integrate with existing Kubernetes-based applications. RayService simplifies production deployment by managing health checks, status reporting, failure recovery, and updates for you.
A RayService Custom Resource (CR) encapsulates a multi-node Ray Cluster and a Serve application that runs on top of it into a single Kubernetes manifest. Deploying, upgrading, and getting the status of the application can be done using standard kubectl commands.
This guide covers both local Kubernetes deployment using Kind, as well as a cloud-based deployment on AWS using EKS. You can follow along with my sample repo on my GitHub.
Deployment Locally: Kind
First, we will deploy locally on Kind, a tool for running local Kubernetes clusters using Docker container "nodes". Kind was primarily designed for testing Kubernetes itself, but it works well for local development. You will also need the AWS CLI, kubectl, Helm and eksctl, alongside Docker.
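On macOS, one way to get these prerequisites is via Homebrew (assuming the standard Homebrew formulae for each tool):
brew install kind kubectl helm eksctl awscli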
1. Creating a Kubernetes Cluster with Kind
To start, create a local Kubernetes cluster:
kind create cluster --image=kindest/node:v1.23.0
2. Install the KubeRay operator
We then install the KubeRay operator via Helm:
$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update
# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0
# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kuberay-operator-68cc555c9-qc7cf 1/1 Running 0 22s
3. Setting Up a RayService Custom Resource (CR)
A RayService manages two components: RayClusters and Ray Serve applications, and offers Kubernetes-native support, in-place updates, zero downtime upgrades, and service high availability.
Creating the CR Yaml
To package and serve your code in a custom Docker image, you need to extend the official Ray Docker images with your dependencies and package your Serve application. The rayproject organization maintains Docker images required for Ray. For example, our sample app uses a Dockerfile extending the rayproject/ray image.
# pull official base image
FROM rayproject/ray:nightly-py310-cpu
# install requirements.txt
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# set work directory
WORKDIR /serve_app
# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# Copy app folder into workdir
COPY app /serve_app/app
Custom Docker images can be run in KubeRay by adjusting the RayService configuration, as seen in the YAML file for the sample app.
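To make that concrete, here is a heavily trimmed sketch of what such a RayService manifest can look like; the metadata name, application name, import_path and image tag are placeholders, and the full working configuration (including worker group specs) lives in the repo's YAML file.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: app                          # placeholder application name
        import_path: app.main:deployment   # placeholder import path into /serve_app/app
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: your-registry/serve-app:latest   # the custom image built above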
To manage the Ray Serve application locally, create and update a RayService CR:
kubectl apply -f custom.yaml
Sometimes you may not want to push an image to Docker Hub and would rather run a pod based on an image that you just built on your PC. To do this, you need to load the image into the Kind cluster before applying the RayService CR.
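For example, assuming the image was built and tagged locally as serve-app:latest (a hypothetical name), something like the following should work:
# Load the locally built image into the Kind cluster so pods can use it without a registry
kind load docker-image serve-app:latest
# Then apply (or re-apply) the RayService CR that references that image
kubectl apply -f custom.yaml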
4. Check RayService Status
Once RayService is created, monitor its status via standard kubectl commands, which will provide insights into the health and readiness of the application.
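For example, assuming the CR is named rayservice-sample (as in the KubeRay samples):
# List RayService CRs and their overall status
kubectl get rayservices
# Show detailed status, conditions and events for one RayService
kubectl describe rayservice rayservice-sample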
When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource, for example rayservice-sample-head-svc and rayservice-sample-serve-svc. The latter is the one that can be used to send queries to the Serve application.
5. Querying the Application
For querying the RayService locally, set up port forwarding and access the service directly:
kubectl port-forward service/rayservice-sample-serve-svc 8000
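With the port forwarded, you can send a request straight to the Serve HTTP proxy; the route below is a placeholder and depends on the route_prefix your Serve application defines:
# Query the Serve application through the forwarded port (route is a placeholder)
curl http://localhost:8000/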
Forward the dashboard port to localhost as well, and check the Serve page in the Ray dashboard at http://localhost:8265/#/serve:
kubectl port-forward svc/rayservice-sample-head-svc --address 0.0.0.0 8265:8265
Deploying on AWS with EKS
The process for deploying on AWS with EKS follows the same steps as the local deployment, with some extra considerations for cloud-specific resources and permissions.
1. Cluster Creation on EKS
First follow the first two steps in this AWS documentation to create an Amazon VPC with public and private subnets that meets Amazon EKS requirements. You also need to create a cluster IAM role and attach the required Amazon EKS IAM managed policy to it.
Kubernetes clusters managed by Amazon EKS make calls to other AWS services on your behalf to manage the resources that you use with the service, so make sure you create the required permissions. Finally, create the cluster in the EKS console.
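If you prefer the CLI to the console, eksctl can also create the cluster; this is a sketch, with the name and region simply matching the ones used later in this guide:
# Create the EKS control plane only; node groups are added in a later step
eksctl create cluster --name ray-cluster --region eu-west-2 --without-nodegroup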
2. PC Configuration for EKS Communication
Create a kubeconfig file for the EKS cluster. The settings in this file enable the kubectl CLI to communicate with the cluster. Create or update a kubeconfig file for your cluster by running the following command:
aws eks update-kubeconfig --region eu-west-2 --name ray-cluster
Now running kubectl get svc should return:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3m45s
3. Node Creation
You can create a cluster with either Fargate or managed node types. To learn more about each type, see Amazon EKS nodes. For this use case, I create a managed node group, specifying the subnets and node IAM role created in previous steps.
Create the node groups in the EKS UI. Typically, avoid running GPU workloads on the Ray head.
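If you would rather stay on the command line, a managed node group can also be created with eksctl; the instance type and node counts below are placeholders:
# Create a managed node group for the Ray pods (instance type and counts are examples)
eksctl create nodegroup \
  --cluster ray-cluster \
  --region eu-west-2 \
  --name ray-nodes \
  --node-type m5.xlarge \
  --nodes 2 --nodes-min 1 --nodes-max 3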
4. RayService Custom Resource Deployment
Deploy the Ray service the same way we did when we deployed locally.
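That is, apply the same RayService manifest as before (assuming the image it references is pushed to a registry the EKS nodes can pull from, rather than loaded locally):
kubectl apply -f custom.yaml
kubectl get rayservices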
5. AWS Load Balancer Controller Installation
When we deployed locally, we used port forwarding, which was fine for development. To reach the application on EKS from an external endpoint, it is better to configure a Kubernetes ingress. The first step is to follow the installation instructions to set up the AWS Load Balancer Controller, a controller that manages Elastic Load Balancers for a Kubernetes cluster.
The controller runs on the worker nodes, so it needs access to the AWS ALB/NLB APIs with IAM permissions as well.
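A minimal sketch of the Helm install, assuming the IAM role and the aws-load-balancer-controller service account have already been created per the linked instructions, and that the cluster is named ray-cluster:
helm repo add eks https://aws.github.io/eks-charts
helm repo update
# Install the controller into kube-system, reusing the pre-created service account
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=ray-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller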
6. AWS Application Load Balancer (ALB) Ingress
Set up the ALB Ingress for external access to the Ray API as detailed in the KubeRay documentation. Refer to the example ingress.yaml in the demo repo for a starting point:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-cluster-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/subnets: subnet-1, subnet-2
    alb.ingress.kubernetes.io/target-type: ip
    # Health Check Settings. Health check is needed for
    # ALB to route traffic to the healthy pod.
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /-/routes
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rayservice-dummy-serve-svc # Serve service
                port:
                  number: 8000 # default HTTP port number for serving requests
Make sure that:
- The annotation alb.ingress.kubernetes.io/subnets:
  - includes at least two subnets;
  - has at most one subnet per Availability Zone (e.g. us-west-2a);
  - in this example, uses public subnets (subnets where "Auto-assign public IPv4 address" is Yes in the AWS dashboard).
- The backend service name is set to the service you want to direct traffic to (in this example, the Serve service).
rayservice-sample-serve-svc is HA in general: it routes traffic among all the workers that have Serve deployments and always tries to point to the healthy cluster, even during upgrades or failures.
Apply it and check the status:
kubectl apply -f ray-service-alb-ingress.yaml
kubectl describe ingress ray-cluster-ingress
You should now be able to check the ALB on AWS (EC2 -> Load Balancing -> Load Balancers). The name of the ALB should be along the lines of k8s-default-<name>. Check the ALB DNS Name to interact with the newly deployed Ray API!
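For example (the hostname below is a placeholder; copy the real DNS Name from the load balancer's details page):
# Send a request to the Serve application through the ALB
curl http://<alb-dns-name>/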
7. Log Persistence
Similar to Kubernetes, Ray does not provide a native storage solution for log data; you need to manage the lifecycle of the logs yourself. By default, Ray writes logs to files in the directory /tmp/ray/session_*/logs on each Ray pod's file system, including application and system logs.
There are a number of open source log processing tools available within the Kubernetes ecosystem, but I use Fluent Bit. One way to go about processing logs is by configuring a log-processing sidecar for each Ray pod. Ray containers should be configured to share the /tmp/ray directory with the logging sidecar via a volume mount. You can configure the sidecar to do either of the following:
- Stream Ray logs to the sidecar's stdout.
- Export logs to an external service.
First create a ConfigMap with the configuration for Fluent Bit. Below is a minimal example, which tells a Fluent Bit sidecar to (1) tail Ray logs and (2) output the logs to CloudWatch.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name tail
        Path /tmp/ray/session_latest/logs/*
        Tag ray
        Path_Key true
        Refresh_Interval 5
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region us-east-1
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group On
A few notes on the above config:
- You can use an [OUTPUT] clause to export logs to any of the storage backends supported by Fluent Bit.
- The Path_Key true line above ensures that file names are included in the log records emitted by Fluent Bit.
- The Refresh_Interval 5 line asks Fluent Bit to refresh the list of files in the log directory once every 5 seconds, rather than the default 60. The reason is that the directory /tmp/ray/session_latest/logs/ does not exist initially (Ray must create it first). Setting the Refresh_Interval low allows us to see logs in the Fluent Bit container's stdout sooner.
For each pod template in our RayCluster CR, we need to add two volumes: One volume for Ray's logs and another volume to store Fluent Bit configuration from the ConfigMap.
volumes:
  - name: ray-logs
    emptyDir: {}
  - name: fluentbit-config
    configMap:
      name: fluentbit-config
Add the following volume mount to the Ray container’s configuration:
volumeMounts:
  - mountPath: /tmp/ray
    name: ray-logs
Finally, add the Fluent Bit sidecar container to each Ray pod config in your RayCluster CR:
- name: fluentbit
  image: fluent/fluent-bit:1.9.6
  # These resource requests for Fluent Bit should be sufficient in production.
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 128Mi
  volumeMounts:
    - mountPath: /tmp/ray
      name: ray-logs
    - mountPath: /fluent-bit/etc/fluent-bit.conf
      subPath: fluent-bit.conf
      name: fluentbit-config
Mounting the ray-logs volume gives the sidecar container access to Ray's logs. The fluentbit-config volume gives the sidecar access to logging configuration.
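Once the pods come up with the sidecar, you can sanity-check that logs are flowing; the pod name below is a placeholder, and the log group check assumes the CloudWatch output configured above:
# Inspect the Fluent Bit sidecar's own output (pod name is a placeholder)
kubectl logs <ray-head-pod-name> -c fluentbit
# Confirm the CloudWatch log group was created
aws logs describe-log-groups --log-group-name-prefix fluent-bit-cloudwatch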
Conclusion
Deploying a distributed Ray Python server with Kubernetes, particularly on AWS EKS, offers scalable and efficient server management. This guide provides a step-by-step approach, from setting up a local Kind cluster to deploying on AWS with external access and log management. This integration of Ray with Kubernetes is essential to enable robust, scalable, and efficient management of Ray Serve applications, harnessing the full power of cloud computing as well as container orchestration.
Resources
- Custom Docker Images
- RayCluster Configuration
- Ray Docs: Launching Ray Clusters on AWS
- Scaling AI and Machine Learning Workloads with Ray on AWS
- Deploying Ray Cluster for AI/ML workloads on a Kubernetes Cluster
- Cluster Management CLI
- Ray Serve Production Guide
- Deploy on Kubernetes
- RayService Quickstart
- Getting started with Amazon EKS – AWS Management Console and AWS CLI
- Start Amazon EKS Cluster with GPUs for KubeRay
- ALB configuration
- KubeRay Ingress
- AWS Load Balancer Controller installation
- Configuring a Kubernetes service account to assume an IAM role
- Deploy Ray Serve Applications
- Using Prometheus and Grafana
- KubeRay Autoscaling