As demonstrated in the previous section, deploying an app often requires many components with finely tuned configuration among them (i.e., lots of YAML files).
Helm is a tool to ease this process.
This document gives you some idea of how to deploy applications using Helm. You can think of Helm as a package manager for k8s.
Step 1. Installing Helm
- Helm needs to be installed on a machine where the kube config is set up and the kubectl executable is available.
- In this guide, we install Helm on the gateway node that we have already set up (see this if you have not).
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
- There are other ways to install Helm; take a look at the Helm documentation if you want.
- KASI's firewall blocks access to "raw.githubusercontent.com", so the first command, which downloads the get_helm.sh script, will fail. Download the script on a computer that is not connected to KASI's network (the KASI-AP internet works, for example) and then upload it to the server, e.g., with scp as sketched below.
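For example, assuming the script was downloaded on your laptop, something like the following copies it to the gateway node (the user and host names are placeholders):

$ scp get_helm.sh <user>@<gateway-node>: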
Step 2. Installing Applications from the repo
To deploy applications using Helm, you need a "chart". You can download charts to your local disk and run Helm against them, or you can add a chart repository so that the necessary files are downloaded on the fly.
> helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
> helm repo update
> helm repo list     # to show a list of repos you have added
> helm search repo   # list available charts
We will use the jupyterhub chart in the jupyterhub repo ("jupyterhub/jupyterhub" for short).
You can go ahead and deploy it with the default values, but it is a good idea to have a config file so that you can customize it if required.
> helm show values jupyterhub/jupyterhub > values.yaml
Take a look at the values.yaml file. It can be overwhelming, but often you only need to change a few settings, if any.
> helm install jupyterhub-spherex jupyterhub/jupyterhub --namespace jupyterhub --create-namespace --values values.yaml
Check that everything is okay. The PVC (and PV) will be used for the hub's database storage. Also, note that a LoadBalancer-type service is created with an external IP.
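To check, you can list what the release created from the gateway node (a quick sanity check; the resource names come from the chart defaults):

> helm status jupyterhub-spherex -n jupyterhub
> kubectl get pods,svc,pvc -n jupyterhub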
Later, we will reconfigure it as a NodePort to use OpenStack's load balancer.
For now, visit the IP (210.219.33.215). You will see a login window for JupyterHub. By default, you can log in with any username and any password.
Let's log in with the username "jovian". You will see a message saying the server is starting up. After several seconds, you will be greeted with the familiar Jupyter notebook page.
Note that it has launched a new pod named "jupyter-jovian" and attached a 10Gi storage volume to the pod.
Now, let's disable the loadbalancer and make it a NodePort.
Edit values.yaml: under proxy → service, you will see the type set to "LoadBalancer". Change it to "NodePort" and save the file.
While the chart is deployed, you can upgrade the release (deployment) with new values.
> helm upgrade jupyterhub-spherex jupyterhub/jupyterhub --namespace jupyterhub --create-namespace --values values.yaml
The type of the "proxy-public" service is changed to "NodePort" and the external ip is dissociated.
Now, from the OpenStack side, you can add an ingress rule for this service using the node port (31506 in this case).
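The assigned node port can be read off the service (31506 here; yours may differ):

> kubectl get svc proxy-public -n jupyterhub   # the node port appears in the PORT(S) column, e.g. 80:31506/TCP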
Step 3. Deploying from local files
- Here we show an example that deploys "ray" from local files.
- We first need to get the source code of Ray. The charts are located under the 'deploy/charts' directory.
# We deploy ray using Helm. Its helm chart is included in their source.
# First, we need its source code.
> git clone https://github.com/ray-project/ray.git
> cd ray/deploy/charts
> ls
ray  ray-head-loadbalancer.yaml  ray-head-nodeport.yaml
# deploy a ray cluster in the "ray" namespace with the release name "ray-cluster"
> helm install -n ray --create-namespace ray-cluster ./ray
It will first create a ray operator in the default namespace.
Once the operator is running, it will start deploying the Ray head and worker nodes (the default is one head and two workers).
Also, by default, a ClusterIP-type service object is created.
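You can watch these come up from the gateway node (the namespaces follow the description above):

> kubectl get pods                # the ray operator pod in the default namespace
> kubectl get pods,svc -n ray     # the head/worker pods and the ClusterIP service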
We will change its type to NodePort so that it works well with the ingress controller.
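One way to do this (a sketch; you could instead adjust the chart values and run helm upgrade) is to patch the service in place:

> kubectl patch svc ray-cluster-ray-head -n ray -p '{"spec": {"type": "NodePort"}}'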
Let's create an ingress for the dashboard (port 8265).
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ray-internal-ingress
  annotations:
    kubernetes.io/ingress.class: "openstack"
    octavia.ingress.kubernetes.io/internal: "true"
spec:
  rules:
  - host:
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          serviceName: ray-cluster-ray-head
          servicePort: dashboard
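Save the manifest to a file (the name ray-ingress.yaml below is arbitrary), apply it, and wait for the ingress to get an internal IP:

> kubectl apply -f ray-ingress.yaml -n ray
> kubectl get ingress -n ray      # wait until the ADDRESS column shows an IP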
Once the ingress is created and an internal IP is assigned, we change the host name to use the nip.io DNS (i.e., dashboard.10.0.1.158.nip.io).
Now, the ray dashboard is accessible at dashboard.10.0.1.158.nip.io.
Note that 10.0.1.158 is an internal IP address and won't be reachable from outside (though it is reachable from the gateway node). My recommendation is to set up a SOCKS5 proxy through the gateway and let the browser use the proxy.
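For example, a dynamic SSH tunnel to the gateway works as a SOCKS5 proxy (a sketch; replace the user and host with your gateway login):

> ssh -N -D 1080 <user>@<gateway-node>
# then configure the browser to use a SOCKS5 proxy at localhost:1080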
Within the k8s cluster's internal network, the Ray cluster can be reached at port 10001 of the "ray-cluster-ray-head" DNS name, or "ray-cluster-ray-head.ray.svc.cluster.local" if you are in a different namespace.
To make this cluster accessible from the external network, we can add a TCP listener/pool to the OpenStack load balancer. For example, we can add a listener on port 10001 of our load balancer at 210.219.33.214 that forwards traffic to node port 30997 on each worker node.
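With the OpenStack CLI this looks roughly as follows (a sketch; the load balancer, listener, and pool names are placeholders, and you add one member per worker node):

> openstack loadbalancer listener create --name ray-client --protocol TCP --protocol-port 10001 <loadbalancer>
> openstack loadbalancer pool create --name ray-client-pool --listener ray-client --protocol TCP --lb-algorithm ROUND_ROBIN
> openstack loadbalancer member create --address <worker-node-ip> --protocol-port 30997 ray-client-pool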
A common pattern in the k8s world is to use an operator to scale (or customize) apps, which is what the Ray operator does. For this, Ray uses a CRD (custom resource definition), rayclusters.cluster.ray.io.
You can modify this object to change the deployment.
For example, change the "maxWorkers" and "minWorkers" of "rayWorkerType" to 10 and 4, respectively (you also need to change the global worker (Spec → MaxWorkers) so that it is larger than the maxWorker. . Then the ray operator will automatically create two more ray worker pods for you (to make it a minimum of 4). We will also change the base ray image to "rayproject/ray:latest-py39-cpu" so that it is accessed by python 3.9 (e.g., the jupyterhub we created above has a python 3.9)
Now, from the jupyterhub above, you can initialize the ray connection by
import ray
ray.init("ray://ray-cluster-ray-head.ray.svc.cluster.local:10001")
From the external network (but still within the KASI firewall), you can do
ray.init("ray://210.219.33.214:10001")
Check out the link below for an example notebook.
https://data.kasi.re.kr/gitlab/leejjoon/vo-igrins/-/blob/master/example%201.ipynb