The kubernetes-setup from tvogels

Instruction of using the container cluster (Kubernetes, k8s)

Requesting access

Use this form to request access (use Accréditation=MLO).

Once you have been approved, you will get an email from the IC with a zip file named .zip. Unzip it and put these files into the .kube folder of your home directory. Then rename <your-name>.config to config.

cd ~
mkdir .kube
# Save your-name.zip in the .kube folder
mv .kube/*.config .kube/config

You can refer to this repository for more information, below are steps for most common setups.

Installing Kubernetes (Ubuntu/Debian)

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Check that it is working by running:

kubectl get pods

Setting up Kubernetes

To use a kubernetes pod, you need to:

Create a dockerfile with your needed config
Build this docker image
Push the docker image to ic-registry.epfl.ch/mlo/
Create a kubernetes config file
Create a pod

Creating a dockerfile

If you are new to docker, have a look at this very simple Dockerfile. You should guess what is happening and add your own config.

Put your gaspar id after NB_USER= and your uid after NB_UID=.
You can get you uid by using the id commmand on a cluster.

The FROM line allows you to choose an image to start from. You can choose from images on the Dockerhub (or elsewhere).

Building the Docker image

Once you are happy with the Dockerfile, go to the directory of the Dockerfile and run:

docker build . -t <your-tag>

Replace <your-tag> by the name you want to give to this docker image.
It is good practice to put your name first, for example jaggi_base.

Pushing the Docker image

When you will create a pod, the server will need a Docker image to build a container for you.
The server will go look for the docker image on https://ic-registry.epfl.ch/, so you should put your Docker image there.

Go have a look at https://ic-registry.epfl.ch and use your gaspar credentials to login in.

There already is a group project named mlo. Please ask someone in the lab already using kubernetes to add you to the mlo group so that you can push your docker image to that repository.

Login docker to ic-registry.epfl.ch/mlo/

docker login ic-registry.epfl.ch

This is a 'one-in-a-lifetime' steps.

Actually pushing the Docker image

To push an image to a private registry (and not the central Docker registry) you must tag it with the registry hostname.
Then you can push it:

docker tag <your-tag> ic-registry.epfl.ch/mlo/<your-tag>
docker push ic-registry.epfl.ch/mlo/<your-tag>

Creating a kubernetes config file

Have a look at (and download) this simple kubernetes config file. Fill all elements that are in <brackets> .
<your-pod-name> needs not be the same as <your-docker-image-tag> but again it is good practice to put your name first for the pod name, for example jaggi-pod.

In this config file,

you can change: nvidia.com/gpu: 1 to request more gpus.\
you can see at the end that mlodata1 is mounted. You can remove it or change it for mloscratch.
you specify which command is run when launching the pod. Here it will sleep for 60 seconds and then die.

To run more complex or multiple commands, you can do:

 command: ["/bin/bash", "-c"]
 args: ["command1; command2 && command3"]

Approaches to running a job

There are two approaches to running pods on the container cluster:

Like in the Kubernetes basics, with command: [sleep, infinity], and then connecting to the pod over ssh to run an experiment
- This can be convenient for playing around. You can temporarily spin up as many nodes as you want
- But you pay GPU time you don't use.
Use something like command: [run, my, experiment].
- This makes debugging slightly harder, but as soon as your job finishes, the pod gets status Completed, and you (Martin) will stop paying for the pod.

Using Kubernetes

Creating a pod

Go to the directory where your kubernetes config file is and run:

kubectl create -f <your-configfile-name>.yaml

hecking which pods are running

kubectl get pods  # get all pods
kubectl get pods -l user=jaggi  # filter by label (defined in the config file)
kubectl get pod jaggi-pod  # get by pod name

Deleting a pod:

kubectl delete pod jaggi-pod

Debuging a pod:

kubectl describe pod jaggi-pod
kubectl get pod jaggi-pod -o yaml

Note on Storage across icclusters

(`mounting /mlo-container-scratch`)

Follow the instructions in Kubernetes basics, and use

volumeMounts:
- mountPath: /scratch
   name: mlo-scratch
   subPath: YOUR_USERNAME

and

volumes:
- name: mlo-scratch
   persistentVolumeClaim:
   claimName: mlo-scratch

(`mounting /mlodata1`)

spec:
  volumes:
  - name: mlodata1
    persistentVolumeClaim:
      claimName: pv-mlodata1
  containers:
  - name:  ubuntu
    volumeMounts:
    - mountPath: /mlodata1
      name: mlodata1

Some deployment template

You can find some provided templates, e.g.,

Some Tips

By default, a Docker container will run as root. This means that the files you write in the shared storage are owned by root. This is solved by changing the default user in Docker (which is already done in the simple Dockerfile) (Here another example from Tao)
To avoid the error sudo: no tty present and no askpass program specified, please use sudo -S xxx.

tvogels / kubernetes-setup Goto Github PK

kubernetes-setup's Introduction