...
This page explains how to deploy an MPI (and/or GNU parallel) cluster with an NFS filesystem in the KASI cloud. Here, OpenMPI (https://www.open-mpi.org/) is used as the MPI implementation. If you would like to use the Intel oneAPI toolkit and its MPI implementation, see the tips section of this page. The Slurm (https://slurm.schedmd.com/) workload manager can be installed in the cluster as well. This how-to assumes that users know how to use a single VM by following the guide given in KASI Science Cloud : VM instances. In particular, following KASI Science Cloud : VM instances#Step5.RemoteaccesstoVMinstancesviaSSHtunneling will help you access the created clusters via SSH.

The basic usage scenario is: 1) the user prepares codes and data in the created NFS volume, 2) compiles or runs the prepared code with OpenMPI (or Intel MPI), with or without Slurm, 3) output files are stored in the NFS volume, and 4) if needed, an external NAS volume is accessed in the VMs to send/receive data between the created MPI cluster and the NAS (see KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore). The same scenario also works when using GNU parallel with other codes. The codes and shell scripts mentioned in this how-to are available at https://github.com/astromsshin/cloud_ex. Cloning the GitHub repository to the NFS volume is the easiest way to use the provided materials.

It is recommended to use the ubuntu account (or other non-root accounts) instead of the root account once the tasks requiring root privileges are done. Typical tasks conducted as root include 1) installing packages system-wide with apt and 2) adding users and changing passwords. You can find examples of these tasks in the tips section of this tutorial.
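As a sketch of step 2) of this scenario, launching an OpenMPI job across the cluster needs a hostfile naming the nodes. The hostnames, slot counts, and program names below are illustrative assumptions (the actual hostnames follow the Stack Name you choose when launching the template), not values produced by the templates themselves:

```shell
# Hypothetical OpenMPI hostfile; node hostnames are assumed to derive from a
# Stack Name of "mycluster", and the slot counts should match the vCPU count
# of your chosen flavor.
cat > hostfile.txt <<'EOF'
mycluster-master slots=4
mycluster-minion-0 slots=4
mycluster-minion-1 slots=4
EOF
# From the NFS volume you would then compile and launch, for example:
#   git clone https://github.com/astromsshin/cloud_ex.git
#   mpicc -o hello hello.c
#   mpirun -np 12 --hostfile hostfile.txt ./hello
```

Because the hostfile and binaries live on the NFS volume, every node sees the same paths and no manual file distribution is needed.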
If you have questions and suggestions about this tutorial page and related problems, please, contact Min-Su Shin.
Step 0. Think about the required configuration of your cluster
The current KASI cloud system supports three possible cluster configurations: case 1) a cluster without any MPI-related setup, case 2) a cluster with the MPI-related setup (which is also needed for using GNU parallel on multiple nodes) as well as an NFS network-shared volume, and case 3) a cluster with the MPI-related setup but without the NFS network-shared volume. In case 1, you can still access the cluster nodes with ssh, so using GNU parallel is possible; however, there is no network-shared filesystem in this cluster. If you would like to use the Gluster network-shared filesystem with the cluster, you can set it up by following the guide Using Gluster network filesystems. In case 2, the MPI-related configurations are handled automatically, and the NFS network-shared volume is also provided with your chosen configuration. Case 3 is the same as case 2 except for the absence of the NFS network-shared volume; here too, you can set up the Gluster network-shared filesystem by following the guide Using Gluster network filesystems. As explained in the following step 1, these three cases correspond to three different kinds of cluster templates.
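For case 1, running GNU parallel across the nodes over ssh might look like the following sketch; the node names and the task command are assumptions for illustration (actual hostnames follow your cluster's Stack Name):

```shell
# GNU parallel reads one ssh login per line from a login file.
# The hostnames below are hypothetical examples.
printf '%s\n' mycluster-master mycluster-minion-0 mycluster-minion-1 > nodes.txt
# You would then distribute jobs over the nodes, for example:
#   parallel --sshloginfile nodes.txt 'hostname; ./task.sh {}' ::: input_*.dat
```

Note that in case 1 there is no shared filesystem, so the task script and input files must exist on every node (or be fetched by the job itself), unless you add a Gluster volume as described above.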
Step 1. Choose a cluster template: 1) KASI-Cluster-Basic, 2) KASI-OpenMPI-Cluster or KASI-OpenMPI-Cluster-Slurm, and 3) KASI-OpenMPI-Cluster-Simple or KASI-OpenMPI-Cluster-Slurm-Simple
As presented in the following figure, multiple options are available for the cluster. If you need Slurm in your cluster, choose the KASI-OpenMPI-Cluster-Slurm or KASI-OpenMPI-Cluster-Slurm-Simple template in Project → Cluster Infra → KASI Cluster Templates. If you simply need an MPI (or GNU parallel)-enabled cluster, choose KASI-OpenMPI-Cluster or KASI-OpenMPI-Cluster-Simple. If you need a cluster for case 1 explained in Step 0 above, choose the KASI-Cluster-Basic template.
Click the Next button on the following page after checking that you are about to launch the right cluster template.
...
- Stack Name: the name of the cluster which determines the hostnames of the master and slave VM nodes in the cluster.
- Password for user: password required to control the created cluster in certain situations.
- Image: VM system image.
- Flavor: VM flavor.
- Network: choose kasi-user-network.
- Minion VMs Number: the number of slave nodes. If you plan to use Slurm, this is the number of Slurm worker nodes, which does not include the master node.
- NFS Mount Path: the NFS directory path that will be prepared on all nodes, including both master and slave nodes. ← this option is not shown for the KASI-OpenMPI-Cluster-Simple, KASI-OpenMPI-Cluster-Slurm-Simple, and KASI-Cluster-Basic templates.
- NFS Size: the size of the NFS volume. ← this option is not shown for the KASI-OpenMPI-Cluster-Simple, KASI-OpenMPI-Cluster-Slurm-Simple, and KASI-Cluster-Basic templates.
- SSH Keys: ssh key used to access created VMs.
- Root Password: the password for the root account on all nodes of the cluster. You may want to change it after the cluster is created.
- User Script: shell commands that will be executed on all VM nodes of the cluster. Type custom commands as a single line (see https://dev.to/0xbf/run-multiple-commands-in-one-line-with-and-linux-tips-5hgm for how to use ;, &&, and ||). If you would like to use GNU parallel in the cluster, type apt install parallel -y as shown above. If you are not familiar with the apt command in Ubuntu, see https://ubuntu.com/server/docs/package-management.
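Since the User Script must fit on one line, the choice of separator matters. The snippet below demonstrates how each separator behaves; the one-line apt example in the comment is a hypothetical User Script, not a required value:

```shell
# ";" runs both commands unconditionally:
echo "first"; echo "second"
# "&&" runs the right-hand command only if the left one succeeded (exit status 0):
true && echo "after success"
# "||" runs the right-hand command only if the left one failed:
false || echo "after failure"
# A hypothetical one-line User Script combining them:
#   apt update -y && apt install -y parallel || echo "install failed"
```

Using && between apt update and apt install ensures the install only runs against a refreshed package index.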
...