This page explains how to deploy an MPI (and/or GNU parallel) cluster with an NFS filesystem in the KASI cloud. Here, OpenMPI (https://www.open-mpi.org/) is used as the MPI implementation. If you would like to use the Intel oneAPI toolkit and its MPI implementation, see the tips section of this page. The Slurm (https://slurm.schedmd.com/) workload manager can be installed in the cluster too. This how-to assumes that users know how to use a single VM by following the guide given in KASI Science Cloud : VM instances. In particular, following KASI Science Cloud : VM instances#Step5.RemoteaccesstoVMinstancesviaSSHtunneling will help you access the created clusters by using SSH.

The basic usage scenario is: 1) the user prepares codes and data in the created NFS volume, 2) the prepared code is compiled and run with OpenMPI (or Intel MPI), with or without Slurm, 3) output files are stored in the NFS volume, and 4) if needed, an external NAS volume is accessed in the VMs to receive/send data between the created MPI cluster and the NAS (see KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore). The same scenario also works when using GNU parallel with other codes.

The codes and shell scripts mentioned in this how-to are available at https://github.com/astromsshin/cloud_ex. Cloning the GitHub repository to the NFS volume is the easiest way to use the provided materials. It is recommended to use the ubuntu account or another non-root account once the tasks requiring the root account have been completed. Typical tasks conducted as root include 1) installing packages system-wide by using apt, and 2) adding users and changing passwords. You can find examples of these tasks in the tips section of this tutorial.
If you have questions and suggestions about this tutorial page and related problems, please, contact Min-Su Shin.
As presented in the following figure, two options are available for the cluster. If you need Slurm in your cluster, choose the KASI-OpenMPI-Cluster-Slurm template in Project → Cluster Infra → KASI Cluster Templates. If you simply need an MPI (or GNU parallel)-enabled cluster, choose KASI-OpenMPI-Cluster.
Click the Next button on the following page after checking that you are about to run the right cluster template.
You can check the progress of creating the cluster in Cluster Infra → KASI Clusters, Compute → Instances, and Share → Shares as shown in the following figures.
Because it takes time to build all VM nodes in the cluster, you may need to confirm that all nodes are ready with the required tools. The following shell script is available at https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_mpirun.sh.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... checking ${CLUSTERNAME}-master"
res=$(which mpirun | wc -l)
if [ ${res} -ne "1" ]
then
    echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... checking ${CLUSTERNAME}-minion-${ind}"
    res=$(ssh ${CLUSTERNAME}-minion-${ind} "which mpirun" | wc -l)
    if [ ${res} -ne "1" ]
    then
        echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
    fi
done
```
The above script tests whether mpirun is available in all cluster VM nodes. https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_munged_and_mpirun.sh conducts a similar test for Slurm as well as OpenMPI, as shown below.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... checking ${CLUSTERNAME}-master"
res=$(which munged mpirun | wc -l)
if [ ${res} -ne "2" ]
then
    echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... checking ${CLUSTERNAME}-minion-${ind}"
    res=$(ssh ${CLUSTERNAME}-minion-${ind} "which munged mpirun" | wc -l)
    if [ ${res} -ne "2" ]
    then
        echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
    fi
done
```
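Because nodes that are still booting will fail a one-shot check even though they become ready shortly afterwards, you may prefer to poll the check in a loop. The sketch below is a minimal, self-contained version of that pattern; the function `check_ready` is a placeholder (here it just tests for `bash` locally) that you would replace with the real per-node test such as `ssh ${CLUSTERNAME}-minion-0 "which mpirun"`.

```shell
#!/bin/bash
# Poll a readiness check until it succeeds or the retry budget runs out.
# check_ready is a stand-in: replace its body with the real test,
# e.g. ssh ${CLUSTERNAME}-minion-0 "which mpirun" > /dev/null
check_ready() {
    command -v bash > /dev/null
}

max_tries=5
ready=0
for try in $(seq 1 ${max_tries})
do
    if check_ready
    then
        echo "ready after ${try} attempt(s)"
        ready=1
        break
    fi
    echo "not ready yet (attempt ${try}); retrying in 5 s"
    sleep 5
done
if [ ${ready} -ne 1 ]
then
    echo "[WARNING] still not ready after ${max_tries} attempts"
fi
```

Wrapping the existing check scripts this way lets you start them right after cluster creation and simply wait for the final "ready" message.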
Choose the cluster in Cluster Infra → KASI Clusters and click Delete Stacks. If some VM nodes are not erased cleanly, delete the VMs following the instructions given in KASI Science Cloud : VM instances.
Without Slurm, you can simply run MPI codes with mpirun. The following example compiles the example C++ MPI codes in https://github.com/astromsshin/cloud_ex and runs them.
```shell
mpic++ -o a.out ex_mpi_hostname.cpp
mpic++ -o a.out ex_mpi_montecarlo_pi.cpp   # note: this overwrites the a.out built on the previous line
mpirun --allow-run-as-root -np 32 --hostfile ./ex_mpirun_hostfile.txt ./a.out
```
See https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php or https://www.open-mpi.org/faq/?category=running for mpirun. You need to prepare a hostfile for mpirun, which is ex_mpirun_hostfile.txt in the above example. When you want to use mpirun from the root account, you should use the option --allow-run-as-root, as explained in https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php. For example, the hostfile looks like the following.
```
mycluster-master
mycluster-minion-0
mycluster-minion-1
```
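A hostfile can also cap how many MPI ranks are placed on each node with the slots keyword supported by OpenMPI. The node names and slot counts below are illustrative only; adjust them to your cluster and VM flavor.

```
mycluster-master slots=4
mycluster-minion-0 slots=8
mycluster-minion-1 slots=8
```

Without slots entries, OpenMPI decides rank placement itself, so being explicit helps when nodes have different core counts.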
When your cluster is equipped with Slurm, you may need to use Slurm commands and follow Slurm's way of submitting jobs. See https://slurm.schedmd.com/sbatch.html or https://www.open-mpi.org/faq/?category=slurm. In the following example, the ex_slurm_openmpi.job file is submitted via the sbatch command.
```shell
sbatch -N 3 -n 24 ex_slurm_openmpi.job
```
where ex_slurm_openmpi.job contains the following:
```shell
#!/bin/bash
mpirun --allow-run-as-root ./a.out
```
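Instead of passing -N and -n on the sbatch command line, the same resources can be requested with #SBATCH directives inside the job file itself. The sketch below is an assumed equivalent of the example above; it is a Slurm job-file fragment, not meant to run outside the scheduler, and the job name is illustrative.

```
#!/bin/bash
#SBATCH --nodes=3           # same as sbatch -N 3
#SBATCH --ntasks=24         # same as sbatch -n 24
#SBATCH --job-name=mpi_pi   # optional label shown by squeue
mpirun --allow-run-as-root ./a.out
```

Keeping the resource request in the job file makes repeated submissions reproducible with a plain `sbatch ex_slurm_openmpi.job`.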
You can run GNU parallel to execute jobs on remote hosts, i.e., the cluster slave nodes. See https://www.gnu.org/software/parallel/parallel_tutorial.html#remote-execution (or https://www.biostars.org/p/63816/). The following example runs some simple shell commands on the nodes listed in ex_parallel_hostfile.txt.
```shell
parallel --nonall --sshloginfile ex_parallel_hostfile.txt hostname
parallel --workdir /mnt/mpi --sshloginfile ex_parallel_hostfile.txt 'hostname; touch $RANDOM-$(hostname)-{}.txt' ::: 3 4 5 6 7 8 9 10 11 12
```
where ex_parallel_hostfile.txt is like the following (the leading : entry means the local host):
```
:
mycluster-minion-0
mycluster-minion-1
```
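What parallel does over SSH — fanning a list of arguments out to workers and waiting for all of them — can be sketched locally with plain shell background jobs. This is a toy stand-in to illustrate the pattern, not a replacement for GNU parallel; the output directory and file names are made up for the example.

```shell
#!/bin/bash
# Process each argument in a concurrent background job, then wait for all,
# mimicking: parallel 'echo ... > job-{}.txt' ::: 3 4 5 6 7
outdir=$(mktemp -d)
for arg in 3 4 5 6 7
do
    (
        # stand-in for the command parallel would run on each node
        echo "$(hostname) processed ${arg}" > "${outdir}/job-${arg}.txt"
    ) &
done
wait   # block until every background job has finished
ls "${outdir}" | sort
```

GNU parallel adds what this sketch lacks: load balancing, remote execution via the sshloginfile, and output serialization.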
Because the created cluster is configured to allow key-based, passwordless SSH access to all VM nodes for the root account, the following script works via SSH remote execution on all remote VM nodes.
```shell
#!/bin/bash
RUNCMD='cd /tmp; /mnt/mpi/cloud_ex/tool_install_and_setup_conda_in_local_volume.sh'
CLUSTERNAME="mycluster"
MINIONLASTIND="2"
echo "... install on ${CLUSTERNAME}-master"
echo $RUNCMD | bash
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... install on ${CLUSTERNAME}-minion-${ind}"
    ssh ${CLUSTERNAME}-minion-${ind} "${RUNCMD}"
done
```
This script is available at https://github.com/astromsshin/cloud_ex/blob/main/tool_execute_commands_all_nodes.sh. It shows a typical way of executing custom commands on all VM nodes of the cluster, including the master node. The other scripts in this tips section follow the same pattern for executing specific commands. If you are not familiar with executing multiple Linux shell commands in a single command line, see https://dev.to/0xbf/run-multiple-commands-in-one-line-with-and-linux-tips-5hgm. One example of using this method is checking the availability of Python modules on multiple VM nodes, including the cluster master. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_test_whether_python_modules_available.sh) runs a short Python script (from tqdm import tqdm; from astropy.io import fits; from sklearn import mixture; import matplotlib.pyplot as plt; import numpy) which does not cause errors when the required modules are available.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="2"
RUNCMD='echo "Checking on $(hostname)"; python3 -c "from tqdm import tqdm; from astropy.io import fits; from sklearn import mixture; import matplotlib.pyplot as plt; import numpy; print(\"SUCCESS\")"'
echo "... install on ${CLUSTERNAME}-master"
echo $RUNCMD | bash
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... install on ${CLUSTERNAME}-minion-${ind}"
    ssh ${CLUSTERNAME}-minion-${ind} "${RUNCMD}"
done
```
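The RUNCMD strings used above chain several commands into a single line. As a quick, self-contained reminder of how the chaining operators behave (`;` runs the next command unconditionally, `&&` only on success, `||` only on failure), consider the snippet below; the temporary directory is just for demonstration.

```shell
#!/bin/bash
# Demonstrate the shell command-chaining operators used in RUNCMD strings.
workdir=$(mktemp -d)
cd "${workdir}"; echo "now in $(pwd)"        # ';' always runs the next command
true  && echo "ran because 'true' succeeded" # '&&' runs only after success
false || echo "ran because 'false' failed"   # '||' runs only after failure
false && echo "this line is never printed"
echo "done"
```

The same rules apply when a chained string is sent to a remote node with ssh, since the remote shell parses it identically.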
SSH remote execution can be used to change the root password on all VM nodes, as described in https://github.com/astromsshin/cloud_ex/blob/main/tool_change_password_all_nodes.sh:
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
PWUSER="root"
NEWPASSWORD="xxxxxxxxxx"
echo "... changing ${CLUSTERNAME}-master : ${PWUSER}"
echo -e "${NEWPASSWORD}\n${NEWPASSWORD}" | passwd ${PWUSER}
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... changing ${CLUSTERNAME}-minion-${ind} : ${PWUSER}"
    ssh ${CLUSTERNAME}-minion-${ind} "echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${PWUSER}"
done
```
where PWUSER is a user account and NEWPASSWORD is a new password.
The ubuntu account is available as a default account in addition to the root account. You may want to use the ubuntu account as your main account for the created cluster. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_change_password_for_ubuntu_all_nodes_and_setup_sshkey.sh) helps you set up the environment for the ubuntu account by changing its password and placing a generated SSH key in the right directory.
```shell
#!/bin/bash
# this script should be executed by root in the master node.
CLUSTERNAME="mycluster"
MINIONLASTIND="4"
PWUSER="ubuntu"
NEWPASSWORD="xxxxxxxxxx"
NFSDIR="/mnt/mpi"
# changing the password of the ubuntu account
echo "... changing ${CLUSTERNAME}-master : ${PWUSER}"
echo -e "${NEWPASSWORD}\n${NEWPASSWORD}" | passwd ${PWUSER}
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... changing ${CLUSTERNAME}-minion-${ind} : ${PWUSER}"
    ssh ${CLUSTERNAME}-minion-${ind} "echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${PWUSER}"
done
# generate ssh-key
rm -f ${NFSDIR}/id_ed25519 ${NFSDIR}/id_ed25519.pub
### you should enter empty passphrases by pressing enter twice.
ssh-keygen -t ed25519 << endskey
${NFSDIR}/id_ed25519
endskey
# setup the environment for ssh access without password among the cluster nodes
# for the ubuntu account
### master
echo "setup the master: ${CLUSTERNAME}-master"
cp -f ${NFSDIR}/id_ed25519 /home/ubuntu/.ssh/
cp -f ${NFSDIR}/id_ed25519.pub /home/ubuntu/.ssh/
chown ubuntu:ubuntu /home/ubuntu/.ssh/id_ed25519*
chmod 600 /home/ubuntu/.ssh/id_ed25519
chmod 644 /home/ubuntu/.ssh/id_ed25519.pub
cat /home/ubuntu/.ssh/id_ed25519.pub >> /home/ubuntu/.ssh/authorized_keys
### slaves
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "setup the slave: ${CLUSTERNAME}-minion-${ind}"
    ssh ${CLUSTERNAME}-minion-${ind} "cp -f ${NFSDIR}/id_ed25519 /home/ubuntu/.ssh/; cp -f ${NFSDIR}/id_ed25519.pub /home/ubuntu/.ssh/; chown ubuntu:ubuntu /home/ubuntu/.ssh/id_ed25519*; chmod 600 /home/ubuntu/.ssh/id_ed25519; chmod 644 /home/ubuntu/.ssh/id_ed25519.pub; cat /home/ubuntu/.ssh/id_ed25519.pub >> /home/ubuntu/.ssh/authorized_keys"
done
```
The above script should be executed by the root account. If you would like to use a different key type instead of ed25519, you should modify the script.
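The heredoc in the key-generation step above answers ssh-keygen's interactive prompts. An alternative, if you prefer it, is to run ssh-keygen fully non-interactively with -f (output file) and -N "" (empty passphrase). A minimal sketch using a temporary directory as the key location:

```shell
#!/bin/bash
# Generate an ed25519 key pair without any interactive prompts:
# -q quiet, -t key type, -N "" empty passphrase, -f output file path.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t ed25519 -N "" -f "${KEYDIR}/id_ed25519"
ls "${KEYDIR}"
```

In the cluster script you would point -f at ${NFSDIR}/id_ed25519 instead of the temporary directory.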
It is possible to install Intel oneAPI and use its MPI implementation instead of OpenMPI. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_install_intel_oneapi_ubuntu_all_nodes.sh) can be used to install Intel oneAPI Base and HPC Toolkits in all VM nodes of the cluster.
```shell
#!/bin/bash
# See
# https://www.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html
install_intel_oneapi='cd /tmp; wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list; add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"; apt install -y intel-basekit intel-hpckit'
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... install on ${CLUSTERNAME}-master"
echo $install_intel_oneapi | bash
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... install on ${CLUSTERNAME}-minion-${ind}"
    ssh ${CLUSTERNAME}-minion-${ind} "${install_intel_oneapi}"
done
```
After installing the Intel toolkits, you need to set up the shell environment for the Intel tools. See https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos/use-a-config-file-for-setvars-sh-on-linux-or-macos.html for the setup guide. Here, simply source /opt/intel/oneapi/setvars.sh. As shown below, you can compile and test MPI programs using the installed Intel toolkits.
```
> which mpiicpc
/opt/intel/oneapi/mpi/2021.5.1/bin/mpiicpc
> mpiicpc ./ex_mpi_montecarlo_pi.cpp
> ldd ./a.out
> which mpirun
/opt/intel/oneapi/mpi/2021.5.1/bin/mpirun
```
See https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top.html to figure out how to compile and run MPI programs with Intel toolkits.
You may want to install software by using apt commands in the Ubuntu Linux environment. As described above, you can put a specific command such as "apt install parallel -y" in the cluster template, as we do for installing GNU parallel on all cluster nodes. Using custom commands in the cluster template is one way to install apt packages on all cluster nodes. If you want to install apt packages after your cluster is built, you can use the following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_install_apt_packages_all_nodes.sh).
```shell
#!/bin/bash
pkgs=(astropy-utils python3-astropy python3-astropy-affiliated python3-astropy-healpix python3-astropy-helpers python3-sklearn python3-skimage python3-statsmodels python3-matplotlib)
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... install on ${CLUSTERNAME}-master"
apt update -y
for pkg in ${pkgs[@]}
do
    apt install $pkg -y
done
for ind in $(seq 0 ${MINIONLASTIND})
do
    echo "... install on ${CLUSTERNAME}-minion-${ind}"
    ssh ${CLUSTERNAME}-minion-${ind} "apt update -y"
    for pkg in ${pkgs[@]}
    do
        ssh ${CLUSTERNAME}-minion-${ind} "apt install ${pkg} -y"
    done
done
```
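Before reinstalling packages on every node, it can help to check which tools are already present. command -v is a portable way to test for a command; the sketch below checks a list locally (the command names are placeholders, and on a real cluster you would wrap the test in ssh per node).

```shell
#!/bin/bash
# Report which commands from a list are available on this host.
# 'ls' and 'definitely_not_a_real_tool' are placeholder names.
cmds=(ls definitely_not_a_real_tool)
for c in "${cmds[@]}"
do
    if command -v "${c}" > /dev/null
    then
        echo "${c}: present"
    else
        echo "${c}: missing"
    fi
done
```

Skipping nodes where the tools are already present shortens repeated runs of the install script.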
The following script, available at https://github.com/astromsshin/cloud_ex/blob/main/tool_install_and_setup_conda_in_shared_volume.sh, installs Miniconda and sets up a specific conda environment in the network-shared volume.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
NFSDIR="/mnt/mpi"
CONDAENV="xclass"
CONDAURL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
# additional apt packages
apt install zip
# installation of miniconda
cd ${NFSDIR}
wget "${CONDAURL}" -O ./miniconda.sh
bash ./miniconda.sh -b -p ${NFSDIR}/miniconda
eval "$(${NFSDIR}/miniconda/bin/conda shell.bash hook)"
conda init
conda update -y -n base -c defaults conda
# creating the environment
conda create -y -n ${CONDAENV} python=2.7
# adding new conda packages
conda install -y -n ${CONDAENV} numpy
conda install -y -n ${CONDAENV} scipy
conda install -y -n ${CONDAENV} matplotlib
conda install -y -n ${CONDAENV} astropy
conda install -y -n ${CONDAENV} sqlite
# adding pip packages
conda activate ${CONDAENV}
pip install pyfits
echo "Do the following things to use the environment ${CONDAENV}"
echo "1) source ~/.bashrc"
echo "2) conda activate ${CONDAENV}"
```
You can also install conda and prepare environments in your local home directory instead. The script https://github.com/astromsshin/cloud_ex/blob/main/tool_install_and_setup_conda_in_local_volume.sh can be used for this.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
TMPDIR="/tmp"
CONDAENV="xclass"
CONDAURL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
# additional apt packages
apt install zip
# installation of miniconda
cd ${TMPDIR}
wget "${CONDAURL}" -O ./miniconda.sh
bash ./miniconda.sh -b -p ${HOME}/miniconda
eval "$(${HOME}/miniconda/bin/conda shell.bash hook)"
conda init
conda update -y -n base -c defaults conda
# creating the environment
conda create -y -n ${CONDAENV} python=2.7
# adding new conda packages
conda install -y -n ${CONDAENV} numpy
conda install -y -n ${CONDAENV} scipy
conda install -y -n ${CONDAENV} matplotlib
conda install -y -n ${CONDAENV} astropy
conda install -y -n ${CONDAENV} sqlite
# adding pip packages
conda activate ${CONDAENV}
pip install pyfits
echo "Do the following things to use the environment ${CONDAENV}"
echo "1) source ~/.bashrc"
echo "2) conda activate ${CONDAENV}"
```
The following script shows how to download files from the external storage, which is described in KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore, to the network-shared volume. Typical usage includes downloading compiled binary codes, related scripts, configuration files, and data from the external storage. The script is available at https://github.com/astromsshin/cloud_ex/blob/main/tool_download_from_external_storage.sh.
```shell
#!/bin/bash
# edit the following variables
TARGETDIR="/mnt/mpi"
# assuming webdav accesses
WEBDAVIP="xxxx"
WEBDAVID="xxxx"
WEBDAVPW="xxxx"
# array of filenames that will be downloaded and saved
SRCFNARR=("XCLASS.zip" "ins_custom.sh")
DESTFNARR=("XCLASS.zip" "ins_custom.sh")
cd ${TARGETDIR}
CNT=0
for SRCFN in ${SRCFNARR[@]}
do
    DESTFN=${DESTFNARR[$CNT]}
    wget -O ${DESTFN} --no-check-certificate -r -c --user ${WEBDAVID} --password ${WEBDAVPW} https://${WEBDAVIP}/home/${SRCFN}
    CNT=$((CNT+1))
done
```
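The script above pairs SRCFNARR and DESTFNARR with a manual CNT counter. An equivalent and slightly less error-prone pattern iterates over the array indices directly with "${!ARR[@]}". This is a dry-run sketch of that pattern only; it echoes the pairing instead of downloading anything, using the same placeholder file names.

```shell
#!/bin/bash
# Pair two same-length arrays by index instead of a manual counter.
SRCFNARR=("XCLASS.zip" "ins_custom.sh")
DESTFNARR=("XCLASS.zip" "ins_custom.sh")
for i in "${!SRCFNARR[@]}"
do
    echo "would download ${SRCFNARR[$i]} as ${DESTFNARR[$i]}"
done
```

Indexing both arrays with the same loop variable keeps source and destination names aligned even if an entry is later inserted or removed.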
You can also upload files from the cluster to the external storage, as shown in the following example script (https://github.com/astromsshin/cloud_ex/blob/main/tool_upload_to_external_storage.sh).
```shell
#!/bin/bash
# edit the following variables
# assuming webdav accesses
WEBDAVIP="xxxx"
WEBDAVID="xxxx"
WEBDAVPW="xxxx"
# array of filenames that will be uploaded and their destination names
SRCFNARR=("/root/.ssh/id_rsa.pub")
DESTFNARR=("cluster_master_id_rsa.pub")
CNT=0
for SRCFN in ${SRCFNARR[@]}
do
    DESTFN=${DESTFNARR[$CNT]}
    curl --insecure -u ${WEBDAVID}:${WEBDAVPW} -T ${SRCFN} https://${WEBDAVIP}/home/${DESTFN}
    CNT=$((CNT+1))
done
```