...

This page explains how to deploy an MPI (and/or GNU parallel) cluster with an NFS filesystem in the KASI cloud. Here, OpenMPI (https://www.open-mpi.org/) is used as the MPI implementation. If you would like to use the Intel oneAPI toolkit and its MPI implementation instead, see the tip section of this page. The Slurm (https://slurm.schedmd.com/) workload manager can also be installed in the cluster. This how-to assumes that users already know how to use a single VM by following the guide in KASI Science Cloud : VM instances. The basic usage scenario is: 1) prepare codes and data in the created NFS volume, 2) compile and run the prepared code with OpenMPI (or Intel MPI), with or without Slurm, 3) store the output files in the NFS volume, and 4) if needed, access an external NAS volume from the VMs to send/receive data between the MPI cluster and the NAS (see KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore). The same scenario also works when using GNU parallel with other codes. The codes and shell scripts mentioned in this how-to are available at https://github.com/astromsshin/cloud_ex. Cloning the GitHub repository into the NFS volume is the easiest way to use the provided materials.
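
For example, the repository can be cloned into the shared volume as follows; this is a minimal sketch in which the NFS mount point /nfs is an assumption and should be replaced with the actual mount path of your cluster.

Code Block
languagebash
# clone the example codes and scripts into the shared NFS volume
# (assumption: the NFS volume is mounted at /nfs)
cd /nfs
git clone https://github.com/astromsshin/cloud_ex.git
cd cloud_ex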

...

Code Block
languagebash
# compile one of the example MPI programs (each command produces ./a.out)
mpic++ -o a.out ex_mpi_hostname.cpp
mpic++ -o a.out ex_mpi_montecarlo_pi.cpp
# launch 32 MPI processes on the hosts listed in the hostfile
mpirun --allow-run-as-root -np 32 --hostfile ./ex_mpirun_hostfile.txt ./a.out

See https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php or https://www.open-mpi.org/faq/?category=running for the usage of mpirun. You need to prepare a hostfile for mpirun, which is ex_mpirun_hostfile.txt in the above example. An example hostfile looks like the following.
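
A minimal sketch of such a hostfile is given below; the node names follow the mycluster naming used later in this page and the slot counts are assumptions, chosen here so that the total number of slots matches the -np 32 used above. Adjust both to your cluster.

Code Block
mycluster-master slots=8
mycluster-minion-0 slots=8
mycluster-minion-1 slots=8
mycluster-minion-2 slots=8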

...

When your cluster is equipped with Slurm, you may need to use Slurm commands and follow Slurm's way of submitting jobs. See https://slurm.schedmd.com/sbatch.html or https://www.open-mpi.org/faq/?category=slurm. In the following example, the ex_slurm_openmpi.job file is submitted via the sbatch command.
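
A minimal sketch of what such a job script could contain is shown below; the SBATCH directives and the binary name are assumptions for illustration, not necessarily the contents of the actual ex_slurm_openmpi.job file.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=ex_openmpi
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --output=slurm-%j.out

# OpenMPI built with Slurm support detects the allocation,
# so mpirun launches one process per allocated task
mpirun ./a.out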

...

You can also run GNU parallel to execute jobs on remote hosts, i.e., the cluster slave nodes. See https://www.gnu.org/software/parallel/parallel_tutorial.html#remote-execution. The following example runs some simple shell commands on the nodes listed in ex_parallel_hostfile.txt.
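
A minimal sketch of such commands is shown below; the commands themselves are assumptions for illustration, and ex_parallel_hostfile.txt is expected to contain one node name (ssh login) per line.

Code Block
languagebash
# run `hostname` once on every node listed in the login file
parallel --nonall --sshloginfile ex_parallel_hostfile.txt hostname

# run one simple shell command per input argument, distributed over the nodes
parallel --sshloginfile ex_parallel_hostfile.txt "echo job {} ran on \$(hostname)" ::: 1 2 3 4 5 6 7 8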

...

where PWUSER is a user account and NEWPASSWORD is a new password.
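
If you want to script this step yourself, a minimal sketch along the following lines could be used; it is an assumption for illustration rather than the exact command referred to above, chpasswd requires root privileges, and the node names follow the mycluster naming used below.

Code Block
languagebash
# set a new password for PWUSER on the master and on every minion node
PWUSER="someuser"
NEWPASSWORD="somepassword"
echo "${PWUSER}:${NEWPASSWORD}" | chpasswd
for ind in $(seq 0 14)   # 14 = index of the last minion node, as in the install script below
do
  ssh mycluster-minion-${ind} "echo '${PWUSER}:${NEWPASSWORD}' | chpasswd"
done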

Install Intel oneAPI and use its MPI

It is possible to install Intel oneAPI and use its MPI implementation instead of OpenMPI. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_install_intel_oneapi_ubuntu_all_nodes.sh) can be used to install the Intel oneAPI Base and HPC Toolkits on all VM nodes of the cluster.

Code Block
languagebash
#!/bin/bash

# See
# https://www.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html

install_intel_oneapi='cd /tmp; wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list; add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"; apt install -y intel-basekit intel-hpckit'

# name of the cluster and the index of the last minion (slave) node
CLUSTERNAME="mycluster"
MINIONLASTIND="14"

echo "... install on ${CLUSTERNAME}-master"
echo $install_intel_oneapi | bash

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... install on ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "${install_intel_oneapi}"
done

After installing the Intel toolkits, you need to set up the shell environment for the Intel tools. See https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos/use-a-config-file-for-setvars-sh-on-linux-or-macos.html for a guide to setting up the environment. Here, simply source /opt/intel/oneapi/setvars.sh.
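
A minimal sketch of this step is given below; the compiler-wrapper name mpiicpc assumes a default Intel oneAPI HPC Toolkit installation as done above.

Code Block
languagebash
# load the Intel oneAPI environment into the current shell
source /opt/intel/oneapi/setvars.sh

# quick check that the Intel MPI compiler wrappers and launcher are now on the PATH
which mpiicpc mpirun

# for example, the C++ example above can be rebuilt with the Intel MPI wrapper
mpiicpc -o a.out ex_mpi_montecarlo_pi.cpp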