...
This page explains how to deploy an MPI (and/or GNU parallel) cluster with an NFS filesystem in the KASI cloud. Here, OpenMPI (https://www.open-mpi.org/) is used as the MPI implementation. If you would like to use the Intel oneAPI toolkit and its MPI implementation, see the tips section of this page. The Slurm (https://slurm.schedmd.com/) workload manager can be installed in the cluster too. This how-to assumes that users know how to use a single VM by following the guide given in KASI Science Cloud : VM instances. The basic usage scenario is: 1) the user prepares codes and data in the created NFS volume, 2) compiles or runs the prepared code with OpenMPI (or Intel MPI), with or without Slurm, 3) output files are stored in the NFS volume, and 4) if needed, an external NAS volume is accessed in the VMs to transfer data between the MPI cluster and the NAS. The same scenario also works when using GNU parallel with other codes. The codes and shell scripts mentioned in this how-to are available at https://github.com/astromsshin/cloud_ex. Cloning the GitHub repository to the NFS volume is the easiest way to use the provided materials.
Step 1. Choose a cluster template: KASI-OpenMPI-Cluster or KASI-OpenMPI-Cluster-Slurm
...
You can check the progress of creating the cluster in Cluster Infra → KASI Clusters, Compute → Instances, and Share → Shares as shown in the following figures.
Step 4. (Optional) tasks after creating the cluster
Because it takes time to build all VM nodes in the cluster, you may need to confirm that all nodes are ready with the required tools. The following shell script, https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_mpirun.sh, performs this check.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... checking ${CLUSTERNAME}-master"
res=$(which mpirun | wc -l)
if [ ${res} -ne "1" ]
then
  echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... checking ${CLUSTERNAME}-minion-${ind}"
  res=$(ssh ${CLUSTERNAME}-minion-${ind} "which mpirun" | wc -l)
  if [ ${res} -ne "1" ]
  then
    echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
  fi
done
```
The above script tests whether mpirun is available in all cluster VM nodes. https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_munged_and_mpirun.sh conducts a similar test for Slurm (via its munged authentication daemon) as well as OpenMPI, as shown below.
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
echo "... checking ${CLUSTERNAME}-master"
res=$(which munged mpirun | wc -l)
if [ ${res} -ne "2" ]
then
  echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... checking ${CLUSTERNAME}-minion-${ind}"
  res=$(ssh ${CLUSTERNAME}-minion-${ind} "which munged mpirun" | wc -l)
  if [ ${res} -ne "2" ]
  then
    echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
  fi
done
```
Step 5. Erasing the cluster
Choose the cluster in Cluster Infra → KASI Clusters and click Delete Stacks. If some VM nodes are not erased cleanly, delete the VMs following the instructions given in KASI Science Cloud : VM instances.
Useful Tips
Running MPI codes
Without Slurm, you can simply run MPI codes with mpirun. The following example compiles the example C++ MPI codes in https://github.com/astromsshin/cloud_ex and runs the resulting binary (note that the second mpic++ invocation overwrites a.out, so mpirun executes the most recently compiled code).
```shell
mpic++ -o a.out ex_mpi_hostname.cpp
mpic++ -o a.out ex_mpi_montecarlo_pi.cpp
mpirun --allow-run-as-root -np 32 --hostfile ./ex_mpirun_hostfile.txt ./a.out
```
See https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php or https://www.open-mpi.org/faq/?category=running for details on mpirun. You need to prepare a hostfile for mpirun, which is ex_mpirun_hostfile.txt in the above example. The hostfile looks like the following.
```
mycluster-master
mycluster-minion-0
mycluster-minion-1
```
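An OpenMPI hostfile can also cap how many ranks land on each node via slots entries. A hypothetical variant of the hostfile above, assuming eight cores per node (the slot counts are illustrative, not from the original file):

```
mycluster-master slots=8
mycluster-minion-0 slots=8
mycluster-minion-1 slots=8
```

With such a file, mpirun places at most eight ranks on each listed node before reporting oversubscription.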
When your cluster is equipped with Slurm, you may need to use Slurm commands and follow Slurm's way of submitting jobs. See https://slurm.schedmd.com/sbatch.html or https://www.open-mpi.org/faq/?category=slurm. In the following example, the ex_slurm_openmpi.job file is submitted via the sbatch command.
```shell
sbatch -N 3 -n 24 ex_slurm_openmpi.job
```
where ex_slurm_openmpi.job is the following:

```shell
#!/bin/bash
mpirun --allow-run-as-root ./a.out
```
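The resource request can also live inside the job file as #SBATCH directives instead of sbatch command-line options. A minimal sketch of such a job file (the job name and output pattern are illustrative choices, not from the original script):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --nodes=3
#SBATCH --ntasks=24
#SBATCH --output=slurm-%j.out
mpirun --allow-run-as-root ./a.out
```

With the directives embedded, the job can be submitted as plain `sbatch ex_slurm_openmpi.job`, and Slurm writes stdout/stderr to slurm-<jobid>.out.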
Running GNU Parallel
You can run GNU parallel to execute jobs on remote hosts, i.e., the cluster slave nodes. See https://www.gnu.org/software/parallel/parallel_tutorial.html#remote-execution. The following example runs some simple shell commands on the nodes listed in ex_parallel_hostfile.txt.
```shell
parallel --nonall --sshloginfile ex_parallel_hostfile.txt hostname
parallel --workdir /mnt/mpi --sshloginfile ex_parallel_hostfile.txt 'hostname; touch $RANDOM-$(hostname)-{}.txt' ::: 3 4 5 6 7 8 9 10 11 12
```
where ex_parallel_hostfile.txt looks like the following (the leading : entry tells GNU parallel to run jobs on the local host as well):

```
:
mycluster-minion-0
mycluster-minion-1
```
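GNU parallel's sshloginfile entries can additionally cap the number of jobslots per host with an ncores/hostname prefix. A hypothetical variant of the file above, assuming at most eight simultaneous jobs per minion (the counts are illustrative):

```
:
8/mycluster-minion-0
8/mycluster-minion-1
```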
Changing root password in multiple VM nodes
SSH remote execution can be used to change root passwords in all VM nodes, as shown in https://github.com/astromsshin/cloud_ex/blob/main/tool_change_password_all_nodes.sh:
```shell
#!/bin/bash
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
PWUSER="root"
NEWPASSWORD="xxxxxxxxxx"
echo "... changing ${CLUSTERNAME}-master : ${PWUSER}"
echo -e "${NEWPASSWORD}\n${NEWPASSWORD}" | passwd ${PWUSER}
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... changing ${CLUSTERNAME}-minion-${ind} : ${PWUSER}"
  ssh ${CLUSTERNAME}-minion-${ind} "echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${PWUSER}"
done
```
where PWUSER is the user account and NEWPASSWORD is the new password.
Install Intel oneAPI and use its MPI