This page explains how to deploy an MPI (and/or GNU parallel) cluster with an NFS filesystem in the KASI cloud. Here, OpenMPI (https://www.open-mpi.org/) is used as the MPI implementation. If you would like to use the Intel oneAPI toolkit and its MPI implementation instead, see the tips section of this page. The Slurm (https://slurm.schedmd.com/) workload manager can be installed in the cluster as well. This how-to assumes that users know how to use a single VM by following the guide given in KASI Science Cloud : VM instances. In particular, following KASI Science Cloud : VM instances#Step5.RemoteaccesstoVMinstancesviaSSHtunneling will help you access the created clusters via SSH.

The basic usage scenario is: 1) the user prepares codes and data in the created NFS volume, 2) the prepared code is compiled and run with OpenMPI (or Intel MPI) with or without Slurm, 3) output files are stored in the NFS volume, and 4) if needed, an external NAS volume is accessed in the VMs to receive/send data between the created MPI cluster and the NAS (see KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore). The same scenario also works when using GNU parallel with other codes. The related codes and shell scripts mentioned in this how-to are available at https://github.com/astromsshin/cloud_ex. Cloning the GitHub repository to the NFS volume is the easiest way to use the provided materials. It is recommended to use the ubuntu and other accounts instead of the root account after the tasks requiring root are done. Typical tasks conducted as root include 1) installing packages system-wide with apt and 2) adding users and changing passwords. You can find examples of these tasks in the tips section of this tutorial.

If you have questions and suggestions about this tutorial page and related problems, please, contact Min-Su Shin.

Step 0. Think about the required configuration of your cluster

The current KASI cloud system supports three possible cluster configurations: case 1) a cluster without any MPI-related setup, case 2) a cluster with the MPI-related setup (which is also needed for using GNU parallel on multiple nodes) as well as an NFS network-shared volume, and case 3) a cluster with the MPI-related setup but without the NFS network-shared volume. In case 1, you can still access the cluster nodes with ssh, so using GNU parallel is possible; however, there is no network-shared filesystem in this cluster. If you would like to use the Gluster network-shared filesystem with the cluster, you can set it up by following the guide Using Gluster network filesystems. In case 2, the MPI-related configuration is handled automatically, and the NFS network-shared volume is also provided with your chosen configuration. Case 3 is the same as case 2 except for the absence of the NFS network-shared volume; here too, you can set up the Gluster network-shared filesystem by following the guide Using Gluster network filesystems. As explained in the following step 1, these three cases correspond to three different kinds of cluster templates.

Step 1. Choose a cluster template: case 1) KASI-Cluster-Basic, case 2) KASI-OpenMPI-Cluster or KASI-OpenMPI-Cluster-Slurm, and case 3) KASI-OpenMPI-Cluster-Simple or KASI-OpenMPI-Cluster-Slurm-Simple

As presented in the following figure, multiple options are available for the cluster. If you need Slurm in your cluster, choose the KASI-OpenMPI-Cluster-Slurm or KASI-OpenMPI-Cluster-Slurm-Simple template in Project → Cluster Infra → KASI Cluster Templates. If you simply need an MPI (or GNU parallel)-enabled cluster, choose KASI-OpenMPI-Cluster or KASI-OpenMPI-Cluster-Simple. If you need a cluster for case 1 explained in Step 0 above, choose the KASI-Cluster-Basic template.

Click the Next button on the following page after checking that you are about to launch the right cluster template.

Step 2. Configure a cluster by typing configuration parameters

  • Stack Name: the name of the cluster which determines the hostnames of the master and slave VM nodes in the cluster.
  • Password for user: password required to control the created cluster in certain situations.
  • Image: VM system image.
  • Flavor: VM flavor.
  • Network: choose kasi-user-network.
  • Minion VMs Number: the number of slave nodes. If you plan to use Slurm, this is the number of Slurm worker nodes, which does not include the master node.
  • NFS Mount Path: the NFS directory path which will be prepared in all nodes, including both the master and slave nodes. ← this option is not shown for the KASI-OpenMPI-Cluster-Simple, KASI-OpenMPI-Cluster-Slurm-Simple, and KASI-Cluster-Basic templates.
  • NFS Size: the size of the NFS volume. ← this option is not shown for the KASI-OpenMPI-Cluster-Simple, KASI-OpenMPI-Cluster-Slurm-Simple, and KASI-Cluster-Basic templates.
  • SSH Keys: ssh key used to access created VMs.
  • Root Password: root password for root account in all nodes of the cluster. You may want to change the password after the cluster is created.
  • User Script: shell commands that will be executed in all VM nodes of the cluster. Type custom commands as a single line (see https://dev.to/0xbf/run-multiple-commands-in-one-line-with-and-linux-tips-5hgm about how to use ;, &&, and ||); an example is given after this list. If you would like to use GNU parallel in the cluster, type apt install parallel -y as shown above. If you are not familiar with the apt command in Ubuntu, see https://ubuntu.com/server/docs/package-management.
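
For example, a User Script that updates the package index, installs GNU parallel, and creates a scratch directory on every node could be written as the following single line (a sketch; adjust the packages and the directory to your needs).

apt update -y && apt install -y parallel && mkdir -p /data/scratch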

Step 3. Checking the creation process

You can check the progress of creating the cluster in Cluster Infra → KASI Clusters, Compute → Instances, and Share → Shares as shown in the following figures.

Step 4. (Optional) tasks after creating the cluster

Because it takes time to build all VM nodes in the cluster, you may need to confirm that all nodes are ready with the required tools. The following shell script (https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_mpirun.sh) can be used for this check.

#!/bin/bash
 
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
 
echo "... checking ${CLUSTERNAME}-master"
res=$(which mpirun | wc -l)
if [ ${res} -ne "1" ]
then
  echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi
 
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... checking ${CLUSTERNAME}-minion-${ind}"
  res=$(ssh ${CLUSTERNAME}-minion-${ind} "which mpirun" | wc -l)
  if [ ${res} -ne "1" ]
  then
    echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
  fi
done

The above script tests whether mpirun is available in all cluster VM nodes. The script https://github.com/astromsshin/cloud_ex/blob/main/ex_mpi_check_munged_and_mpirun.sh conducts a similar test for Slurm as well as OpenMPI, as shown below.

#!/bin/bash

CLUSTERNAME="mycluster"
MINIONLASTIND="14"

echo "... checking ${CLUSTERNAME}-master"
res=$(which munged mpirun | wc -l)
if [ ${res} -ne "2" ]
then
  echo "[WARNING] ${CLUSTERNAME}-master is not ready yet."
fi

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... checking ${CLUSTERNAME}-minion-${ind}"
  res=$(ssh ${CLUSTERNAME}-minion-${ind} "which munged mpirun" | wc -l)
  if [ ${res} -ne "2" ]
  then
    echo "[WARNING] ${CLUSTERNAME}-minion-${ind} is not ready yet."
  fi
done

Step 5. Erasing the cluster

Choose the cluster in Cluster Infra → KASI Clusters and click Delete Stacks. If some VM nodes are not erased cleanly, delete the VMs following the instructions given in KASI Science Cloud : VM instances.

FAQ

What accounts are available? Which account do I have to use?

When the cluster VM instances are ready, all cluster nodes have root and ubuntu accounts. 1) The root account permits web-console login with the password provided by the user, and you should use this account to log in via the web console. Password-less SSH access among all nodes with the root account is already set up when the cluster is created. However, SSH login with the root account from outside the cloud network is not possible unless you change the setup manually. 2) The ubuntu account is sudo-enabled, but console login with the ubuntu account is not allowed when the cluster is created. If you would like to allow the ubuntu account to log in via the web console, you need to set a password for the ubuntu account as described in "Changing password of ubuntu account and preparing key-based ssh login environment in multiple VM nodes" in the useful tips. Because the user-provided public key is already included in the authorized keys of the ubuntu account, SSH access with the correct private key is possible. However, password-less SSH access among the cluster nodes is not set up for the ubuntu account when the cluster is created; to enable it, you may need to set up the environment yourself as shown in the same tip. 3) You can create new accounts by yourself because you have root permission; a sketch is given below.
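
For example, a new account can be created on every node over SSH in the same way as the scripts in the useful tips. The following is a minimal sketch to be run as root on the master node; NEWUSER and NEWPASSWORD are placeholders that you should replace, and the cluster parameters must match your cluster.

#!/bin/bash

# Minimal sketch: create an additional account on all cluster nodes.
# Run as root on the master node; NEWUSER and NEWPASSWORD are placeholders.

CLUSTERNAME="mycluster"
MINIONLASTIND="2"
NEWUSER="myuser"
NEWPASSWORD="xxxxxxxxxx"

ADDCMD="useradd -m -s /bin/bash ${NEWUSER}; echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${NEWUSER}"

echo "... creating ${NEWUSER} on ${CLUSTERNAME}-master"
echo ${ADDCMD} | bash

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... creating ${NEWUSER} on ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "${ADDCMD}"
done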

Because the NFS partition can have permission issues when it is accessed from multiple VM cluster nodes, you need to be careful in choosing the account for running your applications and in setting file/directory permissions (in particular, for the NFS directory and the files there). If you do not need SSH access from external machines and do not have security concerns about the root account, using the root account is probably the easiest way to avoid problems such as file/directory permissions. If you need SSH access from external machines, you may need to change the SSH-related configuration for the root account. If you decide to use the ubuntu account and change the owner and permissions of files and directories, you may need to use the script introduced in "Changing password of ubuntu account and preparing key-based ssh login environment in multiple VM nodes" in the useful tips; a minimal example of adjusting ownership on the NFS partition is given below.
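
For example, a working directory on the NFS partition can be handed over to the ubuntu account as root; the following is a minimal sketch, assuming the NFS mount path /mnt/mpi used elsewhere on this page and a directory name chosen only for illustration.

# as the root account; the NFS partition is shared, so this needs to be done only once
mkdir -p /mnt/mpi/work
chown -R ubuntu:ubuntu /mnt/mpi/work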

The same steps of building a custom environment are repeated whenever a new cluster is made. Are there better ways?

It is recommended to keep simple scripts that make the procedure of preparing your custom environment easy and fast. You can write your own scripts for installing apt packages and creating conda environments by following the guides given in the useful tips. If you keep the scripts in the provided external storage, which can be accessed in the KASI cloud, or somewhere else such as a GitHub repository, you can make the cluster nodes execute your custom script as one step of building your work environment. For conda, you can export your custom conda environment configuration and save it in the external storage (see https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#building-identical-conda-environments), as in the example below. When your code can be provided as a pre-compiled binary in the external storage, the compilation steps can be skipped.
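
For example, the commands below export an existing environment to a spec file and recreate it later, following the linked conda documentation; spec-file.txt and myenv are placeholder names.

# export the currently prepared environment to a spec file (keep it in the external storage)
conda list --explicit > spec-file.txt
# in a newly created cluster, recreate the same environment from the spec file
conda create --name myenv --file spec-file.txt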

Working directories: each node's disk, the mounted external NAS space, and the cluster NFS partition.

Each cluster node comes with its own disk space, whose size is chosen by the user. If your task requires locally accessible space such as /tmp, each node's disk space might be the best option for speed. Data stored in the local disk space of each node disappear when the node is destroyed, i.e., when the cluster is erased. The external NAS space can be mounted on any cluster node; however, users are expected to mount the NAS space on the cluster master node to bring the required data and codes to the cluster NFS partition. The external NAS disk space is backed up and is not erased when the cluster is erased. Therefore, the external NAS space can also be used to store the results produced by your task by copying them from the NFS partition to the mounted external disk space, as in the example below. The cluster NFS partition is available in all cluster nodes; therefore, it is the right place to host files that need to be accessed from all cluster nodes. The NFS partition is destroyed when the cluster is erased. If you need to execute specific commands, such as changing file permissions or creating new directories, in all cluster nodes, see the example given in "Executing custom commands in multiple VM nodes" in the useful tips.
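
For example, assuming the NFS mount path /mnt/mpi and a NAS volume mounted at /mnt/nas (a placeholder for your actual mount point), results can be saved before erasing the cluster as follows.

# copy results from the cluster NFS partition to the mounted external NAS space
cp -a /mnt/mpi/results /mnt/nas/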

Useful Tips

Running MPI codes

Without Slurm, you can simply run MPI codes with mpirun. The following example compiles the example C++ MPI codes in https://github.com/astromsshin/cloud_ex and runs them.

mpic++ -o a.out ex_mpi_hostname.cpp
mpic++ -o a.out ex_mpi_montecarlo_pi.cpp
mpirun --allow-run-as-root -np 32 --hostfile ./ex_mpirun_hostfile.txt ./a.out

See https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php or https://www.open-mpi.org/faq/?category=running for mpirun. You need to prepare a hostfile for mpirun, which is ex_mpirun_hostfile.txt in the above example. When you want to use mpirun as the root account, you should use the option --allow-run-as-root as explained in https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php. For example, the hostfile looks like the following.

mycluster-master
mycluster-minion-0
mycluster-minion-1
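
If you want to control how many MPI processes can be placed on each node, the OpenMPI hostfile can also specify slots for each host, for example:

mycluster-master slots=8
mycluster-minion-0 slots=8
mycluster-minion-1 slots=8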

When your cluster is equipped with Slurm, you may need to use Slurm commands and follow Slurm's way of submitting jobs. See https://slurm.schedmd.com/sbatch.html or https://www.open-mpi.org/faq/?category=slurm. In the following example, the ex_slurm_openmpi.job file is submitted via the sbatch command.

sbatch -N 3 -n 24 ex_slurm_openmpi.job

where ex_slurm_openmpi.job is the following

#!/bin/bash
 
mpirun --allow-run-as-root ./a.out
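
The resource request can also be written inside the job file with #SBATCH directives instead of the -N and -n command-line options; the following is an equivalent sketch.

#!/bin/bash
#SBATCH -N 3
#SBATCH -n 24

mpirun --allow-run-as-root ./a.out

With the directives in the file, the job can be submitted simply as sbatch ex_slurm_openmpi.job.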

Running GNU Parallel

You can run GNU parallel to execute jobs on remote hosts, i.e., the cluster slave nodes. See https://www.gnu.org/software/parallel/parallel_tutorial.html#remote-execution (or https://www.biostars.org/p/63816/). The following example runs some simple shell commands on the nodes listed in ex_parallel_hostfile.txt.

parallel --nonall --sshloginfile ex_parallel_hostfile.txt hostname
 
parallel --workdir /mnt/mpi --sshloginfile ex_parallel_hostfile.txt 'hostname; touch $RANDOM-$(hostname)-{}.txt' ::: 3 4 5 6 7 8 9 10 11 12

where ex_parallel_hostfile.txt is like the following (the line containing only : denotes the local host, i.e., the node where parallel is run)

:
mycluster-minion-0
mycluster-minion-1

Executing custom commands in multiple VM nodes (e.g., checking availability of Python modules in multiple VM nodes)

Because the created cluster is already set up with key-based, password-less SSH access to all VM nodes for the root account, the following script works via SSH remote execution on all remote VM nodes.

#!/bin/bash

RUNCMD='cd /tmp; /mnt/mpi/cloud_ex/tool_install_and_setup_conda_in_local_volume.sh'

CLUSTERNAME="mycluster"
MINIONLASTIND="2"

echo "... install on ${CLUSTERNAME}-master"
echo $RUNCMD | bash

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... install on ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "${RUNCMD}"
done

This script is available at https://github.com/astromsshin/cloud_ex/blob/main/tool_execute_commands_all_nodes.sh. The script shows a typical case of executing custom commands in all VM nodes of the cluster, including the master node. The other scripts in this tips section basically follow the same approach to execute specific commands. If you are not familiar with executing multiple Linux shell commands in a single command line, see https://dev.to/0xbf/run-multiple-commands-in-one-line-with-and-linux-tips-5hgm. One example of using this method is checking the availability of Python modules in multiple VM nodes, including the cluster master. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_test_whether_python_modules_available.sh) runs a short Python script (from tqdm import tqdm; from astropy.io import fits; from sklearn import mixture; import matplotlib.pyplot as plt; import numpy) which does not cause errors when the required modules are available.

#!/bin/bash

CLUSTERNAME="mycluster"
MINIONLASTIND="2"

RUNCMD='echo "Checking on $(hostname)"; python3 -c "from tqdm import tqdm; from astropy.io import fits; from sklearn import mixture; import matplotlib.pyplot as plt; import numpy; print(\"SUCCESS\")"'

echo "... checking ${CLUSTERNAME}-master"
echo $RUNCMD | bash

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... checking ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "${RUNCMD}"
done

Changing root password in multiple VM nodes

SSH remote execution can be used to change the root password in all VM nodes, as described in https://github.com/astromsshin/cloud_ex/blob/main/tool_change_password_all_nodes.sh:

#!/bin/bash
 
CLUSTERNAME="mycluster"
MINIONLASTIND="14"
PWUSER="root"
NEWPASSWORD="xxxxxxxxxx"
 
echo "... changing ${CLUSTERNAME}-master : ${PWUSER}"
echo -e "${NEWPASSWORD}\n${NEWPASSWORD}" | passwd ${PWUSER}
 
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... changing ${CLUSTERNAME}-minion-${ind} : ${PWUSER}"
  ssh ${CLUSTERNAME}-minion-${ind} "echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${PWUSER}"
done

where PWUSER is a user account and NEWPASSWORD is a new password.

Changing password of ubuntu account and preparing key-based ssh login environment in multiple VM nodes

The ubuntu account is available as a default account in addition to the root account. You may want to use the ubuntu account as your main account for the created cluster. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_change_password_for_ubuntu_all_nodes_and_setup_sshkey.sh) helps you set up the environment for the ubuntu account by changing the password of the ubuntu account and adding a generated ssh key file to the right directories.

#!/bin/bash

# this script should be executed by root in the master node.

CLUSTERNAME="mycluster"
MINIONLASTIND="4"
PWUSER="ubuntu"
NEWPASSWORD="xxxxxxxxxx"

NFSDIR="/mnt/mpi"

# changing password of ubuntu account.

echo "... changing ${CLUSTERNAME}-master : ${PWUSER}"
echo -e "${NEWPASSWORD}\n${NEWPASSWORD}" | passwd ${PWUSER}

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... changing ${CLUSTERNAME}-minion-${ind} : ${PWUSER}"
  ssh ${CLUSTERNAME}-minion-${ind} "echo -e \"${NEWPASSWORD}\n${NEWPASSWORD}\" | passwd ${PWUSER}"
done

# generate ssh-key
rm -f ${NFSDIR}/id_ed25519 ${NFSDIR}/id_ed25519.pub
### you should enter an empty passphrase by pressing Enter twice when prompted.
ssh-keygen -t ed25519 << endskey
${NFSDIR}/id_ed25519
endskey

# set up the environment for password-less ssh access among the cluster nodes
# for ubuntu account
### master
echo "setup the master: ${CLUSTERNAME}-master"
cp -f ${NFSDIR}/id_ed25519 /home/ubuntu/.ssh/
cp -f ${NFSDIR}/id_ed25519.pub /home/ubuntu/.ssh/
chown ubuntu:ubuntu /home/ubuntu/.ssh/id_ed25519*
chmod 600 /home/ubuntu/.ssh/id_ed25519
chmod 644 /home/ubuntu/.ssh/id_ed25519.pub
cat /home/ubuntu/.ssh/id_ed25519.pub >> /home/ubuntu/.ssh/authorized_keys
### slaves
for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "setup the slave: ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "cp -f ${NFSDIR}/id_ed25519 /home/ubuntu/.ssh/; cp -f ${NFSDIR}/id_ed25519.pub /home/ubuntu/.ssh/; chown ubuntu:ubuntu /home/ubuntu/.ssh/id_ed25519*; chmod 600  /home/ubuntu/.ssh/id_ed25519; chmod 644 /home/ubuntu/.ssh/id_ed25519.pub; cat /home/ubuntu/.ssh/id_ed25519.pub >> /home/ubuntu/.ssh/authorized_keys"
done

The above script should be executed as the root account. If you would like to use a different key type instead of ed25519, you should modify the script accordingly; see the sketch below.
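
For example, to use an RSA key instead, the key-generation part would change along the following lines, and every id_ed25519 file name in the script would have to be replaced with id_rsa accordingly (a sketch only).

# generate an RSA key instead of ed25519
ssh-keygen -t rsa -b 4096 << endskey
${NFSDIR}/id_rsa
endskey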

Install Intel oneAPI and use its MPI

It is possible to install Intel oneAPI and use its MPI implementation instead of OpenMPI. The following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_install_intel_oneapi_ubuntu_all_nodes.sh) can be used to install the Intel oneAPI Base and HPC Toolkits in all VM nodes of the cluster. Root permission is required to execute the commands included in the script.

#!/bin/bash

# See
# https://www.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html

install_intel_oneapi='cd /tmp; wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB; echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list; add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"; apt install -y intel-basekit intel-hpckit'

CLUSTERNAME="mycluster"
MINIONLASTIND="14"

echo "... install on ${CLUSTERNAME}-master"
echo $install_intel_oneapi | bash

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... install on ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "${install_intel_oneapi}"
done

After installing the Intel toolkits, you need to set up the shell environment for the Intel tools. See https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos/use-a-config-file-for-setvars-sh-on-linux-or-macos.html for a guide to setting up the environment. Here, simply source /opt/intel/oneapi/setvars.sh. As shown below, you can compile and test MPI programs by using the installed Intel toolkits.

> which mpiicpc
/opt/intel/oneapi/mpi/2021.5.1/bin/mpiicpc
> mpiicpc ./ex_mpi_montecarlo_pi.cpp
> ldd ./a.out
> which mpirun
/opt/intel/oneapi/mpi/2021.5.1/bin/mpirun

See https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top.html to figure out how to compile and run MPI programs with Intel toolkits.
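
To run an Intel MPI program across multiple nodes, a hostfile can be passed to Intel MPI's mpirun as well; the following is a sketch, assuming a hostfile that simply lists the host names as in the OpenMPI example above.

# launch 32 processes on the hosts listed in the hostfile with Intel MPI
mpirun -n 32 -f ./ex_mpirun_hostfile.txt ./a.out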

Installing software with apt

You may want to install software by using apt commands in the Ubuntu Linux environment. As described above, you can type a specific command such as "apt install parallel -y" in the cluster template, as we do for installing GNU parallel in all cluster nodes. Using the custom commands in the cluster template is one way to install apt packages in all cluster nodes. If you want to install apt packages after your cluster is built, you can use the following script (https://github.com/astromsshin/cloud_ex/blob/main/tool_install_apt_packages_all_nodes.sh), which requires root permission.

#!/bin/bash

### [IMPORTANT]
# The root account/permission is required
# for the installation of software by using apt packages.
# You should run this script as the root account.

# List packages.
# You can search and check packages in https://packages.ubuntu.com/
# for Ubuntu distributions.
pkgs=(astropy-utils python3-astropy python3-astropy-affiliated python3-astropy-healpix python3-astropy-helpers python3-sklearn python3-skimage python3-statsmodels python3-matplotlib zip)

CLUSTERNAME="mycluster"
MINIONLASTIND="14"

echo "... install on ${CLUSTERNAME}-master"
apt update -y
for pkg in ${pkgs[@]}
do
  apt install $pkg -y
done

for ind in $(seq 0 ${MINIONLASTIND})
do
  echo "... install on ${CLUSTERNAME}-minion-${ind}"
  ssh ${CLUSTERNAME}-minion-${ind} "apt update -y"
  for pkg in ${pkgs[@]}
  do
    ssh ${CLUSTERNAME}-minion-${ind} "apt install ${pkg} -y"
  done
done

Installing conda and preparing conda environments

The following script is available at https://github.com/astromsshin/cloud_ex/blob/main/tool_install_and_setup_conda_in_shared_volume.sh; it installs miniconda and sets up a specific conda environment in the network-shared volume.

#!/bin/bash

CLUSTERNAME="mycluster"
NFSDIR="/mnt/mpi"

### [IMPORTANT]
# Because this script uses the directory ${NFSDIR}/miniconda
# as a Conda directory, the account running this script must have 
# a permission to create/write the directory ${NFSDIR}/miniconda.
# One way is to create the directory ${NFSDIR}/miniconda as root user,
# and then the owner of the directory is changed to the account running 
# this script. For example, as root account, 
# mkdir /mnt/mpi/miniconda
# chown ubuntu:ubuntu /mnt/mpi/miniconda
# if you like to use the conda environment as ubuntu account and you are
# running this script as ubuntu account.

CONDAENV="xclass"
CONDAURL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"

# additional apt packages
# [IMPORTANT] the following apt install requires root permission.
# If you would like to install apt packages separately as the root account,
# delete the following part.
sudo apt install zip

# installation of miniconda
cd /tmp
wget "${CONDAURL}" -O ./miniconda.sh && bash ./miniconda.sh -u -b -p ${NFSDIR}/miniconda

eval "$(${NFSDIR}/miniconda/bin/conda shell.bash hook)"
conda init
conda update -y -n base -c defaults conda

# creating the environment
conda create -y -n ${CONDAENV} python=2.7
# adding new conda packages
conda install -y -n ${CONDAENV} numpy
conda install -y -n ${CONDAENV} scipy
conda install -y -n ${CONDAENV} matplotlib
conda install -y -n ${CONDAENV} astropy
conda install -y -n ${CONDAENV} sqlite
# adding pip packages
conda activate ${CONDAENV}
pip install pyfits

echo "Do the following things to use the environment ${CONDAENV}"
echo "1) source ~/.bashrc"
echo "2) conda activate ${CONDAENV}"

You may instead install conda and prepare the environments in your local home directory. The script https://github.com/astromsshin/cloud_ex/blob/main/tool_install_and_setup_conda_in_local_volume.sh can be used for this.

#!/bin/bash

CLUSTERNAME="mycluster"
TMPDIR="/tmp"

### [IMPORTANT]
# This script downloads the conda installation script to /tmp directory from 
# the url shown below.
# The conda installation directory is ${HOME}/miniconda.
# If you want to make the same conda environment available in all cluster nodes,
# you should run this script in every node.

CONDAENV="xclass"
CONDAURL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"

# additional apt packages
# [IMPORTANT] the following apt install requires root permission.
# If you would like to install apt packages separately as the root account,
# delete the following part.
sudo apt install zip

# installation of miniconda
cd ${TMPDIR}
wget "${CONDAURL}" -O ./miniconda.sh
bash ./miniconda.sh -b -p ${HOME}/miniconda

eval "$(${HOME}/miniconda/bin/conda shell.bash hook)"
conda init
conda update -y -n base -c defaults conda

# creating the environment
conda create -y -n ${CONDAENV} python=2.7
# adding new conda packages
conda install -y -n ${CONDAENV} numpy
conda install -y -n ${CONDAENV} scipy
conda install -y -n ${CONDAENV} matplotlib
conda install -y -n ${CONDAENV} astropy
conda install -y -n ${CONDAENV} sqlite
# adding pip packages
conda activate ${CONDAENV}
pip install pyfits

echo "Do the following things to use the environment ${CONDAENV}"
echo "1) source ~/.bashrc"
echo "2) conda activate ${CONDAENV}"

Downloading/uploading files from/to the external storage

The following script shows how to download files from the external storage, which is described in KASI Science Cloud : VM instances#Step4.ConfiguretheVMinstanceforremotedesktop&externaldatastore, to the network-shared volume. Typical usage includes downloading compiled binary codes, related scripts, configuration files, and data from the external storage. The script is available at https://github.com/astromsshin/cloud_ex/blob/main/tool_download_from_external_storage.sh.

#!/bin/bash

# edit the following variables
TARGETDIR="/mnt/mpi"

# assuming webdav accesses
WEBDAVIP="xxxx"
WEBDAVID="xxxx"
WEBDAVPW="xxxx"

# array of filenames that will be downloaded and saved
SRCFNARR=("XCLASS.zip" "ins_custom.sh")
DESTFNARR=("XCLASS.zip" "ins_custom.sh")

cd ${TARGETDIR}

CNT=0
for SRCFN in ${SRCFNARR[@]}
do
  DESTFN=${DESTFNARR[$CNT]}
  wget -O ${DESTFN} --no-check-certificate -r -c --user ${WEBDAVID} --password ${WEBDAVPW} https://${WEBDAVIP}/home/${SRCFN}
  CNT=$((CNT+1))
done

You can also upload files from the cluster to the external storage as shown in the following example script (https://github.com/astromsshin/cloud_ex/blob/main/tool_upload_to_external_storage.sh).

#!/bin/bash

# edit the following variables
# assuming webdav accesses
WEBDAVIP="xxxx"
WEBDAVID="xxxx"
WEBDAVPW="xxxx"

# array of local filenames to upload and their destination filenames in the external storage
SRCFNARR=("/root/.ssh/id_rsa.pub")
DESTFNARR=("cluster_master_id_rsa.pub")

CNT=0
for SRCFN in ${SRCFNARR[@]}
do
  DESTFN=${DESTFNARR[$CNT]}
  curl --insecure -u ${WEBDAVID}:${WEBDAVPW} -T ${SRCFN} https://${WEBDAVIP}/home/${DESTFN}
  CNT=$((CNT+1))
done

Running your own task queue

Without a Slurm queue, you can still have your own task queue when you use the environment prepared for GNU parallel, i.e., password-less ssh logins among the cluster nodes. One option is the simple task queue introduced in https://github.com/guo-yong-zhi/DistributedTaskQueue, which requires only password-less ssh connections and Python3 on the cluster master and worker nodes. Alternatively, GNU parallel itself can serve as a simple queue, as in the sketch below.
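
If your tasks can be written as a list of independent shell commands, the following sketch distributes them over the cluster nodes; it assumes a commands.txt file with one shell command per line and the hostfile format shown in the GNU parallel tip above.

# distribute the commands in commands.txt over the nodes listed in the hostfile,
# keeping a record of finished jobs in jobs.log
parallel --sshloginfile ex_parallel_hostfile.txt --workdir /mnt/mpi --joblog jobs.log < commands.txt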