Overview

Need to implement

  1. PETSc implementations of Conjugate Gradient (CG) and Pipelined Conjugate Gradient (PipeCG); see the runtime-option sketch after this list
  2. Benchmarks of both solvers on 18 test matrices
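
If the two solvers are exposed through PETSc's KSP interface, they can be switched at run time without recompiling. A minimal sketch, assuming a generic PETSc-based executable (./app is a placeholder; the project's own ./bin/cg shown later uses a custom -solver flag instead):

# Standard PETSc runtime options: -ksp_type selects the Krylov method
# (KSPCG vs KSPPIPECG), -ksp_max_it caps iterations, -log_view prints timings
mpirun -np 2 ./app -ksp_type cg -ksp_max_it 100 -ksp_monitor -log_view
mpirun -np 2 ./app -ksp_type pipecg -ksp_max_it 100 -ksp_monitor -log_view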

PETSc/TAO

HPC / Cluster Configuration

Configuring PETSc - PETSc 3.18.4 documentation

FAQ - PETSc 3.18.4 documentation

FAQ: General run-time tuning

PETSc Configuration for CUDA and MPI with Spack

# Request resources from KUACC
srun -A users --qos=users -p mid -N 1 -n 1 -c 4 --gres=gpu:2 --mem=64G --time=4:00:00 --constraint=tesla_v100 --pty $SHELL
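
The same resources can also be requested non-interactively. A sketch of an equivalent batch script, assuming the same limits as the srun line above (job body is a placeholder):

#!/bin/bash
#SBATCH -A users
#SBATCH --qos=users
#SBATCH -p mid
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --gres=gpu:2
#SBATCH --mem=64G
#SBATCH --time=4:00:00
#SBATCH --constraint=tesla_v100

# benchmark commands from the sections below go here

Submit with sbatch <script>.sh and monitor with squeue -u $USER.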

# Initial requirements: gcc@8.5.0, python@3.9.5
module unload gcc/9.1.0
module load gcc/8.5.0
module load python/3.9.5
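
A quick sanity check before building anything (versions should match the modules loaded above):

# verify the active toolchain
module list       # should show gcc/8.5.0 and python/3.9.5
gcc --version
python3 --version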

# Install spack: https://spack.readthedocs.io/en/latest/getting_started.html
cd ~ && mkdir pclapps && cd pclapps
git clone -c feature.manyFiles=true https://github.com/spack/spack.git

# Setup env for spack
cd ~/pclapps
. spack/share/spack/setup-env.sh

# Spack auto-detect compilers
spack compiler list
spack compiler find

# checking specs for a package
spack spec --reuse -I <pkg>
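
For example, to preview how the PETSc spec used below would concretize (variants trimmed here for brevity):

# -I marks which dependencies are already installed
spack spec --reuse -I petsc@main+cuda+hypre cuda_arch=80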

# Specs for petsc (with CUDA support)
# petsc source code: https://gitlab.com/petsc/petsc
# CUDA Compute Capabilities: https://developer.nvidia.com/cuda-gpus#compute
# installation step
# configuration taken from: https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-ci-linux-cuda-double-64idx.py
# For GPU support, use the petsc@main git branch only (not a current release)
# Specify gcc compiler: gcc@8.5.0
# config examples: https://gitlab.com/petsc/petsc/-/tree/main/config/examples
# cuda=1 hwloc=1 hypre=1 int64=1 double=1 openmp=1 mpi=1 fortran=1 hdf5=1 shared=1 superlu-dist=1
# build_system=generic clanguage=C cuda_arch=70 arch=linux-centos7-skylake_avx512

# just for petsc
spack install --reuse petsc@main+cuda+hwloc+hypre+int64+mpi+fortran+shared cuda_arch=80 \
    ^openmpi+cuda cuda_arch=80 fabrics=ucx schedulers=none \
    ^hwloc+cuda cuda_arch=80 \
    ^hypre+cuda+mixedint cuda_arch=80 \
    ^superlu-dist+cuda cuda_arch=80
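
Once the install finishes, the package can be located and loaded into the current shell; a short sketch (the hash-suffixed spec will differ per install):

# list the installed spec with its hash and variants
spack find -lv petsc

# bring petsc and its run-time dependencies (e.g. openmpi) into the environment
spack load petsc@main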

PETSc Environment

# Environment Root
PCL_ROOT=/kuacc/users/ahasanli19/pclapps
SPACK_ROOT=$PCL_ROOT/spack

# Project Directory
PRJ_ROOT=$PCL_ROOT/petsc-benchmark

# Export Base Directories
export PCL_HOME=$PCL_ROOT
export PRJ_HOME=$PRJ_ROOT
export SPACK_HOME=$SPACK_ROOT

# setup spack environment
module load python/3.9.5
. $SPACK_ROOT/share/spack/setup-env.sh

# Module env vars
export PETSC_DIR=$SPACK_ROOT/opt/spack/linux-centos7-skylake_avx512/gcc-8.5.0/petsc-main-2v33uqhbhmbdfsc2g57m4hadwpldjnia
export MPI_DIR=$SPACK_ROOT/opt/spack/linux-centos7-skylake_avx512/gcc-8.5.0/openmpi-4.1.4-tp5z6a2dofstqbdm4s57czmbw7dhn22x
export CUDA_DIR=$SPACK_ROOT/opt/spack/linux-centos7-skylake_avx512/gcc-8.5.0/cuda-11.8.0-pomaumher35cjhsjthpdt7qkqjyzxki4
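
The hash-suffixed prefixes above are specific to one install; an alternative sketch that lets Spack resolve them instead of hard-coding the hashes:

# resolve install prefixes via spack (add a /hash qualifier if several versions are installed)
export PETSC_DIR=$(spack location -i petsc@main)
export MPI_DIR=$(spack location -i openmpi)
export CUDA_DIR=$(spack location -i cuda)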

# petsc doesn't read .mtx (Matrix Market) files directly; convert them to PETSc binary format first
python3 $PETSC_DIR/lib/petsc/bin/PetscBinaryIO.py convert matrix.mtx # => matrix.petsc
python3 ./scripts/measure_single_runtime.py --solver 0 --num_iters 100 --num_runs 5 --matrix_name "ecology2" --matrices_path $DEFAULT_MATRICES_FOLDER --num_gpus 1 --gpu_model "V100" -log_view
mpirun -np 2 ./bin/cg  -solver 0 -num_iters 100 -num_runs 5 -matrix_path /scratch/users/ahasanli19/pclapps/petsc-benchmark/scripts/../matrices/ecology2.petsc

python3 $PETSC_DIR/lib/petsc/bin/PetscBinaryIO.py convert "$DEFAULT_MATRICES_FOLDER/ecology2.mtx"
. batch/test01/measure_runtime_single_mat.sh --spackroot $SPACK_ROOT --matrix "ecology2" --solver "cg" --iters 5000 --runs 5 --gpus 1
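
To cover all 18 matrices, the convert-and-run steps above can be wrapped in a loop. A sketch assuming the .mtx files sit in $DEFAULT_MATRICES_FOLDER and the converted .petsc files land next to them (the matrix list is a placeholder to be filled in):

# placeholder list; extend to all 18 matrix names
for m in ecology2; do
    python3 $PETSC_DIR/lib/petsc/bin/PetscBinaryIO.py convert "$DEFAULT_MATRICES_FOLDER/$m.mtx"
    mpirun -np 2 ./bin/cg -solver 0 -num_iters 100 -num_runs 5 \
        -matrix_path "$DEFAULT_MATRICES_FOLDER/$m.petsc"
done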

Matrix Tutorials