I’m getting started with NVSHMEM and tried to begin with a simple example (test.cu), without much success:
#include <nvshmem.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    // Initialize the NVSHMEM library
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);
    // end shmem
    nvshmem_finalize();
    return 0;
}
I build and run it with the following sbatch file:
#!/bin/bash -l
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=8 # number of tasks
#SBATCH --ntasks-per-node=4 # number of tasks per node
#SBATCH --gpus-per-task=1 # number of gpu per task
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --time=00:15:00 # time (HH:MM:SS)
#SBATCH --partition=gpu # partition
#SBATCH --account=p200301 # project account
#SBATCH --qos=default # SLURM qos
module load NCCL OpenMPI CUDA NVSHMEM && \
nvcc -rdc=true -ccbin g++ -I $NVSHMEM_HOME/include test.cu -o test \
    -L $NVSHMEM_HOME/lib -lnvshmem_host -lnvshmem_device -lucs -lucp && \
srun -n 8 ./test
The expected output would be something like:
PE 0 of 8 has started ...
PE 1 of 8 has started ...
PE 2 of 8 has started ...
.....
Instead, the output I get is:
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
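A repeated "PE 0 of 1" means each of the eight tasks initialized as its own single-PE job, so it may help to check what rank information the launcher actually hands to each task. Below is a minimal sketch of such a check, not a fix. The function name `report_launcher_env` is made up for illustration. `SLURM_PROCID` and `SLURM_NTASKS` are set by Slurm; whether `PMI_RANK`, `PMI_SIZE`, or `PMIX_RANK` show up is an assumption that depends on which PMI plugin srun uses on the cluster.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not part of NVSHMEM or Slurm).
# Prints the rank-related environment variables visible to this task.
# Which PMI_* / PMIX_* variables exist depends on the srun MPI plugin,
# so "unset" for some of them is expected on many systems.
report_launcher_env() {
    printf 'task %s of %s: PMI_RANK=%s PMI_SIZE=%s PMIX_RANK=%s\n' \
        "${SLURM_PROCID:-unset}" "${SLURM_NTASKS:-unset}" \
        "${PMI_RANK:-unset}" "${PMI_SIZE:-unset}" "${PMIX_RANK:-unset}"
}

report_launcher_env
```

Saved under a name such as check_env.sh (hypothetical) and launched with `srun -n 8 ./check_env.sh` in place of `./test`, this prints one line per task. If the Slurm values run from 0 to 7 of 8 but every PMI value is unset, the tasks are being started correctly and the rank information is probably not reaching NVSHMEM's bootstrap. In that case `srun --mpi=list` shows which PMI plugins the cluster offers, and the NVSHMEM documentation describes an `NVSHMEM_BOOTSTRAP_PMI` setting for choosing a matching PMI flavor.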
I think I am missing something crucial but simple. Can somebody enlighten me?