Hi,
I have compiled CMAQv5.4 on several HPC systems, including Discovery (Northeastern University) and Cheyenne (NCAR), and have run simulations successfully on both. When I hit errors (e.g., missing files or a wrong path), I troubleshoot by checking the main log first and, if needed, each core’s log (i.e., the CTM_LOG files).
On Discovery, which uses SLURM, I get one CTM_LOG file per core, but on Cheyenne, which uses PBS, these log files are not written. I looked for a flag I might have missed while compiling on Cheyenne but could not find one. Is there a flag I should turn on, or an environment variable I should set, to get the CTM_LOG files on Cheyenne?
Thanks in advance for any help.
I have also included the headers I use when submitting jobs, in case they help.
SLURM:
#SBATCH -N 6
#SBATCH --ntasks-per-node=56
#SBATCH --exclusive
#SBATCH --time=24:00:00
#SBATCH --job-name=KF_CMAQ
#SBATCH --partition=000000
#SBATCH --output=slurm-%j.log
#SBATCH --constraint=ib
PBS:
#PBS -A 0000000
#PBS -N KF_CMAQ
#PBS -l walltime=12:00:00
#PBS -q regular
#PBS -j oe
#PBS -k eod
#PBS -m abe
#PBS -M 0000000
#PBS -l select=7:ncpus=36:mpiprocs=36:mem=109GB
I suspect that the SLURM or PBS log files are being written to your home directory, and that the scripts are failing because the executable is not being found. If the logs are not in your home directory, your system may use a different default directory; ask your system administrator.
Another option is to specify an absolute path:
#SBATCH -o /path/to/your/directory/cmaq_%j.out
and add a command to cd to that directory in your run script.
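For a PBS job, the same idea might look like the sketch below. The path is the same placeholder as above, and the output file name cmaq_pbs.out is only an illustration. It uses a fixed file name because I am not assuming PBS expands a job-ID pattern the way SLURM expands %j.

```csh
#> Merge stderr into stdout and write the scheduler log to an absolute path.
#> The directory and file name below are placeholders.
#PBS -j oe
#PBS -o /path/to/your/directory/cmaq_pbs.out

#> Move to the same directory before launching the model.
cd /path/to/your/directory
```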
We use SLURM on our EPA machines. When we invoke the bldit_project.csh script with the argument “epa”, the following gets inserted into our run scripts:
#> The following commands output information from the SLURM
#> scheduler to the log files for traceability.
if ( $?SLURM_JOB_ID ) then
   echo Job ID is $SLURM_JOB_ID
   echo "Running on nodes `printenv SLURM_JOB_NODELIST`"
   echo Host is $SLURM_SUBMIT_HOST
   #> Switch to the working directory. By default,
   #> SLURM launches processes from your home directory.
   echo Working directory is $SLURM_SUBMIT_DIR
   cd $SLURM_SUBMIT_DIR
endif
You might want to add something similar to your run script.
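For PBS, a rough counterpart could look like the following. This is only a sketch and is not something bldit_project.csh generates. It assumes the standard PBS environment variables PBS_JOBID, PBS_NODEFILE, PBS_O_HOST and PBS_O_WORKDIR are set inside the job, which is worth confirming on your system.

```csh
#> Hypothetical PBS counterpart of the SLURM block above.
#> Prints scheduler information to the log for traceability.
if ( $?PBS_JOBID ) then
   echo Job ID is $PBS_JOBID
   if ( $?PBS_NODEFILE ) then
      #> PBS_NODEFILE lists one line per MPI rank; print each node once.
      echo Running on nodes:
      sort -u $PBS_NODEFILE
   endif
   echo Host is $PBS_O_HOST
   #> Switch to the directory the job was submitted from.
   echo Working directory is $PBS_O_WORKDIR
   cd $PBS_O_WORKDIR
endif
```

Outside a PBS job the outer test is false, so the block does nothing and the same run script can still be used interactively.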
Thanks for the information.
To follow up: I was able to solve the problem on Cheyenne by making two changes for the CMAQ compilation:
1- changing the Intel compiler from intel/19.1.1 to intel/2022.1
2- changing the MPI library from mpt/2.25 to impi/2022.1
I suspect the change that actually fixed the problem was the switch from mpt to impi.
So it seems this problem had nothing to do with the SLURM or PBS job headers.
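In case it helps anyone comparing builds, one way to see which MPI library an executable was linked against is sketched below. It assumes the executable is dynamically linked, and the EXEC path is a placeholder, not my actual file name.

```csh
#> EXEC is a placeholder; point it at your own CCTM executable.
set EXEC = /path/to/your/CCTM_executable

#> List the shared libraries the executable loads and keep the MPI ones.
ldd $EXEC | grep -i mpi
```

If nothing is printed, the MPI library may have been linked statically, and this check would not tell you much.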
Best,