It is not clear what type of compute cluster you are running on.
Here is a sample of the timings from your log file, which are very slow:
================================
|>--- TIME INTEGRATION ---<|
================================
Processing Day/Time [YYYYDDD:HHMMSS]: 2023032:010000
Which is Equivalent to (UTC): 1:00:00 Wednesday, Feb. 1, 2023
Time-Step Length (HHMMSS): 000500
VDIFF completed... 252.7 seconds
COUPLE completed... 0.1 seconds
HADV completed... 571.5 seconds
ZADV completed... 4.9 seconds
HDIFF completed... 21.2 seconds
DECOUPLE completed... 12.6 seconds
PHOT completed... 649.1 seconds
CLDPROC completed... 16.8 seconds
CHEM completed... 8.0 seconds
AERO completed... 123.4 seconds
Master Time Step
Processing completed... 1660.6 seconds
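As a rough check across the whole run, you can pull every master-step timing out of the log with a one-line grep; the log file name below is a placeholder, so substitute your actual CTM_LOG file name:
grep 'Processing completed' CTM_LOG_000.<your_run_id>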
The 12US1 CONUS domain should take around 60 minutes per simulation day on 64 processors, but this depends on the per-processor speed and on whether the compute nodes are being used exclusively for CMAQ.
How are you submitting your run script? If you are running it interactively, then your job is running on the login node rather than on the compute nodes. If the login node has fewer than 64 cores, and those cores are also busy with other work, then this is the kind of poor performance you would see.
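As a quick check, you can run these standard Linux commands interactively; if you are on a login node, the hostname and load will usually show it:
hostname    # login nodes often have names like login01
nproc       # number of cores visible on this node
uptime      # load average; a heavily loaded node would explain these timings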
The number of compute nodes dispatched by the Slurm scheduler is specified in the run script using
#SBATCH --nodes=XX
#SBATCH --ntasks-per-node=YY
where the maximum value of tasks per node (YY) is limited by how many CPUs are on each compute node.
For instance, if each of your compute nodes has 64 cores or CPUs, the maximum value of YY is 64, i.e. --ntasks-per-node=64.
To run a job with 64 processors, you could set XX to 1 compute node (--nodes=1, since 64 x 1 = 64), or you could use 2 compute nodes with 32 cores each (--nodes=2 and --ntasks-per-node=32).
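For example, either of these directive pairs requests 64 MPI tasks in total:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
or
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32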
The product NPCOL x NPROW must also match the total number of MPI tasks, at most 64 here, i.e. 8 x 8 to use all of the CPUs on a compute node.
You have the following setting in your run script, which indicates you are using 64 cores:
@ NPCOL = 16; @ NPROW = 4
If your compute node has 64 cores, then a more balanced domain decomposition setting would be 8 x 8 = 64.
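For reference, the released CCTM run scripts set the decomposition with csh variables along these lines (a sketch; the exact surrounding lines may differ in your version of the script):
@ NPCOL = 8; @ NPROW = 8              # 8 columns x 8 rows of subdomains
@ NPROCS = $NPCOL * $NPROW            # 64 MPI tasks total; must match nodes x ntasks-per-node
setenv NPCOL_NPROW "$NPCOL $NPROW"    # passed to the CCTM executable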
Please add the following commands to your run script and report back the results. The #SBATCH directives must go at the top of the script, before any executable lines, and the NPCOL/NPROW line replaces your existing setting.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --exclusive
echo 'information about the processor, including whether hyperthreading is in use'
lscpu
echo 'information about cluster'
sinfo
echo 'information about filesystem'
df -h
echo 'list the mounted volumes'
showmount -e localhost
@ NPCOL = 8; @ NPROW = 8
Assuming you have a Slurm scheduler on your cluster and 64 cores per compute node, add the above commands to your run script and submit your job using the command:
sbatch run_cctm_202302_12US1.csh
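Once it is submitted, you can confirm the job landed on a compute node rather than the login node:
squeue -u $USER            # shows which node(s) your job is running on
scontrol show job <jobid>  # detailed allocation; <jobid> is printed by sbatch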