12CONUS1 run taking long

Hi all,

I am running CCTM for the 12CONUS1 domain for the period Feb 1, 2023 to May 1, 2023. The run uses 64 processors on an HPC cluster, but it is taking far too long (around 1 day of wall-clock time per simulated hour). How can I decrease the run time? I am attaching the CCTM run script and log file. Thanks for the help.

run_cctm_202302_12US1.txt (35.7 KB)
run_cctm_202302_12US1_log.txt (21.9 KB)

It is not clear what type of compute cluster you are running on.

This is a sample of your timings from your log file, which are very slow.

 |>---   TIME INTEGRATION   ---<|

 Processing Day/Time [YYYYDDD:HHMMSS]: 2023032:010000
   Which is Equivalent to (UTC): 1:00:00  Wednesday,  Feb. 1, 2023
   Time-Step Length (HHMMSS): 000500
             VDIFF completed...  252.7 seconds
            COUPLE completed...    0.1 seconds
              HADV completed...  571.5 seconds
              ZADV completed...    4.9 seconds
             HDIFF completed...   21.2 seconds
          DECOUPLE completed...   12.6 seconds
              PHOT completed...  649.1 seconds
           CLDPROC completed...   16.8 seconds
              CHEM completed...    8.0 seconds
              AERO completed...  123.4 seconds
        Master Time Step
        Processing completed... 1660.6 seconds

The 12US1 CONUS domain should take around 60 minutes per simulation day on 64 processors, but this depends on the per-processor speed and on whether the compute nodes are being used exclusively for CMAQ.
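To put the log sample next to that expectation, here is a rough back-of-the-envelope sketch in Python. It assumes every 5-minute step costs about the same as the one shown in the log, which may not hold over a whole run; the variable and function names are made up for illustration.

```python
# Extrapolate the logged cost of one master time step to a full simulation day.
# Assumption: every step costs about the same as the single step in the log sample.
STEP_WALL_SECONDS = 1660.6              # "Processing completed" value from the log
STEP_LENGTH_MINUTES = 5                 # Time-Step Length 000500 from the log
EXPECTED_SECONDS_PER_SIM_DAY = 60 * 60  # ~60 minutes per simulation day (quoted above)


def wall_seconds_per_sim_day(step_wall_seconds, step_length_minutes):
    """Wall-clock seconds needed to simulate 24 hours at a constant per-step cost."""
    steps_per_day = 24 * 60 / step_length_minutes
    return step_wall_seconds * steps_per_day


observed = wall_seconds_per_sim_day(STEP_WALL_SECONDS, STEP_LENGTH_MINUTES)
slowdown = observed / EXPECTED_SECONDS_PER_SIM_DAY

print(f"wall-clock hours per simulated hour: {observed / 24 / 3600:.1f}")
print(f"slowdown vs. ~60 min per sim day:    {slowdown:.0f}x")
```

With the numbers in the log this comes to about 5.5 wall-clock hours per simulated hour, or roughly 130 times slower than the ~60 minutes per day quoted above.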

How are you submitting your run script? If you are running it interactively, then your job is running on the login node rather than on the compute nodes. If the login node has fewer than 64 cores, and those cores are also busy doing other work, this is the type of poor performance you would see.

The number of compute nodes dispatched by the Slurm scheduler is specified in the run script using #SBATCH --nodes=XX and #SBATCH --ntasks-per-node=YY, where the maximum value of YY (tasks per node) is limited by how many CPUs are on each compute node.

For instance, if each of your compute nodes has 64 cores or CPUs, the maximum value of YY is 64, i.e. --ntasks-per-node=64.

A job with 64 processors would require XX to be set to 1 compute node (--nodes=1), as 64 x 1 = 64, or you could use 2 compute nodes with 32 cores each by setting --nodes=2 and --ntasks-per-node=32.
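The node arithmetic above can be sketched in a few lines of Python. This is only an illustration: `slurm_layouts` is a hypothetical helper, not part of Slurm or CMAQ, and the 64 cores per node is the same assumption made in this thread.

```python
def slurm_layouts(total_tasks, cores_per_node):
    """Return (nodes, ntasks_per_node) pairs where nodes * ntasks_per_node
    equals total_tasks and ntasks_per_node fits on one node."""
    layouts = []
    for nodes in range(1, total_tasks + 1):
        if total_tasks % nodes:
            continue  # tasks would not divide evenly across this many nodes
        tasks_per_node = total_tasks // nodes
        if tasks_per_node <= cores_per_node:
            layouts.append((nodes, tasks_per_node))
    return layouts


# Assumption: 64 MPI tasks on nodes with 64 cores each.
# Print the two layouts that use the fewest nodes.
for nodes, tasks in slurm_layouts(64, 64)[:2]:
    print(f"#SBATCH --nodes={nodes}")
    print(f"#SBATCH --ntasks-per-node={tasks}")
    print()
```

If your nodes have fewer cores, pass the real core count as the second argument; for example, with 32 cores per node the single-node layout drops out of the list.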

The product NPCOL x NPROW must likewise be at most 64, e.g. 8 x 8, to use all of the CPUs on a compute node.

You have the following setting in your run script, which indicates you are using 64 cores.

   @ NPCOL  =  16; @ NPROW =  4

If each of your compute nodes has 64 cores, then a more balanced domain decomposition setting would be 8 x 8 = 64.
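To see what the two decompositions mean per processor, here is a small Python sketch. The helper names are hypothetical, and the 459 x 299 grid size is my assumption for 12US1; please check the NCOLS and NROWS values in your own GRIDDESC file. It only compares tile shapes and does not predict run time.

```python
import math


def decompositions(n_procs):
    """All (NPCOL, NPROW) pairs whose product is n_procs."""
    return [(c, n_procs // c) for c in range(1, n_procs + 1) if n_procs % c == 0]


def most_square(n_procs):
    """The pair with the smallest difference between NPCOL and NPROW."""
    return min(decompositions(n_procs), key=lambda pair: abs(pair[0] - pair[1]))


def tile_size(ncols, nrows, npcol, nprow):
    """Largest per-processor subdomain as (columns, rows), rounding up."""
    return math.ceil(ncols / npcol), math.ceil(nrows / nprow)


# Assumption: 12US1 has 459 columns and 299 rows; verify against GRIDDESC.
NCOLS, NROWS = 459, 299

for npcol, nprow in [(16, 4), most_square(64)]:
    cols, rows = tile_size(NCOLS, NROWS, npcol, nprow)
    print(f"NPCOL={npcol:2d} NPROW={nprow:2d} -> tiles of up to {cols} x {rows} cells")
```

For the assumed grid, 16 x 4 gives tiles of up to 29 x 75 cells, while 8 x 8 gives tiles of up to 58 x 38 cells.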

Please add the following commands to the top of your run script and report back the results.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --exclusive

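echo 'information about processor including whether using hyperthreading'
# lscpu reports sockets, cores per socket, and threads per core
lscpu
echo 'information about cluster'
# sinfo lists the Slurm partitions and node states
sinfo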
echo 'information about filesystem'
df -h
echo 'list the mounted volumes'
showmount -e localhost

   @ NPCOL  =  8; @ NPROW =  8

Assuming you have a Slurm scheduler on your cluster and 64 cores per compute node, add the above commands to your run script and submit your job using the command:

sbatch run_cctm_202302_12US1.csh