*** ERROR in INIT3/INITLOG3 ***
Error opening log file on unit 99
I/O STATUS = 10
DESCRIPTION: cannot overwrite existing file, unit 99, file /glade/u/home/hmao/CMAQ-5.4/CCTM/scripts/CTM_LOG_032.v54_intel_108NHEMI_20190101
You need to remove the existing log files before doing a run.
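For example, something like the following would clear them; the directory is taken from the error message above, so double-check the path before deleting:
cd /glade/u/home/hmao/CMAQ-5.4/CCTM/scripts
rm CTM_LOG_*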
Thanks for your reply. I did remove the existing log files before the new run, but this error message still appeared. Do you think this error message is what caused the run to stop? Even with the message present, the simulation still ran for 45 minutes.
I don’t see commands like the following in your run script:
### Job Name
#PBS -N mpi_job
If you are submitting the job interactively, then you are running on the login nodes, and it will fail.
If you are submitting the job to the queue, then I would check your HPC system's help desk documentation for tips on why jobs fail, including whether log files are filling up your home directory.
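For instance, to see how many log files have accumulated and how much space that directory is using (path taken from the error message above):
ls /glade/u/home/hmao/CMAQ-5.4/CCTM/scripts/CTM_LOG_* | wc -l
du -sh /glade/u/home/hmao/CMAQ-5.4/CCTM/scripts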
I would add the PBS commands to the top of your run script, rather than calling the run script from your cctm_run.pbs.csh script.
You are calling mpirun -np 36 twice: once in cctm_run.pbs.csh, and a second time in run_cctm_2019_HEMI.csh.
Do a grep on mpirun in both scripts, as shown below, and you will see the issue.
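For instance, from the directory containing both scripts:
grep -n mpirun cctm_run.pbs.csh run_cctm_2019_HEMI.csh
With two file arguments, grep prefixes each match with its file name and line number, so the duplicate call is easy to spot.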
The comment below says to select 2 nodes with 36 CPUs each for a total of 72 MPI processes, but that is not what the directive actually requests: select=1:ncpus=36:mpiprocs=36 asks for 1 node and 36 MPI processes.
### Select 2 nodes with 36 CPUs each for a total of 72 MPI processes
#PBS -l select=1:ncpus=36:mpiprocs=36
### Send email on abort, begin and end
###PBS -m abe
### Specify mail recipient
###PBS -M email_address
### Run the executable
mpirun -n 36 ./run_cctm_2019_HEMI.csh
and your run script is set to use 36 processors and contains the following commands:
@ NPCOL = 6; @ NPROW = 6
@ NPROCS = $NPCOL * $NPROW
setenv NPCOL_NPROW "$NPCOL $NPROW";
endif
#> Executable call for multi PE, configure for your system
# set MPI = /usr/local/intel/impi/3.2.2.006/bin64
# set MPIRUN = $MPI/mpirun
( /usr/bin/time -p mpirun -np $NPROCS $BLD/$EXEC ) |& tee buff_${EXECUTION_ID}.txt
I would change your workflow to add the PBS commands at the top of run_cctm_2019_HEMI.csh and then submit it directly with
qsub run_cctm_2019_HEMI.csh
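For example, the top of run_cctm_2019_HEMI.csh might then start with something like this; the job name and walltime are placeholder values to adjust for your system:
#!/bin/csh -f
### Job Name
#PBS -N cctm_2019_hemi
### Select 1 node with 36 CPUs for a total of 36 MPI processes
#PBS -l select=1:ncpus=36:mpiprocs=36
### Wall-clock limit (placeholder; adjust for your system)
#PBS -l walltime=12:00:00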
Alternatively, you could edit your cctm_run.pbs.txt script as follows:
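A sketch of the likely edit, assuming the goal is to launch the run script once and let it invoke mpirun itself (see the duplicate mpirun call noted above):
### Run the driver script directly; it calls mpirun -np $NPROCS internally
./run_cctm_2019_HEMI.csh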
I followed your notes and the error in the cctm.o file disappeared, but the run still stopped after 45 minutes. I guess I will need to ask the HPC help desk for help.
If you can run top or htop on one of the compute nodes, you can check whether you are close to using all of the memory. It may be that using 72 processors would leave more memory headroom per compute node, since each MPI process then works on a smaller subdomain.
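For example (the job ID and node name are placeholders, and your site may only allow ssh to nodes where you have a running job):
qstat -f <jobid> | grep exec_host   # find which node(s) the job is running on
ssh <node_name>
top                                 # watch the memory line while CCTM runs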
Please ask the help desk how to set your PBS batch commands to do this. Note that mpiprocs is specified per chunk, so 2 chunks of 36 give 72 MPI processes in total. It may be as follows:
#PBS -l select=2:ncpus=36:mpiprocs=36
Then change the domain decomposition in your run script to use 72 processors.
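One sketch of a valid decomposition (8 columns by 9 rows; any factorization of 72 works, as long as NPCOL times NPROW equals NPROCS):
@ NPCOL = 8; @ NPROW = 9
@ NPROCS = $NPCOL * $NPROW
setenv NPCOL_NPROW "$NPCOL $NPROW"
Because the executable call passes $NPROCS to mpirun, the new process count is picked up automatically.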