Must the "PROGRAM COMPLETED SUCCESSFULLY" message exist in CTM_LOG_000?

Hi,

The CMAQ version we use is CMAQv5.3.3, released in August 2021. We check the CCTM run with the following script statements.

    # Look for the completion message in the last four lines of CTM_LOG_000.
    set cmaq_check = `tail -n 4 $cwd/CTM_LOG_000.${CTM_APPL} | cut -c 32-43 | grep SUCCESSFULLY`
    echo "$cmaq_check"
    # Quote the variable so that an empty result does not break the csh expression.
    if ( "$cmaq_check" != "SUCCESSFULLY" ) then
      echo "run.cctm unsuccessful, d0${DOMAINS_GRID} $CUR_GDATE"
      exit 1
    endif
    echo "Successfully finished CMAQ d0${DOMAINS_GRID} $CUR_JDATE"

These statements had been working fine until now. I believe the CCTM run finished successfully as usual, but the “PROGRAM COMPLETED SUCCESSFULLY” message did not appear in CTM_LOG_000, so the whole process stopped. However, the message did appear in the other CTM_LOG files.

I would like to know whether a CCTM run counts as successful only when the “PROGRAM COMPLETED SUCCESSFULLY” message exists in CTM_LOG_000. Thanks!

What UNIX/Linux actually supports (at the system level) is the exit-status number of the executable: 0 for success, 1 for I/O errors, 2 for algorithm errors, negative values for system-killed, and other positive values for programmer-customized errors. The I/O API routine M3EXIT was designed to set these exit-status numbers.

What all these scripts actually should be checking is that status; the code for that is something like the following:

    mpirun ${model}
    set runstat = ${status}
    if ( ${runstat} != 0 )  then
        echo "ERROR ${runstat} on program ${model} for ..."
        exit ( ${runstat} )
    endif

And of course as soon as one program-execution fails, the entire script-system should likewise fail, rather than blindly barging on afterwards, generating garbage and a large set of logs which must all be examined in order to find the original failure and the reason for it.
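The fail-fast idea can be sketched for a multi-step driver as follows. This is only an illustration, written in POSIX sh rather than csh (where the same logic would use `${status}`, as above). `run_step` is a hypothetical helper name, and the `true` commands are stand-ins for the real model and post-processing executables.

```shell
#!/bin/sh
# run_step is a hypothetical helper, not part of CMAQ or the I/O API.
# It runs one command, reports a nonzero exit status, and returns
# that status unchanged to the caller.
run_step() {
    label=$1
    shift
    "$@"
    runstat=$?
    if [ "$runstat" -ne 0 ]; then
        echo "ERROR ${runstat} in step ${label}" >&2
    fi
    return "$runstat"
}

# Stand-in commands: a real driver would launch the actual executables here.
# Each step stops the whole script with the failing program's status.
run_step "preprocessing" true || exit $?
run_step "cctm"          true || exit $?
echo "all steps finished with status 0"
```

Because the script exits at the first nonzero status, the first error message in its output identifies the step that failed, and nothing runs afterwards on incomplete inputs.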

Of course, two generations of script-programmers ignorant of UNIX/Linux system programming have insisted on trying the sort of search-for-messages approach that you describe. This approach usually works, but can sometimes fail.

FWIW –
Carlie J. Coats, Jr., Ph.D.
I/O API Author/Maintainer
Original CMAQ systems architect.

Hi, Carlie. Thank you very much for your kind reply. It is really helpful and I learned a lot.

Hello Yeqi,

I agree with @cjcoats that basing script error checking on exit status is the preferred approach.

That said, you wrote that the “PROGRAM COMPLETED SUCCESSFULLY” message was missing from CTM_LOG_000.

What does the end of your CTM_LOG_000 file show? Could you please post a copy of that file? In my experience, the “PROGRAM COMPLETED SUCCESSFULLY” message appears in CTM_LOG_000 if a run was successful, just as it appears in all the other CTM_LOG files in that case.
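If a message search is kept as a secondary check, one option is to search each log for the whole phrase instead of relying on fixed column positions in the last few lines of a single file. The sketch below is in POSIX sh and makes several assumptions: `logs_missing_message` is a hypothetical helper name, the two demo files are fake stand-ins rather than real CCTM logs, and the phrase is taken verbatim from this thread.

```shell
#!/bin/sh
# logs_missing_message is a hypothetical helper: it prints the name of every
# CTM_LOG_* file in the given directory that does NOT contain the message.
logs_missing_message() {
    logdir=$1
    for f in "$logdir"/CTM_LOG_*; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when no logs exist
        grep -q "PROGRAM COMPLETED SUCCESSFULLY" "$f" || echo "$f"
    done
}

# Demo with fake stand-in logs (not real CCTM output).
demo=$(mktemp -d)
echo "PROGRAM COMPLETED SUCCESSFULLY" > "$demo/CTM_LOG_001.demo"
echo "log ends without the message"   > "$demo/CTM_LOG_000.demo"
logs_missing_message "$demo"          # lists only the CTM_LOG_000 stand-in
rm -rf "$demo"
```

A listing like this shows at a glance whether the message is missing from one log or from all of them, which is the distinction that matters in this thread.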

Hi. Thanks for your attention. I didn’t save that “wrong” log file, but I compared it with the following “successful” one.

In the “wrong” CTM_LOG_000, the messages on lines 21715 to 21720 were missing. In the other log files (e.g. CTM_LOG_100), lines 21715 to 21720 were present.

Thanks for sharing. That’s interesting, and I don’t have a good explanation; I have not seen this happen before. That said, I also haven’t run any CCTM simulation that includes DDM and ends at 12:00, so maybe something in that setup triggers this behavior, though I don’t know what it might be. But if your CGRID and S_CGRID files were created successfully and all other output files have the correct number of time steps and reasonable fields for the last hour, it’s probably not worth your time to investigate this further.

One reason a run can stop without any error messages is that the disk where the output is being written is full, especially if the run script and the log files are on that same disk.
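A quick way to rule this out is to check free space on the output filesystem before or after a run. The POSIX sh sketch below is illustrative only: `free_kb` is a hypothetical helper name, the use of an `OUTDIR` variable for the output directory is an assumption about the run script, and the 1 GiB threshold is arbitrary.

```shell
#!/bin/sh
# free_kb is a hypothetical helper: it prints the available space, in
# 1024-byte blocks, on the filesystem that holds the given directory.
# "df -P" forces the POSIX one-line-per-filesystem format, so the fourth
# field of the second line is the "Available" column.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

outdir=${OUTDIR:-.}    # assumption: output directory, defaulting to the current one
min_kb=1048576         # arbitrary example threshold (1 GiB)
avail=$(free_kb "$outdir")
if [ "${avail:-0}" -lt "$min_kb" ]; then
    echo "WARNING: only ${avail:-0} KB free on the filesystem holding ${outdir}" >&2
fi
```

Running the same check on the directory that holds the logs covers the case where logs and output share a disk.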

Actually, we have been running the CMAQ model (previously version 5.0.2, currently 5.3.3) for several years as an air quality forecast, and this is the first time we have encountered this problem. I think it must be a low-probability event. Thank you for your time.
