CMAQ CCTM run fails with "invalid memory reference" (SIGSEGV)

Hi,

I am trying to run CMAQ on a CPU node.

According to the job script log, the MCIP, ICON, and BCON steps all ran and reported normal completion.

Immediately after run_cctm.csh is executed, however, the following error occurs:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2B6CA1A806F7
#1 0x2B6CA1A80D3E
#2 0x2B6CA25132EF
#3 0x2B6CA256C767
#4 0x6614B9 in emis_defn_MOD_get_emis
#5 0x6F8277 in vdiff

#6 0x67FE0D in sciproc

#7 0x673FDD in cmaq_driver_
#8 0x672E35 in MAIN__ at cmaq_main.F:?
#0 0x2ADD3CF096F7
#1 0x2ADD3CF09D3E
#2 0x2ADD3D99C2EF
#3 0x2ADD3D9F5767
#4 0x6614B9 in emis_defn_MOD_get_emis
#5 0x6F8277 in vdiff

#6 0x67FE0D in sciproc

#7 0x673FDD in cmaq_driver_
#8 0x672E35 in MAIN__ at cmaq_main.F:?
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2B19268186F7
#1 0x2B1926818D3E
#2 0x2B19272AB2EF
#0 0x2BA7CBFAC6F7
#1 0x2BA7CBFACD3E
#2 0x2BA7CCA3F2EF
#3 0x2B1927304767
#3 0x2BA7CCA98767
#4 0x6614B9 in emis_defn_MOD_get_emis
#4 0x6614B9 in emis_defn_MOD_get_emis
#5 0x6F8277 in vdiff

#5 0x6F8277 in vdiff

#6 0x67FE0D in sciproc

#6 0x67FE0D in sciproc

#0 0x2AF4100BC6F7
#1 0x2AF4100BCD3E
#2 0x2AF410B4F2EF
#7 0x673FDD in cmaq_driver_
#7 0x673FDD in cmaq_driver_
#0 0x2B91971186F7
#3 0x2AF410BA8767
#1 0x2B9197118D3E
#2 0x2B9197BAB2EF
#3 0x2B9197C04767
#8 0x672E35 in MAIN__ at cmaq_main.F:?
#8 0x672E35 in MAIN__ at cmaq_main.F:?
#4 0x6614B9 in emis_defn_MOD_get_emis
#4 0x6614B9 in emis_defn_MOD_get_emis
#5 0x6F8277 in vdiff

#5 0x6F8277 in vdiff

#6 0x67FE0D in sciproc

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#6 0x67FE0D in sciproc

#7 0x673FDD in cmaq_driver_
#0 0x2B3436B216F7
#1 0x2B3436B21D3E
#7 0x673FDD in cmaq_driver_
#8 0x672E35 in MAIN__ at cmaq_main.F:?
#2 0x2B34375B42EF
#3 0x2B343760D767
#8 0x672E35 in MAIN__ at cmaq_main.F:?
#4 0x6614B9 in emis_defn_MOD_get_emis
#5 0x6F8277 in vdiff

#6 0x67FE0D in sciproc

#7 0x673FDD in cmaq_driver_
#8 0x672E35 in MAIN__ at cmaq_main.F:?
real 3.68
user 18.41
sys 12.48
[1] Exit 11 /usr/bin/time -p /library/mpi/mvapich2/2.2_gcc485/bin/mpirun -np 16 -machinefile /home/tech/tech6/dilab/Simple_run/EXE/machines /home/tech/tech6/dilab/Simple_run/EXE/00/CCTM/scripts/BLD_CCTM_v531_gcc/CCTM_v531.exe
break: Not in while/foreach.

 ================================
 |>---   TIME INTEGRATION   ---<|
 ================================
 Processing Day/Time [YYYYDDD:HHMMSS]: 2022341:130000
   Which is Equivalent to (UTC): 13:00:00 Wednesday,  Dec. 7, 2022
   Time-Step Length (HHMMSS): 001200

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 12217 RUNNING AT csl18
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions


** Runscript Detected an Error: CGRID file was not written. **
** This indicates that CMAQ was interrupted or an issue **
** exists with writing output. The runscript will now **
** abort rather than proceeding to subsequent days. **


In case this was a memory problem, I also added “limit stacksize unlimited” to the .cshrc file in my home directory.

Could you help me determine whether this is a computational resource problem or something else?


Most probably an unchecked ALLOCATE statement is failing. (Production-quality code should always check ALLOCATE and I/O statements for failure and, when one fails, report enough detail to diagnose the problem and then terminate gracefully.)

You may need "limit memoryuse unlimited" in addition to "limit stacksize unlimited".
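Limits set in a startup file only help if the shell that launches each MPI rank actually picks them up, so it can be worth printing the limits from inside the job. Below is a minimal sketch in POSIX sh rather than csh (in csh the same values are shown by "limit stacksize" and "limit memoryuse"). The function name report_limits is made up for illustration and is not part of CMAQ.

```sh
#!/bin/sh
# Print the soft limits that this shell, and any process it starts, runs under.
# report_limits is a hypothetical helper name, not part of CMAQ.
report_limits() {
    echo "host=$(uname -n)"
    # Stack size; corresponds to "limit stacksize" in csh.
    echo "stack_kb=$(ulimit -s)"
    # Resident set size; corresponds to "limit memoryuse" in csh.
    echo "memoryuse_kb=$(ulimit -m)"
}

report_limits
```

Saved as a script and launched through mpirun with the same -np and -machinefile options as the CCTM run, this should print one block per rank, which shows whether every node really sees "unlimited".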

A debug/traceback build of the executable should give you more detail, namely the exact line in emis_defn_MOD_get_emis where this is happening.
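Because all ranks write to the same log, the backtraces in the question are interleaved and hard to read. The sketch below collapses such a log to its distinct symbolized frames, with a count of how often each was reported; it should work the same on output from a debug build. The function name summarize_frames is hypothetical, and the log file name in the usage comment is only a placeholder.

```sh
#!/bin/sh
# Collapse an interleaved multi-rank backtrace into its distinct symbolized
# frames, prefixed by how many times each frame appears in the log.
# summarize_frames is a hypothetical helper name.
# Usage (placeholder file name): summarize_frames cctm_run.log
summarize_frames() {
    # Keep only lines such as "#4 0x6614B9 in emis_defn_MOD_get_emis";
    # raw-address frames without a symbol name are dropped.
    grep -E '^#[0-9]+ +0x[0-9A-Fa-f]+ in ' "$1" |
        sort |
        uniq -c |
        # Order by the frame number that follows the '#'.
        sort -t'#' -k2,2n
}
```

If every rank reports the same chain of frames, as appears to be the case here, the failure is probably not specific to one rank or one node.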