Segmentation Fault running Benchmark in CMAQv5.3

Hello,

I am trying to run the benchmark for CMAQv5.3, but I am running into a segmentation fault (signal 11) at 23:55:00 in the simulation. I tried making the stack size unlimited using ulimit -s unlimited, but that didn't have any effect. Any help is appreciated. The log file for the run is attached below.

Zach

cctm.log.txt (236.8 KB)

Hi Zach,

That is odd.
Can you check your $OUTDIR/LOGS directory? It is likely under:

$CMAQ_DATA/output_CCTM_v53_gcc_Bench_2016_12SE1/LOGS/

Are there any error messages at the end of the CTM_LOG* files?
You can use the following command to check:

grep -i error CTM_LOG*
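A segmentation fault often kills the processes before they can write an error message, so the grep above may come back empty. A complementary check is to see where each processor's log stops. The following is a minimal sketch for an interactive sh/bash session; last_log_lines is a hypothetical helper (not part of CMAQ), and the CTM_LOG_* pattern simply matches the log names used in this thread.

```shell
# Hypothetical helper (not part of CMAQ): print the last non-blank line of
# every per-processor log in a directory, so you can compare where each
# MPI rank stopped when a crash leaves no "ERROR" message behind.
last_log_lines() {
    dir=${1:-.}                      # default: current directory
    for f in "$dir"/CTM_LOG_*; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        # drop blank lines, then keep only the last remaining line
        last=$(grep -v '^[[:space:]]*$' "$f" | tail -n 1)
        printf '%s: %s\n' "$(basename "$f")" "$last"
    done
}
```

For example: last_log_lines $CMAQ_DATA/output_CCTM_v53_gcc_Bench_2016_12SE1/LOGS. If one rank stops earlier than the others, its log is the one to look at first.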

Also, I recommend that you try to add the following command directly to your run script:

limit stacksize unlimited
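Note that limit stacksize unlimited is csh/tcsh syntax (the CMAQ run scripts are csh), while ulimit -s unlimited is the sh/bash form and only affects the shell it is typed in and the programs started from it. With MPI, processes started on other nodes may not inherit the limit from your login shell, so it is worth checking from inside the job as well. A minimal sketch for confirming what the current sh/bash shell actually has; stack_is_unlimited is a hypothetical helper name. Inside the csh run script, limit stacksize with no value prints the current setting instead.

```shell
# Succeeds (exit status 0) only if the soft stack-size limit of this shell
# is "unlimited". "ulimit -s" is the sh/bash form; the csh equivalent is
# "limit stacksize".
stack_is_unlimited() {
    [ "$(ulimit -s)" = "unlimited" ]
}

if stack_is_unlimited; then
    echo "stack size: unlimited"
else
    # bash and dash report this value in kilobytes
    echo "stack size: $(ulimit -s) KB (consider raising it)"
fi
```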

Thanks, Liz

Thanks for helping me out!

I checked the log files, and no error messages are popping up at all, which is odd.

I also added limit stacksize unlimited to the run script, but that didn't change anything.

Zach

Did you compile with -traceback (for ifort) or -fbacktrace (for gfortran) so that you could report where this seg-fault is happening?

That might be a next step…

I have -fbacktrace set here in config_cmaq.csh:

setenv myDBG "-Wall -O0 -g -fcheck=all -ffpe-trap=invalid,zero,overflow -fbacktrace"

The backtrace should have been included in the cctm.log file I attached in the original post.

Would you mind trying the optimized build to see if it runs to completion?
Comment out the Debug_CCTM option in the bldit_cctm.csh script as follows:

# set Debug_CCTM

It has been commented out for all the runs I have done.

In that case, you are building an optimized version of the code. To build with the backtrace option, you need to uncomment that line:

set Debug_CCTM

I uncommented the set Debug_CCTM line, recompiled, and then ran the benchmark; however, it still came up with the same error.

The log file is attached below.

cctm.log.txt (236.8 KB)

You are now getting the following additional information in the log file:
#0 0x2B8BA546ECB0
#1 0x2B8BA546DEB0
#2 0x2B8BA63504AF
#3 0x45A0B2 in pshut3_
#4 0x6A05A1 in cmaq_driver_
#5 0x695C3C in MAIN__ at cmaq_main.F:?

I am not sure, but I did notice that the version of I/O API that you are using is older than the one I have used to successfully run the benchmark case.

   ioapi-3.2: $Id: init3.F90 120 2019-06-21 14:18:20Z coats $
     Version with PARMS3.EXT/PARAMETER::MXVARS3= 2048
     netCDF version 4.7.0 of Jun 28 2019 14:56:39 $

Your version was about a year older.

   ioapi-3.2: $Id: init3.F90 98 2018-04-05 14:35:07Z coats $
     MPI/PnetCDF parallel-I/O enabled
     PnetCDF Library Version "1.7.0 of 03 Mar 2016"
     netCDF version 4.6.2 of Dec 23 2018 20:16:40 $

Please consider following these instructions to download and install the latest version of the netCDF and I/O API libraries.
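After a rebuild, a quick way to confirm which netCDF library the executable actually picked up is to read the version line the I/O API prints in the log header, like the two excerpts above. A minimal sketch; netcdf_version_in_log is a hypothetical helper and assumes the banner format shown in this thread.

```shell
# Hypothetical helper: print the netCDF version reported in the I/O API
# banner of a CCTM log (the "netCDF version X.Y.Z of ..." line).
netcdf_version_in_log() {
    # -o keeps only the matched text; head keeps the first banner if the
    # log contains several; awk picks the version number (3rd field)
    grep -o 'netCDF version [0-9][0-9.]*' "$1" | head -n 1 | awk '{print $3}'
}
```

For example, netcdf_version_in_log cctm.log.txt prints 4.6.2 for a log that contains the older banner above.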


Liz,

I have successfully downloaded and installed the latest versions of the netCDF (4.7.1) and I/O API (3.2) libraries. I set the locations in config_cmaq.csh as follows:

#> gfortran compiler…
case gcc:

    #> I/O API, netCDF, and MPI library locations
    setenv IOAPI_INCL_DIR   /home/colethatcher/Documents/ioapi-3.2/Linux2_x86_64gfort_openmpi_4.0.1_gcc_9.1.0 #> I/O API include header files
    setenv IOAPI_LIB_DIR    /home/colethatcher/Documents/ioapi-3.2/Linux2_x86_64gfort_openmpi_4.0.1_gcc_9.1.0  #> I/O API libraries
    setenv NETCDF_LIB_DIR   /home/colethatcher/Documents/netcdf-c-4.7.1-gcc9.1.0/lib  #> netCDF C directory path
    setenv NETCDF_INCL_DIR  /home/colethatcher/Documents/netcdf-c-4.7.1-gcc9.1.0/include #> netCDF C directory path
    setenv NETCDFF_LIB_DIR  /home/colethatcher/Documents/netcdf-fortran-4.5.1-gcc9.1.0/lib #> netCDF Fortran directory path
    setenv NETCDFF_INCL_DIR /home/colethatcher/Documents/netcdf-fortran-4.5.1-gcc9.1.0/include #> netCDF Fortran directory path
    setenv MPI_LIB_DIR      /home/linuxbrew/.linuxbrew/lib     #> MPI directory path

After recompiling with bldit_cctm.csh gcc |& tee bldit_cctm.log, I noticed that the log still reports the old netCDF version (4.6.2) when I run the benchmark test. Is there something I am doing wrong here?

cctm.log.txt (236.8 KB)

Go to $CMAQ_HOME/lib/… and manually delete the symbolic links to the old libraries before you re-run the build script to update them.
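Before deleting anything, it can help to list where each link under the CMAQ lib directory currently points, so links to the old netCDF or I/O API builds stand out. A minimal sketch assuming GNU/Linux find and readlink; show_lib_links is a hypothetical helper, and the directory you pass it is whatever $CMAQ_HOME/lib/… is on your system.

```shell
# Hypothetical helper: recursively list every symbolic link under a
# directory together with the path it points to.
show_lib_links() {
    # find does not follow links by default, so dangling links are listed too
    find "$1" -type l | sort | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

After removing the stale links and re-running the build, the same listing should point at the new library directories.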

I did that, but there is still no change in the library version.

Have you edited config_cmaq.csh? That is where the links are created.

I was able to successfully update my libraries, but it is still giving a segmentation fault at 23:55 into the benchmark.

cctm.log.txt (235.4 KB)

Hi,

Can you try running with more processors?
Are you currently running on 16? If so, increase to 32 processors.
@ NPCOL = 4; @ NPROW = 8
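In the CMAQv5.3 run scripts the total process count is derived from the decomposition (NPROCS is computed as NPCOL times NPROW), so the layout above gives 4 x 8 = 32 processes. If you edit one of these values by hand, or launch mpirun with your own -np, it is easy for them to drift apart. A small arithmetic check as a sketch; check_decomp is a hypothetical helper, not part of CMAQ.

```shell
# Hypothetical check (not part of CMAQ): the domain decomposition must
# account for exactly the number of MPI processes requested,
# i.e. NPCOL * NPROW = NPROCS.
check_decomp() {
    npcol=$1; nprow=$2; nprocs=$3
    if [ $((npcol * nprow)) -eq "$nprocs" ]; then
        echo "ok: $npcol x $nprow = $nprocs"
    else
        echo "mismatch: $npcol x $nprow = $((npcol * nprow)), not $nprocs" >&2
        return 1
    fi
}

check_decomp 4 8 32    # the 32-processor layout suggested above
```

Running the block prints ok: 4 x 8 = 32.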

I've tried 8, 16, and 32 processors, and it still gives the same error.

cctm.log.txt (230.5 KB)

Are you compiling CMAQ in debug mode?
If so, try compiling and running in optimized mode.
#set Debug_CCTM #> uncomment to compile CCTM with debug option equal to TRUE

Here is a reference on causes of segmentation faults that may be helpful.
https://kb.iu.edu/d/aqsj

The last line in your output file was:
Data Output completed
This message comes from RUNTIME_VARS.F.

The backtrace has the following information:
#0 0x2B2763916CB0
#1 0x2B2763915EB0
#2 0x2B27647F84AF
#3 0x458962 in pshut3_
#4 0x69EE51 in cmaq_driver_
#5 0x6944EC in MAIN__ at cmaq_main.F:?

You could try changing the following setting to 1:
#> Toggle Diagnostic Mode which will print verbose information to
#> standard output
setenv CTM_DIAG_LVL 0

If you are using the default settings for the following options, I would try turning them off.

setenv PRINT_PROC_TIME Y           #> Print timing for all science subprocesses to Logfile
                                   #>   [ default: TRUE or Y ]
setenv STDOUT T                    #> Override I/O-API trying to write information to both the processor 
                                   #>   logs and STDOUT [ options: T | F ]

I can’t seem to reproduce your error on my end.

Can you also share the contents of your CTM_LOG file for the 000 processor:
CTM_LOG_000.v53_gcc_Bench_2016_12SE1_20160701

This is typically moved to the following directory at the end of the run, or it may still be in your run directory.
data/output_CCTM_v53_gcc_Bench_2016_12SE1/LOGS

Also, do you know if you are running with parallel I/O?
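The I/O API banner at the top of each log offers a first hint: the older log excerpt above contains the line MPI/PnetCDF parallel-I/O enabled, while the newer excerpt has no such line. This only tells you how the I/O API library was built; whether CCTM itself was compiled to use parallel I/O is a separate build-time choice, so treat the result as a hint rather than a definitive answer. A minimal sketch; ioapi_has_pio is a hypothetical helper.

```shell
# Hypothetical helper: report whether a log's I/O API banner says the
# library was built with PnetCDF parallel I/O.
ioapi_has_pio() {
    if grep -q 'PnetCDF parallel-I/O enabled' "$1"; then
        echo "I/O API built with PnetCDF parallel I/O"
    else
        echo "no parallel-I/O banner found"
        return 1
    fi
}
```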

Here are the logs:

CTM_LOG_000.v53_gcc_Bench_2016_12SE1_20160702.txt (622.2 KB)
CTM_LOG_000.v53_gcc_Bench_2016_12SE1_20160701.txt (620.0 KB)

I don't know whether I am running parallel I/O.

Thanks for helping out again.

Zach