CMAQv5.2.1 segmentation fault when running CCTM

Hi CMAQ users:
My CCTM simulation crashes with a segmentation fault. I have attached the run log (https://purdue0-my.sharepoint.com/:t:/g/personal/fang63_purdue_edu/ETi9jPTnXOxBlW-VoIVpWDsB_nUbdVCebWIaLxmPzaXeqA?e=w8ODgF) and the run script (run_cctm_2016_tran_us_inv2.csh.txt (23.8 KB)). Could anyone look into this when you have a chance? I am using version 5.2.1.
Best
Huan

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
CCTM_v521.exe 0000000000880643 Unknown Unknown Unknown
libpthread-2.17.s 00002AC41E09D5F0 Unknown Unknown Unknown
libiomp5.so 00002AC41BBE34DF Unknown Unknown Unknown
libiomp5.so 00002AC41BBE4087 Unknown Unknown Unknown
CCTM_v521.exe 00000000008BCAC0 Unknown Unknown Unknown
CCTM_v521.exe 00000000005AB5DE Unknown Unknown Unknown
CCTM_v521.exe 000000000059225B Unknown Unknown Unknown
CCTM_v521.exe 000000000058C219 Unknown Unknown Unknown
CCTM_v521.exe 000000000040EF62 Unknown Unknown Unknown
libc-2.17.so 00002AC41E5CE505 __libc_start_main Unknown Unknown
CCTM_v521.exe 000000000040EE69 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
CCTM_v521.exe 00000000008806E9 Unknown Unknown Unknown
libpthread-2.17.s 00002AC41E09D5F0 Unknown Unknown Unknown
libiomp5.so 00002AC41BBE1B31 Unknown Unknown Unknown
libiomp5.so 00002AC41BBE544A Unknown Unknown Unknown
libiomp5.so 00002AC41BB5E64A Unknown Unknown Unknown
libiomp5.so 00002AC41BB65632 Unknown Unknown Unknown
ld-2.17.so 00002AC41B89D058 Unknown Unknown Unknown
libc-2.17.so 00002AC41E5E5C99 Unknown Unknown Unknown
libc-2.17.so 00002AC41E5E5CE7 Unknown Unknown Unknown
CCTM_v521.exe 0000000000878CA1 Unknown Unknown Unknown
CCTM_v521.exe 0000000000880643 Unknown Unknown Unknown
libpthread-2.17.s 00002AC41E09D5F0 Unknown Unknown Unknown
libiomp5.so 00002AC41BBE34DF Unknown Unknown Unknown
libiomp5.so 00002AC41BBE4087 Unknown Unknown Unknown
CCTM_v521.exe 00000000008BCAC0 Unknown Unknown Unknown
CCTM_v521.exe 00000000005AB5DE Unknown Unknown Unknown
CCTM_v521.exe 000000000059225B Unknown Unknown Unknown
CCTM_v521.exe 000000000058C219 Unknown Unknown Unknown
CCTM_v521.exe 000000000040EF62 Unknown Unknown Unknown
libc-2.17.so 00002AC41E5CE505 __libc_start_main Unknown Unknown
CCTM_v521.exe 000000000040EE69 Unknown Unknown Unknown

Hi Huan,
Thank you for providing a link to your log file.
Before the segmentation fault, you are getting the following warning.
CHECKING COMPATABILITY BETWEEN INTERNAL SPECIES LIST AND EMISSIONS INPUTS
The following tables list the chemical species present
on each emission file or from each online emission source.
The following GAS emission surrogate species are
not present from any of the Area, Point, Biogenic,
Marine Gas, or Lightning emissions sources
PACD AACD FACD GLYXL TOLU
HGNRVA HGIIGAS

*WARNING:
One or more emissions surrogates assigned to model species
are not found in emissions sources but the CTM_EMISCHK
environment variable set to False so simulation will proceed.


ATTENTION: The following emission species are available
from the inputs but are not used: 12 Species
BENZENE CO2 HOZO PNCOM
VOC YO YO2 NVOL
UNK UNR NH3_FERT VOC_INV

 >>--->> WARNING in subroutine EMIS_SPC_CHECK on PE 000
 For optimal predictions, species with the missing surrogates should have a surrogate found in at least one source.
 M3WARN:  DTBUF 0:00:00   Jan. 1, 2016  (2016001:000000)

You received several warnings about variables not being in the emissions file:
>>--->> WARNING in subroutine GET_EMIS:INTERPX
Variable "BENZ" not in file EMIS_1
M3WARN: DTBUF 0:02:30 Jan. 1, 2016 (2016001:000230)

Then you start to get reports of NaN:
File "FLOOR_000.v521_intel_2016_20160101" opened for output on unit: 95
FLOOR_000.v521_intel_2016_20160101

after INITSCEN G 1.2239218E-01 A 6.4297958E+08 N 5.3750406E-05
after VDIFF G 1.0325761E+31 A NaN N -6.9620859E+30
after COUPLE_WR G Infinity A Infinity N Infinity
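
To locate the first science process that produces bad sums in a long log, lines like the ones above can be scanned automatically. This is only a minimal sketch: the function name is made up for illustration, it assumes the "after <process> G ... A ... N ..." layout shown in this excerpt, and it only looks for NaN or infinite values.

```python
import math
import re

# Matches log lines such as:
#   after VDIFF G 1.0325761E+31 A NaN N -6.9620859E+30
SUM_LINE = re.compile(r"after\s+(\S+)\s+G\s+(\S+)\s+A\s+(\S+)\s+N\s+(\S+)")


def first_bad_process(log_lines):
    """Return (process, [G, A, N]) for the first 'after ...' line whose
    gas, aerosol or nonreactive sum is NaN or infinite, else None.

    first_bad_process is a hypothetical helper, not part of CMAQ.
    """
    for line in log_lines:
        match = SUM_LINE.search(line)
        if match is None:
            continue
        try:
            sums = [float(token) for token in match.groups()[1:]]
        except ValueError:
            # Token is not a number Python can parse; skip this line.
            continue
        if any(not math.isfinite(value) for value in sums):
            return match.group(1), sums
    return None


# Lines copied from the log excerpt in this thread.
sample = [
    "after INITSCEN G 1.2239218E-01 A 6.4297958E+08 N 5.3750406E-05",
    "after VDIFF G 1.0325761E+31 A NaN N -6.9620859E+30",
    "after COUPLE_WR G Infinity A Infinity N Infinity",
]
result = first_bad_process(sample)
print(result[0])
```

For a real run, an open file handle for the CCTM log can be passed in place of the sample list. Note that the gas sum of 1.0325761E+31 is finite, so a check like this would not flag it by itself; it is the NaN in the aerosol sum that triggers the report for VDIFF.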

I used the search feature in the forum to look for similar reports of NaN, and found this:


I am quoting the relevant advice here:
"The values following “after INITSCEN” shows the sum of species concentrations for gas-phase, aerosols, and nonreactives, respectively, after reading in the initial conditions. The first science process is VDIFF, which includes emissions. After this process, your log file contains a “NaN”
This indicates that the aerosols already contain NaN (Not a Number) values.

Check to be sure there are no errors (missing data, NaNs, or infinity values) in your emissions file."

Hi Liz:
I don’t think this is the reason. I have another simulation whose emission input file also lacks PACD, AACD, etc., and it produces a similar warning in the run log, yet it runs successfully.
Huan



Hi Liz:
Attached are the run logs of two CCTM simulations. The only difference between them is the emission input files. The first one succeeds, while the second one (the one I mentioned at the beginning) fails.
Huan

Huan,
Try again with a smaller CTM_MAXSYNC (e.g. change it from 300 to 200) in your run script. I hope this helps.

It does not resolve the segmentation fault…

Hi Huan,

Please try using the m3tools utilities to examine the differences between your emission input files.


I used m3stat on the gridded area emissions file provided with the benchmark case.
setenv INFILE emis_mole_all_20160701_cb6_bench.nc
m3stat     ! press Enter at each prompt to generate the standard report
grep NaN REPORT
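
Since grep NaN only matches the literal string NaN, infinite values in the report would slip through. A small script can catch both. This is a rough sketch: the function name is hypothetical, and it assumes nothing about the m3stat report layout beyond values being separated by whitespace.

```python
import math


def nonfinite_tokens(lines):
    """Return (line_number, token) for every whitespace-separated token
    that parses as a float but is NaN or infinite.

    nonfinite_tokens is a hypothetical helper, not an m3tools program.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                continue  # ordinary text, not a number
            if not math.isfinite(value):
                hits.append((lineno, token))
    return hits


# Made-up lines for illustration only; this is NOT the real m3stat layout.
sample = ["Max 1.147586", "Mean NaN", "Min -Infinity"]
hits = nonfinite_tokens(sample)
print(hits)
```

To check a real report, pass an open file handle for REPORT instead of the sample list. Tokens with attached punctuation (for example a trailing comma) will not parse as numbers and are skipped, so this is a first screen rather than a complete check.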

Or you could use m3diff to compare two versions of the same emissions input file.

I suggest reducing CTM_MAXSYNC further. Otherwise, please try the benchmark data package to make sure your model works. I wonder if you have run CMAQv5.2.1 with the benchmark data package for a multiple-day case in 2017. If you have already done that successfully, check your meteorology data. I say this because your model appears to crash on invalid values from the vertical diffusion step, as shown in your log file at:

after VDIFF G 1.0325761E+31 A NaN N -6.9620859E+30
after COUPLE_WR G Infinity A Infinity N Infinity

It may be a convergence issue that originates in your WRF/MCIP outputs. Closely scrutinize your WRF/MCIP files for anything abnormal around 0:02:30 on Jan. 1, 2016 (check the vertical and horizontal wind gradients and other parameters, for example).

For reference, here is the corresponding part of my log file from a run with the benchmark input package:

after VDIFF G 2.5875720E-01 A 7.0756254E+10 N 1.7123001E-03
after COUPLE_WR G 2.4975933E+03 A 6.8392100E+14 N 1.6526962E+01

I hope this helps.

Hi Liz:

m3tools does not seem to be built properly on my end. It shows:

/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 2: PROGRAM: command not found
/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 4: C***********************************************************************: command not found
/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 5: C: command not found
/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 6: C: command not found
/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 7: syntax error near unexpected token `('
/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools/m3stat.f: line 7: `C Copyright (C) 1992-2002 MCNC,'

I will contact my department's technician to fix it. In the meantime, is the idea of using m3stat to check whether there are NaNs in the emission input files? I wrote a Python script to check this, and it did not find any missing data in the emission input files.

Huan

Hi Huan,

To build the ioapi tools

cd ioapi-3.2
cd m3tools
cp Makefile.nocpl Makefile
make

Note: The Makefile needs to be edited to add the path to the netcdf libraries.
After version 4.1.1 of the netcdf library, the C and Fortran libraries are built separately.
In my case, I built the following fortran library: netcdf-fortran-4.4.5-gcc6.3.0
which contains bin, include, lib and share subdirectories
and the following C library: netcdf-c-4.7.0-gcc6.3.0
which also contains bin, include, lib and share subdirectories

If you copy both into a single netcdf_combined directory:

mkdir -p netcdf_combined
cp -rp netcdf-c-4.7.0-gcc6.3.0/* ./netcdf_combined
cp -rp netcdf-fortran-4.4.5-gcc6.3.0/* ./netcdf_combined

then you can point NCDFDIR in the Makefile at that one directory to reference both the netCDF C and netCDF Fortran libraries:

BASEDIR = ${HOME_DIR}
SRCDIR  = ${BASEDIR}/m3tools
IODIR   = ${BASEDIR}/ioapi
OBJDIR  = ${BASEDIR}/${BIN}
INSTDIR = ${INSTALL}/${BIN}
NCDFDIR = /proj/ie/proj/CMAS/CMAQ/CMAQv5.3.2_rel/LIBRARIES/netcdf_combined

# Architecture dependent stuff
# Assumes FC is an f90

include $(IODIR)/Makeinclude.${BIN}

FFLAGS = -I$(IODIR) ${MODI}$(OBJDIR) $(ARCHFLAGS) $(FOPTFLAGS) $(ARCHFLAGS)

LDFLAGS = -I$(IODIR) $(DEFINEFLAGS) $(ARCHFLAGS)

#  Incompatibility between netCDF versions before / after v4.1.1:
#  For netCDF v4 and later, you may also need the extra libraries
#  given by netCDF commands
#
#          nc-config --libs
#          nf-config --libs
#
#  Cygwin libraries need "-lnetcdff.dll -lnetcdf.dll" below
#
 LIBS = -L${OBJDIR} -lioapi -L${NCDFDIR}/lib -lnetcdff -lnetcdf $(OMPLIBS) $(ARCHLIB) $(ARCHLIBS)

Hi Liz:
The make command does not go through. I have attached the error message and the Makefile here.
Huan

fang63@brown-fe00:/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools $ setenv BIN Linux2_x86_64ifort
fang63@brown-fe00:/scratch/brown/fang63/cmaq/ioapi-3.2/m3tools $ make
make: Nothing to be done for `all'.
Makefile.txt (13.2 KB)

Hi Huan,

Did you check in your /scratch/brown/fang63/cmaq/ioapi-3.2/bin directory to see if the programs have already been compiled?

REPORT.txt (334.6 KB)
Hi Liz:
Attached is the m3stat report for the emission file with which CCTM fails.
Huan

Hi Liz:
I used m3stat to check the emission input file that causes the segmentation fault when running CCTM. The report file is attached to my previous reply. It looks normal to me. Could you look into it if you get a chance?
Also, the grep NaN command you suggested does not show any missing data.
Huan

fang63@brown-fe01:/scratch/brown/fang63/cmaq/cmaq-5.2.1/data/emis/gridded_area/inv2 $ grep NaN REPORT

Hi Feng:
If you check my other replies under this topic, you may get some idea of what is going on. Basically, I had a successful simulation previously; then I made only one change, the emission input file, and CCTM crashed. After exploring this in detail, I think the reason might be that the concentrations of some species dropped to very low, near-zero values at some point during the simulation, so some calculations broke down.
Is there a setting in the CCTM run script that lets the calculation carry more decimal places (e.g. an absolute tolerance)?
Huan

@FengLiu @lizadams
As mentioned above, the only difference between this run and the previous successful run is the emission input file. I compared the emission rates, and they are lower than in the previous setup. The CCTM log shows “Convergence failure for the following species”. Does anyone have an idea why this occurs and how to deal with it?

Huan

ERROR: Max number of EBI time step reductions exceeded
Convergence failure for cell ( 1, 1, 1)
Convergence failure for the following species:Init.Conc, Pred.Conc.
NO2 1.6700E-04 2.5189E-05 ppmV
O3 3.5000E-02 1.4492E-07 ppmV
NO3 1.0000E-24 1.2214E-25 ppmV
OH 1.0000E-24 2.0672E-08 ppmV
HO2 1.0000E-24 5.5793E+00 ppmV
H2O2 1.0000E-03 4.4458E+02 ppmV
N2O5 1.0000E-24 2.0072E-29 ppmV
HNO3 5.0016E-05 5.2012E-04 ppmV
HONO 1.3076E-10 4.7010E-04 ppmV
PNA 2.0000E-09 2.0795E-08 ppmV
MEO2 1.0000E-24 2.2281E-17 ppmV
RO2 1.0000E-24 4.3558E+03 ppmV
AACD 1.0001E-06 4.4458E+02 ppmV
CXO3 1.0000E-24 1.1155E+03 ppmV
ALD2 3.0001E-05 4.0679E+03 ppmV
XO2H 1.0000E-24 1.8674E+03 ppmV
FORM 8.0060E+03 9.4390E+03 ppmV
XO2 1.0000E-24 1.9736E+03 ppmV
XO2N 1.0000E-24 5.1485E+02 ppmV
NTR1 1.0000E-24 2.5255E-16 ppmV
NTR2 1.0000E-24 5.0706E-15 ppmV
FACD 1.0001E-06 1.3035E+01 ppmV
CO 8.0000E-02 1.3824E+03 ppmV
ALDX 1.3244E-10 3.0569E+03 ppmV
GLYD 1.0000E-24 2.5570E-15 ppmV
GLY 1.0000E-24 1.9400E-01 ppmV
MGLY 2.5001E-07 5.9756E-01 ppmV
PAR 1.0994E+14 1.0994E+14 ppmV
XPAR 1.0000E-24 1.1055E-15 ppmV
ROR 1.0000E-24 5.0816E-14 ppmV
ETH 1.0994E+14 1.0994E+14 ppmV
IOLE 1.0994E+14 1.0994E+14 ppmV
ISPD 1.0000E-24 2.9938E-16 ppmV
TERP 1.0994E+14 1.0994E+14 ppmV
TRPRXN 1.0000E-24 2.8603E+03 ppmV
CL 1.0000E-24 1.4452E-36 ppmV
FMCL 1.0000E-24 4.8042E-13 ppmV
HCL 1.1991E-12 1.4106E-12 ppmV
H2NO3PIJ 1.0000E-24 1.0426E-22 ppmV
H2NO3PK 1.0000E-24 2.0928E-27 ppmV
AGLYJ 1.0000E-24 1.3925E-04 ppmV
HGIIAER 1.0000E-24 8.3112E-22 ppmV

To answer your question about setting tolerances for the chemical solver, you can do that through environment variables if you compile with the Rosenbrock or Gear solver options:

Synchronization Time Step and Tolerance Options

I have no experience with this, though, and the species concentrations you posted show that something is seriously off. Not only do a number of species drop to very low concentrations, as you noted, but some other species, mostly radicals (e.g. H2O2, HO2, RO2, XO2), show unrealistically high values in the 1-1000 ppm range. Given this, and given that your only change relative to a previous successful run was in the emission inputs, I’m somewhat doubtful that adjusting the tolerances will solve the issue.

They are not unrealistically high values; they are really low values close to zero (e.g. GLY 1.0000E-24), if I interpret the above error message correctly.

And I don’t understand why this error message occurs. Take “ETH 1.0994E+14 1.0994E+14 ppmV”, for example: I checked the emission input files for both runs. In the successful run, the emission rate is min=0, max=1.147586 mole/s. In the failed run, the emission rate is min=0, max=1.0014681816101074 mole/s.

You are correct that many of these species concentrations are unrealistically low, but some, like the ones I listed as well as ALD2, FORM, CO, PAR, ALDX, ETH, IOLE, and TERP, are significantly higher than they should be.
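
To pick the high values out of the convergence-failure table without reading every row by eye, the rows can be filtered with a short script. This is a sketch only: the function name and the 1 ppmV cutoff are assumptions chosen for illustration (not CMAQ settings), and it assumes each row has the form "NAME Init.Conc Pred.Conc ppmV" as in the posted table.

```python
import re

# One table row: species name, initial conc., predicted conc., unit.
ROW = re.compile(r"^\s*(\w+)\s+([-+0-9.Ee]+)\s+([-+0-9.Ee]+)\s+ppmV\s*$")


def flag_high(lines, max_ppmv=1.0):
    """Return (species, init, pred) for rows where either concentration
    exceeds max_ppmv.

    flag_high and the default cutoff are hypothetical; pick a cutoff
    that suits the species you are checking.
    """
    flagged = []
    for line in lines:
        match = ROW.match(line)
        if match is None:
            continue  # header or other non-table line
        name = match.group(1)
        init = float(match.group(2))
        pred = float(match.group(3))
        if init > max_ppmv or pred > max_ppmv:
            flagged.append((name, init, pred))
    return flagged


# Rows copied from the table posted in this thread.
sample = [
    "Convergence failure for the following species:Init.Conc, Pred.Conc.",
    "O3 3.5000E-02 1.4492E-07 ppmV",
    "HO2 1.0000E-24 5.5793E+00 ppmV",
    "GLY 1.0000E-24 1.9400E-01 ppmV",
    "PAR 1.0994E+14 1.0994E+14 ppmV",
]
flagged = flag_high(sample)
print([name for name, _, _ in flagged])
```

Checking the initial column as well as the predicted one matters here: in the posted table, PAR, ETH, IOLE and TERP already show 1.0994E+14 ppmV as their initial value, before the solver step that failed.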

For some of these species (e.g. CO and TERP), can you compare the average emission rates (not just the min and max) from the m3stat report files created from the two sets of emissions files you are using?