EBI failure, HADV/ADV/HDIFF NaNs


#1

Hi, I am using CMAQv5.2.1 and my simulation fails on the first time step.

The meteorology for the simulation was generated with WRFv3.7.

I have tried running with the following settings, but each has failed:

  • CTM_MAXSYNC = 300, MINSYNC = 60
  • CTM_MAXSYNC = 240, MINSYNC = 60
  • CTM_MAXSYNC = 240, MINSYNC = 15
  • CTM_MAXSYNC = 180, MINSYNC = 15

The log for the first day notes that GRID_CRO_2D and INIT_GASC_1 were opened, but:

CTM_CONC_1 :/foobar/CCTM_CONC_v521_gcc_india_20151201.nc
WARNING in subroutine OPEN3
File not available.
Could not open CTM_CONC_1 for update - try to open new

and

ERROR: Max number of EBI time step reductions exceeded
*** ERROR ABORT in subroutine HRSOLVER on PE 062
ERROR: Stopping because of EBI convergence failures

When I grep the log for “after”, I see NaNs for HADV, ADV, HDIFF, and DECOUPLE_, and Infinity for CLDPROC.

I have successfully run similar simulations with WRFv3.7 meteorology and the same emissions used here. I have also successfully run similar simulations with the same meteorology on a different computer.

Any recommendations for how to solve this or investigate further?

Thanks!


#2

When you grep your log file for the string “after”, when do you first see NaN? Is it in VDIFF (the first science process)? If so, I would suspect problems in the OCEAN file and sea spray emissions. You might try the beta version of CMAQ v5.3, which now has an easy way to turn off sea spray emissions.
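To find where the first bad value shows up, rather than just whether one exists, the grep can be replaced by a short scan that stops at the first hit. This is a minimal sketch, not part of CMAQ: it assumes only that each diagnostic line contains the word "after" plus a process name, and that bad values print as NaN, Inf, or Infinity. The function name and the sample lines are made up for illustration.

```python
import re

# Matches NaN, Inf, or Infinity as a whole token, in any letter case.
BAD_VALUE = re.compile(r"\b(?:nan|inf(?:inity)?)\b", re.IGNORECASE)


def first_bad_after_line(lines):
    """Return (line_number, text) of the first "after" line holding NaN/Inf, else None."""
    for number, line in enumerate(lines, start=1):
        if "after" in line.lower() and BAD_VALUE.search(line):
            return number, line.rstrip()
    return None


# Made-up sample lines; real CCTM log lines will be laid out differently.
sample_log = [
    "after VDIFF NO2 1.234E-03",
    "after HADV NO2 NaN",
    "after CLDPROC NO2 Infinity",
]

print(first_bad_after_line(sample_log))
```

For a real run, pass an open log file instead of the sample list, for example `first_bad_after_line(open(path_to_log))`, where `path_to_log` is whatever your run script names the per-processor log. The process named on the returned line is the first one to produce a bad value, which is the one to investigate.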

But first, have you tried running in debug mode? Depending on your compiler, there are flags you can turn on to crash if there are floating point violations, array indices out of bounds, or uninitialized variables. These options slow down the model execution, but if your crash is happening in the first time step that is not really a problem.


#3

Grepping the log for “after VDIFF” shows that VDIFF is never NaN. The NaNs are in HADV, ADV, HDIFF, and DECOUPLE_. I’ve used the same OCEAN file for simulations of different time frames, as well as a simulation of this time frame on another computer. The domain is also only over land.

Regarding debug mode, I haven’t tried this before. Are there any resources that describe how to use it and which flags are useful? To turn this mode on, I see that you need to set Debug_CCTM TRUE in the bld script. The flags already set in my config file for gcc myDBG are “-Wall -O0 -g -fcheck=all -ffpe-trap=invalid,zero,overflow -fbacktrace”. When running in debug mode, what do you look for to find the source of the error? Do I still grep the logs for “after”?

Thanks


#4

You can also go into your build directory and type

make clean
make DEBUG=TRUE

You might need to setenv compiler and compilerVrsn and source your config.cmaq first, depending on how your environment is set up.

Then, run CMAQ as normal. It may crash with a stack trace that points your way forward.
If your first problem is in HADV, then perhaps there is a problem with your BCON file.


#5

Thanks. I didn’t end up needing to run in debug mode.

Turns out the issue was in BCON. I’m using GEOS-Chem files for CMAQ BCON, and at some point in my preparation the very first timestep was left unassigned, so all variables were set to values on the order of e36.
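For anyone hitting the same thing, a check along these lines would flag an unassigned time step before the model run. It is a minimal sketch under stated assumptions: the function name and the 1e30 cutoff are hypothetical, chosen only because real boundary concentrations never come close to that magnitude, and the sample numbers are illustrative rather than taken from the actual BCON file. The input is one variable's values grouped by time step, however you choose to read them from the file.

```python
def unassigned_steps(values_by_step, threshold=1.0e30):
    """Return indices of time steps that contain any value at or above threshold.

    values_by_step: one iterable of numbers per time step, in time order.
    threshold: hypothetical cutoff; values this large are treated as unset.
    """
    bad_steps = []
    for step, values in enumerate(values_by_step):
        if any(abs(value) >= threshold for value in values):
            bad_steps.append(step)
    return bad_steps


# Illustrative data: step 0 was never filled in, steps 1 and 2 look normal.
sample_o3 = [
    [9.97e36, 9.97e36, 9.97e36],
    [4.1e-02, 3.9e-02, 4.0e-02],
    [4.2e-02, 3.8e-02, 4.1e-02],
]

print(unassigned_steps(sample_o3))
```

Running it over every species in the boundary file, and not just one, is worthwhile, since in this case all variables were affected at the first time step.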

