HDIFF results in NaN chksums


#1

I am running a one-day test for a longer simulation. The simulation completes, but the ACONC and CONC outputs are all near zero. This is presumably because the concentrations are being wiped out at HDIFF: the checksums after HDIFF are always NaN, e.g.:

after HADV G 6.2850745E+06 A 1.2570200E+11 N 1.5718058E-01
after ADV G 6.2899680E+06 A 1.2547241E+11 N 1.5689573E-01

 H-eddy DT & integration steps:   3.0000000E+02       1

after HDIFF G NaN A NaN N NaN
after DECOUPLE_ G NaN A NaN N NaN
after CLDPROC G 3.3154251E-06 A 2.0508180E-19 N 1.6630650E-23
after CHEM G 3.1229010E-06 A 2.0852811E-19 N 1.6630650E-23
after AERO G 7.0037115E+02 A 2.4320300E+02 N 6.4929075E-18

Could this be related to the CGRID file?


#2

The CGRID file is the restart file that includes model species concentrations at the end of each simulation period.
I don’t think that this file would cause NaN values.
It looks like you are using a profile file for the initial conditions instead of a CGRID file.

setenv ICFILE ICON_v52_China_EV_profile

The log file does report that no initial conditions were found for many species, but that should be OK because CMAQ sets those species to a very low concentration:

No IC found for species XO2 in INIT_GASC_1; set to  1.00E-30
No IC found for species XO2N in INIT_GASC_1; set to  1.00E-30
No IC found for species NTROH in INIT_GASC_1; set to  1.00E-30
No IC found for species NTRALK in INIT_GASC_1; set to  1.00E-30

Please also check the MPI log files, since you are running this case on 32 processors.

CTM_LOG_0[01-031].v52_gcc_China_EV_20121223

These per-processor log files likely contain more information.
According to your log file, they are saved to the following directory.

/projects/b1045/cmaq/cmaq_v5.2/data/output_CCTM_v52_gcc_China_EV_4xPROC/LOGS/

These files should all be similar/identical, so please attach one of them to this issue for us to review.
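
A quick way to see whether every processor hits the NaN at the same process is to scan each per-processor log for its first checksum line containing NaN. The following is a minimal POSIX sh sketch, not part of CMAQ: the function name first_nan is hypothetical, and it assumes the per-processor logs carry checksum lines in the same "after <PROCESS> G ... A ... N ..." layout as the main log.

```sh
# first_nan LOGFILE...
# For each log file, print "file:line: text" for the first checksum line
# (a line whose first word is "after") that contains NaN.
first_nan() {
    for f in "$@"; do
        awk -v name="$f" '
            $1 == "after" && /NaN/ {
                print name ":" FNR ": " $0
                exit
            }' "$f"
    done
}
```

Run from the LOGS directory, it could be called as: first_nan CTM_LOG_*.v52_gcc_China_EV_20121223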

Your log file also contains this warning, which may indicate a problem with the emissions:
*WARNING:
One or more emissions surrogates assigned to model species
are not found in emissions sources but the CTM_EMISCHK
environment variable set to False so simulation will proceed.


You can also try reducing the time step by changing CTM_MINSYNC from 60 to 15 in this section of your run script:
#> Sychronization Time Step and Tolerance Options
setenv CTM_MAXSYNC 300 #> max sync time step (sec) [ default: 720 ]
setenv CTM_MINSYNC 60 #> min sync time step (sec) [ default: 60 ]

I also recommend:

  1. Check the WRF output to see if wind speeds are too high.
  2. Try compiling with version 5.2.1.
  3. Try different science modules. For the driver, switch between wrf and yamo.

The ADV module runs just before HDIFF, where the NaN values first appear.
Looking at your values after advection, and comparing them to the benchmark case, you have:

after      HADV G  1.0017700E+03 A  7.3521514E+12 N  4.5811871E-01

In the benchmark case, the values look like this:

 after      HADV G  2.3950352E+03 A  6.2361959E+14 N  1.4660137E+01

The values being printed come from the util/util/cksummer.F routine:
G is GC_CKSUM / LCELLS, which I believe is the value for the gas-phase species number concentration
A is AE_CKSUM / LCELLS, which I believe is the value for the aerosol species number concentration
N is NR_CKSUM / LCELLS, which I believe is the value for the non-reactive species number concentration
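
To compare these three columns between two runs, it can help to reduce each log to one row per process. The following is a minimal POSIX sh sketch, not part of CMAQ: the function name cksum_table is hypothetical, and it assumes checksum lines of the form "after <PROCESS> G <value> A <value> N <value>" as in the excerpts above.

```sh
# cksum_table LOGFILE
# Print one "PROCESS G A N" row per checksum line so that two runs
# can be compared side by side. Lines in any other layout are skipped.
cksum_table() {
    awk '$1 == "after" && $3 == "G" && $5 == "A" && $7 == "N" {
        print $2, $4, $6, $8
    }' "$1"
}
```

The output of cksum_table for your log and for the benchmark log can then be redirected to two files and compared with diff.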

Please also attach your bldit_cctm.csh script so that we can review your science process options and your compiler. You can also try recompiling the code with the -g option, or set up a runtime trap that stops the program at the first NaN or uninitialized value:
https://www.dursi.ca/post/stopping-your-program-at-the-first-nan.html
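
Since this build uses gcc, one way to get such a trap is through gfortran's floating-point exception options. The fragment below is only a sketch of the flags: the variable name FPE_DEBUG_FLAGS is hypothetical, and where the flags belong in your CMAQ build scripts is an assumption you would need to check against your own configuration.

```sh
# gfortran options that stop the run at the first invalid floating-point
# operation (the usual origin of a NaN) and report where it happened.
#   -g                  keep debug symbols so line numbers are reported
#   -fbacktrace         print a backtrace when a runtime error occurs
#   -ffpe-trap=...      raise SIGFPE on invalid, divide-by-zero, overflow
#   -finit-real=snan    initialize local REAL variables to signaling NaN
FPE_DEBUG_FLAGS="-g -fbacktrace -ffpe-trap=invalid,zero,overflow -finit-real=snan"
```

With these flags the model should abort inside the routine that first produces the NaN, rather than reporting it later in the checksum after HDIFF.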

Another suggestion is to look at the PBL scheme used in the meteorology model and use the same scheme in the CMAQ model.


#3

Thank you for the detailed reply, I will try these options and update.


#4

Attached is the CCTM_LOG, but it looks the same as the other.

The only thing I can think of is that my grid is tangent, so I needed to add the call to subroutine “ll2xy_lam_tan” in ll2xy_lam.F90. However, this is for the MCIP executable, which works just fine. I’m also running the two-way model with this same build, and it doesn’t have the HDIFF/NaN errors. I realize MCIP is bypassed in the two-way model, but that subroutine (ll2xy_lam_tan) is still called in twoway_aqprep.F90.

  1. CTM_MINSYNC changed to 15: same issue.

  2. Wind speeds look fine.

  3. I don’t want to use 5.2.1 as I’m also running the two-way model (see above).

  4. Compiling with driver/yamo results in this error:

sciproc.F:66:6:

   INCLUDE SUBST_PACTL_ID    ! PA control parameters
  1

Error: Unclassifiable statement at (1)
sciproc.F:201:15:

   IF ( LIPR ) CALL PA_UPDATE ( 'VDIF', CGRID, JDATE, JTIME, TSTEP )
           1

Error: Symbol ‘lipr’ at (1) has no IMPLICIT type
make: *** [sciproc.o] Error 1
ERROR while running make command


#5

Other than setting a trap for when the NaN first appears, I am not sure what to try. I am not familiar with the changes you are referring to in MCIP or their potential impact on CMAQ.
@wong.david-c or @tlspero Perhaps you can help with this error?


#6

I’m in talks with David about the two-way model. I’ll try the NaN trap.


#7

The problem has been resolved: I rebuilt ioapi, netcdf (upgraded from 4.3.3.1 to 4.6.1), and CMAQ with the Intel compilers (instead of gcc). Thank you for the assistance.