CCTM5.3.3 run error: Program received signal SIGFPE: Floating-point exception

Hi, I got the following error when I ran CMAQv5.3.3:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x2AC6866D4697
#1  0x2AC6866D4CDE
#2  0x2AC6871673FF
#3  0x9381DA in m3dry_
#4  0x6CC539 in __depv_defn_MOD_get_depv at DEPV_DEFN.F:544
#5  0x91E7E4 in vdiff_ at vdiffproc.F:411
#6  0x86810D in sciproc_ at sciproc.F:237
#7  0x8566DD in cmaq_driver_ at driver.F:717
#8  0x8510C4 in MAIN__ at cmaq_main.F:97

==================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 53953 RUNNING AT cuter-r740
=   EXIT CODE: 136
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================

I had encountered this error before, but it disappeared after setting DEBUG=FALSE.
For this run, I used the Noah LSM and the MODIS land use data, and my WRF, CMAQ and MCIP versions are 4.1.1, 5.3.3 and 5.3.3, respectively. The compilers are gcc and gfortran.

I checked m3dry.F and printed some information. The code is:

IF ( ( NINT( GRID_DATA%LWMASK( c,r ) ) .EQ. 0 ) .OR. ( vegcr .EQ. 0.0 ) ) THEN  ! water
   print *, 'WATER:====================================='
   print *, 'LWMASK:==================', GRID_DATA%LWMASK( c,r )
   print *, 'vegcr:===============', vegcr
   ...
ELSE   ! land
   print *, 'LAND:====================================='
   print *, 'LWMASK:==================', GRID_DATA%LWMASK( c,r )
   print *, 'vegcr:===============', vegcr
The output is (see the attachment dust202103.log):

 WATER:=====================================
 LWMASK:==================   0.00000000    
 vegcr:===============   0.00000000    
 WATER:=====================================
 LWMASK:==================   0.00000000    
 vegcr:===============   0.00000000    
 WATER:=====================================
 LWMASK:==================   0.00000000    
 vegcr:===============   0.00000000    
...
 LAND:=====================================
 LWMASK:==================   1.00000000    
 vegcr:===============  0.484998971    
 =============================
   7.00000022E-03
   1101.12646    
  0.217800006      0.128800005       0.00000000       4.50000000    
 =============================
 LAND:=====================================
 LWMASK:==================   1.00000000    
 vegcr:===============  0.258183330    
 =============================
   27376.0254    
   1.34305178E+09
  0.217800006      0.108900003       0.00000000       0.00000000    
 =============================
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x2AC6866D4697
#1  0x2AC6866D4CDE
#2  0x2AC6871673FF
#3  0x9381DA in m3dry_
#4  0x6CC539 in __depv_defn_MOD_get_depv at DEPV_DEFN.F:544
#5  0x91E7E4 in vdiff_ at vdiffproc.F:411
#6  0x86810D in sciproc_ at sciproc.F:237
#7  0x8566DD in cmaq_driver_ at driver.F:717
#8  0x8510C4 in MAIN__ at cmaq_main.F:97

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 53953 RUNNING AT cuter-r740
=   EXIT CODE: 136
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
real 28.91
user 11.91
sys 6.06

**************************************************************
** Runscript Detected an Error: CGRID file was not written. **
**   This indicates that CMAQ was interrupted or an issue   **
**   exists with writing output. The runscript will now     **
**   abort rather than proceeding to subsequent days.       **
**************************************************************

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2021-03-06
End Day:   2021-03-07
Number of Simulation Days: 1
Domain Name:               DUST202103
Number of Grid Cells:      803088  (ROW x COL x LAY)
Number of Layers:          39
Number of Processes:       1
   All times are in seconds.

Num  Day        Wall Time
01   2021-03-06   28.91
     Total Time = 28.91
      Avg. Time = 28.91

I found zero LAI values at points where the grid cell is categorized as land in the landmask (LWMASK in the GRIDCRO2D file). I extracted the LAI and VEGCR from the METCRO2D file and the LWMASK from the GRIDCRO2D file into Excel; see the attachments. The file anomaly.xlsx shows the points where LAI equals zero and the landmask equals one (land).
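For reference, the same check can be done with a short script. This is only a sketch: the two file names below are placeholders for my actual METCRO2D and GRIDCRO2D paths, and it assumes the standard MCIP variable names LAI and VEG (read by CMAQ as vegcr) in METCRO2D and LWMASK in GRIDCRO2D.

import numpy as np
from netCDF4 import Dataset

metcro2d  = Dataset("METCRO2D_dust202103.nc")   # placeholder path
gridcro2d = Dataset("GRIDCRO2D_dust202103.nc")  # placeholder path

# IOAPI variables are dimensioned (TSTEP, LAY, ROW, COL); take the first time step
lai    = metcro2d.variables["LAI"][0, 0, :, :]
vegcr  = metcro2d.variables["VEG"][0, 0, :, :]
lwmask = gridcro2d.variables["LWMASK"][0, 0, :, :]

# Cells flagged as land in GRIDCRO2D but carrying LAI = 0 in METCRO2D
anomaly = (np.rint(lwmask) == 1) & (lai == 0.0)
rows, cols = np.where(anomaly)

print(anomaly.sum(), "land cells with LAI = 0")
for r, c in zip(rows, cols):
    print("row", r + 1, "col", c + 1, "LAI =", lai[r, c], "VEG =", vegcr[r, c])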
My questions are:

  1. This error can be bypassed by setting DEBUG=FALSE, and I can then successfully get results. However, I wonder whether this error could influence my results?
  2. Why are there zero LAI values over the land points? Is something wrong with my WRF run or with the land use data? If I want to fix it, what do I need to do?
    anomaly.xlsx (65.6 KB)
    lai-metcro2d.xlsx (75.7 KB)
    lwmask-gridcro2d.xlsx (61.6 KB)
    vegcr-metcro2d.xlsx (162.0 KB)

dust202103.log.txt (208.9 KB)


I’ve looked over your files. Thanks for attaching details like this. I focused on the first column, assuming these are gridded data in the excel files. I can see that the anomaly = 1 cells are those where LAI is zero but all other fields suggest land.

  1. I would not use the CMAQ results, because so many grid cells are impacted. If it were just one odd grid cell, it probably would not impact the overall results.

  2. This seems almost 100% an issue with the WRF run, the pre-processing inputs, or NOAH. But NOAH is used so widely that it seems very odd for so many land grid cells to have LAI = 0 if the processing was done correctly.

a. I am curious whether LAI is updated during the run via wrflowinp_d01 or held static.

b. Check the wrfinput_d01 file. Do LAI and VEGFRA look consistent, or inconsistent like your excel files show? If they look the same from a land-water perspective, something wacky is going on during the run. Also check a met_em* file from the metgrid step (LAI12m vs GREENFRA). If these have an inconsistency, it points to the root of the problem (a quick way to count the zero-LAI land cells is sketched after this list).

c. Check the WRF output too.

d. Check the wrflowinp_d01 file if it is being used to update LAI, and make sure it looks correct. I think you can use ncview to quickly compare these fields visually, since the differences are stark.

e. Can you report the land-use class where LAI = 0 but the cell is land? Compare this land-use classification to the VEGPARM.TBL file. If LAI is static, the values come from this table. If LAI is time-varying, the LAI comes from the wrflowinp_d01 file.
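A rough sketch of the counting mentioned in (b) and (c) is below. It is only a sketch: the file names are placeholders for your own paths, and it assumes the standard WRF variable names LAI, VEGFRA and LANDMASK are present in both files.

import numpy as np
from netCDF4 import Dataset

# Placeholder file names; substitute your own wrfinput/wrfout paths
for path in ("wrfinput_d01", "wrfout_d01_2021-03-06_00:00:00"):
    with Dataset(path) as nc:
        lai      = nc.variables["LAI"][0, :, :]        # (south_north, west_east)
        vegfra   = nc.variables["VEGFRA"][0, :, :]
        landmask = nc.variables["LANDMASK"][0, :, :]

        land     = landmask == 1
        zero_lai = land & (lai == 0.0)
        zero_veg = land & (vegfra == 0.0)
        print(path, ":", int(land.sum()), "land cells,",
              int(zero_lai.sum()), "with LAI = 0,",
              int(zero_veg.sum()), "with VEGFRA = 0")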

Feel free to attach a namelist.wps file. That would help me understand the domain better, and I could even run geogrid & metgrid to reproduce this and dig deeper. It might also help to search the WRF forum for similar instances. We do not use NOAH much, but we have not had an issue like this in that small number of tests.


Hi, Gilliam. Thanks for your detailed reply.
First, I need to make two corrections to my post: 1. the LSM I used is NOAH-MP; 2. I used WRFv4.1.1, not 4.3.3. Sorry for that.

I made further checks following what @rgilliam said.
a. LAI is static during the run.
b-c. I checked the met_em*, wrfinput_d01, wrfout, and metcro2d (created by MCIP) files. The LAI values in the four files differ from each other; from a land-water perspective, wrfinput is consistent with the met_em.d01 file, while metcro2d is consistent with the wrfout file. See the file “anomaly-compare”, which contains four sheets with the abnormal LAI values (LAI = 0 over land grid cells) from the met_em.d01, wrfinput, wrfout and metcro2d files, respectively.
d. I didn’t use a wrflowinp file.
e. I found the land-use classes include 3, 12, 13, 15, 16, 17, and 20, and the LAI probably comes from MPTABLE.TBL because the values match (a sketch of this cross-check follows below). wrf/MPTABLE.TBL at master · yyr/wrf · GitHub
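For reference, this land-use cross-check can be reproduced from the wrfout file with a short script. This is only a sketch: the file name is a placeholder, and it assumes the standard WRF variables LAI, LANDMASK and LU_INDEX.

import numpy as np
from netCDF4 import Dataset
from collections import Counter

with Dataset("wrfout_d01_2021-03-06_00:00:00") as nc:   # placeholder path
    lai      = nc.variables["LAI"][0, :, :]
    landmask = nc.variables["LANDMASK"][0, :, :]
    lu_index = nc.variables["LU_INDEX"][0, :, :]

# Land cells with LAI = 0, tallied by dominant land-use class
zero_lai_land = (landmask == 1) & (lai == 0.0)
classes = Counter(lu_index[zero_lai_land].astype(int).tolist())

for lu, count in sorted(classes.items()):
    print("land-use class", lu, ":", count, "cells with LAI = 0")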
Finally, my confusion is:
How do WPS and WRF create LAI and VEGFRA? Why do the LAI fields in the met_em.d01, wrfinput and wrfout files differ from each other, and why are there so many zero values?
I have uploaded my WRF namelists and hope someone can help me figure it out. In the meantime, I will continue to check and will post on the WRF forum to find the cause.

anomaly-compare.xlsx (245.2 KB)
LAI-compare.xlsx (1.2 MB)
Landuse Classification.xlsx (130.7 KB)
(Attached images: anomaly_met_emd01, anomaly_metcro2d, anomaly_wrfinput, anomaly_wrfout)
namelist.dust202103v1.input.txt (3.9 KB)
namelist.wps.txt (1.3 KB)

Hi, all. This problem has been solved. The reason is that I used the NOAH-MP LSM, in which WRF takes LAI from a table, MPTABLE.TBL.
In MPTABLE.TBL, LAI = 0 when the land-use classification is “Deciduous Needleleaf Forest”, “Croplands”, “Urban and Built-Up”, “Snow and Ice”, “Barren or Sparsely Vegetated”, “Water”, or “Barren Tundra”. Since my simulation period is March and the domain is northern China, which has large areas of desert, many cells had LAI = 0. I changed the LSM to NOAH and the problem went away, because in WRF the grid cells whose land-use classifications are “Barren or Sparsely Vegetated” and “Barren Tundra” are assigned a value of 0.01.
Finally, thank you for your help @rgilliam


This is great news!

Apologies for missing the prior post. My email must be filtering out some of these forum question emails.

Rob

That’s okay. It’s a process of finding problems and solving them. :grinning: