CMAQv5.3 stops for ptfire sector in inline mode with segmentation error: forrtl: severe (408)

ehsan · December 27, 2018, 5:45am

Hello,

I am trying to run CMAQv5.3 for only the “ptfire” sector as an inline point source. I have 4 different wildfire emission scenarios, and I am running them for the whole of September. I changed the values in the IC and BC files to 1e-30 to minimize the influence of initial and boundary conditions. I ran all 4 in MPI mode, and all 4 CMAQ runs stopped at Sep12 1:02:00 without any explicit error message showing what the issue is. I have attached a screenshot of this.

Then I compiled CMAQ in serial mode and tried to run it for Sep11 (the day before the date on which it stopped in MPI mode) to check the log file, but it did not run at all and stopped with a segmentation fault. When I compiled it with the debugging option in serial mode, it showed the following error:

forrtl: severe (408): fort: (3): Subscript #1 of the array MW has value 0 which is less than the lower bound of 1

I was wondering whether the 1e-30 values might be the issue, although CMAQ in MPI mode ran for several days with those values.

Thanks for any help in advance.

cgnolte · December 27, 2018, 3:11pm

Ehsan,
There is a very explicit explanation of what the error is:
Subscript #1 of the array MW has value 0 which is less than the lower bound of 1.
The stack trace indicates this error is occurring on line 1640 of EMIS_DEFN.F.
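One common way a subscript of 0 arises in Fortran code like this is a name lookup that returns 0 when the name is not found (the I/O API's INDEX1 follows this convention), with the result then used directly as an array index. The thread does not confirm that this is what happens at line 1640 of EMIS_DEFN.F; the sketch below is a hypothetical Python illustration of that failure mode and of a guard against it. The species list, molecular weights, and function names are made up for illustration.

```python
def index1(name, names):
    """Return the 1-based position of name in names, or 0 if absent
    (mimicking the I/O API INDEX1 convention)."""
    for i, candidate in enumerate(names, start=1):
        if candidate == name:
            return i
    return 0


# Hypothetical species list and molecular weights (g/mol), illustration only.
SPECIES = ["NO", "NO2", "CO"]
MW = [30.0, 46.0, 28.0]  # Fortran MW(1:3), stored 0-based here


def molecular_weight(name):
    """Look up MW(idx) Fortran-style, refusing the out-of-bounds idx == 0."""
    idx = index1(name, SPECIES)
    if idx < 1:
        # In Fortran this would be MW(0): the "Subscript #1 ... has value 0"
        # error that a bounds-checked debug build reports.
        raise LookupError(f"species {name!r} not found; index would be 0")
    return MW[idx - 1]  # convert 1-based Fortran index to 0-based Python


print(molecular_weight("NO2"))  # 46.0
```

Without the guard, `MW[idx - 1]` with `idx == 0` would silently read `MW[-1]` in Python, much as a build without bounds checking may read memory outside the array instead of stopping with a clear message.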

It is surprising that you would encounter an error of this sort several days into a multiday run. Are you sure you did not change anything else in the run script, the emissions control file, or the species namelists?

The issue with the serial build not working is probably separate from this array indexing problem. What is the domain size you are trying to run? Maybe the domain is too large to be run on a single processor, and you are exceeding the amount of available memory on your machine.

ehsan · December 27, 2018, 6:14pm

Hello Chris,

Thank you for the quick reply. I did not touch the namelist files or the emission control file, but I did make some modifications to my run script for my case. I defined only one elevated source (N_EMIS_PT=1) with its stack and inline file information, turned off the other point sources, and did not include any surface emissions. I have tried to attach some screenshots of the log file.

The domain dimensions I am trying to simulate are 265 × 250 × 60. I tried running on a single processor with 10 GB of memory, hoping to get some hint about the source of the issue. How can I get more information about the source of the problem?

Thanks.
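For a sense of scale, each single-precision 3-D field on the 265 × 250 × 60 grid quoted above needs about 15.9 MB. How many such fields CMAQ holds at once depends on the chemical mechanism and internal work arrays, which the thread does not state, so the sketch below only does the arithmetic; the constant names are illustrative and it ignores all other memory overhead.

```python
NCOLS, NROWS, NLAYS = 265, 250, 60  # grid size quoted in the post
BYTES_PER_REAL = 4                  # single-precision REAL


def field_bytes(ncols=NCOLS, nrows=NROWS, nlays=NLAYS):
    """Bytes for one single-precision 3-D array on the grid."""
    return ncols * nrows * nlays * BYTES_PER_REAL


def fields_that_fit(memory_bytes):
    """How many such 3-D arrays fit in memory_bytes (ignoring all overhead)."""
    return memory_bytes // field_bytes()


cells = NCOLS * NROWS * NLAYS
print(cells)                        # 3975000
print(field_bytes() / 1e6)          # 15.9 (MB)
print(fields_that_fit(10 * 10**9))  # 628
```

If the number of concurrently allocated 3-D arrays (species concentrations, meteorology, work arrays) approaches that count, a single-processor run limited to 10 GB would run out of memory; under MPI, each process holds only its own subdomain.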

cgnolte · December 27, 2018, 8:21pm

How did you turn off surface emissions? It should be possible to do this using the emissions control file. But I suspect the problem you are currently facing is related to turning off the surface emissions. Whatever you did to turn them off, try turning them back on.

The log files are giving you enough clues to go on, and there is no indication that MPI is giving you problems. So I wouldn’t worry about getting this case running in serial mode.

ehsan · December 27, 2018, 9:46pm

Thanks for your advice.

Sorry, by turning off I meant that I commented out the other point sources in the run script and set N_EMIS_GR to 0 so that no surface sources are included in the calculations, because I do not have surface emissions yet. I have attached a screenshot of that here. I am not familiar with the EmissCtrl.nml file yet, so I modified the run script directly.

I reran the failing day in MPI mode, and the CTM_LOG files for all 4 scenarios show a similar error, all at the same time and on only one PE (055):

 Processing Day/Time [YYYYDDD:HHMMSS]: 2016256:010300
   Which is Equivalent to (UTC): 1:03:00  Monday,  Sept. 12, 2016
   Time-Step Length (HHMMSS): 000100

SEDDY, MBAR, FNL, HOL = 1.4676103E-06 5.4300489E-07 9.9020980E-02 -2.2222582E-02

 *** ERROR ABORT in subroutine VDIFFACMX on PE 055
 *** ACM fails ***

PM3EXIT: DTBUF 1:28:21
Date and time 1:28:21 (*:)
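The log stamps time as YYYYDDD:HHMMSS, where DDD is the day of the year. A small standard-library Python sketch (the helper name is hypothetical) decodes it; 2016 is a leap year, so day 256 falls on September 12, matching the "Equivalent to" line.

```python
from datetime import datetime


def parse_cmaq_stamp(stamp):
    """Convert a 'YYYYDDD:HHMMSS' log stamp to a datetime (DDD = day of year)."""
    return datetime.strptime(stamp, "%Y%j:%H%M%S")


when = parse_cmaq_stamp("2016256:010300")
print(when)                 # 2016-09-12 01:03:00
print(when.strftime("%A"))  # Monday
```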

ehsan · December 28, 2018, 5:34am

Hi again,

I looked at the source file the error comes from (CCTM/src/vdiff/acm2_m3dry/vdiffacmx.F). It is raised at line 237, which compares MBAR (the ACM2 mixing rate) with EPS. EPS is set to 1e-6 at the top of the routine, and since MBAR is 5.4300489E-07, the check fails and the run aborts. I looked at the ACM2 reference papers (Pleim, J.E. (2007a) and Pleim, J.E. (2007b)) and could not find any reference to the EPS constant. My guess is that it is a threshold or error tolerance. Is it reasonable to set EPS to a lower value, such as 1e-10, to get past this error?
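The check described above amounts to comparing the ACM2 mixing rate against a fixed floor. A minimal Python sketch, assuming the test is a plain MBAR-below-EPS comparison as described (the function name is hypothetical, not the CMAQ routine, and the exact treatment of MBAR == EPS is an assumption), using the values from the log:

```python
EPS = 1.0e-6  # threshold value quoted from vdiffacmx.F in the post above


def acm_mixing_ok(mbar, eps=EPS):
    """Return True if the ACM2 mixing rate MBAR clears the EPS floor.

    Hypothetical stand-in for the check that triggers "ACM fails";
    the real routine aborts the run instead of returning False.
    """
    return mbar >= eps


mbar_from_log = 5.4300489e-07
print(acm_mixing_ok(mbar_from_log))             # False: the run would abort
print(acm_mixing_ok(mbar_from_log, eps=1e-10))  # True with the proposed EPS
```

Lowering EPS makes the comparison pass, but it does not change the small MBAR value that the meteorology produced.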

Thanks.

cgnolte · December 28, 2018, 3:16pm

The log file is indicating the error is due to meteorology. I notice the model is using a 1-minute time step, which is where we usually set our lower bound. I have never seen a 12-km simulation require such a short time step. Something unusual is going on… an unusually strong convective updraft? Divergent flow? You could try visualizing the meteorological data and see if you spot anything obviously wrong or nonphysical.

It looks like your turning off gridded emissions is working correctly. Still, I might try turning gridded emissions back on to make sure the error occurs in the same place.

Unfortunately, due to a pending US government shutdown, I [and my EPA colleagues] will not be able to respond further until the government reopens. Good luck!

cjcoats · February 5, 2019, 4:25pm

As an aside about the code’s computational efficiency, the code immediately before this failure has:

         FNL = 1.0 / ( 1.0 + ( ( KARMAN / ( -Met_Data%HOL( C,R ) ) ) ** 0.3333 )
     &                / ( 0.72 * KARMAN ) )

The meaning here of the **0.3333 is “take a cube root.”
However, using this exponent is both sloppy and expensive (real-exponent powers are on the order of a thousand times as expensive as a multiplication), as well as being imprecise.
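Plugging the HOL value printed in the log into this expression reproduces the logged FNL, which is a useful consistency check when reading the log. The sketch assumes KARMAN = 0.4 (the usual von Kármán constant; the thread does not quote CMAQ's value) and compares the 0.3333 exponent with an exact one-third.

```python
KARMAN = 0.4  # assumed von Karman constant


def fnl(hol, exponent=0.3333):
    """FNL as written in the quoted code, for unstable conditions (HOL < 0)."""
    return 1.0 / (1.0 + ((KARMAN / (-hol)) ** exponent) / (0.72 * KARMAN))


hol_from_log = -2.2222582e-02
print(f"{fnl(hol_from_log):.7e}")              # close to the logged 9.9020980E-02
print(f"{fnl(hol_from_log, 1.0 / 3.0):.7e}")  # slightly different value
```

Only the 0.3333 exponent matches the logged FNL to about seven digits; an exact cube root shifts the result in the fourth significant digit, which illustrates the imprecision mentioned above.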

As it happens, there is a fast cube-root algorithm due to Edmund Halley (of comet fame). Experience shows that a Fortran-coded Halley-algorithm cube-root routine is about three times faster than the **0.3333. On the other hand, if the exponent is a compile-time machine-precision one-third (e.g., **(1.0/3.0) ), Intel's compiler recognizes it and uses a machine-language implementation of the Halley algorithm that is twice as fast again.
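Halley's method applied to f(x) = x³ - a gives the update x ← x·(x³ + 2a)/(2x³ + a), which converges cubically near the root. The Python sketch below illustrates the iteration only; it is not the Fortran routine referred to above, the starting guess from the binary exponent is one simple choice among many, and Python timings will not reflect the speedups quoted.

```python
import math


def halley_cbrt(a, tol=1e-15, max_iter=50):
    """Cube root of a via Halley's iteration x <- x*(x**3 + 2a)/(2x**3 + a)."""
    if a == 0.0:
        return 0.0
    if a < 0.0:
        return -halley_cbrt(-a, tol, max_iter)
    # Initial guess from the binary exponent: a = m * 2**e, guess 2**(e // 3).
    _, e = math.frexp(a)
    x = math.ldexp(1.0, e // 3)
    for _ in range(max_iter):
        x3 = x * x * x
        x_new = x * (x3 + 2.0 * a) / (2.0 * x3 + a)
        if abs(x_new - x) <= tol * x_new:  # relative change below tolerance
            return x_new
        x = x_new
    return x


print(halley_cbrt(27.0))  # approximately 3.0
```

Production cube-root routines often use a tighter initial approximation so that one or two iterations suffice.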

And for that matter, divisions are 10 to 100 times as expensive as multiplications, so using a non-rationalized fraction with those extra divisions also takes more CPU time than necessary…
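As one illustration of the point about divisions, the FNL expression can be rearranged algebraically. With t = (-HOL)^(1/3), a = 0.72·KARMAN and b = KARMAN^(1/3), the expression 1/(1 + (KARMAN/(-HOL))^(1/3)/(0.72·KARMAN)) equals a·t/(a·t + b), which needs only one division per call once a and b are precomputed. This rearrangement is a sketch of the idea, not code from CMAQ; KARMAN = 0.4 is again an assumption, and the Python below only checks that the two forms agree.

```python
KARMAN = 0.4               # assumed von Karman constant
A = 0.72 * KARMAN          # precomputed once
B = KARMAN ** (1.0 / 3.0)  # precomputed once


def fnl_original(hol):
    """Direct form with an exact 1/3 exponent: three divisions per call
    (not counting the constant exponent)."""
    return 1.0 / (1.0 + ((KARMAN / (-hol)) ** (1.0 / 3.0)) / (0.72 * KARMAN))


def fnl_rearranged(hol):
    """Algebraically equivalent form a*t / (a*t + b): one division per call."""
    at = A * (-hol) ** (1.0 / 3.0)
    return at / (at + B)


for hol in (-2.2222582e-02, -0.5, -10.0):
    print(fnl_original(hol), fnl_rearranged(hol))
```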

FWIW.
