My CMAQv5.2.1 run crashes in the SSEMIS.F module. I extracted some variables from the run, and it looks like WET_M3J becomes zero where it is used as a denominator, after which CMAQ crashes. In the M3J calculation, ESEASJ gets very small (E-25), and in my case I think that is because VFLXRH gets very small (E-30). I traced the issue back to the VFLXRH equation and extracted the OFRAC (ocean/seawater fraction) and SFRAC (surf zone fraction) parameters. SFRAC is 0.0 in the CMAQ log files, but OFRAC always remains 3.56E-16.
I am confused about two issues in this case.
First, our innermost domain does not touch any ocean. I have attached a visualization of our domain here (innermost domain, domain 03). I checked my ocean file, and the surf zone variable is 0.0 everywhere, but the OPEN ocean variable is about -E-15 or +E-15 (or smaller in magnitude) in some cells. The negative values are odd, since this variable is the fraction of seawater cover and should be zero in my case. I am not sure whether that is the reason CMAQ crashes.
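One way to rule out round-off noise like this is to clean the fractions before the model reads them. The sketch below is plain Python operating on a list of values; the function name `clean_fraction` and the 1.0E-6 cutoff are assumptions made for illustration, not CMAQ settings, and the values would first have to be read from the ocean file with whatever netCDF tool is at hand.

```python
def clean_fraction(values, tiny=1.0e-6):
    """Clamp fractions to [0, 1] and zero out round-off noise.

    `tiny` is a hypothetical cutoff chosen for this example.
    """
    cleaned = []
    for v in values:
        if abs(v) < tiny:
            v = 0.0          # treat round-off noise as exactly zero
        cleaned.append(min(max(v, 0.0), 1.0))
    return cleaned


# Values of the kind reported above, plus one genuine fraction.
open_frac = [3.56e-16, -1.0e-15, 0.0, 0.25]
print(clean_fraction(open_frac))  # [0.0, 0.0, 0.0, 0.25]
```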
The second issue is in SSEMIS.F itself. It looks like if there is no ocean in the domain, so that OFRAC and SFRAC are both zero, then VFLXRH, ESEASJ, M3J, and WET_M3J all become zero as well. That makes the denominator in the DRY_M2J calculation (WET_M3J, line 938) zero, which causes a “floating invalid” error.
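The failure mode described here, and the kind of guard that avoids it, can be illustrated outside CMAQ. This is a Python sketch only: `safe_ratio`, its `floor` threshold, and the `default` value are hypothetical, and the actual expression in SSEMIS.F is not reproduced.

```python
def safe_ratio(numerator, denominator, floor=1.0e-30, default=0.0):
    """Return numerator / denominator, or `default` when the
    denominator is too small to divide by (hypothetical guard)."""
    if abs(denominator) <= floor:
        return default
    return numerator / denominator


# With no ocean in the cell, the wet third moment is zero.
wet_m3j = 0.0
print(safe_ratio(1.0e-25, wet_m3j))  # guarded case: 0.0
print(safe_ratio(2.0, 4.0))          # normal case: 0.5
```

In Fortran the same idea would be an IF test around the division; where exactly to place it in SSEMIS.F is not something this sketch settles.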
Any thoughts and suggestions would be greatly appreciated.
You might try downloading the beta version of CMAQ v5.3. The emissions processing (including sea spray emissions) has been substantially revised, and it is now much easier to simply turn off sea spray emissions.
If you want to use v5.2.1, then maybe the problem is here (line 751 of SSEMIS.F):
Thanks for your suggestion. I tested this in a run and it did not resolve the issue. I then added the lines shown in the screenshot inside SSEMIS.F, at the place where I thought the problem originated. In debug mode the run got past the error and finished the specific day that had kept failing.
Then I compiled CMAQv5.2.1 in normal mode, set CTM_MAXSYNC=150, and ran the same date. It now shows a new error for that date:
*** ERROR ABORT in subroutine AQCHEM on PE 077
Maximum AQCHEM total iterations exceeded
PM3EXIT: DTBUF 8:00:00 Sept. 13, 2016
Date and time 8:00:00 Sept. 13, 2016 (2016257:080000)
I did not change anything. Why does the run finish successfully in debug mode for this specific day but fail in normal mode (with optimization flags)?
I read on the forum that if CMAQ crashes on a day later than the first day of the run, the problem might come from the met files. In my case it crashes on September 13. Could this be the cause of the crash? How can I check that?
I installed CMAQv5.3 for this case and was checking the log file of the run. I came across this warning. The model is still running and this warning is not for the day of the crash. I was wondering if this might be relevant to the error I am getting.
Check the RN and RC variables in your METCRO2D file. For the hour where you are getting convergence errors in AQCHEM, do you have large spikes in precip (around 100 cm)? If so, then this is due to a bug in MCIP that will be fixed in the next release; I can give you a patch.
I checked the variables for the day of the crash. RN ranges from 0.0 cm to 0.265 cm, but RC is -1.0 cm everywhere. I also checked the other files, and RC is -1.0 in every one of them. What do you think about this?
Also, I attached a screenshot of a warning in subroutine CONVCLD_ACM to my previous post. Do you think that might cause any issue related to my present situation?
RC = -1.0 indicates that no convective parameterization was used in the WRF simulation, so there is no convective (subgrid) rain. This should not be a problem.
Your problem is in SSEMIS, even though your domain has no ocean. As I said previously, emissions processing has changed in v5.3, and it is much easier to turn off sea spray emissions. But if you’re committed to v5.2.1, try this:
In ASX_DATA_MOD.F, just before the RETURN statement at the end of the INIT_MET subroutine, set both the OCEAN and SZONE arrays to 0.
Grid_Data%OCEAN = 0.0
Grid_Data%SZONE = 0.0
Again, I have not tested this, but I think it should prevent SSEMIS from being called.
I am trying CMAQv5.3 now. I added those lines to the source code in the ASX_DATA_MOD file and also set CTM_SS_AERO to N. CMAQv5.3 again crashed with the error “Maximum AQCHEM total iterations exceeded”. I then compiled and ran in debug mode for that specific day and pulled out the error message and stack:
“forrtl: severe (408): fort: (3): Subscript #1 of the array MW has value 0 which is less than the lower bound of 1”
Ok, the problem isn’t in SSEMIS anymore, so we can declare victory on one thing.
This is something else involving the new emissions processing code. There should be a much better error message and code abort rather than a crash. Something is not getting initialized. Can you tell from one of the log files (e.g., CTM_LOG_000*) what emissions stream was being processed? (Was it gridded emissions, or do you have inline point sources, or was it biogenics, or lightning NO, or … ?)
I ran CMAQ in debug mode for the day of the crash (Sep 13) with the previous day's ICON. CMAQ did not start at all and crashed with the error message in the screenshot I attached to my earlier post here.
Regarding the emissions stream, I have inline fire emissions for each day. I checked the CTM_LOG_ files: CMAQ tried to open the gridded emissions, and then the runtime log file shows the error in the MW array.
I managed to run CMAQv5.3 in debug mode for the faulty day and looked at the changes in species concentrations after each time step. CMAQ crashes at 8:00:00 with the AQCHEM error in subdomain #77, and one minute before the crash (7:59:00) there are NaN values for several modules in that subdomain. I have attached a screenshot of that to this message.
That might be the cause of the crash, so I was wondering what causes the NaN values. If they come from the met files, which parameters should I check? If they come from emissions, which file should I check? I have both inline (fires) and gridded emissions in this model run.
The NaNs are almost certainly the cause of your model crash. They indicate your simulation is corrupted.
Carlie’s suggestion for checking the input files is good. It is strange to me that NaNs first occur at 7:59. If there was a bad value at 8:00 in one of the data files, then that bad value would have been used for interpolation at every time after 7:00, so I would expect the crash to occur sooner.
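The interpolation argument can be made concrete with a small Python sketch. It assumes simple linear interpolation between hourly records and returns the stored record unchanged when the time falls exactly on the hour; both are assumptions for illustration, and `interp_hourly` is a hypothetical helper, not CMAQ's own interpolation routine.

```python
import math


def interp_hourly(hourly, t):
    """Linearly interpolate hourly values at time t (in hours)."""
    h0 = math.floor(t)
    frac = t - h0
    if frac == 0.0:
        return hourly[h0]  # exactly on the hour: use the stored record
    return (1.0 - frac) * hourly[h0] + frac * hourly[h0 + 1]


# Hypothetical records with a bad value at 8:00.
hourly = {6: 1.0, 7: 2.0, 8: float("nan"), 9: 3.0}
for t in (6.5, 7.0, 7.0 + 1.0 / 60.0, 7.5):
    print(t, interp_hourly(hourly, t))
```

Under these assumptions the values at 6:30 and 7:00 are finite, while every time after 7:00 already picks up the NaN from the 8:00 record.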
Try grepping for NaN in all your log files to see when the first one occurs; it might not be on the same node that crashes later. Then look at that log file around that time and see if there are any further clues.
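If grep is awkward across many files, the same search can be sketched in Python. The helper name `first_nan_lines` is hypothetical, and the `CTM_LOG_*` pattern assumes the per-processor log files sit in the current directory.

```python
import glob
import re

# Match "NaN" as a whole word, in any letter case.
NAN_RE = re.compile(r"\bnan\b", re.IGNORECASE)


def first_nan_lines(pattern):
    """Map each matching log file to (line number, text) of its first NaN."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if NAN_RE.search(line):
                    hits[path] = (lineno, line.rstrip())
                    break
    return hits


if __name__ == "__main__":
    for path, (lineno, text) in first_nan_lines("CTM_LOG_*").items():
        print(f"{path}:{lineno}: {text}")
```

Files with no NaN are simply absent from the result, so the output lists only the processors worth inspecting.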
On most compilers you can use flags to cause the model to crash if there are floating point violations, arrays out of bounds, uninitialized variables, or other exceptions. You might try that as well.
You were right, Chris. I checked my log files for the whole modeling period before the crash on Sep 13 and found that NaNs occurred on almost every day of the simulation at 7:00 UTC in two subdomains, #77 and #78. Our local time is UTC-7.
Then I ran CMAQv5.3 in debug mode for the first day of the simulation with the default ICON and BCON, CTM_SSEMDIAG = T, and CTM_SS_AERO = F to turn off sea salt. The run finished successfully, but the log files show NaNs at 7:00 and 7:59 and several lines of the following warning. I have attached two screenshots with more detail.
WARNING: EBI solver in cell ( 30, 15, 40) Init.Conc. for SULF = 1.0220E+03 ppmV
So I will check both MCIP and emission files for any bad value at UTC 7:00, but what should I look for as “bad value”?
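As a starting point, a scan for the obvious candidates (NaN, infinities, and values outside a plausible range) can be sketched in Python. The function name `find_bad_values` and the default bounds are hypothetical and need to be chosen per variable; legitimate flag values, such as RC = -1.0 when no convective scheme is used, have to be allowed for.

```python
import math


def find_bad_values(values, lower=0.0, upper=1.0e6):
    """Return (index, value, reason) for each suspicious entry.

    `lower` and `upper` are hypothetical bounds; set them per variable.
    """
    bad = []
    for i, v in enumerate(values):
        if math.isnan(v):
            bad.append((i, v, "NaN"))
        elif math.isinf(v):
            bad.append((i, v, "infinite"))
        elif v < lower:
            bad.append((i, v, "below lower bound"))
        elif v > upper:
            bad.append((i, v, "above upper bound"))
    return bad


# Made-up sample values, for illustration only.
sample = [0.0, 0.265, -1.0, float("nan"), 1.0e36]
for index, value, reason in find_bad_values(sample):
    print(index, value, reason)
```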
Yes, I saw that, but in my case I am running in debug mode with sea spray emissions turned off. I ran the first day of my simulation in debug mode; it does not report anything and CMAQ finishes successfully, but there are NaNs in the CTM_LOG files at 7:00.
I checked my ICON with m3stat, and all variables are in the range of E±10, except SULF, which is set to 1.0E-30.
I am using CMAQv5.3 and created the ICON/BCON files with CMAQv5.2.1, since the ICON/BCON programs for v5.3 were not available. I tried to open the BCON file with m3stat, but m3stat does not produce output for it. I checked inside the BCON file and it does contain values.