The CGRID file is not properly generated by a single-day CCTM simulation using CMAQv5.0.2. Its size is far too small (32 bytes) compared with the other outputs (ACONC, CONC, etc.), and it cannot be opened to view its contents with either ncdump or VERDI. What might be the cause?
Do you have any WARNING or ERROR messages in your log file that relate to the CGRID file?
This is the error at the end of the run.cctm log file:
*** ERROR ABORT in subroutine WR_CGRID on PE 000
Could not open S_CGRID file
Date and time 0:00:00 Nov 2, 2011 (2011306:000000)
application called MPI_Abort(MPI_COMM_WORLD, 538976288) - process 0
0.078u 0.171s 10:43:03.08 0.0% 0+0k 0+0io 0pf+0w
date
Tue Sep 11 00:18:52 CDT 2018
exit
The CGRID file is referred to as a circular restart file, in that it’s updated at the end of
every output time step with the full range of pollutants/layers. With the
CGRID file only containing a single time step it is a relatively small file.
Do you know how many timesteps were written out by the model before you obtained this error?
To help diagnose what is going on, before you run again, remove the CGRID file, and all output files or create a new output directory.
Restart CMAQ again and then look for an ERROR in both the run.cctm log file and the per processor CTM_LOG files if you are running on multiple processors.
It may be that the model wrote only the header of the CGRID file before failing.
You can see what is in the header of the file using the command:
ncdump -h
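Before re-running, a quick size check can flag outputs that look header-only, like the 32-byte CGRID described above. This is only a rough sketch: the function name find_tiny_outputs and the 1024-byte default cutoff are my own illustrative choices, not anything CMAQ provides.

```shell
# Print "size path" for every regular file in a directory that is smaller
# than a byte threshold. The 1024-byte default is an arbitrary, hypothetical
# cutoff; a header-only file like the one in this thread is 32 bytes.
find_tiny_outputs() {
    dir=$1
    min_bytes=${2:-1024}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc may add
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -lt "$min_bytes" ]; then
            printf '%s %s\n' "$size" "$f"
        fi
    done
}
```

Calling it with your output directory as the first argument lists only the suspicious files; anything it reports is worth inspecting with ncdump -h.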
(1) Regarding the time steps written out by the model, they are 24 or 25, depending on the output file:
CONC: TSTEP=UNLIMITED; // (24 currently)
ACONC: TSTEP=UNLIMITED; // (25 currently)
DRYDEP: TSTEP=UNLIMITED; // (24 currently)
WETDEP: TSTEP=UNLIMITED; // (24 currently)
(2) Looking for errors in CTM log files:
In one of them, I noticed a “failure” message that may be related to the “time step”:
“EBI Euler convergence failure
Reducing EBI time step because of convergence failure for cell (104, 12, 2) Back-up number 1”
(3) Checking the content of the CGRID file using ncdump -h, indeed only the header was created by the model run:
netcdf CCTM_D502a_Linux2_x86_64gcc.CGRID {
}
Could (2) be the cause of the CGRID failure?
Yes, (2) could be the problem, but I would expect an error statement in one of the log files. Can you try the following command and report back whether it finds an error?
grep -i ERROR CTM_LOG*
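If there are many per-processor logs, a per-file count can be easier to read than raw grep output. The sketch below is one possible way to do that; the function name scan_logs is hypothetical, and the search strings are simply the ones that appear in this thread.

```shell
# Print "count file" for each log file, counting lines that match a
# pattern case-insensitively. The first argument is the pattern; the
# remaining arguments are the log files to scan.
scan_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # grep -c prints 0 but returns non-zero when nothing matches
        n=$(grep -c -i -e "$pattern" "$f" || true)
        printf '%s %s\n' "$n" "$f"
    done
}

# Example calls (run from the directory that holds the logs):
#   scan_logs 'error' CTM_LOG*
#   scan_logs 'convergence failure' CTM_LOG*
```

A log with a non-zero count for either pattern is the one to open first.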
You can also try to reduce the max synchronization time step (CTM_MAXSYNC in the CCTM run script), which is a conventional way to configure finer grid nests.
setenv CTM_MAXSYNC 300
Do you have the CTM_CKSUM environment variable set to yes?
setenv CTM_CKSUM Y
This should output the total (checksum) of all gas, aerosol, and nonreactive species after each science process. Sometimes you can see a species class blow up or become NaN, which can help pinpoint where and when the problem is.
There were no ERROR messages in the CTM.log files. The only ERROR was the one in the run.cctm.log file that I reported in the first place. Meanwhile and before your last suggestions, I ran another simulation with slight changes in the run.cctm script, such as:
setenv CONC_SPCS "O3 NO ANO3I ANO3J NO2 FORM ISOP ANH4J ASO4I ASO4J" # Before, the number of species was higher
setenv ACONC_BLEV_ELEV "1 20" # Before, this was "1 22". I think the ERROR reported in the first place was related to this, because in my past attempts "1 20" was the maximum limit at which the model did not fail.
The setenv CTM_CKSUM was set to Y in both previous and current model runs.
The setenv CTM_MAXSYNC was set to 720 in both previous and current model runs.
In the current simulation, no ERROR message was present in either the run.cctm.log or the CTM.log files. The CGRID file was generated more completely and is much larger than in the previous run (2,978,180,132 bytes versus 32), but it contains only 1 time step. Also, the warning/failure in (2) above was still present in one of the CTM.log files, but the model run did not stop because of it, and all the CTM.log files show this message at the end:
-->> Normal Completion of program DRIVER on PE 015 <<--
Date and time 0:00:00 Nov 2, 2011 (2011306:000000)
>>----> Program completed successfully <----<<
Question: Why is the CGRID written for a single time step? Is it the output at the end of the last hour of the day? I need the CGRID file to start a 24-h Lagrangian (PinG) simulation (CCTM-APT) and I do not know how this would work if the CGRID file has only 1 time step in it.
The CGRID is updated at the end of every output time step, over-writing the previous timestep. It contains a full range of pollutants and layers. It is what you need to start (provide the initial condition) for a new model simulation and will work fine for CMAQ-APT.
Thank you for clarifying the CGRID. In this case, should the CCTM run that produces the CGRID file be for the day before the intended 24-h Lagrangian simulation, or for the same day?
The timestamp information is available in the CGRID file that you created.
Use this command to print out the 16 lines after (-A) the line that matches ‘global attributes’.
ncdump -h CCTM_D502a_Linux2_x86_64gcc.CGRID | grep -A 16 'global attributes'
Review the SDATE and STIME values.
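If you would rather pull out just those two values, the header text can be filtered with awk. This is a minimal sketch that assumes the attributes appear in the header as lines of the form ":SDATE = 2011306 ;" (the layout shown later in this thread); the function name get_start_stamp is hypothetical.

```shell
# Read `ncdump -h` output on standard input and print "SDATE STIME".
# Assumes each global attribute is on its own line, e.g. ":SDATE = 2011306 ;"
get_start_stamp() {
    awk '
        $1 == ":SDATE" { sdate = $3 }
        $1 == ":STIME" { stime = $3 }
        END { print sdate, stime }
    '
}

# Example call:
#   ncdump -h CCTM_D502a_Linux2_x86_64gcc.CGRID | get_start_stamp
```

The first field printed is the start date in YYYYDDD form and the second is the start time in HHMMSS form without leading zeros.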
It is my understanding that the start date listed in the CGRID file will be the last hour of the model run that you did, which will be the first hour of the next day.
If this is correct, then CMAQ should be run for the day before the intended simulation.
For example: if the CGRID file contains the values
:SDATE = 2011182 ; :STIME = 0 ;
Then the run script for your 24-h Lagrangian simulation should be modified to set the following values for START_DATE:
#> Set Start and End Days for looping
setenv NEW_START TRUE #> Set to FALSE for model restart
set START_DATE = "2011-07-01" #> beginning date (July 1, 2011)
set END_DATE = "2011-07-01" #> ending date (July 1, 2011)
Note: you can use the following command to determine the Gregorian calendar date that matches the Julian date in the CGRID file.
gregdate 2011182
Output:
Friday, July 1, 2011
Daylight Savings Time in effect.
That is also my understanding.
In my CGRID:
:SDATE = 2011306 ;
:STIME = 0 ;
However, my Lagrangian simulation has a start date of 2011305. This means that I should run CCTM for 2011304 in order to get the CGRID output at the end of 2011304, which becomes the initial condition for the 2011305 Lagrangian simulation. So, the SDATE in CGRID should be 2011305.
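The "one day earlier" arithmetic can be checked on the command line. The sketch below assumes GNU date (the -d option is not available in every date implementation), and the function name prev_julian_day is hypothetical.

```shell
# Given a date in YYYYDDD form, print the previous day in the same form.
# Assumes GNU date, which accepts relative items such as "+303 days".
prev_julian_day() {
    jul=$1
    year=${jul%???}               # leading four digits
    ddd=${jul#????}               # trailing three digits
    ddd=${ddd#0}; ddd=${ddd#0}    # drop leading zeros so the shell does not read octal
    offset=$((ddd - 2))           # days from January 1 to the day before
    TZ=UTC date -d "${year}-01-01 $(printf '%+d' "$offset") days" +%Y%j
}
```

For a Lagrangian start date of 2011305, prev_julian_day 2011305 prints 2011304, the day CCTM needs to be run for. Year boundaries are handled too: 2011001 gives 2010365.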
I tried using the “gregdate” command on my tcsh but I get the “Command not found” message. Is there an alternative command for “gregdate” on tcsh?
gregdate is an m3tools executable that is typically built and installed with the I/O API library.
You will likely need to add this directory to your path in your .cshrc once you find its location on your machine.
For example, I have the following command in my .cshrc
set path = ($path /nas/longleaf/home/lizadams/ioapi-3.2/Linux2_x86_64gfortmpi )
As this is the location of the gregdate command
ls /nas/longleaf/home/lizadams/ioapi-3.2/Linux2_x86_64gfortmpi/gregdate
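If gregdate cannot be found at all, the same conversion can be approximated with GNU date, which is an external command and so works from tcsh as well. This is a sketch under that assumption (GNU date with the -d option); the function names jul2greg and greg2jul are hypothetical, and the function syntax itself needs sh or bash rather than csh.

```shell
# Convert a YYYYDDD date to YYYY-MM-DD. Assumes GNU date.
jul2greg() {
    jul=$1
    year=${jul%???}               # leading four digits
    ddd=${jul#????}               # trailing three digits
    ddd=${ddd#0}; ddd=${ddd#0}    # drop leading zeros so the shell does not read octal
    TZ=UTC date -d "${year}-01-01 +$((ddd - 1)) days" +%Y-%m-%d
}

# Convert a YYYY-MM-DD date to YYYYDDD. Assumes GNU date.
greg2jul() {
    TZ=UTC date -d "$1" +%Y%j
}
```

With the dates from this thread, jul2greg 2011182 prints 2011-07-01 and greg2jul 2011-11-01 prints 2011305.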
I could make it work by running the gregdate executable from the ioapi directory where it was installed. Thank you so much for your very helpful guidance!
Hi, all;
I am also experiencing this issue with writing CGRID outputs. It has crashed my simulation many times. Sometimes it shows up after the model has been running for a while; this time, it showed up at the very beginning. However, sometimes, with the same settings, I have no problem at all. I am very confused by this issue. There is no error message in the log files except the one shown below.
I tried changing the settings according to the suggestions posted previously, but that does not seem to work. Could you point me to anything I can do to avoid this issue with CGRID?
Thank you very much!
Yijia
Yijia,
The situation you are describing is very different from the subject of this thread. If you post a follow-up question, please start a new thread.
Try recompiling the model in debug mode. This can be done by uncommenting
set Debug_CCTM
in bldit_cctm.csh.
When you do so, then if a crash occurs you usually get more information as to what the problem is and what line in the code the model is executing at the time.
“Illegal instruction” means that the executable was compiled for a more advanced CPU model than the one you are running on. Recall that each new generation of Intel or AMD processors adds new instructions that can make programs run faster or better, provided the programs are compiled to use these instructions; the downside is that earlier-generation processors will fail in exactly the way you are seeing.
You need to recompile your entire model (including libraries) for the processor you are running on. (Search for -xHost in the output of man ifort, or -march in the output of man gcc.)
Otherwise, it means that you have disk or memory corruption that is turning good instructions into bad ones (quite unlikely, but conceivable).
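To see which instruction-set extensions the machine you are running on actually supports, you can look at the flags line of /proc/cpuinfo. The sketch below is Linux-only and assumes that file's usual "flags : ..." layout; the function name cpu_has_flag is hypothetical, and the optional second argument exists only so the function can be pointed at a saved copy of the file.

```shell
# Return success if the CPU flags list contains the given flag.
# Linux-only sketch: reads /proc/cpuinfo unless another file is given.
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Take the first "flags" line, put one token per line, match exactly
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -q -x -- "$flag"
}

# Example call on the compute node where the model crashes:
#   cpu_has_flag avx2 && echo "avx2 supported" || echo "avx2 not supported"
```

If the build machine reports a flag that the compute node lacks, an executable built with -xHost or -march=native on the build machine is a likely source of the crash.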