CCTM doesn't continue processing


#1

Hello again:
I'm having trouble getting results for a hemispheric domain. I started my simulation on January 1st, 2014 at 00:00 with zero concentrations in the initial and boundary conditions for elemental carbon (AECJ and AECI).
I only have 1 layer in the emission file generated with SMOKE 4.5.
The idea is to run a whole month, so I set 744 hours in the CCTM script.
I ran with 100 CPUs, with 10 assigned to NPCOL and 10 to NPROW.
The problem is that when the simulation reached January 3rd at 20:00, CCTM stopped making progress but kept running (it does not stop), and I couldn't find any error in the log file.
Then I restarted, but for only 55 hours (the time needed to reach the date mentioned above), and CCTM finished successfully. Next, I restarted from that date and ran until February 1st, and CCTM ran without any problem.

Could you please let me know why this could happen?
I continued into February and the same problem occurred on February 7th at 20:00.

Please help me, because in March it has been happening every one or two hours since March 3rd.


#2

Ernesto,
What version of CMAQ are you using?

I am a little confused by your question. If you started at 2014-01-01 at 00:00 and the run progressed to 2014-01-03 at 20:00, that is 68 hours, not 55.
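
The hour count can be checked with a few lines of Python. This is only a sketch using the standard datetime module; the function name is illustrative and not part of CMAQ.

```python
from datetime import datetime


def elapsed_hours(start, end):
    """Return the whole number of hours between two datetimes."""
    return int((end - start).total_seconds() // 3600)


start = datetime(2014, 1, 1, 0, 0)
stop = datetime(2014, 1, 3, 20, 0)
print(elapsed_hours(start, stop))  # 68
```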

We usually script the model to process one day at a time. But it should be the case that you can run the model for a month or much longer. To do so, all hours of your time-varying inputs have to be in individual files also. Have you done so?
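
As a rough illustration of the one-day-at-a-time approach, the sketch below splits a 744-hour run into 24-hour segments. It only computes start times and run lengths; the helper name daily_segments is hypothetical, and it assumes each segment would be started from the restart file written by the previous one.

```python
from datetime import datetime, timedelta


def daily_segments(start, total_hours, chunk_hours=24):
    """Split a run into consecutive (start_time, run_length_hours) segments."""
    segments = []
    current = start
    remaining = total_hours
    while remaining > 0:
        # The last segment may be shorter than chunk_hours.
        length = min(chunk_hours, remaining)
        segments.append((current, length))
        current += timedelta(hours=length)
        remaining -= length
    return segments


segments = daily_segments(datetime(2014, 1, 1), 744)
print(len(segments))  # number of daily runs needed for the month
for seg_start, hours in segments[:3]:
    print(seg_start.strftime("%Y-%m-%d %H:%M"), hours)
```

If a hang occurs, only the affected day has to be rerun rather than the whole month.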

When you say that the model does not stop, does it appear to be hung in an infinite loop? If you run it twice, does it hang in exactly the same place?


#3

Hello Chris:
I'm using CMAQ version 5.2.
Sorry if I did not explain it very well.
Reviewing my post, I made an error: CCTM processed until January 3rd at 06:00, which is 55 hours.
Let me explain again.
I started on January 1st at 00:00. I did not know about the individual emission files, so I put all the emissions (a full year) in one file.
On the first attempt, CCTM did not process beyond those 55 hours (the date mentioned above), but the 120 CPUs kept running without writing anything to the log files, so I couldn't see the cause or any error.

When I saw that, I decided to run only 55 hours, and CCTM finished OK, without any problem, and generated the CCTM_CGRID file.

Next, I restarted from that file, running from January 3rd at 07:00 until January 15th as a test, and it worked perfectly.

Then I continued until February, when I got the same problem on February 7th.
Next, I continued through February until March 2nd, when the problem became much worse: I had to run one hour at a time (CCTM would not process more than 1 hour).

So I did a lot of runs to get through all of March.

Right now I'm running April in segments of 2 or 3 days, because in that month CCTM does not process more than 60 hours.

So right now I don't understand what is happening. I don't know if there is a problem with the cluster or with the numerical integration.

Using the ncrcat -h command, I combine all the generated CCTM_ACONC files to get the entire month.
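
For reference, a minimal sketch of how that concatenation could be scripted. The file names are hypothetical (the real names depend on the run script), and the helper build_ncrcat_command is only illustrative. It sorts the inputs by name because ncrcat concatenates records in the order the files are given, which assumes zero-padded dates in the names.

```python
import shlex


def build_ncrcat_command(input_files, output_file):
    """Build an ncrcat argument list that concatenates files in name order."""
    if not input_files:
        raise ValueError("no input files given")
    # -h keeps ncrcat from appending the command to the history attribute.
    return ["ncrcat", "-h", *sorted(input_files), output_file]


# Hypothetical file names, deliberately out of order.
files = [
    "CCTM_ACONC_20140303.nc",
    "CCTM_ACONC_20140301.nc",
    "CCTM_ACONC_20140302.nc",
]
cmd = build_ncrcat_command(files, "CCTM_ACONC_201403.nc")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where NCO is installed.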

Is my explanation clear now?
Thanks a lot for your time.


#4

CMAQv5.2 was released in June 2017. I encourage you to use the latest version of the code, v5.3b2 (October 2018). If you don’t want to use a beta version, then you should at least use CMAQv5.2.1, which was released in March 2018.

Having said that, I can’t tell where your problem is, or say whether it is something that has been addressed in a more recent version.

Are the places where the code hangs reproducible? If so, that suggests something in the data or algorithm. If not, then it suggests the issue may be with your system hardware or architecture.

What is the spatial scale of your runs (grid cell size)? What is the synchronization time step?


#5

Hello again:
Well, I'm trying to simulate a hemispheric domain (South Pole) with a 108 km grid resolution and 180 cells in both the horizontal and vertical directions.
Where can I find the synchronization time step in the CCTM runscript?
Thanks


#6

This is the configuration I have in CCTM runscript:

#> Sychronization Time Step and Tolerance Options
setenv CTM_MAXSYNC 720 #> max sync time step (sec) [ default: 720 ]
setenv CTM_MINSYNC 60 #> min sync time step (sec) [ default: 60 ]
setenv SIGMA_SYNC_TOP 0.7 #> top sigma level thru which sync step determined [ default: 0.7 ]
setenv ADV_HDIV_LIM 0.95 #> maximum horiz. div. limit for adv step adjust [ default: 0.9 ]
setenv CTM_ADV_CFL 0.95 #> max CFL [ default: 0.75]
setenv RB_ATOL 1.0E-09 #> global ROS3 solver abs tol [ default: 1.0E-07 ]


#7

In your log file, for the output period when the model is hanging, look for something like these lines:

 Top layer thru which sync step determined: 21
 Computed synchronization step (HHMMSS): 000500
 Number of Synchronization steps:   12

CMAQ is supposed to select a synchronization time step that is small enough to ensure stability while being large enough to run efficiently. Sometimes model crashes or hangs can be resolved by setting a shorter maximum time step. For example, this would limit steps to 5 minutes:

setenv CTM_MAXSYNC 300
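
To see which step the model chose on its own, the log can be scanned for the "Computed synchronization step" lines. The sketch below assumes the log text looks like the excerpt above; the function names are illustrative and not part of CMAQ.

```python
import re

# Matches lines such as: "Computed synchronization step (HHMMSS): 000500"
SYNC_RE = re.compile(r"Computed synchronization step \(HHMMSS\):\s*(\d{6})")


def hhmmss_to_seconds(value):
    """Convert a six-digit HHMMSS string to seconds."""
    hours, minutes, seconds = int(value[0:2]), int(value[2:4]), int(value[4:6])
    return hours * 3600 + minutes * 60 + seconds


def sync_steps_seconds(log_text):
    """Return every computed synchronization step found in the log, in seconds."""
    return [hhmmss_to_seconds(m.group(1)) for m in SYNC_RE.finditer(log_text)]


sample = """\
 Top layer thru which sync step determined: 21
 Computed synchronization step (HHMMSS): 000500
 Number of Synchronization steps:   12
"""
steps = sync_steps_seconds(sample)
print(steps)  # [300]
```

If the last step reported before the hang is, for example, 300 seconds, then a CTM_MAXSYNC value below 300 would force the model to take shorter steps than it picked by itself.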

#8

OK, I'll try changing the synchronization time step (increasing it, I guess) and will let you know if the problem persists.
Thanks a lot for your time and help.
Regards


#9

No, if this is the problem, you want to set a value for CTM_MAXSYNC that will result in a shorter time step than what the model is doing on its own.