CMAQ output missing some intermediate hourly time steps

Hi,

I was alarmed to see my CMAQ CONC output files missing a few random time steps. I tried different numbers of processors, but the problem persists. This never happened before, and the model keeps running without crashing. Any clues, @cjcoats?

Thanks

Some variable is not being written; that’s what you’re seeing here. You might try analyzing the file with the M3Tools program m3stat (using the default analysis) to see what is missing.

Quite possibly a model-configuration problem, outside my expertise…
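As a rough cross-check alongside m3stat, here is a minimal Python sketch that flags records which were never written. It assumes CONC is a standard I/O API netCDF file, where the global attributes SDATE, STIME and TSTEP define when each record should fall and the TFLAG variable (TSTEP, VAR, DATE-TIME) stores the date and time actually written for each variable; a record whose TFLAG does not match the expected value was most likely skipped. The helper names are hypothetical, and reading the file assumes the third-party `netCDF4` package is installed.

```python
from datetime import datetime, timedelta


def hhmmss_to_timedelta(hhmmss):
    """Convert an I/O API HHMMSS integer (e.g. 10000 = 1 hour) to a timedelta."""
    hours, rest = divmod(int(hhmmss), 10000)
    minutes, seconds = divmod(rest, 100)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def expected_flags(sdate, stime, tstep, nsteps):
    """Yield the (YYYYDDD, HHMMSS) pair each record should carry."""
    start = datetime.strptime(f"{int(sdate):07d}", "%Y%j") + hhmmss_to_timedelta(stime)
    step = hhmmss_to_timedelta(tstep)
    for n in range(nsteps):
        t = start + n * step
        yield int(t.strftime("%Y%j")), t.hour * 10000 + t.minute * 100 + t.second


def find_missing_steps(tflag, sdate, stime, tstep):
    """Return {record_index: [variable_index, ...]} for records whose TFLAG is wrong.

    tflag is a nested list shaped (TSTEP, VAR, 2). Unwritten entries are assumed
    to hold a fill value (or None once masked), so they never match.
    """
    missing = {}
    for n, expected in enumerate(expected_flags(sdate, stime, tstep, len(tflag))):
        bad = [v for v, flag in enumerate(tflag[n]) if tuple(flag) != expected]
        if bad:
            missing[n] = bad
    return missing


def read_tflag(path):
    """Read TFLAG and the timing attributes (assumes the netCDF4 package)."""
    import netCDF4  # third-party; imported lazily so the checker runs without it

    with netCDF4.Dataset(path) as ds:
        # Masked (fill-value) entries become None in .tolist()
        return ds.variables["TFLAG"][:].tolist(), ds.SDATE, ds.STIME, ds.TSTEP
```

For example, `find_missing_steps(*read_tflag("CONC.nc"))` (hypothetical file name) returns a dict keyed by zero-based record index, so a gap in the 10th record shows up under key 9. Running it on several runs of the same script would show whether the holes really move around from run to run.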

Thank you for the prompt reply, but this behavior varies every time I run the same script. Sometimes it’s missing something like the 6th hour time step; other times it’s missing around the 15th or 16th hour. And nothing in the configuration has changed from the successful runs I made before. I am running m3stat…

m3stat crashed with

The last lines of the REPORT file are:

By the way, could this be caused by issues with the processors, for example mixing slower and faster processors?

Just to clarify: in this case the CONC file has no time step for the 10th hour, but it has all the time steps before and after it. In another run using the same script, I found 2 or 3 data ‘holes’ in CONC, but at different hours (not the 10th hour, but something like the 5th or 7th).

Hi,

If you think this is something to do with a mixture of fast and slow processors, you can try running it with all fast or all slow processors.

Cheers,
David

I tried that, and it sometimes worked and sometimes did not (unless there is some conspiracy by the cluster managers, which I doubt :). This model behavior has baffled me. It is also confusing that the model can advance without checking whether writing the previous time steps succeeded. Can this happen if the disk (I mean the hard disk) is slow to respond?

Thanks

You did not state what version of CMAQ you are running.
Recently we made it possible to add TA and PRES to the output CONC files, to facilitate conversions between ug/m3 and ppb in subsequent postprocessing without requiring the METCRO3D files. However, the first time step of the CONC file is written from the previous run’s CGRID file, which does not contain those meteorological variables, leading to missing data. I thought this error had been corrected. If the missing data occurs only with TA and PRES in the CONC file, then it does not indicate an error in the run itself.
However, if you have time steps with actual chemical species (like NO2) missing from your CONC files, then that indicates something is going wrong. Please provide more details on your run configuration.
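Following the distinction drawn above, a small Python sketch can sort the gaps: empty TA and PRES at the first record is the known CGRID case and harmless, while anything else (for example NO2 at a later hour) points to a real problem. It assumes the gaps are given as `{record_index: [variable_index, ...]}` and that `var_names` lists the variables in the file’s VAR-LIST order; the function and constant names are hypothetical.

```python
# Variables that, per the explanation above, can be empty at the first CONC
# record because that record is written from the previous run's CGRID file.
MET_VARS_AT_START = {"TA", "PRES"}


def classify_gaps(missing, var_names):
    """Split gaps into (benign, suspicious) dicts of {record_index: [names]}.

    missing   -- {record_index: [variable_index, ...]} of unwritten records
    var_names -- variable names in the file's VAR-LIST order (assumed)
    """
    benign, suspicious = {}, {}
    for step, var_idx in sorted(missing.items()):
        names = [var_names[v] for v in var_idx]
        if step == 0 and set(names) <= MET_VARS_AT_START:
            benign[step] = names  # known first-step TA/PRES gap, not a run error
        else:
            suspicious[step] = names  # e.g. NO2 missing: something went wrong
    return benign, suspicious
```

Any non-empty `suspicious` result corresponds to the case where chemical species are missing, which is what needs the run-configuration details.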

The model is CMAQv5.3.1. Yes, NO2 can be missing for a few random time steps, although sometimes no time step is missing at all with the same run script. It is a simple configuration: gridded area emissions only. I had run the model successfully with the same run script several times; the problem only appeared recently.

I am kind of at a loss as to what could be going wrong. I have not seen this error before. It does sound like it could be a problem with communications across your cluster. If you run the model on a single processor, are there any data gaps in the output?

It is not possible to run on a single processor; it would take forever at 1 km resolution. But even with multiple processors, it sometimes works perfectly. Maybe the storage is slow to respond, but there should be a way for the model to check whether each write succeeded and to crash if it did not.
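The kind of check suggested here (verify each write and abort if it failed) can be sketched generically in Python. This is not how CMAQ or I/O API write their files; it only illustrates the pattern: write a record, flush it, `os.fsync` it, read it back, and raise if the bytes differ. The function and exception names are hypothetical. `os.fsync` itself raises `OSError` if the operating system reports an I/O error, but because the read-back is usually served from the OS cache, this catches failed or short writes rather than every possible storage fault.

```python
import os


class WriteVerificationError(IOError):
    """Raised when data read back from disk differs from what was written."""


def write_record_verified(path, offset, payload):
    """Write payload at offset, force it to storage, read it back, raise on mismatch."""
    mode = "r+b" if os.path.exists(path) else "w+b"  # keep earlier records intact
    with open(path, mode) as f:
        f.seek(offset)
        f.write(payload)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data; raises OSError on I/O error
        f.seek(offset)
        readback = f.read(len(payload))
    if readback != payload:
        raise WriteVerificationError(
            f"record at offset {offset} in {path}: read-back data differs from "
            f"the {len(payload)} bytes written")
    return len(payload)
```

Letting the exception propagate, rather than logging and continuing, is what makes the run stop at the first bad write instead of silently leaving a hole in the output.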