ACONC differs between CMAQ-ISAM and CMAQ-DDM runs with identical inputs

Hi,
I used identical inputs (meteorological data, emissions, and ICBC) to run both CMAQ-ISAM and CMAQ-DDM simulations. However, I observed discrepancies in ozone ACONC (as illustrated in the attached figure) between the two CMAQ runs, contrary to my expectations. I’m uncertain about the reasons behind these differences. Could you please share any insights or ideas you may have on this? Your input would be greatly appreciated.

Thanks.

Feng

@Feng,

Thanks for reporting this issue. Can you provide more information about the version of CMAQ you used for the simulations? This will help us re-create the issue you are seeing!

Thanks!

@fsidi ,
Thanks for your response. I used CMAQ 5.4.0.3 for those simulations.
Thanks.

@FengLiu,

Thanks! I’ll give it a try on our system to see what turns up. If possible, I suggest running without the instrumented tools (e.g., ISAM, DDM3D) turned on to see what your ACONC results look like. To summarize, you have three cases:

A – CMAQv5.4.0.3
B – CMAQ-ISAMv5.4.0.3
C – CMAQ-DDM3Dv5.4.0.3

You’ve noted that the ACONC from B and C don’t match. However, does A = B or A = C?
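A pairwise check of the three cases could be sketched roughly as follows. This is only an illustration: the helper names `max_abs_diff` and `compare_cases` are made up for this example, the O3 values are hypothetical placeholders rather than model output, and reading the actual ACONC files (for example with a netCDF library) is left out.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("fields must have the same number of values")
    return max(abs(x - y) for x, y in zip(a, b))


def compare_cases(cases):
    """Return {(name1, name2): max_abs_diff} for every pair of cases."""
    names = sorted(cases)
    return {
        (p, q): max_abs_diff(cases[p], cases[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical flattened O3 values in ppm, NOT real ACONC output.
cases = {
    "A": [0.040, 0.052, 0.047],  # base CMAQ
    "B": [0.040, 0.052, 0.047],  # CMAQ-ISAM
    "C": [0.040, 0.053, 0.047],  # CMAQ-DDM3D
}

for pair, diff in compare_cases(cases).items():
    status = "identical" if diff == 0.0 else f"max |diff| = {diff:.3e} ppm"
    print(pair, status)
```

Testing for exact equality (rather than a tolerance) matches the question being asked here, namely whether the executables produce bit-identical concentration fields.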

Thanks!

@fsidi ,

Thank you for the suggestion. I will run CMAQ without those tools and let you know once I have the results.

Thanks!

@fsidi,

It appears that the ACONCs obtained from model runs A, B, and C are all different, with A and B being closer to each other. I have attached figures comparing the ACONCs based on A-B and A-C. Now, my concern is which ACONC provides accurate modeled concentrations of species such as O3, NO, ISOP, and others.


Thanks,

@FengLiu,

I’m still working on doing the three runs myself. The ACONC in case A (CMAQv5.4.0.3 – without ISAM and DDM) should be the most reliable source of O3, NO, ISOP and others. Do you know if you are running on heterogeneous processors?

Following up on @fsidi’s last question, were the compiler version, compiler flags, number of processors, and domain decomposition identical between all your simulations?

It might also help us if you could post the bldit scripts, run scripts, CMAQ_Control_DESID_* namelist files, and master log files for each of your three simulations.

In my own application with CMAQv5.4 compiled with intel21.4 and using identical number and type of processors, on our system I have found identical results for the ACONC files using two executables compiled with and without ISAM.

@fsidi @hogrefe.christian

Hi Sidi and Christian,

I built CMAQ with BLD_CCTM_v54_pgi, BLD_CCTM_v54_ISAM_pgi, and BLD_CCTM_v54_DDM3D_pgi at the same time using the identical compiler (see some info below) and compiler flags, and ran those three cases with the same number of processors and the same domain decomposition.

mpif90: nvfortran 22.1-0 64-bit target on x86-64 Linux
pgcc: pgcc (aka nvc) 22.1-0 64-bit target on x86-64 Linux

Model was run on the same node with the same type of processors:
@ NPCOL = 12; @ NPROW = 8
@ NPROCS = $NPCOL * $NPROW
setenv NPCOL_NPROW "$NPCOL $NPROW";
./time -p mpirun -oversubscribe --mca mpi_cuda_support 0 -np $NPROCS $BLD/$EXEC

I would like to wait for Sidi’s test results before I post those files you mentioned.

Thanks.

Hi Feng,

thanks for this information.

Does the spatial pattern of the differences which you showed for a specific hour vary in time? Have you looked at the spatial pattern of the differences over longer time periods? As far as you can tell, do the locations of the largest differences coincide with specific emissions and/or geographic features?

@hogrefe.christian

Hi Christian.
The figure attached to my post shows a screenshot taken at a specific hour. I’ve analyzed the spatial patterns of differences across different days and months. It appears that the largest differences are associated with certain emission hotspots. However, I’m unsure if these differences correspond to specific emission sources or geographic features.

Once more, the results from CMAQ-ISAM (B) are closer to those of the CMAQ run without the instrumented tools (A) than the results from CMAQ-HDDM (C) are.

Thanks.

Hello Feng,

we have performed some tests on our system and can confirm that using the Nvidia Fortran compiler version 22.11 with the -O3 optimization flag turned on to compile CMAQv5.4+ base, CMAQv5.4+ ISAM, and CMAQv5.4+ DDM3D will yield some differences in the ACONC files between the three executables. The test we performed was a single day simulation using the 12NE3 benchmark case run script included in the github repository.

We then tested switching the -O3 optimization flag to -O0 and found that the differences between the three sets of runs disappeared, though there was of course an increase in runtime.

Consistent with our previous experience, we did not see any differences between the ACONC files for the three sets of runs when using the intel 21.4 compiler in optimized mode with the -O3 flag.

When using the gfortran9.5 compiler with the -O3 flag, we did not see any differences in the ACONC files between BASE and ISAM, but we saw some differences between BASE and DDM3D. These differences disappeared when switching to the -O0 flag.

If the differences in the ACONC files between BASE, ISAM, and DDM3D executables compiled with Nvidia Fortran in optimized mode present a problem for your application, our experience suggests that switching to the Nvidia compiler without optimization, the gfortran compiler without optimization, or the intel compiler with optimization should allow you to obtain identical ACONC files, with the third option likely the fastest. However, our tests were performed only for a single day over a small domain with only two tags, so these findings may not necessarily hold true for your case and your system.

Christian

@FengLiu,

I concur with Christian. I confirmed his tests with the nvidia/pgi compiler (note: he and I are both part of the CMAQ Development Team, so we are testing on the same system). However, I did some additional tests by modifying the nvidia/pgi compiler flags. I found that taking away both the ‘O3’ and ‘tp px’ flags using the nvidia compiler results in no difference between cases A, B & C mentioned above. However, this comes at a cost, which I found to be about 2-3x the runtime with O3, but that is still much, much better than running with no optimization (-O0).

If you choose to go with the GNU compiler, I suggest you try something similar there by just taking away the O3 flag.

Lastly, I am running a small benchmark case for a two-week period to get some actual timing results and to see how big the differences get.

@hogrefe.christian and @fsidi

Christian and Sidi,

Thank you for providing such detailed information based on your own testing. I will proceed to test with your suggested optimizations, such as using the -O0 flag. I’ll be sure to keep you updated on my progress.

By the way, I did use the -O3 flag when building CMAQv5.4.0.3.

Thanks.

Hi Feng,

a few more items from our end, based on tests performed by Fahim:

  • Based on Fahim’s finding that “taking away both the ‘O3’ and ‘tp px’ flags using the nvidia compiler results in no difference between cases A, B & C”, this is probably the first test we would recommend, rather than switching to “-O0” right away. Not specifying any optimization level probably defaults to level 2 or 1 rather than 0 for nvidia, and that will still be significantly faster than dropping all the way to “-O0”

  • While both Fahim’s tests and mine did confirm differences for the optimized executables with the nvidia compiler (and the gfortran compiler for the DDM3D/BASE pairs), the differences in the ACONC O3 concentrations were much smaller (on the order of 1E-05 ppm) than the differences you showed.
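For anyone scripting the first recommended test, removing the two flags from a compiler flag string could look roughly like the sketch below. The helper `drop_flags` is hypothetical, and the example flag string is purely illustrative; it is not copied from the CMAQ configuration scripts, so check your own build settings for the actual flags in use.

```python
def drop_flags(flag_string, flags_to_drop):
    """Remove each flag (which may span several tokens, e.g. '-tp px') from a flag string."""
    tokens = flag_string.split()
    for flag in flags_to_drop:
        unwanted = flag.split()
        n = len(unwanted)
        kept = []
        i = 0
        while i < len(tokens):
            if tokens[i:i + n] == unwanted:
                i += n  # skip every token belonging to this flag
            else:
                kept.append(tokens[i])
                i += 1
        tokens = kept
    return " ".join(tokens)


# Illustrative flag string only (an assumption, not the real CMAQ settings).
example_flags = "-Mfixed -Mextend -O3 -tp px"
print(drop_flags(example_flags, ["-O3", "-tp px"]))  # prints: -Mfixed -Mextend
```

Matching on whole tokens rather than substrings keeps a multi-token flag such as `-tp px` together, so `px` is removed only when it directly follows `-tp`.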

We look forward to hearing about the results of your tests.

Christian

@hogrefe.christian

Hi Christian,

Thank you very much for the updates. I will kill the job with the “-O0” option and move on to the test with “taking away both the ‘O3’ and ‘tp px’ flags”.

Could you please provide more details why the model runs are different with both the ‘O3’ and ‘tp px’ flags?

Thanks.

Could you please provide more details why the model runs are different with both the ‘O3’ and ‘tp px’ flags?

I can only talk and speculate in very general terms since this is outside my area of expertise. I’d be glad to be corrected by others with a better understanding of compiler behavior.

In my understanding, both of these flags instruct the compiler to attempt to optimize the code to improve performance. This might involve things like changing loop orders, which might then cause some numerical differences relative to computations performed with a non-optimized executable. Since ISAM and DDM are compile-time options, the pieces of code being compiled and possibly optimized differ between a “base model” compilation, an “ISAM model” compilation, and a “DDM model” compilation. Clearly there are compiler-to-compiler differences as to what aspects of the code are being optimized and how this affects compiling the CMAQ code in particular. For example, our tests suggest that the intel compiler’s approach to optimization does not affect the CMAQ concentration fields regardless of whether ISAM or DDM3D code blocks are included during compilation.
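As a generic illustration of how reordering arithmetic can change results (this is not CMAQ code and says nothing about which CMAQ routines are affected), floating-point addition is not associative, so summing the same numbers in a different order can give a slightly different answer:

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two different evaluation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)      # prints: False
print(abs(left_to_right - right_to_left))  # tiny, but not zero

# math.fsum returns a correctly rounded sum, so it does not depend on the order.
print(math.fsum(values) == math.fsum(reversed(values)))  # prints: True
```

An optimizing compiler that is allowed to reorder or regroup operations like these can therefore produce small numerical differences between executables even when the inputs are identical.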

I do not know which portion(s) of the CMAQ code - when compiled with nvidia/pgi with a higher level of optimization and including either the ISAM or DDM3D ifdef code blocks activated by including the -Disam or -Dsens pre-processor flags in the compiler instructions - are optimized by the compiler in such a way as to affect the non-ISAM, non-DDM3D concentration fields.

@hogrefe.christian @fsidi

Hi Christian,

Thank you very much for providing detailed information about the compiler options and functions. I would like to involve my IT team in this for further understanding.

Regarding the ACONC of ozone and other species, the results are consistent across all three cases (A, B, and C as described by Sidi) when both the ‘O3’ and ‘tp px’ flags are removed. However, for my base CMAQ runs (with no instrumented tools such as ISAM or DDM3D involved), I’ve observed a slight difference in the averaged ozone concentration, within -0.007 to 0.007 ppm, between executables built with and without both flags.
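To put the magnitudes quoted in this thread on a scale that is easier to compare, here is a small unit-conversion sketch. The function name `ppm_to_ppb` is just an illustrative helper; the two input values are the ones reported in the posts.

```python
PPB_PER_PPM = 1000.0


def ppm_to_ppb(value_ppm):
    """Convert a mixing ratio from ppm to ppb."""
    return value_ppm * PPB_PER_PPM


flag_effect_ppm = 0.007  # bound of the range reported above (with vs. without both flags)
cross_exec_ppm = 1e-05   # order of magnitude reported earlier for optimized A/B/C differences

print(f"{ppm_to_ppb(flag_effect_ppm):.2f} ppb")  # prints: 7.00 ppb
print(f"{ppm_to_ppb(cross_exec_ppm):.2f} ppb")   # prints: 0.01 ppb
```

Note that the two numbers come from different comparisons (with versus without the flags in one setup, and between executables in another), so they are shown side by side only for scale.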

To ascertain the impact of those flags on CMAQ modeling performance, I intend to examine the differences using the AMET tool.

Thanks.

Hi Feng,

thanks for reporting back that removing both the ‘O3’ and ‘tp px’ flags for the Nvidia / pgi compiler led to consistent results across the A, B, and C cases for your setup as well, just as it did for Fahim’s tests on our system.

We also look forward to hearing about your analysis as to whether the differences introduced by the compiler optimization flags have a meaningful impact on model performance results. In our setup that did not seem to be the case.
