ACONC files are different from CMAQ-ISAM and CMAQ-DDM with identical inputs

@fsidi @hogrefe.christian
Hi Fahim and Christian,
After removing the ‘-O3’ and ‘-tp px’ flags, my CMAQ, CMAQ-ISAM, and CMAQ-DDM-3D simulations run significantly slower, which raises the computational cost. They are approximately 2 to 3 times more expensive than the same CMAQ suite compiled with the ‘-O3’ and ‘-tp px’ flags. Have you considered compiling CMAQ with the ‘-O2’ optimization level? I’m eager to bring the CMAQ runtime back down.

Thanks,

Hello Feng,

Your experience of seeing a 2-3 fold increase in runtime when removing the PGI/NVIDIA “-O3” and “-tp px” flags matches what Fahim found on our system.

We don’t know which optimization level (1, 2, or something in between) the PGI/NVIDIA compiler defaults to if no -Ox flag is specified. You could try running with -O1 and -O2 to see how this affects run time and whether it leads to consistent results across A, B, and C. We know that -O3 leads to inconsistent results, that removing “-O3” and “-tp px” altogether leads to consistent results but slows things down by a factor of 2-3, and that -O0 also leads to consistent results but is even slower. If you find that either “-O1” or “-O2” leads to consistent results with less slowdown relative to “-O3” than removing “-O3” and “-tp px” altogether, please let us know.

I’ll also repeat our finding that, at least on our system, using the Intel compiler allows us to use the -O3 optimization flag while still maintaining consistent ACONC files across A, B, and C. Furthermore, we find the Intel compiler with the -O3 flag to be significantly faster than the NVIDIA/PGI compiler with the -O3 flag. Switching compilers might therefore be beneficial for your application as well.


@FengLiu,

Just to add onto this thread and to what @hogrefe.christian has said, according to the Nvidia HPC Compilers Reference Guide the default “-O” level if none is prescribed is level 1 (“-O1”).

What makes the Intel compiler appealing is that it has flags that let you use aggressive optimization where it makes sense (non-floating-point calculations) while leaving other places less aggressively optimized. A good reference guide for this Intel-specific option is here: https://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf.

That Intel is aware of this issue and implements such options suggests that NVIDIA probably does something similar. After some consideration and searching, I recommend you try the “-Kieee” flag along with the default “-O3” and “-tp px” flags. For us, doing this removed the differences while still giving reasonable performance. Note: both @hogrefe.christian and I tested the “-Kieee” flag on different test cases and confirmed it worked.

If this turns out to be the case for you too, we’ll add this flag to our defaults at some point.
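As general background on why an optimization level can change results at all: floating-point addition is not associative, so any transformation that changes the order of operations can change the last bits of a sum. The short Python sketch below only illustrates that general mechanism with made-up values; it is not a diagnosis of what the compiler actually does to the CMAQ code, and the helper name `sum_left_to_right` is hypothetical.

```python
def sum_left_to_right(values):
    """Add the values one by one, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative values only; they have nothing to do with CMAQ output.
values = [0.1, 0.2, 0.3]

forward = sum_left_to_right(values)             # (0.1 + 0.2) + 0.3
backward = sum_left_to_right(reversed(values))  # (0.3 + 0.2) + 0.1

# Same numbers, different order, different result in the last bit.
print(forward == backward)      # False
print(abs(forward - backward))  # tiny, but not zero
```

Differences of this size are harmless for a single sum, but they can show up as non-identical output files when two builds evaluate the same expressions in a different order.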

Hope this helps!


Hi Fahim,

Thank you for sharing the update regarding the new approach for optimizing options when using the PGI compiler. I appreciate it! I plan to take the following steps:

  1. Rebuilding CMAQ: I will recompile the Community Multiscale Air Quality (CMAQ) model to incorporate these changes.
  2. Running Test Cases: I will execute test cases using the updated CMAQ build to assess the consistency of the results.
  3. Evaluating Efficiency: Additionally, I’ll analyze whether the efficiency of numerical simulations improves with this new configuration.
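Steps 2 and 3 could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the values for runs A, B, and C are assumed to have already been read from the ACONC files into flat sequences of numbers (the file reading itself is left out), and the numbers in the usage part are made-up placeholders, not measurements.

```python
from itertools import combinations


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def consistency_report(runs):
    """Map every pair of run labels to the max absolute difference between them.

    `runs` is a dict such as {"A": [...], "B": [...], "C": [...]}.
    """
    return {
        (p, q): max_abs_diff(runs[p], runs[q])
        for p, q in combinations(sorted(runs), 2)
    }


def is_consistent(runs, tol=0.0):
    """True if every pair of runs agrees to within `tol` (default: exactly)."""
    return all(d <= tol for d in consistency_report(runs).values())


def slowdown(runtime, baseline_runtime):
    """Runtime of a build relative to a baseline build (e.g. 3.0 = 3x slower)."""
    return runtime / baseline_runtime


# Placeholder values standing in for data read from the three ACONC files.
example_runs = {"A": [1.0, 2.0], "B": [1.0, 2.0], "C": [1.0, 2.5]}
for pair, diff in consistency_report(example_runs).items():
    print(pair, diff)
print("consistent:", is_consistent(example_runs))
```

Using an exact comparison (`tol=0.0`) mirrors the thread's goal of identical ACONC files across A, B, and C; a nonzero tolerance would answer a different, weaker question.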

Your information is valuable, and I’m looking forward to seeing the impact of these optimizations.

Thank you,

Christian, thank you for your suggestion and information.