CRACMM2 benchmark measurements

Hello,
I have a very minor issue and a question about the CRACMM2 benchmark (12NE3). Are there any reference wall times we can compare against? My recollection is that there was a measurement with 32 cores for the CB6R5 mechanism, but I cannot find that information in the updated UserGuide/Tutorial website, nor anything specific to the CRACMM2 benchmark.
The minor issue is that the benchmark runs, but several source files, such as desid_module.F (line 2091) and ELMO_PROC.F (line 2162), generate the warning:

Fortran runtime warning: An array temporary was created

It’s non-fatal and probably caused by the latest compiler version being a bit pickier than previous versions (haven’t had time to dig into the code).
Thanks

@afernandez,

Based on my current tests with CMAQv5.5, here are the wall-clock times on my system for the same 2018_12NE3 benchmark on 32 cores, similar to the setup in the Users Guide:

CB6: 880.10 seconds (~14.7 minutes)
CRACMM2: 1030.48 seconds (~17.2 minutes)

Please note these are very rough numbers.

A more representative comparison is a pair of annual simulations, one with CRACMM2 and one with CB6R5, on a CONUS domain using exactly the same emissions. The total wall times of the annual simulations were:

CB6: 1059871.7893 seconds (~12.3 days)
CRACMM2: 1346010.9675 seconds (~15.6 days)

As for your second question, array temporaries are run-time copies that Fortran creates when passing arrays from one subroutine to another. As you suggested, these warnings are non-fatal, but depending on the size of what is being passed they can carry a severe time penalty.

In both cases you highlighted, the routine being called declares its input argument as an assumed-size array, which, like an explicit-shape array, expects contiguous storage. As such, the array being passed to this routine should be contiguous in memory. If it is not, the compiler generates code that copies it into a temporary array at run time to translate what is being passed into what is required. As you can imagine, if these arrays are large, the translation can come with a severe time penalty. In this case, though, I would say the arrays being passed are small, so the penalty should be small.
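As a rough illustration of the copy-in/copy-out idea described above, here is a minimal Python sketch (an analogy only, not the CMAQ code and not what the Fortran compiler literally emits). The names `scale_in_place` and `call_with_optional_temporary` are hypothetical. A contiguous section is passed through directly, while a strided section is first gathered into a temporary and scattered back afterwards:

```python
from array import array


def scale_in_place(buf, factor):
    """Hypothetical callee that only accepts contiguous storage,
    loosely analogous to a routine with an assumed-size dummy argument."""
    if not buf.contiguous:
        raise ValueError("callee requires contiguous storage")
    for i in range(len(buf)):
        buf[i] = buf[i] * factor


def call_with_optional_temporary(storage, start, stop, stride, factor):
    """Pass storage[start:stop:stride] to scale_in_place.

    Returns True if a temporary copy had to be made.
    """
    section = memoryview(storage)[start:stop:stride]
    if section.contiguous:
        # Contiguous section: the callee works on the original memory.
        scale_in_place(section, factor)
        return False
    # Strided section: copy in, call, then copy the results back out.
    tmp = array("d", section.tolist())
    scale_in_place(memoryview(tmp), factor)
    for i, value in enumerate(tmp):
        section[i] = value
    return True


data = array("d", range(12))
print(call_with_optional_temporary(data, 0, 4, 1, 2.0))    # contiguous section
print(call_with_optional_temporary(data, 1, 12, 4, 10.0))  # strided section
```

The extra work in the strided case grows with the number of elements copied, which is why the penalty only matters for large arrays.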

Hope this helps!

Hi Fahim,
Sorry, but I don’t understand your numbers. If the first figures are for 24 hours, the numbers just don’t scale correctly (they’d match up better for 6 hours), even allowing for some time penalty from I/O operations. Here, I’m assuming that even though you mention an annual simulation, CMAQ is still running 24-hour periods and using the CGRID file to start each new day. Maybe this assumption is wrong, so feel free to clarify if that is not the case.
Thanks,
Arturo

@afernandez,

Sorry for not being clear. The two sets of numbers provided are for different domains.

The first set refers to the 2018_12NE3 domain (100x105x35), whereas the second set is for the 2018_12US1 domain (459x299x35). Additionally, the two sets of simulations used different numbers of cores (32 for 12NE3 vs. 128 for 12US1).
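To put the two sets on a common footing, one rough approach is to normalize each wall time to core-seconds per grid cell per simulated day. The sketch below does that with the numbers quoted in this thread. It assumes (these are assumptions, not confirmed here) that the 12NE3 benchmark timing covers one simulated day, that the annual run covers 365 days, and that cost scales linearly with cores and cells, ignoring I/O and communication overhead. The function name is hypothetical.

```python
def core_seconds_per_cell_day(wall_seconds, cores, nx, ny, nz, sim_days):
    """Wall time normalized by core count, grid size and simulated days."""
    cells = nx * ny * nz
    return wall_seconds * cores / (cells * sim_days)


# (wall seconds, cores, nx, ny, nz, simulated days)
# sim_days values are assumptions: 1 day for the benchmark, 365 for annual.
runs = {
    "12NE3 CB6": (880.10, 32, 100, 105, 35, 1),
    "12NE3 CRACMM2": (1030.48, 32, 100, 105, 35, 1),
    "12US1 CB6": (1059871.7893, 128, 459, 299, 35, 365),
    "12US1 CRACMM2": (1346010.9675, 128, 459, 299, 35, 365),
}

normalized = {name: core_seconds_per_cell_day(*args) for name, args in runs.items()}

for name, value in normalized.items():
    print(f"{name}: {value:.4f} core-seconds per cell per simulated day")
```

Under those assumptions the normalized CB6 figures for the two domains come out close to each other, which suggests the two sets of timings are broadly consistent once domain size and core count are taken into account.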

Hopefully this clarifies things!

I’m getting almost twice the performance using Graviton4 cores, but maybe your system has some relatively old processor.