Hi there~
Thank you for reading this post~
I have been trying to optimize cldproc recently, and I located a hotspot in subroutine RESCLD.
After inserting some instrumentation and testing, I realized that there is a big load imbalance in RESCLD, mostly in subroutine scavwdep. Unfortunately, the code in scavwdep contains a loop with a condition in it, which makes it hard for me to keep inserting code to watch its behavior.
I read through the whole of scavwdep.F, and my conclusion is that it calculates ALFA in four different parts (GC, AE, NR, and TR), and calculates CEND and REMOV for each. During the calculation, subroutine GETALPHA is called to get ALFA0, ALFA2, and ALFA3, and function HLCONST is called to get KH.
Two possible optimizations I found are below:
1. Separate the ALFA calculation from the CEND/REMOV calculation; then the EXP(-ALFA * TAUCLD) in the CEND calculation could be vectorized (see the sketch after this list). But that code does not seem to account for a large share of the runtime.
2. The search algorithm in HLCONST is a linear search, which could be optimized into a hash lookup (I think this may be the core hotspot).
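To illustrate idea 1, here is a minimal sketch of the loop split. The names (ALFA, TAUCLD, CEND, REMOV) follow the discussion above, but the subroutine, its arguments, and the ALFA formula are invented placeholders, not the actual scavwdep.F code:

```fortran
! Hypothetical sketch of idea 1, not the actual scavwdep.F code:
! split the branchy ALFA computation from the EXP evaluation so the
! second loop is a straight-line kernel the compiler can vectorize.
subroutine split_scav( n_spc, kh, taucld, cend, remov )
  implicit none
  integer, intent(in)  :: n_spc
  real,    intent(in)  :: kh(n_spc)    ! Henry's law constants
  real,    intent(in)  :: taucld       ! cloud timescale (s)
  real,    intent(out) :: cend(n_spc), remov(n_spc)
  real    :: alfa(n_spc)
  integer :: i

  ! pass 1: the conditional, per-species logic stays in its own loop
  do i = 1, n_spc
    if ( kh(i) > 0.0 ) then
      alfa(i) = 1.0 / kh(i)            ! placeholder formula only
    else
      alfa(i) = 0.0
    end if
  end do

  ! pass 2: branch-free, so EXP(-ALFA*TAUCLD) can be vectorized
  do i = 1, n_spc
    cend(i)  = exp( -alfa(i) * taucld )
    remov(i) = 1.0 - cend(i)
  end do
end subroutine split_scav
```

With the branches confined to the first loop, the second loop becomes a clean candidate for the compiler's vectorized EXP.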
Another phenomenon I found is that the cost of cldproc grows as the timestep count grows. Is there any mechanism that would explain this?
Does anyone know where the actual hotspot is? Any suggestion is appreciated.
In its way, this is much akin to what SMOKE is about: thirty years ago, emissions modeling was mostly a bunch of redundantly-repeated ad-hoc linear searches using character-string search-keys (the same search being repeated for every time step). The key insight was eliminating the redundancy by turning the ad-hoc problem into a “vector” problem with a fixed, sorted ordering (a “vector” structure), using binary searches instead of linear ones, and wherever possible replacing character-string operations (which are horribly expensive) by integer operations: then it comes down to the use of structured sparse operators (which in the case of SMOKE were sparse-matrix multiplications; for HLCONST, the operators are sparse-exponential). The same algorithmic-complexity ideas still apply.
The result was to replace an overnight supercomputer run by two minutes forty-three seconds on a desktop SPARC-II workstation.
Note that your hash-table idea is a bit of an improvement over the present code, but still not as much of an improvement as you would get with a more structured vector approach.
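A minimal sketch of the kind of setup meant here (all names invented for illustration; nothing is taken from SMOKE or HLCONST): character keys are translated to integers once at startup, the integer keys are kept sorted, and the time-step loop then does an O(log N) binary search on integers instead of a repeated linear search on strings.

```fortran
! Hypothetical "structured vector" lookup; all names are invented.
! The string-to-integer translation happens once at setup; after
! that, the time-step loop never touches CHARACTER data.
module vec_lookup
  implicit none
  integer, parameter :: nkey = 5
  integer :: skey(nkey) = (/ 3, 8, 12, 20, 41 /)   ! sorted once at setup
contains
  integer function findkey( k )   ! binary search over sorted integers
    integer, intent(in) :: k
    integer :: lo, hi, mid
    lo = 1
    hi = nkey
    findkey = 0                   ! 0 means "not found"
    do while ( lo <= hi )
      mid = ( lo + hi ) / 2
      if ( skey(mid) == k ) then
        findkey = mid
        return
      else if ( skey(mid) < k ) then
        lo = mid + 1
      else
        hi = mid - 1
      end if
    end do
  end function findkey
end module vec_lookup
```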
[Note also that SMOKE today no longer follows all the improvements of the original]
FWIW – Carlie J. Coats, Jr., Ph.D.
I/O API Author/Maintainer
original SMOKE Author
Thanks for continuing to profile the CMAQ model. I am not a cloud person, but by examining the code, I saw two major subroutines, SCAVWDEP and AQ_MAP, which are called in the RESCLD subroutine. In addition, there is a conditional statement “IF ( QCRGCOL .GT. 0.0 ) THEN” that determines whether they will be called or not, so load imbalance is inevitable. I have done a small test, and this hypothesis has been confirmed.
Your other observation is that within HLCONST there is an expensive operation, i.e. string comparison (Dr. Coats has pointed that out as well). I have devised a new way to handle that, and I have done a quick test confirming the new modification does not alter results. If you don’t mind, please contact me directly (wong.david-c@epa.gov) and I will share the code with you so you can conduct an independent verification.
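A minimal sketch of one generic pattern for this (invented illustration code, not the actual modification being shared): resolve the species name to an integer index once during setup, then do only integer indexing in the time-step loop.

```fortran
! Invented illustration only, not the actual modification: look up the
! species name once during setup, then the hot loop uses only the
! precomputed integer index and never touches CHARACTER data.
program hl_index_demo
  implicit none
  integer, parameter :: nspc = 3
  character(16) :: names(nspc)
  real :: kh_tab(nspc), total
  integer :: idx, i, n

  names(1) = 'SO2'
  names(2) = 'O3'
  names(3) = 'H2O2'
  kh_tab = (/ 1.2, 1.1e-2, 8.3e4 /)   ! made-up values, not real constants

  ! setup phase: one linear string search, done once
  idx = 0
  do i = 1, nspc
    if ( names(i) == 'O3' ) then
      idx = i
      exit
    end if
  end do

  ! "time-step" loop: pure integer indexing, no string comparisons
  total = 0.0
  do n = 1, 100
    if ( idx > 0 ) total = total + kh_tab(idx)
  end do
  print *, 'sum =', total
end program hl_index_demo
```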
Thank you for your reply~
I tried the code Dr. Wong provided me. It turns out that the “structured vector approach” runs faster on my computer than my previous implementation~
Thank you for your reply. I have tested your code, and the test results are in the post above. It turns out that your structured vector approach idea is right.
cheers, rzbbc.
This is a great discussion. @cgnolte replaced the string comparisons in hlconst with integer comparisons about a year ago, and that change will likely be included in the next public CMAQ release.