Where is the hotspot in the cloud module?

Hi there~ :blush:
Thank you for reading this post~
I have been trying to optimize cldproc recently, and I have located a hotspot in subroutine RESCLD.
After inserting some instrumentation code and testing, I found a large load imbalance in RESCLD, mostly within subroutine SCAVWDEP. Unfortunately, the code in SCAVWDEP contains a loop with a condition inside it, which makes it hard for me to keep instrumenting it to watch its behavior.
I read through the whole of scavwdep.F; my understanding is that it calculates ALFA in four different parts (GC, AE, NR, TR), and calculates CEND and REMOV for each. During the calculation, subroutine GETALPHA is called to get ALFA0, ALFA2, and ALFA3, and function HLCONST is called to get KH.

Two possible optimizations I found are below:
1. Separate the calculation of ALFA from that of CEND and REMOV; the EXP(-ALFA * TAUCLD) in the CEND calculation could then be vectorized. But this code does not seem to account for a large share of the runtime.
2. The search algorithm in HLCONST is a linear search, which could be optimized with a hash lookup (and which I think may be the core hotspot).
Another phenomenon I found is that the cost of cldproc grows as the number of timesteps grows. Is there any mechanism that would explain this?
Does anyone know where the actual hotspot is? Any suggestion is appreciated.

I found that the linear search in HLCONST is the biggest hotspot. It deserves a dictionary-style optimization: that alone gave almost a 50% speedup in SCAVWDEP.

In its way, this is much akin to what SMOKE is about: thirty years ago, emissions modeling was mostly a bunch of redundantly-repeated ad-hoc linear searches using character-string search keys (the same search being repeated for every time step). The key insight was eliminating the redundancy by turning the ad-hoc problem into a “vector” problem with a fixed, sorted ordering (a “vector” structure), using binary searches instead of linear ones, and wherever possible replacing character-string operations (which are horribly expensive) with integer operations: then it comes down to the use of structured sparse operators (which in the case of SMOKE were sparse-matrix multiplications; for HLCONST, the operators are sparse-exponential). The same algorithmic-complexity ideas still apply.
The result was to replace an overnight supercomputer run by two minutes forty-three seconds on a desktop SPARC-II workstation.
Note that your hash-table idea is a bit of an improvement over the present code, but still not as much improvement as you would get with a more structured vector approach.
[Note also that SMOKE today no longer follows all the improvements of the original]

FWIW – Carlie J. Coats, Jr., Ph.D.
I/O API Author/Maintainer
original SMOKE Author