Hi Dr. David~
Thank you for your reply~
Q1: The timing (the three numbers) you provided is a little odd, i.e. the last two numbers are multiples of the very first one. Could you please show me how you captured the timing in your experiment?
A1: I inserted write(*,*) "skip/calc", vname, date, time, lvl at line 6644 and at line 6693 respectively, recompiled, and reran the job. The times are basically an even four-way split of a 5-minute interval; in HHMMSS form they are 000000, 000115, 000230, 000345, 000500.
In my experiment a timestep is 5 min. I am trying to figure out what happens inside the interpolation routine, so I inserted those write statements into it.
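For clarity, the four equal sub-intervals of the 5-minute timestep can be reproduced with a short sketch (Python for illustration only; the HHMMSS values are the ones printed by the write statements above):

```python
def to_hhmmss(seconds: int) -> str:
    """Format a second count as the HHMMSS string seen in the log output."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}{m:02d}{s:02d}"

step = 300  # 5-minute timestep, in seconds
# Splitting the step into four equal parts gives five boundary times:
times = [to_hhmmss(step * i // 4) for i in range(5)]
print(times)  # ['000000', '000115', '000230', '000345', '000500']
```

This is why every later time is a multiple of 000115: each quarter of the 300-second step is exactly 75 seconds.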
Q2: Could you please elaborate on the terms “avoid unnecessary branch” and “calc branch”?
A2: The “avoid unnecessary” branch is, I think, line 6644, which only does count = count + 1; the “calc” branch is lines 6647-6696, which computes the ratios and interpolates into cio_bndy_data.
And finally, I agree with your description of the routine r_interpolate_var_2db. One point you did not mention: the next big if-then block determines whether date and time have changed; when they have not changed, interpolation is skipped, otherwise interpolation is done for the entire slab, i.e. all levels. When the time has changed, the timestamp is updated to the new date/time in
cio_bndy_data_tstamp(1, 2, var_loc) = date
cio_bndy_data_tstamp(2, 2, var_loc) = time
This lets calls with a different lvl and a different timestep fall into the calc branch, so duplicate calculations happen, of which I am 100% sure.
The current protection against duplicate calculation relies on the tstamp comparison, which works fine as long as every level of a vname has the same time. I am not denying the redundancy-avoidance mechanism of the current code, but there are exceptions: in my experiment, levels 24-31 still go into the calc branch, where level 29's time is 345 while level 30's time is 230, which also confuses me sometimes. Anyway, my testcase is 2018_12US1.
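To make the duplication concrete, here is a minimal Python model (hypothetical names, not the CMAQ code) of a per-variable timestamp guarding an interpolation that always covers the whole slab, even when the caller asks for a single lvl:

```python
# Minimal model: one timestamp per variable guards interpolation that is
# always performed for ALL levels, even when only one lvl is requested.
nlays = 4
tstamp = None          # per-variable timestamp, like cio_bndy_data_tstamp
interp_count = 0       # counts how many level-slices get interpolated

def interpolate_var(time, lvl=None):
    global tstamp, interp_count
    if tstamp == time:
        return                      # skip branch: timestamp unchanged
    tstamp = time                   # calc branch: stamp updated per variable
    interp_count += nlays           # ...but the whole slab is interpolated

# One timestep, called once per level: the tstamp check works.
for lvl in range(1, nlays + 1):
    interpolate_var(time=115, lvl=lvl)
print(interp_count)  # 4: only the first call interpolates

# But if levels arrive with different times, every call redoes the slab:
tstamp, interp_count = None, 0
for lvl, time in [(1, 115), (2, 230), (3, 115), (4, 230)]:
    interpolate_var(time, lvl)
print(interp_count)  # 16: 4 calls x nlays levels of duplicate work
```

This is only a sketch of the pattern I believe I am seeing in levels 24-31, where different levels carry different times within one variable.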
And let me describe my optimization again, in case I did not describe it clearly enough.
The core optimization follows the Razor Principle. As you described in your last post,
* the next if block determining the starting and ending level depends on the presence of lvl
* assign the interpolated data to the appropriate level
If we dive into the code, we find that when lvl is present, the interpolated data used to produce the var data is just a single slice; the code is below:
if (present(lvl)) then
beg_k = lvl
end_k = lvl
else
beg_k = 1
end_k = nlays
end if
data = 0.0
store_beg_ind = cio_bndy_data_inx(1,2,var_loc)
DO k = beg_k, end_k
starting_pt = store_beg_ind + (k - 1) * size_b2d - 1
Let's look at the variable starting_pt: it is basically the first address of one level's slice within the cube. So the calculation in
cio_bndy_data(store_beg_ind:store_end_ind) = cio_bndy_data(head_beg_ind:head_end_ind) * ratio1
& + cio_bndy_data(tail_beg_ind:tail_end_ind) * ratio2
is far more calculation than the following code needs. I can feel that the code was written by two people, or by one person at different times, especially the present(lvl) part.
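To illustrate the gap between what is computed and what is needed, here is a sketch of the index arithmetic (Python, with made-up sizes; only the starting_pt formula comes from the source):

```python
# Each level k occupies a contiguous slice of size_b2d cells, and
# starting_pt is the first address of that level's slice inside the
# 1-D cio_bndy_data buffer.
size_b2d = 10          # boundary cells per level (made up for illustration)
nlays = 32
store_beg_ind = 1      # 1-based, as in the Fortran code

def slice_bounds(k):
    """First and last 1-based index of level k's slice in the buffer."""
    starting_pt = store_beg_ind + (k - 1) * size_b2d - 1
    return starting_pt + 1, starting_pt + size_b2d

# Interpolating only the requested level touches size_b2d cells...
lo, hi = slice_bounds(5)
print(lo, hi, hi - lo + 1)        # 41 50 10

# ...while interpolating the whole slab touches nlays * size_b2d cells,
# which is the extra work the present(lvl) case could avoid.
print(nlays * size_b2d)           # 320
```

Under these assumed sizes, the present(lvl) path reads back 10 cells but the whole-slab interpolation writes 320, which is the mismatch I am pointing at.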
By the way, I have tried to solve the tstamp and duplicate-calculation issues described above, which accelerates hadvppm from 3.2 s to 0.8 s per step, and the results are “the same” over a 2-hour simulation. My way to compare results is to compare the CCTM_BUDGET_v54_cb6r5_ae7_aq_WR413_MYR_clang4.0.0_2018_12US1_2x96_classic_20171222.txt in the output dir, and I am wondering whether this is the right method.
Another piece of info: in a perf profile of CMAQ 5.4 on my machine, interpolate_var_2db weighs 15%, and most of the hotspot lies in
cio_bndy_data(store_beg_ind:store_end_ind) = cio_bndy_data(head_beg_ind:head_end_ind) * ratio1
& + cio_bndy_data(tail_beg_ind:tail_end_ind) * ratio2
That is why I am trying to optimize it, and I think figuring out the right optimization can benefit CMAQ's speed performance a lot~
Cheers, rzbbc.