Hello there,
I am trying to rebuild CMAQ with OpenMP support. I am starting with the AERO module, which takes up a large share of the runtime in my perf results, and I have run into some obstacles.
The first obstacle: when I run an OpenMP test code inside aeroproc, the number of OpenMP threads is always 1, even though I compile with -fopenmp and pass -x OMP_NUM_THREADS=2 in my mpirun script.
When I build the same code as a standalone program, the OpenMP threads work fine, but when I call it as a subroutine in aero_driver.F, it does not work. (I have already tried preprocessing aero_driver.F and adding #undef parallel so that !$omp parallel is not turned into !$omp 1.)
Question 1: why does this happen?
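The kind of thread-count check described above can be sketched as follows. This is only an illustration, not the actual test code from aero_driver.F: the names omp_check_mod and report_omp_threads are hypothetical, and the sketch is written in free-form Fortran, whereas a .F file would need fixed-form layout.

```fortran
! Minimal sketch with hypothetical names; not the code from aero_driver.F.
module omp_check_mod
  !$ use omp_lib            ! only compiled when OpenMP is enabled
  implicit none
contains
  subroutine report_omp_threads(tag)
    character(len=*), intent(in) :: tag
    integer :: max_threads, team_size
    logical :: openmp_on

    ! Serial defaults, used when the file is built without OpenMP.
    openmp_on   = .false.
    max_threads = 1
    team_size   = 1

    ! Lines starting with "!$ " are plain comments unless OpenMP is active.
    !$ openmp_on   = .true.
    !$ max_threads = omp_get_max_threads()

    ! Record the team size actually granted inside a parallel region.
    !$omp parallel default(none) shared(team_size)
    !$omp single
    !$ team_size = omp_get_num_threads()
    !$omp end single
    !$omp end parallel

    print '(a,a,l2,a,i0,a,i0)', tag, ': OpenMP compiled =', openmp_on, &
          ', max threads = ', max_threads, ', team size = ', team_size
  end subroutine report_omp_threads
end module omp_check_mod

program main
  use omp_check_mod
  implicit none
  call report_omp_threads('standalone')
end program main
```

Calling such a routine from both the standalone program and the subroutine in aero_driver.F would show, for each build, whether the file was actually compiled with OpenMP active and how many threads the runtime grants.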
I found a loop over nrow, nlay, and ncol in aero_driver.F. In my experience, the nlay loop can be parallelized with OpenMP, so that is where I started.
The second obstacle: the aero module has some global variables that serve as context set up before the parallel region. Most of these variables are assigned in the extract_xxx subroutines and are not modified inside the parallel region, and all of the extract_xxx calls happen before the parallel region. I tried using threadprivate plus copyin to copy these global variables to every thread, but I got core dumps.
Question 2: is this approach good enough for the OpenMP port, or is there an easier way to do it?
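For clarity, the threadprivate plus copyin pattern in question can be sketched like this. It is a simplified illustration under stated assumptions, not the real AERO code: aero_state_demo, extract_demo, and scale_factor are hypothetical stand-ins for the module globals and the extract_xxx routines, and the loop body is a placeholder.

```fortran
! Minimal sketch with hypothetical names; not the real AERO module.
module aero_state_demo
  implicit none
  ! Stand-in for a module-level "global" filled in before the parallel region.
  real :: scale_factor = 0.0
  !$omp threadprivate(scale_factor)
contains
  ! Stand-in for an extract_xxx routine: runs on the initial thread only.
  subroutine extract_demo()
    scale_factor = 2.0
  end subroutine extract_demo
end module aero_state_demo

program copyin_demo
  use aero_state_demo
  implicit none
  integer, parameter :: nlay = 8
  integer :: l
  real :: out(nlay)

  call extract_demo()

  ! copyin broadcasts the initial thread's value of the threadprivate
  ! variable to every thread's private copy on entry to the region.
  !$omp parallel do default(none) shared(out) private(l) copyin(scale_factor)
  do l = 1, nlay
    out(l) = scale_factor * real(l)
  end do
  !$omp end parallel do

  print '(8f6.1)', out
end program copyin_demo
```

Here every global is a simple scalar. The real module variables may include arrays or allocatable data, which this sketch does not cover.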
Thank you for your attention. Any suggestions would be appreciated.