CAMx OpenMPI/MPICH memory allocation problem

Hi,

I have been trying to run the CAMx benchmark case with an MPI build.
I cannot get the model to run, despite trying many different configurations.
I was wondering if you could help with this.

I have tried building the model using:
OpenMPI versions 4.1.6 and 4.1.5, with and without ib and ucx
MPICH versions 3.0.4 and 3.4a2
icc and gcc

Here is the error from my latest attempt, which used gcc 4.8.5, ifort 2021.2.0, curl 7.76.1, zlib 1.2.11, hdf5 1.12.0, netcdf-c 4.7.0, and netcdf-f 4.4.5.

I would highly appreciate it if anyone could help or share their experience.

I have attached the error log file.
slurm-41581179.txt (879.7 KB)

It has been a while since I last compiled CAMx, but I would suggest building all of the libraries in the same environment rather than relying on module load for some of them. Try rebuilding the libraries listed in your post against the same set of dependencies, then run the model again. (At least, this worked for me when I was doing CAMx modeling a few years ago.)
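One way to check whether the libraries really share a build environment is to ask the netCDF helper scripts what they were built with. The following is a minimal sketch, assuming that nc-config and nf-config from your own netCDF installs are on PATH; the function names (compiler_name, run_query, report) are made up for illustration and are not part of CAMx or netCDF.

```python
import os
import shlex
import shutil
import subprocess

# Queries to run. The --version, --cc and --fc options are provided by the
# nc-config (netcdf-c) and nf-config (netcdf-fortran) helper scripts.
QUERIES = [
    ("netcdf-c version", ["nc-config", "--version"]),
    ("netcdf-c C compiler", ["nc-config", "--cc"]),
    ("netcdf-fortran version", ["nf-config", "--version"]),
    ("netcdf-fortran Fortran compiler", ["nf-config", "--fc"]),
]


def compiler_name(command_line):
    """Return the bare executable name from a compiler command line."""
    tokens = shlex.split(command_line)
    return os.path.basename(tokens[0]) if tokens else ""


def run_query(argv):
    """Run one query; return None if the tool is missing or exits non-zero."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def report():
    """Print what each netCDF install says about its version and compiler."""
    for label, argv in QUERIES:
        answer = run_query(argv)
        if answer is None:
            print(f"{label}: not found on PATH")
        elif "compiler" in label:
            # Show the short name too, so two installs are easy to compare.
            print(f"{label}: {answer} ({compiler_name(answer)})")
        else:
            print(f"{label}: {answer}")


if __name__ == "__main__":
    report()
```

If the compiler reported here differs from the one used to build CAMx or the MPI library, that is worth ruling out before looking further. A mismatch does not prove it is the cause of the crash.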

Ryan,

Thanks a lot for getting back to me. I have tried to do what you suggested, to the best of my ability. The only module I am loading is the HPC’s Intel compilers, and I have installed everything else myself. Still no luck. The compilation completes, but the memory allocation error is entirely new to me. There is also very little information in the log.
Again, thanks for sharing.

Best,
Kiarash

This section of your log file indicates that you are using CAMx v7.20:

Starting CAMx v7.20, 04-30-22
*** Error in `../../camxv7.2_mpich_nonc4/built/CAMx.v7.20.MPICH3.NCF4.ifort': double free or corruption (fasttop): 0x0000000041cbea90 ***

This led me to the following page on the CAMx site, which recommends building with an earlier version of netCDF than the one you are using.

https://www.camx.com/download/netcdf/
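For anyone searching a large Slurm log for lines like the one quoted above, here is a minimal sketch that extracts glibc abort banners of that form. The regular expression is based only on the single line shown in this thread, so treat the pattern as an assumption; find_glibc_errors is a hypothetical helper name.

```python
import re
import sys

# Matches banners shaped like the line quoted in this thread:
#   *** Error in `<binary>': <message>: 0x<address> ***
GLIBC_ERROR = re.compile(
    r"\*\*\* Error in `(?P<binary>[^']+)': "
    r"(?P<message>.+?): (?P<address>0x[0-9a-fA-F]+) \*\*\*"
)


def find_glibc_errors(lines):
    """Return (line_number, binary, message, address) for each banner found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = GLIBC_ERROR.search(line)
        if match:
            hits.append(
                (number, match["binary"], match["message"], match["address"])
            )
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_errors.py slurm-41581179.txt
    with open(sys.argv[1], errors="replace") as log:
        for number, binary, message, address in find_glibc_errors(log):
            print(f"line {number}: {message} at {address} in {binary}")
```

Because each MPI rank may print its own banner, the line numbers help show whether the abort happens once or on every rank.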

Hi Liz,

Thanks for the suggestion. It seems that issue has been fixed from CAMx v7.10 onwards, so I am not sure the problem comes from netCDF. All the errors point to the MPI part.

Thanks,