I want to simulate the period from 2023.05.20 to 2023.06.05 using CMAQ. However, the simulation is taking too long. I submitted the job to the server, running CCTM with 4 nodes and 32 cores. Surprisingly, as I increase the number of cores, the run time per time step gets longer. This seems unusual to me, as I'm new to this. Can anyone provide some advice? Thank you!
For a parallel computing problem, total run time has two parts: compute time and parallel overhead.
- Compute time goes down with the number of cores.
- Parallel overhead grows rapidly with the number of cores.
So for a given hardware/software configuration, there is some optimal number of cores that gives the shortest total run time; beyond that point, adding more cores makes the run slower. The only ways to shift that optimum are either (a) write better software; or (b) get better hardware, meaning a faster node-to-node interconnect.
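To make the trade-off concrete, here is a minimal toy sketch in Python. It is not how CMAQ actually behaves: the model (compute time falling as 1/cores, overhead growing linearly with cores), the function names, and the constants are all assumptions chosen only to illustrate why an optimum exists. Real overhead depends on the domain decomposition and the interconnect, so the actual optimum has to be found by timing short test runs.

```python
# Toy model only: the functional form and all constants are assumptions,
# not measurements from CMAQ or any real cluster.

def total_run_time(cores, compute_work=3600.0, overhead_per_core=1.0):
    """Estimated run time: ideal compute time plus parallel overhead."""
    compute_time = compute_work / cores       # shrinks as cores are added
    overhead = overhead_per_core * cores      # assumed to grow linearly with cores
    return compute_time + overhead

def best_core_count(candidates, **model_params):
    """Return the candidate core count with the smallest estimated run time."""
    return min(candidates, key=lambda p: total_run_time(p, **model_params))

if __name__ == "__main__":
    candidates = [8, 16, 32, 64, 128, 256]
    for p in candidates:
        print(f"{p:4d} cores: {total_run_time(p):7.1f} s (toy estimate)")
    print("fastest in this toy model:", best_core_count(candidates))
```

In this toy model the run time drops up to 64 cores and then rises again, which is the same pattern as seeing time steps get slower after adding cores.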
At least, CMAQ is much better than it once was.
Thank you for your clarification.