I am seeing a large difference in CMAQ CCTM performance on an Azure CycleCloud / Slurm cluster between running in a container (a Docker image converted to Apptainer / Singularity and run as-is) and running directly from a filesystem in a custom VM image.
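I expected some performance drop from container filesystem overhead, but not this much: in the runs below, the container is about 1.7x slower than the non-container run on the same NFS storage, and almost 4x slower than the non-container run on BeeGFS. Can anyone confirm whether this is expected? If not, what might I need to do to improve the containerised run performance?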
I have run several tests on the same CMAQ CCTM input data (but no multi-node containerised runs, as I could not get these to work…), and it looks like the only way to get CMAQ CCTM to run quickly is directly from a filesystem (especially multi-node with BeeGFS). Is this correct, and is it what others have found?
Single Node Container - 1x 160 cores (HB176rs_v4) NFS:
***** CMAQ TIMING REPORT *****
Start Day: 2024-09-10
End Day: 2024-09-12
Number of Simulation Days: 3
Domain Name: MDE27
Number of Grid Cells: 864360 (ROW x COL x LAY)
Number of Layers: 40
Number of Processes: 160
All times are in seconds.
Num Day Wall Time
01 2024-09-10 1663.12
02 2024-09-11 2007.64
03 2024-09-12 1478.06
Total Time = 5148.82
Avg. Time = 1716.27
Single Node Filesystem - 1x 160 cores (HB176rs_v4) NFS:
***** CMAQ TIMING REPORT *****
Start Day: 2024-09-10
End Day: 2024-09-12
Number of Simulation Days: 3
Domain Name: MDE27
Number of Grid Cells: 864360 (ROW x COL x LAY)
Number of Layers: 40
Number of Processes: 160
All times are in seconds.
Num Day Wall Time
01 2024-09-10 905.39
02 2024-09-11 1189.18
03 2024-09-12 882.81
Total Time = 2977.38
Avg. Time = 992.46
Single Node Filesystem - 1x 160 cores (HB176rs_v4) BeeGFS:
***** CMAQ TIMING REPORT *****
Start Day: 2024-09-10
End Day: 2024-09-12
Number of Simulation Days: 3
Domain Name: MDE27
Number of Grid Cells: 864360 (ROW x COL x LAY)
Number of Layers: 40
Number of Processes: 160
All times are in seconds.
Num Day Wall Time
01 2024-09-10 456.00
02 2024-09-11 448.76
03 2024-09-12 447.95
Total Time = 1352.71
Avg. Time = 450.90
Two Node Filesystem - 2x 120 cores (HB120rs_v3) BeeGFS:
***** CMAQ TIMING REPORT *****
Start Day: 2024-09-10
End Day: 2024-09-12
Number of Simulation Days: 3
Domain Name: MDE27
Number of Grid Cells: 864360 (ROW x COL x LAY)
Number of Layers: 40
Number of Processes: 240
All times are in seconds.
Num Day Wall Time
01 2024-09-10 380.24
02 2024-09-11 371.60
03 2024-09-12 368.80
Total Time = 1120.64
Avg. Time = 373.54
Note: all runs use CMAQ CCTM 5.4 built with the GCC compiler.
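
To make the comparison concrete, here is a minimal sketch that computes the slowdown ratios from the "Total Time" values in the four timing reports above. The dictionary keys and the `slowdown` helper are just illustrative labels, not anything from CMAQ. Note that the two-node run uses a different VM size (HB120rs_v3) and 240 processes, so that ratio is not a like-for-like comparison.

```python
# Total wall times in seconds, copied from the "Total Time" lines
# of the four timing reports above. Keys are illustrative labels.
totals = {
    "container_nfs_1node": 5148.82,
    "filesystem_nfs_1node": 2977.38,
    "filesystem_beegfs_1node": 1352.71,
    "filesystem_beegfs_2node": 1120.64,  # different VM size, 240 processes
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times longer the `slow` run took than the `fast` run."""
    return totals[slow] / totals[fast]


if __name__ == "__main__":
    baseline = "container_nfs_1node"
    for label in totals:
        if label == baseline:
            continue
        ratio = slowdown(baseline, label)
        print(f"{baseline} vs {label}: {ratio:.2f}x slower")

    # Storage effect alone (same node type, no container): NFS vs BeeGFS.
    storage = slowdown("filesystem_nfs_1node", "filesystem_beegfs_1node")
    print(f"NFS vs BeeGFS (no container): {storage:.2f}x slower")
```

Splitting it this way suggests two separate effects in my numbers: the container run versus the non-container run on the same NFS storage, and NFS versus BeeGFS without any container involved.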