Hello all,
I am running CMAQ for the 12CONUS1 domain, starting 01 Feb 2023, for 2 days. After running for some time, the job fails with:
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 1697494 RUNNING AT ec5
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
slurmstepd: error: Detected 1 oom_kill event in StepId=5112172.0. Some of the step tasks have been OOM Killed.
srun: error: ec5: task 0: Out Of Memory
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
real 9378.48
user 0.05
sys 0.04
I used the following commands in Slurm to run it:
srun --time=168:00:00 -c 16 --mem=32G --pty bash
sbatch -n 128 --time=168:00:00 run_cctm_202302_12US1.csh
Please find my run script and log files attached.
slurm-5112172.txt (19.3 KB)
run_cctm_202302_12US1.txt (35.7 KB)
CTM_LOG_074.v532_gcc_12US1_459X299_20230201.txt (75.2 KB)
Thank you for your help.
Hasibul