Hi,
I am trying to install CMAQ on an HPC cluster. I switch to a tcsh shell with srun --pty /bin/tcsh and then install the libraries and CMAQ.
I am very new to both HPC and CMAQ.
When I try to run ./run_cctm_Bench_2016.csh in the CCTM scripts directory, it keeps giving me the error "srun: step creation still disabled, retrying (requested nodes are busy)".
I am not sure what this error means, or whether it is caused by CMAQ or by my HPC system. Can you please help me?
If you are using an HPC system, you will need to learn its job submission commands and write a submit script (for example submit.sub) that requests the nodes. Check your institution’s HPC website for details, because each HPC system is set up differently (scratch space, srun, bsub, qsub, etc.).
#!/bin/bash
#SBATCH -A fiore # Account
#SBATCH --job-name=test_run # The job name
#SBATCH -c 32 # Number of cores
#SBATCH -N 4 # Ensure that all cores are on one machine
#SBATCH -t 0-12:00 # Runtime in D-HH:MM
#SBATCH --exclusive # Memory pool for all cores
#SBATCH -C mem768
#SBATCH --mem=700GB
As @Ryan mentioned in his response to the original post in this thread, this type of Slurm error is not specific to CMAQ, and your institution’s IT department is likely in a better position to help you with it than the users on this forum (though of course there may be some folks here who have specific insights - I do not).
Your run script contains several non-standard elements that caught my attention, though I do not know if they are related to your issue with slurm. CMAQ does not use OMP so I am unsure about the intent of those instructions.
Please also note that opening issues on the CMAQ GitHub repository is not the preferred way to ask for technical help with CMAQ; posting on this forum is. We will close the issue you opened on the CMAQ GitHub repository shortly.
I have no in-depth familiarity with configuring srun with SBATCH options.
On our system, we use the following to run the benchmark case with an 8x4 domain decomposition on 32 cores:
#SBATCH -n 32
which only specifies the number of tasks to run and does not request a particular number of nodes or CPUs per task. Your setup instead uses
#SBATCH -c 32 # Number of cores
#SBATCH -N 4 # Ensure that all cores are on one machine
which (if I read the srun man pages correctly) would request 32 CPUs per task and also request 4 nodes. You also specify
#SBATCH --exclusive # Memory pool for all cores
#SBATCH -C mem768
#SBATCH --mem=700GB
which is not something we typically do in our CCTM scripts.
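To see at a glance what a submit script actually requests, its directives can be listed one per line with the trailing comments stripped. This is only a small sketch in plain bash; the function name list_sbatch_directives is hypothetical, and it only reads the file without contacting Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or CMAQ):
# print every "#SBATCH ..." option in a script, without the
# "#SBATCH" prefix and without any trailing "# comment".
list_sbatch_directives() {
    grep '^#SBATCH' "$1" \
        | sed -e 's/^#SBATCH[[:space:]]*//' \
              -e 's/[[:space:]]*#.*$//'
}
```

For example, list_sbatch_directives submit.sub (assuming the script is saved as submit.sub) would show -c 32 and -N 4 on separate lines, which makes it easier to compare against the single -n 32 line used above.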
You could try the simpler instructions we use (just specify the number of tasks as 32 for an 8x4 domain decomposition), but your system may have requirements that call for the more specific instructions you are providing. Your organization’s IT department might be a good resource for that.
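The point about matching the task count to the domain decomposition can be sketched as a quick check. It assumes the CCTM run script sets NPCOL and NPROW (8 and 4 for the decomposition discussed here) and that the task count comes from #SBATCH -n; the helper name check_decomposition is hypothetical and not part of CMAQ or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: verify that columns x rows of the domain
# decomposition equals the number of tasks requested from Slurm.
check_decomposition() {
    local npcol=$1 nprow=$2 ntasks=$3
    local nprocs=$(( npcol * nprow ))
    if [ "$nprocs" -eq "$ntasks" ]; then
        echo "OK: ${npcol}x${nprow} = ${nprocs} tasks"
    else
        echo "MISMATCH: ${npcol}x${nprow} = ${nprocs}, but ${ntasks} tasks requested"
        return 1
    fi
}
```

Inside a batch job submitted with -n, Slurm sets SLURM_NTASKS, so a call such as check_decomposition 8 4 "$SLURM_NTASKS" could be placed before the model is launched.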
Based on the detailed description page you have provided, it seems to me that the problem is not related to CMAQ but to the way jobs are submitted on your system. Please try to run this simple program on your system using the same method. If you can’t run it (which would confirm my belief), please consult your system administrator, as Christian has suggested.
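The simple program referred to above is not reproduced in this thread. As a stand-in only, here is a sketch of a very small Slurm test job that is independent of CMAQ; the file name smoke_test.sub, the job name, the task count, and the time limit are all assumptions, and your site may require additional options such as an account.

```shell
#!/bin/bash
# Hypothetical helper: write a tiny batch script that only asks
# each task to print the name of the node it runs on.
write_smoke_test() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=smoke_test
#SBATCH -n 2
#SBATCH -t 0-00:05
srun hostname
EOF
}

write_smoke_test smoke_test.sub
```

Submitting it with sbatch smoke_test.sub and checking the queue with squeue -u $USER should show whether a job that has nothing to do with CMAQ hits the same srun message.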