Please let us know how many CPUs or cores are available on your machine. You can check with:
lscpu
If you have only 16 cores available but try to run using 64, you are oversubscribing the machine: each core is given too much work, and that produces this type of poor performance.
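As a rough illustration of the 16-versus-64 case above, here is a minimal Python sketch that compares a requested process count with the cores the machine reports. The function name `oversubscription_ratio` is hypothetical (it is not part of CMAQ or its run scripts), and note that `os.cpu_count()` reports logical CPUs, which may include hyperthreads.

```python
import os


def oversubscription_ratio(requested_procs, available_cores=None):
    """Return requested processes per available core.

    A value above 1.0 means more processes than cores (oversubscription).
    """
    if available_cores is None:
        # Assumption: logical CPU count is a reasonable stand-in for cores.
        # os.cpu_count() can return None, so fall back to 1.
        available_cores = os.cpu_count() or 1
    return requested_procs / available_cores


if __name__ == "__main__":
    # The example from the post: 64 processes requested on 16 cores.
    ratio = oversubscription_ratio(64, 16)
    print(f"{ratio:.1f} processes per core")
```

If the ratio is above 1, reducing the number of processes in the run script to match the available cores is the first thing to try.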
You can also check this using top or htop on your compute cluster. If you are on an HPC system with a login node and are using a job scheduler, you would need to log in to the compute node and run top or htop there.
Another question is how much memory is available, although if you were exceeding the available memory, the model would typically crash with an FPE.
@cgnolte ran the same domain and determined that 89 GB of memory is required for the 12US1 domain; see "Is 68GB memory enough to run CMAQv5.3.1 with 12US1 platform?" (#2 by cgnolte).
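To compare your machine against that 89 GB figure, something like the following sketch could be used on Linux. It parses the `MemTotal` line from `/proc/meminfo`-style text; the helper names are hypothetical, and it assumes the 89 GB figure is in decimal gigabytes, which the thread does not state.

```python
REQUIRED_GB = 89  # figure reported by @cgnolte for the 12US1 domain


def mem_total_gb(meminfo_text):
    """Return total memory in GB (10^9 bytes) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The value is labeled kB but is in units of 1024 bytes.
            kib = int(line.split()[1])
            return kib * 1024 / 1e9
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, required_gb=REQUIRED_GB):
    """True if total memory meets or exceeds the required amount."""
    return mem_total_gb(meminfo_text) >= required_gb
```

On a compute node, you could pass in the contents of `/proc/meminfo` (for example `open("/proc/meminfo").read()`). This only checks total installed memory, not what is free while other jobs are running.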
It may also be that your filesystem is slow. If you exceed the L2 cache, the model tries to read information from disk instead of from memory, and a slow filesystem would then lead to poor performance.
Performance benchmark information for this domain is also available here: CMAQ on AWS (3. Performance and Cost Optimization - aws-cmaq documentation)