@lizadams, @bbaek, @cjcoats, @tlspero, @wong.david-c, @hogrefe.christian, @cgnolte, @eyortizd
This is the recent error I got:
ls -l /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc
-rwxr-xr-x 1 catalyst catalyst 11193024 Oct 22 14:24 /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc
size /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc
text data bss dec hex filename
4913497 4758192 7102728 16774417 fff511 /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc
unlimit
limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize unlimited
coredumpsize unlimited
memoryuse unlimited
vmemoryuse unlimited
descriptors 4096
memorylocked 16384 kbytes
maxproc 63329
maxlocks unlimited
maxsignal 63329
maxmessage 819200
maxnice 0
maxrtprio 0
maxrttime unlimited
set MPIRUN = mpiexec
set TASKMAP = /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/machines
cat /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/machines
n001:8
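For context: this machines file names host n001, while the mpirun error further down reports the node as catalyst-Precision-Tower-3620. If the run is on a single workstation, an Open MPI hostfile naming the local machine would look like the sketch below (hostname taken from the error message; slot count assumed from the -np 8 in the command):

```
catalyst-Precision-Tower-3620 slots=8
```

(The host:N form above is MPICH-style machinefile syntax; Open MPI's mpirun expects the "slots=" form in its hostfiles.)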
mpirun -np 8 /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc.sh
mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc.sh
Node: catalyst-Precision-Tower-3620
while attempting to start process rank 0.
8 total processes failed to start
0.014u 0.011s 0:00.02 100.0% 0+0k 0+16408io 4pf+0w
mkdir -p /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/cctm_36km/logs/2016031
mv: No match.
@ i = 30 + 1
end
while ( 31 < 31 )
date
Tue Oct 22 17:14:35 CST 2019
exit
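One detail in the log above: the ls output shows the built executable without a .sh suffix, but mpirun was invoked with the path ending in .sh, which is the file it reports it "could not access or execute". A quick sanity check, with the paths copied from the log (a sketch, not a fix):

```shell
# Paths copied from the log above.
BLD=/home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/BLD_saprc99_ae6_aq
EXE=$BLD/CCTM_saprc99_ae6_aq_Linux4_x86_64gcc

# ls showed the built executable as "$EXE" (no .sh suffix),
# while mpirun was handed "$EXE.sh".
[ -x "$EXE" ]      && echo "built executable found: $EXE"
[ ! -e "$EXE.sh" ] && echo "no file at $EXE.sh -- mpirun cannot start it"
```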
It seems the problem is due to Open MPI. Your help will be greatly appreciated.
Thanks and regards.
Below is the updated error I got:
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[catalyst-Precision-Tower-3620:00824] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
[the same orte_init / opal_init / MPI_INIT error block repeats for each remaining process, PIDs 00825 through 00829]
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
[the same error block repeats once more, for PID 00830]
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[25693,1],0]
Exit code: 1
mkdir -p /home/catalyst/Desktop/Build_WRF/CMAQv5.0.2/scripts/cctm/cctm_36km/logs/2016031
mv: No match.
@ i = 30 + 1
end
while ( 31 < 31 )
date
Tue Oct 22 09:51:15 CST 2019
exit
Kindly help, please. It seems it's an Open MPI problem.
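In case it helps, here are environment checks that are commonly suggested when opal_shmem_base_select fails (a sketch; what is on PATH on this machine is an assumption):

```shell
# Which mpirun does the shell resolve, and what version is it?
# A mismatch between the Open MPI that CCTM was built against and the
# one found at run time is a common cause of opal_init/opal_shmem failures.
command -v mpirun || echo "mpirun not on PATH"
mpirun --version 2>/dev/null | head -n 1

# opal_shmem_base_select can also fail when the temporary directory that
# Open MPI uses for shared-memory backing files is full or not writable.
df -h /tmp
touch /tmp/.ompi_write_test && rm -f /tmp/.ompi_write_test && echo "/tmp is writable"
```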
Thanks
Catalyst