I’m making some progress toward running CAMx with mpich and netcdf options enabled using the gfortran-13 compiler, having run CAMx successfully with netcdf but without mpi. At this stage I seem to have compiled CAMx v7.31 with netcdf-c-4.6.1, netcdf-fortran-4.4.1 and mpich-3.0.4 enabled, but am encountering runtime errors such as:
MPI_Send(97).: Invalid rank has value 1 but must be nonnegative and less than 1
A suggested solution was to make sure I selected numbers of CPUs and threads greater than 1 in the CAMx namelist, which I did, but that did not solve the problem.
Is it possible to rule out that these errors are related to the fact that I’m running this on a VirtualBox 7.1 virtual machine installation using Linux Mint 22 with mpich-3.0.4 and openmpi-4.1.0? Knowing this will help focus my debugging efforts.
Thanks!
First of all, I would like to thank everyone for their help so far.
For context, I am one of the very few people with Linux user experience at work, and have been asked to step in as replacement administrator for our Dell T640 compute clusters that we use to run the CAMx photochemical model. Any administrator knowledge I have at present is from reading Linux references, practicing on a PC configured with VirtualBox and implementing the solutions suggested by forum members. I apologize for my previous questions that sound trivial or deserve an RTFM reply, but I really am in way over my head as the de facto system administrator.
For this particular question I found a YouTube VirtualBox tutorial that demonstrated setting up multiple Ubuntu virtual machine nodes to simulate an MPICH cluster with passwordless SSH and an NFS shared directory for a parallel C programming exercise that I managed to adapt for Linux Mint. I am still getting similar MPI_Send: Invalid Rank errors like I was getting with just one multi-CPU virtual machine, and am no closer to debugging my problem.
Admittedly this last VirtualBox experiment is merely academic as the real installation after upgrading Linux and CAMx will be bare-metal on the Dell T640 and these workarounds for VirtualBox will be irrelevant, but hopefully useful. I guess my final question on this topic is, has anybody ever successfully run MPICH-enabled CAMx on VirtualBox? If the answer is NO then I will end my Linux Mint experiments and start on my Rocky Linux experiments.
Advice and comments appreciated!
Hi flyenz0,
I run a Linux VirtualBox virtual machine on my personal PC at home for a few specific tasks only, so I don't have extensive experience with it. Hopefully I can still shed some light on your case.
First of all, in my mind, a cluster is a collection of physical machines (in your case, a collection of Dell T640s) linked by network cables (e.g. Ethernet). I hope I have understood your configuration correctly. You followed a tutorial to construct a virtual cluster, but you were not able to run the code. Could you try to run the following simple code?
program test
  implicit none
  include 'mpif.h'
  integer :: mype, ierr, nprocs
  call mpi_init (ierr)
  call mpi_comm_size (mpi_comm_world, nprocs, ierr)
  call mpi_comm_rank (mpi_comm_world, mype, ierr)
  write (6, '(a7, 2i5)') ' ==d== ', mype, nprocs
  ! the following line prolongs this program's execution
  if (mype == 0) read (5, *) ierr
  call mpi_finalize(ierr)
end program test
If there are n cores on each virtual box and there are m virtual boxes, then run the code with n*m cores. During the execution, run the top command on a different virtual box and observe whether the program is being executed on that virtual box as well. The point of this exercise is to determine whether your machine is capable of running an MPI job. Please let me know the outcome and feel free to contact me directly (wong.david-c@epa.gov).
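If that basic test reports the expected number of processes, a possible next step is to reproduce the failing call in isolation. The message "Invalid rank has value 1 but must be nonnegative and less than 1" means the communicator used by MPI_Send contained only one process, so a send to rank 1 had no valid target. The following is only a sketch and is not taken from CAMx; the program name sendtest, the variable token and the value 42 are illustrative assumptions.

```fortran
program sendtest
  implicit none
  include 'mpif.h'
  integer :: mype, nprocs, ierr, token
  integer :: status(MPI_STATUS_SIZE)

  call mpi_init (ierr)
  call mpi_comm_size (mpi_comm_world, nprocs, ierr)
  call mpi_comm_rank (mpi_comm_world, mype, ierr)

  if (nprocs < 2) then
     ! A communicator of size 1 has no rank 1, so sending to rank 1
     ! would raise the "Invalid rank" error; report it and skip the send.
     write (6, '(a, i5)') ' only one rank in mpi_comm_world, nprocs =', nprocs
  else if (mype == 0) then
     ! token and its value are illustrative only
     token = 42
     call mpi_send (token, 1, mpi_integer, 1, 0, mpi_comm_world, ierr)
     write (6, '(a)') ' rank 0 sent token to rank 1'
  else if (mype == 1) then
     call mpi_recv (token, 1, mpi_integer, 0, 0, mpi_comm_world, status, ierr)
     write (6, '(a, i5)') ' rank 1 received token', token
  end if

  call mpi_finalize (ierr)
end program sendtest
```

If every process prints the "only one rank" message even though several were requested, the launcher is probably not placing the processes in a single MPI job. With both MPICH and Open MPI installed, it may be worth checking that the compiler wrapper and the launcher come from the same installation.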
Cheers,
David
Thanks for replying David.
I’m actually running this on a VirtualBox Linux Mint virtual machine on a Windows 11 PC. I didn’t encounter these errors running CAMx on our CentOS 7 Dell cluster, and I suspect they are due either to my misunderstanding of the correct procedure to run MPICH on VirtualBox, or to an inherent limitation of VirtualBox.
I suspect the former, which is why I asked if anybody did it successfully. Then again if it’s the latter then there’d be no point in continuing the experiment.
I am familiar with the type of program you wrote as I did MPI programming as part of my PhD research, but this is different since I’m attempting to run CAMx software on virtual machines and have not found anything online describing anything similar.