NetCDF error in CMAQ5.3.1

Hello, all,

I am testing the CMAQv5.3.1 benchmark and have run into some problems. The CCTM module built successfully, but errors occurred when running the executable:

In the run_cctm log file run_cctm_Bench_2016_12SE1.csh.log.txt (25.5 KB):

 application called MPI_Abort(MPI_COMM_WORLD, 538976288) - process 20
 In: PMI_Abort(538976288, application called MPI_Abort(MPI_COMM_WORLD, 538976288) - process 20)

Then, in the log file for process 20, CTM_LOG_020.v531_intel_Bench_2016_12SE1_20160701.txt (36.7 KB):

 Error creating netCDF file
 netCDF error number  -35  processing file "CTM_CONC_1"

 *** ERROR ABORT in subroutine OPCONC on PE 020          
 Could not open CTM_CONC_1

Any ideas about the errors? Thanks for your help!

Xiang

Your log file indicates pretty clearly what is wrong:

 Error creating netCDF file
 netCDF error number  -35  processing file "CTM_CONC_1"
 NetCDF: File exists && NC_NOCLOBBER
 NetCDF: File exists && NC_NOCLOBBER
 /scratch/panosg/gscratch/CMAQ5.3.1/CMAQ-5.3.1/data/output_CCTM_v531_intel_Bench_2016_12SE1/CCTM_CONC_v531_intel_Bench_2016_12SE1_20160701.nc

The solution is to delete the preexisting output file(s). Alternatively, set the run script variable CLOBBER_DATA to TRUE and the script will delete them for you.
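For reference, the relevant line in the run script looks something like this (the inline comment is paraphrased, not verbatim from the script):

 #> in run_cctm_Bench_2016_12SE1.csh: delete preexisting output files on startup
 setenv CLOBBER_DATA TRUE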

Thank you so much for the prompt reply.

I checked and ensured that no files existed in the output folder, and CLOBBER_DATA was already set to TRUE (the default). The same errors occur, as shown below:

[screenshot: listing of the output folder showing the newly created CONC file]

The CONC file shown above was generated by the current run, so it seems the calculation results cannot be written to the CONC file.

I also tried a serial run: some output files were generated initially, but then it stopped with errors (run_cctm_Bench_2016_12SE1_serial.csh.log.txt, 12.4 KB):

forrtl: severe (174): SIGSEGV, segmentation fault occurred

The netCDF error suggests that an attempt is being made to create a file where one already exists.
I see that your I/O API library is compiled with PnetCDF. Did you build CMAQ using set build_parallel_io? If you are using that option, try rebuilding the model with it turned off (commented out), as in the sketch below.
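
A minimal sketch of the relevant line in bldit_cctm.csh (the inline comment is paraphrased, not verbatim):

 #set build_parallel_io    #> leave commented out to build without PnetCDF parallel I/O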

When you say that you tried a serial run, did you rebuild the model with set ParOpt commented out, or did you simply run with NPCOL_NPROW set to "1 1"? Either way should work; see the sketch below. Is there a CTM_LOG_000 file?
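
For example (a sketch; the exact surrounding lines in your own scripts may differ):

 #> Option 1: serial build -- in bldit_cctm.csh, comment out:
 #set ParOpt

 #> Option 2: keep the MPI build but run on a single PE -- in the run script:
 setenv NPCOL_NPROW "1 1"
 set NPROCS = 1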

Hi Dr. Nolte,

I can run the CMAQv5.3.1 benchmark after commenting out build_parallel_io. Thank you so much!

Here is my summary:
There was no problem running CMAQv5.2.1 using I/O API v3.2 (April 2018) and PnetCDF v1.9.0 with build_parallel_io turned on. To run CMAQv5.3.1 successfully, however, two modifications were needed: (1) turn off the build_parallel_io option; and (2) upgrade the I/O API to a more recent version (e.g., v3.2, November 2019).

Xiang

Hi Xiang,

Please contact me directly at wong.david-c@epa.gov to get this resolved.

Cheers,
David

I have successfully run the CCTM in serial mode, but I have encountered problems when debugging multi-node parallel mode. The error messages from the single-node run are as follows:

 Value for IOAPI_CHECK_HEADERS: N returning FALSE
 Value for IOAPI_OFFSET_64: YES returning TRUE
 Value for IOAPI_CFMETA not defined; returning default: FALSE
 Value for IOAPI_CMAQMETA not defined; returning defaultval: 'NONE'
 Value for IOAPI_CMAQMETA not defined; returning defaultval: 'NONE'
 Value for IOAPI_SMOKEMETA not defined; returning defaultval: 'NONE'
 Value for IOAPI_SMOKEMETA not defined; returning defaultval: 'NONE'

 PN_CRTFIL3: Error creating PnetCDF file attribute FTYPE
 netCDF error number -36 processing file "CTM_CONC_1"
 NetCDF: Invalid argument
 NetCDF: Invalid argument

 >>> WARNING in subroutine OPNLOG3 <<<
 Warning netCDF file header attribute EXEC_ID
 not available for file: CTM_CONC_1
 netCDF error number -33

 "CTM_CONC_1" opened as NEW(READ-WRITE)
 File name "/disk/Build_WRF/CMAQ_DATA/output/CCTM_CONC_v531_gcc9.1.0_Bench_2020_SX_20200506.nc"
 File type UNKNOWN
 Grid name ""
 Dimensions: 0 rows, 0 cols, 0 lays, 0 vbles
 NetCDF ID: 0 opened as VOLATILE READWRITE
 Time-independent data.

 Error flushing PnetCDF file "CTM_CONC_1"
 PnetCDF error number -33
 WRTFLAG: MPI_SEND(SFLAG) error

 *** ERROR ABORT in subroutine OPCONC on PE 000
 Could not sync to disk CTM_CONC_1

 PM3EXIT: DTBUF 3:00:00 May 6, 2020
 Date and time 3:00:00 May 6, 2020 (2020127:030000)

 >>> WARNING in subroutine SHUT3 <<<
 Error closing PnetCDF file
 File name: CTM_CONC_1
 PnetCDF error number -33

 *** ERROR ABORT in subroutine PM3EXIT ***
 Could not shut down I/O API files correctly

The error messages from the multi-node test are as follows:

 Fatal error in PMPI_Comm_create: Unknown error class, error stack:
 PMPI_Comm_create(565)...: MPI_Comm_create(MPI_COMM_WORLD, group=0x88000001, new_comm=0x11a391f8) failed
 PMPI_Comm_create(542)...:
 MPIR_Comm_create_intra(207)...:
 MPIR_Get_contextid_sparse_group(495):
 MPIR_Allreduce_impl(293)...:
 MPIR_Allreduce_intra_auto(178)...:
 MPIR_Allreduce_intra_auto(84)...:
 MPIR_Bcast_impl(310)...:
 MPIR_Bcast_intra_auto(223)...:
 MPIR_Bcast_intra_binomial(182)...: Failure during collective
 Fatal error in PMPI_Comm_create: Unknown error class, error stack:

My running environment is CMAQv5.3.1 + PnetCDF 1.12.1 + I/O API 3.2 + MPICH 3.3.2. I do not know how to solve this, and I would appreciate help getting CMAQ to run in multi-node parallel mode in a cluster environment.
Thanks very much.

We recommend building netCDF without PnetCDF, configuring it with:

--disable-netcdf-4 --disable-dap
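
For example, a minimal build sketch (the netcdf-c version and install prefix below are placeholders, not taken from this thread):

 setenv NCDIR $HOME/netcdf        # example install location
 cd netcdf-c-4.7.0                # example version
 ./configure --prefix=$NCDIR --disable-netcdf-4 --disable-dap
 make
 make install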

Additional instructions are available in these tutorials:

See https://cjcoats.github.io/ioapi/ERRORS.html#ncferr for the I/O API Troubleshooting Guide's section on netCDF errors. This error has the description:

 ncebadid = nf_ebadid = -33: not a netcdf ID (might indicate a bug in I/O API internals, or an attempt to use a coupling-mode virtual file in a program linked to an I/O API library without coupling-mode enabled)

Ostensibly this tracks back to PN_CRTFIL3. There was once a bug at the indicated point in this routine (now fixed), so you might try updating your I/O API source, rebuilding, and then trying again.
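
If you go that route, an update-and-rebuild sketch might look like the following (this assumes an existing git clone of the ioapi-3.2 repository and the standard I/O API build conventions; the BIN value is only an example, so adjust to your own setup):

 cd ioapi-3.2
 git pull                         # pick up the current ioapi-3.2 source
 setenv BIN Linux2_x86_64gfort    # your binary type may differ
 make configure                   # regenerate per-directory Makefiles for this BIN
 make all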