Segmentation fault (core dumped)

Dear Sir,
I ran into a problem while running MCIP. I am using CMAQ5.3 + WRF4.1 + MCIP5.0, and the error is as follows:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F3D40A55697
#1 0x7F3D40A55CDE
#2 0x7F3D3FF503AF
#3 0x7F3D3FF5326D
#4 0x7F3D3F7E95F4
#5 0x7F3D3F7EFACD
#6 0x7F3D3F94A279
#7 0x7F3D3F948A72
#8 0x7F3D3FCFF201
#9 0x7F3D40DD9DA1
#10 0x7F3D40DDA0F5
#11 0x7F3D40DD777B
#12 0x7F3D40DE02EF
#13 0x7F3D40D85566
#14 0x7F3D40D8562B
#15 0x7F3D4107B766
#16 0x7F3D410CF5F7
#17 0x4C28B2 in setup_wrfem_
#18 0x4C1589 in setup_
#19 0x403D46 in MAIN__ at mcip.f90:?
Segmentation fault (core dumped)
Error running mcip

@Rainzmh

I’m sorry you are getting this error. Can you also give me the last bit of your log file, too, please? If you can also get a more specific line from the traceback (that is, which line in setup_wrfem.f90 is causing the problem), then I can help you more quickly. You may be able to get this information using additional debugging options on your compile of MCIP.

–Tanya

Thanks for your reply, but I don’t know how to get a more specific line from the traceback. Could you tell me how, please? Thanks a lot!

@Rainzmh

Which compiler are you using?

–Tanya

CentOS 7 with gcc 9.1.0

The MCIP Makefile contains sections for three different compilers: Portland Group Fortran, gfortran, and Intel Fortran. Since you are using gcc, you should be using the section for gfortran.
For each of these compilers, there is an optional variant with debugging flags. Each of these includes “-g -O0” plus additional flags. Try compiling with the debugging flags, then rerun MCIP.

#...gfortran
FC	= gfortran
NETCDF = /usr/local/apps/netcdf-4.6.3/gcc-6.1.0
IOAPI_ROOT = /usr/local/apps/ioapi-3.2_20181011/gcc-6.1.0
#FFLAGS	= -O3 -I$(NETCDF)/include -I$(IOAPI_ROOT)/Linux2_x86_64
FFLAGS	= -g -O0  \
          -ffpe-trap='invalid','zero','overflow','underflow'  \
          -I$(NETCDF)/include -I$(IOAPI_ROOT)/Linux2_x86_64
LIBS    = -L$(IOAPI_ROOT)/Linux2_x86_64 -lioapi  \
          -L$(NETCDF)/lib -lnetcdff -lnetcdf

Thanks a lot! After following the steps, I reran the command:
./run_mcip.csh gcc
The error is as follows:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x2BA57C1ED697
#1 0x2BA57C1EDCDE
#2 0x2BA57CC803AF
#3 0x2BA57CC8326D
#4 0x2BA57D2FB5F4
#5 0x2BA57D301ACD
#6 0x2BA57D45C279
#7 0x2BA57D45AA72
#8 0x2BA57D022201
#9 0x2BA57BF44DA1
#10 0x2BA57BF450F5
#11 0x2BA57BF4277B
#12 0x2BA57BF4B2EF
#13 0x2BA57BEF0566
#14 0x2BA57BEF062B
#15 0x2BA57BC67766
#16 0x2BA57BCBB5F7
#17 0x4D2AC2 in setup_wrfem_ at setup_wrfem.f90:879
#18 0x4CDD3E in setup_ at setup.f90:142
#19 0x403D56 in MAIN__ at mcip.f90:123
Segmentation fault (core dumped)
Error running mcip

The files from the src folder mentioned above are as follows:
(mcip.f90).txt (9.2 KB) (setup.f90).txt (7.3 KB) (setup_wrfem.f90).txt (57.9 KB)

Please go into the MCIP src directory and issue the command “make clean”. Then recompile. If you get the same error, then please post the result of “ncdump -h” for your wrfout file.
Are you running MCIP on the benchmark dataset, or on your own case?
What version of netCDF are you using?

I followed your steps and got the same error.
The result of "ncdump -h" is here: ncdump.txt (45 KB). I am running MCIP on my own case.
The netCDF library version is 4.7.2.

@Rainzmh

Thank you for providing the additional information. MCIP is failing with a segmentation fault on the line that inquires about the dimensions of the “Times” variable in the wrfout file. Do you know if your WRF run completed successfully? That is, are all of the times that you think should be in the wrfout file actually available in it?

–Tanya

I think there may be something wrong with my wrfout file, but unfortunately I don’t know how to check it. Should I try the benchmark dataset first? Could you please provide the link to it? Thanks a lot!

Hi,
If the WRF run finished correctly, you should see “SUCCESS COMPLETE WRF” as the last message in the rsl.out file from the run.
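As a rough sketch of that check, the short Python helper below looks at the last non-blank line of an rsl.out file for the completion message quoted above. The function name `wrf_finished` is made up for illustration, and the path you pass in is whatever rsl.out file your own run wrote.

```python
from pathlib import Path

# Completion message quoted in this thread.
SUCCESS_MARKER = "SUCCESS COMPLETE WRF"


def wrf_finished(rsl_path):
    """Return True if the last non-blank line of the rsl file contains the marker.

    Hypothetical helper for illustration; it only reads the text log.
    """
    text = Path(rsl_path).read_text(errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return bool(lines) and SUCCESS_MARKER in lines[-1]
```

A result of False only means the message was not found at the end of that file; it does not say why the run stopped.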
Megan

I am sorry, but I can’t find any rsl.out file.

According to the ncdump header you posted, there is only one time step in the file. It is possible that this is causing the crash, and MCIP should check for this condition somehow and exit with a more appropriate error message. Regardless, no useful MCIP output could be created from a single time step of WRF output. You will need to obtain additional WRF output files.
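For anyone who wants to read the time-step count off the header without scanning it by eye, here is a minimal Python sketch that parses the text printed by “ncdump -h”. It assumes the record dimension is named Time and appears in the usual ncdump form, for example `Time = UNLIMITED ; // (1 currently)`. The helper names and the sample header are illustrative only and are not taken from the posted file.

```python
import re
import subprocess

# Record dimension as ncdump prints it: "Time = UNLIMITED ; // (N currently)"
_UNLIMITED = re.compile(
    r"^\s*Time\s*=\s*UNLIMITED\s*;\s*//\s*\((\d+)\s+currently\)", re.MULTILINE
)
# Fallback for a fixed-size dimension: "Time = N ;"
_FIXED = re.compile(r"^\s*Time\s*=\s*(\d+)\s*;", re.MULTILINE)


def read_header(path):
    """Run 'ncdump -h' on a file and return its text (requires ncdump on PATH)."""
    result = subprocess.run(
        ["ncdump", "-h", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_time_steps(header_text):
    """Return the length of the Time dimension, or None if it is not found."""
    for pattern in (_UNLIMITED, _FIXED):
        match = pattern.search(header_text)
        if match:
            return int(match.group(1))
    return None


# Illustrative header text, written by hand in ncdump's format.
SAMPLE_HEADER = """netcdf wrfout_example {
dimensions:
    Time = UNLIMITED ; // (1 currently)
    DateStrLen = 19 ;
variables:
    char Times(Time, DateStrLen) ;
}
"""

print(count_time_steps(SAMPLE_HEADER))
```

With a real file, `count_time_steps(read_header("wrfout"))` would give the same number, where `wrfout` stands in for your actual file name.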

Here is a link to running the CMAQv5.3.1 tutorial (Benchmark) case, along with a link to download the dataset. This may not include step-by-step instructions for running MCIP, but I believe it includes both WRF and MCIP output, so you could run MCIP and see if you can reproduce the provided data.

@cgnolte

Re: “MCIP should check for this condition somehow and exit with a more appropriate error message”.

I disagree. Users have a responsibility to ensure that their WRF files are complete before trying to run MCIP. There is only so much coding that can be done to save users from themselves. If you can suggest a way to elegantly code this, I’m open to it.

–Tanya

I meant to emphasize “possible”. I am still not sure that the single time step is what is causing the crash, since I don’t see why that should cause the call to nf90_inquire_variable to seg fault. Maybe it’s a bug in netCDF.

@Rainzmh

If you have not already tried this, please type “ncdump -v Times wrfout” to determine whether or not your WRF file closed correctly. The output from this command will look like “ncdump -h” at first, but the last information after the header will show what is in the variable “Times”. If no actual times are listed at the end, then your WRF file did not close correctly, and your WRF run likely crashed.
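If the output is long, a small script can pull out just the time strings. The Python sketch below assumes the text from “ncdump -v Times” has been saved or captured as a string, and that the values appear as quoted strings after a `data:` line, which is how ncdump normally prints character variables. The function name and the sample text are illustrative only.

```python
import re


def extract_times(ncdump_text):
    """Return the quoted strings listed for Times in the data section."""
    # Only look after "data:" so quoted attribute values in the header are ignored.
    _, found, data = ncdump_text.partition("\ndata:")
    if not found:
        return []
    match = re.search(r"\bTimes\s*=(.*?);", data, re.DOTALL)
    if match is None:
        return []
    return re.findall(r'"([^"]*)"', match.group(1))


# Illustrative text written by hand in ncdump's format, not from a real run.
SAMPLE_OUTPUT = '''netcdf wrfout_example {
dimensions:
    Time = UNLIMITED ; // (2 currently)
    DateStrLen = 19 ;
variables:
    char Times(Time, DateStrLen) ;

// global attributes:
        :TITLE = "illustrative header" ;
data:

 Times =
  "2000-01-01_00:00:00",
  "2000-01-01_01:00:00" ;
}
'''

for stamp in extract_times(SAMPLE_OUTPUT):
    print(stamp)
```

An empty list means no time strings were found after the header, which is the situation described above.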

As mentioned by @mmallard, you could look in the WRF-generated rsl.out.0000 file for the specific phrase that indicates that the WRF run completed correctly.

Hope this helps…
–Tanya