2019 platform MOVESMRG segmentation error

Hi All,

I am running SMOKE (2019 platform) over our own 36x36km domain and ran into a segmentation fault in MOVESMRG for RPHO and RPS. RPH, RPV and RPP run successfully, though.

MOVESMRG stopped in the middle of reading emission factors, and a segmentation fault was reported in the terminal output.

I took some other steps before posting here:

  1. I also tried processing RPS for the 2019 platform domain (12US1), and that was successful.
  2. I compiled SMOKE v4.8.1 and v5.0 successfully, but got the same segmentation fault.
  3. I tried the pre-compiled SMOKE releases (v4.8.1, v4.9 and v5.0) from GitHub, but they also hit the segmentation fault.
  4. I searched for solutions on this forum.

Both the MOVESMRG log file and the terminal output file are uploaded here for your reference (SMOKE v4.8.1 with the 2019 platform was used):
movesmrg_RPHO_onroad_jan_2019ge_cb6_19k_20190101_nested_36_cmaq_cb6ae7.log.txt (227.9 KB)
RPHO_NESTED36.log.txt (26.4 KB)

Please help me out and thank you!

Li

To isolate where the seg-fault is happening, it might be useful to re-build and re-run with BIN = Linux2_x86_64ifortdbg. That build compiles with traceback enabled, and would then give you the file and line number at which the seg-fault occurred.

That would be a much better start than just knowing the problem is somewhere in MOVESMRG.

My first thought is out of memory, so how much memory/RAM are you allocating to RPHO and RPS? Are you able to run RPD? (RPD is the most memory intensive job.)

Thanks for your reply.
Yes, I can certainly try this and get back to you here.

Hi, I have 32 GB of RAM, and I can process a much bigger domain (12US1), and RPD. So lack of RAM can be safely ruled out in this case.

I assume that I need to recompile the I/O API with the same BIN, “Linux2_x86_64ifortdbg”.

While doing so, I encountered the following error:

======================================================
/home/li/downloads/SMOKE_INTEL/ioapi-3.2/Linux2_x86_64ifort_mediumdbg/libioapi.a(wratt3.o): In function `msan.module_ctor': ifxadHxLj.i90:(.text.msan.module_ctor+0x2): undefined reference to `__msan_init'
Makefile:248: recipe for target 'airs2m3' failed
make[1]: *** [airs2m3] Error 1
make[1]: Leaving directory '/home/li/downloads/SMOKE_INTEL/ioapi-3.2/m3tools'
Makefile:204: recipe for target 'all' failed
make: *** [all] Error 2

Hi,

Based on the v4.8.1 code and your logs, it looks like movesmrg is failing at the NOx humidity adjustment step.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
movesmrg_09jun202 00000000006442CD Unknown Unknown Unknown
libpthread-2.27.s 000014D44008F980 Unknown Unknown Unknown
movesmrg_09jun202 00000000004074CB MAIN__ 768 movesmrg.f
movesmrg_09jun202 0000000000403B9E Unknown Unknown Unknown
libc-2.27.so 000014D43FAA9C87 __libc_start_main Unknown Unknown
movesmrg_09jun202 0000000000403AA9 Unknown Unknown Unknown

There have been significant changes to movesmrg since SMOKE v4.8. When you tested with SMOKE v5.0 did the seg fault occur in the same location?

You also said that you were able to successfully process the 12US1 domain. Is your 36 km domain fully nested into the 36US3 or is it offset in some way?

Thanks for your suggestion.

For testing purposes, I added setenv APPLY_NOX_HUMIDITY_ADJ "N" to the run script, but this did not change anything.

My observation is that MOVESMRG failed before reading in the eftables of all ref counties (there should be a few more files to read, judging by RPP or RPV, for example).

SMOKE v5.0 stopped at the same spot in the log file.

For the 12US1 domain, I can process RPS but not RPHO, because RPHO requires MET3D files, which I don’t have.

================================================

Multi-day run note: Run starts with 2019001, is 250000 long
Running part 4, for 20190101…

 This program uses the EPA-AREAL/MCNC-EnvPgms/BAMS/ UNC IE
 Models-3 I/O Applications Programming Interface, [I/O API]
 which is built on top of the netCDF I/O library (Copyright
 993, 1996 University Corporation for Atmospheric Research
 Unidata Program) and the PVM parallel-programming library
 (from Oak Ridge National Laboratory).
 Copyright (C) 1992-2002 MCNC,
 (C) 1992-2018 Carlie J. Coats, Jr.,
 (C) 2003-2012 Baron Advanced Meteorological Systems, LLC, and
 (C) 2014-2023 UNC Institute for the Environment.
 Released under the GNU LGPL  License, version 2.1.  See URL

     https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html

 for conditions of use.

 ioapi-3.2: $Id: init3.F90 247 2023-03-22 15:59:19Z coats $
 netCDF version 4.9.2 of Jul 26 2024 09:55:29 $

 SMOKE ---------------
 Copyright (c)2004 Center for Environmental Modeling for Policy Development
 All rights reserved

 Program MOVESMRG, Version SMOKEv5.0_Jun2023
 Online documentation
     https://cmascenter.org/smoke/

 No program description is available for MOVESMRG

 You will need to enter the logical names for the input and
 output files (and to have set them prior to program start,
 using "setenv <logicalname> <pathname>").

 You may use END_OF-FILE (control-D) to quit the program
 during logical-name entry. Default responses are given in
 brackets [LIKE THIS] and can be accepted by hitting the
 <RETURN> key.

1982.842u 23.959s 33:37.80 99.4% 0+0k 20261712+5687688io 35pf+0w
now checking log file /Ext05/emissions/2019/2019ge_cb6_19k/intermed/onroad/RPS/logs/movesmrg_RPS_onroad_jan_2019ge_cb6_19k_20190101_12US1_cmaq_cb6ae7.log
Now running M3STAT

In your SMOKE v5.0 test did you see a block of text like:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
movesmrg_09jun202 00000000006442CD Unknown Unknown Unknown
libpthread-2.27.s 000014D44008F980 Unknown Unknown Unknown
movesmrg_09jun202 00000000004074CB MAIN__ 768 movesmrg.f
movesmrg_09jun202 0000000000403B9E Unknown Unknown Unknown
libc-2.27.so 000014D43FAA9C87 __libc_start_main Unknown Unknown
movesmrg_09jun202 0000000000403AA9 Unknown Unknown Unknown

If not, have you been able to build a more recent version of SMOKE with traceback enabled as @cjcoats suggested?

Since you are able to run RPS for 12US1, but not for your 36km domain, it’s possible there is a problem with your 36km met data, specifically in WV since that’s where Movesmrg crashed. We note that QV (water vapor mixing ratio) and atmospheric pressure are in the METCRO3D, but we only use data from layer 1.
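A quick way to follow up on the met-data suspicion is to screen the layer-1 values for anything non-finite or physically implausible. The Python sketch below is only an illustration under stated assumptions: the helper name find_bad_values is hypothetical, the values are assumed to have already been extracted from METCRO3D into a flat list (the extraction itself is not shown), and the bounds passed in are placeholders to be chosen by the user.

```python
import math


def find_bad_values(values, lower, upper):
    """Return (index, value) pairs that are non-finite or outside [lower, upper].

    `values` is assumed to be a flat list of layer-1 numbers that were
    already read from the met file; reading the file is not shown here.
    """
    return [
        (i, v)
        for i, v in enumerate(values)
        if not math.isfinite(v) or not (lower <= v <= upper)
    ]


if __name__ == "__main__":
    # Hypothetical sample values and bounds, for illustration only.
    sample_qv = [0.004, float("nan"), -1.0, 0.01]
    for index, value in find_bad_values(sample_qv, 0.0, 0.05):
        print(f"cell {index}: suspicious value {value}")
```

Any cell flagged this way would be worth comparing against the same cell in a domain that runs cleanly.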

Dear @eyth.alison,

Thank you for looking into my case.

I have two Linux boxes. The first one has dual Intel CPUs, Ubuntu 18.04 LTS, and Intel oneAPI 2024.1. For my own 36km domain (focused on Ontario, CA), on this box I can successfully process RPH, RPV and RPP with my compiled SMOKE v4.8.1, but processing RPD, RPS and RPHO fails.

The second box has an AMD CPU, Ubuntu 22.04, and Intel oneAPI 2024.1. For my own 36km domain, I can only process RPD with the pre-compiled SMOKE v5.0 from EPA.

Since I can process the whole year for RPH, RPV and RPP, does that mean my MET data should be okay?

Li

Yes, I have seen a similar error message when MOVESMRG failed.

And yes, I finally built SMOKE with the dbg options. I will reply to the original message separately.

Thanks again!

Li

Dear @cjcoats ,

Thanks for your suggestion. I just recompiled IO/API and SMOKE v4.8.1 with debug options enabled.

When I ran the processing scripts for RPHO, they failed early on at the SMKINVEN stage with this error message at the terminal:

Processing environment variables EMISINV_A
SMKINVEN_MONTH set to 1
Running part 1…

 This program uses the EPA-AREAL/MCNC-EnvPgms/BAMS/ UNC IE
 Models-3 I/O Applications Programming Interface, [I/O API]
 which is built on top of the netCDF I/O library (Copyright
 993, 1996 University Corporation for Atmospheric Research
 Unidata Program) and the PVM parallel-programming library
 (from Oak Ridge National Laboratory).
 Copyright (C) 1992-2002 MCNC,
 (C) 1992-2018 Carlie J. Coats, Jr.,
 (C) 2003-2012 Baron Advanced Meteorological Systems, LLC, and
 (C) 2014-2023 UNC Institute for the Environment.
 Released under the GNU LGPL  License, version 2.1.  See URL

     https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html

 for conditions of use.

 ioapi-3.2: $Id: init3.F90 247 2023-03-22 15:59:19Z coats $
 netCDF version 4.9.2 of Jul 26 2024 09:55:29 $

 SMOKE ---------------
 Copyright (c)2004 Environmental Modeling for Policy Development
 All rights reserved

 Program SMKINVEN, Version SMOKEv4.8.1_Jan2021
 Online documentation
     http://www.cep.unc.edu/empd/products/smoke

 Program SMKINVEN to take ASCII area or point source files
 in IDA, EMS-95, or SMOKE list format, or mobile files
 in IDA format, and produce the I/O API and ASCII SMOKE
 inventory files and list of unique SCCs in the inventory.


 You will need to enter the logical names for the input and
 output files (and to have set them prior to program start,
 using "setenv <logicalname> <pathname>").

 You may use END_OF-FILE (control-D) to quit the program
 during logical-name entry. Default responses are given in
 brackets [LIKE THIS] and can be accepted by hitting the
 <RETURN> key.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread-2.27.s 00001486F81ED980 Unknown Unknown Unknown
smkinven 0000000000F3B0A7 Unknown Unknown Unknown
smkinven 000000000040511A Unknown Unknown Unknown
libpthread-2.27.s 00001486F81ED980 Unknown Unknown Unknown
smkinven 0000000000E47829 Unknown Unknown Unknown
smkinven 0000000000D57B96 Unknown Unknown Unknown
smkinven 0000000000E67E39 Unknown Unknown Unknown
smkinven 0000000000B76A71 Unknown Unknown Unknown
smkinven 0000000000BF207C Unknown Unknown Unknown
smkinven 0000000000B2AEEC Unknown Unknown Unknown
smkinven 0000000000B2990E Unknown Unknown Unknown
smkinven 0000000000B54496 Unknown Unknown Unknown
smkinven 0000000000AC43FF Unknown Unknown Unknown
smkinven 0000000000AC42A6 Unknown Unknown Unknown
smkinven 0000000000AA8D0F Unknown Unknown Unknown
smkinven 0000000000A1CA5B crtfil3_ 212 crtfil3.F90
smkinven 000000000092232E open3_ 455 open3.F90
smkinven 0000000000779BC9 createset_ 157 createset.f
smkinven 0000000000768261 openset_ 448 openset.f
smkinven 000000000045DE73 openinvout_ 436 openinvout.f
smkinven 0000000000576D12 MAIN__ 395 smkinven.f
smkinven 000000000040561D Unknown Unknown Unknown
libc-2.27.so 00001486F65B5C87 __libc_start_main Unknown Unknown
smkinven 000000000040551A Unknown Unknown Unknown
10.034u 0.603s 0:10.69 99.4% 0+0k 28056+11128io 2781pf+0w
now checking log file /Ext05/emissions/2019/2019ge_cb6_19k/intermed/onroad/RPHO/logs/smkinven_RPHO_onroad_jan_2019ge_cb6_19k.log

The smkinven log file has been attached here for you to review.

smkinven_RPHO_onroad_jan_2019ge_cb6_19k.log.txt (11.5 KB)

Li

Hi @cjcoats ,

When I ran SMOKE v4.8.1 with only movesmrg built with the ifortdbg options, I got the following error message at the terminal:

Multi-day run note: Run starts with 2019001, is 250000 long
Running part 4, for 20190101…

 This program uses the EPA-AREAL/MCNC-EnvPgms/BAMS/ UNC IE
 Models-3 I/O Applications Programming Interface, [I/O API]
 which is built on top of the netCDF I/O library (Copyright
 993, 1996 University Corporation for Atmospheric Research
 Unidata Program) and the PVM parallel-programming library
 (from Oak Ridge National Laboratory).
 Copyright (C) 1992-2002 MCNC,
 (C) 1992-2018 Carlie J. Coats, Jr.,
 (C) 2003-2012 Baron Advanced Meteorological Systems, LLC, and
 (C) 2014-2023 UNC Institute for the Environment.
 Released under the GNU LGPL  License, version 2.1.  See URL

     https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html

 for conditions of use.

 ioapi-3.2: $Id: init3.F90 247 2023-03-22 15:59:19Z coats $
 netCDF version 4.9.2 of Jul 26 2024 09:55:29 $

 SMOKE ---------------
 Copyright (c)2004 Environmental Modeling for Policy Development
 All rights reserved

 Program MOVESMRG, Version SMOKEv4.8.1_Jan2021
 Online documentation
     http://www.cep.unc.edu/empd/products/smoke

 No program description is available for MOVESMRG

 You will need to enter the logical names for the input and
 output files (and to have set them prior to program start,
 using "setenv <logicalname> <pathname>").

 You may use END_OF-FILE (control-D) to quit the program
 during logical-name entry. Default responses are given in
 brackets [LIKE THIS] and can be accepted by hitting the
 <RETURN> key.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread-2.27.s 000014E48197B980 Unknown Unknown Unknown
movesmrg_dbg 0000000000C6ECE7 Unknown Unknown Unknown
movesmrg_dbg 000000000040511A Unknown Unknown Unknown
libpthread-2.27.s 000014E48197B980 Unknown Unknown Unknown
movesmrg_dbg 0000000000B80CC9 Unknown Unknown Unknown
movesmrg_dbg 0000000000A91036 Unknown Unknown Unknown
movesmrg_dbg 0000000000BA12D9 Unknown Unknown Unknown
movesmrg_dbg 00000000008AFF11 Unknown Unknown Unknown
movesmrg_dbg 000000000092B51C Unknown Unknown Unknown
movesmrg_dbg 000000000086438C Unknown Unknown Unknown
movesmrg_dbg 0000000000862DAE Unknown Unknown Unknown
movesmrg_dbg 000000000088D936 Unknown Unknown Unknown
movesmrg_dbg 00000000007FDB62 Unknown Unknown Unknown
movesmrg_dbg 00000000007FDA64 Unknown Unknown Unknown
movesmrg_dbg 00000000007E2A4F Unknown Unknown Unknown
movesmrg_dbg 000000000068FA85 opnfil3_ 136 opnfil3.F90
movesmrg_dbg 000000000068D1A2 open3_ 378 open3.F90
movesmrg_dbg 00000000005D53E2 openset_ 285 openset.f
movesmrg_dbg 0000000000580FB9 rdinvmap_ 297 rdinvmap.f
movesmrg_dbg 0000000000468AC4 openmrgin_ 169 mopenmrgin.f
movesmrg_dbg 0000000000405CFA MAIN__ 245 movesmrg.f
movesmrg_dbg 000000000040561D Unknown Unknown Unknown
libc-2.27.so 000014E47FD43C87 __libc_start_main Unknown Unknown
movesmrg_dbg 000000000040551A Unknown Unknown Unknown
0.027u 0.023s 0:00.05 80.0% 0+0k 0+24io 0pf+0w
now checking log file /Ext05/emissions/2019/2019ge_cb6_19k/intermed/onroad/RPHO/logs/movesmrg_RPHO_onroad_jan_2019ge_cb6_19k_20190101_nested_36_cmaq_cb6ae7.log


  • ERROR detected in logfile:

And the movesmrg log file has been attached here for your review.

movesmrg_RPHO_onroad_jan_2019ge_cb6_19k_20190101_nested_36_cmaq_cb6ae7.log.dbg.txt (5.2 KB)

Thanks,

Li

In both cases, the failure is some sort of logical-name failure that manifests itself in a call to netCDF routine NF_OPEN, in places that have been in frequent use for the last 32 years, albeit in a situation related to SMOKE routine OPENSET. The code does something like the following:

...
CHARACTER*512 EQNAME
...
CALL NAMEVAL( FNAME, EQNAME )
...
IERR = NF_OPEN( EQNAME, FMODE, NCID )
...

My conclusion is that something is probably wrong with the EQNAME (e.g., it is longer than 511 characters, or has an embedded ASCII-zero, or …) created by OPENSET (which has been in use for 24 years). Unfortunately, the standard scripts do not give access to the program-environment, so we still don’t have a good way to know the exact reason for the failure. It is most probably a problem with one of the setenv commands in the script (or in the scripting that creates the right-hand side of this command).

FWIW
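One way to screen for the kinds of EQNAME problems described above is to scan the environment values themselves before the program runs. The following is a minimal Python sketch, not part of SMOKE or the I/O API: the function name find_suspect_values is hypothetical, and the 511-character threshold is an assumption taken from the CHARACTER*512 declaration quoted above.

```python
import string

# Assumption: one character less than the CHARACTER*512 buffer quoted above.
MAX_PATH_CHARS = 511

# Printable ASCII, including the plain space but excluding tabs and newlines.
ALLOWED_CHARS = set(string.printable) - set("\t\n\r\x0b\x0c")


def find_suspect_values(env):
    """Return (name, reason) pairs for values that look risky as file paths.

    `env` is any mapping of variable name to value, e.g. dict(os.environ).
    """
    problems = []
    for name, value in sorted(env.items()):
        if len(value) > MAX_PATH_CHARS:
            problems.append(
                (name, f"length {len(value)} exceeds {MAX_PATH_CHARS}")
            )
        if any(ch not in ALLOWED_CHARS for ch in value):
            problems.append((name, "contains a control or non-ASCII character"))
        if value != value.strip():
            problems.append((name, "has leading or trailing whitespace"))
    return problems
```

Calling find_suspect_values(dict(os.environ)) from the same shell that launches the program would list candidates. Only variables that are actually used as logical file names matter here, so long values in unrelated variables can be ignored.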

Thanks for the tips.
I will keep investigating this.

Li

It could be useful to modify the script that runs movesmrg so that it records the program-environment, which is where this error is probably coming from. The script would do something like the following (then look at the output to see if you can find the problem):

<*run program movesmrg*>
set foo = $status 
if ( $foo ) then
    echo "Error $foo in program movesmrg" >> $LOGFILE
    echo "Environment:" >> $LOGFILE
    env | sort >> $LOGFILE 
    exit ( $foo )
endif

Hi @cjcoats @eyth.alison,

I ran RPS for my domain (NESTED36) and for 12US1, with the env variables written to $LOGFILE. The 12US1 run was successful, but the run for my own 36km domain was not.

I scanned through the env variables in both log files and they seemed reasonable to me. The differences are basically due to the two different domains.

Both log files are attached here, in case you want to have a quick look at them.
movesmrg_RPS_onroad_jan_2019ge_cb6_19k_20190101_nested_36_cmaq_cb6ae7.log.txt (277.9 KB)
movesmrg_RPS_onroad_jan_2019ge_cb6_19k_20190101_12US1_cmaq_cb6ae7.log.txt (480.3 KB)

Thanks,

Li
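Comparing two environment dumps by eye is error-prone, so a small script can list exactly which variables differ. This is a hedged Python sketch, assuming the dumps are the NAME=value lines produced by env | sort as suggested earlier in the thread; the function names parse_env_dump and diff_env are hypothetical, and values that span multiple lines are not handled.

```python
def parse_env_dump(text):
    """Parse NAME=value lines into a dict; lines that do not match are ignored."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Accept only plausible variable names (letters, digits, underscores).
        looks_like_name = (
            bool(name)
            and name.replace("_", "").isalnum()
            and not name[0].isdigit()
        )
        if sep and looks_like_name:
            env[name] = value
    return env


def diff_env(a, b):
    """Return (only_in_a, only_in_b, changed) as sorted lists of names."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed
```

Feeding the environment sections of the two log files through parse_env_dump and then diff_env gives three short lists to inspect: variables present in only one run, and variables whose values differ between the failing and the working domain.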

FYI, we don’t normally run SMOKE-MOVES on the 36km domain, although it should be possible. There would be a different range of temperatures in 12km vs 36km, and that could cause issues, as could any different FIPS codes on the 36km domain. Instead, we typically aggregate the 12km emissions to 36km for the 36km cells that overlap the 12km domain. I haven’t had a chance to pore over the log file to identify something more specific.

Your talking about not normally running on the 36KM grid reminded me of something I ran into several years ago.

See https://cjcoats.github.io/ioapi/AVAIL.html#medium: if there are very large arrays, that could be the problem for default Intel or GNU compiles.
There are three different binary-incompatible “memory models” available for 64-bit x86 Linux:

  • small: At most 2 GB each for static data (COMMONs, etc.), ALLOCATEd arrays, program-stack, and program machine-code.
  • medium: Program machine-code at most 2 GB; no restrictions on data (large arrays, large stack, large data are OK).
  • large: No restrictions on code or data.

GNU and Intel default to small. If something doesn’t fit under this model’s restrictions, then you may very well silently get into trouble that would then lead to the segfault. The fix is to re-compile everything (netCDF, I/O API, SMOKE) for MOVESMRG for the medium memory model, using BIN = Linux2_x86_64gfort_medium, Linux2_x86_64ifort_medium, or Linux2_x86_64pg_medium, and see if that works.

Maybe this is the issue, maybe not…
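To get a rough feel for whether an array could run past the small-model limit described above, the arithmetic is simple enough to sketch. The Python below is only an illustration: the helper names are hypothetical, the 2 GB limit is taken from the memory-model description above (interpreted here as 2 × 1024³ bytes, which is an assumption), and the example dimensions are made up rather than taken from MOVESMRG.

```python
from math import prod

# Assumption: "2 GB" in the small-model description is read as 2 * 1024**3 bytes.
SMALL_MODEL_LIMIT = 2 * 1024**3


def array_bytes(shape, bytes_per_element=4):
    """Size in bytes of an array with the given dimensions (default: 4-byte REAL)."""
    return prod(shape) * bytes_per_element


def exceeds_small_model(shape, bytes_per_element=4):
    """True if a single array of this shape is larger than the assumed limit."""
    return array_bytes(shape, bytes_per_element) > SMALL_MODEL_LIMIT


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    shape = (600_000_000,)
    print(array_bytes(shape), exceeds_small_model(shape))
```

If the largest arrays in a run come anywhere near this size, rebuilding with one of the _medium BIN settings is the safer choice.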