Debugging SMOKE for sectors needing MCIP files

Hello all,

I’ve been spending a lot of time running SMOKE v5.1 with the 2022v1 NEI to generate input files for CAMx, and have run into some errors running certain sectors. For context, I’m running on a basic PC with these specifications:

Processor Intel(R) Core™ i7-6700 CPU @ 3.40GHz 3.41 GHz
Installed RAM 16.0 GB (15.9 GB usable)
Internal drive 240GB, External USB drive 3TB (nominal)
Windows 10 Pro, V22H2 (host OS)
Oracle VirtualBox 7.0, Linux Mint 22 (guest OS), SMOKE v5.1

For most sectors, running the scripts was straightforward using the precompiled executables from EPA. For sectors that needed MCIP met files, I could only run the onetime scripts before getting errors in the subsequent scripts. Some of the early errors I fixed by commenting out a redefinition of SMOKE_LOCATION near the end of the scripts, and by editing the appropriate directory_definitions_12US2.csh as well. I decided to postpone running the onroad scripts since I didn’t have disk space for all of the eftables files, and barely had enough space for just the January 2020 MCIP files (scripts edited to run January 2020 only).
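For reference, the SMOKE_LOCATION fix was just commenting out the re-set near the end of each affected sector script; the path below is a placeholder, not the actual value in EPA’s scripts:

```csh
## Near the end of the sector run script -- comment out the redefinition
## so the value set earlier (or in directory_definitions_*.csh) is kept:
# setenv SMOKE_LOCATION /placeholder/path/from/script
```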

Here is a list of the errors I encountered, quoted from the intermed/*/logs/ files; ++++++++++ marks where the script stopped running:

/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/intermed/cmv_c1c2_12/logs/smkinven_cmv_c1c2_12_jan_2022hc_cb6_22m.log

last line:
Reading hour -specific data…
Successful open for emissions file:
/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/inputs/cmv_c1c2_12/cmv_C1C2_02_cmv_c1c2_2022_gapfilled_masked_12US1_2022_CA_hourly.csv

++++++++++

/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/intermed/cmv_c3_12/logs/smkinven_cmv_c3_12_jan_2022hc_cb6_22m.log

last line:
Successful open for emissions file:
/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/inputs/cmv_c3_12/cmv_C3_02_cmv_c3_2022_gapfilled_masked_12US1_2022_CA_hourly.csv

*** ERROR ABORT in subroutine RDFF10PD:CHECKMEM
Failure allocating memory for “HSVAL”: STATUS= 41

++++++++++

/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/intermed/beis4/logs/tmpbeis4_beis4_2022hc_cb6_22m_20220101_12US1.log

last line:
Processing Sunday Jan. 2, 2022
at time 0:00:00

*** ERROR ABORT in subroutine HRBEIS4
TAIR= 0.00 out of range at (C,R)= 1, 1
Date and time 0:00:00 Jan. 2, 2022 (2022002:000000)

++++++++++

/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/intermed/ptfire-rx3D/logs/smkinven_ptfire-rx3D_jan_2022hc_cb6_22m.log

last line:
Successful open for emissions file:
/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/inputs/ptfire-rx/ptday_sf2_2022v1_20240626_caps_rx_08jul2024_v0

*** ERROR ABORT in subroutine RDFF10PD:CHECKMEM
Failure allocating memory for “HSVAL”: STATUS= 41

++++++++++

/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/intermed/ptfire-wild3D/logs/smkinven_ptfire-wild3D_jan_2022hc_cb6_22m.log

last line:
Reading day -specific data…
Successful open for emissions file:
/mnt/Buffalo_3TB/EPA2022v1/2022hc_cb6_22m/inputs/ptfire-wild/ptday_sf2_2022v1_20240802_caps_wf_02aug2024_v0

++++++++++

Earlier I got advice from @james.beidler about running with at least 32GB RAM and enough drive space, and he recommended running rwc, ptfire, and onroad to see if my system was up to it. Runs with rwc, ptagfire, ptfire_othna, and nonpoint/onroad* generated output in either /premerged/ or /smoke_out/, but the specific sectors mentioned above stopped prematurely. I also had trouble getting results from the *_adj scripts after their precursor scripts ran.

I strongly suspect I’ve run into an insufficient RAM issue that I can’t do anything about, but would appreciate confirmation from more experienced users. Practice runs with SMOKE on this PC are the best I can do until our regular Linux cluster is back online, but I’ve learned a lot anyway.

Thanks for your advice!

From some of the errors you posted, it looks like you are running out of RAM.

You could consider trying to run for a smaller grid that is a subset of the 12US1 grid to see if that helps.
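One way to set that up is to add a windowed grid to your GRIDDESC that reuses the 12US1 Lambert projection. A sketch, where the subset origin and dimensions are placeholders (for an exact cut-out, pick XORIG/YORIG on 12US1 cell edges, i.e. offset from 12US1's origin by whole multiples of 12000 m), and where the coordinate name must match whatever your GRIDDESC already calls that projection:

```
! Hypothetical windowed grid reusing the 12US1 Lambert projection
'LAM_40N97W'
  2, 33.0, 45.0, -97.0, -97.0, 40.0
' '
'12US1_SUB'
'LAM_40N97W', -900000.0, -900000.0, 12000.0, 12000.0, 100, 100, 1
' '
```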

You also mention “the January 2020 MCIP files (scripts edited to run January 2020 only)”, but you also say you are working with the 2022 platform. I wouldn’t expect the met-dependent processes to work with 2020 met data for a 2022 run.

Did you mean to say 2022 instead of 2020 here?

@eyth.alison thanks for confirming my guess, and yes, I meant to type January 2022 (cut and paste got me). My supervisor suggested I duplicate the EPA run so I could get the same /premerged/ or /smoke_out/ results, an apples-to-apples comparison once I figure out how to use m3diff properly. After that I can move on to preparing CAMx input files.

Would there be any reason to doubt that a subset grid of 12US1 run in SMOKE would agree, to within 1e-6, with the same subset grid cut out from the EPA /premerged/ or /smoke_out/ files on AWS?
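(For the check itself, the criterion I have in mind is just the maximum absolute per-cell difference -- a toy shell/awk sketch with placeholder values, since in practice m3diff would do this on the netCDF files directly:)

```shell
# Toy check: do two lists of per-cell values agree to within 1e-6?
# Values are placeholders standing in for one species at one time step.
epa="1.00000012 0.5 2.25"
mine="1.00000010 0.5 2.25"
agree=$(awk -v a="$epa" -v b="$mine" 'BEGIN {
  n = split(a, x); split(b, y); max = 0
  for (i = 1; i <= n; i++) {
    d = x[i] - y[i]; if (d < 0) d = -d
    if (d > max) max = d
  }
  if (max <= 1e-6) print "yes"; else print "no"
}')
echo "$agree"   # prints: yes
```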

Our Linux cluster doesn’t have the ifort compiler, and I am unsure whether it will run EPA’s precompiled executables, so there’ll probably be some differences due to the gfortran compiler. I’m hoping there’s a free-to-download ifort compiler somewhere.

The M3Tools program m3diff is documented here.
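For scripted use, something like the sketch below usually works: m3diff reads its inputs through logical names (AFILE/BFILE by default, if I recall correctly), and blank lines on stdin accept the interactive prompts’ defaults. The paths are placeholders; check the m3diff documentation for the exact prompt sequence on your I/O API version.

```shell
# Placeholder paths -- point the default logical names at the two files,
# then accept the prompts' defaults with blank input lines.
export AFILE=/path/to/epa/premerged_file.ncf
export BFILE=/path/to/local/premerged_file.ncf
export REPORT=/path/to/m3diff_report.txt
m3diff << EOF




EOF
```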

You can download the Intel Fortran compiler here. Note that the compiler name has changed (as have some flags): it is now ifx instead of ifort, so you’ll want to use the I/O API BIN types { Linux2_x86_64ifx, Linux2_x86_64ifxdbg, Linux2_x86_64ifx_medium, Linux2_x86_64ifx_mediumdbg }. If your cluster does have gfortran, find out the version with the command gfortran --version; if the version is 10 or greater, you need to use the *gfort10* I/O API binary types and the corresponding flags. See
Easy Guide for Building SMOKE
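A quick way to make that check, sketched for a bash shell (BIN-type names as above; substitute the dbg or medium-model variants as needed):

```shell
# Choose the I/O API BIN type from the gfortran major version.
# Falls back to the pre-10 type if gfortran is not found.
gfc_major=$(gfortran --version 2>/dev/null | head -n1 | grep -oE '[0-9]+' | head -n1)
if [ "${gfc_major:-0}" -ge 10 ]; then
  export BIN=Linux2_x86_64gfort10
else
  export BIN=Linux2_x86_64gfort
fi
echo "Using I/O API BIN type: $BIN"
```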

If your subset grid is in fact “cut out” from the larger grid, then the per-cell emissions values should be an exact match, assuming you’re using the M3Tools program m3wndw (or some other “correct” binary windowing program) and not going through an ASCII intermediate data file (note that the language standards explicitly do not guarantee the quality of rounding in conversions to/from ASCII).
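If you end up doing the windowing yourself, a rough sketch follows; the paths are placeholders, and I believe m3wndw uses the usual M3Tools INFILE/OUTFILE logical names and prompts for the output grid, but check its documentation to be sure:

```shell
export GRIDDESC=/path/to/GRIDDESC     # must contain the subset grid definition
export INFILE=/path/to/full_12US1_file.ncf
export OUTFILE=/path/to/windowed_subset.ncf
m3wndw    # answer the prompts, giving the subset grid name
```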

@cjcoats thanks, this answers a lot of my questions. Your new edit just answered my next question (Do the new I/O API BIN types relate to the compilation of the SMOKE executables or somewhere else?)

Since the influence of the grid points surrounding the subset grid of the EPA /premerged/ or /smoke_out/ data would be reduced to the equivalent of time-varying boundary conditions, would there be a noticeable difference from the SMOKE run subset grid where the boundary conditions are imposed by the input data?

No. It should be exact, independent of the boundary-met (etc.).

Thanks @eyth.alison and @cjcoats, I’ll think about how to best approach this next task.