Problems running latest SMOKE test case

Describe your issue in detail

After a recent hardware upgrade I just finished updating our Dell T640 40C/80T server from CentOS7 to Rocky Linux 10.1 and also installed the software prerequisites like tcsh, Intel C++ and Fortran compilers, netcdf-c, netcdf-fortran and others. I made sure I was using tcsh and expected to run all the sectors provided without editing any scripts, but of the 16 available sectors I was only able to successfully run airports, beis4, ptfire-wild, ptfire-rx and fertilizer. For all the other sectors I was getting variations of this error message:

SCRIPT ERROR: timetracker script found multiple existing entries
for the following primary keys in the file already:
Sector = ptegu
Job Name = Annual_area
Program = smkreport
Run Date = inv
This should not happen, so script does not know how to
replace the entries for these keys
ERROR: Problem calling timetracker from qa_run script
ERROR: Running qa_run for invgrid

What version of the SMOKE software are you using?

I downloaded and installed the SMOKE example case as described in the CEMPD/SMOKE-ExampleCase-v2 repository on GitHub ("SMOKE Example Case Using EPA's Emission Modeling Platform format"), and I am running the precompiled executables from there.

What steps have you taken to resolve this issue already?

Since the error messages pointed to timetracker_v2.csh, I followed the advice in this forum post: "No Intermediate Log File for Temporal Allocation" (#6, by mahsasoleimani). I copied the version of timetracker_v2.csh provided there and used chmod to make it executable, but that was insufficient to solve the problem.

My next attempt was to add ‘all, 12LISTOS, timetracker, all, 0, 0, N’ to the run_settings.txt but that was also unsuccessful.

Finally, I followed the advice from the forum post "SMOKE ptegu ERROR", which mentioned that running smkreport is not essential to complete SMOKE runs. I turned off smkreport by editing smk_ar_annual_emf.csh, smk_ar_monthly_emf.csh, smk_pt_annual_onetime_emf.csh and smk_pt_daily_emf.csh to set run_smkreport and run_m3stat to ‘N’.

At this point I was able to successfully run all 16 sectors available in the SMOKE test case. What I don’t understand is why switching off smkreport and m3stat (supposedly unneeded) makes such a big difference. Is this unique to the test case? And what happens if I’m running my own cases and need the output from smkreport?

Thanks for any clarification. @huytran6583, this is a follow-up to my last email to you.

Hi Lorenzo,

Thanks for reporting the issue and your related email to me. I’ve rerun the SMOKE example package and I could not replicate the issue that you reported here.

After a deeper investigation, I found several factors that might explain the issue you are seeing:

  1. Buggy timetracker_v2.csh that came with SMOKE Example Case v2:
    In smoke_example_case/smoke5.1/scripts/run/timetracker_v2.csh (lines 139-143), the script deletes an existing TIMELOG entry using /bin/ed via a heredoc:
    /bin/ed -s $TIMELOG << EOF
    H
    ${nrow}d
    w
    EOF

On your Rocky Linux 10.1 system, /bin/ed is likely not available, so this code block fails silently: the delete command is never executed, the script continues as if the deletion had succeeded, and the old TIMELOG entry is NOT removed. (This could explain why I could not replicate the issue, since /bin/ed does exist on my system.)
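A quick way to confirm whether this applies to your system is to test for the editor directly. The following is a minimal sketch in POSIX sh (run it with sh rather than tcsh); check_tool is a hypothetical helper name used only for illustration and is not part of SMOKE.

```sh
# check_tool PATH: report whether an executable exists at PATH.
# Hypothetical helper for illustration; not part of the SMOKE scripts.
check_tool() {
    if [ -x "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# The path below is the one the heredoc in timetracker_v2.csh calls.
check_tool /bin/ed
```

If this reports the editor as missing, that would be consistent with the silent failure described above.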

  1. The solution for this buggy timetracker script is to get a newer version of timetracker_v2.csh that does not use /bin/ed. In fact, the timetracker_v2.csh that Alison provided in the forum post you linked is the correct one to use (it looks like it came from EMP 2022v2). Instead of /bin/ed, it uses sed and mv to remove the entry from the $TIMELOG file.

  2. However, as you noted, replacing timetracker_v2.csh did not completely resolve your issue, probably because residual $TIMELOG files from previous incomplete runs were still present. To resolve it fully, clean up the log directory or, better yet, remove the intermediate/ directory before re-running with the updated timetracker_v2.csh.

  3. Why the issue affected only some sectors and not others: the crash requires at least three active smkreport/timetracker calls in one run. Whether a sector reaches that number depends on which REPCONFIG_INV* environment variables are set in its sector-specific run script. Sectors with fewer active REPCONFIG files make fewer qa_run calls and may crash later or not at all.
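To illustrate the sed and mv approach to deleting a line without /bin/ed, here is a minimal sketch in POSIX sh (run it with sh rather than tcsh). It is not the code from the newer timetracker_v2.csh; delete_line is a hypothetical helper, and the file contents are made-up placeholders rather than real TIMELOG entries.

```sh
# delete_line FILE N: remove line N from FILE without using ed.
# Hypothetical helper for illustration; not taken from timetracker_v2.csh.
delete_line() {
    file=$1
    nrow=$2
    # Write every line except line $nrow to a temporary file, then
    # replace the original only if sed exited successfully.
    sed "${nrow}d" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage on a scratch file (placeholder entries, not real TIMELOG rows).
demo=$(mktemp)
printf 'entry1\nentry2\nentry3\n' > "$demo"
delete_line "$demo" 2
cat "$demo"
rm -f "$demo"
```

Because the original file is only replaced when sed succeeds, a missing or failing tool leaves the file untouched and returns a non-zero status instead of continuing silently.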

Hope this helps.

Btw, I recently developed an interactive tool to run SMOKE. Please check it out and let me know what you think. Thanks.

Huy