Program Exception when running SMOKE-TestCase

We successfully compiled IOAPI, CMAQ, and SMOKE using the gcc compilers on a Debian 10 machine:

Linux hydra 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

using:

gnu/10.2.1
ioapi/v3.2
netcdf_c-4.8.1_fortran-4.5.3
openmpi-4.1.2

We downloaded the scripts and data from the SMOKE-TestCase repository:

git clone -b main https://github.com/CEMPD/SMOKE-TestCase.git SMOKE-TestCase

However, when we ran the first sector, nonpoint, we got the following error:

SMOKE-TestCase/scripts/nonpoint/Annual_np_oilgas_12US1_2017gb_17j_TestCase.csh

(...)
SCRIPT NOTE: No duplicates found in ${SMOKEHOME}/ge_dat/temporal/atref_2017platform_np_oilgas_ONLY_2017_27apr2020_nf_v1
Running part 1...

SCRIPT NOTE: Automatically deleting log file.
      ${SMOKEHOME}/SMOKE-TestCase/intermed/np_oilgas/logs/smkinven_np_oilgas_SMOKE-TestCase.log

forrtl: severe (168): Program Exception - illegal instruction
Image              PC                Routine            Line        Source
smkinven           0000000000659A0A  Unknown               Unknown  Unknown
libpthread-2.31.s  000014F9438B5140  Unknown               Unknown  Unknown
smkinven           00000000007F9629  Unknown               Unknown  Unknown
smkinven           00000000005721A9  nameval_                  128  nameval.c
smkinven           00000000005B37C0  initlog3_                  86  initlog3.F
smkinven           000000000055C7AB  init3_                    166  init3.F90
smkinven           000000000045D34B  MAIN__                    183  smkinven.f
smkinven           000000000040565E  Unknown               Unknown  Unknown
libc-2.31.so       000014F9436FAD0A  __libc_start_main     Unknown  Unknown
smkinven           0000000000405569  Unknown               Unknown  Unknown
0.000u 0.005s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
SCRIPT ERROR: timetracker script found multiple existing entries
              for the following primary keys in the file already:
      Sector   = np_oilgas
      Job Name = Annual_area
      Program  = smkinven
      Run Date = inv
      This should not happen, so script does not know how to
      replace the entries for these keys
ERROR: Problem calling timetracker from smk_run script
ERROR: Running smk_run for one-time steps

We are new to CMAQ and SMOKE, so any help would be very welcome.

Solange and Lluís

Hello,

It is not clear to me whether you have precisely followed ALL the steps listed on the GitHub page:

Specifically, did you download the data from Google Drive, place it in the directory indicated there, and make all the required changes to the run scripts?

If so, please post your log from the command at step 4.

Also, I assume that your other thread is about the same issue? Let's keep a single thread for future reference (you can always update a question or thread in this forum; there is no need to keep creating new ones).

Thank you for your answer,

We opened the other post because we thought it was a separate topic, important enough to keep distinct.

Yes, as far as we understand, we followed all the steps, and the files from the drive are in the right place.

We attach the log: run_nonpoint.txt

Thanks for the log file.
I think that the underlying issue may be the same in both threads and related to the way the source command is invoked on your system.

Specifically, in my logs, I do not see the 8th line of your logs:
source: shell built-in command.

How do you try to run these scripts? Are you using slurm/sbatch command?
Do you have both bash and tcsh on your system? Which one is the default?

Note that you need to be running in the C shell for the source command to work.

At the top of the scripts, there may be something like #!/bin/csh

If csh does not exist in /bin on your system, there could be problems. Likewise, if the path to the file being sourced is set incorrectly, that would be a problem.

A minor caveat here: Red Hat is trying to change the directory standard so that shells live in /usr/bin instead of /bin (e.g., /usr/bin/tcsh). Also note that the full path of tcsh must be listed in /etc/shells, the file of allowed shells.
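The checks described above (does csh exist in /bin, what it actually points to, and whether it is listed in /etc/shells) can be scripted. This is a minimal Python 3 sketch, not part of SMOKE; the function names are hypothetical, and the candidate paths are assumptions that cover both the /bin and /usr/bin layouts:

```python
import os
import shutil


def parse_shells(text):
    """Return the set of shell paths listed in /etc/shells-style text (comments ignored)."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")}


def shell_status(shell_path, allowed):
    """Check whether a shell exists, what it resolves to, and whether it is allowed."""
    exists = os.path.exists(shell_path)
    # Follow symlinks, e.g. /bin/csh -> /bin/tcsh or /etc/alternatives/csh
    resolved = os.path.realpath(shell_path) if exists else None
    listed = shell_path in allowed or (resolved is not None and resolved in allowed)
    return {"path": shell_path, "exists": exists,
            "resolves_to": resolved, "listed": listed}


if __name__ == "__main__":
    try:
        with open("/etc/shells") as fh:
            allowed = parse_shells(fh.read())
    except FileNotFoundError:
        allowed = set()
    for candidate in ("/bin/csh", "/bin/tcsh", "/usr/bin/csh", "/usr/bin/tcsh"):
        print(shell_status(candidate, allowed))
    print("tcsh found on PATH at:", shutil.which("tcsh"))
```

A shell that exists but is not listed in /etc/shells, or a /bin/csh that resolves somewhere unexpected, would be worth raising with the system administrators.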


Thank you for your suggestions.

At this stage, we are not using any queue system for our tests, so we are running directly in the terminal.

The OS on the HPC system we are using (again, no queue system is involved):
Linux hydra 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

Our IT staff linked /bin/tcsh as /bin/csh.

We created a tiny tcsh script, called test.tsch, to test our system:

#!/bin/csh
   set      var = '1234'

Our HPC system uses bash by default. Running the test:

~$ source ./test.tsch 
~$ echo $var

~$ csh
~% source test.tsch 
~% echo $var
1234

Do you think there is any other, even simpler test (than the SMOKE-TestCase) that we could perform?

Thanks

You need to understand the difference between set and setenv:

  • set establishes a variable within the shell-executable, invisible outside the shell-executable itself.
  • setenv establishes a variable at the operating-system level, visible to all executables launched afterwards.
    You need variables visible within the CMAQ-executable (etc.), hence you must use setenv.
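The same rule holds for any child process, which is why executables such as smkinven only see variables defined with setenv. The minimal Python 3 sketch below illustrates the idea outside csh (it is an analogy, not csh itself): an ordinary variable in the parent stays private, while a variable placed in the environment is inherited by the child. The child program is a hypothetical stand-in for a SMOKE executable:

```python
import os
import subprocess
import sys

# Stand-in for an executable such as smkinven: it prints what it finds in its environment.
CHILD_CODE = (
    "import os\n"
    "print(os.environ.get('var', '<unset>'))\n"
    "print(os.environ.get('varRoman', '<unset>'))\n"
)


def run_child():
    """Launch a child process and return the two values it reports."""
    var = "1234"  # like csh `set`: an ordinary variable, private to this process
    env = dict(os.environ, varRoman="MCCXXXIV")  # like `setenv`: placed in the environment
    env.pop("var", None)  # ignore any 'var' inherited from the calling shell
    print(f"parent: var={var}, varRoman={env['varRoman']}")
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    print("child sees:", run_child())
```

The child reports `<unset>` for `var` and `MCCXXXIV` for `varRoman`, mirroring what a Fortran executable would see after `set var` versus `setenv varRoman`.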

Thank you for the answer; what you say certainly makes sense.

However, we did not write these scripts; we are using the SMOKE-TestCase scripts as they come. To test what you mention, we modified test.tsch:

#!/bin/csh
   echo "Previous var: "${var}" varRoman: "${varRoman}
   set      var = '1234'
   setenv   varRoman 'MCCXXXIV'

Running it, it behaves as expected:

$ source ./test.tsch 
Previous var:  varRoman: 
-bash: setenv: command not found
~$ csh
~% source ./test.tsch
var: Undefined variable.
~% echo ${var}
var: Undefined variable.
~% vim ./test #(remove the first echo)
hydra:~% source ./test.tsch
hydra:~% echo $var
1234
hydra:~% echo $varRoman
MCCXXXIV
hydra:~% vim ./test.tsch # (add the first echo again)
hydra:~% source ./test.tsch
Previous var: 1234 varRoman: MCCXXXIV

As @chef mentions, in our log we do have the line:
source: shell built-in command.

However, it does not appear in our simple direct tests.

Thanks

Dear all,

Thank you for all your responses.

To summarize, we downloaded the SMOKE-TestCase and cannot get it to run on our HPC system.

We have some news.

We discovered that our machine was missing some system utilities:

  • GNU arbitrary precision calculator language bc
  • classic UNIX line editor ed
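Missing utilities like these fail quietly inside the run scripts, so it may help to check for them up front. A minimal Python 3 sketch; the tool list is an assumption based only on what this thread mentions (csh/tcsh, bc, ed, python3) and is not an official SMOKE requirements list:

```python
import shutil

# Tools mentioned in this thread; this list is an assumption, extend it as needed.
REQUIRED_TOOLS = ("csh", "tcsh", "bc", "ed", "python3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing utilities:", ", ".join(missing))
    else:
        print("All listed utilities found.")
```

Run it from the same shell and environment that launches the SMOKE scripts, since `shutil.which` searches the current PATH.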

Also, we forgot to mention that we had already made some small modifications to the code and scripts to make them work:

  • In line #1 of smoke4.7/scripts/log_analyzer/log_analyzer.py, smoke4.7/scripts/run/path_parser.py, and smoke4.7/scripts/annual_report/annual_report_v2.py, we introduced:
#!/usr/bin/env python3
  • When compiling IOAPI with gcc, we also needed to modify line #104 of ioapi/v3.2-20200828/gnu/ioapi/fixed_src/PARMS3.EXT, joining the continued declaration into a single line:
!        INTEGER, PARAMETER :: M3TYPES( NM3TYPES ) =                             &
!     &        (/ M3INT, M3REAL, M3DBLE, M3INT8 /)
        INTEGER, PARAMETER :: M3TYPES( NM3TYPES ) = (/ M3INT, M3REAL, M3DBLE, M3INT8 /)
  • Because we compiled with gcc, we also needed to change smoke4.7/scripts/platform, which is set up for Intel by default:
#set comp_abbrv = ifort 
set comp_abbrv = gcc
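If the scripts need to be patched again (for example, after a fresh download), the shebang and compiler edits listed above can be applied programmatically. A minimal Python 3 sketch, assuming it is run from the directory that contains smoke4.7/ and that the files are otherwise as distributed; the helper names are hypothetical, so review the resulting changes before relying on them:

```python
import re
from pathlib import Path

SHEBANG = "#!/usr/bin/env python3"


def fix_shebang(text):
    """Replace (or add) the first-line shebang so the script runs with python3."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        lines[0] = SHEBANG + "\n"
    else:
        lines.insert(0, SHEBANG + "\n")
    return "".join(lines)


def set_compiler(text, compiler="gcc"):
    """Comment out active 'set comp_abbrv = X' lines and add one for the chosen compiler."""
    pattern = re.compile(r"^([ \t]*)set[ \t]+comp_abbrv[ \t]*=[ \t]*(\S+)[ \t]*$",
                         re.MULTILINE)

    def repl(match):
        indent, old = match.group(1), match.group(2)
        if old == compiler:  # already set: leave the line alone
            return match.group(0)
        return f"{indent}#set comp_abbrv = {old}\n{indent}set comp_abbrv = {compiler}"

    return pattern.sub(repl, text)


def patch_file(path, transform):
    """Apply a text transform to a file in place; return True if it changed."""
    p = Path(path)
    original = p.read_text()
    updated = transform(original)
    if updated != original:
        p.write_text(updated)
    return updated != original


if __name__ == "__main__":
    scripts = [
        "smoke4.7/scripts/log_analyzer/log_analyzer.py",
        "smoke4.7/scripts/run/path_parser.py",
        "smoke4.7/scripts/annual_report/annual_report_v2.py",
    ]
    for script in scripts:
        if Path(script).exists():
            print(script, "changed" if patch_file(script, fix_shebang) else "unchanged")
    platform = Path("smoke4.7/scripts/platform")
    if platform.exists():
        print(platform, "changed" if patch_file(platform, set_compiler) else "unchanged")
```

Both transforms are idempotent, so running the script twice leaves the files unchanged the second time.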

We got an error message in the point sector of the test case when it runs smkreport:

(...)
malloc(): corrupted top size
(...)

So we recompiled SMOKE in debug mode (we could not compile IOAPI in debug mode because of some compilation errors related to flags [-Wtabs related]) and reran the cases. By doing so, we located the point where the error happens (it now also occurs for nonpoint, in addition to the point sector):

  • nonpoint:
(...)
At line 1902 of file /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/wrrephdr.f
Fortran runtime error: Index '20' of dimension 1 of array 'outdnam' above upper bound of 19

Error termination. Backtrace:
#0  0x14e579400bd0 in ???
#1  0x14e579401685 in ???
#2  0x14e579401c76 in ???
#3  0x55e8e8e454dc in wrrephdr_
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/wrrephdr.f:1902
#4  0x55e8e8e3a2e9 in smkreport
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/smkreport.f:406
#5  0x55e8e8df5978 in main
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/smkreport.f:47
     SMOKE ---------------
     Copyright (c)2004 Environmental Modeling for Policy Development
(...)
  • point:
(...)
At line 560 of file /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/bldrepidx.f
Fortran runtime error: Index '20' of dimension 1 of array 'outdnam' above upper bound of 19

Error termination. Backtrace:
#0  0x14f1d4b2bbd0 in ???
#1  0x14f1d4b2c685 in ???
#2  0x14f1d4b2cc76 in ???
#3  0x55d128c00dac in bldrepidx_
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/bldrepidx.f:560
#4  0x55d128c2e775 in smkreport
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/smkreport.f:179
#5  0x55d128bea978 in main
        at /home/solange.luque/MODELOS/SMOKE_dbg/gcc/src/emqa/smkreport.f:47
     SMOKE ---------------
(...)
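With bounds checking enabled, gfortran reports the failing source line and the offending index, so every such failure can be collected across the SMOKE logs. A minimal Python 3 sketch, assuming the message format shown in the output above; the function name is hypothetical, and the "below lower bound" variant is included on the assumption that gfortran words it analogously:

```python
import re
import sys

# Patterns for the two gfortran runtime-check lines shown above.
LOCATION_RE = re.compile(r"At line (\d+) of file (\S+)")
BOUNDS_RE = re.compile(
    r"Index '(-?\d+)' of dimension (\d+) of array '(\w+)' "
    r"(above upper|below lower) bound of (-?\d+)"
)


def parse_bounds_errors(log_text):
    """Return one dict per array-bounds error, paired with the preceding 'At line' location."""
    results = []
    location = None
    for line in log_text.splitlines():
        loc = LOCATION_RE.search(line)
        if loc:
            location = (loc.group(2), int(loc.group(1)))
            continue
        err = BOUNDS_RE.search(line)
        if err and location:
            results.append({
                "file": location[0],
                "line": location[1],
                "array": err.group(3),
                "dimension": int(err.group(2)),
                "index": int(err.group(1)),
                "bound": int(err.group(5)),
                "kind": err.group(4),
            })
            location = None
    return results


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for hit in parse_bounds_errors(fh.read()):
                print(f"{path}: {hit['file']}:{hit['line']} {hit['array']} "
                      f"(dim {hit['dimension']}) index {hit['index']} vs bound {hit['bound']}")
```

In both failures above, J reaches 20 while the first dimension of OUTDNAM has an upper bound of 19.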

Digging into the code, both failing lines involve an index named J: OUTDNAM( J,RCNT ) at line #1902 of src/emqa/wrrephdr.f, and OUTDNAM( J,N ) = SBUF at line #560 of src/emqa/bldrepidx.f.

We tried to understand how J is defined and how OUTDNAM is dimensioned, but we could not find any clue.

Any help will be very welcome.

Solange