Hello CMAS Community!
I’m consistently encountering segmentation faults (SIGSEGV) when running coupled WRF 4.7.5 and CMAQ (cb6r5_ae7_aq) simulations (the error text from rsl.error.0001 is at the end of this message). The crash always occurs exactly one MCIP output interval after the simulation start:
- For instance, when running a simulation starting at 12:00:00 with MCIP met data generated at a 3-hour interval, the model successfully runs until exactly 15:00:00, then crashes.
- Similarly, if I run a simulation starting at 12:00:00 using MCIP met data generated at a 6-hour interval, the model will run successfully up to exactly 18:00:00 before crashing.
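To make the pattern concrete, here is a minimal sketch of the relationship I'm seeing: the crash time equals the start time plus one MCIP interval. The helper name `expected_crash_time` is hypothetical and only illustrates the arithmetic; it is not part of WRF, CMAQ, or MCIP. The date is taken from the log at the end of this message.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the WRF rsl logs, e.g. 2024-08-07_12:00:00
WRF_TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"


def expected_crash_time(start: str, mcip_interval_hours: int) -> str:
    """Return start + one MCIP interval, formatted like a WRF timestamp.

    Hypothetical helper: it only encodes the observed pattern,
    not anything WRF or CMAQ computes internally.
    """
    start_dt = datetime.strptime(start, WRF_TIME_FORMAT)
    crash_dt = start_dt + timedelta(hours=mcip_interval_hours)
    return crash_dt.strftime(WRF_TIME_FORMAT)


if __name__ == "__main__":
    for interval in (3, 6):
        print(interval, "h ->", expected_crash_time("2024-08-07_12:00:00", interval))
```

With a 12:00:00 start this gives 15:00:00 for the 3-hour case and 18:00:00 for the 6-hour case, matching the two runs described above.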
The crashes are fully reproducible and always land exactly on these MCIP intervals. My domain is relatively small (~400 km²; 100 × 100 grid cells at 200 m resolution), and I’m running with a small WRF timestep (1 second). I’m coupling every 120 WRF timesteps (wrf_cmaq_freq = 120). I’ve tried changing CTM_MAXSYNC and CTM_MINSYNC but saw no improvement.
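For reference, the numbers implied by this setup can be worked out with a short sketch. All function names here are hypothetical helpers for the arithmetic only; the inputs are the values quoted above, and the ratio limit of 6 is the `reasonable_time_step_ratio` reported in the log at the end of this message.

```python
WRF_DT_S = 1.0        # WRF timestep in seconds
WRF_CMAQ_FREQ = 120   # WRF steps between CMAQ calls (wrf_cmaq_freq)
DX_M = 200.0          # grid spacing in metres
NX = NY = 100         # grid cells in each horizontal direction


def coupling_interval_s(dt_s: float, freq: int) -> float:
    """Seconds of model time between successive CMAQ calls."""
    return dt_s * freq


def cmaq_calls_in(hours: float, dt_s: float, freq: int) -> int:
    """Number of coupling intervals that fit in the given number of hours."""
    return round(hours * 3600.0 / coupling_interval_s(dt_s, freq))


def domain_area_km2(nx: int, ny: int, dx_m: float) -> float:
    """Horizontal domain area in square kilometres."""
    return (nx * dx_m / 1000.0) * (ny * dx_m / 1000.0)


def dt_dx_ratio(dt_s: float, dx_m: float) -> float:
    """Timestep-to-grid-distance ratio in s/km, as printed in the rsl log."""
    return dt_s / (dx_m / 1000.0)


if __name__ == "__main__":
    print("coupling interval (s):", coupling_interval_s(WRF_DT_S, WRF_CMAQ_FREQ))
    print("CMAQ calls in 3 h:", cmaq_calls_in(3, WRF_DT_S, WRF_CMAQ_FREQ))
    print("CMAQ calls in 6 h:", cmaq_calls_in(6, WRF_DT_S, WRF_CMAQ_FREQ))
    print("domain area (km^2):", domain_area_km2(NX, NY, DX_M))
    print("dt/dx (s/km):", dt_dx_ratio(WRF_DT_S, DX_M))
```

So CMAQ is called every 120 s of model time, which means roughly 90 successful coupling steps before the 3-hour crash and 180 before the 6-hour one. The dt/dx ratio of 5 s/km is below the limit of 6 printed in the log.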
I reviewed this forum post on segmentation faults and CFL issues: Segmentation Faults and CFL Errors | WRF & MPAS-A Support Forum. I checked for CFL error messages in the output but saw none.
I also tried the recommended memory and stack size tweaks:
ulimit -s unlimited
export MP_STACK_SIZE=64000000
export OMP_STACKSIZE=64M
These did not resolve the issue.
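One thing I have not ruled out is whether the stack limit set in my shell actually reaches the model processes. A minimal sketch for checking this, assuming a Unix system with Python available, is to print the limit from inside a process started the same way as the model. The helper `describe_limit` is hypothetical; `resource.getrlimit` and `resource.RLIMIT_STACK` are from the Python standard library.

```python
import resource


def describe_limit(value: int) -> str:
    """Format a resource limit in KiB, or 'unlimited' if no limit is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


if __name__ == "__main__":
    # Soft limit is what the process is currently held to;
    # hard limit is the ceiling the soft limit may be raised to.
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("stack soft limit:", describe_limit(soft))
    print("stack hard limit:", describe_limit(hard))
```

If the soft limit printed here is not "unlimited" when the script is launched through the same job script or launcher as the model, the `ulimit -s unlimited` setting may not be taking effect for the model ranks.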
You can access all the relevant input and output files here: Model inputs and outputs - Google Drive
Has anyone experienced a similar issue, or does anyone have suggestions on how to debug or resolve these crashes?
I’m attempting to model the hyperlocal cooling effect of reflecting sunlight by injecting reflective aerosols over a small domain, including how quickly the aerosols disperse and the cooling effect is lost. This is an unusual use case. If anyone has suggestions on how to better model something occurring on the scale of a few km², I’d be happy to hear them!
Thanks very much for your time!
rsl.error.0001:
taskid: 1 hostname: ip-172-31-83-148
module_io_quilt_old.F 2931 F
Quilting with 1 groups of 0 I/O tasks.
Ntasks in X 4 , ntasks in Y 8
Domain # 1: dx = 200.000 m
WRF V4.7.0 MODEL
git commit e204519f0dc13c99bbaf39e8a818993cc36209ad 3 files changed, 0 insertions(+), 0 deletions(-)
Parent domain
ids,ide,jds,jde 1 100 1 100
ims,ime,jms,jme 19 57 -4 20
ips,ipe,jps,jpe 26 50 1 13
DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 38940216 bytes allocated
med_initialdata_input: calling input_input
Input data is acceptable to use:
CURRENT DATE = 2024-08-07_12:00:00
SIMULATION START DATE = 2024-08-07_12:00:00
Max map factor in domain 1 = 1.00. Scale the dt in the model accordingly.
D01: Time step = 1.00000000 (s)
D01: Grid Distance = 0.200000003 (km)
D01: Grid Distance Ratio dt/dx = 5.00000000 (s/km)
D01: Ratio Including Maximum Map Factor = 4.98540926 (s/km)
D01: NML defined reasonable_time_step_ratio = 6.00000000
Normal ending of CAMtr_volume_mixing_ratio file
GHG annual values from CAM trace gas file
Year = 2024 , Julian day = 220
CO2 = 4.2868195693653640E-004 volume mixing ratio
N2O = 3.3564240469132870E-007 volume mixing ratio
CH4 = 2.0060859710514744E-006 volume mixing ratio
CFC11 = 2.7307591809652235E-010 volume mixing ratio
CFC12 = 4.6381286791458461E-010 volume mixing ratio
INPUT LandUse = "MODIFIED_IGBP_MODIS_NOAH"
LANDUSE TYPE = "MODIFIED_IGBP_MODIS_NOAH" FOUND 61 CATEGORIES 2 SEASONS WATER CATEGORY = 17 SNOW CATEGORY = 15
INITIALIZE THREE Noah LSM RELATED TABLES
d01 2024-08-07_12:00:00 Input data is acceptable to use:
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 26 IE 50 JS 1 JE 13
WRF NUMBER OF TILES = 1
d01 2024-08-07_12:00:00 ----------------------------------------
d01 2024-08-07_12:00:00 W-DAMPING BEGINS AT W-COURANT NUMBER = 1.00000000
d01 2024-08-07_12:00:00 ----------------------------------------
d01 2024-08-07_18:00:00 Input data is acceptable to use:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x56fa59b0d44b in ???
#1 0x56fa59b0c2df in ???
#2 0x56fa59a108f7 in ???
#3 0x56fa5a241ce8 in __memcpy_sve
at ../sysdeps/aarch64/multiarch/memcpy_sve.S:77
#4 0x56fa5a5ca63f in ???
#5 0x56fa5a03826b in ???
#6 0x56fa5a61bea7 in ???
#7 0x56fa5a59421f in ???
#8 0x56fa59e979fb in ???
#9 0x56fa59f04027 in ???
#10 0x56fa59f39493 in ???
#11 0x56fa59eade1b in ???
#12 0xc02bb3386037 in ???
#13 0xc02bb380a08f in ???
#14 0xc02bb38312eb in ???
#15 0xc02bb388ae97 in ???
#16 0xc02bb388f407 in ???
#17 0xc02bb30bbf87 in ???
#18 0xc02bb30ab783 in ???
#19 0xc02bb30aad7b in ???
#20 0x56fa5a1c84c3 in __libc_start_call_main
at ../sysdeps/nptl/libc_start_call_main.h:58
#21 0x56fa5a1c8597 in __libc_start_main_impl
at ../csu/libc-start.c:360
#22 0xc02bb30aadef in ???
#23 0xffffffffffffffff in ???
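Since the backtrace has no symbols, the most useful information in the log is the last timestamped message before the crash. Below is a minimal sketch for pulling that out of an rsl file; the function name `last_timestamp_before_crash` is hypothetical, and the regular expression assumes the `d01 YYYY-MM-DD_HH:MM:SS` prefix seen in the log above. The sample lines are copied from that log.

```python
import re

# Matches lines such as "d01 2024-08-07_18:00:00 Input data is acceptable to use:"
TIMESTAMP_RE = re.compile(r"^d\d{2} (\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})")


def last_timestamp_before_crash(lines):
    """Return the last model timestamp logged before a SIGSEGV line.

    Returns None if the log contains no SIGSEGV line, or if no
    timestamped line precedes it.
    """
    last = None
    for line in lines:
        if "SIGSEGV" in line:
            return last
        match = TIMESTAMP_RE.match(line.strip())
        if match:
            last = match.group(1)
    return None


SAMPLE = [
    "d01 2024-08-07_12:00:00 W-DAMPING BEGINS AT W-COURANT NUMBER = 1.00000000",
    "d01 2024-08-07_18:00:00 Input data is acceptable to use:",
    "Program received signal SIGSEGV: Segmentation fault - invalid memory reference.",
]

if __name__ == "__main__":
    print(last_timestamp_before_crash(SAMPLE))
```

For the 6-hour run shown above, the last message before the fault is "Input data is acceptable to use:" at 2024-08-07_18:00:00, so the crash follows immediately after that input check.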