Missing wrfout time steps for running MCIP

Hi everyone,

I am attempting to run MCIP5.4 on the wrfout files (WRF4-2-1) of my 3 nested domains from 2017-07-01 to 2017-07-15. After running MCIP for the first domain, I got an error that there is no data for 2017-07-04_15:00:00, while the days before have all 24 time steps!

I checked wrfout_d01_2017-07-04 and found that there are no time steps after 15:00.

I ran WRF again for 2017-07-04 and the output has just 19 time steps.

How can I figure out this issue?

I downloaded gfs data for July 2017 from here

Hi,
It sounds like an issue with your WRF run. You should check your namelist.input file for WRF and make sure it has the appropriate end_day and end_hour.
However, the fact that you reran WRF and it stopped producing output at a different time makes me think that your WRF run is crashing midway through the 15th because of some instability issue. If you look at the last line in your rsl.error.0000 file, does it say “SUCCESS COMPLETE WRF”? If not, the run may have crashed and it would be a wrf issue to debug.
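The check on the rsl log can also be scripted. The following is a minimal sketch, assuming the log is a plain-text file and that a clean run prints the marker string quoted above somewhere in its last few lines; the function name and the `tail_lines` default are illustrative choices, not part of WRF.

```python
from pathlib import Path


def wrf_finished_cleanly(log_path, marker="SUCCESS COMPLETE WRF", tail_lines=5):
    """Return True if `marker` appears in the last few lines of a WRF rsl log.

    `tail_lines` is an assumed safety margin in case the marker is not
    literally the final line of the file.
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    return any(marker in line for line in lines[-tail_lines:])


# Example call (the path is whatever your run directory contains):
# print(wrf_finished_cleanly("rsl.error.0000"))
```

A result of False only says the marker was not found near the end of the file; the log itself still has to be read to see why.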

Megan

@mmallard Thank you for your response,

Actually, WRF ran successfully with start_date: 2017-07-01_00:00:00 and end_date: 2017-07-14_18:00:00 and it said “SUCCESS COMPLETE” after finishing, and I have all wrfout files for these dates. However, some days like 2017-07-04 don’t include all 24 time steps, while the days before and after have all 24 time steps!

Sure, no problem. In your original post, you’d mentioned wanting to run mcip through the 15th, so in that case you’d need to adjust your end date accordingly to produce data for the entire period you want to run mcip.
But that is odd about your wrfout file for the 4th having fewer time steps. You can use the command "ncdump -v Times" on that file to see which time steps it actually wrote, which could help debug the problem.

Megan

edited to add… What is frames_per_outfile set to in your wrf namelist.input?

@mmallard,

You may find attached the namelist.input and the ncdump output for wrfout_2017-07-04_00:00:00.

It is worth mentioning that all other WRF output files contain all 24 time steps, except for the 4th and 9th days. I ran WRF for these days in a single run, and there were no errors; it finished successfully.

Based on the ncdump Times output, there are just 15 time steps for July 4th 2017 in the wrfout file.
namelist.input.txt (3.9 KB)

Hi,
If you use ncdump -v Times you can see what specific times were written to the file… Is it the first 15 hours, then nothing? Or did it write the whole day but split up into 15 increments instead of 24? That info would be useful for debugging.
Also, I’m not sure what you mean when you say you ran these periods in a “single run”… Did you use a wrfrst file to restart the run? Or did you initialize separate runs on the 4th and 9th? If it’s the latter, you would encounter an issue with using MCIP because of the way WRF accumulates precipitation fields. See this thread for info on that: MCIP/WRF Precipitation - MCIP - CMAS CENTER FORUM

Megan

It’s been a long time since I ran WRF…
I don’t know whether this matters, but your start and end times are inconsistent with your run duration. Your start time is 2017-07-01_00:00:00 and the end time is 2017-07-14_18:00:00, which is a duration of 13 days and 18 hours, but your run_days is 14 and run_hours is 6.
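The comparison between the start/end dates and the run length is easy to check with a few lines of Python. This is just a sketch using the values quoted in this thread; the helper name `days_hours` is made up for illustration.

```python
from datetime import datetime, timedelta

# Values quoted in the thread
start = datetime(2017, 7, 1, 0)              # start time 2017-07-01_00:00:00
end = datetime(2017, 7, 14, 18)              # end time   2017-07-14_18:00:00
requested = timedelta(days=14, hours=6)      # run_days = 14, run_hours = 6


def days_hours(td):
    """Split a timedelta into whole days and remaining whole hours."""
    total_hours = int(td.total_seconds() // 3600)
    return divmod(total_hours, 24)


span = end - start
print("start-to-end span : %d days %d hours" % days_hours(span))
print("run_days/run_hours: %d days %d hours" % days_hours(requested))
print("consistent:", span == requested)
```

If the last line prints False, the two ways of specifying the run length disagree and one of them should be corrected or removed.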

I suggest running for 15 days and 0 hours. Remove the specification of end_year, end_month, end_day, and end_hour and let WRF figure those out from the start time and run length.

@hhallaji –

Even though @cgnolte is correct that some of the time elements in your namelist (i.e., run_days and run_hours) do not match the start and end time, WRF does not use those namelist variables. (I believe “real” uses those namelist variables.) The settings did not affect your WRF run, as long as you have enough data from “real” to support your WRF simulation.

You may want to double-check your “real” output to be sure you have enough data.

Also, if you intend to use these WRF data for CMAQ, then you may want to revisit your settings for the “fdda” group in the WRF namelist, as well as the setting for the “restart” variable in your “time_control” namelist.

–Tanya

Hello Megan,

Sorry for the late reply.
I used ncdump -v Times for the fourth day, and it only shows the first 15 hours, then nothing.

I meant that I ran WRF with start_date: 2017-07-01_00:00:00 and end_date: 2017-07-14_18:00:00 using restart: .true… However, after completing the WRF run and obtaining all wrfout files for those 14 days, I found that the fourth day (wrfout_d01_2017-07-04) only has the first 15 time steps instead of the usual 24 time steps like the other days. This discrepancy caused an issue when I ran MCIP with those wrfout files. I am now trying to figure out how to obtain all time steps for that day.

@cgnolte & @tlspero
Many thanks for your responses.

I checked the wrfbdy_d01 file using “ncdump -v Times wrfbdy_d01”. You may find screenshots of this command and of the “fdda” group in namelist.input attached.


@hhallaji –

Thank you for posting your additional namelist and the screenshot of your available times from one of the “real” output files.

I see that there are missing times in the WRF simulation that ended prematurely after 14 UTC 4 July 2017. Did you check the WRF log files to determine whether the simulation on that day (and on the other days that were missing output) had crashed? If you have not already done so, please check the end of the file, “rsl.out.0000” to see what information you can pull from the file. It is possible that WRF crashed due to an issue in the model or due to a hardware issue.

Hope this helps.
Tanya

@tlspero

Thank you very much for your help.

I have even tried to run WRF just for 4th July 2017, and there are just 14 time steps for that day in the wrfout file.

I have attached the namelist.input and rsl.out.0000 for this run along with the screenshots of “ncdump -v Times wrfbdy_d01” and “ncdump -v Times wrfout_d01_2017-07-04”.

As you may see by reviewing these files, rsl.out.0000 shows the computation for all time steps, while there are just 14 time steps in the wrfout file.

namelist.txt (3.9 KB)
rsl.out.0000.txt (1.5 MB)


Please attach the output from the following command, as it is too difficult to evaluate screenshots.

ncdump -h wrfout_d01_2017-07-04 > wrfout_d01_2017-07-04.txt

I used the following command to see what the log file said that you had output to the file:

grep wrfout_d01  rsl.out.0000.txt

It shows that all 24 time steps should have been output to the wrfout_d01_2017-07-04 file. Perhaps there is a permission issue that doesn’t allow the wrfout file to be overwritten. In that case, please delete the WRF output file and rerun.

Timing for Writing wrfout_d01_2017-07-04_00:00:00 for domain        1:  193.88071 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_01:00:00 for domain        1: 2305.67700 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_02:00:00 for domain        1:    6.93398 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_03:00:00 for domain        1:    6.37924 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_04:00:00 for domain        1:    7.23648 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_05:00:00 for domain        1:    6.57017 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_06:00:00 for domain        1:    6.49574 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_07:00:00 for domain        1:    6.74876 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_08:00:00 for domain        1:    5.33212 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_09:00:00 for domain        1:    7.22383 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_10:00:00 for domain        1:   13.42353 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_11:00:00 for domain        1:    5.79683 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_12:00:00 for domain        1:    5.31540 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_13:00:00 for domain        1:    5.03726 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_14:00:00 for domain        1:    5.28103 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_15:00:00 for domain        1:    5.72725 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_16:00:00 for domain        1:    5.35812 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_17:00:00 for domain        1:    5.11757 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_18:00:00 for domain        1:    5.44742 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_19:00:00 for domain        1:    4.93142 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_20:00:00 for domain        1:    5.44579 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_21:00:00 for domain        1:    5.73225 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_22:00:00 for domain        1:    5.01085 elapsed seconds
Timing for Writing wrfout_d01_2017-07-04_23:00:00 for domain        1:    9.31539 elapsed seconds
Timing for Writing wrfout_d01_2017-07-05_00:00:00 for domain        1:    9.22005 elapsed seconds
d01 2017-07-05_00:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2017-07-05_00:00:00 Status =           -4
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    1151
 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
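The grep above can be taken one step further by counting how many output times the log reports per domain and day, which is then easy to compare against the number of frames that ncdump finds in the file. This is a minimal sketch, assuming the log lines follow the "Timing for Writing ..." pattern shown in the excerpt; the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# Timing for Writing wrfout_d01_2017-07-04_15:00:00 for domain 1: 5.7 elapsed seconds
WRITE_RE = re.compile(
    r"Timing for Writing (wrfout_d\d+)_(\d{4}-\d{2}-\d{2})_(\d{2}:\d{2}:\d{2}) for domain"
)


def frames_per_day(log_text):
    """Group the output times reported in an rsl log by (file prefix, day)."""
    frames = defaultdict(list)
    for prefix, day, clock in WRITE_RE.findall(log_text):
        frames[(prefix, day)].append(clock)
    return dict(frames)


# Example (rsl.out.0000.txt is the log attached earlier in the thread):
# text = open("rsl.out.0000.txt").read()
# for (prefix, day), clocks in sorted(frames_per_day(text).items()):
#     print(prefix, day, len(clocks), "frames, last at", clocks[-1])
```

A day for which the log reports 24 frames but the file holds fewer points to a problem at write time rather than in the model integration.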

@hhallaji –

I browsed through rsl.out.0000.txt, and the file shows that WRF is trying to write the output. For example:

Is your disk full? Or…did WRF crash and the output was in memory but not actually written to the file? See:

The other thought I have is that there could be something strange with your triple-nested domains (12-4-1.3-km) because the time step for the 1.3-km domain is 6.667 seconds, which makes the WRF time-keeping messy. Please try to run just the 12-km and 4-km domains here without the 1.3-km domain so we can see if this is related to time step of the 1.3-km domain.

–Tanya

@lizadams

Sorry for the difficulties.

You may find attached the output from the command.

Actually I have to say that before running WRF just for that day, I deleted all files output from real.exe and wrf.exe from previous runs and reran it.

But I will try one more time as you advised.

wrfout_do1_2017-07-04.txt (39.1 KB)

@tlspero - Dear Tanya

Here are my thoughts:

I believe the error in the last line is related to the end_date I set in namelist.input, since WRF didn’t have the information to do the computation for 2017-07-05_00:00:00.

One more thing, just as a question: if the time step for the 3rd domain makes the WRF timekeeping messy, why did it work for the following dates? When I ran WRF for 14 days, all the days except 4th July 2017 have the 24 time steps.

The disk is not full and WRF didn’t crash.

@hhallaji –

Thank you for the clarification.

I don’t think there is an error in your namelist for the time period:

image

It seems like the appropriate end time was set there. However–and this gets to your other question–it is possible that timekeeping was “off” inside WRF because of the 6.667-second time step for the 1.3-km domain. My guess–and it is just a guess, and I could be completely wrong–is that there are some intermittent real-number precision issues in the timekeeping, and that contributed to a spurious round-off issue inside of WRF in one of the timekeeping variables. The WRF simulation should have ended after the restart files were written, but instead WRF thought it had another time step to complete. In other words, instead of completing 1440.0 minutes of the simulation, one of the timekeeping variables may have something like 1439.999996 minutes, and WRF moved forward to try to get to 1440.0.
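The general kind of round-off described here can be illustrated outside of WRF. The sketch below assumes the 6.667-second step is really 20/3 seconds (an assumption, not something confirmed in this thread) and adds it up over one day, once in floating point and once with exact fractions. It does not reproduce WRF's internal timekeeping; it only shows why a step that has no exact binary representation may leave the accumulated time slightly off from a whole number.

```python
from fractions import Fraction

STEPS_PER_DAY = 12960  # 86400 s divided by an assumed step of 20/3 s


def accumulate(step, n):
    """Add `step` to a running total `n` times, as a naive time counter would."""
    total = step * 0  # zero of the same numeric type as `step`
    for _ in range(n):
        total += step
    return total


float_total = accumulate(20.0 / 3.0, STEPS_PER_DAY)      # binary floating point
exact_total = accumulate(Fraction(20, 3), STEPS_PER_DAY)  # exact rational arithmetic

print("float sum  :", repr(float_total))
print("exact sum  :", exact_total)
print("float error:", float_total - 86400.0)
```

The exact sum is 86400 seconds by construction; whether the floating-point sum lands exactly on 86400 depends on how the rounding errors happen to add up, which is the sort of intermittent behavior guessed at above.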

That said, please consider running (as just a test) your 4 July case without the 1.3-km domain to see if you encounter the same error at the end of the day.

–Tanya

@tlspero
Dear Tanya,

Below are the results of running only the first two domains. The output has all 24 time steps in both domains, but there is still an error at the end of the run.
wrfout_d01_2017-07-04 (1).txt (39.7 KB)
wrfout_d02_2017-07-04.txt (39.7 KB)
rsl.out.0000.txt (482.5 KB)

@hhallaji –

Thank you for checking that run. For some reason, WRF wants to complete another time step, and it needs another entry in wrfbdy_d01 (and probably the wrflowbdy files, too). I’m surprised that the other days do not also have this error.

One possible solution is to reprocess the wrfbdy and wrflowbdy files to add 2017-07-05_00:00:00, and hopefully that will suffice. Alternatively, you may consider using a netCDF tool to concatenate the wrfbdy file from 5 July onto the end of 4 July to try to get past this issue.

I’m sorry you’re experiencing this. WRF should have completed after the restart files were written.

Hope this helps. If this does not help, you may want to reach out to the WRF community for specific guidance.

–Tanya