I ran WRF output (12 km CONUS domain, one month) through MCIPv4.5, and one of the resulting output files, METCRO3D, is 235 GB. At that rate, running 12 months of WRF output through MCIP would yield a METCRO3D file of roughly 3 TB, which is extremely large. How can this file-size issue be managed in MCIP? Is there a way to keep the MCIP output files at a manageable size?
Since the I/O API relies on a direct-access (not sequential) paradigm, this is not really a problem except for the disk space required. For that presumptive one-year file (say 2018), it takes the underlying netCDF the same amount of time to access the data for 2018001:000000 as for 2018365:230000 – it does not have to read through the entire file to reach that data, but jumps directly to it. Being able to do that was one of the initial Models-3 I/O requirements…
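The direct-access idea can be illustrated with a toy fixed-record binary file: because every timestep occupies a known number of bytes, the reader computes an offset and seeks straight to it, so accessing the last hour of the year costs the same as the first. This is a minimal stdlib-only sketch of the concept, not the actual I/O API/netCDF file layout, which keeps far more elaborate metadata.

```python
# Toy illustration of direct access: fixed-size records let you compute a
# byte offset and seek straight to any timestep, without reading through
# the earlier data.  The I/O API / netCDF do equivalent bookkeeping for you.
import os
import struct
import tempfile

REC_BYTES = 8  # one float64 "record" per timestep in this toy file

path = os.path.join(tempfile.mkdtemp(), "toy.bin")
with open(path, "wb") as f:
    for t in range(8760):                  # hourly records for one year
        f.write(struct.pack("<d", float(t)))

def read_step(t):
    """Read timestep t by seeking directly to its offset."""
    with open(path, "rb") as f:
        f.seek(t * REC_BYTES)              # jump straight to timestep t
        return struct.unpack("<d", f.read(REC_BYTES))[0]

print(read_step(0), read_step(8759))       # first and last hour, same cost
```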
Note that the data still takes up about the same amount of disk space whether it is in a manageable three pieces (MC2, MD3, MC3) or in a hard-to-manage thousands of pieces.
And the disk-space situation is a lot easier now than the first time I did this kind of thing, when I had to jump through hoops to get a 24 TB RAID for a (33-year) hydrology-modeling project. See, for example,
Thanks a lot! I understand the situation with the file size now.
While I am running MCIP, may I also ask for your input on running MCIP on clusters (with respect to parallel computing)? Processing the same one-month 12 km CONUS WRF simulation through MCIPv4.5 takes ~5 hours with both a single-node/single-core configuration and a 2-node/44-core configuration, indicating that (a) there was no gain from parallel computing when running MCIP, and (b) it could take ~60 hours (about 2.5 days) to run MCIP on a 12-month WRF simulation. What is the recommended way to run MCIP efficiently on computing clusters?
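Since your timings show a single MCIP run gains nothing from extra cores, one common workaround is to split the year into independent time windows (e.g. one month each) and launch one MCIP instance per window concurrently, each in its own work directory. Below is a hedged sketch of that dispatch pattern; the script name `run_mcip.csh` and the idea that your run script can be parameterized by month are assumptions about your setup, and the sketch substitutes a harmless placeholder command so it runs standalone.

```python
# Sketch: run several independent MCIP time windows concurrently.
# Assumption (hypothetical): a per-window wrapper such as
#   subprocess.run(["./run_mcip.csh", f"2018-{month:02d}"], check=True)
# exists in your setup.  A placeholder command stands in for it here.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_window(month):
    # Replace this placeholder with the real MCIP invocation for one month.
    cmd = [sys.executable, "-c", f"print('window {month:02d} done')"]
    return subprocess.run(cmd, capture_output=True, text=True).returncode

# Launch up to 4 windows at a time; each MCIP instance is itself serial.
with ThreadPoolExecutor(max_workers=4) as pool:
    codes = list(pool.map(run_window, range(1, 13)))

print(codes)   # a zero return code per window means it finished cleanly
```

Each window then produces its own METCRO2D/METDOT3D/METCRO3D set, which also keeps any single output file well under the multi-terabyte size discussed above.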