Running SMOKE 5.1 in Windows Subsystem for Linux (WSL)

Hi everyone,

I finished the CMAS SMOKE training course earlier this month and would like to get started using the EPA 2022v1 platform. The only computing resource immediately available to me is a Win11 PC with a VirtualBox Linux Mint VM where I installed SMOKE v5.1. It seems I have grossly underestimated the disk space that SMOKE needs, and the installed disk capacity falls far short of the 16TB I would need for an annual run with all 39 sectors (not counting the conversion to CAMx-ready files). The VirtualBox documentation suggests it might be possible to save a VM on an external USB drive, but it’s unclear whether the running VM would conform to the SMOKE directory structure on a Win11-formatted external drive and execute SMOKE correctly.

I have been advised to consider WSL instead for my application. The closest thing I found on this topic was a CMAS forum discussion with @lizadams about running CMAQ on WSL, and the example given was a related procedure to install WRF on WSL using an external drive. Having run WRF at a previous job, I understand that WRF and SMOKE have comparable hardware/software requirements, so what works for WRF may well work for SMOKE. However, it would be really helpful to hear from someone who has actually installed SMOKE on WSL and is using it successfully, which would motivate me to switch from VirtualBox to WSL.

Comments gratefully accepted.

What size grid are you wanting to process? You likely can’t process a full national 12km grid with this type of system.

Thank you for asking. What I understood from my abortive attempt on VirtualBox was that given a big enough internal drive, running SMOKE (tested for the 12km national grid) was feasible since the Linux directory structure of SMOKE would remain intact. I could not determine whether I would get the same results using a VM on an external USB drive.

From WSL webpages I’ve read and videos I’ve watched, it appears that WSLv1 translates Linux operations to Windows equivalents so it could conceivably handle running SMOKE and saving output files on an external USB drive, as long as the files were not subjected to any Windows operations.

I’m looking for a temporary solution so we can keep working with SMOKE while the CMAS training is fresh in our minds. We do have a Linux cluster with an older version of SMOKE, but something went wrong during a Dell hardware upgrade in February and we had to shut it down as a preventive measure.

My colleague and I are new to SMOKE/CAMx so our short term goal is to generate a 2022 run and compare our results with the EPA platform.

Running SMOKE directly on an external USB drive or having SMOKE write directly to an external USB drive will be very slow and should be avoided if possible.

A few tips for disk management:

  1. Work with one month at a time
  2. Remove or compress SMOKE intermediate files (grid, temporal, and speciation matrices) after QA of the outputs
  3. Compress and back up SMOKE sector-level CMAQ gridded outputs after mrggrid (2D sector merge)
  4. Compress and back up sector-level CAMx converted files after mrguam and mrgpt (CAMx conversion) are complete
  5. If you are not running CMAQ, then compress and back up all CMAQ model-ready files
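
As a rough illustration of the "compress after QA" steps above, here is a minimal Python sketch that gzips every matching file under a directory and, optionally, deletes the uncompressed original. It is not part of SMOKE: the function name `compress_files` and the default `*.ncf` pattern are assumptions made for this example, so adjust them to whatever your intermediate and output files are actually called.

```python
import gzip
import shutil
from pathlib import Path


def compress_files(directory, pattern="*.ncf", remove_original=True):
    """Gzip every file under `directory` matching `pattern`; return the new paths.

    `pattern` is a hypothetical default; use the names your own run produces.
    """
    compressed = []
    # sorted() materializes the file list before any file is changed.
    for src in sorted(Path(directory).rglob(pattern)):
        if not src.is_file():
            continue
        dst = src.with_name(src.name + ".gz")
        # Stream the copy so large gridded files are never held in memory.
        with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        if remove_original:
            src.unlink()
        compressed.append(dst)
    return compressed
```

For example, after QA of one month you might call `compress_files("intermed/2022_01")`, where the path is only a placeholder. Run it with `remove_original=False` first if you want to confirm the `.gz` files are readable before deleting anything.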

Thank you @james.beidler, that sequence is close to what I had in mind. So are you suggesting that it is possible, but not efficient, to run SMOKE from an external drive (whether VirtualBox or WSL)? We have a few old 4TB, 8TB and 16TB external drives we can repurpose, but currently no budget to swap in a bigger internal drive.

I’d like to mention that I also heard directly from @cjcoats who suggested looking into mingw64 and cygwin, but I don’t have sufficient experience or skills with them.

Yes, it is possible to run SMOKE directly on an external drive, assuming that it is properly formatted. I haven’t used WSL but VirtualBox should work with this configuration.
If you can, at a minimum, do your intermediate reads and writes (“intermed” directory) on an internal SSD, I think that it would help the processing speed.
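
If it helps to confirm where a directory such as "intermed" actually lives from inside the Linux guest, the following Python sketch looks up the mount point and filesystem type for a path from `/proc/mounts`-style text. It only reports what the system sees and does not say which filesystem counts as "properly formatted" for SMOKE. The function name `filesystem_for` is made up for this example.

```python
import os


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (Linux). Mount points with
    spaces appear there escaped as \\040 and are not decoded by this sketch.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a path prefix of `path`.
        if path == mount_point or (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best
```

On the guest you would read the text with `open("/proc/mounts").read()` and pass in the full path of the directory you care about. If the result shows your intermediate directory on the USB mount rather than the internal disk, that is the case the advice above suggests avoiding.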

If this is a desktop PC you may run into memory limits with certain sectors in 2022v1. I recommend at least 32 GB available for sectors such as onroad (SMOKE-MOVES). You should consider running smkinven for sectors such as ptfire and rwc just to verify that you won’t have memory issues before putting in a lot of configuration effort with this system.
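
A quick way to see how much memory the Linux guest really has before configuring the large sectors is to read `MemAvailable` from `/proc/meminfo`. The sketch below is only an illustration: the function names are invented here, the 32 GB default comes from the recommendation above, and it treats "GB" as GiB (1024² kB), which is an assumption.

```python
def available_memory_gb(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


def has_enough_memory(meminfo_text, required_gb=32.0):
    """True if available memory meets the threshold (default from the post above)."""
    return available_memory_gb(meminfo_text) >= required_gb
```

Inside the VM or WSL you would pass in `open("/proc/meminfo").read()`. Note that a VirtualBox guest only sees the memory assigned to it in the VM settings, not the host total. This check does not replace the smkinven test run suggested above.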


Thanks @james.beidler, these suggestions are very helpful. Just a clarification though: for VirtualBox with a Win11 host and Linux Mint guest, when you say a properly formatted external drive, wouldn’t this be Win11 by default since the Win11 host has to be able to read it? Or do you mean something else?

A decade ago, I got SMOKE running in a relatively native Windows environment using either Cygwin or MINGW. What happened to that?

–

Carlie J. Coats, Jr., Ph.D. I/O API Author/Maintainer