Running SMOKE 5.1 in Windows Subsystem for Linux (WSL)

Hi everyone,

I just finished the CMAS SMOKE training course early this month and would like to get started with the EPA 2022v1 platform. The only computing resource immediately available to me is a Win11 PC with a VirtualBox Linux Mint VM, where I installed SMOKE v5.1. It seems I grossly underestimated the disk space that SMOKE needs: the installed disk capacity falls far short of the 16TB I would need for an annual run with all 39 sectors (not counting the conversion to CAMx-ready files). The VirtualBox documentation suggests it might be possible to store a VM on an external USB drive, but it’s unclear whether the running VM would preserve the SMOKE directory structure on a Win11-formatted external drive and execute SMOKE correctly.

I have been advised to consider WSL instead for my application. The closest thing I found on this topic was a CMAS forum discussion with @lizadams about running CMAQ on WSL, and the example given was a related procedure to install WRF on WSL using an external drive. Having run WRF at a previous job, I understand that WRF and SMOKE have comparable hardware/software requirements, so what works for WRF may well work for SMOKE. However, it would be really helpful to hear from someone who has actually installed SMOKE on WSL and is using it successfully, which would motivate me to switch from VirtualBox to WSL.

Comments gratefully accepted.

What size grid are you wanting to process? You likely can’t process a full national 12km grid with this type of system.

Thank you for asking. What I understood from my abortive attempt on VirtualBox was that given a big enough internal drive, running SMOKE (tested for the 12km national grid) was feasible since the Linux directory structure of SMOKE would remain intact. I could not determine whether I would get the same results using a VM on an external USB drive.

From WSL webpages I’ve read and videos I’ve watched, it appears that WSLv1 translates Linux operations to Windows equivalents so it could conceivably handle running SMOKE and saving output files on an external USB drive, as long as the files were not subjected to any Windows operations.

I’m looking for a temporary solution so we can keep working with SMOKE while the CMAS training is fresh in our minds. We do have a Linux cluster with an older version of SMOKE, but something went wrong during a Dell hardware upgrade in February and we had to shut it down as a preventive measure.

My colleague and I are new to SMOKE/CAMx, so our short-term goal is to generate a 2022 run and compare our results with the EPA platform.

Running SMOKE directly on an external USB drive or having SMOKE write directly to an external USB drive will be very slow and should be avoided if possible.

A few tips for disk management:

  1. Work with one month at a time
  2. Remove or compress SMOKE intermediate files (grid, temporal, and speciation matrices) after QA of the outputs
  3. Compress and back-up SMOKE sector-level CMAQ gridded outputs after mrggrid (2D sector merge)
  4. Compress and back-up sector-level CAMx converted files after mrguam and mrgpt (CAMx conversion) is complete
  5. If you are not running CMAQ then compress and back-up all CMAQ model ready files
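
The compression steps in tips 2 through 5 could be scripted along these lines. This is only a minimal sketch: `compress_tree` is a hypothetical helper name, not part of SMOKE, and the example path in the comment is an assumption. It gzips every file under a directory that is not already compressed, so it should only be pointed at a sector's files once QA is done.

```shell
# Hypothetical helper (not part of SMOKE): gzip every regular file under a
# directory that is not already compressed, then print how many .gz files
# the tree now holds.
compress_tree() {
    dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # gzip replaces each file with file.gz; files ending in .gz are skipped.
    find "$dir" -type f ! -name '*.gz' -exec gzip {} +
    # Count the compressed files (tr strips padding that some wc versions add).
    find "$dir" -type f -name '*.gz' | wc -l | tr -d ' '
}

# Example call; the path is an assumption, adjust it to your own layout:
#   compress_tree /mnt/usb/EPA2022v1/some_case/intermed/some_sector
```

Backing up the compressed files to a second drive would be a separate copy step after this.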

Thank you @james.beidler, that sequence is close to what I had in mind. So are you suggesting that it is possible, but not efficient, to run SMOKE from an external drive (whether VirtualBox or WSL)? We have a few old 4TB, 8TB and 16TB external drives we can repurpose, but currently no budget to swap in a bigger internal drive.

I’d like to mention that I also heard directly from @cjcoats who suggested looking into mingw64 and cygwin, but I don’t have sufficient experience or skills with them.

Yes, it is possible to run SMOKE directly on an external drive assuming that it is properly formatted. I haven’t used WSL but VirtualBox should work with this configuration.
If you can, at a minimum, do your intermediate reads and writes (“intermed” directory) on an internal SSD, I think that it would help the processing speed.

If this is a desktop PC you may run into memory limits with certain sectors in 2022v1. I recommend at least 32 GB available for sectors such as onroad (SMOKE-MOVES). You should consider running smkinven for sectors such as ptfire and rwc just to verify that you won’t have memory issues before putting in a lot of configuration effort with this system.
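
Before attempting the larger sectors, a quick check of available memory against the 32 GB figure could look like the sketch below. `check_mem` is a hypothetical helper name; it assumes a Linux guest that exposes `MemAvailable` in `/proc/meminfo`. It is not a substitute for actually running smkinven on ptfire and rwc.

```shell
# Hypothetical helper: print "ok" if available memory meets a threshold in GB,
# "low" if it does not, and "unknown" if MemAvailable cannot be read.
# $1 = required GB, $2 = meminfo file (defaults to /proc/meminfo)
check_mem() {
    need_gb="$1"
    meminfo="${2:-/proc/meminfo}"
    # MemAvailable is reported in kB.
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    [ -n "$avail_kb" ] || { echo unknown; return 1; }
    need_kb=$((need_gb * 1024 * 1024))
    if [ "$avail_kb" -ge "$need_kb" ]; then
        echo ok
    else
        echo low
    fi
}

# Example call inside the VM or WSL session:
#   check_mem 32
```

In a VM the result reflects the memory assigned to the guest, not the memory installed in the host PC.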


Thanks @james.beidler, these suggestions are very helpful. Just a clarification though: for VirtualBox with a Win11 host and Linux Mint guest, when you say a properly formatted external drive, wouldn’t this be a Win11 format by default, since the Win11 host has to be able to read it? Or do you mean something else?

A decade ago, I got SMOKE running in a relatively-native Windows using either Cygwin or MINGW. What happened to that?

–

Carlie J. Coats, Jr., Ph.D. I/O API Author/Maintainer

I’m adding this follow-up here as it is related to the discussion. As far as I can tell, there seem to be three ways to make an external USB NTFS drive accessible to WSL:

1
(in WSL)
sudo mount -t drvfs D: /mnt/d (where D: is external drive letter from Windows drive list)
lsusb
lsblk
ls /mnt/d

2
(in Powershell)
Get-CimInstance -Query "SELECT * from Win32_DiskDrive"
wsl --mount <DiskPath> --bare (where <DiskPath> is \\.\PHYSICALDRIVE?, ? is a number from the previous command)

(in WSL)
lsusb
lsblk
ls /mnt/wsl

3
(in Powershell)
usbipd list
usbipd bind --force (optional) --busid ?-? (where ?-? is a device number from the previous command)
usbipd attach --wsl --busid ?-?

(in WSL)
lsusb
lsblk
sudo mount /dev/sdd? /mnt/d (where ? is drive partition number from previous command)
ls /mnt/d

I’ve only managed to get method 1 working. Does anyone know what the differences are and what I’m doing wrong with methods 2 and 3? Method 3, with the external drive available only to WSL (not Windows), would be preferable because of the specific directory structure required by the SMOKE executable I’m running from WSL. I’m also already seeing slow drive access when just listing directory contents.
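
One way to see what each method actually produced is to look at the filesystem type reported for the mount point. Below is a minimal sketch; `fs_type` is a hypothetical helper name, and it assumes the usual `/proc/mounts` layout (device, mount point, type, options). As far as I know, a Windows-backed mount typically reports a type such as drvfs or 9p, while a natively mounted Linux partition reports ext4.

```shell
# Hypothetical helper: print the filesystem type of a mount point, or return
# a non-zero status if nothing is mounted there.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
fs_type() {
    mp="$1"
    mounts="${2:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the filesystem type.
    # If a path is mounted more than once, the last entry wins.
    awk -v mp="$mp" '
        $2 == mp { t = $3 }
        END { if (t != "") print t; else exit 1 }
    ' "$mounts"
}

# Example calls inside WSL:
#   fs_type /mnt/d
#   fs_type /mnt/d || echo "nothing mounted at /mnt/d"
```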

Just an update: I managed to get this running on my VirtualBox Linux Mint guest on a Win10 host using the following steps:

1 Format the data partition on the NTFS external USB drive as ext4 (command line or GUI)
2 Connect the external drive, reboot the Win10 PC, then start VirtualBox and the Linux Mint guest VM
3 Configure the guest VM to add the external drive as a USB device, then select USB 3.0
4 Open a Linux terminal, then locate the external drive using lsblk
5 Create a mount point using sudo mkdir /mnt/usb
6 Mount the external drive using sudo mount /dev/sd?# /mnt/usb, where ? is the letter and # the number from step 4
7 Install the SMOKE directories on the external drive; my root directory is /mnt/usb/EPA2022v1
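
Steps 5 and 6 can be wrapped in a small dry-run helper that only prints the commands, so the partition name can be checked against the lsblk output before anything is mounted. This is a sketch: `mount_plan` is a hypothetical name, and the name check assumes the drive shows up as an sd device, as it did here.

```shell
# Hypothetical dry-run helper: print (do not run) the commands for steps 5
# and 6. $1 = partition name from lsblk, e.g. sdb1
# $2 = mount point (defaults to /mnt/usb)
mount_plan() {
    part="$1"
    mp="${2:-/mnt/usb}"
    # Accept only names of the form sd<letter><number>, as listed by lsblk.
    case "$part" in
        sd[a-z][0-9]*) ;;
        *) echo "unexpected partition name: $part" >&2; return 1 ;;
    esac
    echo "sudo mkdir -p $mp"
    echo "sudo mount /dev/$part $mp"
}

# Example: review the printed commands, then run them by hand.
#   mount_plan sdb1
```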

Based on how quickly the text screen scrolls, this configuration runs at roughly 1/3 to 1/2 the speed of the default configuration with internal drive I/O, which is better than I expected. One thing to watch out for: when shutting down and then rebooting the PC, Windows asks whether an unreadable external drive should be formatted (Y/N).
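
For a rougher-than-benchmark but more direct comparison than scroll speed, the same write test can be run against an internal directory and against the external mount. The sketch below assumes GNU dd (for `conv=fsync`); `write_test` is a hypothetical helper name. Timing has whole-second resolution, so a size of a few GiB gives more meaningful numbers than the default.

```shell
# Hypothetical helper: write N MiB of zeros into a directory, force the data
# to disk, delete the file, and print the elapsed whole seconds.
# $1 = directory to test, $2 = size in MiB (defaults to 256)
write_test() {
    dir="$1"
    mib="${2:-256}"
    f="$dir/.write_test.$$"
    start=$(date +%s)
    # conv=fsync makes dd flush to the device before it exits.
    dd if=/dev/zero of="$f" bs=1048576 count="$mib" conv=fsync 2>/dev/null \
        || { rm -f "$f"; return 1; }
    end=$(date +%s)
    rm -f "$f"
    echo $((end - start))
}

# Example comparison (paths are assumptions):
#   write_test "$HOME" 4096
#   write_test /mnt/usb 4096
```

This only measures sequential writes, so it is a sanity check on the drive and USB link, not a prediction of SMOKE run time.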


Wow, great work – thanks for letting us know about this

By the way, this assumes that all the software dependencies that SMOKE needs, such as gfortran (in my case the EPA precompiled executables worked fine), zlib, hdf5, netcdf and python (Linux Mint has python3 preinstalled), have already been installed on the guest VM. All I did was copy the SMOKE directories over to the external drive and change the root directory in scripts/directory_definitions.csh.
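
A quick way to confirm that the command-line pieces are present on a fresh guest is sketched below. `missing_tools` is a hypothetical helper name, and the tool names in the example are my assumption of reasonable proxies: ncdump for netcdf, h5dump for hdf5, and csh for the SMOKE scripts. zlib has no command of its own, so it is not covered.

```shell
# Hypothetical helper: print each name from the argument list that is not
# found on PATH, one per line. Prints nothing if everything is found.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# Example call; the tool names are assumptions, adjust as needed:
#   missing_tools gfortran python3 ncdump h5dump csh
```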

The ‘unreadable external drive’ message (do NOT select ‘Y’) appears because Win10 can’t read ext4 drives, but the drive works fine in the VM. Sometimes right after booting, the external drive may look empty; it takes a little while to load the contents. I normally eject the external drive from Win10 before powering down the PC.