Creating County Mask for DESID using shp2cmaq

I am trying to use shp2cmaq to create a county mask file to use with DESID. I would then use DESID to reduce emissions in a group of counties that would be a region family. I am trying to create the mask for the CMAQ version 5.5 benchmark which uses the 2018_12NE3 grid. The county shape file I am using comes from a US Census tigerline file:

https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Counties+%28and+equivalent%29

The tiger file contains the 5 digit state/county fips code in a column called GEOID. I call shp2cmaq using the tigerline zipfile with attrkey=‘GEOID’, gdnam=‘2018_12NE3’,
gdpath=‘/…/CMAQv5.4_2018_12NE3_Benchmark_2Day_Input/2018_12NE3/GRIDDESC’,
outpath=‘/…/CountyMask.nc’,
outformat=‘NETCDF3_CLASSIC’, srckey=‘intersection_area’,overwrite=True,verbose=1. The crs for the file is EPSG:4269.

The CountyMask.nc file that is created does not seem to function properly. Examining it with Panoply shows that the counties are overlapped with the grid in unknown units. Panoply output:

netcdf /home/…/CountyMask.nc {
dimensions:
TSTEP = 1;
VAR = 667;
DATE-TIME = 2;
LAY = 1;
ROW = 105;
COL = 100;
variables:
int TFLAG(TSTEP=1, VAR=667, DATE-TIME=2);
:units = “<YYYYJJJ,HHMMSS>”;
:long_name = "TFLAG ";
:var_desc = "TFLAG ";

float GEOID_09001(TSTEP=1, LAY=1, ROW=105, COL=100);
  :_FillValue = NaNf; // float
  :long_name = "GEOID_09001     ";
  :var_desc = "Fractional overlap of GEOID_09001 with 2018_12NE3                               ";
  :unit = "1               ";
  :units = "unknown";

float GEOID_09003(TSTEP=1, LAY=1, ROW=105, COL=100);
  :_FillValue = NaNf; // float
  :long_name = "GEOID_09003     ";
  :var_desc = "Fractional overlap of GEOID_09003 with 2018_12NE3                               ";
  :unit = "1               ";
  :units = "unknown";

Whereas Panoply indicates that the gridmask for states that comes with the Benchmark has the units of “fraction” as below:

netcdf /data1/CMAQ/DATA/CMAQv5.4_2018_12NE3_Benchmark_2Day_Input/2018_12NE3/GRIDMASK_STATES_12NE3.nc {
dimensions:
TSTEP = 1;
DATE-TIME = 2;
LAY = 1;
VAR = 49;
ROW = 105;
COL = 100;
variables:
int TFLAG(TSTEP=1, VAR=49, DATE-TIME=2);
:units = “<YYYYDDD,HHMMSS>”;
:long_name = "TFLAG ";
:var_desc = "Timestep-valid flags: (1) YYYYDDD or (2) HHMMSS ";

float AL(TSTEP=1, LAY=1, ROW=105, COL=100);
  :long_name = "AL              ";
  :units = "fraction";
  :var_desc = "AL fractional area per grid cell                                                ";

float AZ(TSTEP=1, LAY=1, ROW=105, COL=100);
  :long_name = "AZ              ";
  :units = "fraction";
  :var_desc = "AZ fractional area per grid cell                                                ";

How am I misusing shp2cmaq?
Thank you.

Thanks for the question and posting it. You’re right that unit/units should be fixed, but I don’t think that’s driving the problem.

For the most part, I don’t think that var_desc or units are used by DESID. Thought @Ben_Murphy might correct me. So, I think that it is something else. If you want to fix that, edit the shp2cmaq.py file to change unit to units on line 233ish. That will overwrite unknown with “1”, which is an appropriate unit. If you really want it to be fraction, you can change it from 1 to fraction. Again, I don’t think this will help – but easy to test.

I’m glad you’re using the NETCDF3_CLASSIC, which rules out the problem where the netcdf library CMAQ uses is not compatible. A few more questions to help figure this out:

  • What does it mean that “CountyMask.nc … does not seem to function properly”? Do you get an error? Or does CMAQ not apply the changes you expect?
  • It might be useful to see your DESID config and the CMAQ log.

For the default file, I added total and dominant variables with the code below:

keys = sorted([k for k in f if k != 'TFLAG' and not k.startswith('STATE_')])
f['STATE_TOT'] = eval(' + '.join([k for k in keys if k != 'TFLAG']), None, f.data_vars)
f['STATE_DOM'] = f['STATE_TOT'].dims, np.argmax(np.stack([f[k] for k in keys]), axis=0)

And plotted:

And compared to what you shared to me via email:

The domains generally match with expected coverages. So, it looks like a lot is going right…

CMAQ does not apply the changes I expect. In Verdi the NO and NO2 that I had intended to zero out are identical to the benchmark reference case and are clearly not zeroed out.

This is the runscript file:
run_Bench_CountyMask.csh (39.5 KB

Note that for the following files I added .txt to end of the filename so they would upload to this forum.

This is a log file:
CTM_LOG_000.v55_gcc_Bench_2018_12NE3_cb6r5_ae7_aq_m3dry_20180701
CTM_LOG_000.v55_gcc_Bench_2018_12NE3_cb6r5_ae7_aq_m3dry_20180701.txt (709.1 KB)

These are the nml files that describe the region family and zero the NOx emissions:
CMAQ_Control_DESID.txt (15.6 KB)

and
CMAQ_Control_DESID_cb6r5_ae7_aq.txt (13.2 KB)

This is the run log:
cctmCOUNTYlog.txt (493.1 KB)

Thank you.

This is the file you are using in your run script:

#> Spatial Masks For Emissions Scaling
setenv CMAQ_MASKS $INPDIR/GRIDMASK_COUNTIES_12NE3.nc

I assume this is the file that you generated using following these instructions: CMAQ/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_emissions.md at main ¡ USEPA/CMAQ ¡ GitHub

Using the following tool:

Your CTM_LOG file contains the following WARNINGs

     |> Reading Emission Control Namelist:
     +======================================

     Performing Basic Error Checks for Emission Scaling Rules

     "CMAQ_MASKS" opened as OLD:READ-ONLY
     File name "/data1/CMAQ/DATA/CMAQv5.4_2018_12NE3_Benchmark_2Day_Input/2018_12NE3/GRIDMASK_COUNTIES_12NE3.nc"
     File type GRDDED3
     Execution ID "NA"
     Grid name "12NE3"
     Dimensions: 105 rows, 100 cols, 1 lays, 667 vbles
     NetCDF ID:    524288  opened as READONLY
     Time-independent data.

 >>--->> WARNING in subroutine RDTFLAG
     Time step not available in file CMAQ_MASKS       for variable GEOID_09001


     >>--->> WARNING in subroutine XTRACT3
     Time step not available for file:  CMAQ_MASKS


     >>--->> WARNING in subroutine DESID_INIT_REGIO on PE 000
     Could not read GEOID_09001                     from file CMAQ_MASKS


     >>--->> WARNING in subroutine RDTFLAG
     Time step not available in file CMAQ_MASKS       for variable GEOID_09007

... this warning was repeated for all of the defined regions GEOID_*

     >>--->> WARNING in subroutine DESID_INIT_REGIO on PE 000
     Could not read GEOID_34041                     from file CMAQ_MASKS

     Closing file CMAQ_MASKS

I am not sure why the GEOID variables can’t be read by CMAQ, but if the regions can’t be read from the CMAQ_MASKS file, then the DESID won’t be able to apply the emission reduction.

One thing you can try is to use m3xtract to recreate the file GRIDMASK_COUNTIES_12NE3.nc file.

setenv INFILE GRIDMASK_COUNTIES_12NE3.nc
setenv OUTFILE GRIDMASK_COUNTIES_12NE3_redo.nc

m3xtract
(extract all layers, all variables)

To better understand why CMAQ fails to read the mask file, could you please upload a text file showing the output of the command ncdump -v TFLAG GRIDMASK_COUNTIES_12NE3.nc?

Yes, I have read the DESID and shp2cmaq documentation on the github page. I noticed the “could not read” warnings and was concerned that my problem was the variable name as shp2cmaq indicates that variable length and certain characters could cause problems. I tried several things including removing the “GEOID_” prefix, but that didn’t help and I was concerned that the string name might then be interpreted as numeric and the leading zeros of the 5 digit FIPS code might be lost. I eventually concluded that the variable name was not the problem and began focusing on the units “unknown”. I also conducted tests to create a region family with the benchmark state gridmask to assure I was understanding that properly. I then tried creating my own state gridmask with shp2cmaq and the file listed in the shp2cmaq instructions (cb_2022_us_state_500k.zip). The results of that were similar to the county results.

I should explain that I create CountyMask.nc in my home directory and copy it to GRIDMASK_COUNTIES_12NE3.nc in the benchmark input directory.

I will look into using m3xtract.

Thank you.

Attached is the out from ncdump -v TFLAG GRIDMASK_COUNTIES_12NE3.nc

ncdump.txt (241.8 KB)

Thank you.

I think the problem is probably related to the time-independence notation. You’ll need to update the gpath to point to your file. I believe the file is not being identified as time-independent. The code below should work in your python environment and I think it will fix the file for you.

import netCDF4
import numpy as np
gpath = 'GRIDMASK_COUNTIES_12NE3.nc'
f = netCDF4.Dataset(gpath, mode='a')
f.SDATE = np.int32(-635)
f.variables['TFLAG'][:, :, 0] = np.int32(0)
f.close()

After you confirm that this works, I’ll dig a bit deeper because the raw output seems to be working for others.

1 Like

Can you confirm that you’re using the version of shp2cmaq that has a shp2cmaq.py and not just the Notebook (.ipynb)? And, please confirm the version of cmaqsatproc that you’re using.

import cmaqsatproc as csp
print('cmaqsatproc', csp.__version__)
import shp2cmaq
print('shp2cmaq', shp2cmaq.__version__)

For example, when I run this code, I get:

cmaqsatproc 0.4.1
shp2cmaq 1.0

Confirming that versions are correct:

This solved it. Using m3xtract to essentially copy the file created by shp2cmaq allowed the county shapes (GEIOD_xxxxx) to be read from the CMAQ_MASKS file.

Thank you all for your help and patience.

I did also use this in the solution. I ran this code (I used mode =‘r+’) after creating the mask with shp2cmaq, and then ran that file through m3xtract.
Thanks.

Having obtained the correct results by applying the timestep and m3xtract fixes, I wanted to see if either alone provided the correction. In doing so I could not replicate my original error. Upon further investigation and recalling that I used the netcdf3 option in shp2cmaq after a warning about HDF5, I found that HDF5 files had been updated just prior to my obtaining the correct results. I therefore suspect the update to HDF5 may have been a contributing factor to obtaining the correct results.