Empty surrogates from surrogate tool

The surrogate tool is running but it produces empty surrogate files (the headers are present but all the outputs are missing the actual surrogate data). I am using these scripts:
load postgres DB: SurrogateToolsDB/util at master · CEMPD/SurrogateToolsDB · GitHub
run tool: NEI 2017 v1 Platform - Google Drive
Here are my run messages:

  87 checkRunMessage: PGSRGCREATE script_OUTPUT>public.cb_2017_us_county_500k
  88 PGSRGCREATE script_OUTPUT>Cutting by data shapefile boundaries
  89 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty.sql:1: NOTICE:  table "wp_cty_100" does not exist, skipping
  90 PGSRGCREATE script_OUTPUT>DROP TABLE
  91 PGSRGCREATE script_OUTPUT>CREATE TABLE
  92 PGSRGCREATE script_OUTPUT>                          addgeometrycolumn
  93 PGSRGCREATE script_OUTPUT>---------------------------------------------------------------------
  94 PGSRGCREATE script_OUTPUT> public.wp_cty_100.geom_900921 SRID:900921 TYPE:MultiPolygon DIMS:2
  95 PGSRGCREATE script_OUTPUT>(1 row)
  96 PGSRGCREATE script_OUTPUT>
  97 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty.sql:22: ERROR:  column "pop2016_dens" does not exist
  98 PGSRGCREATE script_ERROR>LINE 4:  pop2016_dens,
  99 PGSRGCREATE script_ERROR>         ^
 100 PGSRGCREATE script_ERROR>HINT:  There is a column named "pop2016_dens" in table "wp_cty_100", but it cannot be referenced from this part of the query.
 101 PGSRGCREATE script_OUTPUT>UPDATE 0
 102 PGSRGCREATE script_OUTPUT>UPDATE 0
 103 PGSRGCREATE script_OUTPUT>CREATE INDEX
 104 PGSRGCREATE script_OUTPUT>VACUUM
 105 PGSRGCREATE script_OUTPUT>Gridding weight data to modeling domain
 106 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:1: NOTICE:  table "wp_cty_cell_100_wf13udaq" does not exist, skipping
 107 PGSRGCREATE script_OUTPUT>DROP TABLE
 108 PGSRGCREATE script_OUTPUT>CREATE TABLE
 109 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:9: ERROR:  relation "public.wp_cty_cell_100_WF13UDAQ" does not exist
 110 PGSRGCREATE script_ERROR>CONTEXT:  SQL statement "ALTER TABLE public."wp_cty_cell_100_WF13UDAQ" ADD COLUMN geom_900921 geometry(MultiPolygon, 900921)"
 111 PGSRGCREATE script_ERROR>PL/pgSQL function addgeometrycolumn(character varying,character varying,character varying,character varying,integer,character varying,integer,boolean) line 10     6 at EXECUTE
 112 PGSRGCREATE script_ERROR>SQL statement "SELECT public.AddGeometryColumn('',$1,$2,$3,$4,$5,$6,$7)"
 113 PGSRGCREATE script_ERROR>PL/pgSQL function addgeometrycolumn(character varying,character varying,character varying,integer,character varying,integer,boolean) line 5 at SQL statement
 114 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:24: ERROR:  INSERT has more expressions than target columns
 115 PGSRGCREATE script_ERROR>LINE 6:  CASE
 116 PGSRGCREATE script_ERROR>         ^
 117 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:25: ERROR:  column "geom_900921" does not exist
 118 PGSRGCREATE script_ERROR>LINE 1: ....wp_cty_cell_100_WF13UDAQ set area_900921=ST_Area(geom_90092...
 119 PGSRGCREATE script_ERROR>                                                             ^
 120 PGSRGCREATE script_OUTPUT>UPDATE 0
 121 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:28: ERROR:  column "geom_900921" does not exist
 122 PGSRGCREATE script_OUTPUT>CREATE TABLE public.numer_100_WF13UDAQ
 123 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_numer.sql:1: NOTICE:  table "numer_100_wf13udaq" does not exist, skipping
 124 PGSRGCREATE script_OUTPUT>DROP TABLE
 125 PGSRGCREATE script_OUTPUT>CREATE TABLE
 126 PGSRGCREATE script_OUTPUT>INSERT 0 0
 127 PGSRGCREATE script_OUTPUT>CREATE TABLE public.denom_100_WF13UDAQ; create primary key
 128 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_denom.sql:1: NOTICE:  table "denom_100_wf13udaq" does not exist, skipping
 129 PGSRGCREATE script_OUTPUT>DROP TABLE
 130 PGSRGCREATE script_OUTPUT>CREATE TABLE
 131 PGSRGCREATE script_OUTPUT>INSERT 0 0
 132 PGSRGCREATE script_OUTPUT>CREATE TABLE public.surg_100_WF13UDAQ; add primary key
 133 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_surg.sql:1: NOTICE:  table "surg_100_wf13udaq" does not exist, skipping
 134 PGSRGCREATE script_OUTPUT>DROP TABLE
 135 PGSRGCREATE script_OUTPUT>CREATE TABLE
 136 PGSRGCREATE script_OUTPUT>INSERT 0 0
 137 PGSRGCREATE script_OUTPUT>Exporting surrogates public.surg_100_WF13UDAQ;
 138 PGSRGCREATE script_ERROR>psql: warning: extra command-line argument "surrogates" ignored
 139 SUCCESS IN RUNNING THE EXECUTABLE: PGSRGCREATE script

Try this:

There is a script called “load_shapefile.csh” under the util directory. The section for the “acs” shapefile under the “calculate density” comment needs to be copied and pasted. For the new section the table name should be set to the 2016 shapefile name and the “pop2014” columns need to be changed to “pop2016”.

1 Like

Ok I will try that. Is there a 2017 load_shapefile.csh that already exists and works for the emiss_shp2017 shapefiles?

I think there is one posted on the Github. Let me know if you can’t find it.

This one? It looks like this one still references the 2014 acs Spatial-Allocator/load_shapefile.csh at master · CMASCenter/Spatial-Allocator · GitHub

I stared over using this download: GitHub - CEMPD/SurrogateToolsDB
with these shapefiles: NEI 2017 v1 Platform - Google Drive
and the following util scripts: load_shapefile.2017.csh, load_shapefile_reproject_multi.2017.csh
The shapefiles are present in the database.
The run_pg_srgcreate.csh ran without crashing, but it only produced surrogates for these codes: 260, 261, 271, 300, 304, 305, 306, 307, 308, 309, 310, 319, 320, 321, 340, 350. The remaining surrogate output files are present, but they are empty (except for header).
My surrogate generation file has all surrogates set to “YES”.

Here is a snip from the LOG for a surrogate that generated empty output. This error is repeated for all empty surrogate outputs (but with different attributes unable to be referenced)

PGSRGCREATE script_ERROR>psql:./outputs/wf13udaq/temp_files/100_create_wp_cty.sql:22: ERROR:  column "pop2016_dens" does not exist
PGSRGCREATE script_ERROR>LINE 4:  pop2016_dens,
PGSRGCREATE script_ERROR>         ^
PGSRGCREATE script_ERROR>HINT:  There is a column named "pop2016_dens" in table "wp_cty_100", but it cannot be referenced from this part of the query.

I can’t figure out why it cannot be referenced?

I’m thinking you may be running into issues due to the 2016 → 2017 changes that aren’t fully integrated into the SurrogateToolsDB GitHub repository. As a check, what does the first data line of your surrogate_specification_pg.csv file look like?

Here’s the 2016 version:
USA,Population,100,cb_2014_us_county_500k_Poly,geoid,ACS_2014_5YR_PopHousing,pop2014,Total population from Census 2014 blocks,

Compared to 2017:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

Here is the first line of my surrogate_specification_pg.csv:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

The 2017 files can be found in the archive PG_SurrogateTool_scripts_files.2017v1.31Jul2020.tar.gz downloadable from the NEI 2017 v1 Platform - Google Drive (same place as the shapefiles).

You may need the 2017 versions of

  • shapefile_catalog_pg.csv
  • surrogate_specification_pg.csv

Sorry for all the various links! We’re working on reorganizing things to be more straightforward.

Here is the first line of my shapefile_catalog_pg.csv:
cb_2017_us_county_500k,public,“datum=NAD83”,“proj=lcc,+lat_1=33,+lat_2=45,+lat_0=40,+lon_0=-97”,2017 US Census county boundaries at 5km resolution for onshore locations only,

Here is the first line of my surrogate_specification_pg.csv:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

Yep, those look like the 2017 files. Hmm… I’m going to do some tests to see if I can figure out what’s going on.

1 Like

I tracked down the problem with the pop2016_dens column, and hopefully the fix applies all the other attributes as well. The 2017 package changed the names of some columns derived from the shapefiles, and so there are updated scripts needed to work with those names.

I’ve updated the GitHub repository with the new files and scripts for 2017. Some notes:

  • The script util/load_shapefile_reproject_multi.2017.csh wasn’t calling load_shapefile.2017.csh, but was using the 2016 version. You may have already changed this when loading your shapefiles, otherwise you’ll need to reload the shapefiles. If you look at the table acs2016_5yr_bg, you should see columns named pop2016_dens_900921, hu2016_dens_900921, etc.
  • There’s a new directory named pgscripts2017 which has the 2017 scripts with column name updates.
  • There are now 2017 versions of all the specification files (e.g. control_variables_pg.2017.csv, shapefile_catalog_pg.2017.csv, etc.) and the surrogate generation script: run_pg_srgcreate.2017.csh

After applying all the updates, I was able to successfully generate surrogates for code 100 (Population) using the 2017 shapefiles. I’d recommend you try this single surrogate as well to see if things are working. By default, load_shapefile_reproject_multi.2017.csh now just loads the two shapefiles needed for this surrogate and the new config file surrogate_generation_pg.2017.csv only generates this surrogate.

1 Like

This fix is working for me as well! Thank you so much for your help! :smiling_face_with_three_hearts:

So glad that this is now working…

Hi again -
It looks like the 2017 shapefile catalog (SurrogateToolsDB/shapefile_catalog_pg.2017.csv at master · CEMPD/SurrogateToolsDB · GitHub) is missing the hpms2017_v3_04052020 shapefile referenced by the specification file: (SurrogateToolsDB/surrogate_specification_pg.2017.csv at master · CEMPD/SurrogateToolsDB · GitHub)

I wonder if you could test the 2017 files you gave me with all the surrogates set to “YES”? The tool is only generating 29 of 62 surrogates with the files given. All the shapefiles have been loaded correctly into my DB. The tool will again generate a file and header for each surrogate, but most of the files are still empty.

A suggestion to try with the surrogates that aren’t successfully running:

Under outputs/[GRID NAME]/temp_files there are a series of sql scripts for each surrogate number. Using the surrogate-specific run script in this directory as a guide (ie. USA_170_NOFILL.csh), try manually executing each query from the sql scripts in order on the postgres command line. This should help identify where the error is happening. Fixing the error may require modification of the “load_shapefile.csh” script to create the tables with the correct names or changing a configuration option in the “control_variables_pg.csv” file.

I located the missing shapefile hpms2017_v3_04052020, and added it to the Google Drive folder where the shapefile package can be downloaded.

https://drive.google.com/drive/folders/1idGoi6I3GvKFCcf87O_8zMirM7Gtd0_E

Inside the NEI 2017 v1 Platform folder, you’ll see the archive named hpms2017_v3_04052020.zip

1 Like

Thanks for this tip! I learned that some of the geom_types for the following 2017 shapefiles need to be changed from “Multipoint” to “Point”: This involved making some changes to the template scripts in the pgscripts directory.
pil_2019_06_24
NTAD_2016_ipcd
US_Airports_NEI08v2_WRF
usa_golf_courses_2019_10
USGS_2011_mines
EIA_2015_US_Oil

Now that all my shapefiles are loaded correctly, I was able to run the tool (initial run and merge/gapfill run) and produce most of the surrogates I need.

I tried a test run in SMOKE for the nonpt sector and am getting this error message over and over again in GRDMAT:
Error reading REAL from “1.3315352676833974E-”; IOSTAT= 59
All of the 1.3km spatial surrogates I produced with the tool have scientific notation in the spatial surrogate ratio column. I don’t see scientific notation at all in the 12km or 4km surrogates provided by EPA. It seems like grdmat doesn’t like the “e”. Is there something I can do in the SurrogateTool to fix this?

This would seem to be an input-formatting error: STR2REAL is being called with a CHARACTER(LEN=20) string from routine RDSRG4GRID, but the data being fed into that routine is using a longer string (LEN at least 22) to hold the numbers, and you are getting a screw-up because those strings are being truncated.

Either use fewer significant digits for the input (and 17 digits is excessive for a REAL, by the way), or change SMOKE’s src/grdmat/rdsrg4grd.f so that SEGMENT is CHARACTER(24) instead of CHARACTER(20) and re-compile.