Empty surrogates from surrogate tool

The surrogate tool is running but it produces empty surrogate files (the headers are present but all the outputs are missing the actual surrogate data). I am using these scripts:
load postgres DB: SurrogateToolsDB/util at master · CEMPD/SurrogateToolsDB · GitHub
run tool: NEI 2017 v1 Platform - Google Drive
Here are my run messages:

  87 checkRunMessage: PGSRGCREATE script_OUTPUT>public.cb_2017_us_county_500k
  88 PGSRGCREATE script_OUTPUT>Cutting by data shapefile boundaries
  89 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty.sql:1: NOTICE:  table "wp_cty_100" does not exist, skipping
  90 PGSRGCREATE script_OUTPUT>DROP TABLE
  91 PGSRGCREATE script_OUTPUT>CREATE TABLE
  92 PGSRGCREATE script_OUTPUT>                          addgeometrycolumn
  93 PGSRGCREATE script_OUTPUT>---------------------------------------------------------------------
  94 PGSRGCREATE script_OUTPUT> public.wp_cty_100.geom_900921 SRID:900921 TYPE:MultiPolygon DIMS:2
  95 PGSRGCREATE script_OUTPUT>(1 row)
  96 PGSRGCREATE script_OUTPUT>
  97 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty.sql:22: ERROR:  column "pop2016_dens" does not exist
  98 PGSRGCREATE script_ERROR>LINE 4:  pop2016_dens,
  99 PGSRGCREATE script_ERROR>         ^
 100 PGSRGCREATE script_ERROR>HINT:  There is a column named "pop2016_dens" in table "wp_cty_100", but it cannot be referenced from this part of the query.
 101 PGSRGCREATE script_OUTPUT>UPDATE 0
 102 PGSRGCREATE script_OUTPUT>UPDATE 0
 103 PGSRGCREATE script_OUTPUT>CREATE INDEX
 104 PGSRGCREATE script_OUTPUT>VACUUM
 105 PGSRGCREATE script_OUTPUT>Gridding weight data to modeling domain
 106 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:1: NOTICE:  table "wp_cty_cell_100_wf13udaq" does not exist, skipping
 107 PGSRGCREATE script_OUTPUT>DROP TABLE
 108 PGSRGCREATE script_OUTPUT>CREATE TABLE
 109 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:9: ERROR:  relation "public.wp_cty_cell_100_WF13UDAQ" does not exist
 110 PGSRGCREATE script_ERROR>CONTEXT:  SQL statement "ALTER TABLE public."wp_cty_cell_100_WF13UDAQ" ADD COLUMN geom_900921 geometry(MultiPolygon, 900921)"
 111 PGSRGCREATE script_ERROR>PL/pgSQL function addgeometrycolumn(character varying,character varying,character varying,character varying,integer,character varying,integer,boolean) line 10     6 at EXECUTE
 112 PGSRGCREATE script_ERROR>SQL statement "SELECT public.AddGeometryColumn('',$1,$2,$3,$4,$5,$6,$7)"
 113 PGSRGCREATE script_ERROR>PL/pgSQL function addgeometrycolumn(character varying,character varying,character varying,integer,character varying,integer,boolean) line 5 at SQL statement
 114 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:24: ERROR:  INSERT has more expressions than target columns
 115 PGSRGCREATE script_ERROR>LINE 6:  CASE
 116 PGSRGCREATE script_ERROR>         ^
 117 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:25: ERROR:  column "geom_900921" does not exist
 118 PGSRGCREATE script_ERROR>LINE 1: ....wp_cty_cell_100_WF13UDAQ set area_900921=ST_Area(geom_90092...
 119 PGSRGCREATE script_ERROR>                                                             ^
 120 PGSRGCREATE script_OUTPUT>UPDATE 0
 121 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_create_wp_cty_cell.sql:28: ERROR:  column "geom_900921" does not exist
 122 PGSRGCREATE script_OUTPUT>CREATE TABLE public.numer_100_WF13UDAQ
 123 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_numer.sql:1: NOTICE:  table "numer_100_wf13udaq" does not exist, skipping
 124 PGSRGCREATE script_OUTPUT>DROP TABLE
 125 PGSRGCREATE script_OUTPUT>CREATE TABLE
 126 PGSRGCREATE script_OUTPUT>INSERT 0 0
 127 PGSRGCREATE script_OUTPUT>CREATE TABLE public.denom_100_WF13UDAQ; create primary key
 128 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_denom.sql:1: NOTICE:  table "denom_100_wf13udaq" does not exist, skipping
 129 PGSRGCREATE script_OUTPUT>DROP TABLE
 130 PGSRGCREATE script_OUTPUT>CREATE TABLE
 131 PGSRGCREATE script_OUTPUT>INSERT 0 0
 132 PGSRGCREATE script_OUTPUT>CREATE TABLE public.surg_100_WF13UDAQ; add primary key
 133 PGSRGCREATE script_ERROR>psql:./outputs/WF13UDAQ/temp_files/100_surg.sql:1: NOTICE:  table "surg_100_wf13udaq" does not exist, skipping
 134 PGSRGCREATE script_OUTPUT>DROP TABLE
 135 PGSRGCREATE script_OUTPUT>CREATE TABLE
 136 PGSRGCREATE script_OUTPUT>INSERT 0 0
 137 PGSRGCREATE script_OUTPUT>Exporting surrogates public.surg_100_WF13UDAQ;
 138 PGSRGCREATE script_ERROR>psql: warning: extra command-line argument "surrogates" ignored
 139 SUCCESS IN RUNNING THE EXECUTABLE: PGSRGCREATE script

Try this:

There is a script called “load_shapefile.csh” under the util directory. The section for the “acs” shapefile under the “calculate density” comment needs to be copied and pasted. For the new section the table name should be set to the 2016 shapefile name and the “pop2014” columns need to be changed to “pop2016”.

1 Like

Ok I will try that. Is there a 2017 load_shapefile.csh that already exists and works for the emiss_shp2017 shapefiles?

I think there is one posted on the Github. Let me know if you can’t find it.

This one? It looks like this one still references the 2014 acs Spatial-Allocator/load_shapefile.csh at master · CMASCenter/Spatial-Allocator · GitHub

I stared over using this download: GitHub - CEMPD/SurrogateToolsDB
with these shapefiles: NEI 2017 v1 Platform - Google Drive
and the following util scripts: load_shapefile.2017.csh, load_shapefile_reproject_multi.2017.csh
The shapefiles are present in the database.
The run_pg_srgcreate.csh ran without crashing, but it only produced surrogates for these codes: 260, 261, 271, 300, 304, 305, 306, 307, 308, 309, 310, 319, 320, 321, 340, 350. The remaining surrogate output files are present, but they are empty (except for header).
My surrogate generation file has all surrogates set to “YES”.

Here is a snip from the LOG for a surrogate that generated empty output. This error is repeated for all empty surrogate outputs (but with different attributes unable to be referenced)

PGSRGCREATE script_ERROR>psql:./outputs/wf13udaq/temp_files/100_create_wp_cty.sql:22: ERROR:  column "pop2016_dens" does not exist
PGSRGCREATE script_ERROR>LINE 4:  pop2016_dens,
PGSRGCREATE script_ERROR>         ^
PGSRGCREATE script_ERROR>HINT:  There is a column named "pop2016_dens" in table "wp_cty_100", but it cannot be referenced from this part of the query.

I can’t figure out why it cannot be referenced?

I’m thinking you may be running into issues due to the 2016 → 2017 changes that aren’t fully integrated into the SurrogateToolsDB GitHub repository. As a check, what does the first data line of your surrogate_specification_pg.csv file look like?

Here’s the 2016 version:
USA,Population,100,cb_2014_us_county_500k_Poly,geoid,ACS_2014_5YR_PopHousing,pop2014,Total population from Census 2014 blocks,

Compared to 2017:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

Here is the first line of my surrogate_specification_pg.csv:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

The 2017 files can be found in the archive PG_SurrogateTool_scripts_files.2017v1.31Jul2020.tar.gz downloadable from the NEI 2017 v1 Platform - Google Drive (same place as the shapefiles).

You may need the 2017 versions of

  • shapefile_catalog_pg.csv
  • surrogate_specification_pg.csv

Sorry for all the various links! We’re working on reorganizing things to be more straightforward.

Here is the first line of my shapefile_catalog_pg.csv:
cb_2017_us_county_500k,public,“datum=NAD83”,“proj=lcc,+lat_1=33,+lat_2=45,+lat_0=40,+lon_0=-97”,2017 US Census county boundaries at 5km resolution for onshore locations only,

Here is the first line of my surrogate_specification_pg.csv:
USA,Population,100,cb_2017_us_county_500k,geoid,acs2016_5yr_bg,pop2016,Total population from Census 2016 blocks,

Yep, those look like the 2017 files. Hmm… I’m going to do some tests to see if I can figure out what’s going on.

1 Like

I tracked down the problem with the pop2016_dens column, and hopefully the fix applies all the other attributes as well. The 2017 package changed the names of some columns derived from the shapefiles, and so there are updated scripts needed to work with those names.

I’ve updated the GitHub repository with the new files and scripts for 2017. Some notes:

  • The script util/load_shapefile_reproject_multi.2017.csh wasn’t calling load_shapefile.2017.csh, but was using the 2016 version. You may have already changed this when loading your shapefiles, otherwise you’ll need to reload the shapefiles. If you look at the table acs2016_5yr_bg, you should see columns named pop2016_dens_900921, hu2016_dens_900921, etc.
  • There’s a new directory named pgscripts2017 which has the 2017 scripts with column name updates.
  • There are now 2017 versions of all the specification files (e.g. control_variables_pg.2017.csv, shapefile_catalog_pg.2017.csv, etc.) and the surrogate generation script: run_pg_srgcreate.2017.csh

After applying all the updates, I was able to successfully generate surrogates for code 100 (Population) using the 2017 shapefiles. I’d recommend you try this single surrogate as well to see if things are working. By default, load_shapefile_reproject_multi.2017.csh now just loads the two shapefiles needed for this surrogate and the new config file surrogate_generation_pg.2017.csv only generates this surrogate.

1 Like

This fix is working for me as well! Thank you so much for your help! :smiling_face_with_three_hearts:

So glad that this is now working…