Relating SMOKE power plants to EPA's eGRID power plants

I have two datasets, both of which supposedly document EGUs (electric generating units) over CONUS, that I am trying to 'mesh'. By meshing, I mean drawing a one-to-one connection between EGUs in the eGRID power plant database (2018 or 2019) and EGUs as documented by SMOKE, using ORIS identification codes.
Because SMOKE does not include EGUs whose primary fuel is renewable (they are non-emitting), I filtered out all renewable EGUs in the eGRID dataset, along with any EGU whose status is anything other than operational ('OP'). However, the number of individual ORIS codes reported in the various SMOKE files differs significantly from the number in the eGRID dataset, and the SMOKE files also disagree with one another.

EPA eGRID_2018_file: 4000+ individual ORIS codes

ptegu_file: 3162 individual ORIS codes
cemsum file: ~1400 individual ORIS codes
orisdesc_04dec2006: 2229 individual ORIS codes
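The eGRID filtering described above (dropping renewable and non-operational units, then counting individual ORIS codes) can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the dict keys 'oris', 'fuel', and 'status' are hypothetical, and the renewable fuel codes are an illustrative assumption to be replaced by whatever codes your eGRID file uses.

```python
# Illustrative renewable primary-fuel codes (EIA-style); this set is an
# assumption and should be adjusted to match the codes in your eGRID file.
RENEWABLE_FUELS = {"SUN", "WND", "WAT", "GEO"}


def operating_nonrenewable_oris(units):
    """Return the set of ORIS codes for operating ('OP'), non-renewable units.

    `units` is an iterable of dicts with hypothetical keys 'oris', 'fuel',
    and 'status'. Using a set means a plant with several qualifying units
    is counted once.
    """
    return {
        u["oris"]
        for u in units
        if u["status"] == "OP" and u["fuel"] not in RENEWABLE_FUELS
    }


# Made-up example rows: plant 1001 has two fossil units, 1002 is wind,
# and 1003 is not operating.
units = [
    {"oris": "1001", "fuel": "NG", "status": "OP"},
    {"oris": "1001", "fuel": "BIT", "status": "OP"},
    {"oris": "1002", "fuel": "WND", "status": "OP"},
    {"oris": "1003", "fuel": "NG", "status": "RE"},
]
print(operating_nonrenewable_oris(units))  # {'1001'}
```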

Does anyone know why these numbers don't match, or where I can get a definitive list of every EGU that SMOKE uses?
Why don't the cemsum, ptegu, and orisdesc files (all SMOKE-related files) have at least roughly the same number of entries if they are all used by SMOKE?

Note: I use the term "individual ORIS ID" because, in all four files above, redundant ORIS IDs (which occur because a power plant with multiple active boilers/generators repeats the same ORIS ID for each one) have been collapsed to a single representative ORIS ID per power plant, or per individual lat/lon pair.
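The reduction described in the note (several boiler/generator rows collapsing to one entry per ORIS ID or per lat/lon pair) might look like this, assuming each file has already been exported to a simple CSV. The column names 'oris', 'boiler', 'lat', and 'lon' are hypothetical, and the sample rows are made up.

```python
import csv
import io


def count_individual(csv_text, key_columns):
    """Count distinct keys in a CSV, collapsing rows that share a key.

    Several boiler/generator rows with the same ORIS ID (or the same lat/lon
    pair) reduce to one entry. Column names are assumptions about the layout;
    values are compared as exact strings after stripping whitespace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    keys = {tuple(row[col].strip() for col in key_columns) for row in reader}
    return len(keys)


# Made-up sample: plant 1001 has two boilers at the same location.
sample = """oris,boiler,lat,lon
1001,1,40.0000,-90.0000
1001,2,40.0000,-90.0000
1002,1,41.5000,-91.2500
"""
print(count_individual(sample, ["oris"]))        # 2
print(count_individual(sample, ["lat", "lon"]))  # 2
```

Because coordinates are compared as exact strings, the same plant written with different precision in two files would count twice; rounding before comparison is one option if that happens.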


To solve this, ORIS IDs were matched through the FF10 file and the PSRC file. About half of the ORIS codes were in the FF10 file, in a column other than 'oris_facility_id', and the other half were in the PSRC file. However, cross-analyzing the eGRID dataset against the newly matched ORIS ID/PTEGU file, there seem to be some additional active boilers in the eGRID dataset that are not in the PTEGU file.
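The matching described above (take the ORIS code from one FF10 column, fall back to another column, then fall back to the PSRC file) can be sketched as a chain of lookups. This is only an illustration: 'oris_facility_id' is the column named in the post, while the alternate column name and the 'facility_id' key used to join to the PSRC lookup are hypothetical.

```python
def match_oris(ptegu_rows, psrc_oris_by_facility, alt_column):
    """Attach an ORIS ID to each ptegu row, trying several sources in order.

    1. the 'oris_facility_id' column,
    2. an alternate FF10 column (`alt_column`, name is an assumption),
    3. a lookup built from the PSRC file, keyed by a hypothetical
       'facility_id' column.
    Empty strings count as missing. Returns (matched, unmatched) lists.
    """
    matched, unmatched = [], []
    for row in ptegu_rows:
        oris = (
            row.get("oris_facility_id")
            or row.get(alt_column)
            or psrc_oris_by_facility.get(row.get("facility_id"))
        )
        if oris:
            matched.append({**row, "oris": oris})
        else:
            unmatched.append(row)
    return matched, unmatched


rows = [
    {"facility_id": "A", "oris_facility_id": "1001"},
    {"facility_id": "B", "oris_facility_id": "", "alt_oris": "1002"},
    {"facility_id": "C", "oris_facility_id": ""},
    {"facility_id": "D", "oris_facility_id": ""},
]
matched, unmatched = match_oris(rows, {"C": "1003"}, "alt_oris")
print([r["oris"] for r in matched])           # ['1001', '1002', '1003']
print([r["facility_id"] for r in unmatched])  # ['D']
```

Keeping the unmatched rows separate makes it easy to inspect which units still lack an ORIS code after both sources are tried.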

eGRID contains every unit that contributes power to the grid according to the EIA. This is a broader set of units than what goes into the EGU inventories we use for modeling, because only units with a NEEDS ID and CAP (criteria air pollutant) emissions are selected for the EGU inventories. Some units with known ORIS IDs are not selected for the EGU sector because they are not in NEEDS, and other units are not matched to ORIS IDs but could be.
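To illustrate why eGRID is larger, the selection rule above can be written as a filter. This is a simplified sketch under stated assumptions: the 'needs_id' and 'emissions' keys are hypothetical, the pollutant codes are placeholders, and "any positive CAP emission" is an assumed reading of "CAP emissions", not the documented EPA procedure.

```python
# Criteria air pollutants plus the NH3 precursor, as placeholder codes;
# the "any positive CAP emission" rule below is an assumption.
CAPS = {"CO", "NOX", "SO2", "PM10", "PM25", "VOC", "NH3"}


def select_egu_sector(units):
    """Keep only units that have a NEEDS ID and nonzero CAP emissions.

    Each unit is a dict with hypothetical keys 'needs_id' (str or None)
    and 'emissions' (pollutant code -> annual tons).
    """
    return [
        u
        for u in units
        if u.get("needs_id")
        and any(u["emissions"].get(p, 0) > 0 for p in CAPS)
    ]


# Made-up units: 1002 has no NEEDS ID, 1003 has no CAP emissions.
units = [
    {"oris": "1001", "needs_id": "1001_B_1", "emissions": {"NOX": 12.0}},
    {"oris": "1002", "needs_id": None, "emissions": {"SO2": 3.5}},
    {"oris": "1003", "needs_id": "1003_B_1", "emissions": {}},
]
print([u["oris"] for u in select_egu_sector(units)])  # ['1001']
```

Units like 1002 and 1003 would still appear in eGRID, which is one reason the counts differ.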

The cemsum file only contains units with hourly monitoring data reported to CAMD and available in the annual hourly CEMS data, which is an even smaller subset of the units that contribute to the grid. The ORIS description file (orisdesc) is something we don't update consistently because it is rarely used.
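Given the nesting described above (eGRID broadest, the EGU inventory narrower, cemsum narrower still), a quick set comparison of individual ORIS IDs shows where the files diverge. This is a minimal sketch with made-up IDs; entries in "ptegu_not_in_egrid" or "cemsum_not_in_egrid" would be the ones worth inspecting for ID mismatches.

```python
def compare_oris_sets(egrid, ptegu, cemsum):
    """Summarize how three sets of individual ORIS IDs overlap."""
    return {
        "egrid_not_in_ptegu": sorted(egrid - ptegu),
        "ptegu_not_in_egrid": sorted(ptegu - egrid),
        "cemsum_not_in_egrid": sorted(cemsum - egrid),
        "ptegu_and_cemsum": sorted(ptegu & cemsum),
    }


# Made-up ORIS IDs for illustration only.
egrid = {"1001", "1002", "1003", "1004"}
ptegu = {"1001", "1002", "1003"}
cemsum = {"1001"}
report = compare_oris_sets(egrid, ptegu, cemsum)
print(report["egrid_not_in_ptegu"])   # ['1004']
print(report["cemsum_not_in_egrid"])  # []
```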

eGRID has a page where they publish an ORIS-to-EIS ID crosswalk and the script they use to create it:

https://www.epa.gov/airmarkets/power-sector-data-crosswalk
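When using a crosswalk like this, it can help to check whether it is actually one-to-one before joining on it. The sketch below works on generic (left ID, right ID) pairs; it does not read the published crosswalk file, and the example unit keys are made up.

```python
from collections import defaultdict


def non_one_to_one(pairs):
    """Find IDs on either side of a crosswalk that map to more than one partner.

    `pairs` is an iterable of (left_id, right_id) tuples, for example
    (ORIS plant/unit key, EIS unit ID). Returns two dicts: left IDs with
    several right IDs, and right IDs with several left IDs.
    """
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    one_to_many = {k: v for k, v in left_to_right.items() if len(v) > 1}
    many_to_one = {k: v for k, v in right_to_left.items() if len(v) > 1}
    return one_to_many, many_to_one


# Made-up pairs: two ORIS units share E1, and one ORIS unit maps to E2 and E3.
pairs = [("1001-1", "E1"), ("1001-2", "E1"), ("1002-1", "E2"), ("1002-1", "E3")]
one_to_many, many_to_one = non_one_to_one(pairs)
print({k: sorted(v) for k, v in one_to_many.items()})  # {'1002-1': ['E2', 'E3']}
print({k: sorted(v) for k, v in many_to_one.items()})  # {'E1': ['1001-1', '1001-2']}
```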

There are also issues with one-to-one unit mappings between eGRID/ORIS and EIS. We have a crosswalk that includes NEEDS IDs here:

ftp://newftp.epa.gov/Air/emismod/2016/v1/reports/EGU/egu_2016_2023_2030_NEEDS_NEI_ERTAC_xref_13jun2019.xlsx

Often, but not always, the NEEDS ID is a concatenation of the ORIS facility ID and the boiler ID.
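Since the NEEDS ID is only sometimes a concatenation, a heuristic check can flag which units can be matched directly and which need the crosswalk. This sketch assumes the pieces are joined by non-alphanumeric separators (such as '_'); the separator and the example IDs are assumptions, not a documented NEEDS format.

```python
import re


def looks_like_concat(needs_id, oris_id, boiler_id):
    """Heuristic check: is needs_id '<ORIS><sep>...<sep><boiler>'?

    The separator is assumed to be any non-alphanumeric character
    (e.g. '_'); a False result simply means the ID must be matched
    another way, e.g. through the NEEDS/NEI crosswalk.
    """
    pattern = (
        re.escape(str(oris_id))
        # a separator right after the ORIS ID, then optionally anything
        # ending in a separator, so the boiler ID must stand on its own
        + r"[^A-Za-z0-9](?:.*[^A-Za-z0-9])?"
        + re.escape(str(boiler_id))
    )
    return re.fullmatch(pattern, str(needs_id)) is not None


print(looks_like_concat("1001_B_1", "1001", "1"))      # True
print(looks_like_concat("1001_B_11", "1001", "1"))     # False
print(looks_like_concat("10011_B_1", "1001", "1"))     # False
print(looks_like_concat("1001_G_CT1", "1001", "CT1"))  # True
```

Requiring a separator on both sides avoids false matches such as ORIS 1001 against ORIS 10011, or boiler 1 against boiler 11.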


It doesn't sound like you have a further question; if you do, please repeat it. The eGRID and ptegu files won't fully match, because we base our matching on the NEEDS database.