Create CEM file for 2022?

Ok so I figured it out. The EPA uploads CEM files quarterly (according to the help desk?), which you can access via their FTP site: https://gaftp.epa.gov/DMDnLoad/emissions/hourly/monthly/2022/

I made a script to batch download the zip files, unzip them, and save the file names.
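For anyone wanting to do the same, here is a rough sketch of that kind of script (not my exact one). It assumes you already have a list of zip URLs from the directory above; the function name and the `file_names.txt` output file are just made up for illustration.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_and_extract(urls, dest_dir):
    """Download each zip in `urls`, unzip it into `dest_dir`,
    and save the extracted file names to file_names.txt (hypothetical name)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for url in urls:
        # Name the local zip after the last part of the URL path.
        zip_path = dest / Path(urlparse(url).path).name
        urlretrieve(url, str(zip_path))
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    # Keep a record of what was unpacked, one name per line.
    (dest / "file_names.txt").write_text("\n".join(extracted) + "\n")
    return extracted
```

You would build `urls` from whatever zip files are listed in the 2022 directory; I have not hard-coded any file names here since they may change.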

Then I formatted the HOUR_UNIT files to match the format described in section 8.2.7, "PTHOUR: Point source hour-specific emissions", of the user guide.
That just means rearranging the columns of the files from the FTP, formatting the dates, and concatenating all the files within a specific month. I used the PTINV file in the SMOKE/input/ptegu directory as a guide to make sure the IDs matched up. It’s kinda tricky reading through the user guide to get this, but it ends up being straightforward, I think?
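The rearrange / reformat / concatenate step could look roughly like the sketch below. Big caveats: only the `ORISPL_CODE` and `UNITID` pairings come from my own matching; the `OP_DATE` and `OP_HOUR` column names, the input date format, the `YYMMDD` output format, and the comma-delimited output with no header are all assumptions you should check against section 8.2.7 and an existing HOUR_UNIT file. A real file would also need the emissions columns added to the map.

```python
import csv
from datetime import datetime

# Source column -> target column, in output order.
# Only the first two pairs come from the post; the rest are assumed.
COLUMN_MAP = {
    "ORISPL_CODE": "ORISID",
    "UNITID": "BLRID",
    "OP_DATE": "DATE",   # assumed source column name
    "OP_HOUR": "HOUR",   # assumed source column name
}
DATE_COLUMN = "OP_DATE"  # assumed


def convert_files(in_paths, out_path, date_in="%m-%d-%Y", date_out="%y%m%d"):
    """Concatenate the CSVs in `in_paths` (e.g. all files for one month)
    into `out_path`, keeping only mapped columns in COLUMN_MAP order."""
    n_rows = 0
    with open(out_path, "w", newline="") as out_f:
        writer = csv.writer(out_f, lineterminator="\n")
        for path in in_paths:
            with open(path, newline="") as in_f:
                for row in csv.DictReader(in_f):
                    record = []
                    for src in COLUMN_MAP:
                        value = row[src].strip()
                        if src == DATE_COLUMN:
                            # Reformat the date (formats are assumptions).
                            value = datetime.strptime(value, date_in).strftime(date_out)
                        record.append(value)
                    writer.writerow(record)
                    n_rows += 1
    return n_rows
```

Columns not listed in `COLUMN_MAP` are dropped, so the map doubles as the output column order.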

One downside is that I could not find any metadata linking the FTP files to the SMOKE platform files, so I can't really confirm how they are all connected. For example, the EPA files use ‘ORISPL_CODE’ to match to ‘ORISID’, ‘UNITID’ to match ‘BLRID’, etc. (at least, that’s what I have reverse-engineered from matching unique datasets across the PTINV, an HOUR_UNIT*, and these FTP files). So I’m working off my own matching and assumptions, but it seems to work. Also, I am not sure if I need to clean the data further, so my next step will be to compare my 2021 CEM files to my 2022 CEM files.
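One way to sanity-check that kind of matching is to compare the (plant, unit) ID pairs from the converted CEM data against the pairs in the inventory. This is only a sketch: `compare_ids` is a made-up helper, and it assumes you have already pulled the ID pairs out of each file as strings.

```python
def compare_ids(cem_pairs, inv_pairs):
    """Compare (ORISID, BLRID) pairs from the CEM data and the inventory.

    Both arguments are iterables of (plant_id, unit_id) string pairs.
    Returns the pairs found in both, and the pairs unique to each side.
    """
    def normalize(pairs):
        # Strip stray whitespace so padded IDs still match.
        return {(str(p).strip(), str(u).strip()) for p, u in pairs}

    cem = normalize(cem_pairs)
    inv = normalize(inv_pairs)
    return {
        "matched": cem & inv,
        "cem_only": cem - inv,
        "inv_only": inv - cem,
    }
```

If `cem_only` is large relative to `matched`, that would suggest the column pairing (or the ID formatting) is off somewhere.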
