Monthly Average of AQS Data

Someone recently asked how to create monthly means from daily AQS data. Here’s an easy example that should work on any system with pandas and matplotlib.

import pandas as pd
import matplotlib.pyplot as plt

YYYY = 2016

index_cols = ['State Code', 'County Code', 'Site Num', 'POC', 'Date Local']

# Reading data and subsetting by event type to prevent duplicate records
# "None" means no events
# If there are events, there are two records per day:
# "Included" means has events and the event data is included
# "Excluded" means has events and the event data was not included
data = pd.read_csv(
    f'https://aqs.epa.gov/aqsweb/airdata/daily_44201_{YYYY}.zip',
    usecols=index_cols + ['Longitude', 'Latitude', '1st Max Value', '1st Max Hour', 'Event Type', 'Pollutant Standard']
).query('`Event Type` in ("None", "Included")')

# Make a numeric month variable from the YYYY-MM-DD date string.
# Drop `.astype('i')` if you would rather keep it as a two-character string.
data['Month'] = data['Date Local'].str[5:7].astype('i')

# Create a group by object
monthg = data.groupby(index_cols[:-1] + ['Month'])

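# Create a monthly mean of the numeric columns
# (numeric_only=True skips text columns such as 'Date Local' and 'Event Type',
# which recent pandas versions refuse to average)
monthly = monthg.mean(numeric_only=True)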
# Add count for completion checks
monthly['Count'] = monthg['1st Max Value'].count()

# Make a figure and save it
fig, ax = plt.subplots(1, 1)
tax = ax.twinx()
mc = monthly.loc[:, ['Count']].groupby(['Month']).sum()
ax.bar(mc.index, mc['Count'], color='lightgrey', alpha=0.25)
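# Group by the name 'Month' (not a one-item list) so each key is a plain
# integer; recent pandas versions return 1-tuples for list groupers
mvs = [mg['1st Max Value'] for mm, mg in monthly.groupby('Month')]
mms = [mm for mm, mg in monthly.groupby('Month')]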
ax.set_ylabel('Count')
tax.set_ylabel('Ozone ppm')
tax.boxplot(mvs, positions=mms)
ax.set_title(f'Ozone MDA8 from all observations in {YYYY}')
ax.figure.savefig('OzoneMonthly.png')
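
The Count column can also be used to screen out incomplete months before plotting. Here is a minimal sketch that uses a small synthetic table in place of the AQS download. The 75% threshold and the `min_fraction` and `Complete` names are illustrative assumptions, not a rule taken from this thread:

```python
import calendar
import pandas as pd

# Synthetic stand-in for the daily AQS table (values are made up)
days = pd.date_range('2016-01-01', '2016-02-29', freq='D')
daily = pd.DataFrame({
    'Site Num': 1,
    'Date Local': days.strftime('%Y-%m-%d'),
    '1st Max Value': 0.04,
})
# Drop Feb 6 onward so that February is clearly incomplete
daily = daily[~daily['Date Local'].between('2016-02-06', '2016-02-29')].copy()
daily['Month'] = daily['Date Local'].str[5:7].astype(int)

monthg = daily.groupby(['Site Num', 'Month'])
monthly = monthg.mean(numeric_only=True)
monthly['Count'] = monthg['1st Max Value'].count()

# Assumed completeness criterion: at least 75% of the days in the month
min_fraction = 0.75
month_idx = monthly.index.get_level_values('Month')
ndays = [calendar.monthrange(2016, m)[1] for m in month_idx]
monthly['Complete'] = monthly['Count'] / ndays >= min_fraction

# Keep only the site-months that pass the check
complete = monthly[monthly['Complete']]
```

With the real data, the same last few lines would apply to the `monthly` table built above, using `YYYY` in place of the hard-coded year.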

Thanks @barronh for the sample code. I was wondering if it is possible to access other pollutants - do we change the number ‘44201’ in the URL to access another pollutant’s data?

Actually, I just found this GitHub link with a lot of useful information and code about accessing AQS data from Python:

Thanks!

You’re right on both your question and my API gist. These are two approaches. The code snippet pasted here uses “pregenerated files” and the gist uses the “AQS API”:

  • The API approach provides more flexibility, but also requires an account for large data requests and more awareness of flag meanings etc.
  • The pregenerated files approach requires downloading all the data, but has a lot of preprocessing already applied.

If you do change from 44201 to another parameter using the pregenerated approach, you might need to change other things too. For example, “1st Max Value” is appropriate for Ozone 8-hour averages, while “Arithmetic Mean” might be the more appropriate daily value for PM.
The AQS API notebook is great for API interaction, but you’re also right that you can switch which data you’re working with by changing the code (44201 to 88101, for example).
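
As a rough sketch of that switch, the parameter code and the matching value column can be kept together. The `VALUE_COLUMN` table and the two helper functions are hypothetical names for illustration; only the URL pattern and the two column choices come from the posts above:

```python
# Hypothetical lookup: which daily column to average for each parameter code.
# Only the two entries discussed in this thread are listed; check the file
# documentation before adding others.
VALUE_COLUMN = {
    44201: '1st Max Value',    # ozone
    88101: 'Arithmetic Mean',  # PM2.5
}


def daily_file_url(param, year):
    """Build the pregenerated daily file URL for a parameter code and year."""
    return f'https://aqs.epa.gov/aqsweb/airdata/daily_{param}_{year}.zip'


def value_column(param):
    """Return the column to average; raises KeyError for unlisted codes."""
    return VALUE_COLUMN[param]


url = daily_file_url(88101, 2016)
col = value_column(88101)
```

In the script above, `url` would replace the hard-coded address in `pd.read_csv` and `col` would replace each use of '1st Max Value', including the entry in `usecols`.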

P.S. I noticed that gist previews don’t show up right in the forum by default (the JSON comes through instead of the HTML). If you use the link button in the editor, you’ll get a text link instead of the weird preview.