Can we run analysis scripts on separate data tables ($AMET_PROJECT) on mySQL?

Hi everyone,

I populated my 2011 data in monthly chunks. Now I want to run analysis on quarterly (e.g., producing a timeseries plots for Jan, Feb, Mar) but my data is in separate Jan_, Feb_, Mar_$AMET_PROJECT. Is there a way I could do this without doing another quarterly database population? (each month took me an hour for creating database population so trying to avoid this)

Thanks,
Trang

Hi Trang,

So, if I follow you correctly, you created separate projects for each month. I guess I’m curious why you opted to do separate projects for each month. The database provides the flexibility to analyze the data by any time segment you want really, so you could put all the data for a year in a single project and use the AMET options to analyze your data by day/week/month/season, whatever.

But regardless, what you have is presumably 12 separate projects if you have an entire year. There are several options you have.

Option 1: Reprocess your data, but instead of giving each month a separate project name, just use the same project name ($AMET_PROJECT) for each month. This is the simplest option since it only requires you to make one tiny change to your AMET runscript. However, it is the slowest option since you’d effectively be re-running everything you’ve already done.

Option 2: Concatenate the monthly files you have for each network into single files that contain all the monthly data and then load that into a new project called $AMET_PROJECT. So, you would have to concatenate all the files for each network (being sure to remove the header lines in the concatenated file from all but the first file). Rename those files to, for example, IMPROVE_$AMET_PROJECT.csv, and save them in your AMET post-processing directory with your project name. Create a copy of your AMET post-processing script, set the flags for WRITE_SITEX and RUN_SITEX to F, the flags for LOAD_SITEX and CREATE_PROJECT to T, and change your AMET_PROJECT to $AMET_PROJECT (so as to load your data into the new table). If you used the aqProject_post_only.csh, you can set the start and end dates to whatever you want, AMET will just read the concatenated files you created and load that whole file. If you used the aqProject_pre_and_post.csh script, you can put your new concatenated files in the subdirectory YYYY01 and then set your start and end dates to YYYY-01-01 and YYYY-01-31. AMET will then read the concatenated files from the subdirectory YYYY01 and load those data into the new table (it will load all the data in the file, not just January). Assuming all this works, you’ll end up with a new AMET project called $AMET_PROJECT that contains all your data for all the months. You can then do your seasonal analyses using that table. I prefer this solution. It’s faster than option 1 since it does not re-run site compare. It does take some time to load the data, but you can set it and forget and come back when it’s done.

Option 3: Use the built-in MySQL function UNION to join multiple tables into a single table. This is probably the quickest option, but it’s not one I’ve ever done and there are some tricky parts to it. It’s seems pretty simple to copy the data from each individual table into a single table, but the tricks come later. Below is an example of how you would write the command at the MySQL command line to do that.

MySQL > insert into Jan_$AMET_PROJECT
select * from Feb_$AMET_PROJECT
UNION
select * from Mar_$AMET_PROJECT
UNION
select * from APR_$AMET_PROJECT

select * from DEC_$AMET_PROJECT;

The result would be that all of your data would be added to the Jan_$AMET_PROJECT tables. If you decided to go this route, you’d be done, and you could use the Jan_$AMET_PROJECT table to do the seasonal analyses. Now, you could create a new table instead of adding to the Jan_$AMET_PROJECT table, but I think you would need to create a copy of the Jan_AMET_PROJECT table first. The biggest problem with creating a new table this way is that $AMET_PROJECT won’t exist in the AMET project log table, so that would cause issues. If you want to go this route, let me know and we can discuss further.

Option 4: Create unique query strings that select data from multiple tables. This is something that MySQL is certainly capable of, but it’s not something I do often and I’m not sure how well it would integrate with the rest of the AMET code.

I hope this helps. I really suggest in the future you don’t create individual projects for each month, unless you really have a specific reason to do so.

Wyat

Thank you sooooooo much Wyatt, for your long/detailed reply!

I think the main reason why I processed data population in monthly basis was that I was inexperienced with AMET_AQ at the time I started the analysis :). I tested one month and it took me 1 hr, so to speed up the process I ran data population scripts simultaneously thinking that it would save me some time, and thought that the analysis scripts would have option to call data from different tables in MySQL. And my major focus is winter months anyway (I do have another Table I processed three winter months in it). But anyway I appreciate your advice and great experience sharing. I might try one of the suggested options.

Trang

I understand. One thing to note is that you can still run the processing concurrently for multiple months (i.e. have multiple scripts running at once) for the same AMET project. I do it all the time. Since the data for each month are unique, there’s no risk of data being overwritten. The database will just simultaneously populate the project with the monthly data from each script.

Wyat

Oh wow, that is a great way to go. I did not know we could simultaneously run data populations for different months but output them in the SAME AMET project. That is cool. And AMET is very cool too. Once I know how to use it, I love it. Thanks for building and sharing it.