gschivley / ferc_714
Clean up FERC 714 hourly demand data
License: MIT License
I've also been working on cleaning this up and have found that the non-standard offset abbreviations (e.g. MDT, MST) don't always mean the same thing for different utilities, and some of them aren't obvious. I've compiled a per-respondent_id mapping of non-standard abbreviations to what seems like the most likely canonical offset code, plus a mapping of those offset codes to both a numerical UTC offset (e.g. UTC-8) and a canonical time zone (e.g. "America/Los_Angeles"), with the goal of converting all of the dates + hours + codes into a UTC time column and an associated time zone.
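The conversion I have in mind looks roughly like this. Note that the mapping tables here are tiny hypothetical stand-ins (the real ones are compiled by hand, per respondent_id, from the data), and the column names are assumptions:

```python
import pandas as pd

# Hypothetical per-(respondent_id, abbreviation) mapping to a canonical offset
# code -- e.g. respondent 101 reports "MDT" but appears to mean MST year-round.
ABBREV_TO_CODE = {
    (101, "MDT"): "MST",
    (102, "MDT"): "MDT",
}
# Canonical code -> numerical UTC offset, and -> canonical time zone name.
CODE_TO_UTC_OFFSET = {"MST": -7, "MDT": -6, "PST": -8, "PDT": -7}
CODE_TO_TZ = {"MST": "America/Denver", "MDT": "America/Denver",
              "PST": "America/Los_Angeles", "PDT": "America/Los_Angeles"}

def to_utc(row):
    """Convert one record's reported date + hour + offset code to a UTC timestamp."""
    code = ABBREV_TO_CODE.get((row["respondent_id"], row["tz_code"]), row["tz_code"])
    offset = CODE_TO_UTC_OFFSET[code]
    # Hours are reported 1-25, so hour 1 is the midnight-to-1AM block.
    local = pd.Timestamp(row["date"]) + pd.Timedelta(hours=row["hour"] - 1)
    return local - pd.Timedelta(hours=offset)  # UTC-7 local -> add 7 hours
```

The respondent-aware lookup is the important part: the same abbreviation string can resolve to different canonical codes for different respondents.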
I assumed that all of the 25th hours were invalid on any day that wasn't one of the "fall back" transitions out of daylight saving and into standard time (are there other reasons a valid 25-hour day might be recorded?). But since each day is reported as a single record, it should really all be in a single UTC offset, right? Even on those time-shift days, almost all of the 25th hours are zero, and most of the non-zero ones are daily demand totals. Across all respondents and all years, I found only 110 records on "fall back" days that looked like hourly demand numbers in the 25th hour. There did, however, seem to be many more excess zero-valued hours in the 2-3 AM slots.
It looked like you were applying the same offset code across all the records for a given respondent with this line:
tz = r_all_years["timezone"].values[0]
But I've noticed that most respondents frequently change which offset they claim to be reporting in -- at the daylight/standard time switches, but also sometimes at the start or end of a year, or even outside those times. Did you notice that too and decide to do this anyway? I also didn't quite follow your approach of counting hours per year per respondent -- are you trying to convert the reported hours into an hour-of-year? That is, you generate a complete hourly time series to use as the index, and if the number of hours extracted from that respondent's year of data matches the number of hours you expect in that year, you assume they'll line up correctly on an hour-of-year basis?
Because of the way the offset codes are reported, converting to UTC produces duplicate hours and gaps at the start and end of each run of a given offset code within a respondent, which I imagine is what you were trying to avoid here. But it also seems strange to simply ignore the information the utility is, in theory, reporting about which UTC offset their demand is in.
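One way to keep the per-record offset information and still end up with a clean series is to convert each record with its own reported code, then collapse the duplicate UTC hours and reindex onto a complete hourly range so the gaps show up explicitly as NaN. A minimal sketch with made-up data (column names are assumptions):

```python
import pandas as pd

# Toy frame: one row per respondent-hour, already converted to UTC using each
# record's own reported offset code. The repeated 07:00 and the missing 08:00
# mimic what an offset-code switch produces.
df = pd.DataFrame({
    "respondent_id": [1, 1, 1, 1],
    "utc": pd.to_datetime(["2012-11-04 06:00", "2012-11-04 07:00",
                           "2012-11-04 07:00", "2012-11-04 09:00"]),
    "demand": [100.0, 110.0, 112.0, 120.0],
})

# Collapse duplicate UTC hours (here by averaging), then reindex onto a
# complete hourly range so the gap at 08:00 becomes an explicit NaN.
dedup = df.groupby(["respondent_id", "utc"], as_index=False)["demand"].mean()
full = pd.date_range(dedup["utc"].min(), dedup["utc"].max(), freq="h")
series = dedup.set_index("utc")["demand"].reindex(full)
```

Whether averaging is the right collapse rule is debatable (keeping the first value, or flagging the conflict for review, are alternatives), but at least the duplicates and gaps become visible instead of being silently absorbed by a single per-respondent offset.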