gschivley / ferc_714
Clean up FERC 714 hourly demand data
License: MIT License
I've also been working on cleaning this up and have found that the non-standard offset abbreviations (e.g. MDT, MST) don't always mean the same thing for different utilities, and some of them aren't obvious. I've compiled a per-respondent_id mapping of non-standard abbreviations to what seems like the most likely canonical offset code, plus a mapping of those offset codes to both a numerical UTC offset (e.g. UTC-8) and a canonical time zone (e.g. "America/Los_Angeles"), with the goal of converting all of the dates + hours + codes into a UTC time column and an associated time zone.
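The conversion I have in mind looks roughly like this. Note that the mapping tables here are tiny hypothetical stand-ins (the real ones are compiled by hand, per respondent_id, from the data), and the column names are assumptions:

```python
import pandas as pd

# Hypothetical per-(respondent_id, abbreviation) mapping to a canonical offset
# code -- e.g. respondent 101 reports "MDT" but appears to mean MST year-round.
ABBREV_TO_CODE = {
    (101, "MDT"): "MST",
    (102, "MDT"): "MDT",
}
# Canonical code -> numerical UTC offset, and -> canonical time zone name.
CODE_TO_UTC_OFFSET = {"MST": -7, "MDT": -6, "PST": -8, "PDT": -7}
CODE_TO_TZ = {"MST": "America/Denver", "MDT": "America/Denver",
              "PST": "America/Los_Angeles", "PDT": "America/Los_Angeles"}

def to_utc(row):
    """Convert one record's reported date + hour + offset code to a UTC timestamp."""
    code = ABBREV_TO_CODE.get((row["respondent_id"], row["tz_code"]), row["tz_code"])
    offset = CODE_TO_UTC_OFFSET[code]
    # Hours are reported 1-25, so hour 1 is the midnight-to-1AM block.
    local = pd.Timestamp(row["date"]) + pd.Timedelta(hours=row["hour"] - 1)
    return local - pd.Timedelta(hours=offset)  # UTC-7 local -> add 7 hours
```

The respondent-aware lookup is the important part: the same abbreviation string can resolve to different canonical codes for different respondents.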
I assumed that all of the 25th hours were invalid on any day that wasn't one of the "fall back" transitions out of daylight saving and into standard time (are there other reasons a valid 25-hour day might be recorded?). But since each day is reported as a single record, it should really all be in a single UTC offset, right? Even on those time-shift days, almost all of the 25th hours are zero, and most of the non-zero ones are daily demand totals. Across all respondents and all years, I found only 110 records on "fall back" days that looked like hourly demand numbers in the 25th hour. There did, however, seem to be many more excess zero-valued hours in the 2-3 AM slots.
It looked like you were applying the same offset code across all the records for a given respondent with this line:
tz = r_all_years["timezone"].values[0]
But I've noticed that most respondents frequently change which offset they claim to be reporting in -- at the daylight/standard time switches, but also sometimes at the start or end of a year, or even outside those times. Did you notice that too and decide to do this anyway? I also didn't quite follow your approach of counting hours per year per respondent -- are you trying to convert the reported hours into an hour-of-year? That is, you generate a complete hourly time series to use as the index, and if the number of hours extracted from that respondent's year of data matches the number of hours you expect in that year, you assume they'll line up correctly on an hour-of-year basis?
Because of the way the offset codes are reported, converting to UTC produces duplicate hours and gaps at the start and end of each run of a given offset code within a respondent, which I imagine is what you were trying to avoid here. But it also seems strange to simply ignore the information the utility is, in theory, reporting about which UTC offset their demand is in.
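One way to keep the per-record offset information and still end up with a clean series is to convert each record with its own reported code, then collapse the duplicate UTC hours and reindex onto a complete hourly range so the gaps show up explicitly as NaN. A minimal sketch with made-up data (column names are assumptions):

```python
import pandas as pd

# Toy frame: one row per respondent-hour, already converted to UTC using each
# record's own reported offset code. The repeated 07:00 and the missing 08:00
# mimic what an offset-code switch produces.
df = pd.DataFrame({
    "respondent_id": [1, 1, 1, 1],
    "utc": pd.to_datetime(["2012-11-04 06:00", "2012-11-04 07:00",
                           "2012-11-04 07:00", "2012-11-04 09:00"]),
    "demand": [100.0, 110.0, 112.0, 120.0],
})

# Collapse duplicate UTC hours (here by averaging), then reindex onto a
# complete hourly range so the gap at 08:00 becomes an explicit NaN.
dedup = df.groupby(["respondent_id", "utc"], as_index=False)["demand"].mean()
full = pd.date_range(dedup["utc"].min(), dedup["utc"].max(), freq="h")
series = dedup.set_index("utc")["demand"].reindex(full)
```

Whether averaging is the right collapse rule is debatable (keeping the first value, or flagging the conflict for review, are alternatives), but at least the duplicates and gaps become visible instead of being silently absorbed by a single per-respondent offset.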