nigelcleland / lode
Scripts to automate getting data from electricity market sources
License: MIT License
ditto - EMI_scraper.py
When importing a CSV file without headers, some incorrect behaviour is preventing the file from being imported.
Currently a lot of functionality is bundled into the NZEMDB class.
Much of this is redundant, and it prevents developing a better interface.
We should create separate classes:
OfferDB
WindDB
NodalDB
DispatchDB
AggregateDB
Each of these would inherit from NZEMDB for the generic things, such as importing to and exporting from the database, but each could also have more customised functionality.
The current system has too much overlap.
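The split above could look something like the following minimal sketch. NZEMDB exists in the codebase, but the method names and table attributes here are illustrative assumptions, not the project's actual API.

```python
# Sketch of the proposed class split. The method and attribute names
# below are hypothetical; only the class names come from the notes above.

class NZEMDB(object):
    """Generic import/export machinery shared by all table classes."""

    table = None  # subclasses name their own table

    def __init__(self, conn):
        self.conn = conn

    def insert_csv(self, path):
        # shared CSV -> database loading logic would live here
        raise NotImplementedError


class OfferDB(NZEMDB):
    table = "offers"

    def offers_for_trading_period(self, date, period):
        # customised query logic specific to offer data
        raise NotImplementedError


class WindDB(NZEMDB):
    table = "wind"
```

Each subclass then only carries the queries that make sense for its data, while the shared import/export path stays in one place.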
A lot of the core functionality of the scrapers is the same.
For example, downloading files, logging (eventually), filename parsing.
We could abstract this into a central Scraper class and then have each scraper inherit from it.
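A rough sketch of that central class, assuming hypothetical method names and a simple filename convention (none of this is the project's current code):

```python
# Hypothetical shared base class for the scrapers: downloading and
# filename parsing live here, each source subclasses it.
import logging
import os
from urllib.request import urlopen


class Scraper(object):
    """Shared download / filename-parsing behaviour for all scrapers."""

    def __init__(self, base_url, download_dir):
        self.base_url = base_url
        self.download_dir = download_dir
        self.log = logging.getLogger(type(self).__name__)

    def download(self, filename):
        # fetch base_url/filename and save it into download_dir
        url = "%s/%s" % (self.base_url, filename)
        self.log.info("Downloading %s", url)
        data = urlopen(url).read()
        path = os.path.join(self.download_dir, filename)
        with open(path, "wb") as f:
            f.write(data)
        return path

    def parse_filename(self, filename):
        # subclasses override with source-specific parsing
        raise NotImplementedError


class EMIScraper(Scraper):
    def parse_filename(self, filename):
        # assumes names like "20130801_Offers.csv" -> ("20130801", "Offers")
        stem = os.path.splitext(filename)[0]
        date_str, _, kind = stem.partition("_")
        return date_str, kind
```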
Currently the database tables grow very large, beyond the point of fast query times on standard desktop systems.
One way of solving this would be to split the tables into individual years.
This should limit the quantity of data involved to roughly 300-400 MB per table, as opposed to the 2 GB currently. A query would then be parsed to determine which tables it touches and, if necessary, broken down and run across multiple tables.
This requires a wrapper to parse queries and assign them to the appropriate tables.
In addition, it requires smarter inserting: each insert batch will have to be parsed to see which dates it contains, so the rows can be loaded into the appropriate tables.
This is a bit of overhead but should speed things up considerably.
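The insert-routing half of this could be as small as the sketch below; the per-year table naming scheme ("offers_2013" and so on) is an assumption for illustration.

```python
# Sketch of routing an insert batch to per-year tables.
# The "<base>_<year>" naming convention is hypothetical.
from collections import defaultdict
from datetime import datetime


def table_for(base, trading_date):
    """Map a trading date onto its per-year table name."""
    return "%s_%d" % (base, trading_date.year)


def group_rows_by_table(base, rows, date_key="trading_date"):
    """Split an insert batch so each chunk targets a single yearly table."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[table_for(base, row[date_key])].append(row)
    return dict(grouped)
```

The query side would do the inverse: extract the date range from the WHERE clause, work out which yearly tables overlap it, and union the per-table results.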
Temperature data is incredibly important.
Want temperature data going back to 2000 if possible.
Ideally we want this for a number of locations.
E.g. the major population centres.
In addition, we'd like hourly data as well as data on humidity and wind speed.
Most likely 15 locations, 14 years, 24 points per day.
Approximately 1.9 million rows.
This would be ~75,000 separate queries (one per location per day).
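The two estimates above check out arithmetically:

```python
# Back-of-the-envelope check of the row and query counts above.
locations, years, days, hours = 15, 14, 365, 24

rows = locations * years * days * hours   # hourly observations
daily_queries = locations * years * days  # one API call per location-day

print(rows)           # 1839600 -> "approximately 1.9 million rows"
print(daily_queries)  # 76650   -> "~75,000 separate queries"
```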
Possible sources:
Forecast.io: 1000 queries per day for free
Wunderground: 500 queries per day for free
Weatherbase: Need to scrape this
NIWA CliFlo: terrible interface, queries must be done manually
NOAA: Not sure, need further investigation
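Given the free quotas above, a ~76,650-call backfill has to be spread over days. A tiny helper makes the planning concrete (the quota numbers are the ones listed above; the function itself is illustrative):

```python
# How long a backfill takes under a per-day API quota,
# e.g. Forecast.io's 1000 free calls or Wunderground's 500.
def backfill_days(total_queries, daily_quota):
    """Number of calendar days needed to issue total_queries calls."""
    days, remainder = divmod(total_queries, daily_quota)
    return days + (1 if remainder else 0)

print(backfill_days(76650, 1000))  # 77 days on Forecast.io's free tier
print(backfill_days(76650, 500))   # 154 days on Wunderground's free tier
```

So even on the most generous free tier, the historical backfill is a multi-month job unless we pay or combine sources.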
Implement password-protected scrapers for:
POCP
COMITHydro
Paid Wits
EMS(?)
The WITS scraper is currently a single function.
This could be abstracted into a general class, similar to the EMI scraper.
Add the POCP scraper, starting from the existing code.
Release a template config file which users can adapt to create their own.
We could also have a function which generates a config file for the user, e.g. given a master directory.
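That generator function could be as simple as the sketch below; the section and option names are illustrative assumptions, not the project's actual config layout.

```python
# Hypothetical template-config generator: pass a master directory and
# get a config.ini the user can fill in. Section/option names are
# assumptions for illustration.
import configparser
import os


def write_template_config(master_dir, path="config.ini"):
    cfg = configparser.ConfigParser()
    cfg["directories"] = {
        "master": master_dir,
        "downloads": os.path.join(master_dir, "downloads"),
        "database": os.path.join(master_dir, "nzem.db"),
    }
    # left blank for the user to fill in
    cfg["credentials"] = {"wits_username": "", "wits_password": ""}
    with open(path, "w") as f:
        cfg.write(f)
    return path
```

Keeping credentials in the user's own config file also means they never end up committed to the repository.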
Might as well add this. It requires a username and password to access the site (I think anyone can obtain a username/password).
We have some code for this which could be included, but the code itself is a bit of a mess and needs a rewrite first. It may also benefit from the Scraper subclass, so perhaps wait?
When importing data to the database, a significant quantity of the data files are being deleted.
If you pass an actual datetime object, the database queries will fail because the code tries to munge it to a datetime using the parse utility from dateutil. There should be a type check before this occurs.
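A possible guard, sketched with a stdlib stand-in for dateutil's parser so it stays self-contained (the real code would pass `dateutil.parser.parse` as `parse_fn`):

```python
# Possible fix: pass datetime objects through untouched, only parse strings.
from datetime import datetime


def ensure_datetime(value, parse_fn=None):
    """Return value unchanged if it is already a datetime, else parse it.

    parse_fn stands in for dateutil.parser.parse, which the database
    layer currently calls unconditionally on every value.
    """
    if isinstance(value, datetime):
        return value
    if parse_fn is None:
        # stdlib fallback for this sketch; the project would use dateutil
        parse_fn = lambda s: datetime.strptime(s, "%Y-%m-%d")
    return parse_fn(value)
```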
Hey @djhume,
What are your thoughts regarding creating a global nzem-datastore module
Basically, there is a decent amount of different functionality, and it is growing each day.
It could be quite good to group this all together and be able to import into different areas.
E.g.
from nzemdatastore.scrapers.Scraper import Scraper
from nzemdatastore.database.NZEM import NZEM
This could be a cleaner way of keeping everything together?
It also sets everything up in a far nicer way for adding documentation and (eventually) tests, following a more traditional package style.
Modify the GDX scraper to continue searching for new files until it finds one with a _F extension.
We should also get a basic logger up and running
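A minimal shared setup could look like this; the log format and filename are illustrative choices, not anything the project has settled on.

```python
# Minimal shared logger setup the scrapers could all call.
# The filename and format below are placeholder choices.
import logging


def get_logger(name, logfile="lode.log"):
    log = logging.getLogger(name)
    if not log.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.FileHandler(logfile)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"))
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    return log
```

Each scraper would then call `get_logger("emi_scraper")` (or similar) so all downloads and failures end up in one file with a consistent format.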