nigelcleland / lode
Scripts to automate getting data from electricity market sources
License: MIT License
ditto - EMI_scraper.py
When importing a CSV file without headers, some incorrect behaviour is preventing the file from being imported.
Currently a lot of functionality is bundled into the NZEMDB class.
Much of this is redundant, and it prevents developing a better interface.
We should create separate classes:
OfferDB
WindDB
NodalDB
DispatchDB
AggregateDB
Each of these would inherit from NZEMDB for the generic things, such as importing to and exporting from the database, but each could also have more customised functionality.
The current system has too much overlap.
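The split above could look something like the following minimal sketch. NZEMDB exists in the codebase, but the method names and table attributes here are illustrative assumptions, not the project's actual API.

```python
# Sketch of the proposed class split. The method and attribute names
# below are hypothetical; only the class names come from the notes above.

class NZEMDB(object):
    """Generic import/export machinery shared by all table classes."""

    table = None  # subclasses name their own table

    def __init__(self, conn):
        self.conn = conn

    def insert_csv(self, path):
        # shared CSV -> database loading logic would live here
        raise NotImplementedError


class OfferDB(NZEMDB):
    table = "offers"

    def offers_for_trading_period(self, date, period):
        # customised query logic specific to offer data
        raise NotImplementedError


class WindDB(NZEMDB):
    table = "wind"
```

Each subclass then only carries the queries that make sense for its data, while the shared import/export path stays in one place.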
A lot of the core functionality of the scrapers is the same.
For example, downloading files, logging (eventually), filename parsing.
We could abstract this into a central Scraper class and then have each scraper inherit from it.
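A rough sketch of that central class, assuming hypothetical method names and a simple filename convention (none of this is the project's current code):

```python
# Hypothetical shared base class for the scrapers: downloading and
# filename parsing live here, each source subclasses it.
import logging
import os
from urllib.request import urlopen


class Scraper(object):
    """Shared download / filename-parsing behaviour for all scrapers."""

    def __init__(self, base_url, download_dir):
        self.base_url = base_url
        self.download_dir = download_dir
        self.log = logging.getLogger(type(self).__name__)

    def download(self, filename):
        # fetch base_url/filename and save it into download_dir
        url = "%s/%s" % (self.base_url, filename)
        self.log.info("Downloading %s", url)
        data = urlopen(url).read()
        path = os.path.join(self.download_dir, filename)
        with open(path, "wb") as f:
            f.write(data)
        return path

    def parse_filename(self, filename):
        # subclasses override with source-specific parsing
        raise NotImplementedError


class EMIScraper(Scraper):
    def parse_filename(self, filename):
        # assumes names like "20130801_Offers.csv" -> ("20130801", "Offers")
        stem = os.path.splitext(filename)[0]
        date_str, _, kind = stem.partition("_")
        return date_str, kind
```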
Currently the database tables grow very large, beyond the point of fast query times on standard desktop systems.
One way of solving this would be to split the tables into individual years.
This should limit the quantity of data involved to roughly 300-400 MB per table, as opposed to the 2 GB currently. A query would then be parsed to determine which tables it touches and, if necessary, broken down and run across multiple tables.
This requires a wrapper to parse queries and assign them to the appropriate tables.
In addition, it requires smarter inserting: each insert batch will have to be parsed to see which dates it contains, so the rows can be loaded into the appropriate tables.
This is a bit of overhead but should speed things up considerably.
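The insert-routing half of this could be as small as the sketch below; the per-year table naming scheme ("offers_2013" and so on) is an assumption for illustration.

```python
# Sketch of routing an insert batch to per-year tables.
# The "<base>_<year>" naming convention is hypothetical.
from collections import defaultdict
from datetime import datetime


def table_for(base, trading_date):
    """Map a trading date onto its per-year table name."""
    return "%s_%d" % (base, trading_date.year)


def group_rows_by_table(base, rows, date_key="trading_date"):
    """Split an insert batch so each chunk targets a single yearly table."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[table_for(base, row[date_key])].append(row)
    return dict(grouped)
```

The query side would do the inverse: extract the date range from the WHERE clause, work out which yearly tables overlap it, and union the per-table results.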
Temperature data is incredibly important.
Want temperature data going back to 2000 if possible.
Ideally we want this for a number of locations.
E.g. the major population centres.
In addition, we'd like hourly data as well as data on humidity and wind speed.
Most likely 15 locations, 14 years, 24 points per day.
Approximately 1.9 million rows.
This would be ~75,000 separate queries (one per location per day).
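The two estimates above check out arithmetically:

```python
# Back-of-the-envelope check of the row and query counts above.
locations, years, days, hours = 15, 14, 365, 24

rows = locations * years * days * hours   # hourly observations
daily_queries = locations * years * days  # one API call per location-day

print(rows)           # 1839600 -> "approximately 1.9 million rows"
print(daily_queries)  # 76650   -> "~75,000 separate queries"
```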
Possible sources:
Forecast.io: 1000 queries per day for free
Wunderground: 500 queries per day for free
Weatherbase: Need to scrape this
NIWA CliFlo: terrible interface, queries must be done manually
NOAA: Not sure, need further investigation
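Given the free quotas above, a ~76,650-call backfill has to be spread over days. A tiny helper makes the planning concrete (the quota numbers are the ones listed above; the function itself is illustrative):

```python
# How long a backfill takes under a per-day API quota,
# e.g. Forecast.io's 1000 free calls or Wunderground's 500.
def backfill_days(total_queries, daily_quota):
    """Number of calendar days needed to issue total_queries calls."""
    days, remainder = divmod(total_queries, daily_quota)
    return days + (1 if remainder else 0)

print(backfill_days(76650, 1000))  # 77 days on Forecast.io's free tier
print(backfill_days(76650, 500))   # 154 days on Wunderground's free tier
```

So even on the most generous free tier, the historical backfill is a multi-month job unless we pay or combine sources.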
Implement password-protected scrapers for:
POCP
COMITHydro
Paid Wits
EMS(?)
The WITS scraper is currently a single function.
This could be abstracted into a general class, similar to the EMI scraper.
Add the POCP scraper, starting from the existing code.
Release a template config file which users can adapt to create their own.
We could also have a function which generates a config file for the user, e.g. given a master directory.
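That generator function could be as simple as the sketch below; the section and option names are illustrative assumptions, not the project's actual config layout.

```python
# Hypothetical template-config generator: pass a master directory and
# get a config.ini the user can fill in. Section/option names are
# assumptions for illustration.
import configparser
import os


def write_template_config(master_dir, path="config.ini"):
    cfg = configparser.ConfigParser()
    cfg["directories"] = {
        "master": master_dir,
        "downloads": os.path.join(master_dir, "downloads"),
        "database": os.path.join(master_dir, "nzem.db"),
    }
    # left blank for the user to fill in
    cfg["credentials"] = {"wits_username": "", "wits_password": ""}
    with open(path, "w") as f:
        cfg.write(f)
    return path
```

Keeping credentials in the user's own config file also means they never end up committed to the repository.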
Might as well add this. It requires a username and password to access the site (I think anyone can obtain a username/password).
We have some code for this which could be included, but the code itself is a bit of a mess and needs a rewrite first. It may also benefit from the Scraper subclass, so perhaps wait?
When importing data to the database, a significant quantity of the data files are being deleted.
If you pass an actual datetime object, the database queries will fail because the code tries to munge it to a datetime using the parse utility from dateutil. There should be a type check before this occurs.
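A possible guard, sketched with a stdlib stand-in for dateutil's parser so it stays self-contained (the real code would pass `dateutil.parser.parse` as `parse_fn`):

```python
# Possible fix: pass datetime objects through untouched, only parse strings.
from datetime import datetime


def ensure_datetime(value, parse_fn=None):
    """Return value unchanged if it is already a datetime, else parse it.

    parse_fn stands in for dateutil.parser.parse, which the database
    layer currently calls unconditionally on every value.
    """
    if isinstance(value, datetime):
        return value
    if parse_fn is None:
        # stdlib fallback for this sketch; the project would use dateutil
        parse_fn = lambda s: datetime.strptime(s, "%Y-%m-%d")
    return parse_fn(value)
```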
Hey @djhume,
What are your thoughts regarding creating a global nzem-datastore module
Basically, there is a decent amount of different functionality, and it is growing each day.
It could be quite good to group this all together and be able to import into different areas.
E.g.
from nzemdatastore.scrapers.Scraper import Scraper
from nzemdatastore.database.NZEM import NZEM
This could be a cleaner way of keeping everything together?
It also sets everything up in a far nicer way for adding documentation and (eventually) tests, following a more traditional package style.
Modify the GDX scraper to continue searching for new files until it finds one with a _F extension.
We should also get a basic logger up and running
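A minimal shared setup could look like this; the log format and filename are illustrative choices, not anything the project has settled on.

```python
# Minimal shared logger setup the scrapers could all call.
# The filename and format below are placeholder choices.
import logging


def get_logger(name, logfile="lode.log"):
    log = logging.getLogger(name)
    if not log.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.FileHandler(logfile)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s: %(message)s"))
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    return log
```

Each scraper would then call `get_logger("emi_scraper")` (or similar) so all downloads and failures end up in one file with a consistent format.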