
lode's People

Contributors

djhume, nigelcleland


lode's Issues

Headers on Imports Failing

When importing a CSV file without a header row, the importer behaves poorly and the file fails to import.
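One way to make the import tolerant of a missing header is to have the caller supply the expected column names, and to treat the first row as a header only if it matches them. The sketch below uses only the standard `csv` module. The function name `read_rows` and the column names are hypothetical, and this is not the project's existing import code.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def read_rows(handle: TextIO, columns: Sequence[str]) -> List[Dict[str, str]]:
    """Read CSV rows whether or not the file has a header row.

    Assumption: the caller knows the expected column names. If the first
    non-blank row matches them exactly it is treated as a header and
    skipped; otherwise every row, including the first, is data.
    """
    rows = [row for row in csv.reader(handle) if row]  # drop blank lines
    if rows and [cell.strip() for cell in rows[0]] == list(columns):
        rows = rows[1:]
    records = []
    for index, row in enumerate(rows, start=1):
        if len(row) != len(columns):
            raise ValueError(
                f"data row {index}: expected {len(columns)} fields, got {len(row)}"
            )
        records.append(dict(zip(columns, row)))
    return records


# Hypothetical columns and values, for illustration only.
cols = ["node", "date", "price"]
# Both calls print the same single record.
print(read_rows(io.StringIO("node,date,price\nA,2013-01-01,45.2\n"), cols))
print(read_rows(io.StringIO("A,2013-01-01,45.2\n"), cols))
```

Checking the row width explicitly means a malformed file fails with a clear message instead of silently misaligning columns.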

Separate Functionality into different Classes

Currently a lot of functionality is bundled into the NZEMDB class.
A lot of this is redundant and it prevents developing a better interface.

We should create separate classes:

OfferDB
WindDB
NodalDB
DispatchDB
AggregateDB

Each of these would inherit from NZEMDB for generic tasks such as importing to and exporting from the database, but could also have more customised functionality.

The current system has too much overlap.
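The proposed split might look roughly like the following minimal sketch. It assumes `sqlite3` as a stand-in backend, and the table and column names for `OfferDB` and `WindDB` are hypothetical. The point is that the base class owns generic table creation, insertion and export, while each subclass only declares its schema and adds dataset-specific queries. `NodalDB`, `DispatchDB` and `AggregateDB` would follow the same pattern.

```python
import sqlite3
from typing import Any, Iterable, List, Sequence, Tuple


class NZEMDB:
    """Generic import/export shared by every dataset (hypothetical sketch).

    sqlite3 stands in for whatever backend the project actually uses.
    Subclasses only declare their table name and columns.
    """

    table: str = ""
    columns: Tuple[str, ...] = ()

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def create_table(self) -> None:
        # Table/column names come from the class definition, never user input.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ({', '.join(self.columns)})"
        )

    def insert_many(self, rows: Iterable[Sequence[Any]]) -> None:
        placeholders = ", ".join("?" for _ in self.columns)
        with self.conn:  # commit on success, roll back on error
            self.conn.executemany(
                f"INSERT INTO {self.table} VALUES ({placeholders})", rows
            )

    def fetch_all(self) -> List[tuple]:
        return self.conn.execute(f"SELECT * FROM {self.table}").fetchall()


class OfferDB(NZEMDB):
    table = "offers"
    columns = ("trading_date", "unit", "band", "price", "quantity")

    def offers_for_unit(self, unit: str) -> List[tuple]:
        # Dataset-specific query that does not belong in the base class.
        return self.conn.execute(
            f"SELECT band, price, quantity FROM {self.table} "
            "WHERE unit = ? ORDER BY band",
            (unit,),
        ).fetchall()


class WindDB(NZEMDB):
    table = "wind"
    columns = ("trading_date", "trading_period", "generation_mw")

    def total_generation(self) -> float:
        (total,) = self.conn.execute(
            f"SELECT COALESCE(SUM(generation_mw), 0) FROM {self.table}"
        ).fetchone()
        return total
```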

Separate Database Tables by Year

The database tables currently grow very large, beyond the point where queries run quickly on standard desktop systems.

One way to solve this would be to split the tables by year.
This should limit each table to ~300-400 MB, as opposed to the 2 GB currently. Each query would then be parsed first to determine which tables it touches and, if necessary, broken up and run across multiple tables.

This requires a wrapper that parses queries and routes them to the appropriate tables.

It also requires smarter insertion: each insert must be parsed to find the dates it contains so that every row is loaded into the appropriate yearly table.

This is a bit of overhead but should speed things up considerably.
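Both halves of the wrapper, routing inserts and splitting queries, can be sketched with plain date arithmetic. This is a minimal sketch under several assumptions: yearly tables are named `<base>_<year>` (for example a hypothetical `nodal_prices_2013`), the date column is called `trading_date`, and dates are stored as ISO strings. None of these names come from the existing code.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List, Sequence, Tuple


def table_for(base: str, day: date) -> str:
    """Name of the yearly table holding `day`, e.g. nodal_prices_2013."""
    return f"{base}_{day.year}"


def group_rows_by_table(
    base: str, rows: Sequence[Sequence[Any]], date_index: int = 0
) -> Dict[str, List[Sequence[Any]]]:
    """Insert side: route each row to its yearly table by its date column."""
    groups: Dict[str, List[Sequence[Any]]] = defaultdict(list)
    for row in rows:
        groups[table_for(base, row[date_index])].append(row)
    return dict(groups)


def split_query_range(base: str, start: date, end: date) -> List[Tuple[str, date, date]]:
    """Query side: break [start, end] into one sub-range per yearly table."""
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))
        hi = min(end, date(year, 12, 31))
        chunks.append((f"{base}_{year}", lo, hi))
    return chunks


def build_queries(
    base: str, start: date, end: date, date_column: str = "trading_date"
) -> List[Tuple[str, Tuple[str, str]]]:
    """One parameterised SELECT per yearly table; the caller concatenates results."""
    return [
        (
            f"SELECT * FROM {table} WHERE {date_column} BETWEEN ? AND ?",
            (lo.isoformat(), hi.isoformat()),
        )
        for table, lo, hi in split_query_range(base, start, end)
    ]
```

A query spanning November 2012 to February 2013 becomes two sub-queries, one against the 2012 table and one against the 2013 table. A query inside a single year touches only one table, which is where the speed-up would come from.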

Temperature Database Scraper

Temperature data is incredibly important.
We want temperature data going back to 2000 if possible.
Ideally we want this for a number of locations,
e.g. the major population centres.

In addition, we'd like hourly data as well as data on humidity and wind speed.
Most likely 15 locations, 14 years, 24 points per day.
That is approximately 1.84 million rows (15 × 14 × 365 × 24).
At one query per location per day, this would be ~75,000 separate queries (15 × 14 × 365 = 76,650).

Possible sources.

Forecast.io: 1000 queries per day for free
Wunderground: 500 queries per day for free
Weatherbase: Need to scrape this
NIWA CliFlo: Terrible interface, queries must be run manually
NOAA: Not sure, need further investigation
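Against the free tiers listed, the query count translates directly into how long a full backfill would take. The sketch below computes that and also spreads the (location, day) jobs into quota-sized daily batches. The helper names are hypothetical, leap days are ignored, and the code assumes one request returns a full day of hourly points for one location.

```python
import math
from datetime import date, timedelta
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Job = Tuple[str, date]  # (location, day): one request returns that day's hourly points


def query_jobs(locations: Sequence[str], start: date, end: date) -> Iterator[Job]:
    """Yield one (location, day) request per location per day, oldest first."""
    day = start
    while day <= end:
        for location in locations:
            yield location, day
        day += timedelta(days=1)


def daily_batches(jobs: Iterable[Job], quota: int) -> Iterator[List[Job]]:
    """Chunk the job stream into batches that fit a provider's daily free quota."""
    it = iter(jobs)
    while True:
        batch = list(islice(it, quota))
        if not batch:
            return
        yield batch


def days_needed(n_queries: int, quota: int) -> int:
    return math.ceil(n_queries / quota)


n_queries = 15 * 14 * 365  # locations x years x days (leap days ignored)
print(n_queries, n_queries * 24)  # 76650 1839600
print(days_needed(n_queries, 1000), days_needed(n_queries, 500))  # 77 154
```

At 1000 free queries per day a single source would need about 77 days for the full history, and about 154 days at 500 per day, so combining sources or paying for a higher quota may be worth considering.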

POCP scraper

Add the POCP scraper, using the existing code as a starting point

Add FTR scraper

Might as well add this. It requires a username and password to access the site (I think anyone can obtain one).

Wits five minute ftp scraper

We have some code for this which could be included, but the code itself is a bit of a mess and probably needs a rewrite first. It may also benefit from the scraper subclass, so perhaps we should wait?
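A scraper base class of the kind mentioned could keep the "download only what is new" logic in one place, leaving each subclass to say how files are listed and fetched. The sketch below is an assumption about how such a subclass might look, not the project's existing `Scraper`. It uses the standard-library `ftplib`, and the class name `FiveMinuteFTPScraper`, the host and the credentials are placeholders.

```python
import ftplib
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class Scraper(ABC):
    """Hypothetical base class: subclasses only say how to list and fetch files."""

    @abstractmethod
    def list_remote(self) -> List[str]:
        """Names of the files currently available at the source."""

    @abstractmethod
    def fetch(self, name: str) -> bytes:
        """Raw contents of one remote file."""

    def run(self, already_have: Set[str]) -> Dict[str, bytes]:
        """Download only the files not seen before (shared logic lives here)."""
        return {
            name: self.fetch(name)
            for name in sorted(self.list_remote())
            if name not in already_have
        }


class FiveMinuteFTPScraper(Scraper):
    """FTP source such as the WITS five-minute files (host/credentials are placeholders)."""

    def __init__(self, host: str, user: str = "anonymous", passwd: str = "") -> None:
        self.host, self.user, self.passwd = host, user, passwd

    def _connect(self) -> ftplib.FTP:
        ftp = ftplib.FTP(self.host)
        ftp.login(self.user, self.passwd)
        return ftp

    def list_remote(self) -> List[str]:
        with self._connect() as ftp:  # FTP.__exit__ sends QUIT
            return ftp.nlst()

    def fetch(self, name: str) -> bytes:
        chunks: List[bytes] = []
        with self._connect() as ftp:
            ftp.retrbinary(f"RETR {name}", chunks.append)
        return b"".join(chunks)
```

For simplicity this opens a new connection per file. A real rewrite would probably hold one connection open for the whole run.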

Check if datetime already exists before parsing

If you pass an actual datetime object, the database queries fail because the code tries to convert it to a datetime using the parse utility from dateutil. The type should be checked before this happens.
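The check could be a small coercion helper that only calls `dateutil.parser.parse` when it is given a string. This is a sketch; the name `to_datetime` is hypothetical. Note that `datetime` is a subclass of `date`, so the order of the `isinstance` checks matters.

```python
from datetime import date, datetime
from typing import Union

DateLike = Union[str, date, datetime]


def to_datetime(value: DateLike) -> datetime:
    """Return a datetime, only parsing when the value is actually a string."""
    # datetime is a subclass of date, so test for it first.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        # Imported here so callers passing date objects don't need python-dateutil.
        from dateutil.parser import parse

        return parse(value)
    raise TypeError(f"cannot convert {type(value).__name__} to datetime")
```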

Create a global nzem-datastore module

Hey @djhume,

What are your thoughts on creating a global nzem-datastore module?

There is a decent amount of different functionality, and it is growing every day.
It could be good to group it all together so that it can be imported into different areas.

E.g.

from nzemdatastore.scrapers.Scraper import Scraper
from nzemdatastore.database.NZEM import NZEM

This could be a cleaner way of keeping everything together?
It also sets everything up in a far nicer way to have documentation and (eventually) tests set up.

It would also follow a more traditional package layout.

Logging

We should also get a basic logger up and running
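A basic setup could attach one handler to a package-level logger and have each module log through a child logger, which propagates up to it. This sketch uses only the standard `logging` module. The package name `nzemdatastore` is taken from the module proposal and is hypothetical until that package exists.

```python
import logging
from typing import Optional, TextIO

PACKAGE_LOGGER = "nzemdatastore"  # hypothetical package name


def configure_logging(
    level: int = logging.INFO, stream: Optional[TextIO] = None
) -> logging.Logger:
    """Attach one handler to the package logger; safe to call more than once."""
    logger = logging.getLogger(PACKAGE_LOGGER)
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # None means sys.stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# In each module: log through a child logger, which propagates to the package logger.
log = logging.getLogger(f"{PACKAGE_LOGGER}.scrapers")

if __name__ == "__main__":
    configure_logging(logging.DEBUG)
    log.debug("logger configured")
```

Configuring only the package logger, rather than calling `logging.basicConfig` on the root logger, avoids interfering with logging set up by any application that imports the package.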
