Comments (3)
I think "library" might be the wrong term here, at least as I see it.
The way I used EE on WCIVF was to set up a local copy of the web service on each server that WCIVF used (i.e. baked into the image that spun up new instances).
This meant that I could communicate with EE over HTTP and all I needed to do was to set the EE_BASE_URL setting.
When we use the word "library" I start thinking about it being a reusable Django app that can be pip-installed and used by the project directly through the Django/Python interface.
This is a nice idea, but might be the wrong way to go:
- Using HTTP means switching from a central system at quiet times to a local system is much easier.
- Relatedly, setting up a dev copy of a project would be simpler, as there would be no need to sync live data from EE down to the dev copy.
- Dogfooding the public API is A Good Thing for everyone. If we use a Python API then we're unlikely to have a great HTTP API (or at least, we won't care about having one).
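As a minimal sketch of the HTTP approach, the snippet below builds request URLs from an EE_BASE_URL setting, so that pointing a project at a local copy or at the central service is a one-line settings change. The `ee_url` helper, the local address, the `api/elections/` path and the `current` parameter are assumptions for illustration, not taken from the WCIVF code.

```python
from urllib.parse import urlencode, urljoin

# Only the EE_BASE_URL setting name and the central URL come from this thread;
# the local address below is an assumed example.
CENTRAL_EE = "https://elections.democracyclub.org.uk/"
LOCAL_EE = "http://localhost:8000/"


def ee_url(base_url, path, **params):
    """Build a full EE URL from the configured base URL, a path and query params."""
    # Ensure a trailing slash so urljoin appends to the base rather than
    # replacing its last path segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    url = urljoin(base, path.lstrip("/"))
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


# Switching between the local copy and the central service is only a settings change.
EE_BASE_URL = LOCAL_EE
print(ee_url(EE_BASE_URL, "api/elections/", current=1))
```

Because the client only ever sees a base URL, a dev copy can point at the central service and skip syncing data altogether.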
I think the work here should be making it easier for EE to sync itself with upstream data in some way.
The main consideration for that should be speed of import. For WCIVF I used a binary DB dump from PostgreSQL. Importing this took seconds, making it quick enough to do on instance boot, meaning that we got fresh data every time an instance came up (the dumps were synced to S3 every hour, and the instance requested the latest).
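The boot-time import described above could be sketched roughly as follows: pick the newest dump from a listing, then build the pg_restore call for it. The key naming scheme, the database name and all three helper functions are assumptions, since the original scripts are not shown in this thread. Listing and downloading from S3 is left out.

```python
import subprocess


def latest_dump_key(keys):
    """Return the newest dump key.

    Assumes (hypothetically) that dump names embed a sortable UTC timestamp,
    so the lexicographically largest name is the most recent dump.
    """
    dumps = [k for k in keys if k.endswith(".dump")]
    if not dumps:
        raise ValueError("no dumps found")
    return max(dumps)


def pg_restore_args(dump_path, dbname):
    """Build the pg_restore command line for a custom-format (binary) dump."""
    # --clean/--if-exists drop existing objects before recreating them;
    # --no-owner avoids needing the same roles as the source database.
    return [
        "pg_restore", "--clean", "--if-exists", "--no-owner",
        "--dbname", dbname, dump_path,
    ]


def import_dump(dump_path, dbname):
    """Run the restore, raising CalledProcessError if pg_restore fails."""
    subprocess.run(pg_restore_args(dump_path, dbname), check=True)


# Example listing with hypothetical key names.
keys = ["ee-2024-05-01T10.dump", "ee-2024-05-01T11.dump", "README.txt"]
print(latest_dump_key(keys))
print(" ".join(pg_restore_args("/tmp/latest.dump", "every_election")))
```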
from everyelection.
OK, so having chewed it over a bit, let's try and work out what we're actually going to do here.
Regardless of whether we're calling EE as a Python library or as a local JSON API, the actual hard bit of this problem is how we deal with syncing data, and a lot of what we need to do changes the apps that consume EE and the deploy processes supporting them, rather than EE itself.
- We already have some scripts supporting the process of running a local EE instance (for local HTTP/JSON access). These should be generalised into some kind of reusable ansible module that can be pulled into polling_deploy and who_deploy (using ansible galaxy, or a git submodule, or something).
- At the moment, we take a DB dump from EE every day on a schedule. It would be useful if we had a way to check whether anything has changed in the database since the last export and only create a new dump if it has.
- Currently WhoCIVF doesn't do scheduled imports - it just boots and:
  - Sets itself to dirty/unhealthy
  - Runs an import
  - Sets itself to clean/healthy

  If we want to update EE, we've got to cycle the server. Is it reasonable to just do this or do we need scheduled imports?
- WhereDIV also needs to implement the clean/dirty flag.
- At the moment our DB is fairly small, but once we've been through a few boundary changes it will be much bigger. We need to ensure we don't reach a point where the import takes so long that the load balancer chucks a machine out of the cluster.
- If we want to run scheduled DB imports, we need to consider:
  - Does running pg_restore with the --single-transaction flag mean we can do a big import with no downtime and without being in an inconsistent state at any point? If so, happy days, but it needs testing.
  - If not:
    a) Client apps need to ensure that all nodes in the cluster don't try to sync at the same time (to preserve uptime)
    b) Client apps should set themselves as dirty while syncing data from EE (so we don't serve up spurious results during the sync)
  - Our client apps (WhoCIVF, WhereDIV) should also ideally know what the last dump they imported was and only import a new DB dump if it is necessary.
- Is there also a 'halfway house' where we can just sync stuff that changes frequently (e.g. explainers) on a schedule but only do a full import on boot?
- Whatever we do, when we are in "business as usual" mode (i.e. no elections happening, or only some minor by-elections) we want to be able to turn off any syncing process in our client apps and just call https://elections.democracyclub.org.uk/ directly.
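The dirty/clean boot sequence and the "only import a new dump if it is necessary" idea from the list above can be sketched together as below. Everything here is hypothetical: the state file, the `sync_if_needed` name and the two callables stand in for whatever WhoCIVF or WhereDIV would really use to run an import and to report health to the load balancer.

```python
import json
import tempfile
from pathlib import Path


def sync_if_needed(state_file, latest_dump_id, run_import, set_healthy):
    """Import latest_dump_id unless it is the dump we imported last time.

    Returns True if an import ran, False if it was skipped.
    """
    state_file = Path(state_file)
    last = None
    if state_file.exists():
        last = json.loads(state_file.read_text()).get("last_dump_id")
    if last == latest_dump_id:
        return False  # nothing new: stay healthy and keep serving

    set_healthy(False)  # dirty: don't serve spurious results mid-import
    run_import(latest_dump_id)  # if this raises, the node stays dirty
    state_file.write_text(json.dumps({"last_dump_id": latest_dump_id}))
    set_healthy(True)  # clean: data is consistent again
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "ee_sync.json"
        for dump_id in ["dump-1", "dump-1", "dump-2"]:
            ran = sync_if_needed(state, dump_id, lambda d: None, lambda ok: None)
            print(dump_id, "imported" if ran else "skipped")
```

Recording the dump identifier only after a successful import means a failed import is retried on the next run. This sketch does not stop several nodes syncing at once; that would still need staggering or a lock.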
I'm going to close this as I reckon we have probably taken this as far as we are going to for the moment. If we want to iterate on it further, we can open new, more specific issues for the improvements we want to make.