
Comments (3)

symroe commented on July 23, 2024

I think "library" might be the wrong term here, at least as I see it.

The way I used EE on WCIVF was to set up a local copy of the web service on each server that WCIVF used (i.e. baked into the image that spun up new instances).

This meant that I could communicate with EE over HTTP and all I needed to do was to set the EE_BASE_URL setting.
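As a rough sketch of what that looks like from the client side: the helper below builds request URLs relative to whatever base URL is configured, so pointing an app at a local copy or at the central service is a one-setting change. The `ee_url` helper, the `localhost:8000` address, the `api/elections/` path and the `current` parameter are all assumptions made for illustration, not taken from WCIVF's code.

```python
from urllib.parse import urlencode, urljoin

# The public EE instance mentioned later in this thread.
CENTRAL_EE = "https://elections.democracyclub.org.uk/"


def ee_url(base_url, path, **params):
    """Build a URL for an EE endpoint relative to a configurable base URL."""
    # Normalise the base so urljoin appends to it instead of replacing
    # its last path segment.
    base = base_url.rstrip("/") + "/"
    url = urljoin(base, path.lstrip("/"))
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


# Same call, two deployments: a local copy baked into the server image
# (hypothetical address) versus the central service.
local = ee_url("http://localhost:8000", "api/elections/", current=1)
central = ee_url(CENTRAL_EE, "api/elections/", current=1)
```

In a Django project the first argument would presumably come from `settings.EE_BASE_URL`, so nothing else in the consuming app needs to know which instance it is talking to.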

When we use the word "library" I start thinking about it being a reusable Django app that can be pip installed and used by the project directly through the Django/Python interface.

This is a nice idea, but might be the wrong way to go:

  1. Using HTTP means switching from a central system at quiet times to a local system is much easier.
  2. Related to that, setting up a dev copy of a project would be simpler, as there would be no need to sync live data from EE down to the dev copy.
  3. Dogfooding the public API is A Good Thing for everyone. If we use a Python API then we're unlikely to have a great HTTP API (or at least, we won't care about having one).

I think the work here should be making it easier for EE to sync itself with upstream data in some way.

The main consideration for that should be speed of import. For WCIVF I used a binary DB dump from PostgreSQL. Importing this took seconds, making it quick enough to do on instance boot, which meant we got fresh data every time an instance came up (the dumps were synced to S3 every hour, and the instance requested the latest).
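A minimal sketch of the boot-time step, assuming the hourly dumps are named with a sortable timestamp (the `ee-YYYY-MM-DDTHH.dump` scheme here is invented for the example, as are both function names). It picks the newest dump from a listing of keys and builds a `pg_restore` command for it; listing and downloading from S3 is left out.

```python
import re

# Assumed naming scheme for the hourly dumps, e.g. "ee-2017-05-04T13.dump".
DUMP_NAME = re.compile(r"^ee-(\d{4}-\d{2}-\d{2}T\d{2})\.dump$")


def latest_dump(keys):
    """Return the newest dump key, or None if no key matches the pattern."""
    # The timestamp format sorts lexicographically, so max() finds the newest.
    stamped = [(m.group(1), k) for k in keys if (m := DUMP_NAME.match(k))]
    return max(stamped)[1] if stamped else None


def restore_command(dump_path, dbname):
    """Build a pg_restore invocation for a custom-format (binary) dump."""
    return [
        "pg_restore",
        "--dbname", dbname,
        "--clean", "--if-exists",  # drop existing objects before recreating them
        "--no-owner",              # ignore the object ownership recorded in the dump
        dump_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the chosen dump has been downloaded to `dump_path`.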

from everyelection.

chris48s commented on July 23, 2024

OK, so having chewed it over a bit, let's try and work out what we're actually going to do here.

Regardless of whether we're calling EE as a Python library or as a local JSON API, the actual hard bit of this problem is how we deal with syncing data. A lot of what we need to do changes the apps that consume EE and the deploy processes supporting them, rather than EE itself.

  • We already have some scripts supporting the process of running a local EE instance (for local HTTP/JSON access). These should be generalised into some kind of reusable Ansible module that can be pulled into polling_deploy and who_deploy (using Ansible Galaxy, or a git submodule, or something).

  • At the moment, we take a DB dump from EE every day on a schedule. It would be useful if we had a way to check whether anything has changed in the database since the last export, and only create a new dump if it has.

  • Currently WhoCIVF doesn't do scheduled imports - it just boots and:

    • Sets itself to dirty/unhealthy
    • Runs an import
    • Sets itself to clean/healthy
    If we want to update EE, we've got to cycle the server. Is it reasonable to just do this, or do we need scheduled imports?

  • WhereDIV also needs to implement the clean/dirty flag

  • At the moment our DB is fairly small, but once we've been through a few boundary changes it will be much bigger. We need to ensure we don't reach a point where the import takes so long that the load balancer chucks a machine out of the cluster.

  • If we want to run scheduled DB imports, we need to consider:

    • Does running pg_restore with the --single-transaction flag mean we can do a big import with no downtime and without being in an inconsistent state at any point? If so, happy days, but it needs testing.
    • If not:
      a) Client apps need to ensure that all nodes in the cluster don't try to sync at the same time (to preserve uptime)
      b) Client apps should set themselves as dirty while syncing data from EE (so we don't serve up spurious results during the sync)
    • Our client apps (WhoCIVF, WhereDIV) should also ideally know what the last dump they imported was and only import a new DB dump if it is necessary.
  • Is there also a 'halfway house' where we can just sync stuff that changes frequently (e.g. explainers) on a schedule but only do a full import on boot?

  • Whatever we do, when we are in "business as usual" mode (i.e. no elections happening, or only some minor by-elections) we want to be able to turn off any syncing process in our client apps and just call https://elections.democracyclub.org.uk/ directly.
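To make a couple of the points above concrete, here is a rough sketch of how a client app might combine the clean/dirty flag with "only import if the dump is new". Everything named here (`file_digest`, `marked_dirty`, `import_if_changed`, the flag file and the state file) is hypothetical; it assumes a health check elsewhere reports the node as unhealthy while the flag file exists, and that `run_import` wraps whatever actually loads the dump (for example a pg_restore call).

```python
import hashlib
from contextlib import contextmanager
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a dump file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


@contextmanager
def marked_dirty(flag_path):
    """Keep a flag file in place for the duration of an import."""
    flag = Path(flag_path)
    flag.touch()
    yield
    # Only reached if the import succeeded; on failure the flag stays,
    # so the node remains marked dirty.
    flag.unlink()


def import_if_changed(dump_path, state_path, flag_path, run_import):
    """Call run_import(dump_path) only if the dump differs from the last one imported."""
    state = Path(state_path)
    digest = file_digest(dump_path)
    if state.exists() and state.read_text().strip() == digest:
        return False  # same dump as last time: nothing to do
    with marked_dirty(flag_path):
        run_import(dump_path)
    # Record what was imported only after a successful run.
    state.write_text(digest)
    return True
```

The same digest comparison could in principle be used on the export side to skip uploading a dump that is identical to the previous one, though that depends on the dump output being byte-for-byte stable for unchanged data, which would need checking. It also does nothing about staggering syncs across nodes.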


chris48s commented on July 23, 2024

I'm going to close this as I reckon we have probably taken this as far as we are going to for the moment. If we want to iterate on it further, we can open new, more specific issues for the improvements we want to make.
