
public-services.py's Introduction

public-services.py


This is a set of publicly-available services hosted by Dan Brotsky Consulting. The exact functionality varies over time.

Action Network/Airtable Integration

The currently posted services integrate Action Network and Airtable in support of anti-racist community organizing by the Everyday People PAC. Triggered by Action Network web hooks, the services move people and donation records over to Airtable where the data is used by organizers in support of volunteer mobilization.

Hosting Info

The brotsky.com public services are API-only and are rooted at this endpoint. They are built in Python using FastAPI. They are hosted on Heroku using Papertrail and Redis Cloud add-ons. Service configuration files are stored in Amazon S3.

License

The brotsky.com public services are open source and released under MIT License. See the LICENSE file for details.

public-services.py's People

Contributors

brotskydotcom, clickonetwo, dependabot[bot], huayu-ouyang


public-services.py's Issues

custom fields are mapped at record creation time, not at upload time

So it turns out core fields and custom fields have been handled differently in people records for quite some time. The core fields are named for their Action Network name, while the custom fields are named for their Airtable name. This was not an intended asymmetry, and the code for shifts was doing the "right thing" of naming them both for their data source field name. But that means the person code broke as soon as someone reset one of the custom field values after creating the record.

(Found while testing PR #42, so fixed in that branch.)

Async race conditions between webhook transfer tasks

Turns out both #19 and #20 are manifestations of a deeper underlying bug: a race condition between multiple webhook transfer tasks. Because the length of the submitted list can decrease while one of the transfer tasks is running, the code that counted the number of items to work on incorrectly assumed it would always retrieve an item from the list, when in fact it sometimes got None. So fixing that will fix both those other issues.
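The fix can be sketched as follows (names are hypothetical, and a deque stands in for the Redis list): instead of counting items up-front, each task pops until the list is empty and treats a failed pop as "a sibling task already drained the list".

```python
import asyncio
from collections import deque


async def process_items(work_list: deque) -> int:
    # Sketch: pop until empty rather than trusting an up-front count.
    # Another task may drain the shared list concurrently, so a pop can
    # fail even if len() looked positive a moment ago.
    processed = 0
    while True:
        try:
            item = work_list.popleft()
        except IndexError:
            break  # list drained, possibly by a sibling task
        # ... transfer the item here ...
        processed += 1
        await asyncio.sleep(0)  # yield so sibling tasks interleave
    return processed
```

Two tasks running this against the same list process every item exactly once between them, with neither assuming the count it saw at the start still holds.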

Expand the control interface

It would be nice if the control interface offered more capability than just querying and restarting deferred items. For example, it would be nice to see some stats over how many items are waiting for processing, how many are deferred, etc.

fix Heroku workers

Turns out the Procfile was trying to use uvicorn to run the worker, but in fact workers are not ASGI applications. Let's get the workers running as a straight Python script with asyncio as the run-loop manager.
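A minimal sketch of that entry point, assuming a hypothetical `transfer_worker` coroutine as the worker's main loop:

```python
import asyncio


async def transfer_worker(iterations: int = 3) -> int:
    # Hypothetical worker loop: pull and process pending webhook items.
    # Real code would poll or sleep between batches instead of counting
    # a fixed number of iterations.
    done = 0
    for _ in range(iterations):
        # ... pull and process one batch here ...
        done += 1
        await asyncio.sleep(0)
    return done


if __name__ == "__main__":
    # asyncio, not uvicorn, drives the loop: the worker is not an ASGI app.
    asyncio.run(transfer_worker())
```

The Procfile worker line would then invoke the script directly (e.g. `worker: python -m app.worker`, path hypothetical) rather than going through uvicorn.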

Remove deleted timeslots from Airtable

We have found out that, unlike events, timeslots are not removed from the Airtable export when they are deleted. Instead a date is put in their timeslot_deleted_at column. So the integration needs to be enhanced to ignore CSV rows with content in this column.
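The enhancement amounts to a filter over the export, sketched here with the standard csv module (the column name comes from the issue; the helper name is hypothetical):

```python
import csv
import io


def live_timeslot_rows(csv_text: str):
    # Skip any exported row whose timeslot_deleted_at column is non-empty,
    # since Mobilize marks deleted timeslots rather than removing them
    # from the export.
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("timeslot_deleted_at", "").strip():
            continue
        yield row
```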

security on events upload

We had an accidental upload of an empty events file, and because the logic is to remove all events that don't match the latest upload, this caused all events to be removed. So it's clear we need some protection against accidental and malicious uploads in production.
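One simple guard, sketched below with hypothetical names: refuse to sync when the upload contains fewer events than a sanity threshold, since the sync removes every event not present in the upload. (This addresses the accidental case; malicious uploads additionally need authentication on the endpoint.)

```python
def validate_events_upload(rows: list[dict], minimum: int = 1) -> list[dict]:
    # Guard against accidental empty (or suspiciously small) uploads:
    # an empty file would otherwise wipe every event from Airtable.
    if len(rows) < minimum:
        raise ValueError(f"refusing upload of only {len(rows)} event(s)")
    return rows
```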

Mobilize exports are cumulative; take forever to process

Mobilize event and shift exports seem to be cumulative since the beginning of time, not deltas since the last export. And each one requires looking up a person record in addition to the event or shift record. So they are literally taking hours to process.

This means we need a cache.

Cache is too brittle and expensive

The cache is brittle because it keeps record IDs and those can be deleted from Airtable between runs.

In addition, caching all the data is expensive; more expensive than having a worker that keeps the data in memory.

The bulk upload architecture needs rework, definitely.

save retry failures for later re-submit

Currently we retry failed webhook transfer items 5 times and then give up. When we give up, we should save those items to a different key in the database, so that once the blocking issue is fixed they can be re-submitted.

Also it would be nice to have an API endpoint that triggers the re-submit.
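The save step could look like this sketch, using a dict of lists in place of the Redis database (key names are hypothetical):

```python
def defer_failed_items(store: dict, retry_key: str) -> str:
    # Once items on a retry key have exhausted their retries, move them
    # under a 'deferred:' key so they survive for later re-submission
    # instead of being dropped.
    deferred_key = f"deferred:{retry_key}"
    store.setdefault(deferred_key, []).extend(store.pop(retry_key, []))
    return deferred_key
```

A re-submit API endpoint would then just move items from the `deferred:` key back onto a live processing list.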

Invalid webhook list name causes exception

Found these lines in the running server log:

Jul 29 12:31:11 bdc-public-services app/web.1 Processing webhook items on 'None'...
Jul 29 12:31:11 bdc-public-services app/web.1 Task exception was never retrieved
Jul 29 12:31:11 bdc-public-services app/web.1 future: <Task finished name='Task-393' coro=<transfer_all_webhook_items() done, defined at /app/app/workers/webhook_transfer.py:114> exception=AttributeError("'NoneType' object has no attribute 'split'")>
Jul 29 12:31:11 bdc-public-services app/web.1 Traceback (most recent call last):
Jul 29 12:31:11 bdc-public-services app/web.1   File "/app/app/workers/webhook_transfer.py", line 121, in transfer_all_webhook_items
Jul 29 12:31:11 bdc-public-services app/web.1     retry_list = await process_items(try_list)
Jul 29 12:31:11 bdc-public-services app/web.1   File "/app/app/workers/webhook_transfer.py", line 45, in process_items
Jul 29 12:31:11 bdc-public-services app/web.1     environ, guid, retry_count = list_key.split(":")
Jul 29 12:31:11 bdc-public-services app/web.1 AttributeError: 'NoneType' object has no attribute 'split'

Clearly the fact that None was used as a key in a processing list is also a bug. But we shouldn't crash because of it. That's this bug.
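A defensive parse along these lines (function name hypothetical; the `environ:guid:retry_count` key shape is taken from the traceback) would report the bad key instead of crashing the task:

```python
def parse_list_key(list_key) -> tuple[str, str, int]:
    # A None (or malformed) key should raise a descriptive error that the
    # caller can log and skip, not an AttributeError that kills the task.
    if not isinstance(list_key, str) or list_key.count(":") != 2:
        raise ValueError(f"invalid processing list key: {list_key!r}")
    environ, guid, retry_count = list_key.split(":")
    return environ, guid, int(retry_count)
```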

Race condition: Shifts must not precede events

Because events and shifts are uploaded separately, it's possible for the shifts for an event to be uploaded before we have the event information. If we process the items for these shifts before the event, then they become orphaned (not connected to the event) in Airtable and won't be relinked unless the shift data is uploaded again. This breaks the stats on the Airtable side. So we need to delay creation of any such shift until the event has been created.
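The delay logic can be sketched as a partition step (names and record shape hypothetical): process only shifts whose event already exists, and put the rest back for a later pass.

```python
def partition_shifts(shifts: list[dict], known_event_ids: set[str]):
    # Shifts whose event record already exists in Airtable are ready to
    # create; the rest are deferred until their event has been created.
    ready, deferred = [], []
    for shift in shifts:
        target = ready if shift["event_id"] in known_event_ids else deferred
        target.append(shift)
    return ready, deferred
```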

Processing list created with name "None"

Found this in the log:

Jul 29 11:51:44 bdc-public-services app/web.1 Running worker task to transfer received item(s).
Jul 29 11:51:44 bdc-public-services app/web.1 34.203.196.148:0 - "POST /action_network/notification HTTP/1.1" 200
Jul 29 11:51:44 bdc-public-services app/web.1 Processing webhook items on 'PROD:200729.185144.343104.10:0'...
Jul 29 11:51:44 bdc-public-services app/web.1 Processing 2 webhook item list(s)...
Jul 29 11:51:44 bdc-public-services app/web.1 Accepted 1 item(s) from webhook.
Jul 29 11:51:44 bdc-public-services app/web.1 Running worker task to transfer received item(s).
Jul 29 11:51:44 bdc-public-services app/web.1 34.203.196.148:0 - "POST /action_network/notification HTTP/1.1" 200
Jul 29 11:51:44 bdc-public-services app/web.1 Processing 3 webhook item list(s)...
Jul 29 11:51:44 bdc-public-services app/web.1 Found upload item.
Jul 29 11:51:44 bdc-public-services app/web.1 Processing webhook items on 'PROD:200729.185144.348858.10:0'...
Jul 29 11:51:44 bdc-public-services app/web.1 Processing webhook items on 'PROD:200729.185144.347614.9:0'...
Jul 29 11:51:44 bdc-public-services app/web.1 Processing 2 webhook item list(s)...
Jul 29 11:51:44 bdc-public-services app/web.1 Found upload item.Found upload item.
Jul 29 11:51:44 bdc-public-services app/web.1 Processing webhook items on 'PROD:200729.185144.356961.10:0'...
Jul 29 11:51:44 bdc-public-services app/web.1 Found upload item.
Jul 29 11:51:44 bdc-public-services app/web.1 
Jul 29 11:51:45 bdc-public-services app/web.1 Uploading new person record for [email protected].
Jul 29 11:51:45 bdc-public-services app/web.1 Found existing person record for [email protected].
Jul 29 11:51:45 bdc-public-services app/web.1 Updating 3 fields in record.
Jul 29 11:51:45 bdc-public-services app/web.1 Temporary error, will retry later: webhook_transfer.py, 56: HTTPError('422 Client Error: Unprocessable Entity for url: https://api.airtable.com/v0/appQ5XIu4UyWmXWAj/tblA58ZCCMSzBOK2h/recW2n3b4CyGgW5XX (Decoded URL) [Error: {\'type\': \'UNKNOWN_FIELD_NAME\', \'message\': \'Unknown field name: "Phone Number*"\'}]')
Jul 29 11:51:45 bdc-public-services app/web.1 Failed to process 1 item(s).
Jul 29 11:51:45 bdc-public-services app/web.1 Will save failed item(s) for later retry.
Jul 29 11:51:45 bdc-public-services app/web.1 Processing webhook items on 'PROD:200729.185144.348858.10:0'...
Jul 29 11:51:45 bdc-public-services app/web.1 List 'PROD:200729.185144.348858.10:0' done: processed 0 item(s) successfully.
Jul 29 11:51:45 bdc-public-services app/web.1 Processing webhook items on 'None'...

Apparently somewhere in the asynchronous "multiple tasks pulling from the same list" melange, the retry list got created or posted with None as its name.

Run worker process

Running the transfer worker as a background task makes sense on a hobby dyno that idles when unused, but it probably makes more sense to have the always-on dyno run the transfer worker as a separate process.

Import Events from Mobilize to Airtable

The organizers are requesting that we import scheduled events from Mobilize to a separate table in Airtable, and that we show in that table all the shifted and unshifted events independent of signups. The format should allow for easily rolling up how many people have signed up for (each shift of) each event.

Invalid entity errors during shift upload

All of a sudden we are getting multiple errors in the log that look like this:

Temporary error on shift item #21, will retry later: webhook_transfer.py, 67: HTTPError('422 Client Error: Unprocessable Entity for url: https://api.airtable.com/v0/appQ5XIu4UyWmXWAj/tblXentRNTHB13B39', '422 Client Error: Unprocessable Entity for url: https://api.airtable.com/v0/appQ5XIu4UyWmXWAj/tblXentRNTHB13B39 [Error: {\'type\': \'INVALID_VALUE_FOR_COLUMN\', \'message\': \'Field "email" cannot accept the provided value\'}]')

Handle co-organized but non-publicized special events on Mobilize

Occasionally, there are Mobilize events that are co-organized by STV folks but which are not from publicized partners of STV. In this case, the event data will be exported to a separate spreadsheet which doesn't include the other Mobilize events, so importing that spreadsheet will wipe out all the Mobilize events. (This, in turn, will wipe out all the event links for the shifts for those events.) So we need two enhancements:

  1. Provide a way of marking an event as one of these special events in the imported spreadsheet, and
  2. If the spreadsheet without marking is accidentally imported, find a way of force re-linking all the shifts to their correct events once those have been re-imported.

track fundraising page for each donation

This issue is the work needed for this Asana task.

The current donation transfer code does not keep track of the fundraising form associated with each donation. Per the Asana task, in addition to transferring the donor to the Contacts table in Airtable, we should transfer info about the fundraising form to a separate table on the Airtable side as well, and then link each donation to the fundraising form table so we can aggregate donations against their form as well as their donor.

Since each donation has a link to its form as well as its donor, the code to follow and transfer the form info should be similar to what is currently used for people. It's not clear what info we need about the form other than its title, so product management should clarify what else is needed. (Clearly the form id should be the key for the forms table.)

incorrectly logging retry counts

When a temporary error occurs and an item needs to be retried, the retry list is correctly returned and set for reprocessing but it is not counted, so at the end of the run the number of retry lists is logged as 0.

enhance mappings file format

The current mappings file format has a number of different limitations:
  1. It forces you to repeat the field mappings in each environment, which is tedious and error prone because they are never different on a per-environment basis.
  2. It doesn't provide a way to tag the target with the type of the Airtable field, so we can't do intelligent format conversion.
  3. It doesn't provide a way to collect multiple check-box fields on the AN side into a single multi-select on the Airtable side.
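One possible shape for an enhanced mappings file that addresses all three limitations (the layout, field names, and environment keys below are purely illustrative, not the current format):

```yaml
# Hypothetical layout; all names are illustrative.
field_maps:                      # defined once, shared by every environment
  people:
    - source: "Phone Number"     # Action Network field name
      target: "phone"            # Airtable field name
      type: phone                # Airtable field type tag for conversion
    - sources: ["Canvassing", "Phone Banking"]  # several AN check-boxes...
      target: "Interests"        # ...collected into one Airtable multi-select
      type: multi_select
environments:                    # only per-environment details live here
  DEV:
    base: appDEVXXXXXXXXXX
  PROD:
    base: appPRODXXXXXXXXX
```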

catch bad shift spreadsheet inputs

We need to validate that shift spreadsheets have the right fields before we send them off for processing. Otherwise the worker wastes time retrying them and then they end up deferred.
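The validation is a header check up-front, sketched here (the required-column set is a hypothetical example, not the actual schema):

```python
# Hypothetical required columns for a shift spreadsheet.
REQUIRED_SHIFT_COLUMNS = {"email", "event_id", "timeslot_id"}


def check_shift_headers(header_row: list[str]) -> None:
    # Reject a shift spreadsheet before queuing it for processing if
    # required columns are missing, instead of letting the worker retry
    # every row and then defer them all.
    missing = REQUIRED_SHIFT_COLUMNS - set(header_row)
    if missing:
        raise ValueError(f"shift spreadsheet missing columns: {sorted(missing)}")
```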

improve testing and add coverage

We currently don't have good unit tests for almost anything. In addition, we have no coverage metrics. To add better testing, we will need to stub or mock at least the network transfer code that moves data to Airtable.

Events and Time Slots deleted from Mobilize are not deleted from Airtable

The current integration views each Mobilize load as incremental, but in fact it's absolute. So we need to change the integration to delete any events and timeslots that aren't in the next upload (and maybe the shifts associated with them - we might have to only do deletions on a go-forward basis).
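Treating the load as absolute reduces to a set difference, sketched here with hypothetical names:

```python
def records_to_delete(existing_ids: set[str], uploaded_ids: set[str]) -> set[str]:
    # The upload is the full truth: anything currently in Airtable that
    # the latest export no longer contains should be deleted.
    return existing_ids - uploaded_ids
```

Whether the shifts attached to a deleted event are also deleted (or only on a go-forward basis, as the issue suggests) would be a separate policy decision applied to the returned set.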

Incorrect currency warning in the logs

If donations come in with currency USD rather than usd, we get an incorrect warning in the logs that the currency is unexpected. The currency comparison should be case-insensitive.
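The fix is a one-line normalization, sketched here (function name hypothetical):

```python
def currency_is_expected(currency: str, expected: str = "usd") -> bool:
    # Compare currencies case-insensitively so "USD" and "usd" both pass
    # without a spurious warning in the logs.
    return currency.strip().lower() == expected.lower()
```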

Privacy enhancements

Need to get the personally identifiable information out of the logs and the cache, while still leaving them usable for debugging.

Handle upload webhooks

Action Network sends webhooks when new volunteers are uploaded, but they are currently being ignored. There's no reason to ignore them, since now that we have unified all person transfers they can be handled just like forms.

make sure Mobilize shift taker info doesn't over-write Action Network info

Now that we are planning to upload the Mobilize volunteer info directly to Action Network so that it can be deduped and merged with other form info by AN, we want to make sure we use the Mobilize shift taker info just for matching up the shift taker with an existing volunteer; we don't want it to overwrite the volunteer data (in case there is new AN data we want to keep).
