hackoregon / team-budget Goto Github PK

View Code? Open in Web Editor NEW

6.0 6.0 4.0 2.58 MB

Repo for the Budget team's backend code

License: MIT License

Python 53.42% Shell 29.65% R 16.60% Dockerfile 0.33%

backend budget

team-budget's People

Contributors

Stargazers

Watchers

Forkers

jimtyhurst hassanshamim mikethecanuck mxmoss

team-budget's Issues

Create data endpoint for the Bureau totals for a Service Area

Create data endpoint for the Bureau totals for a Service Area
Filtered by

Service Area
Fiscal Year
Return Amount in dollars
Example results (Amounts not accurate)

Service Area = Public Safety
Bureau | Fiscal Yr | Object Code | Amount

Emergency Comm. | 2014-15 | Personal | $140,034,432
Emergency Comm. | 2014-15 | IMS | $153,234,432
Emergency Comm. | 2014-15 | EMS | $134,344,432
Emergency Mgmt. | 2014-15 | Personal | $34,024,432
Emergency Mgmt. | 2014-15 | IMS | $43,444,334
Emergency Mgmt. | 2014-15 | EMS | $85,343,324
PPB | 2014-15 | Personal | $123,343,234
PPB | 2014-15 | IMS | $163,344,324
PPB | 2014-15 | EMS | $173,432,342

Verify Production DB has no duplicate rows

I haven't been very closely involved with the actual budget DB we are extracting our data from, but every now and then I'll compare my local api responses with the production ones and I see a large difference in the number of rows in certain tables.

Just review the number of records in each table in the production DB to ensure it's accurate. Must happen before demo-day.

Improve startup response from the Django app (to solve AWS deploy blocker)

Brian Grant and I have been working all this week to find a way to stabilize the Docker containers in AWS.

Problem is, the containers deploy and start fine, but the app inside them doesn't respond quickly enough for the AWS "Health Check" to consider them healthy containers, so they never get scheduled in rotation by the load balancers. Instead, AWS automatically starts new containers from the same deployed image and replaces the "unhealthy" containers. However, since the image is the same every time, AWS can never get "healthy" containers.

End result: Budget is running on ancient code, and can't be updated in AWS until the code in the container image stabilizes.

Current hypothesis (advanced by Dan, and which fits all the behaviours I've observed to date) is this: in looking at the code in views.py, it appears the app is loading all unfiltered models at startup, which means it's loading the entire database table at startup. And since some of the tables are outsized, this is causing the startup to slow down until it exceeds all reasonable timeouts for determining that the contents of the container are healthy enough to be put into rotation.

According to the second article below, "Django load all applications and their 'models.py' at startup and executes its code."

Dan fired a number of resources my way, and while I'd love to use them to resolve the problem, this isn't a great use of my limited Python skills. This could use some app dev expertise. Can any of you folks help out?

Here's the links:
https://docs.djangoproject.com/en/1.10/topics/performance/
http://kartowicz.com/dryobates/2014-10/queries-run-django-projects-startup/
https://gun.io/blog/fast-as-fuck-django-part-1-using-a-profiler/

Enable developers using Windows to run the backend Docker container locally

We have at least two developers on the Budget team who are using Windows machines for local development work.

When @mxmoss (using Windows 10 + Docker Toolbox for Windows) tried to run docker-compose up from the current "dockerize" branch of this project, he encountered this error:

moss@DESKTOP-9LE20OE MINGW64 /c/develop/python/team-budget/budget_proj (dockerize)
$ docker-compose up
Starting budgetproj_web_1

ERROR: for web  Cannot start service web: oci runtime error: container_linux.go:247: starting container process caused "exec: \"/code/docker-entrypoint.sh\": stat /code/docker-entrypoint.sh: no such file or directory"
ERROR: Encountered errors while bringing up the project.

moss@DESKTOP-9LE20OE MINGW64 /c/develop/python/team-budget/budget_proj (dockerize)

Further testing by myself with other Docker containers on Windows 10 + Docker Toolbox for Windows resulted in similar errors (due to the same problem - can't find the script inside the container).

These same Docker containers will build and run successfully on Linux, MacOS and MacOS + Docker Toolbox for Mac configurations.

convert budget in brief 16 to json

Modify History Table, Functional Area Name column length

Modify History Table, Functional Area Name column length
Functional Area Name in history table - the field isn't long enough for the data.
Needs to be greater than 48 characters

Consider: Review lengths of other columns?
(Aaron had a tool to analyze the CSV files for data lengths)

PROBLEM: History and OCRB endpoints give 502/504 response in AWS

As Jim pointed out in his comments on PR #86, the /ocrb and /history endpoints are behaving poorly.

"When accessing the services on AWS, I get a 502 Gateway Error for these two URLs:

# Returns 502
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/ocrb/
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/history/

but I get 200 OK and data when I supply query parameters. For example:

# Returns 200 OK
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/ocrb/?fy=2015-16
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/history/?fiscal_year=2015-16&bureau_code=PS

I am wondering if there is too much data for the bare endpoints /ocrb and /history. That could definitely be a problem for /history, which is a fairly large data set. In contrast, I do not see any problems with bare /kpm:

# No problems with these URLs:
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/kpm/
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/code/

I've since observed 504 errors as well (Gateway Timeout), but haven't inferred a pattern.

Ensure current API code is using PostgreSQL to retrieve requested data

As of the "secrets" branch, we are now enabling Budget team developers to configure their local instance of the Django app to use the AWS EC2-hosted PostgreSQL instance as the data layer.

The new configuration (from secrets branch) has no impact on the code in views.py (e.g. find_ocrb_data(), find_kpm_data()) that pulls in data through the local CSV files. Thus work needs to be done to switch the API code so that each endpoint (e.g. /kpm/, /ocrb/ and /summary/) pulls its data from whichever PostgreSQL instance (AWS or local) is currently configured in the running app.

Generate static data for use by initial CYOA API

To generate the CYOA UI for users, the back end will take in the following as data inputs:

spending by {Service Area, Bureau} by {"Quadrant"}
citywide population
"Quadrant" population

This means we'll need tabular data (probably rendered out to JSON) that captures this information. Most or all of this data should be available from the Budget in Brief document.

Fix staticfiles in production

Django assumes our webserver (nginx?) will server the static files in production, which is why our swagger views are all messed up.

Simplest fix as far as I can tell is adding and enabling this:
https://github.com/kennethreitz/dj-static Following the directions in the readme should be sufficient.

Enable Travis to push Docker image to AWS EC2 Container Service

Current automation has not yet populated the docker-push.sh script being called in the after_success section of the .travis.yml file.

Prerequisites

we have stable Docker container being built in Travis and Django tests can be executed in Travis (PR #62)
we have Travis publishing that Docker container's image to Docker Hub (Issue #41)
we have a valid EC2 Container Service instance to which to push the image from Docker Hub (DevOps Issue 35)

ImportError: cannot import name 'project_config' - docker_and_travis_runs_app branch

Currently experiencing ImportError: cannot import name 'project_config' in Travis build:
https://travis-ci.org/hackoregon/team-budget/builds/210124740

When I run the same ./budget_proj/bin/start-proj.sh script in local computer as the script called by .travis.yml, I get the same error.

Running docker exec -it [container_id] /bin/bash, I can see that the project_config.sh script exists in /code/budget_proj/, which can only occur if the aws s3 cp command completes successfully.

So the question is, exactly which process is running that causes this final output in the Travis log:

  File "/code/budget_proj/settings.py", line 15, in <module>
    from . import project_config
ImportError: cannot import name 'project_config'

is it the docker-compose -f budget_proj/docker-compose.yml up --build from start-proj.sh?
is it the python3 manage.py migrate in docker-entrypoint.sh?
what is the current working directory of the python(3) executable i.e. do we have to prepend budget_proj. for python(3) to find the project_config? Do we have to change from . to some other reference?

Fix sloppy relative import in views.py

Example line:
f = '../Data/Budget_in_Brief_OCRB_data_All_Years.csv'

Should be using sys.path and the filepath to to find the csv, not just a string.

Design and build annotated API using django and swagger with documentation

(http://swagger.io/)

Which version of Python should be used in Django app's Docker container?

As @jimtyhurst asked in PR47, we have so far arbitrarily assigned a version of Python to use in the Django app.

Currently the Dockerfile specifies FROM python:3.5, whereas the current version of Python is 3.6.0.

Is there any reason not to use Python 3.6?

Decide if historical budget data is useful for MVP

Online budget data goes back to Brass Reports for FY 2005-06

are service areas consistent in all years
does it align with MVP, what data is usefull

Create data endpoint for the total budget by Service Area

Create data endpoint for the total budget by Service Area
Filter by Fiscal Year
Result: See example (below). Amount in dollars
Concern: How to create a summary endpoint?
Example results (Amounts not accurate)

Svc Area | Fiscal Yr | Object Code | Amount

Pub. Safety | 2014-15 | Personal | $140,034,432
Pub. Safety | 2014-15 | IMS | $153,234,432
Pub. Safety | 2014-15 | EMS | $134,344,432
Parks | 2014-15 | Personal | $34,024,432
Parks | 2014-15 | IMS | $43,444,334
Parks | 2014-15 | EMS | $85,343,324
Pub. Util | 2014-15 | Personal | $123,343,234
Pub. Util | 2014-15 | IMS | $163,344,324
Pub. Util | 2014-15 | EMS | $173,432,342

How shall this project team handle database updates?

As @jimtyhurst asked in PR47, do we need to enable migrate and import every time the Django app starts?

Jim says, "Each time the image starts, it runs the migrations and imports the data as specified in docker-entrypoint.sh, right? I don't think it should do those things. In general, the application should be configured to access an existing database, so why should we run the migrations and import the data every time the web app starts?"

Does this harm anything about the database, to have these operations performed automatically?
If not in the application startup, how should we handle the following scenarios: creation of a local PostgreSQL instance in the future, creation of a Production instance of the database (when Hack Oregon lands a production version of the Winter 2017 season's apps) and the need to synchronize the database with the latest accepted changes to the application?

As to the former question, from what I see in PgAdmin at the moment, the tables appear to contain only the number of rows of data that we have in the source CSVs. So at least for the moment, this gives us room to figure out the more deterministic approach.

As to the latter, I can imagine at least three scenarios we might have to deal with:

as a Budget team developer, I want to work from a local PostgreSQL installation (a) to reduce the lag time for each query and (2) to protect other developers from unfinished and unvetted changes I'm experimenting with in any API development work I do.
as a Budget team developer, I want to use an automated script to perform the creation of the Production version of the Budget database.
as a Budget team developer, I want to use an automated script to perform database additions (migrations/schema additions and data additions/imports).
as a Budget team developer, I want to use an automated script to perform necessary data changes such as migrations/schema alterations (not additions but changes to existing schema objects) and data transformations (changes to existing data).

Enable Travis to build Docker container successfully

Enable Travis to publish Docker image to Docker Hub

Budget in Brief - Key Performance Measures - do the measures change name or footnote from year to year?

Examine the detailed naming and footnotes for each Service Area's Key Performance Measures.

It is possible that the naming of one or more KPMs, or the text of the accompanying footnote, changes from year to year.

If that is true, we cannot use a single column for "Key Performance Measure" and "Footnote" in a single table that captures the data over multiple years.

Use the data from Budget in Brief FY 2016-17 and FY 2015-16 as a baseline to validate if the text data for these fields is exactly the same from year to year.

How will we deal with the first half of the Budget in Brief document?

Work is proceeding on the "second half" of the Budget in Brief documents - all the data representing the budget breakdown by Service Area.

What are we going to do with the "first half" of these documents?

I recommend we take some proposals to our contact(s) in the City Budget Office, alongside a demo of the Service Area representations in whatever state they're at.

Options include:

As-Is: building a set of cards, endpoints and database tables to regenerate the "first half" data as-is
Cherry Pick: select the most interesting/relevant of these context-setting data and find easy/interesting ways to represent this alongside the Service Area breakdowns
Hyperlink: direct citizens to the PDF documents "for more information"

Simplify the API

Summary

As a consumer of the API, I want to be able to retrieve budget data from multiple years, so that I can present it to people who want to review the budget.

Current situation

As developers have been learning Django and developing iteratively, the API now has redundant endpoints that demonstrate different styles of implementation. For example, /ocrb, /ocrb-prod, and /summary all return budget data from the "Operating and Capital Requirements by Bureau" (OCRB) tables in the "Budget in Brief" brochures.

/ocrb was developed first, as a quick prototype, which reads data directly from CSV files. It served that purpose well and allowed developers to demonstrate an API during the very first week that coding started. Unfortunately, it works around the Django framework's Model, which makes it difficult to implement filtering and sorting, because it is not using framework classes for the list of objects.
/summary was developed next, taking advantage of the importcsv.py script, which allowed us to load the CSV files into a relational database. Initially, we used the embedded sqlite3 database for development, but the code works unchanged with the current PostgreSQL database running independently on an AWS EC2 instance. /summary was written during the second week and it enhanced the API to allow for query parameters to filter the data and it returns the data sorted in a standard order. It allows for case-insensitive matching of query parameter values to field values, but requires case-sensitive matching of query parameter names to field names.
/ocrb-prod was developed the next week, taking advantage of the data which by this time was deployed in a PostgreSQL database on an AWS EC2 instance. This implementation uses an alternative approach for handling query parameters, which provides for more concise code, which can handle future query parameters without code changes. It requires exact case-sensitive matching of query parameter values to field values and exact case-sensitive matching of query parameter names to field names.

There are two endpoints for Key Performance Measure (KPM) data with parallel implementations to the corresponding OCRB endpoints:

/kpm reads data from a CSV file.
/kpm-prod reads data from the database configured in project_config.py.

Desired situation

It is time to reconcile those different implementations, refactor, and present a simple API for accessing budget data with just one endpoint for OCRB data and one endpoint for Key Performance Measures (KPM).

It might just be as simple as choosing the latest implementation, but there are a few factors to consider, such as:

Error handling, e.g. when client makes a call with a query parameter name that does not match an object field name.
Extensibility, e.g. whether coding changes are required when new fields become available on the model objects.
Case-sensitive vs insensitive query parameter names.
Case-sensitive vs insensitive query parameter values.

[future] Upgrade the backend Docker container to Python 3.6+

Based on our limited testing of Linux developer machines in Issue #81, it appears that at present, Linux hosts cannot successfully build the docker container, so we'll go with Python 3.5 for now.

If it becomes necessary or desirable to use Python 3.6 or later, we'll have to re-investigate the ability of Linux hosts to build the docker container locally.

View Query param filtering

So just ran into this scenario:

We want to pass query params to a view that do not map to a model field. An example of this would be ?format=json, which is one way DRF can decide to render a JSON response rather than HTML for the exploreable API. With our current query filtering we get an error like this:

http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/ocrb/?format=json

When it should just return a json response, like the KPM endpoint does (no filtering implemented)
http://hacko-integration-658279555.us-west-2.elb.amazonaws.com/budget/kpm/?format=json

Possible solutions:

install and use the django-filter package
write our own class/function that does the filtering as necessary.

Number 2 will have to cover:

differentiate query params that filter and those that modify the request
aggregrate params and compare to model fields.
decide how to handle invalid params.

I vote Number 1 as I see it as the better solution. That said, Number 2 sounds like more fun.

Enable functional testing of API container in Travis using "source env.sh" approach

Goal: enable Travis to test the APIs running in Docker container e.g. using the python manage.py test pattern.

Approach selected: having run into a dead-end enabling the "import . from project_config" approach in issue #58, let's pursue the alternative approach raised at the Budget Team meeting on 2017-03-13: "source env.sh".

In this implementation, rather than download the project_config.py file and import it in settings.py, we'll download an env.sh file where all settings are exported as environment variables, that can be accessed by settings.py (and other bits of code).

This also shares implementation with the proposal by backend developers to allow them to switch database layers from AWS to any local database from their developer computers.

Story: Choose Your Own Adventure budget

At the 2017-02-06 Budget team meeting, the group converged on the first User Story to start building as a team.

It takes the form of this story map:

User inputs the taxes they pay to the City of Portland (e.g. from Preset selection, or free-text input, or an estimate derived from the rent they paid)
User inputs where they live (e.g. from the incorrectly-termed "quadrants", or which of the eight zones across the city they live in)
System would perform calculation of how much of the city budget the user contributes
System would factor in three sets of "static data": spending by {Service Area, Bureau} by {"Quadrant"}, citywide population and "Quadrant" population
UI would display to the user their contribution by {Service Area, Bureau} (e.g. in raw $$ amounts, or in % proportions)

One significant question to answer later: which revenue sources do not "scale linearly"? Which revenue sources are sent to specific {Service Areas, Bureaus}?

For the initial version of this Story, we decided to assume that all proportions are uniform (e.g. for every $$ contributed in taxes, the % allocated to e.g. Parks and Recreation would be calculated as a straight % of overall city spending).

Note: this story could have three variants: one for Property Owners, one for Businesses and one for Renters.

Build an initial API for Choose Your Own Adventure story

Start building out a Django/DRF API that will return the initially-JSON'ized data that Megan is working on.

This will allow us to iterate quickly on what the front end team members will need to be able to render the CYOA UI, and will also allow the devops team members to get the automation ball rolling for the API layer.

QA data import

We need to reconcile the number of records in the CSV input files with the records in the staging database.

OCRB
KPM
History

Support Multiple Environments

At the moment, our settings.py file is tightly coupled with our deployment details. If you don't know some of these secret things, you can't run manage.py test or manage.py runserver locally.

To fix this, we we'll implement multiple settings.py files, one per required environment. We'll update manage.py to default to using the development settings, unless the DJANGO_SETTINGS_MODULE environment variable is set.

Dockerize the initial API

The team has now built working API endpoints that emit data when requested.

Let's generate a Docker container around these APIs so they can be run on a local developer box. Once the "local build" is working, we can quickly move to the "and migrate the container to AWS ECS" to demonstrate a Budget API working in the cloud.

Implement the configuration that separates secrets from settings.py

We will enable developers and the staging/integration environment to connect to the AWS EC2-hosted PostgreSQL instance.

To ensure that database secrets are never published to GitHub or Docker Hub, we need to separate out these secrets from settings.py file. We will implement the model recommended by the DevOps squad that has been implemented in the backend service pattern.

Create a History Summary View

User Story: As a customer who is also involved in cable access production, I want to know how much $$ goes to the Mt Hood Cable Regulatory Commission. I guess this should be shown per year, so I can see the funding trend.

Steps to get the data:

Query the division code

http://127.0.0.1:8000/code/?code_type=division_code

Here is a portion of the result...

{
   "code_type": "division_code",
   "code": "CBMH",
   "description": "Mt Hood Cable Regulatory Commission"
},

Query the history for division_code = 'CBMH' fiscal_year 2011-12

http://127.0.0.1:8000/history/?division_code=CBMH&fiscal_year=2011-12

Get a bunch of data elements similar to this:

{
    "fund_center_code": "CBMH000001",
    "fund_code": "GENERAL",
    "functional_area_code": "CDCC00",
    "object_code": "EMS",
    "fund_center_name": "Mt. Hood Cable Regulatory Commission",
    "fund_name": "General Fund",
    "functional_area_name": "Cable Communications",
    "accounting_object_name": "External Materials and Services",
    "service_area_code": "CD",
    "program_code": "CDCC",
    "sub_program_code": "CDCC00",
    "fund_center": "CBMH000001",
    "division_code": "CBMH",
    "bureau_code": "CB",
    "bureau_name": "Office for Community Technology",
    "fiscal_year": "2011-12",
    "amount": 0
},
{
    "fund_center_code": "CBMH000001",
    "fund_code": "SPEC_REV",
    "functional_area_code": "CDCC00",
    "object_code": "IMS",
    "fund_center_name": "Mt. Hood Cable Regulatory Commission",
    "fund_name": "Special Revenue",
    "functional_area_name": "Cable Communications",
    "accounting_object_name": "Internal Materials and Services",
    "service_area_code": "CD",
    "program_code": "CDCC",
    "sub_program_code": "CDCC00",
    "fund_center": "CBMH000001",
    "division_code": "CBMH",
    "bureau_code": "CB",
    "bureau_name": "Office for Community Technology",
    "fiscal_year": "2011-12",
    "amount": 99187
},

Sum all the amounts
Right now, summing is done on the front end- either by a human or a computer.

Questions/ Comments:

I guess we should also create a code set for the fiscal years
How will users know what divisions to look for?
Either (a) they have a picklist or (b) they can enter a portion of the string
Probably picklist is the better option?
We don't really need to return all of these fields (fund_center_code & fund_center? sub_program_code?) but let's keep them for now.
Are the amounts in dollars or pennies? Just a question for QA
The key point is... we currently have history "detail" mode - returning the detail of the search.
Let's also consider a "history summary" mode - returning the sum of amounts for a search.

This summary would return:

the columns that were searched for (ie: division_code & fiscal_year)
the sum of all amounts

Do we need the NA values in Budget in Brief?

In PR#20 the "NA" values in the Amount field of the KPM data were dropped.

Question for the City Budget Office: are "NA" KPM values meaningfully distinct from a blank KPM value?

If the City Budget Office deems it equivalent in the representation we develop in the future, it's fine to leave that data out of the KPM endpoint data. If they deem it a meaningful distinction, we will need to find a way to represent that data somehow to those consuming the KPM endpoint.

ImproperlyConfigured exception when running makemigrations

I performed a walkthrough of the new README in advance of tomorrow's Hackathon - make sure I know what others will be using, and see if there's any issues that I could clear up.

I discovered one issue that I don't know how to immediately solve, so rather than delay the solution I'm posting my finding in hopes that others might know how to solve.

Steps taken

I'm following this version of the README and I got through to Step 4 of "setting up your development environment". When I run the ./budget_proj/manage.py makemigrations command, I receive this error in return:

./budget_proj/manage.py makemigrations
Traceback (most recent call last):
  File "./budget_proj/manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/core/management/__init__.py", line 208, in fetch_command
    klass = load_command_class(app_name, subcommand)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/core/management/__init__.py", line 40, in load_command_class
    module = import_module('%s.management.commands.%s' % (app_name, name))
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/core/management/commands/makemigrations.py", line 11, in <module>
    from django.db.migrations.autodetector import MigrationAutodetector
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/autodetector.py", line 13, in <module>
    from django.db.migrations.questioner import MigrationQuestioner
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/questioner.py", line 12, in <module>
    from .loader import MigrationLoader
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/loader.py", line 10, in <module>
    from django.db.migrations.recorder import MigrationRecorder
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/recorder.py", line 12, in <module>
    class MigrationRecorder(object):
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/recorder.py", line 26, in MigrationRecorder
    class Migration(models.Model):
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/migrations/recorder.py", line 27, in Migration
    app = models.CharField(max_length=255)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/models/fields/__init__.py", line 1043, in __init__
    super(CharField, self).__init__(*args, **kwargs)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/db/models/fields/__init__.py", line 166, in __init__
    self.db_tablespace = db_tablespace or settings.DEFAULT_INDEX_TABLESPACE
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/conf/__init__.py", line 53, in __getattr__
    self._setup(name)
  File "/Users/mike/code/~hackoregon/team-budget/budget_venv/lib/python3.5/site-packages/django/conf/__init__.py", line 39, in _setup
    % (desc, ENVIRONMENT_VARIABLE))
django.core.exceptions.ImproperlyConfigured: Requested setting DEFAULT_INDEX_TABLESPACE, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.

Is there a missing step where we'd run source bin/env.sh or similar to populate the DJANGO_SETTINGS_MODULE? Or is this an irreproducible artifact of my ever-shifting dev environment?

Enable automated testing for every endpoint

PR #64 trivially tests the /ocrb endpoint.

Similar testing should be added for the remaining endpoints.

Find a Budget In Brief data point in the BRASS snapshot

I talked to Megan at the end of the 2017-02-06 Budget team meeting, and we tried to think of some small “spike” or experiment that the database team members could perform. The best idea we thought of that would help move our knowledge forward fastest is to continue down the road Ron pursued today, by trying to find the BRASS snapshot location of any cell of data from any table or bullet in the Budget In Brief document.

If we get any hits, that would start to untangle the relationship between BRASS tables and the Budget In Brief. If not, it ups the urgency on meeting with Shannon to build up some SQL queries to generate that data.

Blocking access or removing default Django endpoints in production

Let's assume for the moment that we enable the default Django endpoints (by running the python3 manage.py migrate command during Docker build , which enables for example the /admin endpoint).

Assuming these are not desired or secure to run in cloud (Integration/Production), it would be good to have an actual strategy for mitigating their exposure.

Ideally disabling them entirely in Django would be good. If that can't be done (and so far Django's docs aren't helping), then blocking access from the Internet is next best - either by putting a routing filter in place in the container (gunicorn?), or some kind of container policy if possible, or EC2 security policy that blocks requests from even getting into the container.

Create skeleton wireframe

Generate a skeleton wireframe for 2017-02-13 Budget team meeting

Switch testing from django test runner to pytest

Jay McGrath swapped in pytest for django's test runner a month ago on the Housing backend, and his reasoning was to increase the flexibility in storing the test scripts - django test runner expects all tests in [app_dir]/tests.py, whereas pytest can handle test files stored anywhere in the project.

We don't have a big swath of tests to manage so far, but if we do then this looks like a scalable solution to managing a big collection of tests.

If it turns out this is worth pursuing, or if others wish to look into it, here's the core docs:
http://pytest-django.readthedocs.io/en/latest/

Documentation

Please write documentation so that other people can recreate what you have accomplished. For example: moving SQL server image to PostgreSQL. Write down the steps you take. Document what you are doing including mistakes and crazy error messages. This is a learning process. We learn from each other.

Docker: add PATH env var so that all scripts don't have to hard-code /budget_proj/bin/

I'm dealing with a bunch of fallout of trying to get Docker + Travis + the Budget Django app's pathing to work together for a successful build. See DevOps Issue 34 for details.

One idea I'm coming to is that once I figure out all the path dependencies that result from moving the Docker files into saner locations, I'd like to remove the hard-coded path dependencies that I'm building in to get this sucker to build again.

So far I've hard-coded /budget_proj/bin/ into the following files:

/budget_proj/bin/start-proj.sh
/budget_proj/bin/docker-entrypoint.sh
./travis.yml
docker-compose.yml

My half-baked idea is to insert some kind of PATH=$PATH;/budget_proj/bin/; export PATH command somewhere that it will do the most good (i.e. for all commands running in the Travis container).

Perhaps something like this will be needed to reduce the hard-coded pathing that is creeping into the Docker container runtime commands as well.

Create data endpoint for showing budget data change over time

Create data endpoint for showing budget data change over time
Want to show the change in budget totals by service area for each fiscal year
Result is in dollars per service area per fiscal year
Consider that this data will be used in a Sankey diagram or stream graph

Get historical Revenue data

We don't have revenue data. Can we get that data?
Not required for MVP

Endpoint for text data from Budget In Brief - Services Areas

We have scraped the tabular data for Service Areas from the past two years' Budget in Brief documents - you see that in the Data folder in this repo (https://github.com/hackoregon/team-budget/tree/master/Data).

Next step is to scrape the text data on those Service Area pages to enable us to emit it in an API endpoint, so that it can be rendered inline alongside the OCRB and KPM data. The data includes:

intro text just below the Service Area heading (e.g. under Parks, Recreation & Culture Service Area, the intro text is "The Parks, Recreation & Culture service area includes services for Portland Parks & Recreation, the only bureau in this service area. The bureau also administers the Golf program and Portland International Raceway."
Significant Issues and Major Projects (SIMP) - this section falls between OCRB table and KPM table, and includes one or more bullets of text

Acceptance Criteria

Assumption: the SIMP data must be emitted from the API in a format that ensures that bullets will render in the same order as in the Budget in Brief document.
The data probably varies from year to year, so API users must be able to request the appropriate text derived from a specified fiscal year (e.g. user requests all SIMP text from FY2016-17)
The data must be captured in a way that reading it directly from Django code, and importing into a database, won't noticeably change the readability of the text by the end user as compared to the experience of reading the Budget in Brief PDF. (e.g. if all bullets for a single Service Area are stored as a single record, then the bullet characters must be encoded in a way that they will automatically show up as bullets in the user's browser)

Any tool that works will do. The tool used to scrape tabular data was Tabula; unknown at the moment if this would work for text data, or if a simple cut-and-paste would work well enough.

Question for City Budget Office contacts: must the SIMP bullets be displayed every time in the same order as they are presented in the Budget in Brief PDF documents?

Clear out old import of BRASS data

Remove all ~300 tables from AWS EC2-hosted PostgreSQL that were previously imported from the BRASS database export.

This will enable us to re-use the PostgreSQL instance for the Budget-in-Brief data and the data export we received from City Budget Office the week of 2017-03-06.

Do we have a specific, immediate need for the /admin endpoint?

Currently when running the budget_proj/app without a migrate step, the following message shows up after runserver:

python3 budget_proj/manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have 13 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.

February 19, 2017 - 21:41:45
Django version 1.10.5, using settings 'budget_proj.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Not Found: /

The message of note is:

You have 13 unapplied migrations...

I see various projects do or do not include a step like this before runserver - don't know if that's required here:

python3 budget_proj/manage.py migrate

Enable automated testing for valid data responses for each endpoint

PR #64 trivially tests the /ocrb endpoint.
Issue #65 can replicate that testing for the other endpoints.

DevOps class Assignment 5 and 6 presented some simple yet effective automated endpoint testing. The Housing backend team have been implementing some more sophisticated automated testing of their endpoints.

Our automation pipeline should ensure that only when the endpoints are still responding with valid data, should we automatically deploy updated code to staging/integration environment. [It remains to be decided whether and under what more stringent conditions to automatically deploy code to the not-yet-available production environment.]

For example:

if the /ocrb endpoint is expected to respond with ~200 JSON records for the FY 2015-16 fiscal year, and all of a sudden the response drops to a single record (e.g. some kind of error message), that should be considered a failed build.
if the /ocrb endpoint is expected to respond with numeric values for the "amount" field and instead it is sending alphanumeric data, that should be considered a failed build (e.g. maybe some columns in the model got mixed up)

A small handful of automated tests to ensure that (a) JSON is being emitted (not some 500 error), (b) a reasonable number of records are being emitted ("reasonable" varying by endpoint obviously) and (c) the data is structured the way it is intended, that sounds to me like a minimum acceptable set of tests to consider the build "still emitting valid data responses".

We can progressively add more tests as the endpoint logic gets more sophisticated, and as we encounter any issues with code deployed to the staging/integration environment.

At this stage of development, and excluding rare moments when we will perform "pre-Demo Day" demos outside of the Hack Oregon audience, I see no reason to otherwise prevent new commits-to-master from being automatically deployed to our staging/integration environment in AWS.

Do we bother with PostgreSQL or just use the simpler sqlite3?

There is active discussion on the team whether we even need the power of a PostgreqSQL database for the "Service Area Budgets" card of our MVP (assuming this is the most likely application for us to launch at Demo Day).

At the moment we:

have no scenarios in which users will submit data
have no access control needs (all data will be available to any user hitting the API endpoints) to restrict tables or views
are working with only a few hundred KB of data to be served up at max by any API response

Questions are:

will this make it significantly harder in the future to swap in a PostgreSQL data layer, or is this reasonably trivial?
are there any significant security risks in running a sqlite3 instance inside the Docker container (if we never expose the running port outside the container)?
does this save us any money on the monthly bill now or in the anticipated future?

This is a reasonable question to ask. The trade-off of not using the standard database may be worth the benefits we gain.

Load OCRB data to AWS-hosted PostgreSQL

Load the current CSV of OCRB data into the Budget team's AWS EC2-hosted PostgreSQL instance.