bga-payroll's People

Contributors

abigailblachman, bga-admin, derekeder, evz, fgomez828, fgregg, hancush, jmithani, phjudge, smcalilly, xmedr

bga-payroll's Issues

Display wages and salaries on different charts

Some salary data is actually an hourly or per-appearance wage. The difference in scale makes it awkward to display on one chart:

[screenshot, 2018-02-20]

Chart salaries and wages separately. Give the user the option to toggle between them.

Auto-advance through review steps, if there is nothing to review

the app isn't yet smart enough to advance to the next step on its own. it only notices that the queue is empty when someone tries to access review. in the transition code, check whether there is anything to review in the queue; if there isn't, trigger the next step.
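A minimal sketch of that check, assuming the review steps run in the order responding agency, parent employer, child employer, salary, and that each step's queue can be tested for emptiness. The names `REVIEW_STEPS` and `next_step_with_work` are hypothetical, not taken from the app.

```python
from collections import deque

# Assumed ordering of review steps; not confirmed by the app's code.
REVIEW_STEPS = ["responding agency", "parent employer", "child employer", "salary"]


def next_step_with_work(current_step, queues):
    """Return the first step at or after current_step whose queue is non-empty.

    Returns None when every remaining queue is empty, i.e. review is done.
    """
    start = REVIEW_STEPS.index(current_step)
    for step in REVIEW_STEPS[start:]:
        if queues.get(step):  # empty or missing queues are skipped
            return step
    return None


queues = {
    "responding agency": deque(),
    "parent employer": deque(),
    "child employer": deque(["AQUATICS"]),
    "salary": deque(["record-1"]),
}
print(next_step_with_work("responding agency", queues))
```

Calling this from the transition code would let the app skip empty steps without waiting for someone to open the review page.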

Log of data import transformations

  • Convert empty strings to nulls
  • Remove exact duplicate rows
  • Remove weird characters from dates and date-ify them
  • Remove excess whitespace around strings
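The four transformations above could look roughly like this in plain Python. This is a sketch, not the import code itself: the `start_date` field name and the `m/d/yy` date format are assumptions based on sample rows like `3/19/13`, and rows are assumed to be dicts of strings.

```python
import re
from datetime import datetime


def clean_value(value):
    """Strip surrounding whitespace and convert empty strings to None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def clean_date(value, fmt="%m/%d/%y"):
    """Drop characters that cannot appear in a date, then parse it.

    The m/d/yy format is an assumption, not confirmed by the source data.
    """
    if value is None:
        return None
    digits_and_slashes = re.sub(r"[^0-9/]", "", value)
    return datetime.strptime(digits_and_slashes, fmt).date()


def transform(rows, date_field="start_date"):
    """Clean each row, then drop rows that are exact duplicates after cleaning."""
    seen = set()
    cleaned = []
    for row in rows:
        row = {key: clean_value(val) for key, val in row.items()}
        row[date_field] = clean_date(row.get(date_field))
        # Sort by key only, so None and date values are never compared.
        fingerprint = tuple(sorted(row.items(), key=lambda item: item[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned
```

Because duplicates are detected after cleaning, rows that differ only in whitespace also collapse into one, which is slightly broader than "exact duplicate rows".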

[meta] improvements to search

  • implement solr (related: decide whether to use haystack)
    • advanced query parsing
    • fast
    • faceting (for search results)
    • enables later improvements, like geosearch (e.g., employers near joliet)
  • build a better interface for searching: 1. from the landing page, 2. from search results, 3. from entity pages (?)

How should we handle one person having multiple "salaries" in a single year?

Related to, but separate from, #4.

It looks like there are multiple records for one employee when they get a raise. For example, there are 3 unique records for Justin R Thiede in the 2017 data, each with a different "salary": $8.75, $9.25 and $9.50.

2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R |  | AQUATICS | 8.75 | 3/19/13 | 2017
2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R |  | AQUATICS | 9.25 | 3/19/13 | 2017
2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R |  | AQUATICS | 9.5 | 3/19/13 | 2017

How should this be represented?

point source links at the right place

our payroll models are now related to the standardized file they came from via their vintage. let's update the source urls in the front end to lead to those files!

write an error queue

the flush / match_or_create routines give us an opportunity to intercept records that error out for some reason when we try to insert them into the database. let's make a queue for those things so the user can review what went wrong and act accordingly.

related to #55.
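One way to picture the interception, as a sketch only: wrap the insert call, and push the record plus the error onto a queue instead of letting the exception propagate. `insert_record`, `error_queue`, and the `insert` callable are hypothetical names; the real flush / match_or_create routines are not shown here.

```python
from collections import deque

# In-memory stand-in for whatever queue backend ends up being used.
error_queue = deque()


def insert_record(record, insert):
    """Try to insert a record; on failure, queue it for review.

    `insert` is any callable that raises when the record cannot be saved.
    Returns True on success and False when the record was queued.
    """
    try:
        insert(record)
    except Exception as exc:
        error_queue.append({"record": record, "error": repr(exc)})
        return False
    return True


def strict_insert(record):
    """Hypothetical insert that rejects records without a salary."""
    if record.get("salary") is None:
        raise ValueError("salary is required")


insert_record({"name": "THIEDE", "salary": 8.75}, strict_insert)
insert_record({"name": "THIEDE", "salary": None}, strict_insert)
print(len(error_queue))
```

Storing the error message next to the record gives the reviewer enough context to decide what to do with it.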

Retain aliases for responding agencies and employers

Users are asked to review unknown responding agencies and employers during the data import process. They can choose to link them to an existing entity, or add them as a new entity. If they choose to link to an existing entity, we should retain both names for the entity as aliases. This will allow us to link the employer using either representation in future years of data.
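A rough sketch of alias retention, assuming a simple in-memory lookup keyed on a normalized name. `EntityRegistry` and `normalize` are hypothetical, and the abbreviated name in the usage lines is made up for illustration.

```python
def normalize(name):
    """Uppercase and collapse whitespace so trivially different names match."""
    return " ".join(name.upper().split())


class EntityRegistry:
    """Maps every known name for an entity, canonical or alias, to its id."""

    def __init__(self):
        self._name_to_entity = {}

    def add_entity(self, entity_id, name):
        self._name_to_entity[normalize(name)] = entity_id

    def link_alias(self, entity_id, alias):
        # Called when a reviewer matches an unknown name to an existing entity.
        self._name_to_entity[normalize(alias)] = entity_id

    def lookup(self, name):
        """Return the entity id for a name, or None if it is still unknown."""
        return self._name_to_entity.get(normalize(name))


registry = EntityRegistry()
registry.add_entity(613, "WEST CHICAGO PARK DISTRICT")
registry.link_alias(613, "West Chicago Park Dist")  # illustrative alias
print(registry.lookup("west chicago park dist"))
```

With both names retained, next year's import can resolve either spelling without sending it back to review.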

Should there be a "Job" model

Right now, we link a person to a position through a Salary model.

I think we might want to have "job" model that links those two. There are three reasons.

  1. start_date is now attached to the salary, but start_date is not a property of salary conceptually. start_date is a property of how long someone has held a position, i.e., a "job".
  2. Similarly, the (person, position) tuple is replicated for every salary that we have for a "job". These data could be normalized out if we had a job model.
  3. I found it pretty surprising that the way that you link people and positions is through a table called salary. This is pretty subjective, but I think it will be clearer to future developers (or versions of ourselves) if we had a job model.
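To make the proposal concrete, here is a sketch using plain dataclasses rather than the app's actual models. The field names are assumptions, and "AQUATICS" stands in for a position label even though it may really be a department in the source data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Salary:
    amount: float
    reporting_year: int


@dataclass
class Job:
    """Links a person to a position once; salaries hang off the job."""
    person: str
    position: str
    start_date: Optional[date] = None
    salaries: List[Salary] = field(default_factory=list)


# The three 2017 records for one employee become one job with three salaries.
job = Job("JUSTIN R THIEDE", "AQUATICS", date(2013, 3, 19))
for amount in (8.75, 9.25, 9.50):
    job.salaries.append(Salary(amount, 2017))

print(len(job.salaries), job.start_date)
```

The (person, position) pair and start_date are stored once, and each raise adds a Salary rather than repeating the whole row.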

add date boundaries to the salary object

historically, bga has gathered prospective pay for a standard period (the calendar year). someday, they may gather actual pay. on that occasion, because employees start or leave jobs at all times of year – not just jan 1 or dec 31 – it would be nice to keep track of bounding dates for that pay. (see #56 / #57.)

discuss alternative queueing mechanism

our choice for distributed review queues, saferedisqueue, has bugs. 🐛🐞🐜🕷

perhaps most pressingly, the re-queueing mechanism "may work okay in a single-consumer setup", but doesn't otherwise. this is a big strike against distributed work!

let's:

  • talk about our choice and alternatives (the srq dev mentioned rabbit mq, which plays nice with celery...),
  • decide on a path forward, and
  • implement it.

prompt user to try again in a few minutes when there is nothing to checkout, but there are still unreviewed records in the queue

when multiple users are reviewing, or when one user is reviewing, but navigates away from the page, the queue can be exhausted without all records being reviewed. checked out records expire after five minutes. tell the user of the situation, and ask them to come back in a few minutes. at such time, either review will be completed, or checked out records will be available again.
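The three states described above (something to check out, nothing available but records still checked out, review complete) can be sketched with an in-memory queue. This is not saferedisqueue; `ReviewQueue` and its methods are hypothetical, and the clock is injectable so the five-minute expiry can be exercised without waiting.

```python
import time

CHECKOUT_TTL = 5 * 60  # checked out records expire after five minutes


class ReviewQueue:
    def __init__(self, records, clock=time.monotonic):
        self._pending = list(records)
        self._checked_out = {}  # record -> expiry timestamp
        self._clock = clock

    def _release_expired(self):
        """Return expired checkouts to the pending list."""
        now = self._clock()
        for record, expires_at in list(self._checked_out.items()):
            if expires_at <= now:
                del self._checked_out[record]
                self._pending.append(record)

    def checkout(self):
        """Return (record, status) for the next reviewable record."""
        self._release_expired()
        if self._pending:
            record = self._pending.pop(0)
            self._checked_out[record] = self._clock() + CHECKOUT_TTL
            return record, "ok"
        if self._checked_out:
            # Nothing to hand out, but review is not finished.
            return None, "try again in a few minutes"
        return None, "review complete"

    def complete(self, record):
        """Mark a checked out record as reviewed."""
        self._checked_out.pop(record, None)
```

The view would show the "try again" prompt only for the middle state, and move on to the next step when the status is "review complete".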

when and how should we filter payroll models for the most current year?

we will soon be managing multiple years of data. sometimes, we will want to be able to filter that data by year. the first year in which data appears is accessible through vintage__standardized_files__reporting_year on each object. a record is created for every incoming Salary, every year, so that we can intuit when the related Job and Person come and go, based on objects related to the Salary. that means filtering should be done at the Salary level. and that means our orm and sql queries need to be refactored to do this filtering by default, in a way that is also configurable via user input.

let's talk a bit about this irl.

filter

  • aggregate statistics
  • employer pages (employee and department lists)

don't filter

  • employee pages (we'll want to see all jobs / salaries a person has been paid)
  • search (or, make this an option?)
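As a starting point for that conversation, a sketch of "filter by default, configurable via user input" using plain dicts instead of the orm. `filter_salaries` and the `reporting_year` key are assumptions; in the app the year would presumably come from the vintage relation rather than a field on the salary itself.

```python
def filter_salaries(salaries, year=None):
    """Return salaries for one reporting year.

    When no year is given, default to the most recent year present,
    so callers get current data unless the user asks otherwise.
    """
    if not salaries:
        return []
    if year is None:
        year = max(s["reporting_year"] for s in salaries)
    return [s for s in salaries if s["reporting_year"] == year]


salaries = [
    {"person": "JUSTIN R THIEDE", "amount": 9.50, "reporting_year": 2017},
    {"person": "JUSTIN R THIEDE", "amount": 10.00, "reporting_year": 2018},  # made-up row
]
print(filter_salaries(salaries))
print(filter_salaries(salaries, year=2017))
```

Views in the "don't filter" list would simply skip this call and work with the full set.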

write delayed task for subsequent standardized data imports

after the first import (#49), there will be a canonical universe of data we need to squash new data into. wire this up, collapsing records only when we can be absolutely sure they belong together.

to-do:

responding agency

  • queue
  • view
  • review (match / add) endpoints

parent employer

  • queue
  • view
  • review (match / add) endpoints

child employer

  • queue
  • view
  • review (match / add) endpoints

salary

  • queue
  • view
  • review (match/add) endpoints

Charts should have labels

The existing charts rely too heavily on tooltips. Add some labels that don't require interacting with the chart.

investigate oop with tasks in celery

we've defined a base task class with dynamic shared context for all of our delayed work in e52602f. this context is contingent on the standardized file we're operating on, i.e., each task expects a standardized file id.

however, the need for dynamic context introduces a challenge, because "the __init__ constructor (of the Task class) will only be called once per process."

this means we cannot use the __init__ method to establish context, given a standardized file id. instead, we define a setup method that accepts this id and sets class attributes accordingly.

we run this method each time a task is issued via celery's task_prerun signal. this signal provides access to the pending task (sender), as well as its args and kwargs, with which we can run setup prior to executing any task code.

this has the effect of giving us access to those contextual attributes in the task, without having to call setup at the top of each one.

however, it feels a little hacky.

when a task method is bound to a base task class, the code in the bound task is injected into that class as the run method. however, because we are not in a class context in our task method, it's not possible to define a common run method on the base class and extend it via super() in the task method. because the task code is injected as the run method of the base class, running super(BaseClass, self).run() calls the run method of celery's Task class (the base class's parent), which raises a NotImplementedError.

source:

    def run(self, *args, **kwargs):
        """The body of the task executed by workers."""
        raise NotImplementedError('Tasks must define the run method.')

exception:

tp = <class 'celery.backends.base.NotImplementedError'>
value = NotImplementedError('Tasks must define the run method.',), tb = None

    def reraise(tp, value, tb=None):
        """Reraise exception."""
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
>       raise value
E       celery.backends.base.NotImplementedError: Tasks must define the run method.

is there something else we should hook into, to run common code prior to task execution? or are the conventions here just unconventional?
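One candidate hook, sketched here in plain Python rather than with celery itself: since a task's `__call__` is what invokes `run`, a base class can override `__call__` to do setup and then defer to the parent. The `Task` class, `task` decorator, and `copy_to_database` below are toy stand-ins written for this sketch, and the `s_file_id` keyword is an assumption; whether this carries over cleanly to celery's own Task class should be checked against its documentation.

```python
class Task:
    """Toy stand-in for celery's Task: __call__ delegates to run."""

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)

    def run(self, *args, **kwargs):
        raise NotImplementedError('Tasks must define the run method.')


class BaseTask(Task):
    """Hypothetical base task that builds context before the body runs."""

    def setup(self, s_file_id):
        self.s_file_id = s_file_id

    def __call__(self, *args, **kwargs):
        # Common code goes here, not in run, so injected task bodies
        # never need to call super().run().
        self.setup(kwargs["s_file_id"])
        return super().__call__(*args, **kwargs)


def task(base):
    """Toy decorator: inject the function as run on a subclass of base."""
    def decorator(func):
        bound = type(func.__name__, (base,), {"run": func})
        return bound()
    return decorator


@task(base=BaseTask)
def copy_to_database(self, s_file_id):
    return "copying file {}".format(self.s_file_id)


print(copy_to_database(s_file_id=7))
```

This keeps the setup call in one place, like the task_prerun approach, but expresses it through the class hierarchy instead of a signal.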
