Comments (18)

MattHJensen commented on June 9, 2024

I believe this is an important issue for making the CPS file available on PolicyBrain, and it appears important for other B-Tax users as well.

My understanding was that we have imputations for these variables from John O'Hare, but it appears that may be incorrect (@andersonfrailey, could you chime in?)

If so, we may need to impute these variables onto the CPS file from the PUF. My view is that a simple imputation would be sufficient for now.

Adding @jdebacker and @hdoupe to cc.

from taxdata.

andersonfrailey commented on June 9, 2024

We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.

MattHJensen commented on June 9, 2024

@andersonfrailey said:

We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.

We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.

andersonfrailey commented on June 9, 2024

@MattHJensen said:

We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.

You're right. My comment was poorly worded. We could use the same routine that we use for the deductions that are imputed onto the CPS: use the PUF to get beta coefficients, then apply those coefficients in the imputation on the CPS.

MattHJensen commented on June 9, 2024

This paper looks like it could be helpful https://www.irs.gov/pub/irs-soi/06ohara.pdf

MattHJensen commented on June 9, 2024

A Tax-Calculator and TaxData user asked:

Am I right that the CPS file codes all passthrough income as active (e00900), not passive (e02000)? The sum of e00900 matches (more or less) IRS data for the sum of business income and S-corp/partnership income. This obviously has big implications in the House bill. So I'm curious a) whether I'm understanding this correctly, and b) whether you have any advice on how to handle it if I am.

  • Is it true that rather than missing e02000, e02000 is lumped together with e00900?
  • Does anyone have a suggestion for a 'quick fix' that either TaxData contributors or the user could implement in the next 24-48 hours?

cc @andersonfrailey @codykallen @Amy-Xu @martinholmer @evtedeschi3

codykallen commented on June 9, 2024

@MattHJensen, I was under the impression that the CPS file was simply missing e02000 and e26270. If e02000 and e26270 have been misclassified as e00900, then this does have serious implications in the House bill, as the bill would make 30% of active business income (e00900 + e26270) eligible for the 25% top rate but it would make 100% of passive business income (e02000 - e26270) eligible for that rate.

Conceivably, one could apply an estimate based on the passive share of total business income, reallocate this percentage of each filing unit's e00900 and e00900p to e02000, and recalculate. This would come closer to capturing the overall score and distributional effect, but it would miss the degree to which some individuals may pay more or less under the bill.
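
A minimal sketch of that reallocation, assuming each filing unit is a plain dict keyed by the variable names above. The `reallocate_passive` helper, the sample records, and the 25% passive share are all hypothetical placeholders, not values from the CPS file or from IRS data.

```python
def reallocate_passive(units, passive_share):
    """Move a fixed share of each unit's e00900 (and e00900p) into e02000.

    passive_share is an assumed economy-wide passive share of business income.
    """
    if not 0.0 <= passive_share <= 1.0:
        raise ValueError("passive_share must be between 0 and 1")
    adjusted_units = []
    for unit in units:
        moved = passive_share * unit["e00900"]
        new_unit = dict(unit)  # copy so the input records are left untouched
        new_unit["e00900"] = unit["e00900"] - moved
        new_unit["e00900p"] = unit["e00900p"] * (1.0 - passive_share)
        new_unit["e02000"] = unit.get("e02000", 0.0) + moved
        adjusted_units.append(new_unit)
    return adjusted_units


# Hypothetical filing units and a hypothetical 25% passive share.
units = [
    {"e00900": 40000.0, "e00900p": 40000.0},
    {"e00900": -8000.0, "e00900p": -8000.0},
]
adjusted = reallocate_passive(units, passive_share=0.25)
```

Because every unit gets the same share, the totals line up with the chosen split, but no unit ends up all-active or all-passive, which is the individual-level variation this approach misses.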

andersonfrailey commented on June 9, 2024

Looking through the code, e00900 in the CPS is the sum of the semp_val (Own business self-employment earnings, total value) variable in the CPS files. Unfortunately, the CPS documentation doesn't specify whether that covers both active and passive income.

What we've done in the past for variables in the CPS that were the sum of multiple variables from the PUF is use the ratio of the two to split the CPS variable. If we're confident that the current e00900 variable in the CPS is actually e00900 + e02000, we could do the same pretty quickly.

codykallen commented on June 9, 2024

@andersonfrailey, be careful when splitting e00900 into e00900 and e02000. If you increase e02000 but not e26270, then you classify all of that as passive business income, whereas technically e02000 also includes some active business income (from partnerships and S corporations), e26270.

andersonfrailey commented on June 9, 2024

Good point @codykallen. So would it be better if we split e00900 into e00900 and e02000 and then used the new e02000 variable to get at e26270?

codykallen commented on June 9, 2024

@andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS Table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).
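
As a rough illustration of the three-way split, here is a sketch in which the `split_business_income` helper and the share values are hypothetical. The shares below are placeholders chosen for easy arithmetic, not figures from Table 1.4.

```python
def split_business_income(total, shares):
    """Split combined business income into e00900, e26270, and e02000."""
    sole = shares["sole_prop"]
    pship = shares["partnership_scorp"]
    passive = shares["passive_sch_e"]
    if abs(sole + pship + passive - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    e26270 = pship * total
    return {
        "e00900": sole * total,
        "e26270": e26270,
        # e02000 is total Schedule E income, so it contains e26270.
        "e02000": e26270 + passive * total,
    }


# Placeholder shares; real ones would be derived from the IRS table.
shares = {"sole_prop": 0.5, "partnership_scorp": 0.25, "passive_sch_e": 0.25}
result = split_business_income(80000.0, shares)
```

Keeping e26270 inside e02000 means that e02000 - e26270 still recovers the passive piece, which avoids classifying partnership and S corporation income as passive.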

MattHJensen commented on June 9, 2024

@andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS Table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).

+1

andersonfrailey commented on June 9, 2024

Thanks for the breakdown, @codykallen. I'm working on a TaxData PR now.

andersonfrailey commented on June 9, 2024

As seen in PR #127, splitting up e00900 isn't an effective method for getting e02000 and e26270. I'm instead going to try and impute them.

Here's a rough outline of a method I'm considering and would like some feedback on.

The first step is to split the IRS PUF into bins based on income and filing type, then determine the probability of a return having a non-zero value for the variable within each bin. After that, either run a regression on the returns that have a non-zero value or just find their mean and standard deviation.

I’ll then split the CPS tax units into the same bins (using only the ones determined to be filers). Using the probabilities from the PUF, I’ll randomly assign tax units to have a non-zero value for the variable. Among those assigned a non-zero value, I’ll either use the regression parameters to predict a value, or randomly assign one based on the standard deviation and the mean for that bin, depending on which route I take in the first part.
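
The mean and standard deviation route could be sketched as below. This assumes each record is a dict with hypothetical `filing_status`, `income`, and `filer` fields, made-up income bin edges, and a normal draw for the non-zero amounts. The normal draw is an assumption of this sketch, since the outline above does not name a distribution.

```python
import random
import statistics
from collections import defaultdict


def bin_key(unit, income_edges):
    """Bin a tax unit by filing status and income bracket."""
    bracket = sum(unit["income"] >= edge for edge in income_edges)
    return (unit["filing_status"], bracket)


def fit_bins(puf_units, var, income_edges):
    """Per bin: share of non-zero values, plus their mean and std dev."""
    grouped = defaultdict(list)
    for unit in puf_units:
        grouped[bin_key(unit, income_edges)].append(unit[var])
    params = {}
    for key, values in grouped.items():
        nonzero = [v for v in values if v != 0]
        params[key] = {
            "prob": len(nonzero) / len(values),
            "mean": statistics.fmean(nonzero) if nonzero else 0.0,
            "std": statistics.stdev(nonzero) if len(nonzero) > 1 else 0.0,
        }
    return params


def impute(cps_units, var, params, income_edges, rng):
    """Assign var to CPS filers using the PUF bin parameters."""
    imputed_units = []
    for unit in cps_units:
        p = params.get(bin_key(unit, income_edges))
        value = 0.0
        # Only filers in a bin seen in the PUF can receive a non-zero value.
        if unit["filer"] and p is not None and rng.random() < p["prob"]:
            value = rng.gauss(p["mean"], p["std"])  # draw may be negative
        imputed_units.append({**unit, var: value})
    return imputed_units


# Hypothetical records standing in for the PUF and the CPS tax units.
income_edges = [50000, 100000]
puf = [
    {"filing_status": "single", "income": 30000, "e02000": 0.0},
    {"filing_status": "single", "income": 40000, "e02000": 0.0},
    {"filing_status": "joint", "income": 150000, "e02000": 12000.0},
    {"filing_status": "joint", "income": 200000, "e02000": -4000.0},
]
cps = [
    {"filing_status": "single", "income": 35000, "filer": True},
    {"filing_status": "joint", "income": 120000, "filer": True},
    {"filing_status": "joint", "income": 120000, "filer": False},
]
params = fit_bins(puf, "e02000", income_edges)
imputed = impute(cps, "e02000", params, income_edges, random.Random(0))
```

A symmetric draw like this will not reproduce a skewed distribution within a bin, so the regression route may be preferable where the bins are wide.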

Would love to hear what y'all think of this or if you have a different approach.

cc @codykallen @Amy-Xu

MattHJensen commented on June 9, 2024

@andersonfrailey, could you explain how this deals with negative / positive values?

andersonfrailey commented on June 9, 2024

Sure. The issue with reusing the method we use to impute the various deductions is that it takes the log of each deduction. That doesn't work with e02000 and e26270 because they can be negative.

Thinking about this with a clearer head than I had yesterday, I suppose I could just tweak our current method by not using the log of e02000 and e26270. This would essentially consist of me running a logit model to determine who has a non-zero value for the variable, then an OLS for those determined by the logit to have a non-zero value. This is more or less what we do already for various deductions and expenses.
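
A minimal sketch of that two-part approach in levels, using pure-Python stand-ins for whatever regression routines the project actually uses. The single income predictor, the sample data, and the rule of assigning non-zero status by a random draw against the logit probability are all assumptions of this sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, steps=5000, lr=0.1):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def fit_ols(xs, ys):
    """Simple linear regression y = alpha + beta*x in levels (no log)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - beta * mean_x, beta


def impute_two_part(x, logit_params, ols_params, rng):
    """Stage 1 decides zero vs. non-zero; stage 2 predicts the amount."""
    a, b = logit_params
    alpha, beta = ols_params
    if rng.random() < sigmoid(a + b * x):
        return alpha + beta * x  # can be negative because no log is taken
    return 0.0


# Hypothetical PUF-like sample: income in units of $100,000, e26270 in dollars.
puf_income = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0]
puf_e26270 = [0.0, 0.0, -4000.0, 0.0, 10000.0, 20000.0, 0.0, 40000.0]

has_value = [1 if v != 0 else 0 for v in puf_e26270]
logit_params = fit_logit(puf_income, has_value)
nonzero = [(x, v) for x, v in zip(puf_income, puf_e26270) if v != 0]
ols_params = fit_ols([x for x, _ in nonzero], [v for _, v in nonzero])

rng = random.Random(0)
cps_income = [0.5, 1.2, 2.8]
imputed = [impute_two_part(x, logit_params, ols_params, rng) for x in cps_income]
```

Using the fitted OLS value without a residual draw understates the spread of the imputed amounts, so adding noise scaled to the regression residuals may be worth considering.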

martinholmer commented on June 9, 2024

@codykallen, @andersonfrailey, @MattHJensen, What's the status of taxdata issue #119?
There's been no discussion of that issue since November 17, 2017.

andersonfrailey commented on June 9, 2024

This is something I want to work on after this UBI project is finished. I'd like to leave this issue open so that it's easy to find and won't fall off my radar.
