I believe this is an important issue for making the CPS file available on PolicyBrain, and it appears important for other B-Tax users as well.
My understanding was that we have imputations for these variables from John O'Hare, but it appears that this may be incorrect (@andersonfrailey, could you chime in?).
If so, we may need to impute these variables onto the CPS file from the PUF. My view is that a simple imputation would be sufficient for now.
Adding @jdebacker and @hdoupe to cc.
from taxdata.
We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.
@andersonfrailey said:
We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.
We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.
@MattHJensen said:
We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.
You're right. My comment was poorly worded. We could use the same routine that we use for the deductions that are imputed onto the CPS: use the PUF to get beta coefficients that are then used in the imputation on the CPS.
This paper looks like it could be helpful: https://www.irs.gov/pub/irs-soi/06ohara.pdf
A Tax-Calculator and TaxData user asked:
Am I right that the CPS file codes all passthrough income as active (e00900), not passive (e02000)? The sum of e00900 matches (more or less) IRS data for the sum of business income and S-corp/partnership income. This obviously has big implications for the House bill. So I'm curious (a) whether I'm understanding this correctly, and (b) whether you have any advice on how to handle it if I am.
- Is it true that rather than being missing, e02000 is lumped together with e00900?
- Does anyone have a suggestion for a 'quick fix' that either TaxData contributors or the user could implement in the next 24-48 hours?
cc @andersonfrailey @codykallen @Amy-Xu @martinholmer @evtedeschi3
@MattHJensen, I was under the impression that the CPS file was simply missing e02000 and e26270. If e02000 and e26270 have been misclassified as e00900, then this does have serious implications under the House bill, as the bill would make 30% of active business income (e00900 + e26270) eligible for the 25% top rate but would make 100% of passive business income (e02000 - e26270) eligible for that rate.
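A minimal arithmetic sketch of why the classification matters, assuming only the simplified rule described above: 30% of active income (e00900 + e26270) plus 100% of passive income (e02000 - e26270) is eligible for the 25% rate. The helper name is hypothetical, and it ignores every other provision of the bill and simply nets losses.

```python
def eligible_for_25_percent_rate(e00900, e26270, e02000):
    """Business income eligible for the 25% top rate under the simplified
    rule summarized in this thread (hypothetical helper, not Tax-Calculator code).

    Active business income  = e00900 + e26270 -> 30% eligible
    Passive business income = e02000 - e26270 -> 100% eligible
    e02000 already includes e26270, so it is subtracted out of the passive part.
    """
    active = e00900 + e26270
    passive = e02000 - e26270
    return 0.30 * active + 1.00 * passive


# The same $100,000 of business income, coded two different ways.
as_sole_prop = eligible_for_25_percent_rate(e00900=100_000.0, e26270=0.0, e02000=0.0)
as_passive = eligible_for_25_percent_rate(e00900=0.0, e26270=0.0, e02000=100_000.0)
print(as_sole_prop, as_passive)
```

Coded as e00900, only $30,000 of the $100,000 qualifies; coded as passive e02000, all $100,000 does. That gap is why lumping e02000 into e00900 can matter for the score.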
Conceivably, one could apply an estimate based on the passive share of total business income, reallocate this percentage of each filing unit's e00900 and e00900p to e02000, and recalculate. This would come closer to capturing the overall score and distributional effect, but it would miss the degree to which some individuals may pay more or less under the bill.
Looking through the code, e00900 in the CPS is the sum of the semp_val (Own business self-employment earnings, total value) variable in the CPS files. Unfortunately, the CPS documentation doesn't specify whether that covers both active and passive income.
What we've done in the past for CPS variables that were the sum of multiple PUF variables is use the ratio of the PUF variables to split the CPS variable. If we're confident that the current e00900 variable in the CPS is actually e00900 + e02000, we could do the same pretty quickly.
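A minimal sketch of that ratio approach, assuming the CPS e00900 really is PUF e00900 + e02000. PUF and CPS records are represented as plain dicts with the sample weight under s006; the helper names and the toy numbers are hypothetical. The taxpayer/spouse components e00900p and e00900s, when present, are scaled by the same factor so they still sum to the new e00900. This does not touch e26270.

```python
def puf_passive_share(puf_rows, weight="s006"):
    """Weighted share of e02000 in (e00900 + e02000) across PUF records."""
    num = sum(r[weight] * r["e02000"] for r in puf_rows)
    den = sum(r[weight] * (r["e00900"] + r["e02000"]) for r in puf_rows)
    if den == 0:
        raise ValueError("weighted e00900 + e02000 total is zero; share undefined")
    return num / den


def split_cps_record(rec, share):
    """Move `share` of a CPS unit's e00900 into e02000 (returns a new dict)."""
    out = dict(rec)
    moved = share * rec["e00900"]
    out["e00900"] = rec["e00900"] - moved
    out["e02000"] = rec.get("e02000", 0.0) + moved
    # Keep the taxpayer/spouse split consistent with the new e00900 total.
    for part in ("e00900p", "e00900s"):
        if part in rec:
            out[part] = rec[part] * (1.0 - share)
    return out


# Toy, made-up records purely to exercise the functions.
puf = [
    {"s006": 2.0, "e00900": 15.0, "e02000": 5.0},
    {"s006": 1.0, "e00900": 30.0, "e02000": 10.0},
]
share = puf_passive_share(puf)
cps_unit = {"e00900": 1000.0, "e00900p": 600.0, "e00900s": 400.0}
print(split_cps_record(cps_unit, share))
```

Because both variables are net of losses, the weighted totals can be small or of opposite sign, which can push the ratio outside [0, 1]; it is worth checking the share before applying it.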
@andersonfrailey, be careful when splitting e00900 into e00900 and e02000. If you increase e02000 but not e26270, then you classify all of that as passive business income, whereas technically e02000 also includes some active business income (from partnerships and S corporations), e26270.
Good point @codykallen. So would it be better if we split e00900 into e00900 and e02000 and then used the new e02000 variable to get at e26270?
@andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).
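A minimal sketch of the three-way split, assuming the three aggregate totals (sole proprietorship, partnership and S corporation, and passive Sch E income) are read from the IRS table mentioned above; no actual table values are reproduced here, and the helper name is hypothetical. The function rebuilds e02000 as e26270 plus the passive piece, so active income comes out as e00900 + e26270 and passive income as e02000 - e26270.

```python
def three_way_split(cps_e00900, sole_prop_total, pship_scorp_total, passive_total):
    """Split a CPS unit's combined business income into e00900, e26270, e02000.

    The three *_total arguments are aggregate amounts (e.g., from IRS SOI
    tables) used only to form shares; they are inputs, not values supplied here.
    """
    combined = sole_prop_total + pship_scorp_total + passive_total
    if combined == 0:
        raise ValueError("aggregate totals sum to zero; shares undefined")
    e00900 = cps_e00900 * sole_prop_total / combined
    e26270 = cps_e00900 * pship_scorp_total / combined
    passive = cps_e00900 - e00900 - e26270  # remainder, so the parts sum exactly
    e02000 = e26270 + passive  # Sch E total includes the partnership/S-corp piece
    return e00900, e26270, e02000


# Made-up aggregate totals, only to show the mechanics.
print(three_way_split(400.0, 2.0, 1.0, 1.0))
```

Applying one set of aggregate shares to every unit is still a uniform reallocation: totals can line up while individual units remain misclassified.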
@codykallen said:
@andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).
+1
Thanks for the breakdown, @codykallen. I'm working on a TaxData PR now.
As seen in PR #127, splitting up e00900 isn't an effective method for getting e02000 and e26270. I'm instead going to try to impute them.
Here's a rough outline of a method I'm considering and would like some feedback on.
The first step is to split the IRS PUF into bins based on income and filing type. Then determine the probability of a return having a non-zero value for the variable within each bin. Then either run a regression on those who have a non-zero value or just find the mean and standard deviation.
I’ll then split the CPS tax units into the same bins (using only the ones determined to be filers). Using the probabilities from the PUF, I’ll randomly assign tax units to have a non-zero value for the variable. Among those assigned a non-zero value, I’ll either use the regression parameters to predict a value, or randomly assign one based on the standard deviation and the mean for that bin, depending on which route I take in the first part.
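A minimal standard-library sketch of the outlined bin-based procedure, taking the mean/standard-deviation route rather than the regression route. Assumptions (hypothetical, not taxdata code): records are dicts with an income measure under `income`, filing status under `MARS`, and the sample weight under `s006`; the income bin edges are placeholders; the caller passes only CPS units already determined to be filers. Nonzero values are drawn from a normal distribution, which allows negative values but ignores skew.

```python
import bisect
import math
import random
from collections import defaultdict

# Placeholder income-bin edges (hypothetical; the thread does not specify bins).
INCOME_EDGES = [0.0, 25_000.0, 50_000.0, 100_000.0, 200_000.0]


def bin_key(rec):
    """Bin a record by income bracket and filing status."""
    return (bisect.bisect_right(INCOME_EDGES, rec["income"]), rec["MARS"])


def fit_bins(puf_rows, var):
    """Per bin: weighted P(var != 0), and weighted mean/sd of the nonzero values."""
    # Accumulators per bin: total weight, nonzero weight, sum w*x, sum w*x^2
    acc = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for r in puf_rows:
        a = acc[bin_key(r)]
        w, x = r["s006"], r[var]
        a[0] += w
        if x != 0:
            a[1] += w
            a[2] += w * x
            a[3] += w * x * x
    params = {}
    for key, (tot_w, nz_w, sx, sxx) in acc.items():
        if nz_w == 0:
            params[key] = (0.0, 0.0, 0.0)
            continue
        mean = sx / nz_w
        variance = max(sxx / nz_w - mean * mean, 0.0)  # guard tiny negative round-off
        params[key] = (nz_w / tot_w, mean, math.sqrt(variance))
    return params


def impute_bins(cps_filers, params, var, seed=0):
    """Randomly assign nonzero status by bin probability, then draw a value."""
    rng = random.Random(seed)
    for r in cps_filers:
        # Bins never seen in the PUF default to "always zero".
        prob, mean, sd = params.get(bin_key(r), (0.0, 0.0, 0.0))
        r[var] = rng.gauss(mean, sd) if rng.random() < prob else 0.0
    return cps_filers
```

Fixing the seed makes runs reproducible; a different seed gives a different but equally valid draw.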
Would love to hear what y'all think of this or if you have a different approach.
@andersonfrailey, could you explain how this deals with negative / positive values?
Sure. The issue with using the same method we use to impute the various deductions is that it takes the log of all deductions. That doesn't work with e02000 and e26270 because they can be negative.
Thinking about this with a clearer head than I had yesterday, I suppose I could just tweak our current method by not taking the log of e02000 and e26270. This would essentially consist of running a logit model to determine who has a non-zero value for the variable, then an OLS regression for those the logit determines to have a non-zero value. This is more or less what we already do for various deductions and expenses.
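A minimal NumPy sketch of that two-part model, under stated assumptions: the covariate matrices (hypothetical; the thread does not say which PUF/CPS variables are used) include an intercept column, PUF weights are applied in both stages, and nonzero status on the CPS is drawn at random from the logit probabilities rather than set by a 0.5 cutoff. The OLS stage is fit in levels rather than logs, so negative e02000 or e26270 values pose no problem. This is an illustration, not taxdata's actual routine.

```python
import numpy as np


def _sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logit(X, y, w, n_iter=25):
    """Weighted logistic regression via Newton-Raphson (y is 0/1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def fit_two_part(X, y, w):
    """Stage 1: logit for y != 0. Stage 2: weighted OLS on nonzero y, in levels."""
    nonzero = y != 0
    beta_logit = fit_logit(X, nonzero.astype(float), w)
    sw = np.sqrt(w[nonzero])
    beta_ols, *_ = np.linalg.lstsq(X[nonzero] * sw[:, None], y[nonzero] * sw, rcond=None)
    return beta_logit, beta_ols


def impute_two_part(X_cps, beta_logit, beta_ols, rng):
    """Draw nonzero status from logit probabilities, then predict the level by OLS."""
    has_value = rng.random(X_cps.shape[0]) < _sigmoid(X_cps @ beta_logit)
    return np.where(has_value, X_cps @ beta_ols, 0.0)
```

Plain Newton steps can fail if a covariate perfectly separates zero from nonzero returns; in that case a regularized fit would be needed.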
@codykallen, @andersonfrailey, @MattHJensen, What's the status of taxdata issue #119?
There's been no discussion of that issue since November 17, 2017.
This is something I want to work on after this UBI project is finished. I'd like to leave this issue open so that it's easy to find and won't fall off my radar.