ttrodrigz / iterake Goto Github PK

Create weights with iterative raking.

License: Other

R 100.00%

r weighting raking tidy

iterake's People

sethips strengejacke eugeniogrant matthoendorf thevalueengineers jochen-binder poliscipunk

iterake's Issues

Need for Assistance?

Hi Tony
Sorry, I couldn't find your email address. Do you have a vignette or documentation for your package? I was looking for a package for rim weighting in R. I am happy to help if you require any assistance.

Error in universe: There are mismatches between the buckets provided, and the unique values of `data

I reinstalled R and RStudio, so I had to reinstall your package. You have made some changes since the version I had before. How can I get the previous version?

Previously I could use characters as bucket input; now I have to use factors, which leads to some issues.
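
The mismatch error in the title suggests that the values passed as `buckets` do not line up with the unique values in the data column. Below is a minimal base R sketch for spotting the difference before calling `universe()`. The data frame `df`, the column `q1`, and the helper `check_buckets()` are all hypothetical and not part of iterake:

```r
# Hypothetical data: the column is stored as a factor
df <- data.frame(q1 = factor(c("a", "b", "b", "c")))

# Illustrative helper (not an iterake function): compare the bucket labels
# with the values actually present in the column, both as character
check_buckets <- function(x, buckets) {
  present <- unique(as.character(x))
  list(
    missing_in_data    = setdiff(as.character(buckets), present),
    missing_in_buckets = setdiff(present, as.character(buckets))
  )
}

res <- check_buckets(df$q1, buckets = c("a", "b", "d"))
print(res)
```

Both elements of the result are empty when the buckets and the data agree, so any non-empty entry points at the label that needs fixing.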

Quite huge difference of one item result compared to cell weighting

Hello,

I'm new to R. I also don't know much about the details of raking. I just heard about it as an easier way to weight data (compared to cell weighting).

So, I installed your package. All results are very similar to those weighted cell by cell (differences usually do not exceed +/- 3 pp.). Sadly, in one case the difference is 8 pp. Do you have any idea why? It was a simple frequency table for a multiple response set (generated in SPSS using CTABLES).

This is the code (though I don't think it matters much for this issue):

#loading libraries
library(foreign)
library(iterake)
library(expss)
library(haven)
library(labelled)
library(rstudioapi)

#raking universe
#the target base was 753 so I increased the N to reach it
uni = universe(data = df, category(name = "q1", 
                                   buckets = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p"), 
                                   targets = c(0.0752,0.0544,0.0554,0.0262,0.0669,
                                                 0.0870,0.1427,0.0244,0.0535,0.0296,
                                                 0.0593,0.1198,0.0331,0.0362,0.0920,
                                                 0.0441), sum.1 = TRUE), 
                          category(name = "q2",
                                   buckets = c("a","b", "c","d"), 
                                   targets = c(0.1760,0.2406,0.3397,0.2437), sum.1 = TRUE), N = 844)

#creation of the raked dataframe
df.wgt = iterake(universe = uni)

Below you can find results of the question I'm speaking about:

CELL WEIGHT
item1      2%        2
item2      13%      14
item3      15%      17
item4      16%      18
item5      18%      21
item6      21%      24
item7      22%      25
**item8      41%      47**
TOTAL     148%    113

R WEIGHT
item1      2%        2
item2      15%      19
item3      15%      19
item4      17%      21
item5      19%      24
item6      21%      26
item7      24%      31
**item8      33%      42**
TOTAL     146%    126

It would be fantastic if you could help fix this problem in any way, because I find your package very useful.

Greetings,
Konrad
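
One possible reason, offered as a general property of raking rather than a diagnosis of this dataset: raking matches only the marginal targets of each category, while cell weighting matches every joint cell. Any item correlated with the interaction between the weighting variables can therefore come out differently. A minimal base R sketch with hypothetical counts and targets; the `rake()` helper is a plain iterative proportional fitting loop written for illustration, not iterake's implementation:

```r
# Hypothetical 2 x 2 sample counts (rows = q1, columns = q2)
sample_n <- matrix(c(40, 10,
                     20, 30), nrow = 2, byrow = TRUE)

# Hypothetical population joint proportions (cells sum to 1)
pop <- matrix(c(0.10, 0.40,
                0.40, 0.10), nrow = 2, byrow = TRUE)

# Cell weighting: match every joint cell directly
cell_wgt <- pop / (sample_n / sum(sample_n))

# Raking: match only the row and column margins, alternating until stable
rake <- function(n, row_t, col_t, max_iter = 100, tol = 1e-10) {
  p <- n / sum(n)
  for (i in seq_len(max_iter)) {
    p <- p * (row_t / rowSums(p))              # scale rows to the row targets
    p <- sweep(p, 2, col_t / colSums(p), `*`)  # scale columns to the column targets
    if (max(abs(rowSums(p) - row_t)) < tol) break
  }
  p
}

raked <- rake(sample_n, rowSums(pop), colSums(pop))

# Margins agree with the population, joint cells do not
print(rowSums(raked) - rowSums(pop))
print(raked - pop)
```

In this sketch the raked margins equal the targets, yet the individual cells stay far from the population cells because raking preserves the association found in the sample.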

Putting minimum weight value

Hello,

I am using the "max.wgt" argument to cap the maximum weight when using iterake.
I was wondering whether there is also a way to set a minimum weight?

many thanks
Tina
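
Whether iterake offers a minimum-weight argument is not stated in this thread. As a workaround sketch in base R, weights can be clamped after raking. The weights `w` and the helper `trim_weights()` are hypothetical and not part of iterake:

```r
# Hypothetical raked weights
w <- c(0.05, 0.4, 1.0, 2.5, 12)

# Illustrative helper (not an iterake argument): clamp weights to [lo, hi]
trim_weights <- function(w, lo, hi) {
  pmin(pmax(w, lo), hi)
}

w_trim <- trim_weights(w, lo = 0.2, hi = 5)
print(range(w_trim))
```

Clamping after the fact changes the weighted margins, so the trimmed weights no longer hit the targets exactly. A common approach is to re-rake after trimming and repeat until both conditions hold.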

Non-convergence

I'm raking over a lot of characteristics and when I include them all it doesn't converge, but I notice that it stops at 50 iterations. Is there a way to increase the number of iterations to see if convergence happens later?

Weighted N differs from original N

I used your package for weighting survey data. In some cases, the weighted N is smaller than the original N. Here is an example:


library(dplyr)
library(iterake)

spss_data_weighted = iterake(universe = spss_data_uni, 
                                   max.iter = 1500,  
                                   threshold = 0.001, 
                                   stuck.limit = 10)

-- iterake summary -------------------------------------------------------------
 Convergence: Success
  Iterations: 386

Unweighted N: 500.00
 Effective N: 217.73
  Weighted N: 445.76
  Efficiency: 43.6%
        Loss: 1.296

 NOTE: Threshold met, stopped at difference of 7.705e-01 between weighted sample and universe.


compare_margins(data = spss_data_weighted, weight = weight, universe = spss_data_uni)

print(compare_margins(data = spss_data_weighted, weight = weight, universe = spss_data_uni), n = 52)

# A tibble: 52 x 9
   category bucket uwgt_n  wgt_n uwgt_prop wgt_prop targ_prop   uwgt_diff  wgt_diff
   <chr>    <chr>   <int>  <dbl>     <dbl>    <dbl>     <dbl>       <dbl>     <dbl>
 1 RECAGE   1          72  54.4      0.144  0.122     0.122    0.0220     -8.77e-15
 2 RECAGE   2         143 151.       0.286  0.338     0.338   -0.0524     -7.25e-13
 3 RECAGE   3         204 160.       0.408  0.358     0.358    0.0498      7.75e-13
 4 RECAGE   4          81  80.9      0.162  0.181     0.181   -0.0194     -4.14e-14
 5 RECQ3    1         180  96.2      0.36   0.216     0.213    0.147       2.52e- 3
 6 RECQ3    2         150 110.       0.3    0.246     0.244    0.0563      2.67e- 3
 7 RECQ3    3         100 111.       0.2    0.249     0.251   -0.0506     -1.84e- 3
 8 RECQ3    4          70 129.       0.14   0.289     0.292   -0.152      -3.36e- 3
 9 Q3A      1          70  59.7      0.14   0.134     0.133    0.00728     1.31e- 3
10 Q3A      2          82  72.0      0.164  0.162     0.162    0.00229    -1.73e- 4
11 Q3A      3          35  21.3      0.07   0.0478    0.0481   0.0219     -2.11e- 4
12 Q3A      4          12  11.7      0.024  0.0262    0.0259  -0.00193     2.75e- 4
13 Q3A      5           7   3.33     0.014  0.00746   0.00763  0.00637    -1.67e- 4
14 Q3A      6          18  12.7      0.036  0.0285    0.0275   0.00854     1.09e- 3
15 Q3A      7          33  34.6      0.066  0.0777    0.0786  -0.0126     -8.80e- 4
16 Q3A      8           7   7.69     0.014  0.0173    0.0168  -0.00278     4.76e- 4
17 Q3A      9          41  40.6      0.082  0.0911    0.0923  -0.0103     -1.18e- 3
18 Q3A      10         99  99.5      0.198  0.223     0.221   -0.0232      1.90e- 3
19 Q3A      11         29  22.3      0.058  0.0500    0.0511   0.00689    -1.12e- 3
20 Q3A      12          4   5.32     0.008  0.0119    0.0122  -0.00420    -2.68e- 4
21 Q3A      13         23  19.3      0.046  0.0434    0.0435   0.00252    -1.10e- 4
22 Q3A      14         17   9.31     0.034  0.0209    0.0214   0.0126     -4.69e- 4
23 Q3A      15         13  15.6      0.026  0.0351    0.0359  -0.00985    -7.87e- 4
24 Q3A      16         10  10.7      0.02   0.0240    0.0236  -0.00365     3.25e- 4
25 Q5_1     1         243 244.       0.486  0.548     0.554   -0.0681     -6.38e- 3
26 Q5_1     2         121  90.5      0.242  0.203     0.205    0.0370     -2.01e- 3
27 Q5_1     3          59  29.4      0.118  0.0660    0.0640   0.0540      2.01e- 3
28 Q5_1     4          25  20.7      0.05   0.0465    0.0450   0.00503     1.58e- 3
29 Q5_1     5          15   7.36     0.03   0.0165    0.0160   0.0140      5.03e- 4
30 Q5_1     6          19  19.4      0.038  0.0436    0.0419  -0.00392     1.65e- 3
31 Q5_1     7          18  34.1      0.036  0.0766    0.0739  -0.0379      2.65e- 3
32 Q5_2     1         274 279.       0.548  0.626     0.592   -0.0444      3.32e- 2
33 Q5_2     2         117  66.9      0.234  0.150     0.140    0.0943      1.03e- 2
34 Q5_2     3          50  25.9      0.1    0.0582    0.0550   0.0450      3.22e- 3
35 Q5_2     4          18  22.1      0.036  0.0496    0.0450  -0.00904     4.53e- 3
36 Q5_2     5          14   6        0.028  0.0135    0.00840  0.0196      5.06e- 3
37 Q5_2     6          14  11.4      0.028  0.0255    0.0237   0.00434     1.88e- 3
38 Q5_2     7          13  34.6      0.026  0.0777    0.136   -0.110      -5.82e- 2
39 Q5_3     1         218 232.       0.436  0.521     0.530   -0.0937     -9.05e- 3
40 Q5_3     2         133 118.       0.266  0.264     0.266   -0.00000610 -2.15e- 3
41 Q5_3     3          54  20.7      0.108  0.0465    0.0457   0.0623      7.52e- 4
42 Q5_3     4          22  26.0      0.044  0.0582    0.0572  -0.0132      1.05e- 3
43 Q5_3     5          11   6.67     0.022  0.0150    0.0145   0.00752     4.74e- 4
44 Q5_3     6          32  18.5      0.064  0.0415    0.0404   0.0236      1.08e- 3
45 Q5_3     7          30  24.2      0.06   0.0543    0.0465   0.0135      7.84e- 3
46 Q5_4     1         204 136.       0.408  0.304     0.271    0.137       3.30e- 2
47 Q5_4     2         154 133.       0.308  0.298     0.265    0.0428      3.23e- 2
48 Q5_4     3          66  34.7      0.132  0.0778    0.0694   0.0626      8.44e- 3
49 Q5_4     4          35  52.6      0.07   0.118     0.105   -0.0352      1.28e- 2
50 Q5_4     5          15  12.2      0.03   0.0274    0.0244   0.00561     2.97e- 3
51 Q5_4     6          15  45        0.03   0.101     0.118   -0.0881     -1.72e- 2
52 Q5_4     7          11  33        0.022  0.0740    0.146   -0.124      -7.23e- 2

I fixed this by dividing the original N by the weighted N and multiplying the weights by that factor:

X = 500/445.76

fin = spss_data_weighted %>%
  mutate(weight = weight * X)

This works but I don't understand the reason for the difference between the weighted N and the original N of the sample.
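
The rescaling step can be written without hard-coding the two totals. The base R sketch below uses a hypothetical weight vector `w`. It also computes the Kish effective sample size, which does not change when all weights are multiplied by a constant. Whether iterake's "Effective N" uses exactly this formula is an assumption:

```r
# Hypothetical weights whose sum differs from the number of cases
w <- c(0.5, 0.8, 1.2, 0.9, 0.6)

n        <- length(w)
w_scaled <- w * n / sum(w)   # same idea as weight * (500 / 445.76)

# Kish effective sample size (assumed formula); invariant to rescaling
eff_n <- function(w) sum(w)^2 / sum(w^2)

print(sum(w_scaled))                         # equals n up to floating point
print(all.equal(eff_n(w), eff_n(w_scaled)))
```

Rescaling changes only the weighted base. The weighted proportions, and therefore the comparison with the targets, stay the same.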

Iterake doesn't create weights and R is not showing any error

Hi,

I am using the Iterake package and iterake function for iterative weighting in different survey results. And it works great. But now I have a problem with it and, since RStudio didn't show any error, I don't know what that could be.

I have a dataset "inp" with 22,000 observations (respondents) and 29 variables. Four of them should be used in iterative weighting: gender (values 1 and 2), age group (values 1 to 15, since there are 15 age groups), region group (values 1 to 8), and household size (values 1 and 2). I created these four columns in my sample dataset and calculated the corresponding shares in the whole population.

The next step was to use the function universe to create uni:

uni <- universe(
data = inp,
category(name="rim1",
buckets = c("1", "2"),
targets = c(0.51, 0.49)
),
category(name="rim2",
buckets = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"),
targets = c(0.10, 0.06, 0,06, 0.07, 0.07, 0.07, 0.07, 0.08, 0.07, 0.07, 0.07, 0.06, 0.05, 0.05, 0.05)
),
category(name="rim3",
buckets = c("1", "2", "3", "4", "5", "6", "7", "8"),
targets = c(0.26, 0.10, 0.07, 0.08, 0.26, 0.09, 0.06, 0.08)
),
category(name="rim4",
buckets = c("1", "2"),
targets = c(0.22, 0.78)
)
)

That step is working. It creates uni as a list.
The next step, which works with all the other surveys, doesn't work here: it starts running and never finishes, without showing any errors:

inpw <- iterake(universe=uni, threshold = 0.0001, max.wgt=10, max.iter=50)

I also tried without all these arguments, with the same result.
Do you have any idea what could be the issue here? Any suggestion or help would be great.

Thanks for helping and thanks for such a good package, I use it every day. :)

Best regards,

Gordana
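
One thing worth checking in the `universe()` call above: the `rim2` targets contain `0,06` with a comma instead of a decimal point. R reads that as two separate values, so the vector has 16 targets for 15 buckets. Whether this explains the hang is not confirmed here. A minimal base R check, where the variable names are illustrative only:

```r
# The rim2 targets exactly as typed in the call above
rim2_buckets <- as.character(1:15)
rim2_targets <- c(0.10, 0.06, 0,06, 0.07, 0.07, 0.07, 0.07, 0.08,
                  0.07, 0.07, 0.07, 0.06, 0.05, 0.05, 0.05)

print(length(rim2_buckets))   # 15
print(length(rim2_targets))   # 16: `0,06` is read as two values, 0 and 6
print(sum(rim2_targets))      # far above 1

# With a decimal point the vector has 15 values that sum to 1
rim2_fixed <- c(0.10, 0.06, 0.06, 0.07, 0.07, 0.07, 0.07, 0.08,
                0.07, 0.07, 0.07, 0.06, 0.05, 0.05, 0.05)
stopifnot(length(rim2_fixed) == length(rim2_buckets),
          isTRUE(all.equal(sum(rim2_fixed), 1)))
```

Running the same length and sum check on every category before building the universe catches this kind of typo early.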

universe error

Thanks for the great package. After updating R and some packages since this spring (when iterake worked), the `universe` function now fails. Any ideas? Below are examples from the package.

library(iterake)
data(dealer_data)

# build the 'universe'
dealer_uni <- universe(
  
  df = dealer_data,
  
  category(
    name = "Age",
    buckets = c("18-34", "35-54", "55+"),
    targets = c(.12, .58, .30)
  ),
  
  category(
    name = "Year",
    buckets = c(2015, 2016, 2017, 2018),
    targets = c(.22, .25, .32, .21)
  ),
  
  category(
    name = "Type",
    buckets = c("Car", "SUV", "Truck"),
    targets = c(.38, .47, .15)
  )
  
)
#> Error in rep(vec_seq_along(data), n): invalid 'times' argument

data(weight_me)
universe(
  df = weight_me,
  
  category(
    name = "costume",
    buckets = c("Bat Man", "Cactus"),
    targets = c(0.5, 0.5)),
  
  category(
    name = "seeds",
    buckets = c("Tornado", "Bird", "Earthquake"),
    targets = c(0.4, 0.3, 0.3))
)
#> Can't bind data because some arguments have the same name

Created on 2019-09-10 by the reprex package (v0.3.0)

my sessionInfo

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reprex_0.3.0       iterake_0.0.0.9000

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5    xfun_0.9            purrr_0.3.2
 [4] splines_3.6.1       haven_2.1.1         lattice_0.20-38
 [7] labelled_2.2.1      colorspace_1.4-1    vctrs_0.2.0.9002
[10] htmltools_0.3.6     base64enc_0.1-3     survival_2.44-1.1
[13] rlang_0.4.0         pillar_1.4.2        foreign_0.8-71
[16] glue_1.3.1          RColorBrewer_1.1-2  stringr_1.4.0
[19] munsell_0.5.0       gtable_0.3.0        htmlwidgets_1.3
[22] evaluate_0.14       latticeExtra_0.6-28 knitr_1.24
[25] forcats_0.4.0       callr_3.2.0         ps_1.3.0
[28] htmlTable_1.13.1    Rcpp_1.0.2          clipr_0.7.0
[31] acepack_1.4.1       backports_1.1.4     scales_1.0.0
[34] checkmate_1.9.4     Hmisc_4.2-0         fs_1.3.1
[37] gridExtra_2.3       ggplot2_3.2.1       hms_0.5.1
[40] packrat_0.5.0       digest_0.6.20       stringi_1.4.3
[43] processx_3.3.1      dplyr_0.8.3         grid_3.6.1
[46] tools_3.6.1         magrittr_1.5        lazyeval_0.2.2
[49] tibble_2.1.3        Formula_1.2-3       cluster_2.1.0
[52] whisker_0.3-2       crayon_1.3.4        tidyr_0.8.3.9000
[55] pkgconfig_2.0.2     zeallot_0.1.0       Matrix_1.2-17
[58] data.table_1.12.2   rmarkdown_1.14      assertthat_0.2.1
[61] rstudioapi_0.10     R6_2.4.0            rpart_4.1-15
[64] nnet_7.3-12         compiler_3.6.1

Installing via conda

I'm trying to install the package with conda using:

conda skeleton cran <github_url>
conda build <package-name>

When I run the first command, I get the following error:

(base) ubuntu@ip-10-0-10-231:~$ /home/ubuntu/anaconda3/bin/conda skeleton cran https://github.com/ttrodrigz/iterake.git
Adding in variants from internal_defaults
INFO:conda_build.variants:Adding in variants from internal_defaults
Parsing input package https://github.com/ttrodrigz/iterake.git:
.. name: iterake location: https://github.com/ttrodrigz/iterake new_location: /home/ubuntu/r-iterake
Making/refreshing recipe for iterake
Cloning into '/home/ubuntu/anaconda3/conda-bld/skeleton_1610567786144/work'...
done.
checkout: 'HEAD'
Your branch is up to date with 'origin/_conda_cache_origin_head'.
==> git log -n1 <==

fatal: No names found, cannot describe anything.
commit 03d54cb21f90d321c56d296212f67e07b878fb27
Author: dwitherell <[email protected]>
Date:   Thu Jun 25 14:24:09 2020 -0600

    Minor edit to address funs() deprecation

==> git describe --tags --dirty <==

commit 03d54cb21f90d321c56d296212f67e07b878fb27
Author: dwitherell <[email protected]>
Date:   Thu Jun 25 14:24:09 2020 -0600

    Minor edit to address funs() deprecation

==> git status <==

On branch _conda_cache_origin_head
Your branch is up to date with 'origin/_conda_cache_origin_head'.

nothing to commit, working tree clean


Leaving build/test directories:
  Work:
 /home/ubuntu/anaconda3/conda-bld/skeleton_1610567786144/work
  Test:
 /home/ubuntu/anaconda3/conda-bld/skeleton_1610567786144/test_tmp
Leaving build/test environments:
  Test:
source activate  /home/ubuntu/anaconda3/conda-bld/skeleton_1610567786144/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho
  Build:
source activate  /home/ubuntu/anaconda3/conda-bld/skeleton_1610567786144/_build_env


Error: no tags found

Per conda/conda#6674 (comment), it seems like the releases need to be tagged. Would this be possible?
