fimdbook's Introduction

Flexible Imputation of Missing Data, Online Version

This repository contains the R Markdown source for the online version of Flexible Imputation of Missing Data, Second Edition (https://stefvanbuuren.name/fimd/).

This repository tracks changes made to the book.

Probably the most useful file is R/fimd.R, which contains the source code used to perform the analyses in the book.

If you want to alert me to any errors, inconsistencies or possible improvements, please open an issue or submit a pull request.

fimdbook's People

Contributors

stefvanbuuren

fimdbook's Issues

Missing Data Directory

Hi @stefvanbuuren ,

I was trying to rebuild your book using bookdown::render_book("index.Rmd", "bookdown::pdf_book") in order to convert it into a PDF (a personal preference: I find a PDF easier to search, and I like being able to add highlights and comments). However, the build appears to look for a data directory that is not included in the repository:

label: c85readdata1
Quitting from lines 11705-11705 (FIMD-bookdown.Rmd) 
Error in lookup.xport(file) : 
  unable to open file: 'No such file or directory'

Inspecting the code I can see:

## ----c85readdata1--------------------------------------------------------
library(foreign)
file.sas <- "data/c85/master85.xport"
original.sas <- read.xport(file.sas)
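
While the data directory is absent, a guarded read makes it obvious which input is missing instead of stopping with the lookup.xport error. The following is only a minimal sketch: the helper name read_xport_if_present is hypothetical and is not part of fimd.R, and it assumes the foreign package is installed.

```r
# Hypothetical helper (not in fimd.R): read a SAS transport file only if it exists.
read_xport_if_present <- function(path) {
  if (!file.exists(path)) {
    # Report the missing input and carry on instead of raising an error.
    message("Skipping: '", path, "' not found")
    return(NULL)
  }
  foreign::read.xport(path)
}

file.sas <- "data/c85/master85.xport"
original.sas <- read_xport_if_present(file.sas)

# Downstream chunks can then test for the missing data explicitly.
if (is.null(original.sas)) {
  message("The c85 data are not available; dependent chunks will not run.")
}
```

Returning NULL rather than stopping is a design choice for a partial rebuild; chunks that need the data still have to be skipped or disabled separately.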

Would it be possible to add the data directory or, if not, to upload a PDF version of the book?

By the way, thank you very much for publicly releasing such great content!

bug in fimd.R code

Error in hist.default(mis, plot = FALSE, breaks = b) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'

mi.hist(Yimp, Yobs,
        b = seq(-20, 200, 10), type = "continuous",
        gray = FALSE, lwd = lwd,
        obs.lwd = 1.5, mis.lwd = 1.5, imp.lwd = 1.5,
        obs.col = mdc(4), mis.col = mdc(5), imp.col = "transparent",
        mlt = 0.08, main = "", xlab = "Ozone (ppb)")

Possible solution: lower the minimum of b, for example to -40, so that the breaks span all imputed values.
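
The error can be reproduced with base R alone. The sketch below uses made-up values (not the ozone data) in which one value lies below the first break, and then shows that widening the breaks, either by hand or from the data range, lets hist() count every value. The variable names are illustrative only.

```r
# Made-up values standing in for imputations; one value falls below -20.
x <- c(-25, 10, 50, 120)

# Breaks as in the reported call: they do not span the range of x.
b_narrow <- seq(-20, 200, 10)
res <- tryCatch(
  hist(x, breaks = b_narrow, plot = FALSE),
  error = function(e) conditionMessage(e)
)
# res now holds the error message instead of a histogram object.

# Widening the lower bound, as suggested in the issue, avoids the error.
b_wide <- seq(-40, 200, 10)
h <- hist(x, breaks = b_wide, plot = FALSE)

# A data-driven alternative: extend the breaks to cover the observed range.
lo <- min(-20, floor(min(x) / 10) * 10)
hi <- max(200, ceiling(max(x) / 10) * 10)
b_auto <- seq(lo, hi, 10)
h_auto <- hist(x, breaks = b_auto, plot = FALSE)
```

Deriving the breaks from the data avoids having to guess a fixed lower bound, because imputed values change from run to run.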

Formula for "old" degrees of freedom

Thank you Dr. van Buuren for your excellent book!

The first equation in Section 2.3.6 gives the "old" degrees of freedom from Rubin 1987. I believe
(m-1)\left(1+\frac{1}{r^2}\right) should be changed to (m-1)\left(1+\frac{1}{r}\right)^2.

Best,
Gordon Honerkamp-Smith
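
The proposed correction can be checked against the usual definitions in Rubin's rules. The derivation below is a sketch that assumes the standard notation, which the issue itself does not spell out: B is the between-imputation variance, \bar U the within-imputation variance, T the total variance, r the relative increase in variance due to nonresponse, and \lambda the proportion of variance attributable to the missing data.

```latex
\begin{align*}
T &= \bar U + \left(1 + \frac{1}{m}\right) B, \qquad
r = \frac{\left(1 + \frac{1}{m}\right) B}{\bar U}, \qquad
\lambda = \frac{\left(1 + \frac{1}{m}\right) B}{T} \\
\lambda &= \frac{r}{1 + r}
\quad\Longrightarrow\quad
\frac{1}{\lambda} = 1 + \frac{1}{r} \\
\nu_{\mathrm{old}} &= \frac{m - 1}{\lambda^2}
= (m - 1)\left(1 + \frac{1}{r}\right)^2
\end{align*}
```

Under these definitions the two forms agree only when the square applies to the whole bracket, which supports the change suggested above.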

Typesetting and caption issues in Section 3.8

Hi @stefvanbuuren

Thanks very much for the incredibly helpful resource "Flexible Imputation of Missing Data".

I wanted to alert you to some small issues that I found in your book while reading about nonignorability recently:

  • The Table 3.5 caption does not correspond to what is in the table
  • The Table 3.5 column headings are not typeset correctly, making it hard to understand what each column refers to
  • In Section 3.8.4, one of the equation numbers is not typeset correctly

Best regards,
Rheanna

Formula for RMSE p 52

The formula for RMSE on page 52 has a '(' in the wrong place. It should be \sqrt{E((\bar Q - Q)^2)}

Thanks Koenraad D'Hollander for noting.

wrong prediction matrix in time raster imputation

The code at https://github.com/stefvanbuuren/fimdbook/blob/master/R/fimd.R#L3028 is inconsistent with the legend below it (https://github.com/stefvanbuuren/fimdbook/blob/master/Rmd/11-longitudinal.Rmd#L834), which says:

pred[Y, paste("x", 1:9, sep = "")] <- 2

whereas the code indicates

pred[Y, paste("x", 2:9, sep = "")] <- 1

More generally, there are no random effects in the example code (all predictors are set to 1 apart from the class variable), whereas the reader expects random effects. I guess this is a typo that should be corrected?
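
To make the difference concrete, the sketch below builds a small predictor matrix by hand. It assumes the coding used by the two-level imputation methods in mice, where -2 marks the class variable, 1 a fixed effect, and 2 a predictor with both a fixed and a random effect. The variable names id, y1 to y3 and x1 to x9 are hypothetical and only stand in for the variables of the time raster example.

```r
# Hypothetical variable names: a cluster identifier, nine time-raster
# predictors x1..x9, and three target variables.
Y <- c("y1", "y2", "y3")
vars <- c("id", paste("x", 1:9, sep = ""), Y)

# Start from an all-zero predictor matrix (rows = targets, columns = predictors).
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

# Class (cluster) variable for every target.
pred[Y, "id"] <- -2

# Variant matching the legend: all nine predictors get a random effect.
pred_random <- pred
pred_random[Y, paste("x", 1:9, sep = "")] <- 2

# Variant matching the code in fimd.R: x2..x9 enter as fixed effects only.
pred_fixed <- pred
pred_fixed[Y, paste("x", 2:9, sep = "")] <- 1

# Compare the two specifications for one target.
rbind(random = pred_random["y1", ], fixed = pred_fixed["y1", ])
```

In the second variant no entry equals 2, so a two-level method would fit no random effects for these predictors, which is the discrepancy the issue describes.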

Error in simulation code

Hi @stefvanbuuren

I am currently studying biostatistics, and while working on a consulting project I came across your online book "Flexible Imputation of Missing Data". First of all, thanks for making it publicly available and for all the effort you put in; it really helped me a lot. However, when running the simulation example in Section 2.5, I noticed that R returns only NaN for the estimates and the confidence intervals. The cause is an error in the test.impute function: the row names of "tab" are not set, so by default they are just "1" and "2". Accessing the second row of "tab" with tab["x", … ] therefore does not work (at least on R version 4.0). It would probably be best to replace "x" with 2. The issue is in the fimd.R file on line 390.

Best regards,
Felix
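
The indexing problem can be illustrated without running mice. The sketch below uses a hypothetical stand-in for "tab" with made-up numbers; the column names, including a term column that labels each row, are assumptions for illustration and may differ from the actual summary output.

```r
# Hypothetical stand-in for the pooled summary table; all values are made up.
tab <- data.frame(
  term     = c("(Intercept)", "x"),
  estimate = c(0.10, 1.02),
  lower    = c(-0.15, 0.80),
  upper    = c(0.35, 1.24)
)
cols <- c("estimate", "lower", "upper")

# The row names default to "1" and "2", so there is no row called "x".
rownames(tab)

# Indexing by a row name that does not exist yields only missing values.
by_name <- unlist(tab["x", cols], use.names = FALSE)

# Indexing by position, as proposed in the issue, returns the slope row.
by_position <- unlist(tab[2, cols], use.names = FALSE)

# Alternative under the assumption that a term column exists:
# select the row by its label, which does not depend on row order.
by_term <- unlist(tab[tab$term == "x", cols], use.names = FALSE)
```

Selecting by label is less fragile than a hard-coded position if the model formula changes, but it only works when the summary table actually carries such a column.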

The Definition of Connectivity

I find this book very interesting, as in my projects I have had long conversations with my clients about the treatment of missing data; I think that testing the ignorability of the missing data mechanism should be the starting point of the imputation process.

This issue concerns the definition of connectivity found on page 105 of Flexible Imputation of Missing Data, 2nd Ed., and on page 270 of the Handbook of Missing Data.

The Definition states (emphasis added):

"Connected and unconnected. A missing data pattern is said to be connected if any observed data point can be reached from any other observed data point through a sequence of horizontal or vertical moves (like the rook in chess)"

Looking at the "File Matching" and "General" panels of Figure 4.1 (reproduced from FIMD), I cannot see how the last three observed values in the third column can be connected to any other observed values in the data set, unless the method:

  • allows passing through missing data records (in which case the definition seems superfluous, as any observed values in a data set become connectable), or
  • allows column/row swapping (in which case the definition seems incomplete, as it does not mention these allowances), or
  • uses the term "observed data point" to mean an entire column (vector), treated as a point in the multi-dimensional cloud defined by the rows (in which case the definition would not need the rook-like movement, and two such vectors would count as connected if at least one component is connected).
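
The first reading can be made concrete with a small check. The sketch below is one possible interpretation, not necessarily the book's intended definition: it assumes that two observed cells are linked whenever they share a row or a column, regardless of missing cells in between, and it calls a pattern connected when all observed cells fall into a single linked group. The function name is_connected and the two example patterns are hypothetical.

```r
# R is a logical response indicator matrix: TRUE = observed, FALSE = missing.
# Assumption: a rook move may pass over missing cells.
is_connected <- function(R) {
  obs <- which(R, arr.ind = TRUE)
  if (nrow(obs) <= 1) return(TRUE)
  rows_seen <- rep(FALSE, nrow(R))
  cols_seen <- rep(FALSE, ncol(R))
  rows_seen[obs[1, "row"]] <- TRUE   # start from the first observed cell
  repeat {
    # Columns with an observed cell in any reached row.
    new_cols <- cols_seen | (colSums(R[rows_seen, , drop = FALSE]) > 0)
    # Rows with an observed cell in any reached column.
    new_rows <- rows_seen | (rowSums(R[, new_cols, drop = FALSE]) > 0)
    if (identical(new_cols, cols_seen) && identical(new_rows, rows_seen)) break
    cols_seen <- new_cols
    rows_seen <- new_rows
  }
  # Connected if every observed cell lies in a reached row.
  all(rows_seen[obs[, "row"]])
}

# File-matching style pattern: column 1 observed for everyone,
# columns 2 and 3 observed in disjoint groups of rows.
R_common <- rbind(c(TRUE, TRUE, FALSE),
                  c(TRUE, TRUE, FALSE),
                  c(TRUE, FALSE, TRUE),
                  c(TRUE, FALSE, TRUE))

# The same pattern without the jointly observed column.
R_split <- R_common[, 2:3]

is_connected(R_common)
is_connected(R_split)
```

Under this reading the definition is not vacuous: the jointly observed column links the two blocks in the first pattern, while the second pattern splits into two groups that share no row or column.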

Please advise, thank you!

Figure 4.1
