Currently, it seems that if you have NA / NaN (dependent variable) cells, you get an e

Deal with NA values about bambi HOT 11 CLOSED

bambinos commented on May 30, 2024

Deal with NA values

from bambi.

Comments (11)

jake-westfall commented on May 30, 2024

Definitely, I think this should be pretty simple. I'll look at it shortly.

tyarkoni commented on May 30, 2024

Yes, we should definitely add this. I think we want to require the user to explicitly say that they want to drop NAs though (i.e., in fit, we can have something like dropna=False that the user can set to True).

jake-westfall commented on May 30, 2024

Well, if the alternative is failing with an error though, then maybe dropna=True is the better default -- assuming of course that we let the user know in some way that observations were dropped.

palday commented on May 30, 2024

"Dropping x incomplete rows...." would be a good bit of information. One thing to watch out for is that you only drop rows with NAs in columns you care about (a naive dropna in pandas surprised me when trying to filter NAs to avoid this issue), although depending on how you construct the data matrix you're passing to the backend, you may get that for free.
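A minimal sketch of the column-restricted drop described here, using the `subset` argument of pandas' `DataFrame.dropna`. The helper name `drop_incomplete_rows` and the example columns are hypothetical and are not part of bambi.

```python
import numpy as np
import pandas as pd


def drop_incomplete_rows(data, model_columns):
    """Drop rows with NaN in any of `model_columns` only (hypothetical helper)."""
    cleaned = data.dropna(subset=model_columns)
    n_dropped = len(data) - len(cleaned)
    return cleaned, n_dropped


df = pd.DataFrame({
    "y": [1.0, np.nan, 3.0, 4.0],
    "x": [0.5, 0.1, np.nan, 0.2],
    "unused": [np.nan, 1.0, 1.0, np.nan],  # column the model never references
})

# A naive dropna() looks at every column, including the unused one.
naive = df.dropna()

# Restricting the check to the model's columns keeps rows 0 and 3.
cleaned, n_dropped = drop_incomplete_rows(df, ["y", "x"])
print(f"Dropping {n_dropped} incomplete rows....")
```

In this toy frame the naive call removes every row, while the restricted call removes only the two rows with a missing `y` or `x`.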

From: Jake Westfall
Sent: Sep 10, 2016 03:26
To: bambinos/bambi
Cc: Phillip Alday; Author
Subject: Re: [bambinos/bambi] Deal with NA values (#40)


tyarkoni commented on May 30, 2024

I don't see failing with an exception as something we necessarily want to avoid. I think raising a warning is an okay approach, but I'm not sure people always read the warnings (and they can also be easily suppressed), and I think explicit is better than implicit in this case. My preference would be to check for NaNs and raise an exception with an informative error message (e.g., "Missing values were detected in one of the input variables! We recommend removing or replacing all invalid values manually, to ensure proper treatment. However, if you would like to simply drop all rows with at least one missing/invalid value (i.e., list-wise deletion), please set the dropna argument in fit() to True.")
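As a rough illustration of that check, the sketch below scans the model's columns for NaNs and raises with the message proposed above. The function name `check_missing` and its signature are assumptions made for this example; they do not describe bambi's actual implementation.

```python
import numpy as np
import pandas as pd

MISSING_MSG = (
    "Missing values were detected in one of the input variables! "
    "We recommend removing or replacing all invalid values manually, "
    "to ensure proper treatment. However, if you would like to simply "
    "drop all rows with at least one missing/invalid value (i.e., "
    "list-wise deletion), please set the dropna argument in fit() to True."
)


def check_missing(data, columns):
    """Raise ValueError if any row has a NaN in `columns` (hypothetical helper)."""
    n_incomplete = int(data[columns].isna().any(axis=1).sum())
    if n_incomplete:
        raise ValueError(f"{MISSING_MSG} ({n_incomplete} rows affected)")


df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [0.1, 0.2, 0.3]})

try:
    check_missing(df, ["y", "x"])
except ValueError as err:
    print(err)
```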

palday commented on May 30, 2024

Failing with an explanation / warning as the default behavior would also be fine. The current failure mode (LHS and RHS have different lengths) is simply a bit confusing at first when the data are entered as a dataframe and thus with identical lengths (albeit not necessarily with an identical number of non-NA values). This differs from the default R behavior, but that's not necessarily a bad thing, just something to mention in the comparisons to the R packages.

For the case where dropna=True, I would still (maybe "optionally" at the INFO or DEBUG levels in logging.Logger type output) display information about the number of rows dropped.
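One way this could look with the standard `logging` module is sketched below. The logger name, the helper `drop_and_log`, and the message wording are all hypothetical; the point is only that the caller can choose the level (INFO or DEBUG) at which the dropped-row count is reported.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("na_handling_example")  # hypothetical logger name


def drop_and_log(data, columns, level=logging.INFO):
    """Drop rows with NaN in `columns` and log how many were removed."""
    cleaned = data.dropna(subset=columns)
    n_dropped = len(data) - len(cleaned)
    if n_dropped:
        logger.log(
            level,
            "Dropped %d of %d rows with missing values in %s",
            n_dropped, len(data), columns,
        )
    return cleaned


logging.basicConfig(level=logging.INFO)  # make INFO messages visible

df = pd.DataFrame({"y": [1.0, np.nan, 3.0, np.nan]})
cleaned = drop_and_log(df, ["y"])
```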

tyarkoni commented on May 30, 2024

I think it makes sense to do both--i.e., raise an exception by default if NaNs are found, and also raise a warning with the number of dropped rows if dropna=True. Will try to take a pass at this later this week.
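The combined behavior could be sketched roughly as follows: raise by default when NaNs are present, and when `dropna=True` drop the affected rows and emit a warning with the count. The function `handle_missing` is a hypothetical stand-in for whatever `fit()` would do internally, not bambi's actual code.

```python
import warnings

import numpy as np
import pandas as pd


def handle_missing(data, columns, dropna=False):
    """Raise on NaNs by default; with dropna=True, drop rows and warn."""
    incomplete = data[columns].isna().any(axis=1)
    n_incomplete = int(incomplete.sum())
    if n_incomplete == 0:
        return data
    if not dropna:
        raise ValueError(
            f"Missing values detected in {n_incomplete} rows; "
            "set dropna=True to drop them (list-wise deletion)."
        )
    warnings.warn(
        f"Dropping {n_incomplete} rows with missing values.", UserWarning
    )
    return data.loc[~incomplete]


df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [0.1, 0.2, np.nan]})
cleaned = handle_missing(df, ["y", "x"], dropna=True)  # warns, keeps row 0
```

Because warnings can be suppressed, the exception remains the default path and the warning only appears once the user has opted in.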

palday commented on May 30, 2024

Yes, I'm all for the warning regardless of the default.

aflaxman commented on May 30, 2024

Thanks for creating this package. I would like to use nan values in the dependent variable to include predictors for posterior predictive checks, with a pattern like:

import numpy as np, pandas as pd, pymc3 as pm, bambi

df = pd.DataFrame({'p':np.random.normal(size=400)})

# blank out some values, for which I would like to impute
df.loc[np.random.choice(df.index, size=200, replace=False), 'p'] = np.nan

model = bambi.Model(df, dropna=True)
results = model.fit('p ~ 1')

ppc = pm.sample_ppc(model.backend.trace, model=model.backend.model, samples=50)
ppc['p'].shape  # hoping for (50, 400, 1)

Is this something that others would be interested in?

jake-westfall commented on May 30, 2024

Yes, I do think this is something that would be of interest to a lot of people. Another way of implementing this (probably the way I'd prefer) would be to add a .predict() method to either the Model or ModelResults class. I went ahead and created a separate issue (#105) about this idea. Thanks!

aflaxman commented on May 30, 2024

.predict() would work well for me. Thanks!
