Comments (11)
Definitely, I think this should be pretty simple. I'll look at it shortly.
from bambi.
Yes, we should definitely add this. I think we want to require the user to explicitly say that they want to drop NAs though (i.e., in fit, we can have something like dropna=False that the user can set to True).
from bambi.
Well, if the alternative is failing with an error though, then maybe dropna=True is the better default -- assuming of course that we let the user know in some way that observations were dropped.
from bambi.
"Dropping x incomplete rows...." would be a good bit of information. One thing to watch out for is that you only drop rows with NAs in columns you care about (a naive dropna in pandas surprised me when trying to filter NAs to avoid this issue), although depending on how you construct the data matrix you're passing to the backend, you may get that for free.
(Sent by email in reply to Jake Westfall's comment of Sep 10, 2016 on bambinos/bambi issue #40, "Deal with NA values".)
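To illustrate the point about only dropping rows with NAs in the columns you care about, here is a minimal pandas sketch. The data frame and the `model_columns` list are made up for illustration; how bambi would actually determine the relevant columns is not specified in this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "y": [1.0, 2.0, np.nan, 4.0],
    "x": [0.5, np.nan, 1.5, 2.0],
    "notes": [np.nan, "a", "b", np.nan],  # not part of the model
})

# Hypothetical: the variables that actually appear in the model formula.
model_columns = ["y", "x"]

# A naive dropna() removes any row with an NA in *any* column.
naive = df.dropna()

# Restricting the check to the model's columns keeps rows whose only
# missing values are in columns the model never uses.
targeted = df.dropna(subset=model_columns)

n_dropped = len(df) - len(targeted)
print(f"Dropping {n_dropped} incomplete rows....")
```

Here the naive call discards rows only because of the unused `notes` column, while the `subset` version keeps them.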
from bambi.
I don't see failing with an exception as something we necessarily want to avoid. I think raising a warning is an okay approach, but I'm not sure people always read the warnings (and they can also be easily suppressed), and I think explicit is better than implicit in this case. My preference would be to check for NaNs and raise an exception with an informative error message (e.g., "Missing values were detected in one of the input variables! We recommend removing or replacing all invalid values manually, to ensure proper treatment. However, if you would like to simply drop all rows with at least one missing/invalid value (i.e., list-wise deletion), please set the dropna argument in fit() to True.")
from bambi.
Failing with an explanation / warning as the default behavior would also be fine. The current failure mode (LHS and RHS have different lengths) is simply a bit confusing at first when the data are entered as a dataframe, where all columns have identical lengths (albeit not necessarily identical numbers of non-NA values). This differs from the default R behavior, but that's not necessarily a bad thing, just something to mention in the comparisons to the R packages.
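A small pandas sketch of how such a length mismatch can arise. It assumes, purely for illustration, that the two sides of the formula are filtered for missing values independently; the thread does not confirm that this is what happens internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "y": [1.0, np.nan, 3.0, 4.0],
    "x": [0.1, 0.2, np.nan, np.nan],
})

# Inside the data frame both columns have the same length...
same_length = len(df["y"]) == len(df["x"])

# ...but dropping NAs on each side separately leaves different row counts.
lhs = df["y"].dropna()
rhs = df[["x"]].dropna()

print(same_length, len(lhs), len(rhs))
```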
For the case where dropna=True, I would still (maybe "optionally" at the INFO or DEBUG levels in logging.Logger type output) display information about the number of rows dropped.
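One way the row count could be reported through the standard logging module, as a rough sketch. The function name `drop_incomplete_rows` and the logger name are hypothetical and not part of bambi.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("na_handling_example")  # hypothetical logger name


def drop_incomplete_rows(data, columns):
    """Drop rows with NAs in `columns` and log how many were removed."""
    complete = data.dropna(subset=columns)
    n_dropped = len(data) - len(complete)
    # INFO-level message; users can silence it by raising the log level.
    logger.info("Dropped %d incomplete rows out of %d", n_dropped, len(data))
    return complete


logging.basicConfig(level=logging.INFO)
df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [1.0, 2.0, 3.0]})
clean = drop_incomplete_rows(df, ["y", "x"])
```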
from bambi.
I think it makes sense to do both--i.e., raise an exception by default if NaNs are found, and also raise a warning with the number of dropped rows if dropna=True. Will try to take a pass at this later this week.
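A minimal sketch of the behavior proposed here, written as a standalone helper. `prepare_data` is a hypothetical name, and this is not the code that was eventually added to bambi; it only shows the raise-by-default, warn-on-drop logic.

```python
import warnings

import numpy as np
import pandas as pd


def prepare_data(data, columns, dropna=False):
    """Hypothetical helper: fail loudly on NAs unless dropna=True."""
    incomplete = data[columns].isna().any(axis=1)
    n_incomplete = int(incomplete.sum())
    if n_incomplete == 0:
        return data
    if not dropna:
        raise ValueError(
            "Missing values were detected in one of the input variables. "
            "Remove or replace them manually, or set dropna=True to drop "
            "all rows with at least one missing value (list-wise deletion)."
        )
    # Explicit opt-in: drop the rows, but tell the user how many.
    warnings.warn(f"Dropping {n_incomplete} incomplete rows.")
    return data.loc[~incomplete]


df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [1.0, 2.0, np.nan]})

try:
    prepare_data(df, ["y", "x"])
except ValueError as err:
    print(f"Default behaviour raises: {err}")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clean = prepare_data(df, ["y", "x"], dropna=True)
```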
from bambi.
Yes, I'm all for the warning regardless of the default.
from bambi.
Thanks for creating this package. I would like to use nan values in the dependent variable to include predictors for posterior predictive checks, with a pattern like:
```python
import numpy as np, pandas as pd, pymc3 as pm, bambi
df = pd.DataFrame({'p':np.random.normal(size=400)})
# blank out some values, for which I would like to impute
df.loc[np.random.choice(df.index, size=200, replace=False), 'p'] = np.nan
model = bambi.Model(df, dropna=True)
results = model.fit('p ~ 1')
ppc = pm.sample_ppc(model.backend.trace, model=model.backend.model, samples=50)
ppc['p'].shape # hoping for (50, 400, 1)
```
Is this something that others would be interested in?
from bambi.
Yes, I do think this is something that would be of interest to a lot of people. Another way of implementing this (probably the way I'd prefer) would be to add a .predict() method to either the Model or ModelResults class. I went ahead and created a separate issue (#105) about this idea. Thanks!
from bambi.
.predict() would work well for me. Thanks!
from bambi.