
Comments (11)

jake-westfall avatar jake-westfall commented on May 30, 2024

Definitely, I think this should be pretty simple. I'll look at it shortly.

from bambi.

tyarkoni avatar tyarkoni commented on May 30, 2024

Yes, we should definitely add this. I think we want to require the user to explicitly say that they want to drop NAs though (i.e., in fit, we can have something like dropna=False that the user can set to True).


jake-westfall avatar jake-westfall commented on May 30, 2024

Well, if the alternative is failing with an error though, then maybe dropna=True is the better default -- assuming of course that we let the user know in some way that observations were dropped.


palday avatar palday commented on May 30, 2024

"Dropping x incomplete rows...." would be a good bit of information. One thing to watch out for is that you only drop rows with NAs in columns you care about (a naive dropna in pandas surprised me when trying to filter NAs to avoid this issue), although depending on how you construct the data matrix you're passing to the backend, you may get that for free.
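The difference between a naive drop and one restricted to the model's columns can be sketched as follows. This is only an illustration in plain pandas, not bambi code; the data frame and the `model_columns` list are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data: "unused" is a column the model never references.
df = pd.DataFrame({
    "y": [1.0, 2.0, np.nan, 4.0],
    "x": [0.5, np.nan, 1.5, 2.0],
    "unused": [np.nan, "a", "b", "c"],
})

# Hypothetical list of the columns that appear in the model formula.
model_columns = ["y", "x"]

# Naive drop: removes every row with an NA in ANY column.
naive = df.dropna()

# Targeted drop: only rows with an NA in a column the model uses.
targeted = df.dropna(subset=model_columns)

n_dropped = len(df) - len(targeted)
print(f"Dropping {n_dropped} incomplete rows")
```

Here the naive call keeps a single row, while the targeted call keeps two, because the first row is missing only the unused column.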

(Posted by email on Sep 10, 2016, in reply to Jake Westfall's comment above; subject: Re: [bambinos/bambi] Deal with NA values (#40))


tyarkoni avatar tyarkoni commented on May 30, 2024

I don't see failing with an exception as something we necessarily want to avoid. I think raising a warning is an okay approach, but I'm not sure people always read the warnings (and they can also be easily suppressed), and I think explicit is better than implicit in this case. My preference would be to check for NaNs and raise an exception with an informative error message (e.g., "Missing values were detected in one of the input variables! We recommend removing or replacing all invalid values manually, to ensure proper treatment. However, if you would like to simply drop all rows with at least one missing/invalid value (i.e., list-wise deletion), please set the dropna argument in fit() to True.")


palday avatar palday commented on May 30, 2024

Failing with an explanation or warning as the default behavior would also be fine. The current failure mode (LHS and RHS have different lengths) is simply a bit confusing at first when the data are entered as a dataframe, where the columns have identical lengths (though not necessarily identical numbers of non-NA values). This differs from the default R behavior, but that's not necessarily a bad thing, just something to mention in the comparisons to the R packages.

For the case where dropna=True, I would still (maybe "optionally" at the INFO or DEBUG levels in logging.Logger type output) display information about the number of rows dropped.
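Reporting the dropped-row count through the standard `logging` module at the INFO level might look roughly like this. The function name `drop_incomplete_rows` and the logger name are hypothetical, chosen only for this sketch; nothing here reflects how bambi is actually implemented.

```python
import logging

import numpy as np
import pandas as pd

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("na_handling_sketch")


def drop_incomplete_rows(data, columns):
    """Drop rows with NAs in `columns` and log how many were removed."""
    complete = data.dropna(subset=columns)
    n_dropped = len(data) - len(complete)
    if n_dropped:
        # Emitted at INFO level, so it only appears if the user opts in.
        logger.info("Dropping %d incomplete rows out of %d", n_dropped, len(data))
    return complete


# Opt in to INFO-level output for the demonstration.
logging.basicConfig(level=logging.INFO)

df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [0.1, 0.2, np.nan]})
cleaned = drop_incomplete_rows(df, ["y", "x"])
```

Because the message goes through a logger rather than `print`, users who do not configure logging at INFO or below would not see it, which matches the "optional" reporting suggested above.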


tyarkoni avatar tyarkoni commented on May 30, 2024

I think it makes sense to do both--i.e., raise an exception by default if NaNs are found, and also raise a warning with the number of dropped rows if dropna=True. Will try to take a pass at this later this week.
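The combined behavior described here (an exception by default, a warning with the row count when `dropna=True`) could be sketched like this. The helper `handle_missing` and its signature are assumptions made for illustration; this is not the code that was eventually added to bambi, and the error text is abbreviated from the wording proposed earlier in the thread.

```python
import warnings

import numpy as np
import pandas as pd


def handle_missing(data, columns, dropna=False):
    """Hypothetical helper: raise on missing values unless dropna=True."""
    # Flag rows that have an NA in any of the columns the model uses.
    incomplete = data[columns].isna().any(axis=1)
    n_incomplete = int(incomplete.sum())

    if n_incomplete == 0:
        return data

    if not dropna:
        raise ValueError(
            "Missing values were detected in one of the input variables. "
            "Remove or replace them manually, or set dropna=True to drop "
            "all rows with at least one missing value (list-wise deletion)."
        )

    # The user explicitly asked for list-wise deletion: warn with the count.
    warnings.warn(f"Dropping {n_incomplete} incomplete rows.", UserWarning)
    return data.loc[~incomplete]


df = pd.DataFrame({"y": [1.0, np.nan, 3.0], "x": [0.1, 0.2, np.nan]})
cleaned = handle_missing(df, ["y", "x"], dropna=True)  # warns, keeps 1 row
```

Calling `handle_missing(df, ["y", "x"])` without `dropna=True` raises the `ValueError` instead of silently changing the data.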


palday avatar palday commented on May 30, 2024

Yes, I'm all for the warning regardless of the default.


aflaxman avatar aflaxman commented on May 30, 2024

Thanks for creating this package. I would like to use NaN values in the dependent variable so that rows with predictors but no observed outcome are included for posterior predictive checks, with a pattern like:

import numpy as np, pandas as pd, pymc3 as pm, bambi

df = pd.DataFrame({'p':np.random.normal(size=400)})

# blank out some values, for which I would like to impute
df.loc[np.random.choice(df.index, size=200, replace=False), 'p'] = np.nan

model = bambi.Model(df, dropna=True)
results = model.fit('p ~ 1')

ppc = pm.sample_ppc(model.backend.trace, model=model.backend.model, samples=50)
ppc['p'].shape  # hoping for (50, 400, 1)

Is this something that others would be interested in?


jake-westfall avatar jake-westfall commented on May 30, 2024

Yes, I do think this is something that would be of interest to a lot of people. Another way of implementing this (probably the way I'd prefer) would be to add a .predict() method to either the Model or ModelResults class. I went ahead and created a separate issue (#105) about this idea. Thanks!


aflaxman avatar aflaxman commented on May 30, 2024

.predict() would work well for me. Thanks!

