Comments (7)

rpruim commented on July 20, 2024

@dtkaplan, I'm not seeing the behavior you describe:

df_stats(~ age, data = HELPrct, mean, median)
##   mean_age median_age
## 1 35.65342         35
df_stats(~ age, data = HELPrct, mean, median) %>% class()
## [1] "data.frame"
df_stats( age ~ 1, data = HELPrct, mean, median) %>% class()
## [1] "data.frame"

@rpruim, it happens when there is just one stat being calculated.

df_stats(~ age, data = HELPrct, mean) %>% class()
## [1] "numeric"
df_stats(~ age, data = HELPrct, mean, mean) %>% class()
## [1] "data.frame"

from ggformula.

rpruim commented on July 20, 2024

df_stats() does not require that the functions be in the ggformula package. So at the level of functionality, it doesn't matter where they are located. Possible reasons to include them in ggformula:

  • If they are useful for examples or in our vignette
  • If they are likely to be used in conjunction with plotting
  • If they are not specific to other tasks and don't have another natural home.

Of course, to be included in ggformula now, they should be well thought out, tested, and documented. Otherwise, we should wait until the next-next release.

rpruim commented on July 20, 2024

I'm not entirely sure I understand your comment about fargs. Are you worried about combining functions that do and don't take level in the same call to df_stats()?

Note: if the functions take ..., they should handle an unused level argument just fine.

We could check the elements of fargs against the formals of the functions used, but that would cause problems for functions that use ... to pass important arguments on to other functions, so I don't like that idea, at least not as a default.
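To make the trade-off concrete, here is a minimal base R sketch. The three statistic functions and the `accepts_arg()` helper are hypothetical names made up for illustration; none of them is part of ggformula or mosaic, and this is not how df_stats() handles fargs.

```r
# Hypothetical statistics, used only to illustrate argument handling.
stat_with_dots  <- function(x, ...) mean(x)      # absorbs an unused `level` via ...
stat_with_level <- function(x, level = 0.95) {   # actually uses `level`
  quantile(x, probs = c((1 - level) / 2, (1 + level) / 2), names = FALSE)
}
stat_strict <- function(x) median(x)             # neither `...` nor `level`

x <- c(2, 4, 6, 8, 10)

dots_result  <- stat_with_dots(x, level = 0.80)   # works; `level` is ignored
level_result <- stat_with_level(x, level = 0.80)  # works; `level` is used

# Without `...`, the extra argument is an error ("unused argument").
strict_result <- tryCatch(
  stat_strict(x, level = 0.80),
  error = function(e) "unused argument"
)

# Sketch of a formals() check (an assumption about how such a check might look).
# A function with `...` always passes, so the check cannot tell whether `level`
# is really used further down the call chain.
accepts_arg <- function(f, arg) {
  f_names <- names(formals(f))
  arg %in% f_names || "..." %in% f_names
}

accepts_arg(stat_with_dots, "level")   # TRUE, via ...
accepts_arg(stat_with_level, "level")  # TRUE, named formal
accepts_arg(stat_strict, "level")      # FALSE
```

Note that `formals()` returns NULL for primitive functions such as `sum`, so a check along these lines would also need a rule for those.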

dtkaplan commented on July 20, 2024

@rpruim I've written several of the interval statistics.

  • ci.mean()
  • ci.median()
  • ci.sd()
  • coverage()
  • ci.proportion() which goes along with proportion()

For the moment, they are stashed in mosaicModel in the file interval_statistics.R. Use them like this:

mtcars %>% df_stats(hp ~ cyl, ci.mean())
##   cyl     lower     upper
## 1   4  68.57236  96.70037
## 2   6  99.84850 144.72293
## 3   8 179.78111 238.64746

You can give an argument to set the confidence level, e.g. level = 0.80.
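For readers without mosaicModel installed, the following base R sketch shows the general shape of such an interval statistic. `ci_mean_sketch()` is a hypothetical stand-in, not the actual ci.mean() source; it assumes a t-based interval, which does reproduce the 4-cylinder row printed above.

```r
# Hypothetical stand-in for ci.mean(): a function factory that fixes the
# confidence level and returns a statistic yielding a named (lower, upper) pair.
ci_mean_sketch <- function(level = 0.95) {
  function(x) {
    x <- x[!is.na(x)]
    n <- length(x)
    # Assumption: t-based margin of error for the mean
    margin <- qt((1 + level) / 2, df = n - 1) * sd(x) / sqrt(n)
    c(lower = mean(x) - margin, upper = mean(x) + margin)
  }
}

# Apply the statistic to hp within each level of cyl (base R, no df_stats()).
hp_by_cyl <- split(mtcars$hp, mtcars$cyl)
ci95 <- t(sapply(hp_by_cyl, ci_mean_sketch()))
ci80 <- t(sapply(hp_by_cyl, ci_mean_sketch(level = 0.80)))

ci95   # one row per cyl value, columns "lower" and "upper"
ci80   # same layout, narrower intervals
```

Calling the factory with parentheses, as in ci.mean() above, is what lets the level be set once and then reused for every group.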

  • I don't think they belong in mosaicModel. We can decide later whether to move them to ggformula.
  • proportion() and ci.proportion() can take an optional argument nm identifying the level of the categorical variable to use in calculating the proportion. They also take a level argument to give the confidence level --- interesting how the nomenclature conflicts here between a "level" of a variable and a "level" for a confidence interval.
  • The naming of the output isn't very nice. I like this existing behavior:

mtcars %>% df_stats(hp ~ cyl, mean)
##   cyl   mean_hp
## 1   4  82.63636
## 2   6 122.28571
## 3   8 209.21429

This can, if desired, be overridden by using a named argument, e.g. my_preferred_name = mean.
But this doesn't carry over well to statistics that return multiple numbers, even when those numbers are named.

A statement in this form loses the variable name, but does pick up the statistic names:

mtcars %>% df_stats(hp ~ cyl, ci.mean())
##   cyl     lower     upper
## 1   4  68.57236  96.70037
## 2   6  99.84850 144.72293
## 3   8 179.78111 238.64746

But we can't put it back by explicit naming, since the statistic names are not picked up when an explicit name is given:

mtcars %>% df_stats(hp ~ cyl, hp_mean = ci.mean())
##   cyl  hp_mean1  hp_mean2
## 1   4  68.57236  96.70037
## 2   6  99.84850 144.72293
## 3   8 179.78111 238.64746
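One naming rule that would keep both pieces of information is to paste the explicit name onto the statistic's own names. The helper below is a hypothetical sketch of that rule in base R; `name_stat_columns()` is not part of df_stats(), and the thread does not settle on this design.

```r
# Hypothetical helper: combine an explicit prefix with the names a statistic
# returns, falling back to numeric suffixes when the result is unnamed.
name_stat_columns <- function(values, prefix) {
  stat_names <- names(values)
  if (is.null(stat_names)) {
    new_names <- paste0(prefix, seq_along(values))       # e.g. hp_mean1, hp_mean2
  } else {
    new_names <- paste(prefix, stat_names, sep = "_")    # e.g. hp_mean_lower
  }
  setNames(values, new_names)
}

named   <- name_stat_columns(c(lower = 68.57236, upper = 96.70037), "hp_mean")
unnamed <- name_stat_columns(c(68.57236, 96.70037), "hp_mean")

names(named)    # "hp_mean_lower" "hp_mean_upper"
names(unnamed)  # "hp_mean1" "hp_mean2"
```

The unnamed case mirrors the hp_mean1 / hp_mean2 columns shown above, so existing output would change only for statistics that return named values.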

rpruim commented on July 20, 2024

This has gotten rambly and unrelated to the issue originally described.

@dtkaplan, I'm closing this since I was never able to recreate the original problem you thought you had, so I don't see a reason to redesign df_stats() at the moment.

Let's open new issues focussed on particular things, if they exist.

rpruim commented on July 20, 2024

As for your new functions -- I can't figure out what you were thinking when you named that argument nm. It conjures nothing meaningful to me. How about success, which matches what we do elsewhere in mosaic and is the traditional name for this?

Names matter.

rpruim commented on July 20, 2024

See #36 for an issue explicitly about column names.
