
Comments (10)

rudeboybert avatar rudeboybert commented on July 29, 2024 1

Ah yes, we should mark this as 95% "confident", as we use "confident" as short-hand for:

  • this CI was generated from a process that is 95% reliable
  • i.e. if we repeated this procedure 20 times, we expect 19 of the resulting confidence intervals to contain the true parameter

We'll be sure to mark in the next version that we are using "confident" as shorthand for the above two bullet points, and not "confident" in the colloquial sense of the word.
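The repeated-procedure reading in the two bullet points above lends itself to a quick simulation. A minimal sketch (Python for illustration; the book itself uses R, and the population values here are invented):

```python
import random
import statistics

# Simulate repeating the CI procedure many times on samples drawn from a
# population with a KNOWN mean, then count how often the resulting
# interval captures that true mean. Population parameters are made up.
random.seed(42)
true_mean, sigma, n, reps = 5.9, 1.0, 50, 1000

hits = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se  # z-based 95% CI
    hits += lo <= true_mean <= hi                # did this interval capture it?

coverage = hits / reps  # should land close to 0.95
print(coverage)
```

Each individual interval either contains the true mean or it doesn't; the 95% describes the long-run hit rate of the procedure that generated it, which is exactly the shorthand described above.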

from moderndive_book.

rudeboybert avatar rudeboybert commented on July 29, 2024

Hi, thanks for your input. Could you point to a specific location/subsection of the book?

iwi avatar iwi commented on July 29, 2024

Hi, sure
Line 171 of 08-ci.Rmd (on the current master).

we can be 95% confident that the true mean rating of ALL IMDB ratings is between 5.46 and 6.3.

Which is the same or very similar to statement 5 in the article.

iwi avatar iwi commented on July 29, 2024

Very nice resource, thanks!

rudeboybert avatar rudeboybert commented on July 29, 2024

Note added.

JohnJChristie avatar JohnJChristie commented on July 29, 2024

I sent an email to the authors, but they suggested I also write here. I decided to take this opportunity to expand on the email. People run into a lot of trouble with CIs because of one simple mistake: failing to separate understanding from use.

To clarify, I'll take the t-test as a parallel example. In the very simplest terms, the t-test is typically taught as a way to tell when a difference is real. We check whether the p-value from the test is below a cutoff and conclude that the effect is real. That's it. A good textbook will expand on this and give the reader a deeper understanding. Typically readers will be made aware of Type I and Type II error rates. There might be some expansion on the importance of meeting assumptions, and there might even be some clarification of what a t-test means as a statistical model. That's all important stuff. But in the end, when reporting the t-test, the student is taught to conclude either that the effect is significant or that they are uncertain. There is no reporting of any of the calculable probabilities. The .05 cutoff, and the actual p-value, are never re-explained at the moment of reporting or use. All one says is that the test was significant or that it was not.

I'm not complaining about that state of affairs. It's a good thing, and as it should be in frequentist statistics, because the probabilities are about the long-run correctness of decisions, and so decisions need to be made. None of the probabilities associated with a t-test hold if the user of the test doesn't commit to a decision of significant or uncertain.

Unfortunately, when we teach frequentist confidence intervals, we fall into the trap of thinking that the reporting requires explaining, and that somehow the 95% (I'll stick with 95% CIs) is something more meaningful than the p-value, the t-value, or alpha. All of those values are only reported to give a full account of the method used to reach a conclusion. The conclusion is that the true value of the effect (mean, parameter, what have you) is in the range of the CI. Students should be taught to report CIs only one way: "the true effect is between (low value) and (high value) (95% CI)". You never state you're 95% certain about a realized CI, nor do you word things to imply that or worry about it at all. The fact that the range is a 95% CI is a technical detail that needs reporting for the reader, just as a t-value, alpha, and p-value are important technical details to be reported in a hypothesis test.

There are lots of reasons reported elsewhere for why you don't say "I'm 95% confident", whether you attempt to specifically define confidence or put the word in quotes and define it only in the text. But I wish to focus on the primary, and often ignored, one: you only get to be 95% confident if you conclude that every unknown CI contains the true value. The authors appear to recognize that the 95% is generated from a fantasy universe where an unlimited number of identical experiments could have occurred, each generating a new random CI, and that 95% of the time the CI captures the true value. But, as is very common, they don't carry this through and state that this universe requires each and every CI to be claimed to contain the true value in order for the 95% to hold. If you waffle and only claim to be 95% confident about each value, then you no longer have a 95% CI, because you didn't state the value was in the interval each time. Hearkening back to the t-test: we don't say we're XX% certain there's an effect (unless maybe you're using post hoc power?). We don't do this because, in order to get any long-run calculation of certainty, you have to claim an effect each time.

Understanding this dramatically simplifies CI teaching and reporting. You show students things like simulations where most CIs capture the true value, to help them understand how a CI works. Then you tell them that if, a priori, they selected a CI procedure with a high enough probability of containing the true value, they should behave as if the CI does contain the true value. It's a tool to make a decision, just like the hypothesis test is. Could they be wrong? Of course. They could be wrong claiming significance as well, but we don't teach them to focus on the probability of that during reporting, and for the same reasons they cannot focus on the probability when reporting a CI. So please, please, please eliminate the use of the quoted "confident" in the text. Let's not redefine words, but instead just report the "95% CI" part as the technical information that led to the conclusion we make: the interval contains theta.

While I'm at it, I'll point out a simple shorthand for students. When you use a 95% CI to report where the true value is, you get to be right 95% of the time. When you cover hypothesis testing, that turns out to be a miraculously high probability of a correct decision for frequentist testing.

ismayc avatar ismayc commented on July 29, 2024

@JohnJChristie Would you be interested in rewriting this confidence interval chapter to use the phrasing that you would prefer? @rudeboybert and I would be happy to review your contribution to fix the language you speak of if you make a pull request after you have completed the necessary edits.

rudeboybert avatar rudeboybert commented on July 29, 2024

Hi John, thanks for your thoughtful message. I think we can at the very least agree that the issue of the "correct" way to teach the mathematically/probabilistically correct interpretation of confidence intervals is a controversial one.

We however made an editorial decision that for an introductory-level statistics/data science course, there are bigger fish to fry. At a point where there is already great potential to alienate students from the discipline of statistics, we felt the large cost in time/energy associated with the teaching of the mathematically/probabilistically rigorous definition of confidence intervals is simply too high. Especially when other topics like data visualization, data wrangling, and regression are begging for more extensive treatment. That being said, we do not use this prioritization as license to run amok with completely incorrect interpretations like "there is a 95% chance this interval contains the true parameter." We felt that the 95% "confident" language is the right compromise. We're not the only ones to adopt this compromise:

We fully understand that not everyone will agree with our thinking. That is a large part of the reason we followed not only an open-access but also open-source model for publishing ModernDive: it provides users with the minimal tools necessary for complete customizability. If there is something users don't like, they can change it to suit their own ends and needs. The source code uses the bookdown package in RStudio and is available here.

As Chester suggested, if you feel that you have language that sets a better balance throughout the text than we do of staying true to the mathematics/probability while not risking the alienation of students, we're completely open to considering it! We do ask however that potential contributors follow the GitHub pull request model of collaboration, as it greatly simplifies collaboration on our end. Jenny Bryan at UBC has written an excellent online open-source textbook Happy Git and GitHub for the useR to help R users learn GitHub from scratch.

Thanks again for your input in a very controversial debate.
Albert

[screenshot (2017-08-16): slide from another resource using the 95% "confident" language]

JohnJChristie avatar JohnJChristie commented on July 29, 2024

@rudeboybert, your appeal-to-authority argument is unappealing. :) The main reason is that lots of intro texts get lots of stuff wrong. I can't count the times I've read something equivalent to alpha = p-value. OpenIntro doesn't really get it wrong so much as they're vague enough that it's a bit problematic (unless they've edited things since I last looked). I've discussed the issue with them, and I still think they have a problem making a clear distinction between things like your Figure 9.4 and how a CI is reported. Their argument is that every time they talk about 95% confidence they're talking about the procedure, not the interval, which is fine if they make that clear. I'm not sure they do. What they've told me is that clarification is up to the instructor, and I do think they leave it open enough that the instructor can make it clear. The slide you included is just wrong, but it would have been so easy to fix: just change "95% confident" to "conclude". That's all that's necessary! You only get to be 95% confident if you always conclude. It's baffling how many people are married to the idea of including the statistic in the conclusion. It's almost equivalent to saying "we are 95% confident that condition A is significantly different from condition B" after a hypothesis test (I say almost because I'm not sure they're equally incorrect, but they're damn close).

@ismayc I have tried a bit of rewriting. It's relatively small changes I'm suggesting; in fact, what I'm asking for is less reiteration of the 95% confidence. I've made light edits (forked and opened a pull request; I don't really know the GitHub jargon). Please see my revised Chapter 9, which only changes things after Fig. 9.3.

I did also add a bit after the parallel to a significance test. You might want to consider expanding on how much more powerful a CI is in a later edition or a more advanced chapter, with a couple of simple cases. (It's easy to make graphs with two effects, one with a very high p-value and one with a very low one, where you can't really draw any conclusion until you have CIs, and then the conclusions you would draw are very different from one another.)

I also note that your statement "It's worthy of mention here that confidence intervals are always centered at the observed statistic" is false. For example, the 95% CI around a correlation of 0.98 with an N of 20 does NOT have the 0.98 in the centre (unless the CI is just absurd and includes correlations > 1). Just leave all of that out. It's beyond what's necessary for intro.
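The correlation example can be checked numerically. A minimal sketch (Python for illustration; it assumes the standard Fisher z-transformation CI for a correlation, with the usual 1.96 normal critical value):

```python
import math

# 95% CI for an observed correlation r = 0.98 with n = 20, via the
# Fisher z-transformation: build the interval in z-space, where the
# sampling distribution is roughly normal, then back-transform.
r, n = 0.98, 20
z = math.atanh(r)              # Fisher z of the observed correlation
se = 1 / math.sqrt(n - 3)      # approximate standard error in z-space
lo = math.tanh(z - 1.96 * se)  # back-transform the endpoints to r-space
hi = math.tanh(z + 1.96 * se)

midpoint = (lo + hi) / 2
print(lo, hi, midpoint)        # interval near (0.95, 0.99); midpoint below 0.98
```

Because the back-transformation is nonlinear, the interval is asymmetric around the observed r: the midpoint sits below 0.98, confirming that a CI need not be centered at the observed statistic.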

If we do want to add anything to interpretation, we might add something good about the size of CIs, but it depends on how far you want to go. It's important to have a good, understandable scale; once you select one, add another simple pair of figures, each containing two CIs that substantially overlap but are either narrow or very wide. Say you have CIs for the heights of north and south power forwards. One study had a large N and shows two CIs that span maybe 1" total and nearly completely overlap. Another had a much smaller sample, with CIs that span 6" and also nearly completely overlap. With a significance test the conclusion is exactly the same in both cases: nothing, really. But with the CIs from the larger-N study, one can conclude that there's likely no meaningful difference between the two groups, because the range of possible values doesn't really contain differences anyone would care about. (Of course, the variances of the samples have to be pretty similar too.) But I'm pretty sure I digress.

ismayc avatar ismayc commented on July 29, 2024

Thank you for your feedback and for your pull request, John. We will be revamping the later chapters of the textbook over the coming months and will take your recommendations into careful consideration as we do that.
