dataviz's People

Contributors

ax3man, bbolker, clauswilke, jonmcalder, malcolmbarrett, steveputman, tjmahr, trashbirdecology

dataviz's Issues

Typo in Preface

In "Good, bad, and ugly figures" there is a missing "to"
FROM:
Throughout this book, I show many different versions of the same figures, some as examples of how make a good visualization and some as examples of how not to.

TO:
Throughout this book, I show many different versions of the same figures, some as examples of how to make a good visualization and some as examples of how not to.

I.e. from "...how make a good visualization..." to "...how to make a good visualization..."

Typos/Grammar

Preface (P1, L6) - "a obscure" needs to be "an obscure."

Directory of visualizations: Plots of association (scatter, contour, etc.)

First, this looks fantastic! Can't wait to see the final product.

Second, I realize it is still a work in progress so I may be jumping the gun here, but I didn't notice anything in your directory of visualizations about plots of association (not sure that is the best term...). I am thinking of things like scatter plots, contour plots, etc. Essentially anything with one distribution on x, one on y (and I suppose one on z).

Just wondering if you plan on adding these in and again, this looks great!

Rework data:ink ratio chapter

The chapter on data:ink ratio should be reworked to take into account the feedback by @steveharoz. See this Twitter thread: https://twitter.com/sharoz/status/1005868631023112192

Transcribed:

Why perpetuate the myth of the importance of a data-to-ink ratio? It's based entirely on Tufte's opinion books rather than empirical evidence. Debunked many times.

Bateman et al. CHI 2010
Borgo et al. TVCG 2012
Borkin et al. TVCG 2013
Haroz et al. CHI 2015
Skau et al. CGF 2015

Collectively, these articles refute the notion that “ink” or non-minimal graphical elements are predictive of performance:

  1. Bateman et al 2010 and Haroz et al 2015 showed that some embellishments improve performance.
  2. Bateman et al 2010, Haroz et al 2015, and Skau et al 2015 failed to find a measurable performance hit for some embellishments.
  3. Haroz et al 2015 and Skau et al 2015 showed that some embellishments harm performance.

So non-minimal ink can either improve, reduce, or not affect performance. It’s an irrelevant dimension. Of course, not every form of ink (e.g. grids and backgrounds) was tested. That could become a bit of a no true Scotsman issue. But they do show that ink quantity fails to predict much of anything. So why use the term at all? And what evidence is there that it’s worth the effort to minimize contrast of grids or outlines? It's fine to like and advocate the style. But no need for the pseudosciency term.

As for Borkin et al 2013, it showed an improvement in recognizability, which I completely agree is not the same as memorability.

Plan for revisions:

  • Rename the chapter to "Balance the data:ink ratio".

  • I consider the data:ink ratio useful to think about extreme cases: all the way to one end or all the way to the other end figures become ugly. In the middle, though, there is a large range of options that can work well.

  • Cite some of the relevant research literature.

  • Add a version of Figure 18.2 with a frame around the plot panel, as proposed by @hadley.

  • Make it clearer that many of the recommendations in this chapter are design choices that are guided to some extent by personal taste. Different people may make different choices, and that's fine.

Section about visualization of intersecting sets

Currently, the book doesn't have a section dedicated to the representation of multiple intersecting sets. This subject may be within the scope of the book, and its inclusion would be interesting.

I would suggest a discussion about Venn diagrams and UpSet plots.

Discrepancy in the Okabe and Ito palette

In the book, #999999 is listed as a color of the Okabe and Ito (2008) palette. But this color is not listed on their site; they use #000000 instead.

Is there a reason to use #999999 in the book? As a deutan, I find #000000 much easier to see, as it contrasts better with the other colors of the palette.

Thank you for all the work you put into this book!

Use of color and legend order in some Titanic figures (Chapter 5)

Thanks for sharing your work at this early stage. I'm looking forward to getting (and recommending!) the final book.

Two details to improve the overall coherence of figures using the Titanic dataset in chapter 5:

Figure 5.9, which is OK by itself, shows the distribution for females in blue, while all the other figures using this dataset use blue for males.

Also, you could consider rearranging the order of genders in the legend in figure 5.6 to match the ones in 5.7 and 5.8; or, as I assume that the reordering is due to the same reasons explained later in figures 14.5 and 14.6, it would suffice to add a reference to that explanation.

Small typo in chapter 1

Hi Claus,

I discovered your book today via social media and I am very much enjoying it (and learning a bunch of stuff along the way!), thank you.

FYI, in chapter 1 there is a small typo:
"Let’s put things into practice. We can take the dataset shown in Table 1.2, map tempterature onto the y axis,"
Obviously it should be "temperature".

Best regards,
Andrew

Typo in description of image 17.4

Figure 17.4: Density estimates of the sepal lengths of three different iris species. By using solid, colored lines we have solved the probme of Figure 17.3 that...

I figure this should be "problem".

CDF Typo in Chapter 7

Both the chapter and section title, along with one instance in the first paragraph, refer to the ECDF as the "empirical cumulative density function" instead of the "empirical cumulative distribution function". (BTW this book is great)
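To make the distinction concrete, here is a minimal sketch of an empirical cumulative distribution function in Python. It is not taken from the book; the function name `ecdf` and the sample values are hypothetical. The point is that the ECDF returns a cumulative fraction of observations, not a density.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical data, for illustration only
F = ecdf([2.0, 3.5, 3.5, 5.0])
print(F(1.0), F(3.5), F(5.0))  # 0.0 0.75 1.0
```

The result is a step function that rises from 0 to 1, which is why "distribution function" is the appropriate name.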

Revise chapter: Handling overlapping points

This chapter needs some revisions:

  1. The 2d histogram section should link to the 1d histogram chapter (Visualizing Distributions I).

  2. The contour lines section could do with better examples. The blue jay dataset will work better.

  3. The discussion about trend lines should be linked to the yet-to-be-written chapter about visualizing trends.

  4. Add one example of many contour lines in different colors showing different subsets, labeled "bad": When there are too many different subsets, the resulting figure becomes undecipherable.
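For readers unfamiliar with the 2d histogram mentioned in item 1, the following is a minimal sketch of the underlying binning step, assuming rectangular bins of fixed width. The function name `histogram_2d` and the points are hypothetical and are not the book's code or data.

```python
from collections import Counter


def histogram_2d(points, x_width, y_width):
    """Count points per rectangular bin; bins are keyed by (column, row)."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to the bin that contains it
        counts[(int(x // x_width), int(y // y_width))] += 1
    return counts


# Hypothetical, heavily overlapping points
points = [(0.1, 0.2), (0.4, 0.3), (0.45, 0.35), (1.2, 0.1), (1.3, 1.9)]
bins = histogram_2d(points, x_width=0.5, y_width=0.5)
print(bins[(0, 0)])  # 3 points fall into the lower-left bin
```

Mapping each count to a fill color then shows where points pile up, which individual overlapping markers cannot.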

Add subsection about memorable figure.

In the "telling a story" chapter, it might make sense to add a brief section called "Make a memorable figure". Research has shown that embellished figures can be more memorable than plain figures (e.g., Bateman et al. 2010). I just need a good idea for an embellished figure.

9.2 The case for side-by-side bar charts (and maybe line plots?)

This is a great read through, thank you so much for your hard work and great communication!

I was reading through chapter 9, section 9.2 and I totally agree the side-by-side bar chart is the most logical choice in comparison to stacked bar charts and pie charts. I don't know if you cover this later, but I was thinking to myself I would have actually done a line plot where each line is a company, the x axis is the year, and the y axis is the share percent. My only grievance with this kind of plot is that the overlapping lines could obfuscate trends whereas the side-by-side bar chart doesn't have that problem. However, it is a bit tougher to see yearly trends of the companies just by tracing the height of a bar across each group (but that's minor). I was hoping to hear what you think of the use of a line plot in this case. Thanks!

Tiny typo

From "where as" to "whereas" in "3 Color scales":

Both states are in the South, they are immediate neighbors, and yet one state (Texas) was the fifth-fastest growing state within the U.S. where as the other was the third slowest growing from 2000 to 2010.

Typo s5.2: explicity

The final sentence of section 5.2 should end "explicit y axis" not "explicity axis".

Types of bad

(continuing a discussion from Twitter)

While I like that visualizations are labeled as bad or ugly, it'd be informative to make those designations more consistent and clear.

Here are possible categories:

  1. Wrong - The wrong information is shown on the screen (e.g., log scaled axis where the label also says that it's log - making it double log)
  2. Deceiving - The information may be misperceived unless you pay careful attention (e.g., small multiples with different y-axes)
  3. Imprecise - Not necessarily the wrong information but may not be good for reading/comparing individual values (e.g. pie charts with many slices or stacked bar charts)
  4. Not optimal - Some tasks may be difficult (e.g., difficult to find stuff with out of order bars)
  5. Ugly - Claus doesn't like it (e.g., angled x-axis text)

You'll probably want to combine some of those categories for simplicity.

What's tough is that a lot of these depend on which information a person wants. Stacked bars are imprecise for individual comparison, but do well for comparing the total size of the stack to another stack.

Write chapter: Visualizing trends

This chapter will talk about linear and non-linear fits, moving averages, and detrending. Will also talk about common pitfalls, such as that many smoothers are unreliable or misleading at the edges of the data range.
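As an illustration of the edge problem mentioned above, here is a minimal sketch of a centered moving average in Python. It is an assumption about how one might handle the edges, not the chapter's method: positions where the window does not fit are returned as `None` rather than filled with a less reliable estimate. The function name and the series are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; None where the window is incomplete."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Near the edges the window would extend past the data
            smoothed.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical series, for illustration only
series = [1.0, 2.0, 6.0, 4.0, 5.0]
print(moving_average(series, 3))  # [None, 3.0, 4.0, 5.0, None]
```

Smoothers that do return values at the edges have to extrapolate or shrink the window there, which is one reason their output can be misleading near the ends of the data range.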

Feature request: direct links to code

Hi Claus,

It would be a really good addition, I think, to see if either it's possible to make figures into links in Rmarkdown, or similarly have footnotes or captions throughout the book, directly linking each fig to its source code. That way, people reading the online version can immediately jump to the code they need to reproduce.

-Stephanie

LaTeX formula

The little formula for log scales in "2.2 Nonlinear Axes" should be (use curly brackets):

$10^{0.5} = \sqrt{10} \approx 3.16$

in order to render correctly.
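The value in the formula can be checked quickly; a small Python sketch, using only the standard library:

```python
import math

# Halfway between 1 and 10 on a log10 axis is 10 raised to the power 0.5
midpoint = 10 ** 0.5
assert math.isclose(midpoint, math.sqrt(10))
print(round(midpoint, 2))  # 3.16
```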

Principles of figure design - choosing a font

First, piggybacking off of what Jeff said, I'm so very excited for this book. It looks awesome so far!

Realizing it's a work in progress, I was wondering if you had considered explaining how to change/control font on graphs in R/ggplot2. I use the 'extrafont' package to do this, but I would surely be willing to change my approach. This may be outside the scope of the book and apologies if I've missed it in your plan, but figured I'd mention it!

Typo @ 21.2p2

A general, the program mangers told me, should be able to look at each figure and immediately see how what we were doing was improving upon or exceeding prior capabilities.

This should probably be managers or is it intentional...?

The need to see thought process

What an awesome book Claus!

I just want to take this opportunity to suggest a few things that I believe might add to the book.

See, with subjects like these, I think two things are of great value:

1. Thought process explanation

I think that in the end, you could add a few visualizations where you explain the thought process into why that is a good choice (subjectively, of course) and what would be the wrong and alternative ways you could have constructed the visualization at hand.

A book that I think really nails this aspect (though not for data vis, but for statistical modelling) is Regression Modeling Strategies, in which the author explains quite nicely the decisions made when modelling phenomena, what some right choices could have been, and what some wrong ones would be. Even though data viz can be somewhat more subjective, I think you could emphasize that aspect and still provide value for people with this idea.

2. Some advice on reporting itself

I think that not enough time is spent on how reports should be developed around data visualizations. Some principles and examples could help orchestrate a set of many visualizations. I'm thinking of how to coordinate fonts and colors, when to deviate from a chosen color scheme, how to mix different visualizations of the same type of data (i.e. many visualizations of proportions) in a report, and how to title and annotate visualizations coherently throughout a report.

So those are my two cents 😄

Congrats on the great book!

Data Viz Human Research and Typography

Hi,

I have finished reading the first two sections of Fundamentals of Data Visualization online and I am really enjoying it. At the moment, I'm using it to create an interactive data visualization to embed in digital scientific papers (https://datavis-demo.herokuapp.com/).

In the final version, I'd love to read about how data visualization research with human subjects supports the arguments you make in the text (as well as more general data viz human research). Experiments supporting certain practices and principles would be great to read about.

I have also been wondering about the role of typography in creating readable data visualizations. Are there any research-supported guidelines? Finally, I found one typo/awkward sentence in chapter 10:

"The archetypal such visualization is the pie chart"

The word "such" is not needed.

Write chapter: Visualizing uncertainty

This chapter will discuss various approaches to visualizing uncertainty, such as error bars, confidence bands, credible intervals, posterior distributions, hypothetical outcomes, etc.
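As a small illustration of what an error bar typically encodes, here is a sketch that computes a sample mean with an approximate 95% interval. It assumes a normal approximation (multiplier 1.96), which is only one of several conventions and is rough for small samples. The function name and the measurements are hypothetical.

```python
import math
import statistics


def mean_with_interval(sample, z=1.96):
    """Return (mean, lower, upper) using mean +/- z * standard error."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical measurements, for illustration only
mean, lower, upper = mean_with_interval([4.0, 5.0, 6.0, 5.0])
print(f"mean = {mean:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

Whether a figure shows the standard deviation, the standard error, or a confidence interval changes the length of the bar considerably, so the caption should state which one is drawn.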

Things to be completed for final draft

New chapters

  • #52 Visualizing trends
  • #53 Visualizing uncertainty
  • #54 Visualizing geospatial data

Substantially revised chapters

  • #63 Finalize directory of visualizations
  • #55 Revise "Handling overlapping points"
  • #48 Add section on memorable figures

Minor issues

  • #59 Swap first and second figure in figure captions chapter.
  • #56 Redraw figures in image format chapter
  • #61 Revisit Tufte-style bar graphs
  • #62 Replace volcano image
  • #64 Fix sina plots
  • #65 Fix margins in ridgeline plots
  • #66 Attribute data sources in all figure captions
  • #74 Add a beeswarm plot?

include the figure cited from The Economist 2011

In section 1 of chapter 19, "Figure titles and captions", the dataviz "Corrosive corruption" from The Economist is cited and a point is made for deviations from its design in figure 19.1.
IMHO not having the original available is a problem because it does not allow for a quick and easy comparison with the proposed changes.
It may be worth asking The Economist for permission to include the figure in the book... (the worst you can get is a no ;-)

Write chapter: Visualizing geospatial data

This chapter will provide a basic intro to making maps. Topics to be addressed are projections and choropleths. In particular, will discuss how choropleths can be misleading when different geographic regions have different sizes, and how to work around this issue.

Enable HTTPS?

Hi Claus, I wonder if it is too much trouble for you to enable HTTPS for your website https://serialmentor.com/dataviz/ (you may consider Netlify if you have not used it). I'm asking because I wish to list this book on the homepage of bookdown.org. Thank you!

Update iris figures in line drawings chapter

The figures using the iris dataset in the chapter on line drawings should be updated to look like the figures in the chapter on redundant coding. This means species names should be spelled out fully ("Iris setosa" instead of "setosa") and put in italics.

typos

Since you asked

Section 1: First sentence parallel structure, 'convert' -> 'converting'

Section 1.2: 'tempterature'

Section 6.2: 'useles'

Section 6.3: 'acutal'

Section 7.1: 'wisker' -> 'whisker'

Section 14.2: 'lifes'

Basic PCA examples

Add a low-dimensional PCA example, maybe using the blue jays dataset. Plot head-length vs. body mass and then draw PC1 & 2 into that plot, and then plot PC2 vs PC1.
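The computation behind such an example can be sketched in a few lines of Python for the two-variable case. This is a minimal illustration using the closed-form eigendecomposition of a 2x2 covariance matrix, not the book's code; the function name `pca_2d` and the measurements are made up and are not the blue jays data.

```python
import math


def pca_2d(xs, ys):
    """PCA for two variables: returns (variances, (pc1, pc2), scores)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form
    center = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    variances = (center + spread, center - spread)

    # Angle of the major axis gives PC1; PC2 is perpendicular to it
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Project the centered data onto PC1 and PC2
    scores = [
        ((x - mx) * pc1[0] + (y - my) * pc1[1],
         (x - mx) * pc2[0] + (y - my) * pc2[1])
        for x, y in zip(xs, ys)
    ]
    return variances, (pc1, pc2), scores


# Made-up values, for illustration only (not the blue jays dataset)
head_length_mm = [54.0, 55.5, 56.0, 57.5, 58.0]
body_mass_g = [68.0, 70.5, 72.0, 75.0, 76.5]
variances, (pc1, pc2), scores = pca_2d(head_length_mm, body_mass_g)
print("PC1 direction:", pc1)
print("PC2 direction:", pc2)
```

The two direction vectors are what would be drawn as arrows into the scatter plot, and `scores` holds the coordinates for the PC2 vs. PC1 plot. In practice, variables measured in different units are often standardized first; this sketch skips that step.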
