study-tableau's Introduction

Visualization & Storytelling (Tableau)

Communicating information about data visually

Why are data visualizations more useful for delivering insight than summary statistics alone? How do visual encodings influence the way we interpret information and, in turn, how well we can draw insights from the data? Which plot should we build in a given situation? How should we encode our visuals to best communicate our findings? How should we design the components?

[two components]

  • artistic: to be eye-catching (engaging).
  • scientific: to deliver the right insights (informative).

Summary statistics like the mean and standard deviation can be great for quickly understanding aspects of a dataset, but they can also be misleading.

%Anscombe's Quartet: The 4 datasets of X,Y pairs in Anscombe's Quartet all share nearly the same summary statistics, but their data points look very different from one another. Relying only on summary statistics can be misleading!
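As a minimal sketch of the point about Anscombe's Quartet, the snippet below computes a few summary statistics for the four datasets using only the Python standard library. The values are the published quartet; the helper name `pearson_r` and the rounding precision are choices made here for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): four small x/y datasets.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation: strength and direction of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# (mean of x, mean of y, sample variance of y, correlation), rounded for comparison.
summary = {
    name: (mean(xs), round(mean(ys), 2), round(variance(ys), 1), round(pearson_r(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, stats)
```

At this rounding the four rows print the same tuple, even though a scatter plot of each dataset looks completely different (a line with noise, a curve, a line with one outlier, and a vertical stack with one outlier).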

[What to care about?]

  • Data types
    • Structured (tabulated) / Unstructured (none)
    • Quantitative (numerical) / Qualitative (categorical)
    • Discrete (counts, only certain values are valid) / Continuous (measurements)
    • 4 levels of measurement
      • Nominal (category), Ordinal (category + order), Interval (no true zero), Ratio (true zero)
  • Number of columns
  • Quantitative over time (temporal): Line Chart
  • A couple of quantitative columns, correlation coefficient ("strength + direction" of the linear relationship): Scatter Plot

  • Univariate: Histogram / Box and Whisker Plot / Stem and Leaf Plot / Normal Quantile Plot

  • Multivariate: Bar Chart (categorical bins) / Pareto Chart (ordered bins: which category causes the most problems?) / Pie Chart

  • Categorical Intensive:
    • Categorical vs Categorical vs Quantitative: Side-by-Side Bar Chart, Multi-Line Charts, Stacked Bar Chart

  • Heat map: A table whose cells are colored according to their values or another chosen set of parameters.

  • Scatter Example in R: A scatter plot is the perfect tool for comparing continuous variables because clusters will emerge that can indicate possible correlation.
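install.packages('ggplot2')      # only needed once
library(ggplot2)

df <- read.csv(..............)   # file path elided in the original
df2 <- head(df, 30)              # keep the first 30 rows
# col_x, col_y, sssss and col_1 are placeholders for column names in df2;
# as.factor() takes a single column, so only one is passed to color
ggplot(df2, aes(x = col_x, y = col_y, group = sssss)) +
  geom_point(aes(shape = sssss, color = as.factor(col_1)))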

[Case Study]

[Tableau Guideline]

When do we use a Data Extract? (http://drawingwithnumbers.artisart.org/tde-or-live-when-to-use-tableau-data-extracts/)

How do we use it? (http://onlinehelp.tableau.com/current/pro/desktop/en-us/extracting_data.html)

  • I. Connecting to Data
  • II. Combining Data (connect data from multiple sources)
  • III. Worksheets & Dashboards
  • IV. Aggregations
  • V. Hierarchies (look at the data at a year, month, day, hour, or another level)
  • VI. Marks & Filters (controls the colors, shapes and other attributes of our data)
  • VII. Show Me (controls what our ending visual looks like)
  • VIII. Small Multiples & Dual Axis (to visualize data that needs to share an axis for comparison purposes)
  • IX. Groups & Sets (categorize our data within a visualization)
  • X. Table Calculations (perform comparisons of our data over time or between groups)
  • XI. Data Preparation

I. Connecting to Data

In the left sidebar you'll see the data sources you can connect to. For file sources, you can connect to an Excel file, a text file such as a CSV, or statistical files such as from SAS, SPSS, and R. If Tableau detects sub-tables, unique formatting, or some extraneous information, the Data Interpreter option becomes available.

II. Combining Data

Dragging multiple sheets (tables) into the top panel gives one of two outcomes, a union or a join, depending on where we drop them.

Union: Drag the second sheet below the first sheet and we get a union (this works when the sheets have columns in common, so that the columns match up). Unions stack the data on top of each other: the second sheet is appended to the end of the first sheet.

Join: Drag the second sheet beside the first and we get a join (this works when both sheets have columns that share common values). Joins combine data from the sheets based on those common values. We can click on the join symbol to change the type of join being performed: Inner/Left/Right/Outer.
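To make the difference concrete, here is a minimal Python sketch of a union and of an inner and a left join on small lists of records. It only illustrates the row logic, not how Tableau implements it; the tables, the column names and the `join` helper are hypothetical.

```python
# Two hypothetical "sheets" with the same columns: a union stacks them.
jan = [{"order_id": 1, "customer": "A"}, {"order_id": 2, "customer": "B"}]
feb = [{"order_id": 3, "customer": "A"}]
union = jan + feb   # the second sheet is appended below the first

# A second hypothetical table sharing the "customer" column: a join matches rows.
customers = [{"customer": "A", "region": "East"}, {"customer": "C", "region": "West"}]

def join(left_rows, right_rows, key, how="inner"):
    """Inner or left join of two lists of dicts on one shared column."""
    rows = []
    for lrow in left_rows:
        matches = [rrow for rrow in right_rows if rrow[key] == lrow[key]]
        if matches:
            rows.extend({**lrow, **rrow} for rrow in matches)
        elif how == "left":
            rows.append(dict(lrow))   # unmatched left row is kept, right columns stay missing
    return rows

inner = join(union, customers, "customer", how="inner")
left = join(union, customers, "customer", how="left")
print(len(union), len(inner), len(left))
```

The inner join drops the order of customer B, which has no match, while the left join keeps it without a region. If a key matched several rows on the right, the left row would be repeated once per match, which is one reason joined columns can show duplicate values.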

  • Why do some columns have duplicate values?

  • When do we join on multiple columns?

  • Join vs Blend: When we join tables with different levels of granularity, rows of the coarser table are repeated for every matching row of the finer table, so information is distorted ('str' values are duplicated...numeric values are over-counted when summed up)...A Blend is the solution for this.

Blend: A Blend is a smart join. In general, we prepare datasets (join, union, etc.) before we bring them into our workbooks, but a blend allows us to combine them on the fly inside the workbook. It queries each data source at its own granularity, aggregates the results to the level of the linking fields, and only then combines them. We use a blend when the data sources have different levels of granularity.

  • Open each sheet independently as two different connections (using 'Show Start Page' in the top left corner).
  • In the worksheet, an orange chain icon appears on the linking field from the second data source.
  • As a follow-up, multiple Marks cards appear.
  • Note: A blend always behaves like a left join, so the data source that is brought into the view first becomes the primary!

  • Blend and Dual Axis

  • Blend and Calculated fields
  • How is creating calculated fields in a Blend different from creating normal calculated fields? The data elements used have to be aggregated...
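The idea behind a blend can be sketched in Python as "aggregate the secondary source to the linking field first, then attach it to the primary rows". This is an illustration under assumptions, not Tableau's implementation; the two sources and their fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical primary source: one row per region (coarse granularity).
targets = [{"region": "East", "target": 100}, {"region": "West", "target": 80}]
# Hypothetical secondary source: one row per order (fine granularity).
orders = [
    {"region": "East", "sales": 40},
    {"region": "East", "sales": 70},
    {"region": "North", "sales": 30},   # no matching region in the primary source
]

# A row-level join repeats each target once per matching order,
# so summing the joined rows over-counts the East target (100 + 100).
joined = [{**t, **o} for t in targets for o in orders if o["region"] == t["region"]]
joined_target_total = sum(row["target"] for row in joined)

# Blend-like approach: aggregate the secondary source to the linking field first...
sales_by_region = defaultdict(int)
for o in orders:
    sales_by_region[o["region"]] += o["sales"]

# ...then attach the aggregates to the primary rows (left-join behaviour:
# every primary row is kept, unmatched secondary rows are dropped).
blended = [{**t, "sales": sales_by_region.get(t["region"])} for t in targets]
print(joined_target_total, blended)
```

Because the secondary values only exist as aggregates by the time they reach the primary rows, any calculation that mixes the two sources has to work on aggregated values.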

III. Worksheets & Dashboards

There are three main products that we can create using Tableau: 1)Worksheets, 2)Dashboards, 3)Stories. Worksheets are the core of creating dashboards and stories. On the left, fields are split between "dimensions" and "measures". Categorical, qualitative, and time data are listed as "dimensions". Quantitative numerical data is listed as "measure". Tableau automatically aggregates measures, but not dimensions. Dimensions are used to group the data and set the level of granularity.

Plotting: Select 'Sheet 1' in the lower left corner to start a visualization. Select the data you want to plot by dragging fields to the Columns or Rows shelves. In general, drag dimensions to the Columns shelf and measures to the Rows shelf. Tableau then aggregates the measure for each value of the dimension, summing the values by default.

  • map

Q. What if we want to look at the same time period (year) across all worksheets? What if we need consistency? (Apply the filter across many different worksheets.)

  • Scatter Plot

Dashboards - Action

  • The interactive dashboard: connect different worksheets and let them communicate with each other through 2 actions, 'filtering' (isolating the selected data) or 'highlighting' (making it stand out).

  • Interactivity (Use as filter)

Tips

Q. Exporting a worksheet: Worksheet > Export > Image

Q. Working with 'Extract Data'

  • Create an 'extract' for Tableau to work from (copy and save some of the data from the original source).
  • When we work with a big, dynamic dataset that is constantly being updated, we want to build the visual from a static file, much like version control. An extract is like capturing and saving a snapshot.

IV. Aggregations

[1) Altering aggregation] Aggregation (for numerical fields) and granularity (for categorical fields) govern how Tableau operates. Let's get straight into it!

Time Series: We want to see how a 'certain rate' changes as the 'period' changes.

  • 'period': Discrete or Continuous?
  • 'Year/Quarter/Month/Day': changes the granularity (number of ticks) of the timeline.

A. When we create a plot, selecting 'dimension'(categorical) and 'measure'(numerical), Tableau is automatically aggregating them over all records in the data set.

  • We can switch off 'Aggregate Measures' from the menu bar (Analysis > Aggregate Measures)...it then draws one mark per record, like a scatter.

B. Next, we can introduce a 'dimension' that changes the level of granularity and thus affects the aggregation, by dragging it onto the Marks card (color/size/label/detail/tooltip/shape). Of course, we can introduce more dimensions to increase the granularity further.

C. To change the aggregation method, we can select 'measure' then Sum, Average, Median, and others. To get more points, we need to increase the granularity. That means we need to aggregate not over all the records, but over 'dimensions'(categories). By increasing the granularity, Tableau is aggregating the data over each dimension.
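The interplay of aggregation and granularity in A to C can be sketched in a few lines of Python. This is only an illustration of the grouping logic; the records, the field names and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: two dimensions (region, category) and one measure (sales).
records = [
    {"region": "East", "category": "Furniture", "sales": 200},
    {"region": "East", "category": "Office", "sales": 50},
    {"region": "West", "category": "Furniture", "sales": 120},
    {"region": "West", "category": "Office", "sales": 30},
    {"region": "West", "category": "Office", "sales": 60},
]

def aggregate(rows, dims, measure, func=sum):
    """Group rows by the given dimensions and aggregate the measure in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in dims)].append(row[measure])
    return {key: func(values) for key, values in groups.items()}

total = aggregate(records, [], "sales")                        # no dimension: one mark
by_region = aggregate(records, ["region"], "sales")            # one mark per region
by_both = aggregate(records, ["region", "category"], "sales")  # finer granularity
west_office_avg = aggregate(records, ["region", "category"], "sales", mean)[("West", "Office")]
print(total, by_region, by_both, west_office_avg)
```

With no dimension there is a single aggregated value; each added dimension multiplies the number of groups, and swapping `sum` for `mean` changes the aggregation method without changing the granularity.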

  • We get sums of 'measure' aggregated over groups in selected categories. The level of granularity is set by the total number of groups we have in our dimensions. We should be aware of how many groups the selected category field has.

  • Aggregations assist in our ability to draw insights, but other times having a point for every row might be better.

  • Highlighting

    • 'Highlighting' examines one of the levels in a category more closely. Just click it in the legend.
    • 'Detail' on the Marks card is a dark templar: it acts without being seen. A 'dimension' on 'Detail' affects the granularity but is not represented anywhere in the chart. So if we want to highlight that 'dimension', we drag it onto 'Shape' and then click the legend.
  • Area Chart

Q. Adding filters: We built a chart, but a 'dimension' that needs to be considered is missing from it. We put this 'dimension' on the Filters shelf, then customize the filter to isolate the chart for a particular level of the 'dimension'.

Q. Adding a new custom variable (a calculated column): Let's say we have 'Units' and 'Unit_Price' as measures, but we need 'total_dollar_value'. What do we do?

  • We need to multiply 'Units' by 'Unit_Price'.
    • Right-click in the measures area, select Create Calculated Field, then enter [Units] * [Unit_Price].

% Seemingly, Richard is not the best seller overall. Now we know who deserves the bonus.
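As a rough sketch of why the calculated field changes the ranking, the Python snippet below computes 'total_dollar_value' per record and then sums it per seller. The sellers `rep_a` and `rep_b` and all numbers are hypothetical, not taken from the original dataset.

```python
from collections import defaultdict

# Hypothetical sales records (names and numbers are made up for illustration).
rows = [
    {"rep": "rep_a", "Units": 100, "Unit_Price": 2.0},
    {"rep": "rep_a", "Units": 50, "Unit_Price": 3.0},
    {"rep": "rep_b", "Units": 20, "Unit_Price": 50.0},
]

units_by_rep = defaultdict(int)
value_by_rep = defaultdict(float)
for row in rows:
    # Row-level calculation, like a calculated field [Units] * [Unit_Price].
    total_dollar_value = row["Units"] * row["Unit_Price"]
    units_by_rep[row["rep"]] += row["Units"]
    value_by_rep[row["rep"]] += total_dollar_value   # then aggregated with a sum

top_by_units = max(units_by_rep, key=units_by_rep.get)
top_by_value = max(value_by_rep, key=value_by_rep.get)
print(top_by_units, top_by_value)
```

In this made-up data the seller with the most units is not the seller with the highest dollar value, which is the kind of reversal the note above describes.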

Q. Adding Color: Discrete / Continuous

  • Discrete Coloring

  • Continuous Coloring

Q. Adding Labels and Formatting

  • Adding Labels: We can type our own text, like "Sales: <SUM(total_dollar_value)>".

% When some text does not appear because of its size: right-click the mark, then choose Mark Label > Always Show.

  • Formatting Labels

  • Formatting Axis

[2) Calculated Fields] Add 'calculated fields' to a visualization on the fly

There will be times when we want to look at something but there isn't a specific field for it. For instance, maybe we want to know the 'profit' per item for each 'order' record. It seems pretty simple: just divide 'profit' by 'quantity' for each record, then aggregate it. 'Calculated fields' let us create new fields to use in our visualizations. To create a calculated field, open the menu on a field (such as 'Profit'), then Create > Calculated Field.... Fields in the editor show up in brackets, like [Profit]. We can do simple arithmetic here, like adding a constant or multiplying the field. We can also use functions such as absolute value, sine, square root, etc. Here we want to create a new field that calculates the profit per item for each record: [Profit]/[Quantity]. We also rename the calculated field to "Profit per item".
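One detail worth seeing in numbers is that a record-level calculation followed by an aggregation is not the same as a calculation on aggregated values. The Python sketch below uses two hypothetical order records to show the difference.

```python
from statistics import mean

# Hypothetical order records with the two fields used in the text.
orders = [
    {"Profit": 10.0, "Quantity": 1},
    {"Profit": 30.0, "Quantity": 10},
]

# Row-level calculated field: [Profit] / [Quantity], evaluated per record...
profit_per_item = [o["Profit"] / o["Quantity"] for o in orders]
# ...and then aggregated, here with an average.
avg_of_ratios = mean(profit_per_item)

# Aggregating first and dividing afterwards answers a different question.
ratio_of_sums = sum(o["Profit"] for o in orders) / sum(o["Quantity"] for o in orders)
print(profit_per_item, avg_of_ratios, round(ratio_of_sums, 2))
```

The first number is the typical profit per item of an order; the second is the profit per item across all items sold. Which one is wanted depends on the question being asked.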

[3) Parameters]

Q. How to control bin size ?

Q. Bin-control with 'Parameter'
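The effect of a bin-size parameter can be sketched with plain arithmetic: each value is assigned to the lower edge of its bin, and changing the parameter changes how many bars the histogram has. The function `bin_counts` and the sample values are hypothetical, and this is not a description of Tableau's internal binning.

```python
from collections import Counter

def bin_counts(values, bin_size):
    """Count values per bin; each bin is labelled by its lower edge."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return Counter((v // bin_size) * bin_size for v in values)

ages = [21, 24, 25, 31, 38, 39, 47]   # hypothetical measure values
narrow = bin_counts(ages, bin_size=5)
wide = bin_counts(ages, bin_size=10)
print(sorted(narrow.items()), sorted(wide.items()))
```

With a bin size of 5 the values fall into five bins; with a bin size of 10 they collapse into three, so the same data can look smooth or jagged depending on the parameter.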

V. Hierarchies

When you drag a "Datetime" variable to Columns, a little plus symbol appears on the "Datetime" field pill. Tableau automatically builds a hierarchy of time periods from date and time fields. As we drill down, we get finer divisions: from year, to quarter, to month, then day. Clicking the minus sign on a pill goes back up the hierarchy.

We can also make one column absorb another as a sub-category. Select both the category (mom) and the sub-category (child), then, in the drop-down for the category (mom), create a hierarchy.
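Drilling down a date hierarchy amounts to adding one more part of the date to the grouping key at each level. A minimal Python sketch, with hypothetical records and a hypothetical `drill` helper:

```python
from collections import defaultdict
from datetime import date

# Hypothetical (order date, sales) records.
sales = [
    (date(2023, 1, 15), 100),
    (date(2023, 2, 3), 50),
    (date(2023, 5, 20), 70),
    (date(2024, 1, 9), 30),
]

# Each level of the hierarchy adds one more part of the date to the grouping key.
LEVELS = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, (d.month - 1) // 3 + 1, d.month),
}

def drill(records, level):
    """Sum sales at the requested level of the year > quarter > month hierarchy."""
    totals = defaultdict(int)
    for day, amount in records:
        totals[LEVELS[level](day)] += amount
    return dict(totals)

print(drill(sales, "year"))
print(drill(sales, "quarter"))
print(drill(sales, "month"))
```

Each step down produces more, smaller groups whose totals still add up to the level above.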

VI. Marks & Filters

Mark card: Often, we’ll want to include more dimensions in our graph. We can add dimensions to the plot (increasing granularity) by dragging dimensions or measures to the Marks shelf. It has options such as color, size, and shape.

  • Color: Most often we'll be encoding data with color. We simply drag the field to "Color" in the Marks card.

  • Size: Dragging a field, either discrete or continuous, to "Size" will encode the data in the size of the markers. We'll most often use this encoding in a scatter plot. Below is a scatter plot with average quantity vs average profit for each country. I encoded the average discount using the size of the markers, and the plot suggests that heavy discounts go together with the countries that have negative profits.

  • Shape: As with color and size, we can use the shape of the markers to encode data. We'll want to use discrete data only for this. Also, if we have too many categories the shapes are too difficult to identify.

  • Detail: The "Detail" card allows us to bring in a field without any visual encoding. This enables us to increase granularity without adding any graphical effect. The "Label" card adds in labels for all of our markers.

  • Tooltip: Add additional context or information to your view without taking up any precious real estate on the worksheet. This is super important when the worksheet is on the dashboard. You can populate the tooltip with as much information as needed including dynamic fields. These fields will update when the user clicks on or hovers the mouse over a mark in the view. As a result, tooltips are very useful in building interactivity and reinforcing your story for your views.

Filters shelf: Dragging a field here filters the view, so we see only the data we are interested in.

VII. Show Me

The Show Me feature is a quick way to start with a basic graph which we can add to afterwards. We can find it in the top-right of the sheet. Select the fields we want (several at once), then click a chart type in Show Me. From there you can customize the graph. Show Me is usually a good start once you decide what you want to look at or show. Here, filters would definitely help us narrow the view down to one question at a time.

VIII. Small Multiples & Dual Axis

Small Multiple: (split) Simply dragging multiple dimensions to the Columns and Rows shelves creates a small multiple. We saw this before when learning about hierarchies. The main advantage of the Small Multiples view is being able to see 'little ranges' of our data at a time.

Dual Axis: (overlap) When we drag multiple Measures to the Rows shelf, we get multiple plots. If we want them in one plot, we use dual axis.

IX. Groups & Sets

Groups: Groups are typically created from the view by selecting multiple data points in the view. We can use it in other sheets. For instance, create a map that shows how the low quantity countries (grouped) are distributed in the world. Here we used the group we created (Low Quantity Countries) to color the map. The blue countries are the ones in the group, the ones with low average quantity.

Sets: Sets are similar to groups in that we can select data points and create a set from them. However, sets can be dynamic where the members of the set will change as the underlying data changes. Groups on the other hand are static, the members will always be the members. For example, say we want to see how our poor performing products are affecting the overall profits. We can create a set from the product names or IDs which lose money, where the average profit is below zero. To create the set, open the menu for the Product Name field and choose Create > Set...Click the "Condition" tab. Then select By field: Profit Average < 0 as seen below.

We can use the set in plots to encode these products that are losing money. Let's look at the total profits for the different sub-categories of products. With the set you just made, you can split these bars into losses and gains. The red bars are showing how much money is lost due to the bad products. It looks like Office Supplies products are almost all winners, but Furniture is losing a lot of money.
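The difference between a condition-based set and a fixed list of members can be sketched in Python: the set is recomputed from the data each time, so its members can change. The order lines, product names and helper functions below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order lines: (sub-category, product name, profit).
lines = [
    ("Furniture", "Table X", -40.0),
    ("Furniture", "Table X", -10.0),
    ("Furniture", "Chair Y", 30.0),
    ("Office Supplies", "Paper Z", 5.0),
    ("Office Supplies", "Paper Z", 7.0),
]

def losing_products(rows):
    """Recompute the set from the data: products whose average profit is below zero."""
    profits = defaultdict(list)
    for _, product, profit in rows:
        profits[product].append(profit)
    return {p for p, values in profits.items() if mean(values) < 0}

def split_by_set(rows, members):
    """Total profit per sub-category, split into IN-set and OUT-of-set products."""
    totals = defaultdict(lambda: {"in": 0.0, "out": 0.0})
    for sub_category, product, profit in rows:
        totals[sub_category]["in" if product in members else "out"] += profit
    return {k: dict(v) for k, v in totals.items()}

members = losing_products(lines)
split = split_by_set(lines, members)
print(members, split)

# The set is dynamic: new data can change its members.
lines.append(("Furniture", "Table X", 120.0))
print(losing_products(lines))
```

The "in" totals correspond to the bars showing money lost to poorly performing products. After the extra order line, the product's average profit is no longer negative, so it leaves the set without anyone editing the membership by hand.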

X. Table Calculations

Table calculations can be useful for comparing the data that exists in a plot to other parts of the plot. What if we want to compare the percent of profits that went into each market to the total profit? Select the drop-down associated with SUM(Profit), and select Quick Table Calculation... > Percent of Total. Alternatively, it could be useful to look at compounding profit. A lot of table calculations work well for line plots. Let's take a look at an example. Add the Order Date to the Columns and Sales to the Rows. Make sure the Order Date is continuous. Drill down to the Quarter level, and select Quick Table Calculation... > Percent Difference.
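The arithmetic behind these quick table calculations is short enough to sketch in Python. The values are hypothetical, and "compounding profit" is taken here to mean a running total, which is an assumption.

```python
from itertools import accumulate

# Hypothetical quarterly values already aggregated in the view, e.g. SUM(Profit).
profit = [100.0, 120.0, 90.0, 135.0]

# Percent of Total: each mark relative to the sum of all marks.
total = sum(profit)
percent_of_total = [round(100 * p / total, 1) for p in profit]

# Running Total: cumulative sum along the table.
running_total = list(accumulate(profit))

# Percent Difference: change relative to the previous mark (undefined for the first).
percent_difference = [None] + [
    round(100 * (cur - prev) / prev, 1) for prev, cur in zip(profit, profit[1:])
]
print(percent_of_total, running_total, percent_difference)
```

All three work on the aggregated marks that are already in the view rather than on the underlying records, which is what distinguishes a table calculation from a calculated field.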

In order to get a better idea of how things are moving over time, we might use a moving average. You can see below how this is done using the table calculations. Additionally, I broke out the Category to see how things change over time for each. This plot is great for seeing the trend of the data, and that sales are increasing, but it isn't great for seeing how sales change from one quarter to the next. For that, a percent difference view from one quarter to the next is more suitable.
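A moving average replaces each point with the mean of itself and a few previous points. A minimal Python sketch, where the window of 3 and the sample values are assumptions chosen for illustration:

```python
def moving_average(values, window=3):
    """Average of the current value and up to window-1 previous values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

quarterly_sales = [100.0, 130.0, 90.0, 150.0, 120.0, 170.0]   # hypothetical values
smoothed = moving_average(quarterly_sales, window=3)
print([round(v, 1) for v in smoothed])
```

The smoothed series rises more steadily than the raw one, which makes the trend easier to see but hides the size of each quarter-to-quarter swing.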

  • For example

XI. Data Preparation

  • Data Interpreter:
    • The best data format for machines is one where each measure and dimension has its own column. 'Pivot' helps reshape the data this way.
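What a pivot does to a sheet can be sketched in Python: a wide table with one column per year becomes a long table with one 'year' column and one 'sales' column. The table, the column names and the `pivot_longer` helper are hypothetical.

```python
# Hypothetical wide sheet: one column per year, easy to read but hard to analyse.
wide = [
    {"country": "KR", "2015": 10, "2016": 12},
    {"country": "US", "2015": 20, "2016": 25},
]

def pivot_longer(rows, id_cols, names_to, values_to):
    """Turn every non-id column into a (name, value) pair on its own row."""
    long_rows = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, names_to: col, values_to: value})
    return long_rows

long = pivot_longer(wide, ["country"], names_to="year", values_to="sales")
for row in long:
    print(row)
```

After the pivot, 'year' can be used as a dimension and 'sales' as a measure, which is the shape the note above recommends.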

study-tableau's People

Contributors

mainkoon81
