alastairrushworth / inspectdf

🛠️ 📊 Tools for Exploring and Comparing Data Frames

Home Page: https://alastairrushworth.github.io/inspectdf/

R 97.14% C++ 2.86%

rstats r dataframe exploratory-data-analysis eda comparison visualization

inspectdf's Introduction

👋 Hi!

I'm Alastair, a data scientist and ML engineer who loves everything Python, data and ML related. I used to host a personal website, but I'm now just using GitHub as a place to keep things together. At the moment I'm developing blaze.email, an email content recommendation system for developers and people working in tech.

inspectdf's Issues

"" values causing error?

It appears that "" values cause the following error:

Error in if (tg$gp$fontsize < x$min.size) return() :
missing value where TRUE/FALSE needed

I am able to update all of "" to "DATA NOT PROVIDED" and it runs fine.

Thanks for the great tool!
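
The workaround described in this issue (replacing empty strings before inspecting) can be sketched in base R. This is a minimal illustration on a made-up data frame: `toy` and `label` are hypothetical names, and only character columns are touched.

```r
# Hypothetical toy data with empty strings in a character column
toy <- data.frame(
  city = c("Leeds", "", "York", ""),
  n    = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

label <- "DATA NOT PROVIDED"

# Replace "" (but not NA) in every character column, leave other columns alone
toy[] <- lapply(toy, function(col) {
  if (is.character(col)) {
    col[!is.na(col) & col == ""] <- label
  }
  col
})

toy$city
```

Factor columns would need their levels recoded instead; that case is not covered by this sketch.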

Why only "Pearson's" method for correlation?

I like your correlation plot very much!
It would be nice to have the option of choosing alternative methods for calculating the correlation in inspect_cor (e.g. "spearman" and "kendall").
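
For reference, the three methods named in this request are the ones base R's `stats::cor()` already accepts. The sketch below only shows how they differ on the built-in `mtcars` data; it is not how inspect_cor is implemented, and the choice of `mpg` and `wt` is arbitrary.

```r
# The three correlation methods accepted by stats::cor()
methods <- c("pearson", "spearman", "kendall")

# Correlation between two numeric columns of the built-in mtcars data
cors <- sapply(methods, function(m) {
  cor(mtcars$mpg, mtcars$wt, method = m, use = "pairwise.complete.obs")
})

cors
```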

Add number of pairwise NA in `inspect_cor()` output

improve jsd statistic for comparison

In cases where two cat columns have levels that do not appear in the other, bundle together the non-shared values?

Color/size of text labels?

Is there any way to change the color and size of text labels on the plots?

travis --> GitHub actions

[New Functionality] "inspect_num()" add parameter to define number of columns.

Hello,

Firstly, thanks for your excellent and very handy package. I use it in my normal modeling flow and when I teach, I highlight it as the de-facto solution for automatic EDA.

When using it in R Markdown reports, the histograms for the numerical variables get very small.

I think it would be much better if the number of columns for the charts could be set as a parameter: instead of maximizing the plotting area, extending the output lengthwise would make the charts bigger and more readable.

Thanks again,
Carlos.

Unsorted plots?

Is there a way of not sorting the plots when enabling the option show_plot = TRUE? It would be really useful.

Thanks and congrats on the package!

Error in a dataset with 78550 rows 10 columns

While running the full dataset it gives the following errors.

Data frame characteristics:
df = 78550 rows, 10 columns

str(df)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 78550 obs. of 10 variables:
$ date : Date, format: "2015-02-15" "2015-02-15" "2015-02-15" ...
$ author : Factor w/ 5 levels "‎ vazio","joni",..: 2 2 3 3 2 3 3 2 3 2 ...
$ message : chr "Oi Nubi" "Bom dia" "Bom dia!" "\U0001f60a" ...
$ msn_lengh : int 7 7 9 1 52 30 11 34 24 31 ...
$ day : int 15 15 15 15 15 15 15 15 15 15 ...
$ week : num 7 7 7 7 7 7 7 7 7 7 ...
$ month : num 2 2 2 2 2 2 2 2 2 2 ...
$ year : num 2015 2015 2015 2015 2015 ...
$ question_flag: chr "N" "N" "N" "N" ...
$ laughs : chr "N" "N" "N" "N" ...

Error message

inspect_cat(df)

Column (2/5): author
Error: Tibble columns must have consistent lengths, only values of length one are recycled:

Length 6: Column value
Length 11: Column prop
Call rlang::last_error() to see a backtrace

When I sample it down to 10k rows it works. I'm still looking into the problem.

inspect_imb fails on iris dataset

I'm not quite sure why, but inspect_imb fails a lot for me with the same error. See below for a simple example.

library(inspectdf)
inspect_imb(iris)
# Error in sapply(df_cat_fact, are_lvls_unq) : 
#  object 'df_cat_fact' not found

type comparison to include names and types, new plots

show_ggplotly

Suggests plotly
inspect_cat : tooltip for proportion, number and label
inspect_types : ?
inspect_num : ?

Change in expected plot from v0.0.9 and v0.0.12

Hi team,

Thanks for the great package. Just noticed there's a change in the expected plot for show_plot(inspect_types(df1,df2)) between v0.0.9 and v0.0.12 and wanted to let you know in case it wasn't intended.

Code:

library(dplyr)     # sample_n()
library(ggplot2)   # diamonds data
library(inspectdf)

set.seed(2019)
diamonds_1 <- sample_n(diamonds,50)
diamonds_2 <- sample_n(diamonds,50)
show_plot(inspect_types(diamonds_1,diamonds_2))

Plot from v0.0.9

Plot from v0.0.12

Edited to add - sorry, I think this was covered in the news for v 0.0.10 https://cran.r-project.org/web/packages/inspectdf/news/news.html

column names with spaces

check whether functions cope with column names with spaces. If ok, add tests.

Installation error on a Mac

Hi @alastairrushworth,

Many thanks for this great package. Trying to install it on a MacBook Pro (macOS Mojave 10.14.2), I am getting the following error:

** libs
/usr/local/opt/llvm/bin/clang++ -fopenmp -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I/usr/local/opt/gettext/include -I/usr/local/opt/llvm/include   -fPIC  -g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp.h:27:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/RcppCommon.h:29:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r/headers.h:59:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/platform/compiler.h:100:
In file included from /usr/local/Cellar/llvm/6.0.0/include/c++/v1/cmath:305:
/usr/local/Cellar/llvm/6.0.0/include/c++/v1/math.h:301:15: fatal error: 'math.h' file not found
#include_next <math.h>
              ^~~~~~~~
1 error generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘inspectdf’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/inspectdf’
Error in i.p(...) : 
  (converted from warning) installation of package ‘/var/folders/dr/93kfhwds3l91jn94w45p2vc00000gp/T//RtmpHQU9BJ/file20f6d693429/inspectdf_0.0.0.9000.tar.gz’ had non-zero exit status

Any help would be much appreciated.

Here is my sessionInfo() in case this would be helpful:

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2    roomba_0.1.0      ssh.utils_1.0     rio_0.5.16        fs_1.2.6          rebus_0.1-3       getPass_0.2-2     httr_1.4.0        jsonlite_1.6      sjmisc_2.7.6     
[11] naniar_0.4.1      readxl_1.1.0      janitor_1.1.1     data.table_1.11.8 forcats_0.3.0     stringr_1.3.1     dplyr_0.7.8       purrr_0.2.5       readr_1.3.0       tidyr_0.8.2      
[21] tibble_1.4.2      ggplot2_3.1.0     tidyverse_1.2.1   opencgaR_1.4.0   

loaded via a namespace (and not attached):
 [1] nlme_3.1-137          usethis_1.4.0         lubridate_1.7.4       devtools_2.0.1        rprojroot_1.3-2       tools_3.5.0           backports_1.1.2       R6_2.3.0             
 [9] sjlabelled_1.0.15     rebus.base_0.0-3      lazyeval_0.2.1        colorspace_1.3-2      withr_2.1.2           tidyselect_0.2.5      prettyunits_1.0.2     processx_3.2.1       
[17] curl_3.2              compiler_3.5.0        cli_1.0.1             rvest_0.3.2           xml2_1.2.0            desc_1.2.0            scales_1.0.0          callr_3.1.0          
[25] digest_0.6.18         foreign_0.8-70        rebus.unicode_0.0-2   stringdist_0.9.5.1    base64enc_0.1-3       pkgconfig_2.0.2       htmltools_0.3.6       sessioninfo_1.1.1    
[33] rlang_0.3.0.1         rstudioapi_0.8        shiny_1.2.0           bindr_0.1.1           generics_0.0.2        zip_1.0.0             magrittr_1.5          Rcpp_1.0.0           
[41] munsell_0.5.0         prediction_0.3.6.1    visdat_0.5.2          stringi_1.2.4         snakecase_0.9.2       pkgbuild_1.0.2        plyr_1.8.4            grid_3.5.0           
[49] parallel_3.5.0        promises_1.0.1        crayon_1.3.4          rebus.datetimes_0.0-1 miniUI_0.1.1.1        lattice_0.20-35       haven_2.0.0           hms_0.4.2            
[57] ps_1.2.1              pillar_1.3.0          rebus.numbers_0.0-1   pkgload_1.0.2         glue_1.3.0            packrat_0.5.0         remotes_2.0.2         modelr_0.1.2         
[65] httpuv_1.4.5          testthat_2.0.1        cellranger_1.1.0      gtable_0.2.0          assertthat_0.2.0      openxlsx_4.1.0        mime_0.6              xtable_1.8-3         
[73] broom_0.5.1           later_0.7.5           memoise_1.1.0

Allow for filtered correlation plots

Great package for exploratory data analysis, thanks for sharing!

Could the package enable filtering results before plotting? I'd love to be able to do something like this:

# Load packages
library(inspectdf)
library(dplyr)

# Single dataframe summary
inspect_cor(starwars)
#> # A tibble: 3 x 7
#>   col_1      col_2    corr p_value  lower  upper pcnt_nna
#>   <chr>      <chr>   <dbl>   <dbl>  <dbl>  <dbl>    <dbl>
#> 1 birth_year mass    0.478 0.00602  0.177  0.697     41.4
#> 2 birth_year height -0.400 0.0114  -0.625 -0.113     49.4
#> 3 mass       height  0.134 0.316   -0.127  0.377     67.8

# Filter
inspect_cor(starwars) %>% 
  filter(abs(corr) > 0.2)
#> # A tibble: 2 x 7
#>   col_1      col_2    corr p_value  lower  upper pcnt_nna
#>   <chr>      <chr>   <dbl>   <dbl>  <dbl>  <dbl>    <dbl>
#> 1 birth_year mass    0.478 0.00602  0.177  0.697     41.4
#> 2 birth_year height -0.400 0.0114  -0.625 -0.113     49.4

# Filter and plot
inspect_cor(starwars) %>% 
  filter(abs(corr) > 0.2) %>% 
  show_plot()
#> Error: Tibble columns must have consistent sizes, only values of size one are recycled:
#> * Size 2: Existing data
#> * Size 3: Column `pair`

Created on 2020-02-21 by the reprex package (v0.3.0)

When you have a lot of features, you want to focus only on relevant correlations and avoid clutter.

Wish for ordered categories

Hi Alastair,
thank you for the package! I think that inspect_cat / show_plot would benefit from the possibility of declaring a categorical or integer variable as ordered, so that its levels are plotted in a prespecified order (as in an ordered factor). This would be especially beneficial for features with only a few ordered categories, such as very satisfied, satisfied, ..., dissatisfied.
Best, Ulrike

installation non zero exit status

Hi
This is on my work PC.

compilation terminated.
make: *** [C:/PROGRA~1/R/R-35~1.1/etc/i386/Makeconf:215: RcppExports.o] Error 1
ERROR: compilation failed for package 'inspectdf'

removing 'C:/Program Files/R/R-3.5.1/library/inspectdf'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package ‘C:/Users/brplsec/AppData/Local/Temp/RtmpW8a2Gn/file2428705a7d95/inspectdf_0.0.2.9000.tar.gz’ had non-zero exit status
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers

Bug: `inspect_imb()` fails on factor columns

Using the latest package version on github, I'm getting errors for inspect_imb() when working with factors:

# Load 
library(inspectdf)
library(dplyr)

# Change character to factor
starwars_factor <- starwars %>% 
  mutate_if(is.character, as.factor)

# Imbalance plot
inspect_imb(starwars_factor)
#> Error: Tibble columns must have consistent sizes, only values of size one are recycled:
#> * Size 13: Existing data
#> * Size 17: Column `prop`

Created on 2020-02-24 by the reprex package (v0.3.0)

Change to new cran checks badge URL

👋🏽 I maintain the cran checks badges. Please change to the new cran checks badge URL (e.g., https://badges.cranchecks.info/worst/dplyr.svg). Old badges at (e.g. https://cranchecks.info/badges/worst/dplyr) will be unavailable as of Jan 1st 2023.

Blog post from June 2022 announcing the change: https://recology.info/2022/06/cran-checks-badges/
New badges repo: https://github.com/r-hub/cchecksbadges
Old badges repo: https://github.com/sckott/cchecksapi

inspect_num %>% show_plot fails on grouped dataframe with Error object 'mid' not found

Hello,

show_plot() is failing to render a plot out of inspect_num() when the dataset is grouped.
The mid object in the code at plot_num.R#L173 is actually not defined

ReprEx

library(dplyr)
library(inspectdf)
mtcars %>% dplyr::group_by(am) %>% inspect_num() %>% show_plot()
#> Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomBar, : object 'mid' not found

Created on 2021-08-18 by the reprex package (v2.0.1)

"Prevalance" should be "Prevalence"

In plot_na.R there is a typo, "Prevalance of NAs" should be "Prevalence of NAs".

Cheers.

inspect_num: inconsistent binning of numerical variables in comparisons

Hi Alastair,

I like the idea and the visual, but the binning implementation seems wrong:

library(inspectdf)

data1 <- data.frame(x = c(rep(0, 250), rep(1, 250)))
data2 <- data.frame(x = c(rep(0, 230), rep(1, 270)))

show_plot(inspect_num(data1, data2))

Created on 2021-06-09 by the reprex package (v1.0.0)

It even occurs for identical data:

show_plot(inspect_num(data1, data1))

Rearranging output of `inspect_cat()` removes labels for `show_plot()`

To improve spotting differences between datasets visually
(especially when there are many columns) it would be helpful if one could sort the categorical columns by the Jensen–Shannon divergence.

The code below tries to do so but it seems to distort the labels on the y-axis. Also, in case the jsd column contains missing values, those variables are deleted from the graph.

library(inspectdf)
library(dplyr)
inspect_cat(starwars, starwars[1:20, ]) %>% 
  arrange(desc(jsd)) %>% 
  show_plot()

Created on 2020-04-01 by the reprex package (v0.3.0)

Suggestion: rotating `inspect_imb()` plot

For scalability, when working with data with many columns, it would be great if there would be an option to rotate the inspect_imb() plot. Currently, the best way to achieve this is by using ggplot2::coord_flip(), but that messes up the label positioning (vertical and overlapping) and it's not possible to reverse the levels on the discrete y-axis.

# Load 
library(inspectdf)
library(dplyr)
library(ggplot2)

# Imbalance plot
inspect_imb(starwars) %>% 
  show_plot()

# Rotate
inspect_imb(starwars) %>% 
  show_plot() + 
  coord_flip()

Created on 2020-02-24 by the reprex package (v0.3.0)

Duplicated factor levels

Related to #6. Remove duplicated factor levels internally with a warning to user.

Automatic sizing of text

Hi @alastairrushworth, thanks for the great package!

Would you be interested in a pull request that adds automatic text sizing to the plots?

plot_cat() for mpg currently looks like this:

If plot_cat() instead used geom_fit_text() from my package ggfittext, the text could be automatically sized like this:

geom_fit_text() supports options for hiding text below a minimum size etc. that could be passed on through show_plot().

If you think this would be a worthwhile addition I can submit a pull request that adds geom_fit_text() to all the plot_*() functions.

label_size, label_angle & label_color to work for comparison plots

Error when a data.frame has a logical column

Thank you for the great package. I've recently observed a bug: the errors below are thrown when a data.frame with a logical column is provided.

randnums <- c(0.2, 0.48, 0.91, -1.93, 0.75)
booleans <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

df_without_na <- data.frame(a = randnums, b = booleans)
df_with_na <- data.frame(a = c(NA_real_, randnums, NA_real_), b = c(NA, booleans, NA)) 

df_without_na
#>       a     b
#> 1  0.20  TRUE
#> 2  0.48 FALSE
#> 3  0.91 FALSE
#> 4 -1.93  TRUE
#> 5  0.75 FALSE
df_with_na
#>       a     b
#> 1    NA    NA
#> 2  0.20  TRUE
#> 3  0.48 FALSE
#> 4  0.91 FALSE
#> 5 -1.93  TRUE
#> 6  0.75 FALSE
#> 7    NA    NA

library(magrittr)
library(inspectdf)

inspect_na(df_without_na) %>% 
  show_plot()
#> Error in if (!("ymin" %in% names(data)) | (all(data$ymin == data$ymax) & : missing value where TRUE/FALSE needed

inspect_na(df_with_na) %>% 
  show_plot()
#> Error in if (!("ymin" %in% names(data)) | (all(data$ymin == data$ymax) & : missing value where TRUE/FALSE needed

show_plot() bars not aligned w/ values below.

Hi Alastair,

the inspectdf PKG is really USEFUL!.

But a show_plot() quirk...

try:

unique(mtcars$carb) 
[1] 4 1 2 3 6 8
 inspect_num(mtcars) %>% show_plot()

See?.
The vertical bars in the CARB plot
are not "aligned" with the unique value markers below,
(in the x-axis).

The bars are all slightly "displaced" to the right...
(not "on top" of the unique CARB values: 4 1 2 3 6 8 ).
Even zooming the size of the RStudio [Plots] panel
doesn't help.

Same problem with columns for:
AM, GEAR , VS and CYL ...etc

Hope you can help.
Thanks Alastair!

sfd99
San Francisco
latest Rstudio/R/Ubuntu Linux
inspectdf 0.0.11

show_plot(x) displays empty histograms for DF num columns

Hi Alastair,

inspectdf Pkg is Great!.

This show_plot() example
works fine:

x <- inspect_num(starwars)
show_plot(x)

But... this ex. does not work:
(it just shows 4 empty histograms...)

x <- inspect_num(iris)
show_plot(x)

Neither does this example:
(it just shows 11 empty histograms...)

x <- inspect_num(mtcars)
show_plot(x)

Help! What am I missing?.
SFd99
San Francisco
Ubuntu Linux, R 351, Rstudio 1.1.463,
inspectdf PKG ver 0.0.2 (installed from CRAN)

Improvements to inspect_cat

- paginate plotting outputs when there are many categorical features
- option to color and order according to specific factor levels to make cross-column comparisons easier
- text features add to named list
  - basic regex: abc, 123, casing
  - string length
- cardinalities in plots: if cardinality = 2, show 1s and 2s as separate

inspect_cat and comparison plots

Hi Alastair.
Thanks for the package.
I've tried to do categorical-comparison plots between two data-frames (the two being partitions of some training data based on target-values).

Some example data might explain my problem a bit better:

Reprex:

library(tibble)
library(dplyr)
library(inspectdf)

df <- tibble(
  a = c(rep("x", 4), rep("y", 2), rep("x", 1), rep("y", 5)),
  target = c(rep(0, 6), rep(1, 6))
)

inspect_cat(
  df %>% filter(target == 0),
  df %>% filter(target == 1)
) %>%
  show_plot()

This results in the following image:

For category "a"

the level "x" is most common when target==0, and
level "y" is most common when target==1

I was wondering whether the level reordering is supposed to work as it does in the figure (x-first for the first data-frame, y-first for the second) or whether this might be a bug. Do you think it might make more sense for the levels to be ordered by their frequency across the combined data-frames (there are 7 ys and 5 xs here, so maybe y should come first for both data-frames)?

My original aim was to quickly identify categorical vars that distinguish positive from negative samples, but this is a bit obscured when scanning down the figure (for a dozen categories), because the levels are presented in an inconsistent order for the two data-frames that are being compared.

Aside: am I correct in thinking that the planned grouped-df API would allow the above, without needing to partition the original dataframe; that is, like df %>% group_by(target) %>% inspect_cat() %>% show_plot()
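
The ordering suggested in this issue (levels sorted by their frequency across both partitions combined) can be sketched in base R using the same toy data as the reprex. This only illustrates the proposed ordering rule; whether inspect_cat honors a pre-set factor level order is not confirmed here, and `combined_order`, `a0` and `a1` are hypothetical names.

```r
# Same toy data as in the reprex above
a      <- c(rep("x", 4), rep("y", 2), rep("x", 1), rep("y", 5))
target <- c(rep(0, 6), rep(1, 6))

# Order levels by frequency across both partitions combined
combined_order <- names(sort(table(a), decreasing = TRUE))

# Apply the same level order to each partition
a0 <- factor(a[target == 0], levels = combined_order)
a1 <- factor(a[target == 1], levels = combined_order)

combined_order
table(a0)
table(a1)
```

With a shared level order, both partitions list the levels in the same sequence, which is what makes scanning down a comparison easier.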

Partial argument match of 'unit' to 'units'

When running the following code

withr::with_options(list(warnPartialMatchArgs = TRUE,
                         warnPartialMatchDollar = TRUE,
                         warnPartialMatchAttr = TRUE), {
    iris |>
      inspectdf::inspect_mem() |>
      inspectdf::show_plot()
})

I get multiple instances of the following warning:

Warning in format.object_size(size, standard = "auto", unit = "auto", digits = 2L) :
  partial argument match of 'unit' to 'units'
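
The warning text shows the call being made with `unit =`, while the argument of base R's `format()` method for `object_size` is named `units`. A minimal sketch of the same call with the argument fully spelled out is below; where exactly this call sits inside inspectdf is not shown in the issue, so `size` here is just an illustrative object.

```r
# Size of a built-in data frame, as an object_size
size <- object.size(iris)

# Spelling 'units' in full avoids relying on partial argument matching
format(size, standard = "auto", units = "auto", digits = 2L)
```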

change white color for last (or second) factor.

Hi. The color for the last factor is white, so for groups with only two factors the first one is colored but the second is white. It would be nice if the second group were not white but a lighter shade of the first color, meaning the color palette should not run all the way to white. Thanks.

Consider returning ggplot object instead of relying on print()

Hi @alastairrushworth and thanks for this awesome package! I turn to it frequently to get a sense of new datasets.

One point of friction for me is that show_plot() doesn't return the ggplot2 object created by lower-level functions like plot_cat(). Currently, I believe that if type$method == "types" in show_plot() the result will be the ggplot2 object but otherwise, because of the if statements throughout show_plot(), the result will always be NULL.

library(dplyr)
library(inspectdf)

g <- starwars %>% 
  inspect_cat() %>% 
  show_plot()

g
#> NULL

This makes it difficult for users who would like to work with the ggplot2 object, to add or change styles, for example, because they need to fall back to using ::: to access inspectdf:::plot_cat() or similar. Unfortunately for these users, the default values for plot_cat() are handled by show_plot(), further increasing friction.

g2 <- starwars %>% 
  inspect_cat() %>% 
  inspectdf:::plot_cat()
#> Error in lapply(lvl_df$levels, merge_card, high_cardinality = high_cardinality): argument "high_cardinality" is missing, with no default

If I provide the default values to the lower level functions, then I can gain access to the created ggplot2 object, but it's clear that plot_cat() isn't designed for end user consumption.

g2 <- starwars %>% 
  inspect_cat() %>% 
  inspectdf:::plot_cat(
    df_names = list(df1 = "starwars"), 
    high_cardinality = 10, 
    col_palette = 0, 
    text_labels = TRUE, 
    label_thresh = 0.1
  )

g2
#> Warning: Stacking not well defined when not anchored on the axis

Personally, I would prefer that show_plot() simply return the ggplot2 object and that default printing rules are used to display the plot rather than explicitly calling print() internally. In this way, show_plot() would work as in the last example, but without automatically printing the plot. Doing this would give the user more control over where and how the inspectdf plots are used.

Created on 2019-07-23 by the reprex package (v0.2.1)

inspect_num show_plot error

Encountering the following error when using:
inspect_num(train, valid) %>%
  show_plot()
Error message in R:
Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), :
Viewport has zero dimension(s)

Error: `count_na` contains unknown variables

Happens when comparing two data frames

Bug: `inspect_num()` not able to deal with different ranges in `df2`

Hi Alastair,

I ran into this issue where inspect_num() is not able to handle cases when the numeric variable has a different range in the comparison data set df2. It seems the histogram breaks are computed on the range seen in df1 alone and then applied to df2 rather than computed on the range of df1 and df2 jointly.

Here's a minimal reprex:

library(inspectdf)
data("starwars", package = "dplyr")
starwars1 <- starwars[, "height"]
starwars2 <- starwars[, "height"] + 100
inspect_num(starwars1, starwars2)
#> Error in hist.default(col_i, plot = FALSE, right = TRUE, breaks = hist_breaks): some 'x' not counted; maybe 'breaks' do not span range of 'x'

Created on 2023-07-07 with reprex v2.0.2
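
The fix suggested in this issue, computing histogram breaks on the joint range of both data sets, can be sketched with base R's `hist()`. This is an illustration under assumptions, not the package's code: the heights in `x1` are made up, and `pretty()` with `n = 20` is just one way to pick breaks.

```r
# Hypothetical heights, plus a shifted copy standing in for df2
x1 <- c(66, 96, 150, 172, 183, 202, NA)
x2 <- x1 + 100

# Breaks computed from the joint range of both vectors, not from x1 alone
joint_range <- range(c(x1, x2), na.rm = TRUE)
hist_breaks <- pretty(joint_range, n = 20)

# The same breaks now span every value in either vector
h1 <- hist(x1, breaks = hist_breaks, plot = FALSE, right = TRUE)
h2 <- hist(x2, breaks = hist_breaks, plot = FALSE, right = TRUE)

sum(h1$counts)
sum(h2$counts)
```

Because both histograms share one set of breaks, their bins line up and the "breaks do not span range of 'x'" error cannot be triggered by a shifted range.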

inspect_num breaks when column is all NAs

Thanks for the great package; I noticed that the inspect_num function hits an error when hist gets a column with exclusively NAs, like this.

inspect_num(data.frame(a = 1:100, b = rep(NA_real_, 100)))

Here's the error I'm seeing:

Error in hist.default(df_num[[breaks_tbl$col_name[i]]], plot = FALSE,  :    character(0)
In addition: Warning messages: 
1: In min(value, na.rm = T) :   no non-missing arguments to min; returning Inf 
2: In max(value, na.rm = T) :   no non-missing arguments to max; returning -Inf

Cheers!
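
One possible guard for this case is to skip numeric columns that contain no non-missing values before calling `hist()`. The base R sketch below uses the data frame from the issue; `is_num`, `has_val` and `usable` are hypothetical names, and this is not necessarily how the package handles it.

```r
# Data frame from the issue: one ordinary column, one all-NA column
df <- data.frame(a = 1:100, b = rep(NA_real_, 100))

# Keep only numeric columns with at least one non-missing value
is_num  <- vapply(df, is.numeric, logical(1))
has_val <- vapply(df, function(col) any(!is.na(col)), logical(1))
usable  <- names(df)[is_num & has_val]

# hist() is only called on columns that have something to bin
hists <- lapply(df[usable], hist, plot = FALSE)
names(hists)
```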

Add colorblind friendly palettes

Y axis % of columns that is NA

Is there a way to make the y axis always go from 0 to 100%? This would be ideal when comparing different plots from different data frames. Thanks!
