modeldatatoo's Introduction

tidymodels

Overview

tidymodels is a “meta-package” for modeling and statistical analysis that shares the underlying design philosophy, grammar, and data structures of the tidyverse.

It includes a core set of packages that are loaded on startup:

  • broom takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames.

  • dials has tools to create and manage values of tuning parameters.

  • dplyr contains a grammar for data manipulation.

  • ggplot2 implements a grammar of graphics.

  • infer is a modern approach to statistical inference.

  • parsnip is a tidy, unified interface to creating models.

  • purrr is a functional programming toolkit.

  • recipes is a general data preprocessor with a modern interface. It can create model matrices that incorporate feature engineering, imputation, and other helper tools.

  • rsample has infrastructure for resampling data so that models can be assessed and empirically validated.

  • tibble has a modern re-imagining of the data frame.

  • tune contains the functions to optimize model hyper-parameters.

  • workflows has methods to combine pre-processing steps and models into a single object.

  • yardstick contains tools for evaluating models (e.g. accuracy, RMSE).

A list of all tidymodels functions across different CRAN packages can be found at https://www.tidymodels.org/find/.

You can install the released version of tidymodels from CRAN with:

install.packages("tidymodels")

Install the development version from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/tidymodels")

When loading the package, the versions and conflicts are listed:

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
#> ✔ broom        1.0.5      ✔ recipes      1.0.10
#> ✔ dials        1.2.1      ✔ rsample      1.2.0 
#> ✔ dplyr        1.1.4      ✔ tibble       3.2.1 
#> ✔ ggplot2      3.5.0      ✔ tidyr        1.3.1 
#> ✔ infer        1.0.6      ✔ tune         1.2.0 
#> ✔ modeldata    1.3.0      ✔ workflows    1.1.4 
#> ✔ parsnip      1.2.1      ✔ workflowsets 1.1.0 
#> ✔ purrr        1.0.2      ✔ yardstick    1.3.1
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Learn how to get started at https://www.tidymodels.org/start/
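
As a rough illustration of how the core packages listed above fit together once loaded, the following sketch splits a data set with rsample, declares preprocessing with recipes, specifies a model with parsnip, bundles both with workflows, and scores predictions with yardstick. The choice of the built-in mtcars data, the predictors wt and hp, and all object names (car_split, car_recipe, and so on) are assumptions made for this example, not something prescribed by the README.

```r
library(tidymodels)

set.seed(123)  # make the random split reproducible

# rsample: hold out a test set from the built-in mtcars data
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare the outcome, predictors, and a preprocessing step
car_recipe <- recipe(mpg ~ wt + hp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# parsnip: specify an ordinary linear regression fitted with stats::lm()
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle preprocessing and model into a single object
car_workflow <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(lm_spec)

# Fit on the training rows only
car_fit <- fit(car_workflow, data = car_train)

# Predict on the held-out rows and keep the true values alongside
car_preds <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test)

# yardstick: root mean squared error on the test set
rmse(car_preds, truth = mpg, estimate = .pred)
```

The workflow object keeps the recipe and the model together, so the same preprocessing learned on the training rows is applied automatically when predicting on new data.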

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

modeldatatoo's People

Contributors

emilhvitfeldt, hfrick, simonpcouch, topepo

modeldatatoo's Issues

variable names in the chicago taxi data

Following up on #10.

The variable names in the Chicago taxi data are:

dplyr::glimpse(
  pins::pin_read(modeldatatoo:::modeldatatoo_board, "chicago_taxi")
)
#> Rows: 10,000
#> Columns: 14
#> $ tip              <fct> yes, no, yes, no, yes, yes, yes, yes, yes, no, yes, y…
#> $ trip_id          <fct> 9fee331b5a4b19daa19a149cfdfbeea91eb4dff9, b1dbc452aeb…
#> $ trip_seconds     <dbl> 333, 2692, 1076, 1599, 346, 1790, 885, 720, 2190, 216…
#> $ trip_miles       <dbl> 1.24, 5.39, 3.01, 18.38, 1.76, 13.65, 3.71, 4.80, 18.…
#> $ fare             <dbl> 6.50, 25.50, 11.23, 45.25, 7.50, 34.25, 11.25, 14.75,…
#> $ tolls            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ extras           <dbl> 0.0, 0.0, 0.0, 0.0, 1.0, 4.5, 0.0, 1.5, 4.0, 4.0, 5.0…
#> $ trip_total       <dbl> 8.00, 25.50, 14.10, 45.25, 11.00, 49.06, 12.62, 19.60…
#> $ payment_type     <fct> Credit Card, Prcard, Mobile, Prcard, Credit Card, Cre…
#> $ company          <fct> Sun Taxi, Flash Cab, City Service, Sun Taxi, Sun Taxi…
#> $ local_trip       <fct> no, no, no, no, no, no, no, no, no, yes, no, NA, no, …
#> $ trip_start_dow   <fct> Thu, Sat, Wed, Sat, Sun, Mon, Mon, Tue, Fri, Thu, Tue…
#> $ trip_start_month <fct> Feb, Mar, Feb, Apr, Jan, Feb, Mar, Mar, Jan, Apr, Apr…
#> $ trip_start_hour  <int> 13, 12, 17, 6, 15, 17, 21, 9, 19, 12, 20, 22, 10, 10,…

Created on 2023-06-30 with reprex v2.0.2

The unit of observation in the data is a trip; would we be up for removing the trip_ prefix from those variables that have it? In that case trip_total would become total, and it might make sense to rename it to total_cost.

I proposed in #14 that we document trip_start_dow, trip_start_month, and trip_start_hour in such a way that mentions these are the values at the start of the trip; would we be up for removing the start_ prefix from those variables that have it?

Altogether, these changes should make for more concise variable names.
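
For reference, the combined effect of both proposals on the column names could be sketched in base R as below. The two-row data frame is a hypothetical stand-in with made-up values that only mirrors a subset of the column names shown in the glimpse() output; the object names taxi and new_names are likewise assumptions for illustration.

```r
# Hypothetical stand-in for the taxi data: made-up values, real column names
taxi <- data.frame(
  tip              = factor(c("yes", "no")),
  trip_id          = c("a", "b"),
  trip_seconds     = c(300, 1200),
  trip_miles       = c(1.2, 5.4),
  trip_total       = c(8, 25.5),
  trip_start_dow   = factor(c("Thu", "Sat")),
  trip_start_month = factor(c("Feb", "Mar")),
  trip_start_hour  = c(13L, 12L)
)

new_names <- names(taxi)
new_names <- sub("^trip_", "", new_names)        # trip_seconds -> seconds
new_names <- sub("^start_", "", new_names)       # start_dow -> dow
new_names[new_names == "total"] <- "total_cost"  # avoid a bare `total`

# Guard against two columns collapsing onto the same name
stopifnot(!anyDuplicated(new_names))

names(taxi) <- new_names
names(taxi)
```

Note that applying the rule uniformly also turns trip_id into id, which may or may not be desirable.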

Release modeldatatoo 0.1.0

First release:

Prepare for release:

  • git pull
  • Bump required R version in DESCRIPTION to 3.6
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • git push
  • Draft blog post
  • Slack link to draft blog in #open-source-comms

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • usethis::use_news_md()
  • Finish blog post
  • Tweet

Release modeldatatoo 0.2.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::cloud_check()
  • Update cran-comments.md
  • git push
  • Draft blog post
  • Slack link to draft blog in #open-source-comms

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

Release modeldatatoo 0.2.1

Prepare for release:

  • git pull
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

name for the chicago taxi data

Following up on #10.

Are we game for renaming the chicago_taxi data? Given that we will end up prefixing and suffixing the name of the data throughout analyses, it might be nice if the object's name were a bit shorter. I'd suggest taxi but am game for other ideas too. :)

Upkeep for modeldatatoo

2023

Necessary:

  • Update copyright holder in DESCRIPTION: person(given = "Posit Software, PBC", role = c("cph", "fnd"))
  • Double check license file uses '[package] authors' as copyright holder. Run use_mit_license()
  • Update logo (https://github.com/rstudio/hex-stickers); run use_tidy_logo()
  • usethis::use_tidy_coc()
  • usethis::use_tidy_github_actions()

Optional:

  • Review 2022 checklist to see if you completed the pkgdown updates
  • Prefer pak::pak("org/pkg") over devtools::install_github("org/pkg") in README
  • Consider running use_tidy_dependencies() and/or replace compat files with use_standalone()
  • use_standalone("r-lib/rlang", "types-check") instead of home grown argument checkers
  • Add alt-text to pictures, plots, etc; see https://posit.co/blog/knitr-fig-alt/ for examples
