
r-base-shortcuts's Issues

New R Code Snippet Suggestions

R has so many programmatic capabilities and secrets that it remains to be seen whether a developer who knows only the packages that ship with a standard R installation (the base and recommended packages) would ever discover them all.

R currently maintains nearly 20,000 active packages on its network, called CRAN.

Consequently, I arguably possess one of the most extensive collections of R code snippets in the world. These code snippets are principally targeted to address challenges supporting Data Science proper, Software Engineering, Machine Learning, and Generative Art.

To that end, I have hundreds of R code snippets that I have either captured or developed over the last 8 years.

This repository is so large that I had to develop a separate database and code library to manage it.

While I had most of the code snippets that you posted in your R Short Cuts, there were a few items that I didn't have.

So, as a quid pro quo to the effort that you so kindly shared, I thought I would share a few of the code snippets and ideas that I have collected over the years with you and your audience.

This code will be provided under what I refer to as the "R-Insight" series under the "Issues" section of your GitHub repository.

It is my sincere hope that you decide to share them either in their original form or edited as you see fit.

In terms of credibility as a viable source in this space, I wrote a book on R which was published in July 2021 titled, "Conquering R Basics." This work can be found on Amazon Books.

If you are interested in any of the topics I referenced earlier, reach out, as I most likely have an article or two that describes it. -BR

R-Insight: Save All Data Objects in a Workspace to a Single File

When working in the R environment, data is created, collected, and used in the form of various list objects, vectors, models, and datasets. By applying the save.image function found in Base R, the entire active workspace can be saved to a single data file. By convention, the file extension is .RData. In RStudio, the workspace can also be saved to an .RData file from the Environment tab.

save.image(file = "my_work_space.RData")

To load the entire workspace back into an RStudio active session, apply the following code:
load("my_work_space.RData")

NOTE: If the file argument contains only a file name and no path, the file is saved in the current working directory (check it with getwd()). On Windows this often defaults to the user's Documents folder.
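A minimal sketch of the full round trip, assuming the session's temporary directory as the target. The object names and the file location are hypothetical. Loading into a fresh environment instead of the global one avoids overwriting objects that happen to share a name.

```r
# Hypothetical objects standing in for a working session
scores <- c(a = 1.5, b = 2.5)
fit_info <- list(model = "lm", n = 10L)

# Build an explicit path so the file does not land in the working directory
ws_file <- file.path(tempdir(), "my_work_space.RData")

# Save every object in the global environment to one file
save.image(file = ws_file)

# Restore into a separate environment; load() returns the restored names
restored <- new.env()
loaded_names <- load(ws_file, envir = restored)

# Inspect what came back without touching the current workspace
print(loaded_names)
```

Objects can then be pulled out one at a time with get("scores", envir = restored).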

R-Insight: Return a Listing of all Functions in a Target R Package

The R code in this snippet allows the user to directly write console output in RStudio to a pre-designated text file. The purpose of this code snippet is to provide an alternative for capturing R data results when such data cannot be stored within a data object. List object data produced by various R functions, for example, are very difficult to convert to data frames, which is necessary if the data needs to be externally accessed and used. The following R code resolves this issue:

This example returns a listing of all R package functions by name and by structure, converting the output to a text file:

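library(dplyr)
# Build the output path in the user's home directory
fp = file.path(path.expand("~"), "dplyr_fcn_list.txt")
# Divert console output to the file (overwrite; do not echo to the console)
sink(fp, append = FALSE, split = FALSE)
# print() is explicit so the listing is also written when the code is sourced
print(lsf.str(pos = "package:dplyr"))
# Restore console output
sink()
# Optional after sink(): closes any other open user connections
closeAllConnections()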

NOTE1: Before executing the code, the target package must be locally installed AND loaded.

NOTE2: The lsf.str function is a variant of ls.str that lists only functions, showing the name and structure (argument list) of each.
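As an alternative sketch, base R's capture.output can write the same kind of listing to a file without opening and closing a sink by hand. The example below targets the stats package only because it is attached in a default session; the output file name is hypothetical.

```r
# Target file (hypothetical name) in the session's temporary directory
out_file <- file.path(tempdir(), "stats_fcn_list.txt")

# capture.output() evaluates the expression, captures what would be printed,
# and writes it to `file`, so no sink() bookkeeping is needed
capture.output(lsf.str(pos = "package:stats"), file = out_file)

# Read the listing back as a character vector, one element per line
fcn_lines <- readLines(out_file)
print(head(fcn_lines, 3))
```

Omitting the file argument returns the captured lines directly as a character vector, which is often enough when the goal is only to get the output into a data object.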

R-Insight: Unique Identifiers

Unique Identifiers are important in data analysis. They are used to uniquely identify a record within a dataset. It may be necessary to create unique identifiers beyond the sequential order of traditional datasets in R where the numeric sequence begins with 1.

There are other ways to create unique, more complex identifiers but this solution provides an economy of code to achieve the task.

This code snippet showcases how to generate unique identifiers using two different methods:

This example uses R's ability to generate temporary file names as a means to extract and generate Unique Identifiers:

  • Example 1:
    library(easyr)
    x = right(replicate(basename(tempfile()), n = 5), 8)

NOTE1: The n argument within the replicate function determines the number of Unique Identifiers to generate.

NOTE2: In this method (Example 1), the second argument of the right function sets how many trailing characters of each temporary file name are kept. The example generates five 8-character results.

  • Example 2:
    library(wakefield)
    x = string(n = 5, length = 8)

The n argument determines the number of Unique Identifiers to generate and the length argument controls the number of characters comprising each Unique Identifier.

NOTE: Example 2 is superior to Example 1 in terms of flexibility because the identifier length can be customized. Example 1 provides Unique Identifiers that cannot exceed 8 characters in length.
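Randomly generated strings are not guaranteed to be distinct, so it can be worth checking for duplicates. The following is a minimal base R sketch, not taken from either package above: make_ids is a hypothetical helper that draws alphanumeric identifiers and redraws any duplicates until all are unique.

```r
# Hypothetical helper: n random alphanumeric identifiers of `len` characters,
# redrawn until no duplicates remain
make_ids <- function(n, len = 8) {
  pool <- c(LETTERS, letters, 0:9)
  # There must be at least n distinct identifiers of this length
  stopifnot(n <= length(pool)^len)
  draw <- function(k) {
    vapply(seq_len(k), function(i) {
      paste(sample(pool, len, replace = TRUE), collapse = "")
    }, character(1))
  }
  ids <- draw(n)
  # Replace only the duplicated entries, then check again
  while (anyDuplicated(ids) > 0) {
    dup <- duplicated(ids)
    ids[dup] <- draw(sum(dup))
  }
  ids
}

set.seed(101)
ids <- make_ids(n = 5, len = 8)
print(ids)
```

The same anyDuplicated check can be applied to the output of Example 1 or Example 2 before the identifiers are attached to a dataset.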

R-Insight: Run-Length Encoding (RLE)

The rle function found in base R returns only the run lengths and their values. A more informative option is the subSeq function found in the doBy package, which captures a series of data points for each run, including the following:

  • First position in the sequence
  • Last position in the sequence
  • Sequence length
  • The midpoint position of a repeating sequence
  • The value being examined

RLE results are captured in a data frame. A Dot Plot has been added to facilitate the visualization of an RLE. In this example, binary values are examined:
library(broman)
library(doBy)
set.seed(7854)
y = sample(x = 0:1, size = 200, replace = TRUE)

# Returns a comprehensive RLE analysis in a data frame
yrle = subSeq(y)
dotplot(group = yrle$value, y = yrle$slength, main = "RLE Binary Dot Plot", xlab = "Value", ylab = "Run Length", jiggle = "fixed", bg = "red")

To get a table summary of the RLE analysis, apply the following code:
table(yrle$value, yrle$slength)
       1    2    3    4    6
  0   34   12    7    4    0
  1   31   12    8    5    1

Two facts are quickly discernible from the RLE analysis:

  1. In the yrle data frame, record 102 (positions 174-179) holds the longest run: six consecutive 1s. The corresponding Dot Plot supports this finding. If one were looking for an outlier pattern, this is it.
  2. Considering all runs in vector y, there are no runs of exactly five values for either 0 or 1.
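For readers who want to see how the per-run fields relate to one another, here is a minimal base R sketch that derives the same kind of columns from rle. It is not doBy's implementation: run_table is a hypothetical helper, and the midpoint convention used here (the floor of the average of first and last) is an assumption that may differ from subSeq.

```r
# Hypothetical helper: per-run summary built on base rle()
run_table <- function(x) {
  r <- rle(x)
  last <- cumsum(r$lengths)        # last position of each run
  first <- last - r$lengths + 1    # first position of each run
  data.frame(
    first = first,
    last = last,
    slength = r$lengths,
    midpoint = floor((first + last) / 2),  # assumed convention
    value = r$values
  )
}

# Small hand-checkable input: runs of 0,0 / 1,1,1 / 0 / 1
y_small <- c(0, 0, 1, 1, 1, 0, 1)
runs <- run_table(y_small)
print(runs)

# Cross-tabulate value against run length, as in the table above
print(table(runs$value, runs$slength))
```

Because the run lengths always sum to the length of the input, sum(runs$slength) == length(y_small) is a quick sanity check on any run summary.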

R-Insight: expand.grid Obsolete

The expand.grid function found in Base R is largely obsolete. A better alternative is the vec_expand_grid function found in the vctrs package, which improves on expand.grid in several ways:

  • Increased process performance
  • Produces sorted output by default
  • Never converts strings to factors
  • Does not add additional attributes
  • Drops NULL inputs
  • Can expand any vector type, including data frames and records

The following example builds a fully crossed dataset from three dimensions: job positions, code provisions, and position categories. Simulated job titles are generated with the charlatan package:

library(charlatan)
library(vctrs)
set.seed(32491)
jb = ch_job(n = 10)
cd = paste0(sample(100:300, size = 3, replace = TRUE), ".", sample(1:8, size = 3, replace = TRUE))
ct = c("TRNG", "ONBRD", "HR")
ds = vec_expand_grid(job = jb, code = cd, cat = ct)

NOTE: When using the vec_expand_grid function, every input must be named, because the names become the column names of the result. If an input is unnamed, the function stops with an error.
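To make the comparison concrete, the sketch below shows the base expand.grid behaviors that the list above contrasts against: row ordering, string handling, and the extra attribute. The input values are hypothetical stand-ins for the simulated data, and only base R is used, so it runs without vctrs or charlatan installed.

```r
# Hypothetical inputs standing in for job titles, codes, and categories
jb <- c("Analyst", "Engineer")
cd <- c("101.3", "250.7", "300.1")
ct <- c("TRNG", "ONBRD", "HR")

# Base R grid; stringsAsFactors = FALSE keeps the columns as character
ds_base <- expand.grid(job = jb, code = cd, cat = ct,
                       stringsAsFactors = FALSE)

print(nrow(ds_base))               # one row per combination: 2 * 3 * 3
print(head(ds_base$job, 4))        # the first column varies fastest
print(names(attributes(ds_base)))  # includes the extra "out.attrs" attribute

# Reorder so the first column varies slowest, the ordering that the list
# above describes as the vec_expand_grid default
ord <- order(match(ds_base$job, jb),
             match(ds_base$code, cd),
             match(ds_base$cat, ct))
ds_sorted <- ds_base[ord, ]
rownames(ds_sorted) <- NULL
print(head(ds_sorted, 4))
```

The number of rows is always the product of the input lengths, so the original example with 10 job titles, 3 codes, and 3 categories yields 90 rows.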

Final Comments on R-Insight Series

With all due respect, I find you to be excessively myopic in your interpretive application of R functions. Your position is that you are only interested in base R functions for the creation and development of production code? That is my interpretation based on your email responses to my R-Insight series.

And you [REDACTED]? I hope your [REDACTED].

You probably don't know this but there is a separate R package that actually provides improved functionality for many of the Base R functions that you are interested in promoting. The functions provided in Base R are nearly a quarter century old but that is what you are promoting on your r-base-shortcuts page? Wow.

I believe your current mindset is grossly marinated in a state of ignorance and static thinking. R has radically changed over the last decade in both its function and its form.

New ways of thinking are being instituted in R on nearly a weekly basis. I am monitoring these changes in near real-time.

I will no longer be contributing ideas to this page as the thinking supporting it is grossly misinformed. The good news is that I will never be competing for a job or a project against you or anyone who thinks like you in such a limited fashion about technology. Anyone who thinks more broadly about these technologies has an extraordinary edge on those who think like you. Your thinking on this matter needs to be seriously adjusted.

And keep promoting the rle function as the function to use for run-length encoding because that function nets you nothing in terms of RLE analysis.

Stay myopic and statically defined by using 25-year-old functions in R.

Note: @nanxstats edited this comment to remove content that violated this project's code of conduct.
