pkglite's Introduction

pkglite

A tool, grammar, and standard to represent and exchange R package source code as text files. Converts one or more source packages to a text file and restores the package structures from the file.

  • To get started, see vignette("pkglite").
  • To generate file specifications, see vignette("filespec").
  • To curate file collections, see vignette("filecollection").
  • The text file format is described in vignette("format").

Installation

You can install the package via CRAN:

install.packages("pkglite")

Or, install from GitHub:

remotes::install_github("Merck/pkglite")

Workflow

library("pkglite")

Pack one R package:

"/path/to/package/" %>%
  collate(file_default()) %>%
  pack()

Pack multiple R packages:

pack(
  "/path/to/pkg1/" %>% collate(file_default()),
  "/path/to/pkg2/" %>% collate(file_default()),
  output = "/path/to/pkglite.txt"
)

Unpack one or more packages:

"/path/to/pkglite.txt" %>%
  unpack(output = "/path/to/output/")

Citation

If you use this software, please cite it as below.

Zhao, Y., Xiao, N., Anderson, K., & Zhang, Y. (2023). Electronic common technical document submission with analysis using R. Clinical Trials, 20(1), 89–92. https://doi.org/10.1177/17407745221123244

A BibTeX entry for LaTeX users is

@article{zhao2023electronic,
  title   = {Electronic common technical document submission with analysis using {R}},
  author  = {Zhao, Yujie and Xiao, Nan and Anderson, Keaven and Zhang, Yilong},
  journal = {Clinical Trials},
  volume  = {20},
  number  = {1},
  pages   = {89--92},
  year    = {2023},
  doi     = {10.1177/17407745221123244}
}

pkglite's Issues

Fix issues in HTML validation

From CRAN maintainers:

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_pkglite.html.

In particular, please see the "Found the following HTML validation problems" NOTEs in the "HTML version of manual" check for the r-devel debian checks results.

R 4.2.0 switched to use HTML5 for documentation pages. Now validation using HTML Tidy finds problems in the HTML generated from your Rd files of the form

  • <big> element removed from HTML5
  • <center> element removed from HTML5
  • <img> attribute "align" not allowed for HTML5
  • <img> attribute "hspace" not allowed for HTML5
  • <img> attribute "width" has invalid value "120px"
  • <img> attribute "width" has invalid value "480px"
  • <img> attribute "width" has invalid value "50px"
  • <img> attribute "width" has invalid value "72px"

For the first four, please see https://html.spec.whatwg.org/#obsolete-but-conforming-features for info on these: in principle, all can be fixed by using style attributes, e.g.

style='text-align: right;'

instead of align='right' etc., which will work for both the new and old ways of converting Rd to HTML.

For the second four, simply drop the px units: the HTML5 standard asks for a non-negative integer implied to be in CSS pixels.

Note that the problems are found in Rd files auto-generated with roxygen2: to fix it might suffice to re-generate these using the current CRAN version of roxygen2.
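
As a rough illustration of the px fix described above, the sketch below strips the px suffix from width attributes in a string of HTML or Rd markup using base R. The helper name drop_px_units() is hypothetical and is not part of pkglite or roxygen2. It assumes the attribute value is quoted and consists only of digits followed by px.

```r
# Hypothetical helper (not part of pkglite or roxygen2):
# rewrite width="120px" as width="120" and leave everything else untouched.
drop_px_units <- function(x) {
  # \\1 is the opening quote character, \\2 is the run of digits.
  gsub("width=([\"'])([0-9]+)px\\1", "width=\\1\\2\\1", x, perl = TRUE)
}

before <- "<img src=\"logo.png\" width=\"120px\" alt=\"logo\" />"
after <- drop_px_units(before)
print(after)
```

Values without a px suffix, such as percentages, are not matched and pass through unchanged.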

Clarify output path logic for unpack in vignettes

pkglite follows a simple rule when unpacking packages:

  • It creates one directory per package under the output directory, named after the package name parsed from each package's DESCRIPTION file, and places the content of each package in it.
  • This design implies that the simplest way to avoid confusion is to keep package names and folder names identical.
  • Users can easily adjust the directory names and hierarchy in userland, but that belongs to their own business logic.
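
The rule above can be sketched in base R. This is only an illustration of the naming logic, not pkglite's actual implementation: the folder name my-folder, the package name demo, and the output path are made up for the example.

```r
# Illustration only: the unpack target is derived from the DESCRIPTION file,
# not from the name of the folder that held the package source.
src <- file.path(tempdir(), "my-folder")  # hypothetical source folder
dir.create(src, showWarnings = FALSE)
writeLines(
  c("Package: demo", "Version: 0.0.1"),
  file.path(src, "DESCRIPTION")
)

# Parse the package name from the DESCRIPTION file.
desc <- read.dcf(file.path(src, "DESCRIPTION"), fields = "Package")
pkg_name <- unname(desc[1, 1])

# The package content would land under <output>/<package name>.
output <- "/path/to/output"
target <- file.path(output, pkg_name)
print(target)
```

Here the source folder is called my-folder but the target directory ends in demo, which is the kind of mismatch that keeping folder and package names identical avoids.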

Release pkglite 0.2.0

Prepare for release

  • Check current CRAN check results
  • Check licensing of included files
  • Review pkgdown reference index for, e.g., missing topics
  • Bump version
  • Update cran-comments.md (optional)
  • Update NEWS.md
  • Review pkgdown website
  • urlchecker::url_check()
  • Check with local machine
  • Check with GitHub Actions
  • Check with win-builder

Submit to CRAN

  • Draft GitHub release
  • Submit to CRAN via web form
  • Approve emails

Wait for CRAN

  • Accepted 🎉
  • Post on r-packages mailing list
  • Tweet

Release pkglite 0.2.1

Prepare for release

  • Check current CRAN check results
  • Check licensing of included files
  • Review pkgdown reference index for, e.g., missing topics
  • Bump version
  • Update cran-comments.md (optional)
  • Update NEWS.md
  • Review pkgdown website
  • urlchecker::url_check()
  • Check with local machine
  • Check with GitHub Actions
  • Check with win-builder

Submit to CRAN

  • Draft GitHub release
  • Submit to CRAN via web form
  • Approve emails

Wait for CRAN

  • Accepted 🎉

seg_char and any seq(...) - dependent function

It would be helpful to include checks before sending code into a seq(...) function.
In my case I had an empty file in a folder and got an error while running the pack function:

Error in seq.default(from = 1L, to = nchars, by = nmax) :
wrong sign in 'by' argument

I traced it all the way to

seg_char <- function(x, nmax) {
  nchars <- length(x)
  pos <- seq(from = 1L, to = nchars, by = nmax)
  short <- nchars <= nmax
  nlines <- if (short) 1L else length(pos)
  pos_start <- if (short) 1L else pos[seq_len(nlines)]
  pos_end <- if (short) nchars else c(pos[2L:nlines] - 1L, nchars)
  lapply(seq_len(nlines), function(i) x[pos_start[i]:pos_end[i]])
}

Since the binary file had 0 bytes and no content, the call seq(from = 1L, to = 0, by = 64) gave the error. I propose building in a safety net that checks whether files have content (e.g., remove empty files in a sanitize function, or proceed only if nchars > 0 and otherwise give an informative error so the user can investigate).
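
One possible shape for such a guard is sketched below. The name seg_char_safe() is hypothetical and this is not the fix adopted in pkglite. It returns an empty list for empty input, which is one choice among several; raising an informative error would be equally reasonable.

```r
# Hypothetical guarded variant of seg_char(): split x into chunks of at most
# nmax elements, returning an empty list instead of failing on empty input.
seg_char_safe <- function(x, nmax) {
  nchars <- length(x)
  if (nchars == 0L) {
    return(list())  # nothing to segment, so never call seq() with to = 0
  }
  pos_start <- seq(from = 1L, to = nchars, by = nmax)
  pos_end <- pmin(pos_start + nmax - 1L, nchars)
  lapply(seq_along(pos_start), function(i) x[pos_start[i]:pos_end[i]])
}

empty <- seg_char_safe(raw(0), 64L)  # input that triggered the reported error
chunks <- seg_char_safe(1:5, 2L)     # three chunks: 1:2, 3:4, 5
print(length(empty))
print(chunks)
```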

add a `file_all` function

Should we add a file_all function?

It would be helpful when we need to move all files from a sub-folder (e.g., the inst/ folder).

Consider exporting `read_pkglite()`

It feels like it would generate more benefits than harms if we make read_pkglite() an exported function, because people might need to parse pkglite.txt directly sometimes. This would avoid the use of :::.

What do you think? @elong0527

Guess filetype for files without extensions

When evaluating file specifications to create file collections, we should follow this:

  • If a file has a known extension, mark it as text or binary based on the dictionary (implemented)
  • Include files that do not have a file extension, and files with extensions not covered by the dictionary
  • Document this flow in the specification section
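
A minimal sketch of this flow in base R follows. The function name guess_format(), the short extension lists, and the NUL-byte fallback are all assumptions made for illustration. The issue does not specify how files outside the dictionary should be classified, and pkglite's own dictionary is larger.

```r
# Hypothetical classifier: use the extension dictionary first, then fall back
# to a simple content check for files the dictionary does not cover.
guess_format <- function(path,
                         ext_text = c("r", "rd", "md", "txt"),
                         ext_binary = c("rds", "rda", "png")) {
  ext <- tolower(tools::file_ext(basename(path)))
  if (ext %in% ext_text) return("text")
  if (ext %in% ext_binary) return("binary")
  # Assumed heuristic: a NUL byte in the first 1024 bytes suggests binary.
  bytes <- readBin(path, what = "raw", n = 1024L)
  if (any(bytes == as.raw(0L))) "binary" else "text"
}

# Files without an extension are classified by content.
f_text <- tempfile()
writeLines("Package: demo", f_text)
f_bin <- tempfile()
writeBin(as.raw(c(0x50, 0x00, 0x01)), f_bin)

print(guess_format(f_text))
print(guess_format(f_bin))
```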

Pack one file

When packing only one file, I get an error that I traced back to the following piece of code:

idx_content_end <- c(idx_pkg[2L:nblocks] - 2L, nlines)

This line of code identifies the end of each block. If you have three files packed in the .txt file it will look for:

  • the start of the second block of content = end of first block
  • the start of the third block of content = end of second block
  • the end of the .txt file = end of third block

When you only pack one file, this logic does not work because nblocks is 1.

This can be solved by, e.g.,

if (nblocks == 1) {
  idx_content_end <- c(nlines)
} else if (nblocks > 1) {
  idx_content_end <- c(idx_pkg[2L:nblocks] - 2L, nlines)
}

It would be good to include a unit test for this.
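
The root cause is that 2L:nblocks counts downwards when nblocks is 1, giving c(2L, 1L) rather than an empty index. The sketch below shows an alternative that avoids the branch by using negative indexing. The wrapper content_end() is a hypothetical name used only for this example, and the index values are made up.

```r
# In R, 2L:1L counts down instead of producing an empty sequence.
print(2L:1L)

# Hypothetical helper: the end of each block is two lines before the start
# of the next block, and the last block ends at the last line of the file.
content_end <- function(idx_pkg, nlines) {
  # idx_pkg[-1L] is integer(0) when there is a single block.
  c(idx_pkg[-1L] - 2L, nlines)
}

one_block <- content_end(1L, 10L)
three_blocks <- content_end(c(1L, 8L, 20L), 30L)
print(one_block)
print(three_blocks)
```

These two calls could also serve as the basis for the unit test suggested above.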

documentation

I don't really understand why I would use this. Can you include some info in the README or vignettes discussing why it is useful? Also, the README should say which vignette to read first.

Release pkglite 0.2.2

Prepare for release

  • Check current CRAN check results
  • Check licensing of included files
  • Review pkgdown reference index for, e.g., missing topics
  • Bump version
  • Update cran-comments.md (optional)
  • Update NEWS.md
  • Review pkgdown website
  • urlchecker::url_check()
  • Check with local machine
  • Check with GitHub Actions
  • Check with win-builder

Submit to CRAN

  • Draft GitHub release
  • Submit to CRAN via web form
  • Approve emails

Wait for CRAN

  • Accepted 🎉

Add SAS binary file formats to dictionary

I would like to add SAS binary data extensions to ext_binary() in the dictionary, as some packages might contain such datasets.

The current binary file extensions I consider include

  • .sas7bdat
  • .sas7bcat
  • .xpt
  • .xpt5
  • .xpt8

Any other recommendations? @elong0527

Note that some unit tests might need an update for this change.

`_pkglite.yml`

It would be great to support defining the packing scope in a configuration file.

One of the potential benefits is automating the packing action in a CI/CD workflow, without modifying the pipeline code.

For the file format, my personal preference is a YAML file like in pkgdown, but see lintr for the syntax (not using the DCF format).

pattern_file_sanitize() - do we need the extra /?

I was trying to remove some folders before packing but ran into issues. Then I realized you make the assumption that the files are already residing in a subfolder.

sanitize.file_collection <- function(x) {
  pkg_name <- x$pkg_name
  df <- x$df
  df <- df[!grepl(pkglite:::cat_patterns(pattern_file_sanitize()), df$"path_abs"), ]
  new_file_collection(pkg_name, df)
}

pattern_file_sanitize()
[1] "/\\.DS_Store$"     "/Thumbs\\.db$"     "/\\.git$"          "/\\.svn$"          "/\\.hg$"           "/\\.Rproj\\.user$" "/\\.Rhistory$"    
[8] "/\\.RData$"        "/\\.Ruserdata$"   


grepl("/\\.git", ".git/blah") # trying to remove the .git folder, but a relative path has no leading "/"
[1] FALSE
grepl("/\\.git", "/.git/blah") # matches only because .git is preceded by "/", as in an absolute path
[1] TRUE

It would be more intuitive (at least to me) not to make this assumption and to change the patterns to the ones below, so the function finds all such files:

[1] "\\.DS_Store$"     "Thumbs\\.db$"     "\\.git$"          "\\.svn$"          "\\.hg$"           "\\.Rproj\\.user$" "\\.Rhistory$"    
[8] "\\.RData$"        "\\.Ruserdata$"   
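
A middle ground between the two sets of patterns is to anchor each name at either the start of the path or a path separator. The sketch below shows this for .git only, in base R. The pattern is an assumption made for illustration and is not what pattern_file_sanitize() returns.

```r
# Assumed pattern: ".git" as a complete path component, at the start of the
# path or after "/", and followed by "/" or the end of the string.
pattern_git <- "(^|/)\\.git(/|$)"

paths <- c(
  ".git/blah",          # relative path into the .git folder
  "/.git/blah",         # absolute-style path into the .git folder
  "pkg/.git",           # the .git folder itself
  ".github/workflows",  # different folder that starts with ".git"
  "R/foo.gitignore"     # file name that merely contains ".git"
)

matched <- grepl(pattern_git, paths)
print(data.frame(path = paths, matched = matched))
```

Unlike an unanchored pattern, this matches the first three paths without also catching .github or file names that only contain .git.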

Remove cli dependency

r-lib/cli is now using C to format certain things and thus becomes a dependency that requires compilation.

To minimize the number of dependencies, we should consider replacing it with wrappers created around r-lib/crayon. This would also avoid one other dependency of cli: tidyverse/glue.
