randomnames's Introduction

randomNames

Overview

The randomNames package contains a single function, randomNames, which creates random first and/or last names that are appropriate to a given gender and ethnicity. Names are sampled in proportion to their frequency in a large-scale database.

Installation

From CRAN

To install the latest stable release of randomNames from CRAN:

> install.packages("randomNames")

From GitHub

To install the development release of randomNames from GitHub:

> devtools::install_github("CenterForAssessment/randomNames")

Usage

> randomNames(5) ## 5 last, first names
[1] "Mossberg, Cassie"  "Mendiaz, Victoria" "Miner, Cassidy"    "Austin, Brook"     "Babcock, Lloyd"

> randomNames(5, gender=1) ## 5 female last, first names
[1] "Bruckner, Birva"   "Caringer, Madelyn" "Mendoza, Rebecca"  "el-Haque, Jaleela" "Williams, Miranda"

> randomNames(5, gender=0) ## 5 male last, first names
[1] "al-Salam, Rida"    "Debus, Kai"        "al-Aly, Jaabir"    "Garces, Markus"    "Robertson, Trevor"

> randomNames(5, gender=0, ethnicity=3) ## 5 African American, male last, first names
[1] "Bashir, Shaquille" "Ursery, Keilan"    "Marlow, Marvin"    "Bell, Daishavon"   "Hammond, Kyle"

> randomNames(5, gender=1, ethnicity=6, which.names="first") ## 5 Middle Eastern, female first names
[1] "Jawhara"  "Raaniya"  "Ghaada"   "Ghazaala" "Raabia"

Contributors

The randomNames Package is crafted with ❤️ by:

I love feedback and am happy to answer questions.

randomnames's People

Contributors

adamvi, dbetebenner, gitter-badger, joshwlambert

randomnames's Issues

Plans to expand database of names?

Is there any plan to increase the number of names? I often exceed the number of names (when sampling without replacement) and it would be great to know if there are plans to add more, or are you open to accepting contributions (for example for ethnicities currently not included)?

Please add argument `initial.letter =`

Hello, I really like this package; it has automated a lot of my work. It would be perfect if I could choose random names that start with a specific letter through an argument to the function, for example:

randomNames(n=5, gender=1, ethnicity = 4, which.names="first", initial.letter="M")
# [1] "Maria" "Magdalena" "Margarita" "Margot" "Milagros"

Thank you, and I look forward to an answer. Greetings from Peru.
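Until such an argument exists, one possible workaround is to draw a larger batch of names and keep only those with the desired initial. The sketch below is not part of randomNames: the helper names_starting_with() is hypothetical, and a small hand-written vector stands in for a large draw from the package.

```r
# Hypothetical helper (not part of randomNames): keep names starting with a letter.
names_starting_with <- function(candidates, initial.letter, n) {
  # Strip surrounding whitespace first so padded names are matched correctly
  candidates <- trimws(candidates)
  hits <- unique(candidates[startsWith(toupper(candidates), toupper(initial.letter))])
  if (length(hits) < n) {
    warning("Only ", length(hits), " matching names found; returning all of them.")
  }
  head(hits, n)
}

# Stand-in pool; in practice this could be a large draw such as
# randomNames::randomNames(5000, gender = 1, ethnicity = 4, which.names = "first")
pool <- c("Maria", "Ana", "Magdalena", "Lucia", "Margarita", "Rosa", "Milagros")
m_names <- names_starting_with(pool, "M", 3)
print(m_names)
```

Filtering after the draw changes nothing inside the package, but it may return fewer than n names if the batch contains too few matches.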

Handle NAs in randomNames()'s gender argument

When calling randomNames(gender = g) on a gender vector that contains NAs, the generated names no longer represent gender correctly:

# Define gender vector
> (g <- rep(0:1, each = 3))
[1] 0 0 0 1 1 1

# Gender is correctly represented
> randomNames::randomNames(gender = g, which.names = "first")
[1] "Samuel"   "Carlos"   "Theodore" "Emlynn"   "Briana"   "Deborah" 

# Include NA in gender vector
> g[3] <- NA
> randomNames::randomNames(gender = g, which.names = "first")
[1] "Maleeha" "Sean"    "Sad"     "Sang"    "Labeeba" "Carter" 

# First gender is 0 (male)
> g[1]
[1] 0

# "Maleeha" is not in any male first names list
> fn_male <- grep("^first.*g0$", names(randomNames::randomNamesData))
> sapply(fn_male, function(i)  "Maleeha" %in% names(randomNames::randomNamesData[[
+     names(randomNames::randomNamesData)[i]
+     ]])
+ )
[1] FALSE FALSE FALSE FALSE FALSE FALSE
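A possible user-side workaround, sketched here under assumptions, is to generate names only for the non-missing entries and leave NA in the remaining positions. The wrapper names_skipping_na() and the toy generator are hypothetical; the real generator would be a call to randomNames() as shown in the comment.

```r
# Hypothetical wrapper: call a name generator only on the non-missing entries.
names_skipping_na <- function(gender, generator) {
  out <- rep(NA_character_, length(gender))
  known <- !is.na(gender)
  if (any(known)) {
    out[known] <- generator(gender[known])
  }
  out
}

# Toy generator standing in for
# function(g) randomNames::randomNames(gender = g, which.names = "first")
toy_generator <- function(g) ifelse(g == 0, "Samuel", "Briana")

g <- c(0, 0, NA, 1, 1, 1)
result <- names_skipping_na(g, toy_generator)
print(result)
```

This keeps the output the same length as the input, so the names can still be assigned back to a data frame column.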

Partial argument matching in `rep()`

First and foremost, just to say that I really like the {randomNames} R package. It has been really useful for a package we're developing in the Epiverse-TRACE initiative: {simulist}.

Within that package we use a testing setup option that automatically checks whether we are using partial argument matching (i.e. only matching some of the argument name within a function which is then matched to the full argument name by R). This check also picks up if dependencies used within the package are using partial argument matching, and it has detected some in {randomNames}.

Specifically, the rep() function, where currently, length is being used and partially matched to length.out. It would be great if this could be updated to avoid partial matching. I've added some links at the bottom of this issue to why using partial matching can make code brittle.
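To illustrate the point, the following base R sketch compares a partially matched call with the fully named one and shows the warnPartialMatchArgs option, which makes R warn about partial matches. The function f() is only an illustration and does not come from {randomNames}.

```r
# Both calls return the same vector, but the first relies on partial matching
partial <- rep(c("a", "b"), length = 5)      # 'length' is expanded to 'length.out'
full    <- rep(c("a", "b"), length.out = 5)  # full argument name
same_result <- identical(partial, full)
print(same_result)

# With this option set, R warns when a call partially matches an argument name
old <- options(warnPartialMatchArgs = TRUE)
f <- function(length.out) length.out         # illustrative function only
msg <- tryCatch(f(length = 3), warning = function(w) conditionMessage(w))
options(old)                                 # restore the previous setting
print(msg)
```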

I'm happy to make a PR from a fork of the package to make the recommended changes.

Links:

New package version & CRAN release?

Thank you for your collaboration in getting #82 merged and in responding to the issues I've raised. Would it be possible to make a new version release and submit this new version to CRAN?

This would assist getting a package I am working on onto CRAN (see epiverse-trace/simulist#1).

I've completed the reverse dependency check and there were no errors and everything passed. I can paste the output logs of this revdep_check() on this issue if you would like.

I was unable to run the R CMD check due to the inst/doc directory.

Please let me know if you would be open to the idea of a new release and a submission of this new release to CRAN, I am happy to assist in any way possible.

Leading whitespaces in some names

When running the command
randomNames(1, which.names = "first", gender = 1)
some names are returned which have leading whitespaces. E.g. " Huda" and " Muzna".

Even though this can be fixed easily with trimws it is unexpected and maybe problematic to some people if these names are also returned without whitespace sometimes.
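As a minimal sketch of the trimws workaround, using the two names reported above plus one name without padding:

```r
# Names as they might be returned, some with a leading space
raw_names <- c(" Huda", " Muzna", "Maria")

# Remove leading whitespace only; names without padding are unchanged
clean_names <- trimws(raw_names, which = "left")
print(clean_names)
```

The same call can wrap the generator directly, for example trimws(randomNames(1, which.names = "first", gender = 1)).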

Anonymize name across multiple records

Hi! I am successfully using this great package in data frames with one record per person. I would like to use it when I have multiple records.
Here's a quick example:

ID  NAME  YEAR  GRADE  ANONYMIZED NAME
45  Sue   2023  5      Beth
45  Sue   2024  6      Kayla

Is there a method to assign the same anonymized name to a person with the same unique identifier such as ID?
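One way this might be approached, outside the package itself, is to generate a single name per unique ID and then map it back to every record with match(). The data and the alias pool below are made up for illustration; the pool stands in for a call to randomNames().

```r
# Toy records: ID 45 appears twice and should receive the same alias both times
records <- data.frame(
  ID    = c(45, 45, 46),
  NAME  = c("Sue", "Sue", "Tom"),
  YEAR  = c(2023, 2024, 2023),
  GRADE = c(5, 6, 5)
)

ids <- unique(records$ID)

# Stand-in for randomNames::randomNames(length(ids), which.names = "first")
set.seed(1)
alias_pool <- c("Beth", "Kayla", "Jordan", "Casey")
lookup <- data.frame(ID = ids, ANONYMIZED_NAME = sample(alias_pool, length(ids)))

# Map each record to the alias of its ID
records$ANONYMIZED_NAME <- lookup$ANONYMIZED_NAME[match(records$ID, lookup$ID)]
print(records)
```

Keeping the lookup table around also makes it possible to reuse the same aliases when new records for an existing ID arrive later.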

firstnames weighted by birth year?

Hi - this is a neat package for a specific purpose. One possible nice feature: could you add the parameters birth_year_start = 1960 (default) and birth_year_end = 2000 (default)? The user could then change these parameters to get first names appropriate for people born between 1880-1890, or 2000-2010. Ideally this would use the weighted frequency of each first name by gender for the included range of birth years.
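The idea can be sketched in base R as frequency-weighted sampling restricted to a range of years. Everything here is an assumption for illustration: the function sample_names_by_birth_year() is hypothetical, and the frequency table uses made-up counts rather than any data shipped with randomNames.

```r
# Toy frequency table (made-up counts): one row per name and birth year
freq <- data.frame(
  name  = c("Mildred", "Mildred", "Jennifer", "Jennifer"),
  year  = c(1885, 1985, 1885, 1985),
  count = c(900, 10, 0, 1200)
)

# Hypothetical function: sample names weighted by their counts within a year range
sample_names_by_birth_year <- function(freq, n,
                                       birth_year_start = 1960,
                                       birth_year_end = 2000) {
  in_range <- freq[freq$year >= birth_year_start & freq$year <= birth_year_end, ]
  totals <- tapply(in_range$count, in_range$name, sum)  # total count per name
  totals <- totals[totals > 0]                          # drop names never observed
  sample(names(totals), n, replace = TRUE, prob = as.numeric(totals))
}

set.seed(42)
old_names <- sample_names_by_birth_year(freq, 5,
                                        birth_year_start = 1880,
                                        birth_year_end = 1890)
print(old_names)
```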

data source?

Hi, this is very useful. Could you let me know what was the source dataset(s) used to create the name distributions?

regards
Lucas

Uninformative error message when exhausting names

It seems that when the available names are exhausted while using randomNames() (with sample.with.replacement = FALSE), it gives an uninformative error message about sampling. It would be great if the {randomNames} package could provide the user with a custom, informative error message when the requested number of names is too large. This error message could also suggest setting sample.with.replacement to TRUE.

Here is a reprex to show an example

library(randomNames)
set.seed(1)
gender <- rep(c("M", "F"), 2525)
names <- randomNames::randomNames(
    which.names = "both",
    name.sep = " ",
    name.order = "first.last",
    gender = gender,
    sample.with.replacement = FALSE
)
str(names)
#>  chr [1:5050] "Sebastian Clayton" "Melisa White" "Eli Jackson" "Malisse Ha" ...

gender <- rep(c("M", "F"), 3000)
names <- randomNames::randomNames(
    which.names = "both",
    name.sep = " ",
    name.order = "first.last",
    gender = gender,
    sample.with.replacement = FALSE
)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'

Created on 2024-01-18 with reprex v2.0.2
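The kind of check being requested could look roughly like the sketch below. It operates on a plain character vector rather than on the package internals, and the helper sample_names_checked() is hypothetical.

```r
# Hypothetical helper illustrating the requested behaviour on a plain vector
sample_names_checked <- function(pool, n, sample.with.replacement = TRUE) {
  if (!sample.with.replacement && n > length(pool)) {
    stop(
      "Requested ", n, " names but only ", length(pool), " are available ",
      "without replacement. Consider sample.with.replacement = TRUE.",
      call. = FALSE
    )
  }
  sample(pool, n, replace = sample.with.replacement)
}

pool <- c("Clayton", "White", "Jackson")

# Capture the error message instead of stopping the script
msg <- tryCatch(
  sample_names_checked(pool, 5, sample.with.replacement = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```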

Random Error with large samples without replacement

I want to generate many (9000) unique names to replace human unfriendly uuid-numbers. I wanted to extend the random name generator to enable infinite unique names by simply adding an integer when the maximum number of random names is reached.

In writing the function, I realised that I can't find a hard upper limit on the number of names I can generate without replacement: I get an error at different values of n.

For example below, the function returns an error on the first run, but is successful on the second, third and fourth try.

Can you elaborate on this?

library(randomNames)
#> Warning: package 'randomNames' was built under R version 3.6.3

set.seed(1)
one <- randomNames(5000, sample.with.replacement = FALSE)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
two <- randomNames(5000, sample.with.replacement = FALSE)
thr <- randomNames(5000, sample.with.replacement = FALSE)
fou <- randomNames(5000, sample.with.replacement = FALSE)

Created on 2020-03-18 by the reprex package (v0.3.0)
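The idea of appending an integer once names start to repeat can be sketched with base R's make.unique(), which adds a numeric suffix to repeated strings. The small pool below is a stand-in for the finite name list; sampling with replacement avoids the error entirely, and the suffixes restore uniqueness.

```r
set.seed(1)

# Stand-in for a finite list of names
pool <- c("Ha, Malisse", "White, Melisa", "Jackson, Eli")

# Draw more names than the pool holds, so repeats are unavoidable
drawn <- sample(pool, 6, replace = TRUE)

# Repeated names receive a numeric suffix, first occurrences stay unchanged
unique_labels <- make.unique(drawn, sep = " ")
print(unique_labels)
```

With the real package, drawn would come from randomNames(9000, sample.with.replacement = TRUE).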

randomNames(0) returns more than 'n' random first and/or last names.

The documentation states that the argument of randomNames(n) indicates "how many names to produce". In fact, the function always returns at least one name, which is contrary to the documentation. This affects the correctness of the following examples:

names <- append(names, randomNames::randomNames(num - length(names)))
test_that("test number of generated names", {
  expect_equal(10, length(randomNames(10)))
  expect_equal(1,  length(randomNames(1)))
  expect_equal(0,  length(randomNames(0)))
})

As of randomNames 0.1-0.0, the documentation says

       n: OPTIONAL. Integer indicating how many name to produce. Best
          to use when no gender or ethnicity data is provided and one
          simple wants ‘n’ random first and/or last names.
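Until the behaviour changes, a thin wrapper can return an empty vector for n = 0. This is only a sketch: random_names_n() is a hypothetical helper, and the toy generator stands in for randomNames::randomNames.

```r
# Hypothetical wrapper: return an empty vector when zero names are requested
random_names_n <- function(n, generator) {
  stopifnot(length(n) == 1, n >= 0)
  if (n == 0) {
    return(character(0))
  }
  generator(n)
}

# Toy generator standing in for randomNames::randomNames
toy_generator <- function(n) rep("Doe, Jane", n)

print(length(random_names_n(0, toy_generator)))
print(length(random_names_n(10, toy_generator)))
```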

set.seed() only works for the first row of a dataframe.

Neat package.

One minor thing, running this will result in a different person for the second row of the dataframe. The seed is only respected for the first run.

set.seed(1842)
randomNames(2, which.names = "both", return.complete.data = T)

The workaround:
library(dplyr)  # provides bind_rows()
set.seed(1842)
df1 <- bind_rows(
  randomNames(1, gender = TRUE, ethnicity = TRUE, which.names = "both", return.complete.data = TRUE),
  randomNames(1, gender = TRUE, ethnicity = TRUE, which.names = "both", return.complete.data = TRUE),
  randomNames(1, gender = TRUE, ethnicity = TRUE, which.names = "both", return.complete.data = TRUE)
)

Any thoughts on allowing us to set the seed so we can always reproduce the same set of names?
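A quick way to check whether a particular call is reproducible under a seed is to run it twice with the same seed and compare the results. The helper same_with_seed() below is hypothetical, and the toy generator stands in for a call such as function() randomNames(2, which.names = "both", return.complete.data = TRUE).

```r
# Hypothetical helper: run a generator twice under the same seed and compare
same_with_seed <- function(generator, seed) {
  set.seed(seed)
  first <- generator()
  set.seed(seed)
  second <- generator()
  identical(first, second)
}

# Toy generator standing in for a randomNames() call
toy_generator <- function() sample(c("Ana", "Ben", "Cy", "Di"), 2)

reproducible <- same_with_seed(toy_generator, 1842)
print(reproducible)
```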

`sample.with.replacement = FALSE` across ethnicities/genders

I have a possibly annoying feature request: Would it be possible to make sample.with.replacement = FALSE work across ethnicities/genders?

I wanted a list of randomly generated, unique names, but had to use a workaround with unique() to get it to work.

library(randomNames)
set.seed(7)

# expected unique names, but some are duplicated
random_names <- randomNames(100, which.names = 'first',
                            sample.with.replacement = FALSE)
any(duplicated(random_names))
#> [1] TRUE

# by contrast, it works for a single ethnicity/gender
unique_random_names <- randomNames(100, which.names = 'first',
                                   sample.with.replacement = FALSE, ethnicity = 1, gender = 1)
any(duplicated(unique_random_names))
#> [1] FALSE
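The unique() workaround mentioned above can be sketched as a loop that keeps drawing until enough distinct names have been collected. The helper draw_unique() is hypothetical, and the toy generator stands in for function(k) randomNames(k, which.names = "first").

```r
# Hypothetical helper: draw repeatedly and de-duplicate until n unique names exist
draw_unique <- function(n, generator, max_rounds = 20) {
  out <- character(0)
  for (i in seq_len(max_rounds)) {
    # Request only as many names as are still missing
    out <- unique(c(out, generator(n - length(out))))
    if (length(out) >= n) {
      return(out[seq_len(n)])
    }
  }
  stop("Could not collect ", n, " unique names in ", max_rounds, " rounds.")
}

# Toy generator that can return duplicates, standing in for a randomNames() call
set.seed(7)
toy_generator <- function(k) sample(sprintf("Name%02d", 1:50), k, replace = TRUE)

unique_names <- draw_unique(30, toy_generator)
print(any(duplicated(unique_names)))
```

The max_rounds limit is a safeguard so the loop stops with an error if the underlying pool is too small to ever supply n distinct names.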
