rethnicity's Introduction

rethnicity

The goal of rethnicity is to provide a method to predict ethnicity from names of people.

WARNING!

I created this package hoping to help applied researchers with their studies of ethnic bias and discrimination, and potentially to help eliminate racial and ethnic disparities. By using this package, you agree to the following:

  1. You will NOT use this package for purposes other than academic research.
  2. You will NOT disclose to the public the ethnic groups predicted for any names data you might have.
  3. You will NOT discriminate against anyone on the basis of race or color by using the methods provided by this package.
  4. You understand that the method cannot make predictions with 100% accuracy, and you should be cautious about the results.
  5. You will not use the information to study individuals, but rather to study populations in the aggregate.

Again, please use the package responsibly and refer to the methodology paper for details.
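As an illustration of point 5, one way to work at the population level is to average predicted probabilities across names rather than attaching a label to each person. The sketch below uses a toy table with hypothetical values; the `prob_*` column names are an assumption about the shape of the package output and should be checked against the vignette.

```r
# Toy table with hypothetical values (NOT real predictions).
# Column names are assumed to mirror the package output; verify before use.
pred <- data.frame(
  prob_asian    = c(0.10, 0.70, 0.05),
  prob_black    = c(0.20, 0.10, 0.05),
  prob_hispanic = c(0.10, 0.10, 0.80),
  prob_white    = c(0.60, 0.10, 0.10)
)

# Average each probability column over all names to get expected group
# shares for the population, without labeling any individual.
shares <- colMeans(pred)
print(round(shares, 3))
```

Averaging probabilities keeps the uncertainty of each prediction in the aggregate estimate, which taking the single most likely label per name would discard.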

Installation

I recommend using the wonderful package manager pak to install this package:

# first install `pak` if not yet installed
# install.packages("pak")

# install the CRAN version
pak::pkg_install("rethnicity")

# or install the GitHub development version
pak::pkg_install("fangzhou-xie/rethnicity")

Of course, you can also install the package the traditional way. Install the released version of rethnicity from CRAN with:

install.packages("rethnicity")

Or the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("fangzhou-xie/rethnicity")

How to use this package?

There is a vignette that discusses how to use this package.
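A minimal usage sketch follows. The `predict_ethnicity(firstnames = ..., lastnames = ...)` call form and the `method` values `"fullname"` and `"lastname"` are the ones quoted in the issue reports on this page; calling the last-name method with only `lastnames` is an assumption, so check the vignette for the exact signature. The example names are purely illustrative.

```r
# Names must be plain character vectors of equal length.
first <- c("Alan", "Ada")
last  <- c("Turing", "Lovelace")

if (requireNamespace("rethnicity", quietly = TRUE)) {
  # Full-name method: uses both first and last names.
  pred_full <- rethnicity::predict_ethnicity(
    firstnames = first, lastnames = last, method = "fullname"
  )
  # Last-name method (assumed to need only `lastnames`).
  pred_last <- rethnicity::predict_ethnicity(
    lastnames = last, method = "lastname"
  )
  print(pred_full)
  print(pred_last)
} else {
  message("rethnicity is not installed; see the Installation section.")
}
```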

Documentation on Methodology

The complete description of the methodology is available on arXiv and was published in SoftwareX. Please cite it as:

@article{xie2022,
  title = {Rethnicity: {{An R}} Package for Predicting Ethnicity from Names},
  shorttitle = {Rethnicity},
  author = {Xie, Fangzhou},
  year = {2022},
  month = jan,
  journal = {SoftwareX},
  volume = {17},
  pages = {100965},
  issn = {2352-7110},
  doi = {10.1016/j.softx.2021.100965},
}

@article{xie2021,
  title = {Rethnicity: Predicting {{Ethnicity}} from {{Names}}},
  shorttitle = {Predicting {{Ethnicity}} from {{Names}} with Rethnicity},
  author = {Xie, Fangzhou},
  year = {2021},
  month = sep,
  journal = {arXiv:2109.09228 [cs]},
  eprint = {2109.09228},
  eprinttype = {arxiv},
}

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This license was chosen to prohibit commercial usage, while still being free and accessible for non-commercial academic uses.

rethnicity's People

Contributors

fangzhou-xie, tnagler

rethnicity's Issues

interpretation of results on individuals

Hello,

I am excited to be trying out the rethnicity tool on my dataset, which contains around 340 names of directors whose films were released in Germany in the past ten years. Since I am pretty new to R, I am not sure whether I simply made a mistake in my code or whether the tool is not suitable for the names in the dataset. With method=fullname, every name gets its highest prediction for “asian”, whereas with method=lastname, every name gets its highest prediction for “white”. When I manually predict only two names from the dataset, both are returned with the highest prediction for “black”.
I was wondering if anyone has encountered this problem with datasets that are not US-based?

Cheers,
Mata

Sharing results

I did a fun little project predicting ethnicity for 2 million registered owner names of LA County parcels with this library. The main issues here seem to be with Filipino and Armenian names.

[image: Screenshot_20230810_155812]

Here is the predicted ethnicity of owner names vs. the reported occupant ethnicity in the 2020 census (majority at block group level):

[image: Screenshot_20230807_214143]

Release rethnicity 0.2.5

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • Update cran-comments.md
  • git push
  • rhub v2 auto-check using Github Actions

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

CRAN warnings with GCC14

Email from CRAN team:

Dear maintainer,

Please see the problems shown on
<https://cran.r-project.org/web/checks/check_results_rethnicity.html>.

Please correct before 2024-08-03 to safely retain your package on CRAN.

This is primarily about the compiler C++ warnings
shown in the fedora-gcc results.  These are from g++ 14.x
which was released a couple of months ago.

The CRAN Team

This is caused by the change in the C++ standard (it also affects other packages, such as rcppsimdjson).

For now, setting CXX_STD = CXX17 in Makevars works, but a long-term solution is probably needed.

Todo: guard from passing dataframe as argument

As mentioned in #2 and #3, users get wrong predictions when they pass data frames instead of vectors as arguments. A later PR should guard against this (return an error instead of passing the input on to the model).
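A guard of this kind could look like the sketch below. `check_names()` is a hypothetical helper written for illustration, not a function in rethnicity, and the actual fix in the package may differ.

```r
# Hypothetical helper (not part of rethnicity): reject data frames and
# other non-character input before anything reaches the model.
check_names <- function(x, arg = "names") {
  if (is.data.frame(x)) {
    stop(
      paste0("`", arg, "` must be a character vector, not a data frame. ",
             'Extract a column first, e.g. df$lastname or df[["lastname"]].'),
      call. = FALSE
    )
  }
  # Factors are converted, since they often come from read.csv().
  if (is.factor(x)) x <- as.character(x)
  if (!is.character(x)) {
    stop(paste0("`", arg, "` must be a character vector."), call. = FALSE)
  }
  x
}

df <- data.frame(lastname = c("Turing", "Lovelace"), stringsAsFactors = FALSE)
ok  <- check_names(df$lastname, "lastnames")            # character vector: accepted
bad <- try(check_names(df["lastname"], "lastnames"), silent = TRUE)  # data frame: error
```

Until such a guard exists, extracting columns with `df$col` or `df[["col"]]` before calling the prediction function avoids the problem, because single-bracket indexing such as `df["col"]` still returns a data frame.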

Issue and Accuracy on European Names

Hi
This is my first time downloading R code from GitHub, but I installed the package the normal way with install.packages("rethnicity"), added the code in the picture below, ran it all, and then tried to run predict_ethnicity(firstnames = "Alan", lastnames = "Turing").
As you can see in the console, I have clearly just run "predict_fullname_ccp" and, earlier, "predict_fullname", yet it claims it can't find the object '_rethnicity_predict_fullname'.
What have I missed?

[images]

issue with installation "C++14 standard requested but CXX14 is not defined"

I have problems with the installation.

First I get this message:

Package which is only available in source form, and may need
compilation of C/C++/Fortran: ‘rethnicity’

Then, I am asked to confirm whether I want to download from sources, and I give permission.

This is the message that I get when finally the package is being installed:

* installing *source* package 'rethnicity' ...
** package 'rethnicity' successfully unpacked and MD5 sums checked
** using staged installation
** libs

*** arch - i386
Error in .shlib_internal(args) :
  C++14 standard requested but CXX14 is not defined
* removing 'C:/Users/luca_/Documents/R/win-library/3.6/rethnicity'
Warning in install.packages :
  installation of package ‘rethnicity’ had non-zero exit status

So, eventually, the package is not installed.

What should I do?
