
synr's Introduction

synr


This is an R package for working with data resulting from grapheme-color synesthesia-related consistency tests. synr provides tools for exploring test data, including visualizing a single participant's data, and applying summarizing functions such as calculating color variation/consistency scores or classifying participant data as valid or invalid.

Installation

synr is available on CRAN, meaning you can simply:

install.packages('synr')

Note that this will also install the packages synr depends on (dbscan, data.table and ggplot2), unless you already have them.

Usage

Once data are in an appropriately formatted data frame/tibble ('long format'; see the vignettes for more information), everything starts with rolling up participant data into a 'ParticipantGroup' object with create_participantgroup:

library(synr)

pgroup <- create_participantgroup(
    formatted_df, # data frame/tibble to use, with data in 'long format'
    n_trials_per_grapheme=3, # number of trials per grapheme
    id_col_name="participant_id", # name of column which holds participant IDs
    symbol_col_name='symbol', # name of column which holds grapheme symbol strings
    color_col_name='color', # name of column which holds response color HEX codes
    color_space_spec = "Luv" # color space to use for all calculations with participant group
)
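For illustration, a minimal toy data frame in long format might look like the one below: one row per trial, using the column names passed to create_participantgroup() above. The participant IDs and colors are made up, and the vignettes remain the authoritative description of the required format.

```r
# Toy long-format data: one row per trial (participant x grapheme x trial).
# Column names match those passed to create_participantgroup() above;
# IDs and colors are made up for illustration.
formatted_df <- data.frame(
  participant_id = rep(c("p1", "p2"), each = 6),
  symbol = rep(rep(c("A", "B"), each = 3), times = 2),
  color = c(
    "#FF0000", "#FE0101", "#FF0202",  # p1, grapheme A: similar reds
    "#0000FF", "#0101FE", "#0000FF",  # p1, grapheme B: similar blues
    "#FF0000", "#00FF00", "#0000FF",  # p2, grapheme A: very different colors
    "#FFFF00", "#FFFF10", "#00FFFF"   # p2, grapheme B: two yellows, one cyan
  ),
  stringsAsFactors = FALSE
)

# Each participant/grapheme pair should have n_trials_per_grapheme (here 3) rows
table(formatted_df$participant_id, formatted_df$symbol)
```

With synr installed, a data frame like this could be passed as formatted_df in the call above.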

Using the resulting object (pgroup), you can call various methods. A few examples follow.

Example group-level method: get_mean_consistency_scores

pgroup$get_mean_consistency_scores(symbol_filter=LETTERS) would return a vector of mean CIELUV-based consistency scores, one per participant, using only data from trials involving capital letters.
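As a rough illustration of what such a score measures, the sketch below assumes the common definition from the consistency-test literature (e.g. Rothen et al., 2013): for each grapheme, the sum of Euclidean distances between all pairs of response colors, here computed in CIELUV with grDevices::convertColor(), then averaged over graphemes. It is not synr's implementation: grapheme_consistency() and mean_consistency() are hypothetical helpers working on one participant's long-format data, and graphemes with a missing (NA) response are simply skipped.

```r
# Hypothetical helper: sum of pairwise Euclidean distances between a
# grapheme's response colors in CIELUV (assumed score definition).
grapheme_consistency <- function(hex_colors) {
  # Treat graphemes with any missing response as unscorable
  if (anyNA(hex_colors)) return(NA_real_)
  # col2rgb() returns a 3 x n matrix in 0..255; convertColor() expects
  # one color per row with sRGB values in 0..1
  rgb01 <- t(grDevices::col2rgb(hex_colors)) / 255
  luv <- grDevices::convertColor(rgb01, from = "sRGB", to = "Luv")
  sum(stats::dist(luv, method = "euclidean"))
}

# Hypothetical helper: mean per-grapheme score for one participant,
# optionally restricted to the symbols in symbol_filter
mean_consistency <- function(long_df, symbol_filter = NULL) {
  if (!is.null(symbol_filter)) {
    long_df <- long_df[long_df$symbol %in% symbol_filter, ]
  }
  per_grapheme <- tapply(long_df$color, long_df$symbol, grapheme_consistency)
  mean(per_grapheme, na.rm = TRUE)
}
```

Lower scores mean more consistent responses: three identical colors give a score of 0.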

Example group-level method: check_valid_get_twcv_scores

pgroup$check_valid_get_twcv_scores(symbol_filter=0:9) would return a data frame that classifies each participant's data as 'valid' or 'invalid', based largely on DBSCAN clustering and using only data from trials involving digits. This may be used to identify participants who varied their responses too little, e.g. by responding with an orange color on every trial.

Example participant-level method: get_plot

pgroup$participants[[1]]$get_plot(symbol_filter=LETTERS) would produce a bar plot of per-grapheme consistency scores for a single participant, using only data from trials involving capital letters. You can see an example below.

Example bar plot of grapheme-level consistency scores

Detailed usage information

More details on the required data format, and on how to use the functions above and others, can be found in the package's vignettes, some of which are also included in the package itself (run help(synr) to find them). Additional information is available in the following article:

Wilsson, L., van Leeuwen, T.M. & Neufeld, J. synr: An R package for handling synesthesia consistency test data. Behav Res 55, 4086–4098 (2023). https://doi.org/10.3758/s13428-022-02007-y

Feedback

If you have suggestions for improvements, you are very welcome to raise issues or submit pull requests in the GitHub repository at https://github.com/datalowe/synr.

synr's People

Contributors

datalowe


synr's Issues

Rename `synr_example_full` and convert it to long format

The 'small' example data frames synr_exampledf_long_small and synr_exampledf_wide_small provide example data in long and wide format, respectively. The 'full' example data frame should be in long format, to emphasize that this is synr's 'default' data format (especially since the 'wide' format may be deprecated and dropped in later versions), and the resulting data frame should be named synr_exampledf_full.
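For illustration only, the base R sketch below converts a hypothetical wide layout into one row per trial. It assumes one row per participant and grapheme with one color column per trial (color1 to color3, as in the Dingemanse data used in a later issue); this is not necessarily the layout of synr_exampledf_wide_small, and wide_to_long() is a hypothetical helper.

```r
# Hypothetical wide layout: one row per participant/grapheme,
# one color column per trial
wide_df <- data.frame(
  participant_id = c("p1", "p1"),
  symbol = c("A", "B"),
  color1 = c("#FF0000", "#0000FF"),
  color2 = c("#FE0101", "#0101FE"),
  color3 = c("#FF0202", "#0000FF"),
  stringsAsFactors = FALSE
)

wide_to_long <- function(wide_df, color_cols = c("color1", "color2", "color3")) {
  # Build one data frame per trial column, then stack them
  pieces <- lapply(seq_along(color_cols), function(i) {
    data.frame(
      participant_id = wide_df$participant_id,
      symbol = wide_df$symbol,
      trial = i,
      color = wide_df[[color_cols[i]]],
      stringsAsFactors = FALSE
    )
  })
  long_df <- do.call(rbind, pieces)
  # Sort so each grapheme's trials are adjacent, then reset row names
  long_df <- long_df[order(long_df$participant_id, long_df$symbol, long_df$trial), ]
  rownames(long_df) <- NULL
  long_df
}

long_df <- wide_to_long(wide_df)
```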

Add arguments for specifying background/foreground colors in plotting-related functions

Currently, the Participant get_plot method always uses a white background and a black foreground color (graph bars, axis text, etc.). This makes graphemes in very light colors hard to discern. Users should be able to choose which colors to use, so that they can tweak the presentation as they wish.

Functions/methods relying on get_plot, such as Participant$save_plot, will also need to be updated once the enhancement is in place.

Remove 'proportion color'-related functions/methods

Methods like 'Participant$get_prop_color' should be removed from synr as they are bound to cause confusion, especially with the 'color labels' that they've provided and because they only use RGB color space. DBSCAN-based validation is a better tool for what they were designed for.

Apart from removing the code, the related tests and examples (e.g. in the 'main tutorial' vignette) also need to be removed.

Only count noise cluster if it includes at least `dbscan_min_pts` points

The safe_num_clusters parameter of check_valid_get_twcv specifies the number of clusters that should be enough for a participant's data to be classified as valid, as long as the clusters are not tight-knit. Currently, the DBSCAN 'noise' cluster always counts towards this tally, regardless of how many points it contains.

For example, a participant might have 20 green points (cluster G), 20 blue points (cluster B), 20 red points (cluster R) and just one black point (noise cluster). check_valid_get_twcv would consider this data set to meet a safe_num_clusters = 4 criterion, regardless of what the dbscan_min_pts parameter is set to.

It would make more intuitive sense, and probably be more useful, if the noise cluster counted toward the safe_num_clusters tally only if it consists of at least min_pts points, just as other potential clusters don't count if they have fewer than min_pts points.

The documentation for Participant and ParticipantGroup methods using check_valid_get_twcv would probably need to also be updated after the improvement is applied.
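The difference between the current and the proposed counting rule can be sketched on a vector of DBSCAN cluster labels. This assumes the dbscan package convention that label 0 marks noise points; count_clusters_current() and count_clusters_proposed() are hypothetical helpers, not synr functions.

```r
# Hypothetical helpers (not part of synr) that count clusters from a vector
# of DBSCAN labels, where 0 marks noise points as in dbscan::dbscan() output.

# Current behaviour described above: the noise "cluster" counts whenever present
count_clusters_current <- function(labels) {
  length(unique(labels))
}

# Proposed behaviour: any cluster, including noise, counts only if it
# holds at least min_pts points
count_clusters_proposed <- function(labels, min_pts) {
  cluster_sizes <- table(labels)
  sum(cluster_sizes >= min_pts)
}

# The example from the issue: 20 green, 20 blue and 20 red points,
# plus one black point labelled as noise
labels <- c(rep(1, 20), rep(2, 20), rep(3, 20), 0)
count_clusters_current(labels)                # 4
count_clusters_proposed(labels, min_pts = 4)  # 3
```

Applying the same size check to every label keeps the rule uniform: genuine DBSCAN clusters normally pass it anyway, so in practice it only affects the noise cluster.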

Suggestion: Add identified number of clusters to output data of `get_valid_twcv`

Currently, get_valid_twcv returns a list with components valid, reason_invalid and twcv. In order to better mirror the input parameters, it would make sense to also include a component num_identified_clusters, which says how many clusters were counted toward the safe_num_clusters-related tally. Related Participant and ParticipantGroup methods would also need to be updated to pass this information on to the user.
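A sketch of what the extended output could look like, and how a group-level method might stack per-participant results into one data frame. make_validity_result() is hypothetical, and the values (apart from the reason 'few_clusters_low_twcv', which appears in synr output elsewhere on this page) are made up.

```r
# Hypothetical per-participant result, extended with the suggested
# num_identified_clusters component
make_validity_result <- function(valid, reason_invalid, twcv, num_identified_clusters) {
  list(
    valid = valid,
    reason_invalid = reason_invalid,
    twcv = twcv,
    num_identified_clusters = num_identified_clusters
  )
}

# Made-up results for two participants
results <- list(
  make_validity_result(TRUE, "", 310.5, 5L),
  make_validity_result(FALSE, "few_clusters_low_twcv", 120.2, 1L)
)

# A group-level method could turn each list into a one-row data frame
# and stack them, so the new component simply becomes an extra column
validity_df <- do.call(rbind, lapply(results, as.data.frame, stringsAsFactors = FALSE))
validity_df
```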

Fail gracefully if internet resource not available

The CRAN team reported the following issue with synr: "Packages which use Internet resources should fail gracefully with an informative message if the resource is not available (and not give a check warning nor error).".

AFAIU the issue stems from the "Dingemanse sample data vignette", where it calls read.csv to read in data from GitHub.
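One possible approach, sketched here with a hypothetical helper read_csv_or_null(), is to wrap read.csv in tryCatch() so that a failed download produces an informative message and a NULL result instead of an error or warning.

```r
# Hypothetical helper: read a CSV file (e.g. from a URL), returning NULL with
# an informative message instead of signalling an error or warning when the
# resource can't be read
read_csv_or_null <- function(url, ...) {
  tryCatch(
    utils::read.csv(url, ...),
    error = function(e) {
      message("Could not read data from ", url, ": ", conditionMessage(e))
      NULL
    },
    warning = function(w) {
      message("Could not read data from ", url, ": ", conditionMessage(w))
      NULL
    }
  )
}

# Usage in the vignette (requires internet access):
# dingemanse_voweldata <- read_csv_or_null(githuburl, sep = " ")
```

Assuming the data are stored in dingemanse_voweldata as in the example further down, later vignette chunks could then be skipped when the download fails, e.g. with the knitr chunk option eval = !is.null(dingemanse_voweldata). Catching warnings too means that a benign warning during an otherwise successful read would also yield NULL, which seems an acceptable trade-off for a vignette.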

Move `na.rm` to ends of parameter lists

Many methods in synr include an na.rm argument, but it currently comes at the beginning of the parameter list. For example, ParticipantGroup includes this method:

    get_mean_consistency_scores = function(
      na.rm = FALSE,
      symbol_filter = NULL,
      method="euclidean"
    ) 

To stay in line with R conventions, na.rm should always come at the end of parameter lists.
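With na.rm last, as in base functions such as sum(..., na.rm = FALSE), the most commonly used arguments can be passed positionally and na.rm is set by name. The sketch below is a hypothetical standalone stand-in for the method, taking a named vector of per-grapheme scores directly; method is kept only to mirror the original signature.

```r
# Hypothetical stand-in for get_mean_consistency_scores with na.rm moved last
mean_consistency_score <- function(scores, symbol_filter = NULL,
                                   method = "euclidean", na.rm = FALSE) {
  method <- match.arg(method, "euclidean")  # only value supported in this sketch
  if (!is.null(symbol_filter)) {
    scores <- scores[names(scores) %in% symbol_filter]
  }
  mean(scores, na.rm = na.rm)
}

# Made-up per-grapheme scores
scores <- c(A = 40, B = NA, C = 80, `1` = 200)
mean_consistency_score(scores, LETTERS, na.rm = TRUE)  # 60
```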

Add option to validation functions to only include data from complete graphemes

When calculating consistency scores, oftentimes only data from graphemes with a full set of valid color responses are considered. Thus, get_valid_twcv and related functions/methods should offer the option of only considering data from such graphemes when classifying the data set as valid or not. That is, only color data points from 'complete' graphemes would be included when applying DBSCAN clustering and related calculations.
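A minimal base R sketch of the filtering step, assuming long-format data with participant_id, symbol and color columns and NA marking a missing response; keep_complete_graphemes() is a hypothetical helper, not a synr function.

```r
# Hypothetical helper (not part of synr): keep only rows belonging to
# 'complete' graphemes, i.e. participant/grapheme pairs where all
# n_trials_per_grapheme responses are present (NA = missing response)
keep_complete_graphemes <- function(long_df, n_trials_per_grapheme = 3) {
  # For each row, count the non-missing responses in its participant/grapheme group
  n_valid <- ave(
    as.integer(!is.na(long_df$color)),
    long_df$participant_id, long_df$symbol,
    FUN = sum
  )
  long_df[n_valid == n_trials_per_grapheme, , drop = FALSE]
}

example_df <- data.frame(
  participant_id = "p1",
  symbol = rep(c("A", "B"), each = 3),
  color = c("#FF0000", "#FE0101", "#FF0202", "#0000FF", NA, "#0101FE"),
  stringsAsFactors = FALSE
)
keep_complete_graphemes(example_df)  # only the three rows for grapheme "A" remain
```

The filtered data frame could then be used as input to DBSCAN clustering and the related calculations.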

How should very light color responses that vary considerably in hue be handled?

Steps to reproduce the issue:

library(synr)
library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides pivot_longer()

githuburl <- 'https://raw.githubusercontent.com/mdingemanse/colouredvowels/master/BRM_colouredvowels_voweldata.csv'
dingemanse_voweldata <- read.csv(githuburl, sep=' ')

# Reshape to long format: color1-3 and timing1-3 become 'color' and 'timing'
# columns, with the trial number (the trailing digit) in a 'trial' column.
# values_to is not needed since names_to uses the ".value" sentinel.
cvow_long <- dingemanse_voweldata %>% 
  pivot_longer(
    cols=c('color1', 'color2', 'color3',
      'timing1', 'timing2', 'timing3'),
    names_to=c(".value", "trial"),
    names_pattern="(\\w*)(\\d)"
  )

pg <- create_participantgroup(
  raw_df=cvow_long,
  n_trials_per_grapheme=3,
  id_col_name="anonid",
  symbol_col_name="item",
  color_col_name="color",
  time_col_name="timing",
  color_space_spec="Luv"
)

validity_df <- pg$check_valid_get_twcv_scores(
  min_complete_graphemes = 7,
  dbscan_eps = 30,
  dbscan_min_pts = 4,
  max_var_tight_cluster = 100,
  max_prop_single_tight_cluster = 0.6,
  safe_num_clusters = 4,
  safe_twcv = 250,
  symbol_filter = NULL
)
# Add participant IDs so that rows can be matched to participants
validity_df$id <- pg$get_ids()

print(validity_df[validity_df$id=='d47c0e32-e3e2-4acf-84d0-08bf7375308b', ])

The output shows that this participant's data are classified as invalid, with reason 'few_clusters_low_twcv'.

The corresponding plot for the participant looks like this.

pg$participants$`d47c0e32-e3e2-4acf-84d0-08bf7375308b`$get_plot(grapheme_size = 4)

ex_light_color_resp_plot

The participant used very light colors throughout the whole experiment, but the colors varied a fair bit in hue. However, if a consistency score calculation were naively applied with a threshold of 135.3 (as suggested by Rothen, Seth, Witzel & Ward, 2013), this participant would be considered a synesthete, even though their responses do not appear very consistent to a human observer.
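The numbers behind this can be illustrated with made-up responses. The sketch assumes the common score definition (sum of pairwise Euclidean distances between a grapheme's response colors in CIELUV) and uses a hypothetical helper, luv_consistency(); it does not use the participant's actual data. Very pale colors all lie close to white in CIELUV, so even clearly different hues produce a small distance sum.

```r
# Hypothetical helper (not synr code): sum of pairwise Euclidean distances
# between response colors in CIELUV (assumed score definition)
luv_consistency <- function(hex_colors) {
  rgb01 <- t(grDevices::col2rgb(hex_colors)) / 255
  luv <- grDevices::convertColor(rgb01, from = "sRGB", to = "Luv")
  sum(stats::dist(luv))
}

threshold <- 135.3  # threshold from Rothen et al. (2013), as cited above

# Made-up responses for one grapheme: pale pink, pale green, pale blue
light_varied_hue <- c("#FFF5F5", "#F5FFF5", "#F5F5FF")
# Saturated red, green and blue for comparison
saturated_varied_hue <- c("#FF0000", "#00FF00", "#0000FF")

luv_consistency(light_varied_hue) < threshold      # TRUE: scored as consistent
luv_consistency(saturated_varied_hue) < threshold  # FALSE
```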

Participant$save_plot() produces error for all-NA response participants

Participant$save_plot() produces the error message "Error: Discrete value supplied to continuous scale" in some cases, seemingly when a participant only has graphemes with all-NA response color matrices.

(This needs to be isolated with a unit test first, however. The error might instead be due to how Participant instances are created by create_participantgroup(), since the error occurred with a participant created that way.)
