
openspecy-package's Introduction

Open Specy 1.0

Analyze, Process, Identify, and Share Raman and (FT)IR Spectra

CRAN version Project Status R-CMD-check Codecov test coverage License: CC BY 4.0 DOI Website Gitter

Raman and (FT)IR spectral analysis tool for plastic particles and other environmental samples (Cowger et al. 2021, doi: 10.1021/acs.analchem.1c00123). With read_any(), Open Specy provides a single function for reading individual, batch, or map spectral data files like .asp, .csv, .jdx, .spc, .spa, .0, and .zip. process_spec() simplifies processing spectra, including smoothing, baseline correction, range restriction and flattening, intensity conversions, wavenumber alignment, and min-max normalization. Spectra can be identified in batch using an onboard reference library (Cowger et al. 2020, doi: 10.1177/0003702820929064) using match_spec(). A Shiny app is available via run_app() or online at https://openanalysis.org/openspecy/.

Installation

OpenSpecy is available from CRAN and GitHub.

Install from CRAN (stable version)

You can install the latest release of OpenSpecy from CRAN with:

install.packages("OpenSpecy")

Install from GitHub (development version)

To install the development version of this package, paste the following code into your R console (requires devtools):

if (!require(devtools)) install.packages("devtools")
devtools::install_github("wincowgerDEV/OpenSpecy-package")

Getting started

library(OpenSpecy)
run_app()

Simple workflow for single spectral identification

See package vignette for a detailed standard operating procedure.

# Fetch current spectral library from https://osf.io/x7dpz/
get_lib("derivative")

# Load library into global environment
spec_lib <- load_lib("derivative")

# Read sample spectrum
raman_hdpe <- read_extdata("raman_hdpe.csv") |> 
  read_any()

# Look at the spectrum
plotly_spec(raman_hdpe)

# Process the spectrum and conform it to the library format
raman_proc <- raman_hdpe |>
  process_spec(conform_spec_args = list(range = spec_lib$wavenumbers), 
               smooth_intens = TRUE, make_rel = TRUE)

# Compare raw and processed spectra
plotly_spec(raman_hdpe, raman_proc)

top_matches <- match_spec(raman_proc, library = spec_lib, na.rm = T, top_n = 5,
                          add_library_metadata = "sample_name",
                          add_object_metadata = "col_id")

# Print the top 5 results with relevant metadata
top_matches[, c("object_id", "library_id", "match_val", "SpectrumType",
                "SpectrumIdentity")]

# Get all metadata for the matches
get_metadata(spec_lib, logic = top_matches$library_id)

Citations

Cowger W, Steinmetz Z, Gray A, Munno K, Lynch J, Hapich H, Primpke S, De Frond H, Rochman C, Herodotou O (2021). “Microplastic Spectral Classification Needs an Open Source Community: Open Specy to the Rescue!” Analytical Chemistry, 93(21), 7543–7548. doi: 10.1021/acs.analchem.1c00123.

Cowger W, Steinmetz Z, Leong N, Faltynkova A, Sherrod H (2024). “OpenSpecy: Analyze, Process, Identify, and Share Raman and (FT)IR Spectra.” R package, 1.0.8. https://github.com/wincowgerDEV/OpenSpecy-package.

openspecy-package's People

Contributors

hsherrod2019, nickleong20, wincowgerdev, zsteinmetz


openspecy-package's Issues

adj_intens will not return km transformation.

I recently realized that the km transformation (reflectance) isn't working in the adj_intens function.

Reproducible example:

library(OpenSpecy)
data("raman_hdpe")
head(adj_intens(raman_hdpe, type = "reflectance"))
head(adj_intens(raman_hdpe, type = "none"))

The two datasets should be the inverse of one another but they produce very slightly altered versions of the same thing.

However, transmittance seems to be working fine.
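For reference, the Kubelka-Munk transformation of a reflectance R is usually written as (1 - R)^2 / (2R). The following is a minimal base-R sketch of that formula, assuming reflectance is given as a fraction between 0 and 1. `km_transform` is a hypothetical helper for illustration only, not the package's adj_intens() implementation.

```r
# Hypothetical helper: Kubelka-Munk transform of fractional reflectance (0 < R <= 1).
km_transform <- function(reflectance) {
  stopifnot(is.numeric(reflectance),
            all(reflectance > 0 & reflectance <= 1))
  (1 - reflectance)^2 / (2 * reflectance)
}

reflectance <- c(0.2, 0.5, 1)
km <- km_transform(reflectance)

# High reflectance maps to low K-M values, so the ordering is reversed.
print(data.frame(reflectance, km))
```

If the transformed output tracks the untransformed input almost exactly, as described above, that suggests the formula is not being applied.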

Bug in R update with regular expression

It turns out that your packages, during their tests, pass invalid
characters to regular expression operations. Invalid character here
means a sequence of bytes that doesn't match a character in the encoding
the string should have. Therefore, the regular expression operations may
not (and in some cases can not) proceed correctly.

Until now, R used to silently escape such invalid characters using
"", where NN is a hexadecimal number, but then the results of such
operations could be not quite as intended. R-devel has been improved to
detect these cases and report an error or warning, and this triggers
during package checks of your packages, so they will now start failing
their tests to signal the error.

More information is available in a blog post:

https://developer.r-project.org/Blog/public/2022/06/27/why-to-avoid-%5Cx-in-regular-expressions/index.html

even though in your cases (almost all) it seems to me following a quick
check that the invalid string is not the regular expression itself, but
one of the inputs.

Please fix your packages to ensure that the strings are valid. Very
likely often the problem is that the data you process are not properly
read into R - not converted to the current encoding (or to UTF-8); they
are expected to be in the current encoding, then, but they are not (the
current encoding may be different on different systems, though UTF-8 is
most common with recent R on recent systems). I've seen this happen also
in cases when the data were assumed to be ASCII, but in fact contained
some extended ASCII characters.
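One way to check inputs before they reach a regular expression is sketched below in base R. It assumes the offending bytes are Latin-1, which is only a guess about the data; the example strings are made up for illustration.

```r
# Bytes for "caf" followed by 0xE9 (Latin-1 e-acute), which is not valid UTF-8 by itself.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
inputs <- c("polyethylene", bad)

# Flag strings that are not valid UTF-8.
is_valid <- validUTF8(inputs)

# Re-encode only the invalid entries, assuming they were Latin-1.
fixed <- inputs
fixed[!is_valid] <- iconv(inputs[!is_valid], from = "latin1", to = "UTF-8")

# The regular expression operation now sees valid strings only.
hits <- grepl("caf", fixed, fixed = TRUE)
```

If the true source encoding is unknown, reading the data with an explicit encoding argument is usually safer than converting afterwards.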

[Feature]: Automated Matching

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

Garth C created this routine for automated matching with the standard package. Hopefully there are some things we can reuse from it.

Problem

No

Proposed Solution

library(dplyr)
library(OpenSpecy)
library(stringr)
library(ggplot2)
library(foreach)

analyze_particles <-
  function(spectra) {
    spec_lib <-
      load_lib(which = "raman")  # load the spectral library
    
    output <- list()  # initialize lists and vectors for the loops
    wavenumber <- numeric()
    intensity <- numeric()
    identity <- numeric()
    rsq <- numeric()
    match <- data.frame()
    
    for (i in 1:(nrow(spectra) - 1)) {
      for (j in 1:(ncol(spectra) - 1)) {
        wavenumber[j] = spectra[1, j + 1]
        intensity[j] = spectra[i + 1, j + 1]
      }
      data <- data.frame(wavenumber,
                         intensity)
      output[[i]] <- data
    }
    
    ## Now loop over all particles and output summary data
    
    n.cores <- max(1, parallel::detectCores() - 1)  # leave one core free, but use at least one
    
    #create the cluster
    my.cluster <- parallel::makeCluster(n.cores,
                                        type = "PSOCK")
    
    #register it to be used by %dopar%
    doParallel::registerDoParallel(cl = my.cluster)
    
    match <- foreach(i = 1:length(output)) %dopar% {
      match <- OpenSpecy::match_spec(
        output[[i]],
        library = spec_lib,
        which = "raman",
        type = "full",
        top_n = 1
      )
    }
    
    parallel::stopCluster(my.cluster)
    
    identity <- unlist(lapply(match, function(x)
      x[1, 2]))
    rsq <- unlist(lapply(match, function(x)
      x[1, 3]))
    
    spec_summary <-
      data.frame(particle_number = seq(1, length(identity)),
                 identity, rsq)
    
    ## Add indicators for whether particles are PE, PS, PET, PC, or something else
    ## PE
    spec_summary$PE <- ifelse(str_detect(spec_summary$identity, 
                                         "LDPE|olyeth"),
                              1,
                              0)
    ## PS
    spec_summary$PS <- ifelse(str_detect(spec_summary$identity, 
                                         "PS|tyrene"),
                              1,
                              0)
    ## PET
    spec_summary$PET <- ifelse(str_detect(spec_summary$identity, 
                                          "PET|terephth?alate"),
                               1,
                               0)
    ## Remove PE that is actually PET
    spec_summary$PE[spec_summary$PET == 1] <- 0
    ## PC 
    spec_summary$PC <- ifelse(str_detect(spec_summary$identity, 
                                         "olycarb"),
                              1,
                              0)
    
    spec_summary$other <- with(spec_summary, ifelse(PE == 1 |
                                                      PS == 1|
                                                      PET == 1|
                                                      PC == 1,
                                                    0,
                                                    1))
    return(spec_summary)
  }

Alternatives Considered

With the new open specy file format we may not be able to reproduce this exactly.

Allow for peak location only interpretation.

I recently got a question about why someone's spectrum wasn't being interpreted correctly in Open Specy.

Their full spectrum is below. As you can see, and as they confirmed, it consists only of peak locations.

wavenumber intensity
3273.85 83.32
1642.25 89.6
1376.09 77.57
1007.26 69.38

They were getting some erroneous results when trying to match this spectrum. I am pretty sure this is because the correlation coefficient we use varies a lot when there are only a few data points. I am not sure what function would improve accuracy for these types of uploads, but I wanted to bring it up to get it on our radar. It also appears that they didn't baseline-correct or smooth their spectrum, which is a separate problem because Open Specy performs better with smoothed and baseline-corrected spectra.
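The instability of the correlation coefficient with very few points can be illustrated with a small base-R simulation. This is only a sketch using unrelated random vectors, not real spectra. `frac_high` is a hypothetical helper and the 0.7 cutoff is an arbitrary choice for illustration.

```r
set.seed(123)

# Fraction of unrelated random pairs whose |Pearson r| exceeds a cutoff.
frac_high <- function(n_points, n_sim = 2000, cutoff = 0.7) {
  r <- replicate(n_sim, cor(rnorm(n_points), rnorm(n_points)))
  mean(abs(r) > cutoff)
}

frac_4   <- frac_high(4)    # four data points, as in the peak list above
frac_400 <- frac_high(400)  # a more typical full spectrum length

print(c(four_points = frac_4, four_hundred_points = frac_400))
```

With only four points, a sizeable share of unrelated pairs exceeds the cutoff by chance, which is consistent with the erroneous matches described above.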

Plot download comes out in a not ready to print way

When I click the plot download button on the plotly chart, the plot comes out as a PNG with white text and a transparent background. Many users will probably open it and think the text is missing, because most image viewers show a white background by default.

We may want to alter the way the plot download happens so that it comes out ready to print. Adding a black background could help.

Numbers missing from sliders

Just realized today that the numbers on the sliders somehow got grayed out, not sure if it was from the theme update.

Isn't that big of a deal, maybe we just remove them altogether since the number on top of the slider already indicates what you need to know?

[bug]:read .0 file

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Project Version

No response

Platform and OS Version

No response

Existing Issues

No response

What happened?

I cannot read a .0 file on our HPC system, and I do not understand what this error means.

Steps to reproduce

  1. Upload the file to my HPC system.
  2. Try to run the read_0() function.
  3. An error occurs.
    ...

Expected behavior

read the file

Attachments

a<-"/home/zxl1080/Git/19-rw-backsheets/data/test.0"
read_text(file=a,method = "read_0")

Error in if (offset < 0 || offset > fileSize) stop("Invalid offset") :
missing value where TRUE/FALSE needed

The attachment is the file I used.
test.0.zip

Screenshots or Videos

image

Additional Information

No response

Interface graph axes

Absorbance spectra, when given in wavenumbers, are usually plotted with the x-axis running from high (left) to low (right) values. This prevents misunderstandings, because a higher wavenumber means a shorter wavelength (and a higher frequency). As currently displayed, a high wavenumber on the right could be mistaken for a long wavelength.
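A minimal base-R sketch of the reversed axis, using a made-up absorbance band for illustration:

```r
# Synthetic spectrum: one Gaussian band near 2900 cm^-1 (illustrative data only).
wavenumber <- seq(400, 4000, by = 4)
absorbance <- exp(-((wavenumber - 2900) / 50)^2)

# Reversing the range puts high wavenumbers on the left.
x_limits <- rev(range(wavenumber))

plot(wavenumber, absorbance, type = "l", xlim = x_limits,
     xlab = "Wavenumber (1/cm)", ylab = "Absorbance")
```

In ggplot2-based plots, the equivalent is scale_x_reverse().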

Define own class for spectral data to reduce error handling/checking?

This is more a reminder to myself but we can also discuss it here 😄

In the long run, we could think about creating our own class for spectral data that follows a defined structure. This class could then be the preferred input for all functions/methods. On the one hand, this will make it easier to check for user errors. On the other hand, it gives less control to the user.

I did that with my other package where I defined a new class calibration that takes calibration data in a defined way. LODs and LOQ are then calculated only from those class objects.
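As a rough illustration of the idea, an S3 class with a validating constructor could look like the sketch below. The class name `spectrum` and the constructor `new_spectrum` are hypothetical names chosen for this example, not an existing part of the package.

```r
# Hypothetical constructor: validates input once so downstream functions need not.
new_spectrum <- function(wavenumber, intensity) {
  if (!is.numeric(wavenumber) || !is.numeric(intensity))
    stop("`wavenumber` and `intensity` must be numeric.", call. = FALSE)
  if (length(wavenumber) != length(intensity))
    stop("`wavenumber` and `intensity` must have the same length.", call. = FALSE)
  structure(list(wavenumber = wavenumber, intensity = intensity),
            class = "spectrum")
}

# Simple print method for the hypothetical class.
print.spectrum <- function(x, ...) {
  cat("<spectrum> with", length(x$wavenumber), "points\n")
  invisible(x)
}

spec <- new_spectrum(c(1000, 1001, 1002), c(0.1, 0.4, 0.2))
print(spec)

# Malformed input fails early with a clear message.
bad <- tryCatch(new_spectrum(1:3, 1:2),
                error = function(e) conditionMessage(e))
```

Functions could then dispatch on the class and skip repeated input checks, at the cost of requiring users to construct the object first.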

Broken error handling?

I have the strange feeling that error handling is generally broken; not only with #81. Whenever those nice popups are supposed to show up, my app just crashes completely. Do you experience the same, @wincowgerDEV?

Could this have something to do with logging? loggit at least masks stop() which may let reactivity behave differently than usual: For example here:

https://github.com/wincowgerDEV/OpenSpecy/blob/ee3de83334b6093c006497b355183df361d5eea0/inst/shiny/server.R#L136-L149

I thought stop() would leave preprocessed_data() unchanged and thus not trigger any subsequent events. At the moment, it triggers them anyway, which crashes the app completely without showing any warning or error. Commenting out the following chunks makes it work again:

https://github.com/wincowgerDEV/OpenSpecy/blob/ee3de83334b6093c006497b355183df361d5eea0/inst/shiny/server.R#L509-L519

and

https://github.com/wincowgerDEV/OpenSpecy/blob/ee3de83334b6093c006497b355183df361d5eea0/inst/shiny/server.R#L535-L593

Any idea what's the matter here?

[Feature]: new code for data transformations

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

data_files <- list.files("FILE PATH HERE") # Identify file names (the file path should lead to the file that contains your .csv files)
data_files # Print file names

This loop will 1) read your .csv files into R as individual data frames, 2)

library(readr)  # provides read_csv() and write_csv()

for (i in seq_along(data_files)) {
  # Read each file and store it as a data frame named after the file.
  # `skip = 2` skips the top 2 lines of the .csv; adjust it to your data.
  # `col_names` renames the columns to fit the OpenSpecy format.
  assign(data_files[i],
         read_csv(paste0("FILE PATH HERE/",  # keep the forward slash at the very end of the path
                         data_files[i]),
                  skip = 2, col_names = c("wavenumber", "intensity")))
  # Write the edited .csv files to a NEW folder; again, keep the trailing forward slash.
  write_csv(get(data_files[i]),
            paste0("FILE PATH HERE/",
                   data_files[i]))
}

Problem

Transform data to csv standard

Proposed Solution

Code in desc

Alternatives Considered

Base

add derivative transformation and derivative matching.

Gabriel Erni Cassola and I have been talking about adding derivative transformations to Open Specy and adding derivative matching.

Some initial code Gabriel wrote:



## Gabriel Erni Cassola
## MiP data base matching

##_______OpenSpecy________##
#__________________________#

# main github page with basic workflow: https://github.com/wincowgerDEV/OpenSpecy
# guides: https://htmlpreview.github.io/?https://github.com/wincowgerDEV/OpenSpecy/blob/main/vignettes/sop.html

# if a need to convert spectra files comes up, this may be of help:
# https://www.effemm2.de/spectragryph/about_feat.html


#############
#__LOAD LIBS AND FILES
library("dplyr")
library("OpenSpecy")
library("ggplot2")
library("signal")

#get_lib()
spec_lib.deriv <- load_lib("ftir", path = "/Users/ernica0000/Desktop/Basel_Documents/01_research/02_WaterColumn_1Year/02_protocols:material/newOpenSpecyLib/")

# obtain file paths
all_paths <- list.files(path = "/Users/ernica0000/Desktop/Basel_Documents/01_research/02_WaterColumn_1Year/01_data/pilotExtractions_<500µm/postBern/spectra/CSVFiles/Filter2/", pattern = "\\.csv$", full.names = T)  # `pattern` is a regular expression, not a glob
# load all files into a list
inputSpectraList <- lapply(all_paths, read_text)

# define how many top matches you want for each spectrum
n.TopMatches <- 5




#__ADAPTING THE LOOP FOR DERIVATIVES OF UNKNOWN SPECTRA

# create empty df to fill
outputSpectraList.adj.deriv <- list()
outputSpectraList.adj <- list()
matches.deriv <- data.frame(sample_name = c(), spectrum_identity = c(), rsq = c(), organization = c())

# loop to obtain derivatives of unknown spectra and save them again in the list
for (i in 1:length(inputSpectraList)) {		
	sampleSpec.adj <- inputSpectraList[[i]][1:2] %>% adj_intens(type = "none")
	outputSpectraList.adj[[i]] <- sampleSpec.adj
	
	# obtain 1st derivative of the intensity from the unknown spectrum; smoothing & derivative with sgolay()
	intDeriv <- as.data.frame(filter(filt = sgolay(p = 3, n = 21, m = 1), x = sampleSpec.adj$intensity))
	colnames(intDeriv) <- "intensity"
	waveN <- as.data.frame(sampleSpec.adj$wavenumber)
	colnames(waveN) <- "wavenumber"
	sampleSpec.adj.deriv <- cbind(waveN, intDeriv)
	
	# correct spectrum; smoothing uses Savitzky and Golay filter; baseline correction with IModPolyFit
	#sampleSpec.adj.sm <- smooth_intens(sampleSpec.adj, p = 3, n = 11)
	#sampleSpec.adj.corr <- subtr_bg(sampleSpec.adj.sm, degree = 8)
	outputSpectraList.adj.deriv[[i]] <- sampleSpec.adj.deriv
	
	# match spectrum and store top matches; this does min-max normalization and calculates the Pearson correlation coefficient
	matchTop <- match_spec(sampleSpec.adj.deriv, library = spec_lib.deriv, which = "ftir", top_n = n.TopMatches)
	
	# it looks like sometimes the match_spec() output contains more than the specified "top_n"; is this when positions "top_n" & "top_n+1" are tied?
	# this statement checks for longer outputs and trims it to the specified length, warning that this happened
	if(length(matchTop$sample_name) > n.TopMatches){
		print("ATTENTION: More matches than n.TopMatches in sample:")
		print(i)
		matchTop <- slice_head(matchTop, n = n.TopMatches)
	}
	
	#store results
	matches.deriv <- rbind(matches.deriv, matchTop)
}

head(matches.deriv)

# convert tibble format to "classic" data frame
matches.deriv.df <- as.data.frame(matches.deriv)

length(matches.deriv.df$sample_name)
head(matches.deriv.df)
str(matches.deriv.df);tail(matches.deriv.df)

# fetch spectra sample names from each file preserving same order as loaded into the list
all_filenames <- as.data.frame(sub("\\..*$", "", basename(all_paths)))
# replicate names by number of top matches and name column
all_filenames.exp <- as.data.frame(all_filenames[rep(seq.int(1, nrow(all_filenames)), each = n.TopMatches), 1])
colnames(all_filenames.exp) <- "specName"
colnames(all_filenames) <- "specName"

# add to the spectra table
matches.deriv.df.named <- cbind(matches.deriv.df, all_filenames.exp)

# check finished data table
head(matches.deriv.df.named)

# extract "confident" matches
confidentMatches.deriv <- subset(matches.deriv.df.named, rsq >= 0.7)
confidentMatches.deriv
unique(confidentMatches.deriv$spectrum_identity) # check different types that turned up; this can be used to subset for checking specific substances...

# export the table as CSV into the spectra folder (EDIT FILE PATH ACCORDINGLY!)
write.csv(confidentMatches.deriv, "/Users/ernica0000/Desktop/Basel_Documents/01_research/02_WaterColumn_1Year/04_Results/Filter2_MatchingResults_1deriv.csv", row.names = F)

#############
#__PLOTTING

# 1. specify the exact sample name "specName" in df "confidentMatches"
desiredSpec.name <- "RhineTrial2_ap50_2cm_6sc_Extract_91"

# this retrieves the spectra from the lists generated in the loop [DONT EDIT THIS]
desiredSpec <- which(all_filenames$specName == desiredSpec.name)
plotSpec.adj.deric <- outputSpectraList.adj.deriv[[desiredSpec]][1:2]

# 2. specify corresponding database match by providing number ("sample_name" in df "confidentMatches.deriv")
sampName <- 237

# extract spectra from library and subset to specified spectrum
refs <- spec_lib.deriv[["ftir"]][["library"]]
sub <- subset(refs, sample_name == sampName)

# 3. plot the spectra
ggplot(plotSpec.adj.deric, aes(x = wavenumber, y = intensity, color = "sample")) + geom_line() + scale_x_reverse(limits = c(3350, 1150)) + geom_line(data = sub, aes(x = wavenumber, y = intensity, color = "ref")) + theme_bw() + ggtitle(desiredSpec.name) #+ geom_line(data = plotSpecRaw, aes(x = wavenumber, y = intensity, color = "raw"))



And code for converting the spectral library to derivative version:





## Gabriel Erni Cassola
## MiP data base matching

##______using_OpenSpecy________##
#_______________________________#

# main github page with basic workflow: https://github.com/wincowgerDEV/OpenSpecy
# guides: https://htmlpreview.github.io/?https://github.com/wincowgerDEV/OpenSpecy/blob/main/vignettes/sop.html

# if a need to convert spectra files comes up, this may be of help:
# https://www.effemm2.de/spectragryph/about_feat.html




library("dplyr")
library("OpenSpecy")
library("ggplot2")
library("signal")






#__CONVERT THE LIBRARY

# loading specific library
ftir.db <- readRDS("/Users/ernica0000/Desktop/Basel_Documents/01_research/02_WaterColumn_1Year/02_protocols:material/ftir_library.rds")

# checks...
is(ftir.db)
str(ftir.db); head(ftir.db)
# check if "sample_name" and "group" are identical
all(ftir.db$group == ftir.db$sample_name)


# loop for taking the 1st derivatives
ftir.db.deriv <- data.frame(wavenumber = double(), intensity = double(), sample_name = integer(), group = integer())

for (i in unique(ftir.db$sample_name)) {  # iterate over the actual sample names, not 1..n
	ftir.db.subset <- ftir.db[ftir.db$sample_name == i,]
	
	# obtain 1st derivative
	ftir.db.intDeriv <- as.data.frame(filter(filt = sgolay(p = 3, n = 11, m = 1), x = ftir.db.subset$intensity))
	colnames(ftir.db.intDeriv) <- "intensity"
	
	ftir.db.subset.intDeriv <- cbind(ftir.db.subset$wavenumber, ftir.db.intDeriv$intensity, ftir.db.subset$sample_name, ftir.db.subset$group)
	ftir.db.deriv <- rbind(ftir.db.deriv, ftir.db.subset.intDeriv)
}

colnames(ftir.db.deriv)[colnames(ftir.db.deriv) == "V1"] <- "wavenumber"
colnames(ftir.db.deriv)[colnames(ftir.db.deriv) == "V2"] <- "intensity"
colnames(ftir.db.deriv)[colnames(ftir.db.deriv) == "V3"] <- "sample_name"
colnames(ftir.db.deriv)[colnames(ftir.db.deriv) == "V4"] <- "group"

head(ftir.db.deriv)
saveRDS(ftir.db.deriv, "/Users/ernica0000/Desktop/Basel_Documents/01_research/02_WaterColumn_1Year/02_protocols:material/newOpenSpecyLib/ftir_library.rds")

ftir.db.subsetPlot <- ftir.db.deriv[ftir.db.deriv$sample_name == 6,]
ggplot(ftir.db.subsetPlot, aes(x = wavenumber, y = intensity)) + geom_line() + scale_x_reverse(limits = c(3350, 1150)) + theme_bw()

csv files incorrectly read

CSV files with a row number column are being read incorrectly. It is reading the row number column as wavenumber for some reason.

Example format.

   Wavelength  name              Absorbance
1  649.9       MM_1706090001 A2  0.002235
2  650         MM_1706090001 A2  0.00225
3  650.1       MM_1706090001 A2  0.002265
4  650.2       MM_1706090001 A2  0.00228
5  650.3       MM_1706090001 A2  0.002296
6  650.4       MM_1706090001 A2  0.002311
7  650.5       MM_1706090001 A2  0.002326
8  650.6       MM_1706090001 A2  0.002341
9  650.7       MM_1706090001 A2  0.002414

After the row number column is removed the file reads correctly.
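One possible guard against this is sketched below in base R: detect a leading column that simply counts rows and drop it before selecting the spectral columns. The inline CSV text mimics the example above, and the detection rule (first column equals 1, 2, 3, ...) is an assumption, not how the package currently reads files.

```r
# Inline CSV that mimics a file exported with a row number column.
csv_text <- paste(
  ",Wavelength,name,Absorbance",
  "1,649.9,MM_1706090001 A2,0.002235",
  "2,650,MM_1706090001 A2,0.00225",
  "3,650.1,MM_1706090001 A2,0.002265",
  sep = "\n"
)
raw <- read.csv(text = csv_text)

# Assumption: a first column equal to 1..n is a row index, not data.
is_row_index <- isTRUE(all.equal(as.numeric(raw[[1]]),
                                 as.numeric(seq_len(nrow(raw)))))
if (is_row_index) raw <- raw[-1]

# Keep the two columns needed for a spectrum.
spec <- data.frame(wavenumber = raw$Wavelength,
                   intensity  = raw$Absorbance)
```

A real spectrum whose wavenumbers happen to be 1, 2, 3, ... would be misdetected by this rule, so a warning when the column is dropped may be worthwhile.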

Exclusive sharing via dropbox

With #46, this bit of code

https://github.com/wincowgerDEV/OpenSpecy/blob/3f6922d88f66de37a269dc5033fd1a6c486508e1/inst/shiny/server.R#L132-L139

changed from a generic way of sharing spectra to relying exclusively on Dropbox and bypassing the config.yml. Should we keep it this way?

I'd rather argue in favor of keeping the package functions at the heart of the Shiny app and using them whenever possible. I agree that this is challenging when it comes to spectra sharing, but I guess it's still possible if the share_spec function allows for (1) preparing files the way we would like to receive them and (2) providing interfaces to share those via e-mail (for package users) or Dropbox (if hosted). In my opinion, this should not be too complicated.

If you agree, I could make the necessary changes with one of the next PRs.

How to store large data?

OpenSpecy's spectral database is currently stored in a number of .csv files:

https://github.com/wincowgerDEV/OpenSpecy/blob/252d265c4d4dace73c6b0fc727308d5a6d540268/inst/shiny/server.R#L28-L48

I suggest storing some sample data in an .rda container instead, so that it loads automatically with the package. We'll use it for unit testing and examples. Since CRAN packages should not exceed 5 MB, we'll need to find an alternative storage solution for the complete spectral database.

This could be

Maybe https://github.com/ropensci/opendata and https://docs.ropensci.org/piggyback/ could be another source of inspiration.

I'm still racking my brain finding a good solution on how to best upload new spectra ..
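To get a feel for the .rda suggestion, the sketch below saves a small made-up sample spectrum to a temporary .rda file and checks its size against the 5 MB limit mentioned above. The object name `sample_spec` and the xz compression are illustrative choices only.

```r
set.seed(1)

# Made-up sample spectrum standing in for packaged example data.
sample_spec <- data.frame(
  wavenumber = seq(400, 4000, by = 2),
  intensity  = runif(1801)
)

# Save to a temporary .rda file with strong compression.
path <- file.path(tempdir(), "sample_spec.rda")
save(sample_spec, file = path, compress = "xz")

# Check the file size against the 5 MB package limit.
size_mb <- file.size(path) / 1024^2

# Loading restores the object under its original name.
restored <- new.env()
load(path, envir = restored)
```

In a package, such a file would typically live in the data/ directory so that it is available after library() is called.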

[Bug]: No way to remove printed messages

Guidelines

  • I agree to follow this project's Contributing Guidelines.

What happened?

When looping with our functions, messages get printed to the console, e.g. by the match_spec function. It would be nice to be able to turn off those messages.

Expected behavior

A setting which allows turning on and off of printed messages would be helpful.
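Until such a setting exists, a workaround on the user side is base R's suppressMessages(). The sketch below uses a hypothetical `noisy_step` function as a stand-in. It only helps if the package emits its output through message(), which is an assumption here; output written with print() or cat() is not affected.

```r
# Hypothetical stand-in for a function that reports progress via message().
noisy_step <- function(x) {
  message("Processing item ", x)
  x^2
}

# Wrapping each call silences the messages but keeps the return values.
results <- vapply(1:3, function(i) suppressMessages(noisy_step(i)), numeric(1))
```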

Citation

How should we make sure that people know how to cite the package? I think we will want the publication that we are working on to eventually be the citation for the package but for now, it can just be what is on the website, plus adding @zsteinmetz as an author? Once the publication is out then we can update the citation with the manuscript citation?

background_subtraction()

@wincowgerDEV, have a look at 4ed9713.

I not only renamed the function to make it easier to grasp what it actually does but also

  • changed the input parameters to take a formula and data argument; this makes data input more versatile
  • changed the function output to a data.frame including the wave numbers to keep the input and output format consistent

Does this make sense to you?

Keeping Shiny App Functional While Integrating with GIT

I recently made some changes to the Shiny app so that I could pull down the whole repository (with your updates) and automatically run the shiny app on my desktop so that I can keep all developments in this repository. If I should be doing anything differently let me know. @zsteinmetz

I think we need to add a call in the Shiny app that pulls in the functions you are developing as add-ons; for now, I added them to the application directly.

I am unsure what the OpenSpecy library call in the Shiny app will do. It was throwing an error, so I commented it out.

Add support for french csv files

Dr. Dehaut proposed a strategy for adding support for French CSV files (example file below):
Basically, the French format for CSV files uses ‘,’ as the decimal mark and ‘;’ as the column separator. The read.csv2(data, header = T) function does the job.

Example.csv
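A minimal sketch of the read.csv2() approach, using made-up inline data in place of the attached file:

```r
# Inline data in the French convention: ';' separates columns, ',' marks decimals.
french_csv <- "wavenumber;intensity\n649,9;0,002235\n650,0;0,00225"

# read.csv2() defaults to sep = ";" and dec = ",".
spec <- read.csv2(text = french_csv, header = TRUE)
str(spec)
```

Reading the same text with read.csv() would split each row at the decimal commas instead, so the separator has to be detected or chosen by the user.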

Identifying changes in user spectrum compared to library - degradation/oxidation estimate

I've been thinking of a way to extract more information from a spectrum that could be helpful for estimating degradation or disturbance relative to virgin plastics.

The idea is to implement an advanced parameter in the app that provides information about differences between the user spectrum and the overall library. It is thought of as a within-'polymer class' analysis, and we can start with polyethylene. I propose the following steps:

  1. Round spectra to unit to speed the process
  2. Pre-process: baseline correction, normalization by the sum and scaling (Pareto scaling?)
  3. Eliminate wavenumbers inherent to PE (mainly methylene vibrations in FTIR) - Finally we'll be working with the wavenumbers ranges 800-1200 cm-1, 1500-1900 cm-1 and 3000-3400 cm-1.
  4. Run a PCA to identify the spectral regions related with the most variance in spectra.
  5. Interpret these regions - my guess is that they will be related with oxidation process, i.e. carbonyl and hydroxyl groups.
  6. Estimate in which percentile of the overall spectral variability the analysed spectrum falls: this information might help the user estimate how oxidized their microplastic is (and therefore how degraded it is?!)

This will be a relative index rather than an absolute one, and it's part of the "multivariate analysis tools" that I would like to implement in OS.
Totally open to discussion!

PS: I was inspired by work that I did with microplastics collected in different environmental conditions (under revision)
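The proposed steps can be sketched in base R on simulated data. Everything below is illustrative: the spectra are synthetic, the carbonyl-like band near 1715 cm-1 and the `oxidation` variable are made up, and baseline correction is skipped because the simulated spectra have no baseline.

```r
set.seed(1)

# Step 1: a coarse, rounded wavenumber grid.
wavenumber <- seq(800, 3400, by = 10)
n_spectra <- 30

# Synthetic library: a carbonyl-like band scaled by a made-up oxidation level, plus noise.
carbonyl  <- exp(-((wavenumber - 1715) / 30)^2)
oxidation <- runif(n_spectra)
noise <- matrix(abs(rnorm(n_spectra * length(wavenumber), sd = 0.01)) + 0.01,
                nrow = n_spectra)
spectra <- outer(oxidation, carbonyl) + noise

# Step 3: keep only the wavenumber ranges named in the list above.
keep <- (wavenumber >= 800  & wavenumber <= 1200) |
        (wavenumber >= 1500 & wavenumber <= 1900) |
        (wavenumber >= 3000 & wavenumber <= 3400)
x <- spectra[, keep]

# Step 2: normalize each spectrum by its sum, then Pareto-scale each column
# (center, then divide by the square root of the column standard deviation).
x <- x / rowSums(x)
x_centered <- scale(x, center = TRUE, scale = FALSE)
x_pareto <- sweep(x_centered, 2, sqrt(apply(x, 2, sd)), "/")

# Step 4: PCA on the scaled data.
pca <- prcomp(x_pareto, center = FALSE)

# Step 6: percentile of one spectrum's PC1 score among all spectra.
scores <- pca$x[, 1]
percentile <- ecdf(scores)(scores[1]) * 100
```

Step 5, interpreting the loadings in `pca$rotation`, remains a manual task; the sign of a principal component is arbitrary, so the direction of the percentile would need to be fixed against a known reference.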

Add cluster levels as an option to matching

Shreyas did cluster analysis on the FTIR Open Specy database. We should add an option to search based on the cluster group (which simplifies analysis). In the future, it would also be useful when developing AI: we could conduct feature extraction using his standard deviation analysis and classify the clusters instead of the classes currently in the list. His notes are below:

In short, I used SciPy’s hierarchical clustering and set threshold to cluster spectra together if (pearson_coefficient > 0.3). With this criterion, the data neatly separates into 33 clusters. To make this more useful:

I’ve created a figure (simplified_cluster_grid.png) attached that shows mean and standard deviation of all spectra contained respectively within each cluster – to me this boosts some confidence in the reliability of this process.
The original OpenSpecy web download includes a metadata file. The clustering code adds a column to this file (see final column called “cluster_ix” in the file ftir_metadata_clusters.csv attached)
Up till here everything is machine/code processed. But as a final step, I use human judgement in the attached file cluster_keys_simplified.csv where I added the last column “simplified_cluster_name” to enter simplified polymer category names similar to Primpke 2018.

I’m incorporating these into our lab’s analysis code – hopefully some of our data going forward will be labelled using these simplified cluster names after fitting with the OpenSpecy database. I also think this clustering and the simplified category names could be useful to other OpenSpecy users. All of this is now available on GitHub (with a more descriptive readme file and step-by-step jupyter notebook).
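The notes describe a SciPy workflow; a comparable idea in base R is sketched below on synthetic spectra. The average linkage method is an assumption, since the notes do not name one, and the distance is taken as 1 minus the Pearson coefficient so that the 0.3 threshold becomes a cut height of 0.7.

```r
set.seed(42)

# Two synthetic spectral shapes that are roughly uncorrelated with each other.
grid   <- seq(0, 2 * pi, length.out = 100)
base_a <- sin(grid)
base_b <- sin(2 * grid)

# Five noisy copies of each shape, one spectrum per row.
spectra <- rbind(
  t(replicate(5, base_a + rnorm(100, sd = 0.1))),
  t(replicate(5, base_b + rnorm(100, sd = 0.1)))
)

# Distance = 1 - Pearson correlation between spectra.
d <- as.dist(1 - cor(t(spectra)))

# Hierarchical clustering; the linkage method is an assumption.
tree <- hclust(d, method = "average")

# Spectra end up in one cluster when their linkage distance is below 1 - 0.3.
cluster_ix <- cutree(tree, h = 1 - 0.3)
```

The resulting `cluster_ix` vector plays the same role as the "cluster_ix" column described in the notes and could be joined to the library metadata.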

ftir_metadata_clusters.csv
cluster_keys_simplified.csv
simplified_cluster_grid

[Feature]: support for jsonld OpenSpecys

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

JSON-LD is a popular web format for adding data to knowledge graphs. If we had a way to accurately convert our new OpenSpecy file format to and from JSON-LD, it would make the files better understood on the web. There's already a pretty good package for it too.

Problem

Our data may not be currently indexable by web browsers very easily.

Proposed Solution

Create a function to transform Open Specy's to and from json-ld

Alternatives Considered

RDF is another popular web format that could be useful because it works a lot like a database and there are some functions for streaming it like a database. XML is another one, but it's really challenging to work with. JSON-LD is the most readable, which could be nice if we want to display the source data for people, but maybe that isn't the sole use for these files.

[Bug]: An error has occurred. Check your logs or contact the app author for clarification.

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Project Version

3.5.1

Platform and OS Version

macOS 10.15.1

Existing Issues

No response

What happened?

When uploading a CSV file this is the error that pops up. Past CSV files that have already been uploaded to the system work but any new ones give me this error.

Steps to reproduce

  1. upload csv

...

Expected behavior

not this error message

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

Another strange CSV format, we should add a more informative error message

I received this CSV format that will not read in Open Specy and returns an uninformative error "An error has occurred. Check your logs or contact the app author for clarification". I think we should make the error handling for csv files give something informative like "CSV file is incorrectly formatted, please ensure that your CSV file is formatted exactly the same as the test file." I am labeling this as a bug because it is a recurring issue that people who upload incorrectly formatted csv files end up contacting me about.

Wavelength;Intensity    
649 8934;93 2791
650 8576;93 3083
651 8218;93 218
652 7861;92 997
653 7503;92 6856
654 7145;92 3564
655 6788;92 819
656 643;91 9002
657 6072;91 7954
658 5715;91 7053
659 5357;91 5565
660 4999;91 3125
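
A more informative error could come from validating the file right after reading it. The sketch below is only an assumption about how such a check might look: `read_csv_checked()` is a hypothetical helper, not a function in the package, and it uses base R only.

```r
# Hypothetical helper: read a spectrum CSV and fail with a clear message.
read_csv_checked <- function(file) {
  df <- tryCatch(
    read.csv(file, stringsAsFactors = FALSE),
    error = function(e) {
      stop("CSV file could not be read: ", conditionMessage(e), call. = FALSE)
    }
  )
  # A semicolon-delimited file collapses into a single column here
  if (ncol(df) < 2) {
    stop("CSV file is incorrectly formatted: expected at least two ",
         "comma-separated columns but found ", ncol(df), ".", call. = FALSE)
  }
  # The first two columns (wavenumber, intensity) must be numeric
  bad <- names(df)[1:2][!vapply(df[1:2], is.numeric, logical(1))]
  if (length(bad) > 0) {
    stop("CSV file is incorrectly formatted: column(s) ",
         paste(bad, collapse = ", "), " are not numeric. ",
         "Check the decimal and thousands separators.", call. = FALSE)
  }
  df
}

# Reproduce the problem with two rows in the format shown above
bad_file <- tempfile(fileext = ".csv")
writeLines(c("Wavelength;Intensity", "649 8934;93 2791", "650 8576;93 3083"),
           bad_file)
msg <- tryCatch(read_csv_checked(bad_file),
                error = function(e) conditionMessage(e))
print(msg)
```

In the app, the message could be passed to the user instead of the generic Shiny error.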

Some CSV files not working

Recently received this CSV file that seems to be correctly formatted but doesn't work.

Throws the error below in the logs.
2021-05-11T16:44:17.347091+00:00 shinyapps[2419654]: Warning: Error in abs: non-numeric argument to mathematical function
2021-05-11T16:44:17.353148+00:00 shinyapps[2419654]: 142: %>%
2021-05-11T16:44:17.353148+00:00 shinyapps[2419654]: 141: adj_neg
2021-05-11T16:44:17.353149+00:00 shinyapps[2419654]: 137: adj_intens.data.frame
2021-05-11T16:44:17.353150+00:00 shinyapps[2419654]: 119: data
2021-05-11T16:44:17.353151+00:00 shinyapps[2419654]: 112: renderPlotly [/srv/connect/apps/OpenSpecy/server.R#230]
2021-05-11T16:44:17.353151+00:00 shinyapps[2419654]: 111: func
2021-05-11T16:44:17.353149+00:00 shinyapps[2419654]: 140: adj_intens.default
2021-05-11T16:44:17.353150+00:00 shinyapps[2419654]: 135: reactive:data [/srv/connect/apps/OpenSpecy/server.R#196]
2021-05-11T16:44:17.353152+00:00 shinyapps[2419654]: 108: shinyRenderWidget
2021-05-11T16:44:17.353177+00:00 shinyapps[2419654]: 94: renderFunc
2021-05-11T16:44:17.353177+00:00 shinyapps[2419654]: 93: output$MyPlot
2021-05-11T16:44:17.353178+00:00 shinyapps[2419654]: 13: runApp
2021-05-11T16:44:17.353178+00:00 shinyapps[2419654]: 12: fn
2021-05-11T16:44:17.353175+00:00 shinyapps[2419654]: 107: func
2021-05-11T16:44:17.353179+00:00 shinyapps[2419654]: 5: eval
2021-05-11T16:44:17.353178+00:00 shinyapps[2419654]: 7: connect$retry
2021-05-11T16:44:17.353179+00:00 shinyapps[2419654]: 6: eval

Wavelength;Absorbance

3.997.896.328;0.999979
3.995.967.772;1
3.994.039.216;0.999974
399.211.066;0.999901
3.990.182.104;0.999803
3.988.253.548;0.999699
3.986.324.993;0.999627
3.984.396.437;0.999584
3.982.467.881;0.999506
3.980.539.325;0.999387

Add multirange selection

One common spectral analysis function is to manually specify the ranges where all peaks exist and have those searched against the spectral library. Right now we only allow one range to be selected. We could add a function to allow for multiple range selections using the plotly add boxes function. The user would draw boxes around each peak region and we would use the min and max wavenumber of each range to specify the ranges.

Plotly functions for adding boxes.
https://plotly.com/r/shapes/

The data exchange pipeline would work similarly to the manual baseline subtraction routine we have set up.
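
Once the box limits are available, the filtering itself is simple. The following is a minimal base R sketch under the assumption that each drawn box yields a pair of wavenumber limits; `restrict_ranges()` and the column names are hypothetical, not part of the package.

```r
# Hypothetical helper: keep only the points that fall inside any selected range.
restrict_ranges <- function(spec, ranges) {
  keep <- rep(FALSE, nrow(spec))
  for (r in ranges) {
    keep <- keep | (spec$wavenumber >= min(r) & spec$wavenumber <= max(r))
  }
  spec[keep, , drop = FALSE]
}

spec <- data.frame(
  wavenumber = seq(500, 3000, by = 500),
  intensity = c(0.1, 0.8, 0.2, 0.3, 0.9, 0.4)
)

# Two ranges, e.g. the x limits of two boxes drawn by the user
ranges <- list(c(900, 1100), c(2400, 2600))
peaks <- restrict_ranges(spec, ranges)
print(peaks)
```

Using `min()` and `max()` on each pair means the order in which the user drags the box does not matter.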

New bslib package for shiny breaks the shinyBS package popups

I tried to use the new bslib package to improve the Open Specy theme and ended up creating a conflict with the shinyBS package. The updated theme worked, but it made the shinyBS popups completely invisible. I can't find anything online about this issue yet. It might be worth bringing up with the Shiny team or the bslib team.

license type

Thinking about whether CC BY NC or CC BY is more appropriate for the Open Specy code and libraries.

It looks like we can use any of these licenses for CRAN submission:
https://svn.r-project.org/R/trunk/share/licenses/license.db

CC BY NC is allowed.

I think the original idea I had was to limit the risk of corporations taking the Open Specy code and commercializing it. There will likely be a myriad of licenses for the data, and I need to do a better job of documenting which datasets have which license. That reminds me that we recently got some data which the owners are OK with us using on the website but not with sharing. We need to build some new functionality to accommodate that. Let me know your thoughts on this matter. I am leaning toward making the source code CC BY 4.0, making the data people upload CC BY NC (by default, but we hold the license so we can always make it less restrictive when we want to), and making the data in the library whatever license the sharer wants it to be.

Create funding goals bar which highlights where we are at and what our goals are

Dying: $0-100. The application will cease to be hosted online and maintenance will stop.
Life support: $100-1,000 per year, just to keep the Shiny app online and cover essential maintenance.
Doing alright: $1k-10k per year for maintenance costs and minor ad hoc updates and bug fixes (I think this is where we are right now).
Sustaining: $10k-100k per year for a single part-time to full-time staff person working to update and build the community and the tool (we will be here soon, once I start with the Moore Institute, since they will fund roughly 10% of my time to work on this, which is about the minimum).
Revolutionizing the field: $100k-1M per year to have a team that is constantly pushing the science and tool closer to the ultimate goal of 100% accurate spectral identification and deep spectral diagnostics with a single click.

Share metadata failing

When I try to share metadata with the new app on shiny.io the app crashes and doesn't return an error message in the logs.

Highlight users and praises

I want to have a board on the homepage where we officially recognize the work of Open Specy users and highlight any letters of support that we have. There are a ton of examples in tweets that we could pin, and some users have expressed interest in being officially recognized as one of our "spectroscopy experts".

Develop a Predictive Model for Identifying Spectra

@ardcarvalho and @wincowgerDEV are working on developing a predictive model for identifying spectra, starting with PCA. The end goal is to develop a model which can be used to accurately predict any raw unprocessed spectrum. This model will speed up identification time and allow us to rapidly expand our resources. If we use an interpretable model, we may also be able to better understand which peaks are most important for identification. Ideally, the model accuracy will be greater than 90% which is the current accuracy of our default settings. This product is ripe for publication if we manage to pull it off and could have wide implications beyond Open Specy. The model will eventually be folded into the Open Specy package as a function (as long as the model file size isn't too large) and offered as a feature in the online version of the tool.

Steps

  • Develop model using the library in Open Specy and PCA
  • Test out some other model options, see below.
  • Use the model to predict the identities of the shared unknown data in Open Specy.
  • Use the newly labeled Open Specy data to make a new model and predict the Open Specy library with it to see if we can improve accuracy.
  • Add model as a function in package
  • Add model as a feature in Shiny App.

Some other model options that might work:

  1. https://xgboost.readthedocs.io/en/latest/R-package/xgboostPresentation.html
  2. https://github.com/wincowgerDEV/OpenSpecyAI
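
The first step in the list above (a model built from the library using PCA) might start out like the sketch below. It is an illustration only: the two-class "library" is synthetic, and the nearest-centroid classifier in principal component space is an assumed stand-in for whatever model is finally chosen.

```r
set.seed(1)
wavenumber <- seq(500, 3000, length.out = 50)

# Synthetic spectrum: one Gaussian peak plus a little noise
make_spec <- function(center) {
  exp(-((wavenumber - center) / 100)^2) +
    rnorm(length(wavenumber), sd = 0.02)
}

# Synthetic two-class library, one spectrum per row
lib <- rbind(
  t(replicate(10, make_spec(1000))),
  t(replicate(10, make_spec(2000)))
)
labels <- rep(c("A", "B"), each = 10)

# PCA on the library, keeping the first two components
pca <- prcomp(lib, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Class centroids in PC space
centroids <- rbind(
  A = colMeans(scores[labels == "A", ]),
  B = colMeans(scores[labels == "B", ])
)

# Project an unknown spectrum and assign it to the nearest centroid
unknown <- make_spec(2000)
u_scores <- predict(pca, newdata = matrix(unknown, nrow = 1))[, 1:2]
d <- apply(centroids, 1, function(cn) sqrt(sum((cn - u_scores)^2)))
predicted <- names(which.min(d))
print(predicted)
```

With real library spectra, accuracy would need to be estimated on held-out data before comparing it against the 90% figure quoted above.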

Unexpected column information in raman_csv

The "group" column for the raman_library.csv (and likely the ftir_library.csv) is uninformative. It just copies the sample_name column. There also isn't enough information on the website about the metadata for each column for people to quickly understand all the variables in the datasets.

Add hyperspectral SWIR image analysis

@afalty and I just spoke about their hyperspectral image analysis routine and database. They have plans to make the database and analysis routine open source and want to see it applied in Open Specy. I think this new technology has a ton of applications in microplastic research and overlap with other hyperspectral images for FTIR and Raman which we have plans to implement in the future.

Their database has: 4 polymers (PE, PET, PS, PP); marine samples; weathered plastic; 250 particles of each polymer; Hyspex images; particle sizes down to 100-150 microns; dry samples on a filter; preprocessing for the light source.

They have a multivariate model to divide the particles by polymer type: a SIMCA model, with a PCA for each class.

They will share some preliminary code and data in the next month and will be publishing their results with a draft sent out by the new year, 3-4 months for publication.

Steps to implement the data and code:

  • Add the reference dataset to Open Specy's file download list.
  • Add a button on the home screen which allows the user to select that they want to analyze hyperspectral SWIR.
  • Incorporate image upload and reshaping.
  • Test file size limitations and set the upload limits so that Open Specy doesn't break.
  • Incorporate image preprocessing.
  • Incorporate the model classification.
  • Incorporate the processed image visualization and reporting of particle sizes, shapes, and polymer types for the images.
  • Develop new functions to implement in the CRAN package.

[Feature]: A way to print the function code to the console by default

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

A user was trying to print the raw function to the console to inspect it and expected that it would print the raw function code with a call like below.

match_spec()

But the code just referenced another function.

Problem

No

Proposed Solution

Make the below code print the raw function by default.

match_spec()

Alternatives Considered

No
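
For context on the request above: if the function is an S3 generic (which "the code just referenced another function" suggests, though this is an assumption), printing its name only shows the dispatcher, and the implementation lives in a method. The sketch below uses a hypothetical generic, `identify_spec()`, to show how a user can reach the method code with base R tools.

```r
# Hypothetical S3 generic standing in for a dispatching function
identify_spec <- function(x, ...) UseMethod("identify_spec")

identify_spec.default <- function(x, ...) {
  paste("identified", length(x), "points")
}

# Printing the generic only shows the call to UseMethod()
print(identify_spec)

# List the registered methods, then fetch the one holding the actual code
print(methods("identify_spec"))
impl <- getS3method("identify_spec", "default")
print(impl)
```

`getAnywhere()` offers a similar lookup for methods that a package does not export.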

Switches/check boxes for preprocess tab

I'm currently struggling with making this bit of code work with the new functions:

https://github.com/wincowgerDEV/OpenSpecy/blob/bde5f109207940dac1985313a3838ba0f1b4810f/inst/shiny/server.R#L205-L221

This is mainly because the nested if statements and function logic are inconsistent with the UI: a smoother of p = 0 or a background subtraction with degree = 0 is not the same as no smoothing/subtraction. Would it make sense to add some switches/check boxes to the tab to toggle each filter on and off?
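
To make the distinction concrete, here is a minimal sketch of processing logic driven by explicit switches. The `preprocess()` function is hypothetical, and the moving average and polynomial fit are simple base R stand-ins for the package's own smoothing and baseline routines.

```r
# Hypothetical processing step with an explicit on/off switch per filter
preprocess <- function(intensity, smooth = TRUE, window = 3,
                       subtract = TRUE, degree = 1) {
  out <- intensity
  if (smooth) {
    # Moving average as a stand-in smoother; edge points are left unchanged
    sm <- as.numeric(stats::filter(out, rep(1 / window, window), sides = 2))
    out[!is.na(sm)] <- sm[!is.na(sm)]
  }
  if (subtract) {
    x <- seq_along(out)
    # degree = 0 still fits and removes a constant, so it is not "off"
    fit <- if (degree == 0) lm(out ~ 1) else lm(out ~ poly(x, degree))
    out <- out - unname(fitted(fit))
  }
  out
}

y <- c(1, 2, 3, 4, 5)
untouched <- preprocess(y, smooth = FALSE, subtract = FALSE)
degree_zero <- preprocess(y, smooth = FALSE, subtract = TRUE, degree = 0)
print(untouched)
print(degree_zero)
```

In the UI, each logical argument would map onto one switch or check box, so the parameter inputs only matter when their filter is switched on.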

Opening app without first loading libraries fails

When I try to open the application without first loading the libraries, the libraries try to load but then stall.

If I load the libraries first everything works fine.

No error is thrown that tips the user off about what they need to do to get the app working. We should probably make the automatic download work when the app opens and, if it doesn't, throw an error that makes sense.
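
One way to fail early with a useful message is to check for the library files before the app starts. This is a sketch only: `check_lib_files()`, the directory, and the file names are hypothetical, while `get_lib()` is the package function mentioned in the README.

```r
# Hypothetical start-up check: stop with a clear message if files are missing
check_lib_files <- function(path, files) {
  missing <- files[!file.exists(file.path(path, files))]
  if (length(missing) > 0) {
    stop("Reference library not found in '", path, "'. Missing: ",
         paste(missing, collapse = ", "),
         ". Run get_lib() before starting the app.", call. = FALSE)
  }
  invisible(TRUE)
}

# Demonstration with a temporary directory and placeholder file names
lib_dir <- tempfile("lib")
dir.create(lib_dir)
file.create(file.path(lib_dir, "a.rds"))

msg <- tryCatch(check_lib_files(lib_dir, c("a.rds", "b.rds")),
                error = function(e) conditionMessage(e))
print(msg)
```

The same check could instead trigger the automatic download and only raise the error if the download fails.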

Create a forum where people can talk about spectra and matches from the app.

I want to have a discussion forum where people can post their spectra and ask for input about whether they have processed and identified them correctly. In microplastic research, we have so many new practitioners who are just getting started and often don't have the mentorship to figure out how best to identify and clean their spectra. This forum aims to solve that problem. I already have a few people in mind for admins who have gotten in contact in the past.

This could be on the OS webpage as a new tab/popup on the identification tab. It could also be an entirely separate forum location. It would be nice if whatever we use could somehow be plugged into people's email addresses so that they get an email when someone responds to their comment so we don't lose the threads as often.

Any ideas @zsteinmetz or @ardcarvalho?
