USEPA/TADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.

Home Page: https://usepa.github.io/TADA/

License: Creative Commons Zero v1.0 Universal


TADA's Introduction

Welcome to TADA: Tools for Automated Data Analysis!

Tools for Automated Data Analysis, or TADA, is being developed to help States, Tribes (i.e., Tribal Nations, Pueblos, Bands, Rancherias, Communities, Colonies, Towns, Indians, Villages), federal partners, and any other Water Quality Portal (WQP) users (e.g., researchers) efficiently compile and evaluate WQP data collected from water quality monitoring sites. TADA is both a stand-alone R package and a building block to support development of the TADA R Shiny application. We encourage you to read this package's LICENSE and README files (you are here).

Installation

You must first have R and RStudio installed to use the TADA R Package (see instructions below if needed). Our team is actively developing TADA, so we highly recommend updating the TADA R Package and all of its dependency libraries each time you use it. You can install and/or update the TADA R Package and all dependencies by running:

if(!"remotes"%in%installed.packages()){
install.packages("remotes")
}

remotes::install_github("USEPA/TADA", ref = "develop", dependencies = TRUE, force = TRUE)

The TADA R Shiny application can be run on the web (no R or RStudio installation required) or within RStudio. Run the following code within RStudio to install or update, and then run, the most recent version of the TADA R Shiny application:

if(!"remotes"%in%installed.packages()){
install.packages("remotes")
}

remotes::install_github("USEPA/TADAShiny", ref = "develop", dependencies = TRUE, force = TRUE)

TADAShiny::run_app()

Water Quality Portal

In 2012, the WQP was deployed by the U.S. Geological Survey (USGS), the U.S. Environmental Protection Agency (USEPA), and the National Water Quality Monitoring Council to combine and serve water-quality data from numerous sources in a standardized format. The WQP holds over 420 million water quality sample results from over 1,000 federal, state, tribal, and other partners, and is the nation's largest single point of access for water-quality data. Participating organizations submit their data to the WQP using EPA's Water Quality Exchange (WQX), a framework designed to map their data holdings to a common data structure.

Install R and RStudio

  1. To download R, go to https://cran.r-project.org/ and click the link for your computer's operating system in the first box on the page, entitled "Download and Install R".
  2. Clicking your operating system takes you to a new page, which looks slightly different for PCs and Macs.
  3. Download the installer by clicking the appropriate link for your system, then click through the installer prompts on your computer, accepting all defaults.
  4. Next, go to https://posit.co/download/rstudio-desktop/ to download RStudio: scroll down a little and click "Download RStudio".
  5. Again, run the installer, click through the prompts, and accept the defaults.

Note: If you are an EPA employee, please follow the directions at https://work.epa.gov/software/r-software instead of the instructions above.

Open-Source Code Policy

Effective August 8, 2016, the OMB Mandate: M-16-21; Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software applies to new custom-developed code created or procured by EPA consistent with the scope and applicability requirements of the Office of Management and Budget's (OMB's) Federal Source Code Policy. In general, it states that all new custom-developed code produced by Federal Agencies should be made available and reusable as open-source code.

The EPA specific implementation of OMB Mandate M-16-21 is addressed in the System Life Cycle Management Procedure. EPA has chosen to use GitHub as its version control system as well as its inventory of open-source code projects. EPA uses GitHub to inventory its custom-developed, open-source code and generate the necessary metadata file that is then posted to code.gov for broad reuse in compliance with OMB Mandate M-16-21.

If you have any questions or want to read more, check out the EPA Open Source Project Repo and EPA's Interim Open Source Code Guidance.

License

All contributions to this project will be released under the CC0-1.0 dedication (see the LICENSE file). By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.

Disclaimer

This United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.

Contact

If you have any questions, please reach out to Cristina Mullin ([email protected]).

TADA's People

Contributors

coobr01, cristinamullin, ehinman, elisehinman, hillarymarler, jakegreif, jbousquin, jesseboormanpadgett, kathryn-willi, katiehealy, laurashumway, ldecicco-usgs, mthawley, nx10, renaemyers, zsmith27

TADA's Issues

Research iterating by row efficiently

The row-by-row checking code (using the rowwise and case_when functions) appears in a number of functions (Command+F in the .R files to see which ones use it), but it takes a long time to run, especially for larger datasets. Research other methods of checking data row by row in R and compare their efficiency to the current approach.

Note that the current approach may already be the best option; checking data row by row may be cumbersome regardless of how it's coded.
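
A minimal benchmark sketch of one alternative to compare (the column names are real WQP fields, but the flag logic and test data are only illustrative): case_when() is already vectorized, so the same call without rowwise() should return identical flags far faster.

library(dplyr)

df <- tibble(
  ResultMeasureValue = runif(1e5, 0, 100),
  ActivityMediaName  = sample(c("Water", "Sediment"), 1e5, replace = TRUE)
)

# Current pattern: case_when() is evaluated once per row
slow <- df %>%
  rowwise() %>%
  mutate(Flag = case_when(
    ActivityMediaName != "Water" ~ "MediaNotWater",
    ResultMeasureValue > 90      ~ "HighValue",
    TRUE                         ~ "NoFlag"
  )) %>%
  ungroup()

# Vectorized alternative: drop rowwise() and let case_when() work on whole columns
fast <- df %>%
  mutate(Flag = case_when(
    ActivityMediaName != "Water" ~ "MediaNotWater",
    ResultMeasureValue > 90      ~ "HighValue",
    TRUE                         ~ "NoFlag"
  ))

identical(slow$Flag, fast$Flag)  # TRUE; wrap each pipeline in system.time() to compare run times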

Filtering page functions

Add functions for the filtering page to the Utilities.R file, or create a new filtering.R file to hold these and the other specific filtering functions (continuous data, media not water, etc.). A generic helper is sketched after the field lists below.

Use functions from Jake's filtering vignette: https://usepa.sharepoint.com/:f:/r/sites/WQPDataAssessmentTeam/Shared%20Documents/General/TADA%20Dev/FirstTool_DataDiscoveryandCleaning/Logic-vignettes/Filtering?csf=1&web=1&e=bGWdEh

Fields for full dataset filtering
ActivityTypeCode
ActivityMediaName
ActivityMediaSubdivisionName
ActivityCommentText
MonitoringLocationTypeName
StateName
TribalLandName
OrganizationFormalName
CharacteristicName
HydrologicCondition
HydrologicEvent
BiologicalIntentName
MeasureQualifierCode
ActivityGroup
AssemblageSampledName
ProjectName
CharacteristicNameUserSupplied
DetectionQuantitationLimitTypeName
SampleTissueAnatomyName
LaboratoryName

Fields for characteristic level filtering
ActivityCommentText
ActivityTypeCode
ActivityMediaName
ActivityMediaSubdivisionName
MeasureQualifierCode
MonitoringLocationTypeName
HydrologicCondition
HydrologicEvent
ResultStatusIdentifier
MethodQualifierTypeName
ResultCommentText
ResultLaboratoryCommentText
ResultMeasure/MeasureUnitCode
ResultSampleFractionText
ResultTemperatureBasisText
ResultValueTypeName
ResultWeightBasisText
SampleCollectionEquipmentName
LaboratoryName
MethodDescriptionText
ResultParticleSizeBasisText
SampleCollectionMethod/MethodIdentifier
SampleCollectionMethod/MethodIdentifierContext
SampleCollectionMethod/MethodName
DataQuality/BiasValue
MethodSpeciationName
ResultAnalyticalMethod/MethodName
ResultAnalyticalMethod/MethodIdentifier
ResultAnalyticalMethod/MethodIdentifierContext
AssemblageSampledName
CharacteristicNameUserSupplied
DetectionQuantitationLimitTypeName
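
As a starting point, here is a minimal sketch of a generic field filter (the function name, argument names, and example value are assumptions) that could live in filtering.R and back both of the filtering levels listed above:

library(dplyr)

FilterField <- function(df, field, keep = NULL, remove = NULL) {
  # field:  one of the WQP column names listed above, given as a string
  # keep:   values to retain (NULL = retain all)
  # remove: values to drop after the keep step
  stopifnot(field %in% names(df))
  if (!is.null(keep))   df <- filter(df, .data[[field]] %in% keep)
  if (!is.null(remove)) df <- filter(df, !.data[[field]] %in% remove)
  df
}

# Example: drop field blanks from the full dataset
# profile <- FilterField(profile, "ActivityTypeCode", remove = "Quality Control Sample-Field Blank")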

PotentialDuplicate_RowID

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    No
  • Required columns:
    ActivityIdentifier
    ActivityConductingOrganizationText
    OrganizationFormalName
    OrganizationIdentifier
    ProjectIdentifier
    ResultCommentText
    ActivityCommentText

Development Notes

  • ResultFlagsIndependent.R
  • Required columns are those that are not included when checking for duplicates
  • Use code from "auto clean" function
  • Flag is unique identifier for each unique row (identical IDs = duplicate row)

Testing vignette

Run all functions and test them, including the functions with open issues. Provide feedback.

Research how to run functions upon loading the package into R

We would like TADA to use the most up-to-date reference tables each time it's used. Running the generate-reference-table functions automatically upon loading the package into R would allow us to do this. Do some research to learn how we can do that.

If it's not clear or there is no standard way to do that, consider other approaches to achieving the goal of automatically refreshing tables.
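
One standard mechanism worth evaluating is the .onLoad() hook (conventionally placed in zzz.R), which R calls automatically when the package namespace is loaded. A minimal sketch, assuming the existing UpdateMeasureUnitRef() generator; note that an installed package's sysdata.rda is read-only at run time, so refreshed tables would likely need to be cached in a package environment or a user cache directory instead:

.onLoad <- function(libname, pkgname) {
  # Try to refresh reference tables from their source URLs; if the download
  # fails (e.g. no internet connection), keep the copies shipped with the package.
  tryCatch(UpdateMeasureUnitRef(), error = function(e) invisible(NULL))
}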

License & Contribution files

Discuss CC0 vs. MIT license. Which is a better fit?

MIT License

Copyright (c) 2022 Environmental Protection Agency

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

QAPPDocAvailable

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    No
  • Required columns:
    ProjectFileUrl

Development Notes

  • Consider using this logic:
    ProjectAttachedBinaryObject is populated, QAPPavailable = Y (when clean = TRUE, these columns are retained)
    ProjectAttachedBinaryObject is not populated, QAPPavailable = N (when clean = TRUE, these columns are not retained)
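
A minimal sketch of that logic (using the required ProjectFileUrl column as the populated/not-populated test; the flag values and the clean = TRUE behavior are assumptions):

library(dplyr)

QAPPDocAvailable <- function(df, clean = FALSE) {
  flagged <- mutate(df, QAPPDocAvailable = if_else(
    !is.na(ProjectFileUrl) & ProjectFileUrl != "", "Y", "N"))
  if (!clean) return(flagged)  # clean = FALSE: append the flag column only
  # clean = TRUE: keep rows with a QAPP document and drop the appended column
  select(filter(flagged, QAPPDocAvailable == "Y"), -QAPPDocAvailable)
}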

UpdateMeasureUnitRef

Page Requirements/Standards

All of the generate reference file functions should be consistent in the following ways:

  • No arguments are included, similar to the check() or document() functions in devtools
  • Where possible, read data in via URL (not from a static, downloaded file) to maintain up-to-date records
  • Finish each function with UpdateInternalData(x), which is a function unique to TADA (at the top of GenerateRefTables.R) that updates sysdata.rda without overwriting other data.

Development Notes

  • Raw data only has "inches" as the target unit for units of type (Description column) "Length Distance." This function adds data for "m" and "ft" target units
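
A minimal sketch of this generator (the appended rows are left as a placeholder; UpdateInternalData() is the existing TADA helper at the top of GenerateRefTables.R):

UpdateMeasureUnitRef <- function() {
  # Read the live WQX MeasureUnit domain table rather than a static download
  ref <- read.csv("https://cdx2.epa.gov/wqx/download/DomainValues/MeasureUnit.CSV",
                  stringsAsFactors = FALSE)
  # ... append rows supplying "m" and "ft" target units (with their own
  # conversion factors) for the "Length Distance" unit type here ...
  UpdateInternalData(ref)
}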

InvalidSpeciation

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    Yes, Metadata transformations may be affected
  • Required columns:
    CharacteristicName
    MethodSpeciationName

Development Notes

None

Check List

  • Review how to use reference table
  • Create function to add/update reference table
  • Create InvalidSpeciation function

Flag Page Summary Table Generation

ALWAYS: Run all site and result flag functions and generate summary table (drafted in mock ups).

Flag=TRUE: append all flag columns to dataset
Flag=FALSE: do not append all flag columns to dataset

Clean=TRUE: remove all data that has been flagged AND remove the flag columns if they are present
Clean=FALSE: do not remove all data that has been flagged

MeasureValueSpecialCharacters

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    Yes, "Data summaries and calculations may be affected by choosing to retain special characters in the ResultValue field. In order to ensure transformation functions will run properly, set clean = TRUE."
  • Required columns:
    ResultMeasureValue

Development Notes

ResultFlagsDependent.R

ComparableDataIdentifier

Add a column with the unique identifier for each comparable data combination that has a unique identifier included in the harmonization template. For combinations that do not already have a specific identifier in the reference file, generate a unique identifier.
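
A minimal sketch, assuming the harmonization reference file carries a ComparableDataIdentifier column and that the combination is defined by the four fields below:

library(dplyr)

AddComparableDataIdentifier <- function(df, harm_ref) {
  df %>%
    left_join(harm_ref, by = c("CharacteristicName", "ResultSampleFractionText",
                               "MethodSpeciationName", "ResultMeasure.MeasureUnitCode")) %>%
    mutate(ComparableDataIdentifier = coalesce(
      ComparableDataIdentifier,
      # No identifier in the reference file: generate one from the combination itself
      paste(CharacteristicName, ResultSampleFractionText,
            MethodSpeciationName, ResultMeasure.MeasureUnitCode, sep = "_")))
}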

CensoredData

Function title: CensoredDataSubstitutions or TransformCensorData?

Generate DetectionQuantitationLimitTypeName (and associated DetectionQuantitationLimitMeasure/MeasureValue and DetectionQuantitationLimitMeasure/MeasureUnitCode) and ResultDetectionConditionText from Result Value where needed

Depends on ResultsSpecialChars Function:

  • Convert Result Values that start with "<" into an appropriate Detection Condition and Detection Limit Value. That is: a Result Value of "<0.25" would be converted into a ResultDetectionConditionText of "Present Below Quantification Limit", a Nondetect Result Value of "0.25", and a DetectionQuantitationLimitTypeName of "Lower Quantitation Limit"

Two options (best for when <70% of data are censored):

  • Robust ROS [Regression on Order Statistics (ROS)] (for lower limits, use random number between detection limit and 0)
  • x times the detection limit
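
A minimal sketch of the "<" conversion plus the simple "x times the detection limit" substitution (dot-separated WQP column names and the default multiplier are assumptions; for Robust ROS, the NADA package is one commonly used option):

library(dplyr)
library(stringr)

SimpleCensoredSub <- function(df, x = 0.5) {
  mutate(df,
    Censored = str_detect(ResultMeasureValue, "^<"),
    DetectionQuantitationLimitMeasure.MeasureValue = if_else(
      Censored, as.numeric(str_remove(ResultMeasureValue, "^<")), NA_real_),
    ResultDetectionConditionText = if_else(
      Censored, "Present Below Quantification Limit", ResultDetectionConditionText),
    DetectionQuantitationLimitTypeName = if_else(
      Censored, "Lower Quantitation Limit", DetectionQuantitationLimitTypeName),
    # Substitute x times the detection limit for the censored result value
    ResultMeasureValue = if_else(
      Censored, as.character(x * DetectionQuantitationLimitMeasure.MeasureValue),
      ResultMeasureValue)
  )
}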

AutoFilter

Page Requirements/Standards
All of the flag page functions should be consistent in the following ways

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    No

Required columns:
ActivityMediaName

Development Notes:
None

InvalidCoordinates

When clean = FALSE, append column titled "InvalidCoordinates" with the following:

  1. If the LAT is outside of the 0 to 90 range and longitude is outside of the -180 to 0 range, flag row as "NotInNorthAmerica".

  2. If the LAT or LONG includes the specific strings, 000 or 999, or if the LAT is outside of the -90 to 90 range and longitude is outside of the -180 to 180 range, flag row as "Invalid".

  3. Precision can be measured by the number of decimal places in the latitude and longitude provided. If the LAT or LONG does not have any numbers to the right of the decimal point, flag row as "Imprecise".

When clean = TRUE, append column titled "InvalidCoordinates" with the following:

  1. If NotInNorthAmerica: LAT has a - sign, autoclean and change it to +; if LONG is +, autoclean and change it to -; include "ChangedLatLongSign" in the "InvalidCoordinates" column

  2. If the LAT or LONG includes the specific strings 000 or 999, or if the LAT is outside of the -90 to 90 range and longitude is outside of the -180 to 180 range, flag row as "Invalid".

  3. If the LAT or LONG does not have any numbers to the right of the decimal point, still only flag row as "Imprecise". Do not remove from dataset.

Include additional Boolean argument for imprecise lat/longs:
When imprecise=TRUE, if the LAT or LONG does not have any numbers to the right of the decimal point, remove from dataset

When imprecise=FALSE, if the LAT or LONG does not have any numbers to the right of the decimal point, do not remove from dataset
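
A minimal sketch of the clean = FALSE flagging step (the LatitudeMeasure / LongitudeMeasure column names and the exact precision test are assumptions):

library(dplyr)

FlagInvalidCoordinates <- function(df) {
  mutate(df, InvalidCoordinates = case_when(
    grepl("999|000", LatitudeMeasure) | grepl("999|000", LongitudeMeasure) |
      abs(as.numeric(LatitudeMeasure)) > 90 |
      abs(as.numeric(LongitudeMeasure)) > 180                                    ~ "Invalid",
    (as.numeric(LatitudeMeasure) < 0 | as.numeric(LatitudeMeasure) > 90) &
      (as.numeric(LongitudeMeasure) > 0 | as.numeric(LongitudeMeasure) < -180)   ~ "NotInNorthAmerica",
    as.numeric(LatitudeMeasure) %% 1 == 0 |
      as.numeric(LongitudeMeasure) %% 1 == 0                                     ~ "Imprecise",
    TRUE                                                                         ~ NA_character_
  ))
}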

InvalidFraction

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    Yes, Metadata transformations may be affected
  • Required columns:
    CharacteristicName
    ResultSampleFractionText

Development Notes

None

Check List

  • Create function to add/update reference table
  • Create InvalidFraction function

Review HarmonizeData Functionality

Function Logic
Always: Generate "TADAHarmonizationTable"
Flag=TRUE: Append all yellow columns pulled from the "TADAHarmonizationTemplate" to the master TADA data profile as well.
Flag=FALSE: DO NOT append all yellow columns pulled from the "TADAHarmonizationTemplate" to the master TADA data profile
Clean=FALSE: Do not transform or convert yet.
Clean=TRUE: Perform all transformations and conversions.

Dependent on "TADAHarmonizationReferenceFile"
"TADAHarmonizationTable" is generated for the specific dataset using the "TADAHarmonizationReferenceFile". The "TADAHarmonizationReferenceFile" includes logic for harmonizing synonyms and units

Dependent on other functions
The WQXInvalidResultUnit function (clean=TRUE) is required to run this function
The WQXInvalidFraction function (clean=TRUE) is required to run this function
The WQXInvalidSpeciation function (clean=TRUE) is required to run this function
The WQXTargetUnits function is required to run this function

Notes
Suggest focusing on nutrients to start.

Retrieval auto filter functions

Include the following functions in the Utilities.R file:

  • AutoFilter (media not water, biological data, etc.)
  • TrueDuplicate
  • RemoveColumnsWithNAs (do not do this yet, because some functions require those columns; instead, remove columns at the very end of the process as part of a final cleaning & output function. Alternatively, we could remove all NA-only columns except for the TADA critical columns.)

Page Requirements/Standards
All of the flag page functions should be consistent in the following ways

Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
Default is clean = FALSE
Function Requirements

Reference table required?
No
Include warning if flags are not applied?
No
Required columns:
ActivityMediaName

Development Notes:

  • Consider auto filter for assemblage and media subdivision fields as well - to assist with simplifying the dataset (autofiltering) and ensuring results are comparable
  • May be part of a larger function instead of being a standalone function. Could autoclean remove duplicate rows, blank columns, media-not-water data, and any other data our tool cannot support (TBD)?
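
A minimal sketch of the TrueDuplicate and NA-column helpers mentioned above (required_cols is an assumed placeholder for the TADA critical columns):

library(dplyr)

RemoveTrueDuplicates <- function(df) {
  distinct(df)  # keeps the first occurrence of each fully identical row
}

RemoveEmptyColumns <- function(df, required_cols = character()) {
  # Drop columns that are entirely NA, except any TADA critical columns
  keep <- colSums(!is.na(df)) > 0 | names(df) %in% required_cols
  df[, keep, drop = FALSE]
}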

GenerateMap

Generate a map using WQP station metadata. The map can be static, but it should use colors, sizes, and shapes to provide useful information about the WQP monitoring sites. Interactivity is a plus, but not a requirement for the MVP.
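
A minimal interactive sketch using the leaflet package (the station column names and the per-site ResultCount column are assumptions); a ggplot2 version could cover the static MVP:

library(dplyr)
library(leaflet)

GenerateMap <- function(stations) {
  stations %>%
    leaflet() %>%
    addTiles() %>%
    addCircleMarkers(
      lng = ~as.numeric(LongitudeMeasure),
      lat = ~as.numeric(LatitudeMeasure),
      radius = ~pmin(10, sqrt(ResultCount)),   # marker size scaled by result count (assumed column)
      label = ~MonitoringLocationName
    )
}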

AboveNationalWQXUpperThreshold

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    No
  • Required columns:
    CharacteristicName
    ResultMeasureValue
    ResultMeasure.MeasureUnitCode

Development Notes

ResultFlagsIndependent.R
Filter Type == "CharacteristicUnit"
Required columns to join the reference table and the TADA profile:

  • Characteristic, CharacteristicName
  • Source, ActivityMediaName
  • Value, ResultMeasure.MeasureUnitCode
    This one is a little tricky because Maximum values pertain to the "Value Unit" (target unit) column, not the "Value" (original unit) column. Therefore, units must be converted before checking if a value is outside the range. Here's some logic to get started:
  • Join Maximum and Conversion Factor columns to the input dataset (by Characteristic, Source, and Value)
  • Create a new ResultMeasureValue column (e.g. ConvertedValue), which is ResultMeasureValue / Conversion Factor
  • Create flag column; add the flag when ConvertedValue exceeds Maximum
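
A minimal sketch of that join-and-flag logic (the reference table column names Characteristic, Source, Value, Maximum, and Conversion.Factor are assumptions based on the description above):

library(dplyr)

FlagAboveUpperThreshold <- function(df, ref) {
  ref <- filter(ref, Type == "CharacteristicUnit")
  df %>%
    left_join(ref, by = c("CharacteristicName" = "Characteristic",
                          "ActivityMediaName"  = "Source",
                          "ResultMeasure.MeasureUnitCode" = "Value")) %>%
    mutate(
      ConvertedValue = as.numeric(ResultMeasureValue) / Conversion.Factor,
      AboveNationalWQXUpperThreshold = if_else(
        !is.na(Maximum) & ConvertedValue > Maximum, "Y", "N"))
}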

DepthProfileData

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, Unit conversion reference table
  • Include warning if flags are not applied?
    Yes, Data summaries and calculations may be affected (show only if convert = FALSE)
  • Required columns:
    Activity(Top/Bottom)DepthHeight fields
    ResultDepthHeightMeasure fields

Development Notes

  • HOLD on development- finalize components of this function with discussion
  • This function will do 2 things: 1) Flag rows with depth profile data, 2) optionally convert depth profile data to a uniform unit

WQPWebServiceImport

Import data using the web service for the full physical/chemical profile directly (not via dataRetrieval).

In the vignette, generate two files that are used throughout the process and available to a user to view at any time (.csv?). Generate "TADAProfileClean" and "TADAProfileOriginal" files.

For the function, simply import the data (do not write to the global environment), like what dataRetrieval does.
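
A minimal sketch of a direct web service pull (the query parameters shown are assumptions; check the WQP web services documentation for the full set). The function returns the profile instead of writing to the global environment:

WQPWebServiceImport <- function(statecode, characteristicName) {
  url <- paste0(
    "https://www.waterqualitydata.us/data/Result/search?",
    "statecode=", utils::URLencode(statecode, reserved = TRUE),
    "&characteristicName=", utils::URLencode(characteristicName, reserved = TRUE),
    "&dataProfile=resultPhysChem&mimeType=csv&zip=no"
  )
  read.csv(url, stringsAsFactors = FALSE)
}

# e.g. TADAProfileOriginal <- WQPWebServiceImport("US:55", "Phosphorus")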

CensoredDataSummary

Function title: CensoredDataSummary
Some result values are either below or above the detection limit of the equipment used to collect data. Users may want to substitute values for these results to make them more useful for assessment. This function will provide a summary of the censored data in the dataset.

Always: generate censored data stats table and provide it to function users as a .csv (or in environment?)

Depends on ResultsSpecialChars Function

See summary table here:
https://usepa.sharepoint.com/sites/WQPDataAssessmentTeam/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2FWQPDataAssessmentTeam%2FShared%20Documents%2FGeneral%2FTADA%20Dev%2FFirstTool%5FDataDiscoveryandCleaning%2FTemplates%2FDraft%20Templates&viewid=deb12f0b%2D3d2c%2D4694%2Dbe22%2D7d262f785cce
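
A minimal sketch of the stats table (treating a populated ResultDetectionConditionText as the censoring indicator is an assumption; the real function would depend on ResultsSpecialChars):

library(dplyr)

CensoredDataSummary <- function(df) {
  df %>%
    group_by(CharacteristicName) %>%
    summarise(
      n_results    = n(),
      n_censored   = sum(!is.na(ResultDetectionConditionText)),
      pct_censored = round(100 * n_censored / n_results, 1),
      .groups = "drop"
    )
}

# write.csv(CensoredDataSummary(profile), "CensoredDataSummary.csv", row.names = FALSE)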

UncommonAnalyticalMethodID

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    No
  • Required columns:
    CharacteristicName
    ResultAnalyticalMethod.MethodIdentifier
    ResultAnalyticalMethod.MethodIdentifierContext

Development Notes

None

Check List

  • Create function to add/update reference table
  • Create UncommonAnalyticalMethodID function

BelowNationalWQXLowerThreshold

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    No
  • Required columns:
    CharacteristicName
    ResultMeasureValue
    ResultMeasure.MeasureUnitCode

Development Notes

ResultFlagsIndependent.R
Filter Type == "CharacteristicUnit"
Required columns to join the reference table and the TADA profile:

  • Characteristic, CharacteristicName
  • Source, ActivityMediaName
  • Value, ResultMeasure.MeasureUnitCode
    This one is a little tricky because Minimum values pertain to the "Value Unit" (target unit) column, not the "Value" (original unit) column. Therefore, units must be converted before checking if a value is outside the range. Here's some logic to get started:
  • Join Minimum and Conversion Factor columns to the input dataset (by Characteristic, Source, and Value)
  • Create a new ResultMeasureValue column (e.g. ConvertedValue), which is ResultMeasureValue / Conversion Factor
  • Create flag column; add the flag when ConvertedValue is below Minimum

Add "TADA" to appended columns?

Consider including "TADA" at the beginning of the names of all columns that TADA appends to the dataset.

For example, from the Harmonization Template:

  • TADA Suggested CharacteristicName
  • TADA CharacteristicName assumptions
  • TADA Suggested sample fraction
  • TADA Fraction assumptions
  • TADA Suggested speciation
  • TADA Speciation Conversion Factor
  • TADA Speciation Assumptions
  • TADA Suggested result unit
  • TADA UnitConversionFactor
  • TADA UnitConversionCoefficient
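
A minimal sketch of the renaming step (the "TADA." prefix separator and the appended_cols vector are assumptions):

library(dplyr)

PrefixTADAColumns <- function(df, appended_cols) {
  rename_with(df, ~ paste0("TADA.", .x), all_of(appended_cols))
}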

TADAOutliers

Consider adding outlier information to the TADA stats function.

Append one or two additional columns to the dataset flagging outliers at the individual station/char level and/or at the all stations/char level.

Add a new function input to the stats function to flag outliers across a single station (input ID) or all stations:
Scale = AllStations
Scale = IndividualStations
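
A minimal sketch of an IQR-based flag supporting both scales (the 1.5 * IQR rule and the MonitoringLocationIdentifier grouping column are assumptions):

library(dplyr)

FlagOutliers <- function(df, Scale = "IndividualStations") {
  groups <- if (Scale == "IndividualStations") {
    c("MonitoringLocationIdentifier", "CharacteristicName")
  } else {
    "CharacteristicName"  # Scale = "AllStations"
  }
  df %>%
    mutate(value = as.numeric(ResultMeasureValue)) %>%
    group_by(across(all_of(groups))) %>%
    mutate(Outlier =
      value < quantile(value, 0.25, na.rm = TRUE) - 1.5 * IQR(value, na.rm = TRUE) |
      value > quantile(value, 0.75, na.rm = TRUE) + 1.5 * IQR(value, na.rm = TRUE)) %>%
    ungroup() %>%
    select(-value)
}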

Retrieval enhancement: dataRetrievalTemplate

Option 1: Generate Blank TADARetrieval Template
Option 2: Upload filled in TADARetrieval template and download WQP data

Use the dataRetrieval package for this.

Upon data retrieval, generate two files that are used throughout the process and available to a user to view at any time (.csv?). Generate "TADAProfileClean" and "TADAProfileOriginal" files.

AggregatedContinuousData

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    No
  • Required columns:
    ResultDetectionConditionText

Development Notes

ResultFlagsIndependent.R
Use code from "auto clean" function

InvalidResultUnit

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    Yes, sourced from WQX QAQC Characteristic Validation table
  • Include warning if flags are not applied?
    Yes, Unit conversions, data summaries, and data calculations may be affected
  • Required columns:
    CharacteristicName
    ResultMeasure.MeasureUnitCode
    ActivityMediaName

Development Notes

None

Check List

  • Create function to add/update reference table
  • Create InvalidResultUnit function

CalculateTotalNitrogen

Calculate = Yes
Calculate = No

GenerateHarmonizationTable = Yes
GenerateHarmonizationTable = No

LogicColumn = Yes
LogicColumn = No

See Total Nitrogen Summations page for more detailed requirements

The following logic can be hard coded:
Multiple forms of nitrogen can be summed to calculate TN. Here is the logic TADA uses for this calculation:
If available, use the total N result for a given day, even if there are other constituents available.
If total N is not available, use the sum of the multiple constituents available for the same day as TN. If only one constituent is available for a given day, use that as total N (or P).

Use HarmonizationTemplate
https://usepa.sharepoint.com/:x:/r/sites/WQPDataAssessmentTeam/_layouts/15/Doc.aspx?sourcedoc=%7B756FBEA4-399E-4B40-BB15-4E52041151D0%7D&file=HarmonizationTemplate.xlsx&action=default&mobileredirect=true
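
A minimal sketch of the hard-coded daily logic (the grouping columns, the "Nitrogen" characteristic name, and taking max() when several total N results exist on the same day are all assumptions; the HarmonizationTemplate would drive the real constituent list):

library(dplyr)

CalculateTotalNitrogen <- function(df) {
  df %>%
    mutate(value = as.numeric(ResultMeasureValue)) %>%
    group_by(MonitoringLocationIdentifier, ActivityStartDate) %>%
    summarise(
      TotalNitrogen = if (any(CharacteristicName == "Nitrogen", na.rm = TRUE)) {
        # Use the reported total N result for the day if one exists
        max(value[CharacteristicName == "Nitrogen"], na.rm = TRUE)
      } else {
        # Otherwise sum whatever constituents are available that day
        sum(value, na.rm = TRUE)
      },
      .groups = "drop"
    )
}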

Utilities.R

Create function to remove true duplicates
Create filtering functions

Fields for full dataset filtering
ActivityTypeCode
ActivityMediaName
ActivityMediaSubdivisionName
ActivityCommentText
MonitoringLocationTypeName
StateName
TribalLandName
OrganizationFormalName
CharacteristicName
HydrologicCondition
HydrologicEvent
BiologicalIntentName
MeasureQualifierCode
ActivityGroup
AssemblageSampledName
ProjectName
CharacteristicNameUserSupplied
DetectionQuantitationLimitTypeName
SampleTissueAnatomyName
LaboratoryName

Fields for characteristic level filtering
ActivityCommentText
ActivityTypeCode
ActivityMediaName
ActivityMediaSubdivisionName
MeasureQualifierCode
MonitoringLocationTypeName
HydrologicCondition
HydrologicEvent
ResultStatusIdentifier
MethodQualifierTypeName
ResultCommentText
ResultLaboratoryCommentText
ResultMeasure/MeasureUnitCode
ResultSampleFractionText
ResultTemperatureBasisText
ResultValueTypeName
ResultWeightBasisText
SampleCollectionEquipmentName
LaboratoryName
MethodDescriptionText
ResultParticleSizeBasisText
SampleCollectionMethod/MethodIdentifier
SampleCollectionMethod/MethodIdentifierContext
SampleCollectionMethod/MethodName
DataQuality/BiasValue
MethodSpeciationName
ResultAnalyticalMethod/MethodName
ResultAnalyticalMethod/MethodIdentifier
ResultAnalyticalMethod/MethodIdentifierContext
AssemblageSampledName
CharacteristicNameUserSupplied
DetectionQuantitationLimitTypeName

WQXTargetUnits

Convert all units using MeasureUnit (CSV): https://cdx2.epa.gov/wqx/download/DomainValues/MeasureUnit.CSV

If clean=FALSE, append target unit and conversion columns only

  • If unit is not recognizable or able to be converted, flag as "manual conversion required" or for some specific ones, include flag "UnitIncludesMetadata".

If clean=TRUE, append target unit and conversion columns AND convert units in dataset

  • If unit is not recognizable or able to be converted, flag as "manual conversion required" or for some specific ones, include flag "UnitIncludesMetadata".
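
A minimal sketch of the join-and-convert step (the domain table's Code, Target.Unit, and Conversion.Factor column names, and multiplying by the factor rather than dividing, are assumptions):

library(dplyr)

unit_ref <- read.csv("https://cdx2.epa.gov/wqx/download/DomainValues/MeasureUnit.CSV",
                     stringsAsFactors = FALSE)

ConvertToTargetUnits <- function(df, unit_ref, clean = FALSE) {
  # Append the target unit and conversion factor by the reported unit code
  df <- left_join(df, unit_ref, by = c("ResultMeasure.MeasureUnitCode" = "Code"))
  if (clean) {
    # Convert the values and overwrite the unit code with the target unit
    df <- mutate(df,
      ResultMeasureValue            = as.numeric(ResultMeasureValue) * Conversion.Factor,
      ResultMeasure.MeasureUnitCode = Target.Unit)
  }
  df
}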

RecordSummary

Generate a record summary. This may fit well in the Utilities.R file because it can be used throughout the assessment process.

This function relies on the initial data retrieval generating a "clean" file and a "raw" file.

Summary
Total Records in Raw File: 544
Total Records Removed: 48
Total Records in Clean File: 496

Total Sites in Raw File: 20
Total Sites Removed: 4
Total Sites in Clean File: 16
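
A minimal sketch of the summary (MonitoringLocationIdentifier as the site key is an assumption):

RecordSummary <- function(raw, clean) {
  sites <- function(x) length(unique(x$MonitoringLocationIdentifier))
  data.frame(
    Metric = c("Total Records in Raw File", "Total Records Removed",
               "Total Records in Clean File", "Total Sites in Raw File",
               "Total Sites Removed", "Total Sites in Clean File"),
    Value  = c(nrow(raw), nrow(raw) - nrow(clean), nrow(clean),
               sites(raw), sites(raw) - sites(clean), sites(clean))
  )
}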

IndividualCensoredDataSubstitutions

Apply more advanced methods and/or different methods depending on characteristics being assessed and proportion of censored data for each

Additional options
KaplanMeier
Other methods (TBD)

Must download and upload template

QAPPApproved

Page Requirements/Standards

All of the flag page functions should be consistent in the following ways:

  • Clean argument indicates whether flag columns should be appended to the data (clean = FALSE), or flagged data is transformed/filtered from the dataset and no columns are appended (clean = TRUE).
  • Default is clean = FALSE

Function Requirements

  • Reference table required?
    No
  • Include warning if flags are not applied?
    No
  • Required columns:
    QAPPApprovedIndicator

Development Notes

  • Consider using this logic:
    QAPPApprovedIndicator is populated, QAPPApproved = Y (when clean = TRUE, these columns are retained)
    QAPPApprovedIndicator is not populated, QAPPApproved = N (when clean = TRUE, these columns are not retained)
