canwqdata

An R 📦 to download open water quality data from Environment and Climate Change Canada’s National Long-term Water Quality Monitoring Data.

Features

This package is designed to get Canadian Water Quality Monitoring data into R quickly and easily. You can get data from a single monitoring station, multiple stations, or from an entire basin.

Installation

# install.packages("remotes")  # only needed if remotes is not already installed
remotes::install_github("bcgov/canwqdata")

Usage

First load the package:

library(canwqdata)

The first thing you will probably want to do is get a list of the available sites and associated metadata:

sites <- wq_sites()

sites
#> # A tibble: 339 × 16
#>    SITE_NO SITE_NAME       SITE_NOM_FR SITE_TYPE SITE_DESC SITE_DESC_FR LATITUDE
#>    <chr>   <chr>           <chr>       <chr>     <chr>     <chr>           <dbl>
#>  1 72      BEAUHARNOIS CA… CANAL DE B… RIVER/RI… <NA>      <NA>             45.2
#>  2 75      ST.LAWRENCE RI… FLEUVE SAI… RIVER/RI… <NA>      <NA>             45.9
#>  3 78      ST.LAWRENCE RI… FLEUVE SAI… RIVER/RI… <NA>      <NA>             45.4
#>  4 2330001 ETCHEMIN RIVER… RIVIÈRE ET… RIVER/RI… <NA>      <NA>             46.8
#>  5 2340033 CHAUDIÈRE RIVE… RIVIÈRE CH… RIVER/RI… <NA>      <NA>             46.7
#>  6 2400004 BÉCANCOUR RIVE… RIVIÈRE BÉ… RIVER/RI… <NA>      <NA>             46.4
#>  7 3020073 MAGOG RIVER AT… RIVIÈRE MA… RIVER/RI… <NA>      <NA>             45.3
#>  8 3020333 COATICOOK RIVE… RIVIÈRE CO… RIVER/RI… <NA>      <NA>             45.3
#>  9 3040010 RICHELIEU RIVE… RIVIÈRE RI… RIVER/RI… <NA>      <NA>             45.4
#> 10 3040012 RICHELIEU RIVE… RIVIÈRE RI… RIVER/RI… <NA>      <NA>             45.1
#> # … with 329 more rows, and 9 more variables: LONGITUDE <dbl>, DATUM <chr>,
#> #   PROV_TERR <chr>, PEARSEDA <chr>, PEARSEDA_FR <chr>, OCEANDA <chr>,
#> #   OCEANDA_FR <chr>, DATA_URL <chr>, DATA_URL_FR <chr>

Then get some data from a particular station. AL07AA0015 is a site in Alberta called Athabasca River above Athabasca Falls:

athabasca_falls <- wq_site_data("AL07AA0015")

athabasca_falls
#> # A tibble: 10,538 × 11
#>    SITE_NO    DATE_TIME_HEURE     FLAG_MARQUEUR VALUE_VALEUR SDL_LDE MDL_LDM
#>    <chr>      <dttm>              <chr>                <dbl>   <dbl>   <dbl>
#>  1 AL07AA0015 2000-01-11 13:05:00 <NA>               93.2         NA      NA
#>  2 AL07AA0015 2000-01-11 13:05:00 <                   0.02        NA      NA
#>  3 AL07AA0015 2000-01-11 13:05:00 <                   0.005       NA      NA
#>  4 AL07AA0015 2000-01-11 13:05:00 <NA>                0           NA      NA
#>  5 AL07AA0015 2000-01-11 13:05:00 <                   0.0001      NA      NA
#>  6 AL07AA0015 2000-01-11 13:05:00 <NA>                0.065       NA      NA
#>  7 AL07AA0015 2000-01-11 13:05:00 <                   0.5         NA      NA
#>  8 AL07AA0015 2000-01-11 13:05:00 <NA>              114.          NA      NA
#>  9 AL07AA0015 2000-01-11 13:05:00 <                   0.002       NA      NA
#> 10 AL07AA0015 2000-01-11 13:05:00 <                   0.001       NA      NA
#> # … with 10,528 more rows, and 5 more variables: VMV_CODE <chr>,
#> #   UNIT_UNITE <chr>, VARIABLE <chr>, VARIABLE_FR <chr>, STATUS_STATUT <chr>
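To check which station a code such as AL07AA0015 refers to, you can look it up in the table returned by wq_sites(). The sketch below is a minimal example that uses only the columns shown in the wq_sites() output above; it downloads the site list again so that it stands on its own.

```r
library(canwqdata)
library(dplyr)

# Download the site list (same call as above)
sites <- wq_sites()

# Keep the metadata row(s) for one station code
athabasca_site <- sites %>%
  filter(SITE_NO == "AL07AA0015") %>%
  select(SITE_NO, SITE_NAME, PROV_TERR, LATITUDE, LONGITUDE)

athabasca_site
```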

We can also get data from more than one station:

wq_site_data(c("YT09FC0002", "SA05JM0014"))
#> # A tibble: 23,932 × 11
#>    SITE_NO    DATE_TIME_HEURE     FLAG_MARQUEUR VALUE_VALEUR SDL_LDE MDL_LDM
#>    <chr>      <dttm>              <chr>                <dbl>   <dbl>   <dbl>
#>  1 SA05JM0014 2000-03-07 12:45:00 <NA>                0           NA      NA
#>  2 SA05JM0014 2000-03-07 12:45:00 <NA>              253           NA      NA
#>  3 SA05JM0014 2000-03-07 12:45:00 <NA>                0.047       NA      NA
#>  4 SA05JM0014 2000-03-07 12:45:00 <NA>                0.607       NA      NA
#>  5 SA05JM0014 2000-03-07 12:45:00 <NA>                0.079       NA      NA
#>  6 SA05JM0014 2000-03-07 12:45:00 <NA>                0.001       NA      NA
#>  7 SA05JM0014 2000-03-07 12:45:00 <NA>                0.039       NA      NA
#>  8 SA05JM0014 2000-03-07 12:45:00 <NA>                0.0569      NA      NA
#>  9 SA05JM0014 2000-03-07 12:45:00 <                   0.5         NA      NA
#> 10 SA05JM0014 2000-03-07 12:45:00 <                   0.05        NA      NA
#> # … with 23,922 more rows, and 5 more variables: VMV_CODE <chr>,
#> #   UNIT_UNITE <chr>, VARIABLE <chr>, VARIABLE_FR <chr>, STATUS_STATUT <chr>

Or an entire basin:

The basins are in the PEARSEDA column of the data.frame returned by wq_sites():

basins <- sort(unique(sites$PEARSEDA))
basins
#>  [1] "ARCTIC COAST-ISLANDS"      "ASSINIBOINE-RED"          
#>  [3] "CHURCHILL"                 "COLUMBIA"                 
#>  [5] "FRASER-LOWER MAINLAND"     "GREAT LAKES"              
#>  [7] "KEEWATIN-SOUTHERN BAFFIN"  "LOWER MACKENZIE"          
#>  [9] "LOWER SASKATCHEWAN-NELSON" "MARITIME COASTAL"         
#> [11] "MISSOURI"                  "NEWFOUNDLAND-LABRADOR"    
#> [13] "NORTH SASKATCHEWAN"        "NORTH SHORE-GASPÉ"        
#> [15] "OKANAGAN-SIMILKAMEEN"      "OTTAWA"                   
#> [17] "PACIFIC COASTAL"           "PEACE-ATHABASCA"          
#> [19] "SAINT JOHN-ST. CROIX"      "SOUTH SASKATCHEWAN"       
#> [21] "ST. LAWRENCE"              "WINNIPEG"                 
#> [23] "YUKON"
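Before downloading a whole basin, it can help to preview which monitoring sites it contains. This is a sketch that simply filters the wq_sites() table on the PEARSEDA column; it does not call wq_basin_data(), and it makes no assumption about how that function selects sites internally.

```r
library(canwqdata)
library(dplyr)

sites <- wq_sites()

# Keep only the sites whose PEARSEDA basin is FRASER-LOWER MAINLAND
fraser_sites <- sites %>%
  filter(PEARSEDA == "FRASER-LOWER MAINLAND") %>%
  select(SITE_NO, SITE_NAME, PROV_TERR)

fraser_sites
```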

fraser <- wq_basin_data("FRASER-LOWER MAINLAND")

Compute some quick summary statistics for the fraser dataset:

library(dplyr)

fraser %>%
  group_by(SITE_NO) %>%
  summarise(first_date = min(DATE_TIME_HEURE),
            latest_date = max(DATE_TIME_HEURE),
            n_params = n_distinct(VARIABLE),
            total_samples = n())
#> # A tibble: 15 × 5
#>    SITE_NO    first_date          latest_date         n_params total_samples
#>    <chr>      <dttm>              <dttm>                 <int>         <int>
#>  1 BC08KA0007 2000-01-12 07:45:00 2019-09-12 08:58:00      108         24941
#>  2 BC08KE0010 2000-01-05 00:00:00 2019-09-16 10:00:00       76         23477
#>  3 BC08KH0012 2006-05-11 13:07:00 2019-09-29 08:30:00      140         19511
#>  4 BC08KH0013 2014-06-16 12:45:00 2019-09-23 09:45:00      107         10375
#>  5 BC08KH0014 2014-09-23 14:00:00 2019-09-09 06:55:00      110          9397
#>  6 BC08LC0005 2011-02-24 09:45:00 2019-09-18 11:20:00       69         11866
#>  7 BC08LE0004 2000-01-04 10:00:00 2019-10-02 11:30:00      112         23469
#>  8 BC08LF0001 2000-01-05 12:00:00 2014-12-15 10:20:00       89         18410
#>  9 BC08LG0001 2003-06-24 10:45:00 2019-09-18 14:30:00       71         10366
#> 10 BC08MB0007 2004-11-15 12:00:00 2019-10-01 12:21:00      105         21297
#> 11 BC08MC0001 2000-04-18 16:30:00 2019-09-30 08:37:00      107         21775
#> 12 BC08MF0001 2000-01-04 14:10:00 2019-09-12 12:00:00      129         21475
#> 13 BC08MH0027 2000-01-07 12:16:00 2019-09-24 12:02:00      115         34775
#> 14 BC08MH0269 2004-03-03 14:40:00 2019-09-24 13:45:00      137         25932
#> 15 BC08MH0453 2008-09-02 16:25:00 2019-09-30 12:00:00      107         13389

We can also look at metadata that helps us understand what is in the different columns.

wq_params() returns the water quality parameters (variables) and related data, such as units, methods and codes:

params <- wq_params()
glimpse(params)
#> Rows: 1,964
#> Columns: 12
#> $ VMV_CODE                <chr> "77", "78", "79", "80", "157", "160", "201", "…
#> $ NATIONAL_VARIABLE_CODE  <chr> "635", "365", "4541", "414", "864", "1073", "8…
#> $ VARIABLE_COMMON_NAME    <chr> "Nitrogen total", "Alkalinity total HCO3", "Ch…
#> $ VARIABLE_COMMON_NAME_FR <chr> "Azote total", "Alcalinité totale HCO3", "Chlo…
#> $ VARIABLE_TYPE           <chr> "Nitrogen", "Physical", "Chlorophyll", "Chloro…
#> $ VARIABLE_TYPE_FR        <chr> "Azote", "Physique", "Chlorophylle", "Chloroph…
#> $ MEASUREMENT_UNIT        <chr> "mg/L", "mg/L", "µg/L", "µg/L", "NTU", "mg/L",…
#> $ DESCRIPTION             <chr> "milligram per liter", "milligram per liter", …
#> $ DESCRIPTION_FR          <chr> "milligramme par litre", "milligramme par litr…
#> $ NATIONAL_METHOD_CODE    <chr> "23", "30", "35", "41", "188", "189", "8", "9"…
#> $ METHOD_TITLE            <chr> "Total nitrogen measurement by persulfate oxid…
#> $ METHOD_TITLE_FR         <chr> "Azote total par la méthode d'oxydation au per…
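Both the site data and wq_params() carry a VMV_CODE column, so one way to attach the unit and method title to each measurement is a join on that column. This is a minimal sketch under an assumption: it keeps one wq_params() row per VMV_CODE (via distinct()) so the join cannot duplicate measurements, which is only correct if rows sharing a code also share the same unit and method.

```r
library(canwqdata)
library(dplyr)

athabasca_falls <- wq_site_data("AL07AA0015")
params <- wq_params()

# One row per VMV_CODE (assumption: unit and method are identical for a given code)
param_info <- params %>%
  select(VMV_CODE, MEASUREMENT_UNIT, METHOD_TITLE) %>%
  distinct(VMV_CODE, .keep_all = TRUE)

# Attach the unit and method title to each measurement
athabasca_with_params <- athabasca_falls %>%
  left_join(param_info, by = "VMV_CODE")

athabasca_with_params %>%
  select(DATE_TIME_HEURE, VARIABLE, VALUE_VALEUR, MEASUREMENT_UNIT, METHOD_TITLE)
```

Because param_info has at most one row per VMV_CODE, the joined table has exactly as many rows as athabasca_falls; codes with no match in wq_params() get NA for the added columns.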

# wq_data_desc() lists the column headings used in all the other tables and what they mean
wq_data_desc() %>% 
  glimpse()
#> Rows: 39
#> Columns: 5
#> $ COL_TITLE_TITRE    <chr> "COL_DESCRIPTION", "COL_DESCRIPTION_FR", "COL_TITLE…
#> $ COL_TITLE_FULL     <chr> "COLUMN HEADER DESCRIPTION", "COLUMN HEADER DESCRIP…
#> $ COL_TITRE_COMPLET  <chr> "DESCRIPTION DE L'EN-TÊTE DE COLONNE", "DESCRIPTION…
#> $ COL_DESCRIPTION    <chr> "COLUMN HEADER DESCRIPTION", "COLUMN HEADER DESCRIP…
#> $ COL_DESCRIPTION_FR <chr> "DESCRIPTION DE L'EN-TÊTE DE COLONNE", "DESCRIPTION…
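The site data above includes a FLAG_MARQUEUR column (for example "<" on some Athabasca Falls rows). A sketch for looking up its documented meaning in wq_data_desc() and counting how often each flag value occurs; it assumes FLAG_MARQUEUR is one of the headings listed in COL_TITLE_TITRE, and returns an empty table if it is not.

```r
library(canwqdata)
library(dplyr)

# Documented meaning of the flag column (assumes it is listed by wq_data_desc())
flag_desc <- wq_data_desc() %>%
  filter(COL_TITLE_TITRE == "FLAG_MARQUEUR") %>%
  select(COL_TITLE_TITRE, COL_DESCRIPTION)
flag_desc

# Number of Athabasca Falls measurements per flag value (NA means no flag)
athabasca_falls <- wq_site_data("AL07AA0015")
flag_counts <- athabasca_falls %>%
  count(FLAG_MARQUEUR)
flag_counts
```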

Let’s look at Total Nitrogen in the Fraser basin:

fraser_n_total <- fraser %>% filter(VARIABLE == "NITROGEN TOTAL")
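The filter above needs the exact spelling of the variable name. A sketch for finding it by listing the distinct VARIABLE values that mention nitrogen; it repeats the basin download so the snippet runs on its own.

```r
library(canwqdata)
library(dplyr)

fraser <- wq_basin_data("FRASER-LOWER MAINLAND")

# Distinct variable names containing "NITROGEN", sorted alphabetically
nitrogen_vars <- fraser %>%
  distinct(VARIABLE) %>%
  filter(grepl("NITROGEN", VARIABLE, fixed = TRUE)) %>%
  arrange(VARIABLE)

nitrogen_vars
```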

Now let's do some plotting: plot Total Nitrogen over time at each site, using a log scale so that all the sites fit:

library(ggplot2)

ggplot(fraser_n_total, aes(x = DATE_TIME_HEURE, y = VALUE_VALEUR)) + 
  geom_point(size = 0.4, alpha = 0.4, colour = "purple") + 
  facet_wrap(~ SITE_NO) + 
  scale_y_log10()
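Plotting every site on a shared axis assumes the values are reported in the same unit. A quick sketch for checking that with the UNIT_UNITE column before comparing sites; if more than one unit shows up, the values would need converting first.

```r
library(canwqdata)
library(dplyr)

fraser <- wq_basin_data("FRASER-LOWER MAINLAND")
fraser_n_total <- fraser %>% filter(VARIABLE == "NITROGEN TOTAL")

# Number of Total Nitrogen measurements reported in each unit
unit_check <- fraser_n_total %>%
  count(UNIT_UNITE)
unit_check
```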

It’s also possible to download data for an entire province by passing all of its site numbers to wq_site_data():

bc_sites <- sites %>% 
  filter(PROV_TERR == "BC") %>% 
  pull(SITE_NO)

all_bc_data <- wq_site_data(bc_sites)

glimpse(all_bc_data)
#> Rows: 925,542
#> Columns: 11
#> $ SITE_NO         <chr> "BC07FB0005", "BC07FB0005", "BC07FB0005", "BC07FB0005"…
#> $ DATE_TIME_HEURE <dttm> 2017-01-25 09:35:00, 2017-01-25 09:35:00, 2017-01-25 …
#> $ FLAG_MARQUEUR   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "<", NA, N…
#> $ VALUE_VALEUR    <dbl> 163.000, 4.100, 31.900, 0.060, 0.061, 0.130, 0.150, 10…
#> $ SDL_LDE         <dbl> 1.000, 0.500, 0.500, 0.001, 0.001, 0.010, 0.010, 0.050…
#> $ MDL_LDM         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ VMV_CODE        <chr> "9134", "107941", "107905", "107965", "107929", "10794…
#> $ UNIT_UNITE      <chr> "MG/L", "UG/L", "UG/L", "UG/L", "UG/L", "UG/L", "UG/L"…
#> $ VARIABLE        <chr> "ALKALINITY TOTAL CACO3", "ALUMINUM DISSOLVED", "ALUMI…
#> $ VARIABLE_FR     <chr> "ALCALINITÉ TOTALE CACO3", "ALUMINIUM DISSOUS", "ALUMI…
#> $ STATUS_STATUT   <chr> "P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "P",…

Project Status

Under development, but ready for use and testing.

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

How to Contribute

If you would like to contribute to the package, please see our CONTRIBUTING guidelines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

Copyright 2018 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at 

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository is maintained by Environmental Reporting BC. A complete list of our repositories is available on GitHub.
