Code Monkey home page Code Monkey logo

lab-data-delivery's People

Contributors

dylanbeaudette avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

swsalley

lab-data-delivery's Issues

Phsample missing where Pedon labsample number populated

BASE TABLE pedon.

EXEC SQL SELECT TOP 100 peiid/1 AS peiid, 1 AS id
FROM pedon
INNER JOIN phorizon ON phorizon.peiidref=pedon.peiid AND pedlabsampnum IS NOT NULL 
AND peiid IN (SELECT p1.peiid
FROM pedon AS p1 
INNER JOIN phorizon AS ph1 ON ph1.peiidref=p1.peiid AND pedlabsampnum IS NOT NULL
AND p1.peiid 
IN        
(9635, 9637, 9645, 9657, 9677, 9679, 9684, 9897, 9898, 20617, 27121, 27131, 27132, 27140, 27142, 27147, 27153, 49057, 49058, 49059, 49060, 49061, 50744, 50745, 50746, 50754, 50751, 50753, 55222, 63494, 64427, 64430, 64431, 64433, 64435, 64439, 64482, 64488, 64489, 64490, 64491, 64492, 64502, 64503, 64506, 98036, 297972, 66406, 66409, 67480, 67655, 67657, 67658, 67659, 67660, 67661, 98149, 98157, 98176, 99159, 99176, 99308, 99261, 99290, 99294, 99254, 99324, 99557, 99694, 99790, 64426, 108352, 108354, 108355, 108358, 108359, 108360, 108361, 108362, 108363, 49314, 111792, 111794, 1181334, 111807, 111840, 111842, 111846, 111860, 111868, 111870, 112669, 112794, 112827, 112884, 112942, 113179, 114082, 114035, 116114, 43391, 63788, 63789, 64602, 168290, 168170, 168177, 186624, 1176494, 831992, 951031, 167545, 167844, 167932, 168133, 168135, 168138, 168156, 168168, 168255, 168272, 195471, 220149, 235271, 242215, 253218, 254008, 254479, 254480, 254481, 254482, 254521, 256258, 256259, 261275, 346409, 361082, 713703, 361864, 361865, 361866, 361867, 361869, 361870, 361871, 361872, 361873, 362463, 362564, 364287, 364372, 364579, 364580, 364585, 364643, 364651, 364656, 364271, 364821, 364824, 365476, 366579, 366580, 366581, 366582, 366583, 366584, 366585, 366586, 366587, 366588, 366595, 366596, 371012, 371013, 371014, 371015, 371016, 371017, 371018, 371019, 371020, 371021, 371572, 371573, 371574, 371575, 371576, 371577, 365297, 365298, 365299, 338020, 351623, 261847, 352512, 352514, 365289, 365291, 366597,47811, 47812, 66852, 204228, 204236, 204233, 204234, 204226, 113829, 1072595, 1072597, 863924, 784495, 181718, 773877, 176352, 196353, 217739, 220530, 234969, 242112, 247621, 247724, 247882, 247894, 248563, 297788, 297824, 297831, 297833, 375637, 388229, 459643, 1017200, 598189, 662781, 788182, 799737, 799738, 799739, 824113, 825699, 825703, 833633, 842394, 879094, 878494, 885717, 885731, 885749, 885751, 885752, 885753, 885754, 903894, 903897, 903912, 833644, 668953, 10104, 10116, 10118, 10367, 10368, 10369, 10370, 10371, 10606, 10607, 10608, 10609, 10610, 10611, 10892, 10893, 10894, 27562, 46805, 46807, 46810, 46811, 46812, 46813, 46814, 46815, 46838, 46840, 46860, 46861, 46862, 46863, 46864, 46867, 46870, 46871, 46880, 46882, 46883, 46887, 46888, 46889, 46890, 46920, 46921, 46922, 46945, 46952, 46953, 46954, 46986, 46987, 46988, 46996, 46997, 47013, 47014, 47015, 47016, 47017, 47018, 47019, 47020, 47021, 47034, 47035, 47038, 47039, 47040, 47041, 47048, 47049, 47064, 47065, 47079, 50008, 51228, 51233, 56013, 56018, 56021, 56022, 56032, 56039, 56041, 56055, 56134, 56139, 56140, 56141, 56143, 56149, 56162, 67296, 67297, 67298, 67300, 67329, 67330, 67331, 9631, 19781, 30684, 30685, 34755, 34756, 34757, 34758, 34759, 34760, 34761, 34762, 34763, 34764, 36643, 44687, 44697, 44699, 44800, 49115, 49116, 50074, 50076, 50077, 50592, 50594, 50596, 50599, 50602, 50644, 51956, 52148, 52149, 810748, 52943, 63454, 63549, 63551, 66314, 66357, 66358, 66359, 66360, 66361, 67581, 67582, 67583, 67584, 67601, 67603, 67604, 88687, 79543, 65735, 65734, 65736, 65737, 50647, 115906, 133159, 133928, 133932, 146043, 73277, 161907, 164081, 164101, 164174, 164230, 165828, 50517, 180996, 186657, 186658, 186659, 192532, 192547, 192601, 195195, 219104, 220107, 220194, 220523, 220560, 240385, 253401, 266232, 294817, 774201, 313272, 313379, 313654, 313791, 313859, 313907, 850222, 339716, 1079962, 622689, 610101)


LEFT OUTER JOIN phsample AS phs1 ON phs1.phiidref=ph1.phiid GROUP BY p1.peiid
HAVING COUNT (phs1.phiidref) = 0  AND p1.peiid=pedon.peiid )GROUP BY peiid;
AGGREGATE ROWS BY id COLUMN peiid LIST ', '.

some columns in the FGDB / text files are all NA (KSSL Data)

Some of the more important values that appear to be entirely NA, for example:

  1. Carbon_and_Extractions:
  • oc
  • n_tot
  • fe_dith
  • fe_ox

A complete listing can be found by looking over this file. Columns specified as TEXT when known to be numeric are likely wrong.

I have confirmed that the FGDB and resulting textfiles have this problem.

add XRD data to SoilWeb web-service

Also:

  • investigate peak-normalization methods that may have been used
  • are all of the data in digital format?
  • contact Steve Monteith for details

develop and apply a robust "completeness index"

The current version isn't very informative because:

  • it doesn't know about missing horizons
  • doesn't specifically take into account properties required by CEAP/APEX
  • ??? other reasons

Data required by CEAP/APEX, via CAH:

  • sand content
  • silt content
  • 1/3 bar bulk density
  • and OD bulk density (if present)
  • sum of bases
  • CaCO3 equivalent
  • EC (sat paste)
  • 1:1 pH
  • rock fragments
  • cec7 (NH4OAc)
  • organic C
  • 1/3 bar water ret.
  • 15 bar water ret.

Ideas

  1. simple score based on number of non-null fields in this list
  2. simple score based on number of non-null fields, weighted by number of expected hz, via OSD
  3. bit mask (e.g. 1010010101) type reporting so that the user knows what is missing
  4. weighted-score, based on column importance: e.g. clay is more important than al_dith

An example via aqp

library(soilDB)
library(aqp)

# get some example data
x <- fetchKSSL('amador')

# variable of interest
vars <-  c('sand', 'silt', 'clay', 'db_13b', 'db_od', 'bs82', 'bs7', 'caco3', 'ec_12pre', 'ph_h2o', 'cec7', 'oc', 'w3cld', 'w15l2')

# get soil depth based on detection of "non-soil horizons"
sdc <- getSoilDepthClass(x, name = "hzn_desgn", top = "hzn_top", bottom = "hzn_bot", p = 'Cr|R|Cd')

# existing aqp function for detecting missing data
# inform max_depth via getSoilDepthClass(...)
# filter out non-soil horizons (there shouldn't be many in the KSSL data)
res <- missingDataGrid(x, max_depth=max(sdc$depth, na.rm = TRUE), vars=vars, filter.column = 'hzn_desgn', filter.regex = 'Cr|R|Cd')

# compute (simple) data completeness index
# 100 - [ sum(pct missing by variable) / n_variables ]
res$dci <- round(100 - (rowSums(res[, -1]) / length(vars)))

# copy back into site-level attributes
site(x) <- res[, c('pedon_key', 'dci')]

# compare with current "pedon completeness index": pretty close
plot(pedon_completeness_index ~ dci, data=site(x))
  pedon_key sand silt clay db_13b db_od bs82 bs7 caco3 ec_12pre ph_h2o cec7  oc w3cld w15l2 dci
1     32585    0    0    0      0     0    0   0   100      100      0    0 100     0     0  79
2     52931    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
3     53074    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
4     53081    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
5     59423    0    0    0    100   100    0 100     0      100      0    0 100   100     0  57

image

Add areasymbol to site table

Currently their are 38,456 records without coordinates. Therefore it would be helpful to know which areasymbol the sites reside in if their coordinates are missing. Right now they only way to determine the areasymbol is to parse the user_site_id or link to NASIS information.

I see their are keys in the ncss table. Can these by decoded? Are they found within the NASIS area table?

duplicates in FGDB and exported text files

I noticed that the NASIS pedon FGDB contains duplicate rows in some tables.

For example, site record id 9139 has 3 rows in the FGDB, but only a single row in NASIS.

SELECT * from site WHERE siteiid = '9139';
usiteid latdegrees latminutes latseconds latdir longdegrees longminutes longseconds longdir                   horizdatnm
1 00-JRD-01         44          1         28  North         120          28          57    West North American Datum of 1927
2 00-JRD-01         44          1         28  North         120          28          57    West North American Datum of 1927
3 00-JRD-01         44          1         28  North         120          28          57    West North American Datum of 1927
  locdesc plsssdetails plsssection plsstownship plssrange plssmeridian utmzone utmnorthing utmeasting
1    <NA>         <NA>           7          18S       19E   Willamette      NA          NA         NA
2    <NA>         <NA>           7          18S       19E   Willamette      NA          NA         NA
3    <NA>         <NA>           7          18S       19E   Willamette      NA          NA         NA
               geocoordsource elev geomposhill geomposmntn geompostrce geomposflats hillslopeprof geomslopeseg slope aspect
1 Estimated from other source 1628        <NA>        <NA>        <NA>         <NA>          <NA>         <NA>    15     84
2 Estimated from other source 1628        <NA>        <NA>        <NA>         <NA>          <NA>         <NA>    15     84
3 Estimated from other source 1628        <NA>        <NA>        <NA>         <NA>          <NA>         <NA>    15     84
  slopelenusle slopelenuptro shapeacross shapedown slopecomplex locphysnm siteksatclassupp siteksatclasslow   drainagecl
1           NA            NA        <NA>      <NA>       simple      <NA>             <NA>             <NA> Well drained
2           NA            NA        <NA>      <NA>       simple      <NA>             <NA>             <NA> Well drained
3           NA            NA        <NA>      <NA>       simple      <NA>             <NA>             <NA> Well drained
  runoff drainagepattern                                      pmgroupname pmgroupname_s climstaid climstanm climstatype ffd
1 Medium            <NA> volcanic ash over residuum weathered from basalt             1      <NA>      <NA>        <NA>  NA
2 Medium            <NA> volcanic ash over residuum weathered from basalt             1      <NA>      <NA>        <NA>  NA
3 Medium            <NA> volcanic ash over residuum weathered from basalt             1      <NA>      <NA>        <NA>  NA
  map reannualprecip airtempa soiltempa airtemps soiltemps airtempw soiltempw benchmarksoilfla flodfreqcl floddurcl
1  NA             NA       NA        NA       NA        NA       NA        NA                0       None      <NA>
2  NA             NA       NA        NA       NA        NA       NA        NA                0       None      <NA>
3  NA             NA       NA        NA       NA        NA       NA        NA                0       None      <NA>
  flodmonthbeg pondfreqcl ponddurcl pondmonthbeg wtabledur latstddecimaldeg longstddecimalde gpspositionalerr gpspdop
1         <NA>       None      <NA>         <NA>        NA         44.02428        -120.4836               NA      NA
2         <NA>       None      <NA>         <NA>        NA         44.02428        -120.4836               NA      NA
3         <NA>       None      <NA>         <NA>        NA         44.02428        -120.4836               NA      NA
  elevcorrected sdbiidref siteiid
1            NA       105    9139
2            NA       105    9139
3            NA       105    9139

Aggressive rounding

Why?

Old Access snapshot from 2019:

# Carbon_and_Extractions.mn_dith
# tabulation of unique values
  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9 
19163  3212   730   215   125    72    46    29    21

Latest snapshot from 2 weeks ago:

# chemical.manganese_dithionite_citrate
# tabulation of unique values
    0 
93749 

Fields in the chemical table possibly affected:
arsenic_mehlich3_extractable
fluoride_water_extractable
exchangeable_sodium
aluminum_kcl_extractable
molybdenum_mehlich3_extractabl
total_sulfur_ncs
f_satx
phosphorus_water
manganese_kcl_extractable
nitrite_water_extractable
no3_satx
organic_carbon_walkley_black
total_estimated_salts_satx
no2_satx
fe_ammoniumoxalate_extractable
k_nh4_ph_7
corrected_gypsum_lt_2_mm
aluminum_na_pyro_phosphate
na_nh4_ph_7
manganese_na_pyro_phosphate
phosphorus_water_extractable
iron_sodium_pyro_phosphate
bromide_water_extractable
oac_satx
acetate_water_extractable
total_nitrogen_ncs
po4_satx
ammoniumoxalate_opticaldensity
br_satx
co3_satx
manganese_dithionite_citrate

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.