ncss-tech / lab-data-delivery Goto Github PK
View Code? Open in Web Editor NEWRe-working the LIMS + NASIS data integration process for snapshot and future web-service based delivery.
Re-working the LIMS + NASIS data integration process for snapshot and future web-service based delivery.
BASE TABLE pedon.
EXEC SQL SELECT TOP 100 peiid/1 AS peiid, 1 AS id
FROM pedon
INNER JOIN phorizon ON phorizon.peiidref=pedon.peiid AND pedlabsampnum IS NOT NULL
AND peiid IN (SELECT p1.peiid
FROM pedon AS p1
INNER JOIN phorizon AS ph1 ON ph1.peiidref=p1.peiid AND pedlabsampnum IS NOT NULL
AND p1.peiid
IN
(9635, 9637, 9645, 9657, 9677, 9679, 9684, 9897, 9898, 20617, 27121, 27131, 27132, 27140, 27142, 27147, 27153, 49057, 49058, 49059, 49060, 49061, 50744, 50745, 50746, 50754, 50751, 50753, 55222, 63494, 64427, 64430, 64431, 64433, 64435, 64439, 64482, 64488, 64489, 64490, 64491, 64492, 64502, 64503, 64506, 98036, 297972, 66406, 66409, 67480, 67655, 67657, 67658, 67659, 67660, 67661, 98149, 98157, 98176, 99159, 99176, 99308, 99261, 99290, 99294, 99254, 99324, 99557, 99694, 99790, 64426, 108352, 108354, 108355, 108358, 108359, 108360, 108361, 108362, 108363, 49314, 111792, 111794, 1181334, 111807, 111840, 111842, 111846, 111860, 111868, 111870, 112669, 112794, 112827, 112884, 112942, 113179, 114082, 114035, 116114, 43391, 63788, 63789, 64602, 168290, 168170, 168177, 186624, 1176494, 831992, 951031, 167545, 167844, 167932, 168133, 168135, 168138, 168156, 168168, 168255, 168272, 195471, 220149, 235271, 242215, 253218, 254008, 254479, 254480, 254481, 254482, 254521, 256258, 256259, 261275, 346409, 361082, 713703, 361864, 361865, 361866, 361867, 361869, 361870, 361871, 361872, 361873, 362463, 362564, 364287, 364372, 364579, 364580, 364585, 364643, 364651, 364656, 364271, 364821, 364824, 365476, 366579, 366580, 366581, 366582, 366583, 366584, 366585, 366586, 366587, 366588, 366595, 366596, 371012, 371013, 371014, 371015, 371016, 371017, 371018, 371019, 371020, 371021, 371572, 371573, 371574, 371575, 371576, 371577, 365297, 365298, 365299, 338020, 351623, 261847, 352512, 352514, 365289, 365291, 366597,47811, 47812, 66852, 204228, 204236, 204233, 204234, 204226, 113829, 1072595, 1072597, 863924, 784495, 181718, 773877, 176352, 196353, 217739, 220530, 234969, 242112, 247621, 247724, 247882, 247894, 248563, 297788, 297824, 297831, 297833, 375637, 388229, 459643, 1017200, 598189, 662781, 788182, 799737, 799738, 799739, 824113, 825699, 825703, 833633, 842394, 879094, 878494, 885717, 885731, 885749, 885751, 885752, 885753, 885754, 903894, 903897, 903912, 833644, 668953, 10104, 10116, 10118, 10367, 10368, 10369, 10370, 10371, 10606, 10607, 10608, 10609, 10610, 10611, 10892, 10893, 10894, 27562, 46805, 46807, 46810, 46811, 46812, 46813, 46814, 46815, 46838, 46840, 46860, 46861, 46862, 46863, 46864, 46867, 46870, 46871, 46880, 46882, 46883, 46887, 46888, 46889, 46890, 46920, 46921, 46922, 46945, 46952, 46953, 46954, 46986, 46987, 46988, 46996, 46997, 47013, 47014, 47015, 47016, 47017, 47018, 47019, 47020, 47021, 47034, 47035, 47038, 47039, 47040, 47041, 47048, 47049, 47064, 47065, 47079, 50008, 51228, 51233, 56013, 56018, 56021, 56022, 56032, 56039, 56041, 56055, 56134, 56139, 56140, 56141, 56143, 56149, 56162, 67296, 67297, 67298, 67300, 67329, 67330, 67331, 9631, 19781, 30684, 30685, 34755, 34756, 34757, 34758, 34759, 34760, 34761, 34762, 34763, 34764, 36643, 44687, 44697, 44699, 44800, 49115, 49116, 50074, 50076, 50077, 50592, 50594, 50596, 50599, 50602, 50644, 51956, 52148, 52149, 810748, 52943, 63454, 63549, 63551, 66314, 66357, 66358, 66359, 66360, 66361, 67581, 67582, 67583, 67584, 67601, 67603, 67604, 88687, 79543, 65735, 65734, 65736, 65737, 50647, 115906, 133159, 133928, 133932, 146043, 73277, 161907, 164081, 164101, 164174, 164230, 165828, 50517, 180996, 186657, 186658, 186659, 192532, 192547, 192601, 195195, 219104, 220107, 220194, 220523, 220560, 240385, 253401, 266232, 294817, 774201, 313272, 313379, 313654, 313791, 313859, 313907, 850222, 339716, 1079962, 622689, 610101)
LEFT OUTER JOIN phsample AS phs1 ON phs1.phiidref=ph1.phiid GROUP BY p1.peiid
HAVING COUNT (phs1.phiidref) = 0 AND p1.peiid=pedon.peiid )GROUP BY peiid;
AGGREGATE ROWS BY id COLUMN peiid LIST ', '.
Some of the more important values that appear to be entirely NA, for example:
A complete listing can be found by looking over this file. Columns specified as TEXT when known to be numeric are likely wrong.
I have confirmed that the FGDB and resulting textfiles have this problem.
Also:
Specifically:
hzn_top
: wrong column type (1 record, related to parsing error)"
and other illegal characters in horizon designations: fields not parsed correctlySee problems/layer.txt for a list of parsing errors.
See related QC code.
It would be nice to post both ESRI-based and open-source alternatives of the NASIS/KSSL data.
For example:
The field names are not truncated in the FGDB.
This is an intermediate solution to the larger problem (#2) of converting tables in a FGDB to tables in an SQLite file.
Ideas:
How do we deal with "trace" values associated with analytes? Some options:
text
There are probably some tables that can be left out. For example, does it make sense to include all of the NASIS metadata tables? The ncss lab layer table is duplication of the KSSL data, or is it?
Discuss!
The current version isn't very informative because:
Data required by CEAP/APEX, via CAH:
clay
is more important than al_dith
aqp
library(soilDB)
library(aqp)
# get some example data
x <- fetchKSSL('amador')
# variable of interest
vars <- c('sand', 'silt', 'clay', 'db_13b', 'db_od', 'bs82', 'bs7', 'caco3', 'ec_12pre', 'ph_h2o', 'cec7', 'oc', 'w3cld', 'w15l2')
# get soil depth based on detection of "non-soil horizons"
sdc <- getSoilDepthClass(x, name = "hzn_desgn", top = "hzn_top", bottom = "hzn_bot", p = 'Cr|R|Cd')
# existing aqp function for detecting missing data
# inform max_depth via getSoilDepthClass(...)
# filter out non-soil horizons (there shouldn't be many in the KSSL data)
res <- missingDataGrid(x, max_depth=max(sdc$depth, na.rm = TRUE), vars=vars, filter.column = 'hzn_desgn', filter.regex = 'Cr|R|Cd')
# compute (simple) data completeness index
# 100 - [ sum(pct missing by variable) / n_variables ]
res$dci <- round(100 - (rowSums(res[, -1]) / length(vars)))
# copy back into site-level attributes
site(x) <- res[, c('pedon_key', 'dci')]
# compare with current "pedon completeness index": pretty close
plot(pedon_completeness_index ~ dci, data=site(x))
pedon_key sand silt clay db_13b db_od bs82 bs7 caco3 ec_12pre ph_h2o cec7 oc w3cld w15l2 dci
1 32585 0 0 0 0 0 0 0 100 100 0 0 100 0 0 79
2 52931 0 0 0 100 100 100 0 100 100 100 0 0 100 100 43
3 53074 0 0 0 100 100 100 0 100 100 100 0 0 100 100 43
4 53081 0 0 0 100 100 100 0 100 100 100 0 0 100 100 43
5 59423 0 0 0 100 100 0 100 0 100 0 0 100 100 0 57
Currently their are 38,456 records without coordinates. Therefore it would be helpful to know which areasymbol the sites reside in if their coordinates are missing. Right now they only way to determine the areasymbol is to parse the user_site_id or link to NASIS information.
I see their are keys in the ncss table. Can these by decoded? Are they found within the NASIS area table?
I noticed that the NASIS pedon FGDB contains duplicate rows in some tables.
For example, site record id 9139
has 3 rows in the FGDB, but only a single row in NASIS.
SELECT * from site WHERE siteiid = '9139';
usiteid latdegrees latminutes latseconds latdir longdegrees longminutes longseconds longdir horizdatnm
1 00-JRD-01 44 1 28 North 120 28 57 West North American Datum of 1927
2 00-JRD-01 44 1 28 North 120 28 57 West North American Datum of 1927
3 00-JRD-01 44 1 28 North 120 28 57 West North American Datum of 1927
locdesc plsssdetails plsssection plsstownship plssrange plssmeridian utmzone utmnorthing utmeasting
1 <NA> <NA> 7 18S 19E Willamette NA NA NA
2 <NA> <NA> 7 18S 19E Willamette NA NA NA
3 <NA> <NA> 7 18S 19E Willamette NA NA NA
geocoordsource elev geomposhill geomposmntn geompostrce geomposflats hillslopeprof geomslopeseg slope aspect
1 Estimated from other source 1628 <NA> <NA> <NA> <NA> <NA> <NA> 15 84
2 Estimated from other source 1628 <NA> <NA> <NA> <NA> <NA> <NA> 15 84
3 Estimated from other source 1628 <NA> <NA> <NA> <NA> <NA> <NA> 15 84
slopelenusle slopelenuptro shapeacross shapedown slopecomplex locphysnm siteksatclassupp siteksatclasslow drainagecl
1 NA NA <NA> <NA> simple <NA> <NA> <NA> Well drained
2 NA NA <NA> <NA> simple <NA> <NA> <NA> Well drained
3 NA NA <NA> <NA> simple <NA> <NA> <NA> Well drained
runoff drainagepattern pmgroupname pmgroupname_s climstaid climstanm climstatype ffd
1 Medium <NA> volcanic ash over residuum weathered from basalt 1 <NA> <NA> <NA> NA
2 Medium <NA> volcanic ash over residuum weathered from basalt 1 <NA> <NA> <NA> NA
3 Medium <NA> volcanic ash over residuum weathered from basalt 1 <NA> <NA> <NA> NA
map reannualprecip airtempa soiltempa airtemps soiltemps airtempw soiltempw benchmarksoilfla flodfreqcl floddurcl
1 NA NA NA NA NA NA NA NA 0 None <NA>
2 NA NA NA NA NA NA NA NA 0 None <NA>
3 NA NA NA NA NA NA NA NA 0 None <NA>
flodmonthbeg pondfreqcl ponddurcl pondmonthbeg wtabledur latstddecimaldeg longstddecimalde gpspositionalerr gpspdop
1 <NA> None <NA> <NA> NA 44.02428 -120.4836 NA NA
2 <NA> None <NA> <NA> NA 44.02428 -120.4836 NA NA
3 <NA> None <NA> <NA> NA 44.02428 -120.4836 NA NA
elevcorrected sdbiidref siteiid
1 NA 105 9139
2 NA 105 9139
3 NA 105 9139
Why?
Old Access snapshot from 2019:
# Carbon_and_Extractions.mn_dith
# tabulation of unique values
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
19163 3212 730 215 125 72 46 29 21
Latest snapshot from 2 weeks ago:
# chemical.manganese_dithionite_citrate
# tabulation of unique values
0
93749
Fields in the chemical
table possibly affected:
arsenic_mehlich3_extractable
fluoride_water_extractable
exchangeable_sodium
aluminum_kcl_extractable
molybdenum_mehlich3_extractabl
total_sulfur_ncs
f_satx
phosphorus_water
manganese_kcl_extractable
nitrite_water_extractable
no3_satx
organic_carbon_walkley_black
total_estimated_salts_satx
no2_satx
fe_ammoniumoxalate_extractable
k_nh4_ph_7
corrected_gypsum_lt_2_mm
aluminum_na_pyro_phosphate
na_nh4_ph_7
manganese_na_pyro_phosphate
phosphorus_water_extractable
iron_sodium_pyro_phosphate
bromide_water_extractable
oac_satx
acetate_water_extractable
total_nitrogen_ncs
po4_satx
ammoniumoxalate_opticaldensity
br_satx
co3_satx
manganese_dithionite_citrate
What is the logic used to generate ssl_name
in the current snapshot, and when should the most recent record in the petaxhistory table be used instead?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.