odm2 / odm2 Goto Github PK

View Code? Open in Web Editor NEW

58.0 54.0 18.0 109.15 MB

An information model for spatially-discrete, feature-based earth observations.

License: Other

Python 0.95% SQLPL 0.46% PLSQL 3.85% TSQL 79.35% PLpgSQL 15.39%

odm2 earth-observations

odm2's Introduction

ODM2

The next version of the Observations Data Model.

For more information about the ODM2 development project, visit the wiki.

Have a look at the ODM2 paper in Environmental Modelling & Software. It's open access!

Horsburgh, J. S., Aufdenkampe, A. K., Mayorga, E., Lehnert, K. A., Hsu, L., Song, L., Spackman Jones, A., Damiano, S. G., Tarboton, D. G., Valentine, D., Zaslavsky, I., Whitenack, T. (2016). Observations Data Model 2: A community information model for spatially discrete Earth observations, Environmental Modelling & Software, 79, 55-74, http://dx.doi.org/10.1016/j.envsoft.2016.01.010

If you are interested in learning more about how ODM2 supports different use cases, have a look at our recent paper in the Data Science Journal.

Hsu, L., Mayorga, E., Horsburgh, J. S., Carter, M. R., Lehnert, K. A., Brantley, S. L. (2017), Enhancing Interoperability and Capabilities of Earth Science Data using the Observations Data Model 2 (ODM2), Data Science Journal, 16(4), 1-16, http://dx.doi.org/10.5334/dsj-2017-004.

Getting Started with ODM2

SQL scripts for generating blank ODM2 databases can be found at the following locations:

View Documentation of ODM2 Concepts

For more information on ODM2 concepts, examples, best practices, the ODM2 software ecosystem, etc., visit the Documentation page on the wiki.

View Diagrams and Documentation of the ODM2 Schema

Schema diagrams for the current version of the ODM2 schema are at:

Entity Relationship Diagrams

Data Use Cases

The following data use cases are available. We have focused on designing ODM2 to support these data use cases. Available code and documentation show how these data use cases were mapped to the ODM2.

Little Bear River - Hydrologic time series and water quality samples from an ODM 1.1.1 database. Implements an ODM2 database in Microsoft SQL Server.
PRISM-XAN - Water quality depth profiles and samples from Puget Sound. Implements an ODM2 database in PostgreSQL.

Our Goal with ODM2

We are working to develop a community information model to extend interoperability of spatially discrete, feature based earth observations derived from sensors and samples and improve the capture, sharing, and archival of these data. This information model, called ODM2, is being designed from a general perspective, with extensibility for achieving interoperability across multiple disciplines and systems that support publication of earth observations.

Credits

This work was supported by National Science Foundation Grant EAR-1224638. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

ODM2 draws heavily form our prior work with the CUAHSI Hydrologic information system and ODM 1.1.1 (Horsburgh et al., 2008; Horsburgh and Tarboton, 2008), our experiences workin on the Critical Zone Observatory Integrated Data Management System (CZOData), and our experiences with the EarthChem systems (e.g., Lehnert et al., 2007; Lehnert et al., 2009). It also extensively uses concepts from the Open Geospatial Consortium's Observations & Measurements standard (Cox, 2007a; Cox, 2007b; Cox, 2011a; Cox, 2011b; ISO, 2011).

References

See a full list of ODM2 related references

Cox, S.J.D. (2007a). Observations and Measurements - Part 1 - Observation schema, OGC Implementation Specification, OGC 07-022r1. 73 + xi. http://portal.opengeospatial.org/files/22466.

Cox, S.J.D. (2007b). Observations and Measurements – Part 2 - Sampling Features, OGC Implementation Specification, OGC 07-002r3. 36 + ix. http://portal.opengeospatial.org/files/22467.

Cox, S.J.D. (2011a). Geographic Information - Observations and Measurements, OGC Abstract Specification Topic 20 (same as ISO 19156:2011), OGC 10-004r3. 54. http://dx.doi.org/10.13140/2.1.1142.3042.

Cox, S.J.D. (2011b). Observations and Measurements - XML Implementation, OGC Implementation Standard, OGC 10-025r1. 66 + x. http://portal.opengeospatial.org/files/41510 (accessed September 16, 2014).

Horsburgh, J.S., D.G. Tarboton, D.R. Maidment, and I. Zaslavsky (2008). A relational model for environmental and water resources data, Water Resources Research, 44, W05406, http://dx.doi.org/10.1029/2007WR006392.

Horsburgh, J.S., D.G. Tarboton (2008). CUAHSI Community Observations Data Model (ODM) Version 1.1.1 Design Specifications, CUAHSI Open Source Software Tools, http://www.codeplex.com/Download?ProjectName=HydroServer&DownloadId=349176.

ISO 19156:2011 - Geographic information -- Observations and Measurements, International Standard (2011), International Organization for Standardization, Geneva. http://dx.doi.org/10.13140/2.1.1142.3042.

Lehnert, K.A., Walker, D., Vinay, S., Djapic, B., Ash, J., Falk, B. (2007). Community-Based Development of Standards for Geochemical and Geochronological Data, Eos Trans. AGU, 88(52), Fall Meet. Suppl., Abstract IN52A-09.

Lehnert, K.A., Walker, D., Block, K.A., Ash, J.M., Chan, C. (2009). EarthChem: Next developments to meet new demands, American Geophysical Union, Fall Meeting 2009, Abstract #V12C-01.

odm2's People

Contributors

Stargazers

Watchers

Forkers

valentinedwv castronova twhiteaker philhutch miguelcleon lottaa zubayedrakib glhey lsetiawan sr7cb amabdallah jkreft-usgs internetofwater daghjermann jillfalman henuliyanying insuyo

odm2's Issues

MethodTypesCV discussion: scope, content, etc

From Anthony, Sept 19:
Sara and I have worked on TypesCVs since last friday. Sara created black templates for all 15 Types CVs here: https://drive.google.com/folderview?id=0B3v0QxIOuR_nQ0k4Zm5BNjhlbDQ I worked a bit more on ActionTypes & MethodTypes CVs. I think we're converging on ActionTypes, and it's looking reasonably good to me and ready to put on MMI for further refinement (after you decide what fields to preserve). For MethodTypes, we still have two major approaches on the table and its unclear to me which is best:

Option 1 has MethodTypes that are nearly 1:1 with ActionTypes.
Option 2 has MethodTypeCategories match 1:1 with ActionTypes, which allows a good bit more detail in MethodTypes (i.e. DischargeMeasurement or ICPMS could each become MethodTypes). This option could potentially provide much more power to queries because it would then allow one to easily record Method details under each record in the Methods table but still group those methods at a higher level (i.e. "Discharge by acoustic doppler velocity readings at 6/10 depth at 10 points in the channel cross section" or "High resolution ICP-MS with multi-collector for isotope ratios" would still be captured by a MethodType filter for DischargeMeasurement or ICPMS). We all envisioned a huge benefit to allowing for such detailed method names (knowing that none of our current systems have that level of detail, but that this was a constraint). The drawbacks of this approach are that:
- a MethodType might often require immediate knowledge of the ActionType in order to understand it (i.e. SpecimenCollection.Automated). Does this approach require that we prepend ActionType to the term?
- the list of Method types would be very long and growing (i.e. PetDB already has 99 methods that would all translate to MethodTypes using this approach: http://www.earthchem.org/petdbWeb/search/vocabulary.jsp?category=Method). Therefore, the list of MethodTypes terms would likely be at least 1000 records and growing. That's not what we had in mind for our TypesCVs, but is MethodType our exception? If not, then do we need one more level of grouping in our Methods Table/Entity?

ODM2CV: Implement Login and Moderation System

Implement authentication for website admins and allow them to view, edit, and approve/reject submissions.

Need globally unique, persistent identifiers for Results and Datasets

To support the cataloging and other functional use cases, having a globaly unique and persistent identifier for each granular chunk that is cataloged is important. For ODM2, this is basically at the level of Results and Datasets. These need globally unique and persistent identifiers that go with the data to any catalogs.

ODMCV: Implement Edits to Existing Terms

Right now the "Edit" button on an individual term page doesn't work. Implement the ability to submit an edit to an existing term.

Add ability to annotate Annotations

As @izaslavsky suggested in our meeting last week, we might want to add the capability to store Annotations of Annotations.

Taxonomic Classifiers Feature Branch -- Propose design.

From Jan. 31, 2013 email:
Hi All,
Thanks for that discussion today on approaches to bring taxonomic & diversity data into ODM2 (or profile of OGC Observations and Measurements). Let's use this thread to share info and ideas on how to move forward.

In general, I like the solution that Tim Whiteaker tried out in ODM1 and that Jeff described. To add a "TaxonomicID" as a new field below and separate from "VariableID" in the Results table. This would link to a "Taxa" table that would be parallel the Variables table, but perhaps have even fewer fields. The main use of the "Taxa" table would be to link to an External Identifier, its URN and its resolver URL in the same way we recently implemented External Identifiers elsewhere (see http://uchic.github.io/ODM2/schemas/ODM2_Current/diagrams/ODM2ExternalIdentifers.html, thanks Jeff for posting these diagrams so soon after our meeting). This would allow the external Taxonomic Identifier to both act as:

a result of an observation act, where MethodName = "Taxanomic Classification" or something similar and ResultsValues.DataValue = True.
an object of an observation act (similar to Variable), where VariableName = "Count of" or "Percent of Feature" or some thing similar and ResultsValues.DataValue = a number.
Let's ponder this for a few days, and if it sticks, I'll open a new Feature Branch and flesh it out.

Here are some of the more interesting links and info that I came across:

Taxonomic Data Models

Blog Post: Biodiversity databases and OGC standards don't play well together - JULY 3, 2009. http://biodivertido.blogspot.com/2009/07/biodiversity-databases-and-ogc.html
This described the general problem and has a commenter suggest that that species are really Domain Features.
Presentation: A generic database for tagging data and much more. 2012. http://www.iotc.org/files/proceedings/2012/sym/07_PSym.pdf
Goal is to merge fish biodiversity data with geographic data and observations. Although inspired by OGC O&M, this data model failed (in my view) at deriving something generic (102 tables!)
GBIF Developer Blog: Taxonomic Trees in PostgreSQL - TUESDAY, 12 JUNE 2012. http://gbif.blogspot.com/2012/06/taxonomic-trees-in-postgresql.html
Although mostly about recursive query performance, this post shows the core of their taxonomic model, which doesn't really help us too much because we want their system to handle that part.
Common Data Model of the European Distributed Institute of Taxonomy. http://dev.e-taxonomy.eu/trac/wiki/CommonDataModel.
They provide a detailed view of their data model at http://cybertaxonomy.eu/cdm/v31/, but they don't really have a connection to observations.

Taxonomic Systems & Resolvers

Global Biodiversity Information Facility (GBIF) seems to be the primary global system for biological taxonomic data.
GBIF Web Services APIs (http://www.gbif.org/developer/summary are used by a large number of other biodiversity projects (http://www.gbif.org/usingdata/dataapplications).
GBIF infrastructure is built around Darwin Core, EML, UUID, and DOIs. http://www.gbif.org/infrastructure/registry.
Mineral taxonomy of the International Mineralogical Association (IMS). http://pubsites.uws.edu.au/ima-cnmnc/.
SESAR uses this list of approved mineral names.
GeoSciML Earth Material Taxonomy
Kerstin mentioned that Steve Richard is working on this for US GIN, but I couldn't find a good website.
Others???

Here's a 2008 paper that does a nice job of explaining the motivation and benefits of using external identifiers for the Biodiversity use case:

Page, R. D. (2008). Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Briefings in bioinformatics, 9(5), 345-354.

Let's build on this and dig around a bit more for other examples of data models attempting to merge OGC O&M with taxonomic data.

Create UML diagrams directly from Python/SQLAlchemy code in ODM2 API

On our ODM2 conference call today we discussed the huge benefits of creating UML diagrams directly from the API code that Jeff's group has been working on. Here are links to a few approaches to do that directly from Python:

https://pypi.python.org/pypi/sadisplay
https://bitbucket.org/estin/sadisplay/wiki/Home

http://www.sqlalchemy.org/trac/wiki/UsageRecipes/SchemaDisplay

Develop guidelines and conventions for MMI-hosted vocabularies

These guidelines and conventions should include:

Common attributes to be used in all (or nearly all) vocabularies. My tentative, first suggestion is: term, name, definition, category, notes, provenance, and provenance_uri. These will be reviewed and finalized soon.
Case conventions for vocabulary attributes and terms. Probably a choice between CamelCase and lowercase_underscore. Will consult MMI's John Graybeal and Carlos Rueda for suggestions.
Descriptive vocabulary information/metadata. For example: keywords, creator, brief description, references

Python script to post-process SQL DDL for new ODM2 database

I've created a first working version of a Python script to post-process the PostgreSQL SQL DDL file output by DbWrench: https://github.com/UCHIC/ODM2/blob/master/src/blank_schema_scripts/postgresql/DbWrench_DDL_postprocess.py
See the comments on that scripts for more info and instructions.

One thing to note is that a schema still needs to be created, to hold all ODM2 entities. This script does that. It also tweaks the PostGIS geometry field (samplingfeatures.featuregeometry) to configure it to the usage I've described earlier (from my use case).

@Castronova, take a look at it and try adapting it for MS SQL Server (and MySQL?). I don't know if in the end this should be a single script with switches for each RDBMS, or individual, parallel scripts. I lean towards the former, but we can decide that later.

Restructure for Flexibility in External Identifiers

Do we want to somehow restructure external identifiers so that an external identifier can be specified for any field? Right now, there is a specific linking table for each table that can have an external identifier. This means that with the current structure an Organization cannot have any external identifiers without creating a new bridging table. Since ORCID's can apply to Organizations, we might want to reconsider that limit.

Could we restructure somehow to have a single master bridging table to link External Identifiers to any other table? Is that uglier and scarier than just adding bridging tables as needed? The same logic applies to annotations.

Remove Photos from Sensors Extension and Implement as Annotations

We decided last week that photos should be implemented as Annotations because they may be more generally applicable and useful outside the Sensors extension.

Documentation Completion

There are several articles that still need to be written and/or updated for thorough documentation of ODM2 on GitHUB. The documentation is inventoried here: https://docs.google.com/a/aggiemail.usu.edu/spreadsheets/d/1Ms9wTivbX21gVTj5DkHGIsmDlHsq2TUojX4IbvLsw6Q/edit#gid=0

Articles describing ODM2 development and extensions are of highest priority. We will also keep a running list of ODM2 'Best Practices' that need to be described, which we will address as we are able.

TimeSeriesResults is missing Begin and End dates

I am writing a query to select metadata about all Results in the database (i.e. units, variables, geometry, methods, people, affiliation, start date, end date, etc.... It seems that begin and end times are missing from the TimeSeriesResults tables. To determine these I would need to query all of the TimeSeriesResultValues for each TimeSeriesResult and return the minimum and maximum ValueDateTime. I'm guessing that this will end up being very inefficient and slow for large TimeSeriesResult datasets.

So:

Am I missing something...are begin and end times stores somewhere else?
Is there currently and efficient way of determining time series start and end times?
Should BeginDateTime, BeginDateTimeUtcOffset, EndDateTime, and *EndDateTimeUtcOffset be added to the TimeSeriesResult table?
Does this effect any of the other Result types?

ODM2 naming conventions, case sensitivity, RDBMS, etc

I'm pasting in a closed thread into GitHub to archive our decisions and why.

On Thu, Oct 24, 2013 at 4:42 PM, Emilio Mayorga [email protected] wrote:

Now that Jeff has a SQL Server instance and two of us are starting on PostgreSQL instances, we'll have to face the annoying issue of case sensitivity in SQL (the standard, vs each RDBMS). Here's a nice summary and comparison across systems:
http://www.alberton.info/dbms_identifiers_and_case_sensitivity.html

For SQL Server vs PostgreSQL, the crux of the matter is that SQL Server allows object names (schemas, tables, fields, etc) that are case sensitive, such as the CamelCase convention we're currently using in ODM2. But PostgreSQL (and Oracle, I think), and more generally the SQL standard, don't allow that unless the objects are quoted every single time (eg, "CamelCase"; SELECT "DataValues" FROM "ResultValues" WHERE "ResultsID" = 123). This is such a PITA, it's really a no-go for operational use in PostgreSQL. eg:
http://www.thenextage.com/wordpress/postgresql-case-sensitivity-part-1-the-ddl/

ODM 1.x having been implemented largely in SQL Server, it followed the CamelCase convention; MySQL is roughly like SQL Server, but with additional wrinkles, from the little I've read.

I see a few options:

Stick to CamelCase, and alienate most users of PostgreSQL and possibly Oracle and other systems.
Stick to CamelCase, but let implementers do what they wish (eg, convert to all lowercase, or to all lowercase with underscore separators). Just live with that variability and the possibility that code that interacts with relational databases will not be portable.
Switch to a case insensitive naming convention, at least for RDBMS implementations (XML implementations can stay CamelCase); say, all lowercase with underscores.
Use a case insensitive convention everywhere, in all information model manifestations. But that won't fly easily b/c we're leveraging existing standards like OGC ones (O&M, GML, etc) that use CamelCase.

I'm curious to hear what Leslie and Lulin plan to do. Also, maybe Tom and Dave have some insight about how to minimize this heterogeneity, or minimize the pain.

I think for now I'll use a function in PostgreSQL to convert all CamelCase object names to underscored lowercase, then carry on that way for now so I don't have to quote everything:
http://www.postgresonline.com/journal/archives/219-SQL-Server-to-PostgreSQL-Converting-table-structure.html
Of course, this will break all possibility of bidirectional syncing between DbWrench and my postgresql implementation. But that may not be a big deal at this early stage.

We can delay making permanent decisions until we've made more fundamental progress with ODM2, so we don't get distracted. But we probably shouldn't delay the decision for more than 4-6 weeks; say, we could decide at AGU.

Move TimeAggregationIntervalUnitsID out of ValueID

The suggestion is to move this up to to the corresponding *Results tables.

IntendedObservationSpacing for both space and time?

Given that we can have Results that are of type Time Series and of other types such as Spatial Coverage, do we need IntendedObservationSpacing in the Results table for both space and time?

Where should Navigation Method information go?

On Mon, Nov 3, 2014 at 11:19 AM, Kerstin Lehnert wrote:

We are in a meeting about the PetDB migration to ODM2, and in a discussion how to enter information about the method used to record the geospatial coordinates. We have a lot of old samples/stations in PetDB that were collected before the days of GPS. And we have samples collected by submersibles.

Where would you fit the information about the navigation?

Supporting time series with varying time support and spacing?

Do we intend to support time series (or Results of other types) with varying time support and spacing for the values (e.g., should IntendedObservationSpacing, AggregationDuration, and InterpolationTypeCV be at the Result or ResultValues level?).

Right now we have IntendedObservationSpacing at the Result level and then AggregationDuration and InterpolationTypeCV at the ResultValues level. I think this is the correct implementation because in reality you intend to collect data with a certain frequency, but may modify that at some point over time. So, each individual value has its own AggregationDuration and InterpolationTypeCV.

Standardization of Spatial Offsets

Right now we have spatial offsets in two different places in the schema (one in SamplingFeatures and one in ResultValues). Can we standardize the way SpatialOffsets are represented in both places?

Primary Keys in Sites and Specimens

It has been proposed to drop SiteID and SpecimenID from the Sites and Specimens entities and instead have SamplingFeatureID serve as the primary key for those entities - and at the same time a foreign key to the SamplingFeatures entity.

I really don’t have any objections to this. I have read up a little bit and it seems like an implementation issue. I will say that it complicated my script to create my ODM2 database from my Little Bear River ODM 1.1 database because there was no such thing as a SamplingFeatureID in ODM 1.1.1 and I ended up making the SamplingFeatureIDs the same as my SiteIDs anyway.

Those relationships are “identifying relationships”, which means that the existence of a row in the child table (Specimens or Sites) depends on a row in a parent table (SamplingFeatures). From what I have read, it’s pretty common for people to create a primary key in the child table that does not include the foreign key to the parent table. But, some believe that the “right” way to formally capture this is to have the foreign key from the parent table be part of the child’s primary key (in our case for these tables it would simply be the child’s primary key). Most are saying that this is the way super type/subtype relationships "should" be modeled.

The logical relationship is that the child cannot exist without the parent. And, there should probably be a check constraint eventually that makes sure a given SamplingFeatureID ends up in ONLY ONE child table (it is either a Site OR a Specimen, it can’t be both).

I suggest we accept Bruce’s suggestion and git rid of SiteID and SpecimenID.

As a related note: there are other places where this may affect the schema (e.g., Actions). We can only do what is suggested above when there is a 1:1 relationship between the parent entity and the child entity.

Replace ODM2SamplingFeatures.SpatialReferences with simple EPSG codes and other external refs?

I'm not persuaded we need ODM2SamplingFeatures.SpatialReferences. EPSG codes (http://www.epsg-registry.org/) are pretty universally known and understood these days. Web resolvers exist left and right, including OGC ones, and epsg SRS databases are widely disseminated. What need does this table meet that can't be handled by an epsg code? If the need is to define one's own custom SRS (eg, a purely local one), then this table doesn't have enough smarts to do it anyway; besides, standards exist to define them.

I think ultimately we should rely as much as possible on OGC/EPSG URI's, other external resources (eg, http://spatialreference.org), and a similar loose coupling system for locally defined spatial refs. eg:
http://www.opengis.net/def/crs/EPSG/0/4326
http://www.epsg-registry.org/indicio/query?request=GetRepositoryItem&id=urn:ogc:def:crs:EPSG::4326
http://spatialreference.org/ref/epsg/4326/
We shouldn't get in the business of fully defining widely used spatial references (which seems to be sort-of the purpose is for ODM2SamplingFeatures.SpatialReferences) when there are widely used standards available for that.

Now, if we're dealing with complex SRS (eg, 3D ones) defined via >1 EPSG, that may be a different matter ...

ODM2Provenance aligned with PROV?

@emiliom pointed us to an interesting article recently. How Should We Cite Data? Edmund Hart, 2014/03/25 (NEON). http://emhart.info/blog/2014/03/25/data-citation/

In the discussion thread, I learned that DataOne is using the PROV relationship expression model/language: http://www.w3.org/TR/2013/REC-prov-dm-20130430/#section-prov-overview

I was quite please to discover that the information model for PROV is very similar to the core of ODM2, and the PROV relationship types could provide a very useful reference for ODM2's RelationshipTypeCV. Therefore:

As we are working on the RelationshipTypeCV, we might want to look at PROV.
As we are working on ODM2.1, we might want to provide a full mapping to PROV.

Do we need DataTypeCV in the Variables Entity?

In ODM 1.X, the DataTypeCV indicated the recorded statistic - e.g., "Median" value, "Average" value, "Minimum" value, etc. My understanding of InterpolationTypeCV in the ODM2Results.ResultValues entity is that it is the same thing. I believe I adopted InterpolationTypeCV from WaterML 2.0. So, what is essentially the same information is in both places.

I vote for removing DataTypeCV from the Variables entity and keeping InterpolationTypeCV in ResultValues. I don't really like the term "InterpolationType", but I never really liked the term "DataType" either because it didn't convey the information that was really in that field and I always confused it with "ValueType." I think a better term is "RecordedStatistic" but we may want to keep "InterpolationTypeCV" to be consistent with WaterML 2.0.

Is ValueTypeCV an attribute of a Variable or a Result?

In ODM 1.x, ValueTypes were things like "Field Observation", "Model Simulation Result", "Derived Value", etc. Should this be an attribute of a Variable (where it is now) or of a Result?

ODM2CV: Modifications to API

The following changes need to be made to the ODM2 CV API:

Change the base URL for all URIs to http://vocabulary.odm2.org, including the ODM2 XMLNS
Make sure all ODM2 vocabularies and individual terms are exposed via the API

Domain Features -- Add a simple extension?

Notes from my Jan. 31, 2014 email:

I know that today we were leaning away from the idea of possibly using OGC Domain Features for taxa (discussion threads on Taxonomic Classifiers in #4 & #15), however we might still want to consider having a very minimal implementation of Domain Features. The reason is that nearly all of our use cases have a need to include some geographic information. We saw in:

PetDBSchema-Redesign-22.png the need for Country and State information (https://drive.google.com/file/d/0B5JYwkWgsLhRRl9OSGhrel9oSmM/edit?usp=sharing).
CZchemDB-schema-v10.png has StateCode, SSURGO_ID (USDA Soil Survey Geographic database) and SoilTaxonomy as SamplingSite attributes (https://drive.google.com/file/d/0B5JYwkWgsLhRRl9OSGhrel9oSmM/edit?usp=sharing).
ODM1.1.1SchemaDiagram.pdf also has State and Country in it's sites table, but in addition it includes an extension to connect to the ArcHydro Framework (see Fig. 2 in Horsburgh et al. 2008). https://drive.google.com/file/d/0B0mUuf2-qdlTYW1lRDZFYnc1R3M/edit?usp=sharing
And we can imagine other use cases that will crop up before long.

So do we include a very simple approach to connect ODM2 to external Domain Feature frameworks & resolvers? Can we do this within the ExternalIdentifiers schema using that approach, or something similar?

Workflows extension Feature Branch -- Review design.

A Workflows extension could be a solution to the challenge of sequences of actions that need to be explicitly known to interpret a data value.

Workflows_FeatureBranch was created in GitHub by Anthony: https://github.com/UCHIC/ODM2/tree/Workflows_FeatureBranch
See detailed notes on proposed tables & fields: https://docs.google.com/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.yr568lcij1gk
This is a solution that allows for unambiguous traversing the parent/child relationships for actions from either direction (presently only unambiguous if you go up the tree, not down.).
Already spent several hours discussing this topic at IEDA on 12/17/2013. IEDA's PetDBII implementation relies on this.

ODM2CV: Implement All CVs

Right now the ODM2 CV web application works with a subset of the CVs. The following need to be done:

Create database tables for all of the ODM2 CVs following the pattern established for the existing ones
Import data for each CV from the Google Spreadsheet
Implement models and views for each CV in the web application

Issues with schema

directives.directiveID should not be nullable.

Roles.RoleDescription Null.. that makes no sense. No use for an entry in that case.

Should we just make
ActionPeople.RoleId, a CV?
Rename ActionPeople to ActionBy

OdmSamples.BatchLineSamples
missing constraint/relation on SampleID

ODM2Sensors.DeploymentActions
Missing relationship to spatial offset

SamplingFeatureTypeCV, SamplingFeatureGeoTypeCV and SiteTypeCV discussion: scope, content, etc

This is a stub for discussions on CV's related to geospatial and environment descriptors for sampling features. Some related issues are issue #31, #16, and possibly #13. It's also helpful to refer to the SamplingFeatures schema.

Communicating overall design of a monitoring system

How do we store information about a project?
Possible entities:

Action.
Organization

EPA WQX has a top level project, and projects have activities.

ODM2 has actions. Can we use a generic action?
Or can projects be treated as an organization/person.

Action Types and ActionTypeCV scope

There are multiple ActionTypes that may produce a Result - e.g., sensor deployments, sample analyses, etc. But, there are other ActionTypes that do not create Results. The ActionTypeCV should indicate which ActionTypes can produce a Result.

Length of some foreign keys > 64, should be < 64 for MySQL

I generated a MySQL create script for an ODM2 db using DBWrench. This fails on

fk_ReferenceMaterialExternalIdentifiers_ExternalIdentifierSystems
fk_TaxonomicClassifierExternalIdentifiers_ExternalIdentifierSystems

which are longer than 64 characters. MySQL has a 64 character limit (which can not really be changed), so it would be good to enforce this and fix these two.

Pre-Populating External Identifier Systems

Like units and spatial references, which are pre-populated non-CV tables, I think we should pre-populate the ExternalIdentifierSystems table with really common systems like DOI's, OrcID's, and IGSN's.

Finalize Sensor extension schema

This is our last extension schema that needs work since our addition of the FeatureAction table and highly relevant changes to the Equipment, DataQuality and Provenance schemas. However, let's wait until we finish our current review of Results. Issue #42 should be considered when working on this.

API Check List

API written in Python using SQLAlchemy Progress Chart

Specimen metadata, types, etc. -- Review

Review and confirm whether Specimen table, the SamplingFeature schema in general and it's links to External Identifiers are all sufficient to meet IEDA's needs (assuming that the issue of performance during recursive queries can be resolved).

Provide specific input to SamplingFeatureTypesCV DRAFT:
https://docs.google.com/a/stroudcenter.org/spreadsheet/ccc?key=0AiAbrk7WhhmjdFd6dUdZcnJkdWpwYWpvLXhmSUhjSUE&usp=drive_web#gid=6

and SpatialOffsetTypeCV DRAFT:
https://docs.google.com/a/stroudcenter.org/spreadsheet/ccc?key=0AiAbrk7WhhmjdFd6dUdZcnJkdWpwYWpvLXhmSUhjSUE&usp=drive_web#gid=8

Assigned to Kerstin & IEDA team during 2014-1-17 ODM2 Working Teleconference. See notes here: https://docs.google.com/a/stroudcenter.org/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.ag5ij1ga6yrs

ODM2CV: Modifications to Web Interface Views

The following initial changes need to be made to the ODM2CV web user interface views:

Change all references to "ODM2 Control Vocabularies" to "ODM2 Controlled Vocabularies"
Add the description of each CV from the status google document to the front page and to the top of the view for the individual CV (just above the table of terms)
Add the following text to the footer of each page:

This material is based upon work supported by the National Science Foundation under Grants 1224638 and 1332257. Any opinions, fndings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Change the text on the front page to the following:

Version 2 of the Observations Data Model (ODM2) has several controlled vocabularies. This web page was developed to promote consistency between different instances of ODM2 through a community moderated system for managing the master controlled vocabularies. This web page displays the master controlled vocabulary entries and allows you to request additions or changes to these. You may then use these terms in an ODM2 database or in files that are intended to be interoperable with ODM2.

Changes that you request are forwarded to the moderators who will attend to requests as promptly as possible. When you submit a request, you should receive an email verifying that your request has been received. When your request is approved, you should also receive an email confirmation. If you have a request that cannot be accommodated on this website, please contact the moderators:

Make sure the title link at the top of every page is linked to the home page. Right now it links to whatever page you are on.
Provide a link at the top of each CV page to the API URL for the same content. For example on the page http://vocabulary.odm2.org/actiontype/ add a link to http://vocabulary.odm2.org/api/v1/actiontypecv/?format=skos. The text of the link should say "View this CV in SKOS" for an entire CV or "View this term in SKOS" for an individual term.
Modify the view for the individual term to take up the full width of the application. Right now it only uses about half of it.
Modify the Add term form to take up the full width of the application. Right now it takes up about half of the width.
Make the Add term form more compact vertically. The text boxes don't need to be quite as long vertically.
On the page for viewing a vocabulary, make it more obvious that you can click on each of the terms to get to a page for each term. Right now the link text is too similar to the other text.
On the page for viewing a vocabulary, implement three columns: Term, Name, Definition
On the Add term form, change "Submitter name" to "Your name", change "Submitter email" to "Your email", and "Request reason" to "Reason for your request"

Results for non-numeric data types -- Add to Results schema

Need to represent additional, non-numeric result types.

Need:

CategoryObservations (NOT DONE, but necessary for CZchemDB, PetDB, Leaf Pack Network and other uses cases)
CountObservations (probably necessary for taxonomic observations),
TruthObservations,
TemporalObservations (MAYBE)

Probably accommodated in present ODM2:

GeometryObservations (MAY NOT NEED),
PointCoverage (DONE),
TimeSeriesCoverage (DONE),
ProfileCoverage (DONE), etc.

This issues was discussed and assigned (to Anthony & Emilio) during 2014-1-17 ODM2 Working Teleconference. See notes here: https://docs.google.com/a/stroudcenter.org/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.ag5ij1ga6yrs

SamplingFeatureName in SamplingFeatures

Emilio's edits to the SamplingFeatures schema inserted a new field called SamplingFeatureName into the SamplingFeatures entity. The issue is that most Specimens will not have a name and Sites are already required to have a SiteName. So, do we need additional entities (besides Specimens and Sites) in the SamplingFeatures schema to describe the types of SamplingFeatures that Emilio is naming in the SamplingFeatures entity? It is redundant to have SamplingFeatureName and SiteName.

Controlled Vocabularies: Create all CV entities and PK:FK relationships

Assigned during 2014-1-17 ODM2 Working Teleconference. See notes here: https://docs.google.com/a/stroudcenter.org/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.ag5ij1ga6yrs

Provenance schema needs to be finalized

Finalize model based on data use cases.

Where to put relationships between observations and publications? There is a proposed solution in Lulin’s uploaded petdb-redesign....xml (https://drive.google.com/?tab=mo&authuser=0#folders/0B05XOc5jq65yR3JuaUZmeVpaREk) where there is a data_source table connected to the action table. Note that in data_source, “data” refers to any type of generic data or information (not just a data value)
Can we reconstruct a published table of data values? (PetDB use case)

Assigned to Jeff during 2014-1-17 ODM2 Working Teleconference. See notes here: https://docs.google.com/a/stroudcenter.org/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.ag5ij1ga6yrs

Line or Polygon Sampling Features?

We currently have the capability to have sampling features of type Specimen, Site (point), and ReferenceMaterial. The SamplingFeatures entity actually supports point, line, or polygon features, but we do not have any subclass entities for Line or Polygon features - whereas we do for Specimens, Sites, and ReferenceMaterials. Do we need specific entities for these?

UML Review

Pushed to github pages:
http://uchic.github.io/ODM2/uml/index.htm?goto=2:25

Added links to Documentation in the master branch

Equipment, Sensors and Samples schemas -- Needs work!

Revise to support both Samples and Sensors extensions, which both need to draw from and be harmonized with the Equipment Extension. Need quite a bit of work here.

Assigned to Jeff, Anthony & Kerstin during 2014-1-17 ODM2 Working Teleconference. See notes here: https://docs.google.com/a/stroudcenter.org/document/d/1P0y-NNNiKQ170dUkHmU2XUrHku2rVB9CNhFR3YVf6R8/edit#bookmark=id.ag5ij1ga6yrs

Add Primary Key to every cross-reference table?

@lulin-song made this suggestion during our call today, indicating that it is not necessary for RDB management, but it is very important for object oriented programing. In other words, these cross-reference tables all become relationship objects that get much harder to deal with programmatically (i.e. for cascading deletes).

@valentinedwv agreed without hesitation, and rather emphatically.

@horsburgh and @emiliom listed some of the costs (i.e. slightly more storage, more complexity with indexing), but acknowledged that the benefits seem to outweigh the costs.

We DECIDED to have a primary key to all tables/entities in ODM2. We also decided to name the new primary keys in cross-reference tables according to the following convention:

RelationID for all relationship tables. Autonumbering Integer.
BridgeID for all bridge tables, including those with associated values, such as those in Annotations, ExtensionProperites and ExternalIdentifiers. Autonumbering Integer.

Do Annotations [and Methods] need Citations?

Per our conversation last week - do we want to implement a linkages between Annotations and Citations?

Is there a real need for all the DateTimeUTCOffset fields?

This has been a pet peeve of mine from the start. I know it's an ODM 1/ WaterML 1 legacy. But with proper "timestamp with time zone" database attributes, or ISO 8601 datetime strings on XML ODM 2 representations, these fields just add clutter in multiple tables (Actions, Results, ResultValues, etc).

I remember hearing an argument from CUAHSI HIS experience about how people always forget to include timezones in their datetimes, but do we really need explicit fields to help enforce that?? Besides, 8601 has already solved all date problems ...

Consider UDUNITS or other Units systems

Right now Units in ODM2 are the same as in ODM 1.1.1. An enhancement might be to consider other specifications for Units that might better support units conversions, etc.

People/Affiliation/Organizations tables

The columns and naming of these columns in the People/Affiliation/Organizations might benefit from some tweaking. For instance, right now we have PrimaryEmail, Phone and so on but no secondary. Also, right now the address information is such that it is hard to do something with this. I think that it would be good to be able to store multiple emails (and select one as primary), and also to store multiple phone numbers (work, home, mobile) as well as fax. In addition it would be good to pull the address apart so that one would have more granularity at this level. I have a schema which is a stab at this in MySQL workbench (see below) which may be helpful.