stationxml-validator's People

Contributors

bentatham, crotwell, timronan, yazan-iris

stationxml-validator's Issues

redundant notation ** for powers of units

The current unit naming wiki page says:

Exponents for powers are specified with **, e.g. "s**2".

but I think the ** is redundant, as an integer within a unit name can have no meaning other than an exponent. It would be simpler to encode a power as a plain number following the unit, so acceleration would be either "m s-2" or "m/s2".

This standard may be useful as it explicitly deals with conveying units within ascii:
http://unitsofmeasure.org/ucum.html
perhaps include a link to it on the wiki page? They also use a dot between multiplied units, like "m.s-2".

rule 408 confusing

408 The value of Channel::SampleRate must be equal to the value of Decimation::InputSampleRate divided by Decimation::Factor of the final response stage.

This is confusing. I think you mean that the sample rate is the input sample rate of the first digital stage divided by the decimation factors of every stage, not just the final stage?
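The reading proposed in the question above can be sketched as follows. This is only an illustration: the function name and the `(input_rate, factor)` tuple layout are assumptions, not part of the validator.

```python
from math import prod


def expected_channel_sample_rate(stages):
    """Return the output rate implied by a decimation cascade.

    `stages` is a list of (input_sample_rate, factor) tuples, one per
    decimating stage, in cascade order (hypothetical layout).
    """
    if not stages:
        raise ValueError("at least one decimation stage is required")
    first_input_rate = stages[0][0]
    # Multiply the decimation factors of every stage, not just the last one.
    total_factor = prod(factor for _, factor in stages)
    return first_input_rate / total_factor


# Hypothetical cascade: 5120 Hz decimated by 4, then 4, then 8.
example_stages = [(5120.0, 4), (1280.0, 4), (320.0, 8)]
example_rate = expected_channel_sample_rate(example_stages)
```

For a cascade whose stages are internally consistent, this gives the same value as dividing the final stage's input sample rate by the final stage's factor, so the two readings of rule 408 only differ when the stages disagree with each other.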

disallow negative gain

403 The element StageGain::Value or InstrumentSensitivity::Value must be non-zero.

Should negative gain be allowed? Maybe at least issue a warning if it is < 0. Also check that Frequency is > 0.

Input units should match the instrument code in InstrumentSensitivity? This may not always be possible, but BHZ should be a velocity (i.e. M/S) and BNZ should be an acceleration (M/S**2).

[Develop Test] Errors 216, 316: Restriction Status must follow Network, Station, Channel Hierarchy

Create Test:
Error 216: Checks that the Station level restriction status is compatible with the Network level restriction status. (EX: A Network with an Open restriction status cannot have a Station with a Closed or Partial restriction status.) Error 216 cannot trigger on Networks with a Partial restriction designation because a Station with any restriction status fits within a Network with a Partial restriction Status. Station level restriction status must be assigned a string "Open", "Closed", or "Partial".

Error 316: Checks that the Channel level restriction status is compatible with the Station level restriction status. (EX: A Station with an Open restriction status cannot have a Channel with a Closed restriction status.) Error 316 cannot trigger on Stations with a Partial restriction designation because a Channel with any restriction status fits within a Station with a Partial restriction Status. Channel level restriction status must be assigned a string "Open" or "Closed".
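A minimal sketch of the hierarchy check described for errors 216 and 316. The function name is hypothetical, and the handling of a "Closed" parent (child must also be "Closed") is an assumption, since the text above only spells out the "Open" and "Partial" cases.

```python
VALID_STATUSES = {"Open", "Closed", "Partial"}


def restriction_compatible(parent_status, child_status):
    """Return True if the child restriction status fits within the parent's."""
    if parent_status not in VALID_STATUSES or child_status not in VALID_STATUSES:
        raise ValueError("status must be 'Open', 'Closed', or 'Partial'")
    # A Partial parent can contain children with any restriction status.
    if parent_status == "Partial":
        return True
    # An Open parent requires Open children (stated in the issue).
    # A Closed parent requiring Closed children is an assumption.
    return child_status == parent_status
```

The same function could serve both levels: Network/Station for 216 and Station/Channel for 316, with the Channel level additionally restricted to "Open" or "Closed" by the caller.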

Support specification of URLs for input documents

Allow a user to specify an input "file" as a URL, ala:

java -jar stationxml-validator-1.0.2.jar 'http://service.iris.edu/fdsnws/station/1/query?net=IU&sta=ANMO&loc=00&cha=BHZ&level=response&format=xml'

channel depth rule

Rule 210 Station elevation must be equal to or above Channel elevation
This may not always be true, e.g. an array in steep terrain where a surface sensor is uphill from the station location, or a strong-motion sensor several floors up in a building.

A better check might be to compare channel elevation plus channel depth to station elevation. They might not match in all cases, but for simple stations, especially where channel lat lon is same as station, the chan elevation plus the depth of overburden should be station elevation.

Table printed for --print-units is the wrong list and needs other changes

For --print-units:

  1. The list is NOT the approved list at:
    https://github.com/iris-edu/StationXML-Validator/wiki/Unit-name-overview-for-IRIS-StationXML-validator#common-unit-names

I added a date to the list on that page to provide some level of tracking. You should put a comment in the code with the date when you copied the official list into the code, so you know when it's out of date.

  2. The "table" printed when --print-units is used is ~102 characters wide even though none of the unit names are even close to that length. The default terminal width is 80 characters, so anyone that hasn't made their window >= 102 characters wide gets an ugly, hard to read table.

I'm not even sure the unit names need to be in a table, seems like unnecessary formatting. Either get rid of the table or make it less wide (maybe match the --print-rules width for consistency).

  3. Sort the unit names. It is a lot easier to look for something in a sorted list.

[Develop Test] Error 405: InstrumentSensitivity Value == Product of StageGain Value at Normalization Frequency

Note: Error number 405 will be reassigned. The number is used temporarily to help define the subject.

Create a test that compares the product of StageGain<Value> from stages where StageGain<Frequency> = InstrumentSensitivity<Frequency> to the InstrumentSensitivity<Value>. These values should be equal.

EX: Code block outlining how test could be conducted. NOT TESTED.

stagegains = c()
for (i in 1:length(stages)) {
    # Collect gains only from stages that share the sensitivity frequency
    if (StageGain<Frequency>[i] == InstrumentSensitivity<Frequency>) {
        stagegains = c(stagegains, StageGain<Value>[i])
    }
}
TotalGain = prod(stagegains)
# Only compare when every stage was included in the product
if (InstrumentSensitivity<Value> != TotalGain && length(stagegains) == length(stages)) {
    Error 405!
}

"Compare the product of the gains of each stage at the normalization frequency listed in stage 0" to the instrument sensitivity gain.

Refer to: #19
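The proposed test could look roughly like the sketch below. The function name, the `(value, frequency)` tuple layout, and the relative tolerance are assumptions. The issue asks for equality, and a tolerance is used here only because exact floating point comparison of a product is fragile.

```python
from math import isclose, prod


def check_total_gain(sensitivity_value, sensitivity_freq, stage_gains, rel_tol=1e-4):
    """Compare InstrumentSensitivity value with the product of stage gains.

    `stage_gains` is a list of (value, frequency) tuples (hypothetical layout).
    Returns None when the test does not apply, i.e. when any stage gain is
    given at a frequency other than the sensitivity frequency.
    """
    if not stage_gains or any(freq != sensitivity_freq for _, freq in stage_gains):
        return None
    total_gain = prod(value for value, _ in stage_gains)
    return isclose(total_gain, sensitivity_value, rel_tol=rel_tol)
```

Returning `None` rather than `False` keeps "test not applicable" distinct from "test failed", which matters if mismatched frequencies should only produce a warning.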

Error 403: Compare null unit stages to the response cascade

Error 403 is triggering when stages with "null" units are compared to the units of other stages within the response cascade. The preamp stage consists only of SEED blockette [58], which is unitless because blockette [58] does not contain a field that defines unit strings. The Nominal Response Library (NRL) defines preamp stages by blockette [58], so error 403 will occur on validation every time an NRL response is included in station metadata.

This error could be fixed by skipping the 403 test on preamp, "null unit", stages. An exception to error 403 could be made for stages with null units.

Refer to: #9

EX1, Refer to stage 2: https://service.iris.edu/fdsnws/station/1/query?net=AV&sta=ILW&cha=SHN&starttime=2016-09-30T01:00:00&level=response&format=xml&includecomments=true&nodata=404
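One way to express the suggested exception is to compare each stage only against the nearest preceding stage that actually has units. This is a sketch under assumptions: the function name and the `(input_unit, output_unit)` tuple layout are hypothetical, and `None` stands in for a stage without units.

```python
def find_unit_mismatches(stages):
    """Return (previous_stage, stage) number pairs whose units do not chain.

    `stages` is a list of (input_unit, output_unit) tuples in cascade order.
    Stages where either unit is None (e.g. a gain-only preamp stage) are skipped.
    """
    mismatches = []
    previous = None  # (stage number, output unit) of the last stage with units
    for number, (input_unit, output_unit) in enumerate(stages, start=1):
        if input_unit is None or output_unit is None:
            continue  # unitless stage: exempt from the comparison
        if previous is not None and previous[1] != input_unit:
            mismatches.append((previous[0], number))
        previous = (number, output_unit)
    return mismatches
```

With this approach a sensor, a unitless preamp, and a digitizer validate cleanly as long as the sensor output unit matches the digitizer input unit.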

Validator should have two report modes: WARNING and ERROR

Some notifications should be WARNINGS which state the preferred state of the data, but should not flag an automatic reader to disregard the entry or document.

Notifications that are an ERROR should flag readers to disregard the entry or document.

check overall gain as product of stage gains

Overall gain in InstrumentSensitivity should be product of all the stage gains IF Frequency values are same in all stages. Not sure if it would be an error if not, but all stages probably should use the same frequency value if possible.

Change unit value checking with updates and check both case sensitively and insensitively

Update the units checked from the table here:
https://github.com/iris-edu/StationXML-Validator/wiki/Unit-name-overview-for-IRIS-StationXML-validator

Also, check the unit names and issue both warnings and errors.

It is an error if the unit name is not found in our list with a case insensitive comparison.

It is a warning if the unit name is found in our list with a case insensitive comparison but not a case sensitive search.
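The two-level check described above can be sketched as follows. The unit list here is a small hypothetical subset used for illustration, not the approved list from the wiki page.

```python
# Hypothetical subset of unit names; the real list lives on the wiki page.
APPROVED_UNITS = ("m", "m/s", "m/s**2", "V", "counts", "s", "Hz")


def classify_unit(name, approved=APPROVED_UNITS):
    """Return 'ok', 'warning', or 'error' for a unit name."""
    # Exact, case sensitive match: nothing to report.
    if name in approved:
        return "ok"
    # Found only when case is ignored: warn about capitalization.
    if name.lower() in {unit.lower() for unit in approved}:
        return "warning"
    # Not found even with a case insensitive comparison.
    return "error"
```

For example, `classify_unit("M/S")` returns `"warning"` with this list, while `classify_unit("Meters/S")` returns `"error"`.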

wrong effective time overlap or bad output message

For one of my stations, stationxml here:
http://files.anss-sis.scsn.org/production/FDSNstationXML/CO/CO_CSB.xml

I got this error, which goes on for a couple of pages, trimmed for bug post. I am not sure if the overlap calculation is right, I don't think I have overlapping channels, but at a minimum the output of this error message is way too verbose to be helpful. Probably don't need to print all of the channels, just the two that supposedly overlap.

CO_CSB.xml|252|[[Channel [code=EH1, locationCode=00, startDate=Sun Apr 12 20:00:00 EDT 2009, endDate=Thu Sep 03 12:00:00 EDT 2009], Channel [code=EH1, locationCode=00, startDate=Thu Sep 03 12:00:00 EDT 2009, endDate=Wed Sep 09 12:00:00 EDT 2009], Channel [code=EH1, locationCode=00, startDate=Wed Sep 09 12:00:00 EDT 2009, endDate=Mon Dec 14 11:00:00 EST 2009],

Channel [code=VMV, locationCode=01, startDate=Thu Feb 11 18:00:00 EST 2016, endDate=Thu Mar 24 12:00:00 EDT 2016], Channel [code=VMW, locationCode=00, startDate=Wed Apr 27 14:00:00 EDT 2016, endDate=null], Channel [code=VMW, locationCode=01, startDate=Thu Feb 11 18:00:00 EST 2016, endDate=Thu Mar 24 12:00:00 EDT 2016]]] Channel epoch overlap not allowed|CO|||CSB|2009-04-13T00:00:00|||||

Error 402: Unit Case Sensitivity

Error 402 is triggering on mismatched case sensitivity of units. Because SEED traditionally uses upper case characters for assigning units, this error should be a warning to encourage future users to assign unit names the character cases outlined by SI standards but allow dataless to sxml conversions into the system. Unit name strings containing all uppercase characters are likely an artifact of the dataless to stationxml conversion process.

Refer to: https://github.com/iris-edu/StationXML-Validator/wiki/Unit-name-overview-for-IRIS-StationXML-validator

EX1:https://service.iris.edu/fdsnws/station/1/query?net=PB&sta=B944&loc=T3&cha=RCD&starttime=2010-01-01T01:00:00&level=response&format=xml&includecomments=true&nodata=404

Rules table has some strange characters, incomplete and misspelled entries

Using 1.5.2 with the --print-rules option the resulting table has some issues.

Here are some examples, please review the entire table and fix:

a) strange characters at the end of some descriptions: station code doesn't match [A-Za-z0-9*?]{1, what is the {1 part for?

b) Typos like: callibration

c) Rule 210: 210 │ elevation XXXXXXX. I have no idea what this rule does.

d) Rule 309: 309 │ ${validatedValue} does not match: Instrument azimuth '. I have no idea what this does either. Also, ${validatedValue} is confusing, just say what it is.

e) these two do not make sense, what are the tests?, there must be more to them:
405 │ If the Channel sample rate is 0 (non-timeseries ASCII channel)
406 │ If the Channel sample rate is nonzero.

f) A whole bunch of ${validatedValue.name} usage, not sure what that is but it should be replaced with real words.

Please go through the whole table, one by one, and make sure each entry is easily understandable and fix the descriptions if needed. Obviously these should be checked with the relevant code to make sure they do what they say.

New rule to check that channel orientation codes match documented orientation

This rule only emits a warning, not a failure

A new rule that checks the orientation code in the channel (e.g. E,N,Z) against the channel's azimuth and dip values.

The test should be "within 5 degrees" of the values listed in the SEED manual:

Z — Dip -90, Azimuth 0
N — Dip 0, Azimuth 0
E — Dip 0, Azimuth 90
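A sketch of the proposed warning, using the three expected orientations listed above and the 5 degree tolerance. The function names and the message wording are assumptions. Azimuth differences are taken modulo 360 so that 358 counts as 2 degrees from 0.

```python
# Expected (dip, azimuth) per orientation code, as listed in the issue.
EXPECTED_ORIENTATION = {"Z": (-90.0, 0.0), "N": (0.0, 0.0), "E": (0.0, 90.0)}


def angle_difference(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def orientation_warning(channel_code, dip, azimuth, tolerance=5.0):
    """Return a warning string, or None if the orientation looks consistent."""
    code = channel_code[-1].upper()  # orientation is the last character, e.g. BHZ -> Z
    if code not in EXPECTED_ORIENTATION:
        return None  # no documented expectation for this code
    expected_dip, expected_azimuth = EXPECTED_ORIENTATION[code]
    if abs(dip - expected_dip) > tolerance:
        return f"{channel_code}: dip {dip} not within {tolerance} of {expected_dip}"
    if angle_difference(azimuth, expected_azimuth) > tolerance:
        return f"{channel_code}: azimuth {azimuth} not within {tolerance} of {expected_azimuth}"
    return None
```

Since the rule only emits a warning, returning a message string (or `None`) leaves it to the caller to decide how it is reported.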

prefer SI units with SI approved capitalization

Although SEED has historically used all caps and says that units are case insensitive, case matters in SI. For example, "s" is second but "S" is siemens, and "mV" is millivolt while "MV" is megavolt.

To the extent possible, stationxml should probably use the SI unit standards, including preferring the SI capitalization.

Rule 310 and 408 - ACE and OCF channels should be able to have an empty response

We want to address these warnings relating to ACE and OCF channels as they apply only to channels with a sample rate and response. ACE and OCF can have an empty response block.

  • 310 : ACE : Sample rate cannot be 0 or null.
  • 310 : ACE : Sample rate cannot be 0 or null.
  • 312 : ACE : expected a value for calibration unit but was null
  • 312 : ACE : expected a value for calibration unit but was null
  • 408 : ACE : Decimation cannot be null
  • 408 : ACE : Decimation cannot be null

Add 'unknown' to SI unit list.

This was discussed with the DS team. I think we agreed to make this an acceptable unit, as opposed to directing users to writing 'unitless'.

Sample rate comparison too strict, need comparison with tolerance

The sample rate comparison for Rule 408 is too strict and runs into the classic problem of comparing floating point values.

A suggested comparison with tolerance is as follows:

abs(1-sr1/sr2) < 0.0001

I.e. two sample rates (sr1 and sr2) are considered equal if the absolute value of (1 - sr1/sr2) is less than 0.0001. A C macro that does this test is:

#define MS_ISRATETOLERABLE(A,B) (fabs (1.0 - (A / B)) < 0.0001)
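The same comparison, sketched in Python for illustration. The function name is hypothetical, and the handling of a zero second rate is an assumption added to avoid a division by zero, which the macro above does not address.

```python
def rates_tolerably_equal(sr1, sr2, tolerance=0.0001):
    """Return True if two sample rates agree within a relative tolerance."""
    if sr2 == 0:
        # Assumption: a zero rate only matches another zero rate.
        return sr1 == 0
    return abs(1.0 - sr1 / sr2) < tolerance
```

With this test, rates such as `0.1 + 0.2` and `0.3` compare as equal even though they differ in their floating point representation.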

Corrupt reporting output for test failures

This run:

java -jar stationxml-validator-1.0.2.jar IU.ANMO.00.BHZ-BadNetandBadUnit.xml

results in:

file|rule-id|rule-message|network|network-start-time|network-end-time|station|station-start-time|station-end-time|location|channel-code|channel-start-time|channel-end-time
IU.ANMO.00.BHZ-BadNetandBadUnit.xml|102| network code doesn't match [A-Za-z0-9*?]{1|IUUU|1988-01-01T00:00:00|2500-12-31T23:59:59|||||||| network code doesn't match [A-Za-z0-9*?]{1
IU.ANMO.00.BHZ-BadNetandBadUnit.xml|411| Invalid input unit Meters/S for stage sensitivity|IUUU|1988-01-01T00:00:00|2500-12-31T23:59:59|ANMO|1995-07-14T00:00:00|2000-10-19T16:00:00|00|BHZ|1998-10-26T20:00:00|2000-10-19T16:00:00| Invalid input unit Meters/S for stage sensitivity

A number of changes should be made:

  1. The "rule-message" is included again at the end of each of the messages
  2. The "rule-message" for rule 102 is "network code doesn't match [A-Za-z0-9*?]{1". What is that "{1" at the end? Doesn't make sense.
  3. The "rule-messages" start with a space and are inconsistently capitalized. Seems like the space doesn't need to be there and each message should start with a capital letter so it reads like a message, or at least be consistent.
  4. In my opinion, the header row would more obviously be a header row and readable if the field names started with capital letters, e.g. "File" instead of "file".

Station lat,lon rule disallow 0,0

Extend the rule to disallow the case where both latitude and longitude are zero. This has happened more than once in the past. Maybe someday they will install an OBS at 0,0 but for now that probably means somebody forgot to type in a value and the system default was 0.

Error 210:Station elevation must be equal to or above Channel elevation

Error 210 is currently not accurate in all field deployment contexts. According to SEED standards, channels can be separated from a station by less than 1 km, which can equate to a significant elevation difference on steep terrain or when using downhole instruments. The SEED manual states that differences between station and instrument elevations must be < 1 km (SEED Manual p 68). Please allow a tolerance of 1 km between station and instrument elevations to make this error reflect the SEED manual.

EX1: https://service.iris.edu/fdsnws/station/1/query?net=CN&sta=DPQ&cha=HHE&starttime=2018-06-01T01:00:00&level=channel&format=xml&includecomments=true&nodata=404
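The requested tolerance could be expressed as in the sketch below. The function name is hypothetical and elevations are assumed to be in meters.

```python
def elevation_within_tolerance(station_elevation_m, channel_elevation_m, tolerance_m=1000.0):
    """Return True if station and channel elevations differ by less than the tolerance."""
    # The channel may be above or below the station, so compare the absolute difference.
    return abs(station_elevation_m - channel_elevation_m) < tolerance_m
```

Using the absolute difference also covers the case where the channel sits above the station, which the current "equal to or above" wording rejects.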

make dates UTC

Dates for start/end times are printed out in local time, which is not ideal. All start/end times for net, sta, chan should be printed UTC.

Also, the full "day of week" format, like "Mon Jan 25 14:30:00 EST 2016" is verbose and harder to check. I would prefer all ISO8601 time outputs, which would match the time items later in the output.

Example:
CO_CSB.xml|406|[Channel [code=VKI, locationCode=00, startDate=Mon Jan 25 14:30:00 EST 2016, endDate=Thu Feb 11 18:00:00 EST 2016]] If Channel sample rate > 0, at least one stage must be included and be comprised of units, gain, and sample rate.|CO|||CSB|2009-04-13T00:00:00||00|VKI|2016-01-25T19:30:00|2016-02-11T23:00:00
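A small sketch of the requested formatting, converting a local time to UTC and printing it in the same ISO 8601 style as the later fields of the output. The function name is hypothetical, and the fixed -5 hour offset for EST is set up by hand only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(moment):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


# The start date from the example above: Mon Jan 25 14:30:00 EST 2016.
est = timezone(timedelta(hours=-5), "EST")
local_start = datetime(2016, 1, 25, 14, 30, tzinfo=est)
utc_start = to_utc_iso(local_start)
```

Here `utc_start` is `2016-01-25T19:30:00`, which matches the channel start time printed at the end of the example line.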

confusing rule 402

402 The element of a stage must match the element of the preceding stage, except for stages 0 or 1.
Not sure what this means.

Does this mean checking that the output unit of the previous stage matches the input unit of the next stage? That would be useful, but this message is confusing.

Rule 302 - modify the regexp match rule for location codes

Discussed with DS team.

The current rule says that the location value must not be null and must match ([A-Za-z0-9*?-\ ]{1,2})?

We should change this to read [A-Za-z0-9\ ]{0,2}, which means the field could in fact have zero characters.
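The proposed pattern can be tried out as below. The function name is hypothetical, and treating a missing (null) value as invalid is an assumption carried over from the current rule.

```python
import re

# Proposed pattern from the issue: up to two letters, digits, or spaces.
LOCATION_CODE = re.compile(r"[A-Za-z0-9\ ]{0,2}")


def location_code_valid(code):
    """Return True if the location code matches the proposed rule."""
    # Assumption: None (attribute absent) is still rejected.
    if code is None:
        return False
    # fullmatch ensures the whole string matches, not just a prefix.
    return LOCATION_CODE.fullmatch(code) is not None
```

Compared with the current rule, the wildcard characters `*` and `?` and the dash are no longer accepted, and an empty string is.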

Set process exit code according to validation status

The command line tool (stationxml-validator.jar) should exit with a code meaningful to the validation status.

An exit code of 0 should be used when the validation was successful. As for the exit code when the validation was not successful we have a couple of options:

  1. exit with the count of validation test failures
  2. exit with 1 on a validation error

I prefer 1.
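One detail worth noting for option 1: on POSIX systems only the low 8 bits of an exit status reach the parent process, so a raw count of 256 failures would be seen as 0. A sketch that caps the count, with a hypothetical function name:

```python
def exit_code_from_failures(failure_count, max_code=255):
    """Map a count of validation failures to a process exit code."""
    if failure_count < 0:
        raise ValueError("failure count cannot be negative")
    # Cap the value so a large count can never wrap around to 0 (success).
    return min(failure_count, max_code)
```

A command line wrapper would then end with something like `sys.exit(exit_code_from_failures(n))`, where `n` is the number of failed tests.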

check channel dip -90 to 90

Similar to azimuth check, dip should be -90 to 90.

It would also be really nice to correlate this with the channel orientation code: BHZ should have a dip near -90, BHE an azimuth near 90, and BHN an azimuth near 0. I think the SEED tolerance is +/- 5 deg?

There is the case of channels that are 180 off that may be ok (reversed polarity).

[Documentation] "Validation tests" Document does not Reflect the Current Validator Rules.

The Wiki document Validation tests does not accurately describe why certain rules are triggering.

EX: Error: 310 "If Channel sample rate = 0, no Response should be included" triggers when Channel Sample Rate = 0 for channels != OCF, SOH, and ACE.

EX: Error 402: "The element of a stage must match the element of the preceding stage, except for stages 0 or 1." is more accurately described as Stage N contains an Invalid Input Unit.

EX: Error 403: "The element StageGain::Value or InstrumentSensitivity::Value must be non-zero" should likely be described as The element of a stage must match the element of the preceding stage, except for stages 0 or 1.

EX: Error: 405 "If Channel sample rate = 0, no Response should be included" this error does not trigger while error 310 triggers frequently. Error 405 needs further investigation.

Other inconsistencies likely exist and need to be considered on a rule by rule basis.

rule 310 shouldn't complain if channel sample rate is 0

When Channel sample rate = 0 , this is a non-time-series channel that is typically specific to
the datalogger (which can have multiple sensors). In this case, the Sensor is irrelevant and the
validator should not complain about 310. For example:

CI_CRR.xml|405|[Channel [code=ACE, locationCode=, startDate=Tue Jun 10 13:09:00 MDT 2014, endDate=Thu Jun 18 13:00:00 MDT 2015]] If Channel sample rate = 0, no Response should be included.|CI|||CRR|1950-04-09T00:00:00|||ACE|2014-06-10T19:09:00|2015-06-18T19:00:00
CI_CRR.xml|310|[null] Channel sensor cannot be null|CI|||CRR|1950-04-09T00:00:00|||ACE|2014-06-10T19:09:00|2015-06-18T19:00:00

CI_CRR_ACE_fragment.xml.txt

rule 311 should allow type, manufacturer, model instead of just description

311 Channel sensor description cannot be null/empty

One of my channels has this, which I think is much better than a simple description element.

<fsx:Sensor>
  <fsx:Type>NANOMETRICS:TRILLIUM 120 BOREHOLE:001009</fsx:Type>
  <fsx:Manufacturer>NANOMETRICS</fsx:Manufacturer>
  <fsx:Model>TRILLIUM 120 BOREHOLE</fsx:Model>
  <fsx:SerialNumber>001009</fsx:SerialNumber>
  <fsx:CalibrationDate>1900-01-01T00:00:00Z</fsx:CalibrationDate>
</fsx:Sensor>

Include correct version in usage message

Release 1.0.2 prints version 1.0 in the header:

$ java -jar stationxml-validator-1.0.2.jar --help
===============================================================
|                   FDSN StationXml validator                  |
|                  Version 1.0                                |
================================================================
Usage:
java -jar stationxml-validator [OPTIONS] [FILE]
OPTIONS
   --[net|sta|cha|resp] default is resp 
   --ignore-rules: comma seperated numbers of validation rules
   --print-rules : print a list of validation rules
   --print-units : print a list of units used to validate
   --summary     : print summary only report for errors if any
   --debug:
   --help: print this message
+==============================================================

Clear message when validation was successful (all tests passed)

The command line tool should print a clear message, perhaps "Validation successful, all tests passed", when the validation was successful.

Currently, the header row for the list of failures is the only thing printed. It should not be printed when there are no failures.

Error 312: Unit Case Sensitivity

Error 312 is triggering because SEED traditionally uses upper case characters for assigning unit names, but these uppercase unit names are not valid according to the validator unit rules. We could consider adding "volts" (plural) to the validator's list of acceptable units to decrease occurrences of Error 312 and 402. Error 312 should be a warning to encourage future users to assign unit names the character cases outlined by SI standards but allow dataless to sxml conversions into the system. We could also automatically assign unit names with the correct case if a unit name matches a regular expression, and warn the user that their unit name has been automatically changed. Strings containing all uppercase characters are likely an artifact of the dataless to stationxml conversion process.

Refer to: #42
EX1: https://service.iris.edu/fdsnws/station/1/query?net=PB&sta=B005&cha=RDO&starttime=2018-01-01T01:00:00&level=response&format=xml&includecomments=true&nodata=404

rule 406 may not be right for SOH channels

For SOH channels, often all there is in the "response" is an output unit and maybe a gain, so it may not be correct to require stages in these cases. An example is LCQ, clock quality on a Q330, which is just a percentage given as an estimate of clock accuracy. Not sure you really want stages for this.

Example:
<fsx:Response>
  <fsx:InstrumentSensitivity>
    <fsx:Value>1.000000000000E+00</fsx:Value>
    <fsx:Frequency>2.500000000000E-01</fsx:Frequency>
    <fsx:InputUnits>
      <fsx:Name>counts</fsx:Name>
      <fsx:Description>Digital Count in Digital counts</fsx:Description>
    </fsx:InputUnits>
    <fsx:OutputUnits>
      <fsx:Name>%</fsx:Name>
      <fsx:Description>Percent in Percentage</fsx:Description>
    </fsx:OutputUnits>
  </fsx:InstrumentSensitivity>
</fsx:Response>
