iris-edu / stationxml-validator
License: GNU General Public License v3.0
The validator complains about channel epoch overlap (252) when the epochs of two channels with the same code overlap, even if their locIDs are different.
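A minimal sketch of an overlap check that groups epochs by both channel code and location code, so that channels at different locations are never compared with each other. The function name `find_overlaps` and the tuple layout are assumptions made for illustration and are not taken from the validator's code.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def find_overlaps(epochs):
    """Return overlapping epoch pairs, compared only within the same (code, location).

    epochs: list of (code, location, start, end) tuples; end may be None for an open epoch.
    """
    groups = defaultdict(list)
    for code, loc, start, end in epochs:
        groups[(code, loc)].append((start, end))

    overlaps = []
    for key, spans in groups.items():
        ordered = sorted(spans, key=lambda span: span[0])
        for (s1, e1), (s2, e2) in combinations(ordered, 2):
            # After sorting, s1 <= s2. The pair overlaps if the earlier epoch
            # is open-ended or ends strictly after the later one starts.
            if e1 is None or e1 > s2:
                overlaps.append((key, (s1, e1), (s2, e2)))
    return overlaps
```

With this grouping, two BHZ epochs at locations "00" and "10" can cover the same time span without being reported, and an epoch that ends exactly when the next one starts is not counted as an overlap.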
Response:Stage must be composed of the elements Response:Stage:ResponseStageType [B053-B056], Response:Stage:Decimation [B057], and Response:Stage:StageGain [B058] or Response:Stage must only include the element Response:Stage:StageGain [B058].
[B] = Blockette SEED Manual
The current unit naming wiki page says:
Exponents for powers are specified with **, e.g. "s**2".
but the ** is I think redundant as there can be no other meaning for an integer within the unit name than exponent. So would be simpler to encode power as a simple number following a unit. So acceleration would be either "m s-2" or "m/s2".
This standard may be useful as it explicitly deals with conveying units within ascii:
http://unitsofmeasure.org/ucum.html
perhaps include a link to it on the wiki page? They also use a dot between multiplied units, like "m.s-2".
Note: Error number 403.5 will be reassigned. This number is used temporarily to help define the subject.
Create a test to verify that InstrumentSensitivity has a Value that is nonzero.
408 The value of Channel::SampleRate must be equal to the value of Decimation::InputSampleRate divided by Decimation::Factor of the final response stage.
This is confusing, think you mean sample rate is input sample rate from first digital stage divided by decimation factors from each stage, not just the final stage?
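One way to read the suggestion above, as a rough sketch: take the input sample rate of the first stage that has a Decimation and divide it by the decimation factor of every such stage. The function name and the `(input_sample_rate, factor)` tuple layout are assumptions for illustration only.

```python
def expected_channel_rate(decimations):
    """Expected Channel::SampleRate from the full decimation cascade.

    decimations: list of (input_sample_rate, factor) tuples, one per stage
    that has a Decimation element, in stage order.
    Returns None when there are no decimating stages.
    """
    if not decimations:
        return None
    rate = decimations[0][0]  # input rate of the first digital stage
    for _, factor in decimations:
        rate /= factor  # apply every stage's decimation factor
    return rate


cascade = [(30000.0, 1), (30000.0, 5), (6000.0, 3), (2000.0, 2), (1000.0, 5), (200.0, 5)]
print(expected_channel_rate(cascade))  # 40.0
```

When the cascade is internally consistent, this gives the same number as the final stage's input rate divided by its factor (200 / 5 in the example), so the two readings of rule 408 only differ for inconsistent cascades.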
403 The element StageGain::Value or InstrumentSensitivity::Value must be non-zero.
Should negative gain be allowed? Maybe at least warning if < 0. Also check that Frequency is >0.
Input units should match instrument code in InstrumentSensitivity? May not always be possible, but BHZ should be a velocity (ie M/S) and BNZ should be acceleration (M/S**2).
Create Test:
Error 216: Checks that the Station level restriction status is compatible with the Network level restriction status. (EX: A Network with an Open restriction status cannot have a Station with a Closed or Partial restriction status.) Error 216 cannot trigger on Networks with a Partial restriction designation because a Station with any restriction status fits within a Network with a Partial restriction Status. Station level restriction status must be assigned a string "Open", "Closed", or "Partial".
Error 316: Checks that the Channel level restriction status is compatible with the Station level restriction status. (EX: A Station with an Open restriction status cannot have a Channel with a Closed restriction status.) Error 316 cannot trigger on Stations with a Partial restriction designation because a Channel with any restriction status fits within a Station with a Partial restriction Status. Channel level restriction status must be assigned a string "Open" or "Closed".
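A small sketch of how the compatibility test behind errors 216 and 316 might look. The text above only spells out the "Open" parent case and the "Partial" exemption; treating any other mismatch (for example a "Closed" parent with an "Open" child) as incompatible is an assumption here, and the function name is hypothetical.

```python
def restriction_compatible(parent_status, child_status):
    """Check a child (Station or Channel) status against its parent's status.

    A parent marked "Partial" accepts any child status. For any other parent
    status the child is assumed to need the same status (assumption; the
    issue text only gives the "Open" parent example).
    """
    if parent_status == "Partial":
        return True
    return parent_status == child_status
```

For example, `restriction_compatible("Open", "Closed")` is False, while `restriction_compatible("Partial", "Closed")` is True.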
Allow a user to specify an input "file" as a URL, ala:
java -jar stationxml-validator-1.0.2.jar 'http://service.iris.edu/fdsnws/station/1/query?net=IU&sta=ANMO&loc=00&cha=BHZ&level=response&format=xml'
Rule 210 Station elevation must be equal to or above Channel elevation
may not be true always, ex array in steep terrain where surface sensor is uphill from station location, or sensor is strong motion several floors up in a building.
A better check might be to compare channel elevation plus channel depth to station elevation. They might not match in all cases, but for simple stations, especially where channel lat lon is same as station, the chan elevation plus the depth of overburden should be station elevation.
For --print-units:
I added a date to the list on that page to provide some level of tracking. You should put a comment in the code with the date when you copied the official list into the code, so you know when it's out of date.
I'm not even sure the unit names need to be in a table, seems like unnecessary formatting. Either get rid of the table or make it less wide (maybe match the --print-rules width for consistency).
A new rule is needed for response level: "Sample rates are consistent through cascade of Stage entries".
The check is, when a Stage::Decimation is present the Decimation::InputSampleRate value should be the same as the Decimation::InputSampleRate divided by the Decimation::Factor from the previous Stage.
Mentioned in #17
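The cascade check described above could be sketched as follows. The function name, the tuple layout, and the relative tolerance are assumptions for illustration; the tolerance is there only to avoid exact floating point comparison.

```python
import math


def cascade_mismatches(decimations, rel_tol=1e-4):
    """Find stages whose input rate does not follow from the previous stage.

    decimations: list of (stage_number, input_sample_rate, factor) tuples for
    the stages that have a Decimation element, in stage order.
    Returns a list of (stage_number, expected_rate, actual_rate) tuples.
    """
    problems = []
    for prev, cur in zip(decimations, decimations[1:]):
        expected = prev[1] / prev[2]  # previous input rate / previous factor
        if not math.isclose(cur[1], expected, rel_tol=rel_tol):
            problems.append((cur[0], expected, cur[1]))
    return problems


print(cascade_mismatches([(2, 30000.0, 5), (3, 5000.0, 3)]))  # [(3, 6000.0, 5000.0)]
```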
Note: Error number 405 will be reassigned. This number is used temporarily to help define the subject.
Create a test that compares the product of StageGain<Value> from stages where StageGain<Frequency> = InstrumentSensitivity<Frequency> to the InstrumentSensitivity<Value>. These values should be equal.
EX: Code block outlining how the test could be conducted (pseudocode). NOT TESTED.
stagegains = c()
count = 0
for (i in 1:length(stages)) {
  # Only collect gains that are given at the InstrumentSensitivity frequency
  if (StageGain<Frequency>[i] == InstrumentSensitivity<Frequency>) {
    count = count + 1
    stagegains[count] = StageGain<Value>[i]
  }
}
TotalGain = prod(stagegains)
# Only compare when every stage was given at the same frequency
if (count == length(stages) && InstrumentSensitivity<Value> != TotalGain) {
  Error 405!
}
"Compare the product of the gains of each stage at the normalization
frequency listed in stage 0" to the instrument sensitivity gain.
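A hedged Python sketch of the same comparison, using a relative tolerance instead of exact equality since the values are floating point. The function name, the `(value, frequency)` tuple layout, and the tolerance of 0.1% are assumptions and are not taken from the validator.

```python
import math


def check_total_gain(stage_gains, sensitivity_value, sensitivity_freq, rel_tol=1e-3):
    """Compare the product of stage gains with InstrumentSensitivity::Value.

    stage_gains: list of (value, frequency) tuples, one per stage.
    Returns None when the comparison does not apply (no stages, or some stage
    gain is given at a different frequency), otherwise True or False.
    """
    if not stage_gains:
        return None
    if any(not math.isclose(freq, sensitivity_freq, rel_tol=1e-9) for _, freq in stage_gains):
        return None
    total = math.prod(value for value, _ in stage_gains)
    return math.isclose(total, sensitivity_value, rel_tol=rel_tol)


stages = [(1500.0, 0.02), (1.0, 0.02), (419430.0, 0.02)]
print(check_total_gain(stages, 629145000.0, 0.02))  # True
```

Returning None for mixed frequencies keeps that case separate from a real mismatch, which fits the idea that it may deserve a warning rather than an error.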
Refer to: #19
Error 403 is triggering when stages with "null" units are compared to the units of other stages within the response cascade. The preamp stage is composed of only SEED blockette [58], which is unitless because blockette [58] does not contain a field that defines unit strings. The Nominal Response Library (NRL) is written to define preamp stages by blockette [58], so every time an NRL response is included in station metadata, error 403 will occur upon validation.
This error could be fixed by skipping the 403 test on preamp, "null unit", stages. An exception to error 403 could be made for stages with null units.
Refer to: #9
EX1, Refer to stage 2: https://service.iris.edu/fdsnws/station/1/query?net=AV&sta=ILW&cha=SHN&starttime=2016-09-30T01:00:00&level=response&format=xml&includecomments=true&nodata=404
Some notifications should be WARNINGS which state the preferred state of the data, but should not flag an automatic reader to disregard the entry or document.
Notifications that are an ERROR should flag readers to disregard the entry or document.
Overall gain in InstrumentSensitivity should be product of all the stage gains IF Frequency values are same in all stages. Not sure if it would be an error if not, but all stages probably should use the same frequency value if possible.
Rule 402, testing if InputUnit and OutputUnit entries are consistent, is triggered when a stage has no units.
Can we make this test skip stages that have no units? The attached test file is valid concerning the units.
Update the units checked from the table here:
https://github.com/iris-edu/StationXML-Validator/wiki/Unit-name-overview-for-IRIS-StationXML-validator
Also, check the unit names and issue both warnings and errors.
It is an error if the unit name is not found in our list with a case insensitive comparison.
It is a warning if the unit name is found in our list with a case insensitive comparison but not a case sensitive search.
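The two-level unit check described above might be sketched like this. The `KNOWN_UNITS` set is a tiny made-up subset for illustration only, not the official list from the wiki page, and the function name is hypothetical.

```python
# Illustrative subset only; the real list lives on the wiki page referenced above.
KNOWN_UNITS = {"m", "m/s", "m/s**2", "V", "counts", "Pa"}


def classify_unit(name, known=KNOWN_UNITS):
    """Return "ok", "warning", or "error" for a unit name.

    "ok":      exact (case sensitive) match in the list.
    "warning": found only with a case insensitive comparison.
    "error":   not found even with a case insensitive comparison.
    """
    if name in known:
        return "ok"
    if name.lower() in {unit.lower() for unit in known}:
        return "warning"
    return "error"


for unit in ("m/s", "M/S", "Meters/S"):
    print(unit, classify_unit(unit))
```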
For one of my stations, stationxml here:
http://files.anss-sis.scsn.org/production/FDSNstationXML/CO/CO_CSB.xml
I got this error, which goes on for a couple of pages, trimmed for bug post. I am not sure if the overlap calculation is right, I don't think I have overlapping channels, but at a minimum the output of this error message is way too verbose to be helpful. Probably don't need to print all of the channels, just the two that supposedly overlap.
CO_CSB.xml|252|[[Channel [code=EH1, locationCode=00, startDate=Sun Apr 12 20:00:00 EDT 2009, endDate=Thu Sep 03 12:00:00 EDT 2009], Channel [code=EH1, locationCode=00, startDate=Thu Sep 03 12:00:00 EDT 2009, endDate=Wed Sep 09 12:00:00 EDT 2009], Channel [code=EH1, locationCode=00, startDate=Wed Sep 09 12:00:00 EDT 2009, endDate=Mon Dec 14 11:00:00 EST 2009],
Channel [code=VMV, locationCode=01, startDate=Thu Feb 11 18:00:00 EST 2016, endDate=Thu Mar 24 12:00:00 EDT 2016], Channel [code=VMW, locationCode=00, startDate=Wed Apr 27 14:00:00 EDT 2016, endDate=null], Channel [code=VMW, locationCode=01, startDate=Thu Feb 11 18:00:00 EST 2016, endDate=Thu Mar 24 12:00:00 EDT 2016]]] Channel epoch overlap not allowed|CO|||CSB|2009-04-13T00:00:00|||||
Error 402 is triggering on mismatched case sensitivity of units. Because SEED traditionally uses upper case characters for assigning units, this error should be a warning to encourage future users to assign unit names the character cases outlined by SI standards but allow dataless to sxml conversions into the system. Unit name strings containing all uppercase characters are likely an artifact of the dataless to stationxml conversion process.
Clarify the message text of rule 407 to read:
"...response must exist as either a scale factor or a polynomial."
Create a restriction in all response level conditions, 400 level error test, allowing Channel:code == "SOH", "ACE", "OCF", "LOG" and Channel:Type == "HEALTH", "FLAG", "MAINTENANCE" to pass the condition. Response information is often not applicable to the Channel:code and Channel:Type defined above and should not prevent validation of these channels.
Using 1.5.2 with the --print-rules option the resulting table has some issues.
Here are some examples, please review the entire table and fix:
a) Strange characters at the end of some descriptions, e.g. "station code doesn't match [A-Za-z0-9*?]{1". What is the "{1" part for?
b) Typos like: callibration
c) Rule 210: "210 │ elevation XXXXXXX". I have no idea what this rule does.
d) Rule 309: "309 │ ${validatedValue} does not match: Instrument azimuth '". I have no idea what this does either. Also, "${validatedValue}" is confusing, just say what it is.
e) These two do not make sense. What are the tests? There must be more to them:
405 │ If the Channel sample rate is 0 (non-timeseries ASCII channel)
406 │ If the Channel sample rate is nonzero
f) A whole bunch of "${validatedValue.name}" usage. Not sure what that is, but it should be replaced with real words.
Please go through the whole table, one by one, and make sure each entry is easily understandable and fix the descriptions if needed. Obviously these should be checked with the relevant code to make sure they do what they say.
This rule only emits a warning, not a failure
A new rule that checks the orientation code in the channel (e.g. E,N,Z) against the channel's azimuth and dip values.
The test should be "within 5 degrees" of the values listed in the SEED manual:
Z — Dip -90, Azimuth 0
N — Dip 0, Azimuth 0
E — Dip 0, Azimuth 90
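A possible sketch of the proposed orientation rule, using the nominal values listed above and a 5 degree tolerance. Two choices here are assumptions rather than anything stated in the issue: azimuth is compared on a circle (so 357 counts as 3 degrees from 0), and azimuth is not checked for Z because it has little meaning for a vertical sensor. The function names are hypothetical.

```python
# Nominal (dip, azimuth) per orientation code, from the SEED values listed above.
NOMINAL = {"Z": (-90.0, 0.0), "N": (0.0, 0.0), "E": (0.0, 90.0)}


def angle_diff(a, b):
    """Smallest difference between two azimuths in degrees, in the range 0..180."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def orientation_ok(channel_code, dip, azimuth, tol=5.0):
    """Check dip and azimuth against the last character of the channel code."""
    orientation = channel_code[-1]
    nominal = NOMINAL.get(orientation)
    if nominal is None:
        return True  # codes such as 1, 2, 3 are not covered by this rule
    nominal_dip, nominal_azimuth = nominal
    if abs(dip - nominal_dip) > tol:
        return False
    if orientation == "Z":
        return True  # assumption: azimuth is not meaningful for verticals
    return angle_diff(azimuth, nominal_azimuth) <= tol
```

For example, `orientation_ok("BHN", 0.0, 357.0)` passes, while `orientation_ok("BHE", 0.0, 80.0)` fails.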
SEED has historically used all caps and says that units are case insensitive, but case matters in SI. For example, "s" is second but "S" is siemens, and "mV" is millivolt but "MV" is megavolt.
To the extent possible, stationxml should probably use the SI unit standards, including preferring the SI capitalization.
We want to address these warnings relating to ACE and OCF channels as they apply only to channels with a sample rate and response. ACE and OCF can have an empty response block.
This was discussed with the DS team. I think we agreed to make this an acceptable unit, as opposed to directing users to writing 'unitless'.
The sample rate comparison for Rule 408 is too strict and runs into the classic problem of comparing floating point values.
A suggested comparison with tolerance is as follows:
abs(1-sr1/sr2) < 0.0001
I.e. two sample rates (sr1 and sr2) are considered equal if the absolute value of (1 - sr1/sr2) is less than 0.0001. A C macro that does this test is:
#define MS_ISRATETOLERABLE(A,B) (fabs (1.0 - (A / B)) < 0.0001)
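For readers who do not use C, the same tolerance test can be sketched in Python. The function name is made up for illustration, and the guard for a zero `sr2` is an added assumption, since the macro itself would divide by zero in that case.

```python
def is_rate_tolerable(sr1, sr2, tol=0.0001):
    """True if two sample rates agree to within a relative tolerance.

    Mirrors abs(1 - sr1/sr2) < 0.0001. When sr2 is zero the rates are only
    treated as equal if sr1 is also zero (assumption, not in the macro).
    """
    if sr2 == 0:
        return sr1 == 0
    return abs(1.0 - sr1 / sr2) < tol


print(is_rate_tolerable(40.0, 40.000001))  # True
print(is_rate_tolerable(40.0, 20.0))       # False
```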
This run:
java -jar stationxml-validator-1.0.2.jar IU.ANMO.00.BHZ-BadNetandBadUnit.xml
results in:
file|rule-id|rule-message|network|network-start-time|network-end-time|station|station-start-time|station-end-time|location|channel-code|channel-start-time|channel-end-time
IU.ANMO.00.BHZ-BadNetandBadUnit.xml|102| network code doesn't match [A-Za-z0-9*?]{1|IUUU|1988-01-01T00:00:00|2500-12-31T23:59:59|||||||| network code doesn't match [A-Za-z0-9*?]{1
IU.ANMO.00.BHZ-BadNetandBadUnit.xml|411| Invalid input unit Meters/S for stage sensitivity|IUUU|1988-01-01T00:00:00|2500-12-31T23:59:59|ANMO|1995-07-14T00:00:00|2000-10-19T16:00:00|00|BHZ|1998-10-26T20:00:00|2000-10-19T16:00:00| Invalid input unit Meters/S for stage sensitivity
A number of changes should be made:
Lots of non-seismic stations list COUNTS as a calibration unit. Discussed with DS team. Should put out a warning when capitalized as our acceptable units are SI lower-case (e.g. counts).
Add to rule to disallow if both are zero. This has happened more than once in the past. Maybe someday they will install an OBS at 0,0 but for now that probably means somebody forgot to type in a value and the system default was 0.
Error 210 is currently not accurate in all field deployment contexts. Channels can be separated from a station by less than 1 km, according to SEED standards, which can equate to significant elevation gain on steep terrain or when using down hole instruments. The SEED manual states that differences between station and instrument elevations must be < 1 km (SEED Manual p 68). Please allow a tolerance of 1 km between station and instrument elevations to make this error reflect the SEED manual.
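The requested tolerance could be expressed roughly as below. The function name is hypothetical, elevations are assumed to be in meters, and the check is symmetric so that a channel above the station is treated the same as one below it.

```python
def elevation_within_tolerance(station_elevation_m, channel_elevation_m, tol_m=1000.0):
    """True if station and channel elevations differ by less than tol_m meters."""
    return abs(station_elevation_m - channel_elevation_m) < tol_m
```

Under this sketch a borehole sensor 150 m below the station passes (`elevation_within_tolerance(1850.0, 1700.0)`), while a 1350 m difference does not.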
Maybe as part of rule 409, output of any "digital" stage should be COUNT.
Dates for start/end times are printed out in local time, which is not ideal. All start/end times for net, sta, chan should be printed UTC.
Also, the full "day of week" format, like "Mon Jan 25 14:30:00 EST 2016" is verbose and harder to check. I would prefer all ISO8601 time outputs, which would match the time items later in the output.
Example:
CO_CSB.xml|406|[Channel [code=VKI, locationCode=00, startDate=Mon Jan 25 14:30:00 EST 2016, endDate=Thu Feb 11 18:00:00 EST 2016]] If Channel sample rate > 0, at least one stage must be included and be comprised of units, gain, and sample rate.|CO|||CSB|2009-04-13T00:00:00||00|VKI|2016-01-25T19:30:00|2016-02-11T23:00:00
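To illustrate the requested output format, here is a small sketch that converts a timezone-aware timestamp to UTC and prints it in the same ISO 8601 style used by the later columns. The helper name is hypothetical, and the fixed -5 hour offset stands in for EST.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC, to the second."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


est = timezone(timedelta(hours=-5), "EST")
start = datetime(2016, 1, 25, 14, 30, 0, tzinfo=est)
print(to_iso_utc(start))  # 2016-01-25T19:30:00
```

The printed value matches the channel start time that already appears in ISO 8601 form at the end of the example line above.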
402 The element of a stage must match the element of the preceding stage, except for stages 0 or 1.
Not sure what this means.
Does this mean check for output unit of prev stage matching input unit of next stage. That would be useful, but this message is confusing.
Discussed with DS team.
The current rule says that the location value must not be null and must match ([A-Za-z0-9*?-\ ]{1,2})?
We should change this to read [A-Za-z0-9\ ]{0,2}, which means the field could in fact have zero characters.
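A quick sketch of how the proposed pattern behaves, using Python's `re` module with a full match so that the whole field has to fit the pattern. The function name is hypothetical, and rejecting a missing (None) value is an assumption carried over from the current "must not be null" wording.

```python
import re

# Proposed pattern from the discussion above: zero to two letters, digits, or spaces.
LOCATION_RE = re.compile(r"[A-Za-z0-9\ ]{0,2}")


def location_ok(code):
    """True if the location code is present and matches the proposed pattern."""
    return code is not None and LOCATION_RE.fullmatch(code) is not None


for value in ("", "00", "  ", "--", "000"):
    print(repr(value), location_ok(value))
```

With this pattern an empty string and two spaces are both accepted, while "--" and three-character codes are rejected.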
Do we add a type column?
The command line tool (stationxml-validator.jar) should exit with a code meaningful to the validation status.
An exit code of 0 should be used when the validation was successful. As for the exit code when the validation was not successful we have a couple of options:
I prefer 1.
Note: Error number 404 will be reassigned. This number is used temporarily to help define the subject.
Create a test to verify that each stage has a StageGain Value that is nonzero.
Similar to azimuth check, dip should be -90 to 90.
Would also be really nice to correlate this with channel orientation code. So BHZ should have dip near -90. BHE should be near azimuth 90 and BHN should be near azimuth 0. I think seed tolerance is +- 5 deg?
There is the case of channels that are 180 off that may be ok (reversed polarity).
The Wiki document Validation tests does not accurately describe why certain rules are triggering.
EX: Error: 310 "If Channel sample rate = 0, no Response should be included" triggers when Channel Sample Rate = 0 for channels != OCF, SOH, and ACE.
EX: Error 402: "The element of a stage must match the element of the preceding stage, except for stages 0 or 1." is more accurately described as Stage N contains an Invalid Input Unit.
EX: Error 403: "The element StageGain::Value or InstrumentSensitivity::Value must be non-zero" should likely be described as The element of a stage must match the element of the preceding stage, except for stages 0 or 1.
EX: Error: 405 "If Channel sample rate = 0, no Response should be included" this error does not trigger while error 310 triggers frequently. Error 405 needs further investigation.
Other inconsistencies likely exist and need to be considered on a rule by rule basis.
When Channel sample rate = 0 , this is a non-time-series channel that is typically specific to
the datalogger (which can have multiple sensors). In this case, the Sensor is irrelevant and the
validator should not complain about 310. For example:
CI_CRR.xml|405|[Channel [code=ACE, locationCode=, startDate=Tue Jun 10 13:09:00 MDT 2014, endDate=Thu Jun 18 13:00:00 MDT 2015]] If Channel sample rate = 0, no Response should be included.|CI|||CRR|1950-04-09T00:00:00|||ACE|2014-06-10T19:09:00|2015-06-18T19:00:00
CI_CRR.xml|310|[null] Channel sensor cannot be null|CI|||CRR|1950-04-09T00:00:00|||ACE|2014-06-10T19:09:00|2015-06-18T19:00:00
311 Channel sensor description cannot be null/empty
One of my channels has this, which I think is much better than a simple description element.
<fsx:Sensor>
  <fsx:Type>NANOMETRICS:TRILLIUM 120 BOREHOLE:001009</fsx:Type>
  <fsx:Manufacturer>NANOMETRICS</fsx:Manufacturer>
  <fsx:Model>TRILLIUM 120 BOREHOLE</fsx:Model>
  <fsx:SerialNumber>001009</fsx:SerialNumber>
  <fsx:CalibrationDate>1900-01-01T00:00:00Z</fsx:CalibrationDate>
</fsx:Sensor>
Current rule is to treat N,E,Z orientations as being within 5 degrees of northing,easting,vertical, otherwise are required to use the 1,2,3 numbering system. The error message should provide a hint for the numerical naming preference for orientations that fall outside of this tolerance.
Release 1.0.2 prints version 1.0 in the header:
$ java -jar stationxml-validator-1.0.2.jar --help
===============================================================
| FDSN StationXml validator |
| Version 1.0 |
================================================================
Usage:
java -jar stationxml-validator [OPTIONS] [FILE]
OPTIONS
--[net|sta|cha|resp] default is resp
--ignore-rules: comma seperated numbers of validation rules
--print-rules : print a list of validation rules
--print-units : print a list of units used to validate
--summary : print summary only report for errors if any
--debug:
--help: print this message
+==============================================================
I downloaded the 1.5.2 jar (https://github.com/iris-edu/StationXML-Validator/releases/download/v1.5.2/StationXml-validator--1.5.2.jar) and it does not run:
$ java -jar StationXml-validator--1.5.2.jar
Error: Could not find or load main class edu.iris.dmc.Application
Failure is reported for rule 413 even when there is no error.
The command line tool should print a clear message, perhaps "Validation successful, all tests passed", when the validation was successful.
Currently, the header for the list of failures is the only thing printed. It should not be printed when there are no failures.
Error 312 is triggering because SEED traditionally uses upper case characters for assigning unit names, but these uppercase unit names are not valid according to the validator unit rules. We could consider adding "volts" (plural) to the validator's list of acceptable units to decrease occurrences of Error 312 and 402. Error 312 should be a warning to encourage future users to assign unit names the character cases outlined by SI standards but allow dataless to sxml conversions into the system. We could also automatically assign unit names with the correct case if a unit name matches a regular expression, and warn the user that their unit name has been automatically changed. Strings containing all uppercase characters are likely an artifact of the dataless to stationxml conversion process.
Refer to: #42
EX1: https://service.iris.edu/fdsnws/station/1/query?net=PB&sta=B005&cha=RDO&starttime=2018-01-01T01:00:00&level=response&format=xml&includecomments=true&nodata=404
For SOH channels, often all there is in the "response" is just an output unit, and maybe a gain, so it may not be correct to require stages in these cases. Examples would be things like LCQ, clock quality on a Q330, where all it is is a percentage given as an estimate of clock accuracy. Not sure you really want stages for this.
Example:
<fsx:Response>
  <fsx:InstrumentSensitivity>
    <fsx:Value>1.000000000000E+00</fsx:Value>
    <fsx:Frequency>2.500000000000E-01</fsx:Frequency>
    <fsx:InputUnits>
      <fsx:Name>counts</fsx:Name>
      <fsx:Description>Digital Count in Digital counts</fsx:Description>
    </fsx:InputUnits>
    <fsx:OutputUnits>
      <fsx:Name>%</fsx:Name>
      <fsx:Description>Percent in Percentage</fsx:Description>
    </fsx:OutputUnits>
  </fsx:InstrumentSensitivity>
</fsx:Response>