
wfcatalog's People

Contributors

andres-h, jbienkowski, jollyfant, jschaeff, megies, sheimers

wfcatalog's Issues

Problems with service tests

@Jollyfant I'm currently trying to get the service up and running, but npm test throws an exception:

sysop:~/wfcatalog/service$ npm test

> [email protected] test /home/sysop/wfcatalog/service
> node runtests.js

[ERROR-Server:30149] 1502373012210 [1] failed to authenticate against server 127.0.0.1:27017 { type: 'error',
  message: '[1] failed to authenticate against server 127.0.0.1:27017',
  className: 'Server',
  pid: 30149,
  date: 1502373012210 }

/home/sysop/wfcatalog/service/node_modules/mongodb/lib/mongo_client.js:398
              throw err
              ^
FATAL: The connection to the database could not be established.
npm ERR! Test failed.  See above for more details.

Installation of the service seemed to be OK:

sysop:~/wfcatalog/service$ npm install
npm WARN deprecated [email protected]: Please upgrade to 2.2.19 or higher

> [email protected] install /home/sysop/wfcatalog/service/node_modules/dtrace-provider
> node scripts/install.js

[email protected] /home/sysop/wfcatalog/service
├─┬ [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ ├─┬ [email protected] 
│ │ │ └── [email protected] 
│ │ ├── [email protected] 
│ │ └─┬ [email protected] 
│ │   └─┬ [email protected] 
│ │     ├─┬ [email protected] 
│ │     │ └── [email protected] 
│ │     ├─┬ [email protected] 
│ │     │ └─┬ [email protected] 
│ │     │   ├── [email protected] 
│ │     │   └── [email protected] 
│ │     ├── [email protected] 
│ │     └── [email protected] 
│ └── [email protected] 
├─┬ [email protected] 
│ ├─┬ [email protected] 
│ │ ├─┬ [email protected] 
│ │ │ └── [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ ├── [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ ├── [email protected] 
│ ├─┬ [email protected] 
│ │ ├── [email protected] 
│ │ ├── [email protected] 
│ │ ├── [email protected] 
│ │ └── [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├─┬ [email protected] 
│ │ └── [email protected] 
│ ├── [email protected] 
│ └── [email protected] 
└─┬ [email protected] 
  ├── [email protected] 
  ├─┬ [email protected] 
  │ ├── [email protected] 
  │ └─┬ [email protected] 
  │   ├── [email protected] 
  │   └── [email protected] 
  └─┬ [email protected] 
    ├── [email protected] 
    ├── [email protected] 
    ├── [email protected] 
    └── [email protected] 

npm WARN [email protected] license should be a valid SPDX license expression

I'm running the mongodb of Debian Jessie installed via system packages (version 1:2.4.10-5+deb8u1) and can connect with the credentials (I also put them into the configuration.json of the service) just fine:

$ mongo -u XXX -p XXX localhost:27017/wfrepo
MongoDB shell version: 2.4.10
connecting to: localhost:27017/wfrepo
> show collections
c_segments
daily_streams
system.indexes
system.users
> 
bye

The mongodb log doesn't show anything that hints at a problem (for the npm test call), especially not with credentials/authentication:

Thu Aug 10 15:50:12.163 [initandlisten] connection accepted from 127.0.0.1:50246 #2 (1 connection now open)
Thu Aug 10 15:50:12.190 [conn2] end connection 127.0.0.1:50246 (0 connections now open)
Thu Aug 10 15:50:12.198 [initandlisten] connection accepted from 127.0.0.1:50247 #3 (1 connection now open)
Thu Aug 10 15:50:12.719 [conn3] end connection 127.0.0.1:50247 (0 connections now open)

Any ideas what could be the problem or how to debug this?

CC @jwassermann

ENH: Add query parameter nodata ...

It is somewhat unexpected that this service does not provide the query parameter nodata. In some situations it would be really helpful.

I am therefore opening this issue to suggest its addition.
The behaviour should be analogous to, and compliant with, the fdsnws services.
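
For reference, the FDSN web services accept nodata=204 or nodata=404 to choose the status code returned when a query matches no data. A minimal Python sketch of that behaviour could look like this; the function name `resolve_nodata_status` and the default of 204 are assumptions for illustration and are not taken from the WFCatalog service code.

```python
def resolve_nodata_status(params):
    """Return the HTTP status code to use when a query matches no data.

    `params` is a dict of raw query parameters (hypothetical shape).
    """
    raw = params.get("nodata", "204")  # assumed default, as in fdsnws
    if raw not in ("204", "404"):
        raise ValueError("nodata must be 204 or 404, got %r" % raw)
    return int(raw)


print(resolve_nodata_status({"nodata": "404"}))
```

Any other value is rejected so that the caller can answer with a 400, which is how the other parameter errors in this tracker are reported.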

WFCatalog metadata dependency

Currently, WFCatalog does not depend on station metadata: it calculates metrics for acquired data even if some channels are not defined in StationXML. In those cases users can retrieve the metrics, but cannot download the data itself via the FDSNWS-Dataselect web service, which strongly depends on metadata.

Possible solutions:

  1. Exclude channels that are not defined in StationXML from WFCatalog collector processing
  2. Calculate the metrics anyway, but apply filtering on the web service side
  3. Cross-check on metadata in the downstream product (apparently the original approach)
  4. Add a metadata query parameter with default value true in the WFCatalog implementation which would still allow retrieval of all available metrics
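
Options 2 and 4 could be combined roughly as in the following Python sketch. It assumes the set of channels defined in StationXML is already available as `known_channels`; that name, the function name and the way the `metadata` flag is passed are all hypothetical. The document keys (network, station, location, channel) are the ones visible in the service's JSON output.

```python
def filter_by_metadata(metric_docs, known_channels, metadata=True):
    """Keep only metric documents whose channel is defined in station metadata.

    known_channels: set of (network, station, location, channel) tuples,
    assumed to be extracted from StationXML elsewhere.
    metadata: when False, skip the filtering and return every document.
    """
    if not metadata:
        return list(metric_docs)
    kept = []
    for doc in metric_docs:
        key = (doc["network"], doc["station"], doc["location"], doc["channel"])
        if key in known_channels:
            kept.append(doc)
    return kept
```

With `metadata=False` all available metrics are still returned, which matches the intent of option 4.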

Webservice response and JSON schema

This issue is a follow-up in response to an email from @Jollyfant:

Hi guys,
I don't think we ever changed the schema so it's the most recent version, although it has bugs because the output does not validate on the schema, apparently!


In the JSON schema we need to change for all enums, type: "string", and enum as a separate object. Maybe an older version of JSON schema did it this way.


The schema should work for the option include=all (all metrics must be returned) and for a single document in the returned array. But the webservice has some additional issues pulling these fields from the DB:


  1.  nsamp (service) is called nsam (collector) in the database @ line 1037
  2.  long_record_read is not added to the client output @ line 1066
  3.  event in progress is not added to the client output @ line 1084

and all three are therefore not returned. So these could be fixed in the service.

Hope it helps fixing the bugs!

Best,
Mathijs

Thanks @Jollyfant for your input.
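
The first point in the email (the service asks for nsamp while the collector stores nsam) could be tolerated with a lookup that tries both spellings. The sketch below is in Python for illustration only, since the service itself is JavaScript; the helper name `first_present` and the sample value are made up.

```python
def first_present(doc, *keys, default=None):
    """Return the value of the first key found in doc, else default."""
    for key in keys:
        if key in doc:
            return doc[key]
    return default


# A stored document using the collector's spelling (the value is invented).
stored = {"nsam": 3456000}
num_samples = first_present(stored, "nsamp", "nsam")
print(num_samples)
```

Renaming the key in one place would be the cleaner fix; the fallback only helps while documents with the old spelling are still in the database.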

Webserver crash when `include=all`

When a request includes the parameter include=all, the server responds with a 502.

Logs from server:

isValidRegex: Unknown type for key include
/app/server.js:1058
          short_record_read: doc["io_flags"]["srr"],
                                            ^

TypeError: Cannot read properties of undefined (reading 'srr')
    at setClientKeys (/app/server.js:1058:45)
    at processDailyStream (/app/server.js:720:25)
    at /app/node_modules/mongodb/lib/utils.js:349:28
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

As a matter of fact, I don't have any document with short_record_read.

From the collector code, I can see https://github.com/EIDA/wfcatalog/blob/master/collector/WFCatalogCollector.py#L1086

I'm not sure why this information is not in my wfcatalog database. Any hints?
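
Independently of why the field is missing, the traceback shows that the crash comes from indexing into io_flags when that object is absent from the document. A guarded lookup avoids this. The sketch is written in Python for illustration (the service is JavaScript), and the helper name `get_io_flag` is hypothetical.

```python
def get_io_flag(doc, flag):
    """Return doc["io_flags"][flag], or None when io_flags is missing or empty."""
    io_flags = doc.get("io_flags") or {}
    return io_flags.get(flag)


# A document without io_flags no longer raises.
print(get_io_flag({}, "srr"))
```

Whether a missing flag should be returned as null or left out of the response is a separate design decision that this sketch does not settle.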

multiple collectors in parallel (same db)

This is kind of a dumb question, since after all it is a database, but just to be sure: is it safe to run multiple collector jobs in parallel (running on different directories), writing to the same MongoDB?

mongodb docker setup missing

@Jollyfant, any reason why the MongoDB dockerfile / docker setup is not included here?

What is your experience with using the docker setup for WFCatalog in general? Is it advisable?

WFCollector: consistency after delete and update operations

We are very interested in keeping our wfcatalog in sync with the files present in the archive. In particular, we were looking into removing and updating documents after files are removed, which occasionally happens due to data curation. So we were positively surprised to see that a delete operation was recently added to the WFCollector.

However, after some code auditing I suspect that the logic of these operations might be flawed and would not ensure consistency between the waveform archive and wfcatalog. I might be wrong, and this may just be my lack of understanding of the details.

In particular the delete operation seems not to update all potentially affected documents:

For the update operation the effect seems to be somewhat minor:

I understand that, especially for high sampling rates, the effect of these details might be minor, but for low rates it will have an important impact.

Am I missing something?

collector memory problems

When I run the collector on a large number of files (e.g. 120k, some with high sampling rates like 2000 Hz for a full day with up to 400-500 MB), it fills memory and swap, and once both are full the script gets killed (presumably by the OS itself, since it only shows "Killed" on the command line and no pythonic MemoryError).

POST request: floating point issue

Hi all,

Sending POST requests to a WFCatalog webservice instance (e.g. www.orfeus-eu.org/eidaws/wfcatalog/1/query) results in an ERROR 400. The issue seems to affect all parameters specified as xs:float in the application.wadl file. Below is an example for the max_gap_gt parameter:

The request was set up as described at http://www.orfeus-eu.org/data/eida/webservices/wfcatalog/. The content of the postfile is:

$ cat wfcatalog.request 
max_gap_gt=5.0
NL HGN * * 2017-01-01 2017-01-07
$ wget -O - -v --post-file wfcatalog.request http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
--2017-11-01 13:51:15--  http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
Resolving www.orfeus-eu.org (www.orfeus-eu.org)... 145.23.3.20
Connecting to www.orfeus-eu.org (www.orfeus-eu.org)|145.23.3.20|:80... connected.
HTTP request sent, awaiting response... 400 The submitted POST body is invalid
2017-11-01 13:51:15 ERROR 400: The submitted POST body is invalid.

The request succeeds when the decimal part is omitted from the max_gap_gt parameter, i.e. max_gap_gt=5:

$ wget -O - -v --post-file wfcatalog.request http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
--2017-11-01 14:18:23--  http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
Resolving www.orfeus-eu.org (www.orfeus-eu.org)... 145.23.3.20
Connecting to www.orfeus-eu.org (www.orfeus-eu.org)|145.23.3.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘STDOUT’

-                       [<=>                 ]       0  --.-KB/s               [{"version":"1.0.0","producer":{"name":"ORFEUS ODC/KNMI","agent":"ObsPy mSEED-QC","created":"2017-01-07T05:02:08.192Z"},"station":"HGN","network":"NL","location":"02","channel":"BHZ","num_gaps":2,"num_overlaps":2,"sum_gaps":7.850001000000001,"sum_overlaps":0.05,"max_gap":5.125039,"max_overlap":0.025,"record_length":[512],"sample_rate":[40],"percent_availability":99.99091435069445,"encoding":["STEIM2"],"num_records":8695,"start_time":"2017-01-06T00:00:00.000Z","end_time":"2017--                       [ <=>                ]     537  --.-KB/s    in 0s      

2017-11-01 14:18:23 (6.37 MB/s) - written to stdout [537]

However, this behaviour seems to contradict the specification, i.e. https://github.com/EIDA/wfcatalog/blob/master/wf_metadata_schema.json.

I can exclude an encoding issue since for a postfile with

$ cat wfcatalog.request 
max_gap_gt=5
NL HGN * * 2017-01-01 2017-01-07T00:00:00.000

it seems to work properly:

$ wget -O - -v --post-file wfcatalog.request http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
--2017-11-01 14:10:10--  http://www.orfeus-eu.org/eidaws/wfcatalog/1/query
Resolving www.orfeus-eu.org (www.orfeus-eu.org)... 145.23.3.20
Connecting to www.orfeus-eu.org (www.orfeus-eu.org)|145.23.3.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘STDOUT’

-                       [<=>                 ]       0  --.-KB/s               [{"version":"1.0.0","producer":{"name":"ORFEUS ODC/KNMI","agent":"ObsPy mSEED-QC","created":"2017-01-07T05:02:08.192Z"},"station":"HGN","network":"NL","location":"02","channel":"BHZ","num_gaps":2,"num_overlaps":2,"sum_gaps":7.850001000000001,"sum_overlaps":0.05,"max_gap":5.125039,"max_overlap":0.025,"record_length":[512],"sample_rate":[40],"percent_availability":99.99091435069445,"encoding":["STEIM2"],"num_records":8695,"start_time":"2017-01-06T00:00:00.000Z","end_time":"2017--                       [ <=>                ]     537  --.-KB/s    in 0s      

2017-11-01 14:10:10 (6.10 MB/s) - written to stdout [537]

GET requests are resolved successfully. The issue is also reproducible with curl.

cheers
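
For comparison, here is a minimal Python sketch of a POST body parser that treats max_gap_gt=5 and max_gap_gt=5.0 identically. The server-side parser is not shown in this report, so this is not its implementation; the function name `parse_post_body` and the `FLOAT_PARAMS` set are assumptions, and only the body layout (parameter lines followed by six-field stream lines) is taken from the postfile above.

```python
# Hypothetical subset of the parameters declared as xs:float in the WADL.
FLOAT_PARAMS = {"max_gap_gt"}


def parse_post_body(body):
    """Split a POST body into (params, streams).

    Lines containing '=' are key=value parameters; every other non-empty
    line must have six fields: NET STA LOC CHA START END.
    """
    params, streams = {}, []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            # float() accepts both "5" and "5.0"
            params[key] = float(value) if key in FLOAT_PARAMS else value
        else:
            fields = line.split()
            if len(fields) != 6:
                raise ValueError("invalid stream line: %r" % line)
            streams.append(tuple(fields))
    return params, streams


body = "max_gap_gt=5.0\nNL HGN * * 2017-01-01 2017-01-07\n"
print(parse_post_body(body))
```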

Enabling HTTPS for wfcatalog?

I initially thought that enabling HTTPS for the service was only a matter of server configuration. Can anyone provide insight into what needs to be changed to allow HTTPS for wfcatalog? I didn't make any changes to FDSN STATION or DATASELECT after enabling HTTPS, and both of those work, but wfcatalog (availability, metrics) does not.

GUI availability: multiple entries for temp. networks

Here I would like to report an observation regarding the GUI for availability as deployed at Orfeus/ODC:

This concerns the drop-down menu for the network. There can be multiple entries for the same two-letter network code in the case of temporary networks.

This is very probably because the two-character network codes are reused for temporary networks, so the code by itself is an incomplete key. It requires the combination (start year/net code) to form a complete key. It is also not clear to the user which temporary network or experiment they are actually selecting.

I would therefore suggest displaying the combination "year/net" instead, e.g.

  • 2016/3A
  • 2011/4C
  • 2015/Z3
    (We find the year more significant in the case of temporary networks)
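
Building such labels is straightforward once the start year is available next to the code. The Python sketch below assumes the networks arrive as (start_year, code) pairs; that input shape and the function name are hypothetical, and the example values are the ones listed above.

```python
def network_labels(networks):
    """Return unique "year/net" labels, ordered by network code, then year.

    networks: iterable of (start_year, code) tuples.
    """
    unique = sorted(set(networks), key=lambda item: (item[1], item[0]))
    return ["%d/%s" % (year, code) for year, code in unique]


print(network_labels([(2016, "3A"), (2011, "4C"), (2015, "Z3"), (2016, "3A")]))
```

Because the key is the pair rather than the code alone, two temporary networks that reuse the same code in different years end up as two distinct, distinguishable entries.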

service: fails to decode URL encoding correctly

Apparently the service does not decode URL encoded escape sequences (starting with %) correctly.

Here is some evidence.

This one works:

This one should be equivalent, but fails:

Error 400: Bad Request
The submitted query string is invalid
Usage details are available from /documentation/
Request:
/wfcatalog/1/query?starttime=2018-01-01T00%3A00%3A00.000&endtime=2018-12-31T00%3A00%3A00.000
Request Submitted:
Mon Oct 01 2018 21:16:22 GMT+0000 (UTC)
Service Version:
1.0.0

This is particularly relevant because some frameworks escape characters with special meaning.
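
For comparison, this is how the two forms behave with Python's standard library, which percent-decodes values while parsing. The encoded string is the query from the failing request above; the unescaped form is an assumption about what the working request looked like, since it is not shown here. The helper name `decode_query` is made up.

```python
from urllib.parse import parse_qs


def decode_query(query_string):
    """Percent-decode a query string into a dict of single values."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    # Keep the last value if a key is repeated.
    return {key: values[-1] for key, values in parsed.items()}


# Query string from the failing request in this report.
encoded = "starttime=2018-01-01T00%3A00%3A00.000&endtime=2018-12-31T00%3A00%3A00.000"
# Assumed unescaped equivalent (the working request is not shown above).
plain = "starttime=2018-01-01T00:00:00.000&endtime=2018-12-31T00:00:00.000"

print(decode_query(encoded) == decode_query(plain))
```

Both forms decode to the same parameters, which is the equivalence the service would be expected to honour before validating the values.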

Empty location identifier

Hi,

AFAIK the webservice should accept location=-- for blank location identifiers.

However, for HTTP GET requests eidaws-wfcatalog returns HTTP response 400 when location=-- is used as a query filter parameter:

$ curl -v -o - "http://www.orfeus-eu.org/eidaws/wfcatalog/1/query?csegmnetwork=NL&loc=--&station=HGN&channel=BHZ&starttime=2003-06-08&end=2003-06-09"
*   Trying 145.23.3.20:80...
* TCP_NODELAY set
* Connected to www.orfeus-eu.org (145.23.3.20) port 80 (#0)
> GET /eidaws/wfcatalog/1/query?csegmnetwork=NL&loc=--&station=HGN&channel=BHZ&starttime=2003-06-08&end=2003-06-09 HTTP/1.1
> Host: www.orfeus-eu.org
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 The submitted query string contains an unknown key: csegmnetwork
< Date: Mon, 06 Apr 2020 15:02:12 GMT
< Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With
< Content-Type: text/plain; charset=utf-8
< Content-Length: 335
< ETag: W/"14f-pZbj0Hw3xzaMTvya/Y3RLw"
< Connection: close
< 
Error 400: Bad Request
The submitted query string contains an unknown key: csegmnetwork
Usage details are available from ENTER_URI_HERE
Request:
/eidaws/wfcatalog/1/query?csegmnetwork=NL&loc=--&station=HGN&channel=BHZ&starttime=2003-06-08&end=2003-06-09
Request Submitted:
Mon Apr 06 2020 15:02:12 GMT+0000 (UTC)
Service Version:

Note also that no Service Version is specified.

When requesting the same chunk of data without explicitly passing the location parameter, data is returned:

$ curl -v -o - "http://www.orfeus-eu.org/eidaws/wfcatalog/1/query?network=NL&station=HGN&channel=BHZ&starttime=2003-06-08&end=2003-06-09"
*   Trying 145.23.3.20:80...
* TCP_NODELAY set
* Connected to www.orfeus-eu.org (145.23.3.20) port 80 (#0)
> GET /eidaws/wfcatalog/1/query?network=NL&station=HGN&channel=BHZ&starttime=2003-06-08&end=2003-06-09 HTTP/1.1
> Host: www.orfeus-eu.org
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Mon, 06 Apr 2020 15:03:27 GMT
< Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With
< Content-Type: application/json
< Content-Disposition: attachment; filename=ODC-WFCATALOG-2020-04-06T15:03:27.458Z.json
< Transfer-Encoding: chunked
< 
* Connection #0 to host www.orfeus-eu.org left intact
[{"version":"1.0.0","producer":{"name":"ORFEUS ODC/KNMI","agent":"ObsPy mSEED-QC","created":"2016-06-14T15:30:45.045Z"},"station":"HGN","network":"NL","location":"00","channel":"BHZ","num_gaps":1,"num_overlaps":0,"sum_gaps":0.0184,"sum_overlaps":0,"max_gap":0.0184,"max_overlap":null,"record_length":[4096],"sample_rate":[40],"percent_availability":99.9999787037037,"encoding":["STEIM2"],"num_records":737,"start_time":"2003-06-08T00:00:00.000Z","end_time":"2003-06-09T00:00:00.000Z","format":"miniSEED","quality":"D"}]

The behaviour is different (i.e. the service returns HTTP status code 204) for HTTP POST requests (requesting the same chunk of data):

$ cat postfile
NL HGN -- BHZ 2003-06-08 2003-06-09
$ curl -v -o - --data-binary @postfile --header "Content-Type:text/plain" "http://www.orfeus-eu.org/eidaws/wfcatalog/1/query"
*   Trying 145.23.3.20:80...
* TCP_NODELAY set
* Connected to www.orfeus-eu.org (145.23.3.20) port 80 (#0)
> POST /eidaws/wfcatalog/1/query HTTP/1.1
> Host: www.orfeus-eu.org
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Type:text/plain
> Content-Length: 36
> 
* upload completely sent off: 36 out of 36 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 204 No Content
< Date: Mon, 06 Apr 2020 15:09:07 GMT
< Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With
< Content-Type: application/json
< Content-Disposition: attachment; filename=ODC-WFCATALOG-2020-04-06T15:09:07.238Z.json
< 
* Connection #0 to host www.orfeus-eu.org left intact

However, using the wildcard character * returns a valid response (consistent with the behaviour of an HTTP GET request):

$ curl -v -o - --data-binary @postfile --header "Content-Type:text/plain" "http://www.orfeus-eu.org/eidaws/wfcatalog/1/query"
*   Trying 145.23.3.20:80...
* TCP_NODELAY set
* Connected to www.orfeus-eu.org (145.23.3.20) port 80 (#0)
> POST /eidaws/wfcatalog/1/query HTTP/1.1
> Host: www.orfeus-eu.org
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Type:text/plain
> Content-Length: 35
> 
* upload completely sent off: 35 out of 35 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Mon, 06 Apr 2020 15:13:13 GMT
< Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With
< Content-Type: application/json
< Content-Disposition: attachment; filename=ODC-WFCATALOG-2020-04-06T15:13:13.109Z.json
< Transfer-Encoding: chunked
< 
* Connection #0 to host www.orfeus-eu.org left intact
[{"version":"1.0.0","producer":{"name":"ORFEUS ODC/KNMI","agent":"ObsPy mSEED-QC","created":"2016-06-14T15:30:45.045Z"},"station":"HGN","network":"NL","location":"00","channel":"BHZ","num_gaps":1,"num_overlaps":0,"sum_gaps":0.0184,"sum_overlaps":0,"max_gap":0.0184,"max_overlap":null,"record_length":[4096],"sample_rate":[40],"percent_availability":99.9999787037037,"encoding":["STEIM2"],"num_records":737,"start_time":"2003-06-08T00:00:00.000Z","end_time":"2003-06-09T00:00:00.000Z","format":"miniSEED","quality":"D"}]
  • The JSON schema is not very specific on which parameters are allowed.
  • Also, the definition here does not allow passing --.
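
In the FDSN web service convention, -- stands for a blank location identifier. A minimal Python sketch of normalising the parameter before it is used in a query follows; the function name is hypothetical, and the handling of comma-separated lists is an assumption carried over from fdsnws rather than something confirmed for WFCatalog.

```python
def normalize_locations(value):
    """Map the '--' placeholder to an empty location code.

    Accepts a single code or a comma-separated list and returns a list.
    """
    return ["" if code == "--" else code for code in value.split(",")]


print(normalize_locations("--,00"))
```

Applying the same normalisation to both the GET parameter and the location column of POST lines would make the two request types behave consistently.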

Declaration of metrics parameters with comparison extensions in the wadl

The current version of the wadl (https://github.com/EIDA/wfcatalog/blob/master/service/application.wadl) declares the plain metrics parameters, but not their corresponding parameters with comparison suffixes: ..._eq, ..._ne, ..._gt, ..._ge, ..._lt, ..._le. While this may be good enough for a human interpreter aware of the service specification document (http://www.orfeus-eu.org/documents/WFCatalog_Specification-v0.22.pdf, especially page 8), an automatic routine generating a client stub based on the WADL standard would fail to catch this part of the service's functionality.
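
The full set of parameter names that the wadl would need to declare can be generated mechanically from the plain metric names. A small Python sketch, with a hypothetical function name and max_gap (the base of max_gap_gt, which the service does accept) as the example metric:

```python
# Comparison suffixes listed in this issue.
SUFFIXES = ("eq", "ne", "gt", "ge", "lt", "le")


def expand_metric_params(metrics):
    """Return each plain metric name followed by its suffixed variants."""
    names = []
    for metric in metrics:
        names.append(metric)
        names.extend("%s_%s" % (metric, suffix) for suffix in SUFFIXES)
    return names


print(expand_metric_params(["max_gap"]))
```

A list like this could feed a template that writes the parameter declarations, so the wadl and the service stay in step.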

WFCatalog collector ERROR processing some files

The WFCatalog collector fails to collect daily metadata for some files, such as the attached one. It reports the following error:

2023-05-04 12:23:41.969 [ERROR] WFCatalog Collector: Could not get daily metadata for CA.CFON..HHE.D.2023.123

CA.CFON..HHE.D.2023.123.zip

Even though the file is quite fragmented, I would expect the collector to be able to extract metadata from it.

Internal API

Pierre suggested that having an internal API for the WFCatalog may be useful for the data center. These are functions not accessible to the public (e.g. getting file pointers).

pymongo deprecation warning

Running WFCatalogCollector.py with the latest pymongo release produces the following message:

./wfcatalog/collector/WFCatalogCollector.py:609: DeprecationWarning: count is deprecated. Use Collection.count_documents instead.
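
The warning names its own replacement. A minimal sketch of the change follows, assuming `collection` is a pymongo Collection; the helper name `count_matching` is hypothetical, and the actual query used at line 609 is not shown here.

```python
def count_matching(collection, query=None):
    """Count documents matching query using the non-deprecated pymongo call.

    Unlike the old count(), count_documents() requires a filter argument;
    an empty dict matches every document.
    """
    return collection.count_documents(query or {})
```

A call such as `collection.find(query).count()` or `collection.count(query)` would then become `count_matching(collection, query)`.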
