dmwm / das
Data Aggregation System
DAS currently uses .ini style configuration files (das.cfg). For CMSWEB deployment we would strongly prefer python configuration files as they provide much greater ability to make the configuration location and user independent. This should also make development easier since the same configuration can be used unchanged.
For an example of the location independence python enables, please see the DQM GUI 'devtest' configuration, which works out of the box for any user on any computer system - P5, CERN GPN (lxplus, lxbuild), desktops, laptops, and outside CERN.
https://twiki.cern.ch/twiki/bin/view/CMS/DQMTest#Specific_details
http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/DQM/Integration/config/
See specifically the use of BASEDIR and CONFIGDIR to achieve relocation. You can also see other, more complex host-specific adaptation for online in:
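A minimal sketch of the BASEDIR/CONFIGDIR relocation pattern, assuming (hypothetically) that all deployment paths can be derived from the config file's own location so the same file works unchanged for any user on any host; the directory names below are illustrative, not the actual DQM GUI layout:

```python
import os

def das_dirs(config_file):
    """Derive all deployment paths from the config file's location,
    making the configuration location- and user-independent."""
    configdir = os.path.dirname(os.path.abspath(config_file))
    basedir = os.path.normpath(os.path.join(configdir, '..'))
    return {
        'basedir':  basedir,
        'logdir':   os.path.join(basedir, 'logs'),
        'statedir': os.path.join(basedir, 'state'),
    }
```

In a real config one would pass `__file__`, so checking the file out under /data/projects/das or under a user's home directory produces consistent, relocated paths with no edits.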
Add a new method to abstract_service to respect the DASJSON header. The new tier0 service is already DAS compliant: it ships data with a DASJSON header which contains the results as well as an expire timestamp. I need to parse this info correctly.
Return HTTP 503 error when MongoDB is down. DAS server should stay alive.
Currently, queries are a raw python dictionary. I propose that this should be replaced by a wrapper class, with the following rationale:
Create a framework of unit test per data-service to test only data-service specific queries.
Weight queryspammer distributions in some quasi-real manner so that if it is used to hammer the cache it should trigger analytics appropriately.
To avoid creation of parsertab.py in the DAS install area I need to make its location a configurable parameter. This will ease the issue on cmsweb and allow it to live in the /data/projects/das area instead of the DAS source code area.
All APIs use a 3600 sec expiration timestamp (fine for testing) which needs to be adjusted to real-case scenarios. I think DBS/phedex should have 10-15 minutes, SiteDB around 1 hour, etc.
Follow up from #290.
Regarding open connections, they are connected sockets, i.e. sockets between DAS and MongoDB. Just ssh to cmsweb@… and run netstat -tanlp | grep ESTABLISHED | grep 27017 to see them. We currently have: {{{
$ netstat -tanlp | grep ESTABLISHED | grep 27017 | awk '{print $NF}' | sort | uniq -c
212 4500/mongod
138 4860/python
74 4875/python
}}}
Why there are so many I can't answer. Maybe every DAS thread creates some number of connections? Note that half of the sockets are on the python side and the other half on the mongod side, as shown above.
Instead of using the YUI hosted by Yahoo, use a local YUI installation.
Review note on DAS: general comment, would find the "if not isinstance(x, dict)" style more readable than the "if type(x) is not types.DictType" style.
Better test the existing analytics tasks and add some new ones.
Fix the analytics web so that there is a
The YML file which defines the DAS schema should reside in SITECONF/T1_CH_CERN/DAS. This will simplify maintenance of DAS on the cmsweb cluster.
DAS must support aggregation of information. Since the DAS cache server utilizes a REST model this can be done as a series of steps:
Contact Dirk/Stephen and request permanent tier0 data-service URL for DAS.
Currently I only report stats on the init, sub-system call, and merge steps. I want to divide the sub-system stat into URL fetch time and actual DAS sub-system processing time. This can be accomplished by making a singleton DASTimer class instance and using it everywhere to collect various stats.
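A minimal sketch of such a singleton timer, assuming a simple accumulate-by-step-name API (the class name DASTimer comes from the ticket; the methods and fields are hypothetical):

```python
import time

class DASTimer(object):
    """Singleton timer collecting named stats (e.g. 'url_fetch',
    'subsystem_processing') from anywhere in the DAS code base."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(DASTimer, cls).__new__(cls)
            cls._instance.stats = {}  # step name -> accumulated seconds
        return cls._instance

    def record(self, name, start, end=None):
        """Accumulate elapsed wall-clock time under the given step name."""
        end = time.time() if end is None else end
        self.stats[name] = self.stats.get(name, 0.0) + (end - start)
```

Because every call site constructs `DASTimer()` and gets the same instance, the URL fetch code and the sub-system processing code can record into one shared stats dict without passing the timer around.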
It is possible that DAS will receive a doc whose size exceeds the MongoDB limit (4MB by default). In that case the bulk insert will fail for all docs in the insert sequence (due to generators). To avoid that I need a new generator routine whose purpose is to scan each doc and pass it through if its size is < 4MB, or put it into GridFS otherwise.
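A sketch of such a filtering generator, using JSON length as a rough stand-in for BSON size and taking the GridFS writer as a callable (both the function name and the callback interface are illustrative assumptions):

```python
import json

MAX_DOC_SIZE = 4 * 1024 * 1024  # MongoDB default document size limit

def size_filter(docs, gridfs_put, limit=MAX_DOC_SIZE):
    """Yield docs small enough for a bulk insert; divert oversized
    docs to GridFS via the supplied gridfs_put callable, so one big
    doc cannot fail the whole generator-driven bulk insert."""
    for doc in docs:
        if len(json.dumps(doc)) < limit:
            yield doc
        else:
            gridfs_put(doc)  # store the oversized doc out of band
```

The bulk insert then consumes `size_filter(docs, fs.put)` instead of `docs`, so the insert sequence only ever sees documents under the limit.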
Review note on DAS: overview plotfairy version, session arguments are unnecessary and can be omitted.
Comment 23 follow-up: I guess it wasn't clear enough, but "session arguments" meant "session" and "version". All you need is the actual data arguments. Also would prefer they were deleted, not just commented out.
Experts' DNs need to be added to MongoDB in order to allow access to the Expert page. Need to create a doc for the CERN operator to perform this action.
Explore mongo replication. I can have two nodes: one used as a raw cache for user on-demand queries, while the other can be used by the populator to replicate data from data-services.
Explore mongo sharding, where we define a sharding key, e.g. block.
Write some proceedings for DAS @ CHEP 2010
Migrate DAS web server to WMCore.WebTools based.
Thank you for adding checkargs to verify parameters. It has a few flaws I'd like to see fixed:
You don't use what you verify. Some arguments are cast to strings (str(x)) before checking. You should instead verify what you will actually use.
You should type check all arguments for reasons above. A keyword argument can be None (not given), a string (given once), or a list (if given several times).
Contents of many, but not all arguments are checked. I didn't see any additional checking added for remaining arguments elsewhere so it looks like several vulnerabilities remain. You should always sanitise all arguments. Even if the argument is free form input, you can often make sure it only consists of certain legitimate characters (e.g. letters only).
Failure to verify arguments should raise an exception.
Failure to check an argument should not return the argument value back to caller. This is unsafe; you don't know what the value contains, and you just determined it's not valid. Returning the value to caller can be used to create XSS and other attacks. My general preference is to never return anything to the caller - you simply return suitable HTTP status code.
It does not sanitise the HTTP method; note that the 'method' keyword argument is not the same as the request method!
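The review points above can be sketched as a whitelist-based validator: every argument is type-checked, matched against a per-argument pattern, and failures raise an exception rather than echoing the tainted value back. The argument names and patterns below are hypothetical, not the real DAS checkargs:

```python
import re

# Hypothetical whitelist: each supported argument gets a validator regex.
ARG_PATTERNS = {
    'query': re.compile(r'^[a-zA-Z0-9_.=/* -]+$'),
    'idx':   re.compile(r'^[0-9]+$'),
    'limit': re.compile(r'^[0-9]+$'),
}

def checkargs(kwargs):
    """Validate every keyword argument; raise on anything unexpected.

    Nothing is ever returned to the caller on failure, only an
    exception the web layer should map to an HTTP status code."""
    for key, val in kwargs.items():
        if key not in ARG_PATTERNS:
            raise ValueError('unsupported argument: %s' % key)
        if not isinstance(val, str):
            # None (not given) or a list (given several times) are rejected
            raise TypeError('argument %s must be given exactly once' % key)
        if not ARG_PATTERNS[key].match(val):
            raise ValueError('invalid content in argument: %s' % key)
```

The key design point is that the validator inspects the value as it will actually be used (no str() casting first), and the raised exception carries only the argument name, never its content.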
Upon Lassi's/L2's suggestion, will remove the doc part of the DAS web server.
Code clean-up.
Review note on DAS bin directory: start-up scripts should be folded into manage. We very much prefer to see everything inlined directly into the manage script without several layers of indirection, for simplicity, comprehension and transparency.
Read existing monitoring.ini files in SITECONF/T1_CH_CERN/DAS and improve them as necessary
Oli wants to have custom views in DAS to get his data:
''Essentially the sum of data for each T1 site for each combination of
acq era, tier, custodial/non-custodial.
''
I think it can be accomplished as a 2-step procedure in DAS.
DAS can be configured using either configparser or wmcore.configuration. The current config code has a few problems:
Provide a single layer performing validation/casting/defaults, which doesn't care whether it reads from an underlying configparser or wmcore config.
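Such a single layer could look like the sketch below: one option schema carrying defaults and casts, applied to a flat dict that either backend (configparser or WMCore config) has been reduced to. The option names and the `(section, name)` key shape are illustrative assumptions:

```python
# Hypothetical option schema: (section, name, default, cast).
DAS_OPTIONS = [
    ('web_server', 'port', 8212, int),
    ('mongodb', 'dburi', 'mongodb://localhost:27017', str),
    ('das', 'verbose', 0, int),
]

def validate_config(raw):
    """Build a validated config from raw options.

    raw maps (section, name) -> value, read beforehand from either a
    configparser or a WMCore configuration object; this layer does not
    care which. Missing options get defaults, present ones are cast."""
    cfg = {}
    for section, name, default, cast in DAS_OPTIONS:
        value = raw.get((section, name), default)
        cfg.setdefault(section, {})[name] = cast(value)
    return cfg
```

With this shape, each backend only needs a tiny adapter that flattens its native object into the `raw` dict; all validation, casting and defaulting lives in one place.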
Restore analyticsDB to using unique qhash records, with an array of hit times. Provide a workaround for the inability to $pull with conditions in mongodb<=1.6. Determine the interplay of capped collections and updates to existing objects instead of new objects.
Related, consider making sure all related documents for a given query are removed from analytics concurrently.
Change this block to use an external dir parameter:
{{{
+if [ `hostname -d` == "cern.ch" ]
}}}
When using certain aggregators, e.g. max, min, I should be able to show the record itself rather than just the min/max value of the asked field. For instance, if a user types
find block | max(block.size)
I should show not only the max block.size, but also a link to the record holding this value.
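A minimal sketch of resolving an aggregator back to its full record, assuming records are nested dicts addressed by a dotted DAS key (the function name is hypothetical):

```python
def max_record(records, key_path):
    """Return the whole record holding the maximum value of key_path
    (a dotted DAS key such as 'block.size'), not just the value."""
    def extract(rec):
        # walk the dotted path into the nested record
        for part in key_path.split('.'):
            rec = rec[part]
        return rec
    return max(records, key=extract)
```

The web layer can then render a link to `max_record(...)` itself while still displaying `extract(max_record(...))` as the aggregated value.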
RE-based PLY parsing is easier than writing our own ad-hoc parser but is quite expensive. Add a (capped?) mongodb collection to store the parsed versions of string queries, and intercept new queries appropriately.
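The interception step could be sketched as below: hash the raw query string, look it up in a store, and only fall back to the expensive PLY parse on a miss. The dict store here is a stand-in for the proposed capped MongoDB collection (which one would create with `db.create_collection(name, capped=True, size=...)`); the class and its interface are illustrative:

```python
import hashlib

class QueryCache(object):
    """Parse cache keyed by the md5 of the raw query string.

    Backed here by a plain dict for illustration; the intended backend
    is a capped MongoDB collection so old entries age out on their own."""
    def __init__(self, parser, store=None):
        self.parser = parser            # callable: raw string -> parsed query
        self.store = store if store is not None else {}

    def parse(self, query):
        qhash = hashlib.md5(query.encode('utf-8')).hexdigest()
        if qhash not in self.store:     # cache miss: run the PLY parser once
            self.store[qhash] = self.parser(query)
        return self.store[qhash]
```

Repeated identical query strings then cost one dict (or indexed collection) lookup instead of a full PLY parse.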
Done.
The %post section has been reviewed and cleaned up.
From #290
I didn't understand the addition of urllib quoting in, for example, das_table.tmpl. Shouldn't you use encodeURIComponent in javascript code / arguments, and urllib when quoting something originating from DAS server itself? To me it seems you are now sometimes quoting javascript itself, not the javascript variable value.
Also I note here that the quoting wasn't added universally everywhere - not in all templates, and not even systematically in the one example I happened to quote, das_table.tmpl. As I wrote before, it looks like every template needs to be sanitised. I can't easily tell which values are safe.
utils/das_config.py calls
{{{
from DAS.utils.das_cms_config import read_wmcore
}}}
while utils/das_cms_config.py calls
{{{
from DAS.utils.das_config import DAS_OPTIONS
}}}
This is a circular import; the remedy is to merge the two modules together.
Test DAS with DBS2/Phedex/RunRegistry/Tier0 to allow PVT tester to have a look at the service.
Eventually we will need to add DAS analytics to the DAS manage init script. I need to know how to start/stop the DAS analytics web server, how to check its status, etc. A basic skeleton of an init script would be useful.
We need a help section for the DAS web analytics server. It should describe the meaning of the sections, e.g. Main, Control. It should provide examples (a description plus a png image) of how to submit certain tasks, examples (png images) of what we should see when tasks are running, etc.
This will help train DAS operators.
Add the ability to learn and add new maps, or reload existing ones, from the output of a data-provider. For example, by learning about the keys in the output of some query I can record in DAS what this data-service is capable of providing. For instance, a user types
run=123
DAS queries RunSummary and gets output which contains L1Trigger. So DAS can learn from the output that RunSummary provides information about L1Trigger for the query run=123. If this info is captured, I can improve DAS input fields. For example, I can store associative keys, together with their data-service, in a separate collection. Those keys can be used as "helpers" for DAS input queries, so a user can type
l1 trigger
and DAS can reply: I know a data-service which provides this, and in order to get the l1 trigger you must supply your run number.
We can apply some word processing to allow different linguistic combinations.
This way DAS will gain knowledge of what each data-service can provide, which can improve search and enable suggestions.
genkey() does not necessarily produce identical output for functionally identical input, which given our reliance on qhash for finding records is a problem.
From python reference:
"CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."
Example:
genkey({'fields': None, 'spec': [{'key': u'dataset.name', 'value': u'"/TTbar_1jet_Et30-alpgen/Winter09_IDEAL_V12_FastSim_v1/GEN-SIM-DIGI-RECO"'}]})
'b255596fb3728afe13c5c078ad6f9105'
genkey({'fields': None, 'spec': [{'value': u'"/TTbar_1jet_Et30-alpgen/Winter09_IDEAL_V12_FastSim_v1/GEN-SIM-DIGI-RECO"', 'key': u'dataset.name'}]})
'2c7d1cfc1244e5367eefe70dfeeeb321'
Here we have only trivially transposed the order of the "key" and "value" arguments, but the result is a different hash value. This problem shows up in analytics where running QueryMaintainer from the command line works but spawning it through the server doesn't, as far as I can tell just because the dictionary construction order differs. This is not because of unicode-ness of strings (tested, using json.dumps deals with this).
I will try and modify the genkey function to produce consistent output, but this is probably performance sensitive.
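A minimal fix along the lines suggested, serialising with sorted keys before hashing so insertion history cannot change the digest (the function name matches the ticket; the exact implementation is illustrative, and as noted it should be benchmarked since genkey is performance sensitive):

```python
import hashlib
import json

def genkey(query):
    """Order-independent hash of a query dict.

    json.dumps with sort_keys=True produces one canonical string for
    functionally identical dicts, regardless of the order in which
    their keys were inserted."""
    rep = json.dumps(query, sort_keys=True)
    return hashlib.md5(rep.encode('utf-8')).hexdigest()
```

With this, the two transposed examples above would hash identically, and qhash lookups from the command line and from the server would agree.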
Right now, to get the total number of results I invoke count. Since I added empty records to protect access to services which do not return results, I should exclude them from the count of results for a given query. Should be trivial, e.g.
db.merge.find(spec).count()
where spec contains the query and the non-existence of 'das.empty_record'.
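The spec extension could be a small pure helper like this (the field name 'das.empty_record' comes from the ticket; the helper itself is hypothetical):

```python
def count_spec(spec):
    """Extend a query spec so placeholder empty records are excluded
    from the result count, e.g. db.merge.find(count_spec(spec)).count()."""
    spec = dict(spec)  # copy, don't mutate the caller's spec
    spec['das.empty_record'] = {'$exists': False}
    return spec
```

Keeping this in one helper means every count call site excludes the placeholders the same way.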
Clarify with the HTTP group whether I need to set a login/password for mongodb.
Investigate ways of optimising the transfer of large chunks of JSON, e.g. from a "dataset" query, whether by socket configuration or by streaming the decoding.
PLY is a better parser
This would add DAS to the browser, which would be sweet.
Some salient links:
http://au.alpha.yahoo.com/faqs/add-opensearch/index.html
https://developer.mozilla.org/en/Creating_OpenSearch_plugins_for_Firefox
http://www.opensearch.org/Specifications/OpenSearch/1.1
Right now the DAS services page shows all services which are registered in DAS. I need to show only those which are active (as defined by the DAS configuration file).