dmwm / das2go
Go implementation of Data Aggregation System (DAS) for CMS experiment
License: MIT License
Hi,
I think in the past it was possible to query datasets in "production" status, with something like dataset=/*/PhaseIITDRSpring19DR-*/GEN-SIM-DIGI-RAW status=*.
Unfortunately the status= part of the query does not seem to work any more.
Thank you,
Andrea
Running the command
file dataset=/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM run=1 lumi=123
(on the web interface) gives me a long error message saying things like
error=json: cannot unmarshal object into Go value of type []mongo.DASRecord
I am trying to find the miniAOD file that holds a specific event I have found in nanoAOD. For this I have looked up the value of the luminosityBlock of the event in nanoAOD. The value is 123 and the dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18NanoAODv5-Nano1June2019_102X_upgrade2018_realistic_v19-v1/NANOAODSIM.
DAS tells me the parent dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM. Thus, to find the file I'm looking for, I use the previously mentioned query.
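The "cannot unmarshal object into Go value of type []mongo.DASRecord" error typically means an upstream service returned a single JSON object where the client code expects an array. A minimal sketch (using a stand-in DASRecord type, not the actual mongo.DASRecord) reproduces the failure mode:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DASRecord is a stand-in for the mongo.DASRecord type used by das2go.
type DASRecord map[string]interface{}

// decodeRecords expects a JSON array of records, mirroring the failing code path.
func decodeRecords(data []byte) ([]DASRecord, error) {
	var records []DASRecord
	err := json.Unmarshal(data, &records)
	return records, err
}

func main() {
	// A JSON array decodes fine...
	if recs, err := decodeRecords([]byte(`[{"run": 1, "lumi": 123}]`)); err == nil {
		fmt.Println("array decoded,", len(recs), "record(s)")
	}
	// ...but a bare JSON object reproduces the reported error.
	if _, err := decodeRecords([]byte(`{"run": 1, "lumi": 123}`)); err != nil {
		fmt.Println("error:", err)
	}
}
```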
See https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/4170.html
The failed query is
file dataset=/MET/Run2017B-HighMET-17Nov2017-v1/RAW-RECO run=297292 lumi=63
When I search in a non-default DBS instance, the links generated on the page point to the global instance.
See here:
Now, a link in this page (clicking any dataset) looks like this:
without the instance=prod/phys03 flag
It should look like this:
Valentin, I noticed both pre-prod and prod DAS have a list of prod/* dbs instances only. DAS testbed used to provide access to the integration database. Can that feature be restored?
Hi,
I am using DAS to query the files within block:
/TTToSemiLeptonic_mtop166p5_TuneCP5_13TeV-powheg-pythia8/RunIISummer19UL17RECO-106X_mc2017_realistic_v6-v2/AODSIM#d3ae4ba2-2c38-40dc-8152-421e71ce4a21
I see the block contains 133 files. I can view the first 50 but I cannot find a way to view the next pages. If I click 'next' page, I see this message:
No results found DAS unable to find any results for your query. Please revisit your query by reviewing DAS query guide or submit a DAS github issue to resolve your query request.
I also tried changing the number of viewable results from 50 to 150, but this did not work, and I only see 50 results.
Most of the complaints from golint concern improper naming of functions, e.g. the L_dbs3_datasetlist API. In Go it is recommended not to use underscores in function names. However, these names are looked up via the reflect module, see
https://github.com/dmwm/das2go/blob/master/das/das.go#L341
I propose the following fix:
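For context, a minimal sketch of the reflect-based lookup (with a placeholder method body, mirroring the pattern in das/das.go) shows why the underscore names are load-bearing: a renamed method becomes invisible to every lookup string that still uses the old name.

```go
package main

import (
	"fmt"
	"reflect"
)

// LocalAPIs mirrors the das2go pattern where API methods are resolved by
// name at run time via reflect; the method body here is a placeholder.
type LocalAPIs struct{}

// L_dbs3_datasetlist keeps the underscore name golint complains about,
// because the string used for the reflect lookup must match it exactly.
func (LocalAPIs) L_dbs3_datasetlist() string { return "datasetlist results" }

// callByName resolves and calls an API method the way das/das.go does.
func callByName(api string) string {
	m := reflect.ValueOf(LocalAPIs{}).MethodByName(api)
	if !m.IsValid() {
		return "no such API: " + api
	}
	return m.Call(nil)[0].String()
}

func main() {
	fmt.Println(callByName("L_dbs3_datasetlist")) // resolved via reflect
	fmt.Println(callByName("Dbs3Datasetlist"))    // a renamed method is invisible under the old string
}
```

Any rename therefore has to update the DAS maps and lookup strings in lockstep with the Go method names.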
Hi everybody,
@Panos512 asked me to open this ticket as a "follow up" to these in CRIC and the Service Portal (LDAP). The context there is that a comma in the DN was causing trouble and it is fixed in CRIC already and will hopefully be fixed for LDAP.
I am not sure if this is also the problem in DAS:
I have a new certificate from a new provider (Issuer: C = NL, O = GEANT Vereniging, CN = GEANT eScience Personal CA 4) and the DN has a somewhat funny format:
$ openssl x509 -noout -text -in usercert.pem | grep Lange
Subject: DC = org, DC = terena, DC = tcs, C = DE, O = Universitaet Hamburg, CN = "Lange, Dr. Johannes <username>@uni-hamburg.de"
The CN is generated with information from an SSO of the home institution and in the case of Uni Hamburg unfortunately contains a comma (and is in quotes). All new certificates for our group members will be issued by GEANT from now on, because GridKa-CA will stop operation.
cmsweb does not complain when selecting the new certificate, but when I go on to DAS, I receive "Peer does not recognize and trust the CA that issued your certificate."
The situation is the same for http://cmsweb-testbed.cern.ch/ and http://cmsweb-prod.cern.ch/.
I am not sure if this is also caused by the comma in the DN or if this is a different problem.
Any help would be appreciated and I can provide more information, if needed!
Best,
Johannes
Dear all,
would it be possible to add back the link of a MC dataset to the McM request?
We got a lot of queries about that and we think it might be useful.
Thanks in advance.
The Pdmv group
The old das tool provides more useful information for locating where a typo is in a query. Compare the das and das2go webpages for this query with a space in the middle of the dataset name:
On the das webpage you can easily find the problem thanks to the -----------------------------------------------^ marker.
I got report from Felipe Gómez-Cortés who claimed that DAS web UI provides different results for the following query:
site dataset=/ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM
After a series of iterations I confirmed that this is the case using my dev environment. I identified that the problem is related to the inability of DAS to contact the cms-rucio service, yielding the following errors:
2021/06/05 11:17:28 fetch.go:486: ERROR: fail to fetch http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#3ce2d95e-7168-11e6-9fb1-002590494fb0/datasets, retries 3, error Get "http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#3ce2d95e-7168-11e6-9fb1-002590494fb0/datasets": dial tcp: lookup cms-rucio.cern.ch: no such host
2021/06/05 11:17:28 fetch.go:486: ERROR: fail to fetch http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#c541fd04-7198-11e6-9fb1-002590494fb0/datasets, retries 3, error Get "http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#c541fd04-7198-11e6-9fb1-002590494fb0/datasets": dial tcp: lookup cms-rucio.cern.ch: no such host
We need to identify the source of this issue. @ericvaandering any ideas?
When looking at the result of a dataset query, the information about the dataset below the dataset link is sometimes in a different order.
For example:
The query
Results in a view like:
whereas this query
results in:
These change with a refresh (hence the attached screenshots).
Other formatting errors appear on refresh:
In this version, you see spaces before and after the slashes in the dataset link.
I downloaded the latest version from GH, and here's what I get, which is the same as with the current version in CVMFS (v02.04.42).
If I query for run or lumi separately it is OK, but --query 'lumi,run' or --query 'run,lumi' fail. I have never seen this before. Could it be because of the odd dataset name, or just something I did wrong in that dataset? Or is this a problem in dasgoclient? If it is due to something I did in this test, no problem.
belforte@lxplus715/TestConfig> ./dasgoclient_amd64 --version
Build: git=v02.04.45 go=go1.17.7 date=2022-03-22 19:13:12.504084088 +0100 CET m=+0.001383466
belforte@lxplus715/TestConfig> ./dasgoclient_amd64 --query 'lumi,run dataset=/ThisIsATest/belforte-CrabAutoTest_userInputFiles-94ba0e06145abd65ccb1d21786dc7e1d/USER instance=prod/phys03'
panic: interface conversion: interface {} is json.Number, not []interface {}
goroutine 1 [running]:
github.com/dmwm/das2go/services.OrderByRunLumis({0xc000101c00, 0x46, 0x7c5b3f})
/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/helpers.go:316 +0x8ea
github.com/dmwm/das2go/services.fileRunLumi({{0xc000408090, 0x83}, {0x7ffffae0dfca, 0x7d}, {0xc000345f40, 0x20}, 0xc000402870, {0xc000369720, 0x2, 0x2}, ...}, ...)
/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/helpers.go:303 +0x349
github.com/dmwm/das2go/services.LocalAPIs.RunLumi4Dataset(...)
/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/dbs.go:167
reflect.Value.call({0x7c0140, 0xac37e8, 0xa93d40}, {0x7c59d7, 0x4}, {0xc0001d7108, 0x1, 0x40c34b})
/opt/hostedtoolcache/go/1.17.7/x64/src/reflect/value.go:556 +0x845
reflect.Value.Call({0x7c0140, 0xac37e8, 0xc000161998}, {0xc0001d7108, 0x1, 0x1})
/opt/hostedtoolcache/go/1.17.7/x64/src/reflect/value.go:339 +0xc5
main.processLocalApis({{0xc000408090, 0x83}, {0x7ffffae0dfca, 0x7d}, {0xc000345f40, 0x20}, 0xc000402870, {0xc000369720, 0x2, 0x2}, ...}, ...)
/home/runner/work/dasgoclient/dasgoclient/main.go:1001 +0x505
main.process({0x7ffffae0dfca, 0x7d}, 0x0, {0x842680, 0x1}, 0x0, {0x0, 0x0}, {0x7cd077, 0x16}, ...)
/home/runner/work/dasgoclient/dasgoclient/main.go:509 +0x130f
main.main()
/home/runner/work/dasgoclient/dasgoclient/main.go:155 +0x120d
belforte@lxplus715/TestConfig>
I'm using the following dasgoclient query to estimate the volume of legacy data and it works as expected for this MINIAOD query (for example)
$ dasgoclient -query "block dataset=/*/Run2017*09Aug2019_UL2017*/MINIAOD | sum(block.size)"
sum(block.size): 2.12740409597139e+14
(Curiously, the same query does not work on the DAS web UI, while it used to work earlier.)
However, for another query, this time for NANOAOD, the dasgoclient query returns no result even if the datasets exist:
$ dasgoclient -query "block dataset=/*/Run2017*UL2017_MiniAODv1_NanoAODv2*/NANOAOD | sum(block.size)"
$ dasgoclient -query "dataset=/*/Run2017*UL2017_MiniAODv1_NanoAODv2*/NANOAOD"
/BTagCSV/Run2017B-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017C-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017D-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017E-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
.....
Is there something I should do differently?
When I query for all of the files in the dataset
/SingleMuon/Run2016H-03Feb2017_ver2-v1/MINIAOD
I get a list
which does not include one of the valid files
The CMS Data Aggregation System (DAS) relies on the following PhEDEx APIs:
But the main ones are blockReplicas and fileReplicas. Here are concrete examples of the URLs DAS places:
# fileReplicas example
https://cmsweb.cern.ch/phedex/datasvc/json/prod/fileReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD
# blockReplicas example
https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD
You can either use these URLs directly in a browser or use a curl client, e.g.
curl -L -k --key ~/.globus/userkey.pem --cert ~/.globus/usercert.pem "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD"
and then you can inspect the returned JSON document.
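For reference, a small Go sketch of decoding such a reply (modeling only the name/replica/node fields; the real datasvc document carries many more attributes):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// phedexReply models only the fields used here; the real blockReplicas
// document carries many more attributes per block and replica.
type phedexReply struct {
	Phedex struct {
		Block []struct {
			Name    string `json:"name"`
			Replica []struct {
				Node string `json:"node"`
			} `json:"replica"`
		} `json:"block"`
	} `json:"phedex"`
}

// parseReplicas decodes a datasvc blockReplicas JSON document.
func parseReplicas(doc []byte) (phedexReply, error) {
	var reply phedexReply
	err := json.Unmarshal(doc, &reply)
	return reply, err
}

func main() {
	// Trimmed-down example of a blockReplicas reply.
	doc := []byte(`{"phedex":{"block":[{"name":"/ZeroBias/Run2017F-31Mar2018-v1/NANOAOD#abc","replica":[{"node":"T1_US_FNAL_Disk"},{"node":"T2_CH_CERN"}]}]}}`)
	reply, err := parseReplicas(doc)
	if err != nil {
		panic(err)
	}
	for _, b := range reply.Phedex.Block {
		for _, r := range b.Replica {
			fmt.Println(b.Name, "=>", r.Node)
		}
	}
}
```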
To perform DAS+Rucio aggregation we need to get a list of Rucio APIs covering all the PhEDEx APIs DAS uses.
Originally the support for
file dataset=/a/b/c site=XXX run=123
query was done through the DBS and PhEDEx APIs. First, we resolved the list of blocks for a given dataset. Then, we found the files for the given set of blocks and run number, and finally we filtered the files using the PhEDEx fileReplicas API to select files on a given site.
Now, we need to implement the same logic using the DBS and Rucio APIs. The question is: do we have a Rucio API similar to fileReplicas to select files only for a given site, or should we find another route in Rucio to accommodate this workflow?
@ericvaandering could you please comment on this?
The Rucio API that lists dataset replica locations has a known issue (*) that makes it provide inconsistent/outdated locations. The correct response is provided by the very same API but with the deep=True parameter (**).
If I follow the DAS code correctly (big if), the only point where this API is used is here (***) (not sure though, the name is not what I expect the call to do). It should then be enough to add the deep=True parameter to this call in order to get the correct set of dataset locations.
(***)
Lines 641 to 644 in 12589ce
It would be very useful to add the processing time to DAS queries. We can calculate it from the time we insert the query into MongoDB to the time the records are ready, and show it as "processing time" on the DAS web UI.
I couldn't find any site for file /store/mc/RunIISummer20UL18NanoAODv9/ZJetsToQQ_HT-200to400_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v16_L1v1-v2/2550000/B369327F-C0CD-564B-9B16-8CCD9BD9DEF4.root. You can refer to this link
Would it be possible to add one button in DAS to get the cmsDriver directly from a dataset page? This has been discussed many times and would be very beneficial especially for RelVal samples.
@vkuznet Valentin, please let me know if this issue should get created somewhere else.
As mentioned in the WMCore meeting today, while scanning the CMSWEB frontend logs, I noticed the following 3 calls:
IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10719 in 15439 out 52 body 7227 us ] [auth: TLS... "DN" "-" ] [ref: "-" "dasgoserver" ]
IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByInputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10718 in 15439 out 52 body 8498 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]
IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9-1683-E811-A262-FA163E5A6AC2.root\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10736 in 15439 out 52 body 5896 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]
Searching for these couch views in WMCore, they were deprecated 5 years ago (!):
dmwm/WMCore#5609
Can you please update them as follows:
requestByOutputDataset to /reqmgr2/data/request?outputdataset=XXX, e.g.:
requestByInputDataset to /reqmgr2/data/request?inputdataset=XXX, e.g.:
In addition to that, can you please clarify which kind of request information you need? Is it just the workflow name, or do you need some other workflow meta-data? If it's the former, then please always use detail=False.
Last but not least, if you check the 3rd example, there is no data sanitization on the DAS server, which means it will make a reqmgr2 call with whatever input the user provides. In this case, it asks for workflows by output dataset, but provides an LFN.
As an extra request, would you have a map of all WMCore APIs used within the DAS server/client? We might have other such cases that were not spotted yet. Thanks
Dear DAS experts,
I am currently unable to perform queries on the web interface of CMS DAS. For instance, I am trying to perform this simple query:
dataset=/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/*20UL*/NANOAODSIM
I receive only one result with the query above, instead of the list of matching datasets.
Best,
Tommaso
Valentin, as discussed in this issue: #27
we should sanitize the user data before making a call to the reqmgr2 REST APIs.
For instance, if you are requesting ReqMgr2 data given an input or output dataset, you need to make sure that the user provided a valid dataset name (not a block, not an LFN, not any other string not matching the standard lexicon).
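Such a check could be sketched as below, using an illustrative regular expression (the authoritative rules are the CMS lexicon patterns, e.g. in the WMCore Lexicon module, which are stricter than this):

```go
package main

import (
	"fmt"
	"regexp"
)

// datasetRE is an illustrative pattern for the /primary/processed/tier form;
// the real lexicon rules are stricter than this sketch.
var datasetRE = regexp.MustCompile(`^/[\w.\-*]+/[\w.\-*]+/[A-Z\-*]+$`)

// isDataset reports whether s looks like a dataset name (and not, say, an LFN).
func isDataset(s string) bool { return datasetRE.MatchString(s) }

func main() {
	fmt.Println(isDataset("/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO")) // a dataset: accept
	fmt.Println(isDataset("/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9.root"))            // an LFN: reject
}
```

A guard like this, placed before the reqmgr2 call is composed, would have rejected the LFN in the 3rd example above.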
When asking for a list of files from a dataset on the DAS webpage and sorting by nevents or size, it does not sort the output.
E.g. the query file dataset=/JetHT/Run2016D-17Jul2018-v1/MINIAOD | sort file.nevents as per the FAQ: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=file+dataset%3D%2FJetHT%2FRun2016D-17Jul2018-v1%2FMINIAOD+%7C+sort+file.nevents
produces files with nevents: 81919, 82659, 94697, 80021, 109376, 83485. Similarly for sort file.size. These queries return the same file order as when sort isn't used at all.
Aside: I also tried on the command line with dasgoclient, but that didn't even return anything. Should I open another issue for that, or is it related?
Thanks,
Robin
I was trying to figure out if a dataset is already available at a site, but the query
'site dataset=/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL16NanoAODAPVv9-106X_mcRun2_asymptotic_preVFP_v11-v1/NANOAODSIM'
returns
while dasgoclient lists the sites:
dasgoclient -query 'site dataset=/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL16NanoAODAPVv9-106X_mcRun2_asymptotic_preVFP_v11-v1/NANOAODSIM'
T1_DE_KIT_Disk
T1_ES_PIC_Disk
T1_RU_JINR_Disk
T1_US_FNAL_Disk
T2_BE_IIHE
T2_BE_UCL
T2_CH_CSCS
T2_DE_DESY
T2_EE_Estonia
T2_ES_CIEMAT
T2_FR_GRIF_LLR
T2_IN_TIFR
T2_IT_Legnaro
T2_UK_London_IC
T2_UK_SGrid_RALPP
T2_US_MIT
T2_US_Nebraska
T2_US_Purdue
T3_IT_Trieste
Any idea what could be wrong?
I'm using the web interface, and I encounter the following error:
I'm searching for children of a given file:
child file=/store/mc/RunIISummer20UL18RECO/GGToEE_Pt-35_Elastic_13TeV-lpair/AODSIM/106X_upgrade2018_realistic_v11_L1v1-v2/100000/027C3A23-DAA8-1449-9047-A418996EEC2E.root
The output is two files, but when I click on one of the files to view its related info, the web interface forwards me to the following search line:
dataset=/store/mc/RunIISummer20UL18MiniAOD/GGToEE_Pt-35_Elastic_13TeV-lpair/MINIAODSIM/106X_upgrade2018_realistic_v11_L1v1-v2/100000/80B92674-24B6-9642-87C4-3C41A752F15E.root
with a "DBS unable to unmarshal the data into DAS record..." error.
It should be file instead of dataset in the search command.
This error might lead a user to conclude that the file is missing, so it is important to make this fix.
Thanks!
Dear Experts,
I am having an issue where wildcards do not seem to be working properly in DAS queries. For example when I use the following query:
dataset=/QCD_Pt-170to300_MuEnrichedPt5_TuneCP5_13TeV_pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v3/MINIAODSIM
I receive the proper dataset; however, dataset=/QCD_Pt-170to300_MuEnrichedPt5_TuneCP5_13TeV_pythia8// acts as though I am searching for a dataset with that literal name, rather than treating the empty fields as wildcards.
As an example, for config dataset=/RelValTTbar_14TeV/CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM
I get
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=config+dataset%3D%2FRelValTTbar_14TeV%2FCMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2%2FMINIAODSIM
Under the Config Name link there is a generic https://cmsweb.cern.ch/das/request?input=config%3DReqMgr2&instance=prod/global ; I would expect it to return the actual reqmgr request pdmvserv_RVCMSSW_12_1_0_pre4TTbar_14TeV__HighStat_211020_131422_5228 at https://cmsweb.cern.ch/reqmgr2/fetch?rid=pdmvserv_RVCMSSW_12_1_0_pre4TTbar_14TeV__HighStat_211020_131422_5228
The following part, at Config urls:, correctly picks up only one config out of the 4 available in the request, as can be seen in the reqmgr2 link above.
Config Cache List
DQMConfigCacheID: 46713bf726160ce248142d29719c1878
Task1: DigiPU_2021PU: ConfigCacheID: 46713bf726160ce248142d29719b6f22
Task2: RecoPU_2021PU: ConfigCacheID: 46713bf726160ce248142d29719be2df
Task3: Nano_2021PU: ConfigCacheID: 46713bf726160ce248142d29719c2eff
With the arrival of the embed feature in Go we should embed all static files and simplify the deployment of the DAS server. The static content can be embedded into the static das executable, eliminating the shipment of the static area in DAS server deployment.
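A sketch of the idea (Go 1.16+). A real server would use //go:embed static together with http.FileServer; since a standalone snippet has no static directory, this example embeds its own source file(s) to demonstrate the mechanism:

```go
package main

import (
	"embed"
	"fmt"
)

// In the DAS server this would be `//go:embed static`, bundling the whole
// static area into the binary; here we embed the snippet's own .go file(s)
// so the example compiles on its own.
//
//go:embed *.go
var staticFS embed.FS

func main() {
	entries, err := staticFS.ReadDir(".")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		data, _ := staticFS.ReadFile(e.Name())
		fmt.Printf("embedded %s (%d bytes)\n", e.Name(), len(data))
	}
	// Serving the embedded tree is then one line:
	//   http.Handle("/", http.FileServer(http.FS(staticFS)))
}
```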
The following works and gives one file:
dasgoclient --limit 0 --query 'file dataset=/HIMinBiasUPC/HIRun2011-v1/RAW run=182124 lumi=40'
the site query for the file shows that the file is at T2_CH_CERN.
However a direct combination with the site does not return anything
dasgoclient --limit 0 --query 'file dataset=/HIMinBiasUPC/HIRun2011-v1/RAW run=182124 lumi=40 site=T2_CH_CERN'
This doesn't get any results
dataset file=/store/mc/RunIISummer19ULPrePremix/Neutrino_E-10_gun/PREMIX/UL17_106X_mc2017_realistic_v6-v1/30001/3DBDCDA1-E656-814A-BE78-C0ACDB9C05E0.root
while doing a block lookup first and then getting the dataset from the block works fine.
In order to enable throttling on DBS we should switch DAS to use cmsweb-prod
Using the following query:
I see duplicated entries (with a comma separating them on the same line).
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/C037802E-62F0-E611-94AA-70106F48BBEE.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/C037802E-62F0-E611-94AA-70106F48BBEE.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/AE7F0E4C-62F0-E611-A831-0CC47A7FC7B8.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/AE7F0E4C-62F0-E611-A831-0CC47A7FC7B8.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/78A07748-62F0-E611-8AE9-0025901D4D54.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/78A07748-62F0-E611-8AE9-0025901D4D54.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/6E1B071B-62F0-E611-9069-0025901D493A.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/6E1B071B-62F0-E611-9069-0025901D493A.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4C71AB5B-62F0-E611-93BC-047D7BD6DEC4.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4C71AB5B-62F0-E611-93BC-047D7BD6DEC4.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4AFFC082-62F0-E611-961E-00266CFFA1FC.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4AFFC082-62F0-E611-961E-00266CFFA1FC.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/40D4854D-62F0-E611-9421-70106F4D23F0.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/40D4854D-62F0-E611-9421-70106F4D23F0.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/3A7C6864-62F0-E611-8819-047D7BD6DF5A.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/3A7C6864-62F0-E611-8819-047D7BD6DF5A.root
...
While investigating a crab submission problem report we found the following:
This dataset is on tape at CCIN2P3 with just one block on disk in Pisa; dasgoclient correctly reports:
belforte@lxplus021/STE> dasgoclient --query 'site dataset=/BsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen/RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/AODSIM'
T1_FR_CCIN2P3_Buffer
T1_FR_CCIN2P3_MSS
T2_IT_Pisa
belforte@lxplus021/STE>
While looking in DAS web UI the dataset is listed as being on tape only:
https://cmsweb.cern.ch/das/request?instance=prod/global&input=site+dataset%3D%2FBsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FAODSIM
When querying only for the block on disk, WEB UI reports correctly
Block size: 217441549 (217.4MB) Number of files: 1 Open: n Site: T2_IT_Pisa, T1_FR_CCIN2P3_Buffer, T1_FR_CCIN2P3_MSS at
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=%2FBsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FAODSIM%23e863f226-cb55-11e6-a2b6-001e67abf518
Maybe it is because 1 disk block out of 148 is somehow rounded to 0% and hence not listed by the site dataset=... query in the UI?
When looking at the result of a "parent dataset=..." query
The generated links in the page have the form "parent=..." rather than "dataset=..."
They look like this:
rather than expected:
The "next" and "last" in the <first | prev | next | last> navigation on the DAS web page
don't work but always show the first 50 results. Unfortunately, the "results/page" seems
not always to work either.
Could someone please take a look and fix things?
Thanks,
The Dynamo team (mostly Max) has completed the APIs which very closely match the phedex APIs. What one would have to do is to substitute
https://cmsweb.cern.ch/phedex/datasvc/json/prod/
with
http://dynamo.mit.edu/phedexdata/inventory/subscriptions
the rest of the call and the options are the same. All Dynamo APIs take exactly the same options. The returned JSONs are structurally the same as the PhEDEx ones (if not, it is a mistake and will be corrected). The difference in the replies would be the object ids (dataset id, site id), as Dynamo returns numbers that are internal to Dynamo. We also omit some fields that Dynamo does not have (instead of faking them); please let us know if any of them are actually used by the code so that we can adjust the Dynamo code. Some smaller issues in one of the API calls are still being worked on, but this should be close enough for many tests.
Please let me know if you have any questions or need this to be posted somewhere else.
Hi, I hope this is the right place to report this. The web interface is not working for me right now:
Search string: dataset=/ZMM*/*/*
, taken from the help page
Result:
DAS error
DAS query:
dataset=/ZMM*/*/*
^
Error: DAS QL ERROR, query=dataset = /ZMM*/*/*, idx=0, msg=Wrong DAS key: dataset
Thanks,
David
Hi,
I selected the "plain" result format, but it always returns the "list" result format.
Compare:
and
Andrew
Clicking on "Site" for the following dataset gives me an error:
dataset: /EphemeralZeroBias0/Run2022C-PromptReco-v1/AOD
Thanks,
Michael
Dear Colleagues,
I am trying to access the following information:
dataset=/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD | grep dataset.nevents, dataset.nblocks, dataset.nfiles
and it displays the desired result:
Number of blocks: 77 Number of events: 22633270 Number of files: 480
But there are a couple of datasets for which I need this information, and when using the das client it prints:
dasgoclient -query="dataset=/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD | grep dataset.nevents, dataset.nblocks, dataset.nfiles"
Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found
Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found
Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found
22633270 77 480
at the end it gives the desired result. A quick search reveals that when we do that query, it returns three results: from dbs3:dataset_info, dbs3:datasetlist and dbs3:filesummaries. Only the third one has nevents, nblocks, nfiles. The first two don't have them, so you get two sets of "Key path not found" errors and then the result that you want.
Could you please give your expert opinion on how we can filter on dbs3:filesummaries so that we don't have those filter issues when using dasgoclient.
Regards, Wajid
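One possible client-side workaround, sketched below: keep only the record whose das.services includes dbs3:filesummaries, since that is the only service returning nevents/nblocks/nfiles (JSON shape trimmed from the dasgoclient -json output):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dasResult models just enough of a dasgoclient -json record: the list of
// services that produced it and the dataset payload.
type dasResult struct {
	Das struct {
		Services []string `json:"services"`
	} `json:"das"`
	Dataset []map[string]interface{} `json:"dataset"`
}

// pickFileSummaries returns the dataset record produced by dbs3:filesummaries,
// the only service carrying nevents/nblocks/nfiles.
func pickFileSummaries(raw []byte) (map[string]interface{}, error) {
	var records []dasResult
	if err := json.Unmarshal(raw, &records); err != nil {
		return nil, err
	}
	for _, r := range records {
		for _, s := range r.Das.Services {
			if s == "dbs3:filesummaries" && len(r.Dataset) > 0 {
				return r.Dataset[0], nil
			}
		}
	}
	return nil, fmt.Errorf("no dbs3:filesummaries record found")
}

func main() {
	// Trimmed example of the three aggregated results described above.
	raw := []byte(`[
	 {"das":{"services":["dbs3:dataset_info"]},"dataset":[{"name":"/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD"}]},
	 {"das":{"services":["dbs3:datasetlist"]},"dataset":[{"name":"/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD"}]},
	 {"das":{"services":["dbs3:filesummaries"]},"dataset":[{"name":"/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD","nevents":22633270,"nblocks":77,"nfiles":480}]}
	]`)
	d, err := pickFileSummaries(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(d["nevents"], d["nblocks"], d["nfiles"])
}
```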
Hi,
I'm trying to get some information regarding a dataset. The command worked earlier today, but now both web interface as well as CLI return that the dataset is empty and there are no blocks or files.
This is the query and response:
dasgoclient -query 'dataset dataset=/HIRun2015/HIHardProbes/RAW' -json
[
{"das":{"expire":1667919776,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:filesummaries"]},"dataset":[{"max_ldate":null,"median_cdate":null,"median_ldate":null,"name":"/HIRun2015/HIHardProbes/RAW","nblocks":0,"nevents":0,"nfiles":0,"nlumis":0,"num_file":0,"num_lumi":0,"size":0}],"qhash":"3f5380f0c3bb4504ae3d5ca5431a0c69"}
]
block and file queries return nothing:
dasgoclient -query 'block dataset=/HIRun2015/HIHardProbes/RAW'
dasgoclient -query 'file dataset=/HIRun2015/HIHardProbes/RAW'
This might be some intermittent issue, but I don't know which CMS talk forum to use for that, which is why I'm reverting to GitHub...
I think I introduced a bug in the DAS maps which apply dataset_access_status=VALID to dataset queries; this prevents looking up datasets with non-VALID status in the DAS web server.
Dear experts,
I wanted to know if there is any EGamma dataset containing run 355872 released, trying dataset=/EGamma/*/* run=355872 or something like dataset=/EGamma/*/* run between(or in) [355815, 355915], but it did not show anything.
And I tried dataset=/EGamma/*/* date between [20220718, 20220726]; it shows only the RAW dataset.
Then I looked up dataset run=355872 itself and could find EGamma Run2022C datasets. For example, dataset=/EGamma/Run2022C-PromptReco-v1/MINIAOD has run 355872, created 2022-07-20.
Is this expected? If this is not the recommended way to use the run/date options, what is the best way to find a specific dataset with a given run/date?
Best regards,
Jieun
The following query fails in das2go but succeeds in das:
dataset site=T2_US_Nebraska
The error message is "runtime error: index out of range".
Please test DAS against the integration Rucio server
rucio_host = http://cms-rucio-int.cern.ch
auth_host = https://cms-rucio-auth-int.cern.ch
(same as the normal server except with -int added). However this server does not have much data in it and is not a replacement for the full Rucio server, so this should not be user visible. All we need to test is whether the Go DAS client can successfully authenticate against the server(s).
Eric
List of SiteDB APIs DAS uses (subject of migration to CRIC):
To perform DAS+CRIC integration we need to cover all of these use-cases.
The following query:
dasgoclient -query="file dataset=/SingleMu/Run2012D-ZMu-15Apr2014-v1/RAW-RECO run=206574 site=T2_CH_CERN"
tries the CMSWEB URL to place the Rucio request; it can be fixed by setting RUCIO_URL:
RUCIO_URL=http://cms-rucio.cern.ch ./dasgoclient -query="file dataset=/SingleMu/Run2012D-ZMu-15Apr2014-v1/RAW-RECO run=206574 site=T2_CH_CERN"
Therefore, I need to adjust the setup of the Rucio URL in the combined plugin (outside of the Rucio maps).