
das2go's People

Contributors

iarspider, vkuznet


das2go's Issues

query dataset in production status?

Hi,
I think in the past it was possible to query datasets in "production" status with something like dataset=/*/PhaseIITDRSpring19DR-*/GEN-SIM-DIGI-RAW status=*.

Unfortunately, the status= part of the query does not seem to work anymore.

Thank you,
Andrea

Error when looking for a specific lumi in MC

Description

Running the command
file dataset=/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM run=1 lumi=123

(on the web interface) gives me a long error message saying things like

error=json: cannot unmarshal object into Go value of type []mongo.DASRecord

Details

I am trying to find the miniAOD file that holds a specific event I have found in nanoAOD. For this I have looked up the value of the luminosityBlock of the event in nanoAOD. The value is 123 and the dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18NanoAODv5-Nano1June2019_102X_upgrade2018_realistic_v19-v1/NANOAODSIM.

DAS tells me the parent dataset is /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v2/MINIAODSIM. Thus to find the file I'm looking for, I use the previously mentioned query.
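For context, the error typically indicates that the upstream service returned a single JSON object (often an error payload) where the code expects a JSON array. A minimal Go sketch of the mismatch and a tolerant decoder (the DASRecord type and payload are illustrative, not the actual das2go types):

package main

import (
    "encoding/json"
    "fmt"
)

// DASRecord stands in for mongo.DASRecord: a generic JSON map.
type DASRecord map[string]interface{}

// decodeRecords accepts either a JSON array of records or a single JSON
// object and always returns a slice, avoiding the "cannot unmarshal object
// into Go value of type []DASRecord" failure mode.
func decodeRecords(data []byte) ([]DASRecord, error) {
    var records []DASRecord
    if err := json.Unmarshal(data, &records); err == nil {
        return records, nil
    }
    var single DASRecord
    if err := json.Unmarshal(data, &single); err != nil {
        return nil, err
    }
    return []DASRecord{single}, nil
}

func main() {
    // A JSON object (e.g. an error payload) instead of the expected array.
    payload := []byte(`{"error": "no files found for run=1 lumi=123"}`)
    records, err := decodeRecords(payload)
    fmt.Println(records, err)
}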

Links do not propagate dbs instance on search page

Cannot view all files within block

Hi,

I am using DAS to query the files within the block:
/TTToSemiLeptonic_mtop166p5_TuneCP5_13TeV-powheg-pythia8/RunIISummer19UL17RECO-106X_mc2017_realistic_v6-v2/AODSIM#d3ae4ba2-2c38-40dc-8152-421e71ce4a21

I see that the block contains 133 files. I can view the first 50, but I cannot find a way to view the next pages. If I click 'next' to go to the next page, I see this message:

No results found DAS unable to find any results for your query. Please revisit your query by reviewing DAS query guide or submit a DAS github issue to resolve your query request.

I also tried changing the number of viewable results from 50 to 150, but this did not work, and I only see 50 results.

Fix poor golint coverage

Most of the complaints from golint are about improper naming of functions, e.g. the L_dbs3_datasetlist API. In Go it is recommended not to use underscores in function names. These names are looked up via the reflect module, see
https://github.com/dmwm/das2go/blob/master/das/das.go#L341

I propose the following fix:

  • rename all L_system_apiname functions to LocalsystemAPIname
  • change the aforementioned reflect construct to use the new naming, as sketched below
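For illustration, a minimal sketch of the reflect lookup with the proposed naming (the struct, method and query strings below are assumptions, not the actual das2go code):

package main

import (
    "fmt"
    "reflect"
)

// LocalAPIs mimics the receiver that holds the local DAS APIs.
type LocalAPIs struct{}

// LocalDBS3Datasetlist is an example of the proposed notation
// (formerly L_dbs3_datasetlist), which keeps golint happy.
func (LocalAPIs) LocalDBS3Datasetlist(spec string) []string {
    return []string{"result for " + spec}
}

func main() {
    // Build the method name with the new notation and look it up via
    // reflect, as das/das.go does today for the old L_<system>_<api> names.
    name := "LocalDBS3" + "Datasetlist"
    method := reflect.ValueOf(LocalAPIs{}).MethodByName(name)
    if !method.IsValid() {
        fmt.Println("no such local API:", name)
        return
    }
    out := method.Call([]reflect.Value{reflect.ValueOf("dataset=/a/b/c")})
    fmt.Println(out[0].Interface())
}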

GEANT Certificates not accepted / maybe a problem with comma in DN

Hi everybody,

@Panos512 asked me to open this ticket as a "follow up" to the corresponding ones in CRIC and the Service Portal (LDAP). The context there is that a comma in the DN was causing trouble; it is already fixed in CRIC and will hopefully be fixed for LDAP as well.
I am not sure if this is also the problem in DAS:

I have a new certificate from a new provider (Issuer: C = NL, O = GEANT Vereniging, CN = GEANT eScience Personal CA 4) and the DN has a somewhat funny format:

$ openssl x509 -noout -text -in usercert.pem | grep Lange
        Subject: DC = org, DC = terena, DC = tcs, C = DE, O = Universitaet Hamburg, CN = "Lange, Dr. Johannes <username>@uni-hamburg.de"

The CN is generated with information from an SSO of the home institution and in the case of Uni Hamburg unfortunately contains a comma (and is in quotes). All new certificates for our group members will be issued by GEANT from now on, because GridKa-CA will stop operation.

cmsweb does not complain when selecting the new certificate, but when I go on to DAS, I receive "Peer does not recognize and trust the CA that issued your certificate."
[screenshot]
The situation is the same for http://cmsweb-testbed.cern.ch/ and http://cmsweb-prod.cern.ch/.

I am not sure if this is also caused by the comma in the DN or if this is a different problem.
Any help would be appreciated and I can provide more information, if needed!

Best,
Johannes

Adding back the link of a dataset to McM

Dear all,
would it be possible to add back the link from an MC dataset to the McM request?

We got a lot of queries about that and we think it might be useful.

Thanks in advance.

The Pdmv group

malformed query information

The old das tool provides more useful information for locating where a typo is in a query. Compare the das and das2go webpages for this query with a space in the middle of the dataset name:

https://cmsweb.cern.ch/das2go/request?view=list&limit=50&instance=prod%2Fglobal&input=dataset+dataset%3D%2Fabcdefghijk%2Flmnopqrstuv%2Fwxy+z%2F

https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=dataset+dataset%3D%2Fabcdefghijk%2Flmnopqrstuv%2Fwxy+z%2F

In the das webpage you can easily locate the problem thanks to the caret marker (-----------------------------------------------^) pointing at the offending character.

Instabilities of DAS site dataset=/a/b/c queries

I got a report from Felipe Gómez-Cortés, who claimed that the DAS web UI provides different results for the following query:

site dataset=/ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM

After a series of iterations I confirmed that this is the case using my dev environment. I identified that the problem is related to the inability of DAS to contact the cms-rucio service, which yields the following errors:

2021/06/05 11:17:28 fetch.go:486: ERROR: fail to fetch http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#3ce2d95e-7168-11e6-9fb1-002590494fb0/datasets, retries 3, error Get "http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#3ce2d95e-7168-11e6-9fb1-002590494fb0/datasets": dial tcp: lookup cms-rucio.cern.ch: no such host
2021/06/05 11:17:28 fetch.go:486: ERROR: fail to fetch http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#c541fd04-7198-11e6-9fb1-002590494fb0/datasets, retries 3, error Get "http://cms-rucio.cern.ch/replicas/cms//ReggeGribovPartonMC_EposLHC_pPb_4080_4080/pPb816Spring16GS-80X_mcRun2_asymptotic_v17-v1/GEN-SIM#c541fd04-7198-11e6-9fb1-002590494fb0/datasets": dial tcp: lookup cms-rucio.cern.ch: no such host

We need to identify the source of this issue. @ericvaandering any ideas?

Dataset view sub-info in random order

When looking at the result of a dataset query, the information about the dataset below the dataset link is sometimes in a different order.

For example:

The query

https://cmsweb.cern.ch/das/request?input=dataset%3D%2FSUSYVBFToHToAA_AToMuMu_AToTauTau_M-300_M-5_TuneCUETP8M1_13TeV_madgraph_pythia8%2Fdntaylor-crab_2018-11-20_Skim_MuMuTauTau_VBF_80X_v1-d605b0133e938d0f7712cc7189836ff8%2FUSER&instance=prod/phys03

Results in a view like:

[screenshot]

whereas this query

https://cmsweb.cern.ch/das/request?input=dataset%3D%2FSUSYVBFToHToAA_AToMuMu_AToTauTau_M-300_M-9_TuneCUETP8M1_13TeV_madgraph_pythia8%2Fdntaylor-crab_2018-11-20_Skim_MuMuTauTau_VBF_80X_v1-d605b0133e938d0f7712cc7189836ff8%2FUSER&instance=prod/phys03

results in:

[screenshot]

These change with a refresh (hence the attached screenshots).

Other formatting errors appear on refresh:

[screenshot]

In this version, you see spaces before and after the slashes in the dataset link.

panic error in dasgoclient for run,lumi query

I downloaded the latest version from GitHub, and here's what I get, which is the same as with the current version in CVMFS (v02.04.42).
If I query for run or lumi separately it is OK, but --query 'lumi,run ...' or --query 'run,lumi ...' fails. I have never seen this before. Could it be because of the odd dataset name, or just something I did wrong with that dataset? Or is this a problem in dasgoclient? If it is due to something I did in this test, no problem.

belforte@lxplus715/TestConfig> ./dasgoclient_amd64 --version
Build: git=v02.04.45 go=go1.17.7 date=2022-03-22 19:13:12.504084088 +0100 CET m=+0.001383466
belforte@lxplus715/TestConfig> ./dasgoclient_amd64 --query 'lumi,run dataset=/ThisIsATest/belforte-CrabAutoTest_userInputFiles-94ba0e06145abd65ccb1d21786dc7e1d/USER instance=prod/phys03'
panic: interface conversion: interface {} is json.Number, not []interface {}

goroutine 1 [running]:
github.com/dmwm/das2go/services.OrderByRunLumis({0xc000101c00, 0x46, 0x7c5b3f})
	/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/helpers.go:316 +0x8ea
github.com/dmwm/das2go/services.fileRunLumi({{0xc000408090, 0x83}, {0x7ffffae0dfca, 0x7d}, {0xc000345f40, 0x20}, 0xc000402870, {0xc000369720, 0x2, 0x2}, ...}, ...)
	/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/helpers.go:303 +0x349
github.com/dmwm/das2go/services.LocalAPIs.RunLumi4Dataset(...)
	/home/runner/go/pkg/mod/github.com/dmwm/[email protected]/services/dbs.go:167
reflect.Value.call({0x7c0140, 0xac37e8, 0xa93d40}, {0x7c59d7, 0x4}, {0xc0001d7108, 0x1, 0x40c34b})
	/opt/hostedtoolcache/go/1.17.7/x64/src/reflect/value.go:556 +0x845
reflect.Value.Call({0x7c0140, 0xac37e8, 0xc000161998}, {0xc0001d7108, 0x1, 0x1})
	/opt/hostedtoolcache/go/1.17.7/x64/src/reflect/value.go:339 +0xc5
main.processLocalApis({{0xc000408090, 0x83}, {0x7ffffae0dfca, 0x7d}, {0xc000345f40, 0x20}, 0xc000402870, {0xc000369720, 0x2, 0x2}, ...}, ...)
	/home/runner/work/dasgoclient/dasgoclient/main.go:1001 +0x505
main.process({0x7ffffae0dfca, 0x7d}, 0x0, {0x842680, 0x1}, 0x0, {0x0, 0x0}, {0x7cd077, 0x16}, ...)
	/home/runner/work/dasgoclient/dasgoclient/main.go:509 +0x130f
main.main()
	/home/runner/work/dasgoclient/dasgoclient/main.go:155 +0x120d
belforte@lxplus715/TestConfig> 
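For context, the panic points to an unchecked type assertion on a decoded JSON value: with UseNumber decoding, a lumi field may arrive either as a single json.Number or as a list. A defensive sketch (illustrative only, not the actual OrderByRunLumis code):

package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// lumisOf tolerates both shapes seen in the service records: a single
// json.Number and a list of numbers, instead of asserting
// val.([]interface{}) and panicking on the former.
func lumisOf(val interface{}) []int64 {
    var out []int64
    switch v := val.(type) {
    case json.Number:
        if n, err := v.Int64(); err == nil {
            out = append(out, n)
        }
    case []interface{}:
        for _, item := range v {
            if num, ok := item.(json.Number); ok {
                if n, err := num.Int64(); err == nil {
                    out = append(out, n)
                }
            }
        }
    }
    return out
}

func main() {
    // Decode with UseNumber so numbers arrive as json.Number, as in das2go.
    var rec map[string]interface{}
    dec := json.NewDecoder(strings.NewReader(`{"single": 40, "list": [40, 41]}`))
    dec.UseNumber()
    if err := dec.Decode(&rec); err != nil {
        panic(err)
    }
    fmt.Println(lumisOf(rec["single"]), lumisOf(rec["list"])) // [40] [40 41]
}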

dasgoclient query does not return value

I'm using the following dasgoclient query to estimate the volume of legacy data and it works as expected for this MINIAOD query (for example)

$ dasgoclient -query "block dataset=/*/Run2017*09Aug2019_UL2017*/MINIAOD | sum(block.size)"
sum(block.size): 2.12740409597139e+14

(Curiously, the same query does not work on the DAS web UI, while it used to work earlier.)

However, for another query, this time for NANOAOD, the dasgoclient query returns no result even though the datasets exist:

$ dasgoclient -query "block dataset=/*/Run2017*UL2017_MiniAODv1_NanoAODv2*/NANOAOD | sum(block.size)"
$ dasgoclient -query "dataset=/*/Run2017*UL2017_MiniAODv1_NanoAODv2*/NANOAOD"
/BTagCSV/Run2017B-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017C-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017D-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
/BTagCSV/Run2017E-UL2017_MiniAODv1_NanoAODv2-v1/NANOAOD
.....

Is there something I should do differently?

missing file in dataset

Add support for Rucio APIs

The CMS Data Aggregation System (DAS) relies on a number of PhEDEx APIs, but the main ones are blockReplicas and fileReplicas. Here are concrete examples of the URLs DAS places:

# fileReplicas example
https://cmsweb.cern.ch/phedex/datasvc/json/prod/fileReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD
# blockReplicas example
https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD

You can either use these URLs directly in a browser or use a curl client, e.g.

curl -L -k --key ~/.globus/userkey.pem --cert ~/.globus/usercert.pem \
    "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockReplicas?dataset=%2FZeroBias%2FRun2017F-31Mar2018-v1%2FNANOAOD"

and then you can see the returned JSON document.

To perform DAS+Rucio aggregation we need to get the list of Rucio APIs corresponding to all PhEDEx APIs DAS uses.

Implement file dataset=/a/b/c site=XXX run=123 query using Rucio APIs

Originally the support for

file dataset=/a/b/c site=XXX run=123

query was implemented through the DBS and PhEDEx APIs. First, we resolve the list of blocks for a given dataset. Then we find the files for that set of blocks and the run number, and finally we filter the files using the PhEDEx fileReplicas API to select files on a given site.

Now we need to implement the same logic using the DBS and Rucio APIs. The question is whether Rucio has an API similar to fileReplicas that selects files only for a given site, or whether we should find another route in Rucio to accommodate this workflow.

@ericvaandering could you please comment on this?
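A rough sketch of the intended flow; every helper below is a placeholder for the real DBS/Rucio call, not existing das2go code:

package main

import "fmt"

// Placeholders for the real DBS and Rucio calls.
func dbsBlocks(dataset string) []string                   { return nil }
func dbsFilesForRun(block string, run int) []string       { return nil }
func rucioFilesAtSite(block, site string) map[string]bool { return nil }

// fileDatasetSiteRun mirrors the old DBS+PhEDEx logic with Rucio replacing
// the PhEDEx fileReplicas step: resolve blocks, pick files for the run,
// then keep only the files replicated at the requested site.
func fileDatasetSiteRun(dataset, site string, run int) []string {
    var out []string
    for _, block := range dbsBlocks(dataset) {
        atSite := rucioFilesAtSite(block, site)
        for _, f := range dbsFilesForRun(block, run) {
            if atSite[f] {
                out = append(out, f)
            }
        }
    }
    return out
}

func main() {
    fmt.Println(fileDatasetSiteRun("/a/b/c", "T2_CH_CERN", 123))
}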

Fix rucio list dataset replicas

The Rucio API that lists dataset replica locations has a known issue (*) that makes it provide inconsistent/outdated locations. The correct response is provided by the very same API but with the deep=True parameter (**).

If I follow the DAS code correctly (big if), the only point where this API is used is here (***) (not sure though, the name is not what I would expect for such a call). It should then be enough to add the deep=True parameter to this call in order to get the correct set of dataset locations.

(*)
dmwm/CMSRucio#257

(**)
https://rucio.readthedocs.io/en/old-doc/restapi/replica.html#get--replicas-(path-scope_name)-datasets

(***)

das2go/das/das.go, lines 641 to 644 at commit 12589ce:

if urn == "block4dataset_size" {
    // add datasets after url which will return CMS blocks (Rucio datasets)
    furl = fmt.Sprintf("%s/datasets/", furl)
}
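A sketch of the proposed change at that spot, assuming the deep flag can be passed as a plain URL query parameter:

if urn == "block4dataset_size" {
    // add datasets after url which will return CMS blocks (Rucio datasets);
    // request deep=True so Rucio reports accurate replica locations
    furl = fmt.Sprintf("%s/datasets/?deep=True", furl)
}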

Add processing time to DAS query

It would be very useful to add the processing time to a DAS query. We can calculate it from the time we insert the query into MongoDB to the time when the records are ready, and show it as "processing time" on the DAS web UI.
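A minimal sketch of the idea (variable names are illustrative):

package main

import (
    "fmt"
    "time"
)

func main() {
    // Record the time when the query is inserted into MongoDB ...
    inserted := time.Now()

    // ... the DAS workflow runs until the records are ready ...
    time.Sleep(150 * time.Millisecond)

    // ... and report the difference as "processing time" on the web UI.
    processingTime := time.Since(inserted)
    fmt.Printf("processing time: %.3f sec\n", processingTime.Seconds())
}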

Can't find storage site of an existing file

I couldn't find any site for the file /store/mc/RunIISummer20UL18NanoAODv9/ZJetsToQQ_HT-200to400_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v16_L1v1-v2/2550000/B369327F-C0CD-564B-9B16-8CCD9BD9DEF4.root. You can refer to this link

DASGoServer using WMCore couch views deprecated long ago

@vkuznet Valentin, please let me know if this issue should get created somewhere else.

As mentioned in the WMCore meeting today, while scanning the CMSWEB frontend logs, I noticed the following 3 calls:

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10719 in 15439 out 52 body 7227 us ] [auth: TLS... "DN" "-" ] [ref: "-" "dasgoserver" ]

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByInputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10718 in 15439 out 52 body 8498 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9-1683-E811-A262-FA163E5A6AC2.root\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10736 in 15439 out 52 body 5896 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]

Searching for these couch views in WMCore, I found that they were deprecated 5 years ago (!):
dmwm/WMCore#5609

Can you please update these calls accordingly?

In addition to that, can you please clarify what kind of request information you need? Is it just the workflow name, or do you need some other workflow metadata? If it's the former, then please always use detail=False.

Last but not least, if you check the 3rd example, there is no data sanitization on the DAS server, which means it will make a ReqMgr2 call with whatever data input is provided by the user. In this case, it asks for workflows by output dataset, but it provides an LFN.

One extra request: would you have a map of all WMCore APIs used within the DAS server/client? We might have other such cases that have not been spotted yet. Thanks

Unable to perform queries on DAS

Dear DAS experts,
I am currently unable to perform queries on the web interface of CMS DAS. For instance, I am trying to perform this simple query:
dataset=/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/*20UL*/NANOAODSIM

but I receive only 1 result for the query I just entered, instead of the list of related datasets.

Best,
Tommaso

Sanitize dataset name before passing it to the ReqMgr2 REST APIs

Valentin, as discussed in this issue: #27
we should sanitize the user data before making a call to the reqmgr2 REST APIs.

For instance, if you are requesting ReqMgr2 data given an input or output dataset, you need to make sure that the user provided a valid dataset name (not a block, not an LFN, nor any other string that does not match the standard lexicon).
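A minimal sketch of such a check; the regular expression below is only an approximation of the CMS dataset lexicon, not the official one:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

// datasetRe approximates /primary/processed/tier dataset names.
var datasetRe = regexp.MustCompile(`^/[\w.*-]+/[\w.*-]+/[A-Z-]+$`)

// validDataset rejects blocks (they contain '#'), LFNs (they start with
// /store/) and anything else that does not look like a dataset name.
func validDataset(s string) bool {
    if strings.Contains(s, "#") || strings.HasPrefix(s, "/store/") {
        return false
    }
    return datasetRe.MatchString(s)
}

func main() {
    fmt.Println(validDataset("/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO")) // true
    fmt.Println(validDataset("/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9-1683-E811-A262-FA163E5A6AC2.root")) // false
    fmt.Println(validDataset("/a/b/RAW#3ce2d95e-7168-11e6-9fb1-002590494fb0")) // false
}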

Sorting dataset files in query not working

When asking for a list of files from a dataset on the DAS webpage and sorting by nevents or size, the output does not get sorted.

e.g. the query file dataset=/JetHT/Run2016D-17Jul2018-v1/MINIAOD | sort file.nevents as per the FAQ: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=file+dataset%3D%2FJetHT%2FRun2016D-17Jul2018-v1%2FMINIAOD+%7C+sort+file.nevents

produces files with nevents: 81919, 82659, 94697, 80021, 109376, 83485. Similarly for sort file.size. These queries return the same file order as when sort isn't used at all.

Aside: I also tried on the command line with dasgoclient, but that didn't even return anything. Should I open another issue for that, or is it related?

Thanks,
Robin

site query broken in web UI

I was trying to figure out whether a dataset is already available at a site, but the query

'site dataset=/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL16NanoAODAPVv9-106X_mcRun2_asymptotic_preVFP_v11-v1/NANOAODSIM'

equivalent to
https://cmsweb.cern.ch/das/request?instance=prod/global&input=site+dataset%3D%2FDYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8%2FRunIISummer20UL16NanoAODAPVv9-106X_mcRun2_asymptotic_preVFP_v11-v1%2FNANOAODSIM

returns

[screenshot]

while dasgoclient lists the sites:

dasgoclient -query 'site dataset=/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL16NanoAODAPVv9-106X_mcRun2_asymptotic_preVFP_v11-v1/NANOAODSIM'
T1_DE_KIT_Disk
T1_ES_PIC_Disk
T1_RU_JINR_Disk
T1_US_FNAL_Disk
T2_BE_IIHE
T2_BE_UCL
T2_CH_CSCS
T2_DE_DESY
T2_EE_Estonia
T2_ES_CIEMAT
T2_FR_GRIF_LLR
T2_IN_TIFR
T2_IT_Legnaro
T2_UK_London_IC
T2_UK_SGrid_RALPP
T2_US_MIT
T2_US_Nebraska
T2_US_Purdue
T3_IT_Trieste

Any idea what could be wrong?

buggy links in cmsweb.cern.ch/das

I'm using the web interface, and I run into the following error.
I'm searching for the children of a given file:

child file=/store/mc/RunIISummer20UL18RECO/GGToEE_Pt-35_Elastic_13TeV-lpair/AODSIM/106X_upgrade2018_realistic_v11_L1v1-v2/100000/027C3A23-DAA8-1449-9047-A418996EEC2E.root

The output is two files, but when I click on one of the files to view its related info, the web interface forwards me to the following search line:

dataset=/store/mc/RunIISummer20UL18MiniAOD/GGToEE_Pt-35_Elastic_13TeV-lpair/MINIAODSIM/106X_upgrade2018_realistic_v11_L1v1-v2/100000/80B92674-24B6-9642-87C4-3C41A752F15E.root

which fails with a "DBS unable to unmarshal the data into DAS record..." error.
It should be file instead of dataset in the search command.
This error might lead a user to conclude that the file is missing, so it is important to fix this.

Thanks!

Wildcards Not working properly in DAS query

Dear Experts,
I am having an issue where wildcards do not seem to be working properly in DAS queries. For example, when I use the following query:
dataset=/QCD_Pt-170to300_MuEnrichedPt5_TuneCP5_13TeV_pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v3/MINIAODSIM

I receive the proper dataset; however, dataset=/QCD_Pt-170to300_MuEnrichedPt5_TuneCP5_13TeV_pythia8/*/* acts as though I am searching for a dataset with that literal name, rather than expanding the wildcards.

missing reqmgr info and relval dataset configuration

As an example, for config dataset=/RelValTTbar_14TeV/CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM I get
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=config+dataset%3D%2FRelValTTbar_14TeV%2FCMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2%2FMINIAODSIM

[screenshot]

Under the Config Name link there is a generic https://cmsweb.cern.ch/das/request?input=config%3DReqMgr2&instance=prod/global ; I would expect it to return the actual ReqMgr request pdmvserv_RVCMSSW_12_1_0_pre4TTbar_14TeV__HighStat_211020_131422_5228 at https://cmsweb.cern.ch/reqmgr2/fetch?rid=pdmvserv_RVCMSSW_12_1_0_pre4TTbar_14TeV__HighStat_211020_131422_5228

The following part, under Config urls, correctly picks up only one config out of the 4 available in the request, as can be seen in the reqmgr2 link above:

Config Cache List

    DQMConfigCacheID: 46713bf726160ce248142d29719c1878
    Task1: DigiPU_2021PU: ConfigCacheID: 46713bf726160ce248142d29719b6f22
    Task2: RecoPU_2021PU: ConfigCacheID: 46713bf726160ce248142d29719be2df
    Task3: Nano_2021PU: ConfigCacheID: 46713bf726160ce248142d29719c2eff 

Add embedded FS feature into DAS server code

With the arrival of the embed feature in Go we should embed all static files and simplify the deployment of the DAS server. The static content can be embedded into the DAS static executable, which eliminates shipping the static area in the DAS server deployment.
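A minimal sketch of what this could look like; the static directory name, URL prefix and port are illustrative, not the actual DAS server wiring:

package main

import (
    "embed"
    "log"
    "net/http"
)

// All files under static/ are compiled into the binary, so the DAS server
// no longer needs a separate static area shipped at deployment time.
//
//go:embed static
var staticFS embed.FS

func main() {
    // Serve the embedded files under /das/static/ (illustrative prefix).
    fileServer := http.FileServer(http.FS(staticFS))
    http.Handle("/das/static/", http.StripPrefix("/das/", fileServer))
    log.Fatal(http.ListenAndServe(":8212", nil))
}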

dasgoclient does not return result for `file dataset=x run=y lumi=z site=s`

The following works and gives one file:

dasgoclient --limit 0 --query 'file dataset=/HIMinBiasUPC/HIRun2011-v1/RAW run=182124 lumi=40'

The site query for the file shows that the file is at T2_CH_CERN.

However, a direct combination with the site does not return anything:

dasgoclient --limit 0 --query 'file dataset=/HIMinBiasUPC/HIRun2011-v1/RAW run=182124 lumi=40 site=T2_CH_CERN'

dataset lookup from file not working anymore?

This doesn't get any results:

dataset file=/store/mc/RunIISummer19ULPrePremix/Neutrino_E-10_gun/PREMIX/UL17_106X_mc2017_realistic_v6-v1/30001/3DBDCDA1-E656-814A-BE78-C0ACDB9C05E0.root

while doing a block lookup first and then getting the dataset from the block works fine.

"plain" format duplicates entries

Using the following query:

https://cmsweb.cern.ch/das/request?view=plain&limit=50&instance=prod%2Fglobal&input=file+dataset%3D%2FSingleMuon%2FRun2016G-03Feb2017-v1%2FMINIAOD

I see duplicated entries (with a comma separating them on the same line).

/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/C037802E-62F0-E611-94AA-70106F48BBEE.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/C037802E-62F0-E611-94AA-70106F48BBEE.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/AE7F0E4C-62F0-E611-A831-0CC47A7FC7B8.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/AE7F0E4C-62F0-E611-A831-0CC47A7FC7B8.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/78A07748-62F0-E611-8AE9-0025901D4D54.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/78A07748-62F0-E611-8AE9-0025901D4D54.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/6E1B071B-62F0-E611-9069-0025901D493A.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/6E1B071B-62F0-E611-9069-0025901D493A.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4C71AB5B-62F0-E611-93BC-047D7BD6DEC4.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4C71AB5B-62F0-E611-93BC-047D7BD6DEC4.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4AFFC082-62F0-E611-961E-00266CFFA1FC.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/4AFFC082-62F0-E611-961E-00266CFFA1FC.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/40D4854D-62F0-E611-9421-70106F4D23F0.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/40D4854D-62F0-E611-9421-70106F4D23F0.root
/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/3A7C6864-62F0-E611-8819-047D7BD6DF5A.root, /store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/3A7C6864-62F0-E611-8819-047D7BD6DF5A.root
...
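For illustration (not the actual das2go code), the plain view could de-duplicate entries before rendering, e.g. with a simple seen-set:

package main

import "fmt"

// dedup keeps the first occurrence of each entry and drops repeats, which
// would collapse the doubled file names in the plain view.
func dedup(entries []string) []string {
    seen := make(map[string]bool)
    var out []string
    for _, e := range entries {
        if !seen[e] {
            seen[e] = true
            out = append(out, e)
        }
    }
    return out
}

func main() {
    files := []string{
        "/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root",
        "/store/data/Run2016G/SingleMuon/MINIAOD/03Feb2017-v1/820000/DA8E43DC-61F0-E611-8411-70106F4A94F0.root",
    }
    fmt.Println(dedup(files)) // one entry instead of two
}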

web UI misreports site lists

While investigating a CRAB submission problem report we found the following:
This dataset is on tape at CCIN2P3 with just one block on disk at Pisa; dasgoclient correctly reports:
belforte@lxplus021/STE> dasgoclient --query 'site dataset=/BsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen/RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/AODSIM'
T1_FR_CCIN2P3_Buffer
T1_FR_CCIN2P3_MSS
T2_IT_Pisa
belforte@lxplus021/STE>

While looking in the DAS web UI, the dataset is listed as being on tape only:
https://cmsweb.cern.ch/das/request?instance=prod/global&input=site+dataset%3D%2FBsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FAODSIM

When querying only for the block on disk, the web UI reports correctly:
Block size: 217441549 (217.4MB) Number of files: 1 Open: n Site: T2_IT_Pisa, T1_FR_CCIN2P3_Buffer, T1_FR_CCIN2P3_MSS at
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=%2FBsToMuMuPhi_BMuonFilter_SoftQCDnonD_TuneCUEP8M1_13TeV-pythia8-evtgen%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1%2FAODSIM%23e863f226-cb55-11e6-a2b6-001e67abf518

Maybe 1 disk block out of 148 is somehow rounded to 0% and hence not listed by the site dataset=... query in the UI?

Parent search page produces invalid query

Add support to Dynamo

The Dynamo team (mostly Max) has completed the APIs, which very closely match the PhEDEx APIs. What one would have to do is substitute
https://cmsweb.cern.ch/phedex/datasvc/json/prod/
with
http://dynamo.mit.edu/phedexdata/inventory/subscriptions
while the rest of the call and its options stay the same. All Dynamo APIs take exactly the same options. The returned JSON documents are structurally the same as the PhEDEx ones (if not, then it is a mistake and will be corrected). The difference in the reply would be the object ids (dataset id, site id), as Dynamo returns numbers that are internal to Dynamo. We also omit some fields that Dynamo does not have (instead of faking them); please let us know if any of them are actually used by the code so that we can adjust the Dynamo code. Some smaller issues in one of the API calls are still being worked on, but this should be close enough for many tests.

Please let me know if you have any questions or need this to be posted somewhere else.

Web interface error

Hi, I hope this is the right place to report this. The web interface is not working for me right now:

Search string: dataset=/ZMM*/*/*, taken from the help page
Result:

DAS error
DAS query:
dataset=/ZMM*/*/*
^
Error: DAS QL ERROR, query=dataset = /ZMM*/*/*, idx=0, msg=Wrong DAS key: dataset 

Thanks,
David

list result format is always used

Cannot see a site of a dataset

Clicking on "Site" for the following dataset gives me an error:

dataset: /EphemeralZeroBias0/Run2022C-PromptReco-v1/AOD

Thanks,
Michael

Accessing information via DAS WebInterface and dasgoclient

Dear Colleagues,

I am trying to access the following information:
dataset=/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD | grep dataset.nevents, dataset.nblocks, dataset.nfiles

and it displays the desired result:
Number of blocks: 77 Number of events: 22633270 Number of files: 480

But there are a couple of datasets for which I need this information. However, when using the das client:
dasgoclient -query="dataset=/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD | grep dataset.nevents, dataset.nblocks, dataset.nfiles"

Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found
Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found
Unable to extract filters=[dataset [0] nevents], error=Key path not found
Unable to extract filters=[dataset [0] nblocks], error=Key path not found
Unable to extract filters=[dataset [0] nfiles], error=Key path not found

22633270 77 480

At the end it prints the desired result. A quick search reveals that when we do that query, it returns three results: from dbs3:dataset_info, dbs3:datasetlist and dbs3:filesummaries. Only the third one has nevents, nblocks and nfiles. The first two don't have them, so you get two sets of "Key path not found" errors and then the result that you want.

Could you please give your expert opinion on how we can filter on dbs3:filesummaries so that we don't run into those filter issues when using dasgoclient.

Regards, Wajid
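A possible client-side workaround, sketched under the assumption that the dasgoclient -json output carries a das.services list per record: keep only the record produced by dbs3:filesummaries before reading the summary fields.

package main

import (
    "encoding/json"
    "fmt"
)

// record models one entry of the dasgoclient -json output; only the fields
// needed here are declared (an assumption, not the full schema).
type record struct {
    Das struct {
        Services []string `json:"services"`
    } `json:"das"`
    Dataset []struct {
        Nevents int64 `json:"nevents"`
        Nblocks int64 `json:"nblocks"`
        Nfiles  int64 `json:"nfiles"`
    } `json:"dataset"`
}

func main() {
    // Example payload with two records; only the second one comes from
    // dbs3:filesummaries and carries the nevents/nblocks/nfiles fields.
    payload := []byte(`[
      {"das":{"services":["dbs3:dataset_info"]},"dataset":[{"name":"/BTagMu/Run2018A-12Nov2019_UL2018-v1/MINIAOD"}]},
      {"das":{"services":["dbs3:filesummaries"]},"dataset":[{"nevents":22633270,"nblocks":77,"nfiles":480}]}
    ]`)
    var records []record
    if err := json.Unmarshal(payload, &records); err != nil {
        panic(err)
    }
    for _, r := range records {
        for _, s := range r.Das.Services {
            if s == "dbs3:filesummaries" && len(r.Dataset) > 0 {
                fmt.Println(r.Dataset[0].Nevents, r.Dataset[0].Nblocks, r.Dataset[0].Nfiles)
            }
        }
    }
}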

DAS queries to files and blocks returning 0

Hi,

I'm trying to get some information regarding a dataset. The command worked earlier today, but now both the web interface and the CLI report that the dataset is empty and that there are no blocks or files.
This is the query and response:

dasgoclient -query 'dataset dataset=/HIRun2015/HIHardProbes/RAW' -json
[
{"das":{"expire":1667919776,"instance":"prod/global","primary_key":"dataset.name","record":1,"services":["dbs3:filesummaries"]},"dataset":[{"max_ldate":null,"median_cdate":null,"median_ldate":null,"name":"/HIRun2015/HIHardProbes/RAW","nblocks":0,"nevents":0,"nfiles":0,"nlumis":0,"num_file":0,"num_lumi":0,"size":0}],"qhash":"3f5380f0c3bb4504ae3d5ca5431a0c69"}
]

block and file queries return nothing:

dasgoclient -query 'block dataset=/HIRun2015/HIHardProbes/RAW'
dasgoclient -query 'file dataset=/HIRun2015/HIHardProbes/RAW'

This might be some intermittent issue, but I don't know which CMS Talk forum to use for that, which is why I'm resorting to GitHub...

Dataset query with run/date option in DAS

Dear experts,

I wanted to know whether any EGamma dataset containing run 355872 has been released. I tried dataset=/EGamma/*/* run=355872 or something like dataset=/EGamma/*/* run between [355815, 355915] (or run in [355815, 355915]), but it did not show anything.

And I tried dataset=/EGamma/*/* date between [20220718, 20220726], but it shows only RAW datasets.

Then I looked up dataset run=355872 by itself and could find the EGamma Run2022C datasets.
For example, dataset=/EGamma/Run2022C-PromptReco-v1/MINIAOD has run 355872 and was created on 2022-07-20.

Is this expected? If this is not the recommended way to use the run/date options, what would be the best way to find a specific dataset with a given run/date?

Best regards,
Jieun

Failed query

The following query fails in das2go but succeeds in das:

dataset site=T2_US_Nebraska

The error message is "runtime error: index out of range".

Adjust Rucio URL in combined plugin

The following query:

dasgoclient -query="file dataset=/SingleMu/Run2012D-ZMu-15Apr2014-v1/RAW-RECO run=206574 site=T2_CH_CERN"

tries the CMSWEB URL to place the Rucio request; it can be fixed with RUCIO_URL:

RUCIO_URL=http://cms-rucio.cern.ch ./dasgoclient -query="file dataset=/SingleMu/Run2012D-ZMu-15Apr2014-v1/RAW-RECO run=206574 site=T2_CH_CERN"

Therefore, I need to adjust the setup of the Rucio URL in the combined plugin (outside of the Rucio maps).
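A sketch of how the plugin could resolve the Rucio URL; the fallback default here is an assumption:

package main

import (
    "fmt"
    "os"
)

// rucioURL prefers the RUCIO_URL environment variable and falls back to a
// default host, so the combined plugin no longer hits the CMSWEB URL.
func rucioURL() string {
    if u := os.Getenv("RUCIO_URL"); u != "" {
        return u
    }
    return "http://cms-rucio.cern.ch" // assumed default for this sketch
}

func main() {
    fmt.Println("using Rucio URL:", rucioURL())
}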
