Comments (19)

vkuznet commented on August 18, 2024

Slava, it is a good observation, but it is not trivial to solve. Let me explain. The actual query config dataset=XXX translates into two calls to the reqmgr2 service:

# one for input dataset
https://cmsweb.cern.ch/reqmgr2/data/request?inputdataset=/RelValTTbar_14TeV/CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM

# one for output dataset
https://cmsweb.cern.ch/reqmgr2/data/request?outputdataset=/RelValTTbar_14TeV/CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM
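
A minimal sketch of how the two lookups above could be built from a single dataset name. The helper name `reqmgr_urls` and the constant are hypothetical and only illustrate the URL pattern quoted in this comment; this is not the DAS implementation.

```python
from urllib.parse import urlencode

# Endpoint taken from the two URLs quoted above.
REQMGR_URL = "https://cmsweb.cern.ch/reqmgr2/data/request"


def reqmgr_urls(dataset):
    """Return the inputdataset and outputdataset lookup URLs for one dataset."""
    return [
        # safe="/" keeps the slashes of the dataset path unescaped.
        f"{REQMGR_URL}?{urlencode({key: dataset}, safe='/')}"
        for key in ("inputdataset", "outputdataset")
    ]


dataset = (
    "/RelValTTbar_14TeV/"
    "CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM"
)
for url in reqmgr_urls(dataset):
    print(url)
```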

DAS parses generic JSON; check the return of the outputdataset call. Since the JSON has no fixed schema, I was only told at some point to look for a ConfigCacheID match in the keys. Because there is no schema, i.e. all attributes are assigned based on the dynamic nature of the workflow and their keys are not static, I can't apply a generic algorithm to parse it. For instance, since I don't know a priori whether a Task key will appear in a document (and most likely these were added at a later stage), I don't know where to look for configs. This is a very serious issue in the WMCore code, which does not provide a static schema for its documents, so parsing the docs becomes a very complicated problem. I can extract the actual RequestName and construct the appropriate link, but I have no idea whether Task attributes will be present in a document, how many a document will have, or which naming convention they will use. So far tasks are named Task1, Task2, etc., but no one told me that they exist (since there is no publicly available schema). I can add new code which will look for string matches of Task attributes using a regexp, but it does not guarantee that I won't miss something when a new config section is added somewhere else in a document. I hope you understand the point.

Anyway, now that I actually see the new structure of the reqmgr output I can try to adjust the code to parse it, but this structure may still evolve, and if no schema is published we'll have a similar discussion again in the future.
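
To illustrate the kind of schema-agnostic scan described above, here is a rough sketch that walks an arbitrary JSON document and collects every key matching ConfigCacheID, wherever it is nested. The function name and the sample document are assumptions made for the example; the real reqmgr2 output has many more fields, and which task owns which ID is not claimed here.

```python
import re

CONFIG_KEY = re.compile(r"ConfigCacheID")


def find_config_ids(doc):
    """Recursively collect (key, value) pairs whose key matches ConfigCacheID."""
    found = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            if isinstance(value, str) and CONFIG_KEY.search(key):
                found.append((key, value))
            else:
                # Descend into nested dicts/lists; scalars yield nothing.
                found.extend(find_config_ids(value))
    elif isinstance(doc, list):
        for item in doc:
            found.extend(find_config_ids(item))
    return found


# Hypothetical document, shaped only after the description in this thread.
sample = {
    "RequestName": "example_request",
    "DQMConfigCacheID": "46713bf726160ce248142d29719c1878",
    "Task1": {
        "TaskName": "ExampleTask",
        "ConfigCacheID": "46713bf726160ce248142d29719b6f22",
    },
}
print(find_config_ids(sample))
```

Such a scan finds IDs at any depth, but, as noted above, it cannot tell what each ID means without knowing the schema.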

from das2go.

slava77 commented on August 18, 2024

> I can extract the actual RequestName and construct the appropriate link

this would be great to have.

I think that the second part requested in my issue description (about the full set of config links) is less essential.
Still, would it make sense to look for ConfigCacheID in all elements and construct the result based on that?

vkuznet commented on August 18, 2024

Slava, please have a look at cmsweb-testbed, e.g.

https://cmsweb-testbed.cern.ch/das/request?view=list&limit=50&instance=int%2Fglobal&input=config+dataset%3D%2FRelValTTbar_14TeV%2FCMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2%2FMINIAODSIM

Now it returns the proper request config name, and it also provides a link to ReqMgr info. For now the link points to cmsweb-testbed, where you'll get a 500, but the link is auto-generated from the deployment cluster and it will work fine on cmsweb (you may check that by removing -testbed from it).

Let me know. I plan to rename Config name to Request name since it better represents the value. After some tests and your OK, I'll put it in production.

slava77 commented on August 18, 2024

the ReqMgr info link points to https://cmsweb-testbed.cern.ch/reqmgr2/fetch?rid=pdmvserv_RVCMSSW_12_1_0_pre4TTbar_14TeV__HighStat_211020_131422_5228
which does not exist.
I'm not sure if this is just specific to using cmsweb-testbed; the link should have cmsweb.cern.ch.

The link for Config urls: output-config-0 points to cmsweb.cern.ch correctly.

Instead of (or in addition to?) the plaintext Request ids: 46713bf726160ce248142d29719c1878, 46713bf726160ce248142d29719c2eff, 46713bf726160ce248142d29719b6f22, 46713bf726160ce248142d29719be2df I'd rather see links to config urls like the already present
https://cmsweb.cern.ch:8443/couchdb/reqmgr_config_cache/46713bf726160ce248142d29719c1878/configFile

vkuznet commented on August 18, 2024

Slava, I made the necessary changes and deployed a new version of the production server. Now you can see the results directly on cmsweb.cern.ch, see

https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=config+dataset%3D%2FRelValTTbar_14TeV%2FCMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2%2FMINIAODSIM

Please confirm that everything works.

slava77 commented on August 18, 2024

Valentin,
thank you for the update. The links are working for me and the data is present.

I'd still be interested to see the cacheIDs decoded to more human readable strings.
Looking at https://cmsweb.cern.ch/reqmgr2/data/request?outputdataset=/RelValTTbar_14TeV/CMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2/MINIAODSIM
I think that

  • 46713bf726160ce248142d29719c1878 which comes from "DQMConfigCacheID": "46713bf726160ce248142d29719c1878", can appear simply as DQMConfig
  • the remaining IDs are in blocks with TaskName field present, the values of the TaskName would be the most informative in describing the config links
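
As a rough sketch of the labelling proposed in the two points above: a top-level key such as DQMConfigCacheID is shortened to DQMConfig, and an ID found inside a block takes that block's TaskName. The function name and the sample document are hypothetical, and the sketch assumes that blocks are direct children of the request document.

```python
def label_configs(doc):
    """Map each config cache ID to a human-readable label."""
    labels = {}
    for key, value in doc.items():
        if isinstance(value, dict) and "ConfigCacheID" in value:
            # Nested block: prefer its TaskName, fall back to the block's key.
            labels[value["ConfigCacheID"]] = value.get("TaskName", key)
        elif isinstance(value, str) and key.endswith("ConfigCacheID"):
            # Top-level ID: "DQMConfigCacheID" becomes "DQMConfig".
            labels[value] = key[: -len("CacheID")]
    return labels


# Hypothetical document; the task name is a placeholder.
sample = {
    "DQMConfigCacheID": "46713bf726160ce248142d29719c1878",
    "Task1": {
        "TaskName": "ExampleTask",
        "ConfigCacheID": "46713bf726160ce248142d29719b6f22",
    },
}
print(label_configs(sample))
```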

vkuznet commented on August 18, 2024

Slava, as I wrote, since there is no fixed schema I can't be sure which names to pick and apply. For instance, you provide the output of outputdataset, but I also scan inputdataset. Are their schemas the same? I doubt it. The DQMConfigCacheID is part of a string I scan using a regex to match ConfigCacheID. The question is then what it will be called if it is not a DQM config. The structure of the task parts of the dictionary seems to follow some schema and I can extract TaskName, but once again I don't know if it is persistent across all config files for all the different datasets we produce/consume. I think it is a general question for the DMWM team; please discuss this further with @amaltaro, @todor-ivanov, and the rest of the WMCore team.

Bottom line, until we have a fixed and documented schema for all these docs I have no idea how to correctly write the code to extract attributes unknown to me.

slava77 commented on August 18, 2024

OK, fair enough, I hope that the situation with the schema is understood and a good readable choice of config names in the config links will be made.

Thank you for the updates already made.

vkuznet commented on August 18, 2024

@amaltaro this request requires your review, especially since I don't know if the configuration files follow a specific schema. Here it is very important that documents have a static schema, and it is equally important to know up front how certain names are created and where they will appear in a document. DAS always has these issues with different systems which do not provide a static schema. Please review and provide your feedback on how a given configuration should be looked up in the configuration docs, and provide schema definitions for these configuration files.

amaltaro commented on August 18, 2024

I am not sure I understand what is requested here.

WMCore/ReqMgr2 does define a request schema, their data types, how they are supposed to be constructed, default values, etc. However, we support multiple workflow/spec types, and each of them has its own peculiarities (also with a schema defined and enforced). In other words, there are key/value pairs that you will only find in TaskChain, others will only be available in StepChains, and ReReco will also be different. Please have a look at this documentation for further information:
https://github.com/dmwm/WMCore/wiki/Workflow-creation-and-assignment-definition#request-type-dependencies

ConfigCacheID or DQMConfigCacheID is meant to be a unique hash id, which points to a document in central CouchDB. Hence ReqMgr2 provides that id instead of the task/step to which it belongs. It should be fairly easy to rename it in DAS to something like:

if RequestType == ReReco:
    rename config cache id to DataProcessing
elif RequestType == TaskChain:
    rename config cache id to the value in TaskName (note that you need to look at the right task dict)
elif RequestType == StepChain:
    rename config cache id to the value in StepName (note that you need to look at the right step dict)
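
The pseudocode above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the DAS or WMCore implementation: it assumes a ReReco request carries a top-level ConfigCacheID, that TaskChain requests hold their tasks in Task1, Task2, ... dicts (as mentioned earlier in the thread), and that StepChain requests use analogous Step1, Step2, ... dicts. The function name is hypothetical.

```python
def config_labels(request):
    """Return {config cache id: label} following the renaming rule above."""
    rtype = request.get("RequestType")
    if rtype == "ReReco":
        cache_id = request.get("ConfigCacheID")
        return {cache_id: "DataProcessing"} if cache_id else {}
    if rtype == "TaskChain":
        prefix, name_key = "Task", "TaskName"
    elif rtype == "StepChain":
        prefix, name_key = "Step", "StepName"
    else:
        return {}  # other request types are not covered by this sketch

    labels = {}
    for key, value in request.items():
        # Only TaskN / StepN dicts are inspected (e.g. Task1, Step2).
        is_block = key.startswith(prefix) and key[len(prefix):].isdigit()
        if is_block and isinstance(value, dict):
            cache_id = value.get("ConfigCacheID")
            if cache_id:
                labels[cache_id] = value.get(name_key, key)
    return labels


# Hypothetical TaskChain document with placeholder values.
example = {
    "RequestType": "TaskChain",
    "TaskChain": 1,
    "Task1": {"TaskName": "ExampleTask", "ConfigCacheID": "abc123"},
}
print(config_labels(example))
```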

vkuznet commented on August 18, 2024

@amaltaro thanks for providing this info, this is what is required. In this case, where can I find schema definitions for each individual workflow? Does the diagram you showed represent all possible workflows in the system, or are there others? Are naming conventions fixed, i.e. the use of CamelCase, like ConfigCacheID or DQMConfigCacheID? Where can I find all declared attributes of the schema? Does this area https://github.com/dmwm/WMCore/tree/master/doc/createSpecs represent all existing schema files? Once I have all the answers and schema definitions I can proceed with the implementation.

vkuznet commented on August 18, 2024

@slava77, I deployed a new version on the production cluster which now shows IDs together with the corresponding task/request name. It is shown like this:

Config urls: output-config-0 Request ids: 46713bf726160ce248142d29719be2df (RecoPU_2021PU), 46713bf726160ce248142d29719c2eff (Nano_2021PU), 46713bf726160ce248142d29719b6f22 (DigiPU_2021PU), 46713bf726160ce248142d29719c1878 (DQMConfigCacheID)

You may check your URL. Please confirm that now everything works and I can close the ticket.

slava77 commented on August 18, 2024

looking at
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=config+dataset%3D%2FRelValTTbar_14TeV%2FCMSSW_12_1_0_pre4-PU_121X_mcRun3_2021_realistic_v10_HighStat-v2%2FMINIAODSIM

I see
[image]
This is nice.

however the URLs are apparently malformed "https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/46713bf726160ce248142d29719be2df%20(RecoPU_2021PU)/configFile"
should instead be "https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/46713bf726160ce248142d29719be2df/configFile"

I would drop the hash from the hyperlink text and have just RecoPU_2021PU
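
A minimal sketch of the fix suggested above: split a displayed entry of the form "hash (name)" so that only the hash goes into the URL and only the name is used as the link text. The helper name is hypothetical, and the 32-hex-character pattern is an assumption based on the IDs shown in this thread.

```python
import re

CONFIG_CACHE_URL = "https://cmsweb.cern.ch/couchdb/reqmgr_config_cache"
# Assumed entry format: 32 hex characters, optionally followed by " (name)".
LABELLED_ID = re.compile(r"^([0-9a-f]{32})(?:\s+\((.+)\))?$")


def config_link(entry):
    """Split 'hash (name)' into (link text, URL); the URL uses the hash only."""
    match = LABELLED_ID.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised config entry: {entry!r}")
    cache_id, name = match.groups()
    # Fall back to the hash as link text when no name is attached.
    return name or cache_id, f"{CONFIG_CACHE_URL}/{cache_id}/configFile"


print(config_link("46713bf726160ce248142d29719be2df (RecoPU_2021PU)"))
```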

Another thing: the ReqMgr info link has a URL pointing to https://cmsweb-k8s-prod.cern.ch/reqmgr2; is this the standard now or just a test instance?

vkuznet commented on August 18, 2024

Thanks for spotting the issue with the link, I didn't check it explicitly. Now it is fixed and I removed the extra hashes from the link name. Please check and report again. Regarding the URL for ReqMgr info, internally we do run it now on k8s and the link is correct, but I need to double check where we generate it to confirm whether we should point it to cmsweb or cmsweb-k8s-prod. I'll do it later.

slava77 commented on August 18, 2024

@vkuznet
Thank you for the update
The updates look good.

One minor thing I noticed is that there is some duplicate information now: the URL output-config-0 has the same information as DQMConfigCacheID; note also the difference with :8443 and without it in the two cases, respectively.

vkuznet commented on August 18, 2024

@slava77, thanks for checking. I need to decide now if we need to show Config urls since we have the Request ids info. I don't know yet if the new info covers all output config urls. I need to check with a different set of data. And I'll fix the port number too.

amaltaro commented on August 18, 2024

> https://github.com/dmwm/WMCore/tree/master/doc/createSpecs

@vkuznet Hi Valentin, yes, this is the right place to see the request schema. It is likely missing the recent GPU* parameters though, I will have to update it in the coming days.

> Does the diagram you showed represent all possible workflows in the system, or are there others?

yes, StoreResults is planned to be deprecated though. So it's up to you if you want to support old workflows or not (likely less than 10 such workflows every year).

> Are naming conventions fixed, i.e. the use of CamelCase, like ConfigCacheID or DQMConfigCacheID?

yes, for spec attributes, we always use upper camel case.

vkuznet commented on August 18, 2024

@slava77 regarding the cmsweb-k8s-prod link to ReqMgr: it is complicated, and was introduced when the DMWM team decided to separate cmsweb into two entities, one for end users and another for production tools. Since DAS uses maps which hold the service URLs, the direct links to services like DBS or ReqMgr now point to cmsweb-k8s-prod, and internally DAS queries services via these URLs. For all other URLs, like another DAS query, all links remain pointing to cmsweb. I don't want to write an additional layer of redirection, and the current links represent correct URLs, i.e. a direct service link points to the service on the production cluster, and a link that runs another query uses the DAS URL (which points to cmsweb).

I think this ticket can be closed, though I still need time to investigate whether the output-config-XXX links can be removed. Please confirm that we can close this ticket.

slava77 commented on August 18, 2024

> Please confirm that we can close this ticket.

I'm fine to have this closed.
