cmsdaq / hltd
The Fcube hlt daemon
License: GNU Lesser General Public License v3.0
It would be useful to be able to tell the FUs of a given BU to unmount its disks. The use case: before rebooting a BU, the BU disks must be unmounted on the FUs first to avoid stale NFS mounts. It would be nice if the hltd on the BU could take care of that, either via a special command (e.g. 'hltd umount') or when the hltd on the BU is shut down.
Remi
elasticsearch-py is the official Python library from the elasticsearch team:
https://github.com/elasticsearch/elasticsearch-py.
It should be better supported and maintained.
If we use an alias to point to the runindex, elasticbu.py does not realize that the index already exists, because the server returns an InvalidIndexNameException instead of an IndexAlreadyExistsException.
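A tolerant check could treat both server responses as "already exists". A minimal sketch (helper name hypothetical, error strings as returned by ES 1.x):

```python
# Hypothetical helper for elasticbu.py: treat both exception names that the
# server can return for an existing index/alias as "already exists".
EXISTS_MARKERS = ("IndexAlreadyExistsException", "InvalidIndexNameException")

def index_exists_error(error_text):
    """True if the server error means the index (or an alias) already exists."""
    return any(marker in error_text for marker in EXISTS_MARKERS)
```

Matching InvalidIndexNameException is deliberately loose: ES also uses it for genuinely malformed names, so in practice one may want to additionally check for the "alias with the same name" message text.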
Fix for a problem occasionally seen in daqval where a file move to the output directory would fail. The fix captures the thrown exception and retries.
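A minimal sketch of such a capture-and-retry loop (function name and retry policy are illustrative, not the actual hltd code):

```python
# Illustrative retry wrapper for the occasionally failing file move seen in
# daqval: capture the exception and retry a few times before giving up.
import shutil
import time

def move_with_retry(src, dst, attempts=3, delay=0.5):
    """Move src to dst, retrying on transient OSError/IOError."""
    for attempt in range(1, attempts + 1):
        try:
            shutil.move(src, dst)
            return True
        except (OSError, IOError):
            if attempt == attempts:
                raise  # out of retries: propagate the original failure
            time.sleep(delay)
```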
New development in CMSSW enables output of HLT path rates in json files (with a jsd definition). This requires implementing parsing of these files and their insertion into elasticsearch by the elastic monitor on the FUs.
After yesterday's hltd update, 3 FUs which should not be used came back alive and created box info files. The hltd on the BU failed to properly talk to them. One should devise a way to inhibit the start of hltd on "blacklisted" FUs.
The currently used pyinotify library is not very performant.
hltd could be ported to a different library, python-inotify.
A blog post by the author outlines performance problems seen with pyinotify:
http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/comment-page-1/
In tests it was observed that, with a large number of files written and deleted (~10k/sec), pyinotify uses approximately 50% of a CPU, while the equivalent python-inotify code uses less than 20%. This was measured with hltd code, with a return placed at the point where events collected by the library would be processed by hltd (anelastic.py).
The one piece of extra work is a lightweight wrapper around the new library (running a thread that waits for events), as the library is more low-level than pyinotify.
The license of python-inotify is LGPLv2, so it is fine to use from a legal standpoint.
Note: the new library had a memory leak in its C bindings, which was found and fixed.
To reduce the number of box info documents, FU information is aggregated and injected into the central ES as part of the BU boxinfo document. An additional field is added (a mapping update is needed for this to become effective) listing the hosts from which information is collected. Additionally, a unique "boxinfo_last" document can be switched on for each BU; this document is replaced each time an update is made from the same BU.
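The aggregation can be sketched as follows, assuming per-FU documents of numeric counters (field names are hypothetical, not the actual hltd schema):

```python
# Sketch of FU boxinfo aggregation into a single BU document: numeric
# counters are summed and the contributing hosts are recorded in the
# additional "fu_hosts" field (name hypothetical).
def aggregate_boxinfo(fu_docs):
    """fu_docs: {hostname: {field: number}} -> single BU boxinfo document."""
    summary = {"fu_hosts": sorted(fu_docs)}
    for doc in fu_docs.values():
        for field, value in doc.items():
            summary[field] = summary.get(field, 0) + value
    return summary
```

The same summary could be re-emitted as the "boxinfo_last" document, overwriting the previous one per BU.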
For aggregated information on runs, and for monitoring that is not run dependent, we are introducing an elasticsearch index stored on a separate cluster.
At present it collects information on: run directory creation and EoR file appearance on the BU, system monitoring of the BU and FUs, and end-of-lumi files appearing on the BU.
All documents are tagged with a per-BU (or in some cases per-FU) id so that information can be tracked per appliance or box.
The index is created and/or filled by hltd on the BU. A special configuration parameter, elastic_runindex_url, is added to hltd.conf.
A pending CMSSW release will add a checksum field in the output json and a ".checksum" field for the micro-merged dat file, which will be included in the merged json file.
With output_adler32=true in the [General] section of hltd.conf, the anelastic service will verify this checksum on the memory buffer used to move the file to the output directory.
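The verification amounts to computing adler32 over the in-memory buffer and comparing it to the value from the json. A sketch, assuming the checksum is carried as an unsigned 32-bit integer:

```python
# Sketch of the adler32 check on the in-memory buffer; zlib.adler32 is the
# same algorithm that produces the ".checksum" field.
import zlib

def verify_adler32(buffer_bytes, expected):
    """Return True if the buffer's adler32 matches the expected checksum."""
    # mask to an unsigned 32-bit value for a stable comparison
    return (zlib.adler32(buffer_bytes) & 0xFFFFFFFF) == expected

data = b"event payload"
checksum = zlib.adler32(data) & 0xFFFFFFFF
print(verify_adler32(data, checksum))          # True for a clean buffer
print(verify_adler32(b"corrupted", checksum))  # False on mismatch
```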
Review the directories/files watched by inotify. Remove any unnecessary watches and callbacks.
Currently micro-merging is done by each CMSSW process appending its output to the merged file at the CMSSW end-of-lumi event. As events are not buffered by the process but written to disk during the lumisection, this requires reading the data back from disk and writing it to the merged file, creating additional disk I/O.
This step could be skipped by delegating the merging to the hltd esCopy function. Internally it opens a file for writing and appends the contents of the single merged file from the local disk. Instead, the function could be modified to read the contents of multiple files and merge them to the BU mount point on the fly (similar to option "B" in the mini/macro merger).
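The modified function could be sketched as a chunked multi-file append (names are illustrative; the real esCopy also handles json bookkeeping):

```python
# Sketch of the proposed esCopy change: instead of copying one pre-merged
# file, read several per-process output files and append them to the
# destination on the BU mount point in one pass, chunked to limit memory use.
def merge_files(sources, destination, chunk=1024 * 1024):
    """Append the contents of each source file to destination, in order."""
    with open(destination, "ab") as out:
        for path in sources:
            with open(path, "rb") as src:
                while True:
                    block = src.read(chunk)
                    if not block:
                        break
                    out.write(block)
```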
Since the completeness check at all levels of merging is information we will want to query constantly and routinely, it makes sense to calculate the complete flags (or percentages) for all streams systematically at the time we run the collection of other statistics, in the river plugin. This will at the same time lighten the task of the GUI and/or the server by providing pre-calculated information. For clarity, we might want to run the completeness check in a separate river plugin since, unlike the microstate-and-stream-rate plugin, it has to access the central server for both read and write.
Availability of HLT nodes for the cloud requires cooperation between the mechanisms for starting CMSSW jobs built into hltd and a service which runs VM instances on the same machines.
An external tool, possibly integrated with the LevelZero FM, will be used to control which FU nodes should stop HLT and switch to cloud mode. The tool will then directly contact hltd on the nodes using the cgi interface.
From hltd version 1.6.0, an API is available to allow taking an FU out of HLT. Activation is done in a similar way to new-run notifications: a cgi script creates a file in the watch directory (standard location '/fff/data'). The request needs to be sent to hltd (by default on port 9000) using the following form of URL: http://host:9000/cgi-bin/exclude_cgi.py
The FUs will then determine the last lumisection completed on the BU and signal the CMSSW processes to finish within two additional lumisections. During this time hltd switches into "activatingCloud" mode and stops accepting new run-start events from the BU hltd. The available resources are masked in the box file so that the BU stops requesting data for machines in switch-over mode. Once the CMSSW jobs and local merge scripts have finished, all core resources are moved to /etc/appliance/resources/cloud and finally the FU switches to the "cloud" mode, at which point virtual machines can run.
Since a recent conclusion was to activate VM startup through hltd, one possibility is to run a script which signals the local cloud service ("cloud tool") that VMs can be started (once the CMSSW jobs are finished).
An "include" interface is also provided in hltd 1.6.0. However, currently it only returns core files to their usual place and allows hltd to accept new HLT runs.
I propose to modify this interface so that it executes a script/command communicating to the cloud tool to stop VMs ("include_cgi.py") before the switchover to HLT is completed. The script called by hltd can be synchronous, i.e. it returns only when the VMs are shut down (note that cgi calls are still asynchronous: they create a file which triggers an action in hltd and immediately finish).
In addition, hltd will update a file providing the name of the mode hltd is currently in. This file could be polled for any mode change by the cloud service (if necessary). I propose the file location "/fff/data/mode", with the following possible modes: "HLT", "activatingCloud", "deactivatingCloud", "cloud".
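A sketch of writing and polling such a mode file (path and mode names from the proposal above; the atomic-rename detail is an assumption, added so a polling cloud service never observes a half-written mode string):

```python
# Sketch of the proposed "/fff/data/mode" file: write via rename so the
# polling reader always sees a complete mode string.
import os

MODES = ("HLT", "activatingCloud", "deactivatingCloud", "cloud")

def write_mode(mode, path="/fff/data/mode"):
    if mode not in MODES:
        raise ValueError("unknown mode: %s" % mode)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(mode)
    os.rename(tmp, path)  # atomic replacement on the same filesystem

def read_mode(path="/fff/data/mode"):
    with open(path) as f:
        return f.read().strip()
```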
Wrong variable name used ("putq" instead of "postq")
Logging of hltd and spawned scripts is managed using logrotate. All logs from the hltd package are moved to the /var/log/hltd directory.
stdout and stderr are captured by the logger in both the main hltd and spawned scripts.
HLT logs are now found in /var/log/hltd/pid and appended with the run number to make it easier to distinguish job logs from different runs.
CMSSW stderr and stdout are redirected to log files located in /var/log/hltd/pid.
A new script was developed to scan this directory and parse the output of appearing log files. The script is started as a child process by hltd.
Messages are parsed into json documents which contain the category, severity, module name, instance, function call, framework state, timestamp and message content of the logs. Recognized messages are those produced by the MessageLogger (DEBUG, INFO, WARNING, ERROR and FATAL) as well as stack traces from a crash (considered FATAL). Framework report information is currently ignored.
Messages are also scanned to calculate a "lexicalId" hash which can be used, for example, for rate reduction of similar messages later in the chain.
Depending on the "es_cmssw_log_level" parameter in hltd.conf, a threshold is set for the minimum log level to store in elasticsearch.
Presently the same index as for other run-related information is used; this can be changed if necessary.
A tool "test/logprint.py" is also provided, which does time-window queries in elasticsearch and prints the collected messages in a way similar to the Handsaw tool.
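The document building and the lexicalId idea can be sketched as follows; the severity levels follow the MessageLogger, but the digit-stripping used for the hash and the document fields shown are simplifying assumptions, not the exact hltd parser:

```python
# Sketch: turn a CMSSW log message into a json-ready document with a
# "lexicalId" hash, dropping messages below the configured threshold.
import hashlib
import re

SEVERITIES = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "FATAL": 4}

def lexical_id(message):
    """Hash of the message with run-specific tokens (numbers) removed,
    so repeats of the same message class share one id."""
    skeleton = re.sub(r"\d+", "#", message)
    return hashlib.md5(skeleton.encode()).hexdigest()[:8]

def to_document(severity, category, message, threshold="WARNING"):
    """Return an elasticsearch document, or None if below the threshold."""
    if SEVERITIES[severity] < SEVERITIES[threshold]:
        return None
    return {
        "severity": severity,
        "category": category,
        "message": message,
        "lexicalId": lexical_id(message),
    }
```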
Handle leftover files from crashed CMSSW processes. The meta-information should enable the mergers to handle the rest of the successfully processed events. The input files of the crashed processes should be put into a special place, to be defined.
New filename schema: runX_lsY_type_otherstuff.ext
This schema should be respected by every file type, including EoLS and EoR files.
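A sketch of a parser for this schema (helper name and the exact treatment of the "otherstuff" part are assumptions):

```python
# Sketch parser for filenames of the form runX_lsY_type_otherstuff.ext;
# EoLS/EoR files share the runX_lsY prefix with no trailing "otherstuff".
import re

FILENAME_RE = re.compile(r"^run(\d+)_ls(\d+)_([^_]+)(?:_(.*))?\.([^.]+)$")

def parse_fff_filename(name):
    m = FILENAME_RE.match(name)
    if m is None:
        return None  # not a schema-conforming filename
    run, ls, ftype, rest, ext = m.groups()
    return {"run": int(run), "ls": int(ls), "type": ftype,
            "suffix": rest, "ext": ext}
```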
The hltd service should not be started until configured by the fffmeta rpm. A switch is going to be added that disables the service by default until modified by the configuration script in the meta package.
Presently the information on the state of CPU resource usage is available through box info files updated by each FU in the ramdisk. It was proposed that the BU hltd should instead summarize this into a number of available resources and provide it to consumers (the BU application).
In the updated version, a file /fff/ramdisk/appliance/resource_summary (a JSON file) is written, containing also other summarized information (taking care that it is taken only from box files updated within the last 10 s). For example:
{
"ramdisk_occupancy": 0.32000000000000001,
"active_resources": 1,
"activeFURun": 127042,
"activeRunNumQueuedLS": 0,
"broken": 0,
"idle": 0,
"used": 1,
"cloud": 0
}
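The aggregation including the 10-second staleness cut can be sketched like this (the box file layout is assumed, not the actual hltd format):

```python
# Sketch of building the resource_summary counters from per-FU box files,
# skipping entries not updated within the last 10 seconds.
import time

STALE_AFTER = 10  # seconds

def summarize(box_infos, now=None):
    """box_infos: list of dicts with 'timestamp' plus 'idle', 'used',
    'broken', 'cloud' counters. Returns the aggregated summary."""
    now = time.time() if now is None else now
    summary = {"idle": 0, "used": 0, "broken": 0, "cloud": 0,
               "active_resources": 0}
    for box in box_infos:
        if now - box["timestamp"] > STALE_AFTER:
            continue  # FU box file too old: ignore it
        for key in ("idle", "used", "broken", "cloud"):
            summary[key] += box[key]
        summary["active_resources"] += box["idle"] + box["used"]
    return summary
```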
When the HLT generates a very large number of error log messages, ES can be brought into a state where it is no longer able to handle the transactions. As a result, the appliance cluster stops responding to other requests and the logCollector accumulates messages in memory (resident sizes over 2 GB have been seen). We need to implement a mechanism to a) stop logging the same message when it repeats more than n times and b) drop log messages if we get an error on the transaction. It might also be advisable to handle log messages with a bulk insert to minimise the impact on cpu/memory.
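Point a) can be sketched as a small per-message counter keyed, for instance, by the lexicalId computed for each message (class and parameter names are hypothetical):

```python
# Sketch of the proposed per-message rate limiting: once a message key has
# been seen more than `limit` times, further copies are dropped instead of
# being sent to elasticsearch.
from collections import defaultdict

class LogThrottle:
    def __init__(self, limit=10):
        self.limit = limit
        self.counts = defaultdict(int)

    def accept(self, key):
        """Return True while this message key is below its repeat limit."""
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Messages passing the throttle would then be queued and flushed with a bulk insert, and dropped outright when the transaction returns an error.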
At startup of a CMSSW process, currently a single CPU core resource is assigned to the process. Changes are required to configure how many cores to assign to a single multi-threaded process and to configure the CMSSW multi-threading level correspondingly.
This can be done by waiting for the appropriate number of CPU resource core files to appear in the "idle" directory, then moving them to "online" and spawning a process with the appropriate number-of-threads parameter passed on the cmsRun command line. The parameter modifies the process options in the CMSSW python configuration, setting numberOfStreams and numberOfThreads. The default value will be 1 thread/stream (single-threaded behavior).
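A sketch of the core-file bookkeeping (directory layout from the text; the cmsRun parameter name is a placeholder):

```python
# Sketch of multi-core resource assignment: wait until enough core files are
# present in "idle", move them to "online", and build the cmsRun command
# with the matching thread count.
import os

def grab_cores(idle_dir, online_dir, ncores):
    """Move ncores resource files from idle to online; return their names,
    or None if not enough cores are available yet."""
    available = sorted(os.listdir(idle_dir))
    if len(available) < ncores:
        return None  # caller keeps waiting for more core files to appear
    taken = available[:ncores]
    for name in taken:
        os.rename(os.path.join(idle_dir, name), os.path.join(online_dir, name))
    return taken

def cmsrun_command(config, nthreads):
    # "nThreads" is a placeholder for the parameter that sets
    # numberOfStreams/numberOfThreads in the CMSSW python configuration
    return ["cmsRun", config, "nThreads=%d" % nthreads]
```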
There are requirements to save on hardware by running, for example, multiple minidaq systems on common hardware. Since hltd is a system service, in its current form it does not support running multiple instances.
This schema is proposed:
Experimental support for this feature is implemented here:
https://github.com/smorovic/hltd/tree/multi-instances%2Bcloud
When using more than one mount point on the FUs for the BU disks, hltd only unmounts the first mount point. It then fails to remount the 2nd mount point:
INFO:2014-08-26 18:59:59 - cleanup_mountpoints: found following mount points
INFO:2014-08-26 18:59:59 - ['/fff/BU0']
INFO:2014-08-26 18:59:59 - trying umount of /fff/BU0
INFO:2014-08-26 18:59:59 - found BU to mount at bu-c2e18-27-01.daq2fus1v0.cms
INFO:2014-08-26 18:59:59 - trying to mount bu-c2e18-27-01.daq2fus1v0.cms:/ /fff/BU0/ramdisk
INFO:2014-08-26 18:59:59 - trying to mount bu-c2e18-27-01.daq2fus1v0.cms: /fff/BU0/output
INFO:2014-08-26 18:59:59 - found BU to mount at bu-c2e18-27-01.daq2fus1v1.cms
INFO:2014-08-26 18:59:59 - trying to mount bu-c2e18-27-01.daq2fus1v1.cms:/ /fff/BU1/ramdisk
ERROR:2014-08-26 18:59:59 - Command '['mount', '-t', 'nfs4', '-o', 'rw,noatime,vers=4,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,noac', 'bu-c2e18-27-01.daq2fus1v1.cms:/fff/ramdisk', '/fff/BU1/ramdisk']' returned non-zero exit status 32
Traceback (most recent call last):
File "/opt/hltd/python/hltd.py", line 185, in cleanup_mountpoints
os.path.join('/'+conf.bu_base_dir+str(i),conf.ramdisk_subdirectory)]
File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['mount', '-t', 'nfs4', '-o', 'rw,noatime,vers=4,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,noac', 'bu-c2e18-27-01.daq2fus1v1.cms:/fff/ramdisk', '/fff/BU1/ramdisk']' returned non-zero exit status 32
CRITICAL:2014-08-26 18:59:59 - Unable to mount ramdisk - exiting.
In some cases index creation fails and the index is later created dynamically when the first document is indexed.
This creates the index without the necessary mapping and settings.
By specifying a default template, all of this can be handled by elasticsearch automatically whenever an index is created. FU IP allocation still needs to be applied later because it is specific to the machine which indexes the document.
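An illustrative ES 1.x index template; the pattern, shard counts and field names are placeholders, not the actual runindex mapping. Registered once (e.g. with a PUT to /_template/runindex), it is applied automatically to every matching index the server creates, including dynamically created ones:

```json
{
  "template": "runindex_*",
  "settings": {
    "number_of_shards": 16,
    "number_of_replicas": 1
  },
  "mappings": {
    "run": {
      "properties": {
        "runNumber": {"type": "integer"},
        "startTime": {"type": "date"}
      }
    }
  }
}
```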
It could be useful to define some rules about log files.
There are 2 main questions to settle:
1- Log destination (I suggest /var/log/hltd/)
2- Format:
In my python script I used the logging library with this configuration:
    logging.basicConfig(filename="/tmp/anelastic.log",
                        level=logging.INFO,
                        format='%(levelname)s:%(asctime)s-%(name)s.%(funcName)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
and each class creates its own logger:
    def __init__(self):
        self.logger = logging.getLogger(self.__class__.__name__)
This results in entries like this in the log file (type-date-class-function-message):
INFO:2014-03-19 17:51:44-LumiSectionHandler.processDATFile - *message*
We need to create "fake stream" json files that store information about events not processed due to process crashes.
To allow the merger group to handle this stream as closely to a normal stream as possible, we need to generate the following files:
A json file for each LS containing the number of events processed, the number of events not processed, and the list of unprocessed raw files.
A proper .ini file.
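A sketch of generating the per-LS json (key names and the stream name are assumptions, not the final schema):

```python
# Sketch of the per-LS "fake stream" json for events lost to crashes: the
# processed / not-processed counts plus the list of unprocessed raw files.
import json

def fake_stream_ls_json(run, ls, processed, not_processed, raw_files):
    doc = {
        "processed": processed,
        "notProcessed": not_processed,
        "rawFiles": list(raw_files),
    }
    # "streamError" is an illustrative stream name following the
    # runX_lsY_type filename schema
    name = "run%06d_ls%04d_streamError.jsn" % (run, ls)
    return name, json.dumps(doc)
```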
To automatically configure the DAQ cluster, a separate "meta" rpm package was created. It depends on hltd and elasticsearch, and will trigger reconfiguration when those are updated.
The rpm build and the integrated configuration scripts are presently found in the hltd git. Versioning of the rpm follows the hltd versioning schema.
The script contained in the rpm can detect whether it is executed on a BU or an FU, and on which cluster (daqval, prod); presently this is based on hostname naming conventions. On FUs, the script will connect to the HWCfg DB to retrieve the DNS name of the BU data interface for mounting the NFS ramdisk/output area. This is currently only supported on daqval until the production HWCfg DB is ready.
The package will also set up the elasticsearch parameters accordingly: an appliance cluster setup where the BU is a master without data, and the slaves on the FUs hold the data. Unicast is used for discovery between master and slaves in the appliance.
The full cluster should be properly configured without manual intervention using the package. For this, the requirement is that the BU machine is booted when the script runs on an FU. Also, the equipment set with the proper information must be present in the database, or the configuration will fail.
In addition, the package includes an init script which will run the configuration script at each boot ('refresh'), prior to starting the hltd and elasticsearch services.
The package is built by running scripts/metarpm.sh
We had an issue where some action by puppet on the meta rpm apparently generated a botched modification of hltd.conf. We need to keep track of the last modification by adding a timestamp to the "edited by fff meta rpm" comment line at the top of hltd.conf.