
asyncstageout's People

Contributors

bbockelm, belforte, cinquo, dciangot, dtnrm, juztas, mmascher, spigad

asyncstageout's Issues

Store the last query time of database source

The last query time of the database source should be stored somewhere. When the Async.StageOut server restarts after a crash or shutdown, it can then resume querying the DB source from that time.
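Below is a minimal sketch of persisting that timestamp. It assumes a small JSON state file is acceptable. The file location and the names `save_last_query_time`/`load_last_query_time` are hypothetical and do not come from existing AsyncStageOut code. The value is written to a temporary file that is then renamed over the old one, so a crash in the middle of a write does not leave a corrupt value behind:

```python
import json
import os
import tempfile

# Hypothetical default location; a real component would read this from its configuration.
DEFAULT_STATE_FILE = os.path.join(tempfile.gettempdir(), "asyncstageout_last_query.json")


def save_last_query_time(timestamp, path=DEFAULT_STATE_FILE):
    """Persist the time of the last successful DB source query."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"last_query_time": timestamp}, handle)
    # Rename over the old file so readers never see a half-written state file.
    os.replace(tmp_path, path)


def load_last_query_time(path=DEFAULT_STATE_FILE, default=0):
    """Return the stored timestamp, or `default` on first start or unreadable state."""
    try:
        with open(path) as handle:
            return json.load(handle)["last_query_time"]
    except (OSError, ValueError, KeyError):
        return default
```

On startup, the server would query the source for documents newer than `load_last_query_time()`. After each successful poll it would call `save_last_query_time(time.time())`. Note that `os.replace` needs Python 3.3 or later.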

change import references

While building the RPM we realized that all the Python code must be changed to add the AsyncTransfer package to the imports, e.g.

from TransferWorker import TransferWorker
to
from AsyncTransfer.TransferWorker import TransferWorker
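The change could be scripted instead of editing every file by hand. Below is a rough sketch that assumes the list of AsyncTransfer modules is known; the names in `LOCAL_MODULES` other than TransferWorker are hypothetical. It only handles the `from X import Y` form shown above, so plain `import X` statements would need separate treatment:

```python
import re

# Hypothetical set of module names that live inside the AsyncTransfer package.
PACKAGE = "AsyncTransfer"
LOCAL_MODULES = {"TransferWorker", "TransferDaemon"}

# Matches "from Module import ..." at the start of a line (optionally indented).
FROM_IMPORT = re.compile(r"^([ \t]*)from[ \t]+(\w+)([ \t]+import[ \t]+)", re.MULTILINE)


def add_package_prefix(source, package=PACKAGE, modules=LOCAL_MODULES):
    """Rewrite `from Module import X` as `from package.Module import X` for local modules."""
    def fix(match):
        indent, module, rest = match.groups()
        if module not in modules:
            return match.group(0)  # leave stdlib and third-party imports alone
        return "%sfrom %s.%s%s" % (indent, package, module, rest)
    return FROM_IMPORT.sub(fix, source)
```

Already-prefixed imports do not match the pattern, because `\w+` stops at the dot. Running the script twice is therefore harmless.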

Calculate long term averages in a view

Currently the statistics daemon calculates min/max/average times for transfers to provide a running average (thanks for the clarification Hassen).
There should be a view that takes the min/max values reported and calculates long term averages, e.g. the min/max/avg for a day/week/month. This should be something like
{{{
function(doc) {
  if (doc.timing) {
    emit("min_transfer_duration", doc.timing.min_transfer_duration);
    emit("max_transfer_duration", doc.timing.max_transfer_duration);
  }
}
}}}
with a {{{_stats}}} reduce.
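For the emitted numeric values, CouchDB's built-in `_stats` reduce returns `sum`, `count`, `min`, `max` and `sumsqr`. When CouchDB re-reduces, it combines partial results of this kind. The Python sketch below mimics those fields offline to show how a long-term average and spread follow from the reduced value. It is an illustration only, not CouchDB code:

```python
def couch_stats(values):
    """Return the same fields CouchDB's _stats reduce produces for numeric values."""
    values = list(values)
    if not values:
        raise ValueError("_stats needs at least one value")
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(v * v for v in values),
    }


def combine(*partials):
    """Merge partial _stats results, as happens when CouchDB re-reduces."""
    return {
        "sum": sum(p["sum"] for p in partials),
        "count": sum(p["count"] for p in partials),
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "sumsqr": sum(p["sumsqr"] for p in partials),
    }


def mean_and_variance(stats):
    """Derive the long-term average and (population) variance from a _stats result."""
    mean = stats["sum"] / float(stats["count"])
    variance = stats["sumsqr"] / float(stats["count"]) - mean ** 2
    return mean, variance
```

`combine` only adds counts and sums and takes the min/max. Per-day results can therefore be merged into weekly or monthly figures without revisiting the individual documents.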

Multiuser support in the AsyncStageOut

Currently the transfers are done using the proxy of the AsyncStageOut server operator. User files should instead be transferred using each user's own proxy.
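Below is a sketch of one possible approach. It assumes each user's proxy is already available as a file; how the proxy is retrieved is out of scope here. The pending transfers are grouped by user, and each user's commands are launched with `X509_USER_PROXY` pointing at that user's proxy. The helper names and the `"user"` field are hypothetical:

```python
import os
import subprocess


def proxy_environment(proxy_path, base_env=None):
    """Return a copy of the environment that points grid tools at `proxy_path`."""
    env = dict(os.environ if base_env is None else base_env)
    # Grid clients conventionally read the proxy location from X509_USER_PROXY.
    env["X509_USER_PROXY"] = proxy_path
    return env


def group_by_user(transfers):
    """Group transfer dicts by their (hypothetical) "user" field."""
    grouped = {}
    for transfer in transfers:
        grouped.setdefault(transfer["user"], []).append(transfer)
    return grouped


def run_with_user_proxy(command, proxy_path):
    """Run one transfer command under the user's proxy instead of the operator's."""
    return subprocess.call(command, env=proxy_environment(proxy_path))
```

Copying the environment rather than modifying `os.environ` keeps concurrent transfers for different users from interfering with each other.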

New parameters in the AsyncStageOut db

We need to add the following parameters to the AsyncStageOut db:

  • Size of file
  • FTS server used to transfer this file.
  • The time when the transfer was done/failed (by adding a start_time and end_time).
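Below is an illustrative sketch of a files_database document that carries these fields. The field names and the intermediate `"submitted"` state are assumptions, not the current schema:

```python
import time


def new_file_doc(lfn, user, size, fts_server):
    """Build a files_database document with the proposed extra fields (illustrative names)."""
    return {
        "lfn": lfn,
        "user": user,
        "size": size,              # file size in bytes
        "fts_server": fts_server,  # FTS endpoint used for this file
        "state": "new",
        "start_time": None,        # set when the transfer is submitted
        "end_time": None,          # set when the transfer is done or has failed
    }


def mark_started(doc, now=None):
    """Record when the transfer was submitted."""
    doc["state"] = "submitted"
    doc["start_time"] = time.time() if now is None else now
    return doc


def mark_finished(doc, state, now=None):
    """Record the final state ("done" or "failed") and when it was reached."""
    if state not in ("done", "failed"):
        raise ValueError("unexpected final state: %r" % state)
    doc["state"] = state
    doc["end_time"] = time.time() if now is None else now
    return doc
```

With both timestamps stored, the transfer duration is simply `end_time - start_time`. The statistics views could also group by `fts_server`.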

Commands should log to own log files

Commands (ftscp, srmcp, lcg-ls etc.) should all log to their own log files. The component log file should show:

{{{
logger.info("Transfer completed with return code %s, detailed logs in "
            "%s and %s" % (rc, stdout_log, stderr_log))
}}}

or similar. Log files should be in an appropriate directory, e.g.:

{{{
$AGENT_LOGS/$USER/$DESTINATION/$TIMESTAMP-$COMMAND.std{err,out}
}}}
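Below is a minimal sketch of this layout. It assumes the log root is passed in (in the agent it would presumably come from $AGENT_LOGS) and that a UTC YYYYMMDD-HHMMSS timestamp is acceptable. `log_paths` and `run_logged` are hypothetical helpers, not existing functions:

```python
import logging
import os
import subprocess
import time


def log_paths(log_root, user, destination, command, timestamp=None):
    """Build <log_root>/<user>/<destination>/<timestamp>-<command>.std{out,err} paths."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(timestamp))
    directory = os.path.join(log_root, user, destination)
    prefix = os.path.join(directory, "%s-%s" % (stamp, os.path.basename(command)))
    return directory, prefix + ".stdout", prefix + ".stderr"


def run_logged(args, log_root, user, destination, logger=None):
    """Run a transfer command, sending its output to dedicated files."""
    logger = logger or logging.getLogger(__name__)
    directory, stdout_log, stderr_log = log_paths(log_root, user, destination, args[0])
    os.makedirs(directory, exist_ok=True)
    with open(stdout_log, "w") as out, open(stderr_log, "w") as err:
        rc = subprocess.call(args, stdout=out, stderr=err)
    # Only this one-line summary goes to the component log.
    logger.info("Transfer completed with return code %s, detailed logs in %s and %s",
                rc, stdout_log, stderr_log)
    return rc, stdout_log, stderr_log
```

Two runs of the same command for the same user and destination within one second would share a file name. A real implementation might therefore add a finer timestamp or a counter.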

Request time is not updated in WorkQueue Elements

Hi Stuart,
Could you take a look at this? I don't see anything wrong in the code below.
If you are busy, I will look at it tomorrow.

https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/src/python/WMCore/WorkQueue/Database/MySQL/WorkQueueElement/UpdateReqMgr.py

2010-09-23 16:30:43,297:ERROR:WorkQueueManagerReqMgrPoller:Error saving reqMgr status update to db, (OperationalError) (1241, 'Operand should contain 1 column(s)') 'UPDATE wq_element SET reqmgr_time = %s\n WHERE id = %s' (1285277443, [2L, 3L, 15L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 4L, 1L])
Traceback (most recent call last):
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMComponent/WorkQueueManager/WorkQueueManagerReqMgrPoller.py", line 154, in reportToReqMgr
self.wq.setReqMgrUpdate(now, updated)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMCore/WorkQueue/WorkQueue.py", line 237, in setReqMgrUpdate
transaction = self.existingTransaction())
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMCore/WorkQueue/Database/MySQL/WorkQueueElement/UpdateReqMgr.py", line 22, in execute
transaction = transaction)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMCore/Database/DBCore.py", line 179, in processData
returnCursor = returnCursor)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMCore/Database/MySQLCore.py", line 127, in executebinds
return DBInterface.executebinds(self, s, b, connection, returnCursor)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/WMCORE/src/python/WMCore/Database/DBCore.py", line 65, in executebinds
resultProxy = connection.execute(s, b)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/slc5_amd64_gcc434/external/py2-sqlalchemy/0.5.2-cmp7/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 824, in execute
return Connection.executors[c](self, object, multiparams, params)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/slc5_amd64_gcc434/external/py2-sqlalchemy/0.5.2-cmp7/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 888, in _execute_text
return self.__execute_context(context)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/slc5_amd64_gcc434/external/py2-sqlalchemy/0.5.2-cmp7/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 896, in __execute_context
self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/slc5_amd64_gcc434/external/py2-sqlalchemy/0.5.2-cmp7/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 950, in _cursor_execute
self._handle_dbapi_exception(e, statement, parameters, cursor, context)
File "/storage/local/data1/cmsdataops/wmagent/prod/install/slc5_amd64_gcc434/external/py2-sqlalchemy/0.5.2-cmp7/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 931, in _handle_dbapi_exception
raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
OperationalError: (OperationalError) (1241, 'Operand should contain 1 column(s)') 'UPDATE wq_element SET reqmgr_time = %s\n WHERE id = %s' (1285277443, [2L, 3L, 15L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 4L, 1L])
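Judging from the logged parameters, the whole list of element ids is bound to the single `%s` in `WHERE id = %s`. MySQL therefore sees a multi-column operand, which is what error 1241 reports. One usual fix is to pass one bind dictionary per element and run the statement as an `executemany`; an `IN (...)` clause is an alternative. Below is a minimal sketch of the first approach. SQLite is used only so the example runs standalone, and whether the WMCore DAO's `processData` call accepts such a bind list should be checked against its code:

```python
import sqlite3

# Named binds; each execution must receive a single scalar id.
UPDATE_SQL = "UPDATE wq_element SET reqmgr_time = :reqmgr_time WHERE id = :id"


def per_element_binds(timestamp, element_ids):
    """Build one bind dict per element instead of binding the whole id list at once."""
    return [{"reqmgr_time": timestamp, "id": element_id} for element_id in element_ids]


def demo(updated_ids, timestamp=1285277443):
    """Run the UPDATE against an in-memory SQLite table to show the bind pattern works."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wq_element (id INTEGER PRIMARY KEY, reqmgr_time INTEGER)")
    conn.executemany("INSERT INTO wq_element (id) VALUES (?)", [(i,) for i in range(1, 16)])
    # executemany runs the statement once per bind dict, so "id = :id" stays scalar.
    conn.executemany(UPDATE_SQL, per_element_binds(timestamp, updated_ids))
    count = conn.execute(
        "SELECT COUNT(*) FROM wq_element WHERE reqmgr_time = ?", (timestamp,)).fetchone()[0]
    conn.close()
    return count
```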

Record statistics and tasks to separate DB

Record a statistics document per iteration, as StageManager does, with a record per user:task:source.

AsyncStageOut statistics documents should have a results field keyed by user:task:source; this information should also be in the results.

Each task should be recorded to a separate DB for monitoring. This record should be updated if new files are found for a task. The document should hold the task id and the number of files associated with the task (and possibly total size).
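Below is a sketch of both records under stated assumptions. Files are plain dicts with hypothetical `user`, `task`, `source` and `size` fields, and `task_db` is a dict standing in for the separate task database:

```python
from collections import defaultdict


def iteration_results(files):
    """Aggregate one iteration's files into a results dict keyed by user:task:source."""
    results = defaultdict(lambda: {"files": 0, "size": 0})
    for f in files:
        key = "%s:%s:%s" % (f["user"], f["task"], f["source"])
        results[key]["files"] += 1
        results[key]["size"] += f.get("size", 0)
    return dict(results)


def update_task_records(task_db, files):
    """Create or update one monitoring record per task when new files are found."""
    for f in files:
        record = task_db.setdefault(f["task"], {"task_id": f["task"], "files": 0, "size": 0})
        record["files"] += 1
        record["size"] += f.get("size", 0)
    return task_db
```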

Report transfer errors in files_database in couch

Currently, the reason for an FTS transfer failure can be seen only in the transfer log files. files_database reports the status of failed transfers only as "failed" and gives no further details. It would be useful to have the failure reason in couch, so that the operator does not need to open the log files every time a transfer fails; the log files would then be needed only for deeper debugging. To address this, files_database documents need a new attribute, such as FailureReason, holding the reason for the transfer failure.

Are there any comments?
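Below is a minimal sketch of the proposed attribute. The names `failure_reason` and `failure_reasons` are hypothetical stand-ins for "FailureReason or something similar". Keeping a list as well as the latest reason means retries do not hide the first failure:

```python
from collections import Counter


def mark_failed(doc, reason):
    """Set the state to failed and keep the FTS failure reason in the document."""
    doc["state"] = "failed"
    doc["failure_reason"] = reason
    # Keep earlier reasons too, so retries do not hide the first failure.
    doc.setdefault("failure_reasons", []).append(reason)
    return doc


def failures_by_reason(docs):
    """Count failed documents per reason, e.g. for an operator summary page."""
    return Counter(d["failure_reason"] for d in docs if d.get("state") == "failed")
```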

implement statistic database

Summary from ticket #521:
The monitoring of the runtimeDB (files_db) gives a picture of how
current transfers are going, and thus allows current problems to be
detected. It also provides current statistics that help predict
short-term problems, e.g. by showing that the duration of transfers
through a given FTS server has been increasing continuously over the
last 3 days.

So the idea is to keep the file docs in the runtimeDB for a
configurable period (N days) before removing them and updating the statsDB
(long-term stats) with the needed information.

Cleaning the runtimeDB after N days, as described above, may require
developing a new component in AsyncStageOut. This component
would poll the runtimeDB, remove the docs of files that have been done or
failed for more than N days, and then update the statsDB.
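One polling pass of such a component is sketched below. It assumes documents carry `state`, `end_time` (seconds since the epoch) and `fts_server` fields; the field names and the per-FTS-server counts are illustrative. The pass returns what to add to the statsDB and which runtimeDB ids can then be deleted:

```python
import time
from collections import defaultdict

DAY = 24 * 3600


def expire_runtime_docs(docs, n_days, now=None):
    """Select done/failed docs that finished more than n_days ago.

    Returns (stats, ids_to_delete): per-FTS-server counts to add to the statsDB,
    and the runtimeDB document ids that can then be removed."""
    now = time.time() if now is None else now
    cutoff = now - n_days * DAY
    stats = defaultdict(lambda: {"done": 0, "failed": 0})
    to_delete = []
    for doc in docs:
        finished = doc.get("state") in ("done", "failed")
        if finished and doc.get("end_time") is not None and doc["end_time"] < cutoff:
            stats[doc.get("fts_server", "unknown")][doc["state"]] += 1
            to_delete.append(doc["_id"])
    return dict(stats), to_delete
```

The statsDB would be updated before the documents are deleted. A crash between the two steps could therefore count some documents twice, so a real component would need a guard, e.g. flagging documents once they have been accounted for.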

Add a Source baseclass for plugins to inherit from

As per comments in #389 there should be a Source base class which plugins inherit from. This should have a {{{call}}} method which the LFNSourceDuplicator uses to get the data.

The patch should also include a minor change to LFNSourceDuplicator such that it can use the new interface.
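Below is a sketch of the proposed base class using Python's `abc` module. `StaticSource` and `collect` are hypothetical: the first is a toy plugin, and the second shows how LFNSourceDuplicator might consume any plugin through `call()`:

```python
import abc


class Source(abc.ABC):
    """Base class for LFNSourceDuplicator plugins (illustrative sketch)."""

    def __init__(self, config):
        self.config = config

    @abc.abstractmethod
    def call(self):
        """Return a list of file records for LFNSourceDuplicator to process."""


class StaticSource(Source):
    """Toy plugin that returns a fixed list from its config, for testing."""

    def call(self):
        return list(self.config.get("files", []))


def collect(sources):
    """How LFNSourceDuplicator could gather data through the common interface."""
    files = []
    for source in sources:
        files.extend(source.call())
    return files
```

Marking `call` abstract means a plugin that forgets to implement it fails when it is instantiated, not later during polling.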

Appropriate remove command

Any existing files at the destination PFNs should be removed before the asynchronous stage-out transfers begin (and possibly also when the async. transfers fail).
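Below is a sketch of a pre-transfer cleanup step. The actual remove command is deliberately left as a caller-supplied callable, `remove`, which is a placeholder for whatever wrapper around the site's remove tool is chosen. The sketch assumes a missing destination file is reported as success:

```python
import logging


def clean_destinations(pfns, remove, logger=None):
    """Try to remove each destination PFN before the transfers start.

    `remove` returns True on success (including "file did not exist").
    Returns the PFNs that could not be cleaned."""
    logger = logger or logging.getLogger(__name__)
    failed = []
    for pfn in pfns:
        try:
            ok = remove(pfn)
        except Exception:  # a broken remover must not abort the whole batch
            logger.exception("Removing %s raised an error", pfn)
            ok = False
        if not ok:
            failed.append(pfn)
    return failed
```

Transfers whose destinations could not be cleaned could be held back or marked failed instead of being submitted.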

Add rotating database for statistics database

To avoid the stats database becoming enormous we should rotate it (e.g. change it monthly, and record only transfers started in that month to the database). There are a couple of ways to do this:

  • maintain a history database that records stats per month/week and empty the stats database for a month once it is 2 months old (say)
  • have history documents in the stats database that contain a month summary (generated ~2 months after the month ends) and delete the documents from the stats db when the history doc is made.

I think I prefer the first option (deleting the database is cheaper than deleting a load of documents) but it makes the stats daemon more complicated - it needs to know which month it is.
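Below is a sketch of the bookkeeping the first option needs. It assumes one CouchDB database per month named `<prefix>_YYYY_MM`; the naming scheme and helper names are hypothetical. "Knowing which month it is" then reduces to date arithmetic on database names:

```python
import datetime


def stats_db_name(prefix, when):
    """Name of the monthly stats database for transfers started in `when`'s month."""
    return "%s_%04d_%02d" % (prefix, when.year, when.month)


def months_back(when, n):
    """Return (year, month) for the month n months before `when`."""
    index = when.year * 12 + (when.month - 1) - n
    return index // 12, index % 12 + 1


def databases_to_drop(existing, prefix, today, keep_months=2):
    """Monthly stats DBs that are at least `keep_months` old.

    Their summaries would be written to the history DB before the whole
    database is deleted."""
    cutoff = months_back(today, keep_months)
    old = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue
        try:
            year, month = (int(part) for part in name[len(prefix) + 1:].split("_"))
        except ValueError:
            continue  # not a monthly stats database
        if (year, month) <= cutoff:
            old.append(name)
    return sorted(old)
```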

This is a general problem in how we use CouchDB; ideally the solution should be something we can reuse elsewhere in WMCore.

LFN destination is different from the LFN source

Currently the Async. StageOut module takes the LFN of the output at the source site (the site of the WN) and uses the same path (removing /temp/ if it is in the path) to store the output at the destination site.

Let's say that a user needs to store their outputs in /store/user/username/userDir1 in the storage of the final destination site.

We have 2 options:

1- The source LFN (the LFN at the WN site) will be /store(/temp)/user/username/userDir1. If we choose this approach, WMCore needs to allow it (AFAIK this currently can't be done).

2- Allow Async. StageOut to handle LFNs when the path at the source site is different from the path at the destination site. An AsyncStageOut fix is needed to implement this solution.
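Below is a sketch contrasting the current mapping with option 2. It assumes /temp/ only ever appears as the /store/temp/ prefix. Under option 2, the user-chosen destination directory is assumed to be available alongside the source directory; both function names are hypothetical:

```python
TEMP_PREFIX = "/store/temp/"


def current_destination_lfn(source_lfn):
    """Current behaviour: reuse the source LFN, dropping the /temp/ level if present."""
    if source_lfn.startswith(TEMP_PREFIX):
        return "/store/" + source_lfn[len(TEMP_PREFIX):]
    return source_lfn


def mapped_destination_lfn(source_lfn, source_dir, destination_dir):
    """Option 2: replace the source directory prefix with a user-chosen destination."""
    source_dir = source_dir.rstrip("/") + "/"
    if not source_lfn.startswith(source_dir):
        raise ValueError("%s is not under %s" % (source_lfn, source_dir))
    return destination_dir.rstrip("/") + "/" + source_lfn[len(source_dir):]
```

For example, `mapped_destination_lfn("/store/temp/user/username/job42/f.root", "/store/temp/user/username/job42", "/store/user/username/userDir1")` yields `/store/user/username/userDir1/f.root`.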

agent name in the doc

After discussing it with Simon, we agree that having the agent name in the transfer doc is a good idea; it probably also provides some future-proofing.

Support execution of WMCore/bin commands in manage script

Add support for an execute option to the manage script that calls a WMCore/bin command, passes arguments to it and runs it in the WMAgent environment.
Since it loads the secrets file, it checks that the command being executed actually exists in the WMCore/bin directory.
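The manage script itself is a shell script, so the check is sketched here in Python purely for illustration. The function names are hypothetical, and the `env` argument stands in for whatever environment the script builds after loading the secrets file. The point is to refuse any name that could escape the bin directory before secrets are exposed to it:

```python
import os
import subprocess


def resolve_bin_command(bin_dir, name):
    """Return the path of `name` inside bin_dir, refusing anything else."""
    if not name or os.path.basename(name) != name:
        # Reject path components such as "../" so only bin_dir can be targeted.
        raise ValueError("invalid command name: %r" % name)
    candidate = os.path.join(bin_dir, name)
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise ValueError("%s is not an executable command in %s" % (name, bin_dir))
    return candidate


def execute(bin_dir, name, args, env=None):
    """Run the command with its arguments in the prepared environment."""
    return subprocess.call([resolve_bin_command(bin_dir, name)] + list(args), env=env)
```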
