
cetune's Introduction

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: [email protected]

Functionality Description

  • CeTune is a toolkit/framework to deploy, benchmark, profile, and tune Ceph cluster performance.
  • It aims to speed up the procedure of benchmarking Ceph performance and provides clear data charts of system metrics and latency breakdown data for users to analyze Ceph performance.
  • CeTune evaluates Ceph performance through three interfaces: block, file system, and object.

Maintenance


Prepare

  • One node acts as the CeTune controller (aka head); the other nodes act as CeTune workers (aka workers).
  • The head must be able to autossh (password-less ssh) to all workers, including itself, and must have a 'hosts' file containing the info of all workers; a sketch of such a file follows this list.
  • All nodes must be able to reach a yum/apt-get repository and to wget/curl from ceph.com.
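
A minimal sketch of what the hosts file could look like, assuming a standard /etc/hosts-style hostname-to-IP mapping (the hostnames and addresses below are placeholders):

# hypothetical hosts file on the head node; adjust to your cluster
192.168.1.10    head01      # CeTune controller
192.168.1.11    worker01    # CeTune worker / osd node
192.168.1.12    worker02    # CeTune worker / client node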

Installation

  • Install on the head and the workers:
# the head and workers need working apt-get, wget and pip (configure proxies if required)
apt-get install -y python
  • Install on the head:
git clone https://github.com/01org/CeTune.git

cd /CeTune/deploy/
python controller_dependencies_install.py

# make sure the head is able to autossh to all worker nodes and to 127.0.0.1
cd ${CeTune_PATH}/deploy/prepare-scripts; ./configure_autossh.sh ${host} ${ssh_password}
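
If there are many workers, the same script can be looped over each host; a minimal sketch, assuming all nodes share the same root password (the hostnames are placeholders):

# hypothetical loop over all nodes that the head must reach
cd ${CeTune_PATH}/deploy/prepare-scripts
for host in head01 worker01 worker02 127.0.0.1; do
    ./configure_autossh.sh ${host} ${ssh_password}
done
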
  • Install on the workers:
cd /CeTune/deploy/
python worker_dependencies_install.py

Start CeTune with WebUI

# install webpy python module
cd ${CeTune_PATH}/webui/ 
git clone https://github.com/webpy/webpy.git

cd webpy
python setup.py install

# run the CeTune webui
cd ${CeTune_PATH}/webui/
python webui.py

# you will see output like the below
root@client01:/CeTune/webui# python webui.py
http://0.0.0.0:8080/

Add user for CeTune

cd /CeTune/visualizer/
# show help
python user_Management.py --help

# add a user
cd /CeTune/visualizer/
python user_Management.py -o add --user_name {set username} --passwd {set passwd} --role {set user role[admin|readonly]}

# delete a user
python user_Management.py -o del --user_name {username}

# list all users
python user_Management.py -o list

# update a user role
python user_Management.py -o up --user_name {username} --role {set user role[admin|readonly]}
  • CeTune WebUI

webui.png


Configure

  • On the WebUI 'Test Configuration' page, you can specify all the configuration required for deployment and benchmarking.
  • Users are also able to directly modify conf/all.conf, conf/tuner.yaml and conf/cases.conf to do the configuration.
  • A configuration helper is available both under the 'helper' tab, right after 'User Guide', and on the configuration page itself.
  • Below is a brief intro to the purpose of each configuration file:
    • conf/all.conf
      • This is a configuration file describing the cluster and the benchmark.
    • conf/tuner.yaml
      • This is a configuration file for tuning the ceph cluster, including pool configuration, ceph.conf, disk tuning, etc.
    • conf/cases.conf
      • This is a configuration file deciding which test cases to run.

Deploy Ceph

Assuming ceph is installed on all nodes, this part demonstrates the workflow of using CeTune to deploy osd and mon daemons and bring up a healthy ceph cluster.

  • Configure node info under 'Cluster Configuration'. The keys are listed below; an equivalent conf/all.conf-style sketch follows the screenshots.
    • clean build (true / false): set true to clean the currently deployed ceph and redeploy a new cluster; set false to try to obtain the current cluster layout and add new osds to the existing cluster
    • head (${hostname}): CeTune controller node hostname
    • user (root): only root is supported currently
    • enable_rgw (true / false): set true and CeTune will also deploy radosgw; set false to deploy only osd and rbd nodes
    • list_server (${hostname1},${hostname2},...): list osd nodes here, separated by ','
    • list_client (${hostname1},${hostname2},...): list client (rbd/cosbench worker) nodes here, separated by ','
    • list_mon (${hostname1},${hostname2},...): list mon nodes here, separated by ','
    • ${server_name} (${osd_device1}:${journal_device1},${osd_device2}:${journal_device2},...): after adding nodes to 'list_server', CeTune adds new rows keyed by each server's name; add osd:journal pairs for the corresponding node, separated by ','
  • Uncheck 'Benchmark' and only check 'Deploy', then click 'Execute'

webui_deploy.png

  • The WebUI will jump to 'CeTune Status' and you will see console logs like the below

webui_deploy_detail.png
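
As a minimal sketch of what the same settings could look like when edited directly in conf/all.conf instead of through the WebUI (the key names come from the table above; the exact file syntax, hostnames and device paths here are assumptions and placeholders):

# hypothetical conf/all.conf fragment
head="head01"
user="root"
clean build="true"
enable_rgw="false"
list_server="worker01,worker02"
list_client="client01"
list_mon="head01"
worker01="/dev/sdb:/dev/sde,/dev/sdc:/dev/sde"
worker02="/dev/sdb:/dev/sde,/dev/sdc:/dev/sde"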


Benchmark Ceph

  • Users are able to configure disk_read_ahead, scheduler, etc. under the 'system' settings.
  • Ceph.conf tunings can be added under 'Ceph Tuning'; CeTune will apply them to the ceph cluster at runtime.
  • 'Benchmark Configuration' is how we control the benchmark process; a detailed explanation is given below.
    • There are two parts under 'Benchmark Configuration'.
    • The first table controls basic settings such as where to store result data and which data will be collected.
    • The second table controls which test cases will be run; users can add multiple test cases, and they will be run one by one. A sketch of what one such case corresponds to follows this list.
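
For reference, a result directory name such as 13-3-fiorbd-seqwrite-4k-qd64-2g-100-400-rbd (seen in the logs later on this page) encodes the test case parameters (engine, pattern, block size, queue depth, and so on); the exact mapping of the numeric fields is not documented here. A roughly equivalent standalone fio job for such a fiorbd seqwrite-4k-qd64 case might look like the sketch below; this is an illustration of the workload shape, not the job file CeTune generates, and the pool/image/client names are placeholders:

# hypothetical fio job approximating a fiorbd-seqwrite-4k-qd64 case
[seqwrite-4k-qd64]
ioengine=rbd          # fio's rbd engine must be available
clientname=admin      # placeholder cephx client
pool=rbd              # placeholder pool name
rbdname=testimage     # placeholder rbd image name
rw=write              # sequential write
bs=4k                 # 4 KB block size
iodepth=64            # queue depth 64
direct=1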

Check Benchmark Results

webui_result.png

webui_result_detail.png

webui_result_detail2.png


User Guidance PDF

CeTune Documents Download Url

cetune's People

Contributors

colin582511, hualongfeng, jian-zhang, liuxiaojuan, lixiaoy1, majianpeng, ningli16, pythonfucku, rdower, shaoshian, tanyy, xiaoxichen819, xinxinsh, xuechendi, yonghengdexin735, ywang19, zeroaska, zhouyuan, zhubx007


cetune's Issues

fiorbd test failed

When I start the benchmark test it fails; how do I resolve this problem?
[2017-10-17T08:33:28.541710][LOG]: start to run performance test
[2017-10-17T08:33:28.546344][LOG]: Calculate Difference between Current Ceph Cluster Configuration with tuning
[2017-10-17T08:33:33.438428][LOG]: Tuning[analyzer] is not same with current configuration
[2017-10-17T08:33:33.908256][LOG]: Tuning has applied to ceph cluster, ceph is Healthy now
[2017-10-17T08:33:36.914239][LOG]: ============start deploy============
[2017-10-17T08:33:39.591453][LOG]: Shutting down mon daemon
[2017-10-17T08:33:40.237892][LOG]: Shutting down osd daemon
[2017-10-17T08:33:40.576835][LOG]: Starting mon daemon
[2017-10-17T08:33:40.943608][LOG]: Started mon.node02 daemon on node02
[2017-10-17T08:33:41.319379][LOG]: Started mon.node03 daemon on node03
[2017-10-17T08:33:41.696928][LOG]: Started mon.node01 daemon on node01
[2017-10-17T08:33:41.697034][LOG]: Starting osd daemon
[2017-10-17T08:33:42.005184][LOG]: Started osd.0 daemon on node01
[2017-10-17T08:33:42.317962][LOG]: Started osd.1 daemon on node01
[2017-10-17T08:33:42.634781][LOG]: Started osd.2 daemon on node02
[2017-10-17T08:33:42.954045][LOG]: Started osd.3 daemon on node03
[2017-10-17T08:33:43.432979][LOG]: not need create mgr
[2017-10-17T08:33:43.443442][LOG]: Clean process log file.
[2017-10-17T08:33:43.919403][WARNING]: Applied tuning, waiting ceph to be healthy
[2017-10-17T08:33:47.403901][WARNING]: Applied tuning, waiting ceph to be healthy
[2017-10-17T08:33:50.888564][LOG]: Tuning has applied to ceph cluster, ceph is Healthy now
[2017-10-17T08:33:52.350882][LOG]: RUNID: 13, RESULT_DIR: //mnt/data//13-3-fiorbd-seqwrite-4k-qd64-2g-100-400-rbd
[2017-10-17T08:33:52.351263][LOG]: Prerun_check: check if sysstat installed
[2017-10-17T08:33:52.658104][LOG]: Prerun_check: check if blktrace installed
[2017-10-17T08:33:53.332501][LOG]: check if FIO rbd engine installed
[2017-10-17T08:33:53.720802][LOG]: check if rbd volume fully initialized
[2017-10-17T08:33:54.206562][WARNING]: Ceph cluster used data occupied: 2.698 KB, planned_space: 10485760.0 KB
[2017-10-17T08:33:54.206722][WARNING]: rbd volume initialization has not be done
[2017-10-17T08:33:54.206871][LOG]: Preparing rbd volume
[2017-10-17T08:33:55.164264][LOG]: 1 FIO Jobs starts on node02
[2017-10-17T08:33:55.474774][LOG]: 1 FIO Jobs starts on node03
[2017-10-17T08:33:55.783428][LOG]: 1 FIO Jobs starts on node01
[2017-10-17T08:33:57.122272][WARNING]: 0 fio job still runing
[2017-10-17T08:33:57.122398][ERROR]: Planed to run 0 Fio Job, please check all.conf
[2017-10-17T08:33:57.123074][ERROR]: The test has been stopped, error_log: Traceback (most recent call last):
File "/CeTune/benchmarking/mod/benchmark.py", line 46, in go
self.prerun_check()
File "/CeTune/benchmarking/mod/bblock/fiorbd.py", line 89, in prerun_check
self.prepare_images()
File "/CeTune/benchmarking/mod/bblock/fiorbd.py", line 52, in prepare_images
raise KeyboardInterrupt
KeyboardInterrupt

[ERROR]: Generating history view failed

[2015-11-05T16:56:35.588191][LOG]: Generating ceph view
[2015-11-05T16:56:35.620374][LOG]: generate ceph line chart
[2015-11-05T16:56:37.263420][LOG]: generate ceph line chart
[2015-11-05T16:56:37.300566][LOG]: generate ceph line chart
[2015-11-05T16:56:39.344226][LOG]: generate ceph line chart
[2015-11-05T16:56:41.038648][LOG]: generate ceph line chart
[2015-11-05T16:56:42.281420][LOG]: generate ceph line chart
[2015-11-05T16:56:44.422202][LOG]: generate ceph line chart
[2015-11-05T16:56:47.786397][LOG]: generate ceph line chart
[2015-11-05T16:56:47.817943][LOG]: generate ceph line chart
[2015-11-05T16:56:47.855909][LOG]: generate ceph line chart
[2015-11-05T16:56:53.526524][LOG]: Generating client view
[2015-11-05T16:56:53.559592][LOG]: generate client line chart
[2015-11-05T16:56:55.205861][LOG]: generate client line chart
[2015-11-05T16:56:57.319046][LOG]: generate client line chart
[2015-11-05T16:56:59.110973][LOG]: generate client line chart
[2015-11-05T16:57:00.355822][LOG]: generate client line chart
[2015-11-05T17:00:36.680553][LOG]: Generating history view
[2015-11-05T17:00:36.867727][ERROR]: Generating history view failed

After I run my test, I just see [ERROR]: Generating history view failed and there is no result in the result reports.
I want to know what is wrong and how to find where the problem is.

thank you

how can I use cosbench to test?

After reading CeTune Document.pdf, I am confused about the cosbench config.
Do I need to download cosbench by myself, or will cetune download cosbench itself?
Would you give me an example of how to use cosbench for a test?

Introduce <randrepeat> in fio test

randrepeat=bool 
Seed the random number generator in a predictable way so results are repeatable across runs. Default: true.

I think we would be better off defaulting this to false; every run hitting the same offsets is really not good randomness. See the sketch below.
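
As a sketch, the change amounts to the single fio option below (randrepeat is a standard fio option; where exactly CeTune would set it in its generated job files is an assumption):

# disable the repeatable random seed so offsets differ between runs
randrepeat=0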

cetune ui improvement

  1. help doc
  2. configuration description
  3. result, timestamp
  4. result filter(js)
  5. after run test description
  6. a cetune log to record cetune webui logon and script run

analyzer module refine

  1. provide an arg to decide which data to parse
  2. provide an arg to set the interval of iostat, sar, etc.
  3. fio result handling for mixed read/write

cetune stable release

need to start a branch for a cetune stable release

  1. need to check qemurbd, fiorbd (with mixed read/write), cosbench
  2. verify the result visualization
  3. bash scripts: handle hostnames containing '-'

Due date: 7/15

visualizer module refine

  1. produce an excel/csv file so data can easily be downloaded
  2. accept a schema to parse the result dict -- so we can decide which data we want to show

osd on NVMe Device error

If an NVMe SSD is chosen as the osd device, an error can occur: "cat /sys/block/nvme0n/" reports no such file or directory. I did not do any partition operation on the NVMe, so the device name is "nvme0n1". The error likely comes from tuner/tuner.py, lines 84 and 99.
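
The symptom (nvme0n1 being truncated to nvme0n) suggests the code derives the parent block device by stripping a trailing digit, which works for sda1 but not for NVMe naming. A minimal sketch of a more robust mapping, under that assumption (this is not the actual tuner.py code):

import re

def parent_block_device(dev):
    """Map a device/partition name to its parent disk, handling NVMe naming."""
    # 'nvme0n1p2' -> 'nvme0n1'; a bare namespace such as 'nvme0n1' is already a disk
    m = re.match(r'^(nvme\d+n\d+)(p\d+)?$', dev)
    if m:
        return m.group(1)
    # classic naming: 'sda1' -> 'sda', 'sdb' -> 'sdb'
    return re.sub(r'\d+$', '', dev)

print(parent_block_device('nvme0n1'))    # nvme0n1
print(parent_block_device('nvme0n1p2'))  # nvme0n1
print(parent_block_device('sda1'))       # sda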

Improve History summary

  1. sort the page by run_id and [optional] can sort by OP_TYPE
  2. use double click instead of single click in summary page

CeTune broke at haproxy step for CentOS nodes

Hello guys ,
I am running CeTune on CentOS nodes and got stuck at the haproxy step. On RHEL-based machines there is no /etc/default/haproxy configuration file, so CeTune fails. Is this a known issue?

BTW haproxy rpm is properly installed on these nodes

[2016-05-08T11:51:07.632079][LOG]: Ceph already installed on below nodes
[2016-05-08T11:51:07.632285][LOG]: Install opts is '--release '
[2016-05-08T11:51:08.581212][WARNING]: Found different configuration from conf/ceph_current_conf with your desired config : OrderedDict([('mon', {}), ('osd', {}), ('mds', {}), ('osd_num', 3), ('radosgw', [])])
[2016-05-08T11:51:08.582611][LOG]: Generating rgw ceph.conf parameters
[2016-05-08T11:51:08.582849][LOG]: configure node2-1 in ceph.conf
[2016-05-08T11:51:13.282435][LOG]: deploy radosgw instances
[2016-05-08T11:51:19.644885][LOG]: Creating rgw required pools
[2016-05-08T11:58:03.390218][LOG]: Updating haproxy configuration
[2016-05-08T11:58:05.927822][ERROR]: node2: sed: can't read /etc/default/haproxy: No such file or directory
pdsh@node1: node2: ssh exited with exit code 2

As a temporary fix I created a dummy file /etc/default/haproxy and it worked. A sketch of a distro-aware alternative follows.
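
A sketch of how the haproxy step could pick the right defaults file per distro; /etc/sysconfig/haproxy is where RHEL/CentOS packages usually keep it, but both paths and the sed edit below are assumptions about the local package layout, not CeTune's actual code:

# hypothetical distro-aware selection of the haproxy defaults file
if [ -f /etc/default/haproxy ]; then
    HAPROXY_DEFAULTS=/etc/default/haproxy      # Debian/Ubuntu
elif [ -f /etc/sysconfig/haproxy ]; then
    HAPROXY_DEFAULTS=/etc/sysconfig/haproxy    # RHEL/CentOS
else
    touch /etc/default/haproxy                 # dummy-file workaround described above
    HAPROXY_DEFAULTS=/etc/default/haproxy
fi
sed -i 's/^ENABLED=0/ENABLED=1/' ${HAPROXY_DEFAULTS}   # example edit only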

osd down when benchmark is running

I do not know if this is a problem that should be dealt with.

When the benchmark is running, an osd goes down and Ceph starts recovering, but everything proceeds as normal except for node_ceph_health.log.

In my long test history I have not paid enough attention to this, so I need to check the runs one by one.

Maybe cetune can give some hint if the health is not OK while the benchmark is running.

fio zipf support

It would be better to have fio's zipf distribution in some tests, e.g. when testing with rbd-cache. A sketch of the relevant options is below.
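
For reference, a minimal sketch of the standard fio options that enable a zipf distribution (the theta value 1.2 is just an example):

# skew random offsets with a zipf distribution instead of uniform random
rw=randread
random_distribution=zipf:1.2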

[ERROR]:analyzer Failed

Hi guys,
I tried to run the fiorbd driver using CeTune. The output logs show a small error:
[ERROR]:analyzer Failed,pls try cd analyzer;python analyzer.py --path //mnt/data//1-140-fiorbd-seqwrite-4k-qd64-10g-100-400-rbd process_data
It seems that when the py script gets the path parameter, there is more than one extra '/' symbol.
I ran this step manually and it worked fine.
Can this error be ignored?
B & R, thanks.

Add user role permit to cetune

Basically, the idea is to create an admin account and a read-only account, so a read-only user can only read data reports without changing any cetune data.

I think the trick here is that we should encrypt the data to avoid any tampering.

AttributeError: 'ThreadedDict' object has no attribute 'userrole'

Reproduce steps:

  1. Install CeTune master
  2. Install CeTune webui
  3. Add user:
    python user_Management.py -o add --user_name admin --passwd 123456 --role admin
  4. Run benchmark.

There is an error in the "python webui.py" runtime log:

10.239.44.90:63246 - - [30/Jun/2017 10:05:00] "HTTP/1.1 GET /configuration/user_role" - 500 Internal Server Error
<Storage {'timestamp': u'2017-06-30T10:04:04.898784'}>
10.239.44.90:63246 - - [30/Jun/2017 10:05:01] "HTTP/1.1 POST /monitor/tail_console" - 200 OK
get_param:<Storage {}>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/web.py-0.40.dev0-py2.7.egg/web/application.py", line 257, in process
return self.handle()
File "/usr/local/lib/python2.7/dist-packages/web.py-0.40.dev0-py2.7.egg/web/application.py", line 248, in handle
return self._delegate(fn, self.fvars, args)
File "/usr/local/lib/python2.7/dist-packages/web.py-0.40.dev0-py2.7.egg/web/application.py", line 488, in _delegate
return handle_class(cls)
File "/usr/local/lib/python2.7/dist-packages/web.py-0.40.dev0-py2.7.egg/web/application.py", line 466, in handle_class
return tocall(*args)
File "webui.py", line 75, in GET
return common.eval_args( self, function_name, web.input() )
File "/home/ning/upload/docker/CeTune/conf/common.py", line 605, in eval_args
if function_name != "":
File "webui.py", line 82, in user_role
output = session.userrole
File "/usr/local/lib/python2.7/dist-packages/web.py-0.40.dev0-py2.7.egg/web/session.py", line 68, in getattr
return getattr(self._data, name)
AttributeError: 'ThreadedDict' object has no attribute 'userrole'

Actually, when running python user_Management.py -o list,
the admin user is listed. A possible fix is sketched below.
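
The traceback shows the session object has no 'userrole' attribute when /configuration/user_role is handled before a login has set it. A minimal sketch of one way to avoid that in web.py, by giving the session a default value (whether webui.py builds its session exactly like this is an assumption):

import web

urls = ("/.*", "index")
app = web.application(urls, globals())

# give every new session a default userrole so reads never raise AttributeError
session = web.session.Session(
    app, web.session.DiskStore('sessions'),
    initializer={'userrole': None})

class index:
    def GET(self):
        # getattr with a default is an extra guard on top of the initializer
        role = getattr(session, 'userrole', None)
        return "role=%s" % role

if __name__ == "__main__":
    app.run()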

stable release

need to switch to some release system; either versions or tags would be OK.

cephfs workload

We still lack a stable and reliable cephfs workload; submitting an issue here per some user requests.

Inconsistent problem with cetune_report.db

Data from the db is not consistent with the data in dest_dir.

For example:
In my test environment there are two test results in dest_dir, but I only get one test result from the web UI. I must rm cetune_report.db in dest_dir and reload it to get a consistent result.

custom Ceph repo

Allow deploying ceph from a custom repo instead of the official one on ceph.com.

using fdisk to check whether the rbd image is attached is not reliable

The fdisk output is:

[image: screenshot of fdisk output]

but the output of lsblk is:

root@vclient01:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 20G 0 disk
├─vda1 253:1 0 3.8G 0 part [SWAP]
└─vda2 253:2 0 16.2G 0 part /

This error occurs every time I create two or more benchmark cases.
When the image is detached, the fdisk info still exists until the vclient is rebooted. A sketch of an lsblk-based check follows.
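
A sketch of a check based on lsblk instead of fdisk; lsblk reflects the kernel's current view, so a detached device should disappear from its output (the device name vdb below is a placeholder for the attached rbd-backed disk):

# list only whole-disk names known to the kernel and look for the expected device
if lsblk -d -n -o NAME | grep -qw vdb; then
    echo "vdb is attached"
else
    echo "vdb is not attached"
fi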

unsupported operand type(s) for /: 'str' and 'int', write_SN_Latency/osd_node_count

Traceback (most recent call last):
  File "analyzer.py", line 591, in <module>
    main(sys.argv[1:])
  File "analyzer.py", line 587, in main
    func()
  File "analyzer.py", line 109, in process_data
    result = self.summary_result( result )
  File "analyzer.py", line 265, in summary_result
    tmp_data["SN_Latency(ms)"] = "%.3f" % write_SN_Latency/osd_node_count
TypeError: unsupported operand type(s) for /: 'str' and 'int'
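
The traceback is consistent with a Python operator-precedence problem: % (string formatting) and / bind at the same level and associate left to right, so the expression formats write_SN_Latency into a string first and then tries to divide that string by osd_node_count. A minimal sketch of the distinction (the values are made up):

write_SN_Latency = 12.5
osd_node_count = 4

# what the reported line effectively does: ("%.3f" % write_SN_Latency) / osd_node_count
try:
    bad = "%.3f" % write_SN_Latency / osd_node_count
except TypeError as e:
    print(e)   # unsupported operand type(s) for /: 'str' and 'int'

# likely intent: divide first, then format
good = "%.3f" % (write_SN_Latency / osd_node_count)
print(good)    # 3.125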

Virtual Image example

Hi,

First, congratulations, this tool is perfect for testing. I would like to know if you have an example of vclient.tmp.img. Also, prepare-vm.sh has a path hard coded to your own environment (/home/xuechendi/remote_access/vclient.tmp.img) at line 63.

support ceph-disk when create osd daemon

In recent observation, building an osd daemon via the ceph-osd --mkfs option may miss some important steps and is bad for osd failover detection and non-root support. A new branch will be created to fully support ceph-disk in the cetune deployer module.

cetune will support prepare and activate first. A typical command sequence is sketched after the lists below.

What ceph-disk does:
Prepare:

  • create GPT partition
  • mark the partition with the ceph type uuid
  • create a file system
  • mark the fs as ready for ceph consumption
  • entire data disk is used (one big partition)
  • a new partition is added to the journal disk (so it can be easily shared)
  • triggered by administrator or ceph-deploy, e.g. 'ceph-disk prepare <data disk> [journal disk]'

Activate:

  • if encrypted, map the dmcrypt volume
  • mount the volume in a temp location
  • allocate an osd id (if needed)
  • if deactivated, no-op (activate with the --reactivate flag)
  • remount in the correct location /var/lib/ceph/osd/$cluster-$id
  • remove the deactive flag (with --reactivate flag)
  • start ceph-osd
  • triggered by udev when it sees the OSD gpt partition type
  • triggered by admin: 'ceph-disk activate <data partition>'
  • triggered on ceph service startup with 'ceph-disk activate-all'

Deactivate:

  • check partition type (support dmcrypt, mpath, normal)
  • stop ceph-osd service if needed (make osd out with option --mark-out)
  • remove 'ready', 'active', and INIT-specific files
  • create deactive flag
  • umount device and remove mount point
  • if the partition type is dmcrypt, remove the data dmcrypt map.

Destroy:

  • check partition type (support dmcrypt, mpath, normal)
  • remove OSD from CRUSH map
  • remove OSD cephx key
  • deallocate OSD ID
  • if the partition type is dmcrypt, remove the journal dmcrypt map.
  • destroy data (with --zap option)
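
For reference, a typical prepare/activate sequence with the ceph-disk CLI looks roughly like the sketch below (device paths are placeholders; exact flags depend on the ceph release):

# prepare /dev/sdb as an osd data disk with its journal on /dev/sdc
ceph-disk prepare /dev/sdb /dev/sdc

# activate the newly created data partition (udev normally triggers this automatically)
ceph-disk activate /dev/sdb1

# or activate everything that has been prepared on this host
ceph-disk activate-all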
