beyondtheclouds / enos
Experimental eNvironment for OpenStack :monkey:
Home Page: https://beyondtheclouds.github.io/enos/
License: GNU General Public License v3.0
Build it incrementally. The first step would be:
A static approach with 4 nodes (minimum: 1 controller, 1 network, 2 computes)
Bonus: make it compatible with vagrant-g5k
It might be valuable to write an internship subject to investigate how ENOS can be used to run automatic performance regression tests in the CI.
Note that Orange Labs is strongly interested in this aspect.
Kolla-ansible failed to launch memcached with the following error:
Running command: '/usr/bin/memcached -vv -l 10.44.0.18 -p 11211 -c 5000'
failed to set rlimit for open files. Try starting as root or requesting smaller maxconns value.
The problem comes from the -c 5000 option, whose value is too large.
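A minimal sketch of a pre-flight check for this situation, assuming the failure occurs when the requested `-c` value exceeds the hard limit on open files of the user running memcached. The function name `maxconns_fits` and the `hard_limit` parameter are illustrative and not part of Enos or Kolla.

```python
import resource


def maxconns_fits(maxconns, hard_limit=None):
    """Return True if `maxconns` does not exceed the open-files hard limit.

    `hard_limit` defaults to the limit of the current process; passing it
    explicitly makes the check usable for a limit read from another host.
    """
    if hard_limit is None:
        _soft, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard_limit == resource.RLIM_INFINITY:
        return True
    return maxconns <= hard_limit
```

With a hard limit of 4096, `maxconns_fits(5000, hard_limit=4096)` is false, which matches the situation reported above.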
If we set the following section in reservation.yaml:
resources:
  medium:
    compute: 1
    network: 1
  large:
    control: 1
then the following section is generated in the env file:
rsc:
compute:
We can see the confusing situation where the VM named compute-0 has the role "network" and the VM named network-0 has the role "compute".
I propose to implement:
execo under an execo_cc module. There will be specific code for the reservation, polling the API ...
Cons:
Pro :
Hints: need to pass this information to the inventory generator. For now we are using the execo.Host structure. We should probably get rid of it now and use our own structure that will help to express everything.
Caveats :
Pro :
Caveats :
Pros :
We need to tell neutron to accept traffic from/to a virtual IP. By default, traffic to a virtual IP is blocked. This can be changed by updating the corresponding port in neutron and setting the allowed_address_pairs extension:
neutron port-update 9b02dbf2-5353-42b4-9d90-80595c4909fa --allowed_address_pairs list=true type=dict ip_address=10.0.2.253
To be generic, we should probably allow this for a full range of IPs, using the CIDR of the subnet on every port.
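A minimal sketch of the request body such a generic update could send, assuming the Neutron port update call (`PUT /v2.0/ports/{port_id}`) is used and that a whole subnet CIDR is given as the allowed address. The helper name `allowed_address_pairs_body` is hypothetical, and no HTTP request is made here.

```python
import ipaddress


def allowed_address_pairs_body(subnet_cidr):
    """Build a port-update body that allows a whole subnet on a port."""
    # ip_network() raises ValueError if the string is not a valid network,
    # for instance when host bits are set ("10.0.2.253/24").
    network = ipaddress.ip_network(subnet_cidr)
    return {"port": {"allowed_address_pairs": [{"ip_address": str(network)}]}}
```

The same body would then be sent for every port of the subnet.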
Maybe not for the first iteration, but we could consider a dedicated volume to store the registry data (similar to what we have on G5K), except that we don't need the ceph dependencies to be installed.
We'll have to reimplement another reservation logic similar to g5k.
Network isolation is available for bare metal on CC 2.
At first sight, we could reuse most of the code above (kvm version). We just need to be careful about how the private network is created (follow the rules given in the documentation).
[1]: Something like this in ansible:
[control]
enos-2 ansible_ssh_user=debian ansible_host=10.0.2.61
[compute]
enos-0 ansible_ssh_user=debian ansible_host=10.0.2.60
[all:vars]
ansible_ssh_common_args='-o StrictHostKeyChecking=no -o ProxyCommand="ssh -W %h:%p -o StrictHostKeyChecking=no [email protected]"'
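An inventory of this shape could be generated with something like the sketch below. The function `render_inventory`, its parameters, and the `jump_host` value used in the usage line are illustrative assumptions; they do not describe the existing Enos inventory generator.

```python
def render_inventory(groups, user="debian", jump_host=None):
    """Render an INI-style Ansible inventory.

    `groups` maps a group name to a list of (host name, IP address) pairs.
    `jump_host` is an optional "user@host" used as an SSH proxy.
    """
    lines = []
    for group, hosts in groups.items():
        lines.append("[{}]".format(group))
        for name, address in hosts:
            lines.append(
                "{} ansible_ssh_user={} ansible_host={}".format(name, user, address)
            )
    if jump_host is not None:
        proxy = "ssh -W %h:%p -o StrictHostKeyChecking=no {}".format(jump_host)
        lines.append("[all:vars]")
        lines.append(
            "ansible_ssh_common_args='-o StrictHostKeyChecking=no "
            "-o ProxyCommand=\"{}\"'".format(proxy)
        )
    return "\n".join(lines) + "\n"
```

For example, `render_inventory({"control": [("enos-2", "10.0.2.61")]}, jump_host="user@gateway.example.org")` yields a `[control]` group followed by an `[all:vars]` section with the proxy command.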
reservation_id isn't available as scheduler_hint
Use case :
You don't want to slow down the monitoring traffic to/from the monitoring node.
The idea of this proposal is to let Enos natively support different deployment models for multisite deployments. These range from:
Currently we support 1. and 2.
3. could be supported by a wrapper on top of Enos, but requires some patches to Kolla:
(https://review.openstack.org/#/c/431588/ and https://review.openstack.org/#/c/431658/)
The main difference I see between these deployments is how Enos understands groups.
One proposal to implement this is given here:
https://github.com/BeyondTheClouds/Wiki/wiki/CR-030117-deployment-model
The related docopt is missing on the tc task.
This location is likely not the same as the "current" directory created in the current working directory.
The values of OS_AUTH_URL and OS_REGION_NAME in the template admin-openrc.j2 are potentially wrong. The template should reuse the variables openstack_region_name and keystone_admin_url provided by kolla-ansible.
I tested it with Kolla master and branch 3.0.1; each time, mariadb fails to start.
So I assume galera.cnf needs an update.
It seems that VLAN/subnet information is now available through the API.
As a consequence, we probably don't need the g5k_networks.yml file anymore.
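A minimal sketch of how the networks could be derived from a site description instead of a static file, assuming the description has already been fetched as a dictionary with a `kavlans` property that maps each VLAN id to a gateway and a network. The names `Vlan` and `parse_kavlans` are hypothetical, and the sample values are copied from the Rennes site description in this report.

```python
import ipaddress
from collections import namedtuple

Vlan = namedtuple("Vlan", ["vlan_id", "network", "gateway"])


def parse_kavlans(site):
    """Map each VLAN id of a site description to its network and gateway."""
    vlans = {}
    for vlan_id, entry in site.get("kavlans", {}).items():
        vlans[vlan_id] = Vlan(
            vlan_id,
            ipaddress.ip_network(entry["network"]),
            ipaddress.ip_address(entry["gateway"]),
        )
    return vlans


# Subset of the Rennes site description.
rennes = {
    "uid": "rennes",
    "kavlans": {
        "4": {"gateway": "10.24.63.254", "network": "10.24.0.0/18"},
        "default": {"gateway": "172.16.111.254", "network": "172.16.96.0/20"},
    },
}
vlans = parse_kavlans(rennes)
```

Each entry then gives direct access to the address range, e.g. `vlans["4"].network.hosts()`.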
>> pp root.sites[:rennes]
#<Resource:0x3fe95a0a2994 uri="/3.0/sites/rennes"
RELATIONSHIPS
clusters, deployments, jobs, metrics, network_equipments, parent, pdus, self, servers, status, version, versions, vlans
PROPERTIES
"compilation_server"=>false
"description"=>"Grid5000 Rennes site"
"email_contact"=>"[email protected]"
"frontend_ip"=>"172.16.111.106"
"g5ksubnet"=>{"gateway"=>"10.159.255.254", "network"=>"10.156.0.0/14"}
"kavlan_ip_range"=>"10.24.0.0/14"
"kavlans"=>{"1"=>
{"gateway"=>"172.16.111.101", "network"=>"192.168.192.0/20"},
"16"=>{"gateway"=>"10.27.255.254", "network"=>"10.27.192.0/18"},
"2"=>{"gateway"=>"172.16.111.102", "network"=>"192.168.208.0/20"},
"3"=>{"gateway"=>"172.16.111.103", "network"=>"192.168.224.0/20"},
"4"=>{"gateway"=>"10.24.63.254", "network"=>"10.24.0.0/18"},
"5"=>{"gateway"=>"10.24.127.254", "network"=>"10.24.64.0/18"},
"6"=>{"gateway"=>"10.24.191.254", "network"=>"10.24.128.0/18"},
"7"=>{"gateway"=>"10.24.255.254", "network"=>"10.24.192.0/18"},
"8"=>{"gateway"=>"10.25.63.254", "network"=>"10.25.0.0/18"},
"9"=>{"gateway"=>"10.25.127.254", "network"=>"10.25.64.0/18"},
"default"=>{"gateway"=>"172.16.111.254", "network"=>"172.16.96.0/20"}}
"latitude"=>48.1
"location"=>"Rennes, France"
"longitude"=>-1.6667
"name"=>"Rennes"
"production"=>true
"renater_ip"=>"192.168.4.19"
"security_contact"=>"[email protected]"
"storage5k"=>true
"sys_admin_contact"=>"[email protected]"
"type"=>"site"
"uid"=>"rennes"
"user_support_contact"=>"[email protected]"
"virt_ip_range"=>"10.156.0.0/14"
"web"=>"http://www.irisa.fr"
"version"=>"50f72bc5970f734edadb7337e7fd406ad1952c4c">
Currently, rally is installed and configured during the up phase. When working with an existing OpenStack deployment, this phase will likely not be called. As a result, we should maybe defer this installation.
There is a missing square bracket in the enos backup documentation, around --backup-dir:
usage: enos backup [--backup_dir=BACKUP_DIR [-e ENV|--env=ENV]
^
This makes docopt crash and prevents the backup.
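A small check like the following could catch this kind of mistake before it reaches docopt. It is a sketch that only verifies that `[`/`]` and `(`/`)` are balanced in a usage string; the function name `unbalanced_brackets` is illustrative and the check is not part of docopt or Enos.

```python
PAIRS = {"]": "[", ")": "("}


def unbalanced_brackets(usage):
    """Return the positions of brackets that have no matching partner."""
    stack, errors = [], []
    for pos, char in enumerate(usage):
        if char in "[(":
            stack.append((char, pos))
        elif char in PAIRS:
            if stack and stack[-1][0] == PAIRS[char]:
                stack.pop()
            else:
                errors.append(pos)
    # Whatever is left on the stack was opened but never closed.
    errors.extend(pos for _char, pos in stack)
    return sorted(errors)
```

Run on the usage line above, it reports one unmatched opening bracket; with a `]` added after `BACKUP_DIR`, it reports none.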
Once TC network constraints have been defined at the NIC level (whatever the number of NICs), what latency can we expect at the data plane level (i.e. from the VMs running on the hosts)?
On NixOS, during a vagrant/vbox deployment, all vboxnet* NICs are down and routes through these NICs are missing.
Setting auto_config: true in the Vagrantfile seems to fix the bug.
Hi all,
It would be great to have two additional options in kolla.
--force-kolla-deploy:
This option should delete the containers on the different nodes and invoke enos to redeploy the selected OpenStack as if it were the first time. This option is mandatory when you want to perform several trials of an experiment without redeploying everything (kadeploy + kolla).
--force-deploy:
This option should (re)deploy everything (kadeploy + kolla)
fatal: [graphene-6-kavlan-4.nancy.grid5000.fr]: FAILED! => {"changed": true, "cmd": ["docker", "stop", "influx"], "delta": "0:00:00.023152", "end": "2016-11-04 18:38:19.542934", "failed": true, "rc": 1, "start": "2016-11-04 18:38:19.519782", "stderr": "Error response from daemon: No such container: influx", "stdout": "", "stdout_lines": [], "warnings": []}
Add when: enable_monitoring to the tasks.
There may be at least these breaking changes:
Actually, I have a problem with the installation. When I run this command from a frontend:
pip install git+git://github.com/BeyondTheClouds/enos@master#egg=enos
I get this log at the end:
creating /usr/local/lib/python2.7/dist-packages/enos
error: could not create '/usr/local/lib/python2.7/dist-packages/enos': Permission denied
----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ojF8sT/enos/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-_lzaQi-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-ojF8sT/enos
Storing debug log for failure in /home/jaddarrous/.pip/pip.log
The message says that I do not have permission. Knowing that sudo can't be used on the frontend, I reserved an instance and ran the command again with sudo-g5k, and I got this error log:
import pytz as _pytz
ImportError: No module named pytz
error in setup command: Error parsing /tmp/pip-build-eyIa3H/positional/setup.cfg: ImportError: No module named pytz
----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-eyIa3H/positional/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Pz__bH-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-eyIa3H/positional
Storing debug log for failure in /root/.pip/pip.log
Did I miss something?
Some findings :
Before diving into this, we should think twice about whether using node_custom_config is relevant.
Some variables remain in ansible/group_vars/all.yml but are now unused:
# will be copied on the rally host to launch scenarios
rally_scenarios_dir: "{{ playbook_dir }}/../../rally/"
rally_scenarios_list: "all-scenarios.txt.sample"
rally_times: 1
rally_concurrency: 1
At line 169 of provider/g5k.py, the provider expects some compute nodes in order to bind them to the /tmp directory.
If there are no compute nodes, the script fails.
This is missing : https://enos.readthedocs.io/en/latest/analysis/index.html#post-mortem
In Enos we rely on execo.Host.
It's desirable to change it into a home-made structure since:
This will have some side effects on every provider and on the extra.to_ansible_group function.
see :
ansible/ansible-modules-core#5558
we use the return value in the bench phase
pip install -U pip --user should do the trick
scenario_type is deprecated in favor of bench.type.
The reason the error doesn't show up earlier is that, in some cases, the env from a previous deployment is reloaded and still includes scenario_type.
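One way to surface the problem early, even when an old env is reloaded, is to check the loaded env for deprecated keys. The sketch below assumes that the env is a dictionary and that scenario_type appears as a top-level key; both the table `DEPRECATED_KEYS` and the function `deprecated_keys` are hypothetical.

```python
# Deprecated key -> replacement, as stated in this report.
DEPRECATED_KEYS = {"scenario_type": "bench.type"}


def deprecated_keys(env, deprecated=DEPRECATED_KEYS):
    """Return one message per deprecated key still present in `env`."""
    return [
        "'{}' is deprecated in favor of '{}'".format(old, new)
        for old, new in sorted(deprecated.items())
        if old in env
    ]
```

Calling it right after the env is loaded would make a stale scenario_type visible regardless of which deployment produced the env.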
A second call will create a _data subdirectory instead of copying the contents.
As a result, indexing the logs in the results VM will fail.
I suggest to
/tmp/kolla-logs
/var/lib/docker/volumes/kolla_logs/_data
inside. Subsequent calls should overwrite previous ones, which is a more desirable behaviour.
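The desired semantics (copy the contents, and let later calls overwrite earlier ones) can be sketched as follows. This assumes a local copy in Python 3.8 or later and is not the way Enos currently fetches the logs; the function name `copy_contents` is illustrative.

```python
import shutil


def copy_contents(src, dst):
    """Copy the contents of `src` into `dst`, overwriting existing files.

    `dst` is created on the first call. Later calls reuse it rather than
    nesting a new subdirectory named after `src` inside it.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Calling `copy_contents("/var/lib/docker/volumes/kolla_logs/_data", "/tmp/kolla-logs")` twice leaves a single, up-to-date tree under /tmp/kolla-logs.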
Do we currently support this way of installing the tool ?
I may be wrong but I think that, by default, there will be an issue with the inventory directory as well as the rally directory.
Never change licensing in a rush :)
Since we are importing ansible and execo, we'll have to stick to the GPLv3 licence.
Following our discussion on Slack, I strongly support stating on the ENOS main page, as well as on the readthedocs pages, that ENOS has been designed to favor the 3R as defined by ACM:
https://www.acm.org/publications/policies/artifact-review-badging
Using ENOS will help researchers gather all the materials required to obtain such badges.
Neutron is currently deployed with the default parameters provided by Kolla. As a consequence, tenant networks are enabled by default. With a view to running OpenStack over OpenStack, I'm tempted to switch to a simpler model for the neutron deployment and use a flat provider network. IPs would be taken from the kavlan IP pool. Tenant networks and floating IPs would no longer be supported. This way we avoid the cost of the overlay network for the under cloud (by extension, we could also avoid it for the over cloud if needed). This configuration should be eased by the fact that kolla/newton allows custom config to be placed along with the deployment files.
Grafana lets you add annotations that draw vertical lines to mark specific events [1].
To do so, we have to log relevant events into the "event" table of influxdb [2].
We can do something generic by logging every step of ansible thanks to ansible hooks [3].
[1] http://docs.grafana.org/reference/annotations/
[2] http://maxchadwick.xyz/blog/grafana-influxdb-annotations
[3] https://docs.ansible.com/ansible/developing_plugins.html
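A minimal sketch of the event record such a hook could emit, assuming events are written with the InfluxDB line protocol into a measurement named "event" with `title` and `text` string fields. The function `event_line` and the field and tag names are illustrative; sending the line to InfluxDB and wiring it into an ansible plugin are left out.

```python
def _escape_tag(value):
    # Commas, equals signs and spaces must be escaped in tag keys and values.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_field(value):
    # Backslashes and double quotes must be escaped in string field values.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def event_line(title, text, tags=None, timestamp_ns=None, measurement="event"):
    """Format one annotation event as an InfluxDB line protocol record."""
    tag_part = "".join(
        ",{}={}".format(_escape_tag(key), _escape_tag(val))
        for key, val in sorted((tags or {}).items())
    )
    fields = 'title="{}",text="{}"'.format(_escape_field(title), _escape_field(text))
    line = "{}{} {}".format(measurement, tag_part, fields)
    if timestamp_ns is not None:
        line += " {}".format(timestamp_ns)
    return line
```

A Grafana annotation query on the same measurement could then use `title` and `text` as the annotation columns.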
The documentation explains well enough how to use topology.
It would be nice to be able to define host constraints (in terms of memory, CPU, ...). This would enable ENOS to emulate specific resources such as ISP boxes.
In most cases, you write a reservation.yml for a specific provider, so adding the provider name to the reservation.yml would make things clearer.
Add a specific enos task to free resources, e.g., enos free.
In the G5K provider this would be equivalent to calling oargridstat followed by oargriddel.
I perform my tests at Nantes since the Rennes VLANs are dead. To make the deployment faster I want to use Nantes' Ceph, but the monitor hosts are hard-coded in roles/registry/file/ceph.conf. I propose instead to parameterize this in the reservation.yml.
Currently ethX names are set globally. If nodes span different clusters, vif naming can differ (eth0/eth1 on some nodes and eth1/eth2 on others, for instance). We should probably set these variables on a per-host basis instead of globally.
Inventory file seems a good place to put this information : https://docs.ansible.com/ansible/intro_inventory.html#host-variables
Another solution would be to rely on the variable loading mechanisms and place this information in ansible/hosts_vars//..., but this would lead to the creation of a bunch of files.
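The inventory-based option could look like the sketch below, which appends per-host variables to each host line. It assumes the interface names are carried by the Kolla variables `network_interface` and `neutron_external_interface`; the helper `host_line` is hypothetical.

```python
def host_line(name, **host_vars):
    """Format one inventory host line with its per-host variables."""
    parts = [name]
    parts.extend("{}={}".format(key, val) for key, val in sorted(host_vars.items()))
    return " ".join(parts)


# Two nodes from different clusters with different interface names.
lines = [
    host_line("node-a", network_interface="eth0", neutron_external_interface="eth1"),
    host_line("node-b", network_interface="eth1", neutron_external_interface="eth2"),
]
```

This keeps everything in the single generated inventory file, at the cost of longer host lines.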
To reproduce:
provider:
  type: vagrant
  option: vbox
This leads to:
The provider 'vbox' could not be found, but was requested to
back the machine 'enos-127'. Please use a provider that exists.
Note that provider: vagrant works. In this case vagrant is called without any env variables and defaults to virtualbox.
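One possible fix is to translate the short option name into the provider name that vagrant knows before building its environment. The sketch below assumes the provider is selected through the `VAGRANT_DEFAULT_PROVIDER` environment variable; the alias table `PROVIDER_ALIASES` and the function `vagrant_environment` are hypothetical.

```python
import os

# Hypothetical alias table: short option name -> vagrant provider name.
PROVIDER_ALIASES = {"vbox": "virtualbox"}


def vagrant_environment(option=None, base_env=None):
    """Return the environment to use when calling vagrant.

    Without an option the environment is left untouched, so vagrant falls
    back to its default provider.
    """
    env = dict(os.environ if base_env is None else base_env)
    if option is not None:
        env["VAGRANT_DEFAULT_PROVIDER"] = PROVIDER_ALIASES.get(option, option)
    return env
```

Unknown options are passed through unchanged, so a name such as `libvirt` would still reach vagrant as given.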
Based on the previous version, I've prototyped a new way to launch benchmarks:
Let's see if it fits our needs :)
There is no link to the source of Enos in the Enos documentation.