infrastructure's Issues

text/csv.pm

Text::CSV is installed via an Ansible playbook:

    - name: Install Text::CSV
      shell: |
        cpanm --with-recommends Text::CSV
      tags: text_csv 

However, it doesn't appear on the machine:

[root@rhel7hcxrt1 ~]# perl -MText::CVS -e 1
Can't locate Text/CVS.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).

Running the installation command manually says it's already installed and up to date:

[root@rhel7hcxrt1 ~]# cpanm --with-recommends Text::CSV
Text::CSV is up to date. (1.95)

The issue has been seen everywhere we have checked so far:
RHEL 6 PPC64, RHEL 7 x86 and PPC64, Ubuntu 14/16 x86.

Jenkins server - root partition is almost full

The Jenkins server (http://ci.adoptopenjdk.net) root partition is almost full.
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   188G  142G  37G    80%   /

101GB of this is in /home/jenkins/.jenkins/jobs

Looking at the builds, it does not appear that they are configured to clean themselves up.
This should be configured for all builds.
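
To see which jobs account for most of the space before changing their settings, something like the following Python sketch could be run on the server. It assumes the jobs directory named above; the helper names `dir_size_bytes` and `largest_jobs` are illustrative only, and the script only reports sizes, it does not delete anything.

```python
import os


def dir_size_bytes(path):
    """Total size of the regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_jobs(jobs_dir, top=10):
    """Return (job_name, size_in_bytes) pairs, largest first."""
    sizes = []
    for entry in os.scandir(jobs_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, dir_size_bytes(entry.path)))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Path taken from the report above; adjust if Jenkins lives elsewhere.
    jobs_dir = "/home/jenkins/.jenkins/jobs"
    if os.path.isdir(jobs_dir):
        for name, size in largest_jobs(jobs_dir):
            print(f"{size / 2**30:8.2f} GiB  {name}")
```

The jobs at the top of the list are the first candidates for a build-discard policy.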


Create missing Ansible playbooks for build machines

To ensure that machine images can be reliably recreated for AdoptOpenJDK build/test we need entirely scripted configuration that sets up a VM "from scratch".

A number of the Ansible playbooks exist in the openjdk-build repo, but they are not complete in their coverage.

Proposed steps are:

  • create an initial provisioning script to establish sufficient capability on a new node type to run as an Ansible client (e.g. keybox public key, python, more?)
  • ensure Ansible scripts exist for each CPU/OS type we manage, and are complete.

The goal is that we can discard a VM at any point and recreate it entirely using the public information in our scripts.

nagios.adoptopenjdk.net certificate about to expire

Hello,

Your certificate (or certificates) for the names listed below will expire in
19 days (on 26 Jul 17 00:42 +0000). Please make sure to renew
your certificate before then, or visitors to your website will encounter errors.

nagios.adoptopenjdk.net

For any questions or support, please visit https://community.letsencrypt.org/.
Unfortunately, we can't provide support by email.

For details about when we send these emails, please visit
https://letsencrypt.org/docs/expiration-emails/. In particular, note
that this reminder email is still sent if you've obtained a slightly
different certificate by adding or removing names. If you've replaced
this certificate with a newer one that covers more or fewer names than
the list above, you may be able to ignore this message.

Regards,
The Let's Encrypt Team
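
A reminder like this could also be caught by a check of our own. The Python sketch below computes how many days remain before a certificate's `notAfter` date. The function names are illustrative, and `fetch_not_after` assumes the host serves HTTPS on port 443; note that with the default TLS context an already-expired certificate fails the handshake rather than returning a date.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jul 26 00:42:00 2017 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host and return the notAfter field of its certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("nagios.adoptopenjdk.net"))` could be compared against a warning threshold in a scheduled job.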

Jenkins machine configuration on Windows test machines needs updating

The OpenJDK test build on Windows fails with a permission error:

java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\openjdk_test_x86-64_windows\openjdk-test\OpenJDK_Playlist\openjdk-jdk8u\jdk\test\sun\management\windows\revokeall.exe

According to the last two comments in adoptium/aqa-tests#37 (comment), the Jenkins machine configuration needs to be updated to specify the tool location for git.

The issue is still present, presumably because the configuration has not been updated.

pLinux-LE machine for JCK testing

We need to run the JCK suite on pLinux-LE and its access needs to be locked down so it cannot be shared with other jobs.

Spec-wise something like 2-core, 8Gb RAM and an SSD of around 100Gb would be ideal as that's what we're using for xLinux.

ci.adoptopenjdk.net package upgrade problems

The Jenkins host ci.adoptopenjdk.net had a number of critical OS package updates pending. Upgrading the packages has introduced problems with Jenkins.

Jenkins is up and running, but a number of nodes are currently flagged as offline.

Include host time synchronization pkgs in Ansible scripts

New machines that are configured for AdoptOpenJDK should have a clock synchronization package installed (e.g. NTP, timesyncd) to ensure they do not drift too far and disrupt Jenkins pipeline coordination.

Although many of our jobs are quite long running, when they fail they may fail quickly, and being out of sync by tens of seconds matters.

Request for access to Packet ARM systems

I'm working on getting the OpenJDK/OpenJ9 builds working on ARM. Would it be possible to get access to the ARM build systems for some basic toe-in-the-water evaluations of my initial builds?

build-marist-s390x-sles-12 can't resolve itself

I'm getting an issue on build-marist-s390x-sles-12 (148.100.110.56) where it is unable to resolve its own hostname. Can we get an entry for openjdk-sles12 (the output from hostname) added to /etc/hosts on the machine, either with its real IP or just 127.0.0.1, so that it resolves? This is causing some tests to fail as per adoptium/aqa-systemtest#9
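
The symptom can be confirmed with a short Python check, sketched here under the assumption that an IPv4 lookup through the system resolver is what the failing tests rely on. The function names are illustrative, and the suggested entry uses 127.0.0.1 only because that is one of the two options mentioned above.

```python
import socket


def resolves(name):
    """True if name resolves to an IPv4 address via the system resolver."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def hosts_line(ip, hostname):
    """Format a single /etc/hosts entry."""
    return f"{ip}\t{hostname}"


def check_self():
    """Report whether this machine can resolve its own hostname."""
    name = socket.gethostname()
    if resolves(name):
        return f"{name} resolves"
    return f"{name} does not resolve; candidate entry: {hosts_line('127.0.0.1', name)}"
```

Calling `check_self()` on the affected machine should report the missing entry until /etc/hosts is updated.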

Add additional hosts and services to Nagios

The following machines are not currently known to our Nagios installation, and should be added to ensure their basic health is monitored:

  • api.adoptopenjdk.net
  • staging.adoptopenjdk.net

The following publicly available services should also be monitored so the #infrastructure channel is notified if they go down:

  • HTTP/HTTPS
    • www.adoptopenjdk.net
    • api.adoptopenjdk.net
    • ci.adoptopenjdk.net
    • keybox.adoptopenjdk.net
    • staging.adoptopenjdk.net
    • ansible.adoptopenjdk.net

We do not currently have any sles12 s390x machines tagged with "test"

At the moment we run the systemtests on sles12 as this has a suitable version of the libffi library available (https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/19). At present none of those machines has a tag of test, so I cannot use that tag to run the systemtest jobs. If I leave the tag off, jobs can end up running on master, which doesn't work well at all as it doesn't have make installed. I could use build, but that would stop the other platforms from using the dedicated test machines, so that's not a sensible solution either. For now I've set the jobs to use !hg, which knocks out three machines including master and is adequate until we get a sles12/s390x box tagged with test.

See also the work item about machines tags: #93

Free up disk space on build-marist-s390x-sles-12 root partition

The /dev/dasdb2 file system on build-marist-s390x-sles-12 (148.100.110.56) is filling up, currently at 96%.

On the Ubuntu sister machine, the system upgrades had multiple versions of the kernel left behind. That may be happening on SLES too.

This task is to clear out any unused packages and kernels to free up the root partition.

zLinux machine for JCK testing

We need to run the JCK suite on zLinux and its access needs to be locked down so it cannot be shared with other jobs.

Spec wise a 2-core, 8Gb, and a fast disk of around 100Gb should be ideal.

build-marist-s390x-rhel-7.3 unable to resolve host

I am having issues using wget or curl on this machine. This is also preventing me from connecting it to Jenkins.

~ ssh [email protected]
[linux1@adoptopenjdk ~]$ wget https://google.com
--2017-06-26 04:02:53--  https://google.com/
Resolving google.com (google.com)... failed: Connection refused.
wget: unable to resolve host address ‘google.com’
[linux1@adoptopenjdk ~]$ 

CC @bblondin @AdoptOpenJDK/getopenjdk

Update Nagios to latest version

Our installation of Nagios Core 4.3.1 is outdated and should be upgraded. The latest version, Nagios Core 4.3.4, was released on 2017-08-24.

Add new s390x Linux machines to build test farm

Marist have generously created two new Ubuntu 16.04 systems for us. One is the replacement for our old RHEL6 image (148.100.110.55) while the other is an extra one that we requested to cope with the additional workload.

Both images have 8 GB memory / 100 GB disk / 4 CPs

Systems:
LXEOJ905 - 148.100.33.178
LXEOJ906 - 148.100.33.179

I have the login details for these for those that need them.

This task is to configure the machines for build/test as appropriate, add the new nodes to Jenkins, Nagios, etc.

Bring AIX boxes on-line for build / test

The following new AIX build / test boxes are available to the project. I have added the keybox public key to the list of authorized_keys. Note that there are existing authorized keys that should be retained for the hoster's maintenance use.

power8-aix-openjdk1.osuosl.org - 140.211.9.10
power8-aix-openjdk2.osuosl.org - 140.211.9.12

Each system has 32GB memory, 5 vCPU, 1 CPU unit that can dynamically adapt to 10 CPU, and a minimal AIX 7.1 install. The AIX 7.1.4.4 DVD1 is still "mounted". The OS and related files are installed on filesystems allocated from rootvg, and /home is allocated from homevg. Each volume group is 80GB and most of rootvg is unallocated, leaving considerable room for expansion.

Both systems have been set up with a larger queue depth for the hdisks, which improves performance a little. One can also create a ramdisk.

You can customize the systems as you wish.

Windows machine for JCK testing

We need to run the JCK suite on Windows and its access needs to be locked down so it cannot be shared with other jobs.

Needs to have a fast disk (so I'd say SSD) and ideally powerful cores (but doesn't need many of them) so something like 2 core/8Gb/100Gb SSD should suffice. Perhaps 16Gb+ if we decide to use a ramdrive for holding the JCK test suite itself.

Windows version TBD - what do Oracle test on?

Investigate running Jenkins master as a service

Launching Jenkins currently requires remembering a long command line. To keep things simple it would be preferable to run it as a system service, so that it starts on normal machine boot and is easier to stop, restart, etc.

Original suggestion by @karianna

Get a second windows build machine

We could do with a Windows Server 2012 machine with Visual Studio 2013 to build the OpenJ9 binaries, as this is the level required for OpenJ9 to build. I will investigate where we could source one from.

sigtest: Ubuntu machines need to have JDK 5, JDK 6 and JDK 9

In order to be able to build certain artefacts i.e. code-tools related (e.g. SigTest) we need to have the following JDKs installed:

  • version 5
  • version 6
  • version 9

Versions 7 and 8 are already installed.

Note that OpenJDK versions 5 and 6 are not easily installable via Ansible scripts, so an alternative source will need to be found. This might add to the complications of our artefacts being built using different flavours of JDK (a bit inconsistent).

We could download from Oracle, but with the latest changes we will need a login and password to download old versions of the JDK, which means passing these details into the Ansible script.

Vagrant script for ubuntu fails when run in an isolated environment

Standing up an environment using vagrant (for ubuntu 14.04) and running the ubuntu.yml Ansible script halts with the below message:

fatal: [localhost]: FAILED! => {"failed": true, "msg": "An unhandled exception occurred while
running the lookup plugin 'file'. Error was a <class 'ansible.errors.AnsibleError'>, 
original message: could not locate file in lookup: /Vendor_Files/keys/id_rsa.pub"})

This occurs when run on a local machine (reproducible on Linux and MacOSX environments).

Re-running with the -v flag will help; -vv, -vvv, etc. give more verbose information about the issue.

See #58 (comment) in #58

Once resolved, add any additional information, or any findings made during the investigation, to https://github.com/AdoptOpenJDK/openjdk-infrastructure/blob/master/ansible/README.md.

#helpwanted #bug

MacOS machine for JCK testing

Agreed that MacStadium will provide us with two further Macs for this purpose. Waiting to find out which OS level to deploy.

Create a Nagios System Configuration Tool

Create a Nagios System Configuration Tool (script) to help set up and configure host.cfg files for new systems that Nagios will monitor. The tool should:

  • ask questions, then generate the host.cfg file
  • test and enable monitoring
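
As a starting point, the generation step could look like the Python sketch below. It is only an illustration: the function names are hypothetical, the `linux-server` template is assumed to exist (it ships in the Nagios sample configuration but may not match our setup), and the address in the usage note is a documentation placeholder.

```python
HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {host_name}
    alias      {alias}
    address    {address}
}}
"""


def render_host_cfg(host_name, address, alias=None, template="linux-server"):
    """Render a Nagios host object definition as text."""
    return HOST_TEMPLATE.format(
        template=template,
        host_name=host_name,
        alias=alias or host_name,  # fall back to the host name
        address=address,
    )


def prompt_host_cfg():
    """Ask for the values interactively and return the rendered definition."""
    host_name = input("Host name: ").strip()
    address = input("IP address or DNS name: ").strip()
    alias = input(f"Alias [{host_name}]: ").strip()
    return render_host_cfg(host_name, address, alias or None)
```

For example, `render_host_cfg("api.adoptopenjdk.net", "192.0.2.10")` returns a definition that can be written to a host.cfg file. Before enabling it, the configuration can be verified with `nagios -v` against the main configuration file, whose path depends on the installation.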

Move automated posting to Slack into their own channels

We have a number of automated 'bots' that post to Slack about various topics.

The bots are swamping some channels with automated messages and hiding any real posts. It is also unnecessary to archive most of the bot postings, so we can choose which are archived.

This issue is to create #<blah>-bots channels and switch the bots to posting on there so the humans have a chance.

Biggest offenders are likely:
#infrastructure where Nagios should be posting to #infrastructure-bot (un-archived), and
#website where Localize should be posting to #website-bot (un-archived).

Get a Tier 1 x86 sponsor

Currently we are hosting a lot of our x86 hardware with packet.net.

I want to move away from this to free up our usage limit so that we can provision more ARM machines for testing.

FYI @vielmetti

jenkins: Add ability to let more users view the jenkins job configurations

I've had a few people ask if they can see the job configuration to understand what the jobs are doing. Jenkins doesn't have any integrated ability to allow read-only access (by default, if you can view a configuration you can edit it), but there are plugins such as https://wiki.jenkins.io/display/JENKINS/Extended+Read+Permission+Plugin which will change that. I'm opening this issue for discussion to see if there is any reason not to have this in place: do we have sensitive stuff in the jobs that wouldn't be hidden by this plugin?

zLinux machine(s) for non-JCK testing

Request for minimally 1 (eventually 2, if we do not start sharing machines across build/test functionality) zLinux machines for non-JCK testing (similar request to #77), "Spec wise a 2-core, 8Gb, and a fast disk of around 100Gb".

pLinux-LE machines for all non-JCK testing

I will piggy-back on the requests for JCK test machines (asking for same requirements as #76),
"Spec-wise something like 2-core, 8Gb RAM and an SSD of around 100Gb ".

One (or eventually two) machines, so that we can enable the following types of tests:

  • openjdk regression tests
  • system/stress tests
  • functional tests

(optionally/eventually some perf micro benchmarks).
