bosh-oneagent-release's Introduction

Dynatrace OneAgent BOSH Release

This is a BOSH release for Dynatrace.

This release installs Dynatrace OneAgent on BOSH-managed VMs. It is intended to be used as a BOSH addon for rolling out Dynatrace OneAgent to all VMs, including Linux and Windows Diego cells.

Usage

To use this BOSH release, first upload it to your BOSH Director. You can either build the release yourself or use a pre-built one from the GitHub repository's releases.

bosh -e <YOUR_BOSH_DIRECTOR> upload-release /path/to/built/dynatrace-oneagent.tgz

Update the BOSH Director's runtime config. You will need to modify runtime-config-dynatrace.yml to suit your needs, e.g. to limit the addon to specific BOSH deployments. You will find your credentials in the Dynatrace UI.

bosh update-runtime-config runtime-config-dynatrace.yml

Run bosh deploy for each targeted deployment to install OneAgent on its VMs. Runtime-config changes only take effect on the next deploy.

Limitations

All releases from 0.3.6 onward are packaged with the bosh2 CLI. This means you can't upload them to your director with the bosh1 CLI. Since v2.0 of Cloud Foundry Ops Manager, bosh2 is the default and is invoked simply as 'bosh'. If you use an earlier version, bosh2 should be available alongside the default bosh1 and can be called with the 'bosh2' command. Adjust the commands above accordingly.
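On older Ops Manager setups where both CLIs are installed, a script can prefer an explicit bosh2 binary and fall back to bosh. The helper below is a hypothetical sketch, not part of the release. It only checks which command names are on PATH and does not verify the CLI version, so on an old setup the fallback bosh may still be the v1 CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of the BOSH CLI to use.
# Prefers bosh2 (older Ops Manager installs), falls back to bosh.
# Note: it does not check whether "bosh" is the v1 or v2 CLI.
pick_bosh_cli() {
  if command -v bosh2 >/dev/null 2>&1; then
    echo bosh2
  elif command -v bosh >/dev/null 2>&1; then
    echo bosh
  else
    echo "error: neither bosh2 nor bosh found on PATH" >&2
    return 1
  fi
}

# Example:
#   BOSH_CLI=$(pick_bosh_cli) || exit 1
#   "$BOSH_CLI" -e <YOUR_BOSH_DIRECTOR> upload-release /path/to/built/dynatrace-oneagent.tgz
```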

Releases since v1.0.5 also require BOSH Director v263 or greater.
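A pre-flight check for the v263 requirement can be reduced to comparing the major part of the director version. This is a hypothetical sketch: it assumes you already have the version string (for example copied from the Version line printed by bosh env) and does not parse CLI output itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a director version string such as
# "264.7.0" has a major version >= minimum (default 263).
# Returns 1 if too old, 2 if the string cannot be parsed.
director_version_ok() {
  local version=$1 minimum=${2:-263}
  local major=${version%%.*}   # keep everything before the first dot
  if ! [[ $major =~ ^[0-9]+$ ]]; then
    echo "error: cannot parse director version '$version'" >&2
    return 2
  fi
  (( major >= minimum ))
}

# Example:
#   director_version_ok "264.7.0" || echo "Director too old for release >= 1.0.5"
```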

License

Licensed under the MIT License. See the LICENSE file for details.

bosh-oneagent-release's People

Contributors

aloismayr, arthfl, baichinger, dtmad, espe0n, gidad, lrgar, mreider, srbhklkrn, voelzmo

bosh-oneagent-release's Issues

invalid job spec

It appears that both job specs contain an empty packages: key. YAML parses this as nil, whereas packages is expected to be an array of strings. This error prevents bosh.io from picking up the new release.

cc @voelzmo
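In YAML, a key with no value such as packages: is null rather than an empty list; packages: [] (or a list of package names) is the array form. The lint below is a hypothetical sketch, not a tool shipped with BOSH or this release. It assumes specs live at the usual jobs/<job>/spec paths and flags a packages: line that has neither an inline value nor list items after it.

```bash
#!/usr/bin/env bash
# Hypothetical lint: report job specs whose "packages:" key is empty
# (YAML null) instead of a list. Returns 1 if any spec is affected.
check_empty_packages() {
  local spec rc=0
  for spec in "$@"; do
    # awk exits 0 when it finds an empty "packages:" key.
    if awk '
      /^packages:[ \t]*$/ { pending = NR; next }
      pending && /^[ \t]*(#.*)?$/ { next }   # skip blank and comment lines
      pending { if ($0 !~ /^[ \t]*- /) found = 1; pending = 0 }
      END { if (pending) found = 1; exit !found }
    ' "$spec"; then
      echo "$spec: 'packages:' is null; use 'packages: []' or list package names" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   check_empty_packages jobs/*/spec
```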

Failure deploying to Windows VM

Hi,

When we try to deploy the addon to a Windows TKGI cluster, which is composed of Windows and Ubuntu VMs, BOSH tries to use the shell scripts in the monit config instead of the PowerShell scripts.

Monit config file in Windows VM:

PS C:\var\vcap\jobs> cat .\dynatrace-oneagent\monit
check process dynatrace-oneagent
  with pidfile /var/vcap/sys/run/dynatrace-oneagent/dynatrace-watchdog.pid
  start program "/var/vcap/jobs/dynatrace-oneagent/bin/start-oneagent.sh"
    with timeout 600 seconds
  stop program "/var/vcap/jobs/dynatrace-oneagent/bin/stop-oneagent.sh"
    with timeout 120 seconds
  group vcap

Example of what a monit file should look like on Windows VMs:

PS C:\var\vcap\jobs> cat .\kubelet-windows\monit
{
  "processes": [
    {
      "name": "kubelet",
      "executable": "powershell",
      "args": ["C:\\var\\vcap\\jobs\\kubelet-windows\\bin\\kubelet_ctl.ps1"],
      "env": {}
    }
  ]
}

Error:
Configuring job dynatrace-oneagent: Adding monit configuration: invalid character 'c' looking for beginning of value

The error occurs because the monit config file on the Windows VM is not valid JSON.

We have set runtime configurations for both Ubuntu Xenial and windows2019 stemcells in Ops Manager, as in the example.

Bosh runtime config for Windows Addon:

releases:
- name: dynatrace-oneagent
  version: 1.4.0

addons:
- name: dynatrace-oneagent-sandbox2-windows-addon
  jobs:
  - name: dynatrace-oneagent
    release: dynatrace-oneagent
    properties:
      dynatrace:
        environmentid: <redacted>
        apitoken: <redacted>
        apiurl: <redacted>
        hostgroup: sevice-instance_b48b90ba-5b07-4a03-9ee4-16801560eb0d
        hosttags: cluster=TANZU_SANDBOX2 landscape=Tanzu_LS team=Tanzu_T
        hostprops: Department=Infrastructure Stage=Sandbox
        infraonly: 0
  include:
    deployments:
      - service-instance_7adea55a-5905-4768-85f9-2146c802c573

    stemcell:
      - os: windows2019
  exclude:
    lifecycle: errand

Failure building v1.4.0 for bosh.io

Hi, our pipeline that builds your release for bosh.io is failing for your latest release. We haven't dug into it much, but it seems that some of the blobs in your S3 release bucket might not be publicly readable.

Cloning into 'releases-index'...
done.
Checking out files: 100% (26058/26058), done.
[dynatrace-oneagent-1.0.2] skipping
[dynatrace-oneagent-1.0.3] skipping
[dynatrace-oneagent-1.0.4] skipping
[dynatrace-oneagent-1.1.0] skipping
[dynatrace-oneagent-1.2.0] skipping
[dynatrace-oneagent-1.2.1] skipping
[dynatrace-oneagent-1.2.2] skipping
[dynatrace-oneagent-1.3.0] skipping
[dynatrace-oneagent-1.3.1] skipping
[dynatrace-oneagent-1.3.2] skipping
[dynatrace-oneagent-1.3.3] skipping
[dynatrace-oneagent-1.4.0] importing
panic: Failed: Processing release: release=misc.Release{DirPath:"/tmp/build/6b387ced/release", MFPath:"/tmp/build/6b387ced/release/releases/dynatrace-oneagent/dynatrace-oneagent-1.4.0.yml", releaseReaderFactory:release.ReaderFactory{downloader:downloader.MuxDownloader{mux:map[string]downloader.Downloader{"http":downloader.HTTPDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "https":downloader.HTTPDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "file":downloader.LocalFSDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "git":downloader.GitDownloader{fs:(*system.osFileSystem)(0xc0000acc40), runner:system.execCmdRunner{logger:(*logger.logger)(0xc0000b7c00)}, logger:(*logger.logger)(0xc0000b7c00)}}, logger:(*logger.logger)(0xc0000b7c00), downloadedPaths:map[string]downloader.Downloader{}}, extractor:tar.CmdExtractor{runner:system.execCmdRunner{logger:(*logger.logger)(0xc0000b7c00)}, fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, jobReaderFactory:job.ReaderFactory{downloader:downloader.MuxDownloader{mux:map[string]downloader.Downloader{"http":downloader.HTTPDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "https":downloader.HTTPDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "file":downloader.LocalFSDownloader{fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, "git":downloader.GitDownloader{fs:(*system.osFileSystem)(0xc0000acc40), runner:system.execCmdRunner{logger:(*logger.logger)(0xc0000b7c00)}, logger:(*logger.logger)(0xc0000b7c00)}}, logger:(*logger.logger)(0xc0000b7c00), downloadedPaths:map[string]downloader.Downloader{}}, 
extractor:tar.CmdExtractor{runner:system.execCmdRunner{logger:(*logger.logger)(0xc0000b7c00)}, fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}, fs:(*system.osFileSystem)(0xc0000acc40), logger:(*logger.logger)(0xc0000b7c00)}} Building tarball: executing bosh: exit status 1 (stdout: {
    "Tables": null,
    "Blocks": null,
    "Lines": [
        "-- Started downloading 'dynatrace-oneagent/d5cb01a64f257e8e3a4728364705070cd80a73e26bee4b5e7a3a6f6da89ef0ce' (sha1=sha256:90c2d1f560be79458ec6c432f5af1b4ee8b58e09a7247edc46f1e8754a3ebb7c)\n",
        "-- Started downloading 'dynatrace-oneagent-windows/7719df91fdd16480626a78856e782df7608e8ef8336b342c59535c158f198f39' (sha1=sha256:44272c03a7f59d66edc24b3bdd4e65bc8ad7c3d8da4950ad8b57f9841415b4bb)\n",
        "-- Failed downloading 'dynatrace-oneagent-windows/7719df91fdd16480626a78856e782df7608e8ef8336b342c59535c158f198f39' (sha1=sha256:44272c03a7f59d66edc24b3bdd4e65bc8ad7c3d8da4950ad8b57f9841415b4bb)\n",
        "-- Failed downloading 'dynatrace-oneagent/d5cb01a64f257e8e3a4728364705070cd80a73e26bee4b5e7a3a6f6da89ef0ce' (sha1=sha256:90c2d1f560be79458ec6c432f5af1b4ee8b58e09a7247edc46f1e8754a3ebb7c)\n",
        "- Downloading blob '6113ef44-4343-47e3-6751-69823a85eee1' with digest string 'sha256:44272c03a7f59d66edc24b3bdd4e65bc8ad7c3d8da4950ad8b57f9841415b4bb':\n    Getting blob from inner blobstore:\n      Getting blob from inner blobstore:\n        AccessDenied: Access Denied\n\tstatus code: 403, request id: 3EB99292CCB1749C, host id: f9Hru+RR3omvYRHRmFD5ejhslzIpEreuGzSJOvYREyAS1ZwqBl88MX+fnuBl03HFEJsNf4QnfF8=\n- Downloading blob '130126bf-c1b7-44aa-5dcb-0835d9566f2b' with digest string 'sha256:90c2d1f560be79458ec6c432f5af1b4ee8b58e09a7247edc46f1e8754a3ebb7c':\n    Getting blob from inner blobstore:\n      Getting blob from inner blobstore:\n        AccessDenied: Access Denied\n\tstatus code: 403, request id: 1BE70B51DBCD5211, host id: 351RDsjBQF3iuDyFwkHxBN0mXgSMqj4PpFCm6eTS0jvp8EljO59FrwC4fYPQIFCGpSkxb+AgTqc=",
        "Exit code 1"
    ]
} stderr: )

goroutine 1 [running]:
main.main()
	/tmp/build/6b387ced/worker/src/worker/create-releases.go:25 +0x250
exit status 2

Error handling in pre-start leads to loss of valuable info.

Hi,
At the moment, the pre-start script calls the installer without checking its return code afterwards.
(See https://github.com/Dynatrace/bosh-oneagent-release/blob/master/jobs/dynatrace-oneagent/templates/pre-start.erb#L165)
As a result, valuable information about the error is lost.
In the following example, a failed download of the installer causes the installation to fail.
The installer returns code 7. However, instead of trying to download the installer again, the pre-start script simply continues.

…
/opt/dynatrace/oneagent is on /var/vcap/data
Extracting...
Warning: S/MIME signature is missing
Unpacking. This may take a few minutes...
Error: Archive is corrupted. Installation aborted.
Setting oneagentwatchdog pid
Installation finished

Best regards
Christoph
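The requested behaviour (check the installer's return code and retry instead of continuing) can be sketched with a small retry wrapper. This is an illustration under assumptions, not the release's pre-start code: retry is a hypothetical helper, and run_oneagent_installer in the comment stands for whatever step downloads and runs the installer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts (>= 1) times.
# Returns 0 on the first success, otherwise the exit code of the last
# failed attempt, so the caller can log it and fail the script.
retry() {
  local max_attempts=$1
  shift
  local attempt=1 rc=0
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@" && return 0
    rc=$?
    echo "warn: '$*' failed with exit code $rc (attempt $attempt/$max_attempts)" >&2
    attempt=$((attempt + 1))
  done
  return "$rc"
}

# Example use in a pre-start script (run_oneagent_installer is hypothetical):
#
#   if ! retry 3 run_oneagent_installer; then
#     echo "error: OneAgent installation failed" >&2
#     exit 1
#   fi
```

Propagating the last exit code (for example 7 in the log above) keeps the installer's own error information visible to BOSH instead of reporting "Installation finished".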

Unable to switch from infraonly to fullstack monitoring

While enabling the infraonly flag for a BOSH deployment, we came across a quirk.

When we enable infraonly mode and then try to switch back to full-stack, the BOSH agent does not perform the switch. Is this the intended behaviour?
Switching from full-stack to infraonly works. However, it cannot switch back to full-stack mode again. It seems this switch can only happen when the BOSH VM is recreated.

While investigating what is going on, we made a few observations:


  1. Some configuration is stored in /var/lib (a questionable choice of location in cloud environments, IMHO).
  2. There is a configuration file, infraonly.conf, which is initialized when the flag is enabled.
  3. Once written, this file does not seem to be reset (as can be seen from its date).

We tried both the infraonly flag and INSTALLERARGS: INFRA_ONLY=1.

We noticed that the installer file located at /var/vcap/packages/Dynatrace-OneAgent-Linux-x.xxx.sh contains logic that writes this file, and it appears to be a one-time initialisation step.


The agent is only reset the next time the VM is recreated (i.e. when the config files are no longer present in /var/lib).

This makes it difficult to use in production environments where we need to make such switches.
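To check whether a VM still carries the file described above, a search over /var/lib is enough. This is a hypothetical diagnostic sketch: the issue names the file (infraonly.conf) and the /var/lib location but not the exact subdirectory, so it searches the whole tree. It only reports the file and its modification time; it does not imply that deleting the file is a supported way to switch modes.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: list every infraonly.conf below a root
# directory (default /var/lib) with ls -l details, so a stale file from
# an earlier infraonly deployment is easy to spot by its date.
find_infraonly_conf() {
  local root=${1:-/var/lib}
  find "$root" -type f -name infraonly.conf -exec ls -l {} + 2>/dev/null
}

# Example (as root on the affected VM):
#   find_infraonly_conf
```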

Dynatrace will block a VM if Dynatrace is removed before the drain script

Dear Dynatrace team,

We currently have an issue where Dynatrace blocks a deployment process.

Cause:

The issue occurs when the uninstall script has been called or when the drain script did not complete successfully.
In this case, Monit tries to revive the OneAgent using the start-oneagent script. However, since that script does not call the installer, it fails.

Symptoms:

The affected VM goes into the “STOPPED” state. When trying to redeploy, the following error occurs:

Task 1234567 | 16:00:00 | Updating instance dummy: dummy/ffffffff-ffff-ffff-ffff-ffffffffffff (0) (canary) (00:02:00)
                     L Error: Action Failed get_task: Task ffffffff-ffff-ffff-ffff-ffffffffffff result: Stopping Monitored Services: Stopping services '[dynatrace-oneagent]' errored
Task 1234567 | 16:00:00 | Error: Action Failed get_task: Task ffffffff-ffff-ffff-ffff-ffffffffffff result: Stopping Monitored Services: Stopping services '[dynatrace-oneagent]' errored

Workaround

You either have to force-delete the affected deployment and redeploy it afterwards, or ssh to the affected VM and run the pre-start script manually.

Possible Solution

An easy solution would be to add a recovery routine to the start script.

Something like this:

# Try to start the OneAgent service; on failure, reinstall once and retry.
if ! runServiceCommand start; then
  echo "error: Could not start Dynatrace OneAgent service"
  echo "info: Attempting to repair the local Dynatrace OneAgent service"
  # install_dynatrace: setup function taken from the pre-start script
  if install_dynatrace; then
    if ! runServiceCommand start; then
      echo "error: Could not repair the local Dynatrace OneAgent service"
      exit 1
    fi
  else
    echo "error: Could not repair the local Dynatrace OneAgent service"
    exit 1
  fi
fi

Here, install_dynatrace is the setup function from the pre-start script.

Thanks and best regards
Christoph

Dynatrace nginx agent hangs nginx for some routes on restart

We are facing a situation in which nginx stops responding to the routes specified by https://github.com/cloudfoundry/cloud_controller_ng/blob/e7e6ed316a89bb578ecec0aedc88fb61b8fe362c/bosh/jobs/cloud_controller_ng/templates/nginx.conf.erb#L191 after a restart. This behavior is not seen when the addon is not installed. The requests that fail have a multipart/form-data Content-Type header.

This request gets no response from nginx:

$ curl -H"Content-Type: multipart/form-data; boundary=o" -XPUT -d "foo" https://URL/v2/apps/70964c35-045e-4b0e-bed9-c1794ebbff1/bits?async=true

But this one is answered:

$ curl -H"Content-Type: multipart/form-data; boundary=o" -XPUT -d "foo" https://URL/v2/apps/70964c35-045e-4b0e-bed9-c1794ebbff1/something_else?async=true

And so is this one:

$ curl  -XPUT -d "foo" https://URL/v2/apps/70964c35-045e-4b0e-bed9-c1794ebbff1/bits?async=true

DT_INJECTION_RULES not supported

We do not see any way to tell the Dynatrace agent not to monitor short-lived processes. This is required, e.g., for Spark, where a VM is started just to execute a remote call. We gain no additional insight from monitoring these short-lived processes, so we would like to exclude them.

On other platforms we can use DT_INJECTION_RULES to define which processes are irrelevant for monitoring. There is currently no way to pass this variable to the agent with this plugin.

[Release 1.3.3] Proxy configuration

Hello,
after updating the release version from 1.3.2 to 1.3.3, our proxy setup stopped working. Reverting to 1.3.2 fixed the issue.

  sha1: fd97b6068e76e5b3265b9cfd5a121e6a7bb26a37
  url: https://bosh.io/d/github.com/Dynatrace/bosh-oneagent-release?v=1.3.3
  version: 1.3.3

We saw the following error in the install log:

11:04:18 Dynatrace OneAgent failed to connect to Dynatrace Cluster Node https://dynatrace.random:8443/communication. See log file for details: /var/vcap/data/dynatrace/oneagent/log/os/ruxitagent_host_14851.0.log
Installation finished

And this is ruxitagent_host_14851.0.log:

2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Storage path ................ /var/vcap/data/dynatrace/oneagent
2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Tenant UUID ................. 02ed582b-10d4-4dde-895b-98f18918cdbe
2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Tenant ID ................... 0xcc09a1cd
2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Network zone ................
2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Agent ID .................... 0x31dcd25339b2cbb1
2020-10-28 11:10:29.182 UTC [e88dd8c8] info    [native] Process group ID ............ 0x2e81c7cd12261c61
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] OSI ID ...................... 0x99ae3242cb44636d
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Node ID ..................... 0x0000000000000000
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Process group instance ID ... 0xb8cbc19f0dba6303
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Container group ID .......... 0x0000000000000000
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Container group instance ID . 0x0000000000000000
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Container ID ................
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Agent host .................. 2d2397f0-d596-42ed-a12c-470efb4cd0bf
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Injection mode .............. UNKNOWN
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Standalone .................. no
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Log file aging .............. disabled
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Agent name .................. host
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Server/Collector ............ https://dynatrace.random:8443/communication;https://cluster-activegate.dynatrace.random:9999/communication
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Proxy .......................
2020-10-28 11:10:29.183 UTC [e88dd8c8] info    [native] Dispatcher buffersize ....... 419430400
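In the log above the Proxy field is empty while Server/Collector is set, which suggests the proxy setting did not reach the agent in 1.3.3. The sketch below extracts that field so the two release versions can be compared quickly. It is a hypothetical helper that assumes the "[native] Proxy ....." line format shown in this excerpt; other agent versions may format the log differently.

```bash
#!/usr/bin/env bash
# Print the proxy value from the last "[native] Proxy ....." line of a
# OneAgent host log. Empty output means the agent reported no proxy.
oneagent_log_proxy() {
  sed -n 's/.*\[native] Proxy \.* *//p' "$1" | tail -n 1
}

# Warn and return 1 if the most recent Proxy line in the log is empty.
check_oneagent_proxy() {
  local proxy
  proxy=$(oneagent_log_proxy "$1")
  if [ -z "$proxy" ]; then
    echo "warn: no proxy recorded in $1" >&2
    return 1
  fi
  echo "proxy: $proxy"
}

# Example:
#   check_oneagent_proxy /var/vcap/data/dynatrace/oneagent/log/os/ruxitagent_host_14851.0.log
```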
