
datadog-process-agent's Introduction

⚠️ Repository locked

The process agent + system-probe code has been merged into the main datadog-agent repository.

This repository only contains code and commits prior to (and including) release 6.11.0 and 5.32.0 of the process agent, on the master branch.

Please direct any issues or pull requests regarding the process agent to the datadog-agent repository.

datadog-process-agent's People

Contributors

adrienkohlbecker, conorbranagan, derekwbrown, fred-camara, hkaj, jeremy-lq, jtwaleson, kevinconaway, kevinjqliu, lachlancooper, leeavital, marc921, mbotarro, sfluor, shang-wang, sunhay, truthbk, vatteh, xvello


datadog-process-agent's Issues

Process agent on Windows not working with Datadog v6

Hello,

After updating to Datadog v6, it appears that the Windows live process monitor is no longer working. I've tried the following versions: 6.5.0, 6.5.1, and 6.4.2.

It seems like the process agent is constantly restarting. Logs from C:\ProgramData\Datadog\logs\process-agent.log:

2018-09-19 09:22:33 INFO (log.go:441) - config.Load()
2018-09-19 09:22:33 INFO (log.go:441) - config.load succeeded
2018-09-19 09:22:33 INFO (log.go:368) - starting the tagging system
2018-09-19 09:22:35 INFO (main_windows.go:75) - Starting process-agent for host=intt000000, endpoints=[https://process.datadoghq.com], enabled checks=[process rtprocess]
2018-09-19 09:22:35 WARN (allprocesses_windows.go:360) - unexpected quotes in string, giving up (C:\Windows\system32\cmd.exe /c ""C:\SvcFab_App__FabricSystem_App4294967295\US.Code.Current\FabricUS.bat"")
2018-09-19 09:22:35 INFO (allprocesses_windows.go:152) - Couldn't open process 0 The parameter is incorrect.
2018-09-19 09:22:35 INFO (collector.go:78) - windows process arg tracking enabled, will be refreshed every 15 checks
2018-09-19 09:22:35 INFO (collector.go:78) - will collect new process args immediately
2018-09-19 09:22:35 INFO (asm_amd64.s:2362) - Finished check #1 in 279.6043ms

Thank you.

5.17 Segfault - invalid memory address or nil pointer dereference in Docker

Just attempted an upgrade to 5.17 from 5.16. We are installing/running dd-agent with Ansible, and the Ansible task that ensures the agent is running keeps reporting changed.

TASK: [datadog-agent | Ensure datadog-agent is running] *********************** 
changed: [server-name]

Check the service:

$ sudo service datadog-agent status
datadog-agent:collector          RUNNING   pid 8444, uptime 2:01:02
datadog-agent:dogstatsd          RUNNING   pid 8434, uptime 2:01:02
datadog-agent:forwarder          RUNNING   pid 8433, uptime 2:01:02
datadog-agent:go-metro           EXITED    Aug 29 05:09 PM
datadog-agent:jmxfetch           EXITED    Aug 29 05:09 PM
datadog-agent:process-agent      FATAL     Exited too quickly (process log may have details)
datadog-agent:trace-agent        RUNNING   pid 8432, uptime 2:01:02
Datadog Agent (supervisor) is NOT running all child processes

Run process-agent manually:

$ /opt/datadog-agent/bin/process-agent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7b247b]

goroutine 58 [running]:
github.com/DataDog/datadog-process-agent/util/docker.findDockerNetworks(0xc4201360c0, 0x40, 0x2070, 0x0, 0x40, 0xc420082dc0, 0xc42041c3c0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/util/docker/docker.go:541 +0xbb
github.com/DataDog/datadog-process-agent/util/docker.(*dockerUtil).dockerContainers(0xc4203b0080, 0xae, 0x100, 0xc4203f8d80, 0x0, 0x0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/util/docker/docker.go:329 +0x7bf
github.com/DataDog/datadog-process-agent/util/docker.(*dockerUtil).containers(0xc4203b0080, 0x9, 0x4072efae147ae148, 0x4055b51eb851eb85, 0x40f1296b33333333, 0x4007c28f5c28f5c3)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/util/docker/docker.go:381 +0x99b
github.com/DataDog/datadog-process-agent/util/docker.AllContainers(0xed137b600, 0xc420492000, 0x1, 0x1, 0x0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/util/docker/docker.go:220 +0x7c
github.com/DataDog/datadog-process-agent/checks.(*ContainerCheck).Run(0x1709aa0, 0xc42039e160, 0x4d658222, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/checks/container.go:50 +0x96
main.(*Collector).runCheck(0xc42040c200, 0x16cec00, 0x1709aa0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:97 +0x6a
main.(*Collector).run.func2(0xc42040c200, 0xc420416900, 0x16cec00, 0x1709aa0)
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:130 +0x347
created by main.(*Collector).run
	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:153 +0x1ac

Panic when there are stopped containers in a Fargate task

I believe the process agent is suffering from the same issue described here: DataDog/integrations-core#1955.

The short version is that the container stats endpoint returns a key for stopped containers but a null stats object. The process agent appears to assume that there is always data there.

You can see an example of metadata and stats output triggering this here.
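
For reference, a minimal Go sketch of the kind of nil guard that would avoid this panic; the types, field names, and map layout below are illustrative assumptions, not the agent's actual ECS stats structures.

package main

import "log"

// Illustrative types only; the agent's real ECS stats structures differ.
type cpuStats struct{ TotalUsage uint64 }
type memStats struct{ Usage uint64 }
type containerStats struct {
	CPU    *cpuStats
	Memory *memStats
}

func main() {
	// Simulate the ECS stats endpoint returning an entry for a stopped
	// container whose stats object is null (decoded as a nil pointer).
	statsByID := map[string]*containerStats{
		"running-ctr": {CPU: &cpuStats{TotalUsage: 100}, Memory: &memStats{Usage: 2048}},
		"stopped-ctr": nil,
	}

	for id, stats := range statsByID {
		// The guard this issue asks for: skip entries with no stats
		// instead of dereferencing them.
		if stats == nil || stats.CPU == nil || stats.Memory == nil {
			log.Printf("skipping container %s: no stats (likely stopped)", id)
			continue
		}
		log.Printf("container %s: cpu=%d mem=%d", id, stats.CPU.TotalUsage, stats.Memory.Usage)
	}
}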

Agent log:

[PROCESS] starting process-agent
[PROCESS] Error: unable to determine creation time for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - parsing time "" as "2006-01-02T15:04:05Z07:00": cannot parse "" as "2006"
[PROCESS] Error: unable to get stats from ECS for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] Error: unable to get stats from ECS for container 75aba7c191d45c9121f0ae4a7835c6dc4e047e184adc54610c00e6d500bdb55f - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] 2018-08-17 20:58:51 INFO (config.go:398) - overriding API key from env DD_API_KEY value
[PROCESS] Error: unable to get stats from ECS for container 75aba7c191d45c9121f0ae4a7835c6dc4e047e184adc54610c00e6d500bdb55f - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] Error: unable to determine creation time for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - parsing time "" as "2006-01-02T15:04:05Z07:00": cannot parse "" as "2006"
[PROCESS] Error: unable to get stats from ECS for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] Error: unable to determine creation time for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - parsing time "" as "2006-01-02T15:04:05Z07:00": cannot parse "" as "2006"
[PROCESS] Error: unable to get stats from ECS for container 5862d65c8843b4ffc2ebb8e4437aa4e49f24e3e2c5b89ce22d9bd8854522eb25 - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] Error: unable to get stats from ECS for container 75aba7c191d45c9121f0ae4a7835c6dc4e047e184adc54610c00e6d500bdb55f - json: cannot unmarshal string into Go value of type ecs.ContainerStats
[PROCESS] panic: runtime error: invalid memory address or nil pointer dereference
[PROCESS] [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x11035a6]
[PROCESS]
[PROCESS] goroutine 41 [running]:
[PROCESS] github.com/DataDog/datadog-process-agent/checks.fmtContainers(0xc4201c0960, 0x5, 0x6, 0xc4204c57a0, 0x5, 0x6, 0xbed5eba2df6405b7, 0x1576c8a0, 0x1fafbe0, 0x1, ...)
[PROCESS] /home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/checks/container.go:111 +0x356
[PROCESS] github.com/DataDog/datadog-process-agent/checks.(*ContainerCheck).Run(0x1faf660, 0xc420339380, 0x61278774, 0xc420339e00, 0xc42031df10, 0xc42008da40, 0xc420097980, 0xc42008da40)
[PROCESS] /home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/checks/container.go:63 +0x135
[PROCESS] main.(*Collector).runCheck(0xc42033ac80, 0x1541f80, 0x1faf660)
[PROCESS] /home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:78 +0xb4
[PROCESS] main.(*Collector).run.func2(0xc42033ac80, 0xc420096f00, 0x1541f80, 0x1faf660)
[PROCESS] /home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:144 +0x310
[PROCESS] created by main.(*Collector).run
[PROCESS] /home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/agent/collector.go:132 +0x3f1
[PROCESS] trace-agent exited with code 2, signal 0, restarting in 2 seconds

Unclear installation documentation

I'm trying to package Datadog v6, but I can't find documentation on how to get live process collection working.

Should this package be compiled and made available on $PATH so that datadog-agent can invoke it?

I do see

UTC | INFO | (serializer.go:263 in SendJSONToV1Intake) | Sent processes metadata payload, size: 1220 bytes.

without using this package, and no error about a missing datadog-process-agent.

No support for Alpine Linux

Hello,

I would like to use the process agent inside an Alpine Linux environment (inside a Docker container). This does not work at the moment.

golint

We need to run golint on this code and fix up the issues it reports.

Add simple log de-duping

If we hit the same error on every check run, we should only log it once in all cases.

We're doing this in a fairly hacky way right now by capturing the last error and comparing against it. This only applies to the Docker and k8s checks, so generalizing it would be very nice. fada8d8
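
As a rough sketch of what a generalized version could look like (this is not the agent's existing implementation; the dedupedLogger type is invented for the example):

package main

import (
	"errors"
	"log"
)

// dedupedLogger suppresses a repeated error message until it changes.
// It generalizes the "capture the last error and compare" approach
// described above into a reusable type.
type dedupedLogger struct {
	last string
}

func (d *dedupedLogger) Error(err error) {
	msg := err.Error()
	if msg == d.last {
		return // same error as the previous check run, already logged
	}
	d.last = msg
	log.Printf("ERROR: %s", msg)
}

func main() {
	d := &dedupedLogger{}
	for i := 0; i < 3; i++ {
		d.Error(errors.New("docker socket unreachable")) // logged only once
	}
	d.Error(errors.New("kubelet unreachable")) // new message, logged again
}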

Hardcoded ports 6062 and 6060 - they need to be configurable / documentation updated to reflect ports in use

Agent ports (for the profile server) are hardcoded and not mentioned anywhere in the documentation. This causes conflicts with other processes already listening on these ports.

  • Port 6062:

if opts.info {
	// using the debug port to get info to work
	url := "http://localhost:6062/debug/vars"
	if err := Info(os.Stdout, cfg, url); err != nil {
		os.Exit(1)
	}
	return
}

// Run a profile server.
go func() {
	http.ListenAndServe("localhost:6062", nil)
}()

  • Port 6060:

https://github.com/DataDog/datadog-log-agent/blob/e3fb0e64f1b5eb11887fd0953606884fadea9053/pkg/logagent/main.go#L54
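
For illustration, a minimal sketch of making the port configurable via an environment variable; DD_PROCESS_EXPVAR_PORT is a hypothetical name, not an option this repository documents.

package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Hypothetical: read the profile/expvar server port from an environment
	// variable instead of hardcoding 6062.
	port := os.Getenv("DD_PROCESS_EXPVAR_PORT")
	if port == "" {
		port = "6062" // fall back to the current default
	}

	go func() {
		// Same profile server as the snippet above, bound to the
		// configurable port, with the error surfaced instead of dropped.
		if err := http.ListenAndServe("localhost:"+port, nil); err != nil {
			fmt.Fprintf(os.Stderr, "profile server error: %v\n", err)
		}
	}()

	select {} // keep the process alive for the example
}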

panic during monitoring of processes

We have a list of processes that we monitor. I unfortunately do not know which process causes the following panic, but the stack trace should have enough hints to fix it:

datadog-agent version: 6.10.1
Running on Kubernetes as a DaemonSet.

Mar 16 11:38:23 dg1 k8s_datadog-agent_datadog-agent-:  [ TRACE ] trace-agent exited with code 0, disabling
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] panic: runtime error: invalid memory address or nil pointer dereference
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x121fdd3]
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS]
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] goroutine 173 [running]:
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] github.com/DataDog/datadog-process-agent/checks.fmtProcesses(0xc4204ce700, 0xc4207bb3e0, 0xc4204f0ae0, 0xc42020f180, 0x0, 0x4d, 0x15ae947, 0x9, 0x415a3ff170a3d70a, 0x416b01d26199999a, ...)
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] 	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/checks/process.go:215 +0x5d3
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] github.com/DataDog/datadog-process-agent/checks.(*ProcessCheck).Run(0x2276820, 0xc4204ce700, 0x32665ce4, 0x0, 0x0, 0x0, 0x0, 0x0)
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] 	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/checks/process.go:83 +0x275
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] main.(*Collector).runCheck(0xc4202ce140, 0x16fe8e0, 0x2276820)
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] 	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/cmd/agent/collector.go:85 +0x203
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] main.(*Collector).run.func2(0xc4202ce140, 0xc4200f65a0, 0x16fe8e0, 0x2276820)
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] 	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/cmd/agent/collector.go:151 +0x322
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] created by main.(*Collector).run
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] 	/home/jenkins/workspace/process-agent-build-ddagent/go/src/github.com/DataDog/datadog-process-agent/cmd/agent/collector.go:139 +0x3f4
Mar 17 01:01:25 dg4 k8s_datadog-agent_datadog-agent-:  [PROCESS] process-agent exited with code 2, signal 0, restarting in 2 seconds

`kube_service` tag missing from live container view

The kube_service tag was available in my live container view when using v5.22.0.

Now, on agent v5.22.1 and any newer release, the kube_service tag is missing.

I believe this PR may have caused this behavior: #103?

Do I need to change my Kubernetes configuration file to pull these tags with the newer agent version?

"Collected containers" log message

I'm getting this in my Datadog logs every 10 or 20 seconds:

INFO (container.go:95) - collected containers in 94.620068ms

It's flooding my logs. I think this should be a DEBUG-level log, as most people don't care about this at all.
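
For illustration, the demotion would be roughly a one-line change at the call site; this sketch uses the standard library's log/slog as a stand-in for the agent's own logger.

package main

import (
	"log/slog"
	"time"
)

func main() {
	start := time.Now()
	// ... collect containers ...

	// Emitting the per-check timing at Debug level hides it unless debug
	// logging is enabled (slog's default handler level is Info, so this
	// line is suppressed by default).
	slog.Debug("collected containers", "duration", time.Since(start))
}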
