newrelic / newrelic-diagnostics-cli
NrDiag is a command line diagnostics tool for New Relic Products that was created by and is maintained by New Relic Global Technical Support
The README.md says under "Installation" that "To install in a Docker container see here". This link doesn't take you to any instructions specific to Docker.
Are there any instructions out there for installing with Docker?
Customers will see the following nrdiag result summary when their newrelic.jar has been renamed to, say, newrelic-4.9.0.jar to indicate a specific version:
Failure - Java/Env/Process
None of the active Java processes included the -javaagent argument. For proper installation of New Relic Java agent, the -javaagent flag must be passed to the same Java process that is running your application
That is because our regex search explicitly looks for "newrelic.jar" and does not match the actual "newrelic-someversionnumber.jar" on the customer's system.
https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/java/env/process.go#L118
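One possible direction is a pattern that tolerates a version suffix in the jar name. The sketch below is illustrative only: `javaAgentArg` and `hasNewRelicJavaAgent` are hypothetical names, and the pattern is an assumption, not the expression currently used in process.go. It also assumes the jar path contains no spaces.

```go
package main

import (
	"fmt"
	"regexp"
)

// javaAgentArg is a hypothetical pattern (not the one in process.go).
// It accepts newrelic.jar as well as versioned names such as newrelic-4.9.0.jar.
// Assumption: the path after -javaagent: contains no whitespace.
var javaAgentArg = regexp.MustCompile(`-javaagent:\S*newrelic[\w.\-]*\.jar`)

// hasNewRelicJavaAgent reports whether a process command line passes the agent jar.
func hasNewRelicJavaAgent(cmdline string) bool {
	return javaAgentArg.MatchString(cmdline)
}

func main() {
	for _, c := range []string{
		"java -javaagent:/opt/newrelic/newrelic.jar -jar app.jar",
		"java -javaagent:/opt/newrelic/newrelic-4.9.0.jar -jar app.jar",
		"java -jar app.jar",
	} {
		fmt.Printf("%v\t%s\n", hasNewRelicJavaAgent(c), c)
	}
}
```

A looser pattern like this trades precision for coverage, so it may be worth pairing with a check that the matched file actually exists.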
For infra integration config files we prompt for each since they may contain sensitive information.
"FilesToCopy": [
{
"Path": "/var/db/newrelic-infra/newrelic-integrations/cassandra-definition.yml",
"Name": "./cassandra-definition.yml",
"StoredName": "cassandra-definition.yml",
"Streamed": false,
"Identifier": ""
}
],
"Payload": [
{
"FileName": "docker-config.yml",
"FilePath": "/etc/newrelic-infra/integrations.d/"
},
{
"FileName": "cassandra-definition.yml",
"FilePath": "/var/db/newrelic-infra/newrelic-integrations/"
},
{
"FileName": "cassandra-config.yml",
"FilePath": "/etc/newrelic-infra/integrations.d/"
}
]
The idea here is to reduce the need to determine if this is a collection bug or user intervention.
This is on Rocky Linux 8.7, kernel 4.18.0-425.3.1.el8.x86_64
During nrdiag run, I got this error:
Error - Infra/Env/ClockSkew
Diagnostics CLI was unable to complete this health check because we ran into an unexpected type assertion error.
I suspect that Rocky Linux 8 is not fully supported, since I got here in the first place due to issues with getting newrelic-infra working on this system. See the forum post here:
https://forum.newrelic.com/s/hubtopic/aAX8W0000015AUpWAM/no-logs-for-my-php-apm
and here:
https://forum.newrelic.com/s/hubtopic/aAX8W00000005wQWAQ/host-logs-not-pushed
nrdiag would run without error and tell me what (if anything) is wrong or misconfigured.
Warning - Java/Env/Version
Java not found in PATH
Warning - Infra/Log/LevelCheck
Infrastructure logging level not set to verbose (debug/trace). If troubleshooting an Infrastructure issue, please set log level to: debug in newrelic-infra.yml.
See https://docs.newrelic.com/docs/infrastructure/new-relic-infrastructure/troubleshooting/generate-logs-troubleshooting-infrastructure for more information.
Error - Infra/Env/ClockSkew
Diagnostics CLI was unable to complete this health check because we ran into an unexpected type assertion error.
Please notify this issue to us whenever possible through https://discuss.newrelic.com/ by creating a new topic or through https://github.com/newrelic/newrelic-diagnostics-cli/issues
OS: Rocky Linux 8.7
Kernel: 4.18.0-425.3.1.el8.x86_64
Packages installed:
newrelic-php5-common-10.9.0.324-1.noarch
newrelic-php5-10.9.0.324-1.x86_64
newrelic-infra-1.40.1-1.el8.x86_64
newrelic-daemon-10.9.0.324-1.x86_64
newrelic-repo-5-3.noarch
The real issue isn't just about getting nrdiag working in Rocky 8. I want to get all my logs and metrics from my server into newrelic. The APM for PHP that I set up is not working. See the forum links for more info about those issues:
here: https://forum.newrelic.com/s/hubtopic/aAX8W0000015AUpWAM/no-logs-for-my-php-apm
and here: https://forum.newrelic.com/s/hubtopic/aAX8W00000005wQWAQ/host-logs-not-pushed
With .NET agent version 10.0.0 and above, the names of the install packages for Linux are changing from newrelic-netcore20-agent to newrelic-dotnet-agent. The install path on disk is also changing from /usr/local/newrelic-netcore20-agent to /usr/local/newrelic-dotnet-agent. NrDiag needs to be updated to handle the new package name and install location.
Ideally, NrDiag can be made to handle both cases, since we will need to continue supporting older versions in the field for as long as New Relic's support policy for agents requires that we do so.
It looks like these are the places in the NrDiag codebase where the newrelic-netcore20-agent string appears:
Beyond that I'm not familiar enough with how NrDiag works to suggest how to solve this. I can be available to help work on the solution/test possible solutions if necessary.
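To illustrate what "handle both cases" could look like, here is a minimal sketch that probes the new install path first and falls back to the legacy one. The two paths come from this issue; `dotnetAgentDirs`, `firstExisting`, and `dirExists` are hypothetical names, not existing NrDiag helpers.

```go
package main

import (
	"fmt"
	"os"
)

// Install locations named in the issue: the new path first, then the legacy one.
var dotnetAgentDirs = []string{
	"/usr/local/newrelic-dotnet-agent",
	"/usr/local/newrelic-netcore20-agent",
}

// firstExisting returns the first candidate for which exists reports true.
// The exists function is injected so the lookup can be tested without touching disk.
func firstExisting(candidates []string, exists func(string) bool) (string, bool) {
	for _, c := range candidates {
		if exists(c) {
			return c, true
		}
	}
	return "", false
}

// dirExists reports whether path is an existing directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

func main() {
	if dir, ok := firstExisting(dotnetAgentDirs, dirExists); ok {
		fmt.Println("found .NET agent install at", dir)
	} else {
		fmt.Println("no .NET agent install directory found")
	}
}
```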
Base/Env/InitSystem
Summary
Unable to parse init system from: /bin/busybox
https://wiki.gentoo.org/wiki/OpenRC#busybox
In the code: https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/base/env/initSystem.go#L69
We look at env vars and config files for the agent's log locations. However, we are failing to pick up the location from the .NET agent's config file. This is what the configuration looks like:
<configuration xmlns="urn:newrelic-config" agentEnabled="true">
<service licenseKey="MY-LICENSE-KEY" />
<application>
<name>MY-APP-NAME (Prod)</name>
</application>
<log level="info" directory="I:\Logs" />
<transactionTracer enabled="true" transactionThreshold="apdex_f" stackTraceThreshold="500" recordSql="obfuscated" explainEnabled="false" explainThreshold="500" />
The solution to this problem is to identify the path/fields that we must look for in an XML config file and add them to the list of keysInConfigFile:
https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/base/log/logHelpers.go#L48
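For reference, the directory attribute can be read with Go's standard encoding/xml package. This is a standalone sketch, not how logHelpers.go resolves keys; `dotnetConfig` and `logDirectory` are hypothetical names, and the sample document is a shortened, closed version of the config shown above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// dotnetConfig models only the part of the .NET agent config needed here.
type dotnetConfig struct {
	XMLName xml.Name `xml:"configuration"`
	Log     struct {
		Level     string `xml:"level,attr"`
		Directory string `xml:"directory,attr"`
	} `xml:"log"`
}

// logDirectory returns the directory attribute of the <log> element, or "" if unset.
func logDirectory(data []byte) (string, error) {
	var cfg dotnetConfig
	if err := xml.Unmarshal(data, &cfg); err != nil {
		return "", err
	}
	return cfg.Log.Directory, nil
}

func main() {
	sample := []byte(`<configuration xmlns="urn:newrelic-config" agentEnabled="true">
  <log level="info" directory="I:\Logs" />
</configuration>`)
	dir, err := logDirectory(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("log directory:", dir)
}
```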
curl https://connection-test.newrelic.com
curl: (7) Failed to connect to connection-test.newrelic.com port 443 after 9 ms: Couldn't connect to server
The connection test fails during a diag check but the URL seems to be down when attempting to hit it from multiple locations and trying to open a TCP connection to port 443 on the server.
Run the diag tool and check connection test output or curl https://connection-test.newrelic.com
https://connection-test.newrelic.com Succeeds with a 200 OK
Failure - Base/Collector/ConnectTLS
There was an error connecting to connection-test.newrelic.com.
Please check network and proxy settings and try again or see -help for more options.
Error = Get "https://connection-test.newrelic.com/": dial tcp 162.247.242.43:443: connectex: No connection could be made because the target machine actively refused it.
See https://docs.newrelic.com/docs/new-relic-solutions/get-started/networks for more information.
Tested with a Windows and Linux based environment
Log collection for Python agent may be broken as config files take precedence over env vars.
Over here https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/base/log/logHelpers.go#L117-L129 we assume that we should prioritize the location of agent logs given by the env var values. That is because, for most agents, env vars overwrite whatever was configured in the config file. The Python agent is the exception.
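A per-agent precedence rule could be expressed roughly as follows. This is a sketch under the assumption stated in this issue (config file wins for Python, env vars win elsewhere); `logPath` and the "python" agent label are hypothetical and do not mirror the existing helper.

```go
package main

import "fmt"

// logPath picks the log location for an agent. fromEnv and fromConfig are the values
// found in environment variables and in the config file; either may be empty.
// The precedence rule follows the issue text, not verified agent behavior.
func logPath(agent, fromEnv, fromConfig string) string {
	first, second := fromEnv, fromConfig
	if agent == "python" {
		// Assumption from the issue: the Python agent prefers the config file.
		first, second = fromConfig, fromEnv
	}
	if first != "" {
		return first
	}
	return second
}

func main() {
	fmt.Println(logPath("java", "/env/agent.log", "/cfg/agent.log"))
	fmt.Println(logPath("python", "/env/agent.log", "/cfg/agent.log"))
}
```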
When redirecting output with -output-path, the json file and the filelist file are not added to the zip file.
Run the CLI with -output-path and review the zip.
Both files should be added to the zip.
Hi @KaliforniaShell, thanks for reporting this! Could you please provide some details on how to reproduce the error?
nrdiag -version to check) 'not sure'
Thank you!
Originally posted by @daffinito in #218 (comment)
The InfraIntegrationsMatch task gives a false positive: it reports that a customer is missing a definition file when they are not, because newer versions of the agent do not require one.
Standard: This is the format used by most on-host integrations. This configuration uses two files: a definition file and a configuration file. For more details, see Standard configuration.
Newer: Starting December 2019, infrastructure agent version 1.8.0 began supporting a new format used by some integrations. This format uses a single configuration file and provides other improvements. For more details, see Newer configuration.
Run ./nrdiag -suites infra in an environment that has the infra agent installed at a version newer than 1.8.0.
You'll get the error:
Found matching integration files with some errors: Configuration file 'C:\Program Files\New Relic\newrelic-infra\integrations.d\perfmon-config.yml' does not have matching Definition file Definition file 'C:\Program Files\New Relic\newrelic-infra\custom-integrations\nri-perfmon-definition.yml' does not have matching Configuration file
This task should not fail for customers that have a newer version of the infra agent. A possible fix for this bug would be to check which agent version they are using, and then decide based on that whether a missing definition file is actually a problem.
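The version gate suggested above might look like the sketch below. The 1.8.0 threshold comes from the documentation quoted in this issue; `parseVersion` and `supportsNewerFormat` are hypothetical names, and the code assumes a plain MAJOR.MINOR.PATCH version string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a MAJOR.MINOR.PATCH string into its numeric parts.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format: %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// supportsNewerFormat reports whether the agent version is at least 1.8.0,
// the release that introduced the single-file integration format.
func supportsNewerFormat(version string) bool {
	got, err := parseVersion(version)
	if err != nil {
		return false
	}
	min := [3]int{1, 8, 0}
	for i := range got {
		if got[i] != min[i] {
			return got[i] > min[i]
		}
	}
	return true
}

func main() {
	for _, v := range []string{"1.40.1", "1.8.0", "1.7.9"} {
		fmt.Println(v, supportsNewerFormat(v))
	}
}
```

Note that the version only says the newer format is possible. An integration may still use the standard two-file format on a recent agent, so the result would more safely downgrade the failure to a warning than suppress it.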
Reference:
This task for getting the version(s) of .NET Core/5+ installed on a system will fail if only the .NET runtime is installed, without the SDK being installed. dotnet --version only works if the SDK is installed. Having only the runtime installed is a common case for containerized (Docker/Kubernetes) deployments. The more generic command for getting .NET version info is dotnet --info. The output might require some more complicated parsing. Example output:
dotnet --info
.NET SDK (reflecting any global.json):
Version: 6.0.101
Commit: ef49f6213a
Runtime Environment:
OS Name: ubuntu
OS Version: 20.04
OS Platform: Linux
RID: ubuntu.20.04-x64
Base Path: /usr/share/dotnet/sdk/6.0.101/
Host (useful for support):
Version: 6.0.1
Commit: 3a25a7f1cc
.NET SDKs installed:
2.1.818 [/usr/share/dotnet/sdk]
3.1.416 [/usr/share/dotnet/sdk]
5.0.404 [/usr/share/dotnet/sdk]
6.0.101 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.All 2.1.30 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.30 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.22 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 5.0.13 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 6.0.1 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.30 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.22 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 5.0.13 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.1 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
To install additional .NET runtimes or SDKs:
https://aka.ms/dotnet-download
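The parsing may not need to be very elaborate. As a rough sketch, assuming runtime lines keep the "Name Version [path]" shape shown in the example output, the installed Microsoft.NETCore.App versions can be collected like this (`runtimeVersions` is a hypothetical helper, not part of NrDiag):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// runtimeVersions extracts the Microsoft.NETCore.App versions from `dotnet --info` output.
// It relies on runtime lines having the form: Microsoft.NETCore.App 6.0.1 [/path].
func runtimeVersions(info string) []string {
	var versions []string
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "Microsoft.NETCore.App" {
			versions = append(versions, fields[1])
		}
	}
	return versions
}

func main() {
	sample := `.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.1 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.22 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.1 [/usr/share/dotnet/shared/Microsoft.NETCore.App]`
	fmt.Println(runtimeVersions(sample))
}
```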
NR diagnostic CLI fails at the beginning of the run with a message:
goroutine 20 [running]:
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.parseJs({0x8bf800, 0xc00018f6c8})
/home/runner/work/newrelic-diagnostics-cli/newrelic-diagnostics-cli/tasks/base/config/validate.go:373 +0x1365
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.processConfig({{0xc00038d468?, 0x1?}, {0xc00002d5c0?, 0xc000692a00?}})
/home/runner/work/newrelic-diagnostics-cli/newrelic-diagnostics-cli/tasks/base/config/validate.go:187 +0x33d
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.BaseConfigValidate.Execute({}, {0x414025?}, 0xc000498000?)
/home/runner/work/newrelic-diagnostics-cli/newrelic-diagnostics-cli/tasks/base/config/validate.go:102 +0x249
main.processTasks({0xc00025ac00?}, {0x0, 0x0, 0x0?}, 0xc0001a51d0?)
/home/runner/work/newrelic-diagnostics-cli/newrelic-diagnostics-cli/processTasks.go:121 +0xa14
created by main.main
It likely happens at tasks/base/config/validate.go:373, which has the line
if string(strings.TrimSpace(keyMap[1])[0]) == "[" {
This assumes that when a key-value string is split on a colon, the second part is a non-empty string after trimming. That fails for input like
agent_enabled:
process.env.NODE_ENV !== 'test'
where the line break leaves nothing after the colon, which causes the out-of-bounds error above.
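A defensive version of that check could avoid indexing into a possibly empty string. The sketch below is standalone and only illustrates the guard; `valueStartsWithBracket` is a hypothetical name and not a drop-in patch for validate.go.

```go
package main

import (
	"fmt"
	"strings"
)

// valueStartsWithBracket reports whether the value part of a "key: value" line begins with "[".
// It returns false instead of panicking when there is no colon or nothing after it.
func valueStartsWithBracket(line string) bool {
	parts := strings.SplitN(line, ":", 2)
	if len(parts) < 2 {
		return false // no colon on this line
	}
	// HasPrefix is safe on an empty string, unlike indexing with [0].
	return strings.HasPrefix(strings.TrimSpace(parts[1]), "[")
}

func main() {
	for _, l := range []string{"labels: ['a', 'b']", "agent_enabled:", "process.env.NODE_ENV !== 'test'"} {
		fmt.Printf("%v\t%q\n", valueStartsWithBracket(l), l)
	}
}
```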
Base/Config/Validate is panicking on a newrelic.js file:
Executing following diagnostic task suites: Node Agent
Check Results
-------------------------------------------------
Info Base/Env/CollectEnvVars [Gathered Environment variables of current shell.]
Success Base/Config/Collect
panic: runtime error: index out of range [1] with length 1
goroutine 20 [running]:
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.formatJs({0x14000312980, 0x71})
/Users/daffinito/code/newrelic-diagnostics-cli/tasks/base/config/validate.go:319 +0x568
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.parseJs({0x1029bd1a0?, 0x140004e7600?})
/Users/daffinito/code/newrelic-diagnostics-cli/tasks/base/config/validate.go:364 +0x58
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.processConfig({{0x1400002635b?, 0x13?}, {0x14000026320?, 0x102bf4940?}})
/Users/daffinito/code/newrelic-diagnostics-cli/tasks/base/config/validate.go:187 +0x2c8
github.com/newrelic/newrelic-diagnostics-cli/tasks/base/config.BaseConfigValidate.Execute({}, {0x140003a1998?}, 0x102554578?)
/Users/daffinito/code/newrelic-diagnostics-cli/tasks/base/config/validate.go:102 +0x22c
main.processTasks({0x1400023ac90?}, {0x0, 0x0, 0x0?}, 0x140001993a0?)
/Users/daffinito/code/newrelic-diagnostics-cli/processTasks.go:121 +0x818
created by main.main
/Users/daffinito/code/newrelic-diagnostics-cli/core.go:128 +0x930
A lot of customers fail to pass the -javaagent flag in Docker. It would be great to recommend this doc to fix their issue if this task fails for them: https://discuss.newrelic.com/t/relic-solution-what-you-need-to-know-about-new-relic-when-deploying-with-docker/52492
https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/java/env/process.go#L98-L100
I am not seeing all traces that I expect in a Python application. I ran the diagnostics tool to see why. The only error that I see is
Failure - Python/Requirements/PythonVersion
Your 3.9.15 Python version is not in the list of supported versions by the Python Agent. Please review our documentation on version requirements
See https://docs.newrelic.com/docs/agents/python-agent/getting-started/compatibility-requirements-python-agent#basic for more information.
There are no problems with tracing Celery tasks in the same environment with the same Python version.
./nrdiag -a -s python -c /etc/newrelic.ini -output-path /tmp/nrdiag -v
Python (CPython/PyPy) versions supported: 2.7, 3.7, 3.8, 3.9, 3.10, and 3.11.
Recommendation: Use Python version 3.7 or higher with our agent.
I expect 3.9.15 to be supported.
The output says that 3.9.15 is not supported.
Failure - Python/Requirements/PythonVersion
Your 3.9.15 Python version is not in the list of supported versions by the Python Agent.
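One way to avoid this kind of false failure is to compare only the major.minor part of the detected version against the supported list. This is a sketch, not the task's current logic; the list is copied from the documentation quoted above, and `supportedPython` and `isSupportedPython` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedPython holds the major.minor versions listed in the quoted documentation.
var supportedPython = map[string]bool{
	"2.7": true, "3.7": true, "3.8": true, "3.9": true, "3.10": true, "3.11": true,
}

// isSupportedPython ignores the patch level, so 3.9.15 is matched as 3.9.
func isSupportedPython(version string) bool {
	parts := strings.Split(strings.TrimSpace(version), ".")
	if len(parts) < 2 {
		return false
	}
	return supportedPython[parts[0]+"."+parts[1]]
}

func main() {
	for _, v := range []string{"3.9.15", "3.10.2", "3.6.9"} {
		fmt.Println(v, isSupportedPython(v))
	}
}
```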
uname -m: x86_64
Attempting to run the binary on an AWS Graviton processor (ARM) reveals this -- any chance ARM support is coming soon? Thank you :)
bash: ./nrdiag_x64: cannot execute binary file
We need a better timestamp format: https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/scripts/build.sh#L31
This will make sorting nrdiag results/runs by time WAY more reliable downstream than using the formatted timestamp we currently provide
When the Infra/Config/ValidateJMX task runs in a Windows environment, it fails at this line:
echo "*:type=*,name=*" | nrjmx -hostname 127.0.0.1 -port 9999 --verbose true
https://github.com/newrelic/NrDiag/blob/main/tasks/infra/config/validateJMX.go#L184
Run ./nrdiag -t infra/config/ValidateJMX to only see this task in action and filter out other tasks.
Error connecting to local JMXServer:
exec: "echo": executable file not found in %PATH%
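On Windows, echo is a cmd.exe builtin rather than an executable, which is why the lookup fails. A portable alternative is to skip the echo pipe and write the query to the process's standard input directly. The sketch below assumes nrjmx reads its query from stdin, as the piped command above implies; `jmxQueryCommand` is a hypothetical helper and the flags are copied from that command.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// jmxQueryCommand builds the nrjmx invocation and supplies the query on stdin,
// so no shell or echo binary is needed on any platform.
func jmxQueryCommand(query, host, port string) *exec.Cmd {
	cmd := exec.Command("nrjmx", "-hostname", host, "-port", port, "--verbose", "true")
	cmd.Stdin = strings.NewReader(query + "\n")
	return cmd
}

func main() {
	cmd := jmxQueryCommand("*:type=*,name=*", "127.0.0.1", "9999")
	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Println("error running nrjmx:", err)
		return
	}
	fmt.Print(string(out))
}
```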
Occasionally our Base/Log/Copy task (which lives here: https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/base/log/copy.go) will collect an excessive amount of log files for the .NET agent. In this nrdiag result summary:
Summary
We found at least one New Relic log file with last modified date newer than 7 days ago: C:\ProgramData\New Relic\.NET Agent\Logs\NewRelic.Profiler.3872.log
We found at least one New Relic log file with last modified date older than 7 days ago: C:\ProgramData\New Relic\.NET Agent\Logs\NewRelic.Profiler.8720.log
We collected 77 log files for the last 7 days.
It seems several of these .NET profiler logs can be generated in a short amount of time depending on how often the monitored process occurs. A hard limit on the amount collected would reduce payload bloat, but it, unfortunately, runs the risk of excluding relevant logs. Support engineers in New Relic have to cross-reference PIDs in the agent logs to the profiler logs, and as a senior support engineer said there is no good way to separate the wheat from the chaff.
It seems then the best way to ensure what we are collecting is relevant is giving priority to more recent profiler log files. For example having a limit of 3 days instead of 7 for these particular logs. Even in the span of a couple days you could have several logs and run into json size issues. For example: (20) logs found x (3) payload instances x 158 bytes = ~9.5kb. I feel uncomfortable limiting collection of what could be useful data based on the 32kb zendesk limit, while that zendesk limit has its own solution in the s3 mmf.
That being said, if we want to collect these logs at no older than 3 days (instead of 7) we could investigate collecting profiler logs as a separate dotnet task, which may help reduce payload size for runs that are not focused on that agent (like infra). I think we should still exercise some max limit to prevent absurdities (338 profiler log files), but is also a limit that is NOT constrained by zendesk 32kb object limit, if we eventually will move to s3 storage.
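The combination of an age window and a hard cap described above can be sketched as a small filter. The 3-day window and the cap are placeholders for whatever values are agreed on; `logFile` and `recentProfilerLogs` are hypothetical names, not part of the Base/Log/Copy task.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// logFile is a minimal stand-in for a discovered profiler log.
type logFile struct {
	Path    string
	ModTime time.Time
}

// recentProfilerLogs keeps files modified within maxAge of now, newest first,
// and truncates the result to at most limit entries.
func recentProfilerLogs(files []logFile, now time.Time, maxAge time.Duration, limit int) []logFile {
	cutoff := now.Add(-maxAge)
	var kept []logFile
	for _, f := range files {
		if !f.ModTime.Before(cutoff) {
			kept = append(kept, f)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].ModTime.After(kept[j].ModTime) })
	if len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}

func main() {
	now := time.Now()
	files := []logFile{
		{"NewRelic.Profiler.3872.log", now.Add(-2 * time.Hour)},
		{"NewRelic.Profiler.8720.log", now.Add(-10 * 24 * time.Hour)},
		{"NewRelic.Profiler.1234.log", now.Add(-30 * time.Hour)},
	}
	for _, f := range recentProfilerLogs(files, now, 3*24*time.Hour, 2) {
		fmt.Println(f.Path)
	}
}
```

Sorting newest first before truncating means the cap drops the oldest logs, which matches the idea of giving priority to recent profiler files.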
Following the link on the "No data appears (Infrastructure)" page via the Diagnostics CLI page, I followed the installation instructions in the README, but nothing describes where the tool is or how to build it.
As given in the README:
find . -name 'nrdiag*'
doesn't provide any result.
Either an executable at the top level, in a bin directory, or in a directory named for my OS (linux).
Linux (Raspberry Pi, buster) and Mac (Catalina)
Improve the PHP agent version task to mention which version we found and, if there are multiple agents, to direct the customer to this URL: https://discuss.newrelic.com/t/relic-solution-php-agent-not-reporting-web-data/53291
That way the customer can understand and fix the issue.
The logic for this task lives here: https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/tasks/php/agent/version.go
When troubleshooting log forwarding from the infrastructure agent, it is helpful to look at the generated Fluent Bit configuration to debug issues with parsing the logging.yml configuration.
When running the infra suite, NRDiag should collect the most recently generated Fluent Bit configuration file from the system temp directory, and include it in the Infra/Config output.
Currently, we can use bash like this to identify the most recently generated Fluent Bit configuration file.
ls -lrt /tmp/nr_fb_config* | tail -1
In the temp directory, there may be many generated configuration files, however it doesn't seem worth it to include all the past generations. The most recent file should be the most relevant.
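In Go, the equivalent of that shell pipeline could be a glob plus a modification-time comparison. This is a minimal sketch: the nr_fb_config prefix is taken from the bash example above, the use of os.TempDir() for the system temp directory is an assumption, and `newestMatch` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// newestMatch returns the most recently modified regular file matching pattern.
func newestMatch(pattern string) (string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return "", err
	}
	var newest string
	var newestTime time.Time
	for _, m := range matches {
		info, err := os.Stat(m)
		if err != nil || info.IsDir() {
			continue // skip unreadable entries and directories
		}
		if newest == "" || info.ModTime().After(newestTime) {
			newest, newestTime = m, info.ModTime()
		}
	}
	if newest == "" {
		return "", fmt.Errorf("no files match %q", pattern)
	}
	return newest, nil
}

func main() {
	path, err := newestMatch(filepath.Join(os.TempDir(), "nr_fb_config*"))
	if err != nil {
		fmt.Println("no Fluent Bit config found:", err)
		return
	}
	fmt.Println("most recent Fluent Bit config:", path)
}
```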
If the proxy argument receives an unexpected value, such as a proxy hostname without a protocol (example: ./nrdiag -a MY-ATTACHMENT-KEY -p my-hostname.com:443, whereas this is what we expect: ./nrdiag -a MY-ATTACHMENT-KEY -p https://my-hostname.com:443), we currently only debug log the problem. From a user perspective, nrdiag just silently fails. We should have info level logging that reports the problem so the user can self-correct.
Where proxy flag gets defined:
https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/config/config.go#L34
Where we should add a warning that we expect a protocol for the proxy value:
https://github.com/newrelic/newrelic-diagnostics-cli/blob/main/processOptions.go#L19
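A validation step along these lines could produce the user-facing message. This is a sketch using the standard net/url package; `validateProxy` is a hypothetical function, and restricting the scheme to http or https is an assumption about which proxy protocols nrdiag accepts.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateProxy returns a user-facing error when the proxy value lacks a protocol or host.
// Assumption: only http and https proxies are accepted.
func validateProxy(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("proxy %q is not a valid URL: %v", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("proxy %q must include a protocol, for example https://%s", raw, raw)
	}
	if u.Host == "" {
		return fmt.Errorf("proxy %q is missing a hostname", raw)
	}
	return nil
}

func main() {
	for _, p := range []string{"my-hostname.com:443", "https://my-hostname.com:443"} {
		if err := validateProxy(p); err != nil {
			fmt.Println("warning:", err)
			continue
		}
		fmt.Println("proxy ok:", p)
	}
}
```

The explicit scheme check matters because url.Parse does not reject my-hostname.com:443. It treats the hostname as a scheme, so relying on the parse error alone would miss this case.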