
azurefunctionforsplunkvs's Introduction

NOTE

Due to a lack of resources to maintain it and changes in underlying dependencies, this repo has been archived. It can still be forked.

AzureFunctionForSplunkVS

Azure Function sends Azure Monitor telemetry to Splunk - coded in C# / Visual Studio 2017.

Deploy to Azure

Azure Function For Splunk

Azure Function code that sends telemetry from Azure resources to a Splunk Enterprise or Splunk Cloud instance.

It consumes Metrics, Diagnostic Logs, and the Activity Log using the techniques defined by Azure Monitor, which provides highly granular and near-real-time monitoring data for Azure resources, and forwards the events selected by the user's configuration to Splunk.

The Azure Monitor documentation is a good place to learn more.

Important Security Note

The HEC endpoint for a Splunk instance is SSL encrypted. This function CAN ignore the validity of the certificate: to do so, omit the App Setting 'splunkCertThumbprint' or leave it blank. To ENABLE certificate validation, set that setting to the thumbprint of the certificate. If you provide the cert thumbprint, splunkAddress must start with https://; if you do not provide it, splunkAddress must start with http://.

Solution Overview

At a high level, the Azure Functions approach to delivering telemetry to Splunk does this:

  • Azure resources deliver telemetry to event hubs
  • Azure Function is triggered by these messages
  • Azure Function delivers the messages to Splunk

Azure Functions are arranged hierarchically: a Function App contains individual functions, and each individual function is triggered by a single event hub. Regarding logs from Azure Monitor, each log category CAN BE sent to its own hub, and each Azure Resource Provider that emits logs may emit more than one log category. Similarly, metrics are sent to a hub as configured by the user. Hence, there MAY BE many hubs for the Function App to watch over. BUT, you can configure all diagnostic logs to go to the same hub. This practice is recommended for simplicity's sake.

Adding additional hubs

If you choose to let Azure services with diagnostic logs create their default hubs, you will need to create additional functions (beyond the list just below), one per additional hub, because each function is triggered by exactly one hub.

Do this by copying EhDiagnosticLogsExt.cs and naming the copy according to the new event hub. For example, if you wanted to use the default hub for Workflow Runtime messages, whose default hub name is 'insights-logs-workflowruntime', you could name your new function 'EhWorkflowRuntimeExt'. This is a copy of the code:

DiagnosticLogFunction

Change the first box to 'EhWorkflowRuntimeExt', the second to the same, and the third to something like "%input-hub-name-workflow-runtime%". Then, in the settings create a new one like the following:

DiagnosticLogSettings

Make the setting key match what you put in the 3rd box above ("%input-hub-name-workflow-runtime%") and set the value to the name of your new hub (e.g. insights-logs-workflowruntime). Rebuild and deploy the function app. This is as simple as forking the code, adding and customizing the new function, pushing and merging your changes to your fork, then using the "Deploy to Azure" button in this README (above), customized to point at your fork.

Functions in the Function App

  • EhActivityLogsExt - consumes Azure Monitor Activity Logs
  • EhDiagnosticLogsExt - consumes Azure Monitor Diagnostic Logs
  • EhLadTelemetryExt - consumes telemetry from Azure Linux VMs
  • EhMetricsExt - consumes Azure Monitor Metrics
  • EhWadTelemetryExt - consumes telemetry from Azure Windows VMs
  • FaultProcessor - consumes queue messages from faulted transmissions

The Activity Log transmits to a hub named 'Insights-Operational-Logs'. This will be configurable at some point, but for now, the function should be configured to listen to that hub.

The solution leverages the capacity of an Azure Function to be triggered by arrival of messages to an Event Hub. The messages are aggregated by the Azure Functions back end so they arrive at the function already in a batch where the size of the batch depends on current message volume and settings. The batch is examined, the properties of each event are augmented, and then the events are sent via the selected output binding to the Splunk instance.
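The batch-then-forward step described above can be sketched as follows. This is a hedged Python illustration, not the repo's C# implementation: the URL and token below are placeholders, and the helper names are invented for this sketch. What makes batching cheap is that HEC accepts multiple {"event": ...} envelopes concatenated in a single request body.

```python
import json
import urllib.request

# Placeholders for illustration only -- the real function reads these
# from App Settings (splunkAddress, splunkToken).
SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
SPLUNK_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_payload(events):
    """Wrap each augmented event in the HEC envelope and concatenate.

    HEC accepts several JSON objects back-to-back in one request body,
    one {"event": ...} envelope per event, so a whole batch goes out
    in a single HTTP call.
    """
    return "".join(json.dumps({"event": e}) for e in events)

def build_hec_request(events):
    """Build (but do not send) the HTTP request for one batch."""
    return urllib.request.Request(
        SPLUNK_HEC_URL,
        data=build_hec_payload(events).encode("utf-8"),
        headers={"Authorization": f"Splunk {SPLUNK_TOKEN}"},
        method="POST",
    )

if __name__ == "__main__":
    batch = [{"resourceId": "/subscriptions/xyz", "level": "Information"}]
    req = build_hec_request(batch)
    print(req.full_url)
```

Sending the request (urllib.request.urlopen(req)) is omitted so the sketch stays side-effect free.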

Cloud-based Splunk using HTTP Event Collector Output Binding

AzureFunctionPlusHEC The image shows only a Splunk VM, but the solution targets Splunk Cloud as well. The Splunk VM may run Splunk Enterprise or a Forwarder.

Installation and Configuration

Installation and Configuration tasks for the overall solution fall into a few buckets:

  • Diagnostics profiles
  • Event hubs
  • Splunk instance
  • Azure Function

Diagnostics Profiles

Each resource to be monitored must have a diagnostics profile created for it. This can be done in the portal, but more likely you'll want to write a script to configure existing resources and update your solution templates to create these profiles upon creation of the resource. Here's a place to start:

Automatically enable Diagnostic Settings at resource creation using a Resource Manager template

Diagnostic Profiles for VMs

Each VM to be monitored by the function app requires configuration artifacts:

  • Diagnostic Extension designed for the OS
  • public configuration file - tells the extension which metrics and logs you want to emit from the VM
  • private configuration file - contains credentials for the targets of the VM's telemetry

For Linux VMs, guidance on installing the extension and guidance on designing the configuration files is in this document
Use Linux Diagnostic Extension to monitor metrics and logs
For Windows VMs, that same guidance is here:
Use PowerShell to enable Azure Diagnostics in a virtual machine running Windows

The private configuration file for a Linux VM will require a SAS token. A sample of C# code that generates a suitable SAS token is here. The sasURL in the protected config file will look something like this:

https://namespace.servicebus.windows.net/insights-telemetry-lad?sr=https%3a%2f%2fnamespace.servicebus.windows.net%2finsights-telemetry-lad&sig=wsVxC%2f%2bm7vRhOZjm%2fJMEWTX%2by0sOil6z%2bFqwoWrstkQ%3d&se=1562845709&skn=RootManageSharedAccessKey  
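A sasURL of that shape can be generated in any language; the repo's sample is C#, but the algorithm is the standard Event Hubs one (HMAC-SHA256 over the URL-encoded resource URI plus the expiry, base64-encoded). The sketch below is a Python equivalent with placeholder namespace, policy, and key values:

```python
import base64
import hashlib
import hmac
import urllib.parse

def make_sas_url(resource_uri, policy_name, policy_key, expiry_epoch):
    """Return the resource URI with an Event Hubs SAS token appended,
    in the form the LAD protected settings expect (sr, sig, se, skn
    as query parameters, no 'SharedAccessSignature' prefix)."""
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # Sign "<url-encoded-uri>\n<expiry>" with the policy key.
    string_to_sign = f"{encoded_uri}\n{expiry_epoch}".encode("utf-8")
    signature = base64.b64encode(
        hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    ).decode("utf-8")
    token = (
        f"sr={encoded_uri}"
        f"&sig={urllib.parse.quote_plus(signature)}"
        f"&se={expiry_epoch}"
        f"&skn={policy_name}"
    )
    return f"{resource_uri}?{token}"
```

Note that Python's quote_plus emits uppercase percent-escapes (%3A) where the C# sample emits lowercase (%3a); Event Hubs accepts both.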

Event hubs

As mentioned, logs and metrics are sent through event hubs. Event hubs are created automatically by the Azure resource providers that need to write the information, so at the outset all you need to do is create the Event Hub Namespace. Here's how to do this in the portal:

Create an Event Hubs namespace and an event hub using the Azure portal

You will need to provide credentials to the Azure Function so it can read the hubs. At one end of the security spectrum you could provide the RootManageSharedAccessKey to all functions for all hubs within the namespace. At the other end of the spectrum (following the principle of least privilege) you can create a policy for each hub with Listen access and provide that credential on a function-by-function basis.

An example of copying the connection string (NOT just the key) associated with the RootManageSharedAccessKey policy is given at the bottom of this page.

To create a least permissions policy:

  • Select the hub from the list of hubs on the event hub namespace blade
  • Select "Shared access policies"
  • Click "+ Add"
  • Give it a name, select "Listen", click "Create" button.
  • Once it's created, re-enter the properties for that new policy and copy the connection string (NOT just the key).
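The copied connection string has a fixed key=value;... shape, which is why the full string (not just the key) matters: it also carries the endpoint and policy name. This hypothetical helper (not part of the function app) shows the pieces, which is handy when sanity-checking the hubConnection setting:

```python
def parse_eventhub_connection_string(conn_str):
    """Split an Event Hubs connection string into its named parts.

    Splitting each segment on the FIRST '=' only is deliberate:
    SharedAccessKey values are base64 and may end in '='.
    """
    return dict(
        kv.split("=", 1) for kv in conn_str.rstrip(";").split(";") if kv
    )

# Placeholder values for illustration:
cs = ("Endpoint=sb://mynamespace.servicebus.windows.net/;"
      "SharedAccessKeyName=listen-only;"
      "SharedAccessKey=abc123=;"
      "EntityPath=insights-diagnostic-logs")
```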

Splunk Instance

Using HEC output binding

Configuration of the Splunk instance amounts to opening the HEC endpoint and creating/copying the authentication token. The endpoint address and token value must be entered as settings into the Function App.

Instructions for opening the endpoint and creating/copying the token are on this Splunk webpage:

HTTP Event Collector walkthrough

Azure Function

There are several ways to create an Azure Function and load your code into it. Here's one such example:

Create your first function using the Azure CLI

This technique requires that your code be referenceable in a GitHub repo, which is exactly what we need.

Because the repo needs to contain settings specific to your installation, I recommend you fork this repo and make your changes there. Then provide the address of your fork in the example above to populate your function app.

Note that the actual settings are not in the code. These are provided by you in the portal.

If you want to automate the creation of your Azure Function (recommended), there is a solution template that accomplishes this located here:

Azure Function Deployment ARM template.

Use the SplunkVS branch in the link. It's configured specifically for this function.

Or, just click the "Deploy to Azure" button at the top of this page.

Once the Function App exists, check and correct application settings. The settings are created automatically if you use the ARM template (the 'Deploy to Azure' button.)

You will then need to create an Azure Storage Queue in the storage account. Its name is case-sensitive: "transmission-faults". On the "Function app settings" page, switch "Function app edit mode" to Read/Write and then disable the FaultProcessor function on the Functions page.

Using HEC output binding

  • hubConnection - connection string for the hub namespace
  • input-hub-name-activity-logs - should be set to 'insights-operational-logs'
  • input-hub-name-diagnostics-logs - set to name of hub for diagnostic logs, e.g. insights-diagnostic-logs (or your other custom choice)
  • input-hub-name-metrics - 'insights-metrics-pt1m'
  • outputBinding - HEC
  • splunkAddress - e.g. https://YOURVM.SOMEREGION.cloudapp.azure.com:8088/services/collector/event
  • splunkToken - e.g. 5F1B2C8F-YOUR-GUID-HERE-CE29A659E7D1
  • splunkCertThumbprint - leave blank to ignore cert validity. This is a good choice if you haven't yet installed your own cert on Splunk Enterprise. Set it to the thumbprint of your cert once it's installed in Splunk Enterprise.
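The splunkCertThumbprint behavior can be illustrated with a sketch. A certificate "thumbprint" is conventionally the SHA-1 hash of the certificate's DER bytes, compared case-insensitively and without separators; the blank-means-skip rule mirrors the setting described above. This is an illustration of the idea, not the repo's C# implementation:

```python
import hashlib

def thumbprint_matches(der_cert_bytes, configured_thumbprint):
    """Approximate the splunkCertThumbprint check.

    Blank setting -> certificate validity is ignored entirely.
    Otherwise the SHA-1 of the presented cert's DER bytes must
    equal the configured thumbprint (separators/case ignored).
    """
    if not configured_thumbprint:
        return True  # blank setting: skip validation
    actual = hashlib.sha1(der_cert_bytes).hexdigest()
    expected = configured_thumbprint.replace(":", "").replace(" ", "").lower()
    return actual == expected
```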

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

azurefunctionforsplunkvs's People

Contributors

brettroquemore, microsoftopensource, msftgits, sebastus


azurefunctionforsplunkvs's Issues

Support Separate Consumer Group

I'm trying to consume diagnostic logs from an event hub that is being consumed somewhere else. I know a good way to consume into multiple destinations is with consumer groups. This stackoverflow mentions there's a ConsumerGroup property in EventHubTrigger, but that's not present in the code in this repo. Is there another way to consume from a different consumer group in a given event hub, or would this have to be a feature request?

Debug Mode

Hi,
Thanks for sharing these functions!

Is there a way to enable a debug mode and obtain more detailed error logs? Some of my 'custom' functions are failing and I am not able to see the issue in the console.

Example:
2018-08-22T23:50:10 Welcome, you are now connected to log-streaming service.
2018-08-22T23:51:10 No new trace in the past 1 min(s).
2018-08-22T23:52:10 No new trace in the past 2 min(s).
2018-08-22T23:52:57.870 [Info] Function started (Id=47be3062-4997-49c1-8277-5449b549a927)
2018-08-22T23:52:57.917 [Error] Function completed (Failure, Id=47be3062-4997-49c1-8277-5449b549a927, Duration=51ms)

Thanks,

Questions

Hi, thanks for the repo.

I've ported the code to .NET Core and Functions v2. Any interest in me committing it back? I've only done limited testing and I'm not using your deployment method (using Terraform instead), so I can't guarantee all is perfect, but it seems sound enough. Could probably PR the changes over if you want them?

Also do you have any advice on forwarding diagnostic logs from app service and azure functions to splunk? I can't find a native way to ship logs to eventhub so was looking at using serilog and their event hub appender but advice would be welcome.

Bunch of failures when functions are processing messages from event hub to splunk

Hello Greg,

We have deployed this code to send all the events from Event Hubs to Splunk. It is working great and sending most of the data to Splunk, but we see a large error count on the function runs that process different categories of events/messages. Here is a sample log and the current success/error counts to give an idea of what percentage is failing. Did anything change that is causing these errors, and do we need to make any changes to the function code to reach 100% success?

Also, will the fault processor queue automatically retry sending messages, and if so, is there a set number of retries before it gives up? Can you please check these sample failure messages for each category and give some direction on how to fix them?

EhActivityLogsExt : Success-57715 ErrorCount-30601

Error emitting messages to output binding: The specified key 'operationName' does not exist in the ExpandoObject.. The messages were held in the fault processor queue for handling once the error is resolved.
The specified key 'operationName' does not exist in the ExpandoObject.
Executed 'EhActivityLogsExt' (Failed, Id=xxxxxxxxxxxxxxxx, Duration=84ms

EhDiagnosticLogsExt : Success-18697 ErrorCount-6542

Error emitting messages to output binding: Unable to extract resourceid or resourceId from the message.. The messages were held in the fault processor queue for handling once the error is resolved.
Unable to extract resourceid or resourceId from the message.
Executed 'EhDiagnosticLogsExt' (Failed, Id=xxxxxxxxxxxxxxxx, Duration=92ms)

EhMetricsExt : Success-2131 ErrorCount-1157

Error emitting messages to output binding: Unable to extract resourceid or resourceId from the message.. The messages were held in the fault processor queue for handling once the error is resolved.
Unable to extract resourceid or resourceId from the message.
Executed 'EhMetricsExt' (Failed, Id=xxxxxxxxxxxxxxxxxxxxxxx, Duration=127ms)

FaultProcessor : Success-7 ErrorCount-186658

Executing 'FaultProcessor' (Reason='New queue message detected on 'transmission-faults'.', Id=xxxxxxxxxxxxxxx)
Trigger Details: MessageId: xxxxxxxxxxxxxxxxxxxxxxxx, DequeueCount: 5, InsertionTime: 2020-10-21T14:48:17.000+00:00
Json: null
Fault Data: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Blob Reader: transmission-faults
Blob Attribute: Microsoft.WindowsAzure.Storage.Blob.BlobProperties
Object reference not set to an instance of an object.
FaultProcessor failed to transmit: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
FaultProcessor failed to transmit
Executed 'FaultProcessor' (Failed, Id=xxxxxxxxxxxxxxxxxxxxxxxxx, Duration=86ms)
FaultProcessor failed to transmit

Parsing issues

Starting this week I have been getting the following error for 99% of the events in the eventhub.
After parsing a value an unexpected character was encountered: {. Path 'records[0].properties', line 1, position 3757.
Obviously the records[0] and the position change.

I tried to pull from the eventhub and parse in python and notice 3 issues with the events coming from the eventhub. I'm not sure if this is related to the parsing issue I am seeing in your function app, but the positions are in similar spots.

I have to remove the following as it makes the json invalid:
}{"correlationId":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX","pageNumber":0,"isEnd":true}}]}
I also have to remove quotation marks in the following:
'}}"}},{', '}}}},{'
{"requestbody":"{"Id":

Is there something that needs to be updated within the function app, or do you think I'm running in to a separate issue?

Add Function Version to make this work

To make this solution work in Azure, kindly change the Function Version on the template. I believe the authors forgot to change the Function Version since this cannot be deployed as V1

Line 232 of the generated ARM template
{
"name": "FUNCTIONS_EXTENSION_VERSION",
"value": "~1"
},

changed to
{
"name": "FUNCTIONS_EXTENSION_VERSION",
"value": "~2"
},

Time field needs to be forced to appear at beginning of event

Splunk's default MAX_TIMESTAMP_LOOKAHEAD is 128 bytes. Unless the event's time field appears within that window, Splunk will take the HEC receive time as the event's time, which may not be desirable in some cases.
The time field must be forced to appear at the beginning of all event/source types.

Undeclared variables

@sebastus
Now I got this error:
Microsoft.Azure.WebJobs.Host: Error indexing method 'FaultProcessor'. Microsoft.Azure.WebJobs.Host: '%output-hub-name-proxy%' does not resolve to a value.

After defining this parameter in the application settings, I get this error:

The following 6 functions are in error:
EhActivityLogsExt: Microsoft.Azure.WebJobs.Host: Error indexing method 'EhActivityLogsExt'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.
EhDiagnosticLogsExt: Microsoft.Azure.WebJobs.Host: Error indexing method 'EhDiagnosticLogsExt'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.
EhLadTelemetryExt: Microsoft.Azure.WebJobs.Host: Error indexing method 'EhLadTelemetryExt'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.
EhMetricsExt: Microsoft.Azure.WebJobs.Host: Error indexing method 'EhMetricsExt'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.
EhWadTelemetryExt: Microsoft.Azure.WebJobs.Host: Error indexing method 'EhWadTelemetryExt'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.
FaultProcessor: Microsoft.Azure.WebJobs.Host: Error indexing method 'FaultProcessor'. Microsoft.Azure.WebJobs.Host: Unable to resolve the value for property 'EventHubAttribute.Connection'. Make sure the setting exists and has a valid value.

Originally posted by @Mr-iX in #20 (comment)

Certificate validation issues

I always get the error

The remote certificate is invalid according to the validation procedure.

I left the thumbprint empty, but even when specifying the correct thumbprint it does not work.

Does anyone have an idea why this doesn't work?
