Code Monkey home page Code Monkey logo

Comments (10)

vpshibin avatar vpshibin commented on June 2, 2024 1

I am facing the same issue. Upvoted it.

Meanwhile used this trick for my DCR sampling and it worked well. (May be too late for you, but would be helpful for someone)
//obtained the Millisec value from a timestamp I had (previously extracted from RawData and looks like "21:09:35.549+11:00" and will get 549 out this string). Just used the Millisec part as that's the most random one from the timestamp.
| extend TimestampMs=extract("\d+:\d+:\d+.(\d+)\+\d+:\d+", 1, TimeStamp)
// below will get the modulus of the Ms, which will be a number between 0 and 9
| extend TsRandom = toint(TimestampMs) % 10
| where TsRandom < 1 // only ones with modulo 0 are taken, 1-9 are dropped. . This will sample only 10% rows
//then you can add you other conditions like
or Properties.eventId <> "RT0008"

from bctech.

KennieNP avatar KennieNP commented on June 2, 2024

from bctech.

jwikman avatar jwikman commented on June 2, 2024

Ok, thanks for input.

I've created a support ticket for LogAnalytics. (req no. 2311100050001695, if anyone needs that)

from bctech.

jwikman avatar jwikman commented on June 2, 2024

Hi @KennieNP,

I just received an answer from MS support regarding using rand in DCR, where they say that the rand function is not supported for DCR:

I wanted to provide you with an update on this case, after checking internally on your query that, the 'rand' scalar function is not on the list of supported functions:
Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported. For example, summarize isn't supported since it summarizes multiple records. See Supported KQL features for a complete list of supported features.

I kindly request you to refer the documentation guide for reference, which explains an overview of the KQL limitations: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-collection-transformations-structure#kql-limitations

Did you successfully use the above sample in a DCR at some point in time?

from bctech.

KennieNP avatar KennieNP commented on June 2, 2024

from bctech.

jwikman avatar jwikman commented on June 2, 2024

Got the final word from the support request, they checked with the Product Manager and got back that "Rand function is not supported with the DCR"

Any other clues on how to do sampling on selected EventIds?

As is now, we need to skip some EventIds completely (RT0004 represented 50% of all our telemetry, and simply cost too much compared to what it gives)

from bctech.

KennieNP avatar KennieNP commented on June 2, 2024

from bctech.

jwikman avatar jwikman commented on June 2, 2024

I just created a suggestion in the Azure Monitor Community forum, please upvote if you think it makes sense! (and spread the word to others to upvote as well 😉)
You'll find it at https://feedback.azure.com/d365community/idea/eaaf14b5-b493-ee11-a81c-6045bd7fe045

I cannot understand the arguments, that I got from the support case, on why rand() is not supported: Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported. I can't see that the rand() function works on multiple records...

Maybe use a custom endpoint to do the filtering on RT0004?

Wouldn't that end up with double ingestion costs for the events that we forward? First for the Azure Function (all data) and then for the rest when received at Log Analytics?

from bctech.

jwikman avatar jwikman commented on June 2, 2024

Thanks @vpshibin, that was a very creative workaround. Thanks for sharing! 👍

from bctech.

jwikman avatar jwikman commented on June 2, 2024

This is what we use today as AppTraces transformation:

source
| extend TimestampHundredths=tolong(extract(@"\d+-\d+-\d+T\d+:\d+:\d+\.(\d{2})", 1, tostring(TimeGenerated)))
| where ((TimestampHundredths< 5) and (tostring(Properties.eventId) in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038")))
  or (tostring(Properties.eventId) !in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038"))
| project-away TimestampHundredths

This cut our costs to 1/5 of earlier.

We identified above events by first running this query to get the events that produces the most data (= highest cost):

traces 
| where timestamp > ago(10d)
| extend Size = estimate_data_size(*)
| summarize  AvgSize = avg(Size), Count=count() by eventId = tostring(customDimensions.eventId)
| extend SizeMb = round(AvgSize*Count / (1024*1024),1)
| project eventId, Count, SizeMb
| sort by SizeMb

We looked through top 30 of those events, kept all events for events that we felt crucial for troubleshooting (performance related, mainly, but also some other).
For simplicity we decided to only keep 5% of all the events we identified.

This was accomplished by using the fraction of the second of the timestamp, as suggested by @vpshibin.

from bctech.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.