aws_s3_log_ingestion_lambda's People

Contributors

danybmx, haihongren, jcsobrino, jsubirat, kolanos, luckslovez, matildasmeds, melissaklein24, nedl86, nr-dsharma, reclaim0982, sivakumar3695, sivakumarp127, vpat510, william-hill

aws_s3_log_ingestion_lambda's Issues

Cloudtrail logs are being dropped if the batch size is greater than MAX_BATCH_SIZE * BATCH_SIZE_FACTOR

It appears that CloudTrail batches can be much larger than MAX_BATCH_SIZE * BATCH_SIZE_FACTOR. When that happens, the entire batch is currently dropped. A screenshot of the error was attached. I can't upload the contents of the dropped batch because it comes from a customer.

Ideally, the batch size (the size of the log content) should be checked before sending it to the Log API. If it is larger than the limit, it should be split into smaller batches instead of being dropped altogether.
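A minimal sketch of the splitting idea, assuming log entries are dicts and that the limit is expressed in bytes of serialized JSON. The names split_into_batches and MAX_PAYLOAD_BYTES are hypothetical and are not taken from handler.py:

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical limit for illustration only; the real handler derives its
# limit from MAX_BATCH_SIZE * BATCH_SIZE_FACTOR.
MAX_PAYLOAD_BYTES = 1_000_000


def split_into_batches(
    entries: Iterable[dict], max_bytes: int = MAX_PAYLOAD_BYTES
) -> Iterator[List[dict]]:
    """Yield lists of entries whose serialized size stays under max_bytes."""
    batch: List[dict] = []
    batch_bytes = 0
    for entry in entries:
        entry_bytes = len(json.dumps(entry).encode("utf-8"))
        # Close the current batch before it would exceed the limit.
        if batch and batch_bytes + entry_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += entry_bytes
    if batch:
        yield batch
```

A single entry that is larger than max_bytes still ends up in a batch of its own in this sketch, so a real implementation would need a separate policy for that case (truncate, or report and skip).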

Enable monitoring of more than one S3 bucket

Hi everyone, I started using this lambda to get logs from a bucket and it works well. What if I want to monitor other buckets? Should I deploy one instance for each bucket? How about changing the code to accept a list of buckets to monitor?

I think this feature would let the lambda cover more scenarios and, as a consequence, be used more, since setup would involve less overhead than deploying one instance per bucket.
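One possible direction: S3 event notifications carry the bucket name in every record, so a handler can read the bucket from the event rather than from configuration. A minimal sketch, assuming the standard S3 notification event shape; objects_from_event is a hypothetical helper, not part of the current handler:

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Each additional bucket would still need its own notification pointing at the function, and the function's role would need read access to that bucket.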

Support parsing timestamp and add it to log event when log message is formatted as JSON.

I wonder if it's related to #26.

When a log file contains one JSON object per line and each object has a timestamp value, how about parsing that timestamp and using it as the log event timestamp?

{"timestamp": 1622622938, "key1": "value1", "key2": "value"}
{"timestamp": 1622622948, "key1": "valueA", "key2": "valueB"}

Currently, the handler doesn't parse anything, so the log event timestamp is the time when the logs are sent to New Relic Logs.

https://github.com/newrelic/aws_s3_log_ingestion_lambda/blob/master/src/handler.py#L112-L126
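A minimal sketch of what such parsing could look like, assuming each line is either a JSON object or plain text and that the timestamp is a numeric epoch value as in the example above. to_log_entry is a hypothetical helper and does not reflect the current handler code:

```python
import json


def to_log_entry(line: str) -> dict:
    """Build a log entry, copying a numeric JSON timestamp when present."""
    entry = {"message": line}
    try:
        parsed = json.loads(line)
    except ValueError:
        # Not JSON: keep the raw line and let the ingest time apply.
        return entry
    if isinstance(parsed, dict):
        ts = parsed.get("timestamp")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(ts, (int, float)) and not isinstance(ts, bool):
            entry["timestamp"] = ts
    return entry
```

Lines without a usable timestamp fall back to the current behavior, so the change would be backward compatible for plain-text logs.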

Our customer would like to exclude certain folders from being processed by the log ingest function.

When the CloudTrail integrity check is enabled, it delivers checksum files with the same .json.gz extension into a "CloudTrail-Digest" folder. The customer cannot use the Lambda prefix/suffix options in this case, because the folder structure is XXX/cloudtrail/AWSLogs//CloudTrail-Digest/XXX.json.gz for the digests and XXX/cloudtrail/AWSLogs//CloudTrail/XXX.json.gz for the logs. These CloudTrail logs come from multiple (100+) AWS accounts and are saved in a central location in one AWS account.
For a use case like this, the AWS S3 Lambda prefix and suffix options do not work because they don't support wildcards. Rather than hardcoding the exclusion of the CloudTrail-Digest folder in the function, the better option would be to add a feature that excludes folders based on a regex passed into the function as an environment variable.
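A minimal sketch of the regex-based exclusion described above. The environment variable name S3_EXCLUDE_PATTERN and both helper names are assumptions made for illustration:

```python
import os
import re
from typing import Mapping, Optional, Pattern


def load_exclude_pattern(
    environ: Mapping[str, str] = os.environ,
    name: str = "S3_EXCLUDE_PATTERN",  # hypothetical variable name
) -> Optional[Pattern[str]]:
    """Compile the exclusion regex, or return None when it is not set."""
    raw = environ.get(name, "")
    return re.compile(raw) if raw else None


def should_process(key: str, pattern: Optional[Pattern[str]]) -> bool:
    """Return False for object keys that match the exclusion regex."""
    return pattern is None or pattern.search(key) is None
```

With the variable set to /CloudTrail-Digest/, digest objects would be skipped while objects under /CloudTrail/ would still be processed.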

Ignore logs that have no `message` field defined

Hello New Relic devs! We have a use case, or really a strong requirement, where we want to ignore all logs being sent to New Relic that don't have a message field. In our case the field can be a string or a JSON struct itself, but if it's missing entirely we don't want to forward the log. Is there any way, config option, or anything else I can take advantage of to do this?
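A minimal sketch of such a filter, assuming each log line is a JSON object and that non-JSON lines should also be dropped (that second point is an assumption; the request does not say). has_message_field is a hypothetical helper:

```python
import json


def has_message_field(line: str) -> bool:
    """Return True only for JSON objects that define a message key."""
    try:
        parsed = json.loads(line)
    except ValueError:
        return False
    # The value may be a string or a nested structure; only presence matters.
    return isinstance(parsed, dict) and "message" in parsed
```

Applied before forwarding, this could be used as: lines = [l for l in lines if has_message_field(l)].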

Auto-generated S3 bucket Optional/Customizable, Allows user to specify Pre-existing S3 Bucket

Could the auto-generated S3 bucket be optional and configurable by name? Currently, the application always creates an S3 bucket named serverlessrepo-newrelic-s3-log-in-sourcelogbucket-XXXXXXXXXXXXX. If the user decides to use the S3 bucket created by the serverlessrepo application, it would be nice if the parameters accepted a value to customize the generated bucket's name. Similarly, if the user decides not to use the generated S3 bucket, creating it should be optional, and we should be able to specify a pre-existing S3 bucket via the serverlessrepo application template parameters for use with the serverlessrepo-NewRelic-s3-log-ingestion Lambda function.

Using request_creation_time as record timestamp, and additional custom parsing

  1. We would like to use request_creation_time as the timestamp when records are ingested into New Relic (instead of the default, which is the time of ingest). Is it possible to add custom parsing to the lambda to enable this?

I see there might be potential to parse the data here:
https://github.com/newrelic/aws_s3_log_ingestion_lambda/blob/master/src/handler.py#L112-L126

  2. The second question: are we able to parse and add additional fields? Our use case is that we would like to add filtering on the URI path (e.g. filter out some URI params, as in https://hello.com/user/1234). Such filtering may be difficult or impossible to do using glob patterns with the NR server-side parsing.

Lambda function is created with invalid environment variable name.

In the code, the environment variable S3_CLOUDTRAIL_LOG_PATTERN is read.

https://github.com/newrelic/aws_s3_log_ingestion_lambda/blob/master/src/handler.py#L107

However, if the function is deployed from the AWS Serverless Application Repository, the environment variable S3_CLOUD_TRAIL_LOG_PATTERN is defined instead
(a "_" is inserted between "CLOUD" and "TRAIL").

https://github.com/newrelic/aws_s3_log_ingestion_lambda/blob/master/template.yml#L91
https://github.com/newrelic/aws_s3_log_ingestion_lambda/blob/master/template.yml#L128

This may lead to misconfiguration by users. The code and the template should use the same environment variable name.
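Until the names are aligned, one possible stopgap is to accept both spellings. A minimal sketch, where get_cloudtrail_pattern is a hypothetical helper and the empty-string default is an assumption:

```python
import os
from typing import Mapping


def get_cloudtrail_pattern(environ: Mapping[str, str] = os.environ) -> str:
    """Read the CloudTrail pattern under either variable spelling."""
    # The name read by the code comes first, then the name from template.yml.
    for name in ("S3_CLOUDTRAIL_LOG_PATTERN", "S3_CLOUD_TRAIL_LOG_PATTERN"):
        value = environ.get(name)
        if value:
            return value
    return ""
```

Renaming the variable in either handler.py or template.yml so that both match would remove the need for this fallback.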

Decoding Error with S3 objects uploaded by fluent-bit

I just switched from fluentd to the aws-fluent-bit agent (https://docs.fluentbit.io/manual/pipeline/outputs/s3) and noticed that the New Relic ingestion Lambda broke with the following error when specifying content_type=application/gzip or content_type=application/x-gzip.

[ERROR] BadGzipFile: Not a gzipped file (b'{"')
Traceback (most recent call last):
  File "/var/task/handler.py", line 287, in lambda_handler

Then I also tried compression=gzip with the default content-type=binary/octet-stream. The New Relic lambda does not show any errors, but I was not able to see logs in New Relic, so I am not sure whether this combination is failing silently.
I will investigate this combination further and post back.
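The b'{"' in the error suggests the object body was plain JSON even though its content type claimed gzip. A minimal sketch of a more tolerant reader that checks the gzip magic bytes instead of trusting the content type; read_maybe_gzipped is a hypothetical helper and is not how the handler currently works:

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzipped(body: bytes) -> bytes:
    """Decompress the body only when it actually looks like gzip data."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body
```

This would avoid the BadGzipFile error for uncompressed uploads, though it does not explain the silent case with compression=gzip.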

Support CloudTrail logs

CloudTrail's JSON payload begins with an array of Records; the current lambda places all Records into a single NR Log entry.

The ask is to detect CloudTrail as a source and generate one NR Log entry for each CloudTrail Records entry.
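A minimal sketch of the per-record split, assuming the decompressed file is a JSON object with a top-level Records array. cloudtrail_entries is a hypothetical helper, and the entry shape (a dict with a message key) is an assumption:

```python
import json
from typing import List


def cloudtrail_entries(payload: str) -> List[dict]:
    """Turn one CloudTrail file into one log entry per record."""
    document = json.loads(payload)
    records = document.get("Records", []) if isinstance(document, dict) else []
    # Serialize each record separately so it becomes its own log entry.
    return [{"message": json.dumps(record)} for record in records]
```

A file with two records then produces two entries instead of one entry holding the whole array.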

NewRelic importing logs from S3 bucket/folder_name path

We are creating data partitions from an S3 bucket that contains log files for several components in a stack.
We’d like to write a query like this → entityName = ‘S3BucketName/folder_name’ so that we can segregate the data when the data partition is created, without having to apply filters afterwards.
Or at least, create an attribute (folder_name) for the path, so that the filtering is more efficient than searching through text.
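A minimal sketch of deriving such an attribute from the object key, assuming the folder is everything before the final path segment. The attribute names bucket_name and folder_name and the helper path_attributes are illustrative only:

```python
import posixpath


def path_attributes(bucket: str, key: str) -> dict:
    """Derive filterable attributes from an S3 bucket name and object key."""
    # S3 keys always use "/" as the separator, so posixpath is appropriate.
    return {
        "bucket_name": bucket,
        "folder_name": posixpath.dirname(key),
    }
```

Attached to each log entry, these attributes would allow partition rules to match on an exact value rather than searching the message text.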

[Repolinter] Open Source Policy Issues

Repolinter Report

🤖This issue was automatically generated by repolinter-action, developed by the Open Source and Developer Advocacy team at New Relic. This issue will be automatically updated or closed when changes are pushed. If you have any problems with this tool, please feel free to open a GitHub issue or give us a ping in #help-opensource.

This Repolinter run generated the following results:

❗ Error: 0, ❌ Fail: 1, ⚠️ Warn: 0, ✅ Pass: 6, Ignored: 0, Total: 7

Fail

readme-contains-link-to-security-policy

Doesn't contain a link to the security policy for this repository (README.md). New Relic recommends putting a link to the open source security policy for your project (https://github.com/newrelic/<repo-name>/security/policy or ../../security/policy) in the README. For an example of this, please see the "a note about vulnerabilities" section of the Open By Default repository. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

Passed

license-file-exists

Found file (LICENSE). New Relic requires that all open source projects have an associated license contained within the project. This license must be permissive (e.g. non-viral or copyleft), and we recommend Apache 2.0 for most use cases. For more information please visit https://docs.google.com/document/d/1vML4aY_czsY0URu2yiP3xLAKYufNrKsc7o4kjuegpDw/edit.

readme-file-exists

Found file (README.md). New Relic requires a README file in all projects. This README should give a general overview of the project, and should point to additional resources (security, contributing, etc.) where developers and users can learn further. For more information please visit https://github.com/newrelic/open-by-default.

readme-starts-with-community-plus-header

The first 5 lines contain all of the requested patterns. (README.md). The README of a community plus project should have a community plus header at the start of the README. If you already have a community plus header and this rule is failing, your header may be out of date, and you should update your header with the suggested one below. For more information please visit https://opensource.newrelic.com/oss-category/.

readme-contains-discuss-topic

Contains a link to the appropriate discuss.newrelic.com topic (README.md). New Relic recommends linking directly to your appropriate discuss.newrelic.com topic in the README, giving developers an alternate method of getting support. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

code-of-conduct-should-not-exist-here

New Relic has moved the CODE_OF_CONDUCT file to a centralized location where it is referenced automatically by every repository in the New Relic organization. Because of this change, any other CODE_OF_CONDUCT file in a repository is now redundant and should be removed. Note that you will need to adjust any links to the local CODE_OF_CONDUCT file in your documentation to point to the central file (README and CONTRIBUTING will probably have links that need updating). For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view. Did not find a file matching the specified patterns. All files passed this test.

third-party-notices-file-exists

Found file (THIRD_PARTY_NOTICES.md). A THIRD_PARTY_NOTICES.md file can be present in your repository to grant attribution to all dependencies being used by this project. This document is necessary if you are using third-party source code in your project, with the exception of code referenced outside the project's compiled/bundled binary (ex. some Java projects require modules to be pre-installed in the classpath, outside the project binary and therefore outside the scope of the THIRD_PARTY_NOTICES). Please review your project's dependencies and create a THIRD_PARTY_NOTICES.md file if necessary. For JavaScript projects, you can generate this file using the oss-cli. For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view.

Function Description Missing

Could the created AWS Lambda Function have a function description of:
Send log data from a S3 bucket to New Relic Logging.

This should be a very easy lift; currently the created function has no function description.

Thanks!

[Repolinter] Open Source Policy Issues

Repolinter Report

🤖This issue was automatically generated by repolinter-action, developed by the Open Source and Developer Advocacy team at New Relic. This issue will be automatically updated or closed when changes are pushed. If you have any problems with this tool, please feel free to open a GitHub issue or give us a ping in #help-opensource.

This Repolinter run generated the following results:

❗ Error: 0, ❌ Fail: 3, ⚠️ Warn: 0, ✅ Pass: 4, Ignored: 0, Total: 7

Fail

readme-starts-with-community-plus-header

The README of a community plus project should have a community plus header at the start of the README. If you already have a community plus header and this rule is failing, your header may be out of date, and you should update your header with the suggested one below. For more information please visit https://opensource.newrelic.com/oss-category/. Below is a list of files or patterns that failed:

  • README.md: The first 5 lines do not contain the pattern(s): Open source Community Plus header (see https://opensource.newrelic.com/oss-category).
    • 🔨 Suggested Fix: prepend the latest code snippet found at https://github.com/newrelic/opensource-website/wiki/Open-Source-Category-Snippets#code-snippet-2 to file

readme-contains-link-to-security-policy

Doesn't contain a link to the security policy for this repository (README.md). New Relic recommends putting a link to the open source security policy for your project (https://github.com/newrelic/<repo-name>/security/policy or ../../security/policy) in the README. For an example of this, please see the "a note about vulnerabilities" section of the Open By Default repository. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

readme-contains-forum-topic

Doesn't contain a link to the appropriate forum.newrelic.com topic (README.md). New Relic recommends linking directly to your appropriate forum.newrelic.com topic in the README, giving developers an alternate method of getting support. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

Passed

license-file-exists

Found file (LICENSE). New Relic requires that all open source projects have an associated license contained within the project. This license must be permissive (e.g. non-viral or copyleft), and we recommend Apache 2.0 for most use cases. For more information please visit https://docs.google.com/document/d/1vML4aY_czsY0URu2yiP3xLAKYufNrKsc7o4kjuegpDw/edit.

readme-file-exists

Found file (README.md). New Relic requires a README file in all projects. This README should give a general overview of the project, and should point to additional resources (security, contributing, etc.) where developers and users can learn further. For more information please visit https://github.com/newrelic/open-by-default.

code-of-conduct-should-not-exist-here

New Relic has moved the CODE_OF_CONDUCT file to a centralized location where it is referenced automatically by every repository in the New Relic organization. Because of this change, any other CODE_OF_CONDUCT file in a repository is now redundant and should be removed. Note that you will need to adjust any links to the local CODE_OF_CONDUCT file in your documentation to point to the central file (README and CONTRIBUTING will probably have links that need updating). For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view. Did not find a file matching the specified patterns. All files passed this test.

third-party-notices-file-exists

Found file (THIRD_PARTY_NOTICES.md). A THIRD_PARTY_NOTICES.md file can be present in your repository to grant attribution to all dependencies being used by this project. This document is necessary if you are using third-party source code in your project, with the exception of code referenced outside the project's compiled/bundled binary (ex. some Java projects require modules to be pre-installed in the classpath, outside the project binary and therefore outside the scope of the THIRD_PARTY_NOTICES). Please review your project's dependencies and create a THIRD_PARTY_NOTICES.md file if necessary. For JavaScript projects, you can generate this file using the oss-cli. For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view.
