dpetzold / aws-log-parser
Parse AWS CloudFront and LoadBalancer logs to Python3 Data Classes
License: Apache License 2.0
Hi, I don't know if this behaviour is intended or not, but by asking for parsed.path we lose the information in parsed.netloc. Either I used your library the wrong way, or I am missing something. In any case, the function read_url was not able to find my files, even though I gave it the whole path.
aws-log-parser/aws_log_parser/interface.py
Line 152 in a5c19e2
Thank you for the package.
Simon
Hi!
me again, but this time I need some help.
To finish parsing these logs, I need to convert them to JSON before importing them into Elasticsearch, but I am having some problems with the dictionary. My question is: can I serialize the entries into JSON format?
I've tried the code below (and several variants, with bad results), but I cannot serialize the fields "type", "client", "target" and "http_request" (I don't know how to serialize these values).
This is my code:
import datetime
import json

# To serialize into JSON format, handle the datetime values first
def jsonserial_log(obj):
    if isinstance(obj, datetime.datetime):
        return obj.__str__()

# Loop over each line and dump it as JSON
# (generator_line comes from log_parser(...))
for line in generator_line:
    print(json.dumps(line.__dict__, default=jsonserial_log))
And this is the result:
{
"type": null,
"timestamp": "2020-09-30 23:55:09.334103+00:00",
"elb": "app/example-elb/27ccf9b868b7c350",
"client": null,
"target": null,
"request_processing_time": 0,
"target_processing_time": 0.113,
"response_processing_time": 0,
"elb_status_code": 200,
"target_status_code": 200,
"received_bytes": 214,
"sent_bytes": 5993,
"http_request": null,
"user_agent": "user-agent/0.19.0",
"ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256",
"ssl_protocol": "TLSv1.2",
"target_group_arn": "arn:aws:elasticloadbalancing:eu-west-1:12345678901:targetgroup/example/4581e368a0eb93e5",
"trace_id": "Root=1-5f751add-23f0c3cd71a6118b5901e335",
"domain_name": "api.example.com",
"chosen_cert_arn": "session-reused",
"matched_rule_priority": 100,
"request_creation_time": "2020-09-30 23:55:09.221000+00:00",
"actions_executed": [
"forward"
],
"redirect_url": null,
"error_reason": null
}
As a further comment, the fields "redirect_url" and "error_reason" are "None" in the log file, so I think that part of the result is okay.
I'll be waiting for your answer. Every help is welcome.
Thanks again, and best regards!
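One way to handle the null fields above, as a sketch: dataclasses.asdict recurses into nested dataclasses, so the custom default only needs to cover non-dataclass types such as datetimes and enums. The HttpType, Host, and Entry classes below are hypothetical stand-ins for the parser's real models, not aws-log-parser's own API.

```python
import dataclasses
import datetime
import enum
import json

class HttpType(enum.Enum):
    H2 = "h2"

@dataclasses.dataclass
class Host:
    ip: str
    port: int

@dataclasses.dataclass
class Entry:
    type: HttpType
    client: Host
    timestamp: datetime.datetime

def jsonserial_log(obj):
    # Only non-dataclass leaf types reach this default function
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, enum.Enum):
        return obj.value
    raise TypeError(f"Not serializable: {type(obj)}")

entry = Entry(HttpType.H2, Host("10.0.0.1", 443),
              datetime.datetime(2020, 9, 30, 23, 55, 9))

# asdict() converts nested dataclasses to plain dicts, so "client"
# no longer serializes to null:
print(json.dumps(dataclasses.asdict(entry), default=jsonserial_log))
```

Compared with line.__dict__, which leaves nested dataclass and enum fields as opaque objects, dataclasses.asdict flattens everything json.dumps can handle natively.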
Hi,
The parser only works with application load balancers (ElbV2) and fails on classic ELB logs.
Classic LB log entries are different from application/network log entries:
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/access-log-collection.html#access-log-entry-format
vs
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format
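For reference, a sketch of what a classic-ELB entry model could look like if support were added. The field names follow the AWS classic access-log documentation linked above; this class is hypothetical and not part of aws-log-parser.

```python
import dataclasses
import datetime

@dataclasses.dataclass
class ClassicLoadBalancerLogEntry:
    # Fields in the order they appear in a classic ELB access-log line
    timestamp: datetime.datetime
    elb: str
    client: str                        # "ip:port"
    backend: str                       # "ip:port"
    request_processing_time: float
    backend_processing_time: float
    response_processing_time: float
    elb_status_code: int
    backend_status_code: int
    received_bytes: int
    sent_bytes: int
    request: str                       # "METHOD URL HTTP/x.y"
    user_agent: str
    ssl_cipher: str
    ssl_protocol: str
```

The classic format has no type, target_group_arn, trace_id, or actions_executed fields, which is why the application-LB model cannot parse it.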
whois.radb.net
Needs better docs
I might want to download and open an ALB log file from S3 outside of aws-log-parser to circumvent memory issues with very large log files. Is there a way to use aws-log-parser to parse a single log line?
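Since log_parser appears to iterate over whatever it is given, passing it an iterable that yields a single line, e.g. log_parser([line], LogType.LoadBalancer), should work; that is an assumption from reading this thread, not a documented API. For the memory issue itself, a minimal sketch of streaming lines lazily instead of reading the whole object, using an in-memory stand-in for the S3 body (a real botocore StreamingBody exposes iter_lines() if plain iteration does not yield lines):

```python
import io

def stream_lines(body, encoding="utf-8"):
    """Lazily decode lines from a binary stream, so the whole
    log file is never held in memory at once."""
    for raw in body:
        line = raw.decode(encoding).rstrip("\r\n")
        if line:
            yield line

# In-memory stand-in for obj['Body'] from s3.get_object():
body = io.BytesIO(b"first log line\nsecond log line\n")
lines = list(stream_lines(body))
```

Each yielded line could then be wrapped in a one-element list and handed to the parser, keeping peak memory proportional to one line rather than the whole file.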
Hello, I'm having the following error after installing from pip:
ModuleNotFoundError: No module named 'aws_log_parser.aws'
Exploring the files, it seems that the .aws subfolder is not created inside aws_log_parser.
I tried also to downgrade to the 1.x versions, but I still have the same problem.
What am I doing wrong?
Thank you in advance
Hi,
I think there is an error in the module. If you have a log file with more than one line and loop over it in a for, this error appears:
[ERROR] UnknownHttpType: h
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 65, in lambda_handler
    print(next(generator_line))
  File "/opt/python/lib/python3.8/site-packages/aws_log_parser/parser.py", line 87, in log_parser
    yield log_type.model(*[
  File "/opt/python/lib/python3.8/site-packages/aws_log_parser/parser.py", line 88, in <listcomp>
    to_python(value, field) for value, field in zip(row, fields)
  File "/opt/python/lib/python3.8/site-packages/aws_log_parser/parser.py", line 73, in to_python
    return to_http_type(value)
  File "/opt/python/lib/python3.8/site-packages/aws_log_parser/parser.py", line 25, in to_http_type
    raise UnknownHttpType(value)
END RequestId: 2506ce6c-92b8-483a-9748-ad9df3db45ad
My code:
import boto3
from aws_log_parser import log_parser, LogType

# Build the S3 client
s3 = boto3.client('s3')

# Handle the S3 event notification
for record in event['Records']:
    # Save the bucket name and key from the event
    bucket = record['s3']['bucket']['name']
    keyfile = record['s3']['object']['key']
    # Get the new log, read its content and split it into lines
    obj = s3.get_object(Bucket=bucket, Key=keyfile)
    content = obj['Body'].read().decode('utf-8')
    lines = content.splitlines()
    # Loop over each line and parse it
    for line in lines:
        print(line)
        generator_line = log_parser(line, LogType.LoadBalancer)
        print(next(generator_line))
In my case, I have an ELB that saves its log files to an S3 bucket. I can access each log file and print each line correctly inside the for loop.
If you change the line generator_line = log_parser(line, LogType.LoadBalancer) to generator_line = log_parser(lines, LogType.LoadBalancer), the error doesn't appear, but only the first line of the log file is parsed. I attach an example log file for testing.
I'll be waiting for your answer.
Thanks and regards.
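A likely explanation for the behaviour above, assuming from the zip(row, fields) call in the traceback that the parser feeds its input to csv.reader: csv.reader expects an iterable of lines, so a bare string is iterated character by character, and the first parsed "row" becomes just "h" from "https", producing UnknownHttpType: h. A minimal demonstration with a made-up log fragment:

```python
import csv

line = 'https 2020-09-30T23:55:09.334103Z app/example-elb 10.0.0.1:443'

# Passing the string directly: csv.reader iterates it character by
# character, so the first parsed "row" is just ["h"].
bad_row = next(csv.reader(line, delimiter=" "))

# Wrapping the line in a list yields whole fields as intended.
good_row = next(csv.reader([line], delimiter=" "))
```

This also explains why log_parser(lines, ...) works but next() only ever shows the first entry: the fix on the caller's side is to pass the whole lines list once and iterate the resulting generator, or wrap each single line as [line].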
Hey!
Thanks for releasing this.
Would you mind making a new PyPI release with the updated code?
Currently the PyPI release is still at 1.6.
It would be nice to pull this dependency directly from PyPI instead of via git.