
logswan's Introduction

                                _____
                            .xiX*****Xix.
                          .X7'        '4Xk,
                         dXl            'XX.        .
                        xXXl             XXl        .
                        4XXX             XX'
                       .  ,x            iX'   _,,xxii
                       |   ²|        ,iX7,xiiXXXXXXXl
                       |          .xi,xiXXXXXXXXXXXX:
                       .      ..iXXiXXXXXXXXXXXXXXX7.
                       .    .xXXXXXXXXXXXXXXX'XXXX7 .
                       |   ,XXXXXXXXXXXXXXXX'XXX7'  |
                       :  .XXXXX7*'"' 2XXX7'XX7'    |
  __/ \     _____    ____  \XX' _____  47'  ___  ___      _____     __
.\\_   \___/  _  \__/  _/_______\  _/______/  /  \  \____/  _  \___/  \  _____
. /     __    Y _ __   \__  _________  _____  \/\/   ____ _ _   ______ \/ __///
:/       /    |    \    |'   \/   \/    \/            \/    Y    \/   \    \  :
|\______/\_________/____|    /\____     /\_____/\_____/\____|____/\____\___/  |
+--------------------- \____/ --- \____/ ----:----------------------h7/dS!----+
                       .                     |      :
                       : .                   :      |
                       | .     Logswan       .      |
                       | :                       .  |
                       |_|_______________________|__|
                         |                       :
                                                 .

Logswan

Logswan is a fast Web log analyzer using probabilistic data structures. It is targeted at very large log files, typically API logs. It has constant memory usage regardless of the log file size, and takes approximately 4 MB of RAM.

Unique visitor counting is performed using two HyperLogLog counters (one for IPv4 and another one for IPv6), providing a relative accuracy of 0.10%. String representations of IP addresses are used and preferred, as they offer better precision.
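As an illustration of the counting scheme, here is a minimal HyperLogLog sketch in C. It is not Logswan's code: the hash (64-bit FNV-1a followed by the MurmurHash3 finalizer), the precision parameter `p`, and all `hll_*` names are assumptions made for this example. The standard error of HyperLogLog is about 1.04/√m for m registers, so a relative accuracy of 0.10% implies m ≈ (1.04/0.001)² ≈ 1.08 million, i.e. roughly 2^20 registers; at one byte per register, two such counters would need about 2 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical HyperLogLog counter with m = 2^p one-byte registers. */
struct hll {
	unsigned p;          /* precision: number of hash bits used as index */
	size_t m;            /* number of registers (2^p) */
	uint8_t *registers;  /* highest rank seen for each register */
};

/* 64-bit FNV-1a, followed by the MurmurHash3 64-bit finalizer to mix bits. */
static uint64_t
hll_hash(const void *data, size_t len)
{
	const unsigned char *b = data;
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= b[i];
		h *= 0x100000001b3ULL;
	}
	h ^= h >> 33;
	h *= 0xff51afd7ed558ccdULL;
	h ^= h >> 33;
	h *= 0xc4ceb9fe1a85ec53ULL;
	h ^= h >> 33;
	return h;
}

int
hll_init(struct hll *hll, unsigned p)
{
	assert(p >= 7 && p <= 20); /* alpha below assumes m >= 128 */
	hll->p = p;
	hll->m = (size_t)1 << p;
	hll->registers = calloc(hll->m, 1);
	return hll->registers != NULL ? 0 : -1;
}

void
hll_add_bytes(struct hll *hll, const void *data, size_t len)
{
	uint64_t h = hll_hash(data, len);
	size_t index = (size_t)(h >> (64 - hll->p)); /* top p bits */
	uint64_t w = h << hll->p;                    /* remaining 64 - p bits */
	unsigned rank = 1;

	/* rank = position of the leftmost 1-bit in w (64 - p + 1 if w is 0) */
	while (rank <= 64 - hll->p && !(w & (1ULL << 63))) {
		rank++;
		w <<= 1;
	}
	if (rank > hll->registers[index])
		hll->registers[index] = (uint8_t)rank;
}

/* Counts an address by its string form, e.g. "192.0.2.1" or "::1". */
void
hll_add(struct hll *hll, const char *ip)
{
	hll_add_bytes(hll, ip, strlen(ip));
}

/* Raw HyperLogLog estimate: alpha * m^2 / sum(2^-register). */
double
hll_count(const struct hll *hll)
{
	double m = (double)hll->m;
	double alpha = 0.7213 / (1.0 + 1.079 / m);
	double sum = 0.0;

	for (size_t i = 0; i < hll->m; i++)
		sum += 1.0 / (double)(1ULL << hll->registers[i]);
	return alpha * m * m / sum;
}

void
hll_free(struct hll *hll)
{
	free(hll->registers);
	hll->registers = NULL;
}
```

Feeding one counter with IPv4 strings and another with IPv6 strings (via `hll_add`) would mirror the split described above. The sketch only computes the raw estimator; the published algorithm adds corrections for small cardinalities (below about 2.5·m) and very large ones, which are omitted here.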

Project design goals include speed, memory-usage efficiency, and keeping the code as simple as possible.

Logswan is opinionated software:

  • It only supports the Common Log Format, in order to keep the parsing code simple. It can of course process the Combined Log Format as well (the referer and user agent fields are discarded).
  • It does not split results per day, but log files can be split before being processed.
  • Input file size and bandwidth usage are reported in bytes; there are no plans to format or round them.
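To make the format concrete, the sketch below splits one Common Log Format line (`host ident authuser [date] "request" status bytes`) into its fields in C. It is a hypothetical helper, not Logswan's parser: `struct clf_entry` and `parse_clf` are invented names, escaped quotes inside the request are not handled, and everything after the size field, such as the Combined Log Format's referer and user agent, is ignored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of one log line; the pointers point into the parsed buffer. */
struct clf_entry {
	const char *host;     /* e.g. "::1" */
	const char *date;     /* e.g. "11/Oct/2015:16:55:45 -0500" */
	const char *request;  /* e.g. "OPTIONS * HTTP/1.0" */
	int status;           /* e.g. 200 */
	uint64_t bytes;       /* response size; "-" is counted as 0 */
};

/*
 * Parses a Common Log Format line in place (delimiters are overwritten with
 * NUL bytes). Fields after the size, such as the referer and user agent of
 * the Combined Log Format, are ignored. Returns 0 on success, -1 otherwise.
 */
int
parse_clf(char *line, struct clf_entry *e)
{
	char *p, *end;

	/* host: up to the first space */
	if ((p = strchr(line, ' ')) == NULL)
		return -1;
	*p++ = '\0';
	e->host = line;

	/* skip ident and authuser, then take the bracketed date */
	if ((p = strchr(p, '[')) == NULL || (end = strchr(p, ']')) == NULL)
		return -1;
	*end = '\0';
	e->date = p + 1;

	/* quoted request line */
	if ((p = strchr(end + 1, '"')) == NULL ||
	    (end = strchr(p + 1, '"')) == NULL)
		return -1;
	*end = '\0';
	e->request = p + 1;

	/* status code (strtol skips the leading space) */
	long status = strtol(end + 1, &p, 10);
	if (p == end + 1 || status < 100 || status > 599)
		return -1;
	e->status = (int)status;

	/* response size: digits, or "-" when no body was sent */
	while (*p == ' ')
		p++;
	if (*p == '-') {
		e->bytes = 0;
	} else {
		char *q;
		unsigned long long size = strtoull(p, &q, 10);
		if (q == p)
			return -1;
		e->bytes = (uint64_t)size;
	}
	return 0;
}
```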

Logswan is written with security in mind and runs sandboxed on OpenBSD (using pledge). Experimental seccomp support is available for selected architectures and can be enabled by setting the ENABLE_SECCOMP variable to 1 when invoking CMake. It has also been extensively fuzzed using AFL and Honggfuzz.

Features

Currently implemented features:

  • Counting used bandwidth
  • Counting number of processed lines / invalid lines
  • Counting number of hits (IPv4 and IPv6 hits)
  • Counting visits (unique IP addresses for both IPv4 and IPv6)
  • GeoIP lookups (for both IPv4 and IPv6)
  • Hourly hits distribution
  • HTTP method distribution
  • HTTP protocol distribution
  • HTTP status codes distribution
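Most of these distributions amount to bucketing one field per line. As a hypothetical example (the names `clf_hour`, `struct distributions`, and `count_hit` are not from Logswan), the hour for the hourly distribution can be read at a fixed offset of the CLF timestamp, whose layout is `dd/Mon/yyyy:HH:MM:SS zone`, and status codes can be counted in an array indexed by code:

```c
#include <ctype.h>
#include <stdint.h>

/* Returns the hour (0-23) of a CLF date such as "11/Oct/2015:16:55:45 -0500",
 * or -1 if the string does not have the expected layout. */
int
clf_hour(const char *date)
{
	/* "dd/Mon/yyyy:" is 12 characters, so the hour starts at offset 12 */
	for (int i = 0; i < 12; i++)
		if (date[i] == '\0')
			return -1;
	if (date[11] != ':' || !isdigit((unsigned char)date[12]) ||
	    !isdigit((unsigned char)date[13]))
		return -1;

	int hour = (date[12] - '0') * 10 + (date[13] - '0');
	return hour < 24 ? hour : -1;
}

/* Hypothetical counters for two of the distributions listed above. */
struct distributions {
	uint64_t hours[24];   /* hits per hour of the day */
	uint64_t status[600]; /* hits per HTTP status code */
};

void
count_hit(struct distributions *d, const char *date, int status)
{
	int hour = clf_hour(date);

	if (hour >= 0)
		d->hours[hour]++;
	if (status >= 0 && status < 600)
		d->status[status]++;
}
```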

Dependencies

Logswan uses the CMake build system and requires Jansson and libmaxminddb libraries and header files.

Installing dependencies

  • OpenBSD: pkg_add -r cmake jansson libmaxminddb
  • NetBSD: pkgin in cmake jansson libmaxminddb
  • FreeBSD: pkg install cmake jansson libmaxminddb
  • macOS: brew install cmake jansson libmaxminddb
  • Alpine Linux: apk add cmake gcc make musl-dev jansson-dev libmaxminddb-dev
  • Debian / Ubuntu: apt-get install build-essential cmake libjansson-dev libmaxminddb-dev
  • Fedora: dnf install cmake gcc make jansson-devel libmaxminddb-devel

Building

mkdir build
cd build
cmake ..
make

Logswan has been successfully built and tested on OpenBSD, NetBSD, FreeBSD, macOS, and Linux with both Clang and GCC.

Packages

Logswan packages are available for:

GeoIP2 databases

Logswan looks for GeoIP2 databases in ${CMAKE_INSTALL_PREFIX}/share/dbip by default, which points to /usr/local/share/dbip.

A custom directory can be set using the GEOIP2DIR variable when invoking CMake:

cmake -DGEOIP2DIR=/var/db/dbip .

The free Creative Commons licensed DB-IP IP to Country Lite database can be downloaded here.

Alternatively, the GeoLite2 Country database from MaxMind can be downloaded free of charge here, but it requires accepting an EULA and is not freely licensed.

Usage

logswan [-ghv] [-d db] logfile

If logfile is a single dash (`-'), logswan reads from the standard input.

The options are as follows:

-d db	Specify path to a GeoIP database.
-g	Enable GeoIP lookups.
-h	Display usage.
-v	Display version.

Logswan outputs JSON data to stdout.

License

Logswan is released under the BSD 2-Clause license. See LICENSE file for details.

Author

Logswan is developed by Frederic Cambus.

Resources

Project homepage: https://www.logswan.org

GitHub: https://github.com/fcambus/logswan

logswan's People

Contributors

alexmyczko, fcambus, krytarowski, mulander

logswan's Issues

2 byte log file triggers null ptr deref, logswan segfaults

While fuzzing logswan with American Fuzzy Lop, I was able to trigger a null ptr deref and cause a segfault with logswan and a 2 byte log file.

The log file contains nothing more than :: on a single line.

==12377== Invalid read of size 1
==12377==    at 0x4C29514: __strrchr_sse42 (vg_replace_strmem.c:194)
==12377==    by 0x406CB0: parseRequest (parse.c:52)
==12377==    by 0x40255C: main (logswan.c:174)
==12377==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==12377== 
==12377== Process terminating with default action of signal 11 (SIGSEGV)
==12377==  Access not within mapped region at address 0x0
==12377==    at 0x4C29514: __strrchr_sse42 (vg_replace_strmem.c:194)
==12377==    by 0x406CB0: parseRequest (parse.c:52)
==12377==    by 0x40255C: main (logswan.c:174)
==12377==  If you believe this happened as a result of a stack
==12377==  overflow in your program's main thread (unlikely but
==12377==  possible), you can try to increase the size of the
==12377==  main thread stack using the --main-stacksize= flag.
==12377==  The main thread stack size used in this run was 8388608.
Segmentation fault

Program received signal SIGSEGV, Segmentation fault.
__strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:134
134 ../sysdeps/x86_64/multiarch/strrchr.S: No such file or directory.
(gdb) bt
#0  __strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:134
#1  0x0000000000406cb1 in parseRequest ()
#2  0x000000000040255d in main () at /home/geeknik/logswan/src/logswan.c:174
(gdb) i r
rax            0x0  0
rbx            0x0  0
rcx            0x0  0
rdx            0x60cec2 6344386
rsi            0x0  0
rdi            0x0  0
rbp            0x60b3d0 0x60b3d0
rsp            0x7fffffffe238   0x7fffffffe238
r8             0x0  0
r9             0xf47c   62588
r10            0x0  0
r11            0x7ffff74a3190   140737342222736
r12            0x61d870 6412400
r13            0x3  3
r14            0x0  0
r15            0x0  0
rip            0x7ffff74a3210   0x7ffff74a3210 <__strrchr_sse42+128>
eflags         0x10246  [ PF ZF IF RF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0

This is a bit of an issue, as the Apache2 logs on the VM I'm testing with are formatted like this:

::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
::1 - - [11/Oct/2015:16:55:45 -0500] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Debian) (internal dummy connection)"
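The backtrace shows `strrchr()` receiving a NULL pointer inside `parseRequest`, which suggests an intermediate result was used without being checked. A defensive way to split a request such as `OPTIONS * HTTP/1.0` into method, resource, and protocol is to test every pointer before using it. This is a hedged sketch with hypothetical names (`struct request`, `split_request`), not the fix that was applied to Logswan:

```c
#include <string.h>

struct request {
	const char *method;   /* e.g. "GET" */
	const char *resource; /* e.g. "/index.html" */
	const char *protocol; /* e.g. "HTTP/1.1" */
};

/* Splits "METHOD RESOURCE PROTOCOL" in place. Returns 0 on success and -1
 * when a field is missing, instead of passing NULL on to strrchr(). */
int
split_request(char *s, struct request *r)
{
	char *space;

	r->method = s;
	if ((space = strchr(s, ' ')) == NULL)
		return -1;
	*space = '\0';

	/* the protocol is the last space-separated token */
	char *rest = space + 1;
	if ((space = strrchr(rest, ' ')) == NULL)
		return -1;
	*space = '\0';
	r->resource = rest;
	r->protocol = space + 1;

	/* reject empty fields, e.g. from doubled spaces */
	if (*r->method == '\0' || *r->resource == '\0' || *r->protocol == '\0')
		return -1;
	return 0;
}
```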

Compilation errors on OpenBSD -current

$ make
Scanning dependencies of target logswan
[100%] Building C object CMakeFiles/logswan.dir/src/logswan.c.o
/home/mulander/lab/logswan/src/logswan.c: In function 'main':
/home/mulander/lab/logswan/src/logswan.c:66: error: 'AF_INET' undeclared (first use in this function)
/home/mulander/lab/logswan/src/logswan.c:66: error: (Each undeclared identifier is reported only once
/home/mulander/lab/logswan/src/logswan.c:66: error: for each function it appears in.)
/home/mulander/lab/logswan/src/logswan.c:66: error: invalid use of undefined type 'struct sockaddr_in'
/home/mulander/lab/logswan/src/logswan.c:70: error: 'AF_INET6' undeclared (first use in this function)
/home/mulander/lab/logswan/src/logswan.c:70: error: invalid use of undefined type 'struct sockaddr_in6'
*** Error 1 in . (CMakeFiles/logswan.dir/build.make:56 'CMakeFiles/logswan.dir/src/logswan.c.o': /usr/bin/cc    -o CMakeFiles/logswan.dir/sr...)
*** Error 1 in . (CMakeFiles/Makefile2:61 'CMakeFiles/logswan.dir/all')
*** Error 1 in /home/mulander/lab/logswan/build (Makefile:76 'all')
    0m0.25s real     0m0.06s user     0m0.09s system
$ 
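Per POSIX, `AF_INET` and `AF_INET6` are declared in `<sys/socket.h>`, and `struct sockaddr_in` and `struct sockaddr_in6` in `<netinet/in.h>`; the errors suggest these headers were only being pulled in indirectly on other systems. A minimal sketch that includes them explicitly and uses `inet_pton()` to tell IPv4 from IPv6 hosts (the function name `address_family` is hypothetical, and it works on `struct in_addr`/`struct in6_addr` rather than the `sockaddr` structures):

```c
#include <sys/types.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */
#include <netinet/in.h>   /* struct in_addr, struct in6_addr */
#include <arpa/inet.h>    /* inet_pton() */

/* Returns AF_INET or AF_INET6 for a valid address string, or -1 otherwise. */
int
address_family(const char *host)
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, host, &v4) == 1)
		return AF_INET;
	if (inet_pton(AF_INET6, host, &v6) == 1)
		return AF_INET6;
	return -1;
}
```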

Logswan lies when fed gzipped logs

Given a rotated, gzipped access log:

mulander@inferno:~/lab/logswan/build$ ./logswan access.log.0.gz 
-------------------------------------------------------------------------------
                      Logswan (c) by Frederic Cambus 2015                      
-------------------------------------------------------------------------------

Processing file : access.log.0.gz

Hits : 137
Bandwidth : 0
Log file size : 31713
Runtime : 0.000420

and given the same file, uncompressed:

mulander@inferno:~/lab/logswan/build$ ./logswan access.log
-------------------------------------------------------------------------------
                      Logswan (c) by Frederic Cambus 2015                      
-------------------------------------------------------------------------------

Processing file : access.log

Hits : 4672
Bandwidth : 1036874
Log file size : 554234
Runtime : 0.006893
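The counts differ because Logswan expects plain-text log lines, so compressed data yields meaningless numbers. With the current command line, one workaround is to decompress on the fly and read from standard input, e.g. `gzip -dc access.log.0.gz | logswan -`. Alternatively, compressed input can be detected up front: gzip files begin with the magic bytes 0x1f 0x8b (RFC 1952). The helpers below are a hypothetical sketch of that check, not part of Logswan:

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if buf starts with the gzip magic bytes 0x1f 0x8b (RFC 1952). */
int
has_gzip_magic(const unsigned char *buf, size_t len)
{
	return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* Checks the first two bytes of a seekable stream, then rewinds it so the
 * caller can parse it from the beginning. Not suitable for pipes or stdin. */
int
stream_is_gzip(FILE *fp)
{
	unsigned char magic[2];
	size_t n = fread(magic, 1, sizeof magic, fp);

	rewind(fp);
	return has_gzip_magic(magic, n);
}
```

For standard input, the same `has_gzip_magic()` check could instead be applied to the first buffer read by the line loop.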

brew install logswan

Hey @fcambus, neat library! Just wanted to let you know that it's now available on Homebrew, so Mac users can install it (and all its dependencies) with:

brew install logswan

Homebrew precompiles libraries, so installation is pretty much instant 🎉

Ref: Homebrew/homebrew-core#68306

Compilation warnings on Ubuntu 15.04

mulander@inferno:~/lab/logswan/build$ make
Scanning dependencies of target logswan
[100%] Building C object CMakeFiles/logswan.dir/src/logswan.c.o
/home/mulander/lab/logswan/src/logswan.c: In function ‘main’:
/home/mulander/lab/logswan/src/logswan.c:116:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]
  printf("Hits : %llu\n", hits);
  ^
/home/mulander/lab/logswan/src/logswan.c:117:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]
  printf("Bandwidth : %llu\n", bandwidth);
  ^
/home/mulander/lab/logswan/src/logswan.c:118:2: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘__off_t’ [-Wformat=]
  printf("Log file size : %llu\n", logFileSize.st_size);
  ^
Linking C executable logswan
[100%] Built target logswan
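The warnings arise because `uint64_t` is not necessarily `unsigned long long` (with glibc on 64-bit Linux it is `unsigned long`), and `off_t` is a signed type whose width varies between platforms. A portable approach is the `PRIu64` macro from `<inttypes.h>` for `uint64_t`, and a cast to `intmax_t` printed with `%jd` for `off_t`. The function `format_summary` is hypothetical; its labels mirror the old text output format:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Formats the counters with conversion specifiers that match their types:
 * PRIu64 for uint64_t, and %jd after casting off_t to intmax_t.
 * Returns the snprintf() result (the length the full output needs). */
int
format_summary(char *buf, size_t len, uint64_t hits, uint64_t bandwidth,
    off_t file_size)
{
	return snprintf(buf, len,
	    "Hits : %" PRIu64 "\n"
	    "Bandwidth : %" PRIu64 "\n"
	    "Log file size : %jd\n",
	    hits, bandwidth, (intmax_t)file_size);
}
```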

Responsible disclosure policy

Hey there!

I belong to an open source security research community, and a member (@geeknik) has found an issue, but doesn’t know the best way to disclose it.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

Logo art - who's h7/ds?

Great project, and I am quite taken with your ASCII art logo. I fondly remember ASCII art from my BBS and early Internet days. Who drew it?

Bandwidth always reported as 0

First off, thanks for the amazing tool! logswan is fantastic.

For some reason, bytes are always returned as 0 for me; I'm not 100% sure whether it's my log format or something else. Here's what I've got configured in Nginx:

log_format swanlog '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent $request_length $request_time "$http_referer" "$http_user_agent"';

And the output from a sample run:

{
   "date": "2018-10-14 21:24:41",
   "generator": "Logswan 2.0.2",
   "file_name": "test.log",
   "file_size": 1018,
   "processed_lines": 10,
   "invalid_lines": 0,
   "bandwidth": 0,
   "runtime": 0.010923858999999999,
<snip>

Here are the log lines I'm running it against (with some info redacted):

127.0.0.1 - - [13/Oct/2018:06:25:13 +1100] "GET /some_path HTTP/2.0" 200 498 21 0.001 "<redacted>" "<redacted>"
127.0.0.1 - - [13/Oct/2018:06:25:14 +1100] "GET /path_2 HTTP/1.0" 200 483 81 0.000 "<redacted>" "<redacted>"

My logs contain both IPv6 and IPv4 addresses, and they parse fine. Weirdly, HTTP/2.0 doesn't seem to be parsed yet either.

I'm running this on Ubuntu 16.04 with the latest release (Logswan 2.0.2).

logswan coredumps when provided a non existing file

mulander@inferno:~/lab/logswan/build$ ./logswan a
-------------------------------------------------------------------------------
                      Logswan (c) by Frederic Cambus 2015                      
-------------------------------------------------------------------------------

Processing file : a

Segmentation fault (core dumped)

From gdb:

(gdb) run a
Starting program: /home/mulander/lab/logswan/build/logswan a
-------------------------------------------------------------------------------
                      Logswan (c) by Frederic Cambus 2015                      
-------------------------------------------------------------------------------

Processing file : a


Program received signal SIGSEGV, Segmentation fault.
_IO_fgets (buf=0x602240 <lineBuffer> "", n=4096, fp=0x0) at iofgets.c:50
50      iofgets.c: No such file or directory.
(gdb) bt
#0  _IO_fgets (buf=0x602240 <lineBuffer> "", n=4096, fp=0x0) at iofgets.c:50
#1  0x0000000000400e1f in main ()
(gdb)
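`fp=0x0` in the backtrace means `fgets()` was handed the NULL stream returned by a failed `fopen()`. A hedged sketch of opening the input while honouring the `-` convention from the Usage section, and failing cleanly instead (`open_input` is a hypothetical name):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens the log file, or returns stdin when path is "-". On failure it
 * prints the reason to stderr and returns NULL, so the caller can exit
 * instead of handing a NULL stream to fgets(). */
FILE *
open_input(const char *path)
{
	if (strcmp(path, "-") == 0)
		return stdin;

	FILE *fp = fopen(path, "r");
	if (fp == NULL)
		fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
	return fp;
}
```

A caller would then exit with an error status when `open_input()` returns NULL, before the read loop starts.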

OpenBSD oddities

Hello 👋

Awesome tool

Using logswan installed via pkg_add on OpenBSD, I found it unable to parse my httpd access.log:

Processed 25 lines in 0.035031 seconds
{
   "date": "2019-02-10 14:14:07",
   "generator": "Logswan 2.0.2",
   "file_name": "-",
   "file_size": 0,
   "processed_lines": 25,
   "invalid_lines": 25,
   "bandwidth": 0,
...
  • My first thought was to change the log format to a compatible one, since the OpenBSD default looks incompatible.
  • So I changed to log style common, which should work. It didn't.
  • So I changed to log style combined; it still didn't work.

I glanced at the differences between the provided examples and my own logs and could find few differences. At this point I kind of gave up.

Then I found the following from the person who suggested logswan as an alternative to Google Analytics:

https://github.com/romanzolotarev/romanzolotarev.com/blob/master/bin/log#L10

It seems that, in order to analyse the logs, I need to pass them through the cut utility:

sudo cat /var/www/logs/access.log | cut -d" " -f2- | logswan -

I'm unsure if this should be added to a troubleshooting section, but it fixed both log style combined and log style common for httpd on OpenBSD 6.4.
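`cut -d" " -f2-` drops the first space-separated field of every line. The issue does not say what that field is; presumably (an assumption) OpenBSD httpd prepends the server name, which shifts the client address out of the first CLF position. The same step expressed in C, as a hypothetical helper:

```c
#include <string.h>

/* Equivalent of `cut -d" " -f2-` for one line: returns a pointer to the text
 * after the first space, or the whole line if it contains no space
 * (cut passes such lines through unchanged). */
const char *
skip_first_field(const char *line)
{
	const char *space = strchr(line, ' ');

	return space != NULL ? space + 1 : line;
}
```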
