
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.

Home Page: https://goaccess.io

License: MIT License


Introduction


What is it?

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly. More info at: https://goaccess.io.

[Screenshots: GoAccess terminal dashboard and GoAccess HTML dashboard]

Features

GoAccess parses the specified web log file and outputs the data to the X terminal. Features include:

  • Completely Real Time
    All panels and metrics are timed to be updated every 200 ms on the terminal output and every second on the HTML output.

  • Minimal Configuration needed
    You can just run it against your access log file, pick the log format and let GoAccess parse the access log and show you the stats.

  • Track Application Response Time
    Track the time taken to serve the request. Extremely useful if you want to track pages that are slowing down your site.

  • Nearly All Web Log Formats
    GoAccess allows any custom log format string. Predefined options include Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, and more.

  • Incremental Log Processing
    Need data persistence? GoAccess has the ability to process logs incrementally through the on-disk persistence options.

  • Only one dependency
    GoAccess is written in C. To run it, you only need ncurses as a dependency. That's it. It even features its own WebSocket server: http://gwsocket.io/.

  • Visitors
    Determine the number of hits, visitors, bandwidth, and metrics for the slowest-running requests by the hour or date.

  • Metrics per Virtual Host
    Have multiple Virtual Hosts (Server Blocks)? It features a panel that displays which virtual host is consuming most of the web server resources.

  • ASN (Autonomous System Number mapping)
    Great for detecting malicious traffic patterns and blocking them accordingly.

  • Color Scheme Customizable
    Tailor GoAccess to suit your own color taste/schemes. Either through the terminal, or by simply applying the stylesheet on the HTML output.

  • Support for Large Datasets
    GoAccess features the ability to parse large logs due to its optimized in-memory hash tables. It has very good memory usage and pretty good performance. This storage has support for on-disk persistence as well.

  • Docker Support
    Ability to build GoAccess' Docker image from upstream. You can still fully configure it by using volume mapping and editing goaccess.conf. See the Docker section below.

Nearly all web log formats...

GoAccess allows any custom log format string. Predefined options include, but are not limited to:

  • Amazon CloudFront (Download Distribution).
  • Amazon Simple Storage Service (S3)
  • AWS Elastic Load Balancing
  • Combined Log Format (XLF/ELF) Apache | Nginx
  • Common Log Format (CLF) Apache
  • Google Cloud Storage.
  • Apache virtual hosts
  • Squid Native Format.
  • W3C format (IIS).
  • Caddy's JSON Structured format.
  • Traefik's CLF flavor

Why GoAccess?

GoAccess was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser (great if you want to do a quick analysis of your access log via SSH, or if you simply love working in the terminal).

While the terminal output is the default output, it has the capability to generate a complete, self-contained, real-time HTML report, as well as a JSON, and CSV report.

You can think of it more as a monitor-like command tool than anything else.

Installation

Build from release

GoAccess can be compiled and used on *nix systems.

Download, extract and compile GoAccess with:

$ wget https://tar.goaccess.io/goaccess-1.9.2.tar.gz
$ tar -xzvf goaccess-1.9.2.tar.gz
$ cd goaccess-1.9.2/
$ ./configure --enable-utf8 --enable-geoip=mmdb
$ make
# make install

Build from GitHub (Development)

$ git clone https://github.com/allinurl/goaccess.git
$ cd goaccess
$ autoreconf -fiv
$ ./configure --enable-utf8 --enable-geoip=mmdb
$ make
# make install

Build in isolated container

You can also build the binary for Debian based systems in an isolated container environment to prevent cluttering your local system with the development libraries:

$ curl -L "https://github.com/allinurl/goaccess/archive/refs/heads/master.tar.gz" | tar -xz && cd goaccess-master
$ docker build -t goaccess/build.debian-12 -f Dockerfile.debian-12 .
$ docker run -i --rm -v $PWD:/goaccess goaccess/build.debian-12 > goaccess

Distributions

It is easiest to install GoAccess on GNU+Linux using the preferred package manager of your GNU+Linux distribution. Please note that not all distributions will have the latest version of GoAccess available.

Debian/Ubuntu

# apt-get install goaccess

Note: It is likely this will install an outdated version of GoAccess. To make sure that you're running the latest stable version of GoAccess see alternative option below.

Official GoAccess Debian & Ubuntu repository

$ wget -O - https://deb.goaccess.io/gnugpg.key | gpg --dearmor \
    | sudo tee /usr/share/keyrings/goaccess.gpg >/dev/null
$ echo "deb [signed-by=/usr/share/keyrings/goaccess.gpg arch=$(dpkg --print-architecture)] https://deb.goaccess.io/ $(lsb_release -cs) main" \
    | sudo tee /etc/apt/sources.list.d/goaccess.list
$ sudo apt-get update
$ sudo apt-get install goaccess

Note:

  • .deb packages in the official repo are available through HTTPS as well. You may need to install apt-transport-https.

Fedora

# yum install goaccess

Arch

# pacman -S goaccess

Gentoo

# emerge net-analyzer/goaccess

OS X / Homebrew

# brew install goaccess

FreeBSD

# cd /usr/ports/sysutils/goaccess/ && make install clean
# pkg install sysutils/goaccess

OpenBSD

# cd /usr/ports/www/goaccess && make install clean
# pkg_add goaccess

openSUSE

# zypper ar -f obs://server:http http
# zypper in goaccess

OpenIndiana

# pkg install goaccess

pkgsrc (NetBSD, Solaris, SmartOS, ...)

# pkgin install goaccess

Windows

GoAccess can be used on Windows through Cygwin (see Cygwin's packages), or through the Windows Subsystem for Linux on Windows 10.

Distribution Packages

GoAccess has minimal requirements: it's written in C and requires only ncurses. However, below is a table of some optional dependencies in some distros to build GoAccess from source.

Distro NCurses GeoIP (opt) GeoIP2 (opt) OpenSSL (opt)
Ubuntu/Debian libncurses-dev libgeoip-dev libmaxminddb-dev libssl-dev
RHEL/CentOS ncurses-devel geoip-devel libmaxminddb-devel openssl-devel
Arch ncurses geoip libmaxminddb openssl
Gentoo sys-libs/ncurses dev-libs/geoip dev-libs/libmaxminddb dev-libs/openssl
Slackware ncurses GeoIP libmaxminddb openssl

Note: You may need to install build tools such as gcc, autoconf, gettext, and autopoint to compile/build software from source, e.g., base-devel, build-essential, or "Development Tools".

Docker

A Docker image is available and can generate a report directly from an access log. If you only want to output a report, you can pipe a log from the external environment to the Docker-based process:

touch report.html
cat access.log | docker run --rm -i -v ./report.html:/report.html -e LANG=$LANG allinurl/goaccess -a -o report.html --log-format COMBINED -

OR real-time

tail -F access.log | docker run -p 7890:7890 --rm -i -e LANG=$LANG allinurl/goaccess -a -o report.html --log-format COMBINED --real-time-html -

You can read more about using the docker image in DOCKER.md.

Contributing

Any help on GoAccess is welcome. The most helpful way is to try it out and give feedback. Feel free to use the GitHub issue tracker and pull requests to discuss and submit code changes.

You can contribute to our translations by editing the .po files directly on GitHub or by using the visual interface at inlang.com.


Storage

Default Hash Tables

In-memory storage provides better performance at the cost of limiting the dataset size to the amount of available physical memory. GoAccess uses in-memory hash tables. It has very good memory usage and pretty good performance. This storage has support for on-disk persistence as well.

Command Line / Config Options

See options that can be supplied to the command or specified in the configuration file. If specified in the configuration file, long options need to be used without prepending --.
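For instance, a hypothetical goaccess.conf (path and values are illustrative; see the man page for the full option list) sets long options without the leading dashes:

```
# goaccess.conf -- long options appear WITHOUT the leading "--"
log-format COMBINED
date-format %d/%b/%Y
time-format %T
output /var/www/html/report.html
```

The equivalent command line would use `--log-format=COMBINED --output=...` instead.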

Usage / Examples

Note: Piping data into GoAccess won't prompt a log/date/time configuration dialog; you will need to define it beforehand in your configuration file or on the command line.

Getting Started

To output to a terminal and generate an interactive report:

# goaccess access.log

To generate an HTML report:

# goaccess access.log -a > report.html

To generate a JSON report file:

# goaccess access.log -a -d -o report.json

To generate a CSV report to stdout:

# goaccess access.log --no-csv-summary -o csv

GoAccess also allows great flexibility for real-time filtering and parsing. For instance, to quickly diagnose issues by monitoring logs since goaccess was started:

# tail -f access.log | goaccess -

And even better, to filter while keeping the pipe open to preserve real-time analysis, we can make use of tail -f and a pattern-matching tool such as grep, awk, sed, etc.:

# tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -

or to parse from the beginning of the file while keeping the pipe open and applying a filter:

# tail -f -n +0 access.log | grep -i --line-buffered 'firefox' | goaccess -o report.html --real-time-html -

Multiple Log files

There are several ways to parse multiple logs with GoAccess. The simplest is to pass multiple log files to the command line:

# goaccess access.log access.log.1

It's even possible to parse files from a pipe while reading regular files:

# cat access.log.2 | goaccess access.log access.log.1 -

Note: the single dash is appended to the command line to let GoAccess know that it should read from the pipe.

Now if we want to add more flexibility to GoAccess, we can use zcat --force to read compressed and uncompressed files. For instance, if we would like to process all log files access.log*, we can do:

# zcat --force access.log* | goaccess -

Note: On Mac OS X, use gunzip -c instead of zcat.
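As a self-contained illustration (file names and contents made up), GNU zcat --force happily mixes compressed and uncompressed files in a single stream:

```shell
# Create one plain and one gzip-compressed log fragment,
# then read both at once with GNU zcat --force.
printf 'plain entry\n' > demo.log
printf 'gzipped entry\n' | gzip > demo.log.1.gz
zcat --force demo.log demo.log.1.gz
```

This prints both entries in order, which is exactly what makes `zcat --force access.log* | goaccess -` work across rotated logs.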

Multi-thread Support

Use --jobs=<count> (or -j) to enable multi-thread parsing. For example:

# goaccess access.log -o report.html -j 4

And use --chunk-size=<256-32768> to adjust the chunk size; the default is 1024. For example:

# goaccess access.log -o report.html -j 4 --chunk-size=8192

Real-time HTML outputs

GoAccess has the ability to output real-time data in the HTML report. You can even email the HTML file, since it is a single self-contained file with no external dependencies; how neat is that!

The process of generating a real-time HTML report is very similar to the process of creating a static report. Only --real-time-html is needed to make it real-time.

# goaccess access.log -o /usr/share/nginx/html/your_site/report.html --real-time-html

To view the report you can navigate to http://your_site/report.html.

By default, GoAccess will use the host name of the generated report. Optionally, you can specify the URL to which the client's browser will connect. See the FAQ for a more detailed example.

# goaccess access.log -o report.html --real-time-html --ws-url=goaccess.io

By default, GoAccess listens on port 7890. To use a different port, you can specify it as follows (make sure the port is open):

# goaccess access.log -o report.html --real-time-html --port=9870

And to bind the WebSocket server to an address other than 0.0.0.0, you can specify it as:

# goaccess access.log -o report.html --real-time-html --addr=127.0.0.1

Note: To output real time data over a TLS/SSL connection, you need to use --ssl-cert=<cert.crt> and --ssl-key=<priv.key>.

Filtering

Working with dates

Another useful pipe is filtering dates out of the web log.

The following will get all HTTP requests starting on 05/Dec/2010 until the end of the file.

# sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a -

or using relative dates, e.g., one week ago:

# sed -n '/'$(date '+%d\/%b\/%Y' -d '1 week ago')'/,$ p' access.log | goaccess -a -

If we want to parse only a certain time-frame from DATE a to DATE b, we can do:

# sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess -a -
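The sed range-address technique can be checked on inline sample data (the log lines and file name below are made up):

```shell
# Three CLF-style lines spanning three consecutive dates.
printf '%s\n' \
  '1.1.1.1 - - [04/Dec/2010:10:00:00 +0000] "GET /a HTTP/1.1" 200 10' \
  '2.2.2.2 - - [05/Dec/2010:11:00:00 +0000] "GET /b HTTP/1.1" 200 20' \
  '3.3.3.3 - - [06/Dec/2010:12:00:00 +0000] "GET /c HTTP/1.1" 200 30' \
  > sample.log

# From 05/Dec/2010 to the end of the file: prints the last two lines.
sed -n '/05\/Dec\/2010/,$ p' sample.log

# Only the 04/Dec..05/Dec window: prints the first two lines.
sed -n '/04\/Dec\/2010/,/05\/Dec\/2010/ p' sample.log
```

Either output can then be piped into `goaccess -a -` as shown above.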

If we want to preserve only a certain amount of data and recycle storage, we can keep only a certain number of days. For instance, to keep and show the last 5 days:

# goaccess access.log --keep-last=5

Virtual hosts

Assume your log contains the virtual host field. For instance:

vhost.io:80 8.8.4.4 - - [02/Mar/2016:08:14:04 -0600] "GET /shop HTTP/1.1" 200 615 "-" "Googlebot-Image/1.0"

And suppose you would like to append the virtual host to the request in order to see which virtual host the top URLs belong to:

awk '$8=$1$8' access.log | goaccess -a -

To do the same, but also use real-time filtering and parsing:

tail -f  access.log | unbuffer -p awk '$8=$1$8' | goaccess -a -
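To see what that awk expression actually does, run it on the sample line above: the assignment `$8=$1$8` rewrites field 8 (the request path) to have the vhost from field 1 prepended, and, being a truthy expression, triggers awk's default print of the modified record:

```shell
# Feed the sample vhost line through the rewrite, then show field 8.
printf '%s\n' 'vhost.io:80 8.8.4.4 - - [02/Mar/2016:08:14:04 -0600] "GET /shop HTTP/1.1" 200 615 "-" "Googlebot-Image/1.0"' \
  | awk '$8=$1$8' \
  | awk '{print $8}'   # -> vhost.io:80/shop
```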

To exclude a list of virtual hosts you can do the following:

# grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess -

Files, status codes and bots

To parse specific pages, e.g., page views, html, htm, php, etc. within a request:

# awk '$7~/\.html|\.htm|\.php/' access.log | goaccess -

Note: $7 is the request field for the common and combined log formats (without the virtual host); if your log includes the virtual host, you probably want to use $8 instead. It's best to check which field you are shooting for, e.g.:

# tail -10 access.log | awk '{print $8}'
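As a quick self-contained check (sample line made up), the request path does land in $7 for a combined-format line without a vhost:

```shell
# Whitespace-split fields: IP(1) -(2) -(3) [date(4) tz](5) "GET(6) /shop(7) ...
printf '%s\n' '8.8.4.4 - - [02/Mar/2016:08:14:04 -0600] "GET /shop HTTP/1.1" 200 615 "-" "Googlebot-Image/1.0"' \
  | awk '{print $7}'   # -> /shop
```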

Or to parse a specific status code, e.g., 500 (Internal Server Error):

# awk '$9~/500/' access.log | goaccess -

Or multiple status codes, e.g., all 3xx and 5xx:

# tail -f -n +0 access.log | awk '$9~/3[0-9]{2}|5[0-9]{2}/' | goaccess -o out.html -
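The same field-9 filter can be verified on inline sample lines (made-up data). Explicit character classes are used here, since interval expressions like {2} aren't supported by every awk:

```shell
# Keep only the 3xx and 5xx lines (first and third) out of three samples.
printf '%s\n' \
  '1.1.1.1 - - [04/Dec/2010:10:00:00 +0000] "GET /a HTTP/1.1" 301 10' \
  '1.1.1.2 - - [04/Dec/2010:10:00:01 +0000] "GET /b HTTP/1.1" 200 20' \
  '1.1.1.3 - - [04/Dec/2010:10:00:02 +0000] "GET /c HTTP/1.1" 502 30' \
  | awk '$9 ~ /^[35][0-9][0-9]$/'
```

Anchoring the pattern (`^...$`) also avoids accidentally matching a substring of some other numeric field.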

And to get an estimated overview of how many bots (crawlers) are hitting your server:

# tail -F -n +0 access.log | grep -i --line-buffered 'bot' | goaccess -

Tips

Also, it is worth pointing out that if we want to run GoAccess at lower priority, we can run it as:

# nice -n 19 goaccess -f access.log -a

and if you don't want to install it on your server, you can still run it from your local machine!

# ssh -n root@server 'tail -f /var/log/apache2/access.log' | goaccess -

Note: SSH requires -n so GoAccess can read from stdin. Also, make sure to use SSH keys for authentication as it won't work if a passphrase is required.

Troubleshooting

We receive many questions and issues that have been answered previously.

Incremental log processing

GoAccess has the ability to process logs incrementally through its internal storage and dump its data to disk. It works in the following way:

  1. A dataset must be persisted first with --persist.
  2. The same dataset can then be loaded with --restore. If new data is passed (piped or through a log file), it will be appended to the original dataset.

NOTES

GoAccess keeps track of inodes of all the files processed (assuming files will stay on the same partition), in addition, it extracts a snippet of data from the log along with the last line parsed of each file and the timestamp of the last line parsed. e.g., inode:29627417|line:20012|ts:20171231235059

First, it checks whether the snippet matches the log being parsed; if it does, it assumes the log hasn't changed drastically, e.g., hasn't been truncated. If the inode does not match the current file, it parses all lines. If the current file matches the inode, it reads the remaining lines and updates the count of lines parsed and the timestamp. As an extra precaution, it won't parse log lines with a timestamp less than or equal to the one stored.

Piped data works based on the timestamp of the last line read. For instance, it will parse and discard all incoming entries until it finds a timestamp greater than or equal to the one stored.
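The decision logic described above can be sketched as a small shell function. This is an illustrative re-creation only, not GoAccess's actual implementation; the hardcoded snapshot values come from the example snippet above:

```shell
# Stored snapshot (from the example): inode:29627417|line:20012|ts:20171231235059
# Args: inode line_no line_ts -> echoes "parse" or "skip"
should_parse() {
  [ "$1" -ne 29627417 ] && { echo parse; return; }       # different inode: parse all lines
  [ "$2" -le 20012 ] && { echo skip; return; }           # line already counted last run
  [ "$3" -gt 20171231235059 ] && echo parse || echo skip # timestamp must be newer
}

should_parse 11111111 1     20170101000000   # -> parse (rotated/new file)
should_parse 29627417 20000 20180101000000   # -> skip  (line already processed)
should_parse 29627417 20013 20180101000000   # -> parse (new line, newer timestamp)
```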

Examples
// last month access log
# goaccess access.log.1 --persist

then, load it with

// append this month access log, and preserve new data
# goaccess access.log --restore --persist

To read persisted data only (without parsing new data):

# goaccess --restore

Enjoy!

People

Contributors

abgit, allinurl, benjaoming, cgzones, coldfix, d3f3kt, da2x, dcyang, dertuxmalwieder, elboulangero, haricot, julianxhokaxhiu, mario-donnarumma, metrix78, millsjustin, octo, rsmarples, rtmkrlv, sea-n, serenaai, slowmo24, stephenwade, szepeviktor, terceiro, terratech, thomas-mc-work, throwaway1037, titaniumtown, vincentbernat, wodev


Issues

geoip GEOIP_MEMORY_CACHE vs GEOIP_STANDARD

Hi,
On latest unstable branch commit, GeoIP_new was changed from GeoIP_new(GEOIP_MEMORY_CACHE) to GeoIP_new(GEOIP_STANDARD).

Note that, GEOIP_MEMORY_CACHE is much much faster than GEOIP_STANDARD.
On my 2-million-hit file, I get these parsing times:

  • GEOIP_STANDARD: in 40 seconds
  • GEOIP_MEMORY_CACHE: in 12 seconds

almost 4x faster.

best,

ISO8601 date format (goaccess 0.6.1)

I have configured our nginx servers to use the $iso8601 (ISO 8601) date format, but GoAccess seems unable to understand that format.

In the .goaccessrc I defined the date format as %FT%T%:z, but it seems like the colon is ignored and I just get the error

An error has occurred
Error occured at: goaccess.c - render_screens - 318
Message: Nothing valid to process.

When I change .goaccessrc date format to %FT%T%z (without : before z) I can do a

bash# cat logfile | sed 's/(+.):(.)]/\1\2]/' | goaccess

I get the expected output when running

bash# date +%FT%T%:z

v 0.7 Message: Nothing valid to process

Hi I cannot get it to work with custom format...

Can you please help me to get it working for this log? Actually it is AWS with load balancing format (with additional milliseconds process time).

It'd be very helpful if the utility would tell how the config is wrong, or if it had verbosity levels...

Thanks a lot!

$ ~/src/goaccess/goaccess-0.7/goaccess -f access.log -p ~/.goaccessrc 

GoAccess - version 0.7 - Dec 17 2013 10:03:23

An error has occurred
Error occured at: goaccess.c - render_screens - 366
Message: Nothing valid to process.

.goaccessrc

color_scheme 0
date_format %d/%b/%Y
log_format %^:%^ %h  %^[%d:%^] "%r" %s %b "%R" "%^" us:%D

access.log (parsing fails even if I remove first two lines without IP)

lb.example.com:80  - - - [27/Sep/2013:12:33:50 +0000] "OPTIONS * HTTP/1.0" 200 110 "-" "Apache (internal dummy connection)"
lb.example.com:80  - - - [11/Dec/2013:06:28:21 +0000] "-" 408 0 "-" "-" us:12
lb.example.com:80  180.252.172.76 - - [11/Dec/2013:06:28:21 +0000] "POST /someaction HTTP/1.1" 200 741 "http://external.referrer.example.com/somepage" "Mozilla/5.0 (Windows NT 5.2; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" us:39317

Add option for horizontal scrolling, long paths cut off

We have logs where the path exceeds the console screen width, meaning the path gets cut off and "..." is put instead.

I was hoping the html version would not do that but it does.

Couldn't find a way to scroll nor set the width to some larger value.

Percentages out by x100

Percentages are out by a factor of 100, i.e. when it should say "21%" it says "0.21%", for all sections.

Possibility of using glob/regexp to ignore hosts

Would be really nice to use either method to move internal hosts/loadbalancers to the ignore list.

something like -e 10.,172.

best would be of course to be able to use CIDR notation:

for example
-e 10/8,172.16/12,192.168/16

debian package

Hi,

When installing via

apt-get install goaccess

on Debian 6.0.8 (Linux 3.8.4-x86_64) or Debian 6.0.8 (Linux 2.6.18)
it installs v.0.1.2 and when I use

#goaccess -c
Unknown option `-c'.

#zcat -f /var/log/inmood.ru/access* | goaccess
goaccess version 0.1.2

Usage: goaccess [ -b ][ -s ][ -e IP_ADDRESS][ -f log_file ]

The following options can also be supplied to the command:

  -f  - Path to input log <filename>
  -b  - Enable total bandwidth consumption. To achieve faster parsing, do not enable this flag.
  -s  - Enable/report HTTP status codes. To avoid overhead while parsing,
            this has been disabled by default.
  -e  - Exclude an IP from being counted under the HOST module.
        This has been disabled by default.

For more details visit: http://goaccess.prosoftcorp.com

on the other hand, if I install goaccess from sources - everything is ok, I could use '-c' key to set up log format and 'zcat | goaccess' works perfect.

apt-get install geoip-database
apt-get install gcc
apt-get install libglib2.0
apt-get install libgeoip-dev
apt-get install libncursesw5-dev
wget http://downloads.sourceforge.net/project/goaccess/0.6.1/goaccess-0.6.1.tar.gz
tar -xzvf goaccess-0.6.1.tar.gz
cd goaccess-0.6.1/
./configure --enable-geoip --enable-utf8 
make
make install
./goaccess -f /var/log/nginx/access.log -c
zcat -f /var/log/nginx/access* | ./goaccess

Would you tell me please what may be a problem?

Thanks.

Capture nginx $request_time

My nginx log_format is basically a NSCA Combined Log Format with $request_time appended. $request_time is the duration of the request in seconds with millisecond precision (like 0.2 for 200ms.)

Respectively, my goaccess log_format is NSCA Combined Log Format with %T appended:

%h %^[%d:%^] "%r" %s %b "%R" "%u" %T

Yet the serve time is always zero in my goaccess report. Everything else is parsed properly from the log.

What's the problem?

top continents

Top continents so that we can get total per continent.
This is almost same as countries but will eliminate the need to re-group countries counter. We will not need to cycle countries list and foreach country add to total to a continent list.

What do you think?

top by device-type

This is a counter that will group hits by device type.
And, at least we have 3 access types:
1: desktop
2: mobile devices
3: other (eg: crawlers)

Very useful on analytics to apply some strategy.

desktop: X hits ( x%)
mobile: Y hits (y%)
other: Z hits (z%)

running goaccess from cron

Looks like goaccess does not see ~/.goaccessrc when running from cron.
I feel it is not a goaccess problem, but rather cron.
Maybe you can help?

Thank you!

fix when isatty() cannot find valid file descriptor

When we want to export goaccess stats (by html or csv), goaccess automagically tries to check whether an output file descriptor is valid by using isatty(). Then conf.output_html is set, file descriptors are checked, and if they are not valid, help info is displayed.

The problem is that, on some enviroments, isatty() don't work as expected. This is true, eg, when we try to automate a export inside a cron by:

#!/bin/bash
find /folder/ -name "*log" -exec /bin/sh -c "~/goaccess -o csv -f {} > {}.csv" \;

In the previous case, goaccess's isatty() cannot interpret the input and isatty(STDIN_FILENO) doesn't work at all, so we always get the help info inside the output file.

So, I created a simple fix for using goaccess in these environments. When -o is forced, it's because we want output, so this is the same as the isatty() check.

best,

List the top crawlers

Hi!

Is there any option to list the top crawlers? I can't find how to that (if available)...

Thanks.
Jose

GoAccess Haproxy CLF

Can't seem to read HAProxy CLF formats.

CLF log format

log-format %{+Q}o\ %{-Q}Ci\ -\ -\ [%T]\ %r\ %st\ %B\ \"\"\ \"\"\ %Cp\ \
           %ms\ %ft\ %b\ %s\ \%Tq\ %Tw\ %Tc\ %Tr\ %Tt\ %tsc\ %ac\ %fc\ \
           %bc\ %sc\ %rc\ %sq\ %bq\ %cc\ %cs\ \%hrl\ %hsl

e.g

70.74.14.133 - - [18/Sep/2013:17:53:19 +0000] "GET /account/[email protected]&auth=fb0a&useMobileSite=fale 6 HTTP/1.1" 302 480 "" "" 50535 265 "cloud1" "recipes:t5engine::engine" "recipes:t5engine::engine_domU-12-22-39-13-40-4" 0 0 0 191 191 ---- 45 45 3 1 0 0 0 "" "" "www.domain.com"

http://blog.exceliance.fr/2012/10/29/haproxy-log-customization/

Display VirtualHosts for logs

When I look at the top request files for a log with VirtualHosts, it is not very useful to know about a path if I don't know its associated VirtualHost. Is it possible to display a column with this information? I am using version 0.6 from the Debian repo.

Thanks!

Logformat with Bandwidth for Apache

Hi,

what would be the correct LogFormat for Apache to report Bandwidth usage in GoAccess?

I have CustomLog combined in Apache, and GoAccess is using Common Log Format.

But it does not show the bandwidth usage.

docs to build from github

What do you think to add a section in documentation so that users can build goaccess from github too?

# git clone https://github.com/allinurl/goaccess.git
# cd goaccess
# autoreconf -fi
# ./configure --enable-geoip --enable-utf8 
# make

just an idea..

best,

Error occured at: goaccess.c - render_screens - 239

I'm trying to use goaccess on FreeBSD 9.1 and using a custom log format in my .goaccessrc. I used the package for installation pkg_add -r goaccess

here's my rc file:

date_format "%d/%b/%Y:%H:%M:%S %z"
log_format "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""

Here's the error I get when I run goaccess -f www.domain.com-access-psky.log -a:

An error has occurred
Error occured at: goaccess.c - render_screens - 239
Message: Nothing valid to process.

This also happens with using the normal "Combined" log format.

Build error: *** glib2 not found!

I try to build goacces from master on Ubuntu 12.04 server 32 bit and get error on configure:

www-data@li269-146:~/goaccess$ ./configure --enable-utf8
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
./configure: line 3686: PKG_PROG_PKG_CONFIG: command not found
./configure: line 3688: syntax error near unexpected token `GLIB2,'
./configure: line 3688: `PKG_CHECK_MODULES(GLIB2, glib-2.0, , as_fn_error $? "*** glib2 not found!" "$LINENO" 5)'

I have latest versions of glib2*:

www-data@li269-146:~/goaccess$ sudo apt-get install libglib2.0
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libglib2.0-0-refdbg' for regex 'libglib2.0'
Note, selecting 'libglib2.0-bin' for regex 'libglib2.0'
Note, selecting 'libglib2.0-dbg' for regex 'libglib2.0'
Note, selecting 'libglib2.0-cil' for regex 'libglib2.0'
Note, selecting 'libglib2.0-dev' for regex 'libglib2.0'
Note, selecting 'libglib2.0-doc' for regex 'libglib2.0'
Note, selecting 'libglib2.0-cil-dev' for regex 'libglib2.0'
Note, selecting 'libglib2.0-data' for regex 'libglib2.0'
Note, selecting 'libglib2.0-0' for regex 'libglib2.0'
Note, selecting 'libglib2.0-0-dbg' for regex 'libglib2.0'
libglib2.0-0 is already the newest version.
libglib2.0-bin is already the newest version.
libglib2.0-bin set to manually installed.
libglib2.0-data is already the newest version.
libglib2.0-data set to manually installed.
libglib2.0-dev is already the newest version.
...

I have some issue described in #26. How can i build latest goaccess?

output problem on empty/incorrect log input

Just a note that, when an error is generated and an output is defined, goaccess gets a segmentation fault.

    -bash-4.1$ echo '' | ./goaccess-0.6/goaccess > out.txt
    Segmentation fault

Afaik, software should never get a segmentation fault (read: crash).
Maybe an empty output should be stored in file instead?

buffer overflow detected with -p when compiled with -O2

goaccess-0.6.1, gcc-4.7.3, glibc-2.16.0
Adding -O2 to AM_CFLAGS, bt from gdb on goaccess -p goaccessrc:

Program received signal SIGABRT, Aborted.
0x00007ffff6fe5015 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6fe5015 in raise () from /lib64/libc.so.6
#1  0x00007ffff6fe6498 in abort () from /lib64/libc.so.6
#2  0x00007ffff702348b in ?? () from /lib64/libc.so.6
#3  0x00007ffff70ab987 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007ffff70a98f0 in __chk_fail () from /lib64/libc.so.6
#5  0x00007ffff70a9ec4 in __realpath_chk () from /lib64/libc.so.6
#6  0x0000000000402ea9 in realpath (__resolved=0x61a648 <conf+8> "", __name=<optimized out>)
    at /usr/include/bits/stdlib.h:43
#7  main (argc=3, argv=0x7fffffffda08) at goaccess.c:676

nginx log format

Hello!

I have tried everything but can't make it work with this nginx log format:

$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"

Example:
127.0.0.1 - - [23/Aug/2013:14:01:26 +0100] "GET /sites/xxx.pt/files/imagecache/64x64/avatar/picture-156409.jpg HTTP/1.1" 200 1768 "http://xxx.pt/forum/pr-saldos-80" "Mozilla/5.0 (Windows NT 5.1; rv:22.0; Avant TriCore) Gecko/20130630 Firefox/22.0" "213.58.193.194, 213.58.193.194"

I'm using version 0.6. I need the %h to be the http_x_forwarded_for

Thanks!

can't parse custom logs

I'm trying to get goaccess working. Already take some time on docs and examples but without success.

Basically i have log lines like:

10031488_71334 xpto.domain.com 88.103.999.999 - - [16/Jun/2013:15:03:26 +0000] "GET / HTTP/1.1" 500 5624 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0" -

I've already compiled goaccess and created a conf file:

cat ~/.goaccessrc
date_format %d/%b/%Y:%T %z
log_format %^ %^ %h - - [$d] "%r" %s %b "%R" "%u" %^

Then I got:

./goaccess -a -f miniout.log
GoAccess - version 0.6 - Aug  5 2013 20:20:16
An error has occurred
Error occured at: goaccess.c - render_screens - 358
Message: Nothing valid to process.
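
A hunch, not a confirmed fix: the `$d` in the log_format above looks like the problem, since goaccess format specifiers use `%`, not `$` (compare the `%h`, `%r`, `%s` already in that line). A corrected ~/.goaccessrc would be:

```
date_format %d/%b/%Y:%T %z
log_format %^ %^ %h - - [%d] "%r" %s %b "%R" "%u" %^
```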

Option -c seems not to work

marcob@js:~$ goaccess -c

GoAccess - 0.5

Usage: goaccess [ -e IP_ADDRESS][ - a ][ - c ]< -f log_file >

The following options can also be supplied to the command:

-f - Path to input log file.
... etcetera

I expected a screen or prompt for defining the date format. I installed goaccess from the Ubuntu Software Center.
The configuration file .goaccessrc has not been created either.

Some details about my system, in case it helps:
marcob@js:~$ uname -a
Linux js 3.5.0-37-generic #58-Ubuntu SMP Mon Jul 8 22:07:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Ciao!

A New Feature - Per Virtual Host Statistics

Another feature which could help an Apache admin a lot is reports and statistics based on Apache vhost definitions.
There is a utility called apache-top, a Python script, which can report based on the Apache vhosts configuration.

Can't get an HTML report

Hi,

I'm trying to generate an HTML report, but the command never ends. I waited more than an hour, and when I open the HTML file in Firefox I see this:
http://paste.ubuntu.com/5901462/

And the goaccess command is still running:
goaccess -f /var/log/nginx/access.log -a > /var/www/index.html

GoAccess 0.6 with a very big log file

I'm trying to generate an HTML report for a 30 GB (55,706,267 lines) log file:

goaccess -f log.txt -a > report.html

The process is killed by the system (Ubuntu) after a few hours.

Here is what I found in kern.log:

Sep 10 18:44:24 del1207 kernel: [5454908.405972] goaccess invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Sep 10 18:44:24 del1207 kernel: [5454908.405977] goaccess cpuset=/ mems_allowed=0-1
Sep 10 18:44:24 del1207 kernel: [5454908.405981] Pid: 11593, comm: goaccess Tainted: G        W    3.2.0-4-amd64 #1 Debian 3.2.41-2
Sep 10 18:44:24 del1207 kernel: [5454908.458632] [11593]  1001 11593  4787017  3997155   1       0             0 goaccess
Sep 10 18:44:24 del1207 kernel: [5454908.458635] Out of memory: Kill process 11593 (goaccess) score 943 or sacrifice child
Sep 10 18:44:24 del1207 kernel: [5454908.458659] Killed process 11593 (goaccess) total-vm:19148068kB, anon-rss:15988280kB, file-rss:340kB

It seems the system runs out of memory (?), even though I'm running goaccess on a server with 16 GB of RAM ;)

Has goaccess been tested with big log files? Are there specific options for dealing with them?
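
Not a fix for the memory usage itself, but a rough workaround while it is investigated: sample the log before feeding it to goaccess (keeping every 100th line is an arbitrary choice), at the cost of all stats becoming estimates.

```shell
# Keep every 100th line; memory use drops roughly 100x,
# and every count in the report becomes a ~1% sample.
awk 'NR % 100 == 0' log.txt | goaccess -a > report.html
```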

goaccess doesn't parse other HTTP methods

I found goaccess doesn't parse HTTP methods like PUT in the nginx access.log.
For example, `PUT /xxx/ooo HTTP/1.1` showed up in panel 2 - Requested files (Pages-URL) and in panel 5 - HTTP 404 Not Found response code. It's supposed to be only `/xxx/ooo` in the request URL, not including the HTTP method and HTTP version.

In RFC 2616 there are 8 HTTP methods, but I found only 3 of them in parser.c.

Could you add the rest of them?

hours (visits per day hour, 00-01.. 23-00)

Another counter, which would group hits by the hour of day in which they occur.
Very useful in analytics to find the best hours to apply some strategy.

00: X hits
01: Y hits
..
23: Z hits

What do you think?
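
In the meantime, a rough approximation of this counter can be had outside goaccess. This sketch pulls the first `:HH:` out of each line's `[dd/Mon/yyyy:HH:MM:SS ...]` timestamp and tallies it (it assumes access.log is your log file and that no earlier `:NN:` pattern appears on the line, e.g. from IPv6 hosts):

```shell
# Tally hits per hour of day from a common-format access log.
awk '{
  if (match($0, /:[0-9][0-9]:/))      # first :HH: in the timestamp
    print substr($0, RSTART + 1, 2)   # the two-digit hour
}' access.log | sort | uniq -c
```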

option to use custom configuration file

If we have multiple access logs with different formats, we cannot use just one config.
With a new command-line option we could point to a different "goaccessrc" config file.

eg:

goaccess -p /home/me/goaccessrc1 -f access_format1.log
goaccess -p /home/me/goaccessrc2 -f access_format2.log

merge needed features

hi all,
When will we see all the features already implemented land in a stable release?

  • countries (isocode+name) module;
  • continents (isocode+name) module;
  • daily requests module;
  • csv output export
  • isatty detection fix / force output

Some of these features are more than two months old, and in my case they are required to make goaccess useful.

best,
Francisco Azevedo

Nothing valid to process

1.1.1.1 - - [05/Dec/2013:04:02:07 -0800] 481 1386244927.610 0.001 201 "PUT /data/520130610/2013-12-05/4/47196c7de94546fa790dsdd5218.md HTTP/1.1" 25 "-" "HTTP_Request2/0.3.0 (http://pear.php.net/package/http_request2) PHP/5.4.19" "-" - example.net

I'm trying to parse a log like this.

date_format %d/%b/%Y:%T %z
log_format %h - - [%d] $b $^ $T $s "%r" %^ "%R" "%u" "%^" %^ %^

But I keep getting errors. Any ideas?

An error has occurred
Error occured at: goaccess.c - render_screens - 318
Message: Nothing valid to process.
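
Most likely the same `$` vs `%` mix-up seen in other reports here: `$b $^ $T $s` are not valid specifiers, since goaccess uses `%` throughout. A sketch keeping your field mapping (whether `%T` is supported depends on the version):

```
date_format %d/%b/%Y:%T %z
log_format %h - - [%d] %b %^ %T %s "%r" %^ "%R" "%u" "%^" %^ %^
```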

just process some modules

On big logs, when we have millions of hosts, we get a very big export with lots of info, and sometimes that additional info is not necessary.

I propose having a way to select which modules to process.
That way, we can choose whether to process "hosts", "referrers", "404", or other modules (or, as an alternative, which modules to exclude).

Something like:

# goaccess  -p general,unique,requested -f access.log

So, in the example above, goaccess would only process and report general stats, unique visits per day, and requested pages.

What do you think?

crawlers list

Hi,
Here is an additional crawler list with 330 more signatures. Feel free to add it to util.c.

{"SeoCheckBot", "Crawlers"},
{"MJ12bot", "Crawlers"},
{"Genieo", "Crawlers"},
{"proximic", "Crawlers"},
{"80legs", "Crawlers"},
{"ExB Language Crawler", "Crawlers"},
{"NaverBot", "Crawlers"},
{"Qirina Hurdler", "Crawlers"},
{"ChangeDetection", "Crawlers"},
{"IntegromeDB", "Crawlers"},
{"GrapeshotCrawler", "Crawlers"},
{"IstellaBot", "Crawlers"},
{"Yahoo!", "Crawlers"},
{"sogou spider", "Crawlers"},
{"AddThis.com", "Crawlers"},
{"Zookabot", "Crawlers"},
{"ShopWiki", "Crawlers"},
{"iCjobs", "Crawlers"},
{"search.KumKie.com", "Crawlers"},
{"ZumBot", "Crawlers"},
{"Netseer", "Crawlers"},
{"magpie-crawler", "Crawlers"},
{"rogerbot", "Crawlers"},
{"ShowyouBot", "Crawlers"},
{"bitlybot", "Crawlers"},
{"Spinn3r", "Crawlers"},
{"Jyxobot", "Crawlers"},
{"Woko", "Crawlers"},
{"coccoc", "Crawlers"},
{"Wotbox", "Crawlers"},
{"Mail.Ru bot", "Crawlers"},
{"Vagabondo", "Crawlers"},
{"uMBot-FC", "Crawlers"},
{"MetaJobBot", "Crawlers"},
{"SearchmetricsBot", "Crawlers"},
{"Infohelfer", "Crawlers"},
{"Peeplo Screenshot Bot", "Crawlers"},
{"Qualidator.com Bot", "Crawlers"},
{"Daumoa", "Crawlers"},
{"Ezooms", "Crawlers"},
{"TurnitinBot", "Crawlers"},
{"NetcraftSurveyAgent", "Crawlers"},
{"Crawler4j", "Crawlers"},
{"Aboundexbot", "Crawlers"},
{"aiHitBot", "Crawlers"},
{"VoilaBot", "Crawlers"},
{"AraBot", "Crawlers"},
{"YioopBot", "Crawlers"},
{"Nuhk", "Crawlers"},
{"yacybot", "Crawlers"},
{"SEOENGBot", "Crawlers"},
{"Najdi.si", "Crawlers"},
{"linkdex.com", "Crawlers"},
{"Yahoo! JAPAN", "Crawlers"},
{"psbot", "Crawlers"},
{"trendictionbot", "Crawlers"},
{"BLEXBot", "Crawlers"},
{"spbot", "Crawlers"},
{"webmastercoffee", "Crawlers"},
{"PaperLiBot", "Crawlers"},
{"QuerySeekerSpider", "Crawlers"},
{"FacebookExternalHit", "Crawlers"},
{"UASlinkChecker", "Crawlers"},
{"Browsershots", "Crawlers"},
{"MetaURI API", "Crawlers"},
{"woriobot", "Crawlers"},
{"oBot", "Crawlers"},
{"FyberSpider", "Crawlers"},
{"Steeler", "Crawlers"},
{"Blekkobot", "Crawlers"},
{"LinkAider", "Crawlers"},
{"SEOkicks-Robot", "Crawlers"},
{"firmilybot", "Crawlers"},
{"netEstate Crawler", "Crawlers"},
{"AMZNKAssocBot", "Crawlers"},
{"OpenindexSpider", "Crawlers"},
{"BacklinkCrawler", "Crawlers"},
{"HubSpot Connect", "Crawlers"},
{"WBSearchBot", "Crawlers"},
{"MetaHeadersBot", "Crawlers"},
{"UnisterBot", "Crawlers"},
{"Strokebot", "Crawlers"},
{"GeliyooBot", "Crawlers"},
{"bot-pge.chlooe.com", "Crawlers"},
{"ownCloud Server Crawler", "Crawlers"},
{"CirrusExplorer", "Crawlers"},
{"bixocrawler", "Crawlers"},
{"ProCogSEOBot", "Crawlers"},
{"Falconsbot", "Crawlers"},
{"Dlvr.it/1.0", "Crawlers"},
{"thumbshots-de-Bot", "Crawlers"},
{"200PleaseBot", "Crawlers"},
{"discoverybot", "Crawlers"},
{"MojeekBot", "Crawlers"},
{"Panscient", "Crawlers"},
{"Plukkie", "Crawlers"},
{"R6 bot", "Crawlers"},
{"bl.uk_lddc_bot", "Crawlers"},
{"SolomonoBot", "Crawlers"},
{"Grahambot", "Crawlers"},
{"Automattic", "Crawlers"},
{"emefgebot", "Crawlers"},
{"SBSearch", "Crawlers"},
{"PiplBot", "Crawlers"},
{"TinEye", "Crawlers"},
{"FlightDeckReportsBot", "Crawlers"},
{"fastbot crawler", "Crawlers"},
{"4seohuntBot", "Crawlers"},
{"Updownerbot", "Crawlers"},
{"JikeSpider", "Crawlers"},
{"NLNZ_IAHarvester2013", "Crawlers"},
{"Nigma.ru", "Crawlers"},
{"wsAnalyzer", "Crawlers"},
{"OpenWebSpider", "Crawlers"},
{"YodaoBot", "Crawlers"},
{"SpiderLing", "Crawlers"},
{"Esribot", "Crawlers"},
{"Thumbshots.ru", "Crawlers"},
{"BlogPulse", "Crawlers"},
{"NextGenSearchBot", "Crawlers"},
{"bot.wsowner.com", "Crawlers"},
{"wscheck.com", "Crawlers"},
{"Qseero", "Crawlers"},
{"drupact", "Crawlers"},
{"HuaweiSymantecSpider", "Crawlers"},
{"PagePeeker", "Crawlers"},
{"HomeTags", "Crawlers"},
{"facebookplatform", "Crawlers"},
{"Pixray-Seeker", "Crawlers"},
{"BDFetch", "Crawlers"},
{"MeMoNewsBot", "Crawlers"},
{"ProCogBot", "Crawlers"},
{"WillyBot", "Crawlers"},
{"peerindex", "Crawlers"},
{"Job Roboter Spider", "Crawlers"},
{"MLBot", "Crawlers"},
{"WebNL", "Crawlers"},
{"Peepowbot", "Crawlers"},
{"Semager", "Crawlers"},
{"MIA Bot", "Crawlers"},
{"Eurobot", "Crawlers"},
{"DripfeedBot", "Crawlers"},
{"webinatorbot", "Crawlers"},
{"Whoismindbot", "Crawlers"},
{"Bad-Neighborhood", "Crawlers"},
{"Hailoobot", "Crawlers"},
{"akula", "Crawlers"},
{"MetamojiCrawler", "Crawlers"},
{"Page2RSS", "Crawlers"},
{"EasyBib AutoCite", "Crawlers"},
{"suggybot", "Crawlers"},
{"NerdByNature.Bot", "Crawlers"},
{"EventGuruBot", "Crawlers"},
{"quickobot", "Crawlers"},
{"gonzo", "Crawlers"},
{"CCBot", "Crawlers"},
{"bnf.fr_bot", "Crawlers"},
{"UptimeRobot", "Crawlers"},
{"Influencebot", "Crawlers"},
{"MSRBOT", "Crawlers"},
{"KeywordDensityRobot", "Crawlers"},
{"heritrix", "Crawlers"},
{"Ronzoobot", "Crawlers"},
{"RyzeCrawler", "Crawlers"},
{"ScoutJet", "Crawlers"},
{"Twikle", "Crawlers"},
{"SWEBot", "Crawlers"},
{"RADaR-Bot", "Crawlers"},
{"DCPbot", "Crawlers"},
{"Castabot", "Crawlers"},
{"percbotspider", "Crawlers"},
{"WeSEE:Search", "Crawlers"},
{"CatchBot", "Crawlers"},
{"imbot", "Crawlers"},
{"EdisterBot", "Crawlers"},
{"WASALive-Bot", "Crawlers"},
{"Accelobot", "Crawlers"},
{"PostPost", "Crawlers"},
{"factbot", "Crawlers"},
{"Setoozbot", "Crawlers"},
{"biwec", "Crawlers"},
{"GarlikCrawler", "Crawlers"},
{"Search17Bot", "Crawlers"},
{"Lijit", "Crawlers"},
{"MetaGeneratorCrawler", "Crawlers"},
{"Robots_Tester", "Crawlers"},
{"JUST-CRAWLER", "Crawlers"},
{"Apercite", "Crawlers"},
{"pmoz.info", "Crawlers"},
{"LemurWebCrawler", "Crawlers"},
{"Covario-IDS", "Crawlers"},
{"Holmes", "Crawlers"},
{"RankurBot", "Crawlers"},
{"DotBot", "Crawlers"},
{"envolk", "Crawlers"},
{"LexxeBot", "Crawlers"},
{"adressendeutschland", "Crawlers"},
{"StackRambler", "Crawlers"},
{"Abrave Spider", "Crawlers"},
{"EvriNid", "Crawlers"},
{"arachnode.net", "Crawlers"},
{"CamontSpider", "Crawlers"},
{"wikiwix-bot", "Crawlers"},
{"Nymesis", "Crawlers"},
{"Sitedomain-Bot", "Crawlers"},
{"SEODat", "Crawlers"},
{"SygolBot", "Crawlers"},
{"Snapbot", "Crawlers"},
{"OpenCalaisSemanticProxy", "Crawlers"},
{"ZookaBot", "Crawlers"},
{"CligooRobot", "Crawlers"},
{"cityreview", "Crawlers"},
{"nworm", "Crawlers"},
{"AboutUsBot", "Crawlers"},
{"ICC-Crawler", "Crawlers"},
{"SBIder", "Crawlers"},
{"TwengaBot", "Crawlers"},
{"Dot TK - spider", "Crawlers"},
{"EuripBot", "Crawlers"},
{"ParchBot", "Crawlers"},
{"Peew", "Crawlers"},
{"AntBot", "Crawlers"},
{"YRSpider", "Crawlers"},
{"Urlfilebot", "Crawlers"},
{"Gaisbot", "Crawlers"},
{"WatchMouse", "Crawlers"},
{"Tagoobot", "Crawlers"},
{"Motoricerca", "Crawlers"},
{"WebWatch", "Crawlers"},
{"urlfan-bot", "Crawlers"},
{"StatoolsBot", "Crawlers"},
{"page_verifier", "Crawlers"},
{"SSLBot", "Crawlers"},
{"SAI Crawler", "Crawlers"},
{"DomainDB", "Crawlers"},
{"LinkWalker", "Crawlers"},
{"WMCAI_robot", "Crawlers"},
{"voyager", "Crawlers"},
{"copyright sheriff", "Crawlers"},
{"Ocelli", "Crawlers"},
{"amibot", "Crawlers"},
{"abby", "Crawlers"},
{"NetResearchServer", "Crawlers"},
{"VideoSurf_bot", "Crawlers"},
{"XML Sitemaps Generator", "Crawlers"},
{"BlinkaCrawler", "Crawlers"},
{"nodestackbot", "Crawlers"},
{"Pompos", "Crawlers"},
{"taptubot", "Crawlers"},
{"BabalooSpider", "Crawlers"},
{"Yaanb", "Crawlers"},
{"Girafabot", "Crawlers"},
{"livedoor ScreenShot", "Crawlers"},
{"eCairn-Grabber", "Crawlers"},
{"FauBot", "Crawlers"},
{"Toread-Crawler", "Crawlers"},
{"MetaURI", "Crawlers"},
{"L.webis", "Crawlers"},
{"Web-sniffer", "Crawlers"},
{"FairShare", "Crawlers"},
{"Ruky-Roboter", "Crawlers"},
{"ThumbShots-Bot", "Crawlers"},
{"BotOnParade", "Crawlers"},
{"Amagit.COM", "Crawlers"},
{"HatenaScreenshot", "Crawlers"},
{"HolmesBot", "Crawlers"},
{"dotSemantic", "Crawlers"},
{"Karneval-Bot", "Crawlers"},
{"HostTracker.com", "Crawlers"},
{"AportWorm", "Crawlers"},
{"XmarksFetch", "Crawlers"},
{"FeedFinder", "Crawlers"},
{"CorpusCrawler", "Crawlers"},
{"Willow Internet Crawler", "Crawlers"},
{"OrgbyBot", "Crawlers"},
{"GingerCrawler", "Crawlers"},
{"pingdom.com_bot", "Crawlers"},
{"baypup", "Crawlers"},
{"Linguee Bot", "Crawlers"},
{"Mp3Bot", "Crawlers"},
{"192.comAgent", "Crawlers"},
{"Surphace Scout", "Crawlers"},
{"WikioFeedBot", "Crawlers"},
{"Szukacz", "Crawlers"},
{"DBLBot", "Crawlers"},
{"Thumbnail.CZ", "Crawlers"},
{"LinguaBot", "Crawlers"},
{"GurujiBot", "Crawlers"},
{"Charlotte", "Crawlers"},
{"50.nu", "Crawlers"},
{"SanszBot", "Crawlers"},
{"moba-crawler", "Crawlers"},
{"HeartRails_Capture", "Crawlers"},
{"SurveyBot", "Crawlers"},
{"MnoGoSearch", "Crawlers"},
{"smart.apnoti.com Robot", "Crawlers"},
{"Topicbot", "Crawlers"},
{"JadynAveBot", "Crawlers"},
{"OsObot", "Crawlers"},
{"WebImages", "Crawlers"},
{"WinWebBot", "Crawlers"},
{"Scooter", "Crawlers"},
{"Scarlett", "Crawlers"},
{"GOFORITBOT", "Crawlers"},
{"DKIMRepBot", "Crawlers"},
{"Yanga", "Crawlers"},
{"DNS-Digger-Explorer", "Crawlers"},
{"Robozilla", "Crawlers"},
{"adidxbot", "Crawlers"},
{"YowedoBot", "Crawlers"},
{"botmobi", "Crawlers"},
{"Fooooo_Web_Video_Crawl", "Crawlers"},
{"UptimeDog", "Crawlers"},
{"^Nail", "Crawlers"},
{"Metaspinner", "Crawlers"},
{"Touche", "Crawlers"},
{"RSSMicro", "Crawlers"},
{"SniffRSS", "Crawlers"},
{"FeedCatBot", "Crawlers"},
{"WebRankSpider", "Crawlers"},
{"Flatland Industries Web Spider", "Crawlers"},
{"DealGates Bot", "Crawlers"},
{"Link Valet Online", "Crawlers"},
{"Shelob", "Crawlers"},
{"Technoratibot", "Crawlers"},
{"Flocke bot", "Crawlers"},
{"FollowSite Bot", "Crawlers"},
{"Visbot", "Crawlers"},

btw, there's currently a duplicate crawler entry: KaloogaBot

best,

A New Feature - Adapting Squid Access Log

People often use Apache log analyzers for their cache servers as well.
A good idea would be to support the Squid access log, or to provide a document/howto showing those users how to use goaccess for that purpose.

GoAccess does not properly escape HTML in logs when generating an HTML report.

Take a look at http://eve.cognitive.io/static/report.html

How to reproduce:

  1. Look at http://eve.cognitive.io/static/report.html
  2. Expand Requested files, inspect the last row (half empty).
  3. A <title> tag has not been escaped and is causing the rest of the page to not render.

The following fantastic log entry makes it break:

112.216.56.58 - - [05/Jul/2013:04:55:11 +0000] "GET <title>phpMyAdmin HTTP/1.1" 400 172 "-" "-"
112.216.56.58 - - [05/Jul/2013:15:36:57 +0000] "GET <title>phpMyAdmin HTTP/1.1" 400 172 "-" "-"
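
The fix belongs in the report writer, but the needed transformation is just the three basic HTML escapes, shown here as a shell sketch (order matters: `&` must be replaced first, or it would re-escape the entities just produced):

```shell
# Escape &, <, > so request strings can't inject markup into the report.
printf '%s\n' 'GET <title>phpMyAdmin HTTP/1.1' \
  | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
# → GET &lt;title&gt;phpMyAdmin HTTP/1.1
```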

Problem on parsing, no option for debugging :(

Sadly I don't know what to do. I've tried nearly every possible combination, even setting everything to %^ just to see if it comes out positive, but it doesn't. My log files look like this:

dukgo.com [22/Nov/2013:06:26:00 -0500] "GET / HTTP/1.1" 302 0.000 160 "-" "-"

As we don't track our users' IPs, that field is missing. In some tries I used the virtual hostname as the user IP, but that didn't change anything; I always get this result:

Message: Nothing valid to process.

So far I have this in my .goaccessrc as my best attempt ;)

date_format %d/%b/%Y:%T %z
log_format %^ [%d] "%r" %s %D %b "%R" "%^"

Is there any way to debug this? Finding out, for example, which regexp it ends up with would be awesome ;) then I could check my attempts against it.

Thanks in advance for any help

idea: output standardization

Hi all,
Currently we have different ways to output content, e.g.:
goaccess -a > file.html for HTML
goaccess -a -o csv > file.csv for CSV
goaccess -a -o json > file.json for JSON

What do you think about creating a standard format and simplifying things? e.g.:
goaccess -a -o file.html
goaccess -a -o file.csv
goaccess -a -o file.json

Standard, simple, no extension duplication (-o EXT > f.EXT), and no isatty() magic.
If -o is not given, the terminal output is displayed.

best,
Francisco Azevedo
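
The dispatch you propose would be cheap to implement; here is a shell sketch of the intended behavior (function name hypothetical):

```shell
# Infer the report format from the -o filename's extension;
# anything unrecognized falls back to the terminal UI.
outfmt() {
  case "${1##*.}" in
    html) echo html ;;
    csv)  echo csv  ;;
    json) echo json ;;
    *)    echo terminal ;;
  esac
}
outfmt report.json   # → json
```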

Geolocation in Debian

Does the Debian package include geolocation support by default (Debian 7, goaccess 0.5)? If not, how can I enable it?
