darold / pgbadger Goto Github PK

View Code? Open in Web Editor NEW

3.4K 104.0 348.0 17.15 MB

A fast PostgreSQL Log Analyzer

Home Page: http://pgbadger.darold.net/

License: PostgreSQL License

Perl 68.58% JavaScript 30.77% CSS 0.47% Raku 0.19%

postgresql perfomance tuning sql log

pgbadger's Introduction

NAME
SYNOPSIS
DESCRIPTION
FEATURE
REQUIREMENT
INSTALLATION
POSTGRESQL-CONFIGURATION
LOG-STATEMENTS
PARALLEL-PROCESSING
INCREMENTAL-REPORTS
BINARY-FORMAT
JSON-FORMAT
AUTHORS
LICENSE

NAME

pgBadger - a fast PostgreSQL log analysis report

SYNOPSIS

Usage: pgbadger [options] logfile [...]

PostgreSQL log analyzer with fully detailed reports and graphs.

Arguments:

logfile can be a single log file, a list of files, or a shell command
returning a list of files. If you want to pass log content from stdin
use - as filename. Note that input from stdin will not work with csvlog.

Options:

-a | --average minutes : number of minutes to build the average graphs of
                         queries and connections. Default 5 minutes.
-A | --histo-average min: number of minutes to build the histogram graphs
                         of queries. Default 60 minutes.
-b | --begin datetime  : start date/time for the data to be parsed in log
                         (either a timestamp or a time)
-c | --dbclient host   : only report on entries for the given client host.
-C | --nocomment       : remove comments like /* ... */ from queries.
-d | --dbname database : only report on entries for the given database.
-D | --dns-resolv      : client ip addresses are replaced by their DNS name.
                         Be warned that this can really slow down pgBadger.
-e | --end datetime    : end date/time for the data to be parsed in log
                         (either a timestamp or a time)
-E | --explode         : explode the main report by generating one report
                         per database. Global information not related to a
                         database is added to the postgres database report.
-f | --format logtype  : possible values: syslog, syslog2, stderr, jsonlog,
                         csv, pgbouncer, logplex, rds and redshift. Use this
                         option when pgBadger is not able to detect the log
                         format.
-G | --nograph         : disable graphs on HTML output. Enabled by default.
-h | --help            : show this message and exit.
-H | --html-outdir path: path to directory where HTML report must be written
                         in incremental mode, binary files stay on directory
                         defined with -O, --outdir option.
-i | --ident name      : programname used as syslog ident. Default: postgres
-I | --incremental     : use incremental mode, reports will be generated by
                         days in a separate directory, --outdir must be set.
-j | --jobs number     : number of jobs to run at same time for a single log
                         file. Run as single by default or when working with
                         csvlog format.
-J | --Jobs number     : number of log files to parse in parallel. Process
                         one file at a time by default.
-l | --last-parsed file: allow incremental log parsing by registering the
                         last datetime and line parsed. Useful if you want
                         to watch errors since last run or if you want one
                         report per day with a log rotated each week.
-L | --logfile-list file:file containing a list of log files to parse.
-m | --maxlength size  : maximum length of a query, it will be restricted to
                         the given size. Default truncate size is 100000.
-M | --no-multiline    : do not collect multiline statements to avoid garbage
                         especially on errors that generate a huge report.
-N | --appname name    : only report on entries for given application name
-o | --outfile filename: define the filename for the output. Default depends
                         on the output format: out.html, out.txt, out.bin,
                         or out.json. This option can be used multiple times
                         to output several formats. To use json output, the
                         Perl module JSON::XS must be installed, to dump
                         output to stdout, use - as filename.
-O | --outdir path     : directory where out files must be saved.
-p | --prefix string   : the value of your custom log_line_prefix as
                         defined in your postgresql.conf. Only use it if you
                         aren't using one of the standard prefixes specified
                         in the pgBadger documentation, such as if your
                         prefix includes additional variables like client ip
                         or application name. See examples below.
-P | --no-prettify     : disable SQL queries prettify formatter.
-q | --quiet           : don't print anything to stdout, not even a progress
                         bar.
-Q | --query-numbering : add numbering of queries to the output when using
                         options --dump-all-queries or --normalized-only.
-r | --remote-host ip  : set the host where to execute the cat command on
                         remote log file to parse the file locally.
-R | --retention N     : number of weeks to keep in incremental mode. Defaults
                         to 0, disabled. Used to set the number of weeks to
                         keep in output directory. Older weeks and days
                         directories are automatically removed.
-s | --sample number   : number of query samples to store. Default: 3.
-S | --select-only     : only report SELECT queries.
-t | --top number      : number of queries to store/display. Default: 20.
-T | --title string    : change title of the HTML page report.
-u | --dbuser username : only report on entries for the given user.
-U | --exclude-user username : exclude entries for the specified user from
                         report. Can be used multiple time.
-v | --verbose         : enable verbose or debug mode. Disabled by default.
-V | --version         : show pgBadger version and exit.
-w | --watch-mode      : only report errors just like logwatch could do.
-W | --wide-char       : encode html output of queries into UTF8 to avoid
                         Perl message "Wide character in print".
-x | --extension       : output format. Values: text, html, bin or json.
                         Default: html
-X | --extra-files     : in incremental mode allow pgBadger to write CSS and
                         JS files in the output directory as separate files.
-z | --zcat exec_path  : set the full path to the zcat program. Use it if
                         zcat, bzcat or unzip is not in your path.
-Z | --timezone +/-XX  : Set the number of hours from GMT of the timezone.
                         Use this to adjust date/time in JavaScript graphs.
                         The value can be an integer, ex.: 2, or a float,
                         ex.: 2.5.
--pie-limit num        : pie data lower than num% will show a sum instead.
--exclude-query regex  : any query matching the given regex will be excluded
                         from the report. For example: "^(VACUUM|COMMIT)"
                         You can use this option multiple times.
--exclude-file filename: path of the file that contains each regex to use
                         to exclude queries from the report. One regex per
                         line.
--include-query regex  : any query that does not match the given regex will
                         be excluded from the report. You can use this
                         option multiple times. For example: "(tbl1|tbl2)".
--include-file filename: path of the file that contains each regex to the
                         queries to include from the report. One regex per
                         line.
--disable-error        : do not generate error report.
--disable-hourly       : do not generate hourly report.
--disable-type         : do not generate report of queries by type, database
                         or user.
--disable-query        : do not generate query reports (slowest, most
                         frequent, queries by users, by database, ...).
--disable-session      : do not generate session report.
--disable-connection   : do not generate connection report.
--disable-lock         : do not generate lock report.
--disable-temporary    : do not generate temporary report.
--disable-checkpoint   : do not generate checkpoint/restartpoint report.
--disable-autovacuum   : do not generate autovacuum report.
--charset              : used to set the HTML charset to be used.
                         Default: utf-8.
--csv-separator        : used to set the CSV field separator, default: ,
--exclude-time  regex  : any timestamp matching the given regex will be
                         excluded from the report. Example: "2013-04-12 .*"
                         You can use this option multiple times.
--include-time  regex  : only timestamps matching the given regex will be
                         included in the report. Example: "2013-04-12 .*"
                         You can use this option multiple times.
--exclude-db name      : exclude entries for the specified database from
                         report. Example: "pg_dump". Can be used multiple
                         times.
--exclude-appname name : exclude entries for the specified application name
                         from report.  Example: "pg_dump".  Can be used
                         multiple times.
--exclude-line regex   : exclude any log entry that will match the given
                         regex. Can be used multiple times.
--exclude-client name  : exclude log entries for the specified client ip.
                         Can be used multiple times.
--anonymize            : obscure all literals in queries, useful to hide
                         confidential data.
--noreport             : no reports will be created in incremental mode.
--log-duration         : force pgBadger to associate log entries generated
                         by both log_duration = on and log_statement = 'all'
--enable-checksum      : used to add an md5 sum under each query report.
--journalctl command   : command to use to replace PostgreSQL logfile by
                         a call to journalctl. Basically it might be:
                            journalctl -u postgresql-9.5
--pid-dir path         : set the path where the pid file must be stored.
                         Default /tmp
--pid-file file        : set the name of the pid file to manage concurrent
                         execution of pgBadger. Default: pgbadger.pid
--rebuild              : used to rebuild all html reports in incremental
                         output directories where there's binary data files.
--pgbouncer-only       : only show PgBouncer-related menus in the header.
--start-monday         : in incremental mode, calendar weeks start on
                         Sunday. Use this option to start on a Monday.
--iso-week-number      : in incremental mode, calendar weeks start on
                         Monday and respect the ISO 8601 week number, range
                         01 to 53, where week 1 is the first week that has
                         at least 4 days in the new year.
--normalized-only      : only dump all normalized queries to out.txt
--log-timezone +/-XX   : Set the number of hours from GMT of the timezone
                         that must be used to adjust date/time read from
                         log file before beeing parsed. Using this option
                         makes log search with a date/time more difficult.
                         The value can be an integer, ex.: 2, or a float,
                         ex.: 2.5.
--prettify-json        : use it if you want json output to be prettified.
--month-report YYYY-MM : create a cumulative HTML report over the specified
                         month. Requires incremental output directories and
                         the presence of all necessary binary data files
--day-report YYYY-MM-DD: create an HTML report over the specified day.
                         Requires incremental output directories and the
                         presence of all necessary binary data files
--noexplain            : do not process lines generated by auto_explain.
--command CMD          : command to execute to retrieve log entries on
                         stdin. pgBadger will open a pipe to the command
                         and parse log entries generated by the command.
--no-week              : inform pgbadger to not build weekly reports in
                         incremental mode. Useful if it takes too much time.
--explain-url URL      : use it to override the url of the graphical explain
                         tool. Default: https://explain.depesz.com/
--tempdir DIR          : set directory where temporary files will be written
                         Default: File::Spec->tmpdir() || '/tmp'
--no-process-info      : disable changing process title to help identify
                         pgbadger process, some system do not support it.
--dump-all-queries     : dump all queries found in the log file replacing
                         bind parameters included in the queries at their
                         respective placeholders positions.
--keep-comments        : do not remove comments from normalized queries. It
                         can be useful if you want to distinguish between
                         same normalized queries.
--no-progressbar       : disable progressbar.

pgBadger is able to parse a remote log file using a passwordless ssh connection. Use -r or --remote-host to set the host IP address or hostname. There are also some additional options to fully control the ssh connection.

--ssh-program ssh        path to the ssh program to use. Default: ssh.
--ssh-port port          ssh port to use for the connection. Default: 22.
--ssh-user username      connection login name. Defaults to running user.
--ssh-identity file      path to the identity file to use.
--ssh-timeout second     timeout to ssh connection failure. Default: 10 sec.
--ssh-option  options    list of -o options to use for the ssh connection.
                         Options always used:
                             -o ConnectTimeout=$ssh_timeout
                             -o PreferredAuthentications=hostbased,publickey

Log file to parse can also be specified using an URI, supported protocols are http[s] and [s]ftp. The curl command will be used to download the file, and the file will be parsed during download. The ssh protocol is also supported and will use the ssh command like with the remote host use. See examples bellow.

Return codes:

0: on success
1: die on error
2: if it has been interrupted using ctr+c for example
3: the pid file already exists or can not be created
4: no log file was given at command line

Examples:

pgbadger /var/log/postgresql.log
pgbadger /var/log/postgres.log.2.gz /var/log/postgres.log.1.gz /var/log/postgres.log
pgbadger /var/log/postgresql/postgresql-2012-05-*
pgbadger --exclude-query="^(COPY|COMMIT)" /var/log/postgresql.log
pgbadger -b "2012-06-25 10:56:11" -e "2012-06-25 10:59:11" /var/log/postgresql.log
cat /var/log/postgres.log | pgbadger -
# Log line prefix with stderr log output
pgbadger --prefix '%t [%p]: user=%u,db=%d,client=%h' /pglog/postgresql-2012-08-21*
pgbadger --prefix '%m %u@%d %p %r %a : ' /pglog/postgresql.log
# Log line prefix with syslog log output
pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' /pglog/postgresql-2012-08-21*
# Use my 8 CPUs to parse my 10GB file faster, much faster
pgbadger -j 8 /pglog/postgresql-10.1-main.log

Use URI notation for remote log file:

pgbadger http://172.12.110.1//var/log/postgresql/postgresql-10.1-main.log
pgbadger ftp://[email protected]/postgresql-10.1-main.log
pgbadger ssh://[email protected]:2222//var/log/postgresql/postgresql-10.1-main.log*

You can use together a local PostgreSQL log and a remote pgbouncer log file to parse:

pgbadger /var/log/postgresql/postgresql-10.1-main.log ssh://[email protected]/pgbouncer.log

Reporting errors every week by cron job:

30 23 * * 1 /usr/bin/pgbadger -q -w /var/log/postgresql.log -o /var/reports/pg_errors.html

Generate report every week using incremental behavior:

0 4 * * 1 /usr/bin/pgbadger -q `find /var/log/ -mtime -7 -name "postgresql.log*"` -o /var/reports/pg_errors-`date +\%F`.html -l /var/reports/pgbadger_incremental_file.dat

This supposes that your log file and HTML report are also rotated every week.

Or better, use the auto-generated incremental reports:

0 4 * * * /usr/bin/pgbadger -I -q /var/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

will generate a report per day and per week.

In incremental mode, you can also specify the number of weeks to keep in the reports:

/usr/bin/pgbadger --retention 2 -I -q /var/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

If you have a pg_dump at 23:00 and 13:00 each day during half an hour, you can use pgBadger as follow to exclude these periods from the report:

pgbadger --exclude-time "2013-09-.* (23|13):.*" postgresql.log

This will help avoid having COPY statements, as generated by pg_dump, on top of the list of slowest queries. You can also use --exclude-appname "pg_dump" to solve this problem in a simpler way.

You can also parse journalctl output just as if it was a log file:

pgbadger --journalctl 'journalctl -u postgresql-9.5'

or worst, call it from a remote host:

pgbadger -r 192.168.1.159 --journalctl 'journalctl -u postgresql-9.5'

you don't need to specify any log file at command line, but if you have other PostgreSQL log files to parse, you can add them as usual.

To rebuild all incremental html reports after, proceed as follow:

rm /path/to/reports/*.js
rm /path/to/reports/*.css
pgbadger -X -I -O /path/to/reports/ --rebuild

it will also update all resource files (JS and CSS). Use -E or --explode if the reports were built using this option.

pgBadger also supports Heroku PostgreSQL logs using logplex format:

heroku logs -p postgres | pgbadger -f logplex -o heroku.html -

this will stream Heroku PostgreSQL log to pgbadger through stdin.

pgBadger can auto detect RDS and cloudwatch PostgreSQL logs using rds format:

pgbadger -f rds -o rds_out.html rds.log

Each CloudSQL Postgresql log is a fairly normal PostgreSQL log, but encapsulated in JSON format. It is autodetected by pgBadger but in case you need to force the log format use `jsonlog`:

pgbadger -f jsonlog -o cloudsql_out.html cloudsql.log

This is the same as with the jsonlog extension, the json format is different but pgBadger can parse both formats.

pgBadger also supports logs produced by CloudNativePG Postgres operator for Kubernetes:

pgbadger -f jsonlog -o cnpg_out.html cnpg.log

To create a cumulative report over a month use command:

pgbadger --month-report 2919-05 /path/to/incremental/reports/

this will add a link to the month name into the calendar view in incremental reports to look at report for month 2019 May. Use -E or --explode if the reports were built using this option.

DESCRIPTION

pgBadger is a PostgreSQL log analyzer built for speed providing fully detailed reports based on your PostgreSQL log files. It's a small standalone Perl script that outperforms any other PostgreSQL log analyzer.

It is written in pure Perl and uses a JavaScript library (flotr2) to draw graphs so that you don't need to install any additional Perl modules or other packages. Furthermore, this library gives us more features such as zooming. pgBadger also uses the Bootstrap JavaScript library and the FontAwesome webfont for better design. Everything is embedded.

pgBadger is able to autodetect your log file format (syslog, stderr, csvlog or jsonlog) if the file is long enough. It is designed to parse huge log files as well as compressed files. Supported compressed formats are gzip, bzip2, lz4, xz, zip and zstd. For the xz format you must have an xz version higher than 5.05 that supports the --robot option. lz4 files must be compressed with the --content-size option for pgbadger to determine the uncompressed file size. For the complete list of features, see below.

All charts are zoomable and can be saved as PNG images.

You can also limit pgBadger to only report errors or remove any part of the report using command-line options.

pgBadger supports any custom format set in the log_line_prefix directive of your postgresql.conf file as long as it at least specifies the %t and %p patterns.

pgBadger allows parallel processing of a single log file or multiple files through the use of the -j option specifying the number of CPUs.

If you want to save system performance you can also use log_duration instead of log_min_duration_statement to have reports on duration and number of queries only.

FEATURE

pgBadger reports everything about your SQL queries:

    Overall statistics.
    The most frequent waiting queries.
    Queries that waited the most.
    Queries generating the most temporary files.
    Queries generating the largest temporary files.
    The slowest queries.
    Queries that took up the most time.
    The most frequent queries.
    The most frequent errors.
    Histogram of query times.
    Histogram of sessions times.
    Users involved in top queries.
    Applications involved in top queries.
    Queries generating the most cancellation.
    Queries most cancelled.
    The most time consuming prepare/bind queries

The following reports are also available with hourly charts divided into periods of five minutes:

    SQL queries statistics.
    Temporary file statistics.
    Checkpoints statistics.
    Autovacuum and autoanalyze statistics.
    Cancelled queries.
    Error events (panic, fatal, error and warning).
    Error class distribution.

There are also some pie charts about distribution of:

    Locks statistics.
    Queries by type (select/insert/update/delete).
    Distribution of queries type per database/application
    Sessions per database/user/client/application.
    Connections per database/user/client/application.
    Autovacuum and autoanalyze per table.
    Queries per user and total duration per user.

All charts are zoomable and can be saved as PNG images. SQL queries reported are highlighted and beautified automatically.

pgBadger is also able to parse PgBouncer log files and to create the following reports:

    Request Throughput
    Bytes I/O Throughput
    Average Query Duration
    Simultaneous sessions
    Histogram of sessions times
    Sessions per database
    Sessions per user
    Sessions per host
    Established connections
    Connections per database
    Connections per user
    Connections per host
    Most used reserved pools
    Most Frequent Errors/Events

You can also have incremental reports with one report per day and a cumulative report per week. Two multiprocess modes are available to speed up log parsing, one using one core per log file, and the second using multiple cores to parse a single file. These modes can be combined.

Histogram granularity can be adjusted using the -A command-line option. By default, they will report the mean of each top queries/errors occurring per hour, but you can specify the granularity down to the minute.

pgBadger can also be used in a central place to parse remote log files using a passwordless SSH connection. This mode can be used with compressed files and in the multiprocess per file mode (-J), but cannot be used with the CSV log format.

Examples of reports can be found here: https://pgbadger.darold.net/#reports

REQUIREMENT

pgBadger comes as a single Perl script - you do not need anything other than a modern Perl distribution. Charts are rendered using a JavaScript library, so you don't need anything other than a web browser. Your browser will do all the work.

If you plan to parse PostgreSQL CSV log files, you might need some Perl Modules:

    Text::CSV_XS - to parse PostgreSQL CSV log files.

This module is optional, if you don't have PostgreSQL log in the CSV format, you don't need to install it.

If you want to export statistics as JSON file, you need an additional Perl module:

    JSON::XS - JSON serialising/deserialising, done correctly and fast

This module is optional, if you don't select the json output format, you don't need to install it. You can install it on a Debian-like system using:

    sudo apt-get install libjson-xs-perl

and on RPM-like system using:

    sudo yum install perl-JSON-XS

Compressed log file format is autodetected from the file extension. If pgBadger finds a gz extension, it will use the zcat utility; with a bz2 extension, it will use bzcat; with lz4, it will use lz4cat; with zst, it will use zstdcat; if the file extension is zip or xz, then the unzip or xz utility will be used.

If those utilities are not found in the PATH environment variable, then use the --zcat command-line option to change this path. For example:

    --zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc"
    --zcat="C:\tools\unzip -p"

By default, pgBadger will use the zcat, bzcat, lz4cat, zstdcat and unzip utilities following the file extension. If you use the default autodetection of compression format, you can mix gz, bz2, lz4, xz, zip or zstd files. Specifying a custom value of --zcat option will remove the possibility of mixed compression format.

Note that multiprocessing cannot be used with compressed files or CSV files as well as under Windows platform.

INSTALLATION

Download the tarball from GitHub and unpack the archive as follow:

    tar xzf pgbadger-11.x.tar.gz
    cd pgbadger-11.x/
    perl Makefile.PL
    make && sudo make install

This will copy the Perl script pgbadger to /usr/local/bin/pgbadger by default and the man page into /usr/local/share/man/man1/pgbadger.1. Those are the default installation directories for 'site' install.

If you want to install all under /usr/ location, use INSTALLDIRS='perl' as an argument of Makefile.PL. The script will be installed into /usr/bin/pgbadger and the manpage into /usr/share/man/man1/pgbadger.1.

For example, to install everything just like Debian does, proceed as follows:

    perl Makefile.PL INSTALLDIRS=vendor

By default, INSTALLDIRS is set to site.

POSTGRESQL CONFIGURATION

You must enable and set some configuration directives in your postgresql.conf before starting.

You must first enable SQL query logging to have something to parse:

    log_min_duration_statement = 0

Here every statement will be logged, on a busy server you may want to increase this value to only log queries with a longer duration. Note that if you have log_statement set to 'all', nothing will be logged through the log_min_duration_statement directive. See the next chapter for more information.

pgBadger supports any custom format set in the log_line_prefix directive of your postgresql.conf file as long as it at least specifies a time escape sequence (%t, %m or %n) and a process-related escape sequence (%p or %c).

For example, with 'stderr' log format, log_line_prefix must be at least:

    log_line_prefix = '%t [%p]: '

Log line prefix could add user, database name, application name and client ip address as follows:

    log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '

or for syslog log file format:

    log_line_prefix = 'user=%u,db=%d,app=%a,client=%h '

Log line prefix for stderr output could also be:

    log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '

or for syslog output:

    log_line_prefix = 'db=%d,user=%u,app=%a,client=%h '

You need to enable other parameters in postgresql.conf to get more information from your log files:

    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
    log_temp_files = 0
    log_autovacuum_min_duration = 0
    log_error_verbosity = default

Do not enable log_statement as its log format will not be parsed by pgBadger.

Of course your log messages should be in English with or without locale support:

    lc_messages='en_US.UTF-8'
    lc_messages='C'

pgBadger parser does not support other locales, like 'fr_FR.UTF-8' for example.

LOG STATEMENTS

Considerations about log_min_duration_statement, log_duration and log_statement configuration directives.

If you want the query statistics to include the actual query strings, you must set log_min_duration_statement to 0 or more milliseconds.

If you just want to report duration and number of queries and don't want all details about queries, set log_min_duration_statement to -1 to disable it and enable log_duration in your postgresql.conf file. If you want to add the most common query report, you can either choose to set log_min_duration_statement to a higher value or to enable log_statement.

Enabling log_min_duration_statement will add reports about slowest queries and queries that took up the most time. Take care that if you have log_statement set to 'all', nothing will be logged with log_min_duration_statement.

Warning: Do not enable both log_min_duration_statement, log_duration and log_statement all together, this will result in wrong counter values. Note that this will also increase drastically the size of your log. log_min_duration_statement should always be preferred.

PARALLEL PROCESSING

To enable parallel processing you just have to use the -j N option where N is the number of cores you want to use.

pgBadger will then proceed as follow:

    for each log file
        chunk size = int(file size / N)
        look at start/end offsets of these chunks
        fork N processes and seek to the start offset of each chunk
            each process will terminate when the parser reach the end offset
            of its chunk
            each process write stats into a binary temporary file
        wait for all children processes to terminate
    All binary temporary files generated will then be read and loaded into
    memory to build the html output.

With that method, at start/end of chunks pgBadger may truncate or omit a maximum of N queries per log file, which is an insignificant gap if you have millions of queries in your log file. The chance that the query that you were looking for is lost is near 0, this is why I think this gap is livable. Most of the time the query is counted twice but truncated.

When you have many small log files and many CPUs, it is speedier to dedicate one core to one log file at a time. To enable this behavior, you have to use option -J N instead. With 200 log files of 10MB each, the use of the -J option starts being really interesting with 8 cores. Using this method you will be sure not to lose any queries in the reports.

Here is a benchmark done on a server with 8 CPUs and a single file of 9.5GB.

     Option |  1 CPU  | 2 CPU | 4 CPU | 8 CPU
    --------+---------+-------+-------+------
       -j   | 1h41m18 | 50m25 | 25m39 | 15m58
       -J   | 1h41m18 | 54m28 | 41m16 | 34m45

With 200 log files of 10MB each, so 2GB in total, the results are slightly different:

     Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU
    --------+-------+-------+-------+------
       -j   | 20m15 |  9m56 |  5m20 | 4m20
       -J   | 20m15 |  9m49 |  5m00 | 2m40

So it is recommended to use -j unless you have hundreds of small log files and can use at least 8 CPUs.

IMPORTANT: when you are using parallel parsing, pgBadger will generate a lot of temporary files in the /tmp directory and will remove them at the end, so do not remove those files unless pgBadger is not running. They are all named with the following template tmp_pgbadgerXXXX.bin so they can be easily identified.

INCREMENTAL REPORTS

pgBadger includes an automatic incremental report mode using option -I or --incremental. When running in this mode, pgBadger will generate one report per day and a cumulative report per week. Output is first done in binary format into the mandatory output directory (see option -O or --outdir), then in HTML format for daily and weekly reports with a main index file.

The main index file will show a dropdown menu per week with a link to each week report and links to daily reports of each week.

For example, if you run pgBadger as follows based on a daily rotated file:

0 4 * * * /usr/bin/pgbadger -I -q /var/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

you will have all daily and weekly reports for the full running period.

In this mode, pgBadger will create an automatic incremental file in the output directory, so you don't have to use the -l option unless you want to change the path of that file. This means that you can run pgBadger in this mode each day on a log file rotated each week, and it will not count the log entries twice.

To save disk space, you may want to use the -X or --extra-files command-line option to force pgBadger to write JavaScript and CSS to separate files in the output directory. The resources will then be loaded using script and link tags.

Rebuilding reports

Incremental reports can be rebuilt after a pgbadger report fix or a new feature to update all HTML reports. To rebuild all reports where a binary file is still present, proceed as follow:

rm /path/to/reports/*.js
rm /path/to/reports/*.css
pgbadger -X -I -O /path/to/reports/ --rebuild

it will also update all resource files (JS and CSS). Use -E or --explode if the reports were built using this option.

Monthly reports

By default, pgBadger in incremental mode only computes daily and weekly reports. If you want monthly cumulative reports, you will have to use a separate command to specify the report to build. For example, to build a report for August 2019:

pgbadger -X --month-report 2019-08 /var/www/pg_reports/

this will add a link to the month name into the calendar view of incremental reports to look at monthly report. The report for a current month can be run every day, it is entirely rebuilt each time. The monthly report is not built by default because it could take a lot of time following the amount of data.

If reports were built with the per-database option ( -E | --explode ), it must be used too when calling pgbadger to build monthly report:

pgbadger -E -X --month-report 2019-08 /var/www/pg_reports/

This is the same when using the rebuild option ( -R | --rebuild ).

BINARY FORMAT

Using the binary format it is possible to create custom incremental and cumulative reports. For example, if you want to refresh a pgBadger report each hour from a daily PostgreSQL log file, you can proceed by running the following commands each hour:

pgbadger --last-parsed .pgbadger_last_state_file -o sunday/hourX.bin /var/log/pgsql/postgresql-Sun.log

to generate the incremental data files in binary format. And to generate the fresh HTML report from that binary file:

pgbadger sunday/*.bin

Or as another example, if you generate one log file per hour and you want reports to be rebuilt each time the log file is rotated, proceed as follows:

    pgbadger -o day1/hour01.bin /var/log/pgsql/pglog/postgresql-2012-03-23_10.log
    pgbadger -o day1/hour02.bin /var/log/pgsql/pglog/postgresql-2012-03-23_11.log
    pgbadger -o day1/hour03.bin /var/log/pgsql/pglog/postgresql-2012-03-23_12.log
    ...

When you want to refresh the HTML report, for example, each time after a new binary file is generated, just do the following:

    pgbadger -o day1_report.html day1/*.bin

Adjust the commands to suit your particular needs.

JSON FORMAT

JSON format is good for sharing data with other languages, which makes it easy to integrate pgBadger result into other monitoring tools, like Cacti or Graphite.

AUTHORS

pgBadger is an original work from Gilles Darold.

The pgBadger logo is an original creation of Damien Cazeils.

The pgBadger v4.x design comes from the "Art is code" company.

This web site is a work of Gilles Darold.

pgBadger is maintained by Gilles Darold and everyone who wants to contribute.

Many people have contributed to pgBadger, they are all quoted in the Changelog file.

LICENSE

pgBadger is free software distributed under the PostgreSQL Licence.

pgbadger's People

Contributors

Stargazers

Watchers

Forkers

ireuben badalex vpicavet theory stasic neoeahit vinnix pfigue rjuju fabriziomello whuang-simoly daamien davidfetter jessegonzalez darkixion mromaine chiehwen mulriple petergram ifduyue atmsk avances123 cyril-bouthors jcarnu frost242 zhuomingliang orakle salahaldin tangsty jayon sebastianwebber kenyoon luddic sougatachandra stisti sanga zinsword dsheth atdhemuhaxhiri l1101100 anayrat luisuribe wmorancfi openbrian schmiddy kmk-online evilmartians lansz himanchali rodo samuelckaufman jmdcal brin100 gavioto jberkus lifeles666 anyxgit vocatan glynastill shamanime korriliam eshkinkot murguia fysoft2006 underley sarfarazsoomro davinirjr andersondreger teamcreed mydba samueltovar04 houssemba valterfc eddienko duanyp1991 christophe31 swrd nbluis cenj sheeju bricklen athieme vinicius-ianni aaslin girgen dharamsk cesarmarinhorj taojonesin yuanshengzeng credativ komeda-shinji robertvargason tom--bo microtan seikoudoku2000 dynjnelson vigevenoj kokizzu bitnitdit madtibo

pgbadger's Issues

Reports for syslog format are broken in pgbadger 2.3

Last release introduce a major bug on syslog parser all graphs are broken and year part of the start/end dates are false.

HTML in queries is not escaped

It seems that if you have HTML in a query, it is not escaped and is consequently parsed by the browser.

No execution plan with query when auto_explain enable

With pgfouine when we load the shared_preload_libraries auto_explain. the report generated with pgfouine give the plan with the request found in log.
It is not implemented with pgbadger.
Was it a bug in pgfouine ?
Are you going to reproduce this fonctionality with pgbadger or not ?

Failure to recognize syslog lines that begin with a timestamp

My syslog lines begin with a timestamp like "2013-02-28T10:35:11-05:00" instead of the usual "Mar 6 22:48:34.452 UTC" format. The "obvious" solution is to add a flag or automatic detection of the line format, but perhaps other people do not use this format.

Undefined subroutine &main::tempfile called at /usr/bin/pgbadger line 676.

On CentOS 5.9 x86_64 w/perl-5.8.8-38.el5_8, I get this error with pgbadger 3.1 while using the -j flag:

Undefined subroutine &main::tempfile called at /usr/bin/pgbadger line 676.

The pgbadger command is:

pgbadger -j 2 -q -b "2013-02-26 00:00:00" -e "2013-02-26 23:59:59" -o /tmp/pgbadger/2013-02-26.html /tmp/pgbadger/pglog.2013-02-26*.gz

Using --dbname with csv log format

The csv format has the timestamp, dbname, username, etc., so I assume I do not need to use --log_line_prefix. This isn't clear to me after reading the docs.

However, I cannot use the --dbname or --dbuser arguments to filter my logfiles. In each case, it produces the same report with all log data.

What am I doing wrong?

Re-licensing pgBadger under the PostgreSQL license

Robert Treat asked if we would consider re-licensing pgBadger under the PostgreSQL license (http://opensource.org/licenses/PostgreSQL). I'm agree for that change any comments, objections from contributors ?

Unlikely total duration reported (Queries that took up the most time)

pgbadger is reporting some very wrong data for the duration of queries. 13 minutes for average of executing BEGIN?

1   3359d9h39m59.44s 295,065 16m23.69s SELECT...
2 2930d17h9m50.89s 329,020 12m49.60s COMMIT;
3   2845d54m6.87s 317,910 12m53.21s BEGIN;

Would it help if I provided an example of the CSV file I'm parsing?

Cannot take "-" in the db name

is there a way to include characters in the db name?

say for example a dash? for example db=xxxx-xxxx. currently this is erroring out and pgbadger is not parcing any logs!

new feature: date and time of query

Seems to me it should be pretty easy thing to do. In the list of every top example query after the duration would be nice to see exact date and time of the query. Some thing like:

1m2.24s | 2010-10-10 10:10:10 | SELECT * FROM test1 WHERE ( id = 10 ) ;

not most usefull thing but would be nice to have

Support %m in stderr log files

It would really be nice if pgbadger supported %m in log_line_prefix for stderr format log file so that we could see fractional seconds.

Are there any plans to add this functionality?

Log all requets of a function with pgbadger

Bonjour,

Nous utilisons PgBadger (qui fonctionne très bien) mais nous aimerions savoir s'il est possible d'avoir le détail de l'exécution d'une fonction (toutes les requêtes qui sont exécutées par cette fonction). Car par défaut, PgBadger se contente d'afficher seulement la durée de la fonction entière mais pas le détail.

Error parsing with last source

Hi,

I get this error when i run pgbadger from last source
D:\log>perl D:\programme\pg_badger.git\pgbadger\pgbadger plop.log
[=======================> ] Parsed 84315852 bytes of 84388628 (99.91%), queries: 129873, events: 1
Variable length lookbehind not implemented in regex m/(?<!CLASSSY0B$$)\bALTER\b/ at D:\programme\pg_badger.git\pgbadger\pgbadger line 3915, line 46.

No problem with version 2.2
D:\log>perl D:\programme\pgbadger-2.2\pgbadger plop.log
[=======================> ] Parsed 84315852 bytes of 84388628 (99.91%), queries: 259618, events: 1

Cédric

CSV Log on STDIN Broken

Contrary to documentation, Logfiles cannot be passed on stdin.

# cat 2012-09-20_000000.csv | pgbadger -
FATAL: cannot read csvlog file -. No such file or directory

version:

# pgbadger --version
pgBadger version 2.0

Illegal division by zero if empty log file & txt output

When I have a log file with no DML and am outputting to .txt, it fails at illegal division by zero (note this does not manifest when output is .html)

%> pgbadger -o /tmp/aoeu.txt $PGDATA/pg_log/postgresql-Tue.csv
[======================> ] Parsed 51141 bytes of 54474 (93.88%)
Illegal division by zero at /usr/bin/pgbadger line 986.

The lines in question:

print $fh "SELECT: ", &comma_numbers($overall_stat{'SELECT'}) || 0, " ",
sprintf("%0.2f", ($overall_stat{'SELECT'} * 100) / $total), "%\n";

Cannot parse CSV logs from stdin

This is just a minor gripe; I only report it because that syntax is explicitly used as an example in the man page.

When you try to parse a CSV log from stdin, e.g. with
cat log.csv | pgbadger -
you get this error message:
FATAL: cannot read csvlog file -. No such file or directory

Parsing using syslog format returns no results

Here is a sample line from our logs, with names removed

Sep 30 03:09:01 host postgres[17246]: [ID * local2.info] [5-1] db=,user= LOG: duration: 1014.913 ms statement: SELECT * FROM ..

When I run pgbadger using -f syslog none of the queries show up. If I change log_line_prefix from

log_line_prefix = 'user=%u,db=%d '

log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '

and parse using stderr all but the multiline queries parse correctly.

Any ideas?

Fatal: logfile <filename> must exists!

sudo ./pgbadger /var/lib/postgresql/9.2/main/pg_log/postgresql-2013-02-11_174034.log
//ok

sudo ./pgbadger /var/lib/postgresql/9.2/main/pg_log/postgresql-2013-02-11*
//Fatal: logfile /var/lib/postgresql/9.2/main/pg_log/postgresql-2013-02-11* must exists!

Can't handle syslog-formatted log

./pgbadger --format syslog --graph --outfile report.html  /var/log/syslog

Shows an error for each log statement and produces report without any data.

Sample input:

Jun 30 18:57:47 localhost postgres[5902]: [57-1] 2012-06-30 18:57:47 UTC LOG:  duration: 4063.003 ms  statement: select pg_sleep(4);

Error message:

DEBUG: Unknown syslog line format: Jun 30 18:57:47 localhost postgres[5902]: [57-1] 2012-06-30 18:57:47 UTC LOG:  duration: 4063.003 ms  statement: select pg_sleep(4);

Format queries to be more readable

It's great that queries have syntax highlighting, but it would be even better if you could click on a query (or maybe a format icon next to it) to turn the single line query into a nicely formatted query, similar to the formatting at http://www.dpriver.com/pp/sqlformat.htm

Support for hostnames with port as well as ip numbers

Firstly - great script! Thanks.

Our syslog outputs hostnames not ip numbers. Sometimes the hostname is not there at all.

I've modified the regex for the %r i.e. t_hostport from

'%r' => [('t_hostport', '([local]|\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})?[(\d)]*')],

'%r' => [('t_hostport', '(|[a-z,0-9.](\d)|[local]|\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}(\d*))')],

Is this something you can incorporate into a future release?

Thanks!

PS. Not sure if this allows for local. I don't know if that's valid.

[FEATURE REQUEST] Add --explain and/or --explain-analyze

I'd love to be able to invoke pgbadger and generate an html file that had the EXPLAIN [ANALYZE] output already included. Adding this functionality, would of course, require additional authentication functionality, but would be really useful and cool :)

Also, if --explain-analyze + -v, it could EXPLAIN ANALYZE VERBOSE

Graph rendering incorrectly

Got an extra column with no corresponding column header in my table. Might be because hours do not have their own column, if those are in fact supposed to be hours.

This is whatever version is included in the zip download available on the main page. I presume head? The bottom of the page says 1.0. I'm on Debian/Chromium, for what that's worth.

http://i.imgur.com/rN2UP.png

Thanks. :-)

log_statement and log_min_duration used together

Hello !

At first, many thanks for pgbadger - it's really a great tool.

Now here's my question :
We have several databases for which we log the following :

statements longer than 250ms (log_min_duration = 250)
all ddl statements, regardless of their duration (log_statement = 'ddl')

What would be cool for us is to have reporting on all statements longer than 250ms, including the dll ones.
I reckon that currently, pgbadger will simply ignore all these ddl statements.
Indeed, the duration does not appear on the same line that the statement, but in a following line (log_duration style).

The new option --enable-log_duration would not help either, as in this case it would only compute general statistics on ddl longer than 250ms, and no detail about the statements (n top, ...) at all.

So at this point, without changing our logging method, we cannot use pgbadger to get the top n statements - only the n top "non ddl statements".

Am I correct about this, or did I missed anything ?

From how I see log_statement logging format, I am under the impression that this need could be answered while parsing the log file, by looking for the next "duration only" line with the same pid when the current line is a "no duration" statement.
Obviously that could weight the code really hard, without even talking about performances ... but does it look like something you would want to add to pgbadger ?

Mael.

Having a field named field gets misformatted

For example a query like SELECT * FROM "table" WHERE "table"."field" gets formatted with field in bold red text, as if it were the FIELD keyword in SQL.

Only reference to queries in reports it a count of all queries. No SELECT info, UPDATE info, or specific queries

This is most likely a config issue on my part, but I cant' figure out what. In the reports the general "Queries" count and avg. duration are there, but not for "Select Queries" or "Write Queries". Also there is no information on longest queries, etc.

I truncated my log file and ran a few queries again just to make sure there were no statements from before my logging changes went into effect

Command:

pgbadger  -v -f syslog -s 10 -T "PGBadger Test" --prefix 'user=%u,db=%d,rh=%h,vx=%v ' --enable-log_duration --top 20 -o /var/www/sa-prod-postgres2_general.log.html /var/log/postgresql/query/sa-prod-postgres2_general.log

Output (w/ -v):

DEBUG: Autodetecting log duration format: log_duration is enabled.
DEBUG: Output 'html' reports will be written to /var/www/sa-prod-postgres2_general.log.html
DEBUG: Starting to parse log file: /var/log/postgresql/query/sa-prod-postgres2_general.log
DEBUG: Starting reading file /var/log/postgresql/query/sa-prod-postgres2_general.log...
[========================>] Parsed 486717 bytes of 486717 (100.00%), queries: 496, events: 0
DEBUG: the log statistics gathering took: 1 wallclock secs ( 0.13 usr +  0.00 sys =  0.13 CPU)
DEBUG: Ok, generating html report...
DEBUG: the report generating took: 0 wallclock secs ( 0.04 usr +  0.00 sys =  0.04 CPU)
DEBUG: the total execution time took: 1 wallclock secs ( 0.17 usr +  0.00 sys =  0.17 CPU)

Logging config:

syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = 'user=%u,db=%d,rh=%h,vx=%v '
log_lock_waits = on
log_statement = 'all'

Logged queries:

Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [497-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/0 LOG:  duration: 0.076 ms
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [496-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/254 LOG:  statement: 
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [497-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/0 LOG:  duration: 0.076 ms
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [498-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/255 LOG:  statement: SELECT t.relname, fns.nspname AS foreign_schema, f.relname AS foreign_table, c.conname, c.confkey, a.attname AS foreign_field, a.attnum FROM pg_constraint c LEFT JOIN pg_namespace ns ON (c.connamespace = ns.oid) LEFT JOIN pg_class t ON (c.conrelid = t.oid) INNER JOIN pg_class f ON (c.confrelid = f.oid) LEFT JOIN pg_namespace fns ON (f.relnamespace = fns.oid) INNER JOIN pg_attribute a ON (a.attrelid = f.oid) WHERE a.attnum > 0 AND ns.nspname = 'public' AND t.relname = 'categories' ORDER BY t.relname, c.conname, a.attnum
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [498-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/255 LOG:  statement: SELECT t.relname, fns.nspname AS foreign_schema, f.relname AS foreign_table, c.conname, c.confkey, a.attname AS foreign_field, a.attnum FROM pg_constraint c LEFT JOIN pg_namespace ns ON (c.connamespace = ns.oid) LEFT JOIN pg_class t ON (c.conrelid = t.oid) INNER JOIN pg_class f ON (c.confrelid = f.oid) LEFT JOIN pg_namespace fns ON (f.relnamespace = fns.oid) INNER JOIN pg_attribute a ON (a.attrelid = f.oid) WHERE a.attnum > 0 AND ns.nspname = 'public' AND t.relname = 'categories' ORDER BY t.relname, c.conname, a.attnum
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [499-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/0 LOG:  duration: 1.543 ms
Jan 23 13:56:09 sa-prod-postgres2 postgres[20910]: [499-1] user=postgres,db=abacus_production,rh=172.16.2.21,vx=2/0 LOG:  duration: 1.543 ms

missing semicolon problem

In all tabs of pgbadger out html file, if a query does not contain ";" character at the end, pgbadger can not seperately show related query.And tilll to the first semicolon character it shows all..
I can send some related examples if any want.Can you check please

Timestamp pattern failure under Windows

Log parser fail under Windows because of missing timezone information in timestamp.

empty reports

sorry for the support request in an issues forum. I THINK I set up the Postgres logging configuration exactly like it says in the docs:

log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '

and my log file contains hundreds of lines that look like this:

2012-07-31 23:55:23 EDT [29158]: [3-1] user=ccurvey,db=postgres LOG:  duration: 6.348 ms  statement: BEGIN;SET statement_timeout=30000;COMMIT;SELECT 'S' AS syn, name AS nn, setting AS val
    FROM pg_settings
    WHERE name = 'data_directory'
    OR name ='log_directory'
    UNION ALL
    SELECT 'T' AS syn, spcname AS nn, spclocation AS val
    FROM pg_tablespace
    WHERE spclocation <> ''

But when I run pgbadger over my logs, I get:

Number of unique normalized queries: 0 
Number of queries: 0

and a series of blank graphs. Can anyone see what I'm doing wrong?

Strange average duration issues

Are the graphs for "average duration" supposed to look like this?

http://i.imgur.com/Ihyrx.png

Sorry, if this is intended behavior.

[FEATURE REQUEST] Allow Reports to be stored in the database

Somewhat like pt-query-digest's --review function for MySQL (http://www.percona.com/doc/percona-toolkit/2.1/pt-query-digest.html#cmdoption-pt-query-digest--review).

There could be a pgbadger database that has tables with aggregate and raw information so we could do a bit more interesting reporting via raw SQL.

Rename --name option

--name is too generic. As it helps to filter on the application name, --appname would be better, I think.

Option to exclude types of queries

Would be great to be able to exclude types of queries, specifically COPY and ANALYZE. Perhaps a regex to make it flexible?

empty queries in reports

Just d/l'ed the latest version and no matter what I do, I get no queries in my reports. I tried the -f stderr and have used the log format as described on the site. Just keeps saying queries: 0, events: 0. But I have literally thousands of queries with the duration logged.

Most (but not all) of these are prepared-statements, but are logged with their "Details" (the where-clause parameters are listed).

I could really use some help trying to figure out why pgbadger is not reporting anything about my queries. I probably can upload a sample log file (I am using stderr with the log_collector on 9.0, but I would gzip the file before uploading or emailing it).

Thanks for any tips and tricks.
-AJ

"seekable" is not defined in %File::Temp::EXPORT_TAGS at /usr/bin/pgbadger line 45

On CentOS 5.9 x86_64 I get this error after upgrading to pgbadger 3.1:

"seekable" is not defined in %File::Temp::EXPORT_TAGS at /usr/bin/pgbadger line 45
main::BEGIN() called at /usr/lib/perl5/5.8.8/File/Temp.pm line 45
eval {...} called at /usr/lib/perl5/5.8.8/File/Temp.pm line 45
Can't continue after import errors at /usr/bin/pgbadger line 45
BEGIN failed--compilation aborted at /usr/bin/pgbadger line 45.

perl is the stock CentOS 5.9 rpm: perl-5.8.8-38.el5_8
pgbadger taken from: http://yum.postgresql.org/9.2/redhat/rhel-5-x86_64/pgbadger-3.1-1.rhel5.noarch.rpm

The pgbadger command was:

pgbadger -q -b "2013-02-26 00:00:00" -e "2013-02-26 23:59:59" -o /tmp/pgbadger/2013-02-26.html /tmp/pgbadger/pglog.2013-02-26*.gz

My PostgreSQL 9.2.3 log settings are:

log_destination = 'stderr' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = '/var/log/pgsql' # directory where log files are written,
log_filename = 'pglog.%Y-%m-%d-%H' # log file name pattern,

log_file_mode = 0600 # creation mode for log files,

log_truncate_on_rotation = off # If on, an existing log file with the
log_rotation_age = 1h # Automatic rotation of logfiles will
log_rotation_size = 0 # Automatic rotation of logfiles will

syslog_facility = 'LOCAL0'

syslog_ident = 'postgres'

event_source = 'PostgreSQL'

client_min_messages = notice # values in order of decreasing detail:

log_min_messages = warning # values in order of decreasing detail:

log_min_error_statement = error # values in order of decreasing detail:
log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements

debug_print_parse = off

debug_print_rewritten = off

debug_print_plan = off

debug_pretty_print = on

log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = off

log_error_verbosity = default # terse, default, or verbose messages

log_hostname = off

log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d ' # special values:
log_lock_waits = on # log lock waits >= deadlock_timeout
log_statement = 'none' # none, ddl, mod, all
log_temp_files = 1024 # log temporary files equal or larger
log_timezone = 'America/Denver'

pg configuration

In postgresql.conf what should be the value of the following
lc_messages =
lc_monetary =
lc_numeric =
lc_time =
It doesn't seem to work with 'French_Belgium.1252'

Thanks

Unable to parse 8.2 log

The tool returns pretty much empty data for postgres 8.2.14 log. It is a greenplum environment using 4.0.6 version. And here is the only related config turned-on

Greenplum specific configuration parameters for Master instance database

------------------------------------------------------------------------

log_statement=all

The csv log file content looks like below:

2013-01-30 06:04:27.427986 PST,,,p12117,th1515398256,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2013-01-30 06:02:41 PST",,,,,,,0,,"xlog.c",6239,
2013-01-30 06:04:27.428179 PST,,,p12117,th1515398256,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 98/E3841810",,,,,,,0,,"xlog.c",6338,

dbclient not working

Hi,

I have a csv log file and ganerated a report just fine, but when trying to generate the report using the --dbclient (-c) option it gives me no queries. That particular host has more than 600 sessions so it should have generated something.

Gzipped CSV Log Broken

This works fine if I decompress the file first then run pgbadger on the uncompressed log file, but...

Output when passing a gzipped csv file:

# pgbadger --verbose 2012-09-20_000000.csv.gz
DEBUG: Autodetected log format 'csv' from 2012-09-20_000000.csv.gz
DEBUG: Output 'html' reports will be written to out.html
DEBUG: Starting to parse log file: 2012-09-20_000000.csv.gz
FATAL: cannot use CSV, EIF - Loose unescaped quote
[>                        ] Parsed       483 bytes of 109459647 (0.00%)
DEBUG: the log statistics gathering tooks: 0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
DEBUG: Ok, generating html report...
DEBUG: the generating of reports tooks: 0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
DEBUG: the total execution time tooks: 0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)

Versions:

# perl --version

This is perl, v5.10.0 built for i486-linux-gnu-thread-multi

Copyright 1987-2007, Larry Wall
<SNIP>


# pgbadger --version
pgBadger version 2.0

Long queries go outside the table

If you have a very long query, it pushes out of the table and makes the window scroll horizontally. Seems like it would be better if it wrapped.

zcat on OS X always appends a .Z to the filename (better use gunzip -c)

I just executed pgbadger on my Mac and was suprised that it is not working as expected:

% ls gistfile1.txt.gz 
gistfile1.txt.gz
% ./pgbadger -f syslog gistfile1.txt.gz 
zcat: gistfile1.txt.gz.Z: No such file or directory
[>                        ] Parsed    0 bytes of 6191 (0.00%), queries: 0, events: 0
LOG: Ok, generating html report...
%

I did some debugging and found out that zcat on OS X is running in POSIX mode and will always append .Z to all filenames (man zcat):

In POSIX mode zcat will append .Z to all filenames that do not have that suffix.

Running zcat with a simple .gz file results in an error:

% zcat gistfile1.txt.gz 
zcat: gistfile1.txt.gz.Z: No such file or directory

Possible solution: Use gunzip -c which is more portable (runs on my Linux and my Mac box).

% gunzip -c gistfile1.txt.gz
Mar  1 05:00:06 db01 postgres[20166]: [260-1] 10.4.20.108(42162):user@database:[20166]: LOG:  duration: 2646.638 ms  execute : SELECT id,count(*) as column FROM table WHERE id in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, $100) AND state in ($101, $102, $103, $104, $105) AND creation_date >= $106 and       creation_date <= $107 group by id
...

!= getting replaced by $$CLASSSY0A$$!=$$CLASSSY0B$$ in the output

Incorrect parse with custom prefix

Hello, I have this prefix in Postgres conf:

log_line_prefix = '%t ' # special values:

And lines like this in my log:

2012-10-22 09:54:23 MSK LOG: checkpoints are occurring too frequently (3 seconds apart)
2012-10-22 09:54:23 MSK HINT: Consider increasing the configuration parameter "checkpoint_segments".
2012-10-22 09:54:24 MSK LOG: checkpoints are occurring too frequently (1 second apart)
2012-10-22 09:54:24 MSK HINT: Consider increasing the configuration parameter "checkpoint_segments".
2012-10-22 09:54:26 MSK LOG: checkpoints are occurring too frequently (2 seconds apart)
2012-10-22 09:54:26 MSK HINT: Consider increasing the configuration parameter "checkpoint_segments".
2012-10-25 12:31:49 MSK FATAL: remaining connection slots are reserved for non-replication superuser connections

But when I run:

$ pgbadger --prefix '%t ' -f stderr log.tar.bz2
gzip: log.tar.bz2: not in gzip format
........................................................................................................................................................................................................................................................................................................................
$

In out.html one event only that is all log.

no queries in report

Hello,

I've been testing pgbadger 2.2 for a while and has been working fine. I've let it work on an cronjob and let it go for a while.

Reports where great during the couple days I was testing it. But now my reports only show session connections and host/database/user connections, but no query analyses.
The odd thing is that the logs I feed him has queries in it (FYI I set log_min_duration_statement = 2000 ).
I don't understand how it happened all of the sudden since it's been working correctly for some time before.

Here's the command line I user with all the options :

/usr/bin/pgbadger -f stderr --prefix '%t [%p]: [%l-1] user=%u, db=%d ' -o /path/to/logs/index.html /path/to/logs/*

Options for which reports to generate

Would be great to be able to specific which reports you want and don't want on the page.

Possibility to get log from stdin

Replacement of query parameters is not working with long queries

I encountered some issue with log files that contain long queries. Only the first parameters are replaced by its actual values. It looks like the line breaks in the parameter list are causing some trouble.

This is a minimal example of a log file that can be processed via pgbadger -f syslog foo: https://gist.github.com/oxydo/5064008

pgbadger incorrectly dies when given an empty file

%> pgbadger $PGDATA/pg_log/postgresql-Wed.csv
WARNING: file /data/pgsql/pg_log/postgresql-Wed.csv is empty
FATAL: you must give a log file as command line parameter.
...

When automating reports, this behavior causes issue because it's not technically correct. I have given a log file, but there are no parse-able events. So I believe the correct behavior would be to report as if there were zero events.

3 decimal places on timings

Hey,
Just got this running on a windows server.
Works great!! just a couple queries.

Can the durations have 3 decimal places, so that ms can be seen?
In the windows batch file that I run,
__END__ :endofperl is displayed at the very top above the heading. Not a very big deal but just thought I would raise it. I'm using Strawberry Perl and NMake from VS2012.

Cheers,
Adam

wrong data aggregation when using extended query protocol

hi.

it looks like when using the extended protocol (parse/bind/execute) the aggregation of statements somehow fails.

eg: when i run a simple script against the db with just a single query being prepared, than binding some variables and finally executing the query, i will get the same query reported twice.
It doesn't matter whether it is a SELECT or UPDATE, (or whatever) the result i get when processing the logfile is that those queries are reported twice.

for me it seems that the query gets reported for every execute (which is okay) and also for the binding part...
so i'm finally getting the query reported with a runtime of several ms plus an extra entry of the same query with 0 ms (=bind part)

for my testing i used pgbadger 2.2 and postgres 9.1.6 with the recommended log_line_prefix

postgres-log: https://gist.github.com/4223968
used system call: pgbadger -f stderr -p '%t [%p]: [%l-1] db=%d,user=%u ' -o /tmp/pgbader.html postgresql-Thu.log

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.