
pgreplay - record and replay real-life database workloads

pgreplay reads a PostgreSQL log file (not a WAL file), extracts the SQL statements and executes them in the same order and with the original timing against a PostgreSQL database.

If the execution of statements gets behind schedule, warning messages are issued that indicate that the server cannot handle the load in a timely fashion.

A final report gives you a useful statistical analysis of your workload and its execution.

The idea is to replay a real-world database workload as exactly as possible.

This is useful for performance tests, particularly in the following situations:

  • You want to compare the performance of your PostgreSQL application on different hardware or different operating systems.
  • You want to upgrade your database and want to make sure that the new database version does not suffer from performance regressions that affect you.

Moreover, pgreplay can give you some feeling as to how your application might scale by allowing you to try to replay the workload at a higher speed (if that is possible; see implementation details below). Be warned, though, that 500 users working at double speed is not really the same as 1000 users working at normal speed.
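As a concrete sketch: judging from the options used in the issue reports further down, -s sets the speed factor, so a double-speed replay of a previously created replay file (the file name is hypothetical) might look like this:

# replay at twice the original speed
pgreplay -r -s 2.0 statements.replay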

While pgreplay will find out if your database application will encounter performance problems, it does not provide a lot of help in the analysis of the cause of these problems. Combine pgreplay with a specialized analysis program like pgBadger for that.

As an additional feature, pgreplay lets you split the replay in two parts: you can parse the log file and create a "replay file", which contains just the statements to be replayed and is hopefully much smaller than the original log file.
Such a replay file can then be run against a database.
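A hedged sketch of this two-phase approach, using the -f/-o (parse only, write a replay file), -c (CSV input), -r (replay) and -h (host) options that also appear in the issues below; file and host names are examples:

# phase 1: parse a CSV log and write a compact replay file
pgreplay -f -c -o statements.replay postgresql.csv

# phase 2: replay the file against the test database
pgreplay -r -h testhost statements.replay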

pgreplay is written by Laurenz Albe and is inspired by "Playr", which never made it out of beta.

Installation

pgreplay needs PostgreSQL 8.0 or better.

It is supposed to compile without warnings and run on all platforms supported by PostgreSQL.
Since I only got to test it on Linux, AIX, FreeBSD and Windows, there may be problems with other platforms. I am interested in reports and fixes for these platforms.
On Windows, only the MinGW build environment is supported (I have no other compiler). That means that there is currently no 64-bit build for Windows (but a 32-bit executable should work fine anywhere).

To build pgreplay, you will need the pg_config utility. If you installed PostgreSQL using installation packages, you will probably have to install the development package that contains pg_config and the header files.

If pg_config is on the PATH, the installation process will look like this:

  • unpack the tarball
  • ./configure
  • make
  • make test (optional, described below)
  • make install (as superuser)

If your PostgreSQL installation is in a nonstandard directory, you will have to use the --with-postgres=<path to location of pg_config> option of configure.
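Assuming a nonstandard PostgreSQL location, the whole sequence might look like this sketch (the version number and the pg_config path are placeholders):

tar xzf pgreplay-<version>.tar.gz
cd pgreplay-<version>
./configure --with-postgres=/opt/postgres/bin
make
make test        # optional
sudo make install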

Unless you link it statically, pgreplay requires the PostgreSQL client shared library on the system where it is run.

The following utilities are only necessary if you intend to develop pgreplay:

  • autoconf 2.62 or better to generate configure
  • GNU tar to make tarball (unless you want to roll it by hand)
  • groff to make the HTML documentation with make html

Docker

The Dockerfile provided with the software can be used as a starting point for creating a container that runs pgreplay. Adapt it as necessary.

Here are commands to build and run the container:

# build the image
docker build -t laurenz/pgreplay -f Dockerfile .

# and run it
docker run --rm -ti -v $(pwd):/app -w /app laurenz/pgreplay pgreplay -h

Testing

You can run a test on pgreplay before installing by running make test. This will parse sample log files and check that the result is as expected.

Then an attempt is made to replay the log files and check if that works as expected. For this you need psql installed and a PostgreSQL server running (on this or another machine) so that the following command will succeed:

psql -U postgres -d postgres -l

You can set up the PGPORT and PGHOST environment variables and a password file for the user if necessary.

There have to be login roles named hansi and postgres in the database, and both users must be able to connect without a password. Only postgres will be used to run actual SQL statements. The regression test will create a table runtest and use it, and it will drop the table when it is done.
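A minimal sketch of these prerequisites (host and port are examples; authentication must allow password-less login, for example via trust or a password file):

export PGHOST=localhost
export PGPORT=5432

# create the role the regression test expects (postgres usually exists already)
psql -U postgres -c "CREATE ROLE hansi LOGIN"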

Usage

First, you will need to record your real-life workload. For that, set the following parameters in postgresql.conf:

  • log_min_messages = error (or more)
    (if you know that you have no cancel requests, log will do)
  • log_min_error_statement = log (or more)
  • log_connections = on
  • log_disconnections = on
  • log_line_prefix = '%m|%u|%d|%c|' (if you don't use CSV logging)
  • log_statement = 'all'
  • lc_messages must be set to English (the encoding does not matter)
  • bytea_output = escape (from version 9.0 on, only if you want to replay the log on 8.4 or earlier)

It is highly recommended that you use CSV logging, because anything that the PostgreSQL server or any loaded modules write to standard error will be written to the stderr log and might confuse the parser.
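If your server is version 9.4 or later, here is a sketch of setting this up with ALTER SYSTEM instead of editing postgresql.conf by hand. log_destination and logging_collector enable the recommended CSV logging, which also makes log_line_prefix unnecessary; note that changing logging_collector requires a server restart, not just a reload:

psql -U postgres <<'EOF'
ALTER SYSTEM SET log_min_messages = error;
ALTER SYSTEM SET log_min_error_statement = log;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET lc_messages = 'C';          -- 'C' yields English messages
ALTER SYSTEM SET log_destination = 'csvlog';
ALTER SYSTEM SET logging_collector = on;     -- needs a restart to take effect
SELECT pg_reload_conf();
EOF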

Then let your users have their way with the database.

Make sure that you have a pg_dumpall of the database cluster from the time of the start of your log file (or use the -b option with the time of your backup). Alternatively, you can use point-in-time recovery to clone your database at the appropriate time.

When you are done, restore the database (in the "before" state) to the machine where you want to perform the load test and run pgreplay against that database.
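A minimal sketch of that cycle with standard PostgreSQL tools (host placement and file names are examples):

# on the production host, immediately before logging starts
pg_dumpall -U postgres -f cluster_backup.sql

# on the test host, restore the "before" state
psql -U postgres -d postgres -f cluster_backup.sql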

Try to create a scenario as similar to your production system as possible (except for the change you want to test, of course). For example, if your clients connect over the network, run pgreplay on a different machine from where the database server is running.

Since passwords are not logged (and pgreplay consequently has no way of knowing them), you have two options: either change pg_hba.conf on the test database to allow trust authentication or (if that is unacceptable) create a password file as described by the PostgreSQL documentation. Alternatively, you can change the passwords of all application users to one single password that you supply to pgreplay with the -W option.
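A sketch of the password file variant (the ~/.pgpass line format host:port:database:user:password is from the PostgreSQL documentation; the values are examples):

echo 'testhost:5432:*:appuser:secret' >> ~/.pgpass
chmod 600 ~/.pgpass   # libpq ignores the file if it is group- or world-readable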

Limitations

pgreplay can only replay what is logged by PostgreSQL. This leads to some limitations:

  • COPY statements will not be replayed, because the copy data are not logged. I could have supported COPY TO statements, but that would have imposed a requirement that the directory structure on the replay system must be identical to the original machine. And if your application runs on the same machine as your database and they interact on the file system, pgreplay will probably not help you much anyway.
  • Fast-path API function calls are not logged and will not be replayed. Unfortunately, this includes the Large Object API.
  • Since the log file is always written in the database encoding (which you can specify with the -E switch of pgreplay), all SET client_encoding statements will be ignored.
  • If your cluster contains databases with different encodings, the log file will have mixed encoding as well. You cannot use pgreplay well in such an environment, because many statements against databases whose encoding does not match the -E switch will fail.
  • Since the preparation time of prepared statements is not logged (unless log_min_messages is debug2 or more), these statements will be prepared immediately before they are first executed during replay.
  • All parameters of prepared statements are logged as strings, no matter what type was originally specified during bind. This can cause errors during replay with expressions like $1 + $2, which will cause the error operator is not unique: unknown + unknown.
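The last limitation is easy to reproduce in psql, because untyped string literals behave like the string-typed bind parameters that pgreplay sends:

psql -c "SELECT '1' + '2'"
# ERROR:  operator is not unique: unknown + unknown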

While pgreplay makes sure that commands are sent to the server in the order in which they were originally executed, there is no way to guarantee that they will be executed in the same order during replay: Network delay, processor contention and other factors may cause a later command to "overtake" an earlier one. While this does not matter if the commands don't affect each other, it can lead to SQL statements hitting locks unexpectedly, causing replay to deadlock and "hang". This is particularly likely if many different sessions change the same data repeatedly in short intervals.

You can work around this problem by canceling the waiting statement with pg_cancel_backend. Replay should continue normally after that.
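A hedged sketch of that workaround (pg_stat_activity.wait_event_type exists from PostgreSQL 9.6 on; on older versions, check the waiting column instead):

# cancel every backend that is currently waiting for a lock
psql -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE wait_event_type = 'Lock'"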

Implementation details

pgreplay will track the "session ID" associated with each log entry (the session ID uniquely identifies a database connection). For each new session ID, a new database connection will be opened during replay. Each statement will be sent on the corresponding connection, so transactions are preserved and concurrent sessions cannot get in each other's way.

The order of statements in the log file is strictly preserved, so there cannot be any race conditions caused by different execution speeds on separate connections. On the other hand, that means that long running queries on one connection may stall execution on concurrent connections, but that's all you can get if you want to reproduce the exact same workload on a system that behaves differently.

As an example, consider this (simplified) log file:

session 1|connect
session 2|connect
session 1|statement: BEGIN
session 1|statement: SELECT something(1)
session 2|statement: BEGIN
session 2|statement: SELECT something(2)
session 1|statement: SELECT something(3)
session 2|statement: ROLLBACK
session 2|disconnect
session 1|statement: COMMIT
session 1|disconnect

This will cause two database connections to be opened, so the ROLLBACK in session 2 will not affect session 1. If SELECT something(2) takes longer than expected (longer than it did in the original), that will not stall the execution of SELECT something(3) because it runs on a different connection. The ROLLBACK, however, has to wait for the completion of the long statement. Since the order of statements is preserved, the COMMIT on session 1 will have to wait until the ROLLBACK on session 2 has started (but it does not have to wait for the completion of the ROLLBACK).

pgreplay is implemented in C and makes heavy use of asynchronous command processing (which is the reason why it is implemented in C). This way a single process can handle many concurrent connections, which makes it possible to get away without multithreading or multiprocessing.

This avoids the need for synchronization and many portability problems. But since TINSTAAFL, the choice of C brings along its own portability problems. Go figure.

Replay file format

The replay file is a binary file; integer numbers are stored in network byte order.

Each record in the replay file corresponds to one database operation and is constructed as follows:

  • 4-byte unsigned int: log file timestamp in seconds since 2000-01-01
  • 4-byte unsigned int: fractional part of log file timestamp in microseconds
  • 8-byte unsigned int: session id
  • 1-byte unsigned int: type of the database action:
    • 0 is connect
    • 1 is disconnect
    • 2 is simple statement execution
    • 3 is statement preparation
    • 4 is execution of a prepared statement
    • 5 is cancel request
  • The remainder of the record is specific to the action; strings are stored with a preceding 4-byte unsigned int that contains the length. Read the source for details.
  • Each record is terminated by a new-line character (byte 0x0A).
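As a sketch, the fixed part of the first record can be inspected with standard tools. Note that od decodes integers in host byte order, so on little-endian machines the values appear byte-swapped relative to the file's network byte order (the file name is hypothetical):

# the fixed part of the first record is 4 + 4 + 8 + 1 = 17 bytes
xxd -l 17 statements.replay

od -A d -j 0 -N 4 -t u4 statements.replay    # timestamp: seconds since 2000-01-01
od -A d -j 4 -N 4 -t u4 statements.replay    # timestamp: microsecond fraction
od -A d -j 8 -N 8 -t u8 statements.replay    # session id
od -A d -j 16 -N 1 -t u1 statements.replay   # action type (0-5)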

Support

If you have a problem or question, the preferred option is to open an issue. This requires a GitHub account.

Professional support can be bought from CYBERTEC PostgreSQL International GmbH.

TODO list

Nothing currently. Tell me if you have good ideas.

Contributors

ghunti, laurenz, mvives-broadsign


Issues

First run always lags behind

Hi.
I have a problem that I cannot explain or debug further, even though I've read all the closed issues here.

I have a CSV log covering 1 second, with 527 lines: connections, deallocates, prepares, and executes; all queries are SELECTs.

When I start a new database and run pgreplay -r -d 1 -j -h <host> -W <pass> against it, the process takes around 5 minutes to execute. If I run the same command a second time, it takes 1 second to complete.

At first, I suspected that this was due to some PostgreSQL cache and that the database needs to "warm up" to start performing, so I tried the same process with a log containing 1 hour's worth of queries. After 2 hours of running, only 12 minutes of the log were executed, so I stopped the process. I re-ran the same command, and the first 12 minutes of the log were completed within 12 minutes, but then it started to lag again.

I've counted the number of lines in each second, because I've read in #18 that there seems to be a limit of 2000 queries per second; while in some seconds I do reach those numbers, in the vast majority I'm far from it.

I've seen this problem with databases on hardware that is the same as or better than the original database's.

Can you help me somehow understand this, or help me debug it?

I really appreciate any help you can provide.

Compiler warnings on MacOS

The readme requested feedback when building for platforms not specifically tested.

OSX Mojave 10.14
Postgresql 11.1
Xcode 10.1

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

pgreplay successfully compiles and appears to work, but with warnings. See attachment for the build log. Most warnings seem to be related to session_id and format string mismatches.

pgreplay-build.txt

AWS RDS support

Amazon's cloud database product, RDS, does not allow changing log_line_prefix. See docs and forum post. So we are stuck with %t:%r:%u@%d:[%p]:. It would be great if pgreplay could be made to work with logs coming from AWS RDS. pgbadger allows specifying the log_line_prefix. Perhaps pgreplay could do something similar.

pgreplay not working with RDS csvlog

Hi, I was wondering what format pgreplay expects the CSV file to be generated in, as I have a very large log file with lots of statements in it, but it seems to skip all of them.

I put debug level up to 3 and I can see the lines are getting skipped but can't really see a reason why.

Here's a sample log line that I would have expected to be parsed as a statement but wasn't. I'm happy to run through the log and re-order or modify values to make it work.
2019-10-09 08:08:17.895 UTC,"obscured_username","obscured_database",24948,"172.31.39.150:52114",5d9d80de.6174,1,"SELECT",2019-10-09 06:40:30 UTC,3/221898308,0,LOG,00000,"duration: 0.033 ms execute <unnamed>: SELECT ""public"".""users"".""families_id"" FROM ""public"".""users"" INNER JOIN ""public"".""relations"" ON ""public"".""relations"".""user_id"" = ""public"".""users"".""id"" WHERE ""public"".""relations"".""child_id"" = $1 AND ""public"".""relations"".""admin"" = $2 /*application:Application_name,controller_with_namespace:Api::V3::ActivityController,action:show*/","parameters: $1 = '123', $2 = 't'",,,,,,,,"Passenger AppPreloader: /myapp (forking...)"

Here's some of the output from debug at the end:

Entering parse_csvlog_entry
Entering read_log_line, current line number 23152306
Encountered EOF
Leaving read_log_line
Leaving parse_errlog_entry
End-of-file encountered, creating disconnect item for session 0x6745718026424814585
Entering replay_create_disconnect
Leaving replay_create_disconnect
Entering remove_connection for session 0x6745718026424814585
Entering remove_all_pstmts
Leaving remove_all_pstmts
Leaving remove_connection
---------------------------
Item: time       = 623925914.117000
      session id = 0x6745718026424814585
      type       = disconnect
---------------------------
Leaving parse_provider
Entering file_consumer
Entering replay_free
Leaving replay_free
Leaving file_consumer
Entering parse_provider
Entering parse_csvlog_entry
Entering read_log_line, current line number 23152306
Encountered EOF
Leaving read_log_line
Leaving parse_errlog_entry
End-of-file encountered, signal end
Leaving parse_provider
Entering parse_provider_finish

Parse statistics
================

Log lines read: 23152305
Total SQL statements processed: 0
Simple SQL statements processed: 0
Parametrized SQL statements processed: 0
Named prepared SQL statements executions processed: 0
Cancel requests processed: 0
Fast-path function calls ignored: 0
Duration of recorded workload: 36 minutes 53.544 seconds
Leaving parse_provider_finish
Entering file_consumer_finish
Leaving file_consumer_finish

replay and uuids

Question: Any suggestions for dealing with UUIDs when doing a replay? The problem is that during a replay, an insert will generate a different UUID for, say, a primary key than when it ran in production. So any queries that use the UUID that was generated in production no longer work. Anything that can be done about this?

Upgrade from pg 12 to pg 14

I have a Postgres DB, version 12.11. I replayed logs of 1 hour successfully; transactions per second were about 10,000-15,000.
When I upgraded the database to 14.4 (and ran ANALYZE afterwards) and replayed the same logs, this changed drastically and transactions per second dropped to about 1,000. Could you assist with this?

PGReplay query limit ?

Hi,
I'm currently working with pgreplay in order to evaluate a migration for my company and I am facing this kind of message:
Execution is 3 minutes behind schedule

I'm running pgreplay and Postgres on two separate servers, and the Postgres server does not seem to have any load issue at all (CPU/RAM/I/O are good). The pgreplay server, however, has had one CPU at 100% for a couple of hours now.
The pgreplay file I'm replaying contains around 20M records over a ~20-hour timeframe, all read-only queries (it's traffic from a PostgreSQL hot standby).
Is it possible that we are hitting an issue where the machine running pgreplay is not powerful enough? (It's a c5.large on AWS.)

P.S.: I am also using the -j option of pgreplay, but seeing that the replay is still running after 12+ hours, I don't think it changes anything in our case :)

connection to database failed for a specific session

Hi
I have a "replay file" that was generated using the pgreplay -f command. But when I try to replay the statements against the DB, the connection to DB fails at a specific session and the test halts.

This is the error message:

Starting database connection for session <session_id>
Error: connection to database failed: could not translate host name "<the_db_host>" to address: System error

Entering replay_free
Leaving replay_free
Leaving database_consumer
Entering file_provider_finish
Leaving file_provider_finish
Entering database_consumer_finish
Error: not all database connections closed
Leaving database_consumer_finish

The connection to the DB was successful when using the psql command, but it fails during pgreplay.
I'm not sure why the DB connection is failing for one particular session.

Can anyone help me identify the cause of this problem?

incorrect bind variables substitution

Hi Laurenz,

The following seems to be a case of incorrect bind variable substitution.
For example:

On the first server, PostgreSQL logged:

execute : WITH main_query AS (select row_number() over () rn, dc.* from (SELECT * FROM(
SELECT ID, concatwithdelimiter(code_statistic,statistic_short) as TEXT FROM report_statistic
) tbl WHERE 1=1
AND ( ( ( (ID = $1) ) ) ) ORDER BY TEXT asc) dc) SELECT main_query.* FROM main_query WHERE rn > $2 AND rn <= $3 + $4" parameters: $1 = '26', $2 = '0', $3 = '0', $4 = '29'

On the second server, with pgreplay, PostgreSQL logged the error:

operator is not unique: unknown + unknown
Could not choose a best candidate operator. You might need to add explicit type casts. WITH main_query AS (select row_number() over () rn, dc.* from (SELECT * FROM(
SELECT ID, concatwithdelimiter(code_statistic,statistic_short) as TEXT FROM report_statistic
) tbl WHERE 1=1
AND ( ( ( (ID = $1) ) ) ) ORDER BY TEXT asc) dc) SELECT main_query.* FROM main_query WHERE rn > $2 AND rn <= $3 + $4

Is there some way to avoid such errors when replaying with pgreplay?

Regards,
Yury

Tests are failing after installation

Hi there,

I successfully installed pgreplay on an M1 Mac Mini. However, before investing time in using it, I wanted to run the tests to ensure that my installation was really successful.

It didn't work as intended:

laurentmeyer@Laurents-Mini pgreplay % make test
cd test && ./runtest.sh
Testing stderr log parsing ...
ok
Testing CSV log parsing ...
ok
Testing replay ...
unexpected output, difference to expected:
0a1,2
> Execution is 10 seconds behind schedule
>
make: *** [test] Error 1

Did I do something wrong? I checked:

  • that the connection with the DB is working
  • that the paths in the test runner correspond to the actual binaries on my machine

make install fails on Mac OS X

configure.in uses AC_PROG_INSTALL, and install-sh is bundled for systems that lack the BSD-compatible install autoconf looks for.

However, Makefile.in uses "-D" in a way that is not supported by the BSD install on my Mac OS X 10.14.4 system or by the bundled install-sh script. I forced the latter by passing INSTALL=./install-sh to configure; make install still fails.

The Mac OS 10.14.4 install reports these options:
usage: install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
               [-o owner] file1 file2
       install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
               [-o owner] file1 ... fileN directory
       install -d [-v] [-g group] [-m mode] [-o owner] directory ...

The bundled install-sh reports:

./install-sh --help
Usage: ./install-sh [OPTION]... [-T] SRCFILE DSTFILE
or: ./install-sh [OPTION]... SRCFILES... DIRECTORY
or: ./install-sh [OPTION]... -t DIRECTORY SRCFILES...
or: ./install-sh [OPTION]... -d DIRECTORIES...

In the 1st form, copy SRCFILE to DSTFILE.
In the 2nd and 3rd, copy all SRCFILES to DIRECTORY.
In the 4th, create DIRECTORIES.

Options:
--help display this help and exit.
--version display version info and exit.

-c (ignored)
-C install only if different (preserve the last data modification time)
-d create directories instead of installing files.
-g GROUP chgrp installed files to GROUP.
-m MODE chmod installed files to MODE.
-o USER chown installed files to USER.
-s strip installed files.
-t DIRECTORY install into DIRECTORY.
-T report an error if DSTFILE is a directory.

Environment variables override the default commands:
CHGRPPROG CHMODPROG CHOWNPROG CMPPROG CPPROG MKDIRPROG MVPROG
RMPROG STRIPPROG

This is clearly a minor issue I can work around; but ideally Makefile.in would stick to autoconf-supported install options so that, worst case, the provided install-sh works and, better case, any BSD-compatible install would work (probably using a separate mkdir step to avoid depending on incompatible install behavior).

Queries per second

Hi,

I'm finding that the replay I am running frequently falls behind, and it seems to be related to how quickly pgreplay can replay the statements rather than to the database itself. It seems to top out around 2000 queries per second, which isn't indicative of our real workload.

What do you suggest for a strategy to simulate a higher load closer to the real one?

I could kick off multiple pgreplay runs simultaneously by splitting my CSV file into equal parts by session ID; do you think that could work?

unknown type 58 encountered

Hi,

I'm getting this error when I try to replay a file:

bash-4.2$ pgreplay -r /vagrant/postgresql-2015-01-15_211158.csv
Error: unknown type 58 encountered

This is a CentOS 7.0 box, with PostgreSQL 9.3

bash-4.2$ uname -a
Linux local-test-catalog-db-01 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
bash-4.2$ psql --version
psql (PostgreSQL) 9.3.5

Thanks
-T

Generating undo records by using pgreplay

Hi,

Thanks a lot for making this amazing program. This will be super-useful for my future projects.

I just have one question. I need to manually craft an anti-WAL file that will undo certain SQL operations done from within a regular WAL file. Suppose the SQL server has run queries 1~100. At this point, I want the SQL server to undo queries 10, 20, and 30 (suppose these 3 queries are INSERTs into the same table). One way to do this is to create 3 SQL queries that DELETE the records inserted by queries 10, 20, and 30. However, I want the SQL server to do this deletion operation not via SQL queries, but via low-level read/write operations like the ones specified in the WAL file. The reason is that SQL queries could contain complex conditions using multiple JOINs, which will be slow. So I instead try to create a special WAL file which will make the SQL server undo queries 10, 20 and 30 once it starts up. I wonder if I could make such a program, which crafts the WAL file, by slightly modifying or leveraging your pgreplay source code. If you think this is possible, could you please give me some hint on how & where I could start from your source files? I would really appreciate your help.

Will PQgetResult cause stat_stmt statistics errors?

if (PQisBusy(conn->db_conn)) {
    /* more to read */
    all_idle = 0;
} else {
    /* read and discard all results */
    while (NULL != (result = PQgetResult(conn->db_conn))) {
        /* count statements and errors for statistics */
        ++stat_stmt; /* <---- here */
        result_status = PQresultStatus(result);

Hello, I have a question: if the results cannot be read in one go and PQgetResult has to be called multiple times, will there be a problem with stat_stmt here?

Non-strict mode

In some cases, we have problems with some lines of the logs and want to just exclude such log lines from consideration. Right now that's impossible: pgreplay will print an error and stop parsing.

It would be great to have an option to parse the log file in a "non-strict" mode, where incorrect log lines (those that have some parsing issues, like unmatched quotes) lead to a Warning instead of a hard Error, so parsing continues. It would also be good to see how many Warnings were collected during parsing (how many lines we've "lost").

Stuck on select() syscall

Hello,

I tried to replay the log of a pgbench test. pgreplay successfully parsed the log files:

Parse statistics
================

Log lines read: 50082
Total SQL statements processed: 50082
Simple SQL statements processed: 50082
Parametrized SQL statements processed: 0
Named prepared SQL statements executions processed: 0
Cancel requests processed: 0
Fast-path function calls ignored: 0

But when I replay, it seems stuck on a do_select call:

postgres@pg11:~$ pgreplay -r -j -s1 /tmp/replay.log 
Execution is 10 seconds behind schedule
Execution is 30 seconds behind schedule

Here is a perf report for the specific PID:

  14.55%  [kernel]          [k] entry_SYSCALL_64
  12.12%  [kernel]          [k] syscall_return_via_sysret
  10.59%  [kernel]          [k] do_select

With gdb, it seems stuck on

rc = select(n, rfds, wfds, xfds, timeout);

I first tried with master, manually compiled, on Arch Linux with kernel 4.20.6-arch1-1-ARCH.

Then I tried with Debian stretch inside an LXC container (so, same kernel as above).

Maybe there is something wrong with this syscall under recent kernels? I read in this post that select performs worse than epoll: https://jvns.ca/blog/2017/06/03/async-io-on-linux--select--poll--and-epoll/

Note: my goal is to sample SQL workload with pg_sampletolog and replay it with pgreplay, adjusting the speed factor.

Thanks

Will the program support pg11 onward?

When I run pgreplay, it just won't parse, giving the error: parsing line 1: log message does not begin with a log type.
I have no idea which format is acceptable beyond setting log_line_prefix as in the instructions. Thanks.

Transaction Deadlocks on Replay

I've been using pgreplay to test some theories out, but whenever I use it with a high-concurrency workload, I immediately hit transaction deadlocks because the transactions are being started/ended slightly out-of-sync with respect to the original server. Is it possible to somehow increase the resolution with which transactions are started and stopped? When using the CSV file format, is it possible to use the transaction ID to only start a transaction in the same order that it originally was started?

docker build Fails with Error (from master, commit a2f7c113197ed5623b35f09bfc299aae971d56e8)

When checking out the current revision at master (as of writing: https://github.com/laurenz/pgreplay/tree/a2f7c113197ed5623b35f09bfc299aae971d56e8) and running the command docker build -t laurenz/pgreplay -f Dockerfile ., the following error occurs:

 > [ 4/11] RUN apt-get update &&     apt-get install --no-install-recommends -y tzdata make gcc libc6-dev postgresql-14 libpq-dev postgresql-doc-14 git ca-certificates &&     apt-get clean && rm -rf /var/lib/apt/lists/*:                                          
0.743 Ign:1 http://ports.ubuntu.com/ubuntu-ports kinetic InRelease                                                                                                                                                                                                    
0.893 Ign:2 http://ports.ubuntu.com/ubuntu-ports kinetic-updates InRelease                                                                                                                                                                                            
1.041 Ign:3 http://ports.ubuntu.com/ubuntu-ports kinetic-backports InRelease                                                                                                                                                                                          
1.186 Ign:4 http://ports.ubuntu.com/ubuntu-ports kinetic-security InRelease                                                                                                                                                                                           
1.335 Err:5 http://ports.ubuntu.com/ubuntu-ports kinetic Release
1.335   404  Not Found [IP: 185.125.190.36 80]
1.485 Err:6 http://ports.ubuntu.com/ubuntu-ports kinetic-updates Release
1.485   404  Not Found [IP: 185.125.190.36 80]
1.634 Err:7 http://ports.ubuntu.com/ubuntu-ports kinetic-backports Release
1.634   404  Not Found [IP: 185.125.190.36 80]
1.779 Err:8 http://ports.ubuntu.com/ubuntu-ports kinetic-security Release
1.779   404  Not Found [IP: 185.125.190.36 80]
1.800 Reading package lists...
1.820 E: The repository 'http://ports.ubuntu.com/ubuntu-ports kinetic Release' does not have a Release file.
1.820 E: The repository 'http://ports.ubuntu.com/ubuntu-ports kinetic-updates Release' does not have a Release file.
1.820 E: The repository 'http://ports.ubuntu.com/ubuntu-ports kinetic-backports Release' does not have a Release file.
1.820 E: The repository 'http://ports.ubuntu.com/ubuntu-ports kinetic-security Release' does not have a Release file.

I've included a PR that moves to a LTS version of Ubuntu (22.04) which can resolve this problem (if the PR is desired).

Error: unknown type 115 encountered

Hello Guys,
After ~22 hours of running, my pgreplay run ends with Error: unknown type 115 encountered. I'm checking the CSVs, and they do not seem to contain any error. As I did not keep the aggregated CSV I used to generate the pgreplay file, I'm currently parsing it to see if it ends on the last statement (I'll keep this question updated with that info).
I've rerun the replay in debug mode, and I notice that the last statements it executed before hitting this error are all disconnects, but I cannot see them in the CSV files. Is it possible that there is an issue if my CSV does not end with a disconnect (or ends in the middle of a transaction)?

Parsing line error

Hi,

I am running into the below error. Can someone help me identify the problem? I tried removing some lines from the log file, but I ran into the same issue.

/usr/pgsql-13/bin/pgreplay -f -c -o /home/postgres/postgres.replay.log /home/postgres/postgres.log
Error parsing line 1: comma number 4 not found (or unmatched quotes)

-bash-4.2$ /usr/pgsql-13/bin/pgreplay -c /home/postgres/postgres.log
Error parsing line 1: comma number 4 not found (or unmatched quotes)

Unable to compile pgreplay on ubuntu 18.04

I am using commit 715e333

When configuring I get the following error:

$ ./configure
checking build system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking for fcntl.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for inttypes.h... (cached) yes
checking for libpq-fe.h... no
configure: error: PostgreSQL header files not found

I do have the header file:

$ ls /usr/include/postgresql/libpq-fe.h
/usr/include/postgresql/libpq-fe.h

After running export C_INCLUDE_PATH=/usr/include/postgresql, configure works.

It seems Ubuntu does this in their package-building script as well; see http://archive.ubuntu.com/ubuntu/pool/universe/p/pgreplay/pgreplay_1.2.0-2ubuntu2.diff.gz under https://packages.ubuntu.com/bionic/pgreplay.

Unable to parse the file

I am continuously getting the error Error parsing line 1: no "|" found - log_line_prefix may be wrong.

Does the code "|" expects to be present in log file?

Invalid free on replaying server-side prepare "ROLLBACK; DEALLOCATE ALL".

Hi, thanks for writing pgreplay! pgreplay currently tries to free an invalid pointer and crashes when the replay contains a "ROLLBACK; DEALLOCATE ALL". This can happen when the prepareThreshold of a server prepared statement is crossed and then a rollback occurs; for example, the psycopg3 PostgreSQL adapter will attempt to DEALLOCATE ALL as part of the rollback.

Environment:

  • PostgreSQL 14, Ubuntu 20.04.
  • Python3.8 with psycopg (aka psycopg3).

Stack trace:

(gdb) bt
#0  0x00007f0bc22c303b in raise () from /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f0bc22a2859 in abort () from /usr/lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f0bc230d29e in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f0bc231532c in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f0bc2316b5c in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#5  0x000055c591c2f8d0 in remove_all_pstmts (conn=<optimized out>) at parse.c:801
#6  0x000055c591c2f9c2 in remove_connection (session_id=7091101072822916911) at parse.c:1097
#7  0x000055c591c326ca in parse_provider () at parse.c:1476
#8  0x000055c591c2effc in main (argc=<optimized out>, argv=<optimized out>) at main.c:395

To reproduce, please find attached:

  • pgreplay_poc.py.txt: Python script to generate minimal test case.
  • pgreplay_poc_logs.csv: The logs that demonstrate the segfault when replayed on a new database.
  • pgreplay_poc.sh.txt: boilerplate around cloning latest pgreplay, creating user/db, setting system options, running the Python script, getting the above logs, running pgreplay. Will delete default /var/lib/postgresql/14/main/log/ logs if run. Assumes /usr/bin/pg_config exists. To use it, enter pgreplay_pass as password when prompted.

pgreplay seems to have stopped suddenly when executing

I ran the following command against a postgresql 11.2 server:

./pgreplay/pgreplay -c -D postgres -h REDACTED -p 5432 -W REDACTED postgresql-2019-04-30_000000.csv |& tee pg-warehouse-ec2.log

The last few lines are:

Warning: COPY statement ignored in line 804460
Warning: COPY statement ignored in line 805343
Execution is 30 minutes behind schedule
Execution is 1 hour behind schedule
ubuntu@REDACTED:~$ echo $?
0

It seems like pgreplay suddenly quit and didn't print out any stats. I'm not sure what happened but I'm hoping you can give some insight.

Also, I am running a similar command against RDS (PostgreSQL 11.1) and RDS Aurora (compatible with PostgreSQL 10.7) with the same PostgreSQL CSV log file. Those other 2 commands are still executing, so I don't think this command finished executing the entire log file, since I didn't specify the -j option.

Opened connections

In a recent test run with pgreplay, the tool exited after it exhausted all the available connections to the database. See the images below for more details:

Execution of pgreplay: [image]

Connection count: [image]

Is this expected behaviour?

Replay freezes, all queries stuck in ClientRead state.

I've got a very beefy log file that averages ~2000 queries/second. Looking to replay it if possible. Running the replay command:

./pgreplay -d 1 -j -s .1 -D restapi -c postgresql.log.2020-01-28-19.csv

I added the fractional -s .1 option because it wasn't even close to keeping up. Is this a good idea? Will that help things? The replay eventually freezes, and all queries are left in idle state with the ClientRead wait event. Does this make sense? Anything I can do about it?
