
temporal_tables Issues

Allow user-defined timestamps

For system versioning this may not really make sense, but I have a use case where I explicitly want to set the lower bound of the sys_period timestamp range when doing inserts and updates. To achieve this I declare the initial value of the variable time_stamp_to_use as follows:

    time_stamp_to_use timestamptz := coalesce(lower(NEW.sys_period), current_timestamp);

So when the insert or update statement explicitly supplies a value for the sys_period column it will be used.
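For illustration, here is what that enables at the call site (a sketch that assumes the subscriptions table from the README, with a sys_period tstzrange column):

```sql
-- With the change above, an INSERT may seed the lower bound of sys_period
-- explicitly; when the column is omitted, current_timestamp is used as before.
INSERT INTO subscriptions (name, state, sys_period)
VALUES ('test1', 'imported', tstzrange('2020-01-01 00:00:00+00', NULL));
```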

Generated Columns

Hey team - thanks for this repo.

RE Migrations - any thoughts on ignoring generated columns if they are created on the secondary table? I.e. adding AND attgenerated = '' in the pg_attribute selects for holder and commonColumns for the history selects (link to relevant PG attr docs).

Today I had to change some of the current/history tables and add computed columns for performance reasons. After adding those columns, the inserts into the base table failed - here's an example:

CREATE TABLE subscriptions
(
  name text NOT NULL,
  changed_value text not null
);

ALTER TABLE subscriptions
  ADD COLUMN sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null);

CREATE TABLE subscriptions_history (LIKE subscriptions);

CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON subscriptions
FOR EACH ROW EXECUTE PROCEDURE versioning(
  'sys_period', 'subscriptions_history', true
);
--works fine
insert into subscriptions (name, changed_value) values ('sub_ab', 'value1');
select * from subscriptions; 

-- change dimension
alter table subscriptions add column name_cleaned text generated always as (upper(name)) stored;
alter table subscriptions_history add column name_cleaned text generated always as (upper(name)) stored;

-- fails
update subscriptions set changed_value = 'value_2' where name = 'sub_ab';

I could argue this edge case either way, but perhaps we can add some advice to the docs under Migrations regarding this.
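A sketch of the suggested filter in isolation (hedged; the actual column lists live inside the versioning function, and attgenerated is only available from PostgreSQL 12 on):

```sql
-- List the history table's columns while skipping generated ones;
-- attgenerated = '' means "not a generated column" in pg_attribute.
SELECT attname
FROM pg_attribute
WHERE attrelid = 'subscriptions_history'::regclass
  AND attnum > 0
  AND NOT attisdropped
  AND attgenerated = '';  -- the proposed addition
```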

Thanks

add release workflow and dependabot

  • add dependabot, only needs configuration for github-actions
  • add release workflow using optic-release-automation-action
    • move the version in the sql files to a place that makes it easier to update automatically (like the top, using a comment syntax that is easy to match with a regexp, for instance)
    • add package.json file, with version 0.4.0 (which is the current version)
    • configure it so that when bumping the version, it also updates the version in the sql files. see how it's done in https://github.com/nearform/github-snooze-chrome-extension/
  • delete history.md

does not work for table names with CamelCase

Issue:

These procedures don't work if your table names are in CamelCase.

Environment:

  • Postgres 12.1

Explanation:

The current implementations of the procedures make some assumptions that you have to comply with:

  • your table name should be in snake_case
  • you are allowed to pass the schema within the second argument of the versioning function using dot notation, like this: public.origin_table

Unfortunately it all breaks if your table name is in CamelCase, like this: OriginTable, because of the following:

  • in order to use PG identifiers in camel case, you have to quote them with double quotes (like this: INSERT INTO "OriginTable"...).
  • internally the procedures try to leverage the quote_ident PG function to do this, but the implementation is incorrect in some places. Specifically, versioning_function_nochecks works correctly around the EXECUTE ('INSERT INTO ' || ...) block, but fails on the preceding lines at WHERE attrelid = history_table::regclass because quote_ident is not used there.

How I fixed it:

  • I first tried to build the correct quote_ident pipeline for the second argument of versioning function to allow for different values to be passed such as: 'history_table', 'public.history_table', 'HistoryTable', '"HistoryTable"', 'public."HistoryTable"', '"public"."HistoryTable"', 'public.HistoryTable', etc.
  • the code for acquiring the correct identifier involved: splitting on ., trimming both parts of any quotes, and correctly quoting both parts with the quote_ident PG function - all of that multiplied by two, because one may or may not pass the schema. I didn't want to make a reusable function just for that, so as not to clutter the PG function namespace with a seemingly unrelated function.
  • when I saw how much code I needed to add to the procedure to allow this, I concluded that correct quoting of the identifier is out of scope for the temporal_tables procedures and should be left to the user.
  • I then tested the original temporal_tables PG extension and found that it did not work if I passed 'HistoryTable' as the second argument either; it only worked if I passed '"HistoryTable"'. This shows the original PG extension also imposes this on the user and does not try to handle it itself.
  • I proceeded to write a test case and implementation fixes (see PR attached).
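For context, the kind of pipeline described above could look roughly like this for the schema-qualified case (a sketch only; the unqualified case needs its own branch):

```sql
-- Split 'public."HistoryTable"' on the dot, strip any existing double
-- quotes from each part, then re-quote with quote_ident.
SELECT quote_ident(trim(both '"' from split_part('public."HistoryTable"', '.', 1)))
       || '.' ||
       quote_ident(trim(both '"' from split_part('public."HistoryTable"', '.', 2)));
-- yields: public."HistoryTable"
```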

How to back-fill history items from the past?

We've got a use case where live data is streamed to the database, but also occasionally older updates will be sent as well to back-fill the historical data.

E.g., slightly modifying the readme sample:

SELECT set_system_time('2001-12-31 23:59:59');
INSERT INTO subscriptions (name, state) VALUES ('test2', '2001 update');
SELECT set_system_time('2000-12-31 23:59:59');
UPDATE subscriptions SET state = '2000 update' WHERE name = 'test2';
SELECT set_system_time('1999-12-31 23:59:59');
UPDATE subscriptions SET state = '1999 update' WHERE name = 'test2';

The current behavior results in:

name state sys_period
test2 2001 update ["2001-12-31 23:59:59+00","2001-12-31 23:59:59.000001+00")
test2 2000 update ["2001-12-31 23:59:59.000001+00","2001-12-31 23:59:59.000002+00")
test2 1999 update ["2001-12-31 23:59:59.000002+00","2001-12-31 23:59:59.000003+00")

While what I would ideally want is

name state sys_period
test2 1999 update ["1999-12-31 23:59:59+00","2000-12-31 23:59:59+00")
test2 2000 update ["2000-12-31 23:59:59+00","2001-12-31 23:59:59+00")
test2 2001 update ["2001-12-31 23:59:59+00","2001-12-31 23:59:59.000001+00")

Is there a way to modify the versioning function so that it supports inserting historical values, or would it be too impractical to do with this extension? I'm unfortunately too inexperienced with SQL to try and figure it out by myself...
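One workaround, assuming the subscriptions/subscriptions_history schema from the readme (this is a suggestion, not a documented feature): because closed historical rows live only in the history table, a back-fill can bypass the trigger and write them there directly with explicit ranges.

```sql
-- Back-fill closed historical versions with explicit sys_period ranges,
-- without going through the versioning trigger on the main table.
INSERT INTO subscriptions_history (name, state, sys_period)
VALUES
  ('test2', '1999 update', tstzrange('1999-12-31 23:59:59+00', '2000-12-31 23:59:59+00')),
  ('test2', '2000 update', tstzrange('2000-12-31 23:59:59+00', '2001-12-31 23:59:59+00'));
```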

Add support for capturing the user name

It would be nice to know which user made a change, i.e. to have the tracking table also capture the user who made it. It would be even better if the username were read from some SET parameter instead of current_user: most applications use a database pool with shared credentials, so current_user is not very useful. The application knows who the real user is, so it would be nice if it could pass a transaction-local parameter that the trigger reads and records alongside the change.

Ideally user query like this:

SET LOCAL temporal.user = 'xxxxxxx';
UPDATE subscriptions SET state = 'updated' WHERE name = 'test1';

The subscriptions_history table should then contain:

name state sys_period sys_user
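A sketch of how the trigger side could read such a parameter (the setting name temporal.user and column sys_user are hypothetical; current_setting's second argument makes it return NULL instead of raising an error when the setting is absent):

```sql
-- Fall back to current_user when the application did not set temporal.user.
SELECT coalesce(current_setting('temporal.user', true), current_user::text);
```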

Updates Not Recorded When Long Running Transactions Are Open

First, thanks for making this available. My team has been using this versioning function for a while now.

Recently, we've noticed an issue with versioning in the following circumstances:

  • high load
  • an insert immediately followed by an update

In that scenario, on occasion, the update succeeds without triggering a write to the version table.

I am able to reproduce the issue without high load by initiating a long running transaction on an unrelated table before I execute the insert/update:

BEGIN;
update <some-other-table> ...;
select pg_sleep(100);
COMMIT;

The version is not written because the function will bail out on this block:

    -- Ignore rows already modified in this transaction
    transaction_info := txid_current_snapshot();
    IF OLD.xmin::text >= (txid_snapshot_xmin(transaction_info) % (2^32)::bigint)::text
    AND OLD.xmin::text <= (txid_snapshot_xmax(transaction_info) % (2^32)::bigint)::text THEN
      IF TG_OP = 'DELETE' THEN
        RETURN OLD;
      END IF;

      RETURN NEW;
    END IF;

The long-running transaction makes the snapshot's transaction range inclusive of the initial insert (i.e. OLD.xmin) even though that transaction has already completed and is therefore not present in the snapshot's xip_list. All subsequent transactions initiated and completed after the long-running transaction began are also included in that snapshot range.

I don't have a full appreciation of the original intent of that block (beyond the comment itself), but for my use case, I think the condition could be updated to check whether the transaction ID of the OLD record is the same as the current transaction's. Something like:

    -- Ignore rows already modified in this transaction
    transaction_info := txid_current();
    IF OLD.xmin::text = (transaction_info % (2^32)::bigint)::text THEN
      IF TG_OP = 'DELETE' THEN
        RETURN OLD;
      END IF;

      RETURN NEW;
    END IF;

Another possibility may be txid_visible_in_snapshot(bigint, txid_snapshot), although I wonder if that may also be too greedy. The fact that the transaction ID of the OLD record is still visible may not, in and of itself, mean it is inappropriate to also write a version in this transaction (e.g. I may be updating the row from a different transaction).
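For anyone reproducing this, the snapshot bounds can be inspected directly (both functions are standard PostgreSQL):

```sql
-- While any long-running transaction stays open, the snapshot's xmin is
-- pinned to it, so xids committed after it still fall inside [xmin, xmax).
SELECT txid_snapshot_xmin(txid_current_snapshot()) AS snapshot_xmin,
       txid_snapshot_xmax(txid_current_snapshot()) AS snapshot_xmax;
```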

Thanks again!

Execute functioning version procedure script with C#

Hi guys,

I'm trying to run versioning_function.sql with C# at startup time to create the procedure, but it does not run.
I removed the comment statements from the script.
I read the file versioning_function.sql into a string and then used ExecuteSqlCommand to execute the script. It threw an exception:
"Input string was not in a correct format."

Stack trace

{System.FormatException: Input string was not in a correct format.
   at System.Text.StringBuilder.FormatError()
   at System.Text.StringBuilder.AppendFormatHelper(IFormatProvider provider, String format, ParamsArray args)
   at System.String.FormatHelper(IFormatProvider provider, String format, ParamsArray args)
   at System.String.Format(String format, Object[] args)
   at Microsoft.EntityFrameworkCore.Storage.Internal.RawSqlCommandBuilder.Build(String sql, IEnumerable`1 parameters)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlCommand(DatabaseFacade databaseFacade, RawSqlString sql, IEnumerable`1 parameters)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlCommand(DatabaseFacade databaseFacade, RawSqlString sql, Object[] parameters)
   at BSWeb.Startup._updateDatabase(String rootPath, DbContext context) in Startup.cs:line 449}

Please give me an idea. Thanks!

Lagging behind

I am confused. You don't write the inserted record to the history table upon insertion, but only when a later update occurs? And then if I delete the record, there is never a delete event recorded in the history table? That seems strange. Am I seeing this right?

I would have expected to see this:

  1. INSERT INTO subscriptions (name, state) VALUES ('test1', 'inserted');

  2. history table contains insert

  3. UPDATE subscriptions SET state = 'updated' WHERE name = 'test1';

  4. history table contains update

  5. DELETE FROM subscriptions WHERE name = 'test1';

  6. history table contains delete

Am I wrong in that this is not how it works?

Is this still working?

I followed the readme exactly on Postgres 14.1 locally and it didn't work. I then spawned a new RDS instance in AWS with 16.2 and followed the readme again, and it still didn't work. The history table is always empty; the data isn't propagating to it.

Thank you for your help in advance.

ignore_unchanged_values=false does not work

Hi guys - thanks for the great repo.

Seems to me range_lower never gets set unless ignore_unchanged_values=true.

Also wondering how ignore_unchanged_values can ever be anything but null when it is defined as ignore_unchanged_values := TG_ARGV[3]; (i.e. the fourth parameter) and the function is called as per the readme with only three parameters.
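For reference, a call with a fourth argument so that TG_ARGV[3] is defined at all (the argument order here follows the readme's three-argument form; treat the exact signature as an assumption):

```sql
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON subscriptions
FOR EACH ROW EXECUTE PROCEDURE versioning(
  'sys_period', 'subscriptions_history', true, true
);
```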

Allow ignoring unchanged values

I've been using this for a while, and it works well. However, I'd like to see an option to ignore unchanged rows for simplicity's sake. We have issues with the number of records stored in the history tables due to updates where no information has changed.

I agree that the best place for this is technically in the app code, but that would be more tedious and involve numerous apps and table/entity checks. Putting the check in the versioning function itself seems like the best place (at least within our tech stack) for speed of development.
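Until such an option exists, one statement-level workaround (a sketch against the readme's subscriptions table) is to make no-op updates never fire the trigger in the first place:

```sql
UPDATE subscriptions s
SET state = 'updated'
WHERE s.name = 'test1'
  AND s.state IS DISTINCT FROM 'updated';  -- skip rows that would not change
```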

document migrations in README

I had some questions that I thought would be good to document in the README

What is the migration strategy?

  • apply the same schema migrations to the temporal table?
  • don't do data migrations

What happens if a column is not created in the temporal table?

Allow a custom `sys_period` to be specified during `INSERT`

First of all, thanks for creating this great project!

The versioning() function currently intercepts UPDATE and INSERT operations to overwrite the sys_period column in the main table with [now, NULL). This makes perfect sense for UPDATEs as we want the function to manage sys_period for us. However doing this on INSERT created some problems for my use case:

  1. I am trying to migrate existing data from another database into a PSQL-with-temporal-tables database. I would like to retain the original creation dates on the existing data and thus would like to initialize the sys_period to [original_creation_date, NULL) for all the rows. But of course, my efforts are thwarted by the logic in versioning().
  2. As a sidenote, this logic also makes moot any defaults for the sys_period column in the main table schema.

I would like to update versioning() to only manage sys_period for UPDATE. For INSERT, the client can either specify a custom value or use the default value, which will likely be [now, NULL) anyway.
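Concretely, the proposed behavior would let a migration do something like this (a sketch; the subscriptions schema from the readme is assumed):

```sql
-- Retain the original creation date as the lower bound on import.
INSERT INTO subscriptions (name, state, sys_period)
VALUES ('legacy-row', 'migrated', tstzrange('2015-06-01 00:00:00+00', NULL));
```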

I have a PR ready for submission if you agree with this change.

Check for rows updated in the same transaction matches all transactions on the instance

Hello !

Just ran into some issues where history was not updated for lots of updates, and saw that txid_current_snapshot() is used in the versioning function to check whether the OLD row was updated in the current transaction.
However, according to some testing on my end, plus the Postgres docs, the snapshot's xmin is the lowest transaction ID still uncommitted on the instance, and its xmax is the first not-yet-assigned txid at the time of the snapshot. This means OLD.xmin can never be above xmax, and if there is a long-running transaction on the instance (e.g. cleanup work, a data export, etc.), OLD.xmin will fall between the snapshot's xmin and xmax. As a result, no entry is created in history even though one should be.
I think the correct way to check for this is to compare OLD.xmin against txid_current().

I already tested this update to solve my issue, and would gladly submit a PR if you think my reasoning makes sense :)

Thanks again for this repo, which has helped me a lot for a long time!

EDIT from @jack-robson: The references to txid_current_snapshot, OLD.xmin and OLD.xmax refer to an old version of the codebase. This code was changed in this PR: #29

Some updates are not recorded

Looks like the code marked with the comment
-- Ignore rows already modified in this transaction
makes some updates be ignored and never written to history.

As far as I understand, if I start a transaction and do multiple updates on a record, only the first change is written to history; the history of all later updates before the transaction commits is lost.
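A minimal repro of the reported behavior (assumes the readme's subscriptions setup):

```sql
BEGIN;
UPDATE subscriptions SET state = 'first'  WHERE name = 'test1';
UPDATE subscriptions SET state = 'second' WHERE name = 'test1';  -- reportedly not versioned
COMMIT;
```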

Missing updates when deleting row in the same transaction

Hi

In short, I would like to track who modified or removed given record.

As an example, let's say I have a table

create table users (
  email varchar primary key,
  modified_by varchar not null,
  sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
);

-- I'm omitting temporal_tables setup, you can assume that I have `users_history` table defined

Now, when I run

-- first transaction
insert into users values ('[email protected]', '[email protected]');

-- second transaction
update users set modified_by = '[email protected]' where email = '[email protected]';
delete from users where email = '[email protected]';

I'll get single record in users_history table

[email protected]; [email protected]; [timestamp-1 timestamp-2]

instead of

[email protected]; [email protected] [timestamp-1 timestamp-2]
[email protected]; [email protected] [timestamp-2 (timestamp-2 + 1 microsecond)]

From what I understand what happens is:

  • insert doesn't save anything in history (since there's no history to preserve yet)
  • update inserts old row (with [email protected]) into history table (as it should)
  • delete doesn't insert row into history table due to transaction check

I understand that it's not a bug per se (I did find information that this check is present to be backwards compatible with the native extension), so this "ticket" is more of a question 😅

How I've "fixed" it was to remove the transaction check from the versioning function, and I'm wondering if this is an "intended" solution or maybe I'm missing something?

I know I could run update and delete in separate transactions but in this case it doesn't make sense - it should be all-or-nothing.

Ignore inserts without actual change

Shouldn't inserts of the same data be treated just as updates with the same data, i.e., ignored?

UPDATE: Probably best resolved with on conflict do update, even if it's cumbersome :-/
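The ON CONFLICT route mentioned above, sketched under the assumption that name is the table's primary key:

```sql
INSERT INTO subscriptions AS s (name, state)
VALUES ('test1', 'inserted')
ON CONFLICT (name) DO UPDATE
SET state = EXCLUDED.state
WHERE s.state IS DISTINCT FROM EXCLUDED.state;  -- drop true no-ops
```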

Trigger not working if table contains column with non-equality type

If the table contains a column whose type does not implement the equality operator, the versioning trigger fails. The equality operator is not implemented for some PostGIS data types, JSON data types, etc.

Example error for table with PostGIS polygon type:

ERROR:  could not identify an equality operator for type polygon
CONTEXT:  PL/pgSQL function versioning() line 38 at IF

The reason is the NEW IS NOT DISTINCT FROM OLD statement in the IF condition related to the ignore_unchanged_values feature. The failure occurs even if ignore_unchanged_values is not enabled, because all parts of the IF condition are evaluated. The issue is thus quite severe, as it breaks compatibility with the original temporal_tables extension.
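The failure can be reproduced without the trigger at all: row-wise IS NOT DISTINCT FROM compares column by column with =, which built-in geometric types such as polygon do not provide (a small repro, independent of this project):

```sql
CREATE TEMP TABLE shapes (id int, area polygon);
INSERT INTO shapes VALUES (1, '((0,0),(1,1),(1,0))');
-- Fails with: could not identify an equality operator for type polygon
SELECT s IS NOT DISTINCT FROM s FROM shapes s;
```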

Question: how to deal with related tables

This may not be the right place to ask, but I searched everywhere and couldn't find out how to deal with relationships when using temporal tables.

For example, I need to version a Book table:

  • Book ManyToMany Genres
  • Book hasMany Authors

Could you give a tip about how to do that with temporal tables?
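One common approach (a sketch, not official guidance from this project): version each table independently, including the many-to-many join table, and reconstruct the graph as of a point in time by intersecting sys_period ranges.

```sql
-- Hypothetical join table for Book ManyToMany Genres, versioned on its own.
CREATE TABLE books_genres (
  book_id  int NOT NULL,
  genre_id int NOT NULL,
  sys_period tstzrange NOT NULL DEFAULT tstzrange(current_timestamp, null)
);
CREATE TABLE books_genres_history (LIKE books_genres);

CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON books_genres
FOR EACH ROW EXECUTE PROCEDURE versioning(
  'sys_period', 'books_genres_history', true
);
```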

Not working with RDS Postgres 11.5

I was able to follow the example and create a temporal table in a separate db; however, once I tried doing this on the main db, it just doesn't replicate the values for some reason. Is there a way to turn on more in-depth error reporting?


Update 1: I went back through the example in the readme, and it looks like there is some lag on the insert. What I mean is: I ran an insert on the subscriptions table and nothing appeared in the history table; however, after I ran an update, the inserted values showed up in the history table, though nothing had appeared for the insert itself.
