Code Monkey home page Code Monkey logo

postgres_cluster's Introduction

PostgreSQL multimaster

multimaster is a PostgreSQL extension with a set of patches that turns PostgreSQL into a synchronous shared-nothing cluster to provide Online Transaction Processing (OLTP) scalability and high availability with automatic disaster recovery. As compared to a standard PostgreSQL master-slave cluster, a cluster configured with the multimaster extension offers the following benefits:

  • Cluster-wide transaction isolation
  • Synchronous logical replication
  • DDL replication
  • Working with temporary tables on each cluster node
  • Fault tolerance and automatic node recovery
  • PostgreSQL online upgrades

Overview

The multimaster extension replicates the same database to all nodes of the cluster and allows write transactions on each node. To ensure data consistency in the case of concurrent updates, multimaster enforces transaction isolation cluster-wide, using multiversion concurrency control (MVCC) at the repeatable read isolation level. Any write transaction is synchronously replicated to all nodes, which increases commit latency for the time required for synchronization. Read-only transactions and queries are executed locally, without any measurable overhead.

To ensure high availability and fault tolerance of the cluster, multimaster uses three-phase commit protocol and heartbeats for failure discovery. A multi-master cluster of N nodes can continue working while the majority of the nodes are alive and reachable by other nodes. When the node is reconnected to the cluster, multimaster can automatically fast-forward the node to the actual state based on the transactions log (WAL). If WAL is no longer available for the time when the node was excluded from the cluster, you can restore the node using pg_basebackup.

For details on the multimaster internals, see the Architecture page.

Documentation

  1. Administration
    1. Installation
    2. Setting up a Multi-Master Cluster
    3. Tuning configuration params
    4. Monitoring Cluster Status
    5. Adding New Nodes to the Cluster
    6. Excluding Nodes from the Cluster
  2. Architecture
  3. Configuration Variables
  4. Built-in Functions and Views

Tests

Fault tolerance

(Link to test/failure matrix)

Performance

(Show TPC-C here on 3 nodes)

Limitations

  • multimaster can only replicate one database per cluster.

  • The replicated tables must have primary keys or replica identity. Otherwise, multimaster cannot perform logical replication. Unlogged tables are not replicated, as in the standard PostgreSQL.

  • Sequence generation. To avoid conflicts between unique identifiers on different nodes, multimaster modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes in each iteration. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node.

  • DDL replication. While multimaster replicates data on the logical level, DDL is replicated on the statement level, which causes distributed commits of the same statement on different nodes. As a result, complex DDL scenarios, such as stored procedures and temporary tables, may work differently as compared to the standard PostgreSQL.

  • Commit latency. The current implementation of logical replication sends data to subscriber nodes only after the local commit. In case of a heavy-write transaction, you have to wait for transaction processing twice: on the local node and on all the other nodes (simultaneously).

  • Isolation level. The multimaster extenstion currently supports only the repeatable read isolation level. This is stricter than the default read commited level, but also increases probability of serialization failure during commit. Serializable level is not supported yet.

Compatibility

The multimaster extension currently passes 162 of 166 postgres regression tests. We are working right now on proving full compatibility with the standard PostgreSQL.

Authors

Postgres Professional, Moscow, Russia.

Credits

The replication mechanism is based on logical decoding and an earlier version of the pglogical extension provided for community by the 2ndQuadrant team.

postgres_cluster's People

Contributors

adunstan avatar afiskon avatar alvherre avatar anarazel avatar bmomjian avatar darcyjcain avatar dbdbdb avatar feodor avatar grunthos avatar gsstark avatar hlinnaka avatar itgacky avatar jconway avatar kelvich avatar kgrittn avatar knizhnik avatar kvap avatar masaofujii avatar mhagander avatar nmisch avatar petere avatar pjmodos avatar ringerc avatar robertmhaas avatar scrappy avatar sfrost avatar simonat2ndquadrant avatar tatsuo-ishii avatar tglsfdc avatar vadim4o avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

postgres_cluster's Issues

FATAL: could not load library "/usr/local/pgsql/lib/multimaster.so"

Hello

Tried to start postgres service on CentOS 7 and got an error:

FATAL:  could not load library "/usr/local/pgsql/lib/multimaster.so": /usr/local/pgsql/lib/multimaster.so: undefined symbol: IndexRelationGetNumberOfKeyAttributes
LOG:  database system is shut down

All build/installation from source was done with no errors.

Made changes in postgresql.conf and pg_hba.conf:

parts of postgresql.conf:

max_wal_senders = 10  # at least the number of nodes 
max_replication_slots = 10  # at least the number of nodes 
max_connections = 100                   
max_prepared_transactions = 300 
wal_level = logical  
shared_preload_libraries = multimaster
multimaster.max_nodes = 3
multimaster.node_id = 1
multimaster.conn_strings = 'dbname=mydb host=192.168.56.141,dbname=mydb host=192.168.56.142,dbname=mydb host=192.168.56.143'

*1) mydb i did not create yet
*2) other host are were not created yet

parts of pg_hba.conf

local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust

Installed on new empty dist of CentOS 7 on VirtualBox WM(Host Win10)
CentOS Linux release 7.3.1611 (Core)

  • if i comment these lines - i can start without errors:
#multimaster.max_nodes = 3
#multimaster.node_id = 1
#multimaster.conn_strings

What can it be? I think it's problem with path/filename '/usr/local/pgsql/lib/multimaster.so' - it was not found.
But how to fix it?

Vacuum test segfault

CREATE TABLE vaccluster (i INT PRIMARY KEY);
ALTER TABLE vaccluster CLUSTER ON vaccluster_pkey;
CLUSTER vaccluster;

CREATE FUNCTION do_analyze() RETURNS VOID VOLATILE LANGUAGE SQL
    AS 'ANALYZE pg_am';
CREATE FUNCTION wrap_do_analyze(c INT) RETURNS INT IMMUTABLE LANGUAGE SQL
    AS 'SELECT $1 FROM do_analyze()';
CREATE INDEX ON vaccluster(wrap_do_analyze(i));
INSERT INTO vaccluster VALUES (1), (2);

Backtrace:

  * frame #0: 0x000000010e5ffd6b postgres`GetActiveSnapshot + 11 at snapmgr.c:759
    frame #1: 0x000000010e2731da postgres`postquel_start(es=0x00007f9ceb810de0, fcache=0x00007f9ceb80e430) + 170 at functions.c:800
    frame #2: 0x000000010e272517 postgres`fmgr_sql(fcinfo=0x00007f9cea870bd0) + 839 at functions.c:1145
    frame #3: 0x000000010e26e78a postgres`ExecMakeFunctionResultNoSets(fcache=0x00007f9cea870b60, econtext=0x00007f9cea870940, isNull="", isDone=0x0000000000000000) + 346 at execQual.c:2041
    frame #4: 0x000000010e268a17 postgres`ExecEvalFunc(fcache=0x00007f9cea870b60, econtext=0x00007f9cea870940, isNull="", isDone=0x0000000000000000) + 199 at execQual.c:2432
    frame #5: 0x000000010e2648e3 postgres`ExecEvalExprSwitchContext(expression=0x00007f9cea870b60, econtext=0x00007f9cea870940, isNull="", isDone=0x0000000000000000) + 67 at execQual.c:4413
    frame #6: 0x000000010e111b30 postgres`FormIndexDatum(indexInfo=0x00007f9cea863b50, slot=0x00007f9cea8637e0, estate=0x00007f9cea870830, values=0x00007fff51bc7ac0, isnull="") + 320 at index.c:1789
    frame #7: 0x000000010e25aada postgres`ExecInsertIndexTuples(slot=0x00007f9cea8637e0, tupleid=0x00007f9cea863684, estate=0x00007f9cea870830, noDupErr='\0', specConflict=0x0000000000000000, arbiterIndexes=0x0000000000000000) + 602 at execIndexing.c:355
    frame #8: 0x000000010ec14bfb multimaster.so`UserTableUpdateOpenIndexes(estate=0x00007f9cea870830, slot=0x00007f9cea8637e0) + 123 at pglogical_apply.c:298
    frame #9: 0x000000010ec135a8 multimaster.so`process_remote_insert(s=0x00007fff51bcbe40, rel=0x000000010ea38330) + 904 at pglogical_apply.c:739
    frame #10: 0x000000010ec12b65 multimaster.so`MtmExecutor(id=0, work=0x00007f9cea40b980, size=142) + 693 at pglogical_apply.c:997
    frame #11: 0x000000010ec0e86d multimaster.so`BgwPoolMainLoop(arg=140311923791792) + 589 at bgwpool.c:57
    frame #12: 0x000000010e37330a postgres`StartBackgroundWorker + 826 at bgworker.c:726
    frame #13: 0x000000010e388ab0 postgres`do_start_bgworker(rw=0x00007f9cea601e90) + 240 at postmaster.c:5531
    frame #14: 0x000000010e383d1b postgres`maybe_start_bgworker + 427 at postmaster.c:5706
    frame #15: 0x000000010e38424e postgres`ServerLoop + 1198 at postmaster.c:1762
    frame #16: 0x000000010e381ca9 postgres`PostmasterMain(argc=3, argv=0x00007f9cea403900) + 5897 at postmaster.c:1298
    frame #17: 0x000000010e2bb60f postgres`main(argc=3, argv=0x00007f9cea403900) + 751 at main.c:228
    frame #18: 0x00007fff8b9b15ad libdyld.dylib`start + 1

Привести в порядок xtm

  • Перенести функции
  • Залипание oldestXmin
  • Задержка вакуума аккуратнее
  • Проверить совместимость с локальными транзакциями на нодах
  • Проверить длинный repeatable read
  • Локальная нода как участник транзакции
  • Детектить изменения (?)
  • Считать кол-во нод-участников
  • Проверить изоляцию по shared-данным
  • Выделить отдельный GlobTxHash с методами доступа, а не лазить напрямую
  • Проверить GlobTxHash на переполнение и обработку oldestXid, возможно переделать на CommitTs SLRU.
  • Переделать на ilist-ы (SetStatus, GetStatus, Trim(csn), Childs)
  • Разобраться с xtm.h/xtm.c и global_snapshot.h
  • Правильные слипы: сигналы/смерть поcтмастера
  • LWlock-и вместо спинлоков
  • Пофиксить комменты и расположение функций
  • Убрать/выделить deadlock detector api
  • DTM_TRACE переделать
  • gucs для вкл/вылк
  • Ассерты на взятый/отпущенный лок условий в функциях / внутри транзакции / после prepare
  • Документация
  • Изотесты?

Shared memory overflow

There is a problem with "frozen" procArray->replication_slot_xmin.
This variable is adjusted by ProcArraySetReplicationSlotXmin which is invoked by ReplicationSlotsComputeRequiredXmin, which
is in turn is called by LogicalConfirmReceivedLocation. If transactions are executed at all nodes of multimaster, then everything works fine: replication_slot_xmin is advanced. But if we send transactions only to one multimaster node and broadcast this changes to other nodes, then no data is send through replications slot at this nodes. No data sends - no confirmations, LogicalConfirmReceivedLocation is not called and procArray->replication_slot_xmin preserves original value 599.

As a result GetOldestXmin function always returns 599, so autovacuum is actually blocked and our multimaster is not able to perform cleanup of XID->CSN map, which cause shared memory overflow. This situation happens only when write transactions are sent only to one node or if there are no write transactions at all.

Raftable client always tries to connect dead node.

Setup cluster of 3 nodes, pgbench -i, kill first one, pgbench of node2 (or just commit rw tx in connection):

WARNING:  raftable client: trying [0] localhost:6543
WARNING:  raftable client: connect timed out
WARNING:  raftable client: could not connect
WARNING:  raftable client: connect_leader() failed
WARNING:  raftable query failed after 452 ms
progress: 1.0 s, 270.0 tps, lat 3.688 ms stddev 27.804
WARNING:  raftable client: trying [2] localhost:6545

FATAL: XX000: could not load library "/usr/local/pgsql/lib/multimaster.so": /usr/local/pgsql/lib/multimaster.so: undefined symbol: IndexRelationGetNumberOfKeyAttributes

build postgres_cluster is Success
then i create extension
postgres=# create extension multimaster ;
ERROR: could not load library "/usr/local/pgsql/lib/multimaster.so": /usr/local/pgsql/lib/multimaster.so: undefined symbol: TM
STATEMENT: create extension multimaster ;
ERROR: could not load library "/usr/local/pgsql/lib/multimaster.so": /usr/local/pgsql/lib/multimaster.so: undefined symbol: TM
the log is
[postgres@mm2 pgdata]$ FATAL: XX000: could not load library "/usr/local/pgsql/lib/multimaster.so": /usr/local/pgsql/lib/multimaster.so: undefined symbol: IndexRelationGetNumberOfKeyAttributes
LOCATION: internal_load_library, dfmgr.c:235
LOG: 00000: database system is shut down
LOCATION: UnlinkLockFiles, miscinit.c:755

Rowsecurity test breaks mm

:MtmGucSet: session_authorization -> regress_rls_bob
ERROR:  catalog is missing 3 attribute(s) for relid 32389
ERROR:  catalog is missing 3 attribute(s) for relid 32389
LOG:  WAL-sender to 3 is terminated
LOG:  WAL-sender to 2 is terminated
ERROR:  replication slot "mtm_slot_3" already exists
ERROR:  replication slot "mtm_slot_2" already exists

pl/pgsql segfault

create or replace function shadowtest(in1 int)
    returns table (out1 int) as $$
declare
in1 int;
out1 int;
begin
end
$$ language plpgsql;
select shadowtest(1);

set plpgsql.extra_warnings to 'shadowed_variables';
select shadowtest(1);
create or replace function shadowtest(in1 int)
    returns table (out1 int) as $$
declare
in1 int;
out1 int;
begin
end
$$ language plpgsql;
select shadowtest(1);
drop function shadowtest(int);
  * frame #0: 0x0000000102243f38 postgres`function_parse_error_transpose(prosrc="\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n") + 72 at pg_proc.c:1012
    frame #1: 0x0000000102b741d0 plpgsql.so`plpgsql_compile_error_callback(arg=0x00007faa9a840830) + 32 at pl_comp.c:934
    frame #2: 0x00000001026b1a27 postgres`errfinish(dummy=0) + 215 at elog.c:438
    frame #3: 0x0000000102b698dd plpgsql.so`plpgsql_yyparse + 6349 at pl_gram.y:742
    frame #4: 0x0000000102b73d3e plpgsql.so`do_compile(fcinfo=0x00007fff5dac7d80, procTup=0x0000000102b37e08, function=0x00007faa9b024210, hashkey=0x00007fff5dac79e8, forValidator='\x01') + 4654 at pl_comp.c:733
    frame #5: 0x0000000102b72839 plpgsql.so`plpgsql_compile(fcinfo=0x00007fff5dac7d80, forValidator='\x01') + 521 at pl_comp.c:222
    frame #6: 0x0000000102b72536 plpgsql.so`plpgsql_validator(fcinfo=0x00007fff5dac8190) + 2326 at pl_handler.c:491
    frame #7: 0x00000001026bdb63 postgres`OidFunctionCall1Coll(functionId=12658, collation=0, arg1=16407) + 163 at fmgr.c:1596
    frame #8: 0x00000001022434bc postgres`ProcedureCreate(procedureName="shadowtest", procNamespace=2200, replace='\x01', returnsSet='\x01', returnType=23, proowner=10, languageObjectId=12659, languageValidator=12658, prosrc="\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n", probin=0x0000000000000000, isAgg='\0', isWindowFunc='\0', security_definer='\0', isLeakProof='\0', isStrict='\0', volatility='v', parallel='u', parameterTypes=0x00007faa9b80acc8, allParameterTypes=140370730069240, parameterModes=140370730069288, parameterNames=140370730069336, parameterDefaults=0x0000000000000000, trftypes=0, proconfig=0, procost=100, prorows=1000) + 8620 at pg_proc.c:728
    frame #9: 0x00000001022f5664 postgres`CreateFunction(stmt=0x00007faa9c0869a8, queryString="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;") + 2660 at functioncmds.c:1083
    frame #10: 0x00000001025224a1 postgres`ProcessUtilitySlow(parsetree=0x00007faa9c0869a8, queryString="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;", context=PROCESS_UTILITY_QUERY, params=0x0000000000000000, dest=0x0000000102866f48, completionTag="") + 5265 at utility.c:1376
    frame #11: 0x0000000102520b64 postgres`standard_ProcessUtility(parsetree=0x00007faa9c0869a8, queryString="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;", context=PROCESS_UTILITY_QUERY, params=0x0000000000000000, dest=0x0000000102866f48, completionTag="") + 3844 at utility.c:907
    frame #12: 0x0000000102d06a47 multimaster.so`MtmProcessUtility(parsetree=0x00007faa9c0869a8, queryString="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;", context=PROCESS_UTILITY_QUERY, params=0x0000000000000000, dest=0x0000000102866f48, completionTag="") + 1207 at multimaster.c:3550
    frame #13: 0x000000010251fc2d postgres`ProcessUtility(parsetree=0x00007faa9c0869a8, queryString="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;", context=PROCESS_UTILITY_QUERY, params=0x0000000000000000, dest=0x0000000102866f48, completionTag="") + 77 at utility.c:332
    frame #14: 0x00000001023a0674 postgres`_SPI_execute_plan(plan=0x00007fff5dac9d18, paramLI=0x0000000000000000, snapshot=0x0000000000000000, crosscheck_snapshot=0x0000000000000000, read_only='\0', fire_triggers='\x01', tcount=0) + 1620 at spi.c:2200
    frame #15: 0x000000010239ff05 postgres`SPI_execute(src="RESET SESSION AUTHORIZATION; reset all; SET plpgsql.extra_warnings TO shadowed_variables; create or replace function shadowtest(in1 int)\nreturns table (out1 int) as $$\ndeclare\nin1 int;\nout1 int;\nbegin\nend\n$$ language plpgsql;", read_only='\0', tcount=0) + 197 at spi.c:387
    frame #16: 0x0000000102d1606c multimaster.so`process_remote_message(s=0x00007fff5dac9e40) + 188 at pglogical_apply.c:366
    frame #17: 0x0000000102d14cc2 multimaster.so`MtmExecutor(id=0, work=0x00007faa9a405a20, size=297) + 1042 at pglogical_apply.c:1039
    frame #18: 0x0000000102d1086d multimaster.so`BgwPoolMainLoop(arg=140370711155584) + 589 at bgwpool.c:57
    frame #19: 0x000000010247530a postgres`StartBackgroundWorker + 826 at bgworker.c:726
    frame #20: 0x000000010248aab0 postgres`do_start_bgworker(rw=0x00007faa9a601a60) + 240 at postmaster.c:5531
    frame #21: 0x0000000102485d1b postgres`maybe_start_bgworker + 427 at postmaster.c:5706
    frame #22: 0x000000010248624e postgres`ServerLoop + 1198 at postmaster.c:1762
    frame #23: 0x0000000102483ca9 postgres`PostmasterMain(argc=3, argv=0x00007faa9a403900) + 5897 at postmaster.c:1298
    frame #24: 0x00000001023bd60f postgres`main(argc=3, argv=0x00007faa9a403900) + 751 at main.c:228
    frame #25: 0x00007fff8b9b15ad libdyld.dylib`start + 1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.