From c4a9a0978c20eb5b5a50a148359a6a235736c3db Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Fri, 19 Jun 2015 12:12:03 +0800 Subject: [PATCH] Fix docs structure errors, a few cleanups --- doc/manual-commands.sgml | 7 + doc/manual-conflicts.sgml | 613 ++++++++++++++++++++------------------ 2 files changed, 331 insertions(+), 289 deletions(-) diff --git a/doc/manual-commands.sgml b/doc/manual-commands.sgml index c2ad56e4fe..a0e7f76901 100644 --- a/doc/manual-commands.sgml +++ b/doc/manual-commands.sgml @@ -36,6 +36,13 @@ creation. + + + bdr_init_copy only supports setup + of a single &bdr;-enabled database at this time. + + + bdr_init_copy option diff --git a/doc/manual-conflicts.sgml b/doc/manual-conflicts.sgml index 7aaf4e9996..7087121bf6 100644 --- a/doc/manual-conflicts.sgml +++ b/doc/manual-conflicts.sgml @@ -56,337 +56,348 @@ conflicts. - - Types of conflict + - - <literal>PRIMARY KEY</literal> or <literal>UNIQUE</literal> conflicts + + Types of conflict - - The most common conflicts are row conflicts where two operations affect a - row with the same key in ways they could not do on a single node. &bdr; can - detect most of those and apply last-update-wins conflict handling or invoke - a user-defined conflict handler. - + + <literal>PRIMARY KEY</literal> or <literal>UNIQUE</literal> conflicts - - Row conflicts include: - - INSERT vs INSERT - INSERT vs UPDATE - UPDATE vs DELETE - INSERT vs DELETE - - + + The most common conflicts are row conflicts where two operations affect a + row with the same key in ways they could not do on a single node. &bdr; can + detect most of those and apply last-update-wins conflict handling or invoke + a user-defined conflict handler. + - - - - The most common conflict, INSERT vs - INSERT, arises where INSERTs on two - different nodes create a tuple with the same PRIMARY - KEY values (or the same values for a single - UNIQUE constraint). 
&bdr; handles this by retaining - the most recently inserted tuple of the two according to the originating - host's timestamps unless a user-defined conflict handler overrides this. - - - - No special administrator action is required to deal with these conflicts, - but the user must undersand that one of the - INSERTed tuples is effectively discarded on all - nodes - there is no data merging done unless a user defined - conflict handler does it. - - - - - - INSERTs that violate multiple UNIQUE constraints - - - An INSERT/INSERT conflict - can violate more than one UNIQUE constraint - (of which one might be the PRIMARY KEY). - - - - &bdr; can only handle an - INSERT/INSERT conflict on one - unique constraint (including the PRIMARY KEY). If a - new row conflicts with more than one UNIQUE constraint - then the apply worker that's trying to apply the change will - ERROR out with: - - ERROR: multiple unique constraints violated by remotely INSERTed tuple - - (Older versions would report a "diverging uniqueness - conflict" error instead). - - - - In case of such a conflict, you must manually remove the conflicting - tuple(s) from the local side by DELETEing it or by - UPDATEing it so that it no longer conflicts with the - new remote tuple. There may be more than one conflicting tuple. There is - not currently any built-in facility to ignore, discard or merge tuples - that violate more than one local unique constraint. - - - + + Row conflicts include: + + INSERT vs INSERT + INSERT vs UPDATE + UPDATE vs DELETE + INSERT vs DELETE + + - - - - Constraint conflicts + - - Conflicts between a remote transaction being applied and existing local data - can also occur for FOREIGN KEY constraints. These - conflicts are usually transient issues that arise from transactions being - applied in a different order to the order they appeared to occur logically - on the nodes that originated them. 
- + INSERTs that violate one UNIQUE constraint - While apply is strictly ordered for any given origin node, there is no - enforcement of ordering of transactions between two different nodes, so - it's possible for (e.g.) node1 to insert a row into T1, which is replayed to node2. - node2 inserts a row into T2 which has a foreign key reference to the row from T1. - On node3, if the transaction from node2 that inserts the row into T2 is received - before the transaction from node1 that inserts the row into T1, - the transaction from node2 will fail to apply. This failure will record a rollback - in and an ERROR - with details in the PostgreSQL error log on the applying node (node3). In - this case &bdr; will retry the transaction from node2 periodicially, so - once it's replayed the transaction from node1 that it depends on the - transaction will commit successfully. + The most common conflict, INSERT vs + INSERT, arises where INSERTs on two + different nodes create a tuple with the same PRIMARY + KEY values (or the same values for a single + UNIQUE constraint). &bdr; handles this by retaining + the most recently inserted tuple of the two according to the originating + host's timestamps unless a user-defined conflict handler overrides this. - - Foreign key constraint deadlocks - - Simple foreign key constraint conflicts are generally transient and - require no administrator action, but for transactions that change multiple - entities this is not always the case. It is possible for - concurrent changes to tables with foreign key constraints to create - inter-node replication deadlocks where no node can - apply changes from any other node because they conflict with local data. - This causes replication activity to stop until the deadlock is broken by a - local data change on one or more of the nodes. 
+ No special administrator action is required to deal with these conflicts,
+ but the user must understand that one of the
+ INSERTed tuples is effectively discarded on all
+ nodes - there is no data merging done unless a user defined
+ conflict handler does it.
+ 
- 
- 
- For example, take a two node system with two tables and some existing data:
- 
- CREATE TABLE parent(
- id integer primary key
- );
- 
- CREATE TABLE child(
- id integer primary key,
- parent_id integer not null references parent(id)
- );
- 
- INSERT INTO parent(id)
- VALUES (1), (2);
- 
- INSERT INTO child(id, parent_id)
- VALUES (11, 1), (11, 2);
- 
- If node A does:
- 
- INSERT INTO child(id, parent_id)
- VALUES (21, 2);
- 
- and at the same time node B does:
- 
- DELETE FROM child WHERE parent_id = 2;
- DELETE FROM parent WHERE id = 2;
- 
- then we have a situation where the transaction from node A cannot apply
- successfully to the child table on node B because the
- referenced parent no longer exists. The transaction
- from node B cannot apply to node A because it deletes a
- parent tuple that's still referenced, the new one with
- id=21. Neither transaction can replay, and both will output periodic
- ERRORs in the log files as they are retried. Since
- &bdr; replays transactions from a given node strictly in order, neither
- node can make progress with replication unless the user, or some 3rd node,
- makes changes that resolve the deadlock.
- 
+ 
+ INSERTs that violate multiple UNIQUE constraints
+ 
+ An INSERT/INSERT conflict
+ can violate more than one UNIQUE constraint
+ (of which one might be the PRIMARY KEY).
- 
- It is important to note that when we manually deleted the child tuples
- on node B, the newly inserted child on node A was not affected because
- it had not yet replicated to node B. If either node replays the other's
- transaction before attempting its own local transaction then no problem
- will occur.
+ 
+ &bdr; can only handle an
+ INSERT/INSERT conflict on one
+ unique constraint (including the PRIMARY KEY). If a
+ new row conflicts with more than one UNIQUE constraint
+ then the apply worker that's trying to apply the change will
+ ERROR out with:
+ 
+ ERROR: multiple unique constraints violated by remotely INSERTed tuple
+ 
+ (Older versions would report a "diverging uniqueness
+ conflict" error instead).
+ 
+ In case of such a conflict, you must manually remove the conflicting
+ tuple(s) from the local side by DELETEing it or by
+ UPDATEing it so that it no longer conflicts with the
+ new remote tuple. There may be more than one conflicting tuple. There is
+ not currently any built-in facility to ignore, discard or merge tuples
+ that violate more than one local unique constraint.
- 
- Solving such a foreign key deadlock requires that you fix the constraint
- issue on each end.
In this case, you would need to insert a dummy - parent row on node B and delete the new child on node - A. Replay will continue past the deadlock point. + &bdr; can only handle an + INSERT/INSERT conflict on one + unique constraint (including the PRIMARY KEY). If a + new row conflicts with more than one UNIQUE constraint + then the apply worker that's trying to apply the change will + ERROR out with: + + ERROR: multiple unique constraints violated by remotely INSERTed tuple + + (Older versions would report a "diverging uniqueness + conflict" error instead). - &bdr; can't just apply the changes from each end anyway because doing so - would result in tables that violated their declared foreign key - constraints, which most users would view as corruption. + In case of such a conflict, you must manually remove the conflicting + tuple(s) from the local side by DELETEing it or by + UPDATEing it so that it no longer conflicts with the + new remote tuple. There may be more than one conflicting tuple. There is + not currently any built-in facility to ignore, discard or merge tuples + that violate more than one local unique constraint. - - - - Exclusion constraint conflicts - - - &bdr; doesn't support exclusion constraints and restricts their creation. - - - - - If an existing stand-alone database is converted to a &bdr; database then - all exclusion constraints should be manually dropped. - - + - - In a distributed asynchronous system it is not possible to ensure that no - set of rows that violates the constraint exists, because all transactions - on different nodes are fully isolated. Exclusion constraints would lead to - replay deadlocks where replay could not progress from any node to any - other node because of exclusion constraint violations. - - - - If you force &bdr; to create an exclusion constraint, or you don't drop - existing ones when converting a standalone database to &bdr; you should - expect replication to break. 
You can get it to progress again by
- removing or altering the local tuple(s) that an incoming remote tuple
- conflicts with so that the remote transaction can be applied.
- 
+ 
+ Foreign Key Constraint conflicts
+ 
+ 
+ Conflicts between a remote transaction being applied and existing local data
+ can also occur for FOREIGN KEY constraints. These
+ conflicts are usually transient issues that arise from transactions being
+ applied in a different order to the order they appeared to occur logically
+ on the nodes that originated them.
+ 
+ 
+ 
+ While apply is strictly ordered for any given origin node, there is no
+ enforcement of ordering of transactions between two different nodes, so
+ it's possible for (e.g.) node1 to insert a row into T1, which is replayed to node2.
+ node2 inserts a row into T2 which has a foreign key reference to the row from T1.
+ On node3, if the transaction from node2 that inserts the row into T2 is received
+ before the transaction from node1 that inserts the row into T1,
+ the transaction from node2 will fail to apply. This failure will record a rollback
+ in and an ERROR
+ with details in the PostgreSQL error log on the applying node (node3). In
+ this case &bdr; will retry the transaction from node2 periodically, so
+ once it's replayed the transaction from node1 that it depends on, the
+ transaction will commit successfully.
+ 
+ 
+ 
+ Foreign key constraint deadlocks
+ 
+ 
+ Simple foreign key constraint conflicts are generally transient and
+ require no administrator action, but for transactions that change multiple
+ entities this is not always the case. It is possible for
+ concurrent changes to tables with foreign key constraints to create
+ inter-node replication deadlocks where no node can
+ apply changes from any other node because they conflict with local data.
+ This causes replication activity to stop until the deadlock is broken by a
+ local data change on one or more of the nodes.
+ 
+ 
+ For example, take a two-node system with two tables and some existing data:
+ 
+ CREATE TABLE parent(
+     id integer primary key
+ );
+ 
+ CREATE TABLE child(
+     id integer primary key,
+     parent_id integer not null references parent(id)
+ );
+ 
+ INSERT INTO parent(id)
+ VALUES (1), (2);
+ 
+ INSERT INTO child(id, parent_id)
+ VALUES (11, 1), (12, 2);
+ 
+ If node A does:
+ 
+ INSERT INTO child(id, parent_id)
+ VALUES (21, 2);
+ 
+ and at the same time node B does:
+ 
+ DELETE FROM child WHERE parent_id = 2;
+ DELETE FROM parent WHERE id = 2;
+ 
+ then we have a situation where the transaction from node A cannot apply
+ successfully to the child table on node B because the
+ referenced parent no longer exists. The transaction
+ from node B cannot apply to node A because it deletes a
+ parent tuple that's still referenced, the new one with
+ id=21. Neither transaction can replay, and both will output periodic
+ ERRORs in the log files as they are retried. Since
+ &bdr; replays transactions from a given node strictly in order, neither
+ node can make progress with replication unless the user, or some third node,
+ makes changes that resolve the deadlock.
+ 
+ 
+ 
+ It is important to note that when we manually deleted the child tuples
+ on node B, the newly inserted child on node A was not affected because
+ it had not yet replicated to node B. If either node replays the other's
+ transaction before attempting its own local transaction then no problem
+ will occur.
+ 
+ 
+ 
+ Solving such a foreign key deadlock requires that you fix the constraint
+ issue on each end. In this case, you would need to insert a dummy
+ parent row on node B and delete the new child on node
+ A. Replay will continue past the deadlock point.
+ 
+ 
+ 
+ &bdr; can't just apply the changes from each end anyway because doing so
+ would result in tables that violated their declared foreign key
+ constraints, which most users would view as corruption.
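The resolution just described (a dummy parent on node B, removal of the new child on node A) can be sketched in SQL against the example schema; this is illustrative only, and the exact keys depend on the data involved in the deadlock:

```sql
-- On node B: re-create the deleted parent so that node A's stalled
-- INSERT of child 21 can apply.
INSERT INTO parent(id) VALUES (2);

-- On node A: remove the new child so that node B's stalled
-- DELETE of parent 2 no longer violates the foreign key.
DELETE FROM child WHERE id = 21;
```

Once both statements have committed and replicated, replay proceeds past the deadlock point on its own.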
+ - - Global data conflicts - - - Conflicts can also arise where nodes have global (PostgreSQL-system-wide) - data, like roles, that differs. This can result in operations - mainly - DDL - that can be run successfully and committed - on one node, but then fail to apply to other nodes. - - - - For example, node1 might have a user named - fred, but that user was not created on node2. - &bdr; does not replicate CREATE USER (see - ) so this situation can arise easily. - If fred on node1 creates a table, it will - be replicated with its owner set to fred. - When the DDL command is applied to node2 the DDL will fail - because there is no user named fred. - This failure will emit an ERROR in the - PostgreSQL logs on node2 and increment - .nr_rollbacks. - - - - Administrator intervention is required to resolve this conflict - by creating the user fred on node2. - (It need not have the same permissions, but must exist). - - - + - - Lock conflicts and deadlock aborts + + Exclusion constraint conflicts - - Because &bdr; apply processes operate very like normal user sessions - they are subject to the usual rules around row and table locking. This - can sometimes lead to &bdr; apply processes waiting on locks held - by user transactions, or even by each other. - + + &bdr; doesn't support exclusion constraints and restricts their creation. + + - Relevant locking includes; - - explicit table-level locking (LOCK TABLE ...) by user sessions - explicit row level locking (SELECT ... FOR UPDATE/FOR SHARE) by user sessions - locking from foreign keys - implicit locking because of row UPDATEs, INSERTs or DELETEs, either from local activity or apply from other servers - + If an existing stand-alone database is converted to a &bdr; database then + all exclusion constraints should be manually dropped. 
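One way to find the exclusion constraints that must be dropped before conversion is to query the system catalogs; as a sketch, `contype = 'x'` marks exclusion constraints in `pg_constraint`:

```sql
-- List all exclusion constraints in the current database so they can be
-- dropped before enabling BDR. contype 'x' denotes an exclusion constraint.
SELECT conrelid::regclass AS table_name,
       conname            AS constraint_name
  FROM pg_constraint
 WHERE contype = 'x';

-- Then, for each row returned (names here are illustrative):
-- ALTER TABLE some_table DROP CONSTRAINT some_exclusion_constraint;
```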
+ 
+ 
+ 
+ In a distributed asynchronous system it is not possible to ensure that no
+ set of rows that violates the constraint exists, because all transactions
+ on different nodes are fully isolated. Exclusion constraints would lead to
+ replay deadlocks where replay could not progress from any node to any
+ other node because of exclusion constraint violations.
+ 
+ 
+ 
+ If you force &bdr; to create an exclusion constraint, or you don't drop
+ existing ones when converting a standalone database to &bdr;, you should
+ expect replication to break. You can get it to progress again by
+ removing or altering the local tuple(s) that an incoming remote tuple
+ conflicts with so that the remote transaction can be applied.
+ 
+ 
+ 
+ 
+ Global data conflicts
+ 
+ 
+ Conflicts can also arise where nodes have global (PostgreSQL-system-wide)
+ data, like roles, that differs. This can result in operations - mainly
+ DDL - that can be run successfully and committed
+ on one node, but then fail to apply to other nodes.
+ 
+ 
+ 
+ For example, node1 might have a user named
+ fred, but that user was not created on node2.
+ &bdr; does not replicate CREATE USER (see
+ ) so this situation can arise easily.
+ If fred on node1 creates a table, it will
+ be replicated with its owner set to fred.
+ When the DDL command is applied to node2 the DDL will fail
+ because there is no user named fred.
+ This failure will emit an ERROR in the
+ PostgreSQL logs on node2 and increment
+ .nr_rollbacks.
+ 
+ 
+ 
+ Administrator intervention is required to resolve this conflict
+ by creating the user fred on node2.
+ (It need not have the same permissions, but must exist.)
+ 
+ 
+ 
+ 
+ Lock conflicts and deadlock aborts
+ 
+ 
+ Because &bdr; apply processes operate very much like normal user sessions
+ they are subject to the usual rules around row and table locking. This
+ can sometimes lead to &bdr; apply processes waiting on locks held
+ by user transactions, or even by each other.
+ 
+ 
+ 
+ Relevant locking includes:
+ 
+ explicit table-level locking (LOCK TABLE ...) by user sessions
+ explicit row-level locking (SELECT ... FOR UPDATE/FOR SHARE) by user sessions
+ locking from foreign keys
+ implicit locking because of row UPDATEs, INSERTs or DELETEs, either from local activity or apply from other servers
+ 
+ 
+ 
+ 
+ It is even possible for a &bdr; apply process to deadlock with a user
+ transaction, where the user transaction is waiting on a lock held
+ by the apply process and vice versa. Two apply processes may also
+ deadlock with each other. PostgreSQL's deadlock detector will
+ step in and terminate one of the problem transactions. If the &bdr; apply
+ worker's process is terminated it will simply retry and generally succeed.
+ 
+ 
+ 
+ All these issues are transient and generally require no administrator
+ action. If an apply process is stuck for a long time behind a lock
+ on an idle user session the administrator may choose to terminate
+ the user session to get replication flowing again, but this is
+ no different to a user holding a long lock that impacts another
+ user session.
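The kind of blocking described above can be inspected from the catalogs. As an illustrative sketch only: `pg_blocking_pids()` exists in PostgreSQL 9.6 and later, so on older releases you would instead need to join `pg_locks` against itself:

```sql
-- Show sessions that are currently waiting on a lock and which PIDs hold
-- the locks they wait on; a stuck BDR apply worker will appear here.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

If the blocker turns out to be an idle user session holding a lock, terminating that session (as discussed above) lets replication resume.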
+ 
+ 
+ 
+ Use of the
+ log_lock_waits facility in PostgreSQL can help identify
+ locking-related replay stalls.
+ 
+ 
+ 
+ 
+ 
+ Divergent conflicts
+ 
+ 
+ Divergent conflicts arise when data that should be the same on different
+ nodes differs unexpectedly. Divergent conflicts should not occur, but not
+ all such conflicts can be reliably prevented at the time of writing.
+ 
+ 
+ 
+ Changing the PRIMARY KEY of a row can lead to a
+ divergent conflict if another node changes the key of the same row before
+ all nodes have replayed the change. Avoid changing primary keys, or
+ change them only on one designated node.
+ 
+ 
+ 
+ Divergent conflicts involving row data generally require administrator
+ action to manually adjust the data on one of the nodes to be consistent
+ with the other one while replication is temporarily disabled using . Such conflicts should not arise
+ so long as &bdr; is used as documented and settings or functions marked
+ as unsafe are avoided.
+ 
+ 
+ 
+ The administrator must manually resolve such conflicts. Use of the
+ advanced options ,
+ and
+ may be required depending
+ on the nature of the conflict.
However, careless use of these options
+ can make things much worse and it isn't possible to give general
+ instructions for resolving all possible kinds of conflict.
+ 


@@ -422,21 +433,45 @@


 The best course of action is frequently to allow conflicts to occur and
 design the application to work with &bdr;'s conflict resolution
- mechansisms to cope with the conflict. See .
+ mechanisms to cope with the conflict. See .


- 
 User defined conflict handlers

 &bdr; provides facilities for users to override the default last-update-wins
- data row conflict resolution strategy.
+ data row conflict resolution strategy on row key conflicts.
+ 
+ 
+ 
+ A user-defined conflict handler, if provided, is called before default
+ row conflict resolution is performed. The user-defined handler may choose
+ to ignore the new row and keep the original local row, to apply the new
+ row, or to generate a new row (possibly merging old and new) and apply
+ that instead of the new incoming row. A conflict handler may also
+ choose to ERROR out, which can be useful if it wishes
+ to abort apply of a transaction and retry it later.
+ 
+ Conflict handlers cannot skip whole transactions.
+ 
+ 
+ 
+ User-defined conflict handlers do not have access to both the old and new
+ versions of the remote row, so they cannot tell which field(s) in the
+ remote incoming tuple changed. It is thus not possible to do reliable row
+ merging. Attempts to do so for the general case will usually prove to be
+ incorrect in an asynchronous replication environment. It's possible in
+ some application-specific situations where the app "knows" more about
+ the data.
+ 
+ 
+ 

-- 
2.39.5