From: Craig Ringer
Date: Tue, 16 Jun 2015 02:37:56 +0000 (+0800)
Subject: More explanation of conflict types
X-Git-Url: http://git.postgresql.org/gitweb/static/gitweb.js?a=commitdiff_plain;h=d4ddb1b9206028622a10f8a762c457edd2a828a6;p=2ndquadrant_bdr.git

More explanation of conflict types
---

diff --git a/doc/manual-conflicts.sgml b/doc/manual-conflicts.sgml
index 89b57c4249..315083d865 100644
--- a/doc/manual-conflicts.sgml
+++ b/doc/manual-conflicts.sgml
@@ -59,8 +59,8 @@
 Types of conflict
-
- Row conflicts
+
+ <literal>PRIMARY KEY</literal> or <literal>UNIQUE</literal> conflicts
 The most common conflicts are row conflicts where two operations affect a
@@ -77,23 +77,82 @@
 UPDATE vs DELETE
 INSERT vs DELETE
-
+
+
+
+
+ The most common conflict, INSERT vs
+ INSERT, arises where INSERTs on two
+ different nodes create a tuple with the same PRIMARY
+ KEY values (or the same values for a single
+ UNIQUE constraint). &bdr; handles this by retaining
+ the most recently inserted tuple of the two according to the originating
+ host's timestamps unless a user-defined conflict handler overrides this.
+
+
+
+ No special administrator action is required to deal with these conflicts,
+ but the user must understand that one of the
+ INSERTed tuples is effectively discarded on all
+ nodes - there is no data merging done unless a user-defined
+ conflict handler does it.
+
+
+
+
+
+ INSERTs that violate multiple UNIQUE constraints
+
+
+ An INSERT/INSERT conflict
+ can violate more than one UNIQUE constraint
+ (of which one might be the PRIMARY KEY).
+
+
+
+ &bdr; can only handle an
+ INSERT/INSERT conflict on one
+ unique constraint (including the PRIMARY KEY). If a
+ new row conflicts with more than one UNIQUE constraint
+ then the apply worker that's trying to apply the change will
+ ERROR out with:
+
+ ERROR: multiple unique constraints violated by remotely INSERTed tuple
+
+ (Older versions would report a "diverging uniqueness
+ conflict" error instead).
+
+
+
+ In case of such a conflict, you must manually remove the conflicting
+ tuple(s) from the local side by DELETEing it or by
+ UPDATEing it so that it no longer conflicts with the
+ new remote tuple. There may be more than one conflicting tuple. There is
+ not currently any built-in facility to ignore, discard or merge tuples
+ that violate more than one local unique constraint.
+
+
+
+
-
+
 Constraint conflicts
- Constraint conflicts can also occur, mainly with foreign keys. These are
- usually transient issues that arise from transactions being applied in a
- different order to the order they appeared to occur logically on the nodes
- that originated them.
+ Conflicts between a remote transaction being applied and existing local data
+ can also occur for FOREIGN KEY constraints. These
+ conflicts are usually transient issues that arise from transactions being
+ applied in a different order to the order they appeared to occur logically
+ on the nodes that originated them.
 While apply is strictly ordered for any given origin node, there is no
- enforcemnet of ordering of transactions between two different nodes, so
+ enforcement of ordering of transactions between two different nodes, so
 it's possible for (e.g.) node1 to insert a row into T1, which is replayed to node2.
 node2 inserts a row into T2 which has a foreign key reference to the row from T1.
 On node3, if the transaction from node2 that inserts the row into T2 is received
@@ -106,20 +165,116 @@
 transaction will commit successfully.
+
+ Foreign key constraint deadlocks
+
+
+ Simple foreign key constraint conflicts are generally transient and
+ require no administrator action, but for transactions that change multiple
+ entities this is not always the case. It is possible for
+ concurrent changes to tables with foreign key constraints to create
+ inter-node replication deadlocks where no node can
+ apply changes from any other node because they conflict with local data.
+ This causes replication activity to stop until the deadlock is broken by a
+ local data change on one or more of the nodes.
+
+
+
+ For example, take a two-node system with two tables and some existing data:
+
+ CREATE TABLE parent(
+     id integer primary key
+ );
+
+ CREATE TABLE child(
+     id integer primary key,
+     parent_id integer not null references parent(id)
+ );
+
+ INSERT INTO parent(id)
+ VALUES (1), (2);
+
+ INSERT INTO child(id, parent_id)
+ VALUES (11, 1), (12, 2);
+
+ If node A does:
+
+ INSERT INTO child(id, parent_id)
+ VALUES (21, 2);
+
+ and at the same time node B does:
+
+ DELETE FROM child WHERE parent_id = 2;
+ DELETE FROM parent WHERE id = 2;
+
+ then we have a situation where the transaction from node A cannot apply
+ successfully to the child table on node B because the
+ referenced parent no longer exists. The transaction
+ from node B cannot apply to node A because it deletes a
+ parent tuple that's still referenced, the new one with
+ id=21. Neither transaction can replay, and both will output periodic
+ ERRORs in the log files as they are retried. Since
+ &bdr; replays transactions from a given node strictly in order, neither
+ node can make progress with replication unless the user, or some third node,
+ makes changes that resolve the deadlock.
+
+
+
+ It is important to note that when we manually deleted the child tuples
+ on node B, the newly inserted child on node A was not affected because
+ it had not yet replicated to node B. If either node replays the other's
+ transaction before attempting its own local transaction then no problem
+ will occur.
+
+
+
+ Solving such a foreign key deadlock requires that you fix the constraint
+ issue on each end. In this case, you would need to insert a dummy
+ parent row on node B and delete the new child on node
+ A. Replay will continue past the deadlock point.
+
+
+
+ &bdr; can't just apply the changes from each end anyway because doing so
+ would result in tables that violated their declared foreign key
+ constraints, which most users would view as corruption.
+
+
+
+
+
+
+
+ Exclusion constraint conflicts
+
- Constraint conflicts are generally transient and require no administrator action.
+ &bdr; doesn't support exclusion constraints and restricts their creation.
-
+
- Constraint conflicts are the reason why &bdr; does not support exclusion
- constraints. In a distributed asynchronous system it is not possible to
- ensure that no set of rows that violates the constraint exists, because
- all transactions on different nodes are fully isolated. Exclusion constraints
- would lead to replay deadlocks where replay could not progress from any
- node to any other node because of exclusion constraint violations.
+ If an existing standalone database is converted to a &bdr; database then
+ all exclusion constraints should be manually dropped.
-
+
+
+
+ In a distributed asynchronous system it is not possible to ensure that no
+ set of rows that violates the constraint exists, because all transactions
+ on different nodes are fully isolated. Exclusion constraints would lead to
+ replay deadlocks where replay could not progress from any node to any
+ other node because of exclusion constraint violations.
+
+
+
+ If you force &bdr; to create an exclusion constraint, or you don't drop
+ existing ones when converting a standalone database to &bdr;, you should
+ expect replication to break. You can get it to progress again by
+ removing or altering the local tuple(s) that an incoming remote tuple
+ conflicts with so that the remote transaction can be applied.
+