bdr: doc: more explanation of multi-master

author Craig Ringer <craig@2ndquadrant.com>

Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)

committer Craig Ringer <craig@2ndquadrant.com>

Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)
author Craig Ringer <craig@2ndquadrant.com>
Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)
committer Craig Ringer <craig@2ndquadrant.com>
Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)
diff --git a/doc/bdr.sgml b/doc/bdr.sgml

index 911e7bb494aceb35c856143d978f8fe44a891a84..1bc839ec381765f91a997e0ab569e2bddfd19b49 100644 (file)
--- a/doc/bdr.sgml
+++ b/doc/bdr.sgml
@@ -100,6 +100,8 @@
    &manual-overview;
    &manual-node-management;
    &manual-ddl-replication;
+  &manual-conflicts;
+  &manual-global-sequences;
    &manual-replication-sets;
    &manual-monitoring;
    &manual-settings;
diff --git a/doc/filelist.sgml b/doc/filelist.sgml

index 9ce7e9f2a9763f0fd5be52750131e8e28e047200..2fc6059f369df2aa9a8e16f7a25871322d8e44a7 100644 (file)
--- a/doc/filelist.sgml
+++ b/doc/filelist.sgml
@@ -35,6 +35,8 @@
  <!ENTITY manual-overview      SYSTEM "manual-overview.sgml">
  <!ENTITY manual-node-management      SYSTEM "manual-node-management.sgml">
  <!ENTITY manual-ddl-replication      SYSTEM "manual-ddl-replication.sgml">
+<!ENTITY manual-conflicts      SYSTEM "manual-conflicts.sgml">
+<!ENTITY manual-global-sequences      SYSTEM "manual-global-sequences.sgml">
  <!ENTITY manual-replication-sets      SYSTEM "manual-replication-sets.sgml">
  <!ENTITY manual-distributed-sequences      SYSTEM "manual-distributed-sequences.sgml">
  <!ENTITY manual-monitoring      SYSTEM "manual-monitoring.sgml">
diff --git a/doc/manual-conflicts.sgml b/doc/manual-conflicts.sgml

new file mode 100644 (file)

index 0000000..ef66d40
--- /dev/null
+++ b/doc/manual-conflicts.sgml
@@ -0,0 +1,13 @@
+<chapter id="conflicts" xreflabel="Multi-master conflicts">
+ <title>Multi-master conflicts</title>
+ <indexterm>
+  <primary>Conflicts</primary>
+ </indexterm>
+ <para>
+  In multi-master use of &bdr; writes to the same or related table(s) from multiple
+  different nodes can result in data conflicts.
+
+  <!-- TODO -->
+
+ </para>
+</chapter>
diff --git a/doc/manual-global-sequences.sgml b/doc/manual-global-sequences.sgml

new file mode 100644 (file)

index 0000000..5a28c18
--- /dev/null
+++ b/doc/manual-global-sequences.sgml
@@ -0,0 +1,13 @@
+<chapter id="global-sequences" xreflabel="Global sequences">
+ <title>Global Sequences</title>
+ <indexterm>
+  <primary>Global Sequence</primary>
+  <secondary>Distributed sequence</secondary>
+ </indexterm>
+ <para>
+  To help avoid <xref linkend="conflicts"> on concurrent inserts &bdr; provides global sequences.
+
+  <!-- TODO -->
+
+ </para>
+</chapter>
diff --git a/doc/manual-overview.sgml b/doc/manual-overview.sgml

index 74ad2329ea5c0ef8335770b9fda6161b4ee07ffe..5250a755ac2c8dd1aa5047793ab05a501590f150 100644 (file)
--- a/doc/manual-overview.sgml
+++ b/doc/manual-overview.sgml
@@ -157,37 +157,47 @@
    </para>
  
    <para>
-   Major differences between physical replication and logical replication as implemented by &bdr;/&udr;:
+   Logical replication has a different set of trade-offs to physical
+   block-based replication. It isn't clearly better or worse. Physical
+   replication is a lot simpler, has less lag for big transactions, is
+   supported by older versions and may require less disk I/O, but generally
+   consumes more network bandwidth, can't replicate a subset of databases or
+   tables, and can't support multi-master or cross-version/cross-arch
+   replication. Which solution you should use depends on what you need to do.
+  </para>
+
+  <para>
+   The major differences between physical replication and logical replication as implemented by &bdr;/&udr; are:
     <itemizedlist>
      <listitem>
-     <simpara>There is no need to cancel long running queries on hot standbys, so there aren't any <quote>cancelling statement due to conflict with recovery</quote> errors.</simpara>
+     <simpara>Multi-master replication is possible (&bdr; only). All members are writable nodes that replicate changes.</simpara>
      </listitem>
      <listitem>
-     <simpara>Temporary tables may be used on replicas.</simpara>
+     <simpara>Data from index writes, <literal>VACUUM</literal>, hint bits, etc are not sent over the network, so bandwidth requirements may be reduced - especially when compared to physical replication with <literal>full_page_writes</literal>.</simpara>
      </listitem>
      <listitem>
-     <simpara>Replication across major versions (e.g. 9.4 to 9.5) is supported.</simpara>
+     <simpara>There is no need to use <ulink url="http://www.postgresql.org/docs/current/static/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK"><literal>hot_standby_feedback</literal></ulink> or to cancel long running queries on hot standbys, so there aren't any <quote>cancelling statement due to conflict with recovery</quote> errors.</simpara>
      </listitem>
      <listitem>
-     <simpara>Replication across architectures and OSes (e.g. PPC64 Linux to x86_64 OS X) is supported.</simpara>
+     <simpara>Temporary tables may be used on replicas.</simpara>
      </listitem>
      <listitem>
-     <simpara>Replication is per-database (or finer grained), wheras physical replication can and must replicate all databases.</simpara>
+     <simpara>Tables that aren't being replicated from elsewhere may be written to (&udr; and &bdr;).</simpara>
      </listitem>
      <listitem>
-     <simpara>Logical replication imposes some restrictions on supported DDL (see: <xref linkend="ddl-replication">) that do not apply for physical replication</simpara>
+     <simpara>Replication across major versions (e.g. 9.4 to 9.5) is supported.</simpara>
      </listitem>
      <listitem>
-     <simpara>Commands that affect all databases, like <literal>ALTER SYSTEM</literal> or <literal>CREATE ROLE</literal> are <emphasis>not</emphasis> replicated by &bdr;/&udr; and must be managed by the administrator.</simpara>
+     <simpara>Replication across architectures and OSes (e.g. PPC64 Linux to x86_64 OS X) is supported.</simpara>
      </listitem>
      <listitem>
-     <simpara>Tables that aren't being replicated from elsewhere may be written to (&udr; and &bdr;).</simpara>
+     <simpara>Replication is per-database (or finer grained), wheras physical replication can and must replicate all databases.</simpara>
      </listitem>
      <listitem>
-     <simpara>Multi-master replication is possible (&bdr; only).</simpara>
+     <simpara>&bdr;'s logical replication implementation imposes some restrictions on supported DDL (see: <xref linkend="ddl-replication">) that do not apply for physical replication</simpara>
      </listitem>
      <listitem>
-     <simpara>Data from index writes, <literal>VACUUM</literal>, hint bits, etc are not sent on the wire, so network bandwidth requirements may be reduced - especially when compared to <literal>full_page_writes</literal>.</simpara>
+     <simpara>Commands that affect all databases, like <literal>ALTER SYSTEM</literal> or <literal>CREATE ROLE</literal> are <emphasis>not</emphasis> replicated by &bdr;/&udr; and must be managed by the administrator.</simpara>
      </listitem>
      <listitem>
       <simpara>Disk random I/O requirements and flush frequency may be higher than for physical replication.</simpara>
@@ -202,21 +212,14 @@
       <simpara>Logical replication cannot be used for point-in-time recovery (though it can support a replication delay).</simpara>
      </listitem>
      <listitem>
-     <simpara>Logical replication requires the use of a <ulink url="http://www.postgresql.org/docs/current/static/logicaldecoding-explanation.html">replication slot</ulink>, it isn't optional like in physical replication.</simpara>
+     <simpara>Logical replication only works via streaming, not WAL file archiving, and requires the use of a <ulink url="http://www.postgresql.org/docs/current/static/logicaldecoding-explanation.html">replication slot</ulink>.</simpara>
+    </listitem>
+    <listitem>
+     <simpara>Cascading replication is not (yet) supported by logical replication.</simpara>
      </listitem>
     </itemizedlist>
    </para>
  
-  <para>
-   It should be clear that logical replication has a different set of
-   trade-offs to physical block-based replication. It isn't clearly better or
-   worse. Physical replication is a lot simpler, has less lag for big
-   transactions, is supported by older versions and may require less disk I/O,
-   but generally consumes more network bandwidth, can't replicate a subset
-   of databases or tables, and can't support multi-master or cross-version/cross-arch
-   replication. Which solution you should use depends on what you need to do.
-  </para>
-
    <para>
     Most users will want to use physical replication and/or WAL archiving for
     redundancy, high availability, backup and PITR.  Logical replication is well
@@ -309,7 +312,7 @@
    </para>
  
    <para>
-   Trigger-based replication also requires an external dæmon process on
+   Trigger-based replication also requires an external d&aelig;mon process on
     the sending and/or receiving side. &bdr; runs its management processes
     inside PostgreSQL its self, so there's no separate replication process
     to manage.
@@ -337,11 +340,20 @@
     or another technology) it is important to understand what's involved,
     and that not all multi-master systems are equal.
    </para>
-  
-  <para>
-   &bdr; is a loosely coupled shared-nothing multi-master design.
-  </para>
-   
+
+  <note>
+   <para>
+    You don't have to use &bdr; for multi-master. It's quite reasonable
+    to write to only one node, using &bdr; like an improved read-replica
+    system. It's also possible to make sure that any given table/schema
+    is only written to on one particular node, so no conflicts can arise.
+    You still have to consider replication lag, but no more or less than with
+    normal hot standbys. It only gets complicated when your application writes
+    to the same tables on multiple nodes at once. If you need to do that,
+    keep reading.
+   </para>
+  </note>
+
    <para>
     Some multi-master systems are <emphasis>tightly-coupled</emphasis>; these
     tend to make all nodes appear to be part of the same virtual database to
@@ -372,30 +384,103 @@
     the system very tolerant of latency between nodes, temporary network
     partitions or node outages, etc, and makes scale-out more efficient.
    </para>
+  
+  <para>
+   &bdr; is a loosely coupled shared-nothing multi-master design.
+  </para>
  
    <para>
-   This is a broad and overly simplified characterisation of replication, but it's
-   enough to help you understand a few key things about &bdr;:
+   This is a broad and overly simplified characterisation of replication, but
+   it's enough to explain why applications that use &bdr; for multi-master
+   writes need to be aware of the anomalies that can be introduced by
+   asynchronous multi-master replication. It should also help illustrate
+   that applications get some significant benefits in exchange:
     <itemizedlist>
-
-    <!--
-         * eventually consistent, no single view
-         * locks not replicated, serializable and read committed only within a node
-         * conflicts possible
-         * optimistic conflict handling, user defined handlers
-         * need to use global sequences, prefer insertion to updating where contention arises
-         * application may need to negotiate synchronisation
-    -->
-
      <listitem>
       <simpara>
-      WIP
+      Applications using &bdr; are free to write to any node so long as they
+      are careful to prevent or cope with conflicts.
+     </simpara>
+    </listitem>
+    <listitem>
+     <simpara>
+      There is no complex election of a new master if a node goes down or
+      network problems arise. There is no wait for failover. Each node is
+      always a master and always directly writeable.
+     </simpara>
+    </listitem>
+    <listitem>
+     <simpara>
+      The application can be geographically distributed so that the app and is
+      close to the data and the user for better performance and availability.
+      Reads can be satisfied locally.
+     </simpara>
+    </listitem>
+    <listitem>
+     <simpara>
+      Applications can be partition-tolerant: the application can keep keep
+      working even if it loses communication with some or all other nodes, then
+      re-sync automatically when connectivity is restored. Loss of a critical
+      VPN tunnel or WAN won't bring the entire store or satellite office to a
+      halt.
       </simpara>
      </listitem>
-
     </itemizedlist>
    </para>
  
+  <para>
+   With the advantages come challenges.
+  </para>
+
+  <para>
+   Because &bdr; replicates asynchronously, not all nodes have the same view of
+   the data at any given instant. On a single node it is guaranteed that
+   committed transactions' changes become visible immediately to newly started
+   transactions (or in <literal>READ COMMITTED</literal> mode, statements).
+   This isn't true in &bdr; - if you <literal>COMMIT</literal> a transaction
+   that changes a row on one node, then <literal>SELECT</literal> that row on
+   another node, you may well still get the old value. Applications must
+   therefore be designed to be tolerant of stale data or to be "sticky" to a
+   node, where they prefer read data from the same node they write it to.
+   This is also true of applications using PostgreSQL's physical replication
+   feature unless it is used in synchronous mode with only one replica, so
+   it's a challenge that's far from unique to &bdr;.
+  </para>
+
+  <para>
+   Locking operations are not replicated to other nodes. If you lock a row
+   or table in one node the other nodes have no idea it is locked elsewhere.
+   Applications that rely on row or table locking for correctness will
+   only work correctly if all their writes and locked reads occur on a
+   single node. Applications might rely on locking explicitly via
+   <literal>LOCK TABLE</literal> or
+   <literal>SELECT ... FOR UPDATE / SHARE</literal>, but most applications
+   rely on it implicitly via <literal>UPDATE</literal> and
+   <literal>DELETE</literal> row-locking, so the absence of explicit locking
+   does not mean an app is automatically multi-master safe.
+  </para>
+
+  <para>
+   Because of asynchronous replication and the lack of global locking,
+   it is possible for transactions on different nodes to perform actions
+   that could not happen if both transactions ran on a single node.
+   These are called <emphasis>conflicts</emphasis> and are discussed in detail
+   separately; see <xref linkend="conflicts">. &bdr; can resolve
+   conflicts using a simple last-update-wins strategy or using user-defined
+   conflict handlers. Either way the application design needs to consider
+   that conflicts may occur, and where possible minimise them. Na&iuml;ive
+   applications that ignore the the possibility of conflicts when writing
+   to multiple masters may suffer from lost-updates and other undesirable
+   data anomalies.
+  </para>
+
+  <para>
+   &bdr; provides some tools to help make application design easier.  The most
+   important is <xref linkend="global-sequences">, which provide a
+   &bdr;-group-wide generator of unique values for use in synthetic keys.
+   Others are discussed in the <xref linkend="conflicts"> section.
+  </para>
+
   </sect1>
  
   <sect1 id="how-bdr-works" xreflabel="How &bdr; and &udr; work">
@@ -404,15 +489,17 @@
    <sect2>
     <title>Differences between &bdr; and &udr;</title>
  
+   <!-- NEEDS-REVIEW -->
+
     <para>
      The most important difference between &bdr; and &udr; is that
      with &bdr; changes can be made on both nodes; and that methods
      for resolving conflicts arising due to that exist.
-
     </para>
  
     <sect3>
      <title>Primary Usecases for &udr;</title>
+
      <para>
       If changes are only ever made on one node and the others are
       just used for read scalability (including temporary tables on
@@ -420,18 +507,20 @@
       from different nodes &udr; is sufficient. These scenarios cover
       a large number of replication scenarios.
      </para>
+
      <para>
       Using &udr; can noticeably reduce the likelihood of application
-     incompatibilities, because they only one database will be
-     written to, which prevents conflicts and such. On the other
-     hand, a &bdr; setup can make failover much easier, because there
-     is no need to change the setup.
+     incompatibilities. Only one database will be written to, which prevents
+     conflicts and such. On the other hand, a &bdr; setup can make failover
+     much easier, because there is no need to change the setup.
      </para>
+
     </sect3>
  
     <sect3>
      <title>Primary Usecases for &udr;</title>
      <para>
+     <!-- NEEDS-REVIEW -->
      </para>
     </sect3>
    </sect2>
author	Craig Ringer <craig@2ndquadrant.com>
	Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)
committer	Craig Ringer <craig@2ndquadrant.com>
	Tue, 10 Mar 2015 14:36:51 +0000 (22:36 +0800)
doc/bdr.sgml		patch \| blob \| blame \| history
doc/filelist.sgml		patch \| blob \| blame \| history
doc/manual-conflicts.sgml	[new file with mode: 0644]	patch \| blob
doc/manual-global-sequences.sgml	[new file with mode: 0644]	patch \| blob
doc/manual-overview.sgml		patch \| blob \| blame \| history