-<!-- $Id: failover.sgml,v 1.28 2008-10-13 19:29:12 devrim Exp $ -->
+<!-- $Id: failover.sgml,v 1.29 2009-01-16 17:16:52 cbbrowne Exp $ -->
<sect1 id="failover">
<title>Doing switchover and failover with &slony1;</title>
<indexterm><primary>failover</primary>
</sect2>
+<sect2 id="complexfailover"> <title> Failover With Complex Node Set </title>
+
+<para> Failover is relatively <quote/simple/ if there are only two
+nodes; if a &slony1; cluster comprises many nodes, achieving a clean
+failover requires careful planning and execution. </para>
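+
+<para> In the two-node case, the entire procedure can amount to a
+single <command>FAILOVER</command> followed by dropping the failed
+node. A minimal sketch, assuming node 1 is the failed origin and node
+2 the surviving subscriber (the preamble file name here is
+illustrative):</para>
+
+<programlisting>
+ include </tmp/failover-preamble.slonik>;
+ failover (id = 1, backup node = 2);
+ drop node (id = 1, event node = 2);
+</programlisting>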
+
+<para> Consider the following diagram describing a set of six nodes at two sites.
+
+<inlinemediaobject> <imageobject> <imagedata fileref="complexenv.png">
+</imageobject> <textobject> <phrase> Symmetric Multisites </phrase>
+</textobject> </inlinemediaobject></para>
+
+<para> Let us assume that nodes 1, 2, and 3 reside at one data
+centre, and that we find ourselves needing to perform failover due to
+failure of that entire site. Causes could range from a persistent
+loss of communications to the physical destruction of the site; the
+cause is not actually important, as what we are concerned about is how
+to get &slony1; to properly fail over to the new site.</para>
+
+<para> We will further assume that node 5 is to be the new origin,
+after failover. </para>
+
+<para> The sequence of &slony1; reconfiguration required to properly
+fail over this sort of node configuration is as follows:
+</para>
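+
+<para> Each script below begins by including a common preamble file
+that names the cluster and provides admin conninfo for the
+still-reachable nodes. A minimal sketch of such a preamble, in which
+the cluster name and conninfo values are hypothetical: </para>
+
+<programlisting>
+ cluster name = testcluster;
+ node 4 admin conninfo = 'dbname=testdb host=node4 user=slony';
+ node 5 admin conninfo = 'dbname=testdb host=node5 user=slony';
+ node 6 admin conninfo = 'dbname=testdb host=node6 user=slony';
+</programlisting>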
+
+<itemizedlist>
+
+<listitem><para> Resubscribe (using <xref linkend="stmtsubscribeset">)
+each node that is to be kept in the reformed cluster and that is not
+already subscribed to the intended data provider. </para>
+
+<para> In the example cluster, this means we would likely wish to
+resubscribe nodes 4 and 6 to both point to node 5.</para>
+
+<programlisting>
+ include </tmp/failover-preamble.slonik>;
+ subscribe set (id = 1, provider = 5, receiver = 4);
+ subscribe set (id = 1, provider = 5, receiver = 6);
+</programlisting>
+
+</listitem>
+<listitem><para> Drop all unimportant nodes, starting with leaf nodes.</para>
+
+<para> Since nodes 1, 2, and 3 are inaccessible, we must indicate the
+<envar>EVENT NODE</envar> so that the event reaches the still-live
+portions of the cluster. </para>
+
+<programlisting>
+ include </tmp/failover-preamble.slonik>;
+ drop node (id=2, event node = 4);
+ drop node (id=3, event node = 4);
+</programlisting>
+
+</listitem>
+
+<listitem><para> Now, run <command>FAILOVER</command>.</para>
+
+<programlisting>
+ include </tmp/failover-preamble.slonik>;
+ failover (id = 1, backup node = 5);
+</programlisting>
+
+</listitem>
+
+<listitem><para> Finally, drop the former origin from the cluster.</para>
+
+<programlisting>
+ include </tmp/failover-preamble.slonik>;
+ drop node (id=1, event node = 4);
+</programlisting>
+</listitem>
+
+</itemizedlist>
+
<sect2><title> Automating <command> FAIL OVER </command> </title>
<indexterm><primary>automating failover</primary></indexterm>
linkend="stmtmoveset"> instead, as that does
<emphasis>not</emphasis> abandon the failed node.
</para>
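+
+ <para> For illustration, a controlled switchover of set 1 from node
+ 1 to node 2 (node numbers here are hypothetical) would lock the set
+ on the old origin and then move it, rather than abandoning any
+ node: </para>
+
+<programlisting>
+ lock set (id = 1, origin = 1);
+ move set (id = 1, old origin = 1, new origin = 2);
+</programlisting>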
+
+  <para> If there are many nodes in a cluster, and failover
+  involves dropping additional nodes (<emphasis>e.g.</emphasis> when
+  <emphasis>all</emphasis> nodes at a site, the origin as well as its
+  subscribers, must be treated as failed), the actions must be
+  carefully sequenced, as described in <xref
+  linkend="complexfailover">.
+  </para>
+
</refsect1>
<refsect1> <title> Version Information </title>
<para> This command was introduced in &slony1; 1.0 </para>