<primary>Node Management</primary>
</indexterm>
- <para>
- &bdr; and &udr; require different steps for setting up a node,
- as with &bdr; replication is all-to-all (mesh), wheras for &udr;
- replication is unidirectional. The exact commands required
- differ and are documented below under <xref linkend="node-management-udr">
- and <xref linkend="node-management-bdr">. Both modes share
- many of the same concepts; see <xref linkend="node-management-common">.
- </para>
-
- <sect1 id="node-management-common" xreflabel="Node management common to both UDR and BDR">
- <title>Node management common to both UDR and BDR</title>
+ <sect1 id="node-management-joining" xreflabel="Joining or subscribing a node">
+ <title>Joining or subscribing a node</title>
+
+ <para>
+ &bdr; and &udr; require different steps for setting up a node
+ because &bdr; replication is all-to-all (mesh), whereas for &udr;
+ replication is unidirectional. Both modes share many of the same concepts as
+ discussed below. The exact commands required differ and are documented below
+ under <xref linkend="node-management-joining-udr"> and <xref
+ linkend="node-management-joining-bdr">.
+ </para>
<para>
When a new &bdr; node is joined to an existing &bdr; group, or when a
</para>
<para>
- For the details, see <xref linkend="node-management-udr"> or
- <xref linkend="node-management-bdr"> as appropriate.
+ For the details, see <xref linkend="node-management-joining-udr"> or
+ <xref linkend="node-management-joining-bdr"> as appropriate.
</para>
- </sect1>
+ <sect2 id="node-management-joining-udr" xreflabel="Subscribing a UDR node">
+ <title>Subscribing a &udr; node</title>
- <sect1 id="node-management-udr" xreflabel="Node management for UDR">
- <title>Node Management for &udr;</title>
+ <note>
+ <para>
+ Read <xref linkend="node-management-joining"> before this section.
+ </para>
+ </note>
- <note>
<para>
- Read <xref linkend="node-management-common"> before this section.
+ The SQL function <xref linkend="functions-node-mgmt-subscribe"> is used to receive
+ changes from the database specified in the function parameters
+ into the current database. Subscribing to another node using this
+ function will automatically copy the existing data from the
+ database subscribed to.
</para>
- </note>
+
+ <para>
+ See also: <xref linkend="functions-node-mgmt">, <xref linkend="command-bdr-init-copy">.
+ </para>
+ </sect2>
+
+ <sect2 id="node-management-joining-bdr" xreflabel="Joining or creating a BDR node">
+ <title>Joining or creating a &bdr; node</title>
+
+ <note>
+ <para>
+ Read <xref linkend="node-management-joining"> before this section.
+ </para>
+ </note>
+
+ <para>
+ For &bdr; every node has to have a connection to every other node. To make
+ configuration easy, when a new node joins it automatically configures all
+ existing nodes to connect to it. For this reason, every node, including
+ the first &bdr; node created, must know the PostgreSQL connection string
+ (sometimes referred to as a <acronym>DSN</acronym>) that other nodes
+ can use to connect to it.
+ </para>
+
+ <para>
+ The SQL function <xref linkend="function-bdr-group-create">
+ is used to create the first node of a &bdr; cluster from a standalone
+ PostgreSQL database. Doing so makes &bdr; active on that
+ database and allows other nodes to join the &bdr; cluster (which
+ consists of one node at that point). You must specify the
+ connection string that other nodes will use to connect to this
+ node at the time of creation.
+ </para>
+
+ <para>
+ Whether you plan on using logical or physical copy to join
+ subsequent nodes, the first node must always be created
+ using <xref linkend="function-bdr-group-create">.
+ </para>
+
+ <para>
+ Once the initial node is created, every further node can join the &bdr;
+ cluster using the <xref linkend="function-bdr-group-join"> function
+ or using <xref linkend="command-bdr-init-copy">.
+ </para>
+
+ <para>
+ Either way, when joining you must nominate a single node that is already a
+ member of the &bdr; group as the join target. This node's contents are
+ copied to become the initial state of the newly joined node. The new node
+ will then synchronise with the other nodes to ensure it has the same
+ contents as the others.
+ </para>
+
+ <para>
+ Generally you should pick whatever node is closest to the new node in
+ network terms as the join target.
+ </para>
+
+ <para>
+ Which node you choose to copy only really matters if you are using
+ non-default <xref linkend="replication-sets">. See the replication
+ sets documentation for more information on this.
+ </para>
+
+ <para>
+ See also: <xref linkend="functions-node-mgmt">, <xref linkend="command-bdr-init-copy">.
+ </para>
+
+ </sect2>
+
+ </sect1>
+
+ <sect1 id="node-management-removing" xreflabel="Removing a node">
+ <title>Removing a node</title>
<para>
- The SQL function <xref linkend="functions-node-mgmt-subscribe"> is used to receive
- changes from the database specified in the function parameters
- into the current database. Subscribing to another node using this
- function will automatically copy the existing data in that the
- database subscribed to.
+ Because &bdr; and &udr; can recover from extended node outages, it is
+ necessary to explicitly tell the system if you are removing a node
+ permanently. If you permanently shut down a node and don't tell
+ the other nodes then performance will suffer and eventually
+ the whole system will stop working.
</para>
<para>
- See also: <xref linkend="functions-node-mgmt">, <xref linkend="command-bdr-init-copy">.
+ Each node saves up change information (using one
+ <ulink url="http://www.postgresql.org/docs/current/static/logicaldecoding-explanation.html">
+ replication slot</ulink> for each peer node) so it can replay changes to a
+ temporarily unreachable node. If a peer node remains offline indefinitely
+ this accumulating change information will cause the node to run out of
+ storage space for PostgreSQL transaction logs (<acronym>WAL</acronym>, in
+ <filename>pg_xlog</filename>), likely causing the database server to shut
+ down with an error like:
+ <programlisting>
+ PANIC: could not write to file "pg_xlog/xlogtemp.559": No space left on device
+ </programlisting>
+ or report other out-of-disk related symptoms.
</para>
- </sect1>
-
- <sect1 id="node-management-bdr" xreflabel="Node management for BDR">
- <title>Node Management for &bdr;</title>
<note>
<para>
- Read <xref linkend="node-management-common"> before this section.
+ Administrators should monitor for node outages (see <xref
+ linkend="monitoring">) and make sure nodes have sufficient free disk space.
</para>
</note>
<para>
- For &bdr; every node has to have a connection to every other
- node. To make conifguration easy, every node addition
- automatically adds awareness of the new to all preexisting nodes.
- </para>
-
- <para>
- The SQL function <xref linkend="function-bdr-group-create">
- is used to create the first node of a &bdr; cluster from a standalone
- PostgreSQL database. Doing so makes &bdr; active on that
- database and allows other nodes to join the &bdr; cluster (which
- consists out of one node at that point). Once the initial node is
- created every further node can join the &bdr; cluster using
- the <xref linkend="function-bdr-group-join"> function.
+ A node is removed with the <xref linkend="function-bdr-part-by-node-names">
+ function. You must specify the node name (as passed during node creation)
+ to remove a node.
</para>
<para>
- See also: <xref linkend="functions-node-mgmt">, <xref linkend="command-bdr-init-copy">.
+ If you only know the slot name from <literal>pg_replication_slots</literal>
+ and not the node name from <literal>bdr.bdr_nodes</literal> you can either
+ <literal>SELECT</literal> <xref linkend="functions-bdr-get-node-name">
+ on the node you plan to remove, or look it up from the slot name using
+ <!-- TODO make this a proper xref -->
+ the <literal>bdr.bdr_node_slots</literal> view.
</para>
</sect1>