and doesn't run out of disk space or encounter other faults that may halt operations.
</para>
- <para>
- If one or more nodes are down in a &bdr; group then <acronym>DDL</acronym>
- locking for <xref linkend="ddl-replication"> will wait indefinitely or until
- cancelled. <acronym>DDL</acronym> locking requires <emphasis>consensus</emphasis>
- across all nodes, not just a quorum, so it must be able to reach all nodes.
- So it's important to monitor for node outages.
- </para>
+ <sect1 id="monitoring-why" xreflabel="Why monitoring matters">
+ <title>Why monitoring matters</title>
- <para>
- Global sequence chunk allocations can also be distrupted if half or more of
- the nodes are down or unreachable. See
- <xref linkend="global-sequence-voting">.
- </para>
+ <para>
+ If one or more nodes are down in a &bdr; group then <acronym>DDL</acronym>
+ locking for <xref linkend="ddl-replication"> will wait indefinitely or until
+ cancelled. <acronym>DDL</acronym> locking requires <emphasis>consensus</emphasis>
+ across all nodes, not just a quorum, so it must be able to reach all nodes.
+ So it's important to monitor for node outages.
+ </para>
- <para>
- The <literal>bdr.bdr_nodes</literal> table keeps track of a node's
- membership in a &bdr; group. A row is inserted or updated in the table
- during the node join process, and during node removal. The 'status' column
- may have the following values:
- <itemizedlist>
- <listitem>
- <para>
- i - The node is doing initial slot creation or an initial dump and load (see init_replica, above)
- </para>
- </listitem>
- <listitem>
- <para>
- c - The node is catching up to the target node and is not yet ready to participate with the &bdr; group.
- </para>
- </listitem>
- <listitem>
- <para>
- k - The node has been 'killed' or removed by the user with the function bdr.bdr_part_by_node_names.
- </para>
- </listitem>
- <listitem>
- <para>
- r - The node is fully ready. Slots may be created on this node and it can participate with the &bdr group.
- </para>
- </listitem>
- <!-- TODO: list incomplete for 0.9 -->
- </itemizedlist>
- </para>
+ <para>
+ Global sequence chunk allocations can also be disrupted if half or more of
+ the nodes are down or unreachable. See
+ <xref linkend="global-sequence-voting">.
+ </para>
- <para>
- Note that the status doesn't indicate whether the node is actually up right
- now. A node may be shut down, isolated from the network, or crashed and still
- appear as <literal>r</literal> in <literal>bdr.bdr_nodes</literal> because it's
- still conceptually part of the BDR group. Check
- <ulink url="http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW">pg_stat_replication</ulink>
- and
- <ulink url="http://www.postgresql.org/docs/current/static/catalog-pg-replication-slots.html">pg_replication_slots</ulink>
- for the connection and replay status of a node.
- </para>
+ <para>
+ Because <acronym>DDL</acronym> locking and global sequence allocations
+ insert messages into the replication stream, a node that is extremely
+ behind on replay will cause similar disruption to one that is entirely
+ down.
+ </para>
- <warning>
<para>
- Do not directly modify <literal>bdr.bdr_nodes</literal>. Use the provided
- node management functions instead. See <xref linkend="functions-node-mgmt">.
+ Protracted node outages can also cause disk space exhaustion, resulting in
+ other nodes rejecting writes or performing emergency shutdowns. Because
+ every node connects to every other node there is a replication slot for every
+ downstream peer node on each node. Replication slots ensure that an upstream
+ (sending) server will retain enough write-ahead log (<acronym>WAL</acronym>)
+ in <filename>pg_xlog</filename> to resume replay from the point the downstream
+ peer (receiver) last replayed on that slot. If a peer stops consuming data on
+ a slot or falls increasingly behind on replay then the server that has that
+ slot will accumulate <acronym>WAL</acronym> until it runs out of disk space
+ in <filename>pg_xlog</filename>. This can happen even if the downstream peer
+ is online and replaying if it isn't able to receive and replay changes as
+ fast as the upstream node generates them.
</para>
- </warning>
- <para>
- Here is an example of a <literal>SELECT</literal> from
- <literal>bdr.bdr_nodes</literal> that indicates that one node is ready
- (<literal>r</literal>), one node has been removed/killed
- (<literal>k</literal>), and one node is initializing (<literal>i</literal>):
- <programlisting>
- SELECT * FROM bdr.bdr_nodes;
- node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn | node_init_from_dsn
- ---------------------+---------------+------------+-------------+-----------+--------------------------+--------------------------
- 6125823754033780536 | 1 | 16385 | r | node1 | port=5598 dbname=bdrdemo |
- 6125823714403985168 | 1 | 16386 | k | node2 | port=5599 dbname=bdrdemo | port=5598 dbname=bdrdemo
- 6125847382076805699 | 1 | 16386 | i | node3 | port=6000 dbname=bdrdemo | port=5598 dbname=bdrdemo
- (3 rows)
- </programlisting>
- </para>
+ <para>
+ It is therefore important to have automated monitoring in place to
+ ensure that if replication slots start falling badly behind, the
+ administrator is alerted and can take proactive action.
+ </para>
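+
+ <para>
+ As a minimal sketch of such a check (the 1 GiB threshold is purely
+ illustrative; choose a limit that suits the free space in
+ <filename>pg_xlog</filename> on your nodes), a monitoring job could run:
+ <programlisting>
+ -- Report any slot that is forcing this node to retain more than ~1 GiB of WAL
+ SELECT slot_name,
+        pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
+ FROM pg_replication_slots
+ WHERE pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) > 1073741824;
+ </programlisting>
+ See <xref linkend="monitoring-slots"> for more detail on the underlying view.
+ </para>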
+
+ </sect1>
+
+ <sect1 id="monitoring-node-join-remove" xreflabel="Monitoring node join/removal">
+ <title>Monitoring node join/removal</title>
+
+ <para>
+ Node join and removal is asynchronous in &bdr;. The <xref
+ linkend="functions-node-mgmt"> return immediately, without first
+ ensuring the join or part operation is complete. To see when a join or
+ part operation finishes, it is necessary to check the node state indirectly
+ via <xref linkend="catalog-bdr-nodes"> or by using helper functions.
+ </para>
+
+ <para>
+ The helper function <xref
+ linkend="function-bdr-node-join-wait-for-ready">, when called, will cause
+ a PostgreSQL session to pause until outstanding node join operations
+ complete. More helpers for node status monitoring will be added over
+ time.
+ </para>
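+
+ <para>
+ For example, to make a <application>psql</application> session block until
+ any in-progress joins have finished (assuming the helper is installed as
+ <literal>bdr.bdr_node_join_wait_for_ready()</literal>, per
+ <xref linkend="function-bdr-node-join-wait-for-ready">):
+ <programlisting>
+ -- Returns once all outstanding node join operations have completed
+ SELECT bdr.bdr_node_join_wait_for_ready();
+ </programlisting>
+ </para>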
+
+ <para>
+ For other node status monitoring <xref linkend="catalog-bdr-nodes">
+ must be queried directly.
+ </para>
+
+ <para>
+ Here is an example of a <literal>SELECT</literal> from
+ <literal>bdr.bdr_nodes</literal> that indicates that one node is ready
+ (<literal>r</literal>), one node has been removed/killed
+ (<literal>k</literal>), and one node is initializing (<literal>i</literal>):
+ <programlisting>
+ SELECT * FROM bdr.bdr_nodes;
+ node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn | node_init_from_dsn
+ ---------------------+---------------+------------+-------------+-----------+--------------------------+--------------------------
+ 6125823754033780536 | 1 | 16385 | r | node1 | port=5598 dbname=bdrdemo |
+ 6125823714403985168 | 1 | 16386 | k | node2 | port=5599 dbname=bdrdemo | port=5598 dbname=bdrdemo
+ 6125847382076805699 | 1 | 16386 | i | node3 | port=6000 dbname=bdrdemo | port=5598 dbname=bdrdemo
+ (3 rows)
+ </programlisting>
+ </para>
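+
+ <para>
+ For automated checks it can be simpler to look only for nodes that are not
+ fully ready, e.g.:
+ <programlisting>
+ -- Any rows returned are nodes still initializing, catching up, or removed
+ SELECT node_name, node_status
+ FROM bdr.bdr_nodes
+ WHERE node_status &lt;&gt; 'r';
+ </programlisting>
+ </para>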
+
+ </sect1>
+
+ <sect1 id="monitoring-peers" xreflabel="Monitoring replication peers">
+ <title>Monitoring replication peers</title>
+
+ <para>
+ As outlined in <xref linkend="monitoring-why"> it is important to monitor
+ the state of peer nodes in a &bdr; group. There are two main views
+ used for this: <literal>pg_stat_replication</literal> to monitor for
+ actively replicating nodes, and <literal>pg_replication_slots</literal>
+ to monitor for replication slot progress.
+ </para>
+
+ <sect2 id="monitoring-connections" xreflabel="Monitoring connected peers">
+ <title>Monitoring connected peers using pg_stat_replication</title>
- <para>
- Administrators may query <literal>pg_catalog.pg_stat_replication</literal> to
- monitor actively replicating connections.
- <warning>
<para>
- This view does <emphasis>not</emphasis> show peers that have a slot but are
- not currently connected, even though such peers are still making the server
- retain WAL. It is important to monitor
- <literal>pg_replication_slots</literal> too.
+ Administrators may query
+ <ulink url="http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW">pg_catalog.pg_stat_replication</ulink>
+ to monitor actively replicating connections.
+ It shows the pid of the local side of the connection (wal sender process), the
+ application name sent by the peer (for BDR, this is <literal>bdr
+ (sysid,timeline,dboid,)</literal>), and other status information:
+ <programlisting>
+ SELECT * FROM pg_stat_replication;
+ pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
+ -------+----------+---------+--------------------------------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
+ 29045 | 16385 | myadmin | bdr (6127682459268878512,1,16386,):receive | | | -1 | 2015-03-18 21:03:28.717175+00 | | streaming | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0 | async
+ 29082 | 16385 | myadmin | bdr (6127682494973391064,1,16386,):receive | | | -1 | 2015-03-18 21:03:44.665272+00 | | streaming | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0 | async
+ </programlisting>
+ This view shows all active replication connections, not just those used by
+ &bdr;. You will see connections from physical streaming replicas, other
+ logical decoding solutions, and so on, here as well.
</para>
- </warning>
- It shows the pid of the local side of the connection (wal sender process), the
- application name sent by the peer (for BDR, this is <literal>bdr
- (sysid,timeline,dboid,)</literal>), and other status information:
- <programlisting>
- SELECT * FROM pg_stat_replication;
- pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
- -------+----------+---------+--------------------------------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
- 29045 | 16385 | myadmin | bdr (6127682459268878512,1,16386,):receive | | | -1 | 2015-03-18 21:03:28.717175+00 | | streaming | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0 | async
- 29082 | 16385 | myadmin | bdr (6127682494973391064,1,16386,):receive | | | -1 | 2015-03-18 21:03:44.665272+00 | | streaming | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0/189D3B8 | 0 | async
- </programlisting>
- This view shows all replication connections, not just those used by &bdr;.
- </para>
- <para>
- Information about replication slots (both logical and physical) is available
- in the <literal>pg_catalog.pg_replication_slots</literal> view:
- <programlisting>
- SELECT * FROM pg_replication_slots;
- slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
- -----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
- bdr_16386_6127682459268878512_1_16386__ | bdr | logical | 16386 | bdrdemo | t | | 749 | 0/191B130
- bdr_16386_6127682494973391064_1_16386__ | bdr | logical | 16386 | bdrdemo | t | | 749 | 0/191B130
- (2 rows)
- </programlisting>
- If a slot has <literal>active = t</literal>
- then there will be a corresponding <literal>pg_stat_replication</literal> entry
- for the walsender process connected to the slot.
- </para>
- <para>
- This view shows only replication peers that use a slot. Physical streaming
- replication connections that don't use slots will not show up here, only in
- <literal>pg_stat_replication</literal>.
- </para>
+ <para>
+ To tell how far behind a given active connection is, compare its
+ <literal>flush_location</literal> (the replay position up to which
+ it has committed its work) with the sending server's
+ <literal>pg_current_xlog_insert_location()</literal> using
+ <literal>pg_xlog_location_diff</literal>, e.g.:
+ <programlisting>
+ SELECT
+ pg_xlog_location_diff(pg_current_xlog_insert_location(), flush_location) AS lag_bytes,
+ pid, application_name
+ FROM pg_stat_replication;
+ </programlisting>
+ This will show lag for all replication consumers, including non-&bdr;
+ ones. To show only &bdr; peers, append
+ <literal>WHERE application_name LIKE 'bdr%'</literal>.
+ </para>
- <para>
- Performance and conflict statistics are maintained for each node by &bdr; in
- the <literal>bdr.pg_stat_bdr</literal> table. This table is <emphasis>not
- replicated</emphasis> between nodes, so each node has separate stats. Each row
- represents the &bdr; apply statistics for a different peer node.
- <programlisting>
- SELECT * FROM bdr.pg_stat_bdr;
- rep_node_id | rilocalid | riremoteid | nr_commit | nr_rollback | nr_insert | nr_insert_conflict | nr_update | nr_update_conflict | nr_delete | nr_delete_conflict | nr_disconnect
- -------------+-----------+----------------------------------------+-----------+-------------+-----------+--------------------+-----------+--------------------+-----------+--------------------+---------------
- 1 | 1 | bdr_6127682459268878512_1_16386_16386_ | 4 | 0 | 6 | 0 | 1 | 0 | 0 | 3 | 0
- 2 | 2 | bdr_6127682494973391064_1_16386_16386_ | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
- (2 rows)
- </programlisting>
- </para>
+ <warning>
+ <para>
+ <literal>pg_stat_replication</literal> does <emphasis>not</emphasis> show
+ peers that have a slot but are not currently connected, even though such
+ peers are still making the server retain WAL. It is important to monitor
+ <literal>pg_replication_slots</literal> too.
+ </para>
+ </warning>
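+
+ <para>
+ A simple check for such disconnected peers is to look for slots without an
+ active connection:
+ <programlisting>
+ -- Slots that are not currently in use but still cause WAL to be retained
+ SELECT slot_name, database
+ FROM pg_replication_slots
+ WHERE NOT active;
+ </programlisting>
+ </para>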
- <para>
- You can track conflicts that have occurred on a particular node with
- <literal>bdr.bdr_conflict_history</literal>. This catalog is not replicated
- between nodes. This is a technical limitation that may be lifted in a future
- release, but it also saves on unnecessary replication overhead.
- </para>
- <para>
- You can use the conflict history table to determine how rapidly your
- application creates conflicts and where those conflicts occur, allowing you to
- improve the application to reduce conflict rates. It also helps detect cases
- where conflict resolutions may not have produced the desired results, allowing
- you to identify places where a user defined conflict trigger or an application
- design change may be desirable.
- </para>
- <para>
- Row values may optionally be logged for row conflicts. This is controlled by
- the global database-wide option bdr.log_conflicts_to_table. There is no
- per-table control over row value logging at this time. Nor is there any limit
- applied on the number of fields a row may have, number of elements dumped in
- arrays, length of fields, etc, so it may not be wise to enable this if you
- regularly work with multi-megabyte rows that may trigger conflicts.
- </para>
- <para>
- Because the conflict history table contains data on every table in the
- database so each row's schema might be different, if row values are logged
- they are stored as json fields. The json is created with row_to_json, just
- like if you'd called it on the row yourself from SQL. There is no
- corresponding json_to_row function in PostgreSQL at this time, so you'll need
- table-specific code (pl/pgsql, pl/python, pl/perl, whatever) if you want to
- reconstruct a composite-typed tuple from the logged json.
- </para>
+ <para>
+ There is not currently any facility to report how far behind a given node
+ is in elapsed seconds of wall-clock time. So you can't easily tell that
+ node <replaceable>X</replaceable> currently has data that is
+ <replaceable>n</replaceable> seconds older than the original data on node
+ <replaceable>Y</replaceable>. If this is an application requirement the
+ application should write periodic timestamp tick records to a table and
+ check how old the newest tick for a given node is on other nodes.
+ </para>
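+
+ <para>
+ A minimal sketch of that approach, using a hypothetical
+ <literal>node_ticks</literal> table (the table and node names are purely
+ illustrative, not part of &bdr;):
+ <programlisting>
+ -- Create once and insert one row per node; replicates like any other table
+ CREATE TABLE node_ticks (
+     node_name text PRIMARY KEY,
+     last_tick timestamptz NOT NULL DEFAULT current_timestamp
+ );
+ INSERT INTO node_ticks (node_name) VALUES ('node1'), ('node2');
+
+ -- Run periodically on each node (here node1), e.g. from cron;
+ -- each node only ever updates its own row, so no conflicts arise
+ UPDATE node_ticks SET last_tick = current_timestamp WHERE node_name = 'node1';
+
+ -- On any other node, estimate how stale the data replicated from each peer is
+ SELECT node_name, current_timestamp - last_tick AS approximate_lag
+ FROM node_ticks;
+ </programlisting>
+ Note that this measures lag only as precisely as the tick interval, and
+ assumes reasonably synchronised clocks across nodes.
+ </para>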
+
+ </sect2>
+
+ <sect2 id="monitoring-slots" xreflabel="Monitoring replication slots">
+ <title>Monitoring replication slots</title>
+
+ <para>
+ Information about replication slots (both logical and physical) is available
+ in the <literal>pg_catalog.pg_replication_slots</literal> view. This view
+ shows all slots, whether or not there is an active replication connection
+ using them. It looks like:
+ <programlisting>
+ SELECT * FROM pg_replication_slots;
+ slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
+ -----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
+ bdr_16386_6127682459268878512_1_16386__ | bdr | logical | 16386 | bdrdemo | t | | 749 | 0/191B130
+ bdr_16386_6127682494973391064_1_16386__ | bdr | logical | 16386 | bdrdemo | t | | 749 | 0/191B130
+ (2 rows)
+ </programlisting>
+ If a slot has <literal>active = t</literal>
+ then there will be a corresponding <literal>pg_stat_replication</literal> entry
+ for the walsender process connected to the slot.
+ </para>
+
+ <para>
+ This view shows only replication peers that use a slot. Physical streaming
+ replication connections that don't use slots will not show up here, only in
+ <literal>pg_stat_replication</literal>. &bdr; always uses slots so all
+ &bdr; peers will appear here.
+ </para>
+
+ <para>
+ To see how much extra <acronym>WAL</acronym> each &bdr; slot is asking the server
+ to keep, in bytes, use a query like:
+ <programlisting>
+ SELECT
+ slot_name, database, active,
+ pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
+ FROM pg_replication_slots
+ WHERE plugin = 'bdr';
+ </programlisting>
+ </para>
+
+ <para>
+ Retained <acronym>WAL</acronym> isn't additive; if you have three peers, two
+ of which require 500KB of <acronym>WAL</acronym> to be retained and one that
+ requires 8MB, only 8MB is retained. It's like a dynamic version of the
+ <literal>wal_keep_segments</literal> setting (or, in 9.5,
+ <literal>min_wal_size</literal>). So you need to monitor to make sure that
+ the <emphasis>largest</emphasis> amount of retained <acronym>WAL</acronym>
+ doesn't exhaust the free space in <filename>pg_xlog</filename> on each node.
+ </para>
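+
+ <para>
+ For example, to find the single largest amount of <acronym>WAL</acronym> any
+ slot is causing this node to retain:
+ <programlisting>
+ SELECT max(pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn)) AS max_retained_bytes
+ FROM pg_replication_slots;
+ </programlisting>
+ </para>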
+
+ <para>
+ It is normal for <literal>pg_replication_slots.restart_lsn</literal> not to
+ advance as soon as <literal>pg_stat_replication.flush_location</literal>
+ advances on an active connection. The slot restart position does
+ <emphasis>not</emphasis> indicate how old the data you will see on a peer
+ node is.
+ </para>
+
+ </sect2>
+
+ </sect1>
+
+ <sect1 id="monitoring-conflict-stats" xreflabel="Monitoring conflicts">
+ <title>Monitoring conflicts</title>
+
+ <para>
+ <xref linkend="conflicts"> can arise when multiple nodes make changes
+ that affect the same tables in ways that can interact with each other.
+ The &bdr; system should be monitored to ensure that conflicts
+ are identified and, where possible, application changes are made to
+ eliminate them or make them less frequent.
+ </para>
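+
+ <para>
+ When <xref linkend="guc-bdr-log-conflicts-to-table"> is enabled, recorded
+ conflicts can be inspected directly, e.g.:
+ <programlisting>
+ SELECT * FROM bdr.bdr_conflict_history;
+ </programlisting>
+ Remember that this catalog is not replicated, so it must be checked on each
+ node.
+ </para>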
+
+ <para>
+ Not all conflicts are logged to <xref linkend="catalog-bdr-conflict-history">
+ even when <xref linkend="guc-bdr-log-conflicts-to-table"> is on. Conflicts
+ that &bdr; cannot proactively detect and handle (like 3-way
+ foreign key conflicts) will result in an <literal>ERROR</literal> message
+ in the PostgreSQL logs and an increment of
+ <xref linkend="catalog-pg-stat-bdr"><literal>.nr_rollbacks</literal>
+ on that node for the connection the conflicting transaction originated from.
+ </para>
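+
+ <para>
+ The per-connection apply and conflict counters can be checked with a query
+ such as:
+ <programlisting>
+ SELECT rep_node_id, riremoteid, nr_commit, nr_rollback,
+        nr_insert_conflict, nr_update_conflict, nr_delete_conflict
+ FROM bdr.pg_stat_bdr;
+ </programlisting>
+ These statistics are per-node and are not replicated, so each node must be
+ checked separately.
+ </para>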
+
+ <para>
+ If <literal>pg_stat_bdr.nr_rollback</literal> keeps increasing and a node
+ isn't making forward progress, then it's likely there's a divergent conflict
+ or other issue that may need administrator action. Check the log files
+ for that node for details.
+ </para>
+
+ </sect1>
+
+ <sect1 id="monitoring-postgres-stats" xreflabel="PostgreSQL statistics views">
+ <title>PostgreSQL statistics views</title>
+
+ <para>
+ Statistics on table and index usage are updated normally by the downstream
+ master. This is essential for correct function of
+ <ulink url="http://www.postgresql.org/docs/current/static/routine-vacuuming.html">autovacuum</ulink>.
+ If there are no local writes on the downstream master and stats have not
+ been reset, these two views should show matching results between upstream and
+ downstream:
+ <itemizedlist>
+ <listitem><simpara><literal>pg_stat_user_tables</literal></simpara></listitem>
+ <listitem><simpara><literal>pg_statio_user_tables</literal></simpara></listitem>
+ </itemizedlist>
+ </para>
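+
+ <para>
+ For example, the same query could be run on the upstream and downstream
+ nodes and the results compared:
+ <programlisting>
+ -- Row activity counters; should roughly match across nodes for replicated-only tables
+ SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
+ FROM pg_stat_user_tables
+ ORDER BY relname;
+ </programlisting>
+ </para>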
+
+ <para>
+ Since indexes are used to apply changes, the identifying indexes on the
+ downstream side may appear more heavily used in workloads that perform
+ <literal>UPDATE</literal>s and <literal>DELETE</literal>s than
+ non-identifying indexes.
+ </para>
+
+ <para>
+ The built-in index monitoring views are:
+ <itemizedlist>
+ <listitem><simpara><literal>pg_stat_user_indexes</literal></simpara></listitem>
+ <listitem><simpara><literal>pg_statio_user_indexes</literal></simpara></listitem>
+ </itemizedlist>
+ </para>
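+
+ <para>
+ For example, to see which indexes are most heavily used on a downstream
+ node:
+ <programlisting>
+ SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
+ FROM pg_stat_user_indexes
+ ORDER BY idx_scan DESC;
+ </programlisting>
+ </para>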
+
+ <para>
+ All these views are discussed in the
+ <ulink url="http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE">
+ PostgreSQL documentation on the statistics views</ulink>.
+ </para>
+
+ </sect1>
</chapter>