Disallow the quorum-aware failover options in native replication mode.
author Muhammad Usama <m.usama@gmail.com>
Tue, 18 Feb 2020 12:23:10 +0000 (17:23 +0500)
committer Muhammad Usama <m.usama@gmail.com>
Tue, 18 Feb 2020 14:21:17 +0000 (19:21 +0500)
In native replication mode, Pgpool-II itself is responsible for replicating
data to all backend nodes. If a node is quarantined, Pgpool-II stops sending
writes to that node, and when the node becomes available again there is no way
to determine which writes it missed during the quarantine period. This is
dangerous because it can cause data inconsistency.

So, as per the discussion, we concluded that failover_require_consensus and
failover_when_quorum_exists should be disallowed in native replication mode,
so that a backend node can never be quarantined when Pgpool-II is configured
in replication mode.
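
The rule described above can be sketched as a small standalone check. This is
only an illustration, not the Pgpool-II implementation: the struct
sketch_config and the function sketch_validate are hypothetical names, and the
failover_require_consensus branch is an assumption based on this message (the
config_post_processor() hunk in this commit checks only
failover_when_quorum_exists).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant Pgpool-II configuration fields. */
struct sketch_config
{
	bool		replication_mode;
	bool		failover_when_quorum_exists;
	bool		failover_require_consensus;
};

/*
 * Hypothetical validator: returns true when the configuration is acceptable.
 * On failure it turns the offending option off and stores a message in
 * *errmsg, mirroring the reset-then-report pattern used in the diff.
 */
bool
sketch_validate(struct sketch_config *cfg, const char **errmsg)
{
	*errmsg = NULL;

	/* The restriction only applies to native replication mode. */
	if (!cfg->replication_mode)
		return true;

	if (cfg->failover_when_quorum_exists)
	{
		cfg->failover_when_quorum_exists = false;
		*errmsg = "failover_when_quorum_exists is not allowed in native replication mode";
		return false;
	}

	/* Assumption: the same rule applied to the second option. */
	if (cfg->failover_require_consensus)
	{
		cfg->failover_require_consensus = false;
		*errmsg = "failover_require_consensus is not allowed in native replication mode";
		return false;
	}

	return true;
}
```

With both options off, or with replication_mode off, the sketch accepts the
configuration unchanged. Otherwise it rejects the first offending option it
finds and resets only that option.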

doc/src/sgml/watchdog.sgml
src/config/pool_config_variables.c
src/sample/pgpool.conf.sample-replication

index d0efdd4e717fee0d5dd0d2984850771fbe8204cd..8fa9cdbaabec9683192464a7831af0ec5d9aec28 100644
      <para>
       This parameter can only be set at server start.
      </para>
-    </listitem>
+       <note>
+               <para>
+               enabling <varname>failover_when_quorum_exists</varname> is not allowed in native replication mode.
+               </para>
+    </note>
+   </listitem>
    </varlistentry>
 
    <varlistentry id="guc-failover-require-consensus" xreflabel="failover_require_consensus">
       </para>
      </caution>
 
+       <note>
+               <para>
+               enabling <varname>failover_require_consensus</varname> is not allowed in native replication mode.
+               </para>
+    </note>
+
      <para>
       <varname>failover_require_consensus</varname> is not available prior to
       <productname>Pgpool-II </productname><emphasis>V3.7</emphasis>. and it is only
index 2305822adca8df9df80865a2f735857b8f2079a8..24579901f2d5a049a69f0a9a009357b5eb0d2b59 100644
@@ -4040,6 +4040,20 @@ static bool config_post_processor(ConfigContext context, int elevel)
                }
        }
 
+       /*
+        * Quarantine state in native replication mode is dangerous and it can
+        * potentially cause data inconsistency.
+        * So as per the discussions, we agreed on disallowing setting
+        * failover_when_quorum_exists in native replication mode
+        */
+
+       if (pool_config->failover_when_quorum_exists && pool_config->replication_mode)
+       {
+               pool_config->failover_when_quorum_exists = false;
+               ereport(elevel,
+                               (errmsg("invalid configuration, failover_when_quorum_exists is not allowed in native replication mode")));
+               return false;
+       }
        return true;
 }
 
index 4402071ada04efb2bbcb5477d815440f648f58f5..b2d6b5cfaf3b820b0aac48d66b7a226f48912e8b 100644
@@ -557,14 +557,16 @@ wd_de_escalation_command = ''
 
 # - Watchdog consensus settings for failover -
 
-failover_when_quorum_exists = on
+failover_when_quorum_exists = off
                                     # Only perform backend node failover
-                                    # when the watchdog cluster holds the quorum
+                                    # when the watchdog cluster holds the quorum.
+                                    # not allowed in replication mode
                                     # (change requires restart)
 
-failover_require_consensus = on
+failover_require_consensus = off
                                     # Perform failover when majority of Pgpool-II nodes
-                                    # aggrees on the backend node status change
+                                    # aggrees on the backend node status change.
+                                    # not allowed in replication mode
                                     # (change requires restart)
 
 allow_multiple_failover_requests_from_node = off