Fix possible data inconsistency in native replication mode.
author Tatsuo Ishii <ishii@sraoss.co.jp>
Wed, 11 Mar 2020 04:05:43 +0000 (13:05 +0900)
committer Tatsuo Ishii <ishii@sraoss.co.jp>
Wed, 11 Mar 2020 04:13:35 +0000 (13:13 +0900)
There is a long-standing bug in native replication mode. As reported
in pgpool-general, the databases can lose sync if a slave DB's
postgres process is killed. This is due to an oversight in
read_packets_and_process().

In replication mode, if a slave server's postgres process is killed,
the local backend status is set to down:

*(my_backend_status[i]) = CON_DOWN;

As a result, subsequent DDL/DML in the same session is issued only to
the master node (and to the other slaves if there are multiple slave
nodes). This leads to a serious data inconsistency problem, because in
native replication mode all DB nodes must receive DDL/DML at the same time.

The fix is to trigger failover in this case.

Discussions:
https://www.pgpool.net/pipermail/pgpool-general/2020-March/006954.html
https://www.pgpool.net/pipermail/pgpool-hackers/2020-March/003540.html

src/protocol/pool_process_query.c

index 867505cd7a9f2502d3fdca97a0f6054ad72bfd62..5768d1dbc6c19e8899e288a7bfde0e5282a2542c 100644
@@ -4888,6 +4888,19 @@ SELECT_RETRY:
                                                break;
                                        }
 
+                                       /*
+                                        * In native replication mode, we need to trigger failover
+                                        * to avoid data inconsistency.
+                                        */
+                                       else if (REPLICATION)
+                                       {
+                                               was_error = 1;
+                                               if (!VALID_BACKEND(i))
+                                                       break;
+                                               notice_backend_error(i, REQ_DETAIL_SWITCHOVER);
+                                               sleep(5);
+                                       }
+
                                        /*
                                         * Just set local status to down.
                                         */