Add COPY Protocol notes to release notes

author Christopher Browne <cbbrowne@ca.afilias.info>

Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)

committer Christopher Browne <cbbrowne@ca.afilias.info>

Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)
author Christopher Browne <cbbrowne@ca.afilias.info>
Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)
committer Christopher Browne <cbbrowne@ca.afilias.info>
Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)
diff --git a/RELEASE b/RELEASE

index 8dc047ebc97cfe9cc525ce423f768d49d749adb9..6c2faac4b2c603dbafd5fdab56d7c4f89e1304b2 100644 (file)
--- a/RELEASE
+++ b/RELEASE
@@ -6,12 +6,115 @@
  ** 2.2.0
  - Bug #250 :: Log shipper does not report application name - add in setting of GUC
  
-- Bug #235 :: SYNC GROUP sizes are now dynamically grow from 1 to a maximum size set in the config file.  The old logic based on time it took to complete the last SYNC has been removed.
+- Bug #235 :: SYNC GROUP sizes are now dynamically grow from 1 to a
+              maximum size set in the config file.  The old logic
+              based on time it took to complete the last SYNC has been
+              removed.
  
  - Add correct 'ps' arguments to the altperl tools for Darwin.
  
  - Bug #256 :: set_conf_option() has an extra elevel parameter on PG 9.2
  
-- The slony1_funcs.so and .sql files that get installed are now versioned.
-  This allows multiple versions of slony to be installed in the same postgresql lib
-  directory.
+- The slony1_funcs.so and .sql files that get installed are now
+  versioned.  This allows multiple versions of slony to be installed
+  in the same postgresql lib directory.
+
+*** Major 2.2 change: COPY protocol
+
+    In versions before 2.2, Slony-I log triggers would capture logged
+    data into tables ~sl_log_1~ and ~sl_log_2~ in the form of
+    nearly-"cooked" queries, omitting just the INSERT/UPDATE/DELETE
+    statement, and the slon that processes the data would transform
+    the data into INSERT/UPDATE/DELETE statements to be individually
+    parsed and executed by the subscriber node.
+
+    In version 2.2, this is changed fairly massively.
+
+    - Log triggers
+
+      - Log triggers used to capture the text of the SQL statement to
+        perform the INSERT/UPDATE/DELETE, missing only the literal
+        INSERT (or UPDATE or DELETE) and the name of the table.
+
+      - As of version 2.2, the log triggers now capture an array of
+        text values indicating the criteria and the data to be loaded.
+
+    - Query against data provider
+
+      - The old query process used to involve declaring a cursor
+        called LOG, which would pull in all the tuples where the data
+        attribute was not "too terribly large."  Tuples would be
+        pulled in a few hundred at a time, the unusually large ones
+        being handled via a special side process in order to avoid
+        excessive memory consumption.
+
+      - In version 2.2, the query instead runs a COPY to standard
+        output, leaving a pipe open to capture that output for
+        transmission to the subscriber.
+
+    - Loading data into subscriber
+
+      - The old query process involved explicitly running each INSERT,
+        UPDATE, DELETE, and TRUNCATE against the subscriber.  This
+        required that the backend parse, plan, and process each query
+        individually.
+
+      - In version 2.2, the stream of data captured from the COPY
+        request is, in turn, passed to a COPY statement that loads the
+        data into sl_log_1 or sl_log_2 on the subscriber node.
+
+        A trigger on sl_log_1 and sl_log_2 reads the incoming data
+        and, inside the backend, transforms the array into
+        INSERT/UPDATE/DELETE statements.
+
+       This is still an oversimplification; there is a statement
+        cache so that if there is request to ~INSERT INTO some_table
+        (c1,c2) values ('b','c');~ followed (perhaps, but not
+        necessarily immediately) by ~INSERT INTO some_table (c1,c2)
+        values ('d','e');~, then the second request re-uses the plan
+        from the first query, and may instead execute ~EXECUTE
+        some_table_insert('d','e');~
+
+    - DDL handling
+
+         - The old behaviour was that DDL was treated as a special Slony
+        event, and it never was clear whether that should be executed
+        before or after other replication activity that might have
+        taken place during that Slony event.  This was fine in Slony
+        1.2 and earlier, when the processing of DDL forcibly required
+        that Slony take out locks on the origin on ALL replicated
+        tables; those locks would prevent there from being any other
+        activity going on during the processing of DDL.  But when much
+        of that locking disappeared, in version 2.0, it became
+        possible for there to be replication activity on tables not
+        being modified by DDL, and timing anomalies could occur.
+
+         - In version 2.2, a third log table, sl_log_script, is added to
+        capture script queries which would notably include DDL.  This
+        table uses the same sequence value that controls order of
+        application for replicated data, so DDL will be applied at
+        exactly the appropriate point in the transaction stream on
+        subscribers.
+
+    - Expected effects
+         - Lower processing overhead in slon process :: By using COPY,
+           all that need be done is for slon to "stream" data from the
+           provider to the subscriber.  There is no longer a need to
+           generate or process individual queries.
+      - Lower memory consumption in slon process :: Again, from using
+           COPY, memory consumption is based solely on how much data
+           is buffered in memory as part of the streaming.  A query
+           that is larger than the buffer is simply handled by letting
+           data stream through the buffer; the large query does not
+           need to be instantiated in memory, as was the case in older
+           versions of Slony.
+         - Less query processing work on subscriber database :: The use
+           of prepared statements should dramatically reduce the
+           amount of computational effort required for query parsing
+           and planning.
+      - Correct ordering of DDL within processing :: Bug #137
+           described a problem persisting from version 2.0 to 2.1
+           where DDL could be performed in the wrong order with
+           respect to logged data changes.  The shift of "script" data
+           from ~sl_event~ to ~sl_log_script~ resolves this issue.
+
author	Christopher Browne <cbbrowne@ca.afilias.info>
	Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)
committer	Christopher Browne <cbbrowne@ca.afilias.info>
	Mon, 27 Feb 2012 20:38:53 +0000 (15:38 -0500)