git.postgresql.org Git - postgres-xl.git/log

Explicitly track if a subplan has been executed and a valid value is computed
or not.

This works around a problem noted in issue #102, but not a full solution to the
problem. It seems there are places where InitPlan is attached to a node which
will be executed on the remote node. Even so coordinator tries to find push a
PARAM_EXEC parameter value for a remote subplan and fails because the value is
not and will never be computed on the coordinator. We for now send NULL value
for such cases to avoid a server crash

Correct stable function in test

Add an alternate expected output file for "inherit" test case to address
occasional plan change

Do not print node names in EXPLAIN output of a query in "rules" test since that
may cause non deterministic output

Write release notes for upcoming Postgres-XL 9.5r1.2 release

Load balance remote subplan execution by choosing a node randomly instead of
always picking up the first node.

When planner has a choice of executing a subplan on any of the remote nodes,
it always used to execute the subplan on the first node. That can cause
excessive load/number of connections on that node. This patch fixes that by
choosing a node randomly from the list of available nodes

Modify expected output of tests due to recent changes

Add test case for Issue #93

Use 2^32 modulo computation to convert signed integer to unsigned value since
abs() may give a different result.

This makes the redistribution code in sync with the way hash modulo is computed
elsewhere in the code

Add test case for Issue #91

Add test case for Issue #88

Add test case for Issue #87

Add test case for Issue #7

Add test case for Issue #5

Add test case for Issue #81

Create empty cluster and add nodes multiple times
to reproduce the issue. This issue occurs intermittently.

Add test case for Issue #16

Add test case for SQLsmith Issue #102

Do not FQS EXECUTE DIRECT statements.

Executor is not prepared to handle that and its not necessary since EXECUTE
DIRECT implies direct execution on the target node anyways

Block FOR SHARE/UPDATE for queries involving joins

Per report from Shaun Thomas, we don't yet support row locking when query has a
join between tables. While it may sometimes give an error, worse it may
silently lock wrong rows leading to application logic failures. The feature is
currently blocked until we have bandwidth to diagnose further and come up with
a proper fix.

Also update reqression tests to include offending tests and update expected
output now that such queries throw errors

Correct expected behaior of test.

Test reproduces Issue #84 on crash recovery
and prepared transactions.

Change several asserts checking for aggregate context to errors

This should address the crash in assert-enabled build reported by Pallavi
Sontakke. SQLSmith tries to call various functions from system catalogs and
functions should be prepared to handle such calls.

No need to look at the relation size everytime rescanning a relation with
sequence scan

This can provide big boost to performance when the inner side of a nested loop
has a sequence scan with large number of rows in outer relation

Add a test case demonstrating an extremely slow join in XL, especially with the
inner side of a join doing repeated remote subquery scans

Never ever use an invalid XID, if we fail to connect to the GTM

The code before this commit would happily proceed further if GTM becomes dead
or unreachable. This may result in random problems since rest of the code is
not prepared to deal with that situation (as seen from the crash in TAP tests).
It seems far safer to just throw an error in such case.

TAP test and report by Pallavi Sontakke

Use GTM_Sequence type to hold value of a sequence on GTM

We were incorrectly using "int" at couple of places which is not wide enough to
store 64-bit sequence values.

Per report by Helmi Ahmad

Return pointer to the buffer used to store various fields values, so that
caller can free up the memory when done with it.

This fixes a memory like while running ALTER TABLE DISTRIBUTE BY

Add a missing parenthesis, lack of which might throw surprises

Don't wait indefinitely while trying to abort a transaction on the remote node.

Treat this more as a work-around until we investigate the reasons for remote
nodes not honouring query cancellation for some complicated queries, requiring
nested remote subquery plans

Avoid pushing down evaluation of VALUES clause to a datanode for replicated
tables, unless it contains volatile function(s)

Warn about missing relations only if they are not views since views are not
created on datanodes anyways

This relates to the earlier commit

Add a warning when a node fails to find OID for the given relation.

Regression shows a bunch of warnings which should be fixed over due course

Handle ON COMMIT properly on the datanodes by not looking at
MyXactAccessedTempRel which is set up differently in XL

We can't rely on MyXactAccessedTempRel because it gets set only when local
buffers are used for a relation. In XL, even temp tables use shared buffers
when used in a global session. Skipping checking for this local flag won't be
too bad from performance perspective.

Add TAP test for crash recovery Issue #84

Test crash recovery when prepared transactions are being created in
the background. Tests #84.

Add test for complex-update issue #38

Separate out INSENSITIVE cursor Issue ( #34)

Move issue in xl_known_bugs to track it separately.

Separate out LATERAL Issue #73 into xl_known_bugs

Correct expected output for a test

Modify expected output for 'ON COMMIT DELETE ROWS' clause.

Remove an unintentional "set -x" command slipped in the previous commit

Ensure "init all" (and other init commands too) does not remove existing data
directories unless "force" option is used

We'd tried to fix this earlier, but looks like double quote is not getting
passed to the shell correctly. Instead use a single quote.

Report by Pallavi Sontakke during QA testing.

Show originating coordinator name:pid and remote node name:pid in the ps output
while running REMOTE SUBPLAN

This should help administrator to quickly find the originating session and
remote session, which can be immensely useful for debugging

Do not reset stats on the coordinator upon create of a new index or cluster.

The stats on the coordinator has no meaning for distributed tables unless
they are built using remote stats. Also make sure coordinator's view of
updated/deleted/inserted tuples is executed when fast query shipping is used

Log more information about connection error for ease in debugging

Stamp Postgres-XL 9.5r1.1

Revert "It may happen that we try to read the status of a transaction"

This reverts commit d3356cd451664f3585706f0ae817e0037814ec6e.

Other bug fixes in this area has now fixed the underlying problem and hence we
don't need this change.

Update release notes for XL release r1.1

Merge upto PostgreSQL 9.5.3 (ba37ac217791dfdf2b327c4b75e7083b6b03a2f5)

Initialise a variable as reported by valgrind

Stamp 9.5.3.

Translation updates

Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 7a7a803d44fad7952cf6b1a1da29df2b06b1b380

Add a mechanism to selectively refresh pooler information when only connection
options, such as hostname/port changes for a node

This allows us to retain connections to all other nodes in the cluster and just
recreate connections to the node whose connection information is changed. This
will be especially handy while dealing with datanode/coordinator failover

Remove some references to person names from the docs

Avoid protocol breakage when pooler fails to open connection to one or more
nodes.

While the pooler sends both fds as well pids, the client was bailing out early
if it can't find required number of fds, thus causing protocol breakdown

Per report by Zhang Eamon

Properly deallocate prepared statements on the remote node when user makes such
request

Per report by Vamshi Reddy

Release notes for 9.5.3, 9.4.8, 9.3.13, 9.2.17, 9.1.22.

Docs: improve warnings about nextval() not producing gapless sequences.

In the documentation for nextval(), point out explicitly that INSERT ...
ON CONFLICT will call nextval() if needed for the insertion case, whether
or not it ends up following the ON CONFLICT path. This seems to be a
matter of some confusion, cf bug #14126, so let's be clear about it.

Also mention the issue in the CREATE SEQUENCE reference page, since that
is another place where people might expect such things to be covered.

Minor wording improvements nearby, as well.

Back-patch to 9.5 where ON CONFLICT was introduced.

Distrust external OpenSSL clients; clear err queue

OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling.  This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller.  Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty).  This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue.  Do this is both frontend and
backend code.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself.  It's possibly that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).  Again, do this is both frontend and backend code.

See bug #12799 and https://bugs.php.net/bug.php?id=68276

Based on patches by Dave Vitek and Peter Eisentraut.

From: Peter Geoghegan <pg@bowt.ie>

Fix SSL tests

These were accidentally broken by the great backpatching of
331828b754378733cb5c2e49227603e7354e4e39.

Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.

This patch essentially reverts commit 4c6780fd17aa43ed, in favor of a much
simpler solution for the case where the new cluster would choose to create
a TOAST table but the old cluster doesn't have one: just don't create a
TOAST table.

The existing code failed in at least two different ways if the situation
arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the
lock sanity check in create_toast_table failed; (2) pg_upgrade did not
provide a pg_type OID for the new toast table, so that the crosscheck in
TypeCreate failed.  While both these problems were introduced by later
patches, they show that the hack being used to cause TOAST table creation
is overwhelmingly fragile (and untested).  I also note that before the
TypeCreate crosscheck was added, the code would have resulted in assigning
an indeterminate pg_type OID to the toast table, possibly causing a later
OID conflict in that catalog; so that it didn't really work even when
committed.

If we simply don't create a TOAST table, there will only be a problem if
the code tries to store a tuple that's wider than a page, and field
compression isn't sufficient to get it under a page.  Given that the TOAST
creation threshold is intended to be about a quarter of a page, it's very
hard to believe that cross-version differences in the do-we-need-a-toast-
table heuristic could result in an observable problem.  So let's just
follow the old version's conclusion about whether a TOAST table is needed.

(If we ever do change needs_toast_table() so much that this conclusion
doesn't apply, we can devise a solution at that time, and hopefully do
it in a less klugy way than 4c6780fd17aa43ed did.)

Back-patch to 9.3, like the previous patch.

Discussion: <8110.1462291671@sss.pgh.pa.us>

Fix possible read past end of string in to_timestamp().

to_timestamp() handles the TH/th format codes by advancing over two input
characters, whatever those are.  It failed to notice whether there were
two characters available to be skipped, making it possible to advance
the pointer past the end of the input string and keep on parsing.
A similar risk existed in the handling of "Y,YYY" format: it would advance
over three characters after the "," whether or not three characters were
available.

In principle this might be exploitable to disclose contents of server
memory.  But the security team concluded that it would be very hard to use
that way, because the parsing loop would stop upon hitting any zero byte,
and TH/th format codes can't be consecutive --- they have to follow some
other format code, which would have to match whatever data is there.
So it seems impractical to examine memory very much beyond the end of the
input string via this bug; and the input string will always be in local
memory not in disk buffers, making it unlikely that anything very
interesting is close to it in a predictable way.  So this doesn't quite
rise to the level of needing a CVE.

Thanks to Wolf Roediger for reporting this bug.

Fix a memory leak in GTM proxy

When two lists are concatnated, we might leak header of the second list since
only the list cells are concatnated. We must be careful not to free the list if
list_concat returned the to-be-concatnated list as-is.

Update time zone data files to tzdata release 2016d.

DST law changes in Russia (Magadan, Tomsk regions) and Venezuela.
Historical corrections for Russia. There are new zone names Europe/Kirov
and Asia/Tomsk reflecting the fact that these regions now have different
time zone histories from adjacent regions.

Fix ordering/categorization of some recently-added system views.

Somebody added pg_replication_origin, pg_replication_origin_status and
pg_replication_slots to catalogs.sgml without a whole lot of concern for
either alphabetical order or the difference between a table and a view.
Clean up the mess.

Back-patch to 9.5, not so much because this is critical as because if
I don't it will result in a cross-branch divergence in release-9.5.sgml,
which would be a maintenance hazard.

Honour shared queue refcount while deciding to remove it from hash table when
producer unbinds

It's possible that another execution of the portal may start just while we are
unbinding. Hence premature removal of the shared queue should be avoided.

Make minimum values of shared_queues and shared_queue_size GUC parameters
dependent on other settings

shared_queue_size is dependent on the number of datanodes in the cluster since
each datanode may attach itself as a consumer of the shared queue. So the
shared_queue_size now signifies per-datanode value and the actual value used
will be (max_datanodes * shared_queue_size). Existing users should modify their
settings after taking this into consideration.

Similarly, shared_queues highly depends on the number of concurrent queries. We
now conservatively set this to at least 1/4th of max_connections or user
specified value, whichever is higher.

Add a ref count mechanism to deal with situations where a Shared Queue is
acquired but never bound by any of the node, thus causing leakage

To be honest, this area requires further work. The way things are currently
setup, producer and consumers all bind to a shared queue, but only producer
eventually unbinds. The implementation has logic to wait out for consumers
before destroying a shared queue. While this is okay, a more defined entry and
exit points are required for producer and consumers.

The code also today relies on timeouts to handle the case where a consumer
never binds to a shared queue, thus causing large delays. These delays are more
prominent for very short queries.

Correct shared memory size calculation for Shared Queue hashtable.

Extend CLog, Subtrans log and CommitTsLog appropriately when an XID is received
from a remote node

doc: Fix more typos

From: Alexander Law <exclusion@gmail.com>

Fix a nasty bug that was zeroing out clog and subtrans pages, thus causing
various sorts of data corruptions.

The bug dates back to the XC days, but probably became prominent in XL because
of certain recent changes. In XC/XL, a node may not see all the XIDs and hence
clog/subtrans log must be extended whenever a new XID crosses the previously
seen page boundary. We do this by comparing the pageno where the new XID maps
with the latest_page_no as stored in the shared SLRU data structure. But to
handle XID wrap-arounds, we added a check for difference in number of pages to
be less than CLOG_WRAP_CHECK_DELTA, which was incorrectly defined as
(2^30 / CLOG_XACTS_PER_PAGE). Note that "^" is a logical XOR operator in C and
hence this was returned a very small number of 28, thus causing incorrect
zeroing of pages if ExtendCLOG is called with an XID which is older than what
28 clog pages can hold. All such transactions would suddenly be marked as
aborted, resulting in removal of perfectly valid tuples.

This patch fixes the mess by just relying on built-in routines for checking
XID wrap-arounds.

I also found another issue while working on this. We must not only zero the
page at hand, but also all intermediate pages because we won't this opportunity
later if an intermediate XID is seen.

In our test setup, this seems to help some of the recent reports of data
corruption, including "missing attributes" errors.

doc: Fix typos

From: Alexander Law <exclusion@gmail.com>

It may happen that we try to read the status of a transaction
in clog before the page has been committed.

There are places in the code that try to extend clog,
but here we simply just do one retry if it looks like
we failed to read the desired page.

Fix configure's incorrect version tests for flex and perl.

awk's equality-comparison operator is "==" not "=". We got this right
in many places, but not in configure's checks for supported version
numbers of flex and perl. It hadn't been noticed because unsupported
versions are so old as to be basically extinct in the wild, and because
the only consequence is whether or not a WARNING flies by during
configure.

Daniel Gustafsson noted the problem with respect to the test for flex,
I found the other by reviewing other awk calls.

Remove unused macros.

CHECK_PAGE_OFFSET_RANGE() has been unused forever.
CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
bt_page_stats() and bt_page_items() functions were moved from pgstattuple
to pageinspect module. It still exists in pageinspect/btreefuncs.c.

Daniel Gustafsson

doc: Fix typo

From: Guillaume Lelarge <guillaume@lelarge.info>

Fix mishandling of equivalence-class tests in parameterized plans.

Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like

Nested Loop
  ->  Scan X
  ->  Nested Loop
    ->  Scan Y
    ->  Scan Z
          Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node.  On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members.  So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there.  If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.

This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape.  (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.)  Many
thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.

Fix comment whitespace in VS2105 patch

per gripe from Michael Paquier.

Fix typo in VS2015 patch

reported by Christian Ullrich

Support building with Visual Studio 2015

Adjust the way we detect the locale. As a result the minumum Windows
version supported by VS2015 and later is Windows Vista. Add some tweaks
to remove new compiler warnings. Remove documentation references to the
now obsolete msysGit.

Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.

Backpatch to 9.5

Remember asking for feedback during walsender shutdown.

Since 5a991ef8 we're explicitly asking for feedback from the receiving
side when shutting down walsender, if there's not yet replicated
data.

Unfortunately we didn't remember (i.e. set waiting_for_ping_response to
true) having asked for feedback, leading to scenarios in which replies
were requested at a high frequency.

I can't reproduce this problem on my laptop, I think that's because the
problem requires a significant TCP window to manifest due to the
!pq_is_send_pending() condition. But since this clearly is a bug, let's
fix it.  There's quite possibly more wrong than just this though.

While fiddling with WalSndDone(), I rewrote a hard to understand comment
about looking at the flush vs. the write position.

Reported-By: Nick Cleaton, Magnus Hagander
Author: Nick Cleaton
Discussion: CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
    CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com
Backpatch: 9.4, were 5a991ef8 introduced the use of feedback messages
    during shutdown.

Adjust DatumGetBool macro, this time for sure.

Commit 23a41573c attempted to fix the DatumGetBool macro to ignore bits
in a Datum that are to the left of the actual bool value.  But it did that
by casting the Datum to bool; and on compilers that use C99 semantics for
bool, that ends up being a whole-word test, not a 1-byte test.  This seems
to be the true explanation for contrib/seg failing in VS2015.  To fix, use
GET_1_BYTE() explicitly.  I think in the previous patch, I'd had some idea
of not having to commit to bool being exactly 1 byte wide, but regardless
of what the compiler's bool is, boolean columns and Datums are certainly
1 byte wide.

The previous fix was (eventually) back-patched into all active versions,
so do likewise with this one.

Revert "Convert contrib/seg's bool-returning SQL functions to V1 call convention."

This reverts commit b1dd2f86ce7d43f23f6aae307bb22de826849e7d.
That turns out to have been based on a faulty diagnosis of why the
VS2015 build was misbehaving. Instead, we need to fix DatumGetBool().

Impose a full barrier in generic-xlc.h atomics functions.

pg_atomic_compare_exchange_*_impl() were providing only the semantics of
an acquire barrier. Buildfarm members hornet and mandrill revealed this
deficit beginning with commit 008608b9d51061b1f598c197477b3dc7be9c4a64.
While we have no report of symptoms in 9.5, we can't rule out the
possibility of certain compilers, hardware, or extension code relying on
these functions' specified barrier semantics. Back-patch to 9.5, where
commit b64d92f1a5602c55ee8b27a7ac474f03b7aee340 introduced atomics.

Reviewed by Andres Freund.

doc: Fix typo

From: Andreas Seltenreich <andreas.seltenreich@credativ.de>

Rename strtoi() to strtoint().

NetBSD has seen fit to invent a libc function named strtoi(), which
conflicts with the long-established static functions of the same name in
datetime.c and ecpg's interval.c. While muttering darkly about intrusions
on application namespace, we'll rename our functions to avoid the conflict.

Back-patch to all supported branches, since this would affect attempts
to build any of them on recent NetBSD.

Thomas Munro

doc: Fix typos

From: Erik Rijkers <er@xs4all.nl>

Convert contrib/seg's bool-returning SQL functions to V1 call convention.

It appears that we can no longer get away with using V0 call convention
for bool-returning functions in newer versions of MSVC.  The compiler
seems to generate code that doesn't clear the higher-order bits of the
result register, causing the bool result Datum to often read as "true"
when "false" was intended.  This is not very surprising, since the
function thinks it's returning a bool-width result but fmgr_oldstyle
assumes that V0 functions return "char *"; what's surprising is that
that hack worked for so long on so many platforms.

The only functions of this description in core+contrib are in contrib/seg,
which we'd intentionally left mostly in V0 style to serve as a warning
canary if V0 call convention breaks.  We could imagine hacking things
so that they're still V0 (we'd have to redeclare the bool-returning
functions as returning some suitably wide integer type, like size_t,
at the C level).  But on the whole it seems better to convert 'em to V1.
We can still leave the pointer- and int-returning functions in V0 style,
so that the test coverage isn't gone entirely.

Back-patch to 9.5, since our intention is to support VS2015 in 9.5
and later.  There's no SQL-level change in the functions' behavior
so back-patching should be safe enough.

Discussion: <22094.1461273324@sss.pgh.pa.us>

Michael Paquier, adjusted some by me

Add putenv support for msvcrt from Visual Studio 2013

This was missed when VS 2013 support was added.

Michael Paquier

Fix unexpected side-effects of operator_precedence_warning.

The implementation of that feature involves injecting nodes into the
raw parsetree where explicit parentheses appear.  Various places in
parse_expr.c that test to see "is this child node of type Foo" need to
look through such nodes, else we'll get different behavior when
operator_precedence_warning is on than when it is off.  Note that we only
need to handle this when testing untransformed child nodes, since the
AEXPR_PAREN nodes will be gone anyway after transformExprRecurse.

Per report from Scott Ribe and additional code-reading.  Back-patch
to 9.5 where this feature was added.

Report: <ED37E303-1B0A-4CD8-8E1E-B9C4C2DD9A17@elevated-dev.com>

Fix planner failure with full join in RHS of left join.

Given a left join containing a full join in its righthand side, with
the left join's joinclause referencing only one side of the full join
(in a non-strict fashion, so that the full join doesn't get simplified),
the planner could fail with "failed to build any N-way joins" or related
errors.  This happened because the full join was seen as overlapping the
left join's RHS, and then recent changes within join_is_legal() caused
that function to conclude that the full join couldn't validly be formed.
Rather than try to rejigger join_is_legal() yet more to allow this,
I think it's better to fix initsplan.c so that the required join order
is explicit in the SpecialJoinInfo data structure.  The previous coding
there essentially ignored full joins, relying on the fact that we don't
flatten them in the joinlist data structure to preserve their ordering.
That's sufficient to prevent a wrong plan from being formed, but as this
example shows, it's not sufficient to ensure that the right plan will
be formed.  We need to work a bit harder to ensure that the right plan
looks sane according to the SpecialJoinInfos.

Per bug #14105 from Vojtech Rylko.  This was apparently induced by
commit 8703059c6 (though now that I've seen it, I wonder whether there
are related cases that could have failed before that); so back-patch
to all active branches.  Unfortunately, that patch also went into 9.0,
so this bug is a regression that won't be fixed in that branch.

Improve TranslateSocketError() to handle more Windows error codes.

The coverage was rather lean for cases that bind() or listen() might
return. Add entries for everything that there's a direct equivalent
for in the set of Unix errnos that elog.c has heard of.

Remove dead code in win32.h.

There's no longer a need for the MSVC-version-specific code stanza that
forcibly redefines errno code symbols, because since commit 73838b52 we're
unconditionally redefining them in the stanza before this one anyway.
Now it's merely confusing and ugly, so get rid of it; and improve the
comment that explains what's going on here.

Although this is just cosmetic, back-patch anyway since I'm intending
to back-patch some less-cosmetic changes in this same hunk of code.

Provide errno-translation wrappers around bind() and listen() on Windows.

Fix Windows builds to report something useful rather than "could not bind
IPv4 socket: No error" when bind() fails.

Back-patch of commits d1b7d4877b9a71f4 and 22989a8e34168f57.

Discussion: <4065.1452450340@sss.pgh.pa.us>

Fix ruleutils.c's dumping of ScalarArrayOpExpr containing an EXPR_SUBLINK.

When we shoehorned "x op ANY (array)" into the SQL syntax, we created a
fundamental ambiguity as to the proper treatment of a sub-SELECT on the
righthand side: perhaps what's meant is to compare x against each row of
the sub-SELECT's result, or perhaps the sub-SELECT is meant as a scalar
sub-SELECT that delivers a single array value whose members should be
compared against x.  The grammar resolves it as the former case whenever
the RHS is a select_with_parens, making the latter case hard to reach ---
but you can get at it, with tricks such as attaching a no-op cast to the
sub-SELECT.  Parse analysis would throw away the no-op cast, leaving a
parsetree with an EXPR_SUBLINK SubLink directly under a ScalarArrayOpExpr.
ruleutils.c was not clued in on this fine point, and would naively emit
"x op ANY ((SELECT ...))", which would be parsed as the first alternative,
typically leading to errors like "operator does not exist: text = text[]"
during dump/reload of a view or rule containing such a construct.  To fix,
emit a no-op cast when dumping such a parsetree.  This might well be
exactly what the user wrote to get the construct accepted in the first
place; and even if she got there with some other dodge, it is a valid
representation of the parsetree.

Per report from Karl Czajkowski.  He mentioned only a case involving
RLS policies, but actually the problem is very old, so back-patch to
all supported branches.

Report: <20160421001832.GB7976@moraine.isi.edu>

Honor PGCTLTIMEOUT environment variable for pg_regress' startup wait.

In commit 2ffa86962077c588 we made pg_ctl recognize an environment variable
PGCTLTIMEOUT to set the default timeout for starting and stopping the
postmaster.  However, pg_regress uses pg_ctl only for the "stop" end of
that; it has bespoke code for starting the postmaster, and that code has
historically had a hard-wired 60-second timeout.  Further buildfarm
experience says it'd be a good idea if that timeout were also controlled
by PGCTLTIMEOUT, so let's make it so.  Like the previous patch, back-patch
to all active branches.

Discussion: <13969.1461191936@sss.pgh.pa.us>

Fix memory leak and other bugs in ginPlaceToPage() & subroutines.

Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not.  Subsequent patches
band-aided over some of the problems with this design by making things
even messier.

One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud).  This would not typically
be noticeable during retail index updates.  It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.

Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state.  There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.

To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts.  The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page.  The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path.  The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage().  Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)

In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.

Report: <571276DD.5050303@dalibo.com>

Further reduce the number of semaphores used under --disable-spinlocks.

Per discussion, there doesn't seem to be much value in having
NUM_SPINLOCK_SEMAPHORES set to 1024: under any scenario where you are
running more than a few backends concurrently, you really had better have a
real spinlock implementation if you want tolerable performance. And 1024
semaphores is a sizable fraction of the system-wide SysV semaphore limit
on many platforms. Therefore, reduce this setting's default value to 128
to make it less likely to cause out-of-semaphores problems.

doc: Add missing parentheses

From: Alexander Law <exclusion@gmail.com>

Fix possible crash in ALTER TABLE ... REPLICA IDENTITY USING INDEX.

Careless coding added by commit 07cacba983ef79be could result in a crash
or a bizarre error message if someone tried to select an index on the
OID column as the replica identity index for a table. Back-patch to 9.4
where the feature was introduced.

Discussion: CAKJS1f8TQYgTRDyF1_u9PVCKWRWz+DkieH=U7954HeHVPJKaKg@mail.gmail.com

David Rowley

Stamp Postgres-XL 9.5r1

Update release notes and also correct product name to Postgres-XL 9.5r1