Tatsuo Ishii [Thu, 23 Apr 2020 05:40:23 +0000 (14:40 +0900)]
Doc: fix typo.
Patch contributed by Umar Hayat.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2020-April/003587.html
Muhammad Usama [Thu, 16 Apr 2020 10:09:14 +0000 (15:09 +0500)]
Fix for segmentation fault in PCP processor: Bug 598
The crash was caused by an unconditional pfree on buf without verifying if it
was actually allocated or not. Freeing buf was not actually required in the
first place, since the associated MemoryContext always gets reset anyway
after each loop iteration.
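A minimal sketch of the corrected pattern, assuming a per-iteration memory
context in the PostgreSQL style used by pgpool (the context and variable
names here are illustrative, not the actual PCP code):

for (;;)
{
    char   *buf = NULL;     /* may never be allocated this iteration */

    MemoryContextSwitchTo(ProcessLoopContext);
    /* ... read and process one PCP packet, possibly palloc'ing buf ... */

    /*
     * No pfree(buf): buf may still be NULL, and the reset below frees
     * everything palloc'd in this context anyway.
     */
    MemoryContextReset(ProcessLoopContext);
}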
reference: https://www.pgpool.net/mantisbt/view.php?id=598
Tatsuo Ishii [Sun, 12 Apr 2020 09:15:05 +0000 (18:15 +0900)]
Doc: fix typo in failover.sgml.
Patch provided by sherlockcpp.
Tatsuo Ishii [Sat, 11 Apr 2020 05:34:40 +0000 (14:34 +0900)]
Fix pool show command code.
Enhance the common function send_complete_and_ready() so that it uses
local variables in a saner way.
Also fix an unnecessary limitation on the number of rows affected by the
command.
Patch provided by sherlockcpp and Tatsuo Ishii.
Muhammad Usama [Tue, 7 Apr 2020 16:01:16 +0000 (21:01 +0500)]
Minor improvement in configure script
While checking for libpq, the configure script was not honoring the LDFLAGS
setting. This sometimes breaks configure when the libraries are not present
at the default locations.
Bo Peng [Sun, 5 Apr 2020 16:10:37 +0000 (01:10 +0900)]
Doc: fix document typos.
Patch provided by sherlockcpp@foxmail.com.
Tatsuo Ishii [Tue, 31 Mar 2020 07:58:01 +0000 (16:58 +0900)]
Fix bug with query cache.
If an Execute message is issued right after a Sync message and the query
cache hits, Pgpool-II hangs in inject_cached_message() because there's
no data to be read from the backend. The fix is to move the code block
which checks data availability in the backend from the end of the loop
to the beginning of the loop. Also teach is_cache_empty() to call
pool_is_suspend_reading_from_frontend() so that, if reading from the
frontend is suspended, it reports the cache as empty even if there's data
in the frontend cache, in order to receive a Ready for Query message from
the backend.
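A hedged sketch of the is_cache_empty() change described above (the
signature and the original emptiness checks are abbreviated; only the new
early return is shown):

static bool is_cache_empty(POOL_CONNECTION *frontend, POOL_CONNECTION_POOL *backend)
{
    /*
     * If reading from the frontend is suspended, report the cache as
     * empty so the caller reads the Ready for Query message from the
     * backend, even when data remains in the frontend cache.
     */
    if (pool_is_suspend_reading_from_frontend())
        return true;

    /* ... original emptiness checks ... */
}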
Tatsuo Ishii [Fri, 13 Mar 2020 01:36:17 +0000 (10:36 +0900)]
Fix problems in watchdog processing json data.
Comment on the patch from the author:
In the watchdog source code (src/watchdog/wd_json_data.c), there are some instances of bad handling of values read from json data.
For example:
1) The boolean pool configuration settings "load_balance_mode" and "master_slave_mode" are read using json_get_int_value_for_key(), resulting in 4 bytes being written into their location within the POOL_CONFIG, yet (being bool) they are only 1 byte long. This corrupts the values of the structure members following them.
2) Similarly, when parsing node function json data, "Flags" is read using json_get_int_value_for_key(), resulting in 4 bytes being written into an "unsigned char flags" variable on the stack, overwriting the 3 bytes of stack memory following it. On a big-endian system (e.g. Solaris-sparc or Linux for IBM Z), this causes regression test "013.watchdog_failover_require_consensus" to fail, since 0 is written into Flags, rather than the intended value, which is in the least significant byte of the int value written.
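A minimal sketch of the safe pattern, using the json_get_int_value_for_key()
interface named above (the key and field names are taken from the
description; the surrounding code and return-value convention are
assumptions):

int     tmp;

/*
 * Read into a full int first; writing the 4-byte result directly into
 * the 1-byte bool corrupts the POOL_CONFIG members that follow it.
 */
if (json_get_int_value_for_key(root, "load_balance_mode", &tmp))
    return NULL;        /* key missing or not an integer */
config->load_balance_mode = (bool) tmp;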
Bug reported in:
https://www.pgpool.net/mantisbt/view.php?id=596
Patch author:
Greg Nancarrow (Fujitsu Australia)
Tatsuo Ishii [Wed, 11 Mar 2020 04:13:28 +0000 (13:13 +0900)]
Merge branch 'V3_7_STABLE' of ssh://git.postgresql.org/pgpool2 into V3_7_STABLE
Tatsuo Ishii [Wed, 11 Mar 2020 04:05:43 +0000 (13:05 +0900)]
Fix possible data inconsistency in native replication mode.
There is a long-standing bug in native replication mode. As reported
in pgpool-general, it is possible to lose sync of the database if a slave
DB's postgres process is killed. This is due to an oversight in
read_packets_and_process().
In replication mode, if a slave server's postgres is killed, the local
backend status is set to down:
*(my_backend_status[i]) = CON_DOWN;
So the next DDL/DML in the same session is only issued to the master node
(and other slaves if there are multiple slave nodes). Of course this leads
to a serious data inconsistency problem because in native replication
mode all DB nodes must receive DDL/DML at the same time.
The fix is to trigger failover in this case.
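A hedged sketch of the idea (degenerate_backend_set() is pgpool's internal
failover-request routine; the exact signature and call site shown here are
assumptions, not the literal patch):

/*
 * Previously only the per-session status was degraded:
 *     *(my_backend_status[i]) = CON_DOWN;
 * Requesting a real failover instead makes every session see the node
 * as down, keeping all nodes consistent.
 */
int     node_id = i;

degenerate_backend_set(&node_id, 1);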
Discussions:
https://www.pgpool.net/pipermail/pgpool-general/2020-March/006954.html
https://www.pgpool.net/pipermail/pgpool-hackers/2020-March/003540.html
pengbo [Thu, 5 Mar 2020 22:39:58 +0000 (07:39 +0900)]
Fix watchdog ping probes failing with long hostnames due to a small buffer.
Per bug 516.
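An illustrative fix pattern (not the literal patch): size hostname buffers
with NI_MAXHOST from <netdb.h> rather than a short fixed array.

#include <netdb.h>

char    hostname[NI_MAXHOST];   /* 1025 bytes on glibc: large enough for
                                 * any hostname, unlike e.g. char[128] */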
Tatsuo Ishii [Wed, 26 Feb 2020 09:16:18 +0000 (18:16 +0900)]
Revert "Fix last status changed timestamp is not set."
This reverts commit
a9f5a9b4e3f081822e454e7dab04a64b0e398e24.
Tatsuo Ishii [Wed, 26 Feb 2020 01:31:37 +0000 (10:31 +0900)]
Fix last status changed timestamp is not set.
If there's no status file or -D is specified when starting up
Pgpool-II, the last status changed timestamp was not set, and the
pcp_node_info command, for example, did not show the proper timestamp,
instead showing 1970/1/1, which is the Unix timestamp epoch.
Note that the correct timestamp is set once a client connects to Pgpool-II,
because at that time the status is changed from "waiting" to "up". So
the phenomenon is only observed before any client connects to the
server.
Bo Peng [Sat, 22 Feb 2020 00:59:10 +0000 (09:59 +0900)]
Update copyright year.
Bo Peng [Wed, 19 Feb 2020 07:44:16 +0000 (16:44 +0900)]
Update pgpool.spec file.
Bo Peng [Wed, 19 Feb 2020 06:38:15 +0000 (15:38 +0900)]
Prepare 3.7.13.
Bo Peng [Wed, 19 Feb 2020 06:25:42 +0000 (15:25 +0900)]
Doc: Update doc version.
Bo Peng [Wed, 19 Feb 2020 06:19:53 +0000 (15:19 +0900)]
Doc: add release-notes.
Bo Peng [Wed, 19 Feb 2020 06:12:35 +0000 (15:12 +0900)]
Doc: update Japanese documentation.
Muhammad Usama [Tue, 18 Feb 2020 12:23:10 +0000 (17:23 +0500)]
Disallowing the quorum aware failover option for the native replication mode.
In native replication mode, Pgpool-II is responsible for replicating the data
on all backend nodes. If a node becomes quarantined, Pgpool-II stops
sending writes to that node, and when that quarantined node becomes available
again there is no way to figure out which portion of the data was not
replicated during the quarantine period. This is dangerous since it can cause
data inconsistency.
So, as per the discussion, we reached the conclusion to disallow the
failover-require-consensus and quorum-existence options with native
replication mode, so that a backend node can never get into the quarantine
state when Pgpool-II is configured in replication mode.
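A minimal sketch of such a guard, assuming the pgpool.conf parameter names
failover_when_quorum_exists and failover_require_consensus and the usual
pool_config accessors (the exact check in the commit may differ):

if (pool_config->replication_mode &&
    (pool_config->failover_when_quorum_exists ||
     pool_config->failover_require_consensus))
    ereport(ERROR,
            (errmsg("quorum aware failover is not allowed in native replication mode"),
             errhint("turn off failover_when_quorum_exists and failover_require_consensus")));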
Bo Peng [Tue, 18 Feb 2020 08:43:23 +0000 (17:43 +0900)]
Fix rewriting query errors in native replication mode.
Per bug 551.
Tatsuo Ishii [Tue, 18 Feb 2020 01:28:39 +0000 (10:28 +0900)]
Doc: clarify the session disconnection conditions while failover.
Bo Peng [Mon, 17 Feb 2020 13:14:53 +0000 (22:14 +0900)]
Doc: Fix documentation error regarding delegate_IP.
Bo Peng [Wed, 12 Feb 2020 01:03:13 +0000 (10:03 +0900)]
Fix child process segfault after reload if "health_check_database" is empty.
Per bug 571.
Tatsuo Ishii [Sat, 8 Feb 2020 10:39:14 +0000 (19:39 +0900)]
Fix oversight in adapting to PostgreSQL 12.
From PostgreSQL 12 on, the WAL directory is named "pg_wal"; pgpool_setup
should have dealt with this.
Takuma Hoshiai [Tue, 4 Feb 2020 02:51:33 +0000 (11:51 +0900)]
Fix error message typo
Bo Peng [Tue, 4 Feb 2020 00:04:05 +0000 (09:04 +0900)]
Doc: fix doc typo.
Bo Peng [Tue, 28 Jan 2020 06:20:46 +0000 (15:20 +0900)]
Feature: Check if socket files exist at startup and remove them if the PID file doesn't exist.
If previous socket files exist, pgpool startup fails due to a bind() failure.
This commit removes the socket files at startup if the PID file doesn't exist.
Also, I found some messages are effectively ignored because the logging system is not yet initialized at that point in startup.
I changed them to plain fprintf(stderr, ...).
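A minimal sketch of the startup check, using standard POSIX calls (the
helper name and path handling are illustrative, not the actual pgpool
code):

#include <stdio.h>
#include <unistd.h>

static void
remove_stale_socket(const char *sock_path, const char *pid_path)
{
    /* No PID file means no running pgpool can own the socket */
    if (access(pid_path, F_OK) != 0 && unlink(sock_path) == 0)
        fprintf(stderr, "removed stale socket file \"%s\"\n", sock_path);
}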
Tatsuo Ishii [Sat, 11 Jan 2020 11:49:29 +0000 (20:49 +0900)]
Fix 001.load_balance failure caused by previous commit.
Tatsuo Ishii [Fri, 10 Jan 2020 05:50:38 +0000 (14:50 +0900)]
Attempt to fix 001.load_balance test failure.
It is likely that the sleep time after reload was not enough. Change it
from 1 second to 10 seconds.
Tatsuo Ishii [Sat, 4 Jan 2020 01:30:22 +0000 (10:30 +0900)]
More fix for occasional regression test 003.failover failure.
The comparison between the show pool_nodes result and the expected file
failed because of replication delay (the expected file sets the replication
delay column to 0). The fix is to disable replication delay.
Tatsuo Ishii [Tue, 31 Dec 2019 03:06:50 +0000 (12:06 +0900)]
Fix occasional regression test 003.failover failure.
The test compares the result of show pool_nodes against a pre-computed
expected text file. After the primary node goes down and the old standby
is promoted to primary, it is possible that the streaming replication worker
process stores the replication state (async, streaming) in shared
memory before the failover but fails to update the state after the
failover. The fix is to replace "async" and "streaming" with space
characters in the result file to match the expected file.
Tatsuo Ishii [Fri, 27 Dec 2019 00:13:59 +0000 (09:13 +0900)]
Fix occasional regression test failure of 008.dbredirect.
In the test several databases are created, and subsequent tests expect
that those databases have already been replicated. But it is possible
that the replication has not been done at that point, which makes the
test fail. The fix is to check for the existence of the database and, if
it's not replicated yet, sleep for a while and retry (up to 5 times).
Tatsuo Ishii [Thu, 26 Dec 2019 05:13:08 +0000 (14:13 +0900)]
Fix occasional regression test failure of 008.dbredirect.
In the test pgbench -i is performed. It is possible that this brings
streaming replication delay, which in turn disturbs the app/db
redirect tests. The fix is to disable delay_threshold.
Tatsuo Ishii [Thu, 12 Dec 2019 07:33:18 +0000 (16:33 +0900)]
Fix replication delay worker segfault when application_name is an empty string.
The process calls do_query() to obtain the query result from the
pg_stat_replication view. If the user sets application_name to an empty
string, the result data row packet length will be 0. However,
do_query() did not consider the length == 0 case, which resulted in
a NULL pointer being passed to strcmp(), which is called from the worker
process. That means the bug is not specific to this case (a new
feature added in Pgpool-II 4.1) but potentially affects many other
places where do_query() gets called, although it had not been reported
in the field. So this fix should be applied to all supported branches.
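A hedged sketch of the kind of guard involved (variable names are
illustrative; the actual patch is in do_query()'s result handling):

/* A zero-length field must become an empty string, never NULL */
if (len > 0)
    value = pstrdup(buf);
else
    value = pstrdup("");    /* application_name = '' gives len == 0 */

/* strcmp() is now safe even for the empty-string case */
if (strcmp(value, target_application_name) == 0)
{
    /* ... use the matching row ... */
}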
Per bug 565.
Takuma Hoshiai [Tue, 3 Dec 2019 01:54:07 +0000 (10:54 +0900)]
Fix display of a parameter when using PGPOOL SHOW.
When showing the backend_flag parameter with the PGPOOL SHOW command, the
'ALWAYS_MASTER' setting wasn't displayed.
Bo Peng [Mon, 18 Nov 2019 09:04:32 +0000 (18:04 +0900)]
Fix missing syslog setting.
Bo Peng [Mon, 18 Nov 2019 01:07:15 +0000 (10:07 +0900)]
Prepare 3.7.12-2.
Bo Peng [Fri, 15 Nov 2019 07:39:50 +0000 (16:39 +0900)]
Fix the missing syslog in configure file.
Per bug 557.
Bo Peng [Thu, 31 Oct 2019 00:58:53 +0000 (09:58 +0900)]
Doc: update release-note.
Bo Peng [Thu, 31 Oct 2019 00:32:05 +0000 (09:32 +0900)]
Prepare 3.7.12.
Bo Peng [Thu, 31 Oct 2019 00:24:05 +0000 (09:24 +0900)]
Doc: update doc version.
Bo Peng [Thu, 31 Oct 2019 00:11:03 +0000 (09:11 +0900)]
Fix incorrect query rewrite in replication mode.
Bo Peng [Wed, 30 Oct 2019 23:56:15 +0000 (08:56 +0900)]
Doc: Add release-note.
Bo Peng [Wed, 30 Oct 2019 09:10:04 +0000 (18:10 +0900)]
Add RHEL 8 support.
Bo Peng [Fri, 25 Oct 2019 08:22:22 +0000 (17:22 +0900)]
Fix incorrect query rewrite in replication mode.
For example:
- CREATE TABLE t1 AS SELECT now();
- SELECT now() INTO t1;
- WITH ins AS ( INSERT INTO t1 SELECT now()) SELECT;
Tatsuo Ishii [Mon, 21 Oct 2019 04:44:37 +0000 (13:44 +0900)]
Fix health check timeout.
Health check timeout could happen in several places:
1) connect system call
2) select system call
3) read system call
1) was OK, but in 2) and 3) it was possible to go into an
infinite loop in Pgpool-II 3.7 or later. This was due to a mistake made
when the health check process was turned into a separate process in 3.7.
Back-patched to 3.7 and above.
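An illustrative pattern for the bounded wait (not the literal patch):
recompute the remaining time on every retry so that EINTR or stray wakeups
in select()/read() can never loop forever.

#include <errno.h>
#include <sys/select.h>
#include <time.h>

static int
wait_readable(int fd, int timeout_sec)
{
    time_t  deadline = time(NULL) + timeout_sec;

    for (;;)
    {
        time_t  remain = deadline - time(NULL);
        struct timeval tv;
        fd_set  rfds;
        int     rc;

        if (remain <= 0)
            return -1;          /* health check timeout */
        tv.tv_sec = remain;
        tv.tv_usec = 0;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return 0;           /* data is ready; read() can proceed */
        if (rc == 0)
            return -1;          /* select() timed out */
        if (errno != EINTR)
            return -1;          /* real error */
        /* interrupted by a signal: loop with the timeout recomputed */
    }
}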
Discussion:
https://www.pgpool.net/pipermail/pgpool-hackers/2019-October/003458.html
https://www.pgpool.net/pipermail/pgpool-hackers/2019-October/003459.html
Takuma Hoshiai [Tue, 15 Oct 2019 09:38:21 +0000 (18:38 +0900)]
Doc: add failover_command description
Add more explanation about the case where failover_command is executed when
there is no new master node. In this case, the special characters in
failover_command are given fixed values.
Tatsuo Ishii [Tue, 15 Oct 2019 02:59:58 +0000 (11:59 +0900)]
Fix memory leaks pointed out by Coverity.
Tatsuo Ishii [Tue, 15 Oct 2019 00:26:43 +0000 (09:26 +0900)]
Merge branch 'V3_7_STABLE' of ssh://git.postgresql.org/pgpool2 into V3_7_STABLE
Muhammad Usama [Mon, 14 Oct 2019 10:03:08 +0000 (15:03 +0500)]
Fix for miscellaneous watchdog issues.
The commit takes care of the following reports in the watchdog:
-- pgpool-general: 6672 Query (watchdog split brain)
-- 0000547: We need to do arping again after recovering from split-brain.
Basically, in a bid to solve these issues, the commit makes the
below-mentioned behavioral changes in the watchdog:
1-- If the life check reports a watchdog node as dead but the watchdog core is
still able to connect and communicate with it without any problem, the
watchdog core was getting into a dilemma of whether to consider it a lost
node or an alive node.
Fix:
With this commit, lost nodes reported by life-check (external or internal)
are treated as hard failures even when the node is reachable from the
watchdog core, and such a node can only become alive again if it is
restarted or if life-check informs us that the node has become alive again.
The second type of node failure that can mark a watchdog node as lost occurs
when a watchdog node fails to respond to the messages sent to it or frequent
errors occur on its socket. These errors are detected by the watchdog core
itself and get recovered automatically whenever that node becomes responsive
again. Apart from this, the commit also makes some improvements in the area
of detecting such internal errors.
2-- Standby nodes were too aggressive in reacting to a coup attempt
(when another watchdog node tries to become a master while a valid master
already exists in the cluster) or to the possibility of split-brain (when a
standby receives an "I AM MASTER" message from a node that is not the master
as per the standby node's records). In both these situations, the standby
nodes used to re-join the master node in the hope of finding the true master.
But that didn't prove to be a very good strategy, since it may influence
the selection of the true master node when the true-master and
fake-master nodes get into a fight to retain the master status.
One of the things the true and fake masters compare is the number of
connected standby nodes, and if during the fight a standby leaves the
current master to re-join the cluster, the standby node count on the true
master becomes lower than actual, and that affects its chances of winning
the election.
Fix:
The commit makes the standby nodes more laid-back: they actively reject
nodes that try to become a master in the presence of the true master,
and avoid re-joining the cluster until it is absolutely necessary.
3-- The third problem was in the case of network partitioning or partial
life-check failure (when node A thinks node B is lost but node B thinks
node A is not lost). In this case the kicked-out standby node was too
aggressive in trying to connect to the master or to become a master itself,
which potentially put an unnecessary burden on the network and the
cluster nodes.
Fix:
So the fix for this issue is to make the isolated node a bit calmer and to
wait between tries to connect to the master or to become a master. For that
purpose, a new WD_NETWORK_ISOLATION state is added to the state machine,
which just adds a delay between successive attempts at becoming a master.
The node only goes into this state when it comes to know that it is marked
as lost on the current master, and it gets out of the state either when
life-check on the remote node indicates that the node is alive again, or
after 10 seconds anyway.
Finally, because creating partial or complete network failure scenarios is
very difficult, the commit also adds a watchdog debug aid, similar to the
health-check debug aid, to simulate different scenarios.
To enable the watchdog debug aid, Pgpool-II needs to be compiled with the
WATCHDOG_DEBUG=1 flag ($ make WATCHDOG_DEBUG=1).
Once compiled with the debug aid enabled, you can put commands in the
<pgpool_logdir>/watchdog_debug_requests file to test different scenarios.
e.g.
$ echo "KILL_ALL_COMMUNICATION" > logdir/watchdog_debug_requests
$ echo "KILL_ALL_SENDERS" >> logdir/watchdog_debug_requests
The current list of commands supported by the watchdog debug aid is:
DO_NOT_REPLY_TO_BEACON <= Standby node stops replying to master node beacon messages
while this line is present in the watchdog_debug_requests file
DO_NOT_SEND_BEACON <= Master node stops sending beacon messages to standby nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_COMMUNICATION <= Watchdog stops all communication with all nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_RECEIVERS <= Watchdog ignores messages from all nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_SENDERS <= Watchdog stops sending messages to all nodes
while this line is present in the watchdog_debug_requests file
Tatsuo Ishii [Fri, 11 Oct 2019 00:52:53 +0000 (09:52 +0900)]
Doc: add note to trusted_servers.
"Please note that you should not assign PostgreSQL servers to this
parameter." This should have been noted since there is at least one
user who actually did it.
Bo Peng [Thu, 10 Oct 2019 06:13:09 +0000 (15:13 +0900)]
Update changelog.
Bo Peng [Thu, 10 Oct 2019 05:52:49 +0000 (14:52 +0900)]
Update pgpool.spec to support PostgreSQL 12.
Bo Peng [Thu, 10 Oct 2019 04:58:32 +0000 (13:58 +0900)]
Update src/redhat/pgpool_socket_dir.patch.
Tatsuo Ishii [Thu, 10 Oct 2019 01:01:08 +0000 (10:01 +0900)]
Fix assorted ancient v2 protocol bugs.
- In the v2 code path, extract_message() pfrees memory which was returned
from pool_read_string(). This is plain wrong and could cause a segfault,
since the memory returned by it is managed by the pool_stream
module.
- In the v2 code path, pool_process_notice_message_from_one_backend() added
"NOTICE:" to the log message. This is not necessary as that part is
already included in the message.
- In the v2 code path, pool_extract_error_message() did not prepare unread
data correctly. This caused a subsequent
pool_process_notice_message_from_one_backend() to produce an empty
message and read_kind_from_backend() to fail.
Takuma Hoshiai [Tue, 8 Oct 2019 07:02:01 +0000 (16:02 +0900)]
Fix extended query communication in do_query()
do_query() didn't send a Describe message to PostgreSQL.
It also used a case-sensitive comparison instead of strcasecmp() when
checking whether a query is a SELECT.
Takuma Hoshiai [Tue, 8 Oct 2019 04:08:25 +0000 (13:08 +0900)]
Fix problem that syslog_facility doesn't change on reload
The cause was a macro definition mistake. This fix unifies the macro
definitions and deletes old test code that used vsyslog().
Reported in bug 548.
Tatsuo Ishii [Fri, 4 Oct 2019 04:52:19 +0000 (13:52 +0900)]
Fix inappropriate ereport call in VALID_BACKEND.
The VALID_BACKEND macro (more precisely pool_virtual_master_db_node_id)
emitted a message if pgpool was performing failover/failback:
ereport(WARNING,
(errmsg("failover/failback is in progress"),
errdetail("executing failover or failback on backend"),
errhint("In a moment you should be able to reconnect to the database")));
This could be called within signal handlers, and
POOL_SETMASK(&BlockSig)/POOL_SETMASK(&UnBlockSig) was called to block
interrupts because ereport is not reentrant. However, it is possible
that callers had already called POOL_SETMASK, and this could result in
an unwanted signal unblock.
The fix is to remove the ereport and POOL_SETMASK calls altogether. This
results in removing the message above, but we have no choice.
I found the problem while investigating the regression test
055.backend_all_down failure, but of course the bug could have bitten
users in other places.
Muhammad Usama [Thu, 3 Oct 2019 14:53:44 +0000 (19:53 +0500)]
Fix for Coverity warnings in watchdog and lifecheck
Tatsuo Ishii [Thu, 3 Oct 2019 12:33:09 +0000 (21:33 +0900)]
Fix signal unblock leak in failover.
When failover event occurs, register_node_operation_request() gets
called to en-queue failover/failback requests. If the request queue is
full, this function returns false with unlocking semaphore. But it
forgot to unblock signal mask. This leads to block all signals
including SITERM, which makes pgpool fail to shutdown.
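A hedged sketch of the corrected early-return path (the semaphore helper
name and the surrounding structure are assumptions based on the message,
not the literal code):

POOL_SETMASK(&BlockSig);
pool_semaphore_lock(REQUEST_INFO_SEM);

if (queue_is_full)
{
    pool_semaphore_unlock(REQUEST_INFO_SEM);
    POOL_SETMASK(&UnBlockSig);  /* this unblock was missing, leaving
                                 * SIGTERM etc. blocked forever */
    return false;
}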
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-October/003449.html
Muhammad Usama [Sat, 28 Sep 2019 20:15:11 +0000 (01:15 +0500)]
Fix for bug-545: Quorum lost and not recovered
Master watchdog node was not adding the lost standby node to its list of valid
standby nodes after it is rediscovered by the lifecheck.The fix is to ask the
node to rejoin the master node when it gets rediscovered by the lifecheck.
As part of this commit, I have also added the watchdog data version and Pgpool-II
version in the watchdog info packet to make the extensions in the watchdog
messages easier in the future.
Thanks to Guille(reporter of this bug), for providing lots of help in testing the fix
Tatsuo Ishii [Tue, 6 Aug 2019 02:27:30 +0000 (11:27 +0900)]
Overhaul health check debug facility.
check_backend_down_request() in health_check.c is intended to simulate
a communication failure between the health check and a PostgreSQL
backend node, by creating a file containing lines like:
1	down
where the first numeric is the node id starting from 0, followed by a tab,
and "down". When the health check process finds the file, it makes the
health check fail on node 1.
After the health check brings the node into down status,
check_backend_down_request() changes "down" to "already_down" to
prevent repeated node failures.
However, the question is whether this is necessary at all. I think
check_backend_down_request() should keep on reporting the down status
(rather than preventing repeated node failures), and it should be called
inside establish_persistent_connection(), because the failing situation
can be better simulated that way. For example, currently the health
check retry is not simulated, but the new way can do it.
Moreover, in the current watchdog implementation, bringing a node into
the quarantine state requires *two* detections of a node communication
error. Since check_backend_down_request() only allows the node down
event to be raised *once* (after that, the down state is changed to the
already_down state), it's impossible to test the watchdog quarantine
using check_backend_down_request(). I changed check_backend_down_request()
so that it continues to raise the "down" event as long as the down request
file exists.
This commit enhances check_backend_down_request() as described above:
1) the caller of check_backend_down_request() is now
establish_persistent_connection(), rather than
do_health_check_child();
2) check_backend_down_request() does not change "down" to
"already_down" anymore. This means that the second argument of
check_backend_down_request() is not useful anymore; probably I
should remove the argument later on.
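An illustrative parser for the down-request file format described above
(a standalone sketch; the real function lives in health_check.c and does
more than this):

#include <stdio.h>
#include <string.h>

/* Return 1 if the file requests that node_id be reported as down */
static int
node_down_requested(const char *path, int node_id)
{
    FILE   *fp = fopen(path, "r");
    char    line[64];
    int     found = 0;

    if (fp == NULL)
        return 0;               /* no file, no down request */
    while (fgets(line, sizeof(line), fp))
    {
        int     id;
        char    status[32];

        /* each line: "<node_id>\tdown" */
        if (sscanf(line, "%d\t%31s", &id, status) == 2 &&
            id == node_id && strcmp(status, "down") == 0)
        {
            found = 1;          /* reported for as long as the file exists */
            break;
        }
    }
    fclose(fp);
    return found;
}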
Tatsuo Ishii [Wed, 25 Sep 2019 05:22:21 +0000 (14:22 +0900)]
Fix memory leak in replication mode.
Per Coverity.
Tatsuo Ishii [Tue, 24 Sep 2019 23:49:48 +0000 (08:49 +0900)]
Fix memory leak while attempting to connect to backend.
If no backend is up and running, the memory for the copy of the startup
packet will be lost. This was introduced by commit
cdb49d3b7. Per Coverity.
Tatsuo Ishii [Wed, 18 Sep 2019 01:51:28 +0000 (10:51 +0900)]
Fix uninitialized variable.
Per Coverity.
Tatsuo Ishii [Tue, 17 Sep 2019 23:07:27 +0000 (08:07 +0900)]
Fix compiler warnings.
Tatsuo Ishii [Tue, 17 Sep 2019 22:39:15 +0000 (07:39 +0900)]
Fix compiler warning.
Tatsuo Ishii [Mon, 16 Sep 2019 22:21:25 +0000 (07:21 +0900)]
Revert "Fix occasional query hang while processing DEALLOCATE."
This reverts commit
8925e047fb42ae758966efa8eade47f6f8fbc41c.
Tatsuo Ishii [Mon, 16 Sep 2019 00:24:08 +0000 (09:24 +0900)]
Fix occasional query hang while processing DEALLOCATE.
When DEALLOCATE tries to remove a named statement, it inherits the
where_to_send map of the named statement in
where_to_send_deallocate(). However, it forgot to copy the load balance
node id in the query context of the named statement. This prevented the
query from being sent to the backend: if the target node id differs from
both query_context->load_balance_node_id and the primary node id,
pool_virtual_master_db_node_id (called as MASTER_NODE_ID)
returns the primary node id, and pool_send_and_wait(MASTER_NODE_ID)
ignores the request because VALID_BACKEND returns false in this case
(MASTER_NODE_ID = primary node id, which is not in the where_to_send map).
As a result, the following check_error() waits for a response from the
backend in vain.
The fix is to let where_to_send_deallocate() copy the load balance node id
from the query context of the previous named statement.
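A hedged sketch of the missing copy in where_to_send_deallocate() (qc and
stmt_qc are illustrative names for the DEALLOCATE's query context and the
named statement's query context; the surrounding code is abbreviated):

/* inherit the routing map of the named statement (as before)... */
memcpy(qc->where_to_send, stmt_qc->where_to_send,
       sizeof(qc->where_to_send));
/* ...and also its load balance node id, which was the missing piece */
qc->load_balance_node_id = stmt_qc->load_balance_node_id;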
Per bug 546.
Tatsuo Ishii [Sun, 15 Sep 2019 13:39:18 +0000 (22:39 +0900)]
Fix segfault in certain case.
The scenario is something like:
1) a named statement is created.
2) DEALLOCATE removes it.
3) an erroneous query is executed.
In #2, the "sent message" for the named statement is removed, but
"uncompleted_message" is left. Then, after #3, in ReadyForQuery()
uncompleted_message is added and removed. However, the storage for the
uncompleted_message has already been freed in #2, and this causes a
segfault.
The fix is, in SimpleQuery(), to set uncompleted_message to NULL if it's
not a PREPARE command, so that ReadyForQuery() does not try to remove the
already-removed message.
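A minimal sketch of the SimpleQuery() change (the PREPARE test and the
session context member are abbreviated/assumed; the real code inspects the
parse tree):

/*
 * Only PREPARE leaves an uncompleted message that ReadyForQuery()
 * must clean up; for any other simple query, drop the stale pointer
 * left over from the DEALLOCATE in step #2.
 */
if (!IsA(node, PrepareStmt))
    session_context->uncompleted_message = NULL;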
Per bug 546.
Here is a minimum test case.
'P' "_plan0x7f2d465db530" "SELECT 1" 0
'S'
'Y'
'Q' "DEALLOCATE _plan0x7f2d465db530"
'Y'
'Q' "CREATE INDEX users_auth_id_index ON non_existing_table ( auth_id )"
'Y'
'X'
Tatsuo Ishii [Thu, 12 Sep 2019 04:40:05 +0000 (13:40 +0900)]
Fix identical code used for different branches per Coverity.
Tatsuo Ishii [Thu, 12 Sep 2019 04:39:41 +0000 (13:39 +0900)]
Fix memory leak per Coverity.
Tatsuo Ishii [Tue, 10 Sep 2019 06:54:13 +0000 (15:54 +0900)]
Fix typo in fork_lifecheck_child().
Tatsuo Ishii [Tue, 10 Sep 2019 06:42:10 +0000 (15:42 +0900)]
Fix typo in fork_watchdog_child().
Tatsuo Ishii [Fri, 6 Sep 2019 07:31:07 +0000 (16:31 +0900)]
Fix memory leak.
Per Coverity.
Tatsuo Ishii [Fri, 6 Sep 2019 06:54:39 +0000 (15:54 +0900)]
Fix memory leak.
Per Coverity.
Tatsuo Ishii [Fri, 6 Sep 2019 06:24:09 +0000 (15:24 +0900)]
Fix uninitialized variable.
Probably harmless, but a bug is a bug...
Per Coverity.
Tatsuo Ishii [Tue, 3 Sep 2019 22:45:17 +0000 (07:45 +0900)]
Doc: mention that VIP will not be brought up if quorum does not exist.
Tatsuo Ishii [Sun, 1 Sep 2019 02:38:35 +0000 (11:38 +0900)]
Fix pgpool_setup to reflect the -p (baseport) option in the ORIGBASEPORT variable.
Otherwise, the shutdown script generated by pgpool_setup does not use the
proper port number for the netstat command.
Bo Peng [Mon, 26 Aug 2019 06:51:22 +0000 (15:51 +0900)]
Doc: Fix missing documents from previous commit.
Tatsuo Ishii [Sun, 25 Aug 2019 02:37:35 +0000 (11:37 +0900)]
Doc: fix indentation in scripts.
Auto indentation by commit
2cb0bd3f8f236aeacfba37cd4d604893561bad52
broke the indentation of scripts inside <programlisting> tags.
Tatsuo Ishii [Sun, 25 Aug 2019 01:15:37 +0000 (10:15 +0900)]
Doc: fix typo in "What is Pgpool-II?" section.
Author: Alejandro Roman
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-August/003392.html
Bo Peng [Fri, 16 Aug 2019 02:51:36 +0000 (11:51 +0900)]
Add "-I" option to "arping_cmd" command default setting.
Tatsuo Ishii [Thu, 15 Aug 2019 22:35:25 +0000 (07:35 +0900)]
Doc: run auto indent using emacs.
Here is the emacs script F.Y.I.
;; must be run by emacs
(load "/home/t-ishii/.emacs.d/init.el")
(find-file (nth 0 command-line-args-left));
(indent-region (point-min) (point-max));
(save-buffer)
Bo Peng [Thu, 15 Aug 2019 07:58:21 +0000 (16:58 +0900)]
Doc: update doc version.
Bo Peng [Thu, 15 Aug 2019 05:16:50 +0000 (14:16 +0900)]
Test: rename test script.
Bo Peng [Thu, 15 Aug 2019 04:58:27 +0000 (13:58 +0900)]
Doc: fix incorrect link name.
Bo Peng [Thu, 15 Aug 2019 04:38:30 +0000 (13:38 +0900)]
Prepare 3.7.11.
Bo Peng [Thu, 15 Aug 2019 04:30:58 +0000 (13:30 +0900)]
Doc: update release-note.
Bo Peng [Thu, 15 Aug 2019 01:29:44 +0000 (10:29 +0900)]
Doc: add 3.7.11-3.4.25 release-note.
Tatsuo Ishii [Wed, 14 Aug 2019 00:14:36 +0000 (09:14 +0900)]
Fix memory leak.
Pointed out by Coverity.
Tatsuo Ishii [Thu, 8 Aug 2019 02:38:02 +0000 (11:38 +0900)]
Make waiting for TIME_WAIT in pgpool_setup optional.
Since commit
3b32bc4e583da700cc8df7c5777e90341655ad3b, the shutdownall
script generated by pgpool_setup waits for Pgpool-II sockets in the
TIME_WAIT state to disappear. However, in most cases this takes a long
time and makes developers' testing uncomfortable.
This commit makes the wait optional: unless the environment variable
"CHECK_TIME_WAIT" is set to something other than "false", it never waits
for the TIME_WAIT state.
Tatsuo Ishii [Fri, 9 Aug 2019 08:04:28 +0000 (17:04 +0900)]
Fix "unable to bind. cannot get parse message" error.
This was caused by a too-eager memory free in parse_before_bind. It
called the
pool_remove_sent_message/pool_create_sent_message/pool_add_sent_message
combo to replace the query context in the sent message. Unfortunately,
pool_remove_sent_message frees memory such as the statement name, which was
being passed in by the caller. As a result, the new sent message created by
pool_create_sent_message pointed to the freed statement name, which may
make a search by statement name fail, because the statement name in
the sent message now points to a freed memory area that might be
overwritten by a later memory allocation. The fix is, instead of calling
pool_remove_sent_message etc., to just replace the query context in the
sent message.
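A minimal sketch of the new approach (the member name is illustrative; the
point is that the existing entry, and the statement name it owns, stay
intact):

/*
 * Old, buggy sequence (frees the statement name the caller passed in):
 *     pool_remove_sent_message(...);
 *     msg = pool_create_sent_message(..., name, ...);  name now dangles
 *     pool_add_sent_message(msg);
 * New approach: swap the query context in place.
 */
sent_message->query_context = new_query_context;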
Per bug 531.
Muhammad Usama [Thu, 8 Aug 2019 14:03:40 +0000 (19:03 +0500)]
Fix for
0000483: online-recovery is blocked after a child process exits ...
The problem is that if some child process exits abnormally during the second
stage of online recovery, then the connection counter that keeps track of
exiting processes does not get decremented, and Pgpool-II keeps waiting for
the exit of the already-exited process. Eventually, the recovery fails after
client_idle_limit_in_recovery expires.
The fix for this issue is to set the connection counter to zero when
client_idle_limit_in_recovery is enabled and has a smaller value than
recovery_timeout, since all clients must have been kicked out by the time
client_idle_limit_in_recovery expires.
A similar fix was already committed as part of bug 431 by Tatsuo Ishii, so
this commit basically imports the same logic into the watchdog function that
processes remote online recovery requests.
Apart from the above-mentioned change, Hoshiai-san identified that the
watchdog IPC command timeout for the online recovery start function executed
through the watchdog was set to exactly the same value as recovery_timeout,
and it needs to be increased to make the solution work correctly.
Muhammad Usama [Wed, 15 May 2019 21:36:35 +0000 (02:36 +0500)]
Fix for [pgpool-hackers: 3295] duplicate failover request ...
Pgpool should keep the backend health check running on quarantined nodes so
that when connectivity resumes, they automatically get removed
from quarantine. Otherwise, a temporary network glitch could send a node
into a permanent quarantine state.
Muhammad Usama [Wed, 7 Aug 2019 15:22:01 +0000 (20:22 +0500)]
Fix for no primary on standby pgpool when primary is quarantined on master
The master watchdog Pgpool sends primary_node_id = -1 in the backend status
sync message if the primary node is quarantined on it. So a standby watchdog
Pgpool must not update its primary_node_id if the primary backend node id in
the sync message is invalid_node_id (-1) while the same sync message reports
the backend status of the current primary node as "NOT DOWN".
The issue was reported by Tatsuo Ishii <ishii@sraoss.co.jp> and fixed by me.
Tatsuo Ishii [Thu, 8 Aug 2019 02:02:50 +0000 (11:02 +0900)]
Import some of memory manager debug facilities from PostgreSQL.
Now we can use CLOBBER_FREED_MEMORY, which is useful for detecting
accesses to already-pfreed memory.
Takuma Hoshiai [Mon, 29 Jul 2019 06:08:53 +0000 (15:08 +0900)]
Fix watchdog_setup command option
The mode option was handled incorrectly: when the pgpool_setup command is
called by the watchdog_setup command, it forgot to set the mode option.
Tatsuo Ishii [Sun, 28 Jul 2019 02:11:07 +0000 (11:11 +0900)]
Fix pgpool_setup to produce correct follow master command.
The produced script incorrectly checked whether PostgreSQL is running
or not, which resulted in it mistakenly thinking PostgreSQL is
always running.