Tatsuo Ishii [Tue, 19 Jun 2018 02:31:29 +0000 (11:31 +0900)]
Fix oversight in pool_extract_error_message().
A variable used for storing the return value was mistakenly declared
as bool, rather than int. This led to a segfault issue mentioned in
75b27e7 on certain platforms.
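The failure mode can be reproduced in isolation. The following is only a sketch, not the pgpool code: the function names are hypothetical, and `unsigned char` stands in for a narrow bool type on a platform where plain char is unsigned (an assumption about why 255 shows up on armhf).

```c
#include <assert.h>
#include <limits.h>

/* Buggy pattern: an int status is squeezed into a narrow unsigned type. */
int
store_status_narrow(int status)
{
    unsigned char ret;

    assert(CHAR_BIT == 8);          /* the value 255 relies on 8-bit bytes */
    ret = (unsigned char) status;   /* -1 wraps around to 255 */
    return ret;
}

/* Fixed pattern: keep the status in an int so the sign survives. */
int
store_status_int(int status)
{
    int ret = status;

    return ret;
}

/* The kind of "> 0" guard a caller might apply before using the message. */
int
caller_would_log(int stored_status)
{
    return stored_status > 0;
}
```

With a status of -1, the narrow variant passes the caller's guard while the int variant does not.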
Tatsuo Ishii [Tue, 19 Jun 2018 01:10:33 +0000 (10:10 +0900)]
Fix segfault in per_node_error_log() on armhf architecture.
pool_extract_error_message() incorrectly returns 255 (in decimal) on
this architecture when the previous message was not an error or a
notice message. In this case per_node_error_log() happily calls ereport
since the return value from pool_extract_error_message() is greater
than 0. Unfortunately the message string returned by
pool_extract_error_message() points to garbage memory in this case, so
a segfault occurs.
The fix gives per_node_error_log() a guard against the bug in
pool_extract_error_message(). Moreover, the change is more consistent
with other places where pool_extract_error_message() is called.
A fix for pool_extract_error_message() itself will come later.
See:
https://github.com/pgpool/pgpool2/issues/14
for more detailed discussion.
Problem reported and patch by Christian Ehrhardt.
Tatsuo Ishii [Tue, 12 Jun 2018 06:53:39 +0000 (15:53 +0900)]
Revert "Fix 055.backend_all_down test failure."
This reverts commit ca492a65bbf57c2dc78826690ea259ddc9d6e78b.
Tatsuo Ishii [Tue, 12 Jun 2018 06:21:52 +0000 (15:21 +0900)]
Fix 055.backend_all_down test failure.
The test fails because a pgpool zombie child process remains. The
failover process is actually performed properly, but when the shutdown
script is executed in the background, output to stdout/stderr was
blocked, and this could leave a zombie process behind. The solution is
to redirect stdout/stderr to /dev/null when spawning the shutdown
script in the background.
Bo Peng [Mon, 11 Jun 2018 14:10:22 +0000 (23:10 +0900)]
Prepare 3.4.18.
Bo Peng [Wed, 25 Apr 2018 14:31:12 +0000 (23:31 +0900)]
Fix compile error.
Tatsuo Ishii [Wed, 18 Apr 2018 01:03:37 +0000 (10:03 +0900)]
Prevent pcp_recovery_node from recovering "unused" status node.
Previously it was possible to try to recover a node without
configuration data, which leads to a variety of problems. See discussion:
https://www.pgpool.net/pipermail/pgpool-general/2018-March/006021.html
for more details.
Also I fixed the pgpool_recovery function so that it quotes an empty
string argument with double quotes. Without this, the argument is
treated as if it does not exist, which was the source of the complaint
from the user.
Bo Peng [Tue, 17 Apr 2018 08:51:17 +0000 (17:51 +0900)]
Prepare 3.4.17.
Tatsuo Ishii [Mon, 9 Apr 2018 05:45:37 +0000 (14:45 +0900)]
Add new regression test for node 0 is down.
test case 1: node 0 is already down before pgpool starts.
test case 2: node 0 goes down after pgpool starts.
test case 3: node 0 goes down and DISALLOW_TO_FAILOVER flag is set after pgpool starts.
Tatsuo Ishii [Sun, 8 Apr 2018 10:18:36 +0000 (19:18 +0900)]
Make calls to to_regclass fully schema qualified.
This is always the recommended way.
Bo Peng [Fri, 23 Mar 2018 09:21:34 +0000 (18:21 +0900)]
Change test script to allow test failure.
Bo Peng [Tue, 20 Mar 2018 08:42:46 +0000 (17:42 +0900)]
Add wait_for_failover_done function for test.
Bo Peng [Tue, 20 Mar 2018 08:31:31 +0000 (17:31 +0900)]
Improve the test script 003.failover.
Bo Peng [Wed, 14 Mar 2018 08:27:36 +0000 (17:27 +0900)]
Change the release version.
Bo Peng [Wed, 14 Mar 2018 06:36:34 +0000 (15:36 +0900)]
Prepare 3.4.16-2.
Bo Peng [Wed, 14 Mar 2018 05:01:52 +0000 (14:01 +0900)]
Fix some test errors.
Add "wait_for_pgpool_startup" to wait for Pgpool-II starting.
Tatsuo Ishii [Tue, 27 Feb 2018 04:22:15 +0000 (13:22 +0900)]
Allow pgpool_switch_xlog to support PostgreSQL 10.
In PostgreSQL 10 and later, pgpool_switch_xlog, which is used in the
second stage of recovery, fails due to function name changes.
Tatsuo Ishii [Tue, 27 Feb 2018 04:29:38 +0000 (13:29 +0900)]
Fix failure in replication mode.
If .psqlrc exists, pgpool_setup for replication mode fails because psql
produces messages like "Pager usage is off." which in turn confuse
a command after a pipe. The fix is to add the -q option to psql.
Bo Peng [Tue, 13 Feb 2018 05:20:30 +0000 (14:20 +0900)]
Prepare 3.4.16.
Bo Peng [Mon, 12 Feb 2018 14:56:52 +0000 (23:56 +0900)]
Fix typos.
Tatsuo Ishii [Mon, 29 Jan 2018 04:53:18 +0000 (13:53 +0900)]
Set TCP_NODELAY and non blocking to frontend socket.
TCP_NODELAY is employed by PostgreSQL, so we do the same.
The listen fd is set to non-blocking. Make sure the accepted fd is
also set to non-blocking.
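As an illustration of the two socket settings named above, here is a minimal sketch using the standard POSIX calls. The helper name set_nodelay_nonblocking() is hypothetical and is not the function pgpool uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP_NODELAY and O_NONBLOCK on a TCP socket.
 * Returns 0 on success, -1 if any step fails.
 */
int
set_nodelay_nonblocking(int fd)
{
    int on = 1;
    int flags;

    assert(fd >= 0);

    /* Disable Nagle's algorithm so small packets are sent immediately. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;

    /* Add O_NONBLOCK to the flags the descriptor already has. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    return 0;
}
```

A caller would typically apply this to the descriptor returned by accept(), since flags set on a listening socket are not guaranteed to carry over on every platform.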
Tatsuo Ishii [Tue, 23 Jan 2018 23:01:22 +0000 (08:01 +0900)]
Fix segfault when %a is in log_line_prefix and debug message is on.
log_line_prefix() gets called to create a log line prefix string. If
"%a" is specified in "log_line_prefix" parameter, log_line_prefix()
calls MASTER_CONNECTION macro, which calls
pool_virtual_master_db_node_id(), which calls ereport(), which calls
log_line_prefix() if debug message is on. This leads to an infinite
recursion and a segfault. The fix is to call the MASTER_NODE_ID macro
instead of the MASTER_CONNECTION macro.
Per bug 376.
Bo Peng [Fri, 19 Jan 2018 05:00:27 +0000 (14:00 +0900)]
Change systemd service file to use STOP_OPTS=" -m fast".
Bo Peng [Fri, 19 Jan 2018 04:58:44 +0000 (13:58 +0900)]
Change pgpool_setup to add restore_command in recovery.conf.
Tatsuo Ishii [Thu, 18 Jan 2018 14:34:38 +0000 (23:34 +0900)]
Revert "Fix queries hanging in parse_before_bind with extended protocol and replication + load-balancing."
This reverts commit 0ac97a9c9600d1ec8c4cbd0bf30191beeb28ace9.
Tatsuo Ishii [Thu, 18 Jan 2018 13:14:38 +0000 (22:14 +0900)]
Fix queries hanging in parse_before_bind with extended protocol and replication + load-balancing.
In case the client sends a BIND message for a query
that has not yet been parsed by the executing node,
the PARSE will be executed before attempting to BIND
the parameters.
However, during the execution of the PARSE, the session
context is not set to in_progress, which leads to wrong
backend validity tests in read_kind_from_backend which
in turn makes the process wait on a backend which is not
going to send anything.
Fixes bug #377.
Problem analysis and fix by Ancoron Luciferis and me.
Tatsuo Ishii [Thu, 18 Jan 2018 01:37:23 +0000 (10:37 +0900)]
Fix queries hanging in parse_before_bind with extended protocol and replication + load-balancing.
In case the client sends a BIND message for a query
that has not yet been parsed by the executing node,
the PARSE will be executed before attempting to BIND
the parameters.
However, during the execution of the PARSE, the session
context is not set to in_progress, which leads to wrong
backend validity tests in read_kind_from_backend which
in turn makes the process wait on a backend which is not
going to send anything.
Fixes bug #377.
Bug report, analysis and fix by Ancoron Luciferis.
Bo Peng [Mon, 8 Jan 2018 07:20:35 +0000 (16:20 +0900)]
Prepare 3.4.15.
Tatsuo Ishii [Fri, 22 Dec 2017 06:20:13 +0000 (15:20 +0900)]
Replace /bin/ed with /bin/sed.
This change requires fewer packages in order to install pgpool_setup,
because /bin/sed is included in most distributions' base packages,
while /bin/ed is not.
Bo Peng [Thu, 21 Dec 2017 05:42:39 +0000 (14:42 +0900)]
Change the pgpool.service and sysconfig files to output Pgpool-II log.
Remove "Type=forking" and add OPTS=" -n" to
run Pgpool-II in non-daemon mode, because we need to redirect logs.
Use the "journalctl" command to see the Pgpool-II systemd log.
Bo Peng [Mon, 18 Dec 2017 02:31:59 +0000 (11:31 +0900)]
Fix some test script error of 002.native_replication.
Bo Peng [Sun, 17 Dec 2017 14:30:15 +0000 (23:30 +0900)]
Fix timestamp data inconsistency by replication mode.
Since PostgreSQL 10, column default values such as 'CURRENT_DATE' have
changed, so Pgpool-II should also rewrite timestamps using the newly
added default values instead of "'now'::text::date".
Tatsuo Ishii [Sun, 10 Dec 2017 12:04:15 +0000 (21:04 +0900)]
Fix returning transaction state when "ready for query" message received.
We return the primary or master node's state from the ready for query
message to the frontend. In most cases this is fine. However, if a node
other than the primary or master node returns an error state (this
could happen if the load balance node is not the primary or master node
and the query is an erroneous SELECT), that state should be returned to
the frontend, because the frontend has already received an error.
How much this affects applications is not clear. However, this
behavior has been there for a long time, so applications probably do
not care about the state returned by the ready for query message.
Tatsuo Ishii [Wed, 15 Nov 2017 23:12:13 +0000 (08:12 +0900)]
Fix pgpool start message printed multiple times.
When an exception occurs in the main loop, longjmp() gets called and
the variable "first" is restored to its initial value. This makes the
pgpool start message get printed multiple times. This is harmless but
confusing. To fix that, add the "volatile" qualifier so that the
variable is kept on the stack, rather than in a register.
Fix suggested by Muhammad Usama.
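The effect of the qualifier can be shown with a small stand-alone sketch. This is not the pgpool main loop: the function count_start_messages() and its counters are hypothetical, and the "exception" is simulated by calling longjmp() directly.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover_point;

/*
 * Simulates a main loop that prints a start message once and then hits
 * "failures" exceptions, each of which jumps back to the recovery point.
 * Returns how many times the start message would have been printed.
 */
int
count_start_messages(int failures)
{
    /*
     * Locals modified between setjmp() and longjmp() must be volatile,
     * otherwise their values are indeterminate after the jump.
     */
    volatile int first = 1;
    volatile int printed = 0;
    volatile int remaining = failures;

    assert(failures >= 0);

    (void) setjmp(recover_point);   /* control comes back here after a jump */

    if (first)
    {
        printed++;                  /* stands in for printing the message */
        first = 0;
    }

    if (remaining > 0)
    {
        remaining--;
        longjmp(recover_point, 1);  /* simulated exception */
    }

    return printed;
}
```

Without the qualifier a compiler is free to keep "first" in a register that longjmp() rolls back, which matches the repeated message described above.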
pengbo [Wed, 1 Nov 2017 00:39:17 +0000 (09:39 +0900)]
Add pgpool_rhel7.sysconfig and pgpool_rhel6.sysconfig files to distribution.
pengbo [Wed, 1 Nov 2017 00:06:25 +0000 (09:06 +0900)]
Prepare 3.4.14.
pengbo [Tue, 31 Oct 2017 03:26:15 +0000 (12:26 +0900)]
Add different pgpool.sysconfig file for RHEL6 and RHEL7.
In RHEL6, the "-n" option is needed to redirect log.
Reported in bug 343.
Tatsuo Ishii [Sun, 8 Oct 2017 06:57:08 +0000 (15:57 +0900)]
Fix some compiler warnings.
Tatsuo Ishii [Sun, 8 Oct 2017 01:29:29 +0000 (10:29 +0900)]
Fix finding primary node is not working.
It was reported on the local mailing list [pgpool-general-jp: 1462]
that finding the primary node was not executed. The bug was introduced
in commit e4ce880bd36b8f249bf693c086a1313148f3449a between 3.4.11 and
3.4.12. That commit requires Req_info->primary_node_id to be set to a
negative value, but I missed that part. Fix the problem by initializing
Req_info->primary_node_id.
Note that the bug was not in 3.5 or above because commit
bda946e718fe6f3605eb7e82ada8754bd84a279c had already done so.
Unfortunately I overlooked that the commit was not in 3.4 stable.
Tatsuo Ishii [Fri, 22 Sep 2017 02:50:28 +0000 (11:50 +0900)]
Fix bug mistakenly overriding global backend status right after failover.
In [pgpool-general: 5728] it is reported that even if failover
disconnects a backend, the status is changed from "down" to "up" with
certain timing. After debugging I found that the backend status in
pgpool_status was changed to down, then changed again by the first
connection from a client after the failover. This happened in
new_connection(), which is in charge of creating a new connection to
the backend. It checks the locally cached status of the backend and if
it's up, it tries to connect to the backend. In this particular case,
the failover is triggered by failover_if_affected_tuples_mismatch, so
the backend is actually alive and new_connection() succeeds in
establishing a connection to the disconnected backend. Then it
overrides the global status and the pgpool_status file.
The fix is to check whether the local backend status is obsolete. If
the global status does not agree with the local status, skip the
attempt to establish the connection.
In this report the user uses native replication mode, but I think a
similar situation can happen in other modes.
Tatsuo Ishii [Mon, 11 Sep 2017 06:11:28 +0000 (15:11 +0900)]
Deal with OpenSSL 1.1.
Patch by me and Usama.
pengbo [Mon, 4 Sep 2017 07:46:05 +0000 (16:46 +0900)]
Prepare 3.4.13.
Yugo Nagata [Thu, 24 Aug 2017 09:38:47 +0000 (18:38 +0900)]
Fix documentation of load-balancing
Tatsuo Ishii [Tue, 15 Aug 2017 01:25:35 +0000 (10:25 +0900)]
Fix ancient bug of pool_unread.
When realloc() is called in pool_unread(), it did not update the
buffer size. This could cause a variety of memory corruption and
unexpected data being read from the backend. The reason why we did not
find this earlier is probably that Pgpool-II only recently started to
use pool_unread() extensively.
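The class of bug is easy to show with a simplified push-back buffer. The sketch below is not pool_unread() itself: the PendingBuffer type and pending_unread() are hypothetical, and the layout (pending bytes always start at offset 0) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char   *buf;    /* heap buffer holding pending (unread) bytes */
    size_t  bufsz;  /* allocated size of buf */
    size_t  len;    /* number of pending bytes, stored at buf[0..len) */
} PendingBuffer;

/*
 * Push n bytes back in front of the pending data.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
int
pending_unread(PendingBuffer *pb, const void *data, size_t n)
{
    size_t needed;

    assert(pb != NULL && data != NULL && n > 0);

    needed = pb->len + n;
    if (needed > pb->bufsz)
    {
        char *p = realloc(pb->buf, needed);

        if (p == NULL)
            return -1;
        pb->buf = p;
        pb->bufsz = needed;     /* record the new size after growing */
    }

    /* Shift the existing pending bytes and put the new ones in front. */
    memmove(pb->buf + n, pb->buf, pb->len);
    memcpy(pb->buf, data, n);
    pb->len = needed;
    return 0;
}
```

If the bufsz assignment is left out, later calls compare against a stale size, so the recorded capacity no longer matches the real allocation.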
Tatsuo Ishii [Thu, 3 Aug 2017 03:41:40 +0000 (12:41 +0900)]
Fix Java program in 005.regression test.
The program forgot to issue a commit and always ended up with ABORT.
This does not affect the test result itself, but it would be better to
finish the test with a commit rather than ABORT.
Tatsuo Ishii [Thu, 3 Aug 2017 02:52:24 +0000 (11:52 +0900)]
Fix bug #303.
When failover is triggered by the worker process, it is possible that
the wrong DB node fails over. This is because the db_node_id member in
the POOL_CONNECTION structure is not initialized in that process (in
the child process the member is properly initialized). To solve the
problem, add a new function pool_set_db_node_id() to set the structure
member variable and call it inside
make_persistent_db_connection(). For this purpose a new
parameter "db_node_id" is also added to make_persistent_db_connection
and friends.
Tatsuo Ishii [Mon, 31 Jul 2017 08:29:27 +0000 (17:29 +0900)]
Fix starting unnecessary transaction when SET command is issued.
In streaming replication mode, there's no point in starting an
internal transaction when a SET command is issued. The existing
behavior is harmless but wastes CPU cycles.
Muhammad Usama [Thu, 27 Jul 2017 14:47:48 +0000 (19:47 +0500)]
Fix for [pgpool-general: 5621] pgpool 3.6.4 failover
The failover() function should be executed with the health check alarm disabled.
The reason is that finding the primary node and the health check use the same
function to make a connection to the backends, and the backend connect function
has built-in logic to bail out with a timeout error when the health check alarm
occurs. The problem is that if failover() is executed while the health check
alarm is still running, there is a strong possibility that the find primary node
function may find the health check timer already expired and exit without
locating the new primary node.
The relevant part is back-ported from the 3.6 branch.
Yugo Nagata [Thu, 20 Jul 2017 05:55:31 +0000 (14:55 +0900)]
Allow make dist to include pgpool.service
pengbo [Mon, 10 Jul 2017 07:40:10 +0000 (16:40 +0900)]
Prepare 3.4.12.
Muhammad Usama [Thu, 29 Jun 2017 15:30:59 +0000 (20:30 +0500)]
Importing the latest changes in the MemoryManager API from PostgreSQL code.
Pgpool-II has borrowed the MemoryManager from the PostgreSQL source, and
recently there have been some performance updates made to PostgreSQL's
memory manager, from which Pgpool-II can also benefit.
This need was identified by the reporter of issue: "315: High CPU usage when
commiting large transactions and using in (shared) memory cache"
Muhammad Usama [Wed, 21 Jun 2017 12:54:04 +0000 (17:54 +0500)]
Fixing: [pgpool-hackers: 2390] Problems with the relative paths in daemon mode
Pgpool-II does all path related calculations based on the CWD
(current working directory), which only works as expected and intended
as long as the CWD does not change.
But in daemon mode the first thing Pgpool-II does after becoming the daemon
process is change its CWD to the system root ("/"), and after that all
relative paths become invalid.
This means that in daemon mode, if the pgpool.conf file is specified as an
argument using a relative path, Pgpool-II will not be able to find pool_passwd
or other required files whose location depends on the location of the
pgpool.conf file, and will eventually fail to start, or even worse may
read/write some wrong file.
The solution is to convert the relative path of pgpool.conf and
all file paths provided in the argument list to Pgpool-II into absolute
paths at startup, and later use those absolute paths for all path related
calculations.
Apart from using absolute paths for the pgpool.conf, pcp.conf and pool_hba.conf
files, the commit also modifies the behaviour of the pid_file_name configuration
parameter: when a relative path is used for this setting in pgpool.conf,
that path is considered relative to the pgpool.conf path,
instead of the current working directory.
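The conversion described above can be sketched as a small helper. This is a simplified illustration, not the code from the commit: make_absolute_path() is a hypothetical name, the base directory is passed in by the caller (for example the CWD captured before daemonizing, or the directory of pgpool.conf for pid_file_name), and no "." or ".." normalization is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write an absolute version of "path" into out[].
 * An already absolute path is copied unchanged; a relative one is
 * resolved against base_dir.
 * Returns 0 on success, -1 if the result does not fit in outsz bytes.
 */
int
make_absolute_path(const char *base_dir, const char *path,
                   char *out, size_t outsz)
{
    int n;

    assert(base_dir != NULL && base_dir[0] == '/');
    assert(path != NULL && out != NULL && outsz > 0);

    if (path[0] == '/')
        n = snprintf(out, outsz, "%s", path);
    else
        n = snprintf(out, outsz, "%s/%s", base_dir, path);

    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

The key point is that the resolution happens once at startup, while the original working directory is still known.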
Tatsuo Ishii [Mon, 19 Jun 2017 07:40:06 +0000 (16:40 +0900)]
Adjust function name change in PostgreSQL 10 dev head.
pg_current_wal_location -> pg_current_wal_lsn
pg_last_wal_replay_location -> pg_last_wal_replay_lsn
Tatsuo Ishii [Wed, 31 May 2017 01:45:55 +0000 (10:45 +0900)]
Fix query cache hang when used by node.js.
node.js sends a query in following pattern:
Parse
Bind
Describe
Execute
Flush
Sync
Notice the "Flush" message. This is an unnecessary message and Pgpool-II
was not prepared for it. Since Pgpool-II assumed in
pool_fetch_from_memory_cache() that the message following Execute is
"Sync", it actually read the "Flush" message and forwarded it to the
backend, then discarded the subsequent "Ready for query" message, which
was actually a "Sync" message. That results in no "ready for query"
message from the backend.
The fix is not to make any assumption about the messages after Execute,
and instead to return to the message processing loop. This way,
whatever messages come after Execute are processed properly.
Following is the test data for pgproto.
'Q' "DROP TABLE IF EXISTS pgproto_test1"
'Y'
'Q' "CREATE TABLE pgproto_test1(i INT)"
'Y'
'Q' "INSERT INTO pgproto_test1 VALUES(1)"
'Y'
'P' "S2" "SELECT 1 FROM pgproto_test1" 0
'B' "" "S2" 0 0 0
'D' 'S' "S2"
'E' "" 0
'H'
'C' 'S' "S2"
'S'
'Y'
'P' "S2" "SELECT 1 FROM pgproto_test1" 0
'B' "" "S2" 0 0 0
'D' 'S' "S2"
'E' "" 0
'H'
'C' 'S' "S2"
'S'
'Y'
'X'
Discussion: http://www.pgpool.net/pipermail/pgpool-general/2017-May/005569.html
Tatsuo Ishii [Sun, 28 May 2017 08:38:55 +0000 (17:38 +0900)]
Deal with PostgreSQL 10 in streaming replication delay checking.
In PostgreSQL 10 and later, some built-in function names which are
used in streaming replication delay checking have changed. To deal with
this, get server_version_num from the backend to choose which functions
to call, and cache them. Using the 'S' message to extract the server
version string while connecting to the backend could be an idea, but
unfortunately the connecting function used in streaming replication
delay checking does not handle the 'S' parameter. Also, converting the
server version string to server_version_num can be backend
dependent, so extracting server_version_num directly is safer.
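Selecting a function name from server_version_num could look like the sketch below. The helper names are hypothetical and this is not the pgpool implementation. The SQL function names are the ones PostgreSQL provides before and after version 10, and 100000 is the server_version_num of PostgreSQL 10.0.

```c
#include <assert.h>
#include <string.h>

/* Function reporting the current WAL write location on a primary. */
const char *
current_wal_location_func(int server_version_num)
{
    assert(server_version_num > 0);

    return server_version_num >= 100000
        ? "pg_current_wal_lsn"
        : "pg_current_xlog_location";
}

/* Function reporting the last replayed WAL location on a standby. */
const char *
last_replay_location_func(int server_version_num)
{
    assert(server_version_num > 0);

    return server_version_num >= 100000
        ? "pg_last_wal_replay_lsn"
        : "pg_last_xlog_replay_location";
}
```

Comparing the integer avoids parsing version strings such as "10beta1" or "9.6.3".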
Tatsuo Ishii [Sat, 27 May 2017 11:28:40 +0000 (20:28 +0900)]
Fix query cache memory leak.
Clearing cache buffers in case of no oid queries (like BEGIN,
CHECKPOINT, VACUUM, etc) should have been done, but it was not.
Patch from Dang Minh Huong.
Tatsuo Ishii [Fri, 19 May 2017 00:16:36 +0000 (09:16 +0900)]
Revert "Use volatile pointer for Req_info variable."
This reverts commit 5a3d3964e67e1e1e3826d17f1e56f518c701008d.
Tatsuo Ishii [Tue, 16 May 2017 10:59:56 +0000 (19:59 +0900)]
Use volatile pointer for Req_info variable.
Since Req_info points to a shared memory area, using a volatile pointer
makes it possible to fetch fresh values.
Tatsuo Ishii [Tue, 9 May 2017 23:30:17 +0000 (08:30 +0900)]
Fix corner case bug in Pgpool-II starting up.
It is possible that a failover request is accepted before primary node
is searched. This leads Pgpool-II to a strange state: there's no
primary node if the failed node was a primary node (even if new
primary node exists as a result of promotion of existing standby).
See [pgpool-hackers: 2321] for more details.
pengbo [Thu, 27 Apr 2017 10:00:59 +0000 (19:00 +0900)]
Prepare 3.4.11
Muhammad Usama [Wed, 12 Apr 2017 14:02:03 +0000 (19:02 +0500)]
Fix for 0000299: Errors on the reloading of configuration
Pgpool-II should load the hba_file after evaluating all the
command line options at startup.
It is wrong to load the hba file before evaluating all the command line
arguments, because when Pgpool-II is invoked with the reload or stop options,
all it needs to do is signal the running Pgpool-II and exit. So loading the hba
file for Pgpool reload/stop is not only unnecessary, it also emits the
misleading WARNING message "failed while loading hba configuration" when an
alternative path for the hba file was used.
Muhammad Usama [Tue, 4 Apr 2017 11:11:15 +0000 (16:11 +0500)]
Fix coverity reported issues of buffer overrun.
Per Coverity CID#1332216, CID#1332218, CID#1373253, CID#1373254 and
CID#1373255.
Muhammad Usama [Fri, 31 Mar 2017 15:15:22 +0000 (20:15 +0500)]
Fix for [pgpool-general: 5396] pam ldap failure
Do not use palloc and friends to allocate memory for pam_response in the
PAM conversation function, since that memory is freed by the PAM library, which
knows nothing about our MemoryManager, and this can cause a segmentation fault.
Muhammad Usama [Thu, 23 Mar 2017 21:17:36 +0000 (02:17 +0500)]
Fix for 0000296: PGPool v3.6.2 terminated by systemd because the service Type
has been set to 'forking'
Remove the "-n" value assigned to the OPTS variable in pgpool.sysconfig.
The problem was that a systemd service with Type=forking expects the parent
process to exit after startup is complete. Because the -n command line option
disables daemon mode, systemd keeps waiting for Pgpool-II's parent process to
exit after startup, which never happens, and eventually systemd terminates
Pgpool-II after a timeout.
As part of this commit I have also added a new variable STOP_OPTS, which is
passed to ExecStop and can be used to pass extra command line options to
the Pgpool-II stop command.
pengbo [Fri, 17 Mar 2017 01:24:09 +0000 (10:24 +0900)]
Add "Wants=network.target" in pgpool.service file.
per bug294.
pengbo [Thu, 16 Mar 2017 23:49:49 +0000 (08:49 +0900)]
Prepare 3.4.10
Yugo Nagata [Thu, 9 Mar 2017 02:34:12 +0000 (11:34 +0900)]
Fix pcp_promote_node bug that fails promoting node 0
The master node could not be promoted by pcp_promote_node, which failed
with the following error:
FATAL: invalid pgpool mode for process recovery request
DETAIL: specified node is already primary node, can't promote node id 0
In streaming replication mode, there are cases where Pgpool-II
regards the status of the primary node as "standby" for some reason,
for example, when pg_ctl promote is executed manually while
Pgpool-II is running, in which case it seems to Pgpool-II
that the primary node doesn't exist.
This status mismatch should be fixed by pcp_promote_node, but when the node
is the master node (the first alive node), it fails as mentioned above.
The reason is as follows. Before changing the status, pcp_promote_node
checks whether the specified node is already primary by comparing the
node id with PRIMARY_NODE_ID. However, if the primary doesn't exist from
Pgpool-II's view, PRIMARY_NODE_ID is set to 0, which is the same as MASTER_NODE_ID.
Hence, when the master node is specified to be promoted, pcp_promote_node
wrongly concludes that this node is already primary and doesn't have to be
promoted, and it exits with the error.
To fix this, pcp_promote_node should check the node id by using
REAL_PRIMARY_NODE_ID, which is set to -1 when the primary doesn't exist,
rather than PRIMARY_NODE_ID.
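The difference between the two checks can be modeled in a few lines. This is a simplified model, not the pgpool macros: the functions below are hypothetical stand-ins, with primary_node_id() imitating the described fallback of PRIMARY_NODE_ID to the master node id when no primary is known.

```c
#include <assert.h>

#define NO_PRIMARY (-1)     /* what REAL_PRIMARY_NODE_ID reports when no primary exists */

/* Imitates PRIMARY_NODE_ID: falls back to the master node when no primary is known. */
int
primary_node_id(int real_primary, int master_node)
{
    return real_primary >= 0 ? real_primary : master_node;
}

/* Old check: compares against the value with the fallback applied. */
int
already_primary_buggy(int node, int real_primary, int master_node)
{
    return node == primary_node_id(real_primary, master_node);
}

/* New check: compares against the real primary id, which may be -1. */
int
already_primary_fixed(int node, int real_primary)
{
    assert(node >= 0);

    return node == real_primary;
}
```

With no primary known and node 0 as the master, the old check claims node 0 is already primary, while the new check lets the promotion proceed.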
pengbo [Tue, 28 Feb 2017 05:44:38 +0000 (14:44 +0900)]
Add the latest release note link to README file.
Tatsuo Ishii [Thu, 23 Feb 2017 07:05:11 +0000 (16:05 +0900)]
Fix to release shared memory segments when Pgpool-II exits.
Per bug272. From the bug report.
"This cause the creation of a lot of segments if you start and stop
pgpool continuously (and in a testing fase it could be normal). Lot of
segments bring to reach the shmem OS configuration limit and than
suddenly stops (pgpool) working."
Muhammad Usama [Mon, 6 Feb 2017 14:41:31 +0000 (19:41 +0500)]
Adding the missing ExecStop and ExecReload commands to the systemd
service configuration file.
The patch was contributed by supp_k and enhanced by me.
Muhammad Usama [Thu, 26 Jan 2017 20:37:29 +0000 (01:37 +0500)]
Fixing 0000280: stack smashing detected
It was a buffer overflow in the wd_get_cmd function.
Tatsuo Ishii [Tue, 10 Jan 2017 23:24:32 +0000 (08:24 +0900)]
Remove elog/ereport calls from signal handlers.
elog/ereport calls malloc(), which is not safe to call inside
signal handlers, per discussion in [pgpool-hackers: 1950]. I ifdef'd
them out, rather than simply removing them, in the hope that we someday
find a better solution which makes calling the functions inside signal
handlers safe.
Note that I did not touch exit_handler() of pgpool_main.c because
removing elog/ereport from it loses informative messages like
"received smart shutdown request". The Pgpool-II main process does not
use malloc() heavily, so the risk is minimal, I guess.
pengbo [Tue, 10 Jan 2017 07:59:37 +0000 (16:59 +0900)]
Fix bug where creating an INET domain socket failed on FreeBSD if listen_addresses = '*'.
per bug202.
Muhammad Usama [Wed, 4 Jan 2017 13:23:33 +0000 (18:23 +0500)]
Fix for 0000249: watchdog sometimes fails de-escalation.
The logic in the pgpool-II main process exit_handler and terminate_all_childrens
did not make sure that the pgpool-II main process exits only after all its
children have exited. The problem occurs when the main process shuts
itself down before the watchdog and de-escalation child processes.
The solution is to use the waitpid() system call without the WNOHANG option.
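A blocking wait loop of the kind described can be sketched as follows. This is an illustration only, not the code in exit_handler(): both function names are hypothetical, and the children here simply exit at once.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit immediately; returns how many were started. */
int
spawn_short_lived_children(int n)
{
    int i;
    int started = 0;

    assert(n >= 0);

    for (i = 0; i < n; i++)
    {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);           /* child: nothing to do */
        if (pid > 0)
            started++;
    }
    return started;
}

/*
 * Block until every child has exited and been reaped.
 * Passing 0 instead of WNOHANG makes waitpid() wait rather than
 * return immediately when no child has exited yet.
 */
int
wait_for_all_children(void)
{
    int reaped = 0;

    for (;;)
    {
        pid_t pid = waitpid(-1, NULL, 0);

        if (pid > 0)
        {
            reaped++;
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;                  /* ECHILD: no children left */
    }
    return reaped;
}
```

With WNOHANG the loop could see "no child has exited yet" and fall through, letting the parent exit first.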
Yugo Nagata [Wed, 4 Jan 2017 05:20:24 +0000 (14:20 +0900)]
Fix connection_life_time broken by authentication_timeout
When authentication_timeout is enabled,
connection_life_time could never expire, because
alarm(0) is called when reading the start-up packet.
When only one connection pool is used, this
problem doesn't occur because the signal handler
for connection_life_time is always set at the end
of the session. However, if more than one connection
pool exists, the handler isn't set; only the time
to close the connection is calculated.
To fix it, when authentication_timeout is enabled,
save the signal handler for connection_life_time
and the remaining time, and restore the handler when
authentication_timeout is disabled.
Yugo Nagata [Wed, 28 Dec 2016 08:37:11 +0000 (17:37 +0900)]
Fix authentication timeout that can occur right after client connections
This is possible when connection_life_time is enabled.
The SIGALRM signal is used for both connection_life_time and
authentication_timeout. Usually, SIGALRM is for connection_life_time,
but when a new connection arrives, read_startup_packet() is called,
and the handler for authentication_timeout is set by pool_signal() and
alarm(authentication_timeout) is called in enable_authentication_timeout().
However, if connection_life_time expires **between pool_signal() and
alarm()**, authenticate_timeout() will be called when connection_life_time
expires instead of pool_backend_timer_handler().
To fix this, call alarm() before pool_signal() to prevent the signal
handler from being called at the wrong time.
pengbo [Mon, 26 Dec 2016 02:51:31 +0000 (11:51 +0900)]
Prepare 3.4.9
Muhammad Usama [Fri, 23 Dec 2016 15:41:55 +0000 (20:41 +0500)]
Tightening up the watchdog security
Now wd_authkey uses the HMAC SHA-256 hashing.
pengbo [Thu, 22 Dec 2016 02:10:36 +0000 (11:10 +0900)]
Add pgpool_adm extension.
Tatsuo Ishii [Tue, 20 Dec 2016 02:38:12 +0000 (11:38 +0900)]
Fix occasional segfault when query cache is enabled.
Per bug 263.
Tatsuo Ishii [Thu, 27 Oct 2016 08:36:28 +0000 (17:36 +0900)]
Adapt to the incompatibility with psql of PostgreSQL 9.6.
Since the -c option no longer implies -X in 9.6, regression fails if
.psqlrc is set up, for example when "\pset pager" is set.
Tatsuo Ishii [Thu, 27 Oct 2016 08:04:46 +0000 (17:04 +0900)]
Do not cancel a query when the query resulted in an error, except in native replication mode.
The cancel was intended to keep consistency, but there's no point in
doing so in modes other than native replication mode.
Tatsuo Ishii [Thu, 27 Oct 2016 04:24:56 +0000 (13:24 +0900)]
Remove obsoleted option "-c".
Also fix typo in the help message.
Tatsuo Ishii [Sat, 24 Sep 2016 09:24:33 +0000 (18:24 +0900)]
Fix broken pool_config.c in the previous commit.
Tatsuo Ishii [Sat, 24 Sep 2016 01:42:12 +0000 (10:42 +0900)]
Change the default value of search_primary_node_timeout from 10 to 300.
Prior default value 10 seconds is sometimes too short for a standby to
be promoted.
Per [pgpool-general: 5026].
Tatsuo Ishii [Wed, 21 Sep 2016 00:26:46 +0000 (09:26 +0900)]
Fix the case when all backends are down then 1 node attached.
When all backends are down, no connection is accepted. Then one
PostgreSQL comes up, and the node is attached using pcp_attach_node,
which finishes successfully. However, when a new connection arrives,
the connection is still refused because the pgpool child process looks
into the cached status, in which the recovered node is still in down
status if the mode is streaming replication mode (native replication
and other modes are fine). The solution is to force a restart of all
pgpool child processes if all nodes are down.
Per bug 248.
Tatsuo Ishii [Fri, 9 Sep 2016 02:24:54 +0000 (11:24 +0900)]
Do not use random() while generating MD5 salt.
random() should not be used in security related applications. To
replace random(), import PostmasterRandom() from PostgreSQL. Also
store current time at the start up of Pgpool-II main process for later
use.
Per Coverity CID 1362583.
Tatsuo Ishii [Fri, 9 Sep 2016 00:57:14 +0000 (09:57 +0900)]
Remove useless assignment.
Per Coverity CID 1362589.
Tatsuo Ishii [Mon, 5 Sep 2016 23:08:32 +0000 (08:08 +0900)]
Don't ignore sync message from frontend when query cache is enabled.
While returning a cached query result, the sync message sent from the
frontend is discarded. This is harmless because a "ready for query"
message is sent to the frontend afterward. The problem is that the
AccessShareLock held by previous parse message processing is not
released until a sync message is received by the backend. The fix is to
forward the sync message to the backend and discard the "ready for
query" message returned from the backend.
Per [pgpool-hackers: 1787].
pengbo [Tue, 30 Aug 2016 10:02:34 +0000 (19:02 +0900)]
Prepare 3.4.8
Muhammad Usama [Mon, 29 Aug 2016 14:32:31 +0000 (19:32 +0500)]
Fix for 237: Pgpool-II fails to start if listen_addresses is empty string
The socket descriptor array (fds[]) was not getting the array end marker
when TCP listen addresses are not used.
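The role of the end marker can be illustrated with a small sentinel-terminated array. This is a sketch under assumptions: the helper names are hypothetical, and the use of -1 as the marker is assumed here rather than taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

#define FD_LIST_END (-1)    /* assumed sentinel value for this sketch */

/*
 * Fill out[] with the UNIX domain socket fd followed by any INET fds,
 * then terminate the list with FD_LIST_END.
 * Returns the number of fds stored (not counting the marker), or -1 if
 * out[] is too small.
 */
int
build_listen_fds(int unix_fd, const int *inet_fds, size_t n_inet,
                 int *out, size_t out_cap)
{
    size_t i;
    size_t n = 0;

    assert(out != NULL);
    assert(n_inet == 0 || inet_fds != NULL);

    if (out_cap < n_inet + 2)
        return -1;

    out[n++] = unix_fd;
    for (i = 0; i < n_inet; i++)
        out[n++] = inet_fds[i];

    out[n] = FD_LIST_END;   /* written even when there are no INET fds */
    return (int) n;
}

/* Walk the list the way a consumer would, stopping at the marker. */
int
count_fds(const int *fds)
{
    int n = 0;

    assert(fds != NULL);

    while (fds[n] != FD_LIST_END)
        n++;
    return n;
}
```

If the marker is only written on the INET code path, a consumer like count_fds() runs past the end of the array when no TCP addresses are configured.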
Tatsuo Ishii [Sat, 27 Aug 2016 08:22:26 +0000 (17:22 +0900)]
Add comment to the document about connection_cache.
Mention that connections to template0, template1, postgres and
regression databases are not cached even if connection_cache is on.
Muhammad Usama [Mon, 22 Aug 2016 13:48:48 +0000 (18:48 +0500)]
Tighten up health check timer expired condition in pool_check_fd()
Check whether the signal was actually the health check timer expiring, to make
sure that we do not declare the timer expired because some other signal arrived
while waiting for health check data in pool_check_fd().
Muhammad Usama [Fri, 12 Aug 2016 12:07:10 +0000 (17:07 +0500)]
Fixing a typo in the log message
Tatsuo Ishii [Thu, 11 Aug 2016 09:35:19 +0000 (18:35 +0900)]
Unbreak version 2 protocol case.
Commit:
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=3cbce4eaf870e89fdd8f8a2ad8fab8d3d8aadf65
broke the version 2 protocol case. In the session initiation phase, the
"MAJOR" macro is used *before* the session context is created. In this
case the MAJOR macro returns PROTO_MAJOR_V3, which of course breaks v2
protocol negotiation. The fix is not to use the MAJOR macro here.
Bo Peng [Wed, 3 Aug 2016 02:31:32 +0000 (11:31 +0900)]
fix japanese doc typo
Muhammad Usama [Mon, 1 Aug 2016 14:22:41 +0000 (19:22 +0500)]
Fixing a typo in english doc
Tatsuo Ishii [Wed, 27 Jul 2016 05:28:28 +0000 (14:28 +0900)]
Replace "MAJOR" macro to prevent occasional failure.
It is reported that the macro could cause a segfault
[http://www.pgpool.net/mantisbt/view.php?id=225]. The macro calls
pool_virtual_master_db_node_id() and then accesses
backend->slots[id]->con using the node id returned. In rare cases, it
could point to 0 (when the DB node is not connected), and
accessing con->major then causes a segfault.
Since the intention of the macro is to obtain the protocol major
number, it is a little bit pointless to keep this information in the
data for each DB node, because the number should be identical among DB
nodes. To fix this, we now keep the protocol major and minor version
numbers in the session context, and they are set when pgpool-II
connects to the backend. Setter and getter functions are added. The
macro now just returns the stored data by using the getter function
(this will save a few CPU cycles).
Muhammad Usama [Thu, 14 Jul 2016 13:50:51 +0000 (18:50 +0500)]
Fix for [pgpool-hackers: 1501] kind does not match error
pgpool throws the ".. kind does not match.." error message when the attached
backend nodes do not all return the same response to the query. Although this
error message is a symptom of a backend node sync issue in most cases, the case
where the message kinds differ because one of the backends returned a notice
response while the other returned some other kind should not be considered an
error.
Consider the scenario where pgpool is connected to multiple backend nodes and
is expecting to receive the "[C] command complete" message from all nodes.
But while processing the query, one of the backends also produced an extra warning message.
... WARNING: database "testdb" must be vacuum within 11000000 transaction
Although the query was eventually successful on all backends, as soon
as pgpool reads this WARNING message from one backend it will throw an error
ERROR: read_kind_from_backend: 1 th kind N does not match with master or majority connection kind C
But if pgpool had kept reading from the backend, it would also have got
the expected command complete message after that warning notice.
The solution for this is to keep forwarding the notice/log messages to the
frontend while reading the backend response in the read_kind_from_backend()
function until the ERROR or the expected message is received.
There is one slight drawback to this approach: if the backends are
configured to forward all log messages to the client application, then client
applications will get multiple log messages for queries that are sent to
more than one node by pgpool.
test=# begin;
LOG: statement: begin;
LOG: statement: begin;
BEGIN
But since it is a very rare scenario and just a minor inconvenience, it's not
holding us back from committing this solution until a better one arrives.
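The "keep reading past notices" idea can be sketched over an in-memory list of message kinds. This is a simplification, not read_kind_from_backend(): the function below is hypothetical, it works on a byte array instead of a socket, and "forwarding" is reduced to counting. The kind bytes follow the PostgreSQL protocol ('N' notice, 'C' command complete, 'E' error).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan the message kinds received from one backend. Notice messages
 * ('N') are treated as forwarded to the frontend and skipped.
 * Returns the first non-notice kind, or 0 if the input ends first.
 * *forwarded receives the number of notices skipped.
 */
char
read_kind_skipping_notices(const char *kinds, size_t n, size_t *forwarded)
{
    size_t i;

    assert(kinds != NULL && forwarded != NULL);

    *forwarded = 0;
    for (i = 0; i < n; i++)
    {
        if (kinds[i] == 'N')
        {
            (*forwarded)++;     /* stands in for sending it to the frontend */
            continue;
        }
        return kinds[i];        /* 'E', or the kind the caller expects */
    }
    return 0;
}
```

The kind returned this way is what gets compared across backends, so an extra WARNING on one node no longer produces a mismatch.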
Muhammad Usama [Thu, 14 Jul 2016 12:33:43 +0000 (17:33 +0500)]
Fix handling of pcp_listen_addresses config parameter.
The config parameter was added by Ishii-San's patch
d87e2706b88558321b2ad287374149d5ba15c3af, but a few changes were
mistakenly removed by commit cdb49d3b783c03a7394e191e86763f364c7c6ce6.