Muhammad Usama [Mon, 4 Nov 2013 11:00:56 +0000 (16:00 +0500)]
This is the first of the series of commits to import PostgreSQL's exception and memory manager in pgpool.
This checkin includes the following items.
1-- Exception manager API of Postgres is added to pgpool. The API consists of the elog.c and elog.h files.
Since this API is very extensive and is designed for PostgreSQL, I have modified it a little to fit it properly into pgpool;
most of the modifications remove code which is not required for pgpool.
2-- Added on_proc_exit callback mechanism of Postgres to facilitate cleanup at exit time.
3-- Added PostgreSQL's memory manager (palloc API). This includes the client side palloc functions placed in 'src/tools' directory (fe_memutils)
4-- Removed the existing memory manager which was very minimalistic and was not integrated in all parts of the code.
5-- The checkin also refactors some portions of the code to make it more readable at first glance. That includes:
-- Dividing the main.c file into two files, main.c and pgpool_main.c. Now the main.c file only contains the code
related to early initialisations of pgpool and parsing of the command line options.
The actual logic of the pgpool main process is moved to the new pgpool_main.c file.
-- Breaking up some large functions in child.c into smaller functions.
-- Rewriting pgpool's main loop logic to make the code more readable.
Remaining TODOs on this front.
1-- The current checkin only integrates the memory and exception manager in the main process and the connection creation
segment of the pgpool child process. The next TODO is to integrate it in the pcp and worker child processes.
2-- Integration of the newly added API into the query processor logic in child processes.
3-- The elog.c and elog.h files need some cleanups and changes (to remove unwanted functions and data members of the ErrorData structure).
Tatsuo Ishii [Mon, 11 Nov 2013 22:59:36 +0000 (07:59 +0900)]
Fix that the script forgets to allow public access to pgpool_catalog.
The bug prevents inserting data into user tables if pgpool_catalog is
created in native replication mode. The bug was there from day 1. I
wonder why nobody noticed until today. Per [pgpool-general-jp: 1229].
Tatsuo Ishii [Mon, 11 Nov 2013 11:48:12 +0000 (20:48 +0900)]
Do not set/unset frontend connection info for dead backend.
Per bug #82.
Tatsuo Ishii [Wed, 6 Nov 2013 11:37:40 +0000 (20:37 +0900)]
Fix range check bug of MAX_NUM_BACKENDS in corner case.
MAX_NUM_BACKENDS is the allowed max number of DB nodes (128, at this
point). In reality, probably no one ever tried more than 128 DB nodes
and that's the reason why nobody noticed.
Per Coverity report "1111429, 1111430 and 1111431 Out-of-bounds write".
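The kind of corner case involved can be sketched as follows. This is only an illustration of a typical off-by-one range check, not the code touched by the commit; the `node_status` table and the `set_node_status` helper are hypothetical names introduced here.

```c
#include <stdbool.h>

#define MAX_NUM_BACKENDS 128   /* value quoted in the commit message */

/* Hypothetical per-node table, used only for this illustration. */
static int node_status[MAX_NUM_BACKENDS];

/*
 * Store a status for node_id. Valid indexes are 0 .. MAX_NUM_BACKENDS - 1,
 * so the check must use >=. A check written with > would let
 * node_id == MAX_NUM_BACKENDS write one element past the array.
 */
bool set_node_status(int node_id, int status)
{
    if (node_id < 0 || node_id >= MAX_NUM_BACKENDS)
        return false;
    node_status[node_id] = status;
    return true;
}
```

The boundary value itself (128 here) is the case that only shows up when someone configures the maximum number of nodes.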
Tatsuo Ishii [Wed, 6 Nov 2013 11:10:24 +0000 (20:10 +0900)]
Fix target node selection logic when "DEALLOCATE portal|statement".
When "DEALLOCATE portal|statement" is used and the last prepared statement
or portal is not found, the target node selection map is not
set. Probably this is not actually harmful because a missing prepared
statement or portal is an error case. The bug has been there since day 0.
Per Coverity report "1111491 Structurally dead code".
Yugo Nagata [Tue, 5 Nov 2013 01:43:44 +0000 (10:43 +0900)]
Fix to put null character at end of ping result string used in watchdog
Tatsuo Ishii [Mon, 4 Nov 2013 12:51:04 +0000 (21:51 +0900)]
Fix load balance bug in replication mode.
When load_balance_mode = off, SELECTs that include writing functions
should be sent to all the DB nodes. Per [pgpool-general: 2221].
The bug was introduced in:
http://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=1ac45a28258074ea4d9d902aca016f970d31f311
(3.3.1).
Tatsuo Ishii [Sun, 3 Nov 2013 12:53:16 +0000 (21:53 +0900)]
Fix memory leak when do_query() fails in timestamp rewriting.
For this purpose free_select_result() is changed to accept a NULL
argument. Per Coverity report "1111454, 1111455 Resource leak".
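A NULL-tolerant free function of this kind can be sketched as below. The `SelectResult` structure and `free_result` name are simplified, hypothetical stand-ins; the real structure handled by free_select_result() in pgpool has different fields.

```c
#include <stdlib.h>

/* Hypothetical, simplified result type used only for this sketch. */
typedef struct {
    int    numrows;
    char **data;
} SelectResult;

/* Free a result. A NULL argument is accepted and ignored, like free(NULL). */
void free_result(SelectResult *res)
{
    int i;

    if (res == NULL)
        return;
    if (res->data != NULL) {
        for (i = 0; i < res->numrows; i++)
            free(res->data[i]);
        free(res->data);
    }
    free(res);
}
```

Accepting NULL lets every error path call the free function unconditionally, which removes the branch where the leak tends to hide.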
Tatsuo Ishii [Sat, 2 Nov 2013 03:25:28 +0000 (12:25 +0900)]
Fix memory leak when SSL is requested.
When SSL is requested, the pgpool child retries reading the start-up packet.
However, it does not free the memory for the previous start-up packet.
Per Coverity report "1111443 Resource".
Tatsuo Ishii [Sat, 2 Nov 2013 02:56:02 +0000 (11:56 +0900)]
Merge branch 'master' of ssh://git.postgresql.org/pgpool2
Tatsuo Ishii [Sat, 2 Nov 2013 02:42:52 +0000 (11:42 +0900)]
Fix to check return status of pool_read().
Per Coverity report "1111374 Unchecked return value".
Yugo Nagata [Fri, 1 Nov 2013 12:46:06 +0000 (21:46 +0900)]
Remove unnecessary comparison with no effect
Yugo Nagata [Fri, 1 Nov 2013 12:23:01 +0000 (21:23 +0900)]
Fix to close socket when error occurs in wd_create_send_socket()
Yugo Nagata [Fri, 1 Nov 2013 11:20:53 +0000 (20:20 +0900)]
Fix to pass big parameter by pointer instead of by value in some functions
Tatsuo Ishii [Tue, 29 Oct 2013 21:14:53 +0000 (06:14 +0900)]
Fix error message in read_password_packet().
Tatsuo Ishii [Tue, 29 Oct 2013 20:39:24 +0000 (05:39 +0900)]
Fix "show pool_status" always shows memqcache_auto_cache_invalidation as 0.
Per bug #80.
Tatsuo Ishii [Sun, 27 Oct 2013 22:58:06 +0000 (07:58 +0900)]
Fix on_shmem_exit() to not accept too many on exit functions.
Actually harmless because we currently do not have that many on-exit
functions. But a bug is a bug...
Per Coverity report "1025973 Out-of-bounds write".
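The guard described here amounts to checking the table capacity before storing a callback. The sketch below is an assumption about the general shape of such a registry; `MAX_ON_EXITS`, `on_exit_list` and `register_on_exit` are hypothetical names, not pgpool's actual definitions.

```c
#include <stddef.h>

#define MAX_ON_EXITS 20   /* assumed capacity for this sketch */

typedef void (*exit_callback)(int code, void *arg);

struct on_exit_entry {
    exit_callback function;
    void         *arg;
};

static struct on_exit_entry on_exit_list[MAX_ON_EXITS];
static int on_exit_index = 0;

/* Register a callback. Returns 0 on success, -1 when the table is full. */
int register_on_exit(exit_callback function, void *arg)
{
    if (on_exit_index >= MAX_ON_EXITS)
        return -1;              /* refuse instead of writing out of bounds */
    on_exit_list[on_exit_index].function = function;
    on_exit_list[on_exit_index].arg = arg;
    on_exit_index++;
    return 0;
}
```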
Tatsuo Ishii [Sun, 27 Oct 2013 14:24:33 +0000 (23:24 +0900)]
Fix unsafe use of strcpy().
Per Coverity report "1111479 Copy into fixed size buffer".
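One common way to replace an unbounded strcpy() into a fixed-size buffer is a length-checked copy. The sketch below uses standard snprintf(); the `copy_string` helper is a hypothetical name, and the commit may well have used a different bounded function.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Copy src into dst, which has room for dstsize bytes, always leaving dst
 * NUL-terminated. Returns 0 if the whole string fit, -1 if it was truncated
 * or the arguments were invalid.
 */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    int n;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;
    n = snprintf(dst, dstsize, "%s", src);
    if (n < 0 || (size_t) n >= dstsize)
        return -1;
    return 0;
}
```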
Tatsuo Ishii [Sat, 26 Oct 2013 10:00:17 +0000 (19:00 +0900)]
Fix memory leak in certain error case.
Per Coverity report "1111457 Resource leak".
Tatsuo Ishii [Sat, 26 Oct 2013 09:09:39 +0000 (18:09 +0900)]
Fix memory leak when do_query() fails.
Per Coverity report "1111444 Resource leak".
Tatsuo Ishii [Sat, 26 Oct 2013 04:07:37 +0000 (13:07 +0900)]
Fix useless argument checks.
The argument is unsigned int, so comparing it against less than 0 is
meaningless.
Per Coverity report "1111418 - 1111421 Unsigned compared against 0".
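Why the check is useless can be seen in a small sketch. The `index_in_range` helper is hypothetical; it only shows that for an unsigned argument the upper bound is the one meaningful test.

```c
#include <stdbool.h>

/*
 * For an unsigned index, "index < 0" can never be true, so compilers and
 * static analyzers flag it. A negative value passed by the caller is
 * converted to a very large unsigned value, which the upper-bound test
 * already rejects.
 */
bool index_in_range(unsigned int index, unsigned int size)
{
    return index < size;
}
```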
Tatsuo Ishii [Sat, 26 Oct 2013 03:40:38 +0000 (12:40 +0900)]
Fix useless argument check in pool_cache_item_header().
The argument is unsigned int, so comparing it against less than 0 is
meaningless.
Per Coverity report "1111422 Unsigned compared against 0".
Tatsuo Ishii [Sat, 26 Oct 2013 03:29:34 +0000 (12:29 +0900)]
Fix forgetting free_select_result() in error case.
Per Coverity report "1111458 Resource leak".
Tatsuo Ishii [Sat, 26 Oct 2013 03:02:36 +0000 (12:02 +0900)]
Fix forgetting free() in error case.
Per Coverity report "1111459 Resource leak".
Tatsuo Ishii [Sat, 26 Oct 2013 02:01:22 +0000 (11:01 +0900)]
Fix inappropriate shared memory allocation size for clock hand.
The memory for the clock hand was allocated as
sizeof(pool_fsmm_clock_hand), which is 8 bytes long because the
variable is declared as:
static int *pool_fsmm_clock_hand;
This is plain wrong. The memory size actually needed is only 4 bytes,
which is sizeof(*pool_fsmm_clock_hand). In other words, the bug
allocated 4 unnecessary bytes, which did nothing bad for the execution
of the program. But a bug is a bug.
Per Coverity report "1111476 Wrong sizeof argument".
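The difference between the two sizeof expressions can be shown with a small sketch. It uses malloc() as a stand-in for the shared memory allocation, and the names `clock_hand`, `init_clock_hand`, `pointer_size` and `pointee_size` are hypothetical. The 8 and 4 byte figures in the message hold on a typical 64-bit platform; the sketch does not hard-code them.

```c
#include <stdlib.h>

static int *clock_hand;   /* same shape as the declaration quoted above */

/* Allocate storage for the int that clock_hand points to. */
int init_clock_hand(void)
{
    /* sizeof(*clock_hand) is the size of the pointed-to int,
     * whereas sizeof(clock_hand) would be the size of the pointer. */
    clock_hand = malloc(sizeof(*clock_hand));
    if (clock_hand == NULL)
        return -1;
    *clock_hand = 0;
    return 0;
}

size_t pointer_size(void) { return sizeof(clock_hand); }
size_t pointee_size(void) { return sizeof(*clock_hand); }
```

Writing the allocation as sizeof(*ptr) keeps it correct even if the pointed-to type changes later.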
Nozomi Anzai [Tue, 22 Oct 2013 01:17:53 +0000 (10:17 +0900)]
checkEnv() didn't do anything and always returned 0.
Tatsuo Ishii [Sat, 19 Oct 2013 03:58:20 +0000 (12:58 +0900)]
Avoid running out of free query cache hash index entries.
If hash index entries run out, pgpool-II cannot reuse an old cache
entry because pgpool-II always expects there to be at least one empty hash
index entry. To mitigate the problem, if free hash index entries run
out, look for a victim cache block to reuse the hash index entry.
Per bug #70.
Tatsuo Ishii [Sat, 19 Oct 2013 03:23:22 +0000 (12:23 +0900)]
Fix occasional segfault in query cache + extended protocol.
When the query is not "cache safe", bind_msg->param_offset was not set
in Bind(). However, Execute() unconditionally uses
bind_msg->param_offset to convert bind parameters to string so that
they can be added to the cached query string and it causes segfault
because bind_msg->param_offset is garbage. Also, a logic bug in
calculating bind_msg->param_offset is corrected.
Per bug#76.
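The failure mode is a field that one function sets only conditionally and another reads unconditionally. One defensive pattern, which is not necessarily what the commit does, is to give the field a known "not set" value and check it before use. The `BindMessage` structure and both helpers below are hypothetical stand-ins for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bind message. */
typedef struct {
    const char *contents;
    int         param_offset;   /* -1 means "not computed" */
} BindMessage;

/* Put the message in a known state so the offset is never garbage. */
void bind_message_init(BindMessage *msg, const char *contents)
{
    msg->contents = contents;
    msg->param_offset = -1;
}

/* Return the parameter area, or NULL when the offset was never set. */
const char *bind_params(const BindMessage *msg)
{
    if (msg->contents == NULL || msg->param_offset < 0)
        return NULL;
    return msg->contents + msg->param_offset;
}
```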
Tatsuo Ishii [Tue, 15 Oct 2013 23:23:54 +0000 (08:23 +0900)]
Fix memory allocation size bug in the code path of query cache.
In execute() the memory allocation size is too small in certain cases.
No bug has been reported so far, but certainly this is a bug.
Tatsuo Ishii [Tue, 15 Oct 2013 01:37:38 +0000 (10:37 +0900)]
Fix data inconsistency problem with native replication mode + extended protocol case.
It is reported that concurrent INSERTs using the JDBC driver cause data
differences among database nodes. This only happens when the following
conditions are all met:
1) Native replication mode
2) Extended protocol used
3) The portal created by parse message is reused by bind message
4) autocommit is on
5) SERIAL (sequence) is used
Pgpool-II's parse message function knows it has to lock the target
table when INSERT (plus #5) is issued by clients. Unfortunately bind
message function did not know it. Once parse/bind/execute finishes,
pgpool releases the lock obtained by parse because of #4. JDBC wants
to reuse the portal and starts the cycle from bind message, which does
not obtain the lock. As a result, lock-free INSERTs are floating around,
which of course causes data inconsistency. The solution is to lock the
table in the bind phase.
For this bind needs to issue LOCK in extended protocol. This was a
little bit hard because the module (do_command()) to issue internal
SQL command (other than SELECT) does not support extended protocol. To
solve the problem do_query() is modified so that it accepts other than
SELECT because it already accepts extended protocol. The modification
is minimal and is only tested for the case called from insert_lock(). I
do not recommend to replace every occurrence of do_command() with
do_query() at this point.
BTW the reason why the bug was not reported earlier is that most users use
JDBC with auto commit = off. In this case, the lock obtained by parse
persists until the user explicitly issues commit or rollback.
Per bug report by Steve Kuekes in [pgpool-general: 2142].
Tatsuo Ishii [Mon, 7 Oct 2013 01:33:36 +0000 (10:33 +0900)]
Avoid calling find_primary_node_repeatedly() when standby node goes down.
This will reduce the time to failover. Per bug #75, patch modified by
Tatsuo Ishii.
Tatsuo Ishii [Sun, 6 Oct 2013 08:51:36 +0000 (17:51 +0900)]
Fix misuse of the word "degenerate".
Correction suggested by Andrew Bruce in [pgpool-general: 2146].
Yugo Nagata [Thu, 3 Oct 2013 03:21:14 +0000 (12:21 +0900)]
Modify descriptions about restrictions of parallel mode
Multiple rows INSERT using VALUES are not supported in parallel mode.
Yugo Nagata [Wed, 2 Oct 2013 10:42:14 +0000 (19:42 +0900)]
Fix a description about the version where pgpool.pg_ctl is required
Yugo Nagata [Wed, 2 Oct 2013 10:37:48 +0000 (19:37 +0900)]
Fix a typo and descriptions about pgpool.pg_ctl parameter
Tatsuo Ishii [Wed, 25 Sep 2013 13:58:07 +0000 (22:58 +0900)]
Fix segfault when pgpool.conf does not set log_standby_delay.
This is caused by wrong initialization for log_standby_delay in
pool_config.l. Per bug#74.
Tatsuo Ishii [Wed, 25 Sep 2013 01:41:18 +0000 (10:41 +0900)]
Fix typo in comment.
Tatsuo Ishii [Fri, 13 Sep 2013 09:11:55 +0000 (18:11 +0900)]
Remove -lcompat because it confuses FreeBSD per bug#15.
Tatsuo Ishii [Thu, 12 Sep 2013 12:45:41 +0000 (21:45 +0900)]
Add Makefiles missed in source code reorganization.
Also incorporate changes from Christoph Berg ([pgpool-general: 2127]).
Tatsuo Ishii [Thu, 12 Sep 2013 12:32:35 +0000 (21:32 +0900)]
Fix a warning/error when compiling with -Werror=format-security.
Patch contributed by Christoph Berg ([pgpool-general: 2127]).
Tatsuo Ishii [Wed, 11 Sep 2013 14:21:51 +0000 (23:21 +0900)]
Fix incorrect time stamp rewriting in replication mode for certain time zones.
Time stamp rewriting calls "SELECT now()" to get current
time. Unfortunately the buffer for the current time is too small for
certain time zones such as "02:30". Note that a non-30-minute time zone
such as "0900" does not reveal the problem. This explains why we
had no bug report until today. Bug reported in [pgpool-general:
2113] and fix provided by Sean Hogan.
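The length difference behind this bug can be sketched as follows. The two timestamp strings used in the usage note are illustrative values, not taken from the bug report, and `copy_timestamp` is a hypothetical helper. Sizing the copy from the actual string length avoids depending on the width of the zone suffix; the actual fix may simply have enlarged a fixed buffer.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate a timestamp string of any length. The buffer is sized from the
 * string itself, so a zone suffix such as "+02:30" (three characters longer
 * than "+09") still fits. The caller frees the result.
 */
char *copy_timestamp(const char *ts)
{
    size_t len;
    char  *copy;

    if (ts == NULL)
        return NULL;
    len = strlen(ts);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, ts, len + 1);   /* includes the terminating NUL */
    return copy;
}
```

For example, "2013-09-11 23:21:51.123456+09" is 29 characters long, while "2013-09-11 16:51:51.123456+02:30" is 32, so a buffer sized for the first form overflows on the second.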
Tatsuo Ishii [Tue, 27 Aug 2013 00:25:26 +0000 (09:25 +0900)]
Add 4th parameter "master node port number" to recovery script.
This makes pgpool_setup's life a lot easier since it does not need to
check the database cluster dir and then guess the port number of the master
database cluster. Because of that guessing, pgpool_setup had to limit the
number of database clusters generated to 3.
Tatsuo Ishii [Tue, 27 Aug 2013 00:23:35 +0000 (09:23 +0900)]
Add Makefile lost during source reorganization.
Tatsuo Ishii [Wed, 21 Aug 2013 11:05:24 +0000 (20:05 +0900)]
Unbreak pgpool_setup.
The command did not work in standalone anymore.
Yugo Nagata [Wed, 21 Aug 2013 07:18:06 +0000 (16:18 +0900)]
Remove pcp directory from the repo root
Apply Usama's patch (rempcp.patch)
Tatsuo Ishii [Sat, 17 Aug 2013 01:37:47 +0000 (10:37 +0900)]
Set code name for version 3.4.
Tatsuo Ishii [Sat, 17 Aug 2013 01:21:32 +0000 (10:21 +0900)]
Fix bug with load balance in replication mode.
When load_balance_mode is off in replication mode, SELECTs should be
sent to master node only, rather than sent to all nodes in an
explicit transaction. Also add regression test case for this. Problem
reported in [pgpool-general: 2038].
Tatsuo Ishii [Wed, 14 Aug 2013 22:41:57 +0000 (07:41 +0900)]
Remove unused variable.
Yugo Nagata [Wed, 14 Aug 2013 08:11:21 +0000 (17:11 +0900)]
Fix a typo in the Japanese document
Yugo Nagata [Wed, 14 Aug 2013 07:47:59 +0000 (16:47 +0900)]
Fix a typo in a log message
Tatsuo Ishii [Wed, 14 Aug 2013 07:36:53 +0000 (16:36 +0900)]
Set version number to 3.4-alpha1
Tatsuo Ishii [Wed, 14 Aug 2013 07:36:04 +0000 (16:36 +0900)]
Fix for "make dist".
Tatsuo Ishii [Wed, 14 Aug 2013 07:08:06 +0000 (16:08 +0900)]
Apply code reorganization mega patch from Muhammad Usama.
Adjust for recent changes in the repository by Tatsuo Ishii.
Tatsuo Ishii [Mon, 12 Aug 2013 23:26:18 +0000 (08:26 +0900)]
Fix typos in installation of pgpool-recovery section.
Tatsuo Ishii [Mon, 5 Aug 2013 22:47:44 +0000 (07:47 +0900)]
Remove old on disk query cache.
Patch contributed by Muhammad Usama. Japanese doc changes by Tatsuo Ishii.
Tatsuo Ishii [Mon, 5 Aug 2013 12:40:34 +0000 (21:40 +0900)]
Add regression test for bug #68.
Tatsuo Ishii [Mon, 5 Aug 2013 12:15:41 +0000 (21:15 +0900)]
Fix double free bug with on memory query cache reported in bug #68.
The bug occurs when multiple bind/execute messages come after a parse
message. When a parse message comes, a query context is created along
with a temp cache. The pointer to the temp cache is added to the temp
cache array when the query is executed. Subsequent bind messages use the
same temp cache pointer to the cache array. This is the source of the
double free bug when the cache array is discarded. The solution is to reset
the temp cache pointer in the query context when the temp cache buffer is
added to the cache array.
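The ownership hand-off described above can be sketched as below. All type and function names (`TempCache`, `QueryContext`, `CacheArray`, `cache_array_add`, `cache_array_discard`) are simplified, hypothetical stand-ins rather than pgpool's actual definitions.

```c
#include <stdlib.h>

#define CACHE_ARRAY_MAX 8   /* assumed capacity for this sketch */

typedef struct { char *buffer; } TempCache;
typedef struct { TempCache *temp_cache; } QueryContext;
typedef struct {
    TempCache *items[CACHE_ARRAY_MAX];
    int        count;
} CacheArray;

/*
 * Move the context's temp cache into the array. After the move the array is
 * the only owner, so the context pointer is reset; a later call for the same
 * context cannot add the same pointer twice.
 */
int cache_array_add(CacheArray *arr, QueryContext *ctx)
{
    if (ctx->temp_cache == NULL || arr->count >= CACHE_ARRAY_MAX)
        return -1;
    arr->items[arr->count++] = ctx->temp_cache;
    ctx->temp_cache = NULL;
    return 0;
}

/* Free every cache held by the array exactly once. */
void cache_array_discard(CacheArray *arr)
{
    int i;

    for (i = 0; i < arr->count; i++) {
        free(arr->items[i]->buffer);
        free(arr->items[i]);
        arr->items[i] = NULL;
    }
    arr->count = 0;
}
```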
Nozomi Anzai [Mon, 5 Aug 2013 08:07:05 +0000 (17:07 +0900)]
Fix the version of pgpoolAdmin: 3.3 -> 3.3.0
Yugo Nagata [Fri, 2 Aug 2013 05:39:25 +0000 (14:39 +0900)]
Remove unnecessary function prototype declarations
Yugo Nagata [Fri, 2 Aug 2013 02:21:42 +0000 (11:21 +0900)]
Fix typos
Yugo Nagata [Wed, 31 Jul 2013 01:39:03 +0000 (10:39 +0900)]
Fix a compile error
Tatsuo Ishii [Tue, 30 Jul 2013 13:29:30 +0000 (22:29 +0900)]
Fix watchdog test script.
Tatsuo Ishii [Tue, 30 Jul 2013 13:30:31 +0000 (22:30 +0900)]
Fix bug with wd_init_interlock pointed out in [pgpool-general: 1956].
It writes more than the allocated memory.
Tatsuo Ishii [Tue, 30 Jul 2013 12:36:33 +0000 (21:36 +0900)]
Fix to add test/regression/ for make dist.
Tatsuo Ishii [Tue, 30 Jul 2013 12:34:08 +0000 (21:34 +0900)]
Fix to remove test results of watchdog.
Yugo Nagata [Tue, 30 Jul 2013 07:43:02 +0000 (16:43 +0900)]
Rewrite descriptions about condition of watchdog's lifecheck starting
Yugo Nagata [Tue, 30 Jul 2013 07:34:50 +0000 (16:34 +0900)]
Prepare 3.3
Yugo Nagata [Tue, 30 Jul 2013 06:38:12 +0000 (15:38 +0900)]
Prepare 3.3
Yugo Nagata [Tue, 30 Jul 2013 06:32:41 +0000 (15:32 +0900)]
Fix a hang that occurs when multiple pgpools start up before all backends start up
Specify a timeout value for lock-waiting loops.
Tatsuo Ishii [Mon, 29 Jul 2013 14:52:41 +0000 (23:52 +0900)]
Fix bug in parsing prepared statements with transaction handling in replication mode reported in [pgpool-general: 1877].
Parse() automatically starts a transaction for a non-SELECT query to keep
consistency among nodes in replication mode. But it does not set the
allow_close_transaction variable. If a wrong query comes in, the
transaction goes into an abort state but pgpool does not close the
transaction. Thus the next query causes an error because the transaction is
still in abort status.
Tatsuo Ishii [Mon, 29 Jul 2013 14:46:16 +0000 (23:46 +0900)]
Previous commit is incomplete.
Fix some of test.sh to handle pgbench path correctly.
Tatsuo Ishii [Sun, 28 Jul 2013 10:46:04 +0000 (19:46 +0900)]
Fix pgbench path detection.
The -b option specifies the pgbench installed directory. But actually the
regress.sh main script sets the pgbench full path. I think what the main
script does is appropriate because it gives more flexibility. If -b is
not specified, first search for pgbench under the PostgreSQL installation
directory. If not found, try to locate it in the command search path. If not
found, then give up.
Also fix the behavior when the PostgreSQL installation directory is not
found. It just exited in a sub shell and the script continued to run.
Tatsuo Ishii [Fri, 26 Jul 2013 05:51:26 +0000 (14:51 +0900)]
Enhance pgpool_setup and regression test suite.
Patch contributed by Muhammad Usama. Below is the comment from him:
I tried to make some changes in the regression script, mainly because the
script uses a lot of hard coded paths and regression starts reporting
failures if pgpool-II or PostgreSQL are not found at the default path.
Please find the attached patch which enables the regress.sh script to
install pgpool on the fly in a temp directory to run the tests on it
(optional), and now the regress script takes some additional arguments to
specify the paths of required components.
The list of all arguments accepted by the regress.sh script as implemented by
the attached patch is:
pgpool2/test/regression$ sh regress.sh -?
Usage:
  regress.sh: [Options]... [test_name]
Options:
  -p   DIRECTORY           Postgres installed directory
  -b   DIRECTORY           pgbench installed directory, if different from
                           Postgres installed directory
  -i   DIRECTORY           pgpool installed directory, if already installed
                           pgpool is to be used for tests
  -m   install/noinstall   make install pgpool to temp directory for
                           executing regression tests [Default: install]
  -j   FILE                Postgres jdbc jar file path
  -?                       print this help and then exit
Nozomi Anzai [Fri, 26 Jul 2013 05:46:34 +0000 (14:46 +0900)]
Add internal links in Japanese release notes.
Yugo Nagata [Thu, 25 Jul 2013 08:19:03 +0000 (17:19 +0900)]
Fix a bug of setting heartbeat_destination parameter
Yugo Nagata [Thu, 25 Jul 2013 06:20:36 +0000 (15:20 +0900)]
Fix some comments, messages and variables
Yugo Nagata [Thu, 25 Jul 2013 06:03:30 +0000 (15:03 +0900)]
Fix PATH variable to prioritize pghome
Yugo Nagata [Thu, 25 Jul 2013 05:42:41 +0000 (14:42 +0900)]
Fix to configure path to bin/lib of postgresql92 and pgpoolAdmin tar ball
Yugo Nagata [Thu, 25 Jul 2013 03:29:12 +0000 (12:29 +0900)]
Prepare 3.3.0-RC1
Nozomi Anzai [Thu, 25 Jul 2013 05:14:48 +0000 (14:14 +0900)]
Merge branch 'master' of ssh://git.postgresql.org/pgpool2
Nozomi Anzai [Thu, 25 Jul 2013 05:11:47 +0000 (14:11 +0900)]
Add debug mode of install.sh itself so that script becomes quiet.
Yugo Nagata [Thu, 25 Jul 2013 03:22:19 +0000 (12:22 +0900)]
Fix typos
Yugo Nagata [Thu, 25 Jul 2013 03:17:30 +0000 (12:17 +0900)]
Prepare 3.3.0-RC1
Yugo Nagata [Thu, 25 Jul 2013 03:11:04 +0000 (12:11 +0900)]
Prepare 3.3.0-RC1
Tatsuo Ishii [Thu, 25 Jul 2013 01:00:27 +0000 (10:00 +0900)]
Fix mistake in ssh command.
Nozomi Anzai [Wed, 24 Jul 2013 06:59:20 +0000 (15:59 +0900)]
Merge branch 'master' of ssh://git.postgresql.org/pgpool2
Nozomi Anzai [Wed, 24 Jul 2013 06:56:56 +0000 (15:56 +0900)]
Add the description about pgpool.pg_ctl.
Some visual improvements.
Yugo Nagata [Wed, 24 Jul 2013 04:56:13 +0000 (13:56 +0900)]
Fix typos of the version name
Yugo Nagata [Tue, 23 Jul 2013 07:23:42 +0000 (16:23 +0900)]
Fix to not print netmask to pg_hba.conf if node is specified by hostname
Yugo Nagata [Tue, 23 Jul 2013 06:34:34 +0000 (15:34 +0900)]
Fix a bug that password-less ssh setting fails when postgres user home isn't /home/postgres
Yugo Nagata [Tue, 23 Jul 2013 06:28:44 +0000 (15:28 +0900)]
Fix to remove /var/www/html/pgpoolAdmin directory in uninstall.sh
Yugo Nagata [Tue, 23 Jul 2013 05:03:15 +0000 (14:03 +0900)]
Add pcp_watchdog_info to pgpool.spec
Tatsuo Ishii [Tue, 23 Jul 2013 04:36:37 +0000 (13:36 +0900)]
Add watchdog test.
Currently the test is minimal. It just checks standby->master
escalation.
Tatsuo Ishii [Mon, 22 Jul 2013 23:09:04 +0000 (08:09 +0900)]
Fix "deploy" section.
This should have been updated when watchdog was introduced.
Yugo Nagata [Mon, 22 Jul 2013 09:59:14 +0000 (18:59 +0900)]
Fix to not bind network devices to sockets when heartbeat_device parameter is empty
Tatsuo Ishii [Mon, 22 Jul 2013 08:41:32 +0000 (17:41 +0900)]
Fix bug introduced in 8b803e1ea38c3ac9a7775f7fb9ba43870bca7dca (Add wait_for_pgpool_reload function).
Add -p option to specify start port #.
Tatsuo Ishii [Sat, 20 Jul 2013 03:58:09 +0000 (12:58 +0900)]
Fix bug with health check when used with child_life_time reported in [pgpool-general: 1892].
Here is the explanation why the problem occurs:
--------------------------------------------------------------------------------
Ok. I think I finally understand what's going on here.
Pgpool main process (14317) started health checking at Jul 12 09:17:04.
Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking
Pgpool main process set timer at 09:17:14 because you set
health_check_timeout 10. This time the health check successfully
completed. The timer for 09:17:14 is blocked by calling
signal(SIGALRM, SIG_IGN).
Unfortunately child life time was expired at 09:17:14 and pgpool main
process was busy at the time because of this.
Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking
Pgpool main re-enabled the timer and reset the timer variable
(health_check_timer_expired = 0). But when the timer re-enabled, the
signal handler for the timer set health_check_timer_expired to 1. As
a result pgpool thought that health check timer was expired.
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend
Thus failover happened even if the backend was running fine.
--------------------------------------------------------------------------------
To fix the problem, a new macro CLEAR_ALARM, which calls alarm(0) until
all pending alarms are cleared, is defined and used whenever necessary
to cancel the health check timer. Also, before forking off a child process,
health_check_timer_expired is explicitly cleared.
This bug also causes the following error message.
Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connect_inet_domain_socket_by_port: health check timer expired
Process 11465 is a child process and is not supposed to run into this
situation. It happens because the global variable
"health_check_timer_expired" is set to 1 before the new child is
forked off after child_life_time expires. This could happen if the
SIGCHLD signal is received at the moment when the bug described above happens.
To make sure this never happens, connect_inet_domain_socket_by_port
checks health_check_timer_expired only if it is the main process.
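A macro matching the description "calls alarm(0) until all pending alarms are cleared" could look like the sketch below. The macro body and the `reset_health_check_timer` and `health_check_expired` helpers are assumptions made for illustration; only the names CLEAR_ALARM and health_check_timer_expired come from the commit message.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;

/*
 * alarm(0) cancels any pending alarm and returns the number of seconds that
 * were left on it, or 0 if none was pending. Repeat until it reports 0.
 */
#define CLEAR_ALARM \
    do { } while (alarm(0) > 0)

/* Cancel the timer and reset the flag before starting a new health check. */
void reset_health_check_timer(void)
{
    CLEAR_ALARM;
    health_check_timer_expired = 0;
}

/* Report whether the SIGALRM handler has marked the timer as expired. */
int health_check_expired(void)
{
    return health_check_timer_expired;
}
```

Cancelling the timer before clearing the flag matters: if the flag were cleared first, a late SIGALRM could set it again before the next check starts.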
Tatsuo Ishii [Sat, 20 Jul 2013 03:34:37 +0000 (12:34 +0900)]
Fix bug reported in bug #62 (main process segfault).
The bug report showed main process died after the log:
Jun 22 10:23:20 pgpool[32629]: pool_read: read failed (Connection reset by peer)
Jun 22 10:23:20 pgpool[32629]: notice_backend_error: called from pgpool main. ignored.
Jun 22 10:23:20 pgpool[32629]: child_exit: called from pgpool main. ignored.
Jun 22 10:23:20 kernel: [11850568.371509] pgpool[32629]: segfault at 6f0ff2 ip 00007fbd7ed9811f sp 00007fff27275f08 error 4 in libc-2.15.so[7fbd7ec49000+1b5000]
This indicates that after reading from socket while health checking
failed, it tried to initiate failover by calling notice_backend_error,
which is not appropriate because that should be handled in the health
check logic. So it is ignored. So far so good. pool_read called
child_exit after that and again it was ignored. This is also
good. Problem is, it continued to run with undefined read length from
read(2). Finally pool_read calls memmove with the undefined length to
update its internal buffer, which could cause a segfault depending on
the value of the read length.
The fix is, after child_exit, to return -1 to indicate that an error
happened. Also I fixed the code after pool_check_fd fails. Before, it
just exited the process, which is never good for the pgpool main
process. However, I don't think the case ever happened because I never
saw a single error report which showed the case.
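The essence of the fix, returning an error before the failed read length is used, can be sketched as below. The `read_into` function is a hypothetical simplification and does not reproduce pool_read's buffering.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read once from fd and append the data to buf, which holds bufsize bytes of
 * which *used are already filled. Returns 0 on success and -1 on read error,
 * end of file, or lack of space. On failure the function returns before the
 * result of read() is ever used as a length.
 */
int read_into(int fd, char *buf, size_t bufsize, size_t *used)
{
    char    tmp[1024];
    ssize_t n;

    if (*used > bufsize)
        return -1;
    n = read(fd, tmp, sizeof(tmp));
    if (n <= 0)
        return -1;                    /* never pass a failed length on */
    if ((size_t) n > bufsize - *used)
        return -1;
    memcpy(buf + *used, tmp, (size_t) n);
    *used += (size_t) n;
    return 0;
}
```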
Tatsuo Ishii [Sat, 20 Jul 2013 03:34:22 +0000 (12:34 +0900)]
Update JDBC driver version.
Yugo Nagata [Fri, 19 Jul 2013 05:23:58 +0000 (14:23 +0900)]
Add descriptions of pcp_watchdog_info to documents