git.postgresql.org Git - pgpool2.git/commit

author	Tatsuo Ishii <ishii@postgresql.org>
	Sat, 20 Jul 2013 03:58:09 +0000 (12:58 +0900)
committer	Tatsuo Ishii <ishii@postgresql.org>
	Sat, 20 Jul 2013 03:58:09 +0000 (12:58 +0900)
commit	0fd460bdb51c0b1c81563778017bba36065e1b3b
tree	2d98248ab3a9d9f8c5cd3c6814464a277f1395ac
parent	7c09d884b94635e85a8fdb8b049381314309a000

Fix bug with health check when used with child_life_time reported in [pgpool-general: 1892].

Here is an explanation of why the problem occurs:

--------------------------------------------------------------------------------
Ok. I think I finally understand what's going on here.

Pgpool main process (14317) started health checking at Jul 12 09:17:04.

Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking

The pgpool main process set a timer for 09:17:14 because you set
health_check_timeout to 10. This time the health check completed
successfully, and the timer for 09:17:14 was blocked by calling
signal(SIGALRM, SIG_IGN).

Unfortunately, child_life_time expired at 09:17:14, and the pgpool main
process was busy at that time because of this.

Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called

Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking

The pgpool main process re-enabled the timer and reset the timer variable
(health_check_timer_expired = 0). But when the timer was re-enabled, the
signal handler for the timer set health_check_timer_expired to 1. As
a result, pgpool thought that the health check timer had expired.

Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend

Thus failover happened even though the backend was running fine.
--------------------------------------------------------------------------------
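
The sequence described above can be reproduced in isolation. The following
is a minimal sketch, not pgpool code: the handler name and the function
stale_alarm_demo are hypothetical, the timeout is shortened to 1 second,
and sleep() is assumed not to be implemented on top of SIGALRM (true on
Linux). It shows that signal(SIGALRM, SIG_IGN) only ignores delivery and
leaves the alarm armed, so a handler installed again before the old
deadline still sees the stale alarm.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;

/* Hypothetical handler name; it only records that the timer fired. */
static void health_check_timer_handler(int sig)
{
    (void) sig;
    health_check_timer_expired = 1;
}

/* Returns the flag value observed by the second health check. */
int stale_alarm_demo(void)
{
    /* First health check: arm the timer (1 second instead of 10). */
    signal(SIGALRM, health_check_timer_handler);
    alarm(1);

    /* The check finishes early. SIGALRM is ignored from now on,
     * but the alarm itself is still scheduled. */
    signal(SIGALRM, SIG_IGN);

    /* Second health check starts before the old deadline:
     * the handler is re-installed and the flag is reset. */
    signal(SIGALRM, health_check_timer_handler);
    health_check_timer_expired = 0;

    /* Stand-in for the main process being busy. The stale alarm
     * from the first check is delivered during this call. */
    sleep(2);

    return health_check_timer_expired;
}
```

Calling stale_alarm_demo() returns 1: the second check believes its timer
expired although it never armed one.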

To fix the problem, a new macro CLEAR_ALARM, which calls alarm(0) until
all pending alarms are cleared, is defined and used wherever the health
check timer needs to be cancelled. Also, health_check_timer_expired is
explicitly cleared before forking off a child process.
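
A macro of that kind might look like the sketch below. This is an
illustration under assumptions, not the definition from main.c: only the
name CLEAR_ALARM and the "call alarm(0) until nothing is pending" behavior
come from the description above, while the handler name and
clear_alarm_demo are hypothetical. alarm(0) cancels a pending alarm and
returns the number of seconds that were left, or 0 if none was pending.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;

/* Hypothetical handler name; it only records that the timer fired. */
static void health_check_timer_handler(int sig)
{
    (void) sig;
    health_check_timer_expired = 1;
}

/* Sketch: keep cancelling until alarm(0) reports nothing was pending. */
#define CLEAR_ALARM \
    do { \
    } while (alarm(0) > 0)

/* Returns the flag value after the old deadline has passed. */
int clear_alarm_demo(void)
{
    /* Arm the timer as a health check would (1 second instead of 10). */
    signal(SIGALRM, health_check_timer_handler);
    alarm(1);

    /* The check finishes early: cancel the alarm instead of ignoring it,
     * then reset the flag. */
    CLEAR_ALARM;
    health_check_timer_expired = 0;

    /* Wait past the old deadline; the cancelled alarm never fires. */
    sleep(2);

    return health_check_timer_expired;
}
```

With the alarm cancelled rather than ignored, clear_alarm_demo() returns 0.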

This bug also causes the following error message:

Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connect_inet_domain_socket_by_port: health check timer expired

Process 11465 is a child process and is not supposed to run into this
situation. It happens because the global variable
"health_check_timer_expired" is set to 1 before the new child is
forked off after child_life_time expired. This could happen if a
SIGCHLD signal is received at the moment when the bug above happens.
To make sure this never happens, connect_inet_domain_socket_by_port
checks health_check_timer_expired only if it is running in the main
process.
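
The guard can be sketched as follows. This is a simplified illustration,
not the code in pool_connection_pool.c: main_pid,
timer_expired_for_this_process and child_guard_demo are hypothetical
names, and comparing getpid() against a saved PID is only one assumed way
of telling the main process from a child. A forked child inherits the
stale flag but does not act on it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t health_check_timer_expired = 0;
static pid_t main_pid;   /* hypothetical: PID of the main process */

/* The flag is honored only when called from the main process. */
static int timer_expired_for_this_process(void)
{
    return getpid() == main_pid && health_check_timer_expired;
}

/* Returns 0 if the child ignored the inherited flag, 1 if it acted on it,
 * and -1 on a fork/wait failure. */
int child_guard_demo(void)
{
    pid_t pid;
    int status = 0;

    main_pid = getpid();
    health_check_timer_expired = 1;   /* stale flag left in the main process */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: the flag is inherited as 1, but the guard masks it. */
        _exit(timer_expired_for_this_process() ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here child_guard_demo() returns 0, while the same check made in the main
process still reports the expired timer.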

main.c
pool_connection_pool.c