tcpserver — TCP client connections and process management in proxies
The part of Kernun Firewall called tcpserver handles the server side of
proxies. It is implemented by the C function tcpserver(), contained in a
library linked to proxies. After a proxy performs its initialization
(command line parsing, configuration reading, log opening), it calls
tcpserver(). Among other parameters, tcpserver() gets a callback function
for connection handling. The tcpserver() function waits for a connection
from a client and then calls the callback, passing it the file descriptor
of the accepted connection. The callback is supposed to process the
connection (it performs the proxy-specific work) and then return to
tcpserver(). When this happens, tcpserver() waits for the next connection.
The tcpserver() function also manages the multiple processes needed for
parallel handling of connections. Moreover, it processes termination and
log level change signals.
The management of proxy child processes is performed using pre-forked processes. This concept of process management is used, for example, by the Apache WWW server.
Most TCP process control attributes are contained in the tcpserver
configuration section (see the tcpserver(5) manual page); some, which are
common to TCP and UDP proxies, are part of another configuration section,
application (see the application(5) manual page).
The TCP server handles the following signals. All signals except SIGUSR1
and SIGUSR2 should always be sent only to the parent process of a proxy.
SIGUSR1
Increase the log level of a child process (or the parent process and all its children, if sent to parent).
SIGUSR2
Decrease the log level of a child process (or the parent process and all its children, if sent to parent).
SIGHUP
Graceful termination; the proxy does not accept any new connection, waits until all open connections are closed, and terminates.
SIGTERM, SIGINT, SIGQUIT
Immediate termination; the proxy closes all connections and terminates immediately.
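The log level signals can be illustrated with a minimal sketch. It is not the Kernun implementation: the names log_level, next_log_level() and install_log_signals(), the level range 0 to 9, and the starting level 5 are all assumptions made for the example. It only shows the general technique of adjusting a counter from a SIGUSR1/SIGUSR2 handler.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() in strict ISO C mode */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical log-level range; the proxy's real range is not stated here. */
enum { LOG_LEVEL_MIN = 0, LOG_LEVEL_MAX = 9 };

/* Only sig_atomic_t objects may be safely written from a signal handler. */
static volatile sig_atomic_t log_level = 5;

/* Return the level after one SIGUSR1 (up) or SIGUSR2 (down), clamped. */
static int next_log_level(int level, int sig)
{
    assert(sig == SIGUSR1 || sig == SIGUSR2);
    if (sig == SIGUSR1)
        return level < LOG_LEVEL_MAX ? level + 1 : LOG_LEVEL_MAX;
    return level > LOG_LEVEL_MIN ? level - 1 : LOG_LEVEL_MIN;
}

/* Handler shared by both signals. */
static void on_log_signal(int sig)
{
    log_level = next_log_level(log_level, sig);
}

/* Install the handler for both signals; returns 0 on success, -1 on error. */
static int install_log_signals(void)
{
    struct sigaction sa;

    sa.sa_handler = on_log_signal;
    sa.sa_flags = SA_RESTART;   /* do not make interrupted system calls fail */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGUSR2, &sa, NULL);
}
```

After install_log_signals() succeeds, sending SIGUSR1 to the process raises log_level by one and SIGUSR2 lowers it by one, within the assumed range.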
If item singleproc is present in the application configuration section,
the proxy manages all connections using a single process. The algorithm is
very simple:
1. Create and bind sockets according to the configuration (see listen-on(5)).
2. Switch credentials according to the configuration (see application(5)).
3. Wait for a connection from a client.
4. Call the proxy-specific connection handling function and pass it the accepted connection.
5. After a successful return from the handling function, go to 3. If the handling function returns an error, exit TCP server.
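The wait, handle, and repeat part of this algorithm can be sketched as follows. This is an illustration under stated assumptions, not the actual tcpserver() code: the callback type conn_handler_t, the function serve_single(), and the example handler hello_handler() are hypothetical names, only one listening socket is used (the real server may listen on several), and the sketch closes the connection itself, which the text above does not specify.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations in strict ISO C mode */
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical callback type: 0 means success, anything else is an error. */
typedef int (*conn_handler_t)(int conn_fd);

/* Example handler (illustrative only): greet the client and finish. */
static int hello_handler(int conn_fd)
{
    static const char msg[] = "hello\n";
    ssize_t n = write(conn_fd, msg, sizeof msg - 1);

    return n == (ssize_t)(sizeof msg - 1) ? 0 : -1;
}

/*
 * Accept connections on one listening socket and hand each to the handler.
 * Returns the handler's error code, or -1 if accept() fails permanently.
 */
static int serve_single(int listen_fd, conn_handler_t handler)
{
    assert(handler != NULL);
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);  /* wait for a client */
        if (conn_fd < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;                             /* transient, wait again */
            return -1;                                /* listening socket unusable */
        }
        int rc = handler(conn_fd);                    /* proxy-specific work */
        close(conn_fd);
        if (rc != 0)
            return rc;                                /* handler error ends the server */
    }
}
```

A caller would create, bind, and listen on a socket, switch credentials, and then call serve_single(listen_fd, hello_handler); the call returns only when the handler reports an error or the listening socket becomes unusable.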
If item singleproc is not present in the configuration, the parent proxy
process forks child processes that handle incoming connections. The parent
does not accept any connections; it only monitors the status of child
processes, starts new children, and kills superfluous ones.
Parent algorithm:
1. Create and bind sockets according to the configuration (see listen-on(5)).
2. Switch credentials according to the configuration (see application(5)).
3. Create init-children child processes.
4. Count busy children (those processing a connection) and idle ones (those waiting for a connection).
5. If there are fewer than min-idle idle children, try to fork new children to reach min-idle. At most min-start-rate children are forked, and the total number of child processes never exceeds max-children. If there are still not enough idle child processes during the next parent cycle, 2 * min-start-rate new children are forked. Subsequently, the number of forked children is doubled in each following parent cycle, up to the maximum of max-start-rate new children per cycle. Once min-idle is reached, the number of forks per cycle is reset to min-start-rate.
6. If there are more than max-idle idle child processes, try to kill some idle children to reach max-idle. At most kill-rate children are killed.
7. If SIGHUP has been received, wait for all children to terminate and exit.
8. If the parent cycle has been repeated info-cycle times, log a statistical message containing the number of forked and killed children.
9. Wait for parent-cycle ms and start a new parent cycle (go to 4).
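The doubling of the fork rate in the min-idle step is easier to follow in code. The sketch below is one possible reading of that step, not the actual implementation: struct rate_cfg and plan_forks() are hypothetical names, and the fields simply mirror the configuration items named above.

```c
#include <assert.h>

/* Hypothetical container for the tcpserver items named in the text. */
struct rate_cfg {
    int min_idle;
    int max_children;
    int min_start_rate;
    int max_start_rate;
};

/*
 * Decide how many children to fork in one parent cycle.
 * idle  - number of idle children counted in this cycle
 * total - number of all children (busy + idle)
 * rate  - in/out: current per-cycle fork limit, initially min_start_rate
 * Returns the number of children to fork now.
 */
static int plan_forks(const struct rate_cfg *cfg, int idle, int total, int *rate)
{
    assert(cfg->min_start_rate > 0);
    assert(cfg->max_start_rate >= cfg->min_start_rate);
    assert(*rate >= cfg->min_start_rate && *rate <= cfg->max_start_rate);

    int missing = cfg->min_idle - idle;
    if (missing <= 0) {
        *rate = cfg->min_start_rate;     /* min-idle reached: reset the rate */
        return 0;
    }

    int n = missing < *rate ? missing : *rate;   /* at most `rate` forks */
    int room = cfg->max_children - total;        /* never exceed max-children */
    if (n > room)
        n = room;
    if (n < 0)
        n = 0;

    /* Still short of idle children: double the limit for the next cycle. */
    int next = *rate * 2;
    *rate = next > cfg->max_start_rate ? cfg->max_start_rate : next;
    return n;
}
```

With min-idle 10, min-start-rate 2, and max-start-rate 8, a parent that starts with no children would fork 2, then 4, then up to 8 children in consecutive cycles, and fall back to 2 per cycle once ten idle children exist.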
If the creation of a new child process fails because of a lack of system
resources, it is repeated up to fork-retries times. There is a pause of
fork-wait ms between consecutive attempts. If all fork-retries attempts are
unsuccessful, no new child is started, but the proxy continues its
operation (and possibly starts children later, when the system load
decreases).
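A retry loop of this kind might look like the sketch below. The function names fork_with_retries() and pause_ms() are hypothetical, and two details are assumptions rather than documented behavior: the sketch treats EAGAIN and ENOMEM as "lack of system resources", and it counts fork-retries as the number of repeats after the first attempt.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nanosleep() in strict ISO C mode */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds (an interrupted sleep is simply cut short). */
static void pause_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/*
 * Call fork(), repeating it up to fork_retries more times when it fails
 * for lack of resources. Returns what fork() returns: the child's pid in
 * the parent, 0 in the child, or -1 if no child could be created.
 */
static pid_t fork_with_retries(int fork_retries, long fork_wait_ms)
{
    assert(fork_retries >= 0 && fork_wait_ms >= 0);

    for (int attempt = 0; ; attempt++) {
        pid_t pid = fork();
        if (pid >= 0)
            return pid;                   /* success, in parent or child */
        if (errno != EAGAIN && errno != ENOMEM)
            return -1;                    /* not a resource shortage */
        if (attempt == fork_retries)
            return -1;                    /* retries exhausted, caller carries on */
        pause_ms(fork_wait_ms);           /* fork-wait pause before the next try */
    }
}
```

A return value of -1 is not fatal for the caller: in line with the text above, the parent would skip this child and try again in a later cycle.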
Additionally, the parent process manages a single child process that resolves DNS names from the configuration. This child process is not controlled by the above algorithm and is restarted as required for proper name resolution (see resolving(7)).
Child algorithm:
1. Start listening on all server sockets, as specified by the listen-on configuration value.
2. Wait for a connection from a client.
3. Call the proxy-specific connection handling function and pass it the accepted connection.
4. After a successful return from the handling function, go to 2. If the handling function returns an error, terminate the particular child process. The proxy continues running and replaces the terminated child as necessary.
In order to be able to manage its child processes, the parent process must
communicate with them. Two mechanisms are used for this purpose: shared
memory and signals. There is a shared memory structure called “scoreboard”
containing one slot for each possible child (i.e., max-children slots).
Each child maintains a flag in its scoreboard slot that indicates whether
the child is busy or idle. The parent reads these flags when counting its
children. The parent sends signals to the children in order to kill
a superfluous child, perform an immediate or graceful termination, and
increase or decrease the log level. As there are not enough signal numbers
available, the parent uses SIGTERM for immediate termination and SIGHUP for
all other requests. The type of request is indicated by a value set by the
parent in the scoreboard before sending the signal.
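The scoreboard idea can be sketched as a small structure in memory shared between the parent and its children. Everything concrete here is an assumption made for illustration: the structure layout, the names struct scoreboard, scoreboard_create() and scoreboard_count(), the state and request codes, and the use of an anonymous shared mmap() mapping (the real scoreboard may use a different shared memory mechanism).

```c
#define _DEFAULT_SOURCE  /* glibc: expose MAP_ANONYMOUS in strict ISO C mode */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical slot states and request codes. */
enum slot_state { SLOT_FREE = 0, SLOT_IDLE, SLOT_BUSY };
enum sb_request { REQ_NONE = 0, REQ_KILL_IDLE, REQ_GRACEFUL, REQ_LOG_UP, REQ_LOG_DOWN };

struct scoreboard {
    int max_children;       /* number of slots */
    volatile int request;   /* set by the parent before it sends SIGHUP */
    volatile int slot[];    /* one state flag per possible child */
};

/* Create a scoreboard that stays shared across fork(); NULL on failure. */
static struct scoreboard *scoreboard_create(int max_children)
{
    assert(max_children > 0);

    size_t size = sizeof(struct scoreboard) + (size_t)max_children * sizeof(int);
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    struct scoreboard *sb = mem;
    sb->max_children = max_children;  /* new mapping is zeroed: all slots SLOT_FREE */
    return sb;
}

/* Parent side: count idle and busy children by reading their flags. */
static void scoreboard_count(const struct scoreboard *sb, int *idle, int *busy)
{
    *idle = 0;
    *busy = 0;
    for (int i = 0; i < sb->max_children; i++) {
        if (sb->slot[i] == SLOT_IDLE)
            (*idle)++;
        else if (sb->slot[i] == SLOT_BUSY)
            (*busy)++;
    }
}
```

In this sketch a child would set its own slot to SLOT_BUSY before calling the connection handler and back to SLOT_IDLE afterwards, and on SIGHUP it would read the request field to learn what the parent wants.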
Calling select()/accept() from multiple processes in parallel on the same
set of sockets causes a problem (see, e.g., the Apache WWW server
documentation, section "General Performance hints"). If a single connection
arrives, all processes are woken up from select() and call accept().
A single accept() succeeds and returns; all the other processes are blocked
in accept(). However, all these processes are now waiting for a connection
on a single socket, and the remaining sockets are not handled. Therefore,
select() and accept() are placed in a critical section secured by a lock,
which ensures that only one process sleeps in select() at a time. The lock
is implemented using flock() on a file specified by a parameter of the lock
item. For a large number of child processes (many hundreds or thousands),
locking via flock() may behave incorrectly and block the proxy operation.
Therefore, it is possible to use an alternative lock implementation
selected by the alt-lock item. The following possibilities are available:
none
No locking is done. accept() is called in non-blocking mode in order to solve the above-mentioned problem with processes blocked in accept() on a single socket.
semaphore
Locking is done using a System V semaphore.
lock2
Locking uses a two-level flock() scheme that locks parts of a single lock file. This is an experimental variant that should not be used, because it exhibits a similar problem with many processes as the standard single flock().
multilock2
This is the recommended alternative locking mechanism. It uses a two-level flock() locking scheme with each lock on a separate file. The set of NxN processes is divided into N subsets of N processes. Members of each subset share one lock, and there is a single global lock. To acquire the lock, a process must first lock the lock file belonging to its subset and then lock the global lock. This algorithm reduces the maximum number of processes waiting on a single lock file.
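The two-level scheme described for multilock2 can be sketched with plain flock() calls. This is an illustration only: struct two_level_lock and the functions below are hypothetical names, and assigning a child to subset child_index / N is an assumption, since the text does not say how children are mapped to subsets.

```c
#define _DEFAULT_SOURCE  /* glibc: expose flock() in strict ISO C mode */
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>

/* Descriptors of the two lock files a child holds open (hypothetical layout). */
struct two_level_lock {
    int subset_fd;   /* lock file shared by the N children of one subset */
    int global_fd;   /* lock file shared by all children */
};

/*
 * Open (creating if needed) a lock file. Each child has to do this itself
 * after fork(): flock() locks belong to the open file description, so a
 * descriptor inherited from the parent would not exclude the siblings.
 */
static int open_lock_file(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Subset number of a child, for N*N children indexed from 0 (assumed mapping). */
static int subset_of(int child_index, int n)
{
    assert(n > 0 && child_index >= 0 && child_index < n * n);
    return child_index / n;
}

/* Take the subset lock first, then the global lock. Returns 0 or -1. */
static int two_level_acquire(const struct two_level_lock *l)
{
    if (flock(l->subset_fd, LOCK_EX) != 0)
        return -1;
    if (flock(l->global_fd, LOCK_EX) != 0) {
        flock(l->subset_fd, LOCK_UN);   /* do not keep half of the lock */
        return -1;
    }
    return 0;
}

/* Release in the reverse order of acquisition. */
static void two_level_release(const struct two_level_lock *l)
{
    flock(l->global_fd, LOCK_UN);
    flock(l->subset_fd, LOCK_UN);
}
```

Because a process queues on the global lock only while holding its subset lock, at most N processes (one per subset) wait on the global file and at most N wait on each subset file, instead of all NxN waiting on one file. A child would wrap its select()/accept() sequence between two_level_acquire() and two_level_release().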
If either both or neither of the lock and alt-lock items are specified,
the standard locking is used if max-children is at most 500, and
multilock2 is used for max-children of 501 or more.
Experiments indicate that this arrangement is not strictly necessary on
FreeBSD, because it seems that if there is a very short time between
select() and accept(), only a single process is woken up from select() and
calls accept(). However, this positive feature is dependent on timing (and
thus on such unpredictable conditions as the system load). We have
implemented the serialization lock in order to prevent race conditions.
Be careful when configuring lock-file names for proxies. If two different proxies happen to use the same filename, one of them gets stuck. Such a situation looks rather strange: the TCP handshake takes place, but data exchange does not. As proxy processes are unable to detect this situation, care should be taken.
If item nodaemon is used without singleproc in the configuration, i.e.,
parent/children operation in no-daemon mode, the proxy runs in the same
process group as its parent process (unless it was moved to another group
before executing the proxy program). The proxy parent process uses the
kill(0, sig) system call to propagate SIGTERM and SIGHUP to its children.
However, such a signal is delivered to all processes in the process group
of the proxy. Thus, other processes (not belonging to the proxy) in the
same group should make appropriate provisions in order not to be disturbed
by these signals.
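One such provision is to move the proxy to its own process group before executing it, so that kill(0, sig) issued by the proxy parent cannot reach the launching process. The sketch below shows this generic POSIX technique; spawn_in_own_group() is a hypothetical helper, not part of Kernun, and the command passed to it is up to the caller.

```c
#define _DEFAULT_SOURCE  /* glibc: expose setpgid()/getpgid() in strict ISO C mode */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Start a program in a fresh process group.
 * argv[0] must be the full path of the program; argv ends with NULL.
 * Returns the child's pid, or -1 if fork() fails.
 */
static pid_t spawn_in_own_group(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* child: become leader of a new group */
        execv(argv[0], argv);
        _exit(127);             /* reached only if execv() failed */
    }
    /*
     * The parent sets the group as well, so that it is in place when this
     * function returns. A failure here means the child has already done it
     * and called execv().
     */
    (void)setpgid(pid, pid);
    return pid;
}
```

A launcher that must stay in the same group as the proxy would instead have to handle or ignore SIGTERM and SIGHUP itself.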