sum-stats — generates proxy usage statistics from Kernun logs
sum-stats [-p period] [-t type] [-n name] [-l field=limit ...] [--filter field=filter ...] [--spam-threshold value] [--shift time_offset ...] [--start time_spec] [--finish time_spec] [--entitle label] [--info list] [--db] -o outfile
The sum-stats script reads a Kernun log from the standard input and generates proxy usage statistics. The exact contents of the output depend on the proxy type. However, the generated output always retains the following structure:
Summary: totals + Kernun Clear Web database hit-rate (for http-proxy and icap-server)
Histograms: per-hour, per-day, per-weekday (depends on period)
Hitparades: per-client, per-server, ... (depends on type)
-p period
Sets the period (daily, weekly, monthly). Log items outside the date interval based on this period are filtered out. Use --shift to specify which period is to be generated; the current period (day/week/month) is generated by default. For example, use -p weekly --shift -1w to generate the statistics for the last week.
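The example above can be sketched as a small shell fragment. It only assembles the documented options and runs sum-stats if the script and the log are present. The log location /var/log/kernun and the output base name are assumptions, not values taken from this page.

```shell
#!/bin/sh
# Assumed locations (hypothetical; adjust to the local installation).
LOG=/var/log/kernun
OUT=/tmp/weekly-stats

# Weekly period, shifted one week back: statistics for the last week.
set -- -p weekly --shift -1w -o "$OUT"
echo "sum-stats $*"

# Run only where sum-stats and a readable log are actually available.
if command -v sum-stats >/dev/null 2>&1 && [ -r "$LOG" ]; then
    sum-stats "$@" < "$LOG"
fi
```

The log is passed on the standard input, as described at the top of this page.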
-t type
Sets the type of the proxy. If not set, the default value is proxy (does not assume any particular proxy type). A list of recognized proxy types can be found below.
-n name
Sets the name of the proxy (altname) to be included in the statistics (other proxies are filtered out). If not set, all proxies are included.
-l field=limit
Sets the limit for the given field (top N clients, servers, ...). If not set, the field is excluded from the statistics. The special value 0 means that the field is not limited at all: all values are included in the statistics, regardless of their total count. Note that a field limit of 0 can produce VERY BIG statistics that may be difficult to view. A list of available fields can be found below.
--filter field=filter
Sets the filter for the given field (clients, servers, ...). If set, only the log records that match the filter are taken into account, and the statistics for the filtered field are suppressed, since they would be degenerate.
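As a rough illustration of how -l and --filter might be combined, the fragment below limits one field and filters on another. The field names client and server and the filter value 10.0.0.5 are hypothetical; the real field names depend on the proxy type, and the filter syntax is not specified here. The log path is likewise an assumption.

```shell
#!/bin/sh
# Assumed locations (hypothetical; adjust to the local installation).
LOG=/var/log/kernun
OUT=/tmp/client-stats

# Hypothetical fields: keep the top 20 servers, restricted to one client.
# Statistics for the filtered field (client) are suppressed, so no -l is
# given for it.
set -- -t http-proxy -l server=20 --filter client=10.0.0.5 -o "$OUT"
echo "sum-stats $*"

# Run only where sum-stats and a readable log are actually available.
if command -v sum-stats >/dev/null 2>&1 && [ -r "$LOG" ]; then
    sum-stats "$@" < "$LOG"
fi
```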
--spam-threshold value
Sets the spam threshold; mails with a spam score above this level are considered SPAM. If not set, the default value is 5000.
--shift time_offset
Behaves as if the processing were executed earlier or later by the given time_offset. The form of the time_offset is
[<SIGN>]<COUNT>[<UNIT>][_<ROUND>]
SIGN: '-' for a shift into the past, '+' for a shift into the future. Defaults to '+'.
COUNT: the number of units (hours, days, weeks or months) to shift by. Can be 0 for no shift, which can be useful in conjunction with ROUND.
UNIT: 'h' for hours, 'd' for days, 'w' for weeks, 'm' for months. If omitted, the default UNIT depends on the period selected by -p: 'm' for the monthly period, 'w' for the weekly period and 'd' for the daily period. If no period is selected, 'd' is used as the default value for UNIT.
ROUND: if given, the result is rounded up or down within the given unit. Use 'up' to round up, 'down' to round down.
For example, --shift -2w_up shifts two weeks back, to the Sunday 23:59:59. The option can be given more than once, in which case the shifts are applied in sequence.
See also the TIME environment variable. Setting TIME has a similar effect to using --shift. By default, the base time is the system time at which the script is executed; this can be overridden by the TIME environment variable. The resulting value is then used as the base for the --shift options.
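The time_offset form [<SIGN>]<COUNT>[<UNIT>][_<ROUND>] can be checked before it is passed to --shift. The helper below is a minimal sketch: the function name is_time_offset is hypothetical, and the pattern mirrors only the grammar documented above, so sum-stats itself may accept or reject other forms.

```shell
#!/bin/sh
# Succeed if $1 matches [<SIGN>]<COUNT>[<UNIT>][_<ROUND>] as documented:
# optional +/-, digits, optional h/d/w/m, optional _up or _down.
is_time_offset() {
    printf '%s\n' "$1" | grep -Eq '^[+-]?[0-9]+[hdwm]?(_(up|down))?$'
}

# A few candidate values, some of them deliberately malformed.
for offset in -1w -2w_up 0_down +3 1x up; do
    if is_time_offset "$offset"; then
        echo "$offset: valid"
    else
        echo "$offset: invalid"
    fi
done
```

In this sketch the first four values match the grammar, while 1x (unknown unit) and up (missing COUNT) do not.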
--start time_spec, --finish time_spec
Explicitly sets the time interval to be used. The time_spec is one of the following:
iso timestamp: one of YYYY-MM-DDTHH:MM:SS, YYYY-MM-DDTHH:MM, YYYY-MM-DDTHH, YYYY-MM-DD
unix timestamp: the number of seconds since 1970
time_offset: time is given as an offset to the current time (possibly affected by the --shift option)
Options --start and --finish are mutually exclusive with option -p, which sets the interval implicitly.
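An explicit interval might be requested as sketched below, using two of the ISO timestamp forms listed above and no -p option. The dates, the log path and the output base name are illustrative assumptions.

```shell
#!/bin/sh
# Assumed locations (hypothetical; adjust to the local installation).
LOG=/var/log/kernun
OUT=/tmp/interval-stats

# Explicit interval: a date-only start and a minute-precision finish.
# -p is deliberately absent, since it is mutually exclusive with these.
set -- --start 2024-03-01 --finish 2024-03-15T12:00 -o "$OUT"
echo "sum-stats $*"

# Run only where sum-stats and a readable log are actually available.
if command -v sum-stats >/dev/null 2>&1 && [ -r "$LOG" ]; then
    sum-stats "$@" < "$LOG"
fi
```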
--info list
Instead of creating the statistics, reports some information, given as a comma-separated list of the desired items:
fields: print the fields valid for the particular type
types: print the available types
results: print the available results
interval: print the time interval that would be used
log_files: list the filenames that likely contain the desired time interval, without any compression suffix.
log_files: list the filenames that likely contain the desired time interval.
log_files_ts: print the shell script that cats the files that likely contain the desired time interval.
period_inst_name: period instance name. Prints the suggested name of the periodic statistics, if generated with the current arguments. The name is based on the beginning of the interval: 'YYYYMM' for monthly, 'YYYYWW' for weekly and 'YYYYMMDD' for daily statistics.
oldest_log: print the timestamp of the oldest line in the available logs.
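One possible use of --info is to preview what a periodic run would cover before generating anything. The sketch below asks for several items at once as a comma-separated list. Whether -o must still be supplied together with --info is not stated on this page; it is included here as an assumption because the synopsis lists it as mandatory, and the output base name is hypothetical.

```shell
#!/bin/sh
# Items to report, joined with commas as --info expects.
INFO=interval,log_files,period_inst_name
OUT=/tmp/monthly-stats   # hypothetical; assumed to be required by the synopsis

# Last month's statistics: report the interval, candidate log files and
# suggested instance name instead of generating the statistics.
set -- -p monthly --shift -1m --info "$INFO" -o "$OUT"
echo "sum-stats $*"

# Run only where sum-stats is actually installed.
if command -v sum-stats >/dev/null 2>&1; then
    sum-stats "$@" </dev/null
fi
```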
--db
If present, the newly created statistics are also indexed in the statistics index database.
-o outfile
The output will be saved to outfile.html, accompanied by its data file outfile.json.
TIME
The timestamp used to calculate the interval of dates to be included in the statistics (affected by the period and the shift). If not set, the current time is used.