sum-stats — generates proxy usage statistics from Kernun logs
sum-stats [-p period] [-t type] [-n name] [-l field=limit ...] [--filter field=filter ...] [--spam-threshold value] [--shift time_offset ...] [--start time_spec] [--finish time_spec] [--entitle label] [--info list] [--db] -o outfile
The sum-stats script reads a Kernun log from the standard input and generates proxy usage statistics. The exact contents of the output depend on the proxy type. However, the generated output always retains the following structure:
Summary: totals + Kernun Clear Web database hit-rate (for http-proxy and icap-server)
Histograms: per-hour, per-day, per-weekday (depends on period)
Hitparades: per-client, per-server, ... (depends on type)
-p period
Sets the period (daily, weekly, monthly).
Log items outside the date interval based on this period are
filtered out.
Use --shift to specify which period is generated.
The current period (day/week/month) is generated by default.
For example, use -p weekly --shift -1w to generate the
statistics for the previous week.
-t type
Sets the type of the
proxy. If not set, the default value is proxy
(does not assume any particular proxy type). A list of recognized
proxy types can be found below.
-n name
Sets the name of the proxy
(altname) to be included in the statistics (other proxies are
filtered out). If not set, all proxies are included.
-l field=limit
Sets the limit for the given field (top N clients, servers, ...).
If not set, the field is excluded from the statistics.
The special value 0 means that the field is not limited at all: all values are included in the statistics, regardless of their total count. Note that a field limit of 0 can produce very large statistics, which can cause problems when viewing them.
A list of available fields can be found below.
--filter field=filter
Sets the filter for the given field (clients, servers, ...).
If set, only the log records that match the filter are taken
into account, and the statistics for the field being filtered
are suppressed, since they would be degenerate.
--spam-threshold value
Sets the spam threshold;
mails with spam score above this level are considered SPAM.
If not set, the default value is 5000.
--shift time_offset
Behaves as if the processing were executed earlier or later,
as given by time_offset. The form of time_offset is
[<SIGN>]<COUNT>[<UNIT>][_<ROUND>]
SIGN: '-' for a shift into the past, '+' for a shift into the future. Defaults to '+'.
COUNT: the number of units (days/weeks/months) to shift by. Can be 0 for no shift, which can be useful in conjunction with ROUND.
UNIT: 'h' for hours, 'd' for days, 'w' for weeks, 'm' for months. If omitted, the default UNIT depends on the period selected by --period: 'm' for the monthly period, 'w' for the weekly period and 'd' for the daily period. If no period is selected, 'd' is used as the default UNIT.
ROUND: if given, the result is rounded up or down within the given unit. Use 'up' to round up, 'down' to round down.
For example, --shift -2w_up shifts two weeks back, to Sunday
23:59:59. The option can be given more than once, in which case
the shifts are applied in sequence.
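The time_offset form above can be illustrated with a small parser. This is a minimal sketch written from the grammar in this page only; it is not taken from the sum-stats source. The names parse_time_offset and TimeOffset are hypothetical, and the sketch only splits the value into its parts without computing the shifted time.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Grammar from the text: [<SIGN>]<COUNT>[<UNIT>][_<ROUND>]
_OFFSET_RE = re.compile(r"([+-])?([0-9]+)([hdwm])?(?:_(up|down))?")

# Default UNIT per period as described above; 'd' when no period is selected.
_DEFAULT_UNIT = {"monthly": "m", "weekly": "w", "daily": "d", None: "d"}


@dataclass(frozen=True)
class TimeOffset:  # hypothetical container, for illustration only
    sign: int                # -1 shifts into the past, +1 into the future
    count: int               # number of units to shift by
    unit: str                # 'h', 'd', 'w' or 'm'
    rounding: Optional[str]  # 'up', 'down' or None


def parse_time_offset(text: str, period: Optional[str] = None) -> TimeOffset:
    """Split a time_offset string into SIGN, COUNT, UNIT and ROUND."""
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid time_offset: {text!r}")
    sign, count, unit, rounding = match.groups()
    return TimeOffset(
        sign=-1 if sign == "-" else 1,        # SIGN defaults to '+'
        count=int(count),
        unit=unit or _DEFAULT_UNIT[period],   # UNIT default follows the period
        rounding=rounding,
    )
```

With this sketch, "-2w_up" splits into sign -1, count 2, unit 'w' and rounding 'up', while "0_down" combined with the monthly period yields unit 'm'.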
See also the environment variable TIME. Setting the TIME
environment variable has an effect similar to using --shift.
By default, the base time is the system time at which the script
is executed. This can be overridden by the TIME environment
variable. The resulting value is then used as the base for the
--shift options.
--start time_spec, --finish time_spec
Explicitly sets the time interval to be used. The time_spec is
one of the following:
iso timestamp: one of YYYY-MM-DDTHH:MM:SS, YYYY-MM-DDTHH:MM, YYYY-MM-DDTHH, YYYY-MM-DD
unix timestamp: the number of seconds since 1970-01-01 00:00:00 UTC
time_offset: the time is given as an offset to the current time (possibly affected by the --shift option)
Options --start and --finish are mutually exclusive with option
-p, which sets the interval implicitly.
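The three time_spec forms can be told apart mechanically. The following is a minimal sketch, not the logic used by sum-stats itself; classify_time_spec is a hypothetical name. Note one assumption: a bare unsigned integer fits both the unix timestamp form and the time_offset form (COUNT only), and this page does not say how the script resolves that, so the sketch arbitrarily treats it as a unix timestamp.

```python
import re
from datetime import datetime

# The four ISO layouts listed above, most specific first.
_ISO_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
    "%Y-%m-%dT%H",
    "%Y-%m-%d",
)

# time_offset grammar: [<SIGN>]<COUNT>[<UNIT>][_<ROUND>]
_OFFSET_RE = re.compile(r"[+-]?[0-9]+[hdwm]?(?:_(?:up|down))?")


def classify_time_spec(text: str) -> str:
    """Return 'iso', 'unix' or 'offset' for a time_spec string."""
    for fmt in _ISO_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "iso"
        except ValueError:
            continue
    # Assumption of this sketch: a bare unsigned integer is a unix timestamp.
    if re.fullmatch(r"[0-9]+", text):
        return "unix"
    if _OFFSET_RE.fullmatch(text):
        return "offset"
    raise ValueError(f"not a valid time_spec: {text!r}")
```

Under these assumptions, "2024-03-01T12" is classified as an ISO timestamp, "1700000000" as a unix timestamp, and "-2w_up" as a time_offset.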
--info list
Instead of creating the statistics, reports some information, given as a comma-separated list of the desired info:
fields: print the fields
valid for the particular type
types: print the available
types
results: print the
available results
interval: print the
time interval that would be used
log_files: list the filenames
that likely contain the desired time interval, without any
compression suffix.
log_files: list the filenames
that likely contain the desired time interval.
log_files_ts: print the
shell script that cats the files that likely contain the
desired time interval.
period_inst_name: period
instance name. Prints the suggested name of the periodic
statistics, if generated with the current arguments.
The name is based on the beginning of the interval: 'YYYYMM'
for monthly, 'YYYYWW' for weekly and 'YYYYMMDD' for daily
statistics.
oldest_log: print the
timestamp of the oldest line in the available logs.
--db
If present, the newly created statistics are also indexed in the statistics index database.
-o outfile
The output will be saved to outfile.html,
accompanied by its data file outfile.json.
TIME
The timestamp used to calculate the interval of dates to be included in the statistics (affected by the period, shift). If not set, the current time is used.
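Putting the options together, a run of sum-stats could be driven from a script roughly as follows. This is a sketch under stated assumptions: build_sum_stats_argv and run_sum_stats are hypothetical helper names, the log path is supplied by the caller, and the value format expected in TIME is not specified in this page, so it is passed through unchanged. Only options documented above are used.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def build_sum_stats_argv(
    outfile: str,
    period: Optional[str] = None,
    shifts: Sequence[str] = (),
) -> List[str]:
    """Assemble a sum-stats command line from the documented options."""
    argv = ["sum-stats"]
    if period is not None:
        argv += ["-p", period]
    for shift in shifts:          # --shift may be given more than once
        argv += ["--shift", shift]
    argv += ["-o", outfile]       # results go to outfile.html and outfile.json
    return argv


def run_sum_stats(log_path: str, argv: List[str], time_value: Optional[str] = None) -> None:
    """Feed a Kernun log to sum-stats on standard input."""
    env = dict(os.environ)
    if time_value is not None:
        env["TIME"] = time_value  # overrides the base time; format is an assumption
    with open(log_path, "rb") as log:
        subprocess.run(argv, stdin=log, env=env, check=True)
```

For example, build_sum_stats_argv("weekly-report", period="weekly", shifts=["-1w"]) yields the command line for the previous week's statistics, matching the -p weekly --shift -1w example above.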