clear-web-db — tool for managing the Clear Web DataBase
clear-web-db { -d | -n | -m }
clear-web-db { -o | -u | -s | -S } [db]
clear-web-db -l [-f] [db]
clear-web-db -c [locale]
clear-web-db -a [db] incr
clear-web-db -i [db] db2 incr
The clear-web-db utility creates, modifies, and displays the Clear Web operational database, which http-proxy(8) uses to categorize Web pages. It is usually invoked by clear-web-db-update.sh(1), which handles automatic periodic updates of the database.
URLs used by clear-web-db consist of a server domain name followed by an optional path; they include neither the scheme (such as http://) nor the port number. For example, www.tns.cz/index.html is used instead of http://www.tns.cz:80/index.html.
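The URL form described above can be produced from a full URL with a few lines of Python. This is only a sketch: the helper name to_db_url is hypothetical and not part of clear-web-db, and dropping the query string and fragment and lowercasing the host name are assumptions that this page does not confirm.

```python
from urllib.parse import urlsplit


def to_db_url(url: str) -> str:
    """Reduce a URL to 'domain/path', without scheme or port.

    Hypothetical helper, not part of clear-web-db. The query string and
    fragment are dropped, which is an assumption about the expected input.
    """
    # urlsplit only recognises the host part when a scheme or '//' is
    # present, so prepend '//' to inputs that carry no scheme.
    parts = urlsplit(url if "://" in url else "//" + url)
    # .hostname strips the port and lowercases the name.
    host = parts.hostname or ""
    return host + parts.path


if __name__ == "__main__":
    print(to_db_url("http://www.tns.cz:80/index.html"))
```

The result could then be written, one URL per line, to the standard input of an option that reads URLs.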
-o
Reads the input textual data from the standard input and writes the operational database into file db.
-u
Reads the input textual data from the standard input and updates the operational database in file db.
-i
Creates the incremental update data between the operational databases stored in files db and db2 and writes the result into file incr.
-a
Applies the incremental updates from incr to database db.
-s
Reads URLs, one per line, from the standard input, searches for them in database db, and writes the corresponding categories to the standard output. Each output line contains the numeric bitmap of categories followed by a space-delimited list of category names.
-S
Reads URLs, one per line, from the standard input, searches for them in database db, and writes the corresponding categories to the standard output along with the part of the URL that was used to determine the categories. Each output line contains the numeric bitmap of categories followed by the matching URL substring.
-l
Lists the contents of database db. Each line of output contains a hash value, a bitmap of categories, and a list of category names. When incremental update data is listed, records to be deleted are shown as a hash value followed by the character "-". If option -f is specified, flags are displayed between the hash value and the bitmap of categories. A flags value is an OR-combination of the following constants: 0x1 = there may be a record with a longer domain name suffix; 0x2 = there may be a record with a longer path prefix; 0x4 = the record contains categories (without this flag, the record is used only to continue with a longer domain or path according to 0x1 or 0x2); 0x8 = the record is to be deleted (in an incremental update only).
-d
Reads URLs, one per line, from the standard input, and writes digests (hash values) to the standard output.
-c
Writes all known categories (numbers and names) to the standard output. If the locale argument is specified, the category names are localized into the given locale.
-n
Reads category bitmaps, one per line, from the standard input and writes the corresponding lists of category names to the standard output.
-m
Reads lists of category names from the standard input and writes the corresponding category bitmaps to the standard output.
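The flags values displayed by -l -f can be decoded with a short script. The following Python sketch only illustrates the OR-combination described under -l; the names FLAG_MEANINGS and decode_flags are hypothetical, and the function takes an already parsed integer because this page does not state whether flags are printed in decimal or hexadecimal.

```python
# Flag constants as described for option -l with -f.
FLAG_MEANINGS = {
    0x1: "a record with a longer domain name suffix may exist",
    0x2: "a record with a longer path prefix may exist",
    0x4: "the record contains categories",
    0x8: "the record is to be deleted (incremental update only)",
}


def decode_flags(flags: int) -> list:
    """Return the meaning of every bit set in a flags value.

    Hypothetical helper, not part of clear-web-db.
    """
    return [text for bit, text in FLAG_MEANINGS.items() if flags & bit]


if __name__ == "__main__":
    # 0x5 = 0x1 | 0x4: the record carries categories, and a longer
    # domain name suffix may also be present in the database.
    for meaning in decode_flags(0x5):
        print(meaning)
```

A value without 0x4 set, such as 0x1 or 0x3, marks a record that carries no categories of its own and only serves to continue the lookup.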