clear-web-db — tool for managing the Clear Web DataBase
clear-web-db { -d | -n | -m }
clear-web-db { -o | -u | -s | -S } [db]
clear-web-db -l [-f] [db]
clear-web-db -c [locale]
clear-web-db -a [db] incr
clear-web-db -i [db] db2 incr
The clear-web-db utility creates, modifies, and displays the Clear Web operational database, which http-proxy(8) uses to categorize Web pages. It is usually invoked by clear-web-db-update.sh(1), which handles automatic periodic updates of the database.
URLs passed to clear-web-db consist of a server domain name followed by
an optional path; they include neither the method (scheme) nor the port
number. Example: www.tns.cz/index.html is used instead of
http://www.tns.cz:80/index.html.
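The reduction described above can be sketched in Python. This is only an illustration: the helper name to_lookup_key is hypothetical and not part of clear-web-db, and dropping the query string is an assumption, since the text mentions only the method and the port.

```python
from urllib.parse import urlsplit


def to_lookup_key(url):
    """Reduce a full URL to 'domain[/path]' (hypothetical helper)."""
    # urlsplit only recognizes the host when a "//" authority marker
    # is present, so add one for inputs that carry no method.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    # .hostname omits the port (and lowercases the name); the method is
    # simply not copied. Ignoring the query string is an assumption.
    host = parts.hostname or ""
    return host + parts.path


print(to_lookup_key("http://www.tns.cz:80/index.html"))
# prints: www.tns.cz/index.html
```

Input that is already in the reduced form, such as www.tns.cz/index.html, passes through unchanged.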
-o  Reads the textual input data from the standard input and writes the operational database to file db.
-u  Reads the textual input data from the standard input and updates the operational database in file db.
-i  Creates the incremental update data between the operational databases stored in files db and db2 and writes the result to file incr.
-a  Applies the incremental updates from incr to database db.
-s  Reads URLs, one per line, from the standard input, searches for them in database db, and writes the corresponding categories to the standard output. Each output line contains the numeric bitmap of categories followed by a space-delimited list of category names.
-S  Reads URLs, one per line, from the standard input, searches for them in database db, and writes the corresponding categories to the standard output along with the part of the URL that was used to determine the categories. Each output line contains the numeric bitmap of categories followed by the matching URL substring.
-l  Lists the contents of database db. Each output line contains a hash value, a bitmap of categories, and a list of category names. When incremental update data is listed, records to be deleted are shown as a hash value followed by the character “-”. If option -f is specified, flags are displayed between the hash value and the bitmap of categories. A flags value is an OR-combination of the following constants:
    0x1  there may be a record with a longer domain name suffix
    0x2  there may be a record with a longer path prefix
    0x4  the record contains categories (without it, the record is used only to continue with a longer domain or path according to 0x1 or 0x2)
    0x8  the record is to be deleted (in an incremental update only)
-d  Reads URLs, one per line, from the standard input and writes their digests (hash values) to the standard output.
-c  Writes all known categories (numbers and names) to the standard output. If the locale argument is specified, the category names are localized to the given locale.
-n  Reads category bitmaps, one per line, from the standard input and writes the corresponding lists of category names to the standard output.
-m  Reads lists of category names, one per line, from the standard input and writes the corresponding category bitmaps to the standard output.
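The flags values displayed by -l -f are OR-combinations of the four constants listed under option -l, so they can be decoded with a simple bit test. The Python sketch below is only an illustration: the table and function names are hypothetical, and it assumes the flags value has already been parsed into an integer, since the exact notation of the printed value is not specified here.

```python
# Flag bits as documented for "clear-web-db -l -f". The Python names
# are illustrative and not part of the tool.
FLAGS = {
    0x1: "longer domain name suffix may exist",
    0x2: "longer path prefix may exist",
    0x4: "record contains categories",
    0x8: "record to be deleted (incremental update only)",
}


def decode_flags(value):
    """Return the descriptions of all documented bits set in value."""
    return [text for bit, text in FLAGS.items() if value & bit]


for text in decode_flags(0x5):
    print(text)
# prints:
# longer domain name suffix may exist
# record contains categories
```

A value of 0x5, for example, describes a record that carries categories of its own and may also be continued by a record with a longer domain name.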