doctype-identification — document type recognition methods and configuration
The third-level ACL, doc-acl, has the mime-type specification as an entry condition (see the access-control(7) and configuration(7) manual pages). There are several methods that can be used to recognize the type of a document and several ways to control them in the configuration. The recognized type can be used in ACL matching, and it can also be sent instead of the original Content-Type (see force-doctype-ident in the acl(5) manual page).
The following methods are used by Kernun applications. The order in which they are used is defined in the configuration. They are tried one after another until one succeeds; if none does, the type remains unknown and matches only the empty string "".
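The fallback behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not Kernun's implementation: the function names, the dictionary used to describe a document, and the one-entry extension table are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical method signature: takes a document description and returns
# a MIME type, or None when the method cannot decide.
Method = Callable[[dict], Optional[str]]


def by_content_type(doc: dict) -> Optional[str]:
    # Hypothetical stand-in for the content-type method.
    return doc.get("content_type")


def by_extension(doc: dict) -> Optional[str]:
    # Hypothetical stand-in for the extension method (one-entry table).
    name = doc.get("name", "")
    if "." not in name:
        return None
    return {"pdf": "application/pdf"}.get(name.rsplit(".", 1)[-1].lower())


def identify(doc: dict, order: Sequence[Method]) -> str:
    """Try each method in the configured order; return "" if none succeeds."""
    for method in order:
        mime_type = method(doc)
        if mime_type:
            return mime_type
    return ""  # unknown type, matches only the empty string
```

With the order [by_content_type, by_extension], a document that has no declared type but is named "a.pdf" is resolved by the second method, while a document with neither a declared type nor a name keeps the unknown type "".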
The default method is quite simple. The proxy uses the type declared by the document originator (server for download, client for upload) in the Content-Type header (e.g. SMTP or MIME header).
This method is clear and fast and needs no special configuration, but it has two disadvantages: some protocols, such as FTP, cannot use it, and the others are quite vulnerable to type faking.
Method name: content-type
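As a rough illustration of this method, the following Python sketch extracts the declared type from a Content-Type header value using the standard email package. The function name declared_type is hypothetical, and returning "" for a missing header is an assumption made to mirror the unknown type described above; it is not taken from Kernun.

```python
from email.message import Message


def declared_type(header_value):
    """Return the MIME type declared in a Content-Type header value."""
    if not header_value:
        return ""  # assumption: no header means the method fails
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() drops parameters such as charset and lowercases
    # the type; a malformed value falls back to "text/plain".
    return msg.get_content_type()
```

Note that the result is whatever the originator chose to declare, which is why this method is open to type faking.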
If the name of the document (or URL) is available, the type can be guessed by searching a database that maps filename extensions to MIME types.
This method has characteristics similar to the default one (clarity, speed, but also vulnerability). It needs somewhat more information to be well configured, namely a file with the extension-to-type mapping database.
Method name: extension
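A minimal Python sketch of an extension lookup follows. It assumes the mapping file uses the common mime.types layout (a MIME type followed by its extensions, with # comments); the format Kernun actually expects is not stated here, and the helper names and sample data are hypothetical.

```python
def load_mime_types(lines):
    """Build an extension-to-type mapping from mime.types style lines."""
    mapping = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def type_by_extension(name, mapping):
    """Guess the type from the extension of a filename or URL path."""
    base = name.rsplit("/", 1)[-1]  # last path component only
    if "." not in base:
        return ""  # no extension, the method fails
    return mapping.get(base.rsplit(".", 1)[-1].lower(), "")


# Hypothetical sample data in the assumed file format.
SAMPLE = """\
# type            extensions
text/html         html htm
application/pdf   pdf
image/png         png
"""
MIME_TYPES = load_mime_types(SAMPLE.splitlines())
```

For instance, type_by_extension("/docs/Report.PDF", MIME_TYPES) yields "application/pdf" whatever the file really contains, which illustrates the vulnerability mentioned above.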
The last method is similar to the one used, for example, by the system command file. The proxy reads the initial block of the document (the size of the block is configurable) and tries to guess the file type based on this block, with the help of a magic number file (see the magic(5) manual page).
This method is the most complicated: it needs to gather some data from the document originator before the control decisions are made. On the other hand, it yields results very close to the real content of documents, regardless of the data originator's instructions.
Method name: magic
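The idea can be sketched in Python with a tiny hard-coded signature table. A real magic(5) file is far richer, so this is only an illustration: the table, the function name type_by_magic, and the default scan_size of 4096 bytes are assumptions, not Kernun defaults.

```python
import io

# Tiny hypothetical signature table: (offset, magic bytes, MIME type).
SIGNATURES = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"PK\x03\x04", "application/zip"),
]


def type_by_magic(stream, scan_size=4096):
    """Guess the type from the first scan_size bytes of the document."""
    block = stream.read(scan_size)  # only the initial block is inspected
    for offset, magic, mime_type in SIGNATURES:
        if block[offset:offset + len(magic)] == magic:
            return mime_type
    return ""  # nothing matched, the method fails
```

The result depends only on the bytes read, not on the declared type or the filename. A scan size that is too small can prevent a match, since the whole signature must fit into the initial block.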
In order to ensure correct operation, it is necessary to define the order of the methods and some additional information. If no order is given, only the content-type method is used.
Each proxy can set global parameters and the default order of methods in the doctype-identification section (see the application(5) manual page).
mime-types shared-file-name;
This item must be used if the extension method is used anywhere in the proxy configuration.
magic [filename [scan-size]];
This item should be used if a nonstandard filename or a different block size is to be used for the magic method.
order [for direction] order;
This item is repeatable, but the only reason for this is to enable a different method order for upload and download. In protocols where direction makes sense (e.g. FTP or IMAP4, contrary to POP3 or SMTP), the keyword for with a value can be used to distinguish between upload and download definitions. The order is simply a list of the above-mentioned keywords (method names).
Each ACL on the first two levels (with some exceptions, such as delivery-acl in SMTP) can redefine the default order (see the acl(5) manual page).
doctype-ident-order [for direction] order;
This item has the same syntax and semantics as the order item of the proxy global doctype-identification section.
For a particular transfer, the order in the second-level ACL is searched for; then (if not found), the one in the first-level ACL is tried and, finally, the order from the proxy global section is used (if any).
Suppose the following configuration:
proxy ... {
    doctype-identification {
        ...
        doctype-ident-order for download { extension, magic };
    }
    session-acl sa-1 {
        ...
        doctype-ident-order for upload { content-type, magic };
    }
    session-acl sa-2 {
        ...
        doctype-ident-order for download { };
    }
    session-acl sa-3 {
        ...
        doctype-ident-order { content-type, magic };
    }
}
In this case, downloads according to sa-1 use the "{ extension, magic }" order, while uploads use "{ content-type, magic }". Downloads according to sa-2 use no method (the type will be ""), while uploads use the default method (content-type). Finally, transfers via sa-3 use "{ content-type, magic }" in both directions.
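The lookup rule and the example above can be modelled with a short Python sketch. The data layout (lists of (direction, methods) pairs, with None meaning an entry without the for keyword) and the function names are hypothetical. How Kernun resolves several matching entries inside one ACL is not described here, so the sketch simply assumes the first match wins.

```python
DEFAULT_ORDER = ["content-type"]  # used when no order is configured


def find_order(entries, direction):
    """Return the first order that applies to the direction, or None."""
    for entry_direction, methods in entries:
        # None models an entry without "for": valid for both directions.
        if entry_direction is None or entry_direction == direction:
            return methods
    return None


def effective_order(direction, second_level, first_level, global_section):
    """Search second-level ACL, then first-level ACL, then global section."""
    for entries in (second_level, first_level, global_section):
        order = find_order(entries, direction)
        if order is not None:  # an empty list is a valid (empty) order
            return order
    return DEFAULT_ORDER


# The configuration from the example, in the assumed data layout.
GLOBAL = [("download", ["extension", "magic"])]
SA_1 = [("upload", ["content-type", "magic"])]
SA_2 = [("download", [])]
SA_3 = [(None, ["content-type", "magic"])]
```

With no second-level ACL involved, effective_order("upload", [], SA_2, GLOBAL) falls through to the default order, while effective_order("download", [], SA_2, GLOBAL) returns the empty list, so no method is tried and the type stays "".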
Kernun: acl(5), application(5), access-control(7), configuration(7)
FreeBSD: file(1), magic(5)