17. Content Processing

Kernun UTM provides tools to investigate and filter the contents of the passing packets. These tools make it possible for the administrator to filter the contents of the HTML and mail traffic.

17.1. Content Type Detection

The third-level ACL, doc-acl, accepts a mime-type specification as an entry condition (see the access-control(7) and configuration(7) manual pages). Content type detection can be performed by several different methods; the administrator can set the order in which the methods are used either globally for the whole proxy or separately for each of its doc-acls.

The following content-type detection methods are available:

content-type

(Original Content-Type) — the proxy uses the type declared by the document originator in the Content-Type header;

extension

(File Name Extension Mapping) — the proxy tries to guess the MIME type from the name of the document (if specified);

magic

(Magic Number Recognition) — the proxy reads the initial block of the document and tries to guess the file type from it with help of the magic number file (see magic(5) manual page).

Note

The magic method is skipped in the http-proxy when the response is partial and does not contain the beginning of the document. A partial HTTP response either contains the Content-Range HTTP header or has the content type multipart/byteranges. The magic method is always skipped when the content type is multipart/byteranges. The request-acl.delete-req-hdr-range item can be used to make the server send the entire response.

The common usage of mime-type conditions is in the http-proxy, where the administrator can forbid selected types of documents (e.g., video, applications). In our example, we will start with the initial configuration file, as shown in Section 2, “The Initial Configuration”, and deny all documents recognized as any of the video mime-types for the http-proxy HTTP.

First, we need to specify the order of the Content-Type detection methods. We add the doctype-identification section to the http-proxy HTTP, insert the order item into it, and select the intended methods in its detail. To do so, we append three values, magic, extension and content-type, to the doctype-ident-method-list order field and leave the direction-set field unchecked (it can be used to define a different order for each traffic direction, upload and download).

The magic and extension methods need further configuration: the former a magic file and the latter a file that contains the extension to the content-type mapping database. We specify them by adding two shared-file items pointing to the corresponding configuration files. In our example, we will use the sample configuration file for magic located in samples/shared/magic and the extension configuration file located in samples/shared/mime.types. We configure the selected methods to use these shared files by inserting two new items into the added doctype-identification section. The magic item points to the name of the magic configuration shared-file and the mime-types item to the name of the extension configuration shared-file.
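The ordered detection described above can be illustrated with a minimal Python sketch. It is not the proxy's implementation: it assumes that the methods are tried in the configured order and that the first one yielding a type wins. The two small tables are hypothetical stand-ins for the magic and mime.types shared files.

```python
import os

# Hypothetical stand-in for the magic file: leading bytes -> MIME type.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF8", "image/gif"),
]

# Hypothetical stand-in for the mime.types file: extension -> MIME type.
EXTENSION_MAP = {
    ".mp4": "video/mp4",
    ".avi": "video/x-msvideo",
    ".html": "text/html",
    ".pdf": "application/pdf",
}

def by_magic(head, name, declared):
    """Guess the type from the initial block of the document."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None

def by_extension(head, name, declared):
    """Guess the type from the document name, if one is known."""
    if not name:
        return None
    return EXTENSION_MAP.get(os.path.splitext(name)[1].lower())

def by_content_type(head, name, declared):
    """Use the type declared by the originator, without parameters."""
    if not declared:
        return None
    return declared.split(";")[0].strip().lower() or None

METHODS = {
    "magic": by_magic,
    "extension": by_extension,
    "content-type": by_content_type,
}

def detect(order, head=b"", name=None, declared=None):
    """Return (mime_type, method) from the first method that succeeds."""
    for method in order:
        mime = METHODS[method](head, name, declared)
        if mime:
            return mime, method
    return None, None

ORDER = ["magic", "extension", "content-type"]
```

With this order, a document whose first bytes look like a PDF is reported as application/pdf even if its name ends in .mp4; the declared Content-Type is consulted only when neither of the earlier methods produces a result.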

Having configured the order of the Content-Type detection methods, we can proceed to the filtering of all video documents. We do so by inserting a new doc-acl called VIDEO into the http-proxy HTTP section. We add it right above the doc-acl DOCOK section, so that VIDEO takes precedence. We restrict the ACL to the video MIME type by inserting the mime-type option. We define the set of matching MIME types in str-set type. In this example we insert a single item, video. In a more complex situation, regular expressions can be used to define all the types to be matched by the ACL. Finally, we need to add the deny item to the inserted doc-acl VIDEO. Figure 5.68, “Content Type detection configuration for HTTP proxy” shows the relevant part of the configuration.
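The effect of placing VIDEO above DOCOK can be pictured with a short Python sketch. It rests on several assumptions that the text does not confirm in detail: doc-acls are evaluated from top to bottom with the first match deciding, a bare type such as video covers every video/... subtype, and DOCOK accepts everything. The case where no ACL matches is left undefined here.

```python
def mime_matches(pattern, mime):
    # Assumption: "video" matches "video/mp4", "video/x-msvideo", and so on.
    return mime == pattern or mime.startswith(pattern + "/")

# (name, mime-type condition or None for "no condition", action)
DOC_ACLS = [
    ("VIDEO", ["video"], "deny"),
    ("DOCOK", None, "accept"),
]

def decide(mime, acls=DOC_ACLS):
    """Return (acl_name, action) of the first ACL whose condition matches."""
    for name, mime_types, action in acls:
        if mime_types is None or any(mime_matches(p, mime) for p in mime_types):
            return name, action
    return None, None  # behaviour without a matching ACL is not modelled
```

If the two entries were swapped, DOCOK would match first and video documents would be accepted, which is why the order matters.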

Figure 5.68. Content Type detection configuration for HTTP proxy

The resulting configuration file can be found among Kernun UTM samples under the name doctype-detection.cml in /usr/local/kernun/conf/samples/cml. For more detailed information on Content Type detection, see the doctype-identification(7) manual page.

17.2. HTML Filtering

Besides other tools used to filter whole packets and documents (URL filtering, antivirus checking, etc.), Kernun UTM provides an HTML filter that filters (or replaces) elements and attributes inside an HTML document. The filter is applied to every passing document on the third level of ACL processing, so it is applicable with all proxies that have the doc-acl section (http-proxy, imap4-proxy, pop3-proxy, etc.). The filter gets the whole HTML document, and deletes (or replaces) undesirable elements and attributes according to the specified rules. If there are no rules in the HTML filter, the document will be passed “as-is”. If we create a rule of some type, all the matching elements (or attributes) will be accepted/denied according to the matching rule. If no rule matches, but there is a rule of the element/attribute type specified, the element/attribute will be denied.

For example, the administrator may want to deny all the Adobe Flash animations and replace all the URIs that refer to “suspicious” Web sites. We create a system-level html-filter section and name it HTMLFILTER. Now we need to add rules that will filter the elements and attributes of the document. First, we want to filter all the Flash animations. These can be contained in two HTML elements, embed and object, so we need to delete these two elements, but not all of them – only those with the application/x-shockwave-flash Content-Type. We add an embed-tag-type item, which represents a filter rule that applies to embed tags with the corresponding content type. Now, we need to specify the content types we want to filter (either as whole Content-Type names or using regexps) and the action that is to be done (accept or deny). We append one value, “application/x-shockwave-flash” to str-set val and select deny from the action combo box. Now, we have filtered all the embed HTML elements with flash animations. We can do the same with the object elements using the object filter item set to the same values.

The items added so far filter out the undesirable contents. However, it may be useful to warn the users that we have changed the documents they are viewing. We can do so using the filter items whose names begin with replace-, which define the content that will replace a deleted element/attribute. For elements that can appear in both the head and body HTML elements (such as embed and object), we can set the replacement text separately for each of the two cases. In our sample, we will create a replace-body-embed-tags item and set its value to “Flash embed tag DENIED” and a replace-body-object-tags item and set its value to “Flash object tag DENIED”.

We must not forget to create accepting rules for both object and embed-tag-type that will accept all the Content Types. Otherwise, the previously added rules would delete all the embed and object elements.
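The reason for the catch-all accepting rule can be seen in a small Python sketch of the decision logic for one rule type. It follows the semantics described earlier in this section (no rules: pass; rules present but none matching: deny) and additionally assumes that the first matching rule decides; the rule representation is hypothetical.

```python
import re

def decide(rules, value):
    """rules: list of (compiled regexp, "accept" | "deny") for one rule type."""
    if not rules:
        return "accept"      # no rule of this type: the element passes as-is
    for pattern, action in rules:
        if pattern.search(value):
            return action    # assumption: the first matching rule decides
    return "deny"            # rules exist, none matched: implicit deny

FLASH = re.compile(r"^application/x-shockwave-flash$")

# Only the deny rule: every other embed element is dropped as well.
deny_only = [(FLASH, "deny")]

# Deny rule followed by a rule accepting all content types.
deny_then_accept = [(FLASH, "deny"), (re.compile(r".*"), "accept")]
```

With deny_only, an embed element of type video/mp4 is denied by the implicit rule; with deny_then_accept, only the Flash content type is denied.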

Clickjacking protection can be implemented by filtering IFRAME elements. This is done by adding iframe-tag-src rules, which follow the convention stated above. The SRC attribute of the IFRAME element is matched against a regexp (/^http:\/\/([^.]+\.)*kernun.com\// in the example matches the domain kernun.com and its subdomains). The replacement text can be set by replace-iframe-tags.
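To see what the regexp above accepts, it can be tried out in Python. This is only an approximation: it assumes that Python's re syntax behaves like the proxy's regexp dialect for this pattern, and the /.../ delimiters and the escaped slashes are dropped because Python does not need them.

```python
import re

# Pattern from the text, without the /.../ delimiters.
IFRAME_SRC = re.compile(r"^http://([^.]+\.)*kernun.com/")

def src_matches(src):
    """True if the IFRAME SRC value matches the example pattern."""
    return IFRAME_SRC.search(src) is not None

for candidate in (
    "http://kernun.com/",
    "http://www.kernun.com/index.html",
    "http://www.example.org/",
    "https://www.kernun.com/",
    "http://kernunXcom/",
):
    print(candidate, src_matches(candidate))
```

Under these assumptions the pattern matches kernun.com and its subdomains over http only; https URLs do not match. Because the dot in kernun.com is not escaped, it matches any character, so a stricter variant would write kernun\.com.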

The HTML filter provides another type of useful rules: attribute filters. These can be used to delete (or replace) whole attributes from the document. In our example, we want to replace all the “suspicious” URI attributes with a neutral one. We will add a new uri item rule and set it to deny the undesired URI regexps, in our case /.*photo.*/, /.*video.*/ and /.*warez.*/. Again, as we have created a new filter rule, we need to allow all other URIs that do not match this rule. We add a new uri item and set it to accept all the URIs (*). Now we add the replacement for the whole URI attribute (not only the value that is matched in the uri item) by inserting a new replace-uri item rule and setting it to a neutral URI, for example "href='http://www.kernun.com'"[39].
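The URI rules and the replacement can be sketched in Python as follows. The sketch assumes first-match evaluation and represents the accept-all rule (*) as the regexp .*; the function name and the way an attribute is rendered are illustrative only.

```python
import re

URI_RULES = [
    (re.compile(r".*photo.*"), "deny"),
    (re.compile(r".*video.*"), "deny"),
    (re.compile(r".*warez.*"), "deny"),
    (re.compile(r".*"), "accept"),   # stands in for the accept-all rule (*)
]

# The replacement covers the whole attribute, not only its value.
REPLACEMENT = "href='http://www.kernun.com'"

def filter_uri_attribute(name, value):
    """Return the attribute text that would remain in the document."""
    for pattern, action in URI_RULES:
        if pattern.search(value):
            if action == "accept":
                return f'{name}="{value}"'
            return REPLACEMENT
    return REPLACEMENT  # implicit deny when rules exist but none matches
```

For example, filter_uri_attribute("src", "http://example.org/video.mp4") yields the neutral href attribute, which illustrates the side effect described in the footnote: a denied src attribute is replaced by an href.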

Having created the whole HTML filter, we can use it in any proxy that has a third-level ACL doc-acl. We tell the proxy to use an HTML filter by inserting an html-filter item with the filter's name into the proxy's doc-acl. The use of a slightly more complex HTML filter blocking all the Flash documents and the specified URIs in the http-proxy HTTP is depicted in Figure 5.69, “HTML filter example”.

Figure 5.69. HTML filter example

The complete resulting configuration can be found in /usr/local/kernun/conf/samples/cml/html-filter.cml. More complex HTML filter rules can be found in /usr/local/kernun/conf/samples/include/html-filter.cml. For more information on HTML filtering, see the mod-html-filter(5) manual page.

17.3. MIME Processing

In Section 2, “The Initial Configuration”, we created simple mail-handling proxies (smtp-proxy, imap4-proxy and pop3-proxy), which just passed the e-mail and did not check it in any way. Kernun UTM provides tools to decode the e-mail MIME structure and examine each document. The mail may be scanned for viruses (see Section 15, “Antivirus Checking of Data”) and for spam content (see Section 16, “Antispam Processing of E-mail”). These scans are added to the proxy by inserting a use-antivirus and/or use-antispam item into the proxy and setting them to the respective antivirus/antispam section name.
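What decoding the MIME structure and examining each document means can be illustrated with Python's standard email package. This is a generic sketch, not Kernun's mail processing; the sample message and the function names are made up for the illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_sample():
    """Build a two-part message: a text body and a PDF attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Report"
    msg.set_content("See the attachment.")
    msg.add_attachment(
        b"%PDF-1.7 sample",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg.as_bytes()

def list_documents(raw):
    """Walk the MIME tree and list every leaf part as (type, file name)."""
    msg = message_from_bytes(raw, policy=policy.default)
    documents = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        documents.append((part.get_content_type(), part.get_filename()))
    return documents

print(list_documents(build_sample()))
```

Each leaf part found this way corresponds to one document that a scanner or filter could examine separately.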

Kernun UTM also has a mail filter, which can be used to repair the MIME structure and headers of the e-mail according to the corresponding RFCs, so that other mail servers can process the mail and possibly deliver it to the recipient(s). The errors in e-mails that are to be corrected can be specified in the mail filter. The relevant part of a sample configuration with a mail filter is shown in Figure 5.70, “Mail filter example in use with IMAP4, POP3 and SMTP proxies”; the entire configuration can be found in /usr/local/kernun/conf/samples/cml/mime-processing.cml.

Figure 5.70. Mail filter example in use with IMAP4, POP3 and SMTP proxies

For more information on MIME filtration, see the mod-mail-doc(5) manual page.



[39] This rule might replace attributes other than href (such as src or action) with a href, but such a change will not “damage” the document more than simple deletion of the attribute would.