Bulk Classification – Classifying and Marking / Labeling Large Numbers of Files

As organizations mature their content protection strategies, they typically establish policies that ensure all newly created documents and emails carry a proper classification. Once information is classified, organizations can apply information security controls matched to its classification.

However, most organizations have a large volume of legacy documents that have never been classified. This poses a problem for the organization’s information security policy. How can we bulk classify large numbers of legacy files so appropriate security controls can also be applied to these documents?


We recently spoke with a customer who faced exactly this problem. The company had a mandate requiring every document created or accessed in the last six months to be classified. Office workers were classifying documents manually by adding a classification label to each document’s footer, but this manual process could not keep pace with the mandate. The customer estimated that approximately 2 million files needed to be classified. The file types in scope were Microsoft Word, Excel and PowerPoint files in Office 2003 and Office 2007 formats. The organization’s classification labels were already well defined: PUBLIC, INTERNAL, RESTRICTED and HIGHLY RESTRICTED.
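
As an illustration of the discovery step in this scenario, here is a minimal Python sketch (not the Titus tooling, which was driven by PowerShell) that finds Word, Excel and PowerPoint files accessed within the last six months. It assumes the file system is actually maintaining last-access timestamps, which is not always true (Windows can disable last-access updates, and many Linux mounts use relatime or noatime), and it approximates six months as 183 days. The function name find_candidates is hypothetical.

```python
import os
import time
from pathlib import Path

# Office 2003 (binary) and Office 2007 (Open XML) extensions for Word, Excel and PowerPoint.
# Assumption: macro-enabled variants such as .docm are out of scope.
OFFICE_EXTENSIONS = {".doc", ".xls", ".ppt", ".docx", ".xlsx", ".pptx"}

# Assumption: "six months" approximated as 183 days.
SIX_MONTHS_SECONDS = 183 * 24 * 60 * 60


def find_candidates(root, now=None):
    """Yield Office files under root whose last-access time falls within the window."""
    now = time.time() if now is None else now
    cutoff = now - SIX_MONTHS_SECONDS
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Case-insensitive extension check, since legacy shares often mix cases.
            if path.suffix.lower() not in OFFICE_EXTENSIONS:
                continue
            try:
                st = path.stat()
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue
            if st.st_atime >= cutoff:
                yield path
```

For example, `for path in find_candidates(r"\\fileserver\share"): print(path)` would list the candidate files on a (hypothetical) share. Using a generator keeps memory use flat even when the scan covers millions of files.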

There are a number of automated categorization tools on the market that attempt to categorize documents based on their content. However, none of these tools focus on the security classification of documents, and none of them can apply classification labels within the document itself (i.e. in the header or footer). Bulk classification of existing content is a complex undertaking: consideration must be given to file access, infrastructure performance, visual labeling, and error and exception handling.

In this situation, our Titus Professional Services organization worked with the customer to develop the process and tools needed for bulk classification. The solution consisted of modules for discovery, classification and visual marking of existing documents, and it used Titus utilities driven by PowerShell scripts. Custom solutions could also be built to upload documents to SharePoint.
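
The discovery, classification and marking modules can be pictured as a simple loop that records an outcome for every file rather than stopping at the first failure. The Python sketch below is a hypothetical illustration, not the Titus implementation: classify and mark are placeholder callables standing in for whatever classification logic and labeling utility are actually used, and delay_seconds is an assumed, crude throttle for limiting load on file servers.

```python
import csv
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Result:
    path: str
    label: Optional[str]
    status: str        # "ok" or "error"
    detail: str = ""   # exception summary when status is "error"


def run_bulk(paths: Iterable[str],
             classify: Callable[[str], str],
             mark: Callable[[str, str], None],
             delay_seconds: float = 0.0) -> List[Result]:
    """Classify and mark each file, recording per-file outcomes instead of stopping on errors."""
    results = []
    for path in paths:
        label = None
        try:
            label = classify(path)   # hypothetical: decide the label
            mark(path, label)        # hypothetical: apply the visual marking
            results.append(Result(path, label, "ok"))
        except Exception as exc:     # e.g. locked file, permission denied, corrupt document
            results.append(Result(path, label, "error", f"{type(exc).__name__}: {exc}"))
        if delay_seconds:
            # Crude throttle between files to limit load on shared infrastructure.
            time.sleep(delay_seconds)
    return results


def write_report(results: List[Result], report_path: str) -> None:
    """Write a CSV exception report so failed files can be reviewed or retried."""
    with open(report_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label", "status", "detail"])
        for r in results:
            writer.writerow([r.path, r.label or "", r.status, r.detail])
```

Writing every outcome to a report lets locked, corrupt or permission-protected files be handled in a separate pass, which matters when a single run covers millions of files.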

Much of this functionality already existed in the Titus desktop software. For example, our Titus Classification for Office product can be used to classify and label Office documents. We just needed to work out how to extract that functionality so it could be applied in bulk to a large number of documents.

In many cases the customer already knows what classification to apply to legacy documents, because they know the business process that produced them. In some situations, however, the customer doesn’t know what classification to apply to certain legacy files. In these cases, Titus Professional Services uses our content scanning engine to determine the appropriate classification. This is the same content scanning engine our Message Classification product uses to detect potentially sensitive content in email.
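
The Titus content scanning engine is proprietary, but the general idea of suggesting a label from document content can be shown with a simple rule-based sketch. The patterns below are hypothetical examples, the function suggest_label assumes text has already been extracted from the document, and when several rules match it returns the most sensitive of the customer’s four labels. Returning None when nothing matches leaves the decision to a human reviewer.

```python
import re
from typing import Dict, List, Optional, Pattern

# The customer's labels, ordered from least to most sensitive.
LABELS = ["PUBLIC", "INTERNAL", "RESTRICTED", "HIGHLY RESTRICTED"]

# Hypothetical example rules; a real deployment would define its own.
RULES: Dict[str, List[Pattern[str]]] = {
    "HIGHLY RESTRICTED": [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")],  # US SSN-like pattern
    "RESTRICTED": [re.compile(r"\bconfidential\b", re.IGNORECASE),
                   re.compile(r"\bsalary\b", re.IGNORECASE)],
    "INTERNAL": [re.compile(r"\binternal use only\b", re.IGNORECASE)],
}


def suggest_label(text: str) -> Optional[str]:
    """Return the most sensitive label whose rules match, or None if no rule matches."""
    # Check from most to least sensitive so the strictest matching label wins.
    for label in reversed(LABELS):
        if any(pattern.search(text) for pattern in RULES.get(label, [])):
            return label
    return None
```

Choosing the strictest matching label errs on the side of over-protecting a document, which is usually the safer failure mode for a security classification.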

If you have had experience doing bulk classification, please let us know your thoughts. If you are interested in finding out more about Titus Professional Services for bulk classification of documents, please email us at info@titus.com.
