Five Key Considerations When Starting Your Data Classification Project


In the few short years I have been with TITUS, I have seen a shift from having to educate organizations about “why classification is important” to explaining how TITUS will make their classification initiative a success. With the rapid growth in the classification market come some interesting dynamics, such as new competition, new partnership opportunities, and even new ways of defining what data classification means.

Data classification was recently added to two different Gartner Hype Cycles: Information Governance and Data Security. This suggests that vendors position data classification solutions in many different ways, which can create confusion for those looking to classify their data. Knowing your data is foundational to both information governance and data security; the challenge is knowing whether the classification solution you are considering meets both your immediate and future requirements.

Five Key Considerations When Choosing a Classification Solution

Below are five things to consider when starting your data classification project:

  • Determine your data classification objective(s)
    Data classification can have a positive impact on multiple aspects of your business, such as data security, user security awareness, compliance, and information lifecycle management. Given the breadth of value classification can bring to an organization, it is important to start by identifying your key success factors. We like to call this the “crawl” phase, or the initial project benefit. For example, if your organization’s mandate is to use data classification to optimize data security (e.g., enterprise DLP integration), then your “crawl” objectives will look a little different from those of a mandate to enable a downstream information governance solution (e.g., archiving/retention).
  • Find out the kind of data the vendor classifies
    A data classification vendor can classify two types of data: unstructured data (such as data found in emails, Office documents and files) and structured data (data found in databases). Most data classification vendors focus on unstructured data, as this is where the biggest risk of a data breach lies (some stats suggest 80%+ of data in organizations is unstructured), but some vendors classify only structured data. If you have to prioritize between the two, start with unstructured data: it is constantly being shared internally as well as externally to move business forward. Furthermore, structured data has a habit of turning into unstructured data in the form of reports, lists, etc. exported from line-of-business applications like ERP or CRM.
  • Determine what data you want to classify
    Vendors offer two approaches to data classification – in some cases a vendor may offer both:

    1. New data is classified at the point of creation. As emails, documents, or files are created, a classification is applied. It is important to determine which ways of applying a classification (automatic, suggested, or user-applied) the solutions you are considering provide. Most data that contains PII, PCI and PHI is easy for a system to auto-classify. But for data like intellectual property (IP), the manual or user-applied classification options will be important.
    2. Legacy data is classified at rest. In this approach, the solution scans “data at rest” in network file shares and cloud repositories to identify and classify content based on corporate policies.

Depending on your particular requirements, one approach may be more important for your initial objectives and drive the solution selection. Be aware, however, that as your project matures, both approaches will be necessary.
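To make the auto-classification point concrete, here is a minimal sketch of pattern-based detection. It is an illustration only and not how TITUS or any other product works: the label names, the regular expressions, and the `suggest_label` function are all hypothetical, and the patterns cover only card-like numbers (checked with the Luhn checksum) and US-style Social Security numbers. Content with no recognizable pattern, such as intellectual property, falls through to the user.

```python
import re

# Hypothetical patterns and labels, chosen for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US-style SSN format
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")    # 13-19 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def suggest_label(text: str) -> str:
    """Suggest a classification label based on simple content patterns."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return "Restricted (PCI)"
    if SSN_RE.search(text):
        return "Confidential (PII)"
    # No pattern matched: IP and similar content needs a user-applied label.
    return "Unclassified - ask the user"


if __name__ == "__main__":
    print(suggest_label("Card: 4111 1111 1111 1111"))
    print(suggest_label("Draft product roadmap for next year"))
```

The same function could be run at the point of creation or over files found during a scan of data at rest; only the trigger differs.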

  • Understand where you want to classify data
    Data resides in many places, so it is important to define where you want to classify it. As organizations move to a hybrid of on-premises and cloud storage, this question becomes more pressing. The natural starting point is to give knowledge workers the ability to classify from the desktop. However, more and more of the data they create and use will be shared and stored in locations other than the local hard drive or network file share. With enterprise file sync-and-share services in use at enterprises around the world, it is equally important to ensure data is classified on platforms such as Box, Dropbox, OneDrive and SharePoint Online. Finally, there is the “in between” world of mobile devices. Support for email and document classification on mobile devices is an important consideration when evaluating the completeness of a data classification solution.
  • Be certain the classification is persistent
    This last consideration is possibly one of the most important. Many vendors talk about data classification, but in fact only provide basic data identification. Identified files have their properties/content scanned and the sensitivity logged into a database, but the file itself has not changed. The classification in this case would not be recognizable to other data management systems and may be completely lost if the file is copied or moved. For the classification to be persistent, it needs to be added to the file’s metadata. Applying classification to the metadata ensures that it travels with the document wherever it moves. Without the classification in the metadata, downstream data governance and security systems will not be able to leverage the classification for optimized policy enforcement.

Bonus Consideration:

  • Get to Know Your Vendor
    The experience and support provided by the vendor count as much as the solution features, and sometimes more. During your evaluation, find out what support the vendor provides and whether it matches what your organization will need. In addition, understanding the vendor’s roadmap and product direction is key to knowing whether the classification solution fits your immediate and future requirements and infrastructure.
