Deconstructing three machine learning myths
Learn about three myths surrounding machine learning and what they mean for your data security strategy.
Last week, I wrote about how machine learning can be practically applied to data classification and protection. There’s no denying the fact that machine learning can help you identify and protect sensitive information. It reduces file analysis to seconds or minutes, and gives people more time to analyze results and determine what needs to be done next.
However, many people believe that machine learning is a silver bullet for their security problems. But as Raffael Marty, VP of Corporate Development at Forcepoint and one of the leading industry experts on big data and analytics, points out, relying heavily on technology is dangerous and can even create a false sense of security.
So, how can machine learning help you keep your organization’s most sensitive information secure?
The first step is to break down three common machine learning myths.
Machine Learning Myth #1: Machines just know what sensitive information is
As business leaders, we assume our most sensitive information is locked down. And if a piece of sensitive information is leaked, we can simply push a few buttons to find out where it is, and bring it back to safety.
That’s a myth.
The use of full automation certainly has its place in areas such as detecting anomalies or operating within a concrete set of rules. But most organizations don’t have mature data management practices.
Less than 1% of unstructured data is analyzed or used at all, and people have access to more data than they should.
The direct identification of sensitive material is too difficult for an algorithm to learn on its own, especially when it can’t factor in context or nuance without prior knowledge.
Tip: Make sure everyone in your organization understands what classification or category needs to be attached to a document based on the type of information it contains.
Machine Learning Myth #2: Machine learning will work wonders with tons of data
As any company that has deployed a CRM program knows, what matters is the quality of the data that goes in rather than the volume of it. It’s no different for information security and, as the old saying goes, “garbage in, garbage out!”
It takes time and resources to collect, analyze, double and triple check, and finally prepare a really good and accurate data set, based on the context of its regular usage, to train your algorithms.
That’s the most practical way to get machine learning to accurately identify the information you have and help you determine what you need to do to protect it.
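To make the "double and triple check" step more concrete, here is a minimal sketch of an automated audit over a labeled training set. It is only an illustration: the function name `audit_training_set`, the label names, and the specific checks (empty text, unknown labels, identical text with conflicting labels) are assumptions chosen for this example, not a prescribed method.

```python
from collections import Counter, defaultdict


def audit_training_set(examples, allowed_labels):
    """Flag basic quality problems in a list of (text, label) pairs.

    Hypothetical helper for illustration; real data sets will need
    checks specific to your organization's content and policies.
    """
    labels_by_text = defaultdict(set)
    empty = []      # indexes of examples with no usable text
    unknown = []    # indexes of examples with a label outside the agreed set

    for i, (text, label) in enumerate(examples):
        # Normalize case and whitespace so near-identical copies are grouped.
        normalized = " ".join(text.lower().split())
        if not normalized:
            empty.append(i)
            continue
        if label not in allowed_labels:
            unknown.append(i)
        labels_by_text[normalized].add(label)

    # The same text carrying different labels signals inconsistent labeling.
    conflicting = sorted(
        text for text, labels in labels_by_text.items() if len(labels) > 1
    )
    label_counts = Counter(label for _, label in examples)

    return {
        "empty": empty,
        "unknown_label": unknown,
        "conflicting": conflicting,
        "label_counts": dict(label_counts),
    }
```

A report like this does not replace human review. It only points data stewards at the examples most likely to need a second look before the set is used for training.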
Without reliable training data sets, you run the risk of frustrating users as well as data stewards or security analysts. If your users get frustrated, you’ll face a constant uphill battle to get them involved and support your security program, no matter how much promise a piece of technology has to make their lives easier.
Tip: Bring in the right stakeholders from across your business to understand what type of data they have and what their policies are for information that is created, shared, stored, and deleted.
Machine Learning Myth #3: You can build and train an algorithm once, and it will do the rest on its own
Data hoarding is a real problem, but the sensitivity of information changes all the time. What is classified as sensitive today may eventually become public information because context changes. This is why it’s important to separate “what” something is (its category) and “how” it should be handled (based on its sensitivity).
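One way to picture this separation is the sketch below, where a document stores only its category and the handling level is looked up from a dated policy table. Everything in it is hypothetical: the `Document` class, the `POLICY` table, the category and sensitivity names, and the dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    name: str
    category: str  # the "what": assigned once, rarely changes


# The "how": sensitivity per category, with the date each level takes effect.
# Hypothetical example data; a real policy would come from your governance team.
POLICY = {
    "earnings_report": [
        (date(2018, 1, 1), "confidential"),
        (date(2018, 7, 1), "public"),
    ],
}


def sensitivity_on(doc, day, policy=POLICY, default="internal"):
    """Return the sensitivity that applies to doc on the given day."""
    level = default
    for effective_from, sensitivity in sorted(policy.get(doc.category, [])):
        if effective_from <= day:
            level = sensitivity  # later entries override earlier ones
    return level
```

Because the category never changes, updating the policy table is enough to reclassify every document in that category, without relabeling each file.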
Forrester Research’s July 2018 report entitled Rethinking Data Discovery And Classification Strategies refers to this as “dynamic data classification”, which currently requires employees and tools for automation and enforcement for information security.
With the context evolving so quickly today, it’s not practical to assume algorithms can learn and identify sensitive information unique to your organization on their own. As with humans, improvement requires reassessment, and we need methods and processes for that to take place.
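As a rough illustration of such a process, the sketch below tracks how often people override the classification an algorithm suggested and flags when that rate is high enough to warrant a review. The function name `needs_review`, the 20% threshold, and the minimum sample size are assumptions made for this example, not recommended values.

```python
def needs_review(feedback, threshold=0.2, min_samples=20):
    """Decide whether user corrections suggest the classifier needs a review.

    feedback: list of (predicted_label, corrected_label) pairs collected
    from users. Hypothetical helper; the threshold values are illustrative.
    """
    if len(feedback) < min_samples:
        return False  # too little evidence to draw a conclusion
    overrides = sum(
        1 for predicted, corrected in feedback if predicted != corrected
    )
    return overrides / len(feedback) > threshold
```

A check like this can run on a schedule, so that reassessment is triggered by evidence from users and does not depend on someone remembering to look.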
Tip: Schedule reviews and have feedback mechanisms in place to review your information security policies specifically for data classification, information lifecycle, and data governance.
It’s not about adding more layers to security
Technology will continue to have a tremendous impact on how we run our businesses and keep information secure. But it’s clear we need to adapt our thinking and our approach to how we protect data.
It all starts with how we define data in this digital age: Data isn’t merely documents that we create for a single use, so why do we treat it that way?
People will always be a top security risk for data loss and breaches – that’s just reality.
So, the solution isn’t to add more layers of security for the sake of it. The solution is to have those layers, machine learning included, work together to strengthen your information security program in the way that makes the most sense for your business.