- Mike Matchett, Small World Big Data
Our data center machines, due to all the information we feed them, are getting smarter. How can you use machine learning to your advantage?
Machine learning is a key part of how big data brings operational intelligence into our organizations. But while machine learning algorithms are fascinating, the science gets complex very quickly. We can't all be data scientists, but IT professionals need to learn about how our machines are learning.
Demystifying the algorithms
We are increasingly seeing practical and achievable goals for machine learning, such as finding usable patterns in our data and then making predictions. Often, these predictive models are used in operational processes to optimize an ongoing decision-making process, but they can also provide key insight and information to inform strategic decisions.
The basic premise of machine learning is to train an algorithm to predict an output value, within some probabilistic bounds, when it is given specific input data. Keep in mind that machine learning techniques today are inductive, not deductive -- they lead to probabilistic correlations, not definitive conclusions.
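To make the "probabilistic, not definitive" point concrete, here is a minimal sketch in Python. The customer segments, the outcomes and the `train` and `predict` helpers are all hypothetical, invented for illustration; the "model" simply counts how often each outcome followed each input in the training data and reports those frequencies as estimated probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (customer segment, observed outcome).
training = [
    ("returning", "bought"), ("returning", "bought"),
    ("returning", "bought"), ("returning", "declined"),
    ("new", "declined"), ("new", "declined"),
]

def train(examples):
    """Count how often each outcome followed each input value."""
    counts = defaultdict(Counter)
    for segment, outcome in examples:
        counts[segment][outcome] += 1
    return counts

def predict(model, segment):
    """Return outcome probabilities estimated from observed frequencies."""
    seen = model[segment]
    total = sum(seen.values())
    return {outcome: n / total for outcome, n in seen.items()}

model = train(training)
probabilities = predict(model, "returning")
print(probabilities)
```

The output is a set of likelihoods for a returning customer rather than a yes-or-no answer, which is the inductive character described above.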
The process of building these algorithms is called predictive modeling. Once you have trained such a model on your data, you can sometimes examine it directly for insight into that original data, apply it to new data to predict something important, or both. Broadly, a model's output can be a classification of something, a likely outcome, a hidden relationship or attribute, or an estimate of value.
Typically, machine learning techniques predict a value that is categorical, such as a label, color, membership or quality. For example, does this subject belong to a set of customers that we should try to retain, or that will buy something, or that will respond favorably to an offer?
Predictions can also be numerical if we are concerned with estimating quantities or value on a continuous scale. The output type determines the best learning method, and affects the measurements we use to judge the quality of the modeling.
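As a rough illustration of how the output type changes the quality measurement, the sketch below scores categorical predictions by accuracy and numerical predictions by mean absolute error. These two measures are common choices, not the only ones, and the sample labels and values are made up for the example.

```python
def accuracy(actual, predicted):
    """Fraction of categorical predictions that match the true label."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

def mean_absolute_error(actual, predicted):
    """Average size of the miss for numerical predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Categorical output: should we try to retain this customer?
labels_true = ["retain", "retain", "ignore", "retain"]
labels_pred = ["retain", "ignore", "ignore", "retain"]
label_score = accuracy(labels_true, labels_pred)

# Numerical output: estimated order value on a continuous scale.
values_true = [100.0, 250.0, 80.0]
values_pred = [110.0, 240.0, 95.0]
value_error = mean_absolute_error(values_true, values_pred)

print(label_score, value_error)
```

A label is either right or wrong, so accuracy counts matches; a numerical estimate can be close or far off, so the error measure reports how large the misses are on average.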
Who supervises machine learning methods?
Machine learning methods are either supervised or unsupervised. The difference isn't whether the algorithms are free to misbehave, but rather whether they learn from training data that has the true outcome available -- previously determined and added to the data set to provide supervision -- or instead try to discover any natural patterns within a given set of data. Most business uses of predictive modeling exploit supervised methods on training data, and usually aim to predict whether a given instance -- an email, person, company or transaction -- belongs to an interesting category -- spam, likely buyer, good for credit, gets follow-up offer.
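A supervised method can be sketched in a few lines of Python. The example below uses a 1-nearest-neighbor classifier, chosen only because it is short; the email features, labels and the `classify` helper are hypothetical. What makes it supervised is that every training example carries its true outcome.

```python
import math

# Hypothetical labeled training data: (features, true outcome).
# Features: (links in the email, exclamation marks in the subject).
labeled_emails = [
    ((0, 0), "not_spam"),
    ((1, 0), "not_spam"),
    ((7, 4), "spam"),
    ((9, 6), "spam"),
]

def classify(features, training):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training, key=lambda example: math.dist(example[0], features))
    return nearest[1]

new_email = (8, 3)
prediction = classify(new_email, labeled_emails)
print(prediction)
```

The new email is assigned the category of the labeled example it most resembles. Without the labels in `labeled_emails`, this approach would have nothing to predict.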
Unsupervised machine learning methods can provide new insights if you don't know exactly what you are looking for before you start. They can produce clustering and hierarchy charts that show inherent relationships in the data, discover which fields of data seem dependent or independent, and derive rules that describe, summarize or generalize the data. In turn, these insights can be used to help build better predictive models.
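For contrast, here is a minimal sketch of unsupervised clustering: a simplified two-cluster version of k-means on one-dimensional data. The spending figures and the `two_means` helper are hypothetical, and starting from the minimum and maximum values is a shortcut taken to keep the example deterministic. Note that no labels are supplied; the groups emerge from the data alone.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters by repeatedly re-centering."""
    centers = [min(values), max(values)]  # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearer center.
            nearer = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            clusters[nearer].append(value)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(clusters)
        ]
    return clusters, centers

# Hypothetical unlabeled data: monthly spend per customer.
monthly_spend = [12, 15, 14, 90, 95, 11, 88]
clusters, centers = two_means(monthly_spend)
print(clusters, centers)
```

The two groups it finds, low spenders and high spenders, could then become a new input field for a supervised model, which is one way such insights feed back into predictive modeling.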
This is only a first step into deeper data science. Building machine learning models is an iterative exercise that requires data scrubbing and experimentation. Some automated and "guided" modeling tools are emerging that promise to reduce the need for data scientists, but they will likely have the most payback in areas that are well understood and common across industries. For real differentiation, it's likely that you'll need to dig in yourself.
Mike Matchett is a senior analyst and consultant at Taneja Group. Contact him via email at [email protected].