Supervised vs Unsupervised AIOps Machine Learning
What is “machine learning”?
Machine Learning (ML) is frequently characterized as the effort to give machines the ability to learn without being explicitly programmed. It is important here to clarify what is meant by ‘learning’ in this context. In ML, a machine is said to learn if its performance at a task or tasks, as measured by a defined metric, improves with ‘experience’ – i.e. exposure to more data. It does not mean teaching machines to think like humans do, or to at least simulate such thinking – that is the domain of artificial intelligence.
In traditional programming, developers tell the machine (via software) every specific action to take and in what order. They let it know in advance what outcomes to expect and how to deal with them. A software application is a set of instructions to a machine about what to do and how to react to what happens (user input, feedback, data, etc.). “Bugs” are cases where the programmers failed to put in an instruction, did it incorrectly, didn’t account for output or user response, etc. That’s why good programming is hard – you have to anticipate every possibility and eventuality. It’s also one reason why software applications constantly need updates.
For IT, machine learning is, at heart, the recognition that as IT gets more complex, devices and systems get connected and digital data volumes explode, human programmers will fail to:
- Consistently anticipate how IT data and systems will evolve and interact
- Respond rapidly enough to those changes
ML is also an acknowledgement that even if it were possible for humans to accomplish these two things, the cost of doing so would be prohibitive.
IT Operations is notoriously, and appropriately, conservative about adopting new technology. When your charter is performance and availability, you can’t take risks with the unproven or cutting edge. Machine learning is now, however, mature and ubiquitous in everyday technology. Netflix, Google, Waze, Amazon, Apple – pretty much any service or mobile app that gives recommendations, offers routing or guidance, processes reviews, or uses voice recognition is backed by machine learning. ML has reached a tipping point, and it is time for IT Operations to adopt it through AIOps.
Supervised machine learning
“Supervision” in machine learning is a bit of a misnomer. The supervision of the machine is really a form of ‘training’ it. If a machine needs to learn a task using sample data (“input”) and an expected outcome (“output”), then the learning is supervised. Supervised machine learning gives the machine a starting point – the input – and an end point – the output. The job of the machine is to infer how to get from input to output.
The machine must be told the ‘what’; it has to figure out the ‘how’. In supervised machine learning:
- The supervisor must make decisions about what sample data will best train the machine.
- The supervisor must determine what learning algorithm should be used.
- The supervisor must verify the accuracy of the machine output.
Assume you have a data set α that, through a pre-defined analytics process f, produces the outcome x. Assume also that you have another data set β that, through the same analytics process, produces the outcome y. Expressed mathematically:
f(α) = x
f(β) = y
What you want the machine to learn is the analytics process (the ‘function’) f. You don’t give the machine f: you give it the pairs of α <-> x and β <-> y and it must learn f from the initial data sets and outcomes. The number of data <-> outcome pairs you have to provide will vary (and likely be greater than 2) but the machine can be considered to have ‘learned’ when it correctly generates known outcomes from known data sets.
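As a minimal sketch of this idea, the Python below learns f from labeled pairs and then checks it against known pairs that were held back from training. The 1-nearest-neighbour method, the feature names and all the numbers are assumptions chosen for illustration; nothing here comes from a specific product.

```python
from math import dist

def train(pairs):
    """For 1-nearest-neighbour, 'training' is simply remembering the labeled pairs."""
    return list(pairs)

def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    _, label = min(model, key=lambda pair: dist(pair[0], features))
    return label

def accuracy(model, known_pairs):
    """Fraction of known (input, outcome) pairs the model reproduces correctly."""
    correct = sum(predict(model, x) == y for x, y in known_pairs)
    return correct / len(known_pairs)

# Hypothetical data: (failed logins per minute, distinct source IPs) <-> outcome
training_pairs = [
    ((2, 1), "not threat"),
    ((3, 2), "not threat"),
    ((1, 1), "not threat"),
    ((40, 15), "threat"),
    ((55, 22), "threat"),
    ((38, 30), "threat"),
]

# Known pairs held back from training and used only for verification
validation_pairs = [
    ((4, 1), "not threat"),
    ((47, 18), "threat"),
]

model = train(training_pairs)
validation_accuracy = accuracy(model, validation_pairs)
print(f"validation accuracy: {validation_accuracy:.0%}")
```

The supervisor's three jobs from the list above all show up here: choosing the training pairs, choosing the algorithm, and verifying the output against outcomes that are already known.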
Once you have validated, using known input and output data, that the machine has learned f, you have completed the process of supervision. You may then allow it to analyze data sets for which the outcome is unknown and have reasonable faith that it will generate a correct outcome. This type of machine learning is typically used on data that is labeled (in IT terms, “structured”) to solve classification problems like ‘spam/not spam’ or ‘threat/not threat’ and regression problems like ‘when will metric X hit 90%’.
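The regression question ‘when will metric X hit 90%’ can be sketched with an ordinary least-squares trend line. This is a deliberately simple illustration: the metric, the sample values and the assumption that growth is linear are all hypothetical, and a production forecast would also need to handle seasonality and changes in trend.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope, intercept

def time_to_threshold(slope, intercept, threshold):
    """Time at which the fitted line reaches `threshold`; None if the trend is flat or falling."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope

# Hypothetical disk utilization (%) sampled once per day
days = [0, 1, 2, 3, 4, 5]
utilization = [60.0, 62.0, 64.0, 66.0, 68.0, 70.0]

slope, intercept = fit_line(days, utilization)
day_at_90 = time_to_threshold(slope, intercept, 90.0)
print(f"growth of {slope:.1f}% per day, reaching 90% on day {day_at_90:.0f}")
```

With these sample values the fitted growth is 2% per day from a starting point of 60%, so the line crosses 90% on day 15.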
Unsupervised machine learning
Machine learning is unsupervised when you have input data but no expected outcome. With no known outcome to train against, the input data cannot serve as a training sample. Instead, the machine is tasked with learning from the data itself. There are no correct answers and no supervisor.
Unsupervised machine learning is used to look at the structure of data or the distribution of elements in a data set. It is used for clustering, to identify inherent groupings like common phrases in logs/events, and for association, to find relationships such as how often failure Y occurs when failure X occurs. This type of machine learning is typically used on data that is unlabeled (in IT terms, “unstructured”).
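To make clustering concrete, here is a small sketch that groups log lines by edit (Levenshtein) distance without being told what the groups should be. The greedy grouping strategy, the distance threshold and the sample log lines are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]

def cluster(lines, max_distance):
    """Greedy clustering: a line joins the first cluster whose first member
    is within max_distance edits; otherwise it starts a new cluster."""
    clusters = []
    for line in lines:
        for members in clusters:
            if levenshtein(line, members[0]) <= max_distance:
                members.append(line)
                break
        else:
            clusters.append([line])
    return clusters

# Hypothetical log lines; no labels or expected groupings are supplied
log_lines = [
    "disk /dev/sda1 usage at 91%",
    "disk /dev/sdb1 usage at 93%",
    "login failed for user alice",
    "login failed for user bob",
    "disk /dev/sda2 usage at 95%",
]

clusters = cluster(log_lines, max_distance=6)
for members in clusters:
    print(len(members), "x", members[0])
```

The three disk messages end up in one group and the two login failures in another, even though nobody defined those categories in advance. The result does depend on the threshold, which is a design decision someone still has to make.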
Considerations for ML use in IT
The type of machine learning that should be used depends on the data available and the problem one is trying to solve. No one approach works for everything, and even within the same area, different approaches have tradeoffs. Some considerations for machine learning in AIOps:
- Supervised or unsupervised depends on the problem. IT has problems that fit both profiles.
- Someone must make design decisions about which algorithms are used for machine learning and, in the case of supervision, what data is used to train the system and what constitutes “correct”. To solve IT problems, this means using IT data and IT domain knowledge.
- A clear definition of the desired outcome or “what good looks like” is required to perform supervised machine learning. This can come in the form of domain knowledge from the user and/or domain knowledge built into the system.
- A lot of enterprise IT data is similarly structured regardless of industry or application. This means for many use cases, machine learning analytics will be broadly applicable across different IT environments.
Some IT-specific machine learning examples
Here are some examples of machine learning analytics that BMC has implemented, along with the TrueSight products in which they are used.
- Forecasting – determining when metrics will hit thresholds and performing “what if?” scenarios.
  - Algorithms: Proprietary combination of multiple techniques including robust linear regression, regime change detection, seasonality decomposition, Box-Jenkins and more.
  - BMC domain knowledge added: Yes (patented US #9160634 B2)
  - Type of machine learning: Supervised (system is already trained by BMC – customer benefits without having to actively supervise but can modify parameters)
  - Products: TrueSight Capacity
- Probable Cause Analysis – continuously correlate millions of data points and automatically identify a small number of potential causes.
  - Algorithms: Proprietary combination of multiple techniques including but not limited to Pearson product-moment correlation coefficients or other suitable linear correlation algorithms.
  - BMC domain knowledge added: Yes (patented US #8463899 B2)
  - Type of machine learning: Unsupervised (leverages historical data, defined relationships and domain knowledge without training or specific output)
  - Products: TrueSight Operations
- Dynamic Baselining – determine future behavior of a metric based on that metric’s past behavior. Incorporates daily, weekly and monthly behaviors.
  - Algorithms: Poisson and normal regression
  - BMC domain knowledge added: Yes
  - Type of machine learning: Unsupervised (historical data from metric used without training or specific output)
  - Products: TrueSight Capacity, TrueSight Operations, TrueSight Intelligence
- Cloud Migration – simulate migration between on-premises and/or cloud providers to optimize cost.
  - Algorithms: Greedy decision trees
  - BMC domain knowledge added: Yes
  - Type of machine learning: Supervised (output is known – lowest cost – and system has been trained by BMC. User can choose parameters e.g. cloud provider, billing, instance size, type, etc.)
  - Products: TrueSight Cloud Cost Control
- Clustering – find similarities and frequency distributions of word pairings in unstructured data (logs, notes, etc.)
  - Algorithms: Levenshtein (logs), Latent Dirichlet Allocation (events)
  - BMC domain knowledge added: Yes
  - Type of machine learning: Unsupervised (data from logs or events used without specific outcome known in advance)
  - Products: TrueSight Operations, TrueSight Intelligence
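To give a feel for one of the techniques named above, the sketch below uses the Pearson product-moment correlation coefficient to rank candidate metrics by how closely they move with a symptom metric. It illustrates the coefficient only and is not BMC's proprietary Probable Cause Analysis; the metric names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no linear relationship to report
    return covariance / sqrt(var_x * var_y)

def rank_candidates(symptom, candidates):
    """Sort candidate metrics by the strength (absolute value) of their correlation
    with the symptom, strongest first."""
    scored = [(name, pearson(series, symptom)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical symptom: application response time (ms) over six intervals
response_time_ms = [100, 110, 105, 300, 320, 310]

# Hypothetical candidate metrics sampled over the same intervals
candidate_metrics = {
    "cpu_percent": [50, 48, 52, 49, 51, 50],
    "queue_depth": [5, 9, 4, 8, 12, 6],
    "db_connections": [20, 22, 21, 80, 85, 82],
}

ranking = rank_candidates(response_time_ms, candidate_metrics)
for name, coefficient in ranking:
    print(f"{name}: {coefficient:+.2f}")
```

In this sample data, db_connections tracks the response-time spike almost exactly and ranks first, while cpu_percent barely moves with it and ranks last. Correlation alone does not establish cause, which is where defined relationships and domain knowledge come in.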
Before embarking on analytics initiatives, IT leaders need to assess the problems they are trying to solve; the type, amount and frequency of data available; whether the outcome/task is known; and whether they have the expertise available to support supervised machine learning. This will dictate the appropriate algorithms and machine learning techniques.
If choosing to build over buy, IT leaders should also assess the knowledge and skill sets available in their organizations. Building ML-driven analytics internally requires not only the technical skill set to understand analytics, machine learning and statistical modelling, but also the ability to translate it into software code. The initiative will need to be supported by domain practitioners who not only know their domain and challenges well, but can communicate them effectively to others.
If the decision is to buy, IT leaders should qualify vendors by asking what IT problems they solve, what strategies and methods they use to do it, and what domain knowledge and expertise they build into their solutions. Avoid platforms with open-ended capabilities that don’t tie to specific problems, or that require staff and practitioners to act as consultants. IT environments differ greatly, but IT methods vary little. AIOps solution providers should be able to embed IT domain knowledge that provides quick time-to-value without a heavy burden on internal IT staff.