My Metric is Better Than Yours – Optimizing Business Decisions with AI

P. Sondergaard, Senior VP at Gartner, famously said: “Algorithms are where the real value lies”. Artificial Intelligence (AI) and machine learning are reshaping entire industry sectors and verticals, creating the foundation of the next industrial revolution.


Companies, especially retailers, are becoming increasingly data-driven and quantitative. Predictive Applications allow almost 99 percent of operational decisions to be automated: sophisticated AI algorithms analyze all relevant data to make predictions that are then executed as business decisions. For example, a retailer can use such an application to automatically set the best price for any product at any sales location on a daily basis, all while keeping track of changes in customer behavior.

But how do you measure success? This question is the cornerstone of any new project that introduces AI-based applications into the operative systems of an enterprise: "Is it worth it? Does the company benefit from it?" However, simply answering “yes” is not enough. It needs to be backed up by a quantitative measure which increases – or decreases – significantly through the optimized decisions an AI application delivers. Similarly, once the AI-based application has been rolled out and operates continuously, feeding decisions into the operative systems, project managers and executives need to monitor its performance in day-to-day operations: “Is everything OK? Are the decisions still optimal? Or has something changed that warrants a closer look and maybe more work on the AI model?”

The most important benchmark of success for retailers and enterprises is their (expected) profit. During the setup phase, success can be measured by how much more money is made compared to before the new AI system was in place.

However, using overall profit as the only metric is challenging – several projects may be worked on at the same time, each providing a significant improvement to the business, which makes it difficult to ascertain how much each project contributed. Additionally, project managers and executives may want to highlight the performance of a specific part of the business by introducing a Key Performance Indicator (KPI). For example, a grocery retailer may want to include stock-out and waste of perishable goods in the portfolio of monitored metrics, even though both quantities are implicitly contained in the expected profit, which accounts for stock-out in terms of lost sales and customer satisfaction, and for waste as write-off and disposal costs.
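To make this concrete, here is a minimal sketch – all prices, costs and the Poisson demand assumption are made up for illustration – of how an expected-profit calculation already accounts for both effects: unsold units enter as write-off and disposal costs, unmet demand as lost revenue and a goodwill penalty.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical figures for one perishable product at one store (illustration only).
price    = 2.50   # selling price per unit
cost     = 1.20   # purchase cost per unit
disposal = 0.10   # disposal cost per unsold (wasted) unit
goodwill = 0.50   # assumed cost per unit of unmet demand (lost customer satisfaction)

demand = rng.poisson(lam=20, size=100_000)   # simulated daily demand

def expected_profit(order_qty):
    """Monte-Carlo estimate of the expected daily profit for a given order quantity."""
    sold  = np.minimum(order_qty, demand)
    waste = order_qty - sold     # units written off
    lost  = demand - sold        # unmet demand (stock-out)
    return np.mean(price * sold - cost * order_qty - disposal * waste - goodwill * lost)

# The profit-maximising order quantity balances waste against stock-out automatically,
# without either of them being a separate target.
best = max(range(10, 35), key=expected_profit)
print(best, round(expected_profit(best), 2))
```

In this sketch, waste and stock-out are not monitored separately; they simply show up as costs inside the single profit figure.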

Introducing explicit KPIs is an ambivalent choice at best – they are essentially an attempt at mapping a complex business setup to a few numbers, which, more often than not, are not directly related to the overall business strategy. For example, reducing stock-out and waste of perishable goods is an important goal, but it is hard to imagine the CEO of a publicly-traded company reporting to their shareholders: “Look, our expected profit plummeted and we have significant losses – but waste is down 10 percent!” The shareholders are unlikely to cheer at that announcement.

However, the discussion about KPIs and metrics often does not get the attention it needs. For example:

  • Which metrics and KPIs should be chosen? The choice of which KPIs to monitor already has a major impact on the project – the parts of the business that are not reflected in these numbers might as well not exist. Are the KPIs and metrics chosen really the ones that benefit the company most? Or are they just the default set of numbers that have always been monitored? Can the choice of KPIs be changed? What happens to contractual obligations in that case?
  • Which values of the KPIs or metrics are desirable? Once a set of KPIs and metrics has been agreed upon, monitoring them is not sufficient. Agreeing on a cut-off between "good" and "bad" is essential. Not setting limits would be equivalent to not monitoring the KPIs and metrics at all: it’s nice to have a bit of information, but it has to be actionable. Even worse, acting on KPIs and metrics following instinct or a "gut feeling" makes the operational decisions of a business less reliable and predictable. Getting this right is one of the most difficult – and crucial – aspects of defining the project and business case. What do "good" and "bad" mean in the particular context of a specific project? Why is the limit set at one particular value and not another? What impact does a "bad" KPI have? Does "slightly bad" or "barely good" make a difference? If yes, how much, and what does that mean in this specific setting? Which value of a KPI requires prompt action, which requires immediate action, and how are these actions defined? Can the follow-up process be automated as well?
  • KPIs and metrics often conflict with one another. For example, a supermarket may monitor the waste of perishable goods as well as the stock-out rate – improving one typically worsens the other, as the sketch after this list illustrates. Is one of the conflicting KPIs more important than the other? To what extent? Can one KPI be optimized at the expense of the others, or is there a limit beyond which neither KPI may be allowed to deteriorate?
  • In practice, KPIs are often required to be "simple" – both in terms of what they relate to and in the concepts and mathematics behind them. Having a well-defined and measurable KPI is not only desirable but essential. For example, optimizing "customer satisfaction" may be a crucial aspect of the business – but how should this be measured, and what does it even mean? Everyone has an intuitive albeit vague understanding of "satisfaction" – but how do you translate this into a hard, measurable number? The number of complaints? The average time a customer spends in the store? The number of times a "happy face" button is pressed at the exit compared to the number of "angry face" button pushes? Each of these examples may be inherently biased, and a better metric can be found in most cases: if the aim is to reduce returns or complaints, why not use those numbers directly?
  • One of the many traps project teams fall into too often is oversimplifying, for various reasons. If the aspect that should be monitored is not fully understood in all its details, a vague definition gives everybody the impression that something has been agreed upon, even if it’s not clear how the project will benefit from it. In other cases, the situation is complex and not easily translated into a simple number – and sometimes a project manager or a member of the senior management team involved in the project does not understand the mathematical or conceptual complexities behind a specific KPI. The obvious solution of using a status indicator similar to a traffic light (good / needs attention / problem) is often not viable, as it requires an enormous amount of trust and courage: trusting your team (and perhaps also the vendor’s team) to do the right thing – and having the courage to "let go" as a manager. As a side note, we do this all the time ourselves, for example when driving a car – have you really thought about the engine temperature in detail, or what might happen if the needle is not in the centre of the gauge?
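To illustrate the conflict between waste and stock-out mentioned above, here is a small sketch with hypothetical Poisson-distributed demand: sweeping the order quantity shows that pushing the waste rate down necessarily pushes the stock-out rate up, and vice versa.

```python
import numpy as np

rng = np.random.default_rng(0)
demand = rng.poisson(lam=20, size=100_000)   # hypothetical daily demand samples

print("order qty   waste rate   stock-out rate")
for order_qty in (15, 20, 25, 30):
    sold = np.minimum(order_qty, demand)
    waste_rate    = np.mean(order_qty - sold) / order_qty   # share of the order written off
    stockout_rate = np.mean(demand > order_qty)             # share of days with unmet demand
    print(f"{order_qty:>9} {waste_rate:>12.1%} {stockout_rate:>16.1%}")
```

Neither KPI can be driven to zero without hurting the other; only a weighting – explicit, or implicit via the costs in an expected-profit calculation – resolves the conflict.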

The measurement of forecast quality, especially in the case of regression, is a good example that demonstrates the complexity of the discussion about metrics and KPIs. The discussion is sufficiently complex to warrant entire books on the subject. Reducing this complex issue to a single number produced by a simple formula is difficult – if not impossible in some cases – as many aspects need to be considered. And yet, almost every new project tries to define a new simple metric that attempts to "square the circle". The matter is complicated by the fact that many managers feel the need to include a metric – and maybe even a contractual target – to measure the accuracy of the project’s forecasts. While accurate forecasts are key to any optimization built upon them, businesses mostly don’t derive value from the forecast itself, but rather from the optimized decision that uses the forecast as input.
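A small, made-up example illustrates the point: the daily sales and the two competing forecasts below are invented, yet three of the most common regression error metrics already disagree on which forecast is "better" – and none of them says anything about the value of the decisions derived from the forecast.

```python
import numpy as np

# Hypothetical daily sales of one product and two competing forecasts.
actual     = np.array([0, 2, 5, 1, 30, 4, 3, 0, 2, 6])
forecast_a = np.array([1, 2, 4, 2, 20, 4, 3, 1, 2, 5])  # misses the demand spike on day 5
forecast_b = np.array([0, 4, 7, 3, 29, 6, 5, 2, 4, 8])  # tracks the spike, but is biased high

def mae(y, f):
    return np.mean(np.abs(y - f))

def rmse(y, f):
    return np.sqrt(np.mean((y - f) ** 2))

def mape(y, f):
    mask = y != 0   # MAPE is undefined whenever actual sales are zero
    return np.mean(np.abs((y[mask] - f[mask]) / y[mask]))

print("forecast   MAE   RMSE   MAPE")
for name, f in [("A", forecast_a), ("B", forecast_b)]:
    print(f"{name:>8} {mae(actual, f):5.2f} {rmse(actual, f):6.2f} {mape(actual, f):6.2f}")
# MAE and MAPE prefer forecast A, while RMSE prefers forecast B.
```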

So why make forecast quality an explicitly monitored KPI, maybe even with contractual obligations? It is even further removed from the expected profit – which is, in the end, what the company really cares about – than tangible KPIs such as the stock-out or waste rate. Admittedly, the statement "We monitor the forecast quality on a daily basis and have contractual obligations and penalties in place to ensure stable operations" sounds much more active and conveys a strong sense of control, especially compared to the statement: "Let’s make sure we get the highest (expected) profit we can achieve and make this our sole target."

Although each item by itself may sound fairly straightforward, change can be difficult. A particular set of KPIs or metrics may have been in use for so long that changing them requires a lot of effort and courage. The latter is particularly important if a specific metric is demanded by top-level management – even if it’s not particularly helpful for the project. It is not unheard of that new metrics or KPIs are invented because "the CEO wants to see a number labelled ‘xyz’ on their management dashboard and it has to be between 95 percent and 99 percent" to keep the project going. In such a case it is not clear what the metric actually measures or how it relates to the operational decisions or the business value. Although it may be an amusing distraction to simply add such a number, it is almost guaranteed to cause issues at some later point: if the metric is not tied to the business goal, acting on it when the value falls below the threshold can do more harm than good. The other "solution" of just reporting a random number between 95 percent and 99 percent with a slight variance would tempt fate with the “11th commandment”: Thou shalt not be found out.

External advisors can also complicate the project. Consultants may advise that a particular metric be used, perhaps even citing the findings of a professor in the field. But if that academic work is taken out of context or doesn’t apply to the specific case because of its underlying assumptions, "because the professor said so" can be extremely difficult to argue against.


 

Dr. Ulrich Kerzel

earned his PhD under Professor Dr Feindt at the Fermi National Accelerator Laboratory in the US and at that time made a considerable contribution to the core technology of NeuroBayes. After his PhD, he went to the University of Cambridge, where he was a Senior Research Fellow at Magdalene College. His research work focused on complex statistical analyses to understand the origin of matter and antimatter using data from the LHCb experiment at the Large Hadron Collider at CERN, the world’s biggest research institute for particle physics. He continued this work as a Research Fellow at CERN before he came to Blue Yonder as a Principal Data Scientist.