Building Robust AI Systems – Part I

What's the first thought that comes to mind when you think of Artificial Intelligence (AI)? Perhaps it's a scene from a science fiction movie? Or – more realistically – AlphaGo, developed by the Google-owned company DeepMind, which learned to beat the best players at the complex game of Go? Perhaps you wonder about job security? MIT Technology Review recently published an article that analyzed when AI will exceed human performance in various fields – and you can even ask an AI when it will take over your job.

These developments highlight the potential AI has to disrupt entire industries and push companies to rethink how they create value. But they do not quite convey the full picture of what it takes to use AI-based systems in an enterprise setting. It requires data – lots of it, and of sufficient quality. AI and machine learning models learn from the data they are given, so high-quality data is paramount to ensure the models can learn the fundamental properties of the system described by the data and are not tricked by deviations or errors.

In an enterprise environment, predictions are typically only useful if they can be delivered reliably and the decisions derived from them executed seamlessly – which is no small feat, given the number of predictions required.
Take a typical supermarket chain, for example. Such a chain has tens of thousands of products that are sold at hundreds of locations and need to be replenished on a daily basis, and to ease mid-term planning it requires predictions several weeks in advance. Say this supermarket chain has 30,000 active products across 1,000 stores, receives deliveries every day of the year, and needs predictions 21 days in advance. Then 30,000 × 1,000 × 21 ≈ 630 million predictions have to be calculated every single day – roughly 230 billion per year for a single supermarket chain. While this is just a rough estimate and probably an upper limit for most scenarios, it does illustrate that the number of predictions required can become very large, very quickly. In many cases, the predictions and the order quantities derived from them also have to be calculated in a short amount of time to accommodate operational constraints – for example, to schedule delivery windows in the supply chain.
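
To make the arithmetic explicit, here is a quick back-of-the-envelope calculation in Python using the numbers from the example above (the figures are illustrative, not from a real customer):

    # Back-of-the-envelope estimate of the prediction volume for the example above
    products = 30000        # active products
    stores = 1000           # store locations
    horizon_days = 21       # prediction horizon for mid-term planning
    days_per_year = 365

    predictions_per_day = products * stores * horizon_days
    predictions_per_year = predictions_per_day * days_per_year

    print(f"per day:  {predictions_per_day:,}")   # 630,000,000
    print(f"per year: {predictions_per_year:,}")  # 229,950,000,000 – roughly 230 billion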

The challenges of building a robust AI system capable of delivering decisions at this scale, day after day, can be broadly grouped into three topics:

  • Data handling and data quality
  • Deployment and operational excellence
  • AI development

This part focuses on the first two, data handling and data quality as well as deployment and operational excellence; the next blog post is dedicated to AI and model development.

Data Handling and Data Quality

Building sophisticated AI models is only possible with a large amount of high-quality data. This requires the technical capability to access all required data, which may be scattered across a heterogeneous IT landscape comprising various systems. The data need to be brought together in a suitable storage system and then meticulously groomed to ensure that their quality is as good as it can be. According to a report by CrowdFlower, Data Scientists spend the bulk of their time (60%) on data handling, data quality and related issues, and only about 4% of their time actually building or improving AI or machine learning models.

Choosing the right storage system depends on the type and amount of data as well as on how frequently they are acquired and how frequently they are accessed – two separate requirements. Depending on the use-case, storing the data may be more important than accessing them; however, if a new AI model is to be developed and trained, all data have to be accessed at some point. Some use-cases require a fast in-memory relational database, others a non-relational database: master data may require one specific setup for storing and accessing the data, whereas transactional data, sensor data and other streams may lead to a different set of requirements. Although there are certainly a few best practices, each new use-case or project has its unique requirements and constraints, and careful consideration should be given to which technology – or rather, which mix of technologies – should be used for the project. More often than not, project managers can't resist the temptation to buy a big compute cluster to get the project going, only to find out later that it doesn't match the needs of the project – but, as the new and expensive hardware is there, it has to be made a success one way or another.
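
As a minimal sketch of what such a technology mix can look like in practice – assuming, purely for illustration, a relational database holding the master data and Parquet files in a data lake holding the transactional records – the data could be brought together along these lines:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string, table and path names – adjust to the actual landscape.
    engine = create_engine("postgresql://user:password@dwh.example.com/retail")

    # Master data (e.g. product attributes) live in a relational database ...
    products = pd.read_sql("SELECT product_id, category, pack_size FROM products", engine)

    # ... while high-volume transactional data sit as columnar files in a data lake.
    sales = pd.read_parquet("s3://data-lake/sales/")

    # Join both sources into a single table that a model can be trained on.
    training_data = sales.merge(products, on="product_id", how="left")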

As data quality is the foundation any AI system is built on, "data quality should be everyone's job," as Thomas C. Redman wrote in the Harvard Business Review. Improving data quality is not a heroic one-time effort but a continuous process, as each new data delivery potentially contains new errors. It is also important to note that no one has only high-quality data: despite all efforts to clean the data, some "bad data" will always be there. The "Bad Data Handbook" by Q. Ethan McCallum shares practical insights into pitfalls and traps gathered by a number of experts in the field. In some cases, historic data may not exist – or are inaccessible – for example, if the IT systems have been migrated to a new setup and only part of the historic records were carried over. Other information, such as maintenance logs, construction plans, or special arrangements with vendors or logistics partners, may only be stored as electronic documents (e.g. PDF) or even hard copies. In one project, the client's project manager even exclaimed: "But our operators know how to handle this particular situation very well – they have a sticky note on their monitor!"
To Data Scientists it is clear that none of this is suitable for developing machine learning models: all this information needs to be transformed into electronic data records first, which requires a strong data quality effort. And even with the best intentions, the data never represent the physical reality perfectly: master data may be incomplete, some ambiguities may remain after data grooming, and a number of factors may influence transactional or sensor data as they are transferred through the IT systems.
In one project concerned with the financial setup of a global company, each number was recorded with the utmost care – yet the database storing all the details did not capture how the company itself changed over the years: parts of the company were restructured, bought or sold, and responsibilities for markets and products shifted. Even though the data may have been correct at any fixed moment in time, it was impossible to relate the current numbers of a business unit to those from one, five or ten years ago. As companies start to benefit from introducing AI into their business, however, data quality efforts – often seen as "cost centers" without immediate benefit – become not only easier to justify but also mission-critical to improving the decisions an AI-based system can make.
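
Since each new data delivery can introduce new errors, it usually pays off to validate incoming data automatically instead of relying on one-off cleaning. A minimal sketch of such checks – the column names and rules are purely illustrative – might look like this:

    import pandas as pd

    def validate_delivery(df: pd.DataFrame) -> list:
        """Run simple plausibility checks on a new data delivery and collect any problems."""
        problems = []

        # Completeness: every record needs a product, a store, a date and a quantity.
        required = ["product_id", "store_id", "date", "units_sold"]
        missing = [column for column in required if column not in df.columns]
        if missing:
            problems.append(f"missing columns: {missing}")
            return problems

        # Plausibility: negative quantities usually indicate returns or booking errors.
        if (df["units_sold"] < 0).any():
            problems.append("negative sales quantities found")

        # Uniqueness: the same product/store/date combination should appear only once.
        if df.duplicated(subset=["product_id", "store_id", "date"]).any():
            problems.append("duplicate product/store/date records found")

        return problems

    # Flag a delivery before it enters the training data (the file name is hypothetical).
    issues = validate_delivery(pd.read_csv("daily_sales.csv"))
    if issues:
        print("Delivery needs attention:", issues)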

Deployment and Operational Excellence

In addition to proprietary software, Data Scientists tap into a cornucopia of open-source toolkits such as Pandas, Scikit-Learn, Theano, Keras, Torch, TensorFlow, CNTK and others. KDnuggets compared various recently released deep-learning frameworks and found that Google's TensorFlow is far more popular than the others, based on forks and stars on GitHub. This does not necessarily mean that TensorFlow is better: each framework has its unique strengths and weaknesses, and the respective use-case should be considered carefully when deciding which one to choose. Whatever framework is chosen, it is important to avoid the "two-language problem" (a term Wes McKinney uses in his book "Python for Data Analysis") if at all possible: Data Scientists should develop and test a model with the same framework and tools that are available on the production systems, in an environment as close as possible to the one used to calculate the predictions at scale. Otherwise, the Data Scientists may develop an AI system in one setup or language which then needs to be translated into whatever is available on the production systems – a difficult port, because the various frameworks may not support the same set of features or behave in the same way, and translating and re-testing a model is costly.
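
One pragmatic way to avoid the two-language problem is to train, persist and serve the model with the very same library in both environments. A minimal sketch with scikit-learn and joblib – the file names and features are purely illustrative:

    import joblib
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    # --- Development / training environment ---
    train = pd.read_parquet("training_data.parquet")             # hypothetical file
    features = ["price", "promotion", "weekday", "stock_level"]  # illustrative features
    model = GradientBoostingRegressor()
    model.fit(train[features], train["units_sold"])
    joblib.dump(model, "demand_model.joblib")

    # --- Production environment: same library, same model artefact ---
    model = joblib.load("demand_model.joblib")
    today = pd.read_parquet("todays_features.parquet")           # hypothetical file
    predictions = model.predict(today[features])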

At the end of the day, what matters most to customers is not the most beautiful or even the most complex AI setup – but delivering billions of high-quality decisions every single day for the operational part of the business. Two aspects are critical:

  • Robustness against failures in the compute infrastructure
  • Robustness of the algorithms against exceptions and errors in the data, which will be the focus of the next part

The sheer scale of the predictions, combined with the need to meet strict deadlines, requires scaling the computations out across many machines. This increases the complexity of the distributed systems and demands a high level of software engineering and operations expertise. Systems have to be designed to be fault-tolerant and replicated to avoid local outages – and easy to recover should a failure happen despite all best efforts.
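
At the level of a single batch run, the same idea applies in miniature: the work should be split into independent chunks that can be retried when a worker or node fails. A simplified sketch – the chunking and the prediction function are placeholders, not an actual production design:

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def predict_chunk(chunk_id: int) -> str:
        """Placeholder: compute predictions for one chunk of product/store combinations."""
        # ... load the data for this chunk, run the model, write the results ...
        return f"chunk {chunk_id} done"

    def run_with_retries(chunk_ids, max_retries=3):
        """Distribute all chunks over a pool of workers and retry failed chunks a few times."""
        remaining, attempt = list(chunk_ids), 0
        while remaining and attempt < max_retries:
            failed = []
            with ProcessPoolExecutor() as pool:
                futures = {pool.submit(predict_chunk, c): c for c in remaining}
                for future in as_completed(futures):
                    chunk = futures[future]
                    try:
                        future.result()
                    except Exception as exc:      # e.g. a crashed worker or corrupt data
                        print(f"chunk {chunk} failed: {exc}")
                        failed.append(chunk)
            remaining, attempt = failed, attempt + 1
        return remaining  # anything still left over needs manual investigation

    leftover = run_with_retries(range(100))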

Dr. Ulrich Kerzel earned his PhD under Professor Dr Feindt at the Fermi National Accelerator Laboratory in the US and at that time made a considerable contribution to the core technology of NeuroBayes. After his PhD, he went to the University of Cambridge, where he was a Senior Research Fellow at Magdalene College. His research work focused on complex statistical analyses to understand the origin of matter and antimatter using data from the LHCb experiment at the Large Hadron Collider at CERN, the world's biggest research institute for particle physics. He continued this work as a Research Fellow at CERN before joining Blue Yonder as a Principal Data Scientist.

 
