If the phrase ‘Logistic Regression’ (or even just that picture of a calculator!) has just brought you out in a cold sweat, then don’t panic!
Basically, for the purposes of this blog post, think of logistic regression as being a bit like making a cake. When you make a cake, you take various ingredients in different quantities, mix them together and put them all in an oven. With logistic regression, we took the things we knew about the bottle of wine (alcohol level, pH, etc.) as the ingredients, worked out how much of each of them we should put into our mixture, then passed it through a mathematical formula (our equivalent of the oven) and out popped a cake – or in our case, a prediction of whether a bottle of wine is good or not.
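For the curious, the “oven” here is the sigmoid function, which squashes the weighted mix of ingredients into a probability between 0 and 1. Here’s a minimal sketch of the idea – the weights and wine measurements below are made-up illustrative numbers, not the actual fitted values from our model:

```python
import math

def sigmoid(z):
    # The "oven": turns any number into a probability between 0 and 1
    return 1 / (1 + math.exp(-z))

# Hypothetical weights ("how much of each ingredient goes in the mixture").
# These are invented for illustration only.
weights = {"alcohol": 0.9, "sulphates": 0.6, "pH": -0.2}
bias = -9.5

def predict_good_wine(bottle):
    # Mix the ingredients: a weighted sum of the wine's measurements...
    z = bias + sum(weights[k] * bottle[k] for k in weights)
    # ...then bake it: the result is the probability the bottle is "good"
    return sigmoid(z)

bottle = {"alcohol": 11.0, "sulphates": 0.8, "pH": 3.3}
probability = predict_good_wine(bottle)
print(f"Probability this bottle is good: {probability:.2f}")
```

Training the model is simply the process of finding the weights that make these probabilities line up with the wines we already know are good or bad.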
Many Artificial Intelligence models use a similar approach, and the increased sophistication meant that we could achieve much better results. We were able to produce a model that gave us an 89% prediction accuracy, and importantly, unlike our simple Decision Tree, this was making predictions for every bottle of wine in the list, not just the small subset of ones it thought were likely to be good.
Unfortunately, this improvement in accuracy comes at the expense of explainability. It is much more difficult to explain how the model is making its predictions than it was for the Decision Tree.
Explainability in Artificial Intelligence has become a bit of a hot topic recently, and because of that, there are now tools on the market that are helping to provide insight into these types of models. We ran one of those tools on our model and got the following:
This is interesting, as it’s basically telling us how much of each ingredient went into the cake mixture to produce the prediction.
But (and this isn’t a criticism of the tool – explainability is a difficult challenge, and this is just one graph of the many it provides) the problem is that it’s a bit like knowing the cake’s ingredients without knowing the recipe. It helps to a certain extent, and you can certainly see which ingredients mattered most, but it would be difficult to sum up the chart above in a simple sentence beyond something like “the most important factors in predicting the quality of red wine seem to be the levels of alcohol and sulphates”.
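To give a flavour of what such a chart boils down to: one common way of summarising a model like this is to rank the ingredients by how strongly each one sways the prediction, ignoring whether it pushes towards “good” or “bad”. The coefficient values below are invented for illustration, not the real output of our model or the tool:

```python
# Hypothetical standardised coefficients -- illustrative values only
coefficients = {
    "alcohol": 0.88,
    "sulphates": 0.61,
    "volatile acidity": -0.45,
    "pH": -0.12,
    "residual sugar": 0.05,
}

# Rank ingredients by the size of their effect, regardless of direction
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, coef in ranked:
    direction = "pushes towards 'good'" if coef > 0 else "pushes towards 'bad'"
    print(f"{name:18s} {coef:+.2f}  {direction}")
```

This tells you *which* ingredients matter and in which direction, but not *why* those particular quantities combine the way they do – which is exactly the “ingredients without the recipe” problem.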
Well, by trading off how easy it is to explain the results of the model, we’ve been able to achieve a significant improvement in prediction accuracy. However, getting this result was a much more time-consuming process and required specialist data science knowledge and skills to build the model.
By using tools that are starting to come onto the market, we can still offer some explanation of which factors mattered most in deciding the quality of the wine. Realistically, though, the level of information we have extracted here is unlikely to be enough to explain the reasoning in a more sensitive situation, such as being refused a loan, beyond something like “I’m afraid that you don’t meet our lending criteria”.
Logistic Regression Summary
- The accuracy of the predictions from this model is much higher, and we can now make predictions for every wine on the list, not just a subset
- The gain in accuracy has come at the expense of how easy it is to explain the model
- Producing a model like this takes significantly more time than a simpler model and requires specialist skills and knowledge