“There are two kinds of forecasters: those who don’t know, and those who don’t know they don’t know.” ― John Kenneth Galbraith

Our company was fortunate to be presented with an opportunity to build an inbound call volume forecasting and arrival pattern model for one of our clients in their client service centre. At first glance, this seemed a tantalising opportunity for our data- and statistically-inclined team to sink their teeth into. However, the scale of the challenge soon hit home: the trigger for the request was the pressing need to migrate off the current off-the-shelf forecasting solution (a module bolted onto the telephony system), which was neither delivering the accuracy the client required nor running without significant manual effort.
Below are a few key points that we learned along this journey:
1. Talk to the people who know
No one “knows” a business better than the client. Our first step (before touching any data) was to solidify the objective of the project, gain context and ultimately set the project up for success. The client was looking for a monthly forecast of call volumes that would enable them to (i) plan their resourcing efficiently to meet demand, (ii) set improved daily targets for staff, and (iii) eliminate the manual effort currently needed to generate a monthly forecast.
The next step was to gather and clarify the specific information needed to deliver an effective call forecasting solution. This included agreeing various data definitions (date ranges, public holidays, shift hours, etc.) to ensure all parties were on the same page and comparing apples with apples. Then, in conjunction with the client, we listed the potential factors that contributed to, or affected, daily call volumes. The initial list included items such as product launches, campaigns, IT system failures, month, and day of the month. Our next step was to determine the effect each of these factors had on call volumes. The aim of a statistical forecast is to understand the effect the factors (X variables) have on the output (Y) – daily call volumes in our case.
2. Remove the “noise”
This step is of paramount importance. There are two types of variation in any process or system: (i) “Common Cause Variation” – the usual, historical, quantifiable, known variation in a system; and (ii) “Special Cause Variation” – unusual, not previously observed, non-quantifiable variation. When building a prediction model, the key is to focus on what is “known” or can be explained. We therefore highlighted and removed outliers (more than 2 standard deviations from the mean) from the data set (Fig. 1 below). These outliers require special investigation and understanding, as they fall outside the norm. In a call centre environment they may result from marketing campaigns, IT system failures, product issues, and so on. We will touch on how we dealt with Special Cause Variation later on.
The focus was now to build a regression model dealing only with Common Cause Variation, which we could quantify to some extent.
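As a sketch of this cleaning step, the 2-standard-deviation rule described above can be applied as follows. The daily volumes here are made-up illustrative numbers, not the client's data:

```python
# Flag and remove outliers more than 2 standard deviations from the mean.
# The daily call volumes below are illustrative, not real client data.
import statistics

daily_calls = [410, 395, 430, 405, 1200, 415, 388, 402, 390, 425, 60, 398]

mean = statistics.mean(daily_calls)
sd = statistics.pstdev(daily_calls)

# Keep "Common Cause" observations within mean +/- 2 SD; set the rest
# aside for special-cause investigation (campaigns, outages, etc.).
cleaned = [x for x in daily_calls if abs(x - mean) <= 2 * sd]
outliers = [x for x in daily_calls if abs(x - mean) > 2 * sd]
```

Note that an extreme spike inflates the standard deviation itself, so milder anomalies (like the low day of 60 above) can survive a single pass; in practice the filter may be applied iteratively or paired with manual review.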
3. Balance textbook stats versus real-world application
Now the rubber hits the road. We accumulated roughly a year’s worth of historical data and removed the outliers. Then, using Minitab (statistical software), we ran all the potential factors (X’s) through a multiple regression model until we had a model (a prediction equation, or set of equations) that we believed could be applicable. This is where balancing textbook statistics with real-world application becomes an art form. There are two key statistical indicators that determine whether your multiple regression equation will be accurate enough to predict the future: (i) “R-squared” – a statistical measure of how close the data are to the fitted regression line; and (ii) the “p-value” – which indicates the strength of the relationship between the Y output (daily call volumes in our case) and each X variable (day of the month and weekday in this exercise).
We were happy with the p-values and saw a strong relationship between call volumes (Y) and day of the month and weekday (X’s). However, the R-squared value was not ideal. Textbooks will tell you that you need an R-squared greater than 80% to use the regression equation for prediction – meaning that 80% of the variation in Y (call volumes) can be explained by the model. The R-squared we were achieving was 62% – quite a way off.
Nevertheless, we asked an important question: “How accurate does predicting call volumes at the beginning of a month need to be?” Do you need to get it absolutely 100% correct, or can you predict within an acceptable range? A discussion with our client established a leeway of ±30 calls either side of the prediction. The result: the solution predicted call volumes within the agreed range 90% of the time (the off-the-shelf solution managed only 30%).
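The workflow above (fit a multiple regression on day-of-month and weekday, check R-squared, then judge accuracy against the agreed tolerance rather than a textbook threshold) can be sketched as follows. The data, coefficients and ±30 tolerance are illustrative assumptions, not the client's figures, and NumPy stands in for Minitab:

```python
# Sketch: multiple regression of daily call volume on day-of-month and
# weekday, plus an "accuracy within an agreed range" check (+/- 30 calls).
# All numbers are synthetic; a real analysis would also report p-values
# (e.g. via a statistics package such as statsmodels or Minitab).
import numpy as np

rng = np.random.default_rng(0)
n = 200
day_of_month = rng.integers(1, 29, n)   # X1: 1..28
weekday = rng.integers(0, 5, n)         # X2: Mon..Fri as 0..4

# Synthetic "true" relationship with noise, for illustration only.
calls = 300 + 2.0 * day_of_month + 15.0 * weekday + rng.normal(0, 20, n)

# Ordinary least squares: design matrix with an intercept column.
X = np.column_stack([np.ones(n), day_of_month, weekday])
beta, *_ = np.linalg.lstsq(X, calls, rcond=None)
pred = X @ beta

# R-squared: share of the variance in Y explained by the model.
ss_res = np.sum((calls - pred) ** 2)
ss_tot = np.sum((calls - calls.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# "Good enough" accuracy: fraction of days predicted within +/- 30 calls.
within_range = np.mean(np.abs(calls - pred) <= 30)
```

The point of the last line is the one made above: a model with a modest R-squared can still land inside the business's tolerance on most days, which may be all that matters.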
4. Add your flavour
The client’s original request was to predict calls a month ahead. While this requirement has benefits from a staffing perspective, it does not necessarily provide a tactical view. We therefore brought two additional enhancements to the client’s attention (both of which were subsequently implemented):
- Daily Forecast. This provided a forecast for the day ahead, generated at the close of business, to support effective daily planning. It looked at additional factors over and above day of the month and weekday: 3-, 5- and 10-day moving averages, to capture any seasonal movements taking place over the current two-week period. The model had a favourable R-squared of 79.59%, which was reflected in the accuracy of the forecasts achieved.
- Intra-day Forecast. This forecast was especially useful in addressing Special Cause Variation. The premise was that by 10:00 and 11:00 you would know what kind of day you were having: a high-, mid- or low-volume day. The model built a regression equation using the number of calls received by 10:00 and 11:00 (X variables) to predict the total daily calls (Y), and proved effective, with an R-squared in the upper 80s. Since the data feed was delayed by 15 minutes, the solution could tell you by 10:15 or 11:15 whether you needed more or fewer staff to manage the workload – effectively allowing you to “balance the line” in a multi-skilled environment.
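A minimal sketch of the trailing moving-average features the daily forecast adds on top of day of the month and weekday. The history values are invented for illustration; in production they would come from the actual call-volume history:

```python
# Compute 3-, 5- and 10-day trailing moving averages of daily call volume,
# to be used as extra regression features. History values are illustrative.
def trailing_moving_average(values, window):
    """Mean of the last `window` observations, for each day with enough history."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]

history = [400, 420, 390, 410, 430, 405, 415, 425, 395, 440, 410]

ma3 = trailing_moving_average(history, 3)
ma5 = trailing_moving_average(history, 5)
ma10 = trailing_moving_average(history, 10)
```

Because the averages are trailing (computed from days already closed), each feature is available at close of business, which is what makes them usable for a next-day forecast.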
Author: Claudio dalla Venezia