Why we ignore MAPE, RMSE, and other mathematical errors in forecasting demand

When a business faces the task of forecasting demand for inventory management, the question usually arises of which forecasting method is best. There is no single, clear answer. However, from what we've seen in the industry, the most commonly used measure of point-forecast accuracy is the mean absolute percentage error (MAPE). The mean absolute error (MAE) and the root mean square error (RMSE) are also common.

The forecast error in this case is the difference between the actual value of demand and its forecast value: the greater the forecast error, the less accurate the forecast. For example, with a forecasting error of 5%, the forecast is 95% accurate. MAPE was initially used to evaluate forecasts of time series with near-normal distributions, such as electricity consumption; only later did it come into use as a tool for assessing demand forecasts. In practice, the error can be calculated for each item as well as averaged across entire product groups.
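
For reference, the classical metrics themselves are simple to compute. Below is a minimal Python sketch of MAPE, MAE, and RMSE calculated per item; the demand and forecast figures are invented for illustration only.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def mae(actual, forecast):
    """Mean absolute error, in units of demand."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast))

def rmse(actual, forecast):
    """Root mean square error, in units of demand."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((actual - forecast) ** 2))

actual   = [120, 95, 130, 110]   # hypothetical weekly demand
forecast = [114, 100, 124, 115]  # hypothetical point forecast
print(mape(actual, forecast), mae(actual, forecast), rmse(actual, forecast))
```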

Despite the fact that most companies still use the error measures listed above, we believe their shortcomings make them less than optimal for use in real business situations. For simplicity of presentation, we highlight three key problems with these point-forecast error measures. Let's call them mistake No. 1, No. 2, and No. 3. First we describe these mistakes in detail, and then we describe how our model helps eliminate them.

Why MAPE, RMSE, and other common errors should be avoided

Mistake No. 1 is that these measures are more about mathematics than about business: they produce abstract numbers (or percentages) that say nothing about money, while businesses need to make decisions based on their bottom line. For example, an error of 80% sounds intimidating at first glance, but in reality it can reflect very different situations. An 80% error on nails that cost 50 cents apiece certainly means a loss, but nothing like the loss from the same forecasting error on industrial equipment worth 700,000 USD. In addition, sales volume also matters, and it too is ignored by these forecast errors.

Another key issue (mistake No. 2) ignored by point-forecast errors is the money that ends up tied up in inventory and the profit lost to stock-outs. For example, if we predict the sale of 20 rims but actually sell only 15, the cost of our error is 5 rims, for which we will pay holding costs over a specific period of time, plus the cost of the tied-up capital at a certain interest rate. Now consider the opposite situation: we forecast the sale of 20 rims and get orders for 25. Here we are dealing with lost profit, the margin on the units of demand we could not serve. The forecasting error is the same in both cases, but the result is quite different.

The third key point (mistake No. 3) is that these errors apply only to a point forecast and say nothing about safety stock. In some cases safety stock can make up 20-70% of the total inventory on hand. Therefore, no matter how accurate the point forecast is, the safety stock calculation remains unaccounted for, and the resulting picture can be significantly distorted.

Business Profitability Criteria

Given the disadvantages described above, single-point error measures are a poor basis for comparing algorithms; among other things, they are disconnected from the realities of business. The approach we use evaluates the accuracy of algorithms in monetary terms and expresses the cost of forecasting errors in the language of business finance. This eliminates mistake No. 1.

For mistake No. 2, we examine two different cases. If the forecast turns out to be less than real demand, we will have a shortfall, whose financial effect is the number of units of unmet demand multiplied by the difference between the sale and purchase prices. For example, say we buy rims at 3,000 dollars apiece and sell them for 4,000 dollars apiece. The forecast for the month was 1,000 rims, while actual demand was 1,200 units. The loss is:

(1,200 - 1,000) * (4,000 - 3,000) = 200,000 dollars.

If the forecast exceeds actual demand, the company incurs holding costs. The loss equals the cost of the unsold product multiplied by the rate of return on alternative investments over the same period. Suppose that actual demand in the previous example was 800 rims and we had to hold the excess for another month, with a rate on alternative investments of 20% per year. In this case, the loss is:

(1,000 - 800) * 3,000 * 0.2 / 12 = 10,000 dollars.

Depending on the direction of the error, we take one of these two values in each specific case.
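
The two worked examples above can be captured in a single calculation. Here is a hedged Python sketch of that logic; the function name is ours, and the prices, rate, and quantities come from the rim example.

```python
def forecast_error_cost(forecast_qty, actual_qty, purchase_price, sale_price,
                        annual_rate=0.20, months_held=1):
    """Return the monetary loss caused by a point-forecast error."""
    if actual_qty > forecast_qty:
        # Under-forecast: lost profit (margin) on the demand we could not serve.
        return (actual_qty - forecast_qty) * (sale_price - purchase_price)
    # Over-forecast: cost of capital tied up in the unsold units.
    return (forecast_qty - actual_qty) * purchase_price * annual_rate * months_held / 12

print(forecast_error_cost(1000, 1200, 3000, 4000))  # 200000.0 - lost profit
print(forecast_error_cost(1000,  800, 3000, 4000))  # 10000.0  - holding cost
```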

In order to eliminate mistake No. 3, we compare algorithms using the concept of service level. The service level (which we will refer to as the Type II service level, or fill rate) is the share of demand that we can meet from inventory on hand during a replenishment cycle. For example, a 90% service level means that we will be able to serve 90% of demand. While it may seem at first glance that the service level should always be 100% to maximize profit, in reality this is not the case: meeting 100% of demand means extreme over-stock and, for perishables and products with a fixed shelf life, spoilage and waste. Holding costs, spoilage, and the cost of capital ultimately reduce profit below what could be earned at, say, a 95% service level. It should also be noted that each individual item will have its own optimal service level.
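
To make the fill-rate definition concrete, here is a small Python illustration with invented per-cycle demand and stock figures: the fill rate is simply the share of total demand that could be served from stock on hand.

```python
# Type II service level (fill rate): share of demand met from on-hand stock.
def fill_rate(demand_per_cycle, stock_per_cycle):
    served = sum(min(d, s) for d, s in zip(demand_per_cycle, stock_per_cycle))
    return served / sum(demand_per_cycle)

demand = [20, 25, 15, 30]   # hypothetical demand in four replenishment cycles
stock  = [22, 22, 22, 22]   # hypothetical stock available in each cycle
print(f"fill rate: {fill_rate(demand, stock):.0%}")
```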

Since safety stock can account for a substantial share of inventory, it cannot be ignored when comparing algorithms (as it is when errors are calculated with MAPE, RMSE, etc.). Therefore, we compare not the point forecast but the optimal stock for a given service level: the quantity of goods that must be kept in stock to achieve maximum profit from sales while minimizing holding costs.
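
One simple way to turn a demand distribution into "stock for a given service level" is to search for the smallest stock quantity whose expected fill rate reaches the target. The sketch below does this on a simulated demand sample; it is an illustration of the idea only, not Forecast NOW!'s actual optimization, and it ignores the profit/holding-cost trade-off described above.

```python
import numpy as np

def stock_for_fill_rate(demand_sample, target_fill_rate):
    """Smallest stock level whose fill rate over the demand sample meets the target."""
    demand_sample = np.asarray(demand_sample, dtype=float)
    total_demand = demand_sample.sum()
    for stock in range(int(demand_sample.max()) + 1):
        served = np.minimum(demand_sample, stock).sum()
        if served / total_demand >= target_fill_rate:
            return stock
    return int(demand_sample.max())

rng = np.random.default_rng(42)
demand_sample = rng.poisson(20, size=1000)       # stand-in for the model's demand distribution
print(stock_for_fill_rate(demand_sample, 0.95))  # stock needed for a 95% fill rate
```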

As the main criterion (criterion No. 1) for forecast quality, we use the total value of the losses, computed as in our solution to mistake No. 2, at a given service level. In other words, we estimate the losses in monetary terms when using a particular algorithm: the smaller the loss, the more accurate the algorithm.

We should note here that the optimal inventory level differs for different service levels, and in some cases the forecast will be right on target while in others it may be skewed in either direction. Since many companies do not calculate an optimal service level but use a fixed one, we calculate the main criterion for all of the most common service levels (70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%) and then report the total cost. This allows us to test how well a model works overall.
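
A possible implementation of this summary, assuming the forecast_error_cost sketch from earlier is in scope: for each of the fixed service levels, take the stock an algorithm recommends, price the resulting shortfall or overstock against actual demand, and sum the losses. The function names and inputs here are illustrative, not Forecast NOW!'s internals.

```python
# Criterion No. 1 summed over the common fixed service levels listed above.
# `recommend_stock` stands in for an algorithm's stock recommendation at a
# given service level; `forecast_error_cost` is the earlier sketch.
SERVICE_LEVELS = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.98, 0.99]

def criterion_1_total(recommend_stock, actual_demand, purchase_price, sale_price):
    return sum(
        forecast_error_cost(recommend_stock(level), actual_demand,
                            purchase_price, sale_price)
        for level in SERVICE_LEVELS
    )
```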

For companies that do calculate an optimal service level, we use an additional criterion (criterion No. 2). In general terms, it is the monetary loss measured at the optimal service level, comparing the expected (model) distribution of sales with the actual (observed) one. The service level the model predicts to be optimal does not always match the level that turns out to be optimal in practice. Therefore, we compare the sales volume forecast at the model's optimal service level with the actual sales volume that delivers that service level according to company data, and price the difference.

To illustrate this criterion, let us return to the rims example. Suppose the predicted optimal service level is 90%, and the optimal stock for this case is assumed to be 3,000 rims. In the first case, suppose the actual service level turned out higher than forecast, at 92%, and the corresponding sales volume rose to 3,300 rims. The forecast error is the difference between the actual and forecast sales volumes, multiplied by the difference between the sale and purchase prices. In total:

(3,300 - 3,000) * (4,000 - 3,000) = 300,000 dollars.

Now let's consider the opposite situation: the actual service level was 87%, less than predicted, and real sales amounted to 2,850 rims. The forecasting error is the cost of the unsold inventory multiplied by the rate of return on alternative investments over the same period (for our example, one month at 20% per year). The value of the criterion is:

(3,000 - 2,850) * 3,000 * 0.2 / 12 = 7,500 dollars.
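
Both cases of criterion No. 2 can be priced with the same forecast_error_cost sketch introduced earlier (assuming it is in scope); the quantities below come from the two rim examples just shown.

```python
# Criterion No. 2 for the rim example: compare actual sales at the model's
# optimal service level with the stock the model recommended, and price the gap.
print(forecast_error_cost(3000, 3300, 3000, 4000))  # 300000.0 - actual service level above forecast
print(forecast_error_cost(3000, 2850, 3000, 4000))  # 7500.0   - actual service level below forecast
```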

Ideally, of course, we would calculate the error only at the optimal service level, between the forecast and actual values. But since not all companies have moved to optimal service levels, we limit ourselves to these two criteria.

The criteria we use, in contrast to classical mathematical errors, show the total loss in monetary terms when comparing different models, so the best model is the one that minimizes losses. This approach allows businesses to evaluate the performance of various algorithms in a language they understand.

Comparison of forecast accuracy between the Forecast NOW! model and the ARIMA method (based on household chemical product ranges):

| Criterion (losses in dollars)                                     | Forecast NOW! | ARIMA         | Ratio  |
|--------------------------------------------------------------------|---------------|---------------|--------|
| Criterion No. 1 (losses at the optimal service level)              | 92,997,114    | 169,916,601   | 82.71% |
| Criterion No. 2                                                     | 4,188,749     | 7,611,365     | 81.71% |
| Criterion No. 1 (total value for the most common service levels)   | 820,099,299   | 1,550,434,475 | 89.05% |

 

Comparison of forecast accuracy between the Forecast NOW! model and the Croston method (based on household chemical product ranges):

| Criterion (losses in dollars)                                     | Forecast NOW! | Croston method | Ratio  |
|--------------------------------------------------------------------|---------------|----------------|--------|
| Criterion No. 1 (losses at the optimal service level)              | 6,379,616     | 8,328,509      | 30.55% |
| Criterion No. 2                                                     | 1,076,984     | 1,341,537      | 24.56% |
| Criterion No. 1 (total value for the most common service levels)   | 128,690,989   | 161,891,666    | 20.51% |