Tourist Arrivals in Sri Lanka: A Comparative Study of Holt- Winter’s versus Box- Jenkin’s Modeling Methods

Tourism is one of the vastly growing and largest industries in the world. Contribution of tourism to Sri Lanka’s total foreign exchange earnings in 2016 amounted to 14.2%. After the civil strife in Sri Lanka, tourist arrivals continue to grow annually. Therefore, the postconflict tourist arrivals were considered for this study. Forecasting is an essential analytical tool in tourism policy and planning. In all the regions at all times, there is no specific model that outperforms other models regularly. Therefore, the objective of this study is to compare Holt-Winter’s and BoxJenkin’s methods of modeling the tourist arrivals and to recommend a better method to forecast the future tourist arrivals in Sri Lanka. Appropriate tests were applied in modeling exercises for both methods. The results demonstrate that, during June 2009 to June 2017, nearly 10.5 million of tourists had visited the island. Both models are adequate for forecasting tourist arrivals. However, based on the forecasting accuracy measures of the model, the Box-Jenkin’s method outperforms the HoltWinter’s method. The BoxJenkin’s model gives approximately 90% forecasting accuracy and therefore it is recommended to forecast the tourist arrivals in Sri Lanka. *Correspondence should be addressed to Mr. S. R. Gnanapragasam. Email: srgna@ou.ac.lk https://orcid.org/0000-0003-1411-4853 (Received 15th September 2017; Revised 14th May 2018; Accepted 22nd May 2018) © OUSL This article is published under the Creative Commons Attribution-Share Alike 4.0 International License (CC-BY-SA). This license permits use, distribution and reproduction in any medium; provided it is licensed under the same terms and the original work is properly cited. 65 DOI: http://doi.org/10.4038/ouslj.v13i1.7395


Introduction
Tourism is one of the fastest growing industries in the world. It contributes to economic development of a country in terms of generating foreign exchange earnings and employment opportunities. After identifying the need to set up an institutional framework, in 1966, the government of Sri Lanka decided to develop tourism in a planned and a systematic manner. As a result, currently tourism in Sri Lanka is managed by Sri Lanka Tourism Development Authority (SLTDA). In Sri Lanka, the tourism sector continues to perform well and is able to retain its rank in the third level as one of the main sources of foreign exchange earners for the national economy. According to the Annual Statistical Report 2016 of SLTDA, the contribution of the tourism to Sri Lanka's total foreign exchange earnings in 2016 amounted to 14.2%. Sri Lanka has always been a tourist destination due to its natural beauty and uniqueness. It has continued to attract foreign investors and tourists since its independence in 1948. Though from the beginning, tourist arrivals in the island continued to grow annually, Sri Lanka had some setbacks in this sector due to factors such as Tsunami in 2004, world economic crisis in 2009 and mainly due to uncertainty in security during the ethnic conflict from 1983 to 2009. The studies of ; Gnanapragasam, Cooray and Dissanayake (2016), have shown that, there was a dramatic growth in tourist arrivals in Sri Lanka after June 2009. Therefore, in this study, only the postconflict series of tourist arrivals is taken for the analysis.
Forecasting is an essential analytical tool in tourism policy and planning. These decisions have to be taken by the relevant 66 Tourist Arrivals in Sri Lanka: A Comparative Study of Holt-Winter's versus Box-Jenkin's Modeling Methods authorities in a way that facilitates the needs of the future tourists in Sri Lanka. Song and Li (2008) had done a comprehensive review of published studies from the year 2000 on tourism demand modelling. One of the important results in this review is that the methods used in analyzing and forecasting the demand for tourism had been more varied. Further, this review showed that, as far as the forecasting accuracy is concerned, there is no single model that consistently outperforms than other models in all aspects. Therefore, it recommends the application of several modeling methodologies to forecast the tourism demand, and identify the best model based on the forecasting accuracy.
Post-conflict tourism demand in Sri Lanka was forecast using different methodologies by some researchers in the past. Kurukulasooriya and Lelwala (2014) had used the classical decomposition method and it showed a forecasting accuracy of 96%. In the study of , the dynamic transfer function modeling method was tried out with 91.37% forecasting accuracy. Further, the method of state space with 94.05% forecasting accuracy was employed by another study of . Moreover, Autoregressive Integrated Moving Average (ARIMA) and decomposition modeling methods were compared in the study of Ishara and Wijekoon (2017) and they recommended that the decomposition method performed well in terms of accuracy of the fitted models.
The exponential smoothing method was applied for the modeling exercises, for forecasting tourist arrivals in different regions in some parts of the globe, in the following studies. Bermudez, Segura and Vercher (2007) argued that the additive Holt-Winters model was inappropriate if the seasonal components or the error variance depend on the level of the series, but it could also be useful after an adequate data transformation for the UK air passenger data. In the study of Witt, Newbound and Watkins (1992) showed that the exponential smoothing generates method forecasts with lower error magnitudes than other modeling methods for Las Vegas arrivals data. In Lim and McAleer (2001) study too, the Holt-Winters additive and multiplicative seasonal models outperformed the other models in forecasting arrivals to Australia from Hong Kong, Malaysia and Singapore. Akuno, Otieno, Mwangi and Bichanga (2015) had compared two models based on the accuracy of the models and it was identified that double exponential smoothing model was better than ARIMA modelling to forecast tourist arrivals in Kenya. The common theme of those studies suggests that, generally forecasting accuracy is high in exponential smoothing modelling and this approach obtains a level of accuracy comparable to those of other more sophisticated models.
However, in some previous studies on tourism demand for different countries in the world, it is recommended that ARIMA modeling method is the most accurate method for forecasting tourist arrivals. Cho (2001) said that univariate ARIMA is a more suitable method and can be applied to forecast the fluctuating series of visitor arrivals from different countries to Hong Kong. The results of Chu (1998) showed that the accuracy of the forecasts differs depending on the country being forecast, but that the ARIMA model is the most accurate method for forecasting international tourist arrivals in Asian-Pacific countries. Loganathan, Nanthakumar and Yahaya (2010) recommended that ARIMA model with seasonal effects approaches had offered valuable insights and provided reliable forecasts of tourism demand in Malaysia. Analytic results of the study of Chang, Hsueh and Tian (2011) demonstrated that ARIMA outperformed other approaches in terms of accuracy of models and provided effective alternatives for forecasting tourism demand with evidence from Taiwan. The results of Saayman and Saayman (2010) showed that seasonal ARIMA models deliver the most accurate predictions of arrivals in South Africa and it further indicated that the methods that take into account the seasonality of international tourist arrivals outperform the others in terms of forecasting accuracy. According to the preliminary analysis of Gounopoulos, Petmezas and Santamaria (2012), the ARIMA model outperforms other exponential smoothing models as a directional forecasting tool for tourist arrivals in Greece. The results of Chaitip, Chaiboonsri and Mukhjang (2008) confirmed that the best forecasting method based on the structural modelling methods for international visitor arrivals in Thailand that established a single variable is the seasonal ARIMA.
From these past studies it can be concluded that, in all the regions at all times, there is no specific model that outperforms other models regularly. Also, some of them recommend Holt-Winter's is the better method while the others recommend the Box-Jenkin's as the better method. Therefore, the objective of this study is to compare Holt-Winter's and Box-Jenkin's methods of modeling the tourism demand and to recommend a better method based on the accuracy of the model to forecast the future tourism demand for Sri Lanka.
For this study, monthly tourist arrivals from June 2009 to June 2017 were extracted from the Statistical Annual Reports of SLTDA. To develop the models by both methods, the series from June 2009 to December 2016 were used, whereas the arrivals from January 2017 to June 2017 were used for the validation of the fitted models.

Preliminary analysis
The following techniques were carried out to check the behavior of the series, or in other words, to check the stationary condition of the series, before developing any models as the preliminary analysis. Eviews and MINITAB software were used to get the output of the results. In this subsection, Y t is defined as the response variable at the time t .

Inspecting the behavior of the series
The plot of time series is generally used to get an idea about the series and its behaviour. Also, it is used to inspect extreme observations, missing data, and elements of non-stationary such as trend or seasonality or cyclic pattern or irregular variations.

Testing unit root
Augmented Dickey-Fuller (ADF) test is used to test whether the series has a unit root. Also, it is used to confirm, statistically, that the stationary of the series in terms of trend availability.
N is the total number of rankings, R i is the sum of the rankings in a specific season, n i is the number of rankings in a specific season and L is the length of season.

Determining the nature of the process
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are examined to determine the nature of the process under consideration. If both ACF and PACF decay exponentially then the series is stationary.
ACF at lag k is defined by If the first several autocorrelations are persistently large in the graph of ACF and trailed off to zero rather slowly, it can be assumed that a trend exists and hence the time series is nonstationary.

Transforming to stationary series
The differencing method is employed to transform the nonstationary series into a stationary series so as to improve the forecasting performance as follows: Regular differencing: If the series has a trend then by taking first difference (or at most 2 differences) the trend can be eliminated from the series and it is defined as Seasonal differencing: In this method, differences are taken at seasonal lags. If the spikes appear repeatedly in the ACF graph at particular lags, then it can be assumed that there is a seasonal pattern in the series. It is defined as:

Developing Holt-Winter's modelling method
Exponential Smoothing is a method of forecasting that induces historical patterns such as trends and seasonal patterns into the future. An exponentially weighted moving average refers to a weighted moving average of the series in which the weights decay exponentially.
Single Exponential Smoothing (SES) method is used for short-term forecasting, usually just one period into the future. The model assumes that the series fluctuates around a reasonably stable mean (no trend or consistent pattern of growth).

Double Exponential Smoothing (DES)
method is used when the series shows a trend. This method with a trend works much like SES except that two components (level and trend) must be updated each period.

Holt-Winter's Seasonal Exponential Smoothing
Method is used when the series shows trend and seasonality. To handle seasonality, a third parameter is added in addition to the parameters in DES. Depending on the type of seasonality, multiplicative and additive seasonal models can be developed. This method is relatively good for short-term forecasting. To apply this method, there is no need for large amounts of historical data and also it is the preferred forecasting technique by many statisticians.
The multiplicative Holt-Winter's model containing linear trend is represented as , whereas the additive Holt-Winter's model containing linear trend is represented as  Gardner (1985) that, in practice, to compute forecasts the Optimum Smoothing Constants must be used. The common approach is to work with several values of smoothing constants and select the best combination which produces the minimum Sum of Squares of Errors (SSE) for the evaluation criteria used. This procedure is time consuming and thus the grid search algorithm in STATISTICA software is used to determine the smoothing constants.

Grid search algorithm
One common method to estimate the smoothing constants is to perform a grid search of the parameter space. Thus, STATISTICA provides each parameter from the minimum (from zero) to maximum (to one) by incrementing step by step.
is the cumulative distribution function of the specified distribution, Y i are the ordered data and N is the total number of observations. In addition to the AD test, the normal probability plot is also obtained to check the normality condition of the residuals. If the plot looks fairly straight, or in other words if it is almost linear, then it can be assumed that the residuals are normally distributed.

Testing independency of residuals
One of the most important tests for detecting serial correlation is Durbin Watson (DW) statistic. DW statistic is used to test for randomness of residuals. The test statistic is defined as: , where ˆt u is the white noise of a fitted model. The DW closer to 2 reveals that the residuals are free from serial correlation. Further to DW statistic, the plot of standard residuals versus fitted values is also obtained to check the independency of residuals. If this plot shows random pattern and it almost lies within 2  limits, then it can be claimed that the residual is distributed independently.

Testing heteroscedasticity of residuals
White's general test is used in order to check the constant variance of residuals. Accordingly, the null hypothesis is: H0: Homoscedasticity against the alternative hypothesis H1: Heteroscedasticity. The test statistic of the White's general test is:

Selecting best model
In some situations, two or more models could satisfy all the conditions of the diagnostic checking. In such situations, to select the best model among those significant models, the following criterions are applied: Akaike Information Criterion (AIC) is often used for model selection.
For sample size n , the expression of AIC is given by: According to Lewis (1982), the forecasting accuracy of the fitted model is high when MAPE value is less than 10%. However, if the value of MAPE is in between 10% and 20%, then it is a good forecasting.

Behavior of the series
It is well known that stationary series are necessary to fit Box-Jenkin's and Holt-Winter's models for better forecasting performance. Therefore, the stationary condition of the original series of the tourist arrivals, after the civil conflict, to Sri Lanka is tested in this subsection before fitting any models. For this purpose, the entire series from June 2009 to June 2017 is used.
The Figure 1 represents the time series plot of tourist arrivals after the civil conflict in Sri Lanka.  Figure 1 that there are no any extreme observations. Also, this plot clearly shows an upward trend. Further, it can be observed that, after every 12 th point, there is a peak point, marked as A, B, C and D in Figure 1  1, the original series is non-stationary and therefore it has to be converted to a stationary series. For this purpose, differencing method is employed.

Month
The first regular difference of the original series is taken and the plot of the 1 st regular differenced series is shown in Figure 2.  Table 1 as well. It is further observed from Figure 2 that, after every 12 th point, there is a peak point, marked as E, F, G and H in Figure 2 (it repeats after every 12 th month). It further suggests that there is a seasonal pattern in the 1 st regular differenced series. Therefore, the 12 th seasonal difference is taken from the 1 st regular differenced series and the plot of the 12 th seasonal differenced is plotted in Figure 3.
There is no clear indication of a trend or a seasonal pattern in Figure 3. It can be assumed that the 12 th seasonal differenced series is stationary and it is statistically confirmed in Table 1. The ACF of the 1 st regular differenced series shown in Figure 4 is not decaying exponentially and there are some high spikes that repeatedly appear in the middle. Therefore, it can be concluded that the 1 st differenced series is also a non-stationary series. It can be further confirmed from the p-values of ADF (0.00) and Kruskal- Wallis (0.00) tests in Table 1 that the 1 st regular differenced series is also non-stationary.
The Table 1 summaries the p-values of ADF and Kruskal-Wallis tests for the 1 st regular difference and the 12 th seasonal differenced series and the decision on the stationary condition based on those p-values.

(No seasonality) Decision on stationary condition
Non-stationary Stationary Moreover, at 12 th , 24 th and 36 th lags high spikes can be seen in the ACF of the 1 st regular differenced series in Figure 4. This seasonal pattern suggests that, this series has seasonality with length 12. Therefore, to remove this seasonality, again a 12 th seasonal difference is taken and the relevant ACF plot is shown in Figure 5. It can be seen from the ACF for 12 th seasonal differenced series in Figure 5 that in almost all the lags, except one at the beginning spikes, are not significant. It seems this transferred series is stationary. Nevertheless, graph of PACF is also obtained for the 12 th seasonal differenced series and it is shown in Figure 6. Now from ACF plot in Figure 5 and PACF plot in Figure 6, it can be claimed that, 12 th seasonal differenced series is stationary. At the same time, the p-values of ADF (0.00) and Kruskal-Wallis (0.53) tests in Table 1 strongly confirm that, the newly generated series by taking the 12 th seasonal difference is stationary. Thus, this converted series now can be used to fit both Holt-Winter's and Box-Jenkin's models.

Developing the Holt-Winter's model
Grid search algorithm in STATISTICA is used to estimate relevant smoothing constants. In this algorithm the values are chosen by setting the initial values as 0.01 and incremental step value by 0.1 and 0.05. Both additive and multiplicative seasonal model types are considered in the Holt-Winter's method. The 1 st choice among the best 10 combinations of smoothing constants based on the ascending order of sum of squares of errors in both model types are taken. Hence the smoothing constants corresponding to both model types with MAPE values are summarized in Table 2 Figure 7 that the normal probability plot is fairly straight and therefore it can be claimed that the residuals of the Holt-Winter's model are normally distributed. Further the p-value (0.793) of Anderson Darling test very strongly confirms the result that the distribution of the residuals is normally distributed. Therefore, the first condition of the diagnostic checking is satisfied by this Holt-Winter's model.

Independency testing of the fitted Holt-Winter's model
The points in the plot of standard residuals versus fitted values in Figure 8 are scattered randomly and it almost lies within 2  limits. Hence it can be stated that, the residuals of the Holt-Winter's model are independent. Therefore, the second underlying assumption of the significant model is also satisfied as there is no serial correlation among the residuals of this Holt-Winter's model.

Heteroscedasticity testing of the fitted Holt-Winter's model
Since the plot of standard residuals versus observations order of Holt-Winter's model in Figure 9 is almost symmetric about zero and within 2  limits, and as there is no any systematic patter, it can be confirmed that the variance of the residuals is constant. Hence, the third condition of a significant model is also satisfied. Therefore, there is no heteroscedasticity in the residuals of the fitted Holt-Winter's model.   Figure 5 and PACF graph in Figure 6, it can be observed that, the appropriate models can have 3 AR terms and 1 MA term as the significant spikes appear at the relevant lags in those graphs. Possible different models, with all the combinations of those terms, are tried out. The selection criterions of the significant models are summarized in Table 3. Among three significant models in Table 3 which satisfy diagnostic checking, seasonal ARIMA(0, 1, 1)(0, 1, 0)12 model is selected as the best model which has the lowest AIC (21.06) and SBC (21.09) values. The results of the diagnostic checking only of the best model are described as follows:

Normality testing of the fitted Box-Jenkin's model
The normal probability plot of the residuals of the Box-Jenkin's model in Figure 10 is almost linear and therefore it can be stated that the residuals of the model is normally distributed. Further, the p-value (0.548) of Anderson Darling test confirms that the distribution of the residuals is normal. Therefore, it can be concluded with 95% confident that the residuals of the Box-Jenkin's model are normally distributed and it satisfies the first condition of the diagnostic checking.

Independency testing of the fitted Box-Jenkin's model
It can be seen from Figure 11 that, the plot of standard residuals versus fitted values of the Box-Jenkin's model is scatted randomly and it almost lies within 2  limits. Therefore, the residuals are independent. Moreover, the Durbin Watson statistic (1.99) confirms that the residuals have no serial correlation. Therefore, the residuals satisfy the second condition of diagnostic checking that there is no serial correlation among the residuals of the Box-Jenkin's model.

Heteroscedasticity testing of the fitted Box-Jenkin's model
There is no any systematic patter in the plot of standard residuals versus observations order of the Box-Jenkin's model in Figure 12; it is symmetric about zero and it almost lies within 2  limits.
Therefore, it can be claimed that the residuals of the model have constant variance. At the same time, the p-value (0.67) of White's general test strongly confirms that there is no heteroscedasticity. Therefore, it can be concluded with 95% confidence that, the variance of residuals is constant and thus no heteroscedasticity exists. All three underline assumptions of the diagnostic checking are satisfied by the fitted Box-Jenkin's model. Therefore, this model is significant and it can be used for the purpose of forecasting future tourism demand for Sri Lanka.

Validating the fitted models
To check the model validation as the accuracy of the fitted model, the series from January 2017 to June 2017 are taken into account. Predicted values from both Holt-Winter's and Box-Jenkin's models are, separately, obtained in this particular period and which are summarized in Table 4 as follows: As far as the accuracy of the model is concerned based on the MAPE values in Table 4, both are good models. On the one hand, MAPE value of the Box-Jenkin's model is approximately 10% and therefore the accuracy of that model is very high. On the other hand, the MAPE value of the Holt-Winter's model is approximately 14% and its accuracy also good. However, when compared with MAPE values, the Box-Jenkin's method performed better than the Holt-Winter's method in terms of forecasting accuracy of the model. Therefore, the Box-Jenkin's model is recommended to forecast future tourism demand for Sri Lanka.

Forecasting future tourist arrivals in Sri Lanka
The Figure 13 represents the monthly-wise forecast of tourism demand by the method of the Box-Jenkin's from July 2017 to December 2017. According to the month-wise forecast of tourism demand by the method of Box-Jenkin's model in Figure 13, in the month of December 2017 more than 235,000 tourists will arrive to Sri Lanka and which will be not only the highest monthly arrival for that year, but also the highest monthly arrivals in the history of Sri Lankan tourism so far. In the meantime, a lower number of tourists will come in the months of September and October in the year 2017. On the average, nearly 192,000 tourist arrivals per month are estimated for the second-half of the year 2017. Nearly 1.15 million of tourists will arrive in the second-half of the year 2017, which is 5.7% increase compared to the arrivals in the second-half of the year 2016.

Conclusions and Recommendations
Based on the series of tourist arrivals from June 2009 to June 2017, nearly 10.5 million of tourists had visited the island after the civil conflict in Sri Lanka. Accordingly, on the average, nearly 108,000 visitors per month had come to Sri Lanka as tourists.
Both models fitted by the methods of Holt-Winter's and Box-Jenkin's are significant. However, based on the forecasting accuracy of the model, it is concluded that the Box-Jenkin's method outperforms the Holt-Winter's method for tourist arrivals in Sri Lanka, after the civil conflict. Therefore, SARIMA (0, 1, 1) (0, 1, 0)12 is recommended as the best model to forecast the future tourism demand for Sri Lanka.
It can also be concluded that, in the second half of the year 2017, nearly 1.15 million of tourists will arrive to Sri Lanka and it is about 5.7% increase compared to the arrivals in the second half of the year 2016. Therefore, the country must be ready to cater to them in a professional manner such as facilitating their accommodations at their required level-that will increase the demand for tourist in the future. Hence, the tourism industry could contribute significantly to the country's economy in terms of foreign exchange.
It is hereby recommended that further studies should be conducted employing the neural network modeling method for the post conflict tourism demand for Sri Lanka and compare the forecasting accuracy of that model with the other models fitted so far in the same time scenario.