An Empirical Study on Human Leptospirosis Cases in the Western Province of Sri Lanka

Leptospirosis is a zoonotic infectious disease found worldwide, and it is growing into a major public health threat in Sri Lanka. Records show that over 4000 cases were reported in Sri Lanka in 2016, of which nearly one fourth came from the Western province alone. The objective of this study is to model leptospirosis cases in the Western province of Sri Lanka using time series analysis. Since the purpose of forecasting is to plan future activities, this study will support the planning of future control programmes. Appropriate tests were employed in the preliminary analysis to study the province-wise and district-wise distribution of leptospirosis cases in Sri Lanka. Seasonal Autoregressive Integrated Moving Average (SARIMA) models were developed using standard techniques, diagnostic tests were carried out on the tentatively fitted models, and the usual selection criteria were used to select the best model. Finally, the mean absolute percentage error was used to measure the accuracy of forecasting. The results show that the Western province (28.41%) is the most affected part of the island. Moreover, the Gampaha (10.78%), Kalutara (9.59%) and Colombo (8.04%) districts of the Western province are ranked among the first 5 districts of Sri Lanka based on the average number of recorded cases. The accuracy of the fitted SARIMA(1, 0, 0)(0, 1, 1)12 model is over 85%. Therefore, it can be used to forecast future leptospirosis cases in the Western province. Based on the fitted model, the expected number of new cases in the Western province for the year 2017 is estimated to be 1168.

* Corresponding Author: S. R. Gnanapragasam, email: srgna@ou.ac.lk, https://orcid.org/0000-0003-1411-4853
(Received 23 February 2017; Revised 12 May 2017; Accepted 20 July 2017) © OUSL http://doi.org/10.4038/ouslj.v12i1.7354

Introduction
Leptospirosis is a zoonotic infectious disease found worldwide. It is generally known as rat fever and affects animals as well as humans. It is growing into a major public health threat in Sri Lanka. A large number of suspected human leptospirosis cases began to be reported in Sri Lanka in late 2007 (Agampodi, Nugegoda & Thevanesam, 2010; Gunaratna, Handunnetti, Bulathsinghalage & Somaratne, 2012). The infection is carried by rodents and other domesticated and wild animals. The common route of infection is through water contaminated by urine from infected animals. Transmission from animals to humans is common, but human to human transmission is very rare (Plouffe, 2016; Chadsuthi, Modchang, Lenbury, Iamsirithaworn & Triampo, 2012). Outdoor and agricultural workers, such as those in paddy fields, are particularly at risk of this infection. A higher number of leptospirosis cases is reported during the rainy season, and it might even reach epidemic proportions in case of flooding, because floods cause rodents to move into residential areas.
Some of the previous studies on the clinical features of human leptospirosis are as follows. The study of Gunaratna et al. (2012) showed that, in Sri Lanka, serum nitrite levels were increased in patients with confirmed leptospirosis compared to healthy controls. The study of Ramsey, Rubin-Smith, Norwich, Katumuluwa, Hettiarachchi, Wimalage, Danushka, Madushanka, Nadeeshani, Thilakarathna, Sewwandi, Malhari, Sirisena & Agampodi (2015) concluded that the most striking feature of leptospirosis cases was the higher prevalence of hypotension and bradycardia in patients. In addition, the review of Naotunna & Agampodi (2016) clearly showed the diversity of Leptospira in Sri Lanka. This diversity concerned only the disease-causing agent. From the records of the epidemiology unit in the island, it can be observed that, on average, leptospirosis cases in Sri Lanka were relatively high in every other year. Agampodi, Dahanayaka, Bandaranayaka, Perera, Priyankara, Weerawansa, Matthias & Vinetz (2014) argued that the unusual clinical features observed among the leptospirosis cases of 2011 in Sri Lanka could be due to uncommon L. kirschneri strains that arose in the context of dry rather than wet season epidemiology, and perhaps due to changing human-animal interactions or the introduction of novel Leptospira to the region.
Statistical approaches have also been used previously to identify associations between human leptospirosis and other influential factors. For instance, Chadsuthi et al. (2012) developed models to study seasonal trends of climate factors and their effects on leptospirosis incidence in Thailand. Those models showed the trend in leptospirosis cases and closely fitted the recorded data. It was noted in the study that strong seasonality of reported leptospirosis cases was present year round. Further, models were developed by Plouffe (2016) to analyze leptospirosis incidence in Sri Lanka and its relation to rainfall. The model included current and previous rainfall covariates, as well as regression on previous cases of leptospirosis at a local and seasonal time scale. Those covariates indicated no significant correlation with leptospirosis incidence in Sri Lanka. Moreover, in the study of Robertson, Nelson & Stephen (2012), space-time scan statistics were combined with regression modeling to test associations during endemic and outbreak periods. It was found that leptospirosis risk was positively associated with shorter average distance to rivers and with a higher percentage of agriculture made up of farms. Further, in that study, the outbreak locations in 2008 were characterized by shorter distance to rivers and higher population density.
The records of leptospirosis in Sri Lanka show that, island-wide, over 4000 cases were reported in the year 2016, of which nearly one fourth (1066) were reported from the Western province alone (Epidemiology Unit, 2017). The study of Denipitiya, Chandrasekharan, Abeyewickreme, Viswakula & Hapugoda (2016) analyzed spatial and seasonal patterns of human leptospirosis in the Gampaha district of the Western province and predicted the leptospirosis epidemic trend in that district. That study provided an evidence base for reducing the disease burden by improving understanding of the dynamic patterns of the disease, but only in the Gampaha district. In the present study, leptospirosis cases in all three districts of the Western province are taken into account. The objective of this study is to model leptospirosis cases in the Western province of Sri Lanka using time series analysis. The study of Agampodi et al. (2010) discussed the urgent need for a national programme for the control and prevention of leptospirosis. Since the purpose of forecasting is to plan activities for the future, this study will support the planning of such control programmes in the Western province of Sri Lanka.

Methodology

Source of data
Monthly leptospirosis records from January 2010 to December 2016 were taken from the Epidemiology Unit of the Ministry of Health, Nutrition and Indigenous Medicine in Sri Lanka. The data from January 2010 to June 2016 were used to develop the time series models, whereas the records from July 2016 to December 2016 were used for validation of the fitted model.

Standard tests for the preliminary analysis
Prior to model development, the following standard tests were carried out to study the behavior of data.

Augmented Dickey-Fuller test (ADF)
The ADF test is used to test whether the series has a unit root. It confirms, statistically, whether the series is stationary in terms of trend. The test statistic is computed for the model $Y_t = \rho Y_{t-1} + u_t$, where $-1 \le \rho \le 1$, $u_t$ is white noise and $n$ is the number of observations. Hypotheses: H0: $\rho = 1$ (the series has a unit root) versus H1: $\rho < 1$ (the series has no unit root).
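As an illustration of the unit-root idea, the following is a minimal sketch of the simple (non-augmented) Dickey-Fuller regression, assuming the model $Y_t = \rho Y_{t-1} + u_t$ with no constant and no lagged difference terms. The function name and the example series are hypothetical and are not taken from the study. The returned tau statistic has to be compared against Dickey-Fuller critical values, which are not computed here.

```python
import math


def dickey_fuller_tau(y):
    """Estimate rho and the tau statistic for Y_t = rho * Y_{t-1} + u_t.

    Simplified sketch: no constant, no trend, no lagged differences,
    so this is the basic Dickey-Fuller regression, not the augmented one.
    """
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]  # Delta Y_t
    sxx = sum(x * x for x in lagged)
    # OLS slope of Delta Y_t on Y_{t-1}; gamma = rho - 1
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - gamma * x for x, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated parameter
    sigma2 = sum(e * e for e in residuals) / dof
    std_err = math.sqrt(sigma2 / sxx)
    return 1.0 + gamma, gamma / std_err


# Hypothetical mean-reverting series, for illustration only.
series = [2.0, -1.0, 0.5, -0.4, 0.3, -0.1, 0.2, -0.3, 0.1]
rho_hat, tau = dickey_fuller_tau(series)
```

A strongly negative tau is evidence against H0 (unit root); in practice a statistical package that also supplies the critical values or p-value would normally be used.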

Kruskal-Wallis Test
This test is used to confirm the seasonality in the series. The hypotheses to be tested are H0: the series has no seasonality versus H1: the series has seasonality. The test statistic is defined as: $H = \frac{12}{N(N+1)} \sum_{i=1}^{L} \frac{R_i^2}{n_i} - 3(N+1)$, where $N$ is the total number of rankings, $R_i$ is the sum of the rankings in a specific season, $n_i$ is the number of rankings in a specific season and $L$ is the length of the season.
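The Kruskal-Wallis statistic can be sketched in a few lines, assuming the usual form $H = \frac{12}{N(N+1)} \sum_i R_i^2 / n_i - 3(N+1)$ with average ranks for tied values and no tie correction. The helper names are hypothetical; for a monthly series, each group would hold the observations of one calendar month.

```python
def group_by_season(series, season_length=12):
    """Split a series into one group per season (e.g. one per calendar month)."""
    return [series[m::season_length] for m in range(season_length)]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # positions i+1 .. j (1-based) share the same value: use their mean rank
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    n_total = len(pooled)
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Three clearly separated groups, for illustration only.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Under H0, H is compared with a chi-square distribution with L - 1 degrees of freedom; the p-value calculation is left to a statistics library.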

Autocorrelation function (ACF)
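ACF at lag k is defined by $r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$, where $\bar{Y}$ is the mean of the series and $n$ is the number of observations. If the first several autocorrelations are persistently large in the graph of the ACF and trail off to zero rather slowly, it can be assumed that a trend exists and hence that the time series is non-stationary.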
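A minimal sketch of the sample autocorrelation, assuming the common estimator that divides the lag-k sum of cross-products of deviations by the total sum of squared deviations. The function name and the toy series are illustrative only.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# A perfectly periodic toy series with period 12 shows a large spike at lag 12.
pattern = [5, 7, 9, 6, 4, 3, 2, 3, 6, 8, 10, 7]
r = acf(pattern * 4, 12)
```

For a monthly series, a pronounced spike at lag 12 of this kind is the pattern that suggests seasonality of length 12.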

Partial autocorrelation function (PACF)
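PACF at lag k is the conditional correlation between $Y_t$ and $Y_{t-k}$, given the intermediate observations, and is defined as follows: $\phi_{kk} = \mathrm{Corr}(Y_t, Y_{t-k} \mid Y_{t-1}, \ldots, Y_{t-k+1})$. The purpose of examining the autocorrelation function (ACF) and the partial autocorrelation function (PACF) is to determine the nature of the process under consideration.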

Seasonal differencing method
This method is used to transform a non-stationary series into a stationary series. In this method, differences are taken at seasonal lags. If spikes appear seasonally in the autocorrelation function at particular lags, then it can be assumed that there is a seasonal pattern in the series. The seasonal difference is defined as: $\nabla_L Y_t = Y_t - Y_{t-L}$, where $L$ is the length of the season.
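Seasonal differencing, taken as $Y_t - Y_{t-L}$, is simple enough to sketch directly. The function name is hypothetical, and the toy series below is not the study's data.

```python
def seasonal_difference(y, season_length=12):
    """Return Y_t - Y_{t-L} for t = L .. len(y) - 1 (the first L values are lost)."""
    return [y[t] - y[t - season_length] for t in range(season_length, len(y))]


# A purely seasonal toy series with season length 4 differences to all zeros.
toy = [10, 20, 30, 40] * 3
differenced = seasonal_difference(toy, season_length=4)
```

Note that the differenced series is shorter than the original by one season, so 78 monthly observations would leave 66 values after a 12th difference.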

Development of seasonal ARIMA model
A model that combines autoregressive terms and moving average terms is generally called an autoregressive moving average (ARMA) model. A formulation of an ARMA(p, q) process is given as: $Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + u_t + \sum_{j=1}^{q} \theta_j u_{t-j}$, where $u_t$ is white noise. If the series has a trend, it can often be converted to a stationary series by differencing, and the resulting model is generally denoted as ARIMA(p, d, q), where d indicates the amount of differencing.
In some cases, the series shows a repeating or cyclic behavior. These seasonal patterns can be used very effectively to further improve the forecasting.

Diagnostic tests
Diagnostic tests are performed to determine the adequacy of the model. After identifying the tentative models, the following tests are applied for residual analysis:

Anderson Darling test (AD)
The AD test is used to test whether a sample of data comes from a population with a specific distribution. It is a modification of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails than the K-S test does. Here the hypotheses are H0: the data follow a normal distribution versus H1: the data do not follow a normal distribution. The test statistic of the AD test is: $A^2 = -N - \frac{1}{N} \sum_{i=1}^{N} (2i-1) \left[ \ln F(Y_i) + \ln\left(1 - F(Y_{N+1-i})\right) \right]$, where $F$ is the cumulative distribution function of the specified distribution, $Y_i$ are the ordered data and $N$ is the total number of observations.
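A minimal sketch of the Anderson-Darling statistic for normality follows, assuming the standard form $A^2 = -N - \frac{1}{N}\sum (2i-1)[\ln F(Y_i) + \ln(1 - F(Y_{N+1-i}))]$ with the normal mean and standard deviation estimated from the sample. Function names and sample values are hypothetical; critical values and p-values are not computed.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def anderson_darling_normal(data):
    """Anderson-Darling A^2 against a normal with estimated mean and sd."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in data) / (n - 1))
    cdf = [normal_cdf(v, mu, sigma) for v in sorted(data)]
    # i runs 1..n; cdf[i - 1] is F(Y_i) and cdf[n - i] is F(Y_{N+1-i})
    s = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


symmetric = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
skewed = [1, 1, 1, 1, 1, 1, 50]
a2_symmetric = anderson_darling_normal(symmetric)
a2_skewed = anderson_darling_normal(skewed)
```

Larger values of A^2 indicate a poorer fit to the normal distribution, so the skewed sample yields the larger statistic.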

Durbin-Watson (DW) statistic
The most important test for detecting serial correlation is the DW statistic, which is used to test for randomness of residuals. The test statistic is defined as: $DW = \frac{\sum_{t=2}^{n} (u_t - u_{t-1})^2}{\sum_{t=1}^{n} u_t^2}$, where $u_t$ is the white noise (residual) of the fitted model. A DW value close to 2 reveals that the residuals have no serial correlation.
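The Durbin-Watson statistic, taken as the sum of squared successive differences of the residuals divided by their sum of squares, can be sketched as below. The function name and the residual values are illustrative only.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 means no first-order serial correlation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Alternating residuals (negative correlation) push the statistic above 2;
# constant residuals (positive correlation) push it towards 0.
dw_alternating = durbin_watson([1, -1, 1, -1])
dw_constant = durbin_watson([1, 1, 1, 1])
```

The statistic always lies between 0 and 4, which is why values close to the midpoint 2 are read as absence of serial correlation.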

Lagrange's Multiplier (LM) test
The LM test is used to test the independence of residuals. It is an alternative to the Durbin-Watson test for serial correlation among residuals. The null hypothesis to be tested here is H0: there is no serial correlation of any order. The test statistic is $nR^2 \sim \chi^2_{df}$, where df is the number of regressors in the auxiliary regression (only linear terms of the dependent variable are in the auxiliary regression), $R^2$ is the coefficient of determination and $n$ is the number of observations.

White's general test
This test is used to check the constant variance of residuals. Accordingly, the null hypothesis is H0: homoscedasticity against the alternative hypothesis H1: heteroscedasticity. The test statistic is: $nR^2 \sim \chi^2_{df}$, where df is the number of regressors in the auxiliary regression (squared terms of the dependent variable are also included, in addition to the terms used in the auxiliary regression of the LM test).

Model selection
To select the best model among the significant models, the following criteria are applied:

Coefficient of determination ($R^2$)
$R^2$ is the proportion of variance of a dependent variable explained by the model. Mostly, the best model gives the largest $R^2$ value.

Akaike Information Criterion (AIC)
AIC is often used for model selection. For sample size $n$, the expression of AIC is given by: $\mathrm{AIC} = n \ln(\hat{\sigma}^2) + 2k$, where $k$ is the number of parameters in the model and $\hat{\sigma}^2$ is the sample variance of the residuals. Usually, the best model is the one which gives the lowest AIC value.

Schwartz's Bayesian Criterion (SBC)
SBC is another widely used criterion for model selection in time series analysis. For sample size $n$, the expression of SBC is given as: $\mathrm{SBC} = n \ln(\hat{\sigma}^2) + k \ln(n)$, where $k$ is the number of parameters in the model and $\hat{\sigma}^2$ is the sample variance of the residuals.
Generally, the best model gives the lowest SBC value.
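The two information criteria can be compared with a short sketch. It assumes one common textbook form, AIC = n ln(sigma^2) + 2k and SBC = n ln(sigma^2) + k ln(n); the exact expressions used in the study may differ, and the residual variances below are hypothetical values chosen only to show the comparison.

```python
import math


def aic(n, k, sigma2):
    """Akaike Information Criterion (assumed form: n*ln(sigma2) + 2k)."""
    return n * math.log(sigma2) + 2 * k


def sbc(n, k, sigma2):
    """Schwarz Bayesian Criterion (assumed form: n*ln(sigma2) + k*ln(n))."""
    return n * math.log(sigma2) + k * math.log(n)


# Hypothetical candidates: (label, number of parameters, residual variance).
candidates = [("model A", 2, 410.0), ("model B", 2, 455.0)]
n_obs = 66  # e.g. 78 monthly values minus 12 lost to seasonal differencing

best_by_aic = min(candidates, key=lambda c: aic(n_obs, c[1], c[2]))
best_by_sbc = min(candidates, key=lambda c: sbc(n_obs, c[1], c[2]))
```

Because ln(n) exceeds 2 once n is larger than about 7, SBC penalizes additional parameters more heavily than AIC for series of this length.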

Model validation
To check the accuracy of the fitted model, the following technique is used.

Mean Absolute Percentage Error (MAPE)
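MAPE is used to check the accuracy of the model. It is the average of the absolute values of the percentage errors and is generally used to evaluate the forecast against the validation sample. It can also be used to compare the average forecast accuracy of different models. It is defined as follows: $\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| \times 100$, where $Y_t$ is the observed value, $\hat{Y}_t$ is the forecasted value and $n$ is the number of forecasts.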
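As a small illustration, MAPE can be computed as the mean of the absolute percentage errors over the validation period. The function name and the numbers below are hypothetical and are not the study's observed or forecasted cases.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical validation sample: two months, each forecast off by 10%.
error_pct = mape([100, 200], [90, 220])
accuracy_pct = 100.0 - error_pct
```

Reading accuracy as 100 minus MAPE is the convention under which a MAPE below 15% corresponds to an accuracy above 85%.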

Results and Discussions
For the preliminary analysis, the monthly leptospirosis cases from January 2010 to December 2016 are taken. Accordingly, the results are discussed as follows:

Average leptospirosis cases in Sri Lanka
It is observed from Figure 1(a) that, every other year, the cases are higher than in the immediately preceding year. Also, on average, the highest number of cases is reported in the year 2011, whereas the lowest is reported in the year 2012. In all other years, the numbers of cases are similar to one another. According to Agampodi et al. (2014), the unusual cases reported in the year 2011 are due to uncommon L. kirschneri strains and perhaps due to changing human-animal interactions.
As per the pattern appearing in Figure 1(b), the highest average number of cases is reported in the month of March. On the other hand, the lowest average number of cases is reported in the month of August. Moreover, the average number of monthly cases is 355, and hence it can be stated that, on average, nearly 12 leptospirosis cases per day are reported island-wide.
However, when the raw data are inspected, it is observed that the number of cases in March 2011 is extremely high compared to the data in other months of any year. This extreme observation (745) is from the Kurunegala district and is more than double the overall monthly mean (355). After adjusting for this extreme observation, the highest number of cases is, on average, reported in the month of November of every year.

Ranking the provinces and districts of Sri Lanka based on leptospirosis
When the average leptospirosis cases in provinces are considered, the provinces in Sri Lanka can be ranked as marked in Figure 2.
According to Figure 2, the Western province is ranked number 1 and the last rank is assigned to the Northern province. Ranks 2 and 3 go to the Sabaragamuwa and Southern provinces respectively. It is observed from Figure 2 that the first 5 ranks belong to the parts of the country where the population density is high and paddy field activities are more common than in the other parts of the island. This becomes very clear when the district rankings are ordered as in Table 1.

Leptospirosis cases in the Western province of Sri Lanka
In this subsection, only the leptospirosis cases in Western province are considered and the results relevant to Western province are discussed.

Figure 3(a): Annual leptospirosis cases in Western province
Figure 3(b): District wise leptospirosis cases in Western province
As per the pattern appearing in Figure 3(a), even though the recent cases in the Western province are low in comparison with the cases in the years 2010 and 2011, there appears to be a positive trend. The average number of cases per year from 2010 to 2016 is more than 1200. This suggests around 100 cases per month. Thus, over 3 cases per day are reported in the Western province alone.
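The per-month and per-day figures follow from simple arithmetic, sketched below using the lower bound of 1200 cases per year quoted in the text (the exact annual average is not given here) and assuming a 365-day year.

```python
# Lower bound on the annual average, as stated in the text ("more than 1200").
annual_cases = 1200

cases_per_month = annual_cases / 12   # 12 months per year
cases_per_day = annual_cases / 365    # 365-day year assumed

print(f"per month: {cases_per_month:.0f}, per day: {cases_per_day:.2f}")
```

Since 1200 is only a lower bound, the true daily average is slightly above the value computed here, consistent with "over 3 cases per day".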
It is clear from Figure 3(b) that, in the Western province, most cases are from the Gampaha district (37.93%), where more paddy field activities take place, followed by the Kalutara district (33.76%). The share of the Colombo district (28.32%) is lower, but not much lower than the shares of the other two districts in the Western province of Sri Lanka.

Pattern of the leptospirosis cases recorded in Western province
The pattern of the leptospirosis cases is discussed in this subsection. In particular, the trend and the seasonality in the time series are investigated. In this part, the monthly leptospirosis cases from January 2010 to June 2016 are taken for analysis.
Figure 4: Plot of the time series of leptospirosis cases in Western province of Sri Lanka

It seems from Figure 4 that the series is steady, as there is no clear upward or downward trend, but it is hard to make a statement on the seasonal pattern. These observations have to be confirmed statistically. Therefore, the standard tests are applied to check stationarity and the relevant statistics are reported in Table 2. The p-value (0.00) of the ADF test for the original series suggests that the original data have no trend. However, the p-value (0.00) of the Kruskal-Wallis test for the original series strongly confirms that the data have seasonality. Thus, the series has no trend but seasonality exists. Hence, it can be concluded that the original series is non-stationary and has to be transformed to a stationary series prior to fitting ARIMA models. For further investigation, the graphs of ACF and PACF are also examined. Both graphs show that the original series is non-stationary, as they do not decay exponentially. Also, the twelfth lag in each of the ACF and PACF graphs in Figure 5 has a significantly high spike. Therefore, the original series may have seasonality of length 12. To remove the seasonality, the 12th difference is taken and the above-mentioned standard tests are once again applied to the transformed series. According to the p-values of the ADF (0.00) and Kruskal-Wallis (0.94) tests for the transformed series in Table 2, it can now be concluded that the 12th differenced series is stationary. Hence, this transformed series can be used to fit ARIMA models.

ARIMA model development for the leptospirosis cases in Western province
In this subsection too, to develop the seasonal ARIMA model for leptospirosis cases in Western province of Sri Lanka, the data from January 2010 to June 2016 are considered.
Based on the significant spikes in the graphs of ACF and PACF for the transformed series, all possible seasonal ARIMA models are tried out and three significant models among them are taken for further analysis. The results of the diagnostic checks for those tentatively selected models are reported in Table 3. The p-values of the Anderson-Darling, Lagrange's Multiplier and White's general tests are not significant, and the Durbin-Watson statistic is close to 2, for the 2nd and 3rd models in Table 3. This reveals that the residuals of the 2nd and 3rd models in Table 3 satisfy the diagnostic conditions. Also, the relevant coefficients of those two models are significant at the 5% level. Hence, it can be claimed that the SARIMA(1, 0, 0)(0, 1, 1)12 and SARIMA(0, 0, 1)(1, 1, 0)12 models are significant. Therefore, only those two models are taken for further analysis. The appropriate selection criteria for those models are summarized in Table 4.

Accuracy of fitted model for the leptospirosis cases in Western province
In order to measure the accuracy of the fitted model, the monthly leptospirosis cases from July 2016 to December 2016 are used. To calculate the MAPE value, the actually recorded cases are compared with the forecasted cases from the best model, and the comparison is summarized in Figure 6.

Figure 6: Observed Vs Forecasted leptospirosis cases in Western province of Sri Lanka
According to the information available in Figure 6, the calculated MAPE value is 14.89%, which is less than 15%. Thus, it can be concluded that the accuracy of the forecast from the best model is over 85%. Hence, the SARIMA(1, 0, 0)(0, 1, 1)12 model can be considered a suitable model to forecast the future leptospirosis cases in the Western province of Sri Lanka.
The month-wise forecasted cases are summarized in Table 5. Based on the forecasted leptospirosis cases in Table 5, the months of March, September, October and November in the year 2017 are expected to have more cases than the other months. Further, from the total, it can be concluded that the expected number of new leptospirosis cases in the Western province for the entire year 2017 is approximately 1168.
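The forecasting recursion implied by a SARIMA(1, 0, 0)(0, 1, 1)12 structure can be sketched as follows. This is an illustration only: it assumes the sign convention (1 - phi B)(1 - B^12) Y_t = (1 + theta B^12) u_t, sets unavailable start-up residuals and future shocks to zero, and takes phi and theta as hypothetical placeholders, since the estimated coefficients are not reproduced in this text. It is not the estimation procedure used in the study.

```python
def sarima_100_011_forecast(y, phi, theta, steps, season=12):
    """Forecast `steps` values from (1 - phi*B)(1 - B^s) Y_t = (1 + theta*B^s) u_t.

    Requires len(y) >= season + 2. Coefficients must be supplied by the caller.
    """
    n = len(y)
    # Seasonally differenced series W_t = Y_t - Y_{t-s}, available for t >= s.
    w = {t: y[t] - y[t - season] for t in range(season, n)}
    # In-sample residuals u_t = W_t - phi*W_{t-1} - theta*u_{t-s};
    # residuals before the start of the recursion are taken as zero.
    u = {}
    for t in range(season + 1, n):
        u[t] = w[t] - phi * w[t - 1] - theta * u.get(t - season, 0.0)
    history = list(y)
    for t in range(n, n + steps):
        w_prev = history[t - 1] - history[t - 1 - season]
        # Future shocks have zero expectation, so only past residuals enter.
        w_hat = phi * w_prev + theta * u.get(t - season, 0.0)
        history.append(history[t - season] + w_hat)
    return history[n:]


# With phi = theta = 0 the forecast reduces to last year's value (seasonal naive).
toy = list(range(1, 25))  # two "years" of a toy series, not the study's data
naive = sarima_100_011_forecast(toy, phi=0.0, theta=0.0, steps=3)
```

Summing twelve such monthly forecasts would give an annual total of the kind reported for 2017, provided the fitted coefficients and the full training series are supplied.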

Conclusions
From the preliminary analysis based on the records, it can be concluded that nearly 30% of the total leptospirosis cases reported in the island from 2010 to 2016 are from the Western province of Sri Lanka. Further, it is noted that Ratnapura, Kurunegala, Gampaha and Kalutara are the districts of Sri Lanka most affected by leptospirosis.
On average, nearly 12 leptospirosis cases per day are reported island-wide, of which over 3 cases per day are reported from the Western province of Sri Lanka. Further, leptospirosis cases are slightly higher in the 4th quarter of every year.
The estimated time series model for the seasonal leptospirosis cases in the Western province of Sri Lanka is SARIMA(1, 0, 0)(0, 1, 1)12. According to the fitted model, the expected number of leptospirosis cases in the Western province of Sri Lanka for the year 2017 is 1168. Therefore, the relevant authorities can plan their activities to provide treatment for these cases in the year 2017 and can also make arrangements to control leptospirosis in the Western province of Sri Lanka.

Limitation of the study
In this study, only the leptospirosis cases recorded in the Western province of Sri Lanka are considered. The clinical factors behind the higher number of leptospirosis cases in these areas were not taken into consideration in this study.
At the same time, it would be better if a further study considered other non-clinical factors such as population density and climate (dry monsoon or wet monsoon). If such data were available, then another modeling method such as the Dynamic Transfer Function could be applied and the associations with those factors could also be analyzed.
In the seasonal ARIMA model, denoted SARIMA(p, d, q)(P, D, Q)S, the regular AR(p) and MA(q) terms account for the correlation at low lags, while the seasonal AR(P) and seasonal MA(Q) terms account for the correlation at the seasonal lags. Here d, D and S indicate the amount of regular differencing, the amount of seasonal differencing and the seasonality respectively.
Figure 1(a) and Figure 1(b) summarise the average annual and monthly leptospirosis cases in Sri Lanka respectively.

Figure 1(a): Annual average cases Figure 1(b): Monthly average cases

Figure 5: Graphs of ACF and PACF of the leptospirosis cases in Western province

Table 1: District-wise ranks on leptospirosis cases in Sri Lanka
Based on the ranking order in Table 1, it can be stated that the Ratnapura and Kurunegala districts take the first two ranks. The 3rd, 4th and 5th ranks go to the Gampaha, Kalutara and Colombo districts of the Western province of Sri Lanka, on which this study is focused. On the other hand, the last five ranks are assigned to the districts in the Northern province, where the population density is low. Furthermore, as per the percentages of island-wide leptospirosis cases in Table 1, the largest number of cases is recorded in the Western province (28.41%), and hence it is the part of the island most affected by leptospirosis. Moreover, the Gampaha (10.78%), Kalutara (9.59%) and Colombo (8.04%) districts of the Western province are ranked among the first 5 districts of Sri Lanka. It can be concluded that nearly 30% of the total cases are reported from the Western province of Sri Lanka.