### INTRODUCTION

### MATERIALS AND METHODS

### The fitting and selection of the SARIMA models

*p*,

*d*,

*q*)×(

*P*,

*D*,

*Q*) can be represented with the following expression [19]:

*φ*

*(*

_{p}*B*) is the autoregressive operator = 1 −

*φ*

_{1}

*B*−

*φ*

_{2}

*B*

^{2}− … −

*φ*

_{p}*B*

*,*

^{p}*θ*

*(*

_{q}*B*) is the moving average operator = 1 −

*θ*

_{1}

*B*−

*θ*

_{2}

*B*

^{2}− … −

*θ*

_{q}*B*

*, Φ*

^{q}*(*

_{P}*B*

*) is the seasonal autoregressive operator = 1 − Φ*

^{S}

_{S}*B*

*− Φ*

^{S}_{2}

_{S}*B*

^{2}

*− … − Φ*

^{S}

_{PS}*B*

*, Θ*

^{PS}*(*

_{Q}*B*

*) is the seasonal moving average operator = 1 − Θ*

^{S}

_{S}*B*

*− Θ*

^{S}_{2}

_{S}*B*

^{2}

*− … − Θ*

^{S}

_{QS}*B*

*,*

^{QS}*Z*

*is the time series data consisting of the successive observations at time*

_{t}*t*,

*ɛ*

*is the residual error at time*

_{t}*t*,

*B*is the backward shift operator, for which

*B*

^{k}*Z*

*=*

_{t}*Z*

_{t}_{−}

*,*

_{k}*S*is the seasonal length,

*k*is the lag order,

*p*is the non-seasonal autoregressive order,

*d*is the order of non-seasonal differencing,

*q*is the non-seasonal moving average order,

*P*is the seasonal autoregressive order,

*D*is the order of seasonal differencing,

*Q*is the seasonal moving average order.

*p*and

*q*parameters were fitted until the moment, when the residuals from such a fitted process did not show any autocorrelation. The suggestion made by the above-mentioned authors about the adequacy of the models whose

*p*+

*q*<3 and

*P*+

*Q*<3 was considered.

*20*−

*p*−

*q*degrees of freedom (see correlograms in Supplementary Figures S1, S2). The lack of statistical significance indicates that the process is not a white noise. If the statistics was significant, such models (as unfitted) were excluded.

*r*

^{2}*is the residual autocorrelation function:*

_{k}*S*is the maximum lag of this function,

*x*

*is the observation at time*

_{t}*t*,

*x*

*is the observation at time*

_{t−k}*t−k*,

*χ̄*is the mean observation value,

*k*is the lag order, and

*n*is the number of observations.

*Z*

*is the observed value at time*

_{t}*t*,

*Ẑ*

_{t}is the predicted value at time

*t*,

*n*is the overall number of the observed values.

_{c}) [25] were considered in the selection of the best prediction model. The AIC and AIC

_{c}values were calculated using the following formulae:

*k*is the number of model parameters (

*p*+

*q*+

*P*+

*Q*+1) and

*RSS*is the residual sum of squares:

*Z*

*is the real value,*

_{i}*Ẑ*

*is the predicted value,*

_{i}*n*is the sample size (the number of observations).

### The construction and selection of the NARX artificial neural networks

### The use of Wood’s model

*Z*is the average daily milk yield at a given lactation stage,

*t*is the time expressed in days,

*e*is the natural logarithm base,

*a*is the scale parameter that regulates the general height of the curve,

*b*is the parameter controlling the increasing part of the lactation curve,

*c*is the parameter associated with the decreasing part of the curve.

### The verification of the predictions generated by the SARIMA, NARX, and Wood’s models

*r*

*and*

_{1}*r*

*are the compared correlation coefficients,*

_{2}*n*

*and*

_{1}*n*

*are the corresponding sample sizes,*

_{2}*t*is the value of the Student’s t statistics (

*α*= 0.05 and

*n*

*+*

_{1}*n*

*− 4 degrees of freedom were adopted).*

_{2}*I*

^{2}*is the term representing the prediction bias:*

_{O}*Z̄*

*is the mean real value, and*

_{i}*Ẑ*

*is the mean predicted value,*

_{m}*I*

^{2}*represents the error caused by inadequate prediction flexibility:*

_{B}*σ*

*is the standard deviation of the series of real values and*

_{i}*σ*

*is the standard deviation of the series of predicted values.*

_{p}*I*

^{2}*represents the error resulting from the insufficient convergence between the directions of predicted changes and the changes in the predicted variable:*

_{E}*r*is the correlation coefficient between the milk yields predicted by a given model and the real values.

*HLN*) was used:

*es*is the difference between the real values and those predicted by a given model,

*en*is the difference between the real values and those predicted by the model being compared.

### RESULTS

### The SARIMA model selection

_{c}values) for each season was selected for the comparison with the remaining models (Table 2). All the parameters of the presented SARIMA models selected for prediction were statistically significant (Supplementary Table S4). The Ljung-Box Q-statistics was not significant for any of them. The ACF graph (Supplementary Figures S1A, S1B) for the SARIMA models selected using the correlograms shows the exponential decay of the autocorrelation coefficients. Practically, they did not exceed the assumed 95% confidence level for any lag. Since the values of the Ljung-Box Q-statistics for the individual models were statistically non-significant for the last lag (

*k*= 20), it can be stated that this was not white noise. The PACF values did not exceed the assumed confidence level either. Only the correlograms were not so ideal in this case (Supplementary Figures S2A, S2B).

### The selection of the NARX networks for prediction

### Comparison between the predictions generated by the investigated models

^{3}for better legibility) confirmed a good prediction quality. In younger cows, we found the lowest Theil coefficients for Wood’s models in the first two seasons (1.27 and 2.86, respectively). The values for the SARIMA and NARX models in these two seasons were 2.24 and 2.99 as well as 1.84 and 4.80, respectively. In the third and fourth season, we observed the lowest coefficients for the SARIMA models (2.81 and 4.81, respectively), slightly higher for Wood’s model (3.72 and 9.59, respectively), and the highest ones for NARX (3.79 and 11.40, respectively). For all models, the greatest component of the Theil coefficient was

*I*

^{2}*in the first and third season. In the second and fourth season,*

_{E}*I*

^{2}*played a larger role for the NARX and Wood’s models. We found the lowest contribution of the*

_{O}*I*

^{2}*coefficient for almost all models in younger cows (except for the SARIMA and Wood’s models in season 1).*

_{B}*I*

^{2}*had the greatest contribution to the Theil coefficient. Only in the first season,*

_{E}*I*

^{2}*played a larger role for the NARX and Wood’s models. We found a greater contribution of the*

_{0}*I*

^{2}*coefficient (compared with*

_{B}*I*

^{2}*) in the second and fourth season (for all models) and the fourth season (for SARIMA).*

_{0}### DISCUSSION

*p*,

*d*,

*q*, and

*P*,

*D*,

*Q*parameters based on the autocorrelation and PACFs, parameter estimation and the goodness-of-fit verification. The correlogram analysis itself is not an ideal tool for the accurate model parameter selection, since the correlograms do not always unequivocally indicate the adequate choice of the

*p*,

*q*, and

*P*,

*Q*parameters. In the selection of the SARIMA model for specific prediction, AIC and AIC

_{c}can be used. It is worth noting that, in most cases, the models with the low AIC values were also characterized by the low MAE, RMSE, and MAPE values. As stated by Box et al [21], the fitting of the processes with the increasing values of the

*p*and

*q*or

*P*and

*Q*parameters until the moment, when the residuals of such a fitted process do not show autocorrelation, can be limited to the value of

*p*+

*q*/(

*P*+

*Q*)<3. The investigated SARIMA models did not differ considerably in the number of parameters and the AIC and AIC

_{c}values were mainly associated with the above-mentioned errors. It seems that the ultimate selection of the best predicting SARIMA model can be based on RMSE, MAE, and MAPE. The AIC measure, including the number of model parameters, is not more accurate, since the models usually did not differ between each other by more than one or two parameters. In the case of artificial neural networks, the number of parameters was incomparably higher than that for the SARIMA models. Therefore, the comparison between SARIMA and NARX using AIC and AIC

_{c}was not performed, similarly as for Wood’s model.

*I*

^{2}*) resulting from the insufficient convergence between the directions of prediction changes and the changes in the predicted variable. This indicates an insufficient prediction of turning points, which may lead to the prediction overestimation in some lactation periods (seasons 2 and 4 for younger cows and season 4 for older cows) and underestimation in others (season 1 and 3 for older cows) (Figures 2, 3). This finding corresponds to some extent with the results obtained by Murphy et al [15], who investigated lactational milk yield prediction at different time horizons and observed greater MAE and RMSE values for the NARX networks in the first prediction horizon (within 30 days in milk) but these errors decreased in the subsequent prediction periods. On the other hand, the static artificial neural networks were characterized by a general increase in the prediction accuracy with decreasing horizon (a reduction in RMSE from 12.0% to 10.7% with the prediction horizon changing from 305 to 10 days), although in the same prediction horizon range, the underestimation of lactation peak yield increased from 4.8% to 7.4%. However, when analyzing these predictions at 10-day intervals, the errors for NARX and the rest of the models were characterized by larger fluctuations – the neural networks had lower errors in some periods, and the remaining models in others. The predictions generated by the SARIMA, NARX, and Wood’s models in our study were more flexible, which was confirmed by the lowest contribution of the*

_{E}*I*

^{2}*term to the overall Theil coefficient. Moreover, a very low contribution of the*

_{B}*I*

^{2}*term (especially for the SARIMA models) indicated a low prediction bias. However, this coefficient predominated among the components of the Theil coefficient for three NARX and three Wood’s models, confirming a higher prediction bias for them.*

_{O}*p*,

*q*, and

*P*,

*Q*parameters. The advantage of the NARX networks in this regard is evident. Finally, the development of Wood’s models was easier compared with other models (NARX and SARIMA) presented in our study.