Prediction of Future Milk Yield with Random Regression Model Using Test-day Records in Holstein Cows

Various random regression models with different order of Legendre polynomials for permanent environmental and genetic effects were constructed to predict future milk yield of Holstein cows in Korea. A total of 257,908 test-day (TD) milk yield records from a total of 28,135 cows belonging to 1,090 herds were considered for estimating (co)variance of the random covariate coefficients using an expectation-maximization REML algorithm in an animal mixed model. The variances did not change much between the models, having different order of Legendre polynomial, but a decreasing trend was observed with increase in the order of Legendre polynomial in the model. The R-squared value of the model increased and the residual variance reduced with the increase in order of Legendre polynomial in the model. Therefore, a model with 5 order of Legendre polynomial was considered for predicting future milk yield. For predicting the future milk yield of cows, 132,771 TD records from 28,135 cows were randomly selected from the above data by way of preceding partial TD record, and then future milk yields were estimated using incomplete records from each cow randomly retained. Results suggested that we could predict the next four months milk yield with an error deviation of 4 kg. The correlation of more than 70% between predicted and observed values was estimated for the next four months milk yield. Even using only 3 TD records of some cows, the average milk yield of Korean Holstein cows would be predicted with high accuracy if compared with observed milk yield. Persistency of each cow was estimated which might be useful for selecting the cows with higher persistency. The results of the present study suggested the use of a 5 order Legendre polynomial to predict the future milk yield of each cow. (


INTRODUCTION
For avoiding a shortfall or an overproduction of milk, the dairy producers and milk processors of Korea need an accurate prediction of future milk yield from each cow of their herds.The early prediction will also help the Korean dairy industry to take appropriate measures well in time to ensure normal milk supply to consumers.The accurate measurement of daily milk production of each cow and of thus each herd may not be feasible due to high expenses incurred towards data collection.
Test day models (TDM) based on test-day (TD) records of milk production at regular interval of time is now gaining importance not only due to its cost effectiveness but also due to its accuracy to predict future milk yield with fewer test-day records.The TDM can include the individual test-day effects, which affect the test-day yields substantially and it can account for individual differences in the shape of the lactation curves of cows (Jamrozik et al., 1996).Thus, the TDM can be used to analyze individual test-day records of cows in place of full 305 days lactation model (Mayeres et al., 2004).
Several methods have been suggested to predict future milk yield from a limited number of test-day records.Jones (1997) used Empirical Bayesian method (EBM) in which the milk yields from lactation in progress were combined with prior information gathered from herd mates.Macciotta et al. (2002) used Autoregressive Integrated Moving Average (ARMA) model to predict TD milk production data, which required large run of time series data and also required technical expertise on the part of forecaster.Schaeffer and Dekkers (1994) developed random regression model, where, the shape of the lactation curve was modeled by a random regression function.The shape of the lactation curve for an individual cow was divided as two sets of regressions on days in milk (DIM).The fixed regressions for all cows of the same subclass described the general shape for that cow and the random regressions for a cow described the genetic deviations from the fixed regressions, which allowed each cow to have a different shape of the lactation curve on a genetic level (Jamrozik et al., 1996).This was the random regression model for TD yield.Shadparvar and Yazdanshenas (2005) reported genetic parameters for TD milk yields estimated by REML procedure with assuming different TD milk yields to different traits in a multiple traits test day model.Later on, Pool and Meuwissen (1999) developed phenotypic TD models incorporating Legendre polynomials being able to interpolate and extrapolate missing records.The objectives of the present study were to predict the future milk yields of each cow, to predict the persistency of production of each cow, and to access the best random regression function for predicting future milk yield using TDM.

Data
The raw data comprised of a total of 360,945 TD milk yield records from first lactation of Holstein cows in Korea from 1997 to 2003.The recording was done once in a month and each TD yield comprised of morning and evening milk yield of each cow.The data were restricted to DIM on TD between 1 to 400 days, on cows having TD records between 5 and 14 (both included), on herd having TD records more than 10 (i.e. from each herd at least 11 cows had been recorded during each TD) and on cows having TD milk yield from 3 to 70 kg.Frequency by number of records on milk yields by herd-test-date (HTD) was shown in Figure 1.After imposing all the restrictions, the final data consist of 257,908 TD records from 28,135 cows belonging to 1,090 herds.This data set consisting of whole TD records was used to estimate the (co)variance components of random covariate coefficients using model with Legendre polynomial of different order of fit and to estimate the milking curve of each cow.The detailed general statistics of the final data structure is given in Table 1.
For predicting the future milk yield of cows, 132,771 TD records from 28,135 cows were randomly selected from the above data by way of retaining preceding partial TD records on each cow and then future milk yields by each cow were estimated using incomplete records.The random selection of records was carried out by using function of random number generation on SAS package (SAS, 2004).

Statistical analysis
Variance component estimation : The data were classified according to the age at breeding by an interval of 3 months of age (Figure 3).For model construction, the following effects were included in the model for estimating variance component of regression coefficients after checking significances of their effects on milk yields (Table 2).
Model : Three models with 3 rd , 4 th and 5 th order of Legendre polynomials were considered.Within each model, herd was considered as fixed effect and as random effect thus making a total of six models for analysis of covariance components.All the models were compared and the best model was chosen to predict the future milk yield of cows with criteria of prediction error variances.Computer program on which EM-REML algorithm implemented was used to obtain the (co)variance component estimates.
Model search : The shape of the lactation curve of each cow was estimated by TD random regression model with assuming as below: Where, y ij = j th test day milk yield of i th animal; x' ij = incidence row vector for fixed effects of test day milk yield j in animal i; β = [PYR:PM:BA:TM:DIM:HERD]'; Ø ij(m) is a row vector of random regression factors of the m th order Legendre polynomial; k i(m) is a (m by 1) vector of individual random regression coefficients of animal i with m as the order of fit; e ij = random error term.In this model, all the effects were considered as fixed except random regression factor and residual effect.
Another model with assuming herd effects of random was used as below:
The DIM was standardized with the range from -1 to 1.

Prediction of future milk yield
The second data set, with retaining preceding partial TD records on each cow, was used for future milk prediction.The prediction of the deleted records was done and the predicted milk yield was compared with the measured milk yield of each cow.For this purpose, a graph on DIM verses residual standard deviation for predicted milk yield was drawn.Graphical representation of correlation between the real and predicted milk yield was carried out.The average of all cows for predicted and observed milk yield was also plotted to see the accuracy of prediction for whole lactation.

Persistency
Persistency usually refers to the rate of decline in daily yield after the peak of lactation.The persistency of each cow was estimated as proposed by Togashi and Lin (2004) with some modification as follow: Persistency of animal Where, j = j th days in milk; i = i th animal a j(i) was the estimated milk yield at j th days in milk of animal and given by k j(i) = vector 5 th order Legendre polynomial of j th DIM of i th animal Sol(i) = vector of covariate estimate of Legendre polynomial on i th animal When an average lactation curve of Holstein cows for first lactation milk yield was plotted it was observed that the peak of lactation curve was achieved on day 65 (Figure 3).This peak day was included in the formula.

Phenotypic (co)variance estimates
The phenotypic variance and covariance estimates for the random regression coefficients for milk yield modeled by TDM having 5 th order of Legendre polynomials were presented in Table 3 and 4. The herd effect was considered as fixed (Table 3) and as random (Table 4).The present estimate for the intercept (β 0 ) was much lower than the estimate reported by Pool and Meuwissen (1999).But for the other higher order random covariate coefficients the present estimates were higher than the estimates reported by Pool and Meuwissen (1999).The (co)variance estimates from model having Legendre polynomial of 3 rd and 4 th order of fit were estimated (not shown).The phenotypic variances of the six models considering 3 rd , 4 th and 5 th order of Legendre polynomial had not shown much difference but a decreasing trend was observed in the value of the estimates with an increase in number of random covariate coefficients (i.e. with the order of fit of the Legendre polynomial) in the model (Figure 3).This result was in agreement with the findings of Pool and Meuwissen (1999).The difference in the model considering herd as fixed or random effect had not shown any significant difference in estimating variances for random covariate coefficients.We   The correlations estimated between the random covariate coefficients for the TD model with 3 rd , 4 th and 5 th order of Legendre polynomial function were presented in Table 5.The correlations were small in magnitude but not negligible.The present findings were similar to the previous findings of Pool and Meuwissen (1999) who reported low correlations between the random regression coefficients.There was no specific trend observed for the correlations with the increase in the order of random covariate coefficients in the model.However, Pool and Meuwissen (1999) found a tendency of stronger correlations among the coefficients with higher order of random regression coefficients in the model.The mean of all the random covariate coefficients from all the 28,135 cows were observed as zero (data not shown), which was expected.

Comparison of different models
For predicting future milk yield, a total of six models with 3 rd , 4 th and 5 th order of Legendre polynomials were considered and with each order of Legendre polynomial two models were constructed on the basis of consideration of herd as random or fixed effect.As already mentioned above there was not much difference between the models considering herd as fixed or random on the basis of variances of the random covariate coefficients (Figure 3).
The predicted and observed values of milk yields obtained from the models with 3 rd , 4 th and 5 th order of Legendre polynomials were plotted (Figure 4).The figures for models with 3 rd and 4 th order of polynomial were not shown here.The determinant of the model (R 2 ) increased with the higher order of Legendre polynomial in the models with assuming fixed effect of herds (Table 6).The differences between the models with different order of Legendre polynomials were also compared on the basis of residual variances.The residual variance decreased with the increase in the order of Legendre polynomial in the model (Table 6).Figure 5 showed a plot of residuals for milk yield by DIM using 5 th order Legendre polynomial.On the basis of these comparisons, model with 5 th order of Legendre polynomials considering herd as fixed effect was selected to predict the future milk yields.Pool and Meuwissen (1999) also suggested the suitability of model with 5 th order of Legendre polynomial over model with lower or higher order of Legendre polynomial.

Prediction of future milk yield
The model equation for predicting the milk yield using   .For estimating the accuracy of the prediction, we compared predictions using various test-day records.Residual standard deviation on classes of DIM, which were monthly classified of DIM, for predicted milk yield considering 3, 6 and 9 TD records were presented in Figure 6.Results showed that the next four month's milk yield could be predicted with an error prediction of 4 kg.The residual standard deviation of prediction was increased up to 5 kg till the end of lactation.
When DIM was plotted against the correlation between the real and predicted milk yield, almost 70% correlation between predicted and observed values was shown for following next four months (Figure 7).Even considering only 3 TD records, more than 60% of correlation was observed between the real and predicted milk yield up to 11 months of lactation.These results suggested that few TD observations could be used to predict future milk yield at an early lactation of a cow.However if the records were available on further TD, then its inclusion in the analysis would give better prediction with higher correlation.
Considering average lactation curve of all the cows using observed and predicted values (with only 3 TD records), the two curves were almost similar and a minor difference was observed from 9 month of lactation till the 13 months of lactation (Figure 8).These results suggested that the future average milk from all the cows could be accurately measured using first few months' TD records for almost whole lactation.Using 9 TD records, the two curves were found to be almost similar for whole lactation (Figure 9).Thus, the more the number of test-day records was used, the more the accuracy of prediction would be shown.

Figure 3 .
Figure3.Comparison of variances of the random covariate coefficients for milk yield (kg 2 ) of a test day model using Legendre polynomials of 3 rd , 4 th and 5 th order of fit in Holstein cows.Leg(m) with m th order of fit; Fix = Model assumed herd effect of fixed; Ran = Model assumed herd effect of random.

Figure 4 .
Figure 4.A plot between predicted and observed values of milk yield using 5 th order of Legendre polynomials (y = 1.05674x-1.55486).

Figure 5 .Figure 6 .
Figure 5.A plot of residuals for milk yield by DIM using 5 th order Legendre polynomials in Holstein cows.

Figure 7 .Figure 11 .Figure 8 .Figure 9 .Figure 10 .
Figure 7. Plots of correlation estimates between real and predicted milk yields by monthly classes of days in milk (DIM) using 3, 6 and 9 TD records in Holstein cows.

Table 1 .
General information for test day milk records at first lactation on the study in Holstein cows

Table 2 .
Analysis of variance for milk yields on first lactation test-day records in Holstein cows (R 2 = 0.34)

Table 3 .
Variance (diagonal)and covariance (below diagonal) estimates of the random covariate coefficients for milk yields using the model with assumed 5 th order of Legendre polynomials and herd effect of fixed using EM-REML algorithm in Holstein

Table 4 .
Variance (diagonal)and covariance (below diagonal) estimates of the random covariate coefficients for milk yields using the model with assumed 5 th order of Legendre polynomials and herd effect of random using EM-REML algorithm in Holstein β m = parameter for m th order random regression coefficients.

Table 5 .
Correlations between random covariate coefficients modeled by a test day model using 3 rd (LEG3), 4 th (LEG4) and 5 th (LEG5) order of fit Legendre polynomial, respectively, for milk yields in Holstein cows used herd as fixed effect in the model for predicting future milk yield.