WIND TURBINE ROTOR ACCELERATION: IDENTIFICATION
USING GAUSSIAN REGRESSION
W. E. Leithead
Dept of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, U.K.
and
Hamilton Institute, National University of Ireland, Maynooth, Co. Kildare, Ireland
Yunong Zhang, Kian Seng Neo
Hamilton Institute, National University of Ireland, Maynooth, Co. Kildare, Ireland
Keywords: Data analysis, Gaussian regression, independent processes, fast algorithms.
Abstract: Gaussian processes prior model methods for data analysis are applied to wind turbine time series data to
identify both rotor speed and rotor acceleration from a poor measurement of rotor speed. In so doing, two
issues are addressed. Firstly, the rotor speed is extracted from a combined rotor speed and generator speed
measurement. A novel adaptation of Gaussian process regression based on two independent processes rather
than a single process is presented. Secondly, efficient algorithms for the manipulation of large matrices are
required. The Toeplitz nature of the matrices is exploited to derive novel fast algorithms for the Gaussian
process methodology that are memory efficient.
1 INTRODUCTION
Following some initial publications in the late 1990s
(e.g., Rasmussen (1996), Gibbs (1997), Mackay
(1998), Williams and Barber (1998)), interest has
grown quickly into the application of Gaussian
process prior models to data analysis; e.g. Williams
(1999), Gibbs and Mackay (2000), Sambu et al
(2000), Yoshioka and Ishii (2001), Leith et al
(2002), Shi et al (2003), Solak et al (2003), Leithead
et al (2003a), Leithead et al (2003b). In this paper,
these methods are applied to wind turbine time
series data, specifically, site measurements of the
rotor speed for a commercial 1MW machine.
However, the measurement is some unknown
combination of the rotor speed and the generator
speed (scaled by the gearbox ratio) (Leithead et al,
2003b). Furthermore, the data is corrupted by
significant measurement noise. The objective of the
data analysis is to extract from the data both the
rotor speed and the rotor acceleration, an initial yet
important part of identifying the aerodynamics and
drive-train dynamics of variable speed wind turbines
(Leithead et al, 2003b). Previously, only traditional
filtering methods have been employed (Leithead et
al, 2003b).
To successfully identify the wind turbine rotor
speed and acceleration using Gaussian process prior
models, two particular issues must be addressed.
Firstly, since the measurement is a combination of
rotor speed and generator speed, only that
contribution to the measurement due to the rotor
speed must be extracted. Secondly, since analysis
using Gaussian process prior models involves the
inversion and multiplication of N-dimensional
square matrices where N is the number of data
measurements (24,000 in this case), the matrix
manipulations must be efficient. In this paper, novel
adaptations of the Gaussian process data analysis
methodology to meet these two issues are presented
(the first in section 3 and the second in section 4)
and successfully applied to the wind turbine data (in
section 5).
E. Leithead W., Zhang Y. and Seng Neo K. (2005).
WIND TURBINE ROTOR ACCELERATION: IDENTIFICATION USING GAUSSIAN REGRESSION.
In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Signal Processing, Systems Modeling and
Control, pages 84-91
DOI: 10.5220/0001179300840091
Copyright © SciTePress
2 GAUSSIAN PROCESS PRIOR
MODELS
Gaussian process prior models and their application
to data analysis are reviewed in this section.
Consider a smooth scalar function f(.) dependent
on the explanatory variable, $z \in D \subseteq \mathbb{R}^p$. Suppose
N measurements, $\{(z_i, y_i)\}_{i=1}^{N}$, of the value of the
function with additive Gaussian white measurement
noise, i.e. $y_i = f(z_i) + n_i$,
are available and denote them
by M. It is of interest here to use this data to learn
the mapping f(z) or, more precisely, to determine a
probabilistic description of f(z) on the domain, D,
containing the data. Note that this is a regression
formulation and it is assumed the explanatory
variable, z, is noise free. The probabilistic
description of the function, f(z), adopted is the
stochastic process, $f_z$, with the $E[f_z]$, as z varies,
interpreted to be a fit to f(z). By necessity, to define
the stochastic process, $f_z$, the probability
distributions of $f_z$ for every choice of value of $z \in D$
are required together with the joint probability
distributions of $f_{z_i}$ for every choice of finite
sample, $\{z_1, \ldots, z_k\}$, from D, for all k>1. Given the
joint probability distribution for $f_{z_i}$, i=1..N, and the
joint probability distribution for $n_i$, i=1..N, the joint
probability distribution for $y_i$, i=1..N, is readily
obtained since the measurement noise, $n_i$, and the
$f(z_i)$ (and so the $f_{z_i}$) are statistically independent. M
is a single event belonging to the joint probability
distribution for $y_i$, i=1..N.
In the Bayesian probability context, the prior
belief is placed directly on the probability
distributions describing $f_z$ which are then
conditioned on the information, M, to determine the
posterior probability distributions. In particular, in
the Gaussian process prior model, it is assumed that
the prior probability distributions for the $f_z$ are all
Gaussian with zero mean (in the absence of any
evidence the value of f(z) is as likely to be positive
as negative). To complete the statistical description
requires only a definition of the covariance function
$C_f(z_i, z_j) = E[f_{z_i}, f_{z_j}]$, for all $z_i$ and $z_j$. The
resulting posterior probability distributions are also
Gaussian. This model is used to carry out inference
as follows.
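Clearly $p(f_z|M) = p(f_z, M)/p(M)$ where p(M)
acts as a normalising constant. Hence, with the
Gaussian prior assumption,
$$p(f_z|M) \propto \exp\left[-\frac{1}{2}\begin{bmatrix} f_z & Y^T \end{bmatrix}\begin{bmatrix} \Lambda_{11} & \Lambda_{21}^T \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix}^{-1}\begin{bmatrix} f_z \\ Y \end{bmatrix}\right] \qquad (1)$$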
where $Y = [y_1, \ldots, y_N]^T$, $\Lambda_{11}$ is $E[f_z, f_z]$, the ij-th
element of the covariance matrix $\Lambda_{22}$ is $E[y_i, y_j]$ and
the i-th element of vector $\Lambda_{21}$ is $E[y_i, f_z]$. Both $\Lambda_{11}$ and
$\Lambda_{21}$ depend on z. Applying the partitioned matrix
inversion lemma, it follows that
$$p(f_z|M) \propto \exp\left[-\frac{1}{2}(f_z - \hat{f}_z)\Lambda_z^{-1}(f_z - \hat{f}_z)\right] \qquad (2)$$
with $\hat{f}_z = \Lambda_{21}^T \Lambda_{22}^{-1} Y$, $\Lambda_z = \Lambda_{11} - \Lambda_{21}^T \Lambda_{22}^{-1} \Lambda_{21}$.
Therefore, the prediction from this model is that the
most likely value of f(z) is the mean, $\hat{f}_z$, with
variance $\Lambda_z$. Note that $\hat{f}_z$ is simply a z-dependent
weighted linear combination of the measured data
points, Y, using weights $\Lambda_{21}^T \Lambda_{22}^{-1}$. The measurement
noise, $n_i$, i=1,..N, is statistically independent of $f(z_i)$,
i=1,..N, and has covariance matrix B. Hence, the
covariances for the measurements, $y_i$, are simply
$$E[y_i, y_j] = E[f_{z_i}, f_{z_j}] + B_{ij}; \qquad E[y_i, f_z] = E[f_{z_i}, f_z] \qquad (3)$$
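As an illustration of (2) and (3), the following minimal sketch computes the posterior mean and variance at a set of test points. It assumes a scalar explanatory variable, a squared-exponential covariance of the form a exp[-(d/2)(z_i - z_j)^2], and white noise B = bI; the function names, the synthetic data and the hyperparameter values are illustrative assumptions, not taken from the wind turbine analysis.

```python
import numpy as np

def sq_exp_cov(z1, z2, a, d):
    """Assumed covariance a*exp(-d/2 (z1 - z2)^2) between two sets of scalar inputs."""
    diff = np.asarray(z1)[:, None] - np.asarray(z2)[None, :]
    return a * np.exp(-0.5 * d * diff**2)

def gp_predict(z_train, y, z_test, a, d, b):
    """Posterior mean and variance of f at each test point, following (2) and (3)."""
    # Lambda_22: E[y_i, y_j] = E[f_zi, f_zj] + B_ij with B = b*I
    lam22 = sq_exp_cov(z_train, z_train, a, d) + b * np.eye(len(z_train))
    # Column k holds Lambda_21 for z = z_test[k], i.e. E[y_i, f_z]
    lam21 = sq_exp_cov(z_train, z_test, a, d)
    # Lambda_11 = E[f_z, f_z] = a for this covariance
    lam11 = a * np.ones(len(z_test))
    weights = np.linalg.solve(lam22, lam21)          # Lambda_22^{-1} Lambda_21
    mean = weights.T @ y                             # Lambda_21^T Lambda_22^{-1} Y
    var = lam11 - np.sum(lam21 * weights, axis=0)    # Lambda_11 - Lambda_21^T Lambda_22^{-1} Lambda_21
    return mean, var

rng = np.random.default_rng(0)
z = np.linspace(0.0, 8.0, 200)
y = np.sin(z) + 0.2 * rng.standard_normal(z.size)    # synthetic data for illustration only
mean, var = gp_predict(z, y, z, a=1.8, d=2.5, b=0.04)
```

The linear system is solved directly rather than forming the inverse of the measurement covariance, which is the usual choice for small N; the fast algorithms of section 4 address the large N case.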
In addition, assume that the related stochastic
process, $f_z^{\delta e_i}$, where $f_z^{\delta e_i} = (f_{(z+\delta e_i)} - f_z)/\delta$ and $e_i$ is
a unit basis vector, is well-defined in the limit as
$\delta \to 0$, i.e. all the necessary probability
distributions for a complete description exist. Denote
the derivative stochastic process, i.e. the limiting
random process, by $f_z^{e_i}$. When the partial derivative
of f(z) in the direction $e_i$ exists, $E[f_z^{e_i}]$ as z varies is
interpreted as a fit to $\partial f(z)/\partial z_i$. Provided the
covariance $E[f_{z_i}, f_{z_j}]$ is sufficiently differentiable,
it is well known (O’Hagan, 1978) that $f_z^{e_i}$ is itself
Gaussian and that
$$E[f_z^{e_i}] = \frac{\partial}{\partial z_i} E[f_z] \qquad (4)$$
where $z_i$ denotes the i-th element of z; that is, the
expected value of the derivative stochastic process is
just the derivative of the expected value of the
stochastic process. Furthermore,
$$E[f_{z_0}^{e_i}, f_{z_1}] = \nabla_i^1 E[f_{z_0}, f_{z_1}], \qquad E[f_{z_0}^{e_i}, f_{z_1}^{e_j}] = \nabla_i^1 \nabla_j^2 E[f_{z_0}, f_{z_1}] \qquad (5)$$
where $\nabla_i^1 Q(z_0, z_1)$ denotes the partial derivative of
$Q(z_0, z_1)$ with respect to the i-th element of its first
argument, etc.
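To make (4) and (5) concrete, the sketch below predicts the derivative of f from noisy measurements of f. It assumes a scalar explanatory variable and the squared-exponential covariance a exp[-(d/2)(z_0 - z_1)^2], for which the derivative with respect to the first argument is -d(z_0 - z_1) times the covariance and the mixed second derivative at z_0 = z_1 equals a d. The function name, test signal and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def derivative_predict(z_train, y, z_test, a, d, b):
    """Posterior mean and variance of the derivative process at the test points."""
    z_train = np.asarray(z_train, dtype=float)
    z_test = np.asarray(z_test, dtype=float)
    diff_tt = z_train[:, None] - z_train[None, :]
    lam22 = a * np.exp(-0.5 * d * diff_tt**2) + b * np.eye(z_train.size)  # E[y_i, y_j]
    diff = z_test[None, :] - z_train[:, None]      # z - z_i, shape (N, K)
    cov = a * np.exp(-0.5 * d * diff**2)           # E[f_z, y_i]
    dcov = -d * diff * cov                         # E[f'_z, y_i], first line of (5)
    prior_var = a * d                              # E[f'_z, f'_z], second line of (5)
    weights = np.linalg.solve(lam22, dcov)
    mean = weights.T @ y
    var = prior_var - np.sum(dcov * weights, axis=0)
    return mean, var

rng = np.random.default_rng(0)
z = np.linspace(0.0, 8.0, 200)
y = np.sin(z) + 0.02 * rng.standard_normal(z.size)   # synthetic data; true derivative is cos(z)
dmean, dvar = derivative_predict(z, y, z, a=1.8, d=2.5, b=0.0004)
```

The same weights structure as in the prediction of f itself is used; only the cross-covariance and the prior variance change.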
The prior covariance function is generally
dependent on a few hyperparameters, $\theta$. To obtain a
model given the data, M, the hyperparameters are
adapted to maximise the likelihood, $p(M|\theta)$, or
equivalently to minimize the negative log likelihood,
$L(\theta)$, where
$$L(\theta) = \frac{1}{2}\log\det C(\theta) + \frac{1}{2} Y^T C(\theta)^{-1} Y \qquad (6)$$
with $C(\theta) = \Lambda_{22}$, the covariance matrix of the
measurements.
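A direct evaluation of (6) can be sketched as follows, using a Cholesky factorisation so that neither the determinant nor the inverse is formed explicitly. The parameterisation theta = (a, d, b), with a squared-exponential covariance plus white noise, is an assumption made for illustration, as are the synthetic data.

```python
import numpy as np

def neg_log_likelihood(theta, z, y):
    """Negative log likelihood (6) for an assumed covariance a*exp(-d/2 (zi-zj)^2) + b*delta_ij."""
    a, d, b = theta
    diff = z[:, None] - z[None, :]
    cov = a * np.exp(-0.5 * d * diff**2) + b * np.eye(z.size)   # C(theta)
    chol = np.linalg.cholesky(cov)                   # cov = chol @ chol.T
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))    # log det C
    w = np.linalg.solve(chol, y)                     # w @ w equals Y^T C^{-1} Y
    return 0.5 * log_det + 0.5 * (w @ w)

rng = np.random.default_rng(1)
z = np.linspace(0.0, 2.0, 50)
y = np.sin(3.0 * z) + 0.2 * rng.standard_normal(z.size)   # synthetic data for illustration only
nll = neg_log_likelihood((1.8, 2.5, 0.04), z, y)
```

This dense evaluation needs matrix-level storage and a cubic number of operations, which is the cost that section 4 sets out to avoid.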
3 MODELS WITH TWO
GAUSSIAN PROCESSES
Suppose that the measurements are not of a single
function but of the sum of two functions with
different characteristics; that is, the measured values
are y
i
=f(z
i
)+g(z
i
)+n
i
. Now, it is of interest to use the
data to learn the mappings, f(z) and g(z), or, more
precisely, to determine a probabilistic description for
them. The probabilistic description by means of a
single stochastic process, discussed in the previous
section, is no longer adequate. Instead, a novel
probabilistic description in terms of the sum of two
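independent stochastic processes, $f_z$ and $g_z$, is
proposed below.
Since $f_z$ and $g_z$ are independent, $E[FG^T]=0$ where
$F = [f_{z_1}, \ldots, f_{z_N}]^T$ and $G = [g_{z_1}, \ldots, g_{z_N}]^T$. Let the
covariance functions for $f_z$ and $g_z$ be $C_f(z_i, z_j)$ and
$C_g(z_i, z_j)$, respectively. Note that this is a different
model from one using a single stochastic process
with covariance function, $(C_f + C_g)$. It follows that
$$\Lambda = E\left[\begin{bmatrix} F \\ G \\ Y \end{bmatrix}\begin{bmatrix} F^T & G^T & Y^T \end{bmatrix}\right] = \begin{bmatrix} \Lambda_{FF} & 0 & \Lambda_{FF} \\ 0 & \Lambda_{GG} & \Lambda_{GG} \\ \Lambda_{FF} & \Lambda_{GG} & Q \end{bmatrix} \qquad (7)$$
with $\Lambda_{FF} = E[FF^T]$, $\Lambda_{GG} = E[GG^T]$ and
$Q = B + \Lambda_{FF} + \Lambda_{GG}$.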
The prior joint probability distribution for F, G
and Y is Gaussian with mean zero and covariance
matrix Λ. The requirement is to obtain the posterior
probability distribution for F and G conditioned on
the data set, M, subject to the condition that they
remain independent. Of course, the posterior
probability distribution remains Gaussian. The mean
and covariance matrix for the posterior are provided
by the following theorem (Leithead et al, 2005).
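Theorem 1: Given that the prior joint probability
distribution for F, G and Y is Gaussian with mean
zero and covariance matrix $\Lambda$, the posterior joint
probability distribution for $[F^T, G^T]^T$ conditioned on
the data set, M, subject to the condition that they remain
independent, is Gaussian with
$$\text{mean} = \begin{bmatrix} \Lambda_{FF} Q_F^{-1} Y \\ Q_F^{-1} B \Lambda_{GG} Q^{-1} Y \end{bmatrix}, \qquad \text{cov} = \begin{bmatrix} \Lambda_{FF} Q_F^{-1} B & 0 \\ 0 & Q_F^{-1} B \Lambda_{GG} Q^{-1} B \end{bmatrix} \qquad (8)$$
where $Q_F = B + \Lambda_{FF}$.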
Proof. Omitted due to space limitations.
Note, since $f_z$ and $g_z$ remain independent when
conditioned on M, the prediction and covariance for
$(f_z + g_z)$ are simply the sum of the individual
predictions and covariance values.
[Plot: Data, long length-scale prediction and total error, against time in seconds]
Figure 1: Data values (***), long length-scale component
with confidence intervals (––) and total error and
confidence interval (==)
Example: A commonly used prior covariance
function for a Gaussian process with scalar
explanatory variable is
$$a \exp\left[-\frac{d}{2}(z_i - z_j)^2\right] \qquad (9)$$
It ensures that measurements associated with nearby
values of the explanatory variable should have
higher correlation than more widely separated values
of the explanatory variable; a is related to the
overall mean amplitude and d inversely related to
the length-scale of the Gaussian process.
Let the covariance function for $f_z$ be (9) with
a=1.8 and d=2.5, and the covariance function for $g_z$
be (9) with a=0.95 and d=120; that is, $f_z$ has a long
length-scale and $g_z$ a short length-scale. In addition,
let the measurement noise be Gaussian white noise
with variance b=0.04, i.e. $B_{ij} = b\delta_{ij}$, where $\delta_{ij}$ is the
Kronecker delta. A set of 800 measurements at
constant interval, 0.01, for $y_i = f(z_i) + g(z_i) + n_i$, with the
$f(z_i)$ and $g(z_i)$ the sample values for the stochastic
processes $f_z$ and $g_z$, respectively, is shown in Figure
1.
A prediction for the long and short length-scale
components is obtained using (8); that is, the
conditioning on the data is chosen such that as much
of the data as possible is explained by the long
length-scale component. The long length-scale
component with its confidence interval ($\pm 2$ standard
deviations) is shown in Figure 1 and the short
length-scale component with its confidence interval
is shown in Figure 2. The prediction error for $(f_z + g_z)$
with its confidence interval is also depicted in Figure
1.
[Plot: Short length-scale prediction and confidence intervals, against time in seconds]
Figure 2: Short length-scale component with confidence
intervals
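The two-component prediction of the Example can be sketched numerically as below. The sketch assumes white measurement noise, B = bI, so that B commutes with the other matrices, and takes the first component of (8) to be the prediction obtained with the noise alone and the second to be the corresponding share of the remainder; the synthetic test signal (two sinusoids rather than samples of the two processes) and all names are illustrative assumptions.

```python
import numpy as np

def two_process_predict(z, y, a_f, d_f, a_g, d_g, b):
    """Means and covariances of the two components following (8), assuming B = b*I."""
    diff2 = (z[:, None] - z[None, :])**2
    lam_ff = a_f * np.exp(-0.5 * d_f * diff2)      # long length-scale prior covariance
    lam_gg = a_g * np.exp(-0.5 * d_g * diff2)      # short length-scale prior covariance
    noise = b * np.eye(z.size)                     # B
    q_f = noise + lam_ff                           # Q_F = B + Lambda_FF
    q = q_f + lam_gg                               # Q = B + Lambda_FF + Lambda_GG
    f_mean = lam_ff @ np.linalg.solve(q_f, y)
    g_mean = np.linalg.solve(q_f, noise @ lam_gg @ np.linalg.solve(q, y))
    f_cov = lam_ff @ np.linalg.solve(q_f, noise)
    g_cov = np.linalg.solve(q_f, noise @ lam_gg @ np.linalg.solve(q, noise))
    return f_mean, g_mean, f_cov, g_cov

rng = np.random.default_rng(3)
z = 0.01 * np.arange(200)                          # constant interval 0.01, as in the Example
slow = 1.5 * np.sin(1.5 * z)                       # stand-in for the long length-scale component
fast = 0.4 * np.sin(30.0 * z)                      # stand-in for the short length-scale component
y = slow + fast + 0.2 * rng.standard_normal(z.size)   # noise variance 0.04
f_mean, g_mean, f_cov, g_cov = two_process_predict(z, y, 1.8, 2.5, 0.95, 120.0, 0.04)
```

Under the white-noise assumption the two means add up to the prediction that a single process with covariance (C_f + C_g) would give, which provides a convenient consistency check on an implementation.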
In Section 5, Theorem 1 is applied to the wind
turbine measurement data to extract the contribution
due to the rotor speed and, together with (4) and (5),
to identify the rotor acceleration.
4 TOEPLITZ-BASED
EFFICIENCY IMPROVEMENT
In section 3, a novel adaptation of the Gaussian
regression methodology to support the extraction of
separate components from data is presented.
However, before that procedure can be applied to
large data sets, fast and memory efficient algorithms
are required. That requirement is addressed in this
section.
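As the log likelihood, (6), is in general nonlinear
and multimodal, efficient optimisation routines
usually need gradient information,
$$\frac{\partial L}{\partial \theta_i} = \frac{1}{2}\mathrm{tr}\left(C^{-1}\frac{\partial C}{\partial \theta_i}\right) - \frac{1}{2} Y^T C^{-1}\frac{\partial C}{\partial \theta_i} C^{-1} Y \qquad (10)$$
where $\theta_i$ denotes the i-th hyperparameter and $\mathrm{tr}(\cdot)$
the trace operation of a matrix. Let us denote
$\partial C/\partial \theta_i$ hereafter by P
for notational convenience.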
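The gradient (10) can be checked numerically with a small dense example. The sketch below assumes the covariance a exp[-(d/2)(z_i - z_j)^2] + b delta_ij and differentiates with respect to d only, so that P has elements -(1/2)(z_i - z_j)^2 a exp[-(d/2)(z_i - z_j)^2]; the function name and data are illustrative assumptions.

```python
import numpy as np

def nll_and_gradient(a, d, b, z, y):
    """Return L(theta) from (6) and dL/dd from (10) for the assumed covariance."""
    diff2 = (z[:, None] - z[None, :])**2
    kern = a * np.exp(-0.5 * d * diff2)
    cov = kern + b * np.eye(z.size)            # C
    p_mat = -0.5 * diff2 * kern                # P = dC/dd
    cinv_y = np.linalg.solve(cov, y)           # C^{-1} Y
    cinv_p = np.linalg.solve(cov, p_mat)       # C^{-1} P
    _, log_det = np.linalg.slogdet(cov)
    nll = 0.5 * log_det + 0.5 * (y @ cinv_y)
    grad_d = 0.5 * np.trace(cinv_p) - 0.5 * (cinv_y @ p_mat @ cinv_y)
    return nll, grad_d

rng = np.random.default_rng(2)
z = np.linspace(0.0, 4.0, 40)
y = np.sin(2.0 * z) + 0.2 * rng.standard_normal(z.size)   # synthetic data for illustration only
nll, grad_d = nll_and_gradient(1.8, 2.5, 0.04, z, y)
```

Comparing grad_d with a central finite difference of the returned likelihood is a simple way to validate an implementation before moving to the fast algorithms.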
Clearly, in general, the number of operations in
solving for $\log\det C$ and $C^{-1}$ is of $O(N^3)$, whilst
the memory space to store C, $C^{-1}$ and P is $O(N^2)$
(see Table 1 for specific values). For large
data sets, fast algorithms, that require less memory
allocation, are required for the basic matrix
manipulations, $\log\det C$, $C^{-1}y$ and $\mathrm{tr}(C^{-1}P)$,
when tuning the Gaussian process prior model
hyperparameters.
Now consider a time series with fixed interval
between the measurements. The explanatory
variable, z, in the Gaussian process prior model is a
scalar, the time t. When the covariance function
depends on the difference in the explanatory
variable, as is almost always the case, the covariance
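matrix $C(\theta)$ and its derivative matrices P are
Toeplitz; that is,
$$C(\theta) = \begin{bmatrix} c_0 & c_1 & \cdots & c_{N-1} \\ c_1 & c_0 & \cdots & c_{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_{N-1} & c_{N-2} & \cdots & c_0 \end{bmatrix} = \mathrm{Toeplitz}[c_0\ c_1\ \cdots\ c_{N-1}]$$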
Furthermore, $C(\theta)$ is symmetric and positive
definite.
Here, the Toeplitz nature of $C(\theta)$ is exploited to
derive novel fast algorithms for the Gaussian process
methodology that require less memory allocation. It
is well-known that positive-definite Toeplitz
matrices can be elegantly manipulated in $O(N^2)$
operations. For example, Trench's algorithm
inverts C with $13N^2/4$ operations, and Levinson's
algorithm solves for $C^{-1}y$ with $4N^2$ operations
(Golub & Van Loan, 1996). However, direct
application of these algorithms to Gaussian process
regression may fail even with medium-scale datasets
due to lack of memory space, see Table 1. For
example, on a Pentium-IV 3GHz 512MB-RAM PC,
a MATLAB-JAVA algorithm usually fails around
N=7000 because storing $C^{-1}$ almost uses up the
available system memory. The solution is to adapt
the fast algorithms to use only vector-level storage.
From Table 1 that approach is theoretically able to
handle very large datasets, such as 1-million data
points, in terms of memory requirements.
Table 1: Memory/storage requirement in double precision
N          Matrix        Vector
1000       7.7 MB        7.6 KB
7000       373.9 MB      53.4 KB
15000      1716.6 MB     114.4 KB
20000      -             152.6 KB
30000      -             228.9 KB
50000      -             381.5 KB
100000     -             762.9 KB
1000000    -             7.629 MB
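The vector-level representation referred to above can be sketched as follows: for evenly spaced samples and a covariance that depends only on the time difference, the first column of the covariance matrix determines every entry. The squared-exponential form, the function names and the hyperparameter values are illustrative assumptions; only the sampling rate and run length are those of the wind turbine data described in section 5.

```python
import numpy as np

def covariance_first_column(n, dt, a, d, b):
    """First column [c_0, ..., c_{N-1}] of the Toeplitz covariance for evenly spaced samples."""
    lags = dt * np.arange(n)
    col = a * np.exp(-0.5 * d * lags**2)
    col[0] += b                        # white noise contributes only on the diagonal
    return col

def dense_from_column(col):
    """Expand the representative vector into the full symmetric Toeplitz matrix."""
    idx = np.abs(np.subtract.outer(np.arange(col.size), np.arange(col.size)))
    return col[idx]

n = 24000                              # 600 s sampled at 40 Hz
col = covariance_first_column(n, dt=1.0 / 40.0, a=1.0, d=1.0, b=0.1)   # hypothetical values
vector_bytes = col.nbytes              # 8 bytes per double-precision entry
matrix_bytes = 8 * n * n               # storage a dense N x N matrix would need
```

The dense expansion is only intended for small N, for example to check a fast algorithm against a standard routine.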
Two versions of fast Toeplitz-computation
algorithms are discussed and compared below;
namely, a full-matrix version and a vector-storage
version.
4.1 Full-Matrix Toeplitz
Computation
The simplest way of applying Toeplitz computation
to Gaussian process regression is to compute $C^{-1}$
directly as the basis for the other matrix
manipulations. Specifically, Trench's algorithm of
$O(N^2)$ operations can be readily modified to obtain
$C^{-1}$ whilst simultaneously determining $\log\det C$ as
the logarithm sum of the reflection coefficients.
Then, given $C^{-1}$, the computation
$$\mathrm{tr}(C^{-1}P) = \sum_i \sum_j c_{ij} p_{ij}$$
is easily performed in $O(N^2)$
operations, where $c_{ij}$ and $p_{ij}$ are the ij-th
elements of $C^{-1}$ and P, respectively.
Table 2: Accuracy and speedup of Toeplitz computation
                             N=1000        N=2000        N=3000
Accuracy on log det C        4.1×10^-15    3.4×10^-15    3.5×10^-15
Accuracy on C^-1 y           1.3×10^-12    7.0×10^-13    2.2×10^-12
Accuracy on tr(C^-1 P)       8.1×10^-14    1.7×10^-13    5.5×10^-14
Speed up                     70.45         91.64         90.84
Note that Trench's algorithm uses Durbin's
algorithm to solve Yule-Walker equations (Golub &
Van Loan, 1996). These two algorithms are
implemented separately; specifically, the matrix-free
algorithm that generates an instrumental vector and
the remaining part that generates $C^{-1}$. The former
does not contain any matrix or matrix-related
computation/storage, and is thus able to perform
very high-dimension Toeplitz-computation. In view
of this, it is also used in Section 4.2 as a part of the
vector-storage version of Toeplitz-computation.
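A matrix-free recursion of this kind can be sketched as below. It follows the textbook Levinson recursion for symmetric positive-definite Toeplitz systems (Golub & Van Loan, 1996), with the Durbin (Yule-Walker) vector updated alongside the solution and the log determinant accumulated from the successive prediction error terms. It is not the authors' modified implementation; the function name and the test covariance are assumptions.

```python
import numpy as np

def levinson_solve_logdet(col, y):
    """Solve C x = y and return log det C for the SPD Toeplitz C with first column col.

    Only vectors of length N are stored; the cost is O(N^2) operations.
    """
    col = np.asarray(col, dtype=float)
    rhs = np.asarray(y, dtype=float) / col[0]
    n = col.size
    r = col[1:] / col[0]               # normalised off-diagonal values r_1, ..., r_{N-1}
    x = np.zeros(n)                    # running solution of the leading k x k system
    w = np.zeros(n)                    # Durbin vector solving the Yule-Walker equations
    x[0] = rhs[0]
    log_det = n * np.log(col[0])       # account for the normalisation by c_0
    if n == 1:
        return x, log_det
    w[0] = -r[0]
    alpha = -r[0]                      # reflection coefficient
    beta = 1.0                         # prediction error of the normalised system
    for k in range(1, n):
        beta *= 1.0 - alpha**2
        log_det += np.log(beta)        # det of leading (k+1) block = det of leading k block * beta
        mu = (rhs[k] - r[:k] @ x[k - 1::-1]) / beta
        x[:k] += mu * w[k - 1::-1]
        x[k] = mu
        if k < n - 1:
            alpha = -(r[k] + r[:k] @ w[k - 1::-1]) / beta
            w[:k] += alpha * w[k - 1::-1]
            w[k] = alpha
    return x, log_det

n = 60
col = 1.8 * np.exp(-0.5 * 2.5 * (0.1 * np.arange(n))**2)   # assumed covariance values
col[0] += 0.04                                              # white noise on the diagonal
rng = np.random.default_rng(4)
y = rng.standard_normal(n)
x, log_det = levinson_solve_logdet(col, y)
```

Because only the representative vector and two work vectors are held in memory, the same routine applies unchanged to N in the tens of thousands, at the price of a Python-level loop over N.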
A large number of numerical experiments are
performed to verify the correctness of the modified
algorithms and their implementation. The covariance
function is
$$a \exp\left[-\frac{d}{2}(z_i - z_j)^2\right] + b\delta_{ij} \qquad (11)$$
with random hyperparameters, $a \in (0, 3)$,
$d \in (0, 0.05)$, $b \in (0, 0.3)$. The numerical stability,
accuracy and speed-up of the algorithms are
compared to the standard MATLAB matrix-inversion
routines, see Table 2 where the mean of
the relative errors and speed-up ratios are shown.
Each test is based on 100 random covariance
matrices. Trench's algorithm is sufficiently stable
for the Gaussian process context in the sense that it
can work well for $a/d$ up to the order of $10^{11}$, though it is slightly
less stable than the MATLAB INV routine (the latter
can work well for $a/d$ up to the order of $10^{15}$).
4.2 Vector-Storage Toeplitz
Computation
As discussed above, the full-matrix Toeplitz
computation works well for medium-scale
regression tasks with a speed-up of around 100.
However, the matrix-level memory allocation is still
an issue for large datasets with N greater than 7,000
such as the wind-turbine data. It follows from Table
1 that, where possible, a specialized vector-level storage
version of the algorithms is attractive for specific
computation tasks, such as Gaussian regression for
time series.
The modified matrix-free Durbin’s algorithm, see
above, is used to compute $\log\det C$ and Levinson’s
algorithm to compute $C^{-1}y$. The remaining
manipulation, namely $\mathrm{tr}(C^{-1}P)$, is obtained with
the aid of the following theorem.
Theorem 2. $\mathrm{tr}(C^{-1}P)$ can be computed as
$$\mathrm{tr}(C^{-1}P) = \varphi_1 p_1 + 2\sum_{i=2}^{N} \varphi_i p_i \qquad (12)$$
where P is Toeplitz with representative vector
$p = [p_1, p_2, \ldots, p_N]$ and $\varphi_i$ denotes the summation
of the elements in the i-th diagonal of $C^{-1}$.
Proof. Omitted due to space limitation.
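The identity (12) can be verified numerically on a small example. The sketch below forms the inverse densely purely to check the identity, so it does not itself save memory; the covariance, the choice of P as the derivative with respect to d, and the function names are illustrative assumptions.

```python
import numpy as np

def dense_toeplitz(col):
    """Full symmetric Toeplitz matrix with representative vector col."""
    idx = np.abs(np.subtract.outer(np.arange(col.size), np.arange(col.size)))
    return col[idx]

def trace_from_diagonal_sums(c_inv, p):
    """Evaluate (12): phi_1 p_1 + 2 * sum over i >= 2 of phi_i p_i."""
    # phi[k] is the sum of the elements on the k-th superdiagonal of C^{-1} (k = 0 is the main diagonal)
    phi = np.array([np.trace(c_inv, offset=k) for k in range(p.size)])
    return phi[0] * p[0] + 2.0 * np.dot(phi[1:], p[1:])

n = 40
lags = 0.1 * np.arange(n)
a, d, b = 1.8, 2.5, 0.04                       # hypothetical hyperparameters
c = a * np.exp(-0.5 * d * lags**2)
p = -0.5 * lags**2 * c                         # representative vector of P = dC/dd
c[0] += b                                      # white noise on the diagonal of C
c_inv = np.linalg.inv(dense_toeplitz(c))
trace_fast = trace_from_diagonal_sums(c_inv, p)
trace_direct = np.trace(c_inv @ dense_toeplitz(p))
```

The factor of two arises because both the inverse and P are symmetric, so each off-diagonal of P meets two diagonals of the inverse with equal sums.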
Table 3: Accuracy and run-time of Toeplitz computation.
N          Accuracy, mean (std)           Time in seconds, mean (std)
10000      1.4×10^-13 (2.0×10^-13)        29.1 (22.5)
20000      2.0×10^-13 (3.7×10^-13)        264.7 (154.2)
30000      1.4×10^-13 (2.1×10^-13)        730.8 (393.3)
40000      2.5×10^-13 (5.6×10^-13)        1555.3 (784.8)
50000      2.8×10^-13 (1.4×10^-12)        2497.1 (1339.3)
60000      1.4×10^-13 (2.2×10^-13)        3641.7 (1957.0)
Before applying them to Gaussian process
regression, a large number of numerical experiments
are also performed for the efficient and economical
vector-storage version of the Toeplitz algorithms.
Random covariance matrices are generated and
tested as in the previous subsection. Table 3 shows
the numerical accuracy and execution time of the
algorithms (the standard deviation is given in
brackets). The results substantiate the efficacy of the
vector-storage Toeplitz computation on large
datasets.
5 WIND TURBINE DATA
The measurement data for the wind turbine rotor
speed consist of a run of 600 seconds sampled at
40Hz. A typical section, from 200s to 400s, is shown
in Figure 3. The data has a long length-scale
component due to variations in the aerodynamic
torque, caused by changes in the wind speed and the
pitch angle of the rotor blades, and a short length-
scale component due to the structural and electro-
mechanical dynamics of the machine. From Figure
3, these two components can be clearly seen as can
the poor quality of the data.
[Plot: Data, against time in seconds]
Figure 3: Rotor speed measurements from 200s to 400s
[Plot: Long length-scale prediction and confidence intervals, against time in seconds; legend: Data, Prediction, Std Dev]
Figure 4: Rotor speed prediction, confidence intervals and
data from 265s to 275s
[Plot: Long length-scale prediction and confidence intervals, against time in seconds]
Figure 5: Rotor speed prediction with confidence intervals.
It is required to estimate the long length-scale
component in the rotor speed. Since the structural
and electro-mechanical dynamics only induce small
oscillations in the measured values, a prediction of
the rotor speed using (8) is appropriate with $f_z$ and $g_z$
the long and short length-scale components respectively.
The covariance function for $f_z$ is chosen to have the
form (9) with hyperparameters $a_f$ and $d_f$, as is the
covariance function for $g_z$ with hyperparameters $a_g$
and $d_g$. The measurement noise is assumed to be
Gaussian white noise with variance b. Hence, the
prior covariances for the measurements, $y_i$, at times, $t_i$,
are
$$E[y_i, y_j] = a_f \exp\left[-\frac{d_f}{2}(t_i - t_j)^2\right] + a_g \exp\left[-\frac{d_g}{2}(t_i - t_j)^2\right] + b\delta_{ij} \qquad (13)$$
Given the data, the hyperparameters are adapted to
maximize the likelihood. Since there are 24,000 data
values, it is necessary to use the vector-storage
Toeplitz algorithms of Section 4.2.
[Plot: Derivative prediction with confidence intervals, against time in seconds, 200s to 400s]
Figure 6: Derivative prediction with confidence intervals
[Plot: Derivative prediction with confidence intervals, against time in seconds, 265s to 275s]
Figure 7: Derivative prediction with confidence intervals
A section, from 200s to 400s, of the prediction for
the rotor speed, i.e. the long length-scale component,
together with the confidence intervals is shown in
Figure 5 and a typical short section, from 265s to
275s, in Figure 4. From the latter, it can be seen that
the rotor speed has been successfully extracted.
However, it is not the rotor speed per se that is of
interest but its derivative. A section, from 200s to
400s, of the prediction for the derivative of the rotor
speed together with the confidence intervals is
shown in Figure 6 and a short section, from 265s to
275s, in Figure 7.
6 CONCLUSIONS
From poor quality wind turbine rotor speed
measurements, the rotor speed and acceleration are
estimated within narrow confidence intervals using
Gaussian process regression. To do so, two issues
are addressed. Firstly, the rotor speed is extracted
from a combined rotor speed and generator speed
measurement. A novel adaptation of Gaussian
process regression based on two independent
processes rather than a single process is presented.
Secondly, efficient algorithms for the manipulation
of large matrices (24,000x24,000) are required. The
Toeplitz nature of the matrices is exploited to derive
novel fast algorithms for the Gaussian process
methodology that are memory efficient.
ACKNOWLEDGEMENTS
This work was supported by Science Foundation
Ireland grant, 00/PI.1/C067, and by the EPSRC
grant, GR/M76379/01.
REFERENCES
Gibbs, M. N., 1997. Bayesian Gaussian processes for
regression and classification, Ph.D. thesis, Cambridge
University.
Gibbs, M. N., and Mackay, D. J. C., 2000. Variational
Gaussian process classifiers, IEEE Transactions on
Neural Networks, Vol. 11, pp. 1458-1464.
Golub, G. H., and Van Loan, C. F. , 1996. Matrix
Computations. Baltimore: Johns Hopkins University
Press.
Leith, D. J., Leithead, W. E., Solak, E., and Murray-
Smith, R., 2002. Divide and conquer identification
using Gaussian process priors, Proceedings of the 41st
IEEE Conference on Decision and Control, Vol. 1,
pp. 624-629.
Leithead, W. E., Solak, E., and Leith, D., 2003a. Direct
identification of nonlinear structure using Gaussian
process prior models, Proceedings of European
Control Conference, Cambridge.
Leithead, W. E., Hardan, F., and Leith, D. J., 2003b.
Identification of aerodynamics and drive-train
dynamics for a variable speed wind turbine,
Proceedings of European Wind Energy Conference,
Madrid.
Leithead, W. E., Kian Seng Neo, Leith, D. J., 2005.
Gaussian regression based on methods with two
stochastic processes, to be presented, IFAC, Prague,
2005.
Mackay, D. J. C., 1998. Introduction to Gaussian
processes. In Neural Networks and Machine Learning,
F: Computer and Systems Sciences (Bishop, C. M.,
Ed.), Vol. 168, pp. 133-165, Springer: Berlin,
Heidelberg.
O’Hagan, A., 1978. On curve fitting and optimal design
for regression, J. Royal Stat Soc. B, 40, pp. 1-42.
Rasmussen, C. E., 1996. Evaluation of Gaussian processes
and other methods for non-linear regression, Ph.D.
thesis, University of Toronto.
Sambu, S., Wallat, M., Graepel, T., and Obermayer, K.,
2000. Gaussian process regression: active data
selection and test point rejection, Proceedings of the
IEEE International Joint Conference on Neural
Networks, Vol. 3, pp. 241-246.
Shi, J. Q., Murray-Smith, R., and Titterington, D. M.,
2003. Bayesian regression and classification using
mixtures of multiple Gaussian processes, International
Journal of Adaptive Control and Signal Processing,
Vol. 17, pp. 149-161.
Solak, E., Murray-Smith, R., Leithead, W. E., Leith, D.,
and Rasmussen, C. E., 2003. Derivative observations
in Gaussian process models of dynamic systems,
Advances in Neural Information Processing Systems,
Vol. 15, pp. 1033-1040, MIT Press.
Williams, C. K. I., 1999. Prediction with Gaussian
processes: from linear regression to linear prediction
and beyond, Learning in Graphical Models (Jordan,
M. I., Ed.), pp. 599-621.
Williams, C. K. I., and Barber, D., 1998. Bayesian
classification with Gaussian processes, IEEE
Transactions on Pattern Analysis and Machine
Intelligence, Vol. 20, pp. 1342-1351.
Yoshioka, T., Ishii, S., 2001. Fast Gaussian process
regression using representative data, Proceedings of
International Joint Conference on Neural Networks,
Vol. 1, pp. 132-137.