
 
has proven effective for discerning texture (Unser, M., 1995). The texture feature vector consists of 14 coefficients (7 means and 7 variances): one mean and one variance for each of the seven sub-bands created over two resolution levels. In total, the feature vector used for indoor/outdoor classification consists of 23 coefficients (9 color and 14 texture coefficients).
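As an illustration, the 14 texture coefficients can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not name the wavelet, so a decimated 2-D Haar transform is assumed here, and the decomposition actually used in the paper may differ (for example, an undecimated wavelet frame). The function names are hypothetical. Each of the two levels yields three detail sub-bands, and the final approximation band is the seventh; the mean and variance of each sub-band give the 14 coefficients.

```python
import numpy as np


def haar_step(x):
    """One level of a decimated 2-D Haar transform (assumed wavelet).

    Returns the low-pass approximation and the three detail sub-bands.
    Both dimensions of x must be even.
    """
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    detail_rows = (a + b - c - d) / 2.0   # even rows minus odd rows (horizontal edges)
    detail_cols = (a - b + c - d) / 2.0   # even cols minus odd cols (vertical edges)
    detail_diag = (a - b - c + d) / 2.0   # diagonal differences
    return approx, [detail_rows, detail_cols, detail_diag]


def wavelet_texture_features(gray, levels=2):
    """Means then variances of all sub-bands of a `levels`-level decomposition.

    For levels=2: 3 + 3 detail bands + 1 approximation band = 7 sub-bands,
    giving a 14-coefficient vector (7 means followed by 7 variances).
    """
    x = np.asarray(gray, dtype=float)
    # Crop so that every level operates on even dimensions.
    f = 2 ** levels
    x = x[: x.shape[0] // f * f, : x.shape[1] // f * f]
    subbands = []
    for _ in range(levels):
        x, details = haar_step(x)
        subbands.extend(details)
    subbands.append(x)  # final low-pass (approximation) band
    means = [float(np.mean(s)) for s in subbands]
    variances = [float(np.var(s)) for s in subbands]
    return np.array(means + variances)
```

Appending these 14 values to the 9 color coefficients gives the 23-coefficient indoor/outdoor vector.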
2.2  City/Landscape visual feature extraction
 
In the case of city/landscape classification, robust 
features are extracted using a combination of color 
and structural information expressed by the line 
segment orientation.  
Color is handled in the same manner as in the case of indoor/outdoor feature extraction: we obtain a vector of 9 coefficients computed using Equations 2-4. Together with color, we use a line segment descriptor. The underlying idea is to distinguish between the long horizontal and vertical contours that dominate city images and the short contours with other directions (neither horizontal nor vertical) that are found in landscape images. A similar contour descriptor, yielding a 12-bin histogram, was proposed in (Stauder et al., 2004), while in (Vailaya et al., 2001) the edge direction distribution was proposed for discriminating between city and non-city images.
To construct the line segment descriptor, which is a histogram of line segment directions, we proceed as follows. First, edges are detected using the Canny edge detector (Canny, J., 1986). The resulting edges are thinned, and the edge representation is then transformed into a line segment representation by applying the non-parametric segmentation of curves into straight lines described in (Rosin, P.L., and West, G.A.W., 1995). The direction of each straight line is computed and categorized as horizontal, vertical or diagonal. In addition, each segment is labelled as short or long according to its length: a segment is considered long if its length exceeds 10% of the smaller image dimension (width or height).
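The conversion of a thinned edge chain into straight segments can be illustrated with a simpler stand-in. The sketch below is not the Rosin and West method: it uses Ramer-Douglas-Peucker recursive splitting, which needs a fixed distance tolerance (the hypothetical parameter `tol`, in pixels), whereas the method cited above is non-parametric. It assumes the pixels of one thinned contour are already ordered along the curve; edge detection and thinning are not shown, and the function names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:  # degenerate chord (closed contour): distance to a
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm


def split_into_segments(chain, tol=1.5):
    """Split an ordered pixel chain into straight segments (RDP stand-in).

    If some pixel lies farther than `tol` from the chord joining the chain's
    end points, split at the farthest pixel and recurse on both halves.
    Returns a list of (start_point, end_point) pairs.
    """
    if len(chain) < 2:
        return []
    a, b = chain[0], chain[-1]
    best_i, best_d = 0, -1.0
    for i in range(1, len(chain) - 1):
        d = point_line_distance(chain[i], a, b)
        if d > best_d:
            best_i, best_d = i, d
    if best_d <= tol:  # also covers two-point chains (no interior pixels)
        return [(a, b)]
    return (split_into_segments(chain[: best_i + 1], tol)
            + split_into_segments(chain[best_i:], tol))
```

For an L-shaped chain running from (0, 0) to (10, 0) and then up to (10, 10), the split happens at the corner and two segments are returned.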
Finally, a histogram with six bins (three directions for each of the two length classes) is computed. A schematic representation of the required steps is shown in Figure 5. In total, the feature vector used for city/landscape classification consists of 15 coefficients (9 color and 6 line segment coefficients).
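A sketch of the six-bin descriptor, assuming segments are given as pairs of end points in pixel coordinates. The text does not state the angular ranges separating horizontal, vertical and diagonal lines, so the 22.5 degree tolerance (`ANGLE_TOL_DEG`) is a hypothetical choice, as are the bin order and returning raw counts rather than a normalized histogram. The long/short rule follows the 10% of the smaller image dimension stated above.

```python
import math

# Hypothetical angular tolerance around 0 and 90 degrees (not given in the text).
ANGLE_TOL_DEG = 22.5

# Bin order (hypothetical): short H, short V, short D, long H, long V, long D.
BINS = [(d, l) for l in ("short", "long") for d in ("horizontal", "vertical", "diagonal")]


def direction_label(seg):
    """Classify a segment's undirected orientation as horizontal/vertical/diagonal."""
    (x1, y1), (x2, y2) = seg
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0  # in [0, 180)
    if angle <= ANGLE_TOL_DEG or angle >= 180.0 - ANGLE_TOL_DEG:
        return "horizontal"
    if abs(angle - 90.0) <= ANGLE_TOL_DEG:
        return "vertical"
    return "diagonal"


def line_segment_histogram(segments, width, height):
    """Count segments per (direction, length class) bin: 6 values in total.

    A segment is 'long' if its length exceeds 10% of min(width, height).
    """
    long_threshold = 0.1 * min(width, height)
    counts = dict.fromkeys(BINS, 0)
    for seg in segments:
        (x1, y1), (x2, y2) = seg
        length = math.hypot(x2 - x1, y2 - y1)
        size = "long" if length > long_threshold else "short"
        counts[(direction_label(seg), size)] += 1
    return [counts[b] for b in BINS]
```

For example, `line_segment_histogram([((0, 0), (50, 0))], 200, 100)` returns `[0, 0, 0, 1, 0, 0]`: a single long horizontal segment. Concatenating the six values with the 9 color coefficients gives the 15-coefficient city/landscape vector.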
3  CLASSIFICATION - FEATURE FUSION
For both binary classification problems (indoor vs. outdoor and city vs. landscape), the classification step was performed using two well-known classification algorithms: K-NN (Theodoridis, S., and Koutroumbas, K., 1997) and Support Vector Machines (SVM) (Cortes, C., and Vapnik, V., 1995), (Vapnik, V., 1998), (Chang, C.-C., and Lin, C.-J.).
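For reference, the K-NN decision rule can be sketched in a few lines. Euclidean distance, majority voting and the default `k=3` are assumptions made for this illustration; the text does not specify the distance measure or the value of K, and the function name is hypothetical. An odd `k` avoids ties in a two-class problem.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    Uses Euclidean distance; labels can be e.g. +1 / -1 (indoor / outdoor).
    """
    # Sort (distance, label) pairs by distance to the query vector.
    neighbours = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in neighbours[:k])
    return votes.most_common(1)[0][0]
```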
Formally, support vector machines (SVM) require the solution of an optimisation problem. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the optimisation problem is defined as follows:

$$
\min_{\omega, b, \xi} \; \frac{1}{2}\omega^{T}\omega + C\sum_{i=1}^{m}\xi_i
\quad \text{subject to} \quad
y_i\left(\omega^{T}\phi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0
\qquad (5)
$$
Here, the training vectors $x_i$ are mapped into a higher-dimensional space by the function $\phi$. SVM then finds a linear separating hyperplane with the maximal margin in this higher-dimensional space.
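To make Eq. 5 concrete, the sketch below evaluates (rather than minimises) the objective for a given $\omega$ and $b$, assuming the identity map $\phi(x) = x$, i.e. a linear SVM. For fixed $\omega$ and $b$, the smallest slack values satisfying both constraints are $\xi_i = \max(0, 1 - y_i(\omega^T x_i + b))$, the hinge loss, so a larger $C$ penalises margin violations more heavily. In practice the problem is solved by a dedicated solver (for example LIBSVM, which works on the dual problem); the function name is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def svm_primal_objective(w, b, X, y, C):
    """Evaluate the objective of Eq. 5 for a linear map phi(x) = x.

    Slacks are set to their smallest feasible values,
    xi_i = max(0, 1 - y_i (w^T x_i + b)), i.e. the hinge loss.
    Returns (objective value, list of slacks).
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks
```

With `w = (1, 0)`, `b = 0` and the point `(0.5, 0)` labelled +1, that point lies inside the margin and contributes a slack of 0.5; points with $y_i(\omega^T x_i + b) \geq 1$ contribute nothing.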
Two choices play a critical role in classification performance: first, the parameter $C$ in Eq. 5, which sets the penalty on the error term; second, the so-called kernel function, defined as $K(x_i, x_j) \equiv \phi(x_i)^{T}\phi(x_j)$.
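The kernel identity can be checked numerically. The text does not say which kernel was used, so the sketch below shows three common ones: the linear kernel; a degree-2 polynomial kernel together with an explicit map $\phi$ for 2-D inputs that reproduces it; and the Gaussian RBF kernel $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$ with a hypothetical $\gamma = 0.5$. The Gram matrix collects $K(x_i, x_j)$ over all training pairs, which is all the dual SVM formulation needs, so $\phi$ never has to be computed explicitly. All function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    # phi is the identity map: K(x, z) = x^T z
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    # Homogeneous degree-2 polynomial kernel: K(x, z) = (x^T z)^2
    return linear_kernel(x, z) ** 2


def poly2_phi(x):
    # Explicit feature map for 2-D inputs with phi(x)^T phi(z) = (x^T z)^2
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2); phi is implicit
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)


def gram_matrix(X, kernel):
    """Matrix of K(x_i, x_j) over all pairs of training vectors."""
    return [[kernel(xi, xj) for xj in X] for xi in X]
```

For `x = (1, 2)` and `z = (3, 4)`, both `poly2_kernel(x, z)` and the inner product of `poly2_phi(x)` and `poly2_phi(z)` give 121 (up to floating-point rounding), which is exactly the identity $K(x_i, x_j) \equiv \phi(x_i)^{T}\phi(x_j)$.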
An important aspect of classification is the interaction between the features and the available classifiers. There are two main trends in this interaction. Either different features are combined into a single feature vector that serves as the classifier input (Lim, H-H., and Jin, J.S., 2005), (Stauder et al., 2004), or feature vectors associated with different modalities are fed into independent pattern classifiers whose outputs are then combined (Serrano et al., 2004), (Szummer, M., and Picard, R., 1998), (Payne, A., and Singh, S., 2005). Both trends have advantages and disadvantages. A disadvantage of the latter is that training multiple classifiers on individual features may not be viable at all, as a single feature may not provide sufficient discriminative power, resulting in many poor classifiers to fuse.
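For contrast with the approach adopted in this paper, the second trend (late fusion) can be sketched as applying an independent classifier to each feature group and combining their outputs. Majority voting over +1/-1 decisions and averaging of real-valued scores are two common combination rules; they are illustrative choices, not necessarily those of the cited works, and breaking ties towards +1 is arbitrary. The function names are hypothetical.

```python
def majority_vote(decisions):
    """Fuse +1/-1 decisions from independent classifiers; ties go to +1."""
    return 1 if sum(decisions) >= 0 else -1


def mean_score_decision(scores):
    """Fuse real-valued classifier scores by averaging, then take the sign."""
    mean = sum(scores) / len(scores)
    return 1 if mean >= 0 else -1


def late_fusion(classifiers, feature_groups, combine=majority_vote):
    """Apply classifier k to feature group k, then combine the k outputs."""
    outputs = [clf(features) for clf, features in zip(classifiers, feature_groups)]
    return combine(outputs)
```

If one feature group carries little discriminative power, its classifier still casts a full vote here, which is the weakness of this trend noted above.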
In our approach, we follow the former trend: the classifier's input feature vector is the concatenation of all features considered for the corresponding classification problem (indoor vs. outdoor, or city vs. landscape). These features were discussed in detail in Section 2.
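The feature-level fusion used here amounts to concatenating the per-feature vectors before classification. A minimal sketch with hypothetical names; the dimensions follow Section 2 (9 color plus 14 texture coefficients for indoor/outdoor, 9 color plus 6 line segment coefficients for city/landscape).

```python
from itertools import chain

# Feature dimensions per classification problem, as given in Section 2.
DIMS = {
    "indoor/outdoor": {"color": 9, "texture": 14},
    "city/landscape": {"color": 9, "line_segments": 6},
}


def early_fusion(*feature_groups):
    """Concatenate several feature vectors into a single classifier input."""
    return [float(v) for v in chain.from_iterable(feature_groups)]


def fused_dimension(problem):
    """Length of the concatenated vector for a given problem."""
    return sum(DIMS[problem].values())
```

The fused vector is what K-NN or the SVM receives as $x_i$ in Eq. 5; its length is 23 for indoor/outdoor and 15 for city/landscape.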
SCENE CATEGORIZATION USING LOW-LEVEL VISUAL FEATURES