
Figure 1: The system framework. (Diagram labels: input images; structure from motion; camera pose and 3D points; dense stereo; 3D geometrical information; Novel Perspective Synthesis; view interpolation; original perspectives; synthesized perspectives; Panorama Generation; perspective configuration; perspective composition; picture surface selection; user efforts; manual perspective specification; multi-perspective panorama.)
misalignments. The framework consists of two steps:
first, parts of various perspectives are selected such
that visual discontinuities among those parts are
minimized; then, remaining artifacts are further
suppressed through a fusion process.
An overview of our system is presented in Fig. 1.
In our system, street scenes are captured by a video
camera with a fixed intrinsic matrix K that moves
along the street while looking sideways. The camera
pose of each input image (i.e., the translation vector T
and the rotation matrix R, together with K) is
recovered using our Structure from Motion (SfM)
system, together with a sparse set of reconstructed 3D
scene points. From the recovered camera poses, novel
perspectives are synthesized based on 3D geometrical
information estimated using our dense stereo
algorithm. An interface for manually specifying the
multi-perspective configuration is provided based on
our perspective composition framework, which
combines different perspectives (original or novel) to form
the resultant panorama.
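As an illustration of how K, R and T relate a reconstructed 3D point to a pixel, the following minimal sketch projects world points into an image. It assumes the common convention x ~ K(RX + T); the paper does not state which convention its SfM system uses, and the function name `project_points` and the numeric values are hypothetical.

```python
import numpy as np

def project_points(K, R, T, points_3d):
    """Project Nx3 world points to Nx2 pixels, assuming x ~ K (R X + T).

    The world-to-camera convention is an assumption made for this sketch.
    Returns the pixel coordinates and the depth of each point in the camera frame.
    """
    cam = points_3d @ R.T + T              # world -> camera coordinates
    depths = cam[:, 2]
    homog = cam @ K.T                      # apply the intrinsic matrix
    return homog[:, :2] / homog[:, 2:3], depths

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                              # camera axes aligned with the world
T = np.zeros(3)                            # camera at the world origin
pts = np.array([[0.0, 0.0, 5.0],
                [1.0, 0.0, 5.0]])
uv, depths = project_points(K, R, T, pts)
print(uv)
```

A point on the optical axis lands on the principal point, and a point one unit to the side at depth 5 is shifted by 500 / 5 = 100 pixels.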
The rest of this paper is organized as follows.
Section 2 presents background. Section 3 presents our
algorithm for synthesizing novel perspectives. Section
4 describes our perspective composition framework.
Results and discussion are presented in Section 5, and
Section 6 concludes the paper.
2 BACKGROUND
The earliest attempt at combining images captured
at different viewpoints is perhaps view interpolation,
which warps pixels from input images to a reference
coordinate frame using a pre-computed 3D scene
geometry (Szeliski and Kang, 1995; Kumar et al., 1995;
Zheng and Kang, 2007). There are two main problems
with these approaches: establishing accurate
stereo correspondences is still a hard vision problem,
and the resultant image is likely to contain holes
caused by the sampling issues of forward mapping
and by occlusion. Another thread is based
on optimal seams (Shum and Szeliski, 2000; Agarwala
et al., 2006): input images are stitched with their
own perspectives, and the composition is formulated as
a labeling problem, i.e., each pixel value is taken from
one of the input images. Results are inherently
multi-perspective. However, these approaches only work
well for roughly planar scenes. For scenes with large
depth variations, it is often impossible to find an
optimal partition that creates a seamless mosaic.
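The hole problem of forward mapping can be seen on a single scanline. The sketch below is a toy illustration, not the method of any cited work: each source pixel is shifted by a disparity value that is assumed to be given, the pixel with the largest disparity (the nearest surface) wins when several pixels land on the same target, and targets that receive no pixel remain holes. The function name `forward_warp_row` and the hole marker -1 are hypothetical.

```python
import numpy as np

def forward_warp_row(colors, disparity):
    """Forward-map one scanline; targets that receive no pixel stay at -1."""
    width = len(colors)
    out = np.full(width, -1, dtype=int)    # -1 marks a hole
    best = np.full(width, -np.inf)         # largest disparity written per target
    for x in range(width):
        xt = int(round(x + disparity[x]))  # target column of source pixel x
        if 0 <= xt < width and disparity[x] > best[xt]:
            out[xt] = colors[x]            # nearer surface overwrites farther one
            best[xt] = disparity[x]
    return out

colors = np.arange(10, 18)                 # 8 pixels with values 10..17
# A foreground object (disparity 2) in front of a background (disparity 0).
disparity = np.array([0, 0, 0, 2, 2, 2, 0, 0], dtype=float)
warped = forward_warp_row(colors, disparity)
holes = np.flatnonzero(warped == -1)
print(warped, holes)
```

The foreground moves two pixels to the right, hides the background pixels it lands on, and uncovers two columns for which the source image has no data.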
The strip mosaic offers a better alternative. The
basic idea is to cut a thin strip from each image in a
dense collection and put the strips together to form a
panorama. In its early form, the push-broom model
(Zheng, 2003; Peleg et al., 2000), the resultant image
is a parallel projection in one direction and a
perspective projection in the other, while in the
crossed-slits model (Zomet et al., 2003) the image is
perspective in both directions, but from a different
viewpoint in each. Aspect ratio distortion is therefore
inherent, because the projections along the two
directions differ.
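The basic strip-cutting idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the construction used in the cited papers: frames are assumed to be ordered along the camera path, the camera is assumed to move at a constant speed, and the same central vertical strip is taken from every frame. The function name `strip_mosaic` is hypothetical.

```python
import numpy as np

def strip_mosaic(frames, strip_width):
    """Concatenate the central vertical strip of each frame, in frame order.

    frames: array of shape (N, H, W) or (N, H, W, C).
    """
    width = frames.shape[2]
    start = (width - strip_width) // 2     # left edge of the central strip
    strips = [frame[:, start:start + strip_width] for frame in frames]
    return np.concatenate(strips, axis=1)  # stack the strips side by side

# Five synthetic 4x9 frames, each filled with its own frame index.
frames = np.stack([np.full((4, 9), i) for i in range(5)])
pano = strip_mosaic(frames, strip_width=3)
print(pano.shape)
```

Each frame contributes three columns, so the horizontal axis of the result indexes camera position rather than a viewing angle, which is the source of the parallel projection along that direction.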
In addition, because the scene within each strip is
rendered from a regular pinhole perspective, for a
given strip width there is one depth at which the scene
shows no distortion. Scenes at a greater depth may
be rendered more than once, i.e., over-sampled, while
scenes at a closer depth cannot be fully covered, i.e.,
under-sampled. In the literature, this kind of artifact
is called sampling error distortion (Zheng, 2003);
see Fig. 2.
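The depth at which a strip shows no distortion can be estimated with a simplified model that is an assumption of this sketch rather than a derivation from the paper: a sideways-looking camera translates by a fixed step between frames, and the scene is fronto-parallel. A strip of w pixels at depth Z then covers w Z / f scene units for a focal length of f pixels, and the distortion-free depth is the one where this footprint equals the camera step. The function names and numeric values are hypothetical.

```python
import math

def distortion_free_depth(focal_px, step, strip_width_px):
    """Depth at which one strip covers exactly one camera step."""
    return focal_px * step / strip_width_px

def sampling_ratio(depth, focal_px, step, strip_width_px):
    """Scene width covered by one strip divided by the camera step.

    A ratio above 1 means neighbouring strips overlap and content is
    duplicated (over-sampled in the text's terminology); a ratio below 1
    means content between strips is missed (under-sampled).
    """
    footprint = strip_width_px * depth / focal_px
    return footprint / step

# Hypothetical setup: 500 px focal length, 0.1 m step, 10 px strips.
z_star = distortion_free_depth(500.0, 0.1, 10.0)
far_ratio = sampling_ratio(10.0, 500.0, 0.1, 10.0)
near_ratio = sampling_ratio(2.5, 500.0, 0.1, 10.0)
print(z_star, far_ratio, near_ratio)
```

With these values the distortion-free depth is about 5 m; a surface at 10 m is covered twice, and a surface at 2.5 m is only half covered.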
Unlike view interpolation and optimal-seam approaches,
the strip mosaic can still produce visually acceptable
results for scenes with complex geometrical
structures, in spite of the aforementioned aspect
ratio and sampling error distortions. Therefore, the
strip mosaic provides a foundation upon which
large-scale multi-perspective panoramas can be
constructed. An interactive approach is presented by
Roman et al. (2004), where several perspectives in the
form of vertical slits are specified by users and the gaps
in between them are filled with inverse perspectives.
Some other approaches attempt to automatically de-
GRAPP 2012 - International Conference on Computer Graphics Theory and Applications