ABSTRACT
We present a model that predicts saccadic eye-movements and
can be tuned to a particular human observer who is viewing
a dynamic sequence of images. Our work is motivated by
applications that involve gaze-contingent interactive
displays on which information is displayed as a function of
gaze direction. The approach therefore differs from
standard approaches in two ways: (i) we deal with dynamic
scenes, and (ii) we provide means of adapting the model to a
particular observer. As an indicator of saliency, we
evaluate the intrinsic dimension of the image sequence
within a geometric approach implemented using the structure
tensor. From these saliency-based candidate locations, the
currently attended location is selected according to a
strategy found by supervised learning. The training data are
obtained with an eye tracker from subjects viewing
video sequences. The selection algorithm receives candidate
video sequences. The selection algorithm receives candidate
locations of current and past frames and a limited history
of locations attended in the past. We use a linear mapping
obtained by minimizing, via gradient descent, the squared
difference between the predicted and the actually attended
locations. Being linear, the learned mapping can be quickly
adapted to an individual observer.
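The intrinsic-dimension indicator described above can be illustrated with a short sketch: gradients of the spatio-temporal volume are combined into a 3x3 structure tensor per pixel, and the number of significantly non-zero eigenvalues estimates the local intrinsic dimension. This is an illustrative implementation under assumed smoothing parameters, not the authors' exact code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intrinsic_dimension_saliency(seq, grad_sigma=1.0, tensor_sigma=2.0, eps=1e-3):
    """Estimate the per-pixel intrinsic dimension of a video volume
    seq with shape (T, H, W) via the spatio-temporal structure tensor.
    Illustrative sketch; sigma values and threshold are assumptions."""
    seq = seq.astype(float)
    # Spatio-temporal gradients: d/dt, d/dy, d/dx of the smoothed volume.
    grads = np.gradient(gaussian_filter(seq, grad_sigma))
    # Structure tensor J: smoothed outer products of the gradient.
    J = np.empty(seq.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = gaussian_filter(grads[i] * grads[j], tensor_sigma)
    ev = np.linalg.eigvalsh(J)            # eigenvalues, ascending, per pixel
    ev = ev / (ev.max() + 1e-12)          # normalize for a relative threshold
    # Intrinsic dimension = number of significantly non-zero eigenvalues:
    # 0 = constant region, 1 = static edge, 2 = corner or 1D motion,
    # 3 = transient spatio-temporal feature (the most salient case).
    return (ev > eps).sum(axis=-1)
```

Regions with higher intrinsic dimension (e.g. value 3, a transient feature) would serve as the saliency-based candidate locations.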
Keywords: Eye-movements, saccades, saliency map,
intrinsic dimension, machine learning, gaze-contingent
display
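The learned selection strategy can be sketched as a linear least-squares fit by batch gradient descent, as the abstract describes; feature layout and learning-rate values here are assumptions for illustration.

```python
import numpy as np

def train_linear_mapping(X, Y, W=None, lr=0.05, epochs=500):
    """Fit a linear map W minimizing the squared difference ||X W - Y||^2
    by batch gradient descent. X: (N, d) feature vectors (e.g. candidate
    locations of current and past frames plus a short history of attended
    locations); Y: (N, 2) attended screen positions. Passing a pre-trained
    W warm-starts the fit, so the mapping can be adapted to an individual
    observer with few iterations. Hypothetical sketch."""
    N = X.shape[0]
    if W is None:
        W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        grad = 2.0 / N * X.T @ (X @ W - Y)  # gradient of the mean squared error
        W -= lr * grad
    return W
```

Prediction is then simply `X_new @ W`; because the mapping is linear, a handful of additional descent steps on a new observer's gaze data suffices for personalization.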