Cross-modal prediction in audio-visual communication

Bibliographic Details
Published in	1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, Vol. 4, pp. 2056-2059
Main Authors	Rao, R.R.; Chen, Tsuhan
Format	Conference Proceeding
Language	English
Published	IEEE 1996
Subjects	Acoustic measurements; Decoding; Distributed computing; Mouth; Predictive coding; Predictive models; Probability distribution; Shape; Speech coding; Video coding
Summary:	We present a novel means for predicting the shape of a person's mouth from the corresponding speech signal and explore applications of this prediction to video coding. The prediction is accomplished by modeling the probability distribution of the audiovisual features by a Gaussian mixture density. The optimal estimate for the visual features given the acoustic features can then be computed using this probability distribution. The ability to predict a person's mouth shape from the corresponding audio leads to a number of interesting joint audio-video coding strategies. In the cross-modal predictive coding system described, a model-based video coder compares measured visual parameters with predicted visual parameters, and sends the difference between the two to the receiver. Since the decoder also receives the acoustic data, it can form the prediction and then reconstruct the original parameters by adding the transmitted error signal.
ISBN:	9780780331921 0780331923
ISSN:	1520-6149 2379-190X
DOI:	10.1109/ICASSP.1996.545722
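
The prediction and coding steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the "optimal estimate" is the minimum mean-square-error estimate (the conditional mean of the visual features given the acoustic features under the joint Gaussian mixture), and the function names (predict_visual, encode, decode), the feature dimensions, and the toy mixture parameters are all hypothetical. In practice the mixture would be fitted to training data rather than written by hand.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def predict_visual(a, weights, means, covs, n_audio):
    """Conditional mean E[v | a] under a joint mixture over z = [a, v].

    The first n_audio entries of each component mean (and the matching
    covariance block) describe the acoustic features a; the rest describe
    the visual features v.
    """
    resp, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_a, mu_v = mu[:n_audio], mu[n_audio:]
        c_aa = cov[:n_audio, :n_audio]   # audio-audio covariance block
        c_va = cov[n_audio:, :n_audio]   # visual-audio cross-covariance block
        # Unnormalized posterior weight of this component given a.
        resp.append(w * gaussian_pdf(a, mu_a, c_aa))
        # Per-component linear regression of v on a.
        cond_means.append(mu_v + c_va @ np.linalg.solve(c_aa, a - mu_a))
    resp = np.array(resp)
    resp /= resp.sum()  # assumes a is not so far from every component that all weights underflow
    return resp @ np.array(cond_means)

def encode(v_measured, a, model):
    """Encoder side: send only the prediction error."""
    return v_measured - predict_visual(a, *model)

def decode(residual, a, model):
    """Decoder side: re-form the prediction from audio, then add the error."""
    return predict_visual(a, *model) + residual

# Hypothetical toy model: 1 acoustic dimension, 1 visual dimension, 2 components.
toy_model = (
    np.array([0.5, 0.5]),
    np.array([[0.0, 0.0], [4.0, 2.0]]),
    np.array([[[1.0, 0.5], [0.5, 1.0]], [[1.0, -0.3], [-0.3, 1.0]]]),
    1,
)

a = np.array([3.5])           # acoustic feature, available to both ends
v = np.array([1.8])           # measured visual parameter (encoder only)
residual = encode(v, a, toy_model)
v_rec = decode(residual, a, toy_model)
print(residual, v_rec)
```

Because the decoder forms the same prediction from the same acoustic data, adding the transmitted residual recovers the measured parameters; when the prediction is good the residual is small and, in a real coder, cheaper to transmit than the parameters themselves.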