Achieving the Optimum Rate for Cross-Modal Source Coding

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 26, pp. 9722-9735
Main Authors Yuan, Zhe; Wu, Dan; Zhou, Liang
Format Journal Article
Language English
Published IEEE 2024
Summary: Multi-modal applications are expected to dominate in the 5G and B5G era. However, traditional source coding methods are neither efficient nor reliable, because they neglect the semantic redundancy and mutual influences between sources of different modalities. Cross-modal source coding (CMSC) has been proposed as a promising solution, but two main challenges remain: determining the optimum rate of CMSC under delay and reliability constraints, and designing a practical CMSC scheme that operates near the optimum rate. To tackle these challenges, this paper studies the optimum source coding rate of CMSC and its practical implementation. On the theoretical side, an $(n,\epsilon)$-achievable rate region is derived, representing the source coding rates subject to a fixed blocklength $n$ and a target error probability $\epsilon$. Additionally, the optimum source coding rate can be approximated by calculating the infimum of the $(n,\epsilon)$-achievable rate region with a rate dispersion function. On the technical side, a general implementation for CMSC is proposed, which fully leverages channel coding and artificial intelligence (AI) semantic analysis to achieve the optimum rate. Numerical results demonstrate that, when the multi-modal sources are semantically correlated, CMSC obtains a 50% improvement in theory and a 37.5% enhancement in practice over the baseline model abstracted from traditional schemes.
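The summary does not state the paper's rate expression, so the following is only an illustrative sketch of how a dispersion term typically enters a finite-blocklength rate. It uses the standard Gaussian (normal) approximation for lossless coding of a single discrete memoryless source, R(n, eps) ≈ H + sqrt(V/n) · Q^{-1}(eps), with lower-order terms omitted. The single-source setting, the function names, and the example distribution are assumptions made for illustration and are not taken from the paper's cross-modal rate region.

```python
import math
from statistics import NormalDist


def entropy_and_dispersion(pmf):
    """Return (H, V) in bits for a discrete memoryless source.

    H is the entropy (mean of the information -log2 p), and V is the
    variance of that information, often called the source dispersion.
    """
    probs = [p for p in pmf if p > 0]
    info = [-math.log2(p) for p in probs]
    h = sum(p * i for p, i in zip(probs, info))
    v = sum(p * (i - h) ** 2 for p, i in zip(probs, info))
    return h, v


def normal_approx_rate(pmf, n, eps):
    """Gaussian approximation of the rate (bits/symbol) at blocklength n
    and target error probability eps. Illustrative only: single source,
    no cross-modal correlation, lower-order terms dropped.
    """
    h, v = entropy_and_dispersion(pmf)
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return h + math.sqrt(v / n) * q_inv


if __name__ == "__main__":
    pmf = [0.9, 0.1]  # hypothetical binary source
    for n in (100, 1000, 10000):
        print(n, round(normal_approx_rate(pmf, n, eps=0.01), 4))
```

Under this approximation the rate penalty over the entropy shrinks as 1/sqrt(n) and grows as the error target eps tightens, which is the delay-versus-reliability trade-off the summary refers to.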
ISSN: 1520-9210, 1941-0077
DOI:10.1109/TMM.2024.3397192