Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving

Bibliographic Details
Published in: arXiv.org
Main Authors: Ding, Kairui; Chen, Boyuan; Su, Yuchen; Gao, Huan-ang; Jin, Bu; Sima, Chonghao; Zhang, Wuqiang; Li, Xiaohui; Barsch, Paul; Li, Hongyang; Zhao, Hao
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 10.09.2024
Summary: End-to-end architectures in autonomous driving (AD) face a significant interpretability challenge, which impedes human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily followed the paradigm of declarative interpretability, in which natural-language interpretations are not grounded in the intermediate outputs of the AD system, leaving the interpretations merely declarative. In contrast, aligned interpretability establishes an explicit connection between language and the intermediate outputs of the AD system. Here we introduce Hint-AD, an integrated AD-language system that generates language aligned with the holistic perception-prediction-planning outputs of the AD model. By incorporating these intermediate outputs and a holistic token mixer sub-network for effective feature adaptation, Hint-AD achieves state-of-the-art results on driving-language tasks including driving explanation, 3D dense captioning, and command prediction. To facilitate further study of the driving explanation task on nuScenes, we also introduce Nu-X, a human-labeled dataset. Code, dataset, and models will be made publicly available.
ISSN:2331-8422