When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 23.07.2024 |
Summary: As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models, which rely primarily on dashcam video, are adept at predicting when an accident may occur but fall short in localizing the incident and identifying the entities involved. To address this gap, this study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions: what, when, and where accidents might occur. We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes. This mechanism is complemented by a three-stage model that processes the outputs of smaller models into detailed multimodal inputs for LLMs, enabling a more nuanced understanding of traffic dynamics. Empirical validation on the DAD, CCD, and A3D datasets demonstrates superior performance in Average Precision (AP) and Mean Time-To-Accident (mTTA), establishing new benchmarks for accident prediction technology. Our approach not only advances the technological framework for autonomous driving safety but also enhances human-AI interaction, making the predictive insights generated by autonomous systems more intuitive and actionable.
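
The abstract describes the architecture only at a high level, so the snippet below is not the authors' implementation. As a minimal sketch of what attention that "dynamically adjusts to prioritize high-risk elements" could look like, it biases a standard attention distribution over detected scene objects with a learned per-object risk score; the module names, feature dimensions, and biasing scheme are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RiskWeightedAttention(nn.Module):
    """Illustrative sketch only: attention over per-object features from a
    driving scene, where a learned risk score re-weights the attention
    distribution so that high-risk objects dominate the pooled feature."""
    def __init__(self, dim=256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.risk_head = nn.Linear(dim, 1)   # scalar risk score per object (assumed)

    def forward(self, scene_feat, obj_feats):
        # scene_feat: (B, dim) global frame feature; obj_feats: (B, N, dim) object features
        q = self.query(scene_feat).unsqueeze(1)                 # (B, 1, dim)
        k, v = self.key(obj_feats), self.value(obj_feats)
        attn = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5     # (B, 1, N) similarity
        risk = self.risk_head(obj_feats).transpose(1, 2)        # (B, 1, N) risk bias
        weights = torch.softmax(attn + risk, dim=-1)            # risk-biased attention
        pooled = (weights @ v).squeeze(1)                       # (B, dim) scene summary
        return pooled, weights.squeeze(1)
```

In the three-stage design described in the abstract, a pooled feature of this kind would presumably be one of the intermediate outputs that, together with other cues from smaller models, is assembled into the multimodal input given to the LLM.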
DOI: 10.48550/arxiv.2407.16277
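
For reference, the two reported metrics have standard definitions in dashcam accident-anticipation work: Average Precision over per-frame accident probabilities, and mean Time-To-Accident, the average lead time between the first confident prediction and the annotated accident onset. The sketch below shows one common way to compute them; the frame rate, threshold sweep, and averaging scheme are assumptions and may differ from the authors' exact protocol.

```python
# Illustrative only: frame-level AP and mean Time-To-Accident (mTTA) as they
# are commonly computed on dashcam benchmarks such as DAD.
import numpy as np
from sklearn.metrics import average_precision_score

def average_precision(frame_probs, frame_labels):
    """AP over per-frame accident probabilities vs. binary frame labels,
    concatenated across all evaluation videos."""
    return average_precision_score(np.concatenate(frame_labels),
                                   np.concatenate(frame_probs))

def mean_tta(positive_videos, fps=20.0, thresholds=np.linspace(0.05, 0.95, 19)):
    """Average, over thresholds and positive videos, of the lead time between
    the first frame whose probability crosses the threshold and the annotated
    accident onset (0 s if the model never fires before the accident)."""
    per_threshold = []
    for th in thresholds:
        ttas = []
        for probs, accident_frame in positive_videos:
            fired = np.flatnonzero(np.asarray(probs[:accident_frame]) >= th)
            ttas.append((accident_frame - fired[0]) / fps if fired.size else 0.0)
        per_threshold.append(np.mean(ttas))
    return float(np.mean(per_threshold))
```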