DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 11.04.2024 |
Summary: | This research introduces DesignQA, a novel benchmark aimed at evaluating the
proficiency of multimodal large language models (MLLMs) in comprehending and
applying engineering requirements in technical documentation. Developed with a
focus on real-world engineering challenges, DesignQA uniquely combines
multimodal data (textual design requirements, CAD images, and engineering
drawings) derived from the Formula SAE student competition.
Unlike many existing MLLM benchmarks, DesignQA contains
document-grounded visual questions where the input image and input document
come from different sources. The benchmark features automatic evaluation
metrics and is divided into segments (Rule Comprehension, Rule Compliance, and
Rule Extraction) based on tasks that engineers perform when designing according
to requirements. We evaluate state-of-the-art models (at the time of writing)
like GPT-4o, GPT-4, Claude-Opus, Gemini-1.0, and LLaVA-1.5 against the
benchmark, and our study uncovers the existing gaps in MLLMs' abilities to
interpret complex engineering documentation. The MLLMs tested, while promising,
struggle to reliably retrieve relevant rules from the Formula SAE
documentation, face challenges in recognizing technical components in CAD
images, and encounter difficulty in analyzing engineering drawings. These
findings underscore the need for multimodal models that can better handle the
multifaceted questions characteristic of design according to technical
documentation. This benchmark sets a foundation for future advancements in
AI-supported engineering design processes. DesignQA is publicly available at:
https://github.com/anniedoris/design_qa/. |
---|---|
DOI: | 10.48550/arxiv.2404.07917 |