ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
Surgical instrument segmentation is crucial in surgical scene understanding, thereby facilitating surgical safety. Existing algorithms directly detected all instruments of pre-defined categories in the input image, lacking the capability to segment specific instruments according to the surgeon'...
Saved in:
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
28.07.2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Surgical instrument segmentation is crucial in surgical scene understanding,
thereby facilitating surgical safety. Existing algorithms directly detected all
instruments of pre-defined categories in the input image, lacking the
capability to segment specific instruments according to the surgeon's
intention. During different stages of surgery, surgeons exhibit varying
preferences and focus toward different surgical instruments. Therefore, an
instrument segmentation algorithm that adheres to the surgeon's intention can
minimize distractions from irrelevant instruments and assist surgeons to a
great extent. The recent Segment Anything Model (SAM) reveals the capability to
segment objects following prompts, but the manual annotations for prompts are
impractical during the surgery. To address these limitations in operating
rooms, we propose an audio-driven surgical instrument segmentation framework,
named ASI-Seg, to accurately segment the required surgical instruments by
parsing the audio commands of surgeons. Specifically, we propose an
intention-oriented multimodal fusion to interpret the segmentation intention
from audio commands and retrieve relevant instrument details to facilitate
segmentation. Moreover, to guide our ASI-Seg segment of the required surgical
instruments, we devise a contrastive learning prompt encoder to effectively
distinguish the required instruments from the irrelevant ones. Therefore, our
ASI-Seg promotes the workflow in the operating rooms, thereby providing
targeted support and reducing the cognitive load on surgeons. Extensive
experiments are performed to validate the ASI-Seg framework, which reveals
remarkable advantages over classical state-of-the-art and medical SAMs in both
semantic segmentation and intention-oriented segmentation. The source code is
available at https://github.com/Zonmgin-Zhang/ASI-Seg. |
---|---|
DOI: | 10.48550/arxiv.2407.19435 |