DAM4SAM

A Distractor-Aware Memory (DAM) for
Visual Object Tracking with SAM2

Faculty of Computer and Information Science, University of Ljubljana
*Equal contribution

Abstract

Memory-based trackers such as SAM2 demonstrate remarkable performance but still struggle with distractors. We propose a new plug-in distractor-aware memory (DAM) and a memory management strategy that substantially improve tracking robustness. The new model is demonstrated on SAM2.1, leading to DAM4SAM, which sets a new state of the art on six benchmarks, including the most challenging VOT/S benchmarks, without additional training. We also propose a new distractor-distilled (DiDi) dataset to better study the distractor problem.

DiDi: A distractor-distilled dataset

DiDi is a distractor-distilled tracking dataset created to address the limitation of low distractor presence in current visual object tracking benchmarks. To enhance the evaluation and analysis of tracking performance amidst distractors, we have semi-automatically distilled several existing benchmarks into the DiDi dataset. The dataset is available for download at this link.

Example frames from the DiDi dataset showing distractors. Targets are denoted by green bounding boxes.

SOTA comparison on DiDi

Model	Quality	Accuracy	Robustness
TransT	0.465	0.669	0.678
KeepTrack	0.502	0.646	0.748
SeqTrack	0.529	0.714	0.718
AQATrack	0.535	0.693	0.753
AOT	0.541	0.622	0.852
Cutie	0.575	0.704	0.776
ODTrack	0.608	0.740 🥇	0.809
SAM2.1Long	0.646	0.719	0.883
SAM2.1 (baseline)	0.649 🥉	0.720	0.887 🥉
SAMURAI	0.680 🥈	0.722 🥉	0.930 🥈
DAM4SAM (ours)	0.694 🥇	0.727 🥈	0.944 🥇
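
To make the table easier to compare, the following minimal sketch computes the relative change of each metric with respect to the SAM2.1 baseline and ranks the trackers per metric. The scores are copied from the table above; the names `DIDI_SCORES`, `relative_gain` and `ranking` are illustrative helpers and are not part of the DAM4SAM code base.

```python
# Scores copied from the DiDi comparison table: (quality, accuracy, robustness).
DIDI_SCORES = {
    "TransT": (0.465, 0.669, 0.678),
    "KeepTrack": (0.502, 0.646, 0.748),
    "SeqTrack": (0.529, 0.714, 0.718),
    "AQATrack": (0.535, 0.693, 0.753),
    "AOT": (0.541, 0.622, 0.852),
    "Cutie": (0.575, 0.704, 0.776),
    "ODTrack": (0.608, 0.740, 0.809),
    "SAM2.1Long": (0.646, 0.719, 0.883),
    "SAM2.1 (baseline)": (0.649, 0.720, 0.887),
    "SAMURAI": (0.680, 0.722, 0.930),
    "DAM4SAM (ours)": (0.694, 0.727, 0.944),
}
METRICS = ("quality", "accuracy", "robustness")


def relative_gain(model, baseline="SAM2.1 (baseline)"):
    """Relative change of each metric versus the baseline, in percent."""
    return {
        name: 100.0 * (score - base) / base
        for name, score, base in zip(
            METRICS, DIDI_SCORES[model], DIDI_SCORES[baseline]
        )
    }


def ranking(metric):
    """Tracker names sorted from best to worst on the given metric."""
    idx = METRICS.index(metric)
    return sorted(DIDI_SCORES, key=lambda name: DIDI_SCORES[name][idx], reverse=True)


if __name__ == "__main__":
    for name, gain in relative_gain("DAM4SAM (ours)").items():
        print(f"{name}: {gain:+.1f}% vs. SAM2.1")
    print("Top 3 by quality:", ranking("quality")[:3])
```

On these numbers, the gain of DAM4SAM over the SAM2.1 baseline is largest in quality and robustness, while accuracy changes only slightly.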

Qualitative comparison

Long-term tracking amidst distractors, with frequent target occlusions

Target redetection after occlusion amidst distractors

Tracking over very long duration and appearance changes

Video object removal by Remove Anything

We demonstrate that DAM4SAM, combined with ProPainter, can remove objects from a video. Object removal follows a simple two-step pipeline: first, the proposed DAM4SAM segments the selected object in each frame; second, ProPainter inpaints the segmented region. An integration script is included in our GitHub repository.
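
The two-step structure of this pipeline can be sketched as follows. This is only an illustration of the control flow, not the integration script from the repository: `remove_object`, `toy_segment` and `toy_inpaint` are hypothetical names, the toy stand-ins operate on small nested lists instead of real video frames, and the actual DAM4SAM and ProPainter interfaces are not shown here.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]   # toy grayscale frame: rows of pixel values
Mask = List[List[bool]]   # True where the object to remove is located


def remove_object(
    frames: Sequence[Frame],
    segment: Callable[[Frame], Mask],
    inpaint: Callable[[Sequence[Frame], Sequence[Mask]], List[Frame]],
) -> List[Frame]:
    """Step 1: segment the object in every frame. Step 2: inpaint the masked regions."""
    masks = [segment(frame) for frame in frames]
    return inpaint(frames, masks)


def toy_segment(frame: Frame) -> Mask:
    """Stand-in for the tracker (assumption): marks bright pixels as the object."""
    return [[pixel > 5 for pixel in row] for row in frame]


def toy_inpaint(frames: Sequence[Frame], masks: Sequence[Mask]) -> List[Frame]:
    """Stand-in for the inpainter (assumption): fills masked pixels with the
    integer mean of the unmasked pixels of the same frame."""
    result = []
    for frame, mask in zip(frames, masks):
        kept = [p for row, mrow in zip(frame, mask) for p, m in zip(row, mrow) if not m]
        fill = sum(kept) // len(kept) if kept else 0
        result.append(
            [[fill if m else p for p, m in zip(row, mrow)]
             for row, mrow in zip(frame, mask)]
        )
    return result


if __name__ == "__main__":
    video = [[[1, 1, 9], [3, 9, 3]]]
    print(remove_object(video, toy_segment, toy_inpaint))
```

Passing the segmentation and inpainting steps as callables keeps the pipeline independent of any particular model, so the toy stand-ins can be replaced by wrappers around the real tools.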
