DAM4SAM tracking examples
Memory-based trackers such as SAM2 demonstrate remarkable performance but still struggle with distractors. We propose a new plug-in distractor-aware memory (DAM) and management strategy that substantially improves tracking robustness. The new model is demonstrated on SAM2.1, leading to DAM4SAM, which, without additional training, sets a new state of the art on six benchmarks, including the most challenging VOT/S benchmarks. We also propose a new distractor-distilled (DiDi) dataset to better study the distractor problem.
DiDi is a distractor-distilled tracking dataset created to address the limitation of low distractor presence in current visual object tracking benchmarks. To enhance the evaluation and analysis of tracking performance amidst distractors, we have semi-automatically distilled several existing benchmarks into the DiDi dataset. The dataset is available for download at this link.
Model | Quality | Accuracy | Robustness |
---|---|---|---|
TransT | 0.465 | 0.669 | 0.678 |
KeepTrack | 0.502 | 0.646 | 0.748 |
SeqTrack | 0.529 | 0.714 | 0.718 |
AQATrack | 0.535 | 0.693 | 0.753 |
AOT | 0.541 | 0.622 | 0.852 |
Cutie | 0.575 | 0.704 | 0.776 |
ODTrack | 0.608 | 0.740 🥇 | 0.809 |
SAM2.1Long | 0.646 | 0.719 | 0.883 |
SAM2.1 (baseline) | 0.649 🥉 | 0.720 | 0.887 🥉 |
SAMURAI | 0.680 🥈 | 0.722 🥉 | 0.930 🥈 |
DAM4SAM (ours) | 0.694 🥇 | 0.727 🥈 | 0.944 🥇 |
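As a reading aid for the table above, the short sketch below computes the relative change of DAM4SAM over the SAM2.1 baseline for each measure. The scores are copied from the two corresponding table rows; the dictionary and function names are illustrative only and are not part of the DAM4SAM code base.

```python
# Scores copied from the "SAM2.1 (baseline)" and "DAM4SAM (ours)" table rows.
baseline = {"Quality": 0.649, "Accuracy": 0.720, "Robustness": 0.887}
dam4sam = {"Quality": 0.694, "Accuracy": 0.727, "Robustness": 0.944}


def relative_gain(new: float, old: float) -> float:
    """Return the relative change of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Relative gain per measure (illustrative helper, not an official metric).
gains = {m: relative_gain(dam4sam[m], baseline[m]) for m in baseline}

for measure, gain in gains.items():
    print(f"{measure}: {gain:+.1f}%")
```

Under this reading, quality and robustness each improve by roughly 6 to 7 percent relative to the baseline, while accuracy changes by about 1 percent.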
We demonstrate that DAM4SAM, combined with ProPainter, can remove objects from a video. Object removal is performed by a simple two-step pipeline: first, the proposed DAM4SAM segments the selected object in each frame; second, the ProPainter tool inpaints the segmented region. The integration script is included in our GitHub repository.
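The two-step structure of this pipeline can be sketched as follows. This is a minimal illustration and not the integration script itself: `remove_object`, `track`, `inpaint`, and `init_prompt` are hypothetical names, and the tracker and inpainter are passed in as plain callables rather than through the actual DAM4SAM or ProPainter interfaces, which are not described here.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")    # any frame representation, e.g. an image array
Mask = TypeVar("Mask")      # any per-frame segmentation mask
Prompt = TypeVar("Prompt")  # however the object is selected (assumption)


def remove_object(
    frames: Sequence[Frame],
    init_prompt: Prompt,
    track: Callable[[Sequence[Frame], Prompt], List[Mask]],
    inpaint: Callable[[Sequence[Frame], Sequence[Mask]], List[Frame]],
) -> List[Frame]:
    """Two-stage removal: segment the object per frame, then inpaint it.

    `track` stands in for the segmentation tracker and `inpaint` for the
    video inpainting tool; both are hypothetical placeholders.
    """
    # Stage 1: one mask per frame for the selected object.
    masks = track(frames, init_prompt)
    if len(masks) != len(frames):
        raise ValueError("tracker must return exactly one mask per frame")

    # Stage 2: fill the masked regions in every frame.
    return inpaint(frames, masks)
```

Keeping the two stages behind separate callables means either component can be swapped or tested in isolation, for example by replacing the tracker with precomputed masks.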