Publication venue

arXiv

arXiv:2503.13068: "Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation"