Published in: arXiv (arXiv:2503.13068), "Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation"