visual reasoning
Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
Visual latent reasoning lets a multimodal large language model (MLLM) create intermediate visual evidence as continuous tokens, …
Yanting Miao, Yutao Sun, Dexin Wang, Mengyu Zhou, Pascal Poupart, Lei Lv, Qi Zhao, Li Wang, Hao Li, Xiaoxi Jiang, Guanjun Jiang