Block-Based Visual Programming Language

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Abstract: Visual grounding seeks to localize the image region corresponding to a free-form text description. Recently, the strong multimodal capabilities of Large Vision-Language Models (LVLMs) have ...

IEEE

ViUniT: Visual Unit Tests for More Robust Visual Programming

Abstract: Programming-based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models ...
