vision-language-models

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

TEVI is a framework that uses captions to guide the editing of image embeddings via sparse autoencoders, improving vision-language alignment and retrieval performance.
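
The summary above does not say how the caption steers the edit, so the following is only a hypothetical sketch of one way such an edit could work, not TEVI's actual method. It assumes image and caption embeddings share one space (as in CLIP-style models), so a single sparse autoencoder can encode both. The gating rule, the `strength` parameter, and every function name are illustrative assumptions.

```python
import math

def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def normalize(vec):
    # Scale to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def edit_image_embedding(img, txt, enc, dec, strength=1.0):
    """Hypothetical edit rule: amplify the sparse latents of the image
    embedding that the caption embedding also activates, then decode."""
    z_img = [max(0.0, z) for z in matvec(enc, img)]  # ReLU sparse code of the image
    z_txt = [max(0.0, z) for z in matvec(enc, txt)]  # ReLU sparse code of the caption
    peak = max(z_txt) or 1.0                         # avoid dividing by zero
    # Gate is 1 for latents the caption ignores, up to 1 + strength otherwise.
    gate = [1.0 + strength * z / peak for z in z_txt]
    z_edit = [z * g for z, g in zip(z_img, gate)]
    return normalize(matvec(dec, z_edit))

# Toy setup: identity encoder and decoder stand in for a trained autoencoder.
identity = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [1.0, 1.0]
caption_emb = [1.0, 0.0]
edited_emb = edit_image_embedding(image_emb, caption_emb, identity, identity)
```

In this toy case the edited embedding moves toward the caption direction, which is the kind of alignment gain the summary describes; a real system would use a trained autoencoder and embeddings from an actual vision-language model.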

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Discover-then-Name is an efficient, task-agnostic approach to building concept bottleneck models (CBMs). It first uses sparse autoencoders to discover the concepts a model has learned, then names those concepts automatically. The resulting concepts are semantically meaningful and appropriately named, which helps construct CBMs that are both performant and interpretable.
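
The discover, name, and classify pipeline can be illustrated with a minimal sketch. It assumes that each sparse-autoencoder dictionary vector is named after the vocabulary word whose text embedding is closest in cosine similarity, and that the bottleneck classifier is linear over concept activations. The summary above confirms neither detail, and all names, weights, and toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def name_concepts(dictionary_vectors, vocab_embeddings):
    # Assumed naming rule: pick the word whose embedding best matches each vector.
    return [
        max(vocab_embeddings, key=lambda w: cosine(vec, vocab_embeddings[w]))
        for vec in dictionary_vectors
    ]

def concept_activations(feature, encoder_rows):
    # ReLU encoder: one non-negative activation per discovered concept.
    return [max(0.0, sum(w * x for w, x in zip(row, feature))) for row in encoder_rows]

def cbm_predict(feature, encoder_rows, class_weights):
    # Linear classifier on top of the concept bottleneck.
    acts = concept_activations(feature, encoder_rows)
    scores = {
        label: sum(w * a for w, a in zip(weights, acts))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get), acts

# Toy data standing in for a trained autoencoder and real text embeddings.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
vocab = {"striped": [0.9, 0.1], "furry": [0.2, 0.8]}
names = name_concepts(dictionary, vocab)

weights = {"zebra": [1.0, 0.0], "cat": [0.0, 1.0]}
label, acts = cbm_predict([0.8, 0.1], dictionary, weights)
explanation = dict(zip(names, acts))  # named concept -> activation
```

Because every prediction passes through named concept activations, the `explanation` mapping shows which concepts drove the decision, which is what makes a bottleneck of this kind interpretable.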