inherent-interpretability

CFM: Language-aligned Concept Foundation Model for Vision

CFM is a language-aligned concept foundation model that extracts spatially localized, visually grounded, and human-interpretable concepts from images at various granularities, organizes them into hierarchies, and automatically assigns names to them. This enables concept-based explanations for any downstream task that the foundation model can perform, such as classification, open-vocabulary segmentation, and captioning.

Align Once to Explain: Feature Alignment for Scalable B-cosification of Foundational Vision Transformers

ALOE is a method to transform large-scale ViT-based foundation models such as DINOv3 and SigLIP2 into inherently interpretable B-cos variants at a fraction of the cost of training from scratch, bringing interpretability while retaining strong performance across a range of downstream tasks and datasets.

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions

FaCT combines concept discovery with model-inherent attributions to construct a model that provides faithful concept traces for explaining its decisions, i.e., the contributions of pixels to concepts and of concepts to the final decision can be faithfully traced. We also propose a novel concept-consistency metric, C2-Score, and show that FaCT yields more consistent and interpretable concepts while retaining competitive performance.