b-cos-networks

Align Once to Explain: Feature Alignment for Scalable B-cosification of Foundational Vision Transformers

ALOE is a method to transform large-scale ViT-based foundation models such as DINOv3 and SigLIP2 into inherently interpretable B-cos variants at a fraction of the cost of training from scratch, bringing interpretability while retaining strong performance across a range of downstream tasks and datasets.

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

B-cos LMs extend B-cos networks to language models, providing more faithful and human-interpretable explanations than post-hoc methods while maintaining comparable task performance.

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

B-cosification is a method to transform existing pre-trained models into inherently interpretable B-cos variants at a fraction of the cost of training from scratch, yielding interpretable models that often outperform B-cos models trained from scratch in classification performance. We also apply B-cosification to CLIP and show that the B-cosified version remains competitive in performance while being interpretable.
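
For orientation, the core operation that all three projects build on can be sketched in a few lines. This is a minimal illustration of the B-cos transform for a single unit as it is commonly defined for B-cos networks (a linear response with a unit-norm weight, down-scaled by |cos(x, w)|^(B-1)); it is not taken from any of the repositories listed here. The function name `b_cos` and its parameters are hypothetical, and the sketch does not cover the B-cosification procedure for converting a pre-trained model.

```python
import math


def b_cos(x, w, b=2.0):
    """Illustrative B-cos response of one unit (hypothetical helper).

    x, w: equal-length lists of floats (input and weight vector).
    b:    the exponent B; b=1 gives a plain linear unit with a unit-norm weight.
    """
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    if w_norm == 0.0 or x_norm == 0.0:
        return 0.0  # cosine is undefined; treat the response as zero

    # Linear response with the weight rescaled to unit norm: w_hat^T x
    linear = sum(wi * xi for wi, xi in zip(w, x)) / w_norm

    # Cosine similarity between input and weight
    cos = linear / x_norm

    # Down-scale the linear response by |cos|^(B-1); the sign is kept by `linear`
    return (abs(cos) ** (b - 1.0)) * linear


if __name__ == "__main__":
    aligned = b_cos([2.0, 0.0], [5.0, 0.0], b=2.0)
    partial = b_cos([3.0, 4.0], [1.0, 0.0], b=2.0)
    print(aligned, partial)
```

With B greater than 1, an input that is poorly aligned with the weight vector produces a smaller response than the plain dot product would give, which is the alignment pressure that B-cos models rely on for their explanations.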