b-cos-networks

Align Once to Explain: Feature Alignment for Scalable B-cosification of Foundational Vision Transformers

Foundational vision models have become the de facto standard for many vision tasks due to their strong performance. However, they are notoriously opaque and remain hard to interpret. We present ALOE (ALign Once to Explain), a one-time, label-free …

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability due to the lack of explainability in current neural architectures. Meanwhile, B-cos networks have been introduced to improve model …

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

B-cos Networks have been shown to be effective for obtaining highly human-interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weights. B-cos variants of convolutional networks (CNNs) and …
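The alignment mechanism referenced by the abstracts above can be illustrated with a minimal sketch of a single B-cos unit: the linear response w·x is rescaled by the cosine similarity between input and weight raised to the power B−1, so a unit produces a strong output only when its weight vector is aligned with the input. The function name and pure-Python style here are illustrative, not taken from any of the papers' codebases.

```python
import math

def bcos_unit(x, w, B=2.0, eps=1e-12):
    """Minimal sketch of one B-cos unit.

    Computes |cos(x, w)|**(B - 1) * (w . x): the standard linear
    response, down-weighted when input and weight are misaligned.
    With B = 1 this reduces to an ordinary linear unit.
    """
    # Linear response w . x
    lin = sum(wi * xi for wi, xi in zip(w, x))
    # Cosine similarity between input and weight vectors
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in w))
    cos = lin / (norm + eps)
    # B-cos scaling: suppress poorly aligned responses
    return abs(cos) ** (B - 1.0) * lin
```

For a perfectly aligned input the unit behaves like a plain linear layer, while an orthogonal input is suppressed to zero; increasing B sharpens this suppression, which is what makes the resulting weight–input contributions easier to interpret.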