language-models

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

B-cos LMs extend B-cos networks to language models, providing more faithful and human-interpretable explanations than post-hoc methods while maintaining comparable task performance.