attribution-methods | Sukrut Rao

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions

FaCT combines concept discovery with model-inherent attributions to construct a model that provides faithful concept traces for explaining its decisions, i.e., the contributions of pixels to concepts and of concepts to the final decision can be faithfully traced. We also propose a novel concept-consistency metric, C2-Score, and show that FaCT yields more consistent and interpretable concepts while retaining competitive performance.

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Explanation-enhanced knowledge distillation (e2KD) is a method to faithfully distill teacher models into students by additionally optimizing the similarity of teacher and student explanations. We show that e2KD consistently improves accuracy and student-teacher agreement, ensures that students learn from teachers to be right for the right reasons, is robust across architectures and data amounts, and works even with pre-computed explanations.

Better Understanding Differences in Attribution Methods via Systematic Evaluations

We propose three novel evaluation schemes to better understand the faithfulness of attribution methods and the differences between them, and use these schemes to study the strengths and shortcomings of some widely used attribution methods. We extend [our work on attribution evaluation](publication/towards-better-understanding-attribution-methods/) to more attribution methods and models, and perform additional analyses.

Studying How to Efficiently and Effectively Guide Models with Explanations

We perform an in-depth study on model guidance with explanations by evaluating various design choices, and also explore ways to improve efficiency. We show that guidance is effective even with limited, coarse, and noisy annotations, that using the energy loss with model-inherent B-cos explanations works best, and that guidance can help improve generalization under distribution shifts.

Understanding Attributions

Towards Better Understanding Attribution Methods