concept-bottlenecks

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Discover-then-Name is an efficient, task-agnostic approach to building concept bottleneck models (CBMs). It first uses sparse autoencoders to discover the concepts a model has learnt, then names those concepts automatically. The resulting concepts are semantically meaningful and appropriately named, and they support CBMs that are both performant and interpretable.
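
The two stages (discover, then name) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`sae_forward`, `sae_loss`, `name_concepts`), the L1 sparsity penalty, and the naming rule (pick the vocabulary word whose embedding has the highest cosine similarity to each decoder direction) are assumptions made for illustration; the text above only states that concepts are discovered with sparse autoencoders and then named automatically.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Sparse autoencoder pass: ReLU code z, linear reconstruction x_hat.

    x: (n, d) features from a frozen model (assumed input).
    W_enc: (d, k), W_dec: (k, d); each row of W_dec is one concept direction.
    """
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # non-negative activations
    x_hat = z @ W_dec + b_dec
    return z, x_hat


def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 sparsity penalty (assumed objective)."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return float(recon + l1_coeff * sparsity)


def name_concepts(W_dec, vocab_embeddings, vocab_words, eps=1e-12):
    """Name each concept after its nearest vocabulary embedding (cosine).

    Assumes vocab_embeddings live in the same space as the rows of W_dec.
    """
    d = W_dec / (np.linalg.norm(W_dec, axis=1, keepdims=True) + eps)
    v = vocab_embeddings / (
        np.linalg.norm(vocab_embeddings, axis=1, keepdims=True) + eps
    )
    best = (d @ v.T).argmax(axis=1)  # index of the closest word per concept
    return [vocab_words[i] for i in best]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))  # toy features: 4 samples, 3 dimensions
    W_enc, b_enc = rng.normal(size=(3, 5)), np.zeros(5)
    W_dec, b_dec = rng.normal(size=(5, 3)), np.zeros(3)
    z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
    print("loss:", sae_loss(x, z, x_hat))

    # Toy vocabulary with hand-picked embeddings, for illustration only.
    words = ["cat", "dog", "car"]
    print(name_concepts(W_dec, np.eye(3), words))
```

In a sketch like this, the named activations `z` would serve as the bottleneck layer, and a linear classifier trained on `z` for a given task would complete the CBM.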