Accelerating sparse autoencoder training via layer-wise transfer learning in large language models

Ghilardi, Davide, Belotti, Federico, Molinari, Marco and Lim, Jaehyuk (2024) Accelerating sparse autoencoder training via layer-wise transfer learning in large language models. In: Belinkov, Yonatan, Kim, Najoung, Jumelet, Jaap, Mohebbi, Hosein, Mueller, Aaron and Chen, Hanjie, (eds.) Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. Proceedings of BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP) (7). Association for Computational Linguistics, Miami, FL, 530 - 550. ISBN 9798891761704

Text (2024.blackboxnlp-1.32) - Published Version
Available under License Creative Commons Attribution.

Scopus publication

Identification Number: 10.18653/v1/2024.blackboxnlp-1.32

Abstract

Sparse AutoEncoders (SAEs) have gained popularity as a tool for enhancing the interpretability of Large Language Models (LLMs). However, training SAEs can be computationally intensive, especially as model complexity grows. In this study, we explore the potential of transfer learning to accelerate SAE training by capitalizing on the shared representations found across adjacent layers of LLMs. Our experimental results demonstrate that fine-tuning SAEs initialized from pre-trained SAEs of nearby layers not only maintains but often improves the quality of the learned representations, while significantly accelerating convergence. These findings indicate that the strategic reuse of pre-trained SAEs is a promising approach, particularly in settings where computational resources are constrained.
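The core idea in the abstract is to initialize the SAE for one layer from an SAE already trained on a nearby layer, then fine-tune it. The minimal NumPy sketch below illustrates that idea under loudly stated assumptions. It is not the authors' code. The class and function names (`SparseAutoencoder`, `synthetic_activations`) are hypothetical. The "activations" are synthetic sparse mixtures of a random dictionary rather than real LLM activations. A "nearby layer" is modelled as the same dictionary with a small random drift. Training uses plain full-batch gradient descent on a reconstruction-plus-L1 loss, not the paper's training setup.

```python
import copy

import numpy as np


class SparseAutoencoder:
    """Toy ReLU SAE: f = ReLU((x - b_dec) @ W_enc + b_enc), x_hat = f @ W_dec + b_dec."""

    def __init__(self, d_in, d_sae, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_sae))
        self.b_enc = np.zeros(d_sae)
        self.W_dec = self.W_enc.T.copy()  # tied at initialization, trained independently
        self.b_dec = np.zeros(d_in)

    def clone(self):
        """Independent copy of all parameters, used as the transfer initialization."""
        return copy.deepcopy(self)

    def _forward(self, X):
        pre = (X - self.b_dec) @ self.W_enc + self.b_enc
        f = np.maximum(pre, 0.0)
        return pre, f, f @ self.W_dec + self.b_dec

    def loss(self, X, l1=1e-2):
        """Mean per-sample squared reconstruction error plus an L1 sparsity penalty."""
        _, f, X_hat = self._forward(X)
        mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
        return float(mse + l1 * np.mean(np.sum(f, axis=1)))

    def train(self, X, steps, lr=5e-3, l1=1e-2):
        """Full-batch gradient descent using hand-derived gradients of loss()."""
        n = X.shape[0]
        for _ in range(steps):
            pre, f, X_hat = self._forward(X)
            d_xhat = 2.0 * (X_hat - X) / n
            # Backprop through the decoder, add the L1 term, then apply the ReLU mask.
            d_pre = (d_xhat @ self.W_dec.T + l1 / n) * (pre > 0)
            grads = {  # every gradient is computed before any parameter is updated
                "W_dec": f.T @ d_xhat,
                "W_enc": (X - self.b_dec).T @ d_pre,
                "b_enc": d_pre.sum(axis=0),
                "b_dec": d_xhat.sum(axis=0) - (d_pre @ self.W_enc.T).sum(axis=0),
            }
            for name, g in grads.items():
                setattr(self, name, getattr(self, name) - lr * g)
        return self


def synthetic_activations(dictionary, n, seed):
    """Hypothetical stand-in for LLM activations: sparse non-negative mixtures of dictionary rows."""
    rng = np.random.default_rng(seed)
    k = dictionary.shape[0]
    codes = rng.exponential(1.0, size=(n, k)) * (rng.random((n, k)) < 0.15)
    return codes @ dictionary


rng = np.random.default_rng(42)
d_in, n_features, d_sae = 8, 12, 16
D_layer = rng.normal(size=(n_features, d_in))
D_layer /= np.linalg.norm(D_layer, axis=1, keepdims=True)
# Assumed model of a "nearby layer": the same features, slightly drifted.
D_next = D_layer + 0.05 * rng.normal(size=D_layer.shape)

X_layer = synthetic_activations(D_layer, 2048, seed=1)
X_next = synthetic_activations(D_next, 2048, seed=1)  # same codes ("tokens"), next layer

source = SparseAutoencoder(d_in, d_sae, seed=0).train(X_layer, steps=2000)  # pre-trained SAE
transferred = source.clone().train(X_next, steps=200)  # fine-tune from the layer-l SAE
scratch = SparseAutoencoder(d_in, d_sae, seed=0).train(X_next, steps=200)  # random-init baseline

print(f"loss after 200 steps, transferred init: {transferred.loss(X_next):.4f}")
print(f"loss after 200 steps, random init:      {scratch.loss(X_next):.4f}")
```

The comparison holds the fine-tuning budget fixed (200 steps) because, in the transfer setting, the source SAE is assumed to already exist. Only the cost of adapting it is counted. How large any gap turns out to be in this toy depends on the assumed drift scale and learning rate. It says nothing about the magnitudes reported in the paper.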

Item Type:	Book Section
Additional Information:	© 2024 Association for Computational Linguistics
Divisions:	LSE
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date Deposited:	26 Jun 2025 08:57
Last Modified:	15 Nov 2025 01:30
URI:	http://eprints.lse.ac.uk/id/eprint/128562
