- Mechanistic Permutability: Match Features Across Layers
In this paper, we introduce SAE Match, a novel, data-free method for aligning SAE features across different layers of a neural network.
We propose SAE Match, a novel method for aligning Sparse Autoencoder features across layers without the need for input data, enabling the study of feature dynamics throughout the network.
Through extensive experiments on the Gemma 2 language model, we demonstrate that our method effectively captures feature evolution across layers, improving feature matching quality.
- Cross-Layer Feature Alignment and Steering in Large Language
In Mechanistic Permutability [1], we proposed a method to match features across layers by comparing their SAE parameters, suggesting that many features are re-indexed rather than disappearing in deeper layers.
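The parameter-based matching described above can be sketched in a few lines. This is an illustrative assumption, not the paper's exact procedure (which involves additional steps such as parameter folding): here two SAEs from adjacent layers are matched by cosine similarity between their decoder weight rows, with the Hungarian algorithm producing the optimal one-to-one permutation. The function name `match_sae_features` and the toy dimensions are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_sae_features(W_dec_a, W_dec_b):
    """Match features of two SAEs by comparing decoder weights.

    W_dec_a, W_dec_b: (num_features, d_model) decoder matrices from SAEs
    trained on adjacent layers. Returns a permutation `perm` such that
    feature i of SAE A is matched to feature perm[i] of SAE B.
    Illustrative sketch; the actual SAE Match procedure may differ.
    """
    # Normalize rows so matching is by direction, not scale.
    A = W_dec_a / np.linalg.norm(W_dec_a, axis=1, keepdims=True)
    B = W_dec_b / np.linalg.norm(W_dec_b, axis=1, keepdims=True)
    # Negative cosine similarity as cost; the Hungarian algorithm finds
    # the minimum-cost one-to-one assignment (a feature permutation).
    cost = -A @ B.T
    _, col_ind = linear_sum_assignment(cost)
    return col_ind

# Toy check: SAE B is SAE A with its rows shuffled, so the matching
# should recover the inverse of the shuffle.
rng = np.random.default_rng(0)
W_a = rng.normal(size=(8, 16))
shuffle = rng.permutation(8)
W_b = W_a[shuffle]
perm = match_sae_features(W_a, W_b)
```

In this framing, "features are re-indexed rather than disappearing" corresponds to the matched pairs having high similarity: the permutation relabels features between layers instead of mapping them to unrelated directions.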
- arXiv:2410.07656v1 [cs.LG] 10 Oct 2024 - ResearchGate
We've developed SAE Match, a method to align interpretable features across layers in deep neural networks. This approach reveals how features in large language models (LLMs) persist and transform across layers, providing new insights into their internal workings.
Understanding how features evolve across layers in deep neural networks is a fundamental challenge in mechanistic interpretability, particularly due to polysemanticity and feature superposition.