Training-Free Layer Selection for Parameter-Efficient Fine-Tuning of Language Models
Abstract
Adapting large language models (LLMs) to downstream tasks via full fine-tuning is computationally prohibitive. While parameter-efficient fine-tuning (PEFT) methods exist, they often rely on predefined heuristics, incur training overhead for parameter selection, or suffer from poor generalization. This work introduces a novel, training-free layer selection strategy for partial fine-tuning. Our approach leverages the geometric relationships between layer representations by computing the cosine similarity of the [CLS] token embeddings across all layers before fine-tuning begins, using only a single forward pass on a sample of the data. This yields layer-wise importance scores, allowing us to strategically select a small subset of layers for adaptation while freezing the rest. Extensive experiments across 15 diverse NLP tasks, covering both single-sentence and sentence-pair classification, demonstrate that our method consistently outperforms various PEFT baselines, including heuristic layer selection, dynamic/gradient-based methods, and I/O similarity-based selection. Critically, it achieves performance remarkably close to full fine-tuning (often within 1-2%) while reducing trainable parameters by up to 75% and achieving training speedups of 1.5×. Furthermore, the method exhibits superior robustness in cross-domain evaluations compared to baselines and generalizes effectively across different model architectures. By exploiting inherent structural properties of pre-trained models via inter-layer [CLS] token similarity, our approach offers an efficient, effective, and robust paradigm for partial LLM fine-tuning.
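The sketch below illustrates one plausible reading of the selection procedure described in the abstract, not the paper's exact formulation: a single forward pass with hidden states enabled, a layer score defined here (as an assumption) as the average 1 - cosine similarity between a layer's [CLS] embedding and that of the preceding layer, and unfreezing of the highest-scoring layers. The model name `bert-base-uncased`, the choice of consecutive-layer comparison, the number of selected layers `k`, and the helper functions `score_layers` and `freeze_all_but_top_k` are illustrative assumptions.

```python
# Hypothetical sketch: training-free layer selection via inter-layer [CLS] similarity.
# The scoring rule (1 - cosine similarity between consecutive layers' [CLS] embeddings,
# averaged over a data sample) is an assumed instantiation, not the paper's verbatim method.
import torch
from transformers import AutoModel, AutoTokenizer


def score_layers(model, tokenizer, texts, device="cpu"):
    """Return one importance score per transformer layer from a single forward pass."""
    model.to(device).eval()
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq_len, hidden]
    cls = torch.stack([h[:, 0, :] for h in out.hidden_states])  # [L+1, batch, hidden]
    # Assumed score for layer i: mean (1 - cos_sim) between its [CLS] output
    # and the previous layer's [CLS] output.
    sims = torch.nn.functional.cosine_similarity(cls[1:], cls[:-1], dim=-1)  # [L, batch]
    return (1.0 - sims).mean(dim=1)  # [L]


def freeze_all_but_top_k(model, scores, k=3):
    """Freeze every parameter, then unfreeze the k highest-scoring encoder layers."""
    top_layers = set(torch.topk(scores, k).indices.tolist())
    for p in model.parameters():
        p.requires_grad = False
    for i, layer in enumerate(model.encoder.layer):  # BERT-style layer container
        if i in top_layers:
            for p in layer.parameters():
                p.requires_grad = True
    return sorted(top_layers)


if __name__ == "__main__":
    name = "bert-base-uncased"
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModel.from_pretrained(name)
    sample = ["an example sentence", "another sentence drawn from the task's training data"]
    scores = score_layers(mdl, tok, sample)
    print("selected layers:", freeze_all_but_top_k(mdl, scores, k=3))
```

Under these assumptions, the selected layers would then be trained as usual while the remaining parameters stay frozen; whether the paper ranks layers by representational change, by similarity to a reference layer, or by another function of the similarity matrix is not specified in the abstract.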