Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
RandOpt enhances large-scale models via random perturbations and ensemble voting around pretrained weights.
Key Findings
Methodology
The paper introduces RandOpt, a method that improves model performance by applying random parameter perturbations around the pretrained weights, selecting the top-K perturbations, and ensembling their predictions via majority vote. This fully parallel approach requires no gradient updates and is well suited to post-training optimization of large-scale models. Its core mechanisms are random sampling, performance evaluation, and ensemble voting.
Key Results
- On the Countdown task using the Olmo-3-7B-Instruct model, RandOpt achieved accuracy comparable to GRPO and ES by making 5000 random weight guesses and ensembling the best perturbations.
- RandOpt improved the accuracy of the Qwen2.5-VL-3B-Instruct model on the GQA dataset by 12.4%.
- Experiments show that RandOpt performs competitively across various tasks and model scales, even without sequential optimization steps.
Significance
This research reveals the high-density distribution of task experts around pretrained weights in large-scale models. RandOpt provides an effective post-training optimization strategy through simple random perturbations and ensemble methods, challenging traditional gradient-based optimization approaches. This finding is significant for optimizing and applying large-scale models, especially under limited computational resources.
Technical Contribution
RandOpt's technical contribution lies in its fully parallel design, which does not rely on gradient updates and scales well to large models. By exploring the high-density solution space around pretrained weights through random perturbations and ensemble voting, the method opens new possibilities for post-training optimization of large-scale models.
Novelty
RandOpt's novelty lies in leveraging the high-density task experts around pretrained weights to enhance performance through random perturbations and ensemble voting. Unlike traditional gradient-based methods, it offers a novel post-training optimization strategy, particularly suitable for large-scale models.
Limitations
- RandOpt requires K forward passes at test time, increasing computational cost.
- The method is less effective on small-scale models due to lower density of expert solutions.
- In some tasks, performance gains may partly result from format corrections rather than genuine reasoning improvements.
Future Work
Future research directions include exploring ways to enhance RandOpt's performance without increasing computational cost, such as integrating multiple models' strengths into a single model through distillation. Additionally, applying RandOpt to a broader range of tasks and models and further optimizing its random perturbation strategy are promising areas for exploration.
AI Executive Summary
In today's AI research, pretrained models have become a standard approach, offering generalized representation capabilities through pretraining on large datasets. However, effectively optimizing these pretrained models post-training remains a challenging issue. Traditional methods often rely on iterative optimization techniques like gradient descent, which may not be efficient for large-scale models.
This paper introduces a novel method called RandOpt, which improves model performance by applying random parameter perturbations around the pretrained weights, selecting the top-K perturbations, and ensembling their predictions via majority vote. At its core, the method exploits the high density of task experts around pretrained weights: simple random perturbations plus ensembling achieve performance comparable to or better than traditional methods.
RandOpt's technical principles include three key steps: first, performing N random parameter perturbations around pretrained weights; second, evaluating each perturbation's performance; and finally, selecting the top K perturbations for ensemble voting. This approach allows RandOpt to quickly find high-performing task experts without relying on gradient updates.
Experimental results demonstrate that RandOpt performs excellently across various tasks and model scales. For instance, on the Countdown task using the Olmo-3-7B-Instruct model, RandOpt achieved accuracy comparable to GRPO and ES by making 5000 random weight guesses and ensembling the best perturbations. Additionally, RandOpt improved the accuracy of the Qwen2.5-VL-3B-Instruct model on the GQA dataset by 12.4%.
The broad application of this method could have profound impacts on optimizing and applying large-scale models, especially under limited computational resources. RandOpt offers a novel post-training optimization strategy, challenging traditional gradient-based methods, and reveals the high-density distribution of task experts around pretrained weights in large-scale models.
However, RandOpt also has limitations, such as requiring K forward passes at test time, increasing computational cost. Additionally, the method is less effective on small-scale models due to lower density of expert solutions. Future research directions include exploring ways to enhance RandOpt's performance without increasing computational cost and applying it to a broader range of tasks and models.
Deep Analysis
Background
In recent years, pretrained models have made significant advances in fields like natural language processing and computer vision. By pretraining on large datasets, these models acquire generalized representation capabilities, providing a strong starting point for downstream tasks. However, effectively optimizing these pretrained models post-training remains a challenging issue. Traditional methods often rely on iterative optimization techniques like gradient descent, which may not be efficient for large-scale models and require substantial computational resources. Furthermore, as model size increases, the distribution characteristics of task experts around pretrained weights change, offering potential for new optimization methods.
Core Problem
The core problem addressed in this paper is how to effectively optimize large-scale pretrained models post-training. Traditional gradient-based optimization methods may not be efficient for large-scale models and require substantial computational resources. Additionally, as model size increases, the distribution characteristics of task experts around pretrained weights change, offering potential for new optimization methods. Therefore, designing an efficient post-training optimization method that leverages these distribution characteristics is the focus of this research.
Innovation
RandOpt's core innovation lies in leveraging the high-density task experts around pretrained weights to enhance performance through random perturbations and ensemble voting. This method differs from traditional gradient-based optimization methods, offering a novel post-training optimization strategy, particularly suitable for large-scale models. Specifically, RandOpt performs random parameter perturbations around pretrained weights, selects the top K perturbations, and ensembles predictions via majority vote to enhance model performance. This fully parallel approach does not rely on gradient updates and is suitable for post-training optimization of large-scale models.
Methodology
RandOpt's methodology involves the following steps:
- Random Perturbation: perform N random parameter perturbations around the pretrained weights, generating multiple candidate models.
- Performance Evaluation: evaluate each candidate model's performance on the target task.
- Ensemble Voting: select the top-K candidate models and produce final predictions through majority voting.
This approach allows RandOpt to quickly find high-performing task experts without relying on gradient updates.
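The three steps above can be sketched as a small NumPy toy. This is a minimal illustration, not the paper's actual setup: the linear "model", the task, and the hyperparameters `N`, `K`, and `sigma` are all assumed for the example, and the "pretrained" weights are placed near a ground-truth solution to mirror the paper's premise that experts are dense around them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: a "true" linear classifier defines the labels, and the
# "pretrained" weights sit near it (illustrative assumption).
W_true = rng.normal(size=(2, 3))
X = rng.normal(size=(200, 2))
y = (X @ W_true).argmax(axis=1)
W0 = W_true + 0.3 * rng.normal(size=(2, 3))  # pretrained weights

def predict(W, X):
    return (X @ W).argmax(axis=1)

N, K, sigma = 64, 8, 0.1  # sample count, ensemble size, noise scale (assumed)

# Step 1: sample N random perturbations around the pretrained weights.
candidates = [W0 + sigma * rng.normal(size=W0.shape) for _ in range(N)]
# Step 2: evaluate each candidate's accuracy on the task.
scores = [(predict(W, X) == y).mean() for W in candidates]
# Step 3: keep the top-K candidates and ensemble by per-example majority vote.
top_k = [candidates[i] for i in np.argsort(scores)[-K:]]
votes = np.stack([predict(W, X) for W in top_k])  # shape (K, 200)
ensemble_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", (ensemble_pred == y).mean())
```

Note that steps 1 and 2 are embarrassingly parallel: each candidate can be sampled and scored independently, which is what makes the approach gradient-free and fully parallel.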
Experiments
The experimental design includes testing RandOpt's performance across various tasks and model scales. Datasets used include Countdown, GSM8K, MATH-500, OlympiadBench, etc., with models like Qwen, Llama, OLMo3, covering parameter scales from 0.5B to 8B. Baseline methods include PPO, GRPO, ES, etc., with evaluation metrics like accuracy and reasoning ability. Ablation studies were conducted to verify the contribution of RandOpt's key components to performance.
Results
Experimental results demonstrate that RandOpt performs excellently across various tasks and model scales. For instance, on the Countdown task using the Olmo-3-7B-Instruct model, RandOpt achieved accuracy comparable to GRPO and ES by making 5000 random weight guesses and ensembling the best perturbations. Additionally, RandOpt improved the accuracy of the Qwen2.5-VL-3B-Instruct model on the GQA dataset by 12.4%. These results show that RandOpt can quickly find high-performing task experts without relying on gradient updates.
Applications
RandOpt's application scenarios include post-training optimization of large-scale models, especially under limited computational resources. By leveraging the high-density task experts around pretrained weights, RandOpt can enhance model performance without increasing computational cost. Additionally, this method can be applied to multi-task learning, model ensembling, and other fields, offering new ideas for optimizing and applying large-scale models.
Limitations & Outlook
Despite RandOpt's excellent performance on large-scale models, its effectiveness on small-scale models is limited due to lower density of expert solutions. Additionally, RandOpt requires K forward passes at test time, increasing computational cost. Future research directions include exploring ways to enhance RandOpt's performance without increasing computational cost and applying it to a broader range of tasks and models.
Plain Language (accessible to non-experts)
Imagine you're in a massive library searching for a specific book. This library represents a pretrained large-scale model, and the book you're looking for is a specific task expert. In traditional methods, you might need a complex search algorithm, like gradient descent, to help you find this book. But in RandOpt, we use a simpler approach: randomly picking some books and then selecting the ones closest to what you want.
This process is like randomly selecting some books in the library and quickly browsing through them to judge which one is closest to what you're looking for. Finally, you combine the contents of these books to form a complete answer. The advantage of this method is that it doesn't require complex search algorithms, just simple random selection and quick judgment.
In this way, RandOpt can quickly find high-performing task experts in large-scale models without relying on traditional gradient-based optimization methods. It's like finding the book you want in a massive library through random selection and quick judgment, simple yet effective.
ELI14 (explained like you're 14)
Hey there! Imagine you're playing a super complex game with a huge map, and you need to find treasure hidden on the map. Traditional methods are like using a super complex compass to help you get closer to the treasure step by step. But today, we're talking about RandOpt, which is like randomly placing some markers on the map and then seeing which marker is closest to the treasure.
This method is like randomly placing some markers in the game and quickly checking which marker is closest to the treasure. Finally, you combine the information from these markers to form a complete route. The advantage of this method is that it doesn't require a complex compass, just simple random placement and quick judgment.
In this way, RandOpt can quickly find the treasure in the game without relying on traditional compass methods. It's like finding the treasure you want on a huge game map through random placement and quick judgment. Isn't that cool?
Glossary
Pretraining
The process of training a model on a large dataset to acquire generalized representation capabilities.
In this paper, pretraining is the foundation of RandOpt, providing the initial weight distribution.
Random Perturbation
The process of randomly altering model parameters to explore different solutions.
RandOpt uses random perturbations to find high-performing solutions around pretrained weights.
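As a hedged illustration of this term, one common way to perturb parameters is to add Gaussian noise around the pretrained values; the noise scale `sigma` here is an assumed hyperparameter, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb(theta, sigma=0.1):
    """Return a candidate: the parameter vector plus Gaussian noise."""
    return theta + sigma * rng.normal(size=theta.shape)

theta_pretrained = np.zeros(5)   # stand-in for pretrained weights
theta_candidate = perturb(theta_pretrained)
print(theta_candidate)
```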
Ensemble Voting
A method of improving final prediction accuracy by combining predictions from multiple models.
RandOpt uses ensemble voting to select the best perturbations.
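For a concrete (made-up) illustration of majority voting, Python's `collections.Counter` picks the most frequent answer among the selected models' predictions:

```python
from collections import Counter

# Hypothetical answers from K = 5 selected models for one question.
predictions = ["42", "42", "17", "42", "17"]

majority = Counter(predictions).most_common(1)[0][0]
print(majority)  # prints 42
```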
Task Expert
A model or model parameters that perform exceptionally well on a specific task.
RandOpt leverages task experts around pretrained weights to enhance performance.
Gradient Descent
An optimization algorithm that iteratively updates parameters to minimize a loss function.
A commonly used optimization technique in traditional methods, which RandOpt does not rely on.
Large-scale Model
A deep learning model with a large number of parameters, typically trained on large datasets.
RandOpt is particularly suitable for post-training optimization of large-scale models.
Post-training Optimization
Further optimization performed on a pretrained model to enhance performance on specific tasks.
RandOpt is a post-training optimization method.
Ablation Study
A study that evaluates the impact of removing or altering certain parts of a model on overall performance.
Used in experiments to verify the contribution of RandOpt's key components.
Reasoning Ability
The model's ability to understand and solve complex problems.
RandOpt enhances reasoning ability across multiple tasks.
Computational Cost
The computational resources and time required to execute an algorithm or train a model.
RandOpt has a higher computational cost at test time.
Open Questions (unanswered questions from this research)
1. How can RandOpt's performance be enhanced without increasing computational cost? The current method requires multiple forward passes at test time.
2. RandOpt is less effective on small-scale models; how can its performance on these models be improved?
3. What is the potential for applying RandOpt to a broader range of tasks and models? Current research focuses mainly on specific tasks and model scales.
4. How can RandOpt's random perturbation strategy be further optimized to improve performance across different tasks?
5. Are RandOpt's performance gains partly due to format corrections rather than genuine reasoning improvements, and how can the contributions of these two factors be distinguished?
6. In multi-task learning, how can RandOpt be combined with other ensemble learning methods to improve overall performance?
7. How can RandOpt's methodology be applied to other types of models, such as generative or reinforcement learning models?
Applications
Immediate Applications
Large-scale Model Optimization
RandOpt can be used to optimize large-scale pretrained models, especially under limited computational resources, by enhancing performance through random perturbations and ensemble voting.
Multi-task Learning
In multi-task learning, RandOpt can quickly find expert solutions for specific tasks, improving overall performance.
Model Ensembling
RandOpt offers a new model ensembling method, enhancing robustness and accuracy through random perturbations and ensemble voting.
Long-term Vision
General Artificial Intelligence
By optimizing large-scale models, RandOpt has the potential to advance general artificial intelligence, especially in multi-task and multi-domain applications.
Automated Optimization
RandOpt's methodology can be used to develop automated model optimization tools, reducing human intervention and increasing efficiency.
Abstract
Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
References (20)
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Drew A. Hudson, Christopher D. Manning
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
Xin Qiu, Yulu Gan, Conor F. Hayes et al.
Training Verifiers to Solve Math Word Problems
K. Cobbe, Vineet Kosaraju, Mo Bavarian et al.
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal et al.
Evaluating Benchmark Problems by Random Guessing
J. Kolen, S. C. Kremer
How Learning Can Guide Evolution
Geoffrey E. Hinton, S. Nowlan
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang et al.
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park, Yo Joong Choe, Victor Veitch
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani
HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng, Chi Zhang, Zilingfeng Ye et al.
Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao, Shuyue Stella Li, R. Xin et al.
Learning to Reason in 13 Parameters
John X. Morris, Niloofar Mireshghallah, Mark Ibrahim et al.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, P. Abbeel, S. Levine
PEP: Parameter Ensembling by Perturbation
Alireza Mehrtash, P. Abolmaesumi, P. Golland et al.
Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing
Declan Oller, T. Glasmachers, Giuseppe Cuccu
Interpreting the Weight Space of Customized Diffusion Models
Amil Dravid, Yossi Gandelsman, Kuan-Chieh Jackson Wang et al.
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He, Renjie Luo, Yuzhuo Bai et al.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Yangzhen Wu, Zhiqing Sun, Shanda Li et al.
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan, Ricardo Dominguez-Olmedo, Thaddaus Wiedemer et al.
Snapshot Ensembles: Train 1, get M for free
Gao Huang, Yixuan Li, Geoff Pleiss et al.