SOLAR
Scalable Optimization of Large-scale Architecture for Reasoning
Chen Li*, Yinyi Luo*, Anudeepsekhar Bolimera, Uzair Ahmed, Shri Kiran Srinivasan, Hrishikesh Gokhale, Marios Savvides
* Equal Contribution
Carnegie Mellon University
Introduction
Large Language Models excel at reasoning yet often rely on Chain-of-Thought prompts, limiting performance on tasks that demand more nuanced topological structures. We present SOLAR (Scalable Optimization of Large-scale Architecture for Reasoning), a framework that dynamically optimizes Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) topologies to boost accuracy and efficiency. Our Topological-Annotation-Generation (TAG) system automates dataset creation, annotation, and difficulty segmentation, leading to stronger post-training and test-time performance. We also propose Topological-Scaling, a curriculum-learning-based approach that adaptively combines post-training and inference scaling for each task. On MATH and GSM8K, SOLAR delivers notable gains: +5% accuracy with Topological Tuning, +9% with Topological Rewarding, and +10.02% with Hybrid Scaling, while reducing response length by over 5% and lowering inference latency. To further enhance efficiency, we introduce a multi-task Topological Reward Model (M-TRM) that selects both the optimal reasoning topology and the final answer in a single pass, eliminating the need for multiple single-task TRMs. Remarkably, M-TRM also surpasses all single-task TRMs, improving accuracy by +10% and rank correlation by +9%. Overall, SOLAR establishes a new benchmark for scalable, high-precision LLM reasoning and introduces a fully automated, dynamic topology competition mechanism.
An overview of the SOLAR framework.
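To make the single-pass selection concrete, below is a minimal sketch of how a multi-task TRM might rank candidate (topology, answer) pairs; the `Candidate` container, the `mtrm_score` placeholder, and the overall interface are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch of single-pass selection with a multi-task
# Topological Reward Model (M-TRM); all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    topology: str  # "CoT", "ToT", or "GoT"
    answer: str    # final answer extracted from the reasoning trace
    trace: str     # full reasoning trace that the reward model scores

def mtrm_score(question: str, cand: Candidate) -> float:
    """Placeholder for the learned reward model: one forward pass over
    (question, trace) yields a scalar that jointly reflects topology
    suitability and answer quality."""
    raise NotImplementedError  # replace with a real model call

def select(question: str, candidates: list[Candidate]) -> Candidate:
    # A single argmax over jointly scored candidates simultaneously picks
    # the winning topology and the final answer, so no cascade of
    # single-task TRMs is needed.
    return max(candidates, key=lambda c: mtrm_score(question, c))
```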

SOLAR Architecture
Motivation

Win-rate comparisons across pretrained models show that different tasks favor different reasoning topologies, as evidenced by their distinct win-rate distributions. This finding underscores the potential to enhance LLM reasoning by explicitly equipping models with the topological strategy best suited to each task; the sketch below shows one way such win rates can be tabulated.
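For concreteness, a win rate here can be read as the fraction of problems a topology answers correctly. The toy tabulation below uses made-up records and a simple correctness flag as assumptions, but the per-topology bookkeeping is the same at scale.

```python
from collections import defaultdict

# records: (task_id, topology, correct) triples collected by running the
# same problems through CoT, ToT, and GoT prompting (toy data below).
records = [
    ("q1", "CoT", True), ("q1", "ToT", False), ("q1", "GoT", False),
    ("q2", "CoT", False), ("q2", "ToT", True), ("q2", "GoT", True),
]

wins, totals = defaultdict(int), defaultdict(int)
for _, topology, correct in records:
    totals[topology] += 1
    wins[topology] += int(correct)

for topology in sorted(totals):
    print(f"{topology}: win rate {wins[topology] / totals[topology]:.2f}")
```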
Contributions
- Topological Reasoning Characterization: We systematically show that different tasks require distinct reasoning topologies, a phenomenon validated across multiple models and datasets.
- Topological-Annotation-Generation (TAG): An automated system for building and annotating large-scale topological datasets, including difficulty segmentation, facilitating robust post-training (see the annotation sketch below).
- Hierarchical Topological-Scaling Framework: A unified mechanism combining post-training and inference-scaling optimizations, significantly boosting performance while allowing flexible trade-offs between accuracy and efficiency (see the inference-routing sketch below).
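As a rough illustration of the TAG idea, the sketch below labels each problem with the topology that solves it most reliably and buckets problems by difficulty. The sampling routine, thresholds, and field names are hypothetical; the actual system's criteria may differ.

```python
# Hypothetical TAG-style annotation; names and thresholds are illustrative.
TOPOLOGIES = ("CoT", "ToT", "GoT")

def solve_rate(problem: dict, topology: str, n_samples: int = 8) -> float:
    """Placeholder: sample n_samples reasoning traces under `topology` and
    return the fraction whose final answer matches problem["answer"]."""
    raise NotImplementedError  # replace with real sampling + checking

def annotate(problem: dict) -> dict:
    rates = {t: solve_rate(problem, t) for t in TOPOLOGIES}
    best = max(rates, key=rates.get)          # winning-topology label
    mean = sum(rates.values()) / len(rates)   # overall solvability
    # Difficulty segmentation: easy if most samples succeed, hard if few do.
    difficulty = "easy" if mean > 0.7 else "hard" if mean < 0.3 else "medium"
    return {**problem, "topology": best, "difficulty": difficulty}
```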
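And here is one way the hierarchical combination might compose at inference time: easy tasks take the cheap post-trained path (Topological Tuning), while harder tasks trigger multi-topology sampling with reward-based selection (Topological Rewarding). The routing rule and the stubs are assumptions for illustration.

```python
# Illustrative composition of post-training and inference scaling.
# `tuned_generate` and `reward` stand in for the post-trained policy and
# the reward model; both are hypothetical placeholders.

def tuned_generate(question: str, topology: str | None = None) -> str:
    raise NotImplementedError  # one forward pass of the tuned policy

def reward(question: str, trace: str) -> float:
    raise NotImplementedError  # reward-model score for a full trace

def hybrid_scaling(question: str, difficulty: str) -> str:
    if difficulty == "easy":
        # Topological Tuning alone: the tuned policy already tends to pick
        # a good topology, so a single sample keeps latency low.
        return tuned_generate(question)
    # Topological Rewarding: sample one trace per topology, then let the
    # reward model arbitrate the competition among them.
    candidates = [tuned_generate(question, t) for t in ("CoT", "ToT", "GoT")]
    return max(candidates, key=lambda c: reward(question, c))
```

This routing is what enables the accuracy-versus-efficiency trade-off to be tuned per task: the reward-model path buys accuracy at the cost of extra samples.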
Experiments
On MATH and GSM8K, Topological Tuning, Topological Rewarding, and Hybrid Scaling improve accuracy by +5%, +9%, and +10.02% respectively while cutting response length by over 5%, and the multi-task TRM outperforms all single-task TRMs by +10% in accuracy and +9% in rank correlation.
Conclusion
In this work, we introduced SOLAR, a paradigm shift in LLM reasoning that adaptively selects among Chain-of-Thought, Tree-of-Thought, and Graph-of-Thought strategies, unifying post-training and inference-scaling optimizations to generate effective policies and refine solutions through competitive selection. Leveraging Topological-Annotation-Generation (TAG) and curriculum-learning-based Topological-Scaling, SOLAR surpasses conventional reasoning approaches, achieving higher performance on MATH and GSM8K while exhibiting "resilience to overthinking": it reduces response length without sacrificing accuracy. Looking forward, we aim to explore how reasoning structures interact with scaling laws, what drives Vision-Language Models toward non-default reasoning topologies across their development lifecycle, and the principles underlying this anti-overthinking behavior. We are also integrating reward-model-based methods to further enhance generalization, advancing the scalability, efficiency, and ethical foundations of adaptive reasoning architectures.