Scaling Laws for Energy Efficiency of Local LLMs
By: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, Antonio Tiene, Samuel Mugel, Román Orús
Published: 2025-12-18
Category: cs.AI
Abstract
Deploying local large language models (LLMs) and vision-language models (VLMs) on edge devices requires balancing accuracy against constrained compute and energy budgets. This paper systematically benchmarks LLMs and VLMs across CPU tiers, uncovering scaling laws that relate computational cost to token length and image resolution. It further shows that quantum-inspired compression can reduce energy consumption by up to 62% while preserving accuracy, enabling sustainable edge inference.
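To illustrate what "scaling laws for computational cost with token length" means in practice, the sketch below fits a power law E(L) = a·L^b to energy measurements via log-log linear regression. This is a minimal illustration only: the function name, the data, and the exponent are assumptions for demonstration, not the paper's measured results or fitting procedure.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x^b by least squares in log-log space.

    Illustrative sketch: the paper's actual fitting method and
    measurements are not reproduced here. Returns (a, b).
    """
    # In log space the power law becomes linear: log y = log a + b * log x
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(b)

# Hypothetical energy readings (joules) at several prompt lengths (tokens),
# generated from an assumed power law purely for demonstration.
tokens = np.array([64, 128, 256, 512, 1024])
energy = 0.5 * tokens ** 1.2

a, b = fit_power_law(tokens, energy)
print(f"E(L) ~ {a:.2f} * L^{b:.2f}")
```

A fitted exponent b near 1 would indicate roughly linear energy growth with prompt length; b > 1 would indicate superlinear growth, which is what makes long-context inference costly on constrained CPUs.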