XSkill: Continual Learning from Experience and Skills in Multimodal Agents
This paper introduces XSkill, a dual-stream framework enabling multimodal agents to continually learn from visually-grounded task-level skills and action-level experiences without explicit retraining. This approach improves agent performance by enhancing tool-use efficiency and flexibility.
cs.AI
On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM Agents
This paper explores the phenomenon of "information self-locking" in reinforcement learning for active reasoning in Large Language Model (LLM) agents. It investigates how LLM agents might get stuck in suboptimal reasoning loops and proposes methods to overcome these limitations for improved active reasoning.
cs.AI
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
This research investigates using reasoning Large Language Models (LLMs) as judges for evaluating other LLMs during post-training in non-verifiable domains, exploring their effectiveness, practical impact, and potential pitfalls in complex, subjective tasks.
cs.AI
A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic
This paper presents a prospective clinical feasibility study of an LLM-based conversational AI (Amy) in a real-world primary care setting. It evaluates Amy's diagnostic accuracy, management plans, and user satisfaction, finding high safety and acceptance, though human providers retained an edge in the practicality and cost-effectiveness of their management plans. The study is a step towards broader clinical translation.
cs.AI
A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control
This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework for Traffic Signal Control, validated in the Vissim traffic simulator. It addresses generalization challenges through adaptive state representation, a novel reward function, and agent communication. The framework shows superior performance in diverse traffic scenarios.
cs.AI
Can RL Improve Generalization of LLM Agents? An Empirical Study
This empirical study investigates whether Reinforcement Learning (RL) can enhance the generalization capabilities of Large Language Model (LLM) agents. The research explores various RL techniques and their impact on LLM agents' performance across diverse and unseen tasks.
cs.AI
OpenClaw-RL: Train Any Agent Simply by Talking
This framework converts real-time "next-state signals" from AI agent interactions into continuous, online learning sources. It recovers both implicit evaluative signals and explicit directive signals, enabling agents to achieve rapid personalization in conversational settings and improve performance across diverse general agent tasks like terminal, GUI, SWE, and tool-calling environments. This allows agents to improve simply by being used, adapting to user re-queries, corrections, and explicit feedback.
cs.AI
Highly Autonomous Cyber-Capable Agents: Anticipating Capabilities, Tactics, and Strategic Implications
This report introduces "Highly Autonomous Cyber-Capable Agents" (HACCAs), AI systems capable of autonomously conducting multi-stage cyber campaigns comparable to top hacking groups. It defines HACCAs, forecasts their emergence, identifies five core operational tactics (e.g., autonomous infrastructure setup, detection evasion), and analyzes strategic implications like intensified interstate cyber competition and proliferation of offensive capabilities. It also flags tail risks such as inadvertent cyber-nuclear escalation and sustained loss of control, proposing policy recommendations.
cs.AI
Few-for-Many Personalized Federated Learning
This paper addresses scalability in Personalized Federated Learning (PFL) under heterogeneous data distributions by reformulating PFL as a "few-for-many" optimization problem: the server maintains a small number K of shared models (with K ≪ M, the number of clients) that collectively serve all clients, rather than one distinct model per client. The proposed algorithm, FedFew, automatically discovers an effective degree of model diversity through efficient gradient-based updates, achieving near-optimal personalization and outperforming state-of-the-art approaches with as few as 3 models on vision, NLP, and medical imaging datasets.
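The "few-for-many" idea can be illustrated with a toy sketch of the objective the summary describes: each of M clients is served by whichever of the K shared models fits it best, so the server minimizes the sum over clients of the best per-client loss. This is a minimal illustration of the objective's presumed form, not FedFew's actual algorithm; the function names, scalar "models", and data below are all illustrative.

```python
def client_loss(w, data):
    # Mean squared error of a scalar "model" w on one client's scalar samples.
    return sum((w - x) ** 2 for x in data) / len(data)

def few_for_many_objective(models, clients):
    # Each client is served by its best-fitting shared model;
    # the objective is the sum of those per-client best losses.
    return sum(min(client_loss(w, data) for w in models) for data in clients)

# Heterogeneous clients: two clustered near 1.0, one near 5.0.
clients = [[0.9, 1.1], [1.0, 1.2], [5.0, 5.2]]
one_model = [2.4]            # a single shared model (classic FL compromise)
two_models = [1.05, 5.1]     # K = 2 shared models covering both clusters

# With K = 2, each client can pick a model close to its own distribution,
# so the few-for-many objective is far lower than with one shared model.
assert few_for_many_objective(two_models, clients) < few_for_many_objective(one_model, clients)
```

With even two well-placed models the objective drops sharply, which is the intuition behind serving M heterogeneous clients with K ≪ M models.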
cs.AI