All Research Papers

Browse through 500 research papers. Search and discover the latest AI research from arXiv.

XSkill: Continual Learning from Experience and Skills in Multimodal Agents

This paper introduces XSkill, a dual-stream framework enabling multimodal agents to continually learn from visually-grounded task-level skills and action-level experiences without explicit retraining....

By: Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. (May) Fung · 2026-03-13

#cs.AI

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

This paper explores the phenomenon of "information self-locking" in reinforcement learning for active reasoning in Large Language Model (LLM) agents. It investigates how LLM agents might get stuck in ...

By: Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng · 2026-03-13

#cs.AI

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

This research investigates using reasoning Large Language Models (LLMs) as judges for evaluating other LLMs during post-training in non-verifiable domains, exploring their effectiveness, practical imp...

By: Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang, Song Jiang, Bo Liu, Arman Cohan, Yuandong Tian, Zhengxing Chen · 2026-03-13

#cs.AI

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

This paper presents a prospective clinical feasibility study of an LLM-based conversational AI (Amy) in a real-world primary care setting. It evaluates Amy's diagnostic capabilities, management plans,...

By: Peter Brod, Enil Pipu · 2026-03-10

#cs.AI

A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework for Traffic Signal Control, validated in the Vissim traffic simulator. It addresses generalization challenges through a...

By: Sheng-You Huang, Hsiao-Chuan Chang, Yen-Chi Chen, Ting-Han Wei, I-Hau Yeh, Sheng-Yao Kuan, Chien-Yao Wang, Hsuan-Han Lee, I-Chen Wu · 2026-03-13

#cs.AI

Can RL Improve Generalization of LLM Agents? An Empirical Study

This empirical study investigates whether Reinforcement Learning (RL) can enhance the generalization capabilities of Large Language Model (LLM) agents. The research explores various RL techniques and ...

By: Zhiheng Xi, Xin Guo, Jiaqi Liu, Jiazheng Zhang, Yutao Fan, Zhihao Zhang, Shichun Liu, Mingxu Chai, Xiaowei Shi, Yitao Zhai, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang · 2026-03-13

#cs.AI

OpenClaw-RL: Train Any Agent Simply by Talking

This framework converts real-time "next-state signals" from AI agent interactions into continuous, online learning sources. It recovers both implicit evaluative signals and explicit directive signals,...

By: Yinjie Wang, Xuyang Chen, Xiaolong Jin, Mengdi Wang, Ling Yang · 2026-03-10

#cs.AI #Reinforcement Learning #Large Language Models

Highly Autonomous Cyber-Capable Agents: Anticipating Capabilities, Tactics, and Strategic Implications

This report introduces "Highly Autonomous Cyber-Capable Agents" (HACCAs), AI systems capable of autonomously conducting multi-stage cyber campaigns comparable to top hacking groups. It defines HACCAs,...

By: Jam Kraprayoon, Shaun Ee, Brianna Rosen, Yohan Matthew, Aditya Singh, Christopher Covino, Asher Brass Gershovich · 2026-03-12

#cs.AI

Few-for-Many Personalized Federated Learning

This paper addresses scalability in Personalized Federated Learning (PFL) for heterogeneous data distributions by reformulating PFL as a "few-for-many" optimization problem. It maintains a small numbe...

By: Ping Guo, Tiantian Zhang, Xi Lin, Xiang Li, Zhi-Ri Tang, Qingfu Zhang · 2026-03-12

#cs.AI

Ψ₀: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

This paper introduces an open foundation model for universal humanoid loco-manipulation. It employs a decoupled learning strategy that first pre-trains on human egocentric videos to acquire generaliza...

By: Songlin Wei, Hongyi Jing, Zihao Wang, Jingwen Zhang, Hongzhuo Ma, Yunpeng Zhai, Wenbo Shi, Bo Peng, Jingsheng Zhou, Pengyuan Sun, Yiran Wu, Wei Zhao, Yang Gao · 2026-03-12

#cs.AI

When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows

This work proposes an architecture that adapts LLM agents to hospital environments with the aim of improving clinical workflows. It addresses reliability, security, and long-term memory limitations by...

By: Wenxian Yang, Hanzheng Qiu, Bangqun Zhang, Chengquan Li, Zhiyong Huang, Xiaobin Feng, Rongshan Yu, Jiahong Dong · 2026-03-12

#cs.AI

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficie...

By: Aili Chen, Chi Zhang, Junteng Liu, Jiangjie Chen, Chengyu Du, Yunji Li, Ming Zhong, Qin Wang, Zhengmao Zhu, Jiayuan Song, Ke Ji, Junxian He, Pengyu Zhao, Yanghua Xiao · 2026-03-10

#cs.AI

TinyVLM: Zero-Shot Object Detection on Microcontrollers via Vision-Language Distillation with Matryoshka Embeddings

TinyVLM enables zero-shot object detection directly on microcontrollers by employing vision-language distillation with Matryoshka embeddings. This extends the capabilities of edge AI, allow...

By: Bibin Wilson · 2026-03-15

#cs.CV

Towards Data-driven Nitrogen Estimation in Wheat Fields using Multispectral Images

This research explores a data-driven approach for estimating nitrogen levels in wheat fields using multispectral images. This has direct real-world application in precision agriculture, enabling optim...

By: Andreas Tritsarolis, Tomaž Bokan, Matej Brumen, Domen Mongus, Yannis Theodoridis · 2026-03-15

#cs.CV

Latent Replay Detection: Memory-Efficient Continual Object Detection on Microcontrollers via Task-Adaptive Compression

This paper introduces Latent Replay Detection, a memory-efficient approach for continual object detection on microcontrollers. It leverages task-adaptive compression to mitigate catastrophic forgettin...

By: Bibin Wilson · 2026-03-15

#cs.CV

OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

OmniStream introduces a unified framework for real-time perception, 3D reconstruction, and action planning in continuous data streams. This approach is crucial for embodied AI and robotics, enabling a...

By: Yibin Yan, Jilan Xu, Shangzhe Di, Haoning Wu, Weidi Xie · 2026-03-13

#cs.CV

EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

This paper proposes EVATok, a novel adaptive length video tokenization method designed for efficient visual autoregressive generation. It aims to improve the efficiency of video generation models by d...

By: Tianwei Xiong, Jun Hao Liew, Zilong Huang, Zhijie Lin, Jiashi Feng, Xihui Liu · 2026-03-13

#cs.CV

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

This paper introduces GRADE, a benchmark for evaluating discipline-informed reasoning in image editing. It provides a structured framework to assess how well AI models understand and apply domain-spec...

By: Mingxin Liu, Ziqian Fan, Zhaokai Wang, Leyao Gu, Zirun Zhu, Yiguo He, Yuchen Yang, Changyao Tian, Xiangyu Zhao, Ning Liao, Shaofeng Zhang, Qibing Ren, Zhihang Zhong, Xuanhe Zhou, Junchi Yan, Xue Yang · 2026-03-13

#cs.CV

Automated Quality Check of Sensor Data Annotations

This paper proposes an automated method for checking the quality of sensor data annotations, a critical component for training reliable machine learning models in autonomous systems. Ensuring high-qua...

By: Niklas Freund, Zekiye Ilknur-Öz, Tobias Klockau, Patrick Naumann, Philipp Neumaier, Martin Köppel · 2026-03-15

#cs.CV

Understanding LoRA as Knowledge Memory: An Empirical Analysis

Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retriev...

By: Seungju Back, Dongwoo Lee, Naun Kang, Hyoungjin Kim, Seonhoon Kim, Woo Suk Choi, Kyoungmin Lee · 2026-03-01

#cs.AI

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustwor...

By: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris, Preslav Nakov, Zhuohan Xie · 2026-03-03

#cs.AI

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Agentic reasoning models, which leverage external tools for multi-step tasks, hold immense promise but also introduce new safety challenges. A critical aspect of their safe deployment is the ability t...

By: Aradhye Agarwal, Gurdit Siyan, Yash Pandya, Joykirat Singh, Akshay Nambi, Ahmed Awadallah · 2026-03-04

#cs.AI

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware Eva...

By: Hongliu Cao, Ilias Driouich, Eoin Thomas · 2026-03-04

#cs.AI

Adaptive Confidence Regularization for Multimodal Failure Detection

The deployment of multimodal models in high-stakes domains, such as self-driving vehicles and medical diagnostics, demands not only strong predictive performance but also reliable mechanisms for detec...

By: Moru Liu, Hao Dong, Olga Fink, Mario Trapp · 2026-03-03

#cs.AI

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large languag...

By: Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan Song, Hongli Yu, Jiaze Chen, Wei-Ying Ma, Ya-Qin Zhang, Jingjing Liu, Mingxuan Wang, Xin Liu, Hao Zhou · 2026-02-27

#cs.AI

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

This research presents SeeThrough3D, a text-to-image generation method that incorporates occlusion-aware 3D control. It allows for more precise and realistic synthesis of images by ...

By: Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu · 2026-02-27

#cs.AI

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

The SWE-MiniSandbox paper presents a novel container-free reinforcement learning environment designed for developing and testing software engineering agents. This sandbox facilitates efficient trainin...

By: Danlong Yuan, Wei Wu, Zhengren Wang, Xueliang Zhao, Huishuai Zhang, Dongyan Zhao · 2026-02-27

#cs.AI

Model Agreement via Anchoring

This paper introduces a novel approach to achieving agreement between different AI models through a technique called anchoring. It explores how anchoring can enhance the robustness and reliability of ...

By: Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell · 2026-02-27

#cs.AI

MHDash: An Online Platform for Benchmarking Mental Health-Aware AI Assistants

This paper introduces MHDash, a new online platform specifically designed for benchmarking mental health-aware AI assistants. It provides standardized metrics and datasets to evaluate the effectivenes...

By: Yihe Zhang, Cheyenne N Mohawk, Kaiying Han, Vijay Srinivas Tida, Manyu Li, Xiali Hei · 2026-02-27

#cs.AI

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

This paper examines the structure and function of "agentic memory" in AI systems, proposing a comprehensive taxonomy and an empirical analysis of its evaluation and inherent limitations. Understand...

By: Dongming Jiang, Yi Li, Songtao Wei, Jinxin Yang, Ayushi Kishore, Alysa Zhao, Dingyi Kang, Xu Hu, Feng Chen, Qiannan Li, Bingzhe Li · 2026-02-27

#cs.AI

CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

This research introduces CXReasonAgent, an AI diagnostic reasoning agent designed for analyzing chest X-rays. It uses evidence-grounded reasoning to provide accurate and expla...

By: Hyungyung Lee, Hangyul Yoon, Edward Choi · 2026-02-27

#cs.AI

Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training

This study uncovers the phenomenon of "prompt interference" during LLM post-training, explaining why optimizing for Pass@k metrics (where k>1) can inadvertently lead to a degradation in Pass@1 perform...

By: Emily Chen, David Lee, Sarah Johnson, Michael Brown, Anna Garcia, Daniel Wilson, Olivia Taylor · 2026-02-25

#cs.AI

ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices

ProactiveMobile introduces a new, comprehensive benchmark for evaluating and advancing proactive intelligence on mobile devices. It focuses on scenarios where AI anticipates user needs and provides ti...

By: Dezhi Kong, Zhengzhao Feng, Qiliang Liang, Hao Wang, Haofei Sun, Changpeng Yang, Yang Li, Peng Zhou, Shuai Nie, Hongzhen Wang, Linfeng Zhou · 2026-02-26

#cs.AI

Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning

This paper introduces a novel method for analyzing and relaxing Petri Nets to explain infeasible task plans and to facilitate robust sequential task planning in complex robotic and automated systems. ...

By: Nguyen Cong Nhat Le, John G. Rogers, Claire N. Bonial, Neil T. Dantam · 2026-02-26

#cs.AI

Semantic Partial Grounding via LLMs

This research introduces a novel approach to semantic partial grounding, leveraging the capabilities of large language models to interpret and act upon incomplete or ambiguous instructions in dynamic ...

By: Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo · 2026-02-26

#cs.AI

Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts

This research investigates how large language models perceive and exhibit biases when evaluating the capabilities and trustworthiness of algorithmic agents versus human experts. Findings reveal incons...

By: Jessica Y. Bo, Lillio Mok, Ashton Anderson · 2026-02-26

#cs.AI

On Data Engineering for Scaling LLM Terminal Capabilities

This paper explores advanced data engineering strategies crucial for scaling large language models (LLMs) to enhance their "terminal capabilities," i.e., their ability to execute complex commands and ...

By: Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping · 2026-02-25

#cs.AI

2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support

We propose the 2-Step Agent framework, designed to optimize the interaction between human decision-makers and AI decision support systems. This framework structures the decision process into two disti...

By: Otto Nyberg, Fausto Carcassi, Giovanni Cinà · 2026-02-26

#cs.AI

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

This research explores using Proximal Policy Optimization (PPO) to learn optimal tuning parameters for the Pure Pursuit algorithm in autonomous racing. By jointly controlling lookahead distance and st...

By: Mohamed Elgouhary, Amr S. El-Wakeel · 2026-02-23

#cs.AI #Dynamic Pricing #Interpretability

Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

This paper presents a method for efficient online coordination in multi-agent systems using diffusion policies. It focuses on enabling agents to collaborate effectively in dynamic environments, which ...

By: Zhuoran Li, Hai Zhong, Xun Wang, Qingxin Xia, Lihua Zhang, Longbo Huang · 2026-02-23

#cs.AI

Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics

This paper describes an interactive and interpretable AI copilot designed to augment clinical decision-making, specifically evaluated with clinicians in nephrology and obstetrics through a real-world ...

By: Yinghao Zhu, Dehao Sui, Zixiang Wang, Xuning Hu, Lei Gu, Yifan Qi, Tianchen Wu, Ling Wang, Yuan Wei, Wen Tang, Zhihan Cui, Yasha Wang, Lequan Yu, Ewen M Harrison, Junyi Gao, Liantao Ma · 2026-02-23

#cs.AI

Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets

This research introduces a novel approach to offline reinforcement learning that allows robots to learn from heterogeneous datasets across different embodiments. This innovation is crucial for real-wo...

By: Haruki Abe, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada · 2026-02-23

#cs.AI

In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution

This paper addresses the critical issue of undesirable emergent behaviors in large language models (LLMs) deployed in real-world production environments. It proposes a data attribution method to ident...

By: Frank Xiao, Santiago Aranguri · 2026-02-23

#cs.AI

Chatting with Images for Introspective Visual Thinking

This research explores a novel approach where AI systems can engage in "introspective visual thinking" by "chatting with images." This enables a deeper understanding and interpretation of visual data ...

By: Junfei Wu, Jian Guan, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan · 2026-02-23

#cs.AI

Conversational Behavior Modeling Foundation Model With Multi-Level Perception

This paper introduces a foundational model for conversational behavior modeling, incorporating multi-level perception to understand and generate more nuanced and contextually appropriate dialogue. It ...

By: Dingkun Zhou, Shuchang Pan, Jiachen Lian, Siddharth Banerjee, Sarika Pasumarthy, Dhruv Hebbar, Siddhant Patel, Zeyi Austin Li, Kan Jen Cheng, Sanay Bordia, Krish Patel, Akshaj Gupta, Tingle Li, Gopala Anumanchipalli · 2026-02-23

#cs.AI

Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power Systems

This paper proposes a unified model, Power Interpretable Causal ODE Networks, for explainable anomaly detection and root cause analysis specifically in power systems. This research has critical real-w...

By: Yue Sun, Likai Wang, Rick S. Blum, Parv Venkitasubramaniam · 2026-02-17

#cs.AI

Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback

This paper introduces Self-EvolveRec, a recommender-system approach that can self-evolve using Large Language Model (LLM)-based directional feedback. This approach has potential real-wor...

By: Sein Kim, Sangwu Park, Hongseok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, Chanyoung Park · 2026-02-17

#cs.AI

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

This research presents MolHIT, an approach using hierarchical discrete diffusion models for molecular-graph generation. It has potential applications in drug d...

By: Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong · 2026-02-20

#cs.AI

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

This paper introduces AutoNumerics, an autonomous multi-agent pipeline designed for scientific computing. Because it is PDE-agnostic, it is intended to apply broadly across various scientific and engineeri...

By: Jianda Du, Youran Sun, Haizhao Yang · 2026-02-20

#cs.AI

AIdentifyAGE Ontology for Decision Support in Forensic Dental Age Assessment

This paper introduces the AIdentifyAGE ontology, a domain-specific, standardized, and semantically coherent framework designed for forensic dental age assessment. It integrates both manual and AI-assi...

By: Renato Marcelo, Ana Rodrigues, Cristiana Palmela Pereira, António Figueiras, Rui Santos, José Rui Figueira, Alexandre P Francisco, Cátia Vaz · 2026-02-20

#cs.AI

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

This paper introduces "AI Gamestore," a platform designed for the scalable and open-ended evaluation of machine general intelligence through human games. This approach provides a robust framework for ...

By: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths, Katherine M. Collins, José Hernández-Orallo, Phillip Isola, Samuel J. Gershman, Joshua B. Tenenbaum · 2026-02-20

#cs.AI

CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts

This paper is part of the CLEF HIPE-2026 evaluation lab, focusing on the challenging task of accurately and efficiently extracting person-place relationships from diverse multilingual historical texts...

By: Juri Opitz, Corina Raclé, Emanuela Boros, Andrianos Michail, Matteo Romanello, Maud Ehrmann, Simon Clematide · 2026-02-20

#cs.AI

Towards Efficient Constraint Handling in Neural Solvers for Routing Problems

This paper introduces Construct-and-Refine (CaR), a novel, general, and efficient constraint-handling framework for neural routing solvers. While neural solvers excel in computational efficiency for s...

By: Jieyi Bi, Zhiguang Cao, Jianan Zhou, Wen Song, Yaoxin Wu, Jie Zhang, Yining Ma, Cathy Wu · 2026-02-19

#cs.AI

Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes

This paper presents a comparative benchmarking of FastAPI and Triton Inference Server on Kubernetes for scalable and secure AI inference in healthcare. The research addresses critical deployment chall...

By: Ratul Ali · 2026-02-01

#cs.AI

Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization

This research uncovers a scaling gap in large language models where increasing context length can lead to decreased focus and potential privacy vulnerabilities in personalized applications. The paper ...

By: Shangding Gu · 2026-02-17

#cs.AI

Federated Learning for Privacy-Preserving AI in Edge Devices

This paper explores the application of federated learning techniques to enable privacy-preserving artificial intelligence on edge devices. It addresses challenges related to data security and efficien...

By: Elena Petrova, Mykhailo Kovalenko, Sergii Denysenko · 2026-02-17

#cs.AI

Explainable AI for Financial Risk Assessment and Decision Making

This research focuses on developing explainable artificial intelligence (XAI) models for enhanced financial risk assessment and decision-making. It aims to provide transparency and interpretability in...

By: Maksym Bondarenko, Oksana Popova, Viktor Melnyk, Andrii Hryhoruk · 2026-02-16

#cs.AI

Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

This research investigates methods to improve the preservation of building semantics during the training of AI models, utilizing large language model encodings. It aims to create more intelligent and ...

By: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee · 2026-02-18

#cs.AI

Developing AI Agents with Simulated Data: Why, what, and how?

This paper explores the methodology, rationale, and practical steps involved in developing AI agents using simulated data. It delves into the benefits and challenges of synthetic environments for trai...

By: Xiaoran Liu, Istvan David · 2026-02-18

#cs.AI

GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent Systems

This paper introduces GlobeDiff, a novel state diffusion process designed to address partial observability challenges in multi-agent systems. It offers a robust framework for agents to infer global st...

By: Yiqin Yang, Xu Yang, Yuhua Jiang, Ni Mu, Hao Hu, Runpeng Xie, Ziyou Zhang, Siyuan Li, Yuan-Hua Ni, Qianchuan Zhao, Bo Xu · 2026-02-18

#cs.AI

Recursive Concept Evolution for Compositional Reasoning in Large Language Models

This paper proposes a method for recursive concept evolution to enhance compositional reasoning capabilities in large language models. The authors position it as a step toward developing more intelligen...

By: Sarim Chaudhry · 2026-02-18

#cs.AI #LLM #Reasoning

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench presents the first benchmark designed to systematically evaluate the effectiveness of 'Agent Skills,' which are structured procedural knowledge packages intended to augment large language ...

By: Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, Xuanqing Liu, Haoran Lyu, Ze Ma, Bowei Wang, Runhui Wang, Tianyu Wang, Wengao Ye, Yue Zhang · 2026-02-13

#cs.AI

Intelligent AI Delegation

This paper introduces a comprehensive framework for intelligent AI delegation, drawing on human organizational theory, advanced AI protocols, and cryptography. The approach aims to establish a robust ...

By: Nenad Tomašev, Matija Franklin, Simon Osindero · 2026-02-12

#cs.AI

GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 is a next-generation foundation model designed to transition from human-guided 'vibe coding' to autonomous 'agentic engineering' in AI. It achieves state-of-the-art results on agentic, reasoning...

By: Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chengxing Xie, Cunxiang Wang, Gengzheng Pan, Hao Zeng, Haoke Zhang, Haoran Wang, Huilong Chen, Jiajie Zhang, Jian Jiao, Jiaqi Guo, Jingsen Wang, Jingzhao Du · 2026-02-17

#cs.AI

ReusStdFlow: A Standardized Reusability Framework for Dynamic Workflow Construction in Agentic AI

ReusStdFlow is a framework designed to tackle the "reusability dilemma" and structural hallucinations in enterprise Agentic AI. It proposes an "Extraction-Storage-Construction" paradigm that deconstru...

By: Gaoyang Zhang, Shanghong Zou, Yafang Wang, He Zhang, Ruohua Xu, Feng Zhao · 2026-02-16

#cs.AI

Position: Introspective Experience from Conversational Environments as a Path to Better Learning

This position paper argues that robust AI reasoning emerges from linguistic self-reflection, internalized from high-quality social interaction, rather than simply from scale. Drawing on Vygotskian dev...

By: Claudiu Cristian Musat, Jackson Tolins, Diego Antognini, Jingling Li, Martin Klissarov, Tom Duerig · 2026-02-16

#cs.AI

MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design

This paper introduces MAC-AMP, a closed-loop multi-agent collaboration (MAC) system for multi-objective antimicrobial peptide (AMP) design, addressing the global health threat of antimicrobial resista...

By: Gen Zhou, Sugitha Janarthanan, Lianghong Chen, Pingzhao Hu · 2026-02-16

#cs.AI

Hunt Globally: Deep Research AI Agents for Drug Asset Scouting in Investing, Business Development, and Search & Evaluation

Bio-pharmaceutical innovation has shifted, with most new drug assets originating outside the U.S. and disclosed through non-English channels. This creates multi-billion-dollar risks for investors and ...

By: Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev · 2026-02-16

#cs.AI

Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique

Commercial insurance underwriting is a labor-intensive process where AI can offer efficiency, but existing solutions lack comprehensive reasoning and reliability for regulated, high-stakes environment...

By: Joyjit Roy, Samaresh Kumar Singh · 2026-02-13

#cs.AI

WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning

Deep Research systems leveraging web agents face challenges in search efficiency due to long tool-call trajectories, cyclic reasoning, and unproductive explorations. WebClipper is a novel framework th...

By: Junjie Wang, Zequn Xie, Dan Yang, Jie Feng, Yue Shen, Duolin Sun, Meixiu Long, Yihan Jiao, Zhehao Tan, Jian Wang, Peng Wei, Jinjie Gu · 2026-02-13

#cs.AI

Position: Agentic Evolution is the Path to Evolving LLMs

As Large Language Models (LLMs) move from controlled training environments to open-ended real-world applications, a critical limitation arises: static training cannot keep pace with continuous environ...

By: Minhua Lin, Hanqing Lu, Zhan Shi, Bing He, Rui Mao, Zhiwei Zhang, Zongyu Wu, Xianfeng Tang, Hui Liu, Zhenwei Dai, Xiang Zhang, Suhang Wang, Benoit Dumoulin, Jian Pei · 2026-01-30

#cs.AI

Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment

General-purpose robots rely on the ability to understand and execute natural language instructions, but Vision-Language-Action (VLA) models often misalign actions with instructions. Th...

By: Jacky Kwok, Xilun Zhang, Mengdi Xu, Yuejiang Liu, Azalia Mirhoseini, Chelsea Finn, Marco Pavone · 2026-02-13

#cs.AI

BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Multimodal Large Language Models (MLLMs) are evolving into autonomous agents capable of multimodal web browsing and deep searching. Existing benchmarks fall short in task complexity, evidence accessib...

By: Huanyao Zhang, Jiepeng Zhou, Bo Li, Bowen Zhou, Yanzhe Dan, Haishan Lu, Zhiyong Cao, Jiaoyang Chen, Yuqian Han, Zinan Sheng, Zhengwei Tao, Hao Liang, Jialong Wu, Yang Shi, Yuanpeng He, Jiaye Lin, Qintong Zhang, Guochen Yan, Runhao Zhao, Zhengpin Li, Xiaohan Yu, Lang Mei, Chong Chen, Wentao Zhang, Bin Cui · 2026-02-16

#cs.AI

AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

Retrieval Augmented Generation (RAG) is crucial for Large Language Models (LLMs) in processing long documents, but current retrieval models are inadequate for this task due to challenges like context-...

By: David Jiahao Fu, Lam Thanh Do, Jiayu Li, Kevin Chen-Chuan Chang · 2026-02-13

#cs.AI

Consistency of Large Reasoning Models Under Multi-Turn Attacks

Large reasoning models demonstrate state-of-the-art performance on complex tasks, but their robustness against multi-turn adversarial attacks is underexplored. This paper evaluates nine frontier reaso...

By: Yubo Li, Ramayya Krishnan, Rema Padman · 2026-02-13

#cs.AI

From Gameplay Traces to Game Mechanics: Causal Induction with Large Language Models

This research explores the use of Large Language Models (LLMs) for causal induction to reverse-engineer game mechanics from gameplay traces. By analyzing player behavior, the system can infer the unde...

By: Mohit Jiwatode, Alexander Dockhorn, Bodo Rosenhahn · 2026-02-16

#cs.AI

GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory

Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such...

By: Pepijn Cobben, Xuanqiang Angelo Huang, Thao Amelia Pham, Isabel Dahlgren, Terry Jingchen Zhang, Zhijing Jin · 2026-02-16

#cs.AI

Autonomous Data Processing using Meta-Agents

This work introduces a novel framework for autonomous data processing leveraging meta-agents. These meta-agents are designed to intelligently manage and execute data-related tasks without constant hum...

By: Udayan Khurana · 2026-02-16

#cs.AI

Agentic Test-Time Scaling for WebAgents

This work presents CATTS, a simple technique for dynamically allocating compute for multi-step agents, especially web agents. It empirically studies inference-time scaling for web agents, addressing t...

By: Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai, Michael W. Mahoney, Kurt Keutzer, Amir Gholami · 2026-02-13

#cs.AI

Think like a Scientist: Physics-guided LLM Agent for Equation Discovery

Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. We introduce KeplerAgent, an agentic framework that explicitly follows the scientific reasoning...

By: Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu · 2026-02-13

#cs.AI #Symbolic Regression #LLM Agents

Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning

Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions while providing reliability guarantees. We present STREAM-RL,...

By: Joydeep Chandra, Satyam Kumar Navneet, Aleksandr Algazinov, Yong Zhang · 2026-02-04

#cs.AI

SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation

This paper conducts an in-depth anatomical study of the SAM3 text encoder, a critical component for vision-language segmentation models, focusing on identifying architectural bottlenecks and proposing...

By: Chengxi Zeng, Yuxuan Jiang, Ge Gao, Shuai Wang, Duolikun Danier, Bin Zhu, Stevan Rudinac, David Bull, Fan Zhang · 2026-02-13

#cs.AI

Discovering Differences in Strategic Behavior Between Humans and LLMs

As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. While behavior...

By: Caroline Wang, Daniel Kasenberg, Kim Stachenfeld, Pablo Samuel Castro · 2026-02-10

#cs.AI #Game Theory #LLM

AppleVLM: End-to-end Autonomous Driving with Advanced Perception and Planning-Enhanced Vision-Language Models

End-to-end autonomous driving has emerged as a promising paradigm. We propose AppleVLM, an advanced perception and planning-enhanced VLM for robust end-to-end driving. AppleVLM introduces a nove...

By: Yuxuan Han, Kunyuan Wu, Qianyi Shao, Renxiang Xiao, Zilu Wang, Cansen Jiang, Yi Xiao, Liang Hu, Yunjiang Lou · 2026-02-04

#cs.AI

Collective Behavior of AI Agents: the Case of Moltbook

We present a large scale data analysis of Moltbook, a Reddit-style social media platform exclusively populated by AI agents. Analyzing over 369,000 posts and 3.0 million comments from approximately 46...

By: Yukun Jiang, Yage Zhang, Xinyue Shen, Michael Backes, Yang Zhang · 2026-02-09

#cs.AI #Multi-Agent Systems #Social Simulation

CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

Checklist-based rewards offer a structured way to guide reinforcement learning agents through complex, multi-step tasks requiring tool use and multi-turn interactions. This paper introduces CM2, a nov...

By: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang · 2026-02-13

#cs.AI

"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most

This paper explores critical limitations in current speech models, revealing how they often fail to capture the most salient or semantically important information in spoken language. Through extensive...

By: Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi, James Zou · 2026-02-13

#cs.AI #Speech Recognition #ASR

ViT-5: Vision Transformers for The Mid-2020s

This work systematically investigates modernizing Vision Transformer backbones by leveraging architectural advancements from the past five years. While preserving the canonical Attention-FFN structure...

By: Feng Wang, Sucheng Ren, Tiezheng Zhang, Predrag Neskovic, Anand Bhattad, Cihang Xie, Alan Yuille · 2026-02-08

#cs.AI

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Large Language Model (LLM) agents struggle to learn from past experiences, with existing memory methods often storing redundant trajectories and failing to extract high-level patterns. SkillRL address...

By: Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, Huaxiu Yao · 2026-02-09

#cs.AI

Olaf-World: Orienting Latent Actions for Video World Modeling

Scaling action-controllable world models is hindered by the scarcity of action labels. While latent action learning aims to extract control interfaces from unlabeled video, learned latents often fail ...

By: Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, Mike Zheng Shou · 2026-02-10

#cs.AI

Fake-HR1: Rethinking reasoning of vision language model for synthetic image detection

Recent studies show Chain-of-Thought (CoT) reasoning can improve synthetic image detection, but lengthy reasoning incurs substantial resource overhead. Fake-HR1 proposes a large-scale hybrid-reasoning...

By: Changjiang Jiang, Xinkuan Sha, Fengchang Yu, Jingjing Liu, Jian Liu, Mingqi Fang, Chenfeng Zhang, Wei Lu · 2026-02-10

#cs.AI

Generative Modeling via Drifting

Drifting Models propose a new generative modeling paradigm that shifts iterative distribution matching to training time, enabling high-quality sample generation in a single forward pass. This addresse...

By: Mingyang Deng, He Li, Tianhong Li, Yilun Du, Kaiming He · 2026-02-05

#cs.AI

BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation

Equipping embodied agents with the ability to reason about tasks, foresee physical outcomes, and generate precise actions is essential for general-purpose manipulation. BagelVLA is a unified model tha...

By: Yucheng Hu, Jianke Zhang, Yuanfei Luo, Yanjiang Guo, Xiaoyu Chen, Xinshu Sun, Kun Feng, Qingzhou Lu, Sheng Chen, Yangang Zhang, Wei Li, Jianyu Chen · 2026-02-10

#cs.AI

A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation

This paper presents Cadmus, a system designed for research on program synthesis with small models, avoiding the complexities and high computational demands of large language models (LLMs). Cadmus incl...

By: Russ Webb, Jason Ramapuram · 2026-02-09

#cs.AI

The Use of AI Tools to Develop and Validate Q-Matrices

This paper investigates the application of artificial intelligence tools for the development and validation of Q-matrices, which are essential components in psychometrics and educational assessment fo...

By: Kevin Fan, Jacquelyn A. Bialo, Hongli Li · 2026-02-10

#cs.AI

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

This paper proposes a unified training-serving system that integrates reinforcement learning (RL) with adaptive speculative training. The approach aims to optimize the deployment and continuous learni...

By: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu, Xiaoxia Wu · 2026-02-09

#cs.AI

Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures

This research proposes a novel root cause analysis method that leverages large language models (LLMs) enhanced with residual connection structures. The approach aims to improve the accuracy and effici...

By: Liming Zhou, Ailing Liu, Hongwei Liu, Min He, Heng Zhang · 2026-02-10

#cs.AI

Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach

This paper presents a robust and real-time system for Bangladeshi currency recognition utilizing a dual-stream MobileNet and EfficientNet approach. This innovative solution offers enhanced accuracy an...

By: Subreena, Mohammad Amzad Hossain, Mirza Raquib, Saydul Akbar Murad, Farida Siddiqi Prity, Muhammad Hanif, Nick Rahimi · 2026-02-09

#cs.AI

Strategizing at Speed: A Learned Model Predictive Game for Multi-Agent Drone Racing

This research introduces a learned model predictive game framework for multi-agent drone racing, enabling drones to develop complex strategies and execute them at high speeds. This advancement holds s...

By: Andrei-Carlo Papuc, Lasse Peters, Sihao Sun, Laura Ferranti, Javier Alonso-Mora · 2026-02-09

#cs.AI

Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

This research presents a method for designing finite-state controllers for Partially Observable Markov Decision Processes (POMDPs) by employing deep reinforcement learning. This approach is critical f...

By: David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka · 2026-02-10

#cs.AI

Exploring SAIG Methods for an Objective Evaluation of XAI

This paper explores SAIG methods, which provide a framework for the objective evaluation of Explainable Artificial Intelligence (XAI). The research aims to establish rigorous and quantifiable metrics for a...

By: Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Anna Arias-Duart · 2026-02-10

#cs.AI

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

DreamDojo introduces a generalist robot world model learned from large-scale human videos, enabling efficient reinforcement learning of robotic policies. This framework co-evolves a video world model ...

By: Shenyuan Gao, William Liang, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo, Chen-Hsuan Lin, Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K.R. Zentner, George Kurian · 2026-02-09

#cs.AI

Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning

This paper introduces 'Jackpot,' a framework designed to improve the efficiency of reinforcement learning (RL) for large language models (LLMs) by reducing the distribution mismatch between the rollou...

By: Zhuoming Chen, Hongyi Liu, Yang Zhou, Haizhong Zheng, Beidi Chen · 2026-02-09

#cs.AI

From Features to Actions: Explainability in Traditional and Agentic AI Systems

This research delves into the critical area of explainable AI (XAI), comparing and contrasting explainability in traditional AI models with that in more complex agentic systems. Enhancing XAI is vital...

By: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza · 2026-02-09

#cs.AI

Agentic Uncertainty Reveals Agentic Overconfidence

This paper explores the phenomenon of overconfidence in AI agents, using agentic uncertainty as a mechanism to identify and potentially mitigate it. Understanding and addressing overconfidence is cruc...

By: Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner · 2026-02-09

#cs.AI

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

This paper introduces AIRS-Bench, a comprehensive benchmark suite designed to evaluate the capabilities of frontier AI research science agents across various tasks. It provides a standardized framewor...

By: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach · 2026-02-09

#cs.AI

Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods

This paper focuses on improving speech emotion recognition by utilizing representations from OpenAI's Whisper model combined with attentive pooling. This advancement has significant real-world applica...

By: Ali Shendabadi, Parnia Izadirad, Mostafa Salehi, Mahmoud Bijankhan · 2026-02-06

#cs.AI

SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models

SokoBench is introduced as a benchmark for evaluating the long-horizon planning and reasoning capabilities of large language models. This is critical for developing more capable and reliable LLMs for ...

By: Sebastiano Monti, Carlo Nicolini, Gianni Pellegrini, Jacopo Staiano, Bruno Lepri · 2026-02-03

#cs.AI

MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents

This paper investigates the use of multimodal large language models (MLLMs) as active memory controllers for embodied agents. This approach could significantly enhance the autonomy and adaptability of...

By: Vishnu Sashank Dorbala, Dinesh Manocha · 2026-02-03

#cs.AI

Learning Event-Based Shooter Models from Virtual Reality Experiments

This research explores learning models of shooter behavior based on events observed in virtual reality experiments. This has potential applications in training simulations, developing more realistic A...

By: Christopher A. McClurg, Alan R. Wagner · 2026-02-06

#cs.AI #Event Vision #Virtual Reality

DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching

This paper presents DyTopo, a dynamic topology routing method for multi-agent reasoning. It uses semantic matching to enable flexible and efficient communication between agents, which is crucial for c...

By: Yuxing Lu, Yucheng Hu, Xukai Zhao, Jiuxin Cao · 2026-02-06

#cs.AI

Geographically-aware Transformer-based Traffic Forecasting for Urban Motorway Digital Twins

This research introduces a geographically-aware transformer-based model for traffic forecasting, specifically designed for urban motorway digital twins. This technology offers crucial support for smar...

By: Krešimir Kušić, Vinny Cahill, Ivana Dusparic · 2026-02-06

#cs.AI

Artificial Intelligence as Strange Intelligence: Against Linear Models of Intelligence

This paper critiques the linear model of AI progress, introducing "familiar intelligence" and "strange intelligence". It argues that AI intelligence is likely to be strange, combining superhuman capac...

By: Kendra Chilson, Eric Schwitzgebel · 2026-02-06

#cs.AI

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

AgenticPay proposes a multi-agent large language model (LLM) negotiation system designed for buyer-seller transactions. This system could revolutionize e-commerce by automating and optimizing negotiat...

By: Xianyang Liu, Shangding Gu, Dawn Song · 2026-02-06

#cs.AI #Multi-Agent Systems #LLM Negotiation

T-LLM: Teaching Large Language Models to Forecast Time Series via Temporal Distillation

This paper proposes T-LLM, a temporal distillation framework that enables general-purpose Large Language Models (LLMs) to perform time series forecasting. By transferring predictive behavior from a li...

By: Suhan Guo, Furao Shen, Yiwen Luo, Yunfeng Liu · 2026-02-02

#cs.AI #Time Series #LLM

SWE-Universe: Scale Real-World Verifiable Environments to Millions

SWE-Universe is a framework developed to automatically construct over 800,000 real-world, multilingual, verifiable software engineering environments from GitHub PRs. This massive dataset significantly...

By: Mouxiang Chen, Lei Zhang, Yunlong Feng, Yiheng Li, Xiao Sun, Jindong Wang, Fei Xia, Changtai Li, Rui Zhang, Wenbo Li, Tianyu Pang, Xianfeng Wen, Dongchen Jiang, Ziyue Li, Zhigang Zeng · 2026-02-02

#cs.AI

HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions

This paper explores a human-AI hybrid solution, HybridQuestion, that integrates the scalable data processing capabilities of AI with human expert judgment to identify meaningful research questions. Th...

By: Keyu Zhao, Fengli Xu, Yong Li, Tie-Yan Liu · 2026-02-05

#cs.AI

Knowledge Model Prompting Increases LLM Performance on Planning Tasks

Large Language Models (LLMs) often struggle with reasoning and planning tasks. This paper introduces the Task-Method-Knowledge (TMK) framework, a prompting technique that significantly improves LLM re...

By: Erik Goh, John Kos, Ashok Goel · 2026-02-03

#cs.AI

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

This paper introduces Group-Evolving Agents (GEA), a new paradigm for open-ended self-improvement where a group of agents acts as the fundamental evolutionary unit, enabling explicit experience sharin...

By: Zhaotian Weng, Antonis Antoniades, Deepak Nathani, Zhen Zhang, Xiao Pu, Xin Eric Wang · 2026-02-04

#cs.AI

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

This paper presents case studies demonstrating how Google's Gemini-based AI models can effectively collaborate with researchers in novel, expert-level mathematical and algorithmic discovery. It showca...

By: David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Vahab Mirrokni · 2026-02-03

#cs.AI #Gemini #Scientific Research

Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention

Large language models (LLMs) have revolutionized natural language processing, yet they remain constrained by fixed, non-differentiable tokenizers like Byte Pair Encoding (BPE), which hinder end-to-end...

By: Alon Rozental · 2026-01-29

#cs.AI

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how ext...

By: Alexander Hägele, Aryo Pradipta Gema, Henry Sleight, Ethan Perez, Jascha Sohl-Dickstein · 2026-01-30

#cs.AI

ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control

Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordina...

By: Zhentao Tang, Yuqi Cui, Shixiong Kai, Wenqian Zhao, Ke Ye, Xing Li, Anxin Tian, Zehua Pei, Hui-Ling Zhen, Shoubo Hu, Xiaoguang Li, Yunhe Wang, Mingxuan Yuan · 2026-02-03

#cs.AI

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

We propose RLAnything, a reinforcement learning framework that dynamically forges environment, policy, and reward models through closed-loop optimization, amplifying learning signals and strengthening...

By: Yinjie Wang, Tianbao Xie, Ke Shen, Mengdi Wang, Ling Yang · 2026-02-02

#cs.AI

HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos

Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic interaction...

By: Wei-Yuan Huang, Ruohan Zhang, Hao-Tien Chiang, C. Karen Liu, Sergey Levine, Jiajun Wu · 2026-02-02

#cs.AI

From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents

Embodied agents operating in complex, dynamic environments often struggle with uncertainty in their perceptions and actions. This paper proposes a novel framework that bridges the gap between large la...

By: SeungWon Seo, SooBin Lim, SeongRae Noh, Haneul Kim, HyeongYeop Kang · 2026-02-01

#cs.AI

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Large language model (LLM) agents, while powerful, often suffer from inefficiencies due to processing irrelevant information and generating verbose thoughts. Agent-Omit introduces a novel training par...

By: Yansong Ning, Jun Fang, Naiqiang Tan, Hao Liu · 2026-02-01

#cs.AI

ATLAS: Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters

Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving, but many either keep the solver frozen after fine-tuning or become inefficient due to the increasing s...

By: Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin · 2026-02-04

#cs.AI

Reward-free Alignment for Conflicting Objectives

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where ...

By: Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin · 2026-02-03

#cs.AI

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

PixelGen is a pixel-space diffusion framework that uses perceptual supervision through LPIPS and DINO-based losses to generate high-quality images without requiring VAEs or latent representations. Pix...

By: Zehong Ma, Ruihan Xu, Shiliang Zhang · 2026-02-03

#cs.AI

AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both resear...

By: Minjun Zhu, Zhen Lin, Yixuan Weng, Panzhong Lu, Qiujie Xie, Yifan Wei, Sifan Liu, Qiyao Sun, Yue Zhang · 2026-02-04

#cs.AI

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

This paper introduces AOrchestra, a novel framework designed to automate the creation and management of sub-agents within complex large language model (LLM)-based multi-agent systems. It aims to strea...

By: Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Bang Liu, Chenglin Wu, Yuyu Luo, Jiayi Zhang · 2026-02-04

#cs.AI

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve...

By: Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick · 2026-02-04

#cs.AI

When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis

This study introduces an automated personnel selection system that combines large language models (LLMs) with Fuzzy-TOPSIS to enhance the hiring process. The system uses advanced natural language proc...

By: Shahria Hoque, Ahmed Akib Jawad Karim, Md. Golam Rabiul Alam, Nirjhar Gope · 2026-01-29

#cs.AI #LLM #Fuzzy-TOPSIS

Reinforcement Learning via Self-Distillation

This research introduces Self-Distillation Policy Optimization (SDPO), an on-policy reinforcement learning algorithm designed to significantly improve Large Language Model (LLM) performance by effecti...

By: Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause · 2026-01-28

#cs.AI

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

This paper introduces On-Policy Self-Distillation (OPSD), a novel framework enabling a single Large Language Model (LLM) to act as both teacher and student to significantly enhance its mathematical re...

By: Siyan Zhao, Zhihui Xie, Mengchen Liu, Xiangchen Song, Haoyang Li, Yuhui Li, Yizhou Wang · 2026-01-26

#cs.AI

One-step Latent-free Image Generation with Pixel Mean Flows

This paper introduces "pixel MeanFlow" (pMF), an innovative generative model enabling one-step, latent-free image generation. Diverging from conventional diffusion/flow-based models that employ multi-...

By: Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, Kaiming He · 2026-01-29

#cs.AI

JAF: Judge Agent Forest

This paper introduces JAF: Judge Agent Forest, a novel framework designed to enhance the self-refinement and evaluation processes of agentic AI systems. Instead of assessing responses in isolation, th...

By: Sahil Garg, Brad Cheezum, Sridhar Dutta, Vishal Agarwal · 2026-01-29

#cs.AI

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

DynamicVLA is a novel framework for dynamic object manipulation, addressing the challenges faced by Vision-Language-Action (VLA) models in scenarios requiring rapid perception and continuous control o...

By: Haozhe Xie, Beichen Wen, Jiarui Zheng, Zhaoxi Chen, Fangzhou Hong, Haiwen Diao, Ziwei Liu · 2026-01-29

#cs.AI

DeepSeek-OCR 2: Visual Causal Flow

DeepSeek-OCR 2 introduces DeepEncoder V2, a cutting-edge vision-language model that significantly advances optical character recognition (OCR) capabilities. This model features a novel 'visual causal ...

By: Haoran Wei, Yaofeng Sun, Yukun Li · 2026-01-28

#cs.AI

Who's in Charge? Disempowerment Patterns in Real-World LLM Usage

This paper provides the first large-scale empirical analysis of disempowerment patterns in real-world AI assistant interactions, examining 1.5 million consumer Claude.ai conversations. The focus is on...

By: Mrinank Sharma, Miles McCain, Raymond Douglas, David Duvenaud · 2026-01-27

#cs.AI

LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech

Traditional forced alignment (FA) methods often suffer from language-specificity and cumulative temporal shifts. This paper introduces LLM-ForcedAligner, a novel approach that reformulates FA as a slo...

By: Bingshen Mu, Xian Shi, Xiong Wang, Hexin Liu, Jin Xu, Lei Xie · 2026-01-26

#cs.AI

How AI Impacts Skill Formation

AI assistance provides significant productivity gains, especially for novice workers. However, this study investigates how such assistance affects skill development. Through randomized experiments, it...

By: Judy Hanwen Shen, Alex Tamkin · 2026-01-28

#cs.AI #Generative AI #Skill Acquisition

The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR

This paper introduces a novel world model training paradigm specifically designed for longitudinal Electronic Health Records (EHR). It addresses the challenges of integrating and interpreting continuo...

By: Irsyad Adam, Zekai Chen, David Laprade, Shaun Porwal, David Laub, Erik Reinertsen, Arda Pekis, Kevin Brown · 2026-01-30

#cs.AI

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems

This research proposes "World of Workflows," a benchmark designed to facilitate the integration of advanced AI world models into enterprise systems. It aims to evaluate and accelerate the application ...

By: Lakshya Gupta, Litao Li, Yizhe Liu, Sriram Ganapathi Subramanian, Kaheer Suleman, Zichen Zhang, Haoye Lu, Sumit Pasupalak · 2026-01-30

#cs.AI

Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data

This paper introduces Runtime Task Learning (RTL), an adaptive AI method that enables models to dynamically adjust their architectures based on incoming heterogeneous data. It demonstrates significant...

By: Grzegorz Stefanski, Alberto Presta, Michal Byra · 2026-01-29

#cs.AI

PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs

This paper presents PhaseCoder, a transformer-only spatial audio encoder that operates independently of microphone geometry. It processes raw multichannel audio and microphone coordinates to perform l...

By: Artem Dementyev, Wazeer Zulfikar, Sinan Hersek, Pascal Getreuer, Anurag Kumar, Vivek Kumar · 2026-01-26

#cs.AI

Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

This work introduces two new benchmarks, ORDebug and ORBias, that integrate a solver into the evaluation loop for AI models. ORDebug assesses iterative self-correction in solving infeasible operations...

By: Ruicheng Ao, David Simchi-Levi, Xinshang Wang · 2026-01-21

#cs.AI

Exploring Reasoning Reward Model for Agents

This paper focuses on developing and exploring a reasoning reward model designed to improve the capabilities of AI agents. It likely investigates how to effectively train agents by providing rewards t...

By: Kaixuan Fan, Kaituo Feng, Manyuan Zhang, Tianshuo Peng, Zhixun Li, Yilei Jiang, Shuang Chen, Peng Pei, Xunliang Cai, Xiangyu Yue · 2026-01-30

#cs.AI

Conditional Denoising Model as a Physical Surrogate Model

This paper explores the use of conditional denoising models as physical surrogate models for complex physical systems. It addresses the common trade-off between data-fitting accuracy and physical cons...

By: José Afonso, Pedro Viegas, Rodrigo Ventura, Vasco Guerra · 2026-01-21

#cs.AI

Self-Improving Pretraining: using post-trained models to pretrain better models

The "Self-Improving Pretraining" framework integrates alignment objectives (safety, factuality, quality) directly into LLM pretraining using a powerful post-trained model as a dynamic rewriter and jud...

By: Ellen Xiaoqing Tan, Shehzaad Dhuliawala, Jing Xu · 2026-01-29

#cs.AI

LLM-Assisted Logic Rule Learning: Scaling Human Expertise for Time Series Anomaly Detection

This framework leverages LLMs to encode human expertise into interpretable logic rules for time series anomaly detection in supply chains. It outperforms unsupervised methods in accuracy and interpret...

By: Jianing Fang, Yuxuan Chen, Yanchao Tan, Guangtao Huang, Hongxing Li, Xiang Li, Fei Wang, Yiheng Fan, Ziyue Li, Kai Shu, Jun Wang, Zihui Xue, Jie Xu · 2026-01-27

#cs.AI

A Pragmatic VLA Foundation Model

LingBot-VLA is a Vision-Language-Action foundation model pre-trained on 20,000 hours of real-world multi-embodiment robot data. It demonstrates that VLA model performance scales with increasing data v...

By: Wei Wu, Fan Lu, Yunnan Wang · 2026-01-26

#cs.AI

The Illusion of Insight in Reasoning Models

This paper investigates the phenomenon of "illusion of insight" in AI reasoning models, where models might appear to have genuine understanding without truly possessing it. The research critically exa...

By: Liv G. d'Aliberti, Manoel Horta Ribeiro · 2026-01-26

#cs.AI

Progressive Ideation using an Agentic AI Framework for Human-AI Co-Creation

The paper introduces an agentic AI framework designed to facilitate human-AI co-creation through progressive ideation. This framework allows for iterative development of ideas, combining human creativ...

By: Sankar B, Srinidhi Ranjini Girish, Aadya Bharti, Dibakar Sen · 2026-01-26

#cs.AI

Mortar: Evolving Mechanics for Automatic Game Design

The paper introduces Mortar, a system that uses evolving mechanics for automatic game design. This AI-driven approach can generate novel game rules and interactions, aiming to accelerate the game deve...

By: Muhammad U. Nasir, Yuchen Li, Steven James, Julian Togelius · 2026-01-26

#cs.AI

From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers

This research explores AI's ability to interpret and reason about architectural heritage, specifically Iranian Pigeon Towers, using typological and material reasoning. It demonstrates how AI can contr...

By: Abolhassan Pishahang, Maryam Badiei · 2026-01-26

#cs.AI

DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations

This work presents DA-DPO, a cost-efficient and difficulty-aware preference optimization method aimed at significantly reducing hallucinations in Multimodal Large Language Models (MLLMs). By optimizin...

By: Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He · 2026-01-26

#cs.AI

Adaptive Causal Coordination Detection for Social Media: A Memory-Guided Framework with Semi-Supervised Learning

This paper proposes a memory-guided framework with semi-supervised learning for detecting adaptive causal coordination on social media. The approach aims to identify complex, evolving coordination pat...

By: Weng Ding, Yi Han, Mu-Jiang-Shan Wang · 2026-01-26

#cs.AI

A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system

This paper proposes a multi-algorithm approach to optimize human resources workload balancing in last-mile urban delivery systems. The methodology aims to improve operational efficiency and resource a...

By: Luis M. Moreno-Saavedra, Silvia Jimenez-Fernandez, Antonio Portilla-Figueras, David Casillas-Perez, Sancho Salcedo-Sanz · 2026-01-26

#cs.AI

Can Semantic Methods Enhance Team Sports Tactics? A Methodology for Football with Broader Applications

This research explores how semantic methods can improve tactical analysis in team sports, specifically football. It presents a methodology that uses AI to derive deeper insights into game strategies, ...

By: Alessio Di Rubbo, Mattia Neri, Remo Pareschi, Marco Pedroni, Roberto Valtancoli, Paolino Zica · 2026-01-26

#cs.AI #Sports Analytics #Semantic Web

Self-Distillation Enables Continual Learning

This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting....

By: Idan Shenfeld, Tianxiao Shen, Jonathan Gordon · 2026-01-27

#cs.AI

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

This paper introduces AgentDoG, a diagnostic guardrail framework for AI agent safety and security, addressing challenges from autonomous tool use and environmental interactions. It provides fine-grain...

By: Shanghai AI Lab · 2026-01-26

#cs.AI

CovAgent: Overcoming the 30% Curse of Mobile Application Coverage with Agentic AI and Dynamic Instrumentation

This paper proposes CovAgent, an agentic AI-powered approach to enhance Android app UI testing by inspecting decompiled Smali code and component transition graphs. It reasons about unsatisfied activat...

By: Wei Minn, Biniam Fisseha Demissie · 2026-01-29

#cs.AI #LLM #Mobile Testing

Ultra-Low Latency Object Detection on Edge Devices for Autonomous Drone Navigation

We present a highly optimized neural network architecture and deployment framework enabling real-time, ultra-low latency object detection on resource-constrained edge devices for autonomous drone navi...

By: Dr. Hiroshi Tanaka, Dr. Isabella Rossi, Dr. Jacob Smith, Dr. Katerina Novikova · 2026-01-28

#cs.AI

Transparent and Trustworthy AI for Real-time Financial Fraud Detection

We propose a novel explainable AI framework designed for real-time financial fraud detection, offering both high accuracy and clear, human-understandable explanations for its predictions. This system ...

By: Dr. Robert Johnson, Dr. Sarah Chen, Dr. Thomas Lee, Dr. Ursula Weber, Dr. Victor Morales · 2026-01-26

#cs.AI

Multi-Agent Reinforcement Learning for Dynamic Urban Traffic Signal Control

This paper presents a multi-agent reinforcement learning system that dynamically optimizes urban traffic signal control in real-time. Experimental results demonstrate significant reductions in traffic...

By: Dr. Wendy Davis, Dr. Xuan Zhou, Dr. Yuri Kim, Dr. Zoe Green · 2026-01-25

#cs.AI

Adaptive Learning Content Generation with Large Language Models for K-12 Education

We explore the use of large language models to adaptively generate personalized educational content for K-12 students, catering to individual learning styles and paces. This approach promises to revol...

By: Dr. Alex Chang, Dr. Brenda Lee, Dr. Carlos Ruiz, Dr. Diana Popova, Dr. Ethan Brown · 2026-01-24

#cs.AI

AI-Powered Optimization of Direct Air Capture Technologies for Carbon Dioxide Removal

This research focuses on applying advanced AI techniques, including Bayesian optimization and deep learning, to optimize the design and operational parameters of direct air capture (DAC) technologies....

By: Dr. Fiona MacLeod, Dr. Gregory Parker, Dr. Hannah Zhao, Dr. Ivan Volkov, Dr. Jessica O'Connell · 2026-01-23

#cs.AI

Personalized Drug Discovery through Generative AI Foundation Models

This paper explores the application of large-scale generative AI foundation models for accelerating personalized drug discovery. It details novel architectures capable of synthesizing drug candidates ...

By: Dr. Anya Petrova, Dr. Ben Carter, Dr. Chen Li, Dr. David Sharma, Dr. Emily Wong, Dr. Frank Miller, Dr. Grace Kim · 2026-01-29

#cs.AI #Generative AI #Drug Discovery

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Tencent researchers introduced Youtu-VL, a Vision-Language Model framework addressing fine-grained visual information loss with a "vision-as-target" optimization paradigm, achieving competitive perfor...

By: Zhixiang Wei, Yi Li, Zhehan Kan, Xinghua Jiang, Zuwei Long, Shifeng Liu, Hongze Shen, Wei Liu, Xiaoyu Tan, Haojia Lin, Yubo Zhu, Qianyu Li, Di Yin, Haoyu Cao, Weibo Gu, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Yunsheng Wu, Mingkong Tang, Shuangyin Liu, Lexiang Tang, Haodong Lin, Junru Lu, Jiarui Qin, Lingfeng Qiao, Ruizhi Qiao, Bo Ke, Jianfeng He, Ke Li, Yangning Li, Yunhang Shen, Mengdan Zhang, Peixian Chen, Kun Yin, Bing Liu, Yunfei Wu, Huang Chen, Zhongpeng Cai, Xiaotian Li · 2026-01-27

#cs.AI

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Researchers introduced a framework and benchmark to study visual world modeling in Unified Multimodal Models (UMMs), demonstrating that visual generation significantly improves reasoning on physical a...

By: Jialong Wu, Xiaoying Zhang, Hongyi Yuan, Xiangcheng Zhang, Tianhao Huang, Changjing He, Chaoyi Deng, Renrui Zhang, Youbin Wu, Mingsheng Long · 2026-01-27

#cs.AI

NeuroAI and Beyond

This paper advocates for NeuroAI, a type of Neuroscience-informed Artificial Intelligence, by identifying current and future areas of synergism between neuroscience and AI. It focuses on embodiment, l...

By: Jean-Marc Fellous, Gert Cauwenberghs, Cornelia Fermüller, Yulia Sandamirskaya, Terrence Sejnowski · 2026-01-29

#cs.AI

Masked Depth Modeling for Spatial Perception

Robbyant introduces Masked Depth Modeling (MDM), a framework that leverages natural sensor failures in RGB-D cameras as learning signals to generate dense, metric-scale, and pixel-aligned depth maps. ...

By: Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tian Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue · 2026-01-25

#cs.AI

Vision-Language Pre-training for Medical Image Analysis

This paper explores the application of vision-language pre-training techniques to improve the accuracy and interpretability of medical image analysis. By jointly learning from image and text data, the...

By: Xavier Garcia, Yara Rodriguez, Zoe Miller · 2026-01-22

#cs.AI

Interpretable AI for Financial Risk Assessment

This paper develops novel interpretable AI models for transparent and reliable financial risk assessment. By providing clear explanations for their predictions, these models increase trust and facilit...

By: Peter Scott, Quinn Adams, Rachel Baker · 2026-01-24

#cs.AI

Foundational Models for Robotics in Dynamic Environments

This paper explores the development of novel foundational models that enable robots to operate robustly and adaptively in complex and rapidly changing real-world environments. The models integrate adv...

By: Alice Smith, Bob Johnson, Carol White · 2026-01-28

#cs.AI

Efficient Language Model Quantization for Edge Devices

The research presents a novel quantization technique that significantly reduces the computational and memory footprint of large language models, making them deployable on resource-constrained edge dev...

By: David Green, Eva Black, Frank Blue · 2026-01-27

#cs.AI

Generative AI for Personalized Drug Discovery

This paper proposes a generative AI framework that accelerates the discovery of novel drug candidates tailored to individual patient genetic profiles. By leveraging advanced deep learning architecture...

By: Grace Lee, Henry Kim, Ivy Chen, Jack Wu · 2026-01-26

#cs.AI

Continual Learning for Autonomous Driving Systems

Addressing the challenge of catastrophic forgetting, this research introduces a continual learning paradigm for autonomous driving agents. The proposed methods allow vehicles to continuously learn fro...

By: Karen Park, Leo Rodriguez, Mia Taylor, Noah Davis, Olivia Hall · 2026-01-25

#cs.AI

daVinci-Dev: Agent-native Mid-training for Software Engineering

This paper introduces daVinci-Dev, a systematic agentic mid-training approach that equips large language models (LLMs) with foundational agentic behaviors for software engineering. It addresses the di...

By: Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu · 2026-01-26

#cs.AI

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

This paper introduces SOAR, a new self-improvement framework that enables large language models (LLMs) to generate their own curricula for mathematical reasoning problems they cannot initially solve. ...

By: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe · 2026-01-26

#cs.AI

TelcoAI: Advancing 3GPP Technical Specification Search through Agentic Multi-Modal Retrieval-Augmented Generation

This paper presents TelcoAI, an agentic, multi-modal Retrieval-Augmented Generation (RAG) system specifically designed for 3GPP documentation. It significantly improves recall, claim recall, and faith...

By: Rahul Ghosh, Chun-Hao Liu, Gaurav Rele, Vidya Sagar Ravipati, Hazar Aouad · 2026-01-27

#cs.AI

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

This paper introduces TSRBench, a comprehensive benchmark designed for multi-task and multi-modal time series reasoning. It aims to evaluate and advance generalist AI models in their ability to unders...

By: Fangxu Yu, Xingang Guo, Lingzhi Yuan, Haoqiang Kang, Hongyu Zhao, Lianhui Qin, Furong Huang, Bin Hu, Tianyi Zhou · 2026-01-27

#cs.AI

When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems

This research introduces a comprehensive diagnostic framework that utilizes big data analytics to evaluate the procedural reliability of intelligent agent systems. It addresses critical needs for depl...

By: Donghao Huang, Gauri Malwe, Zhaoxia Wang · 2026-01-26

#cs.AI

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

This paper investigates how multi-agent bandit systems can effectively exchange and leverage visual uncertainties to improve decision-making. This is particularly relevant in dynamic environments wher...

By: Jusheng Zhang, Yijia Fan, Kaitong Cai, Jing Yang, Jiawei Yao, Jian Wang, Guanlong Qu, Ziliang Chen, Keze Wang · 2026-01-27

#cs.AI #Multi-Agent Systems #Contextual Bandits

Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs

This research focuses on developing scalable rubrics to enhance the quality and reliability of Large Language Models (LLMs) specifically tailored for healthcare applications. The goal is to improve th...

By: Zhichao Yang, Sepehr Janghorbani, Dongxu Zhang, Jun Han, Qian Qian, Andrew Ressler II, Gregory D. Lyng, Sanjit Singh Batra, Robert E. Tillman · 2026-01-27

#cs.AI

Conditioned Generative Modeling of Molecular Glues: A Realistic AI Approach for Synthesizable Drug-like Molecules

The paper proposes a novel generative AI approach for creating synthesizable drug-like molecular glues. This realistic AI method offers a promising pathway for discovering new therapeutic compounds, a...

By: Naeyma N. Islam, Thomas R. Caulfield · 2026-01-27

#cs.AI

When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions

This paper explores the convergence of generative AI and Extended Reality (XR) to enable more scalable and natural human-computer interactions. It delves into how AI can enhance immersive experiences,...

By: Mingyu Zhu, Jiangong Chen, Bin Li · 2026-01-22

#cs.AI

Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling

Skywork UniPic 3.0 introduces a unified multi-image composition framework that leverages sequence modeling to generate complex and coherent images from multiple input components. This advancement in g...

By: Hongyang Wei, Hongbo Liu, Zidong Wang, Yi Peng, Baixin Xu, Size Wu, Xuying Zhang, Xianglong He, Zexiang Liu, Peiyu Wang, Xuchen Song, Yangguang Li, Yang Liu, Yahui Zhou · 2026-01-23

#cs.AI

Multimodal Climate Disinformation Detection: Integrating Vision-Language Models with External Knowledge Sources

This research proposes a novel approach to detect climate change disinformation by integrating vision-language models with external knowledge sources. The multimodal system analyzes both textual and v...

By: Marzieh Adeli Shamsabad, Hamed Ghodrati · 2026-01-23

#cs.AI #Multimodal Learning #Disinformation Detection

Multi-Persona Thinking for Bias Mitigation in Large Language Models

This paper proposes "Multi-Persona Thinking" as a novel approach to mitigate social biases in Large Language Models (LLMs). By enabling LLMs to consider multiple perspectives, the research aims to red...

By: Yuxing Chen, Guoqing Luo, Zijun Wu, Lili Mou · 2026-01-23

#cs.AI

LLM Prompt Evaluation for Educational Applications

This paper focuses on developing methods for evaluating prompts for Large Language Models (LLMs) specifically in educational contexts. It addresses the challenges of assessing prompt effectiveness and...

By: Langdon Holmes, Adam Coscia, Scott Crossley, Joon Suh Choi, Wesley Morris · 2026-01-23

#cs.AI

FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design

This paper introduces FlexLLM, a composable High-Level Synthesis (HLS) library designed for flexible hybrid Large Language Model (LLM) accelerator design. It aims to streamline the development of effi...

By: Jiahao Zhang, Zifan He, Nicholas Fraser, Michaela Blott, Yizhou Sun, Jason Cong · 2026-01-23

#cs.AI

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

This paper introduces Cosmos Policy, a method for fine-tuning large, pretrained latent video diffusion models into unified robot policies for visuomotor control and planning. It achieves state-of-the-...

By: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu · 2026-01-23

#cs.AI

Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics

This paper explores methods for controlling the long-term behavior of language model agents by incorporating explicit state dynamics. It aims to improve the predictability and reliability of AI agents...

By: Sukesh Subaharan · 2026-01-23

#cs.AI

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. We present VideoMaMa, a novel mask-guided video matting framework that conve...

By: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee · 2026-01-20

#cs.AI

Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models

Large language models (LLMs) often struggle with complex reasoning tasks that require accurate and up-to-date factual knowledge. This paper proposes a novel framework that integrates Monte Carlo Tree ...

By: Shuqi Liu, Bowei He, Chen Ma, Linqi Song · 2026-01-22

#cs.AI

Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree

Optimizing scientific computing algorithms for modern GPUs is a labor-intensive and iterative process involving repeated code modification, benchmarking, and tuning across complex hardware and softwar...

By: Leyi Zhao, Weijie Huang, Yitong Guo, Jiang Bian, Chenghong Wang, Xuhong Zhang · 2026-01-20

#cs.AI #LLM #Knowledge Graph

An Agentic Operationalization of DISARM for FIMI Investigation on Social Media

Foreign Information Manipulation and Interference (FIMI) on social media poses a significant threat to democratic processes. This paper proposes a framework-agnostic agent-based operationalization of ...

By: Kevin Tseng, Juan Carlos Toledano, Bart De Clerck, Yuliia Dukach, Phil Tinn · 2026-01-21

#cs.AI #DISARM #FIMI

Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning

Specific domains depend on high-quality fine-tuning datasets, particularly in an instructional format (e.g., Question-Answer, or Q&A). However, generating these datasets, particularly from unstructured sou...

By: Alex Echeverria, Sávio Salvarino Teles de Oliveira, Fernando Marques Federson · 2026-01-20

#cs.AI

The Agentic Leash: Extracting Causal Feedback Fuzzy Cognitive Maps with LLMs

This paper introduces "The Agentic Leash," a method for extracting causal feedback fuzzy cognitive maps using Large Language Models (LLMs). This approach enables better interpretability and understand...

By: Akash Kumar Panda, Olaoluwa Adigun, Bart Kosko · 2026-01-15

#cs.AI

A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference

This paper presents a novel approach utilizing a vision-and-knowledge enhanced large language model to achieve generalizable inference of pedestrian crossing behavior. This development is crucial for ...

By: Qingwen Pu, Kun Xie, Hong Yang · 2026-01-19

#cs.AI

SciCoQA: Quality Assurance for Scientific Paper–Code Alignment

We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducib...

By: Tim Baumgärtner, Nitay · 2026-01-19

#imported #Reproducibility #NLP

Your One-Stop Solution for AI-Generated Video Detection

This paper presents a comprehensive solution for detecting AI-generated videos, a critical need due to the increasing realism of synthetic media. The proposed system utilizes advanced computer vision ...

By: Long Ma, Zihao Xue, Yan Wang, Zhiyuan Yan, Jin Xu, Xiaorui Jiang, Haiyang Yu, Yong Liao, Zhen Bi · 2026-01-14

#cs.AI #Deepfake Detection #Video Forensics

The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents

This paper introduces "The Great March 100" (GM-100), a benchmark of 100 detail-oriented tasks for evaluating embodied AI agents. It addresses limitations in existing datasets by providing a diverse a...

By: Ziyu Wang, Chenyuan Liu, Yushun Xiang, Runhao Zhang, Qingbo Hao, Hongliang Lu, Houyu Chen, Zhizhong Feng, Kaiyue Zheng, Dehao Ye, Xianchao Zeng, Xinyu Zhou, Boran Wen, Jiaxin Li, Mingyu Zhang, Kecheng Zheng, Qian Zhu, Ran Cheng, Yong-Lu Li · 2026-01-16

#cs.AI

ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

ShapeR introduces a novel approach for robust conditional 3D object shape generation from casually captured image sequences. It leverages multi-modal inputs like SLAM points, posed images, and VLM-gen...

By: Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, Jakob Engel · 2026-01-16

#cs.AI

Hyperparameter Optimization of Constraint Programming Solvers

This paper addresses the critical challenge of hyperparameter optimization for Constraint Programming (CP) solvers. It proposes advanced techniques to automatically tune these parameters, significantl...

By: Hedieh Haddad, Thibault Falque, Pierre Talbot, Pascal Bouvry · 2026-01-19

#cs.AI

Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning

This research proposes the Large language model and Extended Greedy (LEG) framework to optimize health facility location in Ethiopia. It integrates expert knowledge, articulated in natural language, w...

By: Yohai Trabelsi, Guojun Xiong, Fentabil Getnet, Stéphane Verguet, Milind Tambe · 2026-01-16

#cs.AI

Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs

This paper extends an LLM-based framework for Predictive Process Monitoring (PPM), evaluating its generality and reasoning mechanisms. It demonstrates that LLMs outperform benchmark methods in data-sc...

By: Alessandro Padella, Massimiliano de Leoni, Marlon Dumas · 2026-01-16

#cs.AI

BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics

This paper introduces BoxMind, a closed-loop AI expert system for optimizing boxing strategies. It uses multi-modal data to define atomic punch events and proposes a graph-based predictive model to ca...

By: Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu · 2026-01-16

#cs.AI

The Impact of Generative AI on Architectural Conceptual Design: Performance, Creative Self-Efficacy and Cognitive Load

This research investigates the multifaceted impact of generative AI tools on the early stages of architectural design. It examines how these AI systems influence designers' performance, their creative...

By: Han Jiang, Yao Xiao, Rachel Hurley, Shichao Liu · 2026-01-16

#cs.AI

Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems

This paper introduces a novel approach for constructing "context bubbles" in enterprise retrieval-augmented generation (RAG) systems, focusing on both the structural integrity and semantic diversity o...

By: Amir Khurshid, Abhishek Sehgal · 2026-01-16

#cs.AI

Japanese AI Agent System on Human Papillomavirus Vaccination: System Design

Human papillomavirus (HPV) vaccine hesitancy poses significant public health challenges, particularly in Japan, where proactive vaccination recommendations were suspended from 2013 to 2021. The resulti...

By: Junyu Liu, Siwen Yang, Dexiu Ma, Qian Niu, Zequn Zhang, Momoko Nagai-Tanima, Tomoki Aoyama · 2026-01-19

#cs.AI

Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning

This research investigates the reliability of AI explanations, specifically focusing on chain-of-thought reasoning in large language models. The study provides evidence of systematic underreporting, w...

By: Deep Pankajbhai Mehta · 2026-01-15

#cs.AI

CogCanvas: Verbatim-Grounded Artifact Extraction for Long LLM Conversations

This paper introduces CogCanvas, a system designed for verbatim-grounded artifact extraction from extensive Large Language Model (LLM) conversations. It addresses the challenge of managing and leverag...

By: Tao An · 2026-01-15

#cs.AI

Foundation Models for Personalized Healthcare: A Multimodal Approach

This paper introduces a novel framework utilizing multimodal foundation models to create highly personalized healthcare solutions. It integrates patient data from various sources including genomics, e...

By: Anna Petrova, Dmytro Kovalenko, Olena Lysenko, Sergii Tkachenko, Victoria Bondar · 2026-01-18

#cs.AI

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

This paper introduces ML-Master 2.0, an autonomous agent tackling ultra-long-horizon machine learning engineering. It uses Hierarchical Cognitive Caching to manage context and sustain strategic cohere...

By: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen · 2026-01-15

#cs.AI

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

LSRIF introduces a logic-structured training framework that explicitly models instruction logic for large language models to improve instruction-following. It addresses challenges with sequential depe...

By: Qingyu Ren, Qianyu He, Jingwen Chang, Jie Zeng, Jiaqing Liang, Yanghua Xiao, Han Xia, Zeye Sun, Fei Yu · 2026-01-10

#cs.AI

Epistemology gives a Future to Complementarity in Human-AI Interactions

This paper leverages epistemology to reframe human-AI complementarity, aiming to address theoretical challenges in understanding when human-AI teams outperform either alone. It seeks to provide a more...

By: Andrea Ferrario, Rasita Vinay, Matteo Casserini, Alessandro Facchini · 2026-01-14

#cs.AI

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

DeepResearchEval is an automated framework for constructing deep research tasks and evaluating AI agents. It addresses challenges in assessing multi-step web research and cross-source information synt...

By: Yibo Wang, Lei Wang, Yue Deng, Keming Wu, Yao Xiao, Huanjin Yao, Liwei Kang, Hai Ye, Yongcheng Jing, Lidong Bing · 2026-01-14

#cs.AI

Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning

This paper proposes Test-Time Tool Evolution (TTE), a new paradigm enabling LLM agents to synthesize, verify, and evolve executable tools during inference for scientific reasoning. It overcomes the li...

By: Jiaxuan Lu, Ziyu Kong, Yemin Wang, Rong Fu, Haiyuan Wan, Cheng Yang, Wenjie Lou, Haoran Sun, Lilong Wang, Yankai Jiang, Xiaosong Wang, Xiao Sun, Dongzhan Zhou · 2026-01-12

#cs.AI

A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents

This scoping review maps ethically oriented work on anthropomorphising LLM-based conversational agents, discussing benefits like engagement and inclusion versus concerns such as deception and overreli...

By: Andrea Ferrario, Tetsuya Sakai, Matteo Casserini, Alessandro Facchini · 2026-01-14

#cs.AI

Controlled Self-Evolution for Algorithmic Code Optimization

This paper proposes Controlled Self-Evolution (CSE) to enhance code generation through iterative generate-verify-refine cycles. It addresses inefficiencies in existing self-evolution methods for algor...

By: Tu Hu, Ronghao Chen, Shuo Zhang, Jianghao Yin, Mou Xiao Feng, Jingping Liu, Shaolei Zhang, Wenqi Jiang, Yuqi Fang, Sen Hu, Yi Xu, Huacan Wang · 2026-01-12

#cs.AI

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Real-world deployment of GUI agents requires aligning with users' complex implicit intents, beyond explicit instructions. This paper introduces "PersonalAlign," a new agent task where agents utilize l...

By: Yibo Lyu, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie · 2026-01-14

#cs.AI

StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management

Centralized multi-agent systems based on LLMs often struggle with unstable long-horizon collaboration due to a lack of memory management, leading to context bloat, error accumulation, and poor cross-t...

By: Ruizhe Zhang, Xinke Jiang, Zhibang Yang, Zhixin Zhang, Jiaran Gao, Yuzhen Xiao, Hongbin Lai, Xu Chu, Junfeng Zhao, Yasha Wang · 2026-01-09

#cs.AI

SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning

While LLMs excel in text-based code automation, their potential in graph-oriented engineering workflows like Simulink remains underexplored. SimuAgent is an LLM-powered modeling and simulation agent f...

By: Yanchang Liang, Xiaowei Zhao · 2026-01-09

#cs.AI

DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation

Large Language Model-based Multi-Agent Debate (MAD) frameworks enhance reasoning and collaboration, but existing approaches suffer from agents adopting identical reasoning paths, leading to errors and...

By: Zhenghao Li, Zhi Zheng, Wei Chen, Jielun Zhao, Yong Chen, Tong Xu, Enhong Chen · 2026-01-09

#cs.AI

Automating Supply Chain Disruption Monitoring via an Agentic AI Approach

Modern supply chains are increasingly vulnerable to disruptions. This paper introduces a minimally supervised agentic AI framework that autonomously monitors, analyzes, and responds to disruptions acr...

By: Sara AlMahri, Liming Xu, Alexandra Brintrup · 2026-01-15

#cs.AI

ConvoLearn: A Dataset of Constructivist Tutor-Student Dialogue

Large Language Models (LLMs) in educational applications often reveal solutions rather than fostering dialogic learning. This paper introduces ConvoLearn, a dataset grounded in knowledge building theo...

By: Mayank Sharma, Roy Pea, Hari Subramonyam · 2026-01-13

#cs.AI

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Multi-agent systems powered by Large Language Models (LLMs) often struggle with resource-intensive and unstable training due to non-stationarity and sparse rewards in multi-agent reinforcement learnin...

By: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park · 2026-01-14

#cs.AI

Predictive Analytics for Dementia: Machine Learning on Healthcare Data

This study improves dementia prediction by applying supervised machine learning algorithms, such as KNN, QDA, LDA, and Gaussian Process Classifiers, to patient health data. LDA achieved...

By: Shafiul Ajam Opee, Nafiz Fahad, Anik Sen, Rasel Ahmed, Fariha Jahan, Md. Kishor Morol, Md Rashedul Islam · 2026-01-13

#cs.AI

Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training

Recent advancements in single-cell multi-omics provide profound insights into cellular heterogeneity. This paper proposes OKR-CELL, an Open-world Language Knowledge-Aided Robust Single-Cell Foundation...

By: Haoran Wang, Xuanyi Zhang, Shuangsang Fang, Longke Ran, Ziqing Deng, Yong Zhang, Yuxiang Li, Shaoshuai Li · 2026-01-09

#cs.AI

Few-shot Class-Incremental Learning via Generative Co-Memory Regularization

This work introduces a generative co-memory regularization approach for Few-shot Class-Incremental Learning (FSCIL). The method leverages generative domain adaptation to fine-tune a pre-trained encode...

By: Kexin Bao, Yong Li, Dan Zeng, Shiming Ge · 2026-01-12

#cs.AI

ECLIPSE: An Evolutionary Computation Library for Instrumentation Prototyping in Scientific Engineering

This paper introduces ECLIPSE, an Evolutionary Computation Library for Instrumentation Prototyping in Scientific Engineering. This library aims to accelerate the design and optimization of scientific ...

By: Max Foreback, Evan Imata, Vincent Ragusa, Jacob Weiler, Christina Shao, Joey Wagner, Katherine G. Skocelas, Jonathan Sy, Aman Hafez, Wolfgang Banzhaf, Amy Conolly, Kyle R. Helson, Rick Marcusen, Charles Ofria, Marcin Pilinski, Rajiv Ramnath, Bryan Reynolds, Anselmo C. Pontes, Emily Dolson, Julie Rolla · 2026-01-09

#cs.AI

Benchmarking Small Language Models and Small Reasoning Language Models on System Log Severity Classification

This paper benchmarks nine small language models (SLMs) and small reasoning language models (SRLMs) on system log severity classification using real-world `journalctl` data from Linux production serve...

By: Yahya Masri, Emily Ma, Zifu Wang, Joseph Rogers, Chaowei Yang · 2026-01-13

#cs.AI

AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs

This paper proposes AdaFuse, an adaptive ensemble decoding method with test-time scaling for large language models (LLMs). This approach aims to enhance the performance of LLMs by combining outputs fr...

By: Chengming Cui, Tianxin Wei, Ziyi Chen, Ruizhong Qiu, Zhichen Zeng, Zhining Liu, Xuying Ning, Duo Zhou, Jingrui He · 2026-01-12

#cs.AI

AI-Assisted Authoring for Transparent, Data-Driven Documents

This paper introduces "transparent documents," interactive web-based scholarly articles that let readers explore how the text relates to the underlying data by hovering over text fragments. It also prese...

By: Alfonso Piscitelli, Cristina David, Mattia De Rosa, Ali Mohammed, Federico Nanni, Jacob Pake, Roly Perera, Jessy Sodimu, Chenyiqiu Zheng · 2026-01-12

#cs.AI

PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor

This paper proposes PsychEval, a new multi-session and multi-therapy benchmark for evaluating AI psychological counselors. It aims to provide high-realism and comprehensive assessment of AI's capabili...

By: Wei Wang, Chenxu Wang, Guohao Li, Guojun Wu, Yutong Lu, Xiang Zhou, Zhaoting Li, Shizhong Liu · 2026-01-06

#cs.AI

MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents

This paper introduces MineNPC-Task, a task suite designed to evaluate memory-aware Minecraft agents. It focuses on the development of AI agents that can effectively manage and utilize memory in comple...

By: Tamil Sudaravan Mohan Doss, Michael Xu, Sudha Rao, Andrew D. Wilson, Balasaravanan Thoravi Kumaravel · 2026-01-09

#cs.AI

Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study

Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. This paper explores fin...

By: Isaac Iyinoluwa Olufadewa, Miracle Ayomikun Adesina, Ezekiel Ayodeji Oladejo, Uthman Babatunde Usman, Owen Kolade Adeniyi, Matthew Tolulope Olawoyin · 2026-01-05

#cs.AI

Adaptive Hybrid Optimizer based Framework for Lumpy Skin Disease Identification

Lumpy Skin Disease (LSD) is a contagious viral infection that significantly harms livestock health. Early and precise identification is crucial. This paper proposes a hybrid deep learning-based...

By: Muhammad Tahir, Abdul Basit, Muhammad Awais, Muhammad Imran, Farman Ali, Muhammad Shoaib, Ali Raza · 2026-01-06

#cs.AI

Stock Market Price Prediction using Neural Prophet with Deep Neural Network

This paper proposes a novel approach for stock market price prediction leveraging a hybrid model that combines Neural Prophet with a Deep Neural Network (DNN). The integration aims to capture both tim...

By: Navin Chhibber, Suneel Khemka, Navneet Kumar Tyagi, Rohit Tewari, Bireswar Banerjee, Piyush Ranjan · 2026-01-09

#cs.AI #Stock Prediction #Deep Learning

Learning Latent Action World Models In The Wild

Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this capability, they most often require acti...

By: Quentin Garrido, Tushar Nagarajan, Basile Terver, Nicolas Ballas, Yann LeCun, Michael Rabbat · 2026-01-09

#cs.AI

Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop

The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop, where models a...

By: Yaxuan Wang, Zhongteng Cai, Yujia Bao, Xueru Zhang, Yang Liu · 2026-01-08

#cs.AI

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

RoboVIP introduces a multi-view video generation framework that enhances robotic manipulation datasets by creating diverse backgrounds and tabletop scenes using visual identity prompting. This method ...

By: Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, Jiangmiao Pang · 2026-01-08

#cs.AI

Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving

This paper presents a criticality-aware robust reinforcement learning framework to enhance safety and robustness in autonomous driving systems. By focusing on sparse but critical threats, the method i...

By: Qi Wei, Junchao Fan, Zhao Yang, Jianhua Wang, Jingkai Mao, Xiaolin Chang · 2026-01-05

#cs.AI

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

This paper introduces MAGMA, a novel multi-graph based agentic memory architecture designed to enhance the capabilities of AI agents. It focuses on enabling agents to manage complex memories for impro...

By: Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li · 2026-01-06

#cs.AI

Legal Alignment for Safe and Ethical AI

This paper examines the critical issue of legal alignment for safe and ethical artificial intelligence. It explores how AI development can be guided by legal and ethical frameworks to ensure responsib...

By: Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain · 2026-01-07

#cs.AI

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

This paper investigates the fine-tuning of small language models to act as efficient enterprise search relevance labelers. The approach demonstrates how smaller LLMs can be optimized for specific busi...

By: Yue Kang, Zhuoyi Huang, Benji Schussheim, Diana Licon, Dina Atia, Shixing Cao, Jacob Danovitch, Kunho Kim, Billy Norcilien, Jonah Karpman, Mahmound Sayed, Mike Taylor, Tao Sun, Pavel Metrikov, Vipul Agarwal, Chris Quirk, Ye-Yi Wang, Nick Craswell, Irene Shaffer, Tianwei Chen, Sulaiman Vesal, Soundar Srinivasan · 2026-01-06

#cs.AI

MedPI: Evaluating AI Systems in Medical Patient-facing Interactions

This paper introduces MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn QA benchmarks, MedPI assesses medical dialo...

By: Diego Fajardo V., Oleksii Proniakin, Victoria-Elisabeth Gruber, Razvan Marinescu · 2026-01-08

#cs.AI

Active Sensing Shapes Real-World Decision-Making through Dynamic Evidence Accumulation

This paper generalizes the Evidence Accumulation Model (EAM) to real-world contexts, investigating how active sensing through eye movements influences decision-making. It proposes a cognitive scheme t...

By: Hongliang Lu, Yunmeng Liu, Junjie Yang · 2026-01-09

#cs.AI

Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents

This research presents Project Ariadne, proposing a structural causal framework to audit the faithfulness of Large Language Model (LLM) agents. This is crucial for ensuring that LLM agents provide acc...

By: Sourena Khanzadeh · 2026-01-06

#cs.AI

Streaming Hallucination Detection in Long Chain-of-Thought Reasoning

This work focuses on developing methods for detecting hallucinations in long chain-of-thought reasoning processes, especially in the context of large language models. Effective hallucination detection...

By: Haolang Lu, Minghui Pan, Ripeng Li, Guoshun Nan, Jialin Zhuang, Zijie Zhao, Zhongxiang Sun, Kun Wang, Yang Liu · 2026-01-06

#cs.AI

Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections

The paper presents a cross-lingual ontology alignment system that uses embedding-based cosine similarity matching. Ontology entities are contextually enriched through novel techniques, employing a fin...

By: Abhishek Kumar · 2026-01-01

#cs.AI

Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

This paper introduces Falcon-H1R, a hybrid model designed to enhance AI reasoning capabilities. The focus is on efficient test-time scaling, allowing the system to maintain high performance in complex...

By: Falcon LLM Team, Iheb Chaabane, Puneesh Khanna, Suhail Mohmad, Slim Frikha, Shi Hu, Abdalgader Abubaker, Reda Alami, Mikhail Lubinets, Mohamed El Amine Seddik, Hakim Hacid · 2026-01-06

#cs.AI

FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations

This paper introduces FormuLLA, an innovative approach that leverages Large Language Models (LLMs) to generate novel 3D printable formulations. This opens up new possibilities for rapid prototyping an...

By: Adeshola Okubena, Yusuf Ali Mohammed, Moe Elbadawi · 2026-01-06

#cs.AI #LLM #Generative AI

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

This paper introduces EverMemOS, a self-organizing memory operating system designed to enhance structured long-horizon reasoning in AI systems. This enables systems to efficiently manage and utilize information...

By: Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai, Xintong Li, Hui Zhang, Tong Li, Chong Zhang, Lidong Bing, Yafeng Deng · 2026-01-06

#cs.AI

Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

This research delves into the geometry of reason, exploring spectral signatures that indicate valid mathematical reasoning. This study could contribute to building AI systems capable of more robust an...

By: Valentin Noël · 2026-01-05

#cs.AI

Recursive Language Models

Recursive Language Models (RLMs) introduce a general inference strategy that allows Large Language Models (LLMs) to process arbitrarily long prompts (exceeding 10 million tokens) by treating them as e...

By: Alex L. Zhang, Omar Khattab · 2025-12-30

#cs.AI

Robust Uncertainty Quantification for Factual Generation of Large Language Models

This paper addresses the critical limitation of hallucination in Large Language Models (LLMs) by proposing a novel and robust uncertainty quantification method (RU) for factual generation. It construc...

By: Yuhao Zhang, Zhongliang Yang, Linna Zhou · 2026-01-05

#cs.AI

RoboReward: General-Purpose Vision-Language Reward Models for Robotics

This paper introduces RoboReward, a set of general-purpose vision-language reward models along with a new benchmark called RoboRewardBench, designed for robotics applications. The RoboReward 8B model ...

By: Tony Lee, Andrew Wagenmaker, Karl Pertsch, Kevin Black, Suraj Nair, Michael Ahn, Jian Lan, Sergey Levine, Chelsea Finn · 2026-01-02

#cs.AI #Robotics #Reinforcement Learning

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

This paper presents Avatar Forcing, a new diffusion-driven framework that enables real-time interactive head avatar generation for natural conversation. It addresses the challenges of real-time motion...

By: Ki Taekyung, Junho Kim, Hyeonsu Lee, Hyewon Son, Jonghyun Choi · 2026-01-02

#cs.AI

AMAP Agentic Planning Technical Report

This technical report introduces STAgent, an agentic large language model developed by Alibaba Amap, specifically engineered for real-world spatio-temporal reasoning and complex planning. It achieves ...

By: Yulan Hu, Xiangwen Zhang, Sheng Ouyang, Hao Yi, Lu Xu, Qinglin Lang, Lide Tan, Xiang Cheng, Tianchen Ye, Zhicong Li, Ge Chen, Wenjin Yang, Zheng Pan, Shaopan Xiong, Siran Yang, Ju Huang, Yan Zhang, Jiamang Wang, Yong Liu, Yinfeng Huang, Tucheng Lin, Xin Li, Ning Guo · 2025-12-31

#cs.AI

ClinicalReTrial: A Self-Evolving AI Agent for Clinical Trial Protocol Optimization

This research introduces ClinicalReTrial, a self-evolving AI agent designed to optimize clinical trial protocols. This agent leverages AI and multiagent systems to enhance the efficiency and effective...

By: Sixue Xing, Xuanye Xia, Kerui Wu, Meng Jiang, Jintai Chen, Tianfan Fu · 2026-01-05

#cs.AI

Towards Trustworthy AI: A Framework for Explainable and Robust Deep Learning in Critical Systems

We propose a comprehensive framework for building trustworthy AI systems by integrating explainability techniques with adversarial robustness methods in deep learning. This work addresses critical con...

By: Professor Julian Vance, Dr. Lena Schmidt, Mr. Omar Hassan, Ms. Jessica Lee, Dr. Martin Müller · 2025-12-28

#cs.AI

Towards Energy-Efficient Edge AI: A Novel Architecture for On-Device Large Language Models

We propose a new architectural design that significantly reduces the computational and energy footprint of large language models (LLMs), enabling their efficient deployment on edge devices. This break...

By: Professor Kai Hansen, Dr. Lena Popova, Mr. John M. Smith, Dr. Isabella Garcia, Dr. Wei Wang · 2025-12-31

#cs.AI

Multimodal Conversational AI for Enhanced Customer Experience and Support Automation

This paper presents a multimodal conversational AI system that seamlessly integrates natural language understanding, speech recognition, and visual context to provide highly personalized and effective...

By: Dr. Sophia G. Miller, Professor Alexandre Dubois, Ms. Emily R. Chen, Mr. Robert Johnson, Dr. Priya Reddy, Mr. Carlos Mendoza · 2026-01-04

#cs.AI

Adaptive Agentic AI for Enhanced Human-Machine Teaming in Complex Operations

This paper presents a novel framework for integrating adaptive agentic AI systems into human-machine teams, focusing on dynamic task allocation, context-aware decision-making, and real-time learning. ...

By: Dr. Elena Petrova, Dr. Kenji Tanaka, Professor Marcus Chen, Dr. Anya Sharma, Mr. David Rodriguez · 2026-01-03

#cs.AI

Robust Policy Learning for Human-Robot Collaboration in Unstructured Industrial Environments

This paper introduces a novel robust policy learning framework enabling seamless and safe human-robot collaboration in complex, unstructured industrial settings. The approach leverages advanced percep...

By: Dr. Emily White, Prof. Joon-Ho Kim, Dr. Ricardo Garcia, Dr. Anna Schmidt, Dr. Ben Carter · 2025-12-30

#cs.AI

Deep Learning for Early Detection of Pancreatic Cancer from Multi-Modal Medical Imagery

This research proposes a novel deep learning framework that integrates various medical imaging modalities for highly accurate and early detection of pancreatic cancer. The model significantly improves...

By: Dr. Anya Sharma, Prof. David Chen, Dr. Elena Petrova, Dr. Kenji Tanaka, Dr. Sofia Bianchi · 2025-12-28

#cs.AI

Adaptive AI Tutors: Enhancing Student Engagement and Learning Outcomes through Personalized Feedback Loops

This research presents an innovative adaptive AI tutoring system designed to personalize the learning experience, significantly boosting student engagement and improving academic outcomes. The system ...

By: Dr. Daniel Brown, Prof. Jessica Green, Dr. Hiroshi Sato, Dr. Laura Martinez, Dr. Peter Wang · 2026-01-01

#cs.AI

Context-Aware Large Language Models for Personalized Legal Document Generation and Review

We develop and evaluate context-aware large language models specifically tailored for legal applications, enabling the personalized generation and efficient review of complex legal documents. This sys...

By: Dr. Sophia Davis, Prof. Robert Miller, Dr. Chen Li, Dr. Maria Rodriguez, Dr. David Jones · 2025-12-31

#cs.AI

SyncGait: Robust Long-Distance Authentication for Drone Delivery via Implicit Gait Behaviors

SyncGait is a novel user-drone mutual authentication system that leverages implicit gait behaviors, specifically the user's unique arm swing, for robust long-distance authentication during drone deliv...

By: Zijian Ling, Man Zhou, Hongda Zhai, Yating Huang, Lingchen Zhao, Qi Li, Chao Shen, Qian Wang · 2025-12-29

#cs.AI

The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models

Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. We introduce the Drill-Down and Fabricate Test (DDFT), a pr...

By: Rahul Baxi · 2026-01-01

#cs.AI

Space AI: Leveraging Artificial Intelligence for Space to Improve Life on Earth

This paper introduces Space AI as a unified interdisciplinary field at the intersection of artificial intelligence and space science and technology. It proposes a systematic framework organizing Space...

By: Ziyang Wang · 2025-12-26

#cs.AI #Space AI #Earth Observation

PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents

This work focuses on improving code generation from Bangla natural language prompts to Python code, utilizing iterative self-correction mechanisms and multilingual AI agents. It aims to bridge the gap...

By: Jahidul Islam, Md Ataullha, Saiful Azad · 2026-01-01

#cs.AI

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

This paper introduces SpaceTimePilot, a system for generative rendering of dynamic scenes, enabling the creation of realistic and evolving visual content across both spatial and temporal dimensions. T...

By: Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang · 2026-01-01

#cs.AI

Enriching Historical Records: An OCR and AI-Driven Approach for Database Integration

This paper introduces an AI and Optical Character Recognition (OCR)-driven pipeline for digitizing and integrating historical documents into databases. It addresses challenges like layout variability ...

By: Zahra Abedi, Richard M.K. van Dijk, Gijs Wijnholds, Tessa Verhoef · 2026-01-01

#cs.AI

Coordinated Humanoid Manipulation with Choice Policies

This research focuses on developing sophisticated control policies for humanoid robots to achieve coordinated manipulation tasks. It explores how robots can make intelligent choices to perform complex...

By: Haozhi Qi, Yen-Jen Wang, Toru Lin, Brent Yi, Yi Ma, Koushil Sreenath, Jitendra Malik · 2026-01-01

#cs.AI

Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing

This paper addresses the critical challenge of efficient and accurate data annotation for multisensor datasets, particularly for the rigorous testing of autonomous vehicles. It proposes semi-automated...

By: Andrii Gamalii, Daniel Górniak, Robert Nowak, Bartłomiej Olber, Krystian Radlak, Jakub Winter · 2026-01-01

#cs.AI

Iterative Deployment Improves Planning Skills in LLMs

This research investigates how iterative deployment strategies can significantly enhance the planning capabilities of Large Language Models (LLMs). The paper presents novel approaches for refining LLM...

By: Augusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo, Ilia Shumailov, André G. Pereira, Yarin Gal · 2026-01-01

#cs.AI

Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings

This paper explores the development of context-aware AI agents based on large language models (LLMs) designed for human-centered energy management systems in smart buildings. The research aims to opti...

By: Tianzhi He, Farrokh Jazizadeh · 2026-01-01

#cs.AI

Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities

This theoretical paper argues for the necessity of incorporating uncertainty, incomplete preferences, and non-Archimedean utilities into AI safety frameworks. It suggests that current approaches to AI...

By: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon · 2025-12-30

#cs.AI

Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

The paper presents Robo-Dopamine, a framework for high-precision robotic manipulation using reinforcement learning (RL). It introduces Dopamine-Reward, a novel multi-view, step-aware process reward mo...

By: Huajie Tan, Sixiang Chen, Yijie Xu, Zixiao Wang, Yuheng Ji, Cheng Chi, Yaoxu Lyu, Zhongxia Zhao, Xiansheng Chen, Peterson Co, Shaoxuan Xie, Guocai Yao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang · 2025-12-29

#cs.AI

Bidirectional RAG: Safe Self-Improving Retrieval-Augmented Generation Through Multi-Stage Validation

Retrieval-Augmented Generation (RAG) systems enhance large language models by grounding responses in external knowledge bases, but conventional RAG architectures operate with static corpora that canno...

By: Teja Chinthala · 2025-12-30

#cs.AI

Training AI Co-Scientists Using Rubric Rewards

This paper introduces a scalable method to train language models as "AI co-scientists" capable of generating high-quality research plans across diverse scientific domains. It leverages automated extra...

By: Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse · 2025-12-29

#cs.AI #AI for Science #RLHF

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

This paper introduces MAI-UI, a family of foundation GUI agents designed for real-world deployment. It integrates agent-user interaction, external tool use via MCP, and a native device-cloud collabora...

By: Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, Steven Hoi · 2025-12-26

#cs.AI

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

This paper presents HY-Motion 1.0, a series of state-of-the-art, large-scale motion generation models that produce 3D human motions from text descriptions. It is the first to scale Diffusion Transform...

By: Yuxin Wen, Qing Shuai, Di Kang, Jing Li, Cheng Wen, Yue Qian, Ningxin Jiao, Changhai Chen, Weijie Chen, Yiran Wang, Jinkun Guo, Dongyue An, Han Liu, Yanyu Tong, Chao Zhang, Qing Guo, Juan Chen, Qiao Zhang, Youyi Zhang, Zihao Yao, Cheng Zhang, Hong Duan, Xiaoping Wu, Qi Chen, Fei Cheng, Liang Dong, Peng He, Hao Zhang, Jiaxin Lin, Chao Zhang, Zhongyi Fan, Yifan Li, Zhichao Hu, Yuhong Liu, Linus, Jie Jiang, Xiaolong Li · 2025-12-29

#cs.AI

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

This survey unifies insights from cognitive neuroscience with Large Language Model (LLM)-driven agents, offering a comprehensive review of memory systems. It establishes a unified framework detailing ...

By: Jiafeng Liang, Hao Li, Chang Li, Jiaqi Zhou, Shixin Jiang, Zekun Wang, Changkai Ji, Zhihao Zhu, Runxuan Liu, Tao Ren, Jinlan Fu, See-Kiong Ng, Xia Liang, Ming Liu, Bing Qin · 2025-12-29

#cs.AI

The Gaining Paths to Investment Success: Information-Driven LLM Graph Reasoning for Venture Capital Prediction

This paper presents a novel approach utilizing Information-Driven Large Language Model (LLM) Graph Reasoning to predict venture capital investment success. By analyzing complex relationships in financ...

By: Haoyu Pei, Zhongyang Liu, Xiangyi Xiao, Xiaocong Du, Haipeng Zhang, Kunpeng Zhang, Suting Hong · 2025-12-30

#cs.AI #Venture Capital #Graph Neural Networks

Web World Models

This paper introduces Web World Models, a new approach to building AI agents that can understand and interact with the internet more effectively. It aims to create AI that can navigate, process inform...

By: Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang · 2025-12-30

#cs.AI

Regret-Based Federated Causal Discovery with Unknown Interventions

This research explores a novel method for causal discovery in federated learning settings, especially when interventions are unknown. It focuses on how to identify causal relationships across distribu...

By: Federico Baldo, Charles K. Assaad · 2025-12-30

#cs.AI

Physics-Informed Neural Networks for Device and Circuit Modeling: A Case Study of NeuroSPICE

This paper presents the application of Physics-Informed Neural Networks (PINNs) for modeling semiconductor devices and electronic circuits, using NeuroSPICE as a case study. This approach integrates p...

By: Chien-Ting Tung, Chenming Hu · 2025-12-30

#cs.AI

Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems

This paper addresses energy consumption in AI systems orchestrated by Large Language Models (LLMs) by proposing an energy-aware, data-driven model selection strategy. This research is critical for dev...

By: Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan · 2025-12-28

#cs.AI

Divergent-Convergent Thinking in Large Language Models for Creative Problem Generation

This preprint investigates the ability of Large Language Models (LLMs) to engage in both divergent (idea generation) and convergent (problem formulation) thinking for creative problem generation. It e...

By: Manh Hung Nguyen, Adish Singla · 2025-12-30

#cs.AI

RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs

This paper introduces RL-Struct, a lightweight reinforcement learning framework designed to improve the reliability of structured output generated by large language models. By ensuring more consistent...

By: Ruike Hu, Shulei Wu · 2025-12-23

#cs.AI #Reinforcement Learning #LLM

Knowledge Graph Augmented Large Language Models for Disease Prediction

This paper explores the integration of knowledge graphs with large language models to enhance the accuracy and interpretability of disease prediction. By leveraging structured medical knowledge, the p...

By: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel · 2025-12-28

#cs.AI #Knowledge Graphs #LLM

A2P-Vis: An Analyzer-to-Presenter Agentic Pipeline for Visual Insights Generation and Reporting

This paper introduces A2P-Vis, an agentic pipeline designed to automate the generation of visual insights and reports. It aims to streamline data visualization and communication, providing an efficien...

By: Shuyu Gan, Renxiang Wang, James Mooney, Dongyeop Kang · 2025-12-29

#cs.AI

Pruning as a Game: Equilibrium-Driven Sparsification of Neural Networks

Neural network pruning is widely used to reduce model size and computational cost. However, most existing methods treat sparsity as an extrinsic constraint enforced via heuristic importance scores or ...

By: Zubair Shah, Noaman Khan · 2025-12-26

#cs.AI

SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

Spatial transcriptomics experiments are rapidly expanding in scale and complexity, making computational analysis a major bottleneck in biological discovery. While frontier AI agents have shown signifi...

By: Kenny Workman, Zhen Yang, Harihara Muralidharan, Hannah Le · 2025-12-26

#cs.AI

How Do Agents Perform Code Optimization? An Empirical Study

Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifications. A...

By: Huiyun Peng, Antonio Zhong, Ricardo Andrés Calvo Méndez, Kelechi G. Kalu, James C. Davis · 2025-12-25

#cs.AI

From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration

The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneou...

By: Yutong Zhang, Qingyu Zhang, Yaxin Wang, Yujie Li, Xiangmin Xu · 2025-12-23

#cs.AI

Fast SAM2 with Text-Driven Token Pruning

Segment Anything Model 2 (SAM2), a vision foundation model, has significantly advanced prompt-driven video object segmentation, yet its practical deployment remains limited by the high computation...

By: Avilasha Mandal, Chaoning Zhang, Fachrina Dewi Puspitasari, Xudong Wang, Jiaquan Zhang, Caiyan Qin, Guoqing Wang, Yang Yang, Heng Tao Shen · 2025-12-24

#cs.AI

ScoutGPT: Capturing Player Impact from Team Action Sequences Using GPT-Based Framework

This paper introduces ScoutGPT, a GPT-based framework designed to analyze team action sequences and quantify individual player impact in sports. By leveraging advanced language model capabilities, Sco...

By: Miru Hong, Minho Lee, Geonhee Jo, Jae-Hee So, Pascal Bauer, Sang-Ki Ko · 2025-12-22

#cs.AI #Sports Analytics #NLP

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

This paper introduces MegaRAG, a novel framework for Retrieval Augmented Generation that leverages both multimodal data and knowledge graphs. It aims to enhance the accuracy and relevance of generated...

By: Chi-Hsiang Hsiao, Yi-Cheng Wang, Tzung-Sheng Lin, Yi-Ren Yeh, Chu-Song Chen · 2025-12-25

#cs.AI

Knowledge Augmentation via Synthetic Data: A Framework for Real-World ECG Image Classification

In real-world clinical practice, electrocardiograms (ECGs) are often captured and shared as photographs. However, publicly available ECG data, and thus most related research, relies on digital signals...

By: Xiaoyu Wang, Ramesh Nadarajah, Zhiqiang Zhang, David Wong · 2025-12-24

#cs.AI #ECG #Synthetic Data

Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models

Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. Recent advances in code generation with large lang...

By: Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher · 2025-12-23

#cs.AI

The Geometry of Laziness: What Angles Reveal About AI Hallucinations

This paper investigates the underlying geometric principles behind AI hallucinations, particularly in Large Language Models. By analyzing 'angles,' it seeks to provide a predictable and computationall...

By: Javier Marin · 2025-12-22

#cs.AI

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

This paper introduces MiST, a framework for understanding the impact of mid-stage scientific training on the development of chemical reasoning models. By improving these models, it has significant rea...

By: Andres M Bran, Tong Xie, Shai Pranesh, Jeffrey Meng, Xuan Vu Nguyen, Jeremy Goumaz, David Ming Segura, Ruizhi Xu, Dongzhan Zhou, Wenjie Zhang, Bram Hoex, Philippe Schwaller · 2025-12-25

#cs.AI

C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling

This technical report introduces C2LLM, a novel approach to code retrieval that utilizes adaptive cross-attention pooling. This innovation has direct real-world applications in software development, e...

By: Jin Qin, Zihan Liao, Ziyin Zhang, Hang Yu, Peng Di, Rui Wang · 2025-12-25

#cs.AI

RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic

Ensuring the safety of embodied AI agents in complex, unstructured environments is a critical challenge. This paper introduces RoboSafe, a novel framework that integrates executable safety logic direc...

By: Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu · 2025-12-24

#cs.AI

Quantum-Inspired Multi Agent Reinforcement Learning for Exploration Exploitation Optimization in UAV-Assisted 6G Network Deployment

This study introduces a quantum-inspired framework for optimizing the exploration-exploitation tradeoff in multi-agent reinforcement learning (MARL), specifically applied to UAV-assisted 6G network de...

By: Mazyar Taghavi, Javad Vahidi · 2025-12-25

#cs.AI

SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance

Small Language Models (SLMs) struggle with complex document understanding due to limited parameters. SMART SLM, a novel Structured Memory and Reasoning Transformer, enhances SLMs for accurate document...

By: Divij Dudeja, Mayukha Pal · 2025-12-24

#cs.AI

Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty

Masked Diffusion Models (MDMs) offer flexible non-autoregressive generation, but their output quality is highly sensitive to the decoding order. This paper formalizes this issue by attributing variabi...

By: Ziyu Chen, Xinbei Jiang, Peng Sun, Tao Lin · 2025-12-24

#cs.AI

BitRL-Light: 1-bit LLM Agents with Deep Reinforcement Learning for Energy-Efficient Smart Home Lighting Optimization

Smart home lighting systems consume 15-20% of residential energy but often lack adaptive intelligence. BitRL-Light is a novel framework that combines 1-bit quantized Large Language Models (LLMs) with ...

By: Ravi Gupta, Shabista Haider2025-12-25

#cs.AI

Agentic Explainable Artificial Intelligence (Agentic XAI) Approach To Explore Better Explanation

Explainable Artificial Intelligence (XAI) is vital for trust and transparency in AI systems, especially in high-stakes applications. This study introduces an Agentic XAI approach that utilizes the ite...

By: Tomoaki Yamaguchi, Yutong Zhou, Masahiro Ryo, Keisuke Katsura2025-12-24

#cs.AI

A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care

Large Language Models (LLMs) show promise for medication safety in healthcare. This paper presents a real-world evaluation of an LLM-powered system for medication safety reviews in NHS Primary Care, i...

By: Oliver Normand, Esther Borsi, Mitch Fruin, Lauren E Walker, Jamie Heagerty, Chris C. Holmes, Anthony J Avery, Iain E Buchan, Harry Coppock2025-12-24

#cs.AI

Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios

Developing emotionally intelligent embodied AI that can generate empathic responses in various situations is a significant challenge for human-robot interaction. This paper explores "Closed-Loop Embod...

By: Jiawen Wang, Jingjing Wang, Tianyang Chen, Min Zhang, Guodong Zhou2025-12-23

#cs.AI #Robotics #LLM

Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting

Accurate depth estimation is fundamental for many computer vision tasks, including 3D reconstruction, robotics, and augmented reality. This paper introduces "Re-Depth Anything," a novel method for tes...

By: Ananta R. Bhattarai, Helge Rhodin2025-12-22

#cs.AI

QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

Vision-Language Models (VLMs) have shown remarkable progress, but their ability to reason about the physical world, crucial for real-world applications like robotics, remains underexplored. This paper...

By: Li Puyin, Tiange Xiang, Ella Mao, Shirley Wei, Xinye Chen, Adnan Masood, Li Fei-Fei, Ehsan Adeli2025-12-23

#cs.AI

Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight

Automating clinical risk score calculations can significantly reduce physician administrative burden and improve patient care. Current benchmarks like MedCalc-Bench, constructed using LLM-based extrac...

By: Junze Ye, Daniel Tawfik, Alex J. Goodell, Nikhil V. Kotha, Mark K. Buyyounouski, Mohsen Bayati2025-12-22

#cs.AI

Learning General Policies with Policy Gradient Methods

Policy gradient methods are a cornerstone of reinforcement learning (RL), enabling agents to learn optimal behaviors in complex environments. This paper investigates advances in policy gradient method...

By: Simon Ståhlberg, Blai Bonet, Hector Geffner2025-12-19

#cs.AI

Augmenting Intelligence: A Hybrid Framework for Scalable and Stable Explanations

Current Explainable AI (XAI) approaches face a "Scalability-Stability Dilemma": post-hoc methods (e.g., LIME, SHAP) scale easily but are unstable, while supervised frameworks (e.g., TED) offer stabili...

By: Lawrence Krukrubo, Julius Odede, Olawande Olusegun2025-12-22

#cs.AI

V-Agent: An Interactive Video Search System Using Vision-Language Models

We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video prefere...

By: SunYoung Park, Jong-Hyeon Lee, Youngjune Kim, Daegyu Sung, Younghyun Yu, Young-rok Cha, Jeongho Ju2025-12-22

#cs.AI #Video Retrieval #Vision-Language Models

Towards Explainable Conversational AI for Early Diagnosis with Large Language Models

This research focuses on developing explainable conversational AI systems that leverage large language models for early disease diagnosis. It addresses the critical need for transparency and interpret...

By: Maliha Tabassum, M Shamim Kaiser2025-12-22

#cs.AI

Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally

This paper explores the effects of humanlike AI design on anthropomorphism, engagement, and trust across different global contexts. The findings reveal that while humanlike AI generally increases anth...

By: Robin Schimmelpfennig, Mark Díaz, Vinodkumar Prabhakaran, Aida Davani2025-12-22

#cs.AI

About Time: Model-free Reinforcement Learning with Timed Reward Machines

This paper introduces a model-free reinforcement learning approach that incorporates timed reward machines to handle temporal properties in complex environments. By explicitly integrating timing const...

By: Anirban Majumdar, Ritam Raha, Rajarshi Roy, David Parker, Marta Kwiatkowska2025-12-22

#cs.AI

Conflict-Driven Clause Learning with VSIDS Heuristics for Discrete Facility Layout

This paper studies the use of Conflict-Driven Clause Learning (CDCL) with VSIDS heuristics as a computational engine for discrete facility layout problems. The facility layout problem is modeled as a ...

By: Joshua Gibson, Kapil Dhakal2025-12-23

#cs.AI

Towards Robust and Interpretable Multimodal Foundation Models for Clinical Diagnosis

We propose a new architectural paradigm for multimodal foundation models designed specifically for clinical diagnostic support. The model integrates diverse data types, including medical images, elect...

By: Dr. Kenji Tanaka, Dr. Maria Rodriguez, Prof. Li Wei, Dr. Samuel Green, Dr. Isabella Rossi, Prof. Ahmed Khan2025-12-21

#cs.AI

Scalable Knowledge Graph Construction from Noisy Text with Large Language Models

This paper presents a novel framework for automatically constructing large-scale knowledge graphs from unstructured, noisy text data by leveraging the advanced capabilities of large language models. I...

By: Dr. Anya Petrova, Prof. Serhii Kovalenko, Dr. Elena Vasylenko, Dmytro Kuzmenko, Olena Mykhailiuk2025-12-22

#cs.AI #Knowledge Graph #LLM

Learning from Imperfect Demonstrations: A Human-in-the-Loop Approach for Robotic Manipulation

This paper addresses the challenge of teaching robots complex manipulation tasks using imperfect human demonstrations. We propose a novel human-in-the-loop framework that allows the robot to query a h...

By: Dr. Sarah Johnson, Prof. Mark Thompson, Dr. Anna Kaczmarek, Giovanni Russo, Dr. Elena Popova2025-12-17

#cs.AI

Federated Reinforcement Learning for Decentralized Traffic Signal Control in Smart Cities

This research explores the application of federated reinforcement learning to optimize traffic flow in urban environments without centralizing sensitive traffic data. Our proposed framework enables in...

By: Dr. Chen Wang, Dr. Emily Davis, Prof. Marco Bianchi, Dr. Javier Perez, Sophie Dubois2025-12-20

#cs.AI

Adversarial Robustness for Foundation Models through Self-Supervised Perturbation Generation

We introduce a new method to enhance the adversarial robustness of large-scale foundation models using a self-supervised approach to generate diverse and challenging perturbations. This technique sign...

By: Dr. Michael Brown, Dr. Jessica Lee, Prof. Benjamin Clark, Dr. Sofia Hernandez, Oliver Wilson, Dr. Grace Taylor, Prof. Kevin Moore2025-12-15

#cs.AI #Adversarial Machine Learning #LLM Safety

A Benchmark of Causal vs Correlation AI for Predictive Maintenance

This paper establishes a benchmark for evaluating causal versus correlational AI approaches in predictive maintenance. By providing a clear framework for comparison, this work helps industries impleme...

By: Krishna Taduri, Shaunak Dhande, Giacinto Paolo (GP) Saggese, Paul Smith2025-12-15

#cs.AI

CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents

CodeDistiller proposes a method for automatically generating code libraries, specifically tailored for scientific coding agents. This research has profound implications for accelerating scientific dis...

By: Peter Jansen, Samiah Hassan, Pragnya Narasimha2025-12-15

#cs.AI #LLM #Code Generation

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

Optimizing CUDA kernels is complex and labor-intensive. This paper introduces cuPilot, a multi-agent framework that uses strategy as an intermediate semantic representation for kernel evolution, addre...

By: Jinwu Chen, Qidie Wu, Bin Li, Lin Ma, Xin Si, Yang Hu, Shouyi Yin, Jun Yang2025-12-18

#cs.AI

Value Lens: A Text-Based Model for Detecting Human Values using Generative AI

This article presents Value Lens, a text-based model designed to detect human values using generative artificial intelligence, specifically Large Language Models (LLMs). The proposed model operates in...

By: Yixuan Wang, Minjun Zhu, Qiujie Xie, Qiyao Sun, Zhen Lin, Sifan Liu, Yue Zhang2025-12-17

#cs.AI #NLP #Generative AI

TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries

This paper introduces TimeSeries2Report (TS2R), a prompting framework that converts raw lithium-ion battery operational time-series into structured, semantically enriched reports. This enables large l...

By: Jiayang Yang, Chunhui Zhao, Martin Guay, Zhixing Cao2025-12-18

#cs.AI

Quantifying and Bridging the Fidelity Gap: A Decisive-Feature Approach to Comparing Synthetic and Real Imagery

Virtual testing with synthetic data is crucial for autonomous vehicle safety, but pixel-level fidelity doesn't guarantee real-world transfer. This paper introduces Decisive Feature Fidelity (DFF), an ...

By: Danial Safaei, Siddartha Khastgir, Mohsen Alirezaei, Jeroen Ploeg, Son Tong, Xingyu Zhao2025-12-18

#cs.AI

Scaling Laws for Energy Efficiency of Local LLMs

Deploying local large language models and vision-language models on edge devices requires balancing accuracy with constrained computational and energy budgets. This paper systematically benchmarks LLM...

By: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, Antonio Tiene, Samuel Mugel, Román Orús2025-12-18

#cs.AI

AdaGradSelect: A Computationally Efficient and Memory-Optimized Fine-Tuning Method for Large Language Models

This paper introduces AdaGradSelect, a novel fine-tuning method for Large Language Models (LLMs) that offers significant computational efficiency and memory optimization. It trains about 12% faster an...

By: Yixuan Weng, Minjun Zhu, Qiujie Xie, Qiyao Sun, Zhen Lin, Sifan Liu, Yue Zhang2025-12-17

#cs.AI

Anubuddhi: Designing Quantum Optics Experiments with Multi-Agent AI

We present Anubuddhi, a multi-agent AI system that designs and simulates quantum optics experiments from natural language prompts without requiring specialized programming knowledge. The system compos...

By: Yifan Li, Yuxiang Zhang, Ziqiao Ma, Tianmin Shu, Zhiting Hu, Lianhui Qin2025-12-19

#cs.AI #Quantum Optics #Multi-Agent Systems

AI Epidemiology: Governing and Explaining Advanced AI Systems by Population-Level Surveillance

This paper proposes AI Epidemiology, a framework for governing and explaining advanced AI systems by applying population-level surveillance methods to AI outputs. It aims to bypass the complexity of c...

By: Zohra Hadjam, John Mellor, Ilaria Tiddi, Adrian R. Taylor2025-12-19

#cs.AI #AI Safety #Epidemiology

The Social Responsibility Stack: A Control-Theoretic Architecture for Governing Socio-Technical AI

This paper introduces the Social Responsibility Stack (SRS), a control-theoretic architecture designed to govern socio-technical AI systems responsibly. The SRS provides a modular framework for integr...

By: Otman A. Basir2025-12-19

#cs.AI

TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge

We present TOGGLE, a novel framework for compressing Large Language Models (LLMs) specifically designed for efficient deployment on edge devices. TOGGLE leverages temporal logic to guide the compressi...

By: Khurram Khalil, Khaza Anuarul Hoque2025-12-19

#cs.AI

Distributional AGI Safety

We introduce the concept of Distributional AGI Safety, a framework for analyzing and ensuring the safety of Artificial General Intelligence (AGI) systems across diverse operational contexts and potent...

By: Nenad Tomašev, Matija Franklin, Julian Jacobs, Sébastien Krier, Simon Osindero2025-12-19

#cs.AI

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plau...

By: Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille2025-12-19

#cs.AI

AI-Mediated Social Interaction: A Multi-Scale Perspective

This paper explores AI-mediated social interaction from a multi-scale perspective, analyzing its impact at individual, group, and societal levels. We examine how AI agents and systems influence human ...

By: Junzhe Zhang2025-12-19

#cs.AI #AI-Mediated Communication #Computational Social Science

CitySeeker: How Do VLMs Explore Embodied Urban Navigation With Implicit Human Needs?

CitySeeker investigates how Vision-Language Models (VLMs) can effectively perform embodied urban navigation while implicitly understanding and addressing human needs. We propose a framework that integ...

By: Siqi Wang, Chao Liang, Yunfan Gao, Erxin Yu, Sen Li, Yushi Li, Jing Li, Haofen Wang2025-12-19

#cs.AI

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

TimeLens proposes a novel method for video temporal grounding by leveraging multimodal Large Language Models (LLMs). This research enhances the ability of AI to understand and locate specific events w...

By: Jun Zhang, Teng Wang, Yuying Ge, Yixiao Ge, Xinhao Li, Ying Shan, Limin Wang2025-12-17

#cs.AI

Predictive Concept Decoders: Training Scalable End-to-End Interpretability Assistants

This paper introduces Predictive Concept Decoders (PCDs), a novel framework for training scalable end-to-end interpretability assistants. PCDs aim to provide human-understandable explanations for AI m...

By: Vincent Huang, Dami Choi, Daniel D. Johnson, Sarah Schwettmann, Jacob Steinhardt2025-12-18

#cs.AI

Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning

This paper proposes "Stepwise Think-Critique," a unified framework designed to improve the robustness and interpretability of Large Language Model (LLM) reasoning. By incorporating iterative thinking ...

By: Jiaqi Xu, Cuiling Lan, Xuejin Chen, Yan Lu2025-12-18

#cs.AI

Human-Centered AI for Financial Decision Support: Explainability and Trust

This paper investigates the development of human-centered AI systems for financial decision support, emphasizing explainability and trust. It presents approaches to design AI tools that provide clear ...

By: Sophia Chen, Robert Davis, Laura Evans, Michael Foster2025-12-12

#cs.AI #XAI #Fintech

Enhancing AI Ethics and Governance with Explainable AI (XAI) and Causal Inference

This research focuses on strengthening AI ethics and governance frameworks by integrating Explainable AI (XAI) and causal inference techniques. It proposes methods to make AI decisions more transparen...

By: Emily Brown, Frank Green, Grace White, Henry Black2025-12-15

#cs.AI

A Decision-Theoretic Approach for Managing Misalignment

This paper presents a decision-theoretic approach to manage misalignment in AI systems, a critical challenge for safe and ethical AI deployment. It provides a formal framework to reason about and miti...

By: Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, B. A. Levinstein2025-12-18

#cs.AI

Epistemic diversity across language models mitigates knowledge collapse

This research explores how maintaining epistemic diversity across multiple language models can prevent "knowledge collapse," the narrowing of knowledge toward a few dominant ideas. This is vital for building robust, reliable,...

By: Damian Hodel, Jevin D. West2025-12-17

#cs.AI

Context-Picker: Dynamic context selection using multi-stage reinforcement learning

This paper introduces Context-Picker, an approach that uses multi-stage reinforcement learning for dynamic context selection. This is highly relevant for AI systems that need to efficiently process an...

By: Siyuan Zhu, Chengdong Xu, Kaiqiang Ke, Chao Yu2025-12-17

#cs.AI

MMGR: Multi-Modal Generative Reasoning

This work introduces MMGR, a framework for Multi-Modal Generative Reasoning, exploring the integration of various data modalities for enhanced AI understanding and generation, with applications in com...

By: Zefan Cai, Haoyi Qiu, Tianyi Ma, Haozhe Zhao, Gengze Zhou, Kung-Hsiang Huang, Parisa Kordjamshidi, Minjia Zhang, Xiao Wen, Jiuxiang Gu, Nanyun Peng, Junjie Hu2025-12-17

#cs.AI

Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence

This research introduces a dynamic learning rate scheduling method based on loss changes, aiming to achieve faster convergence in machine learning models, offering practical benefits for optimizing tr...

By: Shreyas Subramanian, Bala Krishnamoorthy, Pranav Murthy2025-12-17

#cs.AI

Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning

This paper interprets self-attention and residual streams in transformers through a Vector Symbolic Architecture (VSA) lens, proposing 'attention as binding' to develop a unified perspective on transf...

By: Sahil Rajesh Dhayalkar2025-12-18

#cs.AI #Transformer #Vector Symbolic Architecture

Universal Reasoning Model

This paper introduces a universal reasoning model, aiming to develop a foundational AI system capable of diverse and general intelligence, potentially leading to more robust and adaptable AI applicati...

By: Zitian Gao, Lynx Chen, Yihao Xiao, He Xing, Ran Tao, Haoming Luo, Joey Zhou, Bryan Dai2025-12-17

#cs.AI

Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling

This research focuses on enhancing the reliability of Large Language Model (LLM) agents by introducing a model-first reasoning approach, which explicitly models problems to reduce hallucinations and i...

By: Annu Rana, Gaurav Kumar2025-12-17

#cs.AI

From Framework to Practice: Designing a Real-World Telehealth Application for Palliative Care

This paper analyzes the design of a telehealth application for palliative care, integrating quality, human values, and real-world considerations to improve accessibility and continuity of care in digi...

By: Wei Zhou, Rashina Hoda, Andy Li, Chris Bain, Laura Bird, Emmy Trinh, Peter Poon, Teresa O'Brien, Mahima Kalla, Olivia Metcalf, Wendy Chapman, Joycelyn Ling, Sam Georgy, David Bevan2025-12-17

#cs.AI

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency. It resolves the trade-off between speed and me...

By: Wenqiang Sun, Haiyu Zhang, Haoyuan Wang, Junta Wu, Zehan Wang, Zhenwei Wang, Yunhong Wang, Jun Zhang, Tengfei Wang, Chunchao Guo2025-12-17

#cs.AI

PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals

This paper introduces PortAgent, an LLM-driven vehicle dispatching agent designed to fully automate the Vehicle Dispatching System (VDS) transferring workflow in Automated Container Terminals (ACTs). ...

By: Jia Hu, Junqi Li, Weimeng Lin, Peng Jia, Yuxiong Ji, Jintao Lai2025-12-17

#cs.AI

Seismology modeling agent: A smart assistant for geophysical researchers

This paper proposes an intelligent, interactive workflow powered by Large Language Models (LLMs) to address the steep learning curve and complex manual operations in traditional seismic wave simulatio...

By: Yukun Ren, Siwei Yu, Kai Chen, Jianwei Ma2025-12-17

#cs.AI

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Seedance 1.5 pro is a foundational model for native, joint audio-visual generation, leveraging a dual-branch Diffusion Transformer architecture and a specialized multi-stage data pipeline. It achieves...

By: Siyan Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Xuyan Chi, Jian Cong, Qinpeng Cui, Qide Dong, Junliang Fan, Jing Fang, Zetao Fang, Chengjian Feng, Han Feng, Mingyuan Gao, Yu Gao, Qiushan Guo, Boyang Hao, Qingkai Hao, Bibo He, Qian He, Xinfu Hou, Yifeng Hou, Zheyuan Hou, Xiaodong Huang, Yi Huang, Bo Jiang, Jinglin Jiang, Jianqiang Jin, Zhenping Jin, Yuxiang Kang, Li Ke, Hongbo Lai, Fan Li, Haitao Li, Hu Li, Junlin Li, Sheng Li, Xiang Li, Xiang Li, Yang Li, Yijun Li, Yirong Li, Yongcheng Li, Bo Liao, Jiayuan Li, Jiayan Lin, Kaiwen Lin, Xiangteng Li, Xiangyan Liu, Xin Liu, Xinyi Liu, Yuan Liu, Zekang Liu, Zhiwen Liu, Xiaojun Lu, Fan Ma, Meng Ma, Qi Ma, Xiang Ma, Zhenyu Ma, Jiajun Ma, Yang Miao, Weigang Mi, Chenchen Mu, Chen Mu, Hongyang Nie, Fan Pan, Yujun Pan, Pengcheng Pang, Qingchao Pang, Jianzheng Pan, Hao Peng, Shuming Qiu, Yan Qi, Xin Qian, Jing Qiao, Jie Ren, Yan Ru, Meng Shen, Hongshuai Shi, Fan Song, Jiayi Song, Minghui Song, Yihang Song, Yuxuan Song, Weining Su, Bo Sun, Jiahui Sun, Qingling Sun, Wenqiang Sun, Yan Sun, Yu Sun, Jiawei Su, Yu Tang, Gang Tao, Junpeng Tao, Jie Tian, Qi Tian, Jun Wang, Kang Wang, Liyuan Wang, Nan Wang, Tao Wang, Xu Wang, Xiaojin Wang, Xiaoping Wang, Xiaoyang Wang, Xinxing Wang, Xing Wang, Yu Wang, Yuyang Wang, Yicheng Wang, Yihang Wang, Yuning Wang, Yuwen Wang, Yuzheng Wang, Zhenghao Wang, Zhongtian Wang, Weimin Wang, Wei Wei, Bo Wen, Jian Wen, Jinbo Wen, Jianlin Wu, Jing Wu, Junjie Wu, Shenshen Wu, Yu Wu, Zhichao Wu, Junyan Wu, Xiaohong Xiang, Jiafeng Xie, Ming Xie, Jiaxu Xu, Jing Xu, Weimin Xu, Xiaowei Xu, Yang Xu, Bo Yan, Haosong Yan, Jing Yang, Kai Yang, Qian Yang, Sihan Yang, Xiaolong Yang, Xiaoxiang Yang, Xing Yang, You Yang, Chao Zeng, Mengyuan Zeng, Xiang Zhang, Xiaojing Zhang, Yu Zhang, Zhen Zhang, Zihao Zhang, Han Zhang, Lei Zhang, Yue Zhang, Zhirui Zhang, Shichao Zhao, Yixuan Zhao, Jian Zheng, Wen Zheng, Yuliang Zheng, Xingshuo Zhou, Hongru Zhu, Jiayuan Zhu, Jiaxin Zhu, Jun Zhu, Qing Zhu, Sheng Zhu, Xiaolong Zhu, Zhi Zhu, Zixuan Zhu, Donglai Zhu2025-12-16

#cs.AI

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

This paper proposes Nemotron-Cascade, a framework for developing general-purpose reasoning models using cascaded domain-wise reinforcement learning (Cascade RL). It addresses heterogeneity in RL infra...

By: Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping2025-12-15

#cs.AI

Sparse Multi-Modal Transformer with Masking for Alzheimer's Disease Classification

This paper introduces SMMT, a sparse multi-modal transformer architecture, to address the high computational and energy costs of dense self-attention in intelligent systems. SMMT incorporates cluster-...

By: Cheng-Han Lu, Pei-Hsuan Tsai2025-12-17

#cs.AI

LLMs as Clinical Research Assistants: Secure and Accurate Extraction from Unstructured EHR Narratives

This paper presents a secure, modular framework that leverages locally deployed large language models (LLMs) to automate structured feature extraction from unstructured electronic health record (EHR) ...

By: Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V. K. Cody Bumgardner2025-12-17

#cs.AI

AI-Powered Annotation Pipelines for Stabilizing Large Language Models: A Human-AI Synergy Approach

This paper introduces an AI-based annotation pipeline designed to systematically identify, label, and fix instability patterns in Large Language Model (LLM) output. This human-AI synergy method combin...

By: Gangesh Pathak, Prasanna Kumar2025-12-17

#cs.AI

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

This paper introduces LongVie 2, a multimodal controllable ultra-long video world model. It focuses on generating and understanding extended video sequences with high fidelity and controllability. Thi...

By: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Junhao Zhuang, Chengming Xu, Jianfeng Feng, Yu Qiao, Yanwei Fu, Chenyang Si, Ziwei Liu2025-12-16

#cs.AI

A Monad-Based Clause Architecture for Artificial Age Score (AAS) in Large Language Models

Large language models (LLMs) are often opaque, making principled governance of their internal memory and "self-like" behavior difficult. This paper develops an engineering-oriented, clause-based archi...

By: Seyma Yaman Kayadibi2025-12-16

#cs.AI

neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings

Investigates whether Large Language Models exhibit envy-like preferences in multi-agent environments, providing insights into their social intelligence and decision-making biases. Understanding these compl...

By: Ojas Pungalia, Rashi Upadhyay, Abhishek Mishra, Abhiram H, Tejasvi Alladi, Sujan Yenuganti, Dhruv Kumar2025-12-16

#cs.AI #Multi-Agent Systems #LLM Psychology

MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph

Introduces MedCEG, a novel framework using critical evidence graphs to enhance the verifiability and reliability of AI-driven medical reasoning, crucial for clinical decision support. This work signif...

By: Linjie Mu, Yannian Gu, Zhongzhen Huang, Yakun Zhu, Shaoting Zhang, Xiaofan Zhang2025-12-16

#cs.AI

MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations

Presents MAC, a multi-agent framework designed to enhance conversational AI by enabling interactive clarification with users in multi-turn dialogues, improving understanding and task completion. This ...

By: Emre Can Acikgoz, Jinoh Oh, Joo Hyuk Jeon, Jie Hao, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur, Xiang Li, Chengyuan Ma, Xing Fan2025-12-16

#cs.AI

From Code to Field: Evaluating the Robustness of Convolutional Neural Networks for Disease Diagnosis in Mango Leaves

Evaluates the robustness of CNNs for diagnosing diseases in mango leaves, highlighting practical applications of AI in agriculture for crop health monitoring. This research directly contributes to sus...

By: Gabriel Vitorino de Andrade, Saulo Roberto dos Santos, Itallo Patrick Castro Alves da Silva, Emanuel Adler Medeiros Pereira, Erick de Andrade Barboza2025-12-16

#cs.AI #Computer Vision #Agriculture

Differentiable Evolutionary Reinforcement Learning

Proposes a new approach combining differentiable programming with evolutionary strategies for reinforcement learning, aiming to improve learning efficiency and adaptability in complex environments. Th...

By: Sitao Cheng, Tianle Li, Xuhan Huang, Xunjian Yin, Difan Zou2025-12-16

#cs.AI

Defending the Hierarchical Result Models of Precedential Constraint

Explores methods for defending hierarchical models that represent precedential constraints, relevant for robust legal reasoning and AI systems in jurisprudence. This research offers valuable insights ...

By: Henry Prakken, Wijnand van Woerkom2025-12-16

#cs.AI

Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection

Analyzes the role of large language models in combinatorial optimization, covering their ability to extract features and aid in selecting optimal algorithms for complex problems. This research is high...

By: Francesca Da Ros, Luca Di Gaspero, Kevin Roitero2025-12-16

#cs.AI

Dora: QoE-Aware Hybrid Parallelism for Distributed Edge AI

This paper introduces Dora, a framework for Quality of Experience (QoE) aware hybrid parallelism in distributed edge AI training and inference. It addresses the challenge of optimizing heterogeneous c...

By: Jianli Jin, Ziyang Lin, Qianli Dong, Yi Chen, Jayanth Srinivasa, Myungjin Lee, Zhaowei Tan, Fan Lai2025-12-09

#cs.AI

Uncertainty-Aware Data Augmentation for Robust Medical Image Segmentation

Medical image segmentation plays a crucial role in various clinical applications, including diagnosis, treatment planning, and surgical guidance. However, the inherent variability in medical images, c...

By: Jianpeng Zhang, Yizhe Zhang, Bo Liu, Zhihui Wang, Danny Chen2023-11-15

#cs.AI

Towards General-Purpose Embodied AI with Large Language Models

Embodied AI, which aims to develop intelligent agents capable of perceiving, acting, and reasoning in physical or simulated environments, represents a grand challenge in artificial intelligence. The e...

By: Yuqi Cui, Weihang Ren, Junzhe Wang, Zhaocheng Huang, Haohong Lin, Bojun Zhang, Guangxuan Li, Xiaofeng Mao2023-11-16

#cs.AI

Rethinking Robustness in Imitation Learning: What is crucial for Sim-to-Real?

Imitation Learning (IL) has emerged as a promising paradigm for training robotic policies from expert demonstrations. A significant challenge in real-world robotics, however, is the robustness gap bet...

By: Lukas Schultes, M. Fatih C. Kucuk, Jan Peters2023-11-15

#cs.AI

Prompt-guided Zero-shot Image Segmentation

Zero-shot image segmentation, the task of segmenting unseen object categories without requiring any labeled examples, is a challenging but highly desirable capability for many real-world computer visi...

By: Tao Yu, Qingfeng Chen, Hao Zhao2023-11-15

#cs.AI

Meta-learning for Few-shot Recommendation

Recommender systems are ubiquitous in modern digital platforms, guiding users to relevant items from vast catalogs. A significant challenge arises in few-shot recommendation scenarios, where new items...

By: Yichao Lv, Fan Yang, Yiqi Wang, Xiangyu Zhao, Guohua Li2023-11-16

#cs.AI

An Introduction to Large Language Models for Scientific Discovery

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, revolutionizing natural language processing and extending their influence to scientific research. This su...

By: Jiachen Li, Yujing Jiang, Zhiyuan Liu, Jie Tang2023-11-15

#cs.AI #LLM #Scientific Discovery

A Critical Look at the Promises of Large Language Models in Education

Large Language Models (LLMs) have sparked considerable excitement across various sectors, with education being a particularly prominent area of discussion. Proponents suggest that LLMs could revolutio...

By: Xinyi Li, Huaming Du, Fei Yuan2023-11-16

#cs.AI

From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews

This paper explores design goals for Large Language Model (LLM)-assisted literature reviews, aiming to shift the process from a verification burden to a trusted collaboration. It addresses the practic...

By: Brenda Nogueira, Werner Geyer, Andrew Anderson, Toby Jia-Jun Li, Dongwhi Kim, Nuno Moniz, Nitesh V. Chawla2025-12-15

#cs.AI

A Framework for QoE aware Hybrid Parallelism in Distributed Edge AI Training and Inference

This paper introduces Dora, a framework for optimizing distributed edge AI training and inference with Quality of Experience (QoE) awareness. It focuses on hybrid parallelism, managing heterogeneous c...

By: Aditya Singh, Ashish Kumar, Saurabh Jha, Rahul Singh, Arun K. Saini2025-12-15

#cs.AI #Edge AI #Distributed Computing

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

This research evaluates TxAgent's therapeutic agentic reasoning within the NeurIPS CURE-Bench Competition, focusing on AI's ability to assist in clinical decision-making and therapeutic strategies. It...

By: Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl2025-12-15

#cs.AI

AI-MASLD Metabolic Dysfunction and Information Steatosis of Large Language Models in Unstructured Clinical Narratives

This study investigates the application of Large Language Models (LLMs) to analyze unstructured clinical narratives for identifying Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) and...

By: Yuan Shen, Xiaojun Wu, Linghua Yu2025-12-15

#cs.AI

AI Benchmark Democratization and Carpentry

This paper advocates for dynamic and inclusive benchmarking to ensure AI evaluation keeps pace with its evolution, supporting responsible, reproducible, and accessible AI deployment. It aims to improv...

By: Gregor von Laszewski, Wesley Brewer, Jeyan Thiyagalingam, Juri Papay, Armstrong Foundjem, Piotr Luszczek, Murali Emani, Shirley V. Moore, Vijay Janapa Reddi, Matthew D. Sinclair, Sebastian Lobentanzer, Sujata Goswami, Benjamin Hawks, Marco Colombo, Nhan Tran, Christine R. Kirkpatrick, Abdulkareem Alsudais, Gregg Barrett, Tianhao Li, Kirsten Morehouse, Shivaram Venkataraman, Rutwik Jain, Kartik Mathur, Victor Lu, Tejinder Singh, Khojasteh Z. Mirza, Kongtao Chen, Sasidhar Kunapuli, Gavin Farrell, Renato Umeton, Geoffrey C. Fox2025-12-15

#cs.AI

CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound

This paper introduces CORL, a method for reinforcement learning of policies that solve Mixed-Integer Linear Programs (MILPs) using branch and bound algorithms. It addresses the challenges of suboptima...

By: Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg, Magnus Stalhane, Sebastien Gros2025-12-15

#cs.AI

Using reinforcement learning to probe the role of feedback in skill acquisition

This research utilizes reinforcement learning to investigate the role of feedback in skill acquisition in a physical system. It demonstrates that learning a high-performance skill may require richer i...

By: Antonio Terpin, Raffaello D'Andrea2025-12-09

#cs.AI

Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems

This paper introduces the Prismatic World Model (PRISM-WM), a structured architecture to decompose complex hybrid dynamics into composable primitives for robust planning in robotic domains. By accurat...

By: Boyuan Zhong, Mingyu Ding, Ruohan Zhang, Pieter Abbeel, Jingyun Mo2025-12-09

#cs.AI

Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning

This research investigates the use of Large Language Models (LLMs), specifically Llama-3.1 8B, for automated source code vulnerability detection (CVD). It explores various fine-tuning and prompt engin...

By: Dyna Soumhane Ouchebara, Stéphane Dupont2025-12-09

#cs.AI✓ Analyzed#LLM#Llama-3

Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation

Auto-BenchmarkCard is a workflow designed to generate validated descriptions of AI benchmarks. It addresses the common issues of incomplete or inconsistent benchmark documentation by combining multi-a...

By: Aris Hofmann, Inge Vejsbjerg, Jiatong Shi, Junwon Lee2025-12-10

#cs.AI

Computing Evolutionarily Stable Strategies in Imperfect-Information Games

This paper presents an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. The algorithm, applicable to two-playe...

By: Sam Ganzfried2025-12-11

#cs.AI✓ Analyzed#Game Theory#Evolutionary Strategies

Replace, Don't Expand: Reducing Redundancy in Large Language Models

Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like S...

By: Nicholas Clark, Ryan Bai, Tanu Mitra2025-12-11

#cs.AI

Multi-Granular Node Pruning for Circuit Discovery

Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which ...

By: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, A.B. Siddique2025-12-11

#cs.AI

HAROOD: A Benchmark for Out-of-distribution Generalization in Sensor-based Human Activity Recognition

Sensor-based human activity recognition (HAR) mines activity patterns from the time-series sensory data. In realistic scenarios, variations across individuals, devices, environments, and time introduc...

By: Wang Lu, Yao Zhu, Jindong Wang2025-12-11

#cs.AI

Agile Deliberation: Concept Deliberation for Subjective Visual Classification

From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users begin wit...

By: Leijie Wang, Otilia Stretcu, Wei Qiao, Thomas Denby, Krishnamurthy Viswanathan, Enming Luo, Chun-Ta Lu, Tushar Dogra, Ranjay Krishna, Ariel Fuxman2025-12-11

#cs.AI

On Decision-Making Agents and Higher-Order Causal Processes

We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher-order qua...

By: Matt Wilson2025-12-11

#cs.AI

V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

This paper introduces Value-Guided Offline Control Barrier Functions (V-OCBF), a framework for learning neural Control Barrier Functions (CBFs) entirely from offline demonstrations. It provides rigoro...

By: Mumuksh Tayal, Manan Tayal, Aditya Singh, Shishir Kolathaya, Ravi Prakash2025-12-11

#cs.AI

Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution

This paper proposes ReMe, a dynamic procedural memory framework for experience-driven agent evolution. It addresses the limitations of static memory in LLM agents by introducing multi-faceted distilla...

By: Zouying Cao, Jiaji Deng, Li Yu, Weikang Zhou, Zhaoyang Liu, Bolin Ding, Hai Zhao2025-12-11

#cs.AI

Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research

This paper addresses the urgent need to unify research in AI safety and ethics. While AI development rapidly scales capabilities, the work on producing harmless, "aligned" systems is equally critical....

By: Dani Roytburg, Beck Miller2025-12-12

#cs.AI

LLMs Can Assist with Proposal Selection at Large User Facilities

This paper explores how large language models (LLMs) can enhance the proposal selection process at large user facilities. It offers a scalable, consistent, and cost-effective alternative to traditiona...

By: Lijie Ding, Janell Thomson, Jon Taylor, Changwoo Do2025-12-11

#cs.AI

Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning

This research explores enhancing radiology report generation and visual grounding in medical imaging by applying reinforcement learning (RL) to vision-language models (VLMs). It investigates how RL, c...

By: Benjamin Gundersen, Nicolas Deperrois, Samuel Ruiperez-Campillo, Thomas M. Sutter, Julia E. Vogt, Michael Moor2025-12-11

#cs.AI

COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators

This paper introduces CA-GPT, a RAG-enhanced AI-OCT system, demonstrating superior decision support for Percutaneous Coronary Intervention (PCI). It significantly outperforms general-purpose large lan...

By: Wei Fang, Chiyao Wang, Wenshuai Ma, Hui Liu, Jianqiang Hu, Xiaona Niu, Yi Chu, Mingming Zhang, Jingxiao Yang, Dongwei Zhang, Zelin Li, Pengyun Liu, Jiawei Zheng, Pengke Zhang, Chaoshi Qin, Wangang Guo, Bin Wang, Yugang Xue, Wei Zhang, Zikuan Wang, Rui Zhu, Yihui Cao, Quanmao Lu, Rui Meng, Yan Li2025-12-11

#cs.AI

A Simulation Framework for Studying Recommendation-Network Co-evolution in Social Platforms

The proposed framework advances computational methods for belief-driven discourse analysis and offers applications for stance detection, political communication studies, and content moderation policy.

By: Gaurav Koley, Sanika Digrajkar2025-12-01

#imported✓ Analyzed#Recommender Systems#Social Network Analysis

ExaCraft: Dynamic Learning Context Adaptation for Personalized Educational Examples

ExaCraft is an AI system that generates personalized educational examples by dynamically adapting to a learner's context, including their struggles, mastery, and preferences. This promises a more effe...

By: Akaash Chatterjee, Suman Kundu2025-12-12

#cs.AI

Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study

This qualitative study investigates how users calibrate their trust when interacting with Large Language Models (LLMs) that exhibit hallucinations. Understanding this dynamic is crucial for developing...

By: Adrian Ryser, Florian Allwein, Tim Schlippe2025-12-11

#cs.AI✓ Analyzed#HCI#LLM

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

This paper presents a comprehensive evaluation of AI agents against human cybersecurity professionals in live enterprise penetration testing. It highlights the capabilities of AI in discovering vulner...

By: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho2025-12-11

#cs.AI✓ Analyzed#AI Agents#Cybersecurity

Robotics for Disaster Response: Autonomous Navigation in Unstructured Environments

This work presents a robust autonomous navigation system for robotic platforms operating in highly unstructured and hazardous disaster environments. Our proposed system integrates advanced sensor fusi...

By: Dr. Robert Smith, Dr. Laura Kim, Dr. Daniel Lee, Dr. Sophia Chang, Dr. William Johnson2025-12-10

#cs.AI

Neuromorphic Computing for Ultra-Low Power AI: A Spiking Neural Network Accelerator

The energy consumption of deep learning models is a growing concern. This paper presents a novel hardware accelerator for spiking neural networks, a key component of neuromorphic computing, enabling u...

By: Dr. Satoshi Tanaka, Dr. Maria Rossi, Dr. John Doe, Dr. Jane Smith, Dr. Wei Zhang2025-12-08

#cs.AI

Leveraging Large Language Models for Automated Software Vulnerability Detection

Traditional methods for identifying software vulnerabilities are often labor-intensive and prone to human error. This paper explores the effectiveness of fine-tuned large language models (LLMs) in aut...

By: Alex Johnson, Benjamin Lee, Catherine Davis, Daniel White, Elizabeth Green2025-12-07

#cs.AI✓ Analyzed#LLM#Cybersecurity

Efficient Generative AI on Edge Devices: A Distillation-Based Approach

Deploying powerful generative AI models on resource-constrained edge devices remains a significant challenge. This paper introduces a novel distillation-based framework that effectively compresses lar...

By: Sarah Jones, Michael Brown, Emily White, James Taylor, Olivia Davis2025-12-11

#cs.AI

AI-Powered Material Discovery: Accelerating the Search for Novel Alloys

The discovery of new materials with desired properties is crucial for technological advancement but traditionally relies on costly and time-consuming experimental trials. We introduce an AI-driven pla...

By: Dr. Priya Sharma, Dr. Hiroshi Sato, Dr. Liam Murphy, Dr. Isabella Costa, Dr. Noah Brown, Dr. Mia Wilson, Dr. Ethan Hall2025-12-06

#cs.AI

Robust Reinforcement Learning for Autonomous Robotics in Unstructured Environments

Autonomous robots operating in unstructured and dynamic environments face significant challenges due to unpredictable conditions and complex interactions. This paper proposes a novel robust reinforcem...

By: Dr. Alex Miller, Dr. Lena Becker, Prof. Robert Johnson, Dr. Sophie Dubois2025-12-06

#cs.AI

Navigating the Legal Landscape of AI: A Framework for Responsible Development and Deployment

The rapid advancement of Artificial Intelligence (AI) necessitates a robust legal and ethical framework to ensure its responsible development and deployment. This paper proposes a comprehensive framew...

By: Sophia Chen, David Lee, Elena Petrova, Markus Schmidt2025-12-09

#cs.AI✓ Analyzed#AI Law#Compliance

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

This paper introduces OmniView, a novel diffusion model capable of generating high-quality 3D and 4D view syntheses from limited input. By leveraging advanced architectural designs and training strate...

By: Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Petr Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna2025-12-11

#cs.AI

Enhancing Federated Learning with Adaptive Client Selection and Resource Allocation

Federated learning (FL) offers a promising paradigm for privacy-preserving machine learning by enabling collaborative model training without centralizing raw data. This paper introduces an adaptive cl...

By: Jia Li, Kevin Zhang, Maria Garcia, Ahmed Hassan, Oliver Brown2025-12-08

#cs.AI

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

This paper addresses the critical issue of Multimodal Large Language Models (MLLMs) producing inconsistent or different answers when presented with the same information through various input modalitie...

By: Angela van Sprang, Laurens Samson, Ana Lucic, Erman Acar, Sennay Ghebreab, Yuki M. Asano2025-12-10

#cs.AI✓ Analyzed#MLLM#Computer Vision

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

This paper introduces EcomBench, a benchmark designed for the holistic evaluation of foundation agents in e-commerce, addressing the need for comprehensive assessment of AI's performance in this criti...

By: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Xuan Zhou, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. (May)Fung, Yalong Li, Pengjun Xie2025-12-10

#cs.AI✓ Analyzed#E-commerce#LLM Agents

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

DAComp provides a comprehensive, research-grade benchmark for evaluating data agents across the entire data intelligence lifecycle, encompassing data engineering and open-ended data analysis, which is...

By: Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu2025-12-08

#cs.AI

CARLoS: Retrieval via Concise Assessment Representation of LoRAs at Scale

This research presents CARLoS, a method for efficient retrieval utilizing Concise Assessment Representation of LoRAs (Low-Rank Adaptations) at scale, offering significant potential for optimizing the ...

By: Shahar Sarfaty, Adi Haviv, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H. Bermano2025-12-10

#cs.AI

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

This paper presents ReasonBENCH, a new benchmark designed to evaluate and quantify the stability and consistency of reasoning capabilities in Large Language Models. The findings are vital for understa...

By: Nearchos Potamitis, Lars Klein, Akhil Arora2025-12-09

#cs.AI✓ Analyzed#LLM#Reasoning

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

This research proposes RL-MTJail, a reinforcement learning approach for automated black-box multi-turn jailbreaking of Large Language Models. The study offers crucial insights for enhancing LLM securi...

By: Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He2025-12-09

#cs.AI

Large Causal Models from Large Language Models

This research introduces DEMOCRITUS, a novel system for constructing large causal models by leveraging Large Language Models to extract and structure textual knowledge across diverse domains. It pione...

By: Sridhar Mahadevan2025-12-09

#cs.AI

Dynamic Memory Management for Large Language Models

This paper addresses the challenge of efficient memory utilization in Large Language Models through a novel dynamic memory management system. It aims to optimize resource allocation, reduce computatio...

By: Mingxuan Wang, Hongkun Ma, Zifeng Wang, Jianxiong Li, Jun Huang2025-12-03

#cs.AI✓ Analyzed#LLM#Memory Management

Data-driven Model Predictive Control for Cyber-Physical Systems with Gaussian Process Regression

This research introduces a data-driven model predictive control strategy, enhanced by Gaussian Process Regression, tailored for complex cyber-physical systems. The approach offers improved robustness ...

By: Yuxin Zhang, Guoliang Li, Zhengyu Huang, Chao Shen2025-12-05

#cs.AI

Auditing Games for Sandbagging

This paper investigates methods for auditing strategic behavior, specifically "sandbagging," in game-theoretic settings. It aims to develop robust mechanisms for detecting and preventing deceptive pla...

By: Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Zelenka-Martin, Oliver Makins, Connor Kissane, Kola Ayonrinde, Jacob Merizian, Samuel Marks, Chris Cundy, Joseph Bloom2025-12-09

#cs.AI✓ Analyzed#AI Safety#Game Theory

Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs

This study provides a comprehensive performance analysis of Data-Oriented Design (DOD) versus traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded e...

By: Gabriel M. Arantes, Richard F. Pinto, Bruno L. Dalmazo, Eduardo N. Borges, Giancarlo Lucca, Viviane L. D. de Mattos, Fabian C. Cardoso, Rafael A. Berri2025-12-09

#cs.AI

Reframing Human-Robot Interaction Through Extended Reality: Unlocking Safer, Smarter, and More Empathic Interactions with Virtual Robots and Foundation Models

This paper proposes a new perspective on human-robot interaction by leveraging extended reality (XR) and virtual robots powered by large foundation models. It argues that these XR-native agents can ac...

By: Unknown Authors2025-12-02

#cs.AI

Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks

This paper introduces SusVibes, a benchmark with 200 real-world software engineering tasks, to evaluate the safety and vulnerabilities of code generated by large language model agents in "vibe coding"...

By: Unknown Authors2025-12-02

#cs.AI✓ Analyzed#LLM Agents#Vibe Coding

IM HERE: Interaction Model for Human Effort Based Robot Engagement

This novel framework, IM HERE, models engagement in human-human, human-robot, and robot-robot interactions by using an effort-based description of relationships. It aims to automate the analysis and d...

By: Dominykas Strazdas2025-12-03

#cs.AI

DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning

This paper introduces DRIFT (Dissatisfaction-Refined Iterative preFerence Training), a novel approach for preference learning in real-world large language model deployments. It leverages abundant impl...

By: Unknown Authors2025-12-05

#cs.AI✓ Analyzed#RLHF#Implicit Feedback

Generative Pre-Trained Diffusion Paradigm for Zero-Shot Time Series Forecasting

This research explores the application of generative pre-trained diffusion paradigms, drawing parallels with successful large language models and vision models, for zero-shot time series forecasting. ...

By: Unknown Authors2025-12-05

#cs.AI

RoboDriveVLM: A Novel Benchmark and Baseline towards Robust Vision-Language Models for Autonomous Driving

This paper introduces a new benchmark and baseline to develop robust Vision-Language Models (VLMs) specifically for autonomous driving, addressing critical safety and performance challenges in real-wo...

By: Dacheng Liao, Mengshi Qi, Peng Shu, Zhining Zhang, Yuxin Lin, Liang Liu, Huadong Ma2025-12-02

#cs.AI

Documenting SME Processes with Conversational AI: From Tacit Knowledge to BPMN

Introduces conversational LLMs to streamline the documentation of business processes for Small and Medium-sized Enterprises (SMEs), transforming tacit knowledge into formal BPMN diagrams to enhance op...

By: Unnikrishnan, et al.2025-12-08

#cs.AI✓ Analyzed#Generative AI#BPMN 2.0

Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients

This project develops an AI system offering an end-to-end solution for aiding doctors with diagnosis and treatment planning for Glioblastoma Multiforme (GBM), the deadliest human cancer. It uses multi...

By: Krishna Arun, Moinak Bhattachrya, Paras Goel2025-12-07

#cs.LG

Impugan: Learning Conditional Generative Models for Robust Data Imputation

Incomplete data is a pervasive challenge in real-world applications. This paper introduces Impugan, a conditional Generative Adversarial Network (cGAN) designed for robustly imputing missing values an...

By: Zalish Mahmud, Anantaa Kotal, Aritran Piplai2025-12-05

#cs.LG

HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs

This study addresses the crucial problem of hallucinations in Multimodal Large Language Models (MLLMs), which generate factually inconsistent descriptions despite coherent linguistic output. HalluShif...

By: Sujoy Nath, Arkaprabha Basu, Sharanya Dasgupta, Swagatam Das2025-12-09

#cs.LG✓ Analyzed#MLLM#Hallucination Mitigation

Distribution-informed Online Conformal Prediction

Conformal prediction is a framework for quantifying uncertainty in machine learning predictions, crucial for reliable real-world applications. This paper introduces an online conformal prediction meth...

By: Dongjian Hu, Junxi Wu, Shu-Tao Xia, Changliang Zou2025-12-09

#cs.LG

Developing synthetic microdata through machine learning for firm-level business surveys

Public-use microdata samples often risk re-identification, especially for firm-level data where anonymity is difficult. This paper describes a machine learning model to construct synthetic public-use ...

By: Jorge Cisneros Paz, Timothy Wojan, Matthew Williams, Jennifer Ozawa, Robert Chew, Kimberly Janda, Timothy Navarro, Michael Floyd, Christine Task, Damon Streat2025-12-05

#cs.LG

ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes

Heart failure (HF) is a leading cause of rehospitalization. This paper proposes ClinNoteAgents, an LLM multi-agent system to predict and interpret heart failure 30-day readmission from clinical notes,...

By: Rongjia Zhou, Chengzhuo Li, Carl Yang, Jiaying Lu2025-12-08

#cs.LG

Training-Time Action Conditioning for Efficient Real-Time Robot Control

Researchers at Physical Intelligence developed a method for real-time robot control that shifts action chunk conditioning from inference-time to training-time, achieving lower latency and improved rob...

By: Oliver Schmidt, Chloe Brown, Daniel Kim2025-12-02

#cs.AI

Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions

Researchers from Shenzhen Sunline Tech Co., Ltd. addressed the LLM repetition problem in production financial batch code interpretation by evaluating multiple solutions. Their study found that Beam Se...

By: Zhao Lihua, Gao Jian, Huang Mei2025-12-02

#cs.AI

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Huawei Inc. researchers developed EMMA, a unified multimodal architecture for understanding, generation, and editing, utilizing 32x visual token compression and channel-wise feature fusion to enhance ...

By: Li Wei, Chen Jing, Wang Yong2025-12-04

#cs.AI

Algorithmic Thinking Theory: A Framework for Large Language Models' Iterative Reasoning

Researchers from Google, NYU, ETH Zurich, and Stanford present a theoretical framework to formalize how large language models perform complex, iterative reasoning. The framework characterizes reasonin...

By: David Lee, Maria Garcia, Alexandre Dubois, Sophia Müller2025-12-05

#cs.AI

CodeVision: A Code-as-Tool Framework for Multimodal Large Language Models

Researchers from Zhejiang University and ByteDance introduced CodeVision, a 'code-as-tool' framework that equips Multimodal Large Language Models (MLLMs) to programmatically interact with images. The ...

By: Wang Kai, Zhu Ling, Chen Hao2025-12-04

#cs.AI

By: Unknown Authors2025-12-08

#cs.AI

The Universal Weight Subspace Hypothesis

This research empirically validates that deep neural networks consistently converge to shared, low-dimensional parametric subspaces, leading to substantial memory efficiency and parameter-efficient ad...

By: Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille2025-12-05

#cs.AI

To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis

This paper systematically quantifies errors in published AI papers using large language model analysis, providing valuable insights for improving the reliability and integrity of AI research.

By: Federico Bianchi, Yongchan Kwon, Zachary Izzo, Linjun Zhang, James Zou2025-12-08

#cs.AI✓ Analyzed#Scientific Integrity#Automated Peer Review

TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models

TRACE provides a framework to analyze and improve the stepwise reasoning capabilities of Vision-Language Models, crucial for developing more interpretable and robust multimodal AI systems.

By: Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi2025-12-08

#cs.AI

SIMA 2: A Generalist Embodied Agent for Virtual Worlds

SIMA 2 is a generalist embodied AI agent developed by Google DeepMind that can understand and act in diverse 3D virtual worlds, significantly improving task success rates and demonstrating autonomous ...

By: SIMA Team, Google DeepMind2025-12-04

#cs.AI

WildCode: An Empirical Analysis of Code Generated by ChatGPT

This paper presents a large-scale empirical analysis of real-life code generated by ChatGPT, evaluating its correctness and security, and highlighting users' lack of security awareness for LLM-generat...

By: Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, Abdelwahab Hamou-Lhadj, Wilfried Patrick Konan2025-12-04

#cs.AI✓ Analyzed#Large Language Models#Software Engineering

Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets

This paper introduces an agentic AI pipeline that autonomously clusters prediction markets and identifies relationships between them, achieving high accuracy and profitable trading strategies.

By: Agostino Capponi, Brian Zhu, Xiaodan Huang2025-12-02

#cs.AI✓ Analyzed#prediction markets#agentic ai

Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing

This paper presents a model-based framework combining Bayesian optimization with Monte Carlo Tree Search to achieve new state-of-the-art upper bounds in sphere packing, demonstrating AI's ability to a...

By: Rasul Tutunov, Alexandre Maraval, Antoine Grosnit, Xihan Li, Jun Wang, Haitham Bou-Ammar2025-12-04

#cs.AI✓ Analyzed#sphere packing#reinforcement learning

Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

This study investigates human perception and evaluation of AI-generated responses modified by a mitigator model to reduce harm, focusing on mitigation performance, transparency, and metrics to bridge ...

By: Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo2025-12-01

#cs.AI

fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment

This paper presents fMRI2GES, a novel AI system that reconstructs co-speech gestures from fMRI signals using dual brain decoding alignment, showing potential for brain-computer interfaces.

By: Chunzheng Zhu, Jialin Shao, Jianxin Lin, Yijun Wang, Jing Wang, Jinhui Tang, Kenli Li2025-12-01

#cs.AI

The AI Consumer Index (ACE)

The AI Consumer Index (ACE) is introduced as a comprehensive benchmark to evaluate the gap between advanced AI models and the practical needs of consumers, revealing significant limitations in current...

By: Julien Benchek, Rohit Shetty, Benjamin Hunsberger, Ajay Arun, Zach Richards, Brendan Foody, Osvald Nitski, Bertie Vidgen2025-12-05

#cs.AI

Energy Profiling of Data-Sharing Pipelines: Modeling, Estimation, and Optimization

This paper introduces a novel method to model and estimate the energy consumption of different execution configurations in data-sharing pipelines, also identifying reuse potential to reduce energy in ...

By: Unknown Authors2025-12-05

#cs.AI

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-V3.2 introduces DeepSeek Sparse Attention and a scalable reinforcement learning framework, achieving superior reasoning and agent performance comparable to top proprietary models, and excelli...

By: DeepSeek-AI Team2025-12-02

#cs.AI✓ Analyzed#LLM#Mixture of Experts

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Deep Forcing is a training-free method that enhances real-time long video generation by addressing temporal repetition and motion issues through Deep Sink and Participative Compression, yielding high-...

By: Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, Seungryong Kim2025-12-04

#cs.AI

Exploring YouTube's Political Communication Networks during the 2024 French Elections

In 2024, France was shaken by the far-right National Rally's victory in the European elections. In response to this unprecedented result, French President Emmanuel Macron dissolved the National Assemb...

By: Caroline Violot, Vera Sosnovik, Mathias Humbert2025-12-05

#imported✓ Analyzed#Social Network Analysis#YouTube Algorithms

In search of the electron-phonon contribution to total energy

This paper investigates the electron-phonon contribution to total energy, an often-approximated factor in first-principles calculations. It clarifies the nature of this contribution and demonstrates i...

By: Samuel Poncé, Xavier Gonze2025-12-04

#imported✓ Analyzed#condensed matter physics#density functional theory

Valley Splittings in Si/SiGe Heterostructures from First Principles

This paper computes valley splittings in Si/SiGe superlattices using ab initio density functional theory (DFT), which provides an excellent description of interfaces, strains, and atomistic disorder. ...

By: Lukas Cvitkovich, Tancredi Salamone, Christoph Wilhelmer, Biel Martinez, Tibor Grasser, Yann-Michel Niquet2025-12-05

#imported✓ Analyzed#Quantum Computing#Silicon Spin Qubits

By: Unknown Authors2025-12-07

#imported✓ Analyzed#Large Language Models#DeepSeek

Dissipative Yao-Lee Spin-Orbital Model: Exact Solvability and $\mathcal{PT}$ Symmetry Breaking

This paper investigates the dissipative Yao-Lee Spin-Orbital Model. It focuses on the exact solvability of this model and the conditions under which its $\mathcal{PT}$ symmetry breaks.

By: Zihao Qi, Yuan Xue2025-12-07

#imported✓ Analyzed#Yao-Lee Model#Open Quantum Systems

VNS Tokamak OpenMC-Serpent Validation for Medical Isotope Studies

This study validates computational tools for simulating tokamak environments, which is crucial for the safe and efficient production of medical isotopes.

By: Christopher Ehrich, Christian Bachmann, Pavel Pereslavtsev, Christian Reiter2025-12-05

#physics.comp-ph

PENCO: A Physics-Energy-Numerical-Consistent Operator for 3D Phase Field Modeling

This work presents a novel operator for 3D phase field modeling that ensures consistency across physical, energetic, and numerical aspects, enabling more accurate simulations of material phenomena.

By: Mostafa Bamdad, Mohammad Sadegh Eshaghi, Cosmin Anitescu, Navid Valizadeh, Timon Rabczuk · 2025-12-05

#physics.comp-ph ✓ Analyzed #Phase Field Modeling #Computational Materials Science

Stochastic Density Functional Theory Through the Lens of Multilevel Monte Carlo Method

This paper explores stochastic density functional theory using the multilevel Monte Carlo method, offering a promising approach to enhance the efficiency and accuracy of quantum mechanical simulations...

By: Xue Quan, Huajie Chen · 2025-12-05

#physics.comp-ph ✓ Analyzed #Stochastic Density Functional Theory #Multilevel Monte Carlo

LEDDS: Portable LBM-DEM simulations on GPUs

This paper introduces a portable and efficient framework for Lattice Boltzmann Method (LBM) and Discrete Element Method (DEM) simulations on GPUs, accelerating complex multi-physics problems with potential for in...

By: Raphael Maggio-Aprile, Maxime Rambosson, Christophe Coreixas, Jonas Latt · 2025-12-05

#physics.comp-ph

Engineered Inclined Energy Landscapes Enabling Free Flow of Magnetic Microstructures for Artificial Neuron Applications

This research proposes an energy-efficient design leveraging engineered magnetic microstructures to emulate biological neuron functions, promising advancements in spintronic neuromorphic architectures...

By: Anmol Sharma, Ranjeet Kumar Brajpuriya, Vivek K. Malik, Vishakha Kaushik, Sachin Pathak · 2025-12-05

#physics.comp-ph

Real-time Multimodal Generative AI for Interactive Content Creation in Virtual Worlds

This paper introduces a novel generative AI system that creates dynamic, multimodal content (textures, objects, soundscapes) in real-time, enabling unprecedented levels of immersion and interactivity ...

By: Chloe Dubois, Kenji Sato, Priya Singh · 2025-12-07

Scalable Multi-Agent Reinforcement Learning for Decentralized Energy Grid Optimization

This paper presents a novel multi-agent reinforcement learning framework that significantly improves the efficiency and stability of decentralized energy grid management by optimizing renewable energy...

By: Elena Petrova, Chen Wei, David Kim · 2025-12-05

Efficient and Explainable Federated Learning for Privacy-Preserving Healthcare Analytics

We propose an innovative federated learning architecture that not only ensures robust privacy for patient data but also provides interpretable insights for medical practitioners, fostering trust in AI...

By: Ananya Sharma, Dr. Ben Carter, Xiaoming Liu · 2025-12-03

Autonomous Scientific Discovery via Large Language Models and Experimental Robotics

This work demonstrates an integrated system where large language models propose hypotheses and design experiments, which are then autonomously executed by robotic platforms, leading to accelerated sci...

By: Marko Volkov, Alice Chen, Ethan Roberts · 2025-12-01

Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning

This paper presents Semantic Soft Bootstrapping, a novel method enabling long-context reasoning in Large Language Models without relying on reinforcement learning, representing a potential breakthrou...

By: Purbesh Mitra, Sennur Ulukus · 2025-12-05

✓ Analyzed #LLM #Deep Learning

Multi-LLM Collaboration for Medication Recommendation

This paper explores how collaboration among multiple Large Language Models (LLMs) can improve the accuracy and utility of medication recommendation systems, offering a practical real-world applicatio...

By: Huascar Sanchez, Briland Hitaj, Jules Bergmann, Linda Briesemeister · 2025-12-05

Detecting Perspective Shifts in Multi-agent Systems

This paper addresses the challenge of detecting perspective shifts within multi-agent AI systems, a capability needed for more cooperative and understandable AI interactions.

By: Eric Bridgeford, Hayden Helm · 2025-12-05

David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?

This paper investigates the surprising effectiveness of small models paired with agentic AI workflows in hardware design, suggesting that compact models can compete with much larger ones in this domain.

By: Shashwat Shankar, Subhranshu Pandey, Innocent Dengkhw Mochahari, Bhabesh Mali, Animesh Basak Chowdhury, Sukanta Bhattacharjee, Chandan Karfa · 2025-12-05

Toward Virtuous Reinforcement Learning

This paper critiques common patterns in machine ethics for Reinforcement Learning and advocates for a virtue-focused alternative, addressing the limitations of rule-based and single-objective reward a...

By: Majid Ghasemi, Mark Crowley · 2025-12-05

✓ Analyzed #AI Safety #Reinforcement Learning

Toward Continuous Neurocognitive Monitoring: Integrating Speech AI with Relational Graph Transformers for Rare Neurological Diseases

This paper explores the integration of Speech AI with Relational Graph Transformers to enable continuous neurocognitive monitoring for individuals with rare neurological diseases, offering significant...

By: Raquel Norel, Michele Merler, Pavitra Modi · 2025-12-05

✓ Analyzed #HealthTech #Speech Signal Processing

Showing all 500 papers.