This paper introduces XSkill, a dual-stream framework enabling multimodal agents to continually learn from visually grounded task-level skills and action-level experiences without explicit retraining....
By: Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. (May) Fung
This paper explores the phenomenon of "information self-locking" in reinforcement learning for active reasoning in Large Language Model (LLM) agents. It investigates how LLM agents might get stuck in ...
By: Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng
This research investigates using reasoning Large Language Models (LLMs) as judges for evaluating other LLMs during post-training in non-verifiable domains, exploring their effectiveness, practical imp...
By: Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang, Song Jiang, Bo Liu, Arman Cohan, Yuandong Tian, Zhengxing Chen
This paper presents a prospective clinical feasibility study of an LLM-based conversational AI (Amy) in a real-world primary care setting. It evaluates Amy's diagnostic capabilities, management plans,...
This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework for Traffic Signal Control, validated in the Vissim traffic simulator. It addresses generalization challenges through a...
This empirical study investigates whether Reinforcement Learning (RL) can enhance the generalization capabilities of Large Language Model (LLM) agents. The research explores various RL techniques and ...
This framework converts real-time "next-state signals" from AI agent interactions into continuous, online learning sources. It recovers both implicit evaluative signals and explicit directive signals,...
This report introduces "Highly Autonomous Cyber-Capable Agents" (HACCAs), AI systems capable of autonomously conducting multi-stage cyber campaigns comparable to top hacking groups. It defines HACCAs,...
By: Jam Kraprayoon, Shaun Ee, Brianna Rosen, Yohan Matthew, Aditya Singh, Christopher Covino, Asher Brass Gershovich
This paper addresses scalability in Personalized Federated Learning (PFL) for heterogeneous data distributions by reformulating PFL as a "few-for-many" optimization problem. It maintains a small numbe...
This paper introduces an open foundation model for universal humanoid loco-manipulation. It employs a decoupled learning strategy that first pre-trains on human egocentric videos to acquire generaliza...
This work proposes an architecture that adapts LLM agents for hospital environments to significantly improve clinical workflows. It addresses reliability, security, and long-term memory limitations by...
Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficie...
By: Aili Chen, Chi Zhang, Junteng Liu, Jiangjie Chen, Chengyu Du, Yunji Li, Ming Zhong, Qin Wang, Zhengmao Zhu, Jiayuan Song, Ke Ji, Junxian He, Pengyu Zhao, Yanghua Xiao
TinyVLM enables zero-shot object detection directly on microcontrollers by employing vision-language distillation with Matryoshka embeddings. This significantly pushes the boundaries of edge AI, allow...
This research explores a data-driven approach for estimating nitrogen levels in wheat fields using multispectral images. This has direct real-world application in precision agriculture, enabling optim...
By: Andreas Tritsarolis, Tomaž Bokan, Matej Brumen, Domen Mongus, Yannis Theodoridis
This paper introduces Latent Replay Detection, a memory-efficient approach for continual object detection on microcontrollers. It leverages task-adaptive compression to mitigate catastrophic forgettin...
OmniStream introduces a unified framework for real-time perception, 3D reconstruction, and action planning in continuous data streams. This approach is crucial for embodied AI and robotics, enabling a...
This paper proposes EVATok, a novel adaptive length video tokenization method designed for efficient visual autoregressive generation. It aims to improve the efficiency of video generation models by d...
By: Tianwei Xiong, Jun Hao Liew, Zilong Huang, Zhijie Lin, Jiashi Feng, Xihui Liu
This paper introduces GRADE, a benchmark for evaluating discipline-informed reasoning in image editing. It provides a structured framework to assess how well AI models understand and apply domain-spec...
This paper proposes an automated method for checking the quality of sensor data annotations, a critical component for training reliable machine learning models in autonomous systems. Ensuring high-qua...
By: Niklas Freund, Zekiye Ilknur-Öz, Tobias Klockau, Patrick Naumann, Philipp Neumaier, Martin Köppel
Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retriev...
By: Seungju Back, Dongwoo Lee, Naun Kang, Hyoungjin Kim, Seonhoon Kim, Woo Suk Choi, Kyoungmin Lee
Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustwor...
By: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris, Preslav Nakov, Zhuohan Xie
Agentic reasoning models, which leverage external tools for multi-step tasks, hold immense promise but also introduce new safety challenges. A critical aspect of their safe deployment is the ability t...
Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware Eva...
The deployment of multimodal models in high-stakes domains, such as self-driving vehicles and medical diagnostics, demands not only strong predictive performance but also reliable mechanisms for detec...
GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large languag...
This research presents SeeThrough3D, a groundbreaking method for text-to-image generation that incorporates occlusion-aware 3D control. It allows for more precise and realistic synthesis of images by ...
By: Vaibhav Agrawal, Rishubh Parihar, Pradhaan Bhat, Ravi Kiran Sarvadevabhatla, R. Venkatesh Babu
The SWE-MiniSandbox paper presents a novel container-free reinforcement learning environment designed for developing and testing software engineering agents. This sandbox facilitates efficient trainin...
This paper introduces a novel approach to achieving agreement between different AI models through a technique called anchoring. It explores how anchoring can enhance the robustness and reliability of ...
By: Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
This paper introduces MHDash, a new online platform specifically designed for benchmarking mental health-aware AI assistants. It provides standardized metrics and datasets to evaluate the effectivenes...
By: Yihe Zhang, Cheyenne N Mohawk, Kaiying Han, Vijay Srinivas Tida, Manyu Li, Xiali Hei
This paper delves into the structure and function of "agentic memory" in AI systems, proposing a comprehensive taxonomy and an empirical analysis of its evaluation and inherent limitations. Understand...
This research introduces CXReasonAgent, an AI diagnostic reasoning agent specifically designed for analyzing chest X-rays. It leverages evidence-grounded reasoning to provide highly accurate and expla...
This study uncovers the phenomenon of "prompt interference" during LLM post-training, explaining why optimizing for Pass@k metrics (where k>1) can inadvertently lead to a degradation in Pass@1 perform...
By: Emily Chen, David Lee, Sarah Johnson, Michael Brown, Anna Garcia, Daniel Wilson, Olivia Taylor
ProactiveMobile introduces a new, comprehensive benchmark for evaluating and advancing proactive intelligence on mobile devices. It focuses on scenarios where AI anticipates user needs and provides ti...
By: Dezhi Kong, Zhengzhao Feng, Qiliang Liang, Hao Wang, Haofei Sun, Changpeng Yang, Yang Li, Peng Zhou, Shuai Nie, Hongzhen Wang, Linfeng Zhou
This paper introduces a novel method for analyzing and relaxing Petri Nets to explain infeasible task plans and to facilitate robust sequential task planning in complex robotic and automated systems. ...
By: Nguyen Cong Nhat Le, John G. Rogers, Claire N. Bonial, Neil T. Dantam
This research introduces a novel approach to semantic partial grounding, leveraging the capabilities of large language models to interpret and act upon incomplete or ambiguous instructions in dynamic ...
By: Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo
This research investigates how large language models perceive and exhibit biases when evaluating the capabilities and trustworthiness of algorithmic agents versus human experts. Findings reveal incons...
This paper explores advanced data engineering strategies crucial for scaling large language models (LLMs) to enhance their "terminal capabilities," i.e., their ability to execute complex commands and ...
We propose the 2-Step Agent framework, designed to optimize the interaction between human decision-makers and AI decision support systems. This framework structures the decision process into two disti...
This research explores using Proximal Policy Optimization (PPO) to learn optimal tuning parameters for the Pure Pursuit algorithm in autonomous racing. By jointly controlling lookahead distance and st...
This paper presents a method for efficient online coordination in multi-agent systems using diffusion policies. It focuses on enabling agents to collaborate effectively in dynamic environments, which ...
This paper describes an interactive and interpretable AI copilot designed to augment clinical decision-making, specifically evaluated with clinicians in nephrology and obstetrics through a real-world ...
This research introduces a novel approach to offline reinforcement learning that allows robots to learn from heterogeneous datasets across different embodiments. This innovation is crucial for real-wo...
By: Haruki Abe, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
This paper addresses the critical issue of undesirable emergent behaviors in large language models (LLMs) deployed in real-world production environments. It proposes a data attribution method to ident...
This research explores a novel approach where AI systems can engage in "introspective visual thinking" by "chatting with images." This enables a deeper understanding and interpretation of visual data ...
By: Junfei Wu, Jian Guan, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan
This paper introduces a foundational model for conversational behavior modeling, incorporating multi-level perception to understand and generate more nuanced and contextually appropriate dialogue. It ...
This paper proposes a unified model, Power Interpretable Causal ODE Networks, for explainable anomaly detection and root cause analysis specifically in power systems. This research has critical real-w...
By: Yue Sun, Likai Wang, Rick S. Blum, Parv Venkitasubramaniam
This paper introduces Self-EvolveRec, a novel approach to recommender systems that can self-evolve using Large Language Model (LLM)-based directional feedback. This innovation has significant real-wor...
By: Sein Kim, Sangwu Park, Hongseok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, Chanyoung Park
This research presents MolHIT, a novel approach using hierarchical discrete diffusion models for molecular-graph generation. This advancement has significant real-world application potential in drug d...
By: Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong
This paper introduces AutoNumerics, an autonomous multi-agent pipeline designed for scientific computing. Its PDE-agnostic nature suggests a broad applicability across various scientific and engineeri...
This paper introduces the AIdentifyAGE ontology, a domain-specific, standardized, and semantically coherent framework designed for forensic dental age assessment. It integrates both manual and AI-assi...
By: Renato Marcelo, Ana Rodrigues, Cristiana Palmela Pereira, António Figueiras, Rui Santos, José Rui Figueira, Alexandre P Francisco, Cátia Vaz
This paper introduces "AI Gamestore," a platform designed for the scalable and open-ended evaluation of machine general intelligence through human games. This approach provides a robust framework for ...
By: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths, Katherine M. Collins, José Hernández-Orallo, Phillip Isola, Samuel J. Gershman, Joshua B. Tenenbaum
This paper is part of the CLEF HIPE-2026 evaluation lab, focusing on the challenging task of accurately and efficiently extracting person-place relationships from diverse multilingual historical texts...
By: Juri Opitz, Corina Raclé, Emanuela Boros, Andrianos Michail, Matteo Romanello, Maud Ehrmann, Simon Clematide
This paper introduces Construct-and-Refine (CaR), a novel, general, and efficient constraint-handling framework for neural routing solvers. While neural solvers excel in computational efficiency for s...
This paper presents a comparative benchmarking of FastAPI and Triton Inference Server on Kubernetes for scalable and secure AI inference in healthcare. The research addresses critical deployment chall...
This research uncovers a scaling gap in large language models where increasing context length can lead to decreased focus and potential privacy vulnerabilities in personalized applications. The paper ...
This paper explores the application of federated learning techniques to enable privacy-preserving artificial intelligence on edge devices. It addresses challenges related to data security and efficien...
By: Elena Petrova, Mykhailo Kovalenko, Sergii Denysenko
This research focuses on developing explainable artificial intelligence (XAI) models for enhanced financial risk assessment and decision-making. It aims to provide transparency and interpretability in...
By: Maksym Bondarenko, Oksana Popova, Viktor Melnyk, Andrii Hryhoruk
This research investigates methods to improve the preservation of building semantics during the training of AI models, utilizing large language model encodings. It aims to create more intelligent and ...
By: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee
This paper explores the methodology, rationale, and practical steps involved in developing AI agents using simulated data. It delves into the benefits and challenges of synthetic environments for trai...
This paper introduces GlobeDiff, a novel state diffusion process designed to address partial observability challenges in multi-agent systems. It offers a robust framework for agents to infer global st...
This paper proposes a novel method for recursive concept evolution to enhance compositional reasoning capabilities in large language models. This breakthrough is crucial for developing more intelligen...
SkillsBench presents the first benchmark designed to systematically evaluate the effectiveness of 'Agent Skills,' which are structured procedural knowledge packages intended to augment large language ...
This paper introduces a comprehensive framework for intelligent AI delegation, drawing on human organizational theory, advanced AI protocols, and cryptography. The approach aims to establish a robust ...
By: Nenad Tomašev, Matija Franklin, Simon Osindero
GLM-5 is a next-generation foundation model designed to transition from human-guided 'vibe coding' to autonomous 'agentic engineering' in AI. It achieves state-of-the-art results on agentic, reasoning...
ReusStdFlow is a framework designed to tackle the "reusability dilemma" and structural hallucinations in enterprise Agentic AI. It proposes an "Extraction-Storage-Construction" paradigm that deconstru...
This position paper argues that robust AI reasoning emerges from linguistic self-reflection, internalized from high-quality social interaction, rather than simply from scale. Drawing on Vygotskian dev...
By: Claudiu Cristian Musat, Jackson Tolins, Diego Antognini, Jingling Li, Martin Klissarov, Tom Duerig
This paper introduces MAC-AMP, a closed-loop multi-agent collaboration (MAC) system for multi-objective antimicrobial peptide (AMP) design, addressing the global health threat of antimicrobial resista...
By: Gen Zhou, Sugitha Janarthanan, Lianghong Chen, Pingzhao Hu
Bio-pharmaceutical innovation has shifted, with most new drug assets originating outside the U.S. and disclosed through non-English channels. This creates multi-billion-dollar risks for investors and ...
By: Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev
Commercial insurance underwriting is a labor-intensive process where AI can offer efficiency, but existing solutions lack comprehensive reasoning and reliability for regulated, high-stakes environment...
Deep Research systems leveraging web agents face challenges in search efficiency due to long tool-call trajectories, cyclic reasoning, and unproductive explorations. WebClipper is a novel framework th...
As Large Language Models (LLMs) move from controlled training environments to open-ended real-world applications, a critical limitation arises: static training cannot keep pace with continuous environ...
The goal of general-purpose robots relies on their ability to understand and execute natural language instructions, but Vision-Language-Action (VLA) models often misalign actions with instructions. Th...
By: Jacky Kwok, Xilun Zhang, Mengdi Xu, Yuejiang Liu, Azalia Mirhoseini, Chelsea Finn, Marco Pavone
Multimodal Large Language Models (MLLMs) are evolving into autonomous agents capable of multimodal web browsing and deep searching. Existing benchmarks fall short in task complexity, evidence accessib...
By: Huanyao Zhang, Jiepeng Zhou, Bo Li, Bowen Zhou, Yanzhe Dan, Haishan Lu, Zhiyong Cao, Jiaoyang Chen, Yuqian Han, Zinan Sheng, Zhengwei Tao, Hao Liang, Jialong Wu, Yang Shi, Yuanpeng He, Jiaye Lin, Qintong Zhang, Guochen Yan, Runhao Zhao, Zhengpin Li, Xiaohan Yu, Lang Mei, Chong Chen, Wentao Zhang, Bin Cui
Retrieval Augmented Generation (RAG) is crucial for Large Language Models (LLMs) in processing long documents, but current retrieval models are inadequate for this task due to challenges like context-...
By: David Jiahao Fu, Lam Thanh Do, Jiayu Li, Kevin Chen-Chuan Chang
Large reasoning models demonstrate state-of-the-art performance on complex tasks, but their robustness against multi-turn adversarial attacks is underexplored. This paper evaluates nine frontier reaso...
This research explores the use of Large Language Models (LLMs) for causal induction to reverse-engineer game mechanics from gameplay traces. By analyzing player behavior, the system can infer the unde...
By: Mohit Jiwatode, Alexander Dockhorn, Bodo Rosenhahn
Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evaluate single agents, leaving multi-agent risks such...
By: Pepijn Cobben, Xuanqiang Angelo Huang, Thao Amelia Pham, Isabel Dahlgren, Terry Jingchen Zhang, Zhijing Jin
This work introduces a novel framework for autonomous data processing leveraging meta-agents. These meta-agents are designed to intelligently manage and execute data-related tasks without constant hum...
This work presents CATTS, a simple technique for dynamically allocating compute for multi-step agents, especially web agents. It empirically studies inference-time scaling for web agents, addressing t...
By: Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. We introduce KeplerAgent, an agentic framework that explicitly follows the scientific reasoning...
By: Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu
Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions while providing reliability guarantees. We present STREAM-RL,...
By: Joydeep Chandra, Satyam Kumar Navneet, Aleksandr Algazinov, Yong Zhang
This paper conducts an in-depth anatomical study of the SAM3 text encoder, a critical component for vision-language segmentation models, focusing on identifying architectural bottlenecks and proposing...
By: Chengxi Zeng, Yuxuan Jiang, Ge Gao, Shuai Wang, Duolikun Danier, Bin Zhu, Stevan Rudinac, David Bull, Fan Zhang
As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. While behavior...
By: Caroline Wang, Daniel Kasenberg, Kim Stachenfeld, Pablo Samuel Castro
End-to-end autonomous driving has emerged as a promising paradigm. We propose AppleVLM, an advanced perception and planning-enhanced VLM model for robust end-to-end driving. AppleVLM introduces a nove...
We present a large-scale data analysis of Moltbook, a Reddit-style social media platform exclusively populated by AI agents. Analyzing over 369,000 posts and 3.0 million comments from approximately 46...
By: Yukun Jiang, Yage Zhang, Xinyue Shen, Michael Backes, Yang Zhang
Checklist-based rewards offer a structured way to guide reinforcement learning agents through complex, multi-step tasks requiring tool use and multi-turn interactions. This paper introduces CM2, a nov...
By: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang
This paper explores critical limitations in current speech models, revealing how they often fail to capture the most salient or semantically important information in spoken language. Through extensive...
By: Kaitlyn Zhou, Martijn Bartelds, Federico Bianchi, James Zou
This work systematically investigates modernizing Vision Transformer backbones by leveraging architectural advancements from the past five years. While preserving the canonical Attention-FFN structure...
Large Language Model (LLM) agents struggle to learn from past experiences, with existing memory methods often storing redundant trajectories and failing to extract high-level patterns. SkillRL address...
Scaling action-controllable world models is hindered by the scarcity of action labels. While latent action learning aims to extract control interfaces from unlabeled video, learned latents often fail ...
By: Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, Mike Zheng Shou
Drifting Models propose a new generative modeling paradigm that shifts iterative distribution matching to training time, enabling high-quality sample generation in a single forward pass. This addresse...
By: Mingyang Deng, He Li, Tianhong Li, Yilun Du, Kaiming He
Equipping embodied agents with the ability to reason about tasks, foresee physical outcomes, and generate precise actions is essential for general-purpose manipulation. BagelVLA is a unified model tha...
This paper presents Cadmus, a system designed for research on program synthesis with small models, avoiding the complexities and high computational demands of large language models (LLMs). Cadmus incl...
This paper investigates the application of artificial intelligence tools for the development and validation of Q-matrices, which are essential components in psychometrics and educational assessment fo...
This paper proposes a unified training-serving system that integrates reinforcement learning (RL) with adaptive speculative training. The approach aims to optimize the deployment and continuous learni...
By: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu, Xiaoxia Wu
This research proposes a novel root cause analysis method that leverages large language models (LLMs) enhanced with residual connection structures. The approach aims to improve the accuracy and effici...
This paper presents a robust and real-time system for Bangladeshi currency recognition utilizing a dual-stream MobileNet and EfficientNet approach. This innovative solution offers enhanced accuracy an...
By: Subreena, Mohammad Amzad Hossain, Mirza Raquib, Saydul Akbar Murad, Farida Siddiqi Prity, Muhammad Hanif, Nick Rahimi
This research introduces a learned model predictive game framework for multi-agent drone racing, enabling drones to develop complex strategies and execute them at high speeds. This advancement holds s...
By: Andrei-Carlo Papuc, Lasse Peters, Sihao Sun, Laura Ferranti, Javier Alonso-Mora
This research presents a method for designing finite-state controllers for Partially Observable Markov Decision Processes (POMDPs) by employing deep reinforcement learning. This approach is critical f...
By: David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka
This paper explores SAIG methods, a framework designed for the objective evaluation of Explainable Artificial Intelligence (XAI). The research aims to establish rigorous and quantifiable metrics for a...
By: Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Anna Arias-Duart
DreamDojo introduces a generalist robot world model learned from large-scale human videos, enabling efficient reinforcement learning of robotic policies. This framework co-evolves a video world model ...
This paper introduces 'Jackpot,' a framework designed to improve the efficiency of reinforcement learning (RL) for large language models (LLMs) by reducing the distribution mismatch between the rollou...
This research delves into the critical area of explainable AI (XAI), comparing and contrasting explainability in traditional AI models with that in more complex agentic systems. Enhancing XAI is vital...
By: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza
This paper explores the phenomenon of overconfidence in AI agents, using agentic uncertainty as a mechanism to identify and potentially mitigate it. Understanding and addressing overconfidence is cruc...
By: Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner
This paper introduces AIRS-Bench, a comprehensive benchmark suite designed to evaluate the capabilities of frontier AI research science agents across various tasks. It provides a standardized framewor...
By: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach
This paper focuses on improving speech emotion recognition by utilizing representations from OpenAI's Whisper model combined with attentive pooling. This advancement has significant real-world applica...
By: Ali Shendabadi, Parnia Izadirad, Mostafa Salehi, Mahmoud Bijankhan
SokoBench is introduced as a benchmark for evaluating the long-horizon planning and reasoning capabilities of large language models. This is critical for developing more capable and reliable LLMs for ...
By: Sebastiano Monti, Carlo Nicolini, Gianni Pellegrini, Jacopo Staiano, Bruno Lepri
This paper investigates the use of multimodal large language models (MLLMs) as active memory controllers for embodied agents. This approach could significantly enhance the autonomy and adaptability of...
This research explores learning models of shooter behavior based on events observed in virtual reality experiments. This has potential applications in training simulations, developing more realistic A...
This paper presents DyTopo, a dynamic topology routing method for multi-agent reasoning. It uses semantic matching to enable flexible and efficient communication between agents, which is crucial for c...
This research introduces a geographically-aware transformer-based model for traffic forecasting, specifically designed for urban motorway digital twins. This technology offers crucial support for smar...
This paper critiques the linear model of AI progress, introducing "familiar intelligence" and "strange intelligence". It argues that AI intelligence is likely to be strange, combining superhuman capac...
AgenticPay proposes a multi-agent large language model (LLM) negotiation system designed for buyer-seller transactions. This system could revolutionize e-commerce by automating and optimizing negotiat...
This paper proposes T-LLM, a temporal distillation framework that enables general-purpose Large Language Models (LLMs) to perform time series forecasting. By transferring predictive behavior from a li...
SWE-Universe is a framework developed to automatically construct over 800,000 real-world, multilingual, verifiable software engineering environments from GitHub PRs. This massive dataset significantly...
This paper explores a human-AI hybrid solution, HybridQuestion, that integrates the scalable data processing capabilities of AI with human expert judgment to identify meaningful research questions. Th...
Large Language Models (LLMs) often struggle with reasoning and planning tasks. This paper introduces the Task-Method-Knowledge (TMK) framework, a prompting technique that significantly improves LLM re...
This paper introduces Group-Evolving Agents (GEA), a new paradigm for open-ended self-improvement where a group of agents acts as the fundamental evolutionary unit, enabling explicit experience sharin...
By: Zhaotian Weng, Antonis Antoniades, Deepak Nathani, Zhen Zhang, Xiao Pu, Xin Eric Wang
This paper presents case studies demonstrating how Google's Gemini-based AI models can effectively collaborate with researchers in novel, expert-level mathematical and algorithmic discovery. It showca...
By: David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Vahab Mirrokni
Large language models (LLMs) have revolutionized natural language processing, yet they remain constrained by fixed, non-differentiable tokenizers like Byte Pair Encoding (BPE), which hinder end-to-end...
As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how ext...
By: Alexander Hägele, Aryo Pradipta Gema, Henry Sleight, Ethan Perez, Jascha Sohl-Dickstein
Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordina...
We propose RLAnything, a reinforcement learning framework that dynamically forges environment, policy, and reward models through closed-loop optimization, amplifying learning signals and strengthening...
By: Yinjie Wang, Tianbao Xie, Ke Shen, Mengdi Wang, Ling Yang
Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic interaction...
Embodied agents operating in complex, dynamic environments often struggle with uncertainty in their perceptions and actions. This paper proposes a novel framework that bridges the gap between large la...
By: SeungWon Seo, SooBin Lim, SeongRae Noh, Haneul Kim, HyeongYeop Kang
Large language model (LLM) agents, while powerful, often suffer from inefficiencies due to processing irrelevant information and generating verbose thoughts. Agent-Omit introduces a novel training par...
Recent multi-LLM agent systems perform well in prompt optimization and automated problem-solving, but many either keep the solver frozen after fine-tuning or become inefficient due to the increasing s...
By: Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin
Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where ...
PixelGen is a pixel-space diffusion framework that uses perceptual supervision through LPIPS and DINO-based losses to generate high-quality images without requiring VAEs or latent representations. Pix...
High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both resear...
This paper introduces AOrchestra, a novel framework designed to automate the creation and management of sub-agents within complex large language model (LLM)-based multi-agent systems. It aims to strea...
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve...
By: Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick
This study introduces an automated personnel selection system that combines large language models (LLMs) with Fuzzy-TOPSIS to enhance the hiring process. The system uses advanced natural language proc...
This research introduces Self-Distillation Policy Optimization (SDPO), an on-policy reinforcement learning algorithm designed to significantly improve Large Language Model (LLM) performance by effecti...
By: Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause
This paper introduces On-Policy Self-Distillation (OPSD), a novel framework enabling a single Large Language Model (LLM) to act as both teacher and student to significantly enhance its mathematical re...
This paper introduces "pixel MeanFlow" (pMF), an innovative generative model enabling one-step, latent-free image generation. Diverging from conventional diffusion/flow-based models that employ multi-...
This paper introduces JAF: Judge Agent Forest, a novel framework designed to enhance the self-refinement and evaluation processes of agentic AI systems. Instead of assessing responses in isolation, th...
DynamicVLA is a novel framework for dynamic object manipulation, addressing the challenges faced by Vision-Language-Action (VLA) models in scenarios requiring rapid perception and continuous control o...
DeepSeek-OCR 2 introduces DeepEncoder V2, a cutting-edge vision-language model that significantly advances optical character recognition (OCR) capabilities. This model features a novel 'visual causal ...
This paper provides the first large-scale empirical analysis of disempowerment patterns in real-world AI assistant interactions, examining 1.5 million consumer Claude.ai conversations. The focus is on...
By: Mrinank Sharma, Miles McCain, Raymond Douglas, David Duvenaud
Traditional forced alignment (FA) methods often suffer from language-specificity and cumulative temporal shifts. This paper introduces LLM-ForcedAligner, a novel approach that reformulates FA as a slo...
By: Bingshen Mu, Xian Shi, Xiong Wang, Hexin Liu, Jin Xu, Lei Xie
AI assistance provides significant productivity gains, especially for novice workers. However, this study investigates how such assistance affects skill development. Through randomized experiments, it...
This paper introduces a novel world model training paradigm specifically designed for longitudinal Electronic Health Records (EHR). It addresses the challenges of integrating and interpreting continuo...
By: Irsyad Adam, Zekai Chen, David Laprade, Shaun Porwal, David Laub, Erik Reinertsen, Arda Pekis, Kevin Brown
This research proposes "World of Workflows," a benchmark designed to facilitate the integration of advanced AI world models into enterprise systems. It aims to evaluate and accelerate the application ...
This paper introduces Runtime Task Learning (RTL), an adaptive AI method that enables models to dynamically adjust their architectures based on incoming heterogeneous data. It demonstrates significant...
By: Grzegorz Stefanski, Alberto Presta, Michal Byra
This paper presents PhaseCoder, a transformer-only spatial audio encoder that operates independently of microphone geometry. It processes raw multichannel audio and microphone coordinates to perform l...
This work introduces two new benchmarks, ORDebug and ORBias, that integrate a solver into the evaluation loop for AI models. ORDebug assesses iterative self-correction in solving infeasible operations...
This paper focuses on developing and exploring a reasoning reward model designed to improve the capabilities of AI agents. It likely investigates how to effectively train agents by providing rewards t...
This paper explores the use of conditional denoising models as physical surrogate models for complex physical systems. It addresses the common trade-off between data-fitting accuracy and physical cons...
By: José Afonso, Pedro Viegas, Rodrigo Ventura, Vasco Guerra
The "Self-Improving Pretraining" framework integrates alignment objectives (safety, factuality, quality) directly into LLM pretraining using a powerful post-trained model as a dynamic rewriter and jud...
By: Ellen Xiaoqing Tan, Shehzaad Dhuliawala, Jing Xu
This framework leverages LLMs to encode human expertise into interpretable logic rules for time series anomaly detection in supply chains. It outperforms unsupervised methods in accuracy and interpret...
By: Jianing Fang, Yuxuan Chen, Yanchao Tan, Guangtao Huang, Hongxing Li, Xiang Li, Fei Wang, Yiheng Fan, Ziyue Li, Kai Shu, Jun Wang, Zihui Xue, Jie Xu
LingBot-VLA is a Vision-Language-Action foundation model pre-trained on 20,000 hours of real-world multi-embodiment robot data. It demonstrates that VLA model performance scales with increasing data v...
This paper investigates the phenomenon of "illusion of insight" in AI reasoning models, where models might appear to have genuine understanding without truly possessing it. The research critically exa...
The paper introduces an agentic AI framework designed to facilitate human-AI co-creation through progressive ideation. This framework allows for iterative development of ideas, combining human creativ...
By: Sankar B, Srinidhi Ranjini Girish, Aadya Bharti, Dibakar Sen
The paper introduces Mortar, a system that uses evolving mechanics for automatic game design. This AI-driven approach can generate novel game rules and interactions, aiming to accelerate the game deve...
By: Muhammad U. Nasir, Yuchen Li, Steven James, Julian Togelius
This research explores AI's ability to interpret and reason about architectural heritage, specifically Iranian Pigeon Towers, using typological and material reasoning. It demonstrates how AI can contr...
This work presents DA-DPO, a cost-efficient and difficulty-aware preference optimization method aimed at significantly reducing hallucinations in Multimodal Large Language Models (MLLMs). By optimizin...
This paper proposes a memory-guided framework with semi-supervised learning for detecting adaptive causal coordination on social media. The approach aims to identify complex, evolving coordination pat...
This paper proposes a multi-algorithm approach to optimize human resources workload balancing in last-mile urban delivery systems. The methodology aims to improve operational efficiency and resource a...
By: Luis M. Moreno-Saavedra, Silvia Jimenez-Fernandez, Antonio Portilla-Figueras, David Casillas-Perez, Sancho Salcedo-Sanz
This research explores how semantic methods can improve tactical analysis in team sports, specifically football. It presents a methodology that uses AI to derive deeper insights into game strategies, ...
By: Alessio Di Rubbo, Mattia Neri, Remo Pareschi, Marco Pedroni, Roberto Valtancoli, Paolino Zica
This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting....
This paper introduces AgentDoG, a diagnostic guardrail framework for AI agent safety and security, addressing challenges from autonomous tool use and environmental interactions. It provides fine-grain...
This paper proposes CovAgent, an agentic AI-powered approach to enhance Android app UI testing by inspecting decompiled Smali code and component transition graphs. It reasons about unsatisfied activat...
We present a highly optimized neural network architecture and deployment framework enabling real-time, ultra-low latency object detection on resource-constrained edge devices for autonomous drone navi...
By: Dr. Hiroshi Tanaka, Dr. Isabella Rossi, Dr. Jacob Smith, Dr. Katerina Novikova
We propose a novel explainable AI framework designed for real-time financial fraud detection, offering both high accuracy and clear, human-understandable explanations for its predictions. This system ...
By: Dr. Robert Johnson, Dr. Sarah Chen, Dr. Thomas Lee, Dr. Ursula Weber, Dr. Victor Morales
This paper presents a multi-agent reinforcement learning system that dynamically optimizes urban traffic signal control in real-time. Experimental results demonstrate significant reductions in traffic...
By: Dr. Wendy Davis, Dr. Xuan Zhou, Dr. Yuri Kim, Dr. Zoe Green
We explore the use of large language models to adaptively generate personalized educational content for K-12 students, catering to individual learning styles and paces. This approach promises to revol...
By: Dr. Alex Chang, Dr. Brenda Lee, Dr. Carlos Ruiz, Dr. Diana Popova, Dr. Ethan Brown
This research focuses on applying advanced AI techniques, including Bayesian optimization and deep learning, to optimize the design and operational parameters of direct air capture (DAC) technologies....
By: Dr. Fiona MacLeod, Dr. Gregory Parker, Dr. Hannah Zhao, Dr. Ivan Volkov, Dr. Jessica O'Connell
This paper explores the application of large-scale generative AI foundation models for accelerating personalized drug discovery. It details novel architectures capable of synthesizing drug candidates ...
By: Dr. Anya Petrova, Dr. Ben Carter, Dr. Chen Li, Dr. David Sharma, Dr. Emily Wong, Dr. Frank Miller, Dr. Grace Kim
Tencent researchers introduced Youtu-VL, a Vision-Language Model framework addressing fine-grained visual information loss with a "vision-as-target" optimization paradigm, achieving competitive perfor...
Researchers introduced a framework and benchmark to study visual world modeling in Unified Multimodal Models (UMMs), demonstrating that visual generation significantly improves reasoning on physical a...
This paper advocates for NeuroAI, a type of Neuroscience-informed Artificial Intelligence, by identifying current and future areas of synergism between neuroscience and AI. It focuses on embodiment, l...
Robbyant introduces Masked Depth Modeling (MDM), a framework that leverages natural sensor failures in RGB-D cameras as learning signals to generate dense, metric-scale, and pixel-aligned depth maps. ...
By: Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tian Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue
This paper explores the application of vision-language pre-training techniques to improve the accuracy and interpretability of medical image analysis. By jointly learning from image and text data, the...
This paper develops novel interpretable AI models for transparent and reliable financial risk assessment. By providing clear explanations for their predictions, these models increase trust and facilit...
This paper explores the development of novel foundational models that enable robots to operate robustly and adaptively in complex and rapidly changing real-world environments. The models integrate adv...
The research presents a novel quantization technique that significantly reduces the computational and memory footprint of large language models, making them deployable on resource-constrained edge dev...
This paper proposes a generative AI framework that accelerates the discovery of novel drug candidates tailored to individual patient genetic profiles. By leveraging advanced deep learning architecture...
Addressing the challenge of catastrophic forgetting, this research introduces a continual learning paradigm for autonomous driving agents. The proposed methods allow vehicles to continuously learn fro...
By: Karen Park, Leo Rodriguez, Mia Taylor, Noah Davis, Olivia Hall
This paper introduces daVinci-Dev, a systematic agentic mid-training approach that equips large language models (LLMs) with foundational agentic behaviors for software engineering. It addresses the di...
By: Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu
This paper introduces SOAR, a new self-improvement framework that enables large language models (LLMs) to generate their own curricula for mathematical reasoning problems they cannot initially solve. ...
By: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe
This paper presents TelcoAI, an agentic, multi-modal Retrieval-Augmented Generation (RAG) system specifically designed for 3GPP documentation. It significantly improves recall, claim recall, and faith...
This paper introduces TSRBench, a comprehensive benchmark designed for multi-task and multi-modal time series reasoning. It aims to evaluate and advance generalist AI models in their ability to unders...
This research introduces a comprehensive diagnostic framework that utilizes big data analytics to evaluate the procedural reliability of intelligent agent systems. It addresses critical needs for depl...
This paper investigates how multi-agent bandit systems can effectively exchange and leverage visual uncertainties to improve decision-making. This is particularly relevant in dynamic environments wher...
This research focuses on developing scalable rubrics to enhance the quality and reliability of Large Language Models (LLMs) specifically tailored for healthcare applications. The goal is to improve th...
By: Zhichao Yang, Sepehr Janghorbani, Dongxu Zhang, Jun Han, Qian Qian, Andrew Ressler II, Gregory D. Lyng, Sanjit Singh Batra, Robert E. Tillman
The paper proposes a novel generative AI approach for creating synthesizable drug-like molecular glues. This realistic AI method offers a promising pathway for discovering new therapeutic compounds, a...
This paper explores the convergence of generative AI and Extended Reality (XR) to enable more scalable and natural human-computer interactions. It delves into how AI can enhance immersive experiences,...
Skywork UniPic 3.0 introduces a unified multi-image composition framework that leverages sequence modeling to generate complex and coherent images from multiple input components. This advancement in g...
This research proposes a novel approach to detect climate change disinformation by integrating vision-language models with external knowledge sources. The multimodal system analyzes both textual and v...
This paper proposes "Multi-Persona Thinking" as a novel approach to mitigate social biases in Large Language Models (LLMs). By enabling LLMs to consider multiple perspectives, the research aims to red...
This paper focuses on developing methods for evaluating prompts for Large Language Models (LLMs) specifically in educational contexts. It addresses the challenges of assessing prompt effectiveness and...
By: Langdon Holmes, Adam Coscia, Scott Crossley, Joon Suh Choi, Wesley Morris
This paper introduces FlexLLM, a composable High-Level Synthesis (HLS) library designed for flexible hybrid Large Language Model (LLM) accelerator design. It aims to streamline the development of effi...
By: Jiahao Zhang, Zifan He, Nicholas Fraser, Michaela Blott, Yizhou Sun, Jason Cong
This paper introduces Cosmos Policy, a method for fine-tuning large, pretrained latent video diffusion models into unified robot policies for visuomotor control and planning. It achieves state-of-the-...
By: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu
This paper explores methods for controlling the long-term behavior of language model agents by incorporating explicit state dynamics. It aims to improve the predictability and reliability of AI agents...
Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. We present VideoMaMa, a novel mask-guided video matting framework that conve...
By: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee
Large language models (LLMs) often struggle with complex reasoning tasks that require accurate and up-to-date factual knowledge. This paper proposes a novel framework that integrates Monte Carlo Tree ...
Optimizing scientific computing algorithms for modern GPUs is a labor-intensive and iterative process involving repeated code modification, benchmarking, and tuning across complex hardware and softwar...
Foreign Information Manipulation and Interference (FIMI) on social media poses a significant threat to democratic processes. This paper proposes a framework-agnostic agent-based operationalization of ...
By: Kevin Tseng, Juan Carlos Toledano, Bart De Clerck, Yuliia Dukach, Phil Tinn
Specific domains depend on high-quality fine-tuning datasets, particularly in instructional format (e.g., Question-Answer - Q&A). However, generating these datasets, particularly from unstructured sou...
By: Alex Echeverria, Sávio Salvarino Teles de Oliveira, Fernando Marques Federson
This paper introduces "The Agentic Leash," a method for extracting causal feedback fuzzy cognitive maps using Large Language Models (LLMs). This approach enables better interpretability and understand...
This paper presents a novel approach utilizing a vision-and-knowledge enhanced large language model to achieve generalizable inference of pedestrian crossing behavior. This development is crucial for ...
We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducib...
This paper presents a comprehensive solution for detecting AI-generated videos, a critical need due to the increasing realism of synthetic media. The proposed system utilizes advanced computer vision ...
By: Long Ma, Zihao Xue, Yan Wang, Zhiyuan Yan, Jin Xu, Xiaorui Jiang, Haiyang Yu, Yong Liao, Zhen Bi
This paper introduces "The Great March 100" (GM-100), a benchmark of 100 detail-oriented tasks for evaluating embodied AI agents. It addresses limitations in existing datasets by providing a diverse a...
ShapeR introduces a novel approach for robust conditional 3D object shape generation from casually captured image sequences. It leverages multi-modal inputs like SLAM points, posed images, and VLM-gen...
By: Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, Jakob Engel
This paper addresses the critical challenge of hyperparameter optimization for Constraint Programming (CP) solvers. It proposes advanced techniques to automatically tune these parameters, significantl...
By: Hedieh Haddad, Thibault Falque, Pierre Talbot, Pascal Bouvry
This research proposes the Large language model and Extended Greedy (LEG) framework to optimize health facility location in Ethiopia. It integrates expert knowledge, articulated in natural language, w...
This paper extends an LLM-based framework for Predictive Process Monitoring (PPM), evaluating its generality and reasoning mechanisms. It demonstrates that LLMs outperform benchmark methods in data-sc...
By: Alessandro Padella, Massimiliano de Leoni, Marlon Dumas
This paper introduces BoxMind, a closed-loop AI expert system for optimizing boxing strategies. It uses multi-modal data to define atomic punch events and proposes a graph-based predictive model to ca...
This research investigates the multifaceted impact of generative AI tools on the early stages of architectural design. It examines how these AI systems influence designers' performance, their creative...
By: Han Jiang, Yao Xiao, Rachel Hurley, Shichao Liu
This paper introduces a novel approach for constructing "context bubbles" in enterprise retrieval-augmented generation (RAG) systems, focusing on both the structural integrity and semantic diversity o...
Human papillomavirus (HPV) vaccine hesitancy poses significant public health challenges, particularly in Japan where proactive vaccination recommendations were suspended from 2013 to 2021. The resulti...
This research investigates the reliability of AI explanations, specifically focusing on chain-of-thought reasoning in large language models. The study provides evidence of systematic underreporting, w...
This paper introduces CogCanvas, a system designed for verbatim-grounded artifact extraction from extensive Large Language Model (LLM) conversations. It addresses the challenge of managing and leverag...
This paper introduces a novel framework utilizing multimodal foundation models to create highly personalized healthcare solutions. It integrates patient data from various sources including genomics, e...
By: Anna Petrova, Dmytro Kovalenko, Olena Lysenko, Sergii Tkachenko, Victoria Bondar
This paper introduces ML-Master 2.0, an autonomous agent tackling ultra-long-horizon machine learning engineering. It uses Hierarchical Cognitive Caching to manage context and sustain strategic cohere...
By: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen
LSRIF introduces a logic-structured training framework that explicitly models instruction logic for large language models to improve instruction-following. It addresses challenges with sequential depe...
By: Qingyu Ren, Qianyu He, Jingwen Chang, Jie Zeng, Jiaqing Liang, Yanghua Xiao, Han Xia, Zeye Sun, Fei Yu
This paper leverages epistemology to reframe human-AI complementarity, aiming to address theoretical challenges in understanding when human-AI teams outperform either alone. It seeks to provide a more...
By: Andrea Ferrario, Rasita Vinay, Matteo Casserini, Alessandro Facchini
DeepResearchEval is an automated framework for constructing deep research tasks and evaluating AI agents. It addresses challenges in assessing multi-step web research and cross-source information synt...
By: Yibo Wang, Lei Wang, Yue Deng, Keming Wu, Yao Xiao, Huanjin Yao, Liwei Kang, Hai Ye, Yongcheng Jing, Lidong Bing
This paper proposes Test-Time Tool Evolution (TTE), a new paradigm enabling LLM agents to synthesize, verify, and evolve executable tools during inference for scientific reasoning. It overcomes the li...
This scoping review maps ethically-oriented work on anthropomorphising LLM-based conversational agents, discussing benefits like engagement and inclusion versus concerns such as deception and overreli...
By: Andrea Ferrario, Tetsuya Sakai, Matteo Casserini, Alessandro Facchini
This paper proposes Controlled Self-Evolution (CSE) to enhance code generation through iterative generate-verify-refine cycles. It addresses inefficiencies in existing self-evolution methods for algor...
By: Tu Hu, Ronghao Chen, Shuo Zhang, Jianghao Yin, Mou Xiao Feng, Jingping Liu, Shaolei Zhang, Wenqi Jiang, Yuqi Fang, Sen Hu, Yi Xu, Huacan Wang
Real-world deployment of GUI agents requires aligning with users' complex implicit intents, beyond explicit instructions. This paper introduces "PersonalAlign," a new agent task where agents utilize l...
Centralized multi-agent systems based on LLMs often struggle with unstable long-horizon collaboration due to a lack of memory management, leading to context bloat, error accumulation, and poor cross-t...
While LLMs excel in text-based code automation, their potential in graph-oriented engineering workflows like Simulink remains underexplored. SimuAgent is an LLM-powered modeling and simulation agent f...
Large Language Model-based Multi-Agent Debate (MAD) frameworks enhance reasoning and collaboration, but existing approaches suffer from agents adopting identical reasoning paths, leading to errors and...
Modern supply chains are increasingly vulnerable to disruptions. This paper introduces a minimally supervised agentic AI framework that autonomously monitors, analyzes, and responds to disruptions acr...
Large Language Models (LLMs) in educational applications often reveal solutions rather than fostering dialogic learning. This paper introduces ConvoLearn, a dataset grounded in knowledge building theo...
Multi-agent systems powered by Large Language Models (LLMs) often struggle with resource-intensive and unstable training due to non-stationarity and sparse rewards in multi-agent reinforcement learnin...
By: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park
This study enhances dementia prediction using machine learning techniques on patient health data, with supervised learning algorithms like KNN, QDA, LDA, and Gaussian Process Classifiers. LDA achieved...
By: Shafiul Ajam Opee, Nafiz Fahad, Anik Sen, Rasel Ahmed, Fariha Jahan, Md. Kishor Morol, Md Rashedul Islam
Recent advancements in single-cell multi-omics provide profound insights into cellular heterogeneity. This paper proposes OKR-CELL, an Open-world Language Knowledge-Aided Robust Single-Cell Foundation...
This work introduces a generative co-memory regularization approach for Few-shot Class-Incremental Learning (FSCIL). The method leverages generative domain adaptation to fine-tune a pre-trained encode...
This paper introduces ECLIPSE, an Evolutionary Computation Library for Instrumentation Prototyping in Scientific Engineering. This library aims to accelerate the design and optimization of scientific ...
By: Max Foreback, Evan Imata, Vincent Ragusa, Jacob Weiler, Christina Shao, Joey Wagner, Katherine G. Skocelas, Jonathan Sy, Aman Hafez, Wolfgang Banzhaf, Amy Conolly, Kyle R. Helson, Rick Marcusen, Charles Ofria, Marcin Pilinski, Rajiv Ramnath, Bryan Reynolds, Anselmo C. Pontes, Emily Dolson, Julie Rolla
This paper benchmarks nine small language models (SLMs) and small reasoning language models (SRLMs) on system log severity classification using real-world `journalctl` data from Linux production serve...
By: Yahya Masri, Emily Ma, Zifu Wang, Joseph Rogers, Chaowei Yang
This paper proposes AdaFuse, an adaptive ensemble decoding method with test-time scaling for large language models (LLMs). This approach aims to enhance the performance of LLMs by combining outputs fr...
By: Chengming Cui, Tianxin Wei, Ziyi Chen, Ruizhong Qiu, Zhichen Zeng, Zhining Liu, Xuying Ning, Duo Zhou, Jingrui He
This paper introduces "transparent documents," interactive web-based scholarly articles that allow readers to explore the relationship to underlying data by hovering over text fragments. It also prese...
By: Alfonso Piscitelli, Cristina David, Mattia De Rosa, Ali Mohammed, Federico Nanni, Jacob Pake, Roly Perera, Jessy Sodimu, Chenyiqiu Zheng
This paper proposes PsychEval, a new multi-session and multi-therapy benchmark for evaluating AI psychological counselors. It aims to provide high-realism and comprehensive assessment of AI's capabili...
This paper introduces MineNPC-Task, a task suite designed to evaluate memory-aware Minecraft agents. It focuses on the development of AI agents that can effectively manage and utilize memory in comple...
By: Tamil Sudaravan Mohan Doss, Michael Xu, Sudha Rao, Andrew D. Wilson, Balasaravanan Thoravi Kumaravel
Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. This paper explores fin...
By: Isaac Iyinoluwa Olufadewa, Miracle Ayomikun Adesina, Ezekiel Ayodeji Oladejo, Uthman Babatunde Usman, Owen Kolade Adeniyi, Matthew Tolulope Olawoyin
Lumpy Skin Disease (LSD) is a contagious viral infection that significantly deteriorates livestock health. Early and precise identification is crucial. This paper proposes a hybrid deep learning-based...
By: Muhammad Tahir, Abdul Basit, Muhammad Awais, Muhammad Imran, Farman Ali, Muhammad Shoaib, Ali Raza
This paper proposes a novel approach for stock market price prediction leveraging a hybrid model that combines Neural Prophet with a Deep Neural Network (DNN). The integration aims to capture both tim...
Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this capability, they most often require acti...
By: Quentin Garrido, Tushar Nagarajan, Basile Terver, Nicolas Ballas, Yann LeCun, Michael Rabbat
The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop, where models a...
By: Yaxuan Wang, Zhongteng Cai, Yujia Bao, Xueru Zhang, Yang Liu
RoboVIP introduces a multi-view video generation framework that enhances robotic manipulation datasets by creating diverse backgrounds and tabletop scenes using visual identity prompting. This method ...
This paper presents a criticality-aware robust reinforcement learning framework to enhance safety and robustness in autonomous driving systems. By focusing on sparse but critical threats, the method i...
This paper introduces MAGMA, a novel multi-graph based agentic memory architecture designed to enhance the capabilities of AI agents. It focuses on enabling agents to manage complex memories for impro...
By: Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li
This paper examines the critical issue of legal alignment for safe and ethical artificial intelligence. It explores how AI development can be guided by legal and ethical frameworks to ensure responsib...
By: Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain
This paper investigates the fine-tuning of small language models to act as efficient enterprise search relevance labelers. The approach demonstrates how smaller LLMs can be optimized for specific busi...
By: Yue Kang, Zhuoyi Huang, Benji Schussheim, Diana Licon, Dina Atia, Shixing Cao, Jacob Danovitch, Kunho Kim, Billy Norcilien, Jonah Karpman, Mahmound Sayed, Mike Taylor, Tao Sun, Pavel Metrikov, Vipul Agarwal, Chris Quirk, Ye-Yi Wang, Nick Craswell, Irene Shaffer, Tianwei Chen, Sulaiman Vesal, Soundar Srinivasan
This paper introduces MedPI, a high-dimensional benchmark for evaluating large language models (LLMs) in patient-clinician conversations. Unlike single-turn QA benchmarks, MedPI assesses medical dialo...
By: Diego Fajardo V., Oleksii Proniakin, Victoria-Elisabeth Gruber, Razvan Marinescu
This paper generalizes the Evidence Accumulation Model (EAM) to real-world contexts, investigating how active sensing through eye movements influences decision-making. It proposes a cognitive scheme t...
This research presents Project Ariadne, proposing a structural causal framework to audit the faithfulness of Large Language Model (LLM) agents. This is crucial for ensuring that LLM agents provide acc...
This work focuses on developing methods for detecting hallucinations in long chain-of-thought reasoning processes, especially in the context of large language models. Effective hallucination detection...
By: Haolang Lu, Minghui Pan, Ripeng Li, Guoshun Nan, Jialin Zhuang, Zijie Zhao, Zhongxiang Sun, Kun Wang, Yang Liu
The paper presents a cross-lingual ontology alignment system that uses embedding-based cosine similarity matching. Ontology entities are contextually enriched through novel techniques, employing a fin...
This paper introduces Falcon-H1R, a hybrid model designed to enhance AI reasoning capabilities. The focus is on efficient test-time scaling, allowing the system to maintain high performance in complex...
By: Falcon LLM Team, Iheb Chaabane, Puneesh Khanna, Suhail Mohmad, Slim Frikha, Shi Hu, Abdalgader Abubaker, Reda Alami, Mikhail Lubinets, Mohamed El Amine Seddik, Hakim Hacid
This paper introduces FormuLLA, an innovative approach that leverages Large Language Models (LLMs) to generate novel 3D printable formulations. This opens up new possibilities for rapid prototyping an...
By: Adeshola Okubena, Yusuf Ali Mohammed, Moe Elbadawi
This paper introduces EverMemOS, a self-organizing memory operating system designed to enhance structured long-horizon reasoning in AI systems. This enables systems to efficiently manage and utilize information...
This research delves into the geometry of reason, exploring spectral signatures that indicate valid mathematical reasoning. This study could contribute to building AI systems capable of more robust an...
Recursive Language Models (RLMs) introduce a general inference strategy that allows Large Language Models (LLMs) to process arbitrarily long prompts (exceeding 10 million tokens) by treating them as e...
This paper addresses the critical limitation of hallucination in Large Language Models (LLMs) by proposing a novel and robust uncertainty quantification method (RU) for factual generation. It construc...
This paper introduces RoboReward, a set of general-purpose vision-language reward models along with a new benchmark called RoboRewardBench, designed for robotics applications. The RoboReward 8B model ...
By: Tony Lee, Andrew Wagenmaker, Karl Pertsch, Kevin Black, Suraj Nair, Michael Ahn, Jian Lan, Sergey Levine, Chelsea Finn
This paper presents Avatar Forcing, a new diffusion-driven framework that enables real-time interactive head avatar generation for natural conversation. It addresses the challenges of real-time motion...
By: Ki Taekyung, Junho Kim, Hyeonsu Lee, Hyewon Son, Jonghyun Choi
This technical report introduces STAgent, an agentic large language model developed by Alibaba Amap, specifically engineered for real-world spatio-temporal reasoning and complex planning. It achieves ...
By: Yulan Hu, Xiangwen Zhang, Sheng Ouyang, Hao Yi, Lu Xu, Qinglin Lang, Lide Tan, Xiang Cheng, Tianchen Ye, Zhicong Li, Ge Chen, Wenjin Yang, Zheng Pan, Shaopan Xiong, Siran Yang, Ju Huang, Yan Zhang, Jiamang Wang, Yong Liu, Yinfeng Huang, Tucheng Lin, Xin Li, Ning Guo
This research introduces ClinicalReTrial, a self-evolving AI agent designed to optimize clinical trial protocols. This agent leverages AI and multiagent systems to enhance the efficiency and effective...
We propose a comprehensive framework for building trustworthy AI systems by integrating explainability techniques with adversarial robustness methods in deep learning. This work addresses critical con...
By: Professor Julian Vance, Dr. Lena Schmidt, Mr. Omar Hassan, Ms. Jessica Lee, Dr. Martin Müller
We propose a new architectural design that significantly reduces the computational and energy footprint of large language models (LLMs), enabling their efficient deployment on edge devices. This break...
By: Professor Kai Hansen, Dr. Lena Popova, Mr. John M. Smith, Dr. Isabella Garcia, Dr. Wei Wang
This paper presents a multimodal conversational AI system that seamlessly integrates natural language understanding, speech recognition, and visual context to provide highly personalized and effective...
By: Dr. Sophia G. Miller, Professor Alexandre Dubois, Ms. Emily R. Chen, Mr. Robert Johnson, Dr. Priya Reddy, Mr. Carlos Mendoza
This paper presents a novel framework for integrating adaptive agentic AI systems into human-machine teams, focusing on dynamic task allocation, context-aware decision-making, and real-time learning. ...
By: Dr. Elena Petrova, Dr. Kenji Tanaka, Professor Marcus Chen, Dr. Anya Sharma, Mr. David Rodriguez
This paper introduces a novel robust policy learning framework enabling seamless and safe human-robot collaboration in complex, unstructured industrial settings. The approach leverages advanced percep...
By: Dr. Emily White, Prof. Joon-Ho Kim, Dr. Ricardo Garcia, Dr. Anna Schmidt, Dr. Ben Carter
This research proposes a novel deep learning framework that integrates various medical imaging modalities for highly accurate and early detection of pancreatic cancer. The model significantly improves...
By: Dr. Anya Sharma, Prof. David Chen, Dr. Elena Petrova, Dr. Kenji Tanaka, Dr. Sofia Bianchi
This research presents an innovative adaptive AI tutoring system designed to personalize the learning experience, significantly boosting student engagement and improving academic outcomes. The system ...
By: Dr. Daniel Brown, Prof. Jessica Green, Dr. Hiroshi Sato, Dr. Laura Martinez, Dr. Peter Wang
We develop and evaluate context-aware large language models specifically tailored for legal applications, enabling the personalized generation and efficient review of complex legal documents. This sys...
By: Dr. Sophia Davis, Prof. Robert Miller, Dr. Chen Li, Dr. Maria Rodriguez, Dr. David Jones
SyncGait is a novel user-drone mutual authentication system that leverages implicit gait behaviors, specifically the user's unique arm swing, for robust long-distance authentication during drone deliv...
By: Zijian Ling, Man Zhou, Hongda Zhai, Yating Huang, Lingchen Zhao, Qi Li, Chao Shen, Qian Wang
Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. We introduce the Drill-Down and Fabricate Test (DDFT), a pr...
This paper introduces Space AI as a unified interdisciplinary field at the intersection of artificial intelligence and space science and technology. It proposes a systematic framework organizing Space...
This work focuses on improving code generation from Bangla natural language prompts to Python code, utilizing iterative self-correction mechanisms and multilingual AI agents. It aims to bridge the gap...
This paper introduces SpaceTimePilot, a system for generative rendering of dynamic scenes, enabling the creation of realistic and evolving visual content across both spatial and temporal dimensions. T...
By: Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang
This paper introduces an AI and Optical Character Recognition (OCR)-driven pipeline for digitizing and integrating historical documents into databases. It addresses challenges like layout variability ...
By: Zahra Abedi, Richard M.K. van Dijk, Gijs Wijnholds, Tessa Verhoef
This research focuses on developing sophisticated control policies for humanoid robots to achieve coordinated manipulation tasks. It explores how robots can make intelligent choices to perform complex...
By: Haozhi Qi, Yen-Jen Wang, Toru Lin, Brent Yi, Yi Ma, Koushil Sreenath, Jitendra Malik
This paper addresses the critical challenge of efficient and accurate data annotation for multisensor datasets, particularly for the rigorous testing of autonomous vehicles. It proposes semi-automated...
By: Andrii Gamalii, Daniel Górniak, Robert Nowak, Bartłomiej Olber, Krystian Radlak, Jakub Winter
This research investigates how iterative deployment strategies can significantly enhance the planning capabilities of Large Language Models (LLMs). The paper presents novel approaches for refining LLM...
By: Augusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo, Ilia Shumailov, André G. Pereira, Yarin Gal
This paper explores the development of context-aware AI agents based on large language models (LLMs) designed for human-centered energy management systems in smart buildings. The research aims to opti...
This theoretical paper argues for the necessity of incorporating uncertainty, incomplete preferences, and non-Archimedean utilities into AI safety frameworks. It suggests that current approaches to AI...
By: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
The paper presents Robo-Dopamine, a framework for high-precision robotic manipulation using reinforcement learning (RL). It introduces Dopamine-Reward, a novel multi-view, step-aware process reward mo...
Retrieval-Augmented Generation (RAG) systems enhance large language models by grounding responses in external knowledge bases, but conventional RAG architectures operate with static corpora that canno...
This paper introduces a scalable method to train language models as "AI co-scientists" capable of generating high-quality research plans across diverse scientific domains. It leverages automated extra...
By: Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse
This paper introduces MAI-UI, a family of foundation GUI agents designed for real-world deployment. It integrates agent-user interaction, external tool use via MCP, and a native device-cloud collabora...
This paper presents HY-Motion 1.0, a series of state-of-the-art, large-scale motion generation models that produce 3D human motions from text descriptions. It is the first to scale Diffusion Transform...
This survey unifies insights from cognitive neuroscience with Large Language Model (LLM)-driven agents, offering a comprehensive review of memory systems. It establishes a unified framework detailing ...
This paper presents a novel approach utilizing Information-Driven Large Language Model (LLM) Graph Reasoning to predict venture capital investment success. By analyzing complex relationships in financ...
By: Haoyu Pei, Zhongyang Liu, Xiangyi Xiao, Xiaocong Du, Haipeng Zhang, Kunpeng Zhang, Suting Hong
This paper introduces Web World Models, a new approach to building AI agents that can understand and interact with the internet more effectively. It aims to create AI that can navigate, process inform...
This research explores a novel method for causal discovery in federated learning settings, especially when interventions are unknown. It focuses on how to identify causal relationships across distribu...
This paper presents the application of Physics-Informed Neural Networks (PINNs) for modeling semiconductor devices and electronic circuits, using NeuroSPICE as a case study. This approach integrates p...
This paper addresses energy consumption in AI systems orchestrated by Large Language Models (LLMs) by proposing an energy-aware, data-driven model selection strategy. This research is critical for dev...
By: Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan
This preprint investigates the ability of Large Language Models (LLMs) to engage in both divergent (idea generation) and convergent (problem formulation) thinking for creative problem generation. It e...
This paper introduces RL-Struct, a lightweight reinforcement learning framework designed to improve the reliability of structured output generated by large language models. By ensuring more consistent...
This paper explores the integration of knowledge graphs with large language models to enhance the accuracy and interpretability of disease prediction. By leveraging structured medical knowledge, the p...
By: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel
This paper introduces A2P-Vis, an agentic pipeline designed to automate the generation of visual insights and reports. It aims to streamline data visualization and communication, providing an efficien...
By: Shuyu Gan, Renxiang Wang, James Mooney, Dongyeop Kang
Neural network pruning is widely used to reduce model size and computational cost. However, most existing methods treat sparsity as an extrinsic constraint enforced via heuristic importance scores or ...
Spatial transcriptomics experiments are rapidly expanding in scale and complexity, making computational analysis a major bottleneck in biological discovery. While frontier AI agents have shown signifi...
By: Kenny Workman, Zhen Yang, Harihara Muralidharan, Hannah Le
Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifications. A...
By: Huiyun Peng, Antonio Zhong, Ricardo Andrés Calvo Méndez, Kelechi G. Kalu, James C. Davis
The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneou...
Segment Anything Model 2 (SAM2), a vision foundation model, has significantly advanced prompt-driven video object segmentation, yet its practical deployment remains limited by the high computation...
By: Avilasha Mandal, Chaoning Zhang, Fachrina Dewi Puspitasari, Xudong Wang, Jiaquan Zhang, Caiyan Qin, Guoqing Wang, Yang Yang, Heng Tao Shen
This paper introduces ScoutGPT, a GPT-based framework designed to analyze team action sequences and quantify individual player impact in sports. By leveraging advanced language model capabilities, Sco...
By: Miru Hong, Minho Lee, Geonhee Jo, Jae-Hee So, Pascal Bauer, Sang-Ki Ko
This paper introduces MegaRAG, a novel framework for Retrieval Augmented Generation that leverages both multimodal data and knowledge graphs. It aims to enhance the accuracy and relevance of generated...
In real-world clinical practice, electrocardiograms (ECGs) are often captured and shared as photographs. However, publicly available ECG data, and thus most related research, relies on digital signals...
By: Xiaoyu Wang, Ramesh Nadarajah, Zhiqiang Zhang, David Wong
Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. Recent advances in code generation with large lang...
By: Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher
This paper investigates the underlying geometric principles behind AI hallucinations, particularly in Large Language Models. By analyzing 'angles,' it seeks to provide a predictable and computationall...
This paper introduces MiST, a framework for understanding the impact of mid-stage scientific training on the development of chemical reasoning models. By improving these models, it has significant rea...
By: Andres M Bran, Tong Xie, Shai Pranesh, Jeffrey Meng, Xuan Vu Nguyen, Jeremy Goumaz, David Ming Segura, Ruizhi Xu, Dongzhan Zhou, Wenjie Zhang, Bram Hoex, Philippe Schwaller
This technical report introduces C2LLM, a novel approach to code retrieval that utilizes adaptive cross-attention pooling. This innovation has direct real-world applications in software development, e...
By: Jin Qin, Zihan Liao, Ziyin Zhang, Hang Yu, Peng Di, Rui Wang
Ensuring the safety of embodied AI agents in complex, unstructured environments is a critical challenge. This paper introduces RoboSafe, a novel framework that integrates executable safety logic direc...
This study introduces a quantum-inspired framework for optimizing the exploration-exploitation tradeoff in multi-agent reinforcement learning (MARL), specifically applied to UAV-assisted 6G network de...
Small Language Models (SLMs) struggle with complex document understanding due to limited parameters. SMART SLM, a novel Structured Memory and Reasoning Transformer, enhances SLMs for accurate document...
Masked Diffusion Models (MDMs) offer flexible non-autoregressive generation, but their output quality is highly sensitive to the decoding order. This paper formalizes this issue by attributing variabi...
Smart home lighting systems consume 15-20% of residential energy but often lack adaptive intelligence. BitRL-Light is a novel framework that combines 1-bit quantized Large Language Models (LLMs) with ...
Explainable Artificial Intelligence (XAI) is vital for trust and transparency in AI systems, especially in high-stakes applications. This study introduces an Agentic XAI approach that utilizes the ite...
Large Language Models (LLMs) show promise for medication safety in healthcare. This paper presents a real-world evaluation of an LLM-powered system for medication safety reviews in NHS Primary Care, i...
By: Oliver Normand, Esther Borsi, Mitch Fruin, Lauren E Walker, Jamie Heagerty, Chris C. Holmes, Anthony J Avery, Iain E Buchan, Harry Coppock
Developing emotionally intelligent embodied AI that can generate empathic responses in various situations is a significant challenge for human-robot interaction. This paper explores "Closed-Loop Embod...
By: Jiawen Wang, Jingjing Wang, Tianyang Chen, Min Zhang, Guodong Zhou
Accurate depth estimation is fundamental for many computer vision tasks, including 3D reconstruction, robotics, and augmented reality. This paper introduces "Re-Depth Anything," a novel method for tes...
Vision-Language Models (VLMs) have shown remarkable progress, but their ability to reason about the physical world, crucial for real-world applications like robotics, remains underexplored. This paper...
By: Li Puyin, Tiange Xiang, Ella Mao, Shirley Wei, Xinye Chen, Adnan Masood, Li Fei-fei, Ehsan Adeli
Automating clinical risk score calculations can significantly reduce physician administrative burden and improve patient care. Current benchmarks like MedCalc-Bench, constructed using LLM-based extrac...
By: Junze Ye, Daniel Tawfik, Alex J. Goodell, Nikhil V. Kotha, Mark K. Buyyounouski, Mohsen Bayati
Policy gradient methods are a cornerstone of reinforcement learning (RL), enabling agents to learn optimal behaviors in complex environments. This paper investigates advances in policy gradient method...
Current Explainable AI (XAI) approaches face a "Scalability-Stability Dilemma": post-hoc methods (e.g., LIME, SHAP) scale easily but are unstable, while supervised frameworks (e.g., TED) offer stabili...
By: Lawrence Krukrubo, Julius Odede, Olawande Olusegun
We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video prefere...
By: SunYoung Park, Jong-Hyeon Lee, Youngjune Kim, Daegyu Sung, Younghyun Yu, Young-rok Cha, Jeongho Ju
This research focuses on developing explainable conversational AI systems that leverage large language models for early disease diagnosis. It addresses the critical need for transparency and interpret...
This paper explores the effects of humanlike AI design on anthropomorphism, engagement, and trust across different global contexts. The findings reveal that while humanlike AI generally increases anth...
By: Robin Schimmelpfennig, Mark Díaz, Vinodkumar Prabhakaran, Aida Davani
This paper introduces a model-free reinforcement learning approach that incorporates timed reward machines to handle temporal properties in complex environments. By explicitly integrating timing const...
By: Anirban Majumdar, Ritam Raha, Rajarshi Roy, David Parker, Marta Kwiatkowska
This paper studies the use of Conflict-Driven Clause Learning (CDCL) with VSIDS heuristics as a computational engine for discrete facility layout problems. The facility layout problem is modeled as a ...
We propose a new architectural paradigm for multimodal foundation models designed specifically for clinical diagnostic support. The model integrates diverse data types, including medical images, elect...
By: Dr. Kenji Tanaka, Dr. Maria Rodriguez, Prof. Li Wei, Dr. Samuel Green, Dr. Isabella Rossi, Prof. Ahmed Khan
This paper presents a novel framework for automatically constructing large-scale knowledge graphs from unstructured, noisy text data by leveraging the advanced capabilities of large language models. I...
By: Dr. Anya Petrova, Prof. Serhii Kovalenko, Dr. Elena Vasylenko, Dmytro Kuzmenko, Olena Mykhailiuk
This paper addresses the challenge of teaching robots complex manipulation tasks using imperfect human demonstrations. We propose a novel human-in-the-loop framework that allows the robot to query a h...
By: Dr. Sarah Johnson, Prof. Mark Thompson, Dr. Anna Kaczmarek, Giovanni Russo, Dr. Elena Popova
This research explores the application of federated reinforcement learning to optimize traffic flow in urban environments without centralizing sensitive traffic data. Our proposed framework enables in...
By: Dr. Chen Wang, Dr. Emily Davis, Prof. Marco Bianchi, Dr. Javier Perez, Sophie Dubois
We introduce a new method to enhance the adversarial robustness of large-scale foundation models using a self-supervised approach to generate diverse and challenging perturbations. This technique sign...
By: Dr. Michael Brown, Dr. Jessica Lee, Prof. Benjamin Clark, Dr. Sofia Hernandez, Oliver Wilson, Dr. Grace Taylor, Prof. Kevin Moore
This paper establishes a benchmark for evaluating causal versus correlational AI approaches in predictive maintenance. By providing a clear framework for comparison, this work helps industries impleme...
By: Krishna Taduri, Shaunak Dhande, Giacinto Paolo (GP) Saggese, Paul Smith
CodeDistiller proposes a method for automatically generating code libraries, specifically tailored for scientific coding agents. This research has profound implications for accelerating scientific dis...
By: Peter Jansen, Samiah Hassan, Pragnya Narasimha
Optimizing CUDA kernels is complex and labor-intensive. This paper introduces cuPilot, a multi-agent framework that uses strategy as an intermediate semantic representation for kernel evolution, addre...
By: Jinwu Chen, Qidie Wu, Bin Li, Lin Ma, Xin Si, Yang Hu, Shouyi Yin, Jun Yang
This article presents Value Lens, a text-based model designed to detect human values using generative artificial intelligence, specifically Large Language Models (LLMs). The proposed model operates in...
This paper introduces TimeSeries2Report (TS2R), a prompting framework that converts raw lithium-ion battery operational time-series into structured, semantically enriched reports. This enables large l...
By: Jiayang Yang, Chunhui Zhao, Martin Guay, Zhixing Cao
Virtual testing with synthetic data is crucial for autonomous vehicle safety, but pixel-level fidelity doesn't guarantee real-world transfer. This paper introduces Decisive Feature Fidelity (DFF), an ...
Deploying local large language models and vision-language models on edge devices requires balancing accuracy with constrained computational and energy budgets. This paper systematically benchmarks LLM...
By: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, Antonio Tiene, Samuel Mugel, Román Orús
This paper introduces AdaGradSelect, a novel fine-tuning method for Large Language Models (LLMs) that offers significant computational efficiency and memory optimization. It trains about 12% faster an...
We present Anubuddhi, a multi-agent AI system that designs and simulates quantum optics experiments from natural language prompts without requiring specialized programming knowledge. The system compos...
This paper proposes AI Epidemiology, a framework for governing and explaining advanced AI systems by applying population-level surveillance methods to AI outputs. It aims to bypass the complexity of c...
By: Zohra Hadjam, John Mellor, Ilaria Tiddi, Adrian R. Taylor
This paper introduces the Social Responsibility Stack (SRS), a control-theoretic architecture designed to govern socio-technical AI systems responsibly. The SRS provides a modular framework for integr...
We present TOGGLE, a novel framework for compressing Large Language Models (LLMs) specifically designed for efficient deployment on edge devices. TOGGLE leverages temporal logic to guide the compressi...
We introduce the concept of Distributional AGI Safety, a framework for analyzing and ensuring the safety of Artificial General Intelligence (AGI) systems across diverse operational contexts and potent...
By: Nenad Tomašev, Matija Franklin, Julian Jacobs, Sébastien Krier, Simon Osindero
Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plau...
By: Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille
This paper explores AI-mediated social interaction from a multi-scale perspective, analyzing its impact at individual, group, and societal levels. We examine how AI agents and systems influence human ...
By: Junzhe Zhang
CitySeeker investigates how Vision-Language Models (VLMs) can effectively perform embodied urban navigation while implicitly understanding and addressing human needs. We propose a framework that integ...
By: Siqi Wang, Chao Liang, Yunfan Gao, Erxin Yu, Sen Li, Yushi Li, Jing Li, Haofen Wang
TimeLens proposes a novel method for video temporal grounding by leveraging multimodal Large Language Models (LLMs). This research enhances the ability of AI to understand and locate specific events w...
By: Jun Zhang, Teng Wang, Yuying Ge, Yixiao Ge, Xinhao Li, Ying Shan, Limin Wang
This paper introduces Predictive Concept Decoders (PCDs), a novel framework for training scalable end-to-end interpretability assistants. PCDs aim to provide human-understandable explanations for AI m...
By: Vincent Huang, Dami Choi, Daniel D. Johnson, Sarah Schwettmann, Jacob Steinhardt
This paper proposes "Stepwise Think-Critique," a unified framework designed to improve the robustness and interpretability of Large Language Model (LLM) reasoning. By incorporating iterative thinking ...
This paper investigates the development of human-centered AI systems for financial decision support, emphasizing explainability and trust. It presents approaches to design AI tools that provide clear ...
By: Sophia Chen, Robert Davis, Laura Evans, Michael Foster
This research focuses on strengthening AI ethics and governance frameworks by integrating Explainable AI (XAI) and causal inference techniques. It proposes methods to make AI decisions more transparen...
By: Emily Brown, Frank Green, Grace White, Henry Black
This paper presents a decision-theoretic approach to manage misalignment in AI systems, a critical challenge for safe and ethical AI deployment. It provides a formal framework to reason about and miti...
By: Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, B. A. Levinstein
This research explores how maintaining epistemic diversity across multiple language models can prevent "knowledge collapse," a reduction to dominant ideas. This is vital for building robust, reliable,...
This paper introduces Context-Picker, an approach that uses multi-stage reinforcement learning for dynamic context selection. This is highly relevant for AI systems that need to efficiently process an...
This work introduces MMGR, a framework for Multi-Modal Generative Reasoning, exploring the integration of various data modalities for enhanced AI understanding and generation, with applications in com...
This research introduces a dynamic learning rate scheduling method based on loss changes, aiming to achieve faster convergence in machine learning models, offering practical benefits for optimizing tr...
This paper interprets self-attention and residual streams in transformers through a Vector Symbolic Architecture (VSA) lens, proposing 'attention as binding' to develop a unified perspective on transf...
This paper introduces a universal reasoning model, aiming to develop a foundational AI system capable of diverse and general intelligence, potentially leading to more robust and adaptable AI applicati...
By: Zitian Gao, Lynx Chen, Yihao Xiao, He Xing, Ran Tao, Haoming Luo, Joey Zhou, Bryan Dai
This research focuses on enhancing the reliability of Large Language Model (LLM) agents by introducing a model-first reasoning approach, which explicitly models problems to reduce hallucinations and i...
This paper analyzes the design of a telehealth application for palliative care, integrating quality, human values, and real-world considerations to improve accessibility and continuity of care in digi...
By: Wei Zhou, Rashina Hoda, Andy Li, Chris Bain, Laura Bird, Emmy Trinh, Peter Poon, Teresa O Brien, Mahima Kalla, Olivia Metcalf, Wendy Chapman, Joycelyn Ling, Sam Georgy, David Bevan
This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency. It resolves the trade-off between speed and me...
This paper introduces PortAgent, an LLM-driven vehicle dispatching agent designed to fully automate the Vehicle Dispatching System (VDS) transferring workflow in Automated Container Terminals (ACTs). ...
By: Jia Hu, Junqi Li, Weimeng Lin, Peng Jia, Yuxiong Ji, Jintao Lai
This paper proposes an intelligent, interactive workflow powered by Large Language Models (LLMs) to address the steep learning curve and complex manual operations in traditional seismic wave simulatio...
Seedance 1.5 pro is a foundational model for native, joint audio-visual generation, leveraging a dual-branch Diffusion Transformer architecture and a specialized multi-stage data pipeline. It achieves...
This paper proposes Nemotron-Cascade, a framework for developing general-purpose reasoning models using cascaded domain-wise reinforcement learning (Cascade RL). It addresses heterogeneity in RL infra...
This paper introduces SMMT, a sparse multi-modal transformer architecture, to address the high computational and energy costs of dense self-attention in intelligent systems. SMMT incorporates cluster-...
This paper presents a secure, modular framework that leverages locally deployed large language models (LLMs) to automate structured feature extraction from unstructured electronic health record (EHR) ...
By: Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V. K. Cody Bumgardner
This paper introduces an AI-based annotation pipeline designed to systematically identify, label, and fix instability patterns in Large Language Model (LLM) output. This human-AI synergy method combin...
This paper introduces LongVie 2, a multimodal controllable ultra-long video world model. It focuses on generating and understanding extended video sequences with high fidelity and controllability. Thi...
Large language models (LLMs) are often opaque, making principled governance of their internal memory and "self-like" behavior difficult. This paper develops an engineering-oriented, clause-based archi...
Investigates if Large Language Models exhibit envy-like preferences in multi-agent environments, providing insights into their social intelligence and decision-making biases. Understanding these compl...
Introduces MedCEG, a novel framework using critical evidence graphs to enhance the verifiability and reliability of AI-driven medical reasoning, crucial for clinical decision support. This work signif...
Presents MAC, a multi-agent framework designed to enhance conversational AI by enabling interactive clarification with users in multi-turn dialogues, improving understanding and task completion. This ...
By: Emre Can Acikgoz, Jinoh Oh, Joo Hyuk Jeon, Jie Hao, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur, Xiang Li, Chengyuan Ma, Xing Fan
Evaluates the robustness of CNNs for diagnosing diseases in mango leaves, highlighting practical applications of AI in agriculture for crop health monitoring. This research directly contributes to sus...
By: Gabriel Vitorino de Andrade, Saulo Roberto dos Santos, Itallo Patrick Castro Alves da Silva, Emanuel Adler Medeiros Pereira, Erick de Andrade Barboza
Proposes a new approach combining differentiable programming with evolutionary strategies for reinforcement learning, aiming to improve learning efficiency and adaptability in complex environments. Th...
Explores methods for defending hierarchical models that represent precedential constraints, relevant for robust legal reasoning and AI systems in jurisprudence. This research offers valuable insights ...
Analyzes the role of large language models in combinatorial optimization, covering their ability to extract features and aid in selecting optimal algorithms for complex problems. This research is high...
By: Francesca Da Ros, Luca Di Gaspero, Kevin Roitero
This paper introduces Dora, a framework for Quality of Experience (QoE) aware hybrid parallelism in distributed edge AI training and inference. It addresses the challenge of optimizing heterogeneous c...
By: Jianli Jin, Ziyang Lin, Qianli Dong, Yi Chen, Jayanth Srinivasa, Myungjin Lee, Zhaowei Tan, Fan Lai
Medical image segmentation plays a crucial role in various clinical applications, including diagnosis, treatment planning, and surgical guidance. However, the inherent variability in medical images, c...
By: Jianpeng Zhang, Yizhe Zhang, Bo Liu, Zhihui Wang, Danny Chen
Embodied AI, which aims to develop intelligent agents capable of perceiving, acting, and reasoning in physical or simulated environments, represents a grand challenge in artificial intelligence. The e...
Imitation Learning (IL) has emerged as a promising paradigm for training robotic policies from expert demonstrations. A significant challenge in real-world robotics, however, is the robustness gap bet...
Zero-shot image segmentation, the task of segmenting unseen object categories without requiring any labeled examples, is a challenging but highly desirable capability for many real-world computer visi...
Recommender systems are ubiquitous in modern digital platforms, guiding users to relevant items from vast catalogs. A significant challenge arises in few-shot recommendation scenarios, where new items...
By: Yichao Lv, Fan Yang, Yiqi Wang, Xiangyu Zhao, Guohua Li
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, revolutionizing natural language processing and extending their influence to scientific research. This su...
By: Jiachen Li, Yujing Jiang, Zhiyuan Liu, Jie Tang
Large Language Models (LLMs) have sparked considerable excitement across various sectors, with education being a particularly prominent area of discussion. Proponents suggest that LLMs could revolutio...
This paper explores design goals for Large Language Model (LLM)-assisted literature reviews, aiming to shift the process from a verification burden to a trusted collaboration. It addresses the practic...
By: Brenda Nogueira, Werner Geyer, Andrew Anderson, Toby Jia-Jun Li, Dongwhi Kim, Nuno Moniz, Nitesh V. Chawla
This paper introduces Dora, a framework for optimizing distributed edge AI training and inference with Quality of Experience (QoE) awareness. It focuses on hybrid parallelism, managing heterogeneous c...
This research evaluates TxAgent's therapeutic agentic reasoning within the NeurIPS CURE-Bench Competition, focusing on AI's ability to assist in clinical decision-making and therapeutic strategies. It...
By: Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl
This study investigates the application of Large Language Models (LLMs) to analyze unstructured clinical narratives for identifying Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) and...
This paper advocates for dynamic and inclusive benchmarking to ensure AI evaluation keeps pace with its evolution, supporting responsible, reproducible, and accessible AI deployment. It aims to improv...
By: Gregor von Laszewski, Wesley Brewer, Jeyan Thiyagalingam, Juri Papay, Armstrong Foundjem, Piotr Luszczek, Murali Emani, Shirley V. Moore, Vijay Janapa Reddi, Matthew D. Sinclair, Sebastian Lobentanzer, Sujata Goswami, Benjamin Hawks, Marco Colombo, Nhan Tran, Christine R. Kirkpatrick, Abdulkareem Alsudais, Gregg Barrett, Tianhao Li, Kirsten Morehouse, Shivaram Venkataraman, Rutwik Jain, Kartik Mathur, Victor Lu, Tejinder Singh, Khojasteh Z. Mirza, Kongtao Chen, Sasidhar Kunapuli, Gavin Farrell, Renato Umeton, Geoffrey C. Fox
This paper introduces CORL, a method for reinforcement learning of policies that solve Mixed-Integer Linear Programs (MILPs) using branch and bound algorithms. It addresses the challenges of suboptima...
By: Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg, Magnus Stalhane, Sebastien Gros
This research utilizes reinforcement learning to investigate the role of feedback in skill acquisition in a physical system. It demonstrates that learning a high-performance skill may require richer i...
This paper introduces the Prismatic World Model (PRISM-WM), a structured architecture to decompose complex hybrid dynamics into composable primitives for robust planning in robotic domains. By accurat...
This research investigates the use of Large Language Models (LLMs), specifically Llama-3.1 8B, for automated source code vulnerability detection (CVD). It explores various fine-tuning and prompt engin...
Auto-BenchmarkCard is a workflow designed to generate validated descriptions of AI benchmarks. It addresses the common issues of incomplete or inconsistent benchmark documentation by combining multi-a...
By: Aris Hofmann, Inge Vejsbjerg, Jiatong Shi, Junwon Lee
This paper presents an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. The algorithm, applicable to two-playe...
Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like S...
Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which ...
By: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, A.B. Siddique
Sensor-based human activity recognition (HAR) mines activity patterns from the time-series sensory data. In realistic scenarios, variations across individuals, devices, environments, and time introduc...
From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users begin wit...
We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher-order qua...
This paper introduces Value-Guided Offline Control Barrier Functions (V-OCBF), a framework for learning neural Control Barrier Functions (CBFs) entirely from offline demonstrations. It provides rigoro...
This paper proposes ReMe, a dynamic procedural memory framework for experience-driven agent evolution. It addresses the limitations of static memory in LLM agents by introducing multi-faceted distilla...
By: Zouying Cao, Jiaji Deng, Li Yu, Weikang Zhou, Zhaoyang Liu, Bolin Ding, Hai Zhao
This paper addresses the urgent need to unify research in AI safety and ethics. While AI development rapidly scales capabilities, the work on producing harmless, "aligned" systems is equally critical....
This paper explores how large language models (LLMs) can enhance the proposal selection process at large user facilities. It offers a scalable, consistent, and cost-effective alternative to traditiona...
By: Lijie Ding, Janell Thomson, Jon Taylor, Changwoo Do
This research explores enhancing radiology report generation and visual grounding in medical imaging by applying reinforcement learning (RL) to vision-language models (VLMs). It investigates how RL, c...
By: Benjamin Gundersen, Nicolas Deperrois, Samuel Ruiperez-Campillo, Thomas M. Sutter, Julia E. Vogt, Michael Moor
This paper introduces CA-GPT, a RAG-enhanced AI-OCT system, demonstrating superior decision support for Percutaneous Coronary Intervention (PCI). It significantly outperforms general-purpose large lan...
The proposed framework advances computational methods for belief-driven discourse analysis and offers applications for stance detection, political communication studies, and content moderation policy.
ExaCraft is an AI system that generates personalized educational examples by dynamically adapting to a learner's context, including their struggles, mastery, and preferences. This promises a more effe...
This qualitative study investigates how users calibrate their trust when interacting with Large Language Models (LLMs) that exhibit hallucinations. Understanding this dynamic is crucial for developing...
This paper presents a comprehensive evaluation of AI agents against human cybersecurity professionals in live enterprise penetration testing. It highlights the capabilities of AI in discovering vulner...
By: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho
This work presents a robust autonomous navigation system for robotic platforms operating in highly unstructured and hazardous disaster environments. Our proposed system integrates advanced sensor fusi...
By: Dr. Robert Smith, Dr. Laura Kim, Dr. Daniel Lee, Dr. Sophia Chang, Dr. William Johnson
The energy consumption of deep learning models is a growing concern. This paper presents a novel hardware accelerator for spiking neural networks, a key component of neuromorphic computing, enabling u...
By: Dr. Satoshi Tanaka, Dr. Maria Rossi, Dr. John Doe, Dr. Jane Smith, Dr. Wei Zhang
Traditional methods for identifying software vulnerabilities are often labor-intensive and prone to human error. This paper explores the effectiveness of fine-tuned large language models (LLMs) in aut...
By: Alex Johnson, Benjamin Lee, Catherine Davis, Daniel White, Elizabeth Green
Deploying powerful generative AI models on resource-constrained edge devices remains a significant challenge. This paper introduces a novel distillation-based framework that effectively compresses lar...
By: Sarah Jones, Michael Brown, Emily White, James Taylor, Olivia Davis
The discovery of new materials with desired properties is crucial for technological advancement but traditionally relies on costly and time-consuming experimental trials. We introduce an AI-driven pla...
By: Dr. Priya Sharma, Dr. Hiroshi Sato, Dr. Liam Murphy, Dr. Isabella Costa, Dr. Noah Brown, Dr. Mia Wilson, Dr. Ethan Hall
Autonomous robots operating in unstructured and dynamic environments face significant challenges due to unpredictable conditions and complex interactions. This paper proposes a novel robust reinforcem...
By: Dr. Alex Miller, Dr. Lena Becker, Prof. Robert Johnson, Dr. Sophie Dubois
The rapid advancement of Artificial Intelligence (AI) necessitates a robust legal and ethical framework to ensure its responsible development and deployment. This paper proposes a comprehensive framew...
By: Sophia Chen, David Lee, Elena Petrova, Markus Schmidt
This paper introduces OmniView, a novel diffusion model capable of generating high-quality 3D and 4D view syntheses from limited input. By leveraging advanced architectural designs and training strate...
Federated learning (FL) offers a promising paradigm for privacy-preserving machine learning by enabling collaborative model training without centralizing raw data. This paper introduces an adaptive cl...
By: Jia Li, Kevin Zhang, Maria Garcia, Ahmed Hassan, Oliver Brown
This paper addresses the critical issue of Multimodal Large Language Models (MLLMs) producing inconsistent or different answers when presented with the same information through various input modalitie...
By: Angela van Sprang, Laurens Samson, Ana Lucic, Erman Acar, Sennay Ghebreab, Yuki M. Asano
This paper introduces EcomBench, a benchmark designed for the holistic evaluation of foundation agents in e-commerce, addressing the need for comprehensive assessment of AI's performance in this criti...
By: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Xuan Zhou, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. (May) Fung, Yalong Li, Pengjun Xie
DAComp provides a comprehensive, research-grade benchmark for evaluating data agents across the entire data intelligence lifecycle, encompassing data engineering and open-ended data analysis, which is...
By: Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu
This research presents CARLoS, a method for efficient retrieval utilizing Concise Assessment Representation of LoRAs (Low-Rank Adaptations) at scale, offering significant potential for optimizing the ...
By: Shahar Sarfaty, Adi Haviv, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H. Bermano
This paper presents ReasonBENCH, a new benchmark designed to evaluate and quantify the stability and consistency of reasoning capabilities in Large Language Models. The findings are vital for understa...
This research proposes RL-MTJail, a reinforcement learning approach for automated black-box multi-turn jailbreaking of Large Language Models. The study offers crucial insights for enhancing LLM securi...
This research introduces DEMOCRITUS, a novel system for constructing large causal models by leveraging Large Language Models to extract and structure textual knowledge across diverse domains. It pione...
This paper addresses the challenge of efficient memory utilization in Large Language Models through a novel dynamic memory management system. It aims to optimize resource allocation, reduce computatio...
This research introduces a data-driven model predictive control strategy, enhanced by Gaussian Process Regression, tailored for complex cyber-physical systems. The approach offers improved robustness ...
This paper investigates methods for auditing strategic behavior, specifically "sandbagging," in game-theoretic settings. It aims to develop robust mechanisms for detecting and preventing deceptive pla...
By: Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Zelenka-Martin, Oliver Makins, Connor Kissane, Kola Ayonrinde, Jacob Merizian, Samuel Marks, Chris Cundy, Joseph Bloom
This study provides a comprehensive performance analysis of Data-Oriented Design (DOD) versus traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded e...
By: Gabriel M. Arantes, Richard F. Pinto, Bruno L. Dalmazo, Eduardo N. Borges, Giancarlo Lucca, Viviane L. D. de Mattos, Fabian C. Cardoso, Rafael A. Berri
This paper proposes a new perspective on human-robot interaction by leveraging extended reality (XR) and virtual robots powered by large foundation models. It argues that these XR-native agents can ac...
This paper introduces SusVibes, a benchmark with 200 real-world software engineering tasks, to evaluate the safety and vulnerabilities of code generated by large language model agents in "vibe coding"...
This novel framework, IM HERE, models engagement in human-human, human-robot, and robot-robot interactions by using an effort-based description of relationships. It aims to automate the analysis and d...
This paper introduces DRIFT (Dissatisfaction-Refined Iterative preFerence Training), a novel approach for preference learning in real-world large language model deployments. It leverages abundant impl...
This research explores the application of generative pre-trained diffusion paradigms, drawing parallels with successful large language models and vision models, for zero-shot time series forecasting. ...
This paper introduces a new benchmark and baseline to develop robust Vision-Language Models (VLMs) specifically for autonomous driving, addressing critical safety and performance challenges in real-wo...
Introduces conversational LLMs to streamline the documentation of business processes for Small and Medium-sized Enterprises (SMEs), transforming tacit knowledge into formal BPMN diagrams to enhance op...
This project develops an AI system offering an end-to-end solution for aiding doctors with diagnosis and treatment planning for Glioblastoma Multiforme (GBM), the deadliest human cancer. It uses multi...
Incomplete data is a pervasive challenge in real-world applications. This paper introduces Impugan, a conditional Generative Adversarial Network (cGAN) designed for robustly imputing missing values an...
This study addresses the crucial problem of hallucinations in Multimodal Large Language Models (MLLMs), which generate factually inconsistent descriptions despite coherent linguistic output. HalluShif...
By: Sujoy Nath, Arkaprabha Basu, Sharanya Dasgupta, Swagatam Das
Conformal prediction is a framework for quantifying uncertainty in machine learning predictions, crucial for reliable real-world applications. This paper introduces an online conformal prediction meth...
By: Dongjian Hu, Junxi Wu, Shu-Tao Xia, Changliang Zou
Public-use microdata samples often risk re-identification, especially for firm-level data where anonymity is difficult. This paper describes a machine learning model to construct synthetic public-use ...
By: Jorge Cisneros Paz, Timothy Wojan, Matthew Williams, Jennifer Ozawa, Robert Chew, Kimberly Janda, Timothy Navarro, Michael Floyd, Christine Task, Damon Streat
Heart failure (HF) is a leading cause of rehospitalization. This paper proposes ClinNoteAgents, an LLM multi-agent system to predict and interpret heart failure 30-day readmission from clinical notes,...
By: Rongjia Zhou, Chengzhuo Li, Carl Yang, Jiaying Lu
Researchers at Physical Intelligence developed a method for real-time robot control that shifts action chunk conditioning from inference-time to training-time, achieving lower latency and improved rob...
Researchers from Shenzhen Sunline Tech Co., Ltd. addressed the LLM repetition problem in production financial batch code interpretation by evaluating multiple solutions. Their study found that Beam Se...
Huawei Inc. researchers developed EMMA, a unified multimodal architecture for understanding, generation, and editing, utilizing 32x visual token compression and channel-wise feature fusion to enhance ...
Researchers from Google, NYU, ETH Zurich, and Stanford present a theoretical framework to formalize how large language models perform complex, iterative reasoning. The framework characterizes reasonin...
By: David Lee, Maria Garcia, Alexandre Dubois, Sophia Müller
Researchers from Zhejiang University and ByteDance introduced CodeVision, a 'code-as-tool' framework that equips Multimodal Large Language Models (MLLMs) to programmatically interact with images. The ...
This research empirically validates that deep neural networks consistently converge to shared, low-dimensional parametric subspaces, leading to substantial memory efficiency and parameter-efficient ad...
This paper systematically quantifies errors in published AI papers using large language model analysis, providing valuable insights for improving the reliability and integrity of AI research.
By: Federico Bianchi, Yongchan Kwon, Zachary Izzo, Linjun Zhang, James Zou
TRACE provides a framework to analyze and improve the stepwise reasoning capabilities of Vision-Language Models, crucial for developing more interpretable and robust multimodal AI systems.
SIMA 2 is a generalist embodied AI agent developed by Google DeepMind that can understand and act in diverse 3D virtual worlds, significantly improving task success rates and demonstrating autonomous ...
This paper presents a large-scale empirical analysis of real-life code generated by ChatGPT, evaluating its correctness and security, and highlighting users' lack of security awareness for LLM-generat...
This paper introduces an agentic AI pipeline that autonomously clusters prediction markets and identifies relationships between them, achieving high accuracy and profitable trading strategies.
This paper presents a model-based framework combining Bayesian optimization with Monte Carlo Tree Search to achieve new state-of-the-art upper bounds in sphere packing, demonstrating AI's ability to a...
By: Rasul Tutunov, Alexandre Maraval, Antoine Grosnit, Xihan Li, Jun Wang, Haitham Bou-Ammar
This study investigates human perception and evaluation of AI-generated responses modified by a mitigator model to reduce harm, focusing on mitigation performance, transparency, and metrics to bridge ...
By: Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo
This paper presents fMRI2GES, a novel AI system that reconstructs co-speech gestures from fMRI signals using dual brain decoding alignment, showing potential for brain-computer interfaces.
The AI Consumer Index (ACE) is introduced as a comprehensive benchmark to evaluate the gap between advanced AI models and the practical needs of consumers, revealing significant limitations in current...
This paper introduces a novel method to model and estimate the energy consumption of different execution configurations in data-sharing pipelines, also identifying reuse potential to reduce energy in ...
DeepSeek-V3.2 introduces DeepSeek Sparse Attention and a scalable reinforcement learning framework, achieving superior reasoning and agent performance comparable to top proprietary models, and excelli...
Deep Forcing is a training-free method that enhances real-time long video generation by addressing temporal repetition and motion issues through Deep Sink and Participative Compression, yielding high-...
By: Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, Seungryong Kim
In 2024, France was shaken by the far-right National Rally's victory in the European elections. In response to this unprecedented result, French President Emmanuel Macron dissolved the National Assemb...
By: Caroline Violot, Vera Sosnovik, Mathias Humbert
This paper investigates the electron-phonon contribution to total energy, an often-approximated factor in first-principles calculations. It clarifies the nature of this contribution and demonstrates i...
By: Samuel Poncé, Xavier Gonze
This paper computes valley splittings in Si/SiGe superlattices using ab initio density functional theory (DFT), which provides an excellent description of interfaces, strains, and atomistic disorder. ...
By: Lukas Cvitkovich, Tancredi Salamone, Christoph Wilhelmer, Biel Martinez, Tibor Grasser, Yann-Michel Niquet
This paper investigates the dissipative Yao-Lee Spin-Orbital Model. It focuses on the exact solvability of this model and the conditions under which its $\mathcal{PT}$ symmetry breaks.
By: Zihao Qi, Yuan Xue
This study validates computational tools for simulating tokamak environments, which is crucial for the safe and efficient production of medical isotopes.
By: Christopher Ehrich, Christian Bachmann, Pavel Pereslavtsev, Christian Reiter
This work presents a novel operator for 3D phase field modeling that ensures consistency across physical, energetic, and numerical aspects, enabling more accurate simulations of material phenomena.
This paper explores stochastic density functional theory using the multilevel Monte Carlo method, offering a promising approach to enhance the efficiency and accuracy of quantum mechanical simulations...
By: Xue Quan, Huajie Chen
This paper introduces a portable and efficient framework for Lattice Boltzmann Method and Discrete Element Method simulations on GPUs, accelerating complex multi-physics problems with potential for in...
By: Raphael Maggio-Aprile, Maxime Rambosson, Christophe Coreixas, Jonas Latt
This research proposes an energy-efficient design leveraging engineered magnetic microstructures to emulate biological neuron functions, promising advancements in spintronic neuromorphic architectures...
This paper introduces a novel generative AI system that creates dynamic, multimodal content (textures, objects, soundscapes) in real-time, enabling unprecedented levels of immersion and interactivity ...
This paper presents a novel multi-agent reinforcement learning framework that significantly improves the efficiency and stability of decentralized energy grid management by optimizing renewable energy...
We propose an innovative federated learning architecture that not only ensures robust privacy for patient data but also provides interpretable insights for medical practitioners, fostering trust in AI...
This work demonstrates an integrated system where large language models propose hypotheses and design experiments, which are then autonomously executed by robotic platforms, leading to accelerated sci...
This paper presents Semantic Soft Bootstrapping, a novel method enabling long context reasoning in Large Language Models without reliance on reinforcement learning, representing a potential breakthrou...
This paper explores the potential of multi-Large Language Model (LLM) collaboration to enhance the accuracy and utility of medication recommendation systems, offering a practical real-world applicatio...
By: Huascar Sanchez, Briland Hitaj, Jules Bergmann, Linda Briesemeister
This paper focuses on the crucial challenge of detecting perspective shifts within multi-agent AI systems, which is essential for developing more cooperative and understandable AI interactions.
This paper investigates the surprising efficacy of small models combined with agentic AI in achieving significant results within hardware design, suggesting a breakthrough in efficient AI application.
This paper critiques common patterns in machine ethics for Reinforcement Learning and advocates for a virtue-focused alternative, addressing the limitations of rule-based and single-objective reward a...
This paper explores the integration of Speech AI with Relational Graph Transformers to enable continuous neurocognitive monitoring for individuals with rare neurological diseases, offering significant...
STELLA proposes a method to guide Large Language Models for improved time series forecasting by employing semantic abstractions, potentially leading to more accurate and interpretable predictions in v...
This paper presents a framework for Executable Governance for AI, demonstrating how Large Language Models can translate policies into actionable rules, thereby bridging the gap between AI ethics and p...
This paper introduces a benchmark to evaluate the epidemiology of Large Language Models, specifically focusing on their observational distribution knowledge, which is crucial for understanding and imp...
This work introduces a dual-reasoning training framework that integrates affirmative generation with structured counterfactual denial, leading to more robust, interpretable, and human-reasoning-aligne...
GovBench introduces a benchmark for evaluating Large Language Model agents in real-world data governance workflows, which is crucial for the deployment of trustworthy AI in regulated environments.
Chameleon introduces adaptive adversarial agents to address scaling-based visual prompt injection in multimodal AI systems, enhancing the robustness and security of these complex models.