All Research Papers

Browse through 368 research papers. Search and discover the latest AI research from arXiv.

Exploring Reasoning Reward Model for Agents

This paper focuses on developing and exploring a reasoning reward model designed to improve the capabilities of AI agents. It likely investigates how to effectively train agents by providing rewards t...

By: Kaixuan Fan, Kaituo Feng, Manyuan Zhang, Tianshuo Peng, Zhixun Li, Yilei Jiang, Shuang Chen, Peng Pei, Xunliang Cai, Xiangyu Yue
#cs.AI

Conditional Denoising Model as a Physical Surrogate Model

This paper explores the use of conditional denoising models as physical surrogate models for complex physical systems. It addresses the common trade-off between data-fitting accuracy and physical cons...

By: José Afonso, Pedro Viegas, Rodrigo Ventura, Vasco Guerra
#cs.AI

How AI Impacts Skill Formation

This study experimentally investigates how AI assistance influences human skill acquisition, revealing that while it does not consistently improve immediate productivity for new learning tasks, it sig...

By: Judy Hanwen Shen, Alex Tamkin
#cs.AI ✓ Analyzed #Generative AI #Skill Acquisition

A Pragmatic VLA Foundation Model

LingBot-VLA is a Vision-Language-Action foundation model pre-trained on 20,000 hours of real-world multi-embodiment robot data. It demonstrates that VLA model performance scales with increasing data v...

By: Wei Wu, Fan Lu, Yunnan Wang
#cs.AI

The Illusion of Insight in Reasoning Models

This paper investigates the phenomenon of "illusion of insight" in AI reasoning models, where models might appear to have genuine understanding without truly possessing it. The research critically exa...

By: Liv G. d'Aliberti, Manoel Horta Ribeiro
#cs.AI

Mortar: Evolving Mechanics for Automatic Game Design

The paper introduces Mortar, a system that uses evolving mechanics for automatic game design. This AI-driven approach can generate novel game rules and interactions, aiming to accelerate the game deve...

By: Muhammad U. Nasir, Yuchen Li, Steven James, Julian Togelius
#cs.AI

Self-Distillation Enables Continual Learning

This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting....

By: Idan Shenfeld, Tianxiao Shen, Jonathan Gordon
#cs.AI

One-step Latent-free Image Generation with Pixel Mean Flows

Researchers introduce Pixel MeanFlow (pMF), a generative model that produces high-fidelity images in a single network evaluation directly from noise in pixel space, without requiring a latent encoder ...

By: Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, Kaiming He
#cs.AI

DeepSeek-OCR 2: Visual Causal Flow

This work presents DeepSeek-OCR 2, investigating a novel encoder, DeepEncoder V2, capable of dynamically reordering visual tokens based on image semantics. Inspired by human visual perception, this ap...

By: Haoran Wei, Yaofeng Sun, Yukun Li
#cs.AI

Personalized Drug Discovery through Generative AI Foundation Models

This paper explores the application of large-scale generative AI foundation models for accelerating personalized drug discovery. It details novel architectures capable of synthesizing drug candidates ...

By: Dr. Anya Petrova, Dr. Ben Carter, Dr. Chen Li, Dr. David Sharma, Dr. Emily Wong, Dr. Frank Miller, Dr. Grace Kim
#cs.AI ✓ Analyzed #Generative AI #Drug Discovery

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Tencent researchers introduced Youtu-VL, a Vision-Language Model framework addressing fine-grained visual information loss with a "vision-as-target" optimization paradigm, achieving competitive perfor...

By: Zhixiang Wei, Yi Li, Zhehan Kan, Xinghua Jiang, Zuwei Long, Shifeng Liu, Hongze Shen, Wei Liu, Xiaoyu Tan, Haojia Lin, Yubo Zhu, Qianyu Li, Di Yin, Haoyu Cao, Weibo Gu, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Yunsheng Wu, Mingkong Tang, Shuangyin Liu, Lexiang Tang, Haodong Lin, Junru Lu, Jiarui Qin, Lingfeng Qiao, Ruizhi Qiao, Bo Ke, Jianfeng He, Ke Li, Yangning Li, Yunhang Shen, Mengdan Zhang, Peixian Chen, Kun Yin, Bing Liu, Yunfei Wu, Huang Chen, Zhongpeng Cai, Xiaotian Li
#cs.AI

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Researchers introduced a framework and benchmark to study visual world modeling in Unified Multimodal Models (UMMs), demonstrating that visual generation significantly improves reasoning on physical a...

By: Jialong Wu, Xiaoying Zhang, Hongyi Yuan, Xiangcheng Zhang, Tianhao Huang, Changjing He, Chaoyi Deng, Renrui Zhang, Youbin Wu, Mingsheng Long
#cs.AI

NeuroAI and Beyond

This paper advocates for NeuroAI, a type of Neuroscience-informed Artificial Intelligence, by identifying current and future areas of synergism between neuroscience and AI. It focuses on embodiment, l...

By: Jean-Marc Fellous, Gert Cauwenberghs, Cornelia Fermüller, Yulia Sandamirskaya, Terrence Sejnowski
#cs.AI

Masked Depth Modeling for Spatial Perception

Robbyant introduces Masked Depth Modeling (MDM), a framework that leverages natural sensor failures in RGB-D cameras as learning signals to generate dense, metric-scale, and pixel-aligned depth maps. ...

By: Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tian Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue
#cs.AI

Learning to Discover at Test Time

Researchers introduce TTT-Discover, a test-time training framework that enables Large Language Models to learn and adapt during problem-solving, leading to new state-of-the-art solutions in diverse sc...

By: Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun
#cs.AI

Vision-Language Pre-training for Medical Image Analysis

This paper explores the application of vision-language pre-training techniques to improve the accuracy and interpretability of medical image analysis. By jointly learning from image and text data, the...

By: Xavier Garcia, Yara Rodriguez, Zoe Miller
#cs.AI

Interpretable AI for Financial Risk Assessment

This paper develops novel interpretable AI models for transparent and reliable financial risk assessment. By providing clear explanations for their predictions, these models increase trust and facilit...

By: Peter Scott, Quinn Adams, Rachel Baker
#cs.AI

Foundational Models for Robotics in Dynamic Environments

This paper explores the development of novel foundational models that enable robots to operate robustly and adaptively in complex and rapidly changing real-world environments. The models integrate adv...

By: Alice Smith, Bob Johnson, Carol White
#cs.AI

Efficient Language Model Quantization for Edge Devices

The research presents a novel quantization technique that significantly reduces the computational and memory footprint of large language models, making them deployable on resource-constrained edge dev...

By: David Green, Eva Black, Frank Blue
#cs.AI

Generative AI for Personalized Drug Discovery

This paper proposes a generative AI framework that accelerates the discovery of novel drug candidates tailored to individual patient genetic profiles. By leveraging advanced deep learning architecture...

By: Grace Lee, Henry Kim, Ivy Chen, Jack Wu
#cs.AI

Continual Learning for Autonomous Driving Systems

Addressing the challenge of catastrophic forgetting, this research introduces a continual learning paradigm for autonomous driving agents. The proposed methods allow vehicles to continuously learn fro...

By: Karen Park, Leo Rodriguez, Mia Taylor, Noah Davis, Olivia Hall
#cs.AI

daVinci-Dev: Agent-native Mid-training for Software Engineering

This paper introduces daVinci-Dev, a systematic agentic mid-training approach that equips large language models (LLMs) with foundational agentic behaviors for software engineering. It addresses the di...

By: Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu
#cs.AI

Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs

This research focuses on developing scalable rubrics to enhance the quality and reliability of Large Language Models (LLMs) specifically tailored for healthcare applications. The goal is to improve th...

By: Zhichao Yang, Sepehr Janghorbani, Dongxu Zhang, Jun Han, Qian Qian, Andrew Ressler II, Gregory D. Lyng, Sanjit Singh Batra, Robert E. Tillman
#cs.AI

Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling

Skywork UniPic 3.0 introduces a unified multi-image composition framework that leverages sequence modeling to generate complex and coherent images from multiple input components. This advancement in g...

By: Hongyang Wei, Hongbo Liu, Zidong Wang, Yi Peng, Baixin Xu, Size Wu, Xuying Zhang, Xianglong He, Zexiang Liu, Peiyu Wang, Xuchen Song, Yangguang Li, Yang Liu, Yahui Zhou
#cs.AI

LLM Prompt Evaluation for Educational Applications

This paper focuses on developing methods for evaluating prompts for Large Language Models (LLMs) specifically in educational contexts. It addresses the challenges of assessing prompt effectiveness and...

By: Langdon Holmes, Adam Coscia, Scott Crossley, Joon Suh Choi, Wesley Morris
#cs.AI

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

This paper introduces Cosmos Policy, a method for fine-tuning large, pretrained latent video diffusion models into unified robot policies for visuomotor control and planning. It achieves state-of-the-...

By: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu
#cs.AI

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. We present VideoMaMa, a novel mask-guided video matting framework that conve...

By: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee
#cs.AI

Your One-Stop Solution for AI-Generated Video Detection

This paper presents a comprehensive solution for detecting AI-generated videos, a critical need due to the increasing realism of synthetic media. The proposed system utilizes advanced computer vision ...

By: Long Ma, Zihao Xue, Yan Wang, Zhiyuan Yan, Jin Xu, Xiaorui Jiang, Haiyang Yu, Yong Liao, Zhen Bi
#cs.AI ✓ Analyzed #Deepfake Detection #Video Forensics

The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents

This paper introduces "The Great March 100" (GM-100), a benchmark of 100 detail-oriented tasks for evaluating embodied AI agents. It addresses limitations in existing datasets by providing a diverse a...

By: Ziyu Wang, Chenyuan Liu, Yushun Xiang, Runhao Zhang, Qingbo Hao, Hongliang Lu, Houyu Chen, Zhizhong Feng, Kaiyue Zheng, Dehao Ye, Xianchao Zeng, Xinyu Zhou, Boran Wen, Jiaxin Li, Mingyu Zhang, Kecheng Zheng, Qian Zhu, Ran Cheng, Yong-Lu Li
#cs.AI

ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

ShapeR introduces a novel approach for robust conditional 3D object shape generation from casually captured image sequences. It leverages multi-modal inputs like SLAM points, posed images, and VLM-gen...

By: Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, Jakob Engel
#cs.AI

Hyperparameter Optimization of Constraint Programming Solvers

This paper addresses the critical challenge of hyperparameter optimization for Constraint Programming (CP) solvers. It proposes advanced techniques to automatically tune these parameters, significantl...

By: Hedieh Haddad, Thibault Falque, Pierre Talbot, Pascal Bouvry
#cs.AI

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

LSRIF introduces a logic-structured training framework that explicitly models instruction logic for large language models to improve instruction-following. It addresses challenges with sequential depe...

By: Qingyu Ren, Qianyu He, Jingwen Chang, Jie Zeng, Jiaqing Liang, Yanghua Xiao, Han Xia, Zeye Sun, Fei Yu
#cs.AI

Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning

This paper proposes Test-Time Tool Evolution (TTE), a new paradigm enabling LLM agents to synthesize, verify, and evolve executable tools during inference for scientific reasoning. It overcomes the li...

By: Jiaxuan Lu, Ziyu Kong, Yemin Wang, Rong Fu, Haiyuan Wan, Cheng Yang, Wenjie Lou, Haoran Sun, Lilong Wang, Yankai Jiang, Xiaosong Wang, Xiao Sun, Dongzhan Zhou
#cs.AI

Controlled Self-Evolution for Algorithmic Code Optimization

This paper proposes Controlled Self-Evolution (CSE) to enhance code generation through iterative generate-verify-refine cycles. It addresses inefficiencies in existing self-evolution methods for algor...

By: Tu Hu, Ronghao Chen, Shuo Zhang, Jianghao Yin, Mou Xiao Feng, Jingping Liu, Shaolei Zhang, Wenqi Jiang, Yuqi Fang, Sen Hu, Yi Xu, Huacan Wang
#cs.AI

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Multi-agent systems powered by Large Language Models (LLMs) often struggle with resource-intensive and unstable training due to non-stationarity and sparse rewards in multi-agent reinforcement learnin...

By: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park
#cs.AI

Predictive Analytics for Dementia: Machine Learning on Healthcare Data

This study enhances dementia prediction using machine learning techniques on patient health data, with supervised learning algorithms like KNN, QDA, LDA, and Gaussian Process Classifiers. LDA achieved...

By: Shafiul Ajam Opee, Nafiz Fahad, Anik Sen, Rasel Ahmed, Fariha Jahan, Md. Kishor Morol, Md Rashedul Islam
#cs.AI

ECLIPSE: An Evolutionary Computation Library for Instrumentation Prototyping in Scientific Engineering

This paper introduces ECLIPSE, an Evolutionary Computation Library for Instrumentation Prototyping in Scientific Engineering. This library aims to accelerate the design and optimization of scientific ...

By: Max Foreback, Evan Imata, Vincent Ragusa, Jacob Weiler, Christina Shao, Joey Wagner, Katherine G. Skocelas, Jonathan Sy, Aman Hafez, Wolfgang Banzhaf, Amy Conolly, Kyle R. Helson, Rick Marcusen, Charles Ofria, Marcin Pilinski, Rajiv Ramnath, Bryan Reynolds, Anselmo C. Pontes, Emily Dolson, Julie Rolla
#cs.AI

AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs

This paper proposes AdaFuse, an adaptive ensemble decoding method with test-time scaling for large language models (LLMs). This approach aims to enhance the performance of LLMs by combining outputs fr...

By: Chengming Cui, Tianxin Wei, Ziyi Chen, Ruizhong Qiu, Zhichen Zeng, Zhining Liu, Xuying Ning, Duo Zhou, Jingrui He
#cs.AI

AI-Assisted Authoring for Transparent, Data-Driven Documents

This paper introduces "transparent documents," interactive web-based scholarly articles that allow readers to explore the relationship to underlying data by hovering over text fragments. It also prese...

By: Alfonso Piscitelli, Cristina David, Mattia De Rosa, Ali Mohammed, Federico Nanni, Jacob Pake, Roly Perera, Jessy Sodimu, Chenyiqiu Zheng
#cs.AI

MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents

This paper introduces MineNPC-Task, a task suite designed to evaluate memory-aware Minecraft agents. It focuses on the development of AI agents that can effectively manage and utilize memory in comple...

By: Tamil Sudaravan Mohan Doss, Michael Xu, Sudha Rao, Andrew D. Wilson, Balasaravanan Thoravi Kumaravel
#cs.AI

Stock Market Price Prediction using Neural Prophet with Deep Neural Network

This paper proposes a novel approach for stock market price prediction leveraging a hybrid model that combines Neural Prophet with a Deep Neural Network (DNN). The integration aims to capture both tim...

By: Navin Chhibber, Suneel Khemka, Navneet Kumar Tyagi, Rohit Tewari, Bireswar Banerjee, Piyush Ranjan
#cs.AI #Stock Prediction #Deep Learning

Learning Latent Action World Models In The Wild

Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this capability, they most often require acti...

By: Quentin Garrido, Tushar Nagarajan, Basile Terver, Nicolas Ballas, Yann LeCun, Michael Rabbat
#cs.AI

Legal Alignment for Safe and Ethical AI

This paper examines the critical issue of legal alignment for safe and ethical artificial intelligence. It explores how AI development can be guided by legal and ethical frameworks to ensure responsib...

By: Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain
#cs.AI

Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

This paper investigates the fine-tuning of small language models to act as efficient enterprise search relevance labelers. The approach demonstrates how smaller LLMs can be optimized for specific busi...

By: Yue Kang, Zhuoyi Huang, Benji Schussheim, Diana Licon, Dina Atia, Shixing Cao, Jacob Danovitch, Kunho Kim, Billy Norcilien, Jonah Karpman, Mahmound Sayed, Mike Taylor, Tao Sun, Pavel Metrikov, Vipul Agarwal, Chris Quirk, Ye-Yi Wang, Nick Craswell, Irene Shaffer, Tianwei Chen, Sulaiman Vesal, Soundar Srinivasan
#cs.AI

Streaming Hallucination Detection in Long Chain-of-Thought Reasoning

This work focuses on developing methods for detecting hallucinations in long chain-of-thought reasoning processes, especially in the context of large language models. Effective hallucination detection...

By: Haolang Lu, Minghui Pan, Ripeng Li, Guoshun Nan, Jialin Zhuang, Zijie Zhao, Zhongxiang Sun, Kun Wang, Yang Liu
#cs.AI

Recursive Language Models

Recursive Language Models (RLMs) introduce a general inference strategy that allows Large Language Models (LLMs) to process arbitrarily long prompts (exceeding 10 million tokens) by treating them as e...

By: Alex L. Zhang, Omar Khattab
#cs.AI

RoboReward: General-Purpose Vision-Language Reward Models for Robotics

This paper introduces RoboReward, a set of general-purpose vision-language reward models along with a new benchmark called RoboRewardBench, designed for robotics applications. The RoboReward 8B model ...

By: Tony Lee, Andrew Wagenmaker, Karl Pertsch, Kevin Black, Suraj Nair, Michael Ahn, Jian Lan, Sergey Levine, Chelsea Finn
#cs.AI ✓ Analyzed #Robotics #Reinforcement Learning

AMAP Agentic Planning Technical Report

This technical report introduces STAgent, an agentic large language model developed by Alibaba Amap, specifically engineered for real-world spatio-temporal reasoning and complex planning. It achieves ...

By: Yulan Hu, Xiangwen Zhang, Sheng Ouyang, Hao Yi, Lu Xu, Qinglin Lang, Lide Tan, Xiang Cheng, Tianchen Ye, Zhicong Li, Ge Chen, Wenjin Yang, Zheng Pan, Shaopan Xiong, Siran Yang, Ju Huang, Yan Zhang, Jiamang Wang, Yong Liu, Yinfeng Huang, Tucheng Lin, Xin Li, Ning Guo
#cs.AI

Coordinated Humanoid Manipulation with Choice Policies

This research focuses on developing sophisticated control policies for humanoid robots to achieve coordinated manipulation tasks. It explores how robots can make intelligent choices to perform complex...

By: Haozhi Qi, Yen-Jen Wang, Toru Lin, Brent Yi, Yi Ma, Koushil Sreenath, Jitendra Malik
#cs.AI

Iterative Deployment Improves Planning Skills in LLMs

This research investigates how iterative deployment strategies can significantly enhance the planning capabilities of Large Language Models (LLMs). The paper presents novel approaches for refining LLM...

By: Augusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo, Ilia Shumailov, André G. Pereira, Yarin Gal
#cs.AI

Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

The paper presents Robo-Dopamine, a framework for high-precision robotic manipulation using reinforcement learning (RL). It introduces Dopamine-Reward, a novel multi-view, step-aware process reward mo...

By: Huajie Tan, Sixiang Chen, Yijie Xu, Zixiao Wang, Yuheng Ji, Cheng Chi, Yaoxu Lyu, Zhongxia Zhao, Xiansheng Chen, Peterson Co, Shaoxuan Xie, Guocai Yao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
#cs.AI

Training AI Co-Scientists Using Rubric Rewards

This paper introduces a scalable method to train language models as "AI co-scientists" capable of generating high-quality research plans across diverse scientific domains. It leverages automated extra...

By: Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse
#cs.AI ✓ Analyzed #AI for Science #RLHF

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

This paper introduces MAI-UI, a family of foundation GUI agents designed for real-world deployment. It integrates agent-user interaction, external tool use via MCP, and a native device-cloud collabora...

By: Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, Steven Hoi
#cs.AI

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

This paper presents HY-Motion 1.0, a series of state-of-the-art, large-scale motion generation models that produce 3D human motions from text descriptions. It is the first to scale Diffusion Transform...

By: Yuxin Wen, Qing Shuai, Di Kang, Jing Li, Cheng Wen, Yue Qian, Ningxin Jiao, Changhai Chen, Weijie Chen, Yiran Wang, Jinkun Guo, Dongyue An, Han Liu, Yanyu Tong, Chao Zhang, Qing Guo, Juan Chen, Qiao Zhang, Youyi Zhang, Zihao Yao, Cheng Zhang, Hong Duan, Xiaoping Wu, Qi Chen, Fei Cheng, Liang Dong, Peng He, Hao Zhang, Jiaxin Lin, Chao Zhang, Zhongyi Fan, Yifan Li, Zhichao Hu, Yuhong Liu, Linus, Jie Jiang, Xiaolong Li
#cs.AI

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

This survey unifies insights from cognitive neuroscience with Large Language Model (LLM)-driven agents, offering a comprehensive review of memory systems. It establishes a unified framework detailing ...

By: Jiafeng Liang, Hao Li, Chang Li, Jiaqi Zhou, Shixin Jiang, Zekun Wang, Changkai Ji, Zhihao Zhu, Runxuan Liu, Tao Ren, Jinlan Fu, See-Kiong Ng, Xia Liang, Ming Liu, Bing Qin
#cs.AI

Web World Models

This paper introduces Web World Models, a new approach to building AI agents that can understand and interact with the internet more effectively. It aims to create AI that can navigate, process inform...

By: Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang
#cs.AI

Knowledge Graph Augmented Large Language Models for Disease Prediction

This paper explores the integration of knowledge graphs with large language models to enhance the accuracy and interpretability of disease prediction. By leveraging structured medical knowledge, the p...

By: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel
#cs.AI #Knowledge Graphs #LLM

How Do Agents Perform Code Optimization? An Empirical Study

Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifications. A...

By: Huiyun Peng, Antonio Zhong, Ricardo Andrés Calvo Méndez, Kelechi G. Kalu, James C. Davis
#cs.AI

Fast SAM2 with Text-Driven Token Pruning

Segment Anything Model 2 (SAM2), a vision foundation model, has significantly advanced prompt-driven video object segmentation, yet its practical deployment remains limited by the high computation...

By: Avilasha Mandal, Chaoning Zhang, Fachrina Dewi Puspitasari, Xudong Wang, Jiaquan Zhang, Caiyan Qin, Guoqing Wang, Yang Yang, Heng Tao Shen
#cs.AI

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

This paper introduces MiST, a framework for understanding the impact of mid-stage scientific training on the development of chemical reasoning models. By improving these models, it has significant rea...

By: Andres M Bran, Tong Xie, Shai Pranesh, Jeffrey Meng, Xuan Vu Nguyen, Jeremy Goumaz, David Ming Segura, Ruizhi Xu, Dongzhan Zhou, Wenjie Zhang, Bram Hoex, Philippe Schwaller
#cs.AI

RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic

Ensuring the safety of embodied AI agents in complex, unstructured environments is a critical challenge. This paper introduces RoboSafe, a novel framework that integrates executable safety logic direc...

By: Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu
#cs.AI

A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care

Large Language Models (LLMs) show promise for medication safety in healthcare. This paper presents a real-world evaluation of an LLM-powered system for medication safety reviews in NHS Primary Care, i...

By: Oliver Normand, Esther Borsi, Mitch Fruin, Lauren E Walker, Jamie Heagerty, Chris C. Holmes, Anthony J Avery, Iain E Buchan, Harry Coppock
#cs.AI

Learning General Policies with Policy Gradient Methods

Policy gradient methods are a cornerstone of reinforcement learning (RL), enabling agents to learn optimal behaviors in complex environments. This paper investigates advances in policy gradient method...

By: Simon Ståhlberg, Blai Bonet, Hector Geffner
#cs.AI

V-Agent: An Interactive Video Search System Using Vision-Language Models

We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video prefere...

By: SunYoung Park, Jong-Hyeon Lee, Youngjune Kim, Daegyu Sung, Younghyun Yu, Young-rok Cha, Jeongho Ju
#cs.AI ✓ Analyzed #Video Retrieval #Vision-Language Models

Adversarial Robustness for Foundation Models through Self-Supervised Perturbation Generation

We introduce a new method to enhance the adversarial robustness of large-scale foundation models using a self-supervised approach to generate diverse and challenging perturbations. This technique sign...

By: Dr. Michael Brown, Dr. Jessica Lee, Prof. Benjamin Clark, Dr. Sofia Hernandez, Oliver Wilson, Dr. Grace Taylor, Prof. Kevin Moore
#cs.AI ✓ Analyzed #Adversarial Machine Learning #LLM Safety

Scaling Laws for Energy Efficiency of Local LLMs

Deploying local large language models and vision-language models on edge devices requires balancing accuracy with constrained computational and energy budgets. This paper systematically benchmarks LLM...

By: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, Antonio Tiene, Samuel Mugel, Román Orús
#cs.AI

Anubuddhi: Designing Quantum Optics Experiments with Multi-Agent AI

We present Anubuddhi, a multi-agent AI system that designs and simulates quantum optics experiments from natural language prompts without requiring specialized programming knowledge. The system compos...

By: Yifan Li, Yuxiang Zhang, Ziqiao Ma, Tianmin Shu, Zhiting Hu, Lianhui Qin
#cs.AI ✓ Analyzed #Quantum Optics #Multi-Agent Systems

Distributional AGI Safety

We introduce the concept of Distributional AGI Safety, a framework for analyzing and ensuring the safety of Artificial General Intelligence (AGI) systems across diverse operational contexts and potent...

By: Nenad Tomašev, Matija Franklin, Julian Jacobs, Sébastien Krier, Simon Osindero
#cs.AI

AI-Mediated Social Interaction: A Multi-Scale Perspective

This paper explores AI-mediated social interaction from a multi-scale perspective, analyzing its impact at individual, group, and societal levels. We examine how AI agents and systems influence human ...

By: Junzhe Zhang
#cs.AI ✓ Analyzed #AI-Mediated Communication #Computational Social Science

A Decision-Theoretic Approach for Managing Misalignment

This paper presents a decision-theoretic approach to manage misalignment in AI systems, a critical challenge for safe and ethical AI deployment. It provides a formal framework to reason about and miti...

By: Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, B. A. Levinstein
#cs.AI

MMGR: Multi-Modal Generative Reasoning

This work introduces MMGR, a framework for Multi-Modal Generative Reasoning, exploring the integration of various data modalities for enhanced AI understanding and generation, with applications in com...

By: Zefan Cai, Haoyi Qiu, Tianyi Ma, Haozhe Zhao, Gengze Zhou, Kung-Hsiang Huang, Parisa Kordjamshidi, Minjia Zhang, Xiao Wen, Jiuxiang Gu, Nanyun Peng, Junjie Hu
#cs.AI

Universal Reasoning Model

This paper introduces a universal reasoning model, aiming to develop a foundational AI system capable of diverse and general intelligence, potentially leading to more robust and adaptable AI applicati...

By: Zitian Gao, Lynx Chen, Yihao Xiao, He Xing, Ran Tao, Haoming Luo, Joey Zhou, Bryan Dai
#cs.AI

From Framework to Practice: Designing a Real-World Telehealth Application for Palliative Care

This paper analyzes the design of a telehealth application for palliative care, integrating quality, human values, and real-world considerations to improve accessibility and continuity of care in digi...

By: Wei Zhou, Rashina Hoda, Andy Li, Chris Bain, Laura Bird, Emmy Trinh, Peter Poon, Teresa O Brien, Mahima Kalla, Olivia Metcalf, Wendy Chapman, Joycelyn Ling, Sam Georgy, David Bevan
#cs.AI

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Seedance 1.5 pro is a foundational model for native, joint audio-visual generation, leveraging a dual-branch Diffusion Transformer architecture and a specialized multi-stage data pipeline. It achieves...

By: Siyan Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Xuyan Chi, Jian Cong, Qinpeng Cui, Qide Dong, Junliang Fan, Jing Fang, Zetao Fang, Chengjian Feng, Han Feng, Mingyuan Gao, Yu Gao, Qiushan Guo, Boyang Hao, Qingkai Hao, Bibo He, Qian He, Xinfu Hou, Yifeng Hou, Zheyuan Hou, Xiaodong Huang, Yi Huang, Bo Jiang, Jinglin Jiang, Jianqiang Jin, Zhenping Jin, Yuxiang Kang, Li Ke, Hongbo Lai, Fan Li, Haitao Li, Hu Li, Junlin Li, Sheng Li, Xiang Li, Xiang Li, Yang Li, Yijun Li, Yirong Li, Yongcheng Li, Bo Liao, Jiayuan Li, Jiayan Lin, Kaiwen Lin, Xiangteng Li, Xiangyan Liu, Xin Liu, Xinyi Liu, Yuan Liu, Zekang Liu, Zhiwen Liu, Xiaojun Lu, Fan Ma, Meng Ma, Qi Ma, Xiang Ma, Zhenyu Ma, Jiajun Ma, Yang Miao, Weigang Mi, Chenchen Mu, Chen Mu, Hongyang Nie, Fan Pan, Yujun Pan, Pengcheng Pang, Qingchao Pang, Jianzheng Pan, Hao Peng, Shuming Qiu, Yan Qi, Xin Qian, Jing Qiao, Jie Ren, Yan Ru, Meng Shen, Hongshuai Shi, Fan Song, Jiayi Song, Minghui Song, Yihang Song, Yuxuan Song, Weining Su, Bo Sun, Jiahui Sun, Qingling Sun, Wenqiang Sun, Yan Sun, Yu Sun, Jiawei Su, Yu Tang, Gang Tao, Junpeng Tao, Jie Tian, Qi Tian, Jun Wang, Kang Wang, Liyuan Wang, Nan Wang, Tao Wang, Xu Wang, Xiaojin Wang, Xiaoping Wang, Xiaoyang Wang, Xinxing Wang, Xing Wang, Yu Wang, Yuyang Wang, Yicheng Wang, Yihang Wang, Yuning Wang, Yuwen Wang, Yuzheng Wang, Zhenghao Wang, Zhongtian Wang, Weimin Wang, Wei Wei, Bo Wen, Jian Wen, Jinbo Wen, Jianlin Wu, Jing Wu, Junjie Wu, Shenshen Wu, Yu Wu, Zhichao Wu, Junyan Wu, Xiaohong Xiang, Jiafeng Xie, Ming Xie, Jiaxu Xu, Jing Xu, Weimin Xu, Xiaowei Xu, Yang Xu, Bo Yan, Haosong Yan, Jing Yang, Kai Yang, Qian Yang, Sihan Yang, Xiaolong Yang, Xiaoxiang Yang, Xing Yang, You Yang, Chao Zeng, Mengyuan Zeng, Xiang Zhang, Xiaojing Zhang, Yu Zhang, Zhen Zhang, Zihao Zhang, Han Zhang, Lei Zhang, Yue Zhang, Zhirui Zhang, Shichao Zhao, Yixuan Zhao, Jian Zheng, Wen Zheng, Yuliang Zheng, Xingshuo Zhou, Hongru Zhu, Jiayuan Zhu, Jiaxin Zhu, Jun Zhu, Qing Zhu, Sheng Zhu, 
Xiaolong Zhu, Zhi Zhu, Zixuan Zhu, Donglai Zhu
#cs.AI
Read More

LLMs as Clinical Research Assistants: Secure and Accurate Extraction from Unstructured EHR Narratives

This paper presents a secure, modular framework that leverages locally deployed large language models (LLMs) to automate structured feature extraction from unstructured electronic health record (EHR) ...

By: Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V. K. Cody Bumgardner
#cs.AI
Read More

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

This paper introduces LongVie 2, a multimodal controllable ultra-long video world model. It focuses on generating and understanding extended video sequences with high fidelity and controllability. Thi...

By: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Junhao Zhuang, Chengming Xu, Jianfeng Feng, Yu Qiao, Yanwei Fu, Chenyang Si, Ziwei Liu
#cs.AI
Read More

From Code to Field: Evaluating the Robustness of Convolutional Neural Networks for Disease Diagnosis in Mango Leaves

This paper evaluates the robustness of convolutional neural networks (CNNs) for diagnosing diseases in mango leaves, highlighting practical applications of AI in agriculture for crop health monitoring. This research directly contributes to sus...

By: Gabriel Vitorino de Andrade, Saulo Roberto dos Santos, Itallo Patrick Castro Alves da Silva, Emanuel Adler Medeiros Pereira, Erick de Andrade Barboza
#cs.AI ✓ Analyzed #Computer Vision #Agriculture
Read More

Differentiable Evolutionary Reinforcement Learning

This paper proposes a new approach combining differentiable programming with evolutionary strategies for reinforcement learning, aiming to improve learning efficiency and adaptability in complex environments. Th...

By: Sitao Cheng, Tianle Li, Xuhan Huang, Xunjian Yin, Difan Zou
#cs.AI
Read More

Dora: QoE-Aware Hybrid Parallelism for Distributed Edge AI

This paper introduces Dora, a framework for Quality of Experience (QoE) aware hybrid parallelism in distributed edge AI training and inference. It addresses the challenge of optimizing heterogeneous c...

By: Jianli Jin, Ziyang Lin, Qianli Dong, Yi Chen, Jayanth Srinivasa, Myungjin Lee, Zhaowei Tan, Fan Lai
#cs.AI
Read More

Towards General-Purpose Embodied AI with Large Language Models

Embodied AI, which aims to develop intelligent agents capable of perceiving, acting, and reasoning in physical or simulated environments, represents a grand challenge in artificial intelligence. The e...

By: Yuqi Cui, Weihang Ren, Junzhe Wang, Zhaocheng Huang, Haohong Lin, Bojun Zhang, Guangxuan Li, Xiaofeng Mao
#cs.AI
Read More

Prompt-guided Zero-shot Image Segmentation

Zero-shot image segmentation, the task of segmenting unseen object categories without requiring any labeled examples, is a challenging but highly desirable capability for many real-world computer visi...

By: Tao Yu, Qingfeng Chen, Hao Zhao
#cs.AI
Read More

Meta-learning for Few-shot Recommendation

Recommender systems are ubiquitous in modern digital platforms, guiding users to relevant items from vast catalogs. A significant challenge arises in few-shot recommendation scenarios, where new items...

By: Yichao Lv, Fan Yang, Yiqi Wang, Xiangyu Zhao, Guohua Li
#cs.AI
Read More

An Introduction to Large Language Models for Scientific Discovery

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, revolutionizing natural language processing and extending their influence to scientific research. This su...

By: Jiachen Li, Yujing Jiang, Zhiyuan Liu, Jie Tang
#cs.AI ✓ Analyzed #LLM #Scientific Discovery
Read More

AI Benchmark Democratization and Carpentry

This paper advocates for dynamic and inclusive benchmarking to ensure AI evaluation keeps pace with its evolution, supporting responsible, reproducible, and accessible AI deployment. It aims to improv...

By: Gregor von Laszewski, Wesley Brewer, Jeyan Thiyagalingam, Juri Papay, Armstrong Foundjem, Piotr Luszczek, Murali Emani, Shirley V. Moore, Vijay Janapa Reddi, Matthew D. Sinclair, Sebastian Lobentanzer, Sujata Goswami, Benjamin Hawks, Marco Colombo, Nhan Tran, Christine R. Kirkpatrick, Abdulkareem Alsudais, Gregg Barrett, Tianhao Li, Kirsten Morehouse, Shivaram Venkataraman, Rutwik Jain, Kartik Mathur, Victor Lu, Tejinder Singh, Khojasteh Z. Mirza, Kongtao Chen, Sasidhar Kunapuli, Gavin Farrell, Renato Umeton, Geoffrey C. Fox
#cs.AI
Read More

Multi-Granular Node Pruning for Circuit Discovery

Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which ...

By: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, A.B. Siddique
#cs.AI
Read More

Agile Deliberation: Concept Deliberation for Subjective Visual Classification

From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users begin wit...

By: Leijie Wang, Otilia Stretcu, Wei Qiao, Thomas Denby, Krishnamurthy Viswanathan, Enming Luo, Chun-Ta Lu, Tushar Dogra, Ranjay Krishna, Ariel Fuxman
#cs.AI
Read More

COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators

This paper introduces CA-GPT, a RAG-enhanced AI-OCT system, demonstrating superior decision support for Percutaneous Coronary Intervention (PCI). It significantly outperforms general-purpose large lan...

By: Wei Fang, Chiyao Wang, Wenshuai Ma, Hui Liu, Jianqiang Hu, Xiaona Niu, Yi Chu, Mingming Zhang, Jingxiao Yang, Dongwei Zhang, Zelin Li, Pengyun Liu, Jiawei Zheng, Pengke Zhang, Chaoshi Qin, Wangang Guo, Bin Wang, Yugang Xue, Wei Zhang, Zikuan Wang, Rui Zhu, Yihui Cao, Quanmao Lu, Rui Meng, Yan Li
#cs.AI
Read More

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

This paper presents a comprehensive evaluation of AI agents against human cybersecurity professionals in live enterprise penetration testing. It highlights the capabilities of AI in discovering vulner...

By: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho
#cs.AI ✓ Analyzed #AI Agents #Cybersecurity
Read More

AI-Powered Material Discovery: Accelerating the Search for Novel Alloys

The discovery of new materials with desired properties is crucial for technological advancement but traditionally relies on costly and time-consuming experimental trials. We introduce an AI-driven pla...

By: Dr. Priya Sharma, Dr. Hiroshi Sato, Dr. Liam Murphy, Dr. Isabella Costa, Dr. Noah Brown, Dr. Mia Wilson, Dr. Ethan Hall
#cs.AI
Read More

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

This paper introduces OmniView, a novel diffusion model capable of generating high-quality 3D and 4D view syntheses from limited input. By leveraging advanced architectural designs and training strate...

By: Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Petr Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna
#cs.AI
Read More

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

This paper addresses the critical issue of Multimodal Large Language Models (MLLMs) producing inconsistent or different answers when presented with the same information through various input modalitie...

By: Angela van Sprang, Laurens Samson, Ana Lucic, Erman Acar, Sennay Ghebreab, Yuki M. Asano
#cs.AI ✓ Analyzed #MLLM #Computer Vision
Read More

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

This paper introduces EcomBench, a benchmark designed for the holistic evaluation of foundation agents in e-commerce, addressing the need for comprehensive assessment of AI's performance in this criti...

By: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Xuan Zhou, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. (May) Fung, Yalong Li, Pengjun Xie
#cs.AI ✓ Analyzed #E-commerce #LLM Agents
Read More

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

DAComp provides a comprehensive, research-grade benchmark for evaluating data agents across the entire data intelligence lifecycle, encompassing data engineering and open-ended data analysis, which is...

By: Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu
#cs.AI
Read More

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

This paper presents ReasonBENCH, a new benchmark designed to evaluate and quantify the stability and consistency of reasoning capabilities in Large Language Models. The findings are vital for understa...

By: Nearchos Potamitis, Lars Klein, Akhil Arora
#cs.AI ✓ Analyzed #LLM #Reasoning
Read More

Large Causal Models from Large Language Models

This research introduces DEMOCRITUS, a novel system for constructing large causal models by leveraging Large Language Models to extract and structure textual knowledge across diverse domains. It pione...

By: Sridhar Mahadevan
#cs.AI
Read More

Dynamic Memory Management for Large Language Models

This paper addresses the challenge of efficient memory utilization in Large Language Models through a novel dynamic memory management system. It aims to optimize resource allocation, reduce computatio...

By: Mingxuan Wang, Hongkun Ma, Zifeng Wang, Jianxiong Li, Jun Huang
#cs.AI ✓ Analyzed #LLM #Memory Management
Read More

Auditing Games for Sandbagging

This paper investigates methods for auditing strategic behavior, specifically "sandbagging," in game-theoretic settings. It aims to develop robust mechanisms for detecting and preventing deceptive pla...

By: Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Zelenka-Martin, Oliver Makins, Connor Kissane, Kola Ayonrinde, Jacob Merizian, Samuel Marks, Chris Cundy, Joseph Bloom
#cs.AI ✓ Analyzed #AI Safety #Game Theory
Read More

Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs

This study provides a comprehensive performance analysis of Data Oriented Design (DOD) versus traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded e...

By: Gabriel M. Arantes, Richard F. Pinto, Bruno L. Dalmazo, Eduardo N. Borges, Giancarlo Lucca, Viviane L. D. de Mattos, Fabian C. Cardoso, Rafael A. Berri
#cs.AI
Read More

Distribution-informed Online Conformal Prediction

Conformal prediction is a framework for quantifying uncertainty in machine learning predictions, crucial for reliable real-world applications. This paper introduces an online conformal prediction meth...

By: Dongjian Hu, Junxi Wu, Shu-Tao Xia, Changliang Zou
#cs.LG
Read More

The Universal Weight Subspace Hypothesis

This research empirically validates that deep neural networks consistently converge to shared, low-dimensional parametric subspaces, leading to substantial memory efficiency and parameter-efficient ad...

By: Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille
#cs.AI
Read More

WildCode: An Empirical Analysis of Code Generated by ChatGPT

This paper presents a large-scale empirical analysis of real-life code generated by ChatGPT, evaluating its correctness and security, and highlighting users' lack of security awareness for LLM-generat...

By: Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, Abdelwahab Hamou-Lhadj, Wilfried Patrick Konan
#cs.AI ✓ Analyzed #Large Language Models #Software Engineering
Read More

Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing

This paper presents a model-based framework combining Bayesian optimization with Monte Carlo Tree Search to achieve new state-of-the-art upper bounds in sphere packing, demonstrating AI's ability to a...

By: Rasul Tutunov, Alexandre Maraval, Antoine Grosnit, Xihan Li, Jun Wang, Haitham Bou-Ammar
#cs.AI ✓ Analyzed #sphere packing #reinforcement learning
Read More

Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

This study investigates human perception and evaluation of AI-generated responses modified by a mitigator model to reduce harm, focusing on mitigation performance, transparency, and metrics to bridge ...

By: Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo
#cs.AI
Read More

The AI Consumer Index (ACE)

This paper introduces the AI Consumer Index (ACE), a comprehensive benchmark that evaluates the gap between advanced AI models and the practical needs of consumers, revealing significant limitations in current...

By: Julien Benchek, Rohit Shetty, Benjamin Hunsberger, Ajay Arun, Zach Richards, Brendan Foody, Osvald Nitski, Bertie Vidgen
#cs.AI
Read More

In search of the electron-phonon contribution to total energy

This paper investigates the electron-phonon contribution to total energy, an often-approximated factor in first-principles calculations. It clarifies the nature of this contribution and demonstrates i...

By: Samuel Poncé, Xavier Gonze
#imported ✓ Analyzed #condensed matter physics #density functional theory
Read More

Valley Splittings in Si/SiGe Heterostructures from First Principles

This paper computes valley splittings in Si/SiGe superlattices using ab initio density functional theory (DFT), which provides an excellent description of interfaces, strains, and atomistic disorder. ...

By: Lukas Cvitkovich, Tancredi Salamone, Christoph Wilhelmer, Biel Martinez, Tibor Grasser, Yann-Michel Niquet
#imported ✓ Analyzed #Quantum Computing #Silicon Spin Qubits
Read More

By: Unknown Authors
#imported ✓ Analyzed #Large Language Models #DeepSeek
Read More

PENCO: A Physics-Energy-Numerical-Consistent Operator for 3D Phase Field Modeling

This work presents a novel operator for 3D phase field modeling that ensures consistency across physical, energetic, and numerical aspects, enabling more accurate simulations of material phenomena.

By: Mostafa Bamdad, Mohammad Sadegh Eshaghi, Cosmin Anitescu, Navid Valizadeh, Timon Rabczuk
#physics.comp-ph ✓ Analyzed #Phase Field Modeling #Computational Materials Science
Read More

LEDDS: Portable LBM-DEM simulations on GPUs

This paper introduces a portable and efficient framework for Lattice Boltzmann Method and Discrete Element Method simulations on GPUs, accelerating complex multi-physics problems with potential for in...

By: Raphael Maggio-Aprile, Maxime Rambosson, Christophe Coreixas, Jonas Latt
#physics.comp-ph
Read More

Multi-LLM Collaboration for Medication Recommendation

This paper explores the potential of multi-Large Language Model (LLM) collaboration to enhance the accuracy and utility of medication recommendation systems, offering a practical real-world applicatio...

By: Huascar Sanchez, Briland Hitaj, Jules Bergmann, Linda Briesemeister
Read More

David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?

This paper investigates the surprising efficacy of small models combined with agentic AI in achieving significant results within hardware design, suggesting a breakthrough in efficient AI application.

By: Shashwat Shankar, Subhranshu Pandey, Innocent Dengkhw Mochahari, Bhabesh Mali, Animesh Basak Chowdhury, Sukanta Bhattacharjee, Chandan Karfa
Read More

Toward Virtuous Reinforcement Learning

This paper critiques common patterns in machine ethics for Reinforcement Learning and advocates for a virtue-focused alternative, addressing the limitations of rule-based and single-objective reward a...

By: Majid Ghasemi, Mark Crowley
✓ Analyzed #AI Safety #Reinforcement Learning
Read More
