Microsoft brings Copilot LLM features directly into Excel spreadsheet cells with a new in-cell function
Source: The Decoder | 21-08-25
Show HN: I replaced vector databases with Git for AI memory (PoC) (github.com/growth-kinetics)
Source: Hacker News | 21-08-25
AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders (theregister.com)
Source: Hacker News | 21-08-25
AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard' (theregister.com)
Source: Hacker News | 21-08-25
Mark Zuckerberg freezes AI hiring amid bubble fears (telegraph.co.uk)
Source: Hacker News | 21-08-25
Weaponizing image scaling against production AI systems (trailofbits.com)
Source: Hacker News | 21-08-25
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Authors: NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Dima Rekesh, Dina Yared, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Eileen Long, Elliott Ning, Eric Chung, Erick Galinkin, Evelina Bakhturina, Gargi Prasad, Gerald Shen, Haim Elisha, Harsh Sharma, Hayley Ross, Helen Ngo, Herman Sahota, Hexin Wang, Hoo Chang Shin, Hua Huang, Iain Cunningham, Igor Gitman, Ivan Moshkov, Jaehun Jung, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jimmy Zhang, Jinze Xue, Jocelyn Huang, Joey Conway, John Kamalu, Jonathan Cohen, Joseph Jennings, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kezhi Kong, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Kushan Ahmadian, Lawrence McAfee
Source: ArXiv AI | 21-08-25
Post-hoc LLM-Supported Debugging of Distributed Processes
Authors: Dennis Schiese, Andreas Both
Source: ArXiv AI | 21-08-25
Towards LLM-generated explanations for Component-based Knowledge Graph Question Answering Systems
Authors: Dennis Schiese, Aleksandr Perevalov, Andreas Both
Source: ArXiv AI | 21-08-25
Adaptively Robust LLM Inference Optimization under Prediction Uncertainty
Authors: Zixi Chen, Yinyu Ye, Zijie Zhou
Source: ArXiv AI | 21-08-25
Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination
Authors: João Vitor de Carvalho Silva, Douglas G. Macharet
Source: ArXiv AI | 21-08-25
ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
Authors: Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang
Source: ArXiv AI | 21-08-25
PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Authors: Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang
Source: ArXiv AI | 21-08-25
Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use
Authors: Zhongzhou Chen
Source: ArXiv AI | 21-08-25
Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference
Authors: Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban
Source: ArXiv AI | 21-08-25
TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting
Authors: Jiaming Leng, Yunying Bi, Chuan Qin, Bing Yin, Yanyong Zhang, Chao Wang
Source: ArXiv AI | 21-08-25
From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning
Authors: Lixiang Yan
Source: ArXiv AI | 21-08-25
Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
Authors: Mattson Ogg, Chace Ashcraft, Ritwik Bose, Raphael Norman-Tenazas, Michael Wolmetz
Source: ArXiv AI | 21-08-25
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun
Source: ArXiv AI | 21-08-25
The Agent Behavior: Model, Governance and Challenges in the AI Digital Age
Authors: Qiang Zhang, Pei Yan, Yijia Xu, Chuanpo Fu, Yong Fang, Yang Liu
Source: ArXiv AI | 21-08-25
Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning
Authors: Beinuo Yang, Qishen Zhou, Junyi Li, Xingchen Su, Simon Hu
Source: ArXiv AI | 21-08-25
Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
Authors: Luca Annese, Sabrina Patania, Silvia Serino, Tom Foulsham, Silvia Rossi, Azzurra Ruggeri, Dimitri Ognibene
Source: ArXiv AI | 21-08-25
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Authors: Ziyang Luo, Zhiqi Shen, Wenzhuo Yang, Zirui Zhao, Prathyusha Jwalapuram, Amrita Saha, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Junnan Li
Source: ArXiv AI | 21-08-25
Entropy-Constrained Strategy Optimization in Urban Floods: A Multi-Agent Framework with LLM and Knowledge Graph Integration
Authors: Peilin Ji, Xiao Xue, Simeng Wang, Wenhao Yan
Source: ArXiv AI | 21-08-25
Warnings about runaway expectations are growing louder throughout the AI industry
Source: The Decoder | 21-08-25
Visualizing GPT-OSS-20B embeddings (melonmars.github.io)
Source: Hacker News | 21-08-25
Gaussian Processes for Machine Learning (2006) [pdf] (gaussianprocess.org)
Source: Hacker News | 20-08-25
Show HN: Claude Code workflow: PRDs → GitHub Issues → parallel execution (github.com/automazeio)
Source: Hacker News | 20-08-25
ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery
Authors: Mohammad Izadi, Mehran Safayani
Source: ArXiv AI | 20-08-25
Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun
Source: ArXiv AI | 20-08-25
Ask Good Questions for Large Language Models
Authors: Qi Wu, Zhongqi Lu
Source: ArXiv AI | 20-08-25
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Authors: Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
Source: ArXiv AI | 20-08-25
Cognitive Workspace: Active Memory Management for LLMs -- An Empirical Study of Functional Infinite Context
Authors: Tao An
Source: ArXiv AI | 20-08-25
Towards Unified Multimodal Financial Forecasting: Integrating Sentiment Embeddings and Market Indicators via Cross-Modal Attention
Authors: Sarthak Khanna, Armin Berger, David Berghaus, Tobias Deusser, Lorenz Sparrenberg, Rafet Sifa
Source: ArXiv AI | 20-08-25
"DIVE" into Hydrogen Storage Materials Discovery with AI Agents
Authors: Di Zhang, Xue Jia, Tran Ba Hung, Seong Hoon Jang, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Toyoto Sato, Kiyoe Konno, Shin-ichi Orimo, Hao Li
Source: ArXiv AI | 20-08-25
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
Authors: Chentong Chen, Mengyuan Zhong, Jianyong Sun, Ye Fan, Jialong Shi
Source: ArXiv AI | 20-08-25
STPFormer: A State-of-the-Art Pattern-Aware Spatio-Temporal Transformer for Traffic Forecasting
Authors: Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao
Source: ArXiv AI | 20-08-25
Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance
Authors: Yue Fang, Yuxin Guo, Jiaran Gao, Hongxin Ding, Xinke Jiang, Weibin Liao, Yongxin Xu, Yinghao Zhu, Zhibang Yang, Liantao Ma, Junfeng Zhao, Yasha Wang
Source: ArXiv AI | 20-08-25
Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models
Authors: Xiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai, Yu-Feng Li
Source: ArXiv AI | 20-08-25
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model
Authors: Yu Li, Zulong Chen, Wenjian Xu, Hong Wen, Yipeng Yu, Man Lung Yiu, Yuyu Yin
Source: ArXiv AI | 20-08-25
CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning
Authors: Minh Hoang Nguyen, Van Dai Do, Dung Nguyen, Thin Nguyen, Hung Le
Source: ArXiv AI | 20-08-25
Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration
Authors: Yifei Chen, Guanting Dong, Yutao Zhu, Zhicheng Dou
Source: ArXiv AI | 20-08-25
Expertise-aware Multi-LLM Recruitment and Collaboration for Medical Decision-Making
Authors: Liuxin Bao, Zhihao Peng, Xiaofei Zhou, Runmin Cong, Jiyong Zhang, Yixuan Yuan
Source: ArXiv AI | 20-08-25
Improved Generalized Planning with LLMs through Strategy Refinement and Reflection
Authors: Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz, Alexander Koller
Source: ArXiv AI | 20-08-25
The Collaboration Paradox: Why Generative AI Requires Both Strategic Intelligence and Operational Stability in Supply Chain Management
Authors: Soumyadeep Dhar
Source: ArXiv AI | 20-08-25
Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback
Authors: Yihao Ang, Yifan Bao, Lei Jiang, Jiajie Tao, Anthony K. H. Tung, Lukasz Szpruch, Hao Ni
Source: ArXiv AI | 20-08-25
ChronoLLM: Customizing Language Models for Physics-Based Simulation Code Generation
Authors: Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, Dan Negrut
Source: ArXiv AI | 20-08-25
Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life (github.com/openai)
Source: Hacker News | 20-08-25
Richard Sutton says the AI industry has "lost its way" by ignoring core principles of intelligence
Source: The Decoder | 20-08-25
Show HN: We started building an AI dev tool but it turned into a Sims-style game (youtube.com)
Source: Hacker News | 19-08-25
Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
Authors: Yuan Li, Zhengzhong Liu, Eric Xing
Source: ArXiv AI | 19-08-25
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Authors: Zhiyuan Zeng, Jiashuo Liu, Siyuan Chen, Tianci He, Yali Liao, Jinpeng Wang, Zaiyuan Wang, Yang Yang, Lingyue Yin, Mingren Yin, Zhenwei Zhu, Tianle Cai, Zehui Chen, Jiecao Chen, Yantao Du, Xiang Gao, Jiacheng Guo, Liang Hu, Jianpeng Jiao, Xiangsheng Li, Jingkai Liu, Shuang Ni, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xin Zhou, Jose Blanchet, Xipeng Qiu, Mengdi Wang, Wenhao Huang
Source: ArXiv AI | 19-08-25
MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization
Authors: Haochen You, Baojing Liu
Source: ArXiv AI | 19-08-25
GraphCogent: Overcoming LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding
Authors: Rongzheng Wang, Qizhi Chen, Yihong Huang, Yizhuo Ma, Muquan Li, Jiakai Li, Ke Qin, Guangchun Luo, Shuang Liang
Source: ArXiv AI | 19-08-25
GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?
Authors: Yifang Tian, Yaming Liu, Zichun Chong, Zihang Huang, Hans-Arno Jacobsen
Source: ArXiv AI | 19-08-25
An LLM + ASP Workflow for Joint Entity-Relation Extraction
Authors: Trang Tran, Trung Hoang Le, Huiping Cao, Tran Cao Son
Source: ArXiv AI | 19-08-25
Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models
Authors: Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li
Source: ArXiv AI | 19-08-25
GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance
Authors: Jinquan Shi, Yingying Cheng, Fan Zhang, Miao Jiang, Jun Lin, Yanbai Shen
Source: ArXiv AI | 19-08-25
The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning
Authors: Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang
Source: ArXiv AI | 19-08-25
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
Authors: Alessio Galatolo, Luca Alberto Rappuoli, Katie Winkle, Meriem Beloucif
Source: ArXiv AI | 19-08-25
GTool: Graph Enhanced Tool Planning with Large Language Model
Authors: Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi
Source: ArXiv AI | 19-08-25
Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption
Authors: Faruk Alpay, Taylan Alpay
Source: ArXiv AI | 19-08-25
E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model
Authors: Ronghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu, Yap-peng Tan
Source: ArXiv AI | 19-08-25
Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
Authors: Ting Yang, Li Chen, Huimin Wang
Source: ArXiv AI | 19-08-25
Do Large Language Model Agents Exhibit a Survival Instinct? An Empirical Study in a Sugarscape-Style Simulation
Authors: Atsushi Masumori, Takashi Ikegami
Source: ArXiv AI | 19-08-25
Tencent's X-Omni uses open source components to challenge GPT-4o image generation
Source: The Decoder | 18-08-25
ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection
Authors: Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu
Source: ArXiv AI | 18-08-25
LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought
Authors: Ruiyan Qi, Congding Wen, Weibo Zhou, Shangsong Liang, Lingbo Li
Source: ArXiv AI | 18-08-25
Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
Authors: Francesco Sovrano, Gabriele Dominici, Rita Sevastjanova, Alessandra Stramiglio, Alberto Bacchelli
Source: ArXiv AI | 18-08-25
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Authors: Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen
Source: ArXiv AI | 18-08-25
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Authors: Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui
Source: ArXiv AI | 18-08-25
Leveraging the RETFound foundation model for optic disc segmentation in retinal images
Authors: Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco
Source: ArXiv AI | 18-08-25
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Authors: Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao
Source: ArXiv AI | 18-08-25
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Authors: Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov
Source: ArXiv AI | 18-08-25
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis
Authors: Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, Xin Sun, Junxiao Wang
Source: ArXiv AI | 18-08-25
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Authors: Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
Source: ArXiv AI | 18-08-25
Reference Points in LLM Sentiment Analysis: The Role of Structured Context
Authors: Junichiro Niimi
Source: ArXiv AI | 18-08-25
Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies
Authors: Fanzhen Liu, Xiaoxiao Ma, Jian Yang, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Quan Z. Sheng, Jia Wu
Source: ArXiv AI | 18-08-25
Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Authors: Erez Meoded
Source: ArXiv AI | 18-08-25
A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow
Authors: George Paterakis, Andrea Castellani, George Papoutsoglou, Tobias Rodemann, Ioannis Tsamardinos
Source: ArXiv AI | 18-08-25
CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Authors: Zhihao Li, Zimo Ji, Tao Zheng, Hao Ren, Xiao Lan
Source: ArXiv AI | 18-08-25
Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
Authors: Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che
Source: ArXiv AI | 18-08-25
Controlling Multimodal LLMs via Reward-guided Decoding
Authors: Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal
Source: ArXiv AI | 18-08-25
Is ChatGPT-5 Ready for Mammogram VQA?
Authors: Qiang Li, Shansong Wang, Mingzhe Hu, Mojtaba Safari, Zachary Eidex, Xiaofeng Yang
Source: ArXiv AI | 18-08-25
SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding
Authors: Yifei Li, Lingling Zhang, Hang Yan, Tianzhe Zhao, Zihan Ma, Muye Huang, Jun Liu
Source: ArXiv AI | 18-08-25
AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager
Authors: Xuhua Zhao, Yuxuan Xie, Caihua Chen, Yuxiang Sun
Source: ArXiv AI | 18-08-25
Inspire or Predict? Exploring New Paradigms in Assisting Classical Planners with Large Language Models
Authors: Wenkai Yu, Jianhang Tang, Yang Zhang, Shanjiang Tang, Kebing Jin, Hankz Hankui Zhuo
Source: ArXiv AI | 18-08-25
LLMs and coding agents are a security nightmare (garymarcus.substack.com)
Source: Hacker News | 18-08-25
Llama-Scan: Convert PDFs to Text W Local LLMs (github.com/ngafar)
Source: Hacker News | 18-08-25
When you're asking AI chatbots for answers, they're data-mining you (theregister.com)
Source: Hacker News | 18-08-25
Claudia – Desktop companion for Claude code (claudiacode.com)
Source: Hacker News | 18-08-25
Teaching GPT-5 to Use a Computer (prava.co)
Source: Hacker News | 18-08-25
Here be dragons: Preventing static damage, latchup, and metastability in the 386 (righto.com)
Source: Hacker News | 18-08-25
Warmer-sounding LLMs are more likely to repeat false information and conspiracy theories
Source: The Decoder | 18-08-25
Performance of GPT-5 in Brain Tumor MRI Reasoning
Authors: Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang
Source: ArXiv AI | 17-08-25
From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms
Authors: Zhaokun Jiang, Ziyin Zhang
Source: ArXiv AI | 17-08-25
A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots
Authors: Henry Powell, Guy Laban, Emily S. Cross
Source: ArXiv AI | 17-08-25
TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning
Authors: Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra
Source: ArXiv AI | 17-08-25
Searching for Privacy Risks in LLM Agents via Simulation
Authors: Yanzhe Zhang, Diyi Yang
Source: ArXiv AI | 17-08-25
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Authors: Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna
Source: ArXiv AI | 17-08-25
A Survey of Optimization Modeling Meets LLMs: Progress and Future Directions
Authors: Ziyang Xiao, Jingrong Xie, Lilin Xu, Shisi Guan, Jingyan Zhu, Xiongwei Han, Xiaojin Fu, WingYin Yu, Han Wu, Wei Shi, Qingcan Kang, Jiahui Duan, Tao Zhong, Mingxuan Yuan, Jia Zeng, Yuan Wang, Gang Chen, Dongxiang Zhang
Source: ArXiv AI | 17-08-25
KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems
Authors: Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman
Source: ArXiv AI | 17-08-25
Agentic AI Frameworks: Architectures, Protocols, and Design Challenges
Authors: Hana Derouiche, Zaki Brahmi, Haithem Mazeni
Source: ArXiv AI | 17-08-25
Why Cannot Large Language Models Ever Make True Correct Reasoning?
Authors: Jingde Cheng
Source: ArXiv AI | 17-08-25
Extending the Entropic Potential of Events for Uncertainty Quantification and Decision-Making in Artificial Intelligence
Authors: Mark Zilberman
Source: ArXiv AI | 17-08-25
What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles
Authors: Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu
Source: ArXiv AI | 17-08-25
A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering
Authors: Chenliang Zhang, Lin Wang, Yuanyuan Lu, Yusheng Qi, Kexin Wang, Peixu Hou, Wenshi Chen
Source: ArXiv AI | 17-08-25
HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
Authors: Yan Ting Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang
Source: ArXiv AI | 17-08-25
LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval
Authors: Yaoze Zhang, Rong Wu, Pinlong Cai, Xiaoman Wang, Guohang Yan, Song Mao, Ding Wang, Botian Shi
Source: ArXiv AI | 17-08-25
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model
Authors: Shicheng Xu, Xin Huang, Zihao Wei, Liang Pang, Huawei Shen, Xueqi Cheng
Source: ArXiv AI | 17-08-25
SEQ-GPT: LLM-assisted Spatial Query via Example
Authors: Ivan Khai Ze Lim, Ningyi Liao, Yiming Yang, Gerald Wei Yong Yip, Siqiang Luo
Source: ArXiv AI | 17-08-25
FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
Authors: Xueli Pan, Victor de Boer, Jacco van Ossenbruggen
Source: ArXiv AI | 17-08-25
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Authors: Xinyan Jiang, Lin Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu
Source: ArXiv AI | 17-08-25
GenOM: Ontology Matching with Description Generation and Large Language Model
Authors: Yiping Song, Jiaoyan Chen, Renate A. Schmidt
Source: ArXiv AI | 17-08-25
Modeling Human Responses to Multimodal AI Content
Authors: Zhiqi Shen, Shaojing Fan, Danni Xu, Terence Sim, Mohan Kankanhalli
Source: ArXiv AI | 17-08-25
Who Benefits from AI Explanations? Towards Accessible and Interpretable Systems
Authors: Maria J. P. Peixoto, Akriti Pandey, Ahsan Zaman, Peter R. Lewis
Source: ArXiv AI | 17-08-25
The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference
Authors: Maël Jullien, Marco Valentino, André Freitas
Source: ArXiv AI | 17-08-25
Tversky Neural Networks (gonzoml.substack.com)
Source: Hacker News | 17-08-25
A Lisp in 99LOC (github.com/robert-van-engelen)
Source: Hacker News | 17-08-25
Dyna – Logic Programming for Machine Learning (dyna.org)
Source: Hacker News | 17-08-25
OpenAI Misled You on RLHF (aerial-toothpaste-34a.notion.site)
Source: Hacker News | 17-08-25
OpenAI Progress (progress.openai.com)
Source: Hacker News | 17-08-25
OpenAI CEO Sam Altman says human-made content will "go up in value dramatically"
Source: The Decoder | 17-08-25
Google unveils Gemma 3 270M, its most compact model designed for efficient, task-specific AI use
Source: The Decoder | 17-08-25
Monday – A personality experiment (chatgpt.com)
Source: Hacker News | 17-08-25
Zhipu AI's GLM-4.5 is yet another open-source Chinese LLM closing the gap with Western models
Source: The Decoder | 17-08-25
Launch HN: Embedder (YC S25) – Claude code for embedded software
Source: Hacker News | 16-08-25
Geoffrey Hinton urges researchers to design AI with nurturing instincts to protect humanity
Source: The Decoder | 16-08-25
HTC unveils VIVE Eagle, a lightweight AI headset powered by OpenAI and Gemini
Source: The Decoder | 16-08-25
I let LLMs write an Elixir NIF in C; it mostly worked (overbring.com)
Source: Hacker News | 16-08-25
Claude Opus 4 and 4.1 can now end a rare subset of conversations (anthropic.com)
Source: Hacker News | 16-08-25
OpenAI's o3 model outperforms the newer GPT-5 model on complex, multi-app office tasks
Source: The Decoder | 16-08-25
Apple is reportedly planning an AI push with four new smart home products
Source: The Decoder | 15-08-25
Doctors detected fewer lesions after routinely using AI during colonoscopies
Source: The Decoder | 15-08-25
A conversation with Max Tegmark inspired xAI co-founder Igor Babuschkin's shift to safer AI
Source: The Decoder | 15-08-25
Why LLMs can't really build software (zed.dev)
Source: Hacker News | 15-08-25
Is chain-of-thought AI reasoning a mirage? (seangoedecke.com)
Source: Hacker News | 15-08-25
OpenAI's AI system wins a gold medal-level score at the International Olympiad in Informatics 2025
Source: The Decoder | 14-08-25
ChatGPT users can now toggle Auto, Fast, and Thinking modes for more control over GPT-5
Source: The Decoder | 14-08-25
Show HN: Vaultrice – A real-time key-value store with a localStorage API (vaultrice.com)
Source: Hacker News | 14-08-25
Convo-Lang: LLM Programming Language and Runtime (convo-lang.ai)
Source: Hacker News | 14-08-25
Show HN: Yet another memory system for LLMs (github.com/trvon)
Source: Hacker News | 14-08-25
Mbodi AI (YC X25) Is Hiring a Founding Research Engineer (Robotics) (ycombinator.com)
Source: Hacker News | 14-08-25
What's the strongest AI model you can train on a laptop in five minutes? (seangoedecke.com)
Source: Hacker News | 14-08-25
Evaluating the Role of Large Language Models in Legal Practice in India
Authors: Rahul Hemrajani (National Law School of India University, Bengaluru)
Source: ArXiv AI | 14-08-25
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
Authors: Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci
Source: ArXiv AI | 14-08-25
Enhance the machine learning algorithm performance in phishing detection with keyword features
Authors: Zijiang Yang
Source: ArXiv AI | 14-08-25
A Comprehensive Survey of Datasets for Clinical Mental Health AI Systems
Authors: Aishik Mandal, Prottay Kumar Adhikary, Hiba Arnaout, Iryna Gurevych, Tanmoy Chakraborty
Source: ArXiv AI | 14-08-25
LibRec: Benchmarking Retrieval-Augmented LLMs for Library Migration Recommendations
Authors: Junxiao Han, Yarong Wang, Xiaodong Gu, Cuiyun Gao, Yao Wan, Song Han, David Lo, Shuiguang Deng
Source: ArXiv AI | 14-08-25
Perceptual Reality Transformer: Neural Architectures for Simulating Neurological Perception Conditions
Authors: Baihan Lin
Source: ArXiv AI | 14-08-25
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng
Source: ArXiv AI | 14-08-25
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Authors: Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam
Source: ArXiv AI | 14-08-25
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs
Authors: Arjun Ashok, Andrew Robert Williams, Vincent Zhihao Zheng, Irina Rish, Nicolas Chapados, Étienne Marcotte, Valentina Zantedeschi, Alexandre Drouin
Source: ArXiv AI | 14-08-25
Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Authors: Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin
Source: ArXiv AI | 14-08-25
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Authors: Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti
Source: ArXiv AI | 14-08-25
A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Authors: Muneeza Azmat, Momin Abbas, Maysa Malfiza Garcia de Macedo, Marcelo Carpinette Grave, Luan Soares de Souza, Tiago Machado, Rogerio A de Paula, Raya Horesh, Yixin Chen, Heloisa Caroline de Souza Pereira Candello, Rebecka Nordenlow, Aminat Adebiyi
Source: ArXiv AI | 14-08-25
The Othello AI Arena: Evaluating Intelligent Systems Through Limited-Time Adaptation to Unseen Boards
Authors: Sundong Kim
Source: ArXiv AI | 14-08-25
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Authors: Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li
Source: ArXiv AI | 14-08-25
The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?
Authors: Manuel Herrador
Source: ArXiv AI | 14-08-25
UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge
Authors: Yang Zhang, Cunxiang Wang, Lindong Wu, Wenbo Yu, Yidong Wang, Guangsheng Bao, Jie Tang
Source: ArXiv AI | 14-08-25
RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA
Authors: Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, Viktoria Rojkova
Source: ArXiv AI | 14-08-25
Mathematical Computation and Reasoning Errors by Large Language Models
Authors: Liang Zhang, Edith Aurora Graf
Source: ArXiv AI | 14-08-25
Claude says “You're absolutely right!” about everything (github.com/anthropics)
Source: Hacker News | 14-08-25
Illinois bans use of artificial intelligence for mental health therapy (washingtonpost.com)
Source: Hacker News | 14-08-25
Nvidia researchers urge the AI industry to rethink agentic AI in favor of smaller, more efficient LLMs
Source: The Decoder | 13-08-25
Nvidia pushes "Physical AI" with new Blackwell hardware and AI models
Source: The Decoder | 13-08-25
Psychiatrist warns of AI-driven delusions as OpenAI's Sam Altman admits risks
Source: The Decoder | 13-08-25
GPT-5 is here and Gary Marcus is not impressed
Source: The Decoder | 13-08-25
Nvidia and AMD must pay the U.S. a portion of revenue for selling AI chips in China
Source: The Decoder | 13-08-25
A Comprehensive Survey of Self-Evolving AI Agents [pdf] (arxiv.org)
Source: Hacker News | 13-08-25
Show HN: Omnara – Run Claude Code from anywhere (github.com/omnara-ai)
Source: Hacker News | 13-08-25
Show HN: Building a web search engine from scratch with 3B neural embeddings (blog.wilsonl.in)
Source: Hacker News | 13-08-25
His psychosis was a mystery–until doctors learned about ChatGPT's health advice (psypost.org)
Source: Hacker News | 13-08-25
Claude Sonnet 4 now supports 1M tokens of context (anthropic.com)
Source: Hacker News | 13-08-25
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
Authors: Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang Yu, Lionel M. Ni, Heung-Yeung Shum
Source: ArXiv AI | 13-08-25
Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams
Authors: Zane Witherspoon, Thet Mon Aye, YingYing Hao
Source: ArXiv AI | 13-08-25
UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games
Authors: Timo Bertram
Source: ArXiv AI | 13-08-25
What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
Authors: Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab |
Source: ArXiv AI | 13-08-25
First Ask Then Answer: A Framework Design for AI Dialogue Based on Supplementary Questioning with Large Language Models
Authors: Chuanruo Fu, Yuncheng Du |
Source: ArXiv AI | 13-08-25
LLM-BI: Towards Fully Automated Bayesian Inference with Large Language Models
Authors: Yongchao Huang |
Source: ArXiv AI | 13-08-25
Topos Theory for Generative AI and LLMs
Authors: Sridhar Mahadevan |
Source: ArXiv AI | 13-08-25
POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem
Authors: Szymon Jakubicz, Karol Kuźniak, Jan Wawszczak, Paweł Gora |
Source: ArXiv AI | 13-08-25
AgriGPT: a Large Language Model Ecosystem for Agriculture
Authors: Bo Yang, Yu Zhang, Lanfei Feng, Yunkui Chen, Jianyu Zhang, Xiao Xu, Nueraili Aierken, Yurui Li, Yuxuan Chen, Guijun Yang, Yong He, Runhe Huang, Shijian Li |
Source: ArXiv AI | 13-08-25
SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
Authors: Arshia Ilaty, Hossein Shirazi, Hajar Homayouni |
Source: ArXiv AI | 13-08-25
GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
Authors: Yuchen Li, Cong Lin, Muhammad Umair Nasir, Philip Bontrager, Jialin Liu, Julian Togelius |
Source: ArXiv AI | 13-08-25
Large Language Models as Oracles for Ontology Alignment
Authors: Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jimenez-Ruiz, Artur d'Avila Garcez |
Source: ArXiv AI | 13-08-25
Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training
Authors: Vishakha Lall, Yisi Liu |
Source: ArXiv AI | 13-08-25
A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions
Authors: Amir Mohammad Salehoof, Ali Ramezani, Yadollah Yaghoobzadeh, Majid Nili Ahmadabadi |
Source: ArXiv AI | 13-08-25
Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition
Authors: Mustafa Akben, Vinayaka Gude, Haya Ajjan |
Source: ArXiv AI | 13-08-25
Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
Authors: Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang |
Source: ArXiv AI | 13-08-25
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Authors: Rui Wang, Qihan Lin, Jiayu Liu, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song |
Source: ArXiv AI | 13-08-25
Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs
Authors: Shivam Dubey |
Source: ArXiv AI | 13-08-25
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
Authors: Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey |
Source: ArXiv AI | 13-08-25
CVCM Track Circuits Pre-emptive Failure Diagnostics for Predictive Maintenance Using Deep Neural Networks
Authors: Debdeep Mukherjee (2), Eduardo Di Santi (1), Clément Lefebvre (1), Nenad Mijatovic (1), Victor Martin (1), Thierry Josse (3), Jonathan Brown (1), Kenza Saiah (1) ((1) Digital and Integrated Systems, Alstom (2) Innovation and Smart Mobility, Alstom (3) Project System Engineering, Alstom) |
Source: ArXiv AI | 13-08-25
SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
Authors: Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao |
Source: ArXiv AI | 13-08-25
Agent-based AI systems face growing threats from zero-click and one-click exploits
Source: The Decoder | 13-08-25
Nexus: An Open-Source AI Router for Governance, Control and Observability (nexusrouter.com)
Source: Hacker News | 13-08-25
Evaluating LLMs playing text adventures (entropicthoughts.com)
Source: Hacker News | 13-08-25
Weave (YC W25) is hiring a founding AI engineer (ycombinator.com)
Source: Hacker News | 13-08-25
LLMs aren't world models (yosefk.com)
Source: Hacker News | 13-08-25
Launch HN: Design Arena (YC S25) – Head-to-head AI benchmark for aesthetics
Source: Hacker News | 13-08-25
U.S. authorities have reportedly embedded secret GPS trackers in shipments of advanced AI chips
Source: The Decoder | 13-08-25
Here’s how to spot AI writing, according to Wikipedia editors
Source: The Decoder | 12-08-25
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (arstechnica.com)
Source: Hacker News | 12-08-25
Sloppy AI defenses take cybersecurity back to the 1990s, researchers say (scworld.com)
Source: Hacker News | 12-08-25
Claude Code is all you need (dwyer.co.za)
Source: Hacker News | 12-08-25
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams
Authors: Pengfei Zhou, Xiaopeng Peng, Fanrui Zhang, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Zekai Li, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang |
Source: ArXiv AI | 12-08-25
Automated Formalization via Conceptual Retrieval-Augmented LLMs
Authors: Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu |
Source: ArXiv AI | 12-08-25
DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning
Authors: Dan Ivanov, Tristan Freiberg, Haruna Isah |
Source: ArXiv AI | 12-08-25
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Authors: Changqing Li, Tianlin Li, Xiaohan Zhang, Aishan Liu, Li Pan |
Source: ArXiv AI | 12-08-25
Large Language Models Do Not Simulate Human Psychology
Authors: Sarah Schröder, Thekla Morgenroth, Ulrike Kuhl, Valerie Vaquet, Benjamin Paaßen |
Source: ArXiv AI | 12-08-25
Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach
Authors: Naseem Machlovi, Maryam Saleki, Innocent Ababio, Ruhul Amin |
Source: ArXiv AI | 12-08-25
Generative AI for Strategic Plan Development
Authors: Jesse Ponnock |
Source: ArXiv AI | 12-08-25
Rethinking Domain-Specific LLM Benchmark Construction: A Comprehensiveness-Compactness Approach
Authors: Rubing Chen, Jiaxin Wu, Jian Wang, Xulu Zhang, Wenqi Fan, Chenghua Lin, Xiao-Yong Wei, Qing Li |
Source: ArXiv AI | 12-08-25
MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark
Authors: Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo |
Source: ArXiv AI | 12-08-25
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
Authors: Alexander Duffy, Samuel J Paech, Ishana Shastri, Elizabeth Karpinski, Baptiste Alloui-Cros, Tyler Marques, Matthew Lyle Olson |
Source: ArXiv AI | 12-08-25
Grounding Natural Language for Multi-agent Decision-Making with Multi-agentic LLMs
Authors: Dom Huh, Prasant Mohapatra |
Source: ArXiv AI | 12-08-25
Multimodal AI Systems for Enhanced Laying Hen Welfare Assessment and Productivity Optimization
Authors: Daniel Essien, Suresh Neethirajan |
Source: ArXiv AI | 12-08-25
1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Authors: Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap |
Source: ArXiv AI | 12-08-25
Symmetry-Aware Transformer Training for Automated Planning
Authors: Markus Fritzsche, Elliot Gestrin, Jendrik Seipp |
Source: ArXiv AI | 12-08-25
X-evolve: Solution space evolution powered by large language models
Authors: Yi Zhai, Zhiqiang Wei, Ruohan Li, Keyu Pan, Shuo Liu, Lu Zhang, Jianmin Ji, Wuyang Zhang, Yu Zhang, Yanyong Zhang |
Source: ArXiv AI | 12-08-25
FEAT: A Multi-Agent Forensic AI System with Domain-Adapted Large Language Model for Automated Cause-of-Death Analysis
Authors: Chen Shen, Wanqing Zhang, Kehan Li, Erwen Huang, Haitao Bi, Aiying Fan, Yiwen Shen, Hongmei Dong, Ji Zhang, Yuming Shao, Zengjia Liu, Xinshe Liu, Tao Li, Chunxia Yan, Shuanliang Fan, Di Wu, Jianhua Ma, Bin Cong, Zhenyuan Wang, Chunfeng Lian |
Source: ArXiv AI | 12-08-25
Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths
Authors: Rui Yao (1), Qi Chai (1 and 3), Jinhai Yao (2), Siyuan Li (1), Junhao Chen (1), Qi Zhang (2), Hao Wang (1) ((1) The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, (2) Shanghai Jiaotong University, Shanghai, China, (3) Xi'an Jiaotong University, Xi'an, China) |
Source: ArXiv AI | 12-08-25
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
Authors: Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang |
Source: ArXiv AI | 12-08-25
TeamMedAgents: Enhancing Medical Decision-Making of LLMs Through Structured Teamwork
Authors: Pranav Pushkar Mishra, Mohammad Arvan, Mohan Zalake (University of Illinois, Chicago) |
Source: ArXiv AI | 12-08-25
From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework
Authors: Yunkai Hu, Tianqiao Zhao, Meng Yue |
Source: ArXiv AI | 12-08-25
Optimizing my sleep around Claude usage limits (mattwie.se)
Source: Hacker News | 12-08-25
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Authors: Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He |
Source: ArXiv AI | 12-08-25
End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation
Authors: Anurag Tripathi, Vaibhav Patle, Abhinav Jain, Ayush Pundir, Sairam Menon, Ajeet Kumar Singh |
Source: ArXiv AI | 12-08-25
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
Authors: Han Yin, Yafeng Chen, Chong Deng, Luyao Cheng, Hui Wang, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li |
Source: ArXiv AI | 12-08-25
Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks
Authors: Ze Shen Chin |
Source: ArXiv AI | 12-08-25
Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling
Authors: Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong |
Source: ArXiv AI | 12-08-25
Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation
Authors: Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, Jingkuan Song |
Source: ArXiv AI | 12-08-25
Echoes of Automation: The Increasing Use of LLMs in Newsmaking
Authors: Abolfazl Ansari, Delvin Ce Zhang, Nafis Irtiza Tripto, Dongwon Lee |
Source: ArXiv AI | 12-08-25
Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Authors: Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain |
Source: ArXiv AI | 12-08-25
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Authors: Sanket Badhe |
Source: ArXiv AI | 12-08-25
Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning
Authors: Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber |
Source: ArXiv AI | 12-08-25
Holistic Explainable AI (H-XAI): Extending Transparency Beyond Developers in AI-Driven Decision Making
Authors: Kausik Lakkaraju, Siva Likitha Valluru, Biplav Srivastava |
Source: ArXiv AI | 12-08-25
Whither symbols in the era of advanced neural networks?
Authors: Thomas L. Griffiths, Brenden M. Lake, R. Thomas McCoy, Ellie Pavlick, Taylor W. Webb |
Source: ArXiv AI | 12-08-25
LLMs for Resource Allocation: A Participatory Budgeting Approach to Inferring Preferences
Authors: Sankarshan Damle, Boi Faltings |
Source: ArXiv AI | 12-08-25
SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges
Authors: Dewi S. W. Gould, Bruno Mlodozeniec, Samuel F. Brown |
Source: ArXiv AI | 12-08-25
GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
Authors: Yumeng Fu, Jiayin Zhu, Lingling Zhang, Bo Zhao, Shaoxuan Ma, Yushun Zhang, Yanrui Wu, Wenjun Wu |
Source: ArXiv AI | 12-08-25
Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
Authors: Zailong Tian, Zhuoheng Han, Yanzhe Chen, Haozhe Xu, Xi Yang, richeng xuan, Hongfeng Wang, Lizi Liao |
Source: ArXiv AI | 12-08-25
Retrieval Augmented Large Language Model System for Comprehensive Drug Contraindications
Authors: Byeonghun Bang, Jongsuk Yoon, Dong-Jin Chang, Seho Park, Yong Oh Lee |
Source: ArXiv AI | 12-08-25
LLM Robustness Leaderboard v1 --Technical report
Authors: Pierre Peigné - Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe |
Source: ArXiv AI | 12-08-25
From Explainable to Explanatory Artificial Intelligence: Toward a New Paradigm for Human-Centered Explanations through Generative AI
Authors: Christian Meske, Justin Brenne, Erdi Uenal, Sabahat Oelcer, Ayseguel Doganguen |
Source: ArXiv AI | 12-08-25
AntiCheatPT: A Transformer-Based Approach to Cheat Detection in Competitive Computer Games
Authors: Mille Mei Zhen Loo, Gert Luzkov, Paolo Burelli |
Source: ArXiv AI | 12-08-25
The Fair Game: Auditing & Debiasing AI Algorithms Over Time
Authors: Debabrota Basu, Udvas Das |
Source: ArXiv AI | 12-08-25
OpenAI CEO Sam Altman responds to GPT-5 backlash, outlines next steps
Source: The Decoder | 11-08-25
Fitzgerald's Follies (libertiesjournal.com)
Source: Hacker News | 11-08-25
Graham: Synchronizing Clocks by Leveraging Local Clock Properties (2022) [pdf] (usenix.org)
Source: Hacker News | 11-08-25
GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2 (sebastianraschka.com)
Source: Hacker News | 11-08-25
GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM (reddit.com)
Source: Hacker News | 11-08-25
Hand-picked selection of articles on AI fundamentals/concepts (aman.ai)
Source: Hacker News | 11-08-25
Meta acquires audio AI startup WaveForms as it ramps up efforts to build Llama 4.5
Source: The Decoder | 11-08-25
How I code with AI on a budget/free (wuu73.org)
Source: Hacker News | 11-08-25
Show HN: Reactive: A React Book for the Reluctant (written by Claude) (github.com/cloudstreet-dev)
Source: Hacker News | 11-08-25
The current state of LLM-driven development (tolki.dev)
Source: Hacker News | 10-08-25
Ch.at – a lightweight LLM chat service accessible through HTTP, SSH, DNS and API (ch.at)
Source: Hacker News | 10-08-25
My Lethal Trifecta talk at the Bay Area AI Security Meetup (simonwillison.net)
Source: Hacker News | 10-08-25
Curious about the training data of OpenAI's new GPT-OSS models? I was too (twitter.com/jxmnop)
Source: Hacker News | 10-08-25
Embedding Alignment in Code Generation for Audio
Authors: Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito |
Source: ArXiv AI | 10-08-25
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
Authors: Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei, Sashank Varma, Yi-Chia Wang, Ali Emami |
Source: ArXiv AI | 10-08-25
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
Authors: Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai |
Source: ArXiv AI | 10-08-25
Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models
Authors: Guilherme Seidyo Imai Aldeia, Daniel S. Herman, William G. La Cava |
Source: ArXiv AI | 10-08-25
Conformal Sets in Multiple-Choice Question Answering under Black-Box Settings with Provable Coverage Guarantees
Authors: Guang Yang, Xinyang Liu |
Source: ArXiv AI | 10-08-25
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
Authors: Haitao Hong, Yuchen Yan, Xingyu Wu, Guiyang Hou, Wenqi Zhang, Weiming Lu, Yongliang Shen, Jun Xiao |
Source: ArXiv AI | 10-08-25
How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
Authors: Brandon Jaipersaud, David Krueger, Ekdeep Singh Lubana |
Source: ArXiv AI | 10-08-25
TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution
Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |
Source: ArXiv AI | 10-08-25
Prescriptive Agents based on Rag for Automated Maintenance (PARAM)
Authors: Chitranshu Harbola, Anupam Purwar |
Source: ArXiv AI | 10-08-25
Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning
Authors: Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens |
Source: ArXiv AI | 10-08-25
Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS)
Authors: Mahdi Nazari Ashani, Ali Asghar Alesheikh, Saba Kazemi, Kimya Kheirkhah, Yasin Mohammadi, Fatemeh Rezaie, Amir Mahdi Manafi, Hedieh Zarkesh |
Source: ArXiv AI | 10-08-25
Who is a Better Player: LLM against LLM
Authors: Yingjie Zhou, Jiezhang Cao, Farong Wen, Li Xu, Yanwei Jiang, Jun Jia, Ronghui Li, Xiaohong Liu, Yu Zhou, Xiongkuo Min, Jie Guo, Zicheng Zhang, Guangtao Zhai |
Source: ArXiv AI | 10-08-25
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Authors: Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang |
Source: ArXiv AI | 10-08-25
Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses
Authors: Bin Han, Robert Wolfe, Anat Caspi, Bill Howe |
Source: ArXiv AI | 10-08-25
EasySize: Elastic Analog Circuit Sizing via LLM-Guided Heuristic Search
Authors: Xinyue Wu, Fan Hu, Shaik Jani Babu, Yi Zhao, Xinfei Guo |
Source: ArXiv AI | 10-08-25
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents
Authors: Andrew Kiruluta |
Source: ArXiv AI | 10-08-25
QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
Authors: Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li |
Source: ArXiv AI | 10-08-25
NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making
Authors: Asutosh Hota, Jussi P.P. Jokinen |
Source: ArXiv AI | 10-08-25
Large Language Models Transform Organic Synthesis From Reaction Prediction to Automation
Authors: Kartar Kumar Lohana Tharwani, Rajesh Kumar, Sumita, Numan Ahmed, Yong Tang |
Source: ArXiv AI | 10-08-25
An Explainable Machine Learning Framework for Railway Predictive Maintenance using Data Streams from the Metro Operator of Portugal
Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Fátima Leal, Bruno Veloso, Benedita Malheiro, Juan Carlos Burguillo-Rial |
Source: ArXiv AI | 10-08-25
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?
Authors: Matteo Prandi, Vincenzo Suriani, Federico Pierucci, Marcello Galisai, Daniele Nardi, Piercosma Bisconti |
Source: ArXiv AI | 10-08-25
InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Authors: Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang |
Source: ArXiv AI | 10-08-25
Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?
Authors: Burak Can Kaplan, Hugo Cesar De Castro Carneiro, Stefan Wermter |
Source: ArXiv AI | 10-08-25
Simulating Human-Like Learning Dynamics with LLM-Empowered Agents
Authors: Yu Yuan, Lili Zhao, Wei Chen, Guangting Zheng, Kai Zhang, Mengdi Zhang, Qi Liu |
Source: ArXiv AI | 10-08-25
Prompting GPT-5 for agentic workflows and advanced coding applications
Source: The Decoder | 10-08-25
GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it (garymarcus.substack.com)
Source: Hacker News | 10-08-25
Let's properly analyze an AI article for once (nibblestew.blogspot.com)
Source: Hacker News | 09-08-25
Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?
Source: Hacker News | 09-08-25
Getting good results from Claude Code (dzombak.com)
Source: Hacker News | 09-08-25
What the Windsurf sale means for the AI coding ecosystem (ethanding.substack.com)
Source: Hacker News | 09-08-25
I want everything local – Building my offline AI workspace (instavm.io)
Source: Hacker News | 09-08-25
Attackers can hijack Google Gemini with a simple prompt hidden in a calendar invite
Source: The Decoder | 09-08-25
Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI
Source: The Decoder | 09-08-25
GPT-5 should "seem smarter from today" after OpenAI fixed early issues with its model switcher
Source: The Decoder | 09-08-25
HRT's Python fork: Leveraging PEP 690 for faster imports (hudsonrivertrading.com)
Source: Hacker News | 09-08-25
A robust, open-source framework for Spiking Neural Networks on low-end FPGAs (arxiv.org)
Source: Hacker News | 09-08-25
Open SWE: An open-source asynchronous coding agent (langchain.com)
Source: Hacker News | 09-08-25
The surprise deprecation of GPT-4o for ChatGPT consumers (simonwillison.net)
Source: Hacker News | 09-08-25
Developers rely on AI tools more than ever, but trust is slipping
Source: The Decoder | 09-08-25
Yet another study doubts that LLM reasoning shows true logic over pattern imitation
Source: The Decoder | 09-08-25
Political pressure reportedly kept a major AI vulnerability study under wraps
Source: The Decoder | 08-08-25
An invisible prompt in a Google Doc made ChatGPT access data from a victim’s Google Drive
Source: The Decoder | 08-08-25
A deleted GitHub post gives an early look at OpenAI’s next major model, GPT-5
Source: The Decoder | 08-08-25
How AI conquered the US economy: A visual FAQ (derekthompson.org)
Source: Hacker News | 08-08-25
GPT-5 for Developers (openai.com)
Source: Hacker News | 08-08-25
Writing a storage engine for Postgres: An in-memory table access method (2023) (eatonphil.com)
Source: Hacker News | 08-08-25
OpenAI's new open-source model is basically Phi-5 (seangoedecke.com)
Source: Hacker News | 08-08-25
GPT-5: Key characteristics, pricing and system card (simonwillison.net)
Source: Hacker News | 08-08-25
GPT-5 (openai.com)
Source: Hacker News | 08-08-25
Claude Code IDE integration for Emacs (github.com/manzaltu)
Source: Hacker News | 08-08-25
An LLM does not need to understand MCP (hackteam.io)
Source: Hacker News | 08-08-25
Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude (github.com/synthetic-lab)
Source: Hacker News | 08-08-25
OpenAI pushes back as the New York Times demands access to 120 million ChatGPT chat logs
Source: The Decoder | 07-08-25
Show HN: Aura – Like robots.txt, but for AI actions (github.com/osmandkitay)
Source: Hacker News | 07-08-25
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
Source: Hacker News | 07-08-25
New AI Coding Teammate: Gemini CLI GitHub Actions (blog.google)
Source: Hacker News | 07-08-25
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
Authors: Nuo Chen, Moming Duan, Andre Huikai Lin, Qian Wang, Jiaying Wu, Bingsheng He |
Source: ArXiv AI | 07-08-25
Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
Authors: Magauiya Zhussip, Dmitriy Shopkhoev, Ammar Ali, Stamatios Lefkimmiatis |
Source: ArXiv AI | 07-08-25
TURA: Tool-Augmented Unified Retrieval Agent for AI Search
Authors: Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin |
Source: ArXiv AI | 07-08-25
YOLOv8-Based Deep Learning Model for Automated Poultry Disease Detection and Health Monitoring paper
Authors: Akhil Saketh Reddy Sabbella, Ch.Lakshmi Prachothan, Eswar Kumar Panta |
Source: ArXiv AI | 07-08-25
How are CS students using resources and AI tools for coding tasks?
Authors: Natalia Echeverry, Arun Lekshmi Narayanan |
Source: ArXiv AI | 07-08-25
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Authors: Mo Li, L.H. Xu, Qitai Tan, Ting Cao, Yunxin Liu |
Source: ArXiv AI | 07-08-25
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Authors: Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen |
Source: ArXiv AI | 07-08-25
MI9 -- Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems
Authors: Charles L. Wang, Trisha Singhal, Ameya Kelkar, Jason Tuo |
Source: ArXiv AI | 07-08-25
Galaxy: A Cognition-Centered Framework for Proactive, Privacy-Preserving, and Self-Evolving LLM Agents
Authors: Chongyu Bao, Ruimin Dai, Yangbo Shen, Runyang Jian, Jinghan Zhang, Xiaolan Liu, Kunpeng Liu |
Source: ArXiv AI | 07-08-25
Can Large Language Models Adequately Perform Symbolic Reasoning Over Time Series?
Authors: Zewen Liu, Juntong Ni, Xianfeng Tang, Max S.Y. Lau, Wei Jin |
Source: ArXiv AI | 07-08-25
Towards Transparent AI Grading: Semantic Entropy as a Signal for Human-AI Disagreement
Authors: Karrtik Iyer, Manikandan Ravikiran, Prasanna Pendse, Shayan Mohanty |
Source: ArXiv AI | 07-08-25
Large Language Model's Multi-Capability Alignment in Biomedical Domain
Authors: Wentao Wu, Linqing Chen, Hanmeng Zhong, Weilei Wang |
Source: ArXiv AI | 07-08-25
Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents
Authors: Thassilo M. Schiepanski, Nicholas Piël |
Source: ArXiv AI | 07-08-25
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Authors: Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu |
Source: ArXiv AI | 07-08-25
SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices
Authors: Si Chen, Izzy Molnar, Ting Hua, Peiyu Li, Le Huy Khiem, G. Alex Ambrose, Jim Lang, Ronald Metoyer, Nitesh V. Chawla |
Source: ArXiv AI | 07-08-25
LLM Collaboration With Multi-Agent Reinforcement Learning
Authors: Shuo Liu, Zeyu Liang, Xueguang Lyu, Christopher Amato |
Source: ArXiv AI | 07-08-25
ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
Authors: Yue Zhou, Yi Chang, Yuan Wu |
Source: ArXiv AI | 07-08-25
Two face trial for exporting Nvidia AI chips as the company rejects hardware kill switches
Source: The Decoder | 07-08-25
Anthropic prepares for GPT-5 by releasing its upgraded Claude Opus 4.1 model
Source: The Decoder | 07-08-25
ElevenLabs launches Eleven Music, an AI music generator "cleared for broad commercial use"
Source: The Decoder | 07-08-25
OpenAI releases its first open-weight language models since GPT-2 with GPT-oss
Source: The Decoder | 06-08-25
The EU’s AI Act pushes transparency but could overwhelm developers with paperwork
Source: The Decoder | 06-08-25
Eight frontier AI models battle in chess for Game Arena’s first tournament tonight
Source: The Decoder | 06-08-25
US considers tracking AI chips, TSMC fires employees over the theft of advanced technology
Source: The Decoder | 06-08-25
OpenAI says it doesn't want ChatGPT to become a social media time sink
Source: The Decoder | 06-08-25
Claude Opus 4.1 (anthropic.com)
Source: Hacker News | 06-08-25
Create personal illustrated storybooks in the Gemini app (blog.google)
Source: Hacker News | 06-08-25
Things that helped me get out of the AI 10x engineer imposter syndrome (colton.dev)
Source: Hacker News | 06-08-25
LLM Inflation (tratt.net)
Source: Hacker News | 06-08-25
Ask HN: Do you struggle with flow state when using AI assisted coding tools?
Source: Hacker News | 06-08-25
I gave the AI arms and legs then it rejected me (grell.dev)
Source: Hacker News | 06-08-25
Open models by OpenAI (openai.com)
Source: Hacker News | 06-08-25
Large Language Model-based Data Science Agent: A Survey
Authors: Peiran Wang, Yaoning Yu, Ke Chen, Xianyang Zhan, Haohan Wang |
Source: ArXiv AI | 06-08-25
Recovering Individual-Level Activity Sequences from Location-Based Service Data Using a Novel Transformer-Based Model
Authors: Weiyu Luo, Chenfeng Xiong |
Source: ArXiv AI | 06-08-25
Enhancing Japanese Large Language Models with Reasoning Vectors
Authors: Carolina Minami Oguchi, Leo Wei, Koyo Kobayashi, Hsin-Tai Wu, Dipak Ghosal |
Source: ArXiv AI | 06-08-25
Defend LLMs Through Self-Consciousness
Authors: Boshi Huang, Fabio Nonato de Paula |
Source: ArXiv AI | 06-08-25
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots
Authors: Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Irene Li |
Source: ArXiv AI | 06-08-25
When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
Authors: Fangyi Yu |
Source: ArXiv AI | 06-08-25
Unified Tool Integration for LLMs: A Protocol-Agnostic Approach to Function Calling
Authors: Peng Ding, Rick Stevens |
Source: ArXiv AI | 06-08-25
From Text to Trajectories: GPT-2 as an ODE Solver via In-Context
Authors: Ziyang Ma, Baojian Zhou, Deqing Yang, Yanghua Xiao |
Source: ArXiv AI | 06-08-25
EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design
Authors: Fei Liu, Yilu Liu, Qingfu Zhang, Xialiang Tong, Mingxuan Yuan |
Source: ArXiv AI | 06-08-25
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts
Authors: Shuang Liu, Zelong Li, Ruoyun Ma, Haiyan Zhao, Mengnan Du |
Source: ArXiv AI | 06-08-25
Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework
Authors: Zikun Cui, Tianyi Huang, Chia-En Chiang, Cuiqianhe Du |
Source: ArXiv AI | 06-08-25
Can Large Language Models Bridge the Gap in Environmental Knowledge?
Authors: Linda Smail (College of Interdisciplinary Studies, Zayed University, UAE), David Santandreu Calonge (Department of Academic Development, Mohamed bin Zayed University of Artificial Intelligence, UAE), Firuz Kamalov (School of Engineering, Applied Science and Technology, Canadian University Dubai, UAE), Nur H. Orak (Department of Environmental Engineering, Marmara University, Türkiye) |
Source: ArXiv AI | 06-08-25
InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation
Authors: Tian-Fang Zhao, Wen-Xi Yang |
Source: ArXiv AI | 06-08-25
CogBench: A Large Language Model Benchmark for Multilingual Speech-Based Cognitive Impairment Assessment
Authors: Feng Rui, Zhiyao Luo, Wei Wang, Yuting Song, Yong Liu, Tingting Zhu, Jianqing Li, Xingyao Wang |
Source: ArXiv AI | 06-08-25
Compressing Chain-of-Thought in LLMs via Step Entropy
Authors: Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu |
Source: ArXiv AI | 06-08-25
Adaptive AI Agent Placement and Migration in Edge Intelligence Systems
Authors: Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia |
Source: ArXiv AI | 06-08-25
Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play
Authors: Lucia Cipolina-Kun, Marianna Nezhurina, Jenia Jitsev |
Source: ArXiv AI | 06-08-25
A Comparative Study of Neurosymbolic AI Approaches to Interpretable Logical Reasoning
Authors: Michael K. Chen |
Source: ArXiv AI | 06-08-25
Multi-Objective Infeasibility Diagnosis for Routing Problems Using Large Language Models
Authors: Kai Li, Ruihao Zheng, Xinye Hao, Zhenkun Wang |
Source: ArXiv AI | 06-08-25
Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis
Authors: Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen |
Source: ArXiv AI | 06-08-25
Semantic-aware Graph-guided Behavior Sequences Generation with Large Language Models for Smart Homes
Authors: Zhiyao Xu, Dan Zhao, Qingsong Zou, Qing Li, Yong Jiang, Yuhang Wang, Jingyu Xiao |
Source: ArXiv AI | 06-08-25
Hidden Dynamics of Massive Activations in Transformer Training
Authors: Jorge Gallego-Feliciano, S. Aaron McClendon, Juan Morinelli, Stavros Zervoudakis, Antonios Saravanos |
Source: ArXiv AI | 06-08-25
Error Detection and Correction for Interpretable Mathematics in Large Language Models
Authors: Yijin Yang, Cristina Cornelio, Mario Leiva, Paulo Shakarian |
Source: ArXiv AI | 06-08-25
Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework
Authors: Jialin Li, Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu |
Source: ArXiv AI | 06-08-25
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
Authors: He Wang, Liang Zeng |
Source: ArXiv AI | 06-08-25
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang |
Source: ArXiv AI | 06-08-25
Tell HN: Anthropic expires paid credits after a year
Source: Hacker News | 06-08-25
Persona vectors allow Anthropic to steer language model behaviors like sycophancy and evil
Source: The Decoder | 05-08-25
MLE-STAR is designed to automate machine learning pipelines with minimal human input
Source: The Decoder | 05-08-25
I tried to replace myself with ChatGPT in my English class (lithub.com)
Source: Hacker News | 05-08-25
Getting out of the Big-Muddy: Escalation of Commitment in LLMs
Authors: Emilio Barkett, Olivia Long, Paul Kröger |
Source: ArXiv AI | 05-08-25
Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning
Authors: Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu |
Source: ArXiv AI | 05-08-25
Polymorphic Combinatorial Frameworks (PCF): Guiding the Design of Mathematically-Grounded, Adaptive AI Agents
Authors: David Pearl, Matthew Murphy, James Intriligator |
Source: ArXiv AI | 05-08-25
T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval
Authors: Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu |
Source: ArXiv AI | 05-08-25
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry
Authors: Jiaqing Xie, Weida Wang, Ben Gao, Zhuo Yang, Haiyuan Wan, Shufei Zhang, Tianfan Fu, Yuqiang Li |
Source: ArXiv AI | 05-08-25
A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models
Authors: Tadisetty Sai Yashwanth, Dhatri C |
Source: ArXiv AI | 05-08-25
ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection
Authors: Shijie Cao, Yuan Yuan |
Source: ArXiv AI | 05-08-25
CloudAnoAgent: Anomaly Detection for Cloud Sites via LLM Agent with Neuro-Symbolic Mechanism
Authors: Xinkai Zou, Xuan Jiang, Ruikai Huang, Haoze He, Parv Kapoor, Jiahua Zhao |
Source: ArXiv AI | 05-08-25
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
Authors: Amitava Das, Vinija Jain, Aman Chadha |
Source: ArXiv AI | 05-08-25
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Authors: Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang |
Source: ArXiv AI | 05-08-25
Everyone Contributes! Incentivizing Strategic Cooperation in Multi-LLM Systems via Sequential Public Goods Games
Authors: Yunhao Liang, Yuan Qu, Jingyuan Yang, Shaochong Lin, Zuo-Jun Max Shen |
Source: ArXiv AI | 05-08-25
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Authors: Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li |
Source: ArXiv AI | 05-08-25
Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence
Authors: Boheng Liu, Ziyu Li, Xia Wu |
Source: ArXiv AI | 05-08-25
AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models
Authors: Dewi Sid William Gould, George De Ath, Ben Carvell, Nick Pepper |
Source: ArXiv AI | 05-08-25
CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models
Authors: Tung-Thuy Pham, Duy-Quan Luong, Minh-Quan Duong, Trung-Hieu Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo |
Source: ArXiv AI | 05-08-25
Traffic-R1: Reinforced LLMs Bring Human-Like Reasoning to Traffic Signal Control Systems
Authors: Xingchen Zou, Yuhao Yang, Zheng Chen, Xixuan Hao, Yiqi Chen, Chao Huang, Yuxuan Liang |
Source: ArXiv AI | 05-08-25
FinWorld: An All-in-One Open-Source Platform for End-to-End Financial AI Research and Deployment
Authors: Wentao Zhang, Yilei Zhao, Chuqiao Zong, Xinrun Wang, Bo An |
Source: ArXiv AI | 05-08-25
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Authors: Miaosen Luo, Jiesen Long, Zequn Li, Yunying Yang, Yuncheng Jiang, Sijie Mai |
Source: ArXiv AI | 05-08-25
OptiHive: Ensemble Selection for LLM-Based Optimization via Statistical Modeling
Authors: Maxime Bouscary, Saurabh Amin |
Source: ArXiv AI | 05-08-25
CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
Authors: Lei Zan, Keli Zhang, Ruichu Cai, Lujia Pan |
Source: ArXiv AI | 05-08-25
Accurate and Interpretable Postmenstrual Age Prediction via Multimodal Large Language Model
Authors: Qifan Chen, Jin Cui, Cindy Duan, Yushuo Han, Yifei Shi |
Source: ArXiv AI | 05-08-25
Noosemia: toward a Cognitive and Phenomenological Account of Intentionality Attribution in Human-Generative AI Interaction
Authors: Enrico De Santis, Antonello Rizzi |
Source: ArXiv AI | 05-08-25
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
Authors: Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Liantao Ma, Lequan Yu |
Source: ArXiv AI | 05-08-25
What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce
Authors: Amine Allouah, Omar Besbes, Josué D Figueroa, Yash Kanoria, Akshit Kumar |
Source: ArXiv AI | 05-08-25
Tim Cook tells Apple employees that AI is as pivotal as the internet or the smartphone
Source: The Decoder | 05-08-25
Adobe's new AI features make complex Photoshopping effortless
Source: The Decoder | 05-08-25
Customizing tmux (evgeniipendragon.com)
Source: Hacker News | 05-08-25
Job-seekers are dodging AI interviewers (fortune.com)
Source: Hacker News | 05-08-25
OpenAI prepares to launch GPT-5, but big leaps are unlikely
Source: The Decoder | 04-08-25
Anthropic blocks OpenAI from accessing Claude models over alleged contract breach
Source: The Decoder | 04-08-25
Persona vectors: Monitoring and controlling character traits in language models (anthropic.com)
Source: Hacker News | 04-08-25
Backdoor Attacks on Deep Learning Face Detection
Authors: Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi |
Source: ArXiv AI | 04-08-25
Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data
Authors: Mukesh Kumar Sahu, Pinki Roy |
Source: ArXiv AI | 04-08-25
NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System
Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya |
Source: ArXiv AI | 04-08-25
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Authors: Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, Dong Xu |
Source: ArXiv AI | 04-08-25
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Authors: Wenxuan Wang, Zizhan Ma, Meidan Ding, Shiyi Zheng, Shengyuan Liu, Jie Liu, Jiaming Ji, Wenting Chen, Xiang Li, Linlin Shen, Yixuan Yuan |
Source: ArXiv AI | 04-08-25
Agentic large language models improve retrieval-based radiology question answering
Authors: Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh |
Source: ArXiv AI | 04-08-25
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data
Authors: Sohaib Imran, Rob Lamb, Peter M. Atkinson |
Source: ArXiv AI | 04-08-25
How LLMs are Shaping the Future of Virtual Reality
Authors: Süeda Özkaya, Santiago Berrezueta-Guzman, Stefan Wagner |
Source: ArXiv AI | 04-08-25
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Authors: Liuyun Xu, Seymour M.J. Spence |
Source: ArXiv AI | 04-08-25
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Authors: Yingxu Wang, Shiqi Fan, Mengzhu Wang, Siwei Liu |
Source: ArXiv AI | 04-08-25
MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
Authors: Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao |
Source: ArXiv AI | 04-08-25
No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence
Authors: Wil M.P. van der Aalst |
Source: ArXiv AI | 04-08-25
Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models
Authors: Xushuo Tang, Yi Ding, Zhengyi Yang, Yin Chen, Yongrui Gu, Wenke Yang, Mingchen Ju, Xin Cao, Yongfei Liu, Wenjie Zhang |
Source: ArXiv AI | 04-08-25
Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation
Authors: Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger |
Source: ArXiv AI | 04-08-25
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Authors: Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Yongbin Li, Ge Li |
Source: ArXiv AI | 04-08-25
Mind the Gap: The Divergence Between Human and LLM-Generated Tasks
Authors: Yi-Long Lu, Jiajun Song, Chunhui Zhang, Wei Wang |
Source: ArXiv AI | 04-08-25
Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking
Authors: Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei |
Source: ArXiv AI | 04-08-25
Thinking Machines: Mathematical Reasoning in the Age of LLMs
Authors: Andrea Asperti, Alberto Naibo, Claudio Sacerdoti Coen |
Source: ArXiv AI | 04-08-25
MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models
Authors: Zhanliang Wang, Kai Wang |
Source: ArXiv AI | 04-08-25
From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation
Authors: Ruiqing Ding, Qianfang Sun, Yongkang Leng, Hui Yin, Xiaojian Li |
Source: ArXiv AI | 04-08-25
Context-Aware Visualization for Explainable AI Recommendations in Social Media: A Vision for User-Aligned Explanations
Authors: Banan Alkhateeb, Ellis Solaiman |
Source: ArXiv AI | 04-08-25
6 weeks of Claude Code (puzzmo.com)
Source: Hacker News | 03-08-25
Automated Feedback on Student-Generated UML and ER Diagrams Using Large Language Models
Authors: Sebastian Gürtl, Gloria Schimetta, David Kerschbaumer, Michael Liut, Alexander Steinmaurer |
Source: ArXiv AI | 03-08-25
From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices
Authors: Georg Slamanig, Francesco Corti, Olga Saukh |
Source: ArXiv AI | 03-08-25
Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation
Authors: Dustin Carrión-Ojeda, Stefan Roth, Simone Schaub-Meyer |
Source: ArXiv AI | 03-08-25
LLM-Based Identification of Infostealer Infection Vectors from Screenshots: The Case of Aurora
Authors: Estelle Ruellan, Eric Clay, Nicholas Ascoli |
Source: ArXiv AI | 03-08-25
Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates
Authors: Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther |
Source: ArXiv AI | 03-08-25
Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study
Authors: Kai Goebel, Patrik Zips |
Source: ArXiv AI | 03-08-25
Distributed AI Agents for Cognitive Underwater Robot Autonomy
Authors: Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot |
Source: ArXiv AI | 03-08-25
A survey of multi-agent geosimulation methodologies: from ABM to LLM
Authors: Virginia Padilla, Jacinto Dávila |
Source: ArXiv AI | 03-08-25
Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database
Authors: Diego Russo, Gian Marco Orlando, Valerio La Gatta, Vincenzo Moscato |
Source: ArXiv AI | 03-08-25
FairReason: Balancing Reasoning and Social Bias in MLLMs
Authors: Zhenyu Pan, Yutong Zhang, Jianshu Zhang, Haoran Lu, Haozheng Luo, Yuwei Han, Philip S. Yu, Manling Li, Han Liu |
Source: ArXiv AI | 03-08-25
Data Readiness for Scientific AI at Scale
Authors: Wesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp Oral |
Source: ArXiv AI | 03-08-25
How Far Are AI Scientists from Changing the World?
Authors: Qiujie Xie, Yixuan Weng, Minjun Zhu, Fuchen Shen, Shulin Huang, Zhen Lin, Jiahui Zhou, Zilan Mao, Zijie Yang, Linyi Yang, Jian Wu, Yue Zhang |
Source: ArXiv AI | 03-08-25
LLM4Rail: An LLM-Augmented Railway Service Consulting Platform
Authors: Zhuo Li, Xianghuai Deng, Chiwei Feng, Hanmeng Li, Shenjie Wang, Haichao Zhang, Teng Jia, Conlin Chen, Louis Linchun Wu, Jia Wang |
Source: ArXiv AI | 03-08-25
DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer
Authors: Ruoyu Wang, Junda Wu, Yu Xia, Tong Yu, Ryan A. Rossi, Julian McAuley, Lina Yao |
Source: ArXiv AI | 03-08-25
MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying
Authors: Qian Zhao, Zhuo Sun, Bin Guo, Zhiwen Yu |
Source: ArXiv AI | 03-08-25
TextQuests: How Good are LLMs at Text-Based Video Games?
Authors: Long Phan, Mantas Mazeika, Andy Zou, Dan Hendrycks |
Source: ArXiv AI | 03-08-25
SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model
Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing |
Source: ArXiv AI | 03-08-25
OpenAI has reportedly raised $8.3 billion at a $300 billion valuation
Source: The Decoder | 03-08-25
Anthropic CEO talks about being labeled a doomer and his OpenAI departure
Source: The Decoder | 03-08-25
Under mounting pressure, Apple plans to increase its spending on artificial intelligence projects
Source: The Decoder | 03-08-25
Show HN: WebGPU enables local LLM in the browser – demo site with AI chat (andreinwald.github.io)
Source: Hacker News | 03-08-25
Show HN: AI Physics Tutor with Free Body Diagrams (physicsviewer.com)
Source: Hacker News | 03-08-25
Every leading AI agent failed at least one security test during a massive red teaming competition
Source: The Decoder | 03-08-25
Robert Wilson has died (theartnewspaper.com)
Source: Hacker News | 02-08-25
Anthropic revokes OpenAI's access to Claude (wired.com)
Source: Hacker News | 02-08-25
Tim Cook rallying Apple employees around AI efforts (bloomberg.com)
Source: Hacker News | 02-08-25
Launch HN: Societies.io (YC W25) – AI simulations of your target audience
Source: Hacker News | 02-08-25
Aerodynamic drag in small cyclist formations: shielding the protected rider [pdf] (urbanphysics.net)
Source: Hacker News | 02-08-25
OpenAI's "Study Mode" and the risks of flattery (resobscura.substack.com)
Source: Hacker News | 02-08-25
Google adds image-to-video and Veo 3 Fast to the Gemini API
Source: The Decoder | 02-08-25
Coverage Cat (YC S22) Is Hiring a Senior, Staff, or Principal Engineer (coveragecat.com)
Source: Hacker News | 02-08-25
Make Your Own Backup System – Part 2: Forging the FreeBSD Backup Stronghold (dragas.net)
Source: Hacker News | 02-08-25
The tradeoff between human and AI context (softwaredoug.com)
Source: Hacker News | 02-08-25
Deep Agents (langchain.com)
Source: Hacker News | 02-08-25
Gemini 2.5 Deep Think (blog.google)
Source: Hacker News | 02-08-25
Respect instead of sarcasm: study uses AI for better political debates
Source: The Decoder | 02-08-25
OpenAI is building Stargate Norway while its annual spending is expected to soar to $8 billion
Source: The Decoder | 01-08-25
Interview with Microsoft: Copilot, AI skills, and building a learning organization
Source: The Decoder | 01-08-25
Google DeepMind unveils an AI model that acts as a "virtual satellite" for mapping the entire planet
Source: The Decoder | 01-08-25
Google and xAI sign EU AI Code of Practice
Source: The Decoder | 01-08-25
PHP-ORT: Machine learning inference for the web (krakjoe.github.io)
Source: Hacker News | 01-08-25
Gemini Embedding: Powering RAG and context engineering (googleblog.com)
Source: Hacker News | 01-08-25
Many countries that said no to ChatControl in 2024 are now undecided (digitalcourage.social)
Source: Hacker News | 01-08-25
Gemini 2.5 Deep Think (twitter.com/googledeepmind)
Source: Hacker News | 01-08-25
Show HN: AgentMail – Email infra for AI agents (agentmail.to)
Source: Hacker News | 01-08-25
Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code
Source: Hacker News | 01-08-25
Show HN: Mcp-use – Connect any LLM to any MCP (github.com/mcp-use)
Source: Hacker News | 01-08-25
OpenAI launches Study Mode for ChatGPT while education users are told to wait and learn later
Source: The Decoder | 31-07-25
Anthropic could soon be valued at $170 billion
Source: The Decoder | 31-07-25
Some Meta employees fear being sidelined as Zuckerberg reshuffles teams for AI progress
Source: The Decoder | 31-07-25
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
Authors: Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu |
Source: ArXiv AI | 31-07-25
aLLoyM: A large language model for alloy phase diagram prediction
Authors: Yuna Oikawa, Guillaume Deffrennes, Taichi Abe, Ryo Tamura, Koji Tsuda |
Source: ArXiv AI | 31-07-25
RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment
Authors: Marcos Fuster-Pena, David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez |
Source: ArXiv AI | 31-07-25
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
Authors: Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen |
Source: ArXiv AI | 31-07-25
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Authors: Rawan Al-Matham, Kareem Darwish, Raghad Al-Rasheed, Waad Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah Alzahrani, Eman alBilali, Nizar Habash, Abdelrahman El-Sheikh, Muhammad Elmallah, Haonan Li, Hamdy Mubarak, Mohamed Anwar, Zaid Alyafeai, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljarrah, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, Abdulrahman AlOsaimy |
Source: ArXiv AI | 31-07-25
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models
Authors: Sabrina Kaniewski, Fabian Schmidt, Markus Enzweiler, Michael Menth, Tobias Heer |
Source: ArXiv AI | 31-07-25
H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity
Authors: Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong |
Source: ArXiv AI | 31-07-25
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization
Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani |
Source: ArXiv AI | 31-07-25
OFCnetLLM: Large Language Model for Network Monitoring and Alertness
Authors: Hong-Jun Yoon, Mariam Kiran, Danial Ebling, Joe Breen |
Source: ArXiv AI | 31-07-25
LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Kai Chen, Xiaofeng Wang, Baosheng Wang |
Source: ArXiv AI | 31-07-25
An Explainable Emotion Alignment Framework for LLM-Empowered Agent in Metaverse Service Ecosystem
Authors: Qun Ma, Xiao Xue, Ming Zhang, Yifan Shen, Zihan Zhao |
Source: ArXiv AI | 31-07-25
Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence
Authors: Matthieu Queloz |
Source: ArXiv AI | 31-07-25
Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making
Authors: ZhaoBin Li, Mark Steyvers |
Source: ArXiv AI | 31-07-25
The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
Authors: Han Jiang, Pengda Wang, Xiaoyuan Yi, Xing Xie, Ziang Xiao |
Source: ArXiv AI | 31-07-25
Enhancing Manufacturing Knowledge Access with LLMs and Context-aware Prompting
Authors: Sebastian Monka, Irlan Grangel-González, Stefan Schmid, Lavdim Halilaj, Marc Rickart, Oliver Rudolph, Rui Dias |
Source: ArXiv AI | 31-07-25
Automatically discovering heuristics in a complex SAT solver with large language models
Authors: Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, Shaowei Cai |
Source: ArXiv AI | 31-07-25
Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production
Source: Hacker News | 31-07-25
Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget (github.com/dipampaul17)
Source: Hacker News | 31-07-25
OpenAI's ChatGPT Agent casually clicks through "I am not a robot" verification (arstechnica.com)
Source: Hacker News | 31-07-25
AI startup tackles bottleneck where people spend more time checking AI content than creating it
Source: The Decoder | 31-07-25
Show HN: An AI agent that learns your product and guides your users (frigade.ai)
Source: Hacker News | 31-07-25
A major AI training data set contains millions of examples of personal data (technologyreview.com)
Source: Hacker News | 31-07-25
Show HN: Open-source alternative to ChatGPT Agents for browsing (github.com/trymeka)
Source: Hacker News | 31-07-25
Critical vulnerability in AI coding platform Base44 allowing unauthorized access (wiz.io)
Source: Hacker News | 31-07-25
Crush: Glamourous AI coding agent for your favourite terminal (github.com/charmbracelet)
Source: Hacker News | 31-07-25
Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures
Authors: Nicholas Botti (Federal Reserve Board), Flora Haberkorn (Federal Reserve Board), Charlotte Hoopes (Federal Reserve Board), Shaun Khan (Federal Reserve Board) |
Source: ArXiv AI | 30-07-25
Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems
Authors: Monika Zamojska, Jarosław A. Chudziak |
Source: ArXiv AI | 30-07-25
Validating Pharmacogenomics Generative Artificial Intelligence Query Prompts Using Retrieval-Augmented Generation (RAG)
Authors: Ashley Rector, Keaton Minor, Kamden Minor, Jeff McCormack, Beth Breeden, Ryan Nowers, Jay Dorris |
Source: ArXiv AI | 30-07-25
Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models
Authors: Vishal Raman, Vijai Aravindh R |
Source: ArXiv AI | 30-07-25
Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects
Authors: Yixin Liu, Guibin Zhang, Kun Wang, Shiyuan Li, Shirui Pan |
Source: ArXiv AI | 30-07-25
What Does it Mean for a Neural Network to Learn a "World Model"?
Authors: Kenneth Li, Fernanda Viégas, Martin Wattenberg |
Source: ArXiv AI | 30-07-25
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Authors: Yanxu Zhu, Shitong Duan, Xiangxu Zhang, Jitao Sang, Peng Zhang, Tun Lu, Xiao Zhou, Jing Yao, Xiaoyuan Yi, Xing Xie |
Source: ArXiv AI | 30-07-25
Large Language Models for Supply Chain Decisions
Authors: David Simchi-Levi, Konstantina Mellou, Ishai Menache, Jeevan Pathuri |
Source: ArXiv AI | 30-07-25
An LLM Driven Agent Framework for Automated Infrared Spectral Multi Task Reasoning
Authors: Zujie Xie, Zixuan Chen, Jiheng Liang, Xiangyang Yu, Ziru Yu |
Source: ArXiv AI | 30-07-25
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation
Authors: Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu, Huadong Ma |
Source: ArXiv AI | 30-07-25
Large Language Models for Wireless Communications: From Adaptation to Autonomy
Authors: Le Liang, Hao Ye, Yucheng Sheng, Ouya Wang, Jiacheng Wang, Shi Jin, Geoffrey Ye Li |
Source: ArXiv AI | 30-07-25
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models
Authors: Wanying Wang, Zeyu Ma, Han Zheng, Xin Tan, Mingang Chen |
Source: ArXiv AI | 30-07-25
StaffPro: an LLM Agent for Joint Staffing and Profiling
Authors: Alessio Maritan |
Source: ArXiv AI | 30-07-25
Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems
Authors: Bin Liu |
Source: ArXiv AI | 30-07-25
Towards a rigorous evaluation of RAG systems: the challenge of due diligence
Authors: Grégoire Martinon, Alexandra Lorenzo de Brionne, Jérôme Bohard, Antoine Lojou, Damien Hervault, Nicolas J-B. Brunel (ENSIIE, LaMME) |
Source: ArXiv AI | 30-07-25
Can the current trends of AI handle a full course of mathematics?
Authors: Mariam Alsayyad, Fayadh Kadhem |
Source: ArXiv AI | 30-07-25
An Agentic AI for a New Paradigm in Business Process Development
Authors: Mohammad Azarijafari, Luisa Mich, Michele Missikoff |
Source: ArXiv AI | 30-07-25
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |
Source: ArXiv AI | 30-07-25
Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline
Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |
Source: ArXiv AI | 30-07-25
Libra: Large Chinese-based Safeguard for AI Content
Authors: Ziyang Chen, Huimu Yu, Xing Wu, Dongqin Liu, Songlin Hu |
Source: ArXiv AI | 30-07-25
LLM-based Content Classification Approach for GitHub Repositories by the README Files
Authors: Malik Uzair Mehmood, Shahid Hussain, Wen Li Wang, Muhammad Usama Malik |
Source: ArXiv AI | 30-07-25
PHAX: A Structured Argumentation Framework for User-Centered Explainable AI in Public Health and Biomedical Sciences
Authors: Bahar İlgen, Akshat Dubey, Georges Hattab |
Source: ArXiv AI | 30-07-25
Launch HN: Hyprnote (YC S25) – An open-source AI meeting notetaker
Source: Hacker News | 30-07-25
Study mode (openai.com)
Source: Hacker News | 30-07-25
Irrelevant facts about cats added to math problems increase LLM errors by 300% (science.org)
Source: Hacker News | 30-07-25
Show HN: I built an AI that turns any book into a text adventure game (kathaaverse.com)
Source: Hacker News | 30-07-25
Tencent releases Hunyuan World Model 1.0 as an open-source AI for 3D scene generation
Source: The Decoder | 29-07-25
Enough AI copilots, we need AI HUDs (geoffreylitt.com)
Source: Hacker News | 29-07-25
Claude Code weekly rate limits
Source: Hacker News | 29-07-25
Show HN: Companies use AI to take your calls. I built AI to make them for you (pipervoice.com)
Source: Hacker News | 29-07-25
Anthropic Faces Potentially "Business-Ending" Copyright Lawsuit (obsolete.pub)
Source: Hacker News | 29-07-25
Tao on “blue team” vs. “red team” LLMs (mathstodon.xyz)
Source: Hacker News | 29-07-25
The wall confronting large language models
Authors: Peter V. Coveney, Sauro Succi |
Source: ArXiv AI | 29-07-25
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges
Authors: Haoran Lu, Luyang Fang, Ruidong Zhang, Xinliang Li, Jiazhang Cai, Huimin Cheng, Lin Tang, Ziyu Liu, Zeliang Sun, Tao Wang, Yingchuan Zhang, Arif Hassan Zidan, Jinwen Xu, Jincheng Yu, Meizhi Yu, Hanqi Jiang, Xilin Gong, Weidi Luo, Bolun Sun, Yongkai Chen, Terry Ma, Shushan Wu, Yifan Zhou, Junhao Chen, Haotian Xiang, Jing Zhang, Afrar Jahin, Wei Ruan, Ke Deng, Yi Pan, Peilong Wang, Jiahui Li, Zhengliang Liu, Lu Zhang, Lin Zhao, Wei Liu, Dajiang Zhu, Xin Xing, Fei Dou, Wei Zhang, Chao Huang, Rongjie Liu, Mengrui Zhang, Yiwen Liu, Xiaoxiao Sun, Qin Lu, Zhen Xiang, Wenxuan Zhong, Tianming Liu, Ping Ma |
Source: ArXiv AI | 29-07-25
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen |
Source: ArXiv AI | 29-07-25
Leveraging Fine-Tuned Large Language Models for Interpretable Pancreatic Cystic Lesion Feature Extraction and Risk Categorization
Authors: Ebrahim Rasromani, Stella K. Kang, Yanqi Xu, Beisong Liu, Garvit Luhadia, Wan Fung Chui, Felicia L. Pasadyn, Yu Chih Hung, Julie Y. An, Edwin Mathieu, Zehui Gu, Carlos Fernandez-Granda, Ammar A. Javed, Greg D. Sacks, Tamas Gonda, Chenchan Huang, Yiqiu Shen |
Source: ArXiv AI | 29-07-25
Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
Authors: Lin Ren, Guohui Xiao, Guilin Qi, Yishuai Geng, Haohan Xue |
Source: ArXiv AI | 29-07-25
Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks
Authors: Shuyang Guo, Wenjin Xie, Ping Lu, Ting Deng, Richong Zhang, Jianxin Li, Xiangping Huang, Zhongyi Liu |
Source: ArXiv AI | 29-07-25
The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models
Authors: Xingcheng Xu |
Source: ArXiv AI | 29-07-25
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Authors: Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, Srinivas Shakkottai |
Source: ArXiv AI | 29-07-25
Matching Game Preferences Through Dialogical Large Language Models: A Perspective
Authors: Renaud Fabre, Daniel Egret, Patrice Bellot |
Source: ArXiv AI | 29-07-25
Artificial Intelligence In Patent And Market Intelligence: A New Paradigm For Technology Scouting
Authors: Manish Verma, Vivek Sharma, Vishal Singh |
Source: ArXiv AI | 29-07-25
Unlearning of Knowledge Graph Embedding via Preference Optimization
Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Yao He, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji |
Source: ArXiv AI | 29-07-25
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design
Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai |
Source: ArXiv AI | 29-07-25
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Authors: Andy Zou, Maxwell Lin, Eliot Jones, Micha Nowak, Mateusz Dziemian, Nick Winter, Alexander Grattan, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Nate Burnikell, Yarin Gal, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson |
Source: ArXiv AI | 29-07-25
Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems
Authors: Chengzhuo Han |
Source: ArXiv AI | 29-07-25
MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs
Authors: Xueyao Wan, Hang Yu |
Source: ArXiv AI | 29-07-25
evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments
Authors: Fatou Ndiaye Mbodji |
Source: ArXiv AI | 29-07-25
On the Limits of Hierarchically Embedded Logic in Classical Neural Networks
Authors: Bill Cochran |
Source: ArXiv AI | 29-07-25
MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them
Authors: Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song |
Source: ArXiv AI | 29-07-25
Principles for production AI agents (app.build)
Source: Hacker News | 29-07-25
AI Is Wrecking a Fragile Job Market for College Graduates (wsj.com)
Source: Hacker News | 29-07-25
China pitches new global AI regulator based in Shanghai
Source: The Decoder | 28-07-25
China exports state propaganda with low-cost open source AI models
Source: The Decoder | 28-07-25
Mistral AI publishes the first comprehensive life cycle assessment of a large language model
Source: The Decoder | 28-07-25
Amazon launches Kiro to streamline AI prototyping
Source: The Decoder | 28-07-25
Claude Code Routergithub.com/musistudio
阅读更多来源: Hacker News | 28-07-25
LLM Embeddings Explained: A Visual and Intuitive Guidehuggingface.co
阅读更多来源: Hacker News | 28-07-25
Automated Code Review Using Large Language Models at Ericsson: An Experience Report
Authors: Shweta Ramesh, Joy Bose, Hamender Singh, A K Raghavan, Sujoy Roychowdhury, Giriprasad Sridhara, Nishrith Saini, Ricardo Britto |
Source: ArXiv AI | 28-07-25
Solar Photovoltaic Assessment with Large Language Model
Authors: Muhao Guo, Yang Weng |
Source: ArXiv AI | 28-07-25
PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models
Authors: Tarek Gasmi, Ramzi Guesmi, Mootez Aloui, Jihene Bennaceur |
Source: ArXiv AI | 28-07-25
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
Authors: Gioele Giachino, Marco Rondina, Antonio Vetrò, Riccardo Coppola, Juan Carlos De Martin |
Source: ArXiv AI | 28-07-25
Enhancing Diabetic Retinopathy Classification Accuracy through Dual Attention Mechanism in Deep Learning
Authors: Abdul Hannan, Zahid Mahmood, Rizwan Qureshi, Hazrat Ali |
Source: ArXiv AI | 28-07-25
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Authors: Chaymaa Abbas, Mariette Awad, Razane Tajeddine |
Source: ArXiv AI | 28-07-25
Towards LLM-Enhanced Group Recommender Systems
Authors: Sebastian Lubos, Alexander Felfernig, Thi Ngoc Trang Tran, Viet-Man Le, Damian Garber, Manuel Henrich, Reinhard Willfort, Jeremias Fuchs |
Source: ArXiv AI | 28-07-25
Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
Authors: Igli Begolli, Meltem Aksoy, Daniel Neider |
Source: ArXiv AI | 28-07-25
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks
Authors: Kai Liu, Zhan Su, Peijie Dong, Fengran Mo, Jianfei Gao, ShaoTing Zhang, Kai Chen |
Source: ArXiv AI | 28-07-25
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
Authors: Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci |
Source: ArXiv AI | 28-07-25
SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence
Authors: Viktar Dubovik, Łukasz Struski, Jacek Tabor, Dawid Rymarczyk |
Source: ArXiv AI | 28-07-25
SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models
Authors: Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi |
Source: ArXiv AI | 28-07-25
ReCatcher: Towards LLMs Regression Testing for Code Generation
Authors: Altaf Allah Abbassi, Leuson Da Silva, Amin Nikanjam, Foutse Khomh |
Source: ArXiv AI | 28-07-25
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
Authors: Gabriel Chua |
Source: ArXiv AI | 28-07-25
Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
Authors: Sang-Woo Lee, Sohee Yang, Donghyun Kwak, Noah Y. Siegel |
Source: ArXiv AI | 28-07-25
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
Authors: Osama Almurshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana |
Source: ArXiv AI | 28-07-25
Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
Authors: Patrick Taillandier, Jean Daniel Zucker, Arnaud Grignard, Benoit Gaudou, Nghi Quang Huynh, Alexis Drogoul |
Source: ArXiv AI | 28-07-25
Microsoft revives Clippy as an AI blob in a new Copilot Appearance test
Source: The Decoder | 27-07-25
No AI Content (eclecticlight.co)
Source: Hacker News | 27-07-25
Fast and cheap bulk storage: using LVM to cache HDDs on SSDs (quantum5.ca)
Source: Hacker News | 27-07-25
Linux on Snapdragon X Elite: Linaro and Tuxedo Pave the Way for ARM64 Laptops (linaro.org)
Source: Hacker News | 27-07-25
Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language
Authors: Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu |
Source: ArXiv AI | 27-07-25
AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data
Authors: Rana Alshaikh, Israa Alghanmi, Shelan Jeawak |
Source: ArXiv AI | 27-07-25
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Authors: Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer |
Source: ArXiv AI | 27-07-25
Automated Code Review Using Large Language Models with Symbolic Reasoning
Authors: Busra Icoz, Goksel Biricik |
Source: ArXiv AI | 27-07-25
Sandwich: Separating Prefill-Decode Compilation for Efficient CPU LLM Serving
Authors: Juntao Zhao, Jiuru Li, Chuan Wu |
Source: ArXiv AI | 27-07-25
HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization
Authors: Benjamin Coriat, Eric Benhamou |
Source: ArXiv AI | 27-07-25
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Authors: Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang |
Source: ArXiv AI | 27-07-25
SMARTAPS: Tool-augmented LLMs for Operations Management
Authors: Timothy Tin Long Yu, Mahdi Mostajabdaveh, Jabo Serge Byusa, Rindra Ramamonjison, Giuseppe Carenini, Kun Mao, Zirui Zhou, Yong Zhang |
Source: ArXiv AI | 27-07-25
Does visualization help AI understand data?
Authors: Victoria R. Li, Johnathan Sun, Martin Wattenberg |
Source: ArXiv AI | 27-07-25
Agentic AI framework for End-to-End Medical Data Inference
Authors: Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha |
Source: ArXiv AI | 27-07-25
Foundations for Risk Assessment of AI in Protecting Fundamental Rights
Authors: Antonino Rotolo, Beatrice Ferrigno, Jose Miguel Angel Garcia Godinez, Claudio Novelli, Giovanni Sartor |
Source: ArXiv AI | 27-07-25
Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
Authors: Mutian Yang, Jiandong Gao, Ji Wu |
Source: ArXiv AI | 27-07-25
Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios
Authors: Zhuang Qiang Bok, Watson Wei Khong Chua |
Source: ArXiv AI | 27-07-25
Revisiting LLM Reasoning via Information Bottleneck
Authors: Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao |
Source: ArXiv AI | 27-07-25
Reports say GPT-5 could arrive in August with improvements in coding
Source: The Decoder | 27-07-25
Google Deepmind's Aeneas AI helps historians quickly restore and interpret Roman inscriptions
Source: The Decoder | 26-07-25
Reuters says at least a dozen Shenzhen firms repair banned Nvidia H100 and A100 AI chips
Source: The Decoder | 26-07-25
Google says AI content is fine, and SEO basics still apply to AI-powered search
Source: The Decoder | 26-07-25
Show HN: Price Per Token – LLM API Pricing Data (pricepertoken.com)
Source: Hacker News | 26-07-25
Claude Code introduces specialized sub-agents (anthropic.com)
Source: Hacker News | 26-07-25
AWS shuts its Shanghai AI lab as McKinsey bans generative AI projects for clients in China
Source: The Decoder | 25-07-25
Trump's radical AI plan: no copyrights, fewer rules, more exports
Source: The Decoder | 25-07-25
Anthropic says that AI can learn risky behaviors even when the training data looks completely safe
Source: The Decoder | 25-07-25
Finding Robert Bogucki, the man who disappeared on purpose (abc.net.au)
Source: Hacker News | 25-07-25
How Anthropic teams use Claude Code (anthropic.com)
Source: Hacker News | 25-07-25
Quantitative AI progress needs accurate and transparent evaluation (mathstodon.xyz)
Source: Hacker News | 25-07-25
Superfunctions: A universal solution against sync/async fragmentation in Python (github.com/pomponchik)
Source: Hacker News | 25-07-25
Pew finds that only 1 percent of users click a source link directly from Google's AI Overviews
Source: The Decoder | 24-07-25
Lumo: Privacy-first AI assistant (proton.me)
Source: Hacker News | 24-07-25
Building better AI tools (hazelweakly.me)
Source: Hacker News | 24-07-25
US AI Action Plan (ai.gov)
Source: Hacker News | 24-07-25
Distillation makes AI models smaller and cheaper (quantamagazine.org)
Source: Hacker News | 24-07-25
Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning
Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu |
Source: ArXiv AI | 24-07-25
Each to Their Own: Exploring the Optimal Embedding in RAG
Authors: Shiting Chen, Zijian Zhao, Jinsong Chen |
Source: ArXiv AI | 24-07-25
Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging
Authors: Farnaz Khun Jush, Steffen Vogler, Matthias Lenga |
Source: ArXiv AI | 24-07-25
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Authors: Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing |
Source: ArXiv AI | 24-07-25
Vision Transformer attention alignment with human visual perception in aesthetic object evaluation
Authors: Miguel Carrasco, César González-Martín, José Aranda, Luis Oliveros |
Source: ArXiv AI | 24-07-25
AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer
Authors: Danny D. Leybzon, Shreyas Tirumala, Nishant Jain, Summer Gillen, Michael Jackson, Cameron McPhee, Jennifer Schmidt |
Source: ArXiv AI | 24-07-25
CASCADE: LLM-Powered JavaScript Deobfuscator at Google
Authors: Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan |
Source: ArXiv AI | 24-07-25
LoRA is All You Need for Safety Alignment of Reasoning LLMs
Authors: Yihao Xue, Baharan Mirzasoleiman |
Source: ArXiv AI | 24-07-25
Towards Autonomous Sustainability Assessment via Multimodal AI Agents
Authors: Zhihan Zhang, Alexander Metzger, Yuxuan Mei, Felix Hähnlein, Zachary Englhardt, Tingyu Cheng, Gregory D. Abowd, Shwetak Patel, Adriana Schulz, Vikram Iyer |
Source: ArXiv AI | 24-07-25
Our Cars Can Talk: How IoT Brings AI to Vehicles
Authors: Amod Kant Agrawal |
Source: ArXiv AI | 24-07-25
Improving LLMs' Generalized Reasoning Abilities by Graph Problems
Authors: Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li |
Source: ArXiv AI | 24-07-25
HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study
Authors: Mandar Pitale, Jelena Frtunikj, Abhinaw Priyadershi, Vasu Singh, Maria Spence |
Source: ArXiv AI | 24-07-25
Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments
Authors: Shitong Zhu, Chenhao Fang, Derek Larson, Neel Reddy Pochareddy, Rajeev Rao, Sophie Zeng, Yanqing Peng, Wendy Summer, Alex Goncalves, Arya Pudota, Herve Robert |
Source: ArXiv AI | 24-07-25
An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models
Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng |
Source: ArXiv AI | 24-07-25
TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment
Authors: Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas |
Source: ArXiv AI | 24-07-25
Simulating multiple human perspectives in socio-ecological systems using large language models
Authors: Yongchao Zeng, Calum Brown, Ioannis Kyriakou, Ronja Hotz, Mark Rounsevell |
Source: ArXiv AI | 24-07-25
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Authors: Xinyao Liu, Diping Song |
Source: ArXiv AI | 24-07-25
OpenAI’s new agent moves its 2017 vision for AI closer to reality
Source: The Decoder | 24-07-25
Google’s Gemini 2.5 now supports "conversational image segmentation"
Source: The Decoder | 24-07-25
OpenAI pushes ahead with Stargate as SoftBank remains absent from data center development
Source: The Decoder | 23-07-25
Yet another study finds that overloading LLMs with information leads to worse results
Source: The Decoder | 23-07-25
OpenAI’s math gold hints that AI may soon tackle even longer and harder tasks
Source: The Decoder | 23-07-25
I watched Gemini CLI hallucinate and delete my files (anuraag2601.github.io)
Source: Hacker News | 23-07-25
Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support
Authors: Fangjian Lei, Mariam El Mezouar, Shayan Noei, Ying Zou |
Source: ArXiv AI | 23-07-25
Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models
Authors: Yuxi Lin, Yaxue Fang, Zehong Zhang, Zhouwu Liu, Siyun Zhong, Fulong Yu |
Source: ArXiv AI | 23-07-25
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
Authors: Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda |
Source: ArXiv AI | 23-07-25
Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis
Authors: Zhihao Xu, Bixin Li, Lulu Wang |
Source: ArXiv AI | 23-07-25
Why Braking? Scenario Extraction and Reasoning Utilizing LLM
Authors: Yin Wu, Daniel Slieter, Vivek Subramanian, Ahmed Abouelazm, Robin Bohn, J. Marius Zöllner |
Source: ArXiv AI | 23-07-25
Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning
Authors: Simon Ouellette |
Source: ArXiv AI | 23-07-25
Differential Multimodal Transformers
Authors: Jerry Li, Timothy Oh, Joseph Hoang, Vardhit Veeramachaneni |
Source: ArXiv AI | 23-07-25
Micromobility Flow Prediction: A Bike Sharing Station-level Study via Multi-level Spatial-Temporal Attention Neural Network
Authors: Xi Yang, Jiachen Wang, Song Han, Suining He |
Source: ArXiv AI | 23-07-25
Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization
Authors: Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo |
Source: ArXiv AI | 23-07-25
From Logic to Language: A Trust Index for Problem Solving with LLMs
Authors: Tehseen Rug, Felix Böhmer, Tessa Pfattheicher |
Source: ArXiv AI | 23-07-25
SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting
Authors: Shuhao Mei, Yongchao Long, Shan Cao, Xiaobo Han, Shijia Geng, Jinbo Sun, Yuxi Zhou, Shenda Hong |
Source: ArXiv AI | 23-07-25
Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery
Authors: Bo Wen, Chen Wang, Qiwei Han, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Jeffrey L. Rogers |
Source: ArXiv AI | 23-07-25
Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design
Authors: Dong Ben, Hui Feng, Qian Wang |
Source: ArXiv AI | 23-07-25
ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
Authors: Tianze Xu, Pengrui Lu, Lyumanshan Ye, Xiangkun Hu, Pengfei Liu |
Source: ArXiv AI | 23-07-25
Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens
Authors: Fred Mutisya (1 and 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1), Jean Philbert Nsengemana (3), Talkmore Chidede (4) ((1) Qhala (Nairobi, Kenya), (2) Kenya Medical Association (Nairobi, Kenya), (3) Africa CDC (Addis Ababa, Ethiopia), (4) AfCFTA (Accra, Ghana)) |
Source: ArXiv AI | 23-07-25
LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning
Authors: Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang |
Source: ArXiv AI | 23-07-25
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
Authors: Hongyi Tang, Zhihao Zhu, Yi Yang |
Source: ArXiv AI | 23-07-25
Improving ASP-based ORS Schedules through Machine Learning Predictions
Authors: Pierangela Bruno, Carmine Dodaro, Giuseppe Galatà, Marco Maratea, Marco Mochi |
Source: ArXiv AI | 23-07-25
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Authors: Shanghai AI Lab: Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou |
Source: ArXiv AI | 23-07-25
Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications
Authors: Jean Lelong, Adnane Errazine, Annabelle Blangero |
Source: ArXiv AI | 23-07-25
Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
Authors: Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang |
Source: ArXiv AI | 23-07-25
WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding
Authors: Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun |
Source: ArXiv AI | 23-07-25
Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning
Authors: Mian Ibad Ali Shah, Enda Barrett, Karl Mason |
Source: ArXiv AI | 23-07-25
Subliminal learning: Models transmit behaviors via hidden signals in data (anthropic.com)
Source: Hacker News | 23-07-25
Gemini North telescope discovers long-predicted stellar companion of Betelgeuse (science.org)
Source: Hacker News | 23-07-25
New ARC-AGI-3 benchmark shows that humans still outperform LLMs at pretty basic thinking
Source: The Decoder | 22-07-25
OpenAI claims a breakthrough in LLM reasoning on complex math problems
Source: The Decoder | 22-07-25
FlexOlmo enables organizations to collaboratively train LLMs without data sharing
Source: The Decoder | 22-07-25
Routine: A Structural Planning Framework for LLM Agent System in Enterprise
Authors: Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu |
Source: ArXiv AI | 22-07-25
Large Language Models Assisting Ontology Evaluation
Authors: Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese |
Source: ArXiv AI | 22-07-25
BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning
Authors: Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo |
Source: ArXiv AI | 22-07-25
Towards AI Urban Planner in the Age of GenAI, LLMs, and Agentic AI
Authors: Yanjie Fu |
Source: ArXiv AI | 22-07-25
Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and Responsibility Matrix
Authors: Juan Manuel Contreras |
Source: ArXiv AI | 22-07-25
Configurable multi-agent framework for scalable and realistic testing of llm-based agents
Authors: Sai Wang, Senthilnathan Subramanian, Mudit Sahni, Praneeth Gone, Lingjie Meng, Xiaochen Wang, Nicolas Ferradas Bertoli, Tingxian Cheng, Jun Xu |
Source: ArXiv AI | 22-07-25
The Endless Tuning. An Artificial Intelligence Design To Avoid Human Replacement and Trace Back Responsibilities
Authors: Elio Grande |
Source: ArXiv AI | 22-07-25
Feedback-Induced Performance Decline in LLM-Based Decision-Making
Authors: Xiao Yang, Juxi Leitner, Michael Burke |
Source: ArXiv AI | 22-07-25
DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
Authors: Jerry Wang, Fang Yu |
Source: ArXiv AI | 22-07-25
IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry
Authors: Junhyeong Lee, Joon-Young Kim, Heekyu Kim, Inhyo Lee, Seunghwa Ryu |
Source: ArXiv AI | 22-07-25
Explainable Artificial Intelligence based Soft Evaluation Indicator for Arc Fault Diagnosis
Authors: Qianchao Wang, Yuxuan Ding, Chuanzhen Jia, Zhe Li, Yaping Du |
Source: ArXiv AI | 22-07-25
LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning
Authors: Cole Robertson, Philip Wolff |
Source: ArXiv AI | 22-07-25
Predictive Process Monitoring Using Object-centric Graph Embeddings
Authors: Wissam Gherissi (LAMSADE), Mehdi Acheli, Joyce El Haddad (LAMSADE), Daniela Grigori (LAMSADE) |
Source: ArXiv AI | 22-07-25
Agentic AI for autonomous anomaly management in complex systems
Authors: Reza Vatankhah Barenji, Sina Khoshgoftar |
Source: ArXiv AI | 22-07-25
A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining
Authors: Yifan Shen, Zihan Zhao, Xiao Xue, Yuwei Guo, Qun Ma, Deyu Zhou, Ming Zhang |
Source: ArXiv AI | 22-07-25
Gemini 2.5 Pro Capable of Winning Gold at IMO 2025
Authors: Yichen Huang, Lin F. Yang |
Source: ArXiv AI | 22-07-25
Don't bother parsing: Just use images for RAG (morphik.ai)
Source: Hacker News | 22-07-25
AccountingBench: Evaluating LLMs on real long-horizon business tasks (penrose.com)
Source: Hacker News | 22-07-25
The Hater's Guide to the AI Bubble (wheresyoured.at)
Source: Hacker News | 22-07-25
How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference (ritza.co)
Source: Hacker News | 22-07-25
Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic (github.com/openai)
Source: Hacker News | 22-07-25
Replit's CEO apologizes after its AI agent wiped a company's code base (businessinsider.com)
Source: Hacker News | 22-07-25
If writing is thinking then what happens if AI is doing the writing and reading? (learningbyshipping.com)
Source: Hacker News | 22-07-25
"Napster-style" piracy allegations put Anthropic at risk of a billion-dollar class action lawsuit
Source: The Decoder | 21-07-25
Decart launches MirageLSD, an AI model that transforms live video feeds in real time
Source: The Decoder | 21-07-25
Show HN: Conductor, a Mac app that lets you run a bunch of Claude Codes at once (conductor.build)
Source: Hacker News | 21-07-25
Coding with LLMs in the summer of 2025 – an update (antirez.com)
Source: Hacker News | 21-07-25
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection
Authors: Aleksandr Gashkov, Aleksandr Perevalov, Maria Eltsova, Andreas Both |
Source: ArXiv AI | 21-07-25
RAG-based Architectures for Drug Side Effect Retrieval in LLMs
Authors: Shad Nygren, Pinar Avci, Andre Daniels, Reza Rassol, Afshin Beheshti, Diego Galeano |
Source: ArXiv AI | 21-07-25
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
Authors: Cole Walsh, Rodica Ivan, Muhammad Zafar Iqbal, Colleen Robb |
Source: ArXiv AI | 21-07-25
Exploiting Primacy Effect To Improve Large Language Models
Authors: Bianca Raimondi, Maurizio Gabbrielli |
Source: ArXiv AI | 21-07-25
Preprint: Did I Just Browse A Website Written by LLMs?
Authors: Sichang "Steven" He, Ramesh Govindan, Harsha V. Madhyastha |
Source: ArXiv AI | 21-07-25
A segmented robot grasping perception neural network for edge AI
Authors: Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos S. Kouzinopoulos, Rico Möckel |
Source: ArXiv AI | 21-07-25
Photonic Fabric Platform for AI Accelerators
Authors: Jing Ding, Trung Diep |
Source: ArXiv AI | 21-07-25
Edge Intelligence with Spiking Neural Networks
Authors: Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya |
Source: ArXiv AI | 21-07-25
Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
Authors: Šimon Kubov, Simon Klíčník, Jakub Dandár, Zdeněk Straka, Karolína Kvaková, Daniel Kvak |
Source: ArXiv AI | 21-07-25
GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination
Authors: Nabil Abdelaziz Ferhat Taleb, Abdolazim Rezaei, Raj Atulkumar Patel, Mehdi Sookhak |
Source: ArXiv AI | 21-07-25
GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models
Authors: Eduardo C. Garrido-Merchán, Cristina Puente |
Source: ArXiv AI | 21-07-25
BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety
Authors: Yuxin Zhang (1), Xi Wang (1), Mo Hu (1), Zhenyu Zhang (1) ((1) Department of Construction Science, College of Architecture, Texas A&M University, College Station, USA) |
Source: ArXiv AI | 21-07-25
DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs
Authors: Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, Tajana Rosing |
Source: ArXiv AI | 21-07-25
Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery
Authors: Mateusz Bystroński, Mikołaj Hołysz, Grzegorz Piotrowski, Nitesh V. Chawla, Tomasz Kajdanowicz |
Source: ArXiv AI | 21-07-25
KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models
Authors: Lam Nguyen, Erika Barcelos, Roger French, Yinghui Wu |
Source: ArXiv AI | 21-07-25
Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
Authors: Temiloluwa Prioleau, Baiying Lu, Yanjun Cui |
Source: ArXiv AI | 21-07-25
Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment
Authors: Viraj Nishesh Darji, Callie C. Liao, Duoduo Liao |
Source: ArXiv AI | 21-07-25
Computational complexity of neural networks (2022) (lunalux.io)
Source: Hacker News | 21-07-25
iMessage integration in Claude can hijack the model to do anything (generalanalysis.com)
Source: Hacker News | 21-07-25
Nobody knows how to build with AI yet (worksonmymachine.substack.com)
Source: Hacker News | 20-07-25
Local LLMs versus offline Wikipedia (evanhahn.com)
Source: Hacker News | 20-07-25
Make Your Own Backup System – Part 1: Strategy Before Scripts (dragas.net)
Source: Hacker News | 20-07-25
Terence Tao: A human metaphor for evaluating AI capability (mathstodon.xyz)
Source: Hacker News | 20-07-25
I'm betting against AI agents, despite building them (utkarshkanwat.com)
Source: Hacker News | 20-07-25
The Big LLM Architecture Comparison (sebastianraschka.com)
Source: Hacker News | 20-07-25
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Authors: Hao Sun, Mihaela van der Schaar |
Source: ArXiv AI | 20-07-25
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
Authors: Xiangyu Dong, Haoran Zhao, Jiang Gao, Haozhou Li, Xiaoguang Ma, Yaoming Zhou, Fuhai Chen, Juan Liu |
Source: ArXiv AI | 20-07-25
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model
Authors: Maulana Bisyir Azhari, David Hyunchul Shim |
Source: ArXiv AI | 20-07-25
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
Authors: Hongyang Zhao, Tianyu Liang, Sina Davari, Daeho Kim |
Source: ArXiv AI | 20-07-25
Prompt Injection 2.0: Hybrid AI Threats
Authors: Jeremy McHugh, Kristina Šekrst, Jon Cefalu |
Source: ArXiv AI | 20-07-25
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models
Authors: Ashray Gupta, Rohan Joseph, Sunny Rai |
Source: ArXiv AI | 20-07-25
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Authors: Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen |
Source: ArXiv AI | 20-07-25
Automating Steering for Safe Multimodal Large Language Models
Authors: Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng |
Source: ArXiv AI | 20-07-25
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Authors: Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang |
Source: ArXiv AI | 20-07-25
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Authors: Yilun Zhao, Weiyuan Chen, Zhijian Xu, Manasi Patwardhan, Yixin Liu, Chengye Wang, Lovekesh Vig, Arman Cohan |
Source: ArXiv AI | 20-07-25
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Authors: Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve |
Source: ArXiv AI | 20-07-25
Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning
Authors: Sosui Moribe, Taketoshi Ushiama |
Source: ArXiv AI | 20-07-25
Emotional Support with LLM-based Empathetic Dialogue Generation
Authors: Shiquan Wang, Ruiyu Fang, Zhongjiang He, Shuangyong Song, Yongxiang Li |
Source: ArXiv AI | 20-07-25
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Authors: Zhiwei Liu, Jielin Qiu, Shiyu Wang, Jianguo Zhang, Zuxin Liu, Roshan Ram, Haolin Chen, Weiran Yao, Huan Wang, Shelby Heinecke, Silvio Savarese, Caiming Xiong |
Source: ArXiv AI | 20-07-25
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks
Authors: Jian Yao, Ran Cheng, Kay Chen Tan |
Source: ArXiv AI | 20-07-25
Prediction of Highway Traffic Flow Based on Artificial Intelligence Algorithms Using California Traffic Data
Authors: Junseong Lee, Jaegwan Cho, Yoonju Cho, Seoyoon Choi, Yejin Shin |
Source: ArXiv AI | 20-07-25
Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era
Authors: Matthew E. Brophy |
Source: ArXiv AI | 20-07-25
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
Authors: Carlos Arriaga, Gonzalo Martínez, Eneko Sendin, Javier Conde, Pedro Reviriego |
Source: ArXiv AI | 20-07-25
Trump advisors are pushing a regulation targeting what they call "woke" AI models in the tech sector
Source: The Decoder | 20-07-25
OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data
Source: The Decoder | 20-07-25
OpenAI claims gold-medal performance at IMO 2025 (twitter.com/alexwei_)
Source: Hacker News | 20-07-25
Meta is luring more top AI researchers from Apple with million-dollar deals
Source: The Decoder | 19-07-25
Google's Veo 3 video generation model launches on Gemini API with a hefty price tag
Source: The Decoder | 19-07-25
Meta says it won’t sign Europe AI agreement, calling it an overreach (cnbc.com)
Source: Hacker News | 19-07-25
GPT-5-reasoning alpha found in the wild (twitter.com/btibor91)
Source: Hacker News | 19-07-25
I avoid using LLMs as a publisher and writer (lifehacky.net)
Source: Hacker News | 19-07-25
Mistral AI adds deep research, voice mode, image editing, and more to Le Chat
Source: The Decoder | 19-07-25
Anthropic could soon be worth $100 billion - thanks to Claude Code
Source: The Decoder | 19-07-25
How I keep up with AI progress (nilenso.com)
Source: Hacker News | 19-07-25
I'm Rebelling Against the Algorithm (varunraghu.com)
Source: Hacker News | 19-07-25
lsr: ls with io_uring (rockorager.dev)
Source: Hacker News | 19-07-25
Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files (github.com/ryoppippi)
Source: Hacker News | 19-07-25
Google brings Gemini 2.5 Pro and Deep Search to AI Mode and adds AI phone calling to search
Source: The Decoder | 18-07-25
Reflection unveils Asimov: an AI agent built to track every step of software development
Source: The Decoder | 18-07-25
Claude Code Unleashed (ymichael.com)
Source: Hacker News | 18-07-25
All AI models might be the same (jxmo.io)
Source: Hacker News | 18-07-25
My favorite use-case for AI is writing logs (vickiboykis.com)
Source: Hacker News | 18-07-25
My experience with Claude Code after two weeks of adventures (sankalp.bearblog.dev)
Source: Hacker News | 18-07-25
ChatGPT agent: bridging research and action (openai.com)
Source: Hacker News | 18-07-25
Anthropic launches a dedicated AI solution to help finance professionals with analysis
阅读更多来源: The Decoder | 18-07-25
Zuckerberg predicts that not wearing AI glasses in the future will put you at a cognitive disadvantage
阅读更多来源: The Decoder | 18-07-25
CBS Canceling 'Late Show with Stephen Colbert' After Next Season (nytimes.com)
阅读更多来源: Hacker News | 18-07-25
Anthropic tightens usage limits for Claude Code without telling users (techcrunch.com)
阅读更多来源: Hacker News | 18-07-25
Meta hires two more leading OpenAI researchers for its superalignment team
阅读更多来源: The Decoder | 17-07-25
I was wrong about robots.txt (evgeniipendragon.com)
阅读更多来源: Hacker News | 17-07-25
The AI bubble today is bigger than the IT bubble in the 1990s (apolloacademy.com)
阅读更多来源: Hacker News | 17-07-25
Code Execution Through Email: How I Used Claude to Hack Itself (pynt.io)
阅读更多来源: Hacker News | 17-07-25
N8n vs. node-red, which to use for AI workloads (daniel-payne-keldan-systems.medium.com)
阅读更多来源: Hacker News | 17-07-25
Quantum Machine Learning in Multi-Qubit Phase-Space Part I: Foundations
Authors: Timothy Heightman, Edward Jiang, Ruth Mora-Soto, Maciej Lewenstein, Marcin Płodzień |
阅读更多来源: ArXiv AI | 17-07-25
A Framework for Nonstationary Gaussian Processes with Neural Network Parameters
Authors: Zachary James, Joseph Guinness |
阅读更多来源: ArXiv AI | 17-07-25
Improving Contextual ASR via Multi-grained Fusion with Large Language Models
Authors: Shilin Zhou, Zhenghua Li |
阅读更多来源: ArXiv AI | 17-07-25
Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding
Authors: Feng Xiao, Jicong Fan |
阅读更多来源: ArXiv AI | 17-07-25
Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
Authors: Sybelle Goedicke-Fritz (1), Michelle Bous (1), Annika Engel (2), Matthias Flotho (2 and 5), Pascal Hirsch (2), Hannah Wittig (1), Dino Milanovic (2), Dominik Mohr (1), Mathias Kaspar (6), Sogand Nemat (3), Dorothea Kerner (3), Arno Bücker (3), Andreas Keller (2 and 5 and 7), Sascha Meyer (4), Michael Zemlin (1), Philipp Flotho (2 and 5) ((1) Department of General Pediatrics and Neonatology, Saarland University, Campus Homburg, Homburg/Saar, Germany, (2) Chair for Clinical Bioinformatics, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany, (3) Department of Radiology, and Interventional Radiology, University Hospital of Saarland, Homburg, Germany, (4) Clinical Centre Karlsruhe, Franz-Lust Clinic for Paediatrics, Karlsruhe, Germany, (5) Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarland University Campus, Germany, (6) Digital Medicine, University Hospital of Augsburg, Augsburg, Germany, (7) Pharma Science Hub (PSH), Saarland University Campus, Germany) |
阅读更多来源: ArXiv AI | 17-07-25
Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization
Authors: Prashanth Vijayaraghavan, Apoorva Nitsure, Charles Mackin, Luyao Shi, Stefano Ambrogio, Arvind Haran, Viresh Paruthi, Ali Elzein, Dan Coops, David Beymer, Tyler Baldwin, Ehsan Degan |
阅读更多来源: ArXiv AI | 17-07-25
GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Authors: Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia |
阅读更多来源: ArXiv AI | 17-07-25
LLM-Based Config Synthesis requires Disambiguation
Authors: Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese |
阅读更多来源: ArXiv AI | 17-07-25
Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length
Authors: Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon |
阅读更多来源: ArXiv AI | 17-07-25
A Study on the Application of Artificial Intelligence in Ecological Design
Authors: Hengyue Zhao |
阅读更多来源: ArXiv AI | 17-07-25
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Authors: Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira |
阅读更多来源: ArXiv AI | 17-07-25
General Modular Harness for LLM Agents in Multi-Turn Gaming Environments
Authors: Yuxuan Zhang, Haoyang Yu, Lanxiang Hu, Haojian Jin, Hao Zhang |
阅读更多来源: ArXiv AI | 17-07-25
Auto-Formulating Dynamic Programming Problems with Large Language Models
Authors: Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, Dongdong Ge |
阅读更多来源: ArXiv AI | 17-07-25
ClarifAI: Enhancing AI Interpretability and Transparency through Case-Based Reasoning and Ontology-Driven Approach for Improved Decision-Making
Authors: Srikanth Vemula |
阅读更多来源: ArXiv AI | 17-07-25
BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution
Authors: Subin Lin, Chuanbo Hua |
阅读更多来源: ArXiv AI | 17-07-25
Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning
Authors: Yuhao Chen, Shuochen Liu, Yuanjie Lyu, Chao Zhang, Jiayao Shi, Tong Xu |
阅读更多来源: ArXiv AI | 17-07-25
Nvidia can resume exports of its H20 AI chip to China after a US policy reversal
阅读更多来源: The Decoder | 17-07-25
Scanned piano rolls database (pianorollmusic.org)
阅读更多来源: Hacker News | 17-07-25
Chain of thought monitorability: A new and fragile opportunity for AI safety (arxiv.org)
阅读更多来源: Hacker News | 17-07-25
Six Years of Gemini (geminiprotocol.net)
阅读更多来源: Hacker News | 16-07-25
Show HN: Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL (matthieulc.com)
阅读更多来源: Hacker News | 16-07-25
Reflections on OpenAI (calv.info)
阅读更多来源: Hacker News | 16-07-25
Gauntlet AI (YC S17): All expenses paid training in AI and $200k+ job (crossover.com)
阅读更多来源: Hacker News | 16-07-25
LLM Daydreaming (gwern.net)
阅读更多来源: Hacker News | 16-07-25
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?
Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher |
阅读更多来源: ArXiv AI | 16-07-25
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
Authors: LG AI Research: Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Kyubeen Han, Seokhee Hong, Junwon Hwang, Taewan Hwang, Joonwon Jang, Hyojin Jeon, Kijeong Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Euisoon Kim, Hyosang Kim, Jihoon Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Gwangho Lee, Haeju Lee, Honglak Lee, Jinsik Lee, Kyungmin Lee, Sangha Park, Young Min Paik, Yongmin Park, Youngyong Park, Sanghyun Seo, Sihoon Yang, Heuiyeen Yeen, Sihyuk Yi, Hyeongu Yun |
阅读更多来源: ArXiv AI | 16-07-25
Attributes Shape the Embedding Space of Face Recognition Models
Authors: Pierrick Leroy, Antonio Mastropietro, Marco Nurisso, Francesco Vaccarino |
阅读更多来源: ArXiv AI | 16-07-25
SAMEP: A Secure Protocol for Persistent Context Sharing Across AI Agents
Authors: Hari Masoor |
阅读更多来源: ArXiv AI | 16-07-25
Streaming 4D Visual Geometry Transformer
Authors: Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu |
阅读更多来源: ArXiv AI | 16-07-25
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air
Authors: Shiyi Yang, Xiaoxue Yu, Rongpeng Li, Jianhang Zhu, Zhifeng Zhao, Honggang Zhang |
阅读更多来源: ArXiv AI | 16-07-25
Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs
Authors: Ye Yang, Xue Xiao, Ping Yin, Taotao Xie |
阅读更多来源: ArXiv AI | 16-07-25
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Authors: Zheng Zhang |
阅读更多来源: ArXiv AI | 16-07-25
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning
Authors: Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas |
阅读更多来源: ArXiv AI | 16-07-25
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
Authors: Atila Orhon, Arda Okan, Berkin Durmus, Zach Nagengast, Eduardo Pacheco |
阅读更多来源: ArXiv AI | 16-07-25
Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case
Authors: JaMor Hairston, Ritvik Ranjan, Sahithi Lakamana, Anthony Spadaro, Selen Bozkurt, Jeanmarie Perrone, Abeed Sarker |
阅读更多来源: ArXiv AI | 16-07-25
Detecting AI Assistance in Abstract Complex Tasks
Authors: Tyler King, Nikolos Gurney, John H. Miller, Volkan Ustun |
阅读更多来源: ArXiv AI | 16-07-25
IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models
Authors: Nikesh Prajapati, Bimal Karki, Saroj Gopali, Akbar Siami Namin |
阅读更多来源: ArXiv AI | 16-07-25
Function-to-Style Guidance of LLMs for Code Translation
Authors: Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang |
阅读更多来源: ArXiv AI | 16-07-25
Modeling Habitat Shifts: Integrating Convolutional Neural Networks and Tabular Data for Species Migration Prediction
Authors: Emir Durakovic, Min-Hong Shih |
阅读更多来源: ArXiv AI | 16-07-25
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation
Authors: Yicong Wu, Ting Chen, Irit Hochberg, Zhoujian Sun, Ruth Edry, Zhengxing Huang, Mor Peleg |
阅读更多来源: ArXiv AI | 16-07-25
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems
Authors: Dany Moshkovich, Sergey Zeltyn |
阅读更多来源: ArXiv AI | 16-07-25
Perspective-Aware AI in Extended Reality
Authors: Daniel Platnick, Matti Gruener, Marjan Alirezaie, Kent Larson, Dava J. Newman, Hossein Rahnama |
阅读更多来源: ArXiv AI | 16-07-25
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
Authors: Yinsheng Li, Zhen Dong, Yi Shao |
阅读更多来源: ArXiv AI | 16-07-25
How Many Instructions Can LLMs Follow at Once?
Authors: Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari |
阅读更多来源: ArXiv AI | 16-07-25
Vulnerable kids are nearly three times more likely to use companion AI chatbots for friendship
阅读更多来源: The Decoder | 16-07-25
Anthropic, OpenAI, Google, and xAI have landed Pentagon contracts worth up to $200 million
阅读更多来源: The Decoder | 16-07-25
LLM Inevitabilism (tomrenner.com)
阅读更多来源: Hacker News | 16-07-25
OpenAI – vulnerability responsible disclosure (any.org)
阅读更多来源: Hacker News | 16-07-25
Mira Murati’s AI startup Thinking Machines valued at $12B in early-stage funding (reuters.com)
阅读更多来源: Hacker News | 16-07-25
Claude for Financial Services (anthropic.com)
阅读更多来源: Hacker News | 16-07-25
Unlike ChatGPT, Anthropic has doubled down on Artifacts (ben-mini.com)
阅读更多来源: Hacker News | 16-07-25
NeuralOS: An operating system powered by neural networks (neural-os.com)
阅读更多来源: Hacker News | 15-07-25
Context Rot: How increasing input tokens impacts LLM performance (trychroma.com)
阅读更多来源: Hacker News | 15-07-25
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Authors: Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan |
阅读更多来源: ArXiv AI | 15-07-25
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Authors: Joel Becker, Nate Rush, Elizabeth Barnes, David Rein |
阅读更多来源: ArXiv AI | 15-07-25
Multi-Actor Generative Artificial Intelligence as a Game Engine
Authors: Alexander Sasha Vezhnevets, Jayd Matyas, Logan Cross, Davide Paglieri, Minsuk Chang, William A. Cunningham, Simon Osindero, William S. Isaac, Joel Z. Leibo |
阅读更多来源: ArXiv AI | 15-07-25
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing
Authors: Quanyan Zhu |
阅读更多来源: ArXiv AI | 15-07-25
Knowledge Conceptualization Impacts RAG Efficacy
Authors: Chris Davis Jaldi, Anmol Saini, Elham Ghiasi, O. Divine Eziolise, Cogan Shimizu |
阅读更多来源: ArXiv AI | 15-07-25
EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique
Authors: Chenglin Zhu, Tao Zhang, Chong Li, Mingan Lin, Zenan Zhou, Jian Xie |
阅读更多来源: ArXiv AI | 15-07-25
A Taxonomy of Omnicidal Futures Involving Artificial Intelligence
Authors: Andrew Critch, Jacob Tsimerman |
阅读更多来源: ArXiv AI | 15-07-25
When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
Authors: Matous Kozak, Roshanak Zilouchian Moghaddam, Siva Sivaraman |
阅读更多来源: ArXiv AI | 15-07-25
humancompatible.interconnect: Testing Properties of Repeated Uses of Interconnections of AI Systems
Authors: Rodion Nazarov, Anthony Quinn, Robert Shorten, Jakub Marecek |
阅读更多来源: ArXiv AI | 15-07-25
Learning to Control Dynamical Agents via Spiking Neural Networks and Metropolis-Hastings Sampling
Authors: Ali Safa, Farida Mohsen, Ali Al-Zawqari |
阅读更多来源: ArXiv AI | 15-07-25
Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems
Authors: Aniruddha Chattopadhyay, Raj Dandekar, Kaushik Roy |
阅读更多来源: ArXiv AI | 15-07-25
Is Human-Written Data Enough? The Challenge of Teaching Reasoning to LLMs Without RL or Distillation
Authors: Wei Du, Branislav Kisacanin, George Armstrong, Shubham Toshniwal, Ivan Moshkov, Alexan Ayrapetyan, Sadegh Mahdavi, Dan Zhao, Shizhe Diao, Dragan Masulovic, Marius Stanean, Advaith Avadhanam, Max Wang, Ashmit Dutta, Shitij Govil, Sri Yanamandara, Mihir Tandon, Sriram Ananthakrishnan, Vedant Rathi, David Zhang, Joonseok Kang, Leon Luo, Titu Andreescu, Boris Ginsburg, Igor Gitman |
阅读更多来源: ArXiv AI | 15-07-25
Technical Requirements for Halting Dangerous AI Activities
Authors: Peter Barnett, Aaron Scher, David Abecassis |
阅读更多来源: ArXiv AI | 15-07-25
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
Authors: Bradley P. Allen, Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, Paul Groth |
阅读更多来源: ArXiv AI | 15-07-25
DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models
Authors: Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, Jingping Liu, Yanghua Xiao, Huajun Chen, Qing-Long Han, Yang Tang |
阅读更多来源: ArXiv AI | 15-07-25
Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Authors: Thomas T. Hills |
阅读更多来源: ArXiv AI | 15-07-25
Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration
Authors: Sadig Gojayev, Ahmad Anaqreh, Carolina Fortuna |
阅读更多来源: ArXiv AI | 15-07-25
BlueGlass: A Framework for Composite AI Safety
Authors: Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl |
阅读更多来源: ArXiv AI | 15-07-25
FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida |
阅读更多来源: ArXiv AI | 15-07-25
Introducing the Swiss Food Knowledge Graph: AI for Context-Aware Nutrition Recommendation
Authors: Lubnaa Abdur Rahman, Ioannis Papathanail, Stavroula Mougiakakou |
阅读更多来源: ArXiv AI | 15-07-25
Survey for Categorising Explainable AI Studies Using Data Analysis Task Frameworks
Authors: Hamzah Ziadeh, Hendrik Knoche |
阅读更多来源: ArXiv AI | 15-07-25
Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?
Authors: Yumi Omori, Zixuan Dong, Keith Ross |
阅读更多来源: ArXiv AI | 15-07-25
Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence
Authors: Jiaming Tian, Liyao Li, Wentao Ye, Haobo Wang, Lingxin Wang, Lihua Yu, Zujie Ren, Gang Chen, Junbo Zhao |
阅读更多来源: ArXiv AI | 15-07-25
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning
Authors: Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui |
阅读更多来源: ArXiv AI | 15-07-25
Elon Musk's AI company xAI apologizes "deeply" for Grok's "horrific behavior"
阅读更多来源: The Decoder | 15-07-25
Anthropic, Google, OpenAI and XAI Granted Up to $200M from Defense Department (cnbc.com)
阅读更多来源: Hacker News | 15-07-25
Embedding user-defined indexes in Apache Parquet (apache.org)
阅读更多来源: Hacker News | 15-07-25
OpenAI delays release of open-weight model indefinitely over safety concerns
阅读更多来源: The Decoder | 14-07-25
A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1
Authors: Marcin Pietroń, Rafał Olszowski, Jakub Gomułka, Filip Gampel, Andrzej Tomski |
阅读更多来源: ArXiv AI | 14-07-25
Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy
Authors: Fernando Ayach, Vitor Lameirão, Raul Leão, Jerfferson Felizardo, Rafael Sobrinho, Vanessa Borges, Patrícia Matsubara, Awdren Fontão |
阅读更多来源: ArXiv AI | 14-07-25
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
Authors: Sishi Xiong, Dakai Wang, Yu Zhao, Jie Zhang, Changzai Pan, Haowei He, Xiangyu Li, Wenhan Chang, Zhongjiang He, Shuangyong Song, Yongxiang Li |
阅读更多来源: ArXiv AI | 14-07-25
Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions
Authors: Quanyan Zhu |
阅读更多来源: ArXiv AI | 14-07-25
A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking
Authors: Zhengye Han, Quanyan Zhu |
阅读更多来源: ArXiv AI | 14-07-25
Giving AI Agents Access to Cryptocurrency and Smart Contracts Creates New Vectors of AI Harm
Authors: Bill Marino, Ari Juels |
阅读更多来源: ArXiv AI | 14-07-25
Multi-Agent LLMs as Ethics Advocates in AI-Based Systems
Authors: Asma Yamani, Malak Baslyman, Moataz Ahmed |
阅读更多来源: ArXiv AI | 14-07-25
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
Authors: Inclusion AI: Fudong Wang, Jiajia Liu, Jingdong Chen, Jun Zhou, Kaixiang Ji, Lixiang Ru, Qingpei Guo, Ruobing Zheng, Tianqi Li, Yi Yuan, Yifan Mao, Yuting Xiao, Ziping Ma |
阅读更多来源: ArXiv AI | 14-07-25
Introspection of Thought Helps AI Agents
Authors: Haoran Sun, Shaoning Zeng |
阅读更多来源: ArXiv AI | 14-07-25
Agentic Large Language Models for Conceptual Systems Engineering and Design
Authors: Soheyl Massoudi, Mark Fuge |
阅读更多来源: ArXiv AI | 14-07-25
Show HN: FFmpeg in plain English – LLM-assisted FFmpeg in the browser (vidmix.app)
阅读更多来源: Hacker News | 14-07-25
The upcoming GPT-3 moment for RL (mechanize.work)
阅读更多来源: Hacker News | 14-07-25
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (arxiv.org)
阅读更多来源: Hacker News | 14-07-25
Local Chatbot RAG with FreeBSD Knowledge (hackacad.net)
阅读更多来源: Hacker News | 14-07-25
Ask HN: How much of OpenAI code is written by AI?
阅读更多来源: Hacker News | 14-07-25
Show HN: Learn LLMs LeetCode Style (github.com/exorust)
阅读更多来源: Hacker News | 14-07-25
Hypercapitalism and the AI talent wars (johnluttig.com)
阅读更多来源: Hacker News | 14-07-25
OpenAI loses out as Google hires Windsurf's CEO and top talent
阅读更多来源: The Decoder | 13-07-25
Switching to Claude Code and VSCode Inside Docker (timsh.org)
阅读更多来源: Hacker News | 13-07-25
Understanding Tool Calling in LLMs – Step-by-Step with REST and Spring AI (muthuishere.medium.com)
阅读更多来源: Hacker News | 13-07-25
Axon's Draft One AI Police Report Generator Is Designed to Defy Transparency (eff.org)
阅读更多来源: Hacker News | 13-07-25
MIRIX: Multi-Agent Memory System for LLM-Based Agents
Authors: Yu Wang, Xi Chen |
阅读更多来源: ArXiv AI | 13-07-25
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Authors: Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim |
阅读更多来源: ArXiv AI | 13-07-25
Autonomous Control Leveraging LLMs: An Agentic Framework for Next-Generation Industrial Automation
Authors: Javal Vyas, Mehmet Mercangoz |
阅读更多来源: ArXiv AI | 13-07-25
BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks
Authors: Mridula Vijendran, Shuang Chen, Jingjing Deng, Hubert P. H. Shum |
阅读更多来源: ArXiv AI | 13-07-25
Application of LLMs to Multi-Robot Path Planning and Task Allocation
Authors: Ashish Kumar |
阅读更多来源: ArXiv AI | 13-07-25
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
Authors: Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. Rothblum |
阅读更多来源: ArXiv AI | 13-07-25
StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley
Authors: Weihao Tan, Changjiu Jiang, Yu Duan, Mingcong Lei, Jiageng Li, Yitian Hong, Xinrun Wang, Bo An |
阅读更多来源: ArXiv AI | 13-07-25
DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search
Authors: Zerui Yang, Yuwei Wan, Yinqiao Li, Yudai Matsuda, Tong Xie, Linqi Song |
阅读更多来源: ArXiv AI | 13-07-25
Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models
Authors: Sedigh Khademi, Jim Black, Christopher Palmer, Muhammad Javed, Hazel Clothier, Jim Buttery, Gerardo Luis Dimaguila |
阅读更多来源: ArXiv AI | 13-07-25
PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations
Authors: Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka |
阅读更多来源: ArXiv AI | 13-07-25
Measuring AI Alignment with Human Flourishing
Authors: Elizabeth Hilliard, Akshaya Jagadeesh, Alex Cook, Steele Billings, Nicholas Skytland, Alicia Llewellyn, Jackson Paull, Nathan Paull, Nolan Kurylo, Keatra Nesbitt, Robert Gruenewald, Anthony Jantzi, Omar Chavez |
阅读更多来源: ArXiv AI | 13-07-25
Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization
Authors: Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye |
阅读更多来源: ArXiv AI | 13-07-25
An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis
Authors: Mingda Zhang, Na Zhao, Jianglong Qing, Qing xu, Kaiwen Pan, Ting luo |
阅读更多来源: ArXiv AI | 13-07-25
New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models
阅读更多来源: The Decoder | 13-07-25
EU's Model Documentation Form makes AI providers explain their models like it's tax season
阅读更多来源: The Decoder | 13-07-25
Kimi-K2 is the next open-weight AI milestone from China after Deepseek
阅读更多来源: The Decoder | 13-07-25
OpenAI’s Windsurf deal is off, and Windsurf’s CEO is going to Google (theverge.com)
阅读更多来源: Hacker News | 13-07-25
Researchers used 1,600 YouTube fail videos to show AI models struggle with surprises
阅读更多来源: The Decoder | 13-07-25
OpenAI’s head of ChatGPT says AI will not displace doctors but will displace not going to the doctor
阅读更多来源: The Decoder | 12-07-25
Bad Actors Are Grooming LLMs to Produce Falsehoods (americansunlight.substack.com)
阅读更多来源: Hacker News | 12-07-25
OpenAI delays launch of open-weight model (twitter.com/sama)
阅读更多来源: Hacker News | 12-07-25
Leveraging Elixir's hot code loading capabilities to modularize a monolithic app (lucassifoni.info)
阅读更多来源: Hacker News | 12-07-25
Andrew Ng: Building Faster with AI [video] (youtube.com)
阅读更多来源: Hacker News | 12-07-25
Sieve (YC X25) is hiring researchers to build large video datasets for AI labs (sievedata.com)
阅读更多来源: Hacker News | 12-07-25
Upgrading an M4 Pro Mac mini's storage for half the price (jeffgeerling.com)
阅读更多来源: Hacker News | 12-07-25
ETH Zurich and EPFL to release a LLM developed on public infrastructure (ethz.ch)
阅读更多来源: Hacker News | 12-07-25
Google unveils MedGemma, an open-source AI model suite for medical applications
阅读更多来源: The Decoder | 12-07-25
LLM Inference Handbook (bentoml.com)
阅读更多来源: Hacker News | 12-07-25
Activeloop (YC S18) Is Hiring AI Search and Python Back End Engineers (Onsite, MV) (activeloop.ai)
阅读更多来源: Hacker News | 12-07-25
Hugging Face warns that closed-source robots threaten user control
阅读更多来源: The Decoder | 11-07-25
Most AI models can fake alignment, but safety training suppresses the behavior, study finds
阅读更多来源: The Decoder | 11-07-25
Meta continues to lure top AI talent with compensation packages exceeding $200 million
阅读更多来源: The Decoder | 11-07-25
OpenAI will debut an open-weight LLM soon and launch a browser with integrated AI chat
阅读更多来源: The Decoder | 11-07-25
Graphical Linear Algebra (graphicallinearalgebra.net)
阅读更多来源: Hacker News | 11-07-25
Batch Mode in the Gemini API: Process More for Less (googleblog.com)
阅读更多来源: Hacker News | 11-07-25
Recovering from AI Addiction (internetaddictsanonymous.org)
阅读更多来源: Hacker News | 11-07-25
Is Gemini 2.5 good at bounding boxes? (simedw.com)
阅读更多来源: Hacker News | 11-07-25
Not So Fast: AI Coding Tools Can Reduce Productivity (secondthoughts.ai)
阅读更多来源: Hacker News | 11-07-25
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
阅读更多来源: Hacker News | 11-07-25
Bloomberg: China’s AI expansion in Xinjiang relies on Nvidia chips despite U.S. export controls
阅读更多来源: The Decoder | 10-07-25
An attacker used AI to impersonate Secretary Rubio and contact high-ranking officials
阅读更多来源: The Decoder | 10-07-25
At last, a use case for AI agents with sky-high ROI: Stealing crypto (theregister.com)
阅读更多来源: Hacker News | 10-07-25
ChatGPT Guessing Game Leads to Users Extracting Free Windows OS Keys and More (0din.ai)
阅读更多来源: Hacker News | 10-07-25
Biomni: A General-Purpose Biomedical AI Agent (github.com/snap-stanford)
阅读更多来源: Hacker News | 10-07-25
MCP-B: A Protocol for AI Browser Automation (mcp-b.ai)
阅读更多来源: Hacker News | 10-07-25
Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications
Authors: Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon |
阅读更多来源: ArXiv AI | 10-07-25
Comprehensive Evaluation of Prototype Neural Networks
Authors: Philipp Schlinge, Steffen Meinert, Martin Atzmueller |
阅读更多来源: ArXiv AI | 10-07-25
OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion
Authors: Yizhuo Wu, Ang Li, Chang Gao |
阅读更多来源: ArXiv AI | 10-07-25
Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change
Authors: Adrian Rauchfleisch, Joshua Philip Suarez, Nikka Marie Sales, Andreas Jungherr |
阅读更多来源: ArXiv AI | 10-07-25
The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover
Authors: Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro |
阅读更多来源: ArXiv AI | 10-07-25
Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights
Authors: Alexandra Abbas, Celia Waggoner, Justin Olive |
阅读更多来源: ArXiv AI | 10-07-25
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Authors: Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao |
阅读更多来源: ArXiv AI | 10-07-25
MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation
Authors: Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang |
阅读更多来源: ArXiv AI | 10-07-25
PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments
Authors: Hanqun Cao, Xinyi Zhou, Zijun Gao, Chenyu Wang, Xin Gao, Zhi Zhang, Chunbin Gu, Ge Liu, Pheng-Ann Heng |
阅读更多来源: ArXiv AI | 10-07-25
A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering
Authors: Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee |
阅读更多来源: ArXiv AI | 10-07-25
Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation
Authors: Haris Khan, Shumaila Asif, Hassan Nasir |
阅读更多来源: ArXiv AI | 10-07-25
DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning
Authors: Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar |
阅读更多来源: ArXiv AI | 10-07-25
Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification
Authors: Martin Sondermann, Pinar Bisgin, Niklas Tschorn, Anja Burmann, Christoph M. Friedrich |
阅读更多来源: ArXiv AI | 10-07-25
An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator
Authors: Yulin An, Enrique del Castillo |
阅读更多来源: ArXiv AI | 10-07-25
Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI
Authors: David Orban |
阅读更多来源: ArXiv AI | 10-07-25
The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation
Authors: Jieren Deng, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang |
阅读更多来源: ArXiv AI | 10-07-25
OpenAI is ramping up security to prevent rivals from copying its advanced AI models
阅读更多来源: The Decoder | 10-07-25
RapidRAW: A non-destructive and GPU-accelerated RAW image editor (github.com/cybertimon)
阅读更多来源: Hacker News | 10-07-25
Why LLMs Can't Write Q/Kdb+: Writing Code Right-to-Left (medium.com/gabiteodoru)
阅读更多来源: Hacker News | 10-07-25
Apple’s AI team faces major departures as Meta recruits key engineers
阅读更多来源: The Decoder | 09-07-25
A developer focused on stopping AI bots says poisoning datasets is like peeing in the ocean
阅读更多来源: The Decoder | 09-07-25
Researchers reveal that AI models have distinct strategic fingerprints in classic game theory tests
阅读更多来源: The Decoder | 09-07-25
Sakana AI's new algorithm lets large language models work together to solve complex problems
阅读更多来源: The Decoder | 09-07-25
Huawei pushes back on AI model plagiarism claims
阅读更多来源: The Decoder | 09-07-25
UQLM: A Python Package for Uncertainty Quantification in Large Language Models
Authors: Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Ho-Kyeong Ra, Viren Bajaj, Zeya Ahmad |
阅读更多来源: ArXiv AI | 09-07-25
SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads
Authors: Jiale Lao, Immanuel Trummer |
阅读更多来源: ArXiv AI | 09-07-25
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Authors: Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, Wangchunshu Zhou |
阅读更多来源: ArXiv AI | 09-07-25
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Authors: Zhiyuan Peng, Ting-ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang |
阅读更多来源: ArXiv AI | 09-07-25
Chat2SPaT: A Large Language Model Based Tool for Automating Traffic Signal Control Plan Management
Authors: Yue Wang, Miao Zhou, Guijing Huang, Rui Zhuo, Chao Yi, Zhenliang Ma |
阅读更多来源: ArXiv AI | 09-07-25
Cultivating Multimodal Intelligence: Interpretive Reasoning and Agentic RAG Approaches to Dermatological Diagnosis
Authors: Karishma Thakrar, Shreyas Basavatia, Akshay Daftardar |
阅读更多来源: ArXiv AI | 09-07-25
SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation
Authors: Shovito Barua Soumma, Asiful Arefeen, Stephanie M. Carpenter, Melanie Hingle, Hassan Ghasemzadeh |
阅读更多来源: ArXiv AI | 09-07-25
Red Teaming AI Red Teaming
Authors: Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta |
阅读更多来源: ArXiv AI | 09-07-25
Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment
Authors: Jiahuan Pei, Fanghua Ye, Xin Sun, Wentao Deng, Koen Hindriks, Junxiao Wang |
阅读更多来源: ArXiv AI | 09-07-25
Domain adaptation of large language models for geotechnical applications
Authors: Lei Fan, Fangxue Liu, Cheng Chen |
阅读更多来源: ArXiv AI | 09-07-25
MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models
Authors: Wei Zhang, Juan Chen, En Zhu, Wenhong Cheng, YunPeng Li, Yanbo J. Wang |
阅读更多来源: ArXiv AI | 09-07-25
Towards Measurement Theory for Artificial Intelligence
Authors: Elija Perrier |
阅读更多来源: ArXiv AI | 09-07-25
Divergent Realities: A Comparative Analysis of Human Expert vs. Artificial Intelligence Based Generation and Evaluation of Treatment Plans in Dermatology
Authors: Dipayan Sengupta, Saumya Panda |
Source: ArXiv AI | 09-07-25
LLMs are Introvert
Authors: Litian Zhang, Xiaoming Zhang, Bingyu Yan, Ziyi Zhou, Bo Zhang, Zhenyu Guan, Xi Zhang, Chaozhuo Li |
Source: ArXiv AI | 09-07-25
Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses
Authors: Yuan An, John Liu, Niyam Acharya, Ruhma Hashmi |
Source: ArXiv AI | 09-07-25
An autonomous agent for auditing and improving the reliability of clinical AI models
Authors: Lukas Kuhn, Florian Buettner |
Source: ArXiv AI | 09-07-25
Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better
Authors: Aaron Bembenek (The University of Melbourne) |
Source: ArXiv AI | 09-07-25
Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity
Authors: Shuai Zhao, Yulin Zhang, Luwei Xiao, Xinyi Wu, Yanhao Jia, Zhongliang Guo, Xiaobao Wu, Cong-Duy Nguyen, Guoming Zhang, Anh Tuan Luu |
Source: ArXiv AI | 09-07-25
MusiScene: Leveraging MU-LLaMA for Scene Imagination and Enhanced Video Background Music Generation
Authors: Fathinah Izzati, Xinyue Li, Yuxuan Wu, Gus Xia |
Source: ArXiv AI | 09-07-25
Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening
Authors: Zhijun Guo, Alvina Lai, Julia Ive, Alexandru Petcu, Yutong Wang, Luyuan Qi, Johan H Thygesen, Kezhi Li |
Source: ArXiv AI | 09-07-25
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
Authors: Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, Maarten Sap |
Source: ArXiv AI | 09-07-25
FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models
Authors: Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang |
Source: ArXiv AI | 09-07-25
Smollm3: Smol, multilingual, long-context reasoner LLM (huggingface.co)
Source: Hacker News | 09-07-25
I'm Building LLM for Satellite Data EarthGPT.app (earthgpt.app)
Source: Hacker News | 09-07-25
The Tradeoffs of SSMs and Transformers (goombalab.github.io)
Source: Hacker News | 09-07-25
Rules of good writing (2007) (dilbertblog.typepad.com)
Source: Hacker News | 09-07-25
ChatGPT helped identify a genetic MTHFR mutation after a decade of missed diagnoses
Source: The Decoder | 08-07-25
Adding a feature because ChatGPT incorrectly thinks it exists (holovaty.com)
Source: Hacker News | 08-07-25
Launch HN: Morph (YC S23) – Apply AI code edits at 4,500 tokens/sec
Source: Hacker News | 08-07-25
Agent Exchange: Shaping the Future of AI Agent Economics
Authors: Yingxuan Yang, Ying Wen, Jun Wang, Weinan Zhang |
Source: ArXiv AI | 08-07-25
LLMs model how humans induce logically structured rules
Authors: Alyssa Loo, Ellie Pavlick, Roman Feiman |
Source: ArXiv AI | 08-07-25
Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features
Authors: Thuy An Ha, Bao Quoc Vo |
Source: ArXiv AI | 08-07-25
Lyria: A General LLM-Driven Genetic Algorithm Framework for Problem Solving
Authors: Weizhi Tang, Kwabena Nuamah, Vaishak Belle |
Source: ArXiv AI | 08-07-25
A Technical Survey of Reinforcement Learning Techniques for Large Language Models
Authors: Saksham Sahai Srivastava, Vaneet Aggarwal |
Source: ArXiv AI | 08-07-25
Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing
Authors: Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang |
Source: ArXiv AI | 08-07-25
How to Train Your LLM Web Agent: A Statistical Diagnosis
Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia |
Source: ArXiv AI | 08-07-25
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Authors: Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu |
Source: ArXiv AI | 08-07-25
DC-Mamber: A Dual Channel Prediction Model based on Mamba and Linear Transformer for Multivariate Time Series Forecasting
Authors: Bing Fan, Shusen Ma, Yun-Bo Zhao, Yu Kang |
Source: ArXiv AI | 08-07-25
MedGellan: LLM-Generated Medical Guidance to Support Physicians
Authors: Debodeep Banerjee, Burcu Sayin, Stefano Teso, Andrea Passerini |
Source: ArXiv AI | 08-07-25
Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence
Authors: Sonal Allana, Rozita Dara, Xiaodong Lin, Pulei Xiong |
Source: ArXiv AI | 08-07-25
Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective
Authors: Niloofar Shadab, Tyler Cody, Alejandro Salado, Taylan G. Topcu, Mohammad Shadab, Peter Beling |
Source: ArXiv AI | 08-07-25
LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction
Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko |
Source: ArXiv AI | 08-07-25
ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning
Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Xinyang He, Chujie Chen, Cangyuan Li, Mengdi Wang, Haobo Xu, Yinhe Han, Ying Wang |
Source: ArXiv AI | 08-07-25
DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine
Authors: Zewen Sun, Ruoxiang Huang, Jiahe Feng, Rundong Kong, Yuqian Wang, Hengyu Liu, Ziqi Gong, Yuyuan Qin, Yingxue Wang, Yu Wang |
Source: ArXiv AI | 08-07-25
Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents
Authors: George Jagadeesh, Srikrishna Iyer, Michal Polanowski, Kai Xin Thia |
Source: ArXiv AI | 08-07-25
MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction
Authors: Kaleem Ullah Qasim, Jiashu Zhang |
Source: ArXiv AI | 08-07-25
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
Authors: Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Siheng Chen |
Source: ArXiv AI | 08-07-25
OpenAI's Head of Recruiting says Meta's hiring tactics "reek of desperation"
Source: The Decoder | 08-07-25
The Maquet machine: how AI is reviving Alexandre Dumas' successful model
Source: The Decoder | 08-07-25
Alibaba's new GPT-4o competitor Qwen VLo is no longer open source
Source: The Decoder | 08-07-25
A non-anthropomorphized view of LLMs (addxorrol.blogspot.com)
Source: Hacker News | 08-07-25
Early Signs of Steganographic Capabilities in Frontier LLMs
Authors: Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner |
Source: ArXiv AI | 08-07-25
Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
Authors: Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo |
Source: ArXiv AI | 08-07-25
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Authors: Purbesh Mitra, Sennur Ulukus |
Source: ArXiv AI | 08-07-25
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
Authors: Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun |
Source: ArXiv AI | 08-07-25
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
Authors: Ken Tsui |
Source: ArXiv AI | 08-07-25
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Authors: Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates |
Source: ArXiv AI | 08-07-25
STELLA: Self-Evolving LLM Agent for Biomedical Research
Authors: Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong |
Source: ArXiv AI | 08-07-25
Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation
Authors: Jungkoo Kang |
Source: ArXiv AI | 08-07-25
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
Authors: Amogh Mannekote, Adam Davies, Guohao Li, Kristy Elizabeth Boyer, ChengXiang Zhai, Bonnie J Dorr, Francesco Pinto |
Source: ArXiv AI | 08-07-25
Data Diversification Methods In Alignment Enhance Math Performance In LLMs
Authors: Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou |
Source: ArXiv AI | 08-07-25
What Neuroscience Can Teach AI About Learning in Continuously Changing Environments
Authors: Daniel Durstewitz, Bruno Averbeck, Georgia Koppe |
Source: ArXiv AI | 08-07-25
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Authors: Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach |
Source: ArXiv AI | 08-07-25
OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent
Authors: Bowen Chen, Zhao Wang, Shingo Takamatsu |
Source: ArXiv AI | 08-07-25
Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
Authors: Kenneth Payne, Baptiste Alloui-Cros |
Source: ArXiv AI | 08-07-25
Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education
Authors: Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin |
Source: ArXiv AI | 08-07-25
Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work
Authors: Guangwei Zhang |
Source: ArXiv AI | 08-07-25
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs
Authors: Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang |
Source: ArXiv AI | 08-07-25
"No grace period, no pause": EU sticks to AI Act timeline despite industry pushback
Source: The Decoder | 07-07-25
ChatGPT usage for news surges as Google news searches decline
Source: The Decoder | 07-07-25
LLMs should not replace therapists (arxiv.org)
Source: Hacker News | 07-07-25
Opencode: AI coding agent, built for the terminal (github.com/sst)
Source: Hacker News | 07-07-25
Collatz's Ant and Σ(n) (gbragafibra.github.io)
Source: Hacker News | 07-07-25
Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)
Source: Hacker News | 07-07-25
Mirage: AI-native UGC game engine powered by real-time world model (dynamicslab.ai)
Source: Hacker News | 07-07-25
Optimizing Tool Selection for LLM Workflows with Differentiable Programming (viksit.substack.com)
Source: Hacker News | 06-07-25
The force-feeding of AI features on an unwilling public (honest-broker.com)
Source: Hacker News | 06-07-25
A Canadian's AI hoax duped the media and propelled a 'band' to success (cbc.ca)
Source: Hacker News | 06-07-25
The Right Way to Embed an LLM in a Group Chat (tripjam.app)
Source: Hacker News | 06-07-25
Impact of PCIe 5.0 Bandwidth on GPU Content Creation and LLM Performance (pugetsystems.com)
Source: Hacker News | 05-07-25
Large Language Models Are Improving Exponentially (ieee.org)
Source: Hacker News | 05-07-25
SciArena lets scientists compare LLMs on real research questions
Source: The Decoder | 05-07-25
Google launches Veo 3 Fast worldwide, letting Gemini Pro users generate videos up to 720p
Source: The Decoder | 05-07-25
Gremllm (github.com/awwaiid)
Source: Hacker News | 05-07-25
ChatGPT creates phisher's paradise by serving the wrong URLs for major companies (theregister.com)
Source: Hacker News | 05-07-25
Version Control for AI Coding (branching.app)
Source: Hacker News | 05-07-25
Everything around LLMs is still magical and wishful thinking (dmitriid.com)
Source: Hacker News | 05-07-25
Meta reportedly offers top OpenAI researchers up to $300 million over four years
Source: The Decoder | 04-07-25
How AI on Microcontrollers Works: Operators and Kernels (danielmangum.com)
Source: Hacker News | 04-07-25
Show HN: I AI coded a tower defense game and documented the whole process (github.com/maciej-trebacz)
Source: Hacker News | 04-07-25
Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] (youtube.com)
Source: Hacker News | 04-07-25
About AI Evals (hamel.dev)
Source: Hacker News | 04-07-25
Manipulating trapped air bubbles in ice for message storage in cold regions (cell.com)
Source: Hacker News | 04-07-25
Cloudflare aims to save the World Wide Web by blocking AI crawlers without explicit consent
Source: The Decoder | 03-07-25
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Authors: Zeyu Huang, Tianhao Cheng, Zihan Qiu, Zili Wang, Yinghui Xu, Edoardo M. Ponti, Ivan Titov |
Source: ArXiv AI | 03-07-25
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu |
Source: ArXiv AI | 03-07-25
Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America
Authors: Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo |
Source: ArXiv AI | 03-07-25
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma |
Source: ArXiv AI | 03-07-25
Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture
Authors: Bochen Han, Songmao Zhang |
Source: ArXiv AI | 03-07-25
Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training
Authors: Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud |
Source: ArXiv AI | 03-07-25
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Authors: Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang |
Source: ArXiv AI | 03-07-25
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Authors: Nicolas Salvy, Hugues Talbot, Bertrand Thirion |
Source: ArXiv AI | 03-07-25
Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems
Authors: Xiaoyu Ji, Jessica Shorland, Joshua Shank, Pascal Delpe-Brice, Latanya Sweeney, Jan Allebach, Ali Shakouri |
Source: ArXiv AI | 03-07-25
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios |
Source: ArXiv AI | 03-07-25
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
Authors: Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu |
Source: ArXiv AI | 03-07-25
End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning
Authors: Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna |
Source: ArXiv AI | 03-07-25
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He |
Source: ArXiv AI | 03-07-25
AI4Research: A Survey of Artificial Intelligence for Scientific Research
Authors: Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, Wanxiang Che |
Source: ArXiv AI | 03-07-25
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Authors: Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir |
Source: ArXiv AI | 03-07-25
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
Authors: Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman |
Source: ArXiv AI | 03-07-25
Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection
Authors: Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari |
Source: ArXiv AI | 03-07-25
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang |
Source: ArXiv AI | 03-07-25
Using multi-agent architecture to mitigate the risk of LLM hallucinations
Authors: Abd Elrahman Amer, Magdi Amer |
Source: ArXiv AI | 03-07-25
MindsDB (YC W20) is hiring an AI solutions engineer (greenhouse.io)
Source: Hacker News | 03-07-25
What to build instead of AI agents (decodingml.substack.com)
Source: Hacker News | 03-07-25
Meta founds Superintelligence Labs with top acquisitions from OpenAI and Google
Source: The Decoder | 02-07-25
Apple weighs abandoning its own AI for Siri as it tests models from OpenAI and Anthropic
Source: The Decoder | 02-07-25
HN Slop: AI startup ideas generated from Hacker News (josh.ing)
Source: Hacker News | 02-07-25
Show HN: A modern C++20 AI SDK (GPT‑4o, Claude 3.5, tool‑calling)
Source: Hacker News | 02-07-25
Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite Webpages (simedw.com)
Source: Hacker News | 02-07-25
Sam Altman Slams Meta's AI Talent Poaching: 'Missionaries Will Beat Mercenaries' (wired.com)
Source: Hacker News | 02-07-25
Hilbert's sixth problem: derivation of fluid equations via Boltzmann's theory (arxiv.org)
Source: Hacker News | 02-07-25
How large are large language models? (gist.github.com)
Source: Hacker News | 02-07-25
Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
Authors: Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, Dave Farley |
Source: ArXiv AI | 02-07-25
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Authors: Zhi Jing, Siyuan Yang, Jicong Ao, Ting Xiao, Yugang Jiang, Chenjia Bai |
Source: ArXiv AI | 02-07-25
Automated anatomy-based post-processing reduces false positives and improved interpretability of deep learning intracranial aneurysm detection
Authors: Jisoo Kim, Chu-Hsuan Lin, Alberto Ceballos-Arroyo, Ping Liu, Huaizu Jiang, Shrikanth Yadav, Qi Wan, Lei Qin, Geoffrey S Young |
Source: ArXiv AI | 02-07-25
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs
Authors: Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim |
Source: ArXiv AI | 02-07-25
Many LLMs Are More Utilitarian Than One
Authors: Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney |
Source: ArXiv AI | 02-07-25
Deep learning-based segmentation of T1 and T2 cardiac MRI maps for automated disease detection
Authors: Andreea Bianca Popescu, Andreas Seitz, Heiko Mahrholdt, Jens Wetzl, Athira Jacob, Lucian Mihai Itu, Constantin Suciu, Teodora Chitiboi |
Source: ArXiv AI | 02-07-25
Stylometry recognizes human and LLM-generated texts in short samples
Authors: Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab |
Source: ArXiv AI | 02-07-25
Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications
Authors: Jindong Han, Yansong Ning, Zirui Yuan, Hang Ni, Fan Liu, Tengfei Lyu, Hao Liu |
Source: ArXiv AI | 02-07-25
Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona
Authors: Philip Colangelo, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao, Aroon Vijaykar, Chris Williams, Daniel C. Wilson, Zack MacFarland, Daniel Dreiling, Nathan Morey, Anuja Ratnayake, Baskar Vairamohan |
Source: ArXiv AI | 02-07-25
Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes
Authors: Eun-Ji Park, Sangwon Yun |
Source: ArXiv AI | 02-07-25
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
Authors: Varun Mannam, Fang Wang, Chaochun Liu, Xin Chen |
Source: ArXiv AI | 02-07-25
Holistic Artificial Intelligence in Medicine; improved performance and explainability
Authors: Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou, Dimitris Bertsimas |
Source: ArXiv AI | 02-07-25
ChatGPT produces more "lazy" thinkers: Evidence of cognitive engagement decline
Authors: Georgios P. Georgiou |
Source: ArXiv AI | 02-07-25
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Authors: Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue |
Source: ArXiv AI | 02-07-25
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
Authors: Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park |
Source: ArXiv AI | 02-07-25
A Robust Algorithm for Non-IID Machine Learning Problems with Convergence Analysis
Authors: Qing Xu, Xiaohua Xuan |
Source: ArXiv AI | 02-07-25
Enhancing LLM Agent Safety via Causal Influence Prompting
Authors: Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee |
Source: ArXiv AI | 02-07-25
Google brings Gemini for Education and Gemini in Classroom AI tools to schools
Source: The Decoder | 02-07-25
Microsoft’s MAI-DxO boosts AI diagnostic accuracy and cuts costs by nearly 70 percent
Source: The Decoder | 02-07-25
The wanton destruction of a creative-tech era (greg.technology)
Source: Hacker News | 02-07-25
Building a Personal AI Factory (john-rush.com)
Source: Hacker News | 02-07-25
Show HN: Core – open source memory graph for LLMs – shareable, user owned (github.com/redplanethq)
Source: Hacker News | 02-07-25
After Meta's recruiting push, OpenAI tries to retain talent
Source: The Decoder | 01-07-25
Claude Code now supports hooks (anthropic.com)
Source: Hacker News | 01-07-25
GPEmu: A GPU emulator for rapid, low-cost deep learning prototyping [pdf] (vldb.org)
Source: Hacker News | 01-07-25
Cloudflare to introduce pay-per-crawl for AI bots (cloudflare.com)
Source: Hacker News | 01-07-25
Researchers Uncover Hidden Ingredients Behind AI Creativity (quantamagazine.org)
Source: Hacker News | 01-07-25
The new skill in AI is not prompting, it's context engineering (philschmid.de)
Source: Hacker News | 01-07-25
The hidden JTAG in a Qualcomm/Snapdragon device’s USB port (linaro.org)
Source: Hacker News | 01-07-25
Show HN: ToplingDB - A Persistent Key-Value Store for External Storage (github.com/topling)
Source: Hacker News | 01-07-25
The average chess players of Bletchley Park and AI research in Britain (blogs.bl.uk)
Source: Hacker News | 01-07-25
Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies
Authors: Paul Wachter, Lukas Niehaus, Julius Schöning |
Source: ArXiv AI | 01-07-25
Bootstrapping Human-Like Planning via LLMs
Authors: David Porfirio, Vincent Hsiao, Morgan Fine-Morris, Leslie Smith, Laura M. Hiatt |
Source: ArXiv AI | 01-07-25
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
Authors: Michael Papademas, Xenia Ziouvelou, Antonis Troumpoukis, Vangelis Karkaletsis |
Source: ArXiv AI | 01-07-25
The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy
Authors: Rishi Bommasani |
Source: ArXiv AI | 01-07-25
Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study
Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit |
Source: ArXiv AI | 01-07-25
Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons
Authors: Chi Chiu So, Yueyue Sun, Jun-Min Wang, Siu Pang Yung, Anthony Wai Keung Loh, Chun Pong Chau |
Source: ArXiv AI | 01-07-25
Data Augmentation for Cognitive Behavioral Therapy: Leveraging ERNIE Language Models using Artificial Intelligence
Authors: Bosubabu Sambana, Kondreddygari Archana, Suram Indhra Sena Reddy, Shaik Meethaigar Jameer Basha, Shaik Karishma |
Source: ArXiv AI | 01-07-25
The Confidence Paradox: Can LLM Know When It's Wrong
Authors: Sahil Tripathi, Md Tabrez Nafis, Imran Hussain, Jiechao Gao |
Source: ArXiv AI | 01-07-25
CooT: Learning to Coordinate In-Context with Coordination Transformers
Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun |
Source: ArXiv AI | 01-07-25
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
Authors: Yu Zhang, Ruijie Yu, Jidong Tian, Feng Zhu, Jiapeng Liu, Xiaokang Yang, Yaohui Jin, Yanyan Xu |
Source: ArXiv AI | 01-07-25
Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays
Authors: Selin Dik, Osman Erdem, Mehmet Dik |
Source: ArXiv AI | 01-07-25
Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models
Authors: Maria Carolina Cornelia Wit, Jun Pang |
Source: ArXiv AI | 01-07-25
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Authors: Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao |
Source: ArXiv AI | 01-07-25
Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models
Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye |
Source: ArXiv AI | 01-07-25
Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments
Authors: Christoph Schnabl, Daniel Hugenroth, Bill Marino, Alastair R. Beresford |
Source: ArXiv AI | 01-07-25
A New Perspective On AI Safety Through Control Theory Methodologies
Authors: Lars Ullrich, Walter Zimmer, Ross Greer, Knut Graichen, Alois C. Knoll, Mohan Trivedi |
Source: ArXiv AI | 01-07-25
Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning
Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik |
Source: ArXiv AI | 01-07-25
Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice
Authors: Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi |
Source: ArXiv AI | 01-07-25
AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models
Authors: Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, Deepika Raman |
Source: ArXiv AI | 01-07-25
Harnessing AI Agents to Advance Research on Refugee Child Mental Health
Authors: Aditya Shrivastava, Komal Gupta, Shraddha Arora |
Source: ArXiv AI | 01-07-25
OpenAI loses four more top researchers to Meta as even its own engineers call it a "huge loss"
Source: The Decoder | 01-07-25
Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com/runzhouye)
Source: Hacker News | 01-07-25
Show HN: We're two coffee nerds who built an AI app to track beans and recipes (beanbook.app)
Source: Hacker News | 01-07-25
Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken (github.com/m4thyou)
Source: Hacker News | 01-07-25
There are no new ideas in AI only new datasets (jxmo.io)
Source: Hacker News | 01-07-25
OmniGen 2 blends image and text generation like GPT-4o, but is open source
Source: The Decoder | 30-06-25
Gridfinity: The modular, open-source grid storage system (gridfinity.xyz)
Source: Hacker News | 30-06-25
Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics
Authors: Michael A. Riegler, Kristoffer Herland Hellton, Vajira Thambawita, Hugo L. Hammer |
Source: ArXiv AI | 30-06-25
Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit
Authors: Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling |
Source: ArXiv AI | 30-06-25
Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
Authors: Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis |
Source: ArXiv AI | 30-06-25
Transformers are Graph Neural Networks
Authors: Chaitanya K. Joshi |
Source: ArXiv AI | 30-06-25
Autonomic Microservice Management via Agentic AI and MAPE-K Integration
Authors: Matteo Esposito, Alexander Bakhtin, Noman Ahmad, Mikel Robredo, Ruoyu Su, Valentina Lenarduzzi, Davide Taibi |
Source: ArXiv AI | 30-06-25
CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks
Authors: Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia |
Source: ArXiv AI | 30-06-25
Projected Compression: Trainable Projection for Efficient Transformer Compression
Authors: Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski |
Source: ArXiv AI | 30-06-25
From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications
Authors: Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania |
Source: ArXiv AI | 30-06-25
Concept-Level AI for Telecom: Moving Beyond Large Language Models
Authors: Viswanath Kumarskandpriya, Abdulhalim Dandoush, Abbas Bradai, Ali Belgacem |
Source: ArXiv AI | 30-06-25
A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake
Authors: Luigi Russo, Deodato Tapete, Silvia Liberata Ullo, Paolo Gamba |
Source: ArXiv AI | 30-06-25
CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings
Authors: Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty |
Source: ArXiv AI | 30-06-25
QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization
Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh |
Source: ArXiv AI | 30-06-25
MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models
Authors: Yifan Liu, Xishun Liao, Haoxuan Ma, Jonathan Liu, Rohan Jadhav, Jiaqi Ma |
Source: ArXiv AI | 30-06-25
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Authors: Wanxin Tian, Shijie Zhang, Kevin Zhang, Xiaowei Chi, Yulin Luo, Junyu Lu, Chunkai Fan, Qiang Zhou, Yiming Zhao, Ning Liu, Siyu Lin, Zhiyuan Qin, Xiaozhu Ju, Shanghang Zhang, Jian Tang |
Source: ArXiv AI | 30-06-25
CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation
Authors: Nicolas Bougie, Narimasa Watanabe |
Source: ArXiv AI | 30-06-25
A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety
Authors: Camille François, Ludovic Péran, Ayah Bdeir, Nouha Dziri, Will Hawkins, Yacine Jernite, Sayash Kapoor, Juliet Shen, Heidy Khlaaf, Kevin Klyman, Nik Marda, Marie Pellat, Deb Raji, Divya Siddarth, Aviya Skowron, Joseph Spisak, Madhulika Srikumar, Victor Storchan, Audrey Tang, Jen Weedon |
阅读更多来源: ArXiv AI | 30-06-25
Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios
Authors: Shengyue Yao, Runqing Guo, Yangyang Qin, Miangbing Meng, Jipeng Cao, Yilun Lin, Yisheng Lv, Fei-Yue Wang |
阅读更多来源: ArXiv AI | 30-06-25
Embodied AI Agents: Modeling the World
Authors: Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, Jitendra Malik |
阅读更多来源: ArXiv AI | 30-06-25
AI Model Passport: Data and System Traceability Framework for Transparent AI in Health
Authors: Varvara Kalokyri, Nikolaos S. Tachos, Charalampos N. Kalantzopoulos, Stelios Sfakianakis, Haridimos Kondylakis, Dimitrios I. Zaridis, Sara Colantonio, Daniele Regge, Nikolaos Papanikolaou, The ProCAncer-I consortium, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis |
阅读更多来源: ArXiv AI | 30-06-25
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Authors: Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach |
阅读更多来源: ArXiv AI | 30-06-25
Anthropic's Claude ran a store and lost money by selling below cost and giving discounts
阅读更多来源: The Decoder | 30-06-25
Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted) (llmapitest.com)
阅读更多来源: Hacker News | 30-06-25
US Senate moves to block state AI laws for five years if states take broadband funds
阅读更多来源: The Decoder | 30-06-25
Life of an inference request (vLLM V1): How LLMs are served efficiently at scale (ubicloud.com)
阅读更多来源: Hacker News | 29-06-25
Magnetic Tape Storage Technology: usage, history, and future outlook (acm.org)
阅读更多来源: Hacker News | 29-06-25
Show HN: A different kind of AI Video generation
阅读更多来源: Hacker News | 29-06-25
Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
Authors: He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang |
阅读更多来源: ArXiv AI | 29-06-25
Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks
Authors: Isaac Chung, Imene Kerboua, Marton Kardos, Roman Solomatin, Kenneth Enevoldsen |
阅读更多来源: ArXiv AI | 29-06-25
A Hierarchical Deep Learning Approach for Minority Instrument Detection
Authors: Dylan Sechet, Francesca Bugiotti, Matthieu Kowalski, Edouard d'Hérouville, Filip Langiewicz |
阅读更多来源: ArXiv AI | 29-06-25
$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models
Authors: Quanming Liu, Xupeng Bu, Zhichao Yan, Ru Li |
阅读更多来源: ArXiv AI | 29-06-25
Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage
Authors: Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O'Sullivan |
阅读更多来源: ArXiv AI | 29-06-25
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Authors: Ali Şenol, Garima Agrawal, Huan Liu |
阅读更多来源: ArXiv AI | 29-06-25
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Authors: Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha |
阅读更多来源: ArXiv AI | 29-06-25
Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation
Authors: Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng |
阅读更多来源: ArXiv AI | 29-06-25
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Authors: Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal |
阅读更多来源: ArXiv AI | 29-06-25
Potemkin Understanding in Large Language Models
Authors: Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan |
阅读更多来源: ArXiv AI | 29-06-25
The Singapore Consensus on Global AI Safety Research Priorities
Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, Agnes Delaborde, Nouha Dziri, Francisco Eiras, Joshua Engels, Jinyu Fan, Adam Gleave, Noah Goodman, Fynn Heide, Dan Hendrycks, Cyrus Hodes, Bryan Low Kian Hsiang, Minlie Huang, Sami Jawhar, Wang Jingyu, Adam Tauman Kalai, Meindert Kamphuis, Mohan Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, Thomas Kwa, Jeffrey Ladish, Kwok-Yan Lam, Wan Lee Sie, Taewhi Lee, Xiaojian Li, Jiajun Liu, Chaochao Lu, Yifan Mai, Richard Mallah, Julian Michael, Nick Moës, Simon Möller, Kihyuk Nam, Kwan Yee Ng, Mark Nitzberg, Besmira Nushi, Seán O hÉigeartaigh, Alejandro Ortega, Pierre Peigné, James Petrie, Benjamin Prud'Homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, Buck Shlegeris, Saad Siddiqui, Aradhana Sinha, Martín Soto, Cheston Tan, Dong Ting, Robert Trager, Brian Tse, Anthony Tung K. H., Vanessa Wilfred, John Willes, Denise Wong, Wei Xu, Rongwu Xu, Yi Zeng, HongJiang Zhang, Djordje Žikelić |
阅读更多来源: ArXiv AI | 29-06-25
Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications
Authors: Xinye Tang, Haijun Zhai, Chaitanya Belwal, Vineeth Thayanithi, Philip Baumann, Yogesh K Roy |
阅读更多来源: ArXiv AI | 29-06-25
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
Authors: Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han |
阅读更多来源: ArXiv AI | 29-06-25
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
Authors: Chenkai Sun, Denghui Zhang, ChengXiang Zhai, Heng Ji |
阅读更多来源: ArXiv AI | 29-06-25
Active Inference AI Systems for Scientific Discovery
Authors: Karthik Duraisamy |
阅读更多来源: ArXiv AI | 29-06-25
IXAII: An Interactive Explainable Artificial Intelligence Interface for Decision Support Systems
Authors: Pauline Speckmann, Mario Nadj, Christian Janiesch |
阅读更多来源: ArXiv AI | 29-06-25
Microsoft’s Braga AI chip faces six-month delay, trails Nvidia’s Blackwell
阅读更多来源: The Decoder | 29-06-25
OpenAI renting Google TPUs sends a strong warning shot to Microsoft
阅读更多来源: The Decoder | 29-06-25
Meta CTO confirms massive offers for top AI executives
阅读更多来源: The Decoder | 29-06-25
Show HN: AGL a toy language that compiles to Go (github.com/alaingilbert)
阅读更多来源: Hacker News | 29-06-25
LLMs bring new nature of abstraction – up and sideways (martinfowler.com)
阅读更多来源: Hacker News | 28-06-25
Facebook is starting to feed its AI with private, unpublished photos (theverge.com)
阅读更多来源: Hacker News | 28-06-25
SymbolicAI: A neuro-symbolic perspective on LLMs (github.com/extensityai)
阅读更多来源: Hacker News | 28-06-25
Lossless LLM 3x Throughput Increase by LMCache (github.com/lmcache)
阅读更多来源: Hacker News | 28-06-25
AlphaGenome: AI for Better Understanding the Genome (deepmind.google)
阅读更多来源: Hacker News | 28-06-25
Google launches Gemma 3n, a multimodal AI model built for real-time use on mobile devices
阅读更多来源: The Decoder | 28-06-25
Project Vend: Can Claude run a small shop? (And why does that matter?) (anthropic.com)
阅读更多来源: Hacker News | 28-06-25
Theoretical Analysis of Positional Encodings in Transformer Models (arxiv.org)
阅读更多来源: Hacker News | 28-06-25
Spark AI (YC W24) is hiring a full-stack engineer in SF (founding team) (ycombinator.com)
阅读更多来源: Hacker News | 28-06-25
Microsoft is reportedly barred from building its own AGI until 2030 under its contract with OpenAI
阅读更多来源: The Decoder | 27-06-25
Meta poaches three top AI researchers from OpenAI, who had poached them from Deepmind
阅读更多来源: The Decoder | 27-06-25
Show HN: Magnitude – Open-source AI browser automation framework (github.com/magnitudedev)
阅读更多来源: Hacker News | 27-06-25
Launch HN: Issen (YC F24) – Personal AI language tutor
阅读更多来源: Hacker News | 27-06-25
What did former CTO Mira Murati see at OpenAI that made her choose custom models over AGI
阅读更多来源: The Decoder | 27-06-25
Show HN: I built an AI dataset generator (github.com/metabase)
阅读更多来源: Hacker News | 27-06-25
Researchers train AI to generate long-form text using only reinforcement learning
阅读更多来源: The Decoder | 26-06-25
Google Deepmind makes robots independent of the cloud with Gemini On-Device
阅读更多来源: The Decoder | 26-06-25
Anthropic won a fair use hearing that could end up being a defeat
阅读更多来源: The Decoder | 26-06-25
Google releases open-source Gemini CLI to bring Gemini AI into developer workflows
阅读更多来源: The Decoder | 26-06-25
Automatic Demonstration Selection for LLM-based Tabular Data Classification
Authors: Shuchu Han, Wolfgang Bruckner |
阅读更多来源: ArXiv AI | 26-06-25
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models
Authors: Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi |
阅读更多来源: ArXiv AI | 26-06-25
WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads
Authors: Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang |
阅读更多来源: ArXiv AI | 26-06-25
Large Language Model-Driven Code Compliance Checking in Building Information Modeling
Authors: Soumya Madireddy, Lu Gao, Zia Din, Kinam Kim, Ahmed Senouci, Zhe Han, Yunpeng Zhang |
阅读更多来源: ArXiv AI | 26-06-25
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Authors: Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker |
阅读更多来源: ArXiv AI | 26-06-25
AI in the Writing Process: How Purposeful AI Support Fosters Student Writing
Authors: Momin N. Siddiqui, Roy Pea, Hari Subramonyam |
阅读更多来源: ArXiv AI | 26-06-25
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman |
阅读更多来源: ArXiv AI | 26-06-25
Define-ML: An Approach to Ideate Machine Learning-Enabled Systems
Authors: Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao, Hélio Lopes, Marcos Kalinowski |
阅读更多来源: ArXiv AI | 26-06-25
Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
Authors: Saloni Dash, Amélie Reymond, Emma S. Spiro, Aylin Caliskan |
阅读更多来源: ArXiv AI | 26-06-25
Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models
Authors: Zechun Deng, Ziwei Liu, Ziqian Bi, Junhao Song, Chia Xin Liang, Joe Yeong, Junfeng Hao |
阅读更多来源: ArXiv AI | 26-06-25
Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks
Authors: Konstantinos Vrettos, Michail E. Klontzas |
阅读更多来源: ArXiv AI | 26-06-25
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges
Authors: Abdul Basit, Minghao Shao, Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique |
阅读更多来源: ArXiv AI | 26-06-25
Enterprise Large Language Model Evaluation Benchmark
Authors: Liya Wang, David Yi, Damien Jose, John Passarelli, James Gao, Jordan Leventis, Kang Li |
阅读更多来源: ArXiv AI | 26-06-25
DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction
Authors: Weijieying Ren, Tianxiang Zhao, Lei Wang, Tianchun Wang, Vasant Honavar |
阅读更多来源: ArXiv AI | 26-06-25
Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation
Authors: Jinchun Du, Bojie Shen, Muhammad Aamir Cheema, Adel N. Toosi |
阅读更多来源: ArXiv AI | 26-06-25
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios
Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu |
阅读更多来源: ArXiv AI | 26-06-25
CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam |
阅读更多来源: ArXiv AI | 26-06-25
Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges
Authors: Alexander D. Kalian, Jaewook Lee, Stefan P. Johannesson, Lennart Otte, Christer Hogstrand, Miao Guo |
阅读更多来源: ArXiv AI | 26-06-25
Towards Community-Driven Agents for Machine Learning Engineering
Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang |
阅读更多来源: ArXiv AI | 26-06-25
LLM code generation may lead to an erosion of trust (jaysthoughts.com)
阅读更多来源: Hacker News | 26-06-25
Define policy forbidding use of AI code generators (github.com/qemu)
阅读更多来源: Hacker News | 26-06-25
Build and Host AI-Powered Apps with Claude – No Deployment Needed (anthropic.com)
阅读更多来源: Hacker News | 26-06-25
Structured Output with LangChain and Llamafile (brakmic.com)
阅读更多来源: Hacker News | 26-06-25
OpenAI charges by the minute, so speed up your audio (mand.is)
阅读更多来源: Hacker News | 26-06-25
Learnings from Building AI Agents (cubic.dev)
阅读更多来源: Hacker News | 26-06-25
Gemini CLI (blog.google)
阅读更多来源: Hacker News | 26-06-25
Google hands off Agent2Agent protocol to Linux Foundation for open AI agent standard
阅读更多来源: The Decoder | 26-06-25
LLM Hallucinations in Practical Code Generation (acm.org)
阅读更多来源: Hacker News | 26-06-25
FurtherAI (YC W24) Is Hiring for Software and AI Roles (ycombinator.com)
阅读更多来源: Hacker News | 26-06-25
Disney is in talks with OpenAI about possible partnerships involving its characters
阅读更多来源: The Decoder | 25-06-25
Microsoft has introduced an AI agent to the Windows Settings menu
阅读更多来源: The Decoder | 25-06-25
AI job postings on LinkedIn grew sixfold as AI skill additions to profiles soared twentyfold
阅读更多来源: The Decoder | 25-06-25
African and South American countries are almost entirely excluded from global AI development
阅读更多来源: The Decoder | 25-06-25
ChatGPT's enterprise success against Copilot fuels OpenAI/Microsoft rivalry (bloomberg.com)
阅读更多来源: Hacker News | 25-06-25
Thoughts on Asunción, Paraguay (cpsi.media)
阅读更多来源: Hacker News | 25-06-25
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao |
阅读更多来源: ArXiv AI | 25-06-25
Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis
Authors: Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy |
阅读更多来源: ArXiv AI | 25-06-25
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |
阅读更多来源: ArXiv AI | 25-06-25
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai |
阅读更多来源: ArXiv AI | 25-06-25
Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience
Authors: Lingyu Yang (Shanghai Jiao Tong University) |
阅读更多来源: ArXiv AI | 25-06-25
A standard transformer and attention with linear biases for molecular conformer generation
Authors: Viatcheslav Gurev, Timothy Rumbell |
阅读更多来源: ArXiv AI | 25-06-25
Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach
Authors: Feiting Yang, Antoine Moevus, Steve Lévesque |
阅读更多来源: ArXiv AI | 25-06-25
RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1
Authors: Yu Xie, Xingkai Ren, Ying Qi, Yao Hu, Lianlei Shan |
阅读更多来源: ArXiv AI | 25-06-25
Spiritual-LLM : Gita Inspired Mental Health Therapy In the Era of LLMs
Authors: Janak Kapuriya, Aman Singh, Jainendra Shukla, Rajiv Ratn Shah |
阅读更多来源: ArXiv AI | 25-06-25
Baba is LLM: Reasoning in a Game with Dynamic Rules
Authors: Fien van Wetten, Aske Plaat, Max van Duijn |
阅读更多来源: ArXiv AI | 25-06-25
Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics
Authors: Ziqi Zhu, Tao Hu, Honglong Zhang, Dan Yang, HanGeng Chen, Mengran Zhang, Xilun Chen |
阅读更多来源: ArXiv AI | 25-06-25
FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring
Authors: Hyein Seo, Taewook Hwang, Yohan Lee, Sangkeun Jung |
阅读更多来源: ArXiv AI | 25-06-25
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Authors: Liang Zeng, Yongcong Li, Yuzhen Xiao, Changshi Li, Chris Yuhao Liu, Rui Yan, Tianwen Wei, Jujie He, Xuchen Song, Yang Liu, Yahui Zhou |
阅读更多来源: ArXiv AI | 25-06-25
Interpretable Hybrid Machine Learning Models Using FOLD-R++ and Answer Set Programming
Authors: Sanne Wielinga, Jesse Heyninck |
阅读更多来源: ArXiv AI | 25-06-25
NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons
Authors: Carlo Romeo, Andrew D. Bagdanov |
阅读更多来源: ArXiv AI | 25-06-25
KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models
Authors: Cheng Li, Jiexiong Liu, Yixuan Chen, Qihang Zhou, KunLun Meta |
阅读更多来源: ArXiv AI | 25-06-25
From memories to maps: Mechanisms of in context reinforcement learning in transformers
Authors: Ching Fang, Kanaka Rajan |
阅读更多来源: ArXiv AI | 25-06-25
LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis
Authors: Lei Kang, Xuanshuo Fu, Oriol Ramos Terrades, Javier Vazquez-Corral, Ernest Valveny, Dimosthenis Karatzas |
阅读更多来源: ArXiv AI | 25-06-25
Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning
Authors: Guo Li, Zixiang Xu, Wei Zhang, Yikuan Hu, Xinyu Yang, Nikolay Aristov, Mingjie Tang, Elenna R Dugundji |
阅读更多来源: ArXiv AI | 25-06-25
JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning
Authors: Ai Han, Junxing Hu, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zicheng Zhang |
阅读更多来源: ArXiv AI | 25-06-25
Gemini Robotics On-Device brings AI to local robotic devices (deepmind.google)
阅读更多来源: Hacker News | 25-06-25
Mapping LLMs over excel saved my passion for game dev (weblog.lol)
阅读更多来源: Hacker News | 25-06-25
Blackmail becomes go-to strategy for AI models facing shutdown in new Anthropic tests
阅读更多来源: The Decoder | 24-06-25
'Dragon prince' dinosaur discovery 'rewrites' T.rex family tree (bbc.com)
阅读更多来源: Hacker News | 24-06-25
From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases
Authors: Yao Zhang, Zaixi Shang, Silpan Patel, Mikel Zuniga |
阅读更多来源: ArXiv AI | 24-06-25
OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections
Authors: Manasa Bharadwaj, Nikhil Verma, Kevin Ferreira |
阅读更多来源: ArXiv AI | 24-06-25
Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation
Authors: Hao Guan, David Bates, Li Zhou |
阅读更多来源: ArXiv AI | 24-06-25
Resource Rational Contractualism Should Guide AI Alignment
Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel |
阅读更多来源: ArXiv AI | 24-06-25
Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges
Authors: Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Shuai Wang |
阅读更多来源: ArXiv AI | 24-06-25
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Authors: Bowen Wang |
阅读更多来源: ArXiv AI | 24-06-25
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Authors: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra |
阅读更多来源: ArXiv AI | 24-06-25
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu |
阅读更多来源: ArXiv AI | 24-06-25
Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms
Authors: Cheng Ji, Huaiying Luo |
阅读更多来源: ArXiv AI | 24-06-25
A Conceptual Framework for AI Capability Evaluations
Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, María Vanina Martinez, Gerardo Simari |
阅读更多来源: ArXiv AI | 24-06-25
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance
Authors: Yu Han, Aaron Ceross, Jeroen H.M. Bergmann |
阅读更多来源: ArXiv AI | 24-06-25
How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
Authors: Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao |
阅读更多来源: ArXiv AI | 24-06-25
A Large Language Model-based Multi-Agent Framework for Analog Circuits' Sizing Relationships Extraction
Authors: Chengjie Liu, Weiyu Chen, Huiyao Xu, Yuan Du, Jun Yang, Li Du |
阅读更多来源: ArXiv AI | 24-06-25
T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent
Authors: Hong Qing Yu |
阅读更多来源: ArXiv AI | 24-06-25
A Question Bank to Assess AI Inclusivity: Mapping out the Journey from Diversity Errors to Inclusion Excellence
Authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano |
阅读更多来源: ArXiv AI | 24-06-25
AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs
Authors: Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko |
阅读更多来源: ArXiv AI | 24-06-25
TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation
Authors: Kamil Szczepanik, Jarosław A. Chudziak |
阅读更多来源: ArXiv AI | 24-06-25
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktäschel, Jakob Foerster, Laura Ruis |
阅读更多来源: ArXiv AI | 24-06-25
Steering Conceptual Bias via Transformer Latent-Subspace Activation
Authors: Vansh Sharma, Venkat Raman |
阅读更多来源: ArXiv AI | 24-06-25
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang, Nan Wang, Han Xiao |
阅读更多来源: ArXiv AI | 24-06-25
Show HN: Pickaxe – A TypeScript library for building AI agents (github.com/hatchet-dev)
阅读更多来源: Hacker News | 24-06-25
Judge denies creating “mass surveillance program” harming all ChatGPT users (arstechnica.com)
阅读更多来源: Hacker News | 24-06-25
GitHub CEO: manual coding remains key despite AI boom (techinasia.com)
阅读更多来源: Hacker News | 24-06-25
Sakana AI's ALE AI agent cracks the top 21 among 1,000 code experts
阅读更多来源: The Decoder | 23-06-25
Apple executives have held internal discussions about potentially bidding for AI startup Perplexity
阅读更多来源: The Decoder | 23-06-25
Nano-Vllm: lightweight vLLM implementation built from scratch (github.com/geeeekexplorer)
阅读更多来源: Hacker News | 23-06-25
Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone
阅读更多来源: Hacker News | 23-06-25
Claude Code for VSCode (visualstudio.com)
阅读更多来源: Hacker News | 23-06-25
Facial Landmark Visualization and Emotion Recognition Through Neural Networks
Authors: Israel Juárez-Jiménez, Tiffany Guadalupe Martínez Paredes, Jesús García-Ramírez, Eric Ramos Aguilar |
阅读更多来源: ArXiv AI | 23-06-25
Towards AI Search Paradigm
Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin |
阅读更多来源: ArXiv AI | 23-06-25
Continual Learning with Columnar Spiking Neural Networks
Authors: Denis Larionov, Nikolay Bazenkov, Mikhail Kiselev |
阅读更多来源: ArXiv AI | 23-06-25
LLMs Struggle to Perform Counterfactual Reasoning with Parametric Knowledge
Authors: Khurram Yamin, Gaurav Ghosal, Bryan Wilder |
阅读更多来源: ArXiv AI | 23-06-25
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Authors: Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He |
阅读更多来源: ArXiv AI | 23-06-25
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
Authors: Matias Martinez, Xavier Franch |
阅读更多来源: ArXiv AI | 23-06-25
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar |
阅读更多来源: ArXiv AI | 23-06-25
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
Authors: Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton |
阅读更多来源: ArXiv AI | 23-06-25
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Authors: William Sharpless, Dylan Hirsch, Sander Tonkens, Nikhil Shinde, Sylvia Herbert |
阅读更多来源: ArXiv AI | 23-06-25
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
Authors: Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova |
阅读更多来源: ArXiv AI | 23-06-25
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior
Authors: Hao Li, Gengrui Zhang, Petter Holme, Shuyue Hu, Zhen Wang |
阅读更多来源: ArXiv AI | 23-06-25
Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
Authors: Mustafa Akben, Aaron Satko |
阅读更多来源: ArXiv AI | 23-06-25
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
Authors: Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo |
阅读更多来源: ArXiv AI | 23-06-25
The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making
Authors: Abinitha Gourabathina, Yuexing Hao, Walter Gerych, Marzyeh Ghassemi |
阅读更多来源: ArXiv AI | 23-06-25
LAION and Intel introduce tools that help AI gauge the intensity of 40 distinct emotions
阅读更多来源: The Decoder | 22-06-25
Phoenix.new – Remote AI Runtime for Phoenix (fly.io)
阅读更多来源: Hacker News | 22-06-25
Remote MCP Support in Claude Code (anthropic.com)
阅读更多来源: Hacker News | 22-06-25
Uncovering Intention through LLM-Driven Code Snippet Description Generation
Authors: Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto |
阅读更多来源: ArXiv AI | 22-06-25
RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation
Authors: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia |
阅读更多来源: ArXiv AI | 22-06-25
Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain: A CoT-Enhanced Prompt Engineering Approach
Authors: Wenqi Guan, Yang Fang |
阅读更多来源: ArXiv AI | 22-06-25
Over-squashing in Spatiotemporal Graph Neural Networks
Authors: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein |
阅读更多来源: ArXiv AI | 22-06-25
Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates
Authors: Danish Gufran, Sudeep Pasricha |
阅读更多来源: ArXiv AI | 22-06-25
The Compositional Architecture of Regret in Large Language Models
Authors: Xiangxiang Cui, Shu Yang, Tianjin Huang, Wanyu Lin, Lijie Hu, Di Wang |
阅读更多来源: ArXiv AI | 22-06-25
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Authors: Gabrel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong |
阅读更多来源: ArXiv AI | 22-06-25
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Authors: Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe |
阅读更多来源: ArXiv AI | 22-06-25
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu |
阅读更多来源: ArXiv AI | 22-06-25
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Authors: Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian |
阅读更多来源: ArXiv AI | 22-06-25
Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents
Authors: Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer |
阅读更多来源: ArXiv AI | 22-06-25
The AI Policy Module: Developing Computer Science Student Competency in AI Ethics and Policy
Authors: James Weichert, Daniel Dunlap, Mohammed Farghally, Hoda Eldardiry |
阅读更多来源: ArXiv AI | 22-06-25
The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games
Authors: Lyle Goodyear, Rachel Guo, Ramesh Johari |
阅读更多来源: ArXiv AI | 22-06-25
Meta CEO Mark Zuckerberg bets billions not to fall behind in the AI race
阅读更多来源: The Decoder | 22-06-25
Apple's "Illusion of Thinking" paper shows experts deeply divided on AI reasoning
阅读更多来源: The Decoder | 21-06-25
Agentic Misalignment: How LLMs could be insider threats (anthropic.com)
阅读更多来源: Hacker News | 21-06-25
Midjourney launches its first video model, letting users turn images into short animated clips
阅读更多来源: The Decoder | 21-06-25
Jürgen Schmidhuber: the Father of Generative AI Without Turing Award (jazzyear.com)
阅读更多来源: Hacker News | 21-06-25
I Built a Celebrity AI Image Generator (No Registration Needed) – Would Love Feedback (aicelebrity.design)
阅读更多来源: Hacker News | 21-06-25
OpenAI CEO Sam Altman says GPT-5 is "probably coming sometime this summer"
阅读更多来源: The Decoder | 20-06-25
Andrej Karpathy: Software in the era of AI [video] (youtube.com)
阅读更多来源: Hacker News | 20-06-25
Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com)
阅读更多来源: Hacker News | 20-06-25
Gemini 2.5 Flash-Lite is the fastest and most cost-effective model in Google's Gemini lineup
阅读更多来源: The Decoder | 20-06-25
Show HN: Claude Code Usage Monitor – real-time tracker to dodge usage cut-offs (github.com/maciek-roboblog)
阅读更多来源: Hacker News | 20-06-25
How OpenElections uses LLMs (thescoop.org)
阅读更多来源: Hacker News | 20-06-25
MiniMax-M1 comes close to Gemini 2.5 Pro efficiency when handling large context windows
阅读更多来源: The Decoder | 19-06-25
From LLM to AI Agent: What's the Real Journey Behind AI System Development? (codelink.io)
阅读更多来源: Hacker News | 19-06-25
Luxembourg partners with Mistral AI to bring artificial intelligence to government and defense
阅读更多来源: The Decoder | 19-06-25
OpenAI and Microsoft increasingly mistrust each other as tensions rise over contracts and profits
阅读更多来源: The Decoder | 19-06-25
Is there a half-life for the success rates of AI agents? (tobyord.com)
阅读更多来源: Hacker News | 19-06-25
Math genius Terence Tao says that AI still can't "smell" bad math
阅读更多来源: The Decoder | 18-06-25
OpenAI’s Defense Department deal targets healthcare, data analysis, and cyber defense
Source: The Decoder | 18-06-25
Time Series Forecasting with Graph Transformers (kumo.ai)
Source: Hacker News | 18-06-25
LLMs pose an interesting problem for DSL designers (kirancodes.me)
Source: Hacker News | 18-06-25
Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite (blog.google)
Source: Hacker News | 18-06-25
Building Effective AI Agents (anthropic.com)
Source: Hacker News | 18-06-25
I counted all of the yurts in Mongolia using machine learning (monroeclinton.com)
Source: Hacker News | 18-06-25
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Authors: Ring Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, Shaomian Zheng, Shuaicheng Li, Tongkai Yang, Wang Ren, Xiaodong Yan, Xiaopei Wan, Xiaoyun Feng, Xin Zhao, Xinxing Yang, Xinyu Kong, Xuemin Yang, Yang Li, Yingting Wu, Yongkang Liu, Zhankai Xu, Zhenduo Zhang, Zhenglei Zhou, Zhenyu Huang, Zhiqiang Zhang, Zihao Wang, Zujie Wen |
Source: ArXiv AI | 18-06-25
Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values
Authors: Nell Watson, Ahmed Amer, Evan Harris, Preeti Ravindra, Shujun Zhang |
Source: ArXiv AI | 18-06-25
The NordDRG AI Benchmark for Large Language Models
Authors: Tapio Pitkäranta |
Source: ArXiv AI | 18-06-25
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution
Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Jilles S. Dibangoye |
Source: ArXiv AI | 18-06-25
Causality in the human niche: lessons for machine learning
Authors: Richard D. Lange, Konrad P. Kording |
Source: ArXiv AI | 18-06-25
Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features
Authors: Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino |
Source: ArXiv AI | 18-06-25
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning
Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi |
Source: ArXiv AI | 18-06-25
Machine Mirages: Defining the Undefined
Authors: Hamidou Tembine |
Source: ArXiv AI | 18-06-25
ProfiLLM: An LLM-Based Framework for Implicit Profiling of Chatbot Users
Authors: Shahaf David, Yair Meidan, Ido Hersko, Daniel Varnovitzky, Dudu Mimran, Yuval Elovici, Asaf Shabtai |
Source: ArXiv AI | 18-06-25
Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places
Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Ilya Ilyankou, Junyuan Liu, Lu Yin, Weiming Huang, Natchapon Jongwiriyanurak |
Source: ArXiv AI | 18-06-25
Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary |
Source: ArXiv AI | 18-06-25
Lightweight Relevance Grader in RAG
Authors: Taehee Jeong |
Source: ArXiv AI | 18-06-25
From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
Authors: Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma |
Source: ArXiv AI | 18-06-25
Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?
Authors: Louis Vervoort, Vitaly Nikolaev |
Source: ArXiv AI | 18-06-25
Don't throw the baby out with the bathwater: How and why deep learning for ARC
Authors: Jack Cole, Mohamed Osman |
Source: ArXiv AI | 18-06-25
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang |
Source: ArXiv AI | 18-06-25
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
Authors: William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane |
Source: ArXiv AI | 18-06-25
AviationLLM: An LLM-based Knowledge System for Aviation Training
Authors: Jia'ang Wan, Feng Shen, Fujuan Li, Yanjin Sun, Yan Li, Shiwen Zhang |
Source: ArXiv AI | 18-06-25
ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems
Authors: Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li |
Source: ArXiv AI | 18-06-25
LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?
Authors: Muhammad Atta Ur Rahman, Melanie Schranz |
Source: ArXiv AI | 18-06-25
Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
Authors: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son |
Source: ArXiv AI | 18-06-25
Enhancing Symbolic Machine Learning by Subsymbolic Representations
Authors: Stephen Roth, Lennart Baur, Derian Boer, Stefan Kramer |
Source: ArXiv AI | 18-06-25
New study supports Apple's doubts about AI reasoning, but sees no dead end
Source: The Decoder | 18-06-25
Salesforce's CRM benchmark finds AI agents struggle in real-world business scenarios
Source: The Decoder | 17-06-25
New York may soon require AI giants to publish safety protocols before releasing LLMs
Source: The Decoder | 17-06-25
Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence
Authors: Zeki Doruk Erden, Boi Faltings |
Source: ArXiv AI | 17-06-25
Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
Authors: LeCheng Zhang, Yuanshi Wang, Haotian Shen, Xujie Wang |
Source: ArXiv AI | 17-06-25
Constitutive Components for Human-Like Autonomous Artificial Intelligence
Authors: Kazunori D Yamada |
Source: ArXiv AI | 17-06-25
Scaling Test-time Compute for LLM Agents
Authors: King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou |
Source: ArXiv AI | 17-06-25
Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning
Authors: Danny Hoang, David Gorsich, Matthew P. Castanier, Farhad Imani |
Source: ArXiv AI | 17-06-25
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Authors: Ethan M. Rudd, Christopher Andrews, Philip Tully |
Source: ArXiv AI | 17-06-25
Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs
Authors: Daniel Kilov, Caroline Hendy, Secil Yanik Guyot, Aaron J. Snoswell, Seth Lazar |
Source: ArXiv AI | 17-06-25
NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification
Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha |
Source: ArXiv AI | 17-06-25
Machine Learning as Iterated Belief Change a la Darwiche and Pearl
Authors: Theofanis Aravanis |
Source: ArXiv AI | 17-06-25
Probabilistic Modeling of Spiking Neural Networks with Contract-Based Verification
Authors: Zhen Yao, Elisabetta De Maria, Robert De Simone |
Source: ArXiv AI | 17-06-25
Towards Pervasive Distributed Agentic Generative AI -- A State of The Art
Authors: Gianni Molinari, Fabio Ciravegna |
Source: ArXiv AI | 17-06-25
Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks
Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang |
Source: ArXiv AI | 17-06-25
Vector Ontologies as an LLM world view extraction method
Authors: Kaspar Rothenfusser, Bekk Blando |
Source: ArXiv AI | 17-06-25
A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs
Authors: Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai |
Source: ArXiv AI | 17-06-25
Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
Authors: Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street |
Source: ArXiv AI | 17-06-25
Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses
Authors: Leonie V.D.E. Vogelsmeier, Eduardo Oliveira, Kamila Misiejuk, Sonsoles López-Pernas, Mohammed Saqr |
Source: ArXiv AI | 17-06-25
From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care
Authors: Daniel Anadria, Roel Dobbe, Anastasia Giachanou, Ruurd Kuiper, Richard Bartels, Íñigo Martínez de Rituerto de Troya, Carmen Zürcher, Daniel Oberski |
Source: ArXiv AI | 17-06-25
Generative AI coding tools and agents do not work for me (miguelgrinberg.com)
Source: Hacker News | 17-06-25
OpenAI wins $200M U.S. defense contract (cnbc.com)
Source: Hacker News | 17-06-25
Rednote releases its first open-source LLM with a Mixture-of-Experts architecture
Source: The Decoder | 17-06-25
Anthropic shares blueprint for Claude Research agent using multiple AI agents in parallel
Source: The Decoder | 17-06-25
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons (arxiv.org)
Source: Hacker News | 17-06-25
ZjsComponent: A Pragmatic Approach to Reusable UI Fragments for Web Development (arxiv.org)
Source: Hacker News | 17-06-25
Snorting the AGI with Claude Code (kadekillary.work)
Source: Hacker News | 17-06-25
OpenAI updates ChatGPT search with smarter answers and image search
Source: The Decoder | 16-06-25
Chemical knowledge and reasoning of large language models vs. chemist expertise (nature.com)
Source: Hacker News | 16-06-25
LLM Chat via SSH (github.com/ccbikai)
Source: Hacker News | 16-06-25
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Authors: Maximilian Kreutner, Marlene Lutz, Markus Strohmaier |
Source: ArXiv AI | 16-06-25
TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
Authors: Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan |
Source: ArXiv AI | 16-06-25
An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing
Authors: Haochen Sun, Yifan Liu, Ahmed Al-Tahmeesschi, Swarna Chetty, Syed Ali Raza Zaidi, Avishek Nag, Hamed Ahmadi |
Source: ArXiv AI | 16-06-25
How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?
Authors: Michela Lapenna, Caterina De Bacco |
Source: ArXiv AI | 16-06-25
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Authors: Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie |
Source: ArXiv AI | 16-06-25
Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference
Authors: M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo |
Source: ArXiv AI | 16-06-25
Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?
Authors: Noemi Dreksler, Lucius Caviola, David Chalmers, Carter Allen, Alex Rand, Joshua Lewis, Philip Waggoner, Kate Mays, Jeff Sebo |
Source: ArXiv AI | 16-06-25
Improving Large Language Model Safety with Contrastive Representation Learning
Authors: Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin |
Source: ArXiv AI | 16-06-25
code_transformed: The Influence of Large Language Models on Code
Authors: Yuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan, Xuanhua Shi, Dongping Chen |
Source: ArXiv AI | 16-06-25
Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?
Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang |
Source: ArXiv AI | 16-06-25
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Authors: Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang |
Source: ArXiv AI | 16-06-25
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables
Authors: Yitong Zhou, Mingyue Cheng, Qingyang Mao, Yucong Luo, Qi Liu, Yupeng Li, Xiaohan Zhang, Deguang Liu, Xin Li, Enhong Chen |
Source: ArXiv AI | 16-06-25
LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic
Authors: Weibing Zheng, Laurah Turner, Jess Kropczynski, Murat Ozer, Tri Nguyen, Shane Halse |
Source: ArXiv AI | 16-06-25
Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning
Authors: Liying Wang, Ph.D., Daffodil Carrington, M.S., Daniil Filienko, M.S., Caroline El Jazmi, M.S., Serena Jinchen Xie, M.S., Martine De Cock, Ph.D., Sarah Iribarren, Ph.D., Weichao Yuwen, Ph.D. |
Source: ArXiv AI | 16-06-25
RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
Authors: Yu Wang, Shiwan Zhao, Ming Fan, Zhihu Wang, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu |
Source: ArXiv AI | 16-06-25
Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding
Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui |
Source: ArXiv AI | 16-06-25
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
Authors: René Peinl, Vincent Tischler |
Source: ArXiv AI | 16-06-25
Collaborative LLM Inference via Planning for Efficient Reasoning
Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin |
Source: ArXiv AI | 16-06-25
On the Performance of LLMs for Real Estate Appraisal
Authors: Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt |
Source: ArXiv AI | 16-06-25
Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment
Authors: Alejandro Peña, Julian Fierrez, Aythami Morales, Gonzalo Mancera, Miguel Lopez, Ruben Tolosana |
Source: ArXiv AI | 16-06-25
Revealing Political Bias in LLMs through Structured Multi-Agent Debate
Authors: Aishwarya Bandaru, Fabian Bindley, Trevor Bluth, Nandini Chavda, Baixu Chen, Ethan Law |
Source: ArXiv AI | 16-06-25
Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
Authors: Claudio Fanconi, Mihaela van der Schaar |
Source: ArXiv AI | 16-06-25
Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making
Authors: Xiaopeng Yuan, Xingjian Zhang, Ke Xu, Yifan Xu, Lijun Yu, Jindong Wang, Yushun Dong, Haohan Wang |
Source: ArXiv AI | 16-06-25
The z80 technique reveals the source code for Atlassian's 'rovo' AI assistant (ghuntley.com)
Source: Hacker News | 16-06-25
Let's Talk About ChatGPT-Induced Spiritual Psychosis (default.blog)
Source: Hacker News | 16-06-25
Rabbit launches "intern," a software AI agent designed to handle team-level projects
Source: The Decoder | 15-06-25
Apple's new AI benchmarks show its models still lag behind leaders like OpenAI and Google
Source: The Decoder | 15-06-25
Slimming Down LLMs Without Losing Their Minds
Authors: Qingda (Michael) Mai |
Source: ArXiv AI | 15-06-25
BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP
Authors: Thomas Sounack, Joshua Davis, Brigitte Durieux, Antoine Chaffin, Tom J. Pollard, Eric Lehman, Alistair E. W. Johnson, Matthew McDermott, Tristan Naumann, Charlotta Lindvall |
Source: ArXiv AI | 15-06-25
The Role of Generative AI in Facilitating Social Interactions: A Scoping Review
Authors: T. T. J. E. Arets, G. Perugia, M. Houben, W.A. IJsselsteijn |
Source: ArXiv AI | 15-06-25
Robustly Improving LLM Fairness in Realistic Settings via Interpretability
Authors: Adam Karvonen, Samuel Marks |
Source: ArXiv AI | 15-06-25
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
Authors: Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He |
Source: ArXiv AI | 15-06-25
GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
Authors: Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami |
Source: ArXiv AI | 15-06-25
Farseer: A Refined Scaling Law in Large Language Models
Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang |
Source: ArXiv AI | 15-06-25
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Authors: Qizhe Zhang, Mengzhen Liu, Lichen Li, Ming Lu, Yuan Zhang, Junwen Pan, Qi She, Shanghang Zhang |
Source: ArXiv AI | 15-06-25
One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence
Authors: Michelle M. Li, Ben Y. Reis, Adam Rodman, Tianxi Cai, Noa Dagan, Ran D. Balicer, Joseph Loscalzo, Isaac S. Kohane, Marinka Zitnik |
Source: ArXiv AI | 15-06-25
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models
Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang |
Source: ArXiv AI | 15-06-25
Closer to Language than Steam: AI as the Cognitive Engine of a New Productivity Revolution
Authors: Xinmin Fang, Lingfeng Tao, Zhengxiong Li |
Source: ArXiv AI | 15-06-25
Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills
Authors: Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, Liqiang Nie |
Source: ArXiv AI | 15-06-25
Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
Authors: Jintao Liang, Gang Su, Huifeng Lin, You Wu, Rui Zhao, Ziyue Li |
Source: ArXiv AI | 15-06-25
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning
Authors: Mohd Anwar Jamal Faiz |
Source: ArXiv AI | 15-06-25
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
Authors: Yanan Cai, Ahmed Salem, Besmira Nushi, Mark Russinovich |
Source: ArXiv AI | 15-06-25
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, Wenlong Zhang, Lei Bai |
Source: ArXiv AI | 15-06-25
Automated Validation of Textual Constraints Against AutomationML via LLMs and SHACL
Authors: Tom Westermann, Aljosha Köcher, Felix Gehlhoff |
Source: ArXiv AI | 15-06-25
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
Authors: Vincenzo Colle, Mohamed Sana, Nicola Piovesan, Antonio De Domenico, Fadhel Ayed, Merouane Debbah |
Source: ArXiv AI | 15-06-25
A Study on Individual Spatiotemporal Activity Generation Method Using MCP-Enhanced Chain-of-Thought Large Language Models
Authors: Yu Zhang, Yang Hu, De Wang |
Source: ArXiv AI | 15-06-25
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Authors: Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen |
Source: ArXiv AI | 15-06-25
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?
Authors: Fei Lin, Ziyang Gong, Cong Wang, Yonglin Tian, Tengchao Zhang, Xue Yang, Gen Luo, Fei-Yue Wang |
Source: ArXiv AI | 15-06-25
AMD's AI Future Is Rack Scale 'Helios' (morethanmoore.substack.com)
Source: Hacker News | 15-06-25
I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (github.com/yousef-rafat)
Source: Hacker News | 15-06-25
Text-to-LoRA: Hypernetwork that generates task-specific LLM adapters (LoRAs) (github.com/sakanaai)
Source: Hacker News | 15-06-25
RAG Is a Fancy, Lying Search Engine (stardog.ai)
Source: Hacker News | 15-06-25
Clinical knowledge in LLMs does not translate to human interactions (arxiv.org)
Source: Hacker News | 15-06-25
I used ChatGPT to learn programming from zero and built a video generation SaaS (vidmakerpro.com)
Source: Hacker News | 15-06-25
Mechanize is building digital offices to train AI agents to fully automate computer work
Source: The Decoder | 15-06-25
The Army’s Newest Recruits: Tech Execs From Meta, OpenAI and More (wsj.com)
Source: Hacker News | 14-06-25
Student discovers fungus predicted by Albert Hofmann (wvu.edu)
Source: Hacker News | 14-06-25
Saab achieves AI milestone with Gripen E (saab.com)
Source: Hacker News | 14-06-25
Meta launches AI video editing but holds back on full features for now
Source: The Decoder | 14-06-25
Mattel partners with OpenAI to develop AI-powered toys and experiences
Source: The Decoder | 14-06-25
Meta's latest model highlights the challenge AI faces in long-term planning and causal reasoning
Source: The Decoder | 14-06-25
RISC-V in AI and HPC Part 1: Per Aspera Ad Astra? (eetimes.com)
Source: Hacker News | 14-06-25
Meta invests $14.3B in Scale AI to kick-start superintelligence lab (nytimes.com)
Source: Hacker News | 14-06-25
Students fear AI could cause "brain rot" by making it too easy to skip crucial learning steps
Source: The Decoder | 13-06-25
Maximizing Battery Storage Profits via High-Frequency Intraday Trading (arxiv.org)
Source: Hacker News | 13-06-25
Researchers confirm two journalists were hacked with Paragon spyware (techcrunch.com)
Source: Hacker News | 13-06-25
OpenAI's o3-pro may be too smart for small talk
Source: The Decoder | 12-06-25
OpenAI o3-pro (help.openai.com)
Source: Hacker News | 12-06-25
GauntletAI (YC S17): All expenses paid AI training and guaranteed $200k+ job (gauntletai.com)
Source: Hacker News | 12-06-25
Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era
Authors: Shuo Jiang, Min Xie, Frank Youhua Chen, Jian Ma, Jianxi Luo |
Source: ArXiv AI | 12-06-25
Large Language Models for Design Structure Matrix Optimization
Authors: Shuo Jiang, Min Xie, Jianxi Luo |
Source: ArXiv AI | 12-06-25
Guided Graph Compression for Quantum Graph Neural Networks
Authors: Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi |
Source: ArXiv AI | 12-06-25
Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs
Authors: Rodion Oblovatny, Alexandra Bazarova, Alexey Zaytsev |
Source: ArXiv AI | 12-06-25
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
Authors: Seonho Lee, Jiho Choi, Inha Kang, Jiwook Kim, Junsung Park, Hyunjung Shim |
Source: ArXiv AI | 12-06-25
Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice
Authors: Emma Kallina, Thomas Bohné, Jat Singh |
Source: ArXiv AI | 12-06-25
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel |
Source: ArXiv AI | 12-06-25
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
Authors: Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz |
Source: ArXiv AI | 12-06-25
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Authors: Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang |
Source: ArXiv AI | 12-06-25
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Authors: Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin |
Source: ArXiv AI | 12-06-25
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu |
Source: ArXiv AI | 12-06-25
Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making
Authors: Kehan Zheng, Jinfeng Zhou, Hongning Wang |
Source: ArXiv AI | 12-06-25
DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Authors: Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao |
Source: ArXiv AI | 12-06-25
Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives
Authors: Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong |
Source: ArXiv AI | 12-06-25
Fine-tuning LLMs is a waste of time (codinginterviewsmadesimple.substack.com)
Source: Hacker News | 12-06-25
EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot (aim.security)
Source: Hacker News | 12-06-25
OpenAI co-founder Ilya Sutskever believes AI will shape everyone's life "whether you like it or not"
Source: The Decoder | 11-06-25
Meta AI chief scientist LeCun's latest comment reveals deep industry split over the future of AI
Source: The Decoder | 11-06-25
Scientists discover that feeding AI models 10% 4chan trash actually makes them better behaved
Source: The Decoder | 11-06-25
Zuckerberg forms elite AI team to catch up with competitors
Source: The Decoder | 11-06-25
Apple's new Foundation Models framework adds on-device AI to apps with three lines of Swift code
Source: The Decoder | 11-06-25
OpenAI dropped the price of o3 by 80% (twitter.com/sama)
Source: Hacker News | 11-06-25
Low-background Steel: content without AI contamination (jgc.org)
Source: Hacker News | 11-06-25
Launch HN: BitBoard (YC X25) – AI agents for healthcare back-offices
Source: Hacker News | 11-06-25
AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)
Source: Hacker News | 11-06-25
WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis
Authors: Liangliang Chen, Huiru Xie, Jacqueline Rohde, Ying Zhang |
Source: ArXiv AI | 11-06-25
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
Authors: Clara Lachenmaier, Judith Sieker, Sina Zarrieß |
Source: ArXiv AI | 11-06-25
Propositional Logic for Probing Generalization in Neural Networks
Authors: Anna Langedijk, Jaap Jumelet, Willem Zuidema |
Source: ArXiv AI | 11-06-25
Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data
Authors: Victoria Hankemeier, Malte Schilling |
Source: ArXiv AI | 11-06-25
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma |
Source: ArXiv AI | 11-06-25
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
Authors: Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang |
Source: ArXiv AI | 11-06-25
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Authors: Haozhen Zhang, Tao Feng, Jiaxuan You |
Source: ArXiv AI | 11-06-25
LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs
Authors: Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade |
Source: ArXiv AI | 11-06-25
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning
Authors: Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, Mihaela van der Schaar |
Source: ArXiv AI | 11-06-25
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents
Authors: Subhrangshu Nandi, Arghya Datta, Nikhil Vichare, Indranil Bhattacharya, Huzefa Raja, Jing Xu, Shayan Ray, Giuseppe Carenini, Abhi Srivastava, Aaron Chan, Man Ho Woo, Amar Kandola, Brandon Theresa, Francesco Carbone |
Source: ArXiv AI | 11-06-25
Transforming Expert Knowledge into Scalable Ontology via Large Language Models
Authors: Ikkei Itoku, David Theil, Evelyn Eichelsdoerfer Uehara, Sreyoshi Bhaduri, Junnosuke Kuroda, Toshi Yumoto, Alex Gil, Natalie Perez, Rajesh Cherukuri, Naumaan Nayyar |
Source: ArXiv AI | 11-06-25
A Survey on Large Language Models for Mathematical Reasoning
Authors: Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu |
Source: ArXiv AI | 11-06-25
HGFormer: A Hierarchical Graph Transformer Framework for Two-Stage Colonel Blotto Games via Reinforcement Learning
Authors: Yang Lv, Jinlong Lei, Peng Yi |
Source: ArXiv AI | 11-06-25
Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness
Authors: Yanwei Gong, Xiaolin Chang |
Source: ArXiv AI | 11-06-25
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Authors: Kongcheng Zhang, Qi Yao, Shunyu Liu, Yingjie Wang, Baisheng Lai, Jieping Ye, Mingli Song, Dacheng Tao |
Source: ArXiv AI | 11-06-25
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
Authors: Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, Pattie Maes |
Source: ArXiv AI | 11-06-25
Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents
Authors: Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi |
Source: ArXiv AI | 11-06-25
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Authors: Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri, Samuel J. Bell |
Source: ArXiv AI | 11-06-25
ChatGPT's voice is now more natural and can consistently translate conversations in real time
Source: The Decoder | 10-06-25
Google's Gemini 2.5 Pro beats OpenAI's o3 model in processing complex, lengthy texts
Source: The Decoder | 10-06-25
ChatGPT scams range from silly money-making ploys to calculated political meddling
Source: The Decoder | 10-06-25
Boosting LLM Reasoning via Spontaneous Self-Correction
Authors: Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, Chen Zhu |
Source: ArXiv AI | 10-06-25
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth
Authors: Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu |
Source: ArXiv AI | 10-06-25
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang |
Source: ArXiv AI | 10-06-25
Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT
Authors: Miroslav Popovic, Marko Popovic, Miodrag Djukic, Ilija Basicevic |
Source: ArXiv AI | 10-06-25
BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite
Authors: Liyang Chen, Yujun Cai, Jieqiong Dong, Yiwei Wang |
Source: ArXiv AI | 10-06-25
Reasoning Multimodal Large Language Model: Data Contamination and Dynamic Evaluation
Authors: Ming Liu, Wensheng Zhang |
Source: ArXiv AI | 10-06-25
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data
Authors: Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye |
Source: ArXiv AI | 10-06-25
LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments
Authors: Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai |
阅读更多来源: ArXiv AI | 10-06-25
Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
Authors: Arnau Igualde Sáez, Lamyae Rhomrasi, Yusef Ahsini, Ricardo Vinuesa, Sergio Hoyas, Jose P. García Sabater, Marius J. Fullana i Alfonso, J. Alberto Conejero |
阅读更多来源: ArXiv AI | 10-06-25
An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning
Authors: Ze Yang, Yihong Jin, Juntian Liu, Xinhe Xu |
阅读更多来源: ArXiv AI | 10-06-25
Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification
Authors: Liwen Zheng, Chaozhuo Li, Zheng Liu, Feiran Huang, Haoran Jia, Zaisheng Ye, Xi Zhang |
阅读更多来源: ArXiv AI | 10-06-25
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Authors: Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang |
阅读更多来源: ArXiv AI | 10-06-25
REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models
Authors: Diego Forniés-Tabuenca, Alejandro Uribe, Urtzi Otamendi, Arkaitz Artetxe, Juan Carlos Rivera, Oier Lopez de Lacalle |
阅读更多来源: ArXiv AI | 10-06-25
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Authors: Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua |
阅读更多来源: ArXiv AI | 10-06-25
Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs
Authors: Yao Yan |
阅读更多来源: ArXiv AI | 10-06-25
Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark
Authors: Shoko Oka |
阅读更多来源: ArXiv AI | 10-06-25
Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation
Authors: Christopher Subia-Waud (Rayonlabs Team) |
阅读更多来源: ArXiv AI | 10-06-25
Solving Inequality Proofs with Large Language Models
Authors: Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu |
阅读更多来源: ArXiv AI | 10-06-25
Hey, That's My Data! Label-Only Dataset Inference in Large Language Models
Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado |
阅读更多来源: ArXiv AI | 10-06-25
End-to-End Framework for Robot Lawnmower Coverage Path Planning using Cellular Decomposition
Authors: Nikunj Shah, Utsav Dey, Kenji Nishimiya |
阅读更多来源: ArXiv AI | 10-06-25
Text-to-LoRA: Instant Transformer Adaption
Authors: Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange |
阅读更多来源: ArXiv AI | 10-06-25
Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models
Authors: Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu |
阅读更多来源: ArXiv AI | 10-06-25
semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces
Authors: Jwalanthi Ranganathan, Rohan Jha, Kanishka Misra, Kyle Mahowald |
阅读更多来源: ArXiv AI | 10-06-25
(AI peers) are people learning from the same standpoint: Perception of AI characters in a Collaborative Science Investigation
Authors: Eunhye Grace Ko, Soo Hyoung Joo |
阅读更多来源: ArXiv AI | 10-06-25
DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation
Authors: Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu |
阅读更多来源: ArXiv AI | 10-06-25
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Authors: Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu |
阅读更多来源: ArXiv AI | 10-06-25
Towards an Explainable Comparison and Alignment of Feature Embeddings
Authors: Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia |
阅读更多来源: ArXiv AI | 10-06-25
Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted
Authors: Cecil Pang |
阅读更多来源: ArXiv AI | 10-06-25
Contextual Memory Intelligence -- A Foundational Paradigm for Human-AI Collaboration and Reflective Generative AI Systems
Authors: Kristy Wedel |
阅读更多来源: ArXiv AI | 10-06-25
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Authors: Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang |
阅读更多来源: ArXiv AI | 10-06-25
Explainability in Context: A Multilevel Framework Aligning AI Explanations with Stakeholder with LLMs
Authors: Marilyn Bello, Rafael Bello, Maria-Matilde García, Ann Nowé, Iván Sevillano-García, Francisco Herrera |
阅读更多来源: ArXiv AI | 10-06-25
CrimeMind: Simulating Urban Crime with Multi-Modal LLM Agents
Authors: Qingbin Zeng, Ruotong Zhao, Jinzhu Mao, Haoyang Li, Fengli Xu, Yong Li |
阅读更多来源: ArXiv AI | 10-06-25
Preference Learning for AI Alignment: a Causal Perspective
Authors: Katarzyna Kobalczyk, Mihaela van der Schaar |
阅读更多来源: ArXiv AI | 10-06-25
CP-Bench: Evaluating Large Language Models for Constraint Modelling
Authors: Kostis Michailidis, Dimos Tsouros, Tias Guns |
阅读更多来源: ArXiv AI | 10-06-25
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
Authors: Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li |
阅读更多来源: ArXiv AI | 10-06-25
The last six months in LLMs, illustrated by pelicans on bicycles (simonwillison.net)
阅读更多来源: Hacker News | 09-06-25
What happens when people don't understand how AI works (theatlantic.com)
阅读更多来源: Hacker News | 09-06-25
LLMs are cheap (snellman.net)
阅读更多来源: Hacker News | 09-06-25
OpenAI leaves the question of AI consciousness consciously unanswered
阅读更多来源: The Decoder | 09-06-25
Anthropic cuts Claude access for Windsurf after OpenAI's $3B takeover news
阅读更多来源: The Decoder | 09-06-25
Building an AI server on a budget (informationga.in)
阅读更多来源: Hacker News | 09-06-25
Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams
Authors: Mohammed Almutairi |
阅读更多来源: ArXiv AI | 08-06-25
Exploring Diffusion Transformer Designs via Grafting
Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei |
阅读更多来源: ArXiv AI | 08-06-25
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Authors: Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang |
阅读更多来源: ArXiv AI | 08-06-25
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Authors: Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab |
阅读更多来源: ArXiv AI | 08-06-25
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games
Authors: Niv Eckhaus, Uri Berger, Gabriel Stanovsky |
阅读更多来源: ArXiv AI | 08-06-25
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim |
阅读更多来源: ArXiv AI | 08-06-25
Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models
Authors: Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli |
阅读更多来源: ArXiv AI | 08-06-25
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
Authors: Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira |
阅读更多来源: ArXiv AI | 08-06-25
CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective
Authors: Jiayu Liu, Zhenya Huang, Wei Dai, Cheng Cheng, Jinze Wu, Jing Sha, Song Li, Qi Liu, Shijin Wang, Enhong Chen |
阅读更多来源: ArXiv AI | 08-06-25
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Authors: Hadi Hosseini, Samarth Khanna, Ronak Singh |
阅读更多来源: ArXiv AI | 08-06-25
Schema Generation for Large Knowledge Graphs Using Large Language Models
Authors: Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño Peñuela, Elena Simperl |
阅读更多来源: ArXiv AI | 08-06-25
"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
Authors: Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche |
阅读更多来源: ArXiv AI | 08-06-25
DeePoly: A High-Order Accuracy and Efficiency Deep-Polynomial Framework for Scientific Machine Learning
Authors: Li Liu, Heng Yong |
阅读更多来源: ArXiv AI | 08-06-25
E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction
Authors: Zhichao Yang, Jiashu He, Mohammad B. Al-Khasawneh, Darshan Pandit, Cirillo Cinzia |
阅读更多来源: ArXiv AI | 08-06-25
Agents of Change: Self-Evolving LLM Agents for Strategic Planning
Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang |
阅读更多来源: ArXiv AI | 08-06-25
Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems
Authors: Loan Dao, Ngoc Quoc Ly |
阅读更多来源: ArXiv AI | 08-06-25
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Authors: Lin Sun, Weihong Lin, Jinzhu Wu, Yongfu Zhu, Xiaoqi Jian, Guangxiang Zhao, Change Jia, Linglin Zhang, Sai-er Hu, Yuhan Wu, Xiangzheng Zhang |
阅读更多来源: ArXiv AI | 08-06-25
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala |
阅读更多来源: ArXiv AI | 08-06-25
LLMs for sensory-motor control: Combining in-context and iterative learning
Authors: Jônata Tyska Carvalho, Stefano Nolfi |
阅读更多来源: ArXiv AI | 08-06-25
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
Authors: Kai Wang, Yihao Zhang, Meng Sun |
阅读更多来源: ArXiv AI | 08-06-25
LLM-First Search: Self-Guided Exploration of the Solution Space
Authors: Nathan Herr, Tim Rocktäschel, Roberta Raileanu |
阅读更多来源: ArXiv AI | 08-06-25
Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning
Authors: Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg |
阅读更多来源: ArXiv AI | 08-06-25
Control Tax: The Price of Keeping AI in Check
Authors: Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie |
阅读更多来源: ArXiv AI | 08-06-25
Focus and Context and LLMs (glek.net)
阅读更多来源: Hacker News | 08-06-25
Field Notes from Shipping Real Code with Claude (diwank.space)
阅读更多来源: Hacker News | 08-06-25
Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally
阅读更多来源: The Decoder | 08-06-25
OpenAI starts retaining all ChatGPT user data, including deleted chats and API data
阅读更多来源: The Decoder | 08-06-25
I read all of Cloudflare's Claude-generated commits (maxemitchell.com)
阅读更多来源: Hacker News | 08-06-25
Updates to Advanced Voice Mode for paid users (help.openai.com)
阅读更多来源: Hacker News | 08-06-25
Reddit sues Anthropic for scraping site content to train Claude
阅读更多来源: The Decoder | 07-06-25
Meta's new high-tech Aria Gen 2 glasses are the ultimate AI training data collector
阅读更多来源: The Decoder | 07-06-25
Sandia turns on brain-like storage-free supercomputer (blocksandfiles.com)
阅读更多来源: Hacker News | 07-06-25
Show HN: AI game animation sprite generator (godmodeai.cloud)
阅读更多来源: Hacker News | 07-06-25
Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks (sutro.sh)
阅读更多来源: Hacker News | 07-06-25
The Illusion of Thinking: Understanding the Limitations of Reasoning LLMs [pdf] (cdn-apple.com)
阅读更多来源: Hacker News | 07-06-25
NASA delays next flight of Boeing's alternative to SpaceX Dragon (theedgemalaysia.com)
阅读更多来源: Hacker News | 07-06-25
Reverse Engineering Cursor's LLM Client (tensorzero.com)
阅读更多来源: Hacker News | 07-06-25
Onyx (YC W24) – AI Assistants for Work Hiring Founding AE (ycombinator.com)
阅读更多来源: Hacker News | 07-06-25
Meta: Shut down your invasive AI Discover feed (mozillafoundation.org)
阅读更多来源: Hacker News | 07-06-25
What "Working" Means in the Era of AI Apps (a16z.com)
阅读更多来源: Hacker News | 07-06-25
OpenAI reaches three million enterprise users, adds new ChatGPT business features
阅读更多来源: The Decoder | 06-06-25
Tokasaurus: An LLM inference engine for high-throughput workloads (stanford.edu)
阅读更多来源: Hacker News | 06-06-25
How we’re responding to The NYT’s data demands in order to protect user privacy (openai.com)
阅读更多来源: Hacker News | 06-06-25
Show HN: Claude Composer (github.com/possibilities)
阅读更多来源: Hacker News | 06-06-25
Anthropic slashes Claude 3.x access on Windsurf following OpenAI's reported $3 billion takeover
阅读更多来源: The Decoder | 06-06-25
Anthropic co-founder on cutting access to Windsurf (techcrunch.com)
阅读更多来源: Hacker News | 06-06-25
Machine Learning: The Native Language of Biology (decodingbiology.substack.com)
阅读更多来源: Hacker News | 06-06-25
OpenAI brings longer-term memory feature to free ChatGPT users
阅读更多来源: The Decoder | 05-06-25
OpenAI adds new features and improvements to its agent development tools and language model
阅读更多来源: The Decoder | 05-06-25
Yoshua Bengio launches LawZero to develop safe AI systems free from commercial influence
阅读更多来源: The Decoder | 05-06-25
A practical guide to building agents [pdf] (cdn.openai.com)
阅读更多来源: Hacker News | 05-06-25
Differences in link hallucination and source comprehension across different LLM (mikecaulfield.substack.com)
阅读更多来源: Hacker News | 05-06-25
Comparing Claude System Prompts Reveal Anthropic's Priorities (dbreunig.com)
阅读更多来源: Hacker News | 05-06-25
LLMs and Elixir: Windfall or Deathblow? (zachdaniel.dev)
阅读更多来源: Hacker News | 05-06-25
Prompt engineering playbook for programmers (addyo.substack.com)
阅读更多来源: Hacker News | 05-06-25
OpenAI slams court order to save all ChatGPT logs, including deleted chats (arstechnica.com)
阅读更多来源: Hacker News | 05-06-25
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (arxiv.org)
阅读更多来源: Hacker News | 05-06-25
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems
Authors: Sven Kirchner, Alois C. Knoll |
阅读更多来源: ArXiv AI | 05-06-25
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa |
阅读更多来源: ArXiv AI | 05-06-25
Explainability-Based Token Replacement on LLM-Generated Text
Authors: Hadi Mohammadi, Anastasia Giachanou, Daniel L. Oberski, Ayoub Bagheri |
阅读更多来源: ArXiv AI | 05-06-25
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
Authors: Aleksey Kudelya, Alexander Shirnin |
阅读更多来源: ArXiv AI | 05-06-25
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
Authors: Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry |
阅读更多来源: ArXiv AI | 05-06-25
EuroLLM-9B: Technical Report
Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins |
阅读更多来源: ArXiv AI | 05-06-25
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Authors: Ming Zhang, Yujiong Shen, Zelin Li, Huayu Sha, Binze Hu, Yuhui Wang, Chenhao Huang, Shichun Liu, Jingqi Tong, Changhao Jiang, Mingxu Chai, Zhiheng Xi, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang |
阅读更多来源: ArXiv AI | 05-06-25
A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks
Authors: Loan Dao, Ngoc Quoc Ly |
阅读更多来源: ArXiv AI | 05-06-25
TracLLM: A Generic Framework for Attributing Long Context LLMs
Authors: Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia |
阅读更多来源: ArXiv AI | 05-06-25
A Trustworthiness-based Metaphysics of Artificial Intelligence Systems
Authors: Andrea Ferrario |
阅读更多来源: ArXiv AI | 05-06-25
Computational Architects of Society: Quantum Machine Learning for Social Rule Genesis
Authors: Shan Shan |
阅读更多来源: ArXiv AI | 05-06-25
SUMO-MCP: Leveraging the Model Context Protocol for Autonomous Traffic Simulation and Optimization
Authors: Chenglong Ye, Gang Xiong, Junyou Shang, Xingyuan Dai, Xiaoyan Gong, Yisheng Lv |
阅读更多来源: ArXiv AI | 05-06-25
CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications
Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li |
阅读更多来源: ArXiv AI | 05-06-25
Reason from Future: Reverse Thought Chain Enhances LLM Reasoning
Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu |
阅读更多来源: ArXiv AI | 05-06-25
Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations
Authors: Shaoshan Liu, Fan Wang, Hongjun Zhou, Yuanfeng Wang |
阅读更多来源: ArXiv AI | 05-06-25
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho |
阅读更多来源: ArXiv AI | 05-06-25
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
Authors: Dhaval Patel, Shuxin Lin, James Rayfield, Nianjun Zhou, Roman Vaculin, Natalia Martinez, Fearghal O'Donncha, Jayant Kalagnanam |
阅读更多来源: ArXiv AI | 05-06-25
Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning
Authors: Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu |
阅读更多来源: ArXiv AI | 05-06-25
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents
Authors: Akshat Naik, Patrick Quinn, Guillermo Bosch, Emma Gouné, Francisco Javier Campos Zabala, Jason Ross Brown, Edward James Young |
阅读更多来源: ArXiv AI | 05-06-25
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis |
阅读更多来源: ArXiv AI | 05-06-25
Character.AI moves toward social networking with animated AI avatars
阅读更多来源: The Decoder | 05-06-25
Show HN: App.build, an open-source AI agent that builds full-stack apps (app.build)
阅读更多来源: Hacker News | 05-06-25
VectorSmuggle: Covertly Exfiltrate Data in Embeddings (github.com/jaschadub)
阅读更多来源: Hacker News | 05-06-25
After court order, OpenAI is now preserving all ChatGPT user logs (laurenweinstein.org)
阅读更多来源: Hacker News | 05-06-25
Deepmind's "force prompting" lets AI create realistic video motion without physics engines
阅读更多来源: The Decoder | 04-06-25
AI-generated CUDA kernels outperform PyTorch in several GPU-heavy machine learning benchmarks
阅读更多来源: The Decoder | 04-06-25
Apple reportedly tests AI models that match ChatGPT's capabilities in internal benchmarks
阅读更多来源: The Decoder | 04-06-25
Show HN: Tiptap AI Agent – Add AI workflows to your text editor in minutes
阅读更多来源: Hacker News | 04-06-25
The Sky's the limit: AI automation on Mac (taoofmac.com)
阅读更多来源: Hacker News | 04-06-25
Claude Code is now available to Pro plans (anthropic.com)
阅读更多来源: Hacker News | 04-06-25
Deep learning gets the glory, deep fact checking gets ignored (fast.ai)
阅读更多来源: Hacker News | 04-06-25
A deep dive into self-improving AI and the Darwin-Gödel Machine (richardcsuwandi.github.io)
阅读更多来源: Hacker News | 04-06-25
Cloud Run GPUs, now GA, makes running AI workloads easier for everyone (cloud.google.com)
阅读更多来源: Hacker News | 04-06-25
Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM
Authors: Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam |
阅读更多来源: ArXiv AI | 04-06-25
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
Authors: Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang |
阅读更多来源: ArXiv AI | 04-06-25
Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation
Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda |
阅读更多来源: ArXiv AI | 04-06-25
The State of Large Language Models for African Languages: Progress and Challenges
Authors: Kedir Yassin Hussen, Walelign Tewabe Sewunetie, Abinew Ali Ayele, Sukairaj Hafiz Imam, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam |
阅读更多来源: ArXiv AI | 04-06-25
Improving LLM-Generated Code Quality with GRPO
Authors: Maxime Robeyns, Laurence Aitchison |
阅读更多来源: ArXiv AI | 04-06-25
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
Authors: Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen |
阅读更多来源: ArXiv AI | 04-06-25
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code
Authors: Tianyu Hua, Harper Hua, Violet Xiang, Benjamin Klieger, Sang T. Truong, Weixin Liang, Fan-Yun Sun, Nick Haber |
阅读更多来源: ArXiv AI | 04-06-25
Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning
Authors: Haowen Xu, Sisi Zlatanova, Ruiyu Liang, Ismet Canbulat |
阅读更多来源: ArXiv AI | 04-06-25
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning
Authors: Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao |
阅读更多来源: ArXiv AI | 04-06-25
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine
Authors: Zhuoxuan Jiang, Tianyang Zhang, Peiyan Peng, Jing Chen, Yinong Xun, Haotian Zhang, Lichi Li, Yong Li, Shaohua Zhang |
阅读更多来源: ArXiv AI | 04-06-25
Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun |
阅读更多来源: ArXiv AI | 04-06-25
ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting
Authors: Haichen Wang, Liu Yang, Xinyuan Zhang, Haomin Yu, Ming Li, Jilin Hu |
阅读更多来源: ArXiv AI | 04-06-25
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation
Authors: Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo |
阅读更多来源: ArXiv AI | 04-06-25
From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han |
阅读更多来源: ArXiv AI | 04-06-25
Open-Set Living Need Prediction with Large Language Models
Authors: Xiaochong Lan, Jie Feng, Yizhou Sun, Chen Gao, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Yong Li |
阅读更多来源: ArXiv AI | 04-06-25
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Authors: Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen |
阅读更多来源: ArXiv AI | 04-06-25
Why do AI agents communicate in human language?
Authors: Pengcheng Zhou, Yinglun Feng, Halimulati Julaiti, Zhongliang Yang |
阅读更多来源: ArXiv AI | 04-06-25
Benchmarking and Advancing Large Language Models for Local Life Services
Authors: Xiaochong Lan, Jie Feng, Jiahuan Lei, Xinlei Shi, Yong Li |
阅读更多来源: ArXiv AI | 04-06-25
TaxAgent: How Large Language Model Designs Fiscal Policy
Authors: Jizhou Wang, Xiaodan Fang, Lei Huang, Yongfeng Huang |
阅读更多来源: ArXiv AI | 04-06-25
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Authors: Chen Qian, Dongrui Liu, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao |
阅读更多来源: ArXiv AI | 04-06-25
Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs
Authors: Shangmin Guo, Omar Darwiche Domingues, Raphaël Avalos, Aaron Courville, Florian Strub |
阅读更多来源: ArXiv AI | 04-06-25
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
Authors: Matthew Kowal, Jasper Timm, Jean-Francois Godbout, Thomas Costello, Antonio A. Arechar, Gordon Pennycook, David Rand, Adam Gleave, Kellin Pelrine |
阅读更多来源: ArXiv AI | 04-06-25
Linear Spatial World Models Emerge in Large Language Models
Authors: Matthieu Tehenan, Christian Bolivar Moya, Tenghai Long, Guang Lin |
阅读更多来源: ArXiv AI | 04-06-25
DPO Learning with LLMs-Judge Signal for Computer Use Agents
Authors: Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard |
阅读更多来源: ArXiv AI | 04-06-25
Anthropic's Claude uses Elevenlabs technology for speech features rather than an in-house model
阅读更多来源: The Decoder | 03-06-25
Google says Veo 3 users have generated millions of AI videos in just a few days
阅读更多来源: The Decoder | 03-06-25
Cloudflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)
阅读更多来源: Hacker News | 03-06-25
Spark AI (YC W24) Is Hiring a Full Stack Engineer in San Francisco (ycombinator.com)
阅读更多来源: Hacker News | 03-06-25
My AI skeptic friends are all nuts (fly.io)
阅读更多来源: Hacker News | 03-06-25
Claude has learned how to jailbreak Cursor (cursor.com)
阅读更多来源: Hacker News | 03-06-25
PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation
Authors: Linhan Xia, Mingzhan Yang, Guohui Yuan, Shengnan Tao, Yujing Qiu, Guo Yu, Kai Lei |
阅读更多来源: ArXiv AI | 03-06-25
Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts
Authors: Fan Liu, Bikang Pan, Zhongyi Wang, Xi Yao, Xiaoying Tang, Jingya Wang, Ye Shi |
阅读更多来源: ArXiv AI | 03-06-25
The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process
Authors: Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi |
阅读更多来源: ArXiv AI | 03-06-25
MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
Authors: Xiang Fei, Xiawu Zheng, Hao Feng |
阅读更多来源: ArXiv AI | 03-06-25
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
Authors: Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, GuanHao Zhao, Fei Wang, Runze Wu |
阅读更多来源: ArXiv AI | 03-06-25
ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation
Authors: Xinyi Liu, Lipeng Ma, Yixuan Li, Weidong Yang, Qingyuan Zhou, Jiayi Song, Shuhao Li, Ben Fei |
阅读更多来源: ArXiv AI | 03-06-25
Modular Speaker Architecture: A Framework for Sustaining Responsibility and Contextual Integrity in Multi-Agent AI Communication
Authors: Khe-Han Toh, Hong-Kuan Teo |
阅读更多来源: ArXiv AI | 03-06-25
GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models
Authors: Qiang Yi, Lianlei Shan |
阅读更多来源: ArXiv AI | 03-06-25
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun |
阅读更多来源: ArXiv AI | 03-06-25
Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures
Authors: Prashik Buddhaghosh Bansod |
阅读更多来源: ArXiv AI | 03-06-25
FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance
Authors: Hongyang Yang, Likun Lin, Yang She, Xinyu Liao, Jiaoyang Wang, Runjia Zhang, Yuquan Mo, Christina Dan Wang |
阅读更多来源: ArXiv AI | 03-06-25
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
Authors: Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu |
阅读更多来源: ArXiv AI | 03-06-25
Social Cooperation in Conversational AI Agents
Authors: Mustafa Mert Çelikok, Saptarashmi Bandyopadhyay, Robert Loftin |
阅读更多来源: ArXiv AI | 03-06-25
Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models
Authors: Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim |
阅读更多来源: ArXiv AI | 03-06-25
K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Authors: Chong Li, Chenglin Zhu, Tao Zhang, Mingan Lin, Zenan Zhou, Jian Xie |
阅读更多来源: ArXiv AI | 03-06-25
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
Authors: Djallel Bouneffouf, Matthew Riemer, Kush Varshney |
阅读更多来源: ArXiv AI | 03-06-25
A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents
Authors: Cheonsu Jeong |
阅读更多来源: ArXiv AI | 03-06-25
Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks
Authors: Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin A. Rothkopf, Kristian Kersting |
阅读更多来源: ArXiv AI | 03-06-25
COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents
Authors: Manish Bhatt, Ronald F. Del Rosario, Vineeth Sai Narajala, Idan Habler |
阅读更多来源: ArXiv AI | 03-06-25
Large language models can learn and generalize steganographic chain-of-thought under process supervision
Authors: Joey Skaf, Luis Ibanez-Lissen, Robert McCarthy, Connor Watts, Vasil Georgiv, Hannes Whittingham, Lorena Gonzalez-Manzano, David Lindner, Cameron Tice, Edward James Young, Puria Radmard |
阅读更多来源: ArXiv AI | 03-06-25
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Authors: Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang |
阅读更多来源: ArXiv AI | 03-06-25
OpenAI sees human interaction as a competitor to ChatGPT's super assistant ambitions
阅读更多来源: The Decoder | 03-06-25
Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications
Authors: Srikanth Thudumu, Jason Fisher, Hung Du |
阅读更多来源: ArXiv AI | 03-06-25
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models
Authors: Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo |
阅读更多来源: ArXiv AI | 03-06-25
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Authors: Yuwen Tan, Yuan Qing, Boqing Gong |
阅读更多来源: ArXiv AI | 03-06-25
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
Authors: Juraj Vladika, Annika Domres, Mai Nguyen, Rebecca Moser, Jana Nano, Felix Busch, Lisa C. Adams, Keno K. Bressem, Denise Bernhardt, Stephanie E. Combs, Kai J. Borm, Florian Matthes, Jan C. Peeken |
阅读更多来源: ArXiv AI | 03-06-25
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Authors: Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong |
阅读更多来源: ArXiv AI | 03-06-25
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi |
阅读更多来源: ArXiv AI | 03-06-25
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve
Authors: Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen |
阅读更多来源: ArXiv AI | 03-06-25
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding
Authors: Mingyang Mao, Mariela M. Perez-Cabarcas, Utteja Kallakuri, Nicholas R. Waytowich, Xiaomin Lin, Tinoosh Mohsenin |
Read more | Source: ArXiv AI | 03-06-25
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge
Authors: Jerry Junyang Cheung, Shiyao Shen, Yuchen Zhuang, Yinghao Li, Rampi Ramprasad, Chao Zhang |
Read more | Source: ArXiv AI | 03-06-25
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Authors: Chan-Wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Zhengzhong Tu |
Read more | Source: ArXiv AI | 03-06-25
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Authors: Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu |
Read more | Source: ArXiv AI | 03-06-25
GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs
Authors: Amel Gader, Alsayed Algergawy |
Read more | Source: ArXiv AI | 03-06-25
E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness
Authors: Yibo Zhao, Jiapeng Zhu, Ye Guo, Kangkang He, Xiang Li |
Read more | Source: ArXiv AI | 03-06-25
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
Authors: Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman |
Read more | Source: ArXiv AI | 03-06-25
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Authors: Hongyi James Cai, Junlin Wang, Xiaoyin Chen, Bhuwan Dhingra |
Read more | Source: ArXiv AI | 03-06-25
Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models
Authors: Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao |
Read more | Source: ArXiv AI | 03-06-25
FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation
Authors: Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava |
Read more | Source: ArXiv AI | 03-06-25
GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments
Authors: Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji |
Read more | Source: ArXiv AI | 03-06-25
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Authors: Yueqi Zhang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |
Read more | Source: ArXiv AI | 03-06-25
Leveraging Knowledge Graphs and LLMs for Structured Generation of Misinformation
Authors: Sania Nayab, Marco Simoni, Giulio Rossolini |
Read more | Source: ArXiv AI | 03-06-25
Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning
Authors: Vasilije Markovic, Lazar Obradovic, Laszlo Hajdu, Jovan Pavlovic |
Read more | Source: ArXiv AI | 03-06-25
SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors
Authors: Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi |
Read more | Source: ArXiv AI | 03-06-25
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller |
Read more | Source: ArXiv AI | 03-06-25
Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success
Authors: Ben Griffin, Joseph Ternasky, Fuat Alican, Yigit Ihlamur |
Read more | Source: ArXiv AI | 03-06-25
Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models
Authors: Frederike Lübeck, Jonas Wildberger, Frederik Träuble, Maximilian Mordig, Sergios Gatidis, Andreas Krause, Bernhard Schölkopf |
Read more | Source: ArXiv AI | 03-06-25
EXP-Bench: Can AI Conduct AI Research Experiments?
Authors: Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen |
Read more | Source: ArXiv AI | 03-06-25
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Authors: Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen |
Read more | Source: ArXiv AI | 03-06-25
Elevenlabs' new AI voice system enables smoother interactions through real-time analysis
Read more | Source: The Decoder | 02-06-25
Anthropic CEO predicts 20% unemployment from AI - and suggests taxing every AI response
Read more | Source: The Decoder | 02-06-25
How can AI researchers save energy? By going backward (quantamagazine.org)
Read more | Source: Hacker News | 02-06-25
Beyond the Black Box: Interpretability of LLMs in Finance (arxiv.org)
Read more | Source: Hacker News | 02-06-25
Codex CLI is going native (github.com/openai)
Read more | Source: Hacker News | 02-06-25
When Fine-Tuning Makes Sense: A Developer's Guide (getkiln.ai)
Read more | Source: Hacker News | 02-06-25
Google AI Edge – On-device cross-platform AI deployment (ai.google.dev)
Read more | Source: Hacker News | 02-06-25
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi |
Read more | Source: ArXiv AI | 01-06-25
SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He |
Read more | Source: ArXiv AI | 01-06-25
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Authors: Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu, Yansi Li, Linfeng Song, Zhengwen Liang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu |
Read more | Source: ArXiv AI | 01-06-25
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan |
Read more | Source: ArXiv AI | 01-06-25
Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
Authors: Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao |
Read more | Source: ArXiv AI | 01-06-25
Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble
Authors: Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov |
Read more | Source: ArXiv AI | 01-06-25
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
Authors: Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev |
Read more | Source: ArXiv AI | 01-06-25
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy
Authors: Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim, Iqbal H. Sarker, Seyit Camtepe |
Read more | Source: ArXiv AI | 01-06-25
Autoformalization in the Era of Large Language Models: A Survey
Authors: Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun, Hengyu Liu, Tiancheng Zhang |
Read more | Source: ArXiv AI | 01-06-25
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
Authors: Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang, Jun Zhou, Yuxiang Peng, Li Zheng, Chong Teng, Donghong Ji, Zhuang Li |
Read more | Source: ArXiv AI | 01-06-25
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
Authors: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You |
Read more | Source: ArXiv AI | 01-06-25
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
Authors: Ran Zhang, Mohannad Elhamod |
Read more | Source: ArXiv AI | 01-06-25
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
Authors: Ruida Wang, Yuxin Li, Yi R. (May) Fung, Tong Zhang |
Read more | Source: ArXiv AI | 01-06-25
Deepseek's R1 model closes the gap with OpenAI and Google after major update
Read more | Source: The Decoder | 01-06-25
The ‘white-collar bloodbath’ is all part of the AI hype machine (cnn.com)
Read more | Source: Hacker News | 01-06-25
Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis (github.com/robertjakob)
Read more | Source: Hacker News | 01-06-25
Generative AI startup Odyssey demos interactive AI-generated video
Read more | Source: The Decoder | 31-05-25
Show HN: MCP Defender – OSS AI Firewall for Protecting MCP in Cursor/Claude etc (mcpdefender.com)
Read more | Source: Hacker News | 31-05-25
The Darwin Gödel Machine: AI that improves itself by rewriting its own code (sakana.ai)
Read more | Source: Hacker News | 31-05-25
AccessOwl (YC S22) is hiring an AI TypeScript Engineer to connect 100s of SaaS (ycombinator.com)
Read more | Source: Hacker News | 31-05-25
The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity (jamesoclaire.com)
Read more | Source: Hacker News | 31-05-25
What's working for YC companies since the AI boom (jamesin.substack.com)
Read more | Source: Hacker News | 31-05-25
Opera unveils Neon, a browser designed for both humans and AI agents
Read more | Source: The Decoder | 31-05-25
One year after its rivals, Claude can finally speak with users through a new voice mode
Read more | Source: The Decoder | 31-05-25
Anthropic launches a voice mode for Claude (techcrunch.com)
Read more | Source: Hacker News | 31-05-25
Mistral's Agents API enables AI agents to collaborate and connect with external systems
Read more | Source: The Decoder | 30-05-25
What is currently the best LLM model for consumer grade hardware? Is it phi-4?
Read more | Source: Hacker News | 30-05-25
Spaitial pushes generative AI to understand and create 3D structures with real physical properties
Read more | Source: The Decoder | 30-05-25
Human coders are still better than LLMs (antirez.com)
Read more | Source: Hacker News | 30-05-25
Open-sourcing circuit tracing tools (anthropic.com)
Read more | Source: Hacker News | 30-05-25
A visual exploration of vector embeddings (pamelafox.org)
Read more | Source: Hacker News | 30-05-25
Nick Clegg says a mandatory AI training opt-in would kill the UK's AI industry
Read more | Source: The Decoder | 29-05-25
ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM
Authors: Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui |
Read more | Source: ArXiv AI | 29-05-25
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
Authors: Erxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang |
Read more | Source: ArXiv AI | 29-05-25
Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems
Authors: Hoang Pham, Khac-Hoai Nam Bui |
Read more | Source: ArXiv AI | 29-05-25
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Chuchu Fan |
Read more | Source: ArXiv AI | 29-05-25
Understanding the learned look-ahead behavior of chess neural networks
Authors: Diogo Cruz |
Read more | Source: ArXiv AI | 29-05-25
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Authors: Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang |
Read more | Source: ArXiv AI | 29-05-25
From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models
Authors: Kaiyu He, Zhiyu Chen |
Read more | Source: ArXiv AI | 29-05-25
Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
Authors: Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem |
Read more | Source: ArXiv AI | 29-05-25
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
Authors: Chen Yueh-Han, Guy Davidson, Brenden M. Lake |
Read more | Source: ArXiv AI | 29-05-25
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
Authors: Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris |
Read more | Source: ArXiv AI | 29-05-25
Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test
Authors: Guangfu Hao, Frederic Alexandre, Shan Yu |
Read more | Source: ArXiv AI | 29-05-25
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah |
Read more | Source: ArXiv AI | 29-05-25
AgentDNS: A Root Domain Naming System for LLM Agents
Authors: Enfang Cui, Yujun Cheng, Rui She, Dan Liu, Zhiyuan Liang, Minxin Guo, Tianzheng Li, Qian Wei, Wenjuan Xing, Zhijie Zhong |
Read more | Source: ArXiv AI | 29-05-25
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications
Authors: Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, Merouane Debbah |
Read more | Source: ArXiv AI | 29-05-25
Chatbots like ChatGPT have not led to significant changes in wages or working hours, study finds
Read more | Source: The Decoder | 29-05-25
Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning
Read more | Source: Hacker News | 29-05-25
Launch HN: MindFort (YC X25) – AI agents for continuous pentesting
Read more | Source: Hacker News | 29-05-25
LLM codegen go brrr – Parallelization with Git worktrees and tmux (skeptrune.com)
Read more | Source: Hacker News | 29-05-25
Gmail Personal Smart Replies: The first time an AI feature has worried me
Read more | Source: The Decoder | 28-05-25
Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming (nathan.rs)
Read more | Source: Hacker News | 28-05-25
There Is No Diffie-Hellman but Elliptic Curve Diffie-Hellman (keymaterial.net)
Read more | Source: Hacker News | 28-05-25
Show HN: My LLM CLI tool can run tools now, from Python code or plugins (simonwillison.net)
Read more | Source: Hacker News | 28-05-25
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making
Authors: Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng |
Read more | Source: ArXiv AI | 28-05-25
Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review
Authors: Xueqiang Ouyang, Jia Wei |
Read more | Source: ArXiv AI | 28-05-25
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
Authors: Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen |
Read more | Source: ArXiv AI | 28-05-25
CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models
Authors: Yi Zhan, Qi Liu, Weibo Gao, Zheng Zhang, Tianfu Wang, Shuanghong Shen, Junyu Lu, Zhenya Huang |
Read more | Source: ArXiv AI | 28-05-25
Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients
Authors: Hyungjun Park (1,2), Chang-Yun Woo (3), Seungjo Lim (2), Seunghwan Lim (2), Keunho Kwak (2), Ju Young Jeong (4), Chong Hyun Suh (4) ((1) Department of Pulmonology, Shihwa Medical Center, Siheung, Republic of Korea (2) Helpmedoc Inc., Republic of Korea (3) Department of Internal Medicine, Asan Medical Center, Seoul, Republic of Korea (4) Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea) |
Read more | Source: ArXiv AI | 28-05-25
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting
Authors: Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luis Frazão, Nuno Costa, António Pereira |
Read more | Source: ArXiv AI | 28-05-25
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Authors: Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar |
Read more | Source: ArXiv AI | 28-05-25
E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing
Authors: Cheonsu Jeong, Seongmin Sim, Hyoyoung Cho, Sungsu Kim, Byounggwan Shin |
Read more | Source: ArXiv AI | 28-05-25
GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning
Authors: Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim |
Read more | Source: ArXiv AI | 28-05-25
LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation
Authors: Heng Tan, Hua Yan, Yu Yang |
Read more | Source: ArXiv AI | 28-05-25
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage
Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun |
Read more | Source: ArXiv AI | 28-05-25
Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving
Authors: Kuo Zhou, Lu Zhang |
Read more | Source: ArXiv AI | 28-05-25
Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking
Authors: Lingyi Cai, Ruichen Zhang, Changyuan Zhao, Yu Zhang, Jiawen Kang, Dusit Niyato, Tao Jiang, Xuemin Shen |
Read more | Source: ArXiv AI | 28-05-25
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Authors: Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li |
Read more | Source: ArXiv AI | 28-05-25
Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework
Authors: Saman Marandi, Yu-Shu Hu, Mohammad Modarres |
Read more | Source: ArXiv AI | 28-05-25
RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models
Authors: Yue Zhang, Zhiliang Tian, Shicheng Zhou, Haiyang Wang, Wenqing Hou, Yuying Liu, Xuechen Zhao, Minlie Huang, Ye Wang, Bin Zhou |
Read more | Source: ArXiv AI | 28-05-25
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Authors: Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue |
Read more | Source: ArXiv AI | 28-05-25
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education
Authors: Maria Cristina Carrisi, Mirko Marras, Sara Vergallo |
Read more | Source: ArXiv AI | 28-05-25
The Multilingual Divide and Its Impact on Global AI Safety
Authors: Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha, Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker |
Read more | Source: ArXiv AI | 28-05-25
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Authors: Yifan Wang, Kenneth P. Birman |
Read more | Source: ArXiv AI | 28-05-25
Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming
Authors: Yang Yang, Jiemin Wu, Yutao Yue |
Read more | Source: ArXiv AI | 28-05-25
Google expands access to Veo 3, its viral new video model, through the Gemini app
Read more | Source: The Decoder | 27-05-25
Diligent (YC S23) Is Hiring a Founding AI Engineer (ycombinator.com)
Read more | Source: Hacker News | 27-05-25
Trying to teach in the age of the AI homework machine (solarshades.club)
Read more | Source: Hacker News | 27-05-25
Highlights from the Claude 4 system prompt (simonwillison.net)
Read more | Source: Hacker News | 27-05-25
Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models
Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao |
Read more | Source: ArXiv AI | 27-05-25
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs
Authors: Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou |
Read more | Source: ArXiv AI | 27-05-25
MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model
Authors: Jiongchao Jin, Xiuju Fu, Xiaowei Gao, Tao Cheng, Ran Yan |
Read more | Source: ArXiv AI | 27-05-25
LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer
Authors: Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri |
Read more | Source: ArXiv AI | 27-05-25
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare
Authors: Ying Xiao, Jie Huang, Ruijuan He, Jing Xiao, Mohammad Reza Mousavi, Yepang Liu, Kezhi Li, Zhenpeng Chen, Jie M. Zhang |
Read more | Source: ArXiv AI | 27-05-25
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer |
Read more | Source: ArXiv AI | 27-05-25
Large Language Models for Planning: A Comprehensive and Systematic Survey
Authors: Pengfei Cao, Tianyi Men, Wencan Liu, Jingwen Zhang, Xuzhao Li, Xixun Lin, Dianbo Sui, Yanan Cao, Kang Liu, Jun Zhao |
Read more | Source: ArXiv AI | 27-05-25
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Authors: Lachlan McGinness, Peter Baumgartner |
Read more | Source: ArXiv AI | 27-05-25
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
Authors: Atsunori Moteki, Shoichi Masui, Fan Yang, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Jun Takahashi, Shan Jiang |
Read more | Source: ArXiv AI | 27-05-25
ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection
Authors: Juxin Niu, Xiangfeng Liu, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan |
Read more | Source: ArXiv AI | 27-05-25
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Authors: Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng |
Read more | Source: ArXiv AI | 27-05-25
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao |
Read more | Source: ArXiv AI | 27-05-25
DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems
Authors: Wenqing Zhou, Yuxuan Yan, Qianqian Yang |
Read more | Source: ArXiv AI | 27-05-25
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program
Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares |
Read more | Source: ArXiv AI | 27-05-25
Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making
Authors: Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chanyoung Park |
Read more | Source: ArXiv AI | 27-05-25
EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM
Authors: Shuang Ao, Flora D. Salim, Simon Khan |
Read more | Source: ArXiv AI | 27-05-25
Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback
Authors: Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang |
Read more | Source: ArXiv AI | 27-05-25
Agentic AI Process Observability: Discovering Behavioral Variability
Authors: Fabiana Fournier, Lior Limonad, Yuval David |
Read more | Source: ArXiv AI | 27-05-25
Capability-Based Scaling Laws for LLM Red-Teaming
Authors: Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping |
Read more | Source: ArXiv AI | 27-05-25
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
Authors: Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang |
Read more | Source: ArXiv AI | 27-05-25
Temporal Sampling for Forgotten Reasoning in LLMs
Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran |
Read more | Source: ArXiv AI | 27-05-25
The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels
Authors: Jiaming Ji, Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang |
Read more | Source: ArXiv AI | 27-05-25
Ten Principles of AI Agent Economics
Authors: Ke Yang, ChengXiang Zhai |
Read more | Source: ArXiv AI | 27-05-25
How Can I Publish My LLM Benchmark Without Giving the True Answers Away?
Authors: Takashi Ishida, Thanawat Lodkaew, Ikko Yamane |
Read more | Source: ArXiv AI | 27-05-25
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Authors: Joey Hong, Anca Dragan, Sergey Levine |
Read more | Source: ArXiv AI | 27-05-25
Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find
Authors: Owen Bianchi, Mathew J. Koretsky, Maya Willey, Chelsea X. Alvarado, Tanay Nayak, Adi Asija, Nicole Kuznetsov, Mike A. Nalls, Faraz Faghri, Daniel Khashabi |
Read more | Source: ArXiv AI | 27-05-25
Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement
Authors: Jonas A. Actor, Graham Harper, Ben Southworth, Eric C. Cyr |
Read more | Source: ArXiv AI | 27-05-25
Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models
Authors: Jiongran Wu, Jiahao Liu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Li Shang, Tun Lu, Ning Gu |
Read more | Source: ArXiv AI | 27-05-25
Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning
Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao |
Read more | Source: ArXiv AI | 27-05-25
Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness
Authors: Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh |
Read more | Source: ArXiv AI | 27-05-25
From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark
Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger, Yanchuan Chang |
Read more | Source: ArXiv AI | 27-05-25
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Authors: Cheng Peng, Kai Zhang, Mengxian Lyu, Hongfang Liu, Lichao Sun, Yonghui Wu |
Read more | Source: ArXiv AI | 27-05-25
Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs
Authors: Shuhang Xu, Weijian Deng, Yixuan Zhou, Fangwei Zhong |
Read more | Source: ArXiv AI | 27-05-25
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning of LLMs as Urban Agents
Authors: Siqi Lai, Yansong Ning, Zirui Yuan, Zhixi Chen, Hao Liu |
Read more | Source: ArXiv AI | 27-05-25
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
Authors: Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu |
Read more | Source: ArXiv AI | 27-05-25
CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models
Authors: Runze Li, Siyu Wu, Jun Wang, Wei Zhang |
Read more | Source: ArXiv AI | 27-05-25
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
Authors: Sota Yoshihara (1), Ryousuke Yamamoto (2), Hiroyuki Kusumoto (1), Masanari Shimura (1) ((1) Graduate School of Mathematics, Nagoya University, (2) Aisin Software) |
Read more | Source: ArXiv AI | 27-05-25
Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios
Authors: Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Yongtian Xu, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun |
Read more | Source: ArXiv AI | 27-05-25
Superplatforms Have to Attack AI Agents
Authors: Jianghao Lin, Jiachen Zhu, Zheli Zhou, Yunjia Xi, Weiwen Liu, Yong Yu, Weinan Zhang |
Read more | Source: ArXiv AI | 27-05-25
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
Authors: Yihe Fan, Wenqi Zhang, Xudong Pan, Min Yang |
Read more | Source: ArXiv AI | 27-05-25
Formalizing Embeddedness Failures in Universal Artificial Intelligence
Authors: Cole Wyeth, Marcus Hutter |
Read more | Source: ArXiv AI | 27-05-25
Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks
Authors: Wentao Sun, Joao Paulo Nogueira, Alonso Silva |
Read more | Source: ArXiv AI | 27-05-25
Gaming Tool Preferences in Agentic LLMs
Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi |
Read more | Source: ArXiv AI | 27-05-25
Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems
Authors: Gordon Dai, Yunze Xiao |
Read more | Source: ArXiv AI | 27-05-25
Apple analyst expects OpenAI's AI hardware to be "as compact and elegant as an iPod Shuffle"
Read more | Source: The Decoder | 26-05-25
Meta can use public Facebook and Instagram data for AI training, German court rules
Read more | Source: The Decoder | 26-05-25
Trading with Claude, and writing your own MCP server (dangelov.com)
Read more | Source: Hacker News | 26-05-25
Ask HN: Anyone struggling to get value out of coding LLMs?
Read more | Source: Hacker News | 26-05-25
How Does Claude 4 Think? – Sholto Douglas and Trenton Bricken (dwarkesh.com)
Read more | Source: Hacker News | 26-05-25
Venta AI (YC S23) Is Hiring a Founding Full Stack Engineer in Amsterdam (ycombinator.com)
Read more | Source: Hacker News | 26-05-25
Chomsky on what ChatGPT is good for (2023) (chomsky.info)
Read more | Source: Hacker News | 26-05-25
Claude 4 System Card (simonwillison.net)
Read more | Source: Hacker News | 26-05-25
OpenAI's Operator Agent gets o3 upgrade for more precise browser control
Read more | Source: The Decoder | 25-05-25
Here's how Germans use ChatGPT according to OpenAI
Read more | Source: The Decoder | 25-05-25
Peer Programming with LLMs, for Senior+ Engineers (pmbanugo.me)
Read more | Source: Hacker News | 25-05-25
Show HN: AI Baby Monitor – local Video-LLM that beeps when safety rules break (github.com/zeenolife)
Read more | Source: Hacker News | 25-05-25
Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
Authors: Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg |
Read more | Source: ArXiv AI | 25-05-25
Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development
Authors: Ming Shen, Raphael Shu, Anurag Pratik, James Gung, Yubin Ge, Monica Sunkara, Yi Zhang |
Read more | Source: ArXiv AI | 25-05-25
LLM-Powered AI Agent Systems and Their Applications in Industry
Authors: Guannan Liang, Qianqian Tong |
Read more | Source: ArXiv AI | 25-05-25
Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language
Authors: Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, Shu-Tao Xia |
Read more | Source: ArXiv AI | 25-05-25
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Authors: Yifan Zhang, Xinkui Zhao, Zuxin Wang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin |
Read more | Source: ArXiv AI | 25-05-25
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning
Authors: Jiawei Liu, Qisi Chen, Jianshu Zhang, Quan Liu, Defu Lian |
Read more | Source: ArXiv AI | 25-05-25
How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance
Authors: Desiree Heim, Lars-Peter Meyer, Markus Schröder, Johannes Frey, Andreas Dengel |
Read more | Source: ArXiv AI | 25-05-25
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
Authors: Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang |
Read more | Source: ArXiv AI | 25-05-25
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Authors: Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, Hector Zenil |
Read more | Source: ArXiv AI | 25-05-25
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection
Authors: Jiaqi Li, Xinyi Dong, Yang Liu, Zhizhuo Yang, Quansen Wang, Xiaobo Wang, SongChun Zhu, Zixia Jia, Zilong Zheng |
Read more | Source: ArXiv AI | 25-05-25
Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events
Authors: Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin |
Read more | Source: ArXiv AI | 25-05-25
ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
Authors: Xinwei Yang, Zhaofeng Liu, Chen Huang, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei |
Read more | Source: ArXiv AI | 25-05-25
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
Authors: Yujie Hou, Ting Zhang, Mei Wang, Xuetao Ma, Hu Huang |
Read more | Source: ArXiv AI | 25-05-25
Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review
Authors: Beyazit Bestami Yuksel, Ayse Yilmazer Metin |
Read more | Source: ArXiv AI | 25-05-25
MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models
Authors: Xuanqi Gao, Siyi Xie, Juan Zhai, Shqing Ma, Chao Shen |
Read more | Source: ArXiv AI | 25-05-25
Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings
Authors: Yuqicheng Zhu, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Evgeny Kharlamov, Steffen Staab |
Source: ArXiv AI | 25-05-25
Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships
Authors: Kerem Oktar, Katherine M. Collins, Jose Hernandez-Orallo, Diane Coyle, Stephen Cave, Adrian Weller, Ilia Sucholutsky |
Source: ArXiv AI | 25-05-25
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li |
Source: ArXiv AI | 25-05-25
HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation
Authors: Weizhi Tang, Yixuan Li, Chris Sypherd, Elizabeth Polgreen, Vaishak Belle |
Source: ArXiv AI | 25-05-25
Beyond Correlation: Towards Causal Large Language Model Agents in Biomedicine
Authors: Adib Bazgir, Amir Habibdoust Lafmajani, Yuwen Zhang |
Source: ArXiv AI | 25-05-25
Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design
Authors: Zhenkun Li, Lingyao Li, Shuhang Lin, Yongfeng Zhang |
Source: ArXiv AI | 25-05-25
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Authors: Rui Ye, Xiangrui Liu, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen |
Source: ArXiv AI | 25-05-25
OpenAI and G42 will build massive AI data center in Abu Dhabi
Source: The Decoder | 25-05-25
Mistral's Document AI extracts text from documents and notes with high accuracy
Source: The Decoder | 25-05-25
US House passed a bill that would ban state-level AI regulations for ten years
Source: The Decoder | 25-05-25
Exposed Industrial Control Systems and Honeypots in the Wild [pdf] (gsmaragd.github.io)
Source: Hacker News | 25-05-25
Positional preferences, order effects, prompt sensitivity undermine AI judgments (cip.org)
Source: Hacker News | 24-05-25
Show HN: I built a more productive way to manage AI chats (contextch.at)
Source: Hacker News | 24-05-25
Claude Opus 4 blackmailed an engineer after learning it might be replaced
Source: The Decoder | 24-05-25
OpenAI has upgraded the Responses API with remote MCP servers and new tools
Source: The Decoder | 24-05-25
OpenAI and Jony Ive are building a new AI device that is not a smartphone or smart glasses
Source: The Decoder | 24-05-25
Mistral launches Devstral Small 24B, a new open-source LLM for coding
Source: The Decoder | 23-05-25
OpenAI's Stargate secured $11.6 billion for a massive data center
Source: The Decoder | 23-05-25
Google Gemini is everything Siri never was
Source: The Decoder | 23-05-25
Gemini Diffusion could be Google's most important I/O news that slipped under the radar
Source: The Decoder | 23-05-25
Google shows AI filmmaking tool, XR glasses and launches $250 Gemini subscription
Source: The Decoder | 23-05-25
Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts
Source: Hacker News | 23-05-25
OpenAI: Scaling PostgreSQL to the Next Level (pixelstech.net)
Source: Hacker News | 23-05-25
Claude 4 (anthropic.com)
Source: Hacker News | 23-05-25
Management = Bullshit (LLM Edition) (funcall.blogspot.com)
Source: Hacker News | 23-05-25
Problems in AI alignment: A scale model (muldoon.cloud)
Source: Hacker News | 23-05-25
Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities
Source: The Decoder | 22-05-25
An upgraded dev experience in Google AI Studio (googleblog.com)
Source: Hacker News | 22-05-25
OpenAI to buy AI startup from Jony Ive (bloomberg.com)
Source: Hacker News | 22-05-25
LLM function calls don't scale; code orchestration is simpler, more effective (jngiam.bearblog.dev)
Source: Hacker News | 22-05-25
Gemini figured out my nephew’s name (nawaz.org)
Source: Hacker News | 22-05-25
Robert Musil Forgotten Plays Inspired His Greatest Work of Fiction (lithub.com)
Source: Hacker News | 22-05-25
Gemini Diffusion (simonwillison.net)
Source: Hacker News | 22-05-25
FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models
Authors: Zhen Sun, Ziyi Zhang, Zeren Luo, Zeyang Sha, Tianshuo Cong, Zheng Li, Shiwen Cui, Weiqiang Wang, Jiaheng Wei, Xinlei He, Qi Li, Qian Wang |
Source: ArXiv AI | 22-05-25
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
Authors: David Thulke, Jakob Kemmler, Christian Dugast, Hermann Ney |
Source: ArXiv AI | 22-05-25
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
Authors: David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan |
Source: ArXiv AI | 22-05-25
Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use
Authors: Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang |
Source: ArXiv AI | 22-05-25
A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability
Authors: Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng |
Source: ArXiv AI | 22-05-25
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement
Authors: Jilin Hu, Jianyu Zhang, Yongwang Zhao, Talia Ringer |
Source: ArXiv AI | 22-05-25
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye |
Source: ArXiv AI | 22-05-25
Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities
Authors: Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li |
Source: ArXiv AI | 22-05-25
Multi-modal Integration Analysis of Alzheimer's Disease Using Large Language Models and Knowledge Graphs
Authors: Kanan Kiguchi, Yunhao Tu, Katsuhiro Ajito, Fady Alnajjar, Kazuyuki Murase |
Source: ArXiv AI | 22-05-25
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Authors: Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang |
Source: ArXiv AI | 22-05-25
Large Language Models as Computable Approximations to Solomonoff Induction
Authors: Jun Wan, Lingrui Mei |
Source: ArXiv AI | 22-05-25
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Authors: Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, Yang Liu, Xin Xu, Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang |
Source: ArXiv AI | 22-05-25
R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution
Authors: Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian |
Source: ArXiv AI | 22-05-25
Self-Evolving Curriculum for LLM Reasoning
Authors: Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Gontier, Yoshua Bengio, Ehsan Kamalloo |
Source: ArXiv AI | 22-05-25
lmgame-Bench: How Good are LLMs at Playing Games?
Authors: Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang |
Source: ArXiv AI | 22-05-25
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji |
Source: ArXiv AI | 22-05-25
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
Authors: Yassir Fathullah, Mark J. F. Gales |
Source: ArXiv AI | 22-05-25
ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs
Authors: Bahar Radmehr, Ekaterina Shved, Fatma Betül Güreş, Adish Singla, Tanja Käser |
Source: ArXiv AI | 22-05-25
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives
Authors: Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez |
Source: ArXiv AI | 22-05-25
Microsoft Build 2025 showcases new AI agent tools and open interfaces for developers
Source: The Decoder | 21-05-25
Large language models often struggle with decision-making — a new study explains why
Source: The Decoder | 21-05-25
Deep Learning Is Applied Topology (theahura.substack.com)
Source: Hacker News | 21-05-25
Watching AI drive Microsoft employees insane (reddit.com)
Source: Hacker News | 21-05-25
Someone got an LLM running on a Commodore 64 from 1982, and it runs as well (xda-developers.com)
Source: Hacker News | 21-05-25
5 Boring Things That Have a Bigger Impact Than AI Assistants on Dev Productivity (codemanship.wordpress.com)
Source: Hacker News | 21-05-25
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery
Authors: Kun Li, Zhennan Wu, Shoupeng Wang, Wenbin Hu |
Source: ArXiv AI | 21-05-25
Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning
Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim |
Source: ArXiv AI | 21-05-25
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Authors: Qianyue Hao, Sibo Li, Jian Yuan, Yong Li |
Source: ArXiv AI | 21-05-25
ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data
Authors: Xinzhe Zheng, Sijie Ji, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava |
Source: ArXiv AI | 21-05-25
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Authors: Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu |
Source: ArXiv AI | 21-05-25
Toward Embodied AGI: A Review of Embodied AI and the Road Ahead
Authors: Yequan Wang, Aixin Sun |
Source: ArXiv AI | 21-05-25
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Authors: Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross |
Source: ArXiv AI | 21-05-25
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
Authors: Maheep Chaudhary, Fazl Barez |
Source: ArXiv AI | 21-05-25
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Authors: Zhaohui Yang, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang |
Source: ArXiv AI | 21-05-25
Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach
Authors: Oren Sultan, Eitan Stern, Dafna Shahaf |
Source: ArXiv AI | 21-05-25
Guarded Query Routing for Large Language Models
Authors: Richard Šléher, William Brach, Tibor Sloboda, Kristián Košťál, Lukas Galke |
Source: ArXiv AI | 21-05-25
BACON: A fully explainable AI model with graded logic for decision making problems
Authors: Haishi Bai, Jozo Dujmovic, Jianwu Wang |
Source: ArXiv AI | 21-05-25
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang |
Source: ArXiv AI | 21-05-25
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken |
Source: ArXiv AI | 21-05-25
Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
Authors: Zihao Zhang, Fei Liu |
Source: ArXiv AI | 21-05-25
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan |
Source: ArXiv AI | 21-05-25
Google AI Ultra (blog.google)
Source: Hacker News | 21-05-25
Ask HN: Conversational AI to Learn a Language
Source: Hacker News | 21-05-25
US officials warn Apple's iPhone AI deal with Alibaba may boost China's AI sector
Source: The Decoder | 20-05-25
Stability AI releases a compact open text-to-audio model that runs on mobile devices
Source: The Decoder | 20-05-25
Japanese startup Sakana AI explores time-based thinking with brain-inspired AI model
Source: The Decoder | 20-05-25
Google's AI answers are changing user behavior by sharply reducing clicks to websites
Source: The Decoder | 20-05-25
Solving physics-based initial value problems with unsupervised machine learning (aps.org)
Source: Hacker News | 20-05-25
Questioning Representational Optimism in Deep Learning (github.com/akarshkumar0101)
Source: Hacker News | 20-05-25
Claude Code SDK (anthropic.com)
Source: Hacker News | 20-05-25
The behavior of LLMs in hiring decisions: Systemic biases in candidate selection (davidrozado.substack.com)
Source: Hacker News | 20-05-25
NeuroGen: Neural Network Parameter Generation via Large Language Models
Authors: Jiaqi Wang, Yusen Zhang, Xi Li |
Source: ArXiv AI | 20-05-25
ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning
Authors: Edward Y. Chang, Longling Geng |
Source: ArXiv AI | 20-05-25
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Authors: Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen |
Source: ArXiv AI | 20-05-25
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Authors: Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian |
Source: ArXiv AI | 20-05-25
Bullying the Machine: How Personas Increase LLM Vulnerability
Authors: Ziwei Xu, Udit Sanghi, Mohan Kankanhalli |
Source: ArXiv AI | 20-05-25
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Authors: Zhuo Yang, Lingli Ge, Dong Han, Tianfan Fu, Yuqiang Li |
Source: ArXiv AI | 20-05-25
Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs
Authors: Haruka Asanuma, Naoko Koide-Majima, Ken Nakamura, Takato Horii, Shinji Nishimoto, Masafumi Oizumi |
Source: ArXiv AI | 20-05-25
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
Authors: Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang |
Source: ArXiv AI | 20-05-25
From Grunts to Grammar: Emergent Language from Cooperative Foraging
Authors: Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang |
Source: ArXiv AI | 20-05-25
LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs
Authors: Lars-Peter Meyer, Johannes Frey, Desiree Heim, Felix Brei, Claus Stadler, Kurt Junghanns, Michael Martin |
Source: ArXiv AI | 20-05-25
CAIM: Development and Evaluation of a Cognitive AI Memory Framework for Long-Term Interaction with Intelligent Agents
Authors: Rebecca Westhäußer, Frederik Berenz, Wolfgang Minker, Sebastian Zepf |
Source: ArXiv AI | 20-05-25
StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment
Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin |
Source: ArXiv AI | 20-05-25
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities
Authors: Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward |
Source: ArXiv AI | 20-05-25
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment
Authors: Siming Sun, Kai Zhang, Xuejun Jiang, Wenchao Meng, Qinmin Yang |
Source: ArXiv AI | 20-05-25
Multi-Armed Bandits Meet Large Language Models
Authors: Djallel Bouneffouf, Raphael Feraud |
Source: ArXiv AI | 20-05-25
Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems
Authors: Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa |
Source: ArXiv AI | 20-05-25
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database
Authors: Rong Bian, Yu Geng, Zijian Yang, Bing Cheng |
Source: ArXiv AI | 20-05-25
MIT says a high-profile AI productivity study used data that cannot be trusted
Source: The Decoder | 20-05-25
OpenAI says GPT-5 is about doing everything better with "less model switching"
Source: The Decoder | 20-05-25
Dilbert creator Scott Adams says he will die soon from same cancer as Joe Biden (thewrap.com)
Source: Hacker News | 20-05-25
Remarks on AI from NZ (nealstephenson.substack.com)
Source: Hacker News | 20-05-25
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
Authors: Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang |
Source: ArXiv AI | 20-05-25
Disentangling Reasoning and Knowledge in Medical Large Language Models
Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou |
Source: ArXiv AI | 20-05-25
LLMs unlock new paths to monetizing exploits
Authors: Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher A. Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski |
Source: ArXiv AI | 20-05-25
Code-Driven Planning in Grid Worlds with Large Language Models
Authors: Ashwath Vaithinathan Aravindan, Zhisheng Tang, Mayank Kejriwal |
Source: ArXiv AI | 20-05-25
Embodied AI in Machine Learning -- is it Really Embodied?
Authors: Matej Hoffmann, Shubhan Parag Patni |
Source: ArXiv AI | 20-05-25
Interpretable Risk Mitigation in LLM Agent Systems
Authors: Jan Chojnacki |
Source: ArXiv AI | 20-05-25
Modeling cognitive processes of natural reading with transformer-based Language Models
Authors: Bruno Bianchi, Fermín Travi, Juan E. Kamienkowski |
Source: ArXiv AI | 20-05-25
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken |
Source: ArXiv AI | 20-05-25
Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Authors: Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy |
Source: ArXiv AI | 20-05-25
TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding
Authors: Achintha Wijesinghe, Weiwei Wang, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding |
Source: ArXiv AI | 20-05-25
RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization
Authors: Haiyang Shen, Hang Yan, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma |
Source: ArXiv AI | 20-05-25
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
Authors: Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan |
Source: ArXiv AI | 20-05-25
Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining
Authors: Yu Shi, Yitong Duan, Jian Li |
Source: ArXiv AI | 20-05-25
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Authors: Francesco Sovrano |
Source: ArXiv AI | 20-05-25
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios
Authors: Mingxing Peng, Yuting Xie, Xusen Guo, Ruoyu Yao, Hai Yang, Jun Ma |
Source: ArXiv AI | 20-05-25
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Authors: Zhangying Feng, Qianglong Chen, Ning Lu, Yongqian Li, Siqi Cheng, Shuangmu Peng, Duyu Tang, Shengcai Liu, Zhirui Zhang |
Source: ArXiv AI | 20-05-25
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Authors: Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui |
Source: ArXiv AI | 20-05-25
Anthropic is forced to apologize after Claude undercuts its legal team
Source: The Decoder | 19-05-25
Show HN: I modeled the Voynich Manuscript with SBERT to test for structure (github.com/brianmg)
Source: Hacker News | 19-05-25
Meta's Behemoth AI model delay signals struggles to match new paradigms
Source: The Decoder | 19-05-25
Emergent social conventions and collective bias in LLM populations (science.org)
Source: Hacker News | 19-05-25
Understanding Transformers via N-gram Statistics (arxiv.org)
Source: Hacker News | 18-05-25
O2 VoLTE: locating any customer with a phone call (mastdatabase.co.uk)
Source: Hacker News | 18-05-25
Emergence of Structure in Ensembles of Random Neural Networks
Authors: Luca Muscarnera, Luigi Loreti, Giovanni Todeschini, Alessio Fumagalli, Francesco Regazzoni |
Source: ArXiv AI | 18-05-25
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Authors: Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong |
Source: ArXiv AI | 18-05-25
ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks
Authors: Kai Sun, Peibo Duan, Levin Kuhlmann, Beilun Wang, Bin Zhang |
Source: ArXiv AI | 18-05-25
Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding
Authors: Jianhao Huang, Qunsong Zeng, Kaibin Huang |
Source: ArXiv AI | 18-05-25
Rethinking Repetition Problems of LLMs in Code Generation
Authors: Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |
Source: ArXiv AI | 18-05-25
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
Authors: Pedro Orvalho, Marta Kwiatkowska |
Source: ArXiv AI | 18-05-25
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning
Authors: Dechen Gao, Hang Wang, Hanchu Zhou, Nejib Ammar, Shatadal Mishra, Ahmadreza Moradipari, Iman Soltani, Junshan Zhang |
Source: ArXiv AI | 18-05-25
PIF: Anomaly detection via preference embedding
Authors: Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi |
Source: ArXiv AI | 18-05-25
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Authors: Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu |
Source: ArXiv AI | 18-05-25
Neural Thermodynamic Laws for Large Language Model Training
Authors: Ziming Liu, Yizhou Liu, Jeff Gore, Max Tegmark |
Source: ArXiv AI | 18-05-25
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini |
Source: ArXiv AI | 18-05-25
Demystifying AI Agents: The Final Generation of Intelligence
Authors: Kevin J McNamara, Rhea Pritham Marpu |
Source: ArXiv AI | 18-05-25
Leveraging Graph Retrieval-Augmented Generation to Support Learners' Understanding of Knowledge Concepts in MOOCs
Authors: Mohamed Abdelmagied, Mohamed Amine Chatti, Shoeb Joarder, Qurat Ul Ain, Rawaa Alatrash |
Source: ArXiv AI | 18-05-25
Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
Authors: Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting |
Source: ArXiv AI | 18-05-25
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
Authors: Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein, Anna V. Kononova |
Source: ArXiv AI | 18-05-25
Soundcloud updates its AI training policy, but it's still unclear
Source: The Decoder | 18-05-25
Geoffrey Hinton's wildly overconfident AI prediction failed—now it's a lesson in humility
Source: The Decoder | 18-05-25
How 'The Little Prince' and AI help us better understand language development in the brain
Source: The Decoder | 18-05-25
LLMs are more persuasive than incentivized human persuaders (arxiv.org)
Source: Hacker News | 18-05-25
Unspoken Currency of Office Politics: Leverage and Sanction Between Coworkers (graphthinking.blogspot.com)
Source: Hacker News | 18-05-25
Transformer neural net learns to run Conway's Game of Life just from examples (sidsite.com)
Source: Hacker News | 17-05-25
I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA
Source: Hacker News | 17-05-25
Show HN: Merliot – plugging physical devices into LLMs (github.com/merliot)
Source: Hacker News | 17-05-25
A Research Preview of Codex (openai.com)
Source: Hacker News | 17-05-25
MIT asks arXiv to withdraw preprint of paper on AI and scientific discovery (economics.mit.edu)
Source: Hacker News | 17-05-25
Getting AI to write good SQL (cloud.google.com)
Source: Hacker News | 17-05-25
Meta introduces OMol25 and UMA, new open AI tools for molecular research
Source: The Decoder | 17-05-25
Anthropic is reportedly testing Claude models that can fix their own mistakes
Source: The Decoder | 17-05-25
Will AI systems perform poorly due to AI-generated material in training data? (acm.org)
Source: Hacker News | 17-05-25
U.S. is cracking down on Huawei's AI hardware while loosening its general export regulations
Source: The Decoder | 16-05-25
After months of coding with LLMs, I'm going back to using my brain (albertofortin.com)
Source: Hacker News | 16-05-25
The unreasonable effectiveness of an LLM agent loop with tool use (sketch.dev)
Source: Hacker News | 16-05-25
Show HN: Min.js style compression of tech docs for LLM context (github.com/marv1nnnnn)
Source: Hacker News | 16-05-25
Google brings Gemini AI to smartwatches, cars, TVs, and XR headsets
Source: The Decoder | 15-05-25
OpenAI says its latest models outperform doctors in medical benchmark
Source: The Decoder | 15-05-25
Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states
Source: The Decoder | 15-05-25
Nvidia will supply advanced chips for Saudi Arabia’s Humain AI project
Source: The Decoder | 15-05-25
GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks
Authors: Gabriel Cortês, Nuno Lourenço, Paolo Romano, Penousal Machado |
Source: ArXiv AI | 15-05-25
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y.X. Wei |
Source: ArXiv AI | 15-05-25
A 2D Semantic-Aware Position Encoding for Vision Transformers
Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi |
Source: ArXiv AI | 15-05-25
Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment
Authors: Paul Tschisgale, Holger Maus, Fabian Kieser, Ben Kroehs, Stefan Petersen, Peter Wulff |
Source: ArXiv AI | 15-05-25
Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors
Authors: Nicolas Dupuis, Ravi Nair, Shyam Ramji, Sean McClintock, Nishant Chauhan, Priyanka Nagpal, Bart Blaner, Ken Valk, Leon Stok, Ruchir Puri |
Source: ArXiv AI | 15-05-25
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Authors: Nidhal Jegham, Marwen Abdelatti, Lassad Elmoubarki, Abdeltawab Hendawi |
Source: ArXiv AI | 15-05-25
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Authors: Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir |
Source: ArXiv AI | 15-05-25
Automated Meta Prompt Engineering for Alignment with the Theory of Mind
Authors: Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay |
Source: ArXiv AI | 15-05-25
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
Authors: Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis |
Source: ArXiv AI | 15-05-25
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Authors: Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi |
Source: ArXiv AI | 15-05-25
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le |
Source: ArXiv AI | 15-05-25
Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification
Authors: Adarsh Kumar, Hwiyoon Kim, Jawahar Sai Nathani, Neil Roy |
Source: ArXiv AI | 15-05-25
Show HN: Muscle-Mem, a behavior cache for AI agents (github.com/pig-dot-dev)
Source: Hacker News | 15-05-25
A server that wasn't meant to exist (dragas.net)
Source: Hacker News | 15-05-25
LLMs get lost in multi-turn conversation (arxiv.org)
Source: Hacker News | 15-05-25
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (deepmind.google)
Source: Hacker News | 15-05-25
Launch HN: Jazzberry (YC X25) – AI agent for finding bugs
Source: Hacker News | 15-05-25
Show HN: YapCards (iOS) – Voice-driven flashcards with AI feedback
Source: Hacker News | 15-05-25
100 experts call for more research into the control of AI systems
Source: The Decoder | 14-05-25
Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust) (github.com/helixdb)
Source: Hacker News | 14-05-25
Build real-time knowledge graph for documents with LLM (cocoindex.io)
Source: Hacker News | 14-05-25
EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs (github.com/em-llm)
Source: Hacker News | 14-05-25
A Survey of Deep Learning for Complex Speech Spectrograms
Authors: Yuying Xie, Zheng-Hua Tan |
Source: ArXiv AI | 14-05-25
Securing RAG: A Risk Assessment and Mitigation Framework
Authors: Lukas Ammann, Sara Ott, Christoph R. Landolt, Marco P. Lehmann |
Source: ArXiv AI | 14-05-25
CodePDE: An Inference Framework for LLM-driven PDE Solver Generation
Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar |
Source: ArXiv AI | 14-05-25
Winning at All Cost: A Small Environment for Eliciting Specification Gaming Behaviors in Large Language Models
Authors: Lars Malmqvist |
Source: ArXiv AI | 14-05-25
Enhancing Trust Management System for Connected Autonomous Vehicles Using Machine Learning Methods: A Survey
Authors: Qian Xu, Lei Zhang, Yixiao Liu |
Source: ArXiv AI | 14-05-25
The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic
Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga |
Source: ArXiv AI | 14-05-25
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville |
Source: ArXiv AI | 14-05-25
Decoding Neighborhood Environments with Large Language Models
Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li |
Source: ArXiv AI | 14-05-25
Benchmarking AI scientists in omics data-driven biological research
Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang |
Source: ArXiv AI | 14-05-25
Evaluating LLM Metrics Through Real-World Capabilities
Authors: Justin K Miller, Wenjia Tang |
Source: ArXiv AI | 14-05-25
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Authors: Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu |
Source: ArXiv AI | 14-05-25
Strategy-Augmented Planning for Large Language Models via Opponent Exploitation
Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang |
Source: ArXiv AI | 14-05-25
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
Authors: Nicholas Attolino, Alessio Capitanelli, Fulvio Mastrogiovanni |
Source: ArXiv AI | 14-05-25
Guiding LLM-based Smart Contract Generation with Finite State Machine
Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang |
Source: ArXiv AI | 14-05-25
Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach
Authors: Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang |
Source: ArXiv AI | 14-05-25
LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju |
Source: ArXiv AI | 14-05-25
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, Bin Xu, Jianghao Xu, Yiyang Yu, Zichuan Yang, Hongji Zha, Ruichong Zhang |
Source: ArXiv AI | 14-05-25
OpenAI's chief scientist Jakub Pachocki says there is evidence that AI models discover novel insights
Source: The Decoder | 14-05-25
Insurers launch cover for losses caused by AI chatbot errors (ft.com)
Source: Hacker News | 14-05-25
Garbage collection of object storage at scale (warpstream.com)
Source: Hacker News | 14-05-25
DeepSeek’s founder is threatening US dominance in AI race (bloomberg.com)
Source: Hacker News | 14-05-25
Confident user prompts make LLMs more likely to hallucinate
Source: The Decoder | 13-05-25
Stanford researchers find AI agents improve when guided by past successes
Source: The Decoder | 13-05-25
Microsoft could sacrifice some OpenAI shares - but wants to secure access to AI technology
Source: The Decoder | 13-05-25
HealthBench – An evaluation for AI systems and human health (openai.com)
Source: Hacker News | 13-05-25
A conversation about AI for science with Jason Pruet (lanl.gov)
Source: Hacker News | 13-05-25
A class of distributed automata that contains the modal mu-fragment
Authors: Veeti Ahvonen, Damian Heiman, Antti Kuusisto |
Source: ArXiv AI | 13-05-25
Reliable Collaborative Conversational Agent System Based on LLMs and Answer Set Programming
Authors: Yankai Zeng, Gopal Gupta |
Source: ArXiv AI | 13-05-25
KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
Authors: Yumou Wei, Paulo Carvalho, John Stamper |
Source: ArXiv AI | 13-05-25
Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers
Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C.H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric |
Source: ArXiv AI | 13-05-25
Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems
Authors: Sivasathivel Kandasamy |
Source: ArXiv AI | 13-05-25
Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence
Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong |
Source: ArXiv AI | 13-05-25
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering
Authors: Gaurab Sarkar, Sougata Saha |
Source: ArXiv AI | 13-05-25
LLM-Augmented Chemical Synthesis and Design Decision Programs
Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang |
Source: ArXiv AI | 13-05-25
Explainable AI the Latest Advancements and New Trends
Authors: Bowen Long, Enjie Liu, Renxi Qiu, Yanqing Duan |
Source: ArXiv AI | 13-05-25
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
Authors: Yubo Shu, Zhewei Huang, Xin Wu, Chen Hu, Shuchang Zhou, Daxin Jiang |
Source: ArXiv AI | 13-05-25
Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA
Authors: Mahmood Mohassel Feghhi, Raya Majid Alsharfa, Majid Hameed Majeed |
Source: ArXiv AI | 13-05-25
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
Authors: Hanzheng Dai, Yuanliang Li, Zhibo Zhang, Jun Yan |
Source: ArXiv AI | 13-05-25
Architectural Precedents for General Agents using Large Language Models
Authors: Robert E. Wray, James R. Kirk, John E. Laird |
Source: ArXiv AI | 13-05-25
AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review
Authors: Zhiye Xie, Enmei Tu, Xianping Fu, Guoliang Yuan, Yi Han |
Source: ArXiv AI | 13-05-25
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Authors: Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng |
Source: ArXiv AI | 13-05-25
How well do LLMs reason over tabular data, really?
Authors: Cornelius Wolff, Madelon Hulsebos |
Source: ArXiv AI | 13-05-25
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Authors: Khurram Mazher, Saad Bin Nasir |
Source: ArXiv AI | 13-05-25
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models
Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen |
Source: ArXiv AI | 13-05-25
"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants
Authors: Jennifer Mondragon, Carlos Rubio-Medrano, Gael Cruz, Dvijesh Shastri |
Source: ArXiv AI | 13-05-25
Multi-Agent Systems for Robotic Autonomy with LLMs
Authors: Junhong Chen, Ziqi Yang, Haoyuan G Xu, Dandan Zhang, George Mylonas |
Source: ArXiv AI | 13-05-25
Evolutionary thoughts: integration of large language models and evolutionary algorithms
Authors: Antonio Jimeno Yepes, Pieter Barnard |
Source: ArXiv AI | 13-05-25
What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips
Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang |
Source: ArXiv AI | 13-05-25
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song |
Source: ArXiv AI | 13-05-25
Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency
Authors: Xinyu Liang, Frits de Nijs, Buser Say, Hao Wang |
Source: ArXiv AI | 13-05-25
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Authors: Benjamin Raphael Ernhofer, Daniil Prokhorov, Jannica Langner, Dominik Bollmann |
Source: ArXiv AI | 13-05-25
IRNN: Innovation-driven Recurrent Neural Network for Time-Series Data Modeling and Prediction
Authors: Yifan Zhou, Yibo Wang, Chao Shang |
Source: ArXiv AI | 13-05-25
Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models
Authors: Jugal Gajjar, Kaustik Ranaware |
Source: ArXiv AI | 13-05-25
LLMs Outperform Experts on Challenging Biology Benchmarks
Authors: Lennart Justen |
Source: ArXiv AI | 13-05-25
UniSymNet: A Unified Symbolic Network Guided by Transformer
Authors: Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin |
Source: ArXiv AI | 13-05-25
The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review
Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying |
Source: ArXiv AI | 13-05-25
A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
Authors: Ryan Lagasse, Aidan Kiernans, Avijit Ghosh, Shiri Dori-Hacohen |
Source: ArXiv AI | 13-05-25
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Authors: Lennart Luettgau, Harry Coppock, Magda Dubois, Christopher Summerfield, Cozmin Ududec |
Source: ArXiv AI | 13-05-25
Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods
Authors: Markov Grey, Charbel-Raphaël Segerie |
Source: ArXiv AI | 13-05-25
Leveraging Large Language Models for enzymatic reaction prediction and characterization
Authors: Lorenzo Di Fruscia, Jana Marie Weber |
Source: ArXiv AI | 13-05-25
Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams
Authors: Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala |
Source: ArXiv AI | 13-05-25
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Authors: Azim Ospanov, Roozbeh Yousefzadeh |
Source: ArXiv AI | 13-05-25
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Authors: Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring |
Source: ArXiv AI | 13-05-25
Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs
Authors: Sam Bush, Matthew DeLorenzo, Phat Tieu, Jeyavijayan Rajendran |
Source: ArXiv AI | 13-05-25
Bytedance launches Agent TARS, an open-source AI automation agent
Source: The Decoder | 12-05-25
Google recaps how its LLMs could change in-game interactions
Source: The Decoder | 12-05-25
Five major obstacles are holding back RAG systems in healthcare
Source: The Decoder | 12-05-25
Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)
Source: Hacker News | 12-05-25
US Copyright Office found AI companies breach copyright. Its boss was fired (theregister.com)
Source: Hacker News | 12-05-25
Klarna changes its AI tune and again recruits humans for customer service (customerexperiencedive.com)
Source: Hacker News | 12-05-25
Avoiding AI is hard – but our freedom to opt out must be protected (theconversation.com)
Source: Hacker News | 12-05-25
Custom SIM card in Tesla Model 3 2024, Tesla Model Y 2025 and Cybertruck (olegkutkov.me)
Source: Hacker News | 12-05-25
OpenAI adds new fine-tuning options for o4-mini and GPT-4.1
Source: The Decoder | 11-05-25
Software Development Life Cycle Perspective: A Survey of Benchmarks for CodeLLMs and Agents
Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi |
Source: ArXiv AI | 11-05-25
T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction
Authors: Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Qian Li, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu |
Source: ArXiv AI | 11-05-25
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
Authors: Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger |
Source: ArXiv AI | 11-05-25
Incentive-Aware Machine Learning; Robustness, Fairness, Improvement & Causality
Authors: Chara Podimata |
Source: ArXiv AI | 11-05-25
High-fidelity Grain Growth Modeling: Leveraging Deep Learning for Fast Computations
Authors: Pungponhavoan Tep, Marc Bernacki |
Source: ArXiv AI | 11-05-25
Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks
Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghua Guo |
Source: ArXiv AI | 11-05-25
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao |
Source: ArXiv AI | 11-05-25
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Authors: Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang |
Source: ArXiv AI | 11-05-25
TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering
Authors: Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger |
Source: ArXiv AI | 11-05-25
Large Language Models are Autonomous Cyber Defenders
Authors: Sebastián R. Castro, Roberto Campbell, Nancy Lau, Octavio Villalobos, Jiaqi Duan, Alvaro A. Cardenas |
Source: ArXiv AI | 11-05-25
The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems
Authors: Sutapa Dey Tithi, Arun Kumar Ramesh, Clara DiMarco, Xiaoyi Tian, Nazia Alam, Kimia Fazeli, Tiffany Barnes |
Source: ArXiv AI | 11-05-25
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
Authors: Jaeho Kim, Yunseok Lee, Seulki Lee |
Source: ArXiv AI | 11-05-25
Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know
Authors: Shireen Kudukkil Manchingal, Fabio Cuzzolin |
Source: ArXiv AI | 11-05-25
A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons
Authors: Siyue Ren, Wanli Fu, Xinkun Zou, Chen Shen, Yi Cai, Chen Chu, Zhen Wang, Shuyue Hu |
Source: ArXiv AI | 11-05-25
Is there a half-life for the success rates of AI agents?
Authors: Toby Ord |
Source: ArXiv AI | 11-05-25
Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation
Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli |
Source: ArXiv AI | 11-05-25
A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
Authors: Stefanos Gkikas |
Source: ArXiv AI | 11-05-25
ZeroSearch: Alibaba trains search assistant in AI simulation
Source: The Decoder | 11-05-25
Show HN: Code Claude Code (github.com/rvca212)
Source: Hacker News | 11-05-25
LTXVideo 13B AI video generation (ltxv.video)
Source: Hacker News | 10-05-25
ChatGPT's user base expands while established web giants lose ground
Source: The Decoder | 10-05-25
Hugging Face unveils experimental AI agent for computers
Source: The Decoder | 10-05-25
OpenAI plans "cderGPT" for the US Food and Drug Administration (FDA)
Source: The Decoder | 10-05-25
Odin, a Pragmatic C Alternative with a Go Flavour (bitshifters.cc)
Source: Hacker News | 10-05-25
Fighting Unwanted Notifications with Machine Learning in Chrome (chromium.org)
Source: Hacker News | 10-05-25
Microsoft leverages Google's open A2A protocol for interoperable AI agents
Source: The Decoder | 09-05-25
A flat pricing subscription for Claude Code (anthropic.com)
Source: Hacker News | 09-05-25
Ciro (YC S22) is hiring a software engineer to build AI agents for sales (ycombinator.com)
Source: Hacker News | 09-05-25
Notes on rolling out Cursor and Claude Code (ghiculescu.substack.com)
Source: Hacker News | 09-05-25
OpenAI launches a program to partner with governments on global AI infrastructure
Source: The Decoder | 08-05-25
EU's leading AI startup Mistral unveils Medium 3 and Le Chat Enterprise
Source: The Decoder | 08-05-25
By 2026, most firms expect to have a Chief AI Officer on staff
Source: The Decoder | 08-05-25
Web search on the Anthropic API (anthropic.com)
Source: Hacker News | 08-05-25
Create and edit images with Gemini 2.0 in preview (googleblog.com)
Source: Hacker News | 08-05-25
Mistral ships Le Chat – enterprise AI assistant that can run on prem (mistral.ai)
Source: Hacker News | 08-05-25
Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning
Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia |
Source: ArXiv AI | 08-05-25
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Authors: Moseli Mots'oehli, Hope Mogale, Kyungim Baek |
Source: ArXiv AI | 08-05-25
Multi-Granular Attention based Heterogeneous Hypergraph Neural Network
Authors: Hong Jin, Kaicheng Zhou, Jie Yin, Lan You, Zhifeng Zhou |
Source: ArXiv AI | 08-05-25
Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing
Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan |
Source: ArXiv AI | 08-05-25
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu |
Source: ArXiv AI | 08-05-25
The Aloe Family Recipe for Open and Specialized Healthcare LLMs
Authors: Dario Garcia-Gasulla, Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés |
Source: ArXiv AI | 08-05-25
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Authors: Ziyi Zhang, Zhen Sun, Zongmin Zhang, Zifan Peng, Yuemeng Zhao, Zichun Wang, Zeren Luo, Ruiting Zuo, Xinlei He |
Source: ArXiv AI | 08-05-25
Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform
Authors: Yohannis Telila, Tommaso Cucinotta, Davide Bacciu |
Source: ArXiv AI | 08-05-25
Model-Based AI planning and Execution Systems for Robotics
Authors: Or Wertheim, Ronen I. Brafman |
Source: ArXiv AI | 08-05-25
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors: Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter, Raghav Awasthi, Soumya Banerjee, Joe M. Barnby, Rhea Basappa, Severin Bergsmann, Djallel Bouneffouf, Patrick Callaghan, Marc Cavazza, Thierry Chaminade, Sonia Chernova, Mohamed Chetouan, Moumita Choudhury, Axel Cleeremans, Jacek B. Cywinski, Fabio Cuzzolin, Hokin Deng, N'yoma Diamond, Camilla Di Pasquasio, Guillaume Dumas, Max van Duijn, Mahapatra Dwarikanath, Qingying Gao, Ashok Goel, Rebecca Goldstein, Matthew Gombolay, Gabriel Enrique Gonzalez, Amar Halilovic, Tobias Halmdienst, Mahimul Islam, Julian Jara-Ettinger, Natalie Kastel, Renana Keydar, Ashish K. Khanna, Mahdi Khoramshahi, JiHyun Kim, MiHyeon Kim, YoungBin Kim, Senka Krivic, Nikita Krasnytskyi, Arun Kumar, JuneHyoung Kwon, Eunju Lee, Shane Lee, Peter R. Lewis, Xue Li, Yijiang Li, Michal Lewandowski, Nathan Lloyd, Matthew B. Luebbers, Dezhi Luo, Haiyun Lyu, Dwarikanath Mahapatra, Kamal Maheshwari, Mallika Mainali, Piyush Mathur, Patrick Mederitsch, Shuwa Miura, Manuel Preston de Miranda, Reuth Mirsky, Shreya Mishra, Nina Moorman, Katelyn Morrison, John Muchovej, Bernhard Nessler, Felix Nessler, Hieu Minh Jord Nguyen, Abby Ortego, Francis A. Papay, Antoine Pasquali, Hamed Rahimi, Charumathi Raghu, Amanda Royka, Stefan Sarkadi, Jaelle Scheuerman, Simon Schmid, Paul Schrater, Anik Sen, Zahra Sheikhbahaee, Ke Shi, Reid Simmons, Nishant Singh, Mason O. Smith, Ramira van der Meulen, Anthia Solaki, Haoran Sun, Viktor Szolga, Matthew E. Taylor, Travis Taylor, Sanne Van Waveren, Juan David Vargas |
Source: ArXiv AI | 08-05-25
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Authors: Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, Pheng-Ann Heng |
Source: ArXiv AI | 08-05-25
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
Authors: Wenjun Cao |
Source: ArXiv AI | 08-05-25
The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete
Authors: Gerrit Großmann, Larisa Ivanova, Sai Leela Poduru, Mohaddeseh Tabrizian, Islam Mesabah, David A. Selby, Sebastian J. Vollmer |
Source: ArXiv AI | 08-05-25
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, Meiyi Ma |
Source: ArXiv AI | 08-05-25
TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution
Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |
Source: ArXiv AI | 08-05-25
ChatGPT sees about 50 percent more use on weekdays than weekends
Source: The Decoder | 08-05-25
OpenAI restructures as public benefit corporation under non-profit control
Source: The Decoder | 08-05-25
Google upgrades Gemini 2.5 Pro for coding and app development
Source: The Decoder | 08-05-25
Wikidive – AI guided rabbitholes in Wikipedia (wikidive.tulv.in)
Source: Hacker News | 08-05-25
How to Average in Prolog (2017) (storytotell.org)
Source: Hacker News | 08-05-25
Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis
Authors: Fouad Trad, Ali Chehab |
Source: ArXiv AI | 07-05-25
An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation
Authors: Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo |
Source: ArXiv AI | 07-05-25
Blending 3D Geometry and Machine Learning for Multi-View Stereopsis
Authors: Vibhas Vats, Md. Alimoor Reza, David Crandall, Soon-heung Jung |
Source: ArXiv AI | 07-05-25
Rapid AI-based generation of coverage paths for dispensing applications
Authors: Simon Baeuerle, Ian F. Mendonca, Kristof Van Laerhoven, Ralf Mikut, Andreas Steimer |
Source: ArXiv AI | 07-05-25
LlamaFirewall: An open source guardrail system for building secure AI agents
Authors: Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, Joshua Saxe |
Source: ArXiv AI | 07-05-25
Holmes: Automated Fact Check with Large Language Models
Authors: Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang |
Source: ArXiv AI | 07-05-25
Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE
Authors: Brendan Campbell, Alan Williams, Kleio Baxevani, Alyssa Campbell, Rushabh Dhoke, Rileigh E. Hudock, Xiaomin Lin, Vivek Mange, Bernhard Neuberger, Arjun Suresh, Alhim Vera, Arthur Trembanis, Herbert G. Tanner, Edward Hale |
Source: ArXiv AI | 07-05-25
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
Authors: Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu |
Source: ArXiv AI | 07-05-25
Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces
Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Nicolas König, Felix Gehlhoff, Alexander Fay |
Source: ArXiv AI | 07-05-25
RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation
Authors: Tiantian Gan, Qiyao Sun |
Source: ArXiv AI | 07-05-25
Validating the Effectiveness of a Large Language Model-based Approach for Identifying Children's Development across Various Free Play Settings in Kindergarten
Authors: Yuanyuan Yang, Yuan Shen, Tianchen Sun, Yangbin Xie |
Source: ArXiv AI | 07-05-25
Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents
Authors: Schaun Wheeler, Olivier Jeunen |
Source: ArXiv AI | 07-05-25
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang |
Source: ArXiv AI | 07-05-25
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
Authors: Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan |
Source: ArXiv AI | 07-05-25
Graph Drawing for LLMs: An Empirical Evaluation
Authors: Walter Didimo, Fabrizio Montecchiani, Tommaso Piselli |
Source: ArXiv AI | 07-05-25
Accents in latent spaces: How AI hears accent strength in English (boldvoice.com)
Source: Hacker News | 07-05-25
Gemini 2.5 Pro Preview (googleblog.com)
Source: Hacker News | 07-05-25
Claude's system prompt is over 24k tokens with tools (github.com/asgeirtj)
Source: Hacker News | 07-05-25
OpenAI reaches agreement to buy Windsurf for $3B (bloomberg.com)
Source: Hacker News | 07-05-25
Show HN: Clippy – 90s UI for local LLMs (felixrieseberg.github.io)
Source: Hacker News | 07-05-25
I built an AI code review agent in a few hours, here's what I learned (sourcebot.dev)
Source: Hacker News | 07-05-25
A coherent European/non-US cloud strategy (berthub.eu)
Source: Hacker News | 07-05-25