Daily AI News by Homer

by Homer LYJIEBOX@QQ.COM

Microsoft brings Copilot LLM features directly into Excel spreadsheet cells with a new in-cell function

Source: The Decoder | 21-08-25

Show HN: I replaced vector databases with Git for AI memory (PoC) (github.com/growth-kinetics)

Source: Hacker News | 21-08-25

AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders (theregister.com)

Source: Hacker News | 21-08-25

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard' (theregister.com)

Source: Hacker News | 21-08-25

Mark Zuckerberg freezes AI hiring amid bubble fears (telegraph.co.uk)

Source: Hacker News | 21-08-25

Weaponizing image scaling against production AI systems (trailofbits.com)

Source: Hacker News | 21-08-25

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Authors: NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Dima Rekesh, Dina Yared, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Eileen Long, Elliott Ning, Eric Chung, Erick Galinkin, Evelina Bakhturina, Gargi Prasad, Gerald Shen, Haim Elisha, Harsh Sharma, Hayley Ross, Helen Ngo, Herman Sahota, Hexin Wang, Hoo Chang Shin, Hua Huang, Iain Cunningham, Igor Gitman, Ivan Moshkov, Jaehun Jung, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jimmy Zhang, Jinze Xue, Jocelyn Huang, Joey Conway, John Kamalu, Jonathan Cohen, Joseph Jennings, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kezhi Kong, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Kushan Ahmadian, Lawrence McAfee |

Source: ArXiv AI | 21-08-25

Post-hoc LLM-Supported Debugging of Distributed Processes

Authors: Dennis Schiese, Andreas Both |

Source: ArXiv AI | 21-08-25

Towards LLM-generated explanations for Component-based Knowledge Graph Question Answering Systems

Authors: Dennis Schiese, Aleksandr Perevalov, Andreas Both |

Source: ArXiv AI | 21-08-25

Adaptively Robust LLM Inference Optimization under Prediction Uncertainty

Authors: Zixi Chen, Yinyu Ye, Zijie Zhou |

Source: ArXiv AI | 21-08-25

Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination

Authors: João Vitor de Carvalho Silva, Douglas G. Macharet |

Source: ArXiv AI | 21-08-25

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Authors: Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang |

Source: ArXiv AI | 21-08-25

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Authors: Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang |

Source: ArXiv AI | 21-08-25

Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use

Authors: Zhongzhou Chen |

Source: ArXiv AI | 21-08-25

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

Authors: Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban |

Source: ArXiv AI | 21-08-25

TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting

Authors: Jiaming Leng, Yunying Bi, Chuan Qin, Bing Yin, Yanyong Zhang, Chao Wang |

Source: ArXiv AI | 21-08-25

From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning

Authors: Lixiang Yan |

Source: ArXiv AI | 21-08-25

Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli

Authors: Mattson Ogg, Chace Ashcraft, Ritwik Bose, Raphael Norman-Tenazas, Michael Wolmetz |

Source: ArXiv AI | 21-08-25

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun |

Source: ArXiv AI | 21-08-25

The Agent Behavior: Model, Governance and Challenges in the AI Digital Age

Authors: Qiang Zhang, Pei Yan, Yijia Xu, Chuanpo Fu, Yong Fang, Yang Liu |

Source: ArXiv AI | 21-08-25

Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning

Authors: Beinuo Yang, Qishen Zhou, Junyi Li, Xingchen Su, Simon Hu |

Source: ArXiv AI | 21-08-25

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

Authors: Luca Annese, Sabrina Patania, Silvia Serino, Tom Foulsham, Silvia Rossi, Azzurra Ruggeri, Dimitri Ognibene |

Source: ArXiv AI | 21-08-25

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Authors: Ziyang Luo, Zhiqi Shen, Wenzhuo Yang, Zirui Zhao, Prathyusha Jwalapuram, Amrita Saha, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Junnan Li |

Source: ArXiv AI | 21-08-25

Entropy-Constrained Strategy Optimization in Urban Floods: A Multi-Agent Framework with LLM and Knowledge Graph Integration

Authors: Peilin Ji, Xiao Xue, Simeng Wang, Wenhao Yan |

Source: ArXiv AI | 21-08-25

Warnings about runaway expectations are growing louder throughout the AI industry

Source: The Decoder | 21-08-25

Visualizing GPT-OSS-20B embeddings (melonmars.github.io)

Source: Hacker News | 21-08-25

Gaussian Processes for Machine Learning (2006) [pdf] (gaussianprocess.org)

Source: Hacker News | 20-08-25

Show HN: Claude Code workflow: PRDs → GitHub Issues → parallel execution (github.com/automazeio)

Source: Hacker News | 20-08-25

ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery

Authors: Mohammad Izadi, Mehran Safayani |

Source: ArXiv AI | 20-08-25

Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun |

Source: ArXiv AI | 20-08-25

Ask Good Questions for Large Language Models

Authors: Qi Wu, Zhongqi Lu |

Source: ArXiv AI | 20-08-25

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation

Authors: Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee |

Source: ArXiv AI | 20-08-25

Cognitive Workspace: Active Memory Management for LLMs -- An Empirical Study of Functional Infinite Context

Authors: Tao An |

Source: ArXiv AI | 20-08-25

Towards Unified Multimodal Financial Forecasting: Integrating Sentiment Embeddings and Market Indicators via Cross-Modal Attention

Authors: Sarthak Khanna, Armin Berger, David Berghaus, Tobias Deusser, Lorenz Sparrenberg, Rafet Sifa |

Source: ArXiv AI | 20-08-25

"DIVE" into Hydrogen Storage Materials Discovery with AI Agents

Authors: Di Zhang, Xue Jia, Tran Ba Hung, Seong Hoon Jang, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Toyoto Sato, Kiyoe Konno, Shin-ichi Orimo, Hao Li |

Source: ArXiv AI | 20-08-25

HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design

Authors: Chentong Chen, Mengyuan Zhong, Jianyong Sun, Ye Fan, Jialong Shi |

Source: ArXiv AI | 20-08-25

STPFormer: A State-of-the-Art Pattern-Aware Spatio-Temporal Transformer for Traffic Forecasting

Authors: Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao |

Source: ArXiv AI | 20-08-25

Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance

Authors: Yue Fang, Yuxin Guo, Jiaran Gao, Hongxin Ding, Xinke Jiang, Weibin Liao, Yongxin Xu, Yinghao Zhu, Zhibang Yang, Liantao Ma, Junfeng Zhao, Yasha Wang |

Source: ArXiv AI | 20-08-25

Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models

Authors: Xiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai, Yu-Feng Li |

Source: ArXiv AI | 20-08-25

MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model

Authors: Yu Li, Zulong Chen, Wenjian Xu, Hong Wen, Yipeng Yu, Man Lung Yiu, Yuyu Yin |

Source: ArXiv AI | 20-08-25

CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning

Authors: Minh Hoang Nguyen, Van Dai Do, Dung Nguyen, Thin Nguyen, Hung Le |

Source: ArXiv AI | 20-08-25

Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration

Authors: Yifei Chen, Guanting Dong, Yutao Zhu, Zhicheng Dou |

Source: ArXiv AI | 20-08-25

Expertise-aware Multi-LLM Recruitment and Collaboration for Medical Decision-Making

Authors: Liuxin Bao, Zhihao Peng, Xiaofei Zhou, Runmin Cong, Jiyong Zhang, Yixuan Yuan |

Source: ArXiv AI | 20-08-25

Improved Generalized Planning with LLMs through Strategy Refinement and Reflection

Authors: Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz, Alexander Koller |

Source: ArXiv AI | 20-08-25

The Collaboration Paradox: Why Generative AI Requires Both Strategic Intelligence and Operational Stability in Supply Chain Management

Authors: Soumyadeep Dhar |

Source: ArXiv AI | 20-08-25

Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback

Authors: Yihao Ang, Yifan Bao, Lei Jiang, Jiajie Tao, Anthony K. H. Tung, Lukasz Szpruch, Hao Ni |

Source: ArXiv AI | 20-08-25

ChronoLLM: Customizing Language Models for Physics-Based Simulation Code Generation

Authors: Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, Dan Negrut |

Source: ArXiv AI | 20-08-25

Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life (github.com/openai)

Source: Hacker News | 20-08-25

Richard Sutton says the AI industry has "lost its way" by ignoring core principles of intelligence

Source: The Decoder | 20-08-25

Show HN: We started building an AI dev tool but it turned into a Sims-style game (youtube.com)

Source: Hacker News | 19-08-25

Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models

Authors: Yuan Li, Zhengzhong Liu, Eric Xing |

Source: ArXiv AI | 19-08-25

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Authors: Zhiyuan Zeng, Jiashuo Liu, Siyuan Chen, Tianci He, Yali Liao, Jinpeng Wang, Zaiyuan Wang, Yang Yang, Lingyue Yin, Mingren Yin, Zhenwei Zhu, Tianle Cai, Zehui Chen, Jiecao Chen, Yantao Du, Xiang Gao, Jiacheng Guo, Liang Hu, Jianpeng Jiao, Xiangsheng Li, Jingkai Liu, Shuang Ni, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xin Zhou, Jose Blanchet, Xipeng Qiu, Mengdi Wang, Wenhao Huang |

Source: ArXiv AI | 19-08-25

MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization

Authors: Haochen You, Baojing Liu |

Source: ArXiv AI | 19-08-25

GraphCogent: Overcoming LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding

Authors: Rongzheng Wang, Qizhi Chen, Yihong Huang, Yizhuo Ma, Muquan Li, Jiakai Li, Ke Qin, Guangchun Luo, Shuang Liang |

Source: ArXiv AI | 19-08-25

GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?

Authors: Yifang Tian, Yaming Liu, Zichun Chong, Zihang Huang, Hans-Arno Jacobsen |

Source: ArXiv AI | 19-08-25

An LLM + ASP Workflow for Joint Entity-Relation Extraction

Authors: Trang Tran, Trung Hoang Le, Huiping Cao, Tran Cao Son |

Source: ArXiv AI | 19-08-25

Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models

Authors: Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li |

Source: ArXiv AI | 19-08-25

GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance

Authors: Jinquan Shi, Yingying Cheng, Fan Zhang, Miao Jiang, Jun Lin, Yanbai Shen |

Source: ArXiv AI | 19-08-25

The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning

Authors: Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang |

Source: ArXiv AI | 19-08-25

Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants

Authors: Alessio Galatolo, Luca Alberto Rappuoli, Katie Winkle, Meriem Beloucif |

Source: ArXiv AI | 19-08-25

GTool: Graph Enhanced Tool Planning with Large Language Model

Authors: Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi |

Source: ArXiv AI | 19-08-25

Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption

Authors: Faruk Alpay, Taylan Alpay |

Source: ArXiv AI | 19-08-25

E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model

Authors: Ronghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu, Yap-peng Tan |

Source: ArXiv AI | 19-08-25

Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards

Authors: Ting Yang, Li Chen, Huimin Wang |

Source: ArXiv AI | 19-08-25

Do Large Language Model Agents Exhibit a Survival Instinct? An Empirical Study in a Sugarscape-Style Simulation

Authors: Atsushi Masumori, Takashi Ikegami |

Source: ArXiv AI | 19-08-25

Tencent's X-Omni uses open source components to challenge GPT-4o image generation

Source: The Decoder | 18-08-25

ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection

Authors: Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu |

Source: ArXiv AI | 18-08-25

LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought

Authors: Ruiyan Qi, Congding Wen, Weibo Zhou, Shangsong Liang, Lingbo Li |

Source: ArXiv AI | 18-08-25

Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas

Authors: Francesco Sovrano, Gabriele Dominici, Rita Sevastjanova, Alessandra Stramiglio, Alberto Bacchelli |

Source: ArXiv AI | 18-08-25

Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks

Authors: Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen |

Source: ArXiv AI | 18-08-25

CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems

Authors: Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui |

Source: ArXiv AI | 18-08-25

Leveraging the RETFound foundation model for optic disc segmentation in retinal images

Authors: Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco |

Source: ArXiv AI | 18-08-25

ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism

Authors: Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao |

Source: ArXiv AI | 18-08-25

When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

Authors: Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov |

Source: ArXiv AI | 18-08-25

Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis

Authors: Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, Xin Sun, Junxiao Wang |

Source: ArXiv AI | 18-08-25

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Authors: Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou |

Source: ArXiv AI | 18-08-25

Reference Points in LLM Sentiment Analysis: The Role of Structured Context

Authors: Junichiro Niimi |

Source: ArXiv AI | 18-08-25

Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies

Authors: Fanzhen Liu, Xiaoxiao Ma, Jian Yang, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Quan Z. Sheng, Jia Wu |

Source: ArXiv AI | 18-08-25

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models

Authors: Erez Meoded |

Source: ArXiv AI | 18-08-25

A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow

Authors: George Paterakis, Andrea Castellani, George Papoutsoglou, Tobias Rodemann, Ioannis Tsamardinos |

Source: ArXiv AI | 18-08-25

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Authors: Zhihao Li, Zimo Ji, Tao Zheng, Hao Ren, Xiao Lan |

Source: ArXiv AI | 18-08-25

Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

Authors: Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che |

Source: ArXiv AI | 18-08-25

Controlling Multimodal LLMs via Reward-guided Decoding

Authors: Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal |

Source: ArXiv AI | 18-08-25

Is ChatGPT-5 Ready for Mammogram VQA?

Authors: Qiang Li, Shansong Wang, Mingzhe Hu, Mojtaba Safari, Zachary Eidex, Xiaofeng Yang |

Source: ArXiv AI | 18-08-25

SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding

Authors: Yifei Li, Lingling Zhang, Hang Yan, Tianzhe Zhao, Zihan Ma, Muye Huang, Jun Liu |

Source: ArXiv AI | 18-08-25

AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager

Authors: Xuhua Zhao, Yuxuan Xie, Caihua Chen, Yuxiang Sun |

Source: ArXiv AI | 18-08-25

Inspire or Predict? Exploring New Paradigms in Assisting Classical Planners with Large Language Models

Authors: Wenkai Yu, Jianhang Tang, Yang Zhang, Shanjiang Tang, Kebing Jin, Hankz Hankui Zhuo |

Source: ArXiv AI | 18-08-25

LLMs and coding agents are a security nightmare (garymarcus.substack.com)

Source: Hacker News | 18-08-25

Llama-Scan: Convert PDFs to Text W Local LLMs (github.com/ngafar)

Source: Hacker News | 18-08-25

When you're asking AI chatbots for answers, they're data-mining you (theregister.com)

Source: Hacker News | 18-08-25

Claudia – Desktop companion for Claude code (claudiacode.com)

Source: Hacker News | 18-08-25

Teaching GPT-5 to Use a Computer (prava.co)

Source: Hacker News | 18-08-25

Here be dragons: Preventing static damage, latchup, and metastability in the 386 (righto.com)

Source: Hacker News | 18-08-25

Warmer-sounding LLMs are more likely to repeat false information and conspiracy theories

Source: The Decoder | 18-08-25

Performance of GPT-5 in Brain Tumor MRI Reasoning

Authors: Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang |

Source: ArXiv AI | 17-08-25

From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms

Authors: Zhaokun Jiang, Ziyin Zhang |

Source: ArXiv AI | 17-08-25

A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots

Authors: Henry Powell, Guy Laban, Emily S. Cross |

Source: ArXiv AI | 17-08-25

TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning

Authors: Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra |

Source: ArXiv AI | 17-08-25

Searching for Privacy Risks in LLM Agents via Simulation

Authors: Yanzhe Zhang, Diyi Yang |

Source: ArXiv AI | 17-08-25

Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development

Authors: Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna |

Source: ArXiv AI | 17-08-25

A Survey of Optimization Modeling Meets LLMs: Progress and Future Directions

Authors: Ziyang Xiao, Jingrong Xie, Lilin Xu, Shisi Guan, Jingyan Zhu, Xiongwei Han, Xiaojin Fu, WingYin Yu, Han Wu, Wei Shi, Qingcan Kang, Jiahui Duan, Tao Zhong, Mingxuan Yuan, Jia Zeng, Yuan Wang, Gang Chen, Dongxiang Zhang |

Source: ArXiv AI | 17-08-25

KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems

Authors: Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman |

Source: ArXiv AI | 17-08-25

Agentic AI Frameworks: Architectures, Protocols, and Design Challenges

Authors: Hana Derouiche, Zaki Brahmi, Haithem Mazeni |

Source: ArXiv AI | 17-08-25

Why Cannot Large Language Models Ever Make True Correct Reasoning?

Authors: Jingde Cheng |

Source: ArXiv AI | 17-08-25

Extending the Entropic Potential of Events for Uncertainty Quantification and Decision-Making in Artificial Intelligence

Authors: Mark Zilberman |

Source: ArXiv AI | 17-08-25

What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles

Authors: Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu |

Source: ArXiv AI | 17-08-25

A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering

Authors: Chenliang Zhang, Lin Wang, Yuanyuan Lu, Yusheng Qi, Kexin Wang, Peixu Hou, Wenshi Chen |

Source: ArXiv AI | 17-08-25

HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation

Authors: Yan Ting Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang |

Source: ArXiv AI | 17-08-25

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

Authors: Yaoze Zhang, Rong Wu, Pinlong Cai, Xiaoman Wang, Guohang Yan, Song Mao, Ding Wang, Botian Shi |

Source: ArXiv AI | 17-08-25

Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model

Authors: Shicheng Xu, Xin Huang, Zihao Wei, Liang Pang, Huawei Shen, Xueqi Cheng |

Source: ArXiv AI | 17-08-25

SEQ-GPT: LLM-assisted Spatial Query via Example

Authors: Ivan Khai Ze Lim, Ningyi Liao, Yiming Yang, Gerald Wei Yong Yip, Siqiang Luo |

Source: ArXiv AI | 17-08-25

FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs

Authors: Xueli Pan, Victor de Boer, Jacco van Ossenbruggen |

Source: ArXiv AI | 17-08-25

MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models

Authors: Xinyan Jiang, Lin Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu |

Source: ArXiv AI | 17-08-25

GenOM: Ontology Matching with Description Generation and Large Language Model

Authors: Yiping Song, Jiaoyan Chen, Renate A. Schmidt |

Source: ArXiv AI | 17-08-25

Modeling Human Responses to Multimodal AI Content

Authors: Zhiqi Shen, Shaojing Fan, Danni Xu, Terence Sim, Mohan Kankanhalli |

Source: ArXiv AI | 17-08-25

Who Benefits from AI Explanations? Towards Accessible and Interpretable Systems

Authors: Maria J. P. Peixoto, Akriti Pandey, Ahsan Zaman, Peter R. Lewis |

Source: ArXiv AI | 17-08-25

The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference

Authors: Maël Jullien, Marco Valentino, André Freitas |

Source: ArXiv AI | 17-08-25

Tversky Neural Networks (gonzoml.substack.com)

Source: Hacker News | 17-08-25

A Lisp in 99LOC (github.com/robert-van-engelen)

Source: Hacker News | 17-08-25

Dyna – Logic Programming for Machine Learning (dyna.org)

Source: Hacker News | 17-08-25

OpenAI Misled You on RLHF (aerial-toothpaste-34a.notion.site)

Source: Hacker News | 17-08-25

OpenAI Progress (progress.openai.com)

Source: Hacker News | 17-08-25

OpenAI CEO Sam Altman says human-made content will "go up in value dramatically"

Source: The Decoder | 17-08-25

Google unveils Gemma 3 270M, its most compact model designed for efficient, task-specific AI use

Source: The Decoder | 17-08-25

Monday – A personality experiment (chatgpt.com)

Source: Hacker News | 17-08-25

Zhipu AI's GLM-4.5 is yet another open-source Chinese LLM closing the gap with Western models

Source: The Decoder | 17-08-25

Launch HN: Embedder (YC S25) – Claude code for embedded software

Source: Hacker News | 16-08-25

Geoffrey Hinton urges researchers to design AI with nurturing instincts to protect humanity

Source: The Decoder | 16-08-25

HTC unveils VIVE Eagle, a lightweight AI headset powered by OpenAI and Gemini

Source: The Decoder | 16-08-25

I let LLMs write an Elixir NIF in C; it mostly worked (overbring.com)

Source: Hacker News | 16-08-25

Claude Opus 4 and 4.1 can now end a rare subset of conversations (anthropic.com)

Source: Hacker News | 16-08-25

OpenAI's o3 model outperforms the newer GPT-5 model on complex, multi-app office tasks

Source: The Decoder | 16-08-25

Apple is reportedly planning an AI push with four new smart home products

Source: The Decoder | 15-08-25

Doctors detected fewer lesions after routinely using AI during colonoscopies

Source: The Decoder | 15-08-25

A conversation with Max Tegmark inspired xAI co-founder Igor Babuschkin's shift to safer AI

Source: The Decoder | 15-08-25

Why LLMs can't really build software (zed.dev)

Source: Hacker News | 15-08-25

Is chain-of-thought AI reasoning a mirage? (seangoedecke.com)

Source: Hacker News | 15-08-25

OpenAI's AI system wins a gold medal-level score at the International Olympiad in Informatics 2025

Source: The Decoder | 14-08-25

ChatGPT users can now toggle Auto, Fast, and Thinking modes for more control over GPT-5

Source: The Decoder | 14-08-25

Show HN: Vaultrice – A real-time key-value store with a localStorage API (vaultrice.com)

Source: Hacker News | 14-08-25

Convo-Lang: LLM Programming Language and Runtime (convo-lang.ai)

Source: Hacker News | 14-08-25

Show HN: Yet another memory system for LLMs (github.com/trvon)

Source: Hacker News | 14-08-25

Mbodi AI (YC X25) Is Hiring a Founding Research Engineer (Robotics) (ycombinator.com)

Source: Hacker News | 14-08-25

What's the strongest AI model you can train on a laptop in five minutes? (seangoedecke.com)

Source: Hacker News | 14-08-25

Evaluating the Role of Large Language Models in Legal Practice in India

Authors: Rahul Hemrajani (National Law School of India University, Bengaluru) |

Source: ArXiv AI | 14-08-25

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

Authors: Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci |

Source: ArXiv AI | 14-08-25

Enhance the machine learning algorithm performance in phishing detection with keyword features

Authors: Zijiang Yang |

Source: ArXiv AI | 14-08-25

A Comprehensive Survey of Datasets for Clinical Mental Health AI Systems

Authors: Aishik Mandal, Prottay Kumar Adhikary, Hiba Arnaout, Iryna Gurevych, Tanmoy Chakraborty |

Source: ArXiv AI | 14-08-25

LibRec: Benchmarking Retrieval-Augmented LLMs for Library Migration Recommendations

Authors: Junxiao Han, Yarong Wang, Xiaodong Gu, Cuiyun Gao, Yao Wan, Song Han, David Lo, Shuiguang Deng |

Source: ArXiv AI | 14-08-25

Perceptual Reality Transformer: Neural Architectures for Simulating Neurological Perception Conditions

Authors: Baihan Lin |

Source: ArXiv AI | 14-08-25

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng |

Source: ArXiv AI | 14-08-25

Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification

Authors: Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam |

Source: ArXiv AI | 14-08-25

Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Authors: Arjun Ashok, Andrew Robert Williams, Vincent Zhihao Zheng, Irina Rish, Nicolas Chapados, Étienne Marcotte, Valentina Zantedeschi, Alexandre Drouin |

Source: ArXiv AI | 14-08-25

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

Authors: Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin |

Source: ArXiv AI | 14-08-25

STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports

Authors: Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti |

Source: ArXiv AI | 14-08-25

A Comprehensive Evaluation framework of Alignment Techniques for LLMs

Authors: Muneeza Azmat, Momin Abbas, Maysa Malfiza Garcia de Macedo, Marcelo Carpinette Grave, Luan Soares de Souza, Tiago Machado, Rogerio A de Paula, Raya Horesh, Yixin Chen, Heloisa Caroline de Souza Pereira Candello, Rebecka Nordenlow, Aminat Adebiyi |

Source: ArXiv AI | 14-08-25

The Othello AI Arena: Evaluating Intelligent Systems Through Limited-Time Adaptation to Unseen Boards

Authors: Sundong Kim |

Source: ArXiv AI | 14-08-25

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Authors: Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li |

Source: ArXiv AI | 14-08-25

The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?

Authors: Manuel Herrador |

Source: ArXiv AI | 14-08-25

UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge

Authors: Yang Zhang, Cunxiang Wang, Lindong Wu, Wenbo Yu, Yidong Wang, Guangsheng Bao, Jie Tang |

Source: ArXiv AI | 14-08-25

RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA

Authors: Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, Viktoria Rojkova |

Source: ArXiv AI | 14-08-25

Mathematical Computation and Reasoning Errors by Large Language Models

Authors: Liang Zhang, Edith Aurora Graf |

Source: ArXiv AI | 14-08-25

Claude says “You're absolutely right!” about everything (github.com/anthropics)

Source: Hacker News | 14-08-25

Illinois bans use of artificial intelligence for mental health therapy (washingtonpost.com)

Source: Hacker News | 14-08-25

Nvidia researchers urge the AI industry to rethink agentic AI in favor of smaller, more efficient LLMs

Source: The Decoder | 13-08-25

Nvidia pushes "Physical AI" with new Blackwell hardware and AI models

Source: The Decoder | 13-08-25

Psychiatrist warns of AI-driven delusions as OpenAI's Sam Altman admits risks

Source: The Decoder | 13-08-25

GPT-5 is here and Gary Marcus is not impressed

Source: The Decoder | 13-08-25

Nvidia and AMD must pay the U.S. a portion of revenue for selling AI chips in China

Source: The Decoder | 13-08-25

A Comprehensive Survey of Self-Evolving AI Agents [pdf] (arxiv.org)

Source: Hacker News | 13-08-25

Show HN: Omnara – Run Claude Code from anywhere (github.com/omnara-ai)

Source: Hacker News | 13-08-25

Show HN: Building a web search engine from scratch with 3B neural embeddings (blog.wilsonl.in)

Source: Hacker News | 13-08-25

His psychosis was a mystery–until doctors learned about ChatGPT's health advice (psypost.org)

Source: Hacker News | 13-08-25

Claude Sonnet 4 now supports 1M tokens of context (anthropic.com)

Source: Hacker News | 13-08-25

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Authors: Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang Yu, Lionel M. Ni, Heung-Yeung Shum |

Read more

Source: ArXiv AI | 13-08-25

Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams

Authors: Zane Witherspoon, Thet Mon Aye, YingYing Hao |

Read more

Source: ArXiv AI | 13-08-25

UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games

Authors: Timo Bertram |

Read more

Source: ArXiv AI | 13-08-25

What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge

Authors: Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab |

Read more

Source: ArXiv AI | 13-08-25

First Ask Then Answer: A Framework Design for AI Dialogue Based on Supplementary Questioning with Large Language Models

Authors: Chuanruo Fu, Yuncheng Du |

Read more

Source: ArXiv AI | 13-08-25

LLM-BI: Towards Fully Automated Bayesian Inference with Large Language Models

Authors: Yongchao Huang |

Read more

Source: ArXiv AI | 13-08-25

Topos Theory for Generative AI and LLMs

Authors: Sridhar Mahadevan |

Read more

Source: ArXiv AI | 13-08-25

POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem

Authors: Szymon Jakubicz, Karol Kuźniak, Jan Wawszczak, Paweł Gora |

Read more

Source: ArXiv AI | 13-08-25

AgriGPT: a Large Language Model Ecosystem for Agriculture

Authors: Bo Yang, Yu Zhang, Lanfei Feng, Yunkui Chen, Jianyu Zhang, Xiao Xu, Nueraili Aierken, Yurui Li, Yuxuan Chen, Guijun Yang, Yong He, Runhe Huang, Shijian Li |

Read more

Source: ArXiv AI | 13-08-25

SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering

Authors: Arshia Ilaty, Hossein Shirazi, Hajar Homayouni |

Read more

Source: ArXiv AI | 13-08-25

GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games

Authors: Yuchen Li, Cong Lin, Muhammad Umair Nasir, Philip Bontrager, Jialin Liu, Julian Togelius |

Read more

Source: ArXiv AI | 13-08-25

Large Language Models as Oracles for Ontology Alignment

Authors: Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jimenez-Ruiz, Artur d'Avila Garcez |

Read more

Source: ArXiv AI | 13-08-25

Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training

Authors: Vishakha Lall, Yisi Liu |

Read more

Source: ArXiv AI | 13-08-25

A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions

Authors: Amir Mohammad Salehoof, Ali Ramezani, Yadollah Yaghoobzadeh, Majid Nili Ahmadabadi |

Read more

Source: ArXiv AI | 13-08-25

Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition

Authors: Mustafa Akben, Vinayaka Gude, Haya Ajjan |

Read more

Source: ArXiv AI | 13-08-25

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Authors: Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang |

Read more

Source: ArXiv AI | 13-08-25

Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty

Authors: Rui Wang, Qihan Lin, Jiayu Liu, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song |

Read more

Source: ArXiv AI | 13-08-25

Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs

Authors: Shivam Dubey |

Read more

Source: ArXiv AI | 13-08-25

Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory

Authors: Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey |

Read more

Source: ArXiv AI | 13-08-25

CVCM Track Circuits Pre-emptive Failure Diagnostics for Predictive Maintenance Using Deep Neural Networks

Authors: Debdeep Mukherjee (2), Eduardo Di Santi (1), Clément Lefebvre (1), Nenad Mijatovic (1), Victor Martin (1), Thierry Josse (3), Jonathan Brown (1), Kenza Saiah (1) ((1) Digital and Integrated Systems, Alstom (2) Innovation and Smart Mobility, Alstom (3) Project System Engineering, Alstom) |

Read more

Source: ArXiv AI | 13-08-25

SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Authors: Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao |

Read more

Source: ArXiv AI | 13-08-25

Agent-based AI systems face growing threats from zero-click and one-click exploits

Read more

Source: The Decoder | 13-08-25

Nexus: An Open-Source AI Router for Governance, Control and Observability (nexusrouter.com)

Read more

Source: Hacker News | 13-08-25

Evaluating LLMs playing text adventures (entropicthoughts.com)

Read more

Source: Hacker News | 13-08-25

Weave (YC W25) is hiring a founding AI engineer (ycombinator.com)

Read more

Source: Hacker News | 13-08-25

LLMs aren't world models (yosefk.com)

Read more

Source: Hacker News | 13-08-25

Launch HN: Design Arena (YC S25) – Head-to-head AI benchmark for aesthetics

Read more

Source: Hacker News | 13-08-25

U.S. authorities have reportedly embedded secret GPS trackers in shipments of advanced AI chips

Read more

Source: The Decoder | 13-08-25

Here’s how to spot AI writing, according to Wikipedia editors

Read more

Source: The Decoder | 12-08-25

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (arstechnica.com)

Read more

Source: Hacker News | 12-08-25

Sloppy AI defenses take cybersecurity back to the 1990s, researchers say (scworld.com)

Read more

Source: Hacker News | 12-08-25

Claude Code is all you need (dwyer.co.za)

Read more

Source: Hacker News | 12-08-25

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Authors: Pengfei Zhou, Xiaopeng Peng, Fanrui Zhang, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Zekai Li, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang |

Read more

Source: ArXiv AI | 12-08-25

Automated Formalization via Conceptual Retrieval-Augmented LLMs

Authors: Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu |

Read more

Source: ArXiv AI | 12-08-25

DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning

Authors: Dan Ivanov, Tristan Freiberg, Haruna Isah |

Read more

Source: ArXiv AI | 12-08-25

MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair

Authors: Changqing Li, Tianlin Li, Xiaohan Zhang, Aishan Liu, Li Pan |

Read more

Source: ArXiv AI | 12-08-25

Large Language Models Do Not Simulate Human Psychology

Authors: Sarah Schröder, Thekla Morgenroth, Ulrike Kuhl, Valerie Vaquet, Benjamin Paaßen |

Read more

Source: ArXiv AI | 12-08-25

Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach

Authors: Naseem Machlovi, Maryam Saleki, Innocent Ababio, Ruhul Amin |

Read more

Source: ArXiv AI | 12-08-25

Generative AI for Strategic Plan Development

Authors: Jesse Ponnock |

Read more

Source: ArXiv AI | 12-08-25

Rethinking Domain-Specific LLM Benchmark Construction: A Comprehensiveness-Compactness Approach

Authors: Rubing Chen, Jiaxin Wu, Jian Wang, Xulu Zhang, Wenqi Fan, Chenghua Lin, Xiao-Yong Wei, Qing Li |

Read more

Source: ArXiv AI | 12-08-25

MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark

Authors: Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo |

Read more

Source: ArXiv AI | 12-08-25

Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy

Authors: Alexander Duffy, Samuel J Paech, Ishana Shastri, Elizabeth Karpinski, Baptiste Alloui-Cros, Tyler Marques, Matthew Lyle Olson |

Read more

Source: ArXiv AI | 12-08-25

Grounding Natural Language for Multi-agent Decision-Making with Multi-agentic LLMs

Authors: Dom Huh, Prasant Mohapatra |

Read more

Source: ArXiv AI | 12-08-25

Multimodal AI Systems for Enhanced Laying Hen Welfare Assessment and Productivity Optimization

Authors: Daniel Essien, Suresh Neethirajan |

Read more

Source: ArXiv AI | 12-08-25

1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning

Authors: Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap |

Read more

Source: ArXiv AI | 12-08-25

Symmetry-Aware Transformer Training for Automated Planning

Authors: Markus Fritzsche, Elliot Gestrin, Jendrik Seipp |

Read more

Source: ArXiv AI | 12-08-25

X-evolve: Solution space evolution powered by large language models

Authors: Yi Zhai, Zhiqiang Wei, Ruohan Li, Keyu Pan, Shuo Liu, Lu Zhang, Jianmin Ji, Wuyang Zhang, Yu Zhang, Yanyong Zhang |

Read more

Source: ArXiv AI | 12-08-25

FEAT: A Multi-Agent Forensic AI System with Domain-Adapted Large Language Model for Automated Cause-of-Death Analysis

Authors: Chen Shen, Wanqing Zhang, Kehan Li, Erwen Huang, Haitao Bi, Aiying Fan, Yiwen Shen, Hongmei Dong, Ji Zhang, Yuming Shao, Zengjia Liu, Xinshe Liu, Tao Li, Chunxia Yan, Shuanliang Fan, Di Wu, Jianhua Ma, Bin Cong, Zhenyuan Wang, Chunfeng Lian |

Read more

Source: ArXiv AI | 12-08-25

Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths

Authors: Rui Yao (1), Qi Chai (1 and 3), Jinhai Yao (2), Siyuan Li (1), Junhao Chen (1), Qi Zhang (2), Hao Wang (1) ((1) The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, (2) Shanghai Jiaotong University, Shanghai, China, (3) Xi'an Jiaotong University, Xi'an, China) |

Read more

Source: ArXiv AI | 12-08-25

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

Authors: Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang |

Read more

Source: ArXiv AI | 12-08-25

TeamMedAgents: Enhancing Medical Decision-Making of LLMs Through Structured Teamwork

Authors: Pranav Pushkar Mishra, Mohammad Arvan, Mohan Zalake (University of Illinois, Chicago) |

Read more

Source: ArXiv AI | 12-08-25

From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework

Authors: Yunkai Hu, Tianqiao Zhao, Meng Yue |

Read more

Source: ArXiv AI | 12-08-25

Optimizing my sleep around Claude usage limits (mattwie.se)

Read more

Source: Hacker News | 12-08-25

Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Authors: Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He |

Read more

Source: ArXiv AI | 12-08-25

End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation

Authors: Anurag Tripathi, Vaibhav Patle, Abhinav Jain, Ayush Pundir, Sairam Menon, Ajeet Kumar Singh |

Read more

Source: ArXiv AI | 12-08-25

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models

Authors: Han Yin, Yafeng Chen, Chong Deng, Luyao Cheng, Hui Wang, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li |

Read more

Source: ArXiv AI | 12-08-25

Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks

Authors: Ze Shen Chin |

Read more

Source: ArXiv AI | 12-08-25

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling

Authors: Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong |

Read more

Source: ArXiv AI | 12-08-25

Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation

Authors: Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, Jingkuan Song |

Read more

Source: ArXiv AI | 12-08-25

Echoes of Automation: The Increasing Use of LLMs in Newsmaking

Authors: Abolfazl Ansari, Delvin Ce Zhang, Nafis Irtiza Tripto, Dongwon Lee |

Read more

Source: ArXiv AI | 12-08-25

Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages

Authors: Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain |

Read more

Source: ArXiv AI | 12-08-25

ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls

Authors: Sanket Badhe |

Read more

Source: ArXiv AI | 12-08-25

Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning

Authors: Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber |

Read more

Source: ArXiv AI | 12-08-25

Holistic Explainable AI (H-XAI): Extending Transparency Beyond Developers in AI-Driven Decision Making

Authors: Kausik Lakkaraju, Siva Likitha Valluru, Biplav Srivastava |

Read more

Source: ArXiv AI | 12-08-25

Whither symbols in the era of advanced neural networks?

Authors: Thomas L. Griffiths, Brenden M. Lake, R. Thomas McCoy, Ellie Pavlick, Taylor W. Webb |

Read more

Source: ArXiv AI | 12-08-25

LLMs for Resource Allocation: A Participatory Budgeting Approach to Inferring Preferences

Authors: Sankarshan Damle, Boi Faltings |

Read more

Source: ArXiv AI | 12-08-25

SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges

Authors: Dewi S. W. Gould, Bruno Mlodozeniec, Samuel F. Brown |

Read more

Source: ArXiv AI | 12-08-25

GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines

Authors: Yumeng Fu, Jiayin Zhu, Lingling Zhang, Bo Zhao, Shaoxuan Ma, Yushun Zhang, Yanrui Wu, Wenjun Wu |

Read more

Source: ArXiv AI | 12-08-25

Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution

Authors: Zailong Tian, Zhuoheng Han, Yanzhe Chen, Haozhe Xu, Xi Yang, Richeng Xuan, Hongfeng Wang, Lizi Liao |

Read more

Source: ArXiv AI | 12-08-25

Retrieval Augmented Large Language Model System for Comprehensive Drug Contraindications

Authors: Byeonghun Bang, Jongsuk Yoon, Dong-Jin Chang, Seho Park, Yong Oh Lee |

Read more

Source: ArXiv AI | 12-08-25

LLM Robustness Leaderboard v1 -- Technical report

Authors: Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe |

Read more

Source: ArXiv AI | 12-08-25

From Explainable to Explanatory Artificial Intelligence: Toward a New Paradigm for Human-Centered Explanations through Generative AI

Authors: Christian Meske, Justin Brenne, Erdi Uenal, Sabahat Oelcer, Ayseguel Doganguen |

Read more

Source: ArXiv AI | 12-08-25

AntiCheatPT: A Transformer-Based Approach to Cheat Detection in Competitive Computer Games

Authors: Mille Mei Zhen Loo, Gert Luzkov, Paolo Burelli |

Read more

Source: ArXiv AI | 12-08-25

The Fair Game: Auditing & Debiasing AI Algorithms Over Time

Authors: Debabrota Basu, Udvas Das |

Read more

Source: ArXiv AI | 12-08-25

OpenAI CEO Sam Altman responds to GPT-5 backlash, outlines next steps

Read more

Source: The Decoder | 11-08-25

Fitzgerald's Follies (libertiesjournal.com)

Read more

Source: Hacker News | 11-08-25

Graham: Synchronizing Clocks by Leveraging Local Clock Properties (2022) [pdf] (usenix.org)

Read more

Source: Hacker News | 11-08-25

GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (sebastianraschka.com)

Read more

Source: Hacker News | 11-08-25

GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM (reddit.com)

Read more

Source: Hacker News | 11-08-25

Hand-picked selection of articles on AI fundamentals/concepts (aman.ai)

Read more

Source: Hacker News | 11-08-25

Meta acquires audio AI startup WaveForms as it ramps up efforts to build Llama 4.5

Read more

Source: The Decoder | 11-08-25

How I code with AI on a budget/free (wuu73.org)

Read more

Source: Hacker News | 11-08-25

Show HN: Reactive: A React Book for the Reluctant (written by Claude) (github.com/cloudstreet-dev)

Read more

Source: Hacker News | 11-08-25

The current state of LLM-driven development (tolki.dev)

Read more

Source: Hacker News | 10-08-25

Ch.at – a lightweight LLM chat service accessible through HTTP, SSH, DNS and API (ch.at)

Read more

Source: Hacker News | 10-08-25

My Lethal Trifecta talk at the Bay Area AI Security Meetup (simonwillison.net)

Read more

Source: Hacker News | 10-08-25

Curious about the training data of OpenAI's new GPT-OSS models? I was too (twitter.com/jxmnop)

Read more

Source: Hacker News | 10-08-25

Embedding Alignment in Code Generation for Audio

Authors: Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito |

Read more

Source: ArXiv AI | 10-08-25

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities

Authors: Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei, Sashank Varma, Yi-Chia Wang, Ali Emami |

Read more

Source: ArXiv AI | 10-08-25

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Authors: Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai |

Read more

Source: ArXiv AI | 10-08-25

Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models

Authors: Guilherme Seidyo Imai Aldeia, Daniel S. Herman, William G. La Cava |

Read more

Source: ArXiv AI | 10-08-25

Conformal Sets in Multiple-Choice Question Answering under Black-Box Settings with Provable Coverage Guarantees

Authors: Guang Yang, Xinyang Liu |

Read more

Source: ArXiv AI | 10-08-25

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

Authors: Haitao Hong, Yuchen Yan, Xingyu Wu, Guiyang Hou, Wenqi Zhang, Weiming Lu, Yongliang Shen, Jun Xiao |

Read more

Source: ArXiv AI | 10-08-25

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Authors: Brandon Jaipersaud, David Krueger, Ekdeep Singh Lubana |

Read more

Source: ArXiv AI | 10-08-25

TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution

Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |

Read more

Source: ArXiv AI | 10-08-25

Prescriptive Agents based on Rag for Automated Maintenance (PARAM)

Authors: Chitranshu Harbola, Anupam Purwar |

Read more

Source: ArXiv AI | 10-08-25

Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning

Authors: Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens |

Read more

Source: ArXiv AI | 10-08-25

Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS)

Authors: Mahdi Nazari Ashani, Ali Asghar Alesheikh, Saba Kazemi, Kimya Kheirkhah, Yasin Mohammadi, Fatemeh Rezaie, Amir Mahdi Manafi, Hedieh Zarkesh |

Read more

Source: ArXiv AI | 10-08-25

Who is a Better Player: LLM against LLM

Authors: Yingjie Zhou, Jiezhang Cao, Farong Wen, Li Xu, Yanwei Jiang, Jun Jia, Ronghui Li, Xiaohong Liu, Yu Zhou, Xiongkuo Min, Jie Guo, Zicheng Zhang, Guangtao Zhai |

Read more

Source: ArXiv AI | 10-08-25

MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models

Authors: Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang |

Read more

Source: ArXiv AI | 10-08-25

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Authors: Bin Han, Robert Wolfe, Anat Caspi, Bill Howe |

Read more

Source: ArXiv AI | 10-08-25

EasySize: Elastic Analog Circuit Sizing via LLM-Guided Heuristic Search

Authors: Xinyue Wu, Fan Hu, Shaik Jani Babu, Yi Zhao, Xinfei Guo |

Read more

Source: ArXiv AI | 10-08-25

A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents

Authors: Andrew Kiruluta |

Read more

Source: ArXiv AI | 10-08-25

QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering

Authors: Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li |

Read more

Source: ArXiv AI | 10-08-25

NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making

Authors: Asutosh Hota, Jussi P.P. Jokinen |

Read more

Source: ArXiv AI | 10-08-25

Large Language Models Transform Organic Synthesis From Reaction Prediction to Automation

Authors: Kartar Kumar Lohana Tharwani, Rajesh Kumar, Sumita, Numan Ahmed, Yong Tang |

Read more

Source: ArXiv AI | 10-08-25

An Explainable Machine Learning Framework for Railway Predictive Maintenance using Data Streams from the Metro Operator of Portugal

Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Fátima Leal, Bruno Veloso, Benedita Malheiro, Juan Carlos Burguillo-Rial |

Read more

Source: ArXiv AI | 10-08-25

Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?

Authors: Matteo Prandi, Vincenzo Suriani, Federico Pierucci, Marcello Galisai, Daniele Nardi, Piercosma Bisconti |

Read more

Source: ArXiv AI | 10-08-25

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

Authors: Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang |

Read more

Source: ArXiv AI | 10-08-25

Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?

Authors: Burak Can Kaplan, Hugo Cesar De Castro Carneiro, Stefan Wermter |

Read more

Source: ArXiv AI | 10-08-25

Simulating Human-Like Learning Dynamics with LLM-Empowered Agents

Authors: Yu Yuan, Lili Zhao, Wei Chen, Guangting Zheng, Kai Zhang, Mengdi Zhang, Qi Liu |

Read more

Source: ArXiv AI | 10-08-25

Prompting GPT-5 for agentic workflows and advanced coding applications

Read more

Source: The Decoder | 10-08-25

GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it (garymarcus.substack.com)

Read more

Source: Hacker News | 10-08-25

Let's properly analyze an AI article for once (nibblestew.blogspot.com)

Read more

Source: Hacker News | 09-08-25

Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Read more

Source: Hacker News | 09-08-25

Getting good results from Claude Code (dzombak.com)

Read more

Source: Hacker News | 09-08-25

What the Windsurf sale means for the AI coding ecosystem (ethanding.substack.com)

Read more

Source: Hacker News | 09-08-25

I want everything local – Building my offline AI workspace (instavm.io)

Read more

Source: Hacker News | 09-08-25

Attackers can hijack Google Gemini with a simple prompt hidden in a calendar invite

Read more

Source: The Decoder | 09-08-25

Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI

Read more

Source: The Decoder | 09-08-25

GPT-5 should "seem smarter from today" after OpenAI fixed early issues with its model switcher

Read more

Source: The Decoder | 09-08-25

HRT's Python fork: Leveraging PEP 690 for faster imports (hudsonrivertrading.com)

Read more

Source: Hacker News | 09-08-25

A robust, open-source framework for Spiking Neural Networks on low-end FPGAs (arxiv.org)

Read more

Source: Hacker News | 09-08-25

Open SWE: An open-source asynchronous coding agent (langchain.com)

Read more

Source: Hacker News | 09-08-25

The surprise deprecation of GPT-4o for ChatGPT consumers (simonwillison.net)

Read more

Source: Hacker News | 09-08-25

Developers rely on AI tools more than ever, but trust is slipping

Read more

Source: The Decoder | 09-08-25

Yet another study doubts that LLM reasoning shows true logic over pattern imitation

Read more

Source: The Decoder | 09-08-25

Political pressure reportedly kept a major AI vulnerability study under wraps

Read more

Source: The Decoder | 08-08-25

An invisible prompt in a Google Doc made ChatGPT access data from a victim’s Google Drive

Read more

Source: The Decoder | 08-08-25

A deleted GitHub post gives an early look at OpenAI’s next major model, GPT-5

Read more

Source: The Decoder | 08-08-25

How AI conquered the US economy: A visual FAQ (derekthompson.org)

Read more

Source: Hacker News | 08-08-25

GPT-5 for Developers (openai.com)

Read more

Source: Hacker News | 08-08-25

Writing a storage engine for Postgres: An in-memory table access method (2023) (eatonphil.com)

Read more

Source: Hacker News | 08-08-25

OpenAI's new open-source model is basically Phi-5 (seangoedecke.com)

Read more

Source: Hacker News | 08-08-25

GPT-5: Key characteristics, pricing and system card (simonwillison.net)

Read more

Source: Hacker News | 08-08-25

GPT-5 (openai.com)

Read more

Source: Hacker News | 08-08-25

Claude Code IDE integration for Emacs (github.com/manzaltu)

Read more

Source: Hacker News | 08-08-25

An LLM does not need to understand MCP (hackteam.io)

Read more

Source: Hacker News | 08-08-25

Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude (github.com/synthetic-lab)

Read more

Source: Hacker News | 08-08-25

OpenAI pushes back as the New York Times demands access to 120 million ChatGPT chat logs

Read more

Source: The Decoder | 07-08-25

Show HN: Aura – Like robots.txt, but for AI actions (github.com/osmandkitay)

Read more

Source: Hacker News | 07-08-25

Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)

Read more

Source: Hacker News | 07-08-25

New AI Coding Teammate: Gemini CLI GitHub Actions (blog.google)

Read more

Source: Hacker News | 07-08-25

Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference

Authors: Nuo Chen, Moming Duan, Andre Huikai Lin, Qian Wang, Jiaying Wu, Bingsheng He |

Read more

Source: ArXiv AI | 07-08-25

Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning

Authors: Magauiya Zhussip, Dmitriy Shopkhoev, Ammar Ali, Stamatios Lefkimmiatis |

Read more

Source: ArXiv AI | 07-08-25

TURA: Tool-Augmented Unified Retrieval Agent for AI Search

Authors: Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin |

Read more

Source: ArXiv AI | 07-08-25

YOLOv8-Based Deep Learning Model for Automated Poultry Disease Detection and Health Monitoring paper

Authors: Akhil Saketh Reddy Sabbella, Ch.Lakshmi Prachothan, Eswar Kumar Panta |

Read more

Source: ArXiv AI | 07-08-25

How are CS students using resources and AI tools for coding tasks?

Authors: Natalia Echeverry, Arun Lekshmi Narayanan |

Read more

Source: ArXiv AI | 07-08-25

Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management

Authors: Mo Li, L.H. Xu, Qitai Tan, Ting Cao, Yunxin Liu |

Read more

Source: ArXiv AI | 07-08-25

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

Authors: Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen |

Read more

Source: ArXiv AI | 07-08-25

MI9 -- Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems

Authors: Charles L. Wang, Trisha Singhal, Ameya Kelkar, Jason Tuo |

Read more

Source: ArXiv AI | 07-08-25

Galaxy: A Cognition-Centered Framework for Proactive, Privacy-Preserving, and Self-Evolving LLM Agents

Authors: Chongyu Bao, Ruimin Dai, Yangbo Shen, Runyang Jian, Jinghan Zhang, Xiaolan Liu, Kunpeng Liu |

Read more

Source: ArXiv AI | 07-08-25

Can Large Language Models Adequately Perform Symbolic Reasoning Over Time Series?

Authors: Zewen Liu, Juntong Ni, Xianfeng Tang, Max S.Y. Lau, Wei Jin |

Read more

Source: ArXiv AI | 07-08-25

Towards Transparent AI Grading: Semantic Entropy as a Signal for Human-AI Disagreement

Authors: Karrtik Iyer, Manikandan Ravikiran, Prasanna Pendse, Shayan Mohanty |

Read more

Source: ArXiv AI | 07-08-25

Large Language Model's Multi-Capability Alignment in Biomedical Domain

Authors: Wentao Wu, Linqing Chen, Hanmeng Zhong, Weilei Wang |

Read more

Source: ArXiv AI | 07-08-25

Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents

Authors: Thassilo M. Schiepanski, Nicholas Piël |

Read more

Source: ArXiv AI | 07-08-25

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Authors: Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu |

Read more

Source: ArXiv AI | 07-08-25

SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices

Authors: Si Chen, Izzy Molnar, Ting Hua, Peiyu Li, Le Huy Khiem, G. Alex Ambrose, Jim Lang, Ronald Metoyer, Nitesh V. Chawla |

Read more

Source: ArXiv AI | 07-08-25

LLM Collaboration With Multi-Agent Reinforcement Learning

Authors: Shuo Liu, Zeyu Liang, Xueguang Lyu, Christopher Amato |

Read more

Source: ArXiv AI | 07-08-25

ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges

Authors: Yue Zhou, Yi Chang, Yuan Wu |

Read more

Source: ArXiv AI | 07-08-25

Two face trial for exporting Nvidia AI chips as the company rejects hardware kill switches

Read more

Source: The Decoder | 07-08-25

Anthropic prepares for GPT-5 by releasing its upgraded Claude Opus 4.1 model

Read more

Source: The Decoder | 07-08-25

ElevenLabs launches Eleven Music, an AI music generator "cleared for broad commercial use"

Read more

Source: The Decoder | 07-08-25

OpenAI releases its first open-weight language models since GPT-2 with GPT-oss

Read more

Source: The Decoder | 06-08-25

The EU’s AI Act pushes transparency but could overwhelm developers with paperwork

Read more

Source: The Decoder | 06-08-25

Eight frontier AI models battle in chess for Game Arena’s first tournament tonight

Read more

Source: The Decoder | 06-08-25

US considers tracking AI chips, TSMC fires employees over the theft of advanced technology

Read more

Source: The Decoder | 06-08-25

OpenAI says it doesn't want ChatGPT to become a social media time sink

Read more

Source: The Decoder | 06-08-25

Claude Opus 4.1 (anthropic.com)

Read more

Source: Hacker News | 06-08-25

Create personal illustrated storybooks in the Gemini app (blog.google)

Read more

Source: Hacker News | 06-08-25

Things that helped me get out of the AI 10x engineer imposter syndrome (colton.dev)

Read more

Source: Hacker News | 06-08-25

LLM Inflation (tratt.net)

Read more

Source: Hacker News | 06-08-25

Ask HN: Do you struggle with flow state when using AI assisted coding tools?

Read more

Source: Hacker News | 06-08-25

I gave the AI arms and legs then it rejected me (grell.dev)

Read more

Source: Hacker News | 06-08-25

Open models by OpenAI (openai.com)

Read more

Source: Hacker News | 06-08-25

Large Language Model-based Data Science Agent: A Survey

Authors: Peiran Wang, Yaoning Yu, Ke Chen, Xianyang Zhan, Haohan Wang |

Read more

Source: ArXiv AI | 06-08-25

Recovering Individual-Level Activity Sequences from Location-Based Service Data Using a Novel Transformer-Based Model

Authors: Weiyu Luo, Chenfeng Xiong |

Read more

Source: ArXiv AI | 06-08-25

Enhancing Japanese Large Language Models with Reasoning Vectors

Authors: Carolina Minami Oguchi, Leo Wei, Koyo Kobayashi, Hsin-Tai Wu, Dipak Ghosal |

Read more

Source: ArXiv AI | 06-08-25

Defend LLMs Through Self-Consciousness

Authors: Boshi Huang, Fabio Nonato de Paula |

Read more

Source: ArXiv AI | 06-08-25

AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots

Authors: Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Irene Li |

Read more

Source: ArXiv AI | 06-08-25

When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs

Authors: Fangyi Yu |

Read more

Source: ArXiv AI | 06-08-25

Unified Tool Integration for LLMs: A Protocol-Agnostic Approach to Function Calling

Authors: Peng Ding, Rick Stevens |

Read more

Source: ArXiv AI | 06-08-25

From Text to Trajectories: GPT-2 as an ODE Solver via In-Context

Authors: Ziyang Ma, Baojian Zhou, Deqing Yang, Yanghua Xiao |

Read more

Source: ArXiv AI | 06-08-25

EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design

Authors: Fei Liu, Yilu Liu, Qingfu Zhang, Xialiang Tong, Mingxuan Yuan |

Read more

Source: ArXiv AI | 06-08-25

ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts

Authors: Shuang Liu, Zelong Li, Ruoyun Ma, Haiyan Zhao, Mengnan Du |

Read more

Source: ArXiv AI | 06-08-25

Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework

Authors: Zikun Cui, Tianyi Huang, Chia-En Chiang, Cuiqianhe Du |

Read more

Source: ArXiv AI | 06-08-25

Can Large Language Models Bridge the Gap in Environmental Knowledge?

Authors: Linda Smail (College of Interdisciplinary Studies, Zayed University, UAE), David Santandreu Calonge (Department of Academic Development, Mohamed bin Zayed University of Artificial Intelligence, UAE), Firuz Kamalov (School of Engineering, Applied Science and Technology, Canadian University Dubai, UAE), Nur H. Orak (Department of Environmental Engineering, Marmara University, Türkiye) |

Read more

Source: ArXiv AI | 06-08-25

InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation

Authors: Tian-Fang Zhao, Wen-Xi Yang |

Read more

Source: ArXiv AI | 06-08-25

CogBench: A Large Language Model Benchmark for Multilingual Speech-Based Cognitive Impairment Assessment

Authors: Feng Rui, Zhiyao Luo, Wei Wang, Yuting Song, Yong Liu, Tingting Zhu, Jianqing Li, Xingyao Wang |

Read more

Source: ArXiv AI | 06-08-25

Compressing Chain-of-Thought in LLMs via Step Entropy

Authors: Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu |

Read more

Source: ArXiv AI | 06-08-25

Adaptive AI Agent Placement and Migration in Edge Intelligence Systems

Authors: Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia |

Read more

Source: ArXiv AI | 06-08-25

Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play

Authors: Lucia Cipolina-Kun, Marianna Nezhurina, Jenia Jitsev |

Read more

Source: ArXiv AI | 06-08-25

A Comparative Study of Neurosymbolic AI Approaches to Interpretable Logical Reasoning

Authors: Michael K. Chen |

Read more

Source: ArXiv AI | 06-08-25

Multi-Objective Infeasibility Diagnosis for Routing Problems Using Large Language Models

Authors: Kai Li, Ruihao Zheng, Xinye Hao, Zhenkun Wang |

Read more

Source: ArXiv AI | 06-08-25

Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis

Authors: Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen |

Read more

Source: ArXiv AI | 06-08-25

Semantic-aware Graph-guided Behavior Sequences Generation with Large Language Models for Smart Homes

Authors: Zhiyao Xu, Dan Zhao, Qingsong Zou, Qing Li, Yong Jiang, Yuhang Wang, Jingyu Xiao |

Read more

Source: ArXiv AI | 06-08-25

Hidden Dynamics of Massive Activations in Transformer Training

Authors: Jorge Gallego-Feliciano, S. Aaron McClendon, Juan Morinelli, Stavros Zervoudakis, Antonios Saravanos |

Read more

Source: ArXiv AI | 06-08-25

Error Detection and Correction for Interpretable Mathematics in Large Language Models

Authors: Yijin Yang, Cristina Cornelio, Mario Leiva, Paulo Shakarian |

Read more

Source: ArXiv AI | 06-08-25

Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework

Authors: Jialin Li, Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu |

Read more

Source: ArXiv AI | 06-08-25

Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Authors: He Wang, Liang Zeng |

Read more

Source: ArXiv AI | 06-08-25

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang |

Read more

Source: ArXiv AI | 06-08-25

Tell HN: Anthropic expires paid credits after a year

Read more

Source: Hacker News | 06-08-25

Persona vectors allow Anthropic to steer language model behaviors like sycophancy and evil

Read more

Source: The Decoder | 05-08-25

MLE-STAR is designed to automate machine learning pipelines with minimal human input

Read more

Source: The Decoder | 05-08-25

I tried to replace myself with ChatGPT in my English class (lithub.com)

Read more

Source: Hacker News | 05-08-25

Getting out of the Big-Muddy: Escalation of Commitment in LLMs

Authors: Emilio Barkett, Olivia Long, Paul Kröger |

Read more

Source: ArXiv AI | 05-08-25

Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning

Authors: Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu |

Read more

Source: ArXiv AI | 05-08-25

Polymorphic Combinatorial Frameworks (PCF): Guiding the Design of Mathematically-Grounded, Adaptive AI Agents

Authors: David Pearl, Matthew Murphy, James Intriligator |

Read more

Source: ArXiv AI | 05-08-25

T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval

Authors: Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu |

Read more

Source: ArXiv AI | 05-08-25

QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry

Authors: Jiaqing Xie, Weida Wang, Ben Gao, Zhuo Yang, Haiyuan Wan, Shufei Zhang, Tianfan Fu, Yuqiang Li |

Read more

Source: ArXiv AI | 05-08-25

A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models

Authors: Tadisetty Sai Yashwanth, Dhatri C |

Read more

Source: ArXiv AI | 05-08-25

ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection

Authors: Shijie Cao, Yuan Yuan |

Read more

Source: ArXiv AI | 05-08-25

CloudAnoAgent: Anomaly Detection for Cloud Sites via LLM Agent with Neuro-Symbolic Mechanism

Authors: Xinkai Zou, Xuan Jiang, Ruikai Huang, Haoze He, Parv Kapoor, Jiahua Zhao |

Read more

Source: ArXiv AI | 05-08-25

TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs

Authors: Amitava Das, Vinija Jain, Aman Chadha |

Read more

Source: ArXiv AI | 05-08-25

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Authors: Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang |

Read more

Source: ArXiv AI | 05-08-25

Everyone Contributes! Incentivizing Strategic Cooperation in Multi-LLM Systems via Sequential Public Goods Games

Authors: Yunhao Liang, Yuan Qu, Jingyuan Yang, Shaochong Lin, Zuo-Jun Max Shen |

Read more

Source: ArXiv AI | 05-08-25

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Authors: Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li |

Read more

Source: ArXiv AI | 05-08-25

Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence

Authors: Boheng Liu, Ziyu Li, Xia Wu |

Read more

Source: ArXiv AI | 05-08-25

AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models

Authors: Dewi Sid William Gould, George De Ath, Ben Carvell, Nick Pepper |

Read more

Source: ArXiv AI | 05-08-25

CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models

Authors: Tung-Thuy Pham, Duy-Quan Luong, Minh-Quan Duong, Trung-Hieu Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo |

Read more

Source: ArXiv AI | 05-08-25

Traffic-R1: Reinforced LLMs Bring Human-Like Reasoning to Traffic Signal Control Systems

Authors: Xingchen Zou, Yuhao Yang, Zheng Chen, Xixuan Hao, Yiqi Chen, Chao Huang, Yuxuan Liang |

Read more

Source: ArXiv AI | 05-08-25

FinWorld: An All-in-One Open-Source Platform for End-to-End Financial AI Research and Deployment

Authors: Wentao Zhang, Yilei Zhao, Chuqiao Zong, Xinrun Wang, Bo An |

Read more

Source: ArXiv AI | 05-08-25

Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting

Authors: Miaosen Luo, Jiesen Long, Zequn Li, Yunying Yang, Yuncheng Jiang, Sijie Mai |

Read more

Source: ArXiv AI | 05-08-25

OptiHive: Ensemble Selection for LLM-Based Optimization via Statistical Modeling

Authors: Maxime Bouscary, Saurabh Amin |

Read more

Source: ArXiv AI | 05-08-25

CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

Authors: Lei Zan, Keli Zhang, Ruichu Cai, Lujia Pan |

Read more

Source: ArXiv AI | 05-08-25

Accurate and Interpretable Postmenstrual Age Prediction via Multimodal Large Language Model

Authors: Qifan Chen, Jin Cui, Cindy Duan, Yushuo Han, Yifei Shi |

Read more

Source: ArXiv AI | 05-08-25

Noosemia: toward a Cognitive and Phenomenological Account of Intentionality Attribution in Human-Generative AI Interaction

Authors: Enrico De Santis, Antonello Rizzi |

Read more

Source: ArXiv AI | 05-08-25

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

Authors: Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Liantao Ma, Lequan Yu |

Read more

Source: ArXiv AI | 05-08-25

What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce

Authors: Amine Allouah, Omar Besbes, Josué D Figueroa, Yash Kanoria, Akshit Kumar |

Read more

Source: ArXiv AI | 05-08-25

Tim Cook tells Apple employees that AI is as pivotal as the internet or the smartphone

Read more

Source: The Decoder | 05-08-25

Adobe's new AI features make complex Photoshopping effortless

Read more

Source: The Decoder | 05-08-25

Customizing tmux (evgeniipendragon.com)

Read more

Source: Hacker News | 05-08-25

Job-seekers are dodging AI interviewers (fortune.com)

Read more

Source: Hacker News | 05-08-25

OpenAI prepares to launch GPT-5, but big leaps are unlikely

Read more

Source: The Decoder | 04-08-25

Anthropic blocks OpenAI from accessing Claude models over alleged contract breach

Read more

Source: The Decoder | 04-08-25

Persona vectors: Monitoring and controlling character traits in language models (anthropic.com)

Read more

Source: Hacker News | 04-08-25

Backdoor Attacks on Deep Learning Face Detection

Authors: Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi |

Read more

Source: ArXiv AI | 04-08-25

Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data

Authors: Mukesh Kumar Sahu, Pinki Roy |

Read more

Source: ArXiv AI | 04-08-25

NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System

Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya |

Read more

Source: ArXiv AI | 04-08-25

On-Device Diffusion Transformer Policy for Efficient Robot Manipulation

Authors: Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, Dong Xu |

Read more

Source: ArXiv AI | 04-08-25

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications

Authors: Wenxuan Wang, Zizhan Ma, Meidan Ding, Shiyi Zheng, Shengyuan Liu, Jie Liu, Jiaming Ji, Wenting Chen, Xiang Li, Linlin Shen, Yixuan Yuan |

Read more

Source: ArXiv AI | 04-08-25

Agentic large language models improve retrieval-based radiology question answering

Authors: Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh |

Read more

Source: ArXiv AI | 04-08-25

Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data

Authors: Sohaib Imran, Rob Lamb, Peter M. Atkinson |

Read more

Source: ArXiv AI | 04-08-25

How LLMs are Shaping the Future of Virtual Reality

Authors: Süeda Özkaya, Santiago Berrezueta-Guzman, Stefan Wagner |

Read more

Source: ArXiv AI | 04-08-25

Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems

Authors: Liuyun Xu, Seymour M.J. Spence |

Read more

Source: ArXiv AI | 04-08-25

Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA

Authors: Yingxu Wang, Shiqi Fan, Mengzhu Wang, Siwei Liu |

Read more

Source: ArXiv AI | 04-08-25

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations

Authors: Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao |

Read more

Source: ArXiv AI | 04-08-25

No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence

Authors: Wil M.P. van der Aalst |

Read more

Source: ArXiv AI | 04-08-25

Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models

Authors: Xushuo Tang, Yi Ding, Zhengyi Yang, Yin Chen, Yongrui Gu, Wenke Yang, Mingchen Ju, Xin Cao, Yongfei Liu, Wenjie Zhang |

Read more

Source: ArXiv AI | 04-08-25

Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Authors: Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger |

Read more

Source: ArXiv AI | 04-08-25

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Authors: Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Yongbin Li, Ge Li |

Read more

Source: ArXiv AI | 04-08-25

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

Authors: Yi-Long Lu, Jiajun Song, Chunhui Zhang, Wei Wang |

Read more

Source: ArXiv AI | 04-08-25

Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

Authors: Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei |

Read more

Source: ArXiv AI | 04-08-25

Thinking Machines: Mathematical Reasoning in the Age of LLMs

Authors: Andrea Asperti, Alberto Naibo, Claudio Sacerdoti Coen |

Read more

Source: ArXiv AI | 04-08-25

MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models

Authors: Zhanliang Wang, Kai Wang |

Read more

Source: ArXiv AI | 04-08-25

From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation

Authors: Ruiqing Ding, Qianfang Sun, Yongkang Leng, Hui Yin, Xiaojian Li |

Read more

Source: ArXiv AI | 04-08-25

Context-Aware Visualization for Explainable AI Recommendations in Social Media: A Vision for User-Aligned Explanations

Authors: Banan Alkhateeb, Ellis Solaiman |

Read more

Source: ArXiv AI | 04-08-25

6 weeks of Claude Code (puzzmo.com)

Read more

Source: Hacker News | 03-08-25

Automated Feedback on Student-Generated UML and ER Diagrams Using Large Language Models

Authors: Sebastian Gürtl, Gloria Schimetta, David Kerschbaumer, Michael Liut, Alexander Steinmaurer |

Read more

Source: ArXiv AI | 03-08-25

From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices

Authors: Georg Slamanig, Francesco Corti, Olga Saukh |

Read more

Source: ArXiv AI | 03-08-25

Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation

Authors: Dustin Carrión-Ojeda, Stefan Roth, Simone Schaub-Meyer |

Read more

Source: ArXiv AI | 03-08-25

LLM-Based Identification of Infostealer Infection Vectors from Screenshots: The Case of Aurora

Authors: Estelle Ruellan, Eric Clay, Nicholas Ascoli |

Read more

Source: ArXiv AI | 03-08-25

Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates

Authors: Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther |

Read more

Source: ArXiv AI | 03-08-25

Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study

Authors: Kai Goebel, Patrik Zips |

Read more

Source: ArXiv AI | 03-08-25

Distributed AI Agents for Cognitive Underwater Robot Autonomy

Authors: Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot |

Read more

Source: ArXiv AI | 03-08-25

A survey of multi-agent geosimulation methodologies: from ABM to LLM

Authors: Virginia Padilla, Jacinto Dávila |

Read more

Source: ArXiv AI | 03-08-25

Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database

Authors: Diego Russo, Gian Marco Orlando, Valerio La Gatta, Vincenzo Moscato |

Read more

Source: ArXiv AI | 03-08-25

FairReason: Balancing Reasoning and Social Bias in MLLMs

Authors: Zhenyu Pan, Yutong Zhang, Jianshu Zhang, Haoran Lu, Haozheng Luo, Yuwei Han, Philip S. Yu, Manling Li, Han Liu |

Read more

Source: ArXiv AI | 03-08-25

Data Readiness for Scientific AI at Scale

Authors: Wesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp Oral |

Read more

Source: ArXiv AI | 03-08-25

How Far Are AI Scientists from Changing the World?

Authors: Qiujie Xie, Yixuan Weng, Minjun Zhu, Fuchen Shen, Shulin Huang, Zhen Lin, Jiahui Zhou, Zilan Mao, Zijie Yang, Linyi Yang, Jian Wu, Yue Zhang |

Read more

Source: ArXiv AI | 03-08-25

LLM4Rail: An LLM-Augmented Railway Service Consulting Platform

Authors: Zhuo Li, Xianghuai Deng, Chiwei Feng, Hanmeng Li, Shenjie Wang, Haichao Zhang, Teng Jia, Conlin Chen, Louis Linchun Wu, Jia Wang |

Read more

Source: ArXiv AI | 03-08-25

DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer

Authors: Ruoyu Wang, Junda Wu, Yu Xia, Tong Yu, Ryan A. Rossi, Julian McAuley, Lina Yao |

Read more

Source: ArXiv AI | 03-08-25

MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying

Authors: Qian Zhao, Zhuo Sun, Bin Guo, Zhiwen Yu |

Read more

Source: ArXiv AI | 03-08-25

TextQuests: How Good are LLMs at Text-Based Video Games?

Authors: Long Phan, Mantas Mazeika, Andy Zou, Dan Hendrycks |

Read more

Source: ArXiv AI | 03-08-25

SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing |

Read more

Source: ArXiv AI | 03-08-25

OpenAI has reportedly raised $8.3 billion at a $300 billion valuation

Read more

Source: The Decoder | 03-08-25

Anthropic CEO talks about being labeled a doomer and his OpenAI departure

Read more

Source: The Decoder | 03-08-25

Under mounting pressure, Apple plans to increase its spending on artificial intelligence projects

Read more

Source: The Decoder | 03-08-25

Show HN: WebGPU enables local LLM in the browser – demo site with AI chat (andreinwald.github.io)

Read more

Source: Hacker News | 03-08-25

Show HN: AI Physics Tutor with Free Body Diagrams (physicsviewer.com)

Read more

Source: Hacker News | 03-08-25

Every leading AI agent failed at least one security test during a massive red teaming competition

Read more

Source: The Decoder | 03-08-25

Robert Wilson has died (theartnewspaper.com)

Read more

Source: Hacker News | 02-08-25

Anthropic revokes OpenAI's access to Claude (wired.com)

Read more

Source: Hacker News | 02-08-25

Tim Cook rallying Apple employees around AI efforts (bloomberg.com)

Read more

Source: Hacker News | 02-08-25

Launch HN: Societies.io (YC W25) – AI simulations of your target audience

Read more

Source: Hacker News | 02-08-25

Aerodynamic drag in small cyclist formations: shielding the protected rider [pdf] (urbanphysics.net)

Read more

Source: Hacker News | 02-08-25

OpenAI's "Study Mode" and the risks of flattery (resobscura.substack.com)

Read more

Source: Hacker News | 02-08-25

Google adds image-to-video and Veo 3 Fast to the Gemini API

Read more

Source: The Decoder | 02-08-25

Coverage Cat (YC S22) Is Hiring a Senior, Staff, or Principal Engineer (coveragecat.com)

Read more

Source: Hacker News | 02-08-25

Make Your Own Backup System – Part 2: Forging the FreeBSD Backup Stronghold (dragas.net)

Read more

Source: Hacker News | 02-08-25

The tradeoff between human and AI context (softwaredoug.com)

Read more

Source: Hacker News | 02-08-25

Deep Agents (langchain.com)

Read more

Source: Hacker News | 02-08-25

Gemini 2.5 Deep Think (blog.google)

Read more

Source: Hacker News | 02-08-25

Respect instead of sarcasm: study uses AI for better political debates

Read more

Source: The Decoder | 02-08-25

OpenAI is building Stargate Norway while its annual spending is expected to soar to $8 billion

Read more

Source: The Decoder | 01-08-25

Interview with Microsoft: Copilot, AI skills, and building a learning organization

Read more

Source: The Decoder | 01-08-25

Google DeepMind unveils an AI model that acts as a "virtual satellite" for mapping the entire planet

Read more

Source: The Decoder | 01-08-25

Google and xAI sign EU AI Code of Practice

Read more

Source: The Decoder | 01-08-25

PHP-ORT: Machine learning inference for the web (krakjoe.github.io)

Read more

Source: Hacker News | 01-08-25

Gemini Embedding: Powering RAG and context engineering (googleblog.com)

Read more

Source: Hacker News | 01-08-25

Many countries that said no to ChatControl in 2024 are now undecided (digitalcourage.social)

Read more

Source: Hacker News | 01-08-25

Gemini 2.5 Deep Think (twitter.com/googledeepmind)

Read more

Source: Hacker News | 01-08-25

Show HN: AgentMail – Email infra for AI agents (agentmail.to)

Read more

Source: Hacker News | 01-08-25

Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code

Read more

Source: Hacker News | 01-08-25

Show HN: Mcp-use – Connect any LLM to any MCP (github.com/mcp-use)

Read more

Source: Hacker News | 01-08-25

OpenAI launches Study Mode for ChatGPT while education users are told to wait and learn later

Read more

Source: The Decoder | 31-07-25

Anthropic could soon be valued at $170 billion

Read more

Source: The Decoder | 31-07-25

Some Meta employees fear being sidelined as Zuckerberg reshuffles teams for AI progress

Read more

Source: The Decoder | 31-07-25

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

Authors: Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu |

Read more

Source: ArXiv AI | 31-07-25

aLLoyM: A large language model for alloy phase diagram prediction

Authors: Yuna Oikawa, Guillaume Deffrennes, Taichi Abe, Ryo Tamura, Koji Tsuda |

Read more

Source: ArXiv AI | 31-07-25

RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment

Authors: Marcos Fuster-Pena, David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez |

Read more

Source: ArXiv AI | 31-07-25

Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning

Authors: Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen |

Read more

Source: ArXiv AI | 31-07-25

BALSAM: A Platform for Benchmarking Arabic Large Language Models

Authors: Rawan Al-Matham, Kareem Darwish, Raghad Al-Rasheed, Waad Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah Alzahrani, Eman alBilali, Nizar Habash, Abdelrahman El-Sheikh, Muhammad Elmallah, Haonan Li, Hamdy Mubarak, Mohamed Anwar, Zaid Alyafeai, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljarrah, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, Abdulrahman AlOsaimy |

Read more

Source: ArXiv AI | 31-07-25

A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models

Authors: Sabrina Kaniewski, Fabian Schmidt, Markus Enzweiler, Michael Menth, Tobias Heer |

Read more

Source: ArXiv AI | 31-07-25

H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

Authors: Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong |

Read more

Source: ArXiv AI | 31-07-25

Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani |

Read more

Source: ArXiv AI | 31-07-25

OFCnetLLM: Large Language Model for Network Monitoring and Alertness

Authors: Hong-Jun Yoon, Mariam Kiran, Danial Ebling, Joe Breen |

Read more

Source: ArXiv AI | 31-07-25

LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Kai Chen, Xiaofeng Wang, Baosheng Wang |

Read more

Source: ArXiv AI | 31-07-25

An Explainable Emotion Alignment Framework for LLM-Empowered Agent in Metaverse Service Ecosystem

Authors: Qun Ma, Xiao Xue, Ming Zhang, Yifan Shen, Zihan Zhao |

Read more

Source: ArXiv AI | 31-07-25

Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence

Authors: Matthieu Queloz |

Read more

Source: ArXiv AI | 31-07-25

Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making

Authors: ZhaoBin Li, Mark Steyvers |

Read more

Source: ArXiv AI | 31-07-25

The Incomplete Bridge: How AI Research (Mis)Engages with Psychology

Authors: Han Jiang, Pengda Wang, Xiaoyuan Yi, Xing Xie, Ziang Xiao |

Read more

Source: ArXiv AI | 31-07-25

Enhancing Manufacturing Knowledge Access with LLMs and Context-aware Prompting

Authors: Sebastian Monka, Irlan Grangel-González, Stefan Schmid, Lavdim Halilaj, Marc Rickart, Oliver Rudolph, Rui Dias |

Read more

Source: ArXiv AI | 31-07-25

Automatically discovering heuristics in a complex SAT solver with large language models

Authors: Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, Shaowei Cai |

Read more

Source: ArXiv AI | 31-07-25

Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production

Read more

Source: Hacker News | 31-07-25

Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget (github.com/dipampaul17)

Read more

Source: Hacker News | 31-07-25

OpenAI's ChatGPT Agent casually clicks through "I am not a robot" verification (arstechnica.com)

Read more

Source: Hacker News | 31-07-25

AI startup tackles bottleneck where people spend more time checking AI content than creating it

Read more

Source: The Decoder | 31-07-25

Show HN: An AI agent that learns your product and guides your users (frigade.ai)

Read more

Source: Hacker News | 31-07-25

A major AI training data set contains millions of examples of personal data (technologyreview.com)

Read more

Source: Hacker News | 31-07-25

Show HN: Open-source alternative to ChatGPT Agents for browsing (github.com/trymeka)

Read more

Source: Hacker News | 31-07-25

Critical vulnerability in AI coding platform Base44 allowing unauthorized access (wiz.io)

Read more

Source: Hacker News | 31-07-25

Crush: Glamourous AI coding agent for your favourite terminal (github.com/charmbracelet)

Read more

Source: Hacker News | 31-07-25

Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures

Authors: Nicholas Botti (Federal Reserve Board), Flora Haberkorn (Federal Reserve Board), Charlotte Hoopes (Federal Reserve Board), Shaun Khan (Federal Reserve Board) |

Read more

Source: ArXiv AI | 30-07-25

Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems

Authors: Monika Zamojska, Jarosław A. Chudziak |

Read more

Source: ArXiv AI | 30-07-25

Validating Pharmacogenomics Generative Artificial Intelligence Query Prompts Using Retrieval-Augmented Generation (RAG)

Authors: Ashley Rector, Keaton Minor, Kamden Minor, Jeff McCormack, Beth Breeden, Ryan Nowers, Jay Dorris |

Read more

Source: ArXiv AI | 30-07-25

Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models

Authors: Vishal Raman, Vijai Aravindh R |

Read more

Source: ArXiv AI | 30-07-25

Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects

Authors: Yixin Liu, Guibin Zhang, Kun Wang, Shiyuan Li, Shirui Pan |

Read more

Source: ArXiv AI | 30-07-25

What Does it Mean for a Neural Network to Learn a "World Model"?

Authors: Kenneth Li, Fernanda Viégas, Martin Wattenberg |

Read more

Source: ArXiv AI | 30-07-25

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Authors: Yanxu Zhu, Shitong Duan, Xiangxu Zhang, Jitao Sang, Peng Zhang, Tun Lu, Xiao Zhou, Jing Yao, Xiaoyuan Yi, Xing Xie |

Read more

Source: ArXiv AI | 30-07-25

Large Language Models for Supply Chain Decisions

Authors: David Simchi-Levi, Konstantina Mellou, Ishai Menache, Jeevan Pathuri |

Read more

Source: ArXiv AI | 30-07-25

An LLM Driven Agent Framework for Automated Infrared Spectral Multi Task Reasoning

Authors: Zujie Xie, Zixuan Chen, Jiheng Liang, Xiangyang Yu, Ziru Yu |

Read more

Source: ArXiv AI | 30-07-25

SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation

Authors: Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu, Huadong Ma |

Read more

Source: ArXiv AI | 30-07-25

Large Language Models for Wireless Communications: From Adaptation to Autonomy

Authors: Le Liang, Hao Ye, Yucheng Sheng, Ouya Wang, Jiacheng Wang, Shi Jin, Geoffrey Ye Li |

Read more

Source: ArXiv AI | 30-07-25

Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models

Authors: Wanying Wang, Zeyu Ma, Han Zheng, Xin Tan, Mingang Chen |

Read more

Source: ArXiv AI | 30-07-25

StaffPro: an LLM Agent for Joint Staffing and Profiling

Authors: Alessio Maritan |

Read more

Source: ArXiv AI | 30-07-25

Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems

Authors: Bin Liu |

Read more

Source: ArXiv AI | 30-07-25

Towards a rigorous evaluation of RAG systems: the challenge of due diligence

Authors: Grégoire Martinon, Alexandra Lorenzo de Brionne, Jérôme Bohard, Antoine Lojou, Damien Hervault, Nicolas J-B. Brunel (ENSIIE, LaMME) |

Read more

Source: ArXiv AI | 30-07-25

Can the current trends of AI handle a full course of mathematics?

Authors: Mariam Alsayyad, Fayadh Kadhem |

Read more

Source: ArXiv AI | 30-07-25

An Agentic AI for a New Paradigm in Business Process Development

Authors: Mohammad Azarijafari, Luisa Mich, Michele Missikoff |

Read more

Source: ArXiv AI | 30-07-25

Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis

Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |

Read more

Source: ArXiv AI | 30-07-25

Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline

Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |

Read more

Source: ArXiv AI | 30-07-25

Libra: Large Chinese-based Safeguard for AI Content

Authors: Ziyang Chen, Huimu Yu, Xing Wu, Dongqin Liu, Songlin Hu |

Read more

Source: ArXiv AI | 30-07-25

LLM-based Content Classification Approach for GitHub Repositories by the README Files

Authors: Malik Uzair Mehmood, Shahid Hussain, Wen Li Wang, Muhammad Usama Malik |

Read more

Source: ArXiv AI | 30-07-25

PHAX: A Structured Argumentation Framework for User-Centered Explainable AI in Public Health and Biomedical Sciences

Authors: Bahar İlgen, Akshat Dubey, Georges Hattab |

Read more

Source: ArXiv AI | 30-07-25

Launch HN: Hyprnote (YC S25) – An open-source AI meeting notetaker

Read more

Source: Hacker News | 30-07-25

Study mode (openai.com)

Read more

Source: Hacker News | 30-07-25

Irrelevant facts about cats added to math problems increase LLM errors by 300% (science.org)

Read more

Source: Hacker News | 30-07-25

Show HN: I built an AI that turns any book into a text adventure game (kathaaverse.com)

Read more

Source: Hacker News | 30-07-25

Tencent releases Hunyuan World Model 1.0 as an open-source AI for 3D scene generation

Read more

Source: The Decoder | 29-07-25

Enough AI copilots, we need AI HUDs (geoffreylitt.com)

Read more

Source: Hacker News | 29-07-25

Claude Code weekly rate limits

Read more

Source: Hacker News | 29-07-25

Show HN: Companies use AI to take your calls. I built AI to make them for you (pipervoice.com)

Read more

Source: Hacker News | 29-07-25

Anthropic Faces Potentially "Business-Ending" Copyright Lawsuit (obsolete.pub)

Read more

Source: Hacker News | 29-07-25

Tao on “blue team” vs. “red team” LLMs (mathstodon.xyz)

Read more

Source: Hacker News | 29-07-25

The wall confronting large language models

Authors: Peter V. Coveney, Sauro Succi |

Read more

Source: ArXiv AI | 29-07-25

Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Authors: Haoran Lu, Luyang Fang, Ruidong Zhang, Xinliang Li, Jiazhang Cai, Huimin Cheng, Lin Tang, Ziyu Liu, Zeliang Sun, Tao Wang, Yingchuan Zhang, Arif Hassan Zidan, Jinwen Xu, Jincheng Yu, Meizhi Yu, Hanqi Jiang, Xilin Gong, Weidi Luo, Bolun Sun, Yongkai Chen, Terry Ma, Shushan Wu, Yifan Zhou, Junhao Chen, Haotian Xiang, Jing Zhang, Afrar Jahin, Wei Ruan, Ke Deng, Yi Pan, Peilong Wang, Jiahui Li, Zhengliang Liu, Lu Zhang, Lin Zhao, Wei Liu, Dajiang Zhu, Xin Xing, Fei Dou, Wei Zhang, Chao Huang, Rongjie Liu, Mengrui Zhang, Yiwen Liu, Xiaoxiao Sun, Qin Lu, Zhen Xiang, Wenxuan Zhong, Tianming Liu, Ping Ma |

Read more

Source: ArXiv AI | 29-07-25

DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference

Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen |

Read more

Source: ArXiv AI | 29-07-25

Leveraging Fine-Tuned Large Language Models for Interpretable Pancreatic Cystic Lesion Feature Extraction and Risk Categorization

Authors: Ebrahim Rasromani, Stella K. Kang, Yanqi Xu, Beisong Liu, Garvit Luhadia, Wan Fung Chui, Felicia L. Pasadyn, Yu Chih Hung, Julie Y. An, Edwin Mathieu, Zehui Gu, Carlos Fernandez-Granda, Ammar A. Javed, Greg D. Sacks, Tamas Gonda, Chenchan Huang, Yiqiu Shen |

Read more

Source: ArXiv AI | 29-07-25

Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)

Authors: Lin Ren, Guohui Xiao, Guilin Qi, Yishuai Geng, Haohan Xue |

Read more

Source: ArXiv AI | 29-07-25

Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks

Authors: Shuyang Guo, Wenjin Xie, Ping Lu, Ting Deng, Richong Zhang, Jianxin Li, Xiangping Huang, Zhongyi Liu |

Read more

Source: ArXiv AI | 29-07-25

The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models

Authors: Xingcheng Xu |

Source: ArXiv AI | 29-07-25

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

Authors: Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, Srinivas Shakkottai |

Source: ArXiv AI | 29-07-25

Matching Game Preferences Through Dialogical Large Language Models: A Perspective

Authors: Renaud Fabre, Daniel Egret, Patrice Bellot |

Source: ArXiv AI | 29-07-25

Artificial Intelligence In Patent And Market Intelligence: A New Paradigm For Technology Scouting

Authors: Manish Verma, Vivek Sharma, Vishal Singh |

Source: ArXiv AI | 29-07-25

Unlearning of Knowledge Graph Embedding via Preference Optimization

Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Yao He, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji |

Source: ArXiv AI | 29-07-25

MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design

Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai |

Source: ArXiv AI | 29-07-25

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Authors: Andy Zou, Maxwell Lin, Eliot Jones, Micha Nowak, Mateusz Dziemian, Nick Winter, Alexander Grattan, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Nate Burnikell, Yarin Gal, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson |

Source: ArXiv AI | 29-07-25

Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems

Authors: Chengzhuo Han |

Source: ArXiv AI | 29-07-25

MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs

Authors: Xueyao Wan, Hang Yu |

Source: ArXiv AI | 29-07-25

evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments

Authors: Fatou Ndiaye Mbodji |

Source: ArXiv AI | 29-07-25

On the Limits of Hierarchically Embedded Logic in Classical Neural Networks

Authors: Bill Cochran |

Source: ArXiv AI | 29-07-25

MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Authors: Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song |

Source: ArXiv AI | 29-07-25

Principles for production AI agents (app.build)

Source: Hacker News | 29-07-25

AI Is Wrecking a Fragile Job Market for College Graduates (wsj.com)

Source: Hacker News | 29-07-25

China pitches new global AI regulator based in Shanghai

Source: The Decoder | 28-07-25

China exports state propaganda with low-cost open source AI models

Source: The Decoder | 28-07-25

Mistral AI publishes the first comprehensive life cycle assessment of a large language model

Source: The Decoder | 28-07-25

Amazon launches Kiro to streamline AI prototyping

Source: The Decoder | 28-07-25

Claude Code Router (github.com/musistudio)

Source: Hacker News | 28-07-25

LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface.co)

Source: Hacker News | 28-07-25

Automated Code Review Using Large Language Models at Ericsson: An Experience Report

Authors: Shweta Ramesh, Joy Bose, Hamender Singh, A K Raghavan, Sujoy Roychowdhury, Giriprasad Sridhara, Nishrith Saini, Ricardo Britto |

Source: ArXiv AI | 28-07-25

Solar Photovoltaic Assessment with Large Language Model

Authors: Muhao Guo, Yang Weng |

Source: ArXiv AI | 28-07-25

PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models

Authors: Tarek Gasmi, Ramzi Guesmi, Mootez Aloui, Jihene Bennaceur |

Source: ArXiv AI | 28-07-25

An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case

Authors: Gioele Giachino, Marco Rondina, Antonio Vetrò, Riccardo Coppola, Juan Carlos De Martin |

Source: ArXiv AI | 28-07-25

Enhancing Diabetic Retinopathy Classification Accuracy through Dual Attention Mechanism in Deep Learning

Authors: Abdul Hannan, Zahid Mahmood, Rizwan Qureshi, Hazrat Ali |

Source: ArXiv AI | 28-07-25

Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?

Authors: Chaymaa Abbas, Mariette Awad, Razane Tajeddine |

Source: ArXiv AI | 28-07-25

Towards LLM-Enhanced Group Recommender Systems

Authors: Sebastian Lubos, Alexander Felfernig, Thi Ngoc Trang Tran, Viet-Man Le, Damian Garber, Manuel Henrich, Reinhard Willfort, Jeremias Fuchs |

Source: ArXiv AI | 28-07-25

Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects

Authors: Igli Begolli, Meltem Aksoy, Daniel Neider |

Source: ArXiv AI | 28-07-25

Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks

Authors: Kai Liu, Zhan Su, Peijie Dong, Fengran Mo, Jianfei Gao, ShaoTing Zhang, Kai Chen |

Source: ArXiv AI | 28-07-25

Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs

Authors: Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci |

Source: ArXiv AI | 28-07-25

SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence

Authors: Viktar Dubovik, Łukasz Struski, Jacek Tabor, Dawid Rymarczyk |

Source: ArXiv AI | 28-07-25

SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models

Authors: Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi |

Source: ArXiv AI | 28-07-25

ReCatcher: Towards LLMs Regression Testing for Code Generation

Authors: Altaf Allah Abbassi, Leuson Da Silva, Amin Nikanjam, Foutse Khomh |

Source: ArXiv AI | 28-07-25

Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security

Authors: Gabriel Chua |

Source: ArXiv AI | 28-07-25

Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

Authors: Sang-Woo Lee, Sohee Yang, Donghyun Kwak, Noah Y. Siegel |

Source: ArXiv AI | 28-07-25

Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments

Authors: Osama Almurshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana |

Source: ArXiv AI | 28-07-25

Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges

Authors: Patrick Taillandier, Jean Daniel Zucker, Arnaud Grignard, Benoit Gaudou, Nghi Quang Huynh, Alexis Drogoul |

Source: ArXiv AI | 28-07-25

Microsoft revives Clippy as an AI blob in a new Copilot Appearance test

Source: The Decoder | 27-07-25

No AI Content (eclecticlight.co)

Source: Hacker News | 27-07-25

Fast and cheap bulk storage: using LVM to cache HDDs on SSDs (quantum5.ca)

Source: Hacker News | 27-07-25

Linux on Snapdragon X Elite: Linaro and Tuxedo Pave the Way for ARM64 Laptops (linaro.org)

Source: Hacker News | 27-07-25

Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language

Authors: Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu |

Source: ArXiv AI | 27-07-25

AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data

Authors: Rana Alshaikh, Israa Alghanmi, Shelan Jeawak |

Source: ArXiv AI | 27-07-25

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Authors: Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer |

Source: ArXiv AI | 27-07-25

Automated Code Review Using Large Language Models with Symbolic Reasoning

Authors: Busra Icoz, Goksel Biricik |

Source: ArXiv AI | 27-07-25

Sandwich: Separating Prefill-Decode Compilation for Efficient CPU LLM Serving

Authors: Juntao Zhao, Jiuru Li, Chuan Wu |

Source: ArXiv AI | 27-07-25

HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization

Authors: Benjamin Coriat, Eric Benhamou |

Source: ArXiv AI | 27-07-25

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs

Authors: Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang |

Source: ArXiv AI | 27-07-25

SMARTAPS: Tool-augmented LLMs for Operations Management

Authors: Timothy Tin Long Yu, Mahdi Mostajabdaveh, Jabo Serge Byusa, Rindra Ramamonjison, Giuseppe Carenini, Kun Mao, Zirui Zhou, Yong Zhang |

Source: ArXiv AI | 27-07-25

Does visualization help AI understand data?

Authors: Victoria R. Li, Johnathan Sun, Martin Wattenberg |

Source: ArXiv AI | 27-07-25

Agentic AI framework for End-to-End Medical Data Inference

Authors: Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha |

Source: ArXiv AI | 27-07-25

Foundations for Risk Assessment of AI in Protecting Fundamental Rights

Authors: Antonino Rotolo, Beatrice Ferrigno, Jose Miguel Angel Garcia Godinez, Claudio Novelli, Giovanni Sartor |

Source: ArXiv AI | 27-07-25

Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory

Authors: Mutian Yang, Jiandong Gao, Ji Wu |

Source: ArXiv AI | 27-07-25

Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios

Authors: Zhuang Qiang Bok, Watson Wei Khong Chua |

Source: ArXiv AI | 27-07-25

Revisiting LLM Reasoning via Information Bottleneck

Authors: Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao |

Source: ArXiv AI | 27-07-25

Reports say GPT-5 could arrive in August with improvements in coding

Source: The Decoder | 27-07-25

Google Deepmind's Aeneas AI helps historians quickly restore and interpret Roman inscriptions

Source: The Decoder | 26-07-25

Reuters says at least a dozen Shenzhen firms repair banned Nvidia H100 and A100 AI chips

Source: The Decoder | 26-07-25

Google says AI content is fine, and SEO basics still apply to AI-powered search

Source: The Decoder | 26-07-25

Show HN: Price Per Token – LLM API Pricing Data (pricepertoken.com)

Source: Hacker News | 26-07-25

Claude Code introduces specialized sub-agents (anthropic.com)

Source: Hacker News | 26-07-25

AWS shuts its Shanghai AI lab as McKinsey bans generative AI projects for clients in China

Source: The Decoder | 25-07-25

Trump's radical AI plan: no copyrights, fewer rules, more exports

Source: The Decoder | 25-07-25

Anthropic says that AI can learn risky behaviors even when the training data looks completely safe

Source: The Decoder | 25-07-25

Finding Robert Bogucki, the man who disappeared on purpose (abc.net.au)

Source: Hacker News | 25-07-25

How Anthropic teams use Claude Code (anthropic.com)

Source: Hacker News | 25-07-25

Quantitative AI progress needs accurate and transparent evaluation (mathstodon.xyz)

Source: Hacker News | 25-07-25

Superfunctions: A universal solution against sync/async fragmentation in Python (github.com/pomponchik)

Source: Hacker News | 25-07-25

Pew finds that only 1 percent of users click a source link directly from Google's AI Overviews

Source: The Decoder | 24-07-25

Lumo: Privacy-first AI assistant (proton.me)

Source: Hacker News | 24-07-25

Building better AI tools (hazelweakly.me)

Source: Hacker News | 24-07-25

US AI Action Plan (ai.gov)

Source: Hacker News | 24-07-25

Distillation makes AI models smaller and cheaper (quantamagazine.org)

Source: Hacker News | 24-07-25

Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu |

Source: ArXiv AI | 24-07-25

Each to Their Own: Exploring the Optimal Embedding in RAG

Authors: Shiting Chen, Zijian Zhao, Jinsong Chen |

Source: ArXiv AI | 24-07-25

Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging

Authors: Farnaz Khun Jush, Steffen Vogler, Matthias Lenga |

Source: ArXiv AI | 24-07-25

MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs

Authors: Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing |

Source: ArXiv AI | 24-07-25

Vision Transformer attention alignment with human visual perception in aesthetic object evaluation

Authors: Miguel Carrasco, César González-Martín, José Aranda, Luis Oliveros |

Source: ArXiv AI | 24-07-25

AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer

Authors: Danny D. Leybzon, Shreyas Tirumala, Nishant Jain, Summer Gillen, Michael Jackson, Cameron McPhee, Jennifer Schmidt |

Source: ArXiv AI | 24-07-25

CASCADE: LLM-Powered JavaScript Deobfuscator at Google

Authors: Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan |

Source: ArXiv AI | 24-07-25

LoRA is All You Need for Safety Alignment of Reasoning LLMs

Authors: Yihao Xue, Baharan Mirzasoleiman |

Source: ArXiv AI | 24-07-25

Towards Autonomous Sustainability Assessment via Multimodal AI Agents

Authors: Zhihan Zhang, Alexander Metzger, Yuxuan Mei, Felix Hähnlein, Zachary Englhardt, Tingyu Cheng, Gregory D. Abowd, Shwetak Patel, Adriana Schulz, Vikram Iyer |

Source: ArXiv AI | 24-07-25

Our Cars Can Talk: How IoT Brings AI to Vehicles

Authors: Amod Kant Agrawal |

Source: ArXiv AI | 24-07-25

Improving LLMs' Generalized Reasoning Abilities by Graph Problems

Authors: Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li |

Source: ArXiv AI | 24-07-25

HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study

Authors: Mandar Pitale, Jelena Frtunikj, Abhinaw Priyadershi, Vasu Singh, Maria Spence |

Source: ArXiv AI | 24-07-25

Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments

Authors: Shitong Zhu, Chenhao Fang, Derek Larson, Neel Reddy Pochareddy, Rajeev Rao, Sophie Zeng, Yanqing Peng, Wendy Summer, Alex Goncalves, Arya Pudota, Herve Robert |

Source: ArXiv AI | 24-07-25

An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models

Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng |

Source: ArXiv AI | 24-07-25

TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment

Authors: Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas |

Source: ArXiv AI | 24-07-25

Simulating multiple human perspectives in socio-ecological systems using large language models

Authors: Yongchao Zeng, Calum Brown, Ioannis Kyriakou, Ronja Hotz, Mark Rounsevell |

Source: ArXiv AI | 24-07-25

Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

Authors: Xinyao Liu, Diping Song |

Source: ArXiv AI | 24-07-25

OpenAI’s new agent moves its 2017 vision for AI closer to reality

Source: The Decoder | 24-07-25

Google’s Gemini 2.5 now supports "conversational image segmentation"

Source: The Decoder | 24-07-25

OpenAI pushes ahead with Stargate as SoftBank remains absent from data center development

Source: The Decoder | 23-07-25

Yet another study finds that overloading LLMs with information leads to worse results

Source: The Decoder | 23-07-25

OpenAI’s math gold hints that AI may soon tackle even longer and harder tasks

Source: The Decoder | 23-07-25

I watched Gemini CLI hallucinate and delete my files (anuraag2601.github.io)

Source: Hacker News | 23-07-25

Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support

Authors: Fangjian Lei, Mariam El Mezouar, Shayan Noei, Ying Zou |

Source: ArXiv AI | 23-07-25

Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models

Authors: Yuxi Lin, Yaxue Fang, Zehong Zhang, Zhouwu Liu, Siyun Zhong, Fulong Yu |

Source: ArXiv AI | 23-07-25

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Authors: Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda |

Source: ArXiv AI | 23-07-25

Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis

Authors: Zhihao Xu, Bixin Li, Lulu Wang |

Source: ArXiv AI | 23-07-25

Why Braking? Scenario Extraction and Reasoning Utilizing LLM

Authors: Yin Wu, Daniel Slieter, Vivek Subramanian, Ahmed Abouelazm, Robin Bohn, J. Marius Zöllner |

Source: ArXiv AI | 23-07-25

Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning

Authors: Simon Ouellette |

Source: ArXiv AI | 23-07-25

Differential Multimodal Transformers

Authors: Jerry Li, Timothy Oh, Joseph Hoang, Vardhit Veeramachaneni |

Source: ArXiv AI | 23-07-25

Micromobility Flow Prediction: A Bike Sharing Station-level Study via Multi-level Spatial-Temporal Attention Neural Network

Authors: Xi Yang, Jiachen Wang, Song Han, Suining He |

Source: ArXiv AI | 23-07-25

Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization

Authors: Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo |

Source: ArXiv AI | 23-07-25

From Logic to Language: A Trust Index for Problem Solving with LLMs

Authors: Tehseen Rug, Felix Böhmer, Tessa Pfattheicher |

Source: ArXiv AI | 23-07-25

SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting

Authors: Shuhao Mei, Yongchao Long, Shan Cao, Xiaobo Han, Shijia Geng, Jinbo Sun, Yuxi Zhou, Shenda Hong |

Source: ArXiv AI | 23-07-25

Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery

Authors: Bo Wen, Chen Wang, Qiwei Han, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Jeffrey L. Rogers |

Source: ArXiv AI | 23-07-25

Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design

Authors: Dong Ben, Hui Feng, Qian Wang |

Source: ArXiv AI | 23-07-25

ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry

Authors: Tianze Xu, Pengrui Lu, Lyumanshan Ye, Xiangkun Hu, Pengfei Liu |

Source: ArXiv AI | 23-07-25

Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens

Authors: Fred Mutisya (1 and 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1), Jean Philbert Nsengemana (3), Talkmore Chidede (4) ((1) Qhala (Nairobi, Kenya), (2) Kenya Medical Association (Nairobi, Kenya), (3) Africa CDC (Addis Ababa, Ethiopia), (4) AfCFTA (Accra, Ghana)) |

Source: ArXiv AI | 23-07-25

LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning

Authors: Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang |

Source: ArXiv AI | 23-07-25

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework

Authors: Hongyi Tang, Zhihao Zhu, Yi Yang |

Source: ArXiv AI | 23-07-25

Improving ASP-based ORS Schedules through Machine Learning Predictions

Authors: Pierangela Bruno, Carmine Dodaro, Giuseppe Galatà, Marco Maratea, Marco Mochi |

Source: ArXiv AI | 23-07-25

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Authors: Shanghai AI Lab: Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou |

Source: ArXiv AI | 23-07-25

Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications

Authors: Jean Lelong, Adnane Errazine, Annabelle Blangero |

Source: ArXiv AI | 23-07-25

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

Authors: Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang |

Source: ArXiv AI | 23-07-25

WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding

Authors: Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun |

Source: ArXiv AI | 23-07-25

Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning

Authors: Mian Ibad Ali Shah, Enda Barrett, Karl Mason |

Source: ArXiv AI | 23-07-25

Subliminal learning: Models transmit behaviors via hidden signals in data (anthropic.com)

Source: Hacker News | 23-07-25

Gemini North telescope discovers long-predicted stellar companion of Betelgeuse (science.org)

Source: Hacker News | 23-07-25

New ARC-AGI-3 benchmark shows that humans still outperform LLMs at pretty basic thinking

Source: The Decoder | 22-07-25

OpenAI claims a breakthrough in LLM reasoning on complex math problems

Source: The Decoder | 22-07-25

FlexOlmo enables organizations to collaboratively train LLMs without data sharing

Source: The Decoder | 22-07-25

Routine: A Structural Planning Framework for LLM Agent System in Enterprise

Authors: Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu |

Source: ArXiv AI | 22-07-25

Large Language Models Assisting Ontology Evaluation

Authors: Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese |

Source: ArXiv AI | 22-07-25

BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning

Authors: Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo |

Source: ArXiv AI | 22-07-25

Towards AI Urban Planner in the Age of GenAI, LLMs, and Agentic AI

Authors: Yanjie Fu |

Source: ArXiv AI | 22-07-25

Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and Responsibility Matrix

Authors: Juan Manuel Contreras |

Source: ArXiv AI | 22-07-25

Configurable multi-agent framework for scalable and realistic testing of llm-based agents

Authors: Sai Wang, Senthilnathan Subramanian, Mudit Sahni, Praneeth Gone, Lingjie Meng, Xiaochen Wang, Nicolas Ferradas Bertoli, Tingxian Cheng, Jun Xu |

Source: ArXiv AI | 22-07-25

The Endless Tuning. An Artificial Intelligence Design To Avoid Human Replacement and Trace Back Responsibilities

Authors: Elio Grande |

Source: ArXiv AI | 22-07-25

Feedback-Induced Performance Decline in LLM-Based Decision-Making

Authors: Xiao Yang, Juxi Leitner, Michael Burke |

Source: ArXiv AI | 22-07-25

DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection

Authors: Jerry Wang, Fang Yu |

Source: ArXiv AI | 22-07-25

IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry

Authors: Junhyeong Lee, Joon-Young Kim, Heekyu Kim, Inhyo Lee, Seunghwa Ryu |

Source: ArXiv AI | 22-07-25

Explainable Artificial Intelligence based Soft Evaluation Indicator for Arc Fault Diagnosis

Authors: Qianchao Wang, Yuxuan Ding, Chuanzhen Jia, Zhe Li, Yaping Du |

Source: ArXiv AI | 22-07-25

LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning

Authors: Cole Robertson, Philip Wolff |

Source: ArXiv AI | 22-07-25

Predictive Process Monitoring Using Object-centric Graph Embeddings

Authors: Wissam Gherissi (LAMSADE), Mehdi Acheli, Joyce El Haddad (LAMSADE), Daniela Grigori (LAMSADE) |

Source: ArXiv AI | 22-07-25

Agentic AI for autonomous anomaly management in complex systems

Authors: Reza Vatankhah Barenji, Sina Khoshgoftar |

Source: ArXiv AI | 22-07-25

A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining

Authors: Yifan Shen, Zihan Zhao, Xiao Xue, Yuwei Guo, Qun Ma, Deyu Zhou, Ming Zhang |

Source: ArXiv AI | 22-07-25

Gemini 2.5 Pro Capable of Winning Gold at IMO 2025

Authors: Yichen Huang, Lin F. Yang |

Source: ArXiv AI | 22-07-25

Don't bother parsing: Just use images for RAG (morphik.ai)

Source: Hacker News | 22-07-25

AccountingBench: Evaluating LLMs on real long-horizon business tasks (penrose.com)

Source: Hacker News | 22-07-25

The Hater's Guide to the AI Bubble (wheresyoured.at)

Source: Hacker News | 22-07-25

How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference (ritza.co)

Source: Hacker News | 22-07-25

Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic (github.com/openai)

Source: Hacker News | 22-07-25

Replit's CEO apologizes after its AI agent wiped a company's code base (businessinsider.com)

Source: Hacker News | 22-07-25

If writing is thinking then what happens if AI is doing the writing and reading? (learningbyshipping.com)

Source: Hacker News | 22-07-25

"Napster-style" piracy allegations put Anthropic at risk of a billion-dollar class action lawsuit

Source: The Decoder | 21-07-25

Decart launches MirageLSD, an AI model that transforms live video feeds in real time

Source: The Decoder | 21-07-25

Show HN: Conductor, a Mac app that lets you run a bunch of Claude Codes at once (conductor.build)

Source: Hacker News | 21-07-25

Coding with LLMs in the summer of 2025 – an update (antirez.com)

Source: Hacker News | 21-07-25

SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection

Authors: Aleksandr Gashkov, Aleksandr Perevalov, Maria Eltsova, Andreas Both |

Source: ArXiv AI | 21-07-25

RAG-based Architectures for Drug Side Effect Retrieval in LLMs

Authors: Shad Nygren, Pinar Avci, Andre Daniels, Reza Rassol, Afshin Beheshti, Diego Galeano |

Source: ArXiv AI | 21-07-25

Using LLMs to identify features of personal and professional skills in an open-response situational judgment test

Authors: Cole Walsh, Rodica Ivan, Muhammad Zafar Iqbal, Colleen Robb |

Source: ArXiv AI | 21-07-25

Exploiting Primacy Effect To Improve Large Language Models

Authors: Bianca Raimondi, Maurizio Gabbrielli |

Source: ArXiv AI | 21-07-25

Preprint: Did I Just Browse A Website Written by LLMs?

Authors: Sichang "Steven" He, Ramesh Govindan, Harsha V. Madhyastha |

Source: ArXiv AI | 21-07-25

A segmented robot grasping perception neural network for edge AI

Authors: Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos S. Kouzinopoulos, Rico Möckel |

Source: ArXiv AI | 21-07-25

Photonic Fabric Platform for AI Accelerators

Authors: Jing Ding, Trung Diep |

Source: ArXiv AI | 21-07-25

Edge Intelligence with Spiking Neural Networks

Authors: Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya |

Source: ArXiv AI | 21-07-25

Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment

Authors: Šimon Kubov, Simon Klíčník, Jakub Dandár, Zdeněk Straka, Karolína Kvaková, Daniel Kvak |

Source: ArXiv AI | 21-07-25

GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination

Authors: Nabil Abdelaziz Ferhat Taleb, Abdolazim Rezaei, Raj Atulkumar Patel, Mehdi Sookhak |

Source: ArXiv AI | 21-07-25

GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models

Authors: Eduardo C. Garrido-Merchán, Cristina Puente |

Source: ArXiv AI | 21-07-25

BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety

Authors: Yuxin Zhang (1), Xi Wang (1), Mo Hu (1), Zhenyu Zhang (1) ((1) Department of Construction Science, College of Architecture, Texas A&M University, College Station, USA) |

Source: ArXiv AI | 21-07-25

DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs

Authors: Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, Tajana Rosing |

Source: ArXiv AI | 21-07-25

Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery

Authors: Mateusz Bystroński, Mikołaj Hołysz, Grzegorz Piotrowski, Nitesh V. Chawla, Tomasz Kajdanowicz |

Source: ArXiv AI | 21-07-25

KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models

Authors: Lam Nguyen, Erika Barcelos, Roger French, Yinghui Wu |

Source: ArXiv AI | 21-07-25

Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions

Authors: Temiloluwa Prioleau, Baiying Lu, Yanjun Cui |

Source: ArXiv AI | 21-07-25

Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment

Authors: Viraj Nishesh Darji, Callie C. Liao, Duoduo Liao |

Source: ArXiv AI | 21-07-25

Computational complexity of neural networks (2022) (lunalux.io)

Source: Hacker News | 21-07-25

iMessage integration in Claude can hijack the model to do anything (generalanalysis.com)

Source: Hacker News | 21-07-25

Nobody knows how to build with AI yet (worksonmymachine.substack.com)

Source: Hacker News | 20-07-25

Local LLMs versus offline Wikipedia (evanhahn.com)

Source: Hacker News | 20-07-25

Make Your Own Backup System – Part 1: Strategy Before Scripts (dragas.net)

Source: Hacker News | 20-07-25

Terence Tao: A human metaphor for evaluating AI capability (mathstodon.xyz)

Source: Hacker News | 20-07-25

I'm betting against AI agents, despite building them (utkarshkanwat.com)

Source: Hacker News | 20-07-25

The Big LLM Architecture Comparison (sebastianraschka.com)

Source: Hacker News | 20-07-25

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Authors: Hao Sun, Mihaela van der Schaar |

Source: ArXiv AI | 20-07-25

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models

Authors: Xiangyu Dong, Haoran Zhao, Jiang Gao, Haozhou Li, Xiaoguang Ma, Yaoming Zhou, Fuhai Chen, Juan Liu |

Source: ArXiv AI | 20-07-25

DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model

Authors: Maulana Bisyir Azhari, David Hyunchul Shim |

Source: ArXiv AI | 20-07-25

Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection

Authors: Hongyang Zhao, Tianyu Liang, Sina Davari, Daeho Kim |

Source: ArXiv AI | 20-07-25

Prompt Injection 2.0: Hybrid AI Threats

Authors: Jeremy McHugh, Kristina Šekrst, Jon Cefalu |

Source: ArXiv AI | 20-07-25

HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models

Authors: Ashray Gupta, Rohan Joseph, Sunny Rai |

Source: ArXiv AI | 20-07-25

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Authors: Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen |

Source: ArXiv AI | 20-07-25

Automating Steering for Safe Multimodal Large Language Models

Authors: Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng |

Source: ArXiv AI | 20-07-25

QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

Authors: Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang |

Source: ArXiv AI | 20-07-25

AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Authors: Yilun Zhao, Weiyuan Chen, Zhijian Xu, Manasi Patwardhan, Yixin Liu, Chengye Wang, Lovekesh Vig, Arman Cohan |

Source: ArXiv AI | 20-07-25

Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Authors: Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve |

Source: ArXiv AI | 20-07-25

Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning

Authors: Sosui Moribe, Taketoshi Ushiama |

Source: ArXiv AI | 20-07-25

Emotional Support with LLM-based Empathetic Dialogue Generation

Authors: Shiquan Wang, Ruiyu Fang, Zhongjiang He, Shuangyong Song, Yongxiang Li |

Source: ArXiv AI | 20-07-25

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Authors: Zhiwei Liu, Jielin Qiu, Shiyu Wang, Jianguo Zhang, Zuxin Liu, Roshan Ram, Haolin Chen, Weiran Yao, Huan Wang, Shelby Heinecke, Silvio Savarese, Caiming Xiong |

Source: ArXiv AI | 20-07-25

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks

Authors: Jian Yao, Ran Cheng, Kay Chen Tan |

Source: ArXiv AI | 20-07-25

Prediction of Highway Traffic Flow Based on Artificial Intelligence Algorithms Using California Traffic Data

Authors: Junseong Lee, Jaegwan Cho, Yoonju Cho, Seoyoon Choi, Yejin Shin |

Source: ArXiv AI | 20-07-25

Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era

Authors: Matthew E. Brophy |

Source: ArXiv AI | 20-07-25

The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations

Authors: Carlos Arriaga, Gonzalo Martínez, Eneko Sendin, Javier Conde, Pedro Reviriego |

Source: ArXiv AI | 20-07-25

Trump advisors are pushing a regulation targeting what they call "woke" AI models in the tech sector

Source: The Decoder | 20-07-25

OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data

Source: The Decoder | 20-07-25

OpenAI claims gold-medal performance at IMO 2025 (twitter.com/alexwei_)

Source: Hacker News | 20-07-25

Meta is luring more top AI researchers from Apple with million-dollar deals

Source: The Decoder | 19-07-25

Google's Veo 3 video generation model launches on Gemini API with a hefty price tag

Source: The Decoder | 19-07-25

Meta says it won’t sign Europe AI agreement, calling it an overreach (cnbc.com)

Source: Hacker News | 19-07-25

GPT-5-reasoning alpha found in the wild (twitter.com/btibor91)

Source: Hacker News | 19-07-25

I avoid using LLMs as a publisher and writer (lifehacky.net)

Source: Hacker News | 19-07-25

Mistral AI adds deep research, voice mode, image editing, and more to Le Chat

Source: The Decoder | 19-07-25

Anthropic could soon be worth $100 billion - thanks to Claude Code

Source: The Decoder | 19-07-25

How I keep up with AI progress (nilenso.com)

Source: Hacker News | 19-07-25

I'm Rebelling Against the Algorithm (varunraghu.com)

Source: Hacker News | 19-07-25

lsr: ls with io_uring (rockorager.dev)

Source: Hacker News | 19-07-25

Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files (github.com/ryoppippi)

Source: Hacker News | 19-07-25

Google brings Gemini 2.5 Pro and Deep Search to AI Mode and adds AI phone calling to search

Source: The Decoder | 18-07-25

Reflection unveils Asimov: an AI agent built to track every step of software development

Source: The Decoder | 18-07-25

Claude Code Unleashed (ymichael.com)

Source: Hacker News | 18-07-25

All AI models might be the same (jxmo.io)

Source: Hacker News | 18-07-25

My favorite use-case for AI is writing logs (vickiboykis.com)

Source: Hacker News | 18-07-25

My experience with Claude Code after two weeks of adventures (sankalp.bearblog.dev)

Source: Hacker News | 18-07-25

ChatGPT agent: bridging research and action (openai.com)

Source: Hacker News | 18-07-25

Anthropic launches a dedicated AI solution to help finance professionals with analysis

Source: The Decoder | 18-07-25

Zuckerberg predicts that not wearing AI glasses in the future will put you at a cognitive disadvantage

Source: The Decoder | 18-07-25

CBS Canceling 'Late Show with Stephen Colbert' After Next Season (nytimes.com)

Source: Hacker News | 18-07-25

Anthropic tightens usage limits for Claude Code without telling users (techcrunch.com)

Source: Hacker News | 18-07-25

Meta hires two more leading OpenAI researchers for its superalignment team

Source: The Decoder | 17-07-25

I was wrong about robots.txt (evgeniipendragon.com)

Source: Hacker News | 17-07-25

The AI bubble today is bigger than the IT bubble in the 1990s (apolloacademy.com)

Source: Hacker News | 17-07-25

Code Execution Through Email: How I Used Claude to Hack Itself (pynt.io)

Source: Hacker News | 17-07-25

N8n vs. node-red, which to use for AI workloads (daniel-payne-keldan-systems.medium.com)

Source: Hacker News | 17-07-25

Quantum Machine Learning in Multi-Qubit Phase-Space Part I: Foundations

Authors: Timothy Heightman, Edward Jiang, Ruth Mora-Soto, Maciej Lewenstein, Marcin Płodzień |

Source: ArXiv AI | 17-07-25

A Framework for Nonstationary Gaussian Processes with Neural Network Parameters

Authors: Zachary James, Joseph Guinness |

Source: ArXiv AI | 17-07-25

Improving Contextual ASR via Multi-grained Fusion with Large Language Models

Authors: Shilin Zhou, Zhenghua Li |

Source: ArXiv AI | 17-07-25

Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding

Authors: Feng Xiao, Jicong Fan |

Source: ArXiv AI | 17-07-25

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Authors: Sybelle Goedicke-Fritz (1), Michelle Bous (1), Annika Engel (2), Matthias Flotho (2 and 5), Pascal Hirsch (2), Hannah Wittig (1), Dino Milanovic (2), Dominik Mohr (1), Mathias Kaspar (6), Sogand Nemat (3), Dorothea Kerner (3), Arno Bücker (3), Andreas Keller (2 and 5 and 7), Sascha Meyer (4), Michael Zemlin (1), Philipp Flotho (2 and 5) ((1) Department of General Pediatrics and Neonatology, Saarland University, Campus Homburg, Homburg/Saar, Germany, (2) Chair for Clinical Bioinformatics, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany, (3) Department of Radiology, and Interventional Radiology, University Hospital of Saarland, Homburg, Germany, (4) Clinical Centre Karlsruhe, Franz-Lust Clinic for Paediatrics, Karlsruhe, Germany, (5) Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarland University Campus, Germany, (6) Digital Medicine, University Hospital of Augsburg, Augsburg, Germany, (7) Pharma Science Hub (PSH), Saarland University Campus, Germany) |

Source: ArXiv AI | 17-07-25

Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization

Authors: Prashanth Vijayaraghavan, Apoorva Nitsure, Charles Mackin, Luyao Shi, Stefano Ambrogio, Arvind Haran, Viresh Paruthi, Ali Elzein, Dan Coops, David Beymer, Tyler Baldwin, Ehsan Degan |

Source: ArXiv AI | 17-07-25

GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Authors: Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia |

Source: ArXiv AI | 17-07-25

LLM-Based Config Synthesis requires Disambiguation

Authors: Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese |

Source: ArXiv AI | 17-07-25

Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length

Authors: Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon |

Source: ArXiv AI | 17-07-25

A Study on the Application of Artificial Intelligence in Ecological Design

Authors: Hengyue Zhao |

Source: ArXiv AI | 17-07-25

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

Authors: Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira |

Source: ArXiv AI | 17-07-25

General Modular Harness for LLM Agents in Multi-Turn Gaming Environments

Authors: Yuxuan Zhang, Haoyang Yu, Lanxiang Hu, Haojian Jin, Hao Zhang |

Source: ArXiv AI | 17-07-25

Auto-Formulating Dynamic Programming Problems with Large Language Models

Authors: Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, Dongdong Ge |

Source: ArXiv AI | 17-07-25

ClarifAI: Enhancing AI Interpretability and Transparency through Case-Based Reasoning and Ontology-Driven Approach for Improved Decision-Making

Authors: Srikanth Vemula |

Source: ArXiv AI | 17-07-25

BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution

Authors: Subin Lin, Chuanbo Hua |

Source: ArXiv AI | 17-07-25

Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning

Authors: Yuhao Chen, Shuochen Liu, Yuanjie Lyu, Chao Zhang, Jiayao Shi, Tong Xu |

Source: ArXiv AI | 17-07-25

Nvidia can resume exports of its H20 AI chip to China after a US policy reversal

Source: The Decoder | 17-07-25

Scanned piano rolls database (pianorollmusic.org)

Source: Hacker News | 17-07-25

Chain of thought monitorability: A new and fragile opportunity for AI safety (arxiv.org)

Source: Hacker News | 17-07-25

Six Years of Gemini (geminiprotocol.net)

Source: Hacker News | 16-07-25

Show HN: Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL (matthieulc.com)

Source: Hacker News | 16-07-25

Reflections on OpenAI (calv.info)

Source: Hacker News | 16-07-25

Gauntlet AI (YC S17): All expenses paid training in AI and $200k+ job (crossover.com)

Source: Hacker News | 16-07-25

LLM Daydreaming (gwern.net)

Source: Hacker News | 16-07-25

KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?

Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher |

Source: ArXiv AI | 16-07-25

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Authors: LG AI Research: Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Kyubeen Han, Seokhee Hong, Junwon Hwang, Taewan Hwang, Joonwon Jang, Hyojin Jeon, Kijeong Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Euisoon Kim, Hyosang Kim, Jihoon Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Gwangho Lee, Haeju Lee, Honglak Lee, Jinsik Lee, Kyungmin Lee, Sangha Park, Young Min Paik, Yongmin Park, Youngyong Park, Sanghyun Seo, Sihoon Yang, Heuiyeen Yeen, Sihyuk Yi, Hyeongu Yun |

Source: ArXiv AI | 16-07-25

Attributes Shape the Embedding Space of Face Recognition Models

Authors: Pierrick Leroy, Antonio Mastropietro, Marco Nurisso, Francesco Vaccarino |

Source: ArXiv AI | 16-07-25

SAMEP: A Secure Protocol for Persistent Context Sharing Across AI Agents

Authors: Hari Masoor |

Source: ArXiv AI | 16-07-25

Streaming 4D Visual Geometry Transformer

Authors: Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu |

Source: ArXiv AI | 16-07-25

AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air

Authors: Shiyi Yang, Xiaoxue Yu, Rongpeng Li, Jianhang Zhu, Zhifeng Zhao, Honggang Zhang |

Source: ArXiv AI | 16-07-25

Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs

Authors: Ye Yang, Xue Xiao, Ping Yin, Taotao Xie |

Source: ArXiv AI | 16-07-25

Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning

Authors: Zheng Zhang |

Source: ArXiv AI | 16-07-25

Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning

Authors: Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas |

Source: ArXiv AI | 16-07-25

WhisperKit: On-device Real-time ASR with Billion-Scale Transformers

Authors: Atila Orhon, Arda Okan, Berkin Durmus, Zach Nagengast, Eduardo Pacheco |

Source: ArXiv AI | 16-07-25

Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case

Authors: JaMor Hairston, Ritvik Ranjan, Sahithi Lakamana, Anthony Spadaro, Selen Bozkurt, Jeanmarie Perrone, Abeed Sarker |

Source: ArXiv AI | 16-07-25

Detecting AI Assistance in Abstract Complex Tasks

Authors: Tyler King, Nikolos Gurney, John H. Miller, Volkan Ustun |

Source: ArXiv AI | 16-07-25

IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models

Authors: Nikesh Prajapati, Bimal Karki, Saroj Gopali, Akbar Siami Namin |

Source: ArXiv AI | 16-07-25

Function-to-Style Guidance of LLMs for Code Translation

Authors: Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang |

Source: ArXiv AI | 16-07-25

Modeling Habitat Shifts: Integrating Convolutional Neural Networks and Tabular Data for Species Migration Prediction

Authors: Emir Durakovic, Min-Hong Shih |

Source: ArXiv AI | 16-07-25

Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation

Authors: Yicong Wu, Ting Chen, Irit Hochberg, Zhoujian Sun, Ruth Edry, Zhengxing Huang, Mor Peleg |

Source: ArXiv AI | 16-07-25

Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems

Authors: Dany Moshkovich, Sergey Zeltyn |

Source: ArXiv AI | 16-07-25

Perspective-Aware AI in Extended Reality

Authors: Daniel Platnick, Matti Gruener, Marjan Alirezaie, Kent Larson, Dava J. Newman, Hossein Rahnama |

Source: ArXiv AI | 16-07-25

DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering

Authors: Yinsheng Li, Zhen Dong, Yi Shao |

Source: ArXiv AI | 16-07-25

How Many Instructions Can LLMs Follow at Once?

Authors: Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari |

Source: ArXiv AI | 16-07-25

Vulnerable kids are nearly three times more likely to use companion AI chatbots for friendship

Source: The Decoder | 16-07-25

Anthropic, OpenAI, Google, and xAI have landed Pentagon contracts worth up to $200 million

Source: The Decoder | 16-07-25

LLM Inevitabilism (tomrenner.com)

Source: Hacker News | 16-07-25

OpenAI – vulnerability responsible disclosure (any.org)

Source: Hacker News | 16-07-25

Mira Murati’s AI startup Thinking Machines valued at $12B in early-stage funding (reuters.com)

Source: Hacker News | 16-07-25

Claude for Financial Services (anthropic.com)

Source: Hacker News | 16-07-25

Unlike ChatGPT, Anthropic has doubled down on Artifacts (ben-mini.com)

Source: Hacker News | 16-07-25

NeuralOS: An operating system powered by neural networks (neural-os.com)

Source: Hacker News | 15-07-25

Context Rot: How increasing input tokens impacts LLM performance (trychroma.com)

Source: Hacker News | 15-07-25

CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks

Authors: Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan |

Source: ArXiv AI | 15-07-25

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

Authors: Joel Becker, Nate Rush, Elizabeth Barnes, David Rein |

Source: ArXiv AI | 15-07-25

Multi-Actor Generative Artificial Intelligence as a Game Engine

Authors: Alexander Sasha Vezhnevets, Jayd Matyas, Logan Cross, Davide Paglieri, Minsuk Chang, William A. Cunningham, Simon Osindero, William S. Isaac, Joel Z. Leibo |

Source: ArXiv AI | 15-07-25

LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing

Authors: Quanyan Zhu |

Source: ArXiv AI | 15-07-25

Knowledge Conceptualization Impacts RAG Efficacy

Authors: Chris Davis Jaldi, Anmol Saini, Elham Ghiasi, O. Divine Eziolise, Cogan Shimizu |

Source: ArXiv AI | 15-07-25

EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique

Authors: Chenglin Zhu, Tao Zhang, Chong Li, Mingan Lin, Zenan Zhou, Jian Xie |

Source: ArXiv AI | 15-07-25

A Taxonomy of Omnicidal Futures Involving Artificial Intelligence

Authors: Andrew Critch, Jacob Tsimerman |

Source: ArXiv AI | 15-07-25

When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents

Authors: Matous Kozak, Roshanak Zilouchian Moghaddam, Siva Sivaraman |

Source: ArXiv AI | 15-07-25

humancompatible.interconnect: Testing Properties of Repeated Uses of Interconnections of AI Systems

Authors: Rodion Nazarov, Anthony Quinn, Robert Shorten, Jakub Marecek |

Source: ArXiv AI | 15-07-25

Learning to Control Dynamical Agents via Spiking Neural Networks and Metropolis-Hastings Sampling

Authors: Ali Safa, Farida Mohsen, Ali Al-Zawqari |

Source: ArXiv AI | 15-07-25

Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems

Authors: Aniruddha Chattopadhyay, Raj Dandekar, Kaushik Roy |

Source: ArXiv AI | 15-07-25

Is Human-Written Data Enough? The Challenge of Teaching Reasoning to LLMs Without RL or Distillation

Authors: Wei Du, Branislav Kisacanin, George Armstrong, Shubham Toshniwal, Ivan Moshkov, Alexan Ayrapetyan, Sadegh Mahdavi, Dan Zhao, Shizhe Diao, Dragan Masulovic, Marius Stanean, Advaith Avadhanam, Max Wang, Ashmit Dutta, Shitij Govil, Sri Yanamandara, Mihir Tandon, Sriram Ananthakrishnan, Vedant Rathi, David Zhang, Joonseok Kang, Leon Luo, Titu Andreescu, Boris Ginsburg, Igor Gitman |

Source: ArXiv AI | 15-07-25

Technical Requirements for Halting Dangerous AI Activities

Authors: Peter Barnett, Aaron Scher, David Abecassis |

Source: ArXiv AI | 15-07-25

Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations

Authors: Bradley P. Allen, Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, Paul Groth |

Source: ArXiv AI | 15-07-25

DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models

Authors: Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, Jingping Liu, Yanghua Xiao, Huajun Chen, Qing-Long Han, Yang Tang |

Source: ArXiv AI | 15-07-25

Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making

Authors: Thomas T. Hills |

Source: ArXiv AI | 15-07-25

Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration

Authors: Sadig Gojayev, Ahmad Anaqreh, Carolina Fortuna |

Source: ArXiv AI | 15-07-25

BlueGlass: A Framework for Composite AI Safety

Authors: Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl |

Source: ArXiv AI | 15-07-25

FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring

Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida |

Source: ArXiv AI | 15-07-25

Introducing the Swiss Food Knowledge Graph: AI for Context-Aware Nutrition Recommendation

Authors: Lubnaa Abdur Rahman, Ioannis Papathanail, Stavroula Mougiakakou |

Source: ArXiv AI | 15-07-25

Survey for Categorising Explainable AI Studies Using Data Analysis Task Frameworks

Authors: Hamzah Ziadeh, Hendrik Knoche |

Source: ArXiv AI | 15-07-25

Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?

Authors: Yumi Omori, Zixuan Dong, Keith Ross |

Source: ArXiv AI | 15-07-25

Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence

Authors: Jiaming Tian, Liyao Li, Wentao Ye, Haobo Wang, Lingxin Wang, Lihua Yu, Zujie Ren, Gang Chen, Junbo Zhao |

Source: ArXiv AI | 15-07-25

SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning

Authors: Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui |

Source: ArXiv AI | 15-07-25

Elon Musk's AI company xAI apologizes "deeply" for Grok's "horrific behavior"

Source: The Decoder | 15-07-25

Anthropic, Google, OpenAI and XAI Granted Up to $200M from Defense Department (cnbc.com)

Source: Hacker News | 15-07-25

Embedding user-defined indexes in Apache Parquet (apache.org)

Source: Hacker News | 15-07-25

OpenAI delays release of open-weight model indefinitely over safety concerns

Source: The Decoder | 14-07-25

A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1

Authors: Marcin Pietroń, Rafał Olszowski, Jakub Gomułka, Filip Gampel, Andrzej Tomski |

Source: ArXiv AI | 14-07-25

Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy

Authors: Fernando Ayach, Vitor Lameirão, Raul Leão, Jerfferson Felizardo, Rafael Sobrinho, Vanessa Borges, Patrícia Matsubara, Awdren Fontão |

Source: ArXiv AI | 14-07-25

TableReasoner: Advancing Table Reasoning Framework with Large Language Models

Authors: Sishi Xiong, Dakai Wang, Yu Zhao, Jie Zhang, Changzai Pan, Haowei He, Xiangyu Li, Wenhan Chang, Zhongjiang He, Shuangyong Song, Yongxiang Li |

Source: ArXiv AI | 14-07-25

Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions

Authors: Quanyan Zhu |

Source: ArXiv AI | 14-07-25

A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking

Authors: Zhengye Han, Quanyan Zhu |

Source: ArXiv AI | 14-07-25

Giving AI Agents Access to Cryptocurrency and Smart Contracts Creates New Vectors of AI Harm

Authors: Bill Marino, Ari Juels |

Source: ArXiv AI | 14-07-25

Multi-Agent LLMs as Ethics Advocates in AI-Based Systems

Authors: Asma Yamani, Malak Baslyman, Moataz Ahmed |

Source: ArXiv AI | 14-07-25

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

Authors: Inclusion AI: Fudong Wang, Jiajia Liu, Jingdong Chen, Jun Zhou, Kaixiang Ji, Lixiang Ru, Qingpei Guo, Ruobing Zheng, Tianqi Li, Yi Yuan, Yifan Mao, Yuting Xiao, Ziping Ma |

Source: ArXiv AI | 14-07-25

Introspection of Thought Helps AI Agents

Authors: Haoran Sun, Shaoning Zeng |

Source: ArXiv AI | 14-07-25

Agentic Large Language Models for Conceptual Systems Engineering and Design

Authors: Soheyl Massoudi, Mark Fuge |

Source: ArXiv AI | 14-07-25

Show HN: FFmpeg in plain English – LLM-assisted FFmpeg in the browser (vidmix.app)

Source: Hacker News | 14-07-25

The upcoming GPT-3 moment for RL (mechanize.work)

Source: Hacker News | 14-07-25

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (arxiv.org)

Source: Hacker News | 14-07-25

Local Chatbot RAG with FreeBSD Knowledge (hackacad.net)

Source: Hacker News | 14-07-25

Ask HN: How much of OpenAI code is written by AI?

Source: Hacker News | 14-07-25

Show HN: Learn LLMs LeetCode Style (github.com/exorust)

Source: Hacker News | 14-07-25

Hypercapitalism and the AI talent wars (johnluttig.com)

Source: Hacker News | 14-07-25

OpenAI loses out as Google hires Windsurf's CEO and top talent

Source: The Decoder | 13-07-25

Switching to Claude Code and VSCode Inside Docker (timsh.org)

Source: Hacker News | 13-07-25

Understanding Tool Calling in LLMs – Step-by-Step with REST and Spring AI (muthuishere.medium.com)

Source: Hacker News | 13-07-25

Axon's Draft One AI Police Report Generator Is Designed to Defy Transparency (eff.org)

Source: Hacker News | 13-07-25

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Authors: Yu Wang, Xi Chen |

Source: ArXiv AI | 13-07-25

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Authors: Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim |

Source: ArXiv AI | 13-07-25

Autonomous Control Leveraging LLMs: An Agentic Framework for Next-Generation Industrial Automation

Authors: Javal Vyas, Mehmet Mercangoz |

Source: ArXiv AI | 13-07-25

BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks

Authors: Mridula Vijendran, Shuang Chen, Jingjing Deng, Hubert P. H. Shum |

Source: ArXiv AI | 13-07-25

Application of LLMs to Multi-Robot Path Planning and Task Allocation

Authors: Ashish Kumar |

Source: ArXiv AI | 13-07-25

On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

Authors: Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. Rothblum |

Source: ArXiv AI | 13-07-25

StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley

Authors: Weihao Tan, Changjiu Jiang, Yu Duan, Mingcong Lei, Jiageng Li, Yitian Hong, Xinrun Wang, Bo An |

Source: ArXiv AI | 13-07-25

DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search

Authors: Zerui Yang, Yuwei Wan, Yinqiao Li, Yudai Matsuda, Tong Xie, Linqi Song |

Source: ArXiv AI | 13-07-25

Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models

Authors: Sedigh Khademi, Jim Black, Christopher Palmer, Muhammad Javed, Hazel Clothier, Jim Buttery, Gerardo Luis Dimaguila |

Source: ArXiv AI | 13-07-25

PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations

Authors: Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka |

Source: ArXiv AI | 13-07-25

Measuring AI Alignment with Human Flourishing

Authors: Elizabeth Hilliard, Akshaya Jagadeesh, Alex Cook, Steele Billings, Nicholas Skytland, Alicia Llewellyn, Jackson Paull, Nathan Paull, Nolan Kurylo, Keatra Nesbitt, Robert Gruenewald, Anthony Jantzi, Omar Chavez |

Source: ArXiv AI | 13-07-25

Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization

Authors: Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye |

Source: ArXiv AI | 13-07-25

An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis

Authors: Mingda Zhang, Na Zhao, Jianglong Qing, Qing xu, Kaiwen Pan, Ting luo |

Source: ArXiv AI | 13-07-25

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Source: The Decoder | 13-07-25

EU's Model Documentation Form makes AI providers explain their models like it's tax season

Source: The Decoder | 13-07-25

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

Source: The Decoder | 13-07-25

OpenAI’s Windsurf deal is off, and Windsurf’s CEO is going to Google (theverge.com)

Source: Hacker News | 13-07-25

Researchers used 1,600 YouTube fail videos to show AI models struggle with surprises

Source: The Decoder | 13-07-25

OpenAI’s head of ChatGPT says AI will not displace doctors but will displace not going to the doctor

Source: The Decoder | 12-07-25

Bad Actors Are Grooming LLMs to Produce Falsehoods (americansunlight.substack.com)

Source: Hacker News | 12-07-25

OpenAI delays launch of open-weight model (twitter.com/sama)

Source: Hacker News | 12-07-25

Leveraging Elixir's hot code loading capabilities to modularize a monolithic app (lucassifoni.info)

Source: Hacker News | 12-07-25

Andrew Ng: Building Faster with AI [video] (youtube.com)

Source: Hacker News | 12-07-25

Sieve (YC X25) is hiring researchers to build large video datasets for AI labs (sievedata.com)

Source: Hacker News | 12-07-25

Upgrading an M4 Pro Mac mini's storage for half the price (jeffgeerling.com)

Source: Hacker News | 12-07-25

ETH Zurich and EPFL to release a LLM developed on public infrastructure (ethz.ch)

Source: Hacker News | 12-07-25

Google unveils MedGemma, an open-source AI model suite for medical applications

Source: The Decoder | 12-07-25

LLM Inference Handbook (bentoml.com)

Source: Hacker News | 12-07-25

Activeloop (YC S18) Is Hiring AI Search and Python Back End Engineers (Onsite, MV) (activeloop.ai)

Source: Hacker News | 12-07-25

Hugging Face warns that closed-source robots threaten user control

Source: The Decoder | 11-07-25

Most AI models can fake alignment, but safety training suppresses the behavior, study finds

Source: The Decoder | 11-07-25

Meta continues to lure top AI talent with compensation packages exceeding $200 million

Source: The Decoder | 11-07-25

OpenAI will debut an open-weight LLM soon and launch a browser with integrated AI chat

Source: The Decoder | 11-07-25

Graphical Linear Algebra (graphicallinearalgebra.net)

Source: Hacker News | 11-07-25

Batch Mode in the Gemini API: Process More for Less (googleblog.com)

Source: Hacker News | 11-07-25

Recovering from AI Addiction (internetaddictsanonymous.org)

Source: Hacker News | 11-07-25

Is Gemini 2.5 good at bounding boxes? (simedw.com)

Source: Hacker News | 11-07-25

Not So Fast: AI Coding Tools Can Reduce Productivity (secondthoughts.ai)

Source: Hacker News | 11-07-25

Measuring the impact of AI on experienced open-source developer productivity (metr.org)

Source: Hacker News | 11-07-25

Bloomberg: China’s AI expansion in Xinjiang relies on Nvidia chips despite U.S. export controls

Source: The Decoder | 10-07-25

An attacker used AI to impersonate Secretary Rubio and contact high-ranking officials

Source: The Decoder | 10-07-25

At last, a use case for AI agents with sky-high ROI: Stealing crypto (theregister.com)

Source: Hacker News | 10-07-25

ChatGPT Guessing Game Leads to Users Extracting Free Windows OS Keys and More (0din.ai)

Source: Hacker News | 10-07-25

Biomni: A General-Purpose Biomedical AI Agent (github.com/snap-stanford)

Source: Hacker News | 10-07-25

MCP-B: A Protocol for AI Browser Automation (mcp-b.ai)

Source: Hacker News | 10-07-25

Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications

Authors: Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon |

Source: ArXiv AI | 10-07-25

Comprehensive Evaluation of Prototype Neural Networks

Authors: Philipp Schlinge, Steffen Meinert, Martin Atzmueller |

Source: ArXiv AI | 10-07-25

OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion

Authors: Yizhuo Wu, Ang Li, Chang Gao |

Source: ArXiv AI | 10-07-25

Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change

Authors: Adrian Rauchfleisch, Joshua Philip Suarez, Nikka Marie Sales, Andreas Jungherr |

Source: ArXiv AI | 10-07-25

The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover

Authors: Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro |

Source: ArXiv AI | 10-07-25

Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights

Authors: Alexandra Abbas, Celia Waggoner, Justin Olive |

Source: ArXiv AI | 10-07-25

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Authors: Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao |

Source: ArXiv AI | 10-07-25

MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation

Authors: Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang |

Source: ArXiv AI | 10-07-25

PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments

Authors: Hanqun Cao, Xinyi Zhou, Zijun Gao, Chenyu Wang, Xin Gao, Zhi Zhang, Chunbin Gu, Ge Liu, Pheng-Ann Heng |

Source: ArXiv AI | 10-07-25

A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Authors: Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee |

Source: ArXiv AI | 10-07-25

Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation

Authors: Haris Khan, Shumaila Asif, Hassan Nasir |

阅读更多

来源: ArXiv AI | 10-07-25

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

Authors: Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar |

阅读更多

来源: ArXiv AI | 10-07-25

Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification

Authors: Martin Sondermann, Pinar Bisgin, Niklas Tschorn, Anja Burmann, Christoph M. Friedrich |

阅读更多

来源: ArXiv AI | 10-07-25

An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator

Authors: Yulin An, Enrique del Castillo |

阅读更多

来源: ArXiv AI | 10-07-25

Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI

Authors: David Orban |

阅读更多

来源: ArXiv AI | 10-07-25

The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation

Authors: Jieren Deng, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang |

阅读更多

来源: ArXiv AI | 10-07-25

OpenAI is ramping up security to prevent rivals from copying its advanced AI models

阅读更多

来源: The Decoder | 10-07-25

RapidRAW: A non-destructive and GPU-accelerated RAW image editor (github.com/cybertimon)

阅读更多

来源: Hacker News | 10-07-25

Why LLMs Can't Write Q/Kdb+: Writing Code Right-to-Left (medium.com/gabiteodoru)

阅读更多

来源: Hacker News | 10-07-25

Apple’s AI team faces major departures as Meta recruits key engineers

阅读更多

来源: The Decoder | 09-07-25

A developer focused on stopping AI bots says poisoning datasets is like peeing in the ocean

阅读更多

来源: The Decoder | 09-07-25

Researchers reveal that AI models have distinct strategic fingerprints in classic game theory tests

阅读更多

来源: The Decoder | 09-07-25

Sakana AI's new algorithm lets large language models work together to solve complex problems

阅读更多

来源: The Decoder | 09-07-25

Huawei pushes back on AI model plagiarism claims

阅读更多

来源: The Decoder | 09-07-25

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

Authors: Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Ho-Kyeong Ra, Viren Bajaj, Zeya Ahmad |

阅读更多

来源: ArXiv AI | 09-07-25

SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads

Authors: Jiale Lao, Immanuel Trummer |

阅读更多

来源: ArXiv AI | 09-07-25

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Authors: Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, Wangchunshu Zhou |

阅读更多

来源: ArXiv AI | 09-07-25

Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers

Authors: Zhiyuan Peng, Ting-ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang |

阅读更多

来源: ArXiv AI | 09-07-25

Chat2SPaT: A Large Language Model Based Tool for Automating Traffic Signal Control Plan Management

Authors: Yue Wang, Miao Zhou, Guijing Huang, Rui Zhuo, Chao Yi, Zhenliang Ma |

阅读更多

来源: ArXiv AI | 09-07-25

Cultivating Multimodal Intelligence: Interpretive Reasoning and Agentic RAG Approaches to Dermatological Diagnosis

Authors: Karishma Thakrar, Shreyas Basavatia, Akshay Daftardar |

阅读更多

来源: ArXiv AI | 09-07-25

SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation

Authors: Shovito Barua Soumma, Asiful Arefeen, Stephanie M. Carpenter, Melanie Hingle, Hassan Ghasemzadeh |

阅读更多

来源: ArXiv AI | 09-07-25

Red Teaming AI Red Teaming

Authors: Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta |

阅读更多

来源: ArXiv AI | 09-07-25

Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment

Authors: Jiahuan Pei, Fanghua Ye, Xin Sun, Wentao Deng, Koen Hindriks, Junxiao Wang |

阅读更多

来源: ArXiv AI | 09-07-25

Domain adaptation of large language models for geotechnical applications

Authors: Lei Fan, Fangxue Liu, Cheng Chen |

阅读更多

来源: ArXiv AI | 09-07-25

MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models

Authors: Wei Zhang, Juan Chen, En Zhu, Wenhong Cheng, YunPeng Li, Yanbo J. Wang |

阅读更多

来源: ArXiv AI | 09-07-25

Towards Measurement Theory for Artificial Intelligence

Authors: Elija Perrier |

阅读更多

来源: ArXiv AI | 09-07-25

Divergent Realities: A Comparative Analysis of Human Expert vs. Artificial Intelligence Based Generation and Evaluation of Treatment Plans in Dermatology

Authors: Dipayan Sengupta, Saumya Panda |

阅读更多

来源: ArXiv AI | 09-07-25

LLMs are Introvert

Authors: Litian Zhang, Xiaoming Zhang, Bingyu Yan, Ziyi Zhou, Bo Zhang, Zhenyu Guan, Xi Zhang, Chaozhuo Li |

阅读更多

来源: ArXiv AI | 09-07-25

Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses

Authors: Yuan An, John Liu, Niyam Acharya, Ruhma Hashmi |

阅读更多

来源: ArXiv AI | 09-07-25

An autonomous agent for auditing and improving the reliability of clinical AI models

Authors: Lukas Kuhn, Florian Buettner |

阅读更多

来源: ArXiv AI | 09-07-25

Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better

Authors: Aaron Bembenek (The University of Melbourne) |

阅读更多

来源: ArXiv AI | 09-07-25

Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity

Authors: Shuai Zhao, Yulin Zhang, Luwei Xiao, Xinyi Wu, Yanhao Jia, Zhongliang Guo, Xiaobao Wu, Cong-Duy Nguyen, Guoming Zhang, Anh Tuan Luu |

阅读更多

来源: ArXiv AI | 09-07-25

MusiScene: Leveraging MU-LLaMA for Scene Imagination and Enhanced Video Background Music Generation

Authors: Fathinah Izzati, Xinyue Li, Yuxuan Wu, Gus Xia |

阅读更多

来源: ArXiv AI | 09-07-25

Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening

Authors: Zhijun Guo, Alvina Lai, Julia Ive, Alexandru Petcu, Yutong Wang, Luyuan Qi, Johan H Thygesen, Kezhi Li |

阅读更多

来源: ArXiv AI | 09-07-25

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

Authors: Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, Maarten Sap |

阅读更多

来源: ArXiv AI | 09-07-25

FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

Authors: Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang |

阅读更多

来源: ArXiv AI | 09-07-25

Smollm3: Smol, multilingual, long-context reasoner LLM (huggingface.co)

阅读更多

来源: Hacker News | 09-07-25

I'm Building LLM for Satellite Data EarthGPT.app (earthgpt.app)

阅读更多

来源: Hacker News | 09-07-25

The Tradeoffs of SSMs and Transformers (goombalab.github.io)

阅读更多

来源: Hacker News | 09-07-25

Rules of good writing (2007) (dilbertblog.typepad.com)

阅读更多

来源: Hacker News | 09-07-25

ChatGPT helped identify a genetic MTHFR mutation after a decade of missed diagnoses

阅读更多

来源: The Decoder | 08-07-25

Adding a feature because ChatGPT incorrectly thinks it exists (holovaty.com)

阅读更多

来源: Hacker News | 08-07-25

Launch HN: Morph (YC S23) – Apply AI code edits at 4,500 tokens/sec

阅读更多

来源: Hacker News | 08-07-25

Agent Exchange: Shaping the Future of AI Agent Economics

Authors: Yingxuan Yang, Ying Wen, Jun Wang, Weinan Zhang |

阅读更多

来源: ArXiv AI | 08-07-25

LLMs model how humans induce logically structured rules

Authors: Alyssa Loo, Ellie Pavlick, Roman Feiman |

阅读更多

来源: ArXiv AI | 08-07-25

Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features

Authors: Thuy An Ha, Bao Quoc Vo |

阅读更多

来源: ArXiv AI | 08-07-25

Lyria: A General LLM-Driven Genetic Algorithm Framework for Problem Solving

Authors: Weizhi Tang, Kwabena Nuamah, Vaishak Belle |

阅读更多

来源: ArXiv AI | 08-07-25

A Technical Survey of Reinforcement Learning Techniques for Large Language Models

Authors: Saksham Sahai Srivastava, Vaneet Aggarwal |

阅读更多

来源: ArXiv AI | 08-07-25

Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing

Authors: Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang |

阅读更多

来源: ArXiv AI | 08-07-25

How to Train Your LLM Web Agent: A Statistical Diagnosis

Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia |

阅读更多

来源: ArXiv AI | 08-07-25

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Authors: Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu |

阅读更多

来源: ArXiv AI | 08-07-25

DC-Mamber: A Dual Channel Prediction Model based on Mamba and Linear Transformer for Multivariate Time Series Forecasting

Authors: Bing Fan, Shusen Ma, Yun-Bo Zhao, Yu Kang |

阅读更多

来源: ArXiv AI | 08-07-25

MedGellan: LLM-Generated Medical Guidance to Support Physicians

Authors: Debodeep Banerjee, Burcu Sayin, Stefano Teso, Andrea Passerini |

阅读更多

来源: ArXiv AI | 08-07-25

Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence

Authors: Sonal Allana, Rozita Dara, Xiaodong Lin, Pulei Xiong |

阅读更多

来源: ArXiv AI | 08-07-25

Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective

Authors: Niloofar Shadab, Tyler Cody, Alejandro Salado, Taylan G. Topcu, Mohammad Shadab, Peter Beling |

阅读更多

来源: ArXiv AI | 08-07-25

LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko |

阅读更多

来源: ArXiv AI | 08-07-25

ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Xinyang He, Chujie Chen, Cangyuan Li, Mengdi Wang, Haobo Xu, Yinhe Han, Ying Wang |

阅读更多

来源: ArXiv AI | 08-07-25

DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine

Authors: Zewen Sun, Ruoxiang Huang, Jiahe Feng, Rundong Kong, Yuqian Wang, Hengyu Liu, Ziqi Gong, Yuyuan Qin, Yingxue Wang, Yu Wang |

阅读更多

来源: ArXiv AI | 08-07-25

Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents

Authors: George Jagadeesh, Srikrishna Iyer, Michal Polanowski, Kai Xin Thia |

阅读更多

来源: ArXiv AI | 08-07-25

MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

Authors: Kaleem Ullah Qasim, Jiashu Zhang |

阅读更多

来源: ArXiv AI | 08-07-25

SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

Authors: Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Siheng Chen |

阅读更多

来源: ArXiv AI | 08-07-25

OpenAI's Head of Recruiting says Meta's hiring tactics "reek of desperation"

阅读更多

来源: The Decoder | 08-07-25

The Maquet machine: how AI is reviving Alexandre Dumas' successful model

阅读更多

来源: The Decoder | 08-07-25

Alibaba's new GPT-4o competitor Qwen VLo is no longer open source

阅读更多

来源: The Decoder | 08-07-25

A non-anthropomorphized view of LLMs (addxorrol.blogspot.com)

阅读更多

来源: Hacker News | 08-07-25

Early Signs of Steganographic Capabilities in Frontier LLMs

Authors: Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner |

阅读更多

来源: ArXiv AI | 08-07-25

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Authors: Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo |

阅读更多

来源: ArXiv AI | 08-07-25

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Authors: Purbesh Mitra, Sennur Ulukus |

阅读更多

来源: ArXiv AI | 08-07-25

SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

Authors: Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun |

阅读更多

来源: ArXiv AI | 08-07-25

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Authors: Ken Tsui |

阅读更多

来源: ArXiv AI | 08-07-25

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs

Authors: Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates |

阅读更多

来源: ArXiv AI | 08-07-25

STELLA: Self-Evolving LLM Agent for Biomedical Research

Authors: Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong |

阅读更多

来源: ArXiv AI | 08-07-25

Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation

Authors: Jungkoo Kang |

阅读更多

来源: ArXiv AI | 08-07-25

Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust

Authors: Amogh Mannekote, Adam Davies, Guohao Li, Kristy Elizabeth Boyer, ChengXiang Zhai, Bonnie J Dorr, Francesco Pinto |

阅读更多

来源: ArXiv AI | 08-07-25

Data Diversification Methods In Alignment Enhance Math Performance In LLMs

Authors: Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou |

阅读更多

来源: ArXiv AI | 08-07-25

What Neuroscience Can Teach AI About Learning in Continuously Changing Environments

Authors: Daniel Durstewitz, Bruno Averbeck, Georgia Koppe |

阅读更多

来源: ArXiv AI | 08-07-25

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Authors: Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach |

阅读更多

来源: ArXiv AI | 08-07-25

OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent

Authors: Bowen Chen, Zhao Wang, Shingo Takamatsu |

阅读更多

来源: ArXiv AI | 08-07-25

Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory

Authors: Kenneth Payne, Baptiste Alloui-Cros |

阅读更多

来源: ArXiv AI | 08-07-25

Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

Authors: Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin |

阅读更多

来源: ArXiv AI | 08-07-25

Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work

Authors: Guangwei Zhang |

阅读更多

来源: ArXiv AI | 08-07-25

KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs

Authors: Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang |

阅读更多

来源: ArXiv AI | 08-07-25

"No grace period, no pause": EU sticks to AI Act timeline despite industry pushback

阅读更多

来源: The Decoder | 07-07-25

ChatGPT usage for news surges as Google news searches decline

阅读更多

来源: The Decoder | 07-07-25

LLMs should not replace therapists (arxiv.org)

阅读更多

来源: Hacker News | 07-07-25

Opencode: AI coding agent, built for the terminal (github.com/sst)

阅读更多

来源: Hacker News | 07-07-25

Collatz's Ant and Σ(n) (gbragafibra.github.io)

阅读更多

来源: Hacker News | 07-07-25

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)

阅读更多

来源: Hacker News | 07-07-25

Mirage: AI-native UGC game engine powered by real-time world model (dynamicslab.ai)

阅读更多

来源: Hacker News | 07-07-25

Optimizing Tool Selection for LLM Workflows with Differentiable Programming (viksit.substack.com)

阅读更多

来源: Hacker News | 06-07-25

The force-feeding of AI features on an unwilling public (honest-broker.com)

阅读更多

来源: Hacker News | 06-07-25

A Canadian's AI hoax duped the media and propelled a 'band' to success (cbc.ca)

阅读更多

来源: Hacker News | 06-07-25

The Right Way to Embed an LLM in a Group Chat (tripjam.app)

阅读更多

来源: Hacker News | 06-07-25

Impact of PCIe 5.0 Bandwidth on GPU Content Creation and LLM Performance (pugetsystems.com)

阅读更多

来源: Hacker News | 05-07-25

Large Language Models Are Improving Exponentially (ieee.org)

阅读更多

来源: Hacker News | 05-07-25

SciArena lets scientists compare LLMs on real research questions

阅读更多

来源: The Decoder | 05-07-25

Google launches Veo 3 Fast worldwide, letting Gemini Pro users generate videos up to 720p

阅读更多

来源: The Decoder | 05-07-25

Gremllm (github.com/awwaiid)

阅读更多

来源: Hacker News | 05-07-25

ChatGPT creates phisher's paradise by serving the wrong URLs for major companies (theregister.com)

阅读更多

来源: Hacker News | 05-07-25

Version Control for AI Coding (branching.app)

阅读更多

来源: Hacker News | 05-07-25

Everything around LLMs is still magical and wishful thinking (dmitriid.com)

阅读更多

来源: Hacker News | 05-07-25

Meta reportedly offers top OpenAI researchers up to $300 million over four years

阅读更多

来源: The Decoder | 04-07-25

How AI on Microcontrollers Works: Operators and Kernels (danielmangum.com)

阅读更多

来源: Hacker News | 04-07-25

Show HN: I AI coded a tower defense game and documented the whole process (github.com/maciej-trebacz)

阅读更多

来源: Hacker News | 04-07-25

Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] (youtube.com)

阅读更多

来源: Hacker News | 04-07-25

About AI Evals (hamel.dev)

阅读更多

来源: Hacker News | 04-07-25

Manipulating trapped air bubbles in ice for message storage in cold regions (cell.com)

阅读更多

来源: Hacker News | 04-07-25

Cloudflare aims to save the World Wide Web by blocking AI crawlers without explicit consent

阅读更多

来源: The Decoder | 03-07-25

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

Authors: Zeyu Huang, Tianhao Cheng, Zihan Qiu, Zili Wang, Yinghui Xu, Edoardo M. Ponti, Ivan Titov |

阅读更多

来源: ArXiv AI | 03-07-25

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu |

阅读更多

来源: ArXiv AI | 03-07-25

Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

Authors: Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo |

阅读更多

来源: ArXiv AI | 03-07-25

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma |

阅读更多

来源: ArXiv AI | 03-07-25

Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture

Authors: Bochen Han, Songmao Zhang |

阅读更多

来源: ArXiv AI | 03-07-25

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

Authors: Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud |

阅读更多

来源: ArXiv AI | 03-07-25

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

Authors: Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang |

阅读更多

来源: ArXiv AI | 03-07-25

Enhanced Generative Model Evaluation with Clipped Density and Coverage

Authors: Nicolas Salvy, Hugues Talbot, Bertrand Thirion |

阅读更多

来源: ArXiv AI | 03-07-25

Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems

Authors: Xiaoyu Ji, Jessica Shorland, Joshua Shank, Pascal Delpe-Brice, Latanya Sweeney, Jan Allebach, Ali Shakouri |

阅读更多

来源: ArXiv AI | 03-07-25

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios |

阅读更多

来源: ArXiv AI | 03-07-25

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Authors: Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu |

阅读更多

来源: ArXiv AI | 03-07-25

End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

Authors: Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna |

阅读更多

来源: ArXiv AI | 03-07-25

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He |

阅读更多

来源: ArXiv AI | 03-07-25

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Authors: Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, Wanxiang Che |

阅读更多

来源: ArXiv AI | 03-07-25

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Authors: Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir |

阅读更多

来源: ArXiv AI | 03-07-25

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Authors: Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman |

阅读更多

来源: ArXiv AI | 03-07-25

Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection

Authors: Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari |

阅读更多

来源: ArXiv AI | 03-07-25

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang |

阅读更多

来源: ArXiv AI | 03-07-25

Using multi-agent architecture to mitigate the risk of LLM hallucinations

Authors: Abd Elrahman Amer, Magdi Amer |

阅读更多

来源: ArXiv AI | 03-07-25

MindsDB (YC W20) is hiring an AI solutions engineer (greenhouse.io)

阅读更多

来源: Hacker News | 03-07-25

What to build instead of AI agents (decodingml.substack.com)

阅读更多

来源: Hacker News | 03-07-25

Meta founds Superintelligence Labs with top acquisitions from OpenAI and Google

阅读更多

来源: The Decoder | 02-07-25

Apple weighs abandoning its own AI for Siri as it tests models from OpenAI and Anthropic

阅读更多

来源: The Decoder | 02-07-25

HN Slop: AI startup ideas generated from Hacker News (josh.ing)

阅读更多

来源: Hacker News | 02-07-25

Show HN: A modern C++20 AI SDK (GPT‑4o, Claude 3.5, tool‑calling)

阅读更多

来源: Hacker News | 02-07-25

Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite Webpages (simedw.com)

阅读更多

来源: Hacker News | 02-07-25

Sam Altman Slams Meta's AI Talent Poaching: 'Missionaries Will Beat Mercenaries' (wired.com)

阅读更多

来源: Hacker News | 02-07-25

Hilbert's sixth problem: derivation of fluid equations via Boltzmann's theory (arxiv.org)

阅读更多

来源: Hacker News | 02-07-25

How large are large language models? (gist.github.com)

阅读更多

来源: Hacker News | 02-07-25

Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability

Authors: Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, Dave Farley |

阅读更多

来源: ArXiv AI | 02-07-25

HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Authors: Zhi Jing, Siyuan Yang, Jicong Ao, Ting Xiao, Yugang Jiang, Chenjia Bai |

阅读更多

来源: ArXiv AI | 02-07-25

Automated anatomy-based post-processing reduces false positives and improved interpretability of deep learning intracranial aneurysm detection

Authors: Jisoo Kim, Chu-Hsuan Lin, Alberto Ceballos-Arroyo, Ping Liu, Huaizu Jiang, Shrikanth Yadav, Qi Wan, Lei Qin, Geoffrey S Young |

阅读更多

来源: ArXiv AI | 02-07-25

CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs

Authors: Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim |

阅读更多

来源: ArXiv AI | 02-07-25

Many LLMs Are More Utilitarian Than One

Authors: Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney |

阅读更多

来源: ArXiv AI | 02-07-25

Deep learning-based segmentation of T1 and T2 cardiac MRI maps for automated disease detection

Authors: Andreea Bianca Popescu, Andreas Seitz, Heiko Mahrholdt, Jens Wetzl, Athira Jacob, Lucian Mihai Itu, Constantin Suciu, Teodora Chitiboi |

阅读更多

来源: ArXiv AI | 02-07-25

Stylometry recognizes human and LLM-generated texts in short samples

Authors: Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab |

阅读更多

来源: ArXiv AI | 02-07-25

Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications

Authors: Jindong Han, Yansong Ning, Zirui Yuan, Hang Ni, Fan Liu, Tengfei Lyu, Hao Liu |

阅读更多

来源: ArXiv AI | 02-07-25

Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona

Authors: Philip Colangelo, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao, Aroon Vijaykar, Chris Williams, Daniel C. Wilson, Zack MacFarland, Daniel Dreiling, Nathan Morey, Anuja Ratnayake, Baskar Vairamohan |

阅读更多

来源: ArXiv AI | 02-07-25

Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

Authors: Eun-Ji Park, Sangwon Yun |

阅读更多

来源: ArXiv AI | 02-07-25

TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables

Authors: Varun Mannam, Fang Wang, Chaochun Liu, Xin Chen |

阅读更多

来源: ArXiv AI | 02-07-25

Holistic Artificial Intelligence in Medicine; improved performance and explainability

Authors: Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou, Dimitris Bertsimas |

阅读更多

来源: ArXiv AI | 02-07-25

ChatGPT produces more "lazy" thinkers: Evidence of cognitive engagement decline

Authors: Georgios P. Georgiou |

阅读更多

来源: ArXiv AI | 02-07-25

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Authors: Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue |

阅读更多

来源: ArXiv AI | 02-07-25

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

Authors: Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park |

阅读更多

来源: ArXiv AI | 02-07-25

A Robust Algorithm for Non-IID Machine Learning Problems with Convergence Analysis

Authors: Qing Xu, Xiaohua Xuan |

阅读更多

来源: ArXiv AI | 02-07-25

Enhancing LLM Agent Safety via Causal Influence Prompting

Authors: Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee |

阅读更多

来源: ArXiv AI | 02-07-25

Google brings Gemini for Education and Gemini in Classroom AI tools to schools

阅读更多

来源: The Decoder | 02-07-25

Microsoft’s MAI-DxO boosts AI diagnostic accuracy and cuts costs by nearly 70 percent

阅读更多

来源: The Decoder | 02-07-25

The wanton destruction of a creative-tech era (greg.technology)

阅读更多

来源: Hacker News | 02-07-25

Building a Personal AI Factory (john-rush.com)

阅读更多

来源: Hacker News | 02-07-25

Show HN: Core – open source memory graph for LLMs – shareable, user owned (github.com/redplanethq)

阅读更多

来源: Hacker News | 02-07-25

After Meta's recruiting push, OpenAI tries to retain talent

阅读更多

来源: The Decoder | 01-07-25

Claude Code now supports hooks (anthropic.com)

阅读更多

来源: Hacker News | 01-07-25

GPEmu: A GPU emulator for rapid, low-cost deep learning prototyping [pdf] (vldb.org)

阅读更多

来源: Hacker News | 01-07-25

Cloudflare to introduce pay-per-crawl for AI bots (cloudflare.com)

阅读更多

来源: Hacker News | 01-07-25

Researchers Uncover Hidden Ingredients Behind AI Creativity (quantamagazine.org)

阅读更多

来源: Hacker News | 01-07-25

The new skill in AI is not prompting, it's context engineering (philschmid.de)

阅读更多

来源: Hacker News | 01-07-25

The hidden JTAG in a Qualcomm/Snapdragon device’s USB port (linaro.org)

阅读更多

来源: Hacker News | 01-07-25

Show HN: ToplingDB - A Persistent Key-Value Store for External Storage (github.com/topling)

阅读更多

来源: Hacker News | 01-07-25

The average chess players of Bletchley Park and AI research in Britain (blogs.bl.uk)

阅读更多

来源: Hacker News | 01-07-25

Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies

Authors: Paul Wachter, Lukas Niehaus, Julius Schöning |

阅读更多

来源: ArXiv AI | 01-07-25

Bootstrapping Human-Like Planning via LLMs

Authors: David Porfirio, Vincent Hsiao, Morgan Fine-Morris, Leslie Smith, Laura M. Hiatt |

阅读更多

来源: ArXiv AI | 01-07-25

Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems

Authors: Michael Papademas, Xenia Ziouvelou, Antonis Troumpoukis, Vangelis Karkaletsis |

阅读更多

来源: ArXiv AI | 01-07-25

The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy

Authors: Rishi Bommasani |

阅读更多

来源: ArXiv AI | 01-07-25

Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study

Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit |

阅读更多

来源: ArXiv AI | 01-07-25

Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons

Authors: Chi Chiu So, Yueyue Sun, Jun-Min Wang, Siu Pang Yung, Anthony Wai Keung Loh, Chun Pong Chau |

阅读更多

来源: ArXiv AI | 01-07-25

Data Augmentation for Cognitive Behavioral Therapy: Leveraging ERNIE Language Models using Artificial Intelligence

Authors: Bosubabu Sambana, Kondreddygari Archana, Suram Indhra Sena Reddy, Shaik Meethaigar Jameer Basha, Shaik Karishma |

阅读更多

来源: ArXiv AI | 01-07-25

The Confidence Paradox: Can LLM Know When It's Wrong

Authors: Sahil Tripathi, Md Tabrez Nafis, Imran Hussain, Jiechao Gao |

阅读更多

来源: ArXiv AI | 01-07-25

CooT: Learning to Coordinate In-Context with Coordination Transformers

Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun |

阅读更多

来源: ArXiv AI | 01-07-25

ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data

Authors: Yu Zhang, Ruijie Yu, Jidong Tian, Feng Zhu, Jiapeng Liu, Xiaokang Yang, Yaohui Jin, Yanyan Xu |

阅读更多

来源: ArXiv AI | 01-07-25

Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays

Authors: Selin Dik, Osman Erdem, Mehmet Dik |

阅读更多

来源: ArXiv AI | 01-07-25

Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models

Authors: Maria Carolina Cornelia Wit, Jun Pang |

阅读更多

来源: ArXiv AI | 01-07-25

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

Authors: Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao |

阅读更多

来源: ArXiv AI | 01-07-25

Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models

Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye |

阅读更多

来源: ArXiv AI | 01-07-25

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Authors: Christoph Schnabl, Daniel Hugenroth, Bill Marino, Alastair R. Beresford |

Source: ArXiv AI | 01-07-25

A New Perspective On AI Safety Through Control Theory Methodologies

Authors: Lars Ullrich, Walter Zimmer, Ross Greer, Knut Graichen, Alois C. Knoll, Mohan Trivedi |

Source: ArXiv AI | 01-07-25

Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik |

Source: ArXiv AI | 01-07-25

Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice

Authors: Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi |

Source: ArXiv AI | 01-07-25

AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models

Authors: Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, Deepika Raman |

Source: ArXiv AI | 01-07-25

Harnessing AI Agents to Advance Research on Refugee Child Mental Health

Authors: Aditya Shrivastava, Komal Gupta, Shraddha Arora |

Source: ArXiv AI | 01-07-25

OpenAI loses four more top researchers to Meta as even its own engineers call it a "huge loss"

Source: The Decoder | 01-07-25

Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com/runzhouye)

Source: Hacker News | 01-07-25

Show HN: We're two coffee nerds who built an AI app to track beans and recipes (beanbook.app)

Source: Hacker News | 01-07-25

Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken (github.com/m4thyou)

Source: Hacker News | 01-07-25

There are no new ideas in AI only new datasets (jxmo.io)

Source: Hacker News | 01-07-25

OmniGen 2 blends image and text generation like GPT-4o, but is open source

Source: The Decoder | 30-06-25

Gridfinity: The modular, open-source grid storage system (gridfinity.xyz)

Source: Hacker News | 30-06-25

Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics

Authors: Michael A. Riegler, Kristoffer Herland Hellton, Vajira Thambawita, Hugo L. Hammer |

Source: ArXiv AI | 30-06-25

Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit

Authors: Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling |

Source: ArXiv AI | 30-06-25

Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses

Authors: Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis |

Source: ArXiv AI | 30-06-25

Transformers are Graph Neural Networks

Authors: Chaitanya K. Joshi |

Source: ArXiv AI | 30-06-25

Autonomic Microservice Management via Agentic AI and MAPE-K Integration

Authors: Matteo Esposito, Alexander Bakhtin, Noman Ahmad, Mikel Robredo, Ruoyu Su, Valentina Lenarduzzi, Davide Taibi |

Source: ArXiv AI | 30-06-25

CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks

Authors: Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia |

Source: ArXiv AI | 30-06-25

Projected Compression: Trainable Projection for Efficient Transformer Compression

Authors: Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski |

Source: ArXiv AI | 30-06-25

From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications

Authors: Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania |

Source: ArXiv AI | 30-06-25

Concept-Level AI for Telecom: Moving Beyond Large Language Models

Authors: Viswanath Kumarskandpriya, Abdulhalim Dandoush, Abbas Bradai, Ali Belgacem |

Source: ArXiv AI | 30-06-25

A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake

Authors: Luigi Russo, Deodato Tapete, Silvia Liberata Ullo, Paolo Gamba |

Source: ArXiv AI | 30-06-25

CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

Authors: Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty |

Source: ArXiv AI | 30-06-25

QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization

Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh |

Source: ArXiv AI | 30-06-25

MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models

Authors: Yifan Liu, Xishun Liao, Haoxuan Ma, Jonathan Liu, Rohan Jadhav, Jiaqi Ma |

Source: ArXiv AI | 30-06-25

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

Authors: Wanxin Tian, Shijie Zhang, Kevin Zhang, Xiaowei Chi, Yulin Luo, Junyu Lu, Chunkai Fan, Qiang Zhou, Yiming Zhao, Ning Liu, Siyu Lin, Zhiyuan Qin, Xiaozhu Ju, Shanghang Zhang, Jian Tang |

Source: ArXiv AI | 30-06-25

CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation

Authors: Nicolas Bougie, Narimasa Watanabe |

Source: ArXiv AI | 30-06-25

A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety

Authors: Camille François, Ludovic Péran, Ayah Bdeir, Nouha Dziri, Will Hawkins, Yacine Jernite, Sayash Kapoor, Juliet Shen, Heidy Khlaaf, Kevin Klyman, Nik Marda, Marie Pellat, Deb Raji, Divya Siddarth, Aviya Skowron, Joseph Spisak, Madhulika Srikumar, Victor Storchan, Audrey Tang, Jen Weedon |

Source: ArXiv AI | 30-06-25

Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios

Authors: Shengyue Yao, Runqing Guo, Yangyang Qin, Miangbing Meng, Jipeng Cao, Yilun Lin, Yisheng Lv, Fei-Yue Wang |

Source: ArXiv AI | 30-06-25

Embodied AI Agents: Modeling the World

Authors: Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, Jitendra Malik |

Source: ArXiv AI | 30-06-25

AI Model Passport: Data and System Traceability Framework for Transparent AI in Health

Authors: Varvara Kalokyri, Nikolaos S. Tachos, Charalampos N. Kalantzopoulos, Stelios Sfakianakis, Haridimos Kondylakis, Dimitrios I. Zaridis, Sara Colantonio, Daniele Regge, Nikolaos Papanikolaou, The ProCAncer-I consortium, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis |

Source: ArXiv AI | 30-06-25

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

Authors: Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach |

Source: ArXiv AI | 30-06-25

Anthropic's Claude ran a store and lost money by selling below cost and giving discounts

Source: The Decoder | 30-06-25

Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted) (llmapitest.com)

Source: Hacker News | 30-06-25

US Senate moves to block state AI laws for five years if states take broadband funds

Source: The Decoder | 30-06-25

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale (ubicloud.com)

Source: Hacker News | 29-06-25

Magnetic Tape Storage Technology: usage, history, and future outlook (acm.org)

Source: Hacker News | 29-06-25

Show HN: A different kind of AI Video generation

Source: Hacker News | 29-06-25

Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

Authors: He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang |

Source: ArXiv AI | 29-06-25

Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks

Authors: Isaac Chung, Imene Kerboua, Marton Kardos, Roman Solomatin, Kenneth Enevoldsen |

Source: ArXiv AI | 29-06-25

A Hierarchical Deep Learning Approach for Minority Instrument Detection

Authors: Dylan Sechet, Francesca Bugiotti, Matthieu Kowalski, Edouard d'Hérouville, Filip Langiewicz |

Source: ArXiv AI | 29-06-25

$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models

Authors: Quanming Liu, Xupeng Bu, Zhichao Yan, Ru Li |

Source: ArXiv AI | 29-06-25

Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage

Authors: Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O'Sullivan |

Source: ArXiv AI | 29-06-25

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection

Authors: Ali Şenol, Garima Agrawal, Huan Liu |

Source: ArXiv AI | 29-06-25

Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

Authors: Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha |

Source: ArXiv AI | 29-06-25

Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation

Authors: Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng |

Source: ArXiv AI | 29-06-25

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Authors: Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal |

Source: ArXiv AI | 29-06-25

Potemkin Understanding in Large Language Models

Authors: Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan |

Source: ArXiv AI | 29-06-25

The Singapore Consensus on Global AI Safety Research Priorities

Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, Agnes Delaborde, Nouha Dziri, Francisco Eiras, Joshua Engels, Jinyu Fan, Adam Gleave, Noah Goodman, Fynn Heide, Dan Hendrycks, Cyrus Hodes, Bryan Low Kian Hsiang, Minlie Huang, Sami Jawhar, Wang Jingyu, Adam Tauman Kalai, Meindert Kamphuis, Mohan Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, Thomas Kwa, Jeffrey Ladish, Kwok-Yan Lam, Wan Lee Sie, Taewhi Lee, Xiaojian Li, Jiajun Liu, Chaochao Lu, Yifan Mai, Richard Mallah, Julian Michael, Nick Moës, Simon Möller, Kihyuk Nam, Kwan Yee Ng, Mark Nitzberg, Besmira Nushi, Seán O hÉigeartaigh, Alejandro Ortega, Pierre Peigné, James Petrie, Benjamin Prud'Homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, Buck Shlegeris, Saad Siddiqui, Aradhana Sinha, Martín Soto, Cheston Tan, Dong Ting, Robert Trager, Brian Tse, Anthony Tung K. H., Vanessa Wilfred, John Willes, Denise Wong, Wei Xu, Rongwu Xu, Yi Zeng, HongJiang Zhang, Djordje Žikelić |

Source: ArXiv AI | 29-06-25

Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications

Authors: Xinye Tang, Haijun Zhai, Chaitanya Belwal, Vineeth Thayanithi, Philip Baumann, Yogesh K Roy |

Source: ArXiv AI | 29-06-25

Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Authors: Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han |

Source: ArXiv AI | 29-06-25

Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation

Authors: Chenkai Sun, Denghui Zhang, ChengXiang Zhai, Heng Ji |

Source: ArXiv AI | 29-06-25

Active Inference AI Systems for Scientific Discovery

Authors: Karthik Duraisamy |

Source: ArXiv AI | 29-06-25

IXAII: An Interactive Explainable Artificial Intelligence Interface for Decision Support Systems

Authors: Pauline Speckmann, Mario Nadj, Christian Janiesch |

Source: ArXiv AI | 29-06-25

Microsoft’s Braga AI chip faces six-month delay, trails Nvidia’s Blackwell

Source: The Decoder | 29-06-25

OpenAI renting Google TPUs sends a strong warning shot to Microsoft

Source: The Decoder | 29-06-25

Meta CTO confirms massive offers for top AI executives

Source: The Decoder | 29-06-25

Show HN: AGL a toy language that compiles to Go (github.com/alaingilbert)

Source: Hacker News | 29-06-25

LLMs bring new nature of abstraction – up and sideways (martinfowler.com)

Source: Hacker News | 28-06-25

Facebook is starting to feed its AI with private, unpublished photos (theverge.com)

Source: Hacker News | 28-06-25

SymbolicAI: A neuro-symbolic perspective on LLMs (github.com/extensityai)

Source: Hacker News | 28-06-25

Lossless LLM 3x Throughput Increase by LMCache (github.com/lmcache)

Source: Hacker News | 28-06-25

AlphaGenome: AI for Better Understanding the Genome (deepmind.google)

Source: Hacker News | 28-06-25

Google launches Gemma 3n, a multimodal AI model built for real-time use on mobile devices

Source: The Decoder | 28-06-25

Project Vend: Can Claude run a small shop? (And why does that matter?) (anthropic.com)

Source: Hacker News | 28-06-25

Theoretical Analysis of Positional Encodings in Transformer Models (arxiv.org)

Source: Hacker News | 28-06-25

Spark AI (YC W24) is hiring a full-stack engineer in SF (founding team) (ycombinator.com)

Source: Hacker News | 28-06-25

Microsoft is reportedly barred from building its own AGI until 2030 under its contract with OpenAI

Source: The Decoder | 27-06-25

Meta poaches three top AI researchers from OpenAI, who had poached them from Deepmind

Source: The Decoder | 27-06-25

Show HN: Magnitude – Open-source AI browser automation framework (github.com/magnitudedev)

Source: Hacker News | 27-06-25

Launch HN: Issen (YC F24) – Personal AI language tutor

Source: Hacker News | 27-06-25

What did former CTO Mira Murati see at OpenAI that made her choose custom models over AGI

Source: The Decoder | 27-06-25

Show HN: I built an AI dataset generator (github.com/metabase)

Source: Hacker News | 27-06-25

Researchers train AI to generate long-form text using only reinforcement learning

Source: The Decoder | 26-06-25

Google Deepmind makes robots independent of the cloud with Gemini On-Device

Source: The Decoder | 26-06-25

Anthropic won a fair use hearing that could end up being a defeat

Source: The Decoder | 26-06-25

Google releases open-source Gemini CLI to bring Gemini AI into developer workflows

Source: The Decoder | 26-06-25

Automatic Demonstration Selection for LLM-based Tabular Data Classification

Authors: Shuchu Han, Wolfgang Bruckner |

Source: ArXiv AI | 26-06-25

SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models

Authors: Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi |

Source: ArXiv AI | 26-06-25

WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads

Authors: Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang |

Source: ArXiv AI | 26-06-25

Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Authors: Soumya Madireddy, Lu Gao, Zia Din, Kinam Kim, Ahmed Senouci, Zhe Han, Yunpeng Zhang |

Source: ArXiv AI | 26-06-25

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs

Authors: Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker |

Source: ArXiv AI | 26-06-25

AI in the Writing Process: How Purposeful AI Support Fosters Student Writing

Authors: Momin N. Siddiqui, Roy Pea, Hari Subramonyam |

Source: ArXiv AI | 26-06-25

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman |

Source: ArXiv AI | 26-06-25

Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

Authors: Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao, Hélio Lopes, Marcos Kalinowski |

Source: ArXiv AI | 26-06-25

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

Authors: Saloni Dash, Amélie Reymond, Emma S. Spiro, Aylin Caliskan |

Source: ArXiv AI | 26-06-25

Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models

Authors: Zechun Deng, Ziwei Liu, Ziqian Bi, Junhao Song, Chia Xin Liang, Joe Yeong, Junfeng Hao |

Source: ArXiv AI | 26-06-25

Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks

Authors: Konstantinos Vrettos, Michail E. Klontzas |

Source: ArXiv AI | 26-06-25

QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges

Authors: Abdul Basit, Minghao Shao, Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique |

Source: ArXiv AI | 26-06-25

Enterprise Large Language Model Evaluation Benchmark

Authors: Liya Wang, David Yi, Damien Jose, John Passarelli, James Gao, Jordan Leventis, Kang Li |

Source: ArXiv AI | 26-06-25

DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction

Authors: Weijieying Ren, Tianxiang Zhao, Lei Wang, Tianchun Wang, Vasant Honavar |

Source: ArXiv AI | 26-06-25

Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation

Authors: Jinchun Du, Bojie Shen, Muhammad Aamir Cheema, Adel N. Toosi |

Source: ArXiv AI | 26-06-25

Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios

Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu |

Source: ArXiv AI | 26-06-25

CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video

Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam |

Source: ArXiv AI | 26-06-25

Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges

Authors: Alexander D. Kalian, Jaewook Lee, Stefan P. Johannesson, Lennart Otte, Christer Hogstrand, Miao Guo |

Source: ArXiv AI | 26-06-25

Towards Community-Driven Agents for Machine Learning Engineering

Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang |

Source: ArXiv AI | 26-06-25

LLM code generation may lead to an erosion of trust (jaysthoughts.com)

Source: Hacker News | 26-06-25

Define policy forbidding use of AI code generators (github.com/qemu)

Source: Hacker News | 26-06-25

Build and Host AI-Powered Apps with Claude – No Deployment Needed (anthropic.com)

Source: Hacker News | 26-06-25

Structured Output with LangChain and Llamafile (brakmic.com)

Source: Hacker News | 26-06-25

OpenAI charges by the minute, so speed up your audio (mand.is)

Source: Hacker News | 26-06-25

Learnings from Building AI Agents (cubic.dev)

Source: Hacker News | 26-06-25

Gemini CLI (blog.google)

Source: Hacker News | 26-06-25

Google hands off Agent2Agent protocol to Linux Foundation for open AI agent standard

Source: The Decoder | 26-06-25

LLM Hallucinations in Practical Code Generation (acm.org)

Source: Hacker News | 26-06-25

FurtherAI (YC W24) Is Hiring for Software and AI Roles (ycombinator.com)

Source: Hacker News | 26-06-25

Disney is in talks with OpenAI about possible partnerships involving its characters

Source: The Decoder | 25-06-25

Microsoft has introduced an AI agent to the Windows Settings menu

Source: The Decoder | 25-06-25

AI job postings on LinkedIn grew sixfold as AI skill additions to profiles soared twentyfold

Source: The Decoder | 25-06-25

African and South American countries are almost entirely excluded from global AI development

Source: The Decoder | 25-06-25

ChatGPT's enterprise success against Copilot fuels OpenAI/Microsoft rivalry (bloomberg.com)

Source: Hacker News | 25-06-25

Thoughts on Asunción, Paraguay (cpsi.media)

Source: Hacker News | 25-06-25

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao |

Source: ArXiv AI | 25-06-25

Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis

Authors: Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy |

Source: ArXiv AI | 25-06-25

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |

Source: ArXiv AI | 25-06-25

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai |

Source: ArXiv AI | 25-06-25

Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience

Authors: Lingyu Yang (Shanghai Jiao Tong University) |

Source: ArXiv AI | 25-06-25

A standard transformer and attention with linear biases for molecular conformer generation

Authors: Viatcheslav Gurev, Timothy Rumbell |

Source: ArXiv AI | 25-06-25

Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach

Authors: Feiting Yang, Antoine Moevus, Steve Lévesque |

Source: ArXiv AI | 25-06-25

RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1

Authors: Yu Xie, Xingkai Ren, Ying Qi, Yao Hu, Lianlei Shan |

Source: ArXiv AI | 25-06-25

Spiritual-LLM: Gita Inspired Mental Health Therapy In the Era of LLMs

Authors: Janak Kapuriya, Aman Singh, Jainendra Shukla, Rajiv Ratn Shah |

Source: ArXiv AI | 25-06-25

Baba is LLM: Reasoning in a Game with Dynamic Rules

Authors: Fien van Wetten, Aske Plaat, Max van Duijn |

Source: ArXiv AI | 25-06-25

Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics

Authors: Ziqi Zhu, Tao Hu, Honglong Zhang, Dan Yang, HanGeng Chen, Mengran Zhang, Xilun Chen |

Source: ArXiv AI | 25-06-25

FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring

Authors: Hyein Seo, Taewook Hwang, Yohan Lee, Sangkeun Jung |

Source: ArXiv AI | 25-06-25

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Authors: Liang Zeng, Yongcong Li, Yuzhen Xiao, Changshi Li, Chris Yuhao Liu, Rui Yan, Tianwen Wei, Jujie He, Xuchen Song, Yang Liu, Yahui Zhou |

Source: ArXiv AI | 25-06-25

Interpretable Hybrid Machine Learning Models Using FOLD-R++ and Answer Set Programming

Authors: Sanne Wielinga, Jesse Heyninck |

Source: ArXiv AI | 25-06-25

NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons

Authors: Carlo Romeo, Andrew D. Bagdanov |

Source: ArXiv AI | 25-06-25

KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models

Authors: Cheng Li, Jiexiong Liu, Yixuan Chen, Qihang Zhou, KunLun Meta |

Source: ArXiv AI | 25-06-25

From memories to maps: Mechanisms of in context reinforcement learning in transformers

Authors: Ching Fang, Kanaka Rajan |

Source: ArXiv AI | 25-06-25

LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis

Authors: Lei Kang, Xuanshuo Fu, Oriol Ramos Terrades, Javier Vazquez-Corral, Ernest Valveny, Dimosthenis Karatzas |

Source: ArXiv AI | 25-06-25

Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning

Authors: Guo Li, Zixiang Xu, Wei Zhang, Yikuan Hu, Xinyu Yang, Nikolay Aristov, Mingjie Tang, Elenna R Dugundji |

Source: ArXiv AI | 25-06-25

JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning

Authors: Ai Han, Junxing Hu, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zicheng Zhang |

Source: ArXiv AI | 25-06-25

Gemini Robotics On-Device brings AI to local robotic devices (deepmind.google)

Source: Hacker News | 25-06-25

Mapping LLMs over excel saved my passion for game dev (weblog.lol)

Source: Hacker News | 25-06-25

Blackmail becomes go-to strategy for AI models facing shutdown in new Anthropic tests

Source: The Decoder | 24-06-25

'Dragon prince' dinosaur discovery 'rewrites' T.rex family tree (bbc.com)

Source: Hacker News | 24-06-25

From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases

Authors: Yao Zhang, Zaixi Shang, Silpan Patel, Mikel Zuniga |

Source: ArXiv AI | 24-06-25

OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections

Authors: Manasa Bharadwaj, Nikhil Verma, Kevin Ferreira |

Source: ArXiv AI | 24-06-25

Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation

Authors: Hao Guan, David Bates, Li Zhou |

Source: ArXiv AI | 24-06-25

Resource Rational Contractualism Should Guide AI Alignment

Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel |

Source: ArXiv AI | 24-06-25

Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges

Authors: Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Shuai Wang |

Source: ArXiv AI | 24-06-25

Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown

Authors: Bowen Wang |

Source: ArXiv AI | 24-06-25

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

Authors: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra |

Source: ArXiv AI | 24-06-25

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu |

Source: ArXiv AI | 24-06-25

Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms

Authors: Cheng Ji, Huaiying Luo |

Source: ArXiv AI | 24-06-25

A Conceptual Framework for AI Capability Evaluations

Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, María Vanina Martinez, Gerardo Simari |

Source: ArXiv AI | 24-06-25

Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance

Authors: Yu Han, Aaron Ceross, Jeroen H.M. Bergmann |

Source: ArXiv AI | 24-06-25

How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models

Authors: Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao |

Source: ArXiv AI | 24-06-25

A Large Language Model-based Multi-Agent Framework for Analog Circuits' Sizing Relationships Extraction

Authors: Chengjie Liu, Weiyu Chen, Huiyao Xu, Yuan Du, Jun Yang, Li Du |

Source: ArXiv AI | 24-06-25

T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent

Authors: Hong Qing Yu |

Source: ArXiv AI | 24-06-25

A Question Bank to Assess AI Inclusivity: Mapping out the Journey from Diversity Errors to Inclusion Excellence

Authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano |

Source: ArXiv AI | 24-06-25

AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs

Authors: Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko |

Source: ArXiv AI | 24-06-25

TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation

Authors: Kamil Szczepanik, Jarosław A. Chudziak |

Source: ArXiv AI | 24-06-25

Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis |

Source: ArXiv AI | 24-06-25

Steering Conceptual Bias via Transformer Latent-Subspace Activation

Authors: Vansh Sharma, Venkat Raman |

Source: ArXiv AI | 24-06-25

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang, Nan Wang, Han Xiao |

Source: ArXiv AI | 24-06-25

Show HN: Pickaxe – A TypeScript library for building AI agents (github.com/hatchet-dev)

Source: Hacker News | 24-06-25

Judge denies creating “mass surveillance program” harming all ChatGPT users (arstechnica.com)

Source: Hacker News | 24-06-25

GitHub CEO: manual coding remains key despite AI boom (techinasia.com)

Source: Hacker News | 24-06-25

Sakana AI's ALE AI agent cracks the top 21 among 1,000 code experts

Source: The Decoder | 23-06-25

Apple executives have held internal discussions about potentially bidding for AI startup Perplexity

Source: The Decoder | 23-06-25

Nano-Vllm: lightweight vLLM implementation built from scratch (github.com/geeeekexplorer)

Source: Hacker News | 23-06-25

Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone

Source: Hacker News | 23-06-25

Claude Code for VSCode (visualstudio.com)

Source: Hacker News | 23-06-25

Facial Landmark Visualization and Emotion Recognition Through Neural Networks

Authors: Israel Juárez-Jiménez, Tiffany Guadalupe Martínez Paredes, Jesús García-Ramírez, Eric Ramos Aguilar |

Source: ArXiv AI | 23-06-25

Towards AI Search Paradigm

Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin |

Source: ArXiv AI | 23-06-25

Continual Learning with Columnar Spiking Neural Networks

Authors: Denis Larionov, Nikolay Bazenkov, Mikhail Kiselev |

Source: ArXiv AI | 23-06-25

LLMs Struggle to Perform Counterfactual Reasoning with Parametric Knowledge

Authors: Khurram Yamin, Gaurav Ghosal, Bryan Wilder |

Source: ArXiv AI | 23-06-25

No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

Authors: Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He |

Source: ArXiv AI | 23-06-25

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems

Authors: Matias Martinez, Xavier Franch |

Source: ArXiv AI | 23-06-25

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar |

Source: ArXiv AI | 23-06-25

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

Authors: Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton |

Source: ArXiv AI | 23-06-25

Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations

Authors: William Sharpless, Dylan Hirsch, Sander Tonkens, Nikhil Shinde, Sylvia Herbert |

Source: ArXiv AI | 23-06-25

Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues

Authors: Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova |

Source: ArXiv AI | 23-06-25

Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior

Authors: Hao Li, Gengrui Zhang, Petter Holme, Shuyue Hu, Zhen Wang |

Source: ArXiv AI | 23-06-25

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System

Authors: Mustafa Akben, Aaron Satko |

Source: ArXiv AI | 23-06-25

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

Authors: Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo |

Source: ArXiv AI | 23-06-25

The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making

Authors: Abinitha Gourabathina, Yuexing Hao, Walter Gerych, Marzyeh Ghassemi |

Source: ArXiv AI | 23-06-25

LAION and Intel introduce tools that help AI gauge the intensity of 40 distinct emotions

Source: The Decoder | 22-06-25

Phoenix.new – Remote AI Runtime for Phoenix (fly.io)

Source: Hacker News | 22-06-25

Remote MCP Support in Claude Code (anthropic.com)

Source: Hacker News | 22-06-25

Uncovering Intention through LLM-Driven Code Snippet Description Generation

Authors: Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto |

Source: ArXiv AI | 22-06-25

RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation

Authors: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia |

Source: ArXiv AI | 22-06-25

Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach

Authors: Wenqi Guan, Yang Fang |

Source: ArXiv AI | 22-06-25

Over-squashing in Spatiotemporal Graph Neural Networks

Authors: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein |

Source: ArXiv AI | 22-06-25

Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates

Authors: Danish Gufran, Sudeep Pasricha |

Source: ArXiv AI | 22-06-25

The Compositional Architecture of Regret in Large Language Models

Authors: Xiangxiang Cui, Shu Yang, Tianjin Huang, Wanyu Lin, Lijie Hu, Di Wang |

Source: ArXiv AI | 22-06-25

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Authors: Gabrel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong |

Source: ArXiv AI | 22-06-25

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Authors: Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe |

Source: ArXiv AI | 22-06-25

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu |

Source: ArXiv AI | 22-06-25

HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Authors: Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian |

Source: ArXiv AI | 22-06-25

Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents

Authors: Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer |

Source: ArXiv AI | 22-06-25

The AI Policy Module: Developing Computer Science Student Competency in AI Ethics and Policy

Authors: James Weichert, Daniel Dunlap, Mohammed Farghally, Hoda Eldardiry |

Source: ArXiv AI | 22-06-25

The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games

Authors: Lyle Goodyear, Rachel Guo, Ramesh Johari |

Source: ArXiv AI | 22-06-25

Meta CEO Mark Zuckerberg bets billions not to fall behind in the AI race

Source: The Decoder | 22-06-25

Apple's "Illusion of Thinking" paper shows experts deeply divided on AI reasoning

Source: The Decoder | 21-06-25

Agentic Misalignment: How LLMs could be insider threats (anthropic.com)

Source: Hacker News | 21-06-25

Midjourney launches its first video model, letting users turn images into short animated clips

Source: The Decoder | 21-06-25

Jürgen Schmidhuber: the Father of Generative AI Without Turing Award (jazzyear.com)

Source: Hacker News | 21-06-25

I Built a Celebrity AI Image Generator (No Registration Needed) – Would Love Feedback (aicelebrity.design)

Source: Hacker News | 21-06-25

OpenAI CEO Sam Altman says GPT-5 is "probably coming sometime this summer"

Source: The Decoder | 20-06-25

Andrej Karpathy: Software in the era of AI [video] (youtube.com)

Source: Hacker News | 20-06-25

Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com)

Source: Hacker News | 20-06-25

Gemini 2.5 Flash-Lite is the fastest and most cost-effective model in Google's Gemini lineup

Source: The Decoder | 20-06-25

Show HN: Claude Code Usage Monitor – real-time tracker to dodge usage cut-offs (github.com/maciek-roboblog)

Source: Hacker News | 20-06-25

How OpenElections uses LLMs (thescoop.org)

Source: Hacker News | 20-06-25

MiniMax-M1 comes close to Gemini 2.5 Pro efficiency when handling large context windows

Source: The Decoder | 19-06-25

From LLM to AI Agent: What's the Real Journey Behind AI System Development? (codelink.io)

Source: Hacker News | 19-06-25

Luxembourg partners with Mistral AI to bring artificial intelligence to government and defense

Source: The Decoder | 19-06-25

OpenAI and Microsoft increasingly mistrust each other as tensions rise over contracts and profits

Source: The Decoder | 19-06-25

Is there a half-life for the success rates of AI agents? (tobyord.com)

Source: Hacker News | 19-06-25

Math genius Terence Tao says that AI still can't "smell" bad math

Source: The Decoder | 18-06-25

OpenAI’s Defense Department deal targets healthcare, data analysis, and cyber defense

Source: The Decoder | 18-06-25

Time Series Forecasting with Graph Transformers (kumo.ai)

Source: Hacker News | 18-06-25

LLMs pose an interesting problem for DSL designers (kirancodes.me)

Source: Hacker News | 18-06-25

Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite (blog.google)

Source: Hacker News | 18-06-25

Building Effective AI Agents (anthropic.com)

Source: Hacker News | 18-06-25

I counted all of the yurts in Mongolia using machine learning (monroeclinton.com)

Source: Hacker News | 18-06-25

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Authors: Ring Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, Shaomian Zheng, Shuaicheng Li, Tongkai Yang, Wang Ren, Xiaodong Yan, Xiaopei Wan, Xiaoyun Feng, Xin Zhao, Xinxing Yang, Xinyu Kong, Xuemin Yang, Yang Li, Yingting Wu, Yongkang Liu, Zhankai Xu, Zhenduo Zhang, Zhenglei Zhou, Zhenyu Huang, Zhiqiang Zhang, Zihao Wang, Zujie Wen |

Source: ArXiv AI | 18-06-25

Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values

Authors: Nell Watson, Ahmed Amer, Evan Harris, Preeti Ravindra, Shujun Zhang |

Source: ArXiv AI | 18-06-25

The NordDRG AI Benchmark for Large Language Models

Authors: Tapio Pitkäranta |

Source: ArXiv AI | 18-06-25

ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution

Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Jilles S. Dibangoye |

Source: ArXiv AI | 18-06-25

Causality in the human niche: lessons for machine learning

Authors: Richard D. Lange, Konrad P. Kording |

Source: ArXiv AI | 18-06-25

Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

Authors: Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino |

Source: ArXiv AI | 18-06-25

LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi |

Source: ArXiv AI | 18-06-25

Machine Mirages: Defining the Undefined

Authors: Hamidou Tembine |

Source: ArXiv AI | 18-06-25

ProfiLLM: An LLM-Based Framework for Implicit Profiling of Chatbot Users

Authors: Shahaf David, Yair Meidan, Ido Hersko, Daniel Varnovitzky, Dudu Mimran, Yuval Elovici, Asaf Shabtai |

Source: ArXiv AI | 18-06-25

Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places

Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Ilya Ilyankou, Junyuan Liu, Lu Yin, Weiming Huang, Natchapon Jongwiriyanurak |

Source: ArXiv AI | 18-06-25

Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models

Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary |

Source: ArXiv AI | 18-06-25

Lightweight Relevance Grader in RAG

Authors: Taehee Jeong |

Source: ArXiv AI | 18-06-25

From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models

Authors: Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma |

Source: ArXiv AI | 18-06-25

Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?

Authors: Louis Vervoort, Vitaly Nikolaev |

Source: ArXiv AI | 18-06-25

Don't throw the baby out with the bathwater: How and why deep learning for ARC

Authors: Jack Cole, Mohamed Osman |

Source: ArXiv AI | 18-06-25

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang |

Source: ArXiv AI | 18-06-25

Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning

Authors: William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane |

Source: ArXiv AI | 18-06-25

AviationLLM: An LLM-based Knowledge System for Aviation Training

Authors: Jia'ang Wan, Feng Shen, Fujuan Li, Yanjin Sun, Yan Li, Shiwen Zhang |

Source: ArXiv AI | 18-06-25

ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems

Authors: Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li |

Source: ArXiv AI | 18-06-25

LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?

Authors: Muhammad Atta Ur Rahman, Melanie Schranz |

Source: ArXiv AI | 18-06-25

Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

Authors: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son |

Source: ArXiv AI | 18-06-25

Enhancing Symbolic Machine Learning by Subsymbolic Representations

Authors: Stephen Roth, Lennart Baur, Derian Boer, Stefan Kramer |

Source: ArXiv AI | 18-06-25

New study supports Apple's doubts about AI reasoning, but sees no dead end

Source: The Decoder | 18-06-25

Salesforce's CRM benchmark finds AI agents struggle in real-world business scenarios

Source: The Decoder | 17-06-25

New York may soon require AI giants to publish safety protocols before releasing LLMs

Source: The Decoder | 17-06-25

Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence

Authors: Zeki Doruk Erden, Boi Faltings |

Source: ArXiv AI | 17-06-25

Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents

Authors: LeCheng Zhang, Yuanshi Wang, Haotian Shen, Xujie Wang |

Source: ArXiv AI | 17-06-25

Constitutive Components for Human-Like Autonomous Artificial Intelligence

Authors: Kazunori D Yamada |

Source: ArXiv AI | 17-06-25

Scaling Test-time Compute for LLM Agents

Authors: King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou |

Source: ArXiv AI | 17-06-25

Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning

Authors: Danny Hoang, David Gorsich, Matthew P. Castanier, Farhad Imani |

Source: ArXiv AI | 17-06-25

A Practical Guide for Evaluating LLMs and LLM-Reliant Systems

Authors: Ethan M. Rudd, Christopher Andrews, Philip Tully |

Source: ArXiv AI | 17-06-25

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

Authors: Daniel Kilov, Caroline Hendy, Secil Yanik Guyot, Aaron J. Snoswell, Seth Lazar |

Source: ArXiv AI | 17-06-25

NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification

Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha |

Source: ArXiv AI | 17-06-25

Machine Learning as Iterated Belief Change a la Darwiche and Pearl

Authors: Theofanis Aravanis |

Source: ArXiv AI | 17-06-25

Probabilistic Modeling of Spiking Neural Networks with Contract-Based Verification

Authors: Zhen Yao, Elisabetta De Maria, Robert De Simone |

Source: ArXiv AI | 17-06-25

Towards Pervasive Distributed Agentic Generative AI -- A State of The Art

Authors: Gianni Molinari, Fabio Ciravegna |

Source: ArXiv AI | 17-06-25

Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks

Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang |

Source: ArXiv AI | 17-06-25

Vector Ontologies as an LLM world view extraction method

Authors: Kaspar Rothenfusser, Bekk Blando |

Source: ArXiv AI | 17-06-25

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs

Authors: Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai |

Source: ArXiv AI | 17-06-25

Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality

Authors: Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street |

Source: ArXiv AI | 17-06-25

Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses

Authors: Leonie V.D.E. Vogelsmeier, Eduardo Oliveira, Kamila Misiejuk, Sonsoles López-Pernas, Mohammed Saqr |

Source: ArXiv AI | 17-06-25

From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Authors: Daniel Anadria, Roel Dobbe, Anastasia Giachanou, Ruurd Kuiper, Richard Bartels, Íñigo Martínez de Rituerto de Troya, Carmen Zürcher, Daniel Oberski |

Source: ArXiv AI | 17-06-25

Generative AI coding tools and agents do not work for me (miguelgrinberg.com)

Source: Hacker News | 17-06-25

OpenAI wins $200M U.S. defense contract (cnbc.com)

Source: Hacker News | 17-06-25

Rednote releases its first open-source LLM with a Mixture-of-Experts architecture

Source: The Decoder | 17-06-25

Anthropic shares blueprint for Claude Research agent using multiple AI agents in parallel

Source: The Decoder | 17-06-25

Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons (arxiv.org)

Source: Hacker News | 17-06-25

ZjsComponent: A Pragmatic Approach to Reusable UI Fragments for Web Development (arxiv.org)

Source: Hacker News | 17-06-25

Snorting the AGI with Claude Code (kadekillary.work)

Source: Hacker News | 17-06-25

OpenAI updates ChatGPT search with smarter answers and image search

Source: The Decoder | 16-06-25

Chemical knowledge and reasoning of large language models vs. chemist expertise (nature.com)

Source: Hacker News | 16-06-25

LLM Chat via SSH (github.com/ccbikai)

Source: Hacker News | 16-06-25

Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models

Authors: Maximilian Kreutner, Marlene Lutz, Markus Strohmaier |

Source: ArXiv AI | 16-06-25

TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks

Authors: Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan |

Source: ArXiv AI | 16-06-25

An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing

Authors: Haochen Sun, Yifan Liu, Ahmed Al-Tahmeesschi, Swarna Chetty, Syed Ali Raza Zaidi, Avishek Nag, Hamed Ahmadi |

Source: ArXiv AI | 16-06-25

How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?

Authors: Michela Lapenna, Caterina De Bacco |

Source: ArXiv AI | 16-06-25

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Authors: Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie |

Source: ArXiv AI | 16-06-25

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

Authors: M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo |

Source: ArXiv AI | 16-06-25

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Authors: Noemi Dreksler, Lucius Caviola, David Chalmers, Carter Allen, Alex Rand, Joshua Lewis, Philip Waggoner, Kate Mays, Jeff Sebo |

Source: ArXiv AI | 16-06-25

Improving Large Language Model Safety with Contrastive Representation Learning

Authors: Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin |

Source: ArXiv AI | 16-06-25

code_transformed: The Influence of Large Language Models on Code

Authors: Yuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan, Xuanhua Shi, Dongping Chen |

Source: ArXiv AI | 16-06-25

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang |

Source: ArXiv AI | 16-06-25

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Authors: Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang |

Source: ArXiv AI | 16-06-25

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

Authors: Yitong Zhou, Mingyue Cheng, Qingyang Mao, Yucong Luo, Qi Liu, Yupeng Li, Xiaohan Zhang, Deguang Liu, Xin Li, Enhong Chen |

Source: ArXiv AI | 16-06-25

LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic

Authors: Weibing Zheng, Laurah Turner, Jess Kropczynski, Murat Ozer, Tri Nguyen, Shane Halse |

Source: ArXiv AI | 16-06-25

Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning

Authors: Liying Wang, Ph.D., Daffodil Carrington, M.S., Daniil Filienko, M.S., Caroline El Jazmi, M.S., Serena Jinchen Xie, M.S., Martine De Cock, Ph.D., Sarah Iribarren, Ph.D., Weichao Yuwen, Ph.D |

Source: ArXiv AI | 16-06-25

RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning

Authors: Yu Wang, Shiwan Zhao, Ming Fan, Zhihu Wang, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu |

Source: ArXiv AI | 16-06-25

Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding

Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui |

Source: ArXiv AI | 16-06-25

VLM@school -- Evaluation of AI image understanding on German middle school knowledge

Authors: René Peinl, Vincent Tischler |

Source: ArXiv AI | 16-06-25

Collaborative LLM Inference via Planning for Efficient Reasoning

Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin |

Source: ArXiv AI | 16-06-25

On the Performance of LLMs for Real Estate Appraisal

Authors: Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt |

Source: ArXiv AI | 16-06-25

Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment

Authors: Alejandro Peña, Julian Fierrez, Aythami Morales, Gonzalo Mancera, Miguel Lopez, Ruben Tolosana |

Source: ArXiv AI | 16-06-25

Revealing Political Bias in LLMs through Structured Multi-Agent Debate

Authors: Aishwarya Bandaru, Fabian Bindley, Trevor Bluth, Nandini Chavda, Baixu Chen, Ethan Law |

Source: ArXiv AI | 16-06-25

Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making

Authors: Claudio Fanconi, Mihaela van der Schaar |

Source: ArXiv AI | 16-06-25

Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making

Authors: Xiaopeng Yuan, Xingjian Zhang, Ke Xu, Yifan Xu, Lijun Yu, Jindong Wang, Yushun Dong, Haohan Wang |

Source: ArXiv AI | 16-06-25

The z80 technique reveals the source code for Atlassian's 'rovo' AI assistant (ghuntley.com)

Source: Hacker News | 16-06-25

Let's Talk About ChatGPT-Induced Spiritual Psychosis (default.blog)

Source: Hacker News | 16-06-25

Rabbit launches "intern," a software AI agent designed to handle team-level projects

Source: The Decoder | 15-06-25

Apple's new AI benchmarks show its models still lag behind leaders like OpenAI and Google

Source: The Decoder | 15-06-25

Slimming Down LLMs Without Losing Their Minds

Authors: Qingda (Michael) Mai |

Source: ArXiv AI | 15-06-25

BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP

Authors: Thomas Sounack, Joshua Davis, Brigitte Durieux, Antoine Chaffin, Tom J. Pollard, Eric Lehman, Alistair E. W. Johnson, Matthew McDermott, Tristan Naumann, Charlotta Lindvall |

Source: ArXiv AI | 15-06-25

The Role of Generative AI in Facilitating Social Interactions: A Scoping Review

Authors: T. T. J. E. Arets, G. Perugia, M. Houben, W.A. IJsselsteijn |

Source: ArXiv AI | 15-06-25

Robustly Improving LLM Fairness in Realistic Settings via Interpretability

Authors: Adam Karvonen, Samuel Marks |

Source: ArXiv AI | 15-06-25

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

Authors: Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He |

Source: ArXiv AI | 15-06-25

GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Authors: Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami |

Source: ArXiv AI | 15-06-25

Farseer: A Refined Scaling Law in Large Language Models

Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang |

Source: ArXiv AI | 15-06-25

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Authors: Qizhe Zhang, Mengzhen Liu, Lichen Li, Ming Lu, Yuan Zhang, Junwen Pan, Qi She, Shanghang Zhang |

Source: ArXiv AI | 15-06-25

One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence

Authors: Michelle M. Li, Ben Y. Reis, Adam Rodman, Tianxi Cai, Noa Dagan, Ran D. Balicer, Joseph Loscalzo, Isaac S. Kohane, Marinka Zitnik |

Source: ArXiv AI | 15-06-25

WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models

Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang |

Source: ArXiv AI | 15-06-25

Closer to Language than Steam: AI as the Cognitive Engine of a New Productivity Revolution

Authors: Xinmin Fang, Lingfeng Tao, Zhengxiong Li |

Source: ArXiv AI | 15-06-25

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Authors: Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, Liqiang Nie |

Source: ArXiv AI | 15-06-25

Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges

Authors: Jintao Liang, Gang Su, Huifeng Lin, You Wu, Rui Zhao, Ziyue Li |

Source: ArXiv AI | 15-06-25

Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning

Authors: Mohd Anwar Jamal Faiz |

Source: ArXiv AI | 15-06-25

LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs

Authors: Yanan Cai, Ahmed Salem, Besmira Nushi, Mark Russinovich |

Source: ArXiv AI | 15-06-25

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, Wenlong Zhang, Lei Bai |

Source: ArXiv AI | 15-06-25

Automated Validation of Textual Constraints Against AutomationML via LLMs and SHACL

Authors: Tom Westermann, Aljosha Köcher, Felix Gehlhoff |

Source: ArXiv AI | 15-06-25

TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

Authors: Vincenzo Colle, Mohamed Sana, Nicola Piovesan, Antonio De Domenico, Fadhel Ayed, Merouane Debbah |

Source: ArXiv AI | 15-06-25

A Study on Individual Spatiotemporal Activity Generation Method Using MCP-Enhanced Chain-of-Thought Large Language Models

Authors: Yu Zhang, Yang Hu, De Wang |

Source: ArXiv AI | 15-06-25

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Authors: Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen |

Source: ArXiv AI | 15-06-25

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Authors: Fei Lin, Ziyang Gong, Cong Wang, Yonglin Tian, Tengchao Zhang, Xue Yang, Gen Luo, Fei-Yue Wang |

Source: ArXiv AI | 15-06-25

AMD's AI Future Is Rack Scale 'Helios' (morethanmoore.substack.com)

Source: Hacker News | 15-06-25

I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (github.com/yousef-rafat)

Source: Hacker News | 15-06-25

Text-to-LoRA: Hypernetwork that generates task-specific LLM adapters (LoRAs) (github.com/sakanaai)

Source: Hacker News | 15-06-25

RAG Is a Fancy, Lying Search Engine (stardog.ai)

Source: Hacker News | 15-06-25

Clinical knowledge in LLMs does not translate to human interactions (arxiv.org)

Source: Hacker News | 15-06-25

I used ChatGPT to learn programming from zero and built a video generation SaaS (vidmakerpro.com)

Source: Hacker News | 15-06-25

Mechanize is building digital offices to train AI agents to fully automate computer work

Source: The Decoder | 15-06-25

The Army’s Newest Recruits: Tech Execs From Meta, OpenAI and More (wsj.com)

Source: Hacker News | 14-06-25

Student discovers fungus predicted by Albert Hoffman (wvu.edu)

Source: Hacker News | 14-06-25

Saab achieves AI milestone with Gripen E (saab.com)

Source: Hacker News | 14-06-25

Meta launches AI video editing but holds back on full features for now

Source: The Decoder | 14-06-25

Mattel partners with OpenAI to develop AI-powered toys and experiences

Source: The Decoder | 14-06-25

Meta's latest model highlights the challenge AI faces in long-term planning and causal reasoning

Source: The Decoder | 14-06-25

RISC-V in AI and HPC Part 1: Per Aspera Ad Astra? (eetimes.com)

Source: Hacker News | 14-06-25

Meta invests $14.3B in Scale AI to kick-start superintelligence lab (nytimes.com)

Source: Hacker News | 14-06-25

Students fear AI could cause "brain rot" by making it too easy to skip crucial learning steps

Source: The Decoder | 13-06-25

Maximizing Battery Storage Profits via High-Frequency Intraday Trading (arxiv.org)

Source: Hacker News | 13-06-25

Researchers confirm two journalists were hacked with Paragon spyware (techcrunch.com)

Source: Hacker News | 13-06-25

OpenAI's o3-pro may be too smart for small talk

Source: The Decoder | 12-06-25

OpenAI o3-pro (help.openai.com)

Source: Hacker News | 12-06-25

GauntletAI (YC S17): All expenses paid AI training and guaranteed $200k+ job (gauntletai.com)

Source: Hacker News | 12-06-25

Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era

Authors: Shuo Jiang, Min Xie, Frank Youhua Chen, Jian Ma, Jianxi Luo |

Source: ArXiv AI | 12-06-25

Large Language Models for Design Structure Matrix Optimization

Authors: Shuo Jiang, Min Xie, Jianxi Luo |

Source: ArXiv AI | 12-06-25

Guided Graph Compression for Quantum Graph Neural Networks

Authors: Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi |

Source: ArXiv AI | 12-06-25

Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs

Authors: Rodion Oblovatny, Alexandra Bazarova, Alexey Zaytsev |

Source: ArXiv AI | 12-06-25

3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation

Authors: Seonho Lee, Jiho Choi, Inha Kang, Jiwook Kim, Junsung Park, Hyunjung Shim |

Source: ArXiv AI | 12-06-25

Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice

Authors: Emma Kallina, Thomas Bohné, Jat Singh |

Source: ArXiv AI | 12-06-25

HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel |

Source: ArXiv AI | 12-06-25

PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants

Authors: Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz |

Source: ArXiv AI | 12-06-25

The Emergence of Abstract Thought in Large Language Models Beyond Any Language

Authors: Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang |

Source: ArXiv AI | 12-06-25

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Authors: Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin |

Source: ArXiv AI | 12-06-25

A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu |

Source: ArXiv AI | 12-06-25

Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making

Authors: Kehan Zheng, Jinfeng Zhou, Hongning Wang |

Source: ArXiv AI | 12-06-25

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Authors: Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao |

Source: ArXiv AI | 12-06-25

Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives

Authors: Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong |

Source: ArXiv AI | 12-06-25

Fine-tuning LLMs is a waste of time (codinginterviewsmadesimple.substack.com)

Source: Hacker News | 12-06-25

EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot (aim.security)

Source: Hacker News | 12-06-25

OpenAI co-founder Ilya Sutskever believes AI will shape everyone's life "whether you like it or not"

Source: The Decoder | 11-06-25

Meta AI chief scientist LeCun's latest comment reveals deep industry split over the future of AI

Source: The Decoder | 11-06-25

Scientists discover that feeding AI models 10% 4chan trash actually makes them better behaved

Source: The Decoder | 11-06-25

Zuckerberg forms elite AI team to catch up with competitors

Source: The Decoder | 11-06-25

Apple's new Foundation Models framework adds on-device AI to apps with three lines of Swift code

Source: The Decoder | 11-06-25

OpenAI dropped the price of o3 by 80% (twitter.com/sama)

Source: Hacker News | 11-06-25

Low-background Steel: content without AI contamination (jgc.org)

Source: Hacker News | 11-06-25

Launch HN: BitBoard (YC X25) – AI agents for healthcare back-offices

Source: Hacker News | 11-06-25

AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)

Source: Hacker News | 11-06-25

WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis

Authors: Liangliang Chen, Huiru Xie, Jacqueline Rohde, Ying Zhang |

Source: ArXiv AI | 11-06-25

Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions

Authors: Clara Lachenmaier, Judith Sieker, Sina Zarrieß |

Source: ArXiv AI | 11-06-25

Propositional Logic for Probing Generalization in Neural Networks

Authors: Anna Langedijk, Jaap Jumelet, Willem Zuidema |

Source: ArXiv AI | 11-06-25

Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data

Authors: Victoria Hankemeier, Malte Schilling |

Source: ArXiv AI | 11-06-25

Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma |

Source: ArXiv AI | 11-06-25

FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed

Authors: Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang |

Source: ArXiv AI | 11-06-25

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Authors: Haozhen Zhang, Tao Feng, Jiaxuan You |

Source: ArXiv AI | 11-06-25

LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs

Authors: Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade |

Source: ArXiv AI | 11-06-25

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning

Authors: Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, Mihaela van der Schaar |

Source: ArXiv AI | 11-06-25

SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents

Authors: Subhrangshu Nandi, Arghya Datta, Nikhil Vichare, Indranil Bhattacharya, Huzefa Raja, Jing Xu, Shayan Ray, Giuseppe Carenini, Abhi Srivastava, Aaron Chan, Man Ho Woo, Amar Kandola, Brandon Theresa, Francesco Carbone |

Source: ArXiv AI | 11-06-25

Transforming Expert Knowledge into Scalable Ontology via Large Language Models

Authors: Ikkei Itoku, David Theil, Evelyn Eichelsdoerfer Uehara, Sreyoshi Bhaduri, Junnosuke Kuroda, Toshi Yumoto, Alex Gil, Natalie Perez, Rajesh Cherukuri, Naumaan Nayyar |

Source: ArXiv AI | 11-06-25

A Survey on Large Language Models for Mathematical Reasoning

Authors: Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu |

Source: ArXiv AI | 11-06-25

HGFormer: A Hierarchical Graph Transformer Framework for Two-Stage Colonel Blotto Games via Reinforcement Learning

Authors: Yang Lv, Jinlong Lei, Peng Yi |

Source: ArXiv AI | 11-06-25

Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness

Authors: Yanwei Gong, Xiaolin Chang |

Source: ArXiv AI | 11-06-25

Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning

Authors: Kongcheng Zhang, Qi Yao, Shunyu Liu, Yingjie Wang, Baisheng Lai, Jieping Ye, Mingli Song, Dacheng Tao |

Source: ArXiv AI | 11-06-25

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

Authors: Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, Pattie Maes |

Source: ArXiv AI | 11-06-25

Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

Authors: Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi |

Source: ArXiv AI | 11-06-25

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Authors: Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri, Samuel J. Bell |

Source: ArXiv AI | 11-06-25

ChatGPT's voice is now more natural and can consistently translate conversations in real time

Source: The Decoder | 10-06-25

Google's Gemini 2.5 Pro beats OpenAI's o3 model in processing complex, lengthy texts

Source: The Decoder | 10-06-25

ChatGPT scams range from silly money-making ploys to calculated political meddling

Source: The Decoder | 10-06-25

Boosting LLM Reasoning via Spontaneous Self-Correction

Authors: Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, Chen Zhu |

Source: ArXiv AI | 10-06-25

Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth

Authors: Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu |

Source: ArXiv AI | 10-06-25

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang |

Source: ArXiv AI | 10-06-25

Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT

Authors: Miroslav Popovic, Marko Popovic, Miodrag Djukic, Ilija Basicevic |

Source: ArXiv AI | 10-06-25

BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite

Authors: Liyang Chen, Yujun Cai, Jieqiong Dong, Yiwei Wang |

Source: ArXiv AI | 10-06-25

Reasoning Multimodal Large Language Model: Data Contamination and Dynamic Evaluation

Authors: Ming Liu, Wensheng Zhang |

Source: ArXiv AI | 10-06-25

Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data

Authors: Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye |

Source: ArXiv AI | 10-06-25

LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments

Authors: Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai |

Source: ArXiv AI | 10-06-25

Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests

Authors: Arnau Igualde Sáez, Lamyae Rhomrasi, Yusef Ahsini, Ricardo Vinuesa, Sergio Hoyas, Jose P. García Sabater, Marius J. Fullana i Alfonso, J. Alberto Conejero |

Source: ArXiv AI | 10-06-25

An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning

Authors: Ze Yang, Yihong Jin, Juntian Liu, Xinhe Xu |

Source: ArXiv AI | 10-06-25

Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification

Authors: Liwen Zheng, Chaozhuo Li, Zheng Liu, Feiran Huang, Haoran Jia, Zaisheng Ye, Xi Zhang |

Source: ArXiv AI | 10-06-25

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Authors: Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang |

Source: ArXiv AI | 10-06-25

REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models

Authors: Diego Forniés-Tabuenca, Alejandro Uribe, Urtzi Otamendi, Arkaitz Artetxe, Juan Carlos Rivera, Oier Lopez de Lacalle |

Source: ArXiv AI | 10-06-25

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Authors: Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua |

Source: ArXiv AI | 10-06-25

Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs

Authors: Yao Yan |

Source: ArXiv AI | 10-06-25

Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark

Authors: Shoko Oka |

Source: ArXiv AI | 10-06-25

Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation

Authors: Christopher Subia-Waud (Rayonlabs Team) |

Source: ArXiv AI | 10-06-25

Solving Inequality Proofs with Large Language Models

Authors: Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu |

Source: ArXiv AI | 10-06-25

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado |

Source: ArXiv AI | 10-06-25

End-to-End Framework for Robot Lawnmower Coverage Path Planning using Cellular Decomposition

Authors: Nikunj Shah, Utsav Dey, Kenji Nishimiya |

Source: ArXiv AI | 10-06-25

Text-to-LoRA: Instant Transformer Adaption

Authors: Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange |

Source: ArXiv AI | 10-06-25

Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models

Authors: Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu |

Source: ArXiv AI | 10-06-25

semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces

Authors: Jwalanthi Ranganathan, Rohan Jha, Kanishka Misra, Kyle Mahowald |

Source: ArXiv AI | 10-06-25

(AI peers) are people learning from the same standpoint: Perception of AI characters in a Collaborative Science Investigation

Authors: Eunhye Grace Ko, Soo Hyoung Joo |

Source: ArXiv AI | 10-06-25

DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation

Authors: Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu |

Source: ArXiv AI | 10-06-25

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models

Authors: Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu |

Source: ArXiv AI | 10-06-25

Towards an Explainable Comparison and Alignment of Feature Embeddings

Authors: Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia |

Source: ArXiv AI | 10-06-25

Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted

Authors: Cecil Pang |

Source: ArXiv AI | 10-06-25

Contextual Memory Intelligence -- A Foundational Paradigm for Human-AI Collaboration and Reflective Generative AI Systems

Authors: Kristy Wedel |

Source: ArXiv AI | 10-06-25

Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias

Authors: Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang |

Source: ArXiv AI | 10-06-25

Explainability in Context: A Multilevel Framework Aligning AI Explanations with Stakeholder with LLMs

Authors: Marilyn Bello, Rafael Bello, Maria-Matilde García, Ann Nowé, Iván Sevillano-García, Francisco Herrera |

Source: ArXiv AI | 10-06-25

CrimeMind: Simulating Urban Crime with Multi-Modal LLM Agents

Authors: Qingbin Zeng, Ruotong Zhao, Jinzhu Mao, Haoyang Li, Fengli Xu, Yong Li |

Source: ArXiv AI | 10-06-25

Preference Learning for AI Alignment: a Causal Perspective

Authors: Katarzyna Kobalczyk, Mihaela van der Schaar |

Source: ArXiv AI | 10-06-25

CP-Bench: Evaluating Large Language Models for Constraint Modelling

Authors: Kostis Michailidis, Dimos Tsouros, Tias Guns |

Source: ArXiv AI | 10-06-25

PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time

Authors: Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li |

Source: ArXiv AI | 10-06-25

The last six months in LLMs, illustrated by pelicans on bicycles (simonwillison.net)

Source: Hacker News | 09-06-25

What happens when people don't understand how AI works (theatlantic.com)

Source: Hacker News | 09-06-25

LLMs are cheap (snellman.net)

Source: Hacker News | 09-06-25

OpenAI leaves the question of AI consciousness consciously unanswered

Source: The Decoder | 09-06-25

Anthropic cuts Claude access for Windsurf after OpenAI's $3B takeover news

Source: The Decoder | 09-06-25

Building an AI server on a budget (informationga.in)

Source: Hacker News | 09-06-25

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

Authors: Mohammed Almutairi |

Source: ArXiv AI | 08-06-25

Exploring Diffusion Transformer Designs via Grafting

Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei |

Source: ArXiv AI | 08-06-25

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Authors: Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang |

Source: ArXiv AI | 08-06-25

Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Authors: Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab |

Source: ArXiv AI | 08-06-25

Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Authors: Niv Eckhaus, Uri Berger, Gabriel Stanovsky |

Source: ArXiv AI | 08-06-25

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim |

Source: ArXiv AI | 08-06-25

Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models

Authors: Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli |

Source: ArXiv AI | 08-06-25

Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance

Authors: Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira |

Source: ArXiv AI | 08-06-25

CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective

Authors: Jiayu Liu, Zhenya Huang, Wei Dai, Cheng Cheng, Jinze Wu, Jing Sha, Song Li, Qi Liu, Shijin Wang, Enhong Chen |

Source: ArXiv AI | 08-06-25

Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences

Authors: Hadi Hosseini, Samarth Khanna, Ronak Singh |

Source: ArXiv AI | 08-06-25

Schema Generation for Large Knowledge Graphs Using Large Language Models

Authors: Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño Peñuela, Elena Simperl |

Source: ArXiv AI | 08-06-25

"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Authors: Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche |

Source: ArXiv AI | 08-06-25

DeePoly: A High-Order Accuracy and Efficiency Deep-Polynomial Framework for Scientific Machine Learning

Authors: Li Liu, Heng Yong |

Source: ArXiv AI | 08-06-25

E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction

Authors: Zhichao Yang, Jiashu He, Mohammad B. Al-Khasawneh, Darshan Pandit, Cirillo Cinzia |

Source: ArXiv AI | 08-06-25

Agents of Change: Self-Evolving LLM Agents for Strategic Planning

Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang |

Source: ArXiv AI | 08-06-25

Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems

Authors: Loan Dao, Ngoc Quoc Ly |

Source: ArXiv AI | 08-06-25

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Authors: Lin Sun, Weihong Lin, Jinzhu Wu, Yongfu Zhu, Xiaoqi Jian, Guangxiang Zhao, Change Jia, Linglin Zhang, Sai-er Hu, Yuhan Wu, Xiangzheng Zhang |

Source: ArXiv AI | 08-06-25

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala |

Source: ArXiv AI | 08-06-25

LLMs for sensory-motor control: Combining in-context and iterative learning

Authors: Jônata Tyska Carvalho, Stefano Nolfi |

Source: ArXiv AI | 08-06-25

When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

Authors: Kai Wang, Yihao Zhang, Meng Sun |

Source: ArXiv AI | 08-06-25

LLM-First Search: Self-Guided Exploration of the Solution Space

Authors: Nathan Herr, Tim Rocktäschel, Roberta Raileanu |

Source: ArXiv AI | 08-06-25

Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning

Authors: Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg |

Source: ArXiv AI | 08-06-25

Control Tax: The Price of Keeping AI in Check

Authors: Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie |

Source: ArXiv AI | 08-06-25

Focus and Context and LLMs (glek.net)

Source: Hacker News | 08-06-25

Field Notes from Shipping Real Code with Claude (diwank.space)

Source: Hacker News | 08-06-25

Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally

Source: The Decoder | 08-06-25

OpenAI starts retaining all ChatGPT user data, including deleted chats and API data

Source: The Decoder | 08-06-25

I read all of Cloudflare's Claude-generated commits (maxemitchell.com)

Source: Hacker News | 08-06-25

Updates to Advanced Voice Mode for paid users (help.openai.com)

Source: Hacker News | 08-06-25

Reddit sues Anthropic for scraping site content to train Claude

Source: The Decoder | 07-06-25

Meta's new high-tech Aria Gen 2 glasses are the ultimate AI training data collector

Source: The Decoder | 07-06-25

Sandia turns on brain-like storage-free supercomputer (blocksandfiles.com)

Source: Hacker News | 07-06-25

Show HN: AI game animation sprite generator (godmodeai.cloud)

Source: Hacker News | 07-06-25

Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks (sutro.sh)

Source: Hacker News | 07-06-25

The Illusion of Thinking: Understanding the Limitations of Reasoning LLMs [pdf] (cdn-apple.com)

Source: Hacker News | 07-06-25

NASA delays next flight of Boeing's alternative to SpaceX Dragon (theedgemalaysia.com)

Source: Hacker News | 07-06-25

Reverse Engineering Cursor's LLM Client (tensorzero.com)

Source: Hacker News | 07-06-25

Onyx (YC W24) – AI Assistants for Work Hiring Founding AE (ycombinator.com)

Source: Hacker News | 07-06-25

Meta: Shut down your invasive AI Discover feed (mozillafoundation.org)

Source: Hacker News | 07-06-25

What "Working" Means in the Era of AI Apps (a16z.com)

Source: Hacker News | 07-06-25

OpenAI reaches three million enterprise users, adds new ChatGPT business features

Source: The Decoder | 06-06-25

Tokasaurus: An LLM inference engine for high-throughput workloads (stanford.edu)

Source: Hacker News | 06-06-25

How we’re responding to The NYT’s data demands in order to protect user privacy (openai.com)

Source: Hacker News | 06-06-25

Show HN: Claude Composer (github.com/possibilities)

Source: Hacker News | 06-06-25

Anthropic slashes Claude 3.x access on Windsurf following OpenAI's reported $3 billion takeover

Source: The Decoder | 06-06-25

Anthropic co-founder on cutting access to Windsurf (techcrunch.com)

Source: Hacker News | 06-06-25

Machine Learning: The Native Language of Biology (decodingbiology.substack.com)

Source: Hacker News | 06-06-25

OpenAI brings longer-term memory feature to free ChatGPT users

Source: The Decoder | 05-06-25

OpenAI adds new features and improvements to its agent development tools and language model

Source: The Decoder | 05-06-25

Yoshua Bengio launches LawZero to develop safe AI systems free from commercial influence

Source: The Decoder | 05-06-25

A practical guide to building agents [pdf] (cdn.openai.com)

Source: Hacker News | 05-06-25

Differences in link hallucination and source comprehension across different LLM (mikecaulfield.substack.com)

Source: Hacker News | 05-06-25

Comparing Claude System Prompts Reveal Anthropic's Priorities (dbreunig.com)

Source: Hacker News | 05-06-25

LLMs and Elixir: Windfall or Deathblow? (zachdaniel.dev)

Source: Hacker News | 05-06-25

Prompt engineering playbook for programmers (addyo.substack.com)

Source: Hacker News | 05-06-25

OpenAI slams court order to save all ChatGPT logs, including deleted chats (arstechnica.com)

Source: Hacker News | 05-06-25

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (arxiv.org)

Source: Hacker News | 05-06-25

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems

Authors: Sven Kirchner, Alois C. Knoll |

Source: ArXiv AI | 05-06-25

High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa |

Source: ArXiv AI | 05-06-25

Explainability-Based Token Replacement on LLM-Generated Text

Authors: Hadi Mohammadi, Anastasia Giachanou, Daniel L. Oberski, Ayoub Bagheri |

Source: ArXiv AI | 05-06-25

Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs

Authors: Aleksey Kudelya, Alexander Shirnin |

Source: ArXiv AI | 05-06-25

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

Authors: Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry |

Source: ArXiv AI | 05-06-25

EuroLLM-9B: Technical Report

Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins |

Source: ArXiv AI | 05-06-25

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Authors: Ming Zhang, Yujiong Shen, Zelin Li, Huayu Sha, Binze Hu, Yuhui Wang, Chenhao Huang, Shichun Liu, Jingqi Tong, Changhao Jiang, Mingxu Chai, Zhiheng Xi, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang |

Source: ArXiv AI | 05-06-25

A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks

Authors: Loan Dao, Ngoc Quoc Ly |

Source: ArXiv AI | 05-06-25

TracLLM: A Generic Framework for Attributing Long Context LLMs

Authors: Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia |

Source: ArXiv AI | 05-06-25

A Trustworthiness-based Metaphysics of Artificial Intelligence Systems

Authors: Andrea Ferrario |

Source: ArXiv AI | 05-06-25

Computational Architects of Society: Quantum Machine Learning for Social Rule Genesis

Authors: Shan Shan |

Source: ArXiv AI | 05-06-25

SUMO-MCP: Leveraging the Model Context Protocol for Autonomous Traffic Simulation and Optimization

Authors: Chenglong Ye, Gang Xiong, Junyou Shang, Xingyuan Dai, Xiaoyan Gong, Yisheng Lv |

Source: ArXiv AI | 05-06-25

CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications

Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li |

Source: ArXiv AI | 05-06-25

Reason from Future: Reverse Thought Chain Enhances LLM Reasoning

Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu |

Source: ArXiv AI | 05-06-25

Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations

Authors: Shaoshan Liu, Fan Wang, Hongjun Zhou, Yuanfeng Wang |

Source: ArXiv AI | 05-06-25

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho |

Source: ArXiv AI | 05-06-25

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Authors: Dhaval Patel, Shuxin Lin, James Rayfield, Nianjun Zhou, Roman Vaculin, Natalia Martinez, Fearghal O'donncha, Jayant Kalagnanam |

Source: ArXiv AI | 05-06-25

Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning

Authors: Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu |

Source: ArXiv AI | 05-06-25

AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents

Authors: Akshat Naik, Patrick Quinn, Guillermo Bosch, Emma Gouné, Francisco Javier Campos Zabala, Jason Ross Brown, Edward James Young |

Source: ArXiv AI | 05-06-25

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis |

Source: ArXiv AI | 05-06-25

Character.AI moves toward social networking with animated AI avatars

Source: The Decoder | 05-06-25

Show HN: App.build, an open-source AI agent that builds full-stack apps (app.build)

Source: Hacker News | 05-06-25

VectorSmuggle: Covertly Exfiltrate Data in Embeddings (github.com/jaschadub)

Source: Hacker News | 05-06-25

After court order, OpenAI is now preserving all ChatGPT user logs (laurenweinstein.org)

Source: Hacker News | 05-06-25

Deepmind's "force prompting" lets AI create realistic video motion without physics engines

Source: The Decoder | 04-06-25

AI-generated CUDA kernels outperform PyTorch in several GPU-heavy machine learning benchmarks

Source: The Decoder | 04-06-25

Apple reportedly tests AI models that match ChatGPT's capabilities in internal benchmarks

Source: The Decoder | 04-06-25

Show HN: Tiptap AI Agent – Add AI workflows to your text editor in minutes

Source: Hacker News | 04-06-25

The Sky's the limit: AI automation on Mac (taoofmac.com)

Source: Hacker News | 04-06-25

Claude Code is now available to Pro plans (anthropic.com)

Source: Hacker News | 04-06-25

Deep learning gets the glory, deep fact checking gets ignored (fast.ai)

Source: Hacker News | 04-06-25

A deep dive into self-improving AI and the Darwin-Gödel Machine (richardcsuwandi.github.io)

Source: Hacker News | 04-06-25

Cloud Run GPUs, now GA, makes running AI workloads easier for everyone (cloud.google.com)

Source: Hacker News | 04-06-25

Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM

Authors: Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam |

Source: ArXiv AI | 04-06-25

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

Authors: Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang |

Source: ArXiv AI | 04-06-25

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda |

Source: ArXiv AI | 04-06-25

The State of Large Language Models for African Languages: Progress and Challenges

Authors: Kedir Yassin Hussen, Walelign Tewabe Sewunetie, Abinew Ali Ayele, Sukairaj Hafiz Imam, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam |

Source: ArXiv AI | 04-06-25

Improving LLM-Generated Code Quality with GRPO

Authors: Maxime Robeyns, Laurence Aitchison |

Source: ArXiv AI | 04-06-25

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Authors: Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen |

Source: ArXiv AI | 04-06-25

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Authors: Tianyu Hua, Harper Hua, Violet Xiang, Benjamin Klieger, Sang T. Truong, Weixin Liang, Fan-Yun Sun, Nick Haber |

Source: ArXiv AI | 04-06-25

Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning

Authors: Haowen Xu, Sisi Zlatanova, Ruiyu Liang, Ismet Canbulat |

Source: ArXiv AI | 04-06-25

A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning

Authors: Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao |

Source: ArXiv AI | 04-06-25

Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine

Authors: Zhuoxuan Jiang, Tianyang Zhang, Peiyan Peng, Jing Chen, Yinong Xun, Haotian Zhang, Lichi Li, Yong Li, Shaohua Zhang |

Source: ArXiv AI | 04-06-25

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun |

Source: ArXiv AI | 04-06-25

ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting

Authors: Haichen Wang, Liu Yang, Xinyuan Zhang, Haomin Yu, Ming Li, Jilin Hu |

Source: ArXiv AI | 04-06-25

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation

Authors: Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo |

Source: ArXiv AI | 04-06-25

From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV

Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han |

Source: ArXiv AI | 04-06-25

Open-Set Living Need Prediction with Large Language Models

Authors: Xiaochong Lan, Jie Feng, Yizhou Sun, Chen Gao, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Yong Li |

Source: ArXiv AI | 04-06-25

Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations

Authors: Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen |

Source: ArXiv AI | 04-06-25

Why do AI agents communicate in human language?

Authors: Pengcheng Zhou, Yinglun Feng, Halimulati Julaiti, Zhongliang Yang |

Source: ArXiv AI | 04-06-25

Benchmarking and Advancing Large Language Models for Local Life Services

Authors: Xiaochong Lan, Jie Feng, Jiahuan Lei, Xinlei Shi, Yong Li |

Source: ArXiv AI | 04-06-25

TaxAgent: How Large Language Model Designs Fiscal Policy

Authors: Jizhou Wang, Xiaodan Fang, Lei Huang, Yongfeng Huang |

Source: ArXiv AI | 04-06-25

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Authors: Chen Qian, Dongrui Liu, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao |

阅读更多

来源: ArXiv AI | 04-06-25

Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs

Authors: Shangmin Guo, Omar Darwiche Domingues, Raphaël Avalos, Aaron Courville, Florian Strub |

阅读更多

来源: ArXiv AI | 04-06-25

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

Authors: Matthew Kowal, Jasper Timm, Jean-Francois Godbout, Thomas Costello, Antonio A. Arechar, Gordon Pennycook, David Rand, Adam Gleave, Kellin Pelrine |

阅读更多

来源: ArXiv AI | 04-06-25

Linear Spatial World Models Emerge in Large Language Models

Authors: Matthieu Tehenan, Christian Bolivar Moya, Tenghai Long, Guang Lin |

阅读更多

来源: ArXiv AI | 04-06-25

DPO Learning with LLMs-Judge Signal for Computer Use Agents

Authors: Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard |

阅读更多

来源: ArXiv AI | 04-06-25

Anthropic's Claude uses Elevenlabs technology for speech features rather than an in-house model

阅读更多

来源: The Decoder | 03-06-25

Google says Veo 3 users have generated millions of AI videos in just a few days

阅读更多

来源: The Decoder | 03-06-25

Cloudflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)

阅读更多

来源: Hacker News | 03-06-25

Spark AI (YC W24) Is Hiring a Full Stack Engineer in San Francisco (ycombinator.com)

阅读更多

来源: Hacker News | 03-06-25

My AI skeptic friends are all nuts (fly.io)

阅读更多

来源: Hacker News | 03-06-25

Claude has learned how to jailbreak Cursor (cursor.com)

阅读更多

来源: Hacker News | 03-06-25

PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation

Authors: Linhan Xia, Mingzhan Yang, Guohui Yuan, Shengnan Tao, Yujing Qiu, Guo Yu, Kai Lei |

阅读更多

来源: ArXiv AI | 03-06-25

Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts

Authors: Fan Liu, Bikang Pan, Zhongyi Wang, Xi Yao, Xiaoying Tang, Jingya Wang, Ye Shi |

阅读更多

来源: ArXiv AI | 03-06-25

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

Authors: Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi |

阅读更多

来源: ArXiv AI | 03-06-25

MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch

Authors: Xiang Fei, Xiawu Zheng, Hao Feng |

阅读更多

来源: ArXiv AI | 03-06-25

IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory

Authors: Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, GuanHao Zhao, Fei Wang, Runze Wu |

阅读更多

来源: ArXiv AI | 03-06-25

ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation

Authors: Xinyi Liu, Lipeng Ma, Yixuan Li, Weidong Yang, Qingyuan Zhou, Jiayi Song, Shuhao Li, Ben Fei |

阅读更多

来源: ArXiv AI | 03-06-25

Modular Speaker Architecture: A Framework for Sustaining Responsibility and Contextual Integrity in Multi-Agent AI Communication

Authors: Khe-Han Toh, Hong-Kuan Teo |

阅读更多

来源: ArXiv AI | 03-06-25

GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models

Authors: Qiang Yi, Lianlei Shan |

阅读更多

来源: ArXiv AI | 03-06-25

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun |

阅读更多

来源: ArXiv AI | 03-06-25

Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures

Authors: Prashik Buddhaghosh Bansod |

阅读更多

来源: ArXiv AI | 03-06-25

FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance

Authors: Hongyang Yang, Likun Lin, Yang She, Xinyu Liao, Jiaoyang Wang, Runjia Zhang, Yuquan Mo, Christina Dan Wang |

阅读更多

来源: ArXiv AI | 03-06-25

MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments

Authors: Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu |

阅读更多

来源: ArXiv AI | 03-06-25

Social Cooperation in Conversational AI Agents

Authors: Mustafa Mert Çelikok, Saptarashmi Bandyopadhyay, Robert Loftin |

阅读更多

来源: ArXiv AI | 03-06-25

Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models

Authors: Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim |

阅读更多

来源: ArXiv AI | 03-06-25

K12Vista: Exploring the Boundaries of MLLMs in K-12 Education

Authors: Chong Li, Chenglin Zhu, Tao Zhang, Mingan Lin, Zenan Zhou, Jian Xie |

阅读更多

来源: ArXiv AI | 03-06-25

The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?

Authors: Djallel Bouneffouf, Matthew Riemer, Kush Varshney |

阅读更多

来源: ArXiv AI | 03-06-25

A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents

Authors: Cheonsu Jeong |

阅读更多

来源: ArXiv AI | 03-06-25

Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks

Authors: Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin A. Rothkopf, Kristian Kersting |

阅读更多

来源: ArXiv AI | 03-06-25

COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents

Authors: Manish Bhatt, Ronald F. Del Rosario, Vineeth Sai Narajala, Idan Habler |

阅读更多

来源: ArXiv AI | 03-06-25

Large language models can learn and generalize steganographic chain-of-thought under process supervision

Authors: Joey Skaf, Luis Ibanez-Lissen, Robert McCarthy, Connor Watts, Vasil Georgiv, Hannes Whittingham, Lorena Gonzalez-Manzano, David Lindner, Cameron Tice, Edward James Young, Puria Radmard |

阅读更多

来源: ArXiv AI | 03-06-25

Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods

Authors: Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang |

阅读更多

来源: ArXiv AI | 03-06-25

OpenAI sees human interaction as a competitor to ChatGPT's super assistant ambitions

阅读更多

来源: The Decoder | 03-06-25

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

Authors: Srikanth Thudumu, Jason Fisher, Hung Du |

阅读更多

来源: ArXiv AI | 03-06-25

PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models

Authors: Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo |

阅读更多

来源: ArXiv AI | 03-06-25

Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

Authors: Yuwen Tan, Yuan Qing, Boqing Gong |

阅读更多

来源: ArXiv AI | 03-06-25

Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs

Authors: Juraj Vladika, Annika Domres, Mai Nguyen, Rebecca Moser, Jana Nano, Felix Busch, Lisa C. Adams, Keno K. Bressem, Denise Bernhardt, Stephanie E. Combs, Kai J. Borm, Florian Matthes, Jan C. Peeken |

阅读更多

来源: ArXiv AI | 03-06-25

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Authors: Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong |

阅读更多

来源: ArXiv AI | 03-06-25

Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi |

阅读更多

来源: ArXiv AI | 03-06-25

Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve

Authors: Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen |

阅读更多

来源: ArXiv AI | 03-06-25

Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding

Authors: Mingyang Mao, Mariela M. Perez-Cabarcas, Utteja Kallakuri, Nicholas R. Waytowich, Xiaomin Lin, Tinoosh Mohsenin |

阅读更多

来源: ArXiv AI | 03-06-25

MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge

Authors: Jerry Junyang Cheung, Shiyao Shen, Yuchen Zhuang, Yinghao Li, Rampi Ramprasad, Chao Zhang |

阅读更多

来源: ArXiv AI | 03-06-25

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

Authors: Chan-Wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Zhengzhong Tu |

阅读更多

来源: ArXiv AI | 03-06-25

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Authors: Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu |

阅读更多

来源: ArXiv AI | 03-06-25

GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs

Authors: Amel Gader, Alsayed Algergawy |

阅读更多

来源: ArXiv AI | 03-06-25

E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness

Authors: Yibo Zhao, Jiapeng Zhu, Ye Guo, Kangkang He, Xiang Li |

阅读更多

来源: ArXiv AI | 03-06-25

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

Authors: Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman |

阅读更多

来源: ArXiv AI | 03-06-25

How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning

Authors: Hongyi James Cai, Junlin Wang, Xiaoyin Chen, Bhuwan Dhingra |

阅读更多

来源: ArXiv AI | 03-06-25

Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models

Authors: Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao |

阅读更多

来源: ArXiv AI | 03-06-25

FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation

Authors: Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava |

阅读更多

来源: ArXiv AI | 03-06-25

GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments

Authors: Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji |

阅读更多

来源: ArXiv AI | 03-06-25

Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules

Authors: Yueqi Zhang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |

阅读更多

来源: ArXiv AI | 03-06-25

Leveraging Knowledge Graphs and LLMs for Structured Generation of Misinformation

Authors: Sania Nayab, Marco Simoni, Giulio Rossolini |

阅读更多

来源: ArXiv AI | 03-06-25

Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning

Authors: Vasilije Markovic, Lazar Obradovic, Laszlo Hajdu, Jovan Pavlovic |

阅读更多

来源: ArXiv AI | 03-06-25

SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors

Authors: Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi |

阅读更多

来源: ArXiv AI | 03-06-25

MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge

Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller |

阅读更多

来源: ArXiv AI | 03-06-25

Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success

Authors: Ben Griffin, Joseph Ternasky, Fuat Alican, Yigit Ihlamur |

阅读更多

来源: ArXiv AI | 03-06-25

Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models

Authors: Frederike Lübeck, Jonas Wildberger, Frederik Träuble, Maximilian Mordig, Sergios Gatidis, Andreas Krause, Bernhard Schölkopf |

阅读更多

来源: ArXiv AI | 03-06-25

EXP-Bench: Can AI Conduct AI Research Experiments?

Authors: Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen |

阅读更多

来源: ArXiv AI | 03-06-25

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Authors: Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen |

阅读更多

来源: ArXiv AI | 03-06-25

Elevenlabs' new AI voice system enables smoother interactions through real-time analysis

阅读更多

来源: The Decoder | 02-06-25

Anthropic CEO predicts 20% unemployment from AI - and suggests taxing every AI response

阅读更多

来源: The Decoder | 02-06-25

How can AI researchers save energy? By going backward (quantamagazine.org)

阅读更多

来源: Hacker News | 02-06-25

Beyond the Black Box: Interpretability of LLMs in Finance (arxiv.org)

阅读更多

来源: Hacker News | 02-06-25

Codex CLI is going native (github.com/openai)

阅读更多

来源: Hacker News | 02-06-25

When Fine-Tuning Makes Sense: A Developer's Guide (getkiln.ai)

阅读更多

来源: Hacker News | 02-06-25

Google AI Edge – On-device cross-platform AI deployment (ai.google.dev)

阅读更多

来源: Hacker News | 02-06-25

Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi |

阅读更多

来源: ArXiv AI | 01-06-25

SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA

Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He |

阅读更多

来源: ArXiv AI | 01-06-25

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Authors: Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu, Yansi Li, Linfeng Song, Zhengwen Liang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu |

阅读更多

来源: ArXiv AI | 01-06-25

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan |

阅读更多

来源: ArXiv AI | 01-06-25

Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction

Authors: Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao |

阅读更多

来源: ArXiv AI | 01-06-25

Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble

Authors: Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov |

阅读更多

来源: ArXiv AI | 01-06-25

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

Authors: Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev |

阅读更多

来源: ArXiv AI | 01-06-25

A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy

Authors: Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim, Iqbal H. Sarker, Seyit Camtepe |

阅读更多

来源: ArXiv AI | 01-06-25

Autoformalization in the Era of Large Language Models: A Survey

Authors: Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun, Hengyu Liu, Tiancheng Zhang |

阅读更多

来源: ArXiv AI | 01-06-25

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

Authors: Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang, Jun Zhou, Yuxiang Peng, Li Zheng, Chong Teng, Donghong Ji, Zhuang Li |

阅读更多

来源: ArXiv AI | 01-06-25

SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

Authors: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You |

阅读更多

来源: ArXiv AI | 01-06-25

Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics

Authors: Ran Zhang, Mohannad Elhamod |

阅读更多

来源: ArXiv AI | 01-06-25

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability

Authors: Ruida Wang, Yuxin Li, Yi R. (May) Fung, Tong Zhang |

阅读更多

来源: ArXiv AI | 01-06-25

Deepseek's R1 model closes the gap with OpenAI and Google after major update

阅读更多

来源: The Decoder | 01-06-25

The ‘white-collar bloodbath’ is all part of the AI hype machine (cnn.com)

阅读更多

来源: Hacker News | 01-06-25

Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis (github.com/robertjakob)

阅读更多

来源: Hacker News | 01-06-25

Generative AI startup Odyssey demos interactive AI-generated video

阅读更多

来源: The Decoder | 31-05-25

Show HN: MCP Defender – OSS AI Firewall for Protecting MCP in Cursor/Claude etc (mcpdefender.com)

阅读更多

来源: Hacker News | 31-05-25

The Darwin Gödel Machine: AI that improves itself by rewriting its own code (sakana.ai)

阅读更多

来源: Hacker News | 31-05-25

AccessOwl (YC S22) is hiring an AI TypeScript Engineer to connect 100s of SaaS (ycombinator.com)

阅读更多

来源: Hacker News | 31-05-25

The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity (jamesoclaire.com)

阅读更多

来源: Hacker News | 31-05-25

What's working for YC companies since the AI boom (jamesin.substack.com)

阅读更多

来源: Hacker News | 31-05-25

Opera unveils Neon, a browser designed for both humans and AI agents

阅读更多

来源: The Decoder | 31-05-25

One year after its rivals, Claude can finally speak with users through a new voice mode

阅读更多

来源: The Decoder | 31-05-25

Anthropic launches a voice mode for Claude (techcrunch.com)

阅读更多

来源: Hacker News | 31-05-25

Mistral's Agents API enables AI agents to collaborate and connect with external systems

阅读更多

来源: The Decoder | 30-05-25

What is currently the best LLM model for consumer grade hardware? Is it phi-4?

阅读更多

来源: Hacker News | 30-05-25

Spaitial pushes generative AI to understand and create 3D structures with real physical properties

阅读更多

来源: The Decoder | 30-05-25

Human coders are still better than LLMs (antirez.com)

阅读更多

来源: Hacker News | 30-05-25

Open-sourcing circuit tracing tools (anthropic.com)

阅读更多

来源: Hacker News | 30-05-25

A visual exploration of vector embeddings (pamelafox.org)

阅读更多

来源: Hacker News | 30-05-25

Nick Clegg says a mandatory AI training opt-in would kill the UK's AI industry

阅读更多

来源: The Decoder | 29-05-25

ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM

Authors: Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui |

阅读更多

来源: ArXiv AI | 29-05-25

Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning

Authors: Erxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang |

阅读更多

来源: ArXiv AI | 29-05-25

Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems

Authors: Hoang Pham, Khac-Hoai Nam Bui |

阅读更多

来源: ArXiv AI | 29-05-25

R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning

Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Chuchu Fan |

阅读更多

来源: ArXiv AI | 29-05-25

Understanding the learned look-ahead behavior of chess neural networks

Authors: Diogo Cruz |

阅读更多

来源: ArXiv AI | 29-05-25

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Authors: Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang |

阅读更多

来源: ArXiv AI | 29-05-25

From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models

Authors: Kaiyu He, Zhiyu Chen |

阅读更多

来源: ArXiv AI | 29-05-25

Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy

Authors: Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem |

阅读更多

来源: ArXiv AI | 29-05-25

SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts

Authors: Chen Yueh-Han, Guy Davidson, Brenden M. Lake |

阅读更多

来源: ArXiv AI | 29-05-25

Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation

Authors: Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris |

阅读更多

来源: ArXiv AI | 29-05-25

Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test

Authors: Guangfu Hao, Frederic Alexandre, Shan Yu |

阅读更多

来源: ArXiv AI | 29-05-25

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah |

阅读更多

来源: ArXiv AI | 29-05-25

AgentDNS: A Root Domain Naming System for LLM Agents

Authors: Enfang Cui, Yujun Cheng, Rui She, Dan Liu, Zhiyuan Liang, Minxin Guo, Tianzheng Li, Qian Wei, Wenjuan Xing, Zhijie Zhong |

阅读更多

来源: ArXiv AI | 29-05-25

From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

Authors: Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, Merouane Debbah |

阅读更多

来源: ArXiv AI | 29-05-25

Chatbots like ChatGPT have not led to significant changes in wages or working hours, study finds

阅读更多

来源: The Decoder | 29-05-25

Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning

阅读更多

来源: Hacker News | 29-05-25

Launch HN: MindFort (YC X25) – AI agents for continuous pentesting

阅读更多

来源: Hacker News | 29-05-25

LLM codegen go brrr – Parallelization with Git worktrees and tmux (skeptrune.com)

阅读更多

来源: Hacker News | 29-05-25

Gmail Personal Smart Replies: The first time an AI feature has worried me

阅读更多

来源: The Decoder | 28-05-25

Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming (nathan.rs)

阅读更多

来源: Hacker News | 28-05-25

There Is No Diffie-Hellman but Elliptic Curve Diffie-Hellman (keymaterial.net)

阅读更多

来源: Hacker News | 28-05-25

Show HN: My LLM CLI tool can run tools now, from Python code or plugins (simonwillison.net)

阅读更多

来源: Hacker News | 28-05-25

Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making

Authors: Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng |

阅读更多

来源: ArXiv AI | 28-05-25

Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review

Authors: Xueqiang Ouyang, Jia Wei |

阅读更多

来源: ArXiv AI | 28-05-25

How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

Authors: Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen |

阅读更多

来源: ArXiv AI | 28-05-25

CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models

Authors: Yi Zhan, Qi Liu, Weibo Gao, Zheng Zhang, Tianfu Wang, Shuanghong Shen, Junyu Lu, Zhenya Huang |

阅读更多

来源: ArXiv AI | 28-05-25

Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

Authors: Hyungjun Park (1,2), Chang-Yun Woo (3), Seungjo Lim (2), Seunghwan Lim (2), Keunho Kwak (2), Ju Young Jeong (4), Chong Hyun Suh (4) ((1) Department of Pulmonology, Shihwa Medical Center, Siheung, Republic of Korea (2) Helpmedoc Inc., Republic of Korea (3) Department of Internal Medicine, Asan Medical Center, Seoul, Republic of Korea (4) Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea) |

阅读更多

来源: ArXiv AI | 28-05-25

Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting

Authors: Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luis Frazão, Nuno Costa, António Pereira |

阅读更多

来源: ArXiv AI | 28-05-25

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Authors: Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar |

阅读更多

来源: ArXiv AI | 28-05-25

E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing

Authors: Cheonsu Jeong, Seongmin Sim, Hyoyoung Cho, Sungsu Kim, Byounggwan Shin |

阅读更多

来源: ArXiv AI | 28-05-25

GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

Authors: Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim |

阅读更多

来源: ArXiv AI | 28-05-25

LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation

Authors: Heng Tan, Hua Yan, Yu Yang |

阅读更多

来源: ArXiv AI | 28-05-25

AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun |

阅读更多

来源: ArXiv AI | 28-05-25

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

Authors: Kuo Zhou, Lu Zhang |

阅读更多

来源: ArXiv AI | 28-05-25

Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking

Authors: Lingyi Cai, Ruichen Zhang, Changyuan Zhao, Yu Zhang, Jiawen Kang, Dusit Niyato, Tao Jiang, Xuemin Shen |

阅读更多

来源: ArXiv AI | 28-05-25

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Authors: Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li |

阅读更多

来源: ArXiv AI | 28-05-25

Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework

Authors: Saman Marandi, Yu-Shu Hu, Mohammad Modarres |

阅读更多

来源: ArXiv AI | 28-05-25

RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

Authors: Yue Zhang, Zhiliang Tian, Shicheng Zhou, Haiyang Wang, Wenqing Hou, Yuying Liu, Xuechen Zhao, Minlie Huang, Ye Wang, Bin Zhou |

阅读更多

来源: ArXiv AI | 28-05-25

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Authors: Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue |

阅读更多

来源: ArXiv AI | 28-05-25

A Structured Unplugged Approach for Foundational AI Literacy in Primary Education

Authors: Maria Cristina Carrisi, Mirko Marras, Sara Vergallo |

阅读更多

来源: ArXiv AI | 28-05-25

The Multilingual Divide and Its Impact on Global AI Safety

Authors: Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha, Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker |

阅读更多

来源: ArXiv AI | 28-05-25

Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs

Authors: Yifan Wang, Kenneth P. Birman |

阅读更多

来源: ArXiv AI | 28-05-25

Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming

Authors: Yang Yang, Jiemin Wu, Yutao Yue |

阅读更多

来源: ArXiv AI | 28-05-25

Google expands access to Veo 3, its viral new video model, through the Gemini app

阅读更多

来源: The Decoder | 27-05-25

Diligent (YC S23) Is Hiring a Founding AI Engineer (ycombinator.com)

阅读更多

来源: Hacker News | 27-05-25

Trying to teach in the age of the AI homework machine (solarshades.club)

阅读更多

来源: Hacker News | 27-05-25

Highlights from the Claude 4 system prompt (simonwillison.net)

阅读更多

来源: Hacker News | 27-05-25

Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models

Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao |

阅读更多

来源: ArXiv AI | 27-05-25

Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs

Authors: Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou |

阅读更多

来源: ArXiv AI | 27-05-25

MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model

Authors: Jiongchao Jin, Xiuju Fu, Xiaowei Gao, Tao Cheng, Ran Yan |

阅读更多

来源: ArXiv AI | 27-05-25

LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer

Authors: Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri |

阅读更多

来源: ArXiv AI | 27-05-25

AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare

Authors: Ying Xiao, Jie Huang, Ruijuan He, Jing Xiao, Mohammad Reza Mousavi, Yepang Liu, Kezhi Li, Zhenpeng Chen, Jie M. Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models

Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models for Planning: A Comprehensive and Systematic Survey

Authors: Pengfei Cao, Tianyi Men, Wencan Liu, Jingwen Zhang, Xuzhao Li, Xixun Lin, Dianbo Sui, Yanan Cao, Kang Liu, Jun Zhao |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models

Authors: Lachlan McGinness, Peter Baumgartner |

阅读更多

来源: ArXiv AI | 27-05-25

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

Authors: Atsunori Moteki, Shoichi Masui, Fan Yang, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Jun Takahashi, Shan Jiang |

阅读更多

来源: ArXiv AI | 27-05-25

ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection

Authors: Juxin Niu, Xiangfeng Liu, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan |

阅读更多

来源: ArXiv AI | 27-05-25

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Authors: Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng |

阅读更多

来源: ArXiv AI | 27-05-25

Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging

Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao |

阅读更多

来源: ArXiv AI | 27-05-25

DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

Authors: Wenqing Zhou, Yuxuan Yan, Qianqian Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program

Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares |

阅读更多

来源: ArXiv AI | 27-05-25

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making

Authors: Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chanyoung Park |

阅读更多

来源: ArXiv AI | 27-05-25

EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM

Authors: Shuang Ao, Flora D. Salim, Simon Khan |

阅读更多

来源: ArXiv AI | 27-05-25

Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

Authors: Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang |

阅读更多

来源: ArXiv AI | 27-05-25

Agentic AI Process Observability: Discovering Behavioral Variability

Authors: Fabiana Fournier, Lior Limonad, Yuval David |

阅读更多

来源: ArXiv AI | 27-05-25

Capability-Based Scaling Laws for LLM Red-Teaming

Authors: Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping |

阅读更多

来源: ArXiv AI | 27-05-25

MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

Authors: Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang |

阅读更多

来源: ArXiv AI | 27-05-25

Temporal Sampling for Forgotten Reasoning in LLMs

Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran |

阅读更多

来源: ArXiv AI | 27-05-25

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels

Authors: Jiaming Ji, Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Ten Principles of AI Agent Economics

Authors: Ke Yang, ChengXiang Zhai |

阅读更多

来源: ArXiv AI | 27-05-25

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Authors: Takashi Ishida, Thanawat Lodkaew, Ikko Yamane |

阅读更多

来源: ArXiv AI | 27-05-25

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

Authors: Joey Hong, Anca Dragan, Sergey Levine |

阅读更多

来源: ArXiv AI | 27-05-25

Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find

Authors: Owen Bianchi, Mathew J. Koretsky, Maya Willey, Chelsea X. Alvarado, Tanay Nayak, Adi Asija, Nicole Kuznetsov, Mike A. Nalls, Faraz Faghri, Daniel Khashabi |

阅读更多

来源: ArXiv AI | 27-05-25

Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement

Authors: Jonas A. Actor, Graham Harper, Ben Southworth, Eric C. Cyr |

阅读更多

来源: ArXiv AI | 27-05-25

Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models

Authors: Jiongran Wu, Jiahao Liu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Li Shang, Tun Lu, Ning Gu |

阅读更多

来源: ArXiv AI | 27-05-25

Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao |

阅读更多

来源: ArXiv AI | 27-05-25

Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness

Authors: Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh |

阅读更多

来源: ArXiv AI | 27-05-25

From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark

Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger, Yanchuan Chang |

阅读更多

来源: ArXiv AI | 27-05-25

Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning

Authors: Cheng Peng, Kai Zhang, Mengxian Lyu, Hongfang Liu, Lichao Sun, Yonghui Wu |

阅读更多

来源: ArXiv AI | 27-05-25

Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs

Authors: Shuhang Xu, Weijian Deng, Yixuan Zhou, Fangwei Zhong |

阅读更多

来源: ArXiv AI | 27-05-25

USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning of LLMs as Urban Agents

Authors: Siqi Lai, Yansong Ning, Zirui Yuan, Zhixi Chen, Hao Liu |

阅读更多

来源: ArXiv AI | 27-05-25

GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs

Authors: Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu |

阅读更多

来源: ArXiv AI | 27-05-25

CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models

Authors: Runze Li, Siyu Wu, Jun Wang, Wei Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

Authors: Sota Yoshihara (1), Ryousuke Yamamoto (2), Hiroyuki Kusumoto (1), Masanari Shimura (1) ((1) Graduate School of Mathematics, Nagoya University, (2) Aisin Software) |

阅读更多

来源: ArXiv AI | 27-05-25

Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios

Authors: Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Yongtian Xu, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun |

阅读更多

来源: ArXiv AI | 27-05-25

Superplatforms Have to Attack AI Agents

Authors: Jianghao Lin, Jiachen Zhu, Zheli Zhou, Yunjia Xi, Weiwen Liu, Yong Yu, Weinan Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems

Authors: Yihe Fan, Wenqi Zhang, Xudong Pan, Min Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Formalizing Embeddedness Failures in Universal Artificial Intelligence

Authors: Cole Wyeth, Marcus Hutter |

阅读更多

来源: ArXiv AI | 27-05-25

Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks

Authors: Wentao Sun, Joao Paulo Nogueira, Alonso Silva |

阅读更多

来源: ArXiv AI | 27-05-25

Gaming Tool Preferences in Agentic LLMs

Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi |

阅读更多

来源: ArXiv AI | 27-05-25

Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems

Authors: Gordon Dai, Yunze Xiao |

阅读更多

来源: ArXiv AI | 27-05-25

Apple analyst expects OpenAI's AI hardware to be "as compact and elegant as an iPod Shuffle"

阅读更多

来源: The Decoder | 26-05-25

Meta can use public Facebook and Instagram data for AI training, German court rules

阅读更多

来源: The Decoder | 26-05-25

Trading with Claude, and writing your own MCP server (dangelov.com)

阅读更多

来源: Hacker News | 26-05-25

Ask HN: Anyone struggling to get value out of coding LLMs?

阅读更多

来源: Hacker News | 26-05-25

How Does Claude 4 Think? – Sholto Douglas and Trenton Bricken (dwarkesh.com)

阅读更多

来源: Hacker News | 26-05-25

Venta AI (YC S23) Is Hiring a Founding Full Stack Engineer in Amsterdam (ycombinator.com)

阅读更多

来源: Hacker News | 26-05-25

Chomsky on what ChatGPT is good for (2023) (chomsky.info)

阅读更多

来源: Hacker News | 26-05-25

Claude 4 System Card (simonwillison.net)

阅读更多

来源: Hacker News | 26-05-25

OpenAI's Operator Agent gets o3 upgrade for more precise browser control

阅读更多

来源: The Decoder | 25-05-25

Here's how Germans use ChatGPT according to OpenAI

阅读更多

来源: The Decoder | 25-05-25

Peer Programming with LLMs, for Senior+ Engineers (pmbanugo.me)

阅读更多

来源: Hacker News | 25-05-25

Show HN: AI Baby Monitor – local Video-LLM that beeps when safety rules break (github.com/zeenolife)

阅读更多

来源: Hacker News | 25-05-25

Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance

Authors: Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg |

阅读更多

来源: ArXiv AI | 25-05-25

Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development

Authors: Ming Shen, Raphael Shu, Anurag Pratik, James Gung, Yubin Ge, Monica Sunkara, Yi Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

LLM-Powered AI Agent Systems and Their Applications in Industry

Authors: Guannan Liang, Qianqian Tong |

阅读更多

来源: ArXiv AI | 25-05-25

Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language

Authors: Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, Shu-Tao Xia |

阅读更多

来源: ArXiv AI | 25-05-25

LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead

Authors: Yifan Zhang, Xinkui Zhao, Zuxin Wang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin |

阅读更多

来源: ArXiv AI | 25-05-25

EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning

Authors: Jiawei Liu, Qisi Chen, Jianshu Zhang, Quan Liu, Defu Lian |

阅读更多

来源: ArXiv AI | 25-05-25

How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance

Authors: Desiree Heim, Lars-Peter Meyer, Markus Schröder, Johannes Frey, Andreas Dengel |

阅读更多

来源: ArXiv AI | 25-05-25

Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning

Authors: Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery

Authors: Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, Hector Zenil |

阅读更多

来源: ArXiv AI | 25-05-25

ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

Authors: Jiaqi Li, Xinyi Dong, Yang Liu, Zhizhuo Yang, Quansen Wang, Xiaobo Wang, SongChun Zhu, Zixia Jia, Zilong Zheng |

阅读更多

来源: ArXiv AI | 25-05-25

Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events

Authors: Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin |

阅读更多

来源: ArXiv AI | 25-05-25

ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming

Authors: Xinwei Yang, Zhaofeng Liu, Chen Huang, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei |

阅读更多

来源: ArXiv AI | 25-05-25

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving

Authors: Yujie Hou, Ting Zhang, Mei Wang, Xuetao Ma, Hu Huang |

阅读更多

来源: ArXiv AI | 25-05-25

Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review

Authors: Beyazit Bestami Yuksel, Ayse Yilmazer Metin |

阅读更多

来源: ArXiv AI | 25-05-25

MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models

Authors: Xuanqi Gao, Siyi Xie, Juan Zhai, Shqing Ma, Chao Shen |

阅读更多

来源: ArXiv AI | 25-05-25

Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings

Authors: Yuqicheng Zhu, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Evgeny Kharlamov, Steffen Staab |

阅读更多

来源: ArXiv AI | 25-05-25

Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships

Authors: Kerem Oktar, Katherine M. Collins, Jose Hernandez-Orallo, Diane Coyle, Stephen Cave, Adrian Weller, Ilia Sucholutsky |

阅读更多

来源: ArXiv AI | 25-05-25

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li |

阅读更多

来源: ArXiv AI | 25-05-25

HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation

Authors: Weizhi Tang, Yixuan Li, Chris Sypherd, Elizabeth Polgreen, Vaishak Belle |

阅读更多

来源: ArXiv AI | 25-05-25

Beyond Correlation: Towards Causal Large Language Model Agents in Biomedicine

Authors: Adib Bazgir, Amir Habibdoust Lafmajani, Yuwen Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design

Authors: Zhenkun Li, Lingyao Li, Shuhang Lin, Yongfeng Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs

Authors: Rui Ye, Xiangrui Liu, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen |

阅读更多

来源: ArXiv AI | 25-05-25

OpenAI and G42 will build massive AI data center in Abu Dhabi

阅读更多

来源: The Decoder | 25-05-25

Mistral's Document AI extracts text from documents and notes with high accuracy

阅读更多

来源: The Decoder | 25-05-25

US House passed a bill that would ban state-level AI regulations for ten years

阅读更多

来源: The Decoder | 25-05-25

Exposed Industrial Control Systems and Honeypots in the Wild [pdf] (gsmaragd.github.io)

阅读更多

来源: Hacker News | 25-05-25

Positional preferences, order effects, prompt sensitivity undermine AI judgments (cip.org)

阅读更多

来源: Hacker News | 24-05-25

Show HN: I built a more productive way to manage AI chats (contextch.at)

阅读更多

来源: Hacker News | 24-05-25

Claude Opus 4 blackmailed an engineer after learning it might be replaced

阅读更多

来源: The Decoder | 24-05-25

OpenAI has upgraded the Responses API with remote MCP servers and new tools

阅读更多

来源: The Decoder | 24-05-25

OpenAI and Jony Ive are building a new AI device that is not a smartphone or smart glasses

阅读更多

来源: The Decoder | 24-05-25

Mistral launches Devstral Small 24B, a new open-source LLM for coding

阅读更多

来源: The Decoder | 23-05-25

OpenAI's Stargate secured $11.6 billion for a massive data center

阅读更多

来源: The Decoder | 23-05-25

Google Gemini is everything Siri never was

阅读更多

来源: The Decoder | 23-05-25

Gemini Diffusion could be Google's most important I/O news that slipped under the radar

阅读更多

来源: The Decoder | 23-05-25

Google shows AI filmmaking tool, XR glasses and launches $250 Gemini subscription

阅读更多

来源: The Decoder | 23-05-25

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

阅读更多

来源: Hacker News | 23-05-25

OpenAI: Scaling PostgreSQL to the Next Level (pixelstech.net)

阅读更多

来源: Hacker News | 23-05-25

Claude 4 (anthropic.com)

阅读更多

来源: Hacker News | 23-05-25

Management = Bullshit (LLM Edition) (funcall.blogspot.com)

阅读更多

来源: Hacker News | 23-05-25

Problems in AI alignment: A scale model (muldoon.cloud)

阅读更多

来源: Hacker News | 23-05-25

Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities

阅读更多

来源: The Decoder | 22-05-25

An upgraded dev experience in Google AI Studio (googleblog.com)

阅读更多

来源: Hacker News | 22-05-25

OpenAI to buy AI startup from Jony Ive (bloomberg.com)

阅读更多

来源: Hacker News | 22-05-25

LLM function calls don't scale; code orchestration is simpler, more effective (jngiam.bearblog.dev)

阅读更多

来源: Hacker News | 22-05-25

Gemini figured out my nephew’s name (nawaz.org)

阅读更多

来源: Hacker News | 22-05-25

Robert Musil Forgotten Plays Inspired His Greatest Work of Fiction (lithub.com)

阅读更多

来源: Hacker News | 22-05-25

Gemini Diffusion (simonwillison.net)

阅读更多

来源: Hacker News | 22-05-25

FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models

Authors: Zhen Sun, Ziyi Zhang, Zeren Luo, Zeyang Sha, Tianshuo Cong, Zheng Li, Shiwen Cui, Weiqiang Wang, Jiaheng Wei, Xinlei He, Qi Li, Qian Wang |

阅读更多

来源: ArXiv AI | 22-05-25

Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions

Authors: David Thulke, Jakob Kemmler, Christian Dugast, Hermann Ney |

阅读更多

来源: ArXiv AI | 22-05-25

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

Authors: David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan |

阅读更多

来源: ArXiv AI | 22-05-25

Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use

Authors: Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang |

阅读更多

来源: ArXiv AI | 22-05-25

A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability

Authors: Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng |

阅读更多

来源: ArXiv AI | 22-05-25

HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement

Authors: Jilin Hu, Jianyu Zhang, Yongwang Zhao, Talia Ringer |

阅读更多

来源: ArXiv AI | 22-05-25

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye |

阅读更多

来源: ArXiv AI | 22-05-25

Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities

Authors: Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li |

阅读更多

来源: ArXiv AI | 22-05-25

Multi-modal Integration Analysis of Alzheimer's Disease Using Large Language Models and Knowledge Graphs

Authors: Kanan Kiguchi, Yunhao Tu, Katsuhiro Ajito, Fady Alnajjar, Kazuyuki Murase |

阅读更多

来源: ArXiv AI | 22-05-25

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Authors: Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang |

阅读更多

来源: ArXiv AI | 22-05-25

Large Language Models as Computable Approximations to Solomonoff Induction

Authors: Jun Wan, Lingrui Mei |

阅读更多

来源: ArXiv AI | 22-05-25

VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models

Authors: Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, Yang Liu, Xin Xu, Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang |

阅读更多

来源: ArXiv AI | 22-05-25

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

Authors: Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian |

阅读更多

来源: ArXiv AI | 22-05-25

Self-Evolving Curriculum for LLM Reasoning

Authors: Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Gontier, Yoshua Bengio, Ehsan Kamalloo |

阅读更多

来源: ArXiv AI | 22-05-25

lmgame-Bench: How Good are LLMs at Playing Games?

Authors: Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang |

阅读更多

来源: ArXiv AI | 22-05-25

ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji |

阅读更多

来源: ArXiv AI | 22-05-25

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

Authors: Yassir Fathullah, Mark J. F. Gales |

阅读更多

来源: ArXiv AI | 22-05-25

ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs

Authors: Bahar Radmehr, Ekaterina Shved, Fatma Betül Güreş, Adish Singla, Tanja Käser |

阅读更多

来源: ArXiv AI | 22-05-25

Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives

Authors: Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez |

阅读更多

来源: ArXiv AI | 22-05-25

Microsoft Build 2025 showcases new AI agent tools and open interfaces for developers

阅读更多

来源: The Decoder | 21-05-25

Large language models often struggle with decision-making — a new study explains why

阅读更多

来源: The Decoder | 21-05-25

Deep Learning Is Applied Topology (theahura.substack.com)

阅读更多

来源: Hacker News | 21-05-25

Watching AI drive Microsoft employees insane (reddit.com)

阅读更多

来源: Hacker News | 21-05-25

Someone got an LLM running on a Commodore 64 from 1982, and it runs as well (xda-developers.com)

阅读更多

来源: Hacker News | 21-05-25

5 Boring Things That Have a Bigger Impact Than AI Assistants on Dev Productivity (codemanship.wordpress.com)

阅读更多

来源: Hacker News | 21-05-25

DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery

Authors: Kun Li, Zhennan Wu, Shoupeng Wang, Wenbin Hu |

阅读更多

来源: ArXiv AI | 21-05-25

Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning

Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim |

阅读更多

来源: ArXiv AI | 21-05-25

RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning

Authors: Qianyue Hao, Sibo Li, Jian Yuan, Yong Li |

阅读更多

来源: ArXiv AI | 21-05-25

ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data

Authors: Xinzhe Zheng, Sijie Ji, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava |

阅读更多

来源: ArXiv AI | 21-05-25

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Authors: Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu |

阅读更多

来源: ArXiv AI | 21-05-25

Toward Embodied AGI: A Review of Embodied AI and the Road Ahead

Authors: Yequan Wang, Aixin Sun |

阅读更多

来源: ArXiv AI | 21-05-25

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning

Authors: Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross |

阅读更多

来源: ArXiv AI | 21-05-25

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors

Authors: Maheep Chaudhary, Fazl Barez |

阅读更多

来源: ArXiv AI | 21-05-25

Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning

Authors: Zhaohui Yang, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang |

阅读更多

来源: ArXiv AI | 21-05-25

Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach

Authors: Oren Sultan, Eitan Stern, Dafna Shahaf |

阅读更多

来源: ArXiv AI | 21-05-25

Guarded Query Routing for Large Language Models

Authors: Richard Šléher, William Brach, Tibor Sloboda, Kristián Košťál, Lukas Galke |

阅读更多

来源: ArXiv AI | 21-05-25

BACON: A fully explainable AI model with graded logic for decision making problems

Authors: Haishi Bai, Jozo Dujmovic, Jianwu Wang |

阅读更多

来源: ArXiv AI | 21-05-25

Let LLMs Break Free from Overthinking via Self-Braking Tuning

Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang |

阅读更多

来源: ArXiv AI | 21-05-25

SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas

Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken |

阅读更多

来源: ArXiv AI | 21-05-25

Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning

Authors: Zihao Zhang, Fei Liu |

阅读更多

来源: ArXiv AI | 21-05-25

ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions

Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan |

阅读更多

来源: ArXiv AI | 21-05-25

Google AI Ultra (blog.google)

阅读更多

来源: Hacker News | 21-05-25

Ask HN: Conversational AI to Learn a Language

阅读更多

来源: Hacker News | 21-05-25

US officials warn Apple's iPhone AI deal with Alibaba may boost China's AI sector

阅读更多

来源: The Decoder | 20-05-25

Stability AI releases a compact open text-to-audio model that runs on mobile devices

阅读更多

来源: The Decoder | 20-05-25

Japanese startup Sakana AI explores time-based thinking with brain-inspired AI model

阅读更多

来源: The Decoder | 20-05-25

Google's AI answers are changing user behavior by sharply reducing clicks to websites

阅读更多

来源: The Decoder | 20-05-25

Solving physics-based initial value problems with unsupervised machine learning (aps.org)

阅读更多

来源: Hacker News | 20-05-25

Questioning Representational Optimism in Deep Learning (github.com/akarshkumar0101)

阅读更多

来源: Hacker News | 20-05-25

Claude Code SDK (anthropic.com)

阅读更多

来源: Hacker News | 20-05-25

The behavior of LLMs in hiring decisions: Systemic biases in candidate selection (davidrozado.substack.com)

阅读更多

来源: Hacker News | 20-05-25

NeuroGen: Neural Network Parameter Generation via Large Language Models

Authors: Jiaqi Wang, Yusen Zhang, Xi Li |

阅读更多

来源: ArXiv AI | 20-05-25

ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning

Authors: Edward Y. Chang, Longling Geng |

阅读更多

来源: ArXiv AI | 20-05-25

MARGE: Improving Math Reasoning for LLMs with Guided Exploration

Authors: Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen |

阅读更多

来源: ArXiv AI | 20-05-25

Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps

Authors: Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian |

阅读更多

来源: ArXiv AI | 20-05-25

Bullying the Machine: How Personas Increase LLM Vulnerability

Authors: Ziwei Xu, Udit Sanghi, Mohan Kankanhalli |

阅读更多

来源: ArXiv AI | 20-05-25

Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs

Authors: Zhuo Yang, Lingli Ge, Dong Han, Tianfan Fu, Yuqiang Li |

阅读更多

来源: ArXiv AI | 20-05-25

Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs

Authors: Haruka Asanuma, Naoko Koide-Majima, Ken Nakamura, Takato Horii, Shinji Nishimoto, Masafumi Oizumi |

阅读更多

来源: ArXiv AI | 20-05-25

TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios

Authors: Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang |

阅读更多

来源: ArXiv AI | 20-05-25

From Grunts to Grammar: Emergent Language from Cooperative Foraging

Authors: Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang |

阅读更多

来源: ArXiv AI | 20-05-25

LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs

Authors: Lars-Peter Meyer, Johannes Frey, Desiree Heim, Felix Brei, Claus Stadler, Kurt Junghanns, Michael Martin |

阅读更多

来源: ArXiv AI | 20-05-25

CAIM: Development and Evaluation of a Cognitive AI Memory Framework for Long-Term Interaction with Intelligent Agents

Authors: Rebecca Westhäußer, Frederik Berenz, Wolfgang Minker, Sebastian Zepf |

阅读更多

来源: ArXiv AI | 20-05-25

StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment

Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin |

阅读更多

来源: ArXiv AI | 20-05-25

Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities

Authors: Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward |

阅读更多

来源: ArXiv AI | 20-05-25

Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment

Authors: Siming Sun, Kai Zhang, Xuejun Jiang, Wenchao Meng, Qinmin Yang |

阅读更多

来源: ArXiv AI | 20-05-25

Multi-Armed Bandits Meet Large Language Models

Authors: Djallel Bouneffouf, Raphael Feraud |

阅读更多

来源: ArXiv AI | 20-05-25

Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems

Authors: Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa |

阅读更多

来源: ArXiv AI | 20-05-25

AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

Authors: Rong Bian, Yu Geng, Zijian Yang, Bing Cheng |

阅读更多

来源: ArXiv AI | 20-05-25

MIT says a high-profile AI productivity study used data that cannot be trusted

阅读更多

来源: The Decoder | 20-05-25

OpenAI says GPT-5 is about doing everything better with "less model switching"

阅读更多

来源: The Decoder | 20-05-25

Dilbert creator Scott Adams says he will die soon from same cancer as Joe Biden (thewrap.com)

阅读更多

来源: Hacker News | 20-05-25

Remarks on AI from NZ (nealstephenson.substack.com)

阅读更多

来源: Hacker News | 20-05-25

GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art

Authors: Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang |

阅读更多

来源: ArXiv AI | 20-05-25

Disentangling Reasoning and Knowledge in Medical Large Language Models

Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou |

阅读更多

来源: ArXiv AI | 20-05-25

LLMs unlock new paths to monetizing exploits

Authors: Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher A. Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski |

阅读更多

来源: ArXiv AI | 20-05-25

Code-Driven Planning in Grid Worlds with Large Language Models

Authors: Ashwath Vaithinathan Aravindan, Zhisheng Tang, Mayank Kejriwal |

阅读更多

来源: ArXiv AI | 20-05-25

Embodied AI in Machine Learning -- is it Really Embodied?

Authors: Matej Hoffmann, Shubhan Parag Patni |

阅读更多

来源: ArXiv AI | 20-05-25

Interpretable Risk Mitigation in LLM Agent Systems

Authors: Jan Chojnacki |

阅读更多

来源: ArXiv AI | 20-05-25

Modeling cognitive processes of natural reading with transformer-based Language Models

Authors: Bruno Bianchi, Fermín Travi, Juan E. Kamienkowski |

阅读更多

来源: ArXiv AI | 20-05-25

Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken |

阅读更多

来源: ArXiv AI | 20-05-25

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Authors: Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy |

阅读更多

来源: ArXiv AI | 20-05-25

TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding

Authors: Achintha Wijesinghe, Weiwei Wang, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding |

阅读更多

来源: ArXiv AI | 20-05-25

RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization

Authors: Haiyang Shen, Hang Yan, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma |

阅读更多

来源: ArXiv AI | 20-05-25

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

Authors: Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan |

阅读更多

来源: ArXiv AI | 20-05-25

Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining

Authors: Yu Shi, Yitong Duan, Jian Li |

阅读更多

来源: ArXiv AI | 20-05-25

Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP

Authors: Francesco Sovrano |

阅读更多

来源: ArXiv AI | 20-05-25

LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios

Authors: Mingxing Peng, Yuting Xie, Xusen Guo, Ruoyu Yao, Hai Yang, Jun Ma |

阅读更多

来源: ArXiv AI | 20-05-25

Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs

Authors: Zhangying Feng, Qianglong Chen, Ning Lu, Yongqian Li, Siqi Cheng, Shuangmu Peng, Duyu Tang, Shengcai Liu, Zhirui Zhang |

阅读更多

来源: ArXiv AI | 20-05-25

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning

Authors: Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui |

阅读更多

来源: ArXiv AI | 20-05-25

Anthropic is forced to apologize after Claude undercuts its legal team

阅读更多

来源: The Decoder | 19-05-25

Show HN: I modeled the Voynich Manuscript with SBERT to test for structure (github.com/brianmg)

阅读更多

来源: Hacker News | 19-05-25

Meta's Behemoth AI model delay signals struggles to match new paradigms

阅读更多

来源: The Decoder | 19-05-25

Emergent social conventions and collective bias in LLM populations (science.org)

阅读更多

来源: Hacker News | 19-05-25

Understanding Transformers via N-gram Statistics (arxiv.org)

阅读更多

来源: Hacker News | 18-05-25

O2 VoLTE: locating any customer with a phone call (mastdatabase.co.uk)

阅读更多

来源: Hacker News | 18-05-25

Emergence of Structure in Ensembles of Random Neural Networks

Authors: Luca Muscarnera, Luigi Loreti, Giovanni Todeschini, Alessio Fumagalli, Francesco Regazzoni |

阅读更多

来源: ArXiv AI | 18-05-25

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity

Authors: Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong |

阅读更多

来源: ArXiv AI | 18-05-25

ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks

Authors: Kai Sun, Peibo Duan, Levin Kuhlmann, Beilun Wang, Bin Zhang |

阅读更多

来源: ArXiv AI | 18-05-25

Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding

Authors: Jianhao Huang, Qunsong Zeng, Kaibin Huang |

阅读更多

来源: ArXiv AI | 18-05-25

Rethinking Repetition Problems of LLMs in Code Generation

Authors: Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |

阅读更多

来源: ArXiv AI | 18-05-25

Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?

Authors: Pedro Orvalho, Marta Kwiatkowska |

阅读更多

来源: ArXiv AI | 18-05-25

IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning

Authors: Dechen Gao, Hang Wang, Hanchu Zhou, Nejib Ammar, Shatadal Mishra, Ahmadreza Moradipari, Iman Soltani, Junshan Zhang |

阅读更多

来源: ArXiv AI | 18-05-25

PIF: Anomaly detection via preference embedding

Authors: Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi |

阅读更多

来源: ArXiv AI | 18-05-25

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Authors: Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu |

阅读更多

来源: ArXiv AI | 18-05-25

Neural Thermodynamic Laws for Large Language Model Training

Authors: Ziming Liu, Yizhou Liu, Jeff Gore, Max Tegmark |

阅读更多

来源: ArXiv AI | 18-05-25

Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini |

阅读更多

来源: ArXiv AI | 18-05-25

Demystifying AI Agents: The Final Generation of Intelligence

Authors: Kevin J McNamara, Rhea Pritham Marpu |

阅读更多

来源: ArXiv AI | 18-05-25

Leveraging Graph Retrieval-Augmented Generation to Support Learners' Understanding of Knowledge Concepts in MOOCs

Authors: Mohamed Abdelmagied, Mohamed Amine Chatti, Shoeb Joarder, Qurat Ul Ain, Rawaa Alatrash |

Read more

Source: ArXiv AI | 18-05-25

Empirically evaluating commonsense intelligence in large language models with large-scale human judgments

Authors: Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting |

Read more

Source: ArXiv AI | 18-05-25

Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models

Authors: Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein, Anna V. Kononova |

Read more

Source: ArXiv AI | 18-05-25

Soundcloud updates its AI training policy, but it's still unclear

Read more

Source: The Decoder | 18-05-25

Geoffrey Hinton's wildly overconfident AI prediction failed—now it's a lesson in humility

Read more

Source: The Decoder | 18-05-25

How 'The Little Prince' and AI help us better understand language development in the brain

Read more

Source: The Decoder | 18-05-25

LLMs are more persuasive than incentivized human persuaders (arxiv.org)

Read more

Source: Hacker News | 18-05-25

Unspoken Currency of Office Politics: Leverage and Sanction Between Coworkers (graphthinking.blogspot.com)

Read more

Source: Hacker News | 18-05-25

Transformer neural net learns to run Conway's Game of Life just from examples (sidsite.com)

Read more

Source: Hacker News | 17-05-25

I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA

Read more

Source: Hacker News | 17-05-25

Show HN: Merliot – plugging physical devices into LLMs (github.com/merliot)

Read more

Source: Hacker News | 17-05-25

A Research Preview of Codex (openai.com)

Read more

Source: Hacker News | 17-05-25

MIT asks arXiv to withdraw preprint of paper on AI and scientific discovery (economics.mit.edu)

Read more

Source: Hacker News | 17-05-25

Getting AI to write good SQL (cloud.google.com)

Read more

Source: Hacker News | 17-05-25

Meta introduces OMol25 and UMA, new open AI tools for molecular research

Read more

Source: The Decoder | 17-05-25

Anthropic is reportedly testing Claude models that can fix their own mistakes

Read more

Source: The Decoder | 17-05-25

Will AI systems perform poorly due to AI-generated material in training data? (acm.org)

Read more

Source: Hacker News | 17-05-25

U.S. is cracking down on Huawei's AI hardware while loosening its general export regulations

Read more

Source: The Decoder | 16-05-25

After months of coding with LLMs, I'm going back to using my brain (albertofortin.com)

Read more

Source: Hacker News | 16-05-25

The unreasonable effectiveness of an LLM agent loop with tool use (sketch.dev)

Read more

Source: Hacker News | 16-05-25

Show HN: Min.js style compression of tech docs for LLM context (github.com/marv1nnnnn)

Read more

Source: Hacker News | 16-05-25

Google brings Gemini AI to smartwatches, cars, TVs, and XR headsets

Read more

Source: The Decoder | 15-05-25

OpenAI says its latest models outperform doctors in medical benchmark

Read more

Source: The Decoder | 15-05-25

Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states

Read more

Source: The Decoder | 15-05-25

Nvidia will supply advanced chips for Saudi Arabia’s Humain AI project

Read more

Source: The Decoder | 15-05-25

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks

Authors: Gabriel Cortês, Nuno Lourenço, Paolo Romano, Penousal Machado |

Read more

Source: ArXiv AI | 15-05-25

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y.X. Wei |

Read more

Source: ArXiv AI | 15-05-25

A 2D Semantic-Aware Position Encoding for Vision Transformers

Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi |

Read more

Source: ArXiv AI | 15-05-25

Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

Authors: Paul Tschisgale, Holger Maus, Fabian Kieser, Ben Kroehs, Stefan Petersen, Peter Wulff |

Read more

Source: ArXiv AI | 15-05-25

Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors

Authors: Nicolas Dupuis, Ravi Nair, Shyam Ramji, Sean McClintock, Nishant Chauhan, Priyanka Nagpal, Bart Blaner, Ken Valk, Leon Stok, Ruchir Puri |

Read more

Source: ArXiv AI | 15-05-25

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

Authors: Nidhal Jegham, Marwen Abdelatti, Lassad Elmoubarki, Abdeltawab Hendawi |

Read more

Source: ArXiv AI | 15-05-25

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Authors: Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir |

Read more

Source: ArXiv AI | 15-05-25

Automated Meta Prompt Engineering for Alignment with the Theory of Mind

Authors: Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay |

Read more

Source: ArXiv AI | 15-05-25

The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners

Authors: Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis |

Read more

Source: ArXiv AI | 15-05-25

Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"

Authors: Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi |

Read more

Source: ArXiv AI | 15-05-25

Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le |

Read more

Source: ArXiv AI | 15-05-25

Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification

Authors: Adarsh Kumar, Hwiyoon Kim, Jawahar Sai Nathani, Neil Roy |

Read more

Source: ArXiv AI | 15-05-25

Show HN: Muscle-Mem, a behavior cache for AI agents (github.com/pig-dot-dev)

Read more

Source: Hacker News | 15-05-25

A server that wasn't meant to exist (dragas.net)

Read more

Source: Hacker News | 15-05-25

LLMs get lost in multi-turn conversation (arxiv.org)

Read more

Source: Hacker News | 15-05-25

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (deepmind.google)

Read more

Source: Hacker News | 15-05-25

Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

Read more

Source: Hacker News | 15-05-25

Show HN: YapCards (iOS) – Voice-driven flashcards with AI feedback

Read more

Source: Hacker News | 15-05-25

100 experts call for more research into the control of AI systems

Read more

Source: The Decoder | 14-05-25

Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust) (github.com/helixdb)

Read more

Source: Hacker News | 14-05-25

Build real-time knowledge graph for documents with LLM (cocoindex.io)

Read more

Source: Hacker News | 14-05-25

EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs (github.com/em-llm)

Read more

Source: Hacker News | 14-05-25

A Survey of Deep Learning for Complex Speech Spectrograms

Authors: Yuying Xie, Zheng-Hua Tan |

Read more

Source: ArXiv AI | 14-05-25

Securing RAG: A Risk Assessment and Mitigation Framework

Authors: Lukas Ammann, Sara Ott, Christoph R. Landolt, Marco P. Lehmann |

Read more

Source: ArXiv AI | 14-05-25

CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar |

Read more

Source: ArXiv AI | 14-05-25

Winning at All Cost: A Small Environment for Eliciting Specification Gaming Behaviors in Large Language Models

Authors: Lars Malmqvist |

Read more

Source: ArXiv AI | 14-05-25

Enhancing Trust Management System for Connected Autonomous Vehicles Using Machine Learning Methods: A Survey

Authors: Qian Xu, Lei Zhang, Yixiao Liu |

Read more

Source: ArXiv AI | 14-05-25

The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic

Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga |

Read more

Source: ArXiv AI | 14-05-25

Lost in Transmission: When and Why LLMs Fail to Reason Globally

Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville |

Read more

Source: ArXiv AI | 14-05-25

Decoding Neighborhood Environments with Large Language Models

Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li |

Read more

Source: ArXiv AI | 14-05-25

Benchmarking AI scientists in omics data-driven biological research

Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang |

Read more

Source: ArXiv AI | 14-05-25

Evaluating LLM Metrics Through Real-World Capabilities

Authors: Justin K Miller, Wenjia Tang |

Read more

Source: ArXiv AI | 14-05-25

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation

Authors: Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu |

Read more

Source: ArXiv AI | 14-05-25

Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang |

Read more

Source: ArXiv AI | 14-05-25

Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM

Authors: Nicholas Attolino, Alessio Capitanelli, Fulvio Mastrogiovanni |

Read more

Source: ArXiv AI | 14-05-25

Guiding LLM-based Smart Contract Generation with Finite State Machine

Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang |

Read more

Source: ArXiv AI | 14-05-25

Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach

Authors: Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang |

Read more

Source: ArXiv AI | 14-05-25

LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs

Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju |

Read more

Source: ArXiv AI | 14-05-25

DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, Bin Xu, Jianghao Xu, Yiyang Yu, Zichuan Yang, Hongji Zha, Ruichong Zhang |

Read more

Source: ArXiv AI | 14-05-25

OpenAI's chief scientist Jakub Pachocki says there is evidence that AI models discover novel insights

Read more

Source: The Decoder | 14-05-25

Insurers launch cover for losses caused by AI chatbot errors (ft.com)

Read more

Source: Hacker News | 14-05-25

Garbage collection of object storage at scale (warpstream.com)

Read more

Source: Hacker News | 14-05-25

DeepSeek’s founder is threatening US dominance in AI race (bloomberg.com)

Read more

Source: Hacker News | 14-05-25

Confident user prompts make LLMs more likely to hallucinate

Read more

Source: The Decoder | 13-05-25

Stanford researchers find AI agents improve when guided by past successes

Read more

Source: The Decoder | 13-05-25

Microsoft could sacrifice some OpenAI shares - but wants to secure access to AI technology

Read more

Source: The Decoder | 13-05-25

HealthBench – An evaluation for AI systems and human health (openai.com)

Read more

Source: Hacker News | 13-05-25

A conversation about AI for science with Jason Pruet (lanl.gov)

Read more

Source: Hacker News | 13-05-25

A class of distributed automata that contains the modal mu-fragment

Authors: Veeti Ahvonen, Damian Heiman, Antti Kuusisto |

Read more

Source: ArXiv AI | 13-05-25

Reliable Collaborative Conversational Agent System Based on LLMs and Answer Set Programming

Authors: Yankai Zeng, Gopal Gupta |

Read more

Source: ArXiv AI | 13-05-25

KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery

Authors: Yumou Wei, Paulo Carvalho, John Stamper |

Read more

Source: ArXiv AI | 13-05-25

Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers

Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C.H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric |

Read more

Source: ArXiv AI | 13-05-25

Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems

Authors: Sivasathivel Kandasamy |

Read more

Source: ArXiv AI | 13-05-25

Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong |

Read more

Source: ArXiv AI | 13-05-25

From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering

Authors: Gaurab Sarkar, Sougata Saha |

Read more

Source: ArXiv AI | 13-05-25

LLM-Augmented Chemical Synthesis and Design Decision Programs

Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang |

Read more

Source: ArXiv AI | 13-05-25

Explainable AI the Latest Advancements and New Trends

Authors: Bowen Long, Enjie Liu, Renxi Qiu, Yanqing Duan |

Read more

Source: ArXiv AI | 13-05-25

DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs

Authors: Yubo Shu, Zhewei Huang, Xin Wu, Chen Hu, Shuchang Zhou, Daxin Jiang |

Read more

Source: ArXiv AI | 13-05-25

Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA

Authors: Mahmood Mohassel Feghhi, Raya Majid Alsharfa, Majid Hameed Majeed |

Read more

Source: ArXiv AI | 13-05-25

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

Authors: Hanzheng Dai, Yuanliang Li, Zhibo Zhang, Jun Yan |

Read more

Source: ArXiv AI | 13-05-25

Architectural Precedents for General Agents using Large Language Models

Authors: Robert E. Wray, James R. Kirk, John E. Laird |

Read more

Source: ArXiv AI | 13-05-25

AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review

Authors: Zhiye Xie, Enmei Tu, Xianping Fu, Guoliang Yuan, Yi Han |

Read more

Source: ArXiv AI | 13-05-25

Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks

Authors: Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng |

Read more

Source: ArXiv AI | 13-05-25

How well do LLMs reason over tabular data, really?

Authors: Cornelius Wolff, Madelon Hulsebos |

Read more

Source: ArXiv AI | 13-05-25

QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads

Authors: Khurram Mazher, Saad Bin Nasir |

Read more

Source: ArXiv AI | 13-05-25

YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen |

Read more

Source: ArXiv AI | 13-05-25

"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants

Authors: Jennifer Mondragon, Carlos Rubio-Medrano, Gael Cruz, Dvijesh Shastri |

Read more

Source: ArXiv AI | 13-05-25

Multi-Agent Systems for Robotic Autonomy with LLMs

Authors: Junhong Chen, Ziqi Yang, Haoyuan G Xu, Dandan Zhang, George Mylonas |

Read more

Source: ArXiv AI | 13-05-25

Evolutionary thoughts: integration of large language models and evolutionary algorithms

Authors: Antonio Jimeno Yepes, Pieter Barnard |

Read more

Source: ArXiv AI | 13-05-25

What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips

Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang |

Read more

Source: ArXiv AI | 13-05-25

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song |

Read more

Source: ArXiv AI | 13-05-25

Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency

Authors: Xinyu Liang, Frits de Nijs, Buser Say, Hao Wang |

Read more

Source: ArXiv AI | 13-05-25

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

Authors: Benjamin Raphael Ernhofer, Daniil Prokhorov, Jannica Langner, Dominik Bollmann |

Read more

Source: ArXiv AI | 13-05-25

IRNN: Innovation-driven Recurrent Neural Network for Time-Series Data Modeling and Prediction

Authors: Yifan Zhou, Yibo Wang, Chao Shang |

Read more

Source: ArXiv AI | 13-05-25

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

Authors: Jugal Gajjar, Kaustik Ranaware |

Read more

Source: ArXiv AI | 13-05-25

LLMs Outperform Experts on Challenging Biology Benchmarks

Authors: Lennart Justen |

Read more

Source: ArXiv AI | 13-05-25

UniSymNet: A Unified Symbolic Network Guided by Transformer

Authors: Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin |

Read more

Source: ArXiv AI | 13-05-25

The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying |

Read more

Source: ArXiv AI | 13-05-25

A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets

Authors: Ryan Lagasse, Aidan Kiernans, Avijit Ghosh, Shiri Dori-Hacohen |

Read more

Source: ArXiv AI | 13-05-25

HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics

Authors: Lennart Luettgau, Harry Coppock, Magda Dubois, Christopher Summerfield, Cozmin Ududec |

Read more

Source: ArXiv AI | 13-05-25

Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods

Authors: Markov Grey, Charbel-Raphaël Segerie |

Read more

Source: ArXiv AI | 13-05-25

Leveraging Large Language Models for enzymatic reaction prediction and characterization

Authors: Lorenzo Di Fruscia, Jana Marie Weber |

Read more

Source: ArXiv AI | 13-05-25

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

Authors: Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala |

Read more

Source: ArXiv AI | 13-05-25

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Authors: Azim Ospanov, Roozbeh Yousefzadeh |

Read more

Source: ArXiv AI | 13-05-25

ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding

Authors: Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring |

Read more

Source: ArXiv AI | 13-05-25

Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs

Authors: Sam Bush, Matthew DeLorenzo, Phat Tieu, Jeyavijayan Rajendran |

Read more

Source: ArXiv AI | 13-05-25

Bytedance launches Agent TARS, an open-source AI automation agent

Read more

Source: The Decoder | 12-05-25

Google recaps how its LLMs could change in-game interactions

Read more

Source: The Decoder | 12-05-25

Five major obstacles are holding back RAG systems in healthcare

Read more

Source: The Decoder | 12-05-25

Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)

Read more

Source: Hacker News | 12-05-25

US Copyright Office found AI companies breach copyright. Its boss was fired (theregister.com)

Read more

Source: Hacker News | 12-05-25

Klarna changes its AI tune and again recruits humans for customer service (customerexperiencedive.com)

Read more

Source: Hacker News | 12-05-25

Avoiding AI is hard – but our freedom to opt out must be protected (theconversation.com)

Read more

Source: Hacker News | 12-05-25

Custom SIM card in Tesla Model 3 2024, Tesla Model Y 2025 and Cybertruck (olegkutkov.me)

Read more

Source: Hacker News | 12-05-25

OpenAI adds new fine-tuning options for o4-mini and GPT-4.1

Read more

Source: The Decoder | 11-05-25

Software Development Life Cycle Perspective: A Survey of Benchmarks for CodeLLMs and Agents

Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi |

Read more

Source: ArXiv AI | 11-05-25

T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction

Authors: Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Qian Li, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu |

Read more

Source: ArXiv AI | 11-05-25

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

Authors: Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger |

Read more

Source: ArXiv AI | 11-05-25

Incentive-Aware Machine Learning; Robustness, Fairness, Improvement & Causality

Authors: Chara Podimata |

Read more

Source: ArXiv AI | 11-05-25

High-fidelity Grain Growth Modeling: Leveraging Deep Learning for Fast Computations

Authors: Pungponhavoan Tep, Marc Bernacki |

Read more

Source: ArXiv AI | 11-05-25

Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghua Guo |

Read more

Source: ArXiv AI | 11-05-25

Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao |

Read more

Source: ArXiv AI | 11-05-25

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Authors: Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang |

Read more

Source: ArXiv AI | 11-05-25

TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering

Authors: Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger |

Read more

Source: ArXiv AI | 11-05-25

Large Language Models are Autonomous Cyber Defenders

Authors: Sebastián R. Castro, Roberto Campbell, Nancy Lau, Octavio Villalobos, Jiaqi Duan, Alvaro A. Cardenas |

Read more

Source: ArXiv AI | 11-05-25

The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems

Authors: Sutapa Dey Tithi, Arun Kumar Ramesh, Clara DiMarco, Xiaoyi Tian, Nazia Alam, Kimia Fazeli, Tiffany Barnes |

Read more

Source: ArXiv AI | 11-05-25

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards

Authors: Jaeho Kim, Yunseok Lee, Seulki Lee |

Read more

Source: ArXiv AI | 11-05-25

Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know

Authors: Shireen Kudukkil Manchingal, Fabio Cuzzolin |

Read more

Source: ArXiv AI | 11-05-25

A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons

Authors: Siyue Ren, Wanli Fu, Xinkun Zou, Chen Shen, Yi Cai, Chen Chu, Zhen Wang, Shuyue Hu |

Read more

Source: ArXiv AI | 11-05-25

Is there a half-life for the success rates of AI agents?

Authors: Toby Ord |

Read more

Source: ArXiv AI | 11-05-25

Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation

Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli |

Read more

Source: ArXiv AI | 11-05-25

A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods

Authors: Stefanos Gkikas |

Read more

Source: ArXiv AI | 11-05-25

ZeroSearch: Alibaba trains search assistant in AI simulation

Read more

Source: The Decoder | 11-05-25

Show HN: Code Claude Code (github.com/rvca212)

Read more

Source: Hacker News | 11-05-25

LTXVideo 13B AI video generation (ltxv.video)

Read more

Source: Hacker News | 10-05-25

ChatGPT's user base expands while established web giants lose ground

Read more

Source: The Decoder | 10-05-25

Hugging Face unveils experimental AI agent for computers

Read more

Source: The Decoder | 10-05-25

OpenAI plans "cderGPT" for the US Food and Drug Administration (FDA)

Read more

Source: The Decoder | 10-05-25

Odin, a Pragmatic C Alternative with a Go Flavour (bitshifters.cc)

Read more

Source: Hacker News | 10-05-25

Fighting Unwanted Notifications with Machine Learning in Chrome (chromium.org)

Read more

Source: Hacker News | 10-05-25

Microsoft leverages Google's open A2A protocol for interoperable AI agents

Read more

Source: The Decoder | 09-05-25

A flat pricing subscription for Claude Code (anthropic.com)

Read more

Source: Hacker News | 09-05-25

Ciro (YC S22) is hiring a software engineer to build AI agents for sales (ycombinator.com)

Read more

Source: Hacker News | 09-05-25

Notes on rolling out Cursor and Claude Code (ghiculescu.substack.com)

Read more

Source: Hacker News | 09-05-25

OpenAI launches a program to partner with governments on global AI infrastructure

Read more

Source: The Decoder | 08-05-25

EU's leading AI startup Mistral unveils Medium 3 and Le Chat Enterprise

Read more

Source: The Decoder | 08-05-25

By 2026, most firms expect to have a Chief AI Officer on staff

Read more

Source: The Decoder | 08-05-25

Web search on the Anthropic API (anthropic.com)

Read more

Source: Hacker News | 08-05-25

Create and edit images with Gemini 2.0 in preview (googleblog.com)

Read more

Source: Hacker News | 08-05-25

Mistral ships Le Chat – enterprise AI assistant that can run on prem (mistral.ai)

Read more

Source: Hacker News | 08-05-25

Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning

Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia |

Read more

Source: ArXiv AI | 08-05-25

Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise

Authors: Moseli Mots'oehli, Hope Mogale, Kyungim Baek |

Read more

Source: ArXiv AI | 08-05-25

Multi-Granular Attention based Heterogeneous Hypergraph Neural Network

Authors: Hong Jin, Kaicheng Zhou, Jie Yin, Lan You, Zhifeng Zhou |

Read more

Source: ArXiv AI | 08-05-25

Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing

Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan |

Read more

Source: ArXiv AI | 08-05-25

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu |

Read more

Source: ArXiv AI | 08-05-25

The Aloe Family Recipe for Open and Specialized Healthcare LLMs

Authors: Dario Garcia-Gasulla, Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés |

Read more

Source: ArXiv AI | 08-05-25

"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments

Authors: Ziyi Zhang, Zhen Sun, Zongmin Zhang, Zifan Peng, Yuemeng Zhao, Zichun Wang, Zeren Luo, Ruiting Zuo, Xinlei He |

Read more

Source: ArXiv AI | 08-05-25

Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform

Authors: Yohannis Telila, Tommaso Cucinotta, Davide Bacciu |

Read more

Source: ArXiv AI | 08-05-25

Model-Based AI planning and Execution Systems for Robotics

Authors: Or Wertheim, Ronen I. Brafman |

Read more

Source: ArXiv AI | 08-05-25

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Authors: Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter, Raghav Awasthi, Soumya Banerjee, Joe M. Barnby, Rhea Basappa, Severin Bergsmann, Djallel Bouneffouf, Patrick Callaghan, Marc Cavazza, Thierry Chaminade, Sonia Chernova, Mohamed Chetouan, Moumita Choudhury, Axel Cleeremans, Jacek B. Cywinski, Fabio Cuzzolin, Hokin Deng, N'yoma Diamond, Camilla Di Pasquasio, Guillaume Dumas, Max van Duijn, Mahapatra Dwarikanath, Qingying Gao, Ashok Goel, Rebecca Goldstein, Matthew Gombolay, Gabriel Enrique Gonzalez, Amar Halilovic, Tobias Halmdienst, Mahimul Islam, Julian Jara-Ettinger, Natalie Kastel, Renana Keydar, Ashish K. Khanna, Mahdi Khoramshahi, JiHyun Kim, MiHyeon Kim, YoungBin Kim, Senka Krivic, Nikita Krasnytskyi, Arun Kumar, JuneHyoung Kwon, Eunju Lee, Shane Lee, Peter R. Lewis, Xue Li, Yijiang Li, Michal Lewandowski, Nathan Lloyd, Matthew B. Luebbers, Dezhi Luo, Haiyun Lyu, Dwarikanath Mahapatra, Kamal Maheshwari, Mallika Mainali, Piyush Mathur, Patrick Mederitsch, Shuwa Miura, Manuel Preston de Miranda, Reuth Mirsky, Shreya Mishra, Nina Moorman, Katelyn Morrison, John Muchovej, Bernhard Nessler, Felix Nessler, Hieu Minh Jord Nguyen, Abby Ortego, Francis A. Papay, Antoine Pasquali, Hamed Rahimi, Charumathi Raghu, Amanda Royka, Stefan Sarkadi, Jaelle Scheuerman, Simon Schmid, Paul Schrater, Anik Sen, Zahra Sheikhbahaee, Ke Shi, Reid Simmons, Nishant Singh, Mason O. Smith, Ramira van der Meulen, Anthia Solaki, Haoran Sun, Viktor Szolga, Matthew E. Taylor, Travis Taylor, Sanne Van Waveren, Juan David Vargas |

Read more

Source: ArXiv AI | 08-05-25

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

Authors: Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, Pheng-Ann Heng |

Read more

Source: ArXiv AI | 08-05-25

Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization

Authors: Wenjun Cao |

Read more

Source: ArXiv AI | 08-05-25

The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete

Authors: Gerrit Großmann, Larisa Ivanova, Sai Leela Poduru, Mohaddeseh Tabrizian, Islam Mesabah, David A. Selby, Sebastian J. Vollmer |

Read more

Source: ArXiv AI | 08-05-25

LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration

Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, Meiyi Ma |

Read more

Source: ArXiv AI | 08-05-25

TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution

Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |

Read more

Source: ArXiv AI | 08-05-25

ChatGPT sees about 50 percent more use on weekdays than weekends

Read more

Source: The Decoder | 08-05-25

OpenAI restructures as public benefit corporation under non-profit control

Read more

Source: The Decoder | 08-05-25

Google upgrades Gemini 2.5 Pro for coding and app development

Read more

Source: The Decoder | 08-05-25

Wikidive – AI guided rabbitholes in Wikipedia (wikidive.tulv.in)

Read more

Source: Hacker News | 08-05-25

How to Average in Prolog (2017) (storytotell.org)

Read more

Source: Hacker News | 08-05-25

Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

Authors: Fouad Trad, Ali Chehab |

Read more

Source: ArXiv AI | 07-05-25

An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation

Authors: Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo |

Read more

Source: ArXiv AI | 07-05-25

Blending 3D Geometry and Machine Learning for Multi-View Stereopsis

Authors: Vibhas Vats, Md. Alimoor Reza, David Crandall, Soon-heung Jung |

Read more

Source: ArXiv AI | 07-05-25

Rapid AI-based generation of coverage paths for dispensing applications

Authors: Simon Baeuerle, Ian F. Mendonca, Kristof Van Laerhoven, Ralf Mikut, Andreas Steimer |

Read more

Source: ArXiv AI | 07-05-25

LlamaFirewall: An open source guardrail system for building secure AI agents

Authors: Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, Joshua Saxe |

Read more

Source: ArXiv AI | 07-05-25

Holmes: Automated Fact Check with Large Language Models

Authors: Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang |

Read more

Source: ArXiv AI | 07-05-25

Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE

Authors: Brendan Campbell, Alan Williams, Kleio Baxevani, Alyssa Campbell, Rushabh Dhoke, Rileigh E. Hudock, Xiaomin Lin, Vivek Mange, Bernhard Neuberger, Arjun Suresh, Alhim Vera, Arthur Trembanis, Herbert G. Tanner, Edward Hale |

Read more

Source: ArXiv AI | 07-05-25

CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

Authors: Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu

Read more

Source: ArXiv AI | 07-05-25

Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Nicolas König, Felix Gehlhoff, Alexander Fay

Read more

Source: ArXiv AI | 07-05-25

RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation

Authors: Tiantian Gan, Qiyao Sun

Read more

Source: ArXiv AI | 07-05-25

Validating the Effectiveness of a Large Language Model-based Approach for Identifying Children's Development across Various Free Play Settings in Kindergarten

Authors: Yuanyuan Yang, Yuan Shen, Tianchen Sun, Yangbin Xie

Read more

Source: ArXiv AI | 07-05-25

Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

Authors: Schaun Wheeler, Olivier Jeunen

Read more

Source: ArXiv AI | 07-05-25

am-ELO: A Stable Framework for Arena-based LLM Evaluation

Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang

Read more

Source: ArXiv AI | 07-05-25

OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Authors: Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan

Read more

Source: ArXiv AI | 07-05-25

Graph Drawing for LLMs: An Empirical Evaluation

Authors: Walter Didimo, Fabrizio Montecchiani, Tommaso Piselli

Read more

Source: ArXiv AI | 07-05-25

Accents in latent spaces: How AI hears accent strength in English (boldvoice.com)

Read more

Source: Hacker News | 07-05-25

Gemini 2.5 Pro Preview (googleblog.com)

Read more

Source: Hacker News | 07-05-25

Claude's system prompt is over 24k tokens with tools (github.com/asgeirtj)

Read more

Source: Hacker News | 07-05-25

OpenAI reaches agreement to buy Windsurf for $3B (bloomberg.com)

Read more

Source: Hacker News | 07-05-25

Show HN: Clippy – 90s UI for local LLMs (felixrieseberg.github.io)

Read more

Source: Hacker News | 07-05-25

I built an AI code review agent in a few hours, here's what I learned (sourcebot.dev)

Read more

Source: Hacker News | 07-05-25

A coherent European/non-US cloud strategy (berthub.eu)

Read more

Source: Hacker News | 07-05-25