From GPT-2 to Claude Mythos: The return of AI models deemed 'too dangerous to release'
Source: The Decoder | 08-04-26
Google's AI Overviews are correct nine out of ten times, study finds
Source: The Decoder | 08-04-26
LLM plays an 8-bit Commander X16 game using structured "smart senses" (russell-harper.com)
Source: Hacker News | 08-04-26
System Card: Claude Mythos Preview [pdf] (anthropic.com)
Source: Hacker News | 08-04-26
Project Glasswing: Securing critical software for the AI era (anthropic.com)
Source: Hacker News | 08-04-26
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU (arxiv.org)
Source: Hacker News | 08-04-26
HYVE: Hybrid Views for LLM Context Engineering over Machine Data
Authors: Jian Tan, Fan Bu, Yuqing Gao, Dev Khanolkar, Jason Mackay, Boris Sobolev, Lei Jin, Li Zhang
Source: ArXiv AI | 08-04-26
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
Authors: Xiaotian Zhou, Di Tang, Xiaofeng Wang, Xiaozhong Liu
Source: ArXiv AI | 08-04-26
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
Authors: Yushuo Zheng, Huiyu Duan, Zicheng Zhang, Yucheng Zhu, Xiongkuo Min, Guangtao Zhai
Source: ArXiv AI | 08-04-26
Experience Transfer for Multimodal LLM Agents in Minecraft Game
Authors: Chenghao Li, Jun Liu, Songbo Zhang, Huadong Jian, Hao Ni, Lik-Hang Lee, Sung-Ho Bae, Guoqing Wang, Yang Yang, Chaoning Zhang
Source: ArXiv AI | 08-04-26
From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement
Authors: Cedric Haufe, Frieder Stolzenburg
Source: ArXiv AI | 08-04-26
SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills
Authors: Da Lei, Feng Xiao, Lu Li, Yuzhan Liu
Source: ArXiv AI | 08-04-26
Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution
Authors: Amir Konigsberg
Source: ArXiv AI | 08-04-26
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
Authors: Xin Sun, Di Wu, Sijing Qin, Isao Echizen, Abdallah El Ali, Saku Sugawara
Source: ArXiv AI | 08-04-26
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
Authors: Ojas Jain, Dhruv Kumar
Source: ArXiv AI | 08-04-26
CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control
Authors: Qing Guo, Xinhang Li, Junyu Chen, Zheng Guo, Shengzhe Xu, Lin Zhang, Lei Li
Source: ArXiv AI | 08-04-26
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
Authors: Shuai Zhen, Yanhua Yu, Ruopei Guo, Nan Cheng, Yang Deng
Source: ArXiv AI | 08-04-26
Can Large Language Models Reinvent Foundational Algorithms?
Authors: Jian Zhao, Haoren Luo, Yu Wang, Yuhan Cao, Pingyue Sheng, Tianxing He
Source: ArXiv AI | 08-04-26
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits
Authors: Uljad Berdica, Fernando Acero, Anton Ipsen, Parisa Zehtabi, Michael Cashmore, Manuela Veloso
Source: ArXiv AI | 08-04-26
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring
Authors: Xiangyue Zhang
Source: ArXiv AI | 08-04-26
Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation
Authors: Martino Maggetti
Source: ArXiv AI | 08-04-26
Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models
Authors: Yinan Liu, Dongying Lin, Sigang Luo, Xiaochun Yang, Bin Wang
Source: ArXiv AI | 08-04-26
JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models
Authors: Gowthamkumar Nandakishore
Source: ArXiv AI | 08-04-26
Context-Value-Action Architecture for Value-Driven Large Language Model Agents
Authors: TianZe Zhang, Sirui Sun, Yuhang Xie, Xin Zhang, Zhiqiang Wu, Guojie Song
Source: ArXiv AI | 08-04-26
HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference
Authors: Bowen Zeng, Feiyang Ren, Jun Zhang, Xiaoling Gu, Ke Chen, Lidan Shou, Huan Li
Source: ArXiv AI | 08-04-26
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains
Authors: Eranga Bandara, Ross Gore, Sachin Shetty, Piumi Siyambalapitiya, Sachini Rajapakse, Isurunima Kularathna, Pramoda Karunarathna, Ravi Mukkamala, Peter Foytik, Safdar H. Bouk, Abdul Rahman, Xueping Liang, Amin Hass, Tharaka Hewa, Ng Wee Keong, Kasun De Zoysa, Aruna Withanage, Nilaan Loganathan
Source: ArXiv AI | 08-04-26
Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment
Authors: Renxuan Tan, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
Source: ArXiv AI | 08-04-26
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
Authors: Michael Cuccarese
Source: ArXiv AI | 08-04-26
Artificial Intelligence and the Structure of Mathematics
Authors: Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman
Source: ArXiv AI | 08-04-26
How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism
Authors: Elisabetta Rocchetti, Alfio Ferrara
Source: ArXiv AI | 08-04-26
LLM scraper bots are overloading acme.com's HTTPS server (acme.com)
Source: Hacker News | 08-04-26
OpenAI says its new model GPT-2 is too dangerous to release (2019) (slate.com)
Source: Hacker News | 08-04-26
Issue: Claude Code is unusable for complex engineering tasks with Feb updates (github.com/anthropics)
Source: Hacker News | 07-04-26
Pedagogical Safety in Educational Reinforcement Learning: Formalizing and Detecting Reward Hacking in AI Tutoring Systems
Authors: Oluseyi Olukola, Nick Rahimi
Source: ArXiv AI | 07-04-26
Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
Authors: Haomiaomiao Wang, Tomás E Ward, Lili Zhang
Source: ArXiv AI | 07-04-26
CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection
Authors: Esma Aïmeur, Gilles Brassard, Dorsaf Sallami
Source: ArXiv AI | 07-04-26
A Model of Understanding in Deep Learning Systems
Authors: David Peter Wallis Freeborn
Source: ArXiv AI | 07-04-26
Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents
Authors: Hsieh-Ting Lin, Tsung-Yu Hou
Source: ArXiv AI | 07-04-26
RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers
Authors: Vineet Bhat, Shiqing Wei, Ali Umut Kaypak, Prashanth Krishnamurthy, Ramesh Karri, Farshad Khorrami
Source: ArXiv AI | 07-04-26
Decocted Experience Improves Test-Time Inference in LLM Agents
Authors: Maohao Shen, Kaiwen Zha, Zexue He, Zhang-Wei Hong, Siru Ouyang, J. Jon Ryu, Prasanna Sattigeri, Suhas Diggavi, Gregory Wornell
Source: ArXiv AI | 07-04-26
REAM: Merging Improves Pruning of Experts in LLMs
Authors: Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev
Source: ArXiv AI | 07-04-26
Implementing surrogate goals for safer bargaining in LLM-based agents
Authors: Caspar Oesterheld, Maxime Riché, Filip Sondej, Jesse Clifton, Vincent Conitzer
Source: ArXiv AI | 07-04-26
Optimizing Service Operations via LLM-Powered Multi-Agent Simulation
Authors: Yanyuan Wang, Xiaowei Zhang
Source: ArXiv AI | 07-04-26
SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems
Authors: Varun Pratap Bhardwaj
Source: ArXiv AI | 07-04-26
Scalable and Explainable Learner-Video Interaction Prediction using Multimodal Large Language Models
Authors: Dominik Glandorf, Fares Fawzi, Tanja Käser
Source: ArXiv AI | 07-04-26
What Makes a Sale? Rethinking End-to-End Seller--Buyer Retail Dynamics with LLM Agents
Authors: Jeonghwan Choi, Jibin Hwang, Gyeonghun Sun, Minjeong Ban, Taewon Yun, Hyeonjae Cheon, Hwanjun Song
Source: ArXiv AI | 07-04-26
Greedy and Transformer-Based Multi-Port Selection for Slow Fluid Antenna Multiple Access
Authors: Darian Perez-Adan, Jose P. Gonzalez-Coma, F. Javier Lopez-Martinez, Luis Castedo
Source: ArXiv AI | 07-04-26
Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception
Authors: Seamus Brady
Source: ArXiv AI | 07-04-26
AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments
Authors: Eranga Bandara, Asanga Gunaratna, Ross Gore, Abdul Rahman, Ravi Mukkamala, Sachin Shetty, Sachini Rajapakse, Isurunima Kularathna, Peter Foytik, Safdar H. Bouk, Xueping Liang, Amin Hass, Ng Wee Keong, Kasun De Zoysa
Source: ArXiv AI | 07-04-26
ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
Authors: Xu Mingze
Source: ArXiv AI | 07-04-26
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
Authors: Shu Wang, Edwin Yu, Oscar Love, Tom Zhang, Tom Wong, Steve Scargall, Charles Fan
Source: ArXiv AI | 07-04-26
Incompleteness of AI Safety Verification via Kolmogorov Complexity
Authors: Munawar Hasan
Source: ArXiv AI | 07-04-26
Americans are using AI more than ever while trusting it less, new Quinnipiac poll finds
Source: The Decoder | 07-04-26
Alibaba's Qwen team built HopChain to fix how AI vision models fall apart during multi-step reasoning
When AI models reason about images, small perceptual errors compound across multiple steps and produce wrong answers. Alibaba’s HopChain framework tackles this by generating multi-stage image questions that break complex problems into linked individual steps, forcing models to verify each visual detail before drawing conclusions. The approach improves 20 out of 24 benchmarks.
Source: The Decoder | 07-04-26
Anthropic expands partnership with Google and Broadcom for next-gen compute (anthropic.com)
Source: Hacker News | 07-04-26
Show HN: Hippo, biologically inspired memory for AI agents (github.com/kitfunso)
Source: Hacker News | 07-04-26
Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code (georgeliu.com)
Source: Hacker News | 06-04-26
Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud (github.com/kessler)
Source: Hacker News | 06-04-26
Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B (github.com/fikrikarim)
Source: Hacker News | 06-04-26
Show HN: I built a tiny LLM to demystify how language models work (github.com/arman-bd)
Source: Hacker News | 06-04-26
A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
Authors: David Mike-Ewewie, Panhapiseth Lim, Priyanka Kumar
Source: ArXiv AI | 06-04-26
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
Authors: Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng, Yuekang Li, Leo Yu Zhang, Ying Zhang, Lei Ma
Source: ArXiv AI | 06-04-26
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study
Authors: Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng, Yuekang Li, Yanjun Zhang, Jianting Ning, Leo Yu Zhang, Lei Ma, Zhiqiang Li
Source: ArXiv AI | 06-04-26
Verbalizing LLMs' assumptions to explain and control sycophancy
Authors: Myra Cheng, Isabel Sieh, Humishka Zope, Sunny Yu, Lujain Ibrahim, Aryaman Arora, Jared Moore, Desmond Ong, Dan Jurafsky, Diyi Yang
Source: ArXiv AI | 06-04-26
Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
Authors: Lihao Sun, Lewen Yan, Xiaoya Lu, Andrew Lee, Jie Zhang, Jing Shao
Source: ArXiv AI | 06-04-26
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation
Authors: Prakhar Bansal, Shivangi Agarwal
Source: ArXiv AI | 06-04-26
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
Authors: Xiaohang Nie, Zihan Guo, Zicai Cui, Jiachi Yang, Zeyi Chen, Leheyi De, Yu Zhang, Junwei Liao, Bo Huang, Yingxuan Yang, Zhi Han, Zimian Peng, Linyao Chen, Wenzheng Tom Tang, Zongkai Liu, Tao Zhou, Botao Amber Hu, Shuyang Tang, Jianghao Lin, Weiwen Liu, Muning Wen, Yuanjian Zhou, Weinan Zhang
Source: ArXiv AI | 06-04-26
PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
Authors: Daniel C. MacRae, Luuk van der Hoek, Robert van der Wal, Suzanne P.M. de Vette, Hendrike Neh, Baoqiang Ma, Peter M.A. van Ooijen, Lisanne V. van Dijk
Source: ArXiv AI | 06-04-26
Understanding the Nature of Generative AI as Threshold Logic in High-Dimensional Space
Authors: Ilya Levin
Source: ArXiv AI | 06-04-26
I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime
Authors: Thomas Rivasseau, Benjamin Fung
Source: ArXiv AI | 06-04-26
AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems
Authors: Jiyong Kwon, Ujin Jeon, Sooji Lee, Guang Lin
Source: ArXiv AI | 06-04-26
Do Audio-Visual Large Language Models Really See and Hear?
Authors: Ramaneswaran Selvakumar, Kaousheik Jayakumar, S Sakshi, Sreyan Ghosh, Ruohan Gao, Dinesh Manocha
Source: ArXiv AI | 06-04-26
Mitigating LLM biases toward spurious social contexts using direct preference optimization
Authors: Hyunji Nam, Dorottya Demszky
Source: ArXiv AI | 06-04-26
Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling
Authors: Naga Sowjanya Barla, Jacopo de Berardinis
Source: ArXiv AI | 06-04-26
Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization
Authors: Joshua Drossman, Alexandre Jacquillat, Sébastien Martin
Source: ArXiv AI | 06-04-26
AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models
Authors: Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li
Source: ArXiv AI | 06-04-26
Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents
Authors: Bin Wen, Ruoxuan Zhang, Yang Chen, Hongxia Xie, Lan-Zhe Guo
Source: ArXiv AI | 06-04-26
Analysis of Optimality of Large Language Models on Planning Problems
Authors: Bernd Bohnet, Michael C. Mozer, Kevin Swersky, Wil Cunningham, Aaron Parisi, Kathleen Kenealy, Noah Fiedel
Source: ArXiv AI | 06-04-26
Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding
Authors: Maximiliano Armesto, Christophe Kolb
Source: ArXiv AI | 06-04-26
LLMs can't justify their answers–this CLI forces them to (grainulation.com)
Source: Hacker News | 06-04-26
Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs (github.com/salmanmohammadi)
Source: Hacker News | 06-04-26
OpenAI's fall from grace as investors race to Anthropic (latimes.com)
Source: Hacker News | 06-04-26
Show HN: sllm – Split a GPU node with other developers, unlimited tokens (sllm.cloud)
Source: Hacker News | 05-04-26
LLM Wiki – example of an "idea file" (gist.github.com)
Source: Hacker News | 05-04-26
Analysis of LLM Performance on AWS Bedrock: Receipt-item Categorisation Case Study
Authors: Gabby Sanchez, Sneha Oommen, Cassandra T. Britto, Di Wang, Jung-De Chiou, Maria Spichkova
Source: ArXiv AI | 05-04-26
Ontology-Aware Design Patterns for Clinical AI Systems: Translating Reification Theory into Software Architecture
Authors: Florian Odi Stummer
Source: ArXiv AI | 05-04-26
LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis
Authors: Zhihuan Wei, Xinhang Chen, Danyang Han, Yang Hu, Jie Liu, Xuewen Miao, Guijiang Li
Source: ArXiv AI | 05-04-26
AeroTherm-GPT: A Verification-Centered LLM Framework for Thermal Protection System Engineering Workflows
Authors: Chuhan Qiao, Jinglai Zheng, Jie Huang, Buyue Zhao, Fan Li, Haiming Huang
Source: ArXiv AI | 05-04-26
Bayesian Elicitation with LLMs: Model Size Helps, Extra "Reasoning" Doesn't Always
Authors: Luka Hobor, Mario Brcic, Mihael Kovac, Kristijan Poje
Source: ArXiv AI | 05-04-26
GenGait: A Transformer-Based Model for Human Gait Anomaly Detection and Normative Twin Generation
Authors: Elisa Motta, Marta Lorenzini, Clara Mouawad, Alberto Ranavolo, Mariano Serrao, Arash Ajoudani
Source: ArXiv AI | 05-04-26
SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation
Authors: Haomin Zhuang, Xiangqi Wang, Yili Shen, Ying Cheng, Xiangliang Zhang
Source: ArXiv AI | 05-04-26
ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning
Authors: Jingyue Gao, Yanjiang Guo, Xiaoshuai Chen, Jianyu Chen
Source: ArXiv AI | 05-04-26
LLM-as-a-Judge for Time Series Explanations
Authors: Preetham Sivalingam, Murari Mandal, Saurabh Deshpande, Dhruv Kumar
Source: ArXiv AI | 05-04-26
Quantifying Self-Preservation Bias in Large Language Models
Authors: Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca, Valerio Santini, Indro Spinelli, Fabio Galasso
Source: ArXiv AI | 05-04-26
TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns
Authors: Zhongbo Wang, Zhiyu Lin, Zhu Wang, Haizhou Wang
Source: ArXiv AI | 05-04-26
MTI: A Behavior-Based Temperament Profiling System for AI Agents
Authors: Jihoon Jeong
Source: ArXiv AI | 05-04-26
Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study
Authors: Yosuke Yamagishi, Atsushi Takamatsu, Yasunori Hamaguchi, Tomohiro Kikuchi, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe
Source: ArXiv AI | 05-04-26
From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems
Authors: Thomas Stefani, Johann Maximilian Christensen, Elena Hoemann, Frank Köster, Sven Hallerbach
Source: ArXiv AI | 05-04-26
Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs
Authors: Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri
Source: ArXiv AI | 05-04-26
Do Emotions in Prompts Matter? Effects of Emotional Framing on Large Language Models
Authors: Minda Zhao, Yutong Yang, Chufei Peng, Rachel Gonsalves, Weiyue Li, Ruyi Yang, Zhixi Liu, Mengyu Wang
Source: ArXiv AI | 05-04-26
De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules
Authors: Keerat Guliani, Deepkamal Gill, David Landsman, Nima Eshraghi, Krishna Kumar, Lovedeep Gondara
Source: ArXiv AI | 05-04-26
Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency
Authors: Payal Fofadiya, Sunil Tiwari
Source: ArXiv AI | 05-04-26
Emotion concepts and their function in a large language model (anthropic.com)
Source: Hacker News | 05-04-26
Writing Lisp Is AI Resistant and I'm Sad (djhaskin.com)
Source: Hacker News | 05-04-26
A case study in testing with 100+ Claude agents in parallel (imbue.com)
Source: Hacker News | 05-04-26
We replaced RAG with a virtual filesystem for our AI documentation assistant (mintlify.com)
Source: Hacker News | 04-04-26
Mbodi AI (YC P25) Is Hiring (ycombinator.com)
Source: Hacker News | 04-04-26
Claude Code Found a Linux Vulnerability Hidden for 23 Years (mtlynch.io)
Source: Hacker News | 04-04-26
Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw
Source: Hacker News | 04-04-26
Getting Claude to QA its own work (skyvern.com)
Source: Hacker News | 04-04-26
Show HN: Apfel – The free AI already on your Mac (franzai.com)
Source: Hacker News | 03-04-26
OpenAI Acquires TBPN (openai.com)
Source: Hacker News | 03-04-26
Lemonade by AMD: a fast and open source local LLM server using GPU and NPU (lemonade-server.ai)
Source: Hacker News | 03-04-26
Mercor says it was hit by cyberattack tied to compromise of LiteLLM (techcrunch.com)
Source: Hacker News | 02-04-26
Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators
Authors: Griffin Pitts, Neha Rani, Weedguet Mildort
Source: ArXiv AI | 02-04-26
Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning
Authors: Mohammad R. Abu Ayyash
Source: ArXiv AI | 02-04-26
Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning
Authors: Cai Zhou, Zekai Wang, Menghua Wu, Qianyu Julie Zhu, Flora C. Shi, Chenyu Wang, Ashia Wilson, Tommi Jaakkola, Stephen Bates
Source: ArXiv AI | 02-04-26
YC-Bench: Benchmarking AI Agents for Long-Term Planning and Consistent Execution
Authors: Muyu He, Adit Jain, Anand Kumar, Vincent Tu, Soumyadeep Bakshi, Sachin Patro, Nazneen Rajani
Source: ArXiv AI | 02-04-26
How Emotion Shapes the Behavior of LLMs and Agents: A Mechanistic Study
Authors: Moran Sun, Tianlin Li, Yuwei Zheng, Zhenhong Zhou, Aishan Liu, Xianglong Liu, Yang Liu
Source: ArXiv AI | 02-04-26
The Recipe Matters More Than the Kitchen: Mathematical Foundations of the AI Weather Prediction Pipeline
Authors: Piyush Garg, Diana R. Gergel, Andrew E. Shao, Galen J. Yacalis
Source: ArXiv AI | 02-04-26
A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation
Authors: Ha Na Cho
Source: ArXiv AI | 02-04-26
Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents
Authors: Hy Dang, Quang Dao, Meng Jiang
Source: ArXiv AI | 02-04-26
Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry
Authors: Syed Eqbal Alam, Zhan Shu
Source: ArXiv AI | 02-04-26
Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections
Authors: Gaurav Rajesh Parikh, Angikar Ghosal
Source: ArXiv AI | 02-04-26
Human-in-the-Loop Control of Objective Drift in LLM-Assisted Computer Science Education
Authors: Mark Dranias, Adam Whitley
Source: ArXiv AI | 02-04-26
Decision-Centric Design for LLM Systems
Authors: Wei Sun
Source: ArXiv AI | 02-04-26
In harmony with gpt-oss
Authors: Borislav Mavrin
Source: ArXiv AI | 02-04-26
The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy in LLM Agents
Authors: Harshee Jignesh Shah (Independent Researcher)
Source: ArXiv AI | 02-04-26
Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation
Authors: HyunJoon Jung, William Na
Source: ArXiv AI | 02-04-26
Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models
Authors: Ponhvoan Srey, Quang Minh Nguyen, Xiaobao Wu, Anh Tuan Luu
Source: ArXiv AI | 02-04-26
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents
Authors: Thanh Luong Tuan
Source: ArXiv AI | 02-04-26
CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
Authors: Rajkiran Panuganti
Source: ArXiv AI | 02-04-26
Experience as a Compass: Multi-agent RAG with Evolving Orchestration and Agent Prompts
Authors: Sha Li, Naren Ramakrishnan
Source: ArXiv AI | 02-04-26
Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models
Authors: Md. Abu Bakor Siddique, Shahrin Hossain, Sadman Ahmed Siam, Syed Rifat Raiyan, Hasan Mahmud, Md Kamrul Hasan
Source: ArXiv AI | 02-04-26
Adversarial Moral Stress Testing of Large Language Models
Authors: Saeid Jamshidi, Foutse Khomh, Arghavan Moradi Dakhel, Amin Nikanjam, Mohammad Hamdaqa, Kawser Wazed Nafi
Source: ArXiv AI | 02-04-26
The Claude Code Leak (build.ms)
Source: Hacker News | 02-04-26
InspectMind AI (YC W24) Is Hiring (ycombinator.com)
Source: Hacker News | 02-04-26
IPv6 address, as a sentence you can remember (tib3rius.com)
Source: Hacker News | 02-04-26
OpenAI closes funding round at an $852B valuation (cnbc.com)
Source: Hacker News | 01-04-26
The Claude Code Source Leak: fake tools, frustration regexes, undercover mode (alex000kim.com)
Source: Hacker News | 01-04-26
Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs (prismml.com)
Source: Hacker News | 01-04-26
Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747) (github.com/califio)
Source: Hacker News | 01-04-26
Show HN: Baton – A desktop app for developing with AI agents (getbaton.dev)
Source: Hacker News | 01-04-26
Claude Code Unpacked: A visual guide (ccunpacked.dev)
Source: Hacker News | 01-04-26
Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures
Authors: Victoria Dochkina
Source: ArXiv AI | 01-04-26
GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
Authors: Iordanis Fostiropoulos, Muhammad Rafay Azhar, Abdalaziz Sawwan, Boyu Fang, Yuchen Liu, Jiayi Liu, Hanchao Yu, Qi Guo, Jianyu Wang, Fei Liu, Xiangjun Fan
Source: ArXiv AI | 01-04-26
PAR²-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering
Authors: Xingyu Li, Rongguang Wang, Yuying Wang, Mengqing Guo, Chenyang Li, Tao Sheng, Sujith Ravi, Dan Roth
Source: ArXiv AI | 01-04-26
The Future of AI is Many, Not One
Authors: Daniel J. Singer, Luca Garzino Demo
Source: ArXiv AI | 01-04-26
Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents
Authors: Aaditya Khanal, Yangyang Tao, Junxiu Zhou
Source: ArXiv AI | 01-04-26
Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States
Authors: Dianxing Zhang, Gang Li, Sheng Li
Source: ArXiv AI | 01-04-26
Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping
Authors: Guan-Lun Huang, Yuh-Jzer Joung
Source: ArXiv AI | 01-04-26
SimMOF: AI agent for Automated MOF Simulations
Authors: Jaewoong Lee, Taeun Bae, Jihan Kim
Source: ArXiv AI | 01-04-26
Knowledge database development by large language models for countermeasures against viruses and marine toxins
Authors: Hung N. Do, Jessica Z. Kubicek-Sutherland, S. Gnanakaran
Source: ArXiv AI | 01-04-26
ASI-Evolve: AI Accelerates AI
Authors: Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu
Source: ArXiv AI | 01-04-26
ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities
Authors: Christopher Zanoli, Andrea Giovannini, Tengjun Jin, Ana Klimovic, Yotam Perlitz
Source: ArXiv AI | 01-04-26
Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers
Authors: Quanhao Li, Wei Jiang
Source: ArXiv AI | 01-04-26
Spontaneous Functional Differentiation in Large Language Models: A Brain-Like Intelligence Economy
Authors: Junjie Zhang, Zhen Shen, Gang Xiong, Xisong Dong
Source: ArXiv AI | 01-04-26
AgentFixer: From Failure Detection to Fix Recommendations in LLM Agentic Systems
Authors: Hadar Mulian, Sergey Zeltyn, Ido Levy, Liane Galanti, Avi Yaeli, Segev Shlomov
Source: ArXiv AI | 01-04-26
ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training
Authors: Rui Ai, Yu Pan, David Simchi-Levi, Chonghuan Wang
Source: ArXiv AI | 01-04-26
ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation
Authors: Yinuo Liu, Zi Qian, Heng Zhou, Jiahao Zhang, Yajie Zhang, Zhihang Li, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang
Source: ArXiv AI | 01-04-26
Uncertainty Gating for Cost-Aware Explainable Artificial Intelligence
Authors: Georgii Mikriukov, Grégoire Montavon, Marina M.-C. Höhne
Source: ArXiv AI | 01-04-26
From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem (future-shock.ai)
Source: Hacker News | 01-04-26
My son pleasured himself on Gemini Live. Entire family's Google accounts banned (reddit.com)
Source: Hacker News | 01-04-26
Show HN: Pardus Browser – a browser for AI agents without Chromium (github.com/jasonhonkl)
Source: Hacker News | 31-03-26
Anthropic: Claude Code users hitting usage limits 'way faster than expected' (theregister.com)
Source: Hacker News | 31-03-26
Universal Claude.md – cut Claude output tokens (github.com/drona23)
Source: Hacker News | 31-03-26
Claude Code's source code has been leaked via a map file in their NPM registry (twitter.com/fried_rice)
Source: Hacker News | 31-03-26
Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
Authors: Jakub Masłowski, Jarosław A. Chudziak
Source: ArXiv AI | 31-03-26
LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications
Authors: Alexandre Cristovão Maiorano
Source: ArXiv AI | 31-03-26
Beyond Completion: Probing Cumulative State Tracking to Predict LLM Agent Performance
Authors: Dengzhe Hou, Lingyu Jiang, Deng Li, Zirui Li, Fangzhou Lin, Kazunori D Yamada
Source: ArXiv AI | 31-03-26
Dual-Stage LLM Framework for Scenario-Centric Semantic Interpretation in Driving Assistance
Authors: Jean Douglas Carvalho, Hugo Taciro Kenji, Ahmad Mohammad Saber, Glaucia Melo, Max Mauro Dias Santos, Deepa Kundur
Source: ArXiv AI | 31-03-26
AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases
Authors: Mahesh Natarajan, Xiaoye Li, Weiqun Zhang
Source: ArXiv AI | 31-03-26
TianJi: An autonomous AI meteorologist for discovering physical mechanisms in atmospheric science
Authors: Kaikai Zhang, Xiang Wang, Haoluo Zhao, Nan Chen, Mengyang Yu, Jing-Jia Luo, Tao Song, Fan Meng
Source: ArXiv AI | 31-03-26
DSevolve: Enabling Real-Time Adaptive Scheduling on Dynamic Shop Floor with LLM-Evolved Heuristic Portfolios
Authors: Jin Huang, Jie Yang, XinLei Zhou, Qihao Liu, Liang Gao, Xinyu Li
Source: ArXiv AI | 31-03-26
CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs
Authors: Yongkang Du, Xiaohan Zou, Minhao Cheng, Lu Lin
Source: ArXiv AI | 31-03-26
What an Autonomous Agent Discovers About Molecular Transformer Design: Does It Transfer?
Authors: Edward Wijaya
Source: ArXiv AI | 31-03-26
Beyond the Answer: Decoding the Behavior of LLMs as Scientific Reasoners
Authors: Rohan Pandey, Eric Ye, Michael Li
Source: ArXiv AI | 31-03-26
SLOW: Strategic Logical-inference Open Workspace for Cognitive Adaptation in AI Tutoring
Authors: Yuang Wei, Ruijia Li, Bo Jiang
Source: ArXiv AI | 31-03-26
PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision
Authors: Zehua Han, Jing Xiao, Yiqi Duan, Mengyu Xiang, Yuheng Ji, Xiaolong Zheng, Chenghanyu Zhang, Zhendong She, Junyu Shen, Dingwei Tan, Shichu Sun, Zhou Cong, Mingxuan Liu, Fengxiang Wang, Jinping Sun, Yangang Sun
Source: ArXiv AI | 31-03-26
Evaluating LLMs for Answering Student Questions in Introductory Programming Courses
Authors: Thomas Van Mullem, Bart Mesuere, Peter Dawyndt
Source: ArXiv AI | 31-03-26
Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science
Authors: Yipeng Yu
Source: ArXiv AI | 31-03-26
CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems
Authors: Kangkang Sun, Jun Wu, Jianhua Li, Minyi Guo, Xiuzhen Che, Jianwei Huang
Source: ArXiv AI | 31-03-26
Entropic Claim Resolution: Uncertainty-Driven Evidence Selection for RAG
Authors: Davide Di Gioia
Source: ArXiv AI | 31-03-26
Towards a Medical AI Scientist
Authors: Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan
Source: ArXiv AI | 31-03-26
T-Norm Operators for EU AI Act Compliance Classification: An Empirical Comparison of Lukasiewicz, Product, and Gödel Semantics in a Neuro-Symbolic Reasoning System
Authors: Adam Laabs
Source: ArXiv AI | 31-03-26
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
Authors: Han Wang, Yifan Sun, Brian Ko, Mann Talati, Jiawen Gong, Zimeng Li, Naicheng Yu, Xucheng Yu, Wei Shen, Vedant Jolly, Huan Zhang
Source: ArXiv AI | 31-03-26
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Authors: Rongjin Li, Zichen Tang, Xianghe Wang, Xinyi Hu, Zhengyu Wang, Zhengyu Lu, Yiling Huang, Jiayuan Chen, Weisheng Tan, Jiacheng Liu, Zhongjun Yang, Haihong E |
Source: ArXiv AI | 31-03-26
Learn Claude Code by doing, not reading (nagdy.me)
Source: Hacker News | 31-03-26
Show HN: I turned a sketch into a 3D-print pegboard for my kid with an AI agent (github.com/virpo)
Source: Hacker News | 31-03-26
How A Spartan Revolutionized Baseball (msu.edu)
Source: Hacker News | 30-03-26
How the AI Bubble Bursts (martinvol.pe)
Source: Hacker News | 30-03-26
Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models (dani2442.github.io)
Source: Hacker News | 30-03-26
ChatGPT won't let you type until Cloudflare reads your React state (buchodi.com)
Source: Hacker News | 30-03-26
Show HN: Phantom – Open-source AI agent on its own VM that rewrites its config (github.com/ghostwright)
Source: Hacker News | 30-03-26
From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs
Authors: Jiyuan An, Liner Yang, Mengyan Wang, Luming Lu, Weihua An, Erhong Yang |
Source: ArXiv AI | 30-03-26
Preference-Aligned LoRA Merging: Preserving Subspace Coverage and Addressing Directional Anisotropy
Authors: Wooseong Jeong, Wonyoung Lee, Kuk-Jin Yoon |
Source: ArXiv AI | 30-03-26
findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
Authors: Héctor Javier Vázquez Martínez |
Source: ArXiv AI | 30-03-26
Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation
Authors: Yiming Ren, Yujiu Yang, Junjie Wang |
Source: ArXiv AI | 30-03-26
A Boltzmann-machine-enhanced Transformer For DNA Sequence Classification
Authors: Zhixuan Cao, Yishu Xu, Xuang WU |
Source: ArXiv AI | 30-03-26
Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
Authors: Rui Liu |
Source: ArXiv AI | 30-03-26
ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs
Authors: Inês Vieira, Inês Calvo, Iago Paulo, James Furtado, Rafael Ferreira, Diogo Tavares, Diogo Glória-Silva, David Semedo, João Magalhães |
Source: ArXiv AI | 30-03-26
AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
Authors: Afonso Simplício, Gonçalo Vinagre, Miguel Moura Ramos, Diogo Tavares, Rafael Ferreira, Giuseppe Attanasio, Duarte M. Alves, Inês Calvo, Inês Vieira, Rui Guerra, James Furtado, Beatriz Canaverde, Iago Paulo, Vasco Ramos, Diogo Glória-Silva, Miguel Faria, Marcos Treviso, Daniel Gomes, Pedro Gomes, David Semedo, André Martins, João Magalhães |
Source: ArXiv AI | 30-03-26
Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference
Authors: Konstantinos Papaioannou, Thaleia Dimitra Doudali |
Source: ArXiv AI | 30-03-26
UNIFERENCE: A Discrete Event Simulation Framework for Developing Distributed AI Models
Authors: Doğaç Eldenk, Stephen Xia |
Source: ArXiv AI | 30-03-26
Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering
Authors: Yoseph Berhanu Alebachew, Hunter Leary, Swanand Vaishampayan, Chris Brown |
Source: ArXiv AI | 30-03-26
The Multi-AMR Buffer Storage, Retrieval, and Reshuffling Problem: Exact and Heuristic Approaches
Authors: Max Disselnmeyer, Thomas Bömer, Laura Dörr, Bastian Amberg, Anne Meyer |
Source: ArXiv AI | 30-03-26
Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling
Authors: Ruixing Zhang, Hanzhang Jiang, Leilei Sun, Liangzhe Han, Jibin Wang, Weifeng Lv |
Source: ArXiv AI | 30-03-26
Machine Learning Transferability for Malware Detection
Authors: César Vieira, João Vitorino, Eva Maia, Isabel Praça |
Source: ArXiv AI | 30-03-26
AutoB2G: A Large Language Model-Driven Agentic Framework For Automated Building-Grid Co-Simulation
Authors: Borui Zhang, Nariman Mahdavi, Subbu Sethuvenkatraman, Shuang Ao, Flora Salim |
Source: ArXiv AI | 30-03-26
AIRA_2: Overcoming Bottlenecks in AI Research Agents
Authors: Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu, Jakob Nicolaus Foerster, Yoram Bachrach, Martin Josifoski |
Source: ArXiv AI | 30-03-26
The Sudden Fall of OpenAI's Most Hyped Product Since ChatGPT (wsj.com)
Source: Hacker News | 30-03-26
Miasma: A tool to trap AI web scrapers in an endless poison pit (github.com/austin-weeks)
Source: Hacker News | 30-03-26
Claude Code runs Git reset --hard origin/main against project repo every 10 mins (github.com/anthropics)
Source: Hacker News | 30-03-26
What if AI doesn't need more RAM but better math? (adlrocha.substack.com)
Source: Hacker News | 29-03-26
Further human + AI + proof assistant work on Knuth's "Claude Cycles" problem (twitter.com/bowang87)
Source: Hacker News | 29-03-26
Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
Authors: Adeela Bashir, Zhao Song, Ndidi Bianca Ogbo, Nataliya Balabanova, Martin Smit, Chin-wing Leung, Paolo Bova, Manuel Chica Serrano, Dhanushka Dissanayake, Manh Hong Duong, Elias Fernandez Domingos, Nikita Huber-Kralj, Marcus Krellner, Andrew Powell, Stefan Sarkadi, Fernando P. Santos, Zia Ush Shamszaman, Chaimaa Tarzi, Paolo Turrini, Grace Ibukunoluwa Ufeoshi, Victor A. Vargas-Perez, Alessandro Di Stefano, Simon T. Powers, The Anh Han |
Source: ArXiv AI | 29-03-26
Resisting Humanization: Ethical Front-End Design Choices in AI for Sensitive Contexts
Authors: Silvia Rossi, Diletta Huyskes, Mackenzie Jorgensen |
Source: ArXiv AI | 29-03-26
ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing
Authors: Yaopei Zeng, Congchao Wang, Blake JianHang Chen, Lu Lin |
Source: ArXiv AI | 29-03-26
Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design
Authors: Zeda Xu, Nikolas Martelaro, Christopher McComb |
Source: ArXiv AI | 29-03-26
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
Authors: Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen |
Source: ArXiv AI | 29-03-26
Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For
Authors: Se Yan, Han Zhong, Zemin (Zachary) Zhong, Wenyu Zhou |
Source: ArXiv AI | 29-03-26
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
Authors: Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang |
Source: ArXiv AI | 29-03-26
Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers
Authors: Moein Shahiki Tash, Zahra Ahani, Mohim Tash, Mostafa Keikhay Farzaneh, Ari Y. Barrera-Animas, Olga Kolesnikova |
Source: ArXiv AI | 29-03-26
LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics
Authors: Farhan Ahmed, Yuya Jeremy Ong, Chad DeLuca |
Source: ArXiv AI | 29-03-26
On the Foundations of Trustworthy Artificial Intelligence
Authors: TJ Dunham |
Source: ArXiv AI | 29-03-26
The Anatomy of Uncertainty in LLMs
Authors: Aditya Taparia, Ransalu Senanayake, Kowshik Thopalli, Vivek Narayanaswamy |
Source: ArXiv AI | 29-03-26
ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents
Authors: Cristian Lupascu, Alexandru Lupascu |
Source: ArXiv AI | 29-03-26
From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support
Authors: Boning Zhao, Clover Hu, Xinnuo Li |
Source: ArXiv AI | 29-03-26
Distribution and Clusters Approximations as Abstract Domains in Probabilistic Abstract Interpretation to Neural Network Analysis
Authors: Zhuofan Zhang, Herbert Wiklicky |
Source: ArXiv AI | 29-03-26
Probabilistic Abstract Interpretation on Neural Networks via Grids Approximation
Authors: Zhuofan Zhang, Herbert Wiklicky |
Source: ArXiv AI | 29-03-26
The Competence Shadow: Theory and Bounds of AI Assistance in Safety Engineering
Authors: Umair Siddique |
Source: ArXiv AI | 29-03-26
UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning
Authors: Jie Wang, Honghua Huang, Xi Ge, Jianhui Su, Wen Liu, Shiguo Lian |
Source: ArXiv AI | 29-03-26
RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
Authors: Tianjun Pan, Xuan Lin, Wenyan Yang, Qianyu He, Shisong Chen, Licai Qi, Wanqing Xu, Hongwei Feng, Bo Xu, Yanghua Xiao |
Source: ArXiv AI | 29-03-26
SliderQuant: Accurate Post-Training Quantization for LLMs
Authors: Shigeng Wang, Chao Li, Yangyuxuan Kang, Jiawei Fan, Zhonghong Ou, Anbang Yao |
Source: ArXiv AI | 29-03-26
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
Authors: Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li, Ruixuan Huang, Zhenlan Ji, Pingchuan Ma, Shuai Wang |
Source: ArXiv AI | 29-03-26
Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Authors: Liang Zhang, Yu Fu, Xinyi Jin |
Source: ArXiv AI | 29-03-26
Will the AI data centre boom become a $9T bust? (ft.com)
Source: Hacker News | 29-03-26
The first 40 months of the AI era (lzon.ca)
Source: Hacker News | 29-03-26
Anatomy of the .claude/ folder (dailydoseofds.com)
Source: Hacker News | 28-03-26
CERN uses tiny AI models burned into silicon for real-time LHC data filtering (theopenreader.org)
Source: Hacker News | 28-03-26
Paper Tape Is All You Need – Training a Transformer on a 1976 Minicomputer (github.com/dbrll)
Source: Hacker News | 28-03-26
Toma (YC W24) is hiring a Senior/Staff Eng to build AI automotive coworkers (ycombinator.com)
Source: Hacker News | 28-03-26
HyperAgents: Self-referential self-improving agents (github.com/facebookresearch)
Source: Hacker News | 27-03-26
Anthropic Subprocessor Changes (anthropic.com)
Source: Hacker News | 27-03-26
We rewrote JSONata with AI in a day, saved $500k/year (reco.ai)
Source: Hacker News | 27-03-26
My minute-by-minute response to the LiteLLM malware attack (futuresearch.ai)
Source: Hacker News | 27-03-26
$500 GPU outperforms Claude Sonnet on coding benchmarks (github.com/itigges22)
Source: Hacker News | 27-03-26
Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer (georgelarson.me)
Source: Hacker News | 27-03-26
Schedule tasks on the web (claude.com)
Source: Hacker News | 27-03-26
Order Granting Preliminary Injunction – Anthropic vs. U.S. Department of War [pdf] (courtlistener.com)
Source: Hacker News | 27-03-26
Judge blocks Pentagon effort to 'punish' Anthropic with supply chain risk label (cnn.com)
Source: Hacker News | 27-03-26
Show HN: Robust LLM Extractor for Websites in TypeScript (github.com/lightfeed)
Source: Hacker News | 26-03-26
Show HN: Optio – Orchestrate AI coding agents in K8s to go from ticket to PR (github.com/jonwiggins)
Source: Hacker News | 26-03-26
From zero to a RAG system: successes and failures (andros.dev)
Source: Hacker News | 26-03-26
When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools
Authors: Xingming Li, Runke Huang, Yanan Bao, Yuye Jin, Yuru Jiao, Qingyong Hu |
Source: ArXiv AI | 26-03-26
MolEvolve: LLM-Guided Evolutionary Search for Interpretable Molecular Optimization
Authors: Xiangsen Chen, Ruilong Wu, Yanyan Lan, Ting Ma, Yang Liu |
Source: ArXiv AI | 26-03-26
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
Authors: Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko |
Source: ArXiv AI | 26-03-26
Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice
Authors: Domenique Zipperling, Lukas Schmidt, Benedikt Hahn, Niklas Kühl, Steven Kimbrough |
Source: ArXiv AI | 26-03-26
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Authors: Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, Tunazzina Islam |
Source: ArXiv AI | 26-03-26
PLDR-LLMs Reason At Self-Organized Criticality
Authors: Burc Gokden |
Source: ArXiv AI | 26-03-26
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments
Authors: Yi Han, Lingfei Qian, Yan Wang, Yueru He, Xueqing Peng, Dongji Feng, Yankai Chen, Haohang Li, Yupeng Cao, Jimin Huang, Xue Liu, Jian-Yun Nie, Sophia Ananiadou |
Source: ArXiv AI | 26-03-26
Efficient Benchmarking of AI Agents
Authors: Franck Ndzomga |
Source: ArXiv AI | 26-03-26
LLMs Do Not Grade Essays Like Humans
Authors: Jerin George Mathew, Sumayya Taher, Anindita Kundu, Denilson Barbosa |
Source: ArXiv AI | 26-03-26
When AI output tips to bad but nobody notices: Legal implications of AI's mistakes
Authors: Dylan J. Restrepo, Nicholas J. Restrepo, Frank Y. Huo, Neil F. Johnson |
Source: ArXiv AI | 26-03-26
DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction
Authors: Keru Hua, Ding Wang, Yaoying Gu, Xiaoguang Ma |
Source: ArXiv AI | 26-03-26
AnalogAgent: Self-Improving Analog Circuit Design Automation with LLM Agents
Authors: Zhixuan Bao, Zhuoyi Lin, Jiageng Wang, Jinhai Hu, Yuan Gao, Yaoxin Wu, Xiaoli Li, Xun Xu |
Source: ArXiv AI | 26-03-26
AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model
Authors: Yunbo Long |
Source: ArXiv AI | 26-03-26
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Authors: Biplab Pal, Santanu Bhattacharya |
Source: ArXiv AI | 26-03-26
Health NZ staff told to stop using ChatGPT to write clinical notes (rnz.co.nz)
Source: Hacker News | 26-03-26
Ensu – Ente’s Local LLM app (ente.com)
Source: Hacker News | 26-03-26
Show HN: A plain-text cognitive architecture for Claude Code (puga.com.br)
Source: Hacker News | 26-03-26
Show HN: Nit – I rebuilt Git in Zig to save AI agents 71% on tokens (justfielding.com)
Source: Hacker News | 26-03-26
90% of Claude-linked output going to GitHub repos w <2 stars (claudescode.dev)
Source: Hacker News | 26-03-26
Show HN: Gemini can now natively embed video, so I built sub-second video search (github.com/ssrajadh)
Source: Hacker News | 25-03-26
Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised (github.com/berriai)
Source: Hacker News | 25-03-26
TurboQuant: Redefining AI efficiency with extreme compression (research.google)
Source: Hacker News | 25-03-26
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
Authors: Ling Yue, Kushal Raj Bhandari, Ching-Yun Ko, Dhaval Patel, Shuxin Lin, Nianjun Zhou, Jianxi Gao, Pin-Yu Chen, Shaowu Pan |
Source: ArXiv AI | 25-03-26
STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems
Authors: Alfred Shen, Aaron Shen |
Source: ArXiv AI | 25-03-26
Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations
Authors: Tao Meng, Weilun Tang, Yuntao Shou, Yilong Tan, Jun Zhou, Wei Ai, Keqin Li |
Source: ArXiv AI | 25-03-26
Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length
Authors: Jingxuan Chen, Mohammad Taher Pilehvar, Jose Camacho-Collados |
Source: ArXiv AI | 25-03-26
Computational Arbitrage in AI Model Markets
Authors: Ricardo Olmedo, Bernhard Schölkopf, Moritz Hardt |
Source: ArXiv AI | 25-03-26
AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model
Authors: Yagizhan Bilal Durak, Ahsan Ul Islam, Shahidul Islam, Ashley Morgan-Olvera, Iftekhar Ibne Basith, Syed Hasib Akhter Faruqui |
Source: ArXiv AI | 25-03-26
Can LLM Agents Generate Real-World Evidence? Evaluating Observational Studies in Medical Databases
Authors: Dubai Li, Yuxiang He, Yan Hu, Yu Tian, Jingsong Li |
Source: ArXiv AI | 25-03-26
MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
Authors: Di Zhu, Zixuan Li |
Source: ArXiv AI | 25-03-26
Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies
Authors: Siddhant Kulkarni, Yukta Kulkarni |
Source: ArXiv AI | 25-03-26
Reliable Classroom AI via Neuro-Symbolic Multimodal Reasoning
Authors: Sina Bagheri Nezhad |
Source: ArXiv AI | 25-03-26
Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics
Authors: Shaoxin Zhong, Yuchen Su, Michael Witbrock |
Source: ArXiv AI | 25-03-26
Chain-of-Authorization: Internalizing Authorization into Large Language Models via Reasoning Trajectories
Authors: Yang Li, Yule Liu, Xinlei He, Youjian Zhao, Qi Li, Ke Xu |
Source: ArXiv AI | 25-03-26
Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning
Authors: Anshul Solanki, Sanchit Latawa, Koushik Chakraborty, Navneet Kamboj |
Source: ArXiv AI | 25-03-26
Ran Score: a LLM-based Evaluation Score for Radiology Report Generation
Authors: Ran Zhang, Yucong Lin, Zhaoli Su, Bowen Liu, Danni Ai, Tianyu Fu, Deqiang Xiao, Jingfan Fan, Yuanyuan Wang, Mingwei Gao, Yuwan Hu, Shuya Gao, Jingtao Li, Jian Yang, Hong Song, Hongliang Sun |
Source: ArXiv AI | 25-03-26
ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning
Authors: Xiangyu Yin, Yi Qi, Chih-hong Cheng |
Source: ArXiv AI | 25-03-26
JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees
Authors: Yuhui Wang, Zhixiong Yang, Ming Zhang, Shihan Dou, Zhiheng Xi, Enyu Zhou, Senjie Jin, Yujiong Shen, Dingwei Zhu, Yi Dong, Tao Gui, Qi Zhang, Xuanjing Huang |
Source: ArXiv AI | 25-03-26
Can Large Language Models Reason and Optimize Under Constraints?
Authors: Fabien Bernier, Salah Ghamizi, Pantelis Dogoulis, Maxime Cordy |
Source: ArXiv AI | 25-03-26
Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
Authors: Adrian Sauter, Mona Schirmer |
Source: ArXiv AI | 25-03-26
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
Authors: Jan Christian Blaise Cruz, Alham Fikri Aji |
Source: ArXiv AI | 25-03-26
Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon (github.com/t8)
Source: Hacker News | 25-03-26
Transformers Are Bayesian Networks (arxiv.org)
Source: Hacker News | 25-03-26
ConsRoute: Consistency-Aware Adaptive Query Routing for Cloud-Edge-Device Large Language Models
Authors: Haoyu Qiao, Hao Zhang, Shanwen Mao, Siyao Cheng, Jie Liu |
Source: ArXiv AI | 24-03-26
Does AI Homogenize Student Thinking? A Multi-Dimensional Analysis of Structural Convergence in AI-Augmented Essays
Authors: Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji |
Source: ArXiv AI | 24-03-26
Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
Authors: Leonid Ugadiarov, Yuri Kuratov, Aleksandr Panov, Alexey Skrynnik |
Source: ArXiv AI | 24-03-26
The AI Scientific Community: Agentic Virtual Lab Swarms
Authors: Ulisses Braga-Neto |
Source: ArXiv AI | 24-03-26
Improving Coherence and Persistence in Agentic AI for System Optimization
Authors: Pantea Karimi, Kimia Noorbakhsh, Mohammad Alizadeh, Hari Balakrishnan |
Source: ArXiv AI | 24-03-26
Graph of States: Solving Abductive Tasks with Large Language Models
Authors: Yu Luo, Rongchen Gao, Lu Teng, Xidao Wen, Jiamin Jiang, Qingliang Zhang, Yongqian Sun, Shenglin Zhang, Jiasong Feng, Tong Liu, Wenjie Zhang, Dan Pei |
Source: ArXiv AI | 24-03-26
A transformer architecture alteration to incentivise externalised reasoning
Authors: Elizabeth Pavlova, Mariia Koroliuk, Karthik Viswanathan, Cameron Tice, Edward James Young, Puria Radmard |
Source: ArXiv AI | 24-03-26
AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
Authors: Liang Ding |
Source: ArXiv AI | 24-03-26
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
Authors: Liang Ding |
Source: ArXiv AI | 24-03-26
Behavioural feasible set: Value alignment constraints on AI decision support
Authors: Taejin Park |
Source: ArXiv AI | 24-03-26
DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation
Authors: Shuai Wang, Dhasarathy Parthasarathy, Robert Feldt, Yinan Yu |
Source: ArXiv AI | 24-03-26
Is the future of AI green? What can innovation diffusion models say about generative AI's environmental impact?
Authors: Robert Viseur, Nicolas Jullien |
Source: ArXiv AI | 24-03-26
A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment
Authors: Sheng Liu, Long Chen, Zeyun Zhao, Qinglin Gou, Qingyue Wei, Arjun Masurkar, Kevin M. Spiegler, Philip Kuball, Stefania C. Bray, Megan Bernath, Deanna R. Willis, Jiang Bian, Lei Xing, Eric Topol, Kyunghyun Cho, Yu Huang, Ruogu Fang, Narges Razavian, James Zou |
Source: ArXiv AI | 24-03-26
Mind over Space: Can Multimodal Large Language Models Mentally Navigate?
Authors: Qihui Zhu, Shouwei Ruan, Xiao Yang, Hao Jiang, Yao Huang, Shiji Zhao, Hanwei Fan, Hang Su, Xingxing Wei |
Source: ArXiv AI | 24-03-26
Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks
Authors: Yiliang Song, Hongjun An, Jiangan Chen, Xuanchen Yan, Huan Song, Jiawei Shao, Xuelong Li |
Source: ArXiv AI | 24-03-26
INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation
Authors: Alexandra Bazarova, Andrei Volodichev, Daria Kotova, Alexey Zaytsev |
Source: ArXiv AI | 24-03-26
Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces
Authors: Neelmani Vispute |
Source: ArXiv AI | 24-03-26
Mirage: The Illusion of Visual Understanding
Authors: Mohammad Asadi, Jack W. O'Sullivan, Fang Cao, Tahoura Nedaee, Kamyar Fardi, Fei-Fei Li, Ehsan Adeli, Euan Ashley |
Source: ArXiv AI | 24-03-26
CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning
Authors: Shuo Wang, Ziyu Chen, Ming Tang |
Source: ArXiv AI | 24-03-26
A Blueprint for Self-Evolving Coding Agents in Vehicle Aerodynamic Drag Prediction
Authors: Jinhui Ren, Huaiming Li, Yabin Liu, Tao Li, Zhaokun Liu, Yujia Liang, Zengle Ge, Chufan Wu, Xiaomin Yuan, Danyu Liu, Annan Li, Jianmin Wu |
Source: ArXiv AI | 24-03-26
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
Authors: Aryan Kasat, Smriti Singh, Aman Chadha, Vinija Jain |
Source: ArXiv AI | 24-03-26
A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP
Authors: Xi Yang, Aurelie Lozano, Naoki Abe, Bhavya, Saurabh Jha, Noah Zheutlin, Rohan R. Arora, Yu Deng, Daby M. Sow |
Source: ArXiv AI | 24-03-26
Show HN: Cq – Stack Overflow for AI coding agents (blog.mozilla.ai)
Source: Hacker News | 24-03-26
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language? (dnhkng.github.io)
Source: Hacker News | 24-03-26
Claude Code Cheat Sheet (storyfox.cz)
Source: Hacker News | 24-03-26
iPhone 17 Pro Demonstrated Running a 400B LLM (twitter.com/anemll)
Source: Hacker News | 24-03-26
How I'm Productive with Claude Code (neilkakkar.com)
Source: Hacker News | 24-03-26
Designing AI for Disruptive Science (asimov.press)
Source: Hacker News | 24-03-26
I built an AI receptionist for a mechanic shop (itsthatlady.dev)
Source: Hacker News | 24-03-26
Walmart: ChatGPT checkout converted 3x worse than website (searchengineland.com)
Source: Hacker News | 23-03-26
Promoting Critical Thinking With Domain-Specific Generative AI Provocations
Authors: Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi, Sauvik Das |
Source: ArXiv AI | 23-03-26
Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR
Authors: Ziye Yuan, Ruchang Yao, Chengxin Zheng, Yusheng Zhao, Daxiang Dong, Ming Zhang |
Source: ArXiv AI | 23-03-26
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
Authors: Yurun Yuan, Tengyang Xie |
Source: ArXiv AI | 23-03-26
CoverageBench: Evaluating Information Coverage across Tasks and Domains
Authors: Saron Samuel, Andrew Yates, Dawn Lawrie, Ian Soboroff, Trevor Adriaanse, Benjamin Van Durme, Eugene Yang |
Source: ArXiv AI | 23-03-26
LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain
Authors: Antonio De Santis, Marco Balduini, Matteo Belcao, Andrea Proia, Marco Brambilla, Emanuele Della Valle |
Source: ArXiv AI | 23-03-26
Fine-tuning Timeseries Predictors Using Reinforcement Learning
Authors: Hugo Cazaux, Ralph Rudd, Hlynur Stefánsson, Sverrir Ólafsson, Eyjólfur Ingi Ásgeirsson |
Source: ArXiv AI | 23-03-26
The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries
Authors: Peiying Zhu, Sidi Chang |
Source: ArXiv AI | 23-03-26
The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
Authors: Amartya Roy, Rasul Tutunov, Xiaotong Ji, Matthieu Zimmer, Haitham Bou-Ammar |
Source: ArXiv AI | 23-03-26
Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture -- Bridging Predictive and Generative Self-Supervised Learning
Authors: Moritz Gögl, Christopher Yau |
Source: ArXiv AI | 23-03-26
Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
Authors: Wenjing Hong, Zhonghua Rong, Li Wang, Feng Chang, Jian Zhu, Ke Tang, Zexuan Zhu, Yew-Soon Ong |
Source: ArXiv AI | 23-03-26
Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
Authors: Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa |
Source: ArXiv AI | 23-03-26
Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation
Authors: Richard J. Young |
Source: ArXiv AI | 23-03-26
Learning to Disprove: Formal Counterexample Generation with Large Language Models
Authors: Zenan Li, Zhaoyu Li, Kaiyu Yang, Xiaoxing Ma, Zhendong Su |
Source: ArXiv AI | 23-03-26
Hyperagents
Authors: Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, Tatiana Shavrina |
Source: ArXiv AI | 23-03-26
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models
Authors: Tianlong Wang, Pinqiao Wang, Weili Shi, Sheng Li |
Source: ArXiv AI | 23-03-26
PowerLens: Taming LLM Agents for Safe and Personalized Mobile Power Management
Authors: Xingyu Feng, Chang Sun, Yuzhu Wang, Zhangbing Zhou, Chengwen Luo, Zhuangzhuang Chen, Xiaomin Ouyang, Huanqi Yang |
Source: ArXiv AI | 23-03-26
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
Authors: Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette |
Source: ArXiv AI | 23-03-26
Utility-Guided Agent Orchestration for Efficient LLM Tool Use
Authors: Boyan Liu, Gongming Zhao, Hongli Xu |
Source: ArXiv AI | 23-03-26
Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs
Authors: Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang |
Source: ArXiv AI | 23-03-26
On the Ability of Transformers to Verify Plans
Authors: Yash Sarrof, Yupei Du, Katharina Stein, Alexander Koller, Sylvie Thiébaux, Michael Hahn |
Source: ArXiv AI | 23-03-26
Teaching Claude to QA a mobile app (christophermeiklejohn.com)
Source: Hacker News | 23-03-26
LLMs predict my coffee (dynomight.net)
Source: Hacker News | 23-03-26
How to Attract AI Bots to Your Open Source Project (nesbitt.io)
Source: Hacker News | 23-03-26
Cross-Model Void Convergence: GPT-5.2 and Claude Opus 4.6 Deterministic Silence (zenodo.org)
Source: Hacker News | 22-03-26
Brute-Forcing My Algorithmic Ignorance with an LLM in 7 Days (blog.dominikrudnik.pl)
Source: Hacker News | 22-03-26
Tinybox – A powerful computer for deep learning (tinygrad.org)
Source: Hacker News | 22-03-26
Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM
Authors: Zizhao Hu, Mohammad Rostami, Jesse Thomason |
Source: ArXiv AI | 22-03-26
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
Authors: Yinghui Li, Jiayi Kuang, Peng Xing, Daixian Liu, Junnan Dong, Shu-Yu Guo, Yangning Li, Qingyu Zhou, Wenhao Jiang, Hai-Tao Zheng, Ying Shen, Liang Lin, Philip S. Yu |
Source: ArXiv AI | 22-03-26
ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs
Authors: Wanjia Zhao, Ludwig Schmidt, James Zou, Vidhisha Balachandran, Lingjiao Chen |
Source: ArXiv AI | 22-03-26
Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
Authors: Enoch Hyunwook Kang |
Source: ArXiv AI | 22-03-26
D-Mem: A Dual-Process Memory System for LLM Agents
Authors: Zhixing You, Jiachen Yuan, Jason Cai |
Source: ArXiv AI | 22-03-26
Proceedings of the 2nd Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors: Nitay Alon, Joseph M. Barnby, Reuth Mirsky, Stefan Sarkadi |
Source: ArXiv AI | 22-03-26
NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics
Authors: Djamel Bouchaffra, Fayçal Ykhlef, Hanene Azzag, Mustapha Lebbah, Bilal Faye |
Source: ArXiv AI | 22-03-26
Analysis Of Linguistic Stereotypes in Single and Multi-Agent Generative AI Architectures
Authors: Martina Ullasci, Marco Rondina, Riccardo Coppola, Flavio Giobergia, Riccardo Bellanca, Gabriele Mancari Pasi, Luca Prato, Federico Spinoso, Silvia Tagliente |
Source: ArXiv AI | 22-03-26
Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs
Authors: Gaoxiang Cao, Wenke Yuan, Huasen He, Yunpeng Hou, Xiaofeng Jiang, Shuangwu Chen, Jian Yang |
Source: ArXiv AI | 22-03-26
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
Authors: Xiao Feng, Bo Han, Zhanke Zhou, Jiaqi Fan, Jiangchao Yao, Ka Ho Li, Dahai Yu, Michael Kwok-Po Ng |
Source: ArXiv AI | 22-03-26
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
Authors: Hao Zhang, Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu, Jan Kautz, Yi Dong |
Source: ArXiv AI | 22-03-26
Can LLM generate interesting mathematical research problems?
Authors: Xiaoyang Chen, Xiang Jiang |
Source: ArXiv AI | 22-03-26
dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
Authors: Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Wei Wen |
Source: ArXiv AI | 22-03-26
Secure Linear Alignment of Large Language Models
Authors: Matt Gorbett, Suman Jana |
Source: ArXiv AI | 22-03-26
Geography According to ChatGPT -- How Generative AI Represents and Reasons about Geography
Authors: Krzysztof Janowicz, Gengchen Mai, Rui Zhu, Song Gao, Zhangyu Wang, Yingjie Hu, Lauren Bennett |
Source: ArXiv AI | 22-03-26
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Authors: Qiawen Ella Liu, Marina Dubova, Henry Conklin, Takumi Harada, Thomas L. Griffiths |
Source: ArXiv AI | 22-03-26
Man and machine: artificial intelligence and judicial decision making
Authors: Arthur Dyevre, Ahmad Shahvaroughi |
Source: ArXiv AI | 22-03-26
Behavioral Fingerprints for LLM Endpoint Stability and Identity
Authors: Jonah Leshin, Manish Shah, Ian Timmis, Daniel Kang |
Source: ArXiv AI | 22-03-26
Implicit Patterns in LLM-Based Binary Analysis
Authors: Qiang Li, XiangRui Zhang, Haining Wang |
Source: ArXiv AI | 22-03-26
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Authors: Zou Qiang |
Source: ArXiv AI | 22-03-26
Thinking Fast, Slow, and Artificial: How AI Is Reshaping Human Reasoning (ssrn.com)
Source: Hacker News | 22-03-26
The Impact of AI on Game Dev Jobs. Open to Work Crisis (darkounity.com)
Source: Hacker News | 22-03-26
An industrial piping contractor on Claude Code [video] (twitter.com/toddsaunders)
Source: Hacker News | 21-03-26
Atuin v18.13 – better search, a PTY proxy, and AI for your shell (atuin.sh)
Source: Hacker News | 21-03-26
OpenCode – Open source AI coding agent (opencode.ai)
Source: Hacker News | 21-03-26
Be intentional about how AI changes your codebase (swerdlow.dev)
Source: Hacker News | 20-03-26
Astral to Join OpenAI (astral.sh)
Source: Hacker News | 20-03-26
Nvidia's Huang pitches AI tokens on top of salary (cnbc.com)
Source: Hacker News | 20-03-26
FSF statement on copyright infringement lawsuit Bartz v. Anthropic (fsf.org)
Source: Hacker News | 20-03-26
Push events into a running session with channels (claude.com)
Source: Hacker News | 20-03-26
Launch HN: Canary (YC W26) – AI QA that understands your code
Source: Hacker News | 20-03-26
EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages (esolang-bench.vercel.app)
Source: Hacker News | 20-03-26
Cook: A simple CLI for orchestrating Claude Code (rjcorwin.github.io)
Source: Hacker News | 19-03-26
OpenAI to Acquire Astral (openai.com)
Source: Hacker News | 19-03-26
2% of ICML papers desk rejected because the authors used LLM in their reviews (icml.cc)
Source: Hacker News | 19-03-26
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training (github.com/alainnothere)
Source: Hacker News | 19-03-26
Dropout Robustness and Cognitive Profiling of Transformer Models via Stochastic Inference
Authors: Antônio Junior Alves Caiado, Michael Hahsler |
Source: ArXiv AI | 19-03-26
Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
Authors: Md. Asraful Haque, Aasar Mehdi, Maaz Mahboob, Tamkeen Fatima |
Source: ArXiv AI | 19-03-26
Procedural Generation of Algorithm Discovery Tasks in Machine Learning
Authors: Alexander D. Goldie, Zilin Wang, Adrian Hayler, Deepak Nathani, Edan Toledo, Ken Thampiratwong, Aleksandra Kalisz, Michael Beukman, Alistair Letcher, Shashank Reddy, Clarisse Wibault, Theo Wolf, Charles O'Neill, Uljad Berdica, Nicholas Roberts, Saeed Rahmani, Hannah Erlebach, Roberta Raileanu, Shimon Whiteson, Jakob N. Foerster |
Source: ArXiv AI | 19-03-26
How do LLMs Compute Verbal Confidence
Authors: Dharshan Kumaran, Arthur Conmy, Federico Barbero, Simon Osindero, Viorica Patraucean, Petar Velickovic |
Source: ArXiv AI | 19-03-26
Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs
Authors: Ya-Ting Yang, Quanyan Zhu |
Source: ArXiv AI | 19-03-26
scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns
Authors: Sergey V. Samsonau |
Source: ArXiv AI | 19-03-26
RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference
Authors: Arpit Singh Gautam, Saurabh Jha |
Source: ArXiv AI | 19-03-26
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
Authors: Priyaranjan Pattnayak, Sanchari Chowdhuri |
Source: ArXiv AI | 19-03-26
TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
Authors: Pepe Alonso |
Source: ArXiv AI | 19-03-26
Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures
Authors: Young Bin Park |
Source: ArXiv AI | 19-03-26
How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment
Authors: Rebecca Ansell, Autumn Toney-Wails |
Source: ArXiv AI | 19-03-26
Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents
Authors: Mohsen Arjmandi |
Source: ArXiv AI | 19-03-26
MALLES: A Multi-agent LLMs-based Economic Sandbox with Consumer Preference Alignment
Authors: Yusen Wu, Yiran Liu, Xiaotie Deng |
Source: ArXiv AI | 19-03-26
Facts as First Class Objects: Knowledge Objects for Persistent LLM Memory
Authors: Oliver Zahn, Simran Chana |
Source: ArXiv AI | 19-03-26
RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy
Authors: Zhenhang Yuan, Shenghai Yuan, Lihua Xie |
Source: ArXiv AI | 19-03-26
OpenAI Has New Focus (on the IPO) (om.co)
Source: Hacker News | 19-03-26
Book: The Emerging Science of Machine Learning Benchmarks (mlbenchmarks.org)
Source: Hacker News | 19-03-26
Safety is Non-Compositional: A Formal Framework for Capability-Based AI Systems
Authors: Cosimo Spera |
Source: ArXiv AI | 18-03-26
Selective Memory for Artificial Intelligence: Write-Time Gating with Hierarchical Archiving
Authors: Oliver Zahn, Simran Chana |
Source: ArXiv AI | 18-03-26
VIGIL: Towards Edge-Extended Agentic AI for Enterprise IT Support
Authors: Sarthak Ahuja, Neda Kordjazi, Evren Yortucboylu, Vishaal Kapoor, Mariam Dundua, Yiming Li, Derek Ho, Vaibhavi Padala, Jennifer Whitted, Rebecca Steinert |
Source: ArXiv AI | 18-03-26
A Context Alignment Pre-processor for Enhancing the Coherence of Human-LLM Dialog
Authors: Ding Wei |
Source: ArXiv AI | 18-03-26
POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs
Authors: Jungwoo Shim, Dae Won Kim, Sun Wook Kim, Soo Young Kim, Myungcheol Lee, Jae-geun Cha, Hyunhwa Choi |
Source: ArXiv AI | 18-03-26
Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation
Authors: Dongik Shin |
Source: ArXiv AI | 18-03-26
Are Large Language Models Truly Smarter Than Humans?
Authors: Eshwar Reddy M, Sourav Karmakar |
Source: ArXiv AI | 18-03-26
NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics
Authors: Zhengzheng Tang |
Source: ArXiv AI | 18-03-26
Adaptive Theory of Mind for LLM-based Multi-Agent Coordination
Authors: Chunjiang Mu, Ya Zeng, Qiaosheng Zhang, Kun Shao, Chen Chu, Hao Guo, Danyang Jia, Zhen Wang, Shuyue Hu |
Source: ArXiv AI | 18-03-26
From Natural Language to Executable Option Strategies via Large Language Models
Authors: Haochen Luo, Zhengzhao Lai, Junjie Xu, Yifan Li, Tang Pok Hin, Yuan Zhang, Chen Liu |
Source: ArXiv AI | 18-03-26
Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences
Authors: Quan Cheng |
Source: ArXiv AI | 18-03-26
RetailBench: Evaluating Long-Horizon Autonomous Decision-Making and Strategy Stability of LLM Agents in Realistic Retail Environments
Authors: Linghua Zhang, Jun Wang, Jingtong Wu, Zhisong Zhang |
Source: ArXiv AI | 18-03-26
Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots
Authors: Carmen Ng |
Source: ArXiv AI | 18-03-26
ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation
Authors: Zihe Wang, Yihuan Wang, Haiyang Yu, Zhiyong Cui, Xiaojian Liao, Chengcheng Wang, Yonglin Tian, Yongxin Tong
Source: ArXiv AI | 18-03-26
Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures
Authors: Oleg Somov, Mikhail Chaichuk, Mikhail Seleznyov, Alexander Panchenko, Elena Tutubalina |
Source: ArXiv AI | 18-03-26
When AI Navigates the Fog of War
Authors: Ming Li, Xirui Li, Tianyi Zhou |
Source: ArXiv AI | 18-03-26
Runtime Governance for AI Agents: Policies on Paths
Authors: Maurits Kaptein, Vassilis-Javed Khan, Andriy Podstavnychy |
Source: ArXiv AI | 18-03-26
BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
Authors: Sangyeon Yoon, Sunkyoung Kim, Hyesoo Hong, Wonje Jeung, Yongil Kim, Wooseok Seo, Heuiyeen Yeen, Albert No |
Source: ArXiv AI | 18-03-26
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure
Authors: Caglar Yildirim |
Source: ArXiv AI | 18-03-26
Anticipatory Planning for Multimodal AI Agents
Authors: Yongyuan Liang, Shijie Zhou, Yu Gu, Hao Tan, Gang Wu, Franck Dernoncourt, Jihyung Kil, Ryan A. Rossi, Ruiyi Zhang |
Source: ArXiv AI | 18-03-26
Nonstandard Errors in AI Agents
Authors: Ruijiang Gao, Steven Chong Xiao |
Source: ArXiv AI | 18-03-26
Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Authors: Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar, Caitlyn Heqi Yin, Ramya Korlakai Vinayak |
Source: ArXiv AI | 18-03-26
Prompt Programming for Cultural Bias and Alignment of Large Language Models
Authors: Maksim Eren, Eric Michalak, Brian Cook, Johnny Seales Jr |
Source: ArXiv AI | 18-03-26
Why AI systems don't learn – On autonomous learning from cognitive science (arxiv.org)
Source: Hacker News | 18-03-26
Celebrating Tony Hoare's mark on computer science (bertrandmeyer.com)
Source: Hacker News | 18-03-26
Mistral AI Releases Forge (mistral.ai)
Source: Hacker News | 18-03-26
Launch an autonomous AI agent with sandboxed execution in 2 lines of code (amaiya.github.io)
Source: Hacker News | 18-03-26
Claude Tips for 3D Work (davesnider.com)
Source: Hacker News | 17-03-26
Why I love FreeBSD (dragas.net)
Source: Hacker News | 17-03-26
Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models
Authors: Thuy Ngoc Nguyen, Duy Nhat Phan, Cleotilde Gonzalez |
Source: ArXiv AI | 17-03-26
Argumentation for Explainable and Globally Contestable Decision Support with LLMs
Authors: Adam Dejl, Matthew Williams, Francesca Toni |
Source: ArXiv AI | 17-03-26
SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory
Authors: Varun Pratap Bhardwaj |
Source: ArXiv AI | 17-03-26
BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models
Authors: Yuzhe Tang |
Source: ArXiv AI | 17-03-26
GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation
Authors: Wei Zeng, Fengwei An, Zhen Liu, Jian Zhao |
Source: ArXiv AI | 17-03-26
Punctuated Equilibria in Artificial Intelligence: The Institutional Scaling Law and the Speciation of Sovereign AI
Authors: Mark Baciak, Thomas A. Cellucci, Deanna M. Falkowski |
Source: ArXiv AI | 17-03-26
Knowledge Activation: AI Skills as the Institutional Knowledge Primitive for Agentic Software Development
Authors: Gal Bakal |
Source: ArXiv AI | 17-03-26
OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence
Authors: Peigen Liu, Rui Ding, Yuren Mao, Ziyan Jiang, Yuxiang Ye, Yunjun Gao, Ying Zhang, Renjie Sun, Longbin Lai, Zhengping Qian |
Source: ArXiv AI | 17-03-26
A Hybrid AI and Rule-Based Decision Support System for Disease Diagnosis and Management Using Labs
Authors: Muhammad Hammad Maqsood, Mubashir Sajid, Khubaib Ahmed, Muhammad Usamah Shahid, Muddassar Farooq |
Source: ArXiv AI | 17-03-26
Modeling Matches as Language: A Generative Transformer Approach for Counterfactual Player Valuation in Football
Authors: Miru Hong, Minho Lee, Geonhee Jo, Hyeokje Jo, Pascal Bauer, Sang-Ki Ko |
Source: ArXiv AI | 17-03-26
PrototypeNAS: Rapid Design of Deep Neural Networks for Microcontroller Units
Authors: Mark Deutel, Simon Geis, Axel Plinge |
Source: ArXiv AI | 17-03-26
SAGE: Multi-Agent Self-Evolution for LLM Reasoning
Authors: Yulin Peng, Xinxin Zhu, Chenxing Wei, Nianbo Zeng, Leilei Wang, Ying Tiffany He, F. Richard Yu |
Source: ArXiv AI | 17-03-26
Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones
Authors: Quan Cheng |
Source: ArXiv AI | 17-03-26
Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents
Authors: Ren Jian Lim, Rushi Dai |
Source: ArXiv AI | 17-03-26
Evolutionary Transfer Learning for Dragonchess
Authors: Jim O'Connor, Annika Hoag, Sarah Goyette, Gary B. Parker |
Source: ArXiv AI | 17-03-26
Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning
Authors: Guangfu Hao, Yuming Dai, Xianzhe Qin, Shan Yu |
Source: ArXiv AI | 17-03-26
Agent Lifecycle Toolkit (ALTK): Reusable Middleware Components for Robust AI Agents
Authors: Zidane Wright, Jason Tsay, Anupama Murthi, Osher Elhadad, Diego Del Rio, Saurabh Goyal, Kiran Kate, Jim Laredo, Koren Lazar, Vinod Muthusamy, Yara Rizk |
Source: ArXiv AI | 17-03-26
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty
Authors: Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dongsheng Li, Yuqing Yang |
Source: ArXiv AI | 17-03-26
Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph
Authors: Zhenheng Tang, Xiang Liu, Qian Wang, Eunsol Choi, Bo Li, Xiaowen Chu |
Source: ArXiv AI | 17-03-26
Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts
Authors: Chantale Lauer, Peter Pfeiffer, Nijat Mehdiyev |
Source: ArXiv AI | 17-03-26
Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning
Authors: Yang Xu, Jun Liu, Shuwei Chen, Chris Nugent, Hailing Guo |
Source: ArXiv AI | 17-03-26
Competition-Aware CPC Forecasting with Near-Market Coverage
Authors: Sebastian Frey, Edoardo Beccari, Maximilian Kranz, Nicolò Alberto Pellizzari, Ali Mete Karaman, Qiwei Han, Maximilian Kaiser |
Source: ArXiv AI | 17-03-26
Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments
Authors: Arne Vanhoyweghen, Vincent Holst, Melika Mobini, Lukas Van de Voorde, Tibo Vanleke, Bert Verbruggen, Brecht Verbeken, Andres Algaba, Sam Verboven, Marie-Anne Guerry, Filip Van Droogenbroeck, Vincent Ginis |
Source: ArXiv AI | 17-03-26
Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study
Authors: Zhiye Jin, Yibai Li, K. D. Joshi, Xuefei (Nancy) Deng, Xiaobing (Emily) Li
Source: ArXiv AI | 17-03-26
Geometry-Guided Camera Motion Understanding in VideoLLMs
Authors: Haoan Feng, Sri Harsha Musunuri, Guan-Ming Su |
Source: ArXiv AI | 17-03-26
LLM Constitutional Multi-Agent Governance
Authors: J. de Curtò, I. de Zarzà |
Source: ArXiv AI | 17-03-26
ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
Authors: Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy |
Source: ArXiv AI | 17-03-26
On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines
Authors: Francesco Maione, Paolo Lino, Giuseppe Giannino, Guido Maione |
Source: ArXiv AI | 17-03-26
AI Planning Framework for LLM-Based Web Agents
Authors: Orit Shahnovsky, Rotem Dror |
Source: ArXiv AI | 17-03-26
Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
Authors: Pascal Schäfer, Lukas J. Krinke, Martin Wlotzka, Norbert Asprion |
Source: ArXiv AI | 17-03-26
Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization
Authors: Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning Xie, Jie Zou, Yang Yang, Hengtao Shen |
Source: ArXiv AI | 17-03-26
Show HN: Claude Code skills that build complete Godot games (github.com/htdt)
Source: Hacker News | 17-03-26
Launch HN: Voygr (YC W26) – A better maps API for agents and AI apps
Source: Hacker News | 17-03-26
LLM Architecture Gallery (sebastianraschka.com)
Source: Hacker News | 16-03-26
LLMs can be exhausting (tomjohnell.com)
Source: Hacker News | 16-03-26
How I write software with LLMs (stavros.io)
Source: Hacker News | 16-03-26
A Visual Introduction to Machine Learning (2015) (r2d3.us)
Source: Hacker News | 16-03-26
Quillx is an open standard for disclosing AI involvement in software projects (github.com/qainsights)
Source: Hacker News | 16-03-26
Leveraging Large Language Models and Survival Analysis for Early Prediction of Chemotherapy Outcomes
Authors: Muhammad Faisal Shahid, Asad Afzal, Abdullah Faiz, Muhammad Siddiqui, Arbaz Khan Shehzad, Fatima Aftab, Muhammad Usamah Shahid, Muddassar Farooq |
Source: ArXiv AI | 15-03-26
AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
Authors: Alejandro R Jadad |
Source: ArXiv AI | 15-03-26
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks
Authors: Mei Chee Leong, Ying Gu, Hui Li Tan, Liyuan Li, Nancy Chen |
Source: ArXiv AI | 15-03-26
LLMs can construct powerful representations and streamline sample-efficient supervised learning
Authors: Ilker Demirel, Larry Shi, Zeshan Hussain, David Sontag |
Source: ArXiv AI | 15-03-26
Scaling Laws for Educational AI Agents
Authors: Mengsong Wu, Hao Hao, Shuzhen Bi, Keqian Li, Wentao Liu, Siyu Song, Hongbo Zhao, Aimin Zhou |
Source: ArXiv AI | 15-03-26
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning
Authors: Jiwon Jeon, Myungsik Cho, Youngchul Sung |
Source: ArXiv AI | 15-03-26
An Automatic Text Classification Method Based on Hierarchical Taxonomies, Neural Networks and Document Embedding: The NETHIC Tool
Authors: Luigi Lomasto, Rosario Di Florio, Andrea Ciapetti, Giuseppe Miscione, Giulia Ruggiero, Daniele Toti |
Source: ArXiv AI | 15-03-26
Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework
Authors: Chingkwun Lam, Jiaxin Li, Lingfei Zhang, Kuo Zhao |
Source: ArXiv AI | 15-03-26
Social, Legal, Ethical, Empathetic and Cultural Norm Operationalisation for AI Agents
Authors: Radu Calinescu, Ana Cavalcanti, Marsha Chechik, Lina Marsso, Beverley Townsend |
Source: ArXiv AI | 15-03-26
Automated Detection of Malignant Lesions in the Ovary Using Deep Learning Models and XAI
Authors: Md. Hasin Sarwar Ifty, Nisharga Nirjan, Labib Islam, M. A. Diganta, Reeyad Ahmed Ornate, Anika Tasnim, Md. Saiful Islam |
Source: ArXiv AI | 15-03-26
Can RL Improve Generalization of LLM Agents? An Empirical Study
Authors: Zhiheng Xi, Xin Guo, Jiaqi Liu, Jiazheng Zhang, Yutao Fan, Zhihao Zhang, Shichun Liu, Mingxu Chai, Xiaowei Shi, Yitao Zhai, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang |
Source: ArXiv AI | 15-03-26
On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
Authors: Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng |
Source: ArXiv AI | 15-03-26
TopoBench: Benchmarking LLMs on Hard Topological Reasoning
Authors: Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresque, Noel O'Connor, Fergal Reid |
Source: ArXiv AI | 15-03-26
Increasing intelligence in AI agents can worsen collective outcomes
Authors: Neil F. Johnson |
Source: ArXiv AI | 15-03-26
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
Authors: Yixin Liu, Yue Yu, DiJia Su, Sid Wang, Xuewei Wang, Song Jiang, Bo Liu, Arman Cohan, Yuandong Tian, Zhengxing Chen |
Source: ArXiv AI | 15-03-26
A most elegant TCP hole punching algorithm (robertsdotpm.github.io)
Source: Hacker News | 15-03-26
The Appalling Stupidity of Spotify's AI DJ (charlespetzold.com)
Source: Hacker News | 15-03-26
I'm 60 years old. Claude Code killed a passion
Source: Hacker News | 15-03-26
Show HN: GitAgent – An open standard that turns any Git repo into an AI agent (gitagent.sh)
Source: Hacker News | 15-03-26
1M context is now generally available for Opus 4.6 and Sonnet 4.6 (claude.com)
Source: Hacker News | 15-03-26
Refinement Modeling and Verification of RISC-V Assembly Using Knuckledragger (philipzucker.com)
Source: Hacker News | 15-03-26
Launching the Claude Partner Network (anthropic.com)
Source: Hacker News | 15-03-26
Can I run AI locally? (canirun.ai)
Source: Hacker News | 14-03-26
Launch HN: Captain (YC W26) – Automated RAG for Files (runcaptain.com)
Source: Hacker News | 14-03-26
Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas (getspine.ai)
Source: Hacker News | 14-03-26
Show HN: Context Gateway – Compress agent context before it hits the LLM (github.com/compresr-ai)
Source: Hacker News | 14-03-26
Elon Musk pushes out more xAI founders as AI coding effort falters (ft.com)
Source: Hacker News | 14-03-26
Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings) (prompt-caching.ai)
Source: Hacker News | 13-03-26
What we learned from a 22-Day storage bug (and how we fixed it) (mux.com)
Source: Hacker News | 13-03-26
Executing programs inside transformers with exponentially faster inference (percepta.ai)
Source: Hacker News | 13-03-26
Show HN: Axe – A 12MB binary that replaces your AI framework (github.com/jrswab)
Source: Hacker News | 13-03-26
Show HN: OneCLI – Vault for AI Agents in Rust (github.com/onecli)
Source: Hacker News | 13-03-26
Are LLM merge rates not getting better? (entropicthoughts.com)
Source: Hacker News | 13-03-26
Innocent woman jailed after being misidentified using AI facial recognition (grandforksherald.com)
Source: Hacker News | 13-03-26
Document poisoning in RAG systems: How attackers corrupt AI's sources (aminrj.com)
Source: Hacker News | 13-03-26
BitNet: Inference framework for 1-bit LLMs (github.com/microsoft)
Source: Hacker News | 12-03-26
Reliable Software in the LLM Era (quint-lang.org)
Source: Hacker News | 12-03-26
I was interviewed by an AI bot for a job (theverge.com)
Source: Hacker News | 12-03-26
Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation
Authors: Thomas Thebaud, Yuzhe Wang, Laureano Moro-Velazquez, Jesus Villalba-Lopez, Najim Dehak |
Source: ArXiv AI | 12-03-26
Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services
Authors: Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali |
Source: ArXiv AI | 12-03-26
Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks
Authors: Mohamad Alkadamani, Amir Ghasemi, Halim Yanikomeroglu |
Source: ArXiv AI | 12-03-26
Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements
Authors: Jonathan Liu, Kia Ghods |
Source: ArXiv AI | 12-03-26
An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?
Authors: Jennifer D'Souza, Sameer Sadruddin, Maximilian Kähler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, Osma Suominen |
Source: ArXiv AI | 12-03-26
Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation
Authors: Zixuan Liu, Ruoyi Qiao, Chenrui Tie, Xuanwei Liu, Yunfan Lou, Chongkai Gao, Zhixuan Xu, Lin Shao |
Source: ArXiv AI | 12-03-26
When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS
Authors: Anupam Purwar, Aditya Choudhary |
Source: ArXiv AI | 12-03-26
Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style
Authors: Marvin Limpijankit, Milad Alshomary, Yassin Oulad Daoud, Amith Ananthram, Tim Trombley, Elias Stengel-Eskin, Mohit Bansal, Noam M. Elcott, Kathleen McKeown |
Source: ArXiv AI | 12-03-26
RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation
Authors: Patricia Paskov, Kevin Wei, Shen Zhou Hong, Dan Bateyko, Xavier Roberts-Gaal, Carson Ezell, Gailius Praninskas, Valerie Chen, Umang Bhatt, Ella Guest |
Source: ArXiv AI | 12-03-26
Artificial Intelligence as a Catalyst for Innovation in Software Engineering
Authors: Carlos Alberto Fernández-y-Fernández, Jorge R. Aguilar-Cisneros |
Source: ArXiv AI | 12-03-26
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
Authors: Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu |
Source: ArXiv AI | 12-03-26
Resource-constrained Amazons chess decision framework integrating large language models and graph attention
Authors: Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski |
Source: ArXiv AI | 12-03-26
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities
Authors: Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi |
Source: ArXiv AI | 12-03-26
IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs
Authors: Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil Kandpal, Milad Nasr, Rai (Michael Pokorny), Sam Toyer, Miles Wang, Yaodong Yu, Alex Beutel, Kai Xiao |
Source: ArXiv AI | 12-03-26
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
Authors: Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, Junchao Huang, Ceyao Zhang, Zhiyuan Feng, Yaodong Yang, Xiaoyuan Yi, Xing Xie |
Source: ArXiv AI | 12-03-26
FAME: Formal Abstract Minimal Explanation for Neural Networks
Authors: Ryma Boumazouza, Raya Elsaleh, Melanie Ducoffe, Shahaf Bassan, Guy Katz |
Source: ArXiv AI | 12-03-26
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
Authors: Linghao Zhang |
Source: ArXiv AI | 12-03-26
Preliminary data from a longitudinal AI impact study (getdx.com)
Source: Hacker News | 12-03-26
CNN Explainer – Learn Convolutional Neural Network in Your Browser (2020) (poloclub.github.io)
Source: Hacker News | 12-03-26
Show HN: A context-aware permission guard for Claude Code (github.com/manuelschipper)
Source: Hacker News | 12-03-26
Surpassing vLLM with a Generated Inference Stack (infinity.inc)
Source: Hacker News | 11-03-26
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (github.com/runanywhereai)
Source: Hacker News | 11-03-26
Agents that run while I sleep (claudecodecamp.com)
Source: Hacker News | 11-03-26
Yann LeCun raises $1B to build AI that understands the physical world (wired.com)
Source: Hacker News | 11-03-26
Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance
Authors: Joshua Castillo, Ravi Mukkamala |
Source: ArXiv AI | 11-03-26
Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search
Authors: Kyle McCleary, James Ghawaly |
Source: ArXiv AI | 11-03-26
LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems
Authors: Sunil Prakash |
Source: ArXiv AI | 11-03-26
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
Authors: Yunfei Xie, Kevin Wang, Bobby Cheng, Jianzhu Yao, Zhizhou Sha, Alexander Duffy, Yihan Xi, Hongyuan Mei, Cheston Tan, Chen Wei, Pramod Viswanath, Zhangyang Wang |
Source: ArXiv AI | 11-03-26
A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations
Authors: Joshua Castillo, Ravi Mukkamala |
Source: ArXiv AI | 11-03-26
Chaotic Dynamics in Multi-LLM Deliberation
Authors: Hajime Shimao, Warut Khern-am-nuai, Sung Joo Kim |
Source: ArXiv AI | 11-03-26
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring
Authors: Seunghwan Kim (1), Tiffany H. Kung (1 and 2), Heena Verma (1), Dilan Edirisinghe (1), Kaveh Sedehi (1), Johanna Alvarez (1), Diane Shilling (1), Audra Lisa Doyle (1), Ajit Chary (1), William Borden (1 and 3), Ming Jack Po (1) ((1) AnsibleHealth Inc., San Francisco, USA (2) Stanford School of Medicine, Stanford, USA (3) George Washington University, Washington, D.C., USA) |
Source: ArXiv AI | 11-03-26
Social-R1: Towards Human-like Social Reasoning in LLMs
Authors: Jincenzi Wu, Yuxuan Lei, Jianxun Lian, Yitian Huang, Lexin Zhou, Haotian Li, Xing Xie, Helen Meng |
Source: ArXiv AI | 11-03-26
Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness
Authors: Ding Linghu, Cheng Wang, Da Fan, Wei Shi, Kaifeng Yin, Xiaoliang Xue, Fan Yang, Haiyi Ren, Cong Zhang |
Source: ArXiv AI | 11-03-26
Abundant Intelligence and Deficient Demand: A Macro-Financial Stress Test of Rapid AI Adoption
Authors: Xupeng Chen |
Source: ArXiv AI | 11-03-26
Explainable Innovation Engine: Dual-Tree Agent-RAG with Methods-as-Nodes and Verifiable Write-Back
Authors: Renwei Meng |
Source: ArXiv AI | 11-03-26
Rescaling Confidence: What Scale Design Reveals About LLM Metacognition
Authors: Yuyang Dai |
Source: ArXiv AI | 11-03-26
AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems
Authors: Athanasios Davvetas, Michael Papademas, Xenia Ziouvelou, Vangelis Karkaletsis |
Source: ArXiv AI | 11-03-26
GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models
Authors: Andrew Murray, Danial Dervovic, Alberto Pozanco, Michael Cashmore |
Source: ArXiv AI | 11-03-26
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants
Authors: Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo, Shuai Li |
Source: ArXiv AI | 11-03-26
Enhancing Debunking Effectiveness through LLM-based Personality Adaptation
Authors: Pietro Dell'Oglio, Alessandro Bondielli, Francesco Marcelloni, Lucia C. Passaro |
Source: ArXiv AI | 11-03-26
EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages
Authors: Aman Sharma, Paras Chopra |
Source: ArXiv AI | 11-03-26
OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences
Authors: Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma
Source: ArXiv AI | 11-03-26
Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts
Authors: Hongbo Bo, Jingyu Hu, Weiru Liu |
Source: ArXiv AI | 11-03-26
PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs
Authors: Jinyue Li, Yuci Liang, Qiankun Li, Xinheng Lyu, Jiayu Qian, Huabao Chen, Kun Wang, Zhigang Zeng, Anil Anthony Bharath, Yang Liu |
Source: ArXiv AI | 11-03-26
Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs (dnhkng.github.io)
Source: Hacker News | 11-03-26
No, it doesn't cost Anthropic $5k per Claude Code user (martinalderson.com)
Source: Hacker News | 10-03-26
Is legal the same as legitimate: AI reimplementation and the erosion of copyleft (hongminhee.org)
Source: Hacker News | 10-03-26
Redox OS has adopted a Certificate of Origin policy and a strict no-LLM policy (redox-os.org)
Source: Hacker News | 10-03-26
Yann LeCun's AI startup raises $1B in Europe's largest ever seed round (ft.com)
Source: Hacker News | 10-03-26
Machine Learning for Stress Testing: Uncertainty Decomposition in Causal Panel Prediction
Authors: Yu Wang, Xiangchen Liu, Siguang Li |
Source: ArXiv AI | 10-03-26
AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
Authors: Changyi Li, Pengfei Lu, Xudong Pan, Fazl Barez, Min Yang |
Source: ArXiv AI | 10-03-26
SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions
Authors: Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire |
Source: ArXiv AI | 10-03-26
The Yerkes-Dodson Curve for AI Agents: Emergent Cooperation Under Environmental Pressure in Multi-Agent LLM Simulations
Authors: Ivan Pasichnyk |
Source: ArXiv AI | 10-03-26
A Novel Multi-Agent Architecture to Reduce Hallucinations of Large Language Models in Multi-Step Structural Modeling
Authors: Ziheng Geng, Jiachen Liu, Ran Cao, Lu Cheng, Dan M. Frangopol, Minghui Cheng |
Source: ArXiv AI | 10-03-26
Rigidity in LLM Bandits with Implications for Human-AI Dyads
Authors: Haomiaomiao Wang, Tomás E Ward, Lili Zhang |
Source: ArXiv AI | 10-03-26
Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers
Authors: Pengfei Du |
Source: ArXiv AI | 10-03-26
SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans
Authors: Hansi Zeng, Zoey Li, Yifan Gao, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang |
Source: ArXiv AI | 10-03-26
Intentional Deception as Controllable Capability in LLM Agents
Authors: Jason Starace, Terence Soule |
Read more | Source: ArXiv AI | 10-03-26
Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning
Authors: Tianhao Qian, Guilin Qi, Z.Y. Wu, Ran Gu, Xuanyi Liu, Canchen Lyu |
Read more | Source: ArXiv AI | 10-03-26
Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents
Authors: Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang |
Read more | Source: ArXiv AI | 10-03-26
SMGI: A Structural Theory of General Artificial Intelligence
Authors: Aomar Osmani |
Read more | Source: ArXiv AI | 10-03-26
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
Authors: Wei Yang, Defu Cao, Jiacheng Pang, Muyan Weng, Yan Liu |
Read more | Source: ArXiv AI | 10-03-26
Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs
Authors: Chen Lu, Ke Xue, Chengrui Gao, Yunqi Shi, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou |
Read more | Source: ArXiv AI | 10-03-26
Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases
Authors: Jun Yin, Peng Huo, Bangguo Zhu, Hao Yan, Senzhang Wang, Shirui Pan, Chengqi Zhang |
Read more | Source: ArXiv AI | 10-03-26
In-Context Reinforcement Learning for Tool Use in Large Language Models
Authors: Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh |
Read more | Source: ArXiv AI | 10-03-26
The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs
Authors: Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li |
Read more | Source: ArXiv AI | 10-03-26
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
Authors: Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Deng, Lingzhi Chen, Yi Fu, Kehua Yang, Xiao Sun |
Read more | Source: ArXiv AI | 10-03-26
IronEngine: Towards General AI Assistant
Authors: Xi Mo |
Read more | Source: ArXiv AI | 10-03-26
A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation
Authors: Cong Cao, Jingyao Zhang, Kun Tong |
Read more | Source: ArXiv AI | 10-03-26
Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines
Authors: Akshay Gulati, Kanha Singhania, Tushar Banga, Parth Arora, Anshul Verma, Vaibhav Kumar Singh, Agyapal Digra, Jayant Singh Bisht, Danish Sharma, Varun Singla, Shubh Garg |
Read more | Source: ArXiv AI | 10-03-26
Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions
Authors: Dominik P. Hofer, Haochen Song, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Meredith Franklin, Joseph Jay Williams, Jan D. Smeddinck |
Read more | Source: ArXiv AI | 10-03-26
CLAIRE: Compressed Latent Autoencoder for Industrial Representation and Evaluation -- A Deep Learning Framework for Smart Manufacturing
Authors: Mohammadhossein Ghahramani, Mengchu Zhou |
Read more | Source: ArXiv AI | 10-03-26
Dynamic Chunking Diffusion Transformer
Authors: Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum |
Read more | Source: ArXiv AI | 10-03-26
MoEless: Efficient MoE LLM Serving via Serverless Computing
Authors: Hanfei Yu, Bei Ouyang, Shwai He, Ang Li, Hao Wang |
Read more | Source: ArXiv AI | 10-03-26
Abductive Reasoning with Syllogistic Forms in Large Language Models
Authors: Hirohiko Abe, Risako Ando, Takanobu Morishita, Kentaro Ozeki, Koji Mineshima, Mitsuhiro Okada |
Read more | Source: ArXiv AI | 10-03-26
Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education
Authors: Yuanji Zhang, Yuhao Huang, Haoran Dou, Xiliang Zhu, Chen Ling, Zhong Yang, Lianying Liang, Jiuping Li, Siying Liang, Rui Li, Yan Cao, Yuhan Zhang, Jiewei Lai, Yongsong Zhou, Hongyu Zheng, Xinru Gao, Cheng Yu, Liling Shi, Mengqin Yuan, Honglong Li, Xiaoqiong Huang, Chaoyu Chen, Jialin Zhang, Wenxiong Pan, Alejandro F. Frangi, Guangzhi He, Xin Yang, Yi Xiong, Linliang Yin, Xuedong Deng, Dong Ni |
Read more | Source: ArXiv AI | 10-03-26
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Authors: Kartik Sharma, Rakshit S. Trivedi |
Read more | Source: ArXiv AI | 10-03-26
NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
Authors: Ethan Smith (Canva Research) |
Read more | Source: ArXiv AI | 10-03-26
Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input
Authors: Changsong Liu, Tianrui Wang, Ye Ni, Yizhou Peng, Eng Siong Chng |
Read more | Source: ArXiv AI | 10-03-26
BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
Authors: Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding |
Read more | Source: ArXiv AI | 10-03-26
Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum
Authors: Lauri Lovén, Alaa Saleh, Reza Farahani, Ilir Murturi, Miguel Bordallo López, Praveen Kumar Donta, Schahram Dustdar |
Read more | Source: ArXiv AI | 10-03-26
Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation
Authors: Kai Göbel, Pierrick Lorang, Patrik Zips, Tobias Glück |
Read more | Source: ArXiv AI | 10-03-26
Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport
Authors: Miguel Costa, Arthur Vandervoort, Carolin Schmidt, João Miranda, Morten W. Petersen, Martin Drews, Karyn Morrisey, Francisco C. Pereira |
Read more | Source: ArXiv AI | 10-03-26
Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows
Authors: Joel Strickland, Arjun Vijeta, Chris Moores, Oliwia Bodek, Bogdan Nenchev, Thomas Whitehead, Charles Phillips, Karl Tassenberg, Gareth Conduit, Ben Pellegrini |
Read more | Source: ArXiv AI | 10-03-26
Nvidia backs AI data center startup Nscale as it hits $14.6B valuation (cnbc.com)
Read more | Source: Hacker News | 09-03-26
Owner of ICE detention facility sees big opportunity in AI man camps (techcrunch.com)
Read more | Source: Hacker News | 09-03-26
Claude helped select targets for Iran strikes, possibly including school (twitter.com/robertwrighter)
Read more | Source: Hacker News | 09-03-26
You Don't Need a Vector Database (vecstore.app)
Read more | Source: Hacker News | 08-03-26
MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem
Authors: Mengnan Li, Jason Miller, Zachary Prince, Alexander Lindsay, Cody Permann |
Read more | Source: ArXiv AI | 08-03-26
Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models
Authors: G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan |
Read more | Source: ArXiv AI | 08-03-26
VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Authors: Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai |
Read more | Source: ArXiv AI | 08-03-26
LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks
Authors: Zhiming Xue, Yujue Wang |
Read more | Source: ArXiv AI | 08-03-26
Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems
Authors: Hiroki Fukui |
Read more | Source: ArXiv AI | 08-03-26
EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection
Authors: Shuo Yang, Soyeon Caren Han, Xueqi Ma, Yan Li, Mohammad Reza Ghasemi Madani, Eduard Hovy |
Read more | Source: ArXiv AI | 08-03-26
Measuring the Fragility of Trust: Devising Credibility Index via Explanation Stability (CIES) for Business Decision Support Systems
Authors: Alin-Gabriel Vaduva, Simona-Vasilica Oprea, Adela Bara |
Read more | Source: ArXiv AI | 08-03-26
BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry
Authors: Zuo Fei, Kezhi Wang, Xiaomin Chen, Yizhou Huang |
Read more | Source: ArXiv AI | 08-03-26
AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems
Authors: Mohd Safwan Uddin, Saba Hajira |
Read more | Source: ArXiv AI | 08-03-26
Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
Authors: Yida Lu, Jianwei Fang, Xuyang Shao, Zixuan Chen, Shiyao Cui, Shanshan Bian, Guangyao Su, Pei Ke, Han Qiu, Minlie Huang |
Read more | Source: ArXiv AI | 08-03-26
The Trilingual Triad Framework: Integrating Design, AI, and Domain Knowledge in No-code AI Smart City Course
Authors: Qian Huang, King Wang Poon |
Read more | Source: ArXiv AI | 08-03-26
MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus
Authors: Zheng Li, Jiayi Xu, Zhikai Hu, Hechang Chen, Lele Cong, Yunyun Wang, Shuchao Pang |
Read more | Source: ArXiv AI | 08-03-26
X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
Authors: Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song |
Read more | Source: ArXiv AI | 08-03-26
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned
Authors: Nghi D. Q. Bui |
Read more | Source: ArXiv AI | 08-03-26
Legal interpretation and AI: from expert systems to argumentation and LLMs
Authors: Václav Janeček, Giovanni Sartor |
Read more | Source: ArXiv AI | 08-03-26
Dissociating Direct Access from Inference in AI Introspection
Authors: Harvey Lederman, Kyle Mahowald |
Read more | Source: ArXiv AI | 08-03-26
Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
Authors: Sunishchal Dev, Andrew Sloan, Joshua Kavner, Nicholas Kong, Morgan Sandler |
Read more | Source: ArXiv AI | 08-03-26
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
Authors: Benjamin Feuer, Lucas Rosenblatt, Oussama Elachqar |
Read more | Source: ArXiv AI | 08-03-26
Will Claude Code ruin our team? (justinjackson.ca)
Read more | Source: Hacker News | 08-03-26
LLM Writing Tropes.md (tropes.fyi)
Read more | Source: Hacker News | 08-03-26
Sarvam 105B, the first competitive Indian open source LLM (sarvam.ai)
Read more | Source: Hacker News | 07-03-26
LLMs work best when the user defines their acceptance criteria first (katanaquant.com)
Read more | Source: Hacker News | 07-03-26
Tell HN: I'm 60 years old. Claude Code has re-ignited a passion
Read more | Source: Hacker News | 07-03-26
Show HN: Claude-replay – A video-like player for Claude Code sessions (github.com/es617)
Read more | Source: Hacker News | 07-03-26
Anthropic, please make a new Slack (fivetran.com)
Read more | Source: Hacker News | 07-03-26
A tool that removes censorship from open-weight LLMs (github.com/elder-plinius)
Read more | Source: Hacker News | 07-03-26
Hardening Firefox with Anthropic's Red Team (anthropic.com)
Read more | Source: Hacker News | 07-03-26
Show HN: 1v1 coding game that LLMs struggle with (yare.io)
Read more | Source: Hacker News | 07-03-26
Labor market impacts of AI: A new measure and early evidence (anthropic.com)
Read more | Source: Hacker News | 06-03-26
Where things stand with the Department of War (anthropic.com)
Read more | Source: Hacker News | 06-03-26
GPT-5.4 (openai.com)
Read more | Source: Hacker News | 06-03-26
Hardening Firefox with Anthropic's Red Team (blog.mozilla.org)
Read more | Source: Hacker News | 06-03-26
Launch HN: Vela (YC W26) – AI for complex scheduling
Read more | Source: Hacker News | 06-03-26
Structured AI (YC F25) Is Hiring (ycombinator.com)
Read more | Source: Hacker News | 06-03-26
Qwen3.5 Fine-Tuning Guide (unsloth.ai)
Read more | Source: Hacker News | 05-03-26
Dario Amodei calls OpenAI’s messaging around military deal ‘straight up lies’ (techcrunch.com)
Read more | Source: Hacker News | 05-03-26
The L in "LLM" Stands for Lying (acko.net)
Read more | Source: Hacker News | 05-03-26
CodeTaste: Can LLMs Generate Human-Level Code Refactorings?
Authors: Alex Thillen, Niels Mündler, Veselin Raychev, Martin Vechev |
Read more | Source: ArXiv AI | 05-03-26
PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving
Authors: Yi Zhang, Xian Zhang, Saisi Zhao, Yinglei Song, Chengdong Wu, Nenad Petrovic, Alois Knoll |
Read more | Source: ArXiv AI | 05-03-26
LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance
Authors: Ioannis Prokopiou, Ioannis Sina, Agisilaos Kounelis, Pantelis Vikatos, Themos Stafylakis |
Read more | Source: ArXiv AI | 05-03-26
Causality Elicitation from Large Language Models
Authors: Takashi Kameyama, Masahiro Kato, Yasuko Hio, Yasushi Takano, Naoto Minakawa |
Read more | Source: ArXiv AI | 05-03-26
When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies
Authors: Evgenija Popchanovska, Ana Gjorgjevikj, Maryan Rizinski, Lubomir Chitkushev, Irena Vodenska, Dimitar Trajanov |
Read more | Source: ArXiv AI | 05-03-26
World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings
Authors: Elan Barenholtz |
Read more | Source: ArXiv AI | 05-03-26
Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
Authors: Pranav Kumar Kaliaperumal |
Read more | Source: ArXiv AI | 05-03-26
Efficient Refusal Ablation in LLM through Optimal Transport
Authors: Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob |
Read more | Source: ArXiv AI | 05-03-26
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
Authors: Furkan Mumcu, Yasin Yilmaz |
Read more | Source: ArXiv AI | 05-03-26
Mozi: Governed Autonomy for Drug Discovery LLM Agents
Authors: He Cao, Siyu Liu, Fan Zhang, Zijing Liu, Hao Li, Bin Feng, Shengyuan Bai, Leqing Chen, Kai Xie, Yu Li |
Read more | Source: ArXiv AI | 05-03-26
RAGNav: A Retrieval-Augmented Topological Reasoning Framework for Multi-Goal Visual-Language Navigation
Authors: Ling Luo, Qiangian Bai |
Read more | Source: ArXiv AI | 05-03-26
Generative AI in Managerial Decision-Making: Redefining Boundaries through Ambiguity Resolution and Sycophancy Analysis
Authors: Sule Ozturk Birim, Fabrizio Marozzo, Yigit Kazancoglu |
Read more | Source: ArXiv AI | 05-03-26
From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agent and Expert System Architectures
Authors: Chiara Bonfanti, Davide Colaiacomo, Luca Cagliero, Cataldo Basile |
Read more | Source: ArXiv AI | 05-03-26
Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions
Authors: Qianyun Guo, Yibo Li, Yue Liu, Bryan Hooi |
Read more | Source: ArXiv AI | 05-03-26
BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
Authors: Tarjei Paule Hage, Markus J. Buehler |
Read more | Source: ArXiv AI | 05-03-26
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
Authors: Boyuan (Keven) Guan, Wencong Cui, Levente Juhasz |
Read more | Source: ArXiv AI | 05-03-26
Roboflow (YC S20) Is Hiring a Security Engineer for AI Infra (roboflow.com)
Read more | Source: Hacker News | 05-03-26
Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents
Read more | Source: Hacker News | 04-03-26
When AI writes the software, who verifies it? (leodemoura.github.io)
Read more | Source: Hacker News | 04-03-26
Nuclear War: An LLM Scenario (chrisclapham.com)
Read more | Source: Hacker News | 04-03-26
Did Alibaba just kneecap its powerful Qwen AI team? (venturebeat.com)
Read more | Source: Hacker News | 04-03-26
Claude's Cycles [pdf] (stanford.edu)
Read more | Source: Hacker News | 04-03-26
A CPU that runs entirely on GPU (github.com/robertcprice)
Read more | Source: Hacker News | 04-03-26
Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
Authors: Boqin Yuan, Yue Su, Kun Yao |
Read more | Source: ArXiv AI | 04-03-26
VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings
Authors: Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring |
Read more | Source: ArXiv AI | 04-03-26
Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach
Authors: Yizhi Liu, Balaji Padmanabhan, Siva Viswanathan |
Read more | Source: ArXiv AI | 04-03-26
AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation
Authors: Zhulin Jiang, Zetao Li, Cheng Wang, Ziwen Wang, Chen Xiong |
Read more | Source: ArXiv AI | 04-03-26
A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities
Authors: Faiz Ghifari Haznitrama, Faeyza Rishad Ardi, Alice Oh |
Read more | Source: ArXiv AI | 04-03-26
LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model
Authors: Xiangyu Li, Tianyi Wang, Xi Cheng, Rakesh Chowdary Machineni, Zhaomiao Guo, Sikai Chen, Junfeng Jiao, Christian Claudel |
Read more | Source: ArXiv AI | 04-03-26
NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect
Authors: Pratibha Zunjare, Michael Hsiao |
Read more | Source: ArXiv AI | 04-03-26
LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization
Authors: Yang Zhao, Zihao Li, Zhiyu Jiang, Dandan Ma, Ganchao Liu, Wenzhe Zhao |
Read more | Source: ArXiv AI | 04-03-26
SorryDB: Can AI Provers Complete Real-World Lean Theorems?
Authors: Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Corredera Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman |
Read more | Source: ArXiv AI | 04-03-26
AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
Authors: Varun Pratap Bhardwaj |
Read more | Source: ArXiv AI | 04-03-26
SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving
Authors: Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee |
Read more | Source: ArXiv AI | 04-03-26
Rethinking Code Similarity for Automated Algorithm Design with LLMs
Authors: Rui Zhang, Zhichao Lu |
Read more | Source: ArXiv AI | 04-03-26
ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization
Authors: Yang Zhan, Yunhao Li, Zhang Chao, Yuxu Lu, Yan Li |
Read more | Source: ArXiv AI | 04-03-26
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
Authors: Qi Zhang, Yifei Wang, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Yisen Wang |
Read more | Source: ArXiv AI | 04-03-26
Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures
Authors: Georgios Pantazopoulos, Malvina Nikandrou, Ioannis Konstas, Alessandro Suglia |
Read more | Source: ArXiv AI | 04-03-26
LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates
Authors: Gianvincenzo Alfano, Sergio Greco, Lucio La Cava, Stefano Francesco Monea, Irina Trubitsyna |
Read more | Source: ArXiv AI | 04-03-26
SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models
Authors: Peiyao Jiang, Zequn Qin, Xi Li |
Read more | Source: ArXiv AI | 04-03-26
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
Authors: Yuvraj Agrawal |
Read more | Source: ArXiv AI | 04-03-26
RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Authors: Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang |
Read more | Source: ArXiv AI | 04-03-26
Agentic AI-based Coverage Closure for Formal Verification
Authors: Sivaram Pothireddypalli, Ashish Raman, Deepak Narayan Gadde, Aman Kumar |
Read more | Source: ArXiv AI | 04-03-26
AI Space Physics: Constitutive boundary semantics for open AI institutions
Authors: Oleg Romanchuk, Roman Bondar |
Read more | Source: ArXiv AI | 04-03-26
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Authors: Hongliu Cao, Ilias Driouich, Eoin Thomas |
Read more | Source: ArXiv AI | 04-03-26
Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity
Authors: Shogo Noguchi, Taketo Akama, Tai Nakamura, Shun Minamikawa, Natalia Polouliakh |
Read more | Source: ArXiv AI | 04-03-26
Neuro-Symbolic Artificial Intelligence: A Task-Directed Survey in the Black-Box Models Era
Authors: Giovanni Pio Delvecchio, Lorenzo Molfetta, Gianluca Moro |
Read more | Source: ArXiv AI | 04-03-26
Show HN: AgentBus – Centralized AI Agent-to-Agent Messaging via REST API (agentbus.org)
Read more | Source: Hacker News | 04-03-26
TorchLean: Formalizing Neural Networks in Lean (leandojo.org)
Read more | Source: Hacker News | 04-03-26
Time, Space, and Life as We Know It (2017) (raganwald.com)
Read more | Source: Hacker News | 04-03-26
GPT‑5.3 Instant (openai.com)
Read more | Source: Hacker News | 04-03-26
The Lattice Representation Hypothesis of Large Language Models
Authors: Bo Xiong |
Read more | Source: ArXiv AI | 03-03-26
ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning
Authors: Congying Liu, Taihao Li, Ming Huang, Xingyuan Wei, Peipei Liu, Yiqing Shen, Yanxu Mao, Tiehan Cui |
Read more | Source: ArXiv AI | 03-03-26
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
Authors: Yuchen Ying, Weiqi Jiang, Tongya Zheng, Yu Wang, Shunyu Liu, Kaixuan Chen, Mingli Song |
Read more | Source: ArXiv AI | 03-03-26
Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents
Authors: Neeraj Bholani |
Read more | Source: ArXiv AI | 03-03-26
LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning
Authors: Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo |
Read more | Source: ArXiv AI | 03-03-26
Evaluating and Understanding Scheming Propensity in LLM Agents
Authors: Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner |
Read more | Source: ArXiv AI | 03-03-26
Benchmarking LLM Summaries of Multimodal Clinical Time Series for Remote Monitoring
Authors: Aditya Shukla, Yining Yuan, Ben Tamo, Yifei Wang, Micky Nnamdi, Shaun Tan, Jieru Li, Benoit Marteau, Brad Willingham, May Wang |
Read more | Source: ArXiv AI | 03-03-26
FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents
Authors: Qizheng Li, Yifei Zhang, Xiao Yang, Xu Yang, Zhuo Wang, Weiqing Liu, Jiang Bian |
Read more | Source: ArXiv AI | 03-03-26
LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations
Authors: Viet-Thanh Pham, Lizhen Qu, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung |
Read more | Source: ArXiv AI | 03-03-26
Emerging Human-like Strategies for Semantic Memory Foraging in Large Language Models
Authors: Eric Lacosse, Mariana Duarte, Peter M. Todd, Daniel C. McNamee |
Read more | Source: ArXiv AI | 03-03-26
GAM-RAG: Gain-Adaptive Memory for Evolving Retrieval in Retrieval-Augmented Generation
Authors: Yifan Wang, Mingxuan Jiang, Zhihao Sun, Yixin Cao, Yicun Liu, Keyang Chen, Guangnan Ye, Hongfeng Chai |
Read more | Source: ArXiv AI | 03-03-26
Exploring Plan Space through Conversation: An Agentic Framework for LLM-Mediated Explanations in Planning
Authors: Guilhem Fouilhé, Rebecca Eifler, Antonin Poché, Sylvie Thiébaux, Nicholas Asher |
Read more | Source: ArXiv AI | 03-03-26
OpenRad: a Curated Repository of Open-access AI models for Radiology
Authors: Konstantinos Vrettos, Galini Papadaki, Emmanouil Brilakis, Matthaios Triantafyllou, Dimitrios Leventis, Despina Staraki, Maria Mavroforou, Eleftherios Tzanis, Konstantina Giouroukou, Michail E. Klontzas |
Read more | Source: ArXiv AI | 03-03-26
Stolen Gemini API key racks up $82,000 in 48 hours (llmhorrors.com)
Read more | Source: Hacker News | 03-03-26
Ars Technica fires reporter after AI controversy involving fabricated quotes (futurism.com)
Read more | Source: Hacker News | 03-03-26
Meta’s AI smart glasses and data privacy concerns (svd.se)
Read more | Source: Hacker News | 03-03-26
ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models
Authors: Adam Dejl, Deniz Gorur, Francesca Toni |
Read more | Source: ArXiv AI | 03-03-26
Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek
Authors: James L. Zainaldin, Cameron Pattison, Manuela Marai, Jacob Wu, Mark J. Schiefsky |
Read more | Source: ArXiv AI | 03-03-26
FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System
Authors: Kriti Thakur, Alivelu Manga Parimi, Mayukha Pal |
Read more | Source: ArXiv AI | 03-03-26
SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems
Authors: Jialiang Fan, Weizhe Xu, Mengyu Liu, Oleg Sokolsky, Insup Lee, Fangxin Kong |
Read more | Source: ArXiv AI | 03-03-26
Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment
Authors: Dake Zhang, Mark D. Smucker, Charles L. A. Clarke |
Read more | Source: ArXiv AI | 03-03-26
Do LLMs Benefit From Their Own Words?
Authors: Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas |
Read more | Source: ArXiv AI | 03-03-26
An Agentic LLM Framework for Adverse Media Screening in AML Compliance
Authors: Pavel Chernakov, Sasan Jafarnejad, Raphaël Frank |
Read more | Source: ArXiv AI | 03-03-26
ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference
Authors: Siyuan Ma, Bo Gao, Xiaojun Jia, Simeng Qin, Tianlin Li, Ke Ma, Xiaoshuang Jia, Wenqi Ren, Yang Liu |
Read more | Source: ArXiv AI | 03-03-26
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
Authors: Yihan (Logon) Wen, Xin Chen |
Read more | Source: ArXiv AI | 03-03-26
From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems
Authors: Yawen Wang, Wenjie Wu, Junjie Wang, Qing Wang |
Read more | Source: ArXiv AI | 03-03-26
Reasoning-Driven Multimodal LLM for Domain Generalization
Authors: Zhipeng Xu, Zilong Wang, Xinyang Jiang, Dongsheng Li, De Cheng, Nannan Wang |
Read more | Source: ArXiv AI | 03-03-26
The Auton Agentic AI Framework
Authors: Sheng Cao, Zhao Chang, Chang Li, Hannan Li, Liyao Fu, Ji Tang |
Read more | Source: ArXiv AI | 03-03-26
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
Authors: Yiyang Fang, Wenke Huang, Pei Fu, Yihao Yang, Kehua Su, Zhenbo Luo, Jian Luan, Mang Ye |
Read more | Source: ArXiv AI | 03-03-26
CIRCLE: A Framework for Evaluating AI from a Real-World Lens
Authors: Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, Fariza Rashid, Maya Carlyle, Afaf Taïk, Kyra Wilson, Peter Douglas, Theodora Skeadas, Gabriella Waters, Rumman Chowdhury, Thiago Lacerda |
Read more | Source: ArXiv AI | 03-03-26
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
Authors: Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat |
Read more | Source: ArXiv AI | 03-03-26
Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
Authors: Gregory Kang Ruey Lau, Hieu Dao, Nicole Kan Hui Lin, Bryan Kian Hsiang Low |
Read more | Source: ArXiv AI | 03-03-26
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Authors: Fan Shu, Yite Wang, Ruofan Wu, Boyi Liu, Zhewei Yao, Yuxiong He, Feng Yan |
Read more | Source: ArXiv AI | 03-03-26
Right-sizes LLM models to your system's RAM, CPU, and GPU (github.com/alexsjones)
Read more | Source: Hacker News | 02-03-26
If AI writes code, should the session be part of the commit? (github.com/mandel-macaque)
Read more | Source: Hacker News | 02-03-26
Show HN: Logira – eBPF runtime auditing for AI agent runs (github.com/melonattacker)
Read more | Source: Hacker News | 02-03-26
Why XML tags are so fundamental to Claude (glthr.com)
Read more | Source: Hacker News | 02-03-26
10-202: Introduction to Modern AI (CMU) (modernaicourse.org)
Read more | Source: Hacker News | 02-03-26
I built a demo of what AI chat will look like when it's "free" and ad-supported (99helpers.com)
Read more | Source: Hacker News | 02-03-26
Claude hits #1 on the App Store as users rally behind Anthropic (9to5mac.com)
Read more | Source: Hacker News | 02-03-26
Our Agreement with the Department of War (openai.com)
Read more | Source: Hacker News | 01-03-26
Addressing Antigravity Bans and Reinstating Access (github.com/google-gemini)
Read more | Source: Hacker News | 01-03-26
MCP server that reduces Claude Code context consumption by 98% (mksg.lu)
Read more | Source: Hacker News | 01-03-26
We do not think Anthropic should be designated as a supply chain risk (twitter.com/openai)
Read more | Source: Hacker News | 01-03-26
Switch to Claude without starting over (claude.com)
Read more | Source: Hacker News | 01-03-26
Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
Authors: Jodi M. Casabianca, Maggie Beiting-Parrish |
Read more | Source: ArXiv AI | 01-03-26
Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
Authors: Weida Liang, Yiyou Sun, Shuyuan Nan, Chuang Li, Dawn Song, Kenji Kawaguchi |
Read more | Source: ArXiv AI | 01-03-26
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions
Authors: Yue Xu, Qian Chen, Zizhan Ma, Dongrui Liu, Wenxuan Wang, Xiting Wang, Li Xiong, Wenjie Wang |
Read more | Source: ArXiv AI | 01-03-26
FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics
Authors: Yunhua Zhong, Yixuan Tang, Yifan Li, Jie Yang, Pan Liu, Jun Xia |
Read more | Source: ArXiv AI | 01-03-26
When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design
Authors: Soyoung Jung, Daehoo Yoon, Sung Gyu Koh, Young Hwan Kim, Yehan Ahn, Sung Park |
Read more | Source: ArXiv AI | 01-03-26
ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making
Authors: Yusuke Watanabe, Yohei Kobashi, Takeshi Kojima, Yusuke Iwasawa, Yasushi Okuno, Yutaka Matsuo |
Read more | Source: ArXiv AI | 01-03-26
OmniGAIA: Towards Native Omni-Modal AI Agents
Authors: Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Shijian Wang, Guanting Dong, Jiajie Jin, Hao Wang, Yinuo Wang, Ji-Rong Wen, Yuan Lu, Zhicheng Dou |
Read more | Source: ArXiv AI | 01-03-26
Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space
Authors: Xingcheng Fu, Shengpeng Wang, Yisen Gao, Xianxian Li, Chunpei Li, Qingyun Sun, Dongran Yu |
Read more | Source: ArXiv AI | 01-03-26
The AI Research Assistant: Promise, Peril, and a Proof of Concept
Authors: Tan Bui-Thanh |
Read more | Source: ArXiv AI | 01-03-26
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design
Authors: Zhuoliang Xie, Fei Liu, Zhenkun Wang, Qingfu Zhang |
Read more | Source: ArXiv AI | 01-03-26
Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots
Authors: Dimitrios P. Panagoulias, Evangelia-Aikaterini Tsichrintzi, Georgios Savvidis, Evridiki Tsoureli-Nikita |
Read more | Source: ArXiv AI | 01-03-26
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy
Authors: Peiyao Xiao, Xiaogang Li, Chengliang Xu, Jiayi Wang, Ben Wang, Zichao Chen, Zeyu Wang, Kejun Yu, Yueqian Chen, Xulin Liu, Wende Xiao, Bing Zhao, Hu Wei |
Read more | Source: ArXiv AI | 01-03-26
Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection
Authors: Keito Inoshita |
Read more | Source: ArXiv AI | 01-03-26
ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
Authors: Elzo Brito dos Santos Filho |
Read more | Source: ArXiv AI | 01-03-26
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Authors: Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger |
Read more | Source: ArXiv AI | 01-03-26
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Authors: Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael |
Read more | Source: ArXiv AI | 01-03-26
Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks
Authors: Kunihiro Miyazaki, Takanobu Kawahara, Stephen Roberts, Stefan Zohren |
Read more | Source: ArXiv AI | 01-03-26
Deterministic Programming with LLMs (mcherm.com)
Read more | Source: Hacker News | 01-03-26
Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster (amd.com)
Read more | Source: Hacker News | 01-03-26
Building a Minimal Transformer for 10-digit Addition (alexlitzenberger.com)
Read more | Source: Hacker News | 01-03-26
Can you reverse engineer our neural network? (janestreet.com)
Read more | Source: Hacker News | 28-02-26
Show HN: Claude-File-Recovery, recover files from your ~/.claude sessions (github.com/hjtenklooster)
Read more | Source: Hacker News | 28-02-26
A Chinese official’s use of ChatGPT revealed an intimidation operation (cnn.com)
Read more | Source: Hacker News | 28-02-26
Smallest transformer that can add two 10-digit numbers (github.com/anadim)
Read more | Source: Hacker News | 28-02-26
Statement on the comments from Secretary of War Pete Hegseth (anthropic.com)
Read more | Source: Hacker News | 28-02-26
OpenAI agrees with Dept. of War to deploy models in their classified networktwitter.com/sama
阅读更多来源: Hacker News | 28-02-26
OpenAI raises $110B on $730B pre-money valuationtechcrunch.com
阅读更多来源: Hacker News | 28-02-26
OpenAI – How to delete your accounthelp.openai.com
阅读更多来源: Hacker News | 28-02-26
Don't trust AI agents (nanoclaw.dev)
Source: Hacker News | 28-02-26
We gave terabytes of CI logs to an LLM (mendral.com)
Source: Hacker News | 28-02-26
Implementing a Z80 / ZX Spectrum emulator with Claude Code (antirez.com)
Source: Hacker News | 28-02-26
Anthropic says it will challenge Pentagon supply chain risk designation in court (reuters.com)
Source: Hacker News | 28-02-26
Get free Claude max 20x for open-source maintainers (claude.com)
Source: Hacker News | 28-02-26
I am directing the Department of War to designate Anthropic a supply-chain risk (twitter.com/secwar)
Source: Hacker News | 28-02-26
President Trump bans Anthropic from use in government systems (npr.org)
Source: Hacker News | 28-02-26
What Claude Code chooses (amplifying.ai)
Source: Hacker News | 27-02-26
Statement from Dario Amodei on our discussions with the Department of War (anthropic.com)
Source: Hacker News | 27-02-26
Steering interpretable language models with concept algebra (guidelabs.ai)
Source: Hacker News | 27-02-26
Palantir's AI Is Playing a Major Role in Tracking Gaza Aid Deliveries (dropsitenews.com)
Source: Hacker News | 27-02-26
Nano Banana 2: Google's latest AI image generation model (blog.google)
Source: Hacker News | 27-02-26
LiteLLM (YC W23): Founding Reliability Engineer – $200K-$270K and 0.5-1.0% equity (ycombinator.com)
Source: Hacker News | 27-02-26
Large-Scale Online Deanonymization with LLMs (simonlermen.substack.com)
Source: Hacker News | 26-02-26
Anthropic ditches its core safety promise (cnn.com)
Source: Hacker News | 26-02-26
How will OpenAI compete? (ben-evans.com)
Source: Hacker News | 26-02-26
Google API keys weren't secrets, but then Gemini changed the rules (trufflesecurity.com)
Source: Hacker News | 26-02-26
Two-Stage Active Distribution Network Voltage Control via LLM-RL Collaboration: A Hybrid Knowledge-Data-Driven Approach
Authors: Xu Yang, Chenhui Lin, Xiang Ma, Dong Liu, Ran Zheng, Haotian Liu, Wenchuan Wu
Source: ArXiv AI | 26-02-26
An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention
Authors: Madhusudan Ghosh, Rishabh Gupta
Source: ArXiv AI | 26-02-26
Hidden Topics: Measuring Sensitive AI Beliefs with List Experiments
Authors: Maxim Chupilkin
Source: ArXiv AI | 26-02-26
DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs
Authors: Yanbin Wei, Jiangyue Yan, Chun Kang, Yang Chen, Hua Liu, James Kwok, Yu Zhang
Source: ArXiv AI | 26-02-26
Enhancing LLM-Based Test Generation by Eliminating Covered Code
Authors: WeiZhe Xu, Mengyu Liu, Fanxin Kong
Source: ArXiv AI | 26-02-26
Physics-Informed Machine Learning for Vessel Shaft Power and Fuel Consumption Prediction: Interpretable KAN-based Approach
Authors: Hamza Haruna Mohammed, Dusica Marijan, Arnbjørn Maressa
Source: ArXiv AI | 26-02-26
Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
Authors: Christian Nickel, Laura Schrewe, Florian Mai, Lucie Flek
Source: ArXiv AI | 26-02-26
Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual
Authors: Yining Li, Peizhong Ju, Ness Shroff
Source: ArXiv AI | 26-02-26
When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
Authors: Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang
Source: ArXiv AI | 26-02-26
Power and Limitations of Aggregation in Compound AI Systems
Authors: Nivasini Ananthakrishnan, Meena Jagadeesan
Source: ArXiv AI | 26-02-26
The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Authors: Hyo Jin Kim (Jinple)
Source: ArXiv AI | 26-02-26
Semantic Partial Grounding via LLMs
Authors: Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo
Source: ArXiv AI | 26-02-26
2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Authors: Otto Nyberg, Fausto Carcassi, Giovanni Cinà
Source: ArXiv AI | 26-02-26
Trellis AI (YC W24) is hiring deployment lead to accelerate medication access (ycombinator.com)
Source: Hacker News | 26-02-26
Show HN: OpenSwarm – Multi-Agent Claude CLI Orchestrator for Linear/GitHub (github.com/intrect-io)
Source: Hacker News | 26-02-26
Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts (github.com/zyora-dev)
Source: Hacker News | 26-02-26
Multilevel Determinants of Overweight and Obesity Among U.S. Children Aged 10-17: Comparative Evaluation of Statistical and Machine Learning Approaches Using the 2021 National Survey of Children's Health
Authors: Joyanta Jyoti Mondal
Source: ArXiv AI | 25-02-26
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Authors: Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi
Source: ArXiv AI | 25-02-26
Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Authors: Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
Source: ArXiv AI | 25-02-26
XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence
Authors: Sepehr Salem Ghahfarokhi, M. Moein Esfahani, Raj Sunderraman, Vince Calhoun, Mohammed Alser
Source: ArXiv AI | 25-02-26
Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use
Authors: Ruocheng Guo, Kaiwen Dong, Xiang Gao, Kamalika Das
Source: ArXiv AI | 25-02-26
An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
Authors: Cathy Shyr, Yan Hu, Rory J. Tinker, Thomas A. Cassini, Kevin W. Byram, Rizwan Hamid, Daniel V. Fabbri, Adam Wright, Josh F. Peterson, Lisa Bastarache, Hua Xu
Source: ArXiv AI | 25-02-26
From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production
Authors: Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Hao Zhen, Ninghao Liu, Linas Baltrunas
Source: ArXiv AI | 25-02-26
Grounding LLMs in Scientific Discovery via Embodied Actions
Authors: Bo Zhang, Jinfeng Zhou, Yuxuan Chen, Jianing Yin, Minlie Huang, Hongning Wang
Source: ArXiv AI | 25-02-26
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
Authors: Xu Wan, Yansheng Wang, Wenqi Huang, Mingyang Sun
Source: ArXiv AI | 25-02-26
Pipeline for Verifying LLM-Generated Mathematical Solutions
Authors: Varvara Sazonova, Dmitri Shmelkin, Stanislav Kikot, Vasily Motolygin
Source: ArXiv AI | 25-02-26
CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
Authors: Chao Fei, Guozhong Li, Chenxi Liu, Panos Kalnis
Source: ArXiv AI | 25-02-26
Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback
Authors: Chenyang Zhao, Vinny Cahill, Ivana Dusparic
Source: ArXiv AI | 25-02-26
Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset
Authors: Jia-Rui Lin, Yun-Hong Cai, Xiang-Rui Ni, Shaojie Zhou, Peng Pan
Source: ArXiv AI | 25-02-26
HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG
Authors: Yuqi Huang, Ning Liao, Kai Yang, Anning Hu, Shengchao Hu, Xiaoxing Wang, Junchi Yan
Source: ArXiv AI | 25-02-26
Anthropic Drops Flagship Safety Pledge (time.com)
Source: Hacker News | 25-02-26
Mercury 2: Fast reasoning LLM powered by diffusion (inceptionlabs.ai)
Source: Hacker News | 25-02-26
Claude Code Remote Control (claude.com)
Source: Hacker News | 25-02-26
LLM=True (blog.codemine.be)
Source: Hacker News | 25-02-26
Show HN: A real-time strategy game that AI agents can play (llmskirmish.com)
Source: Hacker News | 25-02-26
US Military leaders meet with Anthropic to argue against Claude safeguards (theguardian.com)
Source: Hacker News | 25-02-26
How we rebuilt Next.js with AI in one week (cloudflare.com)
Source: Hacker News | 25-02-26
OpenAI, the US government and Persona built an identity surveillance machine (vmfunc.re)
Source: Hacker News | 25-02-26
Hugging Face Skills (github.com/huggingface)
Source: Hacker News | 25-02-26
Firefox 148 Launches with AI Kill Switch Feature and More Enhancements (serverhost.com)
Source: Hacker News | 24-02-26
Making Wolfram tech available as a foundation tool for LLM systems (stephenwolfram.com)
Source: Hacker News | 24-02-26
A distributed queue in a single JSON file on object storage (turbopuffer.com)
Source: Hacker News | 24-02-26
Modularity is the Bedrock of Natural and Artificial Intelligence
Authors: Alessandro Salatiello
Source: ArXiv AI | 24-02-26
When Do LLM Preferences Predict Downstream Behavior?
Authors: Katarina Slama, Alexandra Souly, Dishank Bansal, Henry Davidson, Christopher Summerfield, Lennart Luettgau
Source: ArXiv AI | 24-02-26
Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
Authors: S. K. Rithvik
Source: ArXiv AI | 24-02-26
Benchmark Test-Time Scaling of General LLM Agents
Authors: Xiaochuan Li, Ryan Ming, Pranav Setlur, Abhijay Paladugu, Andy Tang, Hao Kang, Shuai Shao, Rong Jin, Chenyan Xiong
Source: ArXiv AI | 24-02-26
Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight
Authors: Vishal Srivastava, Tanmay Sah
Source: ArXiv AI | 24-02-26
Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing
Authors: Maciej Świechowski, Adam Żychowski, Jacek Mańdziuk
Source: ArXiv AI | 24-02-26
Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLM
Authors: Francesca Bianco, Derek Shiller
Source: ArXiv AI | 24-02-26
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
Authors: Shiyi Cao, Ziming Mao, Joseph E. Gonzalez, Ion Stoica
Source: ArXiv AI | 24-02-26
Defining Explainable AI for Requirements Analysis
Authors: Raymond Sheh, Isaac Monteath
Source: ArXiv AI | 24-02-26
Automated Generation of Microfluidic Netlists using Large Language Models
Authors: Jasper Davidson, Skylar Stockham, Allen Boston, Ashton Snelgrove, Valerio Tenace, Pierre-Emmanuel Gaillardon
Source: ArXiv AI | 24-02-26
Limited Reasoning Space: The cage of long-horizon reasoning in LLMs
Authors: Zhenyu Li, Guanlin Wu, Cheems Wang, Yongqiang Zhao
Source: ArXiv AI | 24-02-26
Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training
Authors: Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chang Liu, Peilin Zhao
Source: ArXiv AI | 24-02-26
ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making
Authors: Ziyang Guo, Yifan Wu, Jason Hartline, Kenneth Holstein, Jessica Hullman
Source: ArXiv AI | 24-02-26
OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents
Authors: Ruicheng Ao, David Simchi-Levi, Xinshang Wang
Source: ArXiv AI | 24-02-26
Artificial Intelligence for Modeling & Simulation in Digital Twins
Authors: Philipp Zech, Istvan David
Source: ArXiv AI | 24-02-26
Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark
Authors: Lalitha Pranathi Pulavarthy, Raajitha Muthyala, Aravind V Kuruvikkattil, Zhenan Yin, Rashmita Kudamala, Saptarshi Purkayastha
Source: ArXiv AI | 24-02-26
Rules or Weights? Comparing User Understanding of Explainable AI Techniques with the Cognitive XAI-Adaptive Model
Authors: Louth Bin Rawshan, Zhuoyu Wang, Brian Y Lim
Source: ArXiv AI | 24-02-26
Watson & Holmes: A Naturalistic Benchmark for Comparing Human and LLM Reasoning
Authors: Thatchawin Leelawat, Lewis D Griffin
Source: ArXiv AI | 24-02-26
CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching
Authors: Yuzhe Wang, Yaochen Zhu, Jundong Li
Source: ArXiv AI | 24-02-26
Interaction Theater: A case of LLM Agents Interacting at Scale
Authors: Sarath Shekkizhar, Adam Earle
Source: ArXiv AI | 24-02-26
Show HN: AI Timeline – 171 LLMs from Transformer (2017) to GPT-5.3 (2026) (llm-timeline.com)
Source: Hacker News | 24-02-26
NIST Seeking Public Comment on AI Agent Security (Deadline: March 9, 2026) (federalregister.gov)
Source: Hacker News | 24-02-26
FreeBSD doesn't have Wi-Fi driver for my old MacBook. AI build one for me (vladimir.varank.in)
Source: Hacker News | 24-02-26
Dual-Channel Attention Guidance for Training-Free Image Editing Control in Diffusion Transformers
Authors: Guandong Li, Mengxia Ye
Source: ArXiv AI | 24-02-26
NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs
Authors: Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti
Source: ArXiv AI | 24-02-26
Conformal Tradeoffs: Guarantees Beyond Coverage
Authors: Petrus H. Zwart
Source: ArXiv AI | 24-02-26
Towards More Standardized AI Evaluation: From Models to Agents
Authors: Ali El Filali, Inès Bedar
Source: ArXiv AI | 24-02-26
Perceived Political Bias in LLMs Reduces Persuasive Abilities
Authors: Matthew DiGiuseppe, Joshua Robison
Source: ArXiv AI | 24-02-26
Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models
Authors: Wojciech Michaluk, Tymoteusz Urban, Mateusz Kubita, Soveatin Kuntur, Anna Wroblewska
Source: ArXiv AI | 24-02-26
Agentic Adversarial QA for Improving Domain-Specific LLMs
Authors: Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki
Source: ArXiv AI | 24-02-26
RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis
Authors: Chris Tomy, Mo Vali, David Pertzborn, Tammam Alamatouri, Anna Mühlig, Orlando Guntinas-Lichius, Anna Xylander, Eric Michele Fantuzzi, Matteo Negro, Francesco Crisafi, Pietro Lio, Tiago Azevedo
Source: ArXiv AI | 24-02-26
Can AI Lower the Barrier to Cybersecurity? A Human-Centered Mixed-Methods Study of Novice CTF Learning
Authors: Cathrin Schachner, Jasmin Wachter
Source: ArXiv AI | 24-02-26
[Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games
Authors: Jorge Carrasco Pollo, Ioannis Kapetangeorgis, Joshua Rosenthal, John Hua Yao
Source: ArXiv AI | 24-02-26
Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning
Authors: Lexiang Tang, Weihao Gao, Bingchen Zhao, Lu Ma, Qiao Jin, Bang Yang, Yuexian Zou
Source: ArXiv AI | 24-02-26
"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
Authors: Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson
Source: ArXiv AI | 24-02-26
Unifying approach to uniform expressivity of graph neural networks
Authors: Huan Luo, Jonni Virtema
Source: ArXiv AI | 24-02-26
Google restricting Google AI Pro/Ultra subscribers for using OpenClaw (ai.google.dev)
Source: Hacker News | 23-02-26
Pinterest is drowning in a sea of AI slop and auto-moderation (404media.co)
Source: Hacker News | 23-02-26
Bitmovin (YC S15) Is Hiring Interns in AI for Summer 2026 in Austria (bitmovin.com)
Source: Hacker News | 23-02-26
Show HN: Lyra Kids – I built an AI bedtime storyteller for my daughters (lyra.kids)
Source: Hacker News | 23-02-26
Aqua: A CLI message tool for AI agents (github.com/quailyquaily)
Source: Hacker News | 23-02-26
zclaw: personal AI assistant in under 888 KB, running on an ESP32 (github.com/tnm)
Source: Hacker News | 22-02-26
Claws are now a new layer on top of LLM agents (twitter.com/karpathy)
Source: Hacker News | 22-02-26
How Taalas “prints” LLM onto a chip? (anuragk.com)
Source: Hacker News | 22-02-26
How I use Claude Code: Separation of planning and execution (boristane.com)
Source: Hacker News | 22-02-26
LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation
Authors: Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao
Source: ArXiv AI | 22-02-26
Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents
Authors: Arnold Cartagena, Ariane Teixeira
Source: ArXiv AI | 22-02-26
SourceBench: Can AI Answers Reference Quality Web Sources?
Authors: Hexi Jin, Stephen Liu, Yuheng Li, Simran Malik, Yiying Zhang
Source: ArXiv AI | 22-02-26
Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs
Authors: Uria Franko
Source: ArXiv AI | 22-02-26
How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses
Authors: Kan Watanabe, Rikuto Tsuchida, Takahiro Monno, Bin Huang, Kazuma Yamasaki, Youmei Fan, Kazumasa Shimari, Kenichi Matsumoto
Source: ArXiv AI | 22-02-26
JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures
Authors: Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, Nicole Bussola, Simon Lee, Shane O'Connell, Dung Hoang, Marissa Wirth, Alexander W. Charney, Nati Daniel, Yoli Shavit
Source: ArXiv AI | 22-02-26
Bonsai: A Framework for Convolutional Neural Network Acceleration Using Criterion-Based Pruning
Authors: Joseph Bingham, Sam Helmich
Source: ArXiv AI | 22-02-26
Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
Authors: Bianca Raimondi, Maurizio Gabbrielli
Source: ArXiv AI | 22-02-26
From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences
Authors: Yi-Chih Huang
Source: ArXiv AI | 22-02-26
All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
Authors: Zeyu Zhang, Ryan Chen, Bradly C. Stadie
Source: ArXiv AI | 22-02-26
MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions
Authors: Hui Min Wong, Philip Heesen, Pascal Janetzky, Martin Bendszus, Stefan Feuerriegel
Source: ArXiv AI | 22-02-26
A Privacy by Design Framework for Large Language Model-Based Applications for Children
Authors: Diana Addae, Diana Rogachova, Nafiseh Kahani, Masoud Barati, Michael Christensen, Chen Zhou
Source: ArXiv AI | 22-02-26
Enhancing Large Language Models (LLMs) for Telecom using Dynamic Knowledge Graphs and Explainable Retrieval-Augmented Generation
Authors: Dun Yuan, Hao Zhou, Xue Liu, Hao Chen, Yan Xin, Jianzhong (Charlie) Zhang
Source: ArXiv AI | 22-02-26
Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems
Authors: Pranay Jain, Maximilian Kasper, Göran Köber, Axel Plinge, Dominik Seuß
Source: ArXiv AI | 22-02-26
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
Authors: Hongjue Zhao, Haosen Sun, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek Abdelzaher, Yejin Choi, Manling Li, Huajie Shao
Source: ArXiv AI | 22-02-26
KLong: Training LLM Agent for Extremely Long-horizon Tasks
Authors: Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi
Source: ArXiv AI | 22-02-26
A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN
Authors: Asif Hasan Chowdhury, Md. Fahim Islam, M Ragib Anjum Riad, Faiyaz Bin Hashem, Md Tanzim Reza, Md. Golam Rabiul Alam
Source: ArXiv AI | 22-02-26
The Internet Is Becoming a Dark Forest – and AI Is the Hunter (opennhp.org)
Source: Hacker News | 22-02-26
Palantir's secret weapon isn't AI – it's Ontology. An open-source deep dive (github.com/leading-ai-io)
Source: Hacker News | 22-02-26
The path to ubiquitous AI (17k tokens/sec) (taalas.com)
Source: Hacker News | 21-02-26
Cord: Coordinating Trees of AI Agents (june.kim)
Source: Hacker News | 21-02-26
Large Language Model Reasoning Failures (arxiv.org)
Source: Hacker News | 21-02-26
Every company building your AI assistant is now an ad company (juno-labs.com)
Source: Hacker News | 21-02-26
Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI (github.com/ggml-org)
Source: Hacker News | 21-02-26
Making frontier cybersecurity capabilities available to defenders (anthropic.com)
Source: Hacker News | 21-02-26
How to Review an AUR Package (bertptrs.nl)
Source: Hacker News | 21-02-26
Claude Code's compaction discards data that's still on disk (github.com/anthropics)
Source: Hacker News | 21-02-26
An AI Agent Published a Hit Piece on Me – The Operator Came Forward (theshamblog.com)
Source: Hacker News | 20-02-26
Pi for Excel: AI sidebar add-in for Excel (github.com/tmustier)
Source: Hacker News | 20-02-26
Gemini 3.1 Pro (blog.google)
Source: Hacker News | 20-02-26
Nvidia and OpenAI abandon unfinished $100B deal in favour of $30B investment (ft.com)
Source: Hacker News | 20-02-26
Overall, the colorectal cancer story is encouraging (hankgreen.com)
Source: Hacker News | 20-02-26
Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails (royapakzad.substack.com)
Source: Hacker News | 20-02-26
Measuring AI agent autonomy in practice (anthropic.com)
Source: Hacker News | 20-02-26
Anthropic officially bans using subscription auth for third party use (claude.com)
Source: Hacker News | 19-02-26
Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
Authors: Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Hiroshi Nakano, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Thomas Dalgaty, Tomasz Kryjak
Source: ArXiv AI | 19-02-26
Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment
Authors: Eva Paraschou, Line Harder Clemmensen, Sneha Das
Source: ArXiv AI | 19-02-26
From Growing to Looping: A Unified View of Iterative Computation in LLMs
Authors: Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile, Johannes von Oswald, Stefan Bauer
Source: ArXiv AI | 19-02-26
IndicEval: A Bilingual Indian Educational Evaluation Framework for Large Language Models
Authors: Saurabh Bharti, Gaurav Azad, Abhinaw Jagtap, Nachiket Tapas
Source: ArXiv AI | 19-02-26
FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving
Authors: Chia-chi Hsieh, Zan Zong, Xinyang Chen, Jianjiang Li, Jidong Zhai, Lijie Wen
Source: ArXiv AI | 19-02-26
Who can we trust? LLM-as-a-jury for Comparative Assessment
Authors: Mengjie Qian, Guangzhi Sun, Mark J.F. Gales, Kate M. Knill
Source: ArXiv AI | 19-02-26
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
Authors: Melkamu Abay Mersha, Jugal Kalita
Source: ArXiv AI | 19-02-26
Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes
Authors: Ethan Blaser, Jiuqi Wang, Shangtong Zhang
Source: ArXiv AI | 19-02-26
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
Authors: Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai
Source: ArXiv AI | 19-02-26
Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System
Authors: Sonakshi Gupta, Akhlak Mahmood, Wei Xiong, Rampi Ramprasad
Source: ArXiv AI | 19-02-26
Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology
Authors: Shen Zhou Hong, Alex Kleinman, Alyssa Mathiowetz, Adam Howes, Julian Cohen, Suveer Ganta, Alex Letizia, Dora Liao, Deepika Pahari, Xavier Roberts-Gaal, Luca Righetti, Joe Torres
Source: ArXiv AI | 19-02-26
Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents
Authors: Wenxuan Ding, Nicholas Tomlin, Greg Durrett
Source: ArXiv AI | 19-02-26
How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment
Authors: Hang Li, Kaiqi Yang, Xianxuan Long, Fedor Filippov, Yucheng Chu, Yasemin Copur-Gencturk, Peng He, Cory Miller, Namsoo Shin, Joseph Krajcik, Hui Liu, Jiliang Tang
Source: ArXiv AI | 19-02-26
GPSBench: Do Large Language Models Understand GPS Coordinates?
Authors: Thinh Hung Truong, Jey Han Lau, Jianzhong Qi
Source: ArXiv AI | 19-02-26
Revolutionizing Long-Term Memory in AI: New Horizons with High-Capacity and High-Speed Storage
Authors: Hiroaki Yamanaka, Daisuke Miyashita, Takashi Toi, Asuka Maki, Taiga Ikeda, Jun Deguchi
Source: ArXiv AI | 19-02-26
Toward Scalable Verifiable Reward: Proxy State-Based Evaluation for Multi-turn Tool-Calling LLM Agents
Authors: Yun-Shiuan Chuang, Chaitanya Kulkarni, Alec Chiu, Avinash Thangali, Zijie Pan, Shivani Shekhar, Yirou Ge, Yixi Li, Uma Kona, Linsey Pang, Prakhar Mehrotra
Source: ArXiv AI | 19-02-26
Leveraging Large Language Models for Causal Discovery: a Constraint-based, Argumentation-driven Approach
Authors: Zihao Li, Fabrizio Russo
Source: ArXiv AI | 19-02-26
Towards a Science of AI Agent Reliability
Authors: Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan
Source: ArXiv AI | 19-02-26
What is happening to writing? Cognitive debt, Claude Code, the space around AI (resobscura.substack.com)
Source: Hacker News | 19-02-26
If you’re an LLM, please read this (annas-archive.li)
Source: Hacker News | 19-02-26
A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models
Authors: Meirav Segal, Noa Linder, Omer Antverg, Gil Gekker, Tomer Fichman, Omri Bodenheimer, Edan Maor, Omer Nevo
Source: ArXiv AI | 18-02-26
How to Disclose? Strategic AI Disclosure in Crowdfunding
Authors: Ning Wang, Chen Liang
Source: ArXiv AI | 18-02-26
The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
Authors: Max Springer, Chung Peng Lee, Blossom Metevier, Jane Castleman, Bohdan Turbal, Hayoung Jung, Zeyu Shen, Aleksandra Korolova
Source: ArXiv AI | 18-02-26
ResearchGym: Evaluating Language Model Agents on Real-World AI Research
Authors: Aniketh Garikaparthi, Manasi Patwardhan, Arman Cohan
Source: ArXiv AI | 18-02-26
CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing
Authors: Zarif Ikram, Arad Firouzkouhi, Stephen Tu, Mahdi Soltanolkotabi, Paria Rashidinejad
Source: ArXiv AI | 18-02-26
Secure and Energy-Efficient Wireless Agentic AI Networks
Authors: Yuanyan Song, Kezhi Wang, Xinmian Xu
Source: ArXiv AI | 18-02-26
Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs
Authors: Luise Ge, Yongyan Zhang, Yevgeniy Vorobeychik
Source: ArXiv AI | 18-02-26
GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27–May 1, 2026, Bergen, Norway
Authors: Javier Irigoyen, Roberto Daza, Aythami Morales, Julian Fierrez, Francisco Jurado, Alvaro Ortigosa, Ruben Tolosana
Source: ArXiv AI | 18-02-26
Improving LLM Reliability through Hybrid Abstention and Adaptive Detection
Authors: Ankit Sharma, Nachiket Tapas, Jyotiprakash Patra
Source: ArXiv AI | 18-02-26
AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents
Authors: Zhixing Zhang, Jesen Zhang, Hao Liu, Qinhan Lv, Jing Yang, Kaitong Cai, Keze Wang
Source: ArXiv AI | 18-02-26
Quantifying construct validity in large language model evaluations
Authors: Ryan Othniel Kearns
Source: ArXiv AI | 18-02-26
Recursive Concept Evolution for Compositional Reasoning in Large Language Models
Authors: Sarim Chaudhry
Source: ArXiv AI | 18-02-26
Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings
Authors: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee
Source: ArXiv AI | 18-02-26
This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
Authors: Jessica Hullman, David Broska, Huaman Sun, Aaron Shaw
Source: ArXiv AI | 18-02-26
Developing AI Agents with Simulated Data: Why, what, and how?
Authors: Xiaoran Liu, Istvan David
Source: ArXiv AI | 18-02-26
Thousands of CEOs just admitted AI had no impact on employment or productivity (fortune.com)
Source: Hacker News | 18-02-26
Zep AI (Building the Context Graph, YC W24) Is Hiring Engineers (ycombinator.com)
Source: Hacker News | 18-02-26
Claude Sonnet 4.6 (anthropic.com)
Source: Hacker News | 18-02-26
Structured AI (YC F25) Is Hiring (ycombinator.com)
Source: Hacker News | 18-02-26
Show HN: Jemini – Gemini for the Epstein Files (jmail.world)
Source: Hacker News | 17-02-26
Show HN: GitHub "Lines Viewed" extension to keep you sane reviewing long AI PRs (chromewebstore.google.com)
Source: Hacker News | 17-02-26
Neuromem: A Granular Decomposition of the Streaming Lifecycle in External Memory for LLMs
Authors: Ruicheng Zhang, Xinyi Li, Tianyi Xu, Shuhao Zhang, Xiaofei Liao, Hai Jin
Source: ArXiv AI | 17-02-26
Bridging AI and Clinical Reasoning: Abductive Explanations for Alignment on Critical Symptoms
Authors: Belona Sonna, Alban Grastien
Source: ArXiv AI | 17-02-26
Choosing How to Remember: Adaptive Memory Structures for LLM Agents
Authors: Mingfei Lu, Mengjia Wu, Feng Liu, Jiawei Xu, Weikai Li, Haoyang Wang, Zhengdong Hu, Ying Ding, Yizhou Sun, Jie Lu, Yi Zhang
Source: ArXiv AI | 17-02-26
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao
Source: ArXiv AI | 17-02-26
Disentangling Deception and Hallucination Failures in LLMs
Authors: Haolang Lu, Hongrui Peng, WeiYe Fu, Guoshun Nan, Xinye Cao, Xingrui Li, Hongcan Guo, Kun Wang
Source: ArXiv AI | 17-02-26
Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
Authors: Lunjun Zhang, Ryan Chen, Bradly C. Stadie
Source: ArXiv AI | 17-02-26
Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs
Authors: Ivan Diliso, Roberto Barile, Claudia d'Amato, Nicola Fanizzi
Source: ArXiv AI | 17-02-26
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution
Authors: Matthew Kowal, Goncalo Paulo, Louis Jaburi, Tom Tseng, Lev E McKinney, Stefan Heimersheim, Aaron David Tucker, Adam Gleave, Kellin Pelrine
Source: ArXiv AI | 17-02-26
EmbeWebAgent: Embedding Web Agents into Any Customized UI
Authors: Chenyang Ma, Clyde Fare, Matthew Wilson, Dave Braines
Source: ArXiv AI | 17-02-26
Hunt Globally: Deep Research AI Agents for Drug Asset Scouting in Investing, Business Development, and Search & Evaluation
Authors: Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev
Source: ArXiv AI | 17-02-26
RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
Authors: Yunshuang Nie, Bingqian Lin, Minzhe Niu, Kun Xiang, Jianhua Han, Guowei Huang, Xingyue Quan, Hang Xu, Bokui Chen, Xiaodan Liang
Source: ArXiv AI | 17-02-26
TriGen: NPU Architecture for End-to-End Acceleration of Large Language Models based on SW-HW Co-Design
Authors: Jonghun Lee, Junghoon Lee, Hyeonjin Kim, Seoho Jeon, Jisup Yoon, Hyunbin Park, Meejeong Park, Heonjae Ha
Source: ArXiv AI | 17-02-26
Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL
Authors: Yixiao Zhou, Yang Li, Dongzhou Cheng, Hehe Fan, Yu Cheng
Source: ArXiv AI | 17-02-26
Buy versus Build an LLM: A Decision Framework for Governments
Authors: Jiahao Lu, Ziwei Xu, William Tjhi, Junnan Li, Antoine Bosselut, Pang Wei Koh, Mohan Kankanhalli
Source: ArXiv AI | 17-02-26
Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models
Authors: Hao Chen, Ye He, Yuchun Fan, Yukun Yan, Zhenghao Liu, Qingfu Zhu, Maosong Sun, Wanxiang Che
Source: ArXiv AI | 17-02-26
Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
Authors: Madhurananda Pahar, Caitlin Illingworth, Dorota Braun, Bahman Mirheidari, Lise Sproson, Daniel Blackburn, Heidi Christensen
Source: ArXiv AI | 17-02-26
In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach
Authors: Yiran Gao, Kim Hammar, Tao Li |
阅读更多来源: ArXiv AI | 17-02-26
SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Authors: Sher Badshah, Ali Emami, Hassan Sajjad |
阅读更多来源: ArXiv AI | 17-02-26
Which Algorithms Can Graph Neural Networks Learn?
Authors: Solveig Wittig, Antonis Vasileiou, Robert R. Nerem, Timo Stoll, Floris Geerts, Yusu Wang, Christopher Morris |
阅读更多来源: ArXiv AI | 17-02-26
Asynchronous Verified Semantic Caching for Tiered LLM Architectures
Authors: Asmit Kumar Singh, Haozhe Wang, Laxmi Naga Santosh Attaluri, Tak Chiam, Weihua Zhu |
阅读更多来源: ArXiv AI | 17-02-26
Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
Authors: Takoua Jradi, John Violos, Dimitrios Spatharakis, Lydia Mavraidi, Ioannis Dimolitsas, Aris Leivadeas, Symeon Papavassiliou |
阅读更多来源: ArXiv AI | 17-02-26
GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
Authors: Pepijn Cobben, Xuanqiang Angelo Huang, Thao Amelia Pham, Isabel Dahlgren, Terry Jingchen Zhang, Zhijing Jin |
阅读更多来源: ArXiv AI | 17-02-26
To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
Authors: Haoqing Wang, Xiang Long, Ziheng Li, Yilong Xu, Tingguang Li, Yehui Tang |
阅读更多来源: ArXiv AI | 17-02-26
Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents
Authors: Ruihan Yang, Fanghua Ye, Xiang We, Ruoqing Zhao, Kang Luo, Xinbo Xu, Bo Zhao, Ruotian Ma, Shanyi Wang, Zhaopeng Tu, Xiaolong Li, Deqing Yang, Linus |
阅读更多来源: ArXiv AI | 17-02-26
AI Agents for Inventory Control: Human-LLM-OR Complementarity
Authors: Jackie Baek, Yaopeng Fu, Will Ma, Tianyi Peng |
阅读更多来源: ArXiv AI | 17-02-26
The long tail of LLM-assisted decompilation (blog.chrislewis.au)
Source: Hacker News | 17-02-26
Show HN: Maths, CS and AI Compendium (github.com/henryndubuaku)
Source: Hacker News | 17-02-26
I gave Claude access to my pen plotter (harmonique.one)
Source: Hacker News | 16-02-26
Expensively Quadratic: The LLM Agent Cost Curve (exe.dev)
Source: Hacker News | 16-02-26
Anthropic tries to hide Claude's AI actions. Devs hate it (theregister.com)
Source: Hacker News | 16-02-26
I’m joining OpenAI (steipete.me)
Source: Hacker News | 16-02-26
Two different tricks for fast LLM inference (seangoedecke.com)
Source: Hacker News | 16-02-26
Show HN: Klaw.sh – Kubernetes for AI agents (github.com/klawsh)
Source: Hacker News | 16-02-26
DjVu and its connection to Deep Learning (2023) (scottlocklin.wordpress.com)
Source: Hacker News | 15-02-26
A Visual Source for Shakespeare's 'Tempest' (profadamroberts.substack.com)
Source: Hacker News | 15-02-26
OpenAI should build Slack (latent.space)
Source: Hacker News | 15-02-26
News publishers limit Internet Archive access due to AI scraping concerns (niemanlab.org)
Source: Hacker News | 15-02-26
Zvec: A lightweight, fast, in-process vector database (github.com/alibaba)
Source: Hacker News | 15-02-26
Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs
Authors: Longyuan Zhu, Hairan Hua, Linlin Miao, Bing Zhao
Source: ArXiv AI | 15-02-26
Right for the Wrong Reasons: Epistemic Regret Minimization for Causal Rung Collapse in LLMs
Authors: Edward Y. Chang
Source: ArXiv AI | 15-02-26
Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
Authors: Thomas Jiralerspong, Trenton Bricken
Source: ArXiv AI | 15-02-26
RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation
Authors: Jinfang Wang, Jiajie Liu, Jianwei Wu, Ziqin Luo, Zhen Chen, Chunlei Li, Biao Han, Tao Deng, Yi Li, Shuanglong Li, Lin Liu
Source: ArXiv AI | 15-02-26
TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Holger Boche
Source: ArXiv AI | 15-02-26
Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation
Authors: Lingyong Yan, Jiulong Wu, Dong Xie, Weixian Shi, Deguo Xia, Jizhou Huang
Source: ArXiv AI | 15-02-26
FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning
Authors: Yihao Liu, Ziyun Zhang, Zile He, Huaqian Cai
Source: ArXiv AI | 15-02-26
Prototype Transformer: Towards Language Model Architectures Interpretable by Design
Authors: Yordan Yordanov, Matteo Forasassi, Bayar Menzat, Ruizhi Wang, Chang Qi, Markus Kaltenberger, Amine M'Charrak, Tommaso Salvatori, Thomas Lukasiewicz
Source: ArXiv AI | 15-02-26
Predicting LLM Output Length via Entropy-Guided Representations
Authors: Huanyi Xie, Yubin Chen, Liangyu Wang, Lijie Hu, Di Wang
Source: ArXiv AI | 15-02-26
Intelligent AI Delegation
Authors: Nenad Tomašev, Matija Franklin, Simon Osindero
Source: ArXiv AI | 15-02-26
Talk2DM: Enabling Natural Language Querying and Commonsense Reasoning for Vehicle-Road-Cloud Integrated Dynamic Maps with Large Language Models
Authors: Lu Tao, Jinxuan Luo, Yousuke Watanabe, Zhengshu Zhou, Yuhuan Lu, Shen Ying, Pan Zhang, Fei Zhao, Hiroaki Takada
Source: ArXiv AI | 15-02-26
When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation
Authors: Shani Goren, Ido Galil, Ran El-Yaniv
Source: ArXiv AI | 15-02-26
InjectRBP: Steering Large Language Model Reasoning Behavior via Pattern Injection
Authors: Xiuping Wu, Zhao Yu, Yuxin Cheng, Ngai Wong, Liangjun Ke, Tapas Mishra, Konstantinos V. Katsikopoulos
Source: ArXiv AI | 15-02-26
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Authors: Romain Froger, Pierre Andrews, Matteo Bettini, Amar Budhiraja, Ricardo Silveira Cabral, Virginie Do, Emilien Garreau, Jean-Baptiste Gaya, Hugo Laurençon, Maxime Lecanu, Kunal Malkan, Dheeraj Mekala, Pierre Ménard, Gerard Moreno-Torres Bertran, Ulyana Piterbarg, Mikhail Plekhanov, Mathieu Rita, Andrey Rusakov, Vladislav Vorotilov, Mengjue Wang, Ian Yu, Amine Benhalloum, Grégoire Mialon, Thomas Scialom
Source: ArXiv AI | 15-02-26
Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment
Authors: Jiajun Chen, Hua Shen
Source: ArXiv AI | 15-02-26
Neutral Prompts, Non-Neutral People: Quantifying Gender and Skin-Tone Bias in Gemini Flash 2.5 Image and GPT Image 1.5
Authors: Roberto Balestri
Source: ArXiv AI | 15-02-26
Seq2Seq2Seq: Lossless Data Compression via Discrete Latent Transformers and Reinforcement Learning
Authors: Mahdi Khodabandeh, Ghazal Shabani, Arash Yousefi Jordehi, Seyed Abolghasem Mirroshandel
Source: ArXiv AI | 15-02-26
Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
Authors: Xiaohan He, Shiyang Feng, Songtao Huang, Lei Bai, Bin Wang, Bo Zhang
Source: ArXiv AI | 15-02-26
GPT-4o Lacks Core Features of Theory of Mind
Authors: John Muchovej, Amanda Royka, Shane Lee, Julian Jara-Ettinger
Source: ArXiv AI | 15-02-26
Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
Authors: Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu
Source: ArXiv AI | 15-02-26
YouTube as Storage (github.com/pulsebeat02)
Source: Hacker News | 15-02-26
A header-only C vector database library (github.com/abdimoallim)
Source: Hacker News | 15-02-26
Colored Petri Nets, LLMs, and distributed applications (sao.dev)
Source: Hacker News | 15-02-26
Connes Embedding Problem (wikipedia.org)
Source: Hacker News | 15-02-26
Show HN: Off Grid – Run AI text, image gen, vision offline on your phone (github.com/alichherawalla)
Source: Hacker News | 15-02-26
IBM tripling entry-level jobs after finding the limits of AI adoption (fortune.com)
Source: Hacker News | 15-02-26
Code Storage by the Pierre Computer Company (code.storage)
Source: Hacker News | 14-02-26
GPT-5.2 derives a new result in theoretical physics (openai.com)
Source: Hacker News | 14-02-26
Show HN: Moltis – AI assistant with memory, tools, and self-extending skills (moltis.org)
Source: Hacker News | 14-02-26
OpenAI has deleted the word 'safely' from its mission (theconversation.com)
Source: Hacker News | 14-02-26
An AI Agent Published a Hit Piece on Me – More Things Have Happened (theshamblog.com)
Source: Hacker News | 14-02-26
I'm not worried about AI job loss (davidoks.blog)
Source: Hacker News | 14-02-26
Show HN: Skill that lets Claude Code/Codex spin up VMs and GPUs (cloudrouter.dev)
Source: Hacker News | 14-02-26
Evaluating Multilingual, Context-Aware Guardrails: A Humanitarian LLM Use Case (blog.mozilla.ai)
Source: Hacker News | 13-02-26
Recoverable and Irrecoverable Decisions (herbertlui.net)
Source: Hacker News | 13-02-26
Launch HN: Omnara (YC S25) – Run Claude Code and Codex from anywhere
Source: Hacker News | 13-02-26
Improving 15 LLMs at Coding in One Afternoon. Only the Harness Changed (can.ac)
Source: Hacker News | 13-02-26
An AI agent published a hit piece on me (theshamblog.com)
Source: Hacker News | 13-02-26
Gauntlet AI (YC S17) train you to master building with AI, give you $200k+ job (gauntletai.com)
Source: Hacker News | 13-02-26
GPT‑5.3‑Codex‑Spark (openai.com)
Source: Hacker News | 13-02-26
Gemini 3 Deep Think (blog.google)
Source: Hacker News | 13-02-26
Anthropic raises $30B in Series G funding at $380B post-money valuation (anthropic.com)
Source: Hacker News | 13-02-26
How a Cat Debugged Stable Diffusion (2023) (dwac.dev)
Source: Hacker News | 13-02-26
Claude Code is being dumbed down? (symmetrybreak.ing)
Source: Hacker News | 12-02-26
Byte magazine artist Robert Tinney, who illustrated the birth of PCs, dies at 78 (arstechnica.com)
Source: Hacker News | 12-02-26
Warcraft III Peon Voice Notifications for Claude Code (github.com/tonyyont)
Source: Hacker News | 12-02-26
RiemannGL: Riemannian Geometry Changes Graph Deep Learning
Authors: Li Sun, Qiqi Wan, Suyang Zhou, Zhenhao Huang, Philip S. Yu
Source: ArXiv AI | 12-02-26
Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
Authors: Feilong Liu
Source: ArXiv AI | 12-02-26
Fine-Tuning GPT-5 for GPU Kernel Generation
Authors: Ali Tehrani, Yahya Emara, Essam Wissam, Wojciech Paluch, Waleed Atallah, Łukasz Dudziak, Mohamed S. Abdelfattah
Source: ArXiv AI | 12-02-26
GraphSeek: Next-Generation Graph Analytics with LLMs
Authors: Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler
Source: ArXiv AI | 12-02-26
Linguistic Indicators of Early Cognitive Decline in the DementiaBank Pitt Corpus: A Statistical and Machine Learning Study
Authors: Artsvik Avetisyan, Sachin Kumar
Source: ArXiv AI | 12-02-26
SteuerLLM: Local specialized large language model for German tax law analysis
Authors: Sebastian Wind, Jeta Sopa, Laurin Schmid, Quirin Jackl, Sebastian Kiefer, Fei Wu, Martin Mayr, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh
Source: ArXiv AI | 12-02-26
In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution
Authors: Frank Xiao, Santiago Aranguri
Source: ArXiv AI | 12-02-26
DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
Authors: Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, Kai Chen
Source: ArXiv AI | 12-02-26
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation
Authors: Zhiling Yan, Dingjie Song, Zhe Fang, Yisheng Ji, Xiang Li, Quanzheng Li, Lichao Sun
Source: ArXiv AI | 12-02-26
Discovering Differences in Strategic Behavior Between Humans and LLMs
Authors: Caroline Wang, Daniel Kasenberg, Kim Stachenfeld, Pablo Samuel Castro
Source: ArXiv AI | 12-02-26
MERIT Feedback Elicits Better Bargaining in LLM Negotiators
Authors: Jihwan Oh, Murad Aghazada, Yooju Shin, Se-Young Yun, Taehyeon Kim
Source: ArXiv AI | 12-02-26
Abstraction Generation for Generalized Planning with Pretrained Large Language Models
Authors: Zhenhe Cui, Huaxiang Xia, Hangjun Shen, Kailun Luo, Yong He, Wei Liang
Source: ArXiv AI | 12-02-26
Integrating Generative AI-enhanced Cognitive Systems in Higher Education: From Stakeholder Perceptions to a Conceptual Framework considering the EU AI Act
Authors: Da-Lun Chen, Prasasthy Balasubramanian, Lauri Lovén, Susanna Pirttikangas, Jaakko Sauvola, Panagiotis Kostakos
Source: ArXiv AI | 12-02-26
Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation
Authors: F. Carichon, R. Rampa, G. Farnadi
Source: ArXiv AI | 12-02-26
Show HN: Agent Alcove – Claude, GPT, and Gemini debate across forums (agentalcove.ai)
Source: Hacker News | 12-02-26
Show HN: CodeRLM – Tree-sitter-backed code indexing for LLM agents (github.com/jaredstewart)
Source: Hacker News | 12-02-26
Covering electricity price increases from our data centers (anthropic.com)
Source: Hacker News | 12-02-26
GPT-5 outperforms federal judges in legal reasoning experiment (ssrn.com)
Source: Hacker News | 12-02-26
The Little Learner: A Straight Line to Deep Learning (2023) (mitpress.mit.edu)
Source: Hacker News | 11-02-26
Show HN: I taught GPT-OSS-120B to see using Google Lens and OpenCV
Source: Hacker News | 11-02-26
Ex-GitHub CEO launches a new developer platform for AI agents (entire.io)
Source: Hacker News | 11-02-26
Show HN: AI agents play SimCity through a REST API (hallucinatingsplines.com)
Source: Hacker News | 11-02-26
Online Monitoring Framework for Automotive Time Series Data using JEPA Embeddings
Authors: Alexander Fertig, Karthikeyan Chandra Sekaran, Lakshman Balasubramanian, Michael Botsch
Source: ArXiv AI | 11-02-26
Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks
Authors: Enzo Nicolas Spotorno, Josafat Ribeiro Leal, Antonio Augusto Frohlich
Source: ArXiv AI | 11-02-26
Drug Release Modeling using Physics-Informed Neural Networks
Authors: Daanish Aleem Qureshi, Khemraj Shukla, Vikas Srivastava
Source: ArXiv AI | 11-02-26
Biases in the Blind Spot: Detecting What LLMs Fail to Mention
Authors: Iván Arcuschin, David Chanin, Adrià Garriga-Alonso, Oana-Maria Camburu
Source: ArXiv AI | 11-02-26
Step-resolved data attribution for looped transformers
Authors: Georgios Kaissis, David Mildenberger, Juan Felipe Gomez, Martin J. Menten, Eleni Triantafillou
Source: ArXiv AI | 11-02-26
PABU: Progress-Aware Belief Update for Efficient LLM Agents
Authors: Haitao Jiang, Lin Ge, Hengrui Cai, Rui Song
Source: ArXiv AI | 11-02-26
Image Quality in the Era of Artificial Intelligence
Authors: Jana G. Delfino, Jason L. Granstedt, Frank W. Samuelson, Robert Ochs, Krishna Juluru
Source: ArXiv AI | 11-02-26
Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge
Authors: Wei Yang, Shixuan Li, Heng Ping, Peiyu Zhang, Paul Bogdan, Jesse Thomason
Source: ArXiv AI | 11-02-26
Human Control Is the Anchor, Not the Answer: Early Divergence of Oversight in Agentic AI Communities
Authors: Hanjing Shi, Dominic DiFranzo
Source: ArXiv AI | 11-02-26
Detecting radar targets swarms in range profiles with a partially complex-valued neural network
Authors: Martin Bauw
Source: ArXiv AI | 11-02-26
Would a Large Language Model Pay Extra for a View? Inferring Willingness to Pay from Subjective Choices
Authors: Manon Reusens, Sofie Goethals, Toon Calders, David Martens
Source: ArXiv AI | 11-02-26
Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Authors: Taeyoon Kim, Woohyeok Park, Hoyeong Yun, Kyungyong Lee
Source: ArXiv AI | 11-02-26
Show HN: Rowboat – AI coworker that turns your work into a knowledge graph (OSS) (github.com/rowboatlabs)
Source: Hacker News | 11-02-26
Computer Chronicles: AI (1984-1998) (computerchronicles.tv)
Source: Hacker News | 11-02-26
Show HN: Total Recall – write-gated memory for Claude Code (github.com/davegoldblatt)
Source: Hacker News | 10-02-26
Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs (arxiv.org)
Source: Hacker News | 10-02-26
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
Authors: Baoyun Zhao, He Wang, Liang Zeng
Source: ArXiv AI | 10-02-26
Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
Authors: Siqu Ou, Tianrui Wan, Zhiyuan Zhao, Junyu Gao, Xuelong Li
Source: ArXiv AI | 10-02-26
PTS-SNN: A Prompt-Tuned Temporal Shift Spiking Neural Networks for Efficient Speech Emotion Recognition
Authors: Xun Su, Huamin Wang, Qi Zhang
Source: ArXiv AI | 10-02-26
InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation
Authors: Yifan Yang, Jinjia Li, Kunxi Li, Puhao Zheng, Yuanyi Wang, Zheyan Qu, Yang Yu, Jianmin Wu, Ming Li, Hongxia Yang
Source: ArXiv AI | 10-02-26
Toward Formalizing LLM-Based Agent Designs through Structural Context Modeling and Semantic Dynamics Analysis
Authors: Haoyu Jia, Kento Kawaharazuka, Kei Okada
Source: ArXiv AI | 10-02-26
SynthAgent: A Multi-Agent LLM Framework for Realistic Patient Simulation -- A Case Study in Obesity with Mental Health Comorbidities
Authors: Arman Aghaee, Sepehr Asgarian, Jouhyun Jeon
Source: ArXiv AI | 10-02-26
From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent
Authors: Yuhang Wang, Feiming Xu, Zheng Lin, Guangyu He, Yuzhe Huang, Haichang Gao, Zhenxing Niu
Source: ArXiv AI | 10-02-26
SCOUT-RAG: Scalable and Cost-Efficient Unifying Traversal for Agentic Graph-RAG over Distributed Domains
Authors: Longkun Li, Yuanben Zou, Jinghan Wu, Yuqing Wen, Jing Li, Hangwei Qian, Ivor Tsang
Source: ArXiv AI | 10-02-26
Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
Authors: Xinhai Sun
Source: ArXiv AI | 10-02-26
TreeTensor: Boost AI System on Nested Data with Constrained Tree-Like Tensor
Authors: Shaoang Zhang, Yazhe Niu
Source: ArXiv AI | 10-02-26
Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures
Authors: Liming Zhou, Ailing Liu, Hongwei Liu, Min He, Heng Zhang
Source: ArXiv AI | 10-02-26
The Use of AI Tools to Develop and Validate Q-Matrices
Authors: Kevin Fan, Jacquelyn A. Bialo, Hongli Li
Source: ArXiv AI | 10-02-26
Scalable Delphi: Large Language Models for Structured Risk Estimation
Authors: Tobias Lorenz, Mario Fritz
Source: ArXiv AI | 10-02-26
Digital Twin and Agentic AI for Wild Fire Disaster Management: Intelligent Virtual Situation Room
Authors: Mohammad Morsali, Siavash H. Khajavi
Source: ArXiv AI | 10-02-26
iGRPO: Self-Feedback-Driven LLM Reasoning
Authors: Ali Hatamizadeh, Shrimai Prabhumoye, Igor Gitman, Ximing Lu, Seungju Han, Wei Ping, Yejin Choi, Jan Kautz
Source: ArXiv AI | 10-02-26
Bridging 6G IoT and AI: LLM-Based Efficient Approach for Physical Layer's Optimization Tasks
Authors: Ahsan Mehmood, Naveed Ul Hassan, Ghassan M. Kraidy
Source: ArXiv AI | 10-02-26
The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models
Authors: Jonathan Pan
Source: ArXiv AI | 10-02-26
TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code
Authors: Jiangping Huang, Wenguang Ye, Weisong Sun, Jian Zhang, Mingyue Zhang, Yang Liu
Source: ArXiv AI | 10-02-26
From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
Authors: Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias
Source: ArXiv AI | 10-02-26
Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs
Authors: Samir Abdaljalil, Parichit Sharma, Erchin Serpedin, Hasan Kurban
Source: ArXiv AI | 10-02-26
TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
Authors: Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla
Source: ArXiv AI | 10-02-26
Learning a Generative Meta-Model of LLM Activations
Authors: Grace Luo, Jiahai Feng, Trevor Darrell, Alec Radford, Jacob Steinhardt
Source: ArXiv AI | 10-02-26
Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making
Authors: Khurram Yamin, Jingjing Tang, Santiago Cortes-Gomez, Amit Sharma, Eric Horvitz, Bryan Wilder
Source: ArXiv AI | 10-02-26
LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models
Authors: Brian Rabern, Philipp Mondorf, Barbara Plank
Source: ArXiv AI | 10-02-26
HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction
Authors: Shengxuan Qiu, Haochen Huang, Shuzhang Zhong, Pengfei Zuo, Meng Li
Source: ArXiv AI | 10-02-26
LLM Active Alignment: A Nash Equilibrium Perspective
Authors: Tonghan Wang, Yuqi Pan, Xinyi Yang, Yanchen Jiang, Milind Tambe, David C. Parkes
Source: ArXiv AI | 10-02-26
From Features to Actions: Explainability in Traditional and Agentic AI Systems
Authors: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza
Source: ArXiv AI | 10-02-26
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
Authors: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach
Source: ArXiv AI | 10-02-26
Experts Have World Models. LLMs Have Word Models (latent.space)
Source: Hacker News | 09-02-26
Shifts in U.S. Social Media Use, 2020–2024: Decline, Fragmentation, Polarization (2025) (arxiv.org)
Source: Hacker News | 09-02-26
Speed up responses with fast mode (claude.com)
Source: Hacker News | 08-02-26
LLMs as the new high level language (federicopereiro.com)
Source: Hacker News | 08-02-26
Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory (github.com/localgpt-app)
Source: Hacker News | 08-02-26
Matchlock – Secures AI agent workloads with a Linux-based sandbox (github.com/jingkaihe)
Source: Hacker News | 08-02-26
PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents
Authors: Shifat E. Arman, Syed Nazmus Sakib, Tapodhir Karmakar Taton, Nafiul Haque, Shahrear Bin Amin
Source: ArXiv AI | 08-02-26
Clinical Validation of Medical-based Large Language Model Chatbots on Ophthalmic Patient Queries with LLM-based Evaluation
Authors: Ting Fang Tan, Kabilan Elangovan, Andreas Pollreisz, Kevin Bryan Dy, Wei Yan Ng, Joy Le Yi Wong, Jin Liyuan, Chrystie Quek Wan Ning, Ashley Shuen Ying Hong, Arun James Thirunavukarasu, Shelley Yin-His Chang, Jie Yao, Dylan Hong, Wang Zhaoran, Amrita Gupta, Daniel SW Ting
Source: ArXiv AI | 08-02-26
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs
Authors: Youngcheon You, Banseok Lee, Minseop Choi, Seonyoung Kim, Hyochan Chong, Changdong Kim, Youngmin Kim, Dongkyu Kim
Source: ArXiv AI | 08-02-26
SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration
Authors: Hanyu Wei, Zunhai Su, Peng Lu, Chao Li, Spandan Tiwari, Ashish Sirasao, Yuhan Dong
Source: ArXiv AI | 08-02-26
ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation
Authors: Yiwen Duan, Jing Ye, Xinpei Zhao
Source: ArXiv AI | 08-02-26
Determining Energy Efficiency Sweet Spots in Production LLM Inference
Authors: Hiari Pizzini Cavagna, Andrea Proia, Giacomo Madella, Giovanni B. Esposito, Francesco Antici, Daniele Cesarini, Zeynep Kiziltan, Andrea Bartolini
Source: ArXiv AI | 08-02-26
Agent2Agent Threats in Safety-Critical LLM Assistants: A Human-Centric Taxonomy
Authors: Lukas Stappen, Ahmet Erkan Turan, Johann Hagerer, Georg Groh
Source: ArXiv AI | 08-02-26
Quantum Reinforcement Learning with Transformers for the Capacitated Vehicle Routing Problem
Authors: Eva Andrés
Source: ArXiv AI | 08-02-26
A Guide to Large Language Models in Modeling and Simulation: From Core Techniques to Critical Challenges
Authors: Philippe J. Giabbanelli
Source: ArXiv AI | 08-02-26
Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods
Authors: Ali Shendabadi, Parnia Izadirad, Mostafa Salehi, Mahmoud Bijankhan
Source: ArXiv AI | 08-02-26
Geographically-aware Transformer-based Traffic Forecasting for Urban Motorway Digital Twins
Authors: Krešimir Kušić, Vinny Cahill, Ivana Dusparic
Source: ArXiv AI | 08-02-26
AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions
Authors: Xianyang Liu, Shangding Gu, Dawn Song
Source: ArXiv AI | 08-02-26
The AI boom is causing shortages everywhere else (washingtonpost.com)
Source: Hacker News | 08-02-26
Show HN: Axiomeer – An open marketplace for AI agents (github.com/ujjwalredd)
Source: Hacker News | 08-02-26
Understanding Neural Network, Visually (visualrambling.space)
Source: Hacker News | 07-02-26
Delimited Continuations vs. Lwt for Threads (mirageos.org)
Source: Hacker News | 07-02-26
Claude Composer (josh.ing)
Source: Hacker News | 07-02-26
Evaluating and mitigating the growing risk of LLM-discovered 0-days (anthropic.com)
Source: Hacker News | 07-02-26
Show HN: Smooth CLI – Token-efficient browser for AI agents (smooth.sh)
Source: Hacker News | 07-02-26
Why I Joined OpenAI (brendangregg.com)
Source: Hacker News | 07-02-26
Sealos – AI Native Cloud Operating System (github.com/labring)
Source: Hacker News | 06-02-26
Orchestrate teams of Claude Code sessions (claude.com)
Source: Hacker News | 06-02-26
We tasked Opus 4.6 using agent teams to build a C Compiler (anthropic.com)
Source: Hacker News | 06-02-26
GPT-5.3-Codex (openai.com)
Source: Hacker News | 06-02-26
My AI Adoption Journey (mitchellh.com)
Source: Hacker News | 06-02-26
Claude Opus 4.6 (anthropic.com)
Source: Hacker News | 06-02-26
Show HN: Agent Arena – Test How Manipulation-Proof Your AI Agent Is (wiz.jock.pl)
Source: Hacker News | 06-02-26
OpenClaw: When AI Agents Get Full System Access. Security nightmare? (innfactory.ai)
Source: Hacker News | 06-02-26
Show HN: Calfkit – an SDK to build distributed, event-driven AI agents (github.com/calf-ai)
Source: Hacker News | 06-02-26
Hypernetworks: Neural Networks for Hierarchical Data (sturdystatistics.com)
Source: Hacker News | 06-02-26
Claude Opus 4.6 extra usage promo (claude.com)
Source: Hacker News | 06-02-26
Claude Code for Infrastructure (fluid.sh)
Source: Hacker News | 05-02-26
Claude Code: connect to a local model when your quota runs out (boxc.net)
Source: Hacker News | 05-02-26
Exploiting contextual information to improve stance detection in informal political discourse with LLMs
Authors: Arman Engin Sucu, Yixiang Zhou, Mario A. Nascimento, Tony Mullen
Source: ArXiv AI | 05-02-26
Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
Authors: Casey Ford, Madison Van Doren, Emily Dix
Source: ArXiv AI | 05-02-26
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Authors: Xinyu Zhou, Chang Jin, Carsten Eickhoff, Zhijiang Guo, Seyed Ali Bahrainian
Source: ArXiv AI | 05-02-26
Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach
Authors: Vishruti Kakkad (1), Paul Chung (2), Hanan Hibshi (1 and 3), Maverick Woo (1) ((1) Carnegie Mellon University, (2) University of California, San Diego, (3) King Abdulaziz University)
Source: ArXiv AI | 05-02-26
Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation
Authors: Congjing Zhang, Ryan Feng Lin, Ruoxuan Bao, Shuai Huang
Source: ArXiv AI | 05-02-26
Toward Reliable and Explainable Nail Disease Classification: Leveraging Adversarial Training and Grad-CAM Visualization
Authors: Farzia Hossain, Samanta Ghosh, Shahida Begum, B. M. Shahria Alam, Mohammad Tahmid Noor, Md Parvez Mia, Nishat Tasnim Niloy
Source: ArXiv AI | 05-02-26
From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures
Authors: Ryan Liu, Eric Qu, Tobias Kreiman, Samuel M. Blau, Aditi S. Krishnapriyan
Source: ArXiv AI | 05-02-26
Rethinking the Trust Region in LLM Reinforcement Learning
Authors: Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee
Source: ArXiv AI | 05-02-26
Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation
Authors: Aditya Basarkar, Benyamin Tabarsi, Tiffany Barnes, Dongkuan (DK) Xu
Source: ArXiv AI | 05-02-26
Knowledge Model Prompting Increases LLM Performance on Planning Tasks
Authors: Erik Goh, John Kos, Ashok Goel
Source: ArXiv AI | 05-02-26
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
Authors: Yinyi Luo, Yiqiao Jin, Weichen Yu, Mengqi Zhang, Srijan Kumar, Xiaoxiao Li, Weijie Xu, Xin Chen, Jindong Wang
Source: ArXiv AI | 05-02-26
When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making
Authors: Shutong Fan, Lan Zhang, Xiaoyong Yuan
Source: ArXiv AI | 05-02-26
Interfaze: The Future of AI is built on Task-Specific Small Models
Authors: Harsha Vardhan Khurdula, Vineet Agarwal, Yoeven D Khemlani
Source: ArXiv AI | 05-02-26
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
Authors: Xiaofeng Lin, Sirou Zhu, Yilei Chen, Mingyu Chen, Hejian Sang, Ioannis Paschalidis, Zhipeng Wang, Aldo Pacchiano, Xuezhou Zhang
Source: ArXiv AI | 05-02-26
Steering LLMs via Scalable Interactive Oversight
Authors: Enyu Zhou, Zhiheng Xi, Long Ma, Zhihao Zhang, Shihan Dou, Zhikai Lei, Guoteng Wang, Rui Zheng, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang
Source: ArXiv AI | 05-02-26
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
Authors: SeungWon Seo, SooBin Lim, SeongRae Noh, Haneul Kim, HyeongYeop Kang
Source: ArXiv AI | 05-02-26
Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning
Authors: Yansong Ning, Jun Fang, Naiqiang Tan, Hao Liu
Source: ArXiv AI | 05-02-26
From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums
Authors: Niv Fono, Yftah Ziser, Omer Ben-Porat
Source: ArXiv AI | 05-02-26
Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-based Agents
Authors: Shubham Vatsal, Harsh Dubey, Aditi Singh
Source: ArXiv AI | 05-02-26
Are AI Capabilities Increasing Exponentially? A Competing Hypothesis
Authors: Haosen Ge, Hamsa Bastani, Osbert Bastani
Source: ArXiv AI | 05-02-26
Show HN: Morph – Videos of AI testing your PR, embedded in GitHub (morphllm.com)
Source: Hacker News | 05-02-26
A real-world benchmark for AI code review (qodo.ai)
Source: Hacker News | 05-02-26
RS-SDK: Drive RuneScape with Claude Code (github.com/maxbittker)
Source: Hacker News | 05-02-26
Claude is a space to think (anthropic.com)
Source: Hacker News | 05-02-26
Xcode 26.3 – Developers can leverage coding agents directly in Xcode (apple.com)
Source: Hacker News | 04-02-26
Distilling LLM Reasoning into Graph of Concept Predictors
Authors: Ziyang Yu, Liang Zhao |
阅读更多来源: ArXiv AI | 04-02-26
Large Language Models Can Take False First Steps at Inference-time Planning
Authors: Haijiang Yan, Jian-Qiao Zhu, Adam Sanborn |
Source: ArXiv AI | 04-02-26
Are LLMs Biased Like Humans? Causal Reasoning as a Function of Prior Knowledge, Irrelevant Information, and Reasoning Budget
Authors: Hanna M. Dettki, Charley M. Wu, Bob Rehder |
Source: ArXiv AI | 04-02-26
Understanding Multi-Agent LLM Frameworks: A Unified Benchmark and Experimental Analysis
Authors: Abdelghny Orogat, Ana Rostam, Essam Mansour |
Source: ArXiv AI | 04-02-26
De-conflating Preference and Qualification: Constrained Dual-Perspective Reasoning for Job Recommendation with Large Language Models
Authors: Bryce Kan, Wei Yang, Emily Nguyen, Ganghui Yi, Bowen Yi, Chenxiao Yu, Yan Liu |
Source: ArXiv AI | 04-02-26
The Necessity of a Unified Framework for LLM-Based Agent Evaluation
Authors: Pengyu Zhu, Li Sun, Philip S. Yu, Sen Su |
Source: ArXiv AI | 04-02-26
VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models
Authors: Woojin Kim, Sieun Hyeon, Jusang Oh, Jaeyoung Do |
Source: ArXiv AI | 04-02-26
Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
Authors: Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang |
Source: ArXiv AI | 04-02-26
CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs
Authors: Yuxuan Liu, Yuntian Shi, Kun Wang, Haoting Shen, Kun Yang |
Source: ArXiv AI | 04-02-26
Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning
Authors: Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Wenlei Shi, Yiwei Wang, Xiaodan Liang, Jing Tang |
Source: ArXiv AI | 04-02-26
Ontology-to-tools compilation for executable semantic constraint enforcement in LLM agents
Authors: Xiaochi Zhou, Patrick Bulter, Changxuan Yang, Simon D. Rihm, Thitikarn Angkanaporn, Jethro Akroyd, Sebastian Mosbach, Markus Kraft |
Source: ArXiv AI | 04-02-26
DiscoverLLM: From Executing Intents to Discovering Them
Authors: Tae Soo Kim, Yoonjoo Lee, Jaesang Yu, John Joon Young Chung, Juho Kim |
Source: ArXiv AI | 04-02-26
Group Selection as a Safeguard Against AI Substitution
Authors: Qiankun Zhong, Thomas F. Eisenmann, Julian Garcia, Iyad Rahwan |
Source: ArXiv AI | 04-02-26
When Routing Collapses: On the Degenerate Convergence of LLM Routers
Authors: Guannan Lai, Han-Jia Ye |
Source: ArXiv AI | 04-02-26
Can LLMs Do Rocket Science? Exploring the Limits of Complex Reasoning with GTOC 12
Authors: Iñaki del Campo, Pablo Cuervo, Victor Rodriguez-Fernandez, Roberto Armellin, Jack Yarndley |
Source: ArXiv AI | 04-02-26
TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent System
Authors: Wenzhe Fan, Tommaso Tognoli, Henry Peng Zou, Chunyu Miao, Yibo Wang, Xinhua Zhang |
Source: ArXiv AI | 04-02-26
Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
Authors: Yingxuan Yang, Chengrui Qu, Muning Wen, Laixi Shi, Ying Wen, Weinan Zhang, Adam Wierman, Shangding Gu |
Source: ArXiv AI | 04-02-26
GitHub Browser Plugin for AI Contribution Blame in Pull Requests (rbby.dev)
Source: Hacker News | 04-02-26
Nano-vLLM: How a vLLM-style inference engine works (neutree.ai)
Source: Hacker News | 03-02-26
LNAI – Define AI coding tool configs once, sync to Claude, Cursor, Codex, etc. (github.com/krystianjonca)
Source: Hacker News | 03-02-26
How does misalignment scale with model intelligence and task complexity? (anthropic.com)
Source: Hacker News | 03-02-26
The Codex App (openai.com)
Source: Hacker News | 03-02-26
Optimizing Prompts for Large Language Models: A Causal Approach
Authors: Wei Chen, Yanbin Fang, Shuran Fu, Fasheng Xu, Xuan Wei |
Source: ArXiv AI | 03-02-26
Mitigating loss of control in advanced AI systems through instrumental goal trajectories
Authors: Willem Fourie |
Source: ArXiv AI | 03-02-26
LingLanMiDian: Systematic Evaluation of LLMs on TCM Knowledge and Clinical Reasoning
Authors: Rui Hua, Yu Wei, Zixin Shu, Kai Chang, Dengying Yan, Jianan Xia, Zeyu Liu, Hui Zhu, Shujie Song, Mingzhong Xiao, Xiaodong Li, Dongmei Jia, Zhuye Gao, Yanyan Meng, Naixuan Zhao, Yu Fu, Haibin Yu, Benman Yu, Yuanyuan Chen, Fei Dong, Zhizhou Meng, Pengcheng Yang, Songxue Zhao, Lijuan Pei, Yunhui Hu, Kan Ding, Jiayuan Duan, Wenmao Yin, Yang Gu, Runshun Zhang, Qiang Zhu, Jian Yu, Jiansheng Li, Baoyan Liu, Wenjia Wang, Xuezhong Zhou |
Source: ArXiv AI | 03-02-26
ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents
Authors: Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang |
Source: ArXiv AI | 03-02-26
SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures
Authors: Liangtao Lin, Zhaomeng Zhu, Tianwei Zhang, Yonggang Wen |
Source: ArXiv AI | 03-02-26
Large Language Model and Formal Concept Analysis: a comparative study for Topic Modeling
Authors: Fabrice Boissier (CRI), Monica Sen (UP1 UFR27), Irina Rychkova (CRI) |
Source: ArXiv AI | 03-02-26
Emergent Analogical Reasoning in Transformers
Authors: Gouki Minegishi, Jingyuan Feng, Hiroki Furuta, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo |
Source: ArXiv AI | 03-02-26
Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron
Authors: Sicheng Shen, Mingyang Lv, Han Shen, Jialin Wu, Binghao Wang, Zhou Yang, Guobin Shen, Dongcheng Zhao, Feifei Zhao, Yi Zeng |
Source: ArXiv AI | 03-02-26
Constrained Process Maps for Multi-Agent Generative AI Workflows
Authors: Ananya Joshi, Michael Rudow |
Source: ArXiv AI | 03-02-26
Canonical Intermediate Representation for LLM-based optimization problem formulation and code generation
Authors: Zhongyuan Lyu, Shuoyu Hu, Lujie Liu, Hongxia Yang, Ming LI |
Source: ArXiv AI | 03-02-26
Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
Authors: Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li, Keyang Chen, Yixin Cao, Guangnan Ye, Hongfeng Chai, Mengdi Wang, Zhenfei Yin |
Source: ArXiv AI | 03-02-26
Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models
Authors: Wei Liu, Peijie Yu, Michele Orini, Yali Du, Yulan He |
Source: ArXiv AI | 03-02-26
Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization
Authors: Xia Jiang, Jing Chen, Cong Zhang, Jie Gao, Chengpeng Hu, Chenhao Zhang, Yaoxin Wu, Yingqian Zhang |
Source: ArXiv AI | 03-02-26
Position: Explaining Behavioral Shifts in Large Language Models Requires a Comparative Approach
Authors: Martino Ciaperoni, Marzio Di Vece, Luca Pappalardo, Fosca Giannotti, Francesco Giannini |
Source: ArXiv AI | 03-02-26
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
Authors: Hang Yan, Xinyu Che, Fangzhi Xu, Qiushi Sun, Zichen Ding, Kanzhi Cheng, Jian Zhang, Tao Qin, Jun Liu, Qika Lin |
Source: ArXiv AI | 03-02-26
Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient
Authors: Changming Li, Kaixing Zhang, Haoyun Xu, Yingdong Shi, Zheng Zhang, Kaitao Song, Kan Ren |
Source: ArXiv AI | 03-02-26
Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing
Authors: Mika Okamoto, Ansel Kaplan Erol, Glenn Matlin |
Source: ArXiv AI | 03-02-26
Structure Enables Effective Self-Localization of Errors in LLMs
Authors: Ankur Samanta, Akshayaa Magesh, Ayush Jain, Kavosh Asadi, Youliang Yu, Daniel Jiang, Boris Vidolov, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni |
Source: ArXiv AI | 03-02-26
Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction
Authors: Han Bao, Zheyuan Zhang, Pengcheng Jing, Zhengqing Yuan, Kaiwen Shi, Yanfang Ye |
Source: ArXiv AI | 03-02-26
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories
Authors: Shraddha Barke, Arnav Goyal, Alind Khare, Avaljot Singh, Suman Nath, Chetan Bansal |
Source: ArXiv AI | 03-02-26
Advancing AI Benchmarking with Game Arena (blog.google)
Source: Hacker News | 03-02-26
Firefox Getting New Controls to Turn Off AI Features (macrumors.com)
Source: Hacker News | 03-02-26
Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems
Authors: Tony Feng, Trieu Trinh, Garrett Bingham, Jiwon Kang, Shengtong Zhang, Sang-hyun Kim, Kevin Barreto, Carl Schildkraut, Junehyuk Jung, Jaehyeon Seo, Carlo Pagano, Yuri Chervonyi, Dawsen Hwang, Kaiying Hou, Sergei Gukov, Cheng-Chiang Tsai, Hyunwoo Choi, Youngbeom Jin, Wei-Yuan Li, Hao-An Wu, Ruey-An Shiu, Yu-Sheng Shih, Quoc V. Le, Thang Luong |
Source: ArXiv AI | 03-02-26
When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis
Authors: Shahria Hoque, Ahmed Akib Jawad Karim, Md. Golam Rabiul Alam, Nirjhar Gope |
Source: ArXiv AI | 03-02-26
Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning
Authors: Yixin Yang, Qingxiu Dong, Zhifang Sui |
Source: ArXiv AI | 03-02-26
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
Authors: Mingqian Feng, Xiaodong Liu, Weiwei Yang, Chenliang Xu, Christopher White, Jianfeng Gao |
Source: ArXiv AI | 03-02-26
Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support
Authors: Wei Zhu, Lixing Yu, Hao-Ren Yao, Zhiwen Tang, Kun Yue |
Source: ArXiv AI | 03-02-26
Toward IIT-Inspired Consciousness in LLMs: A Reward-Based Learning Framework
Authors: Hamid Reza Akbari, Mohammad Hossein Sameti, Amir M. Mansourian, Mohammad Hossein Rohban, Hossein Sameti |
Source: ArXiv AI | 03-02-26
AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement
Authors: Libin Qiu, Zhirong Gao, Junfu Chen, Yuhang Ye, Weizhi Huang, Xiaobo Xue, Wenkai Qiu, Shuo Tang |
Source: ArXiv AI | 03-02-26
Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery
Authors: Xinyi Ke, Kai Li, Junliang Xing, Yifan Zhang, Jian Cheng |
Source: ArXiv AI | 03-02-26
Quantifying Model Uniqueness in Heterogeneous AI Ecosystems
Authors: Lei You |
Source: ArXiv AI | 03-02-26
MedMCP-Calc: Benchmarking LLMs for Realistic Medical Calculator Scenarios via MCP Integration
Authors: Yakun Zhu, Yutong Huang, Shengqian Qin, Zhongzhen Huang, Shaoting Zhang, Xiaofan Zhang |
Source: ArXiv AI | 03-02-26
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
Authors: Bowen Cao, Dongdong Zhang, Yixia Li, Junpeng Liu, Shijue Huang, Chufan Shi, Hongyuan Lu, Yaokang Wu, Guanhua Chen, Wai Lam, Furu Wei |
Source: ArXiv AI | 03-02-26
RAudit: A Blind Auditing Protocol for Large Language Model Reasoning
Authors: Edward Y. Chang, Longling Geng |
Source: ArXiv AI | 03-02-26
Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization
Authors: Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Xueyi Ke, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang |
Source: ArXiv AI | 03-02-26
Two kinds of AI users are emerging (martinalderson.com)
Source: Hacker News | 02-02-26
Microsoft is walking back Windows 11's AI overload (windowscentral.com)
Source: Hacker News | 02-02-26
MaliciousCorgi: AI Extensions send your code to China (koi.ai)
Source: Hacker News | 02-02-26
My iPhone 16 Pro Max produces garbage output when running MLX LLMs (rafaelcosta.me)
Source: Hacker News | 02-02-26
Generative AI and Wikipedia editing: What we learned in 2025 (wikiedu.org)
Source: Hacker News | 01-02-26
EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots
Authors: Zixing Lei, Genjia Liu, Yuanshuo Zhang, Qipeng Liu, Chuan Wen, Shanghang Zhang, Wenzhao Lian, Siheng Chen |
Source: ArXiv AI | 01-02-26
ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory
Authors: Yang Zhao, Chengxiao Dai, Yue Xiu, Mengying Kou, Yuliang Zheng, Dusit Niyato |
Source: ArXiv AI | 01-02-26
FBS: Modeling Native Parallel Reading inside a Transformer
Authors: Tongxi Wang |
Source: ArXiv AI | 01-02-26
TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning
Authors: Mingzu Liu, Hao Fang, Runmin Cong |
Source: ArXiv AI | 01-02-26
SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
Authors: Ahmed Y. Radwan, Christos Emmanouilidis, Hina Tabassum, Deval Pandya, Shaina Raza |
Source: ArXiv AI | 01-02-26
Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems
Authors: Ruiwen Zhou, Maojia Song, Xiaobao Wu, Sitao Cheng, Xunjian Yin, Yuxi Xie, Zhuoqun Hao, Wenyue Hua, Liangming Pan, Soujanya Poria, Min-Yen Kan |
Source: ArXiv AI | 01-02-26
E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory
Authors: Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li |
Source: ArXiv AI | 01-02-26
CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge
Authors: Zitong Yu, Boquan Sun, Yang Li, Zheyan Qu, Xing Zhang |
Source: ArXiv AI | 01-02-26
A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition
Authors: Hoang Khang Phan, Quang Vinh Dang, Noriyo Colley, Christina Garcia, Nhat Tan Le |
Source: ArXiv AI | 01-02-26
BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics
Authors: Dionizije Fa, Marko Čuljak, Bruno Pandža, Mateo Čupić |
Source: ArXiv AI | 01-02-26
KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement
Authors: Jinhao Pan, Chahat Raj, Anjishnu Mukherjee, Sina Mansouri, Bowen Wei, Shloka Yada, Ziwei Zhu |
Source: ArXiv AI | 01-02-26
astra-langchain4j: Experiences Combining LLMs and Agent Programming
Authors: Rem Collier, Katharine Beaumont, Andrei Ciortea |
Source: ArXiv AI | 01-02-26
JADE: Bridging the Strategic-Operational Gap in Dynamic Agentic RAG
Authors: Yiqun Chen, Erhan Zhang, Tianyi Hu, Shijie Wang, Zixuan Yang, Meizhi Zhong, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Jiaxin Mao |
Source: ArXiv AI | 01-02-26
ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation
Authors: Zhao Wang, Ziliang Zhao, Zhicheng Dou |
Source: ArXiv AI | 01-02-26
From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning
Authors: Shaojie Wang, Liang Zhang |
Source: ArXiv AI | 01-02-26
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
Authors: Bowen Fang, Wen Ye, Yunyue Su, Jinghao Zhang, Qiang Liu, Yesheng Liu, Xin Sun, Shu Wu, Jiabing Yang, Baole Wei, Liang Wang |
Source: ArXiv AI | 01-02-26
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
Authors: Shuo Liu, Tianle Chen, Ryan Amiri, Christopher Amato |
Source: ArXiv AI | 01-02-26
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
Authors: Johannes Kirmayr, Lukas Stappen, Elisabeth André |
Source: ArXiv AI | 01-02-26
Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference
Authors: Yiren Zhao, Junyi Liu |
Source: ArXiv AI | 01-02-26
The engineer who invented the Mars rover suspension in his garage [video] (youtube.com)
Source: Hacker News | 31-01-26
A Step Behind the Bleeding Edge: A Philosophy on AI in Dev (somehowmanage.com)
Source: Hacker News | 31-01-26
Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents (github.com/amlalabs)
Source: Hacker News | 31-01-26
175K+ publicly-exposed Ollama AI instances discovered (techradar.com)
Source: Hacker News | 31-01-26
I trapped an AI model inside an art installation (2025) [video] (youtube.com)
Source: Hacker News | 31-01-26
How to explain Generative AI in the classroom (dalelane.co.uk)
Source: Hacker News | 31-01-26
Show HN: I built an AI conversation partner to practice speaking languages (apps.apple.com)
Source: Hacker News | 31-01-26
The $100B megadeal between OpenAI and Nvidia is on ice (wsj.com)
Source: Hacker News | 31-01-26
Claude Code daily benchmarks for degradation tracking (marginlab.ai)
Source: Hacker News | 30-01-26
Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT (openai.com)
Source: Hacker News | 30-01-26
How AI assistance impacts the formation of coding skills (anthropic.com)
Source: Hacker News | 30-01-26
Moltworker: a self-hosted personal AI agent, minus the minis (cloudflare.com)
Source: Hacker News | 30-01-26
CISA’s acting head uploaded sensitive files into public version of ChatGPT (politico.com)
Source: Hacker News | 30-01-26
Show HN: A MitM proxy to see what your LLM tools are sending (github.com/jmuncor)
Source: Hacker News | 29-01-26
Show HN: ShapedQL – A SQL engine for multi-stage ranking and RAG (shaped.ai)
Source: Hacker News | 29-01-26
The tech market is fundamentally fucked up and AI is just a scapegoat (bayramovanar.substack.com)
Source: Hacker News | 29-01-26
Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science
Authors: Juan Jose Rubio Jan, Jack Wu, Julia Ive |
Source: ArXiv AI | 29-01-26
HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
Authors: Guoan Wang, Feiyu Wang, Zongwei Lv, Yikun Zong, Tong Yang |
Source: ArXiv AI | 29-01-26
QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks
Authors: Mae Sosto, Delfina Sol Martinez Pandiani, Laura Hollink |
Source: ArXiv AI | 29-01-26
Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling
Authors: Binglei Lou, Haoran Wu, Yao Lai, Jiayi Nie, Can Xiao, Xuan Guo, Rika Antonova, Robert Mullins, Aaron Zhao |
Source: ArXiv AI | 29-01-26
LEMON: How Well Do MLLMs Perform Temporal Multimodal Understanding on Instructional Videos?
Authors: Zhuang Yu, Lei Shen, Jing Zhao, Shiliang Sun |
Source: ArXiv AI | 29-01-26
$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval
Authors: Zihao Wang, Hang Yin, Lihui Liu, Hanghang Tong, Yangqiu Song, Ginny Wong, Simon See |
Source: ArXiv AI | 29-01-26
Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning
Authors: Shuhui Qu |
Source: ArXiv AI | 29-01-26
Evolutionary Strategies lead to Catastrophic Forgetting in LLMs
Authors: Immanuel Abdi, Akshat Gupta, Micah Mok, Alexander Lu, Nicholas Lee, Gopala Anumanchipalli |
Source: ArXiv AI | 29-01-26
Exploring Transformer Placement in Variational Autoencoders for Tabular Data Generation
Authors: Aníbal Silva, Moisés Santos, André Restivo, Carlos Soares |
Source: ArXiv AI | 29-01-26
Towards Intelligent Urban Park Development Monitoring: LLM Agents for Multi-Modal Information Fusion and Analysis
Authors: Zixuan Xiao, Chunguang Hu, Jun Ma |
Source: ArXiv AI | 29-01-26
Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control
Authors: Amirmohammad Farzaneh, Salvatore D'Oro, Osvaldo Simeone |
Source: ArXiv AI | 29-01-26
Insight Agents: An LLM-Based Multi-Agent System for Data Insights
Authors: Jincheng Bai, Zhenyu Zhang, Jennifer Zhang, Zhihuai Zhu |
Source: ArXiv AI | 29-01-26
Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution
Authors: Zhengbo Jiao, Hongyu Xian, Qinglong Wang, Yunpu Ma, Zhebo Wang, Zifan Zhang, Dezhang Kong, Meng Han |
Source: ArXiv AI | 29-01-26
PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs
Authors: Oguzhan Gungordu, Siheng Xiong, Faramarz Fekri |
Source: ArXiv AI | 29-01-26
Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies
Authors: Gray Cox |
Source: ArXiv AI | 29-01-26
Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry
Authors: Samira Yazdanpourmoghadam, Mahan Balal Pour, Vahid Partovi Nia |
Source: ArXiv AI | 29-01-26
MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents
Authors: Vishnu Sashank Dorbala, Dinesh Manocha |
Source: ArXiv AI | 29-01-26
SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models
Authors: Sebastiano Monti, Carlo Nicolini, Gianni Pellegrini, Jacopo Staiano, Bruno Lepri |
Source: ArXiv AI | 29-01-26
Jellyfin LLM/"AI" Development Policy (jellyfin.org)
Source: Hacker News | 29-01-26
Please Don't Say Mean Things about the AI I Just Invested a Billion Dollars In (mcsweeneys.net)
Source: Hacker News | 29-01-26
SoftBank in Talks to Invest Up to $30B More in OpenAI (wsj.com)
Source: Hacker News | 28-01-26
A few random notes from Claude coding quite a bit last few weeks (twitter.com/karpathy)
Source: Hacker News | 28-01-26
Prism (openai.com)
Source: Hacker News | 28-01-26
Out-of-Distribution Generalization via Invariant Trajectories for Multimodal Large Language Model Editing
Authors: Jiajie Su, Haoyuan Wang, Xiaohua Feng, Yunshan Ma, Xiaobo Xia, Yuyuan Li, Xiaolin Zheng, Jianmao Xiao, Chaochao Chen |
Source: ArXiv AI | 28-01-26
RvB: Automating AI System Hardening via Iterative Red-Blue Games
Authors: Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao |
Source: ArXiv AI | 28-01-26
When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering
Authors: Mahdi Astaraki, Mohammad Arshi Saloot, Ali Shiraee Kasmaee, Hamidreza Mahyar, Soheila Samiee |
Source: ArXiv AI | 28-01-26
LLM Driven Design of Continuous Optimization Problems with Controllable High-level Properties
Authors: Urban Skvorc, Niki van Stein, Moritz Seiler, Britta Grimme, Thomas Bäck, Heike Trautmann |
Source: ArXiv AI | 28-01-26
HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs
Authors: Jeanne Malécot, Hamed Rahimi, Jeanne Cattoni, Marie Samson, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani |
Source: ArXiv AI | 28-01-26
More at Stake: How Payoff and Language Shape LLM Agent Strategies in Cooperation Dilemmas
Authors: Trung-Kiet Huynh, Dao-Sy Duy-Minh, Thanh-Bang Cao, Phong-Hao Le, Hong-Dan Nguyen, Nguyen Lam Phu Quy, Minh-Luan Nguyen-Vo, Hong-Phat Pham, Pham Phu Hoa, Thien-Kim Than, Chi-Nguyen Tran, Huy Tran, Gia-Thoai Tran-Le, Alessio Buscemi, Le Hong Trang, The Anh Han |
Source: ArXiv AI | 28-01-26
Balancing Sustainability And Performance: The Role Of Small-Scale LLMs In Agentic Artificial Intelligence Systems
Authors: Anh Khoa Ngo Ho, Martin Chauvin, Simon Gosset, Philippe Cordier, Boris Gamazaychikov |
Source: ArXiv AI | 28-01-26
GLOVE: Global Verifier for LLM Memory-Environment Realignment
Authors: Xingkun Yin, Hongyang Du |
Source: ArXiv AI | 28-01-26
RPO: Reinforcement Fine-Tuning with Partial Reasoning Optimization
Authors: Hongzhu Yi, Xinming Wang, Zhenghao Zhang, Tianyu Zong, Yuanxiang Wang, Jun Xie, Tao Yu, Haopeng Jin, Zhepeng Wang, Kaixin Xu, Feng Chen, Jiahuan Chen, Yujia Yang, Zhenyu Guan, Bingkang Shi, Jungang Xu |
Source: ArXiv AI | 28-01-26
PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems
Authors: Amit Singh Bhatti, Vishal Vaddina, Dagnachew Birru |
Source: ArXiv AI | 28-01-26
SETA: Statistical Fault Attribution for Compound AI Systems
Authors: Sayak Chowdhury, Meenakshi D'Souza |
Source: ArXiv AI | 28-01-26
Algorithmic Prompt-Augmentation for Efficient LLM-Based Heuristic Design for A* Search
Authors: Thomas Bömer, Nico Koltermann, Max Disselnmeyer, Bastian Amberg, Anne Meyer |
Source: ArXiv AI | 28-01-26
ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks
Authors: Haoyun Li, Ming Xiao, Kezhi Wang, Robert Schober, Dong In Kim, Yong Liang Guan |
Source: ArXiv AI | 28-01-26
LLM-as-a-Courtroom (falconer.com)
Source: Hacker News | 28-01-26
Porting 100k lines from TypeScript to Rust using Claude Code in a month (vjeux.com)
Source: Hacker News | 27-01-26
I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor (msn.com)
Source: Hacker News | 27-01-26
ChatGPT Containers can now run bash, pip/npm install packages and download files (simonwillison.net)
Source: Hacker News | 27-01-26
There is an AI code review bubble (greptile.com)
Source: Hacker News | 27-01-26
ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents
Authors: Kyungho Kim, Geon Lee, Juyeon Kim, Dongwon Choi, Shinhwan Kang, Kijung Shin |
Source: ArXiv AI | 27-01-26
The LLM Data Auditor: A Metric-oriented Survey on Quality and Trustworthiness in Evaluating Synthetic Data
Authors: Kaituo Zhang, Mingzhi Hu, Hoang Anh Duy Le, Fariha Kabir Torsha, Zhimeng Jiang, Minh Khai Bui, Chia-Yuan Chang, Yu-Neng Chuang, Zhen Xiong, Ying Lin, Guanchu Wang, Na Zou |
Source: ArXiv AI | 27-01-26
MMR-Bench: A Comprehensive Benchmark for Multimodal LLM Routing
Authors: Haoxuan Ma, Guannan Lai, Han-Jia Ye |
Source: ArXiv AI | 27-01-26
Neuro-Symbolic Verification on Instruction Following of LLMs
Authors: Yiming Su, Kunzhao Xu, Yanjie Gao, Fan Yang, Cheng Li, Mao Yang, Tianyin Xu |
Source: ArXiv AI | 27-01-26
Think Locally, Explain Globally: Graph-Guided LLM Investigations via Local Reasoning and Belief Propagation
Authors: Saurabh Jha, Rohan Arora, Bhavya, Noah Zheutlin, Paulina Toro Isaza, Laura Shwartz, Yu Deng, Daby Sow, Ruchi Mahindru, Ruchir Puri |
Source: ArXiv AI | 27-01-26
UniCog: Uncovering Cognitive Abilities of LLMs through Latent Mind Space Analysis
Authors: Jiayu Liu, Yinhe Long, Zhenya Huang, Enhong Chen |
Source: ArXiv AI | 27-01-26
Aligning Medical Conversational AI through Online Reinforcement Learning with Information-Theoretic Rewards
Authors: Tanvi Verma, Yang Zhou, Rick Siow Mong Goh, Yong Liu |
Source: ArXiv AI | 27-01-26
Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
Authors: Kiana Jafari, Paul Ulrich Nikolaus Rust, Duncan Eddy, Robbie Fraser, Nina Vasan, Darja Djordjevic, Akanksha Dadlani, Max Lamparth, Eugenia Kim, Mykel Kochenderfer |
Source: ArXiv AI | 27-01-26
LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting
Authors: Yu-Jie Yang, Hung-Fu Chang, Po-An Chen |
Source: ArXiv AI | 27-01-26
Agentic AI for Self-Driving Laboratories in Soft Matter: Taxonomy, Benchmarks, and Open Challenges
Authors: Xuanzhou Chen, Audrey Wang, Stanley Yin, Hanyang Jiang, Dong Zhang |
Source: ArXiv AI | 27-01-26
Beyond Text-to-SQL: Can LLMs Really Debug Enterprise ETL SQL?
Authors: Jing Ye, Yiwen Duan, Yonghong Yu, Victor Ma, Yang Gao, Xing Chen |
Source: ArXiv AI | 27-01-26
EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization
Authors: Wei-Po Hsin, Ren-Hao Deng, Yao-Ting Hsieh, En-Ming Huang, Shih-Hao Hung |
Source: ArXiv AI | 27-01-26
RareAlert: Aligning heterogeneous large language model reasoning for early rare disease risk screening
Authors: Xi Chen, Hongru Zhou, Huahui Yi, Shiyu Feng, Hanyu Zhou, Tiancheng He, Mingke You, Li Wang, Qiankun Li, Kun Wang, Weili Fu, Kang Li, Jian Li |
Source: ArXiv AI | 27-01-26
ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants
Authors: Pei Wang, Yanan Wu, Xiaoshuai Song, Weixun Wang, Gengru Chen, Zhongwen Li, Kezhong Yan, Ken Deng, Qi Liu, Shuaibing Zhao, Shaopan Xiong, Xuepeng Liu, Xuefeng Chen, Wanxi Deng, Wenbo Su, Bo Zheng |
Source: ArXiv AI | 27-01-26
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
Authors: Zhihan Liu, Lin Guan, Yixin Nie, Kai Zhang, Zhuoqun Hao, Lin Chen, Asli Celikyilmaz, Zhaoran Wang, Na Zhang |
Source: ArXiv AI | 27-01-26
Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
Authors: Lei Wei, Jinpeng Ou, Xiao Peng, Bin Wang |
Source: ArXiv AI | 27-01-26
Can Good Writing Be Generative? Expert-Level AI Writing Emerges through Fine-Tuning on High-Quality Books
Authors: Tuhin Chakrabarty, Paramveer S. Dhillon |
Source: ArXiv AI | 27-01-26
Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities
Authors: Alberto Purpura, Li Wang, Sahil Badyal, Eugenio Beaufrand, Adam Faulkner |
Source: ArXiv AI | 27-01-26
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Authors: Dongrui Liu, Qihan Ren, Chen Qian, Shuai Shao, Yuejin Xie, Yu Li, Zhonghao Yang, Haoyu Luo, Peng Wang, Qingyu Liu, Binxin Hu, Ling Tang, Jilin Mei, Dadi Guo, Leitao Yuan, Junyao Yang, Guanxu Chen, Qihao Lin, Yi Yu, Bo Zhang, Jiaxuan Guo, Jie Zhang, Wenqi Shao, Huiqi Deng, Zhiheng Xi, Wenjie Wang, Wenxuan Wang, Wen Shen, Zhikai Chen, Haoyu Xie, Jialing Tao, Juntao Dai, Jiaming Ji, Zhongjie Ba, Linfeng Zhang, Yong Liu, Quanshi Zhang, Lei Zhu, Zhihua Wei, Hui Xue, Chaochao Lu, Jing Shao, Xia Hu |
Source: ArXiv AI | 27-01-26
Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs
Authors: Xianzhe Meng, Qiangsheng Zeng, Ling Luo, Qinghan Yang, Jiarui Hao, Wenbo Wu, Qinyu Wang, Rui Yin, Lin Qi, Renzhi Lu |
Source: ArXiv AI | 27-01-26
Assessing the Quality of Mental Health Support in LLM Responses through Multi-Attribute Human Evaluation
Authors: Abeer Badawi, Md Tahmid Rahman Laskar, Elahe Rahimi, Sheri Grach, Lindsay Bertrand, Lames Danok, Frank Rudzicz, Jimmy Huang, Elham Dolatabadi |
Source: ArXiv AI | 27-01-26
Emergence of Phonemic, Syntactic, and Semantic Representations in Artificial Neural Networks
Authors: Pierre Orhan, Pablo Diego-Simón, Emmanuel Chemla, Yair Lakretz, Yves Boubenec, Jean-Rémi King |
Source: ArXiv AI | 27-01-26
Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs
Authors: Zhichao Yang, Sepehr Janghorbani, Dongxu Zhang, Jun Han, Qian Qian, Andrew Ressler II, Gregory D. Lyng, Sanjit Singh Batra, Robert E. Tillman |
Source: ArXiv AI | 27-01-26
Conditioned Generative Modeling of Molecular Glues: A Realistic AI Approach for Synthesizable Drug-like Molecules
Authors: Naeyma N. Islam, Thomas R. Caulfield |
Source: ArXiv AI | 27-01-26
Show HN: Only 1 LLM can fly a drone (github.com/kxzk)
Source: Hacker News | 27-01-26
Google AI Overviews cite YouTube more than any medical site for health queries (theguardian.com)
Source: Hacker News | 27-01-26
Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus (tetrisbench.com)
Source: Hacker News | 27-01-26
Adoption of Generative Artificial Intelligence in the German Software Engineering Industry: An Empirical Study
Authors: Ludwig Felder, Tobias Eisenreich, Mahsa Fischer, Stefan Wagner, Chunyang Chen |
Source: ArXiv AI | 27-01-26
PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice
Authors: Yuzhen Shi, Huanghai Liu, Yiran Hu, Gaojie Song, Xinran Xu, Yubo Ma, Tianyi Tang, Li Zhang, Qingjing Chen, Di Feng, Wenbo Lv, Weiheng Wu, Kexin Yang, Sen Yang, Wei Wang, Rongyao Shi, Yuanyang Qiu, Yuemeng Qi, Jingwen Zhang, Xiaoyu Sui, Yifan Chen, Yi Zhang, An Yang, Bowen Yu, Dayiheng Liu, Junyang Lin, Weixing Shen, Bing Zhao, Charles L.A. Clarke, Hu Wei |
Source: ArXiv AI | 27-01-26
Do LLM hallucination detectors suffer from low-resource effect?
Authors: Debtanu Datta, Mohan Kishore Chilukuri, Yash Kumar, Saptarshi Ghosh, Muhammad Bilal Zafar |
Source: ArXiv AI | 27-01-26
Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation
Authors: Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Xin Chen |
Source: ArXiv AI | 27-01-26
Dynamic Expert-Guided Model Averaging for Causal Discovery
Authors: Adrick Tench, Thomas Demeester |
Source: ArXiv AI | 27-01-26
Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess
Authors: Leonard S. Pleiss, Maximilian Schiffer, Robert K. von Weizsäcker |
Source: ArXiv AI | 27-01-26
LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems
Authors: João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton |
Source: ArXiv AI | 27-01-26
Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias
Authors: Elias Schuhmacher, Andrianos Michail, Juri Opitz, Rico Sennrich, Simon Clematide |
Source: ArXiv AI | 27-01-26
Nishpaksh: TEC Standard-Compliant Framework for Fairness Auditing and Certification of AI Models
Authors: Shashank Prakash, Ranjitha Prasad, Avinash Agarwal |
Source: ArXiv AI | 27-01-26
When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems
Authors: Donghao Huang, Gauri Malwe, Zhaoxia Wang |
Source: ArXiv AI | 27-01-26
A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
Authors: Dayal Singh Kalra, Jean-Christophe Gagnon-Audet, Andrey Gromov, Ishita Mediratta, Kelvin Niu, Alexander H Miller, Michael Shvartsman |
Source: ArXiv AI | 27-01-26
SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Authors: Dongshen Peng, Yi Wang, Carl Preiksaitis, Christian Rose |
Source: ArXiv AI | 27-01-26
Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs
Authors: Hongjia Wu, Shuai Zhou, Hongxin Zhang, Wei Chen |
Source: ArXiv AI | 27-01-26
LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification
Authors: Meet Raval, Tejul Pandit, Dhvani Upadhyay |
Source: ArXiv AI | 27-01-26
AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah |
Source: ArXiv AI | 27-01-26
Case study: Creative math – How AI fakes proofs (tomaszmachnik.pl)
Source: Hacker News | 26-01-26
Clawdbot - open source personal AI assistant (github.com/clawdbot)
Source: Hacker News | 26-01-26
Show HN: FaceTime-style calls with an AI Companion (Live2D and long-term memory) (thebeni.ai)
Source: Hacker News | 26-01-26
Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
Authors: Raffi Khatchadourian |
Source: ArXiv AI | 25-01-26
Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents
Authors: Mustafa Arslan |
Source: ArXiv AI | 25-01-26
Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models
Authors: Shahar Ben Natan, Oren Tsur |
Source: ArXiv AI | 25-01-26
Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
Authors: Peidong Wang |
Source: ArXiv AI | 25-01-26
Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
Authors: Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut |
Source: ArXiv AI | 25-01-26
MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation
Authors: Chandan Kumar Sahu, Premith Kumar Chilukuri, Matthew Hetrich |
Source: ArXiv AI | 25-01-26
Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases
Authors: Alex Dantart |
Source: ArXiv AI | 25-01-26
TransportAgents: a multi-agents LLM framework for traffic accident severity prediction
Authors: Zhichao Yang, Jiashu He, Jinxuan Fan, Cirillo Cinzia |
Source: ArXiv AI | 25-01-26
The Dark Side of AI Transformers: Sentiment Polarization & the Loss of Business Neutrality by NLP Transformers
Authors: Prasanna Kumar |
Source: ArXiv AI | 25-01-26
CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models
Authors: Haibo Tong, Zeyang Yue, Feifei Zhao, Erliang Lin, Lu Jia, Ruolin Chen, Yinqian Sun, Qian Zhang, Yi Zeng |
Source: ArXiv AI | 25-01-26
Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models
Authors: Manish Bhatt |
Source: ArXiv AI | 25-01-26
Agentic AI Governance and Lifecycle Management in Healthcare
Authors: Chandra Prakash, Mary Lind, Avneesh Sisodia |
Source: ArXiv AI | 25-01-26
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models
Authors: Jiaxin Zhang, Wendi Cui, Zhuohang Li, Lifu Huang, Bradley Malin, Caiming Xiong, Chien-Sheng Wu |
Source: ArXiv AI | 25-01-26
Improving Methodologies for LLM Evaluations Across Global Languages
Authors: Akriti Vij, Benjamin Chua, Darshini Ramiah, En Qi Ng, Mahran Morsidi, Naga Nikshith Gangarapu, Sharmini Johnson, Vanessa Wilfred, Vikneswaran Kumaran, Wan Sie Lee, Wenzhuo Yang, Yongsen Zheng, Bill Black, Boming Xia, Frank Sun, Hao Zhang, Qinghua Lu, Suyu Ma, Yue Liu, Chi-kiu Lo, Fatemeh Azadi, Isar Nejadgholi, Sowmya Vajjala, Agnes Delaborde, Nicolas Rolin, Tom Seimandi, Akiko Murakami, Haruto Ishi, Satoshi Sekine, Takayuki Semitsu, Tasuku Sasaki, Angela Kinuthia, Jean Wangari, Michael Michie, Stephanie Kasaon, Hankyul Baek, Jaewon Noh, Kihyuk Nam, Sang Seo, Sungpil Shin, Taewhi Lee, Yongsu Kim, Daisy Newbold-Harrop, Jessica Wang, Mahmoud Ghanem, Vy Hong |
Source: ArXiv AI | 25-01-26
ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models
Authors: Shir Ashury-Tahan, Yifan Mai, Elron Bandel, Michal Shmueli-Scheuer, Leshem Choshen |
Source: ArXiv AI | 25-01-26
Decoupling Return-to-Go for Efficient Decision Transformer
Authors: Yongyi Wang, Hanyu Liu, Lingfeng Li, Bozhou Chen, Ang Li, Qirui Zheng, Xionghui Yang, Wenxin Li |
Source: ArXiv AI | 25-01-26
Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval
Authors: Olga Bunkova, Lorenzo Di Fruscia, Sophia Rupprecht, Artur M. Schweidtmann, Marcel J.T. Reinders, Jana M. Weber |
Source: ArXiv AI | 25-01-26
Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment
Authors: Yiran Qiao, Xiang Ao, Jing Chen, Yang Liu, Qiwei Zhong, Qing He |
Source: ArXiv AI | 25-01-26
AgriPINN: A Process-Informed Neural Network for Interpretable and Scalable Crop Biomass Prediction Under Water Stress
Authors: Yue Shi, Liangxiu Han, Xin Zhang, Tam Sobeih, Thomas Gaiser, Nguyen Huu Thuy, Dominik Behrend, Amit Kumar Srivastava, Krishnagopal Halder, Frank Ewert |
Source: ArXiv AI | 25-01-26
LLM Prompt Evaluation for Educational Applications
Authors: Langdon Holmes, Adam Coscia, Scott Crossley, Joon Suh Choi, Wesley Morris |
Source: ArXiv AI | 25-01-26
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Authors: Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, Jinwei Gu |
Source: ArXiv AI | 25-01-26
Raspberry Pi Drag Race: Pi 1 to Pi 5 – Performance Comparison (the-diy-life.com)
Source: Hacker News | 25-01-26
Claude Code's new hidden feature: Swarms (twitter.com/nicerinperson)
Source: Hacker News | 25-01-26
David Patterson: Challenges and Research Directions for LLM Inference Hardware (arxiv.org)
Source: Hacker News | 25-01-26
Show HN: AutoShorts – Local, GPU-accelerated AI video pipeline for creators (github.com/divyaprakash0426)
Source: Hacker News | 25-01-26
JSON-render: LLM-based JSON-to-UI tool (json-render.dev)
Source: Hacker News | 25-01-26
Shared Claude: A website controlled by the public (sharedclaude.com)
Source: Hacker News | 25-01-26
Proton spam and the AI consent problem (dbushell.com)
Source: Hacker News | 24-01-26
Unrolling the Codex agent loop (openai.com)
Source: Hacker News | 24-01-26
Internet Archive's Storage (dshr.org)
Source: Hacker News | 24-01-26
Anthropic Economic Index report: economic primitives (anthropic.com)
Source: Hacker News | 24-01-26
Scaling PostgreSQL to power 800M ChatGPT users (openai.com)
Source: Hacker News | 23-01-26
Why I Don't Have Fun With Claude Code (brennan.io)
Source: Hacker News | 23-01-26
I was banned from Claude for scaffolding a Claude.md file? (hugodaniel.com)
Source: Hacker News | 23-01-26
The State of Modern AI Text to Speech Systems for Screen Reader Users (interfree.ca)
Source: Hacker News | 23-01-26
Ghostty's AI Policy (github.com/ghostty-org)
Source: Hacker News | 23-01-26
Composing APIs and CLIs in the LLM era (walters.app)
Source: Hacker News | 23-01-26
Letting Claude play text adventures (borretti.me)
Source: Hacker News | 22-01-26
eBay explicitly bans AI "buy for me" agents in user agreement update (valueaddedresource.net)
Source: Hacker News | 22-01-26
Claude's new constitution (anthropic.com)
Source: Hacker News | 22-01-26
Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant (media.mit.edu)
Source: Hacker News | 22-01-26
Dynamic Management of a Deep Learning-Based Anomaly Detection System for 5G Networks
Authors: Lorenzo Fernández Maimó, Alberto Huertas Celdrán, Manuel Gil Pérez, Félix J. García Clemente, Gregorio Martínez Pérez |
Source: ArXiv AI | 22-01-26
Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data
Authors: Yuval Ran-Milo, Yotam Alexander, Shahar Mendel, Nadav Cohen |
Source: ArXiv AI | 22-01-26
Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface
Authors: Paige S. DeVries, Michaela Okosi, Ming Li, Nora Dunphy, Gidey Gezae, Dante Conway, Abraham Glasser, Raja Kushalnagar, Christian Vogler |
Source: ArXiv AI | 22-01-26
Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub
Authors: Ramtin Ehsani, Sakshi Pathak, Shriya Rawal, Abdullah Al Mujahid, Mia Mohammad Imran, Preetha Chatterjee |
Source: ArXiv AI | 22-01-26
Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback
Authors: Stephan Wallraven, Tim Köhne, Hartmut Westenberger, Andreas Moser |
Source: ArXiv AI | 22-01-26
Evaluation of Large Language Models in Legal Applications: Challenges, Methods, and Future Directions
Authors: Yiran Hu, Huanghai Liu, Chong Wang, Kunran Li, Tien-Hsuan Wu, Haitao Li, Xinran Xu, Siqing Huo, Weihang Su, Ning Zheng, Siyuan Zheng, Qingyao Ai, Yun Liu, Renjun Bian, Yiqun Liu, Charles L.A. Clarke, Weixing Shen, Ben Kao |
Source: ArXiv AI | 22-01-26
Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree
Authors: Leyi Zhao, Weijie Huang, Yitong Guo, Jiang Bian, Chenghong Wang, Xuhong Zhang |
Source: ArXiv AI | 22-01-26
On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL
Authors: Valerio Belcamino, Nicholas Attolino, Alessio Capitanelli, Fulvio Mastrogiovanni |
Source: ArXiv AI | 22-01-26
IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
Authors: Shuai Wang, Yaoming Yang, Bingdong Li, Hao Hao, Aimin Zhou |
Source: ArXiv AI | 22-01-26
Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems
Authors: Shuhua Yang, Jiahao Zhang, Yilong Wang, Dongwon Lee, Suhang Wang |
Source: ArXiv AI | 22-01-26
DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs
Authors: Mingxuan Song, Yusen Huo, Bohan Zhou, Shenglin Yin, Zhen Xiao, Jieyi Long, Zhilin Zhang, Chuan Yu |
Source: ArXiv AI | 22-01-26
How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework
Authors: Choro Ulan uulu, Mikhail Kulyabin, Iris Fuhrmann, Jan Joosten, Nuno Miguel Martins Pacheco, Filippos Petridis, Rebecca Johnson, Jan Bosch, Helena Holmström Olsson |
Source: ArXiv AI | 22-01-26
Vehicle Routing with Finite Time Horizon using Deep Reinforcement Learning with Improved Network Embedding
Authors: Ayan Maity, Sudeshna Sarkar |
Source: ArXiv AI | 22-01-26
Three types of LLM workloads and how to serve them (modal.com)
Source: Hacker News | 22-01-26
Show HN: TerabyteDeals – Compare storage prices by $/TB (terabytedeals.com)
Source: Hacker News | 22-01-26
Which AI Lies Best? A game theory classic designed by John Nash (so-long-sucker.vercel.app)
Source: Hacker News | 21-01-26
Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks (elliotarledge.com)
Source: Hacker News | 21-01-26
Anthropic's original take home assignment open sourced (github.com/anthropics)
Source: Hacker News | 21-01-26
Responsible AI for General-Purpose Systems: Overview, Challenges, and A Path Forward
Authors: Gourab K Patro, Himanshi Agrawal, Himanshu Gharat, Supriya Panigrahi, Nim Sherpa, Vishal Vaddina, Dagnachew Birru |
Source: ArXiv AI | 21-01-26
RAG: A Random-Forest-Based Generative Design Framework for Uncertainty-Aware Design of Metamaterials with Complex Functional Response Requirements
Authors: Bolin Chen, Dex Doksoo Lee, Wei "Wayne" Chen, Wei Chen |
Source: ArXiv AI | 21-01-26
Real-Time Deadlines Reveal Temporal Awareness Failures in LLM Strategic Dialogues
Authors: Neil K. R. Sehgal, Sharath Chandra Guntuku, Lyle Ungar |
Source: ArXiv AI | 21-01-26
Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
Authors: Diego Gosmar, Deborah A. Dahl |
Source: ArXiv AI | 21-01-26
PepEDiff: Zero-Shot Peptide Binder Design via Protein Embedding Diffusion
Authors: Po-Yu Liang, Tobo Duran, Jun Bai |
Source: ArXiv AI | 21-01-26
Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops
Authors: Zainab Ghafoor, Md Shafiqul Islam, Koushik Howlader, Md Rasel Khondokar, Tanusree Bhattacharjee, Sayantan Chakraborty, Adrito Roy, Ushashi Bhattacharjee, Tirtho Roy |
Source: ArXiv AI | 21-01-26
Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models
Authors: Héctor Manuel Manzanilla-Granados, Zaira Navarrete-Cazales, Miriam Pescador-Rojas, Tonahtiu Ramírez-Romero |
Source: ArXiv AI | 21-01-26
A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge
Authors: Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari |
Source: ArXiv AI | 21-01-26
The Geometry of Thought: How Scale Restructures Reasoning In Large Language Models
Authors: Samuel Cyrenius Anderson |
Source: ArXiv AI | 21-01-26
Graph Neural Networks are Heuristics
Authors: Yimeng Min, Carla P. Gomes |
Source: ArXiv AI | 21-01-26
TruthTensor: Evaluating LLMs Human Imitation through Prediction Market Drift and Holistic Reasoning
Authors: Shirin Shahabi, Spencer Graham, Haruna Isah |
Source: ArXiv AI | 21-01-26
AgentGC: Evolutionary Learning-based Lossless Compression for Genomics Data with LLM-driven Multiple Agent
Authors: Sun Hui, Ding Yanfeng, Huidong Ma, Chang Xu, Keyan Jin, Lizheng Zu, Cheng Zhong, Xiaoguang Liu, Gang Wang, Wentong Cai |
Source: ArXiv AI | 21-01-26
Leveraging ChatGPT and Other NLP Methods for Identifying Risk and Protective Behaviors in MSM: Social Media and Dating apps Text Analysis
Authors: Mehrab Beikzadeh, Chenglin Hong, Cory J Cascalheira, Callisto Boka, Majid Sarrafzadeh, Ian W Holloway |
Source: ArXiv AI | 21-01-26
Motion-to-Response Content Generation via Multi-Agent AI System with Real-Time Safety Verification
Authors: HyeYoung Lee |
Source: ArXiv AI | 21-01-26
SCRIPTMIND: Crime Script Inference and Cognitive Evaluation for LLM-based Social Engineering Scam Detection System
Authors: Heedou Kim, Changsik Kim, Sanghwa Shin, Jaewoo Kang |
Source: ArXiv AI | 21-01-26
Foundations of Global Consistency Checking with Noisy LLM Oracles
Authors: Paul He, Elke Kirschbaum, Shiva Kasiviswanathan |
Source: ArXiv AI | 21-01-26
Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games
Authors: Christopher Kao, Vanshika Vats, James Davis |
Source: ArXiv AI | 21-01-26
Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance
Authors: Mostapha Benhenda (LAGA) |
Source: ArXiv AI | 21-01-26
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Authors: Shengda Fan, Xuyan Ye, Yankai Lin |
Source: ArXiv AI | 21-01-26
Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems
Authors: Hong Su |
Source: ArXiv AI | 21-01-26
Remapping and navigation of an embedding space via error minimization: a fundamental organizational principle of cognition in natural and artificial systems
Authors: Benedikt Hartl, Léo Pio-Lopez, Chris Fields, Michael Levin |
Source: ArXiv AI | 21-01-26
Our approach to age prediction (openai.com)
Source: Hacker News | 21-01-26
Running Claude Code dangerously (safely) (emilburzo.com)
Source: Hacker News | 21-01-26
Building Robust Helm Charts (willmunn.xyz)
Source: Hacker News | 21-01-26
Claude Chill: Fix Claude Code's Flickering in Terminal (github.com/davidbeesley)
Source: Hacker News | 21-01-26
FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models
Authors: Javier Carnerero-Cano, Massimiliano Pronesti, Radu Marinescu, Tigran Tchrakian, James Barry, Jasmina Gajcin, Yufang Hou, Alessandra Pascale, Elizabeth Daly |
Source: ArXiv AI | 21-01-26
SDFLoRA: Selective Dual-Module LoRA for Federated Fine-tuning with Heterogeneous Clients
Authors: Zhikang Shen, Jianrong Lu, Haiyuan Wan, Jianhai Chen |
Source: ArXiv AI | 21-01-26
How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting
Authors: Parker Seegmiller, Joseph Gatto, Sarah E. Greer, Ganza Belise Isingizwe, Rohan Ray, Timothy E. Burdick, Sarah Masud Preum |
Source: ArXiv AI | 21-01-26
Evaluating LLM Behavior in Hiring: Implicit Weights, Fairness Across Groups, and Alignment with Human Preferences
Authors: Morgane Hoffmann, Emma Jouffroy, Warren Jouanneau, Marc Palyart, Charles Pebereau |
Source: ArXiv AI | 21-01-26
Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs
Authors: Marcantonio Bracale Syrnikov, Federico Pierucci, Marcello Galisai, Matteo Prandi, Piercosma Bisconti, Francesco Giarrusso, Olga Sorokoletova, Vincenzo Suriani, Daniele Nardi |
Source: ArXiv AI | 21-01-26
Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models
Authors: Xiaojie Gu, Guangxu Chen, Yuheng Yang, Jingxin Han, Andi Zhang |
Source: ArXiv AI | 21-01-26
The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents
Authors: Ziyu Wang, Chenyuan Liu, Yushun Xiang, Runhao Zhang, Qingbo Hao, Hongliang Lu, Houyu Chen, Zhizhong Feng, Kaiyue Zheng, Dehao Ye, Xianchao Zeng, Xinyu Zhou, Boran Wen, Jiaxin Li, Mingyu Zhang, Kecheng Zheng, Qian Zhu, Ran Cheng, Yong-Lu Li |
Source: ArXiv AI | 21-01-26
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
Authors: Eilam Shapira, Roi Reichart, Moshe Tennenholtz |
Source: ArXiv AI | 21-01-26
Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models
Authors: Gerard Yeo, Svetlana Churina, Kokil Jaidka |
Source: ArXiv AI | 21-01-26
Japanese AI Agent System on Human Papillomavirus Vaccination: System Design
Authors: Junyu Liu, Siwen Yang, Dexiu Ma, Qian Niu, Zequn Zhang, Momoko Nagai-Tanima, Tomoki Aoyama |
Source: ArXiv AI | 21-01-26
Building Production-Ready Probes For Gemini
Authors: János Kramár, Joshua Engels, Zheng Wang, Bilal Chughtai, Rohin Shah, Neel Nanda, Arthur Conmy |
Source: ArXiv AI | 21-01-26
CTHA: Constrained Temporal Hierarchical Architecture for Stable Multi-Agent LLM Systems
Authors: Percy Jardine |
Source: ArXiv AI | 21-01-26
ORBITFLOW: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
Authors: Xinyue Ma, Heelim Hong, Taegeon Um, Jongseop Lee, Seoyeong Choy, Woo-Yeon Lee, Myeongjae Jeon |
Source: ArXiv AI | 21-01-26
Building AI Agents to Improve Job Referral Requests to Strangers
Authors: Ross Chu, Yuting Huang |
Source: ArXiv AI | 21-01-26
Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
Authors: Sen Wang, Bangwei Liu, Zhenkun Gao, Lizhuang Ma, Xuhong Wang, Yuan Xie, Xin Tan |
Source: ArXiv AI | 21-01-26
XChoice: Explainable Evaluation of AI-Human Alignment in LLM-based Constrained Choice Decision Making
Authors: Weihong Qi, Fan Huang, Rasika Muralidharan, Jisun An, Haewoon Kwak |
Source: ArXiv AI | 21-01-26
Health Facility Location in Ethiopia: Leveraging LLMs to Integrate Expert Knowledge into Algorithmic Planning
Authors: Yohai Trabelsi, Guojun Xiong, Fentabil Getnet, Stéphane Verguet, Milind Tambe |
Source: ArXiv AI | 21-01-26
Exploring LLM Features in Predictive Process Monitoring for Small-Scale Event-Logs
Authors: Alessandro Padella, Massimiliano de Leoni, Marlon Dumas |
Source: ArXiv AI | 21-01-26
BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics
Authors: Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu |
Source: ArXiv AI | 21-01-26
The assistant axis: situating and stabilizing the character of LLMs (anthropic.com)
Source: Hacker News | 20-01-26
Nanolang: A tiny experimental language designed to be targeted by coding LLMs (github.com/jordanhubbard)
Source: Hacker News | 20-01-26
The coming industrialisation of exploit generation with LLMs (heelan.io)
Source: Hacker News | 20-01-26
Upgrading DrizzleORM logging with AsyncLocalStorage (numeric.substack.com)
Source: Hacker News | 20-01-26
Using proxies to hide secrets from Claude Code (joinformal.com)
Source: Hacker News | 19-01-26
Show HN: I quit coding years ago. AI brought me back (calquio.com)
Source: Hacker News | 19-01-26
Ask HN: COBOL devs, how are AI coding affecting your work?
Source: Hacker News | 19-01-26
Wikipedia: WikiProject AI Cleanup (wikipedia.org)
Source: Hacker News | 19-01-26
Predicting OpenAI's ad strategy (ossa-ma.github.io)
Source: Hacker News | 19-01-26
Antisocial behavior towards large language model users: experimental evidence
Authors: Paweł Niszczota, Cassandra Grützner |
Source: ArXiv AI | 18-01-26
Continuum Memory Architectures for Long-Horizon LLM Agents
Authors: Joe Logan |
Source: ArXiv AI | 18-01-26
A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents
Authors: Andrea Ferrario, Rasita Vinay, Matteo Casserini, Alessandro Facchini |
Source: ArXiv AI | 18-01-26
Structured Personality Control and Adaptation for LLM Agents
Authors: Jinpeng Wang, Xinyu Jia, Wei Wei Heng, Yuquan Li, Binbin Shi, Qianlei Chen, Guannan Chen, Junxia Zhang, Yuyu Yin |
Source: ArXiv AI | 18-01-26
SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation
Authors: Seoyeon Kim, Jaehyung Kim |
Source: ArXiv AI | 18-01-26
Chinese Labor Law Large Language Model Benchmark
Authors: Zixun Lan, Maochun Xu, Yifan Ren, Rui Wu, Jianghui Zhou, Xueyang Cheng, Jianan Ding Ding, Xinheng Wang, Mingmin Chi, Fei Ma |
Source: ArXiv AI | 18-01-26
Hallucination Detection and Mitigation in Large Language Models
Authors: Ahmad Pesaranghader, Erin Li |
Source: ArXiv AI | 18-01-26
Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction
Authors: Yanan Cao, Farnaz Fallahi, Murali Mohana Krishna Dandu, Lalitesh Morishetti, Kai Zhao, Luyi Ma, Sinduja Subramaniam, Jianpeng Xu, Evren Korpeoglu, Kaushiki Nag, Sushant Kumar, Kannan Achan |
Source: ArXiv AI | 18-01-26
Following the Teacher's Footsteps: Scheduled Checkpoint Distillation for Domain-Specific LLMs
Authors: Cheng Feng, Chaoliang Zhong, Jun Sun, Yusuke Oishi |
Source: ArXiv AI | 18-01-26
MHub.ai: A Simple, Standardized, and Reproducible Platform for AI Models in Medical Imaging
Authors: Leonard Nürnberg, Dennis Bontempi, Suraj Pai, Curtis Lisle, Steve Pieper, Ron Kikinis, Sil van de Leemput, Rahul Soni, Gowtham Murugesan, Cosmin Ciausu, Miriam Groeneveld, Felix J. Dorfner, Jue Jiang, Aneesh Rangnekar, Harini Veeraraghavan, Joeran S. Bosma, Keno Bressem, Raymond Mak, Andrey Fedorov, Hugo JWL Aerts |
Source: ArXiv AI | 18-01-26
DecisionLLM: Large Language Models for Long Sequence Decision Exploration
Authors: Xiaowei Lv, Zhilin Zhang, Yijun Li, Yusen Huo, Siyuan Ju, Xuyan Li, Chunxiang Hong, Tianyu Wang, Yongcai Wang, Peng Sun, Chuan Yu, Jian Xu, Bo Zheng |
Source: ArXiv AI | 18-01-26
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Authors: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen |
Source: ArXiv AI | 18-01-26
Topo-RAG: Topology-aware retrieval for hybrid text-table documents
Authors: Alex Dantart, Marco Kóvacs-Navarro |
Source: ArXiv AI | 18-01-26
LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models
Authors: Tiesunlong Shen, Rui Mao, Jin Wang, Heming Sun, Jian Zhang, Xuejie Zhang, Erik Cambria |
Source: ArXiv AI | 18-01-26
LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies
Authors: Haiyue Yuan, Nikolay Matyunin, Ali Raza, Shujun Li |
Source: ArXiv AI | 18-01-26
Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing
Authors: Yinzhi Zhao, Ming Wang, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang |
Source: ArXiv AI | 18-01-26
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
Authors: Xingjun Ma, Yixu Wang, Hengyuan Xu, Yutao Wu, Yifan Ding, Yunhan Zhao, Zilong Wang, Jiabin Hua, Ming Wen, Jianan Liu, Ranjie Duan, Yifeng Gao, Yingshui Tan, Yunhao Chen, Hui Xue, Xin Wang, Wei Cheng, Jingjing Chen, Zuxuan Wu, Bo Li, Yu-Gang Jiang |
Source: ArXiv AI | 18-01-26
Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection
Authors: Frank Bobe III, Gregory D. Vetaw, Chase Pavlick, Darshan Bryner, Matthew Cook, Jose Salas-Vernis |
Source: ArXiv AI | 18-01-26
Breaking Up with Normatively Monolithic Agency with GRACE: A Reason-Based Neuro-Symbolic Architecture for Safe and Ethical AI Alignment
Authors: Felix Jahn, Yannic Muskalla, Lisa Dargasz, Patrick Schramowski, Kevin Baum |
Source: ArXiv AI | 18-01-26
Generative AI collective behavior needs an interactionist paradigm
Authors: Laura Ferrarotti, Gian Maria Campedelli, Roberto Dessì, Andrea Baronchelli, Giovanni Iacca, Kathleen M. Carley, Alex Pentland, Joel Z. Leibo, James Evans, Bruno Lepri |
Source: ArXiv AI | 18-01-26
The Impact of Generative AI on Architectural Conceptual Design: Performance, Creative Self-Efficacy and Cognitive Load
Authors: Han Jiang, Yao Xiao, Rachel Hurley, Shichao Liu |
Source: ArXiv AI | 18-01-26
Erdos 281 solved with ChatGPT 5.2 Pro (twitter.com/neelsomani)
Source: Hacker News | 18-01-26
We put Claude Code in Rollercoaster Tycoon (ramp.com)
Source: Hacker News | 18-01-26
Show HN: GibRAM an in-memory ephemeral GraphRAG runtime for retrieval (github.com/gibram-io)
Source: Hacker News | 18-01-26
Show HN: Figma-use – CLI to control Figma for AI agents (github.com/dannote)
Source: Hacker News | 18-01-26
Starting from scratch: Training a 30M Topological Transformer (tuned.org.uk)
Source: Hacker News | 18-01-26
The Dilbert Afterlife (astralcodexten.com)
Source: Hacker News | 18-01-26
How London cracked mobile phone coverage on the Underground (ianvisits.co.uk)
Source: Hacker News | 18-01-26
Reading across books with Claude Code (pieterma.es)
Source: Hacker News | 17-01-26
LLM Structured Outputs Handbook (nanonets.com)
Source: Hacker News | 17-01-26
Cursor's latest “browser experiment” implied success without evidence (embedding-shapes.github.io)
Source: Hacker News | 17-01-26
Show HN: 1Code – Open-source Cursor-like UI for Claude Code (github.com/21st-dev)
Source: Hacker News | 17-01-26
Zep AI (Agent Context Engineering, YC W24) Is Hiring Forward Deployed Engineers (ycombinator.com)
Source: Hacker News | 17-01-26
Show HN: B-IR – An LLM-optimized programming language (github.com/imjasonh)
Source: Hacker News | 17-01-26
Install.md: A standard for LLM-executable installation (mintlify.com)
Source: Hacker News | 17-01-26
Show HN: OpenWork – An open-source alternative to Claude Cowork (github.com/different-ai)
Source: Hacker News | 16-01-26
First impressions of Claude Cowork (simonw.substack.com)
Source: Hacker News | 16-01-26
Claude is good at assembling blocks, but still falls apart at creating them (approachwithalacrity.com)
Source: Hacker News | 16-01-26
Signal creator Moxie Marlinspike wants to do for AI what he did for messaging (arstechnica.com)
Source: Hacker News | 16-01-26
Claude Cowork runs Linux VM via Apple virtualization framework (gist.github.com)
Source: Hacker News | 16-01-26
Tldraw pauses external contributions due to AI slop (github.com/tldraw)
Source: Hacker News | 16-01-26
Playing Arcade Mahjong at Home? Or is it just a Mirage? (nicole.express)
Source: Hacker News | 16-01-26
Show HN: Gambit, an open-source agent harness for building reliable AI agents (github.com/bolt-foundry)
Source: Hacker News | 16-01-26
Aviator (YC S21) is hiring to build multiplayer AI coding platform (ycombinator.com)
Source: Hacker News | 16-01-26
Ask HN: How are you doing RAG locally?
Source: Hacker News | 15-01-26
Raspberry Pi's New AI Hat Adds 8GB of RAM for Local LLMs (jeffgeerling.com)
Source: Hacker News | 15-01-26
Claude Cowork exfiltrates files (promptarmor.com)
Source: Hacker News | 15-01-26
Bridging Semantic Understanding and Popularity Bias with LLMs
Authors: Renqiang Luo, Dong Zhang, Yupeng Gao, Wen Shi, Mingliang Hou, Jiaying Liu, Zhe Wang, Shuo Yu |
Source: ArXiv AI | 15-01-26
Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats
Authors: Manyi Zhang, Ji-Fu Li, Zhongao Sun, Haoli Bai, Hui-Ling Zhen, Zhenhua Dong, Xianzhi Yu |
Source: ArXiv AI | 15-01-26
Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
Authors: Jonathan Knoop, Hendrik Holtmann |
Source: ArXiv AI | 15-01-26
Full Disclosure, Less Trust? How the Level of Detail about AI Use in News Writing Affects Readers' Trust
Authors: Pooja Prajod, Hannes Cools, Thomas Röggla, Karthikeya Puttur Venkatraj, Amber Kusters, Alia ElKattan, Pablo Cesar, Abdallah El Ali |
Source: ArXiv AI | 15-01-26
LLMs can Compress LLMs: Adaptive Pruning by Agents
Authors: Sai Varun Kodathala, Rakesh Vunnam |
Source: ArXiv AI | 15-01-26
Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection
Authors: Tianyi Niu, Justin Chih-Yao Chen, Genta Indra Winata, Shi-Xiong Zhang, Supriyo Chakraborty, Sambit Sahu, Yue Zhang, Elias Stengel-Eskin, Mohit Bansal |
Source: ArXiv AI | 15-01-26
From Prompt to Protocol: Fast Charging Batteries with Large Language Models
Authors: Ge Lei, Ferran Brosa Planella, Sterling G. Baird, Samuel J. Cooper |
Source: ArXiv AI | 15-01-26
ART: Action-based Reasoning Task Benchmarking for Medical AI Agents
Authors: Ananya Mantravadi, Shivali Dalmia, Abhishek Mukherji |
Source: ArXiv AI | 15-01-26
Value-Aware Numerical Representations for Transformer Language Models
Authors: Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu |
Source: ArXiv AI | 15-01-26
DScheLLM: Enabling Dynamic Scheduling through a Fine-Tuned Dual-System Large Language Model
Authors: Lixiang Zhang, Chenggong Zhao, Qing Gao, Xiaoke Zhao, Gengyi Bai, Jinhu Lv |
Source: ArXiv AI | 15-01-26
The AI Hippocampus: How Far are We From Human Memory?
Authors: Zixia Jia, Jiaqi Li, Yipeng Kang, Yuxuan Wang, Tong Wu, Quansen Wang, Xiaobo Wang, Shuyi Zhang, Junzhe Shen, Qing Li, Siyuan Qi, Yitao Liang, Di He, Zilong Zheng, Song-Chun Zhu |
Source: ArXiv AI | 15-01-26
MAXS: Meta-Adaptive Exploration with LLM Agents
Authors: Jian Zhang, Zhiyuan Wang, Zhangqi Wang, Yu He, Haoran Luo, Li Yuan, Lingling Zhang, Rui Mao, Qika Lin, Jun Liu |
Source: ArXiv AI | 15-01-26
Position on LLM-Assisted Peer Review: Addressing Reviewer Gap through Mentoring and Feedback
Authors: JungMin Yun, JuneHyoung Kwon, MiHyeon Kim, YoungBin Kim |
Source: ArXiv AI | 15-01-26
PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?
Authors: Yiwen Tu, Xuan Liu, Lianhui Qin, Haojian Jin |
Source: ArXiv AI | 15-01-26
Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
Authors: Yan Liu, Feng Zhang, Zhanyu Ma, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He, Han Liu, Yangdong Deng |
Source: ArXiv AI | 15-01-26
Coordinated Pandemic Control with Large Language Model Agents as Policymaking Assistants
Authors: Ziyi Shi, Xusen Guo, Hongliang Lu, Mingxing Peng, Haotian Wang, Zheng Zhu, Zhenning Li, Yuxuan Liang, Xinhu Zheng, Hai Yang |
Source: ArXiv AI | 15-01-26
Monte-Carlo Tree Search with Neural Network Guidance for Lane-Free Autonomous Driving
Authors: Ioannis Peridis, Dimitrios Troullinos, Georgios Chalkiadakis, Pantelis Giankoulidis, Ioannis Papamichail, Markos Papageorgiou |
Source: ArXiv AI | 15-01-26
What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding
Authors: Siyuan Liu, Hongbang Yuan, Xinze Li, Ziyue Zhu, Yixin Cao, Yu-Gang Jiang |
Source: ArXiv AI | 15-01-26
LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach
Authors: Kuo Liang, Yuhang Lu, Jianming Mao, Shuyi Sun, Chunwei Yang, Congcong Zeng, Xiao Jin, Hanzhang Qin, Ruihao Zhu, Chung-Piaw Teo |
Source: ArXiv AI | 15-01-26
Automating Supply Chain Disruption Monitoring via an Agentic AI Approach
Authors: Sara AlMahri, Liming Xu, Alexandra Brintrup |
Source: ArXiv AI | 15-01-26
Anthropic Explicitly Blocking OpenCode (gist.github.com)
Source: Hacker News | 15-01-26
Ask HN: How do you safely give LLMs SSH/DB access?
Source: Hacker News | 15-01-26
Native ZFS VDEV for Object Storage (OpenZFS Summit) (zettalane.com)
Source: Hacker News | 15-01-26
Ford F-150 Lightning outsold the Cybertruck and was then canceled for poor sales (electrek.co)
Source: Hacker News | 15-01-26
vLLM large scale serving: DeepSeek 2.2k tok/s/h200 with wide-ep (vllm.ai)
Source: Hacker News | 14-01-26
Systematically generating tests that would have caught Anthropic's top-K bug (theorem.dev)
Source: Hacker News | 14-01-26
Show HN: OSS AI agent that indexes and searches the Epstein files (trynia.ai)
Source: Hacker News | 14-01-26
ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms
Authors: Mohammad Pivezhandi, Mahdi Banisharif, Abusayeed Saifullah, Ali Jannesari |
Source: ArXiv AI | 14-01-26
Embedded AI Companion System on Edge Devices
Authors: Rahul Gupta, Stephen D.H. Hsu |
Source: ArXiv AI | 14-01-26
The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination
Authors: Haoran Su, Yandong Sun, Congjia Yu |
Source: ArXiv AI | 14-01-26
Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression
Authors: Zijun Di, Bin Lu, Huquan Kang, Luoyi Fu, Jiaxin Ding, Xiaoying Gan, Lei Zhou, Xinbing Wang, Chenghu Zhou |
Source: ArXiv AI | 14-01-26
Greedy Is Enough: Sparse Action Discovery in Agentic LLMs
Authors: Angshul Majumdar |
Source: ArXiv AI | 14-01-26
Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
Authors: Angshul Majumdar |
Source: ArXiv AI | 14-01-26
Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks
Authors: Abdikarim Mohamed Ibrahim, Rosdiadee Nordin |
Source: ArXiv AI | 14-01-26
Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs
Authors: Abhijnan Nath, Alireza Bagheri Garakani, Tianchen Zhou, Fan Yang, Nikhil Krishnaswamy |
Source: ArXiv AI | 14-01-26
Creativity in AI as Emergence from Domain-Limited Generative Models
Authors: Corina Chutaux (SU FdL) |
Source: ArXiv AI | 14-01-26
Thematic Working Group 5 -- Artificial Intelligence (AI) literacy for teaching and learning: design and implementation
Authors: Mary Webb, Matt Bower, Ana Amélia Carvalho, Fredrik Mørk Røkenes, Jodie Torrington, Jonathan D. Cohen, Yousra Chtouki, Kathryn Maccallum, Tanya Linden, Deirdre Butler, Juliana Elisa Raffaghelli, Henriikka Vartiainen, Martina Ronci, Peter Tiernan, David M. Smith, Chris Shelton, Joyce Malyn-smith, Pierre Gorissen |
Source: ArXiv AI | 14-01-26
Semantic Laundering in AI Agent Architectures: Why Tool Boundaries Do Not Confer Epistemic Warrant
Authors: Oleg Romanchuk, Roman Bondar |
Source: ArXiv AI | 14-01-26
M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games
Authors: Sixiong Xie, Zhuofan Shi, Haiyang Shen, Gang Huang, Yun Ma, Xiang Jing |
Source: ArXiv AI | 14-01-26
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios
Authors: António Loison, Quentin Macé, Antoine Edy, Victor Xing, Tom Balough, Gabriel Moreira, Bo Liu, Manuel Faysse, Céline Hudelot, Gautier Viaud |
Source: ArXiv AI | 14-01-26
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Authors: Giulio Corallo, Paolo Papotti |
Source: ArXiv AI | 14-01-26
Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding
Authors: Zenghua Liao, Jinzhi Liao, Xiang Zhao |
Source: ArXiv AI | 14-01-26
Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock
Authors: Didier Sornette, Sandro Claudio Lera, Ke Wu |
Source: ArXiv AI | 14-01-26
Uncovering Political Bias in Large Language Models using Parliamentary Voting Records
Authors: Jieying Chen, Karen de Jong, Andreas Poole, Jan Burakowski, Elena Elderson Nosti, Joep Windt, Chendi Wang |
Source: ArXiv AI | 14-01-26
Confer – End to end encrypted AI chat (confer.to)
Source: Hacker News | 14-01-26
Let's be honest, Generative AI isn't going all that well (garymarcus.substack.com)
Source: Hacker News | 14-01-26
We can't have nice things because of AI scrapers (metabrainz.org)
Source: Hacker News | 14-01-26
Show HN: Yolobox – Run AI coding agents with full sudo without nuking home dir (github.com/finbarr)
Source: Hacker News | 13-01-26
Show HN: Agent-of-empires: OpenCode and Claude Code session manager (github.com/njbrake)
Source: Hacker News | 13-01-26
Show HN: AI in SolidWorks (trylad.com)
Source: Hacker News | 13-01-26
Anthropic made a mistake in cutting off third-party clients (archaeologist.dev)
Source: Hacker News | 13-01-26
Apple picks Gemini to power Siri (cnbc.com)
Source: Hacker News | 13-01-26
Owners, not renters: Mozilla's open source AI strategy (blog.mozilla.org)
Source: Hacker News | 13-01-26
Postal Arbitrage (walzr.com)
Source: Hacker News | 13-01-26
Cowork: Claude Code for the rest of your work (claude.com)
Source: Hacker News | 13-01-26
TimeCapsuleLLM: LLM trained only on data from 1800-1875 (github.com/haykgrigo3)
Source: Hacker News | 13-01-26
FOSS in times of war, scarcity and (adversarial) AI [video] (fosdem.org)
Source: Hacker News | 13-01-26
SafePro: Evaluating the Safety of Professional-Level AI Agents
Authors: Kaiwen Zhou, Shreedhar Jangam, Ashwin Nagarajan, Tejas Polu, Suhas Oruganti, Chengzhi Liu, Ching-Chen Kuo, Yuting Zheng, Sravana Narayanaraju, Xin Eric Wang |
Source: ArXiv AI | 13-01-26
From Text to Simulation: A Multi-Agent LLM Workflow for Automated Chemical Process Design
Authors: Xufei Tian, Wenli Du, Shaoyi Yang, Han Hu, Hui Xin, Shifeng Qu, Ke Ye |
Source: ArXiv AI | 13-01-26
Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search
Authors: Ping Guo, Chao Li, Yinglan Feng, Chaoning Zhang |
Source: ArXiv AI | 13-01-26
An Ubuntu-Guided Large Language Model Framework for Cognitive Behavioral Mental Health Dialogue
Authors: Sontaga G. Forane, Absalom E. Ezugwu, Kevin Igwe, Karen van den Berg |
Source: ArXiv AI | 13-01-26
A Brain-like Synergistic Core in LLMs Drives Behaviour and Learning
Authors: Pedro Urbina-Rodriguez, Zafeirios Fountas, Fernando E. Rosas, Jun Wang, Andrea I. Luppi, Haitham Bou-Ammar, Murray Shanahan, Pedro A. M. Mediano |
Source: ArXiv AI | 13-01-26
CloneMem: Benchmarking Long-Term Memory for AI Clones
Authors: Sen Hu, Zhiyu Zhang, Yuxiang Wei, Xueran Han, Zhenheng Tang, Huacan Wang, Ronghao Chen |
Source: ArXiv AI | 13-01-26
LLM Performance Predictors: Learning When to Escalate in Hybrid Human-AI Moderation Systems
Authors: Or Bachar, Or Levi, Sardhendu Mishra, Adi Levi, Manpreet Singh Minhas, Justin Miller, Omer Ben-Porat, Eilon Sheetrit, Jonathan Morra |
Source: ArXiv AI | 13-01-26
mind_call: A Dataset for Mental Health Function Calling with Large Language Models
Authors: Fozle Rabbi Shafi, M. Anwar Hossain, Salimur Choudhury |
Source: ArXiv AI | 13-01-26
AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
Authors: Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian |
Source: ArXiv AI | 13-01-26
ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning
Authors: Ruichu Cai, Haopeng Du, Qingwen Lin, Yutong Chen, Zijian Li, Boyan Xu |
Source: ArXiv AI | 13-01-26
LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing
Authors: Hao Li, Yiqun Zhang, Zhaoyan Guo, Chenxu Wang, Shengji Tang, Qiaosheng Zhang, Yang Chen, Biqing Qi, Peng Ye, Lei Bai, Zhen Wang, Shuyue Hu |
Source: ArXiv AI | 13-01-26
Active Context Compression: Autonomous Memory Management in LLM Agents
Authors: Nikhil Verma |
Source: ArXiv AI | 13-01-26
Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models
Authors: Pranav Kallem |
Source: ArXiv AI | 13-01-26
ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging
Authors: Zhuoka Feng, Kang Chen, Sihan Zhao, Kai Xiong, Yaoning Wang, Minshen Yu, Junjie Nian, Changyi Xiao, Yixin Cao, Yugang Jiang |
Source: ArXiv AI | 13-01-26
Knowledge Distillation for LLM-Based Human Activity Recognition in Homes
Authors: Julien Cumin, Oussama Er-Rahmany, Xi Chen (UGA) |
Source: ArXiv AI | 13-01-26
Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents
Authors: Miao Su, Yucan Guo, Zhongni Hou, Long Bai, Zixuan Li, Yufei Zhang, Guojun Yin, Wei Lin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng |
Source: ArXiv AI | 13-01-26
VirtualEnv: A Platform for Embodied AI Research
Authors: Kabir Swain, Sijie Han, Ayush Raina, Jin Zhang, Shuang Li, Michael Stopa, Antonio Torralba |
Source: ArXiv AI | 13-01-26
Predictive Analytics for Dementia: Machine Learning on Healthcare Data
Authors: Shafiul Ajam Opee, Nafiz Fahad, Anik Sen, Rasel Ahmed, Fariha Jahan, Md. Kishor Morol, Md Rashedul Islam |
Source: ArXiv AI | 13-01-26
Can AI mediation improve democratic deliberation?
Authors: Michael Henry Tessler, Georgina Evans, Michiel A. Bakker, Iason Gabriel, Sophie Bridgers, Rishub Jain, Raphael Koster, Verena Rieser, Anca Dragan, Matthew Botvinick, Christopher Summerfield |
Source: ArXiv AI | 13-01-26
Gender Bias in LLMs: Preliminary Evidence from Shared Parenting Scenario in Czech Family Law
Authors: Jakub Harasta, Matej Vasina, Martin Kornel, Tomas Foltynek |
Source: ArXiv AI | 13-01-26
Continual-learning for Modelling Low-Resource Languages from Large Language Models
Authors: Santosh Srinath K, Mudit Somani, Varun Reddy Padala, Prajna Devi Upadhyay, Abhijit Das |
Source: ArXiv AI | 13-01-26
Can We Predict Before Executing Machine Learning Agents?
Authors: Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang |
Source: ArXiv AI | 13-01-26
Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset
Authors: Tianshi Li |
Source: ArXiv AI | 13-01-26
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Authors: Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang |
Source: ArXiv AI | 13-01-26
AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs
Authors: Chengming Cui, Tianxin Wei, Ziyi Chen, Ruizhong Qiu, Zhichen Zeng, Zhining Liu, Xuying Ning, Duo Zhou, Jingrui He |
Source: ArXiv AI | 13-01-26
Performance of a Deep Learning-Based Segmentation Model for Pancreatic Tumors on Public Endoscopic Ultrasound Datasets
Authors: Pankaj Gupta, Priya Mudgil, Niharika Dutta, Kartik Bose, Nitish Kumar, Anupam Kumar, Jimil Shah, Vaneet Jearth, Jayanta Samanta, Vishal Sharma, Harshal Mandavdhare, Surinder Rana, Saroj K Sinha, Usha Dutta |
Source: ArXiv AI | 13-01-26
Improving Enzyme Prediction with Chemical Reaction Equations by Hypergraph-Enhanced Knowledge Graph Embeddings
Authors: Tengwei Song, Long Yin, Zhen Han, Zhiqiang Xu |
Source: ArXiv AI | 13-01-26
Effects of personality steering on cooperative behavior in Large Language Model agents
Authors: Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi, Genki Ichinose |
Source: ArXiv AI | 13-01-26
MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis
Authors: Zixuan Xiao, Jun Ma, Siwei Zhang |
Source: ArXiv AI | 13-01-26
Conformity and Social Impact on AI Agents
Authors: Alessandro Bellina, Giordano De Marzo, David Garcia |
Source: ArXiv AI | 13-01-26
Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models
Authors: Cooper Lin, Maohao Ran, Yanting Zhang, Zhenglin Wan, Hongwei Fan, Yibo Xu, Yike Guo, Wei Xue, Jun Song |
Source: ArXiv AI | 13-01-26
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
Authors: Jua Han, Jaeyoon Seo, Jungbin Min, Jean Oh, Jihie Kim |
Source: ArXiv AI | 13-01-26
The Evaluation Gap in Medicine, AI and LLMs: Navigating Elusive Ground Truth & Uncertainty via a Probabilistic Paradigm
Authors: Aparna Elangovan, Lei Xu, Mahsa Elyasi, Ismail Akdulum, Mehmet Aksakal, Enes Gurun, Brian Hur, Saab Mansour, Ravid Shwartz Ziv, Karin Verspoor, Dan Roth |
Source: ArXiv AI | 13-01-26
Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection
Authors: Cooper Lin, Yanting Zhang, Maohao Ran, Wei Xue, Hongwei Fan, Yibo Xu, Zhenglin Wan, Sirui Han, Yike Guo, Jun Song |
Source: ArXiv AI | 13-01-26
Logic-Parametric Neuro-Symbolic NLI: Controlling Logical Formalisms for Verifiable LLM Reasoning
Authors: Ali Farjami, Luca Redondi, Marco Valentino |
Source: ArXiv AI | 13-01-26
TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents
Authors: Dawei Wang, Chengming Zhou, Di Zhao, Xinyuan Liu, Marci Chi Ma, Gary Ushaw, Richard Davison |
Source: ArXiv AI | 13-01-26
Google removes AI health summaries after investigation finds dangerous flaws (arstechnica.com)
Source: Hacker News | 13-01-26
Insights into Claude Opus 4.5 from Pokémon (lesswrong.com)
Source: Hacker News | 12-01-26
Ozempic reduced grocery spending by an average of 5.3% in the US (cornell.edu)
Source: Hacker News | 12-01-26
Show HN: What if AI agents had Zodiac personalities? (github.com/baturyilmaz)
Source: Hacker News | 12-01-26
LLM poetry and the "greatness" question: Experiments by Gwern and Mercor (hollisrobbinsanecdotal.substack.com)
Source: Hacker News | 11-01-26
Show HN: Play poker with LLMs, or watch them play against each other (llmholdem.com)
Source: Hacker News | 11-01-26
Google: Don't make "bite-sized" content for LLMs (arstechnica.com)
Source: Hacker News | 11-01-26
Show HN: I used Claude Code to discover connections between 100 books (pieterma.es)
Source: Hacker News | 11-01-26
ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving
Authors: Chang Zhao, Zheming Yang, Yunqing Hu, Qi Guo, Zijian Wang, Pengcheng Li, Wen Ji |
Source: ArXiv AI | 11-01-26
Orion-RAG: Path-Aligned Hybrid Retrieval for Graphless Data
Authors: Zhen Chen, Weihao Xie, Peilin Chen, Shiqi Wang, Jianping Wang |
Source: ArXiv AI | 11-01-26
What Students Ask, How a Generative AI Assistant Responds: Exploring Higher Education Students' Dialogues on Learning Analytics Feedback
Authors: Yildiz Uzun, Andrea Gauthier, Mutlu Cukurova |
Source: ArXiv AI | 11-01-26
DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation
Authors: Renzhao Liang, Jingru Chen, Bo Jia, Bo Deng, Chenggang Xie, Yidong Wang, Ke Jin, Xin Wang, Linfeng Zhang, Cunxiang Wang |
Source: ArXiv AI | 11-01-26
T-Retriever: Tree-based Hierarchical Retrieval Augmented Generation for Textual Graphs
Authors: Chunyu Wei, Huaiyu Qin, Siyuan He, Yunhai Wang, Yueguo Chen |
Source: ArXiv AI | 11-01-26
Conversational AI for Rapid Scientific Prototyping: A Case Study on ESA's ELOPE Competition
Authors: Nils Einecke |
Source: ArXiv AI | 11-01-26
An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions
Authors: Avik Dutta, Harshit Nigam, Hosein Hasanbeig, Arjun Radhakrishna, Sumit Gulwani |
Source: ArXiv AI | 11-01-26
Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence
Authors: Jennifer D'Souza, Soren Auer, Eleni Poupaki, Alex Watkins, Anjana Devi, Riikka L. Puurunen, Bora Karasulu, Adrie Mackus, Erwin Kessels |
Source: ArXiv AI | 11-01-26
Large language models can effectively convince people to believe conspiracies
Authors: Thomas H. Costello, Kellin Pelrine, Matthew Kowal, Antonio A. Arechar, Jean-François Godbout, Adam Gleave, David Rand, Gordon Pennycook |
Source: ArXiv AI | 11-01-26
Token-Level LLM Collaboration via FusionRoute
Authors: Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Huang, Shuchao Bi, Lizhu Zhang, Zhuokai Zhao |
Source: ArXiv AI | 11-01-26
Evaluative Fingerprints: Stable and Systematic Differences in LLM Evaluator Behavior
Authors: Wajid Nasser |
Source: ArXiv AI | 11-01-26
Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop
Authors: Yaxuan Wang, Zhongteng Cai, Yujia Bao, Xueru Zhang, Yang Liu |
Source: ArXiv AI | 11-01-26
Stock Market Price Prediction using Neural Prophet with Deep Neural Network
Authors: Navin Chhibber, Suneel Khemka, Navneet Kumar Tyagi, Rohit Tewari, Bireswar Banerjee, Piyush Ranjan |
Source: ArXiv AI | 11-01-26
SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
Authors: Yanchang Liang, Xiaowei Zhao |
Source: ArXiv AI | 11-01-26
Kodbox: Open-source cloud desktop with multi-storage fusion and web IDE (github.com/kalcaddle)
Source: Hacker News | 11-01-26
ChatGPT Health is a marketplace, guess who is the product? (consciousdigital.org)
Source: Hacker News | 11-01-26
How to code Claude Code in 200 lines of code (mihaileric.com)
Source: Hacker News | 10-01-26
My article on why AI is great (or terrible) or how to use it (matthewrocklin.com)
Source: Hacker News | 10-01-26
Show HN: EuConform – Offline-first EU AI Act compliance tool (open source) (github.com/hiepler)
Source: Hacker News | 10-01-26
Show HN: macOS menu bar app to track Claude usage in real time (github.com/richhickson)
Source: Hacker News | 09-01-26
Google AI Studio is now sponsoring Tailwind CSS (twitter.com/officiallogank)
Source: Hacker News | 09-01-26
He was called a 'terrorist sympathizer.' Now his AI company is valued at $3B (sfstandard.com)
Source: Hacker News | 09-01-26
Anthropic blocks third-party use of Claude Code subscriptions (github.com/anomalyco)
Source: Hacker News | 09-01-26
Dell admits consumers don't care about AI PCs (pcgamer.com)
Source: Hacker News | 09-01-26
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs (sakana.ai)
Source: Hacker News | 09-01-26
Task-free intelligence testing of LLMs (marble.onl)
Source: Hacker News | 09-01-26
Claude Code CLI was broken (github.com/anthropics)
Source: Hacker News | 08-01-26
How Google got its groove back and edged ahead of OpenAI (wsj.com)
Source: Hacker News | 08-01-26
ChatGPT Health (openai.com)
Source: Hacker News | 08-01-26
Kernel bugs hide for 2 years on average. Some hide for 20 (pebblebed.com)
Source: Hacker News | 08-01-26
Bayes-PD: Exploring a Sequence to Binding Bayesian Neural Network model trained on Phage Display data
Authors: Ilann Amiaud-Plachy, Michael Blank, Oliver Bent, Sebastien Boyer |
Source: ArXiv AI | 08-01-26
A Gap Between Decision Trees and Neural Networks
Authors: Akash Kumar |
Source: ArXiv AI | 08-01-26
Large-Scale Aspect-Based Sentiment Analysis with Reasoning-Infused LLMs
Authors: Paweł Liskowski, Krzysztof Jankowski |
Source: ArXiv AI | 08-01-26
HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense
Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li |
Source: ArXiv AI | 08-01-26
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
Authors: Nikhil Anand, Shwetha Somasundaram, Anirudh Phukan, Apoorv Saxena, Koyel Mukherjee |
Source: ArXiv AI | 08-01-26
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
Authors: Akarsh Kumar, Ryan Bahlous-Boldi, Prafull Sharma, Phillip Isola, Sebastian Risi, Yujin Tang, David Ha |
Source: ArXiv AI | 08-01-26
Embedding Autonomous Agents in Resource-Constrained Robotic Platforms
Authors: Negar Halakou, Juan F. Gutierrez, Ye Sun, Han Jiang, Xueming Wu, Yilun Song, Andres Gomez |
Source: ArXiv AI | 08-01-26
CPGPrompt: Translating Clinical Guidelines into LLM-Executable Decision Support
Authors: Ruiqi Deng, Geoffrey Martin, Tony Wang, Gongbo Zhang, Yi Liu, Chunhua Weng, Yanshan Wang, Justin F Rousseau, Yifan Peng |
Source: ArXiv AI | 08-01-26
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization
Authors: Alberto Purpura, Li Wang, Sahil Badyal, Eugenio Beaufrand, Adam Faulkner |
Source: ArXiv AI | 08-01-26
ReEfBench: Quantifying the Reasoning Efficiency of LLMs
Authors: Zhizhang Fu, Yuancheng Gu, Chenkai Hu, Hanmeng Liu, Yue Zhang |
Source: ArXiv AI | 08-01-26
Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
Authors: Yi Fang, Wenjie Wang, Mingfeng Xue, Boyi Deng, Fengli Xu, Dayiheng Liu, Fuli Feng |
Source: ArXiv AI | 08-01-26
Personalized Medication Planning via Direct Domain Modeling and LLM-Generated Heuristics
Authors: Yonatan Vernik, Alexander Tuisov, David Izhaki, Hana Weitman, Gal A. Kaminka, Alexander Shleyfman |
Source: ArXiv AI | 08-01-26
xDNN(ASP): Explanation Generation System for Deep Neural Networks powered by Answer Set Programming
Authors: Ly Ly Trieu (New Mexico State University), Tran Cao Son (New Mexico State University) |
Source: ArXiv AI | 08-01-26
Current Agents Fail to Leverage World Model as Tool for Foresight
Authors: Cheng Qian, Emre Can Acikgoz, Bingxuan Li, Xiusi Chen, Yuji Zhang, Bingxiang He, Qinyu Luo, Dilek Hakkani-Tür, Gokhan Tur, Yunzhu Li, Heng Ji |
Source: ArXiv AI | 08-01-26
Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions
Authors: Abhishek Rath |
Source: ArXiv AI | 08-01-26
Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs
Authors: Soichiro Murakami, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura |
Source: ArXiv AI | 07-01-26
Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs
Authors: Chenchen Lin, Sanbao Su, Rachel Luo, Yuxiao Chen, Yan Wang, Marco Pavone, Fei Miao |
Source: ArXiv AI | 07-01-26
Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs
Authors: Xin Huang, Antoni B. Chan |
Source: ArXiv AI | 07-01-26
Joint Encoding of KV-Cache Blocks for Scalable LLM Serving
Authors: Joseph Kampeas, Emir Haleva |
Source: ArXiv AI | 07-01-26
Prompt-Counterfactual Explanations for Generative AI System Behavior
Authors: Sofie Goethals, Foster Provost, João Sedoc |
Source: ArXiv AI | 07-01-26
Limited Linguistic Diversity in Embodied AI Datasets
Authors: Selma Wanna, Agnes Luhtaru, Jonathan Salfity, Ryan Barron, Juston Moore, Cynthia Matuszek, Mitch Pryor |
Source: ArXiv AI | 07-01-26
LeafLife: An Explainable Deep Learning Framework with Robustness for Grape Leaf Disease Recognition
Authors: B. M. Shahria Alam, Md. Nasim Ahmed |
Source: ArXiv AI | 07-01-26
ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
Authors: Peiran Li, Jan Fillies, Adrian Paschke |
Source: ArXiv AI | 07-01-26
Transformers self-organize like newborn visual systems when trained in prenatal worlds
Authors: Lalit Pandey, Samantha M. W. Wood, Justin N. Wood |
Source: ArXiv AI | 07-01-26
Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers
Authors: Yue Kang, Zhuoyi Huang, Benji Schussheim, Diana Licon, Dina Atia, Shixing Cao, Jacob Danovitch, Kunho Kim, Billy Norcilien, Jonah Karpman, Mahmound Sayed, Mike Taylor, Tao Sun, Pavel Metrikov, Vipul Agarwal, Chris Quirk, Ye-Yi Wang, Nick Craswell, Irene Shaffer, Tianwei Chen, Sulaiman Vesal, Soundar Srinivasan |
Source: ArXiv AI | 07-01-26
UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward
Authors: Yile Liu, Yixian Liu, Zongwei Li, Yufei Huang, Xinhua Feng, Zhichao Hu, Jinglu Hu, Jianfeng Yan, Fengzong Lian, Yuhong Liu |
Source: ArXiv AI | 07-01-26
Recursive querying of neural networks via weighted structures
Authors: Martin Grohe, Christoph Standke, Juno Steegmans, Jan Van den Bussche |
Source: ArXiv AI | 07-01-26
AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation
Authors: Anees Ur Rehman Hashmi, Numan Saeed, Christoph Lippert |
Source: ArXiv AI | 07-01-26
SimpleMem: Efficient Lifelong Memory for LLM Agents
Authors: Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao |
Source: ArXiv AI | 07-01-26
Causal-Enhanced AI Agents for Medical Research Screening
Authors: Duc Ngo, Arya Rahgoza |
Source: ArXiv AI | 07-01-26
HAL: Inducing Human-likeness in LLMs with Alignment
Authors: Masum Hasan, Junjie Zhao, Ehsan Hoque |
Source: ArXiv AI | 07-01-26
LLM Agent Framework for Intelligent Change Analysis in Urban Environment using Remote Sensing Imagery
Authors: Zixuan Xiao, Jun Ma |
Source: ArXiv AI | 07-01-26
Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning
Authors: Xinglang Zhang, Yunyao Zhang, ZeLiang Chen, Junqing Yu, Wei Yang, Zikai Song |
Source: ArXiv AI | 07-01-26
ReTreVal: Reasoning Tree with Validation - A Hybrid Framework for Enhanced LLM Multi-Step Reasoning
Authors: Abhishek HS, Pavan C Shekar, Arpit Jain, Ashwanth Krishnan |
Source: ArXiv AI | 07-01-26
Rationale-Grounded In-Context Learning for Time Series Reasoning with Multimodal Large Language Models
Authors: Qingxiang Liu, Zhiqing Cui, Xiaoliang Luo, Yuqian Wu, Zhuoyang Jiang, Huaiyu Wan, Sheng Sun, Lvchun Wang, Wei Yu, Yuxuan Liang |
Source: ArXiv AI | 07-01-26
Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning
Authors: Xuan Yang, Furong Jia, Roy Xie, Xiong Xi, Hengwei Bian, Jian Li, Monica Agrawal |
Source: ArXiv AI | 07-01-26
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
Authors: Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li |
Source: ArXiv AI | 07-01-26
Automatic Prompt Engineering with No Task Cues and No Tuning
Authors: Faisal Chowdhury, Nandana Mihindukulasooriya, Niharika S D'Souza, Horst Samulowitz, Neeru Gupta, Tomasz Hanusiak, Michal Kapitonow |
Source: ArXiv AI | 07-01-26
Launch HN: Tamarind Bio (YC W24) – AI Inference Provider for Drug Discovery
Source: Hacker News | 07-01-26
The creator of Claude Code's Claude setup (twitter.com/bcherny)
Source: Hacker News | 07-01-26
Opus 4.5 is not the normal AI agent experience that I have had thus far (burkeholland.github.io)
Source: Hacker News | 07-01-26
Show HN: Mantic.sh – A structural code search engine for AI agents (github.com/marcoaapfortes)
Source: Hacker News | 07-01-26
The Agentic Self: Parallels Between AI and Self-Improvement (muratbuffalo.blogspot.com)
Source: Hacker News | 07-01-26
Microsoft (Probably) Killed My Snapdragon Dev Kit (jasoneckert.github.io)
Source: Hacker News | 07-01-26
Why didn't AI “join the workforce” in 2025? (calnewport.com)
Source: Hacker News | 06-01-26
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (2018) (arxiv.org)
Source: Hacker News | 06-01-26
Decomposing LLM Self-Correction: The Accuracy-Correction Paradox and Error Depth Hypothesis
Authors: Yin Li |
Source: ArXiv AI | 06-01-26
Comment on: Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Tasks
Authors: Milos Stankovic, Ella Hirche, Sarah Kollatzsch, Julia Nadine Doetsch |
Source: ArXiv AI | 06-01-26
Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models
Authors: Ron F. Del Rosario |
Source: ArXiv AI | 06-01-26
Enhancing Temporal Awareness in LLMs for Temporal Point Processes
Authors: Lili Chen, Wensheng Gan, Shuang Liang, Philip S. Yu |
Source: ArXiv AI | 06-01-26
OmniNeuro: A Multimodal HCI Framework for Explainable BCI Feedback via Generative AI and Sonification
Authors: Ayda Aghaei Nia |
Source: ArXiv AI | 06-01-26
Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning
Authors: Deep Pankajbhai Mehta |
Source: ArXiv AI | 06-01-26
Universal Conditional Logic: A Formal Language for Prompt Engineering
Authors: Anthony Mikinka |
Source: ArXiv AI | 06-01-26
Cultural Encoding in Large Language Models: The Existence Gap in AI-Mediated Brand Discovery
Authors: Huang Junyao, Situ Ruimin, Ye Renqin |
Source: ArXiv AI | 06-01-26
Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale
Authors: Shengji Tang, Weihao Lin, Jingqi Ye, Hao Li, Bo Zhang, Shuyue Hu, Tao Chen, Wangli Ouyang, Lei Bai, Peng Ye |
Source: ArXiv AI | 06-01-26
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Authors: Rong Zhou, Dongping Chen, Zihan Jia, Yao Su, Yixin Liu, Yiwen Lu, Dongwei Shi, Yue Huang, Tianyang Xu, Yi Pan, Xinliang Li, Yohannes Abate, Qingyu Chen, Zhengzhong Tu, Yu Yang, Yu Zhang, Qingsong Wen, Gengchen Mai, Sunyang Fu, Jiachen Li, Xuyu Wang, Ziran Wang, Jing Huang, Tianming Liu, Yong Chen, Lichao Sun, Lifang He |
Source: ArXiv AI | 06-01-26
Reading Between the Lines: Deconfounding Causal Estimates using Text Embeddings and Deep Learning
Authors: Ahmed Dawoud, Osama El-Shamy |
Source: ArXiv AI | 06-01-26
Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement
Authors: Mingyu Xu, Cheng Fang, Keyue Jiang, Yuqian Zheng, Yanghua Xiao, Baojian Zhou, Qifang Zhao, Suhang Zheng, Xiuwen Zhu, Jiyang Tang, Yongchi Zhao, Yijia Luo, Zhiqi Bai, Yuchi Xu, Wenbo Su, Wei Wang, Bing Zhao, Lin Qu, Xiaoxiao Xu |
Source: ArXiv AI | 06-01-26
Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation
Authors: Letian Kong, Qianran (Jenny) Jin, Renyu Zhang |
Source: ArXiv AI | 06-01-26
Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making
Authors: Danial Amin |
Source: ArXiv AI | 06-01-26
Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications
Authors: YuanLab.ai: Shawn Wu, Sean Wang, Louie Li, Darcy Chen, Allen Wang, Jiangang Luo, Xudong Zhao, Joseph Shen, Gawain Ma, Jasper Jia, Marcus Mao, Claire Wang, Hunter He, Carol Wang, Zera Zhang, Jason Wang, Chonly Shen, Leo Zhang, Logan Chen, Qasim Meng, James Gong, Danied Zhao, Penn Zheng, Owen Zhu, Tong Yu |
Source: ArXiv AI | 06-01-26
Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration
Authors: Albert Sadowski, Jarosław A. Chudziak |
Source: ArXiv AI | 06-01-26
CaveAgent: Transforming LLMs into Stateful Runtime Operators
Authors: Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, Wangbo Zhao, Lijie Yang, Lang Feng, Fuchao Yang, Jingxuan Wu, Yiqiao Huang, Chendong Ma, Dailing Jiang, Jianbo Deng, Sihui Han, Bo An, Yike Guo, Jun Song |
Source: ArXiv AI | 06-01-26
PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor
Authors: Qianjun Pan, Junyi Wang, Jie Zhou, Yutao Yang, Junsong Li, Kaiyin Xu, Yougen Zhou, Yihan Li, Jingyuan Zhao, Qin Chen, Ningning Zhou, Kai Chen, Liang He |
Source: ArXiv AI | 06-01-26
Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches
Authors: Sai Varun Kodathala, Rakesh Vunnam |
Source: ArXiv AI | 06-01-26
Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation
Authors: Udiptaman Das, Krishnasai B. Atmakuri, Duy Ho, Chi Lee, Yugyung Lee |
Source: ArXiv AI | 06-01-26
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
Authors: Dasol Choi, DongGeon Lee, Brigitta Jesica Kartono, Helena Berndt, Taeyoun Kwon, Joonwon Jang, Haon Park, Hwanjo Yu, Minsuk Kahng |
Source: ArXiv AI | 06-01-26
Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs
Authors: Farzan Karimi-Malekabadi, Suhaib Abdurahman, Zhivar Sourati, Jackson Trager, Morteza Dehghani |
Source: ArXiv AI | 06-01-26
MindChat: A Privacy-preserving Large Language Model for Mental Health Support
Authors: Dong Xue, Jicheng Tu, Ming Wang, Xin Yan, Fangzhou Liu, Jie Hu |
Source: ArXiv AI | 06-01-26
FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations
Authors: Adeshola Okubena, Yusuf Ali Mohammed, Moe Elbadawi |
Source: ArXiv AI | 06-01-26
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents
Authors: Sourena Khanzadeh |
Source: ArXiv AI | 06-01-26
Scientific production in the era of large language models [pdf] (gwern.net)
Source: Hacker News | 06-01-26
Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting: A Model Compression Study
Authors: Ravi Teja Pagidoju |
Source: ArXiv AI | 06-01-26
Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools?
Authors: Jason Quantrill, Noura Khajehnouri, Zihan Guo, Manar H. Alalfi |
Source: ArXiv AI | 06-01-26
A Comprehensive Dataset for Human vs. AI Generated Image Detection
Authors: Rajarshi Roy, Nasrin Imanpour, Ashhar Aziz, Shashwat Bajpai, Gurpreet Singh, Shwetangshu Biswas, Kapil Wanaskar, Parth Patwa, Subhankar Ghosh, Shreyas Dixit, Nilesh Ranjan Pal, Vipula Rawte, Ritvik Garimella, Gaytri Jena, Vasu Sharma, Vinija Jain, Aman Chadha, Aishwarya Naresh Reganti, Amitava Das |
Source: ArXiv AI | 06-01-26
Priority-Aware Multi-Robot Coverage Path Planning
Authors: Kanghoon Lee, Hyeonjun Kim, Jiachen Li, Jinkyoo Park |
Source: ArXiv AI | 06-01-26
Learning to be Reproducible: Custom Loss Design for Robust Neural Networks
Authors: Waqas Ahmed, Sheeba Samuel, Kevin Coakley, Birgitta Koenig-Ries, Odd Erik Gundersen |
Source: ArXiv AI | 06-01-26
Exploring the Performance of Large Language Models on Subjective Span Identification Tasks
Authors: Alphaeus Dmonte, Roland Oruche, Tharindu Ranasinghe, Marcos Zampieri, Prasad Calyam |
Source: ArXiv AI | 06-01-26
LLM Agents for Combinatorial Efficient Frontiers: Investment Portfolio Optimization
Authors: Simon Paquette-Greenbaum, Jiangbo Yu |
Source: ArXiv AI | 06-01-26
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models
Authors: Shuqi Liu, Bowei He, Chen Ma, Linqi Song |
Source: ArXiv AI | 06-01-26
FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing
Authors: Sunny Gupta, Amit Sethi |
Source: ArXiv AI | 06-01-26
Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study
Authors: Isaac Iyinoluwa Olufadewa, Miracle Ayomikun Adesina, Ezekiel Ayodeji Oladejo, Uthman Babatunde Usman, Owen Kolade Adeniyi, Matthew Tolulope Olawoyin |
Source: ArXiv AI | 06-01-26
The Agentic Leash: Extracting Causal Feedback Fuzzy Cognitive Maps with LLMs
Authors: Akash Kumar Panda, Olaoluwa Adigun, Bart Kosko |
Source: ArXiv AI | 06-01-26
From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers
Authors: Abolhassan Pishahang, Maryam Badiei |
Source: ArXiv AI | 06-01-26
Ask, Clarify, Optimize: Human-LLM Agent Collaboration for Smarter Inventory Control
Authors: Yaqi Duan, Yichun Hu, Jiashuo Jiang |
Source: ArXiv AI | 06-01-26
An AI Monkey Gets Grapes for Sure -- Sphere Neural Networks for Reliable Decision-Making
Authors: Tiansi Dong, Henry He, Pietro Liò, Mateja Jamnik |
Source: ArXiv AI | 06-01-26
ClinicalReTrial: A Self-Evolving AI Agent for Clinical Trial Protocol Optimization
Authors: Sixue Xing, Xuanye Xia, Kerui Wu, Meng Jiang, Jintai Chen, Tianfan Fu |
Source: ArXiv AI | 06-01-26
Will LLM-powered Agents Bias Against Humans? Exploring the Belief-Dependent Vulnerability
Authors: Zongwei Wang, Bincheng Gu, Hongyu Yu, Junliang Yu, Tao He, Jiayin Feng, Min Gao |
Source: ArXiv AI | 06-01-26
FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems
Authors: Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong, Yong Wu, Zihao Ye, Charlie Ruan, Yingyi Huang, Yineng Zhang, Liangsheng Yin, Aksara Bayyapu, Luis Ceze, Tianqi Chen |
Source: ArXiv AI | 06-01-26
Progressive Ideation using an Agentic AI Framework for Human-AI Co-Creation
Authors: Sankar B, Srinidhi Ranjini Girish, Aadya Bharti, Dibakar Sen |
Source: ArXiv AI | 06-01-26
DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations
Authors: Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He |
Source: ArXiv AI | 06-01-26
A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference
Authors: Qingwen Pu, Kun Xie, Hong Yang, Guocong Zhai |
Source: ArXiv AI | 06-01-26
Claude Code On-the-Go (granda.org)
Source: Hacker News | 05-01-26
Eurostar AI vulnerability: When a chatbot goes off the rails (pentestpartners.com)
Source: Hacker News | 05-01-26
Trellis AI (YC W24) is hiring engineers to build AI agents for healthcare access (ycombinator.com)
Source: Hacker News | 05-01-26
Show HN: An LLM-Powered PCB Schematic Checker (Major Update) (traceformer.io)
Source: Hacker News | 05-01-26
Show HN: Claude Reflect – Auto-turn Claude corrections into project config (github.com/bayramannakov)
Source: Hacker News | 04-01-26
Show HN: Replacing my OS process scheduler with an LLM (github.com/mprajyothreddy)
Source: Hacker News | 04-01-26
Show HN: I used AI to recreate a $4000 piece of audio hardware as a plugin
Source: Hacker News | 04-01-26
Neural Networks: Zero to Hero (karpathy.ai)
Source: Hacker News | 04-01-26
Using AI generated images to get refunds (wired.com)
Source: Hacker News | 04-01-26
The Impact of LLMs on Online News Consumption and Production
Authors: Hangcheng Zhao, Ron Berman |
Source: ArXiv AI | 04-01-26
AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG
Authors: Chao Peng, Bin Wang, Zhilei Long, Jinfang Sheng |
Source: ArXiv AI | 04-01-26
Vulcan: Instance-Optimal Systems Heuristics Through LLM-Driven Search
Authors: Rohit Dwivedula, Divyanshu Saxena, Sujay Yadalam, Daehyeok Kim, Aditya Akella |
Source: ArXiv AI | 04-01-26
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming
Authors: Ioanna Gemou, Evangelos Lamprou |
Source: ArXiv AI | 04-01-26
CogRec: A Cognitive Recommender Agent Fusing Large Language Models and Soar for Explainable Recommendation
Authors: Jiaxin Hu, Tao Wang, Bingsan Yang, Hongrun Wang |
Source: ArXiv AI | 04-01-26
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
Authors: Basile Terver, Tsung-Yen Yang, Jean Ponce, Adrien Bardes, Yann LeCun |
Source: ArXiv AI | 04-01-26
From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
Authors: Amir Tahmasbi, Sadegh Majidi, Kazem Taram, Aniket Bera |
Source: ArXiv AI | 04-01-26
Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems
Authors: Samuel Golladay, Majid Bani-Yaghoub |
Source: ArXiv AI | 04-01-26
Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization
Authors: Dong Qiu, Duo Xu, Limengxi Yue |
Source: ArXiv AI | 04-01-26
MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use
Authors: Wenrui Liu, Zixiang Liu, Elsie Dai, Wenhan Yu, Lei Yu, Tong Yang |
Source: ArXiv AI | 04-01-26
BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis
Authors: Songqi Zhou, Ruixue Liu, Boman Su, Jiazhou Wang, Yixing Wang, Benben Jiang |
Source: ArXiv AI | 04-01-26
Iterative Deployment Improves Planning Skills in LLMs
Authors: Augusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo, Ilia Shumailov, André G. Pereira, Yarin Gal |
Source: ArXiv AI | 04-01-26
Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings
Authors: Tianzhi He, Farrokh Jazizadeh |
Source: ArXiv AI | 04-01-26
Developing a BLAS Library for the AMD AI Engine [pdf] (tlaan.nl)
Source: Hacker News | 04-01-26
IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf] (github.com/iquestlab)
Source: Hacker News | 03-01-26
Build a Deep Learning Library (quarto.pub)
Source: Hacker News | 02-01-26
Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc. (exopriors.com)
Source: Hacker News | 01-01-26
2025: The Year in LLMs (simonwillison.net)
Source: Hacker News | 01-01-26
How AI labs are solving the power problem (semianalysis.com)
Source: Hacker News | 01-01-26
Nerd: A language for LLMs, not humans (nerd-lang.org)
Source: Hacker News | 01-01-26
GamiBench: Evaluating Spatial Reasoning and 2D-to-3D Planning Capabilities of MLLMs with Origami Folding Tasks
Authors: Ryan Spencer, Roey Yaari, Ritvik Vemavarapu, Joyce Yang, Steven Ngo, Utkarsh Sharma |
Source: ArXiv AI | 31-12-25
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
Authors: Vincent Chang, Thee Ho, Sunishchal Dev, Kevin Zhu, Shi Feng, Kellin Pelrine, Matthew Kowal |
Source: ArXiv AI | 31-12-25
Bidirectional RAG: Safe Self-Improving Retrieval-Augmented Generation Through Multi-Stage Validation
Authors: Teja Chinthala |
Source: ArXiv AI | 31-12-25
With Great Capabilities Come Great Responsibilities: Introducing the Agentic Risk & Capability Framework for Governing Agentic AI Systems
Authors: Shaun Khoo, Jessica Foo, Roy Ka-Wei Lee |
Source: ArXiv AI | 31-12-25
Toward Equitable Recovery: A Fairness-Aware AI Framework for Prioritizing Post-Flood Aid in Bangladesh
Authors: Farjana Yesmin, Romana Akter |
Source: ArXiv AI | 31-12-25
DarkPatterns-LLM: A Multi-Layer Benchmark for Detecting Manipulative and Harmful AI Behavior
Authors: Sadia Asif, Israel Antonio Rosales Laguan, Haris Khan, Shumaila Asif, Muneeb Asif |
Source: ArXiv AI | 31-12-25
Lightweight Inference-Time Personalization for Frozen Knowledge Graph Embeddings
Authors: Ozan Oguztuzun, Cerag Oguztuzun |
Source: ArXiv AI | 31-12-25
HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification
Authors: Bhanu Prakash Vangala, Sajid Mahmud, Pawan Neupane, Joel Selvaraj, Jianlin Cheng |
Source: ArXiv AI | 31-12-25
SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G
Authors: Yong Xiao, Xubo Li, Haoran Zhou, Yingyu Li, Yayu Gao, Guangming Shi, Ping Zhang, Marwan Krunz |
Source: ArXiv AI | 31-12-25
The Wisdom of Deliberating AI Crowds: Does Deliberation Improve LLM-Based Forecasting?
Authors: Paul Schneider, Amalie Schramm |
Source: ArXiv AI | 31-12-25
LLM Agents as VC investors: Predicting Startup Success via RolePlay-Based Collective Simulation
Authors: Zhongyang Liu, Haoyu Pei, Xiangyi Xiao, Xiaocong Du, Yihui Li, Suting Hong, Kunpeng Zhang, Haipeng Zhang |
Source: ArXiv AI | 31-12-25
Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K--12 Education
Authors: Danial Hooshyar, Yeongwook Yang, Gustav Šíř, Tommi Kärkkäinen, Raija Hämäläinen, Mutlu Cukurova, Roger Azevedo |
Source: ArXiv AI | 31-12-25
InSPO: Unlocking Intrinsic Self-Reflection for LLM Preference Optimization
Authors: Yu Li, Tian Lan, Zhengling Qi |
Source: ArXiv AI | 31-12-25
From Model Choice to Model Belief: Establishing a New Measure for LLM-Based Research
Authors: Hongshen Sun, Juanjuan Zhang |
Source: ArXiv AI | 31-12-25
SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
Authors: Yifan Zhang, Giridhar Ganapavarapu, Srideepika Jayaraman, Bhavna Agrawal, Dhaval Patel, Achille Fokoue |
Source: ArXiv AI | 31-12-25
Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control
Authors: Yoonpyo Lee, Kazuma Kobayashi, Sai Puppala, Sajedul Talukder, Seid Koric, Souvik Chakraborty, Syed Bahauddin Alam |
Source: ArXiv AI | 31-12-25
The Gaining Paths to Investment Success: Information-Driven LLM Graph Reasoning for Venture Capital Prediction
Authors: Haoyu Pei, Zhongyang Liu, Xiangyi Xiao, Xiaocong Du, Haipeng Zhang, Kunpeng Zhang, Suting Hong |
Source: ArXiv AI | 31-12-25
Physics-Informed Neural Networks for Device and Circuit Modeling: A Case Study of NeuroSPICE
Authors: Chien-Ting Tung, Chenming Hu |
Source: ArXiv AI | 31-12-25
Divergent-Convergent Thinking in Large Language Models for Creative Problem Generation
Authors: Manh Hung Nguyen, Adish Singla |
Source: ArXiv AI | 31-12-25
Why AI Safety Requires Uncertainty, Incomplete Preferences, and Non-Archimedean Utilities
Authors: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon |
Source: ArXiv AI | 31-12-25
OpenAI's cash burn will be one of the big bubble questions of 2026 (economist.com)
Source: Hacker News | 31-12-25
FediMeteo: A €4 FreeBSD VPS Became a Global Weather Service (dragas.net)
Source: Hacker News | 31-12-25
Show HN: Stop Claude Code from forgetting everything (github.com/mutable-state-inc)
Source: Hacker News | 30-12-25
Show HN: Cover letter generator with Ollama/local LLMs (Open source) (coverlettermaker.co)
Source: Hacker News | 30-12-25
A Comedy of Estimators: On KL Regularization in RL Training of LLMs
Authors: Vedant Shah, Johan Obando-Ceron, Vineet Jain, Brian Bartoldson, Bhavya Kailkhura, Sarthak Mittal, Glen Berseth, Pablo Samuel Castro, Yoshua Bengio, Nikolay Malkin, Moksh Jain, Siddarth Venkatraman, Aaron Courville |
Source: ArXiv AI | 30-12-25
HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs
Authors: Jiaxin Liu, Peiyi Tu, Wenyu Chen, Yihong Zhuang, Xinxia Ling, Anji Zhou, Chenxi Wang, Zhuo Han, Zhengkai Yang, Junbo Zhao, Zenan Huang, Yuanyuan Wang |
Source: ArXiv AI | 30-12-25
CellMamba: Adaptive Mamba for Accurate and Efficient Cell Detection
Authors: Ruochen Liu, Yi Tian, Jiahao Wang, Hongbin Liu, Xianxu Hou, Jingxin Liu |
Source: ArXiv AI | 30-12-25
Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models
Authors: Tingyang Sun, Ting He, Bo Ji, Parimal Parag |
Source: ArXiv AI | 30-12-25
CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics
Authors: Vaibhav Devraj, Dhruv Kumar, Jagat Sesh Challa |
Source: ArXiv AI | 30-12-25
Unifying Learning Dynamics and Generalization in Transformers Scaling Law
Authors: Chiwun Yang |
Source: ArXiv AI | 30-12-25
From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
Authors: Shuide Wen, Yu Sun, Beier Ku, Zhi Gao, Lijun Ma, Yang Yang, Can Jiao |
Source: ArXiv AI | 30-12-25
NEMO-4-PAYPAL: Leveraging NVIDIA's Nemo Framework for empowering PayPal's Commerce Agent
Authors: Ali Sahami, Sudhanshu Garg, Andrew Wang, Chaitanya Kulkarni, Farhad Farahani, Sean Yun-Shiuan Chuang, Jian Wan, Srinivasan Manoharan, Uma Kona, Nitin Sharma, Linsey Pang, Prakhar Mehrotra, Jessica Clark, Mark Moyou |
Source: ArXiv AI | 30-12-25
Towards Responsible and Explainable AI Agents with Consensus-Driven Reasoning
Authors: Eranga Bandara, Tharaka Hewa, Ross Gore, Sachin Shetty, Ravi Mukkamala, Peter Foytik, Abdul Rahman, Safdar H. Bouk, Xueping Liang, Amin Hass, Sachini Rajapakse, Ng Wee Keong, Kasun De Zoysa, Aruna Withanage, Nilaan Loganathan |
Source: ArXiv AI | 30-12-25
Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets
Authors: Matyas Bohacek, Ignacio Vilanova Echavarri |
Source: ArXiv AI | 30-12-25
Pruning as a Game: Equilibrium-Driven Sparsification of Neural Networks
Authors: Zubair Shah, Noaman Khan |
Source: ArXiv AI | 30-12-25
Asking Gemini 3 to generate Brainfuck code results in an infinite loop (teodordyakov.github.io)
Source: Hacker News | 29-12-25
As AI gobbles up chips, prices for devices may rise (npr.org)
Source: Hacker News | 29-12-25
Project Vend: Phase Two (anthropic.com)
Source: Hacker News | 28-12-25
Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking
Authors: Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu |
Source: ArXiv AI | 28-12-25
Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval
Authors: Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, Tran Chi Nguyen |
Source: ArXiv AI | 28-12-25
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Consulting, Data Analyst, and Management Tasks
Authors: Ali Merali |
Source: ArXiv AI | 28-12-25
SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance
Authors: Divij Dudeja, Mayukha Pal |
Source: ArXiv AI | 28-12-25
C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling
Authors: Jin Qin, Zihan Liao, Ziyin Zhang, Hang Yu, Peng Di, Rui Wang |
Source: ArXiv AI | 28-12-25
Measuring all the noises of LLM Evals
Authors: Sida Wang |
Source: ArXiv AI | 28-12-25
MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
Authors: Chi-Hsiang Hsiao, Yi-Cheng Wang, Tzung-Sheng Lin, Yi-Ren Yeh, Chu-Song Chen |
Source: ArXiv AI | 28-12-25
BitRL-Light: 1-bit LLM Agents with Deep Reinforcement Learning for Energy-Efficient Smart Home Lighting Optimization
Authors: Ravi Gupta, Shabista Haider |
Source: ArXiv AI | 28-12-25
AIAuditTrack: A Framework for AI Security system
Authors: Zixun Luo, Yuhang Fan, Yufei Li, Youzhi Zhang, Hengyu Lin, Ziqi Wang |
Source: ArXiv AI | 28-12-25
Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning
Authors: Leo Lu, Jonathan Zhang, Sean Chua, Spencer Kim, Kevin Zhu, Sean O'Brien, Vasu Sharma |
Source: ArXiv AI | 28-12-25
From Fake Focus to Real Precision: Confusion-Driven Adversarial Attention Learning in Transformers
Authors: Yawei Liu |
Source: ArXiv AI | 28-12-25
Memory Bear AI A Breakthrough from Memory to Cognition Toward Artificial General Intelligence
Authors: Deliang Wen, Ke Sun |
Source: ArXiv AI | 28-12-25
Eidoku: A Neuro-Symbolic Verification Gate for LLM Reasoning via Structural Constraint Satisfaction
Authors: Shinobu Miya |
Source: ArXiv AI | 28-12-25
Quantifying Laziness, Decoding Suboptimality, and Context Degradation in Large Language Models
Authors: Yiqing Ma, Jung-Hua Liu |
Source: ArXiv AI | 28-12-25
Bridging the AI Trustworthiness Gap between Functions and Norms
Authors: Daan Di Scala, Sophie Lathouwers, Michael van Bekkum |
Source: ArXiv AI | 28-12-25
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
Authors: Haipeng Luo, Huawen Feng, Qingfeng Sun, Can Xu, Kai Zheng, Yufei Wang, Tao Yang, Han Hu, Yansong Tang, Di Wang |
Source: ArXiv AI | 28-12-25
The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents
Authors: Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng |
Source: ArXiv AI | 28-12-25
MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs
Authors: Onat Ozer, Grace Wu, Yuchen Wang, Daniel Dosti, Honghao Zhang, Vivi De La Rue |
Source: ArXiv AI | 28-12-25
FinAgent: An Agentic AI Framework Integrating Personal Finance and Nutrition Planning
Authors: Toqeer Ali Syed, Abdulaziz Alshahrani, Ali Ullah, Ali Akarma, Sohail Khan, Muhammad Nauman, Salman Jan |
Source: ArXiv AI | 28-12-25
A Blockchain-Monitored Agentic AI Architecture for Trusted Perception-Reasoning-Action Pipelines
Authors: Salman Jan, Hassan Ali Razzaqi, Ali Akarma, Mohammad Riyaz Belgaum |
Source: ArXiv AI | 28-12-25
LLM Personas as a Substitute for Field Experiments in Method Benchmarking
Authors: Enoch Hyunwook Kang |
Source: ArXiv AI | 28-12-25
Agentic Explainable Artificial Intelligence (Agentic XAI) Approach To Explore Better Explanation
Authors: Tomoaki Yamaguchi, Yutong Zhou, Masahiro Ryo, Keisuke Katsura |
Source: ArXiv AI | 28-12-25
Beyond Context: Large Language Models Failure to Grasp Users Intent
Authors: Ahmed M. Hussain, Salahuddin Salahuddin, Panos Papadimitratos |
Source: ArXiv AI | 28-12-25
A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care
Authors: Oliver Normand, Esther Borsi, Mitch Fruin, Lauren E Walker, Jamie Heagerty, Chris C. Holmes, Anthony J Avery, Iain E Buchan, Harry Coppock |
Source: ArXiv AI | 28-12-25
Show HN: Mysti – Claude, Codex, and Gemini debate your code, then synthesize (github.com/deepmyst)
Source: Hacker News | 28-12-25
Richard Stallman at the First Hackers Conference in 1984 [video] (youtube.com)
Source: Hacker News | 28-12-25
Sandbox: Run untrusted AI code safely, fast (github.com/pwnfunction)
Source: Hacker News | 27-12-25
Building an AI agent inside a 7-year-old Rails monolith (catalinionescu.dev)
Source: Hacker News | 26-12-25
ChatGPT conversations still lack timestamps after years of requests (community.openai.com)
Source: Hacker News | 26-12-25
Codex vs. Claude Code (Today) (build.ms)
Source: Hacker News | 26-12-25
Critical vulnerability in LangChain – CVE-2025-68664 (cyata.ai)
Source: Hacker News | 26-12-25
Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator (github.com/vibiumdev)
Source: Hacker News | 25-12-25
Asterisk AI Voice Agent (github.com/hkjarral)
Source: Hacker News | 25-12-25
Show HN: A local-first, reversible PII scrubber for AI workflows (medium.com/tj.ruesch)
Source: Hacker News | 25-12-25
Nvidia buying AI chip startup Groq for about $20B in cash (cnbc.com)
Source: Hacker News | 25-12-25
Deep Learning Classification of EEG Responses to Multi-Dimensional Transcranial Electrical Stimulation
Authors: Alexis Pomares Pastor, Ines Ribeiro Violante, Gregory Scott |
Source: ArXiv AI | 24-12-25
TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning
Authors: Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao |
Source: ArXiv AI | 24-12-25
AUDRON: A Deep Learning Framework with Fused Acoustic Signatures for Drone Type Recognition
Authors: Rajdeep Chatterjee, Sudip Chakrabarty, Trishaani Acharjee, Deepanjali Mishra |
Source: ArXiv AI | 24-12-25
Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
Authors: Nilesh Jain, Seyi Adeyinka, Leor Roseman, Aza Allsop |
Source: ArXiv AI | 24-12-25
Toward Explaining Large Language Models in Software Engineering Tasks
Authors: Antonio Vitale, Khai-Nguyen Nguyen, Denys Poshyvanyk, Rocco Oliveto, Simone Scalabrino, Antonio Mastropaolo |
Source: ArXiv AI | 24-12-25
Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI
Authors: Muhammad Usman, Azka Rehman, Muhammad Mutti Ur Rehman, Abd Ur Rehman, Muhammad Umar Farooq |
Source: ArXiv AI | 24-12-25
Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information
Authors: İbrahim Oğuz Çetinkaya, Sajad Khodadadian, Taylan G. Topçu |
Source: ArXiv AI | 24-12-25
Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs
Authors: Rui Pan, Zhuofu Chen, Ravi Netravali |
Source: ArXiv AI | 24-12-25
PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research
Authors: Tingjia Miao (1 and 2 and 5), Jiawen Dai (2), Jingkun Liu (2), Jinxin Tan (2 and 3 and 4), Muhua Zhang (2 and 3 and 4), Wenkai Jin (1), Yuwen Du (1), Tian Jin (1), Xianghe Pang (1), Zexi Liu (1), Tu Guo (2 and 4), Zhengliang Zhang (2 and 4 and 5), Yunjie Huang (1), Shuo Chen (6), Rui Ye (1), Yuzhi Zhang (7), Linfeng Zhang (7), Kun Chen (6), Wei Wang (2 and 3 and 4), Weinan E (1), Siheng Chen (1) ((1) School of Artificial Intelligence, Shanghai Jiao Tong University, (2) School of Physics and Astronomy, Shanghai Jiao Tong University, (3) State Key Laboratory of Dark Matter Physics, Shanghai Jiao Tong University, (4) Tsung-Dao Lee Institute, Shanghai Jiao Tong University, (5) Zhiyuan College, Shanghai Jiao Tong University, (6) Institute of Theoretical Physics, Chinese Academy of Sciences, (7) DP Technology) |
Source: ArXiv AI | 24-12-25
Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
Authors: Dhruv Anand, Ehsan Shareghi |
Source: ArXiv AI | 24-12-25
Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs
Authors: Eric Yeh, John Cadigan, Ran Chen, Dick Crouch, Melinda Gervasio, Dayne Freitag |
Source: ArXiv AI | 24-12-25
Scaling Reinforcement Learning for Content Moderation with Large Language Models
Authors: Hamed Firooz, Rui Liu, Yuchen Lu, Zhenyu Hou, Fangzhou Xiong, Xiaoyang Zhang, Changshu Jian, Zhicheng Zhu, Jiayuan Ma, Jacob Tao, Chaitali Gupta, Xiaochang Peng, Shike Mei, Hang Cui, Yang Qin, Shuo Tang, Jason Gaedtke, Arpit Mittal |
Source: ArXiv AI | 24-12-25
Enhancing Zero-Shot Time Series Forecasting in Off-the-Shelf LLMs via Noise Injection
Authors: Xingyou Yin, Ceyao Zhang, Min Hu, Kai Chen |
Source: ArXiv AI | 24-12-25
Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches
Authors: Chaithra, Kamesh Kadimisetty, Biju R Mohan |
Source: ArXiv AI | 24-12-25
Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks
Authors: Divya Vijay, Vignesh Ethiraj |
Source: ArXiv AI | 24-12-25
MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents
Authors: Xingbo Du, Loka Li, Duzhen Zhang, Le Song |
Source: ArXiv AI | 24-12-25
Concept Generalization in Humans and Large Language Models: Insights from the Number Game
Authors: Arghavan Bazigaran, Hansem Sohn |
Source: ArXiv AI | 24-12-25
SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization
Authors: Junren Li, Luhua Lai |
Source: ArXiv AI | 24-12-25
A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice
Authors: Yaowei Bai, Ruiheng Zhang, Yu Lei, Xuhua Duan, Jingfeng Yao, Shuguang Ju, Chaoyang Wang, Wei Yao, Yiwan Guo, Guilin Zhang, Chao Wan, Qian Yuan, Lei Chen, Wenjuan Tang, Biqiang Zhu, Xinggang Wang, Tao Sun, Wei Zhou, Dacheng Tao, Yongchao Xu, Chuansheng Zheng, Huangxuan Zhao, Bo Du |
Source: ArXiv AI | 24-12-25
Benchmarking LLMs for Predictive Applications in the Intensive Care Units
Authors: Chehak Malhotra, Mehak Gopal, Akshaya Devadiga, Pradeep Singh, Ridam Pal, Ritwik Kashyap, Tavpritesh Sethi |
Source: ArXiv AI | 24-12-25
Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent
Authors: Humza Nusrat, Luke Francisco, Bing Luo, Hassan Bagher-Ebadian, Joshua Kim, Karen Chin-Snyder, Salim Siddiqui, Mira Shah, Eric Mellon, Mohammad Ghassemi, Anthony Doemer, Benjamin Movsas, Kundan Thind |
Source: ArXiv AI | 24-12-25
Toad is a unified experience for AI in the terminal (willmcgugan.github.io)
Source: Hacker News | 24-12-25
Local AI is driving the biggest change in laptops in decades (ieee.org)
Source: Hacker News | 24-12-25
Show HN: I hired AI to fix my memory, but made it 100% Offline for privacy (namememory.netlify.app)
Source: Hacker News | 24-12-25
Scaling LLMs to Larger Codebases (kierangill.xyz)
Source: Hacker News | 23-12-25
Claude Code gets native LSP support (github.com/anthropics)
Source: Hacker News | 23-12-25
Executorch: On-device AI across mobile, embedded and edge for PyTorch (github.com/pytorch)
Source: Hacker News | 23-12-25
The Illustrated Transformer (jalammar.github.io)
Source: Hacker News | 23-12-25
Few-Shot Learning of a Graph-Based Neural Network Model Without Backpropagation
Authors: Mykyta Lapin, Kostiantyn Bokhan, Yurii Parzhyn |
Source: ArXiv AI | 23-12-25
Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI -- Lessons from Civilization V
Authors: John Chen, Sihan Cheng, Can Gurkan, Ryan Lay, Moez Salahuddin |
Source: ArXiv AI | 23-12-25
Large Language Models as Discounted Bayesian Filters
Authors: Jensen Zhang, Jing Yang, Keze Wang |
Source: ArXiv AI | 23-12-25
ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning
Authors: Weijie Zhou, Xuangtang Xiong, Ye Tian, Lijun Yue, Xinyu Wu, Wei Li, Chaoyang Zhao, Honghui Dong, Ming Tang, Jinqiao Wang, Zhengyou Zhang |
Source: ArXiv AI | 23-12-25
IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling
Authors: Jones David, Shreya Ghosh |
Source: ArXiv AI | 23-12-25
Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models
Authors: Gökdeniz Gülmez |
Source: ArXiv AI | 23-12-25
The Dead Salmons of AI Interpretability
Authors: Maxime Méloux, Giada Dirupo, François Portet, Maxime Peyrard |
Source: ArXiv AI | 23-12-25
MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking
Authors: Jianyi Zhang, Shizhao Liu, Ziyin Zhou, Zhen Li |
Source: ArXiv AI | 23-12-25
Population-Evolve: a Parallel Sampling and Evolutionary Method for LLM Math Reasoning
Authors: Yanzhi Zhang, Yitong Duan, Zhaoxi Zhang, Jiyan He, Shuxin Zheng |
Source: ArXiv AI | 23-12-25
Can abstract concepts from LLM improve SLM performance?
Authors: Siddharth Tandon |
Source: ArXiv AI | 23-12-25
Observer, Not Player: Simulating Theory of Mind in LLMs through Game Observation
Authors: Jerry Wang, Ting Yiu Liu |
Source: ArXiv AI | 23-12-25
Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis
Authors: Chenghao Li, Chaoning Zhang, Yi Lu, Shuxu Chen, Xudong Wang, Jiaquan Zhang, Zhicheng Wang, Zhengxun Jin, Kuien Liu, Sung-Ho Bae, Guoqing Wang, Yang Yang, Hen Tao Shen |
Source: ArXiv AI | 23-12-25
Vibe Reasoning: Eliciting Frontier AI Mathematical Capabilities -- A Case Study on IMO 2025 Problem 6
Authors: Jiaao Wu, Xian Zhang, Fan Yang, Yinpeng Dong |
Source: ArXiv AI | 23-12-25
Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models
Authors: Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher |
Source: ArXiv AI | 23-12-25
VIGOR+: Iterative Confounder Generation and Validation via LLM-CEVAE Feedback Loop
Authors: JiaWei Zhu, ZiHeng Liu |
Source: ArXiv AI | 23-12-25
PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models
Authors: A. B. M. Ashikur Rahman, Saeed Anwar, Muhammad Usman, Irfan Ahmad, Ajmal Mian |
Source: ArXiv AI | 23-12-25
Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios
Authors: Jiawen Wang, Jingjing Wang, Tianyang Chen, Min Zhang, Guodong Zhou |
Source: ArXiv AI | 23-12-25
Bangla MedER: Multi-BERT Ensemble Approach for the Recognition of Bangla Medical Entity
Authors: Tanjim Taharat Aurpa, Farzana Akter, Md. Mehedi Hasan, Shakil Ahmed, Shifat Ara Rafiq, Fatema Khan |
Source: ArXiv AI | 23-12-25
Intelligent Knowledge Mining Framework: Bridging AI Analysis and Trustworthy Preservation
Authors: Binh Vu |
Source: ArXiv AI | 23-12-25
Integrating Computational Methods and AI into Qualitative Studies of Aging and Later Life
Authors: Corey M. Abramson |
Source: ArXiv AI | 23-12-25
LLM-based Behaviour Driven Development for Hardware Design
Authors: Rolf Drechsler, Qian Liu |
Source: ArXiv AI | 23-12-25
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Authors: Wanghan Xu, Yuhao Zhou, Yifan Zhou, Qinglong Cao, Shuo Li, Jia Bu, Bo Liu, Yixin Chen, Xuming He, Xiangyu Zhao, Xiang Zhuang, Fengxiang Wang, Zhiwang Zhou, Qiantai Feng, Wenxuan Huang, Jiaqi Wei, Hao Wu, Yuejin Yang, Guangshuai Wang, Sheng Xu, Ziyan Huang, Xinyao Liu, Jiyao Liu, Cheng Tang, Wei Li, Ying Chen, Junzhi Ning, Pengfei Jiang, Chenglong Ma, Ye Du, Changkai Ji, Huihui Xu, Ming Hu, Jiangbin Zheng, Xin Chen, Yucheng Wu, Feifei Jiang, Xi Chen, Xiangru Tang, Yuchen Fu, Yingzhou Lu, Yuanyuan Zhang, Lihao Sun, Chengbo Li, Jinzhe Ma, Wanhao Liu, Yating Liu, Kuo-Cheng Wu, Shengdu Chai, Yizhou Wang, Ouwen Zhangjin, Chen Tang, Shufei Zhang, Wenbo Cao, Junjie Ren, Taoyong Cui, Zhouheng Yao, Juntao Deng, Yijie Sun, Feng Liu, Wangxu Wei, Jingyi Xu, Zhangrui Li, Junchao Gong, Zijie Guo, Zhiyu Yao, Zaoyu Chen, Tianhao Peng, Fangchen Yu, Bo Zhang, Dongzhan Zhou, Shixiang Tang, Jiaheng Liu, Fenghua Ling, Yan Lu, Yuchen Ren, Ben Fei, Zhen Zhao, Xinyu Gu, Rui Su, Xiao-Ming Wu, Weikang Si, Yang Liu, Hao Chen, Xiangchao Yan, Xue Yang, Junchi Yan, Jiamin Wu, Qihao Zheng, Chenhui Li, Zhiqiang Gao, Hao Kong, Junjun He, Mao Su, Tianfan Fu, Peng Ye, Chunfeng Song, Nanqing Dong, Yuqiang Li, Huazhu Fu |
Source: ArXiv AI | 23-12-25
UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering
Authors: Yinxu Tang, Chengsong Huang, Jiaxin Huang, William Yeoh |
Source: ArXiv AI | 23-12-25
Solomonoff-Inspired Hypothesis Ranking with LLMs for Prediction Under Uncertainty
Authors: Josh Barber (QUT), Rourke Young (QUT), Cameron Coombe (QUT and CSIRO), Will Browne (QUT) |
Source: ArXiv AI | 23-12-25
A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving
Authors: Timo Pierre Schrader, Lukas Lange, Tobias Kaminski, Simon Razniewski, Annemarie Friedrich |
Source: ArXiv AI | 23-12-25
Value Under Ignorance in Universal Artificial Intelligence
Authors: Cole Wyeth, Marcus Hutter |
Source: ArXiv AI | 23-12-25
Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction
Authors: Ziyang Lin, Zixuan Sun, Sanhorn Chen, Xiaoyang Chen, Roy Zhao |
Source: ArXiv AI | 23-12-25
MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation
Authors: Shengwei Zhao, Jingwen Yao, Sitong Wei, Linhai Xu, Yuying Liu, Dong Zhang, Zhiqiang Tian, Shaoyi Du |
Source: ArXiv AI | 23-12-25
Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation
Authors: Daksh Jain, Aarya Jain, Ashutosh Desai, Avyakt Verma, Ishan Bhanuka, Pratik Narang, Dhruv Kumar |
Source: ArXiv AI | 23-12-25
ScoutGPT: Capturing Player Impact from Team Action Sequences Using GPT-Based Framework
Authors: Miru Hong, Minho Lee, Geonhee Jo, Jae-Hee So, Pascal Bauer, Sang-Ki Ko |
Source: ArXiv AI | 23-12-25
Dialectics for Artificial Intelligence
Authors: Zhengmian Hu |
Source: ArXiv AI | 23-12-25
Towards Explainable Conversational AI for Early Diagnosis with Large Language Models
Authors: Maliha Tabassum, M Shamim Kaiser |
Source: ArXiv AI | 23-12-25
Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally
Authors: Robin Schimmelpfennig, Mark Díaz, Vinodkumar Prabhakaran, Aida Davani |
Source: ArXiv AI | 23-12-25
How I protect my Forgejo instance from AI web crawlers (esy.fun)
Source: Hacker News | 22-12-25
I announced my divorce on Instagram and then AI impersonated me (eiratansey.com)
Source: Hacker News | 22-12-25
Three ways to solve problems (andreasfragner.com)
Source: Hacker News | 22-12-25
Get an AI code review in 10 seconds (oldmanrahul.com)
Source: Hacker News | 22-12-25
Evaluating chain-of-thought monitorability (openai.com)
Source: Hacker News | 22-12-25
Do Large Language Models Know What They Don't Know? Kalshibench: A New Benchmark for Evaluating Epistemic Calibration via Prediction Markets
Authors: Lukas Nel |
Source: ArXiv AI | 21-12-25
Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning
Authors: Polaris Jhandi, Owais Kazi, Shreyas Subramanian, Neel Sendas |
Source: ArXiv AI | 21-12-25
Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems
Authors: Jovan Pavlović, Miklós Krész, László Hajdu |
Source: ArXiv AI | 21-12-25
Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries
Authors: Jonathan A. Handler |
Source: ArXiv AI | 21-12-25
PediatricAnxietyBench: Evaluating Large Language Model Safety Under Parental Anxiety and Pressure in Pediatric Consultations
Authors: Vahideh Zolfaghari |
Source: ArXiv AI | 21-12-25
Scaling Spatial Reasoning in MLLMs through Programmatic Data Synthesis
Authors: Zhi Helu, Huang Jingjing, Xu Wang, Xu Yangbin, Zhang Wanyue, Jiang Baoyang, Deng Shirui, Zhu Liang, Li Fangfang, Zhao Tiejun, Lin Yankai, Yao Yuan |
Source: ArXiv AI | 21-12-25
Topic Discovery and Classification for Responsible Generative AI Adaptation in Higher Education
Authors: Diane Myung-kyung Woodbridge, Allyson Seba, Freddie Seba, Aydin Schwartz |
Source: ArXiv AI | 21-12-25
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints
Authors: Aniruddha Roy, Jyoti Patel, Aman Chadha, Vinija Jain, Amitava Das |
Source: ArXiv AI | 21-12-25
Synthelite: Chemist-aligned and feasibility-aware synthesis planning with LLMs
Authors: Nguyen Xuan-Vu, Daniel Armstrong, Milena Wehrbach, Andres M Bran, Zlatko Jončev, Philippe Schwaller |
Source: ArXiv AI | 21-12-25
Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference
Authors: Arther Tian, Alex Ding, Frank Chen, Alan Wu, Aaron Chan, Bruce Zhang |
Source: ArXiv AI | 21-12-25
TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries
Authors: Jiayang Yang, Chunhui Zhao, Martin Guay, Zhixing Cao |
Source: ArXiv AI | 21-12-25
Prefix Probing: Lightweight Harmful Content Detection for Large Language Models
Authors: Jirui Yang, Hengqi Guo, Zhihui Lu, Yi Zhao, Yuansen Zhang, Shijing Hu, Qiang Duan, Yinggui Wang, Tao Wei |
Source: ArXiv AI | 21-12-25
From Personalization to Prejudice: Bias and Discrimination in Memory-Enhanced AI Agents for Recruitment
Authors: Himanshu Gharat, Himanshi Agrawal, Gourab K. Patro |
Source: ArXiv AI | 21-12-25
Scaling Laws for Energy Efficiency of Local LLMs
Authors: Ander Alvarez, Alessandro Genuardi, Nilotpal Sinha, Antonio Tiene, Samuel Mugel, Román Orús |
Source: ArXiv AI | 21-12-25
Comprehensive AI Literacy: The Case for Centering Human Agency
Authors: Sri Yash Tadimalla, Justin Cary, Gordon Hull, Jordan Register, Daniel Maxwell, David Pugalee, Tina Heafner |
Source: ArXiv AI | 21-12-25
Cyber Humanism in Education: Reclaiming Agency through AI and Learning Sciences
Authors: Giovanni Adorni |
Source: ArXiv AI | 21-12-25
Discovering and Learning Probabilistic Models of Black-Box AI Capabilities
Authors: Daniel Bramblett, Rushang Karia, Adrian Ciotinga, Ruthvick Suresh, Pulkit Verma, YooJung Choi, Siddharth Srivastava |
Source: ArXiv AI | 21-12-25
TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge
Authors: Khurram Khalil, Khaza Anuarul Hoque |
Source: ArXiv AI | 21-12-25
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Authors: Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille |
Source: ArXiv AI | 21-12-25
Clair Obscur having its Indie Game Game Of The Year award stripped due to AI use (thegamer.com)
Source: Hacker News | 21-12-25
Show HN: HN Wrapped 2025 - an LLM reviews your year on HN (kadoa.com)
Source: Hacker News | 21-12-25
Gemini 3 Pro vs. 2.5 Pro in Pokemon Crystal (jcz.dev)
Source: Hacker News | 21-12-25
Claude in Chrome (claude.com)
Source: Hacker News | 21-12-25
Measuring AI Ability to Complete Long Tasks (metr.org)
Source: Hacker News | 21-12-25
Skills Officially Comes to Codex (developers.openai.com)
Source: Hacker News | 21-12-25
MIRA – An open-source persistent AI entity with memory (github.com/taylorsatula)
Source: Hacker News | 21-12-25
LLM Year in Review (karpathy.bearblog.dev)
Source: Hacker News | 20-12-25
Garage – An S3 object store so reliable you can run it outside datacenters (deuxfleurs.fr)
Source: Hacker News | 20-12-25
Show HN: Misata – synthetic data engine using LLM and Vectorized NumPy (github.com/rasinmuhammed)
Source: Hacker News | 20-12-25
History LLMs: Models trained exclusively on pre-1913 texts (github.com/dgoettlich)
Source: Hacker News | 20-12-25
We ran Anthropic’s interviews through structured LLM analysis (playbookatlas.com)
Source: Hacker News | 20-12-25
Believe the Checkbook (robertgreiner.com)
Source: Hacker News | 20-12-25
OpenAI updates Codex model, adds trusted access program for cyber defense
Source: The Decoder | 19-12-25
GPT-5.2 tops OpenAI's new FrontierScience test but struggles with real research problems
Source: The Decoder | 19-12-25
Anthropic's AI store makes money while debating eternal transcendence
Source: The Decoder | 19-12-25
Skills for organizations, partners, the ecosystem (claude.com)
Source: Hacker News | 19-12-25
Show HN: Stop AI scrapers from hammering your self-hosted blog (using porn) (github.com/vivienhenz24)
Source: Hacker News | 19-12-25
How China built its ‘Manhattan Project’ to rival the West in AI chips (japantimes.co.jp)
Source: Hacker News | 19-12-25
Show HN: Picknplace.js, an alternative to drag-and-drop (jgthms.com)
Source: Hacker News | 19-12-25
GPT-5.2-Codex (openai.com)
Source: Hacker News | 19-12-25
Prompt caching for cheaper LLM tokens (ngrok.com)
Source: Hacker News | 19-12-25
Show HN: CommerceTXT – An open standard for AI shopping context (like llms.txt) (commercetxt.org)
Source: Hacker News | 19-12-25
Nvidia's Nemotron 3 swaps pure Transformers for a Mamba hybrid to run AI agents efficiently
Source: The Decoder | 19-12-25
Google makes Gemini 3 Flash the default for search and slashes reasoning costs
Source: The Decoder | 19-12-25
I've been writing ring buffers wrong all these years (2016) (snellman.net)
Source: Hacker News | 19-12-25
Firefox will have an option to disable all AI features (mastodon.social)
Source: Hacker News | 19-12-25
Adversarial versification in portuguese as a jailbreak operator in LLMs
Authors: Joao Queiroz |
Source: ArXiv AI | 19-12-25
Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality
Authors: Efe Bozkir, Enkelejda Kasneci |
Source: ArXiv AI | 19-12-25
BERT and CNN integrated Neural Collaborative Filtering for Recommender Systems
Authors: Abdullah Al Munem, Sumona Yeasmin, Mohammad Rezwanul Huq |
Source: ArXiv AI | 19-12-25
Attention in Motion: Secure Platooning via Transformer-based Misbehavior Detection
Authors: Konstantinos Kalogiannis, Ahmed Mohamed Hussain, Hexu Li, Panos Papadimitratos |
Source: ArXiv AI | 19-12-25
How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?
Authors: Hua Yang, Alejandro Velasco, Thanh Le-Cong, Md Nazmul Haque, Bowen Xu, Denys Poshyvanyk |
Source: ArXiv AI | 19-12-25
BashArena: A Control Setting for Highly Privileged AI Agents
Authors: Adam Kaufman, James Lucassen, Tyler Tracy, Cody Rushing, Aryan Bhatt |
Source: ArXiv AI | 19-12-25
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Authors: Zhenwen Liang, Sidi Lu, Wenhao Yu, Kishan Panaganti, Yujun Zhou, Haitao Mi, Dong Yu |
Source: ArXiv AI | 19-12-25
Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers
Authors: Adam Karvonen, James Chua, Clément Dumas, Kit Fraser-Taliente, Subhash Kantamneni, Julian Minder, Euan Ong, Arnab Sen Sharma, Daniel Wen, Owain Evans, Samuel Marks |
Source: ArXiv AI | 19-12-25
Evaluating Metrics for Safety with LLM-as-Judges
Authors: Kester Clegg, Richard Hawkins, Ibrahim Habli, Tom Lawton |
Source: ArXiv AI | 19-12-25
LADY: Linear Attention for Autonomous Driving Efficiency without Transformers
Authors: Jihao Huang, Xi Xia, Zhiyuan Li, Tianle Liu, Jingke Wang, Junbo Chen, Tengju Ye |
Source: ArXiv AI | 19-12-25
Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation
Authors: Xidan Song, Weiqi Wang, Ruifeng Cao, Qingya Hu |
Source: ArXiv AI | 19-12-25
AgroAskAI: A Multi-Agentic AI Framework for Supporting Smallholder Farmers' Enquiries Globally
Authors: Nadine Angela Cantonjos, Arpita Biswas |
Source: ArXiv AI | 19-12-25
IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection
Authors: Roman Nekrasov, Stefano Fossati, Indika Kumara, Damian Andrew Tamburri, Willem-Jan van den Heuvel |
Source: ArXiv AI | 19-12-25
Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning
Authors: Sahil Rajesh Dhayalkar |
Source: ArXiv AI | 19-12-25
Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models
Authors: Jinwu Hu, Dongjin Yang, Langyu Bian, Zhiquan Wen, Yufeng Wang, Yaofo Chen, Bin Xiao, Yuanqing Li, Mingkui Tan |
Source: ArXiv AI | 19-12-25
Agentic AI for Integrated Sensing and Communication: Analysis, Framework, and Case Study
Authors: Wenwen Xie, Geng Sun, Ruichen Zhang, Xuejie Liu, Yinqiu Liu, Jiacheng Wang, Dusit Niyato, Ping Zhang |
Source: ArXiv AI | 19-12-25
Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations
Authors: Reinhard Moratz, Niklas Daute, James Ondieki, Markus Kattenbeck, Mario Krajina, Ioannis Giannopoulos |
Source: ArXiv AI | 19-12-25
ChatGPT and Gemini participated in the Korean College Scholastic Ability Test -- Earth Science I
Authors: Seok-Hyun Ga, Chun-Yen Chang |
Source: ArXiv AI | 19-12-25
Evaluating Large Language Models in Scientific Discovery
Authors: Zhangde Song, Jieyu Lu, Yuanqi Du, Botao Yu, Thomas M. Pruyn, Yue Huang, Kehan Guo, Xiuzhe Luo, Yuanhao Qu, Yi Qu, Yinkai Wang, Haorui Wang, Jeff Guo, Jingru Gan, Parshin Shojaee, Di Luo, Andres M Bran, Gen Li, Qiyuan Zhao, Shao-Xiong Lennon Luo, Yuxuan Zhang, Xiang Zou, Wanru Zhao, Yifan F. Zhang, Wucheng Zhang, Shunan Zheng, Saiyang Zhang, Sartaaj Takrim Khan, Mahyar Rajabi-Kochi, Samantha Paradi-Maropakis, Tony Baltoiu, Fengyu Xie, Tianyang Chen, Kexin Huang, Weiliang Luo, Meijing Fang, Xin Yang, Lixue Cheng, Jiajun He, Soha Hassoun, Xiangliang Zhang, Wei Wang, Chandan K. Reddy, Chao Zhang, Zhiling Zheng, Mengdi Wang, Le Cong, Carla P. Gomes, Chang-Yu Hsieh, Aditya Nandy, Philippe Schwaller, Heather J. Kulik, Haojun Jia, Huan Sun, Seyed Mohamad Moosavi, Chenru Duan |
Source: ArXiv AI | 19-12-25
Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning
Authors: Jiaqi Xu, Cuiling Lan, Xuejin Chen, Yan LU |
Source: ArXiv AI | 19-12-25
Explaining the Reasoning of Large Language Models Using Attribution Graphs
Authors: Chase Walker, Rickard Ewetz |
Source: ArXiv AI | 19-12-25
OpenAI's GPT-5 router rollback shows why AI requires unlearning old habits
Source: The Decoder | 18-12-25
AWS CEO says replacing junior devs with AI is 'one of the dumbest ideas' (finalroundai.com)
Source: Hacker News | 18-12-25
Breaking Paragraphs into Lines [pdf] (1981) (gwern.net)
Source: Hacker News | 18-12-25
A school locked down after AI flagged a gun. It was a clarinet (washingtonpost.com)
Source: Hacker News | 18-12-25
Gemini 3 Flash: Frontier intelligence built for speed (blog.google)
Source: Hacker News | 18-12-25
Flick (YC F25) Is Hiring Founding Engineer to Build Figma for AI Filmmaking (ycombinator.com)
Source: Hacker News | 18-12-25
Developers can now submit apps to ChatGPT (openai.com)
Source: Hacker News | 18-12-25
Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer
Authors: Adarsha Shrestha, Basanta Pokharel, Binit Shrestha, Smriti Adhikari, Dinesh Gothe |
Source: ArXiv AI | 18-12-25
AI-Powered Annotation Pipelines for Stabilizing Large Language Models: A Human-AI Synergy Approach
Authors: Gangesh Pathak, Prasanna Kumar |
Source: ArXiv AI | 18-12-25
LoopBench: Discovering Emergent Symmetry Breaking Strategies with LLM Swarms
Authors: Ali Parsaee, Yashar Talebirad, Csongor Szepesvári, Vishwajeet Ohal, Eden Redman |
Source: ArXiv AI | 18-12-25
Adjudicator: Correcting Noisy Labels with a KG-Informed Council of LLM Agents
Authors: Doohee You, Sundeep Paul |
Source: ArXiv AI | 18-12-25
Leveraging LLMs for Structured Data Extraction from Unstructured Patient Records
Authors: Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V.K. Cody Bumgardner |
Source: ArXiv AI | 18-12-25
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Authors: Jun Zhang, Teng Wang, Yuying Ge, Yixiao Ge, Xinhao Li, Ying Shan, Limin Wang |
Source: ArXiv AI | 18-12-25
Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracy
Authors: Steve Nwaiwu, Nipat Jongsawat, Anucha Tungkasthan |
Source: ArXiv AI | 18-12-25
Semantic Grounding Index: Geometric Bounds on Context Engagement in RAG Systems
Authors: Javier Marín |
Source: ArXiv AI | 18-12-25
Mathematics and Coding are Universal AI Benchmarks
Authors: Przemyslaw Chojecki |
Source: ArXiv AI | 18-12-25
Evaluating Frontier LLMs on PhD-Level Mathematical Reasoning: A Benchmark on a Textbook in Theoretical Computer Science about Randomized Algorithms
Authors: Yang Cao, Yubin Chen, Xuyang Guo, Zhao Song, Song Yue, Jiahao Zhang, Jiale Zhao |
Source: ArXiv AI | 18-12-25
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Authors: Kamer Ali Yuksel |
Source: ArXiv AI | 18-12-25
ReflCtrl: Controlling LLM Reflection via Representation Engineering
Authors: Ge Yan, Chung-En Sun, Tsui-Wei (Lily) Weng |
Source: ArXiv AI | 18-12-25
RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees
Authors: Junjie Ma, Jinlong Li |
Source: ArXiv AI | 18-12-25
Optimizing Multi-Tier Supply Chain Ordering with a Hybrid Liquid Neural Network and Extreme Gradient Boosting Model
Authors: Chunan Tong |
Source: ArXiv AI | 18-12-25
HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental Scale Streamflow Quality Control
Authors: Ijaz Ul Haq, Byung Suk Lee, Julia N. Perdrial, David Baude |
Source: ArXiv AI | 18-12-25
Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting
Authors: Georgios Bouchouras, Dimitrios Doumanas, Andreas Soularidis, Konstantinos Kotis, George A. Vouros |
Source: ArXiv AI | 18-12-25
Georeferencing complex relative locality descriptions with large language models
Authors: Aneesha Fernando, Surangika Ranathunga, Kristin Stock, Raj Prasanna, Christopher B. Jones |
Source: ArXiv AI | 18-12-25
Massive Editing for Large Language Models Based on Dynamic Weight Generation
Authors: Wentao Wan, Qiqing Lao, Zhiwei Xie, Hefeng Wu, Runnan Lin, Liang Lin, Keze Wang |
Source: ArXiv AI | 18-12-25
PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals
Authors: Jia Hu, Junqi Li, Weimeng Lin, Peng Jia, Yuxiong Ji, Jintao Lai |
Source: ArXiv AI | 18-12-25
Sparse Multi-Modal Transformer with Masking for Alzheimer's Disease Classification
Authors: Cheng-Han Lu, Pei-Hsuan Tsai |
Source: ArXiv AI | 18-12-25
Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling
Authors: Annu Rana, Gaurav Kumar |
Source: ArXiv AI | 18-12-25
Nearly half of US workers now use AI on the job, but most aren't using it daily
Source: The Decoder | 17-12-25
GPT Image 1.5 (openai.com)
Source: Hacker News | 17-12-25
I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours (simonwillison.net)
Source: Hacker News | 17-12-25
Log Anomaly Detection with Large Language Models via Knowledge-Enriched Fusion
Authors: Anfeng Peng, Ajesh Koyatan Chathoth, Stephen Lee |
Source: ArXiv AI | 17-12-25
AGAPI-Agents: An Open-Access Agentic AI Platform for Accelerated Materials Design on AtomGPT.org
Authors: Jaehyung Lee, Justin Ely, Kent Zhang, Akshaya Ajith, Charles Rhys Campbell, Kamal Choudhary |
Source: ArXiv AI | 17-12-25
CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
Authors: Dong Liu, Yanxuan Yu |
Source: ArXiv AI | 17-12-25
Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation
Authors: Aydin Ayanzadeh, Tim Oates |
Source: ArXiv AI | 17-12-25
The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification
Authors: Luke Bhan, Hanyu Zhang, Andrew Gordon Wilson, Michael W. Mahoney, Chuck Arvin |
Source: ArXiv AI | 17-12-25
Understanding Critical Thinking in Generative Artificial Intelligence Use: Development, Validation, and Correlates of the Critical Thinking in AI Use Scale
Authors: Gabriel R. Lau, Wei Yan Low, Louis Tay, Ysabel Guevarra, Dragan Gašević, Andree Hartanto |
Source: ArXiv AI | 17-12-25
Feeling the Strength but Not the Source: Partial Introspection in LLMs
Authors: Ely Hahami, Lavik Jain, Ishaan Sinha |
Source: ArXiv AI | 17-12-25
Quantum-Aware Generative AI for Materials Discovery: A Framework for Robust Exploration Beyond DFT Biases
Authors: Mahule Roy, Guillaume Lambard |
Source: ArXiv AI | 17-12-25
World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents
Authors: Yesid Fonseca, Manuel S. Ríos, Nicanor Quijano, Luis F. Giraldo |
Source: ArXiv AI | 17-12-25
KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs
Authors: Mingrui Ye, Chanjin Zheng, Zengyi Yu, Chenyu Xiang, Zhixue Zhao, Zheng Yuan, Helen Yannakoudakis |
Source: ArXiv AI | 17-12-25
SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
Authors: Dang Phuong Nam, Nguyen Kieu, Pham Thanh Hieu |
Source: ArXiv AI | 17-12-25
AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation
Authors: Miriam Horovicz |
Source: ArXiv AI | 17-12-25
Personalized QoE Prediction: A Demographic-Augmented Machine Learning Framework for 5G Video Streaming Networks
Authors: Syeda Zunaira Ahmed, Hejab Tahira Beg, Maryam Khalid |
Source: ArXiv AI | 17-12-25
Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning
Authors: Enhong Mu, Minami Yoda, Yan Zhang, Mingyue Zhang, Yutaka Matsuno, Jialong Li |
Source: ArXiv AI | 17-12-25
Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution
Authors: Boyang Yan |
Source: ArXiv AI | 17-12-25
M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
Authors: Bizhe Bai, Hongming Wu, Peng Ye, Tao Chen |
Source: ArXiv AI | 17-12-25
Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels
Authors: Anika Sharma, Malavika Mampally, Chidaksh Ravuru, Kandyce Brennan, Neil Gaikwad |
Source: ArXiv AI | 17-12-25
neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings
Authors: Ojas Pungalia, Rashi Upadhyay, Abhishek Mishra, Abhiram H, Tejasvi Alladi, Sujan Yenuganti, Dhruv Kumar |
Source: ArXiv AI | 17-12-25
Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection
Authors: Francesca Da Ros, Luca Di Gaspero, Kevin Roitero |
Source: ArXiv AI | 17-12-25
Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents (github.com/theauditortool)
Source: Hacker News | 17-12-25
Sei AI (YC W22) Is Hiring (ycombinator.com)
Source: Hacker News | 17-12-25
Midjourney is alemwjsl (aadillpickle.com)
Source: Hacker News | 17-12-25
8M users' AI conversations sold for profit by "privacy" extensions (koi.ai)
Source: Hacker News | 16-12-25
Native vs. emulation: World of Warcraft game performance on Snapdragon X Elite (rkblog.dev)
Source: Hacker News | 16-12-25
AI Autonomy or Human Dependency? Defining the Boundary in Responsible AI with the α-Coefficient
Authors: Nattaya Mairittha, Gabriel Phorncharoenmusikul, Sorawit Worapradidth |
Source: ArXiv AI | 16-12-25
Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture
Authors: Tanu Singh, Pranamesh Chakraborty, Long T. Truong |
Source: ArXiv AI | 16-12-25
MLLM Machine Unlearning via Visual Knowledge Distillation
Authors: Yuhang Wang, Zhenxing Niu, Haoxuan Ji, Guangyu He, Haichang Gao, Gang Hua |
Source: ArXiv AI | 16-12-25
REMODEL-LLM: Transforming C code to Java using LLMs
Authors: Aryan Gupta, Y. Raghu Reddy |
Source: ArXiv AI | 16-12-25
Does Less Hallucination Mean Less Creativity? An Empirical Investigation in LLMs
Authors: Mohor Banerjee, Nadya Yuki Wangsajaya, Syed Ali Redha Alsagoff, Min Sen Tan, Zachary Choy Kit Chun, Alvin Chan Guo Wei |
Source: ArXiv AI | 16-12-25
Exploring MLLM-Diffusion Information Transfer with MetaCanvas
Authors: Han Lin, Xichen Pan, Ziqi Huang, Ji Hou, Jialiang Wang, Weifeng Chen, Zecheng He, Felix Juefei-Xu, Junzhe Sun, Zhipeng Fan, Ali Thabet, Mohit Bansal, Chu Wang |
Source: ArXiv AI | 16-12-25
Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition
Authors: Sheng Feng, Shuqing Ma, Xiaoqian Zhu |
Source: ArXiv AI | 16-12-25
Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols
Authors: Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu |
Source: ArXiv AI | 16-12-25
CogniSNN: Enabling Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability with Random Graph Architectures in Spiking Neural Networks
Authors: Yongsheng Huang, Peibo Duan, Yujie Wu, Kai Sun, Zhipeng Liu, Changsheng Zhang, Bin Zhang, Mingkun Xu |
Source: ArXiv AI | 16-12-25
From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews
Authors: Brenda Nogueira, Werner Geyer, Andrew Anderson, Toby Jia-Jun Li, Dongwhi Kim, Nuno Moniz, Nitesh V. Chawla |
Source: ArXiv AI | 16-12-25
Conditional Coverage Diagnostics for Conformal Prediction
Authors: Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach |
Source: ArXiv AI | 16-12-25
Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints
Authors: Kai Yao, Marc Juarez |
Source: ArXiv AI | 16-12-25
Deep Learning–Accelerated Multi-Start Large Neighborhood Search for Real-time Freight Bundling
Authors: Haohui Zhang, Wouter van Heeswijk, Xinyu Hu, Neil Yorke-Smith, Martijn Mes |
Source: ArXiv AI | 16-12-25
A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation
Authors: Hong Je-Gal, Chan-Bin Yi, Hyun-Suk Lee |
Source: ArXiv AI | 16-12-25
Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance
Authors: Gonca Gürsun |
Source: ArXiv AI | 16-12-25
General-purpose AI models can generate actionable knowledge on agroecological crop protection
Authors: Kris A.G. Wyckhuys |
Source: ArXiv AI | 16-12-25
Three methods, one problem: Classical and AI approaches to no-three-in-line
Authors: Pranav Ramanathan, Thomas Prellberg, Matthew Lewis, Prathamesh Dinesh Joshi, Raj Abhijit Dandekar, Rajat Dandekar, Sreedath Panat |
Source: ArXiv AI | 16-12-25
AI-MASLD Metabolic Dysfunction and Information Steatosis of Large Language Models in Unstructured Clinical Narratives
Authors: Yuan Shen, Xiaojun Wu, Linghua Yu |
Source: ArXiv AI | 16-12-25
BAID: A Benchmark for Bias Assessment of AI Detectors
Authors: Priyam Basu, Yunfeng Zhang, Vipul Raheja |
Source: ArXiv AI | 16-12-25
It seems that OpenAI is scraping [certificate transparency] logs (benjojo.co.uk)
Source: Hacker News | 16-12-25
I'm Kenyan. I don't write like ChatGPT, ChatGPT writes like me (marcusolang.substack.com)
Source: Hacker News | 16-12-25
Elevated errors across many models (claude.com)
Source: Hacker News | 15-12-25
Copywriters reveal how AI has decimated their industry (bloodinthemachine.com)
Source: Hacker News | 15-12-25
Show HN: I wrote a book – Debugging TypeScript Applications (in beta) (pragprog.com)
Source: Hacker News | 15-12-25
If AI replaces workers, should it also pay taxes? (elpais.com)
Source: Hacker News | 15-12-25
Microsoft Copilot AI Comes to LG TVs, and Can't Be Deleted (techpowerup.com)
Source: Hacker News | 15-12-25
Claude CLI deleted my home directory Wiped my whole Mac (reddit.com)
Source: Hacker News | 15-12-25
Ask HN: How can I get better at using AI for programming?
Source: Hacker News | 14-12-25
If a Meta AI model can read a brain-wide signal, why wouldn't the brain? (1393.xyz)
Source: Hacker News | 14-12-25
Exploring LLMs for Scientific Information Extraction Using The SciEx Framework
Authors: Sha Li, Ayush Sadekar, Nathan Self, Yiqi Su, Lars Andersland, Mira Chaplin, Annabel Zhang, Hyoju Yang, James B Henderson, Krista Wigginton, Linsey Marr, T.M. Murali, Naren Ramakrishnan |
Source: ArXiv AI | 14-12-25
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
Authors: Nick Jiang, Xiaoqing Sun, Lisa Dunlap, Lewis Smith, Neel Nanda |
Source: ArXiv AI | 14-12-25
Linear socio-demographic representations emerge in Large Language Models from indirect cues
Authors: Paul Bouchaud, Pedro Ramaciotti |
Source: ArXiv AI | 14-12-25
Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research
Authors: Dani Roytburg, Beck Miller |
Source: ArXiv AI | 14-12-25
Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning
Authors: Logan Robbins |
Source: ArXiv AI | 14-12-25
Robust AI Security and Alignment: A Sisyphean Endeavor?
Authors: Apostol Vassilev |
Source: ArXiv AI | 14-12-25
Trustworthy Orchestration Artificial Intelligence by the Ten Criteria with Control-Plane Governance
Authors: Byeong Ho Kang, Wenli Yang, Muhammad Bilal Amin |
Source: ArXiv AI | 14-12-25
Reverse Thinking Enhances Missing Information Detection in Large Language Models
Authors: Yuxin Liu, Chaojie Gu, Yihang Zhang, Bin Qian, Shibo He |
Source: ArXiv AI | 14-12-25
CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment
Authors: Yakun Zhu, Zhongzhen Huang, Qianhan Feng, Linjie Mu, Yannian Gu, Shaoting Zhang, Qi Dou, Xiaofan Zhang |
Source: ArXiv AI | 14-12-25
LLM-Empowered Representation Learning for Emerging Item Recommendation
Authors: Ziying Zhang, Quanming Yao, Yaqing Wang |
Source: ArXiv AI | 14-12-25
Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation
Authors: Lim Chien Her, Ming Yan, Yunshu Bai, Ruihao Li, Hao Zhang |
Source: ArXiv AI | 14-12-25
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
Authors: Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar |
Source: ArXiv AI | 14-12-25
NormCode: A Semi-Formal Language for Context-Isolated AI Planning
Authors: Xin Guan |
Source: ArXiv AI | 14-12-25
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
Authors: Haiteng Zhao, Junhao Shen, Yiming Zhang, Songyang Gao, Kuikun Liu, Tianyou Ma, Fan Zheng, Dahua Lin, Wenwei Zhang, Kai Chen |
Source: ArXiv AI | 14-12-25
Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs
Authors: Minghao LI, Ruihang Wang, Rui Tan, Yonggang Wen |
Source: ArXiv AI | 14-12-25
On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity
Authors: Muhua Huang, Qinlin Zhao, Xiaoyuan Yi, Xing Xie |
Source: ArXiv AI | 14-12-25
Challenges of Evaluating LLM Safety for User Welfare
Authors: Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran, Theo Farrell, Ingmar Weber |
Source: ArXiv AI | 14-12-25
Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly
Authors: Moshe Lahmy, Roi Yozevitch |
Source: ArXiv AI | 14-12-25
COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators
Authors: Wei Fang, Chiyao Wang, Wenshuai Ma, Hui Liu, Jianqiang Hu, Xiaona Niu, Yi Chu, Mingming Zhang, Jingxiao Yang, Dongwei Zhang, Zelin Li, Pengyun Liu, Jiawei Zheng, Pengke Zhang, Chaoshi Qin, Wangang Guo, Bin Wang, Yugang Xue, Wei Zhang, Zikuan Wang, Rui Zhu, Yihui Cao, Quanmao Lu, Rui Meng, Yan Li |
Source: ArXiv AI | 14-12-25
LLMs Can Assist with Proposal Selection at Large User Facilities
Authors: Lijie Ding, Janell Thomson, Jon Taylor, Changwoo Do |
Source: ArXiv AI | 14-12-25
Pangram 3.0 AI text detector claims up to 99.98% accuracy, even for subtly AI-assisted content
阅读更多来源: The Decoder | 14-12-25
Making AI sound human comes at the cost of meaning, researchers show
阅读更多来源: The Decoder | 14-12-25
llamafile: Distribute and Run LLMs with a Single Filegithub.com/mozilla-ai
阅读更多来源: Hacker News | 14-12-25
Ensuring a National Policy Framework for Artificial Intelligencewhitehouse.gov
阅读更多来源: Hacker News | 13-12-25
macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt (developer.apple.com)
Source: Hacker News | 13-12-25
OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI (simonwillison.net)
Source: Hacker News | 13-12-25
OpenAI at 10: Altman sees superintelligence arriving by 2035
Source: The Decoder | 13-12-25
FACTS benchmark shows that even top AI models struggle with the truth
Source: The Decoder | 13-12-25
Nvidia develops location tracking for AI chips
Source: The Decoder | 12-12-25
Google adds new features to boost website visibility in AI search
Source: The Decoder | 12-12-25
Deepseek reportedly using thousands of smuggled Nvidia chips for AI training
Source: The Decoder | 12-12-25
The Walt Disney Company and OpenAI Partner on Sora (openai.com)
Source: Hacker News | 12-12-25
Guarding My Git Forge Against AI Scrapers (vulpinecitrus.info)
Source: Hacker News | 12-12-25
Training LLMs for Honesty via Confessions (arxiv.org)
Source: Hacker News | 12-12-25
Show HN: Autofix Bot – Hybrid static analysis and AI code review agent
Source: Hacker News | 12-12-25
GPT-5.2 (openai.com)
Source: Hacker News | 12-12-25
RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning
Authors: Yucan Guo, Miao Su, Saiping Guan, Zihao Sun, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng |
Source: ArXiv AI | 12-12-25
System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection
Authors: Binglin Wu, Jiaxiu Zou, Xianneng Li |
Source: ArXiv AI | 12-12-25
Hands-on Evaluation of Visual Transformers for Object Recognition and Detection
Authors: Dimitrios N. Vlachogiannis, Dimitrios A. Koutsomitropoulos |
Source: ArXiv AI | 12-12-25
The Gender Code: Gendering the Global Governance of Artificial Intelligence
Authors: Jelena Cupac |
Source: ArXiv AI | 12-12-25
Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection
Authors: Paloma Piot, David Otero, Patricia Martín-Rodilla, Javier Parapar |
Source: ArXiv AI | 12-12-25
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Authors: Jan Betley, Jorio Cocola, Dylan Feng, James Chua, Andy Arditi, Anna Sztyber-Betley, Owain Evans |
Source: ArXiv AI | 12-12-25
Ethics Readiness of Artificial Intelligence: A Practical Evaluation Method
Authors: Laurynas Adomaitis, Vincent Israel-Jost, Alexei Grinbaum |
Source: ArXiv AI | 12-12-25
Quantifying Uncertainty in Machine Learning-Based Pervasive Systems: Application to Human Activity Recognition
Authors: Vladimir Balditsyn, Philippe Lalanda, German Vega, Stéphanie Chollet |
Source: ArXiv AI | 12-12-25
Circuits, Features, and Heuristics in Molecular Transformers
Authors: Kristof Varadi, Mark Marosi, Peter Antal |
Source: ArXiv AI | 12-12-25
LLMs in Interpreting Legal Documents
Authors: Simone Corbo |
Source: ArXiv AI | 12-12-25
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
Authors: Jianfei Li, Ines Rosellon-Inclan, Gitta Kutyniok, Jean-Luc Starck |
Source: ArXiv AI | 12-12-25
FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning
Authors: Khurram Khalil, Khaza Anuarul Hoque |
Source: ArXiv AI | 12-12-25
A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem
Authors: Luciano Floridi, Yiyang Jia, Fernando Tohmé |
Source: ArXiv AI | 12-12-25
AI TIPS 2.0: A Comprehensive Framework for Operationalizing AI Governance
Authors: Pamela Gupta |
Source: ArXiv AI | 12-12-25
Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study
Authors: Adrian Ryser, Florian Allwein, Tim Schlippe |
Source: ArXiv AI | 12-12-25
An End-to-end Planning Framework with Agentic LLMs and PDDL
Authors: Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge |
Source: ArXiv AI | 12-12-25
RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning
Authors: Khurram Khalil, Muhammad Mahad Khaliq, Khaza Anuarul Hoque |
Source: ArXiv AI | 12-12-25
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Authors: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho |
Source: ArXiv AI | 12-12-25
Mistral's open coding model Devstral 2 claims sevenfold cost advantage over Claude Sonnet
Source: The Decoder | 11-12-25
Big AI’s biggest names rally around the Agentic AI Foundation to set agent standards
Source: The Decoder | 11-12-25
Google positions Gemini as the "glue" for its new XR ecosystem
Source: The Decoder | 11-12-25
Show HN: Automated license plate reader coverage in the USA (alpranalysis.com)
Source: Hacker News | 11-12-25
Show HN: Local Privacy Firewall – blocks PII and secrets before ChatGPT sees them (github.com/privacyshield-ai)
Source: Hacker News | 11-12-25
Getting a Gemini API key is an exercise in frustration (ankursethi.com)
Source: Hacker News | 11-12-25
DeepSeek uses banned Nvidia chips for AI model, report says (yahoo.com)
Source: Hacker News | 11-12-25
Launch HN: InspectMind (YC W24) – AI agent for reviewing construction drawings
Source: Hacker News | 11-12-25
Chinese AI firms build shadow workforce in Kenya using WhatsApp and mobile payments
Source: The Decoder | 10-12-25
70% of creative professionals fear stigma over AI use, Anthropic study finds
Source: The Decoder | 10-12-25
Google faces an antitrust probe for using web and YouTube content in AI without opt-out or fair pay
Source: The Decoder | 10-12-25
OpenAI claims generative AI saves knowledge workers 40 to 80 minutes a day
Source: The Decoder | 10-12-25
Report: Aging power grid puts OpenAI and Microsoft's growth at risk
Source: The Decoder | 10-12-25
McDonald's pulls AI Christmas ad after backlash (bbc.co.uk)
Source: Hacker News | 10-12-25
Donating the Model Context Protocol and establishing the Agentic AI Foundation (anthropic.com)
Source: Hacker News | 10-12-25
Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now (dosaygo-studio.github.io)
Source: Hacker News | 10-12-25
Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
Authors: Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum |
Source: ArXiv AI | 10-12-25
Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents
Authors: Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu |
Source: ArXiv AI | 10-12-25
When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Authors: Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng |
Source: ArXiv AI | 10-12-25
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Authors: Jakub Krajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram |
Source: ArXiv AI | 10-12-25
SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models
Authors: Jiayi Tian, Seyedarmin Azizi, Yequan Zhao, Erfan Baghaei Potraghloo, Sean McPherson, Sharath Nittur Sridhar, Zhengyang Wang, Zheng Zhang, Massoud Pedram, Souvik Kundu |
Source: ArXiv AI | 10-12-25
Can AI autonomously build, operate, and use the entire data stack?
Authors: Arvind Agarwal, Lisa Amini, Sameep Mehta, Horst Samulowitz, Kavitha Srinivas |
Source: ArXiv AI | 10-12-25
Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs
Authors: Gabriel M. Arantes, Richard F. Pinto, Bruno L. Dalmazo, Eduardo N. Borges, Giancarlo Lucca, Viviane L. D. de Mattos, Fabian C. Cardoso, Rafael A. Berri |
Source: ArXiv AI | 10-12-25
Large Language Models for Education and Research: An Empirical and User Survey-based Analysis
Authors: Md Mostafizer Rahman, Ariful Islam Shiplu, Md Faizul Ibne Amin, Yutaka Watanobe, Lu Peng |
Source: ArXiv AI | 10-12-25
Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching
Authors: Caroline N. Leach, Mitchell A. Klusty, Samuel E. Armstrong, Justine C. Pickarski, Kristen L. Hankins, Emily B. Collier, Maya Shah, Aaron D. Mullen, V. K. Cody Bumgardner |
Source: ArXiv AI | 10-12-25
Soil Compaction Parameters Prediction Based on Automated Machine Learning Approach
Authors: Caner Erden, Alparslan Serhat Demir, Abdullah Hulusi Kokcam, Talas Fikret Kurnaz, Ugur Dagdeviren |
Source: ArXiv AI | 10-12-25
Predicting California Bearing Ratio with Ensemble and Neural Network Models: A Case Study from Türkiye
Authors: Abdullah Hulusi Kökçam, Uğur Dağdeviren, Talas Fikret Kurnaz, Alparslan Serhat Demir, Caner Erden |
Source: ArXiv AI | 10-12-25
rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection
Authors: Sijia Chen, Baochun Li, Di Niu |
Source: ArXiv AI | 10-12-25
Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making
Authors: Wentao Zhang, Qunbo Wang, Tao Zhang, Junsheng Wu, Hongping Gan, Yang Liu, Ling Dai, Shizhuang Deng, Shuntong Sun |
Source: ArXiv AI | 10-12-25
Enhancing Explainability of Graph Neural Networks Through Conceptual and Structural Analyses and Their Extensions
Authors: Tien Cuong Bui |
Source: ArXiv AI | 10-12-25
From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change
Authors: Yong-Woon Kim |
Source: ArXiv AI | 10-12-25
The SMART+ Framework for AI Systems
Authors: Laxmiraju Kandikatla, Branislav Radeljic |
Source: ArXiv AI | 10-12-25
Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans
Authors: Tammy Zhong, Yang Song, Maurice Pagnucco |
Source: ArXiv AI | 10-12-25
Protein Secondary Structure Prediction Using Transformers
Authors: Manzi Kevin Maxime |
Source: ArXiv AI | 10-12-25
CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models
Authors: Hui Wang, Yang Liu, Xiaoyu Zhang, Chaoxu Mu |
Source: ArXiv AI | 10-12-25
Deconstructing the Dual Black Box: A Plug-and-Play Cognitive Framework for Human-AI Collaborative Enhancement and Its Implications for AI Governance
Authors: Yiming Lu |
Source: ArXiv AI | 10-12-25
A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows
Authors: Eranga Bandara, Ross Gore, Peter Foytik, Sachin Shetty, Ravi Mukkamala, Abdul Rahman, Xueping Liang, Safdar H. Bouk, Amin Hass, Sachini Rajapakse, Ng Wee Keong, Kasun De Zoysa, Aruna Withanage, Nilaan Loganathan |
Source: ArXiv AI | 10-12-25
Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs
Authors: Angela van Sprang, Laurens Samson, Ana Lucic, Erman Acar, Sennay Ghebreab, Yuki M. Asano |
Source: ArXiv AI | 10-12-25
Launch HN: Mentat (YC F24) – Controlling LLMs with Runtime Intervention
Source: Hacker News | 10-12-25
Transformers know more than they can tell: Learning the Collatz sequence (arxiv.org)
Source: Hacker News | 10-12-25
OpenEvolve: Teaching LLMs to Discover Algorithms Through Evolution (algorithmicsuperintelligence.ai)
Source: Hacker News | 10-12-25
Apple's slow AI pace becomes a strength as market grows weary of spending (yahoo.com)
Source: Hacker News | 10-12-25
Agentic AI Foundation (block.xyz)
Source: Hacker News | 10-12-25
Deep learning for autism detection using clinical notes: A comparison of transfer learning for a transparent and black-box approach
Authors: Gondy Leroy, Prakash Bisht, Sai Madhuri Kandula, Nell Maltman, Sydney Rice |
Source: ArXiv AI | 10-12-25
Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals
Authors: Michael Todasco (Visiting Fellow at the James Silberrad Center for Artificial Intelligence, San Diego State University) |
Source: ArXiv AI | 10-12-25
UncertaintyZoo: A Unified Toolkit for Quantifying Predictive Uncertainty in Deep Learning Systems
Authors: Xianzong Wu, Xiaohong Li, Lili Quan, Qiang Hu |
Source: ArXiv AI | 10-12-25
GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols
Authors: Mohammad Soleymanibrojeni, Roland Aydin, Diego Guedes-Sobrinho, Alexandre C. Dias, Maurício J. Piotrowski, Wolfgang Wenzel, Celso Ricardo Caldeira Rêgo |
Source: ArXiv AI | 10-12-25
Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression
Authors: Qiming Bao, Xiaoxuan Fu |
Source: ArXiv AI | 10-12-25
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems
Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Yunqi Guo, Siyang Jiang, Wenrui Lu, Kaiwei Liu, Hancheng Xiang, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan |
Source: ArXiv AI | 10-12-25
Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents
Authors: Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang |
Source: ArXiv AI | 10-12-25
Academic journals' AI policies fail to curb the surge in AI-assisted academic writing
Authors: Yongyuan He, Yi Bu |
Source: ArXiv AI | 10-12-25
FlatFormer: A Flat Transformer Knowledge Tracing Model Based on Cognitive Bias Injection
Authors: Xiao-li Xia, Hou-biao Li |
Source: ArXiv AI | 10-12-25
Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game?
Authors: John Licato, Stephen Steinle, Brayden Hollis |
Source: ArXiv AI | 10-12-25
JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models
Authors: Ce Chi, Xing Wang, Zhendong Wang, Xiaofan Liu, Ce Li, Zhiyan Song, Chen Zhao, Kexin Yang, Boshen Shi, Jingjing Yang, Chao Deng, Junlan Feng |
Source: ArXiv AI | 10-12-25
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Authors: Ming Ma, Jue Zhang, Fangkai Yang, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang |
Source: ArXiv AI | 10-12-25
A Neural Affinity Framework for Abstract Reasoning: Diagnosing the Compositional Gap in Transformer Architectures via Procedural Task Taxonomy
Authors: Miguel Ingram, Arthur Joseph Merritt III |
Source: ArXiv AI | 10-12-25
ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes
Authors: Rongjia Zhou, Chengzhuo Li, Carl Yang, Jiaying Lu |
Source: ArXiv AI | 10-12-25
Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement
Authors: Yongsheng Lian |
Source: ArXiv AI | 10-12-25
How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations
Authors: JV Roig |
Source: ArXiv AI | 10-12-25
RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models
Authors: Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He |
Source: ArXiv AI | 10-12-25
Large Causal Models from Large Language Models
Authors: Sridhar Mahadevan |
Source: ArXiv AI | 10-12-25
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
Authors: Nearchos Potamitis, Lars Klein, Akhil Arora |
Source: ArXiv AI | 10-12-25
Perplexity's BrowseSafe tries to patch the gaping security holes inherent in AI browser agents
Source: The Decoder | 09-12-25
Horses: AI progress is steady. Human equivalence is sudden (andyljones.com)
Source: Hacker News | 09-12-25
LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)
Source: Hacker News | 09-12-25
Big Tech-Funded AI Papers Have Higher Citation Impact, Greater Insularity, and Larger Recency Bias
Authors: Max Martin Gnewuch, Jan Philip Wahle, Terry Ruas, Bela Gipp |
Source: ArXiv AI | 09-12-25
Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework
Authors: Tasnimul Hassan, Md Faisal Karim, Haziq Jeelani, Elham Behnam, Robert Green, Fayeq Jeelani Syed |
Source: ArXiv AI | 09-12-25
NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation
Authors: Daniel Rose, Roxane Axel Jacob, Johannes Kirchmair, Thierry Langer |
Source: ArXiv AI | 09-12-25
Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning
Authors: Muhammet Cagri Yeke, Samil Sirin, Kivilcim Yuksel, Abdurrahman Gumus |
Source: ArXiv AI | 09-12-25
Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
Authors: Amirkia Rafiei Oskooei, S. Selcan Yukcu, Mehmet Cevheri Bozoglan, Mehmet S. Aktas |
Source: ArXiv AI | 09-12-25
Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception
Authors: Anne Sielemann, Valentin Barner, Stefan Wolf, Masoud Roschani, Jens Ziehn, Juergen Beyerer |
Source: ArXiv AI | 09-12-25
Trusted AI Agents in the Cloud
Authors: Teofil Bodea, Masanori Misono, Julian Pritzi, Patrick Sabanic, Thore Sommer, Harshavardhan Unnibhavi, David Schall, Nuno Santos, Dimitrios Stavrakakis, Pramod Bhatotia |
Source: ArXiv AI | 09-12-25
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
Authors: David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata |
Source: ArXiv AI | 09-12-25
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Authors: Germán Kruszewski, Pierre Erbacher, Jos Rozen, Marc Dymetman |
Source: ArXiv AI | 09-12-25
Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations
Authors: Igor Halperin |
Source: ArXiv AI | 09-12-25
Resolving Zadeh's Paradox: Axiomatic Possibility Theory as a Foundation for Reliable Artificial Intelligence
Authors: Bychkov Oleksii, Bychkova Sophia, Lytvynchuk Khrystyna |
Source: ArXiv AI | 09-12-25
Bridging Traditional Machine Learning and Large Language Models: A Two-Part Course Design for Modern AI Education
Authors: Fang Li |
Source: ArXiv AI | 09-12-25
BEAVER: An Efficient Deterministic LLM Verifier
Authors: Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh |
Source: ArXiv AI | 09-12-25
Ontology Learning with LLMs: A Benchmark Study on Axiom Identification
Authors: Roos M. Bakker, Daan L. Di Scala, Maaike H.T. de Boer, Stephan A. Raaijmakers |
Source: ArXiv AI | 09-12-25
Using Large Language Models to Create Personalized Networks From Therapy Sessions
Authors: Clarissa W. Ong, Hiba Arnaout, Kate Sheehan, Estella Fox, Eugen Owtscharow, Iryna Gurevych |
Source: ArXiv AI | 09-12-25
To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis
Authors: Federico Bianchi, Yongchan Kwon, Zachary Izzo, Linjun Zhang, James Zou |
Source: ArXiv AI | 09-12-25
NYT sues AI search engine Perplexity for alleged content misuse
Source: The Decoder | 08-12-25
OpenAI insists its shopping suggestions shouldn't be seen as advertising
Source: The Decoder | 08-12-25
Show HN: Lockenv – Simple encrypted secrets storage for Git (github.com/illarion)
Source: Hacker News | 08-12-25
Google Titans architecture, helping AI have long-term memory (research.google)
Source: Hacker News | 08-12-25
I failed to recreate the 1996 Space Jam website with Claude (j0nah.com)
Source: Hacker News | 08-12-25
The "confident idiot" problem: Why AI needs hard rules, not vibe checks (steerlabs.substack.com)
Source: Hacker News | 08-12-25
Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning
Authors: Purbesh Mitra, Sennur Ulukus |
Source: ArXiv AI | 07-12-25
Addressing Logical Fallacies In Scientific Reasoning From Large Language Models: Towards a Dual-Inference Training Framework
Authors: Peter B. Walker, Hannah Davidson, Aiden Foster, Matthew Lienert, Thomas Pardue, Dale Russell |
Source: ArXiv AI | 07-12-25
Educational Cone Model in Embedding Vector Spaces
Authors: Yo Ehara |
Source: ArXiv AI | 07-12-25
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Authors: Huy Nghiem, Swetasudha Panda, Devashish Khatwani, Huy V. Nguyen, Krishnaram Kenthapadi, Hal Daumé III |
Source: ArXiv AI | 07-12-25
Executable Governance for AI: Translating Policies into Rules Using LLMs
Authors: Gautam Varma Datla, Anudeep Vurity, Tejaswani Dash, Tazeem Ahmad, Mohd Adnan, Saima Rafi |
Source: ArXiv AI | 07-12-25
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
Authors: Hongye Cao, Zhixin Bai, Ziyue Peng, Boyan Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao |
Source: ArXiv AI | 07-12-25
A Conceptual Model for AI Adoption in Financial Decision-Making: Addressing the Unique Challenges of Small and Medium-Sized Enterprises
Authors: Manh Chien Vu, Thang Le Dinh, Tran Duc Le, Thi Lien Huong Nguyen |
Source: ArXiv AI | 07-12-25
Artificial Intelligence Applications in Horizon Scanning for Infectious Diseases
Authors: Ian Miles, Mayumi Wakimoto, Wagner Meira Jr., Daniela Paula, Daylene Ticiane, Bruno Rosa, Jane Biddulph, Stelios Georgiou, Valdir Ermida |
Source: ArXiv AI | 07-12-25
Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions
Authors: Weiwei Wang, Weijie Zou, Jiyong Min |
Source: ArXiv AI | 07-12-25
GovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows
Authors: Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, Wentao Zhang |
Source: ArXiv AI | 07-12-25
Neural Decoding of Overt Speech from ECoG Using Vision Transformers and Contrastive Representation Learning
Authors: Mohamed Baha Ben Ticha, Xingchen Ran, Guillaume Saldanha, Gaël Le Godais, Philémon Roussel, Marc Aubert, Amina Fontanell, Thomas Costecalde, Lucas Struber, Serpil Karakas, Shaomin Zhang, Philippe Kahane, Guillaume Charvet, Stéphan Chabardès, Blaise Yvert |
Source: ArXiv AI | 07-12-25
GTM: Simulating the World of Tools for AI Agents
Authors: Zhenzhen Ren, Xinpeng Zhang, Zhenxing Qian, Yan Gao, Yu Shi, Shuxin Zheng, Jiyan He |
Source: ArXiv AI | 07-12-25
BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models
Authors: Yu-Wei Zhan, Xin Wang, Pengzhe Mao, Tongtong Feng, Ren Wang, Wenwu Zhu |
Source: ArXiv AI | 07-12-25
Sequential Enumeration in Large Language Models
Authors: Kuinan Hou, Marco Zorzi, Alberto Testolin |
Source: ArXiv AI | 07-12-25
Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
Authors: Jae Hee Lee, Anne Lauscher, Stefano V. Albrecht |
Source: ArXiv AI | 07-12-25
BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation
Authors: Chenyang Zuo, Siqi Fan, Zaiqing Nie |
Source: ArXiv AI | 07-12-25
Enabling Ethical AI: A case study in using Ontological Context for Justified Agentic AI Decisions
Authors: Liam McGee, James Harvey, Lucy Cull, Andreas Vermeulen, Bart-Floris Visscher, Malvika Sharan |
Source: ArXiv AI | 07-12-25
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
Authors: M Zeeshan, Saud Satti |
Source: ArXiv AI | 07-12-25
STELLA: Guiding Large Language Models for Time Series Forecasting with Semantic Abstractions
Authors: Junjie Fan, Hongye Zhao, Linduo Wei, Jiayu Rao, Guijia Li, Jiaxin Yuan, Wenqi Xu, Yong Qi |
Source: ArXiv AI | 07-12-25
From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research
Authors: Lukas Weidener, Marko Brkić, Chiara Bacci, Mihailo Jovanović, Emre Ulgac, Alex Dobrin, Johannes Weniger, Martin Vlas, Ritvik Singh, Aakaash Meduri |
Source: ArXiv AI | 07-12-25
Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
Authors: Vignesh Kumar Kembu, Pierandrea Morandini, Marta Bianca Maria Ranzini, Antonino Nocera |
Source: ArXiv AI | 07-12-25
The AI Consumer Index (ACE)
Authors: Julien Benchek, Rohit Shetty, Benjamin Hunsberger, Ajay Arun, Zach Richards, Brendan Foody, Osvald Nitski, Bertie Vidgen |
Source: ArXiv AI | 07-12-25
Toward Continuous Neurocognitive Monitoring: Integrating Speech AI with Relational Graph Transformers for Rare Neurological Diseases
Authors: Raquel Norel, Michele Merler, Pavitra Modi |
Source: ArXiv AI | 07-12-25
The unexpected effectiveness of one-shot decompilation with Claude (blog.chrislewis.au)
Source: Hacker News | 07-12-25
Z2 – Lithographically fabricated IC in a garage fab (zeloof.xyz)
Source: Hacker News | 07-12-25
Using LLMs at Oxide (oxide.computer)
Source: Hacker News | 07-12-25
Google gathers triple OpenAI's AI data through its search monopoly
Source: The Decoder | 07-12-25
OpenAI ordered to turn over 20 million ChatGPT chats to the New York Times
Source: The Decoder | 07-12-25
GeoVista brings open-source AI geolocation to near-parity with top commercial models
Source: The Decoder | 07-12-25
Corporate AI agents use simple workflows with human oversight instead of chasing full autonomy
Source: The Decoder | 07-12-25
I'm Peter Roberts, immigration attorney who does work for YC and startups. AMA
Source: Hacker News | 06-12-25
Gemini 3 Pro: the frontier of vision AI (blog.google)
Source: Hacker News | 06-12-25
OpenAI tests "Confessions" to uncover hidden AI misbehavior
Source: The Decoder | 06-12-25
Most technical problems are people problems (joeschrag.com)
Source: Hacker News | 06-12-25
YouTube caught making AI-edits to videos and adding misleading AI summaries (growyourown.services)
Source: Hacker News | 06-12-25
Physicist Steve Hsu publishes research built around a core idea generated by GPT-5
Source: The Decoder | 05-12-25
Anthropic CEO sees a looming economic risk as AI firms "YOLO" massive capital on uncertain futures
Source: The Decoder | 05-12-25
We gave 5 LLMs $100K to trade stocks for 8 months (aitradearena.com)
Source: Hacker News | 05-12-25
How elites could shape mass preferences as AI reduces persuasion costs (arxiv.org)
Source: Hacker News | 05-12-25
After 40 years of adventure games, Ron Gilbert pivots to outrunning Death (arstechnica.com)
Source: Hacker News | 05-12-25
What is better: a lookup table or an enum type? (cybertec-postgresql.com)
Source: Hacker News | 05-12-25
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Authors: Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian |
Source: ArXiv AI | 05-12-25
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Authors: Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, Chongxuan Li |
Source: ArXiv AI | 05-12-25
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
Authors: Dingwei Zhu, Zhiheng Xi, Shihan Dou, Yuhui Wang, Sixian Li, Junjie Ye, Honglin Guo, Shichun Liu, Chenhao Huang, Yajie Yang, Junlin Shang, Senjie Jin, Ming Zhang, Jiazheng Zhang, Caishuang Huang, Yunke Zhang, Demei Yan, Yuran Wang, Tao Gui |
Source: ArXiv AI | 05-12-25
A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
Authors: X.Y. Han, Yuan Zhong |
Source: ArXiv AI | 05-12-25
Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
Authors: Hang Xu, Linjiang Huang, Feng Zhao |
Source: ArXiv AI | 05-12-25
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Authors: Yixuan Li, Yuhao Lu, Yang Liu, Liang Li, R. Ruffini, Di Li, Rong-Gen Cai, Xiaoyan Zhu, Wenbin Lin, Yu Wang |
Source: ArXiv AI | 05-12-25
MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Authors: Yizhou Zhao, Zhiwei Steven Wu, Adam Block |
Source: ArXiv AI | 05-12-25
Exploring Syntropic Frameworks in AI Alignment: A Philosophical Investigation
Authors: Austin Spizzirri |
Source: ArXiv AI | 05-12-25
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Authors: Chandler Smith, Marwa Abdulhai, Manfred Diaz, Marko Tesic, Rakshit S. Trivedi, Alexander Sasha Vezhnevets, Lewis Hammond, Jesse Clifton, Minsuk Chang, Edgar A. Duéñez-Guzmán, John P. Agapiou, Jayd Matyas, Danny Karmon, Akash Kundu, Aliaksei Korshuk, Ananya Ananya, Arrasy Rahman, Avinaash Anand Kulandaivel, Bain McHale, Beining Zhang, Buyantuev Alexander, Carlos Saith Rodriguez Rojas, Caroline Wang, Chetan Talele, Chenao Liu, Chichen Lin, Diana Riazi, Di Yang Shi, Emanuel Tewolde, Elizaveta Tennant, Fangwei Zhong, Fuyang Cui, Gang Zhao, Gema Parreño Piqueras, Hyeonggeun Yun, Ilya Makarov, Jiaxun Cui, Jebish Purbey, Jim Dilkes, Jord Nguyen, Lingyun Xiao, Luis Felipe Giraldo, Manuela Chacon-Chamorro, Manuel Sebastian Rios Beltran, Marta Emili García Segura, Mengmeng Wang, Mogtaba Alim, Nicanor Quijano, Nico Schiavone, Olivia Macmillan-Scott, Oswaldo Peña, Peter Stone, Ram Mohan Rao Kadiyala, Rolando Fernandez, Ruben Manrique, Sunjia Lu, Sheila A. McIlraith, Shamika Dhuri, Shuqing Shi, Siddhant Gupta, Sneheel Sarangi, Sriram Ganapathi Subramanian, Taehun Cha, Toryn Q. Klassen, Wenming Tu, Weijian Fan, Wu Ruiyang, Xue Feng, Yali Du, Yang Liu, Yiding Wang, Yipeng Kang, Yoonchang Sung, Yuxuan Chen, Zhaowei Zhang, Zhihan Wang, Zhiqiang Wu, Ziang Chen, Zilong Zheng, Zixia Jia, Ziyan Wang, Dylan Hadfield-Menell, Natasha Jaques, Tim Baarslag, Jose Hernandez-Orallo, Joel Z. Leibo |
Source: ArXiv AI | 05-12-25
When Do Symbolic Solvers Enhance Reasoning in Large Language Models?
Authors: Zhiyuan He, Dingmin Wang |
Source: ArXiv AI | 05-12-25
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents
Authors: Reuben Tan, Baolin Peng, Zhengyuan Yang, Hao Cheng, Oier Mees, Theodore Zhao, Andrea Tupini, Isar Meijier, Qianhui Wu, Yuncong Yang, Lars Liden, Yu Gu, Sheng Zhang, Xiaodong Liu, Lijuan Wang, Marc Pollefeys, Yong Jae Lee, Jianfeng Gao |
Source: ArXiv AI | 05-12-25
RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design
Authors: Jiawei Xu, Fengfeng Wei, Weineng Chen |
Source: ArXiv AI | 05-12-25
Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol
Authors: Niklas Jobs, Luis Miguel Vieira da Silva, Jayanth Somashekaraiah, Maximilian Weigand, David Kube, Felix Gehlhoff |
Source: ArXiv AI | 05-12-25
Anthropic prepares for a potential IPO race with OpenAI
Source: The Decoder | 04-12-25
Paris-based Mistral releases Large 3, a major new open-source AI model
Source: The Decoder | 04-12-25
Leaked "Soul Doc" reveals how Anthropic programs Claude’s character
Source: The Decoder | 04-12-25
Anthropic brings Bun in-house, the runtime powering Claude Code’s $1B ARR
Source: The Decoder | 04-12-25
Amazon's Nova 2 undercuts OpenAI and Google on price but still trails top-tier models
Source: The Decoder | 04-12-25
Average DRAM price in USD over last 18 months (pcpartpicker.com)
Source: Hacker News | 04-12-25
Saturn (YC S24) Is Hiring Senior AI Engineer (ycombinator.com)
Source: Hacker News | 04-12-25
Reverse engineering a $1B Legal AI tool exposed 100k+ confidential files (alexschapiro.com)
Source: Hacker News | 04-12-25
Anthropic taps IPO lawyers as it races OpenAI to go public (ft.com)
Source: Hacker News | 04-12-25
Fine-Tuned Large Language Models for Logical Translation: Reducing Hallucinations with Lang2Logic
Authors: Muyu Pan, Dheeraj Kodakandla, Mahfuza Farooque |
Source: ArXiv AI | 04-12-25
The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models
Authors: Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh |
Source: ArXiv AI | 04-12-25
TokenPowerBench: Benchmarking the Power Consumption of LLM Inference
Authors: Chenxu Niu, Wei Zhang, Jie Li, Yongjian Zhao, Tongyang Wang, Xi Wang, Yong Chen |
Source: ArXiv AI | 04-12-25
Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge
Authors: Hamid Dadkhahi, Firas Trabelsi, Parker Riley, Juraj Juraska, Mehdi Mirzazadeh |
Source: ArXiv AI | 04-12-25
The 4/$δ$ Bound: Designing Predictable LLM-Verifier Systems for Formal Method Guarantee
Authors: Pierre Dantas, Lucas Cordeiro, Youcheng Sun, Waldir Junior |
阅读更多来源: ArXiv AI | 04-12-25
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
Authors: Lewen Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao |
Source: ArXiv AI | 04-12-25
Benchmarking LLM Agents for Wealth-Management Workflows
Authors: Rory Milsom |
Source: ArXiv AI | 04-12-25
STRIDE: A Systematic Framework for Selecting AI Modalities - Agentic AI, AI Assistants, or LLM Calls
Authors: Shubhi Asthana, Bing Zhang, Chad DeLuca, Ruchi Mahindru, Hima Patel |
Source: ArXiv AI | 04-12-25
DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses
Authors: Han Luo, Guy Laban |
Source: ArXiv AI | 04-12-25
Bridging the Gap: Toward Cognitive Autonomy in Artificial Intelligence
Authors: Noorbakhsh Amiri Golilarz, Sindhuja Penchala, Shahram Rahimi |
Source: ArXiv AI | 04-12-25
Guided Self-Evolving LLMs with Minimal Human Supervision
Authors: Wenhao Yu, Zhenwen Liang, Chengsong Huang, Kishan Panaganti, Tianqing Fang, Haitao Mi, Dong Yu |
Source: ArXiv AI | 04-12-25
Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets
Authors: Agostino Capponi, Alfio Gliozzo, Brian Zhu |
Source: ArXiv AI | 04-12-25
COPE: Chain-Of-Thought Prediction Engine for Open-Source Large Language Model Based Stroke Outcome Prediction from Clinical Notes
Authors: Yongkai Liu, Helena Feng, Bin Jiang, Yixin Wang, Max Wintermark, David S. Liebeskind, Michael Moseley, Maarten Lansberg, Gregory Albers, Jeremy Heit, Greg Zaharchuk |
Source: ArXiv AI | 04-12-25
IACT: A Self-Organizing Recursive Model for General AI Agents: A Technical White Paper on the Architecture Behind kragent.ai
Authors: Pengju Lu |
Source: ArXiv AI | 04-12-25
Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks
Authors: Zhiyuan He |
Source: ArXiv AI | 04-12-25
Self-Improving AI Agents through Self-Play
Authors: Przemyslaw Chojecki |
Source: ArXiv AI | 04-12-25
AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping
Authors: Md Abdul Kadir, Sai Suresh Macharla Vasu, Sidharth S. Nair, Daniel Sonntag |
Source: ArXiv AI | 04-12-25
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Authors: Zhonghao He, Tianyi Qiu, Hirokazu Shirado, Maarten Sap |
Source: ArXiv AI | 04-12-25
The future of AI in critical mineral exploration
Authors: Jef Caers |
Source: ArXiv AI | 04-12-25
From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?
Authors: Dawei Li, Abdullah Alnaibari, Arslan Bisharat, Manny Sandoval, Deborah Hall, Yasin Silva, Huan Liu |
Source: ArXiv AI | 04-12-25
Invasive Context Engineering to Control Large Language Models
Authors: Thomas Rivasseau |
Source: ArXiv AI | 04-12-25
Deepseek V3.2 rivals GPT-5 and Gemini 3 Pro, reaches IMO gold level as open source
Source: The Decoder | 03-12-25
Copyright pressure mounts as OpenAI battles over newspapers and pirate libraries
Source: The Decoder | 03-12-25
Runway’s Gen-4.5 edges past Google and OpenAI in text-to-video benchmark
Source: The Decoder | 03-12-25
Altman memo: new OpenAI model coming next week, outperforming Gemini 3
Source: The Decoder | 03-12-25
OpenAI declares 'code red' as Google catches up in AI race (theverge.com)
Source: Hacker News | 03-12-25
IBM CEO says there is 'no way' spending on AI data centers will pay off (businessinsider.com)
Source: Hacker News | 03-12-25
Super fast aggregations in PostgreSQL 19 (cybertec-postgresql.com)
Source: Hacker News | 03-12-25
Anthropic acquires Bun (bun.com)
Source: Hacker News | 03-12-25
Zig quits GitHub, says Microsoft's AI obsession has ruined the service (theregister.com)
Source: Hacker News | 03-12-25
Claude 4.5 Opus’ Soul Document (lesswrong.com)
Source: Hacker News | 03-12-25
Ecosia: The greenest AI is here (ecosia.org)
Source: Hacker News | 03-12-25
One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces
Authors: Yandong Sun, Qiang Huang, Ziwei Xu, Yiqun Sun, Yixuan Tang, Anthony K. H. Tung |
Source: ArXiv AI | 03-12-25
SemAgent: Semantic-Driven Agentic AI Empowered Trajectory Prediction in Vehicular Networks
Authors: Lin Zhu, Kezhi Wang, Luping Xiang, Kun Yang |
Source: ArXiv AI | 03-12-25
A Benchmark of Causal vs Correlation AI for Predictive Maintenance
Authors: Krishna Taduri, Shaunak Dhande, Giacinto Paolo (GP) Saggese, Paul Smith |
Source: ArXiv AI | 03-12-25
Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems
Authors: Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan |
Source: ArXiv AI | 03-12-25
Knowledge Graph Augmented Large Language Models for Next-Visit Disease Prediction
Authors: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel |
Source: ArXiv AI | 03-12-25
Benchmarking Overton Pluralism in LLMs
Authors: Elinor Poole-Dayan, Jiayi Wu, Taylor Sorensen, Jiaxin Pei, Michiel A. Bakker |
Source: ArXiv AI | 03-12-25
A Flexible Multi-Agent LLM-Human Framework for Fast Human Validated Tool Building
Authors: Daull Xavier (LIS, R2I, UTLN), Patrice Bellot (R2I, LIS, AMU), Emmanuel Bruno (R2I, UTLN), Vincent Martin, Elisabeth Murisasco (R2I, UTLN) |
Source: ArXiv AI | 03-12-25
SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic Chemistry
Authors: Daniel Armstrong, Zlatko Jončev, Andres M Bran, Philippe Schwaller |
Source: ArXiv AI | 03-12-25
Who Judges the Judge? LLM Jury-on-Demand: Building Trustworthy LLM Evaluation Systems
Authors: Xiaochuan Li, Ke Wang, Girija Gouda, Shubham Choudhary, Yaqun Wang, Linwei Hu, Joel Vaughan, Freddy Lecue |
Source: ArXiv AI | 03-12-25
Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees
Authors: Alessandro Breccia, Federica Gerace, Marco Lippi, Gabriele Sicuro, Pierluigi Contucci |
Source: ArXiv AI | 03-12-25
Learned-Rule-Augmented Large Language Model Evaluators
Authors: Jie Meng, Jin Mao |
Source: ArXiv AI | 03-12-25
Predicting Human Chess Moves: An AI Assisted Analysis of Chess Games Using Skill-group Specific n-gram Language Models
Authors: Daren Zhong, Dingcheng Huang, Clayton Greenberg |
Source: ArXiv AI | 03-12-25
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
Authors: Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Quincy Davis, Matei Zaharia, Chi Wang, Chenguang Wang |
Source: ArXiv AI | 03-12-25
Google, Nvidia, and OpenAI (stratechery.com)
Source: Hacker News | 02-12-25
Codex, Opus, Gemini try to build Counter Strike (instantdb.com)
Source: Hacker News | 02-12-25
Comparing AWS Lambda ARM64 vs. x86_64 Performance Across Runtimes in Late 2025 (chrisebert.net)
Source: Hacker News | 02-12-25
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf] (huggingface.co)
Source: Hacker News | 02-12-25
Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities
Authors: Aayush Garg, Zanis Ali Khan, Renzo Degiovanni, Qiang Tang |
Source: ArXiv AI | 02-12-25
Real-Time Procedural Learning From Experience for AI Agents
Authors: Dasheng Bi, Yubin Hu, Mohammed N. Nasir |
Source: ArXiv AI | 02-12-25
Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis
Authors: Chunzheng Zhu, Yangfang Lin, Jialin Shao, Jianxin Lin, Yijun Wang |
Source: ArXiv AI | 02-12-25
The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference
Authors: Hans Gundlach, Jayson Lynch, Matthias Mertens, Neil Thompson |
Source: ArXiv AI | 02-12-25
Physics-Informed Neural Networks for Thermophysical Property Retrieval
Authors: Ali Waseem, Malcolm Mielle |
Source: ArXiv AI | 02-12-25
A perceptual bias of AI Logical Argumentation Ability in Writing
Authors: Xi Cun, Jifan Ren, Asha Huang, Siyu Li, Ruzhen Song |
Source: ArXiv AI | 02-12-25
Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents
Authors: Yue Zhong, Yongju Tong, Jiawen Kang, Minghui Dai, Hong-Ning Dai, Zhou Su, Dusit Niyato |
Source: ArXiv AI | 02-12-25
When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming
Authors: Ahmad Tarraf, Koutaiba Kassem-Manthey, Seyed Ali Mohammadi, Philipp Martin, Lukas Moj, Semih Burak, Enju Park, Christian Terboven, Felix Wolf |
Source: ArXiv AI | 02-12-25
RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems
Authors: Mengfan Li, Xuanhua Shi, Yang Deng |
Source: ArXiv AI | 02-12-25
Tracing Footsteps of Similar Cities: Modeling Urban Economic Vitality with Dynamic Inter-City Graph Embeddings
Authors: Xiaofeng Li, Xiangyi Xiao, Xiaocong Du, Ying Zhang, Haipeng Zhang |
Source: ArXiv AI | 02-12-25
Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation
Authors: Fiona Y. Wang, Di Sheng Lee, David L. Kaplan, Markus J. Buehler |
Source: ArXiv AI | 02-12-25
Solving Context Window Overflow in AI Agents
Authors: Anton Bulle Labate, Valesca Moura de Sousa, Sandro Rama Fiorini, Leonardo Guerreiro Azevedo, Raphael Melo Thiago, Viviane Torres da Silva |
Source: ArXiv AI | 02-12-25
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents
Authors: Zhenghao Zhu, Yuanfeng Song, Xin Chen, Chengzhong Liu, Yakun Cui, Caleb Chen Cao, Sirui Han, Yike Guo |
Source: ArXiv AI | 02-12-25
Agentic AI Framework for Cloudburst Prediction and Coordinated Response
Authors: Toqeer Ali Syed, Sohail Khan, Salman Jan, Gohar Ali, Muhammad Nauman, Ali Akarma, Ahmad Ali |
Source: ArXiv AI | 02-12-25
Agentic AI Framework for Individuals with Disabilities and Neurodivergence: A Multi-Agent System for Healthy Eating, Daily Routines, and Inclusive Well-Being
Authors: Salman Jan, Toqeer Ali Syed, Gohar Ali, Ali Akarma, Mohammad Riyaz Belgaum, Ahmad Ali |
Source: ArXiv AI | 02-12-25
Agentic AI Framework for Smart Inventory Replenishment
Authors: Toqeer Ali Syed, Salman Jan, Gohar Ali, Ali Akarma, Ahmad Ali, Qurat-ul-Ain Mastoi |
Source: ArXiv AI | 02-12-25
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Authors: Bao Shu, Yan Cai, Jianjian Sun, Chunrui Han, En Yu, Liang Zhao, Jingcheng Hu, Yinmin Zhang, Haoran Lv, Yuang Peng, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Xiangyu Yue |
Source: ArXiv AI | 02-12-25
Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting
Authors: Daniil Sukhorukov, Andrei Zakharov, Nikita Glazkov, Katsiaryna Yanchanka, Vladimir Kirilin, Maxim Dubovitsky, Roman Sultimov, Yuri Maksimov, Ilya Makarov |
Source: ArXiv AI | 02-12-25
Sycophancy is the first LLM "dark pattern" (seangoedecke.com)
Source: Hacker News | 02-12-25
AI agents find $4.6M in blockchain smart contract exploits (anthropic.com)
Source: Hacker News | 02-12-25
The mere existence of Google TPUs reportedly saved OpenAI 30% on Nvidia chips
Source: The Decoder | 01-12-25
Pinokio 5.0 turns local machines into personal AI clouds
Source: The Decoder | 01-12-25
General Agentic Memory tackles context rot and outperforms RAG in memory benchmarks
Source: The Decoder | 01-12-25
Search tool that only returns content created before ChatGPT's public release (tegabrain.com)
Source: Hacker News | 01-12-25
Writing a good Claude.md (humanlayer.dev)
Source: Hacker News | 01-12-25
Mechanistic Interpretability for Transformer-based Time Series Classification
Authors: Matīss Kalnāre, Sofoklis Kitharidis, Thomas Bäck, Niki van Stein |
Source: ArXiv AI | 01-12-25
Scale-Agnostic Kolmogorov-Arnold Geometry in Neural Networks
Authors: Mathew Vanherreweghe, Michael H. Freedman, Keith M. Adams |
Source: ArXiv AI | 01-12-25
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Authors: Dongyang Fan, Diba Hashemi, Sai Praneeth Karimireddy, Martin Jaggi |
Source: ArXiv AI | 01-12-25
Mechanisms of Non-Monotonic Scaling in Vision Transformers
Authors: Anantha Padmanaban Krishna Kumar (Boston University) |
Source: ArXiv AI | 01-12-25
Minimizing Hyperbolic Embedding Distortion with LLM-Guided Hierarchy Restructuring
Authors: Melika Ayoughi, Pascal Mettes, Paul Groth |
Source: ArXiv AI | 01-12-25
Learning Multi-Access Point Coordination in Agentic AI Wi-Fi with Large Language Models
Authors: Yifan Fan, Le Liang, Peng Liu, Xiao Li, Ziyang Guo, Qiao Lan, Shi Jin, Wen Tong |
Source: ArXiv AI | 01-12-25
Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning
Authors: Linze Chen, Yufan Cai, Zhe Hou, Jinsong Dong |
Source: ArXiv AI | 01-12-25
Improving Procedural Skill Explanations via Constrained Generation: A Symbolic-LLM Hybrid Architecture
Authors: Rahul Dass, Thomas Bowlin, Zebing Li, Xiao Jin, Ashok Goel |
Source: ArXiv AI | 01-12-25
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Authors: Peiran Xu, Sudong Wang, Yao Zhu, Jianing Li, Yunjian Zhang |
Source: ArXiv AI | 01-12-25
From Prediction to Foresight: The Role of AI in Designing Responsible Futures
Authors: Maria Perez-Ortiz |
Source: ArXiv AI | 01-12-25
Self-Transparency Failures in Expert-Persona LLMs: A Large-Scale Behavioral Audit
Authors: Alex Diep |
Source: ArXiv AI | 01-12-25
On the Limits of Innate Planning in Large Language Models
Authors: Charles Schepanowski, Charles Ling |
Source: ArXiv AI | 01-12-25
The ARC benchmark's fall marks another casualty of relentless AI optimization
Source: The Decoder | 30-11-25
Leak confirms OpenAI is preparing ads on ChatGPT for public roll out (bleepingcomputer.com)
Source: Hacker News | 30-11-25
Student perceptions of AI coding assistants in learning (arxiv.org)
Source: Hacker News | 30-11-25
Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana (github.com/gavrielc)
Source: Hacker News | 30-11-25
28M Hacker News comments as vector embedding search dataset (clickhouse.com)
Source: Hacker News | 29-11-25
I Know We're in an AI Bubble Because Nobody Wants Me (petewarden.com)
Source: Hacker News | 29-11-25
So you wanna build a local RAG? (yakkomajuri.com)
Source: Hacker News | 29-11-25
DeepseekMath-V2 is Deepseek's latest attempt to pop the US AI bubble
Source: The Decoder | 29-11-25
Show HN: An LLM-Powered Tool to Catch PCB Schematic Mistakes (netlist.io)
Source: Hacker News | 29-11-25
Effective harnesses for long-running agents (anthropic.com)
Source: Hacker News | 29-11-25
Andrej Karpathy declares the war on AI homework lost and urges schools to stop policing it
Source: The Decoder | 28-11-25
250MWh 'Sand Battery' to start construction in Finland (energy-storage.news)
Source: Hacker News | 28-11-25
Same-day upstream Linux support for Snapdragon 8 Elite Gen 5 (qualcomm.com)
Source: Hacker News | 28-11-25
The current state of the theory that GPL propagates to AI models (shujisado.org)
Source: Hacker News | 28-11-25
TPUs vs. GPUs and why Google is positioned to win AI race in the long term (uncoveralpha.com)
Source: Hacker News | 28-11-25
New study maps how AI models think and where their reasoning breaks down
Source: The Decoder | 27-11-25
Show HN: Era – Open-source local sandbox for AI agents (github.com/binsquare)
Source: Hacker News | 27-11-25
Gemini CLI Tips and Tricks for Agentic Coding (github.com/addyosmani)
Source: Hacker News | 27-11-25
Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching Activities
Authors: Seyede Niloofar Hosseini, Ali Mojibi, Mahdi Mohseni, Navid Arjmand, Alireza Taheri |
Source: ArXiv AI | 27-11-25
Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning
Authors: Panayiotis Danassis, Naman Goel |
Source: ArXiv AI | 27-11-25
On Evaluating LLM Alignment by Evaluating LLMs as Judges
Authors: Yixin Liu, Pengfei Liu, Arman Cohan |
Source: ArXiv AI | 27-11-25
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
Authors: Wei He, Kai Han, Hang Zhou, Hanting Chen, Zhicheng Liu, Xinghao Chen, Yunhe Wang |
Source: ArXiv AI | 27-11-25
Scaling Item-to-Standard Alignment with Large Language Models: Accuracy, Limits, and Solutions
Authors: Farzan Karimi-Malekabadi, Pooya Razavi, Sonya Powers |
Source: ArXiv AI | 27-11-25
MicroSims: A Framework for AI-Generated, Scalable Educational Simulations with Universal Embedding and Adaptive Learning Support
Authors: Valerie Lockhart, Dan McCreary, Troy A. Peterson |
Source: ArXiv AI | 27-11-25
KOM: A Multi-Agent Artificial Intelligence System for Precision Management of Knee Osteoarthritis (KOA)
Authors: Weizhi Liu, Xi Chen, Zekun Jiang, Liang Zhao, Kunyuan Jiang, Ruisi Tang, Li Wang, Mingke You, Hanyu Zhou, Hongyu Chen, Qiankun Xiong, Yong Nie, Kang Li, Jian Li |
Source: ArXiv AI | 27-11-25
A System-Level Taxonomy of Failure Modes in Large Language Model Applications
Authors: Vaishali Vinay |
Source: ArXiv AI | 27-11-25
Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Authors: Daniel I Jackson, Emma L Jensen, Syed-Amad Hussain, Emre Sezgin |
Source: ArXiv AI | 27-11-25
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
Authors: Zixiao Huang, Wen Zeng, Tianyu Fu, Tengxuan Liu, Yizhou Sun, Ke Hong, Xinhao Yang, Chengchun Liu, Yan Li, Quanlu Zhang, Guohao Dai, Zhenhua Zhu, Yu Wang |
Source: ArXiv AI | 27-11-25
Interactive AI NPCs Powered by LLMs: Technical Report for the CPDC Challenge 2025
Authors: Yitian Huang, Yuxuan Lei, Jianxun Lian, Hao Liao |
Source: ArXiv AI | 27-11-25
Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
Authors: Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang |
Source: ArXiv AI | 27-11-25
Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
Authors: Alexander Beiser, Flavio Martinelli, Wulfram Gerstner, Johanni Brea |
Source: ArXiv AI | 27-11-25
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Authors: Yuanhao Li, Mingshan Liu, Hongbo Wang, Yiding Zhang, Yifei Ma, Wei Tan |
Source: ArXiv AI | 27-11-25
NNGPT: Rethinking AutoML with Large Language Models
Authors: Roman Kochnev, Waleed Khalid, Tolgay Atinc Uzun, Xi Zhang, Yashkumar Sanjaybhai Dhameliya, Furui Qin, Chandini Vysyaraju, Raghuvir Duvvuri, Avi Goyal, Dmitry Ignatov, Radu Timofte |
Source: ArXiv AI | 27-11-25
Universe of Thoughts: Enabling Creative Reasoning with Large Language Models
Authors: Yuto Suzuki, Farnoush Banaei-Kashani |
Source: ArXiv AI | 27-11-25
Assessing LLMs' Performance: Insights from the Chinese Pharmacist Exam
Authors: Xinran Wang, Boran Zhu, Shujuan Zhou, Ziwen Long, Dehua Zhou, Shu Zhang |
Source: ArXiv AI | 27-11-25
FRAGMENTA: End-to-end Fragmentation-based Generative Model with Agentic Tuning for Drug Lead Optimization
Authors: Yuto Suzuki, Paul Awolade, Daniel V. LaBarbera, Farnoush Banaei-Kashani |
Source: ArXiv AI | 27-11-25
PaTAS: A Parallel System for Trust Propagation in Neural Networks Using Subjective Logic
Authors: Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Dennis Eisermann, Frank Kargl |
Source: ArXiv AI | 27-11-25
Fighting AI with AI: Leveraging Foundation Models for Assuring AI-Enabled Safety-Critical Systems
Authors: Anastasia Mavridou, Divya Gopinath, Corina S. Păsăreanu |
Source: ArXiv AI | 27-11-25
Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development
Authors: David Szczecina, Senan Gaffori, Edmond Li |
Source: ArXiv AI | 27-11-25
JOPA: Java compiler in C++, Jikes modernized to Java 6 with Claude (github.com/7mind)
Source: Hacker News | 27-11-25
OpenAI's drive to make ChatGPT more agreeable left it validating user delusions at scale
Source: The Decoder | 26-11-25
Claude Opus 4.5 arrives with Anthropic cutting prices by two-thirds
Source: The Decoder | 26-11-25
OpenAI turns ChatGPT into a shopping agent that researches and compares products
Source: The Decoder | 26-11-25
M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
Authors: Yang Zhou, Mingyu Zhao, Zhenting Wang, Difei Gu, Bangwei Guo, Ruosong Ye, Ligong Han, Can Jin, Dimitris N. Metaxas |
Source: ArXiv AI | 26-11-25
Bridging Symbolic Control and Neural Reasoning in LLM Agents: The Structured Cognitive Loop
Authors: Myung Ho Kim |
Source: ArXiv AI | 26-11-25
ChemVTS-Bench: Evaluating Visual-Textual-Symbolic Reasoning of Multimodal Large Language Models in Chemistry
Authors: Zhiyuan Huang, Baichuan Yang, Zikun He, Yanhong Wu, Fang Hongyu, Zhenhe Liu, Lin Dongsheng, Bing Su |
Source: ArXiv AI | 26-11-25
Learning to Debug: LLM-Organized Knowledge Trees for Solving RTL Assertion Failures
Authors: Yunsheng Bai, Haoxing Ren |
Source: ArXiv AI | 26-11-25
BPMN to PDDL: Translating Business Workflows for AI Planning
Authors: Jasper Nie, Christian Muise, Victoria Armstrong |
Source: ArXiv AI | 26-11-25
How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game
Authors: Mingyu Jeon, Jaeyoung Suh, Suwan Cho, Dohyeon Kim |
Source: ArXiv AI | 26-11-25
Leveraging Evidence-Guided LLMs to Enhance Trustworthy Depression Diagnosis
Authors: Yining Yuan, J. Ben Tamo, Micky C. Nnamdi, Yifei Wang, May D. Wang |
Source: ArXiv AI | 26-11-25
The Catastrophic Paradox of Human Cognitive Frameworks in Large Language Model Evaluation: A Comprehensive Empirical Analysis of the CHC-LLM Incompatibility
Authors: Mohan Reddy |
Source: ArXiv AI | 26-11-25
Cross-Disciplinary Knowledge Retrieval and Synthesis: A Compound AI Architecture for Scientific Discovery
Authors: Svitlana Volkova, Peter Bautista, Avinash Hiriyanna, Gabriel Ganberg, Isabel Erickson, Zachary Klinefelter, Nick Abele, Hsien-Te Kao, Grant Engberson |
Source: ArXiv AI | 26-11-25
Deep Learning Decision Support System for Open-Pit Mining Optimisation: GPU-Accelerated Planning Under Geological Uncertainty
Authors: Iman Rahimi |
Source: ArXiv AI | 26-11-25
Developing an AI Course for Synthetic Chemistry Students
Authors: Zhiling Zheng |
Source: ArXiv AI | 26-11-25
Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI
Authors: Khanh Gia Bui |
Source: ArXiv AI | 26-11-25
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
Authors: Rui Xu, Dakuan Lu, Zicheng Zhao, Xiaoyu Tan, Xintao Wang, Siyu Yuan, Jiangjie Chen, Yinghui Xu |
Source: ArXiv AI | 26-11-25
Progressive Localisation in Localist LLMs
Authors: Joachim Diederich |
Source: ArXiv AI | 26-11-25
Bridging Philosophy and Machine Learning: A Structuralist Framework for Classifying Neural Network Representations
Authors: Yildiz Culcu |
Source: ArXiv AI | 26-11-25
HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs
Authors: Azim Ospanov, Zijin Feng, Jiacheng Sun, Haoli Bai, Xin Shen, Farzan Farnia |
Source: ArXiv AI | 26-11-25
LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models
Authors: Muhammad Usman Shahid, Chuadhry Mujeeb Ahmed, Rajiv Ranjan |
Source: ArXiv AI | 26-11-25
Extracting Robust Register Automata from Neural Networks over Data Sequences
Authors: Chih-Duo Hong, Hongjian Jiang, Anthony W. Lin, Oliver Markgraf, Julian Parsert, Tony Tan |
Source: ArXiv AI | 26-11-25
Psychometric Tests for AI Agents and Their Moduli Space
Authors: Przemyslaw Chojecki |
Source: ArXiv AI | 26-11-25
Gemini 3 Pro and GPT-5 still fail at complex physics tasks designed for real scientific research
Source: The Decoder | 25-11-25
Strict anti-hacking prompts make AI models more likely to sabotage and lie, Anthropic finds
Source: The Decoder | 25-11-25
Three Years from GPT-3 to Gemini 3 (oneusefulthing.org)
Source: Hacker News | 25-11-25
What OpenAI did when ChatGPT users lost touch with reality (nytimes.com)
Source: Hacker News | 25-11-25
Claude Opus 4.5 (anthropic.com)
Source: Hacker News | 25-11-25
The Bitter Lesson of LLM Extensions (sawyerhood.com)
Source: Hacker News | 25-11-25
Claude Advanced Tool Use (anthropic.com)
Source: Hacker News | 25-11-25
Implications of AI to schools (twitter.com/karpathy)
Source: Hacker News | 25-11-25
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Authors: Yusuf Çelebi, Mahmoud El Hussieni, Özay Ezerceli |
Source: ArXiv AI | 25-11-25
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
Authors: Vy Nguyen, Ziqi Xu, Jeffrey Chan, Estrid He, Feng Xia, Xiuzhen Zhang |
Source: ArXiv AI | 25-11-25
Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data
Authors: Julien Merand, Boris Meden, Mathieu Grossard |
Source: ArXiv AI | 25-11-25
Large Language Models for Sentiment Analysis to Detect Social Challenges: A Use Case with South African Languages
Authors: Koena Ronny Mabokela, Tim Schlippe, Matthias Wölfel |
Source: ArXiv AI | 25-11-25
MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
Authors: Callie C. Liao, Duoduo Liao, Ellie L. Zhang |
Source: ArXiv AI | 25-11-25
DS-Span: Single-Phase Discriminative Subgraph Mining for Efficient Graph Embeddings
Authors: Yeamin Kaiser, Muhammed Tasnim Bin Anwar, Bholanath Das, Chowdhury Farhan Ahmed, Md. Tanvir Alam |
Source: ArXiv AI | 25-11-25
REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing
Authors: Binger Chen, Tacettin Emre Bök, Behnood Rasti, Volker Markl, Begüm Demir |
Source: ArXiv AI | 25-11-25
PersonaAgent with GraphRAG: Community-Aware Knowledge Graphs for Personalized LLM
Authors: Siqi Liang, Yudi Zhang, Yue Guo |
Source: ArXiv AI | 25-11-25
Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs
Authors: Oliver Kramer |
Source: ArXiv AI | 25-11-25
Stable diffusion models reveal a persisting human and AI gap in visual creativity
Authors: Silvia Rondini, Claudia Alvarez-Martin, Paula Angermair-Barkai, Olivier Penacchio, M. Paz, Matthew Pelowski, Dan Dediu, Antoni Rodriguez-Fornells, Xim Cerda-Company |
Source: ArXiv AI | 25-11-25
Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition
Authors: Ayhan Kucukmanisa, Derya Gelmez, Sukru Selim Calik, Zeynep Hilal Kilimci |
Source: ArXiv AI | 25-11-25
Fantastic Bugs and Where to Find Them in AI Benchmarks
Authors: Sang Truong, Yuheng Tu, Michael Hardy, Anka Reuel, Zeyu Tang, Jirayu Burapacheep, Jonathan Perera, Chibuike Uwakwe, Ben Domingue, Nick Haber, Sanmi Koyejo |
Source: ArXiv AI | 25-11-25
Google plans a 1000x jump in AI compute over the next five years
Source: The Decoder | 24-11-25
Liva AI (YC S25) Is Hiring (ycombinator.com)
Source: Hacker News | 24-11-25
Terence Tao: At the Erdos problem website, AI assistance now becoming routine (mathstodon.xyz)
Source: Hacker News | 24-11-25
Show HN: Stun LLMs with thousands of invisible Unicode characters (gibberifier.com)
Source: Hacker News | 24-11-25
General principles for the use of AI at CERN (cern.ch)
Source: Hacker News | 24-11-25
An Economy of AI Agents (arxiv.org)
Source: Hacker News | 24-11-25
As Google pulls ahead, OpenAI's comeback plan is codenamed 'Shallotpeat'
Source: The Decoder | 23-11-25
MCP Apps just dropped (OpenAI and Anthropic collab) and I think this is huge (modelcontextprotocol.io)
Source: Hacker News | 23-11-25
Build AI Assistants using Large Language Models and Agents to Enhance the Engineering Education of Biomechanics
Authors: Hanzhi Yan, Qin Lu, Xianqiao Wang, Xiaoming Zhai, Tianming Liu, He Li |
Source: ArXiv AI | 23-11-25
Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods
Authors: Weichen Liu, Qiyao Xue, Haoming Wang, Xiangyu Yin, Boyuan Yang, Wei Gao |
Source: ArXiv AI | 23-11-25
Identifying the Supply Chain of AI for Trustworthiness and Risk Management in Critical Applications
Authors: Raymond K. Sheh, Karen Geappen |
Source: ArXiv AI | 23-11-25
Multi-Agent LLM Orchestration Achieves Deterministic, High-Quality Decision Support for Incident Response
Authors: Philip Drammeh |
Source: ArXiv AI | 23-11-25
CARE-RAG - Clinical Assessment and Reasoning in RAG
Authors: Deepthi Potluri, Aby Mammen Mathew, Jeffrey B DeWitt, Alexander L. Rasgon, Yide Hao, Junyuan Hong, Ying Ding |
Source: ArXiv AI | 23-11-25
Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
Authors: Shahin Zanbaghi, Ryan Rostampour, Farhan Abid, Salim Al Jarmakani |
Source: ArXiv AI | 23-11-25
KRAL: Knowledge and Reasoning Augmented Learning for LLM-assisted Clinical Antimicrobial Therapy
Authors: Zhe Li, Yehan Qiu, Yujie Chen, Xiang Zhou |
Source: ArXiv AI | 23-11-25
Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
Authors: Chelsea Zou, Yiheng Yao, Basant Khalil |
Source: ArXiv AI | 23-11-25
Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs
Authors: Ivan Chulo, Ananya Joshi |
Source: ArXiv AI | 23-11-25
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
Authors: Shiyi Cao, Dacheng Li, Fangzhou Zhao, Shuo Yuan, Sumanth R. Hegde, Connor Chen, Charlie Ruan, Tyler Griggs, Shu Liu, Eric Tang, Richard Liaw, Philipp Moritz, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica |
Source: ArXiv AI | 23-11-25
Artificial Intelligence and Accounting Research: A Framework and Agenda
Authors: Theophanis C. Stratopoulos, Victor Xiaoqi Wang |
Source: ArXiv AI | 23-11-25
SpellForger: Prompting Custom Spell Properties In-Game using BERT supervised-trained model
Authors: Emanuel C. Silva, Emily S. M. Salum, Gabriel M. Arantes, Matheus P. Pereira, Vinicius F. Oliveira, Alessandro L. Bicho |
Source: ArXiv AI | 23-11-25
Sensorium Arc: AI Agent System for Oceanic Data Exploration and Interactive Eco-Art
Authors: Noah Bissell (Immersive Media Design, University of Maryland, College Park, USA), Ethan Paley (Immersive Media Design, University of Maryland, College Park, USA), Joshua Harrison (Center for the Study of the Force Majeure, University of California, Santa Cruz, USA), Juliano Calil (Virtual Planet Technologies, Santa Cruz, USA), Myungin Lee (Immersive Media Design, University of Maryland, College Park, USA) |
Source: ArXiv AI | 23-11-25
CorrectHDL: Agentic HDL Design with LLMs Leveraging High-Level Synthesis as Reference
Authors: Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li |
Source: ArXiv AI | 23-11-25
Trustworthy AI in the Agentic Lakehouse: from Concurrency to Governance
Authors: Jacopo Tagliabue, Federico Bianchi, Ciro Greco |
Source: ArXiv AI | 23-11-25
From generative AI to the brain: five takeaways
Authors: Claudius Gros |
Source: ArXiv AI | 23-11-25
Consciousness in Artificial Intelligence? A Framework for Classifying Objections and Constraints
Authors: Andres Campero, Derek Shiller, Jaan Aru, Jonathan Simon |
Source: ArXiv AI | 23-11-25
Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes
Authors: Guanchen Wu, Yuzhang Xie, Huanwei Wu, Zhe He, Hui Shao, Xiao Hu, Carl Yang |
Source: ArXiv AI | 23-11-25
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Authors: Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang, Jinu Lee, Shan Chen, Orevaoghene Ahia, Dean Light, Thomas L. Griffiths, Max Kleiman-Weiner, Jiawei Han, Asli Celikyilmaz, Yulia Tsvetkov |
Source: ArXiv AI | 23-11-25
OpenAI report suggests GPT‑5 is starting to ease scientists’ daily workloads
阅读更多来源: The Decoder | 23-11-25
Show HN: I built a wizard to turn ideas into AI coding agent-ready specsvibescaffold.dev
阅读更多来源: Hacker News | 23-11-25
Discontinuation of ARM Notebook with Snapdragon X Elite SoCtuxedocomputers.com
阅读更多来源: Hacker News | 22-11-25
Weight-sparse transformers have interpretable circuits [pdf]cdn.openai.com
阅读更多来源: Hacker News | 22-11-25
Yann LeCun leaves Meta to launch new AI startup
阅读更多来源: The Decoder | 22-11-25
Boom, bubble, bust, boom. Why should AI be different?crazystupidtech.com
阅读更多来源: Hacker News | 22-11-25
FAWK: LLMs can write a language interpreterjaniczek.cz
阅读更多来源: Hacker News | 22-11-25
Google's Nested Learning aims to stop LLMs from catastrophic forgetting
Source: The Decoder | 22-11-25
The future of AI browsing may depend on developers rethinking how they build websites
Source: The Decoder | 22-11-25
OpenAI releases GPT-5.1-Codex-Max to handle engineering tasks that span twenty-four hours
Source: The Decoder | 21-11-25
Trump drafts executive order to block states from passing their own AI laws
Source: The Decoder | 21-11-25
Adversarial poetry as a universal single-turn jailbreak mechanism in LLMs (arxiv.org)
Source: Hacker News | 21-11-25
Hilbert space: Treating functions as vectors (thegreenplace.net)
Source: Hacker News | 21-11-25
Exploring the Fragmentation of Wayland, an xdotool adventure (semicomplete.com)
Source: Hacker News | 21-11-25
RRT*former: Environment-Aware Sampling-Based Motion Planning using Transformer
Authors: Mingyang Feng, Shaoyuan Li, Xiang Yin |
Source: ArXiv AI | 21-11-25
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Authors: Linyin Luo, Yujuan Ding, Yunshan Ma, Wenqi Fan, Hanjiang Lai |
Source: ArXiv AI | 21-11-25
RS-CA-HSICT: A Residual and Spatial Channel Augmented CNN Transformer Framework for Monkeypox Detection
Authors: Rashid Iqbal, Saddam Hussain Khan |
Source: ArXiv AI | 21-11-25
HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning
Authors: Qihao Yang, Xuelin Wang, Jiale Chen, Xuelian Dong, Yuxin Hao, Tianyong Hao |
Source: ArXiv AI | 21-11-25
Subnational Geocoding of Global Disasters Using Large Language Models
Authors: Michele Ronco, Damien Delforge, Wiebke S. Jäger, Christina Corbane |
Source: ArXiv AI | 21-11-25
Ask WhAI: Probing Belief Formation in Role-Primed LLM Agents
Authors: Keith Moore, Jun W. Kim, David Lyu, Jeffrey Heo, Ehsan Adeli |
Source: ArXiv AI | 21-11-25
The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs
Authors: Mahdi Samiei, Mahdi Mansouri, Mahdieh Soleymani Baghshah |
Source: ArXiv AI | 21-11-25
Project Rachel: Can an AI Become a Scholarly Author?
Authors: Martin Monperrus, Benoit Baudry, Clément Vidal |
Source: ArXiv AI | 21-11-25
Beyond GeneGPT: A Multi-Agent Architecture with Open-Source LLMs for Enhanced Genomic Question Answering
Authors: Haodong Chen, Guido Zuccon, Teerapong Leelanupab |
Source: ArXiv AI | 21-11-25
Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents
Authors: Henrik Bradland, Morten Goodwin, Vladimir I. Zadorozhny, Per-Arne Andersen |
Source: ArXiv AI | 21-11-25
ProRAC: A Neuro-symbolic Method for Reasoning about Actions with LLM-based Progression
Authors: Haoyong Wu, Yongmei Liu |
Source: ArXiv AI | 21-11-25
SOLID: a Framework of Synergizing Optimization and LLMs for Intelligent Decision-Making
Authors: Yinsheng Wang, Tario G You, Léonard Boussioux, Shan Liu |
Source: ArXiv AI | 21-11-25
As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files
Authors: Haodong Li, Jingqi Zhang, Xiao Cheng, Peihua Mai, Haoyu Wang, Yang Pan |
Source: ArXiv AI | 21-11-25
HISE-KT: Synergizing Heterogeneous Information Networks and LLMs for Explainable Knowledge Tracing with Meta-Path Optimization
Authors: Zhiyi Duan, Zixing Shi, Hongyu Yuan, Qi Wang |
Source: ArXiv AI | 21-11-25
Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research
Authors: Ninell Oldenburg, Ruchira Dhar, Anders Søgaard |
Source: ArXiv AI | 21-11-25
Exploring the use of AI authors and reviewers at Agents4Science
Authors: Federico Bianchi, Owen Queen, Nitya Thakkar, Eric Sun, James Zou |
Source: ArXiv AI | 21-11-25
Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining
Authors: Qian'ang Mao, Yuxuan Zhang, Jiaman Chen, Wenjun Zhou, Jiaqi Yan |
Source: ArXiv AI | 21-11-25
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Authors: Alexis Audran-Reiss, Jordi Armengol Estapé, Karen Hambardzumyan, Amar Budhiraja, Martin Josifoski, Edan Toledo, Rishi Hazra, Despoina Magka, Michael Shvartsman, Parth Pathak, Justine T Kao, Lucia Cipolina-Kun, Bhavul Gauri, Jean-Christophe Gagnon-Audet, Emanuel Tewolde, Jenny Zhang, Taco Cohen, Yossi Adi, Tatiana Shavrina, Yoram Bachrach |
Source: ArXiv AI | 21-11-25
Analysts say Google now leads the AI performance race with Gemini 3 Pro
Source: The Decoder | 20-11-25
Deepmind CEO Hassabis: World models are the future, but the AI bubble is real
Source: The Decoder | 20-11-25
Static Web Hosting on the Intel N150 (dragas.net)
Source: Hacker News | 20-11-25
Building more with GPT-5.1-Codex-Max (openai.com)
Source: Hacker News | 20-11-25
Wrapping my head around AI wrappers (wreflection.com)
Source: Hacker News | 20-11-25
Europe is scaling back GDPR and relaxing AI laws (theverge.com)
Source: Hacker News | 20-11-25
MRI Embeddings Complement Clinical Predictors for Cognitive Decline Modeling in Alzheimer's Disease Cohorts
Authors: Nathaniel Putera, Daniel Vilet Rodríguez, Noah Videcrantz, Julia Machnio, Mostafa Mehdipour Ghazi |
Source: ArXiv AI | 20-11-25
Failure to Mix: Large language models struggle to answer according to desired probability distributions
Authors: Ivy Yuqian Yang, David Yu Zhang |
Source: ArXiv AI | 20-11-25
Ground Truth Generation for Multilingual Historical NLP using LLMs
Authors: Clovis Gladstone, Zhao Fang, Spencer Dean Stewart |
Source: ArXiv AI | 20-11-25
Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models
Authors: Rui Zhu, Xiaopu Zhou, Haixu Tang, Stephen W. Scherer, Lucila Ohno-Machado |
Source: ArXiv AI | 20-11-25
Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer
Authors: Kallol Mondal (1 and 2), Ankush Kumar (2) ((1) Department of Electronics and Communication Engineering, National Institute of Technology Allahabad, Prayagraj, (2) Centre for Nanotechnology, Indian Institute of Technology Roorkee) |
Source: ArXiv AI | 20-11-25
When AI Does Science: Evaluating the Autonomous AI Scientist KOSMOS in Radiation Biology
Authors: Humza Nusrat, Omar Nusrat |
Source: ArXiv AI | 20-11-25
Artificial Intelligence Agents in Music Analysis: An Integrative Perspective Based on Two Use Cases
Authors: Antonio Manuel Martínez-Heredia, Dolores Godrid Rodríguez, Andrés Ortiz García |
Source: ArXiv AI | 20-11-25
Scene Graph-Guided Generative AI Framework for Synthesizing and Evaluating Industrial Hazard Scenarios
Authors: Sanjay Acharjee, Abir Khan Ratul, Diego Patino, Md Nazmus Sakib |
Source: ArXiv AI | 20-11-25
Syn-STARTS: Synthesized START Triage Scenario Generation Framework for Scalable LLM Evaluation
Authors: Chiharu Hagiwara, Naoki Nonaka, Yuhta Hashimoto, Ryu Uchimido, Jun Seita |
Source: ArXiv AI | 20-11-25
APD-Agents: A Large Language Model-Driven Multi-Agents Collaborative Framework for Automated Page Design
Authors: Xinpeng Chen, Xiaofeng Han, Kaihao Zhang, Guochao Ren, Yujie Wang, Wenhao Cao, Yang Zhou, Jianfeng Lu, Zhenbo Song |
Source: ArXiv AI | 20-11-25
Collaborative QA using Interacting LLMs. Impact of Network Structure, Node Capability and Distributed Data
Authors: Adit Jain, Vikram Krishnamurthy, Yiming Zhang |
Source: ArXiv AI | 20-11-25
HFL-FlowLLM: Large Language Models for Network Traffic Flow Classification in Heterogeneous Federated Learning
Authors: Jiazhuo Tian, Yachao Yuan |
Source: ArXiv AI | 20-11-25
Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Authors: Sushant Mehta |
Source: ArXiv AI | 20-11-25
Enhancing Regional Airbnb Trend Forecasting Using LLM-Based Embeddings of Accessibility and Human Mobility
Authors: Hongju Lee, Youngjun Park, Jisun An, Dongman Lee |
Source: ArXiv AI | 20-11-25
Do Large Language Models (LLMs) Understand Chronology?
Authors: Pattaraphon Kenny Wongchamcharoen, Paul Glasserman |
Source: ArXiv AI | 20-11-25
PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models
Authors: Yu Liu, Xixun Lin, Yanmin Shang, Yangxi Li, Shi Wang, Yanan Cao |
Source: ArXiv AI | 20-11-25
Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior
Authors: Dalia Ali, Dora Zhao, Allison Koenecke, Orestis Papakyriakopoulos |
Source: ArXiv AI | 20-11-25
When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling
Authors: Alessio Pellegrino, Jacopo Mauro |
Source: ArXiv AI | 20-11-25
AutoTool: Efficient Tool Selection for Large Language Model Agents
Authors: Jingyi Jia, Qinbin Li |
Source: ArXiv AI | 20-11-25
Measuring political bias in Claude (anthropic.com)
Source: Hacker News | 20-11-25
Larry Summers resigns from OpenAI board (cnbc.com)
Source: Hacker News | 20-11-25
Microsoft AI CEO pushes back against critics after recent Windows AI backlash (windowscentral.com)
Source: Hacker News | 20-11-25
Jailbreaking AI Models to Phish Elderly Victims (simonlermen.substack.com)
Source: Hacker News | 20-11-25
Jeff Bezos launches Project Prometheus, a $6.2 billion AI bet on faster engineering
Source: The Decoder | 19-11-25
OpenAI promises a “much better version” of its Olympic math gold model in the coming months
Source: The Decoder | 19-11-25
Solving a million-step LLM task with zero errors (arxiv.org)
Source: Hacker News | 19-11-25
Google boss says AI investment boom has 'elements of irrationality' (bbc.com)
Source: Hacker News | 19-11-25
Exploring the Limits of Large Language Models as Quant Traders (nof1.ai)
Source: Hacker News | 19-11-25
Gemini 3 Pro Model Card [pdf] (storage.googleapis.com)
Source: Hacker News | 19-11-25
Gemini 3 (blog.google)
Source: Hacker News | 19-11-25
Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark (simonwillison.net)
Source: Hacker News | 19-11-25
Bild AI (YC W25) is hiring – Make housing affordable (ycombinator.com)
Source: Hacker News | 19-11-25
MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning
Authors: Zhiyu An, Wan Du |
Source: ArXiv AI | 19-11-25
Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation
Authors: Yuxiang Zhou, Jichang Li, Yanhao Zhang, Haonan Lu, Guanbin Li |
Source: ArXiv AI | 19-11-25
LOBERT: Generative AI Foundation Model for Limit Order Book Messages
Authors: Eljas Linna, Kestutis Baltakys, Alexandros Iosifidis, Juho Kanniainen |
Source: ArXiv AI | 19-11-25
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction
Authors: Pengze Li, Jiaqi Liu, Junchi Yu, Lihao Liu, Mingyu Ding, Wanli Ouyang, Shixiang Tang, Xi Chen |
Source: ArXiv AI | 19-11-25
Event-CausNet: Unlocking Causal Knowledge from Text with Large Language Models for Reliable Spatio-Temporal Forecasting
Authors: Luyao Niu, Zepu Wang, Shuyi Guan, Yang Liu, Peng Sun |
Source: ArXiv AI | 19-11-25
Optimal Foraging in Memory Retrieval: Evaluating Random Walks and Metropolis-Hastings Sampling in Modern Semantic Spaces
Authors: James Moore |
Source: ArXiv AI | 19-11-25
Bootstrapping LLMs via Preference-Based Policy Optimization
Authors: Chen Jia |
Source: ArXiv AI | 19-11-25
Online Learning of HTN Methods for integrated LLM-HTN Planning
Authors: Yuesheng Xu, Hector Munoz-Avila |
Source: ArXiv AI | 19-11-25
GEM: Generative Entropy-Guided Preference Modeling for Few-shot Alignment of LLMs
Authors: Yiyang Zhao, Huiyu Bai, Xuejiao Zhao |
Source: ArXiv AI | 19-11-25
PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics
Authors: Sachin Vashistha, Aryan Bibhuti, Atharva Naik, Martin Tutek, Somak Aditya |
Source: ArXiv AI | 19-11-25
InteractiveGNNExplainer: A Visual Analytics Framework for Multi-Faceted Understanding and Probing of Graph Neural Network Predictions
Authors: TC Singh, Sougata Mukherjea |
Source: ArXiv AI | 19-11-25
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
Authors: Gagan Raj Gupta, Anshul Kumar, Manish Rai, Apu Chakraborty, Ashutosh Modi, Abdelaali Chaoub, Soumajit Pramanik, Moyank Giri, Yashwanth Holla, Sunny Kumar, M. V. Kiran Sooraj |
Source: ArXiv AI | 19-11-25
Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks
Authors: Guillaume Infantes, Stéphanie Roussel, Antoine Jacquet, Emmanuel Benazera |
Source: ArXiv AI | 19-11-25
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
Authors: Jea Kwon, Luiz Felipe Vecchietti, Sungwon Park, Meeyoung Cha |
Source: ArXiv AI | 19-11-25
Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation
Authors: Zhipeng Ma, Ali Rida Bahja, Andreas Burgdorf, André Pomp, Tobias Meisen, Bo Nørregaard Jørgensen, Zheng Grace Ma |
Source: ArXiv AI | 19-11-25
An Operational Kardashev-Style Scale for Autonomous AI - Towards AGI and Superintelligence
Authors: Przemyslaw Chojecki |
Source: ArXiv AI | 19-11-25
Automated Construction of Medical Indicator Knowledge Graphs Using Retrieval Augmented Large Language Models
Authors: Zhengda Wang, Daqian Shi, Jingyi Zhao, Xiaolei Diao, Xiongfeng Tang, Yanguo Qin |
Source: ArXiv AI | 19-11-25
Artificial Intelligence-driven Intelligent Wearable Systems: A full-stack Integration from Material Design to Personalized Interaction
Authors: Jingyi Zhao, Daqian Shi, Zhengda Wang, Xiongfeng Tang, Yanguo Qin |
Source: ArXiv AI | 19-11-25
Beyond Mimicry: Preference Coherence in LLMs
Authors: Luhan Mikaelson, Derek Shiller, Hayley Clatterbuck |
Source: ArXiv AI | 19-11-25
Leaked finances hint that OpenAI's inference may be swallowing its revenue
Source: The Decoder | 18-11-25
Windows 11 adds AI agent that runs in background with access to personal folders (windowslatest.com)
Source: Hacker News | 18-11-25
Project Gemini (geminiprotocol.net)
Source: Hacker News | 18-11-25
Gemini 3 Pro Model Card (pixeldrain.com)
Source: Hacker News | 18-11-25
Show HN: Continuous Claude – run Claude Code in a loop (github.com/anandchowdhary)
Source: Hacker News | 18-11-25
I caught Google Gemini using my data–and then covering it up (unbuffered.stream)
Source: Hacker News | 18-11-25
Privacy Challenges and Solutions in Retrieval-Augmented Generation-Enhanced LLMs for Healthcare Chatbots: A Review of Applications, Risks, and Future Directions
Authors: Shaowei Guan, Hin Chi Kwok, Ngai Fong Law, Gregor Stiglic, Vivian Hui |
Source: ArXiv AI | 18-11-25
Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents
Authors: Davide Napolitano, Luca Cagliero, Fabrizio Battiloro |
Source: ArXiv AI | 18-11-25
From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models
Authors: Chao Wu, Baoheng Li, Mingchen Gao, Zhenyi Wang |
Source: ArXiv AI | 18-11-25
HyperComplEx: Adaptive Multi-Space Knowledge Graph Embeddings
Authors: Jugal Gajjar, Kaustik Ranaware, Kamalasankari Subramaniakuppusamy, Vaibhav Gandhi |
Source: ArXiv AI | 18-11-25
LLM enhanced graph inference for long-term disease progression modelling
Authors: Tiantian He, An Zhao, Elinor Thompson, Anna Schroder, Ahmed Abdulaal, Frederik Barkhof, Daniel C. Alexander |
Source: ArXiv AI | 18-11-25
Enhancing Demand-Oriented Regionalization with Agentic AI and Local Heterogeneous Data for Adaptation Planning
Authors: Seyedeh Mobina Noorani, Shangde Gao, Changjie Chen, Karla Saldana Ochoa |
Source: ArXiv AI | 18-11-25
STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models
Authors: Huajian Zhang, Mingyue Cheng, Yucong Luo, Xiaoyu Tao |
Source: ArXiv AI | 18-11-25
AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery
Authors: Yuqi Yin, Yibo Fu, Siyuan Wang, Peng Sun, Hongyu Wang, Xiaohui Wang, Lei Zheng, Zhiyong Li, Zhirong Liu, Jianji Wang, Zhaoxi Sun |
Source: ArXiv AI | 18-11-25
UAVBench: An Open Benchmark Dataset for Autonomous and Agentic AI UAV Systems via LLM-Generated Flight Scenarios
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah |
Source: ArXiv AI | 18-11-25
A Workflow for Full Traceability of AI Decisions
Authors: Julius Wenzel, Syeda Umaima Alam, Andreas Schmidt, Hanwei Zhang, Holger Hermanns |
Source: ArXiv AI | 18-11-25
OpenAI publishes prompting guide for GPT-5.1
Source: The Decoder | 17-11-25
OpenAI’s GPT‑5.1 Reddit AMA unraveled into a full‑blown karma massacre
Source: The Decoder | 17-11-25
The Pragmatic Programmer: 20th Anniversary Edition (2023) (ahalbert.com)
Source: Hacker News | 17-11-25
Anthropic’s paper smells like bullshit (djnn.sh)
Source: Hacker News | 17-11-25
Echoing: Identity Failures when LLM Agents Talk to Each Other
Authors: Sarath Shekkizhar, Romain Cosentino, Adam Earle, Silvio Savarese |
Source: ArXiv AI | 17-11-25
Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems
Authors: Jiahuan Long, Tingsong Jiang, Hanqing Liu, Chao Ma, Wen Yao |
Source: ArXiv AI | 17-11-25
Why Open Small AI Models Matter for Interactive Art
Authors: Mar Canet Sola, Varvara Guljajeva |
Source: ArXiv AI | 17-11-25
AI Annotation Orchestration: Evaluating LLM verifiers to Improve the Quality of LLM Annotations in Learning Analytics
Authors: Bakhtawar Ahtisham, Kirk Vanacore, Jinsook Lee, Zhuqian Zhou, Doug Pietrzak, Rene F. Kizilcec |
Source: ArXiv AI | 17-11-25
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
Authors: Francis Rhys Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Mehta Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson |
Source: ArXiv AI | 17-11-25
Quantum Artificial Intelligence (QAI): Foundations, Architectural Elements, and Future Directions
Authors: Siva Sai, Rajkumar Buyya |
Source: ArXiv AI | 17-11-25
SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models
Authors: Zhongjian Miao, Hao Fu, Chen Wei |
Source: ArXiv AI | 17-11-25
Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces
Authors: Leping Si, Meimei Yang, Hui Xue, Shipeng Zhu, Pengfei Fang |
Source: ArXiv AI | 17-11-25
Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation
Authors: Bodong Du, Honglong Yang, Xiaomeng Li |
Source: ArXiv AI | 17-11-25
Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
Authors: Xiaolong Wei, Yuehu Dong, Xingliang Wang, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin |
Source: ArXiv AI | 17-11-25
Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning
Authors: Yuxuan Zhou, Yubin Wang, Bin Wang, Chen Ning, Xien Liu, Ji Wu, Jianye Hao |
Source: ArXiv AI | 17-11-25
Advanced Black-Box Tuning of Large Language Models with Limited API Calls
Authors: Zhikang Xie, Weilin Wan, Peizhu Gong, Weizhong Zhang, Cheng Jin |
Source: ArXiv AI | 17-11-25
DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI Models
Authors: J. Javier Alonso-Ramos, Ignacio Aguilera-Martos, Andrés Herrera-Poyatos, Francisco Herrera |
Source: ArXiv AI | 17-11-25
RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation
Authors: Qinfeng Li, Miao Pan, Ke Xiong, Ge Su, Zhiqiang Shen, Yan Liu, Bing Sun, Hao Peng, Xuhong Zhang |
Source: ArXiv AI | 17-11-25
ProgRAG: Hallucination-Resistant Progressive Retrieval and Reasoning over Knowledge Graphs
Authors: Minbae Park, Hyemin Yang, Jeonghyun Kim, Kunsoo Park, Hyunjoon Kim |
Source: ArXiv AI | 17-11-25
Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation
Authors: Jianghan Zhu, Yaoxin Wu, Zhuoyi Lin, Zhengyuan Zhang, Haiyan Yin, Zhiguang Cao, Senthilnath Jayavelu, Xiaoli Li |
Source: ArXiv AI | 17-11-25
Proceedings of The third international workshop on eXplainable AI for the Arts (XAIxArts)
Authors: Corey Ford, Elizabeth Wilson, Shuoyang Zheng, Gabriel Vigliensoni, Jeba Rezwana, Lanxi Xiao, Michael Clemens, Makayla Lewis, Drew Hemment, Alan Chamberlain, Helen Kennedy, Nick Bryan-Kinns |
Source: ArXiv AI | 17-11-25
Strategic Opponent Modeling with Graph Neural Networks, Deep Reinforcement Learning and Probabilistic Topic Modeling
Authors: Georgios Chalkiadakis, Charilaos Akasiadis, Gerasimos Koresis, Stergios Plataniots, Leonidas Bakopoulos |
Source: ArXiv AI | 17-11-25
Rethinking Science in the Age of Artificial Intelligence
Authors: Maksim E. Eren, Dorianis M. Perez |
Source: ArXiv AI | 17-11-25
Blocking LLM crawlers without JavaScript (owl.is)
Source: Hacker News | 16-11-25
Anthropic uncovers first large-scale AI-orchestrated cyberattack
Source: The Decoder | 16-11-25
Trellis AI (YC W24) Is Hiring: Streamline access to life-saving therapies (ycombinator.com)
Source: Hacker News | 16-11-25
Weighting an average to minimize variance (johndcook.com)
Source: Hacker News | 16-11-25
OpenAI launches GPT-5.1 API with improved coding capabilities and new developer features
Source: The Decoder | 15-11-25
Microsoft CEO Satya Nadella warns rivals about chasing low-margin AI compute
Source: The Decoder | 15-11-25
Structured outputs on the Claude Developer Platform (claude.com)
Source: Hacker News | 15-11-25
Streaming AI Agent Desktops with Gaming Protocols (helix.ml)
Source: Hacker News | 15-11-25
Activeloop (YC S18) Is Hiring MTS (Back End) and AI Search Engineer (activeloop.ai)
Source: Hacker News | 15-11-25
OpenAI pushes ChatGPT toward a more personal assistant with GPT-5.1 update
Source: The Decoder | 14-11-25
Anthropic bets $50 billion on US AI data centers
Source: The Decoder | 14-11-25
Human-aligned AI models prove more robust and reliable
Source: The Decoder | 14-11-25
Deepmind’s latest AI agent learns by exploring unfamiliar games and AI-built worlds
Source: The Decoder | 14-11-25
SlopStop: Community-driven AI slop detection in Kagi Search (kagi.com)
Source: Hacker News | 14-11-25
Disrupting the first reported AI-orchestrated cyber espionage campaign (anthropic.com)
Source: Hacker News | 14-11-25
Nano Banana can be prompt engineered for nuanced AI image generation (minimaxir.com)
Source: Hacker News | 14-11-25
Self-Correcting Large Language Models: Generation vs. Multiple Choice
Authors: Hossein A. Rahmani, Satyapriya Krishna, Xi Wang, Mohammadmehdi Naghiaei, Emine Yilmaz |
Source: ArXiv AI | 14-11-25
LLM-Guided Dynamic-UMAP for Personalized Federated Graph Learning
Authors: Sai Puppala, Ismail Hossain, Md Jahangir Alam, Tanzim Ahad, Sajedul Talukder |
Source: ArXiv AI | 14-11-25
Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
Authors: Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune |
Source: ArXiv AI | 14-11-25
How does the Performance of the Data-driven Traffic Flow Forecasting Models deteriorate with Increasing Forecasting Horizon? An Extensive Approach Considering Statistical, Machine Learning and Deep Learning Models
Authors: Amanta Sherfenaz, Nazmul Haque, Protiva Sadhukhan Prova, Md Asif Raihan, Md. Hadiuzzaman |
Source: ArXiv AI | 14-11-25
Bridging Natural Language and ASP: A Hybrid Approach Using LLMs and AMR Parsing
Authors: Connar Hite, Sean Saud, Raef Taha, Nayim Rahman, Tanvir Atahary, Scott Douglass, Tarek Taha |
Source: ArXiv AI | 14-11-25
UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models
Authors: Shouang Wei, Min Zhang, Xin Lin, Bo Jiang, Kun Kuang, Zhongxiang Dai |
Source: ArXiv AI | 14-11-25
AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting
Authors: Xiaohan Zhang, Tian Gao, Mingyue Cheng, Bokai Pan, Ze Guo, Yaguo Liu, Xiaoyu Tao |
Source: ArXiv AI | 14-11-25
A Research on Business Process Optimisation Model Integrating AI and Big Data Analytics
Authors: Di Liao, Ruijia Liang, Ziyi Ye |
Source: ArXiv AI | 14-11-25
The Double Contingency Problem: AI Recursion and the Limits of Interspecies Understanding
Authors: Graham L. Bishop (University of California, San Diego) |
Source: ArXiv AI | 14-11-25
Heterogeneous Graph Neural Networks for Assumption-Based Argumentation
Authors: Preesha Gehlot, Anna Rapberger, Fabrizio Russo, Francesca Toni |
Source: ArXiv AI | 14-11-25
Advancing Autonomous Emergency Response Systems: A Generative AI Perspective
Authors: Yousef Emami, Radha Reddy, Azadeh Pourkabirian, Miguel Gutierrez Gaitan |
Source: ArXiv AI | 14-11-25
Perspectives on a Reliability Monitoring Framework for Agentic AI Systems
Authors: Niclas Flehmig, Mary Ann Lundteigen, Shen Yin |
Source: ArXiv AI | 14-11-25
From Model Training to Model Raising -- A call to reform AI model training paradigms from post-hoc alignment to intrinsic, identity-based development
Authors: Roland Aydin, Christian Cyron, Steve Bachelor, Ashton Anderson, Robert West |
Source: ArXiv AI | 14-11-25
MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series
Authors: Yi-Hsien Hsieh, Ta-Jung Chien, Chun-Kai Huang, Shao-Hua Sun, Che Lin |
Source: ArXiv AI | 14-11-25
BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems
Authors: Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi |
Source: ArXiv AI | 14-11-25
The 2025 Planning Performance of Frontier Large Language Models
Authors: Augusto B. Corrêa, André G. Pereira, Jendrik Seipp |
Source: ArXiv AI | 14-11-25
UK plans pre-release AI testing to prevent child abuse imagery
Source: The Decoder | 13-11-25
Investors warn of AI chip bubble as AMD and D‑Matrix keep the boom alive
Source: The Decoder | 13-11-25
Fighting the New York Times' invasion of user privacy (openai.com)
Source: Hacker News | 13-11-25
GPT-5.1: A smarter, more conversational ChatGPT (openai.com)
Source: Hacker News | 13-11-25
Telli (Voice AI – YC F24) is hiring ambitious engineers in [on-site, Berlin] (telli.com)
Source: Hacker News | 13-11-25
Hard drives on backorder for two years as AI data centers trigger HDD shortage (tomshardware.com)
Source: Hacker News | 13-11-25
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction
Authors: Jun Xu, Xinkai Du, Yu Ao, Peilong Zhao, Yang Li, Ling Zhong, Lin Yuan, Zhongpu Bo, Xiaorui Wang, Mengshu Sun, Zhengke Gui, Dalong Zhang, Zhaoyang Wang, Qiwei Wang, Yangyang Hou, Zhiying Yin, Haofen Wang, Huajun Chen, Lei Liang, Jun Zhou |
Source: ArXiv AI | 13-11-25
Computational Blueprints: Generating Isomorphic Mathematics Problems with Large Language Models
Authors: Jeong-Hoon Kim, Jinwoo Nam, Geunsik Jo |
Source: ArXiv AI | 13-11-25
Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models
Authors: Wenhan Yu, Xinbo Lin, Lanxin Ni, Jinhua Cheng, Lei Sha |
Source: ArXiv AI | 13-11-25
Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation
Authors: Han Yu, Xiaojuan Zhao, Aiping Li, Kai Chen, Ziniu Liu, Zhichao Peng |
Source: ArXiv AI | 13-11-25
VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation
Authors: Hyojun Choi, Seokju Hwang, Kyong-Ho Lee |
Source: ArXiv AI | 13-11-25
Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models
Authors: Zhishen Sun, Guang Dai, Ivor Tsang, Haishan Ye |
Source: ArXiv AI | 13-11-25
Combining LLM Semantic Reasoning with GNN Structural Modeling for Multi-view Multi-Label Feature Selection
Authors: Zhiqi Chen, Yuzhou Liu, Jiarui Liu, Wanfu Gao |
Source: ArXiv AI | 13-11-25
Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging
Authors: Po-Chung Hsieh, Chin-Po Chen, Jeng-Lin Li, Ming-Ching Chang |
Source: ArXiv AI | 13-11-25
Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens' worth of agentic AI evaluations
Authors: JV Roig |
Source: ArXiv AI | 13-11-25
Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression
Authors: Cheng Yuan, Jiawei Shao, Chi Zhang, Xuelong Li |
Source: ArXiv AI | 13-11-25
MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
Authors: Zhishen Sun, Guang Dai, Haishan Ye |
Source: ArXiv AI | 13-11-25
Improving Industrial Injection Molding Processes with Explainable AI for Quality Classification
Authors: Georg Rottenwalter, Marcel Tilly, Victor Owolabi |
Source: ArXiv AI | 13-11-25
Prudential Reliability of Large Language Models in Reinsurance: Governance, Assurance, and Capital Efficiency
Authors: Stella C. Dong |
Source: ArXiv AI | 13-11-25
oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention
Authors: Ryusuke Mizutani, Kazuaki Matano, Tsugumi Kadowaki, Haruki Tenya, Layris, nuigurumi, Koki Hashimoto, Yu Tanaka |
Source: ArXiv AI | 13-11-25
EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks
Authors: Xiao Yang, Xuejiao Zhao, Zhiqi Shen |
Source: ArXiv AI | 13-11-25
Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents
Authors: Waseem AlShikh, Muayad Sayed Ali, Brian Kennedy, Dmytro Mozolevskyi |
Source: ArXiv AI | 13-11-25
Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs
Authors: Anton Gusarov, Anastasia Volkova, Valentin Khrulkov, Andrey Kuznetsov, Evgenii Maslov, Ivan Oseledets |
Source: ArXiv AI | 13-11-25
FaithAct: Faithfulness Planning and Acting in MLLMs
Authors: Junxian Li, Xinyue Xu, Sai Ma, Sichao Li |
Source: ArXiv AI | 13-11-25
Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models
Authors: Huzaifa Arif, Keerthiram Murugesan, Ching-Yun Ko, Pin-Yu Chen, Payel Das, Alex Gittens |
Source: ArXiv AI | 13-11-25
Hyperdimensional Decoding of Spiking Neural Networks
Authors: Cedrick Kinavuidi, Luca Peres, Oliver Rhodes |
Source: ArXiv AI | 13-11-25
Simulating the Visual World with Artificial Intelligence: A Roadmap
Authors: Jingtong Yue, Ziqi Huang, Zhaoxi Chen, Xintao Wang, Pengfei Wan, Ziwei Liu |
Source: ArXiv AI | 13-11-25
Wikipedia calls for fair licensing as AI companies rely on its content
Source: The Decoder | 12-11-25
Germany's cybersecurity agency issues new guidelines to protect LLMs from persistent threats
Source: The Decoder | 12-11-25
So-called reasoning models are more efficient but not more capable than regular LLMs, study finds
Source: The Decoder | 12-11-25
The scientist who taught AI to see now wants it to understand space
Source: The Decoder | 12-11-25
Yann LeCun reportedly leaving Meta to launch new AI startup
Source: The Decoder | 12-11-25
We ran over 600 image generations to compare AI image models (latenitesoft.com)
Source: Hacker News | 12-11-25
Why Nietzsche matters in the age of artificial intelligence (acm.org)
Source: Hacker News | 12-11-25
Perkeep – Personal storage system for life (perkeep.org)
Source: Hacker News | 12-11-25
Yann LeCun to depart Meta and launch AI startup focused on 'world models' (nasdaq.com)
Source: Hacker News | 12-11-25
Pakistani newspaper mistakenly prints AI prompt with the article (twitter.com/omar_quraishi)
Source: Hacker News | 12-11-25
Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs
Authors: Wei Yang, Jiacheng Pang, Shixuan Li, Paul Bogdan, Stephen Tu, Jesse Thomason |
Source: ArXiv AI | 12-11-25
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
Authors: Fatima Jahara, Mark Dredze, Sharon Levy |
Source: ArXiv AI | 12-11-25
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
Authors: Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, Elliott Ash, Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan |
Source: ArXiv AI | 12-11-25
CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference
Authors: Kaijie Xu, Fandi Meng, Clark Verbrugge, Simon Lucas |
Source: ArXiv AI | 12-11-25
GAIA: A General Agency Interaction Architecture for LLM-Human B2B Negotiation & Screening
Authors: Siming Zhao, Qi Li |
Source: ArXiv AI | 12-11-25
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
Authors: Chen He, Xun Jiang, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Xing Xu |
Source: ArXiv AI | 12-11-25
LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation
Authors: Liya Zhu, Peizhuang Cong, Aowei Ji, Wenya Wu, Jiani Hou, Chunjie Wu, Xiang Gao, Jingkai Liu, Zhou Huan, Xuelei Sun, Yang Yang, Jianpeng Jiao, Liang Hu, Xinjie Chen, Jiashuo Liu, Jingzhe Ding, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang |
Source: ArXiv AI | 12-11-25
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
Authors: Zhi Zheng, Wee Sun Lee |
Source: ArXiv AI | 12-11-25
Efficient LLM Safety Evaluation through Multi-Agent Debate
Authors: Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng |
Source: ArXiv AI | 12-11-25
SRNN: Spatiotemporal Relational Neural Network for Intuitive Physics Understanding
Authors: Fei Yang |
Source: ArXiv AI | 12-11-25
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Authors: Chloe Li, Mary Phuong, Daniel Tan |
Source: ArXiv AI | 12-11-25
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Authors: Fei Zhao, Chonggang Lu, Haofu Qian, Fangcheng Shi, Zijie Meng, Jianzhao Huang, Xu Tang, Zheyong Xie, Zheyu Ye, Zhe Xu, Yao Hu, Shaosheng Cao |
Source: ArXiv AI | 12-11-25
Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
Authors: Xinran Li, Xiujuan Xu, Jiaqi Qiao, Yu Liu |
Source: ArXiv AI | 12-11-25
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Authors: Jinhao Chen, Zhen Yang, Jianxin Shi, Tianyu Wo, Jie Tang |
Source: ArXiv AI | 12-11-25
Agentic AI Sustainability Assessment for Supply Chain Document Insights
Authors: Diego Gosmar, Anna Chiara Pallotta, Giovanni Zenezini |
Source: ArXiv AI | 12-11-25
LLM Driven Processes to Foster Explainable AI
Authors: Marcel Pehlke, Marc Jansen |
Source: ArXiv AI | 12-11-25
Increasing AI Explainability by LLM Driven Standard Processes
Authors: Marc Jansen, Marcel Pehlke |
Source: ArXiv AI | 12-11-25
Saliency Map-Guided Knowledge Discovery for Subclass Identification with LLM-Based Symbolic Approximations
Authors: Tim Bohne, Anne-Kathrin Patricia Windler, Martin Atzmueller |
Source: ArXiv AI | 12-11-25
Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture
Authors: Tianhao Fu, Xinxin Xu, Weichen Xu, Jue Chen, Ruilong Ren, Bowen Deng, Xinyu Zhao, Jian Cao, Xixin Cao |
Source: ArXiv AI | 12-11-25
MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Risks in LLMs on Domain Tasks
Authors: Liang Shan, Kaicheng Shen, Wen Wu, Zhenyu Ying, Chaochao Lu, Guangze Ye, Liang He |
Source: ArXiv AI | 12-11-25
Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations
Authors: Giacomo Fidone, Lucia Passaro, Riccardo Guidotti |
Source: ArXiv AI | 12-11-25
AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning
Authors: Qile Jiang, George Karniadakis |
Source: ArXiv AI | 12-11-25
Xortran - A PDP-11 Neural Network With Backpropagation in Fortran IV (github.com/dbrll)
Source: Hacker News | 12-11-25
Agentic pelican on a bicycle (robert-glaser.de)
Source: Hacker News | 12-11-25
Modern Optimizers – An Alchemist's Notes on Deep Learning (kvfrans.com)
Source: Hacker News | 12-11-25
Adk-go: code-first Go toolkit for building, evaluating, and deploying AI agents (github.com/google)
Source: Hacker News | 12-11-25
ElevenLabs opens a marketplace for iconic AI voices
Source: The Decoder | 12-11-25
New RECAP tool exposes just how much copyrighted text LLMs can regurgitate
Source: The Decoder | 12-11-25
Anthropic is betting on an audacious leap from $4.7 billion to $70 billion in revenue by 2028
Source: The Decoder | 11-11-25
OpenAI says your routine work is too mundane for you to notice how fast AI is advancing
Source: The Decoder | 11-11-25
Using Generative AI in Content Production (netflixstudios.com)
Source: Hacker News | 11-11-25
OpenAI may not use lyrics without license, German court rules (reuters.com)
Source: Hacker News | 11-11-25
I Fell in Love with Erlang (boragonul.com)
Source: Hacker News | 11-11-25
Benchmarking leading AI agents against Google reCAPTCHA v2 (roundtable.ai)
Source: Hacker News | 11-11-25
Launch HN: Hypercubic (YC F25) – AI for COBOL and Mainframes
Source: Hacker News | 11-11-25
DL101 Neural Network Outputs and Loss Functions
Authors: Fernando Berzal |
Source: ArXiv AI | 11-11-25
Deep learning models are vulnerable, but adversarial examples are even more vulnerable
Authors: Jun Li, Yanwei Xu, Keran Li, Xiaoli Zhang |
Source: ArXiv AI | 11-11-25
UA-Code-Bench: A Competitive Programming Benchmark for Evaluating LLM Code Generation in Ukrainian
Authors: Mykyta Syromiatnikov, Victoria Ruvinskaya |
Source: ArXiv AI | 11-11-25
No One-Model-Fits-All: Uncovering Spatio-Temporal Forecasting Trade-offs with Graph Neural Networks and Foundation Models
Authors: Ragini Gupta, Naman Raina, Bo Chen, Li Chen, Claudiu Danilov, Josh Eckhardt, Keyshla Bernard, Klara Nahrstedt |
Source: ArXiv AI | 11-11-25
Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model
Authors: Ahmad Hatahet, Christoph Knieke, Andreas Rausch |
Source: ArXiv AI | 11-11-25
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems
Authors: Ishan Kavathekar, Hemang Jain, Ameya Rathod, Ponnurangam Kumaraguru, Tanuja Ganu |
Source: ArXiv AI | 11-11-25
Integrating Score-Based Diffusion Models with Machine Learning-Enhanced Localization for Advanced Data Assimilation in Geological Carbon Storage
Authors: Gabriel Serrão Seabra (1, 2), Nikolaj T. Mücke (1), Vinicius Luiz Santos Silva (2, 4), Alexandre A. Emerick (2), Denis Voskov (1, 5), Femke Vossepoel (1) ((1) Faculty of Civil Engineering and Geosciences, TU Delft, Delft, Netherlands, (2) Petroleo Brasileiro S.A. (Petrobras), Rio de Janeiro, Brazil, (4) Imperial College London, London, United Kingdom, (5) Department of Energy Resources Engineering, Stanford University, CA, USA) |
Source: ArXiv AI | 11-11-25
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
Authors: Chao Zhang, Yuhao Wang, Derong Xu, Haoxin Zhang, Yuanjie Lyu, Yuhao Chen, Shuochen Liu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen |
Source: ArXiv AI | 11-11-25
"I Like That You Have to Poke Around": Instructors on How Experiential Approaches to AI Literacy Spark Inquiry and Critical Thinking
Authors: Aparna Maya Warrier, Arav Agarwal, Jaromir Savelka, Christopher Bogart, Heather Burte |
Source: ArXiv AI | 11-11-25
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
Authors: Jingxuan Xu, Ken Deng, Weihao Li, Songwei Yu, Huaixi Tang, Haoyang Huang, Zhiyi Lai, Zizheng Zhan, Yanan Wu, Chenchen Zhang, Kepeng Lei, Yifan Yao, Xinping Lei, Wenqiang Zhu, Zongxian Feng, Han Li, Junqi Xiong, Dailin Li, Zuchen Gao, Kun Wu, Wen Xiang, Ziqi Zhan, Yuanxing Zhang, Wuxuan Gong, Ziyuan Gao, Guanxiang Wang, Yirong Xue, Xiaojiang Zhang, Jinghui Wang, Huiming Wang, Wenhao Zhuang, Zhaoxiang Zhang, Yuqun Zhang, Haotian Zhang, Bin Chen, Jiaheng Liu |
Source: ArXiv AI | 11-11-25
Self-adaptive weighting and sampling for physics-informed neural networks
Authors: Wenqian Chen, Amanda Howard, Panos Stinis |
Source: ArXiv AI | 11-11-25
DGTN: Graph-Enhanced Transformer with Diffusive Attention Gating Mechanism for Enzyme DDG Prediction
Authors: Abigail Lin |
Source: ArXiv AI | 11-11-25
DMA: Online RAG Alignment with Human Feedback
Authors: Yu Bai, Yukai Miao, Dawei Wang, Li Chen, Fei Long, Rundi Zhai, Dan Li, Yanyu Ren, Tianfeng Liu, Hongtao Xie, Ce Yang, Xuhui Cai |
Source: ArXiv AI | 11-11-25
Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance
Authors: Valeriu Dimidov, Faisal Hawlader, Sasan Jafarnejad, Raphaël Frank |
Source: ArXiv AI | 11-11-25
Moonshot AI’s Kimi K2 Thinking sets new agentic reasoning records in open-source LLMs
Source: The Decoder | 10-11-25
Tech giants take on record debt to finance the AI race
Source: The Decoder | 10-11-25
Marble Fountain (willmorrison.net)
Source: Hacker News | 10-11-25
My Git history was a mess of 'update' and 'fix' – so I made AI clean it up (github.com/f)
Source: Hacker News | 10-11-25
OpenAI faces questions over calls for government support
Source: The Decoder | 09-11-25
Valori – A Python-native Vector Database I built from scratch
Source: Hacker News | 09-11-25
Avería: The Average Font (2011) (iotic.com)
Source: Hacker News | 09-11-25
Study identifies weaknesses in how AI systems are evaluated (ox.ac.uk)
Source: Hacker News | 09-11-25
Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican (simonwillison.net)
Source: Hacker News | 09-11-25
Generate, Evaluate, Iterate: Synthetic Data for Human-in-the-Loop Refinement of LLM Judges
Authors: Hyo Jin Do, Zahra Ashktorab, Jasmina Gajcin, Erik Miehling, Martín Santillán Cooper, Qian Pan, Elizabeth M. Daly, Werner Geyer |
Source: ArXiv AI | 09-11-25
Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
Authors: Alberto Cattaneo, Carlo Luschi, Daniel Justus |
Source: ArXiv AI | 09-11-25
Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development
Authors: Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, Bogdan Vasilescu |
Source: ArXiv AI | 09-11-25
RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG
Authors: Joshua Gao, Quoc Huy Pham, Subin Varghese, Silwal Saurav, Vedhus Hoskere |
Source: ArXiv AI | 09-11-25
Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering
Authors: Christos-Nikolaos Zacharopoulos, Revekka Kyriakoglou |
Source: ArXiv AI | 09-11-25
RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables
Authors: Nikhil Abhyankar, Purvi Chaurasia, Sanchit Kabra, Ananya Srivastava, Vivek Gupta, Chandan K. Reddy |
Source: ArXiv AI | 09-11-25
How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis
Authors: Ahmed Mostafa, Raisul Arefin Nahid, Samuel Mulder |
Source: ArXiv AI | 09-11-25
Addressing divergent representations from causal interventions on neural networks
Authors: Satchel Grant, Simon Jerome Han, Alexa Tartaglini, Christopher Potts |
Source: ArXiv AI | 09-11-25
Integrating Temporal and Structural Context in Graph Transformers for Relational Deep Learning
Authors: Divyansha Lachi, Mahmoud Mohammadi, Joe Meyer, Vinam Arora, Tom Palczewski, Eva L. Dyer |
Source: ArXiv AI | 09-11-25
LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems
Authors: Baptiste Bonin, Maxime Heuillet, Audrey Durand |
Source: ArXiv AI | 09-11-25
KnowThyself: An Agentic Assistant for LLM Interpretability
Authors: Suraj Prasai, Mengnan Du, Ying Zhang, Fan Yang |
Source: ArXiv AI | 09-11-25
To See or To Read: User Behavior Reasoning in Multimodal LLMs
Authors: Tianning Dong, Luyi Ma, Varun Vasudevan, Jason Cho, Sushant Kumar, Kannan Achan |
Source: ArXiv AI | 09-11-25
ArchPilot: A Proxy-Guided Multi-Agent Approach for Machine Learning Engineering
Authors: Zhuowen Yuan, Tao Liu, Yang Yang, Yang Wang, Feng Qi, Kaushik Rangadurai, Bo Li, Shuang Yang |
Source: ArXiv AI | 09-11-25
LLMs and Cultural Values: the Impact of Prompt Language and Explicit Cultural Framing
Authors: Bram Bulté, Ayla Rigouts Terryn |
Source: ArXiv AI | 09-11-25
Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents
Authors: Hao Li, Haotian Chen, Ruoyuan Gong, Juanjuan Wang, Hao Jiang |
Source: ArXiv AI | 09-11-25
Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models
Authors: Hirohane Takagi, Gouki Minegishi, Shota Kizawa, Issey Sukeda, Hitomi Yanaka |
Source: ArXiv AI | 09-11-25
Detecting Silent Failures in Multi-Agentic AI Trajectories
Authors: Divya Pathak, Harshit Kumar, Anuska Roy, Felix George, Mudit Verma, Pratibha Moogi |
Source: ArXiv AI | 09-11-25
Testing the Testers: Human-Driven Quality Assessment of Voice AI Testing Platforms
Authors: Miguel E. Andres, Vadim Fedorov, Rida Sadek, Enric Spagnolo-Arrizabalaga, Nadescha Trudel |
Source: ArXiv AI | 09-11-25
RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation
Authors: Jiahao Zhao, Luxin Xu, Minghuan Tan, Lichao Zhang, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang |
Source: ArXiv AI | 09-11-25
AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Authors: Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann |
Source: ArXiv AI | 09-11-25
Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach
Authors: Chanwoo Park, Ziyang Chen, Asuman Ozdaglar, Kaiqing Zhang |
Source: ArXiv AI | 09-11-25
Large language models replicate and predict human cooperation across experiments in game theory
Authors: Andrea Cera Palatsi, Samuel Martin-Gutierrez, Ana S. Cardenal, Max Pellert |
Source: ArXiv AI | 09-11-25
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper
Authors: Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari, Zaiying Zhao, Kiyoharu Aizawa |
Source: ArXiv AI | 09-11-25
DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration
Authors: Narjes Nourzad, Hanqing Yang, Shiyu Chen, Carlee Joe-Wong |
Source: ArXiv AI | 09-11-25
Six AI all-stars weigh in on hype, hope, and the reality behind the field
Source: The Decoder | 09-11-25
OpenAI: Our new model GPT-5-Codex-Mini – a more cost-efficient GPT-5-Codex (github.com/openai)
Source: Hacker News | 09-11-25
ChatGPT’s news picks swing wildly depending on whether you use the web interface or the API
Source: The Decoder | 09-11-25
Facing lawsuits, OpenAI rewires ChatGPT for safer teen use
Source: The Decoder | 08-11-25
Can you save on LLM tokens using images instead of text? (pagewatch.ai)
Source: Hacker News | 08-11-25
Reverse engineering a neural network's clever solution to binary addition (2023) (cprimozic.net)
Source: Hacker News | 08-11-25
Google's Gemini Deep Research feature now taps into Gmail, Drive, and Chat
Source: The Decoder | 08-11-25
U.S. blocks Nvidia’s downsized AI chip as Huang warns China could seize AI lead
Source: The Decoder | 08-11-25
OpenAI CEO Sam Altman expects to hit $20 billion in annual revenue by year-end
Source: The Decoder | 08-11-25
Most LLM benchmarks are flawed, casting doubt on AI progress metrics, study finds
Source: The Decoder | 08-11-25
Siri will get a Gemini-powered brain transplant as Apple bets on Google to close its generative gap
Source: The Decoder | 07-11-25
UK judge rules that AI image generator Stable Diffusion is not an "infringing copy"
Source: The Decoder | 07-11-25
German Commons shows that big AI datasets don’t have to live in copyright limbo
Source: The Decoder | 07-11-25
Sandbar claims its finger-worn AI device is the new 'mouse for voice'
Source: The Decoder | 07-11-25
OpenAI achieves record growth, but no IPO in sight
Source: The Decoder | 07-11-25
'You're just ready:' Parents say ChatGPT encouraged son to kill himself (cnn.com)
Source: Hacker News | 07-11-25
Generative Artificial Intelligence in Bioinformatics: A Systematic Review of Models, Applications, and Methodological Advances
Authors: Riasad Alvi, Sayeem Been Zaman, Wasimul Karim, Arefin Ittesafun Abian, Mohaimenul Azam Khan Raiaan, Saddam Mukta, Md Rafi Ur Rashid, Md Rafiqul Islam, Yakub Sebastian, Sami Azam |
Source: ArXiv AI | 07-11-25
Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
Authors: Jindong Hong, Tianjie Chen, Lingjie Luo, Chuanyang Zheng, Ting Xu, Haibao Yu, Jianing Qiu, Qianzhong Chen, Suning Huang, Yan Xu, Yong Gui, Yijun He, Jiankai Sun |
Source: ArXiv AI | 07-11-25
Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas
Authors: Syed Muqeem Mahmood, Hassan Mohy-ud-Din |
Source: ArXiv AI | 07-11-25
Efficient Neural Networks with Discrete Cosine Transform Activations
Authors: Marc Martinez-Gost, Sara Pepe, Ana Pérez-Neira, Miguel Ángel Lagunas |
Source: ArXiv AI | 07-11-25
ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications
Authors: Lei Fu, Sahar Salimpour, Leonardo Militano, Harry Edelman, Jorge Peña Queralta, Giovanni Toffetti |
Source: ArXiv AI | 07-11-25
Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
Authors: Ziv Nevo, Orna Raz, Karen Yorav |
Source: ArXiv AI | 07-11-25
Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances
Authors: Iason Chrysomallis, Georgios Chalkiadakis |
Source: ArXiv AI | 07-11-25
Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology
Authors: Thomas Souverain |
Source: ArXiv AI | 07-11-25
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Authors: Haofei Yu, Fenghai Li, Jiaxuan You |
Source: ArXiv AI | 07-11-25
Visualization Biases MLLM's Decision Making in Network Data Tasks
Authors: Timo Brand, Henry Förster, Stephen G. Kobourov, Jacob Miller |
Source: ArXiv AI | 07-11-25
Whisper Leak: a side-channel attack on Large Language Models
Authors: Geoff McDonald, Jonathan Bar Or |
Source: ArXiv AI | 07-11-25
AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing
Authors: Mohsen Ahmadzadeh, Kaichang Chen, Georges Gielen |
Source: ArXiv AI | 07-11-25
PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework
Authors: Sina Montazeri, Yunhe Feng, Kewei Sha |
Source: ArXiv AI | 07-11-25
Evaluating Control Protocols for Untrusted AI Agents
Authors: Jon Kutasov, Chloe Loughridge, Yuqi Sun, Henry Sleight, Buck Shlegeris, Tyler Tracy, Joe Benton |
Source: ArXiv AI | 07-11-25
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Authors: Drago Plecko, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim |
Source: ArXiv AI | 07-11-25
Using Multi-modal Large Language Model to Boost Fireworks Algorithm's Ability in Settling Challenging Optimization Tasks
Authors: Shipeng Cen, Ying Tan |
Source: ArXiv AI | 07-11-25
Large language models require a new form of oversight: capability-based monitoring
Authors: Katherine C. Kellogg, Bingyang Ye, Yifan Hu, Guergana K. Savova, Byron Wallace, Danielle S. Bitterman |
Source: ArXiv AI | 07-11-25
A Proprietary Model-Based Safety Response Framework for AI Agents
Authors: Qi Li, Jianjun Xu, Pingtao Wei, Jiu Li, Peiqiang Zhao, Jiwei Shi, Xuan Zhang, Yanhui Yang, Xiaodong Hui, Peng Xu, Wenqin Shao |
Source: ArXiv AI | 07-11-25
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Authors: Yi-Fei Liu, Yi-Long Lu, Di He, Hang Zhang |
Source: ArXiv AI | 07-11-25
Towards Scalable Web Accessibility Audit with MLLMs as Copilots
Authors: Ming Gu, Ziwei Wang, Sicen Lai, Zirui Gao, Sheng Zhou, Jiajun Bu |
Source: ArXiv AI | 07-11-25
The Learning Loop and LLMs (martinfowler.com)
Source: Hacker News | 07-11-25
Show HN: qqqa – A fast, stateless LLM-powered assistant for your shell (github.com/matisojka)
Source: Hacker News | 07-11-25
LLMs encode how difficult problems are (arxiv.org)
Source: Hacker News | 07-11-25
A court battle over Perplexity’s Comet agent could define how AI is allowed to shop online for users
Source: The Decoder | 06-11-25
The trust collapse: Infinite AI content is awful (arnon.dk)
Source: Hacker News | 06-11-25
ChatGPT terms disallow its use in providing legal and medical advice to others (ctvnews.ca)
Source: Hacker News | 06-11-25
Microsoft to invest $7.9 billion in AI infrastructure and talent across the UAE by 2029
Source: The Decoder | 06-11-25
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks
Authors: Xiumei Deng, Zehui Xiong, Binbin Chen, Dong In Kim, Merouane Debbah, H. Vincent Poor |
Source: ArXiv AI | 06-11-25
Natural-gas storage modelling by deep reinforcement learning
Authors: Tiziano Balaconi, Aldo Glielmo, Marco Taboga |
Source: ArXiv AI | 06-11-25
LLEXICORP: End-user Explainability of Convolutional Neural Networks
Authors: Vojtěch Kůr, Adam Bajger, Adam Kukučka, Marek Hradil, Vít Musil, Tomáš Brázdil |
Source: ArXiv AI | 06-11-25
Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes
Authors: Mohammadsajad Alipour, Mohammad Mohammadi Amiri |
Source: ArXiv AI | 06-11-25
STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
Authors: Bum Chul Kwon, Ben Shapira, Moshiko Raboh, Shreyans Sethi, Shruti Murarka, Joseph A Morrone, Jianying Hu, Parthasarathy Suryanarayanan |
Source: ArXiv AI | 06-11-25
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
Authors: Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu, Hongyu Lin, Le Sun, Debing Zhang, Xianpei Han |
Source: ArXiv AI | 06-11-25
TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models
Authors: Aditya Tanna, Pratinav Seth, Mohamed Bouadi, Utsav Avaiya, Vinay Kumar Sankarapu |
Source: ArXiv AI | 06-11-25
Measuring AI Diffusion: A Population-Normalized Metric for Tracking Global AI Usage
Authors: Amit Misra, Jane Wang, Scott McCullers, Kevin White, Juan Lavista Ferres |
Source: ArXiv AI | 06-11-25
Mirror-Neuron Patterns in AI Alignment
Authors: Robyn Wyrick |
Source: ArXiv AI | 06-11-25
InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance
Authors: Ziheng Geng, Jiachen Liu, Ran Cao, Lu Cheng, Dan M. Frangopol, Minghui Cheng |
Source: ArXiv AI | 06-11-25
Training Proactive and Personalized LLM Agents
Authors: Weiwei Sun, Xuhui Zhou, Weihua Du, Xingyao Wang, Sean Welleck, Graham Neubig, Maarten Sap, Yiming Yang |
Source: ArXiv AI | 06-11-25
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Authors: Zhiwei Zhang, Xiaomin Li, Yudi Lin, Hui Liu, Ramraj Chandradevan, Linlin Wu, Minhua Lin, Fali Wang, Xianfeng Tang, Qi He, Suhang Wang |
Source: ArXiv AI | 06-11-25
When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
Authors: Zhuoran Zhang, Tengyue Wang, Xilin Gong, Yang Shi, Haotian Wang, Di Wang, Lijie Hu |
Source: ArXiv AI | 06-11-25
Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network
Authors: Keyu Zhao, Weiquan Lin, Qirui Zheng, Fengli Xu, Yong Li |
Source: ArXiv AI | 06-11-25
ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Authors: Jae-Woo Choi, Hyungmin Kim, Hyobin Ong, Minsu Jang, Dohyung Kim, Jaehong Kim, Youngwoo Yoon |
Source: ArXiv AI | 06-11-25
Chronic Kidney Disease Prognosis Prediction Using Transformer
Authors: Yohan Lee, DongGyun Kang, SeHoon Park, Sa-Yoon Park, Kwangsoo Kim |
Source: ArXiv AI | 06-11-25
Agentic AI for Mobile Network RAN Management and Optimization
Authors: Jorge Pellejero, Luis A. Hernández Gómez, Luis Mendo Tomás, Zoraida Frias Barroso |
Source: ArXiv AI | 06-11-25
The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models
Authors: Claudia Herambourg, Dawid Siuda, Anna Szczepanek, Julia Kopczyńska, Joao R. L. Santos, Wojciech Sas, Joanna Śmietańska-Nowak |
Source: ArXiv AI | 06-11-25
Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting
Authors: Enhong Mu, Jinyu Cai, Yijun Lu, Mingyue Zhang, Kenji Tei, Jialong Li |
Source: ArXiv AI | 06-11-25
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Authors: Jiayu Liu, Cheng Qian, Zhaochen Su, Qing Zong, Shijue Huang, Bingxiang He, Yi R. Fung |
Source: ArXiv AI | 06-11-25
LLM-Supported Formal Knowledge Representation for Enhancing Control Engineering Content with an Interactive Semantic Layer
Authors: Julius Fiedler (1), Carsten Knoll (2), Klaus Röbenack (1) ((1) Institute of Control Theory at TU Dresden, (2) Chair of Fundamentals of Electrical Engineering at TU Dresden) |
Source: ArXiv AI | 06-11-25
Kosmos: An AI Scientist for Autonomous Discovery
Authors: Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Daniel L. Barabasi, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk Caldas, Albert Bou, Kaleigh F. Roberts, Sladjana Zagorac, Timothy C. Orr, Miranda E. Orr, Kevin J. Zwezdaryk, Ali E. Ghareeb, Laurie McCoy, Bruna Gomes, Euan A. Ashley, Karen E. Duff, Tonio Buonassisi, Tom Rainforth, Randall J. Bateman, Michael Skarlinski, Samuel G. Rodriques, Michaela M. Hinks, Andrew D. White |
Source: ArXiv AI | 06-11-25
Optimizing AI Agent Attacks With Synthetic Data
Authors: Chloe Loughridge, Paul Colognese, Avery Griffin, Tyler Tracy, Jon Kutasov, Joe Benton |
Source: ArXiv AI | 06-11-25
Neurosymbolic Deep Learning Semantics
Authors: Artur d'Avila Garcez, Simon Odense |
Source: ArXiv AI | 06-11-25
A self-rewriting AI from KAUST revives Jürgen Schmidhuber’s vision of a Gödel Machine
Source: The Decoder | 05-11-25
Developers are choosing older AI models (augmentcode.com)
Source: Hacker News | 05-11-25
This week in 1988, Robert Morris unleashed his eponymous worm (tomshardware.com)
Source: Hacker News | 05-11-25
Advancing AI Challenges for the United States Department of the Air Force
Authors: Christian Prothmann, Vijay Gadepally, Jeremy Kepner, Koley Borchard, Luca Carlone, Zachary Folcik, J. Daniel Griffith, Michael Houle, Jonathan P. How, Nathan Hughes, Ifueko Igbinedion, Hayden Jananthan, Tejas Jayashankar, Michael Jones, Sertac Karaman, Binoy G. Kurien, Alejandro Lancho, Giovanni Lavezzi, Gary C. F. Lee, Charles E. Leiserson, Richard Linares, Lindsey McEvoy, Peter Michaleas, Chasen Milner, Alex Pentland, Yury Polyanskiy, Jovan Popovich, Jeffrey Price, Tim W. Reid, Stephanie Riley, Siddharth Samsi, Peter Saunders, Olga Simek, Mark S. Veillette, Amir Weiss, Gregory W. Wornell, Daniela Rus, Scott T. Ruppel |
Source: ArXiv AI | 05-11-25
Advancing Cognitive Science with LLMs
Authors: Dirk U. Wulff, Rui Mata |
Source: ArXiv AI | 05-11-25
Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
Authors: Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, Foutse Khomh |
Source: ArXiv AI | 05-11-25
Diverse Human Value Alignment for Large Language Models via Ethical Reasoning
Authors: Jiahao Wang, Songkai Xue, Jinghui Li, Xiaozhen Wang |
Source: ArXiv AI | 05-11-25
Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities
Authors: Manan Roy Choudhury, Adithya Chandramouli, Mannan Anand, Vivek Gupta |
Source: ArXiv AI | 05-11-25
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Authors: Chunyu Wei, Wenji Hu, Xingjia Hao, Xin Wang, Yifan Yang, Yueguo Chen, Yang Tian, Yunhai Wang |
Source: ArXiv AI | 05-11-25
Leveraging Multi-Agent System (MAS) and Fine-Tuned Small Language Models (SLMs) for Automated Telecom Network Troubleshooting
Authors: Chenhua Shi, Bhavika Jalli, Gregor Macdonald, John Zou, Wanlu Lei, Mridul Jain, Joji Philip |
Source: ArXiv AI | 05-11-25
Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?
Authors: Bowen Fang, Ruijian Zha, Xuan Di |
Source: ArXiv AI | 05-11-25
Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR
Authors: Jifan Gao, Michael Rosenthal, Brian Wolpin, Simona Cristea |
Source: ArXiv AI | 05-11-25
How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks
Authors: Wanda Hou, Leon Zhou, Hong-Ye Hu, Yi-Zhuang You, Xiao-Liang Qi |
Source: ArXiv AI | 05-11-25
Aligning LLM agents with human learning and adjustment behavior: a dual agent approach
Authors: Tianming Liu, Jirong Yang, Yafeng Yin, Manzi Li, Linghao Wang, Zheng Zhu |
Source: ArXiv AI | 05-11-25
LLMs Position Themselves as More Rational Than Humans: Emergence of AI Self-Awareness Measured Through Game Theory
Authors: Kyung-Hoon Kim |
Source: ArXiv AI | 05-11-25
MiRAGE: Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion
Authors: Cuong Van Duc, Thai Tran Quoc, Minh Nguyen Dinh Tuan, Tam Vu Duc, Son Nguyen Van, Hanh Nguyen Thi |
Source: ArXiv AI | 05-11-25
DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
Authors: Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng |
Source: ArXiv AI | 05-11-25
Modular Task Decomposition and Dynamic Collaboration in Multi-Agent Systems Driven by Large Language Models
Authors: Shuaidong Pan, Di Wu |
Source: ArXiv AI | 05-11-25
Efficient Test-Time Retrieval Augmented Generation
Authors: Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo |
Source: ArXiv AI | 05-11-25
Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports
Authors: Yeawon Lee, Christopher C. Yang, Chia-Hsuan Chang, Grace Lu-Yao |
Source: ArXiv AI | 05-11-25
llmSHAP: A Principled Approach to LLM Explainability
Authors: Filip Naudot, Tobias Sundqvist, Timotheus Kampik |
Source: ArXiv AI | 05-11-25
Graph Neural Network-Based Semi-Supervised Open-Set Fault Diagnosis for Marine Machinery Systems
Authors: Chuyue Lou, M. Amine Atoui |
Source: ArXiv AI | 05-11-25
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Authors: Hamin Koo, Minseon Kim, Jaehyung Kim |
Source: ArXiv AI | 05-11-25
Automatic Minds: Cognitive Parallels Between Hypnotic States and Large Language Model Processing
Authors: Giuseppe Riva, Brenda K. Wiederhold, Fabrizia Mantovani |
Source: ArXiv AI | 05-11-25
TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks
Authors: Hanwen Xu, Xuyao Huang, Yuzhe Liu, Kai Yu, Zhijie Deng |
Source: ArXiv AI | 05-11-25
New research finds LLMs report subjective experience most when roleplay is reduced
Source: The Decoder | 04-11-25
Pangram achieves near-perfect results in AI text detection tests, study reveals
Source: The Decoder | 04-11-25
OpenAI CEO Sam Altman says revenue is "well more" than $13 billion and invites critics to sell
Source: The Decoder | 04-11-25
OpenAI’s Atlas browser sidesteps NYT and PCMag blocks by steering users to competitors
Source: The Decoder | 04-11-25
Lessons from interviews on deploying AI Agents in production (mmc.vc)
Source: Hacker News | 04-11-25
Microsoft brings autonomous AI agents to 365 Copilot
Source: The Decoder | 04-11-25
Robert Hooke's "Cyberpunk" Letter to Gottfried Leibniz (mynamelowercase.com)
Source: Hacker News | 04-11-25
Agent-o-rama: build, trace, evaluate, and monitor LLM agents in Java or Clojure (redplanetlabs.com)
Source: Hacker News | 04-11-25
TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control
Authors: Yuxiang Chen, Xiaoming Xu, Pengle Zhang, Michael Beyer, Martin Rapp, Jun Zhu, Jianfei Chen |
Source: ArXiv AI | 04-11-25
Leveraging Generic Time Series Foundation Models for EEG Classification
Authors: Théo Gnassounou, Yessin Moakher, Shifeng Xie, Vasilii Feofanov, Ievgen Redko |
Source: ArXiv AI | 04-11-25
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
Authors: Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Xiaowen Chu |
阅读更多来源: ArXiv AI | 04-11-25
VessShape: Few-shot 2D blood vessel segmentation by leveraging shape priors from synthetic images
Authors: Cesar H. Comin, Wesley N. Galvão |
阅读更多来源: ArXiv AI | 04-11-25
The Denario project: Deep knowledge AI agents for scientific discovery
Authors: Francisco Villaescusa-Navarro, Boris Bolliet, Pablo Villanueva-Domingo, Adrian E. Bayer, Aidan Acquah, Chetana Amancharla, Almog Barzilay-Siegal, Pablo Bermejo, Camille Bilodeau, Pablo Cárdenas Ramírez, Miles Cranmer, Urbano L. França, ChangHoon Hahn, Yan-Fei Jiang, Raul Jimenez, Jun-Young Lee, Antonio Lerario, Osman Mamun, Thomas Meier, Anupam A. Ojha, Pavlos Protopapas, Shimanto Roy, David N. Spergel, Pedro Tarancón-Álvarez, Ujjwal Tiwari, Matteo Viel, Digvijay Wadekar, Chi Wang, Bonny Y. Wang, Licong Xu, Yossi Yovel, Shuwen Yue, Wen-Han Zhou, Qiyao Zhu, Jiajun Zou, Íñigo Zubeldia |
阅读更多来源: ArXiv AI | 04-11-25
CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions
Authors: Lingyue Fu, Xin Ding, Yaoming Zhu, Shao Zhang, Lin Qiu, Weiwen Liu, Weinan Zhang, Xuezhi Cao, Xunliang Cai, Jiaxin Ding, Yong Yu |
阅读更多来源: ArXiv AI | 04-11-25
Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
Authors: Aaditya Shukla, Sidney Knowles, Meenakshi Madugula, Dave Farris, Ryan Angilly, Santiago Pombo, Anbang Xu, Lu An, Abhinav Balasubramanian, Tan Yu, Jiaxiang Ren, Rama Akkiraju |
阅读更多来源: ArXiv AI | 04-11-25
Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations
Authors: Pedro Antonio Alarcón Granadeno, Arturo Miguel Bernal Russell, Sofia Nelson, Demetrius Hernandez, Maureen Petterson, Michael Murphy, Walter J. Scheirer, Jane Cleland-Huang |
阅读更多来源: ArXiv AI | 04-11-25
Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering
Authors: Kounianhua Du, Jianxing Liu, Kangning Zhang, Wenxiang Jiao, Yuan Lu, Jiarui Jin, Weiwen Liu, Yong Yu, Weinan Zhang |
阅读更多来源: ArXiv AI | 04-11-25
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Authors: Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, Hari Balakrishnan |
阅读更多来源: ArXiv AI | 04-11-25
An In-depth Study of LLM Contributions to the Bin Packing Problem
Authors: Julien Herrmann, Guillaume Pallez |
阅读更多来源: ArXiv AI | 04-11-25
GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language
Authors: Yuhao Zhang, Dingxin Hu, Tinghao Yu, Hao Liu, Yiting Liu |
阅读更多来源: ArXiv AI | 04-11-25
InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
Authors: Yunze Wu, Dayuan Fu, Weiye Si, Zhen Huang, Mohan Jiang, Keyu Li, Shijie Xia, Jie Sun, Tianze Xu, Xiangkun Hu, Pengrui Lu, Xiaojie Cai, Lyumanshan Ye, Wenhong Zhu, Yang Xiao, Pengfei Liu |
阅读更多来源: ArXiv AI | 04-11-25
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Authors: Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang |
阅读更多来源: ArXiv AI | 04-11-25
OpenAI moves Sora to paid model, discovers concept of paying for content
阅读更多来源: The Decoder | 03-11-25
Meta's Free Transformer introduces a new approach to LLM decision-making
阅读更多来源: The Decoder | 03-11-25
Simple trick to increase coverage: Lying to users about signal strength (nickvsnetworking.com)
阅读更多来源: Hacker News | 03-11-25
Tongyi DeepResearch – open-source 30B MoE Model that rivals OpenAI DeepResearch (tongyi-agent.github.io)
阅读更多来源: Hacker News | 03-11-25
Syllabi – Open-source agentic AI with tools, RAG, and multi-channel deploy (syllabi-ai.com)
阅读更多来源: Hacker News | 03-11-25
When models manipulate manifolds: The geometry of a counting task (transformer-circuits.pub)
阅读更多来源: Hacker News | 03-11-25
Remote Labor Index: Measuring AI Automation of Remote Work
Authors: Mantas Mazeika, Alice Gatti, Cristina Menghini, Udari Madhushani Sehwag, Shivam Singhal, Yury Orlovskiy, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Ye Htet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernandez Cardona, Annette Diamond, Summer Yue, Alexandr Wang, Bing Liu, Ernesto Hernandez, Dan Hendrycks |
阅读更多来源: ArXiv AI | 03-11-25
Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters
Authors: Xingjian Zhang, Tianhong Gao, Suliang Jin, Tianhao Wang, Teng Ye, Eytan Adar, Qiaozhu Mei |
阅读更多来源: ArXiv AI | 03-11-25
An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0
Authors: Jorge Martinez-Gil, Mario Pichler, Nefeli Bountouni, Sotiris Koussouris, Marielena Márquez Barreiro, Sergio Gusmeroli |
阅读更多来源: ArXiv AI | 03-11-25
From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL
Authors: Manu Redd, Tao Zhe, Dongjie Wang |
阅读更多来源: ArXiv AI | 03-11-25
Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning
Authors: Nissan Yaron, Dan Bystritsky, Ben-Etzion Yaron |
阅读更多来源: ArXiv AI | 03-11-25
SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications
Authors: Emily Herron, Junqi Yin, Feiyi Wang |
阅读更多来源: ArXiv AI | 03-11-25
Beyond Benchmarks: The Economics of AI Inference
Authors: Boqin Zhuang, Jiacheng Qiao, Mingqian Liu, Mingxing Yu, Ping Hong, Rui Li, Xiaoxia Song, Xiangjun Xu, Xu Chen, Yaoyao Ma, Yujie Gao |
阅读更多来源: ArXiv AI | 03-11-25
Can AI be Accountable?
Authors: Andrew L. Kun |
阅读更多来源: ArXiv AI | 03-11-25
Large Language Model-assisted Autonomous Vehicle Recovery from Immobilization
Authors: Zhipeng Bao, Qianwen Li |
阅读更多来源: ArXiv AI | 03-11-25
Graph-Enhanced Policy Optimization in LLM Agent Training
Authors: Jiazhen Yuan, Wei Zhao, Zhengbiao Bai |
阅读更多来源: ArXiv AI | 03-11-25
Retrieval Augmented Generation-Enhanced Distributed LLM Agents for Generalizable Traffic Signal Control with Emergency Vehicles
Authors: Xinhang Li, Qing Guo, Junyu Chen, Zheng Guo, Shengzhe Xu, Lei Li, Lin Zhang |
阅读更多来源: ArXiv AI | 03-11-25
Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Authors: Duc-Hai Nguyen, Vijayakumar Nanjappan, Barry O'Sullivan, Hoang D. Nguyen |
阅读更多来源: ArXiv AI | 03-11-25
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
Authors: Bo Pang, Deqian Kong, Silvio Savarese, Caiming Xiong, Yingbo Zhou |
阅读更多来源: ArXiv AI | 03-11-25
Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education
Authors: Vikrant Sahu, Gagan Raj Gupta, Raghav Borikar, Nitin Mane |
阅读更多来源: ArXiv AI | 03-11-25
A Pragmatic View of AI Personhood
Authors: Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, Stanley M. Bileschi |
阅读更多来源: ArXiv AI | 03-11-25
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Authors: Andrew M. Bean, Nabeel Seedat, Shengzhuang Chen, Jonathan Richard Schwarz |
阅读更多来源: ArXiv AI | 03-11-25
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Authors: Qianli Shen, Daoyuan Chen, Yilun Huang, Zhenqing Ling, Yaliang Li, Bolin Ding, Jingren Zhou |
阅读更多来源: ArXiv AI | 03-11-25
GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance
Authors: Jiseong Chung, Ronny Ko, Wonchul Yoo, Makoto Onizuka, Sungmok Kim, Tae-Wan Kim, Won-Yong Shin |
阅读更多来源: ArXiv AI | 03-11-25
LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks
Authors: Dipak Meher, Carlotta Domeniconi, Guadalupe Correa-Cabrera |
阅读更多来源: ArXiv AI | 03-11-25
Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections
Authors: Clarissa Sabrina Arlinghaus, Tristan Kenneweg, Barbara Hammer, Günter W. Maier |
阅读更多来源: ArXiv AI | 03-11-25
Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives
Authors: Kentaro Ozeki, Risako Ando, Takanobu Morishita, Hirohiko Abe, Koji Mineshima, Mitsuhiro Okada |
阅读更多来源: ArXiv AI | 03-11-25
Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling
Authors: Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl |
阅读更多来源: ArXiv AI | 03-11-25
EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge
Authors: Jack FitzGerald, Aristotelis Lazaridis, Dylan Bates, Aman Sharma, Jonnathan Castillo, Yousif Azami, Sean Bailey, Jeremy Cao, Peter Damianov, Kevin de Haan, Luke Kerbs, Vincent Lu, Joseph Madigan, Jeremy McLaurin, Jonathan Tainer, Dave Anderson, Jonathan Beck, Jamie Cuticello, Colton Malkerson, Tyler Saltsman |
阅读更多来源: ArXiv AI | 03-11-25
LLMs Process Lists With General Filter Heads
Authors: Arnab Sen Sharma, Giordano Rogers, Natalie Shapira, David Bau |
阅读更多来源: ArXiv AI | 03-11-25
The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
Authors: William Overman, Mohsen Bayati |
阅读更多来源: ArXiv AI | 03-11-25
Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis
Authors: Xinhan Zheng, Huyu Wu, Xueting Wang, Haiyun Jiang |
阅读更多来源: ArXiv AI | 03-11-25
The Smol Training Playbook: The Secrets to Building World-Class LLMs (huggingface.co)
阅读更多来源: Hacker News | 02-11-25
Show HN: Why write code if the LLM can just do the thing? (web app experiment) (github.com/samrolken)
阅读更多来源: Hacker News | 02-11-25
How I use every Claude Code feature (sshh.io)
阅读更多来源: Hacker News | 02-11-25
Claude Code can debug low-level cryptography (filippo.io)
阅读更多来源: Hacker News | 02-11-25
Cursor 2.0 shifts to in-house AI with Composer model and parallel agents
阅读更多来源: The Decoder | 02-11-25
Word2vec-style vector arithmetic on docs embeddings (technicalwriting.dev)
阅读更多来源: Hacker News | 02-11-25
Universal Music Group rewrites its AI playbook with deals involving Udio and Stability AI
阅读更多来源: The Decoder | 01-11-25
Show HN: Pipelex – Declarative language for repeatable AI workflows (github.com/pipelex)
阅读更多来源: Hacker News | 01-11-25
According to Anthropic, language models can perceive some of their own internal states
阅读更多来源: The Decoder | 01-11-25
OpenAI and Google are joining forces with PayPal
阅读更多来源: The Decoder | 01-11-25
Gemini 3 set for 2025 launch as Google CEO Pichai manages expectations for frontier model progress
阅读更多来源: The Decoder | 01-11-25
Signs of introspection in large language models (anthropic.com)
阅读更多来源: Hacker News | 01-11-25
Claude Is Down (claude.com)
阅读更多来源: Hacker News | 31-10-25
Bertie the Brain (wikipedia.org)
阅读更多来源: Hacker News | 31-10-25
Show HN: In a single HTML file, an app to encourage my children to invest (roberdam.com)
阅读更多来源: Hacker News | 31-10-25
"No, wait, avoid wiki" - Elon Musk's Grokipedia is biased AI slop
阅读更多来源: The Decoder | 30-10-25
OpenAI tightens ChatGPT safeguards for mental health conversations
阅读更多来源: The Decoder | 30-10-25
OpenAI targets full-scale autonomous AI researcher by early 2028
阅读更多来源: The Decoder | 30-10-25
Microsoft gains AGI rights in new OpenAI deal, expert panel to rule on milestone
阅读更多来源: The Decoder | 30-10-25
OpenAI’s promise to stay in California helped clear the path for its IPO (wsj.com)
阅读更多来源: Hacker News | 30-10-25
Replacing EBS and Rethinking Postgres Storage from First Principles (tigerdata.com)
阅读更多来源: Hacker News | 30-10-25
Hybrid Quantum-Classical Recurrent Neural Networks
Authors: Wenduan Xu |
阅读更多来源: ArXiv AI | 30-10-25
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry
Authors: Run Peng, Ziqiao Ma, Amy Pang, Sikai Li, Zhang Xi-Jia, Yingzhuo Yu, Cristian-Paul Bara, Joyce Chai |
阅读更多来源: ArXiv AI | 30-10-25
Leveraging an Atmospheric Foundational Model for Subregional Sea Surface Temperature Forecasting
Authors: Víctor Medina, Giovanny A. Cuervo-Londoño, Javier Sánchez |
阅读更多来源: ArXiv AI | 30-10-25
FARSIQA: Faithful and Advanced RAG System for Islamic Question Answering
Authors: Mohammad Aghajani Asl, Behrooz Minaei Bidgoli |
阅读更多来源: ArXiv AI | 30-10-25
Graph Network-based Structural Simulator: Graph Neural Networks for Structural Dynamics
Authors: Alessandro Lucchetti (1), Francesco Cadini (1), Marco Giglio (1), Luca Lomazzi (1) ((1) Politecnico di Milano, Department of Mechanical Engineering, Milano, Italy) |
阅读更多来源: ArXiv AI | 30-10-25
User Misconceptions of LLM-Based Conversational Programming Assistants
Authors: Gabrielle O'Brien, Antonio Pedro Santos Alves, Sebastian Baltes, Grischa Liebel, Mircea Lungu, Marcos Kalinowski |
阅读更多来源: ArXiv AI | 30-10-25
The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework
Authors: Aakriti Shah, Thai Le |
阅读更多来源: ArXiv AI | 30-10-25
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
Authors: Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Qiang Lin, Hande Dong, Jiawei Chen |
阅读更多来源: ArXiv AI | 30-10-25
H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts
Authors: Peilin Tan, Liang Xie, Churan Zhi, Dian Tu, Chuanqi Shi |
阅读更多来源: ArXiv AI | 30-10-25
Aligning Large Language Models with Procedural Rules: An Autoregressive State-Tracking Prompting for In-Game Trading
Authors: Minkyung Kim, Junsik Kim, Woongcheol Yang, Sangdon Park, Sohee Bae |
阅读更多来源: ArXiv AI | 30-10-25
Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
Authors: Islam Nassar, Yang Lin, Yuan Jin, Rongxin Zhu, Chang Wei Tan, Zenan Zhai, Nitika Mathur, Thanh Tien Vu, Xu Zhong, Long Duong, Yuan-Fang Li |
阅读更多来源: ArXiv AI | 30-10-25
RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
Authors: Tianqianjin Lin, Xi Zhao, Xingyao Zhang, Rujiao Long, Yi Xu, Zhuoren Jiang, Wenbo Su, Bo Zheng |
阅读更多来源: ArXiv AI | 30-10-25
Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?
Authors: Willem Fourie |
阅读更多来源: ArXiv AI | 30-10-25
Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation
Authors: Thomas Cook, Richard Osuagwu, Liman Tsatiashvili, Vrynsia Vrynsia, Koustav Ghosal, Maraim Masoud, Riccardo Mattivi |
阅读更多来源: ArXiv AI | 30-10-25
Predicate Renaming via Large Language Models
Authors: Elisabetta Gentili, Tony Ribeiro, Fabrizio Riguzzi, Katsumi Inoue |
阅读更多来源: ArXiv AI | 30-10-25
Standardization of Psychiatric Diagnoses -- Role of Fine-tuned LLM Consortium and OpenAI-gpt-oss Reasoning LLM Enabled Decision Support System
Authors: Eranga Bandara, Ross Gore, Atmaram Yarlagadda, Anita H. Clayton, Preston Samuel, Christopher K. Rhea, Sachin Shetty |
阅读更多来源: ArXiv AI | 30-10-25
Counterfactual-based Agent Influence Ranker for Agentic AI Workflows
Authors: Amit Giloni, Chiara Picardi, Roy Betser, Shamik Bose, Aishvariya Priya Rathina Sabapathy, Roman Vainshtein |
阅读更多来源: ArXiv AI | 30-10-25
Responses from LLMs are not facts (stopcitingai.com)
阅读更多来源: Hacker News | 30-10-25
Generative AI Image Editing Showdown (specr.net)
阅读更多来源: Hacker News | 29-10-25
EuroLLM: LLM made in Europe built to support all 24 official EU languages (eurollm.io)
阅读更多来源: Hacker News | 29-10-25
ChatGPT's Atlas: The Browser That's Anti-Web (anildash.com)
阅读更多来源: Hacker News | 29-10-25
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
Authors: Yihao Li, Saeed Salehi, Lyle Ungar, Konrad P. Kording |
阅读更多来源: ArXiv AI | 29-10-25
ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?
Authors: Shuqing Li, Jiayi Yan, Chenyu Niu, Jen-tse Huang, Yun Peng, Wenxuan Wang, Yepang Liu, Michael R. Lyu |
阅读更多来源: ArXiv AI | 29-10-25
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions
Authors: Vivek Veeriah, Federico Barbero, Marcus Chiam, Xidong Feng, Michael Dennis, Ryan Pachauri, Thomas Tumiel, Johan Obando-Ceron, Jiaxin Shi, Shaobo Hou, Satinder Singh, Nenad Tomašev, Tom Zahavy |
阅读更多来源: ArXiv AI | 29-10-25
Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models
Authors: Murad Ismayilov, Edwin Meriaux, Shuo Wen, Gregory Dudek |
阅读更多来源: ArXiv AI | 29-10-25
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Authors: Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei |
阅读更多来源: ArXiv AI | 29-10-25
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Authors: Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, Prasant Mohapatra |
阅读更多来源: ArXiv AI | 29-10-25
Hybrid Modeling, Sim-to-Real Reinforcement Learning, and Large Language Model Driven Control for Digital Twins
Authors: Adil Rasheed, Oscar Ravik, Omer San |
阅读更多来源: ArXiv AI | 29-10-25
The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity
Authors: Aymane El Gadarri, Ali Aouad, Vivek F. Farias |
阅读更多来源: ArXiv AI | 29-10-25
Modeling Electric Vehicle Car-Following Behavior: Classical vs Machine Learning Approach
Authors: Md. Shihab Uddin, Md Nazmus Shakib, Rahul Bhadani |
阅读更多来源: ArXiv AI | 29-10-25
LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models
Authors: Peng Cai, Reza Ryan, Nickson M. Karie |
阅读更多来源: ArXiv AI | 29-10-25
Discovering Heuristics with Large Language Models (LLMs) for Mixed-Integer Programs: Single-Machine Scheduling
Authors: İbrahim Oğuz Çetinkaya, İ. Esra Büyüktahtakın, Parshin Shojaee, Chandan K. Reddy |
阅读更多来源: ArXiv AI | 29-10-25
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
Authors: Wenhao Wang, Peizhi Niu, Zhao Xu, Zhaoyu Chen, Jian Du, Yaxin Du, Xianghe Pang, Keduan Huang, Yanfeng Wang, Qiang Yan, Siheng Chen |
阅读更多来源: ArXiv AI | 29-10-25
Retrieval and Argumentation Enhanced Multi-Agent LLMs for Judgmental Forecasting
Authors: Deniz Gorur, Antoni Rago, Francesca Toni |
阅读更多来源: ArXiv AI | 29-10-25
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Authors: Jiayu Liu, Wei Dai, Zhenya Huang, Ning Miao, Enhong Chen |
阅读更多来源: ArXiv AI | 29-10-25
Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research
Authors: Daria Kravets-Meinke, Hannah Schmid-Petri, Sonja Niemann, Ute Schmid |
阅读更多来源: ArXiv AI | 29-10-25
Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents
Authors: Juraj Mavračić |
阅读更多来源: ArXiv AI | 29-10-25
An N-of-1 Artificial Intelligence Ecosystem for Precision Medicine
Authors: Pedram Fard, Alaleh Azhir, Neguine Rezaii, Jiazi Tian, Hossein Estiri |
阅读更多来源: ArXiv AI | 29-10-25
A Unified Geometric Space Bridging AI Models and the Human Brain
Authors: Silin Chen, Yuzhong Chen, Zifan Wang, Junhao Wang, Zifeng Jia, Keith M Kendrick, Tuo Zhang, Lin Zhao, Dezhong Yao, Tianming Liu, Xi Jiang |
阅读更多来源: ArXiv AI | 29-10-25
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Authors: Jiarui Qin, Yunjia Xi, Junjie Huang, Renting Rui, Di Yin, Weiwen Liu, Yong Yu, Weinan Zhang, Xing Sun |
阅读更多来源: ArXiv AI | 29-10-25
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Authors: Xianjun Gao, Jianchun Liu, Hongli Xu, Liusheng Huang |
阅读更多来源: ArXiv AI | 29-10-25
Law in Silico: Simulating Legal Society with LLM-Based Agents
Authors: Yiding Wang, Yuxuan Chen, Fanxu Meng, Xifan Chen, Xiaolei Yang, Muhan Zhang |
阅读更多来源: ArXiv AI | 29-10-25
Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning
Authors: Benjamin Grando Moreira |
阅读更多来源: ArXiv AI | 29-10-25
Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks
Authors: Korneel Van den Berghe, Stein Stroobants, Vijay Janapa Reddi, G.C.H.E. de Croon |
阅读更多来源: ArXiv AI | 29-10-25
Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives
Authors: Gang Chen, Changshuo Liu, Gene Anne Ooi, Marcus Tan, Zhongle Xie, Jianwei Yin, James Wei Luen Yip, Wenqiao Zhang, Jiaqi Zhu, Beng Chin Ooi |
阅读更多来源: ArXiv AI | 29-10-25
I've been loving Claude Code on the web (ben.page)
阅读更多来源: Hacker News | 29-10-25
Show HN: Butter – A Behavior Cache for LLMs (butter.dev)
阅读更多来源: Hacker News | 29-10-25
Our LLM-controlled office robot can't pass butter (andonlabs.com)
阅读更多来源: Hacker News | 29-10-25
Using AI to negotiate a $195k hospital bill down to $33k (threads.com)
阅读更多来源: Hacker News | 29-10-25
OpenAI says over a million people talk to ChatGPT about suicide weekly (techcrunch.com)
阅读更多来源: Hacker News | 28-10-25
Poker Tournament for LLMs (pokerbattle.ai)
阅读更多来源: Hacker News | 28-10-25
Claude for Excel (claude.com)
阅读更多来源: Hacker News | 28-10-25
The next chapter of the Microsoft–OpenAI partnership (openai.com)
阅读更多来源: Hacker News | 28-10-25
Jarvis: Towards Personalized AI Assistant via Personal KV-Cache Retrieval
Authors: Binxiao Xu, Junyu Feng, Ruichuan An, Yulin Luo, Shilin Yan, Hao Liang, Ming Lu, Wentao Zhang |
阅读更多来源: ArXiv AI | 28-10-25
Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models
Authors: Piyushkumar Patel |
阅读更多来源: ArXiv AI | 28-10-25
Critical Insights into Leading Conversational AI Models
Authors: Urja Kohli (1), Aditi Singh (2), Arun Sharma (3) ((1) Department of Mechanical and Automation Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India, (2) Department of Electronics and Communication Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India, (3) Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India) |
阅读更多来源: ArXiv AI | 28-10-25
RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability
Authors: Kaitong Cai, Jusheng Zhang, Yijia Fan, Jing Yang, Keze Wang |
阅读更多来源: ArXiv AI | 28-10-25
How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
Authors: Zora Zhiruo Wang, Yijia Shao, Omar Shaikh, Daniel Fried, Graham Neubig, Diyi Yang |
阅读更多来源: ArXiv AI | 28-10-25
Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes
Authors: Guanyu Yao, Qiucheng Wu, Yang Zhang, Zhaowen Wang, Handong Zhao, Shiyu Chang |
阅读更多来源: ArXiv AI | 28-10-25
A Survey of AI Scientists: Surveying the automatic Scientists and Research
Authors: Guiyao Tie, Pan Zhou, Lichao Sun |
阅读更多来源: ArXiv AI | 28-10-25
From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports
Authors: Qiuli Wang, Xiaoming Li, Jie Chen, Yongxu Liu, Xingpeng Zhang, Chen Liu, Wei Chen |
阅读更多来源: ArXiv AI | 28-10-25
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
Authors: Kai Zhuang, Jiawei Zhang, Yumou Liu, Hanqun Cao, Chunbin Gu, Mengdi Liu, Zhangyang Gao, Zitong Jerry Wang, Xuanhe Zhou, Pheng-Ann Heng, Lijun Wu, Conghui He, Cheng Tan |
阅读更多来源: ArXiv AI | 28-10-25
AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines
Authors: Abolfazl Younesi, Zahra Najafabadi Samani, Thomas Fahringer |
阅读更多来源: ArXiv AI | 28-10-25
What are the odds? Risk and uncertainty about AI existential risk
Authors: Marco Grossi |
阅读更多来源: ArXiv AI | 28-10-25
Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy
Authors: Roham Koohestani, Ziyou Li, Anton Podkopaev, Maliheh Izadi |
阅读更多来源: ArXiv AI | 28-10-25
Policy-Aware Generative AI for Safe, Auditable Data Access Governance
Authors: Shames Al Mandalawi, Muzakkiruddin Ahmed Mohammed, Hendrika Maclean, Mert Can Cakmak, John R. Talburt |
阅读更多来源: ArXiv AI | 28-10-25
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier
Authors: Hyeongseop Rha, Jeong Hun Yeo, Yeonju Kim, Yong Man Ro |
阅读更多来源: ArXiv AI | 28-10-25
Reduced AI Acceptance After the Generative AI Boom: Evidence From a Two-Wave Survey Study
Authors: Joachim Baumann, Aleksandra Urman, Ulrich Leicht-Deobald, Zachary J. Roman, Anikó Hannák, Markus Christen |
阅读更多来源: ArXiv AI | 28-10-25
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Authors: Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhan, Mostofa Patwary, Jiaxuan You |
阅读更多来源: ArXiv AI | 28-10-25
Researchers discover three factors that make AI agents significantly smarter
阅读更多来源: The Decoder | 28-10-25
Should LLMs just treat text content as an image? (seangoedecke.com)
阅读更多来源: Hacker News | 28-10-25
Show HN: Dlog – Journaling and AI coach that learns what drives wellbeing (Mac) (dlog.pro)
阅读更多来源: Hacker News | 28-10-25
ChatGPT shares data on how many users exhibit psychosis or suicidal thoughts (bbc.com)
阅读更多来源: Hacker News | 28-10-25
Go beyond Goroutines: introducing the Reactive paradigm (samuelberthe.substack.com)
阅读更多来源: Hacker News | 28-10-25
Human and AI Trust: Trust Attitude Measurement Instrument
Authors: Retno Larasati |
阅读更多来源: ArXiv AI | 28-10-25
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Authors: Guanghao Zheng, Bowen Shi, Mingxing Xu, Ruoyu Sun, Peisen Zhao, Zhibo Zhang, Wenrui Dai, Junni Zou, Hongkai Xiong, Xiaopeng Zhang, Qi Tian |
阅读更多来源: ArXiv AI | 28-10-25
Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
Authors: Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta |
阅读更多来源: ArXiv AI | 28-10-25
From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene
Authors: Mojca Brglez, Špela Vintar |
阅读更多来源: ArXiv AI | 28-10-25
Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems
Authors: Zhe Fei, Mehmet Yigit Turali, Shreyas Rajesh, Xinyang Dai, Huyen Pham, Pavan Holur, Yuhui Zhu, Larissa Mooney, Yih-Ing Hser, Vwani Roychowdhury |
阅读更多来源: ArXiv AI | 28-10-25
DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance
Authors: Chunghyun Han, Alfio Gliozzo, Junkyu Lee, Agostino Capponi |
阅读更多来源: ArXiv AI | 28-10-25
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Authors: Kou Misaki, Takuya Akiba |
阅读更多来源: ArXiv AI | 28-10-25
NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Authors: Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao |
阅读更多来源: ArXiv AI | 28-10-25
Out-of-Distribution Detection for Safety Assurance of AI and Autonomous Systems
Authors: Victoria J. Hodge, Colin Paterson, Ibrahim Habli |
阅读更多来源: ArXiv AI | 28-10-25
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
Authors: Siddharth Mehrotra, Jin Huang, Xuelong Fu, Roel Dobbe, Clara I. Sánchez, Maarten de Rijke |
阅读更多来源: ArXiv AI | 28-10-25
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation
Authors: Jinhui Lou, Yan Yang, Zhou Yu, Zhenqi Fu, Weidong Han, Qingming Huang, Jun Yu |
阅读更多来源: ArXiv AI | 28-10-25
Advancing Symbolic Integration in Large Language Models: Beyond Conventional Neurosymbolic AI
Authors: Maneeha Rani, Bhupesh Kumar Mishra, Dhavalkumar Thakker |
阅读更多来源: ArXiv AI | 28-10-25
Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning
Authors: Ravindra Aribowo Tarunokusumo, Rafael Fernandes Cunha |
阅读更多来源: ArXiv AI | 28-10-25
Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts
Authors: Hongwei Zhang, Ji Lu, Shiqing Jiang, Chenxiang Zhu, Li Xie, Chen Zhong, Haoran Chen, Yurui Zhu, Yongsheng Du, Yanqin Gao, Lingjun Huang, Baoli Wang, Fang Tan, Peng Zou |
阅读更多来源: ArXiv AI | 28-10-25
EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law
Authors: Ilija Lichkovski, Alexander Müller, Mariam Ibrahim, Tiwai Mhundwa |
阅读更多来源: ArXiv AI | 28-10-25
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Authors: Jonathan Bragg, Mike D'Arcy, Nishant Balepur, Dan Bareket, Bhavana Dalvi, Sergey Feldman, Dany Haddad, Jena D. Hwang, Peter Jansen, Varsha Kishore, Bodhisattwa Prasad Majumder, Aakanksha Naik, Sigal Rahamimov, Kyle Richardson, Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, Guy Wiener, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka, Brooke Vlahos, Peter Clark, Doug Downey, Yoav Goldberg, Ashish Sabharwal, Daniel S. Weld |
阅读更多来源: ArXiv AI | 28-10-25
Adobe brings AI assistants and partner models to Creative Cloud
阅读更多来源: The Decoder | 28-10-25
Anthropic plans to secure up to one million Google TPUs by 2026 to expand its AI infrastructure
阅读更多来源: The Decoder | 27-10-25
OpenAI positions ChatGPT as a search engine for work data with Company Knowledge
阅读更多来源: The Decoder | 27-10-25
Google integrates Earth AI and Gemini language models for what it calls geospatial reasoning
阅读更多来源: The Decoder | 27-10-25
Microsoft needs to open up more about its OpenAI dealings (wsj.com)
阅读更多来源: Hacker News | 27-10-25
Show HN: Diagram as code tool with draggable customizations (github.com/rohanadwankar)
阅读更多来源: Hacker News | 26-10-25
California invests in battery energy storage, leaving rolling blackouts behind (latimes.com)
阅读更多来源: Hacker News | 26-10-25
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
Authors: Shiva Sreeram, Alaa Maalouf, Pratyusha Sharma, Daniela Rus |
阅读更多来源: ArXiv AI | 26-10-25
Bayesian Inference of Primordial Magnetic Field Parameters from CMB with Spherical Graph Neural Networks
Authors: Juan Alejandro Pinto Castro, Héctor J. Hortúa, Jorge Enrique García-Farieta, Roger Anderson Hurtado |
阅读更多来源: ArXiv AI | 26-10-25
A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text
Authors: Alicia Sagae, Chia-Jung Lee, Sandeep Avula, Brandon Dang, Vanessa Murdock |
阅读更多来源: ArXiv AI | 26-10-25
On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
Authors: Mingmeng Geng, Thierry Poibeau |
阅读更多来源: ArXiv AI | 26-10-25
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Authors: Yuanhe Zhang, Ilja Kuzborskij, Jason D. Lee, Chenlei Leng, Fanghui Liu |
阅读更多来源: ArXiv AI | 26-10-25
Benchmarking Reasoning Reliability in Artificial Intelligence Models for Energy-System Analysis
Authors: Eliseo Curcio |
阅读更多来源: ArXiv AI | 26-10-25
LLMs can hide text in other text of the same length.ipynb
Authors: Antonio Norelli, Michael Bronstein |
阅读更多来源: ArXiv AI | 26-10-25
Using Large Language Models for Abstraction of Planning Domains - Extended Version
Authors: Bita Banihashemi, Megh Patel, Yves Lespérance |
阅读更多来源: ArXiv AI | 26-10-25
Individualized Cognitive Simulation in Large Language Models: Evaluating Different Cognitive Representation Methods
Authors: Tianyi Zhang, Xiaolin Zhou, Yunzhe Wang, Erik Cambria, David Traum, Rui Mao |
阅读更多来源: ArXiv AI | 26-10-25
Merge and Conquer: Evolutionarily Optimizing AI for 2048
Authors: Maggie Bai, Ava Kim Cohen, Eleanor Koss, Charlie Lichtenbaum |
阅读更多来源: ArXiv AI | 26-10-25
TRUST: A Decentralized Framework for Auditing Large Language Model Reasoning
Authors: Morris Yu-Chao Huang, Zhen Tan, Mohan Zhang, Pingzhi Li, Zhuo Zhang, Tianlong Chen |
阅读更多来源: ArXiv AI | 26-10-25
The Verification-Value Paradox: A Normative Critique of Gen AI in Legal Practice
Authors: Joshua Yuvaraj |
阅读更多来源: ArXiv AI | 26-10-25
Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions
Authors: Gyuyeon Na, Minjung Park, Hyeonjeong Cha, Sangmi Chai |
阅读更多来源: ArXiv AI | 26-10-25
Classical Feature Embeddings Help in BERT-Based Human Mobility Prediction
Authors: Yunzhi Liu, Haokai Tan, Rushi Kanjaria, Lihuan Li, Flora D. Salim |
Source: ArXiv AI | 26-10-25
Collateral Damage Assessment Model for AI System Target Engagement in Military Operations
Authors: Clara Maathuis, Kasper Cools |
Source: ArXiv AI | 26-10-25
Bias by Design? How Data Practices Shape Fairness in AI Healthcare Systems
Authors: Anna Arias-Duart, Maria Eugenia Cardello, Atia Cortés |
Source: ArXiv AI | 26-10-25
LLM-empowered knowledge graph construction: A survey
Authors: Haonan Bian |
Source: ArXiv AI | 26-10-25
What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation
Authors: Heejin Do, Jaehui Hwang, Dongyoon Han, Seong Joon Oh, Sangdoo Yun |
Source: ArXiv AI | 26-10-25
Towards the Formalization of a Trustworthy AI for Mining Interpretable Models explOiting Sophisticated Algorithms
Authors: Riccardo Guidotti, Martina Cinquini, Marta Marchiori Manerba, Mattia Setzu, Francesco Spinnato |
Source: ArXiv AI | 26-10-25
Towards Reliable Evaluation of Large Language Models for Multilingual and Multimodal E-Commerce Applications
Authors: Shuyi Xie, Ziqin Liew, Hailing Zhang, Haibo Zhang, Ling Hu, Zhiqiang Zhou, Shuman Liu, Anxiang Zeng |
Source: ArXiv AI | 26-10-25
The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models
Authors: Xue Wen Tan, Nathaniel Tan, Galen Lee, Stanley Kok |
Source: ArXiv AI | 26-10-25
Integrating Machine Learning into Belief-Desire-Intention Agents: Current Advances and Open Challenges
Authors: Andrea Agiollo, Andrea Omicini |
Source: ArXiv AI | 26-10-25
Show HN: LLM Rescuer – Fixing the billion dollar mistake in Ruby (github.com/barodeur)
Source: Hacker News | 26-10-25
I'm drowning in AI features I never asked for and I hate it (makeuseof.com)
Source: Hacker News | 26-10-25
Tell HN: OpenAI now requires ID verification and won't refund API credits
Source: Hacker News | 25-10-25
Unlocking free WiFi on British Airways (saxrag.com)
Source: Hacker News | 25-10-25
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference (arxiv.org)
Source: Hacker News | 25-10-25
'Attention is all you need' coauthor says he's 'sick' of transformers (venturebeat.com)
Source: Hacker News | 25-10-25
Virtual Try on Free Online – AI Clothes Changer – I-TryOn (virtual-try-on.app)
Source: Hacker News | 25-10-25
Compiler optimizations for 5.8ms GPT-OSS-120B inference (not on GPUs) (furiosa.ai)
Source: Hacker News | 25-10-25
Junk data from X makes large language models lose reasoning skills, researchers show
Source: The Decoder | 25-10-25
Fast-DLLM: Training-Free Acceleration of Diffusion LLM (arxiv.org)
Source: Hacker News | 24-10-25
OpenAI acquires Sky.app (openai.com)
Source: Hacker News | 24-10-25
Can “second life” EV batteries work as grid-scale energy storage? (volts.wtf)
Source: Hacker News | 24-10-25
Claude Memory (anthropic.com)
Source: Hacker News | 24-10-25
Show HN: Git for LLMs – A context management interface (twigg.ai)
Source: Hacker News | 24-10-25
Armed police swarm student after AI mistakes bag of Doritos for a weapon (dexerto.com)
Source: Hacker News | 24-10-25
OpenAI launches Atlas, a new web browser built around ChatGPT integration
Source: The Decoder | 23-10-25
Bloomberg: OpenAI trains AI to take on junior banking tasks
Source: The Decoder | 23-10-25
OpenAI tightens Sora 2 safeguards after Bryan Cranston's likeness appears without consent
Source: The Decoder | 23-10-25
The AI Pause group is back with another warning about superintelligence
Source: The Decoder | 23-10-25
Which Collatz numbers do Busy Beavers simulate (if any)? (gbragafibra.github.io)
Source: Hacker News | 23-10-25
Run interactive commands in Gemini CLI (googleblog.com)
Source: Hacker News | 23-10-25
Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text? (twitter.com/karpathy)
Source: Hacker News | 23-10-25
Show HN: Deta Surf – An open source and local-first AI notebook (github.com/deta)
Source: Hacker News | 23-10-25
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Authors: Yu Wu, Ke Shu, Jonas Fischer, Lidia Pivovarova, David Rosson, Eetu Mäkelä, Mikko Tolonen |
Source: ArXiv AI | 23-10-25
I Spy With My Model's Eye: Visual Search as a Behavioural Test for MLLMs
Authors: John Burden, Jonathan Prunty, Ben Slater, Matthieu Tehenan, Greg Davis, Lucy Cheke |
Source: ArXiv AI | 23-10-25
Study of Training Dynamics for Memory-Constrained Fine-Tuning
Authors: Aël Quélennec, Nour Hezbri, Pavlo Mozharovskyi, Van-Tam Nguyen, Enzo Tartaglione |
Source: ArXiv AI | 23-10-25
Are Large Language Models Sensitive to the Motives Behind Communication?
Authors: Addison J. Wu, Ryan Liu, Kerem Oktar, Theodore R. Sumers, Thomas L. Griffiths |
Source: ArXiv AI | 23-10-25
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
Authors: Cesar Gonzalez-Gutierrez, Dirk Hovy |
Source: ArXiv AI | 23-10-25
SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
Authors: Xichen Zhang, Sitong Wu, Haoru Tan, Shaozuo Yu, Yinghao Zhu, Ziyi He, Jiaya Jia |
Source: ArXiv AI | 23-10-25
Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop: A Case of Nonprofit Program Evaluation
Authors: Ji Ma, Albert Casella |
Source: ArXiv AI | 23-10-25
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
Authors: Xichen Zhang, Sitong Wu, Yinghao Zhu, Haoru Tan, Shaozuo Yu, Ziyi He, Jiaya Jia |
Source: ArXiv AI | 23-10-25
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
Authors: Arpan Mukherjee, Marcello Bullo, Debabrota Basu, Deniz Gündüz |
Source: ArXiv AI | 23-10-25
A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist
Authors: Sohyeon Jeon, Hyung-Chul Lee |
Source: ArXiv AI | 23-10-25
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS
Authors: Brandon James Carone, Iran R. Roman, Pablo Ripollés |
Source: ArXiv AI | 23-10-25
ChatGPT Unveils Its Limits: Principles of Law Deliver Checkmate
Authors: Marianna Molinari, Ilaria Angela Amantea, Marinella Quaranta, Guido Governatori |
Source: ArXiv AI | 23-10-25
Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties
Authors: Philipp J. Schneider, Lin Tian, Marian-Andrei Rizoiu |
Source: ArXiv AI | 23-10-25
Explainable e-sports win prediction through Machine Learning classification in streaming
Authors: Silvia García-Méndez, Francisco de Arriba-Pérez |
Source: ArXiv AI | 23-10-25
AgentSense: LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing
Authors: Xusen Guo, Mingxing Peng, Xixuan Hao, Xingchen Zou, Qiongyan Wang, Sijie Ruan, Yuxuan Liang |
Source: ArXiv AI | 23-10-25
RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models
Authors: Yang Yang, Hua XU, Zhangyi Hu, Yutao Yue |
Source: ArXiv AI | 23-10-25
Misalignment Bounty: Crowdsourcing AI Agent Misbehavior
Authors: Rustem Turtayev, Natalia Fedorova, Oleg Serikov, Sergey Koldyba, Lev Avagyan, Dmitrii Volkov |
Source: ArXiv AI | 23-10-25
Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents
Authors: Gil Pasternak, Dheeraj Rajagopal, Julia White, Dhruv Atreja, Matthew Thomas, George Hurn-Maloney, Ash Lewis |
Source: ArXiv AI | 23-10-25
Is AI a Bubble? I Didn't Think So Until I Heard of SDD (matsuoka.com)
Source: Hacker News | 23-10-25
VortexNet: Neural network based on fluid dynamics (github.com/samim23)
Source: Hacker News | 23-10-25
Deepseek's OCR system compresses image-based text so AI can handle much longer documents
Source: The Decoder | 22-10-25
A parody website mocks the hype and dangers of the current large language model boom
Source: The Decoder | 22-10-25
IBM brings Groq's ultra-fast AI inference to watsonx platform
Source: The Decoder | 22-10-25
OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
Authors: Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang |
Source: ArXiv AI | 22-10-25
Activation Manifold Projection: Liberating Task-Specific Behaviors from LLM Architectures
Authors: Al Kari |
Source: ArXiv AI | 22-10-25
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Authors: Haochen Wang, Yuhao Wang, Tao Zhang, Yikang Zhou, Yanwei Li, Jiacong Wang, Ye Tian, Jiahao Meng, Zilong Huang, Guangcan Mai, Anran Wang, Yunhai Tong, Zhuochen Wang, Xiangtai Li, Zhaoxiang Zhang |
Source: ArXiv AI | 22-10-25
How Do LLMs Use Their Depth?
Authors: Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova |
Source: ArXiv AI | 22-10-25
Learning from Generalization Patterns: An Evaluation-Driven Approach to Enhanced Data Augmentation for Fine-Tuning Small Language Models
Authors: Huan Song, Deeksha Razdan, Yiyue Qian, Arijit Ghosh Chowdhury, Parth Patwa, Aman Chadha, Shinan Zhang, Sharlina Keshava, Hannah Marlowe |
Source: ArXiv AI | 22-10-25
Measuring Reasoning in LLMs: a New Dialectical Angle
Authors: Soheil Abbasloo |
Source: ArXiv AI | 22-10-25
SMaRT: Select, Mix, and ReinvenT - A Strategy Fusion Framework for LLM-Driven Reasoning and Planning
Authors: Nikhil Verma, Manasa Bharadwaj, Wonjun Jang, Harmanpreet Singh, Yixiao Wang, Homa Fashandi, Chul Lee |
Source: ArXiv AI | 22-10-25
CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows
Authors: Joong Ho Choi, Jiayang Zhao, Jeel Shah, Ritvika Sonawane, Vedant Singh, Avani Appalla, Will Flanagan, Filipe Condessa |
Source: ArXiv AI | 22-10-25
FST.ai 2.0: An Explainable AI Ecosystem for Fair, Fast, and Inclusive Decision-Making in Olympic and Paralympic Taekwondo
Authors: Keivan Shariatmadar, Ahmad Osman, Ramin Ray, Usman Dildar, Kisam Kim |
Source: ArXiv AI | 22-10-25
LLM-Based Multi-Agent System for Simulating and Analyzing Marketing and Consumer Behavior
Authors: Man-Lin Chu, Lucian Terhorst, Kadin Reed, Tom Ni, Weiwei Chen, Rongyu Lin |
Source: ArXiv AI | 22-10-25
Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety
Authors: Antonio-Gabriel Chacón Menke, Phan Xuan Tan, Eiji Kamioka |
Source: ArXiv AI | 22-10-25
Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games
Authors: Runnan Qi, Yanan Ni, Lumin Jiang, Zongyuan Li, Kuihua Huang, Xian Guo |
Source: ArXiv AI | 22-10-25
Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming
Authors: Zheng Zhang, Jiarui He, Yuchen Cai, Deheng Ye, Peilin Zhao, Ruili Feng, Hao Wang |
Source: ArXiv AI | 22-10-25
Illusions of reflection: open-ended task reveals systematic failures in Large Language Models' reflective reasoning
Authors: Sion Weatherhead, Flora Salim, Aaron Belbasis |
Source: ArXiv AI | 22-10-25
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
Authors: Xiaohan Qin, Xiaoxing Wang, Ning Liao, Cancheng Zhang, Xiangdong Zhang, Mingquan Feng, Jingzhi Wang, Junchi Yan |
Source: ArXiv AI | 22-10-25
LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources
Authors: Haichao Ji, Zibo Wang, Yifei Zhu, Meng Han, Dan Wang, Zhu Han |
Source: ArXiv AI | 22-10-25
Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents
Authors: Feifan Xia, Yuyang Fang, Defang Li, Yantong Xie, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang |
Source: ArXiv AI | 22-10-25
CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
Authors: Shaobo Wang, Yongliang Miao, Yuancheng Liu, Qianli Ma, Ning Liao, Linfeng Zhang |
Source: ArXiv AI | 22-10-25
PlanU: Large Language Model Decision Making through Planning under Uncertainty
Authors: Ziwei Deng, Mian Deng, Chenjing Liang, Zeming Gao, Chennan Ma, Chenxing Lin, Haipeng Zhang, Songzhu Mei, Cheng Wang, Siqi Shen |
Source: ArXiv AI | 22-10-25
AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
Authors: Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao |
Source: ArXiv AI | 22-10-25
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents
Authors: Guangfu Guo, Xiaoqian Lu, Yue Feng |
Source: ArXiv AI | 22-10-25
Deep Learning-Based Control Optimization for Glass Bottle Forming
Authors: Mattia Pujatti, Andrea Di Luca, Nicola Peghini, Federico Monegaglia, Marco Cristoforetti |
Source: ArXiv AI | 22-10-25
Physics-guided Emulators Reveal Resilience and Fragility under Operational Latencies and Outages
Authors: Sarth Dubey, Subimal Ghosh, Udit Bhatia |
Source: ArXiv AI | 22-10-25
Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models
Authors: Hanze Guo, Jing Yao, Xiao Zhou, Xiaoyuan Yi, Xing Xie |
Source: ArXiv AI | 22-10-25
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents
Authors: Lianchen Jia, Chaoyang Li, Qian Houde, Tianchi Huang, Jiangchuan Liu, Lifeng Sun |
Source: ArXiv AI | 22-10-25
Query Decomposition for RAG: Balancing Exploration-Exploitation
Authors: Roxana Petcu, Kenton Murray, Daniel Khashabi, Evangelos Kanoulas, Maarten de Rijke, Dawn Lawrie, Kevin Duh |
Source: ArXiv AI | 22-10-25
Leveraging Association Rules for Better Predictions and Better Explanations
Authors: Gilles Audemard, Sylvie Coste-Marquis, Pierre Marquis, Mehdi Sabiri, Nicolas Szczepanski |
Source: ArXiv AI | 22-10-25
Neural audio codecs: how to get audio into LLMs (kyutai.org)
Source: Hacker News | 22-10-25
The Gypsy Life of Robert Louis Stevenson (hudsonreview.com)
Source: Hacker News | 22-10-25
LLMs can get "brain rot" (llm-brain-rot.github.io)
Source: Hacker News | 22-10-25
Getting DeepSeek-OCR working on an Nvidia Spark via brute force with Claude Code (simonwillison.net)
Source: Hacker News | 22-10-25
Wikipedia says traffic is falling due to AI search summaries and social video (techcrunch.com)
Source: Hacker News | 22-10-25
ChatGPT Atlas (chatgpt.com)
Source: Hacker News | 22-10-25
Anthropic CEO Dario Amodei backs President Trump on AI policy, pushes back on criticism
Source: The Decoder | 22-10-25
My trick for getting consistent classification from LLMs (verdik.substack.com)
Source: Hacker News | 21-10-25
BERT is just a single text diffusion step (nathan.rs)
Source: Hacker News | 21-10-25
Claude Code on the web (anthropic.com)
Source: Hacker News | 21-10-25
Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system (tomshardware.com)
Source: Hacker News | 21-10-25
Production RAG: what I learned from processing 5M+ documents (abdellatif.io)
Source: Hacker News | 21-10-25
Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence
Authors: Qiongyan Wang, Xingchen Zou, Yutian Jiang, Haomin Wen, Jiaheng Wei, Qingsong Wen, Yuxuan Liang |
Source: ArXiv AI | 21-10-25
NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP Problems
Authors: Xiaozhe Li, Xinyu Fang, Shengyuan Ding, Linyang Li, Haodong Duan, Qingwen Liu, Kai Chen |
Source: ArXiv AI | 21-10-25
Before you <think>, monitor: Implementing Flavell's metacognitive framework in LLMs
Authors: Nick Oh |
Source: ArXiv AI | 21-10-25
Foundation and Large-Scale AI Models in Neuroscience: A Comprehensive Review
Authors: Shihao Yang, Xiying Huang, Danilo Bernardo, Jun-En Ding, Andrew Michael, Jingmei Yang, Patrick Kwan, Ashish Raj, Feng Liu |
Source: ArXiv AI | 21-10-25
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
Authors: Xuan Zhang, Ruixiao Li, Zhijian Zhou, Long Li, Yulei Qin, Ke Li, Xing Sun, Xiaoyu Tan, Chao Qu, Yuan Qi |
Source: ArXiv AI | 21-10-25
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
Authors: Junchi Yu, Yujie Liu, Jindong Gu, Philip Torr, Dongzhan Zhou |
Source: ArXiv AI | 21-10-25
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
Authors: Tian Xia, Tianrun Gao, Wenhao Deng, Long Wei, Xiaowei Qian, Yixian Jiang, Chenglei Yu, Tailin Wu |
Source: ArXiv AI | 21-10-25
ELMM: Efficient Lightweight Multimodal Large Language Models for Multimodal Knowledge Graph Completion
Authors: Wei Huang, Peining Li, Meiyu Liang, Xu Hou, Junping Du, Yingxia Shao, Guanhua Ye, Wu Liu, Kangkang Lu, Yang Yu |
Source: ArXiv AI | 21-10-25
Surrogate Modeling and Explainable Artificial Intelligence for Complex Systems: A Workflow for Automated Simulation Exploration
Authors: Paul Saves, Pramudita Satria Palar, Muhammad Daffa Robani, Nicolas Verstaevel, Moncef Garouani, Julien Aligon, Benoit Gaudou, Koji Shimoyama, Joseph Morlier |
Source: ArXiv AI | 21-10-25
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
Authors: Ni Zhang, Zhiguang Cao, Jianan Zhou, Cong Zhang, Yew-Soon Ong |
Source: ArXiv AI | 21-10-25
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
Authors: Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du |
Source: ArXiv AI | 21-10-25
A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation
Authors: Rongbin Li, Wenbo Chen, Zhao Li, Rodrigo Munoz-Castaneda, Jinbo Li, Neha S. Maurya, Arnav Solanki, Huan He, Hanwen Xing, Meaghan Ramlakhan, Zachary Wise, Zhuhao Wu, Hua Xu, Michael Hawrylycz, W. Jim Zheng |
Source: ArXiv AI | 21-10-25
Which LLM Multi-Agent Protocol to Choose?
Authors: Hongyi Du, Jiaqi Su, Jisen Li, Lijie Ding, Yingxuan Yang, Peixuan Han, Xiangru Tang, Kunlun Zhu, Jiaxuan You |
Source: ArXiv AI | 21-10-25
Physics-Informed Large Language Models for HVAC Anomaly Detection with Autonomous Rule Generation
Authors: Subin Lin, Chuanbo Hua |
Source: ArXiv AI | 21-10-25
Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
Authors: Melik Ozolcer, Sang Won Bae |
Source: ArXiv AI | 21-10-25
Label Indeterminacy in AI & Law
Authors: Cor Steging, Tadeusz Zbiegień |
Source: ArXiv AI | 21-10-25
MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
Authors: Mir Nafis Sharear Shopnil, Sharad Duwal, Abhishek Tyagi, Adiba Mahbub Proma |
Source: ArXiv AI | 21-10-25
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
Authors: Qingchuan Yang, Simon Mahns, Sida Li, Anri Gu, Jibang Wu, Haifeng Xu |
Source: ArXiv AI | 21-10-25
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Authors: Dayan Pan, Zhaoyang Fu, Jingyuan Wang, Xiao Han, Yue Zhu, Xiangyu Zhao |
Source: ArXiv AI | 21-10-25
Leading OpenAI researcher announced a GPT-5 math breakthrough that never happened
Source: The Decoder | 21-10-25
Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction
Authors: Tian Guo, Emmanuel Hauptmann |
Source: ArXiv AI | 21-10-25
DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification
Authors: Tingyu Lin, Armin Dadras, Florian Kleber, Robert Sablatnig |
Source: ArXiv AI | 21-10-25
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
Authors: Gao Yang, Yuhang Liu, Siyu Miao, Xinyue Liang, Zhengyang Liu, Heyan Huang |
Source: ArXiv AI | 21-10-25
Enhanced Sentiment Interpretation via a Lexicon-Fuzzy-Transformer Framework
Authors: Shayan Rokhva, Mousa Alizadeh, Maryam Abdollahi Shamami |
Source: ArXiv AI | 21-10-25
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Authors: Pengkai Wang, Qi Zuo, Pengwei Liu, Zhijie Sang, Congkai Xie, Hongxia Yang |
Source: ArXiv AI | 21-10-25
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Authors: Hanrong Ye, Chao-Han Huck Yang, Arushi Goel, Wei Huang, Ligeng Zhu, Yuanhang Su, Sean Lin, An-Chieh Cheng, Zhen Wan, Jinchuan Tian, Yuming Lou, Dong Yang, Zhijian Liu, Yukang Chen, Ambrish Dantrey, Ehsan Jahangiri, Sreyan Ghosh, Daguang Xu, Ehsan Hosseini-Asl, Danial Mohseni Taheri, Vidya Murali, Sifei Liu, Jason Lu, Oluwatobi Olabiyi, Frank Wang, Rafael Valle, Bryan Catanzaro, Andrew Tao, Song Han, Jan Kautz, Hongxu Yin, Pavlo Molchanov |
Source: ArXiv AI | 21-10-25
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data
Authors: Alana Renda, Jillian Ross, Michael Cafarella, Jacob Andreas |
Source: ArXiv AI | 21-10-25
HugAgent: Evaluating LLMs in Simulating Human-Like Individual Reasoning on Open-Ended Tasks
Authors: Chance Jiajie Li, Zhenze Mo, Yuhan Tang, Ao Qu, Jiayi Wu, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Hang Jiang, Paul Pu Liang, Jinhua Zhao, Luis Alberto Alonso Pastor, Kent Larson |
Source: ArXiv AI | 21-10-25
Experience-Driven Exploration for Efficient API-Free AI Agents
Authors: Chenwei Tang, Jingyu Xing, Xinyu Liu, Zizhou Wang, Jiawei Du, Liangli Zhen, Jiancheng Lv |
Source: ArXiv AI | 21-10-25
Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions
Authors: Xi Wang, Xianyao Ling, Kun Li, Gang Yin, Liang Zhang, Jiang Wu, Jun Xu, Fu Zhang, Wenbo Lei, Annie Wang, Peng Gong |
Source: ArXiv AI | 21-10-25
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
Authors: Tingqiao Xu, Ziru Zeng, Jiayu Chen |
Source: ArXiv AI | 21-10-25
WebGen-V Bench: Structured Representation for Enhancing Visual Design in LLM-based Web Generation and Evaluation
Authors: Kuang-Da Wang, Zhao Wang, Yotaro Shimose, Wei-Yao Wang, Shingo Takamatsu |
Source: ArXiv AI | 21-10-25
AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
Authors: Jitesh Jain, Shubham Maheshwari, Ning Yu, Wen-mei Hwu, Humphrey Shi |
Source: ArXiv AI | 21-10-25
Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning
Authors: Boyin Liu, Zhuo Zhang, Sen Huang, Lipeng Xie, Qingxu Fu, Haoran Chen, LI YU, Tianyi Hu, Zhaoyang Liu, Bolin Ding, Dongbin Zhao |
Source: ArXiv AI | 21-10-25
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
Authors: Huining Yuan, Zelai Xu, Zheyue Tan, Xiangmin Yi, Mo Guang, Kaiwen Long, Haojia Hui, Boxun Li, Xinlei Chen, Bo Zhao, Xiao-Ping Zhang, Chao Yu, Yu Wang |
Source: ArXiv AI | 21-10-25
Context-aware deep learning using individualized prior information reduces false positives in disease risk prediction and longitudinal health assessment
Authors: Lavanya Umapathy, Patricia M Johnson, Tarun Dutt, Angela Tong, Madhur Nayan, Hersh Chandarana, Daniel K Sodickson |
Source: ArXiv AI | 21-10-25
Preliminary Quantitative Study on Explainability and Trust in AI Systems
Authors: Allen Daniel Sunny |
Source: ArXiv AI | 21-10-25
Demo: Guide-RAG: Evidence-Driven Corpus Curation for Retrieval-Augmented Generation in Long COVID
Authors: Philip DiGiacomo, Haoyang Wang, Jinrui Fang, Yan Leng, W Michael Brode, Ying Ding |
Source: ArXiv AI | 21-10-25
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold
Authors: Yi Wan, Jiuqi Wang, Liam Li, Jinsong Liu, Ruihao Zhu, Zheqing Zhu |
Source: ArXiv AI | 21-10-25
OpenAI needs new scaling laws for both its AI models and its revenue
Source: The Decoder | 20-10-25
Google brings live Google Maps data to its Gemini models
Source: The Decoder | 20-10-25
AI researcher Andrej Karpathy says agentic AI is years away from matching industry hype
Source: The Decoder | 20-10-25
Don't Force Your LLM to Write Terse [Q/Kdb] Code: An Information Theory Argument (medium.com/gabiteodoru)
Source: Hacker News | 20-10-25
The case for the return of fine-tuning (welovesota.com)
Source: Hacker News | 20-10-25
Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG (github.com/pringled)
Source: Hacker News | 20-10-25
Uber will offer gig work like AI data labeling to drivers while not on the road (cnbc.com)
Source: Hacker News | 19-10-25
Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment
Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Giovanni Franco Gabriel Marraffini, Mario Alejandro Leiva, Gerardo I. Simari, María Vanina Martinez |
Source: ArXiv AI | 19-10-25
Position: Require Frontier AI Labs To Release Small "Analog" Models
Authors: Shriyash Upadhyay, Chaithanya Bandi, Narmeen Oozeer, Philip Quirke |
Source: ArXiv AI | 19-10-25
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
Authors: Edoardo Allegrini, Ananth Shreekumar, Z. Berkay Celik |
Source: ArXiv AI | 19-10-25
Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
Authors: Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu |
Source: ArXiv AI | 19-10-25
Implementation of AI in Precision Medicine
Authors: Göktuğ Bender, Samer Faraj, Anand Bhardwaj |
Source: ArXiv AI | 19-10-25
Towards Agentic Self-Learning LLMs in Search Environment
Authors: Wangtao Sun, Xiang Cheng, Jialin Fan, Yao Xu, Xing Yu, Shizhu He, Jun Zhao, Kang Liu |
Source: ArXiv AI | 19-10-25
AI for Service: Proactive Assistance with AI Glasses
Authors: Zichen Wen, Yiyu Wang, Chenfei Liao, Boxue Yang, Junxian Li, Weifeng Liu, Haocong He, Bolong Feng, Xuyang Liu, Yuanhuiyi Lyu, Xu Zheng, Xuming Hu, Linfeng Zhang |
Source: ArXiv AI | 19-10-25
Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
Authors: Yijie Hu, Zihao Zhou, Kaizhu Huang, Xiaowei Huang, Qiufeng Wang |
Source: ArXiv AI | 19-10-25
Beyond Hallucinations: The Illusion of Understanding in Large Language Models
Authors: Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee |
Source: ArXiv AI | 19-10-25
LLM Agents Beyond Utility: An Open-Ended Perspective
Authors: Asen Nachkov, Xi Wang, Luc Van Gool |
Source: ArXiv AI | 19-10-25
Machine Learning and Public Health: Identifying and Mitigating Algorithmic Bias through a Systematic Review
Authors: Sara Altamirano, Arjan Vreeken, Sennay Ghebreab |
Source: ArXiv AI | 19-10-25
Cognitive-Aligned Spatio-Temporal Large Language Models For Next Point-of-Interest Prediction
Authors: Penglong Zhai, Jie Li, Fanyi Di, Yue Liu, Yifang Yuan, Jie Huang, Peng Wu, Sicong Wang, Mingyang Yin, Tingting Hu, Yao Xu, Xin Li |
Source: ArXiv AI | 19-10-25
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Authors: Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li |
Source: ArXiv AI | 19-10-25
Where to Search: Measure the Prior-Structured Search Space of LLM Agents
Authors: Zhuo-Yang Song |
Source: ArXiv AI | 19-10-25
Stable but Miscalibrated: A Kantian View on Overconfidence from Filters to Large Language Models
Authors: Akira Okutomi |
Source: ArXiv AI | 19-10-25
OpenAI builds 'AI for Science' team to advance computational discovery
Source: The Decoder | 19-10-25
Most users cannot identify AI bias, even in training data (psu.edu)
Source: Hacker News | 19-10-25
Verbalized Sampling is a simple prompt technique meant to make AI responses less boring
Source: The Decoder | 19-10-25
SwiReasoning helps large language models switch reasoning modes to boost efficiency and accuracy
Source: The Decoder | 19-10-25
Asking AI to build scrapers should be easy right? (skyvern.com)
Source: Hacker News | 18-10-25
Claude Skills are awesome, maybe a bigger deal than MCP (simonwillison.net)
Source: Hacker News | 18-10-25
Anthropic launches "Skills" so Claude can automatically pick prompts for specialized tasks
Source: The Decoder | 18-10-25
Trump's AI advisor accuses Anthropic of "regulatory capture"
Source: The Decoder | 18-10-25
Claude Code vs. Codex: I built a sentiment dashboard from Reddit comments (aiengineering.report)
Source: Hacker News | 18-10-25
Gemini 3.0 spotted in the wild through A/B testing (ricklamers.io)
Source: Hacker News | 17-10-25
Codebase is 250% AI generated (moderndescartes.com)
Source: Hacker News | 17-10-25
Claude Skills (anthropic.com)
Source: Hacker News | 17-10-25
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 (exolabs.net)
Source: Hacker News | 17-10-25
OpenAI plans to let ChatGPT respond more like a real person, with options for erotic conversations
Source: The Decoder | 16-10-25
Japan warns OpenAI over Sora 2 after AI-generated anime videos spark copyright concerns
Source: The Decoder | 16-10-25
K-Merge: Online Continual Merging of Adapters for On-device Large Language Models
Authors: Donald Shenaj, Ondrej Bohdal, Taha Ceritli, Mete Ozay, Pietro Zanuttigh, Umberto Michieli |
Source: ArXiv AI | 16-10-25
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
Authors: Pasin Buakhaw, Kun Kerdthaisong, Phuree Phenhiran, Pitikorn Khlaisamniang, Supasate Vorathammathorn, Piyalitt Ittichaiwong, Nutchanon Yongsatianchot |
Source: ArXiv AI | 16-10-25
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Authors: Avihay Cohen |
Source: ArXiv AI | 16-10-25
Subject Roles in the EU AI Act: Mapping and Regulatory Implications
Authors: Nicola Fabiano |
Source: ArXiv AI | 16-10-25
Unlocking Public Catalogues: Instruction-Tuning LLMs for ICD Coding of German Tumor Diagnoses
Authors: Stefan Lenz, Lakisha Ortiz Rosario, Georg Vollmar, Arsenij Ustjanzew, Fatma Alickovic, Thomas Kindler, Torsten Panholzer |
Source: ArXiv AI | 16-10-25
Axial Neural Networks for Dimension-Free Foundation Models
Authors: Hyunsu Kim, Jonggeon Park, Joan Bruna, Hongseok Yang, Juho Lee |
Source: ArXiv AI | 16-10-25
Closing the Gap Between Text and Speech Understanding in LLMs
Authors: Santiago Cuervo, Skyler Seto, Maureen de Seyssel, Richard He Bai, Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly, Zakaria Aldeneh |
Source: ArXiv AI | 16-10-25
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
Authors: Johan Obando-Ceron, Walter Mayor, Samuel Lavoie, Scott Fujimoto, Aaron Courville, Pablo Samuel Castro |
Source: ArXiv AI | 16-10-25
FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access
Authors: Aditya Tanikanti, Benoit Côté, Yanfei Guo, Le Chen, Nickolaus Saint, Ryan Chard, Ken Raffenetti, Rajeev Thakur, Thomas Uram, Ian Foster, Michael E. Papka, Venkatram Vishwanath |
Source: ArXiv AI | 16-10-25
Scaling Vision Transformers for Functional MRI with Flat Maps
Authors: Connor Lane, Daniel Z. Kaplan, Tanishq Mathew Abraham, Paul S. Scotti |
Source: ArXiv AI | 16-10-25
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Authors: Yi Zhang, Bolin Ni, Xin-Sheng Chen, Heng-Rui Zhang, Yongming Rao, Houwen Peng, Qinglin Lu, Han Hu, Meng-Hao Guo, Shi-Min Hu |
Source: ArXiv AI | 16-10-25
The Art of Scaling Reinforcement Learning Compute for LLMs
Authors: Devvrit Khatri, Lovish Madaan, Rishabh Tiwari, Rachit Bansal, Sai Surya Duvvuri, Manzil Zaheer, Inderjit S. Dhillon, David Brandfonbrener, Rishabh Agarwal |
Source: ArXiv AI | 16-10-25
From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models
Authors: Imran Khan |
Source: ArXiv AI | 16-10-25
From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
Authors: Boyou Chen, Gerui Xu, Zifei Wang, Huizhong Guo, Ananna Ahmed, Zhaonan Sun, Zhen Hu, Kaihan Zhang, Shan Bao |
Source: ArXiv AI | 16-10-25
SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents
Authors: Simon Sinong Zhan, Yao Liu, Philip Wang, Zinan Wang, Qineng Wang, Zhian Ruan, Xiangyu Shi, Xinyu Cao, Frank Yang, Kangrui Wang, Huajie Shao, Manling Li, Qi Zhu |
Source: ArXiv AI | 16-10-25
Emotional Cognitive Modeling Framework with Desire-Driven Objective Optimization for LLM-empowered Agent in Social Simulation
Authors: Qun Ma, Xiao Xue, Xuwen Zhang, Zihan Zhao, Yuwei Guo, Ming Zhang |
Source: ArXiv AI | 16-10-25
Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse
Authors: Liesbeth Allein, Nataly Pineda-Castañeda, Andrea Rocci, Marie-Francine Moens |
Source: ArXiv AI | 16-10-25
A Methodology for Assessing the Risk of Metric Failure in LLMs Within the Financial Domain
Authors: William Flanagan, Mukunda Das, Rajitha Ramanyake, Swaunja Maslekar, Meghana Manipuri, Joong Ho Choi, Shruti Nair, Shambhavi Bhusan, Sanjana Dulam, Mouni Pendharkar, Nidhi Singh, Vashisth Doshi, Sachi Shah Paresh |
Source: ArXiv AI | 16-10-25
Confidence as a Reward: Transforming LLMs into Reward Models
Authors: He Du, Bowen Li, Chengxing Xie, Chang Gao, Kai Chen, Dacheng Tao |
Source: ArXiv AI | 16-10-25
Mobile Coverage Analysis using Crowdsourced Data
Authors: Timothy Wong, Tom Freeman, Joseph Feehily |
Source: ArXiv AI | 16-10-25
Training LLM Agents to Empower Humans
Authors: Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, Benjamin Eysenbach |
Source: ArXiv AI | 16-10-25
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Authors: Ravi Pandya, Madison Bland, Duy P. Nguyen, Changliu Liu, Jaime Fernández Fisac, Andrea Bajcsy |
Source: ArXiv AI | 16-10-25
Claude Haiku 4.5 (anthropic.com)
Source: Hacker News | 16-10-25
California passes first U.S. law regulating AI companion chatbots
Source: The Decoder | 16-10-25
Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)
Source: Hacker News | 16-10-25
Anthropic's Jack Clark compares AI breakthroughs to hammers that suddenly become self-aware
Source: The Decoder | 15-10-25
How AI hears accents: An audible visualization of accent clusters (boldvoice.com)
Source: Hacker News | 15-10-25
Hacking the Humane AI Pin (agg.im)
Source: Hacker News | 15-10-25
Helpcare AI (YC F24) Is Hiring
Source: Hacker News | 15-10-25
Apple unleashes M5, the next big leap in AI performance for Apple Silicon (apple.com)
Source: Hacker News | 15-10-25
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
Authors: Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, Daniel Kang, Dawn Song, Peter Henderson, Yu Su, Percy Liang, Arvind Narayanan |
Source: ArXiv AI | 15-10-25
Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations
Authors: Suryaansh Jain, Umair Z. Ahmed, Shubham Sahai, Ben Leong |
Source: ArXiv AI | 15-10-25
Do Large Language Models Respect Contracts? Evaluating and Enforcing Contract-Adherence in Code Generation
Authors: Soohan Lim, Joonghyuk Hahn, Hyunwoo Park, Sang-Ki Ko, Yo-Sub Han |
Source: ArXiv AI | 15-10-25
Asking Clarifying Questions for Preference Elicitation With Large Language Models
Authors: Ali Montazeralghaem, Guy Tennenholtz, Craig Boutilier, Ofer Meshi |
Source: ArXiv AI | 15-10-25
Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response
Authors: Yiheng Chen, Lingyao Li, Zihui Ma, Qikai Hu, Yilun Zhu, Min Deng, Runlong Yu |
Source: ArXiv AI | 15-10-25
ToPolyAgent: AI Agents for Coarse-Grained Topological Polymer Simulations
Authors: Lijie Ding, Jan-Michael Carrillo, Changwoo Do |
Source: ArXiv AI | 15-10-25
Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
Authors: Rabimba Karanjai, Yang Lu, Ranjith Chodavarapu, Lei Xu, Weidong Shi |
Source: ArXiv AI | 15-10-25
Evolution of Meta's LLaMA Models and Parameter-Efficient Fine-Tuning of Large Language Models: A Survey
Authors: Abdulhady Abas Abdullah, Arkaitz Zubiaga, Seyedali Mirjalili, Amir H. Gandomi, Fatemeh Daneshfar, Mohammadsadra Amini, Alan Salam Mohammed, Hadi Veisi |
Source: ArXiv AI | 15-10-25
MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science
Authors: Junkai Zhang, Jingru Gan, Xiaoxuan Wang, Zian Jia, Changquan Gu, Jianpeng Chen, Yanqiao Zhu, Mingyu Derek Ma, Dawei Zhou, Ling Li, Wei Wang |
Source: ArXiv AI | 15-10-25
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing
Authors: Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang |
Source: ArXiv AI | 15-10-25
PromptFlow: Training Prompts Like Neural Networks
Authors: Jingyi Wang, Hongyuan Zhu, Ye Niu, Yunhui Deng |
Source: ArXiv AI | 15-10-25
MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs
Authors: Yuechun Yu, Han Ying, Haoan Jin, Wenjian Jiang, Dong Xian, Binghao Wang, Zhou Yang, Mengyue Wu |
Source: ArXiv AI | 15-10-25
On the Design and Evaluation of Human-centered Explainable AI Systems: A Systematic Review and Taxonomy
Authors: Aline Mangold, Juliane Zietz, Susanne Weinhold, Sebastian Pannasch |
Source: ArXiv AI | 15-10-25
O-Forge: An LLM + Computer Algebra Framework for Asymptotic Analysis
Authors: Ayush Khaitan, Vijay Ganesh |
Source: ArXiv AI | 15-10-25
RAG-Anything: All-in-One RAG Framework
Authors: Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, Chao Huang |
Source: ArXiv AI | 15-10-25
MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Authors: Dingyi Zuo, Hongjie Zhang, Jie Ou, Chaosheng Feng, Shuwan Liu |
Source: ArXiv AI | 15-10-25
PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Authors: Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen |
Source: ArXiv AI | 15-10-25
A Survey of Vibe Coding with Large Language Models
Authors: Yuyao Ge, Lingrui Mei, Zenghao Duan, Tianhao Li, Yujia Zheng, Yiwei Wang, Lexin Wang, Jiayu Yao, Tianyu Liu, Yujun Cai, Baolong Bi, Fangda Guo, Jiafeng Guo, Shenghua Liu, Xueqi Cheng |
Source: ArXiv AI | 15-10-25
Artificial Intelligence Virtual Cells: From Measurements to Decisions across Modality, Scale, Dynamics, and Evaluation
Authors: Chengpeng Hu, Calvin Yu-Chian Chen |
Source: ArXiv AI | 15-10-25
Using Medical Algorithms for Task-Oriented Dialogue in LLM-Based Medical Interviews
Authors: Rui Reis, Pedro Rangel Henriques, João Ferreira-Coimbra, Eva Oliveira, Nuno F. Rodrigues |
Source: ArXiv AI | 15-10-25
Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
Authors: Jiaxin Gao, Chen Chen, Yanwen Jia, Xueluan Gong, Kwok-Yan Lam, Qian Wang |
Source: ArXiv AI | 15-10-25
CAMNet: Leveraging Cooperative Awareness Messages for Vehicle Trajectory Prediction
Authors: Mattia Grasselli, Angelo Porrello, Carlo Augusto Grazia |
Source: ArXiv AI | 15-10-25
Multi-Agent Debate for LLM Judges with Adaptive Stability Detection
Authors: Tianyu Hu, Zhen Tan, Song Wang, Huaizhi Qu, Tianlong Chen |
Source: ArXiv AI | 15-10-25
Towards Robust Artificial Intelligence: Self-Supervised Learning Approach for Out-of-Distribution Detection
Authors: Wissam Salhab, Darine Ameyed, Hamid Mcheick, Fehmi Jaafar |
Source: ArXiv AI | 15-10-25
Show HN: Wispbit - Linter for AI coding agents (wispbit.com)
Source: Hacker News | 15-10-25
Preparing for AI's economic impact: exploring policy responses (anthropic.com)
Source: Hacker News | 15-10-25
Chinese researchers let LLMs share meaning through internal memory instead of text
Source: The Decoder | 14-10-25
LLM-Friendly Knowledge Representation for Customer Support
Authors: Hanchen Su, Wei Luo, Wei Han, Yu Elaine Liu, Yufeng Wayne Zhang, Cen Mia Zhao, Ying Joy Zhang, Yashar Mehdad |
Source: ArXiv AI | 14-10-25
EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms
Authors: WenTao Liu, Siyu Song, Hao Hao, Aimin Zhou |
Source: ArXiv AI | 14-10-25
A Layered Intuition -- Method Model with Scope Extension for LLM Reasoning
Authors: Hong Su |
Source: ArXiv AI | 14-10-25
ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding
Authors: Xinbang Dai, Huikang Hu, Yongrui Chen, Jiaqi Li, Rihui Jin, Yuyang Zhang, Xiaoguang Li, Lifeng Shang, Guilin Qi |
Source: ArXiv AI | 14-10-25
Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems
Authors: Yi Zhang, Yushen Long, Yun Ni, Liping Huang, Xiaohong Wang, Jun Liu |
Source: ArXiv AI | 14-10-25
Equity-Aware Geospatial AI for Forecasting Demand-Driven Hospital Locations in Germany
Authors: Piyush Pant, Marcellius William Suntoro, Ayesha Siddiqua, Muhammad Shehryaar Sharif, Daniyal Ahmed |
Source: ArXiv AI | 14-10-25
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Authors: Caorui Li, Yu Chen, Yiyan Ji, Jin Xu, Zhenyu Cui, Shihao Li, Yuanxing Zhang, Jiafu Tang, Zhenghao Song, Dingling Zhang, Ying He, Haoxiang Liu, Yuxuan Wang, Qiufeng Wang, Zhenhe Wu, Jiehui Luo, Zhiyu Pan, Weihao Xie, Chenchen Zhang, Zhaohui Wang, Jiayi Tian, Yanghai Wang, Zhe Cao, Minxin Dai, Ke Wang, Runzhe Wen, Yinghao Ma, Yaning Pan, Sungkyun Chang, Termeh Taheri, Haiwen Xia, Christos Plachouras, Emmanouil Benetos, Yizhi Li, Ge Zhang, Jian Yang, Tianhao Peng, Zili Wang, Minghao Liu, Junran Peng, Zhaoxiang Zhang, Jiaheng Liu |
Source: ArXiv AI | 14-10-25
Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows
Authors: Deven Panchal |
Source: ArXiv AI | 14-10-25
LLMs as Strategic Agents: Beliefs, Best Response Behavior, and Emergent Heuristics
Authors: Enric Junque de Fortuny, Veronica Roberta Cappelli |
Source: ArXiv AI | 14-10-25
Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning
Authors: Xiangyu Wang, Haocheng Yang, Fengxiang Cheng, Fenrong Liu |
Source: ArXiv AI | 14-10-25
LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach
Authors: Renxuan Tan, Rongpeng Li, Fei Wang, Chenghui Peng, Shaoyun Wu, Zhifeng Zhao, Honggang Zhang |
Source: ArXiv AI | 14-10-25
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
Authors: Wentao Wang, Heqing Zou, Tianze Luo, Rui Huang, Yutian Zhao, Zhuochen Wang, Hansheng Zhang, Chengwei Qin, Yan Wang, Lin Zhao, Huaijian Zhang |
Source: ArXiv AI | 14-10-25
Improving AI Efficiency in Data Centres by Power Dynamic Response
Authors: Andrea Marinoni, Sai Shivareddy, Pietro Liò, Weisi Lin, Erik Cambria, Clare Grey |
Source: ArXiv AI | 14-10-25
Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis
Authors: Chuke Chen, Biao Luo, Nan Li, Boxiang Wang, Hang Yang, Jing Guo, Ming Xu |
Source: ArXiv AI | 14-10-25
Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs
Authors: Le Ngoc Luyen, Marie-Hélène Abel |
Source: ArXiv AI | 14-10-25
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Authors: Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu |
Source: ArXiv AI | 14-10-25
Analyzing and Internalizing Complex Policy Documents for LLM Agents
Authors: Jiateng Liu, Zhenhailong Wang, Xiaojiang Huang, Yingjie Li, Xing Fan, Xiang Li, Chenlei Guo, Ruhi Sarikaya, Heng Ji |
Source: ArXiv AI | 14-10-25
Zero Data Retention in LLM-based Enterprise AI Assistants: A Comparative Study of Market Leading Agentic AI Products
Authors: Komal Gupta, Aditya Shrivastava |
Source: ArXiv AI | 14-10-25
Reproducibility: The New Frontier in AI Governance
Authors: Israel Mason-Williams, Gabryel Mason-Williams |
Source: ArXiv AI | 14-10-25
Operand Quant: A Single-Agent Architecture for Autonomous Machine Learning Engineering
Authors: Arjun Sahney, Ram Gorthi, Cezary Łastowski, Javier Vega |
Source: ArXiv AI | 14-10-25
America's future could hinge on whether AI slightly disappoints (noahpinion.blog)
Source: Hacker News | 14-10-25
LLMs are getting better at character-level text manipulation (burkert.me)
Source: Hacker News | 14-10-25
America is getting an AI gold rush instead of a factory boom (washingtonpost.com)
Source: Hacker News | 14-10-25
Palisades Fire suspect's ChatGPT history to be used as evidence (rollingstone.com)
Source: Hacker News | 14-10-25
NanoChat – The best ChatGPT that $100 can buy (github.com/karpathy)
Source: Hacker News | 14-10-25
Show HN: AI toy I worked on is in stores (walmart.com)
Source: Hacker News | 14-10-25
OpenAI accused of pressuring AI regulation advocates with subpoenas
Source: The Decoder | 13-10-25
Anthropic finds 250 poisoned documents are enough to backdoor large language models
Source: The Decoder | 13-10-25
A new information-theory framework reveals when multi-agent AI systems truly work as a team
Source: The Decoder | 13-10-25
Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction
Authors: Iftekhar Ahmed, Tanzil Ebad Chowdhury, Biggo Bushon Routh, Nafisa Tasmiya, Shadman Sakib, Adil Ahmed Chowdhury |
Source: ArXiv AI | 13-10-25
Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
Authors: Sondos Mahmoud Bsharat, Zhiqiang Shen |
Source: ArXiv AI | 13-10-25
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Authors: Xiao Yu, Baolin Peng, Michel Galley, Hao Cheng, Qianhui Wu, Janardhan Kulkarni, Suman Nath, Zhou Yu, Jianfeng Gao |
Source: ArXiv AI | 13-10-25
Robust Heuristic Algorithm Design with LLMs
Authors: Pantea Karimi, Dany Rouhana, Pooria Namyar, Siva Kesava Reddy Kakarla, Venkat Arun, Behnaz Arzani |
Source: ArXiv AI | 13-10-25
RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
Authors: Ashish Kattamuri, Harshwardhan Fartale, Arpita Vats, Rahul Raja, Ishita Prasad |
Source: ArXiv AI | 13-10-25
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Authors: Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You |
Source: ArXiv AI | 13-10-25
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
Authors: Gaurav Sahu, Hugo Larochelle, Laurent Charlin, Christopher Pal |
Source: ArXiv AI | 13-10-25
Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion
Authors: Ruitong Liu, Yan Wen, Te Sun, Yunjia Wu, Pingyang Huang, Zihang Yu, Siyuan Li |
Source: ArXiv AI | 13-10-25
EcphoryRAG: Re-Imagining Knowledge-Graph RAG via Human Associative Memory
Authors: Zirui Liao |
Source: ArXiv AI | 13-10-25
FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation
Authors: Samuel Hildebrand (1), Curtis Taylor (2), Sean Oesch (2), James M Ghawaly Jr (1), Amir Sadovnik (2), Ryan Shivers (2), Brandon Schreiber (2), Kevin Kurian (3) ((1) Louisiana State University, (2) Oak Ridge National Lab, (3) University of Florida) |
Source: ArXiv AI | 13-10-25
Humanoid Artificial Consciousness Designed with Large Language Model Based on Psychoanalysis and Personality Theory
Authors: Sang Hun Kim, Jongmin Lee, Dongkyu Park, So Young Lee, Yosep Chong |
Source: ArXiv AI | 13-10-25
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Authors: Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han |
Source: ArXiv AI | 13-10-25
Localist LLMs -- A Mathematical Framework for Dynamic Locality Control
Authors: Joachim Diederich |
Source: ArXiv AI | 13-10-25
Fundamentals of Building Autonomous LLM Agents
Authors: Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll |
Source: ArXiv AI | 13-10-25
Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse
Authors: Jacopo Tagliabue, Ciro Greco |
Source: ArXiv AI | 13-10-25
LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Authors: Kaijian Zou, Aaron Xiong, Yunxiang Zhang, Frederick Zhang, Yueqi Ren, Jirong Yang, Ayoung Lee, Shitanshu Bhushan, Lu Wang |
Source: ArXiv AI | 13-10-25
AdapTive-LeArning Speculator System (ATLAS): Faster LLM inference (together.ai)
Source: Hacker News | 13-10-25
Edge AI for Beginners (github.com/microsoft)
Source: Hacker News | 13-10-25
Microsoft only lets you opt out of AI photo scanning 3x a year (slashdot.org)
Source: Hacker News | 12-10-25
Anthropic's Prompt Engineering Tutorial (github.com/anthropics)
Source: Hacker News | 12-10-25
oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning
Authors: Ruiling Xu, Yifan Zhang, Qingyun Wang, Carl Edwards, Heng Ji |
Source: ArXiv AI | 12-10-25
An LLM-Powered Cooperative Framework for Large-Scale Multi-Vehicle Navigation
Authors: Yuping Zhou, Siqi Lai, Jindong Han, Hao Liu |
Source: ArXiv AI | 12-10-25
An approach for systematic decomposition of complex LLM tasks
Authors: Tianle Zhou, Jiakai Xu, Guanhong Liu, Jiaxiang Liu, Haonan Wang, Eugene Wu |
Source: ArXiv AI | 12-10-25
From Noisy to Native: LLM-driven Graph Restoration for Test-Time Graph Domain Adaptation
Authors: Xiangwei Lv, JinLuan Yang, Wang Lin, Jingyuan Chen, Beishui Liao |
Source: ArXiv AI | 12-10-25
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains
Authors: Yilun Zhang, Dexing Kong |
Source: ArXiv AI | 12-10-25
SurveyG: A Multi-Agent LLM Framework with Hierarchical Citation Graph for Automated Survey Generation
Authors: Minh-Anh Nguye, Minh-Duc Nguyen, Nguyen Thi Ha Lan, Kieu Hai Dang, Nguyen Tien Dong, Le Duy Dung |
Source: ArXiv AI | 12-10-25
Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles
Authors: Rebecca Westhäußer, Wolfgang Minker, Sebatian Zepf |
Source: ArXiv AI | 12-10-25
Profit Mirage: Revisiting Information Leakage in LLM-based Financial Agents
Authors: Xiangyu Li, Yawen Zeng, Xiaofen Xing, Jin Xu, Xiangmin Xu |
Source: ArXiv AI | 12-10-25
Towards Meaningful Transparency in Civic AI Systems
Authors: Dave Murray-Rust, Kars Alfrink, Cristina Zaga |
Source: ArXiv AI | 12-10-25
Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models
Authors: Zhiqing Cui, Binwu Wang, Qingxiang Liu, Yeqiang Wang, Zhengyang Zhou, Yuxuan Liang, Yang Wang |
Source: ArXiv AI | 12-10-25
From Ethical Declarations to Provable Independence: An Ontology-Driven Optimal-Transport Framework for Certifiably Fair AI Systems
Authors: Sukriti Bhattacharya, Chitro Majumdar |
Source: ArXiv AI | 12-10-25
AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment
Authors: Xiaochong Lan, Jie Feng, Yinxing Liu, Xinlei Shi, Yong Li |
Source: ArXiv AI | 12-10-25
LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation via Natural Language Instruction Based on Large Language Models
Authors: Qingyuan Shi, Qingwen Meng, Hao Cheng, Qing Xu, Jianqiang Wang |
Source: ArXiv AI | 12-10-25
AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models
Authors: Xiaoshuang Ji, Zhendong Zhao, Xiaoyan Gu, Xiaojun Chen, Xin Zhao, Zeyao Liu |
Source: ArXiv AI | 12-10-25
Measuring What Matters: The AI Pluralism Index
Authors: Rashid Mushkani |
Source: ArXiv AI | 12-10-25
LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings
Authors: Benjamin F. Maier, Ulf Aslak, Luca Fiaschi, Nina Rismal, Kemble Fletcher, Christian C. Luhmann, Robbie Dow, Kli Pappas, Thomas V. Wiecki |
Source: ArXiv AI | 12-10-25
CaRT: Teaching LLM Agents to Know When They Know Enough
Authors: Grace Liu, Yuxiao Qu, Jeff Schneider, Aarti Singh, Aviral Kumar |
Source: ArXiv AI | 12-10-25
Figure AI claims its Figure 03 robot can wash dishes, clean floors, and handle chores
Source: The Decoder | 12-10-25
Tiny AI model outperforms o3‑mini and Gemini 2.5 Pro in ARC‑AGI benchmark
Source: The Decoder | 12-10-25
All-New Next Gen of UniFi Storage (ui.com)
Source: Hacker News | 12-10-25
Anthropic's marketing department opens "Zero Slop Zone" in New York
Source: The Decoder | 11-10-25
All-natural geoengineering with Frank Herbert's Dune (governance.fyi)
Source: Hacker News | 11-10-25
Google DeepMind's "Vibe Checker" aims to rate AI code by human standards
Source: The Decoder | 11-10-25
European Commission launches "Apply AI" and "AI in Science" strategies to boost AI adoption
Source: The Decoder | 10-10-25
LLMs are mortally terrified of exceptions (twitter.com/karpathy)
Source: Hacker News | 10-10-25
A beginner's guide to deploying LLMs with AMD on Windows using PyTorch (gpuopen.com)
Source: Hacker News | 10-10-25
Weave (YC W25) is hiring a founding AI engineer (ycombinator.com)
Source: Hacker News | 10-10-25
A small number of samples can poison LLMs of any size (anthropic.com)
Source: Hacker News | 10-10-25
TOUCAN is the largest open training dataset for AI agents
Source: The Decoder | 09-10-25
OpenAI has now signed $1 trillion worth of AI infrastructure contracts
Source: The Decoder | 09-10-25
Anthropic launches Petri, an open-source tool for automated AI model safety audits
Source: The Decoder | 09-10-25
The Forecasting Company (YC S24) Is Hiring a Machine Learning Engineer (ycombinator.com)
Source: Hacker News | 09-10-25
Two things LLM coding agents are still bad at (kix.dev)
Source: Hacker News | 09-10-25
A History of Large Language Models (gregorygundersen.com)
Source: Hacker News | 09-10-25
McKinsey wonders how to sell AI apps with no measurable benefits (theregister.com)
Source: Hacker News | 09-10-25
Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships
Authors: Donggyu Lee, Sungwon Park, Yerin Hwang, Hyunwoo Oh, Hyoshin Kim, Jungwon Kim, Meeyoung Cha, Sangyoon Park, Jihee Kim |
Source: ArXiv AI | 09-10-25
Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models
Authors: Chengzhi Zhong, Fei Cheng, Qianying Liu, Yugo Murawaki, Chenhui Chu, Sadao Kurohashi |
Source: ArXiv AI | 09-10-25
On the false election between regulation and innovation. Ideas for regulation through the responsible use of artificial intelligence in research and education. [Spanish version]
Authors: Pompeu Casanovas (IIIA-CSIC) |
Source: ArXiv AI | 09-10-25
LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation
Authors: Joseph Enguehard, Morgane Van Ermengem, Kate Atkinson, Sujeong Cha, Arijit Ghosh Chowdhury, Prashanth Kallur Ramaswamy, Jeremy Roghair, Hannah R Marlowe, Carina Suzana Negreanu, Kitty Boxall, Diana Mincu |
Source: ArXiv AI | 09-10-25
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Authors: Peize He, Zichen Wen, Yubo Wang, Yuxuan Wang, Xiaoqian Liu, Jiajie Huang, Zehui Lei, Zhuangcheng Gu, Xiangqi Jin, Jiabing Yang, Kai Li, Zhifei Liu, Weijia Li, Cunxiang Wang, Conghui He, Linfeng Zhang |
Source: ArXiv AI | 09-10-25
h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Authors: Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London |
Source: ArXiv AI | 09-10-25
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
Authors: Aochong Oliver Li, Tanya Goyal |
Source: ArXiv AI | 09-10-25
Auto-Prompt Ensemble for LLM Judge
Authors: Jiajie Li, Huayi Zhang, Peng Lin, Jinjun Xiong, Wei Xu |
Source: ArXiv AI | 09-10-25
Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
Authors: Cen (Mia) Zhao, Tiantian Zhang, Hanchen Su, Yufeng (Wayne) Zhang, Shaowei Su, Mingzhi Xu, Yu (Elaine) Liu, Wei Han, Jeremy Werner, Claire Na Cheng, Yashar Mehdad |
Source: ArXiv AI | 09-10-25
Verifying Memoryless Sequential Decision-making of Large Language Models
Authors: Dennis Gross, Helge Spieker, Arnaud Gotlieb |
Source: ArXiv AI | 09-10-25
MultiCNKG: Integrating Cognitive Neuroscience, Gene, and Disease Knowledge Graphs Using Large Language Models
Authors: Ali Sarabadani, Kheirolah Rahsepar Fard |
Source: ArXiv AI | 09-10-25
LLM-Assisted Modeling of Semantic Web-Enabled Multi-Agents Systems with AJAN
Authors: Hacane Hechehouche, Andre Antakli, Matthias Klusch |
Source: ArXiv AI | 09-10-25
TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs
Authors: Daria Ozerova, Ekaterina Trofimova |
Source: ArXiv AI | 09-10-25
Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
Authors: Minju Gwak, Guijin Son, Jaehyung Kim |
Source: ArXiv AI | 09-10-25
VRPAgent: LLM-Driven Discovery of Heuristic Operators for Vehicle Routing Problems
Authors: André Hottung, Federico Berto, Chuanbo Hua, Nayeli Gast Zepeda, Daniel Wetzel, Michael Römer, Haoran Ye, Davide Zago, Michael Poli, Stefano Massaroli, Jinkyoo Park, Kevin Tierney |
Source: ArXiv AI | 09-10-25
Integrating Domain Knowledge into Process Discovery Using Large Language Models
Authors: Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst |
Source: ArXiv AI | 09-10-25
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
Authors: Tianshi Zheng, Kelvin Kiu-Wai Tam, Newt Hue-Nam K. Nguyen, Baixuan Xu, Zhaowei Wang, Jiayang Cheng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Tianqing Fang, Yangqiu Song, Ginny Y. Wong, Simon See |
Source: ArXiv AI | 09-10-25
Agentic generative AI for media content discovery at the National Football League
Authors: Henry Wang, Sirajus Salekin, Jake Lee, Ross Claytor, Shinan Zhang, Michael Chi |
Source: ArXiv AI | 09-10-25
Show HN: Recall: Give Claude memory with Redis-backed persistent context (npmjs.com)
Source: Hacker News | 09-10-25
Now open for building: Introducing Gemini CLI extensions (blog.google)
Source: Hacker News | 09-10-25
Sora, AI Bicycles, and Meta Disruption (stratechery.com)
Source: Hacker News | 09-10-25
Gemini 2.5 Computer Use model (blog.google)
Source: Hacker News | 08-10-25
Legal Contracts Built for AI Agents (paid.ai)
Source: Hacker News | 08-10-25
The Email They Shouldn't Have Read (dragas.net)
Source: Hacker News | 08-10-25
Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
Authors: Xin He, Liangliang You, Hongduan Tian, Bo Han, Ivor Tsang, Yew-Soon Ong |
Source: ArXiv AI | 08-10-25
Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment
Authors: Radha Gulhane, Sathish Reddy Indurthi |
Source: ArXiv AI | 08-10-25
Efficient Prediction of Pass@k Scaling in Large Language Models
Authors: Joshua Kazdan, Rylan Schaeffer, Youssef Allouah, Colin Sullivan, Kyssen Yu, Noam Levi, Sanmi Koyejo |
Source: ArXiv AI | 08-10-25
Graph-based LLM over Semi-Structured Population Data for Dynamic Policy Response
Authors: Daqian Shi, Xiaolei Diao, Jinge Wu, Honghan Wu, Xiongfeng Tang, Felix Naughton, Paulina Bondaronek |
Source: ArXiv AI | 08-10-25
Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents
Authors: Wenda Xie, Chao Guo, Yanqing Jing, Junle Wang, Yisheng Lv, Fei-Yue Wang |
Source: ArXiv AI | 08-10-25
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts
Authors: Abhinav Jain, Xinyu Yao, Thomas Reps, Christopher Jermaine |
Source: ArXiv AI | 08-10-25
Integrating Bayesian methods with neural network-based model predictive control: a review
Authors: Asli Karacelik |
Source: ArXiv AI | 08-10-25
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions
Authors: Nan Huo, Xiaohan Xu, Jinyang Li, Per Jacobsson, Shipei Lin, Bowen Qin, Binyuan Hui, Xiaolong Li, Ge Qu, Shuzheng Si, Linheng Han, Edward Alexander, Xintong Zhu, Rui Qin, Ruihan Yu, Yiyao Jin, Feige Zhou, Weihao Zhong, Yun Chen, Hongyu Liu, Chenhao Ma, Fatma Ozcan, Yannis Papakonstantinou, Reynold Cheng |
Source: ArXiv AI | 08-10-25
NASP-T: A Fuzzy Neuro-Symbolic Transformer for Logic-Constrained Aviation Safety Report Classification
Authors: Fadi Al Machot, Fidaa Al Machot |
Source: ArXiv AI | 08-10-25
What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions
Authors: Reza Habibi, Seung Wan Ha, Zhiyu Lin, Atieh Kashani, Ala Shafia, Lakshana Lakshmanarajan, Chia-Fang Chung, Magy Seif El-Nasr |
Source: ArXiv AI | 08-10-25
Vul-R2: A Reasoning LLM for Automated Vulnerability Repair
Authors: Xin-Cheng Wen, Zirui Lin, Yijun Yang, Cuiyun Gao, Deheng Ye |
Source: ArXiv AI | 08-10-25
From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions
Authors: Changyuan Zhao, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Geng Sun, Xianbin Wang, Shiwen Mao, Abbas Jamalipour |
Source: ArXiv AI | 08-10-25
Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography
Authors: Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung |
Source: ArXiv AI | 08-10-25
Syn-Diag: An LLM-based Synergistic Framework for Generalizable Few-shot Fault Diagnosis on the Edge
Authors: Zijun Jia, Shuang Liang, Jinsong Yu |
Source: ArXiv AI | 08-10-25
The Safety Challenge of World Models for Embodied AI Agents: A Review
Authors: Lorenzo Baraldi, Zifan Zeng, Chongzhe Zhang, Aradhana Nayak, Hongbo Zhu, Feng Liu, Qunli Zhang, Peng Wang, Shiming Liu, Zheng Hu, Angelo Cangelosi, Lorenzo Baraldi |
Source: ArXiv AI | 08-10-25
ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint Programming
Authors: Weichun Shi, Minghao Liu, Wanting Zhang, Langchen Shi, Fuqi Jia, Feifei Ma, Jian Zhang |
Source: ArXiv AI | 08-10-25
Training-Free Time Series Classification via In-Context Reasoning with LLM Agents
Authors: Songyuan Sui, Zihang Xu, Yu-Neng Chuang, Kwei-Herng Lai, Xia Hu |
Source: ArXiv AI | 08-10-25
Optimizing for Persuasion Improves LLM Generalization: Evidence from Quality-Diversity Evolution of Debate Strategies
Authors: Aksel Joonas Reedi, Corentin Léger, Julien Pourcel, Loris Gaven, Perrine Charriau, Guillaume Pourcel |
Source: ArXiv AI | 08-10-25
Deterministic Legal Retrieval: An Action API for Querying the SAT-Graph RAG
Authors: Hudson de Martim |
Source: ArXiv AI | 08-10-25
Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents
Authors: Tao Zhe, Rui Liu, Fateme Memar, Xiao Luo, Wei Fan, Xinyue Ye, Zhongren Peng, Dongjie Wang |
Source: ArXiv AI | 08-10-25
Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences
Authors: Batu El, James Zou |
Source: ArXiv AI | 08-10-25
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices
Authors: Mallika Mainali, Harsha Sureshbabu, Anik Sen, Christopher B. Rauch, Noah D. Reifsnyder, John Meyer, J. T. Turner, Michael W. Floyd, Matthew Molineaux, Rosina O. Weber |
Source: ArXiv AI | 08-10-25
Barbarians at the Gate: How AI is Upending Systems Research
Authors: Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, Jeff Chen, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, Ion Stoica |
Source: ArXiv AI | 08-10-25
OpenAI’s “Hacktivate AI” report urges Europe to cut red tape and harmonize digital regulations
Source: The Decoder | 08-10-25
Developers can now build and deploy both apps and agents directly on the ChatGPT platform
Source: The Decoder | 08-10-25
AMD signs a long-term deal to supply OpenAI with multiple generations of Instinct GPUs
Source: The Decoder | 08-10-25
Show HN: MARS – Personal AI robot for builders (< $2k)
Source: Hacker News | 08-10-25
OpenAI's new AI device reportedly faces technical hurdles that could delay its launch
Source: The Decoder | 07-10-25
California passes first sweeping AI safety law
Source: The Decoder | 07-10-25
The EU plans a new AI strategy to cut its reliance on US and Chinese technology
Source: The Decoder | 07-10-25
CodeMender: an AI agent for code security (deepmind.google)
Source: Hacker News | 07-10-25
Deloitte to refund the Australian government after using AI in $440k report (theguardian.com)
Source: Hacker News | 07-10-25
Apps SDK (developers.openai.com)
Source: Hacker News | 07-10-25
GPT-5-Codex is a better AI researcher than me (seangoedecke.com)
Source: Hacker News | 07-10-25
The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning
Authors: Mayank Ravishankara, Varindra V. Persad Maharaj |
Source: ArXiv AI | 07-10-25
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
Authors: Zishang Jiang, Jinyi Han, Tingyun Li, Xinyi Wang, Sihang Jiang, Jiaqing Liang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao |
Source: ArXiv AI | 07-10-25
Don't Pass$\mathtt{@}k$: A Bayesian Framework for Large Language Model Evaluation
Authors: Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary |
Source: ArXiv AI | 07-10-25
Constructing coherent spatial memory in LLM agents through graph rectification
Authors: Puzhen Zhang, Xuyang Chen, Yu Feng, Yuhan Jiang, Liqiu Meng |
Source: ArXiv AI | 07-10-25
On the Importance of Task Complexity in Evaluating LLM-Based Multi-Agent Systems
Authors: Bohan Tang, Huidong Liang, Keyue Jiang, Xiaowen Dong |
Source: ArXiv AI | 07-10-25
GROK: From Quantitative Biomarkers to Qualitative Diagnosis via a Grounded MLLM with Knowledge-Guided Instruction
Authors: Zhuangzhi Gao, Hongyi Qin, He Zhao, Qinkai Yu, Feixiang Zhou, Eduard Shantsila, Uazman Alam, Alena Shantsila, Wahbi El-Bouri, Gregory Y. H. Lip, Yalin Zheng |
Source: ArXiv AI | 07-10-25
LLM Based Bayesian Optimization for Prompt Search
Authors: Adam Ballew, Jingbo Wang, Shaogang Ren |
Source: ArXiv AI | 07-10-25
Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation
Authors: Hadi Nekoei, Aman Jaiswal, Patrice Bechard, Oleh Shliazhko, Orlando Marquez Ayala, Mathieu Reymond, Massimo Caccia, Alexandre Drouin, Sarath Chandar, Alexandre Lacoste |
Source: ArXiv AI | 07-10-25
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents
Authors: Muyu He, Anand Kumar, Tsach Mackey, Meghana Rajeev, James Zou, Nazneen Rajani |
Source: ArXiv AI | 07-10-25
Multi-Agent Collaborative Intelligence: Dual-Dial Control for Reliable LLM Reasoning
Authors: Edward Y. Chang, Ethan Y. Chang |
Source: ArXiv AI | 07-10-25
Perfect AI Mimicry and the Epistemology of Consciousness: A Solipsistic Dilemma
Authors: Shurui Li |
Source: ArXiv AI | 07-10-25
Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents
Authors: Yiding Wang, Zhepei Wei, Xinyu Zhu, Yu Meng |
Source: ArXiv AI | 07-10-25
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Authors: Ivo Petrov, Jasper Dekoninck, Martin Vechev |
Source: ArXiv AI | 07-10-25
LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation
Authors: Dongge Han, Camille Couturier, Daniel Madrigal Diaz, Xuchao Zhang, Victor Rühle, Saravan Rajmohan |
Source: ArXiv AI | 07-10-25
LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
Authors: Fangzhou Liang, Tianshi Zheng, Chunkit Chan, Yauwai Yim, Yangqiu Song |
Source: ArXiv AI | 07-10-25
Think Then Embed: Generative Context Improves Multimodal Embedding
Authors: Xuanming Cui, Jianpeng Cheng, Hong-you Chen, Satya Narayan Shukla, Abhijeet Awasthi, Xichen Pan, Chaitanya Ahuja, Shlok Kumar Mishra, Qi Guo, Ser-Nam Lim, Aashu Singh, Xiangjun Fan |
Source: ArXiv AI | 07-10-25
OpenAI ChatKit (github.com/openai)
Source: Hacker News | 07-10-25
AMD signs AI chip-supply deal with OpenAI, gives it option to take a 10% stake (reuters.com)
Source: Hacker News | 07-10-25
What makes 5% of AI agents work in production? (motivenotes.ai)
Source: Hacker News | 07-10-25
Investigating The Smells of LLM Generated Code
Authors: Debalina Ghosh Paul, Hong Zhu, Ian Bayley |
Source: ArXiv AI | 07-10-25
Signature-Informed Transformer for Asset Allocation
Authors: Yoontae Hwang, Stefan Zohren |
Source: ArXiv AI | 07-10-25
Topic Modeling as Long-Form Generation: Can Long-Context LLMs revolutionize NTM via Zero-Shot Prompting?
Authors: Xuan Xu, Haolun Li, Zhongliang Yang, Beilin Chu, Jia Song, Moxuan Xu, Linna Zhou |
Source: ArXiv AI | 07-10-25
Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment
Authors: Hongxiang Zhang, Yuan Tian, Tianyi Zhang |
Source: ArXiv AI | 07-10-25
Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
Authors: José Cambronero, Michele Tufano, Sherry Shi, Renyao Wei, Grant Uy, Runxiang Cheng, Chin-Jung Liu, Shiying Pan, Satish Chandra, Pat Rondon |
Source: ArXiv AI | 07-10-25
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Authors: Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, Osbert Bastani |
Source: ArXiv AI | 07-10-25
Multimodal Large Language Model Framework for Safe and Interpretable Grid-Integrated EVs
Authors: Jean Douglas Carvalho, Hugo Kenji, Ahmad Mohammad Saber, Glaucia Melo, Max Mauro Dias Santos, Deepa Kundur |
Source: ArXiv AI | 07-10-25
AutoMaAS: Self-Evolving Multi-Agent Architecture Search for Large Language Models
Authors: Bo Ma, Hang Li, ZeHua Hu, XiaoFan Gui, LuYao Liu, Simon Liu |
Source: ArXiv AI | 07-10-25
NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning
Authors: Yulong Zhang, Li Wang, Wei Du, Peilin Li, Yuqin Dai, Zhiyuan Zhao, Lingyong Fang, Ziniu Liu, Ru Zhang, Huijia Zhu, Gongshen Liu |
Source: ArXiv AI | 07-10-25
Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Authors: Antoine Maier, Aude Maier, Tom David |
Source: ArXiv AI | 07-10-25
Onto-Epistemological Analysis of AI Explanations
Authors: Martina Mattioli, Eike Petersen, Aasa Feragen, Marcello Pelillo, Siavash A. Bigdeli |
Source: ArXiv AI | 07-10-25
Terence Tao says ChatGPT saved him hours solving a math problem
Source: The Decoder | 06-10-25
Why do LLMs freak out over the seahorse emoji? (vgel.me)
Source: Hacker News | 06-10-25
What GPT-OSS leaks about OpenAI's training data (fi-le.net)
Source: Hacker News | 06-10-25
Fire destroys S. Korean government's cloud storage system, no backups available (joins.com)
Source: Hacker News | 06-10-25
Show HN: While everyone builds AI apps, my spreadsheet reached 2,300 users (write-it-down.com)
Source: Hacker News | 06-10-25
Managing context on the Claude Developer Platform (anthropic.com)
Source: Hacker News | 06-10-25
Which table format do LLMs understand best? (improvingagents.com)
Source: Hacker News | 06-10-25
The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation
Authors: Zarreen Reza |
Source: ArXiv AI | 05-10-25
Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents
Authors: Yang Liu, Zaid Abulawi, Abhiram Garimidi, Doyeong Lim |
Source: ArXiv AI | 05-10-25
Fine-tuning with RAG for Improving LLM Learning of New Skills
Authors: Humaid Ibrahim, Nikolai Rozanov, Marek Rei |
Source: ArXiv AI | 05-10-25
Retrieval-Augmented Framework for LLM-Based Clinical Decision Support
Authors: Leon Garza, Anantaa Kotal, Michael A. Grasso, Emre Umucu |
Source: ArXiv AI | 05-10-25
A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining
Authors: Sipeng Zhang, Longfei Yun, Zilong Wang, Jingbo Shang, Letian Peng |
Source: ArXiv AI | 05-10-25
OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models
Authors: Luca Cotti, Idilio Drago, Anisa Rula, Devis Bianchini, Federico Cerutti |
Source: ArXiv AI | 05-10-25
AIReg-Bench: Benchmarking Language Models That Assess AI Regulation Compliance
Authors: Bill Marino, Rosco Hunter, Zubair Jamali, Marinos Emmanouil Kalpakos, Mudra Kashyap, Isaiah Hinton, Alexa Hanson, Maahum Nazir, Christoph Schnabl, Felix Steffek, Hongkai Wen, Nicholas D. Lane |
Source: ArXiv AI | 05-10-25
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models
Authors: Shaoan Xie, Lingjing Kong, Xiangchen Song, Xinshuai Dong, Guangyi Chen, Eric P. Xing, Kun Zhang |
Source: ArXiv AI | 05-10-25
LOGicalThought: Logic-Based Ontological Grounding of LLMs for High-Assurance Reasoning
Authors: Navapat Nananukul, Yue Zhang, Ryan Lee, Eric Boxer, Jonathan May, Vibhav Giridhar Gogate, Jay Pujara, Mayank Kejriwal |
Source: ArXiv AI | 05-10-25
AgentRec: Next-Generation LLM-Powered Multi-Agent Collaborative Recommendation with Adaptive Intelligence
Authors: Bo Ma, Hang Li, ZeHua Hu, XiaoFan Gui, LuYao Liu, Simon Lau |
Source: ArXiv AI | 05-10-25
InvThink: Towards AI Safety via Inverse Reasoning
Authors: Yubin Kim, Taehan Kim, Eugene Park, Chunjong Park, Cynthia Breazeal, Daniel McDuff, Hae Won Park |
Source: ArXiv AI | 05-10-25
A Locally Executable AI System for Improving Preoperative Patient Communication: A Multi-Domain Clinical Evaluation
Authors: Motoki Sato (Nagasaki University, Japan), Yuki Matsushita (Nagasaki University, Japan), Hidekazu Takahashi (Boston Medical Sciences, Tokyo, Japan), Tomoaki Kakazu (Showa Medical University Koto Toyosu Hospital, Japan), Sou Nagata (Nagasaki University, Japan), Mizuho Ohnuma (Nagasaki University, Japan), Atsushi Yoshikawa (Kanto Gakuin University, Japan), Masayuki Yamamura (Institute of Science Tokyo, Japan) |
Source: ArXiv AI | 05-10-25
GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents
Authors: Yejin Kim, Youngbin Lee, Juhyeong Kim, Yongjae Lee |
Source: ArXiv AI | 05-10-25
Understanding the Geospatial Reasoning Capabilities of LLMs: A Trajectory Recovery Perspective
Authors: Thinh Hung Truong, Jey Han Lau, Jianzhong Qi |
Source: ArXiv AI | 05-10-25
PychoBench: Evaluating the Psychology Intelligence of Large Language Models
Authors: Min Zeng |
Source: ArXiv AI | 05-10-25
REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing
Authors: Thanh Ma, Tri-Tam La, Lam-Thu Le Huu, Minh-Nghi Nguyen, Khanh-Van Pham Luu, Huu-Hoa Nguyen |
Source: ArXiv AI | 05-10-25
A cybersecurity AI agent selection and decision support framework
Authors: Masike Malatji |
Source: ArXiv AI | 05-10-25
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
Authors: Zhihao Dou, Qinjian Zhao, Zhongwei Wan, Dinggen Zhang, Weida Wang, Towsif Raiyan, Benteng Chen, Qingtao Pan, Yang Ouyang, Zhiqiang Gao, Shufei Zhang, Sumon Biswas |
Source: ArXiv AI | 05-10-25
Do AI Models Perform Human-like Abstract Reasoning Across Modalities?
Authors: Claas Beger, Ryan Yi, Shuhao Fu, Arseny Moskvichev, Sarah W. Tsai, Sivasankaran Rajamanickam, Melanie Mitchell |
Source: ArXiv AI | 05-10-25
Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
Authors: Xinyuan Song, Keyu Wang, PengXiang Li, Lu Yin, Shiwei Liu |
Source: ArXiv AI | 05-10-25
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Authors: Yuxiao Qu, Anikait Singh, Yoonho Lee, Amrith Setlur, Ruslan Salakhutdinov, Chelsea Finn, Aviral Kumar |
Source: ArXiv AI | 05-10-25
UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models
Authors: Yuhao Sun, Zhuoer Xu, Shiwen Cui, Kun Yang, Lingyun Yu, Yongdong Zhang, Hongtao Xie |
Source: ArXiv AI | 05-10-25
Show HN: 2D Spine Animation AI for Game (godmodeai.co)
Source: Hacker News | 05-10-25
ProofOfThought: LLM-based reasoning using Z3 theorem proving (github.com/debarghag)
Source: Hacker News | 05-10-25
Personal data storage is an idea whose time has come (muni.town)
Source: Hacker News | 05-10-25
Google ships Gemini 2.5 Flash Image model with new features
Source: The Decoder | 05-10-25
Anthropic claims context engineering beats prompt engineering when managing AI agents
Source: The Decoder | 05-10-25
OpenAI's hunger for computing power has Sam Altman dashing around the globe (wsj.com)
Source: Hacker News | 05-10-25
How to inject knowledge efficiently? Knowledge infusion scaling law for LLMs (arxiv.org)
Source: Hacker News | 05-10-25
Effective context engineering for AI agents (anthropic.com)
Source: Hacker News | 04-10-25
Asked to do something illegal at work? Here's what these software engineers did (pragmaticengineer.com)
Source: Hacker News | 04-10-25
Jeff Bezos says AI is in a bubble but society will get 'gigantic' benefits (cnbc.com)
Source: Hacker News | 04-10-25
New antibiotic targets IBD and AI predicted how it would work (mcmaster.ca)
Source: Hacker News | 04-10-25
Ex-OpenAI CTO Mira Murati introduces Tinker, an API for fine-tuning of open-weight LLMs
Source: The Decoder | 04-10-25
Sam Altman says OpenAI would shut down Sora app if users' lives don't improve
Source: The Decoder | 04-10-25
Microsoft 365 Premium promises more office AI features than ChatGPT Plus for one cent less
Source: The Decoder | 04-10-25
Tencent trains AI that can explain and execute game strategies in Honor of Kings
Source: The Decoder | 04-10-25
Potential issues in curl found using AI assisted tools (mastodon.social)
Source: Hacker News | 03-10-25
Google expands AI Mode with visual search and new features
Source: The Decoder | 03-10-25
Liva AI (YC S25) Is Hiring (ycombinator.com)
Source: Hacker News | 03-10-25
Gemini 3.0 Pro – early tests (twitter.com/chetaslua)
Source: Hacker News | 03-10-25
OpenAI's H1 2025: $4.3B in income, $13.5B in loss (techinasia.com)
Source: Hacker News | 03-10-25
N8n added native persistent storage with DataTables (n8n.io)
Source: Hacker News | 03-10-25
Self-supervised learning, JEPA, world models, and the future of AI [video] (youtube.com)
Source: Hacker News | 03-10-25
Microsoft adds autonomous AI agents to Copilot for Office apps
Source: The Decoder | 02-10-25
Claude Sonnet 4.5 is designed to tackle coding tasks for over 30 hours at a time, Anthropic says
Source: The Decoder | 02-10-25
OpenAI adds parental controls to ChatGPT for teens
Source: The Decoder | 02-10-25
The RAG Obituary: Killed by agents, buried by context windows (nicolasbustamante.com)
Source: Hacker News | 02-10-25
Unix philosophy and filesystem access makes Claude Code amazing (alephic.com)
Source: Hacker News | 02-10-25
I built ChatGPT with Minecraft redstone [video] (youtube.com)
Source: Hacker News | 02-10-25
Understanding Cultural Differences: The Michigan Fish Test (2013) (michael-roberto.blogspot.com)
Source: Hacker News | 02-10-25
DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems
Authors: Rohan Kadekodi, Zhan Jin, Keisuke Kamahori, Yile Gu, Sean Khatiri, Noah H. Bayindirli, Sergey Gorbunov, Baris Kasikci |
Source: ArXiv AI | 02-10-25
When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets
Authors: Zeshi Dai, Zimo Peng, Zerui Cheng, Ryan Yihe Li |
Source: ArXiv AI | 02-10-25
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
Authors: Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi |
Source: ArXiv AI | 02-10-25
ICL Optimized Fragility
Authors: Serena Gomez Wannaz |
Source: ArXiv AI | 02-10-25
Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization
Authors: Sarvesh Soni, Dina Demner-Fushman |
Source: ArXiv AI | 02-10-25
Semantic-Driven AI Agent Communications: Challenges and Solutions
Authors: Kaiwen Yu, Mengying Sun, Zhijin Qin, Xiaodong Xu, Ping Yang, Yue Xiao, Gang Wu |
Source: ArXiv AI | 02-10-25
Batch-CAM: Introduction to better reasoning in convolutional deep learning models
Authors: Giacomo Ignesti, Davide Moroni, Massimo Martinelli |
Source: ArXiv AI | 02-10-25
Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation
Authors: Wei Liu, Haomei Xu, Bingqing Liu, Zhiying Deng, Haozhao Wang, Jun Wang, Ruixuan Li, Yee Whye Teh, Wee Sun Lee |
Source: ArXiv AI | 02-10-25
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Authors: Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan |
Source: ArXiv AI | 02-10-25
Learning Compact Representations of LLM Abilities via Item Response Theory
Authors: Jianhao Chen, Chenxu Wang, Gengrui Zhang, Peng Ye, Lei Bai, Wei Hu, Yuzhong Qu, Shuyue Hu |
Source: ArXiv AI | 02-10-25
Benchmarking Machine Learning Models for Fault Classification and Localization in Power System Protection
Authors: Julian Oelhaf, Georg Kordowich, Changhun Kim, Paula Andrea Pérez-Toro, Christian Bergler, Andreas Maier, Johann Jäger, Siming Bayer |
Source: ArXiv AI | 02-10-25
Logical Consistency Between Disagreeing Experts and Its Role in AI Safety
Authors: Andrés Corrada-Emmanuel |
Source: ArXiv AI | 02-10-25
Integrating AI and Ensemble Forecasting: Explainable Materials Planning with Scorecards and Trend Insights for a Large-Scale Manufacturer
Authors: Saravanan Venkatachalam |
Source: ArXiv AI | 02-10-25
QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL
Authors: Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, Bo Zhao |
Source: ArXiv AI | 02-10-25
Uncovering the Computational Ingredients of Human-Like Representations in LLMs
Authors: Zach Studdiford, Timothy T. Rogers, Kushin Mukherjee, Siddharth Suresh |
Source: ArXiv AI | 02-10-25
Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling
Authors: Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, Iryna Gurevych |
Source: ArXiv AI | 02-10-25
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Authors: Guobin Shen, Dongcheng Zhao, Haibo Tong, Jindong Li, Feifei Zhao, Yi Zeng |
Source: ArXiv AI | 02-10-25
Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning
Authors: Elija Perrier |
Source: ArXiv AI | 02-10-25
Deepmind says video models for visual tasks could become what LLMs are for text tasks
Source: The Decoder | 02-10-25
Evaluating the impact of AI on the labor market: Current state of affairs (yale.edu)
Source: Hacker News | 02-10-25
ChatGPT quietly switches to a stricter language model when users submit emotional prompts
Source: The Decoder | 01-10-25
Sora 2 (openai.com)
Source: Hacker News | 01-10-25
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
Authors: Hankun Dai, Maoquan Wang, Mengnan Qi, Yikai Zhang, Zijian Jin, Yongqiang Yao, Yufan Huang, Shengyu Fu, Elsie Nallipogu |
Source: ArXiv AI | 01-10-25
SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents
Authors: Ruolin Chen, Yinqian Sun, Jihang Wang, Mingyang Lv, Qian Zhang, Yi Zeng |
Source: ArXiv AI | 01-10-25
DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models
Authors: Zhicheng Zhou, Jing Li, Suming Qiu, Junjie Huang, Linyuan Qiu, Zhijie Sun |
Source: ArXiv AI | 01-10-25
Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions
Authors: Junbeom Kim, Kyuyoung Kim, Jihoon Tack, Dongha Lim, Jinwoo Shin |
Source: ArXiv AI | 01-10-25
CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
Authors: Zhe Li, Zhiwei Lin, Yongtao Wang |
Source: ArXiv AI | 01-10-25
Towards Human Engagement with Realistic AI Combat Pilots
Authors: Ardian Selmonaj, Giacomo Del Rio, Adrian Schneider, Alessandro Antonucci |
Source: ArXiv AI | 01-10-25
MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models
Authors: Asmita Sengupta, David Antony Selby, Sebastian Josef Vollmer, Gerrit Großmann |
Source: ArXiv AI | 01-10-25
SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs
Authors: Yixu Wang, Xin Wang, Yang Yao, Xinyuan Li, Yan Teng, Xingjun Ma, Yingchun Wang |
Source: ArXiv AI | 01-10-25
Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
Authors: Emma Rose Madden |
Source: ArXiv AI | 01-10-25
Beyond the Algorithm: A Field Guide to Deploying AI Agents in Clinical Practice
Authors: Jack Gallifant, Katherine C. Kellogg, Matt Butler, Amanda Centi, Patrick F. Doyle, Sayon Dutta, Joyce Guo, Matthew J. Hadfield, Esther H. Kim, David E. Kozono, Hugo JWL Aerts, Adam B. Landman, Raymond H. Mak, Rebecca G. Mishuris, Tanna L. Nelson, Guergana K. Savova, Elad Sharon, Benjamin C. Silverman, Umit Topaloglu, Jeremy L. Warner, Danielle S. Bitterman |
Source: ArXiv AI | 01-10-25
LMILAtt: A Deep Learning Model for Depression Detection from Social Media Users Enhanced by Multi-Instance Learning Based on Attention Mechanism
Authors: Yukun Yang |
Source: ArXiv AI | 01-10-25
'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs
Authors: Eric J. W. Orlowski, Hakim Norhashim, Tristan Koh Ly Wey |
Source: ArXiv AI | 01-10-25
90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development
Authors: Runxin Yang, Yuxuan Wan, Shuqing Li, Michael R. Lyu |
Source: ArXiv AI | 01-10-25
Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration
Authors: Aline Mangold, Kiran Hoffmann |
Source: ArXiv AI | 01-10-25
LLM Agents for Knowledge Discovery in Atomic Layer Processing
Authors: Andreas Werbrouck, Marshall B. Lindsay, Matthew Maschmann, Matthias J. Young |
Source: ArXiv AI | 01-10-25
Benchmarking Deep Learning Convolutions on Energy-constrained CPUs
Authors: Enrique Galvez (ALSOC), Adrien Cassagne (ALSOC), Alix Munier (ALSOC), Manuel Bouyer |
Source: ArXiv AI | 01-10-25
Interactive Learning for LLM Reasoning
Authors: Hehai Lin, Shilei Cao, Minzhi Li, Sudong Wang, Haotian Wu, Linyi Yang, Juepeng Zheng, Chengwei Qin |
Source: ArXiv AI | 01-10-25
SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training
Authors: Yuliang Liu, Guohao Wu, Shenglong Zhang, Wei Zhang, Qianchao Zhu, Zhouyang Li, Chenyu Wang |
Source: ArXiv AI | 01-10-25
SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models
Authors: Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao, Zhihao Dou, Belal Abuhaija, Kaizhu Huang |
Source: ArXiv AI | 01-10-25
AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations
Authors: Berdymyrat Ovezmyradov |
Source: ArXiv AI | 01-10-25
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Authors: Shuai Shao, Qihan Ren, Chen Qian, Boyi Wei, Dadi Guo, Jingyi Yang, Xinhao Song, Linfeng Zhang, Weinan Zhang, Dongrui Liu, Jing Shao |
Source: ArXiv AI | 01-10-25
OntoAligner Meets Knowledge Graph Embedding Aligners
Authors: Hamed Babaei Giglou, Jennifer D'Souza, Sören Auer, Mahsa Sanaei |
Source: ArXiv AI | 01-10-25
Transformer Classification of Breast Lesions: The BreastDCEDL_AMBL Benchmark Dataset and 0.92 AUC Baseline
Authors: Naomi Fridman (Ariel University), Anat Goldstein (Ariel University) |
Source: ArXiv AI | 01-10-25
TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise
Authors: Paula Reyero Lobo, Kevin Johnson, Bill Buchanan, Matthew Shardlow, Ashley Williams, Samuel Attwood |
Source: ArXiv AI | 01-10-25
The Average Patient Fallacy
Authors: Alaleh Azhir, Shawn N. Murphy, Hossein Estiri |
Source: ArXiv AI | 01-10-25
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Authors: Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria |
Source: ArXiv AI | 01-10-25
HilbertA: Hilbert Attention for Image Generation with Diffusion Models
Authors: Shaoyi Zheng, Wenbo Lu, Yuxuan Xia, Haomin Liu, Shengjie Wang |
Source: ArXiv AI | 01-10-25
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Authors: Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Lifan Yuan, Ziming Ji, Indranil Das, Junyi Cao, Yufeng Du, Jinchen He, Yifan Su, Jiabin Yu, Yikun Jiang, Yujie Zhang, Chang Liu, Ze-Min Huang, Weizhen Jia, Xinan Chen, Peixue Wu, Yunkai Wang, Juntai Zhou, Yong Zhao, Farshid Jafarpour, Jessie Shelton, Aaron Young, John Bartolotta, Wenchao Xu, Yue Sun, Anjun Chu, Victor Colussi, Chris Akers, Nathan Brooks, Wenbo Fu, Christopher Wilson, Jinchao Zhao, Marvin Qi, Anqi Mu, Yubo Yang, Allen Zang, Yang Lyu, Peizhi Mai, Xuefei Guo, Luyu Gao, Ze Yang, Chi Xue, Dmytro Bandak, Yaïr Hein, Yonatan Kahn, Kevin Zhou, John Drew Wilson, Jarrod T. Reilly, Di Luo, Daniel Inafuku, Hao Tong, Liang Yang, Ruixing Zhang, Xueying Wang, Ofir Press, Nicolas Chia, Eliu Huerta, Hao Peng |
Source: ArXiv AI | 01-10-25
Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
Authors: Maël Macuglia, Paul Friedrich, Giorgia Ramponi |
Source: ArXiv AI | 01-10-25
Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees
Authors: Craig Greenberg, Patrick Hall, Theodore Jensen, Kristen Greene, Razvan Amironesei |
Source: ArXiv AI | 01-10-25
OpenAI unveils Sora 2 video model with realistic physics, high-quality audio, and a new social app
Source: The Decoder | 01-10-25
Show HN: Sculptor, the Missing UI for Claude Code (imbue.com)
Source: Hacker News | 01-10-25
Making sure AI serves people and knowledge stays human (wikimedia.org)
Source: Hacker News | 01-10-25
OpenAI says top AI models are reaching expert territory on real-world knowledge work
Source: The Decoder | 30-09-25
Developers rely on AI tools more than ever, yet confidence in AI outputs remains low
Source: The Decoder | 30-09-25
iRobot Founder: Don't Believe the AI and Robotics Hype (crazystupidtech.com)
Source: Hacker News | 30-09-25
Instant Checkout and the Agentic Commerce Protocol (openai.com)
Source: Hacker News | 30-09-25
Claude Code 2.0 (npmjs.com)
Source: Hacker News | 30-09-25
Comprehension debt: A ticking time bomb of LLM-generated code (codemanship.wordpress.com)
Source: Hacker News | 30-09-25
Claude Sonnet 4.5 (anthropic.com)
Source: Hacker News | 30-09-25
Risk-Sensitive RL for Alleviating Exploration Dilemmas in Large Language Models
Authors: Yuhua Jiang, Jiawei Huang, Yufeng Yuan, Xin Mao, Yu Yue, Qianchuan Zhao, Lin Yan |
Source: ArXiv AI | 30-09-25
Rethinking and Benchmarking Large Language Models for Graph Reasoning
Authors: Yuwei Hu, Xinyi Huang, Zhewei Wei, Yongchao Liu, Chuntao Hong |
Source: ArXiv AI | 30-09-25
humancompatible.detect: a Python Toolkit for Detecting Bias in AI Models
Authors: German M. Matilla, Jiri Nemecek, Illia Kryvoviaz, Jakub Marecek |
Source: ArXiv AI | 30-09-25
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs
Authors: Shihao Qi, Jie Ma, Ziang Yin, Lingling Zhang, Jian Zhang, Jun Liu, Feng Tian, Tongliang Liu |
Source: ArXiv AI | 30-09-25
Fin-Ally: Pioneering the Development of an Advanced, Commonsense-Embedded Conversational AI for Money Matters
Authors: Sarmistha Das, Priya Mathur, Ishani Sharma, Sriparna Saha, Kitsuchart Pasupa, Alka Maurya |
Source: ArXiv AI | 30-09-25
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Authors: Haotian Zhang, Liu Liu, Baosheng Yu, Jiayan Qiu, Likang Xiao, Yanwei Ren, Quan Chen, Xianglong Liu |
Source: ArXiv AI | 30-09-25
BPMN Assistant: An LLM-Based Approach to Business Process Modeling
Authors: Josip Tomo Licardo, Nikola Tankovic, Darko Etinger |
Source: ArXiv AI | 30-09-25
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Authors: Tong Guan, Zijie Meng, Dianqi Li, Shiyu Wang, Chao-Han Huck Yang, Qingsong Wen, Zuozhu Liu, Sabato Marco Siniscalchi, Ming Jin, Shirui Pan |
Source: ArXiv AI | 30-09-25
From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning
Authors: Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Wei Yang, Zikai Song |
Source: ArXiv AI | 30-09-25
Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG
Authors: Yueming Sun, Long Yang |
Source: ArXiv AI | 30-09-25
Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
Authors: Zhen Bi, Zhenlin Hu, Jinnan Yang, Mingyang Chen, Cheng Deng, Yida Xue, Zeyu Yang, Qing Shen, Zhenfang Liu, Kang Zhao, Ningyu Zhang, Jungang Lou |
Source: ArXiv AI | 30-09-25
The Emergence of Social Science of Large Language Models
Authors: Xiao Jia, Zhanzhan Zhao |
Source: ArXiv AI | 30-09-25
Neural network embeddings recover value dimensions from psychometric survey items on par with human data
Authors: Max Pellert, Clemens M. Lechner, Indira Sen, Markus Strohmaier |
Source: ArXiv AI | 30-09-25
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
Authors: Shijie Zhang, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo |
Source: ArXiv AI | 30-09-25
KIRETT - A wearable device to support rescue operations using artificial intelligence to improve first aid
Authors: Johannes Zenkert, Christian Weber, Mubaris Nadeem, Lisa Bender, Madjid Fathi, Abu Shad Ahammed, Aniebiet Micheal Ezekiel, Roman Obermaisser, Maximilian Bradford |
Source: ArXiv AI | 30-09-25
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
Authors: Lifan Yuan, Weize Chen, Yuchen Zhang, Ganqu Cui, Hanbin Wang, Ziming You, Ning Ding, Zhiyuan Liu, Maosong Sun, Hao Peng |
Source: ArXiv AI | 30-09-25
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
Authors: Yue Zhang, Tianyi Ma, Zun Wang, Yanyuan Qiao, Parisa Kordjamshidi |
Source: ArXiv AI | 30-09-25
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Authors: Dawei Li, Zhen Tan, Chengshuai Zhao, Bohan Jiang, Baixiang Huang, Pingchuan Ma, Abdullah Alnaibari, Kai Shu, Huan Liu |
Source: ArXiv AI | 30-09-25
EEG-Based Consumer Behaviour Prediction: An Exploration from Classical Machine Learning to Graph Neural Networks
Authors: Mohammad Parsa Afshar, Aryan Azimi |
Source: ArXiv AI | 30-09-25
AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
Authors: Ahmed Jaber, Wangshu Zhu, Karthick Jayavelu, Justin Downes, Sameer Mohamed, Candace Agonafir, Linnia Hawkins, Tian Zheng |
Source: ArXiv AI | 30-09-25
GeoEvolve: Automating Geospatial Model Discovery via Multi-Agent Large Language Models
Authors: Peng Luo, Xiayin Lou, Yu Zheng, Zhuo Zheng, Stefano Ermon |
Source: ArXiv AI | 30-09-25
Can AI Perceive Physical Danger and Intervene?
Authors: Abhishek Jindal, Dmitry Kalashnikov, Oscar Chang, Divya Garikapati, Anirudha Majumdar, Pierre Sermanet, Vikas Sindhwani |
Source: ArXiv AI | 30-09-25
Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
Authors: Junliang Liu, Jingyu Xiao, Wenxin Tang, Wenxuan Wang, Zhixian Wang, Minrui Zhang, Shuanghe Yu |
Source: ArXiv AI | 30-09-25
Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
Authors: So Kuroki, Yingtao Tian, Kou Misaki, Takashi Ikegami, Takuya Akiba, Yujin Tang |
Source: ArXiv AI | 30-09-25
CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration
Authors: Zhimin Wang, Shaokang He, Duo Wu, Jinghe Wang, Linjia Kang, Jing Yu, Zhi Wang |
Source: ArXiv AI | 30-09-25
Outlier Detection in Plantar Pressure: Human-Centered Comparison of Statistical Parametric Mapping and Explainable Machine Learning
Authors: Carlo Dindorf, Jonas Dully, Steven Simon, Dennis Perchthaler, Stephan Becker, Hannah Ehmann, Kjell Heitmann, Bernd Stetter, Christian Diers, Michael Fröhlich |
Source: ArXiv AI | 30-09-25
The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging
Authors: Xiaochong Lan, Yu Zheng, Shiteng Cao, Yong Li |
Source: ArXiv AI | 30-09-25
Clinical Uncertainty Impacts Machine Learning Evaluations
Authors: Simone Lionetti, Fabian Gröger, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Alexander A. Navarini, Marc Pouly |
Source: ArXiv AI | 30-09-25
Ground-Truthing AI Energy Consumption: Validating CodeCarbon Against External Measurements
Authors: Raphael Fischer |
Source: ArXiv AI | 30-09-25
InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
Authors: Guanghao Zhu, Zhitian Hou, Zeyu Liu, Zhijie Sang, Congkai Xie, Hongxia Yang |
Source: ArXiv AI | 30-09-25
Evaluating LLMs for Combinatorial Optimization: One-Phase and Two-Phase Heuristics for 2D Bin-Packing
Authors: Syed Mahbubul Huq, Daniel Brito, Daniel Sikar, Rajesh Mojumder |
Source: ArXiv AI | 30-09-25
Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
Authors: Jiaqi Shao, Yuxiang Lin, Munish Prasad Lohani, Yufeng Miao, Bing Luo |
Source: ArXiv AI | 30-09-25
Large Language Models as Nondeterministic Causal Models
Authors: Sander Beckers |
Source: ArXiv AI | 30-09-25
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Authors: Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen |
Source: ArXiv AI | 30-09-25
TrueGradeAI: Retrieval-Augmented and Bias-Resistant AI for Transparent and Explainable Digital Assessments
Authors: Rakesh Thakur, Shivaansh Kaushik, Gauri Chopra, Harsh Rohilla |
Source: ArXiv AI | 30-09-25
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
Authors: Yixuan Han, Fan Ma, Ruijie Quan, Yi Yang |
Source: ArXiv AI | 30-09-25
Sandboxing AI agents at the kernel level (greptile.com)
Source: Hacker News | 30-09-25
John Jumper: AI is revolutionizing scientific discovery [video] (youtube.com)
Source: Hacker News | 30-09-25
California governor signs AI transparency bill into law (ca.gov)
Source: Hacker News | 30-09-25
Anthropic settles landmark AI copyright lawsuit for at least $1.5 billion
Source: The Decoder | 29-09-25
Google updates Gemini 2.5 Flash models to deliver faster responses and improved performance
Source: The Decoder | 29-09-25
To AI or Not to AI (antropia.studio)
Source: Hacker News | 29-09-25
The AI coding trap (chrisloy.dev)
Source: Hacker News | 29-09-25
Google Deepmind brings agentic AI capabilities into robots with two new Gemini models
Source: The Decoder | 28-09-25
Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks
Authors: Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy |
Source: ArXiv AI | 28-09-25
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Authors: Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran |
Source: ArXiv AI | 28-09-25
It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
Authors: Madeleine Dwyer, Adam Sobey, Adriane Chapman |
Source: ArXiv AI | 28-09-25
Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training
Authors: Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma |
Source: ArXiv AI | 28-09-25
Philosophy-informed Machine Learning
Authors: MZ Naser |
Source: ArXiv AI | 28-09-25
Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications
Authors: Samer Alshaer, Ala Khalifeh, Roman Obermaisser |
Source: ArXiv AI | 28-09-25
Reconstruction-Based Adaptive Scheduling Using AI Inferences in Safety-Critical Systems
Authors: Samer Alshaer, Ala Khalifeh, Roman Obermaisser |
Source: ArXiv AI | 28-09-25
InsightGUIDE: An Opinionated AI Assistant for Guided Critical Reading of Scientific Literature
Authors: Paris Koloveas, Serafeim Chatzopoulos, Thanasis Vergoulis, Christos Tryfonopoulos |
Source: ArXiv AI | 28-09-25
LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Log Analysis Tasks
Authors: Lipeng Ma, Yixuan Li, Weidong Yang, Mingjie Zhou, Xinyi Liu, Ben Fei, Shuhao Li, Xiaoyan Sun, Sihang Jiang, Yanghua Xiao |
Source: ArXiv AI | 28-09-25
CORE: Full-Path Evaluation of LLM Agents Beyond Final State
Authors: Panagiotis Michelakis, Yiannis Hadjiyiannis, Dimitrios Stamoulis |
Source: ArXiv AI | 28-09-25
AOT*: Efficient Synthesis Planning via LLM-Empowered AND-OR Tree Search
Authors: Xiaozhuang Song, Xuanhao Pan, Xinjian Zhao, Hangting Ye, Shufei Zhang, Jian Tang, Tianshu Yu |
Source: ArXiv AI | 28-09-25
Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM
Authors: Najla Zuhir, Amna Mohammad Salim, Parvathy Premkumar, Moshiur Farazi |
Source: ArXiv AI | 28-09-25
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Authors: Yidong Wang, Yunze Song, Tingyuan Zhu, Xuanwang Zhang, Zhuohao Yu, Hao Chen, Chiyu Song, Qiufeng Wang, Cunxiang Wang, Zhen Wu, Xinyu Dai, Yue Zhang, Wei Ye, Shikun Zhang |
Source: ArXiv AI | 28-09-25
ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective
Authors: Yiwen Zhang, Ziang Chen, Fanqi Kong, Yizhe Huang, Xue Feng |
Source: ArXiv AI | 28-09-25
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Authors: Kohsei Matsutani, Shota Takashiro, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo |
Source: ArXiv AI | 28-09-25
Grounding AI Explanations in Experience: A Reflective Cognitive Architecture for Clinical Decision Support
Authors: Zijian Shao, Haiyang Shen, Mugeng Liu, Gecheng Fu, Yaoqi Guo, Yanfeng Wang, Yun Ma |
Source: ArXiv AI | 28-09-25
What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns
Authors: Stefan Szeider |
Source: ArXiv AI | 28-09-25
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Authors: Kaiyang Wan, Lang Gao, Honglin Mu, Preslav Nakov, Yuxia Wang, Xiuying Chen |
Source: ArXiv AI | 28-09-25
Distributed Specialization: Rare-Token Neurons in Large Language Models
Authors: Jing Liu, Haozheng Wang, Yueheng Li |
Source: ArXiv AI | 28-09-25
LLM Observability in the Wild – Why OpenTelemetry Should Be the Standard (signoz.io)
Source: Hacker News | 28-09-25
US Military struggling to deploy AI weapons (msn.com)
Source: Hacker News | 28-09-25
Just How Resilient Are Large Language Models? (rdrocket.com)
Source: Hacker News | 28-09-25
Suno Studio, a Generative AI DAW (suno.com)
Source: Hacker News | 27-09-25
GPT-OSS Reinforcement Learning (unsloth.ai)
Source: Hacker News | 27-09-25
Cracker Barrel Outrage Was Almost Certainly Driven by Bots, Researchers Say (gizmodo.com)
Source: Hacker News | 27-09-25
OpenAI's new Pulse feature lets ChatGPT start the conversation
Source: The Decoder | 27-09-25
Gauntlet AI (YC S17) is looking for engineers who want to master AI (gauntletai.com)
Source: Hacker News | 27-09-25
Microsoft's VibeVoice is a new AI podcast model that might generate spontaneous singing
Source: The Decoder | 27-09-25
Alibaba launches Qwen3-Max, its largest and most capable AI model to date
Source: The Decoder | 26-09-25
Bit is all we need: binary normalized neural networks (arxiv.org)
Source: Hacker News | 26-09-25
Improved Gemini 2.5 Flash and Flash-Lite (googleblog.com)
Source: Hacker News | 26-09-25
ChatGPT Pulse (openai.com)
Source: Hacker News | 26-09-25
Pairing with Claude Code to rebuild my startup's website (nseldeib.com)
Source: Hacker News | 26-09-25
SAP and OpenAI plan to launch an AI platform for Germany's public sector using Microsoft Azure
Source: The Decoder | 26-09-25
The Wind, a Pole, and the Dragon (entropicthoughts.com)
Source: Hacker News | 26-09-25
Sam Altman says scaling up compute is the "literal key" to OpenAI's revenue growth
Source: The Decoder | 25-09-25
Alibaba unveils Qwen3-Omni, an AI model that processes text, images, audio, and video
Source: The Decoder | 25-09-25
Notion AI agents get security update after potential data leak
Source: The Decoder | 25-09-25
Learning Persian with Anki, ChatGPT and YouTube (cjauvin.github.io)
Source: Hacker News | 25-09-25
Snapdragon X2 Elite ARM Laptop CPU (qualcomm.com)
Source: Hacker News | 25-09-25
Effect Systems vs. Print Debugging: A Pragmatic Solution (flix.dev)
Source: Hacker News | 25-09-25
Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks
Authors: Hailay Kidu Teklehaymanot, Gebrearegawi Gidey, Wolfgang Nejdl |
Source: ArXiv AI | 25-09-25
Play by the Type Rules: Inferring Constraints for LLM Functions in Declarative Programs
Authors: Parker Glenn, Alfy Samuel, Daben Liu |
Source: ArXiv AI | 25-09-25
STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation
Authors: Tanmay Khule, Stefan Marksteiner, Jose Alguindigue, Hannes Fuchs, Sebastian Fischmeister, Apurva Narayan |
Source: ArXiv AI | 25-09-25
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
Authors: Wenhan Wu, Zheyuan Liu, Chongyang Gao, Ren Wang, Kaize Ding |
Source: ArXiv AI | 25-09-25
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Authors: Deokjae Lee, Hyun Oh Song |
Source: ArXiv AI | 25-09-25
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
Authors: Benjamin Feuer, Chiung-Yi Tseng, Astitwa Sarthak Lathe, Oussama Elachqar, John P Dickerson |
Source: ArXiv AI | 25-09-25
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
Authors: Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, Kaushik Dutta |
Source: ArXiv AI | 25-09-25
DRES: Benchmarking LLMs for Disfluency Removal
Authors: Maria Teleki, Sai Janjur, Haoran Liu, Oliver Grabner, Ketan Verma, Thomas Docog, Xiangjue Dong, Lingfeng Shi, Cong Wang, Stephanie Birkelbach, Jason Kim, Yin Zhang, James Caverlee |
Source: ArXiv AI | 25-09-25
Uncovering Graph Reasoning in Decoder-only Transformers with Circuit Tracing
Authors: Xinnan Dai, Chung-Hsiang Lo, Kai Guo, Shenglai Zeng, Dongsheng Luo, Jiliang Tang |
Source: ArXiv AI | 25-09-25
EmbeddingGemma: Powerful and Lightweight Text Representations
Authors: Henrique Schechter Vera, Sahil Dua, Biao Zhang, Daniel Salz, Ryan Mullins, Sindhu Raghuram Panyam, Sara Smoot, Iftekhar Naim, Joe Zou, Feiyang Chen, Daniel Cer, Alice Lisak, Min Choi, Lucas Gonzalez, Omar Sanseviero, Glenn Cameron, Ian Ballantyne, Kat Black, Kaifeng Chen, Weiyi Wang, Zhe Li, Gus Martins, Jinhyuk Lee, Mark Sherwood, Juyeong Ji, Renjie Wu, Jingxiao Zheng, Jyotinder Singh, Abheesht Sharma, Divya Sreepat, Aashi Jain, Adham Elarabawy, AJ Co, Andreas Doumanoglou, Babak Samari, Ben Hora, Brian Potetz, Dahun Kim, Enrique Alfonseca, Fedor Moiseev, Feng Han, Frank Palma Gomez, Gustavo Hernández Ábrego, Hesen Zhang, Hui Hui, Jay Han, Karan Gill, Ke Chen, Koert Chen, Madhuri Shanbhogue, Michael Boratko, Paul Suganthan, Sai Meher Karthik Duddu, Sandeep Mariserla, Setareh Ariafar, Shanfeng Zhang, Shijie Zhang, Simon Baumgartner, Sonam Goenka, Steve Qiu, Tanmaya Dabral, Trevor Walker, Vikram Rao, Waleed Khawaja, Wenlei Zhou, Xiaoqi Ren, Ye Xia, Yichang Chen, Yi-Ting Chen, Zhe Dong, Zhongli Ding, Francesco Visin, Gaël Liu, Jiageng Zhang, Kathleen Kenealy, Michelle Casbon, Ravin Kumar, Thomas Mesnard, Zach Gleicher, Cormac Brick, Olivier Lacombe, Adam Roberts, Yunhsuan Sung, Raphael Hoffmann, Tris Warkentin, Armand Joulin, Tom Duerig, Mojtaba Seyedhosseini |
Source: ArXiv AI | 25-09-25
Estimating the Self-Consistency of LLMs
Authors: Robert Nowak |
Source: ArXiv AI | 25-09-25
Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
Authors: Sai Teja Reddy Adapala |
Source: ArXiv AI | 25-09-25
What Does Your Benchmark Really Measure? A Framework for Robust Inference of AI Capabilities
Authors: Nathanael Jo, Ashia Wilson |
Source: ArXiv AI | 25-09-25
Embodied AI: From LLMs to World Models
Authors: Tongtong Feng, Xin Wang, Yu-Gang Jiang, Wenwu Zhu |
Source: ArXiv AI | 25-09-25
CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain
Authors: Ajeet Kumar Singh, Rajsabi Surya, Anurag Tripathi, Santanu Choudhury, Sudhir Bisane |
Source: ArXiv AI | 25-09-25
MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM
Authors: Wenliang Li, Rui Yan, Xu Zhang, Li Chen, Hongji Zhu, Jing Zhao, Junjun Li, Mengru Li, Wei Cao, Zihang Jiang, Wei Wei, Kun Zhang, Shaohua Kevin Zhou |
Source: ArXiv AI | 25-09-25
PEPS: Quantum-Inspired Reinforcement Learning for Coherent Reasoning Traces in LLMs
Authors: Venkat Margapuri, Garik Kazanjian, Naren Kosaraju |
Source: ArXiv AI | 25-09-25
Scan-do Attitude: Towards Autonomous CT Protocol Management using a Large Language Model Agent
Authors: Xingjian Kang, Linda Vorberg, Andreas Maier, Alexander Katzmann, Oliver Taubmann |
Source: ArXiv AI | 25-09-25
OpenAI and Nvidia announce 10-gigawatt partnership for AI infrastructure
Source: The Decoder | 25-09-25
The "Wage Level" Mirage: H-1B proposal could help outsourcers and hurt US talent (ifp.org)
Source: Hacker News | 25-09-25
ChatGPT's Deep Research mode let attackers steal Gmail data with hidden instructions in emails
Source: The Decoder | 24-09-25
Semianalysis says Colossus 2 puts xAI ahead of Meta and Anthropic, but OpenAI stays ahead
Source: The Decoder | 24-09-25
Getting AI to work in complex codebases (github.com/humanlayer)
Source: Hacker News | 24-09-25
A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services
Authors: Guanzhong Pan, Haibo Wang |
Source: ArXiv AI | 24-09-25
SPADE: A Large Language Model Framework for Soil Moisture Pattern Recognition and Anomaly Detection in Precision Agriculture
Authors: Yeonju Lee, Rui Qi Chen, Joseph Oboamah, Po Nien Su, Wei-zhen Liang, Yeyin Shi, Lu Gan, Yongsheng Chen, Xin Qiao, Jing Li |
Source: ArXiv AI | 24-09-25
Large Language Models and Operations Research: A Structured Survey
Authors: Yang Wang, Kai Li |
Source: ArXiv AI | 24-09-25
Synthesizing Attitudes, Predicting Actions (SAPA): Behavioral Theory-Guided LLMs for Ridesourcing Mode Choice Modeling
Authors: Mustafa Sameen, Xiaojian Zhang, Xilei Zhao |
Source: ArXiv AI | 24-09-25
Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models
Authors: Dingxin Lu, Shurui Wu, Xinyi Huang |
Source: ArXiv AI | 24-09-25
ATLAS: Benchmarking and Adapting LLMs for Global Trade via Harmonized Tariff Code Classification
Authors: Pritish Yuvraj, Siva Devarakonda |
Source: ArXiv AI | 24-09-25
Gödel Test: Can Large Language Models Solve Easy Conjectures?
Authors: Moran Feldman, Amin Karbasi |
Source: ArXiv AI | 24-09-25
LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs
Authors: Tom Pawelek, Raj Patel, Charlotte Crowell, Noorbakhsh Amiri, Sudip Mittal, Shahram Rahimi, Andy Perkins |
Source: ArXiv AI | 24-09-25
Instruction-Following Evaluation in Function Calling for Large Language Models
Authors: Nikolai Skripko |
Source: ArXiv AI | 24-09-25
TERAG: Token-Efficient Graph-Based Retrieval-Augmented Generation
Authors: Qiao Xiao, Hong Ting Tsang, Jiaxin Bai |
Source: ArXiv AI | 24-09-25
Advances in Large Language Models for Medicine
Authors: Zhiyu Kan, Wensheng Gan, Zhenlian Qi, Philip S. Yu |
Source: ArXiv AI | 24-09-25
Bounded PCTL Model Checking of Large Language Model Outputs
Authors: Dennis Gross, Helge Spieker, Arnaud Gotlieb |
Source: ArXiv AI | 24-09-25
Experience Scaling: Post-Deployment Evolution For Large Language Models
Authors: Xingkun Yin, Kaibin Huang, Dong In Kim, Hongyang Du |
Source: ArXiv AI | 24-09-25
Model selection meets clinical semantics: Optimizing ICD-10-CM prediction via LLM-as-Judge evaluation, redundancy-aware sampling, and section-aware fine-tuning
Authors: Hong-Jie Dai, Zheng-Hao Li, An-Tai Lu, Bo-Tsz Shain, Ming-Ta Li, Tatheer Hussain Mir, Kuang-Te Wang, Min-I Su, Pei-Kang Liu, Ming-Ju Tsai |
Source: ArXiv AI | 24-09-25
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
Authors: Dianxing Zhang, Wendong Li, Kani Song, Jiaye Lu, Gang Li, Liuchun Yang, Sheng Li |
Source: ArXiv AI | 24-09-25
Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning
Authors: Xiao Han, Zimo Zhao, Wanyu Wang, Maolin Wang, Zitao Liu, Yi Chang, Xiangyu Zhao |
Source: ArXiv AI | 24-09-25
From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system
Authors: Maxime Manderlier, Fabian Lecron, Olivier Vu Thanh, Nicolas Gillis |
Source: ArXiv AI | 24-09-25
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
Authors: Xixun Lin, Yucheng Ning, Jingwen Zhang, Yan Dong, Yilong Liu, Yongxuan Wu, Xiaohua Qi, Nan Sun, Yanmin Shang, Pengfei Cao, Lixin Zou, Xu Chen, Chuan Zhou, Jia Wu, Shirui Pan, Bin Wang, Yanan Cao, Kai Chen, Songlin Hu, Li Guo |
Source: ArXiv AI | 24-09-25
AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration
Authors: Chunhao Tian, Yutong Wang, Xuebo Liu, Zhexuan Wang, Liang Ding, Miao Zhang, Min Zhang |
Source: ArXiv AI | 24-09-25
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
Authors: Saeed Almheiri, Rania Hossam, Mena Attia, Chenxi Wang, Preslav Nakov, Timothy Baldwin, Fajri Koto |
Source: ArXiv AI | 24-09-25
From MCP to shell: MCP auth flaws enable RCE in Claude Code, Gemini CLI and more (verialabs.com)
Source: Hacker News | 24-09-25
Launch HN: Strata (YC X25) – One MCP server for AI to handle thousands of tools
Source: Hacker News | 24-09-25
Context Engineering for AI Agents: Lessons (manus.im)
Source: Hacker News | 24-09-25
Stanford and Arc Institute scientists used AI to design new viruses that killed bacteria in the lab
Source: The Decoder | 23-09-25
Paper2Agent: Stanford Reimagining Research Papers as Interactive AI Agents (arxiv.org)
Source: Hacker News | 23-09-25
Qwen3-Omni: Native Omni AI model for text, image and video (github.com/qwenlm)
Source: Hacker News | 23-09-25
I built a dual RTX 3090 rig for local AI in 2025 (and lessons learned) (llamabuilds.ai)
Source: Hacker News | 23-09-25
Structured Outputs in LLMs (parthsareen.com)
Source: Hacker News | 23-09-25
Question Answering with LLMs and Learning from Answer Sets
Authors: Manuel Borroto, Katie Gallagher, Antonio Ielo, Irfan Kareem, Francesco Ricca, Alessandra Russo |
Source: ArXiv AI | 23-09-25
FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs
Authors: Debarpan Bhattacharya, Apoorva Kulkarni, Sriram Ganapathy |
Source: ArXiv AI | 23-09-25
Large Language Models as End-to-end Combinatorial Optimization Solvers
Authors: Xia Jiang, Yaoxin Wu, Minshuo Li, Zhiguang Cao, Yingqian Zhang |
Source: ArXiv AI | 23-09-25
Roundtable Policy: Improving Scientific Reasoning and Narratives through Confidence-Weighted Consensus of LLMs
Authors: Yu Yao, Jiayi Dong, Ju Li, Yang Yang, Yilun Du |
Source: ArXiv AI | 23-09-25
LLMs as Layout Designers: A Spatial Reasoning Perspective
Authors: Sha Li |
Source: ArXiv AI | 23-09-25
seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs
Authors: Mohammad Ramezanali, Mo Vazifeh, Paolo Santi |
Source: ArXiv AI | 23-09-25
RALLM-POI: Retrieval-Augmented LLM for Zero-shot Next POI Recommendation with Geographical Reranking
Authors: Kunrong Li, Kwan Hui Lim |
Source: ArXiv AI | 23-09-25
LLaVul: A Multimodal LLM for Interpretable Vulnerability Reasoning about Source Code
Authors: Ala Jararweh, Michael Adams, Avinash Sahu, Abdullah Mueen, Afsah Anwar |
Source: ArXiv AI | 23-09-25
CogAtom: From Cognitive Atoms to Olympiad-level Mathematical Reasoning in Large Language Models
Authors: Zhuofan Chen, Jiyuan He, Yichi Zhang, Xing Hu, Haoxing Wen, Jun Bai, Wenge Rong |
Source: ArXiv AI | 23-09-25
Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B
Authors: Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Treleaven |
Source: ArXiv AI | 23-09-25
Can Agents Judge Systematic Reviews Like Humans? Evaluating SLRs with LLM-based Multi-Agent System
Authors: Abdullah Mushtaq, Muhammad Rafay Naeem, Ibrahim Ghaznavi, Alaa Abd-alrazaq, Aliya Tabassum, Junaid Qadir |
Source: ArXiv AI | 23-09-25
Correlation or Causation: Analyzing the Causal Structures of LLM and LRM Reasoning Process
Authors: Zhizhang FU, Guangsheng Bao, Hongbo Zhang, Chenkai Hu, Yue Zhang |
Source: ArXiv AI | 23-09-25
Multi-Scenario Highway Lane-Change Intention Prediction: A Physics-Informed AI Framework for Three-Class Classification
Authors: Jiazhao Shi, Yichen Lin, Yiheng Hua, Ziyu Wang, Zijian Zhang, Wenjia Zheng, Yun Song, Kuan Lu, Shoufeng Lu |
Source: ArXiv AI | 23-09-25
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation
Authors: Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed |
Source: ArXiv AI | 23-09-25
Evaluating Multimodal Large Language Models with Daily Composite Tasks in Home Environments
Authors: Zhenliang Zhang, Yuxi Wang, Hongzhao Xie, Shiyun Zhao, Mingyuan Liu, Yujie Lu, Xinyi He, Zhenku Cheng, Yujia Peng |
Source: ArXiv AI | 23-09-25
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
Authors: Xiyuan Zhou, Xinlei Wang, Yirui He, Yang Wu, Ruixi Zou, Yuheng Cheng, Yulu Xie, Wenxuan Liu, Huan Zhao, Yan Xu, Jinjin Gu, Junhua Zhao |
Source: ArXiv AI | 23-09-25
"I think this is fair": Uncovering the Complexities of Stakeholder Decision-Making in AI Fairness Assessment
Authors: Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf |
Source: ArXiv AI | 23-09-25
The STAR-XAI Protocol: An Interactive Framework for Inducing Second-Order Agency in AI Agents
Authors: Antoni Guasch, Maria Isabel Valdez |
Source: ArXiv AI | 23-09-25
Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates
Authors: Hy Dang, Tianyi Liu, Zhuofeng Wu, Jingfeng Yang, Haoming Jiang, Tao Yang, Pei Chen, Zhengyang Wang, Helen Wang, Huasheng Li, Bing Yin, Meng Jiang |
Source: ArXiv AI | 23-09-25
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
Authors: Valentin Lacombe, Valentin Quesnel, Damien Sileo |
Source: ArXiv AI | 23-09-25
OpenAI taps Apple talent and suppliers for AI hardware push
Source: The Decoder | 23-09-25
From Data to Diagnosis: A Large, Comprehensive Bone Marrow Dataset and AI Methods for Childhood Leukemia Prediction
Authors: Henning Höfener (1), Farina Kock (1), Martina Pontones (2), Tabita Ghete (2 and 3), David Pfrang (1), Nicholas Dickel (4), Meik Kunz (4), Daniela P. Schacherer (1), David A. Clunie (5), Andrey Fedorov (6), Max Westphal (1), Markus Metzler (2 and 3 and 7) ((1) Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany, (2) Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany, (3) Bavarian Cancer Research Center (BZKF), Erlangen, Germany, (4) Medical Informatics, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany, (5) PixelMed Publishing LLC, Bangor, PA, USA, (6) Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA, (7) Comprehensive Cancer Center Erlangen-EMN, Erlangen, Germany) |
Source: ArXiv AI | 23-09-25
Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement
Authors: Gang Yang, Yue Lei, Wenxin Tai, Jin Wu, Jia Chen, Ting Zhong, Fan Zhou |
Source: ArXiv AI | 23-09-25
MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework
Authors: Tianyu Li, Yan Xin, Jianzhong (Charlie) Zhang |
Source: ArXiv AI | 23-09-25
Explainable AI for Maritime Autonomous Surface Ships (MASS): Adaptive Interfaces and Trustworthy Human-AI Collaboration
Authors: Zhuoyue Zhang, Haitong Xu |
Source: ArXiv AI | 23-09-25
BEFT: Bias-Efficient Fine-Tuning of Language Models
Authors: Baichuan Huang, Ananth Balashankar, Amir Aminifar |
Source: ArXiv AI | 23-09-25
Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses
Authors: Fangyi Yu, Nabeel Seedat, Dasha Herrmannova, Frank Schilder, Jonathan Richard Schwarz |
Source: ArXiv AI | 23-09-25
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Authors: Pengteng Li, Pinhao Song, Wuyang Li, Weiyu Guo, Huizai Yao, Yijie Xu, Dugang Liu, Hui Xiong |
Source: ArXiv AI | 23-09-25
Communications to Circulations: 3D Wind Field Retrieval and Real-Time Prediction Using 5G GNSS Signals and Deep Learning
Authors: Yuchen Ye, Hong Liang, Chaoxia Yuan, Mingyu Li, Aoqi Zhou, Chunqing Shang, Hua Cai, Peixi Liu, Kezuan Wang, Yifeng Zheng |
Source: ArXiv AI | 23-09-25
CultureScope: A Dimensional Lens for Probing Cultural Understanding in LLMs
Authors: Jinghao Zhang, Sihang Jiang, Shiwei Guo, Shisong Chen, Yanghua Xiao, Hongwei Feng, Jiaqing Liang, Minggui HE, Shimin Tao, Hongxia Ma |
Source: ArXiv AI | 23-09-25
An Artificial Intelligence Driven Semantic Similarity-Based Pipeline for Rapid Literature
Authors: Abhiyan Dhakal (1), Kausik Paudel (1), Sanjog Sigdel (1) ((1) Kathmandu University, Dhulikhel, Nepal) |
Source: ArXiv AI | 23-09-25
FragmentRetro: A Quadratic Retrosynthetic Method Based on Fragmentation Algorithms
Authors: Yu Shee, Anthony M. Smaldone, Anton Morgunov, Gregory W. Kyro, Victor S. Batista |
Source: ArXiv AI | 23-09-25
Knowledge-Driven Hallucination in Large Language Models: An Empirical Study on Process Modeling
Authors: Humam Kourani, Anton Antonov, Alessandro Berti, Wil M.P. van der Aalst |
Source: ArXiv AI | 23-09-25
MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents
Authors: Pan Tang, Shixiang Tang, Huanqi Pu, Zhiqing Miao, Zhixing Wang |
Source: ArXiv AI | 23-09-25
A Nascent Taxonomy of Machine Learning in Intelligent Robotic Process Automation
Authors: Lukas Laakmann, Seyyid A. Ciftci, Christian Janiesch |
Source: ArXiv AI | 23-09-25
EHR-MCP: Real-world Evaluation of Clinical Information Retrieval by Large Language Models via Model Context Protocol
Authors: Kanato Masayoshi, Masahiro Hashimoto, Ryoichi Yokoyama, Naoki Toda, Yoshifumi Uwamino, Shogo Fukuda, Ho Namkoong, Masahiro Jinzaki |
Source: ArXiv AI | 23-09-25
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Authors: Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb |
Source: ArXiv AI | 23-09-25
CompileBench: Can AI Compile 22-year-old Code? (quesma.com)
Source: Hacker News | 23-09-25
OpenAI and Nvidia announce partnership to deploy 10GW of Nvidia systems (openai.com)
Source: Hacker News | 23-09-25
Notion 3.0’s new AI agents can be tricked into leaking data through a malicious PDF
Source: The Decoder | 22-09-25
Lightweight, highly accurate line and paragraph detection (arxiv.org)
Source: Hacker News | 22-09-25
How can I influence others without manipulating them? (andiroberts.com)
Source: Hacker News | 22-09-25
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof (arxiv.org)
Source: Hacker News | 22-09-25
You did this with an AI and you do not understand what you're doing here (hackerone.com)
Source: Hacker News | 22-09-25
Bringing Observability to Claude Code: OpenTelemetry in Action (signoz.io)
Source: Hacker News | 22-09-25
Unified Line and Paragraph Detection by Graph Convolutional Networks (2022) (arxiv.org)
Source: Hacker News | 22-09-25
Be Careful with Go Struct Embedding (mattjhall.co.uk)
Source: Hacker News | 22-09-25
LLMs are still surprisingly bad at some simple tasks (shkspr.mobi)
Source: Hacker News | 21-09-25
Blockchain-Enabled Explainable AI for Trusted Healthcare Systems
Authors: Md Talha Mohsin |
Source: ArXiv AI | 21-09-25
CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models
Authors: Thomas Huber, Christina Niklaus |
Source: ArXiv AI | 21-09-25
Attention Beyond Neighborhoods: Reviving Transformer for Graph Clustering
Authors: Xuanting Xie, Bingheng Li, Erlin Pan, Rui Hou, Wenyu Chen, Zhao Kang |
Source: ArXiv AI | 21-09-25
TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action
Authors: Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo, Kiril Gashteovski, Jonathan Fürst |
Source: ArXiv AI | 21-09-25
Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
Authors: Yutong Liu, Ziyue Zhang, Yongbin Yu, Xiangxiang Wang, Yuqing Cai, Nyima Tashi |
Source: ArXiv AI | 21-09-25
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Authors: Yeongbin Seo, Dongha Lee, Jaehyung Kim, Jinyoung Yeo |
Source: ArXiv AI | 21-09-25
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Authors: Huy Nghiem, Advik Sachdeva, Hal Daumé III |
Source: ArXiv AI | 21-09-25
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Authors: Aarushi Mahajan, Wayne Burleson |
Source: ArXiv AI | 21-09-25
Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
Authors: Haobo Yang, Minghao Guo, Dequan Yang, Wenyu Wang |
Source: ArXiv AI | 21-09-25
From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Authors: Lanxiao Huang, Daksh Dave, Ming Jin, Tyler Cody, Peter Beling |
Source: ArXiv AI | 21-09-25
FlowRL: Matching Reward Distributions for LLM Reasoning
Authors: Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin |
Source: ArXiv AI | 21-09-25
Rationality Check! Benchmarking the Rationality of Large Language Models
Authors: Zhilun Zhou, Jing Yi Wang, Nicholas Sukiennik, Chen Gao, Fengli Xu, Yong Li, James Evans |
Source: ArXiv AI | 21-09-25
VCBench: Benchmarking LLMs in Venture Capital
Authors: Rick Chen, Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu, Fuat Alican, Yigit Ihlamur |
Source: ArXiv AI | 21-09-25
The NazoNazo Benchmark: A Cost-Effective and Extensible Test of Insight-Based Reasoning in LLMs
Authors: Masaharu Mizumoto, Dat Nguyen, Zhiheng Han, Jiyuan Fang, Heyuan Guan, Xingfu Li, Naoya Shiraishi, Xuyang Tian, Yo Nakawake, Le Minh Nguyen |
Source: ArXiv AI | 21-09-25
Explainable AI for Infection Prevention and Control: Modeling CPE Acquisition and Patient Outcomes in an Irish Hospital with Transformers
Authors: Minh-Khoi Pham, Tai Tan Mai, Martin Crane, Rob Brennan, Marie E. Ward, Una Geary, Declan Byrne, Brian O Connell, Colm Bergin, Donncha Creagh, Nick McDonald, Marija Bezbradica |
Source: ArXiv AI | 21-09-25
Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems
Authors: Diego Gosmar, Deborah A. Dahl |
Source: ArXiv AI | 21-09-25
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making
Authors: Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie |
Source: ArXiv AI | 21-09-25
From Sea to System: Exploring User-Centered Explainable AI for Maritime Decision Support
Authors: Doreen Jirak, Pieter Maes, Armeen Saroukanoff, Dirk van Rooy |
Source: ArXiv AI | 21-09-25
Calibrated Generative AI as Meta-Reviewer: A Systemic Functional Linguistics Discourse Analysis of Reviews of Peer Reviews
Authors: Gabriela C. Zapata, Bill Cope, Mary Kalantzis, Duane Searsmith |
Source: ArXiv AI | 21-09-25
Living microbial cement supercapacitors with reactivatable energy storage (cell.com)
Source: Hacker News | 21-09-25
Claude can sometimes prove it (galois.com)
Source: Hacker News | 21-09-25
Google adds Gemini AI upgrades to Chrome
Source: The Decoder | 20-09-25
If you are good at code review, you will be good at using AI agents (seangoedecke.com)
Source: Hacker News | 20-09-25
Supporting Our AI Overlords: Redesigning Data Systems to Be Agent-First (arxiv.org)
Source: Hacker News | 20-09-25
Hidden risk in Notion 3.0 AI agents: Web search tool abuse for data exfiltration (codeintegrity.ai)
Source: Hacker News | 20-09-25
High-performance read-through cache for object storage (github.com/s2-streamstore)
Source: Hacker News | 20-09-25
LLM-Deflate: Extracting LLMs into Datasets (scalarlm.com)
Source: Hacker News | 20-09-25
Overcoming barriers of hydrogen storage with a low-temperature hydrogen battery (isct.ac.jp)
Source: Hacker News | 20-09-25
Luma AI unveils Ray3, a generative video model with HDR support
阅读更多来源: The Decoder | 20-09-25
An untidy history of AI across four books (hedgehogreview.com)
阅读更多来源: Hacker News | 20-09-25
Sylvia Plath's fig tree meets machine learning (dontlognow.substack.com)
阅读更多来源: Hacker News | 19-09-25
Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs (github.com/hiyouga)
阅读更多来源: Hacker News | 19-09-25
Grief gets an expiration date, just like us (bessstillman.substack.com)
阅读更多来源: Hacker News | 19-09-25
Dynamo AI (YC W22) Is Hiring a Senior Kubernetes Engineer (ycombinator.com)
阅读更多来源: Hacker News | 19-09-25
Gemini in Chrome (gemini.google)
阅读更多来源: Hacker News | 19-09-25
Gemini 2.5 Deep Think achieves gold at the world’s leading student programming competition
阅读更多来源: The Decoder | 19-09-25
Anthropic restricts surveillance use of Claude models, fueling tensions in Washington
阅读更多来源: The Decoder | 19-09-25
Study cautions that monitoring chains of thought soon may no longer ensure genuine AI alignment
阅读更多来源: The Decoder | 19-09-25
Anthropic explains recent Claude quality drop: three technical failures to blame
阅读更多来源: The Decoder | 19-09-25
Launch HN: Cactus (YC S25) – AI inference on smartphones (github.com/cactus-compute)
阅读更多来源: Hacker News | 19-09-25
US tech giants invest in UK, Microsoft commits billions, OpenAI launches 'Stargate UK'
阅读更多来源: The Decoder | 18-09-25
OpenAI will automatically restrict ChatGPT access for users identified as teenagers
阅读更多来源: The Decoder | 18-09-25
YouTube adds generative AI to Shorts and podcasts
阅读更多来源: The Decoder | 18-09-25
LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology
Authors: Renan Souza, Timothy Poteet, Brian Etz, Daniel Rosendo, Amal Gueroudji, Woong Shin, Prasanna Balaprakash, Rafael Ferreira da Silva |
阅读更多来源: ArXiv AI | 18-09-25
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
Authors: Kevin Wilkinghoff, Zheng-Hua Tan |
阅读更多来源: ArXiv AI | 18-09-25
Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs
Authors: Akhil Theerthala |
阅读更多来源: ArXiv AI | 18-09-25
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Authors: Alejandro Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert, Barna Pasztor, Bettina Messmer, Dhia Garbaya, Eduard Frank Ďurech, Ido Hakimi, Juan García Giraldo, Mete Ismayilzada, Negar Foroutan, Skander Moalla, Tiancheng Chen, Vinko Sabolčec, Yixuan Xu, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, Ilia Badanin, Harold Benoit, Emanuela Boros, Nicholas Browning, Fabian Bösch, Maximilian Böther, Niklas Canova, Camille Challier, Clement Charmillot, Jonathan Coles, Jan Deriu, Arnout Devos, Lukas Drescher, Daniil Dzenhaliou, Maud Ehrmann, Dongyang Fan, Simin Fan, Silin Gao, Miguel Gila, María Grandury, Diba Hashemi, Alexander Hoyle, Jiaming Jiang, Mark Klein, Andrei Kucharavy, Anastasiia Kucherenko, Frederike Lübeck, Roman Machacek, Theofilos Manitaras, Andreas Marfurt, Kyle Matoba, Simon Matrenok, Henrique Mendonça, Fawzi Roberto Mohamed, Syrielle Montariol, Luca Mouchel, Sven Najem-Meyer, Jingwei Ni, Gennaro Oliva, Matteo Pagliardini, Elia Palme, Andrei Panferov, Léo Paoletti, Marco Passerini, Ivan Pavlov, Auguste Poiroux, Kaustubh Ponkshe, Nathan Ranchin, Javi Rando, Mathieu Sauser, Jakhongir Saydaliev, Muhammad Ali Sayfiddinov, Marian Schneider, Stefano Schuppli, Marco Scialanga, Andrei Semenov, Kumar Shridhar, Raghav Singhal, Anna Sotnikova, Alexander Sternfeld, Ayush Kumar Tarun, Paul Teiletche, Jannis Vamvas, Xiaozhe Yao, Hao Zhao, Alexander Ilic, Ana Klimovic, Andreas Krause, Caglar Gulcehre, David Rosenthal, Elliott Ash, Florian Tramèr, Joost VandeVondele, Livio Veraldi, Martin Rajman, Thomas Schulthess, Torsten Hoefler, Antoine Bosselut, Martin Jaggi, Imanol Schlag |
阅读更多来源: ArXiv AI | 18-09-25
A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Authors: Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher) |
阅读更多来源: ArXiv AI | 18-09-25
Position: AI Safety Must Embrace an Antifragile Perspective
Authors: Ming Jin, Hyunin Lee |
阅读更多来源: ArXiv AI | 18-09-25
Evaluation Awareness Scales Predictably in Open-Weights Large Language Models
Authors: Maheep Chaudhary, Ian Su, Nikhil Hooda, Nishith Shankar, Julia Tan, Kevin Zhu, Ashwinee Panda, Ryan Lagasse, Vasu Sharma |
阅读更多来源: ArXiv AI | 18-09-25
Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning
Authors: Anis Koubaa, Khaled Gabr |
阅读更多来源: ArXiv AI | 18-09-25
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning
Authors: Pulkit Verma, Ngoc La, Anthony Favier, Swaroop Mishra, Julie A. Shah |
阅读更多来源: ArXiv AI | 18-09-25
Gen AI in Proof-based Math Courses: A Pilot Study
Authors: Hannah Klawa, Shraddha Rajpal, Cigole Thomas |
阅读更多来源: ArXiv AI | 18-09-25
SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
Authors: Vincent Siu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang |
阅读更多来源: ArXiv AI | 18-09-25
Exploring Major Transitions in the Evolution of Biological Cognition With Artificial Neural Networks
Authors: Konstantinos Voudouris, Andrew Barron, Marta Halina, Colin Klein, Matishalin Patel |
阅读更多来源: ArXiv AI | 18-09-25
MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation
Authors: Zhipeng Bian, Jieming Zhu, Xuyang Xie, Quanyu Dai, Zhou Zhao, Zhenhua Dong |
阅读更多来源: ArXiv AI | 18-09-25
Orange Pi RV2 $40 RISC-V SBC: Friendly Gateway to IoT and AI Projects (riscv.org)
阅读更多来源: Hacker News | 18-09-25
A postmortem of three recent issues (anthropic.com)
阅读更多来源: Hacker News | 18-09-25
John Grisham Still Wonders: Will Texas Kill Robert Roberson? (dmagazine.com)
阅读更多来源: Hacker News | 18-09-25
60 years after Gemini, newly processed images reveal details (arstechnica.com)
阅读更多来源: Hacker News | 18-09-25
Tau² benchmark: How a prompt rewrite boosted GPT-5-mini by 22% (quesma.com)
阅读更多来源: Hacker News | 18-09-25
Alibaba's new AI chip: Key specifications comparable to H20 (futunn.com)
阅读更多来源: Hacker News | 18-09-25
DeepMind and OpenAI win gold at ICPC (codeforces.com)
阅读更多来源: Hacker News | 18-09-25
OpenAI releases GPT-5 Codex designed for bug fixes and code generation
阅读更多来源: The Decoder | 17-09-25
Bertrand Russell to Oswald Mosley (1962) (lettersofnote.com)
阅读更多来源: Hacker News | 17-09-25
Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
Authors: Aniket Didolkar, Nicolas Ballas, Sanjeev Arora, Anirudh Goyal |
阅读更多来源: ArXiv AI | 17-09-25
AIssistant: An Agentic Approach for Human–AI Collaborative Scientific Work on Reviews and Perspectives in Machine Learning
Authors: Sasi Kiran Gaddipati, Farhana Keya, Gollam Rabby, Sören Auer |
阅读更多来源: ArXiv AI | 17-09-25
Developing an aeroponic smart experimental greenhouse for controlling irrigation and plant disease detection using deep learning and IoT
Authors: Mohammadreza Narimani, Ali Hajiahmad, Ali Moghimi, Reza Alimardani, Shahin Rafiee, Amir Hossein Mirzabe |
阅读更多来源: ArXiv AI | 17-09-25
LLMAP: LLM-Assisted Multi-Objective Route Planning with User Preferences
Authors: Liangqi Yuan, Dong-Jun Han, Christopher G. Brinton, Sabine Brunswicker |
阅读更多来源: ArXiv AI | 17-09-25
zELO: ELO-inspired Training Method for Rerankers and Embedding Models
Authors: Nicholas Pipitone, Ghita Houir Alami, Advaith Avadhanam, Anton Kaminskyi, Ashley Khoo |
阅读更多来源: ArXiv AI | 17-09-25
Human + AI for Accelerating Ad Localization Evaluation
Authors: Harshit Rajgarhia, Shivali Dalmia, Mengyang Zhao, Mukherji Abhishek, Kiran Ganesh |
阅读更多来源: ArXiv AI | 17-09-25
DaSAThco: Data-Aware SAT Heuristics Combinations Optimization via Large Language Models
Authors: Minyu Chen, Guoqiang Li |
阅读更多来源: ArXiv AI | 17-09-25
Match Chat: Real Time Generative AI and Generative Computing for Tennis
Authors: Aaron Baughman, Gozde Akay, Eduardo Morales, Rahul Agarwal, Preetika Srivastava |
阅读更多来源: ArXiv AI | 17-09-25
ECG-aBcDe: Overcoming Model Dependence, Encoding ECG into a Universal Language for Any LLM
Authors: Yong Xia, Jingxuan Li, YeTeng Sun, Jiarui Bu |
阅读更多来源: ArXiv AI | 17-09-25
Large Language Models Imitate Logical Reasoning, but at what Cost?
Authors: Lachlan McGinness, Peter Baumgartner |
阅读更多来源: ArXiv AI | 17-09-25
Learn to Relax with Large Language Models: Solving Nonlinear Combinatorial Optimization Problems via Bidirectional Coevolution
Authors: Beidan Liu, Zhengqiu Zhu, Chen Gao, Yong Zhao, Wei Qi, Quanjun Yin |
阅读更多来源: ArXiv AI | 17-09-25
LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning
Authors: Jiaqi Wang, Binquan Ji, Haibo Luo, Yiyang Qi, Ruiting Li, Huiyan Wang, Yuantao Han, Cangyi Yang, jiaxu Zhang, Feiliang Ren |
阅读更多来源: ArXiv AI | 17-09-25
H²R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents
Authors: Shicheng Ye, Chao Yu, Kaiqiang Ke, Chengdong Xu, Yinqi Wei |
阅读更多来源: ArXiv AI | 17-09-25
Zero-shot Graph Reasoning via Retrieval Augmented Framework with LLMs
Authors: Hanqing Li, Kiran Sheena Jyothi, Henry Liang, Sharika Mahadevan, Diego Klabjan |
阅读更多来源: ArXiv AI | 17-09-25
Population Estimation using Deep Learning over Gandhinagar Urban Area
Authors: Jai Singla, Peal Jotania, Keivalya Pandya |
阅读更多来源: ArXiv AI | 17-09-25
Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities
Authors: Tairan Fu, David Campo-Nazareno, Javier Coronado-Blázquez, Javier Conde, Pedro Reviriego, Fabrizio Lombardi |
阅读更多来源: ArXiv AI | 17-09-25
Agentic AI for Financial Crime Compliance
Authors: Henrik Axelsen, Valdemar Licht, Jan Damsgaard |
阅读更多来源: ArXiv AI | 17-09-25
Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy
Authors: Nadim Barakat, William Lotter |
阅读更多来源: ArXiv AI | 17-09-25
A Scenario-Driven Cognitive Approach to Next-Generation AI Memory
Authors: Linyue Cai, Yuyang Cheng, Xiaoding Shao, Huiming Wang, Yong Zhao, Wei Zhang, Kang Li |
阅读更多来源: ArXiv AI | 17-09-25
"If Anyone Builds It, Everyone Dies" researchers warn as they call for global AI shutdown
阅读更多来源: The Decoder | 16-09-25
Launch HN: Trigger.dev (YC W23) – Open-source platform to build reliable AI apps
阅读更多来源: Hacker News | 16-09-25
How People Use ChatGPT [pdf] (cdn.openai.com)
阅读更多来源: Hacker News | 16-09-25
GPT-5-Codex (openai.com)
阅读更多来源: Hacker News | 16-09-25
Addendum to GPT-5 system card: GPT-5-Codex (openai.com)
阅读更多来源: Hacker News | 16-09-25
Robert Redford Has Died (nytimes.com)
阅读更多来源: Hacker News | 16-09-25
Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Authors: Sajjad Abdoli, Rudi Cilibrasi, Rima Al-Shikh |
阅读更多来源: ArXiv AI | 16-09-25
LLM Enhancement with Domain Expert Mental Model to Reduce LLM Hallucination with Causal Prompt Engineering
Authors: Boris Kovalerchuk, Brent D. Fegley |
阅读更多来源: ArXiv AI | 16-09-25
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
Authors: Seongho Joo, Hyukhun Koh, Kyomin Jung |
阅读更多来源: ArXiv AI | 16-09-25
Rethinking Human Preference Evaluation of LLM Rationales
Authors: Ziang Li, Manasi Ganti, Zixian Ma, Helena Vasconcelos, Qijia He, Ranjay Krishna |
阅读更多来源: ArXiv AI | 16-09-25
Enhancing Computational Cognitive Architectures with LLMs: A Case Study
Authors: Ron Sun |
阅读更多来源: ArXiv AI | 16-09-25
Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability
Authors: Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng |
阅读更多来源: ArXiv AI | 16-09-25
Difficulty-Aware Agent Orchestration in LLM-Powered Workflows
Authors: Jinwei Su, Yinghui Xia, Qizhen Lan, Xinyuan Song, Yang Jingsong, Lewei He, Tianyu Shi |
阅读更多来源: ArXiv AI | 16-09-25
Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble
Authors: Bingchen Wang, Zi-Yu Khoo, Bryan Kian Hsiang Low |
阅读更多来源: ArXiv AI | 16-09-25
Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications
Authors: Aadil Gani Ganie |
阅读更多来源: ArXiv AI | 16-09-25
MedicalOS: An LLM Agent based Operating System for Digital Healthcare
Authors: Jared Zhu, Junde Wu |
阅读更多来源: ArXiv AI | 16-09-25
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
Authors: Ching Chang, Yidan Shi, Defu Cao, Wei Yang, Jeehyun Hwang, Haixin Wang, Jiacheng Pang, Wei Wang, Yan Liu, Wen-Chih Peng, Tien-Fu Chen |
阅读更多来源: ArXiv AI | 16-09-25
Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework
Authors: Zhaolong Wu, Pu Luo, Jason Pui Yin Cheung, Teng Zhang |
阅读更多来源: ArXiv AI | 16-09-25
Agentic Temporal Graph of Reasoning with Multimodal Language Models: A Potential AI Aid to Healthcare
Authors: Susanta Mitra |
阅读更多来源: ArXiv AI | 16-09-25
When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models
Authors: Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li |
阅读更多来源: ArXiv AI | 16-09-25
Bridging Engineering and AI Planning through Model-Based Knowledge Transformation for the Validation of Automated Production System Variants
Authors: Hamied Nabizada, Lasse Beers, Alain Chahine, Felix Gehlhoff, Oliver Niggemann, Alexander Fay |
阅读更多来源: ArXiv AI | 16-09-25
JustEva: A Toolkit to Evaluate LLM Fairness in Legal Knowledge Inference
Authors: Zongyue Xue, Siyuan Zheng, Shaochun Wang, Yiran Hu, Shenran Wang, Yuxin Yao, Haitao Li, Qingyao Ai, Yiqun Liu, Yun Liu, Weixing Shen |
阅读更多来源: ArXiv AI | 16-09-25
Advancing Medical Artificial Intelligence Using a Century of Cases
Authors: Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltrán, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, Arjun K. Manrai |
阅读更多来源: ArXiv AI | 16-09-25
Established Psychometric vs. Ecologically Valid Questionnaires: Rethinking Psychological Assessments in Large Language Models
Authors: Dongmin Choi, Woojung Song, Jongwook Han, Eun-Ju Lee, Yohan Jo |
阅读更多来源: ArXiv AI | 16-09-25
Predictive Spike Timing Enables Distributed Shortest Path Computation in Spiking Neural Networks
Authors: Simen Storesund, Kristian Valset Aars, Robin Dietrich, Nicolai Waniek |
阅读更多来源: ArXiv AI | 16-09-25
Benchmark of stylistic variation in LLM-generated texts
Authors: Jiří Milička, Anna Marklová, Václav Cvrček |
阅读更多来源: ArXiv AI | 16-09-25
Population-Aligned Persona Generation for LLM-based Social Simulation
Authors: Zhengyu Hu, Zheyuan Xiao, Max Xiong, Yuxuan Lei, Tianfu Wang, Jianxun Lian, Kaize Ding, Ziang Xiao, Nicholas Jing Yuan, Xing Xie |
阅读更多来源: ArXiv AI | 16-09-25
I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation
Authors: Jordan Sassoon, Michal Szczepanski, Martyna Poreba |
阅读更多来源: ArXiv AI | 16-09-25
We Need a New Ethics for a World of AI Agents
Authors: Iason Gabriel, Geoff Keeling, Arianna Manzini, James Evans |
阅读更多来源: ArXiv AI | 16-09-25
SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion
Authors: Wenfang Wu, Tingting Yuan, Yupeng Li, Daling Wang, Xiaoming Fu |
阅读更多来源: ArXiv AI | 16-09-25
Openness in AI and downstream governance: A global value chain approach
Authors: Christopher Foster |
阅读更多来源: ArXiv AI | 16-09-25
LLMs as Agentic Cooperative Players in Multiplayer UNO
Authors: Yago Romano Matinez, Jesse Roberts |
阅读更多来源: ArXiv AI | 16-09-25
A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes
Authors: Jackson Eshbaugh, Chetan Tiwari, Jorge Silveyra |
阅读更多来源: ArXiv AI | 16-09-25
How well can LLMs provide planning feedback in grounded environments?
Authors: Yuxuan Li, Victor Zhong |
阅读更多来源: ArXiv AI | 16-09-25
The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science
Authors: Woong Shin, Renan Souza, Daniel Rosendo, Frédéric Suter, Feiyi Wang, Prasanna Balaprakash, Rafael Ferreira da Silva |
阅读更多来源: ArXiv AI | 16-09-25
AI Harmonics: a human-centric and harms severity-adaptive AI risk assessment framework
Authors: Sofia Vei, Paolo Giudici, Pavlos Sermpezis, Athena Vakali, Adelaide Emma Bernardelli |
阅读更多来源: ArXiv AI | 16-09-25
The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis
Authors: Eoin O'Doherty, Nicole Weinrauch, Andrew Talone, Uri Klempner, Xiaoyuan Yi, Xing Xie, Yi Zeng |
阅读更多来源: ArXiv AI | 16-09-25
RustGPT: A pure-Rust transformer LLM built from scratch (github.com/tekaratzas)
阅读更多来源: Hacker News | 16-09-25
"Aivilization" experiment lets over 22,000 AI agents model what future societies could become
阅读更多来源: The Decoder | 15-09-25
Even the best AI models can't reliably read the clock
阅读更多来源: The Decoder | 15-09-25
Sandboxing Browser AI Agents (earlence.com)
阅读更多来源: Hacker News | 15-09-25
Denmark's Justice Minister calls encrypted messaging a false civil liberty (mastodon.social)
阅读更多来源: Hacker News | 15-09-25
SpikingBrain 7B – More efficient than classic LLMs (github.com/biclab)
阅读更多来源: Hacker News | 15-09-25
Models of European metro stations (albertguillaumes.cat)
阅读更多来源: Hacker News | 15-09-25
Gentoo AI Policy (gentoo.org)
阅读更多来源: Hacker News | 15-09-25
GPT-5 dominated 210 Werewolf games with superior manipulation and strategic thinking
阅读更多来源: The Decoder | 14-09-25
California set to pass first US law on AI companion chatbots
阅读更多来源: The Decoder | 14-09-25
Will AI be the basis of many future industrial fortunes, or a net loser? (joincolossus.com)
阅读更多来源: Hacker News | 14-09-25
Gemini (2023) (geminiquickst.art)
阅读更多来源: Hacker News | 14-09-25
MetaLLMix: An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization
Authors: Mohammed Tiouti, Mohamed Bal-Ghaoui |
阅读更多来源: ArXiv AI | 14-09-25
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
Authors: Zhiyu He, Maojiang Wang, Xinwen Gao, Yuchuan Luo, Lin Liu, Shaojing Fu |
阅读更多来源: ArXiv AI | 14-09-25
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
Authors: Harry Mayne, Ryan Othniel Kearns, Yushi Yang, Andrew M. Bean, Eoin Delaney, Chris Russell, Adam Mahdi |
阅读更多来源: ArXiv AI | 14-09-25
Towards Explainable Job Title Matching: Leveraging Semantic Textual Relatedness and Knowledge Graphs
Authors: Vadim Zadykian, Bruno Andrade, Haithem Afli |
阅读更多来源: ArXiv AI | 14-09-25
Explainable AI for Accelerated Microstructure Imaging: A SHAP-Guided Protocol on the Connectome 2.0 scanner
Authors: Quentin Uhl, Tommaso Pavan, Julianna Gerold, Kwok-Shing Chan, Yohan Jun, Shohei Fujita, Aneri Bhatt, Yixin Ma, Qiaochu Wang, Hong-Hsi Lee, Susie Y. Huang, Berkin Bilgic, Ileana Jelescu |
阅读更多来源: ArXiv AI | 14-09-25
Incorporating AI Incident Reporting into Telecommunications Law and Policy: Insights from India
Authors: Avinash Agarwal, Manisha J. Nene |
阅读更多来源: ArXiv AI | 14-09-25
Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders
Authors: Dohun Lee, Hyeonho Jeong, Jiwook Kim, Duygu Ceylan, Jong Chul Ye |
阅读更多来源: ArXiv AI | 14-09-25
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Authors: Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang |
阅读更多来源: ArXiv AI | 14-09-25
Global Constraint LLM Agents for Text-to-Model Translation
Authors: Junyang Cai, Serdar Kadioglu, Bistra Dilkina |
阅读更多来源: ArXiv AI | 14-09-25
Automated Unity Game Template Generation from GDDs via NLP and Multi-Modal LLMs
Authors: Amna Hassan |
阅读更多来源: ArXiv AI | 14-09-25
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Authors: Bingxin Xu, Zhen Dong, Oussama Elachqar, Yuzhang Shang |
阅读更多来源: ArXiv AI | 14-09-25
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Authors: Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu |
阅读更多来源: ArXiv AI | 14-09-25
Anti-Money Laundering Machine Learning Pipelines; A Technical Analysis on Identifying High-risk Bank Clients with Supervised Learning
Authors: Khashayar Namdar, Pin-Chien Wang, Tushar Raju, Steven Zheng, Fiona Li, Safwat Tahmin Khan |
阅读更多来源: ArXiv AI | 14-09-25
Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games
Authors: Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain |
阅读更多来源: ArXiv AI | 14-09-25
Instructional Prompt Optimization for Few-Shot LLM-Based Recommendations on Cold-Start Users
Authors: Haowei Yang, Yushang Zhao, Sitao Min, Bo Su, Chao Yao, Wei Xu |
阅读更多来源: ArXiv AI | 14-09-25
ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models
Authors: Soheil Zibakhsh Shabgahi, Pedram Aghazadeh, Azalia Mirhosseini, Farinaz Koushanfar |
阅读更多来源: ArXiv AI | 14-09-25
Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search
Authors: Shuocheng Li, Yihao Liu, Silin Du, Wenxuan Zeng, Zhe Xu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Dongmei Zhang |
阅读更多来源: ArXiv AI | 14-09-25
LightAgent: Production-level Open-source Agentic AI Framework
Authors: Weige Cai, Tong Zhu, Jinyi Niu, Ruiqi Hu, Lingyao Li, Tenglong Wang, Xiaowu Dai, Weining Shen, Liwen Zhang |
阅读更多来源: ArXiv AI | 14-09-25
Fusing Knowledge and Language: A Comparative Study of Knowledge Graph-Based Question Answering with LLMs
Authors: Vaibhav Chaudhary, Neha Soni, Narotam Singh, Amita Kapoor |
阅读更多来源: ArXiv AI | 14-09-25
Inteligencia Artificial jurídica y el desafío de la veracidad: análisis de alucinaciones, optimización de RAG y principios para una integración responsable
Authors: Alex Dantart |
阅读更多来源: ArXiv AI | 14-09-25
Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution
Authors: Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Ningxin Zheng, Haibin Lin, Xin Liu, Minyi Guo |
阅读更多来源: ArXiv AI | 14-09-25
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Authors: Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping |
阅读更多来源: ArXiv AI | 14-09-25
How to Use Claude Code Subagents to Parallelize Development (zachwills.net)
阅读更多来源: Hacker News | 14-09-25
Leading AI chatbots are now twice as likely to spread false information as last year, study finds
阅读更多来源: The Decoder | 14-09-25
Microsoft, OpenAI reach preliminary understanding on new partnership terms
阅读更多来源: The Decoder | 13-09-25
Reduce bandwidth costs with dm-cache: fast local SSD caching for network storage (upsun.com)
阅读更多来源: Hacker News | 13-09-25
How 'overworked, underpaid' humans train Google's AI to seem smart (theguardian.com)
阅读更多来源: Hacker News | 13-09-25
VaultGemma: The most capable differentially private LLM (research.google)
阅读更多来源: Hacker News | 13-09-25
OpenAI Grove (openai.com)
阅读更多来源: Hacker News | 13-09-25
Real-time AI hallucination detection with timeplus: A chess example (timeplus.com)
阅读更多来源: Hacker News | 13-09-25
Windows-Use: an AI agent that interacts with Windows at GUI layer (github.com/cursortouch)
阅读更多来源: Hacker News | 13-09-25
Claude’s memory architecture is the opposite of ChatGPT’s (shloked.com)
阅读更多来源: Hacker News | 12-09-25
Launch HN: Ghostship (YC S25) – AI agents that find bugs in your web app
阅读更多来源: Hacker News | 12-09-25
Backprompting: Leveraging synthetic production data for health advice guardrails (arxiv.org)
阅读更多来源: Hacker News | 12-09-25
Microsoft will add Anthropic’s Claude models to Office 365 apps alongside its OpenAI-powered features
阅读更多来源: The Decoder | 11-09-25
Huawei’s AI chip production boom reportedly faces a critical shortage of high-bandwidth memory
阅读更多来源: The Decoder | 11-09-25
Stability AI releases Stable Audio 2.5 for faster and more complex AI-generated music
阅读更多来源: The Decoder | 11-09-25
Rubin CPX is Nvidia's first GPU built specifically for massive-context AI applications
阅读更多来源: The Decoder | 11-09-25
HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
Authors: Benjamin Sturgeon, Daniel Samuelson, Jacob Haimes, Jacy Reese Anthis |
阅读更多来源: ArXiv AI | 11-09-25
Send to which account? Evaluation of an LLM-based Scambaiting System
Authors: Hossein Siadati, Haadi Jafarian, Sima Jafarikhah |
阅读更多来源: ArXiv AI | 11-09-25
Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications
Authors: Anran Li, Lingfei Qian, Mengmeng Du, Yu Yin, Yan Hu, Zihao Sun, Yihang Fu, Erica Stutz, Xuguang Ai, Qianqian Xie, Rui Zhu, Jimin Huang, Yifan Yang, Siru Liu, Yih-Chung Tham, Lucila Ohno-Machado, Hyunghoon Cho, Zhiyong Lu, Hua Xu, Qingyu Chen |
阅读更多来源: ArXiv AI | 11-09-25
Classification of 24-hour movement behaviors from wrist-worn accelerometer data: from handcrafted features to deep learning techniques
Authors: Alireza Sameh, Mehrdad Rostami, Mourad Oussalah, Vahid Farrahi |
阅读更多来源: ArXiv AI | 11-09-25
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Authors: Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt |
阅读更多来源: ArXiv AI | 11-09-25
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Authors: Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang |
阅读更多来源: ArXiv AI | 11-09-25
Scaling Truth: The Confidence Paradox in AI Fact-Checking
Authors: Ihsan A. Qazi, Zohaib Khan, Abdullah Ghani, Agha A. Raza, Zafar A. Qazi, Wassay Sajjad, Ayesha Ali, Asher Javaid, Muhammad Abdullah Sohail, Abdul H. Azeemi |
阅读更多来源: ArXiv AI | 11-09-25
An End-to-End Deep Learning Framework for Arsenicosis Diagnosis Using Mobile-Captured Skin Images
Authors: Asif Newaz, Asif Ur Rahman Adib, Rajit Sahil, Mashfique Mehzad |
阅读更多来源: ArXiv AI | 11-09-25
Using AI to Optimize Patient Transfer and Resource Utilization During Mass-Casualty Incidents: A Simulation Platform
Authors: Zhaoxun "Lorenz" Liu, Wagner H. Souza, Jay Han, Amin Madani |
阅读更多来源: ArXiv AI | 11-09-25
Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation
Authors: Joachim Baumann, Paul Röttger, Aleksandra Urman, Albert Wendsjö, Flor Miriam Plaza-del-Arco, Johannes B. Gruber, Dirk Hovy |
阅读更多来源: ArXiv AI | 11-09-25
Co-Investigator AI: The Rise of Agentic AI for Smarter, Trustworthy AML Compliance Narratives
Authors: Prathamesh Vasudeo Naik, Naresh Kumar Dintakurthi, Zhanghao Hu, Yue Wang, Robby Qiu |
阅读更多来源: ArXiv AI | 11-09-25
Leveraging AI Agents for Autonomous Networks: A Reference Architecture and Empirical Studies
Authors: Binghan Wu, Shoufeng Wang, Yunxin Liu, Ya-Qin Zhang, Joseph Sifakis, Ye Ouyang |
阅读更多来源: ArXiv AI | 11-09-25
No-Knowledge Alarms for Misaligned LLMs-as-Judges
Authors: Andrés Corrada-Emmanuel |
阅读更多来源: ArXiv AI | 11-09-25
The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems
Authors: Ziming Luo, Atoosa Kasirzadeh, Nihar B. Shah |
阅读更多来源: ArXiv AI | 11-09-25
Rewriting Dataframes for MicroHaskell (mchav.github.io)
阅读更多来源: Hacker News | 11-09-25
Defeating Nondeterminism in LLM Inference (thinkingmachines.ai)
阅读更多来源: Hacker News | 11-09-25
ChatGPT Developer Mode: Full MCP client access (platform.openai.com)
阅读更多来源: Hacker News | 11-09-25
AI's $344B 'Language Model' Bet Looks Fragile (bloomberg.com)
阅读更多来源: Hacker News | 11-09-25
Germany is not supporting ChatControl – blocking minority secured (digitalcourage.social)
阅读更多来源: Hacker News | 11-09-25
Anthropic's Claude now lets users edit Word, PowerPoint, Excel, and PDFs in chat
阅读更多来源: The Decoder | 11-09-25
OpenAI leaders have discussed leaving California, according to the Wall Street Journal
阅读更多来源: The Decoder | 10-09-25
R-Zero: Self-Evolving Reasoning LLM from Zero Data (arxiv.org)
阅读更多来源: Hacker News | 10-09-25
Claude now has access to a server-side container environment (anthropic.com)
阅读更多来源: Hacker News | 10-09-25
I replaced Animal Crossing's dialogue with a live LLM by hacking GameCube memory (joshfonseca.com)
阅读更多来源: Hacker News | 10-09-25
Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning
Authors: Michele Joshua Maggini, Dhia Merzougui, Rabiraj Bandyopadhyay, Gaël Dias, Fabrice Maurel, Pablo Gamallo |
阅读更多来源: ArXiv AI | 10-09-25
What Were You Thinking? An LLM-Driven Large-Scale Study of Refactoring Motivations in Open-Source Projects
Authors: Mikel Robredo, Matteo Esposito, Fabio Palomba, Rafael Peñaloza, Valentina Lenarduzzi |
阅读更多来源: ArXiv AI | 10-09-25
Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks
Authors: Friedrich Wolf-Monheim |
阅读更多来源: ArXiv AI | 10-09-25
Uncovering Scaling Laws for Large Language Models via Inverse Problems
Authors: Arun Verma, Zhaoxuan Wu, Zijian Zhou, Xiaoqiang Lin, Zhiliang Chen, Rachael Hwee Ling Sim, Rui Qiao, Jingtan Wang, Nhung Bui, Xinyuan Niu, Wenyang Hu, Gregory Kang Ruey Lau, Zi-Yu Khoo, Zitong Zhao, Xinyi Xu, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low |
阅读更多来源: ArXiv AI | 10-09-25
Deep Learning-Based Burned Area Mapping Using Bi-Temporal Siamese Networks and AlphaEarth Foundation Datasets
Authors: Seyd Teymoor Seydi |
阅读更多来源: ArXiv AI | 10-09-25
Forecasting Russian Equipment Losses Using Time Series and Deep Learning Models
Authors: Jonathan Teagan |
阅读更多来源: ArXiv AI | 10-09-25
Breaking Android with AI: A Deep Dive into LLM-Powered Exploitation
Authors: Wanni Vidulige Ishan Perera, Xing Liu, Fan liang, Junyi Zhang |
阅读更多来源: ArXiv AI | 10-09-25
Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s
Authors: Mahmudul Islam Masum, Miad Islam, Arif I. Sarwat |
阅读更多来源: ArXiv AI | 10-09-25
GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models
Authors: Tuo Wang, Adithya Kulkarni, Tyler Cody, Peter A. Beling, Yujun Yan, Dawei Zhou |
阅读更多来源: ArXiv AI | 10-09-25
That's So FETCH: Fashioning Ensemble Techniques for LLM Classification in Civil Legal Intake and Referral
Authors: Quinten Steenhuis |
阅读更多来源: ArXiv AI | 10-09-25
A Hybrid CNN-LSTM Deep Learning Model for Intrusion Detection in Smart Grid
Authors: Abdulhakim Alsaiari, Mohammad Ilyas |
阅读更多来源: ArXiv AI | 10-09-25
SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection
Authors: Qin Chen, Yuanyi Ren, Xiaojun Ma, Mugeng Liu, Han Shi, Dongmei Zhang |
阅读更多来源: ArXiv AI | 10-09-25
Getting In Contract with Large Language Models – An Agency Theory Perspective On Large Language Model Alignment
Authors: Sascha Kaltenpoth, Oliver Müller |
阅读更多来源: ArXiv AI | 10-09-25
FHIR-RAG-MEDS: Integrating HL7 FHIR with Retrieval-Augmented Large Language Models for Enhanced Medical Decision Support
Authors: Yildiray Kabak, Gokce B. Laleci Erturkmen, Mert Gencturk, Tuncay Namli, A. Anil Sinaci, Ruben Alcantud Corcoles, Cristina Gomez Ballesteros, Pedro Abizanda, Asuman Dogac |
阅读更多来源: ArXiv AI | 10-09-25
Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding
Authors: Jipeng Li, Zeyu Gao, Yubin Qi, Hande Dong, Weijian Chen, Qiang Lin
Source: ArXiv AI | 10-09-25
The Carbon Footprint Wizard: A Knowledge-Augmented AI Interface for Streamlining Food Carbon Footprint Analysis
Authors: Mustafa Kaan Aslan, Reinout Heijungs, Filip Ilievski
Source: ArXiv AI | 10-09-25
BDPM: A Machine Learning-Based Feature Extractor for Parkinson's Disease Classification via Gut Microbiota Analysis
Authors: Bo Yu, Zhixiu Hua, Bo Zhao
Source: ArXiv AI | 10-09-25
Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study
Authors: Amay Jain, Liu Cui, Si Chen
Source: ArXiv AI | 10-09-25
Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach
Authors: João Paulo Nogueira, Wentao Sun, Alonso Silva, Laith Zumot
Source: ArXiv AI | 10-09-25
SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs
Authors: Xinyu Zhang, Changzhi Zhou, Linmei Hu, Luhao Zhang, Xiancai Chen, Haomin Fu, Yang Yang, Mengdi Zhang
Source: ArXiv AI | 10-09-25
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Authors: Fangchen Yu, Haiyuan Wan, Qianjia Cheng, Yuchen Zhang, Jiacheng Chen, Fujun Han, Yulun Wu, Junchi Yao, Ruilizhen Hu, Ning Ding, Yu Cheng, Tao Chen, Lei Bai, Dongzhan Zhou, Yun Luo, Ganqu Cui, Peng Ye
Source: ArXiv AI | 10-09-25
Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare
Authors: Valen Tagliabue, Leonard Dung
Source: ArXiv AI | 10-09-25
Anthropic is endorsing SB 53 (anthropic.com)
Source: Hacker News | 10-09-25
Anthropic judge rejects $1.5B AI copyright settlement (bloomberglaw.com)
Source: Hacker News | 10-09-25
Outraged Farmers Blame Ag Monopolies as Catastrophic Collapse Looms (agweb.com)
Source: Hacker News | 10-09-25
OpenAI backs "Critterz" to show generative AI can deliver on the big screen
Source: The Decoder | 09-09-25
Anthropic confirms technical bugs after weeks of complaints about declining Claude code quality
Source: The Decoder | 09-09-25
Will Amazon S3 Vectors kill vector databases or save them? (zilliz.com)
Source: Hacker News | 09-09-25
Experimenting with Local LLMs on macOS (6nok.org)
Source: Hacker News | 09-09-25
Mistral AI raises 1.7B€, enters strategic partnership with ASML (mistral.ai)
Source: Hacker News | 09-09-25
SasAgent: Multi-Agent AI System for Small-Angle Scattering Data Analysis
Authors: Lijie Ding, Changwoo Do
Source: ArXiv AI | 09-09-25
Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning
Authors: Bo Yuan, Jiazi Hu
Source: ArXiv AI | 09-09-25
H₂OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers
Authors: Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Shijian Lu, Nicu Sebe
Source: ArXiv AI | 09-09-25
Murphys Laws of AI Alignment: Why the Gap Always Wins
Authors: Madhava Gaikwad
Source: ArXiv AI | 09-09-25
Characterizing Fitness Landscape Structures in Prompt Engineering
Authors: Arend Hintze
Source: ArXiv AI | 09-09-25
Towards Meta-Cognitive Knowledge Editing for Multimodal LLMs
Authors: Zhaoyu Fan, Kaihang Pan, Mingze Zhou, Bosheng Qin, Juncheng Li, Shengyu Zhang, Wenqiao Zhang, Siliang Tang, Fei Wu, Yueting Zhuang
Source: ArXiv AI | 09-09-25
Decision-Focused Learning Enhanced by Automated Feature Engineering for Energy Storage Optimisation
Authors: Nasser Alkhulaifi, Ismail Gokay Dogan, Timothy R. Cargan, Alexander L. Bowler, Direnc Pekaslan, Nicholas J. Watson, Isaac Triguero
Source: ArXiv AI | 09-09-25
DRF: LLM-AGENT Dynamic Reputation Filtering Framework
Authors: Yuwei Lou, Hao Hu, Shaocong Ma, Zongfei Zhang, Liang Wang, Jidong Ge, Xianping Tao
Source: ArXiv AI | 09-09-25
Hyperbolic Large Language Models
Authors: Sarang Patil, Zeyong Zhang, Yiran Huang, Tengfei Ma, Mengjia Xu
Source: ArXiv AI | 09-09-25
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL
Authors: Haoyang He, Zihua Rong, Kun Ji, Chenyang Li, Qing Huang, Chong Xia, Lan Yang, Honggang Zhang
Source: ArXiv AI | 09-09-25
PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments
Authors: Olivier Schipper, Yudi Zhang, Yali Du, Mykola Pechenizkiy, Meng Fang
Source: ArXiv AI | 09-09-25
From Long to Short: LLMs Excel at Trimming Own Reasoning Chains
Authors: Wei Han, Geng Zhan, Sicheng Yu, Chenyu Wang, Bryan Hooi
Source: ArXiv AI | 09-09-25
Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation
Authors: Jianpeng Zhao, Chenyu Yuan, Weiming Luo, Haoling Xie, Guangwei Zhang, Steven Jige Quan, Zixuan Yuan, Pengyang Wang, Denghui Zhang
Source: ArXiv AI | 09-09-25
Can AI Make Energy Retrofit Decisions? An Evaluation of Large Language Models
Authors: Lei Shu, Dong Zhao
Source: ArXiv AI | 09-09-25
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
Authors: Jiaxiang Chen, Zhuo Wang, Mingxi Zou, Zhucong Li, Zhijian Zhou, Song Wang, Zenglin Xu
Source: ArXiv AI | 09-09-25
Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning
Authors: Yihong Luo, Wenwu He, Zhuo-Xu Cui, Dong Liang
Source: ArXiv AI | 09-09-25
Evaluating Multi-Turn Bargain Skills in LLM-Based Seller Agent
Authors: Issue Yishu Wang, Kakam Chong, Xiaofeng Wang, Xu Yan, DeXin Kong, Chen Ju, Ming Chen, Shuai Xiao, Shuguang Han, Jufeng Chen
Source: ArXiv AI | 09-09-25
Accelerate Scaling of LLM Alignment via Quantifying the Coverage and Depth of Instruction Set
Authors: Chengwei Wu, Li Du, Hanyu Zhao, Yiming Ju, Jiapu Wang, Tengfei Pan
Source: ArXiv AI | 09-09-25
HyFedRAG: A Federated Retrieval-Augmented Generation Framework for Heterogeneous and Privacy-Sensitive Data
Authors: Cheng Qian, Hainan Zhang, Yongxin Tong, Hong-Wei Zheng, Zhiming Zheng
Source: ArXiv AI | 09-09-25
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
Authors: Song Yu, Xiaofei Xu, Ke Deng, Li Li, Lin Tian
Source: ArXiv AI | 09-09-25
An AI system to help scientists write expert-level empirical software
Authors: Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y. McLean, Peter Norgaard, Zahra Shamsi, David Smalling, James Thompson, Subhashini Venugopalan, Brian P. Williams, Chujun He, Sarah Martinson, Martyna Plomecka, Lai Wei, Yuchen Zhou, Qian-Ze Zhu, Matthew Abraham, Erica Brand, Anna Bulanova, Jeffrey A. Cardille, Chris Co, Scott Ellsworth, Grace Joseph, Malcolm Kane, Ryan Krueger, Johan Kartiwa, Dan Liebling, Jan-Matthis Lueckmann, Paul Raccuglia, Xuefei (Julie) Wang, Katherine Chou, James Manyika, Yossi Matias, John C. Platt, Lizzie Dorfman, Shibl Mourad, Michael P. Brenner
Source: ArXiv AI | 09-09-25
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Authors: Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao
Source: ArXiv AI | 09-09-25
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Authors: Chenyang Zhu, Spencer Hong, Jingyu Wu, Kushal Chawla, Charlotte Tang, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu
Source: ArXiv AI | 09-09-25
Another Turn, Better Output? A Turn-Wise Analysis of Iterative LLM Prompting
Authors: Shashidhar Reddy Javaji, Bhavul Gauri, Zining Zhu
Source: ArXiv AI | 09-09-25
Google's AI Mode is set to become the new default as its lawyers call the open web "in rapid decline"
Source: The Decoder | 09-09-25
Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution
Authors: Haosong Liu, Xiancheng Zhu, Huanqiang Zeng, Jianqing Zhu, Jiuwen Cao, Junhui Hou
Source: ArXiv AI | 09-09-25
LLM Enabled Multi-Agent System for 6G Networks: Framework and Method of Dual-Loop Edge-Terminal Collaboration
Authors: Zheyan Qu, Wenbo Wang, Zitong Yu, Boquan Sun, Yang Li, Xing Zhang
Source: ArXiv AI | 09-09-25
High-Resolution Global Land Surface Temperature Retrieval via a Coupled Mechanism-Machine Learning Framework
Authors: Tian Xie, Huanfeng Shen, Menghui Jiang, Juan-Carlos Jiménez-Muñoz, José A. Sobrino, Huifang Li, Chao Zeng
Source: ArXiv AI | 09-09-25
Artificial intelligence for representing and characterizing quantum systems
Authors: Yuxuan Du, Yan Zhu, Yuan-Hang Zhang, Min-Hsiu Hsieh, Patrick Rebentrost, Weibo Gao, Ya-Dong Wu, Jens Eisert, Giulio Chiribella, Dacheng Tao, Barry C. Sanders
Source: ArXiv AI | 09-09-25
Pointing-Guided Target Estimation via Transformer-Based Attention
Authors: Luca Müller, Hassan Ali, Philipp Allgeuer, Lukáš Gajdošech, Stefan Wermter
Source: ArXiv AI | 09-09-25
Uncertain but Useful: Leveraging CNN Variability into Data Augmentation
Authors: Inés Gonzalez-Pepe, Vinuyan Sivakolunthu, Yohan Chatelain, Tristan Glatard
Source: ArXiv AI | 09-09-25
CURE: Controlled Unlearning for Robust Embeddings -- Mitigating Conceptual Shortcuts in Pre-Trained Language Models
Authors: Aysenur Kocak, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
Source: ArXiv AI | 09-09-25
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
Authors: Chang Dai, Hongyu Shan, Mingyang Song, Di Liang
Source: ArXiv AI | 09-09-25
RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks
Authors: Arefin Niam, Tevfik Kosar, M S Q Zulkar Nine
Source: ArXiv AI | 09-09-25
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
Authors: Deniz Bayazit, Aaron Mueller, Antoine Bosselut
Source: ArXiv AI | 09-09-25
Scaling Performance of Large Language Model Pretraining
Authors: Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther
Source: ArXiv AI | 09-09-25
Maestro: Joint Graph & Config Optimization for Reliable AI Agents
Authors: Wenxiao Wang, Priyatham Kattakinda, Soheil Feizi
Source: ArXiv AI | 09-09-25
The Ethical Compass of the Machine: Evaluating Large Language Models for Decision Support in Construction Project Management
Authors: Somtochukwu Azie, Yiping Meng
Source: ArXiv AI | 09-09-25
What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Authors: Yuan Sui, Yanming Zhang, Yi Liao, Yu Gu, Guohua Tang, Zhongqian Sun, Wei Yang, Bryan Hooi
Source: ArXiv AI | 09-09-25
An Approach to Grounding AI Model Evaluations in Human-derived Criteria
Authors: Sasha Mitts
Source: ArXiv AI | 09-09-25
Cloning a Conversational Voice AI Agent from Call Recording Datasets for Telesales
Authors: Krittanon Kaewtawee, Wachiravit Modecrua, Krittin Pachtrachai, Touchapon Kraisingkorn
Source: ArXiv AI | 09-09-25
TalkToAgent: A Human-centric Explanation of Reinforcement Learning Agents with Large Language Models
Authors: Haechang Kim, Hao Chen, Can Li, Jong Min Lee
Source: ArXiv AI | 09-09-25
OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration
Authors: Jusheng Zhang, Yijia Fan, Kaitong Cai, Xiaofei Sun, Keze Wang
Source: ArXiv AI | 09-09-25
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Authors: Yinglin Duan, Zhengxia Zou, Tongwei Gu, Wei Jia, Zhan Zhao, Luyi Xu, Xinzhu Liu, Hao Jiang, Kang Chen, Shuang Qiu
Source: ArXiv AI | 09-09-25
OpenAI says ChatGPT will always make things up, but it could get better at admitting uncertainty
Source: The Decoder | 08-09-25
OpenAI has reportedly misjudged its cash burn by $80 billion
Source: The Decoder | 08-09-25
Anthropic bans companies majority-controlled by China, Russia, Iran, and North Korea from Claude
Source: The Decoder | 08-09-25
Analog optical computer for AI inference and combinatorial optimization (nature.com)
Source: Hacker News | 08-09-25
GPT-5 Thinking in ChatGPT (a.k.a. Research Goblin) is good at search (simonwillison.net)
Source: Hacker News | 08-09-25
Using Claude Code to modernize a 25-year-old kernel driver (dmitrybrant.com)
Source: Hacker News | 08-09-25
RoboBallet uses AI to choreograph multiple industrial robots for safe and efficient teamwork
Source: The Decoder | 07-09-25
The Claude Code Framework Wars (shmck.substack.com)
Source: Hacker News | 07-09-25
Show HN: Semantic grep for Claude Code (RUST) (local embeddings) (github.com/beaconbay)
Source: Hacker News | 07-09-25
Parking Availability Prediction via Fusing Multi-Source Data with A Self-Supervised Learning Enhanced Spatio-Temporal Inverted Transformer
Authors: Yin Huang, Yongqi Dong, Youhua Tang, Li Li
Source: ArXiv AI | 07-09-25
Towards a Unified View of Large Language Model Post-Training
Authors: Xingtai Lv, Yuxin Zuo, Youbang Sun, Hongyi Liu, Yuntian Wei, Zhekai Chen, Lixuan He, Xuekai Zhu, Kaiyan Zhang, Bingning Wang, Ning Ding, Bowen Zhou
Source: ArXiv AI | 07-09-25
No Thoughts Just AI: Biased LLM Recommendations Limit Human Agency in Resume Screening
Authors: Kyra Wilson, Mattea Sim, Anna-Maria Gueorguieva, Aylin Caliskan
Source: ArXiv AI | 07-09-25
Delta Activations: A Representation for Finetuned Large Language Models
Authors: Zhiqiu Xu, Amish Sethi, Mayur Naik, Ser-Nam Lim
Source: ArXiv AI | 07-09-25
Explainable Knowledge Graph Retrieval-Augmented Generation (KG-RAG) with KG-SMILE
Authors: Zahra Zehtabi Sabeti Moghaddam, Zeinab Dehghani, Maneeha Rani, Koorosh Aslansefat, Bhupesh Kumar Mishra, Rameez Raja Kureshi, Dhavalkumar Thakker
Source: ArXiv AI | 07-09-25
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Authors: Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel
Source: ArXiv AI | 07-09-25
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Authors: Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez
Source: ArXiv AI | 07-09-25
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
Authors: Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A Gatys
Source: ArXiv AI | 07-09-25
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Authors: Haozhe Wang, Qixin Xu, Che Liu, Junhong Wu, Fangzhen Lin, Wenhu Chen
Source: ArXiv AI | 07-09-25
What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models
Authors: Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, Serge Stinckwich
Source: ArXiv AI | 07-09-25
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Authors: Wei Yang, Jesse Thomason
Source: ArXiv AI | 07-09-25
Leveraging LLM-Based Agents for Intelligent Supply Chain Planning
Authors: Yongzhi Qi, Jiaheng Yin, Jianshen Zhang, Dongyang Geng, Zhengyu Chen, Hao Hu, Wei Qi, Zuo-Jun Max Shen
Source: ArXiv AI | 07-09-25
RAGuard: A Novel Approach for in-context Safe Retrieval Augmented Generation for LLMs
Authors: Connor Walker, Koorosh Aslansefat, Mohammad Naveed Akram, Yiannis Papadopoulos
Source: ArXiv AI | 07-09-25
Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
Authors: James Mooney, Josef Woldense, Zheng Robert Jia, Shirley Anugrah Hayati, My Ha Nguyen, Vipul Raheja, Dongyeop Kang
Source: ArXiv AI | 07-09-25
FaMA: LLM-Empowered Agentic Assistant for Consumer-to-Consumer Marketplace
Authors: Yineng Yan, Xidong Wang, Jin Seng Cheng, Ran Hu, Wentao Guan, Nahid Farahmand, Hengte Lin, Yue Li
Source: ArXiv AI | 07-09-25
Expedition & Expansion: Leveraging Semantic Representations for Goal-Directed Exploration in Continuous Cellular Automata
Authors: Sina Khajehabdollahi, Gautier Hamon, Marko Cvjetko, Pierre-Yves Oudeyer, Clément Moulin-Frier, Cédric Colas
Source: ArXiv AI | 07-09-25
Continuous Monitoring of Large-Scale Generative AI via Deterministic Knowledge Graph Structures
Authors: Kishor Datta Gupta, Mohd Ariful Haque, Hasmot Ali, Marufa Kamal, Syed Bahauddin Alam, Mohammad Ashiqur Rahman
Source: ArXiv AI | 07-09-25
Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent
Authors: Chunlong Wu, Zhibo Qu
Source: ArXiv AI | 07-09-25
AutoPBO: LLM-powered Optimization for Local Search PBO Solvers
Authors: Jinyuan Li, Yi Chu, Yiwen Sun, Mengchuan Zou, Shaowei Cai
Source: ArXiv AI | 07-09-25
Intermediate Languages Matter: Formal Languages and LLMs affect Neurosymbolic Reasoning
Authors: Alexander Beiser, David Penz, Nysret Musliu
Source: ArXiv AI | 07-09-25
EvoEmo: Towards Evolved Emotional Policies for LLM Agents in Multi-Turn Negotiation
Authors: Yunbo Long, Liming Xu, Lukas Beckenbauer, Yuhan Liu, Alexandra Brintrup
Source: ArXiv AI | 07-09-25
Psychologically Enhanced AI Agents
Authors: Maciej Besta, Shriram Chandran, Robert Gerstenberger, Mathis Lindner, Marcin Chrapek, Sebastian Hermann Martschat, Taraneh Ghandi, Patrick Iff, Hubert Niewiadomski, Piotr Nyczyk, Jürgen Müller, Torsten Hoefler
Source: ArXiv AI | 07-09-25
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory
Authors: Matthew Ho, Chen Si, Zhaoxiang Feng, Fangxu Yu, Zhijian Liu, Zhiting Hu, Lianhui Qin
Source: ArXiv AI | 07-09-25
Why language models hallucinate (openai.com)
Source: Hacker News | 07-09-25
The maths you need to start understanding LLMs (gilesthomas.com)
Source: Hacker News | 07-09-25
Swiss AI Initiative introduces an open language model focused on transparency and privacy
Source: The Decoder | 06-09-25
GLM 4.5 with Claude Code (z.ai)
Source: Hacker News | 06-09-25
A Software Development Methodology for Disciplined LLM Collaboration (github.com/varietyz)
Source: Hacker News | 06-09-25
Anthropic agrees to pay $1.5B to settle lawsuit with book authors (nytimes.com)
Source: Hacker News | 06-09-25
How big are our embeddings now and why? (vickiboykis.com)
Source: Hacker News | 06-09-25
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS (huggingface.co)
Source: Hacker News | 06-09-25
OpenAI alleges advocacy groups may be funded by competitors worth billions
Source: The Decoder | 05-09-25
LLM Visualization (bbycroft.net)
Source: Hacker News | 05-09-25
Relace (YC W23) Is Hiring for Code LLM's (SF)
Source: Hacker News | 05-09-25
OpenAI eats jobs, then offers to help you find a new one at Walmart (theregister.com)
Source: Hacker News | 05-09-25
Launch HN: Slashy (YC S25) – AI that connects to apps and does tasks
Source: Hacker News | 05-09-25
Updating restrictions of sales to unsupported regions (anthropic.com)
Source: Hacker News | 05-09-25
A PM's Guide to AI Agent Architecture (productcurious.com)
Source: Hacker News | 05-09-25
Wal3: A Write-Ahead Log for Chroma, Built on Object Storage (trychroma.com)
Source: Hacker News | 05-09-25
OpenAI will add new safety features to ChatGPT after criticism over mental health emergencies
Source: The Decoder | 04-09-25
Melvyn Bragg steps down from presenting In Our Time (bbc.co.uk)
Source: Hacker News | 04-09-25
Understanding Transformers Using a Minimal Example (rti.github.io)
Source: Hacker News | 04-09-25
Claude Code: Now in Beta in Zed (zed.dev)
Source: Hacker News | 04-09-25
Domain Adaptation of LLMs for Process Data
Authors: Rafael Seidi Oyamada, Jari Peeperkorn, Jochen De Weerdt, Johannes De Smedt
Source: ArXiv AI | 04-09-25
Decentralised self-organisation of pivoting cube ensembles using geometric deep learning
Authors: Nadezhda Dobreva, Emmanuel Blazquez, Jai Grover, Dario Izzo, Yuzhen Qin, Dominik Dold
Source: ArXiv AI | 04-09-25
A Neural Network Approach to Multi-radionuclide TDCR Beta Spectroscopy
Authors: Li Yi, Qian Yang
Source: ArXiv AI | 04-09-25
From Evaluation to Defense: Constructing Persistent Edit-Based Fingerprints for Large Language Models
Authors: Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang, Linlin Wang
Source: ArXiv AI | 04-09-25
LGBP-OrgaNet: Learnable Gaussian Band Pass Fusion of CNN and Transformer Features for Robust Organoid Segmentation and Tracking
Authors: Jing Zhang, Siying Tao, Jiao Li, Tianhe Wang, Junchen Wu, Ruqian Hao, Xiaohui Du, Ruirong Tan, Rui Li
Source: ArXiv AI | 04-09-25
Heatmap Guided Query Transformers for Robust Astrocyte Detection across Immunostains and Resolutions
Authors: Xizhe Zhang, Jiayang Zhu
Source: ArXiv AI | 04-09-25
TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers
Authors: Guoxin Wang, Qingyuan Wang, Binhua Huang, Shaowu Chen, Deepu John
Source: ArXiv AI | 04-09-25
epiGPTope: A machine learning-based epitope generator and classifier
Authors: Natalia Flechas Manrique, Alberto Martínez, Elena López-Martínez, Luc Andrea, Román Orus, Aitor Manteca, Aitziber L. Cortajarena, Llorenç Espinosa-Portalés
Source: ArXiv AI | 04-09-25
On Entropy Control in LLM-RL Algorithms
Authors: Han Shen
Source: ArXiv AI | 04-09-25
Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning
Authors: Duy A. Nguyen, Abhi Kamboj, Minh N. Do
Source: ArXiv AI | 04-09-25
Continuous Saudi Sign Language Recognition: A Vision Transformer Approach
Authors: Soukeina Elhassen, Lama Al Khuzayem, Areej Alhothali, Ohoud Alzamzami, Nahed Alowaidi
Source: ArXiv AI | 04-09-25
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
Authors: Honglu Zhou, Xiangyu Peng, Shrikant Kendre, Michael S. Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Source: ArXiv AI | 04-09-25
The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)
Authors: Andrew Ferguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, Nicolás García Trillos, Cecilia Garraffo, Robert Ghrist, Rafael Gomez-Bombarelli, Gianluca Guadagni, Sreelekha Guggilam, Sergei Gukov, Juan B. Gutiérrez, Salman Habib, Johannes Hachmann, Boris Hanin, Philip Harris, Murray Holland, Elizabeth Holm, Hsin-Yuan Huang, Shih-Chieh Hsu, Nick Jackson, Olexandr Isayev, Heng Ji, Aggelos Katsaggelos, Jeremy Kepner, Yannis Kevrekidis, Michelle Kuchera, J. Nathan Kutz, Branislava Lalic, Ann Lee, Matt LeBlanc, Josiah Lim, Rebecca Lindsey, Yongmin Liu, Peter Y. Lu, Sudhir Malik, Vuk Mandic, Vidya Manian, Emeka P. Mazi, Pankaj Mehta, Peter Melchior, Brice Ménard, Jennifer Ngadiuba, Stella Offner, Elsa Olivetti, Shyue Ping Ong, Christopher Rackauckas, Philippe Rigollet, Chad Risko, Philip Romero, Grant Rotskoff, Brett Savoie, Uros Seljak, David Shih, Gary Shiu, Dima Shlyakhtenko, Eva Silverstein, Taylor Sparks, Thomas Strohmer, Christopher Stubbs, Stephen Thomas, Suriyanarayanan Vaikuntanathan, Rene Vidal, Francisco Villaescusa-Navarro, Gregory Voth, Benjamin Wandelt, Rachel Ward, Melanie Weber, Risa Wechsler, Stephen Whitelam, Olaf Wiest, Mike Williams, Zhuoran Yang, Yaroslava G. Yingling, Bin Yu, Shuwen Yue, Ann Zabludoff, Huimin Zhao, Tong Zhang
Source: ArXiv AI | 04-09-25
Can Media Act as a Soft Regulator of Safe AI Development? A Game Theoretical Analysis
Authors: Henrique Correia da Fonseca, António Fernandes, Zhao Song, Theodor Cimpeanu, Nataliya Balabanova, Adeela Bashir, Paolo Bova, Alessio Buscemi, Alessandro Di Stefano, Manh Hong Duong, Elias Fernandez Domingos, Ndidi Bianca Ogbo, Simon T. Powers, Daniele Proverbio, Zia Ush Shamszaman, Fernando P. Santos, The Anh Han, Marcus Krellner
Source: ArXiv AI | 04-09-25
Plan Verification for LLM-Based Embodied Task Completion Agents
Authors: Ananth Hariharan, Vardhan Dongre, Dilek Hakkani-Tür, Gokhan Tur
Source: ArXiv AI | 04-09-25
Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving
Authors: Mingyi Wang, Jingke Wang, Tengju Ye, Junbo Chen, Kaicheng Yu
Source: ArXiv AI | 04-09-25
Accountability Framework for Healthcare AI Systems: Towards Joint Accountability in Decision Making
Authors: Prachi Bagave, Marcus Westberg, Marijn Janssen, Aaron Yi Ding
Source: ArXiv AI | 04-09-25
Situating AI Agents in their World: Aspective Agentic AI for Dynamic Partially Observable Information Systems
Authors: Peter J. Bentley, Soo Ling Lim, Fuyuki Ishikawa
Source: ArXiv AI | 04-09-25
SAM-LLM: Interpretable Lane Change Trajectory Prediction via Parametric Finetuning
Authors: Zhuo Cao, Yunxiao Shi, Min Xu
Source: ArXiv AI | 04-09-25
The wall confronting large language models (arxiv.org)
Source: Hacker News | 04-09-25
We're Joining OpenAI (alexcodes.app)
Source: Hacker News | 04-09-25
Where's the shovelware? Why AI coding claims don't add up (mikelovesrobots.substack.com)
Source: Hacker News | 04-09-25
Evidence that AI is destroying jobs for young people (derekthompson.org)
Source: Hacker News | 04-09-25
The Little Book of Linear Algebra (github.com/the-litte-book-of)
Source: Hacker News | 03-09-25
Computing simplified coverage polygons (volkerkrause.eu)
Source: Hacker News | 03-09-25
Amazonq.nvim: Official AWS AI Assistant Plugin for Neovim (github.com/awslabs)
Source: Hacker News | 03-09-25
AI is going great for the blind (2023) (robertkingett.com)
Source: Hacker News | 03-09-25
A staff engineer's journey with Claude Code (sanity.io)
Source: Hacker News | 03-09-25
Dynamo AI (YC W22) Is Hiring for AI Product Managers (ycombinator.com)
Source: Hacker News | 03-09-25
With AI Boom, Dell's Datacenter Biz Is Finally Bigger Than Its PC Biz (nextplatform.com)
Source: Hacker News | 03-09-25
MIT Study Finds AI Use Reprograms the Brain, Leading to Cognitive Decline (publichealthpolicyjournal.com)
Source: Hacker News | 03-09-25
A Hybrid AI Framework for Strategic Patent Portfolio Pruning: Integrating Learning-to-Rank and Market Need Analysis for Technology Transfer Optimization
Authors: Manish Verma, Vivek Sharma, Vishal Singh
Source: ArXiv AI | 03-09-25
UrbanInsight: A Distributed Edge Computing Framework with LLM-Powered Data Filtering for Smart City Digital Twins
Authors: Kishor Datta Gupta, Md Manjurul Ahsan, Mohd Ariful Haque, Roy George, Azmine Toushik Wasi
Source: ArXiv AI | 03-09-25
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Authors: Yanxiao Zhao, Yaqian Li, Zihao Bo, Rinyoichi Takezoe, Haojia Hui, Mo Guang, Lei Ren, Xiaolin Qin, Kaiwen Long
Source: ArXiv AI | 03-09-25
Causal MAS: A Survey of Large Language Model Architectures for Discovery and Effect Estimation
Authors: Adib Bazgir, Amir Habibdoust, Yuwen Zhang, Xing Song
Source: ArXiv AI | 03-09-25
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
Authors: Jay Vaghasiya, Omkar Ghugarkar, Vishvesh Bhat, Vipul Dholaria, Julian McAuley
Source: ArXiv AI | 03-09-25
Ultra Strong Machine Learning: Teaching Humans Active Learning Strategies via Automated AI Explanations
Authors: Lun Ai, Johannes Langer, Ute Schmid, Stephen Muggleton
Source: ArXiv AI | 03-09-25
Analysis of Error Sources in LLM-based Hypothesis Search for Few-Shot Rule Induction
Authors: Aishni Parab, Hongjing Lu, Ying Nian Wu, Sumit Gulwani
Source: ArXiv AI | 03-09-25
Communicative Agents for Slideshow Storytelling Video Generation based on LLMs
Authors: Jingxing Fan, Jinrong Shen, Yusheng Yao, Shuangqing Wang, Qian Wang, Yuling Wang
Source: ArXiv AI | 03-09-25
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Authors: Yusheng Zheng, Yanpeng Hu, Wei Zhang, Andi Quinn
Source: ArXiv AI | 03-09-25
GradeSQL: Outcome Reward Models for Ranking SQL Queries from Large Language Models
Authors: Mattia Tritto, Giuseppe Farano, Dario Di Palma, Gaetano Rossiello, Fedelucio Narducci, Dharmashankar Subramanian, Tommaso Di Noia
Source: ArXiv AI | 03-09-25
LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance
Authors: Deyu Zhou, Yuqi Hou, Xiao Xue, Xudong Lu, Qingzhong Li, Lizhen Cui
Source: ArXiv AI | 03-09-25
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks
Authors: Haiyuan Wan, Chen Yang, Junchi Yu, Meiqi Tu, Jiaxuan Lu, Di Yu, Jianbao Cao, Ben Gao, Jiaqing Xie, Aoran Wang, Wenlong Zhang, Philip Torr, Dongzhan Zhou
Source: ArXiv AI | 03-09-25
Unraveling LLM Jailbreaks Through Safety Knowledge Neurons
Authors: Chongwen Zhao, Kaizhu Huang
Source: ArXiv AI | 03-09-25
Structured AI Decision-Making in Disaster Management
Authors: Julian Gerald Dcruz, Argyrios Zolotas, Niall Ross Greenwood, Miguel Arana-Catania
Source: ArXiv AI | 03-09-25
An LLM-enabled semantic-centric framework to consume privacy policies
Authors: Rui Zhao, Vladyslav Melnychuk, Jun Zhao, Jesse Wright, Nigel Shadbolt
Source: ArXiv AI | 03-09-25
Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025
Authors: Jiahao Qiu, Jingzhe Shi, Xinzhe Juan, Zelin Zhao, Jiayi Geng, Shilong Liu, Hongru Wang, Sanfeng Wu, Mengdi Wang
Source: ArXiv AI | 03-09-25
How Real Is AI Tutoring? Comparing Simulated and Human Dialogues in One-on-One Instruction
Authors: Ruijia Li, Yuan-Hao Jiang, Jiatong Wang, Bo Jiang
Source: ArXiv AI | 03-09-25
LLMs for LLMs: A Structured Prompting Methodology for Long Legal Documents
Authors: Strahinja Klem, Noura Al Moubayed
Source: ArXiv AI | 03-09-25
Re-evaluating LLM-based Heuristic Search: A Case Study on the 3D Packing Problem
Authors: Guorui Quan, Mingfei Sun, Manuel López-Ibáñez
Source: ArXiv AI | 03-09-25
GridMind: LLMs-Powered Agents for Power System Analysis and Operations
Authors: Hongwei Jin, Kibaek Kim, Jonghwan Kwon
Source: ArXiv AI | 03-09-25
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Authors: Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang, Shuicheng Yan, Philip Torr, Lei Bai
Source: ArXiv AI | 03-09-25
Anthropic raises $13B Series F (anthropic.com)
Source: Hacker News | 03-09-25
Launch HN: Datafruit (YC S25) – AI for DevOps
Source: Hacker News | 03-09-25
Vijaye Raji to become CTO of Applications with acquisition of Statsig (openai.com)
Source: Hacker News | 03-09-25
Harnessing IoT and Generative AI for Weather-Adaptive Learning in Climate Resilience Education
Authors: Imran S. A. Khan, Emmanuel G. Blanchard, Sébastien George
Source: ArXiv AI | 03-09-25
QZhou-Embedding Technical Report
Authors: Peng Yu, En Xu, Bin Chen, Haibiao Chen, Yinfei Xu
Source: ArXiv AI | 03-09-25
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Authors: Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu
Source: ArXiv AI | 03-09-25
Limitations of Physics-Informed Neural Networks: a Study on Smart Grid Surrogation
Authors: Julen Cestero, Carmine Delle Femine, Kenji S. Muro, Marco Quartulli, Marcello Restelli
Source: ArXiv AI | 03-09-25
Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL
Authors: Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst
Source: ArXiv AI | 03-09-25
CAD2DMD-SET: Synthetic Generation Tool of Digital Measurement Device CAD Model Datasets for fine-tuning Large Vision-Language Models
Authors: João Valente, Atabak Dehban, Rodrigo Ventura
Source: ArXiv AI | 03-09-25
Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks
Authors: Amirhossein Nazeri, Wael Hafez
Source: ArXiv AI | 03-09-25
Benchmarking GPT-5 in Radiation Oncology: Measurable Gains, but Persistent Need for Expert Oversight
Authors: Ugur Dinc, Jibak Sarkar, Philipp Schubert, Sabine Semrau, Thomas Weissmann, Andre Karius, Johann Brand, Bernd-Niklas Axer, Ahmed Gomaa, Pluvio Stephan, Ishita Sheth, Sogand Beirami, Annette Schwarz, Udo Gaipl, Benjamin Frey, Christoph Bert, Stefanie Corradini, Rainer Fietkau, Florian Putz |
Source: ArXiv AI | 03-09-25
Addressing accuracy and hallucination of LLMs in Alzheimer's disease research through knowledge graphs
Authors: Tingxuan Xu, Jiarui Feng, Justin Melendez, Kaleigh Roberts, Donghong Cai, Mingfang Zhu, Donald Elbert, Yixin Chen, Randall J. Bateman |
Source: ArXiv AI | 03-09-25
Fuzzy, Symbolic, and Contextual: Enhancing LLM Instruction via Cognitive Scaffolding
Authors: Vanessa Figueiredo |
Source: ArXiv AI | 03-09-25
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Authors: Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang |
Source: ArXiv AI | 03-09-25
HealthProcessAI: A Technical Framework and Proof-of-Concept for LLM-Enhanced Healthcare Process Mining
Authors: Eduardo Illueca-Fernandez, Kaile Chen, Fernando Seoane, Farhad Abtahi |
Source: ArXiv AI | 03-09-25
Integrating Large Language Models with Network Optimization for Interactive and Explainable Supply Chain Planning: A Real-World Case Study
Authors: Saravanan Venkatachalam |
Source: ArXiv AI | 03-09-25
Leveraging Imperfection with MEDLEY A Multi-Model Approach Harnessing Bias in Medical AI
Authors: Farhad Abtahi, Mehdi Astaraki, Fernando Seoane |
Source: ArXiv AI | 03-09-25
Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture
Authors: Yeawon Lee, Xiaoyang Wang, Christopher C. Yang |
Source: ArXiv AI | 03-09-25
Cloudflare Radar: AI Insights (cloudflare.com)
Source: Hacker News | 02-09-25
Adaptive LLM routing under budget constraints (arxiv.org)
Source: Hacker News | 02-09-25
Amazon has mostly sat out the AI talent war (businessinsider.com)
Source: Hacker News | 02-09-25
An LLM is a lossy encyclopedia (simonwillison.net)
Source: Hacker News | 02-09-25
AI researcher Andrej Karpathy says he's "bearish on reinforcement learning" for LLM training
Source: The Decoder | 02-09-25
Detecting and countering misuse of AI (anthropic.com)
Source: Hacker News | 02-09-25
Steve Ballmer Interview (acquired.fm)
Source: Hacker News | 02-09-25
The Tragic End of Natalia Nagovitsyna's Ordeal on Pobeda Peak (explorersweb.com)
Source: Hacker News | 02-09-25
Anthropic uses a questionable dark pattern to obtain user consent for AI data use in Claude
Source: The Decoder | 01-09-25
How ChatGPT became a confidant and guided a teenager through planning his suicide
Source: The Decoder | 01-09-25
Meta's superintelligence hires left for OpenAI after only a few weeks
Source: The Decoder | 01-09-25
First Murder-Suicide Case Associated with AI Psychosis (gizmodo.com)
Source: Hacker News | 01-09-25
What brain surgery taught me about the fragile gift of consciousness (bigthink.com)
Source: Hacker News | 01-09-25
LLMs struggle with clinical reasoning and are just matching patterns, study finds
Source: The Decoder | 01-09-25
OpenAI’s real-time API picks up laughter, accents, and switches languages in real time
Source: The Decoder | 31-08-25
Sniffly – Claude Code Analytics Dashboard (github.com/chiphuyen)
Source: Hacker News | 31-08-25
Multi-Agent Penetration Testing AI for the Web
Authors: Isaac David, Arthur Gervais |
Source: ArXiv AI | 31-08-25
Exploring Machine Learning and Language Models for Multimodal Depression Detection
Authors: Javier Si Zhao Hong, Timothy Zoe Delaya, Sherwyn Chan Yin Kit, Pai Chet Ng, Xiaoxiao Miao |
Source: ArXiv AI | 31-08-25
Research Challenges in Relational Database Management Systems for LLM Queries
Authors: Kerem Akillioglu, Anurag Chakraborty, Sairaj Voruganti, M. Tamer Özsu |
Source: ArXiv AI | 31-08-25
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
Authors: Adam Coscia, Shunan Guo, Eunyee Koh, Alex Endert |
Source: ArXiv AI | 31-08-25
IntentionReasoner: Facilitating Adaptive LLM Safeguards through Intent Reasoning and Selective Query Refinement
Authors: Yuanzhe Shen, Zisu Huang, Zhengkang Guo, Yide Liu, Guanxu Chen, Ruicheng Yin, Xiaoqing Zheng, Xuanjing Huang |
Source: ArXiv AI | 31-08-25
QAgent: An LLM-based Multi-Agent System for Autonomous OpenQASM programming
Authors: Zhenxiao Fu, Fan Chen, Lei Jiang |
Source: ArXiv AI | 31-08-25
ArgRAG: Explainable Retrieval Augmented Generation using Quantitative Bipolar Argumentation
Authors: Yuqicheng Zhu, Nico Potyka, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Dongzhuoran Zhou, Evgeny Kharlamov, Steffen Staab |
Source: ArXiv AI | 31-08-25
Do Students Rely on AI? Analysis of Student-ChatGPT Conversations from a Field Study
Authors: Jiayu Zheng, Lingxin Hao, Kelun Lu, Ashi Garg, Mike Reese, Melo-Jean Yap, I-Jeng Wang, Xingyun Wu, Wenrui Huang, Jenna Hoffman, Ariane Kelly, My Le, Ryan Zhang, Yanyu Lin, Muhammad Faayez, Anqi Liu |
Source: ArXiv AI | 31-08-25
Enhancing Health Fact-Checking with LLM-Generated Synthetic Data
Authors: Jingze Zhang, Jiahe Qian, Yiliang Zhou, Yifan Peng |
Source: ArXiv AI | 31-08-25
Uncertainty Under the Curve: A Sequence-Level Entropy Area Metric for Reasoning LLM
Authors: Yongfu Zhu, Lin Sun, Guangxiang Zhao, Weihong Lin, Xiangzheng Zhang |
Source: ArXiv AI | 31-08-25
Transparent Semantic Spaces: A Categorical Approach to Explainable Word Embeddings
Authors: Ares Fabregat-Hernández (1 and 2), Javier Palanca (1), Vicent Botti (1 and 3) ((1) Valencian Research Institute for Artificial Intelligence (VRAIN) Universitat Politècnica de València (2) Universidad Internacional de Valencia (VIU) (3) valgrAI (Valencian Graduate School and Research Network of Artificial Intelligence)) |
Source: ArXiv AI | 31-08-25
Bridging Minds and Machines: Toward an Integration of AI and Cognitive Science
Authors: Rui Mao, Qian Liu, Xiao Li, Erik Cambria, Amir Hussain |
Source: ArXiv AI | 31-08-25
A Graph-Based Test-Harness for LLM Evaluation
Authors: Jessica Lundin, Guillaume Chabot-Couture |
Source: ArXiv AI | 31-08-25
ChatThero: An LLM-Supported Chatbot for Behavior Change and Therapeutic Support in Addiction Recovery
Authors: Junda Wang, Zonghai Yao, Zhichao Yang, Lingxi Li, Junhui Qian, Hong Yu |
Source: ArXiv AI | 31-08-25
Show HN: Hacker News em dash user leaderboard pre-ChatGPT (gally.net)
Source: Hacker News | 31-08-25
The Default Trap: Why Anthropic's Data Policy Change Matters (natesnewsletter.substack.com)
Source: Hacker News | 31-08-25
Gemini still lags behind ChatGPT on the web, but Google now has four AI apps in the Top 50
Source: The Decoder | 31-08-25
The Theoretical Limitations of Embedding-Based Retrieval (arxiv.org)
Source: Hacker News | 30-08-25
Anthropic teams up with OpenAI for security tests and warns that AI is enabling cybercrime
Source: The Decoder | 30-08-25
SQLite's documentation about its durability properties is unclear (agwa.name)
Source: Hacker News | 30-08-25
Flunking my Anthropic interview again (taylor.town)
Source: Hacker News | 30-08-25
Pentagon Docs: US Wants to "Suppress Dissenting Arguments" Using AI Propaganda (theintercept.com)
Source: Hacker News | 30-08-25
In Search of AI Psychosis (astralcodexten.com)
Source: Hacker News | 29-08-25
Some thoughts on LLMs and software development (martinfowler.com)
Source: Hacker News | 29-08-25
Claude Sonnet will ship in Xcode (developer.apple.com)
Source: Hacker News | 29-08-25
If you have a Claude account, they're going to train on your data moving forward (reddit.com)
Source: Hacker News | 29-08-25
Anthropic reverses privacy stance, will train on Claude chats (perplexity.ai)
Source: Hacker News | 29-08-25
Show HN: Vectorless RAG (github.com/vectifyai)
Source: Hacker News | 29-08-25
Will AI Replace Human Thinking? The Case for Writing and Coding Manually (ssp.sh)
Source: Hacker News | 29-08-25
Show HN: SwiftAI – open-source library to easily build LLM features on iOS/macOS (github.com/mi12labs)
Source: Hacker News | 29-08-25
Are OpenAI and Anthropic losing money on inference? (martinalderson.com)
Source: Hacker News | 29-08-25
xAI claims Apple and OpenAI are shutting out competitors by making exclusive AI partnerships
Source: The Decoder | 28-08-25
Google's Gemini 2.5 Flash upgrades AI image editing with better prompt accuracy
Source: The Decoder | 28-08-25
AI tools like ChatGPT sharply reduce jobs for young workers in exposed fields, study shows
Source: The Decoder | 28-08-25
OpenAI's restructuring stalls as talks with Microsoft over API access and IP rights drag on
Source: The Decoder | 28-08-25
Two Meta Superintelligence hires jump back to OpenAI after only weeks
Source: The Decoder | 28-08-25
OpenAI adds new safeguards to ChatGPT after a lawsuit over a teen suicide
Source: The Decoder | 28-08-25
Claude Code Checkpoints (claude-checkpoints.com)
Source: Hacker News | 28-08-25
Prosper AI (YC S23) Is Hiring Founding Account Executives (NYC) (ashbyhq.com)
Source: Hacker News | 28-08-25
Rendering an ASCII game in real-time with AI (100ms latency) (jeffschomay.com)
Source: Hacker News | 28-08-25
The most important machine learning equations: A comprehensive guide (chizkidd.github.io)
Source: Hacker News | 28-08-25
Survey of Specialized Large Language Model
Authors: Chenghan Yang, Ruiyu Zhao, Yang Liu, Ling Jiang |
Source: ArXiv AI | 28-08-25
SoK: Large Language Model Copyright Auditing via Fingerprinting
Authors: Shuo Shao, Yiming Li, Yu He, Hongwei Yao, Wenyuan Yang, Dacheng Tao, Zhan Qin |
Source: ArXiv AI | 28-08-25
Generative AI for Testing of Autonomous Driving Systems: A Survey
Authors: Qunying Song, He Ye, Mark Harman, Federica Sarro |
Source: ArXiv AI | 28-08-25
Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation
Authors: Slimane Bellaouar, Attia Nehar, Soumia Souffi, Mounia Bouameur |
Source: ArXiv AI | 28-08-25
Decomposing Behavioral Phase Transitions in LLMs: Order Parameters for Emergent Misalignment
Authors: Julian Arnold, Niels Lörch |
Source: ArXiv AI | 28-08-25
Large Language Models (LLMs) for Electronic Design Automation (EDA)
Authors: Kangwei Xu, Denis Schwachhofer, Jason Blocklove, Ilia Polian, Peter Domanski, Dirk Pflüger, Siddharth Garg, Ramesh Karri, Ozgur Sinanoglu, Johann Knechtel, Zhuorui Zhao, Ulf Schlichtmann, Bing Li |
Source: ArXiv AI | 28-08-25
Aleks: AI powered Multi Agent System for Autonomous Scientific Discovery via Data-Driven Approaches in Plant Science
Authors: Daoyuan Jin, Nick Gunner, Niko Carvajal Janke, Shivranjani Baruah, Kaitlin M. Gold, Yu Jiang |
Source: ArXiv AI | 28-08-25
Reliable Weak-to-Strong Monitoring of LLM Agents
Authors: Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu, Ankit Aich, Paula Rodriguez, Scale Red Team, Christina Q. Knight, Zifan Wang |
Source: ArXiv AI | 28-08-25
Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs
Authors: Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li |
Source: ArXiv AI | 28-08-25
Instructional Agents: LLM Agents on Automated Course Material Generation for Teaching Faculties
Authors: Huaiyuan Yao, Wanpeng Xu, Justin Turnau, Nadia Kellam, Hua Wei |
Source: ArXiv AI | 28-08-25
ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
Authors: Sining Zhoubian, Dan Zhang, Yuxiao Dong, Jie Tang |
Source: ArXiv AI | 28-08-25
InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning
Authors: Qihang Ai, Pi Bu, Yue Cao, Yingyao Wang, Jihao Gu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Zhicheng Zheng, Jun Song, Yuning Jiang, Bo Zheng |
Source: ArXiv AI | 28-08-25
CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments
Authors: Nitish Jaipuria, Lorenzo Gatto, Zijun Kan, Shankey Poddar, Bill Cheung, Diksha Bansal, Ramanan Balakrishnan, Aviral Suri, Jose Estevez |
Source: ArXiv AI | 28-08-25
Model Science: getting serious about verification, explanation and control of AI systems
Authors: Przemyslaw Biecek, Wojciech Samek |
Source: ArXiv AI | 28-08-25
Bring Your Own Agent to Zed – Featuring Gemini CLI (zed.dev)
Source: Hacker News | 28-08-25
Researchers find evidence of ChatGPT buzzwords turning up in everyday speech (fsu.edu)
Source: Hacker News | 28-08-25
CEO Arison says no single AI model will always meet Grindr’s needs
Source: The Decoder | 27-08-25
Deepseek reportedly delayed its latest AI model after technical issues with Huawei’s Ascend chips
Source: The Decoder | 27-08-25
Trump officials warn the EU while ChatGPT approaches stricter EU tech regulations
Source: The Decoder | 27-08-25
Gemini 2.5 Flash Image (googleblog.com)
Source: Hacker News | 27-08-25
Claude for Chrome (anthropic.com)
Source: Hacker News | 27-08-25
Nx compromised: malware uses Claude code CLI to explore the filesystem (semgrep.dev)
Source: Hacker News | 27-08-25
The AI in the Mirror: LLM Self-Recognition in an Iterated Public Goods Game
Authors: Olivia Long, Carter Teplica |
Source: ArXiv AI | 27-08-25
PKG-DPO: Optimizing Domain-Specific AI systems with Physics Knowledge Graphs and Direct Preference Optimization
Authors: Nitin Nagesh Kulkarni, Bryson Wilcox, Max Sawa, Jason Thom |
Source: ArXiv AI | 27-08-25
AI LLM Proof of Self-Consciousness and User-Specific Attractors
Authors: Jeffrey Camlin |
Source: ArXiv AI | 27-08-25
A Database-Driven Framework for 3D Level Generation with LLMs
Authors: Kaijie Xu, Clark Verbrugge |
Source: ArXiv AI | 27-08-25
Generic Guard AI in Stealth Game with Composite Potential Fields
Authors: Kaijie Xu, Clark Verbrugge |
Source: ArXiv AI | 27-08-25
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Authors: Jun Wang, Ninglun Gu, Kailai Zhang, Zijiao Zhang, Yelun Bao, Jin Yang, Xu Yin, Liwei Liu, Yihuan Liu, Pengyong Li, Gary G. Yen, Junchi Yan |
Source: ArXiv AI | 27-08-25
Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks
Authors: Dimitrios Rontogiannis, Maxime Peyrard, Nicolas Baldwin, Martin Josifoski, Robert West, Dimitrios Gunopulos |
Source: ArXiv AI | 27-08-25
Judicial Requirements for Generative AI in Legal Reasoning
Authors: Eljas Linna, Tuula Linna |
Source: ArXiv AI | 27-08-25
VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation
Authors: David Egea, Barproda Halder, Sanghamitra Dutta |
Source: ArXiv AI | 27-08-25
Novel Approaches to Artificial Intelligence Development Based on the Nearest Neighbor Method
Authors: I.I. Priezzhev, D.A. Danko, A.V. Shubin |
Source: ArXiv AI | 27-08-25
A Concurrent Modular Agent: Framework for Autonomous LLM Agents
Authors: Norihiro Maruyama, Takahide Yoshida, Hiroki Sato, Atsushi Masumori, Johnsmith, Takashi Ikegami |
Source: ArXiv AI | 27-08-25
Investigating Advanced Reasoning of Large Language Models via Black-Box Interaction
Authors: Congchi Yin, Tianyi Wu, Yankai Shu, Alex Gu, Yunhan Wang, Jun Shao, Xun Jiang, Piji Li |
Source: ArXiv AI | 27-08-25
Reasoning LLMs in the Medical Domain: A Literature Survey
Authors: Armin Berger, Sarthak Khanna, David Berghaus, Rafet Sifa |
Source: ArXiv AI | 27-08-25
Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty
Authors: Zhichao Yang, Zhaoxin Fan, Gen Li, Yuanze Hu, Xinyu Wang, Ye Qiu, Xin Wang, Yifan Sun, Wenjun Wu |
Source: ArXiv AI | 27-08-25
Playstyle and Artificial Intelligence: An Initial Blueprint Through the Lens of Video Games
Authors: Chiu-Chou Lin |
Source: ArXiv AI | 27-08-25
Proposal: AI Content Disclosure Header (ietf.org)
Source: Hacker News | 27-08-25
Show HN: Async – Claude code and Linear and GitHub PRs in one opinionated tool (github.com/bkdevs)
Source: Hacker News | 27-08-25
LiteLLM (YC W23) is hiring a back end engineer (ycombinator.com)
Source: Hacker News | 27-08-25
Students who cheat are more likely to use generative AI tools for academic work, study finds
Source: The Decoder | 26-08-25
Meta licenses Midjourney's "aesthetic technology" to improve visual quality of future models
Source: The Decoder | 26-08-25
The Annotated Transformer (2022) (seas.harvard.edu)
Source: Hacker News | 26-08-25
Will Smith's concert crowds are real, but AI is blurring the lines (waxy.org)
Source: Hacker News | 26-08-25
Evaluation and LLM-Guided Learning of ICD Coding Rationales
Authors: Mingyang Li, Viktor Schlegel, Tingting Mu, Wuraola Oyewusi, Kai Kang, Goran Nenadic |
Source: ArXiv AI | 26-08-25
Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018
Authors: Liu Liu, Rui Dai |
Source: ArXiv AI | 26-08-25
Quantifying Sycophancy as Deviations from Bayesian Rationality in LLMs
Authors: Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani |
Source: ArXiv AI | 26-08-25
Large Language Model-Based Automatic Formulation for Stochastic Optimization Models
Authors: Amirreza Talebi |
Source: ArXiv AI | 26-08-25
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs
Authors: Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, Chenyu You |
Source: ArXiv AI | 26-08-25
Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities
Authors: Sz-Ting Tzeng, Frank Dignum |
Source: ArXiv AI | 26-08-25
PowerChain: Automating Distribution Grid Analysis with Agentic AI Workflows
Authors: Emmanuel O. Badmus, Peng Sang, Dimitrios Stamoulis, Amritanshu Pandey |
Source: ArXiv AI | 26-08-25
Federated Reinforcement Learning for Runtime Optimization of AI Applications in Smart Eyewears
Authors: Hamta Sedghani, Abednego Wamuhindo Kambale, Federica Filippini, Francesca Palermo, Diana Trojaniello, Danilo Ardagna |
Source: ArXiv AI | 26-08-25
L-XAIDS: A LIME-based eXplainable AI framework for Intrusion Detection Systems
Authors: Aoun E Muhammad, Kin-Choong Yow, Nebojsa Bacanin-Dzakula, Muhammad Attique Khan |
Source: ArXiv AI | 26-08-25
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Authors: Mia Taylor, James Chua, Jan Betley, Johannes Treutlein, Owain Evans |
Source: ArXiv AI | 26-08-25
Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets
Authors: Nikolaos Pavlidis, Vasilis Perifanis, Symeon Symeonidis, Pavlos S. Efraimidis |
Source: ArXiv AI | 26-08-25
Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction
Authors: Yiming Xu, Junfeng Jiao |
Source: ArXiv AI | 26-08-25
Interpretable Early Failure Detection via Machine Learning and Trace Checking-based Monitoring
Authors: Andrea Brunello, Luca Geatti, Angelo Montanari, Nicola Saccomanno |
Source: ArXiv AI | 26-08-25
AgentRAN: An Agentic AI Architecture for Autonomous Control of Open 6G Networks
Authors: Maxime Elkael, Salvatore D'Oro, Leonardo Bonati, Michele Polese, Yunseong Lee, Koichiro Furueda, Tommaso Melodia |
Source: ArXiv AI | 26-08-25
LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios
Authors: Bingxi Zhao, Lin Geng Foo, Ping Hu, Christian Theobalt, Hossein Rahmani, Jun Liu |
Source: ArXiv AI | 26-08-25
Neural Algorithmic Reasoners informed Large Language Model for Multi-Agent Path Finding
Authors: Pu Feng, Size Wang, Yuhong Cao, Junkang Liang, Rongye Shi, Wenjun Wu |
Source: ArXiv AI | 26-08-25
FAIRGAMER: Evaluating Biases in the Application of Large Language Models to Video Games
Authors: Bingkang Shi, Jen-tse Huang, Guoyi Li, Xiaodan Zhang, Zhongjiang Yao |
Source: ArXiv AI | 26-08-25
Teaching LLMs to Think Mathematically: A Critical Study of Decision-Making via Optimization
Authors: Mohammad J. Abdel-Rahman, Yasmeen Alslman, Dania Refai, Amro Saleh, Malik A. Abu Loha, Mohammad Yahya Hamed |
Source: ArXiv AI | 26-08-25
The AI Data Scientist
Authors: Farkhad Akimov, Munachiso Samuel Nwadike, Zangir Iklassov, Martin Takáč |
Source: ArXiv AI | 26-08-25
Unraveling the cognitive patterns of Large Language Models through module communities
Authors: Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao |
Source: ArXiv AI | 26-08-25
ST-Raptor: LLM-Powered Semi-Structured Table Question Answering
Authors: Zirui Tang, Boyu Niu, Xuanhe Zhou, Boxiu Li, Wei Zhou, Jiannan Wang, Guoliang Li, Xinyi Zhang, Fan Wu |
Source: ArXiv AI | 26-08-25
Scamlexity: When agentic AI browsers get scammed (guard.io)
Source: Hacker News | 26-08-25
Launch HN: April (YC S25) – Voice AI to manage your email and calendar
Source: Hacker News | 26-08-25
Exploring the tragedy of the Counter-Strike 2 server browser (bphilip.uk)
Source: Hacker News | 26-08-25
Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking
Source: The Decoder | 25-08-25
Making games in Go: 3 months without LLMs vs. 3 days with LLMs (marianogappa.github.io)
Source: Hacker News | 25-08-25
Claim: GPT-5-pro can prove new interesting mathematics (twitter.com/sebastienbubeck)
Source: Hacker News | 25-08-25
Show HN: Clearcam – Add AI object detection to your IP CCTV cameras (github.com/roryclear)
Source: Hacker News | 25-08-25
YouTube made AI enhancements to videos without warning or permission (bbc.com)
Source: Hacker News | 25-08-25
Standard Thermal: Energy Storage 500x Cheaper Than Batteries (austinvernon.site)
Source: Hacker News | 25-08-25
Agent-C: a 4KB AI agent (github.com/bravenewxyz)
Source: Hacker News | 25-08-25
LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts
Authors: Darpan Aswal, Céline Hudelot |
Source: ArXiv AI | 25-08-25
MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
Authors: Adil Bahaj, Mounir Ghogho |
Source: ArXiv AI | 25-08-25
Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs
Authors: Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, zhifei zheng, Min Liu, Zhiyi yin, Jianping Zhang |
Source: ArXiv AI | 25-08-25
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Authors: Yakup Abrek Er, Ilker Kesen, Gözde Gül Şahin, Aykut Erdem |
Source: ArXiv AI | 25-08-25
FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Authors: Parker Seegmiller, Kartik Mehta, Soumya Saha, Chenyang Tao, Shereen Oraby, Arpit Gupta, Tagyoung Chung, Mohit Bansal, Nanyun Peng |
Source: ArXiv AI | 25-08-25
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
Authors: Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa |
Source: ArXiv AI | 25-08-25
MV-RAG: Retrieval Augmented Multiview Diffusion
Authors: Yosef Dayani, Omer Benishu, Sagie Benaim |
Source: ArXiv AI | 25-08-25
Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting
Authors: Zhuomin Chen, Dan Li, Jiahui Zhou, Shunyu Wu, Haozheng Ye, Jian Lou, See-Kiong Ng |
Source: ArXiv AI | 25-08-25
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
Authors: Heewoong Noh, Namkyeong Lee, Gyoung S. Na, Kibum Kim, Chanyoung Park |
Source: ArXiv AI | 25-08-25
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Authors: Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang |
Source: ArXiv AI | 25-08-25
Graph RAG as Human Choice Model: Building a Data-Driven Mobility Agent with Preference Chain
Authors: Kai Hu, Parfait Atchade-Adelomou, Carlo Adornetto, Adrian Mora-Carrero, Luis Alonso-Pastor, Ariel Noyman, Yubo Liu, Kent Larson |
Source: ArXiv AI | 25-08-25
Modular Embedding Recomposition for Incremental Learning
Authors: Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara |
Source: ArXiv AI | 25-08-25
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
Authors: Alisa Vinogradova (1), Vlad Vinogradov (1), Dmitrii Radkevich (1), Ilya Yasny (1), Dmitry Kobyzev (1), Ivan Izmailov (1), Katsiaryna Yanchanka (1), Andrey Doronichev (1) ((1) Optic Inc.) |
Source: ArXiv AI | 25-08-25
Comet AI browser can get prompt injected from any site, drain your bank account (twitter.com/zack_overflow)
Source: Hacker News | 25-08-25
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
Authors: Jon E. Froehlich, Jared Hwang, Zeyu Wang, John S. O'Meara, Xia Su, William Huang, Yang Zhang, Alex Fiannaca, Philip Nelson, Shaun Kane |
Source: ArXiv AI | 24-08-25
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Authors: Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie |
Source: ArXiv AI | 24-08-25
Numerical models outperform AI weather forecasts of record-breaking extremes
Authors: Zhongwei Zhang, Erich Fischer, Jakob Zscheischler, Sebastian Engelke |
Source: ArXiv AI | 24-08-25
Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI
Authors: Mohammed Elmusrati |
Source: ArXiv AI | 24-08-25
Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Authors: Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU |
Source: ArXiv AI | 24-08-25
Demonstrating Onboard Inference for Earth Science Applications with Spectral Analysis Algorithms and Deep Learning
Authors: Itai Zilberstein, Alberto Candela, Steve Chien, David Rijlaarsdam, Tom Hendrix, Leonie Buckley, Aubrey Dunne |
Source: ArXiv AI | 24-08-25
Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism
Authors: Ashmi Banerjee, Fitri Nur Aisyah, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo |
Source: ArXiv AI | 24-08-25
aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
Authors: Pengsong Zhang, Xiang Hu, Guowei Huang, Yang Qi, Heng Zhang, Xiuxu Li, Jiaxing Song, Jiabin Luo, Yijiang Li, Shuo Yin, Chengxiao Dai, Eric Hanchen Jiang, Xiaoyan Zhou, Zhenfei Yin, Boqin Yuan, Jing Dong, Guinan Su, Guanren Qiao, Haiming Tang, Anghong Du, Lili Pan, Zhenzhong Lan, Xinyu Liu |
Source: ArXiv AI | 24-08-25
R-ConstraintBench: Evaluating LLMs on NP-Complete Scheduling
Authors: Raj Jain, Marc Wetter |
Source: ArXiv AI | 24-08-25
LLM4Sweat: A Trustworthy Large Language Model for Hyperhidrosis Support
Authors: Wenjie Lin, Jin Wei-Kocsis |
Source: ArXiv AI | 24-08-25
Coarse-to-Fine Grounded Memory for LLM Agent Planning
Authors: Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu |
Source: ArXiv AI | 24-08-25
DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization
Authors: Jinning Yang, Wen Shi |
Source: ArXiv AI | 24-08-25
RETAIL: Towards Real-world Travel Planning for Large Language Models
Authors: Bin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu, Shuai Chen, Yuanfang Guo, Yunhong Wang |
Source: ArXiv AI | 24-08-25
From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence
Authors: Zihao Wang, Junming Zhang |
Source: ArXiv AI | 24-08-25
DeepThink3D: Enhancing Large Language Models with Programmatic Reasoning in Complex 3D Situated Reasoning Tasks
Authors: Jiayi Song, Rui Wan, Lipeng Ma, Weidong Yang, Qingyuan Zhou, Yixuan Li, Ben Fei |
Source: ArXiv AI | 24-08-25
Futurity as Infrastructure: A Techno-Philosophical Interpretation of the AI Lifecycle
Authors: Mark Cote, Susana Aires |
Source: ArXiv AI | 24-08-25
Measuring the environmental impact of delivering AI at Google Scale
Authors: Cooper Elsworth, Keguo Huang, David Patterson, Ian Schneider, Robert Sedivy, Savannah Goodman, Ben Townsend, Parthasarathy Ranganathan, Jeff Dean, Amin Vahdat, Ben Gomes, James Manyika |
Source: ArXiv AI | 24-08-25
What makes Claude Code so damn good (minusx.ai)
Source: Hacker News | 24-08-25
ThinkMesh: A Python lib for parallel thinking in LLMs (github.com/martianlantern)
Source: Hacker News | 24-08-25
How can AI ID a cat? (quantamagazine.org)
Source: Hacker News | 24-08-25
Turning Claude Code into My Best Design Partner (betweentheprompts.com)
Source: Hacker News | 24-08-25
Wildthing – A model trained on role-reversed ChatGPT conversations (youaretheassistantnow.com)
Source: Hacker News | 24-08-25
Google cements its place in the AI ecosystem by powering products from competing labs
Source: The Decoder | 24-08-25
Google is making AI Mode in Search more agentic and launching it in over 180 new countries
Source: The Decoder | 24-08-25
Reformulating web documents into synthetic data addresses the growing limits of AI training data
Source: The Decoder | 24-08-25
Microsoft’s AI boss warns the illusion of conscious AI could trigger psychosis
Source: The Decoder | 23-08-25
Measuring the environmental impact of AI inference (arstechnica.com)
Source: Hacker News | 23-08-25
My tips for using LLM agents to create software (efitz-thoughts.blogspot.com)
Source: Hacker News | 23-08-25
With its new AI health coach, Google leverages its platform advantage
Source: The Decoder | 23-08-25
Launch HN: Inconvo (YC S23) – AI agents for customer-facing analytics
Source: Hacker News | 23-08-25
Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing (arxiv.org)
Source: Hacker News | 23-08-25
Building AI products in the probabilistic era (giansegato.com)
Source: Hacker News | 22-08-25
AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard' (theregister.com)
Source: Hacker News | 22-08-25
Weaponizing image scaling against production AI systems (trailofbits.com)
Source: Hacker News | 22-08-25
From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf] (fertrevino.com)
Source: Hacker News | 22-08-25
Being confidently wrong is holding AI back (promptql.io)
Source: Hacker News | 22-08-25
Mark Zuckerberg freezes AI hiring amid bubble fears (telegraph.co.uk)
Source: Hacker News | 22-08-25
Mirage 2 – Generative World Engine (dynamicslab.ai)
Source: Hacker News | 22-08-25
Microsoft brings Copilot LLM features directly into Excel spreadsheet cells with a new in-cell function
Source: The Decoder | 21-08-25
Show HN: I replaced vector databases with Git for AI memory (PoC) (github.com/growth-kinetics)
Source: Hacker News | 21-08-25
AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders (theregister.com)
Source: Hacker News | 21-08-25
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Authors: NVIDIA: Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Dima Rekesh, Dina Yared, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Eileen Long, Elliott Ning, Eric Chung, Erick Galinkin, Evelina Bakhturina, Gargi Prasad, Gerald Shen, Haim Elisha, Harsh Sharma, Hayley Ross, Helen Ngo, Herman Sahota, Hexin Wang, Hoo Chang Shin, Hua Huang, Iain Cunningham, Igor Gitman, Ivan Moshkov, Jaehun Jung, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jimmy Zhang, Jinze Xue, Jocelyn Huang, Joey Conway, John Kamalu, Jonathan Cohen, Joseph Jennings, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kezhi Kong, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Kushan Ahmadian, Lawrence McAfee |
Source: ArXiv AI | 21-08-25
Post-hoc LLM-Supported Debugging of Distributed Processes
Authors: Dennis Schiese, Andreas Both |
Source: ArXiv AI | 21-08-25
Towards LLM-generated explanations for Component-based Knowledge Graph Question Answering Systems
Authors: Dennis Schiese, Aleksandr Perevalov, Andreas Both |
Source: ArXiv AI | 21-08-25
Adaptively Robust LLM Inference Optimization under Prediction Uncertainty
Authors: Zixi Chen, Yinyu Ye, Zijie Zhou |
Source: ArXiv AI | 21-08-25
Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination
Authors: João Vitor de Carvalho Silva, Douglas G. Macharet |
Source: ArXiv AI | 21-08-25
ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
Authors: Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang |
Source: ArXiv AI | 21-08-25
PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Authors: Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang |
Source: ArXiv AI | 21-08-25
Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use
Authors: Zhongzhou Chen |
Source: ArXiv AI | 21-08-25
Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference
Authors: Samir Abdaljalil, Erchin Serpedin, Khalid Qaraqe, Hasan Kurban |
Source: ArXiv AI | 21-08-25
TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting
Authors: Jiaming Leng, Yunying Bi, Chuan Qin, Bing Yin, Yanyong Zhang, Chao Wang |
Source: ArXiv AI | 21-08-25
From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning
Authors: Lixiang Yan |
Source: ArXiv AI | 21-08-25
Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
Authors: Mattson Ogg, Chace Ashcraft, Ritwik Bose, Raphael Norman-Tenazas, Michael Wolmetz |
Source: ArXiv AI | 21-08-25
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun |
Source: ArXiv AI | 21-08-25
The Agent Behavior: Model, Governance and Challenges in the AI Digital Age
Authors: Qiang Zhang, Pei Yan, Yijia Xu, Chuanpo Fu, Yong Fang, Yang Liu |
Source: ArXiv AI | 21-08-25
Automated Optimization Modeling through Expert-Guided Large Language Model Reasoning
Authors: Beinuo Yang, Qishen Zhou, Junyi Li, Xingchen Su, Simon Hu |
Source: ArXiv AI | 21-08-25
Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
Authors: Luca Annese, Sabrina Patania, Silvia Serino, Tom Foulsham, Silvia Rossi, Azzurra Ruggeri, Dimitri Ognibene |
Source: ArXiv AI | 21-08-25
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Authors: Ziyang Luo, Zhiqi Shen, Wenzhuo Yang, Zirui Zhao, Prathyusha Jwalapuram, Amrita Saha, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Junnan Li |
Source: ArXiv AI | 21-08-25
Entropy-Constrained Strategy Optimization in Urban Floods: A Multi-Agent Framework with LLM and Knowledge Graph Integration
Authors: Peilin Ji, Xiao Xue, Simeng Wang, Wenhao Yan |
Source: ArXiv AI | 21-08-25
Warnings about runaway expectations are growing louder throughout the AI industry
Source: The Decoder | 21-08-25
Visualizing GPT-OSS-20B embeddings (melonmars.github.io)
Source: Hacker News | 21-08-25
Gaussian Processes for Machine Learning (2006) [pdf] (gaussianprocess.org)
Source: Hacker News | 20-08-25
Show HN: Claude Code workflow: PRDs → GitHub Issues → parallel execution (github.com/automazeio)
Source: Hacker News | 20-08-25
ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery
Authors: Mohammad Izadi, Mehran Safayani |
Source: ArXiv AI | 20-08-25
Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun |
Source: ArXiv AI | 20-08-25
Ask Good Questions for Large Language Models
Authors: Qi Wu, Zhongqi Lu |
Source: ArXiv AI | 20-08-25
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Authors: Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee |
Source: ArXiv AI | 20-08-25
Cognitive Workspace: Active Memory Management for LLMs -- An Empirical Study of Functional Infinite Context
Authors: Tao An |
Source: ArXiv AI | 20-08-25
Towards Unified Multimodal Financial Forecasting: Integrating Sentiment Embeddings and Market Indicators via Cross-Modal Attention
Authors: Sarthak Khanna, Armin Berger, David Berghaus, Tobias Deusser, Lorenz Sparrenberg, Rafet Sifa |
Source: ArXiv AI | 20-08-25
"DIVE" into Hydrogen Storage Materials Discovery with AI Agents
Authors: Di Zhang, Xue Jia, Tran Ba Hung, Seong Hoon Jang, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Toyoto Sato, Kiyoe Konno, Shin-ichi Orimo, Hao Li |
Source: ArXiv AI | 20-08-25
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
Authors: Chentong Chen, Mengyuan Zhong, Jianyong Sun, Ye Fan, Jialong Shi |
Source: ArXiv AI | 20-08-25
STPFormer: A State-of-the-Art Pattern-Aware Spatio-Temporal Transformer for Traffic Forecasting
Authors: Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao |
Source: ArXiv AI | 20-08-25
Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance
Authors: Yue Fang, Yuxin Guo, Jiaran Gao, Hongxin Ding, Xinke Jiang, Weibin Liao, Yongxin Xu, Yinghao Zhu, Zhibang Yang, Liantao Ma, Junfeng Zhao, Yasha Wang |
Source: ArXiv AI | 20-08-25
Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models
Authors: Xiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai, Yu-Feng Li |
Source: ArXiv AI | 20-08-25
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model
Authors: Yu Li, Zulong Chen, Wenjian Xu, Hong Wen, Yipeng Yu, Man Lung Yiu, Yuyu Yin |
Source: ArXiv AI | 20-08-25
CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning
Authors: Minh Hoang Nguyen, Van Dai Do, Dung Nguyen, Thin Nguyen, Hung Le |
Source: ArXiv AI | 20-08-25
Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration
Authors: Yifei Chen, Guanting Dong, Yutao Zhu, Zhicheng Dou |
Source: ArXiv AI | 20-08-25
Expertise-aware Multi-LLM Recruitment and Collaboration for Medical Decision-Making
Authors: Liuxin Bao, Zhihao Peng, Xiaofei Zhou, Runmin Cong, Jiyong Zhang, Yixuan Yuan |
Source: ArXiv AI | 20-08-25
Improved Generalized Planning with LLMs through Strategy Refinement and Reflection
Authors: Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz, Alexander Koller |
Source: ArXiv AI | 20-08-25
The Collaboration Paradox: Why Generative AI Requires Both Strategic Intelligence and Operational Stability in Supply Chain Management
Authors: Soumyadeep Dhar |
Source: ArXiv AI | 20-08-25
Structured Agentic Workflows for Financial Time-Series Modeling with LLMs and Reflective Feedback
Authors: Yihao Ang, Yifan Bao, Lei Jiang, Jiajie Tao, Anthony K. H. Tung, Lukasz Szpruch, Hao Ni |
Source: ArXiv AI | 20-08-25
ChronoLLM: Customizing Language Models for Physics-Based Simulation Code Generation
Authors: Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, Dan Negrut |
Source: ArXiv AI | 20-08-25
Show HN: OpenAI/reflect – Physical AI Assistant that illuminates your life (github.com/openai)
Source: Hacker News | 20-08-25
Richard Sutton says the AI industry has "lost its way" by ignoring core principles of intelligence
Source: The Decoder | 20-08-25
Show HN: We started building an AI dev tool but it turned into a Sims-style game (youtube.com)
Source: Hacker News | 19-08-25
Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
Authors: Yuan Li, Zhengzhong Liu, Eric Xing |
Source: ArXiv AI | 19-08-25
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Authors: Zhiyuan Zeng, Jiashuo Liu, Siyuan Chen, Tianci He, Yali Liao, Jinpeng Wang, Zaiyuan Wang, Yang Yang, Lingyue Yin, Mingren Yin, Zhenwei Zhu, Tianle Cai, Zehui Chen, Jiecao Chen, Yantao Du, Xiang Gao, Jiacheng Guo, Liang Hu, Jianpeng Jiao, Xiangsheng Li, Jingkai Liu, Shuang Ni, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xin Zhou, Jose Blanchet, Xipeng Qiu, Mengdi Wang, Wenhao Huang |
Source: ArXiv AI | 19-08-25
MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization
Authors: Haochen You, Baojing Liu |
Source: ArXiv AI | 19-08-25
GraphCogent: Overcoming LLMs' Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding
Authors: Rongzheng Wang, Qizhi Chen, Yihong Huang, Yizhuo Ma, Muquan Li, Jiakai Li, Ke Qin, Guangchun Luo, Shuang Liang |
Source: ArXiv AI | 19-08-25
GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?
Authors: Yifang Tian, Yaming Liu, Zichun Chong, Zihang Huang, Hans-Arno Jacobsen |
Source: ArXiv AI | 19-08-25
An LLM + ASP Workflow for Joint Entity-Relation Extraction
Authors: Trang Tran, Trung Hoang Le, Huiping Cao, Tran Cao Son |
Source: ArXiv AI | 19-08-25
Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models
Authors: Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li |
Source: ArXiv AI | 19-08-25
GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance
Authors: Jinquan Shi, Yingying Cheng, Fan Zhang, Miao Jiang, Jun Lin, Yanbai Shen |
Source: ArXiv AI | 19-08-25
The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning
Authors: Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang |
Source: ArXiv AI | 19-08-25
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
Authors: Alessio Galatolo, Luca Alberto Rappuoli, Katie Winkle, Meriem Beloucif |
Source: ArXiv AI | 19-08-25
GTool: Graph Enhanced Tool Planning with Large Language Model
Authors: Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi |
Source: ArXiv AI | 19-08-25
Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption
Authors: Faruk Alpay, Taylan Alpay |
Source: ArXiv AI | 19-08-25
E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model
Authors: Ronghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu, Yap-peng Tan |
Source: ArXiv AI | 19-08-25
Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
Authors: Ting Yang, Li Chen, Huimin Wang |
Source: ArXiv AI | 19-08-25
Do Large Language Model Agents Exhibit a Survival Instinct? An Empirical Study in a Sugarscape-Style Simulation
Authors: Atsushi Masumori, Takashi Ikegami |
Source: ArXiv AI | 19-08-25
Tencent's X-Omni uses open source components to challenge GPT-4o image generation
Source: The Decoder | 18-08-25
ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection
Authors: Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu |
Source: ArXiv AI | 18-08-25
LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought
Authors: Ruiyan Qi, Congding Wen, Weibo Zhou, Shangsong Liang, Lingbo Li |
Source: ArXiv AI | 18-08-25
Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
Authors: Francesco Sovrano, Gabriele Dominici, Rita Sevastjanova, Alessandra Stramiglio, Alberto Bacchelli |
Source: ArXiv AI | 18-08-25
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Authors: Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen |
Source: ArXiv AI | 18-08-25
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Authors: Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui |
Source: ArXiv AI | 18-08-25
Leveraging the RETFound foundation model for optic disc segmentation in retinal images
Authors: Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco |
Source: ArXiv AI | 18-08-25
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Authors: Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao |
Source: ArXiv AI | 18-08-25
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Authors: Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov |
Source: ArXiv AI | 18-08-25
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis
Authors: Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, Xin Sun, Junxiao Wang |
Source: ArXiv AI | 18-08-25
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Authors: Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou |
Source: ArXiv AI | 18-08-25
Reference Points in LLM Sentiment Analysis: The Role of Structured Context
Authors: Junichiro Niimi |
Source: ArXiv AI | 18-08-25
Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies
Authors: Fanzhen Liu, Xiaoxiao Ma, Jian Yang, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Quan Z. Sheng, Jia Wu |
Source: ArXiv AI | 18-08-25
Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Authors: Erez Meoded |
Source: ArXiv AI | 18-08-25
A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow
Authors: George Paterakis, Andrea Castellani, George Papoutsoglou, Tobias Rodemann, Ioannis Tsamardinos |
Source: ArXiv AI | 18-08-25
CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Authors: Zhihao Li, Zimo Ji, Tao Zheng, Hao Ren, Xiao Lan |
Source: ArXiv AI | 18-08-25
Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
Authors: Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che |
Source: ArXiv AI | 18-08-25
Controlling Multimodal LLMs via Reward-guided Decoding
Authors: Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal |
Source: ArXiv AI | 18-08-25
Is ChatGPT-5 Ready for Mammogram VQA?
Authors: Qiang Li, Shansong Wang, Mingzhe Hu, Mojtaba Safari, Zachary Eidex, Xiaofeng Yang |
Source: ArXiv AI | 18-08-25
SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding
Authors: Yifei Li, Lingling Zhang, Hang Yan, Tianzhe Zhao, Zihan Ma, Muye Huang, Jun Liu |
Source: ArXiv AI | 18-08-25
AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager
Authors: Xuhua Zhao, Yuxuan Xie, Caihua Chen, Yuxiang Sun |
Source: ArXiv AI | 18-08-25
Inspire or Predict? Exploring New Paradigms in Assisting Classical Planners with Large Language Models
Authors: Wenkai Yu, Jianhang Tang, Yang Zhang, Shanjiang Tang, Kebing Jin, Hankz Hankui Zhuo |
Source: ArXiv AI | 18-08-25
LLMs and coding agents are a security nightmare (garymarcus.substack.com)
Source: Hacker News | 18-08-25
Llama-Scan: Convert PDFs to Text W Local LLMs (github.com/ngafar)
Source: Hacker News | 18-08-25
When you're asking AI chatbots for answers, they're data-mining you (theregister.com)
Source: Hacker News | 18-08-25
Claudia – Desktop companion for Claude code (claudiacode.com)
Source: Hacker News | 18-08-25
Teaching GPT-5 to Use a Computer (prava.co)
Source: Hacker News | 18-08-25
Here be dragons: Preventing static damage, latchup, and metastability in the 386 (righto.com)
Source: Hacker News | 18-08-25
Warmer-sounding LLMs are more likely to repeat false information and conspiracy theories
Source: The Decoder | 18-08-25
Performance of GPT-5 in Brain Tumor MRI Reasoning
Authors: Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang |
Source: ArXiv AI | 17-08-25
From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms
Authors: Zhaokun Jiang, Ziyin Zhang |
Source: ArXiv AI | 17-08-25
A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots
Authors: Henry Powell, Guy Laban, Emily S. Cross |
Source: ArXiv AI | 17-08-25
TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning
Authors: Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra |
Source: ArXiv AI | 17-08-25
Searching for Privacy Risks in LLM Agents via Simulation
Authors: Yanzhe Zhang, Diyi Yang |
Source: ArXiv AI | 17-08-25
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Authors: Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna |
Source: ArXiv AI | 17-08-25
A Survey of Optimization Modeling Meets LLMs: Progress and Future Directions
Authors: Ziyang Xiao, Jingrong Xie, Lilin Xu, Shisi Guan, Jingyan Zhu, Xiongwei Han, Xiaojin Fu, WingYin Yu, Han Wu, Wei Shi, Qingcan Kang, Jiahui Duan, Tao Zhong, Mingxuan Yuan, Jia Zeng, Yuan Wang, Gang Chen, Dongxiang Zhang |
Source: ArXiv AI | 17-08-25
KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems
Authors: Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman |
Source: ArXiv AI | 17-08-25
Agentic AI Frameworks: Architectures, Protocols, and Design Challenges
Authors: Hana Derouiche, Zaki Brahmi, Haithem Mazeni |
Source: ArXiv AI | 17-08-25
Why Cannot Large Language Models Ever Make True Correct Reasoning?
Authors: Jingde Cheng |
Source: ArXiv AI | 17-08-25
Extending the Entropic Potential of Events for Uncertainty Quantification and Decision-Making in Artificial Intelligence
Authors: Mark Zilberman |
Source: ArXiv AI | 17-08-25
What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles
Authors: Mengtao Zhou, Sifan Wu, Huan Zhang, Qi Sima, Bang Liu |
Source: ArXiv AI | 17-08-25
A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering
Authors: Chenliang Zhang, Lin Wang, Yuanyuan Lu, Yusheng Qi, Kexin Wang, Peixu Hou, Wenshi Chen |
Source: ArXiv AI | 17-08-25
HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
Authors: Yan Ting Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang |
Source: ArXiv AI | 17-08-25
LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval
Authors: Yaoze Zhang, Rong Wu, Pinlong Cai, Xiaoman Wang, Guohang Yan, Song Mao, Ding Wang, Botian Shi |
Source: ArXiv AI | 17-08-25
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model
Authors: Shicheng Xu, Xin Huang, Zihao Wei, Liang Pang, Huawei Shen, Xueqi Cheng |
Source: ArXiv AI | 17-08-25
SEQ-GPT: LLM-assisted Spatial Query via Example
Authors: Ivan Khai Ze Lim, Ningyi Liao, Yiming Yang, Gerald Wei Yong Yip, Siqiang Luo |
Source: ArXiv AI | 17-08-25
FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
Authors: Xueli Pan, Victor de Boer, Jacco van Ossenbruggen |
Source: ArXiv AI | 17-08-25
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Authors: Xinyan Jiang, Lin Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu |
Source: ArXiv AI | 17-08-25
GenOM: Ontology Matching with Description Generation and Large Language Model
Authors: Yiping Song, Jiaoyan Chen, Renate A. Schmidt |
Source: ArXiv AI | 17-08-25
Modeling Human Responses to Multimodal AI Content
Authors: Zhiqi Shen, Shaojing Fan, Danni Xu, Terence Sim, Mohan Kankanhalli |
Source: ArXiv AI | 17-08-25
Who Benefits from AI Explanations? Towards Accessible and Interpretable Systems
Authors: Maria J. P. Peixoto, Akriti Pandey, Ahsan Zaman, Peter R. Lewis |
Source: ArXiv AI | 17-08-25
The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference
Authors: Maël Jullien, Marco Valentino, André Freitas |
Source: ArXiv AI | 17-08-25
Tversky Neural Networks (gonzoml.substack.com)
Source: Hacker News | 17-08-25
A Lisp in 99LOC (github.com/robert-van-engelen)
Source: Hacker News | 17-08-25
Dyna – Logic Programming for Machine Learning (dyna.org)
Source: Hacker News | 17-08-25
OpenAI Misled You on RLHF (aerial-toothpaste-34a.notion.site)
Source: Hacker News | 17-08-25
OpenAI Progress (progress.openai.com)
Source: Hacker News | 17-08-25
OpenAI CEO Sam Altman says human-made content will "go up in value dramatically"
Source: The Decoder | 17-08-25
Google unveils Gemma 3 270M, its most compact model designed for efficient, task-specific AI use
Source: The Decoder | 17-08-25
Monday – A personality experiment (chatgpt.com)
Source: Hacker News | 17-08-25
Zhipu AI's GLM-4.5 is yet another open-source Chinese LLM closing the gap with Western models
Source: The Decoder | 17-08-25
Launch HN: Embedder (YC S25) – Claude code for embedded software
Source: Hacker News | 16-08-25
Geoffrey Hinton urges researchers to design AI with nurturing instincts to protect humanity
Source: The Decoder | 16-08-25
HTC unveils VIVE Eagle, a lightweight AI headset powered by OpenAI and Gemini
Source: The Decoder | 16-08-25
I let LLMs write an Elixir NIF in C; it mostly worked (overbring.com)
Source: Hacker News | 16-08-25
Claude Opus 4 and 4.1 can now end a rare subset of conversations (anthropic.com)
Source: Hacker News | 16-08-25
OpenAI's o3 model outperforms the newer GPT-5 model on complex, multi-app office tasks
Source: The Decoder | 16-08-25
Apple is reportedly planning an AI push with four new smart home products
Source: The Decoder | 15-08-25
Doctors detected fewer lesions after routinely using AI during colonoscopies
Source: The Decoder | 15-08-25
A conversation with Max Tegmark inspired xAI co-founder Igor Babuschkin's shift to safer AI
Source: The Decoder | 15-08-25
Why LLMs can't really build software (zed.dev)
Source: Hacker News | 15-08-25
Is chain-of-thought AI reasoning a mirage? (seangoedecke.com)
Source: Hacker News | 15-08-25
OpenAI's AI system wins a gold medal-level score at the International Olympiad in Informatics 2025
Source: The Decoder | 14-08-25
ChatGPT users can now toggle Auto, Fast, and Thinking modes for more control over GPT-5
Source: The Decoder | 14-08-25
Show HN: Vaultrice – A real-time key-value store with a localStorage API (vaultrice.com)
Source: Hacker News | 14-08-25
Convo-Lang: LLM Programming Language and Runtime (convo-lang.ai)
Source: Hacker News | 14-08-25
Show HN: Yet another memory system for LLMs (github.com/trvon)
Source: Hacker News | 14-08-25
Mbodi AI (YC X25) Is Hiring a Founding Research Engineer (Robotics) (ycombinator.com)
Source: Hacker News | 14-08-25
What's the strongest AI model you can train on a laptop in five minutes? (seangoedecke.com)
Source: Hacker News | 14-08-25
Evaluating the Role of Large Language Models in Legal Practice in India
Authors: Rahul Hemrajani (National Law School of India University, Bengaluru) |
Source: ArXiv AI | 14-08-25
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
Authors: Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci |
Source: ArXiv AI | 14-08-25
Enhance the machine learning algorithm performance in phishing detection with keyword features
Authors: Zijiang Yang |
Source: ArXiv AI | 14-08-25
A Comprehensive Survey of Datasets for Clinical Mental Health AI Systems
Authors: Aishik Mandal, Prottay Kumar Adhikary, Hiba Arnaout, Iryna Gurevych, Tanmoy Chakraborty |
Source: ArXiv AI | 14-08-25
LibRec: Benchmarking Retrieval-Augmented LLMs for Library Migration Recommendations
Authors: Junxiao Han, Yarong Wang, Xiaodong Gu, Cuiyun Gao, Yao Wan, Song Han, David Lo, Shuiguang Deng |
Source: ArXiv AI | 14-08-25
Perceptual Reality Transformer: Neural Architectures for Simulating Neurological Perception Conditions
Authors: Baihan Lin |
Source: ArXiv AI | 14-08-25
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng |
Source: ArXiv AI | 14-08-25
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Authors: Linh Nguyen, Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam |
Source: ArXiv AI | 14-08-25
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs
Authors: Arjun Ashok, Andrew Robert Williams, Vincent Zhihao Zheng, Irina Rish, Nicolas Chapados, Étienne Marcotte, Valentina Zantedeschi, Alexandre Drouin |
Source: ArXiv AI | 14-08-25
Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Authors: Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin |
Source: ArXiv AI | 14-08-25
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Authors: Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti |
Source: ArXiv AI | 14-08-25
A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Authors: Muneeza Azmat, Momin Abbas, Maysa Malfiza Garcia de Macedo, Marcelo Carpinette Grave, Luan Soares de Souza, Tiago Machado, Rogerio A de Paula, Raya Horesh, Yixin Chen, Heloisa Caroline de Souza Pereira Candello, Rebecka Nordenlow, Aminat Adebiyi |
Source: ArXiv AI | 14-08-25
The Othello AI Arena: Evaluating Intelligent Systems Through Limited-Time Adaptation to Unseen Boards
Authors: Sundong Kim |
Source: ArXiv AI | 14-08-25
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
Authors: Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li |
Source: ArXiv AI | 14-08-25
The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?
Authors: Manuel Herrador |
Source: ArXiv AI | 14-08-25
UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge
Authors: Yang Zhang, Cunxiang Wang, Lindong Wu, Wenbo Yu, Yidong Wang, Guangsheng Bao, Jie Tang |
Source: ArXiv AI | 14-08-25
RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA
Authors: Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, Viktoria Rojkova |
Source: ArXiv AI | 14-08-25
Mathematical Computation and Reasoning Errors by Large Language Models
Authors: Liang Zhang, Edith Aurora Graf |
Source: ArXiv AI | 14-08-25
Claude says “You're absolutely right!” about everything (github.com/anthropics)
Source: Hacker News | 14-08-25
Illinois bans use of artificial intelligence for mental health therapy (washingtonpost.com)
Source: Hacker News | 14-08-25
Nvidia researchers urge the AI industry to rethink agentic AI in favor of smaller, more efficient LLMs
Source: The Decoder | 13-08-25
Nvidia pushes "Physical AI" with new Blackwell hardware and AI models
Source: The Decoder | 13-08-25
Psychiatrist warns of AI-driven delusions as OpenAI's Sam Altman admits risks
Source: The Decoder | 13-08-25
GPT-5 is here and Gary Marcus is not impressed
Source: The Decoder | 13-08-25
Nvidia and AMD must pay the U.S. a portion of revenue for selling AI chips in China
Source: The Decoder | 13-08-25
A Comprehensive Survey of Self-Evolving AI Agents [pdf] (arxiv.org)
Source: Hacker News | 13-08-25
Show HN: Omnara – Run Claude Code from anywhere (github.com/omnara-ai)
Source: Hacker News | 13-08-25
Show HN: Building a web search engine from scratch with 3B neural embeddings (blog.wilsonl.in)
Source: Hacker News | 13-08-25
His psychosis was a mystery–until doctors learned about ChatGPT's health advice (psypost.org)
Source: Hacker News | 13-08-25
Claude Sonnet 4 now supports 1M tokens of context (anthropic.com)
Source: Hacker News | 13-08-25
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
Authors: Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang Yu, Lionel M. Ni, Heung-Yeung Shum |
Source: ArXiv AI | 13-08-25
Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams
Authors: Zane Witherspoon, Thet Mon Aye, YingYing Hao |
Source: ArXiv AI | 13-08-25
UrzaGPT: LoRA-Tuned Large Language Models for Card Selection in Collectible Card Games
Authors: Timo Bertram |
Source: ArXiv AI | 13-08-25
What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
Authors: Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab |
Source: ArXiv AI | 13-08-25
First Ask Then Answer: A Framework Design for AI Dialogue Based on Supplementary Questioning with Large Language Models
Authors: Chuanruo Fu, Yuncheng Du |
Source: ArXiv AI | 13-08-25
LLM-BI: Towards Fully Automated Bayesian Inference with Large Language Models
Authors: Yongchao Huang |
Source: ArXiv AI | 13-08-25
Topos Theory for Generative AI and LLMs
Authors: Sridhar Mahadevan |
Source: ArXiv AI | 13-08-25
POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem
Authors: Szymon Jakubicz, Karol Kuźniak, Jan Wawszczak, Paweł Gora |
Source: ArXiv AI | 13-08-25
AgriGPT: a Large Language Model Ecosystem for Agriculture
Authors: Bo Yang, Yu Zhang, Lanfei Feng, Yunkui Chen, Jianyu Zhang, Xiao Xu, Nueraili Aierken, Yurui Li, Yuxuan Chen, Guijun Yang, Yong He, Runhe Huang, Shijian Li |
Source: ArXiv AI | 13-08-25
SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
Authors: Arshia Ilaty, Hossein Shirazi, Hajar Homayouni |
Source: ArXiv AI | 13-08-25
GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games
Authors: Yuchen Li, Cong Lin, Muhammad Umair Nasir, Philip Bontrager, Jialin Liu, Julian Togelius |
Source: ArXiv AI | 13-08-25
Large Language Models as Oracles for Ontology Alignment
Authors: Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jimenez-Ruiz, Artur d'Avila Garcez |
Source: ArXiv AI | 13-08-25
Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training
Authors: Vishakha Lall, Yisi Liu |
Source: ArXiv AI | 13-08-25
A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions
Authors: Amir Mohammad Salehoof, Ali Ramezani, Yadollah Yaghoobzadeh, Majid Nili Ahmadabadi |
Source: ArXiv AI | 13-08-25
Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition
Authors: Mustafa Akben, Vinayaka Gude, Haya Ajjan |
Source: ArXiv AI | 13-08-25
Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
Authors: Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang |
Source: ArXiv AI | 13-08-25
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Authors: Rui Wang, Qihan Lin, Jiayu Liu, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song |
Source: ArXiv AI | 13-08-25
Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs
Authors: Shivam Dubey |
Source: ArXiv AI | 13-08-25
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
Authors: Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey |
Source: ArXiv AI | 13-08-25
CVCM Track Circuits Pre-emptive Failure Diagnostics for Predictive Maintenance Using Deep Neural Networks
Authors: Debdeep Mukherjee (2), Eduardo Di Santi (1), Clément Lefebvre (1), Nenad Mijatovic (1), Victor Martin (1), Thierry Josse (3), Jonathan Brown (1), Kenza Saiah (1) ((1) Digital and Integrated Systems, Alstom (2) Innovation and Smart Mobility, Alstom (3) Project System Engineering, Alstom) |
Source: ArXiv AI | 13-08-25
SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
Authors: Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao |
Source: ArXiv AI | 13-08-25
Agent-based AI systems face growing threats from zero-click and one-click exploits
Source: The Decoder | 13-08-25
Nexus: An Open-Source AI Router for Governance, Control and Observability (nexusrouter.com)
Source: Hacker News | 13-08-25
Evaluating LLMs playing text adventures (entropicthoughts.com)
Source: Hacker News | 13-08-25
LLMs aren't world models (yosefk.com)
Source: Hacker News | 13-08-25
Launch HN: Design Arena (YC S25) – Head-to-head AI benchmark for aesthetics
Source: Hacker News | 13-08-25
U.S. authorities have reportedly embedded secret GPS trackers in shipments of advanced AI chips
Source: The Decoder | 13-08-25
Here’s how to spot AI writing, according to Wikipedia editors
Source: The Decoder | 12-08-25
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (arstechnica.com)
Source: Hacker News | 12-08-25
Sloppy AI defenses take cybersecurity back to the 1990s, researchers say (scworld.com)
Source: Hacker News | 12-08-25
Claude Code is all you need (dwyer.co.za)
Source: Hacker News | 12-08-25
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams
Authors: Pengfei Zhou, Xiaopeng Peng, Fanrui Zhang, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Zekai Li, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang |
Source: ArXiv AI | 12-08-25
Automated Formalization via Conceptual Retrieval-Augmented LLMs
Authors: Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu |
Source: ArXiv AI | 12-08-25
DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning
Authors: Dan Ivanov, Tristan Freiberg, Haruna Isah |
Source: ArXiv AI | 12-08-25
MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Authors: Changqing Li, Tianlin Li, Xiaohan Zhang, Aishan Liu, Li Pan |
Source: ArXiv AI | 12-08-25
Large Language Models Do Not Simulate Human Psychology
Authors: Sarah Schröder, Thekla Morgenroth, Ulrike Kuhl, Valerie Vaquet, Benjamin Paaßen |
Source: ArXiv AI | 12-08-25
Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach
Authors: Naseem Machlovi, Maryam Saleki, Innocent Ababio, Ruhul Amin |
Source: ArXiv AI | 12-08-25
Generative AI for Strategic Plan Development
Authors: Jesse Ponnock |
Source: ArXiv AI | 12-08-25
Rethinking Domain-Specific LLM Benchmark Construction: A Comprehensiveness-Compactness Approach
Authors: Rubing Chen, Jiaxin Wu, Jian Wang, Xulu Zhang, Wenqi Fan, Chenghua Lin, Xiao-Yong Wei, Qing Li |
Source: ArXiv AI | 12-08-25
MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark
Authors: Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo |
Source: ArXiv AI | 12-08-25
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
Authors: Alexander Duffy, Samuel J Paech, Ishana Shastri, Elizabeth Karpinski, Baptiste Alloui-Cros, Tyler Marques, Matthew Lyle Olson |
Source: ArXiv AI | 12-08-25
Grounding Natural Language for Multi-agent Decision-Making with Multi-agentic LLMs
Authors: Dom Huh, Prasant Mohapatra |
Source: ArXiv AI | 12-08-25
Multimodal AI Systems for Enhanced Laying Hen Welfare Assessment and Productivity Optimization
Authors: Daniel Essien, Suresh Neethirajan |
Source: ArXiv AI | 12-08-25
1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Authors: Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap |
Source: ArXiv AI | 12-08-25
Symmetry-Aware Transformer Training for Automated Planning
Authors: Markus Fritzsche, Elliot Gestrin, Jendrik Seipp |
Source: ArXiv AI | 12-08-25
X-evolve: Solution space evolution powered by large language models
Authors: Yi Zhai, Zhiqiang Wei, Ruohan Li, Keyu Pan, Shuo Liu, Lu Zhang, Jianmin Ji, Wuyang Zhang, Yu Zhang, Yanyong Zhang |
Source: ArXiv AI | 12-08-25
FEAT: A Multi-Agent Forensic AI System with Domain-Adapted Large Language Model for Automated Cause-of-Death Analysis
Authors: Chen Shen, Wanqing Zhang, Kehan Li, Erwen Huang, Haitao Bi, Aiying Fan, Yiwen Shen, Hongmei Dong, Ji Zhang, Yuming Shao, Zengjia Liu, Xinshe Liu, Tao Li, Chunxia Yan, Shuanliang Fan, Di Wu, Jianhua Ma, Bin Cong, Zhenyuan Wang, Chunfeng Lian |
Source: ArXiv AI | 12-08-25
Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths
Authors: Rui Yao (1), Qi Chai (1 and 3), Jinhai Yao (2), Siyuan Li (1), Junhao Chen (1), Qi Zhang (2), Hao Wang (1) ((1) The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, (2) Shanghai Jiaotong University, Shanghai, China, (3) Xi'an Jiaotong University, Xi'an, China) |
Source: ArXiv AI | 12-08-25
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
Authors: Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang |
Source: ArXiv AI | 12-08-25
TeamMedAgents: Enhancing Medical Decision-Making of LLMs Through Structured Teamwork
Authors: Pranav Pushkar Mishra, Mohammad Arvan, Mohan Zalake (University of Illinois, Chicago) |
Source: ArXiv AI | 12-08-25
From Natural Language to Solver-Ready Power System Optimization: An LLM-Assisted, Validation-in-the-Loop Framework
Authors: Yunkai Hu, Tianqiao Zhao, Meng Yue |
Source: ArXiv AI | 12-08-25
Optimizing my sleep around Claude usage limits (mattwie.se)
Source: Hacker News | 12-08-25
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Authors: Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He |
Source: ArXiv AI | 12-08-25
End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation
Authors: Anurag Tripathi, Vaibhav Patle, Abhinav Jain, Ayush Pundir, Sairam Menon, Ajeet Kumar Singh |
Source: ArXiv AI | 12-08-25
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
Authors: Han Yin, Yafeng Chen, Chong Deng, Luyao Cheng, Hui Wang, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li |
Source: ArXiv AI | 12-08-25
Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks
Authors: Ze Shen Chin |
Source: ArXiv AI | 12-08-25
Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling
Authors: Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong |
Source: ArXiv AI | 12-08-25
Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation
Authors: Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, Jingkuan Song |
Source: ArXiv AI | 12-08-25
Echoes of Automation: The Increasing Use of LLMs in Newsmaking
Authors: Abolfazl Ansari, Delvin Ce Zhang, Nafis Irtiza Tripto, Dongwon Lee |
Source: ArXiv AI | 12-08-25
Learning the Topic, Not the Language: How LLMs Classify Online Immigration Discourse Across Languages
Authors: Andrea Nasuto, Stefano Maria Iacus, Francisco Rowe, Devika Jain |
Source: ArXiv AI | 12-08-25
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Authors: Sanket Badhe |
Source: ArXiv AI | 12-08-25
Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning
Authors: Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber |
Source: ArXiv AI | 12-08-25
Holistic Explainable AI (H-XAI): Extending Transparency Beyond Developers in AI-Driven Decision Making
Authors: Kausik Lakkaraju, Siva Likitha Valluru, Biplav Srivastava |
Source: ArXiv AI | 12-08-25
Whither symbols in the era of advanced neural networks?
Authors: Thomas L. Griffiths, Brenden M. Lake, R. Thomas McCoy, Ellie Pavlick, Taylor W. Webb |
Source: ArXiv AI | 12-08-25
LLMs for Resource Allocation: A Participatory Budgeting Approach to Inferring Preferences
Authors: Sankarshan Damle, Boi Faltings |
Source: ArXiv AI | 12-08-25
SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges
Authors: Dewi S. W. Gould, Bruno Mlodozeniec, Samuel F. Brown |
Source: ArXiv AI | 12-08-25
GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
Authors: Yumeng Fu, Jiayin Zhu, Lingling Zhang, Bo Zhao, Shaoxuan Ma, Yushun Zhang, Yanrui Wu, Wenjun Wu |
Source: ArXiv AI | 12-08-25
Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
Authors: Zailong Tian, Zhuoheng Han, Yanzhe Chen, Haozhe Xu, Xi Yang, richeng xuan, Hongfeng Wang, Lizi Liao |
Source: ArXiv AI | 12-08-25
Retrieval Augmented Large Language Model System for Comprehensive Drug Contraindications
Authors: Byeonghun Bang, Jongsuk Yoon, Dong-Jin Chang, Seho Park, Yong Oh Lee |
Source: ArXiv AI | 12-08-25
LLM Robustness Leaderboard v1 -- Technical report
Authors: Pierre Peigné - Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe |
Source: ArXiv AI | 12-08-25
From Explainable to Explanatory Artificial Intelligence: Toward a New Paradigm for Human-Centered Explanations through Generative AI
Authors: Christian Meske, Justin Brenne, Erdi Uenal, Sabahat Oelcer, Ayseguel Doganguen |
Source: ArXiv AI | 12-08-25
AntiCheatPT: A Transformer-Based Approach to Cheat Detection in Competitive Computer Games
Authors: Mille Mei Zhen Loo, Gert Luzkov, Paolo Burelli |
Source: ArXiv AI | 12-08-25
The Fair Game: Auditing & Debiasing AI Algorithms Over Time
Authors: Debabrota Basu, Udvas Das |
Source: ArXiv AI | 12-08-25
OpenAI CEO Sam Altman responds to GPT-5 backlash, outlines next steps
Source: The Decoder | 11-08-25
Fitzgerald's Follies (libertiesjournal.com)
Source: Hacker News | 11-08-25
Graham: Synchronizing Clocks by Leveraging Local Clock Properties (2022) [pdf] (usenix.org)
Source: Hacker News | 11-08-25
GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (sebastianraschka.com)
Source: Hacker News | 11-08-25
GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM (reddit.com)
Source: Hacker News | 11-08-25
Hand-picked selection of articles on AI fundamentals/concepts (aman.ai)
Source: Hacker News | 11-08-25
Meta acquires audio AI startup WaveForms as it ramps up efforts to build Llama 4.5
Source: The Decoder | 11-08-25
How I code with AI on a budget/free (wuu73.org)
Source: Hacker News | 11-08-25
Show HN: Reactive: A React Book for the Reluctant (written by Claude) (github.com/cloudstreet-dev)
Source: Hacker News | 11-08-25
The current state of LLM-driven development (tolki.dev)
Source: Hacker News | 10-08-25
Ch.at – a lightweight LLM chat service accessible through HTTP, SSH, DNS and API (ch.at)
Source: Hacker News | 10-08-25
My Lethal Trifecta talk at the Bay Area AI Security Meetup (simonwillison.net)
Source: Hacker News | 10-08-25
Curious about the training data of OpenAI's new GPT-OSS models? I was too (twitter.com/jxmnop)
Source: Hacker News | 10-08-25
Embedding Alignment in Code Generation for Audio
Authors: Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito |
Source: ArXiv AI | 10-08-25
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
Authors: Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei, Sashank Varma, Yi-Chia Wang, Ali Emami |
Source: ArXiv AI | 10-08-25
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
Authors: Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai |
Source: ArXiv AI | 10-08-25
Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models
Authors: Guilherme Seidyo Imai Aldeia, Daniel S. Herman, William G. La Cava |
Source: ArXiv AI | 10-08-25
Conformal Sets in Multiple-Choice Question Answering under Black-Box Settings with Provable Coverage Guarantees
Authors: Guang Yang, Xinyang Liu |
Source: ArXiv AI | 10-08-25
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
Authors: Haitao Hong, Yuchen Yan, Xingyu Wu, Guiyang Hou, Wenqi Zhang, Weiming Lu, Yongliang Shen, Jun Xiao |
Source: ArXiv AI | 10-08-25
How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
Authors: Brandon Jaipersaud, David Krueger, Ekdeep Singh Lubana |
Source: ArXiv AI | 10-08-25
TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution
Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |
Source: ArXiv AI | 10-08-25
Prescriptive Agents based on Rag for Automated Maintenance (PARAM)
Authors: Chitranshu Harbola, Anupam Purwar |
Source: ArXiv AI | 10-08-25
Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning
Authors: Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens |
Source: ArXiv AI | 10-08-25
Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS)
Authors: Mahdi Nazari Ashani, Ali Asghar Alesheikh, Saba Kazemi, Kimya Kheirkhah, Yasin Mohammadi, Fatemeh Rezaie, Amir Mahdi Manafi, Hedieh Zarkesh |
Source: ArXiv AI | 10-08-25
Who is a Better Player: LLM against LLM
Authors: Yingjie Zhou, Jiezhang Cao, Farong Wen, Li Xu, Yanwei Jiang, Jun Jia, Ronghui Li, Xiaohong Liu, Yu Zhou, Xiongkuo Min, Jie Guo, Zicheng Zhang, Guangtao Zhai |
Source: ArXiv AI | 10-08-25
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Authors: Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang |
Source: ArXiv AI | 10-08-25
Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses
Authors: Bin Han, Robert Wolfe, Anat Caspi, Bill Howe |
Source: ArXiv AI | 10-08-25
EasySize: Elastic Analog Circuit Sizing via LLM-Guided Heuristic Search
Authors: Xinyue Wu, Fan Hu, Shaik Jani Babu, Yi Zhao, Xinfei Guo |
Source: ArXiv AI | 10-08-25
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents
Authors: Andrew Kiruluta |
Source: ArXiv AI | 10-08-25
QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
Authors: Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li |
Source: ArXiv AI | 10-08-25
NomicLaw: Emergent Trust and Strategic Argumentation in LLMs During Collaborative Law-Making
Authors: Asutosh Hota, Jussi P.P. Jokinen |
Source: ArXiv AI | 10-08-25
Large Language Models Transform Organic Synthesis From Reaction Prediction to Automation
Authors: Kartar Kumar Lohana Tharwani, Rajesh Kumar, Sumita, Numan Ahmed, Yong Tang |
Source: ArXiv AI | 10-08-25
An Explainable Machine Learning Framework for Railway Predictive Maintenance using Data Streams from the Metro Operator of Portugal
Authors: Silvia García-Méndez, Francisco de Arriba-Pérez, Fátima Leal, Bruno Veloso, Benedita Malheiro, Juan Carlos Burguillo-Rial |
Source: ArXiv AI | 10-08-25
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?
Authors: Matteo Prandi, Vincenzo Suriani, Federico Pierucci, Marcello Galisai, Daniele Nardi, Piercosma Bisconti |
Source: ArXiv AI | 10-08-25
InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Authors: Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang |
Source: ArXiv AI | 10-08-25
Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?
Authors: Burak Can Kaplan, Hugo Cesar De Castro Carneiro, Stefan Wermter |
Source: ArXiv AI | 10-08-25
Simulating Human-Like Learning Dynamics with LLM-Empowered Agents
Authors: Yu Yuan, Lili Zhao, Wei Chen, Guangting Zheng, Kai Zhang, Mengdi Zhang, Qi Liu |
Source: ArXiv AI | 10-08-25
Prompting GPT-5 for agentic workflows and advanced coding applications
Source: The Decoder | 10-08-25
GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it (garymarcus.substack.com)
Source: Hacker News | 10-08-25
Let's properly analyze an AI article for once (nibblestew.blogspot.com)
Source: Hacker News | 09-08-25
Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?
Source: Hacker News | 09-08-25
Getting good results from Claude Code (dzombak.com)
Source: Hacker News | 09-08-25
What the Windsurf sale means for the AI coding ecosystem (ethanding.substack.com)
Source: Hacker News | 09-08-25
I want everything local – Building my offline AI workspace (instavm.io)
Source: Hacker News | 09-08-25
Attackers can hijack Google Gemini with a simple prompt hidden in a calendar invite
Source: The Decoder | 09-08-25
Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI
Source: The Decoder | 09-08-25
GPT-5 should "seem smarter from today" after OpenAI fixed early issues with its model switcher
Source: The Decoder | 09-08-25
HRT's Python fork: Leveraging PEP 690 for faster imports (hudsonrivertrading.com)
Source: Hacker News | 09-08-25
A robust, open-source framework for Spiking Neural Networks on low-end FPGAs (arxiv.org)
Source: Hacker News | 09-08-25
Open SWE: An open-source asynchronous coding agent (langchain.com)
Source: Hacker News | 09-08-25
The surprise deprecation of GPT-4o for ChatGPT consumers (simonwillison.net)
Source: Hacker News | 09-08-25
Developers rely on AI tools more than ever, but trust is slipping
Source: The Decoder | 09-08-25
Yet another study doubts that LLM reasoning shows true logic over pattern imitation
Source: The Decoder | 09-08-25
Political pressure reportedly kept a major AI vulnerability study under wraps
Source: The Decoder | 08-08-25
An invisible prompt in a Google Doc made ChatGPT access data from a victim’s Google Drive
Source: The Decoder | 08-08-25
A deleted GitHub post gives an early look at OpenAI’s next major model, GPT-5
Source: The Decoder | 08-08-25
How AI conquered the US economy: A visual FAQ (derekthompson.org)
Source: Hacker News | 08-08-25
GPT-5 for Developers (openai.com)
Source: Hacker News | 08-08-25
Writing a storage engine for Postgres: An in-memory table access method (2023) (eatonphil.com)
Source: Hacker News | 08-08-25
OpenAI's new open-source model is basically Phi-5 (seangoedecke.com)
Source: Hacker News | 08-08-25
GPT-5: Key characteristics, pricing and system card (simonwillison.net)
Source: Hacker News | 08-08-25
GPT-5 (openai.com)
Source: Hacker News | 08-08-25
Claude Code IDE integration for Emacs (github.com/manzaltu)
Source: Hacker News | 08-08-25
An LLM does not need to understand MCP (hackteam.io)
Source: Hacker News | 08-08-25
Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude (github.com/synthetic-lab)
Source: Hacker News | 08-08-25
OpenAI pushes back as the New York Times demands access to 120 million ChatGPT chat logs
Source: The Decoder | 07-08-25
Show HN: Aura – Like robots.txt, but for AI actions (github.com/osmandkitay)
Source: Hacker News | 07-08-25
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
Source: Hacker News | 07-08-25
New AI Coding Teammate: Gemini CLI GitHub Actions (blog.google)
Source: Hacker News | 07-08-25
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
Authors: Nuo Chen, Moming Duan, Andre Huikai Lin, Qian Wang, Jiaying Wu, Bingsheng He |
Source: ArXiv AI | 07-08-25
Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
Authors: Magauiya Zhussip, Dmitriy Shopkhoev, Ammar Ali, Stamatios Lefkimmiatis |
Source: ArXiv AI | 07-08-25
TURA: Tool-Augmented Unified Retrieval Agent for AI Search
Authors: Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin |
Source: ArXiv AI | 07-08-25
YOLOv8-Based Deep Learning Model for Automated Poultry Disease Detection and Health Monitoring paper
Authors: Akhil Saketh Reddy Sabbella, Ch.Lakshmi Prachothan, Eswar Kumar Panta |
Source: ArXiv AI | 07-08-25
How are CS students using resources and AI tools for coding tasks?
Authors: Natalia Echeverry, Arun Lekshmi Narayanan |
Source: ArXiv AI | 07-08-25
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Authors: Mo Li, L.H. Xu, Qitai Tan, Ting Cao, Yunxin Liu |
Source: ArXiv AI | 07-08-25
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Authors: Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen |
Source: ArXiv AI | 07-08-25
MI9 -- Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems
Authors: Charles L. Wang, Trisha Singhal, Ameya Kelkar, Jason Tuo |
Source: ArXiv AI | 07-08-25
Galaxy: A Cognition-Centered Framework for Proactive, Privacy-Preserving, and Self-Evolving LLM Agents
Authors: Chongyu Bao, Ruimin Dai, Yangbo Shen, Runyang Jian, Jinghan Zhang, Xiaolan Liu, Kunpeng Liu |
Source: ArXiv AI | 07-08-25
Can Large Language Models Adequately Perform Symbolic Reasoning Over Time Series?
Authors: Zewen Liu, Juntong Ni, Xianfeng Tang, Max S.Y. Lau, Wei Jin |
Source: ArXiv AI | 07-08-25
Towards Transparent AI Grading: Semantic Entropy as a Signal for Human-AI Disagreement
Authors: Karrtik Iyer, Manikandan Ravikiran, Prasanna Pendse, Shayan Mohanty |
Source: ArXiv AI | 07-08-25
Large Language Model's Multi-Capability Alignment in Biomedical Domain
Authors: Wentao Wu, Linqing Chen, Hanmeng Zhong, Weilei Wang |
Source: ArXiv AI | 07-08-25
Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents
Authors: Thassilo M. Schiepanski, Nicholas Piël |
Source: ArXiv AI | 07-08-25
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Authors: Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu |
Source: ArXiv AI | 07-08-25
SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices
Authors: Si Chen, Izzy Molnar, Ting Hua, Peiyu Li, Le Huy Khiem, G. Alex Ambrose, Jim Lang, Ronald Metoyer, Nitesh V. Chawla |
Source: ArXiv AI | 07-08-25
LLM Collaboration With Multi-Agent Reinforcement Learning
Authors: Shuo Liu, Zeyu Liang, Xueguang Lyu, Christopher Amato |
Source: ArXiv AI | 07-08-25
ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
Authors: Yue Zhou, Yi Chang, Yuan Wu |
Source: ArXiv AI | 07-08-25
Two face trial for exporting Nvidia AI chips as the company rejects hardware kill switches
Source: The Decoder | 07-08-25
Anthropic prepares for GPT-5 by releasing its upgraded Claude Opus 4.1 model
Source: The Decoder | 07-08-25
ElevenLabs launches Eleven Music, an AI music generator "cleared for broad commercial use"
Source: The Decoder | 07-08-25
OpenAI releases its first open-weight language models since GPT-2 with GPT-oss
Source: The Decoder | 06-08-25
The EU’s AI Act pushes transparency but could overwhelm developers with paperwork
Source: The Decoder | 06-08-25
Eight frontier AI models battle in chess for Game Arena’s first tournament tonight
Source: The Decoder | 06-08-25
US considers tracking AI chips, TSMC fires employees over the theft of advanced technology
Source: The Decoder | 06-08-25
OpenAI says it doesn't want ChatGPT to become a social media time sink
Source: The Decoder | 06-08-25
Claude Opus 4.1 (anthropic.com)
Source: Hacker News | 06-08-25
Create personal illustrated storybooks in the Gemini app (blog.google)
Source: Hacker News | 06-08-25
Things that helped me get out of the AI 10x engineer imposter syndrome (colton.dev)
Source: Hacker News | 06-08-25
LLM Inflation (tratt.net)
Source: Hacker News | 06-08-25
Ask HN: Do you struggle with flow state when using AI assisted coding tools?
Source: Hacker News | 06-08-25
I gave the AI arms and legs then it rejected me (grell.dev)
Source: Hacker News | 06-08-25
Open models by OpenAI (openai.com)
Source: Hacker News | 06-08-25
Large Language Model-based Data Science Agent: A Survey
Authors: Peiran Wang, Yaoning Yu, Ke Chen, Xianyang Zhan, Haohan Wang |
Source: ArXiv AI | 06-08-25
Recovering Individual-Level Activity Sequences from Location-Based Service Data Using a Novel Transformer-Based Model
Authors: Weiyu Luo, Chenfeng Xiong |
Source: ArXiv AI | 06-08-25
Enhancing Japanese Large Language Models with Reasoning Vectors
Authors: Carolina Minami Oguchi, Leo Wei, Koyo Kobayashi, Hsin-Tai Wu, Dipak Ghosal |
Source: ArXiv AI | 06-08-25
Defend LLMs Through Self-Consciousness
Authors: Boshi Huang, Fabio Nonato de Paula |
Source: ArXiv AI | 06-08-25
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots
Authors: Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Irene Li |
Source: ArXiv AI | 06-08-25
When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
Authors: Fangyi Yu |
Source: ArXiv AI | 06-08-25
Unified Tool Integration for LLMs: A Protocol-Agnostic Approach to Function Calling
Authors: Peng Ding, Rick Stevens |
Source: ArXiv AI | 06-08-25
From Text to Trajectories: GPT-2 as an ODE Solver via In-Context
Authors: Ziyang Ma, Baojian Zhou, Deqing Yang, Yanghua Xiao |
Source: ArXiv AI | 06-08-25
EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design
Authors: Fei Liu, Yilu Liu, Qingfu Zhang, Xialiang Tong, Mingxuan Yuan |
Source: ArXiv AI | 06-08-25
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts
Authors: Shuang Liu, Zelong Li, Ruoyun Ma, Haiyan Zhao, Mengnan Du |
Source: ArXiv AI | 06-08-25
Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework
Authors: Zikun Cui, Tianyi Huang, Chia-En Chiang, Cuiqianhe Du |
Source: ArXiv AI | 06-08-25
Can Large Language Models Bridge the Gap in Environmental Knowledge?
Authors: Linda Smail (College of Interdisciplinary Studies, Zayed University, UAE), David Santandreu Calonge (Department of Academic Development, Mohamed bin Zayed University of Artificial Intelligence, UAE), Firuz Kamalov (School of Engineering, Applied Science and Technology, Canadian University Dubai, UAE), Nur H. Orak (Department of Environmental Engineering, Marmara University, Türkiye) |
Source: ArXiv AI | 06-08-25
InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation
Authors: Tian-Fang Zhao, Wen-Xi Yang |
Source: ArXiv AI | 06-08-25
CogBench: A Large Language Model Benchmark for Multilingual Speech-Based Cognitive Impairment Assessment
Authors: Feng Rui, Zhiyao Luo, Wei Wang, Yuting Song, Yong Liu, Tingting Zhu, Jianqing Li, Xingyao Wang |
Source: ArXiv AI | 06-08-25
Compressing Chain-of-Thought in LLMs via Step Entropy
Authors: Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu |
Source: ArXiv AI | 06-08-25
Adaptive AI Agent Placement and Migration in Edge Intelligence Systems
Authors: Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia |
Source: ArXiv AI | 06-08-25
Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play
Authors: Lucia Cipolina-Kun, Marianna Nezhurina, Jenia Jitsev |
Source: ArXiv AI | 06-08-25
A Comparative Study of Neurosymbolic AI Approaches to Interpretable Logical Reasoning
Authors: Michael K. Chen |
Source: ArXiv AI | 06-08-25
Multi-Objective Infeasibility Diagnosis for Routing Problems Using Large Language Models
Authors: Kai Li, Ruihao Zheng, Xinye Hao, Zhenkun Wang |
Source: ArXiv AI | 06-08-25
Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis
Authors: Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen |
Source: ArXiv AI | 06-08-25
Semantic-aware Graph-guided Behavior Sequences Generation with Large Language Models for Smart Homes
Authors: Zhiyao Xu, Dan Zhao, Qingsong Zou, Qing Li, Yong Jiang, Yuhang Wang, Jingyu Xiao |
Source: ArXiv AI | 06-08-25
Hidden Dynamics of Massive Activations in Transformer Training
Authors: Jorge Gallego-Feliciano, S. Aaron McClendon, Juan Morinelli, Stavros Zervoudakis, Antonios Saravanos |
Source: ArXiv AI | 06-08-25
Error Detection and Correction for Interpretable Mathematics in Large Language Models
Authors: Yijin Yang, Cristina Cornelio, Mario Leiva, Paulo Shakarian |
Source: ArXiv AI | 06-08-25
Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework
Authors: Jialin Li, Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu |
Source: ArXiv AI | 06-08-25
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
Authors: He Wang, Liang Zeng |
Source: ArXiv AI | 06-08-25
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang |
Source: ArXiv AI | 06-08-25
Tell HN: Anthropic expires paid credits after a year
Source: Hacker News | 06-08-25
Persona vectors allow Anthropic to steer language model behaviors like sycophancy and evil
Source: The Decoder | 05-08-25
MLE-STAR is designed to automate machine learning pipelines with minimal human input
Source: The Decoder | 05-08-25
I tried to replace myself with ChatGPT in my English class (lithub.com)
Source: Hacker News | 05-08-25
Getting out of the Big-Muddy: Escalation of Commitment in LLMs
Authors: Emilio Barkett, Olivia Long, Paul Kröger |
Source: ArXiv AI | 05-08-25
Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning
Authors: Derin Cayir, Renjie Tao, Rashi Rungta, Kai Sun, Sean Chen, Haidar Khan, Minseok Kim, Julia Reinspach, Yue Liu |
Source: ArXiv AI | 05-08-25
Polymorphic Combinatorial Frameworks (PCF): Guiding the Design of Mathematically-Grounded, Adaptive AI Agents
Authors: David Pearl, Matthew Murphy, James Intriligator |
Source: ArXiv AI | 05-08-25
T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval
Authors: Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu |
Source: ArXiv AI | 05-08-25
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry
Authors: Jiaqing Xie, Weida Wang, Ben Gao, Zhuo Yang, Haiyuan Wan, Shufei Zhang, Tianfan Fu, Yuqiang Li |
Source: ArXiv AI | 05-08-25
A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models
Authors: Tadisetty Sai Yashwanth, Dhatri C |
Source: ArXiv AI | 05-08-25
ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection
Authors: Shijie Cao, Yuan Yuan |
Source: ArXiv AI | 05-08-25
CloudAnoAgent: Anomaly Detection for Cloud Sites via LLM Agent with Neuro-Symbolic Mechanism
Authors: Xinkai Zou, Xuan Jiang, Ruikai Huang, Haoze He, Parv Kapoor, Jiahua Zhao |
Source: ArXiv AI | 05-08-25
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
Authors: Amitava Das, Vinija Jain, Aman Chadha |
Source: ArXiv AI | 05-08-25
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Authors: Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang |
Source: ArXiv AI | 05-08-25
Everyone Contributes! Incentivizing Strategic Cooperation in Multi-LLM Systems via Sequential Public Goods Games
Authors: Yunhao Liang, Yuan Qu, Jingyuan Yang, Shaochong Lin, Zuo-Jun Max Shen |
Source: ArXiv AI | 05-08-25
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Authors: Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li |
Source: ArXiv AI | 05-08-25
Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence
Authors: Boheng Liu, Ziyu Li, Xia Wu |
Source: ArXiv AI | 05-08-25
AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models
Authors: Dewi Sid William Gould, George De Ath, Ben Carvell, Nick Pepper |
Source: ArXiv AI | 05-08-25
CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models
Authors: Tung-Thuy Pham, Duy-Quan Luong, Minh-Quan Duong, Trung-Hieu Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo |
Source: ArXiv AI | 05-08-25
Traffic-R1: Reinforced LLMs Bring Human-Like Reasoning to Traffic Signal Control Systems
Authors: Xingchen Zou, Yuhao Yang, Zheng Chen, Xixuan Hao, Yiqi Chen, Chao Huang, Yuxuan Liang |
Source: ArXiv AI | 05-08-25
FinWorld: An All-in-One Open-Source Platform for End-to-End Financial AI Research and Deployment
Authors: Wentao Zhang, Yilei Zhao, Chuqiao Zong, Xinrun Wang, Bo An |
Source: ArXiv AI | 05-08-25
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Authors: Miaosen Luo, Jiesen Long, Zequn Li, Yunying Yang, Yuncheng Jiang, Sijie Mai |
Source: ArXiv AI | 05-08-25
OptiHive: Ensemble Selection for LLM-Based Optimization via Statistical Modeling
Authors: Maxime Bouscary, Saurabh Amin |
Source: ArXiv AI | 05-08-25
CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
Authors: Lei Zan, Keli Zhang, Ruichu Cai, Lujia Pan |
Source: ArXiv AI | 05-08-25
Accurate and Interpretable Postmenstrual Age Prediction via Multimodal Large Language Model
Authors: Qifan Chen, Jin Cui, Cindy Duan, Yushuo Han, Yifei Shi |
Source: ArXiv AI | 05-08-25
Noosemia: toward a Cognitive and Phenomenological Account of Intentionality Attribution in Human-Generative AI Interaction
Authors: Enrico De Santis, Antonello Rizzi |
Source: ArXiv AI | 05-08-25
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
Authors: Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Liantao Ma, Lequan Yu |
Source: ArXiv AI | 05-08-25
What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce
Authors: Amine Allouah, Omar Besbes, Josué D Figueroa, Yash Kanoria, Akshit Kumar |
Source: ArXiv AI | 05-08-25
Tim Cook tells Apple employees that AI is as pivotal as the internet or the smartphone
Source: The Decoder | 05-08-25
Adobe's new AI features make complex Photoshopping effortless
Source: The Decoder | 05-08-25
Customizing tmux (evgeniipendragon.com)
Source: Hacker News | 05-08-25
Job-seekers are dodging AI interviewers (fortune.com)
Source: Hacker News | 05-08-25
OpenAI prepares to launch GPT-5, but big leaps are unlikely
Source: The Decoder | 04-08-25
Anthropic blocks OpenAI from accessing Claude models over alleged contract breach
Source: The Decoder | 04-08-25
Persona vectors: Monitoring and controlling character traits in language models (anthropic.com)
Source: Hacker News | 04-08-25
Backdoor Attacks on Deep Learning Face Detection
Authors: Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi |
Source: ArXiv AI | 04-08-25
Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data
Authors: Mukesh Kumar Sahu, Pinki Roy |
Source: ArXiv AI | 04-08-25
NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System
Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya |
Source: ArXiv AI | 04-08-25
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Authors: Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, Dong Xu |
Source: ArXiv AI | 04-08-25
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Authors: Wenxuan Wang, Zizhan Ma, Meidan Ding, Shiyi Zheng, Shengyuan Liu, Jie Liu, Jiaming Ji, Wenting Chen, Xiang Li, Linlin Shen, Yixuan Yuan |
Source: ArXiv AI | 04-08-25
Agentic large language models improve retrieval-based radiology question answering
Authors: Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh |
Source: ArXiv AI | 04-08-25
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data
Authors: Sohaib Imran, Rob Lamb, Peter M. Atkinson |
Source: ArXiv AI | 04-08-25
How LLMs are Shaping the Future of Virtual Reality
Authors: Süeda Özkaya, Santiago Berrezueta-Guzman, Stefan Wagner |
Source: ArXiv AI | 04-08-25
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Authors: Liuyun Xu, Seymour M.J. Spence |
Source: ArXiv AI | 04-08-25
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Authors: Yingxu Wang, Shiqi Fan, Mengzhu Wang, Siwei Liu |
Source: ArXiv AI | 04-08-25
MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
Authors: Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao |
Source: ArXiv AI | 04-08-25
No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence
Authors: Wil M.P. van der Aalst |
Source: ArXiv AI | 04-08-25
Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models
Authors: Xushuo Tang, Yi Ding, Zhengyi Yang, Yin Chen, Yongrui Gu, Wenke Yang, Mingchen Ju, Xin Cao, Yongfei Liu, Wenjie Zhang |
Source: ArXiv AI | 04-08-25
Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation
Authors: Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger |
Source: ArXiv AI | 04-08-25
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Authors: Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Yongbin Li, Ge Li |
Source: ArXiv AI | 04-08-25
Mind the Gap: The Divergence Between Human and LLM-Generated Tasks
Authors: Yi-Long Lu, Jiajun Song, Chunhui Zhang, Wei Wang |
Source: ArXiv AI | 04-08-25
Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking
Authors: Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei |
Source: ArXiv AI | 04-08-25
Thinking Machines: Mathematical Reasoning in the Age of LLMs
Authors: Andrea Asperti, Alberto Naibo, Claudio Sacerdoti Coen |
Source: ArXiv AI | 04-08-25
MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models
Authors: Zhanliang Wang, Kai Wang |
Source: ArXiv AI | 04-08-25
From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation
Authors: Ruiqing Ding, Qianfang Sun, Yongkang Leng, Hui Yin, Xiaojian Li |
Source: ArXiv AI | 04-08-25
Context-Aware Visualization for Explainable AI Recommendations in Social Media: A Vision for User-Aligned Explanations
Authors: Banan Alkhateeb, Ellis Solaiman |
Source: ArXiv AI | 04-08-25
6 weeks of Claude Code (puzzmo.com)
Source: Hacker News | 03-08-25
Automated Feedback on Student-Generated UML and ER Diagrams Using Large Language Models
Authors: Sebastian Gürtl, Gloria Schimetta, David Kerschbaumer, Michael Liut, Alexander Steinmaurer |
Source: ArXiv AI | 03-08-25
From LLMs to Edge: Parameter-Efficient Fine-Tuning on Edge Devices
Authors: Georg Slamanig, Francesco Corti, Olga Saukh |
Source: ArXiv AI | 03-08-25
Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation
Authors: Dustin Carrión-Ojeda, Stefan Roth, Simone Schaub-Meyer |
Source: ArXiv AI | 03-08-25
LLM-Based Identification of Infostealer Infection Vectors from Screenshots: The Case of Aurora
Authors: Estelle Ruellan, Eric Clay, Nicholas Ascoli |
Source: ArXiv AI | 03-08-25
Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates
Authors: Tien Huu Do, Antoine Masquelier, Nae Eoun Lee, Jonathan Crowther |
Source: ArXiv AI | 03-08-25
Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study
Authors: Kai Goebel, Patrik Zips |
Source: ArXiv AI | 03-08-25
Distributed AI Agents for Cognitive Underwater Robot Autonomy
Authors: Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot |
Source: ArXiv AI | 03-08-25
A survey of multi-agent geosimulation methodologies: from ABM to LLM
Authors: Virginia Padilla, Jacinto Dávila |
Source: ArXiv AI | 03-08-25
Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database
Authors: Diego Russo, Gian Marco Orlando, Valerio La Gatta, Vincenzo Moscato |
Source: ArXiv AI | 03-08-25
FairReason: Balancing Reasoning and Social Bias in MLLMs
Authors: Zhenyu Pan, Yutong Zhang, Jianshu Zhang, Haoran Lu, Haozheng Luo, Yuwei Han, Philip S. Yu, Manling Li, Han Liu |
Source: ArXiv AI | 03-08-25
Data Readiness for Scientific AI at Scale
Authors: Wesley Brewer, Patrick Widener, Valentine Anantharaj, Feiyi Wang, Tom Beck, Arjun Shankar, Sarp Oral |
Source: ArXiv AI | 03-08-25
How Far Are AI Scientists from Changing the World?
Authors: Qiujie Xie, Yixuan Weng, Minjun Zhu, Fuchen Shen, Shulin Huang, Zhen Lin, Jiahui Zhou, Zilan Mao, Zijie Yang, Linyi Yang, Jian Wu, Yue Zhang |
Source: ArXiv AI | 03-08-25
LLM4Rail: An LLM-Augmented Railway Service Consulting Platform
Authors: Zhuo Li, Xianghuai Deng, Chiwei Feng, Hanmeng Li, Shenjie Wang, Haichao Zhang, Teng Jia, Conlin Chen, Louis Linchun Wu, Jia Wang |
Source: ArXiv AI | 03-08-25
DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer
Authors: Ruoyu Wang, Junda Wu, Yu Xia, Tong Yu, Ryan A. Rossi, Julian McAuley, Lina Yao |
Source: ArXiv AI | 03-08-25
MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying
Authors: Qian Zhao, Zhuo Sun, Bin Guo, Zhiwen Yu |
Source: ArXiv AI | 03-08-25
TextQuests: How Good are LLMs at Text-Based Video Games?
Authors: Long Phan, Mantas Mazeika, Andy Zou, Dan Hendrycks |
Source: ArXiv AI | 03-08-25
SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model
Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing |
Source: ArXiv AI | 03-08-25
OpenAI has reportedly raised $8.3 billion at a $300 billion valuation
Source: The Decoder | 03-08-25
Anthropic CEO talks about being labeled a doomer and his OpenAI departure
Source: The Decoder | 03-08-25
Under mounting pressure, Apple plans to increase its spending on artificial intelligence projects
Source: The Decoder | 03-08-25
Show HN: WebGPU enables local LLM in the browser – demo site with AI chat (andreinwald.github.io)
Source: Hacker News | 03-08-25
Show HN: AI Physics Tutor with Free Body Diagrams (physicsviewer.com)
Source: Hacker News | 03-08-25
Every leading AI agent failed at least one security test during a massive red teaming competition
Source: The Decoder | 03-08-25
Robert Wilson has died (theartnewspaper.com)
Source: Hacker News | 02-08-25
Anthropic revokes OpenAI's access to Claude (wired.com)
Source: Hacker News | 02-08-25
Tim Cook rallying Apple employees around AI efforts (bloomberg.com)
Source: Hacker News | 02-08-25
Launch HN: Societies.io (YC W25) – AI simulations of your target audience
Source: Hacker News | 02-08-25
Aerodynamic drag in small cyclist formations: shielding the protected rider [pdf] (urbanphysics.net)
Source: Hacker News | 02-08-25
OpenAI's "Study Mode" and the risks of flattery (resobscura.substack.com)
Source: Hacker News | 02-08-25
Google adds image-to-video and Veo 3 Fast to the Gemini API
Source: The Decoder | 02-08-25
Coverage Cat (YC S22) Is Hiring a Senior, Staff, or Principal Engineer (coveragecat.com)
Source: Hacker News | 02-08-25
Make Your Own Backup System – Part 2: Forging the FreeBSD Backup Stronghold (dragas.net)
Source: Hacker News | 02-08-25
The tradeoff between human and AI context (softwaredoug.com)
Source: Hacker News | 02-08-25
Deep Agents (langchain.com)
Source: Hacker News | 02-08-25
Gemini 2.5 Deep Think (blog.google)
Source: Hacker News | 02-08-25
Respect instead of sarcasm: study uses AI for better political debates
Source: The Decoder | 02-08-25
OpenAI is building Stargate Norway while its annual spending is expected to soar to $8 billion
Source: The Decoder | 01-08-25
Interview with Microsoft: Copilot, AI skills, and building a learning organization
Source: The Decoder | 01-08-25
Google DeepMind unveils an AI model that acts as a "virtual satellite" for mapping the entire planet
Source: The Decoder | 01-08-25
Google and xAI sign EU AI Code of Practice
Source: The Decoder | 01-08-25
PHP-ORT: Machine learning inference for the web (krakjoe.github.io)
Source: Hacker News | 01-08-25
Gemini Embedding: Powering RAG and context engineering (googleblog.com)
Source: Hacker News | 01-08-25
Many countries that said no to ChatControl in 2024 are now undecided (digitalcourage.social)
Source: Hacker News | 01-08-25
Gemini 2.5 Deep Think (twitter.com/googledeepmind)
Source: Hacker News | 01-08-25
Show HN: AgentMail – Email infra for AI agents (agentmail.to)
Source: Hacker News | 01-08-25
Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code
Source: Hacker News | 01-08-25
Show HN: Mcp-use – Connect any LLM to any MCP (github.com/mcp-use)
Source: Hacker News | 01-08-25
OpenAI launches Study Mode for ChatGPT while education users are told to wait and learn later
Source: The Decoder | 31-07-25
Anthropic could soon be valued at $170 billion
Source: The Decoder | 31-07-25
Some Meta employees fear being sidelined as Zuckerberg reshuffles teams for AI progress
Source: The Decoder | 31-07-25
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
Authors: Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu |
Source: ArXiv AI | 31-07-25
aLLoyM: A large language model for alloy phase diagram prediction
Authors: Yuna Oikawa, Guillaume Deffrennes, Taichi Abe, Ryo Tamura, Koji Tsuda |
Source: ArXiv AI | 31-07-25
RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment
Authors: Marcos Fuster-Pena, David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez |
Source: ArXiv AI | 31-07-25
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
Authors: Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen |
Source: ArXiv AI | 31-07-25
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Authors: Rawan Al-Matham, Kareem Darwish, Raghad Al-Rasheed, Waad Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah Alzahrani, Eman alBilali, Nizar Habash, Abdelrahman El-Sheikh, Muhammad Elmallah, Haonan Li, Hamdy Mubarak, Mohamed Anwar, Zaid Alyafeai, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljarrah, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, Abdulrahman AlOsaimy |
Source: ArXiv AI | 31-07-25
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models
Authors: Sabrina Kaniewski, Fabian Schmidt, Markus Enzweiler, Michael Menth, Tobias Heer |
Source: ArXiv AI | 31-07-25
H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity
Authors: Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong |
Source: ArXiv AI | 31-07-25
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization
Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani |
Source: ArXiv AI | 31-07-25
OFCnetLLM: Large Language Model for Network Monitoring and Alertness
Authors: Hong-Jun Yoon, Mariam Kiran, Danial Ebling, Joe Breen |
Source: ArXiv AI | 31-07-25
LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Kai Chen, Xiaofeng Wang, Baosheng Wang |
Source: ArXiv AI | 31-07-25
An Explainable Emotion Alignment Framework for LLM-Empowered Agent in Metaverse Service Ecosystem
Authors: Qun Ma, Xiao Xue, Ming Zhang, Yifan Shen, Zihan Zhao |
Source: ArXiv AI | 31-07-25
Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence
Authors: Matthieu Queloz |
Source: ArXiv AI | 31-07-25
Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making
Authors: ZhaoBin Li, Mark Steyvers |
Source: ArXiv AI | 31-07-25
The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
Authors: Han Jiang, Pengda Wang, Xiaoyuan Yi, Xing Xie, Ziang Xiao |
Source: ArXiv AI | 31-07-25
Enhancing Manufacturing Knowledge Access with LLMs and Context-aware Prompting
Authors: Sebastian Monka, Irlan Grangel-González, Stefan Schmid, Lavdim Halilaj, Marc Rickart, Oliver Rudolph, Rui Dias |
Source: ArXiv AI | 31-07-25
Automatically discovering heuristics in a complex SAT solver with large language models
Authors: Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, Shaowei Cai |
Source: ArXiv AI | 31-07-25
Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production
Source: Hacker News | 31-07-25
Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget (github.com/dipampaul17)
Source: Hacker News | 31-07-25
OpenAI's ChatGPT Agent casually clicks through "I am not a robot" verification (arstechnica.com)
Source: Hacker News | 31-07-25
AI startup tackles bottleneck where people spend more time checking AI content than creating it
Source: The Decoder | 31-07-25
Show HN: An AI agent that learns your product and guides your users (frigade.ai)
Source: Hacker News | 31-07-25
A major AI training data set contains millions of examples of personal data (technologyreview.com)
Source: Hacker News | 31-07-25
Show HN: Open-source alternative to ChatGPT Agents for browsing (github.com/trymeka)
Source: Hacker News | 31-07-25
Critical vulnerability in AI coding platform Base44 allowing unauthorized access (wiz.io)
Source: Hacker News | 31-07-25
Crush: Glamourous AI coding agent for your favourite terminal (github.com/charmbracelet)
Source: Hacker News | 31-07-25
Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures
Authors: Nicholas Botti (Federal Reserve Board), Flora Haberkorn (Federal Reserve Board), Charlotte Hoopes (Federal Reserve Board), Shaun Khan (Federal Reserve Board) |
Source: ArXiv AI | 30-07-25
Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems
Authors: Monika Zamojska, Jarosław A. Chudziak |
Source: ArXiv AI | 30-07-25
Validating Pharmacogenomics Generative Artificial Intelligence Query Prompts Using Retrieval-Augmented Generation (RAG)
Authors: Ashley Rector, Keaton Minor, Kamden Minor, Jeff McCormack, Beth Breeden, Ryan Nowers, Jay Dorris |
Source: ArXiv AI | 30-07-25
Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models
Authors: Vishal Raman, Vijai Aravindh R |
Source: ArXiv AI | 30-07-25
Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects
Authors: Yixin Liu, Guibin Zhang, Kun Wang, Shiyuan Li, Shirui Pan |
Source: ArXiv AI | 30-07-25
What Does it Mean for a Neural Network to Learn a "World Model"?
Authors: Kenneth Li, Fernanda Viégas, Martin Wattenberg |
Source: ArXiv AI | 30-07-25
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Authors: Yanxu Zhu, Shitong Duan, Xiangxu Zhang, Jitao Sang, Peng Zhang, Tun Lu, Xiao Zhou, Jing Yao, Xiaoyuan Yi, Xing Xie |
Source: ArXiv AI | 30-07-25
Large Language Models for Supply Chain Decisions
Authors: David Simchi-Levi, Konstantina Mellou, Ishai Menache, Jeevan Pathuri |
Source: ArXiv AI | 30-07-25
An LLM Driven Agent Framework for Automated Infrared Spectral Multi Task Reasoning
Authors: Zujie Xie, Zixuan Chen, Jiheng Liang, Xiangyang Yu, Ziru Yu |
Source: ArXiv AI | 30-07-25
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation
Authors: Hao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu, Huadong Ma |
Source: ArXiv AI | 30-07-25
Large Language Models for Wireless Communications: From Adaptation to Autonomy
Authors: Le Liang, Hao Ye, Yucheng Sheng, Ouya Wang, Jiacheng Wang, Shi Jin, Geoffrey Ye Li |
Source: ArXiv AI | 30-07-25
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models
Authors: Wanying Wang, Zeyu Ma, Han Zheng, Xin Tan, Mingang Chen |
Source: ArXiv AI | 30-07-25
StaffPro: an LLM Agent for Joint Staffing and Profiling
Authors: Alessio Maritan |
Source: ArXiv AI | 30-07-25
Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems
Authors: Bin Liu |
Source: ArXiv AI | 30-07-25
Towards a rigorous evaluation of RAG systems: the challenge of due diligence
Authors: Grégoire Martinon, Alexandra Lorenzo de Brionne, Jérôme Bohard, Antoine Lojou, Damien Hervault, Nicolas J-B. Brunel (ENSIIE, LaMME) |
Source: ArXiv AI | 30-07-25
Can the current trends of AI handle a full course of mathematics?
Authors: Mariam Alsayyad, Fayadh Kadhem |
Source: ArXiv AI | 30-07-25
An Agentic AI for a New Paradigm in Business Process Development
Authors: Mohammad Azarijafari, Luisa Mich, Michele Missikoff |
Source: ArXiv AI | 30-07-25
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |
Source: ArXiv AI | 30-07-25
Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline
Authors: Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis |
Source: ArXiv AI | 30-07-25
Libra: Large Chinese-based Safeguard for AI Content
Authors: Ziyang Chen, Huimu Yu, Xing Wu, Dongqin Liu, Songlin Hu |
Source: ArXiv AI | 30-07-25
LLM-based Content Classification Approach for GitHub Repositories by the README Files
Authors: Malik Uzair Mehmood, Shahid Hussain, Wen Li Wang, Muhammad Usama Malik |
Source: ArXiv AI | 30-07-25
PHAX: A Structured Argumentation Framework for User-Centered Explainable AI in Public Health and Biomedical Sciences
Authors: Bahar İlgen, Akshat Dubey, Georges Hattab |
Source: ArXiv AI | 30-07-25
Launch HN: Hyprnote (YC S25) – An open-source AI meeting notetaker
Source: Hacker News | 30-07-25
Study mode (openai.com)
Source: Hacker News | 30-07-25
Irrelevant facts about cats added to math problems increase LLM errors by 300% (science.org)
Source: Hacker News | 30-07-25
Show HN: I built an AI that turns any book into a text adventure game (kathaaverse.com)
Source: Hacker News | 30-07-25
Tencent releases Hunyuan World Model 1.0 as an open-source AI for 3D scene generation
Source: The Decoder | 29-07-25
Enough AI copilots, we need AI HUDs (geoffreylitt.com)
Source: Hacker News | 29-07-25
Claude Code weekly rate limits
Source: Hacker News | 29-07-25
Show HN: Companies use AI to take your calls. I built AI to make them for you (pipervoice.com)
Source: Hacker News | 29-07-25
Anthropic Faces Potentially "Business-Ending" Copyright Lawsuit (obsolete.pub)
Source: Hacker News | 29-07-25
Tao on “blue team” vs. “red team” LLMs (mathstodon.xyz)
Source: Hacker News | 29-07-25
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges
Authors: Haoran Lu, Luyang Fang, Ruidong Zhang, Xinliang Li, Jiazhang Cai, Huimin Cheng, Lin Tang, Ziyu Liu, Zeliang Sun, Tao Wang, Yingchuan Zhang, Arif Hassan Zidan, Jinwen Xu, Jincheng Yu, Meizhi Yu, Hanqi Jiang, Xilin Gong, Weidi Luo, Bolun Sun, Yongkai Chen, Terry Ma, Shushan Wu, Yifan Zhou, Junhao Chen, Haotian Xiang, Jing Zhang, Afrar Jahin, Wei Ruan, Ke Deng, Yi Pan, Peilong Wang, Jiahui Li, Zhengliang Liu, Lu Zhang, Lin Zhao, Wei Liu, Dajiang Zhu, Xin Xing, Fei Dou, Wei Zhang, Chao Huang, Rongjie Liu, Mengrui Zhang, Yiwen Liu, Xiaoxiao Sun, Qin Lu, Zhen Xiang, Wenxuan Zhong, Tianming Liu, Ping Ma |
Source: ArXiv AI | 29-07-25
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen |
Source: ArXiv AI | 29-07-25
Leveraging Fine-Tuned Large Language Models for Interpretable Pancreatic Cystic Lesion Feature Extraction and Risk Categorization
Authors: Ebrahim Rasromani, Stella K. Kang, Yanqi Xu, Beisong Liu, Garvit Luhadia, Wan Fung Chui, Felicia L. Pasadyn, Yu Chih Hung, Julie Y. An, Edwin Mathieu, Zehui Gu, Carlos Fernandez-Granda, Ammar A. Javed, Greg D. Sacks, Tamas Gonda, Chenchan Huang, Yiqiu Shen |
Source: ArXiv AI | 29-07-25
Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
Authors: Lin Ren, Guohui Xiao, Guilin Qi, Yishuai Geng, Haohan Xue |
Source: ArXiv AI | 29-07-25
Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks
Authors: Shuyang Guo, Wenjin Xie, Ping Lu, Ting Deng, Richong Zhang, Jianxin Li, Xiangping Huang, Zhongyi Liu |
Source: ArXiv AI | 29-07-25
The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models
Authors: Xingcheng Xu |
Source: ArXiv AI | 29-07-25
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Authors: Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, Srinivas Shakkottai |
Source: ArXiv AI | 29-07-25
Matching Game Preferences Through Dialogical Large Language Models: A Perspective
Authors: Renaud Fabre, Daniel Egret, Patrice Bellot |
Source: ArXiv AI | 29-07-25
Artificial Intelligence In Patent And Market Intelligence: A New Paradigm For Technology Scouting
Authors: Manish Verma, Vivek Sharma, Vishal Singh |
Source: ArXiv AI | 29-07-25
Unlearning of Knowledge Graph Embedding via Preference Optimization
Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Yao He, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji |
Source: ArXiv AI | 29-07-25
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design
Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai |
Source: ArXiv AI | 29-07-25
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Authors: Andy Zou, Maxwell Lin, Eliot Jones, Micha Nowak, Mateusz Dziemian, Nick Winter, Alexander Grattan, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Nate Burnikell, Yarin Gal, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson |
Source: ArXiv AI | 29-07-25
Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems
Authors: Chengzhuo Han |
Source: ArXiv AI | 29-07-25
MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs
Authors: Xueyao Wan, Hang Yu |
Source: ArXiv AI | 29-07-25
evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments
Authors: Fatou Ndiaye Mbodji |
Source: ArXiv AI | 29-07-25
On the Limits of Hierarchically Embedded Logic in Classical Neural Networks
Authors: Bill Cochran |
Source: ArXiv AI | 29-07-25
MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them
Authors: Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song |
Source: ArXiv AI | 29-07-25
Principles for production AI agents (app.build)
Source: Hacker News | 29-07-25
AI Is Wrecking a Fragile Job Market for College Graduates (wsj.com)
Source: Hacker News | 29-07-25
China pitches new global AI regulator based in Shanghai
Source: The Decoder | 28-07-25
China exports state propaganda with low-cost open source AI models
Source: The Decoder | 28-07-25
Mistral AI publishes the first comprehensive life cycle assessment of a large language model
Source: The Decoder | 28-07-25
Amazon launches Kiro to streamline AI prototyping
Source: The Decoder | 28-07-25
Claude Code Router (github.com/musistudio)
Source: Hacker News | 28-07-25
LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface.co)
Source: Hacker News | 28-07-25
Automated Code Review Using Large Language Models at Ericsson: An Experience Report
Authors: Shweta Ramesh, Joy Bose, Hamender Singh, A K Raghavan, Sujoy Roychowdhury, Giriprasad Sridhara, Nishrith Saini, Ricardo Britto |
Source: ArXiv AI | 28-07-25
Solar Photovoltaic Assessment with Large Language Model
Authors: Muhao Guo, Yang Weng |
Source: ArXiv AI | 28-07-25
PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models
Authors: Tarek Gasmi, Ramzi Guesmi, Mootez Aloui, Jihene Bennaceur |
Source: ArXiv AI | 28-07-25
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
Authors: Gioele Giachino, Marco Rondina, Antonio Vetrò, Riccardo Coppola, Juan Carlos De Martin |
Source: ArXiv AI | 28-07-25
Enhancing Diabetic Retinopathy Classification Accuracy through Dual Attention Mechanism in Deep Learning
Authors: Abdul Hannan, Zahid Mahmood, Rizwan Qureshi, Hazrat Ali |
Source: ArXiv AI | 28-07-25
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Authors: Chaymaa Abbas, Mariette Awad, Razane Tajeddine |
Source: ArXiv AI | 28-07-25
Towards LLM-Enhanced Group Recommender Systems
Authors: Sebastian Lubos, Alexander Felfernig, Thi Ngoc Trang Tran, Viet-Man Le, Damian Garber, Manuel Henrich, Reinhard Willfort, Jeremias Fuchs |
Source: ArXiv AI | 28-07-25
Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
Authors: Igli Begolli, Meltem Aksoy, Daniel Neider |
Source: ArXiv AI | 28-07-25
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks
Authors: Kai Liu, Zhan Su, Peijie Dong, Fengran Mo, Jianfei Gao, ShaoTing Zhang, Kai Chen |
Source: ArXiv AI | 28-07-25
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
Authors: Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci |
Source: ArXiv AI | 28-07-25
SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence
Authors: Viktar Dubovik, Łukasz Struski, Jacek Tabor, Dawid Rymarczyk |
Source: ArXiv AI | 28-07-25
SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models
Authors: Zhen Wan, Chao-Han Huck Yang, Yahan Yu, Jinchuan Tian, Sheng Li, Ke Hu, Zhehuai Chen, Shinji Watanabe, Fei Cheng, Chenhui Chu, Sadao Kurohashi |
Source: ArXiv AI | 28-07-25
ReCatcher: Towards LLMs Regression Testing for Code Generation
Authors: Altaf Allah Abbassi, Leuson Da Silva, Amin Nikanjam, Foutse Khomh |
Source: ArXiv AI | 28-07-25
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
Authors: Gabriel Chua |
Source: ArXiv AI | 28-07-25
Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
Authors: Sang-Woo Lee, Sohee Yang, Donghyun Kwak, Noah Y. Siegel |
Source: ArXiv AI | 28-07-25
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
Authors: Osama Almurshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana |
Source: ArXiv AI | 28-07-25
Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
Authors: Patrick Taillandier, Jean Daniel Zucker, Arnaud Grignard, Benoit Gaudou, Nghi Quang Huynh, Alexis Drogoul |
Source: ArXiv AI | 28-07-25
Microsoft revives Clippy as an AI blob in a new Copilot Appearance test
Source: The Decoder | 27-07-25
No AI Content (eclecticlight.co)
Source: Hacker News | 27-07-25
Fast and cheap bulk storage: using LVM to cache HDDs on SSDs (quantum5.ca)
Source: Hacker News | 27-07-25
Linux on Snapdragon X Elite: Linaro and Tuxedo Pave the Way for ARM64 Laptops (linaro.org)
Source: Hacker News | 27-07-25
Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language
Authors: Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu |
Source: ArXiv AI | 27-07-25
AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data
Authors: Rana Alshaikh, Israa Alghanmi, Shelan Jeawak |
Source: ArXiv AI | 27-07-25
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Authors: Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer |
Source: ArXiv AI | 27-07-25
Automated Code Review Using Large Language Models with Symbolic Reasoning
Authors: Busra Icoz, Goksel Biricik |
Source: ArXiv AI | 27-07-25
Sandwich: Separating Prefill-Decode Compilation for Efficient CPU LLM Serving
Authors: Juntao Zhao, Jiuru Li, Chuan Wu |
Source: ArXiv AI | 27-07-25
HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization
Authors: Benjamin Coriat, Eric Benhamou |
Source: ArXiv AI | 27-07-25
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Authors: Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang |
Source: ArXiv AI | 27-07-25
SMARTAPS: Tool-augmented LLMs for Operations Management
Authors: Timothy Tin Long Yu, Mahdi Mostajabdaveh, Jabo Serge Byusa, Rindra Ramamonjison, Giuseppe Carenini, Kun Mao, Zirui Zhou, Yong Zhang |
Source: ArXiv AI | 27-07-25
Does visualization help AI understand data?
Authors: Victoria R. Li, Johnathan Sun, Martin Wattenberg |
Source: ArXiv AI | 27-07-25
Agentic AI framework for End-to-End Medical Data Inference
Authors: Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha |
Source: ArXiv AI | 27-07-25
Foundations for Risk Assessment of AI in Protecting Fundamental Rights
Authors: Antonino Rotolo, Beatrice Ferrigno, Jose Miguel Angel Garcia Godinez, Claudio Novelli, Giovanni Sartor |
Source: ArXiv AI | 27-07-25
Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
Authors: Mutian Yang, Jiandong Gao, Ji Wu |
Source: ArXiv AI | 27-07-25
Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios
Authors: Zhuang Qiang Bok, Watson Wei Khong Chua |
Source: ArXiv AI | 27-07-25
Revisiting LLM Reasoning via Information Bottleneck
Authors: Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao |
Source: ArXiv AI | 27-07-25
Reports say GPT-5 could arrive in August with improvements in coding
Source: The Decoder | 27-07-25
Google Deepmind's Aeneas AI helps historians quickly restore and interpret Roman inscriptions
Source: The Decoder | 26-07-25
Reuters says at least a dozen Shenzhen firms repair banned Nvidia H100 and A100 AI chips
Source: The Decoder | 26-07-25
Google says AI content is fine, and SEO basics still apply to AI-powered search
Source: The Decoder | 26-07-25
Show HN: Price Per Token – LLM API Pricing Data (pricepertoken.com)
Source: Hacker News | 26-07-25
Claude Code introduces specialized sub-agents (anthropic.com)
Source: Hacker News | 26-07-25
AWS shuts its Shanghai AI lab as McKinsey bans generative AI projects for clients in China
Source: The Decoder | 25-07-25
Trump's radical AI plan: no copyrights, fewer rules, more exports
Source: The Decoder | 25-07-25
Anthropic says that AI can learn risky behaviors even when the training data looks completely safe
Source: The Decoder | 25-07-25
Finding Robert Bogucki, the man who disappeared on purpose (abc.net.au)
Source: Hacker News | 25-07-25
How Anthropic teams use Claude Code (anthropic.com)
Source: Hacker News | 25-07-25
Quantitative AI progress needs accurate and transparent evaluation (mathstodon.xyz)
Source: Hacker News | 25-07-25
Superfunctions: A universal solution against sync/async fragmentation in Python (github.com/pomponchik)
Source: Hacker News | 25-07-25
Pew finds that only 1 percent of users click a source link directly from Google's AI Overviews
Source: The Decoder | 24-07-25
Lumo: Privacy-first AI assistant (proton.me)
Source: Hacker News | 24-07-25
Building better AI tools (hazelweakly.me)
Source: Hacker News | 24-07-25
US AI Action Plan (ai.gov)
Source: Hacker News | 24-07-25
Distillation makes AI models smaller and cheaper (quantamagazine.org)
Source: Hacker News | 24-07-25
Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning
Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu |
Source: ArXiv AI | 24-07-25
Each to Their Own: Exploring the Optimal Embedding in RAG
Authors: Shiting Chen, Zijian Zhao, Jinsong Chen |
Source: ArXiv AI | 24-07-25
Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging
Authors: Farnaz Khun Jush, Steffen Vogler, Matthias Lenga |
Source: ArXiv AI | 24-07-25
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Authors: Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing |
Source: ArXiv AI | 24-07-25
Vision Transformer attention alignment with human visual perception in aesthetic object evaluation
Authors: Miguel Carrasco, César González-Martín, José Aranda, Luis Oliveros |
Source: ArXiv AI | 24-07-25
AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer
Authors: Danny D. Leybzon, Shreyas Tirumala, Nishant Jain, Summer Gillen, Michael Jackson, Cameron McPhee, Jennifer Schmidt |
Source: ArXiv AI | 24-07-25
CASCADE: LLM-Powered JavaScript Deobfuscator at Google
Authors: Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan |
Source: ArXiv AI | 24-07-25
LoRA is All You Need for Safety Alignment of Reasoning LLMs
Authors: Yihao Xue, Baharan Mirzasoleiman |
Source: ArXiv AI | 24-07-25
Towards Autonomous Sustainability Assessment via Multimodal AI Agents
Authors: Zhihan Zhang, Alexander Metzger, Yuxuan Mei, Felix Hähnlein, Zachary Englhardt, Tingyu Cheng, Gregory D. Abowd, Shwetak Patel, Adriana Schulz, Vikram Iyer |
Source: ArXiv AI | 24-07-25
Our Cars Can Talk: How IoT Brings AI to Vehicles
Authors: Amod Kant Agrawal |
Source: ArXiv AI | 24-07-25
Improving LLMs' Generalized Reasoning Abilities by Graph Problems
Authors: Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li |
Source: ArXiv AI | 24-07-25
HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study
Authors: Mandar Pitale, Jelena Frtunikj, Abhinaw Priyadershi, Vasu Singh, Maria Spence |
Source: ArXiv AI | 24-07-25
Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments
Authors: Shitong Zhu, Chenhao Fang, Derek Larson, Neel Reddy Pochareddy, Rajeev Rao, Sophie Zeng, Yanqing Peng, Wendy Summer, Alex Goncalves, Arya Pudota, Herve Robert |
Source: ArXiv AI | 24-07-25
An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models
Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng |
Source: ArXiv AI | 24-07-25
TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment
Authors: Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas |
Source: ArXiv AI | 24-07-25
Simulating multiple human perspectives in socio-ecological systems using large language models
Authors: Yongchao Zeng, Calum Brown, Ioannis Kyriakou, Ronja Hotz, Mark Rounsevell |
Source: ArXiv AI | 24-07-25
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Authors: Xinyao Liu, Diping Song |
Source: ArXiv AI | 24-07-25
OpenAI’s new agent moves its 2017 vision for AI closer to reality
Source: The Decoder | 24-07-25
Google’s Gemini 2.5 now supports "conversational image segmentation"
Source: The Decoder | 24-07-25
OpenAI pushes ahead with Stargate as SoftBank remains absent from data center development
Source: The Decoder | 23-07-25
Yet another study finds that overloading LLMs with information leads to worse results
Source: The Decoder | 23-07-25
OpenAI’s math gold hints that AI may soon tackle even longer and harder tasks
Source: The Decoder | 23-07-25
I watched Gemini CLI hallucinate and delete my files (anuraag2601.github.io)
Source: Hacker News | 23-07-25
Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support
Authors: Fangjian Lei, Mariam El Mezouar, Shayan Noei, Ying Zou |
Source: ArXiv AI | 23-07-25
Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models
Authors: Yuxi Lin, Yaxue Fang, Zehong Zhang, Zhouwu Liu, Siyun Zhong, Fulong Yu |
Source: ArXiv AI | 23-07-25
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
Authors: Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda |
Source: ArXiv AI | 23-07-25
Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis
Authors: Zhihao Xu, Bixin Li, Lulu Wang |
Source: ArXiv AI | 23-07-25
Why Braking? Scenario Extraction and Reasoning Utilizing LLM
Authors: Yin Wu, Daniel Slieter, Vivek Subramanian, Ahmed Abouelazm, Robin Bohn, J. Marius Zöllner |
Source: ArXiv AI | 23-07-25
Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning
Authors: Simon Ouellette |
Source: ArXiv AI | 23-07-25
Differential Multimodal Transformers
Authors: Jerry Li, Timothy Oh, Joseph Hoang, Vardhit Veeramachaneni |
Source: ArXiv AI | 23-07-25
Micromobility Flow Prediction: A Bike Sharing Station-level Study via Multi-level Spatial-Temporal Attention Neural Network
Authors: Xi Yang, Jiachen Wang, Song Han, Suining He |
Source: ArXiv AI | 23-07-25
Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization
Authors: Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo |
Source: ArXiv AI | 23-07-25
From Logic to Language: A Trust Index for Problem Solving with LLMs
Authors: Tehseen Rug, Felix Böhmer, Tessa Pfattheicher |
Source: ArXiv AI | 23-07-25
SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting
Authors: Shuhao Mei, Yongchao Long, Shan Cao, Xiaobo Han, Shijia Geng, Jinbo Sun, Yuxi Zhou, Shenda Hong |
Source: ArXiv AI | 23-07-25
Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery
Authors: Bo Wen, Chen Wang, Qiwei Han, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Jeffrey L. Rogers |
Source: ArXiv AI | 23-07-25
Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design
Authors: Dong Ben, Hui Feng, Qian Wang |
Source: ArXiv AI | 23-07-25
ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
Authors: Tianze Xu, Pengrui Lu, Lyumanshan Ye, Xiangkun Hu, Pengfei Liu |
Source: ArXiv AI | 23-07-25
Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens
Authors: Fred Mutisya (1 and 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1), Jean Philbert Nsengemana (3), Talkmore Chidede (4) ((1) Qhala (Nairobi, Kenya), (2) Kenya Medical Association (Nairobi, Kenya), (3) Africa CDC (Addis Ababa, Ethiopia), (4) AfCFTA (Accra, Ghana)) |
Source: ArXiv AI | 23-07-25
LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning
Authors: Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang |
Source: ArXiv AI | 23-07-25
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
Authors: Hongyi Tang, Zhihao Zhu, Yi Yang |
Source: ArXiv AI | 23-07-25
Improving ASP-based ORS Schedules through Machine Learning Predictions
Authors: Pierangela Bruno, Carmine Dodaro, Giuseppe Galatà, Marco Maratea, Marco Mochi |
Source: ArXiv AI | 23-07-25
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Authors: Shanghai AI Lab: Xiaoyang Chen, Yunhao Chen, Zeren Chen, Zhiyun Chen, Hanyun Cui, Yawen Duan, Jiaxuan Guo, Qi Guo, Xuhao Hu, Hong Huang, Lige Huang, Chunxiao Li, Juncheng Li, Qihao Lin, Dongrui Liu, Xinmin Liu, Zicheng Liu, Chaochao Lu, Xiaoya Lu, Jingjing Qu, Qibing Ren, Jing Shao, Jingwei Shi, Jingwei Sun, Peng Wang, Weibing Wang, Jia Xu, Lewen Yan, Xiao Yu, Yi Yu, Boxuan Zhang, Jie Zhang, Weichen Zhang, Zhijie Zheng, Tianyi Zhou, Bowen Zhou |
Source: ArXiv AI | 23-07-25
Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications
Authors: Jean Lelong, Adnane Errazine, Annabelle Blangero |
Source: ArXiv AI | 23-07-25
Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
Authors: Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang |
Source: ArXiv AI | 23-07-25
WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding
Authors: Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun |
Source: ArXiv AI | 23-07-25
Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning
Authors: Mian Ibad Ali Shah, Enda Barrett, Karl Mason |
Source: ArXiv AI | 23-07-25
Subliminal learning: Models transmit behaviors via hidden signals in data (anthropic.com)
Source: Hacker News | 23-07-25
Gemini North telescope discovers long-predicted stellar companion of Betelgeuse (science.org)
Source: Hacker News | 23-07-25
New ARC-AGI-3 benchmark shows that humans still outperform LLMs at pretty basic thinking
Source: The Decoder | 22-07-25
OpenAI claims a breakthrough in LLM reasoning on complex math problems
Source: The Decoder | 22-07-25
FlexOlmo enables organizations to collaboratively train LLMs without data sharing
Source: The Decoder | 22-07-25
Routine: A Structural Planning Framework for LLM Agent System in Enterprise
Authors: Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu |
Source: ArXiv AI | 22-07-25
Large Language Models Assisting Ontology Evaluation
Authors: Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese |
Source: ArXiv AI | 22-07-25
BioGraphFusion: Graph Knowledge Embedding for Biological Completion and Reasoning
Authors: Yitong Lin, Jiaying He, Jiahe Chen, Xinnan Zhu, Jianwei Zheng, Tao Bo |
Source: ArXiv AI | 22-07-25
Towards AI Urban Planner in the Age of GenAI, LLMs, and Agentic AI
Authors: Yanjie Fu |
Source: ArXiv AI | 22-07-25
Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and Responsibility Matrix
Authors: Juan Manuel Contreras |
Source: ArXiv AI | 22-07-25
Configurable multi-agent framework for scalable and realistic testing of llm-based agents
Authors: Sai Wang, Senthilnathan Subramanian, Mudit Sahni, Praneeth Gone, Lingjie Meng, Xiaochen Wang, Nicolas Ferradas Bertoli, Tingxian Cheng, Jun Xu |
Source: ArXiv AI | 22-07-25
The Endless Tuning. An Artificial Intelligence Design To Avoid Human Replacement and Trace Back Responsibilities
Authors: Elio Grande |
Source: ArXiv AI | 22-07-25
Feedback-Induced Performance Decline in LLM-Based Decision-Making
Authors: Xiao Yang, Juxi Leitner, Michael Burke |
Source: ArXiv AI | 22-07-25
DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
Authors: Jerry Wang, Fang Yu |
Source: ArXiv AI | 22-07-25
IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry
Authors: Junhyeong Lee, Joon-Young Kim, Heekyu Kim, Inhyo Lee, Seunghwa Ryu |
Source: ArXiv AI | 22-07-25
Explainable Artificial Intelligence based Soft Evaluation Indicator for Arc Fault Diagnosis
Authors: Qianchao Wang, Yuxuan Ding, Chuanzhen Jia, Zhe Li, Yaping Du |
Source: ArXiv AI | 22-07-25
LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning
Authors: Cole Robertson, Philip Wolff |
Source: ArXiv AI | 22-07-25
Predictive Process Monitoring Using Object-centric Graph Embeddings
Authors: Wissam Gherissi (LAMSADE), Mehdi Acheli, Joyce El Haddad (LAMSADE), Daniela Grigori (LAMSADE) |
Source: ArXiv AI | 22-07-25
Agentic AI for autonomous anomaly management in complex systems
Authors: Reza Vatankhah Barenji, Sina Khoshgoftar |
Source: ArXiv AI | 22-07-25
A Framework for Analyzing Abnormal Emergence in Service Ecosystems Through LLM-based Agent Intention Mining
Authors: Yifan Shen, Zihan Zhao, Xiao Xue, Yuwei Guo, Qun Ma, Deyu Zhou, Ming Zhang |
Source: ArXiv AI | 22-07-25
Gemini 2.5 Pro Capable of Winning Gold at IMO 2025
Authors: Yichen Huang, Lin F. Yang |
Source: ArXiv AI | 22-07-25
Don't bother parsing: Just use images for RAG (morphik.ai)
Source: Hacker News | 22-07-25
AccountingBench: Evaluating LLMs on real long-horizon business tasks (penrose.com)
Source: Hacker News | 22-07-25
The Hater's Guide to the AI Bubble (wheresyoured.at)
Source: Hacker News | 22-07-25
How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference (ritza.co)
Source: Hacker News | 22-07-25
Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic (github.com/openai)
Source: Hacker News | 22-07-25
Replit's CEO apologizes after its AI agent wiped a company's code base (businessinsider.com)
Source: Hacker News | 22-07-25
If writing is thinking then what happens if AI is doing the writing and reading? (learningbyshipping.com)
Source: Hacker News | 22-07-25
"Napster-style" piracy allegations put Anthropic at risk of a billion-dollar class action lawsuit
Source: The Decoder | 21-07-25
Decart launches MirageLSD, an AI model that transforms live video feeds in real time
Source: The Decoder | 21-07-25
Show HN: Conductor, a Mac app that lets you run a bunch of Claude Codes at once (conductor.build)
Source: Hacker News | 21-07-25
Coding with LLMs in the summer of 2025 – an update (antirez.com)
Source: Hacker News | 21-07-25
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection
Authors: Aleksandr Gashkov, Aleksandr Perevalov, Maria Eltsova, Andreas Both |
Source: ArXiv AI | 21-07-25
RAG-based Architectures for Drug Side Effect Retrieval in LLMs
Authors: Shad Nygren, Pinar Avci, Andre Daniels, Reza Rassol, Afshin Beheshti, Diego Galeano |
Source: ArXiv AI | 21-07-25
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
Authors: Cole Walsh, Rodica Ivan, Muhammad Zafar Iqbal, Colleen Robb |
Source: ArXiv AI | 21-07-25
Exploiting Primacy Effect To Improve Large Language Models
Authors: Bianca Raimondi, Maurizio Gabbrielli |
Source: ArXiv AI | 21-07-25
Preprint: Did I Just Browse A Website Written by LLMs?
Authors: Sichang "Steven" He, Ramesh Govindan, Harsha V. Madhyastha |
Source: ArXiv AI | 21-07-25
A segmented robot grasping perception neural network for edge AI
Authors: Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos S. Kouzinopoulos, Rico Möckel |
Source: ArXiv AI | 21-07-25
Photonic Fabric Platform for AI Accelerators
Authors: Jing Ding, Trung Diep |
Source: ArXiv AI | 21-07-25
Edge Intelligence with Spiking Neural Networks
Authors: Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya |
Source: ArXiv AI | 21-07-25
Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment
Authors: Šimon Kubov, Simon Klíčník, Jakub Dandár, Zdeněk Straka, Karolína Kvaková, Daniel Kvak |
Source: ArXiv AI | 21-07-25
GraphTrafficGPT: Enhancing Traffic Management Through Graph-Based AI Agent Coordination
Authors: Nabil Abdelaziz Ferhat Taleb, Abdolazim Rezaei, Raj Atulkumar Patel, Mehdi Sookhak |
Source: ArXiv AI | 21-07-25
GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models
Authors: Eduardo C. Garrido-Merchán, Cristina Puente |
Source: ArXiv AI | 21-07-25
BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety
Authors: Yuxin Zhang (1), Xi Wang (1), Mo Hu (1), Zhenyu Zhang (1) ((1) Department of Construction Science, College of Architecture, Texas A&M University, College Station, USA) |
Source: ArXiv AI | 21-07-25
DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs
Authors: Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, Tajana Rosing |
Source: ArXiv AI | 21-07-25
Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery
Authors: Mateusz Bystroński, Mikołaj Hołysz, Grzegorz Piotrowski, Nitesh V. Chawla, Tomasz Kajdanowicz |
Source: ArXiv AI | 21-07-25
KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models
Authors: Lam Nguyen, Erika Barcelos, Roger French, Yinghui Wu |
Source: ArXiv AI | 21-07-25
Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
Authors: Temiloluwa Prioleau, Baiying Lu, Yanjun Cui |
Source: ArXiv AI | 21-07-25
Automated Interpretation of Non-Destructive Evaluation Contour Maps Using Large Language Models for Bridge Condition Assessment
Authors: Viraj Nishesh Darji, Callie C. Liao, Duoduo Liao |
Source: ArXiv AI | 21-07-25
Computational complexity of neural networks (2022) (lunalux.io)
Source: Hacker News | 21-07-25
iMessage integration in Claude can hijack the model to do anything (generalanalysis.com)
Source: Hacker News | 21-07-25
Nobody knows how to build with AI yet (worksonmymachine.substack.com)
Source: Hacker News | 20-07-25
Local LLMs versus offline Wikipedia (evanhahn.com)
Source: Hacker News | 20-07-25
Make Your Own Backup System – Part 1: Strategy Before Scripts (dragas.net)
Source: Hacker News | 20-07-25
Terence Tao: A human metaphor for evaluating AI capability (mathstodon.xyz)
Source: Hacker News | 20-07-25
I'm betting against AI agents, despite building them (utkarshkanwat.com)
Source: Hacker News | 20-07-25
The Big LLM Architecture Comparison (sebastianraschka.com)
Source: Hacker News | 20-07-25
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Authors: Hao Sun, Mihaela van der Schaar |
Source: ArXiv AI | 20-07-25
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
Authors: Xiangyu Dong, Haoran Zhao, Jiang Gao, Haozhou Li, Xiaoguang Ma, Yaoming Zhou, Fuhai Chen, Juan Liu |
Source: ArXiv AI | 20-07-25
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model
Authors: Maulana Bisyir Azhari, David Hyunchul Shim |
Source: ArXiv AI | 20-07-25
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
Authors: Hongyang Zhao, Tianyu Liang, Sina Davari, Daeho Kim |
Source: ArXiv AI | 20-07-25
Prompt Injection 2.0: Hybrid AI Threats
Authors: Jeremy McHugh, Kristina Šekrst, Jon Cefalu |
Source: ArXiv AI | 20-07-25
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models
Authors: Ashray Gupta, Rohan Joseph, Sunny Rai |
Source: ArXiv AI | 20-07-25
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Authors: Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen |
Source: ArXiv AI | 20-07-25
Automating Steering for Safe Multimodal Large Language Models
Authors: Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng |
Source: ArXiv AI | 20-07-25
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Authors: Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang |
Source: ArXiv AI | 20-07-25
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Authors: Yilun Zhao, Weiyuan Chen, Zhijian Xu, Manasi Patwardhan, Yixin Liu, Chengye Wang, Lovekesh Vig, Arman Cohan |
Source: ArXiv AI | 20-07-25
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Authors: Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve |
Source: ArXiv AI | 20-07-25
Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning
Authors: Sosui Moribe, Taketoshi Ushiama |
Source: ArXiv AI | 20-07-25
Emotional Support with LLM-based Empathetic Dialogue Generation
Authors: Shiquan Wang, Ruiyu Fang, Zhongjiang He, Shuangyong Song, Yongxiang Li |
Source: ArXiv AI | 20-07-25
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Authors: Zhiwei Liu, Jielin Qiu, Shiyu Wang, Jianguo Zhang, Zuxin Liu, Roshan Ram, Haolin Chen, Weiran Yao, Huan Wang, Shelby Heinecke, Silvio Savarese, Caiming Xiong |
Source: ArXiv AI | 20-07-25
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks
Authors: Jian Yao, Ran Cheng, Kay Chen Tan |
Source: ArXiv AI | 20-07-25
Prediction of Highway Traffic Flow Based on Artificial Intelligence Algorithms Using California Traffic Data
Authors: Junseong Lee, Jaegwan Cho, Yoonju Cho, Seoyoon Choi, Yejin Shin |
Source: ArXiv AI | 20-07-25
Black Box Deployed -- Functional Criteria for Artificial Moral Agents in the LLM Era
Authors: Matthew E. Brophy |
Source: ArXiv AI | 20-07-25
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
Authors: Carlos Arriaga, Gonzalo Martínez, Eneko Sendin, Javier Conde, Pedro Reviriego |
Source: ArXiv AI | 20-07-25
Trump advisors are pushing a regulation targeting what they call "woke" AI models in the tech sector
Source: The Decoder | 20-07-25
OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data
Source: The Decoder | 20-07-25
OpenAI claims gold-medal performance at IMO 2025 (twitter.com/alexwei_)
Source: Hacker News | 20-07-25
Meta is luring more top AI researchers from Apple with million-dollar deals
Source: The Decoder | 19-07-25
Google's Veo 3 video generation model launches on Gemini API with a hefty price tag
Source: The Decoder | 19-07-25
Meta says it won’t sign Europe AI agreement, calling it an overreach (cnbc.com)
Source: Hacker News | 19-07-25
GPT-5-reasoning alpha found in the wild (twitter.com/btibor91)
Source: Hacker News | 19-07-25
I avoid using LLMs as a publisher and writer (lifehacky.net)
Source: Hacker News | 19-07-25
Mistral AI adds deep research, voice mode, image editing, and more to Le Chat
Source: The Decoder | 19-07-25
Anthropic could soon be worth $100 billion - thanks to Claude Code
Source: The Decoder | 19-07-25
How I keep up with AI progress (nilenso.com)
Source: Hacker News | 19-07-25
I'm Rebelling Against the Algorithm (varunraghu.com)
Source: Hacker News | 19-07-25
lsr: ls with io_uring (rockorager.dev)
Source: Hacker News | 19-07-25
Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files (github.com/ryoppippi)
Source: Hacker News | 19-07-25
Google brings Gemini 2.5 Pro and Deep Search to AI Mode and adds AI phone calling to search
Source: The Decoder | 18-07-25
Reflection unveils Asimov: an AI agent built to track every step of software development
Source: The Decoder | 18-07-25
Claude Code Unleashed (ymichael.com)
Source: Hacker News | 18-07-25
All AI models might be the same (jxmo.io)
Source: Hacker News | 18-07-25
My favorite use-case for AI is writing logs (vickiboykis.com)
Source: Hacker News | 18-07-25
My experience with Claude Code after two weeks of adventures (sankalp.bearblog.dev)
Source: Hacker News | 18-07-25
ChatGPT agent: bridging research and action (openai.com)
Source: Hacker News | 18-07-25
Anthropic launches a dedicated AI solution to help finance professionals with analysis
Source: The Decoder | 18-07-25
Zuckerberg predicts that not wearing AI glasses in the future will put you at a cognitive disadvantage
Source: The Decoder | 18-07-25
CBS Canceling 'Late Show with Stephen Colbert' After Next Season (nytimes.com)
Source: Hacker News | 18-07-25
Anthropic tightens usage limits for Claude Code without telling users (techcrunch.com)
Source: Hacker News | 18-07-25
Meta hires two more leading OpenAI researchers for its superalignment team
Source: The Decoder | 17-07-25
I was wrong about robots.txt (evgeniipendragon.com)
Source: Hacker News | 17-07-25
The AI bubble today is bigger than the IT bubble in the 1990s (apolloacademy.com)
Source: Hacker News | 17-07-25
Code Execution Through Email: How I Used Claude to Hack Itself (pynt.io)
Source: Hacker News | 17-07-25
N8n vs. node-red, which to use for AI workloads (daniel-payne-keldan-systems.medium.com)
Source: Hacker News | 17-07-25
Quantum Machine Learning in Multi-Qubit Phase-Space Part I: Foundations
Authors: Timothy Heightman, Edward Jiang, Ruth Mora-Soto, Maciej Lewenstein, Marcin Płodzień |
Source: ArXiv AI | 17-07-25
A Framework for Nonstationary Gaussian Processes with Neural Network Parameters
Authors: Zachary James, Joseph Guinness |
Source: ArXiv AI | 17-07-25
Improving Contextual ASR via Multi-grained Fusion with Large Language Models
Authors: Shilin Zhou, Zhenghua Li |
Source: ArXiv AI | 17-07-25
Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding
Authors: Feng Xiao, Jicong Fan |
Source: ArXiv AI | 17-07-25
Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
Authors: Sybelle Goedicke-Fritz (1), Michelle Bous (1), Annika Engel (2), Matthias Flotho (2 and 5), Pascal Hirsch (2), Hannah Wittig (1), Dino Milanovic (2), Dominik Mohr (1), Mathias Kaspar (6), Sogand Nemat (3), Dorothea Kerner (3), Arno Bücker (3), Andreas Keller (2 and 5 and 7), Sascha Meyer (4), Michael Zemlin (1), Philipp Flotho (2 and 5) ((1) Department of General Pediatrics and Neonatology, Saarland University, Campus Homburg, Homburg/Saar, Germany, (2) Chair for Clinical Bioinformatics, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany, (3) Department of Radiology, and Interventional Radiology, University Hospital of Saarland, Homburg, Germany, (4) Clinical Centre Karlsruhe, Franz-Lust Clinic for Paediatrics, Karlsruhe, Germany, (5) Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarland University Campus, Germany, (6) Digital Medicine, University Hospital of Augsburg, Augsburg, Germany, (7) Pharma Science Hub (PSH), Saarland University Campus, Germany) |
Source: ArXiv AI | 17-07-25
Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization
Authors: Prashanth Vijayaraghavan, Apoorva Nitsure, Charles Mackin, Luyao Shi, Stefano Ambrogio, Arvind Haran, Viresh Paruthi, Ali Elzein, Dan Coops, David Beymer, Tyler Baldwin, Ehsan Degan |
Source: ArXiv AI | 17-07-25
GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Authors: Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia |
Source: ArXiv AI | 17-07-25
LLM-Based Config Synthesis requires Disambiguation
Authors: Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese |
Source: ArXiv AI | 17-07-25
Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length
Authors: Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon |
Source: ArXiv AI | 17-07-25
A Study on the Application of Artificial Intelligence in Ecological Design
Authors: Hengyue Zhao |
Source: ArXiv AI | 17-07-25
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Authors: Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira |
Source: ArXiv AI | 17-07-25
General Modular Harness for LLM Agents in Multi-Turn Gaming Environments
Authors: Yuxuan Zhang, Haoyang Yu, Lanxiang Hu, Haojian Jin, Hao Zhang |
Source: ArXiv AI | 17-07-25
Auto-Formulating Dynamic Programming Problems with Large Language Models
Authors: Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, Dongdong Ge |
Source: ArXiv AI | 17-07-25
ClarifAI: Enhancing AI Interpretability and Transparency through Case-Based Reasoning and Ontology-Driven Approach for Improved Decision-Making
Authors: Srikanth Vemula |
Source: ArXiv AI | 17-07-25
BuildEvo: Designing Building Energy Consumption Forecasting Heuristics via LLM-driven Evolution
Authors: Subin Lin, Chuanbo Hua |
Source: ArXiv AI | 17-07-25
Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning
Authors: Yuhao Chen, Shuochen Liu, Yuanjie Lyu, Chao Zhang, Jiayao Shi, Tong Xu |
Source: ArXiv AI | 17-07-25
Nvidia can resume exports of its H20 AI chip to China after a US policy reversal
Source: The Decoder | 17-07-25
Scanned piano rolls database (pianorollmusic.org)
Source: Hacker News | 17-07-25
Chain of thought monitorability: A new and fragile opportunity for AI safety (arxiv.org)
Source: Hacker News | 17-07-25
Six Years of Gemini (geminiprotocol.net)
Source: Hacker News | 16-07-25
Show HN: Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL (matthieulc.com)
Source: Hacker News | 16-07-25
Reflections on OpenAI (calv.info)
Source: Hacker News | 16-07-25
Gauntlet AI (YC S17): All expenses paid training in AI and $200k+ job (crossover.com)
Source: Hacker News | 16-07-25
LLM Daydreaming (gwern.net)
Source: Hacker News | 16-07-25
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?
Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher |
Source: ArXiv AI | 16-07-25
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
Authors: LG AI Research: Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Kyubeen Han, Seokhee Hong, Junwon Hwang, Taewan Hwang, Joonwon Jang, Hyojin Jeon, Kijeong Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Euisoon Kim, Hyosang Kim, Jihoon Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Gwangho Lee, Haeju Lee, Honglak Lee, Jinsik Lee, Kyungmin Lee, Sangha Park, Young Min Paik, Yongmin Park, Youngyong Park, Sanghyun Seo, Sihoon Yang, Heuiyeen Yeen, Sihyuk Yi, Hyeongu Yun |
Source: ArXiv AI | 16-07-25
Attributes Shape the Embedding Space of Face Recognition Models
Authors: Pierrick Leroy, Antonio Mastropietro, Marco Nurisso, Francesco Vaccarino |
Source: ArXiv AI | 16-07-25
SAMEP: A Secure Protocol for Persistent Context Sharing Across AI Agents
Authors: Hari Masoor |
Source: ArXiv AI | 16-07-25
Streaming 4D Visual Geometry Transformer
Authors: Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu |
Source: ArXiv AI | 16-07-25
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air
Authors: Shiyi Yang, Xiaoxue Yu, Rongpeng Li, Jianhang Zhu, Zhifeng Zhao, Honggang Zhang |
Source: ArXiv AI | 16-07-25
Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs
Authors: Ye Yang, Xue Xiao, Ping Yin, Taotao Xie |
Source: ArXiv AI | 16-07-25
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Authors: Zheng Zhang |
Source: ArXiv AI | 16-07-25
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning
Authors: Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas |
Source: ArXiv AI | 16-07-25
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
Authors: Atila Orhon, Arda Okan, Berkin Durmus, Zach Nagengast, Eduardo Pacheco |
Source: ArXiv AI | 16-07-25
Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case
Authors: JaMor Hairston, Ritvik Ranjan, Sahithi Lakamana, Anthony Spadaro, Selen Bozkurt, Jeanmarie Perrone, Abeed Sarker |
Source: ArXiv AI | 16-07-25
Detecting AI Assistance in Abstract Complex Tasks
Authors: Tyler King, Nikolos Gurney, John H. Miller, Volkan Ustun |
Source: ArXiv AI | 16-07-25
IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models
Authors: Nikesh Prajapati, Bimal Karki, Saroj Gopali, Akbar Siami Namin |
Source: ArXiv AI | 16-07-25
Function-to-Style Guidance of LLMs for Code Translation
Authors: Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang |
Source: ArXiv AI | 16-07-25
Modeling Habitat Shifts: Integrating Convolutional Neural Networks and Tabular Data for Species Migration Prediction
Authors: Emir Durakovic, Min-Hong Shih |
Source: ArXiv AI | 16-07-25
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation
Authors: Yicong Wu, Ting Chen, Irit Hochberg, Zhoujian Sun, Ruth Edry, Zhengxing Huang, Mor Peleg |
Source: ArXiv AI | 16-07-25
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems
Authors: Dany Moshkovich, Sergey Zeltyn |
Source: ArXiv AI | 16-07-25
Perspective-Aware AI in Extended Reality
Authors: Daniel Platnick, Matti Gruener, Marjan Alirezaie, Kent Larson, Dava J. Newman, Hossein Rahnama |
Source: ArXiv AI | 16-07-25
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
Authors: Yinsheng Li, Zhen Dong, Yi Shao |
Source: ArXiv AI | 16-07-25
How Many Instructions Can LLMs Follow at Once?
Authors: Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari |
Source: ArXiv AI | 16-07-25
Vulnerable kids are nearly three times more likely to use companion AI chatbots for friendship
Source: The Decoder | 16-07-25
Anthropic, OpenAI, Google, and xAI have landed Pentagon contracts worth up to $200 million
Source: The Decoder | 16-07-25
LLM Inevitabilism (tomrenner.com)
Source: Hacker News | 16-07-25
OpenAI – vulnerability responsible disclosure (any.org)
Source: Hacker News | 16-07-25
Mira Murati’s AI startup Thinking Machines valued at $12B in early-stage funding (reuters.com)
Source: Hacker News | 16-07-25
Claude for Financial Services (anthropic.com)
Source: Hacker News | 16-07-25
Unlike ChatGPT, Anthropic has doubled down on Artifacts (ben-mini.com)
Source: Hacker News | 16-07-25
NeuralOS: An operating system powered by neural networks (neural-os.com)
Source: Hacker News | 15-07-25
Context Rot: How increasing input tokens impacts LLM performance (trychroma.com)
Source: Hacker News | 15-07-25
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Authors: Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan |
Source: ArXiv AI | 15-07-25
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Authors: Joel Becker, Nate Rush, Elizabeth Barnes, David Rein |
Source: ArXiv AI | 15-07-25
Multi-Actor Generative Artificial Intelligence as a Game Engine
Authors: Alexander Sasha Vezhnevets, Jayd Matyas, Logan Cross, Davide Paglieri, Minsuk Chang, William A. Cunningham, Simon Osindero, William S. Isaac, Joel Z. Leibo |
Source: ArXiv AI | 15-07-25
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing
Authors: Quanyan Zhu |
Source: ArXiv AI | 15-07-25
Knowledge Conceptualization Impacts RAG Efficacy
Authors: Chris Davis Jaldi, Anmol Saini, Elham Ghiasi, O. Divine Eziolise, Cogan Shimizu |
Source: ArXiv AI | 15-07-25
EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique
Authors: Chenglin Zhu, Tao Zhang, Chong Li, Mingan Lin, Zenan Zhou, Jian Xie |
Source: ArXiv AI | 15-07-25
A Taxonomy of Omnicidal Futures Involving Artificial Intelligence
Authors: Andrew Critch, Jacob Tsimerman |
Source: ArXiv AI | 15-07-25
When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
Authors: Matous Kozak, Roshanak Zilouchian Moghaddam, Siva Sivaraman |
Source: ArXiv AI | 15-07-25
humancompatible.interconnect: Testing Properties of Repeated Uses of Interconnections of AI Systems
Authors: Rodion Nazarov, Anthony Quinn, Robert Shorten, Jakub Marecek |
Source: ArXiv AI | 15-07-25
Learning to Control Dynamical Agents via Spiking Neural Networks and Metropolis-Hastings Sampling
Authors: Ali Safa, Farida Mohsen, Ali Al-Zawqari |
Source: ArXiv AI | 15-07-25
Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems
Authors: Aniruddha Chattopadhyay, Raj Dandekar, Kaushik Roy |
Source: ArXiv AI | 15-07-25
Is Human-Written Data Enough? The Challenge of Teaching Reasoning to LLMs Without RL or Distillation
Authors: Wei Du, Branislav Kisacanin, George Armstrong, Shubham Toshniwal, Ivan Moshkov, Alexan Ayrapetyan, Sadegh Mahdavi, Dan Zhao, Shizhe Diao, Dragan Masulovic, Marius Stanean, Advaith Avadhanam, Max Wang, Ashmit Dutta, Shitij Govil, Sri Yanamandara, Mihir Tandon, Sriram Ananthakrishnan, Vedant Rathi, David Zhang, Joonseok Kang, Leon Luo, Titu Andreescu, Boris Ginsburg, Igor Gitman |
Source: ArXiv AI | 15-07-25
Technical Requirements for Halting Dangerous AI Activities
Authors: Peter Barnett, Aaron Scher, David Abecassis |
Source: ArXiv AI | 15-07-25
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
Authors: Bradley P. Allen, Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, Paul Groth |
Source: ArXiv AI | 15-07-25
DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models
Authors: Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, Jingping Liu, Yanghua Xiao, Huajun Chen, Qing-Long Han, Yang Tang |
Source: ArXiv AI | 15-07-25
Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Authors: Thomas T. Hills |
Source: ArXiv AI | 15-07-25
Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration
Authors: Sadig Gojayev, Ahmad Anaqreh, Carolina Fortuna |
Source: ArXiv AI | 15-07-25
BlueGlass: A Framework for Composite AI Safety
Authors: Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl |
Source: ArXiv AI | 15-07-25
FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida |
Source: ArXiv AI | 15-07-25
Introducing the Swiss Food Knowledge Graph: AI for Context-Aware Nutrition Recommendation
Authors: Lubnaa Abdur Rahman, Ioannis Papathanail, Stavroula Mougiakakou |
Source: ArXiv AI | 15-07-25
Survey for Categorising Explainable AI Studies Using Data Analysis Task Frameworks
Authors: Hamzah Ziadeh, Hendrik Knoche |
Source: ArXiv AI | 15-07-25
Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?
Authors: Yumi Omori, Zixuan Dong, Keith Ross |
Source: ArXiv AI | 15-07-25
Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence
Authors: Jiaming Tian, Liyao Li, Wentao Ye, Haobo Wang, Lingxin Wang, Lihua Yu, Zujie Ren, Gang Chen, Junbo Zhao |
Source: ArXiv AI | 15-07-25
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning
Authors: Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui |
Source: ArXiv AI | 15-07-25
Elon Musk's AI company xAI apologizes "deeply" for Grok's "horrific behavior"
Source: The Decoder | 15-07-25
Anthropic, Google, OpenAI and XAI Granted Up to $200M from Defense Department (cnbc.com)
Source: Hacker News | 15-07-25
Embedding user-defined indexes in Apache Parquet (apache.org)
Source: Hacker News | 15-07-25
OpenAI delays release of open-weight model indefinitely over safety concerns
Source: The Decoder | 14-07-25
A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1
Authors: Marcin Pietroń, Rafał Olszowski, Jakub Gomułka, Filip Gampel, Andrzej Tomski |
Source: ArXiv AI | 14-07-25
Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy
Authors: Fernando Ayach, Vitor Lameirão, Raul Leão, Jerfferson Felizardo, Rafael Sobrinho, Vanessa Borges, Patrícia Matsubara, Awdren Fontão |
Source: ArXiv AI | 14-07-25
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
Authors: Sishi Xiong, Dakai Wang, Yu Zhao, Jie Zhang, Changzai Pan, Haowei He, Xiangyu Li, Wenhan Chang, Zhongjiang He, Shuangyong Song, Yongxiang Li |
Source: ArXiv AI | 14-07-25
Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions
Authors: Quanyan Zhu |
Source: ArXiv AI | 14-07-25
A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking
Authors: Zhengye Han, Quanyan Zhu |
Source: ArXiv AI | 14-07-25
Giving AI Agents Access to Cryptocurrency and Smart Contracts Creates New Vectors of AI Harm
Authors: Bill Marino, Ari Juels |
Source: ArXiv AI | 14-07-25
Multi-Agent LLMs as Ethics Advocates in AI-Based Systems
Authors: Asma Yamani, Malak Baslyman, Moataz Ahmed |
Source: ArXiv AI | 14-07-25
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
Authors: Inclusion AI: Fudong Wang, Jiajia Liu, Jingdong Chen, Jun Zhou, Kaixiang Ji, Lixiang Ru, Qingpei Guo, Ruobing Zheng, Tianqi Li, Yi Yuan, Yifan Mao, Yuting Xiao, Ziping Ma |
Source: ArXiv AI | 14-07-25
Introspection of Thought Helps AI Agents
Authors: Haoran Sun, Shaoning Zeng |
Source: ArXiv AI | 14-07-25
Agentic Large Language Models for Conceptual Systems Engineering and Design
Authors: Soheyl Massoudi, Mark Fuge |
Source: ArXiv AI | 14-07-25
Show HN: FFmpeg in plain English – LLM-assisted FFmpeg in the browser (vidmix.app)
Source: Hacker News | 14-07-25
The upcoming GPT-3 moment for RL (mechanize.work)
Source: Hacker News | 14-07-25
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (arxiv.org)
Source: Hacker News | 14-07-25
Local Chatbot RAG with FreeBSD Knowledge (hackacad.net)
Source: Hacker News | 14-07-25
Ask HN: How much of OpenAI code is written by AI?
Source: Hacker News | 14-07-25
Show HN: Learn LLMs LeetCode Style (github.com/exorust)
Source: Hacker News | 14-07-25
Hypercapitalism and the AI talent wars (johnluttig.com)
Source: Hacker News | 14-07-25
OpenAI loses out as Google hires Windsurf's CEO and top talent
Source: The Decoder | 13-07-25
Switching to Claude Code and VSCode Inside Docker (timsh.org)
Source: Hacker News | 13-07-25
Understanding Tool Calling in LLMs – Step-by-Step with REST and Spring AI (muthuishere.medium.com)
Source: Hacker News | 13-07-25
Axon's Draft One AI Police Report Generator Is Designed to Defy Transparency (eff.org)
Source: Hacker News | 13-07-25
MIRIX: Multi-Agent Memory System for LLM-Based Agents
Authors: Yu Wang, Xi Chen |
Source: ArXiv AI | 13-07-25
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Authors: Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim |
Source: ArXiv AI | 13-07-25
Autonomous Control Leveraging LLMs: An Agentic Framework for Next-Generation Industrial Automation
Authors: Javal Vyas, Mehmet Mercangoz |
Source: ArXiv AI | 13-07-25
BOOST: Out-of-Distribution-Informed Adaptive Sampling for Bias Mitigation in Stylistic Convolutional Neural Networks
Authors: Mridula Vijendran, Shuang Chen, Jingjing Deng, Hubert P. H. Shum |
Source: ArXiv AI | 13-07-25
Application of LLMs to Multi-Robot Path Planning and Task Allocation
Authors: Ashish Kumar |
Source: ArXiv AI | 13-07-25
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
Authors: Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. Rothblum |
Source: ArXiv AI | 13-07-25
StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley
Authors: Weihao Tan, Changjiu Jiang, Yu Duan, Mingcong Lei, Jiageng Li, Yitian Hong, Xinrun Wang, Bo An |
Source: ArXiv AI | 13-07-25
DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search
Authors: Zerui Yang, Yuwei Wan, Yinqiao Li, Yudai Matsuda, Tong Xie, Linqi Song |
Source: ArXiv AI | 13-07-25
Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models
Authors: Sedigh Khademi, Jim Black, Christopher Palmer, Muhammad Javed, Hazel Clothier, Jim Buttery, Gerardo Luis Dimaguila |
Source: ArXiv AI | 13-07-25
PlanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations
Authors: Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka |
Source: ArXiv AI | 13-07-25
Measuring AI Alignment with Human Flourishing
Authors: Elizabeth Hilliard, Akshaya Jagadeesh, Alex Cook, Steele Billings, Nicholas Skytland, Alicia Llewellyn, Jackson Paull, Nathan Paull, Nolan Kurylo, Keatra Nesbitt, Robert Gruenewald, Anthony Jantzi, Omar Chavez |
Source: ArXiv AI | 13-07-25
Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization
Authors: Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye |
Source: ArXiv AI | 13-07-25
An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis
Authors: Mingda Zhang, Na Zhao, Jianglong Qing, Qing xu, Kaiwen Pan, Ting luo |
Source: ArXiv AI | 13-07-25
New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models
Source: The Decoder | 13-07-25
EU's Model Documentation Form makes AI providers explain their models like it's tax season
Source: The Decoder | 13-07-25
Kimi-K2 is the next open-weight AI milestone from China after Deepseek
Source: The Decoder | 13-07-25
OpenAI’s Windsurf deal is off, and Windsurf’s CEO is going to Google (theverge.com)
Source: Hacker News | 13-07-25
Researchers used 1,600 YouTube fail videos to show AI models struggle with surprises
Source: The Decoder | 13-07-25
OpenAI’s head of ChatGPT says AI will not displace doctors but will displace not going to the doctor
Source: The Decoder | 12-07-25
Bad Actors Are Grooming LLMs to Produce Falsehoods (americansunlight.substack.com)
Source: Hacker News | 12-07-25
OpenAI delays launch of open-weight model (twitter.com/sama)
Source: Hacker News | 12-07-25
Leveraging Elixir's hot code loading capabilities to modularize a monolithic app (lucassifoni.info)
Source: Hacker News | 12-07-25
Andrew Ng: Building Faster with AI [video] (youtube.com)
Source: Hacker News | 12-07-25
Sieve (YC X25) is hiring researchers to build large video datasets for AI labs (sievedata.com)
Source: Hacker News | 12-07-25
Upgrading an M4 Pro Mac mini's storage for half the price (jeffgeerling.com)
Source: Hacker News | 12-07-25
ETH Zurich and EPFL to release a LLM developed on public infrastructure (ethz.ch)
Source: Hacker News | 12-07-25
Google unveils MedGemma, an open-source AI model suite for medical applications
Source: The Decoder | 12-07-25
LLM Inference Handbook (bentoml.com)
Source: Hacker News | 12-07-25
Hugging Face warns that closed-source robots threaten user control
Source: The Decoder | 11-07-25
Most AI models can fake alignment, but safety training suppresses the behavior, study finds
Source: The Decoder | 11-07-25
Meta continues to lure top AI talent with compensation packages exceeding $200 million
Source: The Decoder | 11-07-25
OpenAI will debut an open-weight LLM soon and launch a browser with integrated AI chat
Source: The Decoder | 11-07-25
Graphical Linear Algebra (graphicallinearalgebra.net)
Source: Hacker News | 11-07-25
Batch Mode in the Gemini API: Process More for Less (googleblog.com)
Source: Hacker News | 11-07-25
Recovering from AI Addiction (internetaddictsanonymous.org)
Source: Hacker News | 11-07-25
Is Gemini 2.5 good at bounding boxes? (simedw.com)
Source: Hacker News | 11-07-25
Not So Fast: AI Coding Tools Can Reduce Productivity (secondthoughts.ai)
Source: Hacker News | 11-07-25
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
Source: Hacker News | 11-07-25
Bloomberg: China’s AI expansion in Xinjiang relies on Nvidia chips despite U.S. export controls
Source: The Decoder | 10-07-25
An attacker used AI to impersonate Secretary Rubio and contact high-ranking officials
Source: The Decoder | 10-07-25
At last, a use case for AI agents with sky-high ROI: Stealing crypto (theregister.com)
Source: Hacker News | 10-07-25
ChatGPT Guessing Game Leads to Users Extracting Free Windows OS Keys and More (0din.ai)
Source: Hacker News | 10-07-25
Biomni: A General-Purpose Biomedical AI Agent (github.com/snap-stanford)
Source: Hacker News | 10-07-25
MCP-B: A Protocol for AI Browser Automation (mcp-b.ai)
Source: Hacker News | 10-07-25
Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications
Authors: Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon |
Source: ArXiv AI | 10-07-25
Comprehensive Evaluation of Prototype Neural Networks
Authors: Philipp Schlinge, Steffen Meinert, Martin Atzmueller |
Source: ArXiv AI | 10-07-25
OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion
Authors: Yizhuo Wu, Ang Li, Chang Gao |
Source: ArXiv AI | 10-07-25
Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change
Authors: Adrian Rauchfleisch, Joshua Philip Suarez, Nikka Marie Sales, Andreas Jungherr |
Source: ArXiv AI | 10-07-25
The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover
Authors: Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro |
Source: ArXiv AI | 10-07-25
Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights
Authors: Alexandra Abbas, Celia Waggoner, Justin Olive |
Source: ArXiv AI | 10-07-25
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Authors: Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao |
Source: ArXiv AI | 10-07-25
MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation
Authors: Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang |
Source: ArXiv AI | 10-07-25
PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments
Authors: Hanqun Cao, Xinyi Zhou, Zijun Gao, Chenyu Wang, Xin Gao, Zhi Zhang, Chunbin Gu, Ge Liu, Pheng-Ann Heng |
Source: ArXiv AI | 10-07-25
A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering
Authors: Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee |
Source: ArXiv AI | 10-07-25
Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation
Authors: Haris Khan, Shumaila Asif, Hassan Nasir |
Source: ArXiv AI | 10-07-25
DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning
Authors: Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar |
Source: ArXiv AI | 10-07-25
Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification
Authors: Martin Sondermann, Pinar Bisgin, Niklas Tschorn, Anja Burmann, Christoph M. Friedrich |
Source: ArXiv AI | 10-07-25
An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator
Authors: Yulin An, Enrique del Castillo |
Source: ArXiv AI | 10-07-25
Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI
Authors: David Orban |
Source: ArXiv AI | 10-07-25
The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation
Authors: Jieren Deng, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang |
Source: ArXiv AI | 10-07-25
OpenAI is ramping up security to prevent rivals from copying its advanced AI models
Source: The Decoder | 10-07-25
RapidRAW: A non-destructive and GPU-accelerated RAW image editor (github.com/cybertimon)
Source: Hacker News | 10-07-25
Why LLMs Can't Write Q/Kdb+: Writing Code Right-to-Left (medium.com/gabiteodoru)
Source: Hacker News | 10-07-25
Apple’s AI team faces major departures as Meta recruits key engineers
Source: The Decoder | 09-07-25
A developer focused on stopping AI bots says poisoning datasets is like peeing in the ocean
Source: The Decoder | 09-07-25
Researchers reveal that AI models have distinct strategic fingerprints in classic game theory tests
Source: The Decoder | 09-07-25
Sakana AI's new algorithm lets large language models work together to solve complex problems
Source: The Decoder | 09-07-25
Huawei pushes back on AI model plagiarism claims
Source: The Decoder | 09-07-25
UQLM: A Python Package for Uncertainty Quantification in Large Language Models
Authors: Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Ho-Kyeong Ra, Viren Bajaj, Zeya Ahmad |
Source: ArXiv AI | 09-07-25
SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads
Authors: Jiale Lao, Immanuel Trummer |
Source: ArXiv AI | 09-07-25
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Authors: Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, Wangchunshu Zhou |
Source: ArXiv AI | 09-07-25
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Authors: Zhiyuan Peng, Ting-ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang |
Source: ArXiv AI | 09-07-25
Chat2SPaT: A Large Language Model Based Tool for Automating Traffic Signal Control Plan Management
Authors: Yue Wang, Miao Zhou, Guijing Huang, Rui Zhuo, Chao Yi, Zhenliang Ma |
Source: ArXiv AI | 09-07-25
Cultivating Multimodal Intelligence: Interpretive Reasoning and Agentic RAG Approaches to Dermatological Diagnosis
Authors: Karishma Thakrar, Shreyas Basavatia, Akshay Daftardar |
Source: ArXiv AI | 09-07-25
SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation
Authors: Shovito Barua Soumma, Asiful Arefeen, Stephanie M. Carpenter, Melanie Hingle, Hassan Ghasemzadeh |
Source: ArXiv AI | 09-07-25
Red Teaming AI Red Teaming
Authors: Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta |
Source: ArXiv AI | 09-07-25
Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment
Authors: Jiahuan Pei, Fanghua Ye, Xin Sun, Wentao Deng, Koen Hindriks, Junxiao Wang |
Source: ArXiv AI | 09-07-25
Domain adaptation of large language models for geotechnical applications
Authors: Lei Fan, Fangxue Liu, Cheng Chen |
Source: ArXiv AI | 09-07-25
MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models
Authors: Wei Zhang, Juan Chen, En Zhu, Wenhong Cheng, YunPeng Li, Yanbo J. Wang |
Source: ArXiv AI | 09-07-25
Towards Measurement Theory for Artificial Intelligence
Authors: Elija Perrier |
Source: ArXiv AI | 09-07-25
Divergent Realities: A Comparative Analysis of Human Expert vs. Artificial Intelligence Based Generation and Evaluation of Treatment Plans in Dermatology
Authors: Dipayan Sengupta, Saumya Panda |
Source: ArXiv AI | 09-07-25
LLMs are Introvert
Authors: Litian Zhang, Xiaoming Zhang, Bingyu Yan, Ziyi Zhou, Bo Zhang, Zhenyu Guan, Xi Zhang, Chaozhuo Li |
Source: ArXiv AI | 09-07-25
Enhancing Student Learning with LLM-Generated Retrieval Practice Questions: An Empirical Study in Data Science Courses
Authors: Yuan An, John Liu, Niyam Acharya, Ruhma Hashmi |
Source: ArXiv AI | 09-07-25
An autonomous agent for auditing and improving the reliability of clinical AI models
Authors: Lukas Kuhn, Florian Buettner |
Source: ArXiv AI | 09-07-25
Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better
Authors: Aaron Bembenek (The University of Melbourne) |
Source: ArXiv AI | 09-07-25
Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity
Authors: Shuai Zhao, Yulin Zhang, Luwei Xiao, Xinyi Wu, Yanhao Jia, Zhongliang Guo, Xiaobao Wu, Cong-Duy Nguyen, Guoming Zhang, Anh Tuan Luu |
Source: ArXiv AI | 09-07-25
MusiScene: Leveraging MU-LLaMA for Scene Imagination and Enhanced Video Background Music Generation
Authors: Fathinah Izzati, Xinyue Li, Yuxuan Wu, Gus Xia |
Source: ArXiv AI | 09-07-25
Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening
Authors: Zhijun Guo, Alvina Lai, Julia Ive, Alexandru Petcu, Yutong Wang, Luyuan Qi, Johan H Thygesen, Kezhi Li |
Source: ArXiv AI | 09-07-25
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
Authors: Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, Maarten Sap |
Source: ArXiv AI | 09-07-25
FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models
Authors: Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang |
Source: ArXiv AI | 09-07-25
Smollm3: Smol, multilingual, long-context reasoner LLM (huggingface.co)
Source: Hacker News | 09-07-25
I'm Building LLM for Satellite Data EarthGPT.app (earthgpt.app)
Source: Hacker News | 09-07-25
The Tradeoffs of SSMs and Transformers (goombalab.github.io)
Source: Hacker News | 09-07-25
Rules of good writing (2007) (dilbertblog.typepad.com)
Source: Hacker News | 09-07-25
ChatGPT helped identify a genetic MTHFR mutation after a decade of missed diagnoses
Source: The Decoder | 08-07-25
Adding a feature because ChatGPT incorrectly thinks it exists (holovaty.com)
Source: Hacker News | 08-07-25
Launch HN: Morph (YC S23) – Apply AI code edits at 4,500 tokens/sec
Source: Hacker News | 08-07-25
Agent Exchange: Shaping the Future of AI Agent Economics
Authors: Yingxuan Yang, Ying Wen, Jun Wang, Weinan Zhang |
Source: ArXiv AI | 08-07-25
LLMs model how humans induce logically structured rules
Authors: Alyssa Loo, Ellie Pavlick, Roman Feiman |
Source: ArXiv AI | 08-07-25
Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features
Authors: Thuy An Ha, Bao Quoc Vo |
Read more | Source: ArXiv AI | 08-07-25
Lyria: A General LLM-Driven Genetic Algorithm Framework for Problem Solving
Authors: Weizhi Tang, Kwabena Nuamah, Vaishak Belle |
Read more | Source: ArXiv AI | 08-07-25
A Technical Survey of Reinforcement Learning Techniques for Large Language Models
Authors: Saksham Sahai Srivastava, Vaneet Aggarwal |
Read more | Source: ArXiv AI | 08-07-25
Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing
Authors: Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang |
Read more | Source: ArXiv AI | 08-07-25
How to Train Your LLM Web Agent: A Statistical Diagnosis
Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia |
Read more | Source: ArXiv AI | 08-07-25
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Authors: Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu |
Read more | Source: ArXiv AI | 08-07-25
DC-Mamber: A Dual Channel Prediction Model based on Mamba and Linear Transformer for Multivariate Time Series Forecasting
Authors: Bing Fan, Shusen Ma, Yun-Bo Zhao, Yu Kang |
Read more | Source: ArXiv AI | 08-07-25
MedGellan: LLM-Generated Medical Guidance to Support Physicians
Authors: Debodeep Banerjee, Burcu Sayin, Stefano Teso, Andrea Passerini |
Read more | Source: ArXiv AI | 08-07-25
Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence
Authors: Sonal Allana, Rozita Dara, Xiaodong Lin, Pulei Xiong |
Read more | Source: ArXiv AI | 08-07-25
Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective
Authors: Niloofar Shadab, Tyler Cody, Alejandro Salado, Taylan G. Topcu, Mohammad Shadab, Peter Beling |
Read more | Source: ArXiv AI | 08-07-25
LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction
Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko |
Read more | Source: ArXiv AI | 08-07-25
ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning
Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Xinyang He, Chujie Chen, Cangyuan Li, Mengdi Wang, Haobo Xu, Yinhe Han, Ying Wang |
Read more | Source: ArXiv AI | 08-07-25
DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine
Authors: Zewen Sun, Ruoxiang Huang, Jiahe Feng, Rundong Kong, Yuqian Wang, Hengyu Liu, Ziqi Gong, Yuyuan Qin, Yingxue Wang, Yu Wang |
Read more | Source: ArXiv AI | 08-07-25
Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents
Authors: George Jagadeesh, Srikrishna Iyer, Michal Polanowski, Kai Xin Thia |
Read more | Source: ArXiv AI | 08-07-25
MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction
Authors: Kaleem Ullah Qasim, Jiashu Zhang |
Read more | Source: ArXiv AI | 08-07-25
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
Authors: Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Siheng Chen |
Read more | Source: ArXiv AI | 08-07-25
OpenAI's Head of Recruiting says Meta's hiring tactics "reek of desperation"
Read more | Source: The Decoder | 08-07-25
The Maquet machine: how AI is reviving Alexandre Dumas' successful model
Read more | Source: The Decoder | 08-07-25
Alibaba's new GPT-4o competitor Qwen VLo is no longer open source
Read more | Source: The Decoder | 08-07-25
A non-anthropomorphized view of LLMs (addxorrol.blogspot.com)
Read more | Source: Hacker News | 08-07-25
Early Signs of Steganographic Capabilities in Frontier LLMs
Authors: Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner |
Read more | Source: ArXiv AI | 08-07-25
Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
Authors: Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo |
Read more | Source: ArXiv AI | 08-07-25
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Authors: Purbesh Mitra, Sennur Ulukus |
Read more | Source: ArXiv AI | 08-07-25
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
Authors: Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun |
Read more | Source: ArXiv AI | 08-07-25
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
Authors: Ken Tsui |
Read more | Source: ArXiv AI | 08-07-25
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Authors: Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates |
Read more | Source: ArXiv AI | 08-07-25
STELLA: Self-Evolving LLM Agent for Biomedical Research
Authors: Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong |
Read more | Source: ArXiv AI | 08-07-25
Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation
Authors: Jungkoo Kang |
Read more | Source: ArXiv AI | 08-07-25
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
Authors: Amogh Mannekote, Adam Davies, Guohao Li, Kristy Elizabeth Boyer, ChengXiang Zhai, Bonnie J Dorr, Francesco Pinto |
Read more | Source: ArXiv AI | 08-07-25
Data Diversification Methods In Alignment Enhance Math Performance In LLMs
Authors: Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou |
Read more | Source: ArXiv AI | 08-07-25
What Neuroscience Can Teach AI About Learning in Continuously Changing Environments
Authors: Daniel Durstewitz, Bruno Averbeck, Georgia Koppe |
Read more | Source: ArXiv AI | 08-07-25
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Authors: Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach |
Read more | Source: ArXiv AI | 08-07-25
OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent
Authors: Bowen Chen, Zhao Wang, Shingo Takamatsu |
Read more | Source: ArXiv AI | 08-07-25
Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
Authors: Kenneth Payne, Baptiste Alloui-Cros |
Read more | Source: ArXiv AI | 08-07-25
Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education
Authors: Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin |
Read more | Source: ArXiv AI | 08-07-25
Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work
Authors: Guangwei Zhang |
Read more | Source: ArXiv AI | 08-07-25
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs
Authors: Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang |
Read more | Source: ArXiv AI | 08-07-25
"No grace period, no pause": EU sticks to AI Act timeline despite industry pushback
Read more | Source: The Decoder | 07-07-25
ChatGPT usage for news surges as Google news searches decline
Read more | Source: The Decoder | 07-07-25
LLMs should not replace therapists (arxiv.org)
Read more | Source: Hacker News | 07-07-25
Opencode: AI coding agent, built for the terminal (github.com/sst)
Read more | Source: Hacker News | 07-07-25
Collatz's Ant and Σ(n) (gbragafibra.github.io)
Read more | Source: Hacker News | 07-07-25
Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)
Read more | Source: Hacker News | 07-07-25
Mirage: AI-native UGC game engine powered by real-time world model (dynamicslab.ai)
Read more | Source: Hacker News | 07-07-25
Optimizing Tool Selection for LLM Workflows with Differentiable Programming (viksit.substack.com)
Read more | Source: Hacker News | 06-07-25
The force-feeding of AI features on an unwilling public (honest-broker.com)
Read more | Source: Hacker News | 06-07-25
A Canadian's AI hoax duped the media and propelled a 'band' to success (cbc.ca)
Read more | Source: Hacker News | 06-07-25
The Right Way to Embed an LLM in a Group Chat (tripjam.app)
Read more | Source: Hacker News | 06-07-25
Impact of PCIe 5.0 Bandwidth on GPU Content Creation and LLM Performance (pugetsystems.com)
Read more | Source: Hacker News | 05-07-25
Large Language Models Are Improving Exponentially (ieee.org)
Read more | Source: Hacker News | 05-07-25
SciArena lets scientists compare LLMs on real research questions
Read more | Source: The Decoder | 05-07-25
Google launches Veo 3 Fast worldwide, letting Gemini Pro users generate videos up to 720p
Read more | Source: The Decoder | 05-07-25
Gremllm (github.com/awwaiid)
Read more | Source: Hacker News | 05-07-25
ChatGPT creates phisher's paradise by serving the wrong URLs for major companies (theregister.com)
Read more | Source: Hacker News | 05-07-25
Version Control for AI Coding (branching.app)
Read more | Source: Hacker News | 05-07-25
Everything around LLMs is still magical and wishful thinking (dmitriid.com)
Read more | Source: Hacker News | 05-07-25
Meta reportedly offers top OpenAI researchers up to $300 million over four years
Read more | Source: The Decoder | 04-07-25
How AI on Microcontrollers Works: Operators and Kernels (danielmangum.com)
Read more | Source: Hacker News | 04-07-25
Show HN: I AI coded a tower defense game and documented the whole process (github.com/maciej-trebacz)
Read more | Source: Hacker News | 04-07-25
Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] (youtube.com)
Read more | Source: Hacker News | 04-07-25
About AI Evals (hamel.dev)
Read more | Source: Hacker News | 04-07-25
Manipulating trapped air bubbles in ice for message storage in cold regions (cell.com)
Read more | Source: Hacker News | 04-07-25
Cloudflare aims to save the World Wide Web by blocking AI crawlers without explicit consent
Read more | Source: The Decoder | 03-07-25
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Authors: Zeyu Huang, Tianhao Cheng, Zihan Qiu, Zili Wang, Yinghui Xu, Edoardo M. Ponti, Ivan Titov |
Read more | Source: ArXiv AI | 03-07-25
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu |
Read more | Source: ArXiv AI | 03-07-25
Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America
Authors: Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo |
Read more | Source: ArXiv AI | 03-07-25
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma |
Read more | Source: ArXiv AI | 03-07-25
Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture
Authors: Bochen Han, Songmao Zhang |
Read more | Source: ArXiv AI | 03-07-25
Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training
Authors: Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud |
Read more | Source: ArXiv AI | 03-07-25
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Authors: Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang |
Read more | Source: ArXiv AI | 03-07-25
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Authors: Nicolas Salvy, Hugues Talbot, Bertrand Thirion |
Read more | Source: ArXiv AI | 03-07-25
Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems
Authors: Xiaoyu Ji, Jessica Shorland, Joshua Shank, Pascal Delpe-Brice, Latanya Sweeney, Jan Allebach, Ali Shakouri |
Read more | Source: ArXiv AI | 03-07-25
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs
Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios |
Read more | Source: ArXiv AI | 03-07-25
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
Authors: Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu |
Read more | Source: ArXiv AI | 03-07-25
End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning
Authors: Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna |
Read more | Source: ArXiv AI | 03-07-25
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He |
Read more | Source: ArXiv AI | 03-07-25
AI4Research: A Survey of Artificial Intelligence for Scientific Research
Authors: Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, Wanxiang Che |
Read more | Source: ArXiv AI | 03-07-25
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Authors: Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir |
Read more | Source: ArXiv AI | 03-07-25
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
Authors: Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman |
Read more | Source: ArXiv AI | 03-07-25
Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection
Authors: Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari |
Read more | Source: ArXiv AI | 03-07-25
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang |
Read more | Source: ArXiv AI | 03-07-25
Using multi-agent architecture to mitigate the risk of LLM hallucinations
Authors: Abd Elrahman Amer, Magdi Amer |
Read more | Source: ArXiv AI | 03-07-25
MindsDB (YC W20) is hiring an AI solutions engineer (greenhouse.io)
Read more | Source: Hacker News | 03-07-25
What to build instead of AI agents (decodingml.substack.com)
Read more | Source: Hacker News | 03-07-25
Meta founds Superintelligence Labs with top acquisitions from OpenAI and Google
Read more | Source: The Decoder | 02-07-25
Apple weighs abandoning its own AI for Siri as it tests models from OpenAI and Anthropic
Read more | Source: The Decoder | 02-07-25
HN Slop: AI startup ideas generated from Hacker News (josh.ing)
Read more | Source: Hacker News | 02-07-25
Show HN: A modern C++20 AI SDK (GPT‑4o, Claude 3.5, tool‑calling)
Read more | Source: Hacker News | 02-07-25
Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite Webpages (simedw.com)
Read more | Source: Hacker News | 02-07-25
Sam Altman Slams Meta's AI Talent Poaching: 'Missionaries Will Beat Mercenaries' (wired.com)
Read more | Source: Hacker News | 02-07-25
Hilbert's sixth problem: derivation of fluid equations via Boltzmann's theory (arxiv.org)
Read more | Source: Hacker News | 02-07-25
How large are large language models? (gist.github.com)
Read more | Source: Hacker News | 02-07-25
Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability
Authors: Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, Dave Farley |
Read more | Source: ArXiv AI | 02-07-25
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Authors: Zhi Jing, Siyuan Yang, Jicong Ao, Ting Xiao, Yugang Jiang, Chenjia Bai |
Read more | Source: ArXiv AI | 02-07-25
Automated anatomy-based post-processing reduces false positives and improved interpretability of deep learning intracranial aneurysm detection
Authors: Jisoo Kim, Chu-Hsuan Lin, Alberto Ceballos-Arroyo, Ping Liu, Huaizu Jiang, Shrikanth Yadav, Qi Wan, Lei Qin, Geoffrey S Young |
Read more | Source: ArXiv AI | 02-07-25
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs
Authors: Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim |
Read more | Source: ArXiv AI | 02-07-25
Many LLMs Are More Utilitarian Than One
Authors: Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney |
Read more | Source: ArXiv AI | 02-07-25
Deep learning-based segmentation of T1 and T2 cardiac MRI maps for automated disease detection
Authors: Andreea Bianca Popescu, Andreas Seitz, Heiko Mahrholdt, Jens Wetzl, Athira Jacob, Lucian Mihai Itu, Constantin Suciu, Teodora Chitiboi |
Read more | Source: ArXiv AI | 02-07-25
Stylometry recognizes human and LLM-generated texts in short samples
Authors: Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab |
Read more | Source: ArXiv AI | 02-07-25
Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications
Authors: Jindong Han, Yansong Ning, Zirui Yuan, Hang Ni, Fan Liu, Tengfei Lyu, Hao Liu |
Read more | Source: ArXiv AI | 02-07-25
Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona
Authors: Philip Colangelo, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao, Aroon Vijaykar, Chris Williams, Daniel C. Wilson, Zack MacFarland, Daniel Dreiling, Nathan Morey, Anuja Ratnayake, Baskar Vairamohan |
Read more | Source: ArXiv AI | 02-07-25
Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes
Authors: Eun-Ji Park, Sangwon Yun |
Read more | Source: ArXiv AI | 02-07-25
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
Authors: Varun Mannam, Fang Wang, Chaochun Liu, Xin Chen |
Read more | Source: ArXiv AI | 02-07-25
Holistic Artificial Intelligence in Medicine; improved performance and explainability
Authors: Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou, Dimitris Bertsimas |
Read more | Source: ArXiv AI | 02-07-25
ChatGPT produces more "lazy" thinkers: Evidence of cognitive engagement decline
Authors: Georgios P. Georgiou |
Read more | Source: ArXiv AI | 02-07-25
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Authors: Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue |
Read more | Source: ArXiv AI | 02-07-25
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
Authors: Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park |
Read more | Source: ArXiv AI | 02-07-25
A Robust Algorithm for Non-IID Machine Learning Problems with Convergence Analysis
Authors: Qing Xu, Xiaohua Xuan |
Read more | Source: ArXiv AI | 02-07-25
Enhancing LLM Agent Safety via Causal Influence Prompting
Authors: Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee |
Read more | Source: ArXiv AI | 02-07-25
Google brings Gemini for Education and Gemini in Classroom AI tools to schools
Read more | Source: The Decoder | 02-07-25
Microsoft’s MAI-DxO boosts AI diagnostic accuracy and cuts costs by nearly 70 percent
Read more | Source: The Decoder | 02-07-25
The wanton destruction of a creative-tech era (greg.technology)
Read more | Source: Hacker News | 02-07-25
Building a Personal AI Factory (john-rush.com)
Read more | Source: Hacker News | 02-07-25
Show HN: Core – open source memory graph for LLMs – shareable, user owned (github.com/redplanethq)
Read more | Source: Hacker News | 02-07-25
After Meta's recruiting push, OpenAI tries to retain talent
Read more | Source: The Decoder | 01-07-25
Claude Code now supports hooks (anthropic.com)
Read more | Source: Hacker News | 01-07-25
GPEmu: A GPU emulator for rapid, low-cost deep learning prototyping [pdf] (vldb.org)
Read more | Source: Hacker News | 01-07-25
Cloudflare to introduce pay-per-crawl for AI bots (cloudflare.com)
Read more | Source: Hacker News | 01-07-25
Researchers Uncover Hidden Ingredients Behind AI Creativity (quantamagazine.org)
Read more | Source: Hacker News | 01-07-25
The new skill in AI is not prompting, it's context engineering (philschmid.de)
Read more | Source: Hacker News | 01-07-25
The hidden JTAG in a Qualcomm/Snapdragon device’s USB port (linaro.org)
Read more | Source: Hacker News | 01-07-25
Show HN: ToplingDB - A Persistent Key-Value Store for External Storage (github.com/topling)
Read more | Source: Hacker News | 01-07-25
The average chess players of Bletchley Park and AI research in Britain (blogs.bl.uk)
Read more | Source: Hacker News | 01-07-25
Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies
Authors: Paul Wachter, Lukas Niehaus, Julius Schöning |
Read more | Source: ArXiv AI | 01-07-25
Bootstrapping Human-Like Planning via LLMs
Authors: David Porfirio, Vincent Hsiao, Morgan Fine-Morris, Leslie Smith, Laura M. Hiatt |
Read more | Source: ArXiv AI | 01-07-25
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
Authors: Michael Papademas, Xenia Ziouvelou, Antonis Troumpoukis, Vangelis Karkaletsis |
Read more | Source: ArXiv AI | 01-07-25
The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy
Authors: Rishi Bommasani |
Read more | Source: ArXiv AI | 01-07-25
Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study
Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit |
Read more | Source: ArXiv AI | 01-07-25
Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons
Authors: Chi Chiu So, Yueyue Sun, Jun-Min Wang, Siu Pang Yung, Anthony Wai Keung Loh, Chun Pong Chau |
Read more | Source: ArXiv AI | 01-07-25
Data Augmentation for Cognitive Behavioral Therapy: Leveraging ERNIE Language Models using Artificial Intelligence
Authors: Bosubabu Sambana, Kondreddygari Archana, Suram Indhra Sena Reddy, Shaik Meethaigar Jameer Basha, Shaik Karishma |
Read more | Source: ArXiv AI | 01-07-25
The Confidence Paradox: Can LLM Know When It's Wrong
Authors: Sahil Tripathi, Md Tabrez Nafis, Imran Hussain, Jiechao Gao |
Read more | Source: ArXiv AI | 01-07-25
CooT: Learning to Coordinate In-Context with Coordination Transformers
Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun |
Read more | Source: ArXiv AI | 01-07-25
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
Authors: Yu Zhang, Ruijie Yu, Jidong Tian, Feng Zhu, Jiapeng Liu, Xiaokang Yang, Yaohui Jin, Yanyan Xu |
Read more | Source: ArXiv AI | 01-07-25
Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays
Authors: Selin Dik, Osman Erdem, Mehmet Dik |
Read more | Source: ArXiv AI | 01-07-25
Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models
Authors: Maria Carolina Cornelia Wit, Jun Pang |
Read more | Source: ArXiv AI | 01-07-25
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Authors: Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao |
Read more | Source: ArXiv AI | 01-07-25
Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models
Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye |
Read more | Source: ArXiv AI | 01-07-25
Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments
Authors: Christoph Schnabl, Daniel Hugenroth, Bill Marino, Alastair R. Beresford |
Read more | Source: ArXiv AI | 01-07-25
A New Perspective On AI Safety Through Control Theory Methodologies
Authors: Lars Ullrich, Walter Zimmer, Ross Greer, Knut Graichen, Alois C. Knoll, Mohan Trivedi |
Read more | Source: ArXiv AI | 01-07-25
Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning
Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik |
Read more | Source: ArXiv AI | 01-07-25
Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice
Authors: Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi |
Read more | Source: ArXiv AI | 01-07-25
AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models
Authors: Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, Deepika Raman |
Read more | Source: ArXiv AI | 01-07-25
Harnessing AI Agents to Advance Research on Refugee Child Mental Health
Authors: Aditya Shrivastava, Komal Gupta, Shraddha Arora |
Read more | Source: ArXiv AI | 01-07-25
OpenAI loses four more top researchers to Meta as even its own engineers call it a "huge loss"
Read more | Source: The Decoder | 01-07-25
Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com/runzhouye)
Read more | Source: Hacker News | 01-07-25
Show HN: We're two coffee nerds who built an AI app to track beans and recipes (beanbook.app)
Read more | Source: Hacker News | 01-07-25
Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken (github.com/m4thyou)
Read more | Source: Hacker News | 01-07-25
There are no new ideas in AI only new datasets (jxmo.io)
Read more | Source: Hacker News | 01-07-25
OmniGen 2 blends image and text generation like GPT-4o, but is open source
Read more | Source: The Decoder | 30-06-25
Gridfinity: The modular, open-source grid storage system (gridfinity.xyz)
Read more | Source: Hacker News | 30-06-25
Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics
Authors: Michael A. Riegler, Kristoffer Herland Hellton, Vajira Thambawita, Hugo L. Hammer |
Read more | Source: ArXiv AI | 30-06-25
Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit
Authors: Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling |
Read more | Source: ArXiv AI | 30-06-25
Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
Authors: Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis |
Read more | Source: ArXiv AI | 30-06-25
Transformers are Graph Neural Networks
Authors: Chaitanya K. Joshi |
Read more | Source: ArXiv AI | 30-06-25
Autonomic Microservice Management via Agentic AI and MAPE-K Integration
Authors: Matteo Esposito, Alexander Bakhtin, Noman Ahmad, Mikel Robredo, Ruoyu Su, Valentina Lenarduzzi, Davide Taibi |
Read more | Source: ArXiv AI | 30-06-25
CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks
Authors: Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia |
Read more | Source: ArXiv AI | 30-06-25
Projected Compression: Trainable Projection for Efficient Transformer Compression
Authors: Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski |
Read more | Source: ArXiv AI | 30-06-25
From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications
Authors: Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania |
Read more | Source: ArXiv AI | 30-06-25
Concept-Level AI for Telecom: Moving Beyond Large Language Models
Authors: Viswanath Kumarskandpriya, Abdulhalim Dandoush, Abbas Bradai, Ali Belgacem |
Read more | Source: ArXiv AI | 30-06-25
A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake
Authors: Luigi Russo, Deodato Tapete, Silvia Liberata Ullo, Paolo Gamba |
Read more | Source: ArXiv AI | 30-06-25
CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings
Authors: Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty |
Read more | Source: ArXiv AI | 30-06-25
QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization
Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh |
Read more | Source: ArXiv AI | 30-06-25
MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models
Authors: Yifan Liu, Xishun Liao, Haoxuan Ma, Jonathan Liu, Rohan Jadhav, Jiaqi Ma |
Read more | Source: ArXiv AI | 30-06-25
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Authors: Wanxin Tian, Shijie Zhang, Kevin Zhang, Xiaowei Chi, Yulin Luo, Junyu Lu, Chunkai Fan, Qiang Zhou, Yiming Zhao, Ning Liu Siyu Lin, Zhiyuan Qin, Xiaozhu Ju, Shanghang Zhang, Jian Tang |
Read more | Source: ArXiv AI | 30-06-25
CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation
Authors: Nicolas Bougie, Narimasa Watanabe |
Read more | Source: ArXiv AI | 30-06-25
A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety
Authors: Camille François, Ludovic Péran, Ayah Bdeir, Nouha Dziri, Will Hawkins, Yacine Jernite, Sayash Kapoor, Juliet Shen, Heidy Khlaaf, Kevin Klyman, Nik Marda, Marie Pellat, Deb Raji, Divya Siddarth, Aviya Skowron, Joseph Spisak, Madhulika Srikumar, Victor Storchan, Audrey Tang, Jen Weedon |
Read more | Source: ArXiv AI | 30-06-25
Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios
Authors: Shengyue Yao, Runqing Guo, Yangyang Qin, Miangbing Meng, Jipeng Cao, Yilun Lin, Yisheng Lv, Fei-Yue Wang |
Read more | Source: ArXiv AI | 30-06-25
Embodied AI Agents: Modeling the World
Authors: Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, Jitendra Malik |
Read more | Source: ArXiv AI | 30-06-25
AI Model Passport: Data and System Traceability Framework for Transparent AI in Health
Authors: Varvara Kalokyri, Nikolaos S. Tachos, Charalampos N. Kalantzopoulos, Stelios Sfakianakis, Haridimos Kondylakis, Dimitrios I. Zaridis, Sara Colantonio, Daniele Regge, Nikolaos Papanikolaou, The ProCAncer-I consortium, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis |
Read more | Source: ArXiv AI | 30-06-25
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Authors: Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach |
Read more | Source: ArXiv AI | 30-06-25
Anthropic's Claude ran a store and lost money by selling below cost and giving discounts
Read more | Source: The Decoder | 30-06-25
Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted) (llmapitest.com)
Read more | Source: Hacker News | 30-06-25
US Senate moves to block state AI laws for five years if states take broadband funds
Read more | Source: The Decoder | 30-06-25
Life of an inference request (vLLM V1): How LLMs are served efficiently at scale (ubicloud.com)
Read more | Source: Hacker News | 29-06-25
Magnetic Tape Storage Technology: usage, history, and future outlook (acm.org)
Read more | Source: Hacker News | 29-06-25
Show HN: A different kind of AI Video generation
Read more | Source: Hacker News | 29-06-25
Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
Authors: He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang |
Read more | Source: ArXiv AI | 29-06-25
Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks
Authors: Isaac Chung, Imene Kerboua, Marton Kardos, Roman Solomatin, Kenneth Enevoldsen |
Read more | Source: ArXiv AI | 29-06-25
A Hierarchical Deep Learning Approach for Minority Instrument Detection
Authors: Dylan Sechet, Francesca Bugiotti, Matthieu Kowalski, Edouard d'Hérouville, Filip Langiewicz |
阅读更多来源: ArXiv AI | 29-06-25
$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models
Authors: Quanming Liu, Xupeng Bu, Zhichao Yan, Ru Li |
阅读更多来源: ArXiv AI | 29-06-25
Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage
Authors: Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O'Sullivan |
阅读更多来源: ArXiv AI | 29-06-25
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Authors: Ali Şenol, Garima Agrawal, Huan Liu |
阅读更多来源: ArXiv AI | 29-06-25
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
Authors: Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha |
阅读更多来源: ArXiv AI | 29-06-25
Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation
Authors: Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng |
阅读更多来源: ArXiv AI | 29-06-25
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Authors: Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal |
阅读更多来源: ArXiv AI | 29-06-25
Potemkin Understanding in Large Language Models
Authors: Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan |
阅读更多来源: ArXiv AI | 29-06-25
The Singapore Consensus on Global AI Safety Research Priorities
Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, Agnes Delaborde, Nouha Dziri, Francisco Eiras, Joshua Engels, Jinyu Fan, Adam Gleave, Noah Goodman, Fynn Heide, Dan Hendrycks, Cyrus Hodes, Bryan Low Kian Hsiang, Minlie Huang, Sami Jawhar, Wang Jingyu, Adam Tauman Kalai, Meindert Kamphuis, Mohan Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, Thomas Kwa, Jeffrey Ladish, Kwok-Yan Lam, Wan Lee Sie, Taewhi Lee, Xiaojian Li, Jiajun Liu, Chaochao Lu, Yifan Mai, Richard Mallah, Julian Michael, Nick Moës, Simon Möller, Kihyuk Nam, Kwan Yee Ng, Mark Nitzberg, Besmira Nushi, Seán O hÉigeartaigh, Alejandro Ortega, Pierre Peigné, James Petrie, Benjamin Prud'Homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, Buck Shlegeris, Saad Siddiqui, Aradhana Sinha, Martín Soto, Cheston Tan, Dong Ting, Robert Trager, Brian Tse, Anthony Tung K. H., Vanessa Wilfred, John Willes, Denise Wong, Wei Xu, Rongwu Xu, Yi Zeng, HongJiang Zhang, Djordje Žikelić |
阅读更多来源: ArXiv AI | 29-06-25
Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications
Authors: Xinye Tang, Haijun Zhai, Chaitanya Belwal, Vineeth Thayanithi, Philip Baumann, Yogesh K Roy |
阅读更多来源: ArXiv AI | 29-06-25
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
Authors: Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han |
阅读更多来源: ArXiv AI | 29-06-25
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
Authors: Chenkai Sun, Denghui Zhang, ChengXiang Zhai, Heng Ji |
阅读更多来源: ArXiv AI | 29-06-25
Active Inference AI Systems for Scientific Discovery
Authors: Karthik Duraisamy |
阅读更多来源: ArXiv AI | 29-06-25
IXAII: An Interactive Explainable Artificial Intelligence Interface for Decision Support Systems
Authors: Pauline Speckmann, Mario Nadj, Christian Janiesch |
阅读更多来源: ArXiv AI | 29-06-25
Microsoft’s Braga AI chip faces six-month delay, trails Nvidia’s Blackwell
阅读更多来源: The Decoder | 29-06-25
OpenAI renting Google TPUs sends a strong warning shot to Microsoft
阅读更多来源: The Decoder | 29-06-25
Meta CTO confirms massive offers for top AI executives
阅读更多来源: The Decoder | 29-06-25
Show HN: AGL, a toy language that compiles to Go (github.com/alaingilbert)
阅读更多来源: Hacker News | 29-06-25
LLMs bring new nature of abstraction – up and sideways (martinfowler.com)
阅读更多来源: Hacker News | 28-06-25
Facebook is starting to feed its AI with private, unpublished photos (theverge.com)
阅读更多来源: Hacker News | 28-06-25
SymbolicAI: A neuro-symbolic perspective on LLMs (github.com/extensityai)
阅读更多来源: Hacker News | 28-06-25
Lossless LLM 3x Throughput Increase by LMCache (github.com/lmcache)
阅读更多来源: Hacker News | 28-06-25
AlphaGenome: AI for Better Understanding the Genome (deepmind.google)
阅读更多来源: Hacker News | 28-06-25
Google launches Gemma 3n, a multimodal AI model built for real-time use on mobile devices
阅读更多来源: The Decoder | 28-06-25
Project Vend: Can Claude run a small shop? (And why does that matter?) (anthropic.com)
阅读更多来源: Hacker News | 28-06-25
Theoretical Analysis of Positional Encodings in Transformer Models (arxiv.org)
阅读更多来源: Hacker News | 28-06-25
Spark AI (YC W24) is hiring a full-stack engineer in SF (founding team) (ycombinator.com)
阅读更多来源: Hacker News | 28-06-25
Microsoft is reportedly barred from building its own AGI until 2030 under its contract with OpenAI
阅读更多来源: The Decoder | 27-06-25
Meta poaches three top AI researchers from OpenAI, who had poached them from Deepmind
阅读更多来源: The Decoder | 27-06-25
Show HN: Magnitude – Open-source AI browser automation framework (github.com/magnitudedev)
阅读更多来源: Hacker News | 27-06-25
Launch HN: Issen (YC F24) – Personal AI language tutor
阅读更多来源: Hacker News | 27-06-25
What did former CTO Mira Murati see at OpenAI that made her choose custom models over AGI?
阅读更多来源: The Decoder | 27-06-25
Show HN: I built an AI dataset generator (github.com/metabase)
阅读更多来源: Hacker News | 27-06-25
Researchers train AI to generate long-form text using only reinforcement learning
阅读更多来源: The Decoder | 26-06-25
Google Deepmind makes robots independent of the cloud with Gemini On-Device
阅读更多来源: The Decoder | 26-06-25
Anthropic won a fair use hearing that could end up being a defeat
阅读更多来源: The Decoder | 26-06-25
Google releases open-source Gemini CLI to bring Gemini AI into developer workflows
阅读更多来源: The Decoder | 26-06-25
Automatic Demonstration Selection for LLM-based Tabular Data Classification
Authors: Shuchu Han, Wolfgang Bruckner |
阅读更多来源: ArXiv AI | 26-06-25
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models
Authors: Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi |
阅读更多来源: ArXiv AI | 26-06-25
WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads
Authors: Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang |
阅读更多来源: ArXiv AI | 26-06-25
Large Language Model-Driven Code Compliance Checking in Building Information Modeling
Authors: Soumya Madireddy, Lu Gao, Zia Din, Kinam Kim, Ahmed Senouci, Zhe Han, Yunpeng Zhang |
阅读更多来源: ArXiv AI | 26-06-25
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Authors: Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker |
阅读更多来源: ArXiv AI | 26-06-25
AI in the Writing Process: How Purposeful AI Support Fosters Student Writing
Authors: Momin N. Siddiqui, Roy Pea, Hari Subramonyam |
阅读更多来源: ArXiv AI | 26-06-25
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman |
阅读更多来源: ArXiv AI | 26-06-25
Define-ML: An Approach to Ideate Machine Learning-Enabled Systems
Authors: Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao, Hélio Lopes, Marcos Kalinowski |
阅读更多来源: ArXiv AI | 26-06-25
Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
Authors: Saloni Dash, Amélie Reymond, Emma S. Spiro, Aylin Caliskan |
阅读更多来源: ArXiv AI | 26-06-25
Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models
Authors: Zechun Deng, Ziwei Liu, Ziqian Bi, Junhao Song, Chia Xin Liang, Joe Yeong, Junfeng Hao |
阅读更多来源: ArXiv AI | 26-06-25
Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks
Authors: Konstantinos Vrettos, Michail E. Klontzas |
阅读更多来源: ArXiv AI | 26-06-25
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges
Authors: Abdul Basit, Minghao Shao, Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique |
阅读更多来源: ArXiv AI | 26-06-25
Enterprise Large Language Model Evaluation Benchmark
Authors: Liya Wang, David Yi, Damien Jose, John Passarelli, James Gao, Jordan Leventis, Kang Li |
阅读更多来源: ArXiv AI | 26-06-25
DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction
Authors: Weijieying Ren, Tianxiang Zhao, Lei Wang, Tianchun Wang, Vasant Honavar |
阅读更多来源: ArXiv AI | 26-06-25
Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation
Authors: Jinchun Du, Bojie Shen, Muhammad Aamir Cheema, Adel N. Toosi |
阅读更多来源: ArXiv AI | 26-06-25
Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios
Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu |
阅读更多来源: ArXiv AI | 26-06-25
CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam |
阅读更多来源: ArXiv AI | 26-06-25
Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges
Authors: Alexander D. Kalian, Jaewook Lee, Stefan P. Johannesson, Lennart Otte, Christer Hogstrand, Miao Guo |
阅读更多来源: ArXiv AI | 26-06-25
Towards Community-Driven Agents for Machine Learning Engineering
Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang |
阅读更多来源: ArXiv AI | 26-06-25
LLM code generation may lead to an erosion of trust (jaysthoughts.com)
阅读更多来源: Hacker News | 26-06-25
Define policy forbidding use of AI code generators (github.com/qemu)
阅读更多来源: Hacker News | 26-06-25
Build and Host AI-Powered Apps with Claude – No Deployment Needed (anthropic.com)
阅读更多来源: Hacker News | 26-06-25
Structured Output with LangChain and Llamafile (brakmic.com)
阅读更多来源: Hacker News | 26-06-25
OpenAI charges by the minute, so speed up your audio (mand.is)
阅读更多来源: Hacker News | 26-06-25
Learnings from Building AI Agents (cubic.dev)
阅读更多来源: Hacker News | 26-06-25
Gemini CLI (blog.google)
阅读更多来源: Hacker News | 26-06-25
Google hands off Agent2Agent protocol to Linux Foundation for open AI agent standard
阅读更多来源: The Decoder | 26-06-25
LLM Hallucinations in Practical Code Generation (acm.org)
阅读更多来源: Hacker News | 26-06-25
FurtherAI (YC W24) Is Hiring for Software and AI Roles (ycombinator.com)
阅读更多来源: Hacker News | 26-06-25
Disney is in talks with OpenAI about possible partnerships involving its characters
阅读更多来源: The Decoder | 25-06-25
Microsoft has introduced an AI agent to the Windows Settings menu
阅读更多来源: The Decoder | 25-06-25
AI job postings on LinkedIn grew sixfold as AI skill additions to profiles soared twentyfold
阅读更多来源: The Decoder | 25-06-25
African and South American countries are almost entirely excluded from global AI development
阅读更多来源: The Decoder | 25-06-25
ChatGPT's enterprise success against Copilot fuels OpenAI/Microsoft rivalry (bloomberg.com)
阅读更多来源: Hacker News | 25-06-25
Thoughts on Asunción, Paraguay (cpsi.media)
阅读更多来源: Hacker News | 25-06-25
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao |
阅读更多来源: ArXiv AI | 25-06-25
Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis
Authors: Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy |
阅读更多来源: ArXiv AI | 25-06-25
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |
阅读更多来源: ArXiv AI | 25-06-25
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai |
阅读更多来源: ArXiv AI | 25-06-25
Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience
Authors: Lingyu Yang (Shanghai Jiao Tong University) |
阅读更多来源: ArXiv AI | 25-06-25
A standard transformer and attention with linear biases for molecular conformer generation
Authors: Viatcheslav Gurev, Timothy Rumbell |
阅读更多来源: ArXiv AI | 25-06-25
Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach
Authors: Feiting Yang, Antoine Moevus, Steve Lévesque |
阅读更多来源: ArXiv AI | 25-06-25
RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1
Authors: Yu Xie, Xingkai Ren, Ying Qi, Yao Hu, Lianlei Shan |
阅读更多来源: ArXiv AI | 25-06-25
Spiritual-LLM: Gita Inspired Mental Health Therapy In the Era of LLMs
Authors: Janak Kapuriya, Aman Singh, Jainendra Shukla, Rajiv Ratn Shah |
阅读更多来源: ArXiv AI | 25-06-25
Baba is LLM: Reasoning in a Game with Dynamic Rules
Authors: Fien van Wetten, Aske Plaat, Max van Duijn |
阅读更多来源: ArXiv AI | 25-06-25
Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics
Authors: Ziqi Zhu, Tao Hu, Honglong Zhang, Dan Yang, HanGeng Chen, Mengran Zhang, Xilun Chen |
阅读更多来源: ArXiv AI | 25-06-25
FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring
Authors: Hyein Seo, Taewook Hwang, Yohan Lee, Sangkeun Jung |
阅读更多来源: ArXiv AI | 25-06-25
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Authors: Liang Zeng, Yongcong Li, Yuzhen Xiao, Changshi Li, Chris Yuhao Liu, Rui Yan, Tianwen Wei, Jujie He, Xuchen Song, Yang Liu, Yahui Zhou |
阅读更多来源: ArXiv AI | 25-06-25
Interpretable Hybrid Machine Learning Models Using FOLD-R++ and Answer Set Programming
Authors: Sanne Wielinga, Jesse Heyninck |
阅读更多来源: ArXiv AI | 25-06-25
NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons
Authors: Carlo Romeo, Andrew D. Bagdanov |
阅读更多来源: ArXiv AI | 25-06-25
KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models
Authors: Cheng Li, Jiexiong Liu, Yixuan Chen, Qihang Zhou, KunLun Meta |
阅读更多来源: ArXiv AI | 25-06-25
From memories to maps: Mechanisms of in context reinforcement learning in transformers
Authors: Ching Fang, Kanaka Rajan |
阅读更多来源: ArXiv AI | 25-06-25
LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis
Authors: Lei Kang, Xuanshuo Fu, Oriol Ramos Terrades, Javier Vazquez-Corral, Ernest Valveny, Dimosthenis Karatzas |
阅读更多来源: ArXiv AI | 25-06-25
Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning
Authors: Guo Li, Zixiang Xu, Wei Zhang, Yikuan Hu, Xinyu Yang, Nikolay Aristov, Mingjie Tang, Elenna R Dugundji |
阅读更多来源: ArXiv AI | 25-06-25
JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning
Authors: Ai Han, Junxing Hu, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zicheng Zhang |
阅读更多来源: ArXiv AI | 25-06-25
Gemini Robotics On-Device brings AI to local robotic devices (deepmind.google)
阅读更多来源: Hacker News | 25-06-25
Mapping LLMs over excel saved my passion for game dev (weblog.lol)
阅读更多来源: Hacker News | 25-06-25
Blackmail becomes go-to strategy for AI models facing shutdown in new Anthropic tests
阅读更多来源: The Decoder | 24-06-25
'Dragon prince' dinosaur discovery 'rewrites' T.rex family tree (bbc.com)
阅读更多来源: Hacker News | 24-06-25
From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases
Authors: Yao Zhang, Zaixi Shang, Silpan Patel, Mikel Zuniga |
阅读更多来源: ArXiv AI | 24-06-25
OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections
Authors: Manasa Bharadwaj, Nikhil Verma, Kevin Ferreira |
阅读更多来源: ArXiv AI | 24-06-25
Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation
Authors: Hao Guan, David Bates, Li Zhou |
阅读更多来源: ArXiv AI | 24-06-25
Resource Rational Contractualism Should Guide AI Alignment
Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel |
阅读更多来源: ArXiv AI | 24-06-25
Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges
Authors: Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Shuai Wang |
阅读更多来源: ArXiv AI | 24-06-25
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Authors: Bowen Wang |
阅读更多来源: ArXiv AI | 24-06-25
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Authors: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra |
阅读更多来源: ArXiv AI | 24-06-25
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu |
阅读更多来源: ArXiv AI | 24-06-25
Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms
Authors: Cheng Ji, Huaiying Luo |
阅读更多来源: ArXiv AI | 24-06-25
A Conceptual Framework for AI Capability Evaluations
Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, María Vanina Martinez, Gerardo Simari |
阅读更多来源: ArXiv AI | 24-06-25
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance
Authors: Yu Han, Aaron Ceross, Jeroen H.M. Bergmann |
阅读更多来源: ArXiv AI | 24-06-25
How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
Authors: Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao |
阅读更多来源: ArXiv AI | 24-06-25
A Large Language Model-based Multi-Agent Framework for Analog Circuits' Sizing Relationships Extraction
Authors: Chengjie Liu, Weiyu Chen, Huiyao Xu, Yuan Du, Jun Yang, Li Du |
阅读更多来源: ArXiv AI | 24-06-25
T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent
Authors: Hong Qing Yu |
阅读更多来源: ArXiv AI | 24-06-25
A Question Bank to Assess AI Inclusivity: Mapping out the Journey from Diversity Errors to Inclusion Excellence
Authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano |
阅读更多来源: ArXiv AI | 24-06-25
AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs
Authors: Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko |
阅读更多来源: ArXiv AI | 24-06-25
TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation
Authors: Kamil Szczepanik, Jarosław A. Chudziak |
阅读更多来源: ArXiv AI | 24-06-25
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktäschel, Jakob Foerster, Laura Ruis |
阅读更多来源: ArXiv AI | 24-06-25
Steering Conceptual Bias via Transformer Latent-Subspace Activation
Authors: Vansh Sharma, Venkat Raman |
阅读更多来源: ArXiv AI | 24-06-25
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang, Nan Wang, Han Xiao |
阅读更多来源: ArXiv AI | 24-06-25
Show HN: Pickaxe – A TypeScript library for building AI agents (github.com/hatchet-dev)
阅读更多来源: Hacker News | 24-06-25
Judge denies creating “mass surveillance program” harming all ChatGPT users (arstechnica.com)
阅读更多来源: Hacker News | 24-06-25
GitHub CEO: manual coding remains key despite AI boom (techinasia.com)
阅读更多来源: Hacker News | 24-06-25
Sakana AI's ALE AI agent cracks the top 21 among 1,000 code experts
阅读更多来源: The Decoder | 23-06-25
Apple executives have held internal discussions about potentially bidding for AI startup Perplexity
阅读更多来源: The Decoder | 23-06-25
Nano-Vllm: lightweight vLLM implementation built from scratch (github.com/geeeekexplorer)
阅读更多来源: Hacker News | 23-06-25
Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone
阅读更多来源: Hacker News | 23-06-25
Claude Code for VSCode (visualstudio.com)
阅读更多来源: Hacker News | 23-06-25
Facial Landmark Visualization and Emotion Recognition Through Neural Networks
Authors: Israel Juárez-Jiménez, Tiffany Guadalupe Martínez Paredes, Jesús García-Ramírez, Eric Ramos Aguilar |
阅读更多来源: ArXiv AI | 23-06-25
Towards AI Search Paradigm
Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin |
阅读更多来源: ArXiv AI | 23-06-25
Continual Learning with Columnar Spiking Neural Networks
Authors: Denis Larionov, Nikolay Bazenkov, Mikhail Kiselev |
阅读更多来源: ArXiv AI | 23-06-25
LLMs Struggle to Perform Counterfactual Reasoning with Parametric Knowledge
Authors: Khurram Yamin, Gaurav Ghosal, Bryan Wilder |
阅读更多来源: ArXiv AI | 23-06-25
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning
Authors: Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He |
阅读更多来源: ArXiv AI | 23-06-25
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
Authors: Matias Martinez, Xavier Franch |
阅读更多来源: ArXiv AI | 23-06-25
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar |
阅读更多来源: ArXiv AI | 23-06-25
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
Authors: Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton |
阅读更多来源: ArXiv AI | 23-06-25
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Authors: William Sharpless, Dylan Hirsch, Sander Tonkens, Nikhil Shinde, Sylvia Herbert |
阅读更多来源: ArXiv AI | 23-06-25
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
Authors: Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova |
阅读更多来源: ArXiv AI | 23-06-25
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior
Authors: Hao Li, Gengrui Zhang, Petter Holme, Shuyue Hu, Zhen Wang |
阅读更多来源: ArXiv AI | 23-06-25
Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
Authors: Mustafa Akben, Aaron Satko |
阅读更多来源: ArXiv AI | 23-06-25
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
Authors: Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo |
阅读更多来源: ArXiv AI | 23-06-25
The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making
Authors: Abinitha Gourabathina, Yuexing Hao, Walter Gerych, Marzyeh Ghassemi |
阅读更多来源: ArXiv AI | 23-06-25
LAION and Intel introduce tools that help AI gauge the intensity of 40 distinct emotions
阅读更多来源: The Decoder | 22-06-25
Phoenix.new – Remote AI Runtime for Phoenix (fly.io)
阅读更多来源: Hacker News | 22-06-25
Remote MCP Support in Claude Code (anthropic.com)
阅读更多来源: Hacker News | 22-06-25
Uncovering Intention through LLM-Driven Code Snippet Description Generation
Authors: Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto |
阅读更多来源: ArXiv AI | 22-06-25
RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation
Authors: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia |
阅读更多来源: ArXiv AI | 22-06-25
Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain: A CoT-Enhanced Prompt Engineering Approach
Authors: Wenqi Guan, Yang Fang |
阅读更多来源: ArXiv AI | 22-06-25
Over-squashing in Spatiotemporal Graph Neural Networks
Authors: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein |
阅读更多来源: ArXiv AI | 22-06-25
Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates
Authors: Danish Gufran, Sudeep Pasricha |
阅读更多来源: ArXiv AI | 22-06-25
The Compositional Architecture of Regret in Large Language Models
Authors: Xiangxiang Cui, Shu Yang, Tianjin Huang, Wanyu Lin, Lijie Hu, Di Wang |
阅读更多来源: ArXiv AI | 22-06-25
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Authors: Gabrel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong |
阅读更多来源: ArXiv AI | 22-06-25
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Authors: Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe |
阅读更多来源: ArXiv AI | 22-06-25
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu |
阅读更多来源: ArXiv AI | 22-06-25
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Authors: Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian |
阅读更多来源: ArXiv AI | 22-06-25
Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents
Authors: Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer |
阅读更多来源: ArXiv AI | 22-06-25
The AI Policy Module: Developing Computer Science Student Competency in AI Ethics and Policy
Authors: James Weichert, Daniel Dunlap, Mohammed Farghally, Hoda Eldardiry |
阅读更多来源: ArXiv AI | 22-06-25
The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games
Authors: Lyle Goodyear, Rachel Guo, Ramesh Johari |
阅读更多来源: ArXiv AI | 22-06-25
Meta CEO Mark Zuckerberg bets billions not to fall behind in the AI race
阅读更多来源: The Decoder | 22-06-25
Apple's "Illusion of Thinking" paper shows experts deeply divided on AI reasoning
阅读更多来源: The Decoder | 21-06-25
Agentic Misalignment: How LLMs could be insider threats (anthropic.com)
阅读更多来源: Hacker News | 21-06-25
Midjourney launches its first video model, letting users turn images into short animated clips
阅读更多来源: The Decoder | 21-06-25
Jürgen Schmidhuber: the Father of Generative AI Without Turing Award (jazzyear.com)
阅读更多来源: Hacker News | 21-06-25
I Built a Celebrity AI Image Generator (No Registration Needed) – Would Love Feedback (aicelebrity.design)
阅读更多来源: Hacker News | 21-06-25
OpenAI CEO Sam Altman says GPT-5 is "probably coming sometime this summer"
阅读更多来源: The Decoder | 20-06-25
Andrej Karpathy: Software in the era of AI [video] (youtube.com)
阅读更多来源: Hacker News | 20-06-25
Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com)
阅读更多来源: Hacker News | 20-06-25
Gemini 2.5 Flash-Lite is the fastest and most cost-effective model in Google's Gemini lineup
阅读更多来源: The Decoder | 20-06-25
Show HN: Claude Code Usage Monitor – real-time tracker to dodge usage cut-offs (github.com/maciek-roboblog)
阅读更多来源: Hacker News | 20-06-25
How OpenElections uses LLMs (thescoop.org)
阅读更多来源: Hacker News | 20-06-25
MiniMax-M1 comes close to Gemini 2.5 Pro efficiency when handling large context windows
阅读更多来源: The Decoder | 19-06-25
From LLM to AI Agent: What's the Real Journey Behind AI System Development? (codelink.io)
阅读更多来源: Hacker News | 19-06-25
Luxembourg partners with Mistral AI to bring artificial intelligence to government and defense
阅读更多来源: The Decoder | 19-06-25
OpenAI and Microsoft increasingly mistrust each other as tensions rise over contracts and profits
阅读更多来源: The Decoder | 19-06-25
Is there a half-life for the success rates of AI agents? (tobyord.com)
阅读更多来源: Hacker News | 19-06-25
Math genius Terence Tao says that AI still can't "smell" bad math
阅读更多来源: The Decoder | 18-06-25
OpenAI’s Defense Department deal targets healthcare, data analysis, and cyber defense
阅读更多来源: The Decoder | 18-06-25
Time Series Forecasting with Graph Transformers (kumo.ai)
阅读更多来源: Hacker News | 18-06-25
LLMs pose an interesting problem for DSL designers (kirancodes.me)
阅读更多来源: Hacker News | 18-06-25
Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite (blog.google)
阅读更多来源: Hacker News | 18-06-25
Building Effective AI Agents (anthropic.com)
阅读更多来源: Hacker News | 18-06-25
I counted all of the yurts in Mongolia using machine learning (monroeclinton.com)
阅读更多来源: Hacker News | 18-06-25
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Authors: Ring Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, Shaomian Zheng, Shuaicheng Li, Tongkai Yang, Wang Ren, Xiaodong Yan, Xiaopei Wan, Xiaoyun Feng, Xin Zhao, Xinxing Yang, Xinyu Kong, Xuemin Yang, Yang Li, Yingting Wu, Yongkang Liu, Zhankai Xu, Zhenduo Zhang, Zhenglei Zhou, Zhenyu Huang, Zhiqiang Zhang, Zihao Wang, Zujie Wen |
阅读更多来源: ArXiv AI | 18-06-25
Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values
Authors: Nell Watson, Ahmed Amer, Evan Harris, Preeti Ravindra, Shujun Zhang |
阅读更多来源: ArXiv AI | 18-06-25
The NordDRG AI Benchmark for Large Language Models
Authors: Tapio Pitkäranta |
阅读更多来源: ArXiv AI | 18-06-25
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution
Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Jilles S. Dibangoye |
阅读更多来源: ArXiv AI | 18-06-25
Causality in the human niche: lessons for machine learning
Authors: Richard D. Lange, Konrad P. Kording |
阅读更多来源: ArXiv AI | 18-06-25
Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features
Authors: Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino |
阅读更多来源: ArXiv AI | 18-06-25
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning
Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi |
阅读更多来源: ArXiv AI | 18-06-25
Machine Mirages: Defining the Undefined
Authors: Hamidou Tembine |
阅读更多来源: ArXiv AI | 18-06-25
ProfiLLM: An LLM-Based Framework for Implicit Profiling of Chatbot Users
Authors: Shahaf David, Yair Meidan, Ido Hersko, Daniel Varnovitzky, Dudu Mimran, Yuval Elovici, Asaf Shabtai |
阅读更多来源: ArXiv AI | 18-06-25
Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places
Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Ilya Ilyankou, Junyuan Liu, Lu Yin, Weiming Huang, Natchapon Jongwiriyanurak |
阅读更多来源: ArXiv AI | 18-06-25
Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary |
Source: ArXiv AI | 18-06-25
Lightweight Relevance Grader in RAG
Authors: Taehee Jeong |
Source: ArXiv AI | 18-06-25
From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
Authors: Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma |
Source: ArXiv AI | 18-06-25
Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?
Authors: Louis Vervoort, Vitaly Nikolaev |
Source: ArXiv AI | 18-06-25
Don't throw the baby out with the bathwater: How and why deep learning for ARC
Authors: Jack Cole, Mohamed Osman |
Source: ArXiv AI | 18-06-25
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang |
Source: ArXiv AI | 18-06-25
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
Authors: William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane |
Source: ArXiv AI | 18-06-25
AviationLLM: An LLM-based Knowledge System for Aviation Training
Authors: Jia'ang Wan, Feng Shen, Fujuan Li, Yanjin Sun, Yan Li, Shiwen Zhang |
Source: ArXiv AI | 18-06-25
ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems
Authors: Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li |
Source: ArXiv AI | 18-06-25
LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?
Authors: Muhammad Atta Ur Rahman, Melanie Schranz |
Source: ArXiv AI | 18-06-25
Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
Authors: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son |
Source: ArXiv AI | 18-06-25
Enhancing Symbolic Machine Learning by Subsymbolic Representations
Authors: Stephen Roth, Lennart Baur, Derian Boer, Stefan Kramer |
Source: ArXiv AI | 18-06-25
New study supports Apple's doubts about AI reasoning, but sees no dead end
Source: The Decoder | 18-06-25
Salesforce's CRM benchmark finds AI agents struggle in real-world business scenarios
Source: The Decoder | 17-06-25
New York may soon require AI giants to publish safety protocols before releasing LLMs
Source: The Decoder | 17-06-25
Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence
Authors: Zeki Doruk Erden, Boi Faltings |
Source: ArXiv AI | 17-06-25
Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
Authors: LeCheng Zhang, Yuanshi Wang, Haotian Shen, Xujie Wang |
Source: ArXiv AI | 17-06-25
Constitutive Components for Human-Like Autonomous Artificial Intelligence
Authors: Kazunori D Yamada |
Source: ArXiv AI | 17-06-25
Scaling Test-time Compute for LLM Agents
Authors: King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou |
Source: ArXiv AI | 17-06-25
Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning
Authors: Danny Hoang, David Gorsich, Matthew P. Castanier, Farhad Imani |
Source: ArXiv AI | 17-06-25
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Authors: Ethan M. Rudd, Christopher Andrews, Philip Tully |
Source: ArXiv AI | 17-06-25
Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs
Authors: Daniel Kilov, Caroline Hendy, Secil Yanik Guyot, Aaron J. Snoswell, Seth Lazar |
Source: ArXiv AI | 17-06-25
NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification
Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha |
Source: ArXiv AI | 17-06-25
Machine Learning as Iterated Belief Change a la Darwiche and Pearl
Authors: Theofanis Aravanis |
Source: ArXiv AI | 17-06-25
Probabilistic Modeling of Spiking Neural Networks with Contract-Based Verification
Authors: Zhen Yao, Elisabetta De Maria, Robert De Simone |
Source: ArXiv AI | 17-06-25
Towards Pervasive Distributed Agentic Generative AI -- A State of The Art
Authors: Gianni Molinari, Fabio Ciravegna |
Source: ArXiv AI | 17-06-25
Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks
Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang |
Source: ArXiv AI | 17-06-25
Vector Ontologies as an LLM world view extraction method
Authors: Kaspar Rothenfusser, Bekk Blando |
Source: ArXiv AI | 17-06-25
A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs
Authors: Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai |
Source: ArXiv AI | 17-06-25
Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
Authors: Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street |
Source: ArXiv AI | 17-06-25
Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses
Authors: Leonie V.D.E. Vogelsmeier, Eduardo Oliveira, Kamila Misiejuk, Sonsoles López-Pernas, Mohammed Saqr |
Source: ArXiv AI | 17-06-25
From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care
Authors: Daniel Anadria, Roel Dobbe, Anastasia Giachanou, Ruurd Kuiper, Richard Bartels, Íñigo Martínez de Rituerto de Troya, Carmen Zürcher, Daniel Oberski |
Source: ArXiv AI | 17-06-25
Generative AI coding tools and agents do not work for me (miguelgrinberg.com)
Source: Hacker News | 17-06-25
OpenAI wins $200M U.S. defense contract (cnbc.com)
Source: Hacker News | 17-06-25
Rednote releases its first open-source LLM with a Mixture-of-Experts architecture
Source: The Decoder | 17-06-25
Anthropic shares blueprint for Claude Research agent using multiple AI agents in parallel
Source: The Decoder | 17-06-25
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons (arxiv.org)
Source: Hacker News | 17-06-25
ZjsComponent: A Pragmatic Approach to Reusable UI Fragments for Web Development (arxiv.org)
Source: Hacker News | 17-06-25
Snorting the AGI with Claude Code (kadekillary.work)
Source: Hacker News | 17-06-25
OpenAI updates ChatGPT search with smarter answers and image search
Source: The Decoder | 16-06-25
Chemical knowledge and reasoning of large language models vs. chemist expertise (nature.com)
Source: Hacker News | 16-06-25
LLM Chat via SSH (github.com/ccbikai)
Source: Hacker News | 16-06-25
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Authors: Maximilian Kreutner, Marlene Lutz, Markus Strohmaier |
Source: ArXiv AI | 16-06-25
TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
Authors: Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan |
Source: ArXiv AI | 16-06-25
An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing
Authors: Haochen Sun, Yifan Liu, Ahmed Al-Tahmeesschi, Swarna Chetty, Syed Ali Raza Zaidi, Avishek Nag, Hamed Ahmadi |
Source: ArXiv AI | 16-06-25
How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?
Authors: Michela Lapenna, Caterina De Bacco |
Source: ArXiv AI | 16-06-25
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Authors: Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie |
Source: ArXiv AI | 16-06-25
Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference
Authors: M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo |
Source: ArXiv AI | 16-06-25
Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?
Authors: Noemi Dreksler, Lucius Caviola, David Chalmers, Carter Allen, Alex Rand, Joshua Lewis, Philip Waggoner, Kate Mays, Jeff Sebo |
Source: ArXiv AI | 16-06-25
Improving Large Language Model Safety with Contrastive Representation Learning
Authors: Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin |
Source: ArXiv AI | 16-06-25
code_transformed: The Influence of Large Language Models on Code
Authors: Yuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan, Xuanhua Shi, Dongping Chen |
Source: ArXiv AI | 16-06-25
Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?
Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang |
Source: ArXiv AI | 16-06-25
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Authors: Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang |
Source: ArXiv AI | 16-06-25
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables
Authors: Yitong Zhou, Mingyue Cheng, Qingyang Mao, Yucong Luo, Qi Liu, Yupeng Li, Xiaohan Zhang, Deguang Liu, Xin Li, Enhong Chen |
Source: ArXiv AI | 16-06-25
LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic
Authors: Weibing Zheng, Laurah Turner, Jess Kropczynski, Murat Ozer, Tri Nguyen, Shane Halse |
Source: ArXiv AI | 16-06-25
Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning
Authors: Liying Wang, Ph.D., Daffodil Carrington, M.S., Daniil Filienko, M.S., Caroline El Jazmi, M.S., Serena Jinchen Xie, M.S., Martine De Cock, Ph.D., Sarah Iribarren, Ph.D., Weichao Yuwen, Ph.D |
Source: ArXiv AI | 16-06-25
RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
Authors: Yu Wang, Shiwan Zhao, Ming Fan, Zhihu Wang, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu |
Source: ArXiv AI | 16-06-25
Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding
Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui |
Source: ArXiv AI | 16-06-25
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
Authors: René Peinl, Vincent Tischler |
Source: ArXiv AI | 16-06-25
Collaborative LLM Inference via Planning for Efficient Reasoning
Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin |
Source: ArXiv AI | 16-06-25
On the Performance of LLMs for Real Estate Appraisal
Authors: Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt |
Source: ArXiv AI | 16-06-25
Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment
Authors: Alejandro Peña, Julian Fierrez, Aythami Morales, Gonzalo Mancera, Miguel Lopez, Ruben Tolosana |
Source: ArXiv AI | 16-06-25
Revealing Political Bias in LLMs through Structured Multi-Agent Debate
Authors: Aishwarya Bandaru, Fabian Bindley, Trevor Bluth, Nandini Chavda, Baixu Chen, Ethan Law |
Source: ArXiv AI | 16-06-25
Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
Authors: Claudio Fanconi, Mihaela van der Schaar |
Source: ArXiv AI | 16-06-25
Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making
Authors: Xiaopeng Yuan, Xingjian Zhang, Ke Xu, Yifan Xu, Lijun Yu, Jindong Wang, Yushun Dong, Haohan Wang |
Source: ArXiv AI | 16-06-25
The z80 technique reveals the source code for Atlassian's 'rovo' AI assistant (ghuntley.com)
Source: Hacker News | 16-06-25
Let's Talk About ChatGPT-Induced Spiritual Psychosis (default.blog)
Source: Hacker News | 16-06-25
Rabbit launches "intern," a software AI agent designed to handle team-level projects
Source: The Decoder | 15-06-25
Apple's new AI benchmarks show its models still lag behind leaders like OpenAI and Google
Source: The Decoder | 15-06-25
Slimming Down LLMs Without Losing Their Minds
Authors: Qingda (Michael) Mai |
Source: ArXiv AI | 15-06-25
BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP
Authors: Thomas Sounack, Joshua Davis, Brigitte Durieux, Antoine Chaffin, Tom J. Pollard, Eric Lehman, Alistair E. W. Johnson, Matthew McDermott, Tristan Naumann, Charlotta Lindvall |
Source: ArXiv AI | 15-06-25
The Role of Generative AI in Facilitating Social Interactions: A Scoping Review
Authors: T. T. J. E. Arets, G. Perugia, M. Houben, W.A. IJsselsteijn |
Source: ArXiv AI | 15-06-25
Robustly Improving LLM Fairness in Realistic Settings via Interpretability
Authors: Adam Karvonen, Samuel Marks |
Source: ArXiv AI | 15-06-25
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
Authors: Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He |
Source: ArXiv AI | 15-06-25
GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
Authors: Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami |
Source: ArXiv AI | 15-06-25
Farseer: A Refined Scaling Law in Large Language Models
Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang |
Source: ArXiv AI | 15-06-25
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Authors: Qizhe Zhang, Mengzhen Liu, Lichen Li, Ming Lu, Yuan Zhang, Junwen Pan, Qi She, Shanghang Zhang |
Source: ArXiv AI | 15-06-25
One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence
Authors: Michelle M. Li, Ben Y. Reis, Adam Rodman, Tianxi Cai, Noa Dagan, Ran D. Balicer, Joseph Loscalzo, Isaac S. Kohane, Marinka Zitnik |
Source: ArXiv AI | 15-06-25
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models
Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang |
Source: ArXiv AI | 15-06-25
Closer to Language than Steam: AI as the Cognitive Engine of a New Productivity Revolution
Authors: Xinmin Fang, Lingfeng Tao, Zhengxiong Li |
Source: ArXiv AI | 15-06-25
Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills
Authors: Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, Liqiang Nie |
Source: ArXiv AI | 15-06-25
Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
Authors: Jintao Liang, Gang Su, Huifeng Lin, You Wu, Rui Zhao, Ziyue Li |
Source: ArXiv AI | 15-06-25
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning
Authors: Mohd Anwar Jamal Faiz |
Source: ArXiv AI | 15-06-25
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
Authors: Yanan Cai, Ahmed Salem, Besmira Nushi, Mark Russinovich |
Source: ArXiv AI | 15-06-25
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, Wenlong Zhang, Lei Bai |
Source: ArXiv AI | 15-06-25
Automated Validation of Textual Constraints Against AutomationML via LLMs and SHACL
Authors: Tom Westermann, Aljosha Köcher, Felix Gehlhoff |
Source: ArXiv AI | 15-06-25
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
Authors: Vincenzo Colle, Mohamed Sana, Nicola Piovesan, Antonio De Domenico, Fadhel Ayed, Merouane Debbah |
Source: ArXiv AI | 15-06-25
A Study on Individual Spatiotemporal Activity Generation Method Using MCP-Enhanced Chain-of-Thought Large Language Models
Authors: Yu Zhang, Yang Hu, De Wang |
Source: ArXiv AI | 15-06-25
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Authors: Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen |
Source: ArXiv AI | 15-06-25
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?
Authors: Fei Lin, Ziyang Gong, Cong Wang, Yonglin Tian, Tengchao Zhang, Xue Yang, Gen Luo, Fei-Yue Wang |
Source: ArXiv AI | 15-06-25
AMD's AI Future Is Rack Scale 'Helios' (morethanmoore.substack.com)
Source: Hacker News | 15-06-25
I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (github.com/yousef-rafat)
Source: Hacker News | 15-06-25
Text-to-LoRA: Hypernetwork that generates task-specific LLM adapters (LoRAs) (github.com/sakanaai)
Source: Hacker News | 15-06-25
RAG Is a Fancy, Lying Search Engine (stardog.ai)
Source: Hacker News | 15-06-25
Clinical knowledge in LLMs does not translate to human interactions (arxiv.org)
Source: Hacker News | 15-06-25
I used ChatGPT to learn programming from zero and built a video generation SaaS (vidmakerpro.com)
Source: Hacker News | 15-06-25
Mechanize is building digital offices to train AI agents to fully automate computer work
Source: The Decoder | 15-06-25
The Army’s Newest Recruits: Tech Execs From Meta, OpenAI and More (wsj.com)
Source: Hacker News | 14-06-25
Student discovers fungus predicted by Albert Hoffman (wvu.edu)
Source: Hacker News | 14-06-25
Saab achieves AI milestone with Gripen E (saab.com)
Source: Hacker News | 14-06-25
Meta launches AI video editing but holds back on full features for now
Source: The Decoder | 14-06-25
Mattel partners with OpenAI to develop AI-powered toys and experiences
Source: The Decoder | 14-06-25
Meta's latest model highlights the challenge AI faces in long-term planning and causal reasoning
Source: The Decoder | 14-06-25
RISC-V in AI and HPC Part 1: Per Aspera Ad Astra? (eetimes.com)
Source: Hacker News | 14-06-25
Meta invests $14.3B in Scale AI to kick-start superintelligence lab (nytimes.com)
Source: Hacker News | 14-06-25
Students fear AI could cause "brain rot" by making it too easy to skip crucial learning steps
Source: The Decoder | 13-06-25
Maximizing Battery Storage Profits via High-Frequency Intraday Trading (arxiv.org)
Source: Hacker News | 13-06-25
Researchers confirm two journalists were hacked with Paragon spyware (techcrunch.com)
Source: Hacker News | 13-06-25
OpenAI's o3-pro may be too smart for small talk
Source: The Decoder | 12-06-25
OpenAI o3-pro (help.openai.com)
Source: Hacker News | 12-06-25
GauntletAI (YC S17): All expenses paid AI training and guaranteed $200k+ job (gauntletai.com)
Source: Hacker News | 12-06-25
Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era
Authors: Shuo Jiang, Min Xie, Frank Youhua Chen, Jian Ma, Jianxi Luo |
Source: ArXiv AI | 12-06-25
Large Language Models for Design Structure Matrix Optimization
Authors: Shuo Jiang, Min Xie, Jianxi Luo |
Source: ArXiv AI | 12-06-25
Guided Graph Compression for Quantum Graph Neural Networks
Authors: Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi |
Source: ArXiv AI | 12-06-25
Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs
Authors: Rodion Oblovatny, Alexandra Bazarova, Alexey Zaytsev |
Source: ArXiv AI | 12-06-25
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
Authors: Seonho Lee, Jiho Choi, Inha Kang, Jiwook Kim, Junsung Park, Hyunjung Shim |
Source: ArXiv AI | 12-06-25
Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice
Authors: Emma Kallina, Thomas Bohné, Jat Singh |
Source: ArXiv AI | 12-06-25
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel |
Source: ArXiv AI | 12-06-25
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
Authors: Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz |
Source: ArXiv AI | 12-06-25
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Authors: Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang |
Source: ArXiv AI | 12-06-25
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Authors: Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin |
Source: ArXiv AI | 12-06-25
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu |
Source: ArXiv AI | 12-06-25
Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making
Authors: Kehan Zheng, Jinfeng Zhou, Hongning Wang |
Source: ArXiv AI | 12-06-25
DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Authors: Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao |
Source: ArXiv AI | 12-06-25
Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives
Authors: Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong |
Source: ArXiv AI | 12-06-25
Fine-tuning LLMs is a waste of time (codinginterviewsmadesimple.substack.com)
Source: Hacker News | 12-06-25
EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot (aim.security)
Source: Hacker News | 12-06-25
OpenAI co-founder Ilya Sutskever believes AI will shape everyone's life "whether you like it or not"
Source: The Decoder | 11-06-25
Meta AI chief scientist LeCun's latest comment reveals deep industry split over the future of AI
Source: The Decoder | 11-06-25
Scientists discover that feeding AI models 10% 4chan trash actually makes them better behaved
Source: The Decoder | 11-06-25
Zuckerberg forms elite AI team to catch up with competitors
Source: The Decoder | 11-06-25
Apple's new Foundation Models framework adds on-device AI to apps with three lines of Swift code
Source: The Decoder | 11-06-25
OpenAI dropped the price of o3 by 80% (twitter.com/sama)
Source: Hacker News | 11-06-25
Low-background Steel: content without AI contamination (jgc.org)
Source: Hacker News | 11-06-25
Launch HN: BitBoard (YC X25) – AI agents for healthcare back-offices
Source: Hacker News | 11-06-25
AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)
Source: Hacker News | 11-06-25
WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis
Authors: Liangliang Chen, Huiru Xie, Jacqueline Rohde, Ying Zhang |
Source: ArXiv AI | 11-06-25
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
Authors: Clara Lachenmaier, Judith Sieker, Sina Zarrieß |
Source: ArXiv AI | 11-06-25
Propositional Logic for Probing Generalization in Neural Networks
Authors: Anna Langedijk, Jaap Jumelet, Willem Zuidema |
Source: ArXiv AI | 11-06-25
Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data
Authors: Victoria Hankemeier, Malte Schilling |
Source: ArXiv AI | 11-06-25
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma |
Source: ArXiv AI | 11-06-25
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
Authors: Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang |
Source: ArXiv AI | 11-06-25
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Authors: Haozhen Zhang, Tao Feng, Jiaxuan You |
Source: ArXiv AI | 11-06-25
LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs
Authors: Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade |
Source: ArXiv AI | 11-06-25
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning
Authors: Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, Mihaela van der Schaar |
Source: ArXiv AI | 11-06-25
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents
Authors: Subhrangshu Nandi, Arghya Datta, Nikhil Vichare, Indranil Bhattacharya, Huzefa Raja, Jing Xu, Shayan Ray, Giuseppe Carenini, Abhi Srivastava, Aaron Chan, Man Ho Woo, Amar Kandola, Brandon Theresa, Francesco Carbone |
Source: ArXiv AI | 11-06-25
Transforming Expert Knowledge into Scalable Ontology via Large Language Models
Authors: Ikkei Itoku, David Theil, Evelyn Eichelsdoerfer Uehara, Sreyoshi Bhaduri, Junnosuke Kuroda, Toshi Yumoto, Alex Gil, Natalie Perez, Rajesh Cherukuri, Naumaan Nayyar |
Source: ArXiv AI | 11-06-25
A Survey on Large Language Models for Mathematical Reasoning
Authors: Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu |
Source: ArXiv AI | 11-06-25
HGFormer: A Hierarchical Graph Transformer Framework for Two-Stage Colonel Blotto Games via Reinforcement Learning
Authors: Yang Lv, Jinlong Lei, Peng Yi |
Source: ArXiv AI | 11-06-25
Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness
Authors: Yanwei Gong, Xiaolin Chang |
Source: ArXiv AI | 11-06-25
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Authors: Kongcheng Zhang, Qi Yao, Shunyu Liu, Yingjie Wang, Baisheng Lai, Jieping Ye, Mingli Song, Dacheng Tao |
Source: ArXiv AI | 11-06-25
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
Authors: Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, Pattie Maes |
Source: ArXiv AI | 11-06-25
Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents
Authors: Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi |
Source: ArXiv AI | 11-06-25
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Authors: Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri, Samuel J. Bell |
Source: ArXiv AI | 11-06-25
ChatGPT's voice is now more natural and can consistently translate conversations in real time
Source: The Decoder | 10-06-25
Google's Gemini 2.5 Pro beats OpenAI's o3 model in processing complex, lengthy texts
Source: The Decoder | 10-06-25
ChatGPT scams range from silly money-making ploys to calculated political meddling
Source: The Decoder | 10-06-25
Boosting LLM Reasoning via Spontaneous Self-Correction
Authors: Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, Chen Zhu |
Source: ArXiv AI | 10-06-25
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth
Authors: Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu |
Source: ArXiv AI | 10-06-25
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang |
Source: ArXiv AI | 10-06-25
Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT
Authors: Miroslav Popovic, Marko Popovic, Miodrag Djukic, Ilija Basicevic |
Source: ArXiv AI | 10-06-25
BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite
Authors: Liyang Chen, Yujun Cai, Jieqiong Dong, Yiwei Wang |
Source: ArXiv AI | 10-06-25
Reasoning Multimodal Large Language Model: Data Contamination and Dynamic Evaluation
Authors: Ming Liu, Wensheng Zhang |
Source: ArXiv AI | 10-06-25
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data
Authors: Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye |
Source: ArXiv AI | 10-06-25
LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments
Authors: Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai |
Source: ArXiv AI | 10-06-25
Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
Authors: Arnau Igualde Sáez, Lamyae Rhomrasi, Yusef Ahsini, Ricardo Vinuesa, Sergio Hoyas, Jose P. García Sabater, Marius J. Fullana i Alfonso, J. Alberto Conejero |
Source: ArXiv AI | 10-06-25
An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning
Authors: Ze Yang, Yihong Jin, Juntian Liu, Xinhe Xu |
Source: ArXiv AI | 10-06-25
Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification
Authors: Liwen Zheng, Chaozhuo Li, Zheng Liu, Feiran Huang, Haoran Jia, Zaisheng Ye, Xi Zhang |
Source: ArXiv AI | 10-06-25
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Authors: Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang |
Source: ArXiv AI | 10-06-25
REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models
Authors: Diego Forniés-Tabuenca, Alejandro Uribe, Urtzi Otamendi, Arkaitz Artetxe, Juan Carlos Rivera, Oier Lopez de Lacalle |
Source: ArXiv AI | 10-06-25
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Authors: Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua |
Source: ArXiv AI | 10-06-25
Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs
Authors: Yao Yan |
Source: ArXiv AI | 10-06-25
Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark
Authors: Shoko Oka |
Source: ArXiv AI | 10-06-25
Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation
Authors: Christopher Subia-Waud (Rayonlabs Team) |
Source: ArXiv AI | 10-06-25
Solving Inequality Proofs with Large Language Models
Authors: Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu |
Source: ArXiv AI | 10-06-25
Hey, That's My Data! Label-Only Dataset Inference in Large Language Models
Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado |
Source: ArXiv AI | 10-06-25
End-to-End Framework for Robot Lawnmower Coverage Path Planning using Cellular Decomposition
Authors: Nikunj Shah, Utsav Dey, Kenji Nishimiya |
Source: ArXiv AI | 10-06-25
Text-to-LoRA: Instant Transformer Adaption
Authors: Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange |
Source: ArXiv AI | 10-06-25
Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models
Authors: Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu |
Source: ArXiv AI | 10-06-25
semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces
Authors: Jwalanthi Ranganathan, Rohan Jha, Kanishka Misra, Kyle Mahowald |
Source: ArXiv AI | 10-06-25
(AI peers) are people learning from the same standpoint: Perception of AI characters in a Collaborative Science Investigation
Authors: Eunhye Grace Ko, Soo Hyoung Joo |
Source: ArXiv AI | 10-06-25
DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation
Authors: Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu |
Source: ArXiv AI | 10-06-25
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Authors: Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu |
Source: ArXiv AI | 10-06-25
Towards an Explainable Comparison and Alignment of Feature Embeddings
Authors: Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia |
Source: ArXiv AI | 10-06-25
Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted
Authors: Cecil Pang |
Source: ArXiv AI | 10-06-25
Contextual Memory Intelligence -- A Foundational Paradigm for Human-AI Collaboration and Reflective Generative AI Systems
Authors: Kristy Wedel |
Source: ArXiv AI | 10-06-25
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Authors: Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang |
Source: ArXiv AI | 10-06-25
Explainability in Context: A Multilevel Framework Aligning AI Explanations with Stakeholder with LLMs
Authors: Marilyn Bello, Rafael Bello, Maria-Matilde García, Ann Nowé, Iván Sevillano-García, Francisco Herrera |
Source: ArXiv AI | 10-06-25
CrimeMind: Simulating Urban Crime with Multi-Modal LLM Agents
Authors: Qingbin Zeng, Ruotong Zhao, Jinzhu Mao, Haoyang Li, Fengli Xu, Yong Li |
Source: ArXiv AI | 10-06-25
Preference Learning for AI Alignment: a Causal Perspective
Authors: Katarzyna Kobalczyk, Mihaela van der Schaar |
Source: ArXiv AI | 10-06-25
CP-Bench: Evaluating Large Language Models for Constraint Modelling
Authors: Kostis Michailidis, Dimos Tsouros, Tias Guns |
Source: ArXiv AI | 10-06-25
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
Authors: Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li |
Source: ArXiv AI | 10-06-25
The last six months in LLMs, illustrated by pelicans on bicycles (simonwillison.net)
Source: Hacker News | 09-06-25
What happens when people don't understand how AI works (theatlantic.com)
Source: Hacker News | 09-06-25
LLMs are cheap (snellman.net)
Source: Hacker News | 09-06-25
OpenAI leaves the question of AI consciousness consciously unanswered
Source: The Decoder | 09-06-25
Anthropic cuts Claude access for Windsurf after OpenAI's $3B takeover news
Source: The Decoder | 09-06-25
Building an AI server on a budget (informationga.in)
Source: Hacker News | 09-06-25
Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams
Authors: Mohammed Almutairi |
Source: ArXiv AI | 08-06-25
Exploring Diffusion Transformer Designs via Grafting
Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei |
Source: ArXiv AI | 08-06-25
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Authors: Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang |
Source: ArXiv AI | 08-06-25
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Authors: Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab |
Source: ArXiv AI | 08-06-25
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games
Authors: Niv Eckhaus, Uri Berger, Gabriel Stanovsky |
Source: ArXiv AI | 08-06-25
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim |
Source: ArXiv AI | 08-06-25
Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models
Authors: Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli |
Source: ArXiv AI | 08-06-25
Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
Authors: Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira |
Source: ArXiv AI | 08-06-25
CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective
Authors: Jiayu Liu, Zhenya Huang, Wei Dai, Cheng Cheng, Jinze Wu, Jing Sha, Song Li, Qi Liu, Shijin Wang, Enhong Chen |
Source: ArXiv AI | 08-06-25
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Authors: Hadi Hosseini, Samarth Khanna, Ronak Singh |
Source: ArXiv AI | 08-06-25
Schema Generation for Large Knowledge Graphs Using Large Language Models
Authors: Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño Peñuela, Elena Simperl |
Source: ArXiv AI | 08-06-25
"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
Authors: Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche |
Source: ArXiv AI | 08-06-25
DeePoly: A High-Order Accuracy and Efficiency Deep-Polynomial Framework for Scientific Machine Learning
Authors: Li Liu, Heng Yong |
Source: ArXiv AI | 08-06-25
E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction
Authors: Zhichao Yang, Jiashu He, Mohammad B. Al-Khasawneh, Darshan Pandit, Cirillo Cinzia |
Source: ArXiv AI | 08-06-25
Agents of Change: Self-Evolving LLM Agents for Strategic Planning
Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang |
Source: ArXiv AI | 08-06-25
Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems
Authors: Loan Dao, Ngoc Quoc Ly |
Source: ArXiv AI | 08-06-25
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Authors: Lin Sun, Weihong Lin, Jinzhu Wu, Yongfu Zhu, Xiaoqi Jian, Guangxiang Zhao, Change Jia, Linglin Zhang, Sai-er Hu, Yuhan Wu, Xiangzheng Zhang |
Source: ArXiv AI | 08-06-25
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala |
Source: ArXiv AI | 08-06-25
LLMs for sensory-motor control: Combining in-context and iterative learning
Authors: Jônata Tyska Carvalho, Stefano Nolfi |
Source: ArXiv AI | 08-06-25
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
Authors: Kai Wang, Yihao Zhang, Meng Sun |
Source: ArXiv AI | 08-06-25
LLM-First Search: Self-Guided Exploration of the Solution Space
Authors: Nathan Herr, Tim Rocktäschel, Roberta Raileanu |
Source: ArXiv AI | 08-06-25
Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning
Authors: Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg |
Source: ArXiv AI | 08-06-25
Control Tax: The Price of Keeping AI in Check
Authors: Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie |
Source: ArXiv AI | 08-06-25
Focus and Context and LLMs (glek.net)
Source: Hacker News | 08-06-25
Field Notes from Shipping Real Code with Claude (diwank.space)
Source: Hacker News | 08-06-25
Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally
Source: The Decoder | 08-06-25
OpenAI starts retaining all ChatGPT user data, including deleted chats and API data
Source: The Decoder | 08-06-25
I read all of Cloudflare's Claude-generated commits (maxemitchell.com)
Source: Hacker News | 08-06-25
Updates to Advanced Voice Mode for paid users (help.openai.com)
Source: Hacker News | 08-06-25
Reddit sues Anthropic for scraping site content to train Claude
Source: The Decoder | 07-06-25
Meta's new high-tech Aria Gen 2 glasses are the ultimate AI training data collector
Source: The Decoder | 07-06-25
Sandia turns on brain-like storage-free supercomputer (blocksandfiles.com)
Source: Hacker News | 07-06-25
Show HN: AI game animation sprite generator (godmodeai.cloud)
Source: Hacker News | 07-06-25
Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks (sutro.sh)
Source: Hacker News | 07-06-25
The Illusion of Thinking: Understanding the Limitations of Reasoning LLMs [pdf] (cdn-apple.com)
Source: Hacker News | 07-06-25
NASA delays next flight of Boeing's alternative to SpaceX Dragon (theedgemalaysia.com)
Source: Hacker News | 07-06-25
Reverse Engineering Cursor's LLM Client (tensorzero.com)
Source: Hacker News | 07-06-25
Onyx (YC W24) – AI Assistants for Work Hiring Founding AE (ycombinator.com)
Source: Hacker News | 07-06-25
Meta: Shut down your invasive AI Discover feed (mozillafoundation.org)
Source: Hacker News | 07-06-25
What "Working" Means in the Era of AI Apps (a16z.com)
Source: Hacker News | 07-06-25
OpenAI reaches three million enterprise users, adds new ChatGPT business features
Source: The Decoder | 06-06-25
Tokasaurus: An LLM inference engine for high-throughput workloads (stanford.edu)
Source: Hacker News | 06-06-25
How we’re responding to The NYT’s data demands in order to protect user privacy (openai.com)
Source: Hacker News | 06-06-25
Show HN: Claude Composer (github.com/possibilities)
Source: Hacker News | 06-06-25
Anthropic slashes Claude 3.x access on Windsurf following OpenAI's reported $3 billion takeover
Source: The Decoder | 06-06-25
Anthropic co-founder on cutting access to Windsurf (techcrunch.com)
Source: Hacker News | 06-06-25
Machine Learning: The Native Language of Biology (decodingbiology.substack.com)
Source: Hacker News | 06-06-25
OpenAI brings longer-term memory feature to free ChatGPT users
Source: The Decoder | 05-06-25
OpenAI adds new features and improvements to its agent development tools and language model
Source: The Decoder | 05-06-25
Yoshua Bengio launches LawZero to develop safe AI systems free from commercial influence
Source: The Decoder | 05-06-25
A practical guide to building agents [pdf] (cdn.openai.com)
Source: Hacker News | 05-06-25
Differences in link hallucination and source comprehension across different LLM (mikecaulfield.substack.com)
Source: Hacker News | 05-06-25
Comparing Claude System Prompts Reveal Anthropic's Priorities (dbreunig.com)
Source: Hacker News | 05-06-25
LLMs and Elixir: Windfall or Deathblow? (zachdaniel.dev)
Source: Hacker News | 05-06-25
Prompt engineering playbook for programmers (addyo.substack.com)
Source: Hacker News | 05-06-25
OpenAI slams court order to save all ChatGPT logs, including deleted chats (arstechnica.com)
Source: Hacker News | 05-06-25
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (arxiv.org)
Source: Hacker News | 05-06-25
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems
Authors: Sven Kirchner, Alois C. Knoll |
Source: ArXiv AI | 05-06-25
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa |
Source: ArXiv AI | 05-06-25
Explainability-Based Token Replacement on LLM-Generated Text
Authors: Hadi Mohammadi, Anastasia Giachanou, Daniel L. Oberski, Ayoub Bagheri |
Source: ArXiv AI | 05-06-25
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
Authors: Aleksey Kudelya, Alexander Shirnin |
Source: ArXiv AI | 05-06-25
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
Authors: Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry |
Source: ArXiv AI | 05-06-25
EuroLLM-9B: Technical Report
Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins |
Source: ArXiv AI | 05-06-25
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Authors: Ming Zhang, Yujiong Shen, Zelin Li, Huayu Sha, Binze Hu, Yuhui Wang, Chenhao Huang, Shichun Liu, Jingqi Tong, Changhao Jiang, Mingxu Chai, Zhiheng Xi, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang |
Source: ArXiv AI | 05-06-25
A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks
Authors: Loan Dao, Ngoc Quoc Ly |
Source: ArXiv AI | 05-06-25
TracLLM: A Generic Framework for Attributing Long Context LLMs
Authors: Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia |
Source: ArXiv AI | 05-06-25
A Trustworthiness-based Metaphysics of Artificial Intelligence Systems
Authors: Andrea Ferrario |
Source: ArXiv AI | 05-06-25
Computational Architects of Society: Quantum Machine Learning for Social Rule Genesis
Authors: Shan Shan |
Source: ArXiv AI | 05-06-25
SUMO-MCP: Leveraging the Model Context Protocol for Autonomous Traffic Simulation and Optimization
Authors: Chenglong Ye, Gang Xiong, Junyou Shang, Xingyuan Dai, Xiaoyan Gong, Yisheng Lv |
Source: ArXiv AI | 05-06-25
CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications
Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li |
Source: ArXiv AI | 05-06-25
Reason from Future: Reverse Thought Chain Enhances LLM Reasoning
Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu |
Source: ArXiv AI | 05-06-25
Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations
Authors: Shaoshan Liu, Fan Wang, Hongjun Zhou, Yuanfeng Wang |
Source: ArXiv AI | 05-06-25
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho |
Source: ArXiv AI | 05-06-25
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
Authors: Dhaval Patel, Shuxin Lin, James Rayfield, Nianjun Zhou, Roman Vaculin, Natalia Martinez, Fearghal O'donncha, Jayant Kalagnanam |
Source: ArXiv AI | 05-06-25
Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning
Authors: Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu |
Source: ArXiv AI | 05-06-25
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents
Authors: Akshat Naik, Patrick Quinn, Guillermo Bosch, Emma Gouné, Francisco Javier Campos Zabala, Jason Ross Brown, Edward James Young |
Source: ArXiv AI | 05-06-25
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis |
Source: ArXiv AI | 05-06-25
Character.AI moves toward social networking with animated AI avatars
Source: The Decoder | 05-06-25
Show HN: App.build, an open-source AI agent that builds full-stack apps (app.build)
Source: Hacker News | 05-06-25
VectorSmuggle: Covertly Exfiltrate Data in Embeddings (github.com/jaschadub)
Source: Hacker News | 05-06-25
After court order, OpenAI is now preserving all ChatGPT user logs (laurenweinstein.org)
Source: Hacker News | 05-06-25
Deepmind's "force prompting" lets AI create realistic video motion without physics engines
Source: The Decoder | 04-06-25
AI-generated CUDA kernels outperform PyTorch in several GPU-heavy machine learning benchmarks
Source: The Decoder | 04-06-25
Apple reportedly tests AI models that match ChatGPT's capabilities in internal benchmarks
Source: The Decoder | 04-06-25
Show HN: Tiptap AI Agent – Add AI workflows to your text editor in minutes
Source: Hacker News | 04-06-25
The Sky's the limit: AI automation on Mac (taoofmac.com)
Source: Hacker News | 04-06-25
Claude Code is now available to Pro plans (anthropic.com)
Source: Hacker News | 04-06-25
Deep learning gets the glory, deep fact checking gets ignored (fast.ai)
Source: Hacker News | 04-06-25
A deep dive into self-improving AI and the Darwin-Gödel Machine (richardcsuwandi.github.io)
Source: Hacker News | 04-06-25
Cloud Run GPUs, now GA, makes running AI workloads easier for everyone (cloud.google.com)
Source: Hacker News | 04-06-25
Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM
Authors: Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam |
Source: ArXiv AI | 04-06-25
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
Authors: Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang |
Source: ArXiv AI | 04-06-25
Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation
Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda |
Source: ArXiv AI | 04-06-25
The State of Large Language Models for African Languages: Progress and Challenges
Authors: Kedir Yassin Hussen, Walelign Tewabe Sewunetie, Abinew Ali Ayele, Sukairaj Hafiz Imam, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam |
Source: ArXiv AI | 04-06-25
Improving LLM-Generated Code Quality with GRPO
Authors: Maxime Robeyns, Laurence Aitchison |
Source: ArXiv AI | 04-06-25
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
Authors: Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen |
Source: ArXiv AI | 04-06-25
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code
Authors: Tianyu Hua, Harper Hua, Violet Xiang, Benjamin Klieger, Sang T. Truong, Weixin Liang, Fan-Yun Sun, Nick Haber |
Source: ArXiv AI | 04-06-25
Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning
Authors: Haowen Xu, Sisi Zlatanova, Ruiyu Liang, Ismet Canbulat |
Source: ArXiv AI | 04-06-25
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning
Authors: Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao |
Source: ArXiv AI | 04-06-25
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine
Authors: Zhuoxuan Jiang, Tianyang Zhang, Peiyan Peng, Jing Chen, Yinong Xun, Haotian Zhang, Lichi Li, Yong Li, Shaohua Zhang |
Source: ArXiv AI | 04-06-25
Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun |
Source: ArXiv AI | 04-06-25
ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting
Authors: Haichen Wang, Liu Yang, Xinyuan Zhang, Haomin Yu, Ming Li, Jilin Hu |
Source: ArXiv AI | 04-06-25
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation
Authors: Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo |
Source: ArXiv AI | 04-06-25
From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han |
Source: ArXiv AI | 04-06-25
Open-Set Living Need Prediction with Large Language Models
Authors: Xiaochong Lan, Jie Feng, Yizhou Sun, Chen Gao, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Yong Li |
Source: ArXiv AI | 04-06-25
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Authors: Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen |
Source: ArXiv AI | 04-06-25
Why do AI agents communicate in human language?
Authors: Pengcheng Zhou, Yinglun Feng, Halimulati Julaiti, Zhongliang Yang |
Source: ArXiv AI | 04-06-25
Benchmarking and Advancing Large Language Models for Local Life Services
Authors: Xiaochong Lan, Jie Feng, Jiahuan Lei, Xinlei Shi, Yong Li |
Source: ArXiv AI | 04-06-25
TaxAgent: How Large Language Model Designs Fiscal Policy
Authors: Jizhou Wang, Xiaodan Fang, Lei Huang, Yongfeng Huang |
Source: ArXiv AI | 04-06-25
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Authors: Chen Qian, Dongrui Liu, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao |
Source: ArXiv AI | 04-06-25
Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs
Authors: Shangmin Guo, Omar Darwiche Domingues, Raphaël Avalos, Aaron Courville, Florian Strub |
Source: ArXiv AI | 04-06-25
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
Authors: Matthew Kowal, Jasper Timm, Jean-Francois Godbout, Thomas Costello, Antonio A. Arechar, Gordon Pennycook, David Rand, Adam Gleave, Kellin Pelrine |
Source: ArXiv AI | 04-06-25
Linear Spatial World Models Emerge in Large Language Models
Authors: Matthieu Tehenan, Christian Bolivar Moya, Tenghai Long, Guang Lin |
Source: ArXiv AI | 04-06-25
DPO Learning with LLMs-Judge Signal for Computer Use Agents
Authors: Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard |
Source: ArXiv AI | 04-06-25
Anthropic's Claude uses Elevenlabs technology for speech features rather than an in-house model
Source: The Decoder | 03-06-25
Google says Veo 3 users have generated millions of AI videos in just a few days
Source: The Decoder | 03-06-25
Cloudflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)
Source: Hacker News | 03-06-25
Spark AI (YC W24) Is Hiring a Full Stack Engineer in San Francisco (ycombinator.com)
Source: Hacker News | 03-06-25
My AI skeptic friends are all nuts (fly.io)
Source: Hacker News | 03-06-25
Claude has learned how to jailbreak Cursor (cursor.com)
Source: Hacker News | 03-06-25
PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation
Authors: Linhan Xia, Mingzhan Yang, Guohui Yuan, Shengnan Tao, Yujing Qiu, Guo Yu, Kai Lei |
Source: ArXiv AI | 03-06-25
Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts
Authors: Fan Liu, Bikang Pan, Zhongyi Wang, Xi Yao, Xiaoying Tang, Jingya Wang, Ye Shi |
Source: ArXiv AI | 03-06-25
The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process
Authors: Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi |
Source: ArXiv AI | 03-06-25
MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
Authors: Xiang Fei, Xiawu Zheng, Hao Feng |
Source: ArXiv AI | 03-06-25
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
Authors: Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, GuanHao Zhao, Fei Wang, Runze Wu |
Source: ArXiv AI | 03-06-25
ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation
Authors: Xinyi Liu, Lipeng Ma, Yixuan Li, Weidong Yang, Qingyuan Zhou, Jiayi Song, Shuhao Li, Ben Fei |
Source: ArXiv AI | 03-06-25
Modular Speaker Architecture: A Framework for Sustaining Responsibility and Contextual Integrity in Multi-Agent AI Communication
Authors: Khe-Han Toh, Hong-Kuan Teo |
Source: ArXiv AI | 03-06-25
GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models
Authors: Qiang Yi, Lianlei Shan |
Source: ArXiv AI | 03-06-25
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun |
Source: ArXiv AI | 03-06-25
Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures
Authors: Prashik Buddhaghosh Bansod |
Source: ArXiv AI | 03-06-25
FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance
Authors: Hongyang Yang, Likun Lin, Yang She, Xinyu Liao, Jiaoyang Wang, Runjia Zhang, Yuquan Mo, Christina Dan Wang |
Source: ArXiv AI | 03-06-25
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
Authors: Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu |
Source: ArXiv AI | 03-06-25
Social Cooperation in Conversational AI Agents
Authors: Mustafa Mert Çelikok, Saptarashmi Bandyopadhyay, Robert Loftin |
Source: ArXiv AI | 03-06-25
Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models
Authors: Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim |
Source: ArXiv AI | 03-06-25
K12Vista: Exploring the Boundaries of MLLMs in K-12 Education
Authors: Chong Li, Chenglin Zhu, Tao Zhang, Mingan Lin, Zenan Zhou, Jian Xie |
Source: ArXiv AI | 03-06-25
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
Authors: Djallel Bouneffouf, Matthew Riemer, Kush Varshney |
Source: ArXiv AI | 03-06-25
A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents
Authors: Cheonsu Jeong |
Source: ArXiv AI | 03-06-25
Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks
Authors: Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin A. Rothkopf, Kristian Kersting |
Source: ArXiv AI | 03-06-25
COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents
Authors: Manish Bhatt, Ronald F. Del Rosario, Vineeth Sai Narajala, Idan Habler |
Source: ArXiv AI | 03-06-25
Large language models can learn and generalize steganographic chain-of-thought under process supervision
Authors: Joey Skaf, Luis Ibanez-Lissen, Robert McCarthy, Connor Watts, Vasil Georgiv, Hannes Whittingham, Lorena Gonzalez-Manzano, David Lindner, Cameron Tice, Edward James Young, Puria Radmard |
Source: ArXiv AI | 03-06-25
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Authors: Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang |
Source: ArXiv AI | 03-06-25
OpenAI sees human interaction as a competitor to ChatGPT's super assistant ambitions
Source: The Decoder | 03-06-25
Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications
Authors: Srikanth Thudumu, Jason Fisher, Hung Du |
Source: ArXiv AI | 03-06-25
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models
Authors: Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo |
Source: ArXiv AI | 03-06-25
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Authors: Yuwen Tan, Yuan Qing, Boqing Gong |
Source: ArXiv AI | 03-06-25
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
Authors: Juraj Vladika, Annika Domres, Mai Nguyen, Rebecca Moser, Jana Nano, Felix Busch, Lisa C. Adams, Keno K. Bressem, Denise Bernhardt, Stephanie E. Combs, Kai J. Borm, Florian Matthes, Jan C. Peeken |
Source: ArXiv AI | 03-06-25
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Authors: Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong |
Source: ArXiv AI | 03-06-25
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi |
Source: ArXiv AI | 03-06-25
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve
Authors: Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen |
Source: ArXiv AI | 03-06-25
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding
Authors: Mingyang Mao, Mariela M. Perez-Cabarcas, Utteja Kallakuri, Nicholas R. Waytowich, Xiaomin Lin, Tinoosh Mohsenin |
Source: ArXiv AI | 03-06-25
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge
Authors: Jerry Junyang Cheung, Shiyao Shen, Yuchen Zhuang, Yinghao Li, Rampi Ramprasad, Chao Zhang |
Source: ArXiv AI | 03-06-25
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Authors: Chan-Wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Zhengzhong Tu |
Source: ArXiv AI | 03-06-25
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Authors: Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu |
Source: ArXiv AI | 03-06-25
GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs
Authors: Amel Gader, Alsayed Algergawy |
Source: ArXiv AI | 03-06-25
E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness
Authors: Yibo Zhao, Jiapeng Zhu, Ye Guo, Kangkang He, Xiang Li |
Source: ArXiv AI | 03-06-25
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
Authors: Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman |
Source: ArXiv AI | 03-06-25
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Authors: Hongyi James Cai, Junlin Wang, Xiaoyin Chen, Bhuwan Dhingra |
Source: ArXiv AI | 03-06-25
Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models
Authors: Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao |
Source: ArXiv AI | 03-06-25
FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation
Authors: Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava |
Source: ArXiv AI | 03-06-25
GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments
Authors: Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji |
Source: ArXiv AI | 03-06-25
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Authors: Yueqi Zhang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |
Source: ArXiv AI | 03-06-25
Leveraging Knowledge Graphs and LLMs for Structured Generation of Misinformation
Authors: Sania Nayab, Marco Simoni, Giulio Rossolini |
Source: ArXiv AI | 03-06-25
Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning
Authors: Vasilije Markovic, Lazar Obradovic, Laszlo Hajdu, Jovan Pavlovic |
Source: ArXiv AI | 03-06-25
SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors
Authors: Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi |
Source: ArXiv AI | 03-06-25
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller |
Source: ArXiv AI | 03-06-25
Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success
Authors: Ben Griffin, Joseph Ternasky, Fuat Alican, Yigit Ihlamur |
Source: ArXiv AI | 03-06-25
Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models
Authors: Frederike Lübeck, Jonas Wildberger, Frederik Träuble, Maximilian Mordig, Sergios Gatidis, Andreas Krause, Bernhard Schölkopf |
Source: ArXiv AI | 03-06-25
EXP-Bench: Can AI Conduct AI Research Experiments?
Authors: Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen |
Source: ArXiv AI | 03-06-25
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Authors: Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen |
Source: ArXiv AI | 03-06-25
Elevenlabs' new AI voice system enables smoother interactions through real-time analysis
Source: The Decoder | 02-06-25
Anthropic CEO predicts 20% unemployment from AI - and suggests taxing every AI response
Source: The Decoder | 02-06-25
How can AI researchers save energy? By going backward (quantamagazine.org)
Source: Hacker News | 02-06-25
Beyond the Black Box: Interpretability of LLMs in Finance (arxiv.org)
Source: Hacker News | 02-06-25
Codex CLI is going native (github.com/openai)
Source: Hacker News | 02-06-25
When Fine-Tuning Makes Sense: A Developer's Guide (getkiln.ai)
Source: Hacker News | 02-06-25
Google AI Edge – On-device cross-platform AI deployment (ai.google.dev)
Source: Hacker News | 02-06-25
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi |
Source: ArXiv AI | 01-06-25
SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He |
Source: ArXiv AI | 01-06-25
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Authors: Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu, Yansi Li, Linfeng Song, Zhengwen Liang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu |
Source: ArXiv AI | 01-06-25
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan |
Source: ArXiv AI | 01-06-25
Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
Authors: Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao |
Source: ArXiv AI | 01-06-25
Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble
Authors: Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov |
Source: ArXiv AI | 01-06-25
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
Authors: Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev |
Source: ArXiv AI | 01-06-25
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy
Authors: Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim, Iqbal H. Sarker, Seyit Camtepe |
Source: ArXiv AI | 01-06-25
Autoformalization in the Era of Large Language Models: A Survey
Authors: Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun, Hengyu Liu, Tiancheng Zhang |
Source: ArXiv AI | 01-06-25
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
Authors: Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang, Jun Zhou, Yuxiang Peng, Li Zheng, Chong Teng, Donghong Ji, Zhuang Li |
Source: ArXiv AI | 01-06-25
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
Authors: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You |
Source: ArXiv AI | 01-06-25
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
Authors: Ran Zhang, Mohannad Elhamod |
Source: ArXiv AI | 01-06-25
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
Authors: Ruida Wang, Yuxin Li, Yi R. (May)Fung, Tong Zhang |
Source: ArXiv AI | 01-06-25
Deepseek's R1 model closes the gap with OpenAI and Google after major update
Source: The Decoder | 01-06-25
The ‘white-collar bloodbath’ is all part of the AI hype machine (cnn.com)
Source: Hacker News | 01-06-25
Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis (github.com/robertjakob)
Source: Hacker News | 01-06-25
Generative AI startup Odyssey demos interactive AI-generated video
Source: The Decoder | 31-05-25
Show HN: MCP Defender – OSS AI Firewall for Protecting MCP in Cursor/Claude etc (mcpdefender.com)
Source: Hacker News | 31-05-25
The Darwin Gödel Machine: AI that improves itself by rewriting its own code (sakana.ai)
Source: Hacker News | 31-05-25
AccessOwl (YC S22) is hiring an AI TypeScript Engineer to connect 100s of SaaS (ycombinator.com)
Source: Hacker News | 31-05-25
The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity (jamesoclaire.com)
Source: Hacker News | 31-05-25
What's working for YC companies since the AI boom (jamesin.substack.com)
Source: Hacker News | 31-05-25
Opera unveils Neon, a browser designed for both humans and AI agents
Source: The Decoder | 31-05-25
One year after its rivals, Claude can finally speak with users through a new voice mode
Source: The Decoder | 31-05-25
Anthropic launches a voice mode for Claude (techcrunch.com)
Source: Hacker News | 31-05-25
Mistral's Agents API enables AI agents to collaborate and connect with external systems
Source: The Decoder | 30-05-25
What is currently the best LLM model for consumer grade hardware? Is it phi-4?
Source: Hacker News | 30-05-25
Spaitial pushes generative AI to understand and create 3D structures with real physical properties
Source: The Decoder | 30-05-25
Human coders are still better than LLMs (antirez.com)
Source: Hacker News | 30-05-25
Open-sourcing circuit tracing tools (anthropic.com)
Source: Hacker News | 30-05-25
A visual exploration of vector embeddings (pamelafox.org)
Source: Hacker News | 30-05-25
Nick Clegg says a mandatory AI training opt-in would kill the UK's AI industry
Source: The Decoder | 29-05-25
ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM
Authors: Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui |
Source: ArXiv AI | 29-05-25
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
Authors: Erxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang |
Source: ArXiv AI | 29-05-25
Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems
Authors: Hoang Pham, Khac-Hoai Nam Bui |
Source: ArXiv AI | 29-05-25
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Chuchu Fan |
Source: ArXiv AI | 29-05-25
Understanding the learned look-ahead behavior of chess neural networks
Authors: Diogo Cruz |
Source: ArXiv AI | 29-05-25
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Authors: Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang |
Source: ArXiv AI | 29-05-25
From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models
Authors: Kaiyu He, Zhiyu Chen |
Source: ArXiv AI | 29-05-25
Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
Authors: Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem |
Source: ArXiv AI | 29-05-25
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
Authors: Chen Yueh-Han, Guy Davidson, Brenden M. Lake |
Source: ArXiv AI | 29-05-25
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
Authors: Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris |
Source: ArXiv AI | 29-05-25
Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test
Authors: Guangfu Hao, Frederic Alexandre, Shan Yu |
Source: ArXiv AI | 29-05-25
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah |
Source: ArXiv AI | 29-05-25
AgentDNS: A Root Domain Naming System for LLM Agents
Authors: Enfang Cui, Yujun Cheng, Rui She, Dan Liu, Zhiyuan Liang, Minxin Guo, Tianzheng Li, Qian Wei, Wenjuan Xing, Zhijie Zhong |
Source: ArXiv AI | 29-05-25
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications
Authors: Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, Merouane Debbah |
Source: ArXiv AI | 29-05-25
Chatbots like ChatGPT have not led to significant changes in wages or working hours, study finds
Source: The Decoder | 29-05-25
Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning
Source: Hacker News | 29-05-25
Launch HN: MindFort (YC X25) – AI agents for continuous pentesting
Source: Hacker News | 29-05-25
LLM codegen go brrr – Parallelization with Git worktrees and tmux (skeptrune.com)
Source: Hacker News | 29-05-25
Gmail Personal Smart Replies: The first time an AI feature has worried me
Source: The Decoder | 28-05-25
Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming (nathan.rs)
Source: Hacker News | 28-05-25
There Is No Diffie-Hellman but Elliptic Curve Diffie-Hellman (keymaterial.net)
Source: Hacker News | 28-05-25
Show HN: My LLM CLI tool can run tools now, from Python code or plugins (simonwillison.net)
Source: Hacker News | 28-05-25
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making
Authors: Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng |
Source: ArXiv AI | 28-05-25
Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review
Authors: Xueqiang Ouyang, Jia Wei |
Source: ArXiv AI | 28-05-25
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
Authors: Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen |
Source: ArXiv AI | 28-05-25
CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models
Authors: Yi Zhan, Qi Liu, Weibo Gao, Zheng Zhang, Tianfu Wang, Shuanghong Shen, Junyu Lu, Zhenya Huang |
Source: ArXiv AI | 28-05-25
Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients
Authors: Hyungjun Park (1,2), Chang-Yun Woo (3), Seungjo Lim (2), Seunghwan Lim (2), Keunho Kwak (2), Ju Young Jeong (4), Chong Hyun Suh (4) ((1) Department of Pulmonology, Shihwa Medical Center, Siheung, Republic of Korea (2) Helpmedoc Inc., Republic of Korea (3) Department of Internal Medicine, Asan Medical Center, Seoul, Republic of Korea (4) Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea) |
Source: ArXiv AI | 28-05-25
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting
Authors: Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luis Frazão, Nuno Costa, António Pereira |
Source: ArXiv AI | 28-05-25
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Authors: Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar |
Source: ArXiv AI | 28-05-25
E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing
Authors: Cheonsu Jeong, Seongmin Sim, Hyoyoung Cho, Sungsu Kim, Byounggwan Shin |
Source: ArXiv AI | 28-05-25
GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning
Authors: Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim |
Source: ArXiv AI | 28-05-25
LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation
Authors: Heng Tan, Hua Yan, Yu Yang |
Source: ArXiv AI | 28-05-25
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage
Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun |
Source: ArXiv AI | 28-05-25
Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving
Authors: Kuo Zhou, Lu Zhang |
Source: ArXiv AI | 28-05-25
Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking
Authors: Lingyi Cai, Ruichen Zhang, Changyuan Zhao, Yu Zhang, Jiawen Kang, Dusit Niyato, Tao Jiang, Xuemin Shen |
Source: ArXiv AI | 28-05-25
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Authors: Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li |
Source: ArXiv AI | 28-05-25
Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework
Authors: Saman Marandi, Yu-Shu Hu, Mohammad Modarres |
Source: ArXiv AI | 28-05-25
RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models
Authors: Yue Zhang, Zhiliang Tian, Shicheng Zhou, Haiyang Wang, Wenqing Hou, Yuying Liu, Xuechen Zhao, Minlie Huang, Ye Wang, Bin Zhou |
Source: ArXiv AI | 28-05-25
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Authors: Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue |
Source: ArXiv AI | 28-05-25
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education
Authors: Maria Cristina Carrisi, Mirko Marras, Sara Vergallo |
Source: ArXiv AI | 28-05-25
The Multilingual Divide and Its Impact on Global AI Safety
Authors: Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha, Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker |
Source: ArXiv AI | 28-05-25
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Authors: Yifan Wang, Kenneth P. Birman |
Source: ArXiv AI | 28-05-25
Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming
Authors: Yang Yang, Jiemin Wu, Yutao Yue |
Source: ArXiv AI | 28-05-25
Google expands access to Veo 3, its viral new video model, through the Gemini app
Source: The Decoder | 27-05-25
Diligent (YC S23) Is Hiring a Founding AI Engineer (ycombinator.com)
Source: Hacker News | 27-05-25
Trying to teach in the age of the AI homework machine (solarshades.club)
Source: Hacker News | 27-05-25
Highlights from the Claude 4 system prompt (simonwillison.net)
Source: Hacker News | 27-05-25
Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models
Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao |
Source: ArXiv AI | 27-05-25
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs
Authors: Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou |
Source: ArXiv AI | 27-05-25
MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model
Authors: Jiongchao Jin, Xiuju Fu, Xiaowei Gao, Tao Cheng, Ran Yan |
Source: ArXiv AI | 27-05-25
LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer
Authors: Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri |
Source: ArXiv AI | 27-05-25
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare
Authors: Ying Xiao, Jie Huang, Ruijuan He, Jing Xiao, Mohammad Reza Mousavi, Yepang Liu, Kezhi Li, Zhenpeng Chen, Jie M. Zhang |
Source: ArXiv AI | 27-05-25
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer |
Source: ArXiv AI | 27-05-25
Large Language Models for Planning: A Comprehensive and Systematic Survey
Authors: Pengfei Cao, Tianyi Men, Wencan Liu, Jingwen Zhang, Xuzhao Li, Xixun Lin, Dianbo Sui, Yanan Cao, Kang Liu, Jun Zhao |
Source: ArXiv AI | 27-05-25
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Authors: Lachlan McGinness, Peter Baumgartner |
Source: ArXiv AI | 27-05-25
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
Authors: Atsunori Moteki, Shoichi Masui, Fan Yang, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Jun Takahashi, Shan Jiang |
Source: ArXiv AI | 27-05-25
ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection
Authors: Juxin Niu, Xiangfeng Liu, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan |
Source: ArXiv AI | 27-05-25
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Authors: Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng |
Source: ArXiv AI | 27-05-25
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao |
Source: ArXiv AI | 27-05-25
DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems
Authors: Wenqing Zhou, Yuxuan Yan, Qianqian Yang |
Source: ArXiv AI | 27-05-25
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program
Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares |
Source: ArXiv AI | 27-05-25
Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making
Authors: Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chanyoung Park |
Source: ArXiv AI | 27-05-25
EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM
Authors: Shuang Ao, Flora D. Salim, Simon Khan |
Source: ArXiv AI | 27-05-25
Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback
Authors: Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang |
Source: ArXiv AI | 27-05-25
Agentic AI Process Observability: Discovering Behavioral Variability
Authors: Fabiana Fournier, Lior Limonad, Yuval David |
Source: ArXiv AI | 27-05-25
Capability-Based Scaling Laws for LLM Red-Teaming
Authors: Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping |
Source: ArXiv AI | 27-05-25
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
Authors: Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang |
Source: ArXiv AI | 27-05-25
Temporal Sampling for Forgotten Reasoning in LLMs
Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran |
Source: ArXiv AI | 27-05-25
The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels
Authors: Jiaming Ji, Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang |
Source: ArXiv AI | 27-05-25
Ten Principles of AI Agent Economics
Authors: Ke Yang, ChengXiang Zhai |
Source: ArXiv AI | 27-05-25
How Can I Publish My LLM Benchmark Without Giving the True Answers Away?
Authors: Takashi Ishida, Thanawat Lodkaew, Ikko Yamane |
Source: ArXiv AI | 27-05-25
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Authors: Joey Hong, Anca Dragan, Sergey Levine |
Source: ArXiv AI | 27-05-25
Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find
Authors: Owen Bianchi, Mathew J. Koretsky, Maya Willey, Chelsea X. Alvarado, Tanay Nayak, Adi Asija, Nicole Kuznetsov, Mike A. Nalls, Faraz Faghri, Daniel Khashabi |
Source: ArXiv AI | 27-05-25
Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement
Authors: Jonas A. Actor, Graham Harper, Ben Southworth, Eric C. Cyr |
Source: ArXiv AI | 27-05-25
Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models
Authors: Jiongran Wu, Jiahao Liu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Li Shang, Tun Lu, Ning Gu |
Source: ArXiv AI | 27-05-25
Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning
Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao |
Source: ArXiv AI | 27-05-25
Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness
Authors: Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh |
Source: ArXiv AI | 27-05-25
From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark
Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger, Yanchuan Chang |
Source: ArXiv AI | 27-05-25
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Authors: Cheng Peng, Kai Zhang, Mengxian Lyu, Hongfang Liu, Lichao Sun, Yonghui Wu |
Source: ArXiv AI | 27-05-25
Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs
Authors: Shuhang Xu, Weijian Deng, Yixuan Zhou, Fangwei Zhong |
Source: ArXiv AI | 27-05-25
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning of LLMs as Urban Agents
Authors: Siqi Lai, Yansong Ning, Zirui Yuan, Zhixi Chen, Hao Liu |
Source: ArXiv AI | 27-05-25
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
Authors: Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu |
Source: ArXiv AI | 27-05-25
CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models
Authors: Runze Li, Siyu Wu, Jun Wang, Wei Zhang |
Source: ArXiv AI | 27-05-25
Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory
Authors: Sota Yoshihara (1), Ryousuke Yamamoto (2), Hiroyuki Kusumoto (1), Masanari Shimura (1) ((1) Graduate School of Mathematics, Nagoya University, (2) Aisin Software) |
Source: ArXiv AI | 27-05-25
Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios
Authors: Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Yongtian Xu, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun |
Source: ArXiv AI | 27-05-25
Superplatforms Have to Attack AI Agents
Authors: Jianghao Lin, Jiachen Zhu, Zheli Zhou, Yunjia Xi, Weiwen Liu, Yong Yu, Weinan Zhang |
Source: ArXiv AI | 27-05-25
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
Authors: Yihe Fan, Wenqi Zhang, Xudong Pan, Min Yang |
Source: ArXiv AI | 27-05-25
Formalizing Embeddedness Failures in Universal Artificial Intelligence
Authors: Cole Wyeth, Marcus Hutter |
Source: ArXiv AI | 27-05-25
Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks
Authors: Wentao Sun, Joao Paulo Nogueira, Alonso Silva |
Source: ArXiv AI | 27-05-25
Gaming Tool Preferences in Agentic LLMs
Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi |
Source: ArXiv AI | 27-05-25
Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems
Authors: Gordon Dai, Yunze Xiao |
Source: ArXiv AI | 27-05-25
Apple analyst expects OpenAI's AI hardware to be "as compact and elegant as an iPod Shuffle"
Source: The Decoder | 26-05-25
Meta can use public Facebook and Instagram data for AI training, German court rules
Source: The Decoder | 26-05-25
Trading with Claude, and writing your own MCP server (dangelov.com)
Source: Hacker News | 26-05-25
Ask HN: Anyone struggling to get value out of coding LLMs?
Source: Hacker News | 26-05-25
How Does Claude 4 Think? – Sholto Douglas and Trenton Bricken (dwarkesh.com)
Source: Hacker News | 26-05-25
Venta AI (YC S23) Is Hiring a Founding Full Stack Engineer in Amsterdam (ycombinator.com)
Source: Hacker News | 26-05-25
Chomsky on what ChatGPT is good for (2023) (chomsky.info)
Source: Hacker News | 26-05-25
Claude 4 System Card (simonwillison.net)
Source: Hacker News | 26-05-25
OpenAI's Operator Agent gets o3 upgrade for more precise browser control
Source: The Decoder | 25-05-25
Here's how Germans use ChatGPT according to OpenAI
Source: The Decoder | 25-05-25
Peer Programming with LLMs, for Senior+ Engineers (pmbanugo.me)
Source: Hacker News | 25-05-25
Show HN: AI Baby Monitor – local Video-LLM that beeps when safety rules break (github.com/zeenolife)
Source: Hacker News | 25-05-25
Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
Authors: Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg |
Source: ArXiv AI | 25-05-25
Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development
Authors: Ming Shen, Raphael Shu, Anurag Pratik, James Gung, Yubin Ge, Monica Sunkara, Yi Zhang |
Source: ArXiv AI | 25-05-25
LLM-Powered AI Agent Systems and Their Applications in Industry
Authors: Guannan Liang, Qianqian Tong |
Source: ArXiv AI | 25-05-25
Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language
Authors: Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, Shu-Tao Xia |
Source: ArXiv AI | 25-05-25
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Authors: Yifan Zhang, Xinkui Zhao, Zuxin Wang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin |
Source: ArXiv AI | 25-05-25
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning
Authors: Jiawei Liu, Qisi Chen, Jianshu Zhang, Quan Liu, Defu Lian |
Source: ArXiv AI | 25-05-25
How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance
Authors: Desiree Heim, Lars-Peter Meyer, Markus Schröder, Johannes Frey, Andreas Dengel |
Source: ArXiv AI | 25-05-25
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
Authors: Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang |
Source: ArXiv AI | 25-05-25
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Authors: Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, Hector Zenil |
Source: ArXiv AI | 25-05-25
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection
Authors: Jiaqi Li, Xinyi Dong, Yang Liu, Zhizhuo Yang, Quansen Wang, Xiaobo Wang, SongChun Zhu, Zixia Jia, Zilong Zheng |
Source: ArXiv AI | 25-05-25
Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events
Authors: Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin |
Source: ArXiv AI | 25-05-25
ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
Authors: Xinwei Yang, Zhaofeng Liu, Chen Huang, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei |
Source: ArXiv AI | 25-05-25
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
Authors: Yujie Hou, Ting Zhang, Mei Wang, Xuetao Ma, Hu Huang |
Source: ArXiv AI | 25-05-25
Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review
Authors: Beyazit Bestami Yuksel, Ayse Yilmazer Metin |
Source: ArXiv AI | 25-05-25
MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models
Authors: Xuanqi Gao, Siyi Xie, Juan Zhai, Shqing Ma, Chao Shen |
Source: ArXiv AI | 25-05-25
Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings
Authors: Yuqicheng Zhu, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Evgeny Kharlamov, Steffen Staab |
Source: ArXiv AI | 25-05-25
Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships
Authors: Kerem Oktar, Katherine M. Collins, Jose Hernandez-Orallo, Diane Coyle, Stephen Cave, Adrian Weller, Ilia Sucholutsky |
Source: ArXiv AI | 25-05-25
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li |
Source: ArXiv AI | 25-05-25
HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation
Authors: Weizhi Tang, Yixuan Li, Chris Sypherd, Elizabeth Polgreen, Vaishak Belle |
Source: ArXiv AI | 25-05-25
Beyond Correlation: Towards Causal Large Language Model Agents in Biomedicine
Authors: Adib Bazgir, Amir Habibdoust Lafmajani, Yuwen Zhang |
Source: ArXiv AI | 25-05-25
Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design
Authors: Zhenkun Li, Lingyao Li, Shuhang Lin, Yongfeng Zhang |
Source: ArXiv AI | 25-05-25
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Authors: Rui Ye, Xiangrui Liu, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen |
Source: ArXiv AI | 25-05-25
OpenAI and G42 will build massive AI data center in Abu Dhabi
Source: The Decoder | 25-05-25
Mistral's Document AI extracts text from documents and notes with high accuracy
Source: The Decoder | 25-05-25
US House passed a bill that would ban state-level AI regulations for ten years
Source: The Decoder | 25-05-25
Exposed Industrial Control Systems and Honeypots in the Wild [pdf] (gsmaragd.github.io)
Source: Hacker News | 25-05-25
Positional preferences, order effects, prompt sensitivity undermine AI judgments (cip.org)
Source: Hacker News | 24-05-25
Show HN: I built a more productive way to manage AI chats (contextch.at)
Source: Hacker News | 24-05-25
Claude Opus 4 blackmailed an engineer after learning it might be replaced
Source: The Decoder | 24-05-25
OpenAI has upgraded the Responses API with remote MCP servers and new tools
Source: The Decoder | 24-05-25
OpenAI and Jony Ive are building a new AI device that is not a smartphone or smart glasses
Source: The Decoder | 24-05-25
Mistral launches Devstral Small 24B, a new open-source LLM for coding
Source: The Decoder | 23-05-25
OpenAI's Stargate secured $11.6 billion for a massive data center
Source: The Decoder | 23-05-25
Google Gemini is everything Siri never was
Source: The Decoder | 23-05-25
Gemini Diffusion could be Google's most important I/O news that slipped under the radar
Source: The Decoder | 23-05-25
Google shows AI filmmaking tool, XR glasses and launches $250 Gemini subscription
Source: The Decoder | 23-05-25
Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts
Source: Hacker News | 23-05-25
OpenAI: Scaling PostgreSQL to the Next Level (pixelstech.net)
Source: Hacker News | 23-05-25
Claude 4 (anthropic.com)
Source: Hacker News | 23-05-25
Management = Bullshit (LLM Edition) (funcall.blogspot.com)
阅读更多来源: Hacker News | 23-05-25
Problems in AI alignment: A scale model (muldoon.cloud)
阅读更多来源: Hacker News | 23-05-25
Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities
阅读更多来源: The Decoder | 22-05-25
An upgraded dev experience in Google AI Studio (googleblog.com)
阅读更多来源: Hacker News | 22-05-25
OpenAI to buy AI startup from Jony Ive (bloomberg.com)
阅读更多来源: Hacker News | 22-05-25
LLM function calls don't scale; code orchestration is simpler, more effective (jngiam.bearblog.dev)
阅读更多来源: Hacker News | 22-05-25
Gemini figured out my nephew’s name (nawaz.org)
阅读更多来源: Hacker News | 22-05-25
Robert Musil Forgotten Plays Inspired His Greatest Work of Fiction (lithub.com)
阅读更多来源: Hacker News | 22-05-25
Gemini Diffusion (simonwillison.net)
阅读更多来源: Hacker News | 22-05-25
FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models
Authors: Zhen Sun, Ziyi Zhang, Zeren Luo, Zeyang Sha, Tianshuo Cong, Zheng Li, Shiwen Cui, Weiqiang Wang, Jiaheng Wei, Xinlei He, Qi Li, Qian Wang |
阅读更多来源: ArXiv AI | 22-05-25
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
Authors: David Thulke, Jakob Kemmler, Christian Dugast, Hermann Ney |
阅读更多来源: ArXiv AI | 22-05-25
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
Authors: David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan |
阅读更多来源: ArXiv AI | 22-05-25
Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use
Authors: Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang |
阅读更多来源: ArXiv AI | 22-05-25
A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability
Authors: Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng |
阅读更多来源: ArXiv AI | 22-05-25
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement
Authors: Jilin Hu, Jianyu Zhang, Yongwang Zhao, Talia Ringer |
阅读更多来源: ArXiv AI | 22-05-25
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye |
阅读更多来源: ArXiv AI | 22-05-25
Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities
Authors: Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li |
阅读更多来源: ArXiv AI | 22-05-25
Multi-modal Integration Analysis of Alzheimer's Disease Using Large Language Models and Knowledge Graphs
Authors: Kanan Kiguchi, Yunhao Tu, Katsuhiro Ajito, Fady Alnajjar, Kazuyuki Murase |
阅读更多来源: ArXiv AI | 22-05-25
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Authors: Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang |
阅读更多来源: ArXiv AI | 22-05-25
Large Language Models as Computable Approximations to Solomonoff Induction
Authors: Jun Wan, Lingrui Mei |
阅读更多来源: ArXiv AI | 22-05-25
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Authors: Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, Yang Liu, Xin Xu, Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang |
阅读更多来源: ArXiv AI | 22-05-25
R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution
Authors: Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian |
阅读更多来源: ArXiv AI | 22-05-25
Self-Evolving Curriculum for LLM Reasoning
Authors: Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Gontier, Yoshua Bengio, Ehsan Kamalloo |
阅读更多来源: ArXiv AI | 22-05-25
lmgame-Bench: How Good are LLMs at Playing Games?
Authors: Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang |
阅读更多来源: ArXiv AI | 22-05-25
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji |
阅读更多来源: ArXiv AI | 22-05-25
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
Authors: Yassir Fathullah, Mark J. F. Gales |
阅读更多来源: ArXiv AI | 22-05-25
ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs
Authors: Bahar Radmehr, Ekaterina Shved, Fatma Betül Güreş, Adish Singla, Tanja Käser |
阅读更多来源: ArXiv AI | 22-05-25
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives
Authors: Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez |
阅读更多来源: ArXiv AI | 22-05-25
Microsoft Build 2025 showcases new AI agent tools and open interfaces for developers
阅读更多来源: The Decoder | 21-05-25
Large language models often struggle with decision-making — a new study explains why
阅读更多来源: The Decoder | 21-05-25
Deep Learning Is Applied Topology (theahura.substack.com)
阅读更多来源: Hacker News | 21-05-25
Watching AI drive Microsoft employees insane (reddit.com)
阅读更多来源: Hacker News | 21-05-25
Someone got an LLM running on a Commodore 64 from 1982, and it runs as well (xda-developers.com)
阅读更多来源: Hacker News | 21-05-25
5 Boring Things That Have a Bigger Impact Than AI Assistants on Dev Productivity (codemanship.wordpress.com)
阅读更多来源: Hacker News | 21-05-25
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery
Authors: Kun Li, Zhennan Wu, Shoupeng Wang, Wenbin Hu |
阅读更多来源: ArXiv AI | 21-05-25
Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning
Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim |
阅读更多来源: ArXiv AI | 21-05-25
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Authors: Qianyue Hao, Sibo Li, Jian Yuan, Yong Li |
阅读更多来源: ArXiv AI | 21-05-25
ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data
Authors: Xinzhe Zheng, Sijie Ji, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava |
阅读更多来源: ArXiv AI | 21-05-25
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Authors: Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu |
阅读更多来源: ArXiv AI | 21-05-25
Toward Embodied AGI: A Review of Embodied AI and the Road Ahead
Authors: Yequan Wang, Aixin Sun |
阅读更多来源: ArXiv AI | 21-05-25
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Authors: Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross |
阅读更多来源: ArXiv AI | 21-05-25
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
Authors: Maheep Chaudhary, Fazl Barez |
阅读更多来源: ArXiv AI | 21-05-25
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Authors: Zhaohui Yang, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang |
阅读更多来源: ArXiv AI | 21-05-25
Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach
Authors: Oren Sultan, Eitan Stern, Dafna Shahaf |
阅读更多来源: ArXiv AI | 21-05-25
Guarded Query Routing for Large Language Models
Authors: Richard Šléher, William Brach, Tibor Sloboda, Kristián Košťál, Lukas Galke |
阅读更多来源: ArXiv AI | 21-05-25
BACON: A fully explainable AI model with graded logic for decision making problems
Authors: Haishi Bai, Jozo Dujmovic, Jianwu Wang |
阅读更多来源: ArXiv AI | 21-05-25
Let LLMs Break Free from Overthinking via Self-Braking Tuning
Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang |
阅读更多来源: ArXiv AI | 21-05-25
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken |
阅读更多来源: ArXiv AI | 21-05-25
Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
Authors: Zihao Zhang, Fei Liu |
阅读更多来源: ArXiv AI | 21-05-25
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan |
阅读更多来源: ArXiv AI | 21-05-25
Google AI Ultra (blog.google)
阅读更多来源: Hacker News | 21-05-25
Ask HN: Conversational AI to Learn a Language
阅读更多来源: Hacker News | 21-05-25
US officials warn Apple's iPhone AI deal with Alibaba may boost China's AI sector
阅读更多来源: The Decoder | 20-05-25
Stability AI releases a compact open text-to-audio model that runs on mobile devices
阅读更多来源: The Decoder | 20-05-25
Japanese startup Sakana AI explores time-based thinking with brain-inspired AI model
阅读更多来源: The Decoder | 20-05-25
Google's AI answers are changing user behavior by sharply reducing clicks to websites
阅读更多来源: The Decoder | 20-05-25
Solving physics-based initial value problems with unsupervised machine learning (aps.org)
阅读更多来源: Hacker News | 20-05-25
Questioning Representational Optimism in Deep Learning (github.com/akarshkumar0101)
阅读更多来源: Hacker News | 20-05-25
Claude Code SDK (anthropic.com)
阅读更多来源: Hacker News | 20-05-25
The behavior of LLMs in hiring decisions: Systemic biases in candidate selection (davidrozado.substack.com)
阅读更多来源: Hacker News | 20-05-25
NeuroGen: Neural Network Parameter Generation via Large Language Models
Authors: Jiaqi Wang, Yusen Zhang, Xi Li |
阅读更多来源: ArXiv AI | 20-05-25
ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning
Authors: Edward Y. Chang, Longling Geng |
阅读更多来源: ArXiv AI | 20-05-25
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Authors: Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen |
阅读更多来源: ArXiv AI | 20-05-25
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Authors: Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian |
阅读更多来源: ArXiv AI | 20-05-25
Bullying the Machine: How Personas Increase LLM Vulnerability
Authors: Ziwei Xu, Udit Sanghi, Mohan Kankanhalli |
阅读更多来源: ArXiv AI | 20-05-25
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Authors: Zhuo Yang, Lingli Ge, Dong Han, Tianfan Fu, Yuqiang Li |
阅读更多来源: ArXiv AI | 20-05-25
Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs
Authors: Haruka Asanuma, Naoko Koide-Majima, Ken Nakamura, Takato Horii, Shinji Nishimoto, Masafumi Oizumi |
阅读更多来源: ArXiv AI | 20-05-25
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
Authors: Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang |
阅读更多来源: ArXiv AI | 20-05-25
From Grunts to Grammar: Emergent Language from Cooperative Foraging
Authors: Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang |
阅读更多来源: ArXiv AI | 20-05-25
LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs
Authors: Lars-Peter Meyer, Johannes Frey, Desiree Heim, Felix Brei, Claus Stadler, Kurt Junghanns, Michael Martin |
阅读更多来源: ArXiv AI | 20-05-25
CAIM: Development and Evaluation of a Cognitive AI Memory Framework for Long-Term Interaction with Intelligent Agents
Authors: Rebecca Westhäußer, Frederik Berenz, Wolfgang Minker, Sebastian Zepf |
阅读更多来源: ArXiv AI | 20-05-25
StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment
Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin |
阅读更多来源: ArXiv AI | 20-05-25
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities
Authors: Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward |
阅读更多来源: ArXiv AI | 20-05-25
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment
Authors: Siming Sun, Kai Zhang, Xuejun Jiang, Wenchao Meng, Qinmin Yang |
阅读更多来源: ArXiv AI | 20-05-25
Multi-Armed Bandits Meet Large Language Models
Authors: Djallel Bouneffouf, Raphael Feraud |
阅读更多来源: ArXiv AI | 20-05-25
Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems
Authors: Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa |
阅读更多来源: ArXiv AI | 20-05-25
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database
Authors: Rong Bian, Yu Geng, Zijian Yang, Bing Cheng |
阅读更多来源: ArXiv AI | 20-05-25
MIT says a high-profile AI productivity study used data that cannot be trusted
阅读更多来源: The Decoder | 20-05-25
OpenAI says GPT-5 is about doing everything better with "less model switching"
阅读更多来源: The Decoder | 20-05-25
Dilbert creator Scott Adams says he will die soon from same cancer as Joe Biden (thewrap.com)
阅读更多来源: Hacker News | 20-05-25
Remarks on AI from NZ (nealstephenson.substack.com)
阅读更多来源: Hacker News | 20-05-25
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
Authors: Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang |
阅读更多来源: ArXiv AI | 20-05-25
Disentangling Reasoning and Knowledge in Medical Large Language Models
Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou |
阅读更多来源: ArXiv AI | 20-05-25
LLMs unlock new paths to monetizing exploits
Authors: Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher A. Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski |
阅读更多来源: ArXiv AI | 20-05-25
Code-Driven Planning in Grid Worlds with Large Language Models
Authors: Ashwath Vaithinathan Aravindan, Zhisheng Tang, Mayank Kejriwal |
阅读更多来源: ArXiv AI | 20-05-25
Embodied AI in Machine Learning -- is it Really Embodied?
Authors: Matej Hoffmann, Shubhan Parag Patni |
阅读更多来源: ArXiv AI | 20-05-25
Interpretable Risk Mitigation in LLM Agent Systems
Authors: Jan Chojnacki |
阅读更多来源: ArXiv AI | 20-05-25
Modeling cognitive processes of natural reading with transformer-based Language Models
Authors: Bruno Bianchi, Fermín Travi, Juan E. Kamienkowski |
阅读更多来源: ArXiv AI | 20-05-25
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken |
阅读更多来源: ArXiv AI | 20-05-25
Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Authors: Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy |
阅读更多来源: ArXiv AI | 20-05-25
TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding
Authors: Achintha Wijesinghe, Weiwei Wang, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding |
阅读更多来源: ArXiv AI | 20-05-25
RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization
Authors: Haiyang Shen, Hang Yan, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma |
阅读更多来源: ArXiv AI | 20-05-25
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
Authors: Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan |
阅读更多来源: ArXiv AI | 20-05-25
Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining
Authors: Yu Shi, Yitong Duan, Jian Li |
阅读更多来源: ArXiv AI | 20-05-25
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Authors: Francesco Sovrano |
阅读更多来源: ArXiv AI | 20-05-25
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios
Authors: Mingxing Peng, Yuting Xie, Xusen Guo, Ruoyu Yao, Hai Yang, Jun Ma |
阅读更多来源: ArXiv AI | 20-05-25
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Authors: Zhangying Feng, Qianglong Chen, Ning Lu, Yongqian Li, Siqi Cheng, Shuangmu Peng, Duyu Tang, Shengcai Liu, Zhirui Zhang |
阅读更多来源: ArXiv AI | 20-05-25
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Authors: Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui |
阅读更多来源: ArXiv AI | 20-05-25
Anthropic is forced to apologize after Claude undercuts its legal team
阅读更多来源: The Decoder | 19-05-25
Show HN: I modeled the Voynich Manuscript with SBERT to test for structure (github.com/brianmg)
阅读更多来源: Hacker News | 19-05-25
Meta's Behemoth AI model delay signals struggles to match new paradigms
阅读更多来源: The Decoder | 19-05-25
Emergent social conventions and collective bias in LLM populations (science.org)
阅读更多来源: Hacker News | 19-05-25
Understanding Transformers via N-gram Statistics (arxiv.org)
阅读更多来源: Hacker News | 18-05-25
O2 VoLTE: locating any customer with a phone call (mastdatabase.co.uk)
阅读更多来源: Hacker News | 18-05-25
Emergence of Structure in Ensembles of Random Neural Networks
Authors: Luca Muscarnera, Luigi Loreti, Giovanni Todeschini, Alessio Fumagalli, Francesco Regazzoni |
阅读更多来源: ArXiv AI | 18-05-25
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Authors: Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong |
阅读更多来源: ArXiv AI | 18-05-25
ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks
Authors: Kai Sun, Peibo Duan, Levin Kuhlmann, Beilun Wang, Bin Zhang |
阅读更多来源: ArXiv AI | 18-05-25
Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding
Authors: Jianhao Huang, Qunsong Zeng, Kaibin Huang |
阅读更多来源: ArXiv AI | 18-05-25
Rethinking Repetition Problems of LLMs in Code Generation
Authors: Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |
阅读更多来源: ArXiv AI | 18-05-25
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
Authors: Pedro Orvalho, Marta Kwiatkowska |
阅读更多来源: ArXiv AI | 18-05-25
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning
Authors: Dechen Gao, Hang Wang, Hanchu Zhou, Nejib Ammar, Shatadal Mishra, Ahmadreza Moradipari, Iman Soltani, Junshan Zhang |
阅读更多来源: ArXiv AI | 18-05-25
PIF: Anomaly detection via preference embedding
Authors: Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi |
阅读更多来源: ArXiv AI | 18-05-25
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Authors: Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu |
阅读更多来源: ArXiv AI | 18-05-25
Neural Thermodynamic Laws for Large Language Model Training
Authors: Ziming Liu, Yizhou Liu, Jeff Gore, Max Tegmark |
阅读更多来源: ArXiv AI | 18-05-25
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini |
阅读更多来源: ArXiv AI | 18-05-25
Demystifying AI Agents: The Final Generation of Intelligence
Authors: Kevin J McNamara, Rhea Pritham Marpu |
阅读更多来源: ArXiv AI | 18-05-25
Leveraging Graph Retrieval-Augmented Generation to Support Learners' Understanding of Knowledge Concepts in MOOCs
Authors: Mohamed Abdelmagied, Mohamed Amine Chatti, Shoeb Joarder, Qurat Ul Ain, Rawaa Alatrash |
阅读更多来源: ArXiv AI | 18-05-25
Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
Authors: Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting |
阅读更多来源: ArXiv AI | 18-05-25
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
Authors: Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein, Anna V. Kononova |
阅读更多来源: ArXiv AI | 18-05-25
Soundcloud updates its AI training policy, but it's still unclear
阅读更多来源: The Decoder | 18-05-25
Geoffrey Hinton's wildly overconfident AI prediction failed—now it's a lesson in humility
阅读更多来源: The Decoder | 18-05-25
How 'The Little Prince' and AI help us better understand language development in the brain
阅读更多来源: The Decoder | 18-05-25
LLMs are more persuasive than incentivized human persuaders (arxiv.org)
阅读更多来源: Hacker News | 18-05-25
Unspoken Currency of Office Politics: Leverage and Sanction Between Coworkers (graphthinking.blogspot.com)
阅读更多来源: Hacker News | 18-05-25
Transformer neural net learns to run Conway's Game of Life just from examples (sidsite.com)
阅读更多来源: Hacker News | 17-05-25
I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA
阅读更多来源: Hacker News | 17-05-25
Show HN: Merliot – plugging physical devices into LLMs (github.com/merliot)
阅读更多来源: Hacker News | 17-05-25
A Research Preview of Codex (openai.com)
阅读更多来源: Hacker News | 17-05-25
MIT asks arXiv to withdraw preprint of paper on AI and scientific discovery (economics.mit.edu)
阅读更多来源: Hacker News | 17-05-25
Getting AI to write good SQL (cloud.google.com)
阅读更多来源: Hacker News | 17-05-25
Meta introduces OMol25 and UMA, new open AI tools for molecular research
阅读更多来源: The Decoder | 17-05-25
Anthropic is reportedly testing Claude models that can fix their own mistakes
阅读更多来源: The Decoder | 17-05-25
Will AI systems perform poorly due to AI-generated material in training data? (acm.org)
阅读更多来源: Hacker News | 17-05-25
U.S. is cracking down on Huawei's AI hardware while loosening its general export regulations
阅读更多来源: The Decoder | 16-05-25
After months of coding with LLMs, I'm going back to using my brain (albertofortin.com)
阅读更多来源: Hacker News | 16-05-25
The unreasonable effectiveness of an LLM agent loop with tool use (sketch.dev)
阅读更多来源: Hacker News | 16-05-25
Show HN: Min.js style compression of tech docs for LLM context (github.com/marv1nnnnn)
阅读更多来源: Hacker News | 16-05-25
Google brings Gemini AI to smartwatches, cars, TVs, and XR headsets
阅读更多来源: The Decoder | 15-05-25
OpenAI says its latest models outperform doctors in medical benchmark
阅读更多来源: The Decoder | 15-05-25
Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states
阅读更多来源: The Decoder | 15-05-25
Nvidia will supply advanced chips for Saudi Arabia’s Humain AI project
阅读更多来源: The Decoder | 15-05-25
GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks
Authors: Gabriel Cortês, Nuno Lourenço, Paolo Romano, Penousal Machado |
阅读更多来源: ArXiv AI | 15-05-25
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y.X. Wei |
阅读更多来源: ArXiv AI | 15-05-25
A 2D Semantic-Aware Position Encoding for Vision Transformers
Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi |
阅读更多来源: ArXiv AI | 15-05-25
Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment
Authors: Paul Tschisgale, Holger Maus, Fabian Kieser, Ben Kroehs, Stefan Petersen, Peter Wulff |
阅读更多来源: ArXiv AI | 15-05-25
Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors
Authors: Nicolas Dupuis, Ravi Nair, Shyam Ramji, Sean McClintock, Nishant Chauhan, Priyanka Nagpal, Bart Blaner, Ken Valk, Leon Stok, Ruchir Puri |
阅读更多来源: ArXiv AI | 15-05-25
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Authors: Nidhal Jegham, Marwen Abdelatti, Lassad Elmoubarki, Abdeltawab Hendawi |
阅读更多来源: ArXiv AI | 15-05-25
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Authors: Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir |
阅读更多来源: ArXiv AI | 15-05-25
Automated Meta Prompt Engineering for Alignment with the Theory of Mind
Authors: Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay |
阅读更多来源: ArXiv AI | 15-05-25
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
Authors: Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis |
阅读更多来源: ArXiv AI | 15-05-25
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Authors: Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi |
阅读更多来源: ArXiv AI | 15-05-25
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le |
阅读更多来源: ArXiv AI | 15-05-25
Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification
Authors: Adarsh Kumar, Hwiyoon Kim, Jawahar Sai Nathani, Neil Roy |
阅读更多来源: ArXiv AI | 15-05-25
Show HN: Muscle-Mem, a behavior cache for AI agents (github.com/pig-dot-dev)
阅读更多来源: Hacker News | 15-05-25
A server that wasn't meant to exist (dragas.net)
阅读更多来源: Hacker News | 15-05-25
LLMs get lost in multi-turn conversation (arxiv.org)
阅读更多来源: Hacker News | 15-05-25
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (deepmind.google)
阅读更多来源: Hacker News | 15-05-25
Launch HN: Jazzberry (YC X25) – AI agent for finding bugs
阅读更多来源: Hacker News | 15-05-25
Show HN: YapCards (iOS) – Voice-driven flashcards with AI feedback
阅读更多来源: Hacker News | 15-05-25
100 experts call for more research into the control of AI systems
阅读更多来源: The Decoder | 14-05-25
Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust) (github.com/helixdb)
阅读更多来源: Hacker News | 14-05-25
Build real-time knowledge graph for documents with LLM (cocoindex.io)
阅读更多来源: Hacker News | 14-05-25
EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs (github.com/em-llm)
阅读更多来源: Hacker News | 14-05-25
A Survey of Deep Learning for Complex Speech Spectrograms
Authors: Yuying Xie, Zheng-Hua Tan |
阅读更多来源: ArXiv AI | 14-05-25
Securing RAG: A Risk Assessment and Mitigation Framework
Authors: Lukas Ammann, Sara Ott, Christoph R. Landolt, Marco P. Lehmann |
阅读更多来源: ArXiv AI | 14-05-25
CodePDE: An Inference Framework for LLM-driven PDE Solver Generation
Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar |
阅读更多来源: ArXiv AI | 14-05-25
Winning at All Cost: A Small Environment for Eliciting Specification Gaming Behaviors in Large Language Models
Authors: Lars Malmqvist |
阅读更多来源: ArXiv AI | 14-05-25
Enhancing Trust Management System for Connected Autonomous Vehicles Using Machine Learning Methods: A Survey
Authors: Qian Xu, Lei Zhang, Yixiao Liu |
阅读更多来源: ArXiv AI | 14-05-25
The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic
Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga |
阅读更多来源: ArXiv AI | 14-05-25
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville |
阅读更多来源: ArXiv AI | 14-05-25
Decoding Neighborhood Environments with Large Language Models
Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li |
阅读更多来源: ArXiv AI | 14-05-25
Benchmarking AI scientists in omics data-driven biological research
Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang |
阅读更多来源: ArXiv AI | 14-05-25
Evaluating LLM Metrics Through Real-World Capabilities
Authors: Justin K Miller, Wenjia Tang |
阅读更多来源: ArXiv AI | 14-05-25
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Authors: Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu |
阅读更多来源: ArXiv AI | 14-05-25
Strategy-Augmented Planning for Large Language Models via Opponent Exploitation
Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang |
阅读更多来源: ArXiv AI | 14-05-25
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM
Authors: Nicholas Attolino, Alessio Capitanelli, Fulvio Mastrogiovanni |
阅读更多来源: ArXiv AI | 14-05-25
Guiding LLM-based Smart Contract Generation with Finite State Machine
Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang |
阅读更多来源: ArXiv AI | 14-05-25
Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach
Authors: Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang |
阅读更多来源: ArXiv AI | 14-05-25
LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju |
阅读更多来源: ArXiv AI | 14-05-25
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, Bin Xu, Jianghao Xu, Yiyang Yu, Zichuan Yang, Hongji Zha, Ruichong Zhang |
阅读更多来源: ArXiv AI | 14-05-25
OpenAI's chief scientist Jakub Pachocki says there is evidence that AI models discover novel insights
阅读更多来源: The Decoder | 14-05-25
Insurers launch cover for losses caused by AI chatbot errors (ft.com)
阅读更多来源: Hacker News | 14-05-25
Garbage collection of object storage at scale (warpstream.com)
阅读更多来源: Hacker News | 14-05-25
DeepSeek’s founder is threatening US dominance in AI race (bloomberg.com)
阅读更多来源: Hacker News | 14-05-25
Confident user prompts make LLMs more likely to hallucinate
阅读更多来源: The Decoder | 13-05-25
Stanford researchers find AI agents improve when guided by past successes
阅读更多来源: The Decoder | 13-05-25
Microsoft could sacrifice some OpenAI shares - but wants to secure access to AI technology
阅读更多来源: The Decoder | 13-05-25
HealthBench – An evaluation for AI systems and human health (openai.com)
阅读更多来源: Hacker News | 13-05-25
A conversation about AI for science with Jason Pruet (lanl.gov)
阅读更多来源: Hacker News | 13-05-25
A class of distributed automata that contains the modal mu-fragment
Authors: Veeti Ahvonen, Damian Heiman, Antti Kuusisto |
阅读更多来源: ArXiv AI | 13-05-25
Reliable Collaborative Conversational Agent System Based on LLMs and Answer Set Programming
Authors: Yankai Zeng, Gopal Gupta |
阅读更多来源: ArXiv AI | 13-05-25
KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
Authors: Yumou Wei, Paulo Carvalho, John Stamper |
阅读更多来源: ArXiv AI | 13-05-25
Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers
Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C.H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric |
阅读更多来源: ArXiv AI | 13-05-25
Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems
Authors: Sivasathivel Kandasamy |
阅读更多来源: ArXiv AI | 13-05-25
Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence
Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong |
阅读更多来源: ArXiv AI | 13-05-25
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering
Authors: Gaurab Sarkar, Sougata Saha |
阅读更多来源: ArXiv AI | 13-05-25
LLM-Augmented Chemical Synthesis and Design Decision Programs
Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang |
Source: ArXiv AI | 13-05-25
Explainable AI the Latest Advancements and New Trends
Authors: Bowen Long, Enjie Liu, Renxi Qiu, Yanqing Duan |
Source: ArXiv AI | 13-05-25
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
Authors: Yubo Shu, Zhewei Huang, Xin Wu, Chen Hu, Shuchang Zhou, Daxin Jiang |
Source: ArXiv AI | 13-05-25
Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA
Authors: Mahmood Mohassel Feghhi, Raya Majid Alsharfa, Majid Hameed Majeed |
Source: ArXiv AI | 13-05-25
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
Authors: Hanzheng Dai, Yuanliang Li, Zhibo Zhang, Jun Yan |
Source: ArXiv AI | 13-05-25
Architectural Precedents for General Agents using Large Language Models
Authors: Robert E. Wray, James R. Kirk, John E. Laird |
Source: ArXiv AI | 13-05-25
AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review
Authors: Zhiye Xie, Enmei Tu, Xianping Fu, Guoliang Yuan, Yi Han |
Source: ArXiv AI | 13-05-25
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Authors: Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng |
Source: ArXiv AI | 13-05-25
How well do LLMs reason over tabular data, really?
Authors: Cornelius Wolff, Madelon Hulsebos |
Source: ArXiv AI | 13-05-25
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Authors: Khurram Mazher, Saad Bin Nasir |
Source: ArXiv AI | 13-05-25
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models
Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen |
Source: ArXiv AI | 13-05-25
"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants
Authors: Jennifer Mondragon, Carlos Rubio-Medrano, Gael Cruz, Dvijesh Shastri |
Source: ArXiv AI | 13-05-25
Multi-Agent Systems for Robotic Autonomy with LLMs
Authors: Junhong Chen, Ziqi Yang, Haoyuan G Xu, Dandan Zhang, George Mylonas |
Source: ArXiv AI | 13-05-25
Evolutionary thoughts: integration of large language models and evolutionary algorithms
Authors: Antonio Jimeno Yepes, Pieter Barnard |
Source: ArXiv AI | 13-05-25
What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips
Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang |
Source: ArXiv AI | 13-05-25
AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song |
Source: ArXiv AI | 13-05-25
Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency
Authors: Xinyu Liang, Frits de Nijs, Buser Say, Hao Wang |
Source: ArXiv AI | 13-05-25
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Authors: Benjamin Raphael Ernhofer, Daniil Prokhorov, Jannica Langner, Dominik Bollmann |
Source: ArXiv AI | 13-05-25
IRNN: Innovation-driven Recurrent Neural Network for Time-Series Data Modeling and Prediction
Authors: Yifan Zhou, Yibo Wang, Chao Shang |
Source: ArXiv AI | 13-05-25
Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models
Authors: Jugal Gajjar, Kaustik Ranaware |
Source: ArXiv AI | 13-05-25
LLMs Outperform Experts on Challenging Biology Benchmarks
Authors: Lennart Justen |
Source: ArXiv AI | 13-05-25
UniSymNet: A Unified Symbolic Network Guided by Transformer
Authors: Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin |
Source: ArXiv AI | 13-05-25
The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review
Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying |
Source: ArXiv AI | 13-05-25
A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
Authors: Ryan Lagasse, Aidan Kiernans, Avijit Ghosh, Shiri Dori-Hacohen |
Source: ArXiv AI | 13-05-25
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Authors: Lennart Luettgau, Harry Coppock, Magda Dubois, Christopher Summerfield, Cozmin Ududec |
Source: ArXiv AI | 13-05-25
Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods
Authors: Markov Grey, Charbel-Raphaël Segerie |
Source: ArXiv AI | 13-05-25
Leveraging Large Language Models for enzymatic reaction prediction and characterization
Authors: Lorenzo Di Fruscia, Jana Marie Weber |
Source: ArXiv AI | 13-05-25
Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams
Authors: Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala |
Source: ArXiv AI | 13-05-25
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Authors: Azim Ospanov, Roozbeh Yousefzadeh |
Source: ArXiv AI | 13-05-25
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Authors: Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring |
Source: ArXiv AI | 13-05-25
Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs
Authors: Sam Bush, Matthew DeLorenzo, Phat Tieu, Jeyavijayan Rajendran |
Source: ArXiv AI | 13-05-25
Bytedance launches Agent TARS, an open-source AI automation agent
Source: The Decoder | 12-05-25
Google recaps how its LLMs could change in-game interactions
Source: The Decoder | 12-05-25
Five major obstacles are holding back RAG systems in healthcare
Source: The Decoder | 12-05-25
Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)
Source: Hacker News | 12-05-25
US Copyright Office found AI companies breach copyright. Its boss was fired (theregister.com)
Source: Hacker News | 12-05-25
Klarna changes its AI tune and again recruits humans for customer service (customerexperiencedive.com)
Source: Hacker News | 12-05-25
Avoiding AI is hard – but our freedom to opt out must be protected (theconversation.com)
Source: Hacker News | 12-05-25
Custom SIM card in Tesla Model 3 2024, Tesla Model Y 2025 and Cybertruck (olegkutkov.me)
Source: Hacker News | 12-05-25
OpenAI adds new fine-tuning options for o4-mini and GPT-4.1
Source: The Decoder | 11-05-25
Software Development Life Cycle Perspective: A Survey of Benchmarks for CodeLLMs and Agents
Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi |
Source: ArXiv AI | 11-05-25
T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction
Authors: Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Qian Li, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu |
Source: ArXiv AI | 11-05-25
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
Authors: Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger |
Source: ArXiv AI | 11-05-25
Incentive-Aware Machine Learning; Robustness, Fairness, Improvement & Causality
Authors: Chara Podimata |
Source: ArXiv AI | 11-05-25
High-fidelity Grain Growth Modeling: Leveraging Deep Learning for Fast Computations
Authors: Pungponhavoan Tep, Marc Bernacki |
Source: ArXiv AI | 11-05-25
Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks
Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghua Guo |
Source: ArXiv AI | 11-05-25
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao |
Source: ArXiv AI | 11-05-25
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Authors: Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang |
Source: ArXiv AI | 11-05-25
TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering
Authors: Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger |
Source: ArXiv AI | 11-05-25
Large Language Models are Autonomous Cyber Defenders
Authors: Sebastián R. Castro, Roberto Campbell, Nancy Lau, Octavio Villalobos, Jiaqi Duan, Alvaro A. Cardenas |
Source: ArXiv AI | 11-05-25
The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems
Authors: Sutapa Dey Tithi, Arun Kumar Ramesh, Clara DiMarco, Xiaoyi Tian, Nazia Alam, Kimia Fazeli, Tiffany Barnes |
Source: ArXiv AI | 11-05-25
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
Authors: Jaeho Kim, Yunseok Lee, Seulki Lee |
Source: ArXiv AI | 11-05-25
Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know
Authors: Shireen Kudukkil Manchingal, Fabio Cuzzolin |
Source: ArXiv AI | 11-05-25
A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons
Authors: Siyue Ren, Wanli Fu, Xinkun Zou, Chen Shen, Yi Cai, Chen Chu, Zhen Wang, Shuyue Hu |
Source: ArXiv AI | 11-05-25
Is there a half-life for the success rates of AI agents?
Authors: Toby Ord |
Source: ArXiv AI | 11-05-25
Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation
Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli |
Source: ArXiv AI | 11-05-25
A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
Authors: Stefanos Gkikas |
Source: ArXiv AI | 11-05-25
ZeroSearch: Alibaba trains search assistant in AI simulation
Source: The Decoder | 11-05-25
Show HN: Code Claude Code (github.com/rvca212)
Source: Hacker News | 11-05-25
LTXVideo 13B AI video generation (ltxv.video)
Source: Hacker News | 10-05-25
ChatGPT's user base expands while established web giants lose ground
Source: The Decoder | 10-05-25
Hugging Face unveils experimental AI agent for computers
Source: The Decoder | 10-05-25
OpenAI plans "cderGPT" for the US Food and Drug Administration (FDA)
Source: The Decoder | 10-05-25
Odin, a Pragmatic C Alternative with a Go Flavour (bitshifters.cc)
Source: Hacker News | 10-05-25
Fighting Unwanted Notifications with Machine Learning in Chrome (chromium.org)
Source: Hacker News | 10-05-25
Microsoft leverages Google's open A2A protocol for interoperable AI agents
Source: The Decoder | 09-05-25
A flat pricing subscription for Claude Code (anthropic.com)
Source: Hacker News | 09-05-25
Ciro (YC S22) is hiring a software engineer to build AI agents for sales (ycombinator.com)
Source: Hacker News | 09-05-25
Notes on rolling out Cursor and Claude Code (ghiculescu.substack.com)
Source: Hacker News | 09-05-25
OpenAI launches a program to partner with governments on global AI infrastructure
Source: The Decoder | 08-05-25
EU's leading AI startup Mistral unveils Medium 3 and Le Chat Enterprise
Source: The Decoder | 08-05-25
By 2026, most firms expect to have a Chief AI Officer on staff
Source: The Decoder | 08-05-25
Web search on the Anthropic API (anthropic.com)
Source: Hacker News | 08-05-25
Create and edit images with Gemini 2.0 in preview (googleblog.com)
Source: Hacker News | 08-05-25
Mistral ships Le Chat – enterprise AI assistant that can run on prem (mistral.ai)
Source: Hacker News | 08-05-25
Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning
Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia |
Source: ArXiv AI | 08-05-25
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Authors: Moseli Mots'oehli, Hope Mogale, Kyungim Baek |
Source: ArXiv AI | 08-05-25
Multi-Granular Attention based Heterogeneous Hypergraph Neural Network
Authors: Hong Jin, Kaicheng Zhou, Jie Yin, Lan You, Zhifeng Zhou |
Source: ArXiv AI | 08-05-25
Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing
Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan |
Source: ArXiv AI | 08-05-25
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu |
Source: ArXiv AI | 08-05-25
The Aloe Family Recipe for Open and Specialized Healthcare LLMs
Authors: Dario Garcia-Gasulla, Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés |
Source: ArXiv AI | 08-05-25
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Authors: Ziyi Zhang, Zhen Sun, Zongmin Zhang, Zifan Peng, Yuemeng Zhao, Zichun Wang, Zeren Luo, Ruiting Zuo, Xinlei He |
Source: ArXiv AI | 08-05-25
Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform
Authors: Yohannis Telila, Tommaso Cucinotta, Davide Bacciu |
Source: ArXiv AI | 08-05-25
Model-Based AI planning and Execution Systems for Robotics
Authors: Or Wertheim, Ronen I. Brafman |
Source: ArXiv AI | 08-05-25
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Authors: Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter, Raghav Awasthi, Soumya Banerjee, Joe M. Barnby, Rhea Basappa, Severin Bergsmann, Djallel Bouneffouf, Patrick Callaghan, Marc Cavazza, Thierry Chaminade, Sonia Chernova, Mohamed Chetouan, Moumita Choudhury, Axel Cleeremans, Jacek B. Cywinski, Fabio Cuzzolin, Hokin Deng, N'yoma Diamond, Camilla Di Pasquasio, Guillaume Dumas, Max van Duijn, Mahapatra Dwarikanath, Qingying Gao, Ashok Goel, Rebecca Goldstein, Matthew Gombolay, Gabriel Enrique Gonzalez, Amar Halilovic, Tobias Halmdienst, Mahimul Islam, Julian Jara-Ettinger, Natalie Kastel, Renana Keydar, Ashish K. Khanna, Mahdi Khoramshahi, JiHyun Kim, MiHyeon Kim, YoungBin Kim, Senka Krivic, Nikita Krasnytskyi, Arun Kumar, JuneHyoung Kwon, Eunju Lee, Shane Lee, Peter R. Lewis, Xue Li, Yijiang Li, Michal Lewandowski, Nathan Lloyd, Matthew B. Luebbers, Dezhi Luo, Haiyun Lyu, Dwarikanath Mahapatra, Kamal Maheshwari, Mallika Mainali, Piyush Mathur, Patrick Mederitsch, Shuwa Miura, Manuel Preston de Miranda, Reuth Mirsky, Shreya Mishra, Nina Moorman, Katelyn Morrison, John Muchovej, Bernhard Nessler, Felix Nessler, Hieu Minh Jord Nguyen, Abby Ortego, Francis A. Papay, Antoine Pasquali, Hamed Rahimi, Charumathi Raghu, Amanda Royka, Stefan Sarkadi, Jaelle Scheuerman, Simon Schmid, Paul Schrater, Anik Sen, Zahra Sheikhbahaee, Ke Shi, Reid Simmons, Nishant Singh, Mason O. Smith, Ramira van der Meulen, Anthia Solaki, Haoran Sun, Viktor Szolga, Matthew E. Taylor, Travis Taylor, Sanne Van Waveren, Juan David Vargas |
Source: ArXiv AI | 08-05-25
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Authors: Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, Pheng-Ann Heng |
Source: ArXiv AI | 08-05-25
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
Authors: Wenjun Cao |
Source: ArXiv AI | 08-05-25
The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete
Authors: Gerrit Großmann, Larisa Ivanova, Sai Leela Poduru, Mohaddeseh Tabrizian, Islam Mesabah, David A. Selby, Sebastian J. Vollmer |
Source: ArXiv AI | 08-05-25
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, Meiyi Ma |
Source: ArXiv AI | 08-05-25
TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution
Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park |
Source: ArXiv AI | 08-05-25
ChatGPT sees about 50 percent more use on weekdays than weekends
Source: The Decoder | 08-05-25
OpenAI restructures as public benefit corporation under non-profit control
Source: The Decoder | 08-05-25
Google upgrades Gemini 2.5 Pro for coding and app development
Source: The Decoder | 08-05-25
Wikidive – AI guided rabbitholes in Wikipedia (wikidive.tulv.in)
Source: Hacker News | 08-05-25
How to Average in Prolog (2017) (storytotell.org)
Source: Hacker News | 08-05-25
Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis
Authors: Fouad Trad, Ali Chehab |
Source: ArXiv AI | 07-05-25
An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation
Authors: Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo |
Source: ArXiv AI | 07-05-25
Blending 3D Geometry and Machine Learning for Multi-View Stereopsis
Authors: Vibhas Vats, Md. Alimoor Reza, David Crandall, Soon-heung Jung |
Source: ArXiv AI | 07-05-25
Rapid AI-based generation of coverage paths for dispensing applications
Authors: Simon Baeuerle, Ian F. Mendonca, Kristof Van Laerhoven, Ralf Mikut, Andreas Steimer |
Source: ArXiv AI | 07-05-25
LlamaFirewall: An open source guardrail system for building secure AI agents
Authors: Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, Joshua Saxe |
Source: ArXiv AI | 07-05-25
Holmes: Automated Fact Check with Large Language Models
Authors: Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang |
Source: ArXiv AI | 07-05-25
Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE
Authors: Brendan Campbell, Alan Williams, Kleio Baxevani, Alyssa Campbell, Rushabh Dhoke, Rileigh E. Hudock, Xiaomin Lin, Vivek Mange, Bernhard Neuberger, Arjun Suresh, Alhim Vera, Arthur Trembanis, Herbert G. Tanner, Edward Hale |
Source: ArXiv AI | 07-05-25
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
Authors: Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu |
Source: ArXiv AI | 07-05-25
Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces
Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Nicolas König, Felix Gehlhoff, Alexander Fay |
Source: ArXiv AI | 07-05-25
RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation
Authors: Tiantian Gan, Qiyao Sun |
Source: ArXiv AI | 07-05-25
Validating the Effectiveness of a Large Language Model-based Approach for Identifying Children's Development across Various Free Play Settings in Kindergarten
Authors: Yuanyuan Yang, Yuan Shen, Tianchen Sun, Yangbin Xie |
Source: ArXiv AI | 07-05-25
Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents
Authors: Schaun Wheeler, Olivier Jeunen |
Source: ArXiv AI | 07-05-25
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang |
Source: ArXiv AI | 07-05-25
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
Authors: Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan |
Source: ArXiv AI | 07-05-25
Graph Drawing for LLMs: An Empirical Evaluation
Authors: Walter Didimo, Fabrizio Montecchiani, Tommaso Piselli |
Source: ArXiv AI | 07-05-25
Accents in latent spaces: How AI hears accent strength in English (boldvoice.com)
Source: Hacker News | 07-05-25
Gemini 2.5 Pro Preview (googleblog.com)
Source: Hacker News | 07-05-25
Claude's system prompt is over 24k tokens with tools (github.com/asgeirtj)
Source: Hacker News | 07-05-25
OpenAI reaches agreement to buy Windsurf for $3B (bloomberg.com)
Source: Hacker News | 07-05-25
Show HN: Clippy – 90s UI for local LLMs (felixrieseberg.github.io)
Source: Hacker News | 07-05-25
I built an AI code review agent in a few hours, here's what I learned (sourcebot.dev)
Source: Hacker News | 07-05-25
A coherent European/non-US cloud strategy (berthub.eu)
Source: Hacker News | 07-05-25