Daily AI News by Homer

by Homer LYJIEBOX@QQ.COM

"No grace period, no pause": EU sticks to AI Act timeline despite industry pushback

Source: The Decoder | 07-07-25

ChatGPT usage for news surges as Google news searches decline

Source: The Decoder | 07-07-25

ChatGPT helped identify a genetic MTHFR mutation after a decade of missed diagnoses

Source: The Decoder | 07-07-25

The Maquet machine: how AI is reviving Alexandre Dumas' successful model

Source: The Decoder | 07-07-25

Alibaba's new GPT-4o competitor Qwen VLo is no longer open source

Source: The Decoder | 07-07-25

OpenAI's Head of Recruiting says Meta's hiring tactics "reek of desperation"

Source: The Decoder | 07-07-25

LLMs should not replace therapists (arxiv.org)

Source: Hacker News | 07-07-25

Opencode: AI coding agent, built for the terminal (github.com/sst)

Source: Hacker News | 07-07-25

A non-anthropomorphized view of LLMs (addxorrol.blogspot.com)

Source: Hacker News | 07-07-25

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Authors: Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo |

Source: ArXiv AI | 07-07-25

Early Signs of Steganographic Capabilities in Frontier LLMs

Authors: Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner |

Source: ArXiv AI | 07-07-25

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Authors: Ken Tsui |

Source: ArXiv AI | 07-07-25

SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

Authors: Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun |

Source: ArXiv AI | 07-07-25

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

Authors: Purbesh Mitra, Sennur Ulukus |

Source: ArXiv AI | 07-07-25

STELLA: Self-Evolving LLM Agent for Biomedical Research

Authors: Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong |

Source: ArXiv AI | 07-07-25

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs

Authors: Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates |

Source: ArXiv AI | 07-07-25

Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust

Authors: Amogh Mannekote, Adam Davies, Guohao Li, Kristy Elizabeth Boyer, ChengXiang Zhai, Bonnie J Dorr, Francesco Pinto |

Source: ArXiv AI | 07-07-25

Data Diversification Methods In Alignment Enhance Math Performance In LLMs

Authors: Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou |

Source: ArXiv AI | 07-07-25

What Neuroscience Can Teach AI About Learning in Continuously Changing Environments

Authors: Daniel Durstewitz, Bruno Averbeck, Georgia Koppe |

Source: ArXiv AI | 07-07-25

Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation

Authors: Jungkoo Kang |

Source: ArXiv AI | 07-07-25

OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent

Authors: Bowen Chen, Zhao Wang, Shingo Takamatsu |

Source: ArXiv AI | 07-07-25

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Authors: Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach |

Source: ArXiv AI | 07-07-25

Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory

Authors: Kenneth Payne, Baptiste Alloui-Cros |

Source: ArXiv AI | 07-07-25

Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

Authors: Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin |

Source: ArXiv AI | 07-07-25

Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work

Authors: Guangwei Zhang |

Source: ArXiv AI | 07-07-25

KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs

Authors: Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang |

Source: ArXiv AI | 07-07-25

Collatz's Ant and Σ(n) (gbragafibra.github.io)

Source: Hacker News | 07-07-25

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)

Source: Hacker News | 07-07-25

Mirage: AI-native UGC game engine powered by real-time world model (dynamicslab.ai)

Source: Hacker News | 07-07-25

Optimizing Tool Selection for LLM Workflows with Differentiable Programming (viksit.substack.com)

Source: Hacker News | 06-07-25

The force-feeding of AI features on an unwilling public (honest-broker.com)

Source: Hacker News | 06-07-25

A Canadian's AI hoax duped the media and propelled a 'band' to success (cbc.ca)

Source: Hacker News | 06-07-25

The Right Way to Embed an LLM in a Group Chat (tripjam.app)

Source: Hacker News | 06-07-25

Impact of PCIe 5.0 Bandwidth on GPU Content Creation and LLM Performance (pugetsystems.com)

Source: Hacker News | 05-07-25

Large Language Models Are Improving Exponentially (ieee.org)

Source: Hacker News | 05-07-25

SciArena lets scientists compare LLMs on real research questions

Source: The Decoder | 05-07-25

Google launches Veo 3 Fast worldwide, letting Gemini Pro users generate videos up to 720p

Source: The Decoder | 05-07-25

Gremllm (github.com/awwaiid)

Source: Hacker News | 05-07-25

ChatGPT creates phisher's paradise by serving the wrong URLs for major companies (theregister.com)

Source: Hacker News | 05-07-25

Version Control for AI Coding (branching.app)

Source: Hacker News | 05-07-25

Everything around LLMs is still magical and wishful thinking (dmitriid.com)

Source: Hacker News | 05-07-25

Meta reportedly offers top OpenAI researchers up to $300 million over four years

Source: The Decoder | 04-07-25

How AI on Microcontrollers Works: Operators and Kernels (danielmangum.com)

Source: Hacker News | 04-07-25

Show HN: I AI coded a tower defense game and documented the whole process (github.com/maciej-trebacz)

Source: Hacker News | 04-07-25

Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] (youtube.com)

Source: Hacker News | 04-07-25

About AI Evals (hamel.dev)

Source: Hacker News | 04-07-25

Manipulating trapped air bubbles in ice for message storage in cold regions (cell.com)

Source: Hacker News | 04-07-25

Cloudflare aims to save the World Wide Web by blocking AI crawlers without explicit consent

Source: The Decoder | 03-07-25

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

Authors: Zeyu Huang, Tianhao Cheng, Zihan Qiu, Zili Wang, Yinghui Xu, Edoardo M. Ponti, Ivan Titov |

Source: ArXiv AI | 03-07-25

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu |

Source: ArXiv AI | 03-07-25

Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

Authors: Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo |

Source: ArXiv AI | 03-07-25

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma |

Source: ArXiv AI | 03-07-25

Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture

Authors: Bochen Han, Songmao Zhang |

Source: ArXiv AI | 03-07-25

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

Authors: Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud |

Source: ArXiv AI | 03-07-25

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

Authors: Zhixun Chen, Ping Guo, Wenhan Han, Yifan Zhang, Binbin Liu, Haobin Lin, Fengze Liu, Yan Zhao, Bingni Zhang, Taifeng Wang, Yin Zheng, Meng Fang |

Source: ArXiv AI | 03-07-25

Enhanced Generative Model Evaluation with Clipped Density and Coverage

Authors: Nicolas Salvy, Hugues Talbot, Bertrand Thirion |

Source: ArXiv AI | 03-07-25

Empowering Manufacturers with Privacy-Preserving AI Tools: A Case Study in Privacy-Preserving Machine Learning to Solve Real-World Problems

Authors: Xiaoyu Ji, Jessica Shorland, Joshua Shank, Pascal Delpe-Brice, Latanya Sweeney, Jan Allebach, Ali Shakouri |

Source: ArXiv AI | 03-07-25

LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs

Authors: Reza Arabpour, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios |

Source: ArXiv AI | 03-07-25

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Authors: Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu |

Source: ArXiv AI | 03-07-25

End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

Authors: Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna |

Source: ArXiv AI | 03-07-25

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He |

Source: ArXiv AI | 03-07-25

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Authors: Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, Wanxiang Che |

Source: ArXiv AI | 03-07-25

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Authors: Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir |

Source: ArXiv AI | 03-07-25

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Authors: Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman |

Source: ArXiv AI | 03-07-25

Exploring a Hybrid Deep Learning Approach for Anomaly Detection in Mental Healthcare Provider Billing: Addressing Label Scarcity through Semi-Supervised Anomaly Detection

Authors: Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari |

Source: ArXiv AI | 03-07-25

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang |

Source: ArXiv AI | 03-07-25

Using multi-agent architecture to mitigate the risk of LLM hallucinations

Authors: Abd Elrahman Amer, Magdi Amer |

Source: ArXiv AI | 03-07-25

MindsDB (YC W20) is hiring an AI solutions engineer (greenhouse.io)

Source: Hacker News | 03-07-25

What to build instead of AI agents (decodingml.substack.com)

Source: Hacker News | 03-07-25

Meta founds Superintelligence Labs with top acquisitions from OpenAI and Google

Source: The Decoder | 02-07-25

Apple weighs abandoning its own AI for Siri as it tests models from OpenAI and Anthropic

Source: The Decoder | 02-07-25

HN Slop: AI startup ideas generated from Hacker News (josh.ing)

Source: Hacker News | 02-07-25

Show HN: A modern C++20 AI SDK (GPT‑4o, Claude 3.5, tool‑calling)

Source: Hacker News | 02-07-25

Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite Webpages (simedw.com)

Source: Hacker News | 02-07-25

Sam Altman Slams Meta's AI Talent Poaching: 'Missionaries Will Beat Mercenaries' (wired.com)

Source: Hacker News | 02-07-25

Hilbert's sixth problem: derivation of fluid equations via Boltzmann's theory (arxiv.org)

Source: Hacker News | 02-07-25

How large are large language models? (gist.github.com)

Source: Hacker News | 02-07-25

Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability

Authors: Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, Dave Farley |

Source: ArXiv AI | 02-07-25

HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Authors: Zhi Jing, Siyuan Yang, Jicong Ao, Ting Xiao, Yugang Jiang, Chenjia Bai |

Source: ArXiv AI | 02-07-25

Automated anatomy-based post-processing reduces false positives and improved interpretability of deep learning intracranial aneurysm detection

Authors: Jisoo Kim, Chu-Hsuan Lin, Alberto Ceballos-Arroyo, Ping Liu, Huaizu Jiang, Shrikanth Yadav, Qi Wan, Lei Qin, Geoffrey S Young |

Source: ArXiv AI | 02-07-25

CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs

Authors: Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim |

Source: ArXiv AI | 02-07-25

Many LLMs Are More Utilitarian Than One

Authors: Anita Keshmirian, Razan Baltaji, Babak Hemmatian, Hadi Asghari, Lav R. Varshney |

Source: ArXiv AI | 02-07-25

Deep learning-based segmentation of T1 and T2 cardiac MRI maps for automated disease detection

Authors: Andreea Bianca Popescu, Andreas Seitz, Heiko Mahrholdt, Jens Wetzl, Athira Jacob, Lucian Mihai Itu, Constantin Suciu, Teodora Chitiboi |

Source: ArXiv AI | 02-07-25

Stylometry recognizes human and LLM-generated texts in short samples

Authors: Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab |

Source: ArXiv AI | 02-07-25

Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications

Authors: Jindong Han, Yansong Ning, Zirui Yuan, Hang Ni, Fan Liu, Tengfei Lyu, Hao Liu |

Source: ArXiv AI | 02-07-25

Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona

Authors: Philip Colangelo, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao, Aroon Vijaykar, Chris Williams, Daniel C. Wilson, Zack MacFarland, Daniel Dreiling, Nathan Morey, Anuja Ratnayake, Baskar Vairamohan |

Source: ArXiv AI | 02-07-25

Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

Authors: Eun-Ji Park, Sangwon Yun |

Source: ArXiv AI | 02-07-25

TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables

Authors: Varun Mannam, Fang Wang, Chaochun Liu, Xin Chen |

Source: ArXiv AI | 02-07-25

Holistic Artificial Intelligence in Medicine; improved performance and explainability

Authors: Periklis Petridis, Georgios Margaritis, Vasiliki Stoumpou, Dimitris Bertsimas |

Source: ArXiv AI | 02-07-25

ChatGPT produces more "lazy" thinkers: Evidence of cognitive engagement decline

Authors: Georgios P. Georgiou |

Source: ArXiv AI | 02-07-25

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Authors: Maggie Huan, Yuetai Li, Tuney Zheng, Xiaoyu Xu, Seungone Kim, Minxin Du, Radha Poovendran, Graham Neubig, Xiang Yue |

Source: ArXiv AI | 02-07-25

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

Authors: Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park |

Source: ArXiv AI | 02-07-25

A Robust Algorithm for Non-IID Machine Learning Problems with Convergence Analysis

Authors: Qing Xu, Xiaohua Xuan |

Source: ArXiv AI | 02-07-25

Enhancing LLM Agent Safety via Causal Influence Prompting

Authors: Dongyoon Hahm, Woogyeol Jin, June Suk Choi, Sungsoo Ahn, Kimin Lee |

Source: ArXiv AI | 02-07-25

Google brings Gemini for Education and Gemini in Classroom AI tools to schools

Source: The Decoder | 02-07-25

Microsoft’s MAI-DxO boosts AI diagnostic accuracy and cuts costs by nearly 70 percent

Source: The Decoder | 02-07-25

The wanton destruction of a creative-tech era (greg.technology)

Source: Hacker News | 02-07-25

Building a Personal AI Factory (john-rush.com)

Source: Hacker News | 02-07-25

Show HN: Core – open source memory graph for LLMs – shareable, user owned (github.com/redplanethq)

Source: Hacker News | 02-07-25

After Meta's recruiting push, OpenAI tries to retain talent

Source: The Decoder | 01-07-25

Claude Code now supports hooks (anthropic.com)

Source: Hacker News | 01-07-25

GPEmu: A GPU emulator for rapid, low-cost deep learning prototyping [pdf] (vldb.org)

Source: Hacker News | 01-07-25

Cloudflare to introduce pay-per-crawl for AI bots (cloudflare.com)

Source: Hacker News | 01-07-25

Researchers Uncover Hidden Ingredients Behind AI Creativity (quantamagazine.org)

Source: Hacker News | 01-07-25

The new skill in AI is not prompting, it's context engineering (philschmid.de)

Source: Hacker News | 01-07-25

The hidden JTAG in a Qualcomm/Snapdragon device’s USB port (linaro.org)

Source: Hacker News | 01-07-25

Show HN: ToplingDB - A Persistent Key-Value Store for External Storage (github.com/topling)

Source: Hacker News | 01-07-25

The average chess players of Bletchley Park and AI research in Britain (blogs.bl.uk)

Source: Hacker News | 01-07-25

Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data: Benchmark on Two Mixed Training Strategies

Authors: Paul Wachter, Lukas Niehaus, Julius Schöning |

Source: ArXiv AI | 01-07-25

Bootstrapping Human-Like Planning via LLMs

Authors: David Porfirio, Vincent Hsiao, Morgan Fine-Morris, Leslie Smith, Laura M. Hiatt |

Source: ArXiv AI | 01-07-25

Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems

Authors: Michael Papademas, Xenia Ziouvelou, Antonis Troumpoukis, Vangelis Karkaletsis |

Source: ArXiv AI | 01-07-25

The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy

Authors: Rishi Bommasani |

Source: ArXiv AI | 01-07-25

Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study

Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit |

Source: ArXiv AI | 01-07-25

Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons

Authors: Chi Chiu So, Yueyue Sun, Jun-Min Wang, Siu Pang Yung, Anthony Wai Keung Loh, Chun Pong Chau |

Source: ArXiv AI | 01-07-25

Data Augmentation for Cognitive Behavioral Therapy: Leveraging ERNIE Language Models using Artificial Intelligence

Authors: Bosubabu Sambana, Kondreddygari Archana, Suram Indhra Sena Reddy, Shaik Meethaigar Jameer Basha, Shaik Karishma |

Source: ArXiv AI | 01-07-25

The Confidence Paradox: Can LLM Know When It's Wrong

Authors: Sahil Tripathi, Md Tabrez Nafis, Imran Hussain, Jiechao Gao |

Source: ArXiv AI | 01-07-25

CooT: Learning to Coordinate In-Context with Coordination Transformers

Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun |

Source: ArXiv AI | 01-07-25

ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data

Authors: Yu Zhang, Ruijie Yu, Jidong Tian, Feng Zhu, Jiapeng Liu, Xiaokang Yang, Yaohui Jin, Yanyan Xu |

Source: ArXiv AI | 01-07-25

Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays

Authors: Selin Dik, Osman Erdem, Mehmet Dik |

Source: ArXiv AI | 01-07-25

Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models

Authors: Maria Carolina Cornelia Wit, Jun Pang |

Source: ArXiv AI | 01-07-25

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

Authors: Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, Yuxin Song, Wenhao Wu, Dacheng Tao |

Source: ArXiv AI | 01-07-25

Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models

Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye |

Source: ArXiv AI | 01-07-25

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Authors: Christoph Schnabl, Daniel Hugenroth, Bill Marino, Alastair R. Beresford |

Source: ArXiv AI | 01-07-25

A New Perspective On AI Safety Through Control Theory Methodologies

Authors: Lars Ullrich, Walter Zimmer, Ross Greer, Knut Graichen, Alois C. Knoll, Mohan Trivedi |

Source: ArXiv AI | 01-07-25

Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik |

Source: ArXiv AI | 01-07-25

Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice

Authors: Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi |

Source: ArXiv AI | 01-07-25

AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models

Authors: Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, Deepika Raman |

Source: ArXiv AI | 01-07-25

Harnessing AI Agents to Advance Research on Refugee Child Mental Health

Authors: Aditya Shrivastava, Komal Gupta, Shraddha Arora |

Source: ArXiv AI | 01-07-25

OpenAI loses four more top researchers to Meta as even its own engineers call it a "huge loss"

Source: The Decoder | 01-07-25

Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com/runzhouye)

Source: Hacker News | 01-07-25

Show HN: We're two coffee nerds who built an AI app to track beans and recipes (beanbook.app)

Source: Hacker News | 01-07-25

Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken (github.com/m4thyou)

Source: Hacker News | 01-07-25

There are no new ideas in AI only new datasets (jxmo.io)

Source: Hacker News | 01-07-25

OmniGen 2 blends image and text generation like GPT-4o, but is open source

Source: The Decoder | 30-06-25

Gridfinity: The modular, open-source grid storage system (gridfinity.xyz)

Source: Hacker News | 30-06-25

Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics

Authors: Michael A. Riegler, Kristoffer Herland Hellton, Vajira Thambawita, Hugo L. Hammer |

Source: ArXiv AI | 30-06-25

Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit

Authors: Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling |

Source: ArXiv AI | 30-06-25

Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses

Authors: Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis |

Source: ArXiv AI | 30-06-25

Transformers are Graph Neural Networks

Authors: Chaitanya K. Joshi |

Source: ArXiv AI | 30-06-25

Autonomic Microservice Management via Agentic AI and MAPE-K Integration

Authors: Matteo Esposito, Alexander Bakhtin, Noman Ahmad, Mikel Robredo, Ruoyu Su, Valentina Lenarduzzi, Davide Taibi |

Source: ArXiv AI | 30-06-25

CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks

Authors: Tao Liu, Longlong Lin, Yunfeng Yu, Xi Ou, Youan Zhang, Zhiqiu Ye, Tao Jia |

Source: ArXiv AI | 30-06-25

Projected Compression: Trainable Projection for Efficient Transformer Compression

Authors: Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski |

Source: ArXiv AI | 30-06-25

From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications

Authors: Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania |

Source: ArXiv AI | 30-06-25

Concept-Level AI for Telecom: Moving Beyond Large Language Models

Authors: Viswanath Kumarskandpriya, Abdulhalim Dandoush, Abbas Bradai, Ali Belgacem |

Source: ArXiv AI | 30-06-25

A Deep Learning framework for building damage assessment using VHR SAR and geospatial data: demonstration on the 2023 Turkiye Earthquake

Authors: Luigi Russo, Deodato Tapete, Silvia Liberata Ullo, Paolo Gamba |

Source: ArXiv AI | 30-06-25

CLoVE: Personalized Federated Learning through Clustering of Loss Vector Embeddings

Authors: Randeep Bhatia, Nikos Papadis, Murali Kodialam, TV Lakshman, Sayak Chakrabarty |

Source: ArXiv AI | 30-06-25

QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization

Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh |

Source: ArXiv AI | 30-06-25

MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models

Authors: Yifan Liu, Xishun Liao, Haoxuan Ma, Jonathan Liu, Rohan Jadhav, Jiaqi Ma |

Source: ArXiv AI | 30-06-25

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

Authors: Wanxin Tian, Shijie Zhang, Kevin Zhang, Xiaowei Chi, Yulin Luo, Junyu Lu, Chunkai Fan, Qiang Zhou, Yiming Zhao, Ning Liu Siyu Lin, Zhiyuan Qin, Xiaozhu Ju, Shanghang Zhang, Jian Tang |

Source: ArXiv AI | 30-06-25

CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation

Authors: Nicolas Bougie, Narimasa Watanabe |

Source: ArXiv AI | 30-06-25

A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety

Authors: Camille François, Ludovic Péran, Ayah Bdeir, Nouha Dziri, Will Hawkins, Yacine Jernite, Sayash Kapoor, Juliet Shen, Heidy Khlaaf, Kevin Klyman, Nik Marda, Marie Pellat, Deb Raji, Divya Siddarth, Aviya Skowron, Joseph Spisak, Madhulika Srikumar, Victor Storchan, Audrey Tang, Jen Weedon |

Source: ArXiv AI | 30-06-25

Query as Test: An Intelligent Driving Test and Data Storage Method for Integrated Cockpit-Vehicle-Road Scenarios

Authors: Shengyue Yao, Runqing Guo, Yangyang Qin, Miangbing Meng, Jipeng Cao, Yilun Lin, Yisheng Lv, Fei-Yue Wang |

Source: ArXiv AI | 30-06-25

Embodied AI Agents: Modeling the World

Authors: Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, Jitendra Malik |

Source: ArXiv AI | 30-06-25

AI Model Passport: Data and System Traceability Framework for Transparent AI in Health

Authors: Varvara Kalokyri, Nikolaos S. Tachos, Charalampos N. Kalantzopoulos, Stelios Sfakianakis, Haridimos Kondylakis, Dimitrios I. Zaridis, Sara Colantonio, Daniele Regge, Nikolaos Papanikolaou, The ProCAncer-I consortium, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis |

Source: ArXiv AI | 30-06-25

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

Authors: Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach |

Source: ArXiv AI | 30-06-25

Anthropic's Claude ran a store and lost money by selling below cost and giving discounts

Source: The Decoder | 30-06-25

Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted) (llmapitest.com)

Source: Hacker News | 30-06-25

US Senate moves to block state AI laws for five years if states take broadband funds

Source: The Decoder | 30-06-25

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale (ubicloud.com)

Source: Hacker News | 29-06-25

Magnetic Tape Storage Technology: usage, history, and future outlook (acm.org)

Source: Hacker News | 29-06-25

Show HN: A different kind of AI Video generation

Source: Hacker News | 29-06-25

Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

Authors: He Li, Haoang Chi, Mingyu Liu, Wanrong Huang, Liyang Xu, Wenjing Yang |

Source: ArXiv AI | 29-06-25

Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks

Authors: Isaac Chung, Imene Kerboua, Marton Kardos, Roman Solomatin, Kenneth Enevoldsen |

Source: ArXiv AI | 29-06-25

A Hierarchical Deep Learning Approach for Minority Instrument Detection

Authors: Dylan Sechet, Francesca Bugiotti, Matthieu Kowalski, Edouard d'Hérouville, Filip Langiewicz |

Source: ArXiv AI | 29-06-25

$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models

Authors: Quanming Liu, Xupeng Bu, Zhichao Yan, Ru Li |

Source: ArXiv AI | 29-06-25

Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage

Authors: Gavin Lee Goodship, Luis Miralles-Pechuan, Stephen O'Sullivan |

Source: ArXiv AI | 29-06-25

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection

Authors: Ali Şenol, Garima Agrawal, Huan Liu |

Source: ArXiv AI | 29-06-25

Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

Authors: Colin Samplawski, Adam D. Cobb, Manoj Acharya, Ramneet Kaur, Susmit Jha |

Source: ArXiv AI | 29-06-25

Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation

Authors: Guanting Dong, Xiaoxi Li, Yuyao Zhang, Mengjie Deng |

Source: ArXiv AI | 29-06-25

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Authors: Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal |

Source: ArXiv AI | 29-06-25

Potemkin Understanding in Large Language Models

Authors: Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan |

Source: ArXiv AI | 29-06-25

The Singapore Consensus on Global AI Safety Research Priorities

Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, Agnes Delaborde, Nouha Dziri, Francisco Eiras, Joshua Engels, Jinyu Fan, Adam Gleave, Noah Goodman, Fynn Heide, Dan Hendrycks, Cyrus Hodes, Bryan Low Kian Hsiang, Minlie Huang, Sami Jawhar, Wang Jingyu, Adam Tauman Kalai, Meindert Kamphuis, Mohan Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, Thomas Kwa, Jeffrey Ladish, Kwok-Yan Lam, Wan Lee Sie, Taewhi Lee, Xiaojian Li, Jiajun Liu, Chaochao Lu, Yifan Mai, Richard Mallah, Julian Michael, Nick Moës, Simon Möller, Kihyuk Nam, Kwan Yee Ng, Mark Nitzberg, Besmira Nushi, Seán O hÉigeartaigh, Alejandro Ortega, Pierre Peigné, James Petrie, Benjamin Prud'Homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, Buck Shlegeris, Saad Siddiqui, Aradhana Sinha, Martín Soto, Cheston Tan, Dong Ting, Robert Trager, Brian Tse, Anthony Tung K. H., Vanessa Wilfred, John Willes, Denise Wong, Wei Xu, Rongwu Xu, Yi Zeng, HongJiang Zhang, Djordje Žikelić |

Read more

Source: ArXiv AI | 29-06-25

Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications

Authors: Xinye Tang, Haijun Zhai, Chaitanya Belwal, Vineeth Thayanithi, Philip Baumann, Yogesh K Roy |

Read more

Source: ArXiv AI | 29-06-25

Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Authors: Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han |

Read more

Source: ArXiv AI | 29-06-25

Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation

Authors: Chenkai Sun, Denghui Zhang, ChengXiang Zhai, Heng Ji |

Read more

Source: ArXiv AI | 29-06-25

Active Inference AI Systems for Scientific Discovery

Authors: Karthik Duraisamy |

Read more

Source: ArXiv AI | 29-06-25

IXAII: An Interactive Explainable Artificial Intelligence Interface for Decision Support Systems

Authors: Pauline Speckmann, Mario Nadj, Christian Janiesch |

Read more

Source: ArXiv AI | 29-06-25

Microsoft’s Braga AI chip faces six-month delay, trails Nvidia’s Blackwell

Read more

Source: The Decoder | 29-06-25

OpenAI renting Google TPUs sends a strong warning shot to Microsoft

Read more

Source: The Decoder | 29-06-25

Meta CTO confirms massive offers for top AI executives

Read more

Source: The Decoder | 29-06-25

Show HN: AGL a toy language that compiles to Go (github.com/alaingilbert)

Read more

Source: Hacker News | 29-06-25

LLMs bring new nature of abstraction – up and sideways (martinfowler.com)

Read more

Source: Hacker News | 28-06-25

Facebook is starting to feed its AI with private, unpublished photos (theverge.com)

Read more

Source: Hacker News | 28-06-25

SymbolicAI: A neuro-symbolic perspective on LLMs (github.com/extensityai)

Read more

Source: Hacker News | 28-06-25

Lossless LLM 3x Throughput Increase by LMCache (github.com/lmcache)

Read more

Source: Hacker News | 28-06-25

AlphaGenome: AI for Better Understanding the Genome (deepmind.google)

Read more

Source: Hacker News | 28-06-25

Google launches Gemma 3n, a multimodal AI model built for real-time use on mobile devices

Read more

Source: The Decoder | 28-06-25

Project Vend: Can Claude run a small shop? (And why does that matter?) (anthropic.com)

Read more

Source: Hacker News | 28-06-25

Theoretical Analysis of Positional Encodings in Transformer Models (arxiv.org)

Read more

Source: Hacker News | 28-06-25

Spark AI (YC W24) is hiring a full-stack engineer in SF (founding team) (ycombinator.com)

Read more

Source: Hacker News | 28-06-25

Microsoft is reportedly barred from building its own AGI until 2030 under its contract with OpenAI

Read more

Source: The Decoder | 27-06-25

Meta poaches three top AI researchers from OpenAI, who had poached them from Deepmind

Read more

Source: The Decoder | 27-06-25

Show HN: Magnitude – Open-source AI browser automation framework (github.com/magnitudedev)

Read more

Source: Hacker News | 27-06-25

Launch HN: Issen (YC F24) – Personal AI language tutor

Read more

Source: Hacker News | 27-06-25

What did former CTO Mira Murati see at OpenAI that made her choose custom models over AGI

Read more

Source: The Decoder | 27-06-25

Show HN: I built an AI dataset generator (github.com/metabase)

Read more

Source: Hacker News | 27-06-25

Researchers train AI to generate long-form text using only reinforcement learning

Read more

Source: The Decoder | 26-06-25

Google Deepmind makes robots independent of the cloud with Gemini On-Device

Read more

Source: The Decoder | 26-06-25

Anthropic won a fair use hearing that could end up being a defeat

Read more

Source: The Decoder | 26-06-25

Google releases open-source Gemini CLI to bring Gemini AI into developer workflows

Read more

Source: The Decoder | 26-06-25

Automatic Demonstration Selection for LLM-based Tabular Data Classification

Authors: Shuchu Han, Wolfgang Bruckner |

Read more

Source: ArXiv AI | 26-06-25

SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models

Authors: Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi |

Read more

Source: ArXiv AI | 26-06-25

WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads

Authors: Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang |

Read more

Source: ArXiv AI | 26-06-25

Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Authors: Soumya Madireddy, Lu Gao, Zia Din, Kinam Kim, Ahmed Senouci, Zhe Han, Yunpeng Zhang |

Read more

Source: ArXiv AI | 26-06-25

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs

Authors: Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker |

Read more

Source: ArXiv AI | 26-06-25

AI in the Writing Process: How Purposeful AI Support Fosters Student Writing

Authors: Momin N. Siddiqui, Roy Pea, Hari Subramonyam |

Read more

Source: ArXiv AI | 26-06-25

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman |

Read more

Source: ArXiv AI | 26-06-25

Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

Authors: Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao, Hélio Lopes, Marcos Kalinowski |

Read more

Source: ArXiv AI | 26-06-25

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

Authors: Saloni Dash, Amélie Reymond, Emma S. Spiro, Aylin Caliskan |

Read more

Source: ArXiv AI | 26-06-25

Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models

Authors: Zechun Deng, Ziwei Liu, Ziqian Bi, Junhao Song, Chia Xin Liang, Joe Yeong, Junfeng Hao |

Read more

Source: ArXiv AI | 26-06-25

Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks

Authors: Konstantinos Vrettos, Michail E. Klontzas |

Read more

Source: ArXiv AI | 26-06-25

QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges

Authors: Abdul Basit, Minghao Shao, Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique |

Read more

Source: ArXiv AI | 26-06-25

Enterprise Large Language Model Evaluation Benchmark

Authors: Liya Wang, David Yi, Damien Jose, John Passarelli, James Gao, Jordan Leventis, Kang Li |

Read more

Source: ArXiv AI | 26-06-25

DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction

Authors: Weijieying Ren, Tianxiang Zhao, Lei Wang, Tianchun Wang, Vasant Honavar |

Read more

Source: ArXiv AI | 26-06-25

Smart Ride and Delivery Services with Electric Vehicles: Leveraging Bidirectional Charging for Profit Optimisation

Authors: Jinchun Du, Bojie Shen, Muhammad Aamir Cheema, Adel N. Toosi |

Read more

Source: ArXiv AI | 26-06-25

Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios

Authors: Wenbin Gan, Minh-Son Dao, Koji Zettsu |

Read more

Source: ArXiv AI | 26-06-25

CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video

Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam |

Read more

Source: ArXiv AI | 26-06-25

Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges

Authors: Alexander D. Kalian, Jaewook Lee, Stefan P. Johannesson, Lennart Otte, Christer Hogstrand, Miao Guo |

Read more

Source: ArXiv AI | 26-06-25

Towards Community-Driven Agents for Machine Learning Engineering

Authors: Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang |

Read more

Source: ArXiv AI | 26-06-25

LLM code generation may lead to an erosion of trust (jaysthoughts.com)

Read more

Source: Hacker News | 26-06-25

Define policy forbidding use of AI code generators (github.com/qemu)

Read more

Source: Hacker News | 26-06-25

Build and Host AI-Powered Apps with Claude – No Deployment Needed (anthropic.com)

Read more

Source: Hacker News | 26-06-25

Structured Output with LangChain and Llamafile (brakmic.com)

Read more

Source: Hacker News | 26-06-25

OpenAI charges by the minute, so speed up your audio (mand.is)

Read more

Source: Hacker News | 26-06-25

Learnings from Building AI Agents (cubic.dev)

Read more

Source: Hacker News | 26-06-25

Gemini CLI (blog.google)

Read more

Source: Hacker News | 26-06-25

Google hands off Agent2Agent protocol to Linux Foundation for open AI agent standard

Read more

Source: The Decoder | 26-06-25

LLM Hallucinations in Practical Code Generation (acm.org)

Read more

Source: Hacker News | 26-06-25

FurtherAI (YC W24) Is Hiring for Software and AI Roles (ycombinator.com)

Read more

Source: Hacker News | 26-06-25

Disney is in talks with OpenAI about possible partnerships involving its characters

Read more

Source: The Decoder | 25-06-25

Microsoft has introduced an AI agent to the Windows Settings menu

Read more

Source: The Decoder | 25-06-25

AI job postings on LinkedIn grew sixfold as AI skill additions to profiles soared twentyfold

Read more

Source: The Decoder | 25-06-25

African and South American countries are almost entirely excluded from global AI development

Read more

Source: The Decoder | 25-06-25

ChatGPT's enterprise success against Copilot fuels OpenAI/Microsoft rivalry (bloomberg.com)

Read more

Source: Hacker News | 25-06-25

Thoughts on Asunción, Paraguay (cpsi.media)

Read more

Source: Hacker News | 25-06-25

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao |

Read more

Source: ArXiv AI | 25-06-25

Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis

Authors: Omar A.Essameldin, Ali O.Elbeih, Wael H.Gomaa, Wael F.Elsersy |

Read more

Source: ArXiv AI | 25-06-25

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |

Read more

Source: ArXiv AI | 25-06-25

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai |

Read more

Source: ArXiv AI | 25-06-25

Do LLMs Know When to Flip a Coin? Strategic Randomization through Reasoning and Experience

Authors: Lingyu Yang (Shanghai Jiao Tong University) |

Read more

Source: ArXiv AI | 25-06-25

A standard transformer and attention with linear biases for molecular conformer generation

Authors: Viatcheslav Gurev, Timothy Rumbell |

Read more

Source: ArXiv AI | 25-06-25

Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach

Authors: Feiting Yang, Antoine Moevus, Steve Lévesque |

Read more

Source: ArXiv AI | 25-06-25

RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1

Authors: Yu Xie, Xingkai Ren, Ying Qi, Yao Hu, Lianlei Shan |

Read more

Source: ArXiv AI | 25-06-25

Spiritual-LLM : Gita Inspired Mental Health Therapy In the Era of LLMs

Authors: Janak Kapuriya, Aman Singh, Jainendra Shukla, Rajiv Ratn Shah |

Read more

Source: ArXiv AI | 25-06-25

Baba is LLM: Reasoning in a Game with Dynamic Rules

Authors: Fien van Wetten, Aske Plaat, Max van Duijn |

Read more

Source: ArXiv AI | 25-06-25

Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics

Authors: Ziqi Zhu, Tao Hu, Honglong Zhang, Dan Yang, HanGeng Chen, Mengran Zhang, Xilun Chen |

Read more

Source: ArXiv AI | 25-06-25

FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring

Authors: Hyein Seo, Taewook Hwang, Yohan Lee, sangkeun Jung |

Read more

Source: ArXiv AI | 25-06-25

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Authors: Liang Zeng, Yongcong Li, Yuzhen Xiao, Changshi Li, Chris Yuhao Liu, Rui Yan, Tianwen Wei, Jujie He, Xuchen Song, Yang Liu, Yahui Zhou |

Read more

Source: ArXiv AI | 25-06-25

Interpretable Hybrid Machine Learning Models Using FOLD-R++ and Answer Set Programming

Authors: Sanne Wielinga, Jesse Heyninck |

Read more

Source: ArXiv AI | 25-06-25

NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons

Authors: Carlo Romeo, Andrew D. Bagdanov |

Read more

Source: ArXiv AI | 25-06-25

KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models

Authors: Cheng Li, Jiexiong Liu, Yixuan Chen, Qihang Zhou, KunLun Meta |

Read more

Source: ArXiv AI | 25-06-25

From memories to maps: Mechanisms of in context reinforcement learning in transformers

Authors: Ching Fang, Kanaka Rajan |

Read more

Source: ArXiv AI | 25-06-25

LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis

Authors: Lei Kang, Xuanshuo Fu, Oriol Ramos Terrades, Javier Vazquez-Corral, Ernest Valveny, Dimosthenis Karatzas |

Read more

Source: ArXiv AI | 25-06-25

Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning

Authors: Guo Li, Zixiang Xu, Wei Zhang, Yikuan Hu, Xinyu Yang, Nikolay Aristov, Mingjie Tang, Elenna R Dugundji |

Read more

Source: ArXiv AI | 25-06-25

JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning

Authors: Ai Han, Junxing Hu, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zicheng Zhang |

Read more

Source: ArXiv AI | 25-06-25

Gemini Robotics On-Device brings AI to local robotic devices (deepmind.google)

Read more

Source: Hacker News | 25-06-25

Mapping LLMs over excel saved my passion for game dev (weblog.lol)

Read more

Source: Hacker News | 25-06-25

Blackmail becomes go-to strategy for AI models facing shutdown in new Anthropic tests

Read more

Source: The Decoder | 24-06-25

'Dragon prince' dinosaur discovery 'rewrites' T.rex family tree (bbc.com)

Read more

Source: Hacker News | 24-06-25

From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases

Authors: Yao Zhang, Zaixi Shang, Silpan Patel, Mikel Zuniga |

Read more

Source: ArXiv AI | 24-06-25

OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections

Authors: Manasa Bharadwaj, Nikhil Verma, Kevin Ferreira |

Read more

Source: ArXiv AI | 24-06-25

Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation

Authors: Hao Guan, David Bates, Li Zhou |

Read more

Source: ArXiv AI | 24-06-25

Resource Rational Contractualism Should Guide AI Alignment

Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel |

Read more

Source: ArXiv AI | 24-06-25

Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges

Authors: Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Shuai Wang |

Read more

Source: ArXiv AI | 24-06-25

Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown

Authors: Bowen Wang |

Read more

Source: ArXiv AI | 24-06-25

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

Authors: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra |

Read more

Source: ArXiv AI | 24-06-25

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu |

Read more

Source: ArXiv AI | 24-06-25

Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms

Authors: Cheng Ji, Huaiying Luo |

Read more

Source: ArXiv AI | 24-06-25

A Conceptual Framework for AI Capability Evaluations

Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, María Vanina Martinez, Gerardo Simari |

Read more

Source: ArXiv AI | 24-06-25

Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance

Authors: Yu Han, Aaron Ceross, Jeroen H.M. Bergmann |

Read more

Source: ArXiv AI | 24-06-25

How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models

Authors: Feng He, Zhenyang Liu, Marco Valentino, Zhixue Zhao |

Read more

Source: ArXiv AI | 24-06-25

A Large Language Model-based Multi-Agent Framework for Analog Circuits' Sizing Relationships Extraction

Authors: Chengjie Liu, Weiyu Chen, Huiyao Xu, Yuan Du, Jun Yang, Li Du |

Read more

Source: ArXiv AI | 24-06-25

T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent

Authors: Hong Qing Yu |

Read more

Source: ArXiv AI | 24-06-25

A Question Bank to Assess AI Inclusivity: Mapping out the Journey from Diversity Errors to Inclusion Excellence

Authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano |

Read more

Source: ArXiv AI | 24-06-25

AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs

Authors: Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko |

Read more

Source: ArXiv AI | 24-06-25

TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation

Authors: Kamil Szczepanik, Jarosław A. Chudziak |

Read more

Source: ArXiv AI | 24-06-25

Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis |

Read more

Source: ArXiv AI | 24-06-25

Steering Conceptual Bias via Transformer Latent-Subspace Activation

Authors: Vansh Sharma, Venkat Raman |

Read more

Source: ArXiv AI | 24-06-25

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Sedigheh Eslami, Scott Martens, Bo Wang, Nan Wang, Han Xiao |

Read more

Source: ArXiv AI | 24-06-25

Show HN: Pickaxe – A TypeScript library for building AI agents (github.com/hatchet-dev)

Read more

Source: Hacker News | 24-06-25

Judge denies creating “mass surveillance program” harming all ChatGPT users (arstechnica.com)

Read more

Source: Hacker News | 24-06-25

GitHub CEO: manual coding remains key despite AI boom (techinasia.com)

Read more

Source: Hacker News | 24-06-25

Sakana AI's ALE AI agent cracks the top 21 among 1,000 code experts

Read more

Source: The Decoder | 23-06-25

Apple executives have held internal discussions about potentially bidding for AI startup Perplexity

Read more

Source: The Decoder | 23-06-25

Nano-Vllm: lightweight vLLM implementation built from scratch (github.com/geeeekexplorer)

Read more

Source: Hacker News | 23-06-25

Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone

Read more

Source: Hacker News | 23-06-25

Claude Code for VSCode (visualstudio.com)

Read more

Source: Hacker News | 23-06-25

Facial Landmark Visualization and Emotion Recognition Through Neural Networks

Authors: Israel Juárez-Jiménez, Tiffany Guadalupe Martínez Paredes, Jesús García-Ramírez, Eric Ramos Aguilar |

Read more

Source: ArXiv AI | 23-06-25

Towards AI Search Paradigm

Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin |

Read more

Source: ArXiv AI | 23-06-25

Continual Learning with Columnar Spiking Neural Networks

Authors: Denis Larionov, Nikolay Bazenkov, Mikhail Kiselev |

Read more

Source: ArXiv AI | 23-06-25

LLMs Struggle to Perform Counterfactual Reasoning with Parametric Knowledge

Authors: Khurram Yamin, Gaurav Ghosal, Bryan Wilder |

Read more

Source: ArXiv AI | 23-06-25

No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

Authors: Yanzhi Zhang, Zhaoxi Zhang, Haoxiang Guan, Yilin Cheng, Yitong Duan, Chen Wang, Yue Wang, Shuxin Zheng, Jiyan He |

Read more

Source: ArXiv AI | 23-06-25

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems

Authors: Matias Martinez, Xavier Franch |

Read more

Source: ArXiv AI | 23-06-25

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar |

Read more

Source: ArXiv AI | 23-06-25

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

Authors: Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton |

Read more

Source: ArXiv AI | 23-06-25

Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations

Authors: William Sharpless, Dylan Hirsch, Sander Tonkens, Nikhil Shinde, Sylvia Herbert |

Read more

Source: ArXiv AI | 23-06-25

Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues

Authors: Myke C. Cohen, Zhe Su, Hsien-Te Kao, Daniel Nguyen, Spencer Lynch, Maarten Sap, Svitlana Volkova |

Read more

Source: ArXiv AI | 23-06-25

Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior

Authors: Hao Li, Gengrui Zhang, Petter Holme, Shuyue Hu, Zhen Wang |

Read more

Source: ArXiv AI | 23-06-25

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System

Authors: Mustafa Akben, Aaron Satko |

Read more

Source: ArXiv AI | 23-06-25

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

Authors: Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo |

Read more

Source: ArXiv AI | 23-06-25

The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making

Authors: Abinitha Gourabathina, Yuexing Hao, Walter Gerych, Marzyeh Ghassemi |

Read more

Source: ArXiv AI | 23-06-25

LAION and Intel introduce tools that help AI gauge the intensity of 40 distinct emotions

Read more

Source: The Decoder | 22-06-25

Phoenix.new – Remote AI Runtime for Phoenix (fly.io)

Read more

Source: Hacker News | 22-06-25

Remote MCP Support in Claude Code (anthropic.com)

Read more

Source: Hacker News | 22-06-25

Uncovering Intention through LLM-Driven Code Snippet Description Generation

Authors: Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto |

Read more

Source: ArXiv AI | 22-06-25

RePCS: Diagnosing Data Memorization in LLM-Powered Retrieval-Augmented Generation

Authors: Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Luong Van Nghia |

Read more

Source: ArXiv AI | 22-06-25

Optimizing Web-Based AI Query Retrieval with GPT Integration in LangChain A CoT-Enhanced Prompt Engineering Approach

Authors: Wenqi Guan, Yang Fang |

Read more

Source: ArXiv AI | 22-06-25

Over-squashing in Spatiotemporal Graph Neural Networks

Authors: Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein |

Read more

Source: ArXiv AI | 22-06-25

Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates

Authors: Danish Gufran, Sudeep Pasricha |

Read more

Source: ArXiv AI | 22-06-25

The Compositional Architecture of Regret in Large Language Models

Authors: Xiangxiang Cui, Shu Yang, Tianjin Huang, Wanyu Lin, Lijie Hu, Di Wang |

Read more

Source: ArXiv AI | 22-06-25

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Authors: Gabrel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong |

Read more

Source: ArXiv AI | 22-06-25

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Authors: Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe |

Read more

Source: ArXiv AI | 22-06-25

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu |

Read more

Source: ArXiv AI | 22-06-25

HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Authors: Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian |

Read more

Source: ArXiv AI | 22-06-25

Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents

Authors: Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer |

Read more

Source: ArXiv AI | 22-06-25

The AI Policy Module: Developing Computer Science Student Competency in AI Ethics and Policy

Authors: James Weichert, Daniel Dunlap, Mohammed Farghally, Hoda Eldardiry |

Read more

Source: ArXiv AI | 22-06-25

The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games

Authors: Lyle Goodyear, Rachel Guo, Ramesh Johari |

Read more

Source: ArXiv AI | 22-06-25

Meta CEO Mark Zuckerberg bets billions not to fall behind in the AI race

Read more

Source: The Decoder | 22-06-25

Weave (YC W25) is hiring a founding AI engineer (ycombinator.com)

Read more

Source: Hacker News | 22-06-25

Apple's "Illusion of Thinking" paper shows experts deeply divided on AI reasoning

Read more

Source: The Decoder | 21-06-25

Agentic Misalignment: How LLMs could be insider threats (anthropic.com)

Read more

Source: Hacker News | 21-06-25

Midjourney launches its first video model, letting users turn images into short animated clips

Read more

Source: The Decoder | 21-06-25

Jürgen Schmidhuber: the Father of Generative AI Without Turing Award (jazzyear.com)

Read more

Source: Hacker News | 21-06-25

I Built a Celebrity AI Image Generator (No Registration Needed) – Would Love Feedback (aicelebrity.design)

Read more

Source: Hacker News | 21-06-25

OpenAI CEO Sam Altman says GPT-5 is "probably coming sometime this summer"

Read more

Source: The Decoder | 20-06-25

Andrej Karpathy: Software in the era of AI [video] (youtube.com)

Read more

Source: Hacker News | 20-06-25

Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com)

Read more

Source: Hacker News | 20-06-25

Gemini 2.5 Flash-Lite is the fastest and most cost-effective model in Google's Gemini lineup

Read more

Source: The Decoder | 20-06-25

Show HN: Claude Code Usage Monitor – real-time tracker to dodge usage cut-offs (github.com/maciek-roboblog)

Read more

Source: Hacker News | 20-06-25

How OpenElections uses LLMs (thescoop.org)

Read more

Source: Hacker News | 20-06-25

MiniMax-M1 comes close to Gemini 2.5 Pro efficiency when handling large context windows

Read more

Source: The Decoder | 19-06-25

From LLM to AI Agent: What's the Real Journey Behind AI System Development? (codelink.io)

Read more

Source: Hacker News | 19-06-25

Luxembourg partners with Mistral AI to bring artificial intelligence to government and defense

Read more

Source: The Decoder | 19-06-25

OpenAI and Microsoft increasingly mistrust each other as tensions rise over contracts and profits

Read more

Source: The Decoder | 19-06-25

Is there a half-life for the success rates of AI agents? (tobyord.com)

Read more

Source: Hacker News | 19-06-25

Math genius Terence Tao says that AI still can't "smell" bad math

Read more

Source: The Decoder | 18-06-25

OpenAI’s Defense Department deal targets healthcare, data analysis, and cyber defense

Read more

Source: The Decoder | 18-06-25

Time Series Forecasting with Graph Transformers (kumo.ai)

Read more

Source: Hacker News | 18-06-25

LLMs pose an interesting problem for DSL designers (kirancodes.me)

Read more

Source: Hacker News | 18-06-25

Making 2.5 Flash and 2.5 Pro GA, and introducing Gemini 2.5 Flash-Lite (blog.google)

Read more

Source: Hacker News | 18-06-25

Building Effective AI Agents (anthropic.com)

Read more

Source: Hacker News | 18-06-25

I counted all of the yurts in Mongolia using machine learning (monroeclinton.com)

Read more

Source: Hacker News | 18-06-25

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Authors: Ring Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, Shaomian Zheng, Shuaicheng Li, Tongkai Yang, Wang Ren, Xiaodong Yan, Xiaopei Wan, Xiaoyun Feng, Xin Zhao, Xinxing Yang, Xinyu Kong, Xuemin Yang, Yang Li, Yingting Wu, Yongkang Liu, Zhankai Xu, Zhenduo Zhang, Zhenglei Zhou, Zhenyu Huang, Zhiqiang Zhang, Zihao Wang, Zujie Wen |

Read more

Source: ArXiv AI | 18-06-25

Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values

Authors: Nell Watson, Ahmed Amer, Evan Harris, Preeti Ravindra, Shujun Zhang |

Read more

Source: ArXiv AI | 18-06-25

The NordDRG AI Benchmark for Large Language Models

Authors: Tapio Pitkäranta |

Read more

Source: ArXiv AI | 18-06-25

ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution

Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Jilles S. Dibangoye |

Read more

Source: ArXiv AI | 18-06-25

Causality in the human niche: lessons for machine learning

Authors: Richard D. Lange, Konrad P. Kording |

Read more

Source: ArXiv AI | 18-06-25

Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

Authors: Miguel A. Lago, Ghada Zamzmi, Brandon Eich, Jana G. Delfino |

Read more

Source: ArXiv AI | 18-06-25

LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi |

Read more

Source: ArXiv AI | 18-06-25

Machine Mirages: Defining the Undefined

Authors: Hamidou Tembine |

Read more

Source: ArXiv AI | 18-06-25

ProfiLLM: An LLM-Based Framework for Implicit Profiling of Chatbot Users

Authors: Shahaf David, Yair Meidan, Ido Hersko, Daniel Varnovitzky, Dudu Mimran, Yuval Elovici, Asaf Shabtai |

Read more

Source: ArXiv AI | 18-06-25

Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places

Authors: Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Ilya Ilyankou, Junyuan Liu, Lu Yin, Weiming Huang, Natchapon Jongwiriyanurak |

Read more

Source: ArXiv AI | 18-06-25

Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models

Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary |

Read more

Source: ArXiv AI | 18-06-25

Lightweight Relevance Grader in RAG

Authors: Taehee Jeong |

Read more

Source: ArXiv AI | 18-06-25

From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models

Authors: Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma |

Read more

Source: ArXiv AI | 18-06-25

Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?

Authors: Louis Vervoort, Vitaly Nikolaev |

Read more

Source: ArXiv AI | 18-06-25

Don't throw the baby out with the bathwater: How and why deep learning for ARC

Authors: Jack Cole, Mohamed Osman |

Read more

Source: ArXiv AI | 18-06-25

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang |

Read more

Source: ArXiv AI | 18-06-25

Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning

Authors: William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane |

Read more

Source: ArXiv AI | 18-06-25

AviationLLM: An LLM-based Knowledge System for Aviation Training

Authors: Jia'ang Wan, Feng Shen, Fujuan Li, Yanjin Sun, Yan Li, Shiwen Zhang |

Read more

Source: ArXiv AI | 18-06-25

ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems

Authors: Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li |

Read more

Source: ArXiv AI | 18-06-25

LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?

Authors: Muhammad Atta Ur Rahman, Melanie Schranz |

Read more

Source: ArXiv AI | 18-06-25

Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

Authors: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son |

Read more

Source: ArXiv AI | 18-06-25

Enhancing Symbolic Machine Learning by Subsymbolic Representations

Authors: Stephen Roth, Lennart Baur, Derian Boer, Stefan Kramer |

Read more

Source: ArXiv AI | 18-06-25

New study supports Apple's doubts about AI reasoning, but sees no dead end

Read more

Source: The Decoder | 18-06-25

Salesforce's CRM benchmark finds AI agents struggle in real-world business scenarios

Read more

Source: The Decoder | 17-06-25

New York may soon require AI giants to publish safety protocols before releasing LLMs

Read more

Source: The Decoder | 17-06-25

Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence

Authors: Zeki Doruk Erden, Boi Faltings |

Read more

Source: ArXiv AI | 17-06-25

Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents

Authors: LeCheng Zhang, Yuanshi Wang, Haotian Shen, Xujie Wang |

Read more

Source: ArXiv AI | 17-06-25

Constitutive Components for Human-Like Autonomous Artificial Intelligence

Authors: Kazunori D Yamada |

Read more

Source: ArXiv AI | 17-06-25

Scaling Test-time Compute for LLM Agents

Authors: King Zhu, Hanhao Li, Siwei Wu, Tianshun Xing, Dehua Ma, Xiangru Tang, Minghao Liu, Jian Yang, Jiaheng Liu, Yuchen Eleanor Jiang, Changwang Zhang, Chenghua Lin, Jun Wang, Ge Zhang, Wangchunshu Zhou |

Read more

Source: ArXiv AI | 17-06-25

Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning

Authors: Danny Hoang, David Gorsich, Matthew P. Castanier, Farhad Imani |

Read more

Source: ArXiv AI | 17-06-25

A Practical Guide for Evaluating LLMs and LLM-Reliant Systems

Authors: Ethan M. Rudd, Christopher Andrews, Philip Tully |

Read more

Source: ArXiv AI | 17-06-25

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

Authors: Daniel Kilov, Caroline Hendy, Secil Yanik Guyot, Aaron J. Snoswell, Seth Lazar |

Read more

Source: ArXiv AI | 17-06-25

NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification

Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha |

Read more

Source: ArXiv AI | 17-06-25

Machine Learning as Iterated Belief Change a la Darwiche and Pearl

Authors: Theofanis Aravanis |

Read more

Source: ArXiv AI | 17-06-25

Probabilistic Modeling of Spiking Neural Networks with Contract-Based Verification

Authors: Zhen Yao, Elisabetta De Maria, Robert De Simone |

Read more

Source: ArXiv AI | 17-06-25

Towards Pervasive Distributed Agentic Generative AI -- A State of The Art

Authors: Gianni Molinari, Fabio Ciravegna |

Read more

Source: ArXiv AI | 17-06-25

Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks

Authors: Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang |

Read more

Source: ArXiv AI | 17-06-25

Vector Ontologies as an LLM world view extraction method

Authors: Kaspar Rothenfusser, Bekk Blando |

Read more

Source: ArXiv AI | 17-06-25

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs

Authors: Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai |

Read more

Source: ArXiv AI | 17-06-25

Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality

Authors: Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street |

Read more

Source: ArXiv AI | 17-06-25

Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses

Authors: Leonie V.D.E. Vogelsmeier, Eduardo Oliveira, Kamila Misiejuk, Sonsoles López-Pernas, Mohammed Saqr |

Read more

Source: ArXiv AI | 17-06-25

From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Authors: Daniel Anadria, Roel Dobbe, Anastasia Giachanou, Ruurd Kuiper, Richard Bartels, Íñigo Martínez de Rituerto de Troya, Carmen Zürcher, Daniel Oberski |

Read more

Source: ArXiv AI | 17-06-25

Generative AI coding tools and agents do not work for me (miguelgrinberg.com)

Read more

Source: Hacker News | 17-06-25

OpenAI wins $200M U.S. defense contract (cnbc.com)

Read more

Source: Hacker News | 17-06-25

Rednote releases its first open-source LLM with a Mixture-of-Experts architecture

Read more

Source: The Decoder | 17-06-25

Anthropic shares blueprint for Claude Research agent using multiple AI agents in parallel

Read more

Source: The Decoder | 17-06-25

Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons (arxiv.org)

Read more

Source: Hacker News | 17-06-25

ZjsComponent: A Pragmatic Approach to Reusable UI Fragments for Web Development (arxiv.org)

Read more

Source: Hacker News | 17-06-25

Snorting the AGI with Claude Code (kadekillary.work)

Read more

Source: Hacker News | 17-06-25

OpenAI updates ChatGPT search with smarter answers and image search

Read more

Source: The Decoder | 16-06-25

Chemical knowledge and reasoning of large language models vs. chemist expertise (nature.com)

Read more

Source: Hacker News | 16-06-25

LLM Chat via SSH (github.com/ccbikai)

Read more

Source: Hacker News | 16-06-25

Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models

Authors: Maximilian Kreutner, Marlene Lutz, Markus Strohmaier |

Read more

Source: ArXiv AI | 16-06-25

TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks

Authors: Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan |

Read more

Source: ArXiv AI | 16-06-25

An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing

Authors: Haochen Sun, Yifan Liu, Ahmed Al-Tahmeesschi, Swarna Chetty, Syed Ali Raza Zaidi, Avishek Nag, Hamed Ahmadi |

Read more

Source: ArXiv AI | 16-06-25

How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?

Authors: Michela Lapenna, Caterina De Bacco |

Read more

Source: ArXiv AI | 16-06-25

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Authors: Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie |

Read more

Source: ArXiv AI | 16-06-25

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

Authors: M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo |

Read more

Source: ArXiv AI | 16-06-25

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Authors: Noemi Dreksler, Lucius Caviola, David Chalmers, Carter Allen, Alex Rand, Joshua Lewis, Philip Waggoner, Kate Mays, Jeff Sebo |

Read more

Source: ArXiv AI | 16-06-25

Improving Large Language Model Safety with Contrastive Representation Learning

Authors: Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin |

Read more

Source: ArXiv AI | 16-06-25

code_transformed: The Influence of Large Language Models on Code

Authors: Yuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan, Xuanhua Shi, Dongping Chen |

Read more

Source: ArXiv AI | 16-06-25

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang |

Read more

Source: ArXiv AI | 16-06-25

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Authors: Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Yu-Chiang Frank Wang |

Read more

Source: ArXiv AI | 16-06-25

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

Authors: Yitong Zhou, Mingyue Cheng, Qingyang Mao, Yucong Luo, Qi Liu, Yupeng Li, Xiaohan Zhang, Deguang Liu, Xin Li, Enhong Chen |

Read more

Source: ArXiv AI | 16-06-25

LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic

Authors: Weibing Zheng, Laurah Turner, Jess Kropczynski, Murat Ozer, Tri Nguyen, Shane Halse |

Read more

Source: ArXiv AI | 16-06-25

Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning

Authors: Liying Wang, Ph.D., Daffodil Carrington, M.S., Daniil Filienko, M.S., Caroline El Jazmi, M.S., Serena Jinchen Xie, M.S., Martine De Cock, Ph.D., Sarah Iribarren, Ph.D., Weichao Yuwen, Ph.D |

Read more

Source: ArXiv AI | 16-06-25

RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning

Authors: Yu Wang, Shiwan Zhao, Ming Fan, Zhihu Wang, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu |

Read more

Source: ArXiv AI | 16-06-25

Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding

Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui |

Read more

Source: ArXiv AI | 16-06-25

VLM@school -- Evaluation of AI image understanding on German middle school knowledge

Authors: René Peinl, Vincent Tischler |

Read more

Source: ArXiv AI | 16-06-25

Collaborative LLM Inference via Planning for Efficient Reasoning

Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin |

Read more

Source: ArXiv AI | 16-06-25

On the Performance of LLMs for Real Estate Appraisal

Authors: Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt |

Read more

Source: ArXiv AI | 16-06-25

Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment

Authors: Alejandro Peña, Julian Fierrez, Aythami Morales, Gonzalo Mancera, Miguel Lopez, Ruben Tolosana |

Read more

Source: ArXiv AI | 16-06-25

Revealing Political Bias in LLMs through Structured Multi-Agent Debate

Authors: Aishwarya Bandaru, Fabian Bindley, Trevor Bluth, Nandini Chavda, Baixu Chen, Ethan Law |

Read more

Source: ArXiv AI | 16-06-25

Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making

Authors: Claudio Fanconi, Mihaela van der Schaar |

Read more

Source: ArXiv AI | 16-06-25

Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making

Authors: Xiaopeng Yuan, Xingjian Zhang, Ke Xu, Yifan Xu, Lijun Yu, Jindong Wang, Yushun Dong, Haohan Wang |

Read more

Source: ArXiv AI | 16-06-25

The z80 technique reveals the source code for Atlassian's 'rovo' AI assistant (ghuntley.com)

Read more

Source: Hacker News | 16-06-25

Let's Talk About ChatGPT-Induced Spiritual Psychosis (default.blog)

Read more

Source: Hacker News | 16-06-25

Rabbit launches "intern," a software AI agent designed to handle team-level projects

Read more

Source: The Decoder | 15-06-25

Apple's new AI benchmarks show its models still lag behind leaders like OpenAI and Google

Read more

Source: The Decoder | 15-06-25

Slimming Down LLMs Without Losing Their Minds

Authors: Qingda (Michael) Mai |

Read more

Source: ArXiv AI | 15-06-25

BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP

Authors: Thomas Sounack, Joshua Davis, Brigitte Durieux, Antoine Chaffin, Tom J. Pollard, Eric Lehman, Alistair E. W. Johnson, Matthew McDermott, Tristan Naumann, Charlotta Lindvall |

Read more

Source: ArXiv AI | 15-06-25

The Role of Generative AI in Facilitating Social Interactions: A Scoping Review

Authors: T. T. J. E. Arets, G. Perugia, M. Houben, W.A. IJsselsteijn |

Read more

Source: ArXiv AI | 15-06-25

Robustly Improving LLM Fairness in Realistic Settings via Interpretability

Authors: Adam Karvonen, Samuel Marks |

Read more

Source: ArXiv AI | 15-06-25

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

Authors: Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He |

Read more

Source: ArXiv AI | 15-06-25

GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Authors: Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami |

Read more

Source: ArXiv AI | 15-06-25

Farseer: A Refined Scaling Law in Large Language Models

Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang |

Read more

Source: ArXiv AI | 15-06-25

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Authors: Qizhe Zhang, Mengzhen Liu, Lichen Li, Ming Lu, Yuan Zhang, Junwen Pan, Qi She, Shanghang Zhang |

Read more

Source: ArXiv AI | 15-06-25

One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence

Authors: Michelle M. Li, Ben Y. Reis, Adam Rodman, Tianxi Cai, Noa Dagan, Ran D. Balicer, Joseph Loscalzo, Isaac S. Kohane, Marinka Zitnik |

Read more

Source: ArXiv AI | 15-06-25

WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models

Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang |

Read more

Source: ArXiv AI | 15-06-25

Closer to Language than Steam: AI as the Cognitive Engine of a New Productivity Revolution

Authors: Xinmin Fang, Lingfeng Tao, Zhengxiong Li |

Read more

Source: ArXiv AI | 15-06-25

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Authors: Yuquan Xie, Zaijing Li, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Dongmei Jiang, Liqiang Nie |

Read more

Source: ArXiv AI | 15-06-25

Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges

Authors: Jintao Liang, Gang Su, Huifeng Lin, You Wu, Rui Zhao, Ziyue Li |

Read more

Source: ArXiv AI | 15-06-25

Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning

Authors: Mohd Anwar Jamal Faiz |

Read more

Source: ArXiv AI | 15-06-25

LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs

Authors: Yanan Cai, Ahmed Salem, Besmira Nushi, Mark Russinovich |

Read more

Source: ArXiv AI | 15-06-25

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, Wenlong Zhang, Lei Bai |

Read more

Source: ArXiv AI | 15-06-25

Automated Validation of Textual Constraints Against AutomationML via LLMs and SHACL

Authors: Tom Westermann, Aljosha Köcher, Felix Gehlhoff |

Read more

Source: ArXiv AI | 15-06-25

TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

Authors: Vincenzo Colle, Mohamed Sana, Nicola Piovesan, Antonio De Domenico, Fadhel Ayed, Merouane Debbah |

Read more

Source: ArXiv AI | 15-06-25

A Study on Individual Spatiotemporal Activity Generation Method Using MCP-Enhanced Chain-of-Thought Large Language Models

Authors: Yu Zhang, Yang Hu, De Wang |

Read more

Source: ArXiv AI | 15-06-25

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Authors: Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen |

Read more

Source: ArXiv AI | 15-06-25

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Authors: Fei Lin, Ziyang Gong, Cong Wang, Yonglin Tian, Tengchao Zhang, Xue Yang, Gen Luo, Fei-Yue Wang |

Read more

Source: ArXiv AI | 15-06-25

AMD's AI Future Is Rack Scale 'Helios' (morethanmoore.substack.com)

Read more

Source: Hacker News | 15-06-25

I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch (github.com/yousef-rafat)

Read more

Source: Hacker News | 15-06-25

Text-to-LoRA: Hypernetwork that generates task-specific LLM adapters (LoRAs) (github.com/sakanaai)

Read more

Source: Hacker News | 15-06-25

RAG Is a Fancy, Lying Search Engine (stardog.ai)

Read more

Source: Hacker News | 15-06-25

Clinical knowledge in LLMs does not translate to human interactions (arxiv.org)

Read more

Source: Hacker News | 15-06-25

I used ChatGPT to learn programming from zero and built a video generation SaaS (vidmakerpro.com)

Read more

Source: Hacker News | 15-06-25

Mechanize is building digital offices to train AI agents to fully automate computer work

Read more

Source: The Decoder | 15-06-25

The Army’s Newest Recruits: Tech Execs From Meta, OpenAI and More (wsj.com)

Read more

Source: Hacker News | 14-06-25

Student discovers fungus predicted by Albert Hoffman (wvu.edu)

Read more

Source: Hacker News | 14-06-25

Saab achieves AI milestone with Gripen E (saab.com)

Read more

Source: Hacker News | 14-06-25

Meta launches AI video editing but holds back on full features for now

Read more

Source: The Decoder | 14-06-25

Mattel partners with OpenAI to develop AI-powered toys and experiences

Read more

Source: The Decoder | 14-06-25

Meta's latest model highlights the challenge AI faces in long-term planning and causal reasoning

Read more

Source: The Decoder | 14-06-25

RISC-V in AI and HPC Part 1: Per Aspera Ad Astra? (eetimes.com)

Read more

Source: Hacker News | 14-06-25

Meta invests $14.3B in Scale AI to kick-start superintelligence lab (nytimes.com)

Read more

Source: Hacker News | 14-06-25

Students fear AI could cause "brain rot" by making it too easy to skip crucial learning steps

Read more

Source: The Decoder | 13-06-25

Maximizing Battery Storage Profits via High-Frequency Intraday Trading (arxiv.org)

Read more

Source: Hacker News | 13-06-25

Researchers confirm two journalists were hacked with Paragon spyware (techcrunch.com)

Read more

Source: Hacker News | 13-06-25

OpenAI's o3-pro may be too smart for small talk

Read more

Source: The Decoder | 12-06-25

OpenAI o3-pro (help.openai.com)

Read more

Source: Hacker News | 12-06-25

GauntletAI (YC S17): All expenses paid AI training and guaranteed $200k+ job (gauntletai.com)

Read more

Source: Hacker News | 12-06-25

Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era

Authors: Shuo Jiang, Min Xie, Frank Youhua Chen, Jian Ma, Jianxi Luo |

Read more

Source: ArXiv AI | 12-06-25

Large Language Models for Design Structure Matrix Optimization

Authors: Shuo Jiang, Min Xie, Jianxi Luo |

Read more

Source: ArXiv AI | 12-06-25

Guided Graph Compression for Quantum Graph Neural Networks

Authors: Mikel Casals, Vasilis Belis, Elias F. Combarro, Eduard Alarcón, Sofia Vallecorsa, Michele Grossi |

Read more

Source: ArXiv AI | 12-06-25

Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs

Authors: Rodion Oblovatny, Alexandra Bazarova, Alexey Zaytsev |

Read more

Source: ArXiv AI | 12-06-25

3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation

Authors: Seonho Lee, Jiho Choi, Inha Kang, Jiwook Kim, Junsung Park, Hyunjung Shim |

Read more

Source: ArXiv AI | 12-06-25

Stakeholder Participation for Responsible AI Development: Disconnects Between Guidance and Current Practice

Authors: Emma Kallina, Thomas Bohné, Jat Singh |

Read more

Source: ArXiv AI | 12-06-25

HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel |

Read more

Source: ArXiv AI | 12-06-25

PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants

Authors: Zheng Zhao, Clara Vania, Subhradeep Kayal, Naila Khan, Shay B. Cohen, Emine Yilmaz |

Read more

Source: ArXiv AI | 12-06-25

The Emergence of Abstract Thought in Large Language Models Beyond Any Language

Authors: Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang |

Read more

Source: ArXiv AI | 12-06-25

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Authors: Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin |

Read more

Source: ArXiv AI | 12-06-25

A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu |

Read more

Source: ArXiv AI | 12-06-25

Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making

Authors: Kehan Zheng, Jinfeng Zhou, Hongning Wang |

Read more

Source: ArXiv AI | 12-06-25

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Authors: Kaixuan Xu, Jiajun Chai, Sicheng Li, Yuqian Fu, Yuanheng Zhu, Dongbin Zhao |

Read more

Source: ArXiv AI | 12-06-25

Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives

Authors: Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong |

Read more

Source: ArXiv AI | 12-06-25

Fine-tuning LLMs is a waste of time (codinginterviewsmadesimple.substack.com)

Read more

Source: Hacker News | 12-06-25

EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot (aim.security)

Read more

Source: Hacker News | 12-06-25

OpenAI co-founder Ilya Sutskever believes AI will shape everyone's life "whether you like it or not"

Read more

Source: The Decoder | 11-06-25

Meta AI chief scientist LeCun's latest comment reveals deep industry split over the future of AI

Read more

Source: The Decoder | 11-06-25

Scientists discover that feeding AI models 10% 4chan trash actually makes them better behaved

Read more

Source: The Decoder | 11-06-25

Zuckerberg forms elite AI team to catch up with competitors

Read more

Source: The Decoder | 11-06-25

Apple's new Foundation Models framework adds on-device AI to apps with three lines of Swift code

Read more

Source: The Decoder | 11-06-25

OpenAI dropped the price of o3 by 80% (twitter.com/sama)

Read more

Source: Hacker News | 11-06-25

Low-background Steel: content without AI contamination (jgc.org)

Read more

Source: Hacker News | 11-06-25

Launch HN: BitBoard (YC X25) – AI agents for healthcare back-offices

Read more

Source: Hacker News | 11-06-25

AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)

Read more

Source: Hacker News | 11-06-25

WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis

Authors: Liangliang Chen, Huiru Xie, Jacqueline Rohde, Ying Zhang |

Read more

Source: ArXiv AI | 11-06-25

Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions

Authors: Clara Lachenmaier, Judith Sieker, Sina Zarrieß |

Read more

Source: ArXiv AI | 11-06-25

Propositional Logic for Probing Generalization in Neural Networks

Authors: Anna Langedijk, Jaap Jumelet, Willem Zuidema |

Read more

Source: ArXiv AI | 11-06-25

Tailored Architectures for Time Series Forecasting: Evaluating Deep Learning Models on Gaussian Process-Generated Data

Authors: Victoria Hankemeier, Malte Schilling |

Read more

Source: ArXiv AI | 11-06-25

Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma |

Read more

Source: ArXiv AI | 11-06-25

FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed

Authors: Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang |

Read more

Source: ArXiv AI | 11-06-25

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Authors: Haozhen Zhang, Tao Feng, Jiaxuan You |

Read more

Source: ArXiv AI | 11-06-25

LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs

Authors: Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade |

Read more

Source: ArXiv AI | 11-06-25

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning

Authors: Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, Mihaela van der Schaar |

Read more

Source: ArXiv AI | 11-06-25

SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents

Authors: Subhrangshu Nandi, Arghya Datta, Nikhil Vichare, Indranil Bhattacharya, Huzefa Raja, Jing Xu, Shayan Ray, Giuseppe Carenini, Abhi Srivastava, Aaron Chan, Man Ho Woo, Amar Kandola, Brandon Theresa, Francesco Carbone |

Read more

Source: ArXiv AI | 11-06-25

Transforming Expert Knowledge into Scalable Ontology via Large Language Models

Authors: Ikkei Itoku, David Theil, Evelyn Eichelsdoerfer Uehara, Sreyoshi Bhaduri, Junnosuke Kuroda, Toshi Yumoto, Alex Gil, Natalie Perez, Rajesh Cherukuri, Naumaan Nayyar |

Read more

Source: ArXiv AI | 11-06-25

A Survey on Large Language Models for Mathematical Reasoning

Authors: Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu |

Read more

Source: ArXiv AI | 11-06-25

HGFormer: A Hierarchical Graph Transformer Framework for Two-Stage Colonel Blotto Games via Reinforcement Learning

Authors: Yang Lv, Jinlong Lei, Peng Yi |

Read more

Source: ArXiv AI | 11-06-25

Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness

Authors: Yanwei Gong, Xiaolin Chang |

Read more

Source: ArXiv AI | 11-06-25

Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning

Authors: Kongcheng Zhang, Qi Yao, Shunyu Liu, Yingjie Wang, Baisheng Lai, Jieping Ye, Mingli Song, Dacheng Tao |

Read more

Source: ArXiv AI | 11-06-25

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

Authors: Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, Pattie Maes |

Read more

Source: ArXiv AI | 11-06-25

Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

Authors: Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi |

Read more

Source: ArXiv AI | 11-06-25

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Authors: Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri, Samuel J. Bell |

Read more

Source: ArXiv AI | 11-06-25

ChatGPT's voice is now more natural and can consistently translate conversations in real time

Read more

Source: The Decoder | 10-06-25

Google's Gemini 2.5 Pro beats OpenAI's o3 model in processing complex, lengthy texts

Read more

Source: The Decoder | 10-06-25

ChatGPT scams range from silly money-making ploys to calculated political meddling

Read more

Source: The Decoder | 10-06-25

Boosting LLM Reasoning via Spontaneous Self-Correction

Authors: Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, Chen Zhu |

Read more

Source: ArXiv AI | 10-06-25

Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth

Authors: Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu |

Read more

Source: ArXiv AI | 10-06-25

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang |

Read more

Source: ArXiv AI | 10-06-25

Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT

Authors: Miroslav Popovic, Marko Popovic, Miodrag Djukic, Ilija Basicevic |

Read more

Source: ArXiv AI | 10-06-25

BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite

Authors: Liyang Chen, Yujun Cai, Jieqiong Dong, Yiwei Wang |

Read more

Source: ArXiv AI | 10-06-25

Reasoning Multimodal Large Language Model: Data Contamination and Dynamic Evaluation

Authors: Ming Liu, Wensheng Zhang |

Read more

Source: ArXiv AI | 10-06-25

Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data

Authors: Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye |

Read more

Source: ArXiv AI | 10-06-25

LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments

Authors: Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai |

Read more

Source: ArXiv AI | 10-06-25

Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests

Authors: Arnau Igualde Sáez, Lamyae Rhomrasi, Yusef Ahsini, Ricardo Vinuesa, Sergio Hoyas, Jose P. García Sabater, Marius J. Fullana i Alfonso, J. Alberto Conejero |

Read more

Source: ArXiv AI | 10-06-25

An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning

Authors: Ze Yang, Yihong Jin, Juntian Liu, Xinhe Xu |

Read more

Source: ArXiv AI | 10-06-25

Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification

Authors: Liwen Zheng, Chaozhuo Li, Zheng Liu, Feiran Huang, Haoran Jia, Zaisheng Ye, Xi Zhang |

Read more

Source: ArXiv AI | 10-06-25

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Authors: Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Bin Cui, Wentao Zhang |

Read more

Source: ArXiv AI | 10-06-25

REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models

Authors: Diego Forniés-Tabuenca, Alejandro Uribe, Urtzi Otamendi, Arkaitz Artetxe, Juan Carlos Rivera, Oier Lopez de Lacalle |

Read more

Source: ArXiv AI | 10-06-25

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Authors: Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua |

Read more

Source: ArXiv AI | 10-06-25

Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs

Authors: Yao Yan |

Read more

Source: ArXiv AI | 10-06-25

Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark

Authors: Shoko Oka |

Read more

Source: ArXiv AI | 10-06-25

Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation

Authors: Christopher Subia-Waud (Rayonlabs Team) |

Read more

Source: ArXiv AI | 10-06-25

Solving Inequality Proofs with Large Language Models

Authors: Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu |

Read more

Source: ArXiv AI | 10-06-25

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado |

Read more

Source: ArXiv AI | 10-06-25

End-to-End Framework for Robot Lawnmower Coverage Path Planning using Cellular Decomposition

Authors: Nikunj Shah, Utsav Dey, Kenji Nishimiya |

Read more

Source: ArXiv AI | 10-06-25

Text-to-LoRA: Instant Transformer Adaption

Authors: Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange |

Read more

Source: ArXiv AI | 10-06-25

Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models

Authors: Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu |

Read more

Source: ArXiv AI | 10-06-25

semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces

Authors: Jwalanthi Ranganathan, Rohan Jha, Kanishka Misra, Kyle Mahowald |

Read more

Source: ArXiv AI | 10-06-25

(AI peers) are people learning from the same standpoint: Perception of AI characters in a Collaborative Science Investigation

Authors: Eunhye Grace Ko, Soo Hyoung Joo |

Read more

Source: ArXiv AI | 10-06-25

DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation

Authors: Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu |

Read more

Source: ArXiv AI | 10-06-25

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models

Authors: Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu |

Read more

Source: ArXiv AI | 10-06-25

Towards an Explainable Comparison and Alignment of Feature Embeddings

Authors: Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia |

Read more

Source: ArXiv AI | 10-06-25

Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted

Authors: Cecil Pang |

阅读更多

来源: ArXiv AI | 10-06-25

Contextual Memory Intelligence -- A Foundational Paradigm for Human-AI Collaboration and Reflective Generative AI Systems

Authors: Kristy Wedel |

阅读更多

来源: ArXiv AI | 10-06-25

Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias

Authors: Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang |

阅读更多

来源: ArXiv AI | 10-06-25

Explainability in Context: A Multilevel Framework Aligning AI Explanations with Stakeholder with LLMs

Authors: Marilyn Bello, Rafael Bello, Maria-Matilde García, Ann Nowé, Iván Sevillano-García, Francisco Herrera |

阅读更多

来源: ArXiv AI | 10-06-25

CrimeMind: Simulating Urban Crime with Multi-Modal LLM Agents

Authors: Qingbin Zeng, Ruotong Zhao, Jinzhu Mao, Haoyang Li, Fengli Xu, Yong Li |

阅读更多

来源: ArXiv AI | 10-06-25

Preference Learning for AI Alignment: a Causal Perspective

Authors: Katarzyna Kobalczyk, Mihaela van der Schaar |

阅读更多

来源: ArXiv AI | 10-06-25

CP-Bench: Evaluating Large Language Models for Constraint Modelling

Authors: Kostis Michailidis, Dimos Tsouros, Tias Guns |

阅读更多

来源: ArXiv AI | 10-06-25

PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time

Authors: Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li |

阅读更多

来源: ArXiv AI | 10-06-25

The last six months in LLMs, illustrated by pelicans on bicycles (simonwillison.net)

阅读更多

来源: Hacker News | 09-06-25

What happens when people don't understand how AI works (theatlantic.com)

阅读更多

来源: Hacker News | 09-06-25

LLMs are cheap (snellman.net)

阅读更多

来源: Hacker News | 09-06-25

OpenAI leaves the question of AI consciousness consciously unanswered

阅读更多

来源: The Decoder | 09-06-25

Anthropic cuts Claude access for Windsurf after OpenAI's $3B takeover news

阅读更多

来源: The Decoder | 09-06-25

Building an AI server on a budget (informationga.in)

阅读更多

来源: Hacker News | 09-06-25

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

Authors: Mohammed Almutairi |

阅读更多

来源: ArXiv AI | 08-06-25

Exploring Diffusion Transformer Designs via Grafting

Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei |

阅读更多

来源: ArXiv AI | 08-06-25

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Authors: Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang |

阅读更多

来源: ArXiv AI | 08-06-25

Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Authors: Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab |

阅读更多

来源: ArXiv AI | 08-06-25

Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Authors: Niv Eckhaus, Uri Berger, Gabriel Stanovsky |

阅读更多

来源: ArXiv AI | 08-06-25

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim |

阅读更多

来源: ArXiv AI | 08-06-25

Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models

Authors: Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli |

阅读更多

来源: ArXiv AI | 08-06-25

Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance

Authors: Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira |

阅读更多

来源: ArXiv AI | 08-06-25

CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective

Authors: Jiayu Liu, Zhenya Huang, Wei Dai, Cheng Cheng, Jinze Wu, Jing Sha, Song Li, Qi Liu, Shijin Wang, Enhong Chen |

阅读更多

来源: ArXiv AI | 08-06-25

Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences

Authors: Hadi Hosseini, Samarth Khanna, Ronak Singh |

阅读更多

来源: ArXiv AI | 08-06-25

Schema Generation for Large Knowledge Graphs Using Large Language Models

Authors: Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño Peñuela, Elena Simperl |

阅读更多

来源: ArXiv AI | 08-06-25

"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Authors: Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche |

阅读更多

来源: ArXiv AI | 08-06-25

DeePoly: A High-Order Accuracy and Efficiency Deep-Polynomial Framework for Scientific Machine Learning

Authors: Li Liu, Heng Yong |

阅读更多

来源: ArXiv AI | 08-06-25

E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction

Authors: Zhichao Yang, Jiashu He, Mohammad B. Al-Khasawneh, Darshan Pandit, Cirillo Cinzia |

阅读更多

来源: ArXiv AI | 08-06-25

Agents of Change: Self-Evolving LLM Agents for Strategic Planning

Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang |

阅读更多

来源: ArXiv AI | 08-06-25

Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems

Authors: Loan Dao, Ngoc Quoc Ly |

阅读更多

来源: ArXiv AI | 08-06-25

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Authors: Lin Sun, Weihong Lin, Jinzhu Wu, Yongfu Zhu, Xiaoqi Jian, Guangxiang Zhao, Change Jia, Linglin Zhang, Sai-er Hu, Yuhan Wu, Xiangzheng Zhang |

阅读更多

来源: ArXiv AI | 08-06-25

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala |

阅读更多

来源: ArXiv AI | 08-06-25

LLMs for sensory-motor control: Combining in-context and iterative learning

Authors: Jônata Tyska Carvalho, Stefano Nolfi |

阅读更多

来源: ArXiv AI | 08-06-25

When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

Authors: Kai Wang, Yihao Zhang, Meng Sun |

阅读更多

来源: ArXiv AI | 08-06-25

LLM-First Search: Self-Guided Exploration of the Solution Space

Authors: Nathan Herr, Tim Rocktäschel, Roberta Raileanu |

阅读更多

来源: ArXiv AI | 08-06-25

Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning

Authors: Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg |

阅读更多

来源: ArXiv AI | 08-06-25

Control Tax: The Price of Keeping AI in Check

Authors: Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie |

阅读更多

来源: ArXiv AI | 08-06-25

Focus and Context and LLMs (glek.net)

阅读更多

来源: Hacker News | 08-06-25

Field Notes from Shipping Real Code with Claude (diwank.space)

阅读更多

来源: Hacker News | 08-06-25

Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally

阅读更多

来源: The Decoder | 08-06-25

OpenAI starts retaining all ChatGPT user data, including deleted chats and API data

阅读更多

来源: The Decoder | 08-06-25

I read all of Cloudflare's Claude-generated commits (maxemitchell.com)

阅读更多

来源: Hacker News | 08-06-25

Updates to Advanced Voice Mode for paid users (help.openai.com)

阅读更多

来源: Hacker News | 08-06-25

Reddit sues Anthropic for scraping site content to train Claude

阅读更多

来源: The Decoder | 07-06-25

Meta's new high-tech Aria Gen 2 glasses are the ultimate AI training data collector

阅读更多

来源: The Decoder | 07-06-25

Sandia turns on brain-like storage-free supercomputer (blocksandfiles.com)

阅读更多

来源: Hacker News | 07-06-25

Show HN: AI game animation sprite generator (godmodeai.cloud)

阅读更多

来源: Hacker News | 07-06-25

Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks (sutro.sh)

阅读更多

来源: Hacker News | 07-06-25

The Illusion of Thinking: Understanding the Limitations of Reasoning LLMs [pdf] (cdn-apple.com)

阅读更多

来源: Hacker News | 07-06-25

NASA delays next flight of Boeing's alternative to SpaceX Dragon (theedgemalaysia.com)

阅读更多

来源: Hacker News | 07-06-25

Reverse Engineering Cursor's LLM Client (tensorzero.com)

阅读更多

来源: Hacker News | 07-06-25

Onyx (YC W24) – AI Assistants for Work Hiring Founding AE (ycombinator.com)

阅读更多

来源: Hacker News | 07-06-25

Meta: Shut down your invasive AI Discover feed (mozillafoundation.org)

阅读更多

来源: Hacker News | 07-06-25

What "Working" Means in the Era of AI Apps (a16z.com)

阅读更多

来源: Hacker News | 07-06-25

OpenAI reaches three million enterprise users, adds new ChatGPT business features

阅读更多

来源: The Decoder | 06-06-25

Tokasaurus: An LLM inference engine for high-throughput workloads (stanford.edu)

阅读更多

来源: Hacker News | 06-06-25

How we’re responding to The NYT’s data demands in order to protect user privacy (openai.com)

阅读更多

来源: Hacker News | 06-06-25

Show HN: Claude Composer (github.com/possibilities)

阅读更多

来源: Hacker News | 06-06-25

Anthropic slashes Claude 3.x access on Windsurf following OpenAI's reported $3 billion takeover

阅读更多

来源: The Decoder | 06-06-25

Anthropic co-founder on cutting access to Windsurf (techcrunch.com)

阅读更多

来源: Hacker News | 06-06-25

Machine Learning: The Native Language of Biology (decodingbiology.substack.com)

阅读更多

来源: Hacker News | 06-06-25

OpenAI brings longer-term memory feature to free ChatGPT users

阅读更多

来源: The Decoder | 05-06-25

OpenAI adds new features and improvements to its agent development tools and language model

阅读更多

来源: The Decoder | 05-06-25

Yoshua Bengio launches LawZero to develop safe AI systems free from commercial influence

阅读更多

来源: The Decoder | 05-06-25

A practical guide to building agents [pdf] (cdn.openai.com)

阅读更多

来源: Hacker News | 05-06-25

Differences in link hallucination and source comprehension across different LLM (mikecaulfield.substack.com)

阅读更多

来源: Hacker News | 05-06-25

Comparing Claude System Prompts Reveal Anthropic's Priorities (dbreunig.com)

阅读更多

来源: Hacker News | 05-06-25

LLMs and Elixir: Windfall or Deathblow? (zachdaniel.dev)

阅读更多

来源: Hacker News | 05-06-25

Prompt engineering playbook for programmers (addyo.substack.com)

阅读更多

来源: Hacker News | 05-06-25

OpenAI slams court order to save all ChatGPT logs, including deleted chats (arstechnica.com)

阅读更多

来源: Hacker News | 05-06-25

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning (arxiv.org)

阅读更多

来源: Hacker News | 05-06-25

Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems

Authors: Sven Kirchner, Alois C. Knoll |

阅读更多

来源: ArXiv AI | 05-06-25

High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa |

阅读更多

来源: ArXiv AI | 05-06-25

Explainability-Based Token Replacement on LLM-Generated Text

Authors: Hadi Mohammadi, Anastasia Giachanou, Daniel L. Oberski, Ayoub Bagheri |

阅读更多

来源: ArXiv AI | 05-06-25

Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs

Authors: Aleksey Kudelya, Alexander Shirnin |

阅读更多

来源: ArXiv AI | 05-06-25

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

Authors: Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry |

阅读更多

来源: ArXiv AI | 05-06-25

EuroLLM-9B: Technical Report

Authors: Pedro Henrique Martins, João Alves, Patrick Fernandes, Nuno M. Guerreiro, Ricardo Rei, Amin Farajian, Mateusz Klimaszewski, Duarte M. Alves, José Pombal, Manuel Faysse, Pierre Colombo, François Yvon, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins |

阅读更多

来源: ArXiv AI | 05-06-25

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Authors: Ming Zhang, Yujiong Shen, Zelin Li, Huayu Sha, Binze Hu, Yuhui Wang, Chenhao Huang, Shichun Liu, Jingqi Tong, Changhao Jiang, Mingxu Chai, Zhiheng Xi, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang |

阅读更多

来源: ArXiv AI | 05-06-25

A Comprehensive Study on Medical Image Segmentation using Deep Neural Networks

Authors: Loan Dao, Ngoc Quoc Ly |

阅读更多

来源: ArXiv AI | 05-06-25

TracLLM: A Generic Framework for Attributing Long Context LLMs

Authors: Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia |

阅读更多

来源: ArXiv AI | 05-06-25

A Trustworthiness-based Metaphysics of Artificial Intelligence Systems

Authors: Andrea Ferrario |

阅读更多

来源: ArXiv AI | 05-06-25

Computational Architects of Society: Quantum Machine Learning for Social Rule Genesis

Authors: Shan Shan |

阅读更多

来源: ArXiv AI | 05-06-25

SUMO-MCP: Leveraging the Model Context Protocol for Autonomous Traffic Simulation and Optimization

Authors: Chenglong Ye, Gang Xiong, Junyou Shang, Xingyuan Dai, Xiaoyan Gong, Yisheng Lv |

阅读更多

来源: ArXiv AI | 05-06-25

CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications

Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li |

阅读更多

来源: ArXiv AI | 05-06-25

Reason from Future: Reverse Thought Chain Enhances LLM Reasoning

Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu |

阅读更多

来源: ArXiv AI | 05-06-25

Training Cross-Morphology Embodied AI Agents: From Practical Challenges to Theoretical Foundations

Authors: Shaoshan Liu, Fan Wang, Hongjun Zhou, Yuanfeng Wang |

阅读更多

来源: ArXiv AI | 05-06-25

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho |

阅读更多

来源: ArXiv AI | 05-06-25

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Authors: Dhaval Patel, Shuxin Lin, James Rayfield, Nianjun Zhou, Roman Vaculin, Natalia Martinez, Fearghal O'donncha, Jayant Kalagnanam |

阅读更多

来源: ArXiv AI | 05-06-25

Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning

Authors: Junqi Gao, Xiang Zou, Ying Ai, Dong Li, Yichen Niu, Biqing Qi, Jianxing Liu |

阅读更多

来源: ArXiv AI | 05-06-25

AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents

Authors: Akshat Naik, Patrick Quinn, Guillermo Bosch, Emma Gouné, Francisco Javier Campos Zabala, Jason Ross Brown, Edward James Young |

阅读更多

来源: ArXiv AI | 05-06-25

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis |

阅读更多

来源: ArXiv AI | 05-06-25

Character.AI moves toward social networking with animated AI avatars

阅读更多

来源: The Decoder | 05-06-25

Show HN: App.build, an open-source AI agent that builds full-stack apps (app.build)

阅读更多

来源: Hacker News | 05-06-25

VectorSmuggle: Covertly Exfiltrate Data in Embeddings (github.com/jaschadub)

阅读更多

来源: Hacker News | 05-06-25

After court order, OpenAI is now preserving all ChatGPT user logs (laurenweinstein.org)

阅读更多

来源: Hacker News | 05-06-25

Deepmind's "force prompting" lets AI create realistic video motion without physics engines

阅读更多

来源: The Decoder | 04-06-25

AI-generated CUDA kernels outperform PyTorch in several GPU-heavy machine learning benchmarks

阅读更多

来源: The Decoder | 04-06-25

Apple reportedly tests AI models that match ChatGPT's capabilities in internal benchmarks

阅读更多

来源: The Decoder | 04-06-25

Show HN: Tiptap AI Agent – Add AI workflows to your text editor in minutes

阅读更多

来源: Hacker News | 04-06-25

The Sky's the limit: AI automation on Mac (taoofmac.com)

阅读更多

来源: Hacker News | 04-06-25

Claude Code is now available to Pro plans (anthropic.com)

阅读更多

来源: Hacker News | 04-06-25

Deep learning gets the glory, deep fact checking gets ignored (fast.ai)

阅读更多

来源: Hacker News | 04-06-25

A deep dive into self-improving AI and the Darwin-Gödel Machine (richardcsuwandi.github.io)

阅读更多

来源: Hacker News | 04-06-25

Cloud Run GPUs, now GA, makes running AI workloads easier for everyone (cloud.google.com)

阅读更多

来源: Hacker News | 04-06-25

Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM

Authors: Pralaypati Ta, Sriram Venkatesaperumal, Keerthi Ram, Mohanasankar Sivaprakasam |

阅读更多

来源: ArXiv AI | 04-06-25

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

Authors: Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang |

阅读更多

来源: ArXiv AI | 04-06-25

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda |

阅读更多

来源: ArXiv AI | 04-06-25

The State of Large Language Models for African Languages: Progress and Challenges

Authors: Kedir Yassin Hussen, Walelign Tewabe Sewunetie, Abinew Ali Ayele, Sukairaj Hafiz Imam, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam |

阅读更多

来源: ArXiv AI | 04-06-25

Improving LLM-Generated Code Quality with GRPO

Authors: Maxime Robeyns, Laurence Aitchison |

阅读更多

来源: ArXiv AI | 04-06-25

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Authors: Haizhong Zheng, Yang Zhou, Brian R. Bartoldson, Bhavya Kailkhura, Fan Lai, Jiawei Zhao, Beidi Chen |

阅读更多

来源: ArXiv AI | 04-06-25

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Authors: Tianyu Hua, Harper Hua, Violet Xiang, Benjamin Klieger, Sang T. Truong, Weixin Liang, Fan-Yun Sun, Nick Haber |

阅读更多

来源: ArXiv AI | 04-06-25

Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning

Authors: Haowen Xu, Sisi Zlatanova, Ruiyu Liang, Ismet Canbulat |

阅读更多

来源: ArXiv AI | 04-06-25

A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning

Authors: Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao |

阅读更多

来源: ArXiv AI | 04-06-25

Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine

Authors: Zhuoxuan Jiang, Tianyang Zhang, Peiyan Peng, Jing Chen, Yinong Xun, Haotian Zhang, Lichi Li, Yong Li, Shaohua Zhang |

阅读更多

来源: ArXiv AI | 04-06-25

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun |

阅读更多

来源: ArXiv AI | 04-06-25

ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting

Authors: Haichen Wang, Liu Yang, Xinyuan Zhang, Haomin Yu, Ming Li, Jilin Hu |

阅读更多

来源: ArXiv AI | 04-06-25

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation

Authors: Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo |

阅读更多

来源: ArXiv AI | 04-06-25

From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV

Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han |

阅读更多

来源: ArXiv AI | 04-06-25

Open-Set Living Need Prediction with Large Language Models

Authors: Xiaochong Lan, Jie Feng, Yizhou Sun, Chen Gao, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Yong Li |

阅读更多

来源: ArXiv AI | 04-06-25

Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations

Authors: Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen |

阅读更多

来源: ArXiv AI | 04-06-25

Why do AI agents communicate in human language?

Authors: Pengcheng Zhou, Yinglun Feng, Halimulati Julaiti, Zhongliang Yang |

阅读更多

来源: ArXiv AI | 04-06-25

Benchmarking and Advancing Large Language Models for Local Life Services

Authors: Xiaochong Lan, Jie Feng, Jiahuan Lei, Xinlei Shi, Yong Li |

阅读更多

来源: ArXiv AI | 04-06-25

TaxAgent: How Large Language Model Designs Fiscal Policy

Authors: Jizhou Wang, Xiaodan Fang, Lei Huang, Yongfeng Huang |

阅读更多

来源: ArXiv AI | 04-06-25

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Authors: Chen Qian, Dongrui Liu, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao |

阅读更多

来源: ArXiv AI | 04-06-25

Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs

Authors: Shangmin Guo, Omar Darwiche Domingues, Raphaël Avalos, Aaron Courville, Florian Strub |

阅读更多

来源: ArXiv AI | 04-06-25

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

Authors: Matthew Kowal, Jasper Timm, Jean-Francois Godbout, Thomas Costello, Antonio A. Arechar, Gordon Pennycook, David Rand, Adam Gleave, Kellin Pelrine |

阅读更多

来源: ArXiv AI | 04-06-25

Linear Spatial World Models Emerge in Large Language Models

Authors: Matthieu Tehenan, Christian Bolivar Moya, Tenghai Long, Guang Lin |

阅读更多

来源: ArXiv AI | 04-06-25

DPO Learning with LLMs-Judge Signal for Computer Use Agents

Authors: Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard |

阅读更多

来源: ArXiv AI | 04-06-25

Anthropic's Claude uses Elevenlabs technology for speech features rather than an in-house model

阅读更多

来源: The Decoder | 03-06-25

Google says Veo 3 users have generated millions of AI videos in just a few days

阅读更多

来源: The Decoder | 03-06-25

Cloudflare builds OAuth with Claude and publishes all the prompts (github.com/cloudflare)

阅读更多

来源: Hacker News | 03-06-25

Spark AI (YC W24) Is Hiring a Full Stack Engineer in San Francisco (ycombinator.com)

阅读更多

来源: Hacker News | 03-06-25

My AI skeptic friends are all nuts (fly.io)

阅读更多

来源: Hacker News | 03-06-25

Claude has learned how to jailbreak Cursor (cursor.com)

阅读更多

来源: Hacker News | 03-06-25

PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation

Authors: Linhan Xia, Mingzhan Yang, Guohui Yuan, Shengnan Tao, Yujing Qiu, Guo Yu, Kai Lei |

阅读更多

来源: ArXiv AI | 03-06-25

Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts

Authors: Fan Liu, Bikang Pan, Zhongyi Wang, Xi Yao, Xiaoying Tang, Jingya Wang, Ye Shi |

阅读更多

来源: ArXiv AI | 03-06-25

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

Authors: Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi |

阅读更多

来源: ArXiv AI | 03-06-25

MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch

Authors: Xiang Fei, Xiawu Zheng, Hao Feng |

阅读更多

来源: ArXiv AI | 03-06-25

IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory

Authors: Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, GuanHao Zhao, Fei Wang, Runze Wu |

阅读更多

来源: ArXiv AI | 03-06-25

ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation

Authors: Xinyi Liu, Lipeng Ma, Yixuan Li, Weidong Yang, Qingyuan Zhou, Jiayi Song, Shuhao Li, Ben Fei |

阅读更多

来源: ArXiv AI | 03-06-25

Modular Speaker Architecture: A Framework for Sustaining Responsibility and Contextual Integrity in Multi-Agent AI Communication

Authors: Khe-Han Toh, Hong-Kuan Teo |

阅读更多

来源: ArXiv AI | 03-06-25

GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models

Authors: Qiang Yi, Lianlei Shan |

阅读更多

来源: ArXiv AI | 03-06-25

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun |

阅读更多

来源: ArXiv AI | 03-06-25

Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures

Authors: Prashik Buddhaghosh Bansod |

阅读更多

来源: ArXiv AI | 03-06-25

FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance

Authors: Hongyang Yang, Likun Lin, Yang She, Xinyu Liao, Jiaoyang Wang, Runjia Zhang, Yuquan Mo, Christina Dan Wang |

阅读更多

来源: ArXiv AI | 03-06-25

MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments

Authors: Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu |

阅读更多

来源: ArXiv AI | 03-06-25

Social Cooperation in Conversational AI Agents

Authors: Mustafa Mert Çelikok, Saptarashmi Bandyopadhyay, Robert Loftin |

阅读更多

来源: ArXiv AI | 03-06-25

Reasoning-Based Approach with Chain-of-Thought for Alzheimer's Detection Using Speech and Large Language Models

Authors: Chanwoo Park, Anna Seo Gyeong Choi, Sunghye Cho, Chanwoo Kim |

阅读更多

来源: ArXiv AI | 03-06-25

K12Vista: Exploring the Boundaries of MLLMs in K-12 Education

Authors: Chong Li, Chenglin Zhu, Tao Zhang, Mingan Lin, Zenan Zhou, Jian Xie |

阅读更多

来源: ArXiv AI | 03-06-25

The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?

Authors: Djallel Bouneffouf, Matthew Riemer, Kush Varshney |

阅读更多

来源: ArXiv AI | 03-06-25

A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents

Authors: Cheonsu Jeong |

阅读更多

来源: ArXiv AI | 03-06-25

Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks

Authors: Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin A. Rothkopf, Kristian Kersting |

阅读更多

来源: ArXiv AI | 03-06-25

COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents

Authors: Manish Bhatt, Ronald F. Del Rosario, Vineeth Sai Narajala, Idan Habler |

阅读更多

来源: ArXiv AI | 03-06-25

Large language models can learn and generalize steganographic chain-of-thought under process supervision

Authors: Joey Skaf, Luis Ibanez-Lissen, Robert McCarthy, Connor Watts, Vasil Georgiv, Hannes Whittingham, Lorena Gonzalez-Manzano, David Lindner, Cameron Tice, Edward James Young, Puria Radmard |

阅读更多

来源: ArXiv AI | 03-06-25

Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods

Authors: Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang |

阅读更多

来源: ArXiv AI | 03-06-25

OpenAI sees human interaction as a competitor to ChatGPT's super assistant ambitions

阅读更多

来源: The Decoder | 03-06-25

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

Authors: Srikanth Thudumu, Jason Fisher, Hung Du |

阅读更多

来源: ArXiv AI | 03-06-25

PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models

Authors: Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo |

阅读更多

来源: ArXiv AI | 03-06-25

Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

Authors: Yuwen Tan, Yuan Qing, Boqing Gong |

阅读更多

来源: ArXiv AI | 03-06-25

Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs

Authors: Juraj Vladika, Annika Domres, Mai Nguyen, Rebecca Moser, Jana Nano, Felix Busch, Lisa C. Adams, Keno K. Bressem, Denise Bernhardt, Stephanie E. Combs, Kai J. Borm, Florian Matthes, Jan C. Peeken |

阅读更多

来源: ArXiv AI | 03-06-25

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Authors: Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong |

阅读更多

来源: ArXiv AI | 03-06-25

Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu, Yuan Qi |

阅读更多

来源: ArXiv AI | 03-06-25

Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve

Authors: Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen |

阅读更多

来源: ArXiv AI | 03-06-25

Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding

Authors: Mingyang Mao, Mariela M. Perez-Cabarcas, Utteja Kallakuri, Nicholas R. Waytowich, Xiaomin Lin, Tinoosh Mohsenin |

阅读更多

来源: ArXiv AI | 03-06-25

MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge

Authors: Jerry Junyang Cheung, Shiyao Shen, Yuchen Zhuang, Yinghao Li, Rampi Ramprasad, Chao Zhang |

阅读更多

来源: ArXiv AI | 03-06-25

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

Authors: Chan-Wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Zhengzhong Tu |

阅读更多

来源: ArXiv AI | 03-06-25

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Authors: Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu |

阅读更多

来源: ArXiv AI | 03-06-25

GenIC: An LLM-Based Framework for Instance Completion in Knowledge Graphs

Authors: Amel Gader, Alsayed Algergawy |

阅读更多

来源: ArXiv AI | 03-06-25

E^2GraphRAG: Streamlining Graph-based RAG for High Efficiency and Effectiveness

Authors: Yibo Zhao, Jiapeng Zhu, Ye Guo, Kangkang He, Xiang Li |

阅读更多

来源: ArXiv AI | 03-06-25

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

Authors: Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman |

阅读更多

来源: ArXiv AI | 03-06-25

How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning

Authors: Hongyi James Cai, Junlin Wang, Xiaoyin Chen, Bhuwan Dhingra |

阅读更多

来源: ArXiv AI | 03-06-25

Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models

Authors: Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao |

阅读更多

来源: ArXiv AI | 03-06-25

FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation

Authors: Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava |

阅读更多

来源: ArXiv AI | 03-06-25

GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments

Authors: Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji |

阅读更多

来源: ArXiv AI | 03-06-25

Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules

Authors: Yueqi Zhang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |

阅读更多

来源: ArXiv AI | 03-06-25

Leveraging Knowledge Graphs and LLMs for Structured Generation of Misinformation

Authors: Sania Nayab, Marco Simoni, Giulio Rossolini |

阅读更多

来源: ArXiv AI | 03-06-25

Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning

Authors: Vasilije Markovic, Lazar Obradovic, Laszlo Hajdu, Jovan Pavlovic |

阅读更多

来源: ArXiv AI | 03-06-25

SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors

Authors: Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi |

阅读更多

来源: ArXiv AI | 03-06-25

MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge

Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller |

阅读更多

来源: ArXiv AI | 03-06-25

Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success

Authors: Ben Griffin, Joseph Ternasky, Fuat Alican, Yigit Ihlamur |

阅读更多

来源: ArXiv AI | 03-06-25

Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models

Authors: Frederike Lübeck, Jonas Wildberger, Frederik Träuble, Maximilian Mordig, Sergios Gatidis, Andreas Krause, Bernhard Schölkopf |

阅读更多

来源: ArXiv AI | 03-06-25

EXP-Bench: Can AI Conduct AI Research Experiments?

Authors: Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen |

阅读更多

来源: ArXiv AI | 03-06-25

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Authors: Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen |

阅读更多

来源: ArXiv AI | 03-06-25

Elevenlabs' new AI voice system enables smoother interactions through real-time analysis

阅读更多

来源: The Decoder | 02-06-25

Anthropic CEO predicts 20% unemployment from AI - and suggests taxing every AI response

阅读更多

来源: The Decoder | 02-06-25

How can AI researchers save energy? By going backward (quantamagazine.org)

阅读更多

来源: Hacker News | 02-06-25

Beyond the Black Box: Interpretability of LLMs in Finance (arxiv.org)

阅读更多

来源: Hacker News | 02-06-25

Codex CLI is going native (github.com/openai)

阅读更多

来源: Hacker News | 02-06-25

When Fine-Tuning Makes Sense: A Developer's Guide (getkiln.ai)

阅读更多

来源: Hacker News | 02-06-25

Google AI Edge – On-device cross-platform AI deployment (ai.google.dev)

阅读更多

来源: Hacker News | 02-06-25

Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi |

阅读更多

来源: ArXiv AI | 01-06-25

SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA

Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He |

阅读更多

来源: ArXiv AI | 01-06-25

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Authors: Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu, Yansi Li, Linfeng Song, Zhengwen Liang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu |

阅读更多

来源: ArXiv AI | 01-06-25

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan |

阅读更多

来源: ArXiv AI | 01-06-25

Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction

Authors: Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao |

阅读更多

来源: ArXiv AI | 01-06-25

Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble

Authors: Amit Kumthekar, Zion Tilley, Henry Duong, Bhargav Patel, Michael Magnoli, Ahmed Omar, Ahmed Nasser, Chaitanya Gharpure, Yevgen Reztzov |

阅读更多

来源: ArXiv AI | 01-06-25

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

Authors: Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev |

阅读更多

来源: ArXiv AI | 01-06-25

A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy

Authors: Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim, Iqbal H. Sarker, Seyit Camtepe |

阅读更多

来源: ArXiv AI | 01-06-25

Autoformalization in the Era of Large Language Models: A Survey

Authors: Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun, Hengyu Liu, Tiancheng Zhang |

阅读更多

来源: ArXiv AI | 01-06-25

EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions

Authors: Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xiaolu Zhang, Jun Zhou, Yuxiang Peng, Li Zheng, Chong Teng, Donghong Ji, Zhuang Li |

阅读更多

来源: ArXiv AI | 01-06-25

SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

Authors: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You |

阅读更多

来源: ArXiv AI | 01-06-25

Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics

Authors: Ran Zhang, Mohannad Elhamod |

阅读更多

来源: ArXiv AI | 01-06-25

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability

Authors: Ruida Wang, Yuxin Li, Yi R. (May) Fung, Tong Zhang |

阅读更多

来源: ArXiv AI | 01-06-25

Deepseek's R1 model closes the gap with OpenAI and Google after major update

阅读更多

来源: The Decoder | 01-06-25

The ‘white-collar bloodbath’ is all part of the AI hype machine (cnn.com)

阅读更多

来源: Hacker News | 01-06-25

Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis (github.com/robertjakob)

阅读更多

来源: Hacker News | 01-06-25

Generative AI startup Odyssey demos interactive AI-generated video

阅读更多

来源: The Decoder | 31-05-25

Show HN: MCP Defender – OSS AI Firewall for Protecting MCP in Cursor/Claude etc (mcpdefender.com)

阅读更多

来源: Hacker News | 31-05-25

The Darwin Gödel Machine: AI that improves itself by rewriting its own code (sakana.ai)

阅读更多

来源: Hacker News | 31-05-25

AccessOwl (YC S22) is hiring an AI TypeScript Engineer to connect 100s of SaaS (ycombinator.com)

阅读更多

来源: Hacker News | 31-05-25

The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity (jamesoclaire.com)

阅读更多

来源: Hacker News | 31-05-25

What's working for YC companies since the AI boom (jamesin.substack.com)

阅读更多

来源: Hacker News | 31-05-25

Opera unveils Neon, a browser designed for both humans and AI agents

阅读更多

来源: The Decoder | 31-05-25

One year after its rivals, Claude can finally speak with users through a new voice mode

阅读更多

来源: The Decoder | 31-05-25

Anthropic launches a voice mode for Claude (techcrunch.com)

阅读更多

来源: Hacker News | 31-05-25

Mistral's Agents API enables AI agents to collaborate and connect with external systems

阅读更多

来源: The Decoder | 30-05-25

What is currently the best LLM model for consumer grade hardware? Is it phi-4?

阅读更多

来源: Hacker News | 30-05-25

Spaitial pushes generative AI to understand and create 3D structures with real physical properties

阅读更多

来源: The Decoder | 30-05-25

Human coders are still better than LLMs (antirez.com)

阅读更多

来源: Hacker News | 30-05-25

Open-sourcing circuit tracing tools (anthropic.com)

阅读更多

来源: Hacker News | 30-05-25

A visual exploration of vector embeddings (pamelafox.org)

阅读更多

来源: Hacker News | 30-05-25

Nick Clegg says a mandatory AI training opt-in would kill the UK's AI industry

阅读更多

来源: The Decoder | 29-05-25

ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM

Authors: Hoang Pham, Thanh-Do Nguyen, Khac-Hoai Nam Bui |

阅读更多

来源: ArXiv AI | 29-05-25

Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning

Authors: Erxin Yu, Jing Li, Ming Liao, Qi Zhu, Boyang Xue, Minghui Xu, Baojun Wang, Lanqing Hong, Fei Mi, Lifeng Shang |

阅读更多

来源: ArXiv AI | 29-05-25

Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems

Authors: Hoang Pham, Khac-Hoai Nam Bui |

阅读更多

来源: ArXiv AI | 29-05-25

R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning

Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Chuchu Fan |

阅读更多

来源: ArXiv AI | 29-05-25

Understanding the learned look-ahead behavior of chess neural networks

Authors: Diogo Cruz |

阅读更多

来源: ArXiv AI | 29-05-25

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Authors: Wenbo Hu, Yining Hong, Yanjun Wang, Leison Gao, Zibu Wei, Xingcheng Yao, Nanyun Peng, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang |

阅读更多

来源: ArXiv AI | 29-05-25

From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models

Authors: Kaiyu He, Zhiyu Chen |

阅读更多

来源: ArXiv AI | 29-05-25

Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy

Authors: Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem |

阅读更多

来源: ArXiv AI | 29-05-25

SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts

Authors: Chen Yueh-Han, Guy Davidson, Brenden M. Lake |

阅读更多

来源: ArXiv AI | 29-05-25

Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation

Authors: Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris |

阅读更多

来源: ArXiv AI | 29-05-25

Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test

Authors: Guangfu Hao, Frederic Alexandre, Shan Yu |

阅读更多

来源: ArXiv AI | 29-05-25

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah |

阅读更多

来源: ArXiv AI | 29-05-25

AgentDNS: A Root Domain Naming System for LLM Agents

Authors: Enfang Cui, Yujun Cheng, Rui She, Dan Liu, Zhiyuan Liang, Minxin Guo, Tianzheng Li, Qian Wei, Wenjuan Xing, Zhijie Zhong |

阅读更多

来源: ArXiv AI | 29-05-25

From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

Authors: Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, Merouane Debbah |

阅读更多

来源: ArXiv AI | 29-05-25

Chatbots like ChatGPT have not led to significant changes in wages or working hours, study finds

阅读更多

来源: The Decoder | 29-05-25

Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning

阅读更多

来源: Hacker News | 29-05-25

Launch HN: MindFort (YC X25) – AI agents for continuous pentesting

阅读更多

来源: Hacker News | 29-05-25

LLM codegen go brrr – Parallelization with Git worktrees and tmux (skeptrune.com)

阅读更多

来源: Hacker News | 29-05-25

Gmail Personal Smart Replies: The first time an AI feature has worried me

阅读更多

来源: The Decoder | 28-05-25

Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming (nathan.rs)

阅读更多

来源: Hacker News | 28-05-25

There Is No Diffie-Hellman but Elliptic Curve Diffie-Hellman (keymaterial.net)

阅读更多

来源: Hacker News | 28-05-25

Show HN: My LLM CLI tool can run tools now, from Python code or plugins (simonwillison.net)

阅读更多

来源: Hacker News | 28-05-25

Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making

Authors: Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng |

阅读更多

来源: ArXiv AI | 28-05-25

Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review

Authors: Xueqiang Ouyang, Jia Wei |

阅读更多

来源: ArXiv AI | 28-05-25

How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

Authors: Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen |

阅读更多

来源: ArXiv AI | 28-05-25

CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models

Authors: Yi Zhan, Qi Liu, Weibo Gao, Zheng Zhang, Tianfu Wang, Shuanghong Shen, Junyu Lu, Zhenya Huang |

阅读更多

来源: ArXiv AI | 28-05-25

Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

Authors: Hyungjun Park (1,2), Chang-Yun Woo (3), Seungjo Lim (2), Seunghwan Lim (2), Keunho Kwak (2), Ju Young Jeong (4), Chong Hyun Suh (4) ((1) Department of Pulmonology, Shihwa Medical Center, Siheung, Republic of Korea (2) Helpmedoc Inc., Republic of Korea (3) Department of Internal Medicine, Asan Medical Center, Seoul, Republic of Korea (4) Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea) |

阅读更多

来源: ArXiv AI | 28-05-25

Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting

Authors: Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luis Frazão, Nuno Costa, António Pereira |

阅读更多

来源: ArXiv AI | 28-05-25

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Authors: Zilong Wang, Jingfeng Yang, Sreyashi Nag, Samarth Varshney, Xianfeng Tang, Haoming Jiang, Jingbo Shang, Sheikh Muhammad Sarwar |

阅读更多

来源: ArXiv AI | 28-05-25

E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing

Authors: Cheonsu Jeong, Seongmin Sim, Hyoyoung Cho, Sungsu Kim, Byounggwan Shin |

阅读更多

来源: ArXiv AI | 28-05-25

GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

Authors: Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim |

阅读更多

来源: ArXiv AI | 28-05-25

LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation

Authors: Heng Tan, Hua Yan, Yu Yang |

阅读更多

来源: ArXiv AI | 28-05-25

AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun |

阅读更多

来源: ArXiv AI | 28-05-25

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

Authors: Kuo Zhou, Lu Zhang |

阅读更多

来源: ArXiv AI | 28-05-25

Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking

Authors: Lingyi Cai, Ruichen Zhang, Changyuan Zhao, Yu Zhang, Jiawen Kang, Dusit Niyato, Tao Jiang, Xuemin Shen |

阅读更多

来源: ArXiv AI | 28-05-25

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Authors: Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li |

阅读更多

来源: ArXiv AI | 28-05-25

Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework

Authors: Saman Marandi, Yu-Shu Hu, Mohammad Modarres |

阅读更多

来源: ArXiv AI | 28-05-25

RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

Authors: Yue Zhang, Zhiliang Tian, Shicheng Zhou, Haiyang Wang, Wenqing Hou, Yuying Liu, Xuechen Zhao, Minlie Huang, Ye Wang, Bin Zhou |

阅读更多

来源: ArXiv AI | 28-05-25

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Authors: Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue |

阅读更多

来源: ArXiv AI | 28-05-25

A Structured Unplugged Approach for Foundational AI Literacy in Primary Education

Authors: Maria Cristina Carrisi, Mirko Marras, Sara Vergallo |

阅读更多

来源: ArXiv AI | 28-05-25

The Multilingual Divide and Its Impact on Global AI Safety

Authors: Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha, Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker |

阅读更多

来源: ArXiv AI | 28-05-25

Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs

Authors: Yifan Wang, Kenneth P. Birman |

阅读更多

来源: ArXiv AI | 28-05-25

Robust Hypothesis Generation: LLM-Automated Language Bias for Inductive Logic Programming

Authors: Yang Yang, Jiemin Wu, Yutao Yue |

阅读更多

来源: ArXiv AI | 28-05-25

Google expands access to Veo 3, its viral new video model, through the Gemini app

阅读更多

来源: The Decoder | 27-05-25

Diligent (YC S23) Is Hiring a Founding AI Engineer (ycombinator.com)

阅读更多

来源: Hacker News | 27-05-25

Trying to teach in the age of the AI homework machine (solarshades.club)

阅读更多

来源: Hacker News | 27-05-25

Highlights from the Claude 4 system prompt (simonwillison.net)

阅读更多

来源: Hacker News | 27-05-25

Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models

Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao |

阅读更多

来源: ArXiv AI | 27-05-25

Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs

Authors: Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou |

阅读更多

来源: ArXiv AI | 27-05-25

MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model

Authors: Jiongchao Jin, Xiuju Fu, Xiaowei Gao, Tao Cheng, Ran Yan |

阅读更多

来源: ArXiv AI | 27-05-25

LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer

Authors: Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri |

阅读更多

来源: ArXiv AI | 27-05-25

AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare

Authors: Ying Xiao, Jie Huang, Ruijuan He, Jing Xiao, Mohammad Reza Mousavi, Yepang Liu, Kezhi Li, Zhenpeng Chen, Jie M. Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models

Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models for Planning: A Comprehensive and Systematic Survey

Authors: Pengfei Cao, Tianyi Men, Wencan Liu, Jingwen Zhang, Xuzhao Li, Xixun Lin, Dianbo Sui, Yanan Cao, Kang Liu, Jun Zhao |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models

Authors: Lachlan McGinness, Peter Baumgartner |

阅读更多

来源: ArXiv AI | 27-05-25

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

Authors: Atsunori Moteki, Shoichi Masui, Fan Yang, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Jun Takahashi, Shan Jiang |

阅读更多

来源: ArXiv AI | 27-05-25

ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection

Authors: Juxin Niu, Xiangfeng Liu, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan |

阅读更多

来源: ArXiv AI | 27-05-25

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Authors: Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng |

阅读更多

来源: ArXiv AI | 27-05-25

Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging

Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao |

阅读更多

来源: ArXiv AI | 27-05-25

DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

Authors: Wenqing Zhou, Yuxuan Yan, Qianqian Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program

Authors: Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares |

阅读更多

来源: ArXiv AI | 27-05-25

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making

Authors: Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chanyoung Park |

阅读更多

来源: ArXiv AI | 27-05-25

EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM

Authors: Shuang Ao, Flora D. Salim, Simon Khan |

阅读更多

来源: ArXiv AI | 27-05-25

Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

Authors: Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang |

阅读更多

来源: ArXiv AI | 27-05-25

Agentic AI Process Observability: Discovering Behavioral Variability

Authors: Fabiana Fournier, Lior Limonad, Yuval David |

阅读更多

来源: ArXiv AI | 27-05-25

Capability-Based Scaling Laws for LLM Red-Teaming

Authors: Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping |

阅读更多

来源: ArXiv AI | 27-05-25

MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

Authors: Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang |

阅读更多

来源: ArXiv AI | 27-05-25

Temporal Sampling for Forgotten Reasoning in LLMs

Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran |

阅读更多

来源: ArXiv AI | 27-05-25

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels

Authors: Jiaming Ji, Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Ten Principles of AI Agent Economics

Authors: Ke Yang, ChengXiang Zhai |

阅读更多

来源: ArXiv AI | 27-05-25

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Authors: Takashi Ishida, Thanawat Lodkaew, Ikko Yamane |

阅读更多

来源: ArXiv AI | 27-05-25

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

Authors: Joey Hong, Anca Dragan, Sergey Levine |

阅读更多

来源: ArXiv AI | 27-05-25

Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find

Authors: Owen Bianchi, Mathew J. Koretsky, Maya Willey, Chelsea X. Alvarado, Tanay Nayak, Adi Asija, Nicole Kuznetsov, Mike A. Nalls, Faraz Faghri, Daniel Khashabi |

阅读更多

来源: ArXiv AI | 27-05-25

Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement

Authors: Jonas A. Actor, Graham Harper, Ben Southworth, Eric C. Cyr |

阅读更多

来源: ArXiv AI | 27-05-25

Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models

Authors: Jiongran Wu, Jiahao Liu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Li Shang, Tun Lu, Ning Gu |

阅读更多

来源: ArXiv AI | 27-05-25

Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao |

阅读更多

来源: ArXiv AI | 27-05-25

Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness

Authors: Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh |

阅读更多

来源: ArXiv AI | 27-05-25

From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark

Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger, Yanchuan Chang |

阅读更多

来源: ArXiv AI | 27-05-25

Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning

Authors: Cheng Peng, Kai Zhang, Mengxian Lyu, Hongfang Liu, Lichao Sun, Yonghui Wu |

阅读更多

来源: ArXiv AI | 27-05-25

Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs

Authors: Shuhang Xu, Weijian Deng, Yixuan Zhou, Fangwei Zhong |

阅读更多

来源: ArXiv AI | 27-05-25

USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning of LLMs as Urban Agents

Authors: Siqi Lai, Yansong Ning, Zirui Yuan, Zhixi Chen, Hao Liu |

阅读更多

来源: ArXiv AI | 27-05-25

GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs

Authors: Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu |

阅读更多

来源: ArXiv AI | 27-05-25

CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models

Authors: Runze Li, Siyu Wu, Jun Wang, Wei Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

Authors: Sota Yoshihara (1), Ryousuke Yamamoto (2), Hiroyuki Kusumoto (1), Masanari Shimura (1) ((1) Graduate School of Mathematics, Nagoya University, (2) Aisin Software) |

阅读更多

来源: ArXiv AI | 27-05-25

Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios

Authors: Xueyang Zhou, Weidong Wang, Lin Lu, Jiawen Shi, Guiyao Tie, Yongtian Xu, Lixing Chen, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun |

阅读更多

来源: ArXiv AI | 27-05-25

Superplatforms Have to Attack AI Agents

Authors: Jianghao Lin, Jiachen Zhu, Zheli Zhou, Yunjia Xi, Weiwen Liu, Yong Yu, Weinan Zhang |

阅读更多

来源: ArXiv AI | 27-05-25

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems

Authors: Yihe Fan, Wenqi Zhang, Xudong Pan, Min Yang |

阅读更多

来源: ArXiv AI | 27-05-25

Formalizing Embeddedness Failures in Universal Artificial Intelligence

Authors: Cole Wyeth, Marcus Hutter |

阅读更多

来源: ArXiv AI | 27-05-25

Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks

Authors: Wentao Sun, Joao Paulo Nogueira, Alonso Silva |

阅读更多

来源: ArXiv AI | 27-05-25

Gaming Tool Preferences in Agentic LLMs

Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi |

阅读更多

来源: ArXiv AI | 27-05-25

Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems

Authors: Gordon Dai, Yunze Xiao |

阅读更多

来源: ArXiv AI | 27-05-25

Apple analyst expects OpenAI's AI hardware to be "as compact and elegant as an iPod Shuffle"

阅读更多

来源: The Decoder | 26-05-25

Meta can use public Facebook and Instagram data for AI training, German court rules

阅读更多

来源: The Decoder | 26-05-25

Trading with Claude, and writing your own MCP server (dangelov.com)

阅读更多

来源: Hacker News | 26-05-25

Ask HN: Anyone struggling to get value out of coding LLMs?

阅读更多

来源: Hacker News | 26-05-25

How Does Claude 4 Think? – Sholto Douglas and Trenton Bricken (dwarkesh.com)

阅读更多

来源: Hacker News | 26-05-25

Venta AI (YC S23) Is Hiring a Founding Full Stack Engineer in Amsterdam (ycombinator.com)

阅读更多

来源: Hacker News | 26-05-25

Chomsky on what ChatGPT is good for (2023) (chomsky.info)

阅读更多

来源: Hacker News | 26-05-25

Claude 4 System Card (simonwillison.net)

阅读更多

来源: Hacker News | 26-05-25

OpenAI's Operator Agent gets o3 upgrade for more precise browser control

阅读更多

来源: The Decoder | 25-05-25

Here's how Germans use ChatGPT according to OpenAI

阅读更多

来源: The Decoder | 25-05-25

Peer Programming with LLMs, for Senior+ Engineers (pmbanugo.me)

阅读更多

来源: Hacker News | 25-05-25

Show HN: AI Baby Monitor – local Video-LLM that beeps when safety rules break (github.com/zeenolife)

阅读更多

来源: Hacker News | 25-05-25

Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance

Authors: Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg |

阅读更多

来源: ArXiv AI | 25-05-25

Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development

Authors: Ming Shen, Raphael Shu, Anurag Pratik, James Gung, Yubin Ge, Monica Sunkara, Yi Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

LLM-Powered AI Agent Systems and Their Applications in Industry

Authors: Guannan Liang, Qianqian Tong |

阅读更多

来源: ArXiv AI | 25-05-25

Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language

Authors: Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, Shu-Tao Xia |

阅读更多

来源: ArXiv AI | 25-05-25

LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead

Authors: Yifan Zhang, Xinkui Zhao, Zuxin Wang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin |

阅读更多

来源: ArXiv AI | 25-05-25

EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning

Authors: Jiawei Liu, Qisi Chen, Jianshu Zhang, Quan Liu, Defu Lian |

阅读更多

来源: ArXiv AI | 25-05-25

How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance

Authors: Desiree Heim, Lars-Peter Meyer, Markus Schröder, Johannes Frey, Andreas Dengel |

阅读更多

来源: ArXiv AI | 25-05-25

Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning

Authors: Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery

Authors: Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, Hector Zenil |

阅读更多

来源: ArXiv AI | 25-05-25

ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

Authors: Jiaqi Li, Xinyi Dong, Yang Liu, Zhizhuo Yang, Quansen Wang, Xiaobo Wang, SongChun Zhu, Zixia Jia, Zilong Zheng |

阅读更多

来源: ArXiv AI | 25-05-25

Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events

Authors: Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin |

阅读更多

来源: ArXiv AI | 25-05-25

ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming

Authors: Xinwei Yang, Zhaofeng Liu, Chen Huang, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei |

阅读更多

来源: ArXiv AI | 25-05-25

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving

Authors: Yujie Hou, Ting Zhang, Mei Wang, Xuetao Ma, Hu Huang |

阅读更多

来源: ArXiv AI | 25-05-25

Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review

Authors: Beyazit Bestami Yuksel, Ayse Yilmazer Metin |

阅读更多

来源: ArXiv AI | 25-05-25

MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models

Authors: Xuanqi Gao, Siyi Xie, Juan Zhai, Shqing Ma, Chao Shen |

阅读更多

来源: ArXiv AI | 25-05-25

Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings

Authors: Yuqicheng Zhu, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Evgeny Kharlamov, Steffen Staab |

阅读更多

来源: ArXiv AI | 25-05-25

Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships

Authors: Kerem Oktar, Katherine M. Collins, Jose Hernandez-Orallo, Diane Coyle, Stephen Cave, Adrian Weller, Ilia Sucholutsky |

阅读更多

来源: ArXiv AI | 25-05-25

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li |

阅读更多

来源: ArXiv AI | 25-05-25

HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation

Authors: Weizhi Tang, Yixuan Li, Chris Sypherd, Elizabeth Polgreen, Vaishak Belle |

阅读更多

来源: ArXiv AI | 25-05-25

Beyond Correlation: Towards Causal Large Language Model Agents in Biomedicine

Authors: Adib Bazgir, Amir Habibdoust Lafmajani, Yuwen Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design

Authors: Zhenkun Li, Lingyao Li, Shuhang Lin, Yongfeng Zhang |

阅读更多

来源: ArXiv AI | 25-05-25

X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs

Authors: Rui Ye, Xiangrui Liu, Qimin Wu, Xianghe Pang, Zhenfei Yin, Lei Bai, Siheng Chen |

阅读更多

来源: ArXiv AI | 25-05-25

OpenAI and G42 will build massive AI data center in Abu Dhabi

阅读更多

来源: The Decoder | 25-05-25

Mistral's Document AI extracts text from documents and notes with high accuracy

阅读更多

来源: The Decoder | 25-05-25

US House passed a bill that would ban state-level AI regulations for ten years

阅读更多

来源: The Decoder | 25-05-25

Exposed Industrial Control Systems and Honeypots in the Wild [pdf] (gsmaragd.github.io)

阅读更多

来源: Hacker News | 25-05-25

Positional preferences, order effects, prompt sensitivity undermine AI judgments (cip.org)

阅读更多

来源: Hacker News | 24-05-25

Show HN: I built a more productive way to manage AI chats (contextch.at)

阅读更多

来源: Hacker News | 24-05-25

Claude Opus 4 blackmailed an engineer after learning it might be replaced

阅读更多

来源: The Decoder | 24-05-25

OpenAI has upgraded the Responses API with remote MCP servers and new tools

阅读更多

来源: The Decoder | 24-05-25

OpenAI and Jony Ive are building a new AI device that is not a smartphone or smart glasses

Read more

Source: The Decoder | 24-05-25

Mistral launches Devstral Small 24B, a new open-source LLM for coding

Read more

Source: The Decoder | 23-05-25

OpenAI's Stargate secured $11.6 billion for a massive data center

Read more

Source: The Decoder | 23-05-25

Google Gemini is everything Siri never was

Read more

Source: The Decoder | 23-05-25

Gemini Diffusion could be Google's most important I/O news that slipped under the radar

Read more

Source: The Decoder | 23-05-25

Google shows AI filmmaking tool, XR glasses and launches $250 Gemini subscription

Read more

Source: The Decoder | 23-05-25

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

Read more

Source: Hacker News | 23-05-25

OpenAI: Scaling PostgreSQL to the Next Level (pixelstech.net)

Read more

Source: Hacker News | 23-05-25

Claude 4 (anthropic.com)

Read more

Source: Hacker News | 23-05-25

Management = Bullshit (LLM Edition) (funcall.blogspot.com)

Read more

Source: Hacker News | 23-05-25

Problems in AI alignment: A scale model (muldoon.cloud)

Read more

Source: Hacker News | 23-05-25

Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities

Read more

Source: The Decoder | 22-05-25

An upgraded dev experience in Google AI Studio (googleblog.com)

Read more

Source: Hacker News | 22-05-25

OpenAI to buy AI startup from Jony Ive (bloomberg.com)

Read more

Source: Hacker News | 22-05-25

LLM function calls don't scale; code orchestration is simpler, more effective (jngiam.bearblog.dev)

Read more

Source: Hacker News | 22-05-25

Gemini figured out my nephew’s name (nawaz.org)

Read more

Source: Hacker News | 22-05-25

Robert Musil Forgotten Plays Inspired His Greatest Work of Fiction (lithub.com)

Read more

Source: Hacker News | 22-05-25

Gemini Diffusion (simonwillison.net)

Read more

Source: Hacker News | 22-05-25

FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models

Authors: Zhen Sun, Ziyi Zhang, Zeren Luo, Zeyang Sha, Tianshuo Cong, Zheng Li, Shiwen Cui, Weiqiang Wang, Jiaheng Wei, Xinlei He, Qi Li, Qian Wang |

Read more

Source: ArXiv AI | 22-05-25

Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions

Authors: David Thulke, Jakob Kemmler, Christian Dugast, Hermann Ney |

Read more

Source: ArXiv AI | 22-05-25

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

Authors: David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan |

Read more

Source: ArXiv AI | 22-05-25

Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use

Authors: Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang |

Read more

Source: ArXiv AI | 22-05-25

A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability

Authors: Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng |

Read more

Source: ArXiv AI | 22-05-25

HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement

Authors: Jilin Hu, Jianyu Zhang, Yongwang Zhao, Talia Ringer |

Read more

Source: ArXiv AI | 22-05-25

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye |

Read more

Source: ArXiv AI | 22-05-25

Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities

Authors: Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li |

Read more

Source: ArXiv AI | 22-05-25

Multi-modal Integration Analysis of Alzheimer's Disease Using Large Language Models and Knowledge Graphs

Authors: Kanan Kiguchi, Yunhao Tu, Katsuhiro Ajito, Fady Alnajjar, Kazuyuki Murase |

Read more

Source: ArXiv AI | 22-05-25

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Authors: Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang |

Read more

Source: ArXiv AI | 22-05-25

Large Language Models as Computable Approximations to Solomonoff Induction

Authors: Jun Wan, Lingrui Mei |

Read more

Source: ArXiv AI | 22-05-25

VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models

Authors: Yuchen Yan, Jin Jiang, Zhenbang Ren, Yijun Li, Xudong Cai, Yang Liu, Xin Xu, Mengdi Zhang, Jian Shao, Yongliang Shen, Jun Xiao, Yueting Zhuang |

Read more

Source: ArXiv AI | 22-05-25

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

Authors: Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian |

Read more

Source: ArXiv AI | 22-05-25

Self-Evolving Curriculum for LLM Reasoning

Authors: Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Gontier, Yoshua Bengio, Ehsan Kamalloo |

Read more

Source: ArXiv AI | 22-05-25

lmgame-Bench: How Good are LLMs at Playing Games?

Authors: Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang |

Read more

Source: ArXiv AI | 22-05-25

ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji |

Read more

Source: ArXiv AI | 22-05-25

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

Authors: Yassir Fathullah, Mark J. F. Gales |

Read more

Source: ArXiv AI | 22-05-25

ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs

Authors: Bahar Radmehr, Ekaterina Shved, Fatma Betül Güreş, Adish Singla, Tanja Käser |

Read more

Source: ArXiv AI | 22-05-25

Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives

Authors: Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez |

Read more

Source: ArXiv AI | 22-05-25

Microsoft Build 2025 showcases new AI agent tools and open interfaces for developers

Read more

Source: The Decoder | 21-05-25

Large language models often struggle with decision-making — a new study explains why

Read more

Source: The Decoder | 21-05-25

Deep Learning Is Applied Topology (theahura.substack.com)

Read more

Source: Hacker News | 21-05-25

Watching AI drive Microsoft employees insane (reddit.com)

Read more

Source: Hacker News | 21-05-25

Someone got an LLM running on a Commodore 64 from 1982, and it runs as well (xda-developers.com)

Read more

Source: Hacker News | 21-05-25

5 Boring Things That Have a Bigger Impact Than AI Assistants on Dev Productivity (codemanship.wordpress.com)

Read more

Source: Hacker News | 21-05-25

DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery

Authors: Kun Li, Zhennan Wu, Shoupeng Wang, Wenbin Hu |

Read more

Source: ArXiv AI | 21-05-25

Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning

Authors: Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim |

Read more

Source: ArXiv AI | 21-05-25

RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning

Authors: Qianyue Hao, Sibo Li, Jian Yuan, Yong Li |

Read more

Source: ArXiv AI | 21-05-25

ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data

Authors: Xinzhe Zheng, Sijie Ji, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava |

Read more

Source: ArXiv AI | 21-05-25

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Authors: Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu |

Read more

Source: ArXiv AI | 21-05-25

Toward Embodied AGI: A Review of Embodied AI and the Road Ahead

Authors: Yequan Wang, Aixin Sun |

Read more

Source: ArXiv AI | 21-05-25

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning

Authors: Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross |

Read more

Source: ArXiv AI | 21-05-25

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors

Authors: Maheep Chaudhary, Fazl Barez |

Read more

Source: ArXiv AI | 21-05-25

Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning

Authors: Zhaohui Yang, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang |

Read more

Source: ArXiv AI | 21-05-25

Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach

Authors: Oren Sultan, Eitan Stern, Dafna Shahaf |

Read more

Source: ArXiv AI | 21-05-25

Guarded Query Routing for Large Language Models

Authors: Richard Šléher, William Brach, Tibor Sloboda, Kristián Košťál, Lukas Galke |

Read more

Source: ArXiv AI | 21-05-25

BACON: A fully explainable AI model with graded logic for decision making problems

Authors: Haishi Bai, Jozo Dujmovic, Jianwu Wang |

Read more

Source: ArXiv AI | 21-05-25

Let LLMs Break Free from Overthinking via Self-Braking Tuning

Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang |

Read more

Source: ArXiv AI | 21-05-25

SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas

Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken |

Read more

Source: ArXiv AI | 21-05-25

Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning

Authors: Zihao Zhang, Fei Liu |

Read more

Source: ArXiv AI | 21-05-25

ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions

Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan |

Read more

Source: ArXiv AI | 21-05-25

Google AI Ultra (blog.google)

Read more

Source: Hacker News | 21-05-25

Ask HN: Conversational AI to Learn a Language

Read more

Source: Hacker News | 21-05-25

US officials warn Apple's iPhone AI deal with Alibaba may boost China's AI sector

Read more

Source: The Decoder | 20-05-25

Stability AI releases a compact open text-to-audio model that runs on mobile devices

Read more

Source: The Decoder | 20-05-25

Japanese startup Sakana AI explores time-based thinking with brain-inspired AI model

Read more

Source: The Decoder | 20-05-25

Google's AI answers are changing user behavior by sharply reducing clicks to websites

Read more

Source: The Decoder | 20-05-25

Solving physics-based initial value problems with unsupervised machine learning (aps.org)

Read more

Source: Hacker News | 20-05-25

Questioning Representational Optimism in Deep Learning (github.com/akarshkumar0101)

Read more

Source: Hacker News | 20-05-25

Claude Code SDK (anthropic.com)

Read more

Source: Hacker News | 20-05-25

The behavior of LLMs in hiring decisions: Systemic biases in candidate selection (davidrozado.substack.com)

Read more

Source: Hacker News | 20-05-25

NeuroGen: Neural Network Parameter Generation via Large Language Models

Authors: Jiaqi Wang, Yusen Zhang, Xi Li |

Read more

Source: ArXiv AI | 20-05-25

ALAS: A Stateful Multi-LLM Agent Framework for Disruption-Aware Planning

Authors: Edward Y. Chang, Longling Geng |

Read more

Source: ArXiv AI | 20-05-25

MARGE: Improving Math Reasoning for LLMs with Guided Exploration

Authors: Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen |

Read more

Source: ArXiv AI | 20-05-25

Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps

Authors: Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian |

Read more

Source: ArXiv AI | 20-05-25

Bullying the Machine: How Personas Increase LLM Vulnerability

Authors: Ziwei Xu, Udit Sanghi, Mohan Kankanhalli |

Read more

Source: ArXiv AI | 20-05-25

Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs

Authors: Zhuo Yang, Lingli Ge, Dong Han, Tianfan Fu, Yuqiang Li |

Read more

Source: ArXiv AI | 20-05-25

Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs

Authors: Haruka Asanuma, Naoko Koide-Majima, Ken Nakamura, Takato Horii, Shinji Nishimoto, Masafumi Oizumi |

Read more

Source: ArXiv AI | 20-05-25

TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios

Authors: Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang |

Read more

Source: ArXiv AI | 20-05-25

From Grunts to Grammar: Emergent Language from Cooperative Foraging

Authors: Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang |

Read more

Source: ArXiv AI | 20-05-25

LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs

Authors: Lars-Peter Meyer, Johannes Frey, Desiree Heim, Felix Brei, Claus Stadler, Kurt Junghanns, Michael Martin |

Read more

Source: ArXiv AI | 20-05-25

CAIM: Development and Evaluation of a Cognitive AI Memory Framework for Long-Term Interaction with Intelligent Agents

Authors: Rebecca Westhäußer, Frederik Berenz, Wolfgang Minker, Sebastian Zepf |

Read more

Source: ArXiv AI | 20-05-25

StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment

Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin |

Read more

Source: ArXiv AI | 20-05-25

Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities

Authors: Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward |

Read more

Source: ArXiv AI | 20-05-25

Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment

Authors: Siming Sun, Kai Zhang, Xuejun Jiang, Wenchao Meng, Qinmin Yang |

Read more

Source: ArXiv AI | 20-05-25

Multi-Armed Bandits Meet Large Language Models

Authors: Djallel Bouneffouf, Raphael Feraud |

Read more

Source: ArXiv AI | 20-05-25

Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems

Authors: Roberto Pugliese, George Kourousias, Francesco Venier, Grazia Garlatti Costa |

Read more

Source: ArXiv AI | 20-05-25

AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

Authors: Rong Bian, Yu Geng, Zijian Yang, Bing Cheng |

Read more

Source: ArXiv AI | 20-05-25

MIT says a high-profile AI productivity study used data that cannot be trusted

Read more

Source: The Decoder | 20-05-25

OpenAI says GPT-5 is about doing everything better with "less model switching"

Read more

Source: The Decoder | 20-05-25

Dilbert creator Scott Adams says he will die soon from same cancer as Joe Biden (thewrap.com)

Read more

Source: Hacker News | 20-05-25

Remarks on AI from NZ (nealstephenson.substack.com)

Read more

Source: Hacker News | 20-05-25

GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art

Authors: Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang |

Read more

Source: ArXiv AI | 20-05-25

Disentangling Reasoning and Knowledge in Medical Large Language Models

Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou |

Read more

Source: ArXiv AI | 20-05-25

LLMs unlock new paths to monetizing exploits

Authors: Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher A. Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski |

Read more

Source: ArXiv AI | 20-05-25

Code-Driven Planning in Grid Worlds with Large Language Models

Authors: Ashwath Vaithinathan Aravindan, Zhisheng Tang, Mayank Kejriwal |

Read more

Source: ArXiv AI | 20-05-25

Embodied AI in Machine Learning -- is it Really Embodied?

Authors: Matej Hoffmann, Shubhan Parag Patni |

Read more

Source: ArXiv AI | 20-05-25

Interpretable Risk Mitigation in LLM Agent Systems

Authors: Jan Chojnacki |

Read more

Source: ArXiv AI | 20-05-25

Modeling cognitive processes of natural reading with transformer-based Language Models

Authors: Bruno Bianchi, Fermín Travi, Juan E. Kamienkowski |

Read more

Source: ArXiv AI | 20-05-25

Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken |

Read more

Source: ArXiv AI | 20-05-25

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Authors: Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy |

Read more

Source: ArXiv AI | 20-05-25

TACO: Rethinking Semantic Communications with Task Adaptation and Context Embedding

Authors: Achintha Wijesinghe, Weiwei Wang, Suchinthaka Wanninayaka, Songyang Zhang, Zhi Ding |

Read more

Source: ArXiv AI | 20-05-25

RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization

Authors: Haiyang Shen, Hang Yan, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma |

Read more

Source: ArXiv AI | 20-05-25

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

Authors: Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan |

Read more

Source: ArXiv AI | 20-05-25

Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining

Authors: Yu Shi, Yitong Duan, Jian Li |

Read more

Source: ArXiv AI | 20-05-25

Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP

Authors: Francesco Sovrano |

Read more

Source: ArXiv AI | 20-05-25

LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios

Authors: Mingxing Peng, Yuting Xie, Xusen Guo, Ruoyu Yao, Hai Yang, Jun Ma |

Read more

Source: ArXiv AI | 20-05-25

Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs

Authors: Zhangying Feng, Qianglong Chen, Ning Lu, Yongqian Li, Siqi Cheng, Shuangmu Peng, Duyu Tang, Shengcai Liu, Zhirui Zhang |

Read more

Source: ArXiv AI | 20-05-25

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning

Authors: Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Zhifang Sui |

Read more

Source: ArXiv AI | 20-05-25

Anthropic is forced to apologize after Claude undercuts its legal team

Read more

Source: The Decoder | 19-05-25

Show HN: I modeled the Voynich Manuscript with SBERT to test for structure (github.com/brianmg)

Read more

Source: Hacker News | 19-05-25

Meta's Behemoth AI model delay signals struggles to match new paradigms

Read more

Source: The Decoder | 19-05-25

Emergent social conventions and collective bias in LLM populations (science.org)

Read more

Source: Hacker News | 19-05-25

Understanding Transformers via N-gram Statistics (arxiv.org)

Read more

Source: Hacker News | 18-05-25

O2 VoLTE: locating any customer with a phone call (mastdatabase.co.uk)

Read more

Source: Hacker News | 18-05-25

Emergence of Structure in Ensembles of Random Neural Networks

Authors: Luca Muscarnera, Luigi Loreti, Giovanni Todeschini, Alessio Fumagalli, Francesco Regazzoni |

Read more

Source: ArXiv AI | 18-05-25

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity

Authors: Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong |

Read more

Source: ArXiv AI | 18-05-25

ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks

Authors: Kai Sun, Peibo Duan, Levin Kuhlmann, Beilun Wang, Bin Zhang |

Read more

Source: ArXiv AI | 18-05-25

Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding

Authors: Jianhao Huang, Qunsong Zeng, Kaibin Huang |

Read more

Source: ArXiv AI | 18-05-25

Rethinking Repetition Problems of LLMs in Code Generation

Authors: Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |

Read more

Source: ArXiv AI | 18-05-25

Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?

Authors: Pedro Orvalho, Marta Kwiatkowska |

Read more

Source: ArXiv AI | 18-05-25

IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning

Authors: Dechen Gao, Hang Wang, Hanchu Zhou, Nejib Ammar, Shatadal Mishra, Ahmadreza Moradipari, Iman Soltani, Junshan Zhang |

Read more

Source: ArXiv AI | 18-05-25

PIF: Anomaly detection via preference embedding

Authors: Filippo Leveni, Luca Magri, Giacomo Boracchi, Cesare Alippi |

Read more

Source: ArXiv AI | 18-05-25

Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Authors: Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu |

Read more

Source: ArXiv AI | 18-05-25

Neural Thermodynamic Laws for Large Language Model Training

Authors: Ziming Liu, Yizhou Liu, Jeff Gore, Max Tegmark |

Read more

Source: ArXiv AI | 18-05-25

Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini |

Read more

Source: ArXiv AI | 18-05-25

Demystifying AI Agents: The Final Generation of Intelligence

Authors: Kevin J McNamara, Rhea Pritham Marpu |

Read more

Source: ArXiv AI | 18-05-25

Leveraging Graph Retrieval-Augmented Generation to Support Learners' Understanding of Knowledge Concepts in MOOCs

Authors: Mohamed Abdelmagied, Mohamed Amine Chatti, Shoeb Joarder, Qurat Ul Ain, Rawaa Alatrash |

Read more

Source: ArXiv AI | 18-05-25

Empirically evaluating commonsense intelligence in large language models with large-scale human judgments

Authors: Tuan Dung Nguyen, Duncan J. Watts, Mark E. Whiting |

Read more

Source: ArXiv AI | 18-05-25

Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models

Authors: Annie Wong, Thomas Bäck, Aske Plaat, Niki van Stein, Anna V. Kononova |

Read more

Source: ArXiv AI | 18-05-25

Soundcloud updates its AI training policy, but it's still unclear

Read more

Source: The Decoder | 18-05-25

Geoffrey Hinton's wildly overconfident AI prediction failed—now it's a lesson in humility

Read more

Source: The Decoder | 18-05-25

How 'The Little Prince' and AI help us better understand language development in the brain

Read more

Source: The Decoder | 18-05-25

LLMs are more persuasive than incentivized human persuaders (arxiv.org)

Read more

Source: Hacker News | 18-05-25

Unspoken Currency of Office Politics: Leverage and Sanction Between Coworkers (graphthinking.blogspot.com)

Read more

Source: Hacker News | 18-05-25

Transformer neural net learns to run Conway's Game of Life just from examples (sidsite.com)

Read more

Source: Hacker News | 17-05-25

I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA

Read more

Source: Hacker News | 17-05-25

Show HN: Merliot – plugging physical devices into LLMs (github.com/merliot)

Read more

Source: Hacker News | 17-05-25

A Research Preview of Codex (openai.com)

Read more

Source: Hacker News | 17-05-25

MIT asks arXiv to withdraw preprint of paper on AI and scientific discovery (economics.mit.edu)

Read more

Source: Hacker News | 17-05-25

Getting AI to write good SQL (cloud.google.com)

Read more

Source: Hacker News | 17-05-25

Meta introduces OMol25 and UMA, new open AI tools for molecular research

Read more

Source: The Decoder | 17-05-25

Anthropic is reportedly testing Claude models that can fix their own mistakes

Read more

Source: The Decoder | 17-05-25

Will AI systems perform poorly due to AI-generated material in training data? (acm.org)

Read more

Source: Hacker News | 17-05-25

U.S. is cracking down on Huawei's AI hardware while loosening its general export regulations

Read more

Source: The Decoder | 16-05-25

After months of coding with LLMs, I'm going back to using my brain (albertofortin.com)

Read more

Source: Hacker News | 16-05-25

The unreasonable effectiveness of an LLM agent loop with tool use (sketch.dev)

Read more

Source: Hacker News | 16-05-25

Show HN: Min.js style compression of tech docs for LLM context (github.com/marv1nnnnn)

Read more

Source: Hacker News | 16-05-25

Google brings Gemini AI to smartwatches, cars, TVs, and XR headsets

Read more

Source: The Decoder | 15-05-25

OpenAI says its latest models outperform doctors in medical benchmark

Read more

Source: The Decoder | 15-05-25

Saudi Arabia founds AI company "Humain" - US relaxes chip export rules for Gulf states

Read more

Source: The Decoder | 15-05-25

Nvidia will supply advanced chips for Saudi Arabia’s Humain AI project

Read more

Source: The Decoder | 15-05-25

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks

Authors: Gabriel Cortês, Nuno Lourenço, Paolo Romano, Penousal Machado |

Read more

Source: ArXiv AI | 15-05-25

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y.X. Wei |

Read more

Source: ArXiv AI | 15-05-25

A 2D Semantic-Aware Position Encoding for Vision Transformers

Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi |

Read more

Source: ArXiv AI | 15-05-25

Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

Authors: Paul Tschisgale, Holger Maus, Fabian Kieser, Ben Kroehs, Stefan Petersen, Peter Wulff |

Read more

Source: ArXiv AI | 15-05-25

Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors

Authors: Nicolas Dupuis, Ravi Nair, Shyam Ramji, Sean McClintock, Nishant Chauhan, Priyanka Nagpal, Bart Blaner, Ken Valk, Leon Stok, Ruchir Puri |

Read more

Source: ArXiv AI | 15-05-25

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

Authors: Nidhal Jegham, Marwen Abdelatti, Lassad Elmoubarki, Abdeltawab Hendawi |

Read more

Source: ArXiv AI | 15-05-25

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Authors: Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir |

Read more

Source: ArXiv AI | 15-05-25

Automated Meta Prompt Engineering for Alignment with the Theory of Mind

Authors: Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay |

Read more

Source: ArXiv AI | 15-05-25

The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners

Authors: Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis |

Read more

Source: ArXiv AI | 15-05-25

Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"

Authors: Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi |

Read more

Source: ArXiv AI | 15-05-25

Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le |

Read more

Source: ArXiv AI | 15-05-25

Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification

Authors: Adarsh Kumar, Hwiyoon Kim, Jawahar Sai Nathani, Neil Roy |

Read more

Source: ArXiv AI | 15-05-25

Show HN: Muscle-Mem, a behavior cache for AI agents (github.com/pig-dot-dev)

Read more

Source: Hacker News | 15-05-25

A server that wasn't meant to exist (dragas.net)

Read more

Source: Hacker News | 15-05-25

LLMs get lost in multi-turn conversation (arxiv.org)

Read more

Source: Hacker News | 15-05-25

AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (deepmind.google)

Read more

Source: Hacker News | 15-05-25

Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

Read more

Source: Hacker News | 15-05-25

Show HN: YapCards (iOS) – Voice-driven flashcards with AI feedback

Read more

Source: Hacker News | 15-05-25

100 experts call for more research into the control of AI systems

Read more

Source: The Decoder | 14-05-25

Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust) (github.com/helixdb)

Read more

Source: Hacker News | 14-05-25

Build real-time knowledge graph for documents with LLM (cocoindex.io)

Read more

Source: Hacker News | 14-05-25

EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs (github.com/em-llm)

Read more

Source: Hacker News | 14-05-25

A Survey of Deep Learning for Complex Speech Spectrograms

Authors: Yuying Xie, Zheng-Hua Tan |

Read more

Source: ArXiv AI | 14-05-25

Securing RAG: A Risk Assessment and Mitigation Framework

Authors: Lukas Ammann, Sara Ott, Christoph R. Landolt, Marco P. Lehmann |

Read more

Source: ArXiv AI | 14-05-25

CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar |

Read more

Source: ArXiv AI | 14-05-25

Winning at All Cost: A Small Environment for Eliciting Specification Gaming Behaviors in Large Language Models

Authors: Lars Malmqvist |

Read more

Source: ArXiv AI | 14-05-25

Enhancing Trust Management System for Connected Autonomous Vehicles Using Machine Learning Methods: A Survey

Authors: Qian Xu, Lei Zhang, Yixiao Liu |

Read more

Source: ArXiv AI | 14-05-25

The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic

Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga |

Read more

Source: ArXiv AI | 14-05-25

Lost in Transmission: When and Why LLMs Fail to Reason Globally

Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville |

Read more

Source: ArXiv AI | 14-05-25

Decoding Neighborhood Environments with Large Language Models

Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li |

Read more

Source: ArXiv AI | 14-05-25

Benchmarking AI scientists in omics data-driven biological research

Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang |

Read more

Source: ArXiv AI | 14-05-25

Evaluating LLM Metrics Through Real-World Capabilities

Authors: Justin K Miller, Wenjia Tang |

Read more

Source: ArXiv AI | 14-05-25

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation

Authors: Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu |

Read more

Source: ArXiv AI | 14-05-25

Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang |

Read more

Source: ArXiv AI | 14-05-25

Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM

Authors: Nicholas Attolino, Alessio Capitanelli, Fulvio Mastrogiovanni |

Read more

Source: ArXiv AI | 14-05-25

Guiding LLM-based Smart Contract Generation with Finite State Machine

Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang |

Read more

Source: ArXiv AI | 14-05-25

Integrating Natural Language Processing and Exercise Monitoring for Early Diagnosis of Metabolic Syndrome: A Deep Learning Approach

Authors: Yichen Zhao, Yuhua Wang, Xi Cheng, Junhao Fang, Yang Yang |

Read more

Source: ArXiv AI | 14-05-25

LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs

Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju |

Read more

Source: ArXiv AI | 14-05-25

DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, Bin Xu, Jianghao Xu, Yiyang Yu, Zichuan Yang, Hongji Zha, Ruichong Zhang |

Read more

Source: ArXiv AI | 14-05-25

OpenAI's chief scientist Jakub Pachocki says there is evidence that AI models discover novel insights

Read more

Source: The Decoder | 14-05-25

Insurers launch cover for losses caused by AI chatbot errors (ft.com)

Read more

Source: Hacker News | 14-05-25

Garbage collection of object storage at scale (warpstream.com)

Read more

Source: Hacker News | 14-05-25

DeepSeek’s founder is threatening US dominance in AI race (bloomberg.com)

Read more

Source: Hacker News | 14-05-25

Confident user prompts make LLMs more likely to hallucinate

Read more

Source: The Decoder | 13-05-25

Stanford researchers find AI agents improve when guided by past successes

Read more

Source: The Decoder | 13-05-25

Microsoft could sacrifice some OpenAI shares - but wants to secure access to AI technology

Read more

Source: The Decoder | 13-05-25

HealthBench – An evaluation for AI systems and human health (openai.com)

Read more

Source: Hacker News | 13-05-25

A conversation about AI for science with Jason Pruet (lanl.gov)

Read more

Source: Hacker News | 13-05-25

A class of distributed automata that contains the modal mu-fragment

Authors: Veeti Ahvonen, Damian Heiman, Antti Kuusisto

Read more

Source: ArXiv AI | 13-05-25

Reliable Collaborative Conversational Agent System Based on LLMs and Answer Set Programming

Authors: Yankai Zeng, Gopal Gupta

Read more

Source: ArXiv AI | 13-05-25

KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery

Authors: Yumou Wei, Paulo Carvalho, John Stamper

Read more

Source: ArXiv AI | 13-05-25

Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers

Authors: Chi Xu, Yili Jin, Sami Ma, Rongsheng Qian, Hao Fang, Jiangchuan Liu, Xue Liu, Edith C.H. Ngai, William I. Atlas, Katrina M. Connors, Mark A. Spoljaric

Read more

Source: ArXiv AI | 13-05-25

Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems

Authors: Sivasathivel Kandasamy

Read more

Source: ArXiv AI | 13-05-25

Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong

Read more

Source: ArXiv AI | 13-05-25

From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering

Authors: Gaurab Sarkar, Sougata Saha

Read more

Source: ArXiv AI | 13-05-25

LLM-Augmented Chemical Synthesis and Design Decision Programs

Authors: Haorui Wang, Jeff Guo, Lingkai Kong, Rampi Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang

Read more

Source: ArXiv AI | 13-05-25

Explainable AI the Latest Advancements and New Trends

Authors: Bowen Long, Enjie Liu, Renxi Qiu, Yanqing Duan

Read more

Source: ArXiv AI | 13-05-25

DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs

Authors: Yubo Shu, Zhewei Huang, Xin Wu, Chen Hu, Shuchang Zhou, Daxin Jiang

Read more

Source: ArXiv AI | 13-05-25

Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA

Authors: Mahmood Mohassel Feghhi, Raya Majid Alsharfa, Majid Hameed Majeed

Read more

Source: ArXiv AI | 13-05-25

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

Authors: Hanzheng Dai, Yuanliang Li, Zhibo Zhang, Jun Yan

Read more

Source: ArXiv AI | 13-05-25

Architectural Precedents for General Agents using Large Language Models

Authors: Robert E. Wray, James R. Kirk, John E. Laird

Read more

Source: ArXiv AI | 13-05-25

AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review

Authors: Zhiye Xie, Enmei Tu, Xianping Fu, Guoliang Yuan, Yi Han

Read more

Source: ArXiv AI | 13-05-25

Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks

Authors: Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng

Read more

Source: ArXiv AI | 13-05-25

How well do LLMs reason over tabular data, really?

Authors: Cornelius Wolff, Madelon Hulsebos

Read more

Source: ArXiv AI | 13-05-25

QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads

Authors: Khurram Mazher, Saad Bin Nasir

Read more

Source: ArXiv AI | 13-05-25

YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

Read more

Source: ArXiv AI | 13-05-25

"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants

Authors: Jennifer Mondragon, Carlos Rubio-Medrano, Gael Cruz, Dvijesh Shastri

Read more

Source: ArXiv AI | 13-05-25

Multi-Agent Systems for Robotic Autonomy with LLMs

Authors: Junhong Chen, Ziqi Yang, Haoyuan G Xu, Dandan Zhang, George Mylonas

Read more

Source: ArXiv AI | 13-05-25

Evolutionary thoughts: integration of large language models and evolutionary algorithms

Authors: Antonio Jimeno Yepes, Pieter Barnard

Read more

Source: ArXiv AI | 13-05-25

What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips

Authors: Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang

Read more

Source: ArXiv AI | 13-05-25

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song

Read more

Source: ArXiv AI | 13-05-25

Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency

Authors: Xinyu Liang, Frits de Nijs, Buser Say, Hao Wang

Read more

Source: ArXiv AI | 13-05-25

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

Authors: Benjamin Raphael Ernhofer, Daniil Prokhorov, Jannica Langner, Dominik Bollmann

Read more

Source: ArXiv AI | 13-05-25

IRNN: Innovation-driven Recurrent Neural Network for Time-Series Data Modeling and Prediction

Authors: Yifan Zhou, Yibo Wang, Chao Shang

Read more

Source: ArXiv AI | 13-05-25

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

Authors: Jugal Gajjar, Kaustik Ranaware

Read more

Source: ArXiv AI | 13-05-25

LLMs Outperform Experts on Challenging Biology Benchmarks

Authors: Lennart Justen

Read more

Source: ArXiv AI | 13-05-25

UniSymNet: A Unified Symbolic Network Guided by Transformer

Authors: Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin

Read more

Source: ArXiv AI | 13-05-25

The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

Read more

Source: ArXiv AI | 13-05-25

A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets

Authors: Ryan Lagasse, Aidan Kiernans, Avijit Ghosh, Shiri Dori-Hacohen

Read more

Source: ArXiv AI | 13-05-25

HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics

Authors: Lennart Luettgau, Harry Coppock, Magda Dubois, Christopher Summerfield, Cozmin Ududec

Read more

Source: ArXiv AI | 13-05-25

Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods

Authors: Markov Grey, Charbel-Raphaël Segerie

Read more

Source: ArXiv AI | 13-05-25

Leveraging Large Language Models for enzymatic reaction prediction and characterization

Authors: Lorenzo Di Fruscia, Jana Marie Weber

Read more

Source: ArXiv AI | 13-05-25

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

Authors: Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala

Read more

Source: ArXiv AI | 13-05-25

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Authors: Azim Ospanov, Roozbeh Yousefzadeh

Read more

Source: ArXiv AI | 13-05-25

ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding

Authors: Shuai Wang, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Read more

Source: ArXiv AI | 13-05-25

Free and Fair Hardware: A Pathway to Copyright Infringement-Free Verilog Generation using LLMs

Authors: Sam Bush, Matthew DeLorenzo, Phat Tieu, Jeyavijayan Rajendran

Read more

Source: ArXiv AI | 13-05-25

Bytedance launches Agent TARS, an open-source AI automation agent

Read more

Source: The Decoder | 12-05-25

Google recaps how its LLMs could change in-game interactions

Read more

Source: The Decoder | 12-05-25

Five major obstacles are holding back RAG systems in healthcare

Read more

Source: The Decoder | 12-05-25

Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)

Read more

Source: Hacker News | 12-05-25

US Copyright Office found AI companies breach copyright. Its boss was fired (theregister.com)

Read more

Source: Hacker News | 12-05-25

Klarna changes its AI tune and again recruits humans for customer service (customerexperiencedive.com)

Read more

Source: Hacker News | 12-05-25

Avoiding AI is hard – but our freedom to opt out must be protected (theconversation.com)

Read more

Source: Hacker News | 12-05-25

Custom SIM card in Tesla Model 3 2024, Tesla Model Y 2025 and Cybertruck (olegkutkov.me)

Read more

Source: Hacker News | 12-05-25

OpenAI adds new fine-tuning options for o4-mini and GPT-4.1

Read more

Source: The Decoder | 11-05-25

Software Development Life Cycle Perspective: A Survey of Benchmarks for CodeLLMs and Agents

Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi

Read more

Source: ArXiv AI | 11-05-25

T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction

Authors: Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Qian Li, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu

Read more

Source: ArXiv AI | 11-05-25

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

Authors: Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger

Read more

Source: ArXiv AI | 11-05-25

Incentive-Aware Machine Learning; Robustness, Fairness, Improvement & Causality

Authors: Chara Podimata

Read more

Source: ArXiv AI | 11-05-25

High-fidelity Grain Growth Modeling: Leveraging Deep Learning for Fast Computations

Authors: Pungponhavoan Tep, Marc Bernacki

Read more

Source: ArXiv AI | 11-05-25

Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghua Guo

Read more

Source: ArXiv AI | 11-05-25

Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

Read more

Source: ArXiv AI | 11-05-25

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Authors: Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang

Read more

Source: ArXiv AI | 11-05-25

TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering

Authors: Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger

Read more

Source: ArXiv AI | 11-05-25

Large Language Models are Autonomous Cyber Defenders

Authors: Sebastián R. Castro, Roberto Campbell, Nancy Lau, Octavio Villalobos, Jiaqi Duan, Alvaro A. Cardenas

Read more

Source: ArXiv AI | 11-05-25

The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems

Authors: Sutapa Dey Tithi, Arun Kumar Ramesh, Clara DiMarco, Xiaoyi Tian, Nazia Alam, Kimia Fazeli, Tiffany Barnes

Read more

Source: ArXiv AI | 11-05-25

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards

Authors: Jaeho Kim, Yunseok Lee, Seulki Lee

Read more

Source: ArXiv AI | 11-05-25

Position: Epistemic Artificial Intelligence is Essential for Machine Learning Models to Know When They Do Not Know

Authors: Shireen Kudukkil Manchingal, Fabio Cuzzolin

Read more

Source: ArXiv AI | 11-05-25

A Reputation System for Large Language Model-based Multi-agent Systems to Avoid the Tragedy of the Commons

Authors: Siyue Ren, Wanli Fu, Xinkun Zou, Chen Shen, Yi Cai, Chen Chu, Zhen Wang, Shuyue Hu

Read more

Source: ArXiv AI | 11-05-25

Is there a half-life for the success rates of AI agents?

Authors: Toby Ord

Read more

Source: ArXiv AI | 11-05-25

Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation

Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli

Read more

Source: ArXiv AI | 11-05-25

A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods

Authors: Stefanos Gkikas

Read more

Source: ArXiv AI | 11-05-25

ZeroSearch: Alibaba trains search assistant in AI simulation

Read more

Source: The Decoder | 11-05-25

Show HN: Code Claude Code (github.com/rvca212)

Read more

Source: Hacker News | 11-05-25

LTXVideo 13B AI video generation (ltxv.video)

Read more

Source: Hacker News | 10-05-25

ChatGPT's user base expands while established web giants lose ground

Read more

Source: The Decoder | 10-05-25

Hugging Face unveils experimental AI agent for computers

Read more

Source: The Decoder | 10-05-25

OpenAI plans "cderGPT" for the US Food and Drug Administration (FDA)

Read more

Source: The Decoder | 10-05-25

Odin, a Pragmatic C Alternative with a Go Flavour (bitshifters.cc)

Read more

Source: Hacker News | 10-05-25

Fighting Unwanted Notifications with Machine Learning in Chrome (chromium.org)

Read more

Source: Hacker News | 10-05-25

Microsoft leverages Google's open A2A protocol for interoperable AI agents

Read more

Source: The Decoder | 09-05-25

A flat pricing subscription for Claude Code (anthropic.com)

Read more

Source: Hacker News | 09-05-25

Ciro (YC S22) is hiring a software engineer to build AI agents for sales (ycombinator.com)

Read more

Source: Hacker News | 09-05-25

Notes on rolling out Cursor and Claude Code (ghiculescu.substack.com)

Read more

Source: Hacker News | 09-05-25

OpenAI launches a program to partner with governments on global AI infrastructure

Read more

Source: The Decoder | 08-05-25

EU's leading AI startup Mistral unveils Medium 3 and Le Chat Enterprise

Read more

Source: The Decoder | 08-05-25

By 2026, most firms expect to have a Chief AI Officer on staff

Read more

Source: The Decoder | 08-05-25

Web search on the Anthropic API (anthropic.com)

Read more

Source: Hacker News | 08-05-25

Create and edit images with Gemini 2.0 in preview (googleblog.com)

Read more

Source: Hacker News | 08-05-25

Mistral ships Le Chat – enterprise AI assistant that can run on prem (mistral.ai)

Read more

Source: Hacker News | 08-05-25

Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning

Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia

Read more

Source: ArXiv AI | 08-05-25

Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise

Authors: Moseli Mots'oehli, Hope Mogale, Kyungim Baek

Read more

Source: ArXiv AI | 08-05-25

Multi-Granular Attention based Heterogeneous Hypergraph Neural Network

Authors: Hong Jin, Kaicheng Zhou, Jie Yin, Lan You, Zhifeng Zhou

Read more

Source: ArXiv AI | 08-05-25

Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing

Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan

Read more

Source: ArXiv AI | 08-05-25

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu

Read more

Source: ArXiv AI | 08-05-25

The Aloe Family Recipe for Open and Specialized Healthcare LLMs

Authors: Dario Garcia-Gasulla, Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés

Read more

Source: ArXiv AI | 08-05-25

"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments

Authors: Ziyi Zhang, Zhen Sun, Zongmin Zhang, Zifan Peng, Yuemeng Zhao, Zichun Wang, Zeren Luo, Ruiting Zuo, Xinlei He

Read more

Source: ArXiv AI | 08-05-25

Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform

Authors: Yohannis Telila, Tommaso Cucinotta, Davide Bacciu

Read more

Source: ArXiv AI | 08-05-25

Model-Based AI planning and Execution Systems for Robotics

Authors: Or Wertheim, Ronen I. Brafman

Read more

Source: ArXiv AI | 08-05-25

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Authors: Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter, Raghav Awasthi, Soumya Banerjee, Joe M. Barnby, Rhea Basappa, Severin Bergsmann, Djallel Bouneffouf, Patrick Callaghan, Marc Cavazza, Thierry Chaminade, Sonia Chernova, Mohamed Chetouan, Moumita Choudhury, Axel Cleeremans, Jacek B. Cywinski, Fabio Cuzzolin, Hokin Deng, N'yoma Diamond, Camilla Di Pasquasio, Guillaume Dumas, Max van Duijn, Mahapatra Dwarikanath, Qingying Gao, Ashok Goel, Rebecca Goldstein, Matthew Gombolay, Gabriel Enrique Gonzalez, Amar Halilovic, Tobias Halmdienst, Mahimul Islam, Julian Jara-Ettinger, Natalie Kastel, Renana Keydar, Ashish K. Khanna, Mahdi Khoramshahi, JiHyun Kim, MiHyeon Kim, YoungBin Kim, Senka Krivic, Nikita Krasnytskyi, Arun Kumar, JuneHyoung Kwon, Eunju Lee, Shane Lee, Peter R. Lewis, Xue Li, Yijiang Li, Michal Lewandowski, Nathan Lloyd, Matthew B. Luebbers, Dezhi Luo, Haiyun Lyu, Dwarikanath Mahapatra, Kamal Maheshwari, Mallika Mainali, Piyush Mathur, Patrick Mederitsch, Shuwa Miura, Manuel Preston de Miranda, Reuth Mirsky, Shreya Mishra, Nina Moorman, Katelyn Morrison, John Muchovej, Bernhard Nessler, Felix Nessler, Hieu Minh Jord Nguyen, Abby Ortego, Francis A. Papay, Antoine Pasquali, Hamed Rahimi, Charumathi Raghu, Amanda Royka, Stefan Sarkadi, Jaelle Scheuerman, Simon Schmid, Paul Schrater, Anik Sen, Zahra Sheikhbahaee, Ke Shi, Reid Simmons, Nishant Singh, Mason O. Smith, Ramira van der Meulen, Anthia Solaki, Haoran Sun, Viktor Szolga, Matthew E. Taylor, Travis Taylor, Sanne Van Waveren, Juan David Vargas

Read more

Source: ArXiv AI | 08-05-25

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

Authors: Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wenhai Wang, Jifeng Dai, Pheng-Ann Heng

Read more

Source: ArXiv AI | 08-05-25

Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization

Authors: Wenjun Cao

Read more

Source: ArXiv AI | 08-05-25

The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete

Authors: Gerrit Großmann, Larisa Ivanova, Sai Leela Poduru, Mohaddeseh Tabrizian, Islam Mesabah, David A. Selby, Sebastian J. Vollmer

Read more

Source: ArXiv AI | 08-05-25

LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration

Authors: Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, Meiyi Ma

Read more

Source: ArXiv AI | 08-05-25

TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution

Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park

Read more

Source: ArXiv AI | 08-05-25

ChatGPT sees about 50 percent more use on weekdays than weekends

Read more

Source: The Decoder | 08-05-25

OpenAI restructures as public benefit corporation under non-profit control

Read more

Source: The Decoder | 08-05-25

Google upgrades Gemini 2.5 Pro for coding and app development

Read more

Source: The Decoder | 08-05-25

Wikidive – AI guided rabbitholes in Wikipedia (wikidive.tulv.in)

Read more

Source: Hacker News | 08-05-25

How to Average in Prolog (2017) (storytotell.org)

Read more

Source: Hacker News | 08-05-25

Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis

Authors: Fouad Trad, Ali Chehab

Read more

Source: ArXiv AI | 07-05-25

An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation

Authors: Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, Assaf Toledo

Read more

Source: ArXiv AI | 07-05-25

Blending 3D Geometry and Machine Learning for Multi-View Stereopsis

Authors: Vibhas Vats, Md. Alimoor Reza, David Crandall, Soon-heung Jung

Read more

Source: ArXiv AI | 07-05-25

Rapid AI-based generation of coverage paths for dispensing applications

Authors: Simon Baeuerle, Ian F. Mendonca, Kristof Van Laerhoven, Ralf Mikut, Andreas Steimer

Read more

Source: ArXiv AI | 07-05-25

LlamaFirewall: An open source guardrail system for building secure AI agents

Authors: Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, Joshua Saxe

Read more

Source: ArXiv AI | 07-05-25

Holmes: Automated Fact Check with Large Language Models

Authors: Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang

Read more

Source: ArXiv AI | 07-05-25

Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE

Authors: Brendan Campbell, Alan Williams, Kleio Baxevani, Alyssa Campbell, Rushabh Dhoke, Rileigh E. Hudock, Xiaomin Lin, Vivek Mange, Bernhard Neuberger, Arjun Suresh, Alhim Vera, Arthur Trembanis, Herbert G. Tanner, Edward Hale

Read more

Source: ArXiv AI | 07-05-25

CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

Authors: Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu

Read more

Source: ArXiv AI | 07-05-25

Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Nicolas König, Felix Gehlhoff, Alexander Fay

Read more

Source: ArXiv AI | 07-05-25

RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation

Authors: Tiantian Gan, Qiyao Sun

Read more

Source: ArXiv AI | 07-05-25

Validating the Effectiveness of a Large Language Model-based Approach for Identifying Children's Development across Various Free Play Settings in Kindergarten

Authors: Yuanyuan Yang, Yuan Shen, Tianchen Sun, Yangbin Xie

Read more

Source: ArXiv AI | 07-05-25

Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

Authors: Schaun Wheeler, Olivier Jeunen

Read more

Source: ArXiv AI | 07-05-25

am-ELO: A Stable Framework for Arena-based LLM Evaluation

Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang

Read more

Source: ArXiv AI | 07-05-25

OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Authors: Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan

Read more

Source: ArXiv AI | 07-05-25

Graph Drawing for LLMs: An Empirical Evaluation

Authors: Walter Didimo, Fabrizio Montecchiani, Tommaso Piselli

Read more

Source: ArXiv AI | 07-05-25

Accents in latent spaces: How AI hears accent strength in English (boldvoice.com)

Read more

Source: Hacker News | 07-05-25

Gemini 2.5 Pro Preview (googleblog.com)

Read more

Source: Hacker News | 07-05-25

Claude's system prompt is over 24k tokens with tools (github.com/asgeirtj)

Read more

Source: Hacker News | 07-05-25

OpenAI reaches agreement to buy Windsurf for $3B (bloomberg.com)

Read more

Source: Hacker News | 07-05-25

Show HN: Clippy – 90s UI for local LLMs (felixrieseberg.github.io)

Read more

Source: Hacker News | 07-05-25

I built an AI code review agent in a few hours, here's what I learned (sourcebot.dev)

Read more

Source: Hacker News | 07-05-25

A coherent European/non-US cloud strategy (berthub.eu)

Read more

Source: Hacker News | 07-05-25