This repository automatically fetches and displays relevant papers from arXiv based on configured criteria.
You can fork this repository to deploy your own instance (see the setup steps below).
- Last Updated: 2025-09-16 12:51:36 UTC
- Total Papers Found: 30
- Categories Monitored: cs.AI, cs.CL, cs.DC, cs.LG
Authors: Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat
Category: cs.AI
Published: 2025-09-16
Score: 13.0
arXiv:2505.14864v2 Announce Type: replace-cross Abstract: To reduce the computational and memory overhead of Large Language Models, various approaches have been proposed. These include a) Mixture of Experts (MoEs), where token routing affects compute balance; b) gradual pruning of model parameters;...
Authors: Haiduo Huang, Fuwei Yang, Zhenhua Liu, Xuanwu Yin, Dong Li, Pengju Ren, Emad Barsoum
Category: cs.AI
Published: 2025-09-16
Score: 10.5
arXiv:2509.11815v1 Announce Type: cross Abstract: Speculative decoding is a powerful way to accelerate autoregressive large language models (LLMs), but directly porting it to vision-language models (VLMs) faces unique systems constraints: the prefill stage is dominated by visual tokens whose count ...
Authors: Swapnil Gandhi, Christos Kozyrakis
Category: cs.DC
Published: 2025-09-16
Score: 9.5
arXiv:2412.15411v3 Announce Type: replace Abstract: As large language models scale, training them requires thousands of GPUs over extended durations--making frequent failures an inevitable reality. While checkpointing remains the primary fault-tolerance mechanism, existing methods fall short when a...
Authors: Ruiqi Wang, Jing Ren, Tongyu Song, Wenjun Li, Xiong Wang, Sheng Wang, Shizhong Xu
Category: cs.AI
Published: 2025-09-16
Score: 9.0
arXiv:2509.10493v1 Announce Type: cross Abstract: The deployment of large-scale LoRaWAN networks requires jointly optimizing conflicting metrics like Packet Delivery Ratio (PDR) and Energy Efficiency (EE) by dynamically allocating transmission parameters, including Carrier Frequency, Spreading Fact...
Authors: Pedro Savarese
Category: cs.AI
Published: 2025-09-16
Score: 9.0
arXiv:2509.00174v2 Announce Type: replace-cross Abstract: Recent progress in deep learning has been driven by increasingly larger models. However, their computational and energy demands have grown proportionally, creating significant barriers to their deployment and to a wider adoption of deep lear...
Authors: Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci
Category: cs.DC
Published: 2025-09-16
Score: 9.0
arXiv:2502.20969v2 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large datastores, leading to system challenges in latency-sensiti...
Authors: Yuhang Zhou, Zhibin Wang, Peng Jiang, Haoran Xia, Junhe Lu, Qianyu Jiang, Rong Gu, Hengxi Xu, Xinjing Huang, Guanghuan Fang, Zhiheng Hu, Jingyi Zhang, Yongjin Cai, Jian He, Chen Tian
Category: cs.DC
Published: 2025-09-16
Score: 9.0
arXiv:2508.21613v2 Announce Type: replace Abstract: Training large language models faces frequent interruptions due to various faults, demanding robust fault-tolerance. Existing backup-free methods, such as redundant computation, dynamic parallelism, and data rerouting, each incur performance penal...
Authors: Weihao Zhu, Long Shi, Kang Wei, Zhen Mei, Zhe Wang, Jiaheng Wang, Jun Li
Category: cs.DC
Published: 2025-09-16
Score: 8.5
arXiv:2509.12141v1 Announce Type: new Abstract: As an enabling architecture of Large Models (LMs), Mixture of Experts (MoE) has become prevalent thanks to its sparsely-gated mechanism, which lowers computational overhead while maintaining learning performance comparable to dense LMs. The essence of...
9. FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
Authors: Kyungmin Bin, Seungbeom Choi, Jimyoung Son, Jieun Choi, Daseul Bae, Daehyeon Baek, Kihyo Moon, Minsung Jang, Hyojung Lee
Category: cs.DC
Published: 2025-09-16
Score: 8.5
arXiv:2509.06261v2 Announce Type: replace Abstract: Recent advances in Post-Training Quantization (PTQ) techniques have significantly increased demand for serving quantized large language models (LLMs), enabling higher throughput and substantially reduced memory usage with minimal accuracy loss. Qu...
Authors: Santhosh G S, Saurav Prakash, Balaraman Ravindran
Category: cs.AI
Published: 2025-09-16
Score: 8.0
arXiv:2509.11155v1 Announce Type: cross Abstract: The quadratic complexity of the attention mechanism remains a fundamental barrier to scaling Large Language Models (LLMs) to longer contexts, creating a critical bottleneck in both computation and memory. To address this, we introduce AQUA (Attentio...
Authors: David Schiff, Ofir Lindenbaum, Yonathan Efroni
Category: cs.AI
Published: 2025-09-16
Score: 8.0
arXiv:2509.11259v1 Announce Type: cross Abstract: Gradient based optimization is fundamental to most modern deep reinforcement learning algorithms, however, it introduces significant sensitivity to hyperparameters, unstable training dynamics, and high computational costs. We propose TabPFN RL, a no...
12. EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
Authors: Jingyu Xiao, Zhongyi Zhang, Yuxuan Wan, Yintong Huo, Yang Liu, Michael R. Lyu
Category: cs.AI
Published: 2025-09-16
Score: 8.0
arXiv:2509.12159v1 Announce Type: cross Abstract: Multimodal Large Language Models have demonstrated exceptional performance in UI2Code tasks, significantly enhancing website development efficiency. However, these tasks incur substantially higher computational overhead than traditional code generat...
Authors: Xinzhe Zheng, Zhen-Qun Yang, Haoran Xie, S. Joe Qin, Arlene Chen, Fangzhen Lin
Category: cs.AI
Published: 2025-09-16
Score: 8.0
arXiv:2509.03054v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of Natural Language Processing (NLP) tasks, but require substantial memory and computational resources. Binary quantization, which compresses model wei...
Authors: Mingxiao Huo, Jiayi Zhang, Hewei Wang, Jinfeng Xu, Zheyu Chen, Huilin Tai, Yijun Chen
Category: cs.CL
Published: 2025-09-16
Score: 8.0
arXiv:2509.11961v1 Announce Type: new Abstract: Vision-Language Models (VLMs) enable powerful multimodal reasoning but suffer from slow autoregressive inference, limiting their deployment in real-time applications. We introduce Spec-LLaVA, a system that applies speculative decoding to accelerate VL...
Authors: Rahma Nouaji, Stella Bitchebe, Ricardo Macedo, Oana Balmau
Category: cs.DC
Published: 2025-09-16
Score: 8.0
arXiv:2509.10712v1 Announce Type: new Abstract: Data loaders are used by Machine Learning (ML) frameworks like PyTorch and TensorFlow to apply transformations to data before feeding it into the accelerator. This operation is called data preprocessing. Data preprocessing plays an important role in t...
16. LogGuardQ: A Cognitive-Enhanced Reinforcement Learning Framework for Cybersecurity Anomaly Detection in Security Logs
Authors: Umberto Gonçalves de Sousa
Category: cs.AI
Published: 2025-09-16
Score: 7.5
arXiv:2509.10511v1 Announce Type: cross Abstract: Reinforcement learning (RL) has transformed sequential decision-making, but traditional algorithms like Deep Q-Networks (DQNs) and Proximal Policy Optimization (PPO) often struggle with efficient exploration, stability, and adaptability in dynamic e...
17. Application of Machine Learning for Correcting Defect-induced Neuromorphic Circuit Inference Errors
Authors: Vedant Sawal, Hiu Yung Wong
Category: cs.AI
Published: 2025-09-16
Score: 7.5
arXiv:2509.11113v1 Announce Type: cross Abstract: This paper presents a machine learning-based approach to correct inference errors caused by stuck-at faults in fully analog ReRAM-based neuromorphic circuits. Using a Design-Technology Co-Optimization (DTCO) simulation framework, we model and analyz...
Authors: Alejandro Dopico-Castro, Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos
Category: cs.AI
Published: 2025-09-16
Score: 7.5
arXiv:2509.11285v1 Announce Type: cross Abstract: Incremental learning remains a critical challenge in machine learning, as models often struggle with catastrophic forgetting -- the tendency to lose previously acquired knowledge when learning new information. These challenges are even more pronounced...
Authors: Sangjun Lee, Seung-taek Woo, Jungyu Jin, Changhun Lee, Eunhyeok Park
Category: cs.AI
Published: 2025-09-16
Score: 7.5
arXiv:2509.12019v1 Announce Type: cross Abstract: To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-w...
20. Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony -- a Decentralized Multi-Agent System
Authors: Ronghua Shi, Yiou Liu, Xinyu Ying, Yang Tan, Yuchun Feng, Lynn Ai, Bill Shi, Xuhui Wang, Zhuang Liu
Category: cs.AI
Published: 2025-09-16
Score: 7.5
arXiv:2507.09179v2 Announce Type: replace Abstract: Decentralized finance (DeFi) has introduced a new era of permissionless financial innovation but also led to unprecedented market manipulation. Without centralized oversight, malicious actors coordinate shilling campaigns and pump-and-dump schemes...
21. D$^2$HScore: Reasoning-Aware Hallucination Detection via Semantic Breadth and Depth Analysis in LLMs
Authors: Yue Ding, Xiaofang Zhu, Tianze Xia, Junfei Wu, Xinlong Chen, Qiang Liu, Liang Wang
Category: cs.CL
Published: 2025-09-16
Score: 7.5
arXiv:2509.11569v1 Announce Type: new Abstract: Although Large Language Models (LLMs) have achieved remarkable success, their practical application is often hindered by the generation of non-factual content, which is called "hallucination". Ensuring the reliability of LLMs' outputs is a critical ch...
Authors: Minxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, Hui Chen, Wei Zhou, Jungong Han, Guiguang Ding, Cheng Luo, Di Zhang, Kun Gai, Songlin Hu
Category: cs.CL
Published: 2025-09-16
Score: 7.5
arXiv:2502.12455v3 Announce Type: replace Abstract: As large language models continue to scale, computational costs and resource consumption have emerged as significant challenges. While existing sparsification methods like pruning reduce computational overhead, they risk losing model knowledge thr...
Authors: M. Z. Haider, M. Dias de Assuncao, Kaiwen Zhang
Category: cs.DC
Published: 2025-09-16
Score: 7.5
arXiv:2509.11006v1 Announce Type: cross Abstract: Blockchain technology offers decentralization and security but struggles with scalability, particularly in enterprise settings where efficiency and controlled access are paramount. Sharding is a promising solution for private blockchains, yet existi...
Authors: Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen
Category: cs.DC
Published: 2025-09-16
Score: 7.5
arXiv:2509.04719v2 Announce Type: replace Abstract: The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. However, existing diffusion parallelism inference schemes often u...
25. Holographic Knowledge Manifolds: A Novel Pipeline for Continual Learning Without Catastrophic Forgetting in Large Language Models
Authors: Justin Arndt
Category: cs.LG
Published: 2025-09-16
Score: 7.5
arXiv:2509.10518v1 Announce Type: new Abstract: We introduce the Holographic Knowledge Manifold (HKM), a four-phase pipeline that achieves zero catastrophic forgetting in AI knowledge representation while maintaining minimal memory growth and high efficiency. Leveraging fractal quantization, probab...
Authors: Sangjoon Park, Yeonjong Shin, Jinhyun Choo
Category: cs.LG
Published: 2025-09-16
Score: 7.5
arXiv:2509.11966v1 Announce Type: new Abstract: Poroelasticity -- coupled fluid flow and elastic deformation in porous media -- often involves spatially variable permeability, especially in subsurface systems. In such cases, simulations with random permeability fields are widely used for probabilis...
Authors: Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing
Category: cs.LG
Published: 2025-09-16
Score: 7.5
arXiv:2509.07604v3 Announce Type: replace Abstract: K2-Think is a reasoning system that achieves state-of-the-art performance with a 32B parameter model, matching or surpassing much larger models like GPT-OSS 120B and DeepSeek v3.1. Built on the Qwen2.5 base model, our system shows that smaller mod...
28. Think Small, Plan Smart: Minimalist Symbolic Abstraction and Heuristic Subspace Search for LLM-Guided Task Planning
Authors: Junfeng Tang, Yuping Yan, Zihan Ye, Zhenshou Song, Zeqi Zheng, Yaochu Jin
Category: cs.LG
Published: 2025-09-16
Score: 7.5
arXiv:2501.15214v2 Announce Type: replace-cross Abstract: Reliable task planning is pivotal for achieving long-horizon autonomy in real-world robotic systems. Large language models (LLMs) offer a promising interface for translating complex and ambiguous natural language instructions into actionable...
29. SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning
Authors: Pengcheng Luo, Yunyang Zhao, Bowen Zhang, Genke Yang, Boon-Hee Soong, Chau Yuen
Category: cs.AI
Published: 2025-09-16
Score: 7.0
arXiv:2509.10486v1 Announce Type: cross Abstract: With the advent of 5G, the internet has entered a new video-centric era. From short-video platforms like TikTok to long-video platforms like Bilibili, online video services are reshaping user consumption habits. Adaptive Bitrate (ABR) control is wid...
Authors: Yuwen Cao, Guijun Liu, Tomoaki Ohtsuki, Howard H. Yang, Tony Q. S. Quek
Category: cs.AI
Published: 2025-09-16
Score: 7.0
arXiv:2509.10490v1 Announce Type: cross Abstract: The deep autoencoder (DAE) framework has turned out to be efficient in reducing the channel state information (CSI) feedback overhead in massive multiple-input multiple-output (mMIMO) systems. However, these DAE approaches presented in prior works re...
This bot is configured to look for papers containing the following keywords:
- LLM, RL, RLHF, Inference, Training, Attention, Pipeline, MOE, Sparse, Quantization, Speculative, Efficient, Efficiency, Framework, Parallel, Distributed, Kernel, Decode, Decoding, Prefill, Throughput, Fast, Network, Hardware, Cluster, FP8, FP4, Optimization, Scalable, Communication
The bot runs daily at 12:00 UTC via GitHub Actions to fetch the latest papers.
- Fork this repository to your GitHub account
- Customize the configuration by editing `config.json` (see the example below):
  - Add/remove arXiv categories (e.g., `cs.AI`, `cs.LG`, `cs.CL`)
  - Modify keywords to match your research interests
  - Adjust the `max_papers` and `days_back` settings
- Enable GitHub Actions in your repository settings
- The bot will automatically run daily and update the README.md
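For reference, a minimal `config.json` might look like the sketch below. Only `max_papers` and `days_back` are named in the steps above; the `categories` and `keywords` key names are assumptions, so check the actual `config.json` in the repository for the exact schema.

```json
{
  "categories": ["cs.AI", "cs.CL", "cs.DC", "cs.LG"],
  "keywords": ["LLM", "Inference", "Quantization", "Speculative"],
  "max_papers": 30,
  "days_back": 1
}
```

Assuming `days_back` controls how far back the arXiv query reaches, a value of 1 matches the daily 12:00 UTC schedule, so each run only picks up papers announced since the previous run.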
Common categories include:
- `cs.AI` - Artificial Intelligence
- `cs.LG` - Machine Learning
- `cs.CL` - Computation and Language
- `cs.CV` - Computer Vision
- `cs.NE` - Neural and Evolutionary Computing
- `stat.ML` - Machine Learning (Statistics)
Add keywords that match your research interests. The bot will search for these terms in paper titles and abstracts.
Add terms to exclude certain types of papers (e.g., "survey", "review", "tutorial").
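As a sketch, exclusions would likely sit alongside the keyword list in `config.json`; the `excluded_keywords` key name below is hypothetical, so verify it against the repository's configuration file:

```json
{
  "keywords": ["LLM", "RLHF", "Speculative", "Quantization"],
  "excluded_keywords": ["survey", "review", "tutorial"]
}
```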
You can manually trigger the bot by:
- Going to the "Actions" tab in your repository
- Selecting "arXiv Bot Daily Update"
- Clicking "Run workflow"
Generated automatically by arXiv Bot