suruoxi/HumanAIGC-arxiv-daily

HumanAIGC Research Papers

Updated on 2025.11.19

Table of Contents
  1. Talking Face
  2. Image Animation
  3. Video Generation
  4. TryOn
  5. Visual Edit
  6. Others
  7. Music2Dance and Co-speech
  8. Speech and Interaction
  9. Post Training
Talking Face

Publish Date Title Authors PDF Code
2025-11-18 Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection Xiaolin Wang et.al. 2511.14371 null
2025-11-18 Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning Rui Liu et.al. 2511.14249 null
2025-11-18 StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model Yifan Yang et.al. 2511.14223 null
2025-11-17 B2F: End-to-End Body-to-Face Motion Generation with Style Reference Bokyung Jang et.al. 2511.13988 null
2025-11-17 Passive Dementia Screening via Facial Temporal Micro-Dynamics Analysis of In-the-Wild Talking-Head Video Filippo Cenacchi, Longbing Cao et.al. 2511.13802 null
2025-11-17 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Junyi Ma et.al. 2511.12878 null
2025-11-12 GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow Rui Wan et.al. 2511.09272 null
2025-11-11 Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation? Rui-Qing Sun et.al. 2511.07940 null
2025-11-10 LiveNeRF: Efficient Face Replacement Through Neural Radiance Fields Integration Tung Vu et.al. 2511.07552 null
2025-11-10 The Inner Kernel of the Classical Kuiper Belt Amir Siraj et.al. 2511.07512 null
2025-11-10 ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search Zhenjie Liu et.al. 2511.06833 null
2025-11-08 DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects Mostofa Rafid Uddin et.al. 2511.06115 null
2025-11-08 Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement Ba-Thinh Nguyen et.al. 2511.05946 null
2025-11-07 Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis Dogucan Yaman et.al. 2511.05432 null
2025-11-07 THEval. Evaluation Framework for Talking Head Video Generation Nabyl Quignon et.al. 2511.04520 null
2025-11-05 Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework Dogucan Yaman et.al. 2511.08613 null
2025-11-05 Laugh, Relate, Engage: Stylized Comment Generation for Short Videos Xuan Ouyang et.al. 2511.03757 null
2025-11-05 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Guozhen Zhang et.al. 2511.03334 null
2025-11-04 Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks Dmitrii Pozdeev et.al. 2511.02830 null
2025-11-01 Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse Fan Yang et.al. 2510.26082 null
2025-11-01 Audio Driven Real-Time Facial Animation for Social Telepresence Jiye Lee et.al. 2510.01176 null
2025-10-29 Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation Yuxiang Mao et.al. 2510.25234 null
2025-10-28 See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement Jinting Wang et.al. 2510.26819 null
2025-10-28 The Divine Software Engineering Comedy -- Inferno: The Okinawa Files Michele Lanza et.al. 2510.24483 null
2025-10-28 GenTrack: A New Generation of Multi-Object Tracking Toan Van Nguyen et.al. 2510.24399 null
2025-10-28 Variable Projected Augmented Lagrangian Methods for Generalized Lasso Problems Stefano Aleotti et.al. 2510.24140 null
2025-10-27 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation Junyoung Seo et.al. 2510.23581 null
2025-10-27 Revising Second Order Terms in Deep Animation Video Coding Konstantin Schmidt et.al. 2510.23561 null
2025-10-26 MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control Fatemeh Nazarieh et.al. 2510.22810 null
2025-10-26 DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection Kangran Zhao et.al. 2510.22622 null
2025-10-24 Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing Danial Samadi Vahdati et.al. 2510.03548 null
2025-10-23 LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation Xin Lu et.al. 2510.21864 null
2025-10-16 PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis Soumyya Kanti Datta et.al. 2510.14241 null
2025-10-14 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback Xingpei Ma et.al. 2510.12089 null
2025-10-12 DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis Peiyin Chen et.al. 2510.10650 null
2025-10-11 VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework Donglin Huang et.al. 2510.10269 null
2025-10-11 SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation Zeyu Ling et.al. 2510.10069 null
2025-10-09 Paper2Video: Automatic Video Generation from Scientific Papers Zeyu Zhu et.al. 2510.05096 null
2025-10-08 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages Zibo Su et.al. 2510.06612 null
2025-10-03 EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation Tianheng Zhu et.al. 2510.08587 null
2025-10-02 Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation Beijia Lu et.al. 2510.02617 null
2025-09-30 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation Balamurugan Thambiraja et.al. 2509.26233 null
2025-09-28 Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer Hyunsoo Cha et.al. 2509.04434 null
2025-09-26 StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing Liyang Chen et.al. 2509.21887 null
2025-09-25 Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos Sarmistha Das et.al. 2509.20961 null
2025-09-24 KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation Tianle Lyu et.al. 2509.20128 null
2025-09-24 Comparative Study of Subjective Video Quality Assessment Test Methods in Crowdsourcing for Varied Use Cases Babak Naderi et.al. 2509.20118 null
2025-09-24 SynchroRaMa : Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding Phyo Thet Yee et.al. 2509.19965 null
2025-09-24 Talking Head Generation via AU-Guided Landmark Prediction Shao-Yu Chang et.al. 2509.19749 null
2025-09-24 EAI-Avatar: Emotion-Aware Interactive Talking Head Generation Haijie Yang et.al. 2508.18337 null
2025-09-23 Audio-Driven Universal Gaussian Head Avatars Kartik Teotia et.al. 2509.18924 null
2025-09-22 "I don't like my avatar": Investigating Human Digital Doubles Siyi Liu et.al. 2509.17748 null
2025-09-22 Stable Video-Driven Portraits Mallikarjun B. R. et.al. 2509.17476 null
2025-09-21 Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics Chengwei Shi et.al. 2509.17168 null
2025-09-21 PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control Tianheng Zhu et.al. 2509.16922 null
2025-09-20 Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation Yue Ma et.al. 2509.16630 null
2025-09-17 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Yikang Ding et.al. 2509.09595 null
2025-09-16 A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis Javeria Amir et.al. 2509.12831 null
2025-09-15 AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective Yuchen Deng et.al. 2509.12052 null
2025-09-10 Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video Xiao Li et.al. 2509.08376 null
2025-08-28 EmoCAST: Emotional Talking Portrait via Emotive Text Description Yiguo Jiang et.al. 2508.20615 null
2025-08-27 InfinityHuman: Towards Long-Term Audio-Driven Human Xiaodi Li et.al. 2508.20210 null
2025-08-27 Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning Stelios Mylonas et.al. 2508.19730 null
2025-08-26 OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation Jianwen Jiang et.al. 2508.19209 null
2025-08-26 Wan-S2V: Audio-Driven Cinematic Video Generation Xin Gao et.al. 2508.18621 null
2025-08-25 Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation Jianzhi Long et.al. 2509.00052 null
2025-08-22 Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars NVIDIA et.al. 2508.16401 null
2025-08-20 D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis Yuhang Guo et.al. 2508.14449 null
2025-08-20 Taming Transformer for Emotion-Controllable Talking Face Generation Ziqi Zhang et.al. 2508.14359 null
2025-08-19 TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis Shunian Chen et.al. 2508.13618 null
2025-08-19 EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis Shuai Tan et.al. 2508.13442 null
2025-08-18 Human Feedback Driven Dynamic Speech Emotion Recognition Ilya Fedorov et.al. 2508.14920 null
2025-08-17 CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation Kangyi Wu et.al. 2508.12368 null
2025-08-16 RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis Wenqing Wang et.al. 2508.12163 null
2025-08-16 SimInterview: Transforming Business Education through Large Language Model-Based Simulated Multilingual Interview Training System Truong Thanh Hung Nguyen et.al. 2508.11873 null
2025-08-15 FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation MengChao Wang et.al. 2508.11255 null
2025-08-14 HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis Shiyu Liu et.al. 2508.10566 null
2025-08-13 LIA-X: Interpretable Latent Portrait Animator Yaohui Wang et.al. 2508.09959 null
2025-08-12 Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos Chaoyi Wang et.al. 2508.08891 null
2025-08-11 Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation Hyung Kyu Kim et.al. 2507.20568 null
2025-08-10 KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features Ivan Kukanov et.al. 2508.07337 null
2025-08-08 MotionSwap Om Patil et.al. 2508.06430 null
2025-08-07 Evaluation of a Sign Language Avatar on Comprehensibility, User Experience & Acceptability Fenya Wasserroth et.al. 2508.05358 null
2025-08-07 RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer Fangyu Du et.al. 2508.05115 null
2025-08-07 UniTalker: Conversational Speech-Visual Synthesis Yifan Hu et.al. 2508.04585 null
2025-08-07 AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation Le Wang et.al. 2508.00733 null
2025-08-06 MienCap: Realtime Performance-Based Facial Animation with Live Mood Dynamics Ye Pan et.al. 2508.04687 null
2025-08-06 READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation Haotian Wang et.al. 2508.03457 null
2025-08-05 Multi-human Interactive Talking Dataset Zeyu Zhu et.al. 2508.03050 null
2025-08-04 X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio Chenxu Zhang et.al. 2508.02944 null
2025-08-04 Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering Xu Wang et.al. 2508.02362 null
2025-08-04 Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos Laura Pedrouzo-Rodriguez et.al. 2508.00748 null
2025-07-31 Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads Yingjie Zhou et.al. 2507.23343 null
2025-07-30 X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention Xiaochen Zhao et.al. 2507.23143 null
2025-07-30 Robust Deepfake Detection for Electronic Know Your Customer Systems Using Registered Images Takuma Amada et.al. 2507.22601 null
2025-07-29 DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation He Feng et.al. 2508.06511 null
2025-07-29 JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 Xinhan Di et.al. 2507.20987 null
2025-07-29 Versatile Multimodal Controls for Expressive Talking Human Animation Zheng Qin et.al. 2503.08714 null
2025-07-28 Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation Dogucan Yaman et.al. 2507.20953 null
2025-07-28 MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization Hyung Kyu Kim et.al. 2507.20562 null
2025-07-28 JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync Sungjoon Park et.al. 2507.20452 null
2025-07-25 Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation Fang Kang et.al. 2507.19225 null
2025-07-24 Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation Zhen Han et.al. 2507.18352 null
2025-07-24 Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics Yuezun Li et.al. 2507.18015 null
2025-07-24 MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding Chang Liu et.al. 2507.06071 null
2025-07-23 MoDA: Multi-modal Diffusion Architecture for Talking Head Generation Xinyang Li et.al. 2507.03256 null
2025-07-22 Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching Haiyang Liu et.al. 2507.18649 null
2025-07-22 Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model Mingtao Guo et.al. 2507.16341 null
2025-07-21 VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis Alexandre Symeonidis-Herzig et.al. 2507.06060 null
2025-07-18 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Qiang Wang et.al. 2507.12956 null
2025-07-17 ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion Hoang-Son Vo et.al. 2507.12804 null
2025-07-17 Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation Hanlei Shi et.al. 2507.12761 null
2025-07-17 Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries Minyoung Kim et.al. 2507.12723 null
2025-07-16 AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation Hao Li et.al. 2507.12001 null
2025-07-14 M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation Kui Jiang et.al. 2507.08307 null
2025-07-11 Detecting Deepfake Talking Heads from Facial Biometric Anomalies Justin D. Norman et.al. 2507.08917 null
2025-07-10 GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation Wentao Hu et.al. 2506.21513 null
2025-07-07 MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation Yucheng Wang et.al. 2507.05092 null
2025-07-05 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation Rang Meng et.al. 2507.03905 null
2025-07-03 CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation Xiangyang Luo et.al. 2507.02691 null
2025-07-02 FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases Shuai Tan et.al. 2507.01390 null
2025-07-01 ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing Babak Naderi et.al. 2506.12269 link
2025-06-30 JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching Mingi Kwon et.al. 2506.23552 null
2025-06-27 MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation Dechao Meng et.al. 2506.22065 null
2025-06-27 Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field Hong Nie et.al. 2506.22044 null
2025-06-27 RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture Haofeng Wang et.al. 2506.21865 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-23 Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions Vineet Kumar Rakesh et.al. 2507.02900 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-17 SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting Ziqiao Peng et.al. 2506.14742 null
2025-06-17 Compressed Video Super-Resolution based on Hierarchical Encoding Yuxuan Jiang et.al. 2506.14381 null
2025-06-16 Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos Riku Takahashi et.al. 2506.13419 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-03 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results Xiaohong Liu et.al. 2506.02875 null
2025-06-02 Cocktail-Party Audio-Visual Speech Recognition Thai-Binh Nguyen et.al. 2506.02178 null
2025-06-02 Low-Rank Head Avatar Personalization with Registers Sai Tanmay Reddy Chakkera et.al. 2506.01935 null
2025-06-02 Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation Yuan Gan et.al. 2506.01591 link
2025-06-01 SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers Zhengcong Fei et.al. 2506.00830 null
2025-05-30 TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection Xinqi Xiong et.al. 2505.24866 null
2025-05-29 Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation Jiahao Cui et.al. 2505.23525 link
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation Hao Li et.al. 2505.23290 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Tell me Habibi, is it Real or Fake? Kartik Kuckreja et.al. 2505.22581 null
2025-05-28 Neural Face Skinning for Mesh-agnostic Facial Expression Cloning Sihun Cha et.al. 2505.22416 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling Long-Khanh Pham et.al. 2505.22024 null
2025-05-27 OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers Ziqiao Peng et.al. 2505.21448 null
2025-05-26 Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting Yizhou Zhao et.al. 2505.20582 null
2025-05-26 DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations Ziqiao Peng et.al. 2505.18096 null
2025-05-22 Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis Radek Daněček et.al. 2504.13386 null
2025-05-14 Test-Time Augmentation for Pose-invariant Face Recognition Jaemin Jung et.al. 2505.09256 null
2025-05-10 VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback Eason Chen et.al. 2505.06676 null
2025-05-10 OT-Talk: Animating 3D Talking Head with Optimal Transportation Xinmu Wang et.al. 2505.01932 null
2025-05-10 MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance Mengting Wei et.al. 2504.21497 link
2025-05-08 OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours Hanie Moghaddasi et.al. 2505.05531 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-02 Model See Model Do: Speech-Driven Facial Animation with Style Control Yifang Pan et.al. 2505.01319 null
2025-05-02 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong et.al. 2505.01263 null
2025-05-01 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution Antoni Bigata et.al. 2505.00497 null
2025-04-29 IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos Yuan Li et.al. 2504.19165 null
2025-04-27 Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions Mohammad Mahdi Abootorabi et.al. 2504.19056 link
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-25 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation Weipeng Tan et.al. 2504.18087 null
2025-04-14 SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models Stathis Galanakis et.al. 2504.10716 null
2025-04-10 ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings Astitva Srivastava et.al. 2504.08022 null
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-08 SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity Yihuan Huang et.al. 2504.05803 null
2025-04-08 Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation Zhihua Xu et.al. 2504.05746 null
2025-04-08 Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation Tianshui Chen et.al. 2504.05672 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-06 FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency Shiyan Liu et.al. 2504.04427 null
2025-04-04 A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations Abdul Mannan Mohammed et.al. 2504.03147 null
2025-04-03 OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication Zhongjian Wang et.al. 2504.02433 null
2025-04-03 VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models Kim Sung-Bin et.al. 2504.02386 null
2025-04-02 Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies Soumyya Kanti Datta et.al. 2504.01470 link
2025-04-02 EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters Xuli Shen et.al. 2503.19416 null
2025-04-01 Monocular and Generalizable Gaussian Talking Head Animation Shengjie Gong et.al. 2504.00665 null
2025-04-01 Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics Lee Chae-Yeon et.al. 2503.20308 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-29 STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing Zijun Ding et.al. 2503.23039 link
2025-03-28 Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis Shuai Shen et.al. 2503.22605 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Dual Audio-Centric Modality Coupling for Talking Head Generation Ao Fu et.al. 2503.22728 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation Zunnan Xu et.al. 2503.18860 null
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model Kangwei Liu et.al. 2503.19001 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-23 DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation Peng Chen et.al. 2503.18159 link
2025-03-21 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Jianchuan Chen et.al. 2503.17032 null
2025-03-21 From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech Ji-Hoon Kim et.al. 2503.16956 null
2025-03-20 UniSync: A Unified Framework for Audio-Visual Synchronization Tao Feng et.al. 2503.16357 null
2025-03-20 PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation Baiqin Wang et.al. 2503.14295 null
2025-03-19 DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis Yuming Gu et.al. 2503.15667 link
2025-03-19 KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation Antoni Bigata et.al. 2503.01715 null
2025-03-17 SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization Xulin Fan et.al. 2503.13371 null
2025-03-17 Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait Chaolong Yang et.al. 2503.12963 link
2025-03-14 Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control Hejia Chen et.al. 2503.14517 null
2025-03-14 EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models Yixuan Zhang et.al. 2503.11028 null
2025-03-12 StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation An Yang et.al. 2503.09852 null
2025-03-12 Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos Riku Takahashi et.al. 2503.09787 null
2025-03-09 Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter Yanyu Zhu et.al. 2503.06397 null
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-06 FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis Ziqi Ni et.al. 2503.04067 null
2025-03-02 FaceShot: Bring Any Character into Life Junyao Gao et.al. 2503.00740 null
2025-03-01 Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture Xuanchen Li et.al. 2503.00495 null
2025-02-28 Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints Masoumeh Chapariniya et.al. 2502.20803 null
2025-02-28 ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model Xuangeng Chu et.al. 2502.20323 null
2025-02-27 InsTaG: Learning Personalized 3D Talking Head from Few-Second Video Jiahe Li et.al. 2502.20387 link
2025-02-27 High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model Mingtao Guo et.al. 2502.19894 link
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion mode Lingzhou Mu et.al. 2502.19455 null
2025-02-24 Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation Baptiste Chopin et.al. 2502.17198 null
2025-02-20 NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis Xiaoxing Liu et.al. 2502.14178 null
2025-02-18 AV-Flow: Transforming Text to Audio-Visual Human-like Interactions Aggelina Chatziagapi et.al. 2502.13133 null
2025-02-17 SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion Junxian Ma et.al. 2502.11515 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-13 Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model Fei Shen et.al. 2502.09533 null
2025-02-13 VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output Eason Chen et.al. 2502.04103 null
2025-02-11 Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion Xingpei Ma et.al. 2502.07203 null
2025-02-07 Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Han Zhang et.al. 2502.04976 null
2025-02-02 EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis Junuk Cha et.al. 2502.00654 null
2025-01-24 SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation Yujian Liu et.al. 2501.14646 null
2025-01-21 A Lightweight and Interpretable Deepfakes Detection Framework Muhammad Umar Farooq et.al. 2501.11927 null
2025-01-18 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Linrui Tian et.al. 2501.10687 null
2025-01-17 TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation Yixiang Zhuang et.al. 2501.09921 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2025-01-15 Make-A-Character 2: Animatable 3D Character Generation From a Single Image Lin Liu et.al. 2501.07870 null
2025-01-09 Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding Ji-Ha Park et.al. 2501.14790 null
2025-01-09 Identity-Preserving Video Dubbing Using Motion Warping Runzhen Liu et.al. 2501.04586 null
2025-01-09 MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation Huaize Liu et.al. 2501.01808 null
2025-01-07 Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools Arash Dehghani et.al. 2501.06227 null
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2025-01-06 RDD4D: 4D Attention-Guided Road Damage Detection And Classification Asma Alkalbani et.al. 2501.02822 link
2025-01-06 Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study Mykola Maslych et.al. 2501.00168 null
2025-01-03 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing Qili Wang et.al. 2501.01798 link
2024-12-28 DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis Kaijun Deng et.al. 2412.20148 link
2024-12-26 UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control Wenzhang Sun et.al. 2412.19860 null
2024-12-26 Generating Editable Head Avatars with 3D Gaussian GANs Guohao Li et.al. 2412.19149 link
2024-12-23 FaceLift: Single Image to 3D Head with View Generation and GS-LRM Weijie Lyu et.al. 2412.17812 null
2024-12-22 FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation Tianyun Zhong et.al. 2412.16915 null
2024-12-18 Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters Steven Hogue et.al. 2412.14333 link
2024-12-18 GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection Xiaocan Chen et.al. 2412.13656 null
2024-12-18 Learning to Control an Android Robot Head for Facial Animation Marcel Heisler et.al. 2412.13641 null
2024-12-18 Real-time One-Step Diffusion-based Expressive Portrait Videos Generation Hanzhong Guo et.al. 2412.13479 link
2024-12-18 VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization Tao Liu et.al. 2412.09892 null
2024-12-16 Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content Rohit Kundu et.al. 2412.12278 null
2024-12-13 GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression Ziqi Zhou et.al. 2412.09296 link
2024-12-12 LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync Chunyu Li et.al. 2412.09262 link
2024-12-12 EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong et.al. 2412.08988 null
2024-12-12 PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis Yifan Xie et.al. 2412.08504 null
2024-12-10 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation Fatemeh Nazarieh et.al. 2412.07754 null
2024-12-10 IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation Sejong Yang et.al. 2412.04000 null
2024-12-05 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng et.al. 2412.04448 null
2024-12-05 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks Jiahao Cui et.al. 2412.00733 link
2024-12-04 SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model Yan Li et.al. 2412.03430 null
2024-12-02 One Shot, One Talk: Whole-body Talking Avatar from a Single Image Jun Xiang et.al. 2412.01106 null
2024-12-01 Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation Shuling Zhao et.al. 2412.00719 null
2024-11-29 LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis Tianqi Li et.al. 2411.19525 null
2024-11-29 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Tianqi Li et.al. 2411.19509 link
2024-11-29 V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow Jeongsoo Choi et.al. 2411.19486 link
2024-11-26 Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey Hong-Hanh Nguyen-Le et.al. 2411.17911 null
2024-11-25 Sonic: Shifting Focus to Global Audio Perception in Portrait Animation Xiaozhong Ji et.al. 2411.16331 null
2024-11-25 ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations Xulong Zhang et.al. 2411.13089 null
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-23 EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion Haotian Wang et.al. 2411.16726 null
2024-11-23 ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance Haijie Yang et.al. 2411.15436 null
2024-11-20 Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis Pegah Salehi et.al. 2411.13209 link
2024-11-20 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-14 LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space Guanwen Feng et.al. 2411.09268 null
2024-11-06 Large Generative Model-assisted Talking-face Semantic Communication System Feibo Jiang et.al. 2411.03876 null
2024-11-05 SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation Changpeng Cai et.al. 2405.07257 null
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-29 Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing Haonan Tong et.al. 2410.22112 null
2024-10-24 Real-time 3D-aware Portrait Video Relighting Ziqi Cai et.al. 2410.18355 link
2024-10-21 Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions Malte Prinzler et.al. 2410.16395 null
2024-10-18 Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization Bin Lin et.al. 2410.14283 null
2024-10-18 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Hanbo Cheng et.al. 2410.13726 link
2024-10-16 MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting Yue Zhang et.al. 2410.10122 link
2024-10-15 Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck Fevziye Irem Eyiokur et.al. 2410.11434 null
2024-10-15 MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes Zhenhui Ye et.al. 2410.06734 null
2024-10-14 Character-aware audio-visual subtitling in context Jaesung Huh et.al. 2410.11068 null
2024-10-14 Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads Federico Nocentini et.al. 2410.11041 null
2024-10-14 TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model Jiazhi Guan et.al. 2410.10696 null
2024-10-14 Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization Shanzhi Yin et.al. 2410.10171 null
2024-10-10 MMHead: Towards Fine-grained Multi-modal 3D Facial Animation Sijing Wu et.al. 2410.07757 null
2024-10-09 FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model Feng Qiu et.al. 2409.13180 null
2024-10-01 LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details Jian Yang et.al. 2410.00990 null
2024-09-29 Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation Jingyi Xu et.al. 2409.19501 null
2024-09-27 Diverse Code Query Learning for Speech-Driven Facial Animation Chunzhi Gu et.al. 2409.19143 null
2024-09-26 Stable Video Portraits Mirela Ostrek et.al. 2409.18083 null
2024-09-25 ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE Sichun Wu et.al. 2409.07966 link
2024-09-24 FastTalker: Jointly Generating Speech and Conversational Gestures from Text Zixin Guo et.al. 2409.16404 null
2024-09-23 FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset Donglin Di et.al. 2410.07151 null
2024-09-23 MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Yue Han et.al. 2409.15179 null
2024-09-18 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Sai Tanmay Reddy Chakkera et.al. 2409.12156 null
2024-09-18 GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations Kartik Teotia et.al. 2409.11951 null
2024-09-17 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy Xuanmeng Sha et.al. 2409.10848 null
2024-09-16 DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis Fa-Ting Hong et.al. 2409.10281 null
2024-09-14 StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads Suzhen Wang et.al. 2409.09292 null
2024-09-11 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures Steven Hogue et.al. 2409.07649 null
2024-09-11 EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion Jian Zhang et.al. 2409.07255 link
2024-09-09 PersonaTalk: Bring Attention to Your Persona in Visual Dubbing Longhao Zhang et.al. 2409.05379 null
2024-09-09 KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation Hoang-Son Vo-Thanh et.al. 2409.05330 link
2024-09-05 SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing Lingyu Xiong et.al. 2409.03605 null
2024-09-05 SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model Weipeng Tan et.al. 2409.03270 null
2024-09-04 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation Jun Ling et.al. 2409.02657 null
2024-09-02 KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding Zhihao Xu et.al. 2409.01113 link
2024-08-28 Micro and macro facial expressions by driven animations in realistic Virtual Humans Rubens Halbig Montanha et.al. 2408.16110 null
2024-08-27 MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer Shurong Yang et.al. 2408.14975 null
2024-08-25 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation Jack Saunders et.al. 2408.13714 null
2024-08-23 G3FA: Geometry-guided GAN for Face Animation Alireza Javanmardi et.al. 2408.13049 null
2024-08-21 AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition Minheng Ni et.al. 2408.11564 null
2024-08-21 EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention Yihong Lin et.al. 2408.11518 null
2024-08-20 DEGAS: Detailed Expressions on Full-Body Gaussian Avatars Zhijing Shao et.al. 2408.10588 link
2024-08-18 FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model Ziyu Yao et.al. 2408.09384 null
2024-08-18 Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation Xukun Zhou et.al. 2408.09357 null
2024-08-18 S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis Dongze Li et.al. 2408.09347 null
2024-08-16 GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer Yihong Lin et.al. 2408.01826 null
2024-08-14 Content and Style Aware Audio-Driven Facial Animation Qingju Liu et.al. 2408.07005 null
2024-08-12 DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation Jisoo Kim et.al. 2408.06010 null
2024-08-10 High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model Weizhi Zhong et.al. 2408.05416 null
2024-08-10 Style-Preserving Lip Sync via Audio-Aware Style Reference Weizhi Zhong et.al. 2408.05412 null
2024-08-09 DeepSpeak Dataset v1.0 Sarah Barrington et.al. 2408.05366 null
2024-08-06 ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Jiazhi Guan et.al. 2408.03284 null
2024-08-03 Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation Jintao Tan et.al. 2408.01732 null
2024-08-03 JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model Farzaneh Jafari et.al. 2408.01627 null
2024-08-01 UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Xiangyu Fan et.al. 2408.00762 null
2024-08-01 Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Manuel Kansy et.al. 2408.00458 null
2024-08-01 EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head Qianyun He et.al. 2408.00297 null
2024-07-31 Deformable 3D Shape Diffusion Model Dengsheng Chen et.al. 2407.21428 null
2024-07-26 LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement Rui Zhang et.al. 2407.18595 null
2024-07-24 A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation Jose Geraldo Fernandes et.al. 2407.17430 null
2024-07-24 The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions Rabab Algadhy et.al. 2407.17253 null
2024-07-22 PAV: Personalized Head Avatar from Unstructured Video Collection Akin Caliskan et.al. 2407.21047 null
2024-07-21 Anchored Diffusion for Video Face Reenactment Idan Kligvasser et.al. 2407.15153 null
2024-07-20 Text-based Talking Video Editing with Cascaded Conditional Diffusion Bo Han et.al. 2407.14841 null
2024-07-17 Universal Facial Encoding of Codec Avatars from VR Headsets Shaojie Bai et.al. 2407.13038 null
2024-07-17 EmoFace: Audio-driven Emotional 3D Face Animation Chang Liu et.al. 2407.12501 link
2024-07-13 Learning Online Scale Transformation for Talking Head Video Generation Fa-Ting Hong et.al. 2407.09965 null
2024-07-12 Real Face Video Animation Platform Xiaokai Chen et.al. 2407.18955 null
2024-07-12 One-Shot Pose-Driving Face Animation Platform He Feng et.al. 2407.08949 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-08 MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices Jianwen Jiang et.al. 2407.05712 null
2024-07-08 Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN Jiacheng Su et.al. 2407.05577 null
2024-07-04 Compressed Skinning for Facial Blendshapes Ladislav Kavan et.al. 2406.11597 null
2024-07-03 LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control Jianzhu Guo et.al. 2407.03168 link
2024-07-02 Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert Han EunGi et.al. 2407.01034 null
2024-06-26 RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Xiaozhong Ji et.al. 2406.18284 null
2024-06-24 The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents Sinan Sonlu et.al. 2407.10993 null
2024-06-21 EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot Hao Fei et.al. 2406.15177 link
2024-06-20 MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset Kim Sung-Bin et.al. 2406.14272 null
2024-06-19 DF40: Toward Next-Generation Deepfake Detection Zhiyuan Yan et.al. 2406.13495 link
2024-06-19 AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models Ken Chen et.al. 2406.13272 null
2024-06-18 RITA: A Real-time Interactive Talking Avatars Framework Wuxinlin Cheng et.al. 2406.13093 null
2024-06-18 A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing Ming Meng et.al. 2406.10553 null
2024-06-17 NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation Niu Guanchen et.al. 2406.11259 null
2024-06-17 Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement Runyi Yu et.al. 2406.08096 null
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-14 DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details Haitao Cao et.al. 2405.19688 null
2024-06-13 Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Jack Merullo et.al. 2406.09519 null
2024-06-13 DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing Neha Sahipjohn et.al. 2406.08802 null
2024-06-12 Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation Jiadong Liang et.al. 2406.07895 null
2024-06-07 Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation Yue Ma et.al. 2406.01900 null
2024-06-05 Controllable Talking Face Generation by Implicit Facial Keypoints Editing Dong Zhao et.al. 2406.02880 link
2024-05-31 MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses Saif Mahmud et.al. 2405.21004 null
2024-05-31 MegActor: Harness the Power of Raw Video for Vivid Portrait Animation Shurong Yang et.al. 2405.20851 link
2024-05-30 Audio2Rig: Artist-oriented deep learning tool for facial animation Bastien Arcelin et.al. 2405.20412 null
2024-05-28 OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance Shuheng Ge et.al. 2405.14709 null
2024-05-24 InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Yuchi Wang et.al. 2405.15758 link
2024-05-22 Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling Yibo Wang et.al. 2405.13701 null
2024-05-21 Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Yue Han et.al. 2405.12970 null
2024-05-16 Faces that Speak: Jointly Synthesising Talking Face and Speech from Text Youngjoon Jang et.al. 2405.10272 null
2024-05-14 PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset Yang Hou et.al. 2405.08838 link
2024-05-10 NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior Gihoon Kim et.al. 2405.05749 null
2024-05-09 SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space Zeren Zhang et.al. 2405.05636 null
2024-05-08 Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention Ruijie Tao et.al. 2404.18501 link
2024-05-07 Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Dogucan Yaman et.al. 2405.04327 null
2024-05-07 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding Tao Liu et.al. 2405.03121 null
2024-04-29 EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars Nikita Drobyshev et.al. 2404.19110 null
2024-04-29 GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting Bo Chen et.al. 2404.19040 null
2024-04-29 Embedded Representation Learning Network for Animating Styled Video Portrait Tianyong Wang et.al. 2404.19038 null
2024-04-29 CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation Xiangyu Liang et.al. 2404.18604 null
2024-04-28 GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting Hongyun Yu et.al. 2404.14037 null
2024-04-25 GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting Kyusun Cho et.al. 2404.16012 link
2024-04-23 TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting Jiahe Li et.al. 2404.15264 link
2024-04-19 Learn2Talk: 3D Talking Face Learns from 2D Talking Face Yixiang Zhuang et.al. 2404.12888 null
2024-04-16 VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Sicheng Xu et.al. 2404.10667 null
2024-04-15 FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features Andre Rochow et.al. 2404.09736 null
2024-04-13 THQA: A Perceptual Quality Assessment Database for Talking Heads Yingjie Zhou et.al. 2404.09003 link
2024-04-11 EFHQ: Multi-purpose ExtremePose-Face-HQ dataset Trung Tuan Dao et.al. 2312.17205 null
2024-04-09 Deepfake Generation and Detection: A Benchmark and Survey Gan Pei et.al. 2403.17881 link
2024-04-08 SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation Heyuan Li et.al. 2404.05680 null
2024-04-07 GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets Dongjing Shan et.al. 2404.04924 null
2024-04-07 Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation Renshuai Liu et.al. 2401.01207 null
2024-04-03 MI-NeRF: Learning a Single Face NeRF from Multiple Identities Aggelina Chatziagapi et.al. 2403.19920 null
2024-04-02 EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis Shuai Tan et.al. 2404.01647 null
2024-04-02 Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation Taekyung Ki et.al. 2404.00636 null
2024-04-02 Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation Se Jin Park et.al. 2305.19556 null
2024-04-01 FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio Chao Xu et.al. 2403.01901 link
2024-03-29 Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior Jaehoon Ko et.al. 2403.20153 link
2024-03-28 MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation Seyeon Kim et.al. 2403.19144 link
2024-03-28 GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response Govind Mittal et.al. 2210.06186 link
2024-03-27 X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention You Xie et.al. 2403.15931 null
2024-03-26 Superior and Pragmatic Talking Face Generation with Teacher-Student Framework Chao Liang et.al. 2403.17883 null
2024-03-26 AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation Huawei Wei et.al. 2403.17694 link
2024-03-26 Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis Zhenhui Ye et.al. 2401.08503 null
2024-03-25 DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment Stella Bounareli et.al. 2403.17217 null
2024-03-25 AnimateMe: 4D Facial Expressions via Diffusion Models Dimitrios Gerogiannis et.al. 2403.17213 null
2024-03-25 Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework Ziyao Huang et.al. 2403.16510 link
2024-03-23 Adaptive Super Resolution For One-Shot Talking-Head Generation Luchuan Song et.al. 2403.15944 link
2024-03-22 LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example Soyeon Yoon et.al. 2403.15227 link
2024-03-22 Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing Juan Zhang et.al. 2403.11700 null
2024-03-19 EmoVOCA: Speech-Driven Emotional 3D Talking Heads Federico Nocentini et.al. 2403.12886 link
2024-03-19 ScanTalk: 3D Talking Heads from Unregistered Scans Federico Nocentini et.al. 2403.10942 link
2024-03-15 StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation Dongchan Min et.al. 2208.10922 null
2024-03-14 GAIA: Zero-shot Talking Avatar Generation Tianyu He et.al. 2311.15230 null
2024-03-13 Say Anything with Any Style Shuai Tan et.al. 2403.06363 null
2024-03-12 FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization Shuai Tan et.al. 2403.06375 null
2024-03-12 Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style Shuai Tan et.al. 2403.06365 null
2024-03-11 A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Weixia Zhang et.al. 2403.06421 link
2024-03-05 Memories are One-to-Many Mapping Alleviators in Talking Face Generation Anni Tang et.al. 2212.05005 null
2024-03-02 G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment Juan Zhang et.al. 2402.18122 null
2024-03-01 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder Chenpeng Du et.al. 2303.17550 null
2024-02-29 Learning a Generalized Physical Face Model From Data Lingchen Yang et.al. 2402.19477 null
2024-02-28 Context-aware Talking Face Video Generation Meidai Xuanyuan et.al. 2402.18092 null
2024-02-27 EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Linrui Tian et.al. 2402.17485 null
2024-02-27 Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis Zicheng Zhang et.al. 2402.17364 link
2024-02-26 Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields Yifei Li et.al. 2402.16599 null
2024-02-25 AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation Yasheng Sun et.al. 2402.16124 null
2024-02-21 Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters Zechen Bai et.al. 2402.13724 link
2024-02-21 StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing Gaoxiang Cong et.al. 2402.12636 link
2024-02-12 StyleLipSync: Style-based Personalized Lip-sync Video Generation Taekyung Ki et.al. 2305.00521 null
2024-02-08 DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer Zhiyuan Ma et.al. 2402.05712 link
2024-02-05 One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space Stella Bounareli et.al. 2402.03553 null
2024-02-02 EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation Guanwen Feng et.al. 2402.01422 null
2024-01-31 MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Wenhao Guan et.al. 2312.10687 null
2024-01-30 Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Qingcheng Zhao et.al. 2401.15687 null
2024-01-28 Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes Weifeng Liu et.al. 2401.15668 link
2024-01-27 An Implicit Physical Face Model Driven by Expression and Style Lingchen Yang et.al. 2401.15414 null
2024-01-26 Implicit Neural Representation for Physics-driven Actuated Soft Bodies Lingchen Yang et.al. 2401.14861 null
2024-01-25 SAiD: Speech-driven Blendshape Facial Animation with Diffusion Inkyu Park et.al. 2401.08655 link
2024-01-23 NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis Chongke Bi et.al. 2401.12568 null
2024-01-19 Fast Registration of Photorealistic Avatars for VR Facial Animation Chaitanya Patel et.al. 2401.11002 null
2024-01-18 Exposing Lip-syncing Deepfakes from Mouth Inconsistencies Soumyya Kanti Datta et.al. 2401.10113 link
2024-01-18 Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models Jeongsoo Choi et.al. 2306.16003 null
2024-01-16 EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model Bingyuan Zhang et.al. 2401.08049 null
2024-01-12 DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder Tao Liu et.al. 2311.01811 link
2024-01-11 Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors Jack Saunders et.al. 2401.06126 null
2024-01-11 Jump Cut Smoothing for Talking Heads Xiaojuan Wang et.al. 2401.04718 null
2024-01-08 AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation Liyang Chen et.al. 2310.07236 null
2024-01-07 Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness Sicheng Yang et.al. 2401.03476 null
2024-01-04 Expressive Speech-driven Facial Animation with controllable emotions Yutong Chen et.al. 2301.02008 link
2023-12-23 TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation Xize Cheng et.al. 2312.15197 null
2023-12-22 DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation Chenxu Zhang et.al. 2312.13578 null
2023-12-20 FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability Linze Li et.al. 2312.03775 null
2023-12-19 Learning Dense Correspondence for NeRF-Based Face Reenactment Songlin Yang et.al. 2312.10422 null
2023-12-19 Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing Yushi Lan et.al. 2312.03763 null
2023-12-18 VectorTalker: SVG Talking Face Generation with Progressive Vectorisation Hao Hu et.al. 2312.11568 null
2023-12-18 AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis Dongze Li et.al. 2312.10921 null
2023-12-18 Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation Hui Fu et.al. 2312.10877 null
2023-12-15 DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models Yifeng Ma et.al. 2312.09767 link
2023-12-15 Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars Andre Rochow et.al. 2312.09750 null
2023-12-13 uTalk: Bridging the Gap Between Humans and AI Hussam Azzuni et.al. 2310.02739 null
2023-12-13 MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation Haozhe Wu et.al. 2303.09797 null
2023-12-12 GMTalker: Gaussian Mixture based Emotional talking video Portraits Yibo Xia et.al. 2312.07669 null
2023-12-12 GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance Haiming Zhang et.al. 2312.07385 null
2023-12-11 Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Georgios Milis et.al. 2312.06613 link
2023-12-11 Study of Non-Verbal Behavior in Conversational Agents Camila Vicari Maccari et.al. 2312.06530 null
2023-12-11 DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers Aaron Mir et.al. 2312.06400 null
2023-12-11 Audio-driven Talking Face Generation by Overcoming Unintended Information Flow Dogucan Yaman et.al. 2307.09368 null
2023-12-10 DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2305.06225 link
2023-12-09 R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning Zhiling Ye et.al. 2312.05572 null
2023-12-09 FT2TF: First-Person Statement Text-To-Talking Face Generation Xingjian Diao et.al. 2312.05430 null
2023-12-08 SingingHead: A Large-scale 4D Dataset for Singing Head Animation Sijing Wu et.al. 2312.04369 null
2023-12-07 VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior Xusen Sun et.al. 2312.01841 null
2023-12-05 PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features Tianshun Han et.al. 2312.02781 null
2023-12-05 MyPortrait: Morphable Prior-Guided Personalized Portrait Generation Bo Ding et.al. 2312.02703 null
2023-12-02 DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser Peng Chen et.al. 2311.16565 null
2023-12-01 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing Balamurugan Thambiraja et.al. 2312.00870 null
2023-11-30 Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data Yu Deng et.al. 2311.18729 null
2023-11-30 Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation Pramook Khungurn et.al. 2311.17409 null
2023-11-29 SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis Ziqiao Peng et.al. 2311.17590 link
2023-11-28 THInImg: Cross-modal Steganography for Presenting Talking Heads in Images Lin Zhao et.al. 2311.17177 null
2023-11-28 BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis Hao-Bin Duan et.al. 2311.05521 link
2023-11-28 Continuously Controllable Facial Expression Editing in Talking Face Videos Zhiyao Sun et.al. 2209.08289 null
2023-11-20 MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI Lifei Zheng et.al. 2311.14730 null
2023-11-15 CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding Jianzong Wang et.al. 2311.08673 null
2023-11-13 DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation Guinan Su et.al. 2311.04766 null
2023-11-12 ChatAnything: Facetime Chat with LLM-Enhanced Personas Yilin Zhao et.al. 2311.06772 null
2023-11-08 Synthetic Speaking Children -- Why We Need Them and How to Make Them Muhammad Ali Farooq et.al. 2311.06307 null
2023-11-06 RADIO: Reference-Agnostic Dubbing Video Synthesis Dongyeun Lee et.al. 2309.01950 null
2023-11-05 3D-Aware Talking-Head Video Motion Transfer Haomiao Ni et.al. 2311.02549 null
2023-11-03 Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading Songtao Luo et.al. 2310.05058 link
2023-11-02 LaughTalk: Expressive 3D Talking Head Generation with Laughter Kim Sung-Bin et.al. 2311.00994 null
2023-11-02 High-Fidelity and Freely Controllable Talking Head Video Generation Yue Gao et.al. 2304.10168 null
2023-10-31 Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape Wei Zhao et.al. 2310.20240 null
2023-10-29 On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models Marija Ivanovska et.al. 2307.05397 null
2023-10-25 Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control Elif Bozkurt et.al. 2310.17011 null
2023-10-23 The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills Qingxiao Zheng et.al. 2310.15112 null
2023-10-19 Gemino: Practical and Robust Neural Compression for Video Conferencing Vibhaalakshmi Sivaraman et.al. 2209.10507 null
2023-10-17 CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation Zhaojie Chu et.al. 2310.11295 null
2023-10-15 HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation Yaosen Chen et.al. 2310.05720 link
2023-10-12 CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity Abdullah Hayajneh et.al. 2310.07969 link
2023-10-12 Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation Yuan Gan et.al. 2309.04946 link
2023-10-08 GestSync: Determining who is speaking without a talking head Sindhu B Hegde et.al. 2310.05304 link
2023-09-30 DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models Zhiyao Sun et.al. 2310.00434 null
2023-09-28 OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions Jin Liu et.al. 2309.16148 null
2023-09-26 Emotional Speech-Driven Animation with Content-Emotion Disentanglement Radek Daněček et.al. 2306.08990 null
2023-09-20 FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion Stefan Stan et.al. 2309.11306 link
2023-09-20 Context-Aware Talking-Head Video Editing Songlin Yang et.al. 2308.00462 null
2023-09-18 That's What I Said: Fully-Controllable Talking Face Generation Youngjoon Jang et.al. 2304.03275 null
2023-09-15 Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech Junjie Li et.al. 2309.08408 link
2023-09-14 DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Yaoyu Su et.al. 2309.07752 null
2023-09-14 DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks Zipeng Qi et.al. 2309.07509 null
2023-09-14 HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods Yongyuan Li et.al. 2309.07495 link
2023-09-13 PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network Qinghua Liu et.al. 2309.06723 null
2023-09-12 DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention Aaditya Kharel et.al. 2309.06511 null
2023-09-12 Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos Ekta Prashnani et.al. 2305.03713 null
2023-09-11 ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment Yicheng Zhong et.al. 2308.14448 null
2023-09-10 MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment Tina Behrouzi et.al. 2309.05095 null
2023-09-09 Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video Xiuzhe Wu et.al. 2309.04814 link
2023-09-01 Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances Wolfgang Paier et.al. 2306.10006 null
2023-08-30 From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications Shreyank N Gowda et.al. 2308.16041 null
2023-08-30 SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces Ziqiao Peng et.al. 2306.10799 link
2023-08-30 Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models Antoni Bigata Casademunt et.al. 2305.08854 link
2023-08-29 Papeos: Augmenting Research Papers with Talk Videos Tae Soo Kim et.al. 2308.15224 null
2023-08-25 EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation Ziqiao Peng et.al. 2303.11089 link
2023-08-24 ToonTalker: Cross-Domain Face Reenactment Yuan Gong et.al. 2308.12866 null
2023-08-24 Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Jiahe Li et.al. 2307.09323 link
2023-08-23 DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion Se Jin Park et.al. 2310.05934 null
2023-08-21 Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis Tong Sha et.al. 2109.02081 null
2023-08-18 Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization Soumik Mukhopadhyay et.al. 2308.09716 link
2023-08-18 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation Fa-Ting Hong et.al. 2307.09906 link
2023-08-17 A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation Li Liu et.al. 2308.08849 link
2023-08-16 Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions Yuqi Sun et.al. 2306.10813 null
2023-08-12 Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation Zhichao Wang et.al. 2308.06457 link
2023-08-12 DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation Yichao Yan et.al. 2203.07931 null
2023-08-11 Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space Haoyu Wang et.al. 2308.06076 link
2023-08-11 VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer Liyang Chen et.al. 2308.04830 null
2023-08-10 Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution Hyojoon Park et.al. 2305.03216 null
2023-08-02 Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhenhui Ye et.al. 2306.03504 null
2023-07-29 Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation Michał Stypułkowski et.al. 2301.03396 null
2023-07-26 Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation Federico Nocentini et.al. 2306.01415 link
2023-07-20 HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces Stella Bounareli et.al. 2307.10797 link
2023-07-20 MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions Yunfei Liu et.al. 2307.10008 null
2023-07-19 Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline Zhigang Chang et.al. 2307.09821 null
2023-07-19 OPHAvatars: One-shot Photo-realistic Head Avatars Shaoxu Li et.al. 2307.09153 link
2023-07-18 FACTS: Facial Animation Creation using the Transfer of Styles Jack Saunders et.al. 2307.09480 null
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-07-08 FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction Ganglai Wang et.al. 2307.03990 null
2023-07-05 Interactive Conversational Head Generation Mohan Zhou et.al. 2307.02090 null
2023-07-04 A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation Louis Airale et.al. 2307.03270 link
2023-07-04 Generating Animatable 3D Cartoon Faces from Single Portraits Chuanyu Pan et.al. 2307.01468 null
2023-07-03 RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations Neha Sahipjohn et.al. 2307.01233 null
2023-06-20 Audio-Driven 3D Facial Animation from In-the-Wild Videos Liying Lu et.al. 2306.11541 null
2023-06-13 Parametric Implicit Face Representation for Audio-Driven Facial Reenactment Ricong Huang et.al. 2306.07579 null
2023-06-13 AniFaceDrawing: Anime Portrait Exploration during Your Sketching Zhengyu Huang et.al. 2306.07476 null
2023-06-12 NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection Yu Chen et.al. 2306.06885 null
2023-06-10 StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles Yifeng Ma et.al. 2301.01081 link
2023-06-08 ReliableSwap: Boosting General Face Swapping Via Reliable Supervision Ge Yuan et.al. 2306.05356 link
2023-06-06 Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks Jianrong Wang et.al. 2306.03594 null
2023-06-05 Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions Shaoxu Li et.al. 2306.02903 link
2023-05-31 High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning Chao Xu et.al. 2305.02572 null
2023-05-23 CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation Jingning Xu et.al. 2305.13962 null
2023-05-22 RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars Dongwei Pan et.al. 2305.13353 link
2023-05-19 UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui et.al. 2302.14337 null
2023-05-18 An Android Robot Head as Embodied Conversational Agent Marcel Heisler et.al. 2305.10945 null
2023-05-18 Audio-Visual Person-of-Interest DeepFake Detection Davide Cozzolino et.al. 2204.03083 link
2023-05-17 INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network Shuang Chen et.al. 2305.10589 null
2023-05-17 LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model Kwangho Lee et.al. 2305.10456 null
2023-05-15 Identity-Preserving Talking Face Generation with Landmark and Appearance Priors Weizhi Zhong et.al. 2305.08293 link
2023-05-09 Zero-shot personalized lip-to-speech synthesis with face image based voice control Zheng-Yan Sheng et.al. 2305.14359 null
2023-05-09 StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator Jiazhi Guan et.al. 2305.05445 null
2023-05-09 Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator Chao Xu et.al. 2305.02594 null
2023-05-01 StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video Lizhen Wang et.al. 2305.00942 link
2023-05-01 GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation Zhenhui Ye et.al. 2305.00787 null
2023-04-28 A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation Bo-Kyeong Kim et.al. 2304.00471 null
2023-04-27 Controllable One-Shot Face Video Synthesis With Semantic Aware Prior Kangning Liu et.al. 2304.14471 null
2023-04-25 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head Rongjie Huang et.al. 2304.12995 link
2023-04-24 VR Facial Animation for Immersive Telepresence Avatars Andre Rochow et.al. 2304.12051 null
2023-04-21 Implicit Neural Head Synthesis via Controllable Local Deformation Fields Chuhan Chen et.al. 2304.11113 null
2023-04-20 DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation Shuai Shen et.al. 2301.03786 link
2023-04-18 Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations Rongliang Wu et.al. 2304.08945 null
2023-04-17 Autoregressive GAN for Semantic Unconditional Head Motion Generation Louis Airale et.al. 2211.00987 link
2023-04-11 One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field Weichuang Li et.al. 2304.05097 null
2023-04-06 Face Animation with an Attribute-Guided Diffusion Model Bohan Zeng et.al. 2304.03199 link
2023-04-06 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios Wei Chen et.al. 2304.02814 null
2023-04-03 CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior Jinbo Xing et.al. 2301.02379 link
2023-04-01 DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance Longwen Zhang et.al. 2304.03117 null
2023-04-01 TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles Yifeng Ma et.al. 2304.00334 null
2023-03-31 FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions Jin Liu et.al. 2303.17789 null
2023-03-31 Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert Jiadong Wang et.al. 2303.17480 null
2023-03-27 OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis Hongyi Xu et.al. 2303.15539 null
2023-03-27 Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms Stevo Racković et.al. 2302.04843 null
2023-03-27 MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation Bowen Zhang et.al. 2212.08062 link
2023-03-27 A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation Stevo Racković et.al. 2205.04289 null
2023-03-26 OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering Zhiyuan Ma et.al. 2303.14662 link
2023-03-26 Emotionally Enhanced Talking Face Generation Sahil Goyal et.al. 2303.11548 link
2023-03-26 Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation Stevo Racković et.al. 2303.06370 null
2023-03-24 Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement Siddarth Ravichandran et.al. 2209.01320 null
2023-03-23 PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$ Sizhe An et.al. 2303.13071 null
2023-03-22 Style Transfer for 2D Talking Head Animation Trong-Thang Pham et.al. 2303.09799 link
2023-03-22 MARLIN: Masked Autoencoder for facial video Representation LearnINg Zhixi Cai et.al. 2211.06627 link
2023-03-14 DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions Geumbyeol Hwang et.al. 2303.07697 link
2023-03-13 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Wenxuan Zhang et.al. 2211.12194 link
2023-03-09 FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning Kazi Injamamul Haque et.al. 2303.05416 link
2023-03-09 Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation Qi Chen et.al. 2303.05322 link
2023-03-07 DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video Zhimeng Zhang et.al. 2303.03988 link
2023-03-05 Cyber Vaccine for Deepfake Immunity Ching-Chun Chang et.al. 2303.02659 null
2023-03-04 High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors Yunpeng Bai et.al. 2211.15064 null
2023-03-01 DPE: Disentanglement of Pose and Expression for General Video Portrait Editing Youxin Pang et.al. 2301.06281 link
2023-02-27 Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video Minsu Kim et.al. 2303.08670 null
2023-02-27 Memory-augmented Contrastive Learning for Talking Head Generation Jianrong Wang et.al. 2302.13469 link
2023-02-24 Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention Bin Liu et.al. 2302.12532 null
2023-02-16 OPT: One-shot Pose-Controllable Talking Head Generation Jin Liu et.al. 2302.08197 null
2023-02-14 Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space Trevine Oorloff et.al. 2203.14512 link
2023-01-31 GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis Zhenhui Ye et.al. 2301.13430 null
2023-01-23 Data standardization for robust lip sync Chun Wang et.al. 2202.06198 null
2023-01-20 Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes Nicolas Wagner et.al. 2212.14784 null
2023-01-15 Learning Audio-Driven Viseme Dynamics for 3D Face Animation Linchao Bao et.al. 2301.06059 null
2022-12-30 Imitator: Personalized Speech-driven 3D Facial Animation Balamurugan Thambiraja et.al. 2301.00023 null
2022-12-28 All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks Carina Geldhauser et.al. 2212.13810 null
2022-12-23 Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing William Brannon et.al. 2212.12137 null
2022-12-09 Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers Yasheng Sun et.al. 2212.04970 null
2022-12-07 Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors Zhentao Yu et.al. 2212.04248 null
2022-12-07 SPACE: Speech-driven Portrait Animation with Controllable Expression Siddharth Gururani et.al. 2211.09809 null
2022-11-30 Extracting Semantic Knowledge from GANs with Unsupervised Learning Jianjin Xu et.al. 2211.16710 null
2022-11-29 VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild Kun Cheng et.al. 2211.14758 null
2022-11-26 Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis Duomin Wang et.al. 2211.14506 link
2022-11-22 Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition Jiaxiang Tang et.al. 2211.12368 null
2022-11-10 On the role of Lip Articulation in Visual Speech Perception Zakaria Aldeneh et.al. 2203.10117 null
2022-11-04 SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory Se Jin Park et.al. 2211.00924 null
2022-10-21 Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection Alexandros Haliassos et.al. 2201.07131 link
2022-10-14 Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar Aolan Sun et.al. 2210.06877 null
2022-10-13 Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors Vladimir Iashin et.al. 2210.07055 link
2022-10-07 Compressing Video Calls using Synthetic Talking Heads Madhav Agarwal et.al. 2210.03692 null
2022-10-07 A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis Yichen Han et.al. 2210.03335 null
2022-10-06 Audio-Visual Face Reenactment Madhav Agarwal et.al. 2210.02755 link
2022-10-06 Finding Directions in GAN's Latent Space for Neural Face Reenactment Stella Bounareli et.al. 2202.00046 link
2022-10-04 Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale Aditya Agarwal et.al. 2208.09796 null
2022-09-29 Facial Landmark Predictions with Applications to Metaverse Qiao Han et.al. 2209.14698 link
2022-09-27 StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment Stella Bounareli et.al. 2209.13375 link
2022-09-23 EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model Xinya Ji et.al. 2205.15278 null
2022-09-21 FNeVR: Neural Volume Rendering for Face Animation Bohan Zeng et.al. 2209.10340 link
2022-09-19 AutoLV: Automatic Lecture Video Generator Wenbin Wang et.al. 2209.08795 null
2022-09-09 Talking Head from Speech Audio using a Pre-trained Image Generator Mohammed M. Alghamdi et.al. 2209.04252 null
2022-09-07 Restructurable Activation Networks Kartikeya Bhardwaj et.al. 2208.08562 link
2022-08-29 StableFace: Analyzing and Improving Motion Stability for Talking Face Generation Jun Ling et.al. 2208.13717 null
2022-08-17 Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors Sindhu B Hegde et.al. 2208.08118 link
2022-08-03 Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control Michail Christos Doukas et.al. 2208.02210 null
2022-08-02 Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer Ailin Huang et.al. 2206.12837 link
2022-08-01 A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip Shuang Chen et.al. 2208.01149 link
2022-07-27 A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing Goluck Konuko et.al. 2207.13530 null
2022-07-24 Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Shuai Shen et.al. 2207.11770 link
2022-07-22 Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos Panagiotis P. Filntisis et.al. 2207.11094 link
2022-07-20 NARRATE: A Normal Assisted Free-View Portrait Stylizer Youjia Wang et.al. 2207.00974 null
2022-07-20 VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection Joanna Hong et.al. 2206.07458 null
2022-07-20 Responsive Listening Head Generation: A Benchmark Dataset and Baseline Mohan Zhou et.al. 2112.13548 null
2022-07-13 FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqi Wang et.al. 2207.03800 link
2022-06-29 Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs Bo-Kyeong Kim et.al. 2206.14658 null
2022-06-09 Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel et.al. 2206.04523 null
2022-05-31 Text/Speech-Driven Full-Body Animation Wenlin Zhuang et.al. 2205.15573 null
2022-05-27 Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast Boqing Zhu et.al. 2204.14057 link
2022-05-26 One-Shot Face Reenactment on Megapixels Wonjun Kang et.al. 2205.13368 null
2022-05-24 Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts Debjoy Saha et.al. 2205.12194 link
2022-05-20 MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement Alexander Richard et.al. 2104.08223 link
2022-05-13 Talking Face Generation with Multilingual TTS Hyoung-Kyu Song et.al. 2205.06421 null
2022-05-02 Emotion-Controllable Generalized Talking Face Generation Sanjana Sinha et.al. 2205.01155 null
2022-05-02 A Novel Speech-Driven Lip-Sync Model with CNN and LSTM Xiaohong Li et.al. 2205.00916 null
2022-04-27 Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion Sen Chen et.al. 2204.12756 null
2022-04-25 Fast Facial Landmark Detection and Applications: A Survey Kostiantyn Khabarlak et.al. 2101.10808 null
2022-04-13 Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions Zipeng Ye et.al. 2204.06180 null
2022-04-12 Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild Ganglai Wang et.al. 2203.03984 null
2022-04-06 Transformer-S2A: Robust and Efficient Speech-to-Animation Liyang Chen et.al. 2111.09771 null
2022-04-03 Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text Pulkit Tandon et.al. 2106.14014 link
2022-03-30 End to End Lip Synchronization with a Temporal AutoEncoder Yoav Shalev et.al. 2203.16224 link
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-17 StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN Fei Yin et.al. 2203.04036 link
2022-03-17 FaceFormer: Speech-Driven 3D Facial Animation with Transformers Yingruo Fan et.al. 2112.05329 link
2022-03-16 Efficient conditioned face animation using frontally-viewed embedding Maxime Oquab et.al. 2203.08765 null
2022-03-15 Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2203.06605 link
2022-03-10 An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection Ganglai Wang et.al. 2203.05178 null
2022-03-04 Multi-modality Deep Restoration of Extremely Compressed Face Videos Xi Zhang et.al. 2107.05548 null
2022-03-01 FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset Hasam Khalid et.al. 2108.05080 link
2022-02-25 FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 2202.12972 null
2022-02-22 Thinking the Fusion Strategy of Multi-reference Face Reenactment Takuya Yashima et.al. 2202.10758 null
2022-01-24 Selective Listening by Synchronizing Speech with Lips Zexu Pan et.al. 2106.07150 link
2022-01-22 Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary Sibo Zhang et.al. 2104.14631 null
2022-01-21 Stitch it in Time: GAN-Based Facial Editing of Real Videos Rotem Tzaban et.al. 2201.08361 link
2022-01-17 Towards Realistic Visual Dubbing with Heterogeneous Sources Tianyi Xie et.al. 2201.06260 null
2022-01-16 Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels Zipeng Ye et.al. 2201.05986 null
2022-01-03 DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering Shunyu Yao et.al. 2201.00791 null
2021-12-20 Parallel and High-Fidelity Text-to-Lip Generation Jinglin Liu et.al. 2107.06831 link
2021-12-19 Initiative Defense against Facial Manipulation Qidong Huang et.al. 2112.10098 link
2021-12-07 Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation Yingruo Fan et.al. 2112.02214 null
2021-12-06 One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning Suzhen Wang et.al. 2112.02749 null
2021-11-29 Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates Shenhan Qian et.al. 2108.08020 link
2021-11-04 FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Wei Gan et.al. 2111.02751 null
2021-11-02 BiosecurID: a multimodal biometric database Julian Fierrez et.al. 2111.03472 null
2021-10-30 Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis Haozhe Wu et.al. 2111.00203 link
2021-10-26 Emotion recognition in talking-face videos using persistent entropy and neural networks Eduardo Paluzo-Hidalgo et.al. 2110.13571 link
2021-10-26 ViDA-MAN: Visual Dialog with Digital Humans Tong Shen et.al. 2110.13384 null
2021-10-22 Invertible Frowns: Video-to-Video Facial Emotion Translation Ian Magnusson et.al. 2109.08061 null
2021-10-19 Talking Head Generation with Audio and Speech Related Facial Action Units Sen Chen et.al. 2110.09951 null
2021-10-16 Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor Anchit Gupta et.al. 2110.08580 null
2021-10-12 Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment Haichao Zhang et.al. 2110.04708 null
2021-10-07 Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution Yangyang Shi et.al. 2110.05241 null
2021-09-24 Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu et.al. 2109.10595 null
2021-09-20 Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach Stevo Racković et.al. 2109.08356 null
2021-09-17 Detection of GAN-synthesized street videos Omran Alamayreh et.al. 2109.04991 null
2021-08-30 Audiovisual Speech Synthesis using Tacotron2 Ahmed Hussen Abdelaziz et.al. 2008.00620 null
2021-08-23 KoDF: A Large-scale Korean DeepFake Detection Dataset Patrick Kwon et.al. 2103.10094 null
2021-08-23 HeadGAN: One-shot Neural Head Synthesis and Editing Michail Christos Doukas et.al. 2012.08261 null
2021-08-19 AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis Yudong Guo et.al. 2103.11078 link
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-08-18 FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning Chenxu Zhang et.al. 2108.07938 link
2021-08-12 UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing Meng Cao et.al. 2108.05650 null
2021-08-11 AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang et.al. 2108.04325 null
2021-08-06 SofGAN: A Portrait Image Generator with Dynamic Styling Anpei Chen et.al. 2007.03780 link
2021-07-27 Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations Laurent Benaroya et.al. 2107.12346 null
2021-07-21 Speech Driven Talking Face Generation from a Single Image and an Emotion Condition Sefik Emre Eskimez et.al. 2008.03592 link
2021-07-20 Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion Suzhen Wang et.al. 2107.09293 link
2021-07-10 Speech2Video: Cross-Modal Distillation for Speech to Video Generation Shijing Si et.al. 2107.04806 null
2021-07-07 Egocentric Videoconferencing Mohamed Elgharib et.al. 2107.03109 null
2021-06-09 LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization Avisek Lahiri et.al. 2106.04185 null
2021-05-20 Audio-Driven Emotional Video Portraits Xinya Ji et.al. 2104.07452 null
2021-05-07 Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation Lincheng Li et.al. 2104.07995 link
2021-05-05 A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors Ruobing Zheng et.al. 2002.08700 null
2021-04-29 Learned Spatial Representations for Few-shot Talking-Head Synthesis Moustafa Meshry et.al. 2104.14557 null
2021-04-26 One-shot Face Reenactment Using Appearance Adaptive Normalization Guangming Yao et.al. 2102.03984 null
2021-04-25 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head Qianyun Wang et.al. 2104.12051 null
2021-04-23 Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation Hang Zhou et.al. 2104.11116 null
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-04-07 Everything's Talkin': Pareidolia Face Reenactment Linsen Song et.al. 2104.03061 link
2021-04-07 LI-Net: Large-Pose Identity-Preserving Face Reenactment Network Jin Liu et.al. 2104.02850 null
2021-04-02 One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing Ting-Chun Wang et.al. 2011.15126 null
2021-03-20 Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization Komal Chugh et.al. 2005.14405 link
2021-03-19 End-to-End Lip Synchronisation Based on Pattern Classification You Jin Kim et.al. 2005.08606 null
2021-03-05 Real-time RGBD-based Extended Body Pose Estimation Renat Bashirov et.al. 2103.03663 link
2021-03-03 Estimating Uniqueness of I-Vector Representation of Human Voice Erkam Sinan Tandogan et.al. 2008.11985 null
2021-02-25 MakeItTalk: Speaker-Aware Talking-Head Animation Yang Zhou et.al. 2004.12992 null
2021-02-19 One Shot Audio to Animated Video Generation Neeraj Kumar et.al. 2102.09737 null
2021-02-18 AudioVisual Speech Synthesis: A brief literature review Efthymios Georgiou et.al. 2103.03927 null
2020-12-14 Robust One Shot Audio to Video Generation Neeraj Kumar et.al. 2012.07842 null
2020-12-14 Multi Modal Adaptive Normalization for Audio to Video Generation Neeraj Kumar et.al. 2012.07304 null
2020-11-30 Adaptive Compact Attention For Few-shot Video-to-video Translation Risheng Huang et.al. 2011.14695 null
2020-11-21 Stochastic Talking Face Generation Using Latent Distribution Matching Ravindra Yadav et.al. 2011.10727 link
2020-11-21 Iterative Text-based Editing of Talking-heads Using Neural Retargeting Xinwei Yao et.al. 2011.10688 null
2020-11-09 FACEGAN: Facial Attribute Controllable rEenactment GAN Soumya Tripathy et.al. 2011.04439 null
2020-11-06 Large-scale multilingual audio visual dubbing Yi Yang et.al. 2011.03530 null
2020-11-02 Facial Keypoint Sequence Generation from Audio Prateek Manocha et.al. 2011.01114 null
2020-10-25 APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment Jiangning Zhang et.al. 2010.13017 link
2020-10-12 Intuitive Facial Animation Editing Based On A Generative RNN Framework Eloïse Berson et.al. 2010.05655 null
2020-10-05 SMILE: Semantically-guided Multi-attribute Image and Layout Editing Andrés Romero et.al. 2010.02315 link
2020-10-05 Dynamic Facial Asset and Rig Generation from a Single Scan Jiaman Li et.al. 2010.00560 null
2020-09-20 An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management Ruturaj Raval et.al. 2009.09354 null
2020-09-18 Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks Guangming Yao et.al. 2008.07783 null
2020-09-12 DualLip: A System for Joint Lip Reading and Generation Weicong Chen et.al. 2009.05784 null
2020-09-02 Seeing wake words: Audio-visual Keyword Spotting Liliane Momeni et.al. 2009.01225 null
2020-08-29 "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok Daniel Klug et.al. 2008.13040 null
2020-08-25 A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild K R Prajwal et.al. 2008.10010 null
2020-08-11 Audio- and Gaze-driven Facial Animation of Codec Avatars Alexander Richard et.al. 2008.05023 null
2020-08-04 Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract Tamás Gábor Csapó et.al. 2008.02098 link
2020-08-04 Real-Time Cleaning and Refinement of Facial Animation Signals Eloïse Berson et.al. 2008.01332 null
2020-08-02 Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos Yanhui Guo et.al. 2008.01652 null
2020-07-29 Neural Voice Puppetry: Audio-driven Facial Reenactment Justus Thies et.al. 1912.05566 link
2020-07-20 Deformable Style Transfer Sunnie S. Y. Kim et.al. 2003.11038 link
2020-07-18 A Robust Interactive Facial Animation Editing System Eloïse Berson et.al. 2007.09367 null
2020-07-16 Talking-head Generation with Rhythmic Head Motion Lele Chen et.al. 2007.08547 link
2020-07-08 Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision Abhinav Shukla et.al. 2007.04134 null
2020-06-20 Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams Huirong Huang et.al. 2006.11610 null
2020-05-27 Modality Dropout for Improved Performance-driven Talking Faces Ahmed Hussen Abdelaziz et.al. 2005.13616 null
2020-05-25 Identity-Preserving Realistic Talking Face Generation Sanjana Sinha et.al. 2005.12318 null
2020-05-22 Head2Head: Video-based Neural Head Synthesis Mohammad Rami Koujan et.al. 2005.10954 null
2020-05-16 FReeNet: Multi-Identity Face Reenactment Jiangning Zhang et.al. 1905.11805 null
2020-05-13 FaR-GAN for One-Shot Face Reenactment Hanxiang Hao et.al. 2005.06402 null
2020-05-13 Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning Hao Zhu et.al. 1812.06589 null
2020-05-11 Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok Juan Carlos Medina Serrano et.al. 2004.05478 link
2020-05-07 What comprises a good talking-head video generation?: A Survey and Benchmark Lele Chen et.al. 2005.03201 link
2020-05-04 Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani et.al. 2002.08742 null
2020-04-30 APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals Jiangning Zhang et.al. 2004.14569 null
2020-03-30 ActGAN: Flexible and Efficient One-shot Face Reenactment Ivan Kosarevych et.al. 2003.13840 null
2020-03-29 Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose Xianfang Zeng et.al. 2003.12957 null
2020-03-26 High-Accuracy Facial Depth Models derived from 3D Synthetic Data Faisal Khan et.al. 2003.06211 null
2020-03-06 Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose Ran Yi et.al. 2002.10137 null
2020-03-05 Talking-Heads Attention Noam Shazeer et.al. 2003.02436 link
2020-03-01 Towards Automatic Face-to-Face Translation Prajwal K R et.al. 2003.00418 link
2020-02-19 Speech-driven facial animation using polynomial fusion of features Triantafyllos Kefalas et.al. 1912.05833 null
2020-01-17 ICface: Interpretable and Controllable Face Reenactment Using GANs Soumya Tripathy et.al. 1904.01909 null
2019-12-20 Disentangling Style and Content in Anime Illustrations Sitao Xiang et.al. 1905.10742 null
2019-11-21 FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis Kuangxiao Gu et.al. 1911.09224 null
2019-11-19 MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets Sungjoo Ha et.al. 1911.08139 null
2019-10-28 Few-shot Video-to-Video Synthesis Ting-Chun Wang et.al. 1910.12713 null
2019-10-19 Real-Time Lip Sync for Live 2D Animation Deepali Aneja et.al. 1910.08685 link
2019-10-16 Designing Style Matching Conversational Agents Deepali Aneja et.al. 1910.07514 null
2019-10-15 A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities Deepali Aneja et.al. 1909.08766 link
2019-10-09 EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos Haipeng Zeng et.al. 1907.12918 null
2019-10-02 Animating Face using Disentangled Audio Representations Gaurav Mittal et.al. 1910.00726 null
2019-09-25 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Egor Zakharov et.al. 1905.08233 null
2019-09-06 Neural Style-Preserving Visual Dubbing Hyeongwoo Kim et.al. 1909.02518 null
2019-08-29 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach Ngoc-Trung Tran et.al. 1908.11039 null
2019-08-20 Prosodic Phrase Alignment for Machine Dubbing Alp Öktem et.al. 1908.07226 link
2019-08-16 FSGAN: Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 1908.05932 link
2019-08-11 Emotion Dependent Facial Animation from Affective Speech Rizwan Sadiq et.al. 1908.03904 null
2019-08-05 One-shot Face Reenactment Yunxuan Zhang et.al. 1908.03251 link
2019-07-25 Talking Face Generation by Conditional Recurrent Adversarial Network Yang Song et.al. 1804.04786 link
2019-07-24 Data-Driven Physical Face Inversion Yeara Kozlov et.al. 1907.10402 null
2019-07-23 A system for efficient 3D printed stop-motion face animation Rinat Abdrashitov et.al. 1907.10163 null
2019-06-14 Realistic Speech-Driven Facial Animation with GANs Konstantinos Vougioukas et.al. 1906.06337 null
2019-06-04 Text-based Editing of Talking-head Video Ohad Fried et.al. 1906.01524 null
2019-05-27 Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks Guanzhong Tian et.al. 1905.11142 null
2019-05-09 Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss Lele Chen et.al. 1905.03820 link
2019-05-08 Capture, Learning, and Synthesis of 3D Speaking Styles Daniel Cudeiro et.al. 1905.03079 link
2019-04-23 Talking Face Generation by Adversarially Disentangled Audio-Visual Representation Hang Zhou et.al. 1807.07860 null
2019-04-02 FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Yanfu Yan et.al. 1904.01509 null
2019-03-13 Animating an Autonomous 3D Talking Avatar Dominik Borer et.al. 1903.05448 null
2018-12-22 Deep Audio-Visual Speech Recognition Triantafyllos Afouras et.al. 1809.02108 null
2018-12-20 DeepFakes: a New Threat to Face Recognition? Assessment and Detection Pavel Korshunov et.al. 1812.08685 null
2018-11-22 Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos Ying Tai et.al. 1811.00342 link
2018-11-16 Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters Maartje M. E. Hendrikse et.al. 1812.02088 null
2018-08-28 GANimation: Anatomically-aware Facial Animation from a Single Image Albert Pumarola et.al. 1807.09251 link
2018-08-19 Dynamic Temporal Alignment of Speech to Lips Tavi Halperin et.al. 1808.06250 link
2018-07-29 ReenactGAN: Learning to Reenact Faces via Boundary Transfer Wayne Wu et.al. 1807.11079 link
2018-07-26 Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani et.al. 1805.00833 null
2018-07-19 End-to-End Speech-Driven Facial Animation with Temporal GANs Konstantinos Vougioukas et.al. 1805.09313 null
2018-05-29 Deep Video Portraits Hyeongwoo Kim et.al. 1805.11714 null
2018-05-24 VisemeNet: Audio-Driven Animator-Centric Speech Animation Yang Zhou et.al. 1805.09488 null
2018-05-21 Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks Sitao Xiang et.al. 1805.07997 null
2018-04-23 Generating Talking Face Landmarks from Speech Sefik Emre Eskimez et.al. 1803.09803 null
2018-03-28 Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network Hai X. Pham et.al. 1803.07716 null
2018-03-20 Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks Seyed Ali Jalalifar et.al. 1803.07461 null
2017-12-07 End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech Hai X. Pham et.al. 1710.00920 null
2017-12-06 ObamaNet: Photo-realistic lip-sync from text Rithesh Kumar et.al. 1801.01442 null
2017-07-30 Kernel Projection of Latent Structures Regression for Facial Animation Retargeting Christos Ouzounis et.al. 1707.09629 null
2017-07-26 Fast Deep Matting for Portrait Animation on Mobile Phone Bingke Zhu et.al. 1707.08289 null
2017-07-21 Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking Rahul Sharma et.al. 1707.06830 null
2017-07-18 You said that? Joon Son Chung et.al. 1705.02966 null
2017-01-30 Lip Reading Sentences in the Wild Joon Son Chung et.al. 1611.05358 link
2016-10-28 Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei Johannes Buchner et.al. 1610.09380 link
2016-07-11 Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video Shaoshi Yang et.al. 1601.06684 null
2016-05-22 Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression David Rim et.al. 1512.08212 null
2016-02-08 Automatic Face Reenactment Pablo Garrido et.al. 1602.02651 null
2015-11-20 ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication Ali Mollahosseini et.al. 1511.06502 null
2014-09-03 Visual Speech Recognition Ahmad B. A. Hassanat et.al. 1409.1411 null
2012-09-22 Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis Ingmar Steiner et.al. 1209.4982 null
2012-03-30 Face Expression Recognition and Analysis: The State of the Art Vinay Bettadapura et.al. 1203.6722 null
2012-01-19 Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis Ingmar Steiner et.al. 1201.4080 null
2010-03-01 Re-verification of a Lip Synchronization Protocol using Robust Reachability Piotr Kordy et.al. 1003.0431 null

(back to top)

Image Animation

Publish Date Title Authors PDF Code
2025-11-18 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-16 Sketch2PoseNet: Efficient and Generalized Sketch to 3D Human Pose Prediction Li Wang et.al. 2510.26196 null
2025-11-14 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation Zongyang Qiu et.al. 2511.11002 null
2025-11-11 OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild Yuncheng Guo et.al. 2511.08423 null
2025-11-11 oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention Ryusuke Mizutani et.al. 2511.08168 null
2025-11-11 Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis Aditi Singhania et.al. 2511.08087 null
2025-11-09 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Assaf Singer et.al. 2511.08633 null
2025-11-04 Video Text Preservation with Synthetic Text-Rich Videos Ziyang Liu et.al. 2511.05573 null
2025-11-03 FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion Chuhao Chen et.al. 2510.25765 null
2025-11-02 A Hybrid YOLOv5-SSD IoT-Based Animal Detection System for Durian Plantation Protection Anis Suttan Shahrir et.al. 2511.00777 null
2025-10-31 DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model Yucheng Xing et.al. 2510.27169 null
2025-10-29 4-Doodle: Text to 3D Sketches that Move! Hao Chen et.al. 2510.25319 null
2025-10-28 DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery Zan Wang et.al. 2510.24117 null
2025-10-27 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation Junyoung Seo et.al. 2510.23581 null
2025-10-27 Revising Second Order Terms in Deep Animation Video Coding Konstantin Schmidt et.al. 2510.23561 null
2025-10-26 Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation Mackenzie Tapp et.al. 2510.22618 null
2025-10-26 DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss Jing Yang et.al. 2510.22473 null
2025-10-20 From Volume Rendering to 3D Gaussian Splatting: Theory and Applications Vitor Pereira Matias et.al. 2510.18101 null
2025-10-16 Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation Shaowei Liu et.al. 2510.14976 null
2025-10-16 Zero-Shot Wildlife Sorting Using Vision Transformers: Evaluating Clustering and Continuous Similarity Ordering Hugo Markoff et.al. 2510.14596 null
2025-10-16 Hierarchical Re-Classification: Combining Animal Classification Models with Vision Transformers Hugo Markoff et.al. 2510.14594 null
2025-10-16 Evaluating plastic scintillator performance as a substitute of LYSO in SiPM based animal PET scanners: A GEANT4 simulation analysis Davinder Siwal et.al. 2510.14437 null
2025-10-16 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-09-19 TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection Wenkui Yang et.al. 2505.08437 null
2025-09-09 LINR Bridge: Vector Graphic Animation via Neural Implicits and Video Diffusion Priors Wenshuo Gao et.al. 2509.07484 null
2025-08-23 AnimateAnywhere: Rouse the Background in Human Image Animation Xiaoyu Liu et.al. 2504.19834 null
2025-08-14 Animate-X++: Universal Character Image Animation with Dynamic Backgrounds Shuai Tan et.al. 2508.09454 null
2025-08-10 Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers Xin Ma et.al. 2508.07246 null
2025-07-20 StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation Shuyuan Tu et.al. 2507.15064 null
2025-07-11 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-07-01 DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution Zhe Kong et.al. 2507.01012 null
2025-07-01 Recomposed realities: animating still images via patch clustering and randomness Markus Juvonen et.al. 2506.22556 null
2025-05-30 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 null
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 null
2025-05-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-05-18 DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation Haoyu Zhao et.al. 2503.21246 null
2025-04-20 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance Yuxuan Luo et.al. 2504.01724 null
2025-04-15 UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer Xiang Wang et.al. 2504.11289 null
2025-04-15 Taming Consistency Distillation for Accelerated Human Image Animation Xiang Wang et.al. 2504.11143 null
2025-04-04 Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images In-Hwan Jin et.al. 2504.05458 null
2025-04-01 VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer Xinyu Liu et.al. 2502.05979 null
2025-03-23 MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang et.al. 2412.16153 null
2025-03-13 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer Jiahao Cui et.al. 2412.00733 link
2025-03-10 Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation Yingjie Chen et.al. 2501.05020 null
2025-02-25 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Hongxiang Li et.al. 2412.09349 link
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 null
2025-02-10 Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance Li Hu et.al. 2502.06145 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-03 Every Image Listens, Every Image Dances: Music-Driven Image Animation Zhikang Dong et.al. 2501.18801 null
2025-01-20 X-Dyna: Expressive Dynamic Human Image Animation Di Chang et.al. 2501.10021 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2024-12-12 Animate-X: Universal Character Image Animation with Enhanced Motion Representation Shuai Tan et.al. 2410.10306 null
2024-12-04 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Taekyung Ki et.al. 2412.01064 null
2024-11-30 DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses Yatian Pang et.al. 2412.00397 null
2024-11-28 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-27 StableAnimator: High-Quality Identity-Preserving Human Image Animation Shuyuan Tu et.al. 2411.17697 link
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-22 HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Zhenzhi Wang et.al. 2407.17438 null
2024-10-31 TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation Sunjae Yoon et.al. 2410.24037 null
2024-10-20 FrameBridge: Improving Image-to-Video Generation with Bridge Models Yuji Wang et.al. 2410.15371 null
2024-10-14 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation Jiahao Cui et.al. 2410.07718 link
2024-09-30 Illustrious: an Open Advanced Illustration Model Sang Hyun Park et.al. 2409.19946 null
2024-09-29 High Quality Human Image Animation using Regional Supervision and Motion Blur Condition Zhongcong Xu et.al. 2409.19580 null
2024-09-22 Dormant: Defending against Pose-driven Human Image Animation Jiachen Zhou et.al. 2409.14424 link
2024-07-23 Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Xin Ma et.al. 2407.15642 link
2024-07-12 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Jeongho Kim et.al. 2407.09012 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-11 MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Muyao Niu et.al. 2405.20222 link
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-14 Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation Li Hu et.al. 2311.17117 null
2024-06-13 Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control Jingyun Xue et.al. 2406.03035 null
2024-06-03 UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation Xiang Wang et.al. 2406.01188 null
2024-06-01 Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance Shenhao Zhu et.al. 2403.14781 link
2024-05-29 Evaluating the effectiveness of sonification in science education using Edukoi Lucrezia Guiotto Nai Fovino et.al. 2405.18908 null
2024-05-28 VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation Qilin Wang et.al. 2405.18156 null
2024-05-28 Controllable Longer Image Animation with Diffusion Models Qiang Wang et.al. 2405.17306 null
2024-03-26 PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Yiming Zhang et.al. 2312.13964 null
2024-03-13 Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Yue Ma et.al. 2403.08268 link
2024-03-08 Audio-Synchronized Visual Animation Lin Zhang et.al. 2403.05659 link
2024-03-05 Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Weijie Li et.al. 2403.02827 null
2024-01-17 Continuous Piecewise-Affine Based Motion Model for Image Animation Hexiang Wang et.al. 2401.09146 link
2024-01-03 Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions David Junhao Zhang et.al. 2401.01827 link
2023-12-08 AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Jiwen Yu et.al. 2312.03793 null
2023-12-06 AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance Zuozhuo Dai et.al. 2311.12886 null
2023-12-05 LivePhoto: Real Image Animation with Text-guided Motion Control Xi Chen et.al. 2312.02928 null
2023-11-30 Motion-Conditioned Image Animation for Video Editing Wilson Yan et.al. 2311.18827 null
2023-11-27 MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model Zhongcong Xu et.al. 2311.16498 null
2023-11-27 DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Jinbo Xing et.al. 2310.12190 link
2023-11-19 Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation Peirong Liu et.al. 2110.04658 null
2023-10-16 LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation Ruiqi Wu et.al. 2310.10769 link
2023-10-11 LEO: Generative Latent Image Animator for Human Video Synthesis Yaohui Wang et.al. 2305.03989 link
2023-09-26 Text-Guided Synthesis of Eulerian Cinemagraphs Aniruddha Mahapatra et.al. 2307.03190 link
2023-09-25 Automatic Animation of Hair Blowing in Still Portrait Photos Wenpeng Xiao et.al. 2309.14207 null
2023-07-10 AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Yuwei Guo et.al. 2307.04725 link
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-04-12 VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs Moayed Haji Ali et.al. 2304.06020 null
2023-03-10 3D Cinemagraphy from a Single Image Xingyi Li et.al. 2303.05724 null
2023-02-02 Dreamix: Video Diffusion Models are General Video Editors Eyal Molad et.al. 2302.01329 null
2023-01-27 Animating Still Images Kushagr Batra et.al. 2209.10497 null
2023-01-14 Continuous odor profile monitoring to study olfactory navigation in small animals Kevin S. Chen et.al. 2301.05905 null
2022-11-30 NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation Yu Yin et.al. 2211.17235 null
2022-10-05 Implicit Warping for Animation with Image Sets Arun Mallya et.al. 2210.01794 null
2022-09-28 Motion Transformer for Unsupervised Image Animation Jiale Tao et.al. 2209.14024 link
2022-07-19 Single Stage Virtual Try-on via Deformable Attention Flows Shuai Bai et.al. 2207.09161 link
2022-07-08 Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation Yucheng Suo et.al. 2207.03714 null
2022-06-11 Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification Mengdi Gao et.al. 2106.12284 link
2022-04-05 Neural Fields in Visual Computing and Beyond Yiheng Xie et.al. 2111.11426 null
2022-03-30 Image Animation with Perturbed Masks Yoav Shalev et.al. 2011.06922 null
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-25 3D GAN Inversion for Controllable Portrait Image Animation Connor Z. Lin et.al. 2203.13441 null
2022-03-18 Latent Image Animator: Learning to Animate Images via Latent Space Navigation Yaohui Wang et.al. 2203.09043 null
2021-12-21 Image Animation with Keypoint Mask Or Toledano et.al. 2112.10457 link
2021-12-19 Move As You Like: Image Animation in E-Commerce Scenario Borun Xu et.al. 2112.13647 null
2021-12-17 AI-Empowered Persuasive Video Generation: A Survey Chang Liu et.al. 2112.09401 null
2021-12-01 Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation Yurui Ren et.al. 2008.12606 null
2021-10-28 Application of Time Separation Technique to Enhance C-arm CT Dynamic Liver Perfusion Imaging Hana Haseljić et.al. 2110.14318 null
2021-10-26 Incremental Learning for Animal Pose Estimation using RBF k-DPP Gaurav Kumar Nayak et.al. 2110.13598 null
2021-10-07 Enhancement of Anime Imaging Enlargement using Modified Super-Resolution CNN Tanakit Intaniyom et.al. 2110.02321 null
2021-09-06 Sparse to Dense Motion Transfer for Face Image Animation Ruiqi Zhao et.al. 2109.00471 null
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-06-23 Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 Adellia et.al. 2106.15342 null
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-03-23 PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation Wai Ting Cheung et.al. 2103.11600 null
2020-12-01 Ultra-low bitrate video conferencing using deep image animation Goluck Konuko et.al. 2012.00346 null
2020-10-01 First Order Motion Model for Image Animation Aliaksandr Siarohin et.al. 2003.00196 link
2019-08-30 Animating Arbitrary Objects via Deep Motion Transfer Aliaksandr Siarohin et.al. 1812.08861 link
2019-07-01 Style Generator Inversion for Image Enhancement and Animation Aviv Gabbay et.al. 1906.11880 null
2018-10-09 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping Guillaume Caron et.al. 1810.03956 null
2018-06-24 A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic Jiaming Lu et.al. 1806.09117 null
2018-01-31 RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes Thomas Bronzwaer et.al. 1801.10452 null
2017-10-23 Quasi-random Agents for Image Transition and Animation Aneta Neumann et.al. 1710.07421 null
2016-06-23 Gender and Interest Targeting for Sponsored Post Advertising at Tumblr Mihajlo Grbovic et.al. 1606.07189 null
2015-03-16 Use of Effective Audio in E-learning Courseware Kisor Ray et.al. 1503.04837 null
2015-02-04 Multimedia-Video for Learning Kah Hean Chua et.al. 1502.01090 null
2013-01-25 Measurements of Martian Dust Devil Winds with HiRISE David S. Choi et.al. 1301.6130 null
2010-01-04 Tutoring System for Dance Learning Rajkumar Kannan et.al. 1001.0440 null

(back to top)

Video Generation

Publish Date Title Authors PDF Code
2025-11-18 Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising Yifan Wang et.al. 2511.14719 null
2025-11-18 FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation Yunfeng Wu et.al. 2511.14712 null
2025-11-18 ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection Mohammad Romani et.al. 2511.14554 null
2025-11-18 DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation Xiangchen Yin et.al. 2511.14530 null
2025-11-18 FlowRoI A Fast Optical Flow Driven Region of Interest Extraction Framework for High-Throughput Image Compression in Immune Cell Migration Analysis Xiaowei Xu et.al. 2511.14419 null
2025-11-18 ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Junfu Pu et.al. 2511.14349 null
2025-11-18 Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs Yiyi Miao et.al. 2511.14315 null
2025-11-18 Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning Rui Liu et.al. 2511.14249 null
2025-11-18 Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Zhuo Li et.al. 2511.14178 null
2025-11-18 Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models Hao Zhen et.al. 2511.14120 null
2025-11-18 Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services Liuyi Jin et.al. 2511.14119 null
2025-11-18 A Patient-Independent Neonatal Seizure Prediction Model Using Reduced Montage EEG and ECG Sithmini Ranasingha et.al. 2511.14110 null
2025-11-18 Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations Yiqing Shen et.al. 2511.14100 null
2025-11-18 Privis: Towards Content-Aware Secure Volumetric Video Delivery Kaiyuan Hu et.al. 2511.14005 null
2025-11-17 Learning Skill-Attributes for Transferable Assessment in Video Kumar Ashutosh et.al. 2511.13993 null
2025-11-17 PoCGM: Poisson-Conditioned Generative Model for Sparse-View CT Reconstruction Changsheng Fang et.al. 2511.13967 null
2025-11-17 SAE-MCVT: A Real-Time and Scalable Multi-Camera Vehicle Tracking Framework Powered by Edge Computing Yuqiang Lin et.al. 2511.13904 null
2025-11-17 Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors Mert Onur Cakiroglu et.al. 2511.13897 null
2025-11-17 Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Xinxin Liu et.al. 2511.13853 null
2025-11-17 Segment Anything Across Shots: A Method and Benchmark Hengrui Hu et.al. 2511.13715 null
2025-11-17 UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity Junwei Yu et.al. 2511.13714 null
2025-11-17 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Harold Haodong Chen et.al. 2511.13704 null
2025-11-17 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting Jiangnan Ye et.al. 2511.13684 null
2025-11-17 CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding Shrenik Patel et.al. 2511.13644 null
2025-11-17 Computer Vision based group activity detection and action spotting Narthana Sivalingam et.al. 2511.13315 null
2025-11-17 CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving Enhui Ma et.al. 2511.13297 null
2025-11-17 FoleyBench: A Benchmark For Video-to-Audio Models Satvik Dixit et.al. 2511.13219 null
2025-11-17 Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification Rifen Lin et.al. 2511.13150 null
2025-11-17 VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language Zonghao Ying et.al. 2511.13127 null
2025-11-17 CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model Yuqi Zhang et.al. 2511.13121 null
2025-11-17 Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining Zhaocheng Yu et.al. 2511.13113 null
2025-11-17 Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention Taiye Chen et.al. 2511.12940 null
2025-11-17 Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models Guoyan Wang et.al. 2511.12937 null
2025-11-17 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-17 Generative Photographic Control for Scene-Consistent Video Cinematic Editing Huiqiang Sun et.al. 2511.12921 null
2025-11-17 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Junyi Ma et.al. 2511.12878 null
2025-11-17 Video Finetuning Improves Reasoning Between Frames Ruiqi Yang et.al. 2511.12868 null
2025-11-16 SAGA: Source Attribution of Generative AI Videos Rohit Kundu et.al. 2511.12834 null
2025-11-16 Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis Zeqin Yu et.al. 2511.12658 null
2025-11-16 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Yunxin Li et.al. 2511.12609 null
2025-11-16 TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction Yukuo Ma et.al. 2511.12578 null
2025-11-16 ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding Yuan Zhou et.al. 2511.12530 null
2025-11-16 DualGR: Generative Retrieval with Long and Short-Term Interests Modeling Zhongchao Yi et.al. 2511.12518 null
2025-11-16 DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection Jialiang Shen et.al. 2511.12511 null
2025-11-16 VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving Hyunki Seong et.al. 2511.12405 null
2025-11-16 SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs Shail Desai et.al. 2511.12404 null
2025-11-15 Fast Reasoning Segmentation for Images and Videos Yiqing Shen et.al. 2511.12368 null
2025-11-15 Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning Yiqing Shen et.al. 2511.12365 null
2025-11-15 AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos Junhyuk Seo et.al. 2511.12241 null
2025-11-15 Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System Aditi Bhalla et.al. 2511.12196 null
2025-11-15 Towards Obstacle-Avoiding Control of Planar Snake Robots Exploring Neuro-Evolution of Augmenting Topologies Advik Sinha et.al. 2511.12148 null
2025-11-15 Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models Tianle Cheng et.al. 2511.12099 null
2025-11-15 Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound Dengming Zhang et.al. 2511.12077 null
2025-11-15 ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation Jiahui Sun et.al. 2511.12072 null
2025-11-15 PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling Sijie Wang et.al. 2511.12056 null
2025-11-15 TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space Wenxuan Miao et.al. 2511.12035 null
2025-11-14 Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Siyou Li et.al. 2511.11910 null
2025-11-14 KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference Huawei Zhang et.al. 2511.11907 null
2025-11-14 Scalable Policy Evaluation with Video World Models Wei-Cheng Tseng et.al. 2511.11520 null
2025-11-14 Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis Feng-Qi Cui et.al. 2511.11406 null
2025-11-14 YCB-Ev SD: Synthetic event-vision dataset for 6DoF object pose estimation Pavel Rojtberg et.al. 2511.11344 null
2025-11-14 RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting Ruocheng Wu et.al. 2511.11213 null
2025-11-14 VIDEOP2R: Video Understanding from Perception to Reasoning Yifan Jiang et.al. 2511.11113 null
2025-11-14 LiteAttention: A Temporal Sparse Attention for Diffusion Transformers Dor Shmilovich et.al. 2511.11062 null
2025-11-14 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation Zongyang Qiu et.al. 2511.11002 null
2025-11-14 Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment Wenbin Bai et.al. 2511.10987 null
2025-11-14 Text-guided Weakly Supervised Framework for Dynamic Facial Expression Recognition Gunho Jung et.al. 2511.10958 null
2025-11-14 Language-Guided Graph Representation Learning for Video Summarization Wenrui Li et.al. 2511.10953 null
2025-11-14 Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling Seoik Jung et.al. 2511.10866 null
2025-11-13 Expert Consensus-based Video-Based Assessment Tool for Workflow Analysis in Minimally Invasive Colorectal Surgery: Development and Validation of ColoWorkflow Pooja P Jain et.al. 2511.10766 null
2025-11-13 Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals Shruti Singh Baghel et.al. 2511.10615 null
2025-11-13 TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding Jinxuan Li et.al. 2511.10241 null
2025-11-13 Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization Ashutosh Anshul et.al. 2511.10212 null
2025-11-13 SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition Qilang Ye et.al. 2511.10091 null
2025-11-13 When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? Qilang Ye et.al. 2511.10059 null
2025-11-13 Reinforcing Trustworthiness in Multimodal Emotional Support Systems Huy M. Le et.al. 2511.10011 null
2025-11-13 AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting Aymen Mir et.al. 2511.09827 null
2025-11-12 Density Estimation and Crowd Counting Balachandra Devarangadi Sunil et.al. 2511.09723 null
2025-11-12 PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild Felix B. Mueller et.al. 2511.09675 null
2025-11-12 TempRetinex: Retinex-based Unsupervised Enhancement for Low-light Video Under Diverse Lighting Conditions Yini Li et.al. 2511.09609 null
2025-11-12 Bridging the Data Gap: Spatially Conditioned Diffusion Model for Anomaly Generation in Photovoltaic Electroluminescence Images Shiva Hanifi et.al. 2511.09604 null
2025-11-12 Diffusion-Based Quality Control of Medical Image Segmentations across Organs Vincenzo Marcianò et.al. 2511.09588 null
2025-11-12 Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation Xinyi Tong et.al. 2511.09585 null
2025-11-12 SPIDER: Scalable Physics-Informed Dexterous Retargeting Chaoyi Pan et.al. 2511.09484 null
2025-11-12 MCAD: Multimodal Context-Aware Audio Description Generation For Soccer Lipisha Chaudhary et.al. 2511.09448 null
2025-11-12 A cross-modal pre-training framework with video data for improving performance and generalization of distributed acoustic sensing Junyi Duan et.al. 2511.09342 null
2025-11-12 GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow Rui Wan et.al. 2511.09272 null
2025-11-12 Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots Yuxi Wei et.al. 2511.09241 null
2025-11-12 AILINKPREVIEWER: Enhancing Code Reviews with LLM-Powered Link Previews Panya Trakoolgerntong et.al. 2511.09223 null
2025-11-12 DBINDS -- Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos? Yanlin Wu et.al. 2511.09184 null
2025-11-10 Robot Learning from a Physical World Model Jiageng Mao et.al. 2511.07416 null
2025-11-10 StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation Tianrui Feng et.al. 2511.07399 null
2025-11-10 Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation Jie Du et.al. 2511.01450 null
2025-11-09 GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets Rui Ai et.al. 2511.06559 null
2025-11-09 RelightMaster: Precise Video Relighting with Multi-plane Light Images Weikang Bian et.al. 2511.06271 null
2025-11-08 Neodragon: Mobile Video Generation using Diffusion Transformer Animesh Karnewar et.al. 2511.06055 null
2025-11-07 THEval. Evaluation Framework for Talking Head Video Generation Nabyl Quignon et.al. 2511.04520 null
2025-11-06 InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation Jinlai Liu et.al. 2511.04675 null
2025-11-06 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Jingqi Tong et.al. 2511.04570 null
2025-11-06 RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation Xiangjun Zhang et.al. 2511.04317 null
2025-11-06 PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection Peiyao Wang et.al. 2511.03997 null
2025-11-05 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Guozhen Zhang et.al. 2511.03334 null
2025-11-05 Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising Shuangquan Lyu et.al. 2511.03272 null
2025-11-04 Video Text Preservation with Synthetic Text-Rich Videos Ziyang Liu et.al. 2511.05573 null
2025-11-04 ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation Panwang Pan et.al. 2511.00511 null
2025-11-03 How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment Zhen Chen et.al. 2511.01775 null
2025-11-03 Driving scenario generation and evaluation using a structured layer representation and foundational models Arthur Hubert et.al. 2511.01541 null
2025-11-03 Towards One-step Causal Video Generation via Adversarial Self-Distillation Yongqi Yang et.al. 2511.01419 null
2025-11-03 MotionStream: Real-Time Video Generation with Interactive Motion Controls Joonghyuk Shin et.al. 2511.01266 null
2025-11-01 Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models Panwang Pan et.al. 2511.00503 null
2025-10-31 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Xiangyu Fan et.al. 2510.27684 null
2025-10-31 Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V Meftun Akarsu et.al. 2510.27364 null
2025-10-31 DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model Yucheng Xing et.al. 2510.27169 null
2025-10-31 Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Ziyu Guo et.al. 2510.26802 null
2025-10-30 AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency Piyushkumar Patel et.al. 2511.00107 null
2025-10-30 LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation Huanlin Gao et.al. 2511.00090 null
2025-10-30 SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting Dongyue Lu et.al. 2510.26796 null
2025-10-30 The Quest for Generalizable Motion Generation: Data, Model, and Evaluation Jing Lin et.al. 2510.26794 null
2025-10-30 Co-Evolving Latent Action World Models Yucen Wang et.al. 2510.26433 null
2025-10-30 LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation Xiangqing Zheng et.al. 2510.26412 null
2025-10-29 VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning Baolu Li et.al. 2510.25772 null
2025-10-29 VC4VG: Optimizing Video Captions for Text-to-Video Generation Yang Du et.al. 2510.24134 null
2025-10-28 World Simulation with Video Foundation Models for Physical AI NVIDIA et.al. 2511.00062 null
2025-10-28 VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos Qiucheng Wu et.al. 2510.24904 null
2025-10-28 Generative View Stitching Chonghyuk Song et.al. 2510.24718 null
2025-10-28 Uniform Discrete Diffusion with Metric Path for Video Generation Haoge Deng et.al. 2510.24717 null
2025-10-28 MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration Junhyuk So et.al. 2510.24211 null
2025-10-28 LongCat-Video Technical Report Meituan LongCat Team et.al. 2510.22200 null
2025-10-27 CoMo: Compositional Motion Customization for Text-to-Video Generation Youcan Xu et.al. 2510.23007 null
2025-10-27 Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method Bohan Li et.al. 2510.22973 null
2025-10-26 MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control Fatemeh Nazarieh et.al. 2510.22810 null
2025-10-25 Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration Zheng Wei et.al. 2510.22431 null
2025-10-24 Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising Mateo Clemente et.al. 2510.21991 null
2025-10-24 BachVid: Training-Free Video Generation with Consistent Background and Character Han Yan et.al. 2510.21696 null
2025-10-24 Epipolar Geometry Improves Video Generation Models Orest Kupyn et.al. 2510.21615 null
2025-10-24 OmniNWM: Omniscient Driving Navigation World Models Bohan Li et.al. 2510.18313 null
2025-10-23 Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications Shamim Yazdani et.al. 2510.21887 null
2025-10-23 Video-As-Prompt: Unified Semantic Control for Video Generation Yuxuan Bian et.al. 2510.20888 null
2025-10-23 Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers Dean L Slack et.al. 2510.20807 null
2025-10-23 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling Bingjie Gao et.al. 2510.20206 null
2025-10-23 Evaluating Video Models as Simulators of Multi-Person Pedestrian Trajectories Aaron Appelle et.al. 2510.20182 null
2025-10-23 Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning Takehiro Aoshima et.al. 2510.19193 null
2025-10-23 A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition Peiqin Zhuang et.al. 2510.18705 null
2025-10-22 Improving the Physics of Video Generation with VJEPA-2 Reward Signal Jianhao Yuan et.al. 2510.21840 null
2025-10-22 A new wave of vehicle insurance fraud fueled by generative AI Amir Hever et.al. 2510.19957 null
2025-10-22 PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis Qing Mao et.al. 2510.19527 null
2025-10-22 GigaBrain-0: A World Model-Powered Vision-Language-Action Model GigaBrain Team et.al. 2510.19430 null
2025-10-22 FeatureFool: Zero-Query Fooling of Video Models via Feature Map Duoxun Tang et.al. 2510.18362 null
2025-10-22 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models Yongshun Zhang et.al. 2510.17519 null
2025-10-22 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints Meiqi Wu et.al. 2510.14847 null
2025-10-21 MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models Aritra Bhowmik et.al. 2510.19022 null
2025-10-21 UltraGen: High-Resolution Video Generation with Hierarchical Attention Teng Hu et.al. 2510.18775 null
2025-10-21 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Weinan Jia et.al. 2510.18692 null
2025-10-21 Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model Zhenxing Zhang et.al. 2510.18573 null
2025-10-20 World-in-World: World Models in a Closed-Loop World Jiahan Zhang et.al. 2510.18135 null
2025-10-20 Demystifying Transition Matching: When and Why It Can Beat Flow Matching Jaihoon Kim et.al. 2510.17991 null
2025-10-20 From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models Zefan Cai et.al. 2510.17247 null
2025-10-20 DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Weijie Wang et.al. 2510.15264 null
2025-10-20 Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization Liao Shen et.al. 2510.14255 null
2025-10-19 An empirical study of the effect of video encoders on Temporal Video Grounding Ignacio M. De la Jara et.al. 2510.17007 null
2025-10-19 From Mannequin to Human: A Pose-Aware and Identity-Preserving Video Generation Framework for Lifelike Clothing Display Xiangyu Mu et.al. 2510.16833 null
2025-10-19 STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding Zhifei Chen et.al. 2510.14588 null
2025-10-17 VISTA: A Test-Time Self-Improving Video Generation Agent Do Xuan Long et.al. 2510.15831 null
2025-10-17 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Qingyan Bai et.al. 2510.15742 null
2025-10-17 Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning Xiangyu Meng et.al. 2510.14256 null
2025-10-17 Ctrl-VI: Controllable Video Synthesis via Variational Inference Haoyi Duan et.al. 2510.07670 null
2025-10-16 TGT: Text-Grounded Trajectories for Locally Controlled Video Generation Guofeng Zhang et.al. 2510.15104 null
2025-10-16 RealDPO: Real or Not Real, that is the Preference Guo Cheng et.al. 2510.14955 null
2025-10-16 DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation Yu Zhou et.al. 2510.14949 null
2025-10-16 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation JoungBin Lee et.al. 2510.14945 null
2025-10-16 In-Context Learning with Unpaired Clips for Instruction-based Video Editing Xinyao Liao et.al. 2510.14648 null
2025-10-16 Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures Yuancheng Xu et.al. 2510.14179 null
2025-10-15 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Sihui Ji et.al. 2510.13809 null
2025-10-15 CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas Zian Li et.al. 2510.13669 null
2025-10-15 VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator Hyojun Go et.al. 2510.13454 null
2025-10-15 Counting Hallucinations in Diffusion Models Shuai Fu et.al. 2510.13080 null
2025-10-14 SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models Zhengxu Tang et.al. 2510.13042 null
2025-10-14 MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars Felix Taubner et.al. 2510.12785 null
2025-10-14 Time-Correlated Video Bridge Matching Viacheslav Vasilev et.al. 2510.12453 null
2025-10-14 BIGFix: Bidirectional Image Generation with Token Fixing Victor Besnier et.al. 2510.12231 null
2025-10-14 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback Xingpei Ma et.al. 2510.12089 null
2025-10-13 Point Prompting: Counterfactual Tracking with Video Diffusion Models Ayush Shrivastava et.al. 2510.11715 null
2025-10-13 MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps Jiahui Lei et.al. 2510.11107 null
2025-10-13 Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization Shuo Xing et.al. 2510.08789 null
2025-10-12 AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Yu Li et.al. 2510.10670 null
2025-10-12 DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis Peiyin Chen et.al. 2510.10650 null
2025-10-11 EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection Huaizhi Qu et.al. 2510.13652 null
2025-10-11 MultiCOIN: Multi-Modal COntrollable Video INbetweening Maham Tanveer et.al. 2510.08561 null
2025-10-10 Stable Video Infinity: Infinite-Length Video Generation with Error Recycling Wuyang Li et.al. 2510.09212 null
2025-10-10 MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling Qian Wang et.al. 2508.08487 null
2025-10-09 SkipSR: Faster Super Resolution with Token Skipping Rohan Choudhury et.al. 2510.08799 null
2025-10-09 NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos Hongyu Li et.al. 2510.08568 null
2025-10-09 VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Minghong Cai et.al. 2510.08555 null
2025-10-09 X2Video: Adapting Diffusion Models for Multimodal Controllable Neural Video Rendering Zhitong Huang et.al. 2510.08530 null
2025-10-09 FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control Zhiyuan Zhang et.al. 2510.08527 null
2025-10-09 UniVideo: Unified Understanding, Generation, and Editing for Videos Cong Wei et.al. 2510.08377 null
2025-10-09 LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation Yushi Huang et.al. 2510.08318 null
2025-10-09 UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution Shian Du et.al. 2510.08143 null
2025-10-09 Real-Time Motion-Controllable Autoregressive Video Diffusion Kesen Zhao et.al. 2510.08131 null
2025-10-09 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving Tianrui Zhang et.al. 2510.07944 null
2025-10-09 TTOM: Test-Time Optimization and Memorization for Compositional Video Generation Leigang Qu et.al. 2510.07940 null
2025-10-09 Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection Yanjie Pan et.al. 2510.07654 null
2025-10-09 Paper2Video: Automatic Video Generation from Scientific Papers Zeyu Zhu et.al. 2510.05096 null
2025-10-08 TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility Saman Motamed et.al. 2510.07550 null
2025-10-08 DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis Nithin C. Babu et.al. 2510.07441 null
2025-10-08 WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation Zezhong Qian et.al. 2510.07313 null
2025-10-08 MATRIX: Mask Track Alignment for Interaction-aware Video Generation Siyoon Jin et.al. 2510.07310 null
2025-10-08 TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation Jiaben Chen et.al. 2510.07249 null
2025-10-08 MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis Yihao Zhi et.al. 2510.07190 null
2025-10-08 Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report Riccardo Mereu et.al. 2510.07092 null
2025-10-08 Addressing the ID-Matching Challenge in Long Video Captioning Zhantao Yang et.al. 2510.06973 null
2025-10-07 Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models Jiahao Wang et.al. 2510.06209 null
2025-10-07 When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach Daniel Gonzálbez-Biosca et.al. 2510.05661 null
2025-10-06 LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation Yang Xiao et.al. 2510.05367 null
2025-10-06 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation Ziqi Huang et.al. 2510.05094 null
2025-10-06 Character Mixing for Video Generation Tingting Liao et.al. 2510.05093 null
2025-10-06 Bridging Text and Video Generation: A Survey Nilay Kumar et.al. 2510.04999 null
2025-10-06 What Drives Compositional Generalization in Visual Generative Models? Karim Farid et.al. 2510.03075 null
2025-10-05 ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Jay Zhangjie Wu et.al. 2510.04290 null
2025-10-05 Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers Shikang Zheng et.al. 2510.04188 null
2025-10-04 Generating Human Motion Videos using a Cascaded Text-to-Video Framework Hyelin Nam et.al. 2510.03909 null
2025-10-03 Mask2IV: Interaction-Centric Video Generation via Mask Trajectories Gen Li et.al. 2510.03135 null
2025-10-03 Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction Kaisi Guan et.al. 2510.03117 null
2025-10-03 When and Where do Events Switch in Multi-Event Video Generation? Ruotong Liao et.al. 2510.03049 null
2025-10-03 Pack and Force Your Memory: Long-form and Consistent Video Generation Xiaofei Wu et.al. 2510.01784 null
2025-10-02 Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation Beijia Lu et.al. 2510.02617 null
2025-10-02 How Confident are Video Models? Empowering Video Models to Express their Uncertainty Zhiting Mei et.al. 2510.02571 null
2025-10-02 Inferring Dynamic Physical Properties from Video Foundation Models Guanqi Zhan et.al. 2510.02311 null
2025-10-02 MultiModal Action Conditioned Video Generation Yichen Li et.al. 2510.02287 null
2025-10-02 Learning to Generate Object Interactions with Physics-Guided Video Diffusion David Romero et.al. 2510.02284 null
2025-10-02 Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Justin Cui et.al. 2510.02283 null
2025-10-02 TempoControl: Temporal Attention Guidance for Text-to-Video Models Shira Schiber et.al. 2510.02226 null
2025-10-02 Multi-marginal temporal Schrödinger Bridge Matching for video generation from unpaired data Thomas Gravier et.al. 2510.01894 null
2025-10-01 IMAGEdit: Let Any Subject Transform Fei Shen et.al. 2510.01186 null
2025-10-01 EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory Jiahao Wang et.al. 2510.01183 null
2025-10-01 Code2Video: A Code-centric Paradigm for Educational Video Generation Yanzhe Chen et.al. 2510.01174 null
2025-10-01 From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation Fan Yang et.al. 2510.00806 null
2025-10-01 Arbitrary Generative Video Interpolation Guozhen Zhang et.al. 2510.00578 null
2025-10-01 BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Zhaoyang Li et.al. 2510.00438 null
2025-09-30 Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Chetwin Low et.al. 2510.01284 null
2025-09-30 Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation Agneet Chatterjee et.al. 2509.26555 null
2025-09-30 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation Chenhui Zhu et.al. 2509.26391 null
2025-09-30 PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution Shian Du et.al. 2509.26025 null
2025-09-30 Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel Haotian Dong et.al. 2509.24979 null
2025-09-30 QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification Weilun Feng et.al. 2509.23681 null
2025-09-29 FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation Yunyang Ge et.al. 2509.25187 null
2025-09-29 DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Junyu Chen et.al. 2509.25182 null
2025-09-29 Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Kunhao Liu et.al. 2509.25161 null
2025-09-29 PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion Yuyang Yin et.al. 2509.24997 null
2025-09-29 SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation Shuang Liang et.al. 2509.24980 null
2025-09-29 Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer Mohsen Ghafoorian et.al. 2509.24899 null
2025-09-29 Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility Yutong Hao et.al. 2509.24702 null
2025-09-29 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Junsong Chen et.al. 2509.24695 null
2025-09-29 Learning Object-Centric Representations Based on Slots in Real World Scenarios Adil Kaan Akan et.al. 2509.24652 null
2025-09-29 UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark Ailing Zhang et.al. 2509.24427 null
2025-09-29 CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers Kai Liu et.al. 2509.24416 null
2025-09-29 NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis Yixuan Ren et.al. 2509.24353 null
2025-09-29 FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation Seungwook Kim et.al. 2509.24241 null
2025-09-28 Autoregressive Video Generation beyond Next Frames Prediction Sucheng Ren et.al. 2509.24081 null
2025-09-28 SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Jintao Zhang et.al. 2509.24006 null
2025-09-28 VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement Shulian Zhang et.al. 2509.23584 null
2025-09-27 Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing Rohit Chowdhury et.al. 2509.23279 null
2025-09-27 Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction Bolin Chen et.al. 2509.23169 null
2025-09-26 Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery Jiayin Liu et.al. 2509.23003 null
2025-09-26 VideoScore2: Think before You Score in Generative Video Evaluation Xuan He et.al. 2509.22799 null
2025-09-26 Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs Xingyu Fu et.al. 2509.22646 null
2025-09-26 LongLive: Real-time Interactive Long Video Generation Shuai Yang et.al. 2509.22622 null
2025-09-26 EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation Yuan Xu et.al. 2509.22578 null
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Zhehao Dong et.al. 2509.22407 null
2025-09-26 Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers Jibin Song et.al. 2509.21893 null
2025-09-26 DiTraj: training-free trajectory control for video diffusion transformer Cheng Lei et.al. 2509.21839 null
2025-09-26 MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation Yu Shang et.al. 2509.21797 null
2025-09-26 LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE Yu Shang et.al. 2509.21790 null
2025-09-26 UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Lan Chen et.al. 2509.21760 null
2025-09-25 FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction Yixiang Dai et.al. 2509.21657 null
2025-09-25 What Happens Next? Anticipating Future Motion by Generating Point Trajectories Gabrijel Boduljak et.al. 2509.21592 null
2025-09-25 ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering Weikai Lin et.al. 2509.21541 null
2025-09-25 NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics Yu Yuan et.al. 2509.21309 null
2025-09-25 MotionFlow: Learning Implicit Motion Flow for Complex Camera Trajectory Control in Video Generation Guojun Lei et.al. 2509.21119 null
2025-09-25 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju et.al. 2509.20360 null
2025-09-24 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation Chen Wang et.al. 2509.20358 null
2025-09-24 4D Driving Scene Generation With Stereo Forcing Hao Lu et.al. 2509.20251 null
2025-09-24 CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion Chenhao Ji et.al. 2509.19979 null
2025-09-24 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Yang Zhou et.al. 2509.12201 null
2025-09-23 Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters Pin-Yen Chiu et.al. 2509.18831 null
2025-09-22 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models Geonung Kim et.al. 2509.17985 null
2025-09-22 I2VWM: Robust Watermarking for Image to Video Generation Guanjie Wang et.al. 2509.17773 null
2025-09-21 Echo-Path: Pathology-Conditioned Echo Video Generation Kabir Hamzah Muhammad et.al. 2509.17190 null
2025-09-21 VidCLearn: A Continual Learning Approach for Text-to-Video Generation Luca Zanchetta et.al. 2509.16956 null
2025-09-21 $\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation Yuanzhi Li et.al. 2509.16873 null
2025-09-20 RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation Tianyi Yan et.al. 2509.16500 null
2025-09-19 Lynx: Towards High-Fidelity Personalized Video Generation Shen Sang et.al. 2509.15496 null
2025-09-19 AToken: A Unified Tokenizer for Vision Jiasen Lu et.al. 2509.14476 null
2025-09-18 OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data Björn Möller et.al. 2509.15479 null
2025-09-18 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Yuming Jiang et.al. 2509.15212 null
2025-09-18 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance Chenxi Song et.al. 2509.15130 null
2025-09-18 DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images Kazuma Nagata et.al. 2509.14685 null
2025-09-18 BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching Hanshuai Cui et.al. 2509.13789 null
2025-09-17 PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models Artem Lykov et.al. 2509.13903 null
2025-09-17 TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving Jiawei Wang et.al. 2509.13164 null
2025-09-17 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Yikang Ding et.al. 2509.09595 null
2025-09-16 Gen2Real: Towards Demo-Free Dexterous Manipulation by Harnessing Generated Video Kai Ye et.al. 2509.14178 null
2025-09-16 BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models Yuming Li et.al. 2509.06040 null
2025-09-15 AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective Yuchen Deng et.al. 2509.12052 null
2025-09-15 SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching Jiacheng Liu et.al. 2509.11628 null
2025-09-15 MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment Yanyun Pu et.al. 2509.11589 null
2025-09-14 VideoAgent: Personalized Synthesis of Scientific Videos Xiao Liang et.al. 2509.11253 null
2025-09-14 PanoLora: Bridging Perspective and Panoramic Video Generation with LoRA Adaptation Zeyu Dong et.al. 2509.11092 null
2025-09-12 Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation Hao Zhang et.al. 2509.10687 null
2025-09-12 T2Bs: Text-to-Character Blendshapes via Video Generation Jiahao Luo et.al. 2509.10678 null
2025-09-12 Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching Zhixin Zheng et.al. 2509.10312 null
2025-09-11 Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders Dohun Lee et.al. 2509.09547 null
2025-09-11 Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training Ruicheng Zhang et.al. 2509.06723 null
2025-09-10 RewardDance: Reward Scaling in Visual Generation Jie Wu et.al. 2509.08826 null
2025-09-10 GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts Jenna Kang et.al. 2509.08818 null
2025-09-10 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Liyang Chen et.al. 2509.08519 null
2025-09-09 ANYPORTAL: Zero-Shot Consistent Video Background Replacement Wenshuo Gao et.al. 2509.07472 null
2025-09-09 Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching Feng Wang et.al. 2509.05952 null
2025-09-09 Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts Adam Cole et.al. 2509.05323 null
2025-09-07 UniVerse-1: Unified Audio-Video Generation via Stitching of Experts Duomin Wang et.al. 2509.06155 null
2025-09-04 Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview Jun-Kun Chen et.al. 2509.04450 null
2025-09-04 Human Motion Video Generation: A Survey Haiwei Xue et.al. 2509.03883 null
2025-09-03 CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation Zixin Zhu et.al. 2509.01028 null
2025-09-01 Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement Jiayi Gao et.al. 2509.01362 null
2025-09-01 Communicative Agents for Slideshow Storytelling Video Generation based on LLMs Jingxing Fan et.al. 2509.01277 null
2025-09-01 FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework Lingzhou Mu et.al. 2509.01232 null
2025-08-30 DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective Yushuo Chen et.al. 2509.00403 null
2025-08-28 Mixture of Contexts for Long Video Generation Shengqu Cai et.al. 2508.21058 null
2025-08-28 POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models Jiaxiang Cheng et.al. 2508.21019 null
2025-08-28 Learning Primitive Embodied World Models: Towards Scalable Robotic Learning Qiao Sun et.al. 2508.20840 null
2025-08-28 Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation Jiusi Li et.al. 2508.20471 null
2025-08-28 Ego-centric Predictive Model Conditioned on Hand Trajectories Binjie Zhang et.al. 2508.19852 null
2025-08-28 MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation Ming Chen et.al. 2508.19320 null
2025-08-27 ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion Xurui Peng et.al. 2508.21091 null
2025-08-26 ROSE: Remove Objects with Side Effects in Videos Chenxuan Miao et.al. 2508.18633 null
2025-08-26 Wan-S2V: Audio-Driven Cinematic Video Generation Xin Gao et.al. 2508.18621 null
2025-08-26 Waver: Wave Your Way to Lifelike Video Generation Yifu Zhang et.al. 2508.15761 null
2025-08-25 SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling Fanjiang Ye et.al. 2508.17756 null
2025-08-25 OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models Huanpeng Chu et.al. 2508.16212 null
2025-08-24 A Synthetic Dataset for Manometry Recognition in Robotic Applications Pedro Antonio Rabelo Saraiva et.al. 2508.17468 null
2025-08-24 MoCo: Motion-Consistent Human Video Generation via Structure-Appearance Decoupling Haoyu Wang et.al. 2508.17404 null
2025-08-24 DiCache: Let Diffusion Model Determine Its Own Cache Jiazi Bu et.al. 2508.17356 null
2025-08-23 SSG-Dit: A Spatial Signal Guided Framework for Controllable Video Generation Peng Hu et.al. 2508.17062 null
2025-08-23 HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching Liang Feng et.al. 2508.16984 null
2025-08-23 HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation Sizhe Shan et.al. 2508.16930 null
2025-08-22 Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation Chun-Peng Chang et.al. 2508.16512 null
2025-08-22 Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers Shikang Zheng et.al. 2508.16211 null
2025-08-21 Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning Yijun Liu et.al. 2508.15874 null
2025-08-21 CineScale: Free Lunch in High-Resolution Cinematic Visual Generation Haonan Qiu et.al. 2508.15774 null
2025-08-21 Scaling Group Inference for Diverse and High-Quality Generation Gaurav Parmar et.al. 2508.15773 null
2025-08-21 WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception Zhiheng Liu et.al. 2508.15720 null
2025-08-21 TiP4GEN: Text to Immersive Panorama 4D Scene Generation Ke Xing et.al. 2508.12415 null
2025-08-20 DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing Weitao Wang et.al. 2508.14465 null
2025-08-20 MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation Guile Wu et.al. 2508.14327 null
2025-08-19 xDiff: Online Diffusion Model for Collaborative Inter-Cell Interference Management in 5G O-RAN Peihao Yan et.al. 2508.15843 null
2025-08-19 InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing Shaoshu Yang et.al. 2508.14033 null
2025-08-19 Physics-Based 3D Simulation for Synthetic Data Generation and Failure Analysis in Packaging Stability Assessment Samuel Seligardi et.al. 2508.13989 null
2025-08-18 4DNeX: Feed-Forward 4D Generative Modeling Made Easy Zhaoxi Chen et.al. 2508.13154 null
2025-08-18 Precise Action-to-Video Generation Through Visual Action Prompts Yuang Wang et.al. 2508.13104 null
2025-08-18 EgoTwin: Dreaming Body and View in First Person Jingqiao Xiu et.al. 2508.13013 null
2025-08-18 Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Xianglong He et.al. 2508.13009 null
2025-08-18 Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation Qirui Li et.al. 2508.12969 null
2025-08-18 Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models Jianshu Zeng et.al. 2508.12945 null
2025-08-18 S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models Chubin Chen et.al. 2508.12880 null
2025-08-18 E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model Ronghao Lin et.al. 2508.12854 null
2025-08-18 MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration Yuanxin Wei et.al. 2508.12691 null
2025-08-15 CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models Xiaoxue Wu et.al. 2508.11484 null
2025-08-15 Preacher: Paper-to-Video Agentic System Jingwei Liu et.al. 2508.09632 null
2025-08-14 GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning Kelin Yu et.al. 2508.11049 null
2025-08-14 EVCtrl: Efficient Control Adapter for Visual Generation Zixiang Yang et.al. 2508.10963 null
2025-08-14 Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation Harold Haodong Chen et.al. 2508.10858 null
2025-08-14 Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation Youping Gu et.al. 2508.10774 null
2025-08-14 AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences Jieyu Li et.al. 2508.10771 null
2025-08-14 HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis Shiyu Liu et.al. 2508.10566 null
2025-08-14 From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts Yuji Wang et.al. 2508.09476 null
2025-08-14 Yan: Foundational Interactive Video Generation Deheng Ye et.al. 2508.08601 null
2025-08-13 Physical Autoregressive Model for Robotic Manipulation without Action Pretraining Zijian Song et.al. 2508.09822 null
2025-08-12 X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents Guoxian Song et.al. 2508.09383 null
2025-08-12 Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices Ya Zou et.al. 2508.09136 null
2025-08-12 TaoCache: Structure-Maintained Video Generation Acceleration Zhentao Fan et.al. 2508.08978 null
2025-08-12 Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos Qi Zheng et.al. 2508.08700 null
2025-08-12 RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space Jingyun Liang et.al. 2508.08588 null
2025-08-12 S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix Peng Dai et.al. 2508.08048 null
2025-08-12 Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation Fangyuan Mao et.al. 2508.07981 null
2025-08-12 Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation Bowen Xue et.al. 2508.07901 null
2025-08-11 VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip Wenqi Guo et.al. 2508.10931 null
2025-08-11 StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation Shuyuan Tu et.al. 2508.08248 null
2025-08-11 Matrix-3D: Omnidirectional Explorable 3D World Generation Zhongqi Yang et.al. 2508.08086 null
2025-08-11 Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation Xiaoyan Liu et.al. 2508.07769 null
2025-08-11 ShoulderShot: Generating Over-the-Shoulder Dialogue Videos Yuang Zhang et.al. 2508.07597 null
2025-08-08 Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video Jixuan He et.al. 2508.06715 null
2025-08-08 SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment Yanxiao Sun et.al. 2508.06082 null
2025-08-08 DreamVE: Unified Instruction-based Image and Video Editing Bin Xia et.al. 2508.06080 null
2025-08-07 Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Yue Liao et.al. 2508.05635 null
2025-08-07 B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding Changho Choi et.al. 2508.05269 null
2025-08-07 PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation Jingxuan He et.al. 2508.05091 null
2025-08-07 S$^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation Weilun Feng et.al. 2508.04016 null
2025-08-06 MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning Quang-Trung Truong et.al. 2508.04549 null
2025-08-06 LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation Kangrui Cen et.al. 2508.04228 null
2025-08-06 Motion is the Choreographer: Learning Latent Pose Dynamics for Seamless Sign Language Generation Jiayi He et.al. 2508.04049 null
2025-08-06 Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation Xunzhi Xiang et.al. 2508.03334 null
2025-08-05 Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm Lin Zhang et.al. 2508.03955 null
2025-08-05 LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Jianxiong Gao et.al. 2508.03694 null
2025-08-05 RAAG: Ratio Aware Adaptive Guidance Shangwen Zhu et.al. 2508.03442 null
2025-08-05 V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models Jisoo Kim et.al. 2508.03254 null
2025-08-05 Multi-human Interactive Talking Dataset Zeyu Zhu et.al. 2508.03050 null
2025-08-05 MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention Qi Xie et.al. 2508.03034 null
2025-08-05 D3: Training-Free AI-Generated Video Detection Using Second-Order Features Chende Zheng et.al. 2508.00701 null
2025-08-04 X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio Chenxu Zhang et.al. 2508.02944 null
2025-08-04 DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework Tongchun Zuo et.al. 2508.02807 null
2025-08-04 QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots Sheng Wu et.al. 2508.02512 null
2025-08-04 PoseGuard: Pose-Guided Generation with Safety Guardrails Kongxin Wang et.al. 2508.02476 null
2025-08-04 Talking Surveys: How Photorealistic Embodied Conversational Agents Shape Response Quality, Engagement, and Satisfaction Matus Krajcovic et.al. 2508.02376 null
2025-08-03 Versatile Transition Generation with Image-to-Video Diffusion Zuhao Yang et.al. 2508.01698 null
2025-08-01 Video Generators are Robot Policies Junbang Liang et.al. 2508.00795 null
2025-08-01 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation Kien T. Pham et.al. 2508.00782 null
2025-08-01 Video Forgery Detection with Optical Flow Residuals and Spatial-Temporal Consistency Xi Xue et.al. 2508.00397 null
2025-08-01 GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection Suhang Cai et.al. 2508.00312 null
2025-08-01 Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence Danzhen Fu et.al. 2508.00299 null
2025-08-01 HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly Chang Liu et.al. 2507.19924 null
2025-07-31 World Consistency Score: A Unified Metric for Video Generation Quality Akshat Rakheja et.al. 2508.00144 null
2025-07-30 GVD: Guiding Video Diffusion Model for Scalable Video Distillation Kunyang Li et.al. 2507.22360 null
2025-07-29 JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 Xinhan Di et.al. 2507.20987 null
2025-07-28 Compositional Video Synthesis by Temporal Object-Centric Learning Adil Kaan Akan et.al. 2507.20855 null
2025-07-27 MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation Shuolin Xu et.al. 2507.20368 null
2025-07-26 ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion Xuanchen Wang et.al. 2507.19836 null
2025-07-25 ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment Chong Xia et.al. 2507.19058 null
2025-07-24 Captain Cinema: Towards Short Movie Generation Junfei Xiao et.al. 2507.18634 null
2025-07-24 Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis Yanzuo Lu et.al. 2507.18569 null
2025-07-24 Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows Simin Huo et.al. 2507.18405 null
2025-07-24 T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation Yubin Chen et.al. 2507.18107 null
2025-07-24 Enhancing Scene Transition Awareness in Video Generation via Post-Training Hanwen Shen et.al. 2507.18046 null
2025-07-24 Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics Yuezun Li et.al. 2507.18015 null
2025-07-24 Controllable Video Generation: A Survey Yue Ma et.al. 2507.16869 null
2025-07-23 Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA Rameen Abdal et.al. 2507.17963 null
2025-07-23 Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation Jaechul Roh et.al. 2507.17937 null
2025-07-23 Yume: An Interactive World Generation Model Xiaofeng Mao et.al. 2507.17744 null
2025-07-23 EndoGen: Conditional Autoregressive Endoscopic Video Generation Xinyu Liu et.al. 2507.17388 null
2025-07-22 Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching Haiyang Liu et.al. 2507.18649 null
2025-07-22 MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation Yanchen Liu et.al. 2507.16310 null
2025-07-22 PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation Yaofang Liu et.al. 2507.16116 null
2025-07-21 Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models Enes Sanli et.al. 2507.15824 null
2025-07-21 TokensGen: Harnessing Condensed Tokens for Long Video Generation Wenqi Ouyang et.al. 2507.15728 null
2025-07-21 Conditional Video Generation for High-Efficiency Video Compression Fangqiu Yi et.al. 2507.15269 null
2025-07-19 BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM Haiquan Wen et.al. 2507.14632 null
2025-07-19 Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey Jiahui Zhang et.al. 2507.14501 null
2025-07-18 Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis Tongtong Su et.al. 2507.13753 null
2025-07-17 $\nabla$NABLA: Neighborhood Adaptive Block-Level Attention Dmitrii Mikhailov et.al. 2507.13546 null
2025-07-17 "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Jing Gu et.al. 2507.13428 null
2025-07-17 Taming Diffusion Transformer for Real-Time Mobile Video Generation Yushu Wu et.al. 2507.13343 null
2025-07-17 Leveraging Pre-Trained Visual Models for AI-Generated Video Detection Keerthi Veeramachaneni et.al. 2507.13224 null
2025-07-17 LoViC: Efficient Long Video Generation with Context Compression Jiaxiu Jiang et.al. 2507.12952 null
2025-07-17 World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving Yanchen Guan et.al. 2507.12762 null
2025-07-16 EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models Jiajian Xie et.al. 2507.11980 null
2025-07-15 NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models X. Feng et.al. 2507.11245 null
2025-07-14 Flows and Diffusions on the Neural Manifold Daniel Saragih et.al. 2507.10623 null
2025-07-14 M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation Kui Jiang et.al. 2507.08307 null
2025-07-14 Democratizing High-Fidelity Co-Speech Gesture Video Generation Xu Yang et.al. 2507.06812 null
2025-07-12 $I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting Zhimin Liao et.al. 2507.09144 null
2025-07-11 Taming generative video models for zero-shot optical flow extraction Seungwoo Kim et.al. 2507.09082 null
2025-07-11 Detecting Deepfake Talking Heads from Facial Biometric Anomalies Justin D. Norman et.al. 2507.08917 null
2025-07-11 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Hangjie Yuan et.al. 2507.08801 null
2025-07-11 Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Wongi Jeong et.al. 2507.08422 null
2025-07-11 T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates Zhitao Wang et.al. 2507.07633 null
2025-07-10 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Haoyu Wu et.al. 2507.07982 null
2025-07-10 Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions Longfei Li et.al. 2507.07978 null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966 null
2025-07-09 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Mohamed Elmoghany et.al. 2507.07202 null
2025-07-09 Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation Tao Feng et.al. 2507.06830 null
2025-07-09 PromptTea: Let Prompts Tell TeaCache the Optimal Threshold Zishen Huang et.al. 2507.06739 null
2025-07-09 Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis Hao Tang et.al. 2507.06689 null
2025-07-09 FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation Liqiang Jing et.al. 2507.06523 null
2025-07-09 Omni-Video: Democratizing Unified Video Understanding and Generation Zhiyu Tan et.al. 2507.06119 null
2025-07-09 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Zhenghao Zhang et.al. 2507.05963 null
2025-07-09 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Nan Chen et.al. 2507.01945 null
2025-07-08 Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions Jaewan Park et.al. 2507.06133 null
2025-07-08 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos Rongsheng Wang et.al. 2507.05675 null
2025-07-08 StreamDiT: Real-Time Streaming Text-to-Video Generation Akio Kodaira et.al. 2507.03745 null
2025-07-07 HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding Yuxuan Cai et.al. 2507.04909 null
2025-07-07 Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning Jiayun Hu et.al. 2507.04758 null
2025-07-07 Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations Yuji Wang et.al. 2507.04705 null
2025-07-06 MambaVideo for Discrete Video Tokenization with Channel-Split Quantization Dawit Mureja Argaw et.al. 2507.04559 null
2025-07-06 CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning Fatmaelzahraa Ali Ahmed et.al. 2507.04317 null
2025-07-05 PresentAgent: Multimodal Agent for Presentation Video Generation Jingwei Shi et.al. 2507.04036 null
2025-07-05 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation Rang Meng et.al. 2507.03905 null
2025-07-03 RefTok: Reference-Based Tokenization for Video Generation Xiang Fan et.al. 2507.02862 null
2025-07-03 Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching Xin Zhou et.al. 2507.02860 null
2025-07-03 AnyI2V: Animating Any Conditional Image with Motion Control Ziye Li et.al. 2507.02857 null
2025-07-03 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation François Rozet et.al. 2507.02608 null
2025-07-03 RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment Jianing Jin et.al. 2506.23852 null
2025-07-02 SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations Zhican Wang et.al. 2507.01309 null
2025-07-02 LLM-based Realistic Safety-Critical Driving Video Generation Yongjie Fu et.al. 2507.01264 null
2025-07-02 AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation Xiao Liu et.al. 2507.01255 null
2025-07-01 Geometry-aware 4D Video Generation for Robot Manipulation Zeyi Liu et.al. 2507.01099 null
2025-07-01 Populate-A-Scene: Affordance-Aware Human Video Generation Mengyi Shan et.al. 2507.00334 null
2025-07-01 Listener-Rewarded Thinking in VLMs for Image Preferences Alexander Gambashidze et.al. 2506.22832 null
2025-06-30 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion Yu Lu et.al. 2507.00162 null
2025-06-30 Epona: Autoregressive Diffusion World Model for Autonomous Driving Kaiwen Zhang et.al. 2506.24113 null
2025-06-30 VMoBA: Mixture-of-Block Attention for Video Diffusion Models Jianzong Wu et.al. 2506.23858 null
2025-06-30 SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation Shuai Tan et.al. 2506.23690 null
2025-06-30 ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models Zixun Fang et.al. 2506.23513 null
2025-06-29 Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis Lei-lei Li et.al. 2506.23263 null
2025-06-29 RoboScape: Physics-informed Embodied World Model Yu Shang et.al. 2506.23135 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432 null
2025-06-27 RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation Liudi Yang et.al. 2506.22007 null
2025-06-27 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Hongbo Liu et.al. 2506.21356 null
2025-06-27 DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing Lingling Cai et.al. 2506.20967 null
2025-06-26 SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture Kehan Sui et.al. 2506.21478 null
2025-06-26 HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation Diego Biagini et.al. 2506.21287 null
2025-06-26 Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Cheng Zou et.al. 2506.21270 null
2025-06-26 Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models Donggoo Kang et.al. 2506.20946 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601 null
2025-06-25 BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos Jiahao Lin et.al. 2506.20103 null
2025-06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li et.al. 2506.19852 null
2025-06-24 GenHSI: Controllable Generation of Human-Scene Interaction Videos Zekun Li et.al. 2506.19840 null
2025-06-24 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Liangbin Xie et.al. 2506.19838 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-24 Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation Jintao Rong et.al. 2506.19348 null
2025-06-23 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Runjia Li et.al. 2506.18903 null
2025-06-23 From Virtual Games to Real-World Play Wenqiang Sun et.al. 2506.18901 null
2025-06-23 FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation Kaiyi Huang et.al. 2506.18899 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-23 Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset Zhuowei Chen et.al. 2506.18851 null
2025-06-23 Matrix-Game: Interactive World Foundation Model Yifan Zhang et.al. 2506.18701 null
2025-06-23 RDPO: Real Data Preference Optimization for Physics Consistency Video Generation Wenxu Qian et.al. 2506.18655 null
2025-06-23 BulletGen: Improving 4D Reconstruction with Bullet-Time Generation Denys Rozumnyi et.al. 2506.18601 null
2025-06-23 VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning Xuanyu Zhang et.al. 2506.18564 null
2025-06-23 Emergent Temporal Correspondences from Video Diffusion Transformers Jisu Nam et.al. 2506.17220 link
2025-06-21 STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation Jiamin Wang et.al. 2506.13138 null
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201 null
2025-06-20 Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation Riccardo Corvi et.al. 2506.16802 null
2025-06-20 Sekai: A Video Dataset towards World Exploration Zhen Li et.al. 2506.15675 null
2025-06-20 Show-o2: Improved Native Unified Multimodal Models Jinheng Xie et.al. 2506.15564 link
2025-06-19 VideoGAN-based Trajectory Proposal for Automated Vehicles Annajoyce Mariani et.al. 2506.16209 link
2025-06-19 FastInit: Fast Noise Initialization for Temporally Consistent Video Generation Chengyu Bai et.al. 2506.16119 null
2025-06-19 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Tianchen Zhao et.al. 2506.16054 null
2025-06-19 Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization Cong Wang et.al. 2506.15980 link
2025-06-18 VideoMAR: Autoregressive Video Generation with Continuous Tokens Hu Yu et.al. 2506.14168 null
2025-06-18 Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Xuanchi Ren et.al. 2506.09042 link
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-17 CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation Jia-Chen Zhang et.al. 2506.14206 null
2025-06-16 EchoShot: Multi-Shot Portrait Video Generation Jiahao Wang et.al. 2506.15838 null
2025-06-16 UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions Zhucun Xue et.al. 2506.13691 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-13 SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation Xu Wang et.al. 2506.11621 null
2025-06-13 Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models Sridhar S et.al. 2506.10005 null
2025-06-12 GenWorld: Towards Detecting AI-generated Real-world Simulation Videos Weiliang Chen et.al. 2506.10975 null
2025-06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation Jiancheng Huang et.al. 2506.10915 null
2025-06-12 GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning Xiaoyi Bao et.al. 2506.10639 null
2025-06-12 DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers Lizhen Wang et.al. 2506.10568 null
2025-06-12 AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation Haoyuan Shi et.al. 2506.10540 null
2025-06-11 AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation Chao Liang et.al. 2506.11144 null
2025-06-11 PlayerOne: Egocentric World Simulator Yuanpeng Tu et.al. 2506.09995 null
2025-06-11 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Zhenzhi Wang et.al. 2506.09984 null
2025-06-11 ReSim: Reliable World Simulation for Autonomous Driving Jiazhi Yang et.al. 2506.09981 null
2025-06-11 DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning Dongxu Liu et.al. 2506.09644 null
2025-06-11 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Shanchuan Lin et.al. 2506.09350 null
2025-06-10 Seedance 1.0: Exploring the Boundaries of Video Generation Models Yu Gao et.al. 2506.09113 null
2025-06-10 FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Zheqi He et.al. 2506.09081 link
2025-06-10 VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks Xinlong Chen et.al. 2506.09079 null
2025-06-10 MagCache: Fast Video Generation with Magnitude-Aware Cache Zehong Ma et.al. 2506.09045 link
2025-06-10 Product of Experts for Visual Generation Yunzhi Zhang et.al. 2506.08894 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-10 How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models Huixuan Zhang et.al. 2506.08351 null
2025-06-10 From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models Pablo Acuaviva et.al. 2506.07280 null
2025-06-09 Seeing Voices: Generating A-Roll Video from Audio with Mirage Aditi Sundararaman et.al. 2506.08279 null
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009 null
2025-06-09 Dreamland: Controllable World Creation with Simulator and Generative Models Sicheng Mo et.al. 2506.08006 null
2025-06-09 Audio-Sync Video Generation with Multi-Stream Temporal Control Shuchen Weng et.al. 2506.08003 null
2025-06-09 Generative Modeling of Weights: Generalization or Memorization? Boya Zeng et.al. 2506.07998 link
2025-06-09 Video Unlearning via Low-Rank Refusal Vector Simone Facchiano et.al. 2506.07891 null
2025-06-09 EgoM2P: Egocentric Multimodal Multitask Pretraining Gen Li et.al. 2506.07886 null
2025-06-09 PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Teng Hu et.al. 2506.07848 null
2025-06-09 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-09 Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Boris Martirosyan et.al. 2506.07706 null
2025-06-09 Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers Haosong Liu et.al. 2506.05096 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-08 Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Sangwon Jang et.al. 2506.07177 null
2025-06-08 Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion Huaize Liu et.al. 2506.07136 null
2025-06-07 Self-Adapting Improvement Loops for Robotic Learning Calvin Luo et.al. 2506.06658 null
2025-06-06 Restereo: Diffusion stereo video generation and restoration Xingchang Huang et.al. 2506.06023 null
2025-06-06 LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models Haojie Yu et.al. 2506.05806 null
2025-06-06 FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion Akide Liu et.al. 2506.04648 null
2025-06-05 EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh Tao Hu et.al. 2506.05554 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 null
2025-06-05 FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation Huihan Wang et.al. 2506.04956 link
2025-06-05 DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation Shuo Cao et.al. 2506.04830 null
2025-06-05 Follow-Your-Creation: Empowering 4D Creation through Video Inpainting Yue Ma et.al. 2506.04590 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-05 SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios Lingwei Dang et.al. 2506.02444 link
2025-06-04 LayerFlow: A Unified Model for Layer-aware Video Generation Sihui Ji et.al. 2506.04228 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-04 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Ziyi Wu et.al. 2506.03517 null
2025-06-03 Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas Austin Silveria et.al. 2506.03275 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150 null
2025-06-03 Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval Jiwen Yu et.al. 2506.03141 null
2025-06-03 CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Yawen Luo et.al. 2506.03140 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 null
2025-06-03 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Zhengyao Lv et.al. 2506.03123 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 null
2025-06-03 SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis Ssharvien Kumar Sivakumar et.al. 2506.03082 null
2025-06-03 ORV: 4D Occupancy-centric Robot Video Generation Xiuyu Yang et.al. 2506.03079 link
2025-06-03 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Pengtao Chen et.al. 2506.03065 null
2025-06-03 LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering Xiaoyi Feng et.al. 2506.02733 null
2025-06-03 LumosFlow: Motion-Guided Long Video Generation Jiahao Chen et.al. 2506.02497 null
2025-06-02 Motion aware video generative model Bowen Xue et.al. 2506.02244 null
2025-06-02 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Xiao Fu et.al. 2506.01943 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-02 Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks Tao Yang et.al. 2506.01758 null
2025-06-02 Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents Shuting Wang et.al. 2506.01689 null
2025-06-02 LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model Xiaodong Wang et.al. 2506.01546 null
2025-06-02 Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark Shuyu Yang et.al. 2506.01466 null
2025-06-02 DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion Geunmin Hwang et.al. 2506.01454 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-30 DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds Jiaxu Zhang et.al. 2505.24733 null
2025-05-30 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation Yang-Tian Sun et.al. 2505.24521 null
2025-05-30 Interactive Video Generation via Domain Adaptation Ishaan Rawal et.al. 2505.24253 null
2025-05-30 STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models Zheng Tan et.al. 2505.24210 link
2025-05-29 MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng et.al. 2505.23742 link
2025-05-29 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Tingyu Song et.al. 2505.23693 link
2025-05-29 VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models Xiangdong Zhang et.al. 2505.23656 link
2025-05-29 VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation Shi-Xue Zhang et.al. 2505.23484 link
2025-05-29 Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis Hengyuan Cao et.al. 2505.23325 null
2025-05-29 RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Liu Liu et.al. 2505.23171 null
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-29 GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion Gwanghyun Kim et.al. 2505.23085 null
2025-05-29 MOVi: Training-free Text-conditioned Multi-Object Video Generation Aimon Rahman et.al. 2505.22980 null
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 link
2025-05-29 Minute-Long Videos with Dual Parallelisms Zeqing Wang et.al. 2505.21070 link
2025-05-28 ATI: Any Trajectory Instruction for Controllable Video Generation Angtian Wang et.al. 2505.22944 null
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers Weilun Feng et.al. 2505.22167 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 LatentMove: Towards Complex Human Movement Video Generation Ashkan Taghipour et.al. 2505.22046 null
2025-05-28 PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms Yifei Xia et.al. 2505.22016 null
2025-05-28 Learning World Models for Interactive Video Generation Taiye Chen et.al. 2505.21996 null
2025-05-28 SageAttention2++: A More Efficient Implementation of SageAttention2 Jintao Zhang et.al. 2505.21136 link
2025-05-28 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Shenghai Yuan et.al. 2505.20292 link
2025-05-27 HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation Bowen Chen et.al. 2505.21831 null
2025-05-27 Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation Ke Zhang et.al. 2505.21653 null
2025-05-27 VideoMarkBench: Benchmarking Robustness of Video Watermarking Zhengyuan Jiang et.al. 2505.21620 link
2025-05-27 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation Boyang Wang et.al. 2505.21491 null
2025-05-27 Dynamic Vision from EEG Brain Recordings: How much does EEG know? Prajwal Singh et.al. 2505.21385 null
2025-05-27 RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy Aiyue Chen et.al. 2505.21036 null
2025-05-27 Frame-Level Captions for Long Video Generation with Complex Multi Scenes Guangcong Zheng et.al. 2505.20827 null
2025-05-27 Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt Xiang Zhu et.al. 2505.20795 null
2025-05-27 Photography Perspective Composition: Towards Aesthetic Perspective Recommendation Lujian Yao et.al. 2505.20655 null
2025-05-27 Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training Bolin Lai et.al. 2505.20629 null
2025-05-27 Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM Peng Liu et.al. 2505.19901 null
2025-05-26 MotionPro: A Precise Motion Controller for Image-to-Video Generation Zhongwei Zhang et.al. 2505.20287 null
2025-05-26 DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving Wenchao Sun et.al. 2505.19692 link
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 The Role of Video Generation in Enhancing Data-Limited Action Understanding Wei Li et.al. 2505.19495 null
2025-05-26 Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals Nate Gillman et.al. 2505.19386 null
2025-05-26 DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation Junhao Chen et.al. 2505.18078 null
2025-05-25 From Single Images to Motion Policies via Video-Generation Environment Representations Weiming Zhi et.al. 2505.19306 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 WorldEval: World Model as Real-World Robot Policies Evaluator Yaxuan Li et.al. 2505.19017 null
2025-05-25 Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency Hyunho Ha et.al. 2505.18932 null
2025-05-25 Interspatial Attention for Efficient 4D Human Video Generation Ruizhi Shao et.al. 2505.15800 null
2025-05-24 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Shuo Yang et.al. 2505.18875 null
2025-05-24 VORTA: Efficient Video Diffusion via Routing Sparse Attention Wenhao Sun et.al. 2505.18809 link
2025-05-24 DVD-Quant: Data-free Video Diffusion Transformers Quantization Zhiteng Li et.al. 2505.18663 link
2025-05-24 ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos Xiaodong Wang et.al. 2505.18650 null
2025-05-23 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions Zizhang Li et.al. 2505.18151 null
2025-05-23 SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain Jiawei Zhou et.al. 2505.17727 null
2025-05-23 Scaling Image and Video Generation via Test-Time Evolutionary Search Haoran He et.al. 2505.17618 null
2025-05-23 InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO Xueji Fang et.al. 2505.17574 link
2025-05-23 Challenger: Affordable Adversarial Driving Video Generation Zhiyuan Xu et.al. 2505.15880 null
2025-05-22 Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis Xin You et.al. 2505.17333 null
2025-05-22 Training-Free Efficient Video Generation via Dynamic Token Carving Yuechen Zhang et.al. 2505.16864 link
2025-05-22 Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts Taewon Kang et.al. 2505.16819 null
2025-05-22 MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM Siwei Meng et.al. 2505.16456 null
2025-05-21 Generative AI for Autonomous Driving: A Review Katharina Winter et.al. 2505.15863 null
2025-05-21 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection Zhipei Xu et.al. 2505.15173 null
2025-05-21 CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation Xinran Wang et.al. 2505.15145 link
2025-05-21 BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation Haiquan Wen et.al. 2505.12620 link
2025-05-21 Video-GPT via Next Clip Diffusion Shaobin Zhuang et.al. 2505.12489 null
2025-05-20 Programmatic Video Prediction Using Large Language Models Hao Tang et.al. 2505.14948 link
2025-05-20 Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers Sucheng Ren et.al. 2505.14687 link
2025-05-20 LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer Changgu Chen et.al. 2505.14167 null
2025-05-20 Hunyuan-Game: Industrial-grade Intelligent Game Creation Model Ruihuang Li et.al. 2505.14135 null
2025-05-20 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 link
2025-05-19 FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao et.al. 2505.13437 null
2025-05-19 MAGI-1: Autoregressive Video Generation at Scale Sand.ai et.al. 2505.13211 link
2025-05-19 DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Joel Jang et.al. 2505.12705 link
2025-05-19 Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking Zihan Su et.al. 2505.12667 null
2025-05-18 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models Hu Yue et.al. 2505.09694 link
2025-05-17 FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge Xuan Shen et.al. 2505.14709 link
2025-05-17 DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance Xuan Shen et.al. 2505.14708 link
2025-05-17 LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation Jiarui Wang et.al. 2505.12098 link
2025-05-17 VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption Tianxiong Zhong et.al. 2505.12053 null
2025-05-17 STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives Bo Wang et.al. 2505.08350 null
2025-05-16 QVGen: Pushing the Limit of Quantized Video Generative Models Yushi Huang et.al. 2505.11497 null
2025-05-16 Face Consistency Benchmark for GenAI Video Michal Podstawski et.al. 2505.11425 null
2025-05-16 Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model Wei Li et.al. 2505.07449 link
2025-05-15 ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars Rui-Yang Ju et.al. 2505.10072 null
2025-05-15 Generating time-consistent dynamics with discriminator-guided image diffusion models Philipp Hess et.al. 2505.09089 null
2025-05-15 Generative Pre-trained Autoregressive Diffusion Transformer Yuan Zhang et.al. 2505.07344 null
2025-05-14 Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios Huafeng Shi et.al. 2505.10584 null
2025-05-13 Generative AI for Autonomous Driving: Frontiers and Opportunities Yuping Wang et.al. 2505.08854 link
2025-05-13 Symbolically-Guided Visual Plan Inference from Uncurated Video Data Wenyan Yang et.al. 2505.08444 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818 null
2025-05-12 ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models Ozgur Kara et.al. 2505.07652 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation Panwen Hu et.al. 2505.06985 null
2025-05-10 Jailbreaking the Text-to-Video Generative Models Jiayang Liu et.al. 2505.06679 null
2025-05-10 ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images Xianghao Kong et.al. 2505.06537 null
2025-05-08 3D Scene Generation: A Survey Beichen Wen et.al. 2505.05474 link
2025-05-08 T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models Xuyang Guo et.al. 2505.04946 null
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512 null
2025-05-06 Real-Time Person Image Synthesis Using a Flow Matching Model Jiwoo Jeong et.al. 2505.03562 link
2025-05-06 Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights Zhaiming Shen et.al. 2505.03205 null
2025-05-04 DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization Wenchuan Wang et.al. 2505.02192 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-03 PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth Bu Jin et.al. 2505.01729 null
2025-05-02 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos Zongxia Li et.al. 2505.01481 link
2025-05-02 FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis Jiangtong Tan et.al. 2505.01172 link
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation Xuyang Guo et.al. 2505.00337 null
2025-04-30 Direct Motion Models for Assessing Generated Videos Kelsey Allen et.al. 2505.00209 null
2025-04-30 Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis Michal Geyer et.al. 2505.00135 null
2025-04-30 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Qihao Liu et.al. 2504.21855 null
2025-04-30 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation Haiyang Zhou et.al. 2504.21650 link
2025-04-30 Simple Visual Artifact Detection in Sora-Generated Videos Misora Sugiyama et.al. 2504.21334 null
2025-04-30 Capturing Conditional Dependence via Auto-regressive Diffusion Models Xunpeng Huang et.al. 2504.21314 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Haoyu Zhen et.al. 2504.20995 null
2025-04-29 DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs Hao Luan et.al. 2504.20754 null
2025-04-29 Advance Fake Video Detection via Vision Transformers Joy Battocchio et.al. 2504.20669 null
2025-04-28 CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung et.al. 2504.19894 null
2025-04-28 DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer Junpeng Jiang et.al. 2504.19614 null
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-26 Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation Jong Inn Park et.al. 2504.18805 null
2025-04-25 NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration Haotian Dong et.al. 2504.18448 null
2025-04-25 We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback Minkyu Choi et.al. 2504.17180 null
2025-04-24 Dynamic Camera Poses and Where to Find Them Chris Rockwell et.al. 2504.17788 null
2025-04-24 MV-Crafter: An Intelligent System for Music-guided Video Generation Chuer Chen et.al. 2504.17267 null
2025-04-24 DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks Yinqi Li et.al. 2504.17253 link
2025-04-23 Subject-driven Video Generation via Disentangled Identity and Motion Daneul Kim et.al. 2504.17816 null
2025-04-23 BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation Ruotong Wang et.al. 2504.16907 null
2025-04-23 ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Ying Li et.al. 2504.16464 null
2025-04-23 VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Xuming Hu et.al. 2504.16359 null
2025-04-22 DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment Xiaofan Li et.al. 2504.18576 link
2025-04-22 Survey of Video Diffusion Models: Foundations, Implementations, and Applications Yimu Wang et.al. 2504.16081 link
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning Wang Lin et.al. 2504.15932 null
2025-04-22 Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views Ningli Xu et.al. 2504.15786 null
2025-04-22 DiTPainter: Efficient Video Inpainting with Diffusion Transformers Xian Wu et.al. 2504.15661 null
2025-04-21 Solving New Tasks by Adapting Internet Video Knowledge Calvin Luo et.al. 2504.15369 null
2025-04-21 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Xianpan Zhou et.al. 2504.15182 null
2025-04-21 DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation Weijie He et.al. 2504.15032 null
2025-04-21 Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Chenjie Cao et.al. 2504.14899 link
2025-04-21 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 link
2025-04-21 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Lvmin Zhang et.al. 2504.12626 link
2025-04-20 Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis Jingjing Ren et.al. 2504.14470 null
2025-04-19 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation Minho Park et.al. 2504.14396 link
2025-04-18 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting Jiaxin Huang et.al. 2504.11092 null
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-17 VideoPanda: Video Panoramic Diffusion with Multi-view Attention Kevin Xie et.al. 2504.11389 null
2025-04-17 StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Roberto Henschel et.al. 2403.14773 null
2025-04-16 VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate Zhihang Yuan et.al. 2504.12259 link
2025-04-16 Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM Zirui Pan et.al. 2504.12048 null
2025-04-16 The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation Bingjie Gao et.al. 2504.11739 null
2025-04-16 ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Zongyi Li et.al. 2410.20502 null
2025-04-15 InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation Yukang Lin et.al. 2504.10905 null
2025-04-15 OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding Dianbing Xi et.al. 2504.10825 null
2025-04-14 H-MoRe: Learning Human-centric Motion Representation for Action Analysis Zhanbo Huang et.al. 2504.10676 link
2025-04-14 H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models Yushu Wu et.al. 2504.10567 null
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 Aligning Anime Video Generation with Human Feedback Bingwen Zhu et.al. 2504.10044 null
2025-04-14 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise Chao Liu et.al. 2504.09789 null
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Team Seaweed et.al. 2504.08685 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641 null
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rosa Wolf et.al. 2504.08438 null
2025-04-11 EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model Renda Li et.al. 2504.08344 null
2025-04-11 RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements Guangcong Zheng et.al. 2504.08212 link
2025-04-11 TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation Ruineng Li et.al. 2504.08181 null
2025-04-10 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction Zeren Jiang et.al. 2504.07961 link
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940 null
2025-04-10 Diffusion Transformers for Tabular Data Time Series Generation Fabrizio Garuti et.al. 2504.07566 link
2025-04-09 EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal et.al. 2504.06861 null
2025-04-09 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation Wangbo Zhao et.al. 2504.06803 link
2025-04-09 RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism Elia Peruzzo et.al. 2504.06672 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-08 CamContextI2V: Context-aware Controllable Video Generation Luis Denninger et.al. 2504.06022 link
2025-04-08 Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants Nikolaj T. Mücke et.al. 2504.05852 link
2025-04-07 One-Minute Video Generation with Test-Time Training Karan Dalal et.al. 2504.05298 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-05 Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization Yikai Wang et.al. 2504.04153 link
2025-04-05 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-04-05 Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models Xuyang Guo et.al. 2504.04051 null
2025-04-05 DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Maksim Siniukov et.al. 2504.04010 null
2025-04-04 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Xuran Ma et.al. 2504.03140 link
2025-04-04 MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition Takahiro Shirakawa et.al. 2504.02361 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments Chenyu Zhang et.al. 2504.02918 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792 null
2025-04-03 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang et.al. 2504.02764 null
2025-04-03 ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer Jiayi Gao et.al. 2504.02451 link
2025-04-03 SkyReels-A2: Compose Anything in Video Diffusion Transformers Zhengcong Fei et.al. 2504.02436 link
2025-04-03 OmniCam: Unified Multimodal Video Generation via Camera Control Xiaoda Yang et.al. 2504.02312 null
2025-04-03 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang et.al. 2504.01956 null
2025-04-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757 null
2025-04-02 Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet Sebastian Barros et.al. 2504.03752 null
2025-04-02 WorldPrompter: Traversable Text-to-Scene Generation Zhaoyang Zhang et.al. 2504.02045 null
2025-04-02 Towards Physically Plausible Video Generation via VLM Planning Xindi Yang et.al. 2503.23368 null
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 WorldScore: A Unified Evaluation Benchmark for World Generation Haoyi Duan et.al. 2504.00983 null
2025-04-01 DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding Chong Li et.al. 2504.00432 null
2025-04-01 HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Boyuan Wang et.al. 2503.24026 null
2025-04-01 On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices Bosung Kim et.al. 2503.23796 link
2025-03-31 GazeLLM: Multimodal LLMs incorporating Human Visual Attention Jun Rekimoto et.al. 2504.00221 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation Fangda Chen et.al. 2503.23951 null
2025-03-31 HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation Kun Liu et.al. 2503.23715 null
2025-03-30 VideoGen-Eval: Agent-based System for Video Generation Evaluation Yuhang Yang et.al. 2503.23452 link
2025-03-30 JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Kai Liu et.al. 2503.23377 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Prin Phunyaphibarn et.al. 2503.20240 null
2025-03-28 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Jangho Park et.al. 2503.22622 null
2025-03-28 EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation Hadrien Reynaud et.al. 2503.22357 null
2025-03-28 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving Yishen Ji et.al. 2503.22231 null
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781 null
2025-03-27 Exploring the Evolution of Physics Cognition in Video Generation: A Survey Minghui Lin et.al. 2503.21765 link
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755 link
2025-03-27 Audio-driven Gesture Generation via Deviation Feature in the Latent Space Jiahui Chen et.al. 2503.21616 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations Haitong Liu et.al. 2503.21824 link
2025-03-26 Synthetic Video Enhances Physical Fidelity in Video Synthesis Qi Zhao et.al. 2503.20822 null
2025-03-26 RecTable: Fast Modeling Tabular Data with Rectified Flow Masane Fuchi et.al. 2503.20731 link
2025-03-26 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports Xiangwen Zhang et.al. 2503.20654 null
2025-03-26 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving Lloyd Russell et.al. 2503.20523 null
2025-03-26 VPO: Aligning Text-to-Video Generation Models with Prompt Optimization Jiale Cheng et.al. 2503.20491 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 Video Motion Graphs Haiyang Liu et.al. 2503.20218 null
2025-03-26 Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing Jaihoon Kim et.al. 2503.19385 null
2025-03-26 EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models Yufei Cai et.al. 2503.19369 link
2025-03-25 Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors Yuke Lou et.al. 2503.20118 null
2025-03-25 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals Stefan Stojanov et.al. 2503.19953 null
2025-03-25 FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling Qiusheng Huang et.al. 2503.19940 null
2025-03-25 FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Xuan Ju et.al. 2503.19907 null
2025-03-25 Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi et.al. 2503.19881 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Haiyu Zhang et.al. 2503.19462 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 Long-Context Autoregressive Video Modeling with Next-Frame Prediction Yuchao Gu et.al. 2503.19325 link
2025-03-25 Aether: Geometric-Aware Unified World Modeling Aether Team et.al. 2503.18945 null
2025-03-25 AMD-Hummingbird: Towards an Efficient Text-to-Video Model Takashi Isobe et.al. 2503.18559 link
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 null
2025-03-24 Training-free Diffusion Acceleration with Bottleneck Sampling Ye Tian et.al. 2503.18940 null
2025-03-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-03-24 Can Text-to-Video Generation help Video-Language Alignment? Luca Zanella et.al. 2503.18507 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-23 LongDiff: Training-Free Long Video Generation in One Go Zhuoling Li et.al. 2503.18150 null
2025-03-23 TransAnimate: Taming Layer Diffusion to Generate RGBA Video Xuewei Chen et.al. 2503.17934 null
2025-03-22 RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation Zhiqiang Yuan et.al. 2503.17735 null
2025-03-21 Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks Bhishma Dedhia et.al. 2503.17539 null
2025-03-21 Position: Interactive Generative Video as Next-Generation Game Engine Jiwen Yu et.al. 2503.17359 null
2025-03-21 AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process Junjie Hu et.al. 2503.17029 null
2025-03-21 Enabling Versatile Controls for Video Diffusion Models Xu Zhang et.al. 2503.16983 link
2025-03-21 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation Chun-Han Yao et.al. 2503.16396 null
2025-03-20 A Recipe for Generating 3D Worlds From a Single Image Katja Schwarz et.al. 2503.16611 null
2025-03-20 XAttention: Block Sparse Attention with Antidiagonal Scoring Ruyi Xu et.al. 2503.16428 link
2025-03-20 MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Quanhao Li et.al. 2503.16421 null
2025-03-20 ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos Haolin Yang et.al. 2503.16400 null
2025-03-20 PoseTraj: Pose-Aware Trajectory Control in Video Diffusion Longbin Ji et.al. 2503.16068 null
2025-03-20 Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models Marc Benedí San Millán et.al. 2503.15996 null
2025-03-20 MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving Haiguang Wang et.al. 2503.15875 link
2025-03-20 VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling Hyojun Go et.al. 2503.15855 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-19 Temporal Regularization Makes Your Video Generator Stronger Harold Haodong Chen et.al. 2503.15417 null
2025-03-19 Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models Tingxiu Chen et.al. 2503.14966 link
2025-03-18 MusicInfuser: Making Video Diffusion Listen and Dance Susung Hong et.al. 2503.14505 null
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428 null
2025-03-18 Impossible Videos Zechen Bai et.al. 2503.14378 null
2025-03-18 LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models Yu Cheng et.al. 2503.14325 link
2025-03-18 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-03-18 Fast Autoregressive Video Generation with Diagonal Decoding Yang Ye et.al. 2503.14070 null
2025-03-18 AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark Xinhao Xiang et.al. 2503.14064 link
2025-03-17 MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis Shitong Shao et.al. 2503.13319 null
2025-03-17 Language-guided Open-world Video Anomaly Detection Zihao Liu et.al. 2503.13160 null
2025-03-17 Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction Zheyuan Liu et.al. 2503.12953 null
2025-03-17 AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations Quang Trung Truong et.al. 2503.12828 null
2025-03-17 Long-Video Audio Synthesis with Multi-Agent Collaboration Yehang Zhang et.al. 2503.10719 null
2025-03-16 SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs Guibiao Liao et.al. 2503.12535 null
2025-03-16 VMBench: A Benchmark for Perception-Aligned Video Motion Generation Xinran Ling et.al. 2503.10076 link
2025-03-15 ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis Yu Fang et.al. 2503.14526 null
2025-03-15 A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI Paula Andrea Pérez-Toro et.al. 2503.12102 null
2025-03-15 SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering Byeongjun Park et.al. 2503.12024 link
2025-03-14 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Jianhong Bai et.al. 2503.11647 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513 null
2025-03-14 TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao et.al. 2503.11423 null
2025-03-14 Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model Haoyang Huang et.al. 2503.11251 link
2025-03-14 Cross-Modal Learning for Music-to-Music-Video Description Generation Zhuoyuan Mao et.al. 2503.11190 null
2025-03-14 Long Context Tuning for Video Generation Yuwei Guo et.al. 2503.10589 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-13 CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models Hao He et.al. 2503.10592 null
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 Semantic Latent Motion for Portrait Video Generation Qiyuan Zhang et.al. 2503.10096 null
2025-03-13 UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? Yuanxin Liu et.al. 2503.09949 link
2025-03-13 Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers Yasheng Sun et.al. 2503.09942 null
2025-03-13 VideoMerge: Towards Training-free Long Video Generation Siyang Zhang et.al. 2503.09926 null
2025-03-13 WonderVerse: Extendable 3D Scene Generation with Video Generative Models Hao Feng et.al. 2503.09160 null
2025-03-12 Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Jing Wang et.al. 2503.10704 null
2025-03-12 LuciBot: Automated Robot Policy Learning from Generated Videos Xiaowen Qiu et.al. 2503.09871 null
2025-03-12 I2V3D: Controllable image-to-video generation with 3D guidance Zhiyuan Zhang et.al. 2503.09733 null
2025-03-12 Accelerating Diffusion Sampling via Exploiting Local Transition Coherence Shangwen Zhu et.al. 2503.09675 null
2025-03-12 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Xiangyu Peng et.al. 2503.09642 link
2025-03-12 PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Chenyu Li et.al. 2503.09595 link
2025-03-12 Unified Dense Prediction of Video Diffusion Lehan Yang et.al. 2503.09344 null
2025-03-12 Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space Jian Zhu et.al. 2503.09215 null
2025-03-12 SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video Chengshu Zhao et.al. 2503.09154 link
2025-03-12 Reangle-A-Video: 4D Video Generation as Video-to-Video Translation Hyeonho Jeong et.al. 2503.09151 null
2025-03-12 $^R$FLAV: Rolling Flow matching for infinite Audio Video generation Alex Ergasti et.al. 2503.08307 link
2025-03-12 Object-Centric World Model for Language-Guided Manipulation Youngjoon Jeong et.al. 2503.06170 null
2025-03-11 V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video Jianqi Chen et.al. 2503.09631 null
2025-03-11 REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder Yitian Zhang et.al. 2503.08665 null
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605 null
2025-03-11 WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation Jing Wang et.al. 2503.08153 null
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM Unggi Lee et.al. 2503.08003 null
2025-03-11 VACE: All-in-One Video Creation and Editing Zeyinzi Jiang et.al. 2503.07598 null
2025-03-11 LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation Quanjian Song et.al. 2503.06508 link
2025-03-10 DreamRelation: Relation-Centric Video Customization Yujie Wei et.al. 2503.07602 null
2025-03-10 AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun et.al. 2503.07418 null
2025-03-10 Automated Movie Generation via Multi-Agent CoT Planning Weijia Wu et.al. 2503.07314 link
2025-03-10 From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers Jiacheng Liu et.al. 2503.06923 link
2025-03-09 VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation Hritik Bansal et.al. 2503.06800 null
2025-03-09 TR-DQ: Time-Rotation Diffusion Quantization Yihua Shao et.al. 2503.06564 null
2025-03-09 QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation Junyi Wu et.al. 2503.06545 link
2025-03-09 Generative Video Bi-flow Chen Liu et.al. 2503.06364 null
2025-03-08 Text2Story: Advancing Video Storytelling with Text Guidance Taewon Kang et.al. 2503.06310 null
2025-03-08 ROCM: RLHF on consistency models Shivanshu Shekhar et.al. 2503.06171 null
2025-03-08 VACT: A Video Automatic Causal Testing System and a Benchmark Haotong Yang et.al. 2503.06163 null
2025-03-08 GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation Ye Tao et.al. 2503.06136 null
2025-03-08 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Runze Zhang et.al. 2503.06053 null
2025-03-08 The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Aoxiong Yin et.al. 2503.04606 link
2025-03-08 Rethinking Video Tokenization: A Conditioned Diffusion-based Approach Nianzu Yang et.al. 2503.03708 link
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-07 MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio Xuenan Xu et.al. 2503.05242 link
2025-03-07 Unified Reward Model for Multimodal Understanding and Generation Yibin Wang et.al. 2503.05236 null
2025-03-07 Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos Zhiyu Tan et.al. 2502.21314 null
2025-03-06 Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation Alexey Buzovkin et.al. 2503.04871 link
2025-03-06 FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao et.al. 2503.04720 null
2025-03-06 What Are You Doing? A Closer Look at Controllable Human Video Generation Emanuele Bugliarello et.al. 2503.04666 null
2025-03-05 ProReflow: Progressive Reflow with Decomposed Velocity Lei Ke et.al. 2503.04824 null
2025-03-05 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren et.al. 2503.03751 link
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights Yuna Kato et.al. 2503.03558 link
2025-03-05 Video Super-Resolution: All You Need is a Video Diffusion Model Zhihao Zhan et.al. 2503.03355 null
2025-03-04 GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning Zhun Mou et.al. 2503.02341 null
2025-03-04 Unified Video Action Model Shuang Li et.al. 2503.00200 null
2025-03-03 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation Wenhao Wang et.al. 2503.01739 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-03 TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis Menghao Li et.al. 2502.19454 null
2025-03-02 Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think Jie Tian et.al. 2503.00948 link
2025-03-01 Learning to Animate Images from A Few Videos to Portray Delicate Human Actions Haoxin Li et.al. 2503.00276 null
2025-02-28 Training-free and Adaptive Sparse Attention for Efficient Long Video Generation Yifei Xia et.al. 2502.21079 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-02-28 WorldModelBench: Judging Video Generation Models As World Models Dacheng Li et.al. 2502.20694 null
2025-02-28 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Ke Cao et.al. 2502.14377 null
2025-02-27 Mobius: Text to Seamless Looping Video Generation via Latent Shift Xiuli Bi et.al. 2502.20307 link
2025-02-27 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Sotiris Anagnostidis et.al. 2502.20126 null
2025-02-27 C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation Yuhao Li et.al. 2502.19868 link
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-02-26 Glad: A Streaming Scene Generator for Autonomous Driving Bin Xie et.al. 2503.00045 null
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model Lingzhou Mu et.al. 2502.19455 null
2025-02-25 SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Jintao Zhang et.al. 2502.18137 link
2025-02-25 A Survey: Spatiotemporal Consistency in Video Generation Zhiyu Yin et.al. 2502.17863 null
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-24 Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions Zhong Li et.al. 2502.17119 link
2025-02-21 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Min Zhao et.al. 2502.15894 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 link
2025-02-21 LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities Florian Sestak et.al. 2502.12128 link
2025-02-20 Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Sanghyun Yi et.al. 2502.15077 null
2025-02-20 LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection Qingyuan Liu et.al. 2502.14994 null
2025-02-20 Improving the Diffusability of Autoencoders Ivan Skorokhodov et.al. 2502.14831 null
2025-02-20 Designing Parameter and Compute Efficient Diffusion Transformers using Distillation Vignesh Sundaresha et.al. 2502.14226 null
2025-02-19 FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Yunpeng Zhang et.al. 2502.13995 link
2025-02-19 LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation Junchen Fu et.al. 2502.12945 null
2025-02-18 VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation Xinlong Chen et.al. 2502.12782 link
2025-02-18 MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation Sihyun Yu et.al. 2502.12632 null
2025-02-17 DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation Zhihang Yuan et.al. 2502.11897 link
2025-02-17 Object-Centric Image to Video Generation with Language Guidance Angel Villar-Corrales et.al. 2502.11655 null
2025-02-17 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248 link
2025-02-17 Magic 1-For-1: Generating One Minute Video Clips within One Minute Hongwei Yi et.al. 2502.07701 link
2025-02-16 MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation Michael Fuest et.al. 2502.11234 null
2025-02-16 Phantom: Subject-consistent video generation via cross-modal alignment Lijie Liu et.al. 2502.11079 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-14 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control Teng Li et.al. 2502.10059 null
2025-02-14 GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Hongyin Zhang et.al. 2502.09268 null
2025-02-13 Enhance-A-Video: Better Generated Video for Free Yang Luo et.al. 2502.07508 link
2025-02-12 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Qinghe Wang et.al. 2502.08639 null
2025-02-12 FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis Wonjoon Jin et.al. 2502.08244 null
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-12 AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance Zhao Wang et.al. 2502.08189 null
2025-02-12 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling Shuhuai Ren et.al. 2502.07737 null
2025-02-12 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Sixiao Zheng et.al. 2502.07531 null
2024-05-07 LLM-grounded Video Diffusion Models Long Lian et.al. 2309.17444 null
2023-10-12 Echocardiography video synthesis from end diastolic semantic map via diffusion model Phi Nguyen Van et.al. 2310.07131 null
2023-05-30 Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising Fu-Yun Wang et.al. 2305.18264 null
2023-03-21 Latent Video Diffusion Models for High-Fidelity Long Video Generation Yingqing He et.al. 2211.13221 null
2022-07-12 Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis Long Zhuo et.al. 2207.05049 null
2021-12-02 Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image Andrew Liu et.al. 2012.09855 null
2020-11-10 Audeo: Audio Generation for a Silent Performance Video Kun Su et.al. 2006.14348 null
2019-10-15 TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation Fan Yang et.al. 1910.05899 null

(back to top)

TryOn

Publish Date Title Authors PDF Code
2025-11-18 A System Dynamics Approach to Evaluating Sludge Management Strategies in Vinasse Treatment: Cost-Benefit Analysis and Scenario Assessment Agustin Olivares et.al. 2511.14607 null
2025-11-18 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-17 Multi-Objective Statistical Model Checking using Lightweight Strategy Sampling (extended version) Pedro R. D'Argenio et.al. 2511.13460 null
2025-11-16 Nonlocal action in Everettian Quantum Mechanics Mordecai Waegell et.al. 2511.12403 null
2025-11-16 RefVTON: person-to-person Try on with Additional Unpaired Visual Reference Liuzhuozheng Li et.al. 2511.00956 null
2025-11-14 Learning Fair Representations with Kolmogorov-Arnold Networks Amisha Priyadarshini et.al. 2511.11767 null
2025-11-14 Discovering Meaningful Units with Visually Grounded Semantics from Image Captions Melika Behjati et.al. 2511.11262 null
2025-11-14 Power Ensemble Aggregation for Improved Extreme Event AI Prediction Julien Collard et.al. 2511.11170 null
2025-11-13 Optimal Welfare in Noncooperative Network Formation under Attack Natan Doubez et.al. 2511.10845 null
2025-11-13 Generating optimal Gravitational-Wave template banks with metric-preserving autoencoders Giovanni Cabass et.al. 2511.10466 null
2025-11-12 Efficiently Transforming Neural Networks into Decision Trees: A Path to Ground Truth Explanations with RENTT Helena Monke et.al. 2511.09299 null
2025-11-12 Food as Soft Power: Taiwanese Gastrodiplomacy on Social Media and Algorithmic Suppression Andrew Yen Chang et.al. 2511.05729 null
2025-11-10 Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism Mohaiminul Islam Bhuiyan et.al. 2511.08636 null
2025-11-10 On maximizing private neighbors in graphs Stephen T. Hedetniemi et.al. 2511.07248 null
2025-11-06 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Ellis Brown et.al. 2511.04655 null
2025-11-06 IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection Kaveh Eskandari Miandoab et.al. 2511.04528 null
2025-11-06 The truth is no diaper: Human and AI-generated associations to emotional words Špela Vintar et.al. 2511.04077 null
2025-11-04 Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement Sanghyun Lee et.al. 2511.05562 null
2025-11-04 FLAME: Flexible and Lightweight Biometric Authentication Scheme in Malicious Environments Fuyi Wang et.al. 2511.02176 null
2025-11-03 Confounding Factors in Relating Model Performance to Morphology Wessel Poelman et.al. 2511.01380 null
2025-11-02 AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs Yubo Wang et.al. 2511.05549 null
2025-11-01 Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology G. Pillonetto et.al. 2511.00579 null
2025-10-31 Quantum-dot single photon source performance with off-resonant pulse preparation schemes Gavin Crowder et.al. 2511.00243 null
2025-10-31 EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs Ali Satvaty et.al. 2511.00192 null
2025-10-31 Consistency Training Helps Stop Sycophancy and Jailbreaks Alex Irpan et.al. 2510.27062 null
2025-10-30 Ring-polymer instanton theory for tunneling between asymmetric wells Marit R. Fiechter et.al. 2510.26592 null
2025-10-29 Heuristic Quantum Advantage with Peaked Circuits Hrant Gharibyan et.al. 2510.25838 null
2025-10-29 Tackling the Algorithmic Control Crisis -- the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents B. Bodo et.al. 2510.25337 null
2025-10-16 ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On Junseo Park et.al. 2509.25749 null
2025-10-09 Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection Yanjie Pan et.al. 2510.07654 null
2025-10-06 AvatarVTON: 4D Virtual Try-On for Animatable Avatars Zicheng Jiang et.al. 2510.04822 null
2025-10-03 DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing Qi Li et.al. 2510.04797 null
2025-10-01 Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset Yannick Hauri et.al. 2510.00633 null
2025-09-29 UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections Zeyu Cai et.al. 2509.24817 null
2025-09-29 ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering Weikai Lin et.al. 2509.21541 null
2025-09-24 InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On Julien Han et.al. 2509.20524 null
2025-09-24 Efficient Encoder-Free Pose Conditioning and Pose Control for Virtual Try-On Qi Li et.al. 2509.20343 null
2025-09-23 Clothing agnostic Pre-inpainting Virtual Try-ON Sehyun Kim et.al. 2509.17654 null
2025-09-21 SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments Ruiyan Wang et.al. 2509.16960 null
2025-09-16 DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform Xingzi Xu et.al. 2509.13506 null
2025-09-05 LUIVITON: Learned Universal Interoperable VIrtual Try-ON Cong Cao et.al. 2509.05030 null
2025-09-04 Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview Jun-Kun Chen et.al. 2509.04450 null
2025-09-04 Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation Lydia Kin Ching Chau et.al. 2509.02445 null
2025-08-30 IC-Custom: Diverse Image Customization via In-Context Learning Yaowei Li et.al. 2507.01926 null
2025-08-28 Dress&Dance: Dress up and Dance as You Like It - Technical Preview Jun-Kun Chen et.al. 2508.21070 null
2025-08-28 FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models Zheng Chong et.al. 2508.20586 null
2025-08-25 JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on Aowen Wang et.al. 2508.17614 null
2025-08-19 OmniTry: Virtual Try-On Anything without Masks Yutong Feng et.al. 2508.13632 null
2025-08-16 DualFit: A Two-Stage Virtual Try-On via Warping and Synthesis Minh Tran et.al. 2508.12131 null
2025-08-12 StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback Hongbo Ma et.al. 2508.06555 null
2025-08-11 MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization Ankan Deria et.al. 2508.08488 null
2025-08-11 Undress to Redress: A Training-Free Framework for Virtual Try-On Zhiying Li et.al. 2508.07680 null
2025-08-07 One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose Jinxi Liu et.al. 2508.04559 null
2025-08-06 Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Seungyong Lee et.al. 2508.04825 null
2025-08-06 Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis Angang Zhang et.al. 2508.04551 null
2025-08-06 FFHQ-Makeup: Paired Synthetic Makeup Dataset with Facial Consistency Across Multiple Styles Xingchao Yang et.al. 2508.03241 null
2025-08-04 DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework Tongchun Zuo et.al. 2508.02807 null
2025-07-29 From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos Chenjian Gao et.al. 2507.20331 null
2025-07-29 Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism Jun Zheng et.al. 2412.09822 null
2025-07-21 FW-VTON: Flattening-and-Warping for Person-to-Person Virtual Try-on Zheng Wang et.al. 2507.16010 null
2025-07-20 OmniVTON: Training-Free Universal Virtual Try-On Zhaotong Yang et.al. 2507.15037 null
2025-07-11 Scalable and Realistic Virtual Try-on Application for Foundation Makeup with Kubelka-Munk Theory Hui Pang et.al. 2507.07333 null
2025-07-08 TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model Yujie Hu et.al. 2507.05790 null
2025-07-02 FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization Peng Zheng et.al. 2507.01792 null
2025-06-30 KiseKloset: Comprehensive System For Outfit Retrieval, Recommendation, And Try-On Thanh-Tung Phan-Nguyen et.al. 2506.23471 null
2025-06-29 DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On Xiang Xu et.al. 2506.23295 null
2025-06-26 Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Cheng Zou et.al. 2506.21270 null
2025-06-23 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-06-14 Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments Zaiqiang Wu et.al. 2506.12348 null
2025-06-13 HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment Ming Meng et.al. 2505.19638 null
2025-06-12 Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On Zaiqiang Wu et.al. 2506.10468 null
2025-06-06 ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On Jinjuan Wang et.al. 2506.05858 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-01 DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation Xianbing Sun et.al. 2506.00908 null
2025-05-29 VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration Ben Li et.al. 2505.23439 null
2025-05-28 MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on Guangyuan Li et.al. 2505.21325 null
2025-05-27 Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals Davide Lobba et.al. 2505.21062 null
2025-05-26 VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models Hu Xiaobin et.al. 2505.19571 null
2025-05-22 Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction Dong Li et.al. 2505.16980 null
2025-05-22 Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On Siqi Wan et.al. 2505.16977 link
2025-05-15 Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates Ren Li et.al. 2504.08353 link
2025-04-29 Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting Hanxi Liu et.al. 2504.20403 null
2025-04-24 FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model Kaicheng Pang et.al. 2504.17826 null
2025-04-24 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Min Wei et.al. 2504.17414 null
2025-04-21 Shape-Guided Clothing Warping for Virtual Try-On Xiaoyu Han et.al. 2504.15232 link
2025-04-21 Insert Anything: Image Insertion via In-Context Editing in DiT Wensong Song et.al. 2504.15009 null
2025-04-19 Flux Already Knows -- Activating Subject-Driven Image Generation without Training Hao Kang et.al. 2504.11478 link
2025-04-19 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-17 Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off Riza Velioglu et.al. 2504.13078 link
2025-04-15 ReZero: Enhancing LLM search ability by trying one-more-time Alan Dao et.al. 2504.11001 null
2025-04-11 VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction Zijian He et.al. 2503.12165 null
2025-04-04 From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images Maliheh Toozandehjani et.al. 2504.03807 null
2025-04-03 MAD: Makeup All-in-One with Cross-Domain Diffusion Model Bo-Kai Ruan et.al. 2504.02545 null
2025-04-01 Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method Shufang Zhang et.al. 2504.00562 null
2025-03-26 ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On Ji Woo Hong et.al. 2503.20418 null
2025-03-26 Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks Hailong Guo et.al. 2501.15891 null
2025-03-25 Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage Zhengwentai Sun et.al. 2503.19486 null
2025-03-20 Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model Yingmao Miao et.al. 2503.16065 null
2025-03-18 Limb-Aware Virtual Try-On Network with Progressive Clothing Warping Shengping Zhang et.al. 2503.14074 link
2025-03-16 Progressive Limb-Aware Virtual Try-On Xiaoyu Han et.al. 2503.12588 link
2025-03-15 ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text Haifeng Ni et.al. 2501.16757 null
2025-03-11 MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input Zhenchen Wan et.al. 2503.08650 null
2025-03-11 RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency Siqi Li et.al. 2501.08682 null
2025-02-20 CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors Donghao Luo et.al. 2502.14373 null
2025-02-05 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics Xuan Li et.al. 2502.03449 null
2025-02-03 MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer Le Shen et.al. 2502.01626 null
2025-01-26 IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter Xiaojing Zhong et.al. 2501.15616 null
2025-01-26 Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models Spencer Ramsey et.al. 2501.15571 null
2025-01-20 EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process Mostafa Atef et.al. 2501.11776 null
2025-01-20 CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation Zheng Chong et.al. 2501.11325 link
2025-01-17 Disharmony: Forensics using Reverse Lighting Harmonization Philip Wootaek Shin et.al. 2501.10212 null
2025-01-12 ODPG: Outfitting Diffusion with Pose Guided Condition Seohyun Lee et.al. 2501.06769 null
2025-01-10 MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer Junsheng Luan et.al. 2501.03630 null
2025-01-09 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On Shuliang Ning et.al. 2501.05369 null
2025-01-08 Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling Nannan Li et.al. 2501.04666 null
2025-01-07 HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping Kosuke Takemoto et.al. 2501.03910 link
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2024-12-25 DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images Enbo Huang et.al. 2412.18797 null
2024-12-22 PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask Jeongho Kim et.al. 2412.16978 link
2024-12-19 DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On Wengyi Zhan et.al. 2412.14465 null
2024-12-19 FashionComposer: Compositional Fashion Image Generation Sihui Ji et.al. 2412.14168 null
2024-11-18 Try-On-Adapter: A Simple and Flexible Try-On Paradigm Hanzhong Guo et.al. 2411.10187 null
2024-07-18 Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models Phuong Dam et.al. 2403.07371 null
2024-07-18 Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images Aiyu Cui et.al. 2311.16094 null
2024-06-05 GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon Sanhita Pathak et.al. 2406.02184 null
2024-05-28 Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn Sanhita Pathak et.al. 2310.05024 null
2024-05-08 VTON-IT: Virtual Try-On using Image Translation Santosh Adhikari et.al. 2310.04558 null
2024-04-29 Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos Zhengze Xu et.al. 2404.17571 null
2024-04-02 TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On Jiazheng Xing et.al. 2404.00878 null
2023-04-03 Learning Garment DensePose for Robust Warping in Virtual Try-On Aiyu Cui et.al. 2303.17688 null
2021-09-13 Per Garment Capture and Synthesis for Real-time Virtual Try-on Toby Chong et.al. 2109.04654 null
2021-08-25 ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones Shan An et.al. 2108.10515 null
2021-06-01 An Efficient Style Virtual Try on Network for Clothing Business Industry Shanchen Pang et.al. 2105.13183 null
2021-01-14 ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on Gaurav Kuppa et.al. 2012.10495 null
2016-02-22 Issues in the Multiple Try Metropolis mixing L. Martino et.al. 1508.04253 null
2015-02-27 Trying to understand dark matter B. Hoeneisen et.al. 1502.07375 null
2014-05-20 On the flexibility of the design of Multiple Try Metropolis schemes Luca Martino et.al. 1201.0646 null


Visual Edit

Publish Date Title Authors PDF Code
2025-11-18 UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning Rui Tian et.al. 2511.14760 null
2025-11-18 Task Addition and Weight Disentanglement in Closed-Vocabulary Models Adam Hazimeh et.al. 2511.14569 null
2025-11-18 ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation Zitong Xu et.al. 2511.14259 null
2025-11-18 InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior Weimin Bai et.al. 2511.14208 null
2025-11-18 UniSER: A Foundation Model for Unified Soft Effects Removal Jingdong Zhang et.al. 2511.14183 null
2025-11-18 Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations Yiqing Shen et.al. 2511.14100 null
2025-11-18 Error-Driven Scene Editing for 3D Grounding in Large Language Models Yue Zhang et.al. 2511.14086 null
2025-11-18 Semantic Context Matters: Improving Conditioning for Autoregressive Models Dongyang Jin et.al. 2511.14063 null
2025-11-18 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Rui Zuo et.al. 2511.13442 null
2025-11-18 MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation Junjie Yang et.al. 2511.13135 null
2025-11-17 Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine Xincheng Shuai et.al. 2511.13713 null
2025-11-17 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting Jiangnan Ye et.al. 2511.13684 null
2025-11-17 Language-Guided Invariance Probing of Vision-Language Models Jae Joong Lee et.al. 2511.13494 null
2025-11-17 Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling Adam Hazimeh et.al. 2511.13478 null
2025-11-17 TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing Yuchen Bao et.al. 2511.13399 null
2025-11-17 SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design Yunjie Yu et.al. 2511.13285 null
2025-11-17 Uncovering and Mitigating Transient Blindness in Multimodal Model Editing Xiaoqi Han et.al. 2511.13243 null
2025-11-17 InteractiveGNNExplainer: A Visual Analytics Framework for Multi-Faceted Understanding and Probing of Graph Neural Network Predictions TC Singh et.al. 2511.13160 null
2025-11-17 Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection Lintong Zhang et.al. 2511.12992 null
2025-11-17 Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes Feng Lv et.al. 2511.12932 null
2025-11-17 Generative Photographic Control for Scene-Consistent Video Cinematic Editing Huiqiang Sun et.al. 2511.12921 null
2025-11-16 Catastrophic Forgetting in Kolmogorov-Arnold Networks Mohammad Marufur Rahman et.al. 2511.12828 null
2025-11-16 Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis Zeqin Yu et.al. 2511.12658 null
2025-11-16 Designed to Spread: Generative Approaches to Enhance Information Diffusion Ziqing Qian et.al. 2511.12516 null
2025-11-15 ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks Ruixun Liu et.al. 2511.12267 null
2025-11-15 Mixture of States: Routing Token-Level Dynamics for Multimodal Generation Haozhe Liu et.al. 2511.12207 null
2025-11-15 FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing Kaixiang Yang et.al. 2511.12151 null
2025-11-15 Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing Hossein Mohebbi et.al. 2511.11780 null
2025-11-14 PEtab-GUI: A graphical user interface to create, edit and inspect PEtab parameter estimation problems Paul Jonas Jost et.al. 2511.11515 null
2025-11-14 ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation Kaishen Wang et.al. 2511.11483 null
2025-11-14 WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Wei Chow et.al. 2511.11434 null
2025-11-14 SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing Yichao Tang et.al. 2511.11295 null
2025-11-14 Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing Cong Cao et.al. 2511.11236 null
2025-11-14 On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing Yunyi Ni et.al. 2511.10933 null
2025-11-14 STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data Yongdeuk Seo et.al. 2511.09977 null
2025-11-14 UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation Zhen Yang et.al. 2511.08195 null
2025-11-13 IPCD: Intrinsic Point-Cloud Decomposition Shogo Sato et.al. 2511.09866 null
2025-11-13 AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting Aymen Mir et.al. 2511.09827 null
2025-11-12 SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control Arman Zarei et.al. 2511.09715 null
2025-11-11 RePose-NeRF: Robust Radiance Fields for Mesh Reconstruction under Noisy Camera Poses Sriram Srinivasan et.al. 2511.08545 null
2025-11-11 3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation Yunhong He et.al. 2511.08536 null
2025-11-11 UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Zhengyang Liang et.al. 2511.08521 null
2025-11-11 HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization Zeyang Li et.al. 2511.08425 null
2025-11-11 LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning Fengyi Fu et.al. 2511.08251 null
2025-11-11 VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics Daniel Cher et.al. 2511.07744 null
2025-11-09 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Assaf Singer et.al. 2511.08633 null
2025-11-09 AesTest: Measuring Aesthetic Intelligence from Perception to Production Guolong Wang et.al. 2511.06360 null
2025-11-09 RelightMaster: Precise Video Relighting with Multi-plane Light Images Weikang Bian et.al. 2511.06271 null
2025-11-07 On the Brittleness of CLIP Text Encoders Allie Tran et.al. 2511.04247 null
2025-11-07 Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing Zhihui Chen et.al. 2511.00801 null
2025-11-06 Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization Connor Dunlop et.al. 2511.05616 null
2025-11-06 MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers Ali Boudaghi et.al. 2511.04376 null
2025-11-06 Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization Zhejia Cai et.al. 2511.03950 null
2025-11-05 Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks Wenkai Fu et.al. 2511.05598 null
2025-11-05 Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition Jongseo Lee et.al. 2511.03725 null
2025-11-05 Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising Shuangquan Lyu et.al. 2511.03272 null
2025-11-05 ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing Yaosen Chen et.al. 2511.02505 null
2025-11-03 UniREditBench: A Unified Reasoning-based Image Editing Benchmark Feng Han et.al. 2511.01295 null
2025-10-31 BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing Jinsu Kim et.al. 2511.00143 null
2025-10-31 Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing Yijia Wang et.al. 2510.27335 null
2025-10-30 Security Risk of Misalignment between Text and Image in Multi-modal Model Xiaosen Wang et.al. 2510.26105 null
2025-10-29 LGCC: Enhancing Flow Matching Based Text-Guided Image Editing with Local Gaussian Coupling and Context Consistency Fangbing Liu et.al. 2511.01894 null
2025-10-29 SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing Sung-Hoon Yoon et.al. 2510.25970 null
2025-10-29 RegionE: Adaptive Region-Aware Generation for Efficient Image Editing Pengtao Chen et.al. 2510.25590 null
2025-10-29 LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Zeyu Wang et.al. 2510.22946 null
2025-10-28 Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Inclusion AI et.al. 2510.24821 null
2025-10-28 Group Relative Attention Guidance for Image Editing Xuanpu Zhang et.al. 2510.24657 null
2025-10-28 Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models Byeonghu Na et.al. 2510.23974 null
2025-10-27 Autoregressive Styled Text Image Generation, but Make it Reliable Carmine Zaccagnino et.al. 2510.23240 null
2025-10-27 UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization Huixuan Zhang et.al. 2510.23023 null
2025-10-27 VALA: Learning Latent Anchors for Training-Free and Temporally Consistent Zhangkai Wu et.al. 2510.22970 null
2025-10-27 FAME: Fairness-aware Attention-modulated Video Editing Zhangkai Wu et.al. 2510.22960 null
2025-10-27 LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas Guocheng Gordon Qian et.al. 2510.20820 null
2025-10-25 GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation Phillip Mueller et.al. 2510.22337 null
2025-10-24 FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing Or Ronai et.al. 2510.22010 null
2025-10-24 SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation Alec Helbling et.al. 2510.21120 null
2025-10-24 EditInfinity: Image Editing with Binary-Quantized Generative Models Jiahuan Wang et.al. 2510.20217 null
2025-10-24 Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks Kai Zeng et.al. 2510.19195 null
2025-10-23 Positional Encoding Field Yunpeng Bai et.al. 2510.20385 null
2025-10-23 FlowCycle: Pursuing Cycle-Consistent Flows for Text-based Editing Yanghao Wang et.al. 2510.20212 null
2025-10-22 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Yusu Qian et.al. 2510.19808 null
2025-10-21 PICABench: How Far Are We from Physically Realistic Image Editing? Yuandong Pu et.al. 2510.17681 null
2025-10-21 Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback Zongjian Li et.al. 2510.16888 null
2025-10-20 ConsistEdit: Highly Consistent and Precise Training-free Visual Editing Zixin Yin et.al. 2510.17803 null
2025-10-19 Region in Context: Text-condition Image editing with Human-like semantic reasoning Thuy Phuong Vu et.al. 2510.16772 null
2025-10-17 BLIP3o-NEXT: Next Frontier of Native Image Generation Jiuhai Chen et.al. 2510.15857 null
2025-10-17 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Qingyan Bai et.al. 2510.15742 null
2025-10-16 Coupled Diffusion Sampling for Training-Free Multi-View Image Editing Hadi Alzayer et.al. 2510.14981 null
2025-10-16 Learning an Image Editing Model without Image Editing Pairs Nupur Kumari et.al. 2510.14978 null
2025-10-16 In-Context Learning with Unpaired Clips for Instruction-based Video Editing Xinyao Liao et.al. 2510.14648 null
2025-10-15 Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation Yi Zuo et.al. 2510.13084 null
2025-10-14 UniFusion: Vision-Language Model as Unified Encoder in Image Generation Kevin Li et.al. 2510.12789 null
2025-10-14 Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding Ye Chen et.al. 2510.12256 null
2025-10-14 VIDMP3: Video Editing by Representing Motion with Pose and Position Priors Sandeep Mishra et.al. 2510.12069 null
2025-10-13 IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment Yinan Chen et.al. 2510.11647 null
2025-10-13 Zero-shot Face Editing via ID-Attribute Decoupled Inversion Yang Hou et.al. 2510.11050 null
2025-10-13 GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation Shasha Guo et.al. 2510.11020 null
2025-10-13 DreamMakeup: Face Makeup Customization using Latent Diffusion Models Geon Yeong Park et.al. 2510.10918 null
2025-10-11 EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection Huaizhi Qu et.al. 2510.13652 null
2025-10-11 ReMix: Towards a Unified View of Consistent Character Generation and Editing Benjia Zhou et.al. 2510.10156 null
2025-10-11 MultiCOIN: Multi-Modal COntrollable Video INbetweening Maham Tanveer et.al. 2510.08561 null
2025-10-10 Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians Jin-Chuan Shi et.al. 2510.09438 null
2025-10-10 TBStar-Edit: From Image Editing Pattern Shifting to Consistency Enhancement Hao Fang et.al. 2510.04483 null
2025-10-09 FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching Jiacheng Liu et.al. 2510.08669 null
2025-10-09 Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing Rishubh Parihar et.al. 2510.08532 null
2025-10-09 InstructX: Towards Unified Visual Editing with MLLM Guidance Chong Mou et.al. 2510.08485 null
2025-10-09 UniVideo: Unified Understanding, Generation, and Editing for Videos Cong Wei et.al. 2510.08377 null
2025-10-09 InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing Haoran Yu et.al. 2510.08181 null
2025-10-09 Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing Zhentao Zou et.al. 2510.08157 null
2025-10-08 DreamOmni2: Multimodal Instruction-based Editing and Generation Bin Xia et.al. 2510.06679 null
2025-10-07 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Yi Xin et.al. 2510.06308 null
2025-10-07 Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling Young D. Kwon et.al. 2510.06295 null
2025-10-07 Diffusion-Based Image Editing for Breaking Robust Watermarks Yunyi Ni et.al. 2510.05978 null
2025-10-07 When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach Daniel Gonzálbez-Biosca et.al. 2510.05661 null
2025-10-06 SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder Ronen Kamenetsky et.al. 2510.05081 null
2025-10-05 ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Jay Zhangjie Wu et.al. 2510.04290 null
2025-10-05 Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers Shikang Zheng et.al. 2510.04188 null
2025-10-05 Prompt-to-Prompt: Text-Based Image Editing Via Cross-Attention Mechanisms -- The Research of Hyperparameters and Novel Mechanisms to Enhance Existing Frameworks Linn Bieske et.al. 2510.04034 null
2025-10-04 From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance Ardalan Aryashad et.al. 2510.03906 null
2025-10-04 Rare Text Semantics Were Always There in Your Diffusion Transformer Seil Kang et.al. 2510.03886 null
2025-10-03 DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing Qi Li et.al. 2510.04797 null
2025-10-03 OTR: Synthesizing Overlay Text Dataset for Text Removal Jan Zdenek et.al. 2510.02787 null
2025-10-02 DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Zihan Zhou et.al. 2510.02253 null
2025-10-02 Towards Better Optimization For Listwise Preference in Diffusion Models Jiamu Bai et.al. 2510.01540 null
2025-10-02 VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing Abdelilah Aitrouga et.al. 2509.25998 null
2025-10-01 IMAGEdit: Let Any Subject Transform Fei Shen et.al. 2510.01186 null
2025-10-01 EditTrack: Detecting and Attributing AI-assisted Image Editing Zhengyuan Jiang et.al. 2510.01173 null
2025-10-01 DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models Seunghoo Hong et.al. 2510.00778 null
2025-10-01 CAMILA: Context-Aware Masking for Image Editing with Language Alignment Hyunseung Kim et.al. 2509.19731 null
2025-09-30 EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing Keming Wu et.al. 2509.26346 null
2025-09-30 Training-Free Reward-Guided Image Editing via Trajectory Optimal Control Jinho Chang et.al. 2509.25845 null
2025-09-30 Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation Mingyu Kang et.al. 2509.25776 null
2025-09-30 Dragging with Geometry: From Pixels to Geometry-Guided Image Editing Xinyu Pu et.al. 2509.25740 null
2025-09-30 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling Xin Luo et.al. 2509.23909 null
2025-09-30 FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing Junyi Wu et.al. 2509.22244 null
2025-09-29 Training-Free Multimodal Guidance for Video to Audio Generation Eleonora Grassucci et.al. 2509.24550 null
2025-09-29 Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency Jiaqi Tan et.al. 2509.24514 null
2025-09-29 Latent Visual Reasoning Bangzheng Li et.al. 2509.24251 null
2025-09-28 Visual CoT Makes VLMs Smarter but More Fragile Chunxue Xu et.al. 2509.23789 null
2025-09-28 Seedream 4.0: Toward Next-generation Multimodal Image Generation Team Seedream et.al. 2509.20427 null
2025-09-27 Object-AVEdit: An Object-level Audio-Visual Editing Model Youquan Fu et.al. 2510.00050 null
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Zhehao Dong et.al. 2509.22407 null
2025-09-26 SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks Jialiang Li et.al. 2509.21928 null
2025-09-26 Taming Flow-based I2V Models for Creative Video Editing Xianghao Kong et.al. 2509.21917 null
2025-09-26 TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation Qihang Wang et.al. 2509.21905 null
2025-09-25 FreeInsert: Personalized Object Insertion with Geometric and Style Control Yuhong Zhang et.al. 2509.20756 null
2025-09-25 ArtUV: Artist-style UV Unwrapping Yuguang Chen et.al. 2509.20710 null
2025-09-25 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju et.al. 2509.20360 null
2025-09-25 Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation Yuanhuiyi Lyu et.al. 2509.18639 null
2025-09-24 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation Shufan Li et.al. 2509.19244 null
2025-09-23 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Yanzuo Lu et.al. 2509.18824 null
2025-09-23 GeoRemover: Removing Objects and Their Causal Visual Artifacts Zixin Zhu et.al. 2509.18538 null
2025-09-22 Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance Hongxing Fan et.al. 2509.17757 null
2025-09-20 Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media Zihan Ding et.al. 2509.16811 null
2025-09-20 V-CECE: Visual Counterfactual Explanations via Conceptual Edits Nikolaos Spanos et.al. 2509.16567 null
2025-09-19 Neural Atlas Graphs for Dynamic Scene Decomposition and Editing Jan Philipp Schneider et.al. 2509.16336 null
2025-09-19 Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution Chang Soo Lim et.al. 2509.15781 null
2025-09-18 AutoEdit: Automatic Hyperparameter Tuning for Image Editing Chau Pham et.al. 2509.15031 null
2025-09-18 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks Mingsong Li et.al. 2509.14638 null
2025-09-18 End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection Fei Wang et.al. 2509.13214 null
2025-09-17 Controllable-Continuous Color Editing in Diffusion Model via Color Mapping Yuqi Yang et.al. 2509.13756 null
2025-09-17 LLM-I: LLMs are Naturally Interleaved Multimodal Creators Zirun Guo et.al. 2509.13642 null
2025-09-16 EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing Tianyu Chen et.al. 2509.13399 null
2025-09-16 Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder Qifei Jia et.al. 2509.12883 null
2025-09-16 Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations Jinjie Shen et.al. 2509.12653 null
2025-09-15 LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Zixin Yin et.al. 2509.12203 null
2025-09-13 EditDuet: A Multi-Agent System for Video Non-Linear Editing Marcelo Sandoval-Castaneda et.al. 2509.10761 null
2025-09-12 Immunizing Images from Text to Image Editing via Adversarial Cross-Attention Matteo Trippodo et.al. 2509.10359 null
2025-09-10 RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts Lauren H. Cooke et.al. 2509.08640 null
2025-09-09 Delta Velocity Rectified Flow for Text-to-Image Editing Gaspard Beaudouin et.al. 2509.05342 null
2025-09-04 Improved 3D Scene Stylization via Text-Guided Generative Image Editing with Region-Based Control Haruo Fujiwara et.al. 2509.05285 null
2025-09-04 Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping Jingyi Lu et.al. 2509.04582 null
2025-09-04 From Editor to Dense Geometry Estimator JiYuan Wang et.al. 2509.04338 null
2025-09-03 Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing Quan Dao et.al. 2509.01984 null
2025-09-02 Fidelity-preserving enhancement of ptychography with foundational text-to-image models Ming Du et.al. 2509.04513 null
2025-09-02 Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination Ziyun Zeng et.al. 2509.01986 null
2025-09-01 O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing Yuqing Chen et.al. 2509.01596 null
2025-09-01 Neural Scene Designer: Self-Styled Semantic Image Manipulation Jianman Lin et.al. 2509.01405 null
2025-08-30 LatentEdit: Adaptive Latent Control for Consistent Semantic Editing Siyi Liu et.al. 2509.00541 null
2025-08-28 Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation Chenfan Qu et.al. 2508.20987 null
2025-08-28 Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent En Ci et.al. 2508.20505 null
2025-08-28 Audio-Guided Visual Editing with Complex Multi-Modal Prompts Hyeonyu Kim et.al. 2508.20379 null
2025-08-27 Not Every Gift Comes in Gold Paper or with a Red Ribbon: Exploring Color Perception in Text-to-Image Models Shay Shomer Chai et.al. 2508.19791 null
2025-08-25 ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models Haitang Feng et.al. 2508.18271 null
2025-08-25 SpotEdit: Evaluating Visually-Guided Image Editing Methods Sara Ghazanfari et.al. 2508.18159 null
2025-08-24 An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing Zihan Liang et.al. 2508.17435 null
2025-08-24 Defending Deepfake via Texture Feature Perturbation Xiao Zhang et.al. 2508.17315 null
2025-08-24 PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing Peilin Xiong et.al. 2508.17302 null
2025-08-21 Visual Autoregressive Modeling for Instruction-Guided Image Editing Qingyang Mao et.al. 2508.15772 null
2025-08-20 AnchorSync: Global Consistency Optimization for Long Video Editing Zichi Liu et.al. 2508.14609 null
2025-08-20 DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing Weitao Wang et.al. 2508.14465 null
2025-08-19 Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing Feng-Lin Liu et.al. 2508.13797 null
2025-08-18 Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score Syed Muhmmad Israr et.al. 2508.12718 null
2025-08-18 TimeMachine: Fine-Grained Facial Age Editing with Identity Preservation Yilin Mi et.al. 2508.11284 null
2025-08-18 NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale NextStep Team et.al. 2508.10711 null
2025-08-16 PEdger++: Practical Edge Detection via Assembling Cross Information Yuanbin Fu et.al. 2508.11961 null
2025-08-14 LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters Haomin Zhang et.al. 2508.11074 null
2025-08-14 A Segmentation-driven Editing Method for Bolt Defect Augmentation and Detection Yangjie Xiao et.al. 2508.10509 null
2025-08-14 TweezeEdit: Consistent and Efficient Image Editing with Path Regularization Jianda Mao et.al. 2508.10498 null
2025-08-13 LIA-X: Interpretable Latent Portrait Animator Yaohui Wang et.al. 2508.09959 null
2025-08-12 Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control Zeqian Long et.al. 2508.08134 null
2025-08-12 Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation Fangyuan Mao et.al. 2508.07981 null
2025-08-11 X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning Jian Ma et.al. 2508.07607 null
2025-08-11 Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing Joonghyuk Shin et.al. 2508.07519 null
2025-08-10 CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization Youqi Wang et.al. 2508.07413 null
2025-08-10 Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers Xin Ma et.al. 2508.07246 null
2025-08-09 CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing Weiyan Xie et.al. 2508.06937 null
2025-08-09 Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing Shichao Ma et.al. 2508.06916 null
2025-08-08 UGD-IML: A Unified Generative Diffusion-based Framework for Constrained and Unconstrained Image Manipulation Localization Yachun Mi et.al. 2508.06101 null
2025-08-08 DreamVE: Unified Instruction-based Image and Video Editing Bin Xia et.al. 2508.06080 null
2025-08-08 NEP: Autoregressive Image Editing via Next Editing Token Prediction Huimin Wu et.al. 2508.06044 null
2025-08-08 InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow Yiming Gong et.al. 2508.06033 null
2025-08-05 Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation Peiyu Wang et.al. 2508.03320 null
2025-08-05 Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation Jun Luo et.al. 2508.03300 null
2025-08-05 LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing Liangyang Ouyang et.al. 2508.03144 null
2025-08-05 UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying Chengyu Bai et.al. 2508.03142 null
2025-08-05 The Promise of RL for Autoregressive Image Editing Saba Ahmadi et.al. 2508.01119 null
2025-08-04 Transport-Guided Rectified Flow Inversion: Improved Image Editing Using Optimal Transport Theory Marian Lupascu et.al. 2508.02363 null
2025-08-04 Qwen-Image Technical Report Chenfei Wu et.al. 2508.02324 null
2025-08-01 Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence Danzhen Fu et.al. 2508.00299 null
2025-08-01 Towards Robust Semantic Correspondence: A Benchmark and Insights Wenyue Chong et.al. 2508.00272 null
2025-08-01 Training-free Geometric Image Editing on Diffusion Models Hanshen Zhu et.al. 2507.23300 null
2025-07-31 UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing Hao Tang et.al. 2507.23278 null
2025-07-29 Low-Cost Test-Time Adaptation for Robust Video Editing Jianhui Wang et.al. 2507.21858 null
2025-07-29 From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos Chenjian Gao et.al. 2507.20331 null
2025-07-28 GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset Yuhan Wang et.al. 2507.21033 null
2025-07-28 ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation Sherry X. Chen et.al. 2507.07317 null
2025-07-25 HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback Elham Soltani Kazemi et.al. 2507.18921 null
2025-07-23 Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling Yi Xin et.al. 2507.17801 null
2025-07-22 ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement Kahim Wong et.al. 2507.16397 null
2025-07-22 Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling Chao Zhou et.al. 2507.16240 null
2025-07-22 LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs Zitong Xu et.al. 2507.16193 null
2025-07-20 Light Future: Multimodal Action Frame Prediction via InstructPix2Pix Zesen Zhong et.al. 2507.14809 null
2025-07-18 NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining Maksim Kuprashevich et.al. 2507.14119 null
2025-07-18 Moodifier: MLLM-Enhanced Emotion-Driven Image Editing Jiarong Ye et.al. 2507.14024 null
2025-07-16 MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing Shreya Kadambi et.al. 2507.13401 null
2025-07-15 EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing Vassilis Sioros et.al. 2507.11096 null
2025-07-14 Sparse Fine-Tuning of Transformers for Generative Tasks Wei Chen et.al. 2507.10855 null
2025-07-14 LayLens: Improving Deepfake Understanding through Simplified Explanations Abhijeet Narang et.al. 2507.10066 null
2025-07-11 FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields Gwanhyeong Koo et.al. 2507.08285 null
2025-07-08 2D Instance Editing in 3D Space Yuhuan Xie et.al. 2507.05819 null
2025-07-07 Neural-Driven Image Editing Pengfei Zhou et.al. 2507.05397 null
2025-07-07 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing Chun-Hsiao Yeh et.al. 2507.05259 null
2025-07-07 S$^2$Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control Xudong Liu et.al. 2507.04584 null
2025-07-04 Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images Yuran Dong et.al. 2507.03402 null
2025-07-04 LACONIC: A 3D Layout Adapter for Controllable Image Creation Léopold Maillard et.al. 2507.03257 null
2025-07-03 From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding Xiangfeng Wang et.al. 2507.02790 null
2025-07-02 Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning Qingdong He et.al. 2507.01908 null
2025-07-02 ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation Jimyeong Kim et.al. 2507.01496 null
2025-07-02 QC-OT: Optimal Transport with Quasiconformal Mapping Yuping Lv et.al. 2507.01456 null
2025-07-01 Ovis-U1 Technical Report Guo-Hua Wang et.al. 2506.23044 null
2025-06-30 A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement Gaozheng Pei et.al. 2506.23676 null
2025-06-30 TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity Yuzhuo Chen et.al. 2506.23484 null
2025-06-29 OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions Yuanhao Cai et.al. 2506.23361 null
2025-06-29 Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis Lei-lei Li et.al. 2506.23263 null
2025-06-28 Towards Explainable Bilingual Multimodal Misinformation Detection and Localization Yiwei He et.al. 2506.22930 null
2025-06-28 STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing Junsung Lee et.al. 2506.22868 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432 null
2025-06-27 GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles Mengyi Shan et.al. 2506.21839 null
2025-06-27 DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing Lingling Cai et.al. 2506.20967 null
2025-06-26 Controllable 3D Placement of Objects with Scene-Aware Diffusion Models Mohamed Omran et.al. 2506.21446 null
2025-06-26 Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Hansam Cho et.al. 2506.21045 null
2025-06-26 M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization Ju-Hyeon Nam et.al. 2506.20922 null
2025-06-26 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Advait Gupta et.al. 2506.20911 null
2025-06-26 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Jiacheng Chen et.al. 2506.17450 null
2025-06-25 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View Roi Bar-On et.al. 2506.20652 null
2025-06-25 Towards Efficient Exemplar Based Image Editing with Multimodal VLMs Avadhoot Jadhav et.al. 2506.20155 null
2025-06-25 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-24 SceneCrafter: Controllable Multi-View Driving Scene Editing Zehao Zhu et.al. 2506.19488 null
2025-06-24 LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Chenjian Gao et.al. 2506.10082 null
2025-06-23 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models Ilia Beletskii et.al. 2506.19103 null
2025-06-23 Let Your Video Listen to Your Music! Xinyu Zhang et.al. 2506.18881 null
2025-06-23 CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing Dinh-Khoi Vo et.al. 2506.18438 null
2025-06-23 Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction Han Zhang et.al. 2506.18290 null
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Fan Yang et.al. 2506.16806 null
2025-06-19 Arch-Router: Aligning LLM Routing with Human Preferences Co Tran et.al. 2506.16655 null
2025-06-18 VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics Josef Kuchař et.al. 2506.15903 null
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-16 AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing Biao Yang et.al. 2506.13301 null
2025-06-15 Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing Zhuoying Li et.al. 2506.13827 null
2025-06-15 ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies Chenglin Wang et.al. 2506.12830 null
2025-06-14 Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts Saemee Choi et.al. 2506.12520 null
2025-06-13 SphereDrag: Spherical Geometry-Aware Panoramic Image Editing Zhiao Feng et.al. 2506.11863 null
2025-06-13 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-12 VINCIE: Unlocking In-context Image Editing from Video Leigang Qu et.al. 2506.10941 null
2025-06-12 Edit360: 2D Image Edits to 3D Assets from Any Angle Junchao Huang et.al. 2506.10507 null
2025-06-12 Towards Reliable Identification of Diffusion-based Image Manipulations Alex Costanzino et.al. 2506.05466 null
2025-06-11 EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits Ron Yosef et.al. 2506.09988 null
2025-06-11 ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models Qin Zhou et.al. 2506.09740 null
2025-06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Inclusion AI et.al. 2506.09344 link
2025-06-11 Fine-Grained Spatially Varying Material Selection in Images Julia Guerrero-Viu et.al. 2506.09023 null
2025-06-10 Do Concept Replacement Techniques Really Erase Unacceptable Concepts? Anudeep Das et.al. 2506.08991 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-09 Highly Compressed Tokenizer Can Generate Without Training L. Lao Beyer et.al. 2506.08257 link
2025-06-09 PairEdit: Learning Semantic Variations for Exemplar-based Image Editing Haoguang Lu et.al. 2506.07992 link
2025-06-09 Diffusion Counterfactual Generation with Semantic Abduction Rajat Rasal et.al. 2506.07883 link
2025-06-09 DragNeXt: Rethinking Drag-Based Image Editing Yuan Zhou et.al. 2506.07611 null
2025-06-09 Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding Boyu Chen et.al. 2506.07576 null
2025-06-08 Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning Tianyi Bai et.al. 2506.07227 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-06 Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Yifu Qiu et.al. 2506.06006 link
2025-06-06 FADE: Frequency-Aware Diffusion Model Factorization for Video Editing Yixuan Zhu et.al. 2506.05934 link
2025-06-06 SeedEdit 3.0: Fast and High-Quality Generative Image Editing Peng Wang et.al. 2506.05083 null
2025-06-05 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing Guangzhao Li et.al. 2506.05046 null
2025-06-05 Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking Yu-Feng Chen et.al. 2506.04879 link
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-04 HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation Hermann Kumbong et.al. 2506.04421 null
2025-06-04 Is Perturbation-Based Image Protection Disruptive to Image Editing? Qiuyu Tang et.al. 2506.04394 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-04 Image Editing As Programs with Diffusion Models Yujia Hu et.al. 2506.04158 null
2025-06-04 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Bin Lin et.al. 2506.03147 null
2025-06-04 MedEBench: Revisiting Text-instructed Image Editing on Medical Domain Minghao Liu et.al. 2506.01921 null
2025-06-03 RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions Bimsara Pathiraja et.al. 2506.03448 null
2025-06-03 ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions Di Chang et.al. 2506.03107 null
2025-06-03 DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing Zixiang Li et.al. 2506.02560 null
2025-06-03 RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Yan Gong et.al. 2506.02528 null
2025-06-02 IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout Fei Shen et.al. 2506.01949 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-02 Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation Kaihang Pan et.al. 2506.01480 null
2025-06-02 DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing Chenxi Xie et.al. 2506.01430 null
2025-06-01 Motion-Aware Concept Alignment for Consistent Video Editing Tong Zhang et.al. 2506.01004 null
2025-05-31 Concept-Centric Token Interpretation for Vector-Quantized Generative Models Tianze Yang et.al. 2506.00698 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-29 Cora: Correspondence-aware image editing using few step diffusion Amirhossein Almohammadi et.al. 2505.23907 null
2025-05-29 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers Yusuf Dalva et.al. 2505.23758 null
2025-05-29 Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features Ziyong Wang et.al. 2505.23586 null
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing Jeongsol Kim et.al. 2505.23145 link
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-28 HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer Qi Cai et.al. 2505.22705 link
2025-05-28 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use Mingyuan Wu et.al. 2505.19255 null
2025-05-27 Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion Yang Yang et.al. 2505.21593 null
2025-05-27 Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks Kyzyl Monteiro et.al. 2505.20916 null
2025-05-27 InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling Xiaoxiao Jiang et.al. 2505.20600 null
2025-05-26 What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models Lorenzo Baraldi et.al. 2505.20405 null
2025-05-26 ImgEdit: A Unified Image Editing Dataset and Benchmark Yang Ye et.al. 2505.20275 link
2025-05-26 StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation Yi Wu et.al. 2505.19874 null
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 Understanding Generative AI Capabilities in Everyday Image Editing Tasks Mohammad Reza Taesiri et.al. 2505.16181 null
2025-05-25 Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions Chenrui Ma et.al. 2505.19352 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection Shuyu Wang et.al. 2505.19149 null
2025-05-24 REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing Weihan Xu et.al. 2505.18880 null
2025-05-24 Affective Image Editing: Shaping Emotional Factors via Text Descriptions Peixuan Zhang et.al. 2505.18699 null
2025-05-24 Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility Yiheng Li et.al. 2505.18521 link
2025-05-23 DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval Yuxin Yang et.al. 2505.17796 null
2025-05-23 R-Genie: Reasoning-Guided Generative Image Editing Dong Zhang et.al. 2505.17768 null
2025-05-22 KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Yongliang Wu et.al. 2505.16707 null
2025-05-21 FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models Zhen Sun et.al. 2505.15644 link
2025-05-20 DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model Siwei Xia et.al. 2505.12427 link
2025-05-20 CompBench: Benchmarking Complex Instruction-guided Image Editing Bohan Jia et.al. 2505.12200 null
2025-05-18 From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations Yuzhi Li et.al. 2505.12237 null
2025-05-16 X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models Valentina Bazyleva et.al. 2505.11753 null
2025-05-16 GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing Yusu Qian et.al. 2505.11493 null
2025-05-15 3D-Fixup: Advancing Photo Editing with 3D Priors Yen-Chi Cheng et.al. 2505.10566 null
2025-05-15 IntrinsicEdit: Precise generative image manipulation in intrinsic space Linjie Lyu et.al. 2505.08889 null
2025-05-14 Don't Forget your Inverse DDIM for Image Editing Guillermo Gomez-Trenado et.al. 2505.09571 null
2025-05-12 MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models Hongyang Zhu et.al. 2505.05101 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation Chao Liao et.al. 2505.05472 null
2025-05-08 GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing Tong Wang et.al. 2505.04915 null
2025-05-07 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers Divyansh Srivastava et.al. 2505.04718 null
2025-05-07 Multi-turn Consistent Image Editing Zijun Zhou et.al. 2505.04320 null
2025-05-07 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Inclusion AI et.al. 2505.02471 link
2025-05-06 MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models Jhon Lopez et.al. 2505.15822 null
2025-05-06 Step1X-Edit: A Practical Framework for General Image Editing Shiyu Liu et.al. 2504.17761 link
2025-05-05 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Ming Li et.al. 2505.02370 link
2025-05-04 Video Forgery Detection for Surveillance Cameras: A Review Noor B. Tayfor et.al. 2505.03832 null
2025-05-02 Improving Editability in Image Generation with Layer-wise Memory Daneul Kim et.al. 2505.01079 null
2025-05-02 A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories Ziqi Ding et.al. 2505.01067 null
2025-05-02 Photoshop Batch Rendering Using Actions for Stylistic Video Editing Tessa De La Fuente et.al. 2505.01001 null
2025-05-01 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 Towards Scalable Human-aligned Benchmark for Text-guided Image Editing Suho Ryu et.al. 2505.00502 link
2025-04-30 PixelHacker: Image Inpainting with Structural and Semantic Consistency Ziyang Xu et.al. 2504.20438 null
2025-04-29 In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Zechuan Zhang et.al. 2504.20690 null
2025-04-27 CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes Tuan Nguyen et.al. 2504.19212 null
2025-04-26 REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models Gal Almog et.al. 2504.18989 link
2025-04-24 DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing Aniruddha Bala et.al. 2504.17894 null
2025-04-24 VEU-Bench: Towards Comprehensive Understanding of Video Editing Bozheng Li et.al. 2504.17828 null
2025-04-24 Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields Zhuo He et.al. 2504.17712 null
2025-04-24 Enhancing Variational Autoencoders with Smooth Robust Latent Encoding Hyomin Lee et.al. 2504.17219 null
2025-04-24 Vidi: Large Multimodal Models for Video Understanding and Editing Vidi Team et.al. 2504.15681 null
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models Dasol Jeong et.al. 2504.15723 null
2025-04-21 MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World Ankit Dhiman et.al. 2504.15397 null
2025-04-21 Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach Lvpan Cai et.al. 2504.11922 link
2025-04-20 MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation Siyi Jiao et.al. 2504.14606 null
2025-04-19 Visual Prompting for One-shot Controllable Video Editing without Inversion Zhengbo Zhang et.al. 2504.14335 null
2025-04-19 PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling Alara Dirik et.al. 2504.14219 null
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-18 Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing Joowon Kim et.al. 2504.13490 null
2025-04-17 Image Editing with Diffusion Models: A Survey Jia Wang et.al. 2504.13226 null
2025-04-17 Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Siwei Yang et.al. 2504.13143 null
2025-04-17 UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models Guanlong Jiao et.al. 2504.13109 null
2025-04-17 Image-Editing Specialists: An RLAIF Approach for Diffusion Models Elior Benarous et.al. 2504.12833 link
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Mengshi Qi et.al. 2504.12080 link
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-14 Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing Taihang Hu et.al. 2504.10434 link
2025-04-14 Analysis of Attention in Video Diffusion Transformers Yuxin Wen et.al. 2504.10317 null
2025-04-14 TAPNext: Tracking Any Point (TAP) as Next Token Prediction Artem Zholus et.al. 2504.05579 null
2025-04-13 SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow Kenan Tang et.al. 2504.09697 link
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model Ruohao Zhan et.al. 2504.08259 null
2025-04-10 POEM: Precise Object-level Editing via MLLM control Marco Schouten et.al. 2504.08111 null
2025-04-10 Learning Universal Features for Generalizable Image Forgery Localization Hengrun Zhao et.al. 2504.07462 link
2025-04-10 Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing Chenxi Sun et.al. 2504.07424 null
2025-04-09 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution Gene Chou et.al. 2504.07093 link
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Qi Mao et.al. 2504.05594 null
2025-04-08 Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Xiangyu Zhao et.al. 2504.02826 link
2025-04-07 CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models Kavana Venkatesh et.al. 2504.05306 null
2025-04-07 Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing Hui Liu et.al. 2504.04784 null
2025-04-07 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-04 Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices Yang He et.al. 2504.03155 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Xianwei Zhuang et.al. 2504.02949 link
2025-04-03 Concept Lancet: Image Editing with Compositional Representation Transplant Jinqi Luo et.al. 2504.02828 null
2025-04-03 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782 link
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 FreSca: Unveiling the Scaling Space in Diffusion Models Chao Huang et.al. 2504.02154 null
2025-04-02 A Diffusion-Based Framework for Occluded Object Movement Zheng-Peng Duan et.al. 2504.01873 null
2025-03-31 AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents Jiaxiang Chen et.al. 2503.23948 link
2025-03-31 Training-Free Text-Guided Image Editing with Visual Autoregressive Model Yufei Wang et.al. 2503.23897 link
2025-03-30 Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging Amar Kumar et.al. 2503.23618 null
2025-03-30 ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 Tianming Liang et.al. 2503.23509 link
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 FreeInv: Free Lunch for Improving DDIM Inversion Yuxiang Bao et.al. 2503.23035 null
2025-03-29 FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model Jun Zhou et.al. 2503.19839 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-28 LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing Achint Soni et.al. 2503.21541 link
2025-03-26 Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising Yan-Bo Lin et.al. 2503.20782 null
2025-03-26 EditCLIP: Representation Learning for Image Editing Qian Wang et.al. 2503.20318 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction Yuhui Wu et.al. 2503.20287 link
2025-03-25 Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning Sherry X. Chen et.al. 2503.18406 link
2025-03-25 Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods Yuzhi Li et.al. 2503.17975 null
2025-03-24 FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing Yufan Ren et.al. 2503.19191 null
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-24 MaSS13K: A Matting-level Semantic Segmentation Benchmark Chenxi Xie et.al. 2503.18364 link
2025-03-23 Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance Harang Ju et.al. 2503.18238 link
2025-03-23 What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images Dongheng Lin et.al. 2503.17899 null
2025-03-23 Multi-focal Conditioned Latent Diffusion for Person Image Synthesis Jiaqi Liu et.al. 2503.15686 link
2025-03-22 InstructVEdit: A Holistic Approach for Instructional Video Editing Chi Zhang et.al. 2503.17641 null
2025-03-22 Guidance Free Image Editing via Explicit Conditioning Mehdi Noroozi et.al. 2503.17593 null
2025-03-21 HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks Maria Pilligua et.al. 2503.17276 null
2025-03-21 DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics Yihan Hu et.al. 2503.16795 null
2025-03-20 FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing Tianyi Wei et.al. 2503.16153 null
2025-03-20 Single Image Iterative Subject-driven Generation and Editing Yair Shpitzer et.al. 2503.16025 link
2025-03-19 VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation Shoubin Yu et.al. 2503.14350 null
2025-03-18 ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan et.al. 2503.14482 null
2025-03-18 TarPro: Targeted Protection against Malicious Image Editing Kaixin Shen et.al. 2503.13994 null
2025-03-17 FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models Minghan Li et.al. 2503.13684 null
2025-03-17 Unified Autoregressive Visual Generation and Understanding with Continuous Tokens Lijie Fan et.al. 2503.13436 null
2025-03-17 Edit Transfer: Learning Image Editing via Vision In-Context Relations Lan Chen et.al. 2503.13327 null
2025-03-17 GIFT: Generated Indoor video frames for Texture-less point tracking Jianzheng Huang et.al. 2503.12944 null
2025-03-17 DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode Junjia Huang et.al. 2503.12838 null
2025-03-16 UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Tsu-Jui Fu et.al. 2503.12652 null
2025-03-16 Personalize Anything for Free with Diffusion Transformer Haoran Feng et.al. 2503.12590 null
2025-03-14 Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities Ruchika Chavhan et.al. 2503.11905 null
2025-03-14 RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing Tianrui Pan et.al. 2503.11571 null
2025-03-14 LUSD: Localized Update Score Distillation for Text-Guided Image Editing Worameth Chinchuthakun et.al. 2503.11054 link
2025-03-14 V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes Yanming Zhang et.al. 2503.10634 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-13 Fine-Tuning Diffusion Generative Models via Rich Preference Optimization Hanyang Zhao et.al. 2503.11720 null
2025-03-13 CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Advait Gupta et.al. 2503.10613 link
2025-03-13 EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing Zexuan Yan et.al. 2503.10270 link
2025-03-13 MoEdit: On Learning Quantity Perception for Multi-object Image Editing Yanfeng Li et.al. 2503.10112 link
2025-03-13 Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models Armando Fortes et.al. 2503.08434 null
2025-03-12 Alias-Free Latent Diffusion Models:Improving Fractional Shift Equivariance of Diffusion Latent Space Yifan Zhou et.al. 2503.09419 link
2025-03-12 InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images Jiun Tian Hoe et.al. 2503.09130 null
2025-03-12 OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting Yongsheng Yu et.al. 2503.08677 null
2025-03-11 Aligning Text to Image in Diffusion Models is Easier Than You Think Jaa-Yeon Lee et.al. 2503.08250 link
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement Chenrui Ma et.al. 2503.07938 null
2025-03-11 VACE: All-in-One Video Creation and Editing Zeyinzi Jiang et.al. 2503.07598 null
2025-03-10 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Lixue Gong et.al. 2503.07703 null
2025-03-10 TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Victor Shea-Jay Huang et.al. 2503.07050 null
2025-03-10 Interactive Tumor Progression Modeling via Sketch-Based Image Editing Gexin Huang et.al. 2503.06809 null
2025-03-10 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Yuxuan Bian et.al. 2503.05639 link
2025-03-09 Consistent Image Layout Editing with Diffusion Models Tao Xia et.al. 2503.06419 null
2025-03-08 Get In Video: Add Anything You Want to the Video Shaobin Zhuang et.al. 2503.06268 null
2025-03-08 X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation Jian Ma et.al. 2503.06134 link
2025-03-07 Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients Niklas Penzel et.al. 2503.05424 null
2025-03-06 Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models Rui Jiang et.al. 2503.04215 null
2025-03-05 GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors Yaopei Zeng et.al. 2503.03944 null
2025-03-04 h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform Toan Nguyen et.al. 2503.02187 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-01 GenVDM: Generating Vector Displacement Maps From a Single Image Yuezhi Yang et.al. 2503.00605 null
2025-02-27 Tight Inversion: Image-Conditioned Inversion for Real Image Editing Edo Kadosh et.al. 2502.20376 null
2025-02-27 Identity-preserving Distillation Sampling by Fixed-Point Iterator SeonHwa Kim et.al. 2502.19930 null
2025-02-26 SVGEditBench V2: A Benchmark for Instruction-based SVG Editing Kunato Nishina et.al. 2502.19453 link
2025-02-26 Bayesian Optimization for Controlled Image Editing via LLMs Chengkun Cai et.al. 2502.18116 null
2025-02-25 KV-Edit: Training-Free Image Editing for Precise Background Preservation Tianrui Zhu et.al. 2502.17363 link
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-23 PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Shijie Huang et.al. 2502.14397 link
2025-02-22 DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation Yuxuan Xiong et.al. 2502.16302 null
2025-02-18 AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks Ming Xie et.al. 2502.11158 null
2025-02-14 PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control Kunal Swami et.al. 2502.10258 null
2025-02-14 VideoDiff: Human-AI Video Co-Creation with Alternatives Mina Huh et.al. 2502.10190 null
2025-02-14 Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training Rodrigo Santos et.al. 2502.10064 null
2025-02-14 SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment Tica Lin et.al. 2502.08621 null
2025-02-10 Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists Bojia Zi et.al. 2502.06734 null
2025-02-10 Predictive Red Teaming: Breaking Policies Without Breaking Robots Anirudha Majumdar et.al. 2502.06575 null
2025-02-08 AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection Shuheng Zhang et.al. 2502.05433 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-06 PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models Aleksandar Cvejic et.al. 2502.04050 null
2025-02-06 DICE: Distilling Classifier-Free Guidance into Text Embeddings Zhenyu Zhou et.al. 2502.03726 null
2025-02-05 Lost in Edits? A $\lambda$-Compass for AIGC Provenance Wenhao You et.al. 2502.04364 null
2025-02-05 REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations Peter Sushko et.al. 2502.03629 null
2025-02-04 Exploring the latent space of diffusion models directly through singular value decomposition Li Wang et.al. 2502.02225 null
2025-02-04 EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues Rohit Girmaji et.al. 2502.02172 null
2025-02-04 Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation JooHyun Kwon et.al. 2502.02091 null
2025-01-30 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Ruofan Liang et.al. 2501.18590 null
2025-01-24 MATCHA:Towards Matching Anything Fei Xue et.al. 2501.14945 null
2025-01-24 Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* Ludovica Schaerf et.al. 2501.14524 null
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920 null
2025-01-09 Edit as You See: Image-guided Video Editing via Masked Motion Modeling Zhi-Lin Huang et.al. 2501.04325 null
2024-11-19 StableV2V: Stablizing Shape Consistency in Video-to-Video Editing Chang Liu et.al. 2411.11045 null
2024-08-29 Edit Temporal-Consistent Videos with Image Diffusion Model Yuanzhi Wang et.al. 2308.09091 null
2024-06-21 A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models Xincheng Shuai et.al. 2406.14555 null
2024-04-22 GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models Sai Sree Harsha et.al. 2404.12541 null
2024-03-04 FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing Yuren Cong et.al. 2310.05922 null
2024-02-20 Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts Yuyang Zhao et.al. 2305.08850 null
2024-01-19 Edit One for All: Interactive Batch Image Editing Thao Nguyen et.al. 2401.10219 null
2023-12-08 DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing Shao-Yu Chang et.al. 2312.03772 null
2023-10-12 FateZero: Fusing Attentions for Zero-shot Text-based Video Editing Chenyang Qi et.al. 2303.09535 null
2023-08-11 InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing Anant Khandelwal et.al. 2308.00135 null
2023-03-28 Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding Gyeongman Kim et.al. 2212.02802 null
2023-03-01 AVscript: Accessible Video Editing with Audio-Visual Scripts Mina Huh et.al. 2302.14117 null
2023-01-31 Shape-aware Text-driven Layered Video Editing Yao-Chih Lee et.al. 2301.13173 null
2022-06-22 Temporally Consistent Semantic Video Editing Yiran Xu et.al. 2206.10590 null
2022-05-26 Text2LIVE: Text-Driven Layered Image and Video Editing Omer Bar-Tal et.al. 2204.02491 null
2022-01-11 Video-Specific Autoencoders for Exploring, Editing and Transmitting Videos Kevin Wang et.al. 2103.17261 null
2021-08-18 A Latent Transformer for Disentangled Face Editing in Images and Videos Xu Yao et.al. 2106.11895 null
2021-04-22 Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions Xihui Liu et.al. 2008.01576 null
2020-08-10 Image2StyleGAN++: How to Edit the Embedded Images? Rameen Abdal et.al. 1911.11544 null

(back to top)

Others

Publish Date Title Authors PDF Code
2025-11-18 UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning Rui Tian et.al. 2511.14760 null
2025-11-18 Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers Yutian Chen et.al. 2511.14751 null
2025-11-18 Graph Neural Networks for Vehicular Social Networks: Trends, Challenges, and Opportunities Elham Binshaflout et.al. 2511.14720 null
2025-11-18 Natural Language Interfaces for Databases: What Do Users Think? Panos Ipeirotis et.al. 2511.14718 null
2025-11-18 Talk, Snap, Complain: Validation-Aware Multimodal Expert Framework for Fine-Grained Customer Grievances Rishu Kumar Singh et.al. 2511.14693 null
2025-11-18 Giant enhancement of attosecond tunnel ionization competes with disorder-driven decoherence in silicon D. N. Purschke et.al. 2511.14678 null
2025-11-18 M-CALLM: Multi-level Context Aware LLM Framework for Group Interaction Prediction Diana Romero et.al. 2511.14661 null
2025-11-18 Robust Offset-free Kernelized Data-Driven Predictive Control for Nonlinear Systems Mahmood Mazare et.al. 2511.14652 null
2025-11-18 Real-time time-dependent density functional theory for high-energy density physics Alina Kononov et.al. 2511.14643 null
2025-11-18 Enhancing Agentic Autonomous Scientific Discovery with Vision-Language Model Capabilities Kahaan Gandhi et.al. 2511.14631 null
2025-11-18 Scalable Enforcement of Fine Grained Access Control Policies in Relational Database Management Systems Anadi Shakya et.al. 2511.14629 null
2025-11-18 XAttn-BMD: Multimodal Deep Learning with Cross-Attention for Femoral Neck Bone Mineral Density Estimation Yilin Zhang et.al. 2511.14604 null
2025-11-18 Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language Minyoung Hwang et.al. 2511.14565 null
2025-11-18 Full Atom Peptide Design via Riemannian Euclidean Bayesian Flow Networks Hao Qian et.al. 2511.14516 null
2025-11-18 Neural network impurity solver for real-frequency dynamical mean-field theory Fenglin Deng et.al. 2511.14505 null
2025-11-18 Overview and Prospects of Using Integer Surrogate Keys for Data Warehouse Performance Optimization Sviatoslav Stumpf et.al. 2511.14502 null
2025-11-18 Segmentation-Aware Latent Diffusion for Satellite Image Super-Resolution: Enabling Smallholder Farm Boundary Delineation Aditi Agarwal et.al. 2511.14481 null
2025-11-18 Cracking the Microsecond: An Efficient and Precise Time Synchronization Scheme for Hybrid 5G-TSN Networks Michael Gundall et.al. 2511.14462 null
2025-11-18 Advancing Minimally Invasive Precision Surgery in Open Cavities with Robotic Flexible Endoscopy Michelle Mattille et.al. 2511.14458 null
2025-11-18 Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning Fabian Stricker et.al. 2511.14456 null
2025-11-17 Scaling Spatial Intelligence with Multimodal Foundation Models Zhongang Cai et.al. 2511.13719 null
2025-11-17 Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation Sofia Jamil et.al. 2511.13689 null
2025-11-17 Scalable Iterative Algorithm for Solving Optimal Transmission Switching with De-energization Benoît Jeanson et.al. 2511.13662 null
2025-11-17 Ontology-Driven Model-to-Model Transformation of Workflow Specifications Francisco Abreu et.al. 2511.13661 null
2025-11-17 Part-X-MLLM: Part-aware 3D Multimodal Large Language Model Chunshi Wang et.al. 2511.13647 null
2025-11-17 Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? Chunqiu Steven Xia et.al. 2511.13646 null
2025-11-17 CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product Kaiwen Xue et.al. 2511.13626 null
2025-11-17 A Real-Time Driver Drowsiness Detection System Using MediaPipe and Eye Aspect Ratio Ashlesha G. Sawant et.al. 2511.13618 null
2025-11-17 BIOMERO 2.0: end-to-end FAIR infrastructure for bioimaging data import, analysis, and provenance Torec T. Luik et.al. 2511.13611 null
2025-11-17 A Gentle Introduction to Conformal Time Series Forecasting M. Stocker et.al. 2511.13608 null
2025-11-17 Long-range entanglement and quantum correlations in a multi-frequency comb system Sahil Pontula et.al. 2511.13604 null
2025-11-17 Physics-Informed Neural Networks for Nonlinear Output Regulation Sebastiano Mengozzi et.al. 2511.13595 null
2025-11-17 Data-driven Acceleration of MPC with Guarantees Agustin Castellano et.al. 2511.13588 null
2025-11-17 Graph Out-of-Distribution Detection via Test-Time Calibration with Dual Dynamic Dictionaries Yue Hou et.al. 2511.13541 null
2025-11-17 Towards Affect-Adaptive Human-Robot Interaction: A Protocol for Multimodal Dataset Collection on Social Anxiety Vesna Poprcova et.al. 2511.13530 null
2025-11-17 A Computationally Efficient Framework for Free-trajectory Minimum-lap-time Optimization of Racing Cars Erik van den Eshof et.al. 2511.13522 null
2025-11-17 Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation Zhipeng Ma et.al. 2511.13476 null
2025-11-17 Machine learning inspired photon number resolution in superconducting nanowire single-photon detectors I. S. Kuijf et.al. 2511.13475 null
2025-11-17 Measurement of Exclusive $\pi^+$-argon Interactions Using ProtoDUNE-SP DUNE Collaboration et.al. 2511.13462 null
2025-11-17 Hardware optimization on Android for inference of AI models Iulius Gherasim et.al. 2511.13453 null
2025-11-17 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Rui Zuo et.al. 2511.13442 null
2025-11-17 Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts Siyu Zhu et.al. 2511.13381 null
2025-11-17 Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning Xinlan Wu et.al. 2511.13351 null
2025-11-17 ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning Juntao Jian et.al. 2511.13327 null
2025-11-17 TacEleven: generative tactic discovery for football open play Siyao Zhao et.al. 2511.13326 null
2025-11-17 Computer Vision based group activity detection and action spotting Narthana Sivalingam et.al. 2511.13315 null
2025-11-17 Distributed Hierarchical Machine Learning for Joint Resource Allocation and Slice Selection in In-Network Edge Systems Sulaiman Muhammad Rashid et.al. 2511.13313 null
2025-11-17 DriveLiDAR4D: Sequential and Controllable LiDAR Scene Generation for Autonomous Driving Kaiwen Cai et.al. 2511.13309 null
2025-11-17 TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing Jongha Kim et.al. 2511.13283 null
2025-11-17 The Spontaneous Genesis of Solar Prominence Structures Driven by Supergranulation in Three-Dimensional Simulations Huanxin Chen et.al. 2511.13252 null
2025-11-17 DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents Fuyao Zhang et.al. 2511.13248 null
2025-11-17 MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection Junjie Wu et.al. 2511.13242 null
2025-11-17 GaRLILEO: Gravity-aligned Radar-Leg-Inertial Enhanced Odometry Chiyun Noh et.al. 2511.13216 null
2025-11-16 Sparsity-Driven Entanglement Detection in High-Dimensional Quantum States Stav Lotan et.al. 2511.12546 null
2025-11-16 High-level reasoning while low-level actuation in Cyber-Physical Systems: How efficient is it? Burak Karaduman et.al. 2511.12543 null
2025-11-16 Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing Sanchaita Hazra et.al. 2511.12529 null
2025-11-16 Collaborative Charging Optimization for Wireless Rechargeable Sensor Networks via Heterogeneous Mobile Chargers Jianhang Yao et.al. 2511.12501 null
2025-11-16 Towards Better IncomLDL: We Are Unaware of Hidden Labels in Advance Jiecheng Jiang et.al. 2511.12494 null
2025-11-16 ClutterNav: Gradient-Guided Search for Efficient 3D Clutter Removal with Learned Costmaps Navin Sriram Ravie et.al. 2511.12479 null
2025-11-16 Lightweight Deep Autoencoder for ECG Denoising with Morphology Preservation and Near Real-Time Hardware Deployment Mahdi Pirayesh Shirazi Nejad et.al. 2511.12478 null
2025-11-16 Detecting LLM-Assisted Academic Dishonesty using Keystroke Dynamics Atharva Mehta et.al. 2511.12468 null
2025-11-16 Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA Fangqiang Du et.al. 2511.12461 null
2025-11-16 Personality-guided Public-Private Domain Disentangled Hypergraph-Former Network for Multimodal Depression Detection Changzeng Fu et.al. 2511.12460 null
2025-11-16 CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training Jiahe Qian et.al. 2511.12446 null
2025-11-16 Machine Learning Framework for Efficient Prediction of Quantum Wasserstein Distance Changchun Feng et.al. 2511.12443 null
2025-11-16 Real-Time Drivers' Drowsiness Detection and Analysis through Deep Learning ANK Zaman et.al. 2511.12438 null
2025-11-16 RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation Xiaoshuai Hao et.al. 2511.12436 null
2025-11-16 Online Adaptive Probabilistic Safety Certificate with Language Guidance Zhuoyuan Wang et.al. 2511.12431 null
2025-11-16 RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning Jingqi Xu et.al. 2511.12428 null
2025-11-16 SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs Shail Desai et.al. 2511.12404 null
2025-11-16 Stochastic Predictive Analytics for Stocks in the Newsvendor Problem Pedro A. Pury et.al. 2511.12397 null
2025-11-15 Learning Adaptive Neural Teleoperation for Humanoid Robots: From Inverse Kinematics to End-to-End Control Sanjar Atamuradov et.al. 2511.12390 null
2025-11-15 CEDL: Centre-Enhanced Discriminative Learning for Anomaly Detection Zahra Zamanzadeh Darban et.al. 2511.12388 null
2025-11-14 Volumetric Ergodic Control Jueun Kwon et.al. 2511.11533 null
2025-11-14 Scalable Policy Evaluation with Video World Models Wei-Cheng Tseng et.al. 2511.11520 null
2025-11-14 W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search Zhenyu Ding et.al. 2511.11518 null
2025-11-14 Discrete Basis Parameterization for the Gauge Theory Bootstrap Rafael Cordoba et.al. 2511.11513 null
2025-11-14 Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities Yiyun Zhou et.al. 2511.11512 null
2025-11-14 OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning Xiaoyu Zheng et.al. 2511.11510 null
2025-11-14 PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models Nhat Hoang-Xuan et.al. 2511.11502 null
2025-11-14 ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation Kaishen Wang et.al. 2511.11483 null
2025-11-14 Context-aware Adaptive Visualizations for Critical Decision Making Angela Lopez-Cardona et.al. 2511.11476 null
2025-11-14 Proactive Hearing Assistants that Isolate Egocentric Conversations Guilin Hu et.al. 2511.11473 null
2025-11-14 MoCap2Radar: A Spatiotemporal Transformer for Synthesizing Micro-Doppler Radar Signatures from Motion Capture Kevin Chen et.al. 2511.11462 null
2025-11-14 Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification Qinghao Gao et.al. 2511.11460 null
2025-11-14 DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference Farhana Amin et.al. 2511.11446 null
2025-11-14 Unsupervised Motion-Compensated Decomposition for Cardiac MRI Reconstruction via Neural Representation Xuanyu Tian et.al. 2511.11436 null
2025-11-14 The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models Maria-Teresa De Rosa Palmini et.al. 2511.11435 null
2025-11-14 WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Wei Chow et.al. 2511.11434 null
2025-11-14 MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model Manyu Li et.al. 2511.11407 null
2025-11-14 Bidimensional measurements of photon statistics within a multimodal temporal framework C. Hainaut et.al. 2511.11403 null
2025-11-14 RadAround: A Field-Expedient Direction Finder for Contested IoT Sensing & EM Situational Awareness Owen A. Maute et.al. 2511.11392 null
2025-11-14 KarmaTS: A Universal Simulation Platform for Multivariate Time Series with Functional Causal Dynamics Haixin Li et.al. 2511.11357 null
2025-11-13 Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling Jiahao Wang et.al. 2511.10648 null
2025-11-13 Emergent spin order and steady-state superradiance in one-dimensional baths Silvia Cardenas-Lopez et.al. 2511.10638 null
2025-11-13 Robot Crash Course: Learning Soft and Stylized Falling Pascal Strauch et.al. 2511.10635 null
2025-11-13 Querying Labeled Time Series Data with Scenario Programs Edward Kim et.al. 2511.10627 null
2025-11-13 Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback Mohammadsina Almasi et.al. 2511.10572 null
2025-11-13 Oya: Deep Learning for Accurate Global Precipitation Estimation Emmanuel Asiedu Brempong et.al. 2511.10562 null
2025-11-13 OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Haosong Peng et.al. 2511.10560 null
2025-11-13 GraphFaaS: Serverless GNN Inference for Burst-Resilient, Real-Time Intrusion Detection Lingzhi Wang et.al. 2511.10554 null
2025-11-13 URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding Yongxin Shi et.al. 2511.10552 null
2025-11-13 Edge Machine Learning for Cluster Counting in Next-Generation Drift Chambers Deniz Yilmaz et.al. 2511.10540 null
2025-11-13 Evaluation of Grid-based Uncertainty Propagation for Collaborative Self-Calibration in Indoor Positioning Systems Andrea Jung et.al. 2511.10526 null
2025-11-13 A scalable and accurate framework for self-calibrating null depth retrieval using neural posterior estimation Baoyi Zeng et.al. 2511.10455 null
2025-11-13 Improving dependability in robotized bolting operations Lorenzo Pagliara et.al. 2511.10448 null
2025-11-13 Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting Feng Wang et.al. 2511.10434 null
2025-11-13 CityVerse: A Unified Data Platform for Multi-Task Urban Computing with Large Language Models Yaqiao Zhu et.al. 2511.10418 null
2025-11-13 MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns Jiarui Zhang et.al. 2511.10390 null
2025-11-13 DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile Thales Bezerra et.al. 2511.10367 null
2025-11-13 On The Performance of Prefix-Sum Parallel Kalman Filters and Smoothers on GPUs Simo Särkkä et.al. 2511.10363 null
2025-11-13 Observable sets for free Schrödinger equation on combinatorial graphs Zhiqiang Wan et.al. 2511.10358 null
2025-11-13 Towards Comprehensive Sampling of SMT Solutions Shuangyu Lyu et.al. 2511.10326 null
2025-11-10 Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields Zhao-Heng Yin et.al. 2511.07418 null
2025-11-10 StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation Tianrui Feng et.al. 2511.07399 null
2025-11-10 Residual Rotation Correction using Tactile Equivariance Yizhe Zhu et.al. 2511.07381 null
2025-11-10 Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion June Moh Goo et.al. 2511.07377 null
2025-11-10 Offset-Free Robust Nonlinear Control Using Data-Driven Model: A Nonlinear Multi-Model Computationally Efficient Approach Carine Menezes Rebello et.al. 2511.07255 null
2025-11-10 Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data Tianle Song et.al. 2511.07242 null
2025-11-10 Resilient by Design - Active Inference for Distributed Continuum Intelligence Praveen Kumar Donta et.al. 2511.07202 null
2025-11-10 Dynamic Vaccine Prioritization via Non-Markovian Final-state Optimization Mi Feng et.al. 2511.07200 null
2025-11-10 Combining digital data streams and epidemic networks for real time outbreak detection Ruiqi Lyu et.al. 2511.07163 null
2025-11-10 Real-Time Co-Simulation for DC Microgrid Energy Management with Communication Delays S. Gokul Krishnan et.al. 2511.07052 null
2025-11-10 Raspi$^2$USBL: An open-source Raspberry Pi-Based Passive Inverted Ultra-Short Baseline Positioning System for Underwater Robotics Jin Huang et.al. 2511.06998 null
2025-11-10 Light Focusing through Dynamic Media via Real-Valued Intensity Transmission Matrix Xuan Liu et.al. 2511.06993 null
2025-11-10 Koopman-Based Dynamic Environment Prediction for Safe UAV Navigation Vitor Bueno et.al. 2511.06990 null
2025-11-10 Fast Bayesian Updates via Harmonic Representations Di Zhang et.al. 2511.06978 null
2025-11-10 Ultrafast Topological Transitions Driven by Permittivity Modulation in Non-Hermitian Multilayers Giuseppina Simone et.al. 2511.06963 null
2025-11-10 DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling Zhicheng Li et.al. 2511.06925 null
2025-11-10 Real-Time Diverse Fiber Sensing Multi-Event Detection using Phase OTDR Measurements Konstantinos Alexoudis et.al. 2511.06922 null
2025-11-10 MetricSynth: Framework for Aggregating DORA and KPI Metrics Across Multi-Platform Engineering Pallav Jain et.al. 2511.06864 null
2025-11-10 Synergistic Antenna-Modulator Integration for Monolithic Photonic RF Receiver Changlin Liu et.al. 2511.06825 null
2025-11-10 A Study of Cataclysmic Variables from the eFEDS Survey Rui Wang et.al. 2511.06814 null
2025-11-07 FPGA-Based Real-Time Waveform Classification Alperen Aksoy et.al. 2511.05479 null
2025-11-07 Precipitation nowcasting of satellite data using physically conditioned neural networks Antônio Catão et.al. 2511.05471 null
2025-11-07 EventFlow: Real-Time Neuromorphic Event-Driven Classification of Two-Phase Boiling Flow Regimes Sanghyeon Chang et.al. 2511.05467 null
2025-11-07 Helios: A 98-qubit trapped-ion quantum computer Anthony Ransford et.al. 2511.05465 null
2025-11-07 Large Language Models for Explainable Threat Intelligence Tiago Dinis et.al. 2511.05406 null
2025-11-07 AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly Alexander Htet Kyaw et.al. 2511.05394 null
2025-11-07 Optimal Control of H-Mode Tokamak Plasma Temperature based on Pontryagin's Principle Slim Jmal et.al. 2511.05382 null
2025-11-07 ETHOS: A Robotic Encountered-Type Haptic Display for Social Interaction in Virtual Reality Eric Godden et.al. 2511.05379 null
2025-11-07 MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference Maximilian Kirschner et.al. 2511.05321 null
2025-11-07 Force-Safe Environment Maps and Real-Time Detection for Soft Robot Manipulators Akua K. Dickson et.al. 2511.05307 null
2025-11-07 psiUnity: A Platform for Multimodal Data-Driven XR Akhil Ajikumar et.al. 2511.05304 null
2025-11-07 LiveStar: Live Streaming Assistant for Real-World Online Video Understanding Zhenyu Yang et.al. 2511.05299 null
2025-11-07 Automatic segmentation of colorectal liver metastases for ultrasound-based navigated resection Tiziano Natali et.al. 2511.05253 null
2025-11-07 Transporter: A 128 $\times$ 4 SPAD Imager with On-chip Encoder for Spiking Neural Network-based Processing Yang Lin et.al. 2511.05241 null
2025-11-07 Scaling behavior of dissipative systems with imaginary gap closing Jinghui Pi et.al. 2511.05220 null
2025-11-07 Neural Operators for Power Systems: A Physics-Informed Framework for Modeling Power System Components Ioannis Karampinis et.al. 2511.05216 null
2025-11-07 SmartSecChain-SDN: A Blockchain-Integrated Intelligent Framework for Secure and Efficient Software-Defined Networks Azhar Hussain Mozumder et.al. 2511.05156 null
2025-11-07 On the Estimation of Climate Normals and Anomalies Tommaso Proietti et.al. 2511.05071 null
2025-11-07 Epically Powerful: An open-source software and mechatronics infrastructure for wearable robotic systems Jennifer K. Leestma et.al. 2511.05033 null
2025-11-07 Multi-agent Coordination via Flow Matching Dongsu Lee et.al. 2511.05005 null
2025-11-06 Funnel-Based Online Recovery Control for Nonlinear Systems With Unknown Dynamics Zihao Song et.al. 2511.04626 null
2025-11-06 Optimizing Sensor Placement in Urban Storm Sewers: A Data-Driven Sparse Sensing Approach Zihang Ding et.al. 2511.04556 null
2025-11-06 Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment Tao Lin et.al. 2511.04555 null
2025-11-06 Portable, cost-effective UV-vis-NIR microspectrophotometer for absorption and fluorescence microscopy and spectroscopy Negar Karpourazar et.al. 2511.04507 null
2025-11-06 AI-Driven Phase-Shifted Carrier Optimization for Cascaded Bridge Converters, Modular Multilevel Converters, and Reconfigurable Batteries Amin Hashemi-Zadeh et.al. 2511.04470 null
2025-11-06 Cutana: A High-Performance Tool for Astronomical Image Cutout Generation at Petabyte Scale Pablo Gómez et.al. 2511.04429 null
2025-11-06 Mitigating effects of nonlinearities in homodyne quadrature interferometers Johannes Lehmann et.al. 2511.04386 null
2025-11-06 Self-correcting High-speed Opto-electronic Probabilistic Computer Ramy Aboushelbaya et.al. 2511.04300 null
2025-11-06 A Parallel Region-Adaptive Differential Privacy Framework for Image Pixelization Ming Liu et.al. 2511.04261 null
2025-11-06 Accurate humidity and pH synchronized measurement with temperature compensation based on polarization maintaining fiber Jia Liu et.al. 2511.04203 null
2025-11-06 Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles Yihao Chen et.al. 2511.04156 null
2025-11-06 Infrared Microscopy of Biochemistry and Metabolism in Single Living Eukaryotic Cells Luca Quaroni et.al. 2511.04143 null
2025-11-06 Automated Tennis Player and Ball Tracking with Court Keypoints Detection (Hawk Eye System) Venkata Manikanta Desu et.al. 2511.04126 null
2025-11-06 Unified Effective Field Theory for Nonlinear and Quantum Optics Xiaochen Liu et.al. 2511.04118 null
2025-11-06 Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration Yunghee Lee et.al. 2511.04117 null
2025-11-06 Automated and Explainable Denial of Service Analysis for AI-Driven Intrusion Detection Systems Paul Badu Yakubu et.al. 2511.04114 null
2025-11-06 E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce Ge Zhang et.al. 2511.04087 null
2025-11-06 Enhancing Fault-Tolerant Space Computing: Guidance Navigation and Control (GNC) and Landing Vision System (LVS) Implementations on Next-Gen Multi-Core Processors Kyongsik Yun et.al. 2511.04052 null
2025-11-06 An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue Kailun Ji et.al. 2511.04042 null
2025-11-06 Shellular Metamaterial Design via Compact Electric Potential Parametrization Chang Liu et.al. 2511.04025 null
2025-11-06 Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video Alexander Htet Kyaw et.al. 2511.03227 null
2025-11-05 LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol Yu-Erh Pan et.al. 2511.03706 null
2025-11-05 Certified randomness amplification by dynamically probing remote random quantum states Minzhao Liu et.al. 2511.03686 null
2025-11-05 Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control Atena Khoshkonesh et.al. 2511.03684 null
2025-11-05 LiveTradeBench: Seeking Real-World Alpha with Large Language Models Haofei Yu et.al. 2511.03628 null
2025-11-05 Super-resolution Optical Near-field EM for bio- and materials science Ilia Zykov et.al. 2511.03597 null
2025-11-05 Performance Evaluation of a Position-Sensitive SiPM-based Gamma Camera for Intraoperative Imaging Aramis Raiola et.al. 2511.03493 null
2025-11-05 A Modified Pulse and Design Framework to Halve the Complexity of OFDM Spectral Shaping Techniques Javier Giménez et.al. 2511.03465 null
2025-11-05 Formalizing ETLT and ELTL Design Patterns and Proposing Enhanced Variants: A Systematic Framework for Modern Data Engineering Chiara Rucco et.al. 2511.03393 null
2025-11-05 A Digital Twin of Evaporative Thermo-Fluidic Process in Fixation Unit of DoD Inkjet Printers Samarth Toolhally et.al. 2511.03379 null
2025-11-05 Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification Shaghayegh Kolli et.al. 2511.03217 null
2025-11-05 SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention Shreyas C. Dhake et.al. 2511.03178 null
2025-11-05 Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation Yun-Chen Lin et.al. 2511.03163 null
2025-11-05 A Proprietary Model-Based Safety Response Framework for AI Agents Qi Li et.al. 2511.03138 null
2025-11-05 NOWS: Neural Operator Warm Starts for Accelerating Iterative Solvers Mohammad Sadegh Eshaghi et.al. 2511.02481 null
2025-11-05 Ultrafast magnetic moment transfer and bandgap renormalization in monolayer FeCl$_2$ Yu-Hui Song et.al. 2511.02461 null
2025-11-04 A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics Markus Buchholz et.al. 2511.03075 null
2025-11-04 Reading Between the Lines: The One-Sided Conversation Problem Victoria Ebert et.al. 2511.03056 null
2025-11-04 ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment Anthony Hevia et.al. 2511.03048 null
2025-11-04 Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods Fatimo Adenike Adeniya et.al. 2511.03020 null
2025-11-04 Establishing Trust in Crowdsourced Data Iffat Gheyas et.al. 2511.03016 null
2025-11-04 Observer-based neural networks for flow estimation and control Tarcísio C. Déda e

About

🎓 Update HumanAIGC related papers from ArXiv daily
