- Talking Face
- Image Animation
- Video Generation
- TryOn
- Visual Edit
- Others
- Music2Dance and Co-speech
- Speech and Interaction
- Post Training
Talking Face
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-11-18 | Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection | Xiaolin Wang et.al. | 2511.14371 | null |
| 2025-11-18 | Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning | Rui Liu et.al. | 2511.14249 | null |
| 2025-11-18 | StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model | Yifan Yang et.al. | 2511.14223 | null |
| 2025-11-17 | B2F: End-to-End Body-to-Face Motion Generation with Style Reference | Bokyung Jang et.al. | 2511.13988 | null |
| 2025-11-17 | Passive Dementia Screening via Facial Temporal Micro-Dynamics Analysis of In-the-Wild Talking-Head Video | Filippo Cenacchi, Longbing Cao et.al. | 2511.13802 | null |
| 2025-11-17 | Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views | Junyi Ma et.al. | 2511.12878 | null |
| 2025-11-12 | GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow | Rui Wan et.al. | 2511.09272 | null |
| 2025-11-11 | Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation? | Rui-Qing Sun et.al. | 2511.07940 | null |
| 2025-11-10 | LiveNeRF: Efficient Face Replacement Through Neural Radiance Fields Integration | Tung Vu et.al. | 2511.07552 | null |
| 2025-11-10 | The Inner Kernel of the Classical Kuiper Belt | Amir Siraj et.al. | 2511.07512 | null |
| 2025-11-10 | ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search | Zhenjie Liu et.al. | 2511.06833 | null |
| 2025-11-08 | DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects | Mostofa Rafid Uddin et.al. | 2511.06115 | null |
| 2025-11-08 | Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement | Ba-Thinh Nguyen et.al. | 2511.05946 | null |
| 2025-11-07 | Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis | Dogucan Yaman et.al. | 2511.05432 | null |
| 2025-11-07 | THEval: Evaluation Framework for Talking Head Video Generation | Nabyl Quignon et.al. | 2511.04520 | null |
| 2025-11-05 | Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework | Dogucan Yaman et.al. | 2511.08613 | null |
| 2025-11-05 | Laugh, Relate, Engage: Stylized Comment Generation for Short Videos | Xuan Ouyang et.al. | 2511.03757 | null |
| 2025-11-05 | UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions | Guozhen Zhang et.al. | 2511.03334 | null |
| 2025-11-04 | Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks | Dmitrii Pozdeev et.al. | 2511.02830 | null |
| 2025-11-01 | Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse | Fan Yang et.al. | 2510.26082 | null |
| 2025-11-01 | Audio Driven Real-Time Facial Animation for Social Telepresence | Jiye Lee et.al. | 2510.01176 | null |
| 2025-10-29 | Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation | Yuxiang Mao et.al. | 2510.25234 | null |
| 2025-10-28 | See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement | Jinting Wang et.al. | 2510.26819 | null |
| 2025-10-28 | The Divine Software Engineering Comedy -- Inferno: The Okinawa Files | Michele Lanza et.al. | 2510.24483 | null |
| 2025-10-28 | GenTrack: A New Generation of Multi-Object Tracking | Toan Van Nguyen et.al. | 2510.24399 | null |
| 2025-10-28 | Variable Projected Augmented Lagrangian Methods for Generalized Lasso Problems | Stefano Aleotti et.al. | 2510.24140 | null |
| 2025-10-27 | Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation | Junyoung Seo et.al. | 2510.23581 | null |
| 2025-10-27 | Revising Second Order Terms in Deep Animation Video Coding | Konstantin Schmidt et.al. | 2510.23561 | null |
| 2025-10-26 | MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control | Fatemeh Nazarieh et.al. | 2510.22810 | null |
| 2025-10-26 | DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection | Kangran Zhao et.al. | 2510.22622 | null |
| 2025-10-24 | Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing | Danial Samadi Vahdati et.al. | 2510.03548 | null |
| 2025-10-23 | LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation | Xin Lu et.al. | 2510.21864 | null |
| 2025-10-16 | PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis | Soumyya Kanti Datta et.al. | 2510.14241 | null |
| 2025-10-14 | Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback | Xingpei Ma et.al. | 2510.12089 | null |
| 2025-10-12 | DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis | Peiyin Chen et.al. | 2510.10650 | null |
| 2025-10-11 | VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework | Donglin Huang et.al. | 2510.10269 | null |
| 2025-10-11 | SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation | Zeyu Ling et.al. | 2510.10069 | null |
| 2025-10-09 | Paper2Video: Automatic Video Generation from Scientific Papers | Zeyu Zhu et.al. | 2510.05096 | null |
| 2025-10-08 | A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages | Zibo Su et.al. | 2510.06612 | null |
| 2025-10-03 | EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation | Tianheng Zhu et.al. | 2510.08587 | null |
| 2025-10-02 | Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation | Beijia Lu et.al. | 2510.02617 | null |
| 2025-09-30 | 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation | Balamurugan Thambiraja et.al. | 2509.26233 | null |
| 2025-09-28 | Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer | Hyunsoo Cha et.al. | 2509.04434 | null |
| 2025-09-26 | StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing | Liyang Chen et.al. | 2509.21887 | null |
| 2025-09-25 | Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos | Sarmistha Das et.al. | 2509.20961 | null |
| 2025-09-24 | KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation | Tianle Lyu et.al. | 2509.20128 | null |
| 2025-09-24 | Comparative Study of Subjective Video Quality Assessment Test Methods in Crowdsourcing for Varied Use Cases | Babak Naderi et.al. | 2509.20118 | null |
| 2025-09-24 | SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding | Phyo Thet Yee et.al. | 2509.19965 | null |
| 2025-09-24 | Talking Head Generation via AU-Guided Landmark Prediction | Shao-Yu Chang et.al. | 2509.19749 | null |
| 2025-09-24 | EAI-Avatar: Emotion-Aware Interactive Talking Head Generation | Haijie Yang et.al. | 2508.18337 | null |
| 2025-09-23 | Audio-Driven Universal Gaussian Head Avatars | Kartik Teotia et.al. | 2509.18924 | null |
| 2025-09-22 | "I don't like my avatar": Investigating Human Digital Doubles | Siyi Liu et.al. | 2509.17748 | null |
| 2025-09-22 | Stable Video-Driven Portraits | Mallikarjun B. R. et.al. | 2509.17476 | null |
| 2025-09-21 | Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics | Chengwei Shi et.al. | 2509.17168 | null |
| 2025-09-21 | PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control | Tianheng Zhu et.al. | 2509.16922 | null |
| 2025-09-20 | Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation | Yue Ma et.al. | 2509.16630 | null |
| 2025-09-17 | Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis | Yikang Ding et.al. | 2509.09595 | null |
| 2025-09-16 | A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis | Javeria Amir et.al. | 2509.12831 | null |
| 2025-09-15 | AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective | Yuchen Deng et.al. | 2509.12052 | null |
| 2025-09-10 | Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video | Xiao Li et.al. | 2509.08376 | null |
| 2025-08-28 | EmoCAST: Emotional Talking Portrait via Emotive Text Description | Yiguo Jiang et.al. | 2508.20615 | null |
| 2025-08-27 | InfinityHuman: Towards Long-Term Audio-Driven Human | Xiaodi Li et.al. | 2508.20210 | null |
| 2025-08-27 | Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning | Stelios Mylonas et.al. | 2508.19730 | null |
| 2025-08-26 | OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation | Jianwen Jiang et.al. | 2508.19209 | null |
| 2025-08-26 | Wan-S2V: Audio-Driven Cinematic Video Generation | Xin Gao et.al. | 2508.18621 | null |
| 2025-08-25 | Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation | Jianzhi Long et.al. | 2509.00052 | null |
| 2025-08-22 | Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars | NVIDIA et.al. | 2508.16401 | null |
| 2025-08-20 | D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis | Yuhang Guo et.al. | 2508.14449 | null |
| 2025-08-20 | Taming Transformer for Emotion-Controllable Talking Face Generation | Ziqi Zhang et.al. | 2508.14359 | null |
| 2025-08-19 | TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis | Shunian Chen et.al. | 2508.13618 | null |
| 2025-08-19 | EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis | Shuai Tan et.al. | 2508.13442 | null |
| 2025-08-18 | Human Feedback Driven Dynamic Speech Emotion Recognition | Ilya Fedorov et.al. | 2508.14920 | null |
| 2025-08-17 | CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation | Kangyi Wu et.al. | 2508.12368 | null |
| 2025-08-16 | RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis | Wenqing Wang et.al. | 2508.12163 | null |
| 2025-08-16 | SimInterview: Transforming Business Education through Large Language Model-Based Simulated Multilingual Interview Training System | Truong Thanh Hung Nguyen et.al. | 2508.11873 | null |
| 2025-08-15 | FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation | MengChao Wang et.al. | 2508.11255 | null |
| 2025-08-14 | HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis | Shiyu Liu et.al. | 2508.10566 | null |
| 2025-08-13 | LIA-X: Interpretable Latent Portrait Animator | Yaohui Wang et.al. | 2508.09959 | null |
| 2025-08-12 | Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos | Chaoyi Wang et.al. | 2508.08891 | null |
| 2025-08-11 | Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation | Hyung Kyu Kim et.al. | 2507.20568 | null |
| 2025-08-10 | KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features | Ivan Kukanov et.al. | 2508.07337 | null |
| 2025-08-08 | MotionSwap | Om Patil et.al. | 2508.06430 | null |
| 2025-08-07 | Evaluation of a Sign Language Avatar on Comprehensibility, User Experience & Acceptability | Fenya Wasserroth et.al. | 2508.05358 | null |
| 2025-08-07 | RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer | Fangyu Du et.al. | 2508.05115 | null |
| 2025-08-07 | UniTalker: Conversational Speech-Visual Synthesis | Yifan Hu et.al. | 2508.04585 | null |
| 2025-08-07 | AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation | Le Wang et.al. | 2508.00733 | null |
| 2025-08-06 | MienCap: Realtime Performance-Based Facial Animation with Live Mood Dynamics | Ye Pan et.al. | 2508.04687 | null |
| 2025-08-06 | READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation | Haotian Wang et.al. | 2508.03457 | null |
| 2025-08-05 | Multi-human Interactive Talking Dataset | Zeyu Zhu et.al. | 2508.03050 | null |
| 2025-08-04 | X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio | Chenxu Zhang et.al. | 2508.02944 | null |
| 2025-08-04 | Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering | Xu Wang et.al. | 2508.02362 | null |
| 2025-08-04 | Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos | Laura Pedrouzo-Rodriguez et.al. | 2508.00748 | null |
| 2025-07-31 | Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads | Yingjie Zhou et.al. | 2507.23343 | null |
| 2025-07-30 | X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention | Xiaochen Zhao et.al. | 2507.23143 | null |
| 2025-07-30 | Robust Deepfake Detection for Electronic Know Your Customer Systems Using Registered Images | Takuma Amada et.al. | 2507.22601 | null |
| 2025-07-29 | DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation | He Feng et.al. | 2508.06511 | null |
| 2025-07-29 | JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 | Xinhan Di et.al. | 2507.20987 | null |
| 2025-07-29 | Versatile Multimodal Controls for Expressive Talking Human Animation | Zheng Qin et.al. | 2503.08714 | null |
| 2025-07-28 | Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation | Dogucan Yaman et.al. | 2507.20953 | null |
| 2025-07-28 | MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization | Hyung Kyu Kim et.al. | 2507.20562 | null |
| 2025-07-28 | JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync | Sungjoon Park et.al. | 2507.20452 | null |
| 2025-07-25 | Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation | Fang Kang et.al. | 2507.19225 | null |
| 2025-07-24 | Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation | Zhen Han et.al. | 2507.18352 | null |
| 2025-07-24 | Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics | Yuezun Li et.al. | 2507.18015 | null |
| 2025-07-24 | MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding | Chang Liu et.al. | 2507.06071 | null |
| 2025-07-23 | MoDA: Multi-modal Diffusion Architecture for Talking Head Generation | Xinyang Li et.al. | 2507.03256 | null |
| 2025-07-22 | Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching | Haiyang Liu et.al. | 2507.18649 | null |
| 2025-07-22 | Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model | Mingtao Guo et.al. | 2507.16341 | null |
| 2025-07-21 | VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis | Alexandre Symeonidis-Herzig et.al. | 2507.06060 | null |
| 2025-07-18 | FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers | Qiang Wang et.al. | 2507.12956 | null |
| 2025-07-17 | ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion | Hoang-Son Vo et.al. | 2507.12804 | null |
| 2025-07-17 | Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation | Hanlei Shi et.al. | 2507.12761 | null |
| 2025-07-17 | Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries | Minyoung Kim et.al. | 2507.12723 | null |
| 2025-07-16 | AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation | Hao Li et.al. | 2507.12001 | null |
| 2025-07-14 | M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation | Kui Jiang et.al. | 2507.08307 | null |
| 2025-07-11 | Detecting Deepfake Talking Heads from Facial Biometric Anomalies | Justin D. Norman et.al. | 2507.08917 | null |
| 2025-07-10 | GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation | Wentao Hu et.al. | 2506.21513 | null |
| 2025-07-07 | MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation | Yucheng Wang et.al. | 2507.05092 | null |
| 2025-07-05 | EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation | Rang Meng et.al. | 2507.03905 | null |
| 2025-07-03 | CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation | Xiangyang Luo et.al. | 2507.02691 | null |
| 2025-07-02 | FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases | Shuai Tan et.al. | 2507.01390 | null |
| 2025-07-01 | ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing | Babak Naderi et.al. | 2506.12269 | link |
| 2025-06-30 | JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching | Mingi Kwon et.al. | 2506.23552 | null |
| 2025-06-27 | MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation | Dechao Meng et.al. | 2506.22065 | null |
| 2025-06-27 | Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field | Hong Nie et.al. | 2506.22044 | null |
| 2025-06-27 | RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture | Haofeng Wang et.al. | 2506.21865 | null |
| 2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833 | null |
| 2025-06-23 | Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions | Vineet Kumar Rakesh et.al. | 2507.02900 | null |
| 2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866 | null |
| 2025-06-17 | SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting | Ziqiao Peng et.al. | 2506.14742 | null |
| 2025-06-17 | Compressed Video Super-Resolution based on Hierarchical Encoding | Yuxuan Jiang et.al. | 2506.14381 | null |
| 2025-06-16 | Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos | Riku Takahashi et.al. | 2506.13419 | null |
| 2025-06-15 | iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Zhelun Shen et.al. | 2506.12847 | null |
| 2025-06-10 | HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation | Ziyao Huang et.al. | 2506.08797 | null |
| 2025-06-03 | NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results | Xiaohong Liu et.al. | 2506.02875 | null |
| 2025-06-02 | Cocktail-Party Audio-Visual Speech Recognition | Thai-Binh Nguyen et.al. | 2506.02178 | null |
| 2025-06-02 | Low-Rank Head Avatar Personalization with Registers | Sai Tanmay Reddy Chakkera et.al. | 2506.01935 | null |
| 2025-06-02 | Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation | Yuan Gan et.al. | 2506.01591 | link |
| 2025-06-01 | SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Zhengcong Fei et.al. | 2506.00830 | null |
| 2025-05-30 | TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection | Xinqi Xiong et.al. | 2505.24866 | null |
| 2025-05-29 | Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Jiahao Cui et.al. | 2505.23525 | link |
| 2025-05-29 | Video Editing for Audio-Visual Dubbing | Binyamin Manela et.al. | 2505.23406 | link |
| 2025-05-29 | Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation | Hao Li et.al. | 2505.23290 | link |
| 2025-05-29 | MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Siyuan Wang et.al. | 2505.23120 | link |
| 2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647 | link |
| 2025-05-28 | Tell me Habibi, is it Real or Fake? | Kartik Kuckreja et.al. | 2505.22581 | null |
| 2025-05-28 | Neural Face Skinning for Mesh-agnostic Facial Expression Cloning | Sihun Cha et.al. | 2505.22416 | null |
| 2025-05-28 | FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing | Guanwen Feng et.al. | 2505.22141 | null |
| 2025-05-28 | RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling | Long-Khanh Pham et.al. | 2505.22024 | null |
| 2025-05-27 | OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers | Ziqiao Peng et.al. | 2505.21448 | null |
| 2025-05-26 | Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting | Yizhou Zhao et.al. | 2505.20582 | null |
| 2025-05-26 | DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations | Ziqiao Peng et.al. | 2505.18096 | null |
| 2025-05-22 | Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis | Radek Daněček et.al. | 2504.13386 | null |
| 2025-05-14 | Test-Time Augmentation for Pose-invariant Face Recognition | Jaemin Jung et.al. | 2505.09256 | null |
| 2025-05-10 | VTutor: An Animated Pedagogical Agent SDK that Provides Real Time Multi-Model Feedback | Eason Chen et.al. | 2505.06676 | null |
| 2025-05-10 | OT-Talk: Animating 3D Talking Head with Optimal Transportation | Xinmu Wang et.al. | 2505.01932 | null |
| 2025-05-10 | MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance | Mengting Wei et.al. | 2504.21497 | link |
| 2025-05-08 | OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours | Hanie Moghaddasi et.al. | 2505.05531 | null |
| 2025-05-03 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Anushka Agarwal et.al. | 2505.01928 | null |
| 2025-05-02 | Model See Model Do: Speech-Driven Facial Animation with Style Control | Yifang Pan et.al. | 2505.01319 | null |
| 2025-05-02 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong et.al. | 2505.01263 | null |
| 2025-05-01 | KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution | Antoni Bigata et.al. | 2505.00497 | null |
| 2025-04-29 | IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos | Yuan Li et.al. | 2504.19165 | null |
| 2025-04-27 | Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions | Mohammad Mahdi Abootorabi et.al. | 2504.19056 | link |
| 2025-04-26 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning | Yifan Xie et.al. | 2504.18810 | null |
| 2025-04-25 | Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation | Weipeng Tan et.al. | 2504.18087 | null |
| 2025-04-14 | SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models | Stathis Galanakis et.al. | 2504.10716 | null |
| 2025-04-10 | ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings | Astitva Srivastava et.al. | 2504.08022 | null |
| 2025-04-08 | VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing | Juan Luis Gonzalez Bello et.al. | 2504.07146 | null |
| 2025-04-08 | SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | Yihuan Huang et.al. | 2504.05803 | null |
| 2025-04-08 | Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation | Zhihua Xu et.al. | 2504.05746 | null |
| 2025-04-08 | Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation | Tianshui Chen et.al. | 2504.05672 | null |
| 2025-04-07 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
| 2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | null |
| 2025-04-04 | A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations | Abdul Mannan Mohammed et.al. | 2504.03147 | null |
| 2025-04-03 | OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication | Zhongjian Wang et.al. | 2504.02433 | null |
| 2025-04-03 | VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models | Kim Sung-Bin et.al. | 2504.02386 | null |
| 2025-04-02 | Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies | Soumyya Kanti Datta et.al. | 2504.01470 | link |
| 2025-04-02 | EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters | Xuli Shen et.al. | 2503.19416 | null |
| 2025-04-01 | Monocular and Generalizable Gaussian Talking Head Animation | Shengjie Gong et.al. | 2504.00665 | null |
| 2025-04-01 | Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics | Lee Chae-Yeon et.al. | 2503.20308 | null |
| 2025-03-30 | MoCha: Towards Movie-Grade Talking Character Synthesis | Cong Wei et.al. | 2503.23307 | null |
| 2025-03-29 | STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing | Zijun Ding et.al. | 2503.23039 | link |
| 2025-03-28 | Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis | Shuai Shen et.al. | 2503.22605 | null |
| 2025-03-28 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Haijie Yang et.al. | 2503.22225 | null |
| 2025-03-27 | ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Jinwei Qi et.al. | 2503.21144 | null |
| 2025-03-26 | Dual Audio-Centric Modality Coupling for Talking Head Generation | Ao Fu et.al. | 2503.22728 | null |
| 2025-03-25 | AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Jiazhi Guan et.al. | 2503.19824 | null |
| 2025-03-25 | MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Yukang Lin et.al. | 2503.19383 | null |
| 2025-03-25 | HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation | Zunnan Xu et.al. | 2503.18860 | null |
| 2025-03-25 | Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Yingying Fan et.al. | 2503.16942 | null |
| 2025-03-24 | DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model | Kangwei Liu et.al. | 2503.19001 | null |
| 2025-03-24 | Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Dingcheng Zhen et.al. | 2503.18429 | null |
| 2025-03-23 | DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation | Peng Chen et.al. | 2503.18159 | link |
| 2025-03-21 | TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting | Jianchuan Chen et.al. | 2503.17032 | null |
| 2025-03-21 | From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech | Ji-Hoon Kim et.al. | 2503.16956 | null |
| 2025-03-20 | UniSync: A Unified Framework for Audio-Visual Synchronization | Tao Feng et.al. | 2503.16357 | null |
| 2025-03-20 | PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation | Baiqin Wang et.al. | 2503.14295 | null |
| 2025-03-19 | DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis | Yuming Gu et.al. | 2503.15667 | link |
| 2025-03-19 | KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation | Antoni Bigata et.al. | 2503.01715 | null |
| 2025-03-17 | SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization | Xulin Fan et.al. | 2503.13371 | null |
| 2025-03-17 | Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait | Chaolong Yang et.al. | 2503.12963 | link |
| 2025-03-14 | Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control | Hejia Chen et.al. | 2503.14517 | null |
| 2025-03-14 | EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models | Yixuan Zhang et.al. | 2503.11028 | null |
| 2025-03-12 | StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation | An Yang et.al. | 2503.09852 | null |
| 2025-03-12 | Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos | Riku Takahashi et.al. | 2503.09787 | null |
| 2025-03-09 | Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter | Yanyu Zhu et.al. | 2503.06397 | null |
| 2025-03-07 | MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Hongwei Yi et.al. | 2503.05978 | null |
| 2025-03-06 | FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | Ziqi Ni et.al. | 2503.04067 | null |
| 2025-03-02 | FaceShot: Bring Any Character into Life | Junyao Gao et.al. | 2503.00740 | null |
| 2025-03-01 | Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture | Xuanchen Li et.al. | 2503.00495 | null |
| 2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803 | null |
| 2025-02-28 | ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model | Xuangeng Chu et.al. | 2502.20323 | null |
| 2025-02-27 | InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Jiahe Li et.al. | 2502.20387 | link |
| 2025-02-27 | High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model | Mingtao Guo et.al. | 2502.19894 | link |
| 2025-02-26 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model | Lingzhou Mu et.al. | 2502.19455 | null |
| 2025-02-24 | Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation | Baptiste Chopin et.al. | 2502.17198 | null |
| 2025-02-20 | NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis | Xiaoxing Liu et.al. | 2502.14178 | null |
| 2025-02-18 | AV-Flow: Transforming Text to Audio-Visual Human-like Interactions | Aggelina Chatziagapi et.al. | 2502.13133 | null |
| 2025-02-17 | SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion | Junxian Ma et.al. | 2502.11515 | null |
| 2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | link |
| 2025-02-13 | Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model | Fei Shen et.al. | 2502.09533 | null |
| 2025-02-13 | VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output | Eason Chen et.al. | 2502.04103 | null |
| 2025-02-11 | Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion | Xingpei Ma et.al. | 2502.07203 | null |
| 2025-02-07 | Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark | Han Zhang et.al. | 2502.04976 | null |
| 2025-02-02 | EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | Junuk Cha et.al. | 2502.00654 | null |
| 2025-01-24 | SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation | Yujian Liu et.al. | 2501.14646 | null |
| 2025-01-21 | A Lightweight and Interpretable Deepfakes Detection Framework | Muhammad Umar Farooq et.al. | 2501.11927 | null |
| 2025-01-18 | EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | Linrui Tian et.al. | 2501.10687 | null |
| 2025-01-17 | TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation | Yixiang Zhuang et.al. | 2501.09921 | null |
| 2025-01-15 | Joint Learning of Depth and Appearance for Portrait Image Animation | Xinya Ji et.al. | 2501.08649 | null |
| 2025-01-15 | Make-A-Character 2: Animatable 3D Character Generation From a Single Image | Lin Liu et.al. | 2501.07870 | null |
| 2025-01-09 | Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding | Ji-Ha Park et.al. | 2501.14790 | null |
| 2025-01-09 | Identity-Preserving Video Dubbing Using Motion Warping | Runzhen Liu et.al. | 2501.04586 | null |
| 2025-01-09 | MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation | Huaize Liu et.al. | 2501.01808 | null |
| 2025-01-07 | Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools | Arash Dehghani et.al. | 2501.06227 | null |
| 2025-01-07 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427 | null |
| 2025-01-06 | RDD4D: 4D Attention-Guided Road Damage Detection And Classification | Asma Alkalbani et.al. | 2501.02822 | link |
| 2025-01-06 | Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study | Mykola Maslych et.al. | 2501.00168 | null |
| 2025-01-03 | JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Qili Wang et.al. | 2501.01798 | link |
| 2024-12-28 | DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Kaijun Deng et.al. | 2412.20148 | link |
| 2024-12-26 | UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control | Wenzhang Sun et.al. | 2412.19860 | null |
| 2024-12-26 | Generating Editable Head Avatars with 3D Gaussian GANs | Guohao Li et.al. | 2412.19149 | link |
| 2024-12-23 | FaceLift: Single Image to 3D Head with View Generation and GS-LRM | Weijie Lyu et.al. | 2412.17812 | null |
| 2024-12-22 | FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation | Tianyun Zhong et.al. | 2412.16915 | null |
| 2024-12-18 | Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters | Steven Hogue et.al. | 2412.14333 | link |
| 2024-12-18 | GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection | Xiaocan Chen et.al. | 2412.13656 | null |
| 2024-12-18 | Learning to Control an Android Robot Head for Facial Animation | Marcel Heisler et.al. | 2412.13641 | null |
| 2024-12-18 | Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Hanzhong Guo et.al. | 2412.13479 | link |
| 2024-12-18 | VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization | Tao Liu et.al. | 2412.09892 | null |
| 2024-12-16 | Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content | Rohit Kundu et.al. | 2412.12278 | null |
| 2024-12-13 | GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression | Ziqi Zhou et.al. | 2412.09296 | link |
| 2024-12-12 | LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Chunyu Li et.al. | 2412.09262 | link |
| 2024-12-12 | EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing | Gaoxiang Cong et.al. | 2412.08988 | null |
| 2024-12-12 | PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis | Yifan Xie et.al. | 2412.08504 | null |
| 2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754 | null |
| 2024-12-10 | IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation | Sejong Yang et.al. | 2412.04000 | null |
| 2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448 | null |
| 2024-12-05 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks | Jiahao Cui et.al. | 2412.00733 | link |
| 2024-12-04 | SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Yan Li et.al. | 2412.03430 | null |
| 2024-12-02 | One Shot, One Talk: Whole-body Talking Avatar from a Single Image | Jun Xiang et.al. | 2412.01106 | null |
| 2024-12-01 | Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation | Shuling Zhao et.al. | 2412.00719 | null |
| 2024-11-29 | LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis | Tianqi Li et.al. | 2411.19525 | null |
| 2024-11-29 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Tianqi Li et.al. | 2411.19509 | link |
| 2024-11-29 | V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow | Jeongsoo Choi et.al. | 2411.19486 | link |
| 2024-11-26 | Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | Hong-Hanh Nguyen-Le et.al. | 2411.17911 | null |
| 2024-11-25 | Sonic: Shifting Focus to Global Audio Perception in Portrait Animation | Xiaozhong Ji et.al. | 2411.16331 | null |
| 2024-11-25 | ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations | Xulong Zhang et.al. | 2411.13089 | null |
| 2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
| 2024-11-23 | EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion | Haotian Wang et.al. | 2411.16726 | null |
| 2024-11-23 | ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance | Haijie Yang et.al. | 2411.15436 | null |
| 2024-11-20 | Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis | Pegah Salehi et.al. | 2411.13209 | link |
| 2024-11-20 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
| 2024-11-14 | LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space | Guanwen Feng et.al. | 2411.09268 | null |
| 2024-11-06 | Large Generative Model-assisted Talking-face Semantic Communication System | Feibo Jiang et.al. | 2411.03876 | null |
| 2024-11-05 | SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation | Changpeng Cai et.al. | 2405.07257 | null |
| 2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
| 2024-10-29 | Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing | Haonan Tong et.al. | 2410.22112 | null |
| 2024-10-24 | Real-time 3D-aware Portrait Video Relighting | Ziqi Cai et.al. | 2410.18355 | link |
| 2024-10-21 | Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions | Malte Prinzler et.al. | 2410.16395 | null |
| 2024-10-18 | Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization | Bin Lin et.al. | 2410.14283 | null |
| 2024-10-18 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Hanbo Cheng et.al. | 2410.13726 | link |
| 2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | Yue Zhang et.al. | 2410.10122 | link |
| 2024-10-15 | Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | Fevziye Irem Eyiokur et.al. | 2410.11434 | null |
| 2024-10-15 | MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes | Zhenhui Ye et.al. | 2410.06734 | null |
| 2024-10-14 | Character-aware audio-visual subtitling in context | Jaesung Huh et.al. | 2410.11068 | null |
| 2024-10-14 | Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads | Federico Nocentini et.al. | 2410.11041 | null |
| 2024-10-14 | TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model | Jiazhi Guan et.al. | 2410.10696 | null |
| 2024-10-14 | Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization | Shanzhi Yin et.al. | 2410.10171 | null |
| 2024-10-10 | MMHead: Towards Fine-grained Multi-modal 3D Facial Animation | Sijing Wu et.al. | 2410.07757 | null |
| 2024-10-09 | FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model | Feng Qiu et.al. | 2409.13180 | null |
| 2024-10-01 | LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details | Jian Yang et.al. | 2410.00990 | null |
| 2024-09-29 | Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation | Jingyi Xu et.al. | 2409.19501 | null |
| 2024-09-27 | Diverse Code Query Learning for Speech-Driven Facial Animation | Chunzhi Gu et.al. | 2409.19143 | null |
| 2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083 | null |
| 2024-09-25 | ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Sichun Wu et.al. | 2409.07966 | link |
| 2024-09-24 | FastTalker: Jointly Generating Speech and Conversational Gestures from Text | Zixin Guo et.al. | 2409.16404 | null |
| 2024-09-23 | FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | Donglin Di et.al. | 2410.07151 | null |
| 2024-09-23 | MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning | Yue Han et.al. | 2409.15179 | null |
| 2024-09-18 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Sai Tanmay Reddy Chakkera et.al. | 2409.12156 | null |
| 2024-09-18 | GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations | Kartik Teotia et.al. | 2409.11951 | null |
| 2024-09-17 | 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy | Xuanmeng Sha et.al. | 2409.10848 | null |
| 2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
| 2024-09-14 | StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads | Suzhen Wang et.al. | 2409.09292 | null |
| 2024-09-11 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Steven Hogue et.al. | 2409.07649 | null |
| 2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | link |
| 2024-09-09 | PersonaTalk: Bring Attention to Your Persona in Visual Dubbing | Longhao Zhang et.al. | 2409.05379 | null |
| 2024-09-09 | KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation | Hoang-Son Vo-Thanh et.al. | 2409.05330 | link |
| 2024-09-05 | SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing | Lingyu Xiong et.al. | 2409.03605 | null |
| 2024-09-05 | SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model | Weipeng Tan et.al. | 2409.03270 | null |
| 2024-09-04 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | Jun Ling et.al. | 2409.02657 | null |
| 2024-09-02 | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding | Zhihao Xu et.al. | 2409.01113 | link |
| 2024-08-28 | Micro and macro facial expressions by driven animations in realistic Virtual Humans | Rubens Halbig Montanha et.al. | 2408.16110 | null |
| 2024-08-27 | MegActor- | Shurong Yang et.al. | 2408.14975 | null |
| 2024-08-25 | TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation | Jack Saunders et.al. | 2408.13714 | null |
| 2024-08-23 | G3FA: Geometry-guided GAN for Face Animation | Alireza Javanmardi et.al. | 2408.13049 | null |
| 2024-08-21 | AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition | Minheng Ni et.al. | 2408.11564 | null |
| 2024-08-21 | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention | Yihong Lin et.al. | 2408.11518 | null |
| 2024-08-20 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Zhijing Shao et.al. | 2408.10588 | link |
| 2024-08-18 | FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model | Ziyu Yao et.al. | 2408.09384 | null |
| 2024-08-18 | Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation | Xukun Zhou et.al. | 2408.09357 | null |
| 2024-08-18 | S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis | Dongze Li et.al. | 2408.09347 | null |
| 2024-08-16 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | Yihong Lin et.al. | 2408.01826 | null |
| 2024-08-14 | Content and Style Aware Audio-Driven Facial Animation | Qingju Liu et.al. | 2408.07005 | null |
| 2024-08-12 | DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation | Jisoo Kim et.al. | 2408.06010 | null |
| 2024-08-10 | High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model | Weizhi Zhong et.al. | 2408.05416 | null |
| 2024-08-10 | Style-Preserving Lip Sync via Audio-Aware Style Reference | Weizhi Zhong et.al. | 2408.05412 | null |
| 2024-08-09 | DeepSpeak Dataset v1.0 | Sarah Barrington et.al. | 2408.05366 | null |
| 2024-08-06 | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer | Jiazhi Guan et.al. | 2408.03284 | null |
| 2024-08-03 | Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation | Jintao Tan et.al. | 2408.01732 | null |
| 2024-08-03 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model | Farzaneh Jafari et.al. | 2408.01627 | null |
| 2024-08-01 | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Xiangyu Fan et.al. | 2408.00762 | null |
| 2024-08-01 | Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion | Manuel Kansy et.al. | 2408.00458 | null |
| 2024-08-01 | EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head | Qianyun He et.al. | 2408.00297 | null |
| 2024-07-31 | Deformable 3D Shape Diffusion Model | Dengsheng Chen et.al. | 2407.21428 | null |
| 2024-07-26 | LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement | Rui Zhang et.al. | 2407.18595 | null |
| 2024-07-24 | A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | Jose Geraldo Fernandes et.al. | 2407.17430 | null |
| 2024-07-24 | The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions | Rabab Algadhy et.al. | 2407.17253 | null |
| 2024-07-22 | PAV: Personalized Head Avatar from Unstructured Video Collection | Akin Caliskan et.al. | 2407.21047 | null |
| 2024-07-21 | Anchored Diffusion for Video Face Reenactment | Idan Kligvasser et.al. | 2407.15153 | null |
| 2024-07-20 | Text-based Talking Video Editing with Cascaded Conditional Diffusion | Bo Han et.al. | 2407.14841 | null |
| 2024-07-17 | Universal Facial Encoding of Codec Avatars from VR Headsets | Shaojie Bai et.al. | 2407.13038 | null |
| 2024-07-17 | EmoFace: Audio-driven Emotional 3D Face Animation | Chang Liu et.al. | 2407.12501 | link |
| 2024-07-13 | Learning Online Scale Transformation for Talking Head Video Generation | Fa-Ting Hong et.al. | 2407.09965 | null |
| 2024-07-12 | Real Face Video Animation Platform | Xiaokai Chen et.al. | 2407.18955 | null |
| 2024-07-12 | One-Shot Pose-Driving Face Animation Platform | He Feng et.al. | 2407.08949 | null |
| 2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | link |
| 2024-07-08 | MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices | Jianwen Jiang et.al. | 2407.05712 | null |
| 2024-07-08 | Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN | Jiacheng Su et.al. | 2407.05577 | null |
| 2024-07-04 | Compressed Skinning for Facial Blendshapes | Ladislav Kavan et.al. | 2406.11597 | null |
| 2024-07-03 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jianzhu Guo et.al. | 2407.03168 | link |
| 2024-07-02 | Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert | Han EunGi et.al. | 2407.01034 | null |
| 2024-06-26 | RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | Xiaozhong Ji et.al. | 2406.18284 | null |
| 2024-06-24 | The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | Sinan Sonlu et.al. | 2407.10993 | null |
| 2024-06-21 | EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot | Hao Fei et.al. | 2406.15177 | link |
| 2024-06-20 | MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset | Kim Sung-Bin et.al. | 2406.14272 | null |
| 2024-06-19 | DF40: Toward Next-Generation Deepfake Detection | Zhiyuan Yan et.al. | 2406.13495 | link |
| 2024-06-19 | AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models | Ken Chen et.al. | 2406.13272 | null |
| 2024-06-18 | RITA: A Real-time Interactive Talking Avatars Framework | Wuxinlin Cheng et.al. | 2406.13093 | null |
| 2024-06-18 | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing | Ming Meng et.al. | 2406.10553 | null |
| 2024-06-17 | NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation | Niu Guanchen et.al. | 2406.11259 | null |
| 2024-06-17 | Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement | Runyi Yu et.al. | 2406.08096 | null |
| 2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
| 2024-06-14 | DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details | Haitao Cao et.al. | 2405.19688 | null |
| 2024-06-13 | Talking Heads: Understanding Inter-layer Communication in Transformer Language Models | Jack Merullo et.al. | 2406.09519 | null |
| 2024-06-13 | DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Neha Sahipjohn et.al. | 2406.08802 | null |
| 2024-06-12 | Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation | Jiadong Liang et.al. | 2406.07895 | null |
| 2024-06-07 | Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation | Yue Ma et.al. | 2406.01900 | null |
| 2024-06-05 | Controllable Talking Face Generation by Implicit Facial Keypoints Editing | Dong Zhao et.al. | 2406.02880 | link |
| 2024-05-31 | MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses | Saif Mahmud et.al. | 2405.21004 | null |
| 2024-05-31 | MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | Shurong Yang et.al. | 2405.20851 | link |
| 2024-05-30 | Audio2Rig: Artist-oriented deep learning tool for facial animation | Bastien Arcelin et.al. | 2405.20412 | null |
| 2024-05-28 | OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance | Shuheng Ge et.al. | 2405.14709 | null |
| 2024-05-24 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation | Yuchi Wang et.al. | 2405.15758 | link |
| 2024-05-22 | Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling | Yibo Wang et.al. | 2405.13701 | null |
| 2024-05-21 | Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control | Yue Han et.al. | 2405.12970 | null |
| 2024-05-16 | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | Youngjoon Jang et.al. | 2405.10272 | null |
| 2024-05-14 | PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Yang Hou et.al. | 2405.08838 | link |
| 2024-05-10 | NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior | Gihoon Kim et.al. | 2405.05749 | null |
| 2024-05-09 | SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space | Zeren Zhang et.al. | 2405.05636 | null |
| 2024-05-08 | Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Ruijie Tao et.al. | 2404.18501 | link |
| 2024-05-07 | Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation | Dogucan Yaman et.al. | 2405.04327 | null |
| 2024-05-07 | AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | Tao Liu et.al. | 2405.03121 | null |
| 2024-04-29 | EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars | Nikita Drobyshev et.al. | 2404.19110 | null |
| 2024-04-29 | GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting | Bo Chen et.al. | 2404.19040 | null |
| 2024-04-29 | Embedded Representation Learning Network for Animating Styled Video Portrait | Tianyong Wang et.al. | 2404.19038 | null |
| 2024-04-29 | CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation | Xiangyu Liang et.al. | 2404.18604 | null |
| 2024-04-28 | GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting | Hongyun Yu et.al. | 2404.14037 | null |
| 2024-04-25 | GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Kyusun Cho et.al. | 2404.16012 | link |
| 2024-04-23 | TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting | Jiahe Li et.al. | 2404.15264 | link |
| 2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | null |
| 2024-04-16 | VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time | Sicheng Xu et.al. | 2404.10667 | null |
| 2024-04-15 | FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features | Andre Rochow et.al. | 2404.09736 | null |
| 2024-04-13 | THQA: A Perceptual Quality Assessment Database for Talking Heads | Yingjie Zhou et.al. | 2404.09003 | link |
| 2024-04-11 | EFHQ: Multi-purpose ExtremePose-Face-HQ dataset | Trung Tuan Dao et.al. | 2312.17205 | null |
| 2024-04-09 | Deepfake Generation and Detection: A Benchmark and Survey | Gan Pei et.al. | 2403.17881 | link |
| 2024-04-08 | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation | Heyuan Li et.al. | 2404.05680 | null |
| 2024-04-07 | GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets | Dongjing Shan et.al. | 2404.04924 | null |
| 2024-04-07 | Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation | Renshuai Liu et.al. | 2401.01207 | null |
| 2024-04-03 | MI-NeRF: Learning a Single Face NeRF from Multiple Identities | Aggelina Chatziagapi et.al. | 2403.19920 | null |
| 2024-04-02 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Shuai Tan et.al. | 2404.01647 | null |
| 2024-04-02 | Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation | Taekyung Ki et.al. | 2404.00636 | null |
| 2024-04-02 | Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation | Se Jin Park et.al. | 2305.19556 | null |
| 2024-04-01 | FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio | Chao Xu et.al. | 2403.01901 | link |
| 2024-03-29 | Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Jaehoon Ko et.al. | 2403.20153 | link |
| 2024-03-28 | MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | Seyeon Kim et.al. | 2403.19144 | link |
| 2024-03-28 | GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response | Govind Mittal et.al. | 2210.06186 | link |
| 2024-03-27 | X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention | You Xie et.al. | 2403.15931 | null |
| 2024-03-26 | Superior and Pragmatic Talking Face Generation with Teacher-Student Framework | Chao Liang et.al. | 2403.17883 | null |
| 2024-03-26 | AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Huawei Wei et.al. | 2403.17694 | link |
| 2024-03-26 | Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | Zhenhui Ye et.al. | 2401.08503 | null |
| 2024-03-25 | DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment | Stella Bounareli et.al. | 2403.17217 | null |
| 2024-03-25 | AnimateMe: 4D Facial Expressions via Diffusion Models | Dimitrios Gerogiannis et.al. | 2403.17213 | null |
| 2024-03-25 | Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework | Ziyao Huang et.al. | 2403.16510 | link |
| 2024-03-23 | Adaptive Super Resolution For One-Shot Talking-Head Generation | Luchuan Song et.al. | 2403.15944 | link |
| 2024-03-22 | LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example | Soyeon Yoon et.al. | 2403.15227 | link |
| 2024-03-22 | Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing | Juan Zhang et.al. | 2403.11700 | null |
| 2024-03-19 | EmoVOCA: Speech-Driven Emotional 3D Talking Heads | Federico Nocentini et.al. | 2403.12886 | link |
| 2024-03-19 | ScanTalk: 3D Talking Heads from Unregistered Scans | Federico Nocentini et.al. | 2403.10942 | link |
| 2024-03-15 | StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | Dongchan Min et.al. | 2208.10922 | null |
| 2024-03-14 | GAIA: Zero-shot Talking Avatar Generation | Tianyu He et.al. | 2311.15230 | null |
| 2024-03-13 | Say Anything with Any Style | Shuai Tan et.al. | 2403.06363 | null |
| 2024-03-12 | FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization | Shuai Tan et.al. | 2403.06375 | null |
| 2024-03-12 | Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style | Shuai Tan et.al. | 2403.06365 | null |
| 2024-03-11 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos | Weixia Zhang et.al. | 2403.06421 | link |
| 2024-03-05 | Memories are One-to-Many Mapping Alleviators in Talking Face Generation | Anni Tang et.al. | 2212.05005 | null |
| 2024-03-02 | G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment | Juan Zhang et.al. | 2402.18122 | null |
| 2024-03-01 | DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder | Chenpeng Du et.al. | 2303.17550 | null |
| 2024-02-29 | Learning a Generalized Physical Face Model From Data | Lingchen Yang et.al. | 2402.19477 | null |
| 2024-02-28 | Context-aware Talking Face Video Generation | Meidai Xuanyuan et.al. | 2402.18092 | null |
| 2024-02-27 | EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Linrui Tian et.al. | 2402.17485 | null |
| 2024-02-27 | Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis | Zicheng Zhang et.al. | 2402.17364 | link |
| 2024-02-26 | Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields | Yifei Li et.al. | 2402.16599 | null |
| 2024-02-25 | AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Yasheng Sun et.al. | 2402.16124 | null |
| 2024-02-21 | Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters | Zechen Bai et.al. | 2402.13724 | link |
| 2024-02-21 | StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Gaoxiang Cong et.al. | 2402.12636 | link |
| 2024-02-12 | StyleLipSync: Style-based Personalized Lip-sync Video Generation | Taekyung Ki et.al. | 2305.00521 | null |
| 2024-02-08 | DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Zhiyuan Ma et.al. | 2402.05712 | link |
| 2024-02-05 | One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space | Stella Bounareli et.al. | 2402.03553 | null |
| 2024-02-02 | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation | Guanwen Feng et.al. | 2402.01422 | null |
| 2024-01-31 | MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Wenhao Guan et.al. | 2312.10687 | null |
| 2024-01-30 | Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance | Qingcheng Zhao et.al. | 2401.15687 | null |
| 2024-01-28 | Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes | Weifeng Liu et.al. | 2401.15668 | link |
| 2024-01-27 | An Implicit Physical Face Model Driven by Expression and Style | Lingchen Yang et.al. | 2401.15414 | null |
| 2024-01-26 | Implicit Neural Representation for Physics-driven Actuated Soft Bodies | Lingchen Yang et.al. | 2401.14861 | null |
| 2024-01-25 | SAiD: Speech-driven Blendshape Facial Animation with Diffusion | Inkyu Park et.al. | 2401.08655 | link |
| 2024-01-23 | NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis | Chongke Bi et.al. | 2401.12568 | null |
| 2024-01-19 | Fast Registration of Photorealistic Avatars for VR Facial Animation | Chaitanya Patel et.al. | 2401.11002 | null |
| 2024-01-18 | Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | Soumyya Kanti Datta et.al. | 2401.10113 | link |
| 2024-01-18 | Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models | Jeongsoo Choi et.al. | 2306.16003 | null |
| 2024-01-16 | EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model | Bingyuan Zhang et.al. | 2401.08049 | null |
| 2024-01-12 | DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder | Tao Liu et.al. | 2311.01811 | link |
| 2024-01-11 | Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors | Jack Saunders et.al. | 2401.06126 | null |
| 2024-01-11 | Jump Cut Smoothing for Talking Heads | Xiaojuan Wang et.al. | 2401.04718 | null |
| 2024-01-08 | AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation | Liyang Chen et.al. | 2310.07236 | null |
| 2024-01-07 | Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness | Sicheng Yang et.al. | 2401.03476 | null |
| 2024-01-04 | Expressive Speech-driven Facial Animation with controllable emotions | Yutong Chen et.al. | 2301.02008 | link |
| 2023-12-23 | TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation | Xize Cheng et.al. | 2312.15197 | null |
| 2023-12-22 | DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation | Chenxu Zhang et.al. | 2312.13578 | null |
| 2023-12-20 | FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability | Linze Li et.al. | 2312.03775 | null |
| 2023-12-19 | Learning Dense Correspondence for NeRF-Based Face Reenactment | Songlin Yang et.al. | 2312.10422 | null |
| 2023-12-19 | Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing | Yushi Lan et.al. | 2312.03763 | null |
| 2023-12-18 | VectorTalker: SVG Talking Face Generation with Progressive Vectorisation | Hao Hu et.al. | 2312.11568 | null |
| 2023-12-18 | AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis | Dongze Li et.al. | 2312.10921 | null |
| 2023-12-18 | Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation | Hui Fu et.al. | 2312.10877 | null |
| 2023-12-15 | DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | Yifeng Ma et.al. | 2312.09767 | link |
| 2023-12-15 | Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars | Andre Rochow et.al. | 2312.09750 | null |
| 2023-12-13 | uTalk: Bridging the Gap Between Humans and AI | Hussam Azzuni et.al. | 2310.02739 | null |
| 2023-12-13 | MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation | Haozhe Wu et.al. | 2303.09797 | null |
| 2023-12-12 | GMTalker: Gaussian Mixture based Emotional talking video Portraits | Yibo Xia et.al. | 2312.07669 | null |
| 2023-12-12 | GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance | Haiming Zhang et.al. | 2312.07385 | null |
| 2023-12-11 | Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Georgios Milis et.al. | 2312.06613 | link |
| 2023-12-11 | Study of Non-Verbal Behavior in Conversational Agents | Camila Vicari Maccari et.al. | 2312.06530 | null |
| 2023-12-11 | DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | Aaron Mir et.al. | 2312.06400 | null |
| 2023-12-11 | Audio-driven Talking Face Generation by Overcoming Unintended Information Flow | Dogucan Yaman et.al. | 2307.09368 | null |
| 2023-12-10 | DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2305.06225 | link |
| 2023-12-09 | R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | Zhiling Ye et.al. | 2312.05572 | null |
| 2023-12-09 | FT2TF: First-Person Statement Text-To-Talking Face Generation | Xingjian Diao et.al. | 2312.05430 | null |
| 2023-12-08 | SingingHead: A Large-scale 4D Dataset for Singing Head Animation | Sijing Wu et.al. | 2312.04369 | null |
| 2023-12-07 | VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | Xusen Sun et.al. | 2312.01841 | null |
| 2023-12-05 | PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | Tianshun Han et.al. | 2312.02781 | null |
| 2023-12-05 | MyPortrait: Morphable Prior-Guided Personalized Portrait Generation | Bo Ding et.al. | 2312.02703 | null |
| 2023-12-02 | DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser | Peng Chen et.al. | 2311.16565 | null |
| 2023-12-01 | 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing | Balamurugan Thambiraja et.al. | 2312.00870 | null |
| 2023-11-30 | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data | Yu Deng et.al. | 2311.18729 | null |
| 2023-11-30 | Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation | Pramook Khungurn et.al. | 2311.17409 | null |
| 2023-11-29 | SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | Ziqiao Peng et.al. | 2311.17590 | link |
| 2023-11-28 | THInImg: Cross-modal Steganography for Presenting Talking Heads in Images | Lin Zhao et.al. | 2311.17177 | null |
| 2023-11-28 | BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis | Hao-Bin Duan et.al. | 2311.05521 | link |
| 2023-11-28 | Continuously Controllable Facial Expression Editing in Talking Face Videos | Zhiyao Sun et.al. | 2209.08289 | null |
| 2023-11-20 | MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI | Lifei Zheng et.al. | 2311.14730 | null |
| 2023-11-15 | CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding | Jianzong Wang et.al. | 2311.08673 | null |
| 2023-11-13 | DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation | Guinan Su et.al. | 2311.04766 | null |
| 2023-11-12 | ChatAnything: Facetime Chat with LLM-Enhanced Personas | Yilin Zhao et.al. | 2311.06772 | null |
| 2023-11-08 | Synthetic Speaking Children -- Why We Need Them and How to Make Them | Muhammad Ali Farooq et.al. | 2311.06307 | null |
| 2023-11-06 | RADIO: Reference-Agnostic Dubbing Video Synthesis | Dongyeun Lee et.al. | 2309.01950 | null |
| 2023-11-05 | 3D-Aware Talking-Head Video Motion Transfer | Haomiao Ni et.al. | 2311.02549 | null |
| 2023-11-03 | Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading | Songtao Luo et.al. | 2310.05058 | link |
| 2023-11-02 | LaughTalk: Expressive 3D Talking Head Generation with Laughter | Kim Sung-Bin et.al. | 2311.00994 | null |
| 2023-11-02 | High-Fidelity and Freely Controllable Talking Head Video Generation | Yue Gao et.al. | 2304.10168 | null |
| 2023-10-31 | Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape | Wei Zhao et.al. | 2310.20240 | null |
| 2023-10-29 | On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models | Marija Ivanovska et.al. | 2307.05397 | null |
| 2023-10-25 | Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control | Elif Bozkurt et.al. | 2310.17011 | null |
| 2023-10-23 | The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills | Qingxiao Zheng et.al. | 2310.15112 | null |
| 2023-10-19 | Gemino: Practical and Robust Neural Compression for Video Conferencing | Vibhaalakshmi Sivaraman et.al. | 2209.10507 | null |
| 2023-10-17 | CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation | Zhaojie Chu et.al. | 2310.11295 | null |
| 2023-10-15 | HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation | Yaosen Chen et.al. | 2310.05720 | link |
| 2023-10-12 | CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity | Abdullah Hayajneh et.al. | 2310.07969 | link |
| 2023-10-12 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | Yuan Gan et.al. | 2309.04946 | link |
| 2023-10-08 | GestSync: Determining who is speaking without a talking head | Sindhu B Hegde et.al. | 2310.05304 | link |
| 2023-09-30 | DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models | Zhiyao Sun et.al. | 2310.00434 | null |
| 2023-09-28 | OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions | Jin Liu et.al. | 2309.16148 | null |
| 2023-09-26 | Emotional Speech-Driven Animation with Content-Emotion Disentanglement | Radek Daněček et.al. | 2306.08990 | null |
| 2023-09-20 | FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion | Stefan Stan et.al. | 2309.11306 | link |
| 2023-09-20 | Context-Aware Talking-Head Video Editing | Songlin Yang et.al. | 2308.00462 | null |
| 2023-09-18 | That's What I Said: Fully-Controllable Talking Face Generation | Youngjoon Jang et.al. | 2304.03275 | null |
| 2023-09-15 | Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech | Junjie Li et.al. | 2309.08408 | link |
| 2023-09-14 | DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Yaoyu Su et.al. | 2309.07752 | null |
| 2023-09-14 | DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks | Zipeng Qi et.al. | 2309.07509 | null |
| 2023-09-14 | HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods | Yongyuan Li et.al. | 2309.07495 | link |
| 2023-09-13 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | Qinghua Liu et.al. | 2309.06723 | null |
| 2023-09-12 | DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention | Aaditya Kharel et.al. | 2309.06511 | null |
| 2023-09-12 | Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos | Ekta Prashnani et.al. | 2305.03713 | null |
| 2023-09-11 | ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment | Yicheng Zhong et.al. | 2308.14448 | null |
| 2023-09-10 | MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment | Tina Behrouzi et.al. | 2309.05095 | null |
| 2023-09-09 | Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video | Xiuzhe Wu et.al. | 2309.04814 | link |
| 2023-09-01 | Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances | Wolfgang Paier et.al. | 2306.10006 | null |
| 2023-08-30 | From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications | Shreyank N Gowda et.al. | 2308.16041 | null |
| 2023-08-30 | SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces | Ziqiao Peng et.al. | 2306.10799 | link |
| 2023-08-30 | Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models | Antoni Bigata Casademunt et.al. | 2305.08854 | link |
| 2023-08-29 | Papeos: Augmenting Research Papers with Talk Videos | Tae Soo Kim et.al. | 2308.15224 | null |
| 2023-08-25 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation | Ziqiao Peng et.al. | 2303.11089 | link |
| 2023-08-24 | ToonTalker: Cross-Domain Face Reenactment | Yuan Gong et.al. | 2308.12866 | null |
| 2023-08-24 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Jiahe Li et.al. | 2307.09323 | link |
| 2023-08-23 | DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion | Se Jin Park et.al. | 2310.05934 | null |
| 2023-08-21 | Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | Tong Sha et.al. | 2109.02081 | null |
| 2023-08-18 | Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Soumik Mukhopadhyay et.al. | 2308.09716 | link |
| 2023-08-18 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Fa-Ting Hong et.al. | 2307.09906 | link |
| 2023-08-17 | A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation | Li Liu et.al. | 2308.08849 | link |
| 2023-08-16 | Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions | Yuqi Sun et.al. | 2306.10813 | null |
| 2023-08-12 | Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Zhichao Wang et.al. | 2308.06457 | link |
| 2023-08-12 | DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation | Yichao Yan et.al. | 2203.07931 | null |
| 2023-08-11 | Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space | Haoyu Wang et.al. | 2308.06076 | link |
| 2023-08-11 | VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer | Liyang Chen et.al. | 2308.04830 | null |
| 2023-08-10 | Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution | Hyojoon Park et.al. | 2305.03216 | null |
| 2023-08-02 | Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Zhenhui Ye et.al. | 2306.03504 | null |
| 2023-07-29 | Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | Michał Stypułkowski et.al. | 2301.03396 | null |
| 2023-07-26 | Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation | Federico Nocentini et.al. | 2306.01415 | link |
| 2023-07-20 | HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces | Stella Bounareli et.al. | 2307.10797 | link |
| 2023-07-20 | MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | Yunfei Liu et.al. | 2307.10008 | null |
| 2023-07-19 | Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline | Zhigang Chang et.al. | 2307.09821 | null |
| 2023-07-19 | OPHAvatars: One-shot Photo-realistic Head Avatars | Shaoxu Li et.al. | 2307.09153 | link |
| 2023-07-18 | FACTS: Facial Animation Creation using the Transfer of Styles | Jack Saunders et.al. | 2307.09480 | null |
| 2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
| 2023-07-08 | FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction | Ganglai Wang et.al. | 2307.03990 | null |
| 2023-07-05 | Interactive Conversational Head Generation | Mohan Zhou et.al. | 2307.02090 | null |
| 2023-07-04 | A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation | Louis Airale et.al. | 2307.03270 | link |
| 2023-07-04 | Generating Animatable 3D Cartoon Faces from Single Portraits | Chuanyu Pan et.al. | 2307.01468 | null |
| 2023-07-03 | RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Neha Sahipjohn et.al. | 2307.01233 | null |
| 2023-06-20 | Audio-Driven 3D Facial Animation from In-the-Wild Videos | Liying Lu et.al. | 2306.11541 | null |
| 2023-06-13 | Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | Ricong Huang et.al. | 2306.07579 | null |
| 2023-06-13 | AniFaceDrawing: Anime Portrait Exploration during Your Sketching | Zhengyu Huang et.al. | 2306.07476 | null |
| 2023-06-12 | NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection | Yu Chen et.al. | 2306.06885 | null |
| 2023-06-10 | StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | Yifeng Ma et.al. | 2301.01081 | link |
| 2023-06-08 | ReliableSwap: Boosting General Face Swapping Via Reliable Supervision | Ge Yuan et.al. | 2306.05356 | link |
| 2023-06-06 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | Jianrong Wang et.al. | 2306.03594 | null |
| 2023-06-05 | Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions | Shaoxu Li et.al. | 2306.02903 | link |
| 2023-05-31 | High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | Chao Xu et.al. | 2305.02572 | null |
| 2023-05-23 | CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation | Jingning Xu et.al. | 2305.13962 | null |
| 2023-05-22 | RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | Dongwei Pan et.al. | 2305.13353 | link |
| 2023-05-19 | UniFLG: Unified Facial Landmark Generator from Text or Speech | Kentaro Mitsui et.al. | 2302.14337 | null |
| 2023-05-18 | An Android Robot Head as Embodied Conversational Agent | Marcel Heisler et.al. | 2305.10945 | null |
| 2023-05-18 | Audio-Visual Person-of-Interest DeepFake Detection | Davide Cozzolino et.al. | 2204.03083 | link |
| 2023-05-17 | INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network | Shuang Chen et.al. | 2305.10589 | null |
| 2023-05-17 | LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model | Kwangho Lee et.al. | 2305.10456 | null |
| 2023-05-15 | Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | Weizhi Zhong et.al. | 2305.08293 | link |
| 2023-05-09 | Zero-shot personalized lip-to-speech synthesis with face image based voice control | Zheng-Yan Sheng et.al. | 2305.14359 | null |
| 2023-05-09 | StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | Jiazhi Guan et.al. | 2305.05445 | null |
| 2023-05-09 | Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator | Chao Xu et.al. | 2305.02594 | null |
| 2023-05-01 | StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video | Lizhen Wang et.al. | 2305.00942 | link |
| 2023-05-01 | GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation | Zhenhui Ye et.al. | 2305.00787 | null |
| 2023-04-28 | A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation | Bo-Kyeong Kim et.al. | 2304.00471 | null |
| 2023-04-27 | Controllable One-Shot Face Video Synthesis With Semantic Aware Prior | Kangning Liu et.al. | 2304.14471 | null |
| 2023-04-25 | AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | Rongjie Huang et.al. | 2304.12995 | link |
| 2023-04-24 | VR Facial Animation for Immersive Telepresence Avatars | Andre Rochow et.al. | 2304.12051 | null |
| 2023-04-21 | Implicit Neural Head Synthesis via Controllable Local Deformation Fields | Chuhan Chen et.al. | 2304.11113 | null |
| 2023-04-20 | DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation | Shuai Shen et.al. | 2301.03786 | link |
| 2023-04-18 | Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations | Rongliang Wu et.al. | 2304.08945 | null |
| 2023-04-17 | Autoregressive GAN for Semantic Unconditional Head Motion Generation | Louis Airale et.al. | 2211.00987 | link |
| 2023-04-11 | One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field | Weichuang Li et.al. | 2304.05097 | null |
| 2023-04-06 | Face Animation with an Attribute-Guided Diffusion Model | Bohan Zeng et.al. | 2304.03199 | link |
| 2023-04-06 | 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios | Wei Chen et.al. | 2304.02814 | null |
| 2023-04-03 | CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | Jinbo Xing et.al. | 2301.02379 | link |
| 2023-04-01 | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | Longwen Zhang et.al. | 2304.03117 | null |
| 2023-04-01 | TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | Yifeng Ma et.al. | 2304.00334 | null |
| 2023-03-31 | FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions | Jin Liu et.al. | 2303.17789 | null |
| 2023-03-31 | Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | Jiadong Wang et.al. | 2303.17480 | null |
| 2023-03-27 | OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis | Hongyi Xu et.al. | 2303.15539 | null |
| 2023-03-27 | Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms | Stevo Racković et.al. | 2302.04843 | null |
| 2023-03-27 | MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation | Bowen Zhang et.al. | 2212.08062 | link |
| 2023-03-27 | A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation | Stevo Racković et.al. | 2205.04289 | null |
| 2023-03-26 | OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | Zhiyuan Ma et.al. | 2303.14662 | link |
| 2023-03-26 | Emotionally Enhanced Talking Face Generation | Sahil Goyal et.al. | 2303.11548 | link |
| 2023-03-26 | Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation | Stevo Racković et.al. | 2303.06370 | null |
| 2023-03-24 | Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | Siddarth Ravichandran et.al. | 2209.01320 | null |
| 2023-03-23 | PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° | Sizhe An et.al. | 2303.13071 | null |
| 2023-03-22 | Style Transfer for 2D Talking Head Animation | Trong-Thang Pham et.al. | 2303.09799 | link |
| 2023-03-22 | MARLIN: Masked Autoencoder for facial video Representation LearnINg | Zhixi Cai et.al. | 2211.06627 | link |
| 2023-03-14 | DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions | Geumbyeol Hwang et.al. | 2303.07697 | link |
| 2023-03-13 | SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | Wenxuan Zhang et.al. | 2211.12194 | link |
| 2023-03-09 | FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Kazi Injamamul Haque et.al. | 2303.05416 | link |
| 2023-03-09 | Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation | Qi Chen et.al. | 2303.05322 | link |
| 2023-03-07 | DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | Zhimeng Zhang et.al. | 2303.03988 | link |
| 2023-03-05 | Cyber Vaccine for Deepfake Immunity | Ching-Chun Chang et.al. | 2303.02659 | null |
| 2023-03-04 | High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors | Yunpeng Bai et.al. | 2211.15064 | null |
| 2023-03-01 | DPE: Disentanglement of Pose and Expression for General Video Portrait Editing | Youxin Pang et.al. | 2301.06281 | link |
| 2023-02-27 | Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Minsu Kim et.al. | 2303.08670 | null |
| 2023-02-27 | Memory-augmented Contrastive Learning for Talking Head Generation | Jianrong Wang et.al. | 2302.13469 | link |
| 2023-02-24 | Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention | Bin Liu et.al. | 2302.12532 | null |
| 2023-02-16 | OPT: One-shot Pose-Controllable Talking Head Generation | Jin Liu et.al. | 2302.08197 | null |
| 2023-02-14 | Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space | Trevine Oorloff et.al. | 2203.14512 | link |
| 2023-01-31 | GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | Zhenhui Ye et.al. | 2301.13430 | null |
| 2023-01-23 | Data standardization for robust lip sync | Chun Wang et.al. | 2202.06198 | null |
| 2023-01-20 | Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes | Nicolas Wagner et.al. | 2212.14784 | null |
| 2023-01-15 | Learning Audio-Driven Viseme Dynamics for 3D Face Animation | Linchao Bao et.al. | 2301.06059 | null |
| 2022-12-30 | Imitator: Personalized Speech-driven 3D Facial Animation | Balamurugan Thambiraja et.al. | 2301.00023 | null |
| 2022-12-28 | All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks | Carina Geldhauser et.al. | 2212.13810 | null |
| 2022-12-23 | Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing | William Brannon et.al. | 2212.12137 | null |
| 2022-12-09 | Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers | Yasheng Sun et.al. | 2212.04970 | null |
| 2022-12-07 | Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors | Zhentao Yu et.al. | 2212.04248 | null |
| 2022-12-07 | SPACE: Speech-driven Portrait Animation with Controllable Expression | Siddharth Gururani et.al. | 2211.09809 | null |
| 2022-11-30 | Extracting Semantic Knowledge from GANs with Unsupervised Learning | Jianjin Xu et.al. | 2211.16710 | null |
| 2022-11-29 | VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild | Kun Cheng et.al. | 2211.14758 | null |
| 2022-11-26 | Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis | Duomin Wang et.al. | 2211.14506 | link |
| 2022-11-22 | Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition | Jiaxiang Tang et.al. | 2211.12368 | null |
| 2022-11-10 | On the role of Lip Articulation in Visual Speech Perception | Zakaria Aldeneh et.al. | 2203.10117 | null |
| 2022-11-04 | SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Se Jin Park et.al. | 2211.00924 | null |
| 2022-10-21 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection | Alexandros Haliassos et.al. | 2201.07131 | link |
| 2022-10-14 | Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Aolan Sun et.al. | 2210.06877 | null |
| 2022-10-13 | Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Vladimir Iashin et.al. | 2210.07055 | link |
| 2022-10-07 | Compressing Video Calls using Synthetic Talking Heads | Madhav Agarwal et.al. | 2210.03692 | null |
| 2022-10-07 | A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis | Yichen Han et.al. | 2210.03335 | null |
| 2022-10-06 | Audio-Visual Face Reenactment | Madhav Agarwal et.al. | 2210.02755 | link |
| 2022-10-06 | Finding Directions in GAN's Latent Space for Neural Face Reenactment | Stella Bounareli et.al. | 2202.00046 | link |
| 2022-10-04 | Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale | Aditya Agarwal et.al. | 2208.09796 | null |
| 2022-09-29 | Facial Landmark Predictions with Applications to Metaverse | Qiao Han et.al. | 2209.14698 | link |
| 2022-09-27 | StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment | Stella Bounareli et.al. | 2209.13375 | link |
| 2022-09-23 | EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | Xinya Ji et.al. | 2205.15278 | null |
| 2022-09-21 | FNeVR: Neural Volume Rendering for Face Animation | Bohan Zeng et.al. | 2209.10340 | link |
| 2022-09-19 | AutoLV: Automatic Lecture Video Generator | Wenbin Wang et.al. | 2209.08795 | null |
| 2022-09-09 | Talking Head from Speech Audio using a Pre-trained Image Generator | Mohammed M. Alghamdi et.al. | 2209.04252 | null |
| 2022-09-07 | Restructurable Activation Networks | Kartikeya Bhardwaj et.al. | 2208.08562 | link |
| 2022-08-29 | StableFace: Analyzing and Improving Motion Stability for Talking Face Generation | Jun Ling et.al. | 2208.13717 | null |
| 2022-08-17 | Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors | Sindhu B Hegde et.al. | 2208.08118 | link |
| 2022-08-03 | Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control | Michail Christos Doukas et.al. | 2208.02210 | null |
| 2022-08-02 | Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer | Ailin Huang et.al. | 2206.12837 | link |
| 2022-08-01 | A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip | Shuang Chen et.al. | 2208.01149 | link |
| 2022-07-27 | A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing | Goluck Konuko et.al. | 2207.13530 | null |
| 2022-07-24 | Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | Shuai Shen et.al. | 2207.11770 | link |
| 2022-07-22 | Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos | Panagiotis P. Filntisis et.al. | 2207.11094 | link |
| 2022-07-20 | NARRATE: A Normal Assisted Free-View Portrait Stylizer | Youjia Wang et.al. | 2207.00974 | null |
| 2022-07-20 | VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection | Joanna Hong et.al. | 2206.07458 | null |
| 2022-07-20 | Responsive Listening Head Generation: A Benchmark Dataset and Baseline | Mohan Zhou et.al. | 2112.13548 | null |
| 2022-07-13 | FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis | Yongqi Wang et.al. | 2207.03800 | link |
| 2022-06-29 | Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs | Bo-Kyeong Kim et.al. | 2206.14658 | null |
| 2022-06-09 | Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | Alexander Waibel et.al. | 2206.04523 | null |
| 2022-05-31 | Text/Speech-Driven Full-Body Animation | Wenlin Zhuang et.al. | 2205.15573 | null |
| 2022-05-27 | Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast | Boqing Zhu et.al. | 2204.14057 | link |
| 2022-05-26 | One-Shot Face Reenactment on Megapixels | Wonjun Kang et.al. | 2205.13368 | null |
| 2022-05-24 | Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts | Debjoy Saha et.al. | 2205.12194 | link |
| 2022-05-20 | MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement | Alexander Richard et.al. | 2104.08223 | link |
| 2022-05-13 | Talking Face Generation with Multilingual TTS | Hyoung-Kyu Song et.al. | 2205.06421 | null |
| 2022-05-02 | Emotion-Controllable Generalized Talking Face Generation | Sanjana Sinha et.al. | 2205.01155 | null |
| 2022-05-02 | A Novel Speech-Driven Lip-Sync Model with CNN and LSTM | Xiaohong Li et.al. | 2205.00916 | null |
| 2022-04-27 | Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion | Sen Chen et.al. | 2204.12756 | null |
| 2022-04-25 | Fast Facial Landmark Detection and Applications: A Survey | Kostiantyn Khabarlak et.al. | 2101.10808 | null |
| 2022-04-13 | Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | Zipeng Ye et.al. | 2204.06180 | null |
| 2022-04-12 | Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild | Ganglai Wang et.al. | 2203.03984 | null |
| 2022-04-06 | Transformer-S2A: Robust and Efficient Speech-to-Animation | Liyang Chen et.al. | 2111.09771 | null |
| 2022-04-03 | Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Pulkit Tandon et.al. | 2106.14014 | link |
| 2022-03-30 | End to End Lip Synchronization with a Temporal AutoEncoder | Yoav Shalev et.al. | 2203.16224 | link |
| 2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
| 2022-03-17 | StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | Fei Yin et.al. | 2203.04036 | link |
| 2022-03-17 | FaceFormer: Speech-Driven 3D Facial Animation with Transformers | Yingruo Fan et.al. | 2112.05329 | link |
| 2022-03-16 | Efficient conditioned face animation using frontally-viewed embedding | Maxime Oquab et.al. | 2203.08765 | null |
| 2022-03-15 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2203.06605 | link |
| 2022-03-10 | An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection | Ganglai Wang et.al. | 2203.05178 | null |
| 2022-03-04 | Multi-modality Deep Restoration of Extremely Compressed Face Videos | Xi Zhang et.al. | 2107.05548 | null |
| 2022-03-01 | FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset | Hasam Khalid et.al. | 2108.05080 | link |
| 2022-02-25 | FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 2202.12972 | null |
| 2022-02-22 | Thinking the Fusion Strategy of Multi-reference Face Reenactment | Takuya Yashima et.al. | 2202.10758 | null |
| 2022-01-24 | Selective Listening by Synchronizing Speech with Lips | Zexu Pan et.al. | 2106.07150 | link |
| 2022-01-22 | Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary | Sibo Zhang et.al. | 2104.14631 | null |
| 2022-01-21 | Stitch it in Time: GAN-Based Facial Editing of Real Videos | Rotem Tzaban et.al. | 2201.08361 | link |
| 2022-01-17 | Towards Realistic Visual Dubbing with Heterogeneous Sources | Tianyi Xie et.al. | 2201.06260 | null |
| 2022-01-16 | Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels | Zipeng Ye et.al. | 2201.05986 | null |
| 2022-01-03 | DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | Shunyu Yao et.al. | 2201.00791 | null |
| 2021-12-20 | Parallel and High-Fidelity Text-to-Lip Generation | Jinglin Liu et.al. | 2107.06831 | link |
| 2021-12-19 | Initiative Defense against Facial Manipulation | Qidong Huang et.al. | 2112.10098 | link |
| 2021-12-07 | Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation | Yingruo Fan et.al. | 2112.02214 | null |
| 2021-12-06 | One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning | Suzhen Wang et.al. | 2112.02749 | null |
| 2021-11-29 | Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates | Shenhan Qian et.al. | 2108.08020 | link |
| 2021-11-04 | FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Wei Gan et.al. | 2111.02751 | null |
| 2021-11-02 | BiosecurID: a multimodal biometric database | Julian Fierrez et.al. | 2111.03472 | null |
| 2021-10-30 | Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | Haozhe Wu et.al. | 2111.00203 | link |
| 2021-10-26 | Emotion recognition in talking-face videos using persistent entropy and neural networks | Eduardo Paluzo-Hidalgo et.al. | 2110.13571 | link |
| 2021-10-26 | ViDA-MAN: Visual Dialog with Digital Humans | Tong Shen et.al. | 2110.13384 | null |
| 2021-10-22 | Invertible Frowns: Video-to-Video Facial Emotion Translation | Ian Magnusson et.al. | 2109.08061 | null |
| 2021-10-19 | Talking Head Generation with Audio and Speech Related Facial Action Units | Sen Chen et.al. | 2110.09951 | null |
| 2021-10-16 | Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor | Anchit Gupta et.al. | 2110.08580 | null |
| 2021-10-12 | Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment | Haichao Zhang et.al. | 2110.04708 | null |
| 2021-10-07 | Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution | Yangyang Shi et.al. | 2110.05241 | null |
| 2021-09-24 | Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation | Yuanxun Lu et.al. | 2109.10595 | null |
| 2021-09-20 | Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach | Stevo Rackovic et.al. | 2109.08356 | null |
| 2021-09-17 | Detection of GAN-synthesized street videos | Omran Alamayreh et.al. | 2109.04991 | null |
| 2021-08-30 | Audiovisual Speech Synthesis using Tacotron2 | Ahmed Hussen Abdelaziz et.al. | 2008.00620 | null |
| 2021-08-23 | KoDF: A Large-scale Korean DeepFake Detection Dataset | Patrick Kwon et.al. | 2103.10094 | null |
| 2021-08-23 | HeadGAN: One-shot Neural Head Synthesis and Editing | Michail Christos Doukas et.al. | 2012.08261 | null |
| 2021-08-19 | AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Yudong Guo et.al. | 2103.11078 | link |
| 2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
| 2021-08-18 | FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | Chenxu Zhang et.al. | 2108.07938 | link |
| 2021-08-12 | UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing | Meng Cao et.al. | 2108.05650 | null |
| 2021-08-11 | AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Xinsheng Wang et.al. | 2108.04325 | null |
| 2021-08-06 | SofGAN: A Portrait Image Generator with Dynamic Styling | Anpei Chen et.al. | 2007.03780 | link |
| 2021-07-27 | Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations | Laurent Benaroya et.al. | 2107.12346 | null |
| 2021-07-21 | Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | Sefik Emre Eskimez et.al. | 2008.03592 | link |
| 2021-07-20 | Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | Suzhen Wang et.al. | 2107.09293 | link |
| 2021-07-10 | Speech2Video: Cross-Modal Distillation for Speech to Video Generation | Shijing Si et.al. | 2107.04806 | null |
| 2021-07-07 | Egocentric Videoconferencing | Mohamed Elgharib et.al. | 2107.03109 | null |
| 2021-06-09 | LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization | Avisek Lahiri et.al. | 2106.04185 | null |
| 2021-05-20 | Audio-Driven Emotional Video Portraits | Xinya Ji et.al. | 2104.07452 | null |
| 2021-05-07 | Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | Lincheng Li et.al. | 2104.07995 | link |
| 2021-05-05 | A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors | Ruobing Zheng et.al. | 2002.08700 | null |
| 2021-04-29 | Learned Spatial Representations for Few-shot Talking-Head Synthesis | Moustafa Meshry et.al. | 2104.14557 | null |
| 2021-04-26 | One-shot Face Reenactment Using Appearance Adaptive Normalization | Guangming Yao et.al. | 2102.03984 | null |
| 2021-04-25 | 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | Qianyun Wang et.al. | 2104.12051 | null |
| 2021-04-23 | Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | Hang Zhou et.al. | 2104.11116 | null |
| 2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
| 2021-04-07 | Everything's Talkin': Pareidolia Face Reenactment | Linsen Song et.al. | 2104.03061 | link |
| 2021-04-07 | LI-Net: Large-Pose Identity-Preserving Face Reenactment Network | Jin Liu et.al. | 2104.02850 | null |
| 2021-04-02 | One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | Ting-Chun Wang et.al. | 2011.15126 | null |
| 2021-03-20 | Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization | Komal Chugh et.al. | 2005.14405 | link |
| 2021-03-19 | End-to-End Lip Synchronisation Based on Pattern Classification | You Jin Kim et.al. | 2005.08606 | null |
| 2021-03-05 | Real-time RGBD-based Extended Body Pose Estimation | Renat Bashirov et.al. | 2103.03663 | link |
| 2021-03-03 | Estimating Uniqueness of I-Vector Representation of Human Voice | Erkam Sinan Tandogan et.al. | 2008.11985 | null |
| 2021-02-25 | MakeItTalk: Speaker-Aware Talking-Head Animation | Yang Zhou et.al. | 2004.12992 | null |
| 2021-02-19 | One Shot Audio to Animated Video Generation | Neeraj Kumar et.al. | 2102.09737 | null |
| 2021-02-18 | AudioVisual Speech Synthesis: A brief literature review | Efthymios Georgiou et.al. | 2103.03927 | null |
| 2020-12-14 | Robust One Shot Audio to Video Generation | Neeraj Kumar et.al. | 2012.07842 | null |
| 2020-12-14 | Multi Modal Adaptive Normalization for Audio to Video Generation | Neeraj Kumar et.al. | 2012.07304 | null |
| 2020-11-30 | Adaptive Compact Attention For Few-shot Video-to-video Translation | Risheng Huang et.al. | 2011.14695 | null |
| 2020-11-21 | Stochastic Talking Face Generation Using Latent Distribution Matching | Ravindra Yadav et.al. | 2011.10727 | link |
| 2020-11-21 | Iterative Text-based Editing of Talking-heads Using Neural Retargeting | Xinwei Yao et.al. | 2011.10688 | null |
| 2020-11-09 | FACEGAN: Facial Attribute Controllable rEenactment GAN | Soumya Tripathy et.al. | 2011.04439 | null |
| 2020-11-06 | Large-scale multilingual audio visual dubbing | Yi Yang et.al. | 2011.03530 | null |
| 2020-11-02 | Facial Keypoint Sequence Generation from Audio | Prateek Manocha et.al. | 2011.01114 | null |
| 2020-10-25 | APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment | Jiangning Zhang et.al. | 2010.13017 | link |
| 2020-10-12 | Intuitive Facial Animation Editing Based On A Generative RNN Framework | Eloïse Berson et.al. | 2010.05655 | null |
| 2020-10-05 | SMILE: Semantically-guided Multi-attribute Image and Layout Editing | Andrés Romero et.al. | 2010.02315 | link |
| 2020-10-05 | Dynamic Facial Asset and Rig Generation from a Single Scan | Jiaman Li et.al. | 2010.00560 | null |
| 2020-09-20 | An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management | Ruturaj Raval et.al. | 2009.09354 | null |
| 2020-09-18 | Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks | Guangming Yao et.al. | 2008.07783 | null |
| 2020-09-12 | DualLip: A System for Joint Lip Reading and Generation | Weicong Chen et.al. | 2009.05784 | null |
| 2020-09-02 | Seeing wake words: Audio-visual Keyword Spotting | Liliane Momeni et.al. | 2009.01225 | null |
| 2020-08-29 | "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok | Daniel Klug et.al. | 2008.13040 | null |
| 2020-08-25 | A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | K R Prajwal et.al. | 2008.10010 | null |
| 2020-08-11 | Audio- and Gaze-driven Facial Animation of Codec Avatars | Alexander Richard et.al. | 2008.05023 | null |
| 2020-08-04 | Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract | Tamás Gábor Csapó et.al. | 2008.02098 | link |
| 2020-08-04 | Real-Time Cleaning and Refinement of Facial Animation Signals | Eloïse Berson et.al. | 2008.01332 | null |
| 2020-08-02 | Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos | Yanhui Guo et.al. | 2008.01652 | null |
| 2020-07-29 | Neural Voice Puppetry: Audio-driven Facial Reenactment | Justus Thies et.al. | 1912.05566 | link |
| 2020-07-20 | Deformable Style Transfer | Sunnie S. Y. Kim et.al. | 2003.11038 | link |
| 2020-07-18 | A Robust Interactive Facial Animation Editing System | Eloïse Berson et.al. | 2007.09367 | null |
| 2020-07-16 | Talking-head Generation with Rhythmic Head Motion | Lele Chen et.al. | 2007.08547 | link |
| 2020-07-08 | Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision | Abhinav Shukla et.al. | 2007.04134 | null |
| 2020-06-20 | Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams | Huirong Huang et.al. | 2006.11610 | null |
| 2020-05-27 | Modality Dropout for Improved Performance-driven Talking Faces | Ahmed Hussen Abdelaziz et.al. | 2005.13616 | null |
| 2020-05-25 | Identity-Preserving Realistic Talking Face Generation | Sanjana Sinha et.al. | 2005.12318 | null |
| 2020-05-22 | Head2Head: Video-based Neural Head Synthesis | Mohammad Rami Koujan et.al. | 2005.10954 | null |
| 2020-05-16 | FReeNet: Multi-Identity Face Reenactment | Jiangning Zhang et.al. | 1905.11805 | null |
| 2020-05-13 | FaR-GAN for One-Shot Face Reenactment | Hanxiang Hao et.al. | 2005.06402 | null |
| 2020-05-13 | Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning | Hao Zhu et.al. | 1812.06589 | null |
| 2020-05-11 | Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok | Juan Carlos Medina Serrano et.al. | 2004.05478 | link |
| 2020-05-07 | What comprises a good talking-head video generation?: A Survey and Benchmark | Lele Chen et.al. | 2005.03201 | link |
| 2020-05-04 | Disentangled Speech Embeddings using Cross-modal Self-supervision | Arsha Nagrani et.al. | 2002.08742 | null |
| 2020-04-30 | APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals | Jiangning Zhang et.al. | 2004.14569 | null |
| 2020-03-30 | ActGAN: Flexible and Efficient One-shot Face Reenactment | Ivan Kosarevych et.al. | 2003.13840 | null |
| 2020-03-29 | Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose | Xianfang Zeng et.al. | 2003.12957 | null |
| 2020-03-26 | High-Accuracy Facial Depth Models derived from 3D Synthetic Data | Faisal Khan et.al. | 2003.06211 | null |
| 2020-03-06 | Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | Ran Yi et.al. | 2002.10137 | null |
| 2020-03-05 | Talking-Heads Attention | Noam Shazeer et.al. | 2003.02436 | link |
| 2020-03-01 | Towards Automatic Face-to-Face Translation | Prajwal K R et.al. | 2003.00418 | link |
| 2020-02-19 | Speech-driven facial animation using polynomial fusion of features | Triantafyllos Kefalas et.al. | 1912.05833 | null |
| 2020-01-17 | ICface: Interpretable and Controllable Face Reenactment Using GANs | Soumya Tripathy et.al. | 1904.01909 | null |
| 2019-12-20 | Disentangling Style and Content in Anime Illustrations | Sitao Xiang et.al. | 1905.10742 | null |
| 2019-11-21 | FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis | Kuangxiao Gu et.al. | 1911.09224 | null |
| 2019-11-19 | MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets | Sungjoo Ha et.al. | 1911.08139 | null |
| 2019-10-28 | Few-shot Video-to-Video Synthesis | Ting-Chun Wang et.al. | 1910.12713 | null |
| 2019-10-19 | Real-Time Lip Sync for Live 2D Animation | Deepali Aneja et.al. | 1910.08685 | link |
| 2019-10-16 | Designing Style Matching Conversational Agents | Deepali Aneja et.al. | 1910.07514 | null |
| 2019-10-15 | A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities | Deepali Aneja et.al. | 1909.08766 | link |
| 2019-10-09 | EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos | Haipeng Zeng et.al. | 1907.12918 | null |
| 2019-10-02 | Animating Face using Disentangled Audio Representations | Gaurav Mittal et.al. | 1910.00726 | null |
| 2019-09-25 | Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Egor Zakharov et.al. | 1905.08233 | null |
| 2019-09-06 | Neural Style-Preserving Visual Dubbing | Hyeongwoo Kim et.al. | 1909.02518 | null |
| 2019-08-29 | 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach | Ngoc-Trung Tran et.al. | 1908.11039 | null |
| 2019-08-20 | Prosodic Phrase Alignment for Machine Dubbing | Alp Öktem et.al. | 1908.07226 | link |
| 2019-08-16 | FSGAN: Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 1908.05932 | link |
| 2019-08-11 | Emotion Dependent Facial Animation from Affective Speech | Rizwan Sadiq et.al. | 1908.03904 | null |
| 2019-08-05 | One-shot Face Reenactment | Yunxuan Zhang et.al. | 1908.03251 | link |
| 2019-07-25 | Talking Face Generation by Conditional Recurrent Adversarial Network | Yang Song et.al. | 1804.04786 | link |
| 2019-07-24 | Data-Driven Physical Face Inversion | Yeara Kozlov et.al. | 1907.10402 | null |
| 2019-07-23 | A system for efficient 3D printed stop-motion face animation | Rinat Abdrashitov et.al. | 1907.10163 | null |
| 2019-06-14 | Realistic Speech-Driven Facial Animation with GANs | Konstantinos Vougioukas et.al. | 1906.06337 | null |
| 2019-06-04 | Text-based Editing of Talking-head Video | Ohad Fried et.al. | 1906.01524 | null |
| 2019-05-27 | Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks | Guanzhong Tian et.al. | 1905.11142 | null |
| 2019-05-09 | Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | Lele Chen et.al. | 1905.03820 | link |
| 2019-05-08 | Capture, Learning, and Synthesis of 3D Speaking Styles | Daniel Cudeiro et.al. | 1905.03079 | link |
| 2019-04-23 | Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Hang Zhou et.al. | 1807.07860 | null |
| 2019-04-02 | FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Yanfu Yan et.al. | 1904.01509 | null |
| 2019-03-13 | Animating an Autonomous 3D Talking Avatar | Dominik Borer et.al. | 1903.05448 | null |
| 2018-12-22 | Deep Audio-Visual Speech Recognition | Triantafyllos Afouras et.al. | 1809.02108 | null |
| 2018-12-20 | DeepFakes: a New Threat to Face Recognition? Assessment and Detection | Pavel Korshunov et.al. | 1812.08685 | null |
| 2018-11-22 | Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos | Ying Tai et.al. | 1811.00342 | link |
| 2018-11-16 | Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters | Maartje M. E. Hendrikse et.al. | 1812.02088 | null |
| 2018-08-28 | GANimation: Anatomically-aware Facial Animation from a Single Image | Albert Pumarola et.al. | 1807.09251 | link |
| 2018-08-19 | Dynamic Temporal Alignment of Speech to Lips | Tavi Halperin et.al. | 1808.06250 | link |
| 2018-07-29 | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | Wayne Wu et.al. | 1807.11079 | link |
| 2018-07-26 | Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani et.al. | 1805.00833 | null |
| 2018-07-19 | End-to-End Speech-Driven Facial Animation with Temporal GANs | Konstantinos Vougioukas et.al. | 1805.09313 | null |
| 2018-05-29 | Deep Video Portraits | Hyeongwoo Kim et.al. | 1805.11714 | null |
| 2018-05-24 | VisemeNet: Audio-Driven Animator-Centric Speech Animation | Yang Zhou et.al. | 1805.09488 | null |
| 2018-05-21 | Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks | Sitao Xiang et.al. | 1805.07997 | null |
| 2018-04-23 | Generating Talking Face Landmarks from Speech | Sefik Emre Eskimez et.al. | 1803.09803 | null |
| 2018-03-28 | Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network | Hai X. Pham et.al. | 1803.07716 | null |
| 2018-03-20 | Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks | Seyed Ali Jalalifar et.al. | 1803.07461 | null |
| 2017-12-07 | End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech | Hai X. Pham et.al. | 1710.00920 | null |
| 2017-12-06 | ObamaNet: Photo-realistic lip-sync from text | Rithesh Kumar et.al. | 1801.01442 | null |
| 2017-07-30 | Kernel Projection of Latent Structures Regression for Facial Animation Retargeting | Christos Ouzounis et.al. | 1707.09629 | null |
| 2017-07-26 | Fast Deep Matting for Portrait Animation on Mobile Phone | Bingke Zhu et.al. | 1707.08289 | null |
| 2017-07-21 | Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking | Rahul Sharma et.al. | 1707.06830 | null |
| 2017-07-18 | You said that? | Joon Son Chung et.al. | 1705.02966 | null |
| 2017-01-30 | Lip Reading Sentences in the Wild | Joon Son Chung et.al. | 1611.05358 | link |
| 2016-10-28 | Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei | Johannes Buchner et.al. | 1610.09380 | link |
| 2016-07-11 | Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video | Shaoshi Yang et.al. | 1601.06684 | null |
| 2016-05-22 | Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression | David Rim et.al. | 1512.08212 | null |
| 2016-02-08 | Automatic Face Reenactment | Pablo Garrido et.al. | 1602.02651 | null |
| 2015-11-20 | ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication | Ali Mollahosseini et.al. | 1511.06502 | null |
| 2014-09-03 | Visual Speech Recognition | Ahmad B. A. Hassanat et.al. | 1409.1411 | null |
| 2012-09-22 | Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis | Ingmar Steiner et.al. | 1209.4982 | null |
| 2012-03-30 | Face Expression Recognition and Analysis: The State of the Art | Vinay Bettadapura et.al. | 1203.6722 | null |
| 2012-01-19 | Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis | Ingmar Steiner et.al. | 1201.4080 | null |
| 2010-03-01 | Re-verification of a Lip Synchronization Protocol using Robust Reachability | Piotr Kordy et.al. | 1003.0431 | null |
Image Animation
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2025-11-18 | PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos | Dianbing Xi et.al. | 2511.12935 | null |
| 2025-11-16 | Sketch2PoseNet: Efficient and Generalized Sketch to 3D Human Pose Prediction | Li Wang et.al. | 2510.26196 | null |
| 2025-11-14 | EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation | Zongyang Qiu et.al. | 2511.11002 | null |
| 2025-11-11 | OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild | Yuncheng Guo et.al. | 2511.08423 | null |
| 2025-11-11 | oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention | Ryusuke Mizutani et.al. | 2511.08168 | null |
| 2025-11-11 | Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis | Aditi Singhania et.al. | 2511.08087 | null |
| 2025-11-09 | Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising | Assaf Singer et.al. | 2511.08633 | null |
| 2025-11-04 | Video Text Preservation with Synthetic Text-Rich Videos | Ziyang Liu et.al. | 2511.05573 | null |
| 2025-11-03 | FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion | Chuhao Chen et.al. | 2510.25765 | null |
| 2025-11-02 | A Hybrid YOLOv5-SSD IoT-Based Animal Detection System for Durian Plantation Protection | Anis Suttan Shahrir et.al. | 2511.00777 | null |
| 2025-10-31 | DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model | Yucheng Xing et.al. | 2510.27169 | null |
| 2025-10-29 | 4-Doodle: Text to 3D Sketches that Move! | Hao Chen et.al. | 2510.25319 | null |
| 2025-10-28 | DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery | Zan Wang et.al. | 2510.24117 | null |
| 2025-10-27 | Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation | Junyoung Seo et.al. | 2510.23581 | null |
| 2025-10-27 | Revising Second Order Terms in Deep Animation Video Coding | Konstantin Schmidt et.al. | 2510.23561 | null |
| 2025-10-26 | Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation | Mackenzie Tapp et.al. | 2510.22618 | null |
| 2025-10-26 | DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss | Jing Yang et.al. | 2510.22473 | null |
| 2025-10-20 | From Volume Rendering to 3D Gaussian Splatting: Theory and Applications | Vitor Pereira Matias et.al. | 2510.18101 | null |
| 2025-10-16 | Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation | Shaowei Liu et.al. | 2510.14976 | null |
| 2025-10-16 | Zero-Shot Wildlife Sorting Using Vision Transformers: Evaluating Clustering and Continuous Similarity Ordering | Hugo Markoff et.al. | 2510.14596 | null |
| 2025-10-16 | Hierarchical Re-Classification: Combining Animal Classification Models with Vision Transformers | Hugo Markoff et.al. | 2510.14594 | null |
| 2025-10-16 | Evaluating plastic scintillator performance as a substitute of LYSO in SiPM based animal PET scanners: A GEANT4 simulation analysis | Davinder Siwal et.al. | 2510.14437 | null |
| 2025-10-16 | Multi-identity Human Image Animation with Structural Video Diffusion | Zhenzhi Wang et.al. | 2504.04126 | null |
| 2025-09-19 | TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Wenkui Yang et.al. | 2505.08437 | null |
| 2025-09-09 | LINR Bridge: Vector Graphic Animation via Neural Implicits and Video Diffusion Priors | Wenshuo Gao et.al. | 2509.07484 | null |
| 2025-08-23 | AnimateAnywhere: Rouse the Background in Human Image Animation | Xiaoyu Liu et.al. | 2504.19834 | null |
| 2025-08-14 | Animate-X++: Universal Character Image Animation with Dynamic Backgrounds | Shuai Tan et.al. | 2508.09454 | null |
| 2025-08-10 | Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers | Xin Ma et.al. | 2508.07246 | null |
| 2025-07-20 | StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation | Shuyuan Tu et.al. | 2507.15064 | null |
| 2025-07-11 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414 | null |
| 2025-07-01 | DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution | Zhe Kong et.al. | 2507.01012 | null |
| 2025-07-01 | Recomposed realities: animating still images via patch clustering and randomness | Markus Juvonen et.al. | 2506.22556 | null |
| 2025-05-30 | MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Yanbo Ding et.al. | 2505.10238 | null |
| 2025-05-29 | HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Shuolin Xu et.al. | 2505.22977 | null |
| 2025-05-24 | EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Qiang Qu et.al. | 2503.18552 | null |
| 2025-05-18 | DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Haoyu Zhao et.al. | 2503.21246 | null |
| 2025-04-20 | DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance | Yuxuan Luo et.al. | 2504.01724 | null |
| 2025-04-15 | UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer | Xiang Wang et.al. | 2504.11289 | null |
| 2025-04-15 | Taming Consistency Distillation for Accelerated Human Image Animation | Xiang Wang et.al. | 2504.11143 | null |
| 2025-04-04 | Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images | In-Hwan Jin et.al. | 2504.05458 | null |
| 2025-04-01 | VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer | Xinyu Liu et.al. | 2502.05979 | null |
| 2025-03-23 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | Shijie Wang et.al. | 2412.16153 | null |
| 2025-03-13 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer | Jiahao Cui et.al. | 2412.00733 | link |
| 2025-03-10 | Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation | Yingjie Chen et.al. | 2501.05020 | null |
| 2025-02-25 | DisPose: Disentangling Pose Guidance for Controllable Human Image Animation | Hongxiang Li et.al. | 2412.09349 | link |
| 2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | null |
| 2025-02-10 | Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance | Li Hu et.al. | 2502.06145 | null |
| 2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
| 2025-02-03 | Every Image Listens, Every Image Dances: Music-Driven Image Animation | Zhikang Dong et.al. | 2501.18801 | null |
| 2025-01-20 | X-Dyna: Expressive Dynamic Human Image Animation | Di Chang et.al. | 2501.10021 | null |
| 2025-01-15 | Joint Learning of Depth and Appearance for Portrait Image Animation | Xinya Ji et.al. | 2501.08649 | null |
| 2024-12-12 | Animate-X: Universal Character Image Animation with Enhanced Motion Representation | Shuai Tan et.al. | 2410.10306 | null |
| 2024-12-04 | FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait | Taekyung Ki et.al. | 2412.01064 | null |
| 2024-11-30 | DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses | Yatian Pang et.al. | 2412.00397 | null |
| 2024-11-28 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
| 2024-11-27 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697 | link |
| 2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
| 2024-11-22 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | null |
| 2024-10-31 | TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation | Sunjae Yoon et.al. | 2410.24037 | null |
| 2024-10-20 | FrameBridge: Improving Image-to-Video Generation with Bridge Models | Yuji Wang et.al. | 2410.15371 | null |
| 2024-10-14 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Jiahao Cui et.al. | 2410.07718 | link |
| 2024-09-30 | Illustrious: an Open Advanced Illustration Model | Sang Hyun Park et.al. | 2409.19946 | null |
| 2024-09-29 | High Quality Human Image Animation using Regional Supervision and Motion Blur Condition | Zhongcong Xu et.al. | 2409.19580 | null |
| 2024-09-22 | Dormant: Defending against Pose-driven Human Image Animation | Jiachen Zhou et.al. | 2409.14424 | link |
| 2024-07-23 | Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Xin Ma et.al. | 2407.15642 | link |
| 2024-07-12 | TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models | Jeongho Kim et.al. | 2407.09012 | null |
| 2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | link |
| 2024-07-11 | MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | Muyao Niu et.al. | 2405.20222 | link |
| 2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
| 2024-06-14 | Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | Li Hu et.al. | 2311.17117 | null |
| 2024-06-13 | Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | Jingyun Xue et.al. | 2406.03035 | null |
| 2024-06-03 | UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Xiang Wang et.al. | 2406.01188 | null |
| 2024-06-01 | Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Shenhao Zhu et.al. | 2403.14781 | link |
| 2024-05-29 | Evaluating the effectiveness of sonification in science education using Edukoi | Lucrezia Guiotto Nai Fovino et.al. | 2405.18908 | null |
| 2024-05-28 | VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation | Qilin Wang et.al. | 2405.18156 | null |
| 2024-05-28 | Controllable Longer Image Animation with Diffusion Models | Qiang Wang et.al. | 2405.17306 | null |
| 2024-03-26 | PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | Yiming Zhang et.al. | 2312.13964 | null |
| 2024-03-13 | Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Yue Ma et.al. | 2403.08268 | link |
| 2024-03-08 | Audio-Synchronized Visual Animation | Lin Zhang et.al. | 2403.05659 | link |
| 2024-03-05 | Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | Weijie Li et.al. | 2403.02827 | null |
| 2024-01-17 | Continuous Piecewise-Affine Based Motion Model for Image Animation | Hexiang Wang et.al. | 2401.09146 | link |
| 2024-01-03 | Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions | David Junhao Zhang et.al. | 2401.01827 | link |
| 2023-12-08 | AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | Jiwen Yu et.al. | 2312.03793 | null |
| 2023-12-06 | AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Zuozhuo Dai et.al. | 2311.12886 | null |
| 2023-12-05 | LivePhoto: Real Image Animation with Text-guided Motion Control | Xi Chen et.al. | 2312.02928 | null |
| 2023-11-30 | Motion-Conditioned Image Animation for Video Editing | Wilson Yan et.al. | 2311.18827 | null |
| 2023-11-27 | MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | Zhongcong Xu et.al. | 2311.16498 | null |
| 2023-11-27 | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors | Jinbo Xing et.al. | 2310.12190 | link |
| 2023-11-19 | Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation | Peirong Liu et.al. | 2110.04658 | null |
| 2023-10-16 | LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Ruiqi Wu et.al. | 2310.10769 | link |
| 2023-10-11 | LEO: Generative Latent Image Animator for Human Video Synthesis | Yaohui Wang et.al. | 2305.03989 | link |
| 2023-09-26 | Text-Guided Synthesis of Eulerian Cinemagraphs | Aniruddha Mahapatra et.al. | 2307.03190 | link |
| 2023-09-25 | Automatic Animation of Hair Blowing in Still Portrait Photos | Wenpeng Xiao et.al. | 2309.14207 | null |
| 2023-07-10 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Yuwei Guo et.al. | 2307.04725 | link |
| 2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
| 2023-04-12 | VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Moayed Haji Ali et.al. | 2304.06020 | null |
| 2023-03-10 | 3D Cinemagraphy from a Single Image | Xingyi Li et.al. | 2303.05724 | null |
| 2023-02-02 | Dreamix: Video Diffusion Models are General Video Editors | Eyal Molad et.al. | 2302.01329 | null |
| 2023-01-27 | Animating Still Images | Kushagr Batra et.al. | 2209.10497 | null |
| 2023-01-14 | Continuous odor profile monitoring to study olfactory navigation in small animals | Kevin S. Chen et.al. | 2301.05905 | null |
| 2022-11-30 | NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation | Yu Yin et.al. | 2211.17235 | null |
| 2022-10-05 | Implicit Warping for Animation with Image Sets | Arun Mallya et.al. | 2210.01794 | null |
| 2022-09-28 | Motion Transformer for Unsupervised Image Animation | Jiale Tao et.al. | 2209.14024 | link |
| 2022-07-19 | Single Stage Virtual Try-on via Deformable Attention Flows | Shuai Bai et.al. | 2207.09161 | link |
| 2022-07-08 | Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation | Yucheng Suo et.al. | 2207.03714 | null |
| 2022-06-11 | Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification | Mengdi Gao et.al. | 2106.12284 | link |
| 2022-04-05 | Neural Fields in Visual Computing and Beyond | Yiheng Xie et.al. | 2111.11426 | null |
| 2022-03-30 | Image Animation with Perturbed Masks | Yoav Shalev et.al. | 2011.06922 | null |
| 2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
| 2022-03-25 | 3D GAN Inversion for Controllable Portrait Image Animation | Connor Z. Lin et.al. | 2203.13441 | null |
| 2022-03-18 | Latent Image Animator: Learning to Animate Images via Latent Space Navigation | Yaohui Wang et.al. | 2203.09043 | null |
| 2021-12-21 | Image Animation with Keypoint Mask | Or Toledano et.al. | 2112.10457 | link |
| 2021-12-19 | Move As You Like: Image Animation in E-Commerce Scenario | Borun Xu et.al. | 2112.13647 | null |
| 2021-12-17 | AI-Empowered Persuasive Video Generation: A Survey | Chang Liu et.al. | 2112.09401 | null |
| 2021-12-01 | Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation | Yurui Ren et.al. | 2008.12606 | null |
| 2021-10-28 | Application of Time Separation Technique to Enhance C-arm CT Dynamic Liver Perfusion Imaging | Hana Haseljić et.al. | 2110.14318 | null |
| 2021-10-26 | Incremental Learning for Animal Pose Estimation using RBF k-DPP | Gaurav Kumar Nayak et.al. | 2110.13598 | null |
| 2021-10-07 | Enhancement of Anime Imaging Enlargement using Modified Super-Resolution CNN | Tanakit Intaniyom et.al. | 2110.02321 | null |
| 2021-09-06 | Sparse to Dense Motion Transfer for Face Image Animation | Ruiqi Zhao et.al. | 2109.00471 | null |
| 2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
| 2021-06-23 | Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 | Adellia et.al. | 2106.15342 | null |
| 2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
| 2021-03-23 | PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation | Wai Ting Cheung et.al. | 2103.11600 | null |
| 2020-12-01 | Ultra-low bitrate video conferencing using deep image animation | Goluck Konuko et.al. | 2012.00346 | null |
| 2020-10-01 | First Order Motion Model for Image Animation | Aliaksandr Siarohin et.al. | 2003.00196 | link |
| 2019-08-30 | Animating Arbitrary Objects via Deep Motion Transfer | Aliaksandr Siarohin et.al. | 1812.08861 | link |
| 2019-07-01 | Style Generator Inversion for Image Enhancement and Animation | Aviv Gabbay et.al. | 1906.11880 | null |
| 2018-10-09 | 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping | Guillaume Caron et.al. | 1810.03956 | null |
| 2018-06-24 | A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic | Jiaming Lu et.al. | 1806.09117 | null |
| 2018-01-31 | RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes | Thomas Bronzwaer et.al. | 1801.10452 | null |
| 2017-10-23 | Quasi-random Agents for Image Transition and Animation | Aneta Neumann et.al. | 1710.07421 | null |
| 2016-06-23 | Gender and Interest Targeting for Sponsored Post Advertising at Tumblr | Mihajlo Grbovic et.al. | 1606.07189 | null |
| 2015-03-16 | Use of Effective Audio in E-learning Courseware | Kisor Ray et.al. | 1503.04837 | null |
| 2015-02-04 | Multimedia-Video for Learning | Kah Hean Chua et.al. | 1502.01090 | null |
| 2013-01-25 | Measurements of Martian Dust Devil Winds with HiRISE | David S. Choi et.al. | 1301.6130 | null |
| 2010-01-04 | Tutoring System for Dance Learning | Rajkumar Kannan et.al. | 1001.0440 | null |
Video Generation
| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-11-18 | Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising | Yifan Wang et.al. | 2511.14719 | null |
| 2025-11-18 | FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation | Yunfeng Wu et.al. | 2511.14712 | null |
| 2025-11-18 | ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection | Mohammad Romani et.al. | 2511.14554 | null |
| 2025-11-18 | DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation | Xiangchen Yin et.al. | 2511.14530 | null |
| 2025-11-18 | FlowRoI A Fast Optical Flow Driven Region of Interest Extraction Framework for High-Throughput Image Compression in Immune Cell Migration Analysis | Xiaowei Xu et.al. | 2511.14419 | null |
| 2025-11-18 | ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries | Junfu Pu et.al. | 2511.14349 | null |
| 2025-11-18 | Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs | Yiyi Miao et.al. | 2511.14315 | null |
| 2025-11-18 | Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning | Rui Liu et.al. | 2511.14249 | null |
| 2025-11-18 | Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion | Zhuo Li et.al. | 2511.14178 | null |
| 2025-11-18 | Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models | Hao Zhen et.al. | 2511.14120 | null |
| 2025-11-18 | Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services | Liuyi Jin et.al. | 2511.14119 | null |
| 2025-11-18 | A Patient-Independent Neonatal Seizure Prediction Model Using Reduced Montage EEG and ECG | Sithmini Ranasingha et.al. | 2511.14110 | null |
| 2025-11-18 | Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations | Yiqing Shen et.al. | 2511.14100 | null |
| 2025-11-18 | Privis: Towards Content-Aware Secure Volumetric Video Delivery | Kaiyuan Hu et.al. | 2511.14005 | null |
| 2025-11-17 | Learning Skill-Attributes for Transferable Assessment in Video | Kumar Ashutosh et.al. | 2511.13993 | null |
| 2025-11-17 | PoCGM: Poisson-Conditioned Generative Model for Sparse-View CT Reconstruction | Changsheng Fang et.al. | 2511.13967 | null |
| 2025-11-17 | SAE-MCVT: A Real-Time and Scalable Multi-Camera Vehicle Tracking Framework Powered by Edge Computing | Yuqiang Lin et.al. | 2511.13904 | null |
| 2025-11-17 | Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors | Mert Onur Cakiroglu et.al. | 2511.13897 | null |
| 2025-11-17 | Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark | Xinxin Liu et.al. | 2511.13853 | null |
| 2025-11-17 | Segment Anything Across Shots: A Method and Benchmark | Hengrui Hu et.al. | 2511.13715 | null |
| 2025-11-17 | UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity | Junwei Yu et.al. | 2511.13714 | null |
| 2025-11-17 | TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models | Harold Haodong Chen et.al. | 2511.13704 | null |
| 2025-11-17 | Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting | Jiangnan Ye et.al. | 2511.13684 | null |
| 2025-11-17 | CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding | Shrenik Patel et.al. | 2511.13644 | null |
| 2025-11-17 | Computer Vision based group activity detection and action spotting | Narthana Sivalingam et.al. | 2511.13315 | null |
| 2025-11-17 | CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving | Enhui Ma et.al. | 2511.13297 | null |
| 2025-11-17 | FoleyBench: A Benchmark For Video-to-Audio Models | Satvik Dixit et.al. | 2511.13219 | null |
| 2025-11-17 | Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification | Rifen Lin et.al. | 2511.13150 | null |
| 2025-11-17 | VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language | Zonghao Ying et.al. | 2511.13127 | null |
| 2025-11-17 | CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model | Yuqi Zhang et.al. | 2511.13121 | null |
| 2025-11-17 | Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining | Zhaocheng Yu et.al. | 2511.13113 | null |
| 2025-11-17 | Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention | Taiye Chen et.al. | 2511.12940 | null |
| 2025-11-17 | Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models | Guoyan Wang et.al. | 2511.12937 | null |
| 2025-11-17 | PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos | Dianbing Xi et.al. | 2511.12935 | null |
| 2025-11-17 | Generative Photographic Control for Scene-Consistent Video Cinematic Editing | Huiqiang Sun et.al. | 2511.12921 | null |
| 2025-11-17 | Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views | Junyi Ma et.al. | 2511.12878 | null |
| 2025-11-17 | Video Finetuning Improves Reasoning Between Frames | Ruiqi Yang et.al. | 2511.12868 | null |
| 2025-11-16 | SAGA: Source Attribution of Generative AI Videos | Rohit Kundu et.al. | 2511.12834 | null |
| 2025-11-16 | Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis | Zeqin Yu et.al. | 2511.12658 | null |
| 2025-11-16 | Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data | Yunxin Li et.al. | 2511.12609 | null |
| 2025-11-16 | TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction | Yukuo Ma et.al. | 2511.12578 | null |
| 2025-11-16 | ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding | Yuan Zhou et.al. | 2511.12530 | null |
| 2025-11-16 | DualGR: Generative Retrieval with Long and Short-Term Interests Modeling | Zhongchao Yi et.al. | 2511.12518 | null |
| 2025-11-16 | DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection | Jialiang Shen et.al. | 2511.12511 | null |
| 2025-11-16 | VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving | Hyunki Seong et.al. | 2511.12405 | null |
| 2025-11-16 | SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs | Shail Desai et.al. | 2511.12404 | null |
| 2025-11-15 | Fast Reasoning Segmentation for Images and Videos | Yiqing Shen et.al. | 2511.12368 | null |
| 2025-11-15 | Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning | Yiqing Shen et.al. | 2511.12365 | null |
| 2025-11-15 | AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos | Junhyuk Seo et.al. | 2511.12241 | null |
| 2025-11-15 | Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System | Aditi Bhalla et.al. | 2511.12196 | null |
| 2025-11-15 | Towards Obstacle-Avoiding Control of Planar Snake Robots Exploring Neuro-Evolution of Augmenting Topologies | Advik Sinha et.al. | 2511.12148 | null |
| 2025-11-15 | Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models | Tianle Cheng et.al. | 2511.12099 | null |
| 2025-11-15 | Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound | Dengming Zhang et.al. | 2511.12077 | null |
| 2025-11-15 | ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation | Jiahui Sun et.al. | 2511.12072 | null |
| 2025-11-15 | PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling | Sijie Wang et.al. | 2511.12056 | null |
| 2025-11-15 | TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space | Wenxuan Miao et.al. | 2511.12035 | null |
| 2025-11-14 | Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models | Siyou Li et.al. | 2511.11910 | null |
| 2025-11-14 | KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference | Huawei Zhang et.al. | 2511.11907 | null |
| 2025-11-14 | Scalable Policy Evaluation with Video World Models | Wei-Cheng Tseng et.al. | 2511.11520 | null |
| 2025-11-14 | Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis | Feng-Qi Cui et.al. | 2511.11406 | null |
| 2025-11-14 | YCB-Ev SD: Synthetic event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2511.11344 | null |
| 2025-11-14 | RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting | Ruocheng Wu et.al. | 2511.11213 | null |
| 2025-11-14 | VIDEOP2R: Video Understanding from Perception to Reasoning | Yifan Jiang et.al. | 2511.11113 | null |
| 2025-11-14 | LiteAttention: A Temporal Sparse Attention for Diffusion Transformers | Dor Shmilovich et.al. | 2511.11062 | null |
| 2025-11-14 | EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation | Zongyang Qiu et.al. | 2511.11002 | null |
| 2025-11-14 | Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment | Wenbin Bai et.al. | 2511.10987 | null |
| 2025-11-14 | Text-guided Weakly Supervised Framework for Dynamic Facial Expression Recognition | Gunho Jung et.al. | 2511.10958 | null |
| 2025-11-14 | Language-Guided Graph Representation Learning for Video Summarization | Wenrui Li et.al. | 2511.10953 | null |
| 2025-11-14 | Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling | Seoik Jung et.al. | 2511.10866 | null |
| 2025-11-13 | Expert Consensus-based Video-Based Assessment Tool for Workflow Analysis in Minimally Invasive Colorectal Surgery: Development and Validation of ColoWorkflow | Pooja P Jain et.al. | 2511.10766 | null |
| 2025-11-13 | Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals | Shruti Singh Baghel et.al. | 2511.10615 | null |
| 2025-11-13 | TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding | Jinxuan Li et.al. | 2511.10241 | null |
| 2025-11-13 | Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization | Ashutosh Anshul et.al. | 2511.10212 | null |
| 2025-11-13 | SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition | Qilang Ye et.al. | 2511.10091 | null |
| 2025-11-13 | When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? | Qilang Ye et.al. | 2511.10059 | null |
| 2025-11-13 | Reinforcing Trustworthiness in Multimodal Emotional Support Systems | Huy M. Le et.al. | 2511.10011 | null |
| 2025-11-13 | AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting | Aymen Mir et.al. | 2511.09827 | null |
| 2025-11-12 | Density Estimation and Crowd Counting | Balachandra Devarangadi Sunil et.al. | 2511.09723 | null |
| 2025-11-12 | PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild | Felix B. Mueller et.al. | 2511.09675 | null |
| 2025-11-12 | TempRetinex: Retinex-based Unsupervised Enhancement for Low-light Video Under Diverse Lighting Conditions | Yini Li et.al. | 2511.09609 | null |
| 2025-11-12 | Bridging the Data Gap: Spatially Conditioned Diffusion Model for Anomaly Generation in Photovoltaic Electroluminescence Images | Shiva Hanifi et.al. | 2511.09604 | null |
| 2025-11-12 | Diffusion-Based Quality Control of Medical Image Segmentations across Organs | Vincenzo Marcianò et.al. | 2511.09588 | null |
| 2025-11-12 | Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation | Xinyi Tong et.al. | 2511.09585 | null |
| 2025-11-12 | SPIDER: Scalable Physics-Informed Dexterous Retargeting | Chaoyi Pan et.al. | 2511.09484 | null |
| 2025-11-12 | MCAD: Multimodal Context-Aware Audio Description Generation For Soccer | Lipisha Chaudhary et.al. | 2511.09448 | null |
| 2025-11-12 | A cross-modal pre-training framework with video data for improving performance and generalization of distributed acoustic sensing | Junyi Duan et.al. | 2511.09342 | null |
| 2025-11-12 | GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow | Rui Wan et.al. | 2511.09272 | null |
| 2025-11-12 | Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots | Yuxi Wei et.al. | 2511.09241 | null |
| 2025-11-12 | AILINKPREVIEWER: Enhancing Code Reviews with LLM-Powered Link Previews | Panya Trakoolgerntong et.al. | 2511.09223 | null |
| 2025-11-12 | DBINDS -- Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos? | Yanlin Wu et.al. | 2511.09184 | null |
| 2025-11-10 | Robot Learning from a Physical World Model | Jiageng Mao et.al. | 2511.07416 | null |
| 2025-11-10 | StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation | Tianrui Feng et.al. | 2511.07399 | null |
| 2025-11-10 | Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation | Jie Du et.al. | 2511.01450 | null |
| 2025-11-09 | GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets | Rui Ai et.al. | 2511.06559 | null |
| 2025-11-09 | RelightMaster: Precise Video Relighting with Multi-plane Light Images | Weikang Bian et.al. | 2511.06271 | null |
| 2025-11-08 | Neodragon: Mobile Video Generation using Diffusion Transformer | Animesh Karnewar et.al. | 2511.06055 | null |
| 2025-11-07 | THEval. Evaluation Framework for Talking Head Video Generation | Nabyl Quignon et.al. | 2511.04520 | null |
| 2025-11-06 | InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation | Jinlai Liu et.al. | 2511.04675 | null |
| 2025-11-06 | Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm | Jingqi Tong et.al. | 2511.04570 | null |
| 2025-11-06 | RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation | Xiangjun Zhang et.al. | 2511.04317 | null |
| 2025-11-06 | PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection | Peiyao Wang et.al. | 2511.03997 | null |
| 2025-11-05 | UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions | Guozhen Zhang et.al. | 2511.03334 | null |
| 2025-11-05 | Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising | Shuangquan Lyu et.al. | 2511.03272 | null |
| 2025-11-04 | Video Text Preservation with Synthetic Text-Rich Videos | Ziyang Liu et.al. | 2511.05573 | null |
| 2025-11-04 | ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation | Panwang Pan et.al. | 2511.00511 | null |
| 2025-11-03 | How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment | Zhen Chen et.al. | 2511.01775 | null |
| 2025-11-03 | Driving scenario generation and evaluation using a structured layer representation and foundational models | Arthur Hubert et.al. | 2511.01541 | null |
| 2025-11-03 | Towards One-step Causal Video Generation via Adversarial Self-Distillation | Yongqi Yang et.al. | 2511.01419 | null |
| 2025-11-03 | MotionStream: Real-Time Video Generation with Interactive Motion Controls | Joonghyuk Shin et.al. | 2511.01266 | null |
| 2025-11-01 | Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models | Panwang Pan et.al. | 2511.00503 | null |
| 2025-10-31 | Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals | Xiangyu Fan et.al. | 2510.27684 | null |
| 2025-10-31 | Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V | Meftun Akarsu et.al. | 2510.27364 | null |
| 2025-10-31 | DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model | Yucheng Xing et.al. | 2510.27169 | null |
| 2025-10-31 | Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark | Ziyu Guo et.al. | 2510.26802 | null |
| 2025-10-30 | AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency | Piyushkumar Patel et.al. | 2511.00107 | null |
| 2025-10-30 | LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation | Huanlin Gao et.al. | 2511.00090 | null |
| 2025-10-30 | SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting | Dongyue Lu et.al. | 2510.26796 | null |
| 2025-10-30 | The Quest for Generalizable Motion Generation: Data, Model, and Evaluation | Jing Lin et.al. | 2510.26794 | null |
| 2025-10-30 | Co-Evolving Latent Action World Models | Yucen Wang et.al. | 2510.26433 | null |
| 2025-10-30 | LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation | Xiangqing Zheng et.al. | 2510.26412 | null |
| 2025-10-29 | VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning | Baolu Li et.al. | 2510.25772 | null |
| 2025-10-29 | VC4VG: Optimizing Video Captions for Text-to-Video Generation | Yang Du et.al. | 2510.24134 | null |
| 2025-10-28 | World Simulation with Video Foundation Models for Physical AI | NVIDIA et.al. | 2511.00062 | null |
| 2025-10-28 | VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos | Qiucheng Wu et.al. | 2510.24904 | null |
| 2025-10-28 | Generative View Stitching | Chonghyuk Song et.al. | 2510.24718 | null |
| 2025-10-28 | Uniform Discrete Diffusion with Metric Path for Video Generation | Haoge Deng et.al. | 2510.24717 | null |
| 2025-10-28 | MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration | Junhyuk So et.al. | 2510.24211 | null |
| 2025-10-28 | LongCat-Video Technical Report | Meituan LongCat Team et.al. | 2510.22200 | null |
| 2025-10-27 | CoMo: Compositional Motion Customization for Text-to-Video Generation | Youcan Xu et.al. | 2510.23007 | null |
| 2025-10-27 | Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method | Bohan Li et.al. | 2510.22973 | null |
| 2025-10-26 | MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control | Fatemeh Nazarieh et.al. | 2510.22810 | null |
| 2025-10-25 | Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration | Zheng Wei et.al. | 2510.22431 | null |
| 2025-10-24 | Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising | Mateo Clemente et.al. | 2510.21991 | null |
| 2025-10-24 | BachVid: Training-Free Video Generation with Consistent Background and Character | Han Yan et.al. | 2510.21696 | null |
| 2025-10-24 | Epipolar Geometry Improves Video Generation Models | Orest Kupyn et.al. | 2510.21615 | null |
| 2025-10-24 | OmniNWM: Omniscient Driving Navigation World Models | Bohan Li et.al. | 2510.18313 | null |
| 2025-10-23 | Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications | Shamim Yazdani et.al. | 2510.21887 | null |
| 2025-10-23 | Video-As-Prompt: Unified Semantic Control for Video Generation | Yuxuan Bian et.al. | 2510.20888 | null |
| 2025-10-23 | Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers | Dean L Slack et.al. | 2510.20807 | null |
| 2025-10-23 | RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling | Bingjie Gao et.al. | 2510.20206 | null |
| 2025-10-23 | Evaluating Video Models as Simulators of Multi-Person Pedestrian Trajectories | Aaron Appelle et.al. | 2510.20182 | null |
| 2025-10-23 | Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning | Takehiro Aoshima et.al. | 2510.19193 | null |
| 2025-10-23 | A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition | Peiqin Zhuang et.al. | 2510.18705 | null |
| 2025-10-22 | Improving the Physics of Video Generation with VJEPA-2 Reward Signal | Jianhao Yuan et.al. | 2510.21840 | null |
| 2025-10-22 | A new wave of vehicle insurance fraud fueled by generative AI | Amir Hever et.al. | 2510.19957 | null |
| 2025-10-22 | PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis | Qing Mao et.al. | 2510.19527 | null |
| 2025-10-22 | GigaBrain-0: A World Model-Powered Vision-Language-Action Model | GigaBrain Team et.al. | 2510.19430 | null |
| 2025-10-22 | FeatureFool: Zero-Query Fooling of Video Models via Feature Map | Duoxun Tang et.al. | 2510.18362 | null |
| 2025-10-22 | MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models | Yongshun Zhang et.al. | 2510.17519 | null |
| 2025-10-22 | ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints | Meiqi Wu et.al. | 2510.14847 | null |
| 2025-10-21 | MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models | Aritra Bhowmik et.al. | 2510.19022 | null |
| 2025-10-21 | UltraGen: High-Resolution Video Generation with Hierarchical Attention | Teng Hu et.al. | 2510.18775 | null |
| 2025-10-21 | MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation | Weinan Jia et.al. | 2510.18692 | null |
| 2025-10-21 | Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model | Zhenxing Zhang et.al. | 2510.18573 | null |
| 2025-10-20 | World-in-World: World Models in a Closed-Loop World | Jiahan Zhang et.al. | 2510.18135 | null |
| 2025-10-20 | Demystifying Transition Matching: When and Why It Can Beat Flow Matching | Jaihoon Kim et.al. | 2510.17991 | null |
| 2025-10-20 | From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models | Zefan Cai et.al. | 2510.17247 | null |
| 2025-10-20 | DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion | Weijie Wang et.al. | 2510.15264 | null |
| 2025-10-20 | Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization | Liao Shen et.al. | 2510.14255 | null |
| 2025-10-19 | An empirical study of the effect of video encoders on Temporal Video Grounding | Ignacio M. De la Jara et.al. | 2510.17007 | null |
| 2025-10-19 | From Mannequin to Human: A Pose-Aware and Identity-Preserving Video Generation Framework for Lifelike Clothing Display | Xiangyu Mu et.al. | 2510.16833 | null |
| 2025-10-19 | STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding | Zhifei Chen et.al. | 2510.14588 | null |
| 2025-10-17 | VISTA: A Test-Time Self-Improving Video Generation Agent | Do Xuan Long et.al. | 2510.15831 | null |
| 2025-10-17 | Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset | Qingyan Bai et.al. | 2510.15742 | null |
| 2025-10-17 | Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning | Xiangyu Meng et.al. | 2510.14256 | null |
| 2025-10-17 | Ctrl-VI: Controllable Video Synthesis via Variational Inference | Haoyi Duan et.al. | 2510.07670 | null |
| 2025-10-16 | TGT: Text-Grounded Trajectories for Locally Controlled Video Generation | Guofeng Zhang et.al. | 2510.15104 | null |
| 2025-10-16 | RealDPO: Real or Not Real, that is the Preference | Guo Cheng et.al. | 2510.14955 | null |
| 2025-10-16 | DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation | Yu Zhou et.al. | 2510.14949 | null |
| 2025-10-16 | 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation | JoungBin Lee et.al. | 2510.14945 | null |
| 2025-10-16 | In-Context Learning with Unpaired Clips for Instruction-based Video Editing | Xinyao Liao et.al. | 2510.14648 | null |
| 2025-10-16 | Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures | Yuancheng Xu et.al. | 2510.14179 | null |
| 2025-10-15 | PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning | Sihui Ji et.al. | 2510.13809 | null |
| 2025-10-15 | CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas | Zian Li et.al. | 2510.13669 | null |
| 2025-10-15 | VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator | Hyojun Go et.al. | 2510.13454 | null |
| 2025-10-15 | Counting Hallucinations in Diffusion Models | Shuai Fu et.al. | 2510.13080 | null |
| 2025-10-14 | SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models | Zhengxu Tang et.al. | 2510.13042 | null |
| 2025-10-14 | MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars | Felix Taubner et.al. | 2510.12785 | null |
| 2025-10-14 | Time-Correlated Video Bridge Matching | Viacheslav Vasilev et.al. | 2510.12453 | null |
| 2025-10-14 | BIGFix: Bidirectional Image Generation with Token Fixing | Victor Besnier et.al. | 2510.12231 | null |
| 2025-10-14 | Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback | Xingpei Ma et.al. | 2510.12089 | null |
| 2025-10-13 | Point Prompting: Counterfactual Tracking with Video Diffusion Models | Ayush Shrivastava et.al. | 2510.11715 | null |
| 2025-10-13 | MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps | Jiahui Lei et.al. | 2510.11107 | null |
| 2025-10-13 | Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization | Shuo Xing et.al. | 2510.08789 | null |
| 2025-10-12 | AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes | Yu Li et.al. | 2510.10670 | null |
| 2025-10-12 | DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis | Peiyin Chen et.al. | 2510.10650 | null |
| 2025-10-11 | EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection | Huaizhi Qu et.al. | 2510.13652 | null |
| 2025-10-11 | MultiCOIN: Multi-Modal COntrollable Video INbetweening | Maham Tanveer et.al. | 2510.08561 | null |
| 2025-10-10 | Stable Video Infinity: Infinite-Length Video Generation with Error Recycling | Wuyang Li et.al. | 2510.09212 | null |
| 2025-10-10 | MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling | Qian Wang et.al. | 2508.08487 | null |
| 2025-10-09 | SkipSR: Faster Super Resolution with Token Skipping | Rohan Choudhury et.al. | 2510.08799 | null |
| 2025-10-09 | NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos | Hongyu Li et.al. | 2510.08568 | null |
| 2025-10-09 | VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning | Minghong Cai et.al. | 2510.08555 | null |
| 2025-10-09 | X2Video: Adapting Diffusion Models for Multimodal Controllable Neural Video Rendering | Zhitong Huang et.al. | 2510.08530 | null |
| 2025-10-09 | FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control | Zhiyuan Zhang et.al. | 2510.08527 | null |
| 2025-10-09 | UniVideo: Unified Understanding, Generation, and Editing for Videos | Cong Wei et.al. | 2510.08377 | null |
| 2025-10-09 | LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation | Yushi Huang et.al. | 2510.08318 | null |
| 2025-10-09 | UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution | Shian Du et.al. | 2510.08143 | null |
| 2025-10-09 | Real-Time Motion-Controllable Autoregressive Video Diffusion | Kesen Zhao et.al. | 2510.08131 | null |
| 2025-10-09 | CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving | Tianrui Zhang et.al. | 2510.07944 | null |
| 2025-10-09 | TTOM: Test-Time Optimization and Memorization for Compositional Video Generation | Leigang Qu et.al. | 2510.07940 | null |
| 2025-10-09 | Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection | Yanjie Pan et.al. | 2510.07654 | null |
| 2025-10-09 | Paper2Video: Automatic Video Generation from Scientific Papers | Zeyu Zhu et.al. | 2510.05096 | null |
| 2025-10-08 | TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility | Saman Motamed et.al. | 2510.07550 | null |
| 2025-10-08 | DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis | Nithin C. Babu et.al. | 2510.07441 | null |
| 2025-10-08 | WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation | Zezhong Qian et.al. | 2510.07313 | null |
| 2025-10-08 | MATRIX: Mask Track Alignment for Interaction-aware Video Generation | Siyoon Jin et.al. | 2510.07310 | null |
| 2025-10-08 | TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | Jiaben Chen et.al. | 2510.07249 | null |
| 2025-10-08 | MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis | Yihao Zhi et.al. | 2510.07190 | null |
| 2025-10-08 | Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report | Riccardo Mereu et.al. | 2510.07092 | null |
| 2025-10-08 | Addressing the ID-Matching Challenge in Long Video Captioning | Zhantao Yang et.al. | 2510.06973 | null |
| 2025-10-07 | Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models | Jiahao Wang et.al. | 2510.06209 | null |
| 2025-10-07 | When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach | Daniel Gonzálbez-Biosca et.al. | 2510.05661 | null |
| 2025-10-06 | LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation | Yang Xiao et.al. | 2510.05367 | null |
| 2025-10-06 | VChain: Chain-of-Visual-Thought for Reasoning in Video Generation | Ziqi Huang et.al. | 2510.05094 | null |
| 2025-10-06 | Character Mixing for Video Generation | Tingting Liao et.al. | 2510.05093 | null |
| 2025-10-06 | Bridging Text and Video Generation: A Survey | Nilay Kumar et.al. | 2510.04999 | null |
| 2025-10-06 | What Drives Compositional Generalization in Visual Generative Models? | Karim Farid et.al. | 2510.03075 | null |
| 2025-10-05 | ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation | Jay Zhangjie Wu et.al. | 2510.04290 | null |
| 2025-10-05 | Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers | Shikang Zheng et.al. | 2510.04188 | null |
| 2025-10-04 | Generating Human Motion Videos using a Cascaded Text-to-Video Framework | Hyelin Nam et.al. | 2510.03909 | null |
| 2025-10-03 | Mask2IV: Interaction-Centric Video Generation via Mask Trajectories | Gen Li et.al. | 2510.03135 | null |
| 2025-10-03 | Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction | Kaisi Guan et.al. | 2510.03117 | null |
| 2025-10-03 | When and Where do Events Switch in Multi-Event Video Generation? | Ruotong Liao et.al. | 2510.03049 | null |
| 2025-10-03 | Pack and Force Your Memory: Long-form and Consistent Video Generation | Xiaofei Wu et.al. | 2510.01784 | null |
| 2025-10-02 | Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation | Beijia Lu et.al. | 2510.02617 | null |
| 2025-10-02 | How Confident are Video Models? Empowering Video Models to Express their Uncertainty | Zhiting Mei et.al. | 2510.02571 | null |
| 2025-10-02 | Inferring Dynamic Physical Properties from Video Foundation Models | Guanqi Zhan et.al. | 2510.02311 | null |
| 2025-10-02 | MultiModal Action Conditioned Video Generation | Yichen Li et.al. | 2510.02287 | null |
| 2025-10-02 | Learning to Generate Object Interactions with Physics-Guided Video Diffusion | David Romero et.al. | 2510.02284 | null |
| 2025-10-02 | Self-Forcing++: Towards Minute-Scale High-Quality Video Generation | Justin Cui et.al. | 2510.02283 | null |
| 2025-10-02 | TempoControl: Temporal Attention Guidance for Text-to-Video Models | Shira Schiber et.al. | 2510.02226 | null |
| 2025-10-02 | Multi-marginal temporal Schrödinger Bridge Matching for video generation from unpaired data | Thomas Gravier et.al. | 2510.01894 | null |
| 2025-10-01 | IMAGEdit: Let Any Subject Transform | Fei Shen et.al. | 2510.01186 | null |
| 2025-10-01 | EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory | Jiahao Wang et.al. | 2510.01183 | null |
| 2025-10-01 | Code2Video: A Code-centric Paradigm for Educational Video Generation | Yanzhe Chen et.al. | 2510.01174 | null |
| 2025-10-01 | From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation | Fan Yang et.al. | 2510.00806 | null |
| 2025-10-01 | Arbitrary Generative Video Interpolation | Guozhen Zhang et.al. | 2510.00578 | null |
| 2025-10-01 | BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration | Zhaoyang Li et.al. | 2510.00438 | null |
| 2025-09-30 | Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation | Chetwin Low et.al. | 2510.01284 | null |
| 2025-09-30 | Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation | Agneet Chatterjee et.al. | 2509.26555 | null |
| 2025-09-30 | MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation | Chenhui Zhu et.al. | 2509.26391 | null |
| 2025-09-30 | PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution | Shian Du et.al. | 2509.26025 | null |
| 2025-09-30 | Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel | Haotian Dong et.al. | 2509.24979 | null |
| 2025-09-30 | QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification | Weilun Feng et.al. | 2509.23681 | null |
| 2025-09-29 | FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation | Yunyang Ge et.al. | 2509.25187 | null |
| 2025-09-29 | DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder | Junyu Chen et.al. | 2509.25182 | null |
| 2025-09-29 | Rolling Forcing: Autoregressive Long Video Diffusion in Real Time | Kunhao Liu et.al. | 2509.25161 | null |
| 2025-09-29 | PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion | Yuyang Yin et.al. | 2509.24997 | null |
| 2025-09-29 | SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation | Shuang Liang et.al. | 2509.24980 | null |
| 2025-09-29 | Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer | Mohsen Ghafoorian et.al. | 2509.24899 | null |
| 2025-09-29 | Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility | Yutong Hao et.al. | 2509.24702 | null |
| 2025-09-29 | SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer | Junsong Chen et.al. | 2509.24695 | null |
| 2025-09-29 | Learning Object-Centric Representations Based on Slots in Real World Scenarios | Adil Kaan Akan et.al. | 2509.24652 | null |
| 2025-09-29 | UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark | Ailing Zhang et.al. | 2509.24427 | null |
| 2025-09-29 | CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers | Kai Liu et.al. | 2509.24416 | null |
| 2025-09-29 | NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis | Yixuan Ren et.al. | 2509.24353 | null |
| 2025-09-29 | FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation | Seungwook Kim et.al. | 2509.24241 | null |
| 2025-09-28 | Autoregressive Video Generation beyond Next Frames Prediction | Sucheng Ren et.al. | 2509.24081 | null |
| 2025-09-28 | SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention | Jintao Zhang et.al. | 2509.24006 | null |
| 2025-09-28 | VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement | Shulian Zhang et.al. | 2509.23584 | null |
| 2025-09-27 | Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing | Rohit Chowdhury et.al. | 2509.23279 | null |
| 2025-09-27 | Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction | Bolin Chen et.al. | 2509.23169 | null |
| 2025-09-26 | Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery | Jiayin Liu et.al. | 2509.23003 | null |
| 2025-09-26 | VideoScore2: Think before You Score in Generative Video Evaluation | Xuan He et.al. | 2509.22799 | null |
| 2025-09-26 | Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs | Xingyu Fu et.al. | 2509.22646 | null |
| 2025-09-26 | LongLive: Real-time Interactive Long Video Generation | Shuai Yang et.al. | 2509.22622 | null |
| 2025-09-26 | EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation | Yuan Xu et.al. | 2509.22578 | null |
| 2025-09-26 | EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer | Zhehao Dong et.al. | 2509.22407 | null |
| 2025-09-26 | Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers | Jibin Song et.al. | 2509.21893 | null |
| 2025-09-26 | DiTraj: training-free trajectory control for video diffusion transformer | Cheng Lei et.al. | 2509.21839 | null |
| 2025-09-26 | MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation | Yu Shang et.al. | 2509.21797 | null |
| 2025-09-26 | LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE | Yu Shang et.al. | 2509.21790 | null |
| 2025-09-26 | UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models | Lan Chen et.al. | 2509.21760 | null |
| 2025-09-25 | FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction | Yixiang Dai et.al. | 2509.21657 | null |
| 2025-09-25 | What Happens Next? Anticipating Future Motion by Generating Point Trajectories | Gabrijel Boduljak et.al. | 2509.21592 | null |
| 2025-09-25 | ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering | Weikai Lin et.al. | 2509.21541 | null |
| 2025-09-25 | NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics | Yu Yuan et.al. | 2509.21309 | null |
| 2025-09-25 | MotionFlow: Learning Implicit Motion Flow for Complex Camera Trajectory Control in Video Generation | Guojun Lei et.al. | 2509.21119 | null |
| 2025-09-25 | EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning | Xuan Ju et.al. | 2509.20360 | null |
| 2025-09-24 | PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation | Chen Wang et.al. | 2509.20358 | null |
| 2025-09-24 | 4D Driving Scene Generation With Stereo Forcing | Hao Lu et.al. | 2509.20251 | null |
| 2025-09-24 | CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion | Chenhao Ji et.al. | 2509.19979 | null |
| 2025-09-24 | OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling | Yang Zhou et.al. | 2509.12201 | null |
| 2025-09-23 | Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters | Pin-Yen Chiu et.al. | 2509.18831 | null |
| 2025-09-22 | VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models | Geonung Kim et.al. | 2509.17985 | null |
| 2025-09-22 | I2VWM: Robust Watermarking for Image to Video Generation | Guanjie Wang et.al. | 2509.17773 | null |
| 2025-09-21 | Echo-Path: Pathology-Conditioned Echo Video Generation | Kabir Hamzah Muhammad et.al. | 2509.17190 | null |
| 2025-09-21 | VidCLearn: A Continual Learning Approach for Text-to-Video Generation | Luca Zanchetta et.al. | 2509.16956 | null |
| 2025-09-21 |  | Yuanzhi Li et.al. | 2509.16873 | null |
| 2025-09-20 | RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation | Tianyi Yan et.al. | 2509.16500 | null |
| 2025-09-19 | Lynx: Towards High-Fidelity Personalized Video Generation | Shen Sang et.al. | 2509.15496 | null |
| 2025-09-19 | AToken: A Unified Tokenizer for Vision | Jiasen Lu et.al. | 2509.14476 | null |
| 2025-09-18 | OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data | Björn Möller et.al. | 2509.15479 | null |
| 2025-09-18 | RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation | Yuming Jiang et.al. | 2509.15212 | null |
| 2025-09-18 | WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance | Chenxi Song et.al. | 2509.15130 | null |
| 2025-09-18 | DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images | Kazuma Nagata et.al. | 2509.14685 | null |
| 2025-09-18 | BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching | Hanshuai Cui et.al. | 2509.13789 | null |
| 2025-09-17 | PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models | Artem Lykov et.al. | 2509.13903 | null |
| 2025-09-17 | TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving | Jiawei Wang et.al. | 2509.13164 | null |
| 2025-09-17 | Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis | Yikang Ding et.al. | 2509.09595 | null |
| 2025-09-16 | Gen2Real: Towards Demo-Free Dexterous Manipulation by Harnessing Generated Video | Kai Ye et.al. | 2509.14178 | null |
| 2025-09-16 | BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models | Yuming Li et.al. | 2509.06040 | null |
| 2025-09-15 | AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective | Yuchen Deng et.al. | 2509.12052 | null |
| 2025-09-15 | SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching | Jiacheng Liu et.al. | 2509.11628 | null |
| 2025-09-15 | MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment | Yanyun Pu et.al. | 2509.11589 | null |
| 2025-09-14 | VideoAgent: Personalized Synthesis of Scientific Videos | Xiao Liang et.al. | 2509.11253 | null |
| 2025-09-14 | PanoLora: Bridging Perspective and Panoramic Video Generation with LoRA Adaptation | Zeyu Dong et.al. | 2509.11092 | null |
| 2025-09-12 | Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation | Hao Zhang et.al. | 2509.10687 | null |
| 2025-09-12 | T2Bs: Text-to-Character Blendshapes via Video Generation | Jiahao Luo et.al. | 2509.10678 | null |
| 2025-09-12 | Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching | Zhixin Zheng et.al. | 2509.10312 | null |
| 2025-09-11 | Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders | Dohun Lee et.al. | 2509.09547 | null |
| 2025-09-11 | Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training | Ruicheng Zhang et.al. | 2509.06723 | null |
| 2025-09-10 | RewardDance: Reward Scaling in Visual Generation | Jie Wu et.al. | 2509.08826 | null |
| 2025-09-10 | GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts | Jenna Kang et.al. | 2509.08818 | null |
| 2025-09-10 | HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning | Liyang Chen et.al. | 2509.08519 | null |
| 2025-09-09 | ANYPORTAL: Zero-Shot Consistent Video Background Replacement | Wenshuo Gao et.al. | 2509.07472 | null |
| 2025-09-09 | Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching | Feng Wang et.al. | 2509.05952 | null |
| 2025-09-09 | Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts | Adam Cole et.al. | 2509.05323 | null |
| 2025-09-07 | UniVerse-1: Unified Audio-Video Generation via Stitching of Experts | Duomin Wang et.al. | 2509.06155 | null |
| 2025-09-04 | Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview | Jun-Kun Chen et.al. | 2509.04450 | null |
| 2025-09-04 | Human Motion Video Generation: A Survey | Haiwei Xue et.al. | 2509.03883 | null |
| 2025-09-03 | CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation | Zixin Zhu et.al. | 2509.01028 | null |
| 2025-09-01 | Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement | Jiayi Gao et.al. | 2509.01362 | null |
| 2025-09-01 | Communicative Agents for Slideshow Storytelling Video Generation based on LLMs | Jingxing Fan et.al. | 2509.01277 | null |
| 2025-09-01 | FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework | Lingzhou Mu et.al. | 2509.01232 | null |
| 2025-08-30 | DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective | Yushuo Chen et.al. | 2509.00403 | null |
| 2025-08-28 | Mixture of Contexts for Long Video Generation | Shengqu Cai et.al. | 2508.21058 | null |
| 2025-08-28 | POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models | Jiaxiang Cheng et.al. | 2508.21019 | null |
| 2025-08-28 | Learning Primitive Embodied World Models: Towards Scalable Robotic Learning | Qiao Sun et.al. | 2508.20840 | null |
| 2025-08-28 | Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation | Jiusi Li et.al. | 2508.20471 | null |
| 2025-08-28 | Ego-centric Predictive Model Conditioned on Hand Trajectories | Binjie Zhang et.al. | 2508.19852 | null |
| 2025-08-28 | MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation | Ming Chen et.al. | 2508.19320 | null |
| 2025-08-27 | ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion | Xurui Peng et.al. | 2508.21091 | null |
| 2025-08-26 | ROSE: Remove Objects with Side Effects in Videos | Chenxuan Miao et.al. | 2508.18633 | null |
| 2025-08-26 | Wan-S2V: Audio-Driven Cinematic Video Generation | Xin Gao et.al. | 2508.18621 | null |
| 2025-08-26 | Waver: Wave Your Way to Lifelike Video Generation | Yifu Zhang et.al. | 2508.15761 | null |
| 2025-08-25 | SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling | Fanjiang Ye et.al. | 2508.17756 | null |
| 2025-08-25 | OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models | Huanpeng Chu et.al. | 2508.16212 | null |
| 2025-08-24 | A Synthetic Dataset for Manometry Recognition in Robotic Applications | Pedro Antonio Rabelo Saraiva et.al. | 2508.17468 | null |
| 2025-08-24 | MoCo: Motion-Consistent Human Video Generation via Structure-Appearance Decoupling | Haoyu Wang et.al. | 2508.17404 | null |
| 2025-08-24 | DiCache: Let Diffusion Model Determine Its Own Cache | Jiazi Bu et.al. | 2508.17356 | null |
| 2025-08-23 | SSG-Dit: A Spatial Signal Guided Framework for Controllable Video Generation | Peng Hu et.al. | 2508.17062 | null |
| 2025-08-23 | HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching | Liang Feng et.al. | 2508.16984 | null |
| 2025-08-23 | HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation | Sizhe Shan et.al. | 2508.16930 | null |
| 2025-08-22 | Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation | Chun-Peng Chang et.al. | 2508.16512 | null |
| 2025-08-22 | Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers | Shikang Zheng et.al. | 2508.16211 | null |
| 2025-08-21 | Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning | Yijun Liu et.al. | 2508.15874 | null |
| 2025-08-21 | CineScale: Free Lunch in High-Resolution Cinematic Visual Generation | Haonan Qiu et.al. | 2508.15774 | null |
| 2025-08-21 | Scaling Group Inference for Diverse and High-Quality Generation | Gaurav Parmar et.al. | 2508.15773 | null |
| 2025-08-21 | WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception | Zhiheng Liu et.al. | 2508.15720 | null |
| 2025-08-21 | TiP4GEN: Text to Immersive Panorama 4D Scene Generation | Ke Xing et.al. | 2508.12415 | null |
| 2025-08-20 | DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing | Weitao Wang et.al. | 2508.14465 | null |
| 2025-08-20 | MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation | Guile Wu et.al. | 2508.14327 | null |
| 2025-08-19 | xDiff: Online Diffusion Model for Collaborative Inter-Cell Interference Management in 5G O-RAN | Peihao Yan et.al. | 2508.15843 | null |
| 2025-08-19 | InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing | Shaoshu Yang et.al. | 2508.14033 | null |
| 2025-08-19 | Physics-Based 3D Simulation for Synthetic Data Generation and Failure Analysis in Packaging Stability Assessment | Samuel Seligardi et.al. | 2508.13989 | null |
| 2025-08-18 | 4DNeX: Feed-Forward 4D Generative Modeling Made Easy | Zhaoxi Chen et.al. | 2508.13154 | null |
| 2025-08-18 | Precise Action-to-Video Generation Through Visual Action Prompts | Yuang Wang et.al. | 2508.13104 | null |
| 2025-08-18 | EgoTwin: Dreaming Body and View in First Person | Jingqiao Xiu et.al. | 2508.13013 | null |
| 2025-08-18 | Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model | Xianglong He et.al. | 2508.13009 | null |
| 2025-08-18 | Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation | Qirui Li et.al. | 2508.12969 | null |
| 2025-08-18 | Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models | Jianshu Zeng et.al. | 2508.12945 | null |
| 2025-08-18 | S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models | Chubin Chen et.al. | 2508.12880 | null |
| 2025-08-18 | E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model | Ronghao Lin et.al. | 2508.12854 | null |
| 2025-08-18 | MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration | Yuanxin Wei et.al. | 2508.12691 | null |
| 2025-08-15 | CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models | Xiaoxue Wu et.al. | 2508.11484 | null |
| 2025-08-15 | Preacher: Paper-to-Video Agentic System | Jingwei Liu et.al. | 2508.09632 | null |
| 2025-08-14 | GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning | Kelin Yu et.al. | 2508.11049 | null |
| 2025-08-14 | EVCtrl: Efficient Control Adapter for Visual Generation | Zixiang Yang et.al. | 2508.10963 | null |
| 2025-08-14 | Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation | Harold Haodong Chen et.al. | 2508.10858 | null |
| 2025-08-14 | Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation | Youping Gu et.al. | 2508.10774 | null |
| 2025-08-14 | AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences | Jieyu Li et.al. | 2508.10771 | null |
| 2025-08-14 | HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis | Shiyu Liu et.al. | 2508.10566 | null |
| 2025-08-14 | From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts | Yuji Wang et.al. | 2508.09476 | null |
| 2025-08-14 | Yan: Foundational Interactive Video Generation | Deheng Ye et.al. | 2508.08601 | null |
| 2025-08-13 | Physical Autoregressive Model for Robotic Manipulation without Action Pretraining | Zijian Song et.al. | 2508.09822 | null |
| 2025-08-12 | X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents | Guoxian Song et.al. | 2508.09383 | null |
| 2025-08-12 | Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices | Ya Zou et.al. | 2508.09136 | null |
| 2025-08-12 | TaoCache: Structure-Maintained Video Generation Acceleration | Zhentao Fan et.al. | 2508.08978 | null |
| 2025-08-12 | Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos | Qi Zheng et.al. | 2508.08700 | null |
| 2025-08-12 | RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space | Jingyun Liang et.al. | 2508.08588 | null |
| 2025-08-12 | S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix | Peng Dai et.al. | 2508.08048 | null |
| 2025-08-12 | Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation | Fangyuan Mao et.al. | 2508.07981 | null |
| 2025-08-12 | Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation | Bowen Xue et.al. | 2508.07901 | null |
| 2025-08-11 | VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip | Wenqi Guo et.al. | 2508.10931 | null |
| 2025-08-11 | StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation | Shuyuan Tu et.al. | 2508.08248 | null |
| 2025-08-11 | Matrix-3D: Omnidirectional Explorable 3D World Generation | Zhongqi Yang et.al. | 2508.08086 | null |
| 2025-08-11 | Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation | Xiaoyan Liu et.al. | 2508.07769 | null |
| 2025-08-11 | ShoulderShot: Generating Over-the-Shoulder Dialogue Videos | Yuang Zhang et.al. | 2508.07597 | null |
| 2025-08-08 | Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video | Jixuan He et.al. | 2508.06715 | null |
| 2025-08-08 | SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment | Yanxiao Sun et.al. | 2508.06082 | null |
| 2025-08-08 | DreamVE: Unified Instruction-based Image and Video Editing | Bin Xia et.al. | 2508.06080 | null |
| 2025-08-07 | Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation | Yue Liao et.al. | 2508.05635 | null |
| 2025-08-07 | B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding | Changho Choi et.al. | 2508.05269 | null |
| 2025-08-07 | PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation | Jingxuan He et.al. | 2508.05091 | null |
| 2025-08-07 | S | Weilun Feng et.al. | 2508.04016 | null |
| 2025-08-06 | MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning | Quang-Trung Truong et.al. | 2508.04549 | null |
| 2025-08-06 | LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation | Kangrui Cen et.al. | 2508.04228 | null |
| 2025-08-06 | Motion is the Choreographer: Learning Latent Pose Dynamics for Seamless Sign Language Generation | Jiayi He et.al. | 2508.04049 | null |
| 2025-08-06 | Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation | Xunzhi Xiang et.al. | 2508.03334 | null |
| 2025-08-05 | Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm | Lin Zhang et.al. | 2508.03955 | null |
| 2025-08-05 | LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation | Jianxiong Gao et.al. | 2508.03694 | null |
| 2025-08-05 | RAAG: Ratio Aware Adaptive Guidance | Shangwen Zhu et.al. | 2508.03442 | null |
| 2025-08-05 | V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models | Jisoo Kim et.al. | 2508.03254 | null |
| 2025-08-05 | Multi-human Interactive Talking Dataset | Zeyu Zhu et.al. | 2508.03050 | null |
| 2025-08-05 | MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention | Qi Xie et.al. | 2508.03034 | null |
| 2025-08-05 | D3: Training-Free AI-Generated Video Detection Using Second-Order Features | Chende Zheng et.al. | 2508.00701 | null |
| 2025-08-04 | X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio | Chenxu Zhang et.al. | 2508.02944 | null |
| 2025-08-04 | DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework | Tongchun Zuo et.al. | 2508.02807 | null |
| 2025-08-04 | QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots | Sheng Wu et.al. | 2508.02512 | null |
| 2025-08-04 | PoseGuard: Pose-Guided Generation with Safety Guardrails | Kongxin Wang et.al. | 2508.02476 | null |
| 2025-08-04 | Talking Surveys: How Photorealistic Embodied Conversational Agents Shape Response Quality, Engagement, and Satisfaction | Matus Krajcovic et.al. | 2508.02376 | null |
| 2025-08-03 | Versatile Transition Generation with Image-to-Video Diffusion | Zuhao Yang et.al. | 2508.01698 | null |
| 2025-08-01 | Video Generators are Robot Policies | Junbang Liang et.al. | 2508.00795 | null |
| 2025-08-01 | SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation | Kien T. Pham et.al. | 2508.00782 | null |
| 2025-08-01 | Video Forgery Detection with Optical Flow Residuals and Spatial-Temporal Consistency | Xi Xue et.al. | 2508.00397 | null |
| 2025-08-01 | GV-VAD : Exploring Video Generation for Weakly-Supervised Video Anomaly Detection | Suhang Cai et.al. | 2508.00312 | null |
| 2025-08-01 | Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence | Danzhen Fu et.al. | 2508.00299 | null |
| 2025-08-01 | HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly | Chang Liu et.al. | 2507.19924 | null |
| 2025-07-31 | World Consistency Score: A Unified Metric for Video Generation Quality | Akshat Rakheja et.al. | 2508.00144 | null |
| 2025-07-30 | GVD: Guiding Video Diffusion Model for Scalable Video Distillation | Kunyang Li et.al. | 2507.22360 | null |
| 2025-07-29 | JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 | Xinhan Di et.al. | 2507.20987 | null |
| 2025-07-28 | Compositional Video Synthesis by Temporal Object-Centric Learning | Adil Kaan Akan et.al. | 2507.20855 | null |
| 2025-07-27 | MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation | Shuolin Xu et.al. | 2507.20368 | null |
| 2025-07-26 | ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion | Xuanchen Wang et.al. | 2507.19836 | null |
| 2025-07-25 | ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment | Chong Xia et.al. | 2507.19058 | null |
| 2025-07-24 | Captain Cinema: Towards Short Movie Generation | Junfei Xiao et.al. | 2507.18634 | null |
| 2025-07-24 | Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis | Yanzuo Lu et.al. | 2507.18569 | null |
| 2025-07-24 | Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows | Simin Huo et.al. | 2507.18405 | null |
| 2025-07-24 | T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation | Yubin Chen et.al. | 2507.18107 | null |
| 2025-07-24 | Enhancing Scene Transition Awareness in Video Generation via Post-Training | Hanwen Shen et.al. | 2507.18046 | null |
| 2025-07-24 | Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics | Yuezun Li et.al. | 2507.18015 | null |
| 2025-07-24 | Controllable Video Generation: A Survey | Yue Ma et.al. | 2507.16869 | null |
| 2025-07-23 | Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA | Rameen Abdal et.al. | 2507.17963 | null |
| 2025-07-23 | Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation | Jaechul Roh et.al. | 2507.17937 | null |
| 2025-07-23 | Yume: An Interactive World Generation Model | Xiaofeng Mao et.al. | 2507.17744 | null |
| 2025-07-23 | EndoGen: Conditional Autoregressive Endoscopic Video Generation | Xinyu Liu et.al. | 2507.17388 | null |
| 2025-07-22 | Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching | Haiyang Liu et.al. | 2507.18649 | null |
| 2025-07-22 | MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation | Yanchen Liu et.al. | 2507.16310 | null |
| 2025-07-22 | PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation | Yaofang Liu et.al. | 2507.16116 | null |
| 2025-07-21 | Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models | Enes Sanli et.al. | 2507.15824 | null |
| 2025-07-21 | TokensGen: Harnessing Condensed Tokens for Long Video Generation | Wenqi Ouyang et.al. | 2507.15728 | null |
| 2025-07-21 | Conditional Video Generation for High-Efficiency Video Compression | Fangqiu Yi et.al. | 2507.15269 | null |
| 2025-07-19 | BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM | Haiquan Wen et.al. | 2507.14632 | null |
| 2025-07-19 | Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey | Jiahui Zhang et.al. | 2507.14501 | null |
| 2025-07-18 | Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis | Tongtong Su et.al. | 2507.13753 | null |
| 2025-07-17 | | Dmitrii Mikhailov et.al. | 2507.13546 | null |
| 2025-07-17 | "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models | Jing Gu et.al. | 2507.13428 | null |
| 2025-07-17 | Taming Diffusion Transformer for Real-Time Mobile Video Generation | Yushu Wu et.al. | 2507.13343 | null |
| 2025-07-17 | Leveraging Pre-Trained Visual Models for AI-Generated Video Detection | Keerthi Veeramachaneni et.al. | 2507.13224 | null |
| 2025-07-17 | LoViC: Efficient Long Video Generation with Context Compression | Jiaxiu Jiang et.al. | 2507.12952 | null |
| 2025-07-17 | World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Yanchen Guan et.al. | 2507.12762 | null |
| 2025-07-16 | EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models | Jiajian Xie et.al. | 2507.11980 | null |
| 2025-07-15 | NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models | X. Feng et.al. | 2507.11245 | null |
| 2025-07-14 | Flows and Diffusions on the Neural Manifold | Daniel Saragih et.al. | 2507.10623 | null |
| 2025-07-14 | M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation | Kui Jiang et.al. | 2507.08307 | null |
| 2025-07-14 | Democratizing High-Fidelity Co-Speech Gesture Video Generation | Xu Yang et.al. | 2507.06812 | null |
| 2025-07-12 | | Zhimin Liao et.al. | 2507.09144 | null |
| 2025-07-11 | Taming generative video models for zero-shot optical flow extraction | Seungwoo Kim et.al. | 2507.09082 | null |
| 2025-07-11 | Detecting Deepfake Talking Heads from Facial Biometric Anomalies | Justin D. Norman et.al. | 2507.08917 | null |
| 2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et.al. | 2507.08801 | null |
| 2025-07-11 | Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers | Wongi Jeong et.al. | 2507.08422 | null |
| 2025-07-11 | T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates | Zhitao Wang et.al. | 2507.07633 | null |
| 2025-07-10 | Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling | Haoyu Wu et.al. | 2507.07982 | null |
| 2025-07-10 | Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions | Longfei Li et.al. | 2507.07978 | null |
| 2025-07-10 | Scaling RL to Long Videos | Yukang Chen et.al. | 2507.07966 | null |
| 2025-07-09 | A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality | Mohamed Elmoghany et.al. | 2507.07202 | null |
| 2025-07-09 | Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation | Tao Feng et.al. | 2507.06830 | null |
| 2025-07-09 | PromptTea: Let Prompts Tell TeaCache the Optimal Threshold | Zishen Huang et.al. | 2507.06739 | null |
| 2025-07-09 | Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis | Hao Tang et.al. | 2507.06689 | null |
| 2025-07-09 | FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Liqiang Jing et.al. | 2507.06523 | null |
| 2025-07-09 | Omni-Video: Democratizing Unified Video Understanding and Generation | Zhiyu Tan et.al. | 2507.06119 | null |
| 2025-07-09 | Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation | Zhenghao Zhang et.al. | 2507.05963 | null |
| 2025-07-09 | LongAnimation: Long Animation Generation with Dynamic Global-Local Memory | Nan Chen et.al. | 2507.01945 | null |
| 2025-07-08 | Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions | Jaewan Park et.al. | 2507.06133 | null |
| 2025-07-08 | MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos | Rongsheng Wang et.al. | 2507.05675 | null |
| 2025-07-08 | StreamDiT: Real-Time Streaming Text-to-Video Generation | Akio Kodaira et.al. | 2507.03745 | null |
| 2025-07-07 | HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding | Yuxuan Cai et.al. | 2507.04909 | null |
| 2025-07-07 | Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning | Jiayun Hu et.al. | 2507.04758 | null |
| 2025-07-07 | Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations | Yuji Wang et.al. | 2507.04705 | null |
| 2025-07-06 | MambaVideo for Discrete Video Tokenization with Channel-Split Quantization | Dawit Mureja Argaw et.al. | 2507.04559 | null |
| 2025-07-06 | CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning | Fatmaelzahraa Ali Ahmed et.al. | 2507.04317 | null |
| 2025-07-05 | PresentAgent: Multimodal Agent for Presentation Video Generation | Jingwei Shi et.al. | 2507.04036 | null |
| 2025-07-05 | EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation | Rang Meng et.al. | 2507.03905 | null |
| 2025-07-03 | RefTok: Reference-Based Tokenization for Video Generation | Xiang Fan et.al. | 2507.02862 | null |
| 2025-07-03 | Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching | Xin Zhou et.al. | 2507.02860 | null |
| 2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et.al. | 2507.02857 | null |
| 2025-07-03 | Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | François Rozet et.al. | 2507.02608 | null |
| 2025-07-03 | RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment | Jianing Jin et.al. | 2506.23852 | null |
| 2025-07-02 | SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations | Zhican Wang et.al. | 2507.01309 | null |
| 2025-07-02 | LLM-based Realistic Safety-Critical Driving Video Generation | Yongjie Fu et.al. | 2507.01264 | null |
| 2025-07-02 | AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation | Xiao Liu et.al. | 2507.01255 | null |
| 2025-07-01 | Geometry-aware 4D Video Generation for Robot Manipulation | Zeyi Liu et.al. | 2507.01099 | null |
| 2025-07-01 | Populate-A-Scene: Affordance-Aware Human Video Generation | Mengyi Shan et.al. | 2507.00334 | null |
| 2025-07-01 | Listener-Rewarded Thinking in VLMs for Image Preferences | Alexander Gambashidze et.al. | 2506.22832 | null |
| 2025-06-30 | FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion | Yu Lu et.al. | 2507.00162 | null |
| 2025-06-30 | Epona: Autoregressive Diffusion World Model for Autonomous Driving | Kaiwen Zhang et.al. | 2506.24113 | null |
| 2025-06-30 | VMoBA: Mixture-of-Block Attention for Video Diffusion Models | Jianzong Wu et.al. | 2506.23858 | null |
| 2025-06-30 | SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation | Shuai Tan et.al. | 2506.23690 | null |
| 2025-06-30 | ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models | Zixun Fang et.al. | 2506.23513 | null |
| 2025-06-29 | Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis | Lei-lei Li et.al. | 2506.23263 | null |
| 2025-06-29 | RoboScape: Physics-informed Embodied World Model | Yu Shang et.al. | 2506.23135 | null |
| 2025-06-27 | Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy | Yuhao Liu et.al. | 2506.22432 | null |
| 2025-06-27 | RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation | Liudi Yang et.al. | 2506.22007 | null |
| 2025-06-27 | ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Hongbo Liu et.al. | 2506.21356 | null |
| 2025-06-27 | DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Lingling Cai et.al. | 2506.20967 | null |
| 2025-06-26 | SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Kehan Sui et.al. | 2506.21478 | null |
| 2025-06-26 | HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation | Diego Biagini et.al. | 2506.21287 | null |
| 2025-06-26 | Video Virtual Try-on with Conditional Diffusion Transformer Inpainter | Cheng Zou et.al. | 2506.21270 | null |
| 2025-06-26 | Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | Donggoo Kang et.al. | 2506.20946 | null |
| 2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
| 2025-06-25 | BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jiahao Lin et.al. | 2506.20103 | null |
| 2025-06-24 | Radial Attention: | Xingyang Li et.al. | 2506.19852 | null |
| 2025-06-24 | GenHSI: Controllable Generation of Human-Scene Interaction Videos | Zekun Li et.al. | 2506.19840 | null |
| 2025-06-24 | SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution | Liangbin Xie et.al. | 2506.19838 | null |
| 2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833 | null |
| 2025-06-24 | Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation | Jintao Rong et.al. | 2506.19348 | null |
| 2025-06-23 | VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory | Runjia Li et.al. | 2506.18903 | null |
| 2025-06-23 | From Virtual Games to Real-World Play | Wenqiang Sun et.al. | 2506.18901 | null |
| 2025-06-23 | FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation | Kaiyi Huang et.al. | 2506.18899 | null |
| 2025-06-23 | MinD: Unified Visual Imagination and Control via Hierarchical World Models | Xiaowei Chi et.al. | 2506.18897 | null |
| 2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866 | null |
| 2025-06-23 | Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset | Zhuowei Chen et.al. | 2506.18851 | null |
| 2025-06-23 | Matrix-Game: Interactive World Foundation Model | Yifan Zhang et.al. | 2506.18701 | null |
| 2025-06-23 | RDPO: Real Data Preference Optimization for Physics Consistency Video Generation | Wenxu Qian et.al. | 2506.18655 | null |
| 2025-06-23 | BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | Denys Rozumnyi et.al. | 2506.18601 | null |
| 2025-06-23 | VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning | Xuanyu Zhang et.al. | 2506.18564 | null |
| 2025-06-23 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et.al. | 2506.17220 | link |
| 2025-06-21 | STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | Jiamin Wang et.al. | 2506.13138 | null |
| 2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et.al. | 2506.17201 | null |
| 2025-06-20 | Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation | Riccardo Corvi et.al. | 2506.16802 | null |
| 2025-06-20 | Sekai: A Video Dataset towards World Exploration | Zhen Li et.al. | 2506.15675 | null |
| 2025-06-20 | Show-o2: Improved Native Unified Multimodal Models | Jinheng Xie et.al. | 2506.15564 | link |
| 2025-06-19 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Annajoyce Mariani et.al. | 2506.16209 | link |
| 2025-06-19 | FastInit: Fast Noise Initialization for Temporally Consistent Video Generation | Chengyu Bai et.al. | 2506.16119 | null |
| 2025-06-19 | PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models | Tianchen Zhao et.al. | 2506.16054 | null |
| 2025-06-19 | Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization | Cong Wang et.al. | 2506.15980 | link |
| 2025-06-18 | VideoMAR: Autoregressive Video Generation with Continuous Tokens | Hu Yu et.al. | 2506.14168 | null |
| 2025-06-18 | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Xuanchi Ren et.al. | 2506.09042 | link |
| 2025-06-17 | Causally Steered Diffusion for Automated Video Counterfactual Generation | Nikos Spyrou et.al. | 2506.14404 | link |
| 2025-06-17 | CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation | Jia-Chen Zhang et.al. | 2506.14206 | null |
| 2025-06-16 | EchoShot: Multi-Shot Portrait Video Generation | Jiahao Wang et.al. | 2506.15838 | null |
| 2025-06-16 | UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions | Zhucun Xue et.al. | 2506.13691 | null |
| 2025-06-15 | iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Zhelun Shen et.al. | 2506.12847 | null |
| 2025-06-13 | SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation | Xu Wang et.al. | 2506.11621 | null |
| 2025-06-13 | Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models | Sridhar S et.al. | 2506.10005 | null |
| 2025-06-12 | GenWorld: Towards Detecting AI-generated Real-world Simulation Videos | Weiliang Chen et.al. | 2506.10975 | null |
| 2025-06-12 | M4V: Multi-Modal Mamba for Text-to-Video Generation | Jiancheng Huang et.al. | 2506.10915 | null |
| 2025-06-12 | GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning | Xiaoyi Bao et.al. | 2506.10639 | null |
| 2025-06-12 | DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers | Lizhen Wang et.al. | 2506.10568 | null |
| 2025-06-12 | AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Haoyuan Shi et.al. | 2506.10540 | null |
| 2025-06-11 | AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation | Chao Liang et.al. | 2506.11144 | null |
| 2025-06-11 | PlayerOne: Egocentric World Simulator | Yuanpeng Tu et.al. | 2506.09995 | null |
| 2025-06-11 | InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions | Zhenzhi Wang et.al. | 2506.09984 | null |
| 2025-06-11 | ReSim: Reliable World Simulation for Autonomous Driving | Jiazhi Yang et.al. | 2506.09981 | null |
| 2025-06-11 | DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning | Dongxu Liu et.al. | 2506.09644 | null |
| 2025-06-11 | Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation | Shanchuan Lin et.al. | 2506.09350 | null |
| 2025-06-10 | Seedance 1.0: Exploring the Boundaries of Video Generation Models | Yu Gao et.al. | 2506.09113 | null |
| 2025-06-10 | FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Zheqi He et.al. | 2506.09081 | link |
| 2025-06-10 | VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Xinlong Chen et.al. | 2506.09079 | null |
| 2025-06-10 | MagCache: Fast Video Generation with Magnitude-Aware Cache | Zehong Ma et.al. | 2506.09045 | link |
| 2025-06-10 | Product of Experts for Visual Generation | Yunzhi Zhang et.al. | 2506.08894 | null |
| 2025-06-10 | HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation | Ziyao Huang et.al. | 2506.08797 | null |
| 2025-06-10 | RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping | Yang Bai et.al. | 2506.08632 | null |
| 2025-06-10 | How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models | Huixuan Zhang et.al. | 2506.08351 | null |
| 2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280 | null |
| 2025-06-09 | Seeing Voices: Generating A-Roll Video from Audio with Mirage | Aditi Sundararaman et.al. | 2506.08279 | null |
| 2025-06-09 | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Xun Huang et.al. | 2506.08009 | null |
| 2025-06-09 | Dreamland: Controllable World Creation with Simulator and Generative Models | Sicheng Mo et.al. | 2506.08006 | null |
| 2025-06-09 | Audio-Sync Video Generation with Multi-Stream Temporal Control | Shuchen Weng et.al. | 2506.08003 | null |
| 2025-06-09 | Generative Modeling of Weights: Generalization or Memorization? | Boya Zeng et.al. | 2506.07998 | link |
| 2025-06-09 | Video Unlearning via Low-Rank Refusal Vector | Simone Facchiano et.al. | 2506.07891 | null |
| 2025-06-09 | EgoM2P: Egocentric Multimodal Multitask Pretraining | Gen Li et.al. | 2506.07886 | null |
| 2025-06-09 | PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement | Teng Hu et.al. | 2506.07848 | null |
| 2025-06-09 | Consistent Video Editing as Flow-Driven Image-to-Video Generation | Ge Wang et.al. | 2506.07713 | null |
| 2025-06-09 | Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation | Boris Martirosyan et.al. | 2506.07706 | null |
| 2025-06-09 | Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Haosong Liu et.al. | 2506.05096 | null |
| 2025-06-08 | TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation | Min-Jung Kim et.al. | 2506.07205 | null |
| 2025-06-08 | Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models | Sangwon Jang et.al. | 2506.07177 | null |
| 2025-06-08 | Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion | Huaize Liu et.al. | 2506.07136 | null |
| 2025-06-07 | Self-Adapting Improvement Loops for Robotic Learning | Calvin Luo et.al. | 2506.06658 | null |
| 2025-06-06 | Restereo: Diffusion stereo video generation and restoration | Xingchang Huang et.al. | 2506.06023 | null |
| 2025-06-06 | LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models | Haojie Yu et.al. | 2506.05806 | null |
| 2025-06-06 | FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Akide Liu et.al. | 2506.04648 | null |
| 2025-06-05 | EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh | Tao Hu et.al. | 2506.05554 | null |
| 2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | null |
| 2025-06-05 | FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Huihan Wang et.al. | 2506.04956 | link |
| 2025-06-05 | DualX-VSR: Dual Axial Spatial | Shuo Cao et.al. | 2506.04830 | null |
| 2025-06-05 | Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Yue Ma et.al. | 2506.04590 | null |
| 2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213 | null |
| 2025-06-05 | SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios | Lingwei Dang et.al. | 2506.02444 | link |
| 2025-06-04 | LayerFlow: A Unified Model for Layer-aware Video Generation | Sihui Ji et.al. | 2506.04228 | null |
| 2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216 | null |
| 2025-06-04 | DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models | Ziyi Wu et.al. | 2506.03517 | null |
| 2025-06-03 | Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas | Austin Silveria et.al. | 2506.03275 | null |
| 2025-06-03 | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Yuanze Lin et.al. | 2506.03150 | null |
| 2025-06-03 | Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval | Jiwen Yu et.al. | 2506.03141 | null |
| 2025-06-03 | CamCloneMaster: Enabling Reference-based Camera Control for Video Generation | Yawen Luo et.al. | 2506.03140 | null |
| 2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126 | null |
| 2025-06-03 | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation | Zhengyao Lv et.al. | 2506.03123 | null |
| 2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | null |
| 2025-06-03 | SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis | Ssharvien Kumar Sivakumar et.al. | 2506.03082 | null |
| 2025-06-03 | ORV: 4D Occupancy-centric Robot Video Generation | Xiuyu Yang et.al. | 2506.03079 | link |
| 2025-06-03 | Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers | Pengtao Chen et.al. | 2506.03065 | null |
| 2025-06-03 | LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering | Xiaoyi Feng et.al. | 2506.02733 | null |
| 2025-06-03 | LumosFlow: Motion-Guided Long Video Generation | Jiahao Chen et.al. | 2506.02497 | null |
| 2025-06-02 | Motion aware video generative model | Bowen Xue et.al. | 2506.02244 | null |
| 2025-06-02 | Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control | Xiao Fu et.al. | 2506.01943 | null |
| 2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
| 2025-06-02 | Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks | Tao Yang et.al. | 2506.01758 | null |
| 2025-06-02 | Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents | Shuting Wang et.al. | 2506.01689 | null |
| 2025-06-02 | LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model | Xiaodong Wang et.al. | 2506.01546 | null |
| 2025-06-02 | Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark | Shuyu Yang et.al. | 2506.01466 | null |
| 2025-06-02 | DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion | Geunmin Hwang et.al. | 2506.01454 | null |
| 2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873 | null |
| 2025-05-30 | DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | Jiaxu Zhang et.al. | 2505.24733 | null |
| 2025-05-30 | UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation | Yang-Tian Sun et.al. | 2505.24521 | null |
| 2025-05-30 | Interactive Video Generation via Domain Adaptation | Ishaan Rawal et.al. | 2505.24253 | null |
| 2025-05-30 | STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | Zheng Tan et.al. | 2505.24210 | link |
| 2025-05-29 | MAGREF: Masked Guidance for Any-Reference Video Generation | Yufan Deng et.al. | 2505.23742 | link |
| 2025-05-29 | VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | Tingyu Song et.al. | 2505.23693 | link |
| 2025-05-29 | VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Xiangdong Zhang et.al. | 2505.23656 | link |
| 2025-05-29 | VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Shi-Xue Zhang et.al. | 2505.23484 | link |
| 2025-05-29 | Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis | Hengyuan Cao et.al. | 2505.23325 | null |
| 2025-05-29 | RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer | Liu Liu et.al. | 2505.23171 | null |
| 2025-05-29 | Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Tongtong Su et.al. | 2505.23134 | link |
| 2025-05-29 | MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Siyuan Wang et.al. | 2505.23120 | link |
| 2025-05-29 | GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | Gwanghyun Kim et.al. | 2505.23085 | null |
| 2025-05-29 | MOVi: Training-free Text-conditioned Multi-Object Video Generation | Aimon Rahman et.al. | 2505.22980 | null |
| 2025-05-29 | HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Shuolin Xu et.al. | 2505.22977 | link |
| 2025-05-29 | Minute-Long Videos with Dual Parallelisms | Zeqing Wang et.al. | 2505.21070 | link |
| 2025-05-28 | ATI: Any Trajectory Instruction for Controllable Video Generation | Angtian Wang et.al. | 2505.22944 | null |
| 2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647 | link |
| 2025-05-28 | Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers | Weilun Feng et.al. | 2505.22167 | null |
| 2025-05-28 | FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing | Guanwen Feng et.al. | 2505.22141 | null |
| 2025-05-28 | LatentMove: Towards Complex Human Movement Video Generation | Ashkan Taghipour et.al. | 2505.22046 | null |
| 2025-05-28 | PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | Yifei Xia et.al. | 2505.22016 | null |
| 2025-05-28 | Learning World Models for Interactive Video Generation | Taiye Chen et.al. | 2505.21996 | null |
| 2025-05-28 | SageAttention2++: A More Efficient Implementation of SageAttention2 | Jintao Zhang et.al. | 2505.21136 | link |
| 2025-05-28 | OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | Shenghai Yuan et.al. | 2505.20292 | link |
| 2025-05-27 | HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation | Bowen Chen et.al. | 2505.21831 | null |
| 2025-05-27 | Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | Ke Zhang et.al. | 2505.21653 | null |
| 2025-05-27 | VideoMarkBench: Benchmarking Robustness of Video Watermarking | Zhengyuan Jiang et.al. | 2505.21620 | link |
| 2025-05-27 | Frame In-N-Out: Unbounded Controllable Image-to-Video Generation | Boyang Wang et.al. | 2505.21491 | null |
| 2025-05-27 | Dynamic Vision from EEG Brain Recordings: How much does EEG know? | Prajwal Singh et.al. | 2505.21385 | null |
| 2025-05-27 | RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy | Aiyue Chen et.al. | 2505.21036 | null |
| 2025-05-27 | Frame-Level Captions for Long Video Generation with Complex Multi Scenes | Guangcong Zheng et.al. | 2505.20827 | null |
| 2025-05-27 | Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt | Xiang Zhu et.al. | 2505.20795 | null |
| 2025-05-27 | Photography Perspective Composition: Towards Aesthetic Perspective Recommendation | Lujian Yao et.al. | 2505.20655 | null |
| 2025-05-27 | Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training | Bolin Lai et.al. | 2505.20629 | null |
| 2025-05-27 | Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | Peng Liu et.al. | 2505.19901 | null |
| 2025-05-26 | MotionPro: A Precise Motion Controller for Image-to-Video Generation | Zhongwei Zhang et.al. | 2505.20287 | null |
| 2025-05-26 | DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | Wenchao Sun et.al. | 2505.19692 | link |
| 2025-05-26 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | Juntong Wang et.al. | 2505.19535 | null |
| 2025-05-26 | The Role of Video Generation in Enhancing Data-Limited Action Understanding | Wei Li et.al. | 2505.19495 | null |
| 2025-05-26 | Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals | Nate Gillman et.al. | 2505.19386 | null |
| 2025-05-26 | DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | Junhao Chen et.al. | 2505.18078 | null |
| 2025-05-25 | From Single Images to Motion Policies via Video-Generation Environment Representations | Weiming Zhi et.al. | 2505.19306 | null |
| 2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
| 2025-05-25 | WorldEval: World Model as Real-World Robot Policies Evaluator | Yaxuan Li et.al. | 2505.19017 | null |
| 2025-05-25 | Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency | Hyunho Ha et.al. | 2505.18932 | null |
| 2025-05-25 | Interspatial Attention for Efficient 4D Human Video Generation | Ruizhi Shao et.al. | 2505.15800 | null |
| 2025-05-24 | Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | Shuo Yang et.al. | 2505.18875 | null |
| 2025-05-24 | VORTA: Efficient Video Diffusion via Routing Sparse Attention | Wenhao Sun et.al. | 2505.18809 | link |
| 2025-05-24 | DVD-Quant: Data-free Video Diffusion Transformers Quantization | Zhiteng Li et.al. | 2505.18663 | link |
| 2025-05-24 | ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | Xiaodong Wang et.al. | 2505.18650 | null |
| 2025-05-23 | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | Zizhang Li et.al. | 2505.18151 | null |
| 2025-05-23 | SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain | Jiawei Zhou et.al. | 2505.17727 | null |
| 2025-05-23 | Scaling Image and Video Generation via Test-Time Evolutionary Search | Haoran He et.al. | 2505.17618 | null |
| 2025-05-23 | InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | Xueji Fang et.al. | 2505.17574 | link |
| 2025-05-23 | Challenger: Affordable Adversarial Driving Video Generation | Zhiyuan Xu et.al. | 2505.15880 | null |
| 2025-05-22 | Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis | Xin You et.al. | 2505.17333 | null |
| 2025-05-22 | Training-Free Efficient Video Generation via Dynamic Token Carving | Yuechen Zhang et.al. | 2505.16864 | link |
| 2025-05-22 | Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | Taewon Kang et.al. | 2505.16819 | null |
| 2025-05-22 | MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | Siwei Meng et.al. | 2505.16456 | null |
| 2025-05-21 | Generative AI for Autonomous Driving: A Review | Katharina Winter et.al. | 2505.15863 | null |
| 2025-05-21 | AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection | Zhipei Xu et.al. | 2505.15173 | null |
| 2025-05-21 | CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | Xinran Wang et.al. | 2505.15145 | link |
| 2025-05-21 | BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Haiquan Wen et.al. | 2505.12620 | link |
| 2025-05-21 | Video-GPT via Next Clip Diffusion | Shaobin Zhuang et.al. | 2505.12489 | null |
| 2025-05-20 | Programmatic Video Prediction Using Large Language Models | Hao Tang et.al. | 2505.14948 | link |
| 2025-05-20 | Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Sucheng Ren et.al. | 2505.14687 | link |
| 2025-05-20 | LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | Changgu Chen et.al. | 2505.14167 | null |
| 2025-05-20 | Hunyuan-Game: Industrial-grade Intelligent Game Creation Model | Ruihuang Li et.al. | 2505.14135 | null |
| 2025-05-20 | MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Yanbo Ding et.al. | 2505.10238 | link |
| 2025-05-19 | FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | Dian Shao et.al. | 2505.13437 | null |
| 2025-05-19 | MAGI-1: Autoregressive Video Generation at Scale | Sand.ai et.al. | 2505.13211 | link |
| 2025-05-19 | DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | Joel Jang et.al. | 2505.12705 | link |
| 2025-05-19 | Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | Zihan Su et.al. | 2505.12667 | null |
| 2025-05-18 | EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models | Hu Yue et.al. | 2505.09694 | link |
| 2025-05-17 | FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge | Xuan Shen et.al. | 2505.14709 | link |
| 2025-05-17 | DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | Xuan Shen et.al. | 2505.14708 | link |
| 2025-05-17 | LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Jiarui Wang et.al. | 2505.12098 | link |
| 2025-05-17 | VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption | Tianxiong Zhong et.al. | 2505.12053 | null |
| 2025-05-17 | STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives | Bo Wang et.al. | 2505.08350 | null |
| 2025-05-16 | QVGen: Pushing the Limit of Quantized Video Generative Models | Yushi Huang et.al. | 2505.11497 | null |
| 2025-05-16 | Face Consistency Benchmark for GenAI Video | Michal Podstawski et.al. | 2505.11425 | null |
| 2025-05-16 | Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model | Wei Li et.al. | 2505.07449 | link |
| 2025-05-15 | ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars | Rui-Yang Ju et.al. | 2505.10072 | null |
| 2025-05-15 | Generating time-consistent dynamics with discriminator-guided image diffusion models | Philipp Hess et.al. | 2505.09089 | null |
| 2025-05-15 | Generative Pre-trained Autoregressive Diffusion Transformer | Yuan Zhang et.al. | 2505.07344 | null |
| 2025-05-14 | Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios | Huafeng Shi et.al. | 2505.10584 | null |
| 2025-05-13 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Yuping Wang et.al. | 2505.08854 | link |
| 2025-05-13 | Symbolically-Guided Visual Plan Inference from Uncurated Video Data | Wenyan Yang et.al. | 2505.08444 | null |
| 2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | null |
| 2025-05-12 | ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models | Ozgur Kara et.al. | 2505.07652 | null |
| 2025-05-11 | DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | Junhao Xia et.al. | 2505.07057 | null |
| 2025-05-11 | BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | Panwen Hu et.al. | 2505.06985 | null |
| 2025-05-10 | Jailbreaking the Text-to-Video Generative Models | Jiayang Liu et.al. | 2505.06679 | null |
| 2025-05-10 | ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | Xianghao Kong et.al. | 2505.06537 | null |
| 2025-05-08 | 3D Scene Generation: A Survey | Beichen Wen et.al. | 2505.05474 | link |
| 2025-05-08 | T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | Xuyang Guo et.al. | 2505.04946 | null |
| 2025-05-08 | HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Teng Hu et.al. | 2505.04512 | null |
| 2025-05-06 | Real-Time Person Image Synthesis Using a Flow Matching Model | Jiwoo Jeong et.al. | 2505.03562 | link |
| 2025-05-06 | Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights | Zhaiming Shen et.al. | 2505.03205 | null |
| 2025-05-04 | DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | Wenchuan Wang et.al. | 2505.02192 | null |
| 2025-05-03 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Anushka Agarwal et.al. | 2505.01928 | null |
| 2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
| 2025-05-02 | VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos | Zongxia Li et.al. | 2505.01481 | link |
| 2025-05-02 | FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis | Jiangtong Tan et.al. | 2505.01172 | link |
| 2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704 | null |
| 2025-05-01 | T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | Xuyang Guo et.al. | 2505.00337 | null |
| 2025-04-30 | Direct Motion Models for Assessing Generated Videos | Kelsey Allen et.al. | 2505.00209 | null |
| 2025-04-30 | Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | Michal Geyer et.al. | 2505.00135 | null |
| 2025-04-30 | ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Qihao Liu et.al. | 2504.21855 | null |
| 2025-04-30 | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Haiyang Zhou et.al. | 2504.21650 | link |
| 2025-04-30 | Simple Visual Artifact Detection in Sora-Generated Videos | Misora Sugiyama et.al. | 2504.21334 | null |
| 2025-04-30 | Capturing Conditional Dependence via Auto-regressive Diffusion Models | Xunpeng Huang et.al. | 2504.21314 | null |
| 2025-04-29 | TesserAct: Learning 4D Embodied World Models | Haoyu Zhen et.al. | 2504.20995 | null |
| 2025-04-29 | DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs | Hao Luan et.al. | 2504.20754 | null |
| 2025-04-29 | Advance Fake Video Detection via Vision Transformers | Joy Battocchio et.al. | 2504.20669 | null |
| 2025-04-28 | CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition | Quynh Phung et.al. | 2504.19894 | null |
| 2025-04-28 | DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | Junpeng Jiang et.al. | 2504.19614 | null |
| 2025-04-26 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning | Yifan Xie et.al. | 2504.18810 | null |
| 2025-04-26 | Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | Jong Inn Park et.al. | 2504.18805 | null |
| 2025-04-25 | NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration | Haotian Dong et.al. | 2504.18448 | null |
| 2025-04-25 | We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Minkyu Choi et.al. | 2504.17180 | null |
| 2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788 | null |
| 2025-04-24 | MV-Crafter: An Intelligent System for Music-guided Video Generation | Chuer Chen et.al. | 2504.17267 | null |
| 2025-04-24 | DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks | Yinqi Li et.al. | 2504.17253 | link |
| 2025-04-23 | Subject-driven Video Generation via Disentangled Identity and Motion | Daneul Kim et.al. | 2504.17816 | null |
| 2025-04-23 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | Ruotong Wang et.al. | 2504.16907 | null |
| 2025-04-23 | ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Ying Li et.al. | 2504.16464 | null |
| 2025-04-23 | VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models | Xuming Hu et.al. | 2504.16359 | null |
| 2025-04-22 | DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment | Xiaofan Li et.al. | 2504.18576 | link |
| 2025-04-22 | Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Yimu Wang et.al. | 2504.16081 | link |
| 2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016 | null |
| 2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | null |
| 2025-04-22 | Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views | Ningli Xu et.al. | 2504.15786 | null |
| 2025-04-22 | DiTPainter: Efficient Video Inpainting with Diffusion Transformers | Xian Wu et.al. | 2504.15661 | null |
| 2025-04-21 | Solving New Tasks by Adapting Internet Video Knowledge | Calvin Luo et.al. | 2504.15369 | null |
| 2025-04-21 | Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Xianpan Zhou et.al. | 2504.15182 | null |
| 2025-04-21 | DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Weijie He et.al. | 2504.15032 | null |
| 2025-04-21 | Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Chenjie Cao et.al. | 2504.14899 | link |
| 2025-04-21 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | link |
| 2025-04-21 | Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Lvmin Zhang et.al. | 2504.12626 | link |
| 2025-04-20 | Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | Jingjing Ren et.al. | 2504.14470 | null |
| 2025-04-19 | SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Minho Park et.al. | 2504.14396 | link |
| 2025-04-18 | Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting | Jiaxin Huang et.al. | 2504.11092 | null |
| 2025-04-17 | Understanding Attention Mechanism in Video Diffusion Models | Bingyan Liu et.al. | 2504.12027 | null |
| 2025-04-17 | VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Kevin Xie et.al. | 2504.11389 | null |
| 2025-04-17 | StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Roberto Henschel et.al. | 2403.14773 | null |
| 2025-04-16 | VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Zhihang Yuan et.al. | 2504.12259 | link |
| 2025-04-16 | Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Zirui Pan et.al. | 2504.12048 | null |
| 2025-04-16 | The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Bingjie Gao et.al. | 2504.11739 | null |
| 2025-04-16 | ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation | Zongyi Li et.al. | 2410.20502 | null |
| 2025-04-15 | InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation | Yukang Lin et.al. | 2504.10905 | null |
| 2025-04-15 | OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Dianbing Xi et.al. | 2504.10825 | null |
| 2025-04-14 | H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Zhanbo Huang et.al. | 2504.10676 | link |
| 2025-04-14 | H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models | Yushu Wu et.al. | 2504.10567 | null |
| 2025-04-14 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Rui Chen et.al. | 2504.10358 | null |
| 2025-04-14 | Aligning Anime Video Generation with Human Feedback | Bingwen Zhu et.al. | 2504.10044 | null |
| 2025-04-14 | EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise | Chao Liu et.al. | 2504.09789 | null |
| 2025-04-13 | CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Pooja Guhan et.al. | 2504.09472 | null |
| 2025-04-11 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Team Seaweed et.al. | 2504.08685 | null |
| 2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641 | null |
| 2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | null |
| 2025-04-11 | EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | Renda Li et.al. | 2504.08344 | null |
| 2025-04-11 | RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Guangcong Zheng et.al. | 2504.08212 | link |
| 2025-04-11 | TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | Ruineng Li et.al. | 2504.08181 | null |
| 2025-04-10 | Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Zeren Jiang et.al. | 2504.07961 | link |
| 2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | null |
| 2025-04-10 | Diffusion Transformers for Tabular Data Time Series Generation | Fabrizio Garuti et.al. | 2504.07566 | link |
| 2025-04-09 | EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Diljeet Jagpal et.al. | 2504.06861 | null |
| 2025-04-09 | DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Wangbo Zhao et.al. | 2504.06803 | link |
| 2025-04-09 | RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism | Elia Peruzzo et.al. | 2504.06672 | null |
| 2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
| 2025-04-08 | CamContextI2V: Context-aware Controllable Video Generation | Luis Denninger et.al. | 2504.06022 | link |
| 2025-04-08 | Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants | Nikolaj T. Mücke et.al. | 2504.05852 | link |
| 2025-04-07 | One-Minute Video Generation with Test-Time Training | Karan Dalal et.al. | 2504.05298 | null |
| 2025-04-07 | Video-Bench: Human-Aligned Video Generation Benchmark | Hui Han et.al. | 2504.04907 | null |
| 2025-04-07 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
| 2025-04-05 | Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Yikai Wang et.al. | 2504.04153 | link |
| 2025-04-05 | Multi-identity Human Image Animation with Structural Video Diffusion | Zhenzhi Wang et.al. | 2504.04126 | null |
| 2025-04-05 | Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models | Xuyang Guo et.al. | 2504.04051 | null |
| 2025-04-05 | DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion | Maksim Siniukov et.al. | 2504.04010 | null |
| 2025-04-04 | Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models | Xuran Ma et.al. | 2504.03140 | link |
| 2025-04-04 | MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | Takahiro Shirakawa et.al. | 2504.02361 | null |
| 2025-04-03 | How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Pascal Chang et.al. | 2504.03072 | null |
| 2025-04-03 | Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Chenyu Zhang et.al. | 2504.02918 | null |
| 2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | null |
| 2025-04-03 | Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | Shengjun Zhang et.al. | 2504.02764 | null |
| 2025-04-03 | ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer | Jiayi Gao et.al. | 2504.02451 | link |
| 2025-04-03 | SkyReels-A2: Compose Anything in Video Diffusion Transformers | Zhengcong Fei et.al. | 2504.02436 | link |
| 2025-04-03 | OmniCam: Unified Multimodal Video Generation via Camera Control | Xiaoda Yang et.al. | 2504.02312 | null |
| 2025-04-03 | VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step | Hanyang Wang et.al. | 2504.01956 | null |
| 2025-04-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
| 2025-04-02 | Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | Sebastian Barros et.al. | 2504.03752 | null |
| 2025-04-02 | WorldPrompter: Traversable Text-to-Scene Generation | Zhaoyang Zhang et.al. | 2504.02045 | null |
| 2025-04-02 | Towards Physically Plausible Video Generation via VLM Planning | Xindi Yang et.al. | 2503.23368 | null |
| 2025-04-01 | AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Junhao Cheng et.al. | 2504.01014 | link |
| 2025-04-01 | WorldScore: A Unified Evaluation Benchmark for World Generation | Haoyi Duan et.al. | 2504.00983 | null |
| 2025-04-01 | DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding | Chong Li et.al. | 2504.00432 | null |
| 2025-04-01 | HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation | Boyuan Wang et.al. | 2503.24026 | null |
| 2025-04-01 | On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Bosung Kim et.al. | 2503.23796 | link |
| 2025-03-31 | GazeLLM: Multimodal LLMs incorporating Human Visual Attention | Jun Rekimoto et.al. | 2504.00221 | null |
| 2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | null |
| 2025-03-31 | JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | Fangda Chen et.al. | 2503.23951 | null |
| 2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et.al. | 2503.23715 | null |
| 2025-03-30 | VideoGen-Eval: Agent-based System for Video Generation Evaluation | Yuhang Yang et.al. | 2503.23452 | link |
| 2025-03-30 | JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization | Kai Liu et.al. | 2503.23377 | null |
| 2025-03-30 | MoCha: Towards Movie-Grade Talking Character Synthesis | Cong Wei et.al. | 2503.23307 | null |
| 2025-03-30 | SketchVideo: Sketch-based Video Generation and Editing | Feng-Lin Liu et.al. | 2503.23284 | null |
| 2025-03-29 | Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models | Prin Phunyaphibarn et.al. | 2503.20240 | null |
| 2025-03-28 | Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Jangho Park et.al. | 2503.22622 | null |
| 2025-03-28 | EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | Hadrien Reynaud et.al. | 2503.22357 | null |
| 2025-03-28 | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Yishen Ji et.al. | 2503.22231 | null |
| 2025-03-27 | VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Chi-Pin Huang et.al. | 2503.21781 | null |
| 2025-03-27 | Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Minghui Lin et.al. | 2503.21765 | link |
| 2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755 | link |
| 2025-03-27 | Audio-driven Gesture Generation via Deviation Feature in the Latent Space | Jiahui Chen et.al. | 2503.21616 | null |
| 2025-03-27 | ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Jinwei Qi et.al. | 2503.21144 | null |
| 2025-03-26 | Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Haitong Liu et.al. | 2503.21824 | link |
| 2025-03-26 | Synthetic Video Enhances Physical Fidelity in Video Synthesis | Qi Zhao et.al. | 2503.20822 | null |
| 2025-03-26 | RecTable: Fast Modeling Tabular Data with Rectified Flow | Masane Fuchi et.al. | 2503.20731 | link |
| 2025-03-26 | AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Xiangwen Zhang et.al. | 2503.20654 | null |
| 2025-03-26 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | Lloyd Russell et.al. | 2503.20523 | null |
| 2025-03-26 | VPO: Aligning Text-to-Video Generation Models with Prompt Optimization | Jiale Cheng et.al. | 2503.20491 | link |
| 2025-03-26 | Wan: Open and Advanced Large-Scale Video Generative Models | WanTeam et.al. | 2503.20314 | link |
| 2025-03-26 | Video Motion Graphs | Haiyang Liu et.al. | 2503.20218 | null |
| 2025-03-26 | Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing | Jaihoon Kim et.al. | 2503.19385 | null |
| 2025-03-26 | EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models | Yufei Cai et.al. | 2503.19369 | link |
| 2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118 | null |
| 2025-03-25 | Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Stefan Stojanov et.al. | 2503.19953 | null |
| 2025-03-25 | FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling | Qiusheng Huang et.al. | 2503.19940 | null |
| 2025-03-25 | FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Xuan Ju et.al. | 2503.19907 | null |
| 2025-03-25 | Mask²DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Tianhao Qi et.al. | 2503.19881 | null |
| 2025-03-25 | AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Jiazhi Guan et.al. | 2503.19824 | null |
| 2025-03-25 | AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset | Haiyu Zhang et.al. | 2503.19462 | null |
| 2025-03-25 | MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Yukang Lin et.al. | 2503.19383 | null |
| 2025-03-25 | Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Yuchao Gu et.al. | 2503.19325 | link |
| 2025-03-25 | Aether: Geometric-Aware Unified World Modeling | Aether Team et.al. | 2503.18945 | null |
| 2025-03-25 | AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Takashi Isobe et.al. | 2503.18559 | link |
| 2025-03-25 | Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Yingying Fan et.al. | 2503.16942 | null |
| 2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | null |
| 2025-03-24 | Training-free Diffusion Acceleration with Bottleneck Sampling | Ye Tian et.al. | 2503.18940 | null |
| 2025-03-24 | EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Qiang Qu et.al. | 2503.18552 | null |
| 2025-03-24 | Can Text-to-Video Generation help Video-Language Alignment? | Luca Zanella et.al. | 2503.18507 | null |
| 2025-03-24 | Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Dingcheng Zhen et.al. | 2503.18429 | null |
| 2025-03-24 | Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Sicong Feng et.al. | 2503.18386 | null |
| 2025-03-23 | LongDiff: Training-Free Long Video Generation in One Go | Zhuoling Li et.al. | 2503.18150 | null |
| 2025-03-23 | TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Xuewei Chen et.al. | 2503.17934 | null |
| 2025-03-22 | RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation | Zhiqiang Yuan et.al. | 2503.17735 | null |
| 2025-03-21 | Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks | Bhishma Dedhia et.al. | 2503.17539 | null |
| 2025-03-21 | Position: Interactive Generative Video as Next-Generation Game Engine | Jiwen Yu et.al. | 2503.17359 | null |
| 2025-03-21 | AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process | Junjie Hu et.al. | 2503.17029 | null |
| 2025-03-21 | Enabling Versatile Controls for Video Diffusion Models | Xu Zhang et.al. | 2503.16983 | link |
| 2025-03-21 | SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation | Chun-Han Yao et.al. | 2503.16396 | null |
| 2025-03-20 | A Recipe for Generating 3D Worlds From a Single Image | Katja Schwarz et.al. | 2503.16611 | null |
| 2025-03-20 | XAttention: Block Sparse Attention with Antidiagonal Scoring | Ruyi Xu et.al. | 2503.16428 | link |
| 2025-03-20 | MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Quanhao Li et.al. | 2503.16421 | null |
| 2025-03-20 | ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Haolin Yang et.al. | 2503.16400 | null |
| 2025-03-20 | PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Longbin Ji et.al. | 2503.16068 | null |
| 2025-03-20 | Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models | Marc Benedí San Millán et.al. | 2503.15996 | null |
| 2025-03-20 | MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving | Haiguang Wang et.al. | 2503.15875 | link |
| 2025-03-20 | VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Hyojun Go et.al. | 2503.15855 | null |
| 2025-03-20 | VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mingzhe Zheng et.al. | 2503.15138 | null |
| 2025-03-19 | Temporal Regularization Makes Your Video Generator Stronger | Harold Haodong Chen et.al. | 2503.15417 | null |
| 2025-03-19 | Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models | Tingxiu Chen et.al. | 2503.14966 | link |
| 2025-03-18 | MusicInfuser: Making Video Diffusion Listen and Dance | Susung Hong et.al. | 2503.14505 | null |
| 2025-03-18 | MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Hongyu Zhang et.al. | 2503.14428 | null |
| 2025-03-18 | Impossible Videos | Zechen Bai et.al. | 2503.14378 | null |
| 2025-03-18 | LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Yu Cheng et.al. | 2503.14325 | link |
| 2025-03-18 | Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Yong Zhong et.al. | 2503.14151 | null |
| 2025-03-18 | Fast Autoregressive Video Generation with Diagonal Decoding | Yang Ye et.al. | 2503.14070 | null |
| 2025-03-18 | AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark | Xinhao Xiang et.al. | 2503.14064 | link |
| 2025-03-17 | MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis | Shitong Shao et.al. | 2503.13319 | null |
| 2025-03-17 | Language-guided Open-world Video Anomaly Detection | Zihao Liu et.al. | 2503.13160 | null |
| 2025-03-17 | Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction | Zheyuan Liu et.al. | 2503.12953 | null |
| 2025-03-17 | AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Quang Trung Truong et.al. | 2503.12828 | null |
| 2025-03-17 | Long-Video Audio Synthesis with Multi-Agent Collaboration | Yehang Zhang et.al. | 2503.10719 | null |
| 2025-03-16 | SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Guibiao Liao et.al. | 2503.12535 | null |
| 2025-03-16 | VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Xinran Ling et.al. | 2503.10076 | link |
| 2025-03-15 | ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | Yu Fang et.al. | 2503.14526 | null |
| 2025-03-15 | A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI | Paula Andrea Pérez-Toro et.al. | 2503.12102 | null |
| 2025-03-15 | SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Byeongjun Park et.al. | 2503.12024 | link |
| 2025-03-14 | ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Jianhong Bai et.al. | 2503.11647 | null |
| 2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513 | null |
| 2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | null |
| 2025-03-14 | Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Haoyang Huang et.al. | 2503.11251 | link |
| 2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et.al. | 2503.11190 | null |
| 2025-03-14 | Long Context Tuning for Video Generation | Yuwei Guo et.al. | 2503.10589 | null |
| 2025-03-14 | On the Limitations of Vision-Language Models in Understanding Image Transforms | Ahmad Mustafa Anis et.al. | 2503.09837 | null |
| 2025-03-13 | CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models | Hao He et.al. | 2503.10592 | null |
| 2025-03-13 | CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Yufan Deng et.al. | 2503.10391 | null |
| 2025-03-13 | Semantic Latent Motion for Portrait Video Generation | Qiyuan Zhang et.al. | 2503.10096 | null |
| 2025-03-13 | UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? | Yuanxin Liu et.al. | 2503.09949 | link |
| 2025-03-13 | Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers | Yasheng Sun et.al. | 2503.09942 | null |
| 2025-03-13 | VideoMerge: Towards Training-free Long Video Generation | Siyang Zhang et.al. | 2503.09926 | null |
| 2025-03-13 | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | Hao Feng et.al. | 2503.09160 | null |
| 2025-03-12 | Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework | Jing Wang et.al. | 2503.10704 | null |
| 2025-03-12 | LuciBot: Automated Robot Policy Learning from Generated Videos | Xiaowen Qiu et.al. | 2503.09871 | null |
| 2025-03-12 | I2V3D: Controllable image-to-video generation with 3D guidance | Zhiyuan Zhang et.al. | 2503.09733 | null |
| 2025-03-12 | Accelerating Diffusion Sampling via Exploiting Local Transition Coherence | Shangwen Zhu et.al. | 2503.09675 | null |
| 2025-03-12 | Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Xiangyu Peng et.al. | 2503.09642 | link |
| 2025-03-12 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Chenyu Li et.al. | 2503.09595 | link |
| 2025-03-12 | Unified Dense Prediction of Video Diffusion | Lehan Yang et.al. | 2503.09344 | null |
| 2025-03-12 | Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space | Jian Zhu et.al. | 2503.09215 | null |
| 2025-03-12 | SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Chengshu Zhao et.al. | 2503.09154 | link |
| 2025-03-12 | Reangle-A-Video: 4D Video Generation as Video-to-Video Translation | Hyeonho Jeong et.al. | 2503.09151 | null |
| 2025-03-12 |  | Alex Ergasti et.al. | 2503.08307 | link |
| 2025-03-12 | Object-Centric World Model for Language-Guided Manipulation | Youngjoon Jeong et.al. | 2503.06170 | null |
| 2025-03-11 | V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video | Jianqi Chen et.al. | 2503.09631 | null |
| 2025-03-11 | REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder | Yitian Zhang et.al. | 2503.08665 | null |
| 2025-03-11 | Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling | Subin Kim et.al. | 2503.08605 | null |
| 2025-03-11 | WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Jing Wang et.al. | 2503.08153 | null |
| 2025-03-11 | ObjectMover: Generative Object Movement with Video Prior | Xin Yu et.al. | 2503.08037 | null |
| 2025-03-11 | How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM | Unggi Lee et.al. | 2503.08003 | null |
| 2025-03-11 | VACE: All-in-One Video Creation and Editing | Zeyinzi Jiang et.al. | 2503.07598 | null |
| 2025-03-11 | LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Quanjian Song et.al. | 2503.06508 | link |
| 2025-03-10 | DreamRelation: Relation-Centric Video Customization | Yujie Wei et.al. | 2503.07602 | null |
| 2025-03-10 | AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mingzhen Sun et.al. | 2503.07418 | null |
| 2025-03-10 | Automated Movie Generation via Multi-Agent CoT Planning | Weijia Wu et.al. | 2503.07314 | link |
| 2025-03-10 | From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers | Jiacheng Liu et.al. | 2503.06923 | link |
| 2025-03-09 | VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation | Hritik Bansal et.al. | 2503.06800 | null |
| 2025-03-09 | TR-DQ: Time-Rotation Diffusion Quantization | Yihua Shao et.al. | 2503.06564 | null |
| 2025-03-09 | QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Junyi Wu et.al. | 2503.06545 | link |
| 2025-03-09 | Generative Video Bi-flow | Chen Liu et.al. | 2503.06364 | null |
| 2025-03-08 | Text2Story: Advancing Video Storytelling with Text Guidance | Taewon Kang et.al. | 2503.06310 | null |
| 2025-03-08 | ROCM: RLHF on consistency models | Shivanshu Shekhar et.al. | 2503.06171 | null |
| 2025-03-08 | VACT: A Video Automatic Causal Testing System and a Benchmark | Haotong Yang et.al. | 2503.06163 | null |
| 2025-03-08 | GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation | Ye Tao et.al. | 2503.06136 | null |
| 2025-03-08 | DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation | Runze Zhang et.al. | 2503.06053 | null |
| 2025-03-08 | The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Aoxiong Yin et.al. | 2503.04606 | link |
| 2025-03-08 | Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Nianzu Yang et.al. | 2503.03708 | link |
| 2025-03-07 | MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Hongwei Yi et.al. | 2503.05978 | null |
| 2025-03-07 | MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Xuenan Xu et.al. | 2503.05242 | link |
| 2025-03-07 | Unified Reward Model for Multimodal Understanding and Generation | Yibin Wang et.al. | 2503.05236 | null |
| 2025-03-07 | Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos | Zhiyu Tan et.al. | 2502.21314 | null |
| 2025-03-06 | Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Alexey Buzovkin et.al. | 2503.04871 | link |
| 2025-03-06 | FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Yue Gao et.al. | 2503.04720 | null |
| 2025-03-06 | What Are You Doing? A Closer Look at Controllable Human Video Generation | Emanuele Bugliarello et.al. | 2503.04666 | null |
| 2025-03-05 | ProReflow: Progressive Reflow with Decomposed Velocity | Lei Ke et.al. | 2503.04824 | null |
| 2025-03-05 | GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Xuanchi Ren et.al. | 2503.03751 | link |
| 2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689 | link |
| 2025-03-05 | High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights | Yuna Kato et.al. | 2503.03558 | link |
| 2025-03-05 | Video Super-Resolution: All You Need is a Video Diffusion Model | Zhihao Zhan et.al. | 2503.03355 | null |
| 2025-03-04 | GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning | Zhun Mou et.al. | 2503.02341 | null |
| 2025-03-04 | Unified Video Action Model | Shuang Li et.al. | 2503.00200 | null |
| 2025-03-03 | VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Wenhao Wang et.al. | 2503.01739 | link |
| 2025-03-03 | VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors | Juil Koo et.al. | 2503.01107 | null |
| 2025-03-03 | TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis | Menghao Li et.al. | 2502.19454 | null |
| 2025-03-02 | Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Jie Tian et.al. | 2503.00948 | link |
| 2025-03-01 | Learning to Animate Images from A Few Videos to Portray Delicate Human Actions | Haoxin Li et.al. | 2503.00276 | null |
| 2025-02-28 | Training-free and Adaptive Sparse Attention for Efficient Long Video Generation | Yifei Xia et.al. | 2502.21079 | null |
| 2025-02-28 | HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Xiao Wang et.al. | 2502.20811 | null |
| 2025-02-28 | WorldModelBench: Judging Video Generation Models As World Models | Dacheng Li et.al. | 2502.20694 | null |
| 2025-02-28 | RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers | Ke Cao et.al. | 2502.14377 | null |
| 2025-02-27 | Mobius: Text to Seamless Looping Video Generation via Latent Shift | Xiuli Bi et.al. | 2502.20307 | link |
| 2025-02-27 | FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute | Sotiris Anagnostidis et.al. | 2502.20126 | null |
| 2025-02-27 | C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Yuhao Li et.al. | 2502.19868 | link |
| 2025-02-26 | Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng et.al. | 2503.01873 | null |
| 2025-02-26 | Glad: A Streaming Scene Generator for Autonomous Driving | Bin Xie et.al. | 2503.00045 | null |
| 2025-02-26 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model | Lingzhou Mu et.al. | 2502.19455 | null |
| 2025-02-25 | SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference | Jintao Zhang et.al. | 2502.18137 | link |
| 2025-02-25 | A Survey: Spatiotemporal Consistency in Video Generation | Zhiyu Yin et.al. | 2502.17863 | null |
| 2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414 | null |
| 2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258 | null |
| 2025-02-24 | Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Zhong Li et.al. | 2502.17119 | link |
| 2025-02-21 | RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers | Min Zhao et.al. | 2502.15894 | null |
| 2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
| 2025-02-21 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Florian Sestak et.al. | 2502.12128 | link |
| 2025-02-20 | Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Sanghyun Yi et.al. | 2502.15077 | null |
| 2025-02-20 | LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection | Qingyuan Liu et.al. | 2502.14994 | null |
| 2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et.al. | 2502.14831 | null |
| 2025-02-20 | Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Vignesh Sundaresha et.al. | 2502.14226 | null |
| 2025-02-19 | FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation | Yunpeng Zhang et.al. | 2502.13995 | link |
| 2025-02-19 | LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Junchen Fu et.al. | 2502.12945 | null |
| 2025-02-18 | VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Xinlong Chen et.al. | 2502.12782 | link |
| 2025-02-18 | MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Sihyun Yu et.al. | 2502.12632 | null |
| 2025-02-17 | DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Zhihang Yuan et.al. | 2502.11897 | link |
| 2025-02-17 | Object-Centric Image to Video Generation with Language Guidance | Angel Villar-Corrales et.al. | 2502.11655 | null |
| 2025-02-17 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248 | link |
| 2025-02-17 | Magic 1-For-1: Generating One Minute Video Clips within One Minute | Hongwei Yi et.al. | 2502.07701 | link |
| 2025-02-16 | MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation | Michael Fuest et.al. | 2502.11234 | null |
| 2025-02-16 | Phantom: Subject-consistent video generation via cross-modal alignment | Lijie Liu et.al. | 2502.11079 | null |
| 2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | link |
| 2025-02-14 | RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Teng Li et.al. | 2502.10059 | null |
| 2025-02-14 | GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Hongyin Zhang et.al. | 2502.09268 | null |
| 2025-02-13 | Enhance-A-Video: Better Generated Video for Free | Yang Luo et.al. | 2502.07508 | link |
| 2025-02-12 | CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Qinghe Wang et.al. | 2502.08639 | null |
| 2025-02-12 | FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | Wonjoon Jin et.al. | 2502.08244 | null |
| 2025-02-12 | Learning Human Skill Generators at Key-Step Levels | Yilu Wu et.al. | 2502.08234 | null |
| 2025-02-12 | AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Zhao Wang et.al. | 2502.08189 | null |
| 2025-02-12 | Next Block Prediction: Video Generation via Semi-Autoregressive Modeling | Shuhuai Ren et.al. | 2502.07737 | null |
| 2025-02-12 | VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Sixiao Zheng et.al. | 2502.07531 | null |
| 2024-05-07 | LLM-grounded Video Diffusion Models | Long Lian et.al. | 2309.17444 | null |
| 2023-10-12 | Echocardiography video synthesis from end diastolic semantic map via diffusion model | Phi Nguyen Van et.al. | 2310.07131 | null |
| 2023-05-30 | Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising | Fu-Yun Wang et.al. | 2305.18264 | null |
| 2023-03-21 | Latent Video Diffusion Models for High-Fidelity Long Video Generation | Yingqing He et.al. | 2211.13221 | null |
| 2022-07-12 | Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis | Long Zhuo et.al. | 2207.05049 | null |
| 2021-12-02 | Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image | Andrew Liu et.al. | 2012.09855 | null |
| 2020-11-10 | Audeo: Audio Generation for a Silent Performance Video | Kun Su et.al. | 2006.14348 | null |
| 2019-10-15 | TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation | Fan Yang et.al. | 1910.05899 | null |
TryOn
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2025-11-18 | A System Dynamics Approach to Evaluating Sludge Management Strategies in Vinasse Treatment: Cost-Benefit Analysis and Scenario Assessment | Agustin Olivares et.al. | 2511.14607 | null |
| 2025-11-18 | PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos | Dianbing Xi et.al. | 2511.12935 | null |
| 2025-11-17 | Multi-Objective Statistical Model Checking using Lightweight Strategy Sampling (extended version) | Pedro R. D'Argenio et.al. | 2511.13460 | null |
| 2025-11-16 | Nonlocal action in Everettian Quantum Mechanics | Mordecai Waegell et.al. | 2511.12403 | null |
| 2025-11-16 | RefVTON: person-to-person Try on with Additional Unpaired Visual Reference | Liuzhuozheng Li et.al. | 2511.00956 | null |
| 2025-11-14 | Learning Fair Representations with Kolmogorov-Arnold Networks | Amisha Priyadarshini et.al. | 2511.11767 | null |
| 2025-11-14 | Discovering Meaningful Units with Visually Grounded Semantics from Image Captions | Melika Behjati et.al. | 2511.11262 | null |
| 2025-11-14 | Power Ensemble Aggregation for Improved Extreme Event AI Prediction | Julien Collard et.al. | 2511.11170 | null |
| 2025-11-13 | Optimal Welfare in Noncooperative Network Formation under Attack | Natan Doubez et.al. | 2511.10845 | null |
| 2025-11-13 | Generating optimal Gravitational-Wave template banks with metric-preserving autoencoders | Giovanni Cabass et.al. | 2511.10466 | null |
| 2025-11-12 | Efficiently Transforming Neural Networks into Decision Trees: A Path to Ground Truth Explanations with RENTT | Helena Monke et.al. | 2511.09299 | null |
| 2025-11-12 | Food as Soft Power: Taiwanese Gastrodiplomacy on Social Media and Algorithmic Suppression | Andrew Yen Chang et.al. | 2511.05729 | null |
| 2025-11-10 | Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism | Mohaiminul Islam Bhuiyan et.al. | 2511.08636 | null |
| 2025-11-10 | On maximizing private neighbors in graphs | Stephen T. Hedetniemi et.al. | 2511.07248 | null |
| 2025-11-06 | Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts | Ellis Brown et.al. | 2511.04655 | null |
| 2025-11-06 | IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection | Kaveh Eskandari Miandoab et.al. | 2511.04528 | null |
| 2025-11-06 | The truth is no diaper: Human and AI-generated associations to emotional words | Špela Vintar et.al. | 2511.04077 | null |
| 2025-11-04 | Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement | Sanghyun Lee et.al. | 2511.05562 | null |
| 2025-11-04 | FLAME: Flexible and Lightweight Biometric Authentication Scheme in Malicious Environments | Fuyi Wang et.al. | 2511.02176 | null |
| 2025-11-03 | Confounding Factors in Relating Model Performance to Morphology | Wessel Poelman et.al. | 2511.01380 | null |
| 2025-11-02 | AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs | Yubo Wang et.al. | 2511.05549 | null |
| 2025-11-01 | Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology | G. Pillonetto et.al. | 2511.00579 | null |
| 2025-10-31 | Quantum-dot single photon source performance with off-resonant pulse preparation schemes | Gavin Crowder et.al. | 2511.00243 | null |
| 2025-10-31 | EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs | Ali Satvaty et.al. | 2511.00192 | null |
| 2025-10-31 | Consistency Training Helps Stop Sycophancy and Jailbreaks | Alex Irpan et.al. | 2510.27062 | null |
| 2025-10-30 | Ring-polymer instanton theory for tunneling between asymmetric wells | Marit R. Fiechter et.al. | 2510.26592 | null |
| 2025-10-29 | Heuristic Quantum Advantage with Peaked Circuits | Hrant Gharibyan et.al. | 2510.25838 | null |
| 2025-10-29 | Tackling the Algorithmic Control Crisis -- the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents | B. Bodo et.al. | 2510.25337 | null |
| 2025-10-16 | ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On | Junseo Park et.al. | 2509.25749 | null |
| 2025-10-09 | Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection | Yanjie Pan et.al. | 2510.07654 | null |
| 2025-10-06 | AvatarVTON: 4D Virtual Try-On for Animatable Avatars | Zicheng Jiang et.al. | 2510.04822 | null |
| 2025-10-03 | DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing | Qi Li et.al. | 2510.04797 | null |
| 2025-10-01 | Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset | Yannick Hauri et.al. | 2510.00633 | null |
| 2025-09-29 | UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections | Zeyu Cai et.al. | 2509.24817 | null |
| 2025-09-29 | ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering | Weikai Lin et.al. | 2509.21541 | null |
| 2025-09-24 | InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On | Julien Han et.al. | 2509.20524 | null |
| 2025-09-24 | Efficient Encoder-Free Pose Conditioning and Pose Control for Virtual Try-On | Qi Li et.al. | 2509.20343 | null |
| 2025-09-23 | Clothing agnostic Pre-inpainting Virtual Try-ON | Sehyun Kim et.al. | 2509.17654 | null |
| 2025-09-21 | SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments | Ruiyan Wang et.al. | 2509.16960 | null |
| 2025-09-16 | DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform | Xingzi Xu et.al. | 2509.13506 | null |
| 2025-09-05 | LUIVITON: Learned Universal Interoperable VIrtual Try-ON | Cong Cao et.al. | 2509.05030 | null |
| 2025-09-04 | Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview | Jun-Kun Chen et.al. | 2509.04450 | null |
| 2025-09-04 | Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation | Lydia Kin Ching Chau et.al. | 2509.02445 | null |
| 2025-08-30 | IC-Custom: Diverse Image Customization via In-Context Learning | Yaowei Li et.al. | 2507.01926 | null |
| 2025-08-28 | Dress&Dance: Dress up and Dance as You Like It - Technical Preview | Jun-Kun Chen et.al. | 2508.21070 | null |
| 2025-08-28 | FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models | Zheng Chong et.al. | 2508.20586 | null |
| 2025-08-25 | JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on | Aowen Wang et.al. | 2508.17614 | null |
| 2025-08-19 | OmniTry: Virtual Try-On Anything without Masks | Yutong Feng et.al. | 2508.13632 | null |
| 2025-08-16 | DualFit: A Two-Stage Virtual Try-On via Warping and Synthesis | Minh Tran et.al. | 2508.12131 | null |
| 2025-08-12 | StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback | Hongbo Ma et.al. | 2508.06555 | null |
| 2025-08-11 | MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization | Ankan Deria et.al. | 2508.08488 | null |
| 2025-08-11 | Undress to Redress: A Training-Free Framework for Virtual Try-On | Zhiying Li et.al. | 2508.07680 | null |
| 2025-08-07 | One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose | Jinxi Liu et.al. | 2508.04559 | null |
| 2025-08-06 | Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off | Seungyong Lee et.al. | 2508.04825 | null |
| 2025-08-06 | Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis | Angang Zhang et.al. | 2508.04551 | null |
| 2025-08-06 | FFHQ-Makeup: Paired Synthetic Makeup Dataset with Facial Consistency Across Multiple Styles | Xingchao Yang et.al. | 2508.03241 | null |
| 2025-08-04 | DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework | Tongchun Zuo et.al. | 2508.02807 | null |
| 2025-07-29 | From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos | Chenjian Gao et.al. | 2507.20331 | null |
| 2025-07-29 | Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism | Jun Zheng et.al. | 2412.09822 | null |
| 2025-07-21 | FW-VTON: Flattening-and-Warping for Person-to-Person Virtual Try-on | Zheng Wang et.al. | 2507.16010 | null |
| 2025-07-20 | OmniVTON: Training-Free Universal Virtual Try-On | Zhaotong Yang et.al. | 2507.15037 | null |
| 2025-07-11 | Scalable and Realistic Virtual Try-on Application for Foundation Makeup with Kubelka-Munk Theory | Hui Pang et.al. | 2507.07333 | null |
| 2025-07-08 | TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model | Yujie Hu et.al. | 2507.05790 | null |
| 2025-07-02 | FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization | Peng Zheng et.al. | 2507.01792 | null |
| 2025-06-30 | KiseKloset: Comprehensive System For Outfit Retrieval, Recommendation, And Try-On | Thanh-Tung Phan-Nguyen et.al. | 2506.23471 | null |
| 2025-06-29 | DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On | Xiang Xu et.al. | 2506.23295 | null |
| 2025-06-26 | Video Virtual Try-on with Conditional Diffusion Transformer Inpainter | Cheng Zou et.al. | 2506.21270 | null |
| 2025-06-23 | InstructAttribute: Fine-grained Object Attributes editing with Instruction | Xingxi Yin et.al. | 2505.00751 | null |
| 2025-06-14 | Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments | Zaiqiang Wu et.al. | 2506.12348 | null |
| 2025-06-13 | HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment | Ming Meng et.al. | 2505.19638 | null |
| 2025-06-12 | Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On | Zaiqiang Wu et.al. | 2506.10468 | null |
| 2025-06-06 | ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On | Jinjuan Wang et.al. | 2506.05858 | null |
| 2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
| 2025-06-01 | DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation | Xianbing Sun et.al. | 2506.00908 | null |
| 2025-05-29 | VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration | Ben Li et.al. | 2505.23439 | null |
| 2025-05-28 | MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on | Guangyuan Li et.al. | 2505.21325 | null |
| 2025-05-27 | Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals | Davide Lobba et.al. | 2505.21062 | null |
| 2025-05-26 | VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models | Hu Xiaobin et.al. | 2505.19571 | null |
| 2025-05-22 | Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction | Dong Li et.al. | 2505.16980 | null |
| 2025-05-22 | Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On | Siqi Wan et.al. | 2505.16977 | link |
| 2025-05-15 | Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates | Ren Li et.al. | 2504.08353 | link |
| 2025-04-29 | Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting | Hanxi Liu et.al. | 2504.20403 | null |
| 2025-04-24 | FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model | Kaicheng Pang et.al. | 2504.17826 | null |
| 2025-04-24 | 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models | Min Wei et.al. | 2504.17414 | null |
| 2025-04-21 | Shape-Guided Clothing Warping for Virtual Try-On | Xiaoyu Han et.al. | 2504.15232 | link |
| 2025-04-21 | Insert Anything: Image Insertion via In-Context Editing in DiT | Wensong Song et.al. | 2504.15009 | null |
| 2025-04-19 | Flux Already Knows -- Activating Subject-Driven Image Generation without Training | Hao Kang et.al. | 2504.11478 | link |
| 2025-04-19 | Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Yong Zhong et.al. | 2503.14151 | null |
| 2025-04-18 | Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation | Fulvio Sanguigni et.al. | 2504.14011 | null |
| 2025-04-17 | Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off | Riza Velioglu et.al. | 2504.13078 | link |
| 2025-04-15 | ReZero: Enhancing LLM search ability by trying one-more-time | Alan Dao et.al. | 2504.11001 | null |
| 2025-04-11 | VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction | Zijian He et.al. | 2503.12165 | null |
| 2025-04-04 | From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images | Maliheh Toozandehjani et.al. | 2504.03807 | null |
| 2025-04-03 | MAD: Makeup All-in-One with Cross-Domain Diffusion Model | Bo-Kai Ruan et.al. | 2504.02545 | null |
| 2025-04-01 | Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method | Shufang Zhang et.al. | 2504.00562 | null |
| 2025-03-26 | ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On | Ji Woo Hong et.al. | 2503.20418 | null |
| 2025-03-26 | Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks | Hailong Guo et.al. | 2501.15891 | null |
| 2025-03-25 | Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage | Zhengwentai Sun et.al. | 2503.19486 | null |
| 2025-03-20 | Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model | Yingmao Miao et.al. | 2503.16065 | null |
| 2025-03-18 | Limb-Aware Virtual Try-On Network with Progressive Clothing Warping | Shengping Zhang et.al. | 2503.14074 | link |
| 2025-03-16 | Progressive Limb-Aware Virtual Try-On | Xiaoyu Han et.al. | 2503.12588 | link |
| 2025-03-15 | ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text | Haifeng Ni et.al. | 2501.16757 | null |
| 2025-03-11 | MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input | Zhenchen Wan et.al. | 2503.08650 | null |
| 2025-03-11 | RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency | Siqi Li et.al. | 2501.08682 | null |
| 2025-02-20 | CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors | Donghao Luo et.al. | 2502.14373 | null |
| 2025-02-05 | Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics | Xuan Li et.al. | 2502.03449 | null |
| 2025-02-03 | MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer | Le Shen et.al. | 2502.01626 | null |
| 2025-01-26 | IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter | Xiaojing Zhong et.al. | 2501.15616 | null |
| 2025-01-26 | Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models | Spencer Ramsey et.al. | 2501.15571 | null |
| 2025-01-20 | EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process | Mostafa Atef et.al. | 2501.11776 | null |
| 2025-01-20 | CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Zheng Chong et.al. | 2501.11325 | link |
| 2025-01-17 | Disharmony: Forensics using Reverse Lighting Harmonization | Philip Wootaek Shin et.al. | 2501.10212 | null |
| 2025-01-12 | ODPG: Outfitting Diffusion with Pose Guided Condition | Seohyun Lee et.al. | 2501.06769 | null |
| 2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630 | null |
| 2025-01-09 | 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On | Shuliang Ning et.al. | 2501.05369 | null |
| 2025-01-08 | Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling | Nannan Li et.al. | 2501.04666 | null |
| 2025-01-07 | HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping | Kosuke Takemoto et.al. | 2501.03910 | link |
| 2025-01-07 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427 | null |
| 2024-12-25 | DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images | Enbo Huang et.al. | 2412.18797 | null |
| 2024-12-22 | PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask | Jeongho Kim et.al. | 2412.16978 | link |
| 2024-12-19 | DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On | Wengyi Zhan et.al. | 2412.14465 | null |
| 2024-12-19 | FashionComposer: Compositional Fashion Image Generation | Sihui Ji et.al. | 2412.14168 | null |
| 2024-11-18 | Try-On-Adapter: A Simple and Flexible Try-On Paradigm | Hanzhong Guo et.al. | 2411.10187 | null |
| 2024-07-18 | Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models | Phuong Dam et.al. | 2403.07371 | null |
| 2024-07-18 | Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images | Aiyu Cui et.al. | 2311.16094 | null |
| 2024-06-05 | GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon | Sanhita Pathak et.al. | 2406.02184 | null |
| 2024-05-28 | Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn | Sanhita Pathak et.al. | 2310.05024 | null |
| 2024-05-08 | VTON-IT: Virtual Try-On using Image Translation | Santosh Adhikari et.al. | 2310.04558 | null |
| 2024-04-29 | Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos | Zhengze Xu et.al. | 2404.17571 | null |
| 2024-04-02 | TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On | Jiazheng Xing et.al. | 2404.00878 | null |
| 2023-04-03 | Learning Garment DensePose for Robust Warping in Virtual Try-On | Aiyu Cui et.al. | 2303.17688 | null |
| 2021-09-13 | Per Garment Capture and Synthesis for Real-time Virtual Try-on | Toby Chong et.al. | 2109.04654 | null |
| 2021-08-25 | ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones | Shan An et.al. | 2108.10515 | null |
| 2021-06-01 | An Efficient Style Virtual Try on Network for Clothing Business Industry | Shanchen Pang et.al. | 2105.13183 | null |
| 2021-01-14 | ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on | Gaurav Kuppa et.al. | 2012.10495 | null |
| 2016-02-22 | Issues in the Multiple Try Metropolis mixing | L. Martino et.al. | 1508.04253 | null |
| 2015-02-27 | Trying to understand dark matter | B. Hoeneisen et.al. | 1502.07375 | null |
| 2014-05-20 | On the flexibility of the design of Multiple Try Metropolis schemes | Luca Martino et.al. | 1201.0646 | null |
Visual Edit
| Publish Date | Title | Authors | arXiv | Code |
|---|---|---|---|---|
| 2025-11-18 | UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning | Rui Tian et.al. | 2511.14760 | null |
| 2025-11-18 | Task Addition and Weight Disentanglement in Closed-Vocabulary Models | Adam Hazimeh et.al. | 2511.14569 | null |
| 2025-11-18 | ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation | Zitong Xu et.al. | 2511.14259 | null |
| 2025-11-18 | InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior | Weimin Bai et.al. | 2511.14208 | null |
| 2025-11-18 | UniSER: A Foundation Model for Unified Soft Effects Removal | Jingdong Zhang et.al. | 2511.14183 | null |
| 2025-11-18 | Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations | Yiqing Shen et.al. | 2511.14100 | null |
| 2025-11-18 | Error-Driven Scene Editing for 3D Grounding in Large Language Models | Yue Zhang et.al. | 2511.14086 | null |
| 2025-11-18 | Semantic Context Matters: Improving Conditioning for Autoregressive Models | Dongyang Jin et.al. | 2511.14063 | null |
| 2025-11-18 | Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline | Rui Zuo et.al. | 2511.13442 | null |
| 2025-11-18 | MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation | Junjie Yang et.al. | 2511.13135 | null |
| 2025-11-17 | Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine | Xincheng Shuai et.al. | 2511.13713 | null |
| 2025-11-17 | Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting | Jiangnan Ye et.al. | 2511.13684 | null |
| 2025-11-17 | Language-Guided Invariance Probing of Vision-Language Models | Jae Joong Lee et.al. | 2511.13494 | null |
| 2025-11-17 | Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling | Adam Hazimeh et.al. | 2511.13478 | null |
| 2025-11-17 | TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing | Yuchen Bao et.al. | 2511.13399 | null |
| 2025-11-17 | SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design | Yunjie Yu et.al. | 2511.13285 | null |
| 2025-11-17 | Uncovering and Mitigating Transient Blindness in Multimodal Model Editing | Xiaoqi Han et.al. | 2511.13243 | null |
| 2025-11-17 | InteractiveGNNExplainer: A Visual Analytics Framework for Multi-Faceted Understanding and Probing of Graph Neural Network Predictions | TC Singh et.al. | 2511.13160 | null |
| 2025-11-17 | Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection | Lintong Zhang et.al. | 2511.12992 | null |
| 2025-11-17 | Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes | Feng Lv et.al. | 2511.12932 | null |
| 2025-11-17 | Generative Photographic Control for Scene-Consistent Video Cinematic Editing | Huiqiang Sun et.al. | 2511.12921 | null |
| 2025-11-16 | Catastrophic Forgetting in Kolmogorov-Arnold Networks | Mohammad Marufur Rahman et.al. | 2511.12828 | null |
| 2025-11-16 | Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis | Zeqin Yu et.al. | 2511.12658 | null |
| 2025-11-16 | Designed to Spread: Generative Approaches to Enhance Information Diffusion | Ziqing Qian et.al. | 2511.12516 | null |
| 2025-11-15 | ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks | Ruixun Liu et.al. | 2511.12267 | null |
| 2025-11-15 | Mixture of States: Routing Token-Level Dynamics for Multimodal Generation | Haozhe Liu et.al. | 2511.12207 | null |
| 2025-11-15 | FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing | Kaixiang Yang et.al. | 2511.12151 | null |
| 2025-11-15 | Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing | Hossein Mohebbi et.al. | 2511.11780 | null |
| 2025-11-14 | PEtab-GUI: A graphical user interface to create, edit and inspect PEtab parameter estimation problems | Paul Jonas Jost et.al. | 2511.11515 | null |
| 2025-11-14 | ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation | Kaishen Wang et.al. | 2511.11483 | null |
| 2025-11-14 | WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation | Wei Chow et.al. | 2511.11434 | null |
| 2025-11-14 | SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing | Yichao Tang et.al. | 2511.11295 | null |
| 2025-11-14 | Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing | Cong Cao et.al. | 2511.11236 | null |
| 2025-11-14 | On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing | Yunyi Ni et.al. | 2511.10933 | null |
| 2025-11-14 | STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data | Yongdeuk Seo et.al. | 2511.09977 | null |
| 2025-11-14 | UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation | Zhen Yang et.al. | 2511.08195 | null |
| 2025-11-13 | IPCD: Intrinsic Point-Cloud Decomposition | Shogo Sato et.al. | 2511.09866 | null |
| 2025-11-13 | AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting | Aymen Mir et.al. | 2511.09827 | null |
| 2025-11-12 | SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control | Arman Zarei et.al. | 2511.09715 | null |
| 2025-11-11 | RePose-NeRF: Robust Radiance Fields for Mesh Reconstruction under Noisy Camera Poses | Sriram Srinivasan et.al. | 2511.08545 | null |
| 2025-11-11 | 3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation | Yunhong He et.al. | 2511.08536 | null |
| 2025-11-11 | UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist | Zhengyang Liang et.al. | 2511.08521 | null |
| 2025-11-11 | HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization | Zeyang Li et.al. | 2511.08425 | null |
| 2025-11-11 | LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning | Fengyi Fu et.al. | 2511.08251 | null |
| 2025-11-11 | VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics | Daniel Cher et.al. | 2511.07744 | null |
| 2025-11-09 | Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising | Assaf Singer et.al. | 2511.08633 | null |
| 2025-11-09 | AesTest: Measuring Aesthetic Intelligence from Perception to Production | Guolong Wang et.al. | 2511.06360 | null |
| 2025-11-09 | RelightMaster: Precise Video Relighting with Multi-plane Light Images | Weikang Bian et.al. | 2511.06271 | null |
| 2025-11-07 | On the Brittleness of CLIP Text Encoders | Allie Tran et.al. | 2511.04247 | null |
| 2025-11-07 | Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing | Zhihui Chen et.al. | 2511.00801 | null |
| 2025-11-06 | Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization | Connor Dunlop et.al. | 2511.05616 | null |
| 2025-11-06 | MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers | Ali Boudaghi et.al. | 2511.04376 | null |
| 2025-11-06 | Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization | Zhejia Cai et.al. | 2511.03950 | null |
| 2025-11-05 | Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks | Wenkai Fu et.al. | 2511.05598 | null |
| 2025-11-05 | Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition | Jongseo Lee et.al. | 2511.03725 | null |
| 2025-11-05 | Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising | Shuangquan Lyu et.al. | 2511.03272 | null |
| 2025-11-05 | ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing | Yaosen Chen et.al. | 2511.02505 | null |
| 2025-11-03 | UniREditBench: A Unified Reasoning-based Image Editing Benchmark | Feng Han et.al. | 2511.01295 | null |
| 2025-10-31 | BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing | Jinsu Kim et.al. | 2511.00143 | null |
| 2025-10-31 | Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing | Yijia Wang et.al. | 2510.27335 | null |
| 2025-10-30 | Security Risk of Misalignment between Text and Image in Multi-modal Model | Xiaosen Wang et.al. | 2510.26105 | null |
| 2025-10-29 | LGCC: Enhancing Flow Matching Based Text-Guided Image Editing with Local Gaussian Coupling and Context Consistency | Fangbing Liu et.al. | 2511.01894 | null |
| 2025-10-29 | SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing | Sung-Hoon Yoon et.al. | 2510.25970 | null |
| 2025-10-29 | RegionE: Adaptive Region-Aware Generation for Efficient Image Editing | Pengtao Chen et.al. | 2510.25590 | null |
| 2025-10-29 | LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation | Zeyu Wang et.al. | 2510.22946 | null |
| 2025-10-28 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation | Inclusion AI et.al. | 2510.24821 | null |
| 2025-10-28 | Group Relative Attention Guidance for Image Editing | Xuanpu Zhang et.al. | 2510.24657 | null |
| 2025-10-28 | Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models | Byeonghu Na et.al. | 2510.23974 | null |
| 2025-10-27 | Autoregressive Styled Text Image Generation, but Make it Reliable | Carmine Zaccagnino et.al. | 2510.23240 | null |
| 2025-10-27 | UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization | Huixuan Zhang et.al. | 2510.23023 | null |
| 2025-10-27 | VALA: Learning Latent Anchors for Training-Free and Temporally Consistent | Zhangkai Wu et.al. | 2510.22970 | null |
| 2025-10-27 | FAME: Fairness-aware Attention-modulated Video Editing | Zhangkai Wu et.al. | 2510.22960 | null |
| 2025-10-27 | LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas | Guocheng Gordon Qian et.al. | 2510.20820 | null |
| 2025-10-25 | GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation | Phillip Mueller et.al. | 2510.22337 | null |
| 2025-10-24 | FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing | Or Ronai et.al. | 2510.22010 | null |
| 2025-10-24 | SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation | Alec Helbling et.al. | 2510.21120 | null |
| 2025-10-24 | EditInfinity: Image Editing with Binary-Quantized Generative Models | Jiahuan Wang et.al. | 2510.20217 | null |
| 2025-10-24 | Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks | Kai Zeng et.al. | 2510.19195 | null |
| 2025-10-23 | Positional Encoding Field | Yunpeng Bai et.al. | 2510.20385 | null |
| 2025-10-23 | FlowCycle: Pursuing Cycle-Consistent Flows for Text-based Editing | Yanghao Wang et.al. | 2510.20212 | null |
| 2025-10-22 | Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing | Yusu Qian et.al. | 2510.19808 | null |
| 2025-10-21 | PICABench: How Far Are We from Physically Realistic Image Editing? | Yuandong Pu et.al. | 2510.17681 | null |
| 2025-10-21 | Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback | Zongjian Li et.al. | 2510.16888 | null |
| 2025-10-20 | ConsistEdit: Highly Consistent and Precise Training-free Visual Editing | Zixin Yin et.al. | 2510.17803 | null |
| 2025-10-19 | Region in Context: Text-condition Image editing with Human-like semantic reasoning | Thuy Phuong Vu et.al. | 2510.16772 | null |
| 2025-10-17 | BLIP3o-NEXT: Next Frontier of Native Image Generation | Jiuhai Chen et.al. | 2510.15857 | null |
| 2025-10-17 | Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset | Qingyan Bai et.al. | 2510.15742 | null |
| 2025-10-16 | Coupled Diffusion Sampling for Training-Free Multi-View Image Editing | Hadi Alzayer et.al. | 2510.14981 | null |
| 2025-10-16 | Learning an Image Editing Model without Image Editing Pairs | Nupur Kumari et.al. | 2510.14978 | null |
| 2025-10-16 | In-Context Learning with Unpaired Clips for Instruction-based Video Editing | Xinyao Liao et.al. | 2510.14648 | null |
| 2025-10-15 | Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation | Yi Zuo et.al. | 2510.13084 | null |
| 2025-10-14 | UniFusion: Vision-Language Model as Unified Encoder in Image Generation | Kevin Li et.al. | 2510.12789 | null |
| 2025-10-14 | Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding | Ye Chen et.al. | 2510.12256 | null |
| 2025-10-14 | VIDMP3: Video Editing by Representing Motion with Pose and Position Priors | Sandeep Mishra et.al. | 2510.12069 | null |
| 2025-10-13 | IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment | Yinan Chen et.al. | 2510.11647 | null |
| 2025-10-13 | Zero-shot Face Editing via ID-Attribute Decoupled Inversion | Yang Hou et.al. | 2510.11050 | null |
| 2025-10-13 | GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation | Shasha Guo et.al. | 2510.11020 | null |
| 2025-10-13 | DreamMakeup: Face Makeup Customization using Latent Diffusion Models | Geon Yeong Park et.al. | 2510.10918 | null |
| 2025-10-11 | EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection | Huaizhi Qu et.al. | 2510.13652 | null |
| 2025-10-11 | ReMix: Towards a Unified View of Consistent Character Generation and Editing | Benjia Zhou et.al. | 2510.10156 | null |
| 2025-10-11 | MultiCOIN: Multi-Modal COntrollable Video INbetweening | Maham Tanveer et.al. | 2510.08561 | null |
| 2025-10-10 | Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians | Jin-Chuan Shi et.al. | 2510.09438 | null |
| 2025-10-10 | TBStar-Edit: From Image Editing Pattern Shifting to Consistency Enhancement | Hao Fang et.al. | 2510.04483 | null |
| 2025-10-09 | FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching | Jiacheng Liu et.al. | 2510.08669 | null |
| 2025-10-09 | Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing | Rishubh Parihar et.al. | 2510.08532 | null |
| 2025-10-09 | InstructX: Towards Unified Visual Editing with MLLM Guidance | Chong Mou et.al. | 2510.08485 | null |
| 2025-10-09 | UniVideo: Unified Understanding, Generation, and Editing for Videos | Cong Wei et.al. | 2510.08377 | null |
| 2025-10-09 | InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing | Haoran Yu et.al. | 2510.08181 | null |
| 2025-10-09 | Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing | Zhentao Zou et.al. | 2510.08157 | null |
| 2025-10-08 | DreamOmni2: Multimodal Instruction-based Editing and Generation | Bin Xia et.al. | 2510.06679 | null |
| 2025-10-07 | Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding | Yi Xin et.al. | 2510.06308 | null |
| 2025-10-07 | Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling | Young D. Kwon et.al. | 2510.06295 | null |
| 2025-10-07 | Diffusion-Based Image Editing for Breaking Robust Watermarks | Yunyi Ni et.al. | 2510.05978 | null |
| 2025-10-07 | When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach | Daniel Gonzálbez-Biosca et.al. | 2510.05661 | null |
| 2025-10-06 | SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder | Ronen Kamenetsky et.al. | 2510.05081 | null |
| 2025-10-05 | ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation | Jay Zhangjie Wu et.al. | 2510.04290 | null |
| 2025-10-05 | Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers | Shikang Zheng et.al. | 2510.04188 | null |
| 2025-10-05 | Prompt-to-Prompt: Text-Based Image Editing Via Cross-Attention Mechanisms -- The Research of Hyperparameters and Novel Mechanisms to Enhance Existing Frameworks | Linn Bieske et.al. | 2510.04034 | null |
| 2025-10-04 | From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance | Ardalan Aryashad et.al. | 2510.03906 | null |
| 2025-10-04 | Rare Text Semantics Were Always There in Your Diffusion Transformer | Seil Kang et.al. | 2510.03886 | null |
| 2025-10-03 | DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing | Qi Li et.al. | 2510.04797 | null |
| 2025-10-03 | OTR: Synthesizing Overlay Text Dataset for Text Removal | Jan Zdenek et.al. | 2510.02787 | null |
| 2025-10-02 | DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing | Zihan Zhou et.al. | 2510.02253 | null |
| 2025-10-02 | Towards Better Optimization For Listwise Preference in Diffusion Models | Jiamu Bai et.al. | 2510.01540 | null |
| 2025-10-02 | VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing | Abdelilah Aitrouga et.al. | 2509.25998 | null |
| 2025-10-01 | IMAGEdit: Let Any Subject Transform | Fei Shen et.al. | 2510.01186 | null |
| 2025-10-01 | EditTrack: Detecting and Attributing AI-assisted Image Editing | Zhengyuan Jiang et.al. | 2510.01173 | null |
| 2025-10-01 | DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models | Seunghoo Hong et.al. | 2510.00778 | null |
| 2025-10-01 | CAMILA: Context-Aware Masking for Image Editing with Language Alignment | Hyunseung Kim et.al. | 2509.19731 | null |
| 2025-09-30 | EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing | Keming Wu et.al. | 2509.26346 | null |
| 2025-09-30 | Training-Free Reward-Guided Image Editing via Trajectory Optimal Control | Jinho Chang et.al. | 2509.25845 | null |
| 2025-09-30 | Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation | Mingyu Kang et.al. | 2509.25776 | null |
| 2025-09-30 | Dragging with Geometry: From Pixels to Geometry-Guided Image Editing | Xinyu Pu et.al. | 2509.25740 | null |
| 2025-09-30 | EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling | Xin Luo et.al. | 2509.23909 | null |
| 2025-09-30 | FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing | Junyi Wu et.al. | 2509.22244 | null |
| 2025-09-29 | Training-Free Multimodal Guidance for Video to Audio Generation | Eleonora Grassucci et.al. | 2509.24550 | null |
| 2025-09-29 | Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency | Jiaqi Tan et.al. | 2509.24514 | null |
| 2025-09-29 | Latent Visual Reasoning | Bangzheng Li et.al. | 2509.24251 | null |
| 2025-09-28 | Visual CoT Makes VLMs Smarter but More Fragile | Chunxue Xu et.al. | 2509.23789 | null |
| 2025-09-28 | Seedream 4.0: Toward Next-generation Multimodal Image Generation | Team Seedream et.al. | 2509.20427 | null |
| 2025-09-27 | Object-AVEdit: An Object-level Audio-Visual Editing Model | Youquan Fu et.al. | 2510.00050 | null |
| 2025-09-26 | EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer | Zhehao Dong et.al. | 2509.22407 | null |
| 2025-09-26 | SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks | Jialiang Li et.al. | 2509.21928 | null |
| 2025-09-26 | Taming Flow-based I2V Models for Creative Video Editing | Xianghao Kong et.al. | 2509.21917 | null |
| 2025-09-26 | TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation | Qihang Wang et.al. | 2509.21905 | null |
| 2025-09-25 | FreeInsert: Personalized Object Insertion with Geometric and Style Control | Yuhong Zhang et.al. | 2509.20756 | null |
| 2025-09-25 | ArtUV: Artist-style UV Unwrapping | Yuguang Chen et.al. | 2509.20710 | null |
| 2025-09-25 | EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning | Xuan Ju et.al. | 2509.20360 | null |
| 2025-09-25 | Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation | Yuanhuiyi Lyu et.al. | 2509.18639 | null |
| 2025-09-24 | Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation | Shufan Li et.al. | 2509.19244 | null |
| 2025-09-23 | Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation | Yanzuo Lu et.al. | 2509.18824 | null |
| 2025-09-23 | GeoRemover: Removing Objects and Their Causal Visual Artifacts | Zixin Zhu et.al. | 2509.18538 | null |
| 2025-09-22 | Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance | Hongxing Fan et.al. | 2509.17757 | null |
| 2025-09-20 | Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media | Zihan Ding et.al. | 2509.16811 | null |
| 2025-09-20 | V-CECE: Visual Counterfactual Explanations via Conceptual Edits | Nikolaos Spanos et.al. | 2509.16567 | null |
| 2025-09-19 | Neural Atlas Graphs for Dynamic Scene Decomposition and Editing | Jan Philipp Schneider et.al. | 2509.16336 | null |
| 2025-09-19 | Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution | Chang Soo Lim et.al. | 2509.15781 | null |
| 2025-09-18 | AutoEdit: Automatic Hyperparameter Tuning for Image Editing | Chau Pham et.al. | 2509.15031 | null |
| 2025-09-18 | MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks | Mingsong Li et.al. | 2509.14638 | null |
| 2025-09-18 | End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection | Fei Wang et.al. | 2509.13214 | null |
| 2025-09-17 | Controllable-Continuous Color Editing in Diffusion Model via Color Mapping | Yuqi Yang et.al. | 2509.13756 | null |
| 2025-09-17 | LLM-I: LLMs are Naturally Interleaved Multimodal Creators | Zirun Guo et.al. | 2509.13642 | null |
| 2025-09-16 | EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing | Tianyu Chen et.al. | 2509.13399 | null |
| 2025-09-16 | Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder | Qifei Jia et.al. | 2509.12883 | null |
| 2025-09-16 | Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations | Jinjie Shen et.al. | 2509.12653 | null |
| 2025-09-15 | LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence | Zixin Yin et.al. | 2509.12203 | null |
| 2025-09-13 | EditDuet: A Multi-Agent System for Video Non-Linear Editing | Marcelo Sandoval-Castaneda et.al. | 2509.10761 | null |
| 2025-09-12 | Immunizing Images from Text to Image Editing via Adversarial Cross-Attention | Matteo Trippodo et.al. | 2509.10359 | null |
| 2025-09-10 | RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts | Lauren H. Cooke et.al. | 2509.08640 | null |
| 2025-09-09 | Delta Velocity Rectified Flow for Text-to-Image Editing | Gaspard Beaudouin et.al. | 2509.05342 | null |
| 2025-09-04 | Improved 3D Scene Stylization via Text-Guided Generative Image Editing with Region-Based Control | Haruo Fujiwara et.al. | 2509.05285 | null |
| 2025-09-04 | Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping | Jingyi Lu et.al. | 2509.04582 | null |
| 2025-09-04 | From Editor to Dense Geometry Estimator | JiYuan Wang et.al. | 2509.04338 | null |
| 2025-09-03 | Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing | Quan Dao et.al. | 2509.01984 | null |
| 2025-09-02 | Fidelity-preserving enhancement of ptychography with foundational text-to-image models | Ming Du et.al. | 2509.04513 | null |
| 2025-09-02 | Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination | Ziyun Zeng et.al. | 2509.01986 | null |
| 2025-09-01 | O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing | Yuqing Chen et.al. | 2509.01596 | null |
| 2025-09-01 | Neural Scene Designer: Self-Styled Semantic Image Manipulation | Jianman Lin et.al. | 2509.01405 | null |
| 2025-08-30 | LatentEdit: Adaptive Latent Control for Consistent Semantic Editing | Siyi Liu et.al. | 2509.00541 | null |
| 2025-08-28 | Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation | Chenfan Qu et.al. | 2508.20987 | null |
| 2025-08-28 | Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent | En Ci et.al. | 2508.20505 | null |
| 2025-08-28 | Audio-Guided Visual Editing with Complex Multi-Modal Prompts | Hyeonyu Kim et.al. | 2508.20379 | null |
| 2025-08-27 | Not Every Gift Comes in Gold Paper or with a Red Ribbon: Exploring Color Perception in Text-to-Image Models | Shay Shomer Chai et.al. | 2508.19791 | null |
| 2025-08-25 | ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models | Haitang Feng et.al. | 2508.18271 | null |
| 2025-08-25 | SpotEdit: Evaluating Visually-Guided Image Editing Methods | Sara Ghazanfari et.al. | 2508.18159 | null |
| 2025-08-24 | An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing | Zihan Liang et.al. | 2508.17435 | null |
| 2025-08-24 | Defending Deepfake via Texture Feature Perturbation | Xiao Zhang et.al. | 2508.17315 | null |
| 2025-08-24 | PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing | Peilin Xiong et.al. | 2508.17302 | null |
| 2025-08-21 | Visual Autoregressive Modeling for Instruction-Guided Image Editing | Qingyang Mao et.al. | 2508.15772 | null |
| 2025-08-20 | AnchorSync: Global Consistency Optimization for Long Video Editing | Zichi Liu et.al. | 2508.14609 | null |
| 2025-08-20 | DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing | Weitao Wang et.al. | 2508.14465 | null |
| 2025-08-19 | Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing | Feng-Lin Liu et.al. | 2508.13797 | null |
| 2025-08-18 | Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score | Syed Muhmmad Israr et.al. | 2508.12718 | null |
| 2025-08-18 | TimeMachine: Fine-Grained Facial Age Editing with Identity Preservation | Yilin Mi et.al. | 2508.11284 | null |
| 2025-08-18 | NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale | NextStep Team et.al. | 2508.10711 | null |
| 2025-08-16 | PEdger++: Practical Edge Detection via Assembling Cross Information | Yuanbin Fu et.al. | 2508.11961 | null |
| 2025-08-14 | LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters | Haomin Zhang et.al. | 2508.11074 | null |
| 2025-08-14 | A Segmentation-driven Editing Method for Bolt Defect Augmentation and Detection | Yangjie Xiao et.al. | 2508.10509 | null |
| 2025-08-14 | TweezeEdit: Consistent and Efficient Image Editing with Path Regularization | Jianda Mao et.al. | 2508.10498 | null |
| 2025-08-13 | LIA-X: Interpretable Latent Portrait Animator | Yaohui Wang et.al. | 2508.09959 | null |
| 2025-08-12 | Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control | Zeqian Long et.al. | 2508.08134 | null |
| 2025-08-12 | Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation | Fangyuan Mao et.al. | 2508.07981 | null |
| 2025-08-11 | X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning | Jian Ma et.al. | 2508.07607 | null |
| 2025-08-11 | Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing | Joonghyuk Shin et.al. | 2508.07519 | null |
| 2025-08-10 | CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization | Youqi Wang et.al. | 2508.07413 | null |
| 2025-08-10 | Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers | Xin Ma et.al. | 2508.07246 | null |
| 2025-08-09 | CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing | Weiyan Xie et.al. | 2508.06937 | null |
| 2025-08-09 | Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing | Shichao Ma et.al. | 2508.06916 | null |
| 2025-08-08 | UGD-IML: A Unified Generative Diffusion-based Framework for Constrained and Unconstrained Image Manipulation Localization | Yachun Mi et.al. | 2508.06101 | null |
| 2025-08-08 | DreamVE: Unified Instruction-based Image and Video Editing | Bin Xia et.al. | 2508.06080 | null |
| 2025-08-08 | NEP: Autoregressive Image Editing via Next Editing Token Prediction | Huimin Wu et.al. | 2508.06044 | null |
| 2025-08-08 | InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow | Yiming Gong et.al. | 2508.06033 | null |
| 2025-08-05 | Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation | Peiyu Wang et.al. | 2508.03320 | null |
| 2025-08-05 | Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation | Jun Luo et.al. | 2508.03300 | null |
| 2025-08-05 | LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing | Liangyang Ouyang et.al. | 2508.03144 | null |
| 2025-08-05 | UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying | Chengyu Bai et.al. | 2508.03142 | null |
| 2025-08-05 | The Promise of RL for Autoregressive Image Editing | Saba Ahmadi et.al. | 2508.01119 | null |
| 2025-08-04 | Transport-Guided Rectified Flow Inversion: Improved Image Editing Using Optimal Transport Theory | Marian Lupascu et.al. | 2508.02363 | null |
| 2025-08-04 | Qwen-Image Technical Report | Chenfei Wu et.al. | 2508.02324 | null |
| 2025-08-01 | Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence | Danzhen Fu et.al. | 2508.00299 | null |
| 2025-08-01 | Towards Robust Semantic Correspondence: A Benchmark and Insights | Wenyue Chong et.al. | 2508.00272 | null |
| 2025-08-01 | Training-free Geometric Image Editing on Diffusion Models | Hanshen Zhu et.al. | 2507.23300 | null |
| 2025-07-31 | UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing | Hao Tang et.al. | 2507.23278 | null |
| 2025-07-29 | Low-Cost Test-Time Adaptation for Robust Video Editing | Jianhui Wang et.al. | 2507.21858 | null |
| 2025-07-29 | From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos | Chenjian Gao et.al. | 2507.20331 | null |
| 2025-07-28 | GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset | Yuhan Wang et.al. | 2507.21033 | null |
| 2025-07-28 | ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation | Sherry X. Chen et.al. | 2507.07317 | null |
| 2025-07-25 | HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback | Elham Soltani Kazemi et.al. | 2507.18921 | null |
| 2025-07-23 | Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling | Yi Xin et.al. | 2507.17801 | null |
| 2025-07-22 | ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement | Kahim Wong et.al. | 2507.16397 | null |
| 2025-07-22 | Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling | Chao Zhou et.al. | 2507.16240 | null |
| 2025-07-22 | LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs | Zitong Xu et.al. | 2507.16193 | null |
| 2025-07-20 | Light Future: Multimodal Action Frame Prediction via InstructPix2Pix | Zesen Zhong et.al. | 2507.14809 | null |
| 2025-07-18 | NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining | Maksim Kuprashevich et.al. | 2507.14119 | null |
| 2025-07-18 | Moodifier: MLLM-Enhanced Emotion-Driven Image Editing | Jiarong Ye et.al. | 2507.14024 | null |
| 2025-07-16 | MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing | Shreya Kadambi et.al. | 2507.13401 | null |
| 2025-07-15 | EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing | Vassilis Sioros et.al. | 2507.11096 | null |
| 2025-07-14 | Sparse Fine-Tuning of Transformers for Generative Tasks | Wei Chen et.al. | 2507.10855 | null |
| 2025-07-14 | LayLens: Improving Deepfake Understanding through Simplified Explanations | Abhijeet Narang et.al. | 2507.10066 | null |
| 2025-07-11 | FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields | Gwanhyeong Koo et.al. | 2507.08285 | null |
| 2025-07-08 | 2D Instance Editing in 3D Space | Yuhuan Xie et.al. | 2507.05819 | null |
| 2025-07-07 | Neural-Driven Image Editing | Pengfei Zhou et.al. | 2507.05397 | null |
| 2025-07-07 | Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing | Chun-Hsiao Yeh et.al. | 2507.05259 | null |
| 2025-07-07 | S | Xudong Liu et.al. | 2507.04584 | null |
| 2025-07-04 | Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images | Yuran Dong et.al. | 2507.03402 | null |
| 2025-07-04 | LACONIC: A 3D Layout Adapter for Controllable Image Creation | Léopold Maillard et.al. | 2507.03257 | null |
| 2025-07-03 | From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding | Xiangfeng Wang et.al. | 2507.02790 | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908 | null |
| 2025-07-02 | ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation | Jimyeong Kim et.al. | 2507.01496 | null |
| 2025-07-02 | QC-OT: Optimal Transport with Quasiconformal Mapping | Yuping Lv et.al. | 2507.01456 | null |
| 2025-07-01 | Ovis-U1 Technical Report | Guo-Hua Wang et.al. | 2506.23044 | null |
| 2025-06-30 | A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement | Gaozheng Pei et.al. | 2506.23676 | null |
| 2025-06-30 | TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity | Yuzhuo Chen et.al. | 2506.23484 | null |
| 2025-06-29 | OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions | Yuanhao Cai et.al. | 2506.23361 | null |
| 2025-06-29 | Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis | Lei-lei Li et.al. | 2506.23263 | null |
| 2025-06-28 | Towards Explainable Bilingual Multimodal Misinformation Detection and Localization | Yiwei He et.al. | 2506.22930 | null |
| 2025-06-28 | STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing | Junsung Lee et.al. | 2506.22868 | null |
| 2025-06-27 | Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy | Yuhao Liu et.al. | 2506.22432 | null |
| 2025-06-27 | GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles | Mengyi Shan et.al. | 2506.21839 | null |
| 2025-06-27 | DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Lingling Cai et.al. | 2506.20967 | null |
| 2025-06-26 | Controllable 3D Placement of Objects with Scene-Aware Diffusion Models | Mohamed Omran et.al. | 2506.21446 | null |
| 2025-06-26 | Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling | Hansam Cho et.al. | 2506.21045 | null |
| 2025-06-26 | M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization | Ju-Hyeon Nam et.al. | 2506.20922 | null |
| 2025-06-26 | FaSTA | Advait Gupta et.al. | 2506.20911 | null |
| 2025-06-26 | BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing | Jiacheng Chen et.al. | 2506.17450 | null |
| 2025-06-25 | EditP23: 3D Editing via Propagation of Image Prompts to Multi-View | Roi Bar-On et.al. | 2506.20652 | null |
| 2025-06-25 | Towards Efficient Exemplar Based Image Editing with Multimodal VLMs | Avadhoot Jadhav et.al. | 2506.20155 | null |
| 2025-06-25 | OmniGen2: Exploration to Advanced Multimodal Generation | Chenyuan Wu et.al. | 2506.18871 | null |
| 2025-06-24 | SceneCrafter: Controllable Multi-View Driving Scene Editing | Zehao Zhu et.al. | 2506.19488 | null |
| 2025-06-24 | LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning | Chenjian Gao et.al. | 2506.10082 | null |
| 2025-06-23 | Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models | Ilia Beletskii et.al. | 2506.19103 | null |
| 2025-06-23 | Let Your Video Listen to Your Music! | Xinyu Zhang et.al. | 2506.18881 | null |
| 2025-06-23 | CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing | Dinh-Khoi Vo et.al. | 2506.18438 | null |
| 2025-06-23 | Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction | Han Zhang et.al. | 2506.18290 | null |
| 2025-06-20 | FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation | Fan Yang et.al. | 2506.16806 | null |
| 2025-06-19 | Arch-Router: Aligning LLM Routing with Human Preferences | Co Tran et.al. | 2506.16655 | null |
| 2025-06-18 | VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics | Josef Kuchař et.al. | 2506.15903 | null |
| 2025-06-17 | Causally Steered Diffusion for Automated Video Counterfactual Generation | Nikos Spyrou et.al. | 2506.14404 | link |
| 2025-06-16 | AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing | Biao Yang et.al. | 2506.13301 | null |
| 2025-06-15 | Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing | Zhuoying Li et.al. | 2506.13827 | null |
| 2025-06-15 | ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Chenglin Wang et.al. | 2506.12830 | null |
| 2025-06-14 | Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts | Saemee Choi et.al. | 2506.12520 | null |
| 2025-06-13 | SphereDrag: Spherical Geometry-Aware Panoramic Image Editing | Zhiao Feng et.al. | 2506.11863 | null |
| 2025-06-13 | Consistent Video Editing as Flow-Driven Image-to-Video Generation | Ge Wang et.al. | 2506.07713 | null |
| 2025-06-12 | VINCIE: Unlocking In-context Image Editing from Video | Leigang Qu et.al. | 2506.10941 | null |
| 2025-06-12 | Edit360: 2D Image Edits to 3D Assets from Any Angle | Junchao Huang et.al. | 2506.10507 | null |
| 2025-06-12 | Towards Reliable Identification of Diffusion-based Image Manipulations | Alex Costanzino et.al. | 2506.05466 | null |
| 2025-06-11 | EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | Ron Yosef et.al. | 2506.09988 | null |
| 2025-06-11 | ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models | Qin Zhou et.al. | 2506.09740 | null |
| 2025-06-11 | Ming-Omni: A Unified Multimodal Model for Perception and Generation | Inclusion AI et.al. | 2506.09344 | link |
| 2025-06-11 | Fine-Grained Spatially Varying Material Selection in Images | Julia Guerrero-Viu et.al. | 2506.09023 | null |
| 2025-06-10 | Do Concept Replacement Techniques Really Erase Unacceptable Concepts? | Anudeep Das et.al. | 2506.08991 | null |
| 2025-06-10 | RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping | Yang Bai et.al. | 2506.08632 | null |
| 2025-06-09 | Highly Compressed Tokenizer Can Generate Without Training | L. Lao Beyer et.al. | 2506.08257 | link |
| 2025-06-09 | PairEdit: Learning Semantic Variations for Exemplar-based Image Editing | Haoguang Lu et.al. | 2506.07992 | link |
| 2025-06-09 | Diffusion Counterfactual Generation with Semantic Abduction | Rajat Rasal et.al. | 2506.07883 | link |
| 2025-06-09 | DragNeXt: Rethinking Drag-Based Image Editing | Yuan Zhou et.al. | 2506.07611 | null |
| 2025-06-09 | Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding | Boyu Chen et.al. | 2506.07576 | null |
| 2025-06-08 | Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Tianyi Bai et.al. | 2506.07227 | null |
| 2025-06-08 | TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation | Min-Jung Kim et.al. | 2506.07205 | null |
| 2025-06-06 | Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models | Yifu Qiu et.al. | 2506.06006 | link |
| 2025-06-06 | FADE: Frequency-Aware Diffusion Model Factorization for Video Editing | Yixuan Zhu et.al. | 2506.05934 | link |
| 2025-06-06 | SeedEdit 3.0: Fast and High-Quality Generative Image Editing | Peng Wang et.al. | 2506.05083 | null |
| 2025-06-05 | FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing | Guangzhao Li et.al. | 2506.05046 | null |
| 2025-06-05 | Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking | Yu-Feng Chen et.al. | 2506.04879 | link |
| 2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213 | null |
| 2025-06-04 | HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation | Hermann Kumbong et.al. | 2506.04421 | null |
| 2025-06-04 | Is Perturbation-Based Image Protection Disruptive to Image Editing? | Qiuyu Tang et.al. | 2506.04394 | null |
| 2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216 | null |
| 2025-06-04 | Image Editing As Programs with Diffusion Models | Yujia Hu et.al. | 2506.04158 | null |
| 2025-06-04 | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | Bin Lin et.al. | 2506.03147 | null |
| 2025-06-04 | MedEBench: Revisiting Text-instructed Image Editing on Medical Domain | Minghao Liu et.al. | 2506.01921 | null |
| 2025-06-03 | RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | Bimsara Pathiraja et.al. | 2506.03448 | null |
| 2025-06-03 | ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions | Di Chang et.al. | 2506.03107 | null |
| 2025-06-03 | DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing | Zixiang Li et.al. | 2506.02560 | null |
| 2025-06-03 | RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers | Yan Gong et.al. | 2506.02528 | null |
| 2025-06-02 | IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout | Fei Shen et.al. | 2506.01949 | null |
| 2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
| 2025-06-02 | Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation | Kaihang Pan et.al. | 2506.01480 | null |
| 2025-06-02 | DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing | Chenxi Xie et.al. | 2506.01430 | null |
| 2025-06-01 | Motion-Aware Concept Alignment for Consistent Video Editing | Tong Zhang et.al. | 2506.01004 | null |
| 2025-05-31 | Concept-Centric Token Interpretation for Vector-Quantized Generative Models | Tianze Yang et.al. | 2506.00698 | null |
| 2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873 | null |
| 2025-05-29 | Cora: Correspondence-aware image editing using few step diffusion | Amirhossein Almohammadi et.al. | 2505.23907 | null |
| 2025-05-29 | LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers | Yusuf Dalva et.al. | 2505.23758 | null |
| 2025-05-29 | Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features | Ziyong Wang et.al. | 2505.23586 | null |
| 2025-05-29 | Video Editing for Audio-Visual Dubbing | Binyamin Manela et.al. | 2505.23406 | link |
| 2025-05-29 | FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | Jeongsol Kim et.al. | 2505.23145 | link |
| 2025-05-29 | Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Tongtong Su et.al. | 2505.23134 | link |
| 2025-05-28 | HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Qi Cai et.al. | 2505.22705 | link |
| 2025-05-28 | VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Mingyuan Wu et.al. | 2505.19255 | null |
| 2025-05-27 | Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion | Yang Yang et.al. | 2505.21593 | null |
| 2025-05-27 | Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks | Kyzyl Monteiro et.al. | 2505.20916 | null |
| 2025-05-27 | InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | Xiaoxiao Jiang et.al. | 2505.20600 | null |
| 2025-05-26 | What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | Lorenzo Baraldi et.al. | 2505.20405 | null |
| 2025-05-26 | ImgEdit: A Unified Image Editing Dataset and Benchmark | Yang Ye et.al. | 2505.20275 | link |
| 2025-05-26 | StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation | Yi Wu et.al. | 2505.19874 | null |
| 2025-05-26 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | Juntong Wang et.al. | 2505.19535 | null |
| 2025-05-26 | Understanding Generative AI Capabilities in Everyday Image Editing Tasks | Mohammad Reza Taesiri et.al. | 2505.16181 | null |
| 2025-05-25 | Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions | Chenrui Ma et.al. | 2505.19352 | null |
| 2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
| 2025-05-25 | MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection | Shuyu Wang et.al. | 2505.19149 | null |
| 2025-05-24 | REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing | Weihan Xu et.al. | 2505.18880 | null |
| 2025-05-24 | Affective Image Editing: Shaping Emotional Factors via Text Descriptions | Peixuan Zhang et.al. | 2505.18699 | null |
| 2025-05-24 | Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | Yiheng Li et.al. | 2505.18521 | link |
| 2025-05-23 | DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval | Yuxin Yang et.al. | 2505.17796 | null |
| 2025-05-23 | R-Genie: Reasoning-Guided Generative Image Editing | Dong Zhang et.al. | 2505.17768 | null |
| 2025-05-22 | KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models | Yongliang Wu et.al. | 2505.16707 | null |
| 2025-05-21 | FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models | Zhen Sun et.al. | 2505.15644 | link |
| 2025-05-20 | DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model | Siwei Xia et.al. | 2505.12427 | link |
| 2025-05-20 | CompBench: Benchmarking Complex Instruction-guided Image Editing | Bohan Jia et.al. | 2505.12200 | null |
| 2025-05-18 | From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations | Yuzhi Li et.al. | 2505.12237 | null |
| 2025-05-16 | X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models | Valentina Bazyleva et.al. | 2505.11753 | null |
| 2025-05-16 | GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing | Yusu Qian et.al. | 2505.11493 | null |
| 2025-05-15 | 3D-Fixup: Advancing Photo Editing with 3D Priors | Yen-Chi Cheng et.al. | 2505.10566 | null |
| 2025-05-15 | IntrinsicEdit: Precise generative image manipulation in intrinsic space | Linjie Lyu et.al. | 2505.08889 | null |
| 2025-05-14 | Don't Forget your Inverse DDIM for Image Editing | Guillermo Gomez-Trenado et.al. | 2505.09571 | null |
| 2025-05-12 | MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models | Hongyang Zhu et.al. | 2505.05101 | null |
| 2025-05-11 | DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | Junhao Xia et.al. | 2505.07057 | null |
| 2025-05-11 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472 | null |
| 2025-05-08 | GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing | Tong Wang et.al. | 2505.04915 | null |
| 2025-05-07 | Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers | Divyansh Srivastava et.al. | 2505.04718 | null |
| 2025-05-07 | Multi-turn Consistent Image Editing | Zijun Zhou et.al. | 2505.04320 | null |
| 2025-05-07 | Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Inclusion AI et.al. | 2505.02471 | link |
| 2025-05-06 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | Jhon Lopez et.al. | 2505.15822 | null |
| 2025-05-06 | Step1X-Edit: A Practical Framework for General Image Editing | Shiyu Liu et.al. | 2504.17761 | link |
| 2025-05-05 | SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing | Ming Li et.al. | 2505.02370 | link |
| 2025-05-04 | Video Forgery Detection for Surveillance Cameras: A Review | Noor B. Tayfor et.al. | 2505.03832 | null |
| 2025-05-02 | Improving Editability in Image Generation with Layer-wise Memory | Daneul Kim et.al. | 2505.01079 | null |
| 2025-05-02 | A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories | Ziqi Ding et.al. | 2505.01067 | null |
| 2025-05-02 | Photoshop Batch Rendering Using Actions for Stylistic Video Editing | Tessa De La Fuente et.al. | 2505.01001 | null |
| 2025-05-01 | InstructAttribute: Fine-grained Object Attributes editing with Instruction | Xingxi Yin et.al. | 2505.00751 | null |
| 2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704 | null |
| 2025-05-01 | Towards Scalable Human-aligned Benchmark for Text-guided Image Editing | Suho Ryu et.al. | 2505.00502 | link |
| 2025-04-30 | PixelHacker: Image Inpainting with Structural and Semantic Consistency | Ziyang Xu et.al. | 2504.20438 | null |
| 2025-04-29 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | Zechuan Zhang et.al. | 2504.20690 | null |
| 2025-04-27 | CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes | Tuan Nguyen et.al. | 2504.19212 | null |
| 2025-04-26 | REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models | Gal Almog et.al. | 2504.18989 | link |
| 2025-04-24 | DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing | Aniruddha Bala et.al. | 2504.17894 | null |
| 2025-04-24 | VEU-Bench: Towards Comprehensive Understanding of Video Editing | Bozheng Li et.al. | 2504.17828 | null |
| 2025-04-24 | Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields | Zhuo He et.al. | 2504.17712 | null |
| 2025-04-24 | Enhancing Variational Autoencoders with Smooth Robust Latent Encoding | Hyomin Lee et.al. | 2504.17219 | null |
| 2025-04-24 | Vidi: Large Multimodal Models for Video Understanding and Editing | Vidi Team et.al. | 2504.15681 | null |
| 2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016 | null |
| 2025-04-22 | Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models | Dasol Jeong et.al. | 2504.15723 | null |
| 2025-04-21 | MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World | Ankit Dhiman et.al. | 2504.15397 | null |
| 2025-04-21 | Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach | Lvpan Cai et.al. | 2504.11922 | link |
| 2025-04-20 | MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation | Siyi Jiao et.al. | 2504.14606 | null |
| 2025-04-19 | Visual Prompting for One-shot Controllable Video Editing without Inversion | Zhengbo Zhang et.al. | 2504.14335 | null |
| 2025-04-19 | PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling | Alara Dirik et.al. | 2504.14219 | null |
| 2025-04-18 | Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation | Fulvio Sanguigni et.al. | 2504.14011 | null |
| 2025-04-18 | Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing | Joowon Kim et.al. | 2504.13490 | null |
| 2025-04-17 | Image Editing with Diffusion Models: A Survey | Jia Wang et.al. | 2504.13226 | null |
| 2025-04-17 |  | Siwei Yang et.al. | 2504.13143 | null |
| 2025-04-17 | UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models | Guanlong Jiao et.al. | 2504.13109 | null |
| 2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | link |
| 2025-04-17 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Qianqian Sun et.al. | 2504.12704 | null |
| 2025-04-17 | DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Mengshi Qi et.al. | 2504.12080 | link |
| 2025-04-17 | Understanding Attention Mechanism in Video Diffusion Models | Bingyan Liu et.al. | 2504.12027 | null |
| 2025-04-14 | Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing | Taihang Hu et.al. | 2504.10434 | link |
| 2025-04-14 | Analysis of Attention in Video Diffusion Transformers | Yuxin Wen et.al. | 2504.10317 | null |
| 2025-04-14 | TAPNext: Tracking Any Point (TAP) as Next Token Prediction | Artem Zholus et.al. | 2504.05579 | null |
| 2025-04-13 | SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow | Kenan Tang et.al. | 2504.09697 | link |
| 2025-04-13 | CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Pooja Guhan et.al. | 2504.09472 | null |
| 2025-04-11 | CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model | Ruohao Zhan et.al. | 2504.08259 | null |
| 2025-04-10 | POEM: Precise Object-level Editing via MLLM control | Marco Schouten et.al. | 2504.08111 | null |
| 2025-04-10 | Learning Universal Features for Generalizable Image Forgery Localization | Hengrun Zhao et.al. | 2504.07462 | link |
| 2025-04-10 | Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing | Chenxi Sun et.al. | 2504.07424 | null |
| 2025-04-09 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Gene Chou et.al. | 2504.07093 | link |
| 2025-04-08 | VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing | Juan Luis Gonzalez Bello et.al. | 2504.07146 | null |
| 2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | null |
| 2025-04-08 | Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model | Qi Mao et.al. | 2504.05594 | null |
| 2025-04-08 | Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Xiangyu Zhao et.al. | 2504.02826 | link |
| 2025-04-07 | CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models | Kavana Venkatesh et.al. | 2504.05306 | null |
| 2025-04-07 | Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing | Hui Liu et.al. | 2504.04784 | null |
| 2025-04-07 | MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Wulin Xie et.al. | 2504.03641 | null |
| 2025-04-04 | Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices | Yang He et.al. | 2504.03155 | null |
| 2025-04-03 | How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Pascal Chang et.al. | 2504.03072 | null |
| 2025-04-03 | VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Xianwei Zhuang et.al. | 2504.02949 | link |
| 2025-04-03 | Concept Lancet: Image Editing with Compositional Representation Transplant | Jinqi Luo et.al. | 2504.02828 | null |
| 2025-04-03 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Zhiyuan Yan et.al. | 2504.02782 | link |
| 2025-04-03 | ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Runhui Huang et.al. | 2504.01934 | null |
| 2025-04-02 | FreSca: Unveiling the Scaling Space in Diffusion Models | Chao Huang et.al. | 2504.02154 | null |
| 2025-04-02 | A Diffusion-Based Framework for Occluded Object Movement | Zheng-Peng Duan et.al. | 2504.01873 | null |
| 2025-03-31 | AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents | Jiaxiang Chen et.al. | 2503.23948 | link |
| 2025-03-31 | Training-Free Text-Guided Image Editing with Visual Autoregressive Model | Yufei Wang et.al. | 2503.23897 | link |
| 2025-03-30 | Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging | Amar Kumar et.al. | 2503.23618 | null |
| 2025-03-30 | ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Tianming Liang et.al. | 2503.23509 | link |
| 2025-03-30 | SketchVideo: Sketch-based Video Generation and Editing | Feng-Lin Liu et.al. | 2503.23284 | null |
| 2025-03-29 | FreeInv: Free Lunch for Improving DDIM Inversion | Yuxiang Bao et.al. | 2503.23035 | null |
| 2025-03-29 | FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model | Jun Zhou et.al. | 2503.19839 | null |
| 2025-03-28 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Haijie Yang et.al. | 2503.22225 | null |
| 2025-03-28 | LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing | Achint Soni et.al. | 2503.21541 | link |
| 2025-03-26 | Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising | Yan-Bo Lin et.al. | 2503.20782 | null |
| 2025-03-26 | EditCLIP: Representation Learning for Image Editing | Qian Wang et.al. | 2503.20318 | link |
| 2025-03-26 | Wan: Open and Advanced Large-Scale Video Generative Models | WanTeam et.al. | 2503.20314 | link |
| 2025-03-26 | InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction | Yuhui Wu et.al. | 2503.20287 | link |
| 2025-03-25 | Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning | Sherry X. Chen et.al. | 2503.18406 | link |
| 2025-03-25 | Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods | Yuzhi Li et.al. | 2503.17975 | null |
| 2025-03-24 | FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing | Yufan Ren et.al. | 2503.19191 | null |
| 2025-03-24 | Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Sicong Feng et.al. | 2503.18386 | null |
| 2025-03-24 | MaSS13K: A Matting-level Semantic Segmentation Benchmark | Chenxi Xie et.al. | 2503.18364 | link |
| 2025-03-23 | Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance | Harang Ju et.al. | 2503.18238 | link |
| 2025-03-23 | What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images | Dongheng Lin et.al. | 2503.17899 | null |
| 2025-03-23 | Multi-focal Conditioned Latent Diffusion for Person Image Synthesis | Jiaqi Liu et.al. | 2503.15686 | link |
| 2025-03-22 | InstructVEdit: A Holistic Approach for Instructional Video Editing | Chi Zhang et.al. | 2503.17641 | null |
| 2025-03-22 | Guidance Free Image Editing via Explicit Conditioning | Mehdi Noroozi et.al. | 2503.17593 | null |
| 2025-03-21 | HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks | Maria Pilligua et.al. | 2503.17276 | null |
| 2025-03-21 | DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics | Yihan Hu et.al. | 2503.16795 | null |
| 2025-03-20 | FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing | Tianyi Wei et.al. | 2503.16153 | null |
| 2025-03-20 | Single Image Iterative Subject-driven Generation and Editing | Yair Shpitzer et.al. | 2503.16025 | link |
| 2025-03-19 | VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation | Shoubin Yu et.al. | 2503.14350 | null |
| 2025-03-18 | ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing | Yulin Pan et.al. | 2503.14482 | null |
| 2025-03-18 | TarPro: Targeted Protection against Malicious Image Editing | Kaixin Shen et.al. | 2503.13994 | null |
| 2025-03-17 | FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models | Minghan Li et.al. | 2503.13684 | null |
| 2025-03-17 | Unified Autoregressive Visual Generation and Understanding with Continuous Tokens | Lijie Fan et.al. | 2503.13436 | null |
| 2025-03-17 | Edit Transfer: Learning Image Editing via Vision In-Context Relations | Lan Chen et.al. | 2503.13327 | null |
| 2025-03-17 | GIFT: Generated Indoor video frames for Texture-less point tracking | Jianzheng Huang et.al. | 2503.12944 | null |
| 2025-03-17 | DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode | Junjia Huang et.al. | 2503.12838 | null |
| 2025-03-16 | UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing | Tsu-Jui Fu et.al. | 2503.12652 | null |
| 2025-03-16 | Personalize Anything for Free with Diffusion Transformer | Haoran Feng et.al. | 2503.12590 | null |
| 2025-03-14 | Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities | Ruchika Chavhan et.al. | 2503.11905 | null |
| 2025-03-14 | RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing | Tianrui Pan et.al. | 2503.11571 | null |
| 2025-03-14 | LUSD: Localized Update Score Distillation for Text-Guided Image Editing | Worameth Chinchuthakun et.al. | 2503.11054 | link |
| 2025-03-14 | V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes | Yanming Zhang et.al. | 2503.10634 | null |
| 2025-03-14 | On the Limitations of Vision-Language Models in Understanding Image Transforms | Ahmad Mustafa Anis et.al. | 2503.09837 | null |
| 2025-03-13 | Fine-Tuning Diffusion Generative Models via Rich Preference Optimization | Hanyang Zhao et.al. | 2503.11720 | null |
| 2025-03-13 | CoSTA | Advait Gupta et.al. | 2503.10613 | link |
| 2025-03-13 | EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing | Zexuan Yan et.al. | 2503.10270 | link |
| 2025-03-13 | MoEdit: On Learning Quantity Perception for Multi-object Image Editing | Yanfeng Li et.al. | 2503.10112 | link |
| 2025-03-13 | Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models | Armando Fortes et.al. | 2503.08434 | null |
| 2025-03-12 | Alias-Free Latent Diffusion Models:Improving Fractional Shift Equivariance of Diffusion Latent Space | Yifan Zhou et.al. | 2503.09419 | link |
| 2025-03-12 | InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images | Jiun Tian Hoe et.al. | 2503.09130 | null |
| 2025-03-12 | OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting | Yongsheng Yu et.al. | 2503.08677 | null |
| 2025-03-11 | Aligning Text to Image in Diffusion Models is Easier Than You Think | Jaa-Yeon Lee et.al. | 2503.08250 | link |
| 2025-03-11 | ObjectMover: Generative Object Movement with Video Prior | Xin Yu et.al. | 2503.08037 | null |
| 2025-03-11 | CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement | Chenrui Ma et.al. | 2503.07938 | null |
| 2025-03-11 | VACE: All-in-One Video Creation and Editing | Zeyinzi Jiang et.al. | 2503.07598 | null |
| 2025-03-10 | Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Lixue Gong et.al. | 2503.07703 | null |
| 2025-03-10 | TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation | Victor Shea-Jay Huang et.al. | 2503.07050 | null |
| 2025-03-10 | Interactive Tumor Progression Modeling via Sketch-Based Image Editing | Gexin Huang et.al. | 2503.06809 | null |
| 2025-03-10 | VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Yuxuan Bian et.al. | 2503.05639 | link |
| 2025-03-09 | Consistent Image Layout Editing with Diffusion Models | Tao Xia et.al. | 2503.06419 | null |
| 2025-03-08 | Get In Video: Add Anything You Want to the Video | Shaobin Zhuang et.al. | 2503.06268 | null |
| 2025-03-08 | X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation | Jian Ma et.al. | 2503.06134 | link |
| 2025-03-07 | Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients | Niklas Penzel et.al. | 2503.05424 | null |
| 2025-03-06 | Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models | Rui Jiang et.al. | 2503.04215 | null |
| 2025-03-05 | GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors | Yaopei Zeng et.al. | 2503.03944 | null |
| 2025-03-04 | h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform | Toan Nguyen et.al. | 2503.02187 | link |
| 2025-03-03 | VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors | Juil Koo et.al. | 2503.01107 | null |
| 2025-03-01 | GenVDM: Generating Vector Displacement Maps From a Single Image | Yuezhi Yang et.al. | 2503.00605 | null |
| 2025-02-27 | Tight Inversion: Image-Conditioned Inversion for Real Image Editing | Edo Kadosh et.al. | 2502.20376 | null |
| 2025-02-27 | Identity-preserving Distillation Sampling by Fixed-Point Iterator | SeonHwa Kim et.al. | 2502.19930 | null |
| 2025-02-26 | SVGEditBench V2: A Benchmark for Instruction-based SVG Editing | Kunato Nishina et.al. | 2502.19453 | link |
| 2025-02-26 | Bayesian Optimization for Controlled Image Editing via LLMs | Chengkun Cai et.al. | 2502.18116 | null |
| 2025-02-25 | KV-Edit: Training-Free Image Editing for Precise Background Preservation | Tianrui Zhu et.al. | 2502.17363 | link |
| 2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258 | null |
| 2025-02-23 | PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data | Shijie Huang et.al. | 2502.14397 | link |
| 2025-02-22 | DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation | Yuxuan Xiong et.al. | 2502.16302 | null |
| 2025-02-18 | AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks | Ming Xie et.al. | 2502.11158 | null |
| 2025-02-14 | PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control | Kunal Swami et.al. | 2502.10258 | null |
| 2025-02-14 | VideoDiff: Human-AI Video Co-Creation with Alternatives | Mina Huh et.al. | 2502.10190 | null |
| 2025-02-14 | Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training | Rodrigo Santos et.al. | 2502.10064 | null |
| 2025-02-14 | SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment | Tica Lin et.al. | 2502.08621 | null |
| 2025-02-10 | Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Bojia Zi et.al. | 2502.06734 | null |
| 2025-02-10 | Predictive Red Teaming: Breaking Policies Without Breaking Robots | Anirudha Majumdar et.al. | 2502.06575 | null |
| 2025-02-08 | AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection | Shuheng Zhang et.al. | 2502.05433 | null |
| 2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
| 2025-02-06 | PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models | Aleksandar Cvejic et.al. | 2502.04050 | null |
| 2025-02-06 | DICE: Distilling Classifier-Free Guidance into Text Embeddings | Zhenyu Zhou et.al. | 2502.03726 | null |
| 2025-02-05 | Lost in Edits? A | Wenhao You et.al. | 2502.04364 | null |
| 2025-02-05 | REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations | Peter Sushko et.al. | 2502.03629 | null |
| 2025-02-04 | Exploring the latent space of diffusion models directly through singular value decomposition | Li Wang et.al. | 2502.02225 | null |
| 2025-02-04 | EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues | Rohit Girmaji et.al. | 2502.02172 | null |
| 2025-02-04 | Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation | JooHyun Kwon et.al. | 2502.02091 | null |
| 2025-01-30 | DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models | Ruofan Liang et.al. | 2501.18590 | null |
| 2025-01-24 | MATCHA:Towards Matching Anything | Fei Xue et.al. | 2501.14945 | null |
| 2025-01-24 | Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* | Ludovica Schaerf et.al. | 2501.14524 | null |
| 2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920 | null |
| 2025-01-09 | Edit as You See: Image-guided Video Editing via Masked Motion Modeling | Zhi-Lin Huang et.al. | 2501.04325 | null |
| 2024-11-19 | StableV2V: Stablizing Shape Consistency in Video-to-Video Editing | Chang Liu et.al. | 2411.11045 | null |
| 2024-08-29 | Edit Temporal-Consistent Videos with Image Diffusion Model | Yuanzhi Wang et.al. | 2308.09091 | null |
| 2024-06-21 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Xincheng Shuai et.al. | 2406.14555 | null |
| 2024-04-22 | GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models | Sai Sree Harsha et.al. | 2404.12541 | null |
| 2024-03-04 | FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing | Yuren Cong et.al. | 2310.05922 | null |
| 2024-02-20 | Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts | Yuyang Zhao et.al. | 2305.08850 | null |
| 2024-01-19 | Edit One for All: Interactive Batch Image Editing | Thao Nguyen et.al. | 2401.10219 | null |
| 2023-12-08 | DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing | Shao-Yu Chang et.al. | 2312.03772 | null |
| 2023-10-12 | FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | Chenyang Qi et.al. | 2303.09535 | null |
| 2023-08-11 | InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing | Anant Khandelwal et.al. | 2308.00135 | null |
| 2023-03-28 | Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding | Gyeongman Kim et.al. | 2212.02802 | null |
| 2023-03-01 | AVscript: Accessible Video Editing with Audio-Visual Scripts | Mina Huh et.al. | 2302.14117 | null |
| 2023-01-31 | Shape-aware Text-driven Layered Video Editing | Yao-Chih Lee et.al. | 2301.13173 | null |
| 2022-06-22 | Temporally Consistent Semantic Video Editing | Yiran Xu et.al. | 2206.10590 | null |
| 2022-05-26 | Text2LIVE: Text-Driven Layered Image and Video Editing | Omer Bar-Tal et.al. | 2204.02491 | null |
| 2022-01-11 | Video-Specific Autoencoders for Exploring, Editing and Transmitting Videos | Kevin Wang et.al. | 2103.17261 | null |
| 2021-08-18 | A Latent Transformer for Disentangled Face Editing in Images and Videos | Xu Yao et.al. | 2106.11895 | null |
| 2021-04-22 | Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions | Xihui Liu et.al. | 2008.01576 | null |
| 2020-08-10 | Image2StyleGAN++: How to Edit the Embedded Images? | Rameen Abdal et.al. | 1911.11544 | null |
Others
Others
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-11-18 | UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning | Rui Tian et.al. | 2511.14760 | null |
| 2025-11-18 | Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers | Yutian Chen et.al. | 2511.14751 | null |
| 2025-11-18 | Graph Neural Networks for Vehicular Social Networks: Trends, Challenges, and Opportunities | Elham Binshaflout et.al. | 2511.14720 | null |
| 2025-11-18 | Natural Language Interfaces for Databases: What Do Users Think? | Panos Ipeirotis et.al. | 2511.14718 | null |
| 2025-11-18 | Talk, Snap, Complain: Validation-Aware Multimodal Expert Framework for Fine-Grained Customer Grievances | Rishu Kumar Singh et.al. | 2511.14693 | null |
| 2025-11-18 | Giant enhancement of attosecond tunnel ionization competes with disorder-driven decoherence in silicon | D. N. Purschke et.al. | 2511.14678 | null |
| 2025-11-18 | M-CALLM: Multi-level Context Aware LLM Framework for Group Interaction Prediction | Diana Romero et.al. | 2511.14661 | null |
| 2025-11-18 | Robust Offset-free Kernelized Data-Driven Predictive Control for Nonlinear Systems | Mahmood Mazare et.al. | 2511.14652 | null |
| 2025-11-18 | Real-time time-dependent density functional theory for high-energy density physics | Alina Kononov et.al. | 2511.14643 | null |
| 2025-11-18 | Enhancing Agentic Autonomous Scientific Discovery with Vision-Language Model Capabilities | Kahaan Gandhi et.al. | 2511.14631 | null |
| 2025-11-18 | Scalable Enforcement of Fine Grained Access Control Policies in Relational Database Management Systems | Anadi Shakya et.al. | 2511.14629 | null |
| 2025-11-18 | XAttn-BMD: Multimodal Deep Learning with Cross-Attention for Femoral Neck Bone Mineral Density Estimation | Yilin Zhang et.al. | 2511.14604 | null |
| 2025-11-18 | Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language | Minyoung Hwang et.al. | 2511.14565 | null |
| 2025-11-18 | Full Atom Peptide Design via Riemannian Euclidean Bayesian Flow Networks | Hao Qian et.al. | 2511.14516 | null |
| 2025-11-18 | Neural network impurity solver for real-frequency dynamical mean-field theory | Fenglin Deng et.al. | 2511.14505 | null |
| 2025-11-18 | Overview and Prospects of Using Integer Surrogate Keys for Data Warehouse Performance Optimization | Sviatoslav Stumpf et.al. | 2511.14502 | null |
| 2025-11-18 | Segmentation-Aware Latent Diffusion for Satellite Image Super-Resolution: Enabling Smallholder Farm Boundary Delineation | Aditi Agarwal et.al. | 2511.14481 | null |
| 2025-11-18 | Cracking the Microsecond: An Efficient and Precise Time Synchronization Scheme for Hybrid 5G-TSN Networks | Michael Gundall et.al. | 2511.14462 | null |
| 2025-11-18 | Advancing Minimally Invasive Precision Surgery in Open Cavities with Robotic Flexible Endoscopy | Michelle Mattille et.al. | 2511.14458 | null |
| 2025-11-18 | Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning | Fabian Stricker et.al. | 2511.14456 | null |
| 2025-11-17 | Scaling Spatial Intelligence with Multimodal Foundation Models | Zhongang Cai et.al. | 2511.13719 | null |
| 2025-11-17 | Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation | Sofia Jamil et.al. | 2511.13689 | null |
| 2025-11-17 | Scalable Iterative Algorithm for Solving Optimal Transmission Switching with De-energization | Benoît Jeanson et.al. | 2511.13662 | null |
| 2025-11-17 | Ontology-Driven Model-to-Model Transformation of Workflow Specifications | Francisco Abreu et.al. | 2511.13661 | null |
| 2025-11-17 | Part-X-MLLM: Part-aware 3D Multimodal Large Language Model | Chunshi Wang et.al. | 2511.13647 | null |
| 2025-11-17 | Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? | Chunqiu Steven Xia et.al. | 2511.13646 | null |
| 2025-11-17 | CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product | Kaiwen Xue et.al. | 2511.13626 | null |
| 2025-11-17 | A Real-Time Driver Drowsiness Detection System Using MediaPipe and Eye Aspect Ratio | Ashlesha G. Sawant et.al. | 2511.13618 | null |
| 2025-11-17 | BIOMERO 2.0: end-to-end FAIR infrastructure for bioimaging data import, analysis, and provenance | Torec T. Luik et.al. | 2511.13611 | null |
| 2025-11-17 | A Gentle Introduction to Conformal Time Series Forecasting | M. Stocker et.al. | 2511.13608 | null |
| 2025-11-17 | Long-range entanglement and quantum correlations in a multi-frequency comb system | Sahil Pontula et.al. | 2511.13604 | null |
| 2025-11-17 | Physics-Informed Neural Networks for Nonlinear Output Regulation | Sebastiano Mengozzi et.al. | 2511.13595 | null |
| 2025-11-17 | Data-driven Acceleration of MPC with Guarantees | Agustin Castellano et.al. | 2511.13588 | null |
| 2025-11-17 | Graph Out-of-Distribution Detection via Test-Time Calibration with Dual Dynamic Dictionaries | Yue Hou et.al. | 2511.13541 | null |
| 2025-11-17 | Towards Affect-Adaptive Human-Robot Interaction: A Protocol for Multimodal Dataset Collection on Social Anxiety | Vesna Poprcova et.al. | 2511.13530 | null |
| 2025-11-17 | A Computationally Efficient Framework for Free-trajectory Minimum-lap-time Optimization of Racing Cars | Erik van den Eshof et.al. | 2511.13522 | null |
| 2025-11-17 | Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation | Zhipeng Ma et.al. | 2511.13476 | null |
| 2025-11-17 | Machine learning inspired photon number resolution in superconducting nanowire single-photon detectors | I. S. Kuijf et.al. | 2511.13475 | null |
| 2025-11-17 | Measurement of Exclusive | DUNE Collaboration et.al. | 2511.13462 | null |
| 2025-11-17 | Hardware optimization on Android for inference of AI models | Iulius Gherasim et.al. | 2511.13453 | null |
| 2025-11-17 | Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline | Rui Zuo et.al. | 2511.13442 | null |
| 2025-11-17 | Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts | Siyu Zhu et.al. | 2511.13381 | null |
| 2025-11-17 | Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning | Xinlan Wu et.al. | 2511.13351 | null |
| 2025-11-17 | ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning | Juntao Jian et.al. | 2511.13327 | null |
| 2025-11-17 | TacEleven: generative tactic discovery for football open play | Siyao Zhao et.al. | 2511.13326 | null |
| 2025-11-17 | Computer Vision based group activity detection and action spotting | Narthana Sivalingam et.al. | 2511.13315 | null |
| 2025-11-17 | Distributed Hierarchical Machine Learning for Joint Resource Allocation and Slice Selection in In-Network Edge Systems | Sulaiman Muhammad Rashid et.al. | 2511.13313 | null |
| 2025-11-17 | DriveLiDAR4D: Sequential and Controllable LiDAR Scene Generation for Autonomous Driving | Kaiwen Cai et.al. | 2511.13309 | null |
| 2025-11-17 | TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing | Jongha Kim et.al. | 2511.13283 | null |
| 2025-11-17 | The Spontaneous Genesis of Solar Prominence Structures Driven by Supergranulation in Three-Dimensional Simulations | Huanxin Chen et.al. | 2511.13252 | null |
| 2025-11-17 | DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents | Fuyao Zhang et.al. | 2511.13248 | null |
| 2025-11-17 | MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection | Junjie Wu et.al. | 2511.13242 | null |
| 2025-11-17 | GaRLILEO: Gravity-aligned Radar-Leg-Inertial Enhanced Odometry | Chiyun Noh et.al. | 2511.13216 | null |
| 2025-11-16 | Sparsity-Driven Entanglement Detection in High-Dimensional Quantum States | Stav Lotan et.al. | 2511.12546 | null |
| 2025-11-16 | High-level reasoning while low-level actuation in Cyber-Physical Systems: How efficient is it? | Burak Karaduman et.al. | 2511.12543 | null |
| 2025-11-16 | Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing | Sanchaita Hazra et.al. | 2511.12529 | null |
| 2025-11-16 | Collaborative Charging Optimization for Wireless Rechargeable Sensor Networks via Heterogeneous Mobile Chargers | Jianhang Yao et.al. | 2511.12501 | null |
| 2025-11-16 | Towards Better IncomLDL: We Are Unaware of Hidden Labels in Advance | Jiecheng Jiang et.al. | 2511.12494 | null |
| 2025-11-16 | ClutterNav: Gradient-Guided Search for Efficient 3D Clutter Removal with Learned Costmaps | Navin Sriram Ravie et.al. | 2511.12479 | null |
| 2025-11-16 | Lightweight Deep Autoencoder for ECG Denoising with Morphology Preservation and Near Real-Time Hardware Deployment | Mahdi Pirayesh Shirazi Nejad et.al. | 2511.12478 | null |
| 2025-11-16 | Detecting LLM-Assisted Academic Dishonesty using Keystroke Dynamics | Atharva Mehta et.al. | 2511.12468 | null |
| 2025-11-16 | Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA | Fangqiang Du et.al. | 2511.12461 | null |
| 2025-11-16 | Personality-guided Public-Private Domain Disentangled Hypergraph-Former Network for Multimodal Depression Detection | Changzeng Fu et.al. | 2511.12460 | null |
| 2025-11-16 | CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training | Jiahe Qian et.al. | 2511.12446 | null |
| 2025-11-16 | Machine Learning Framework for Efficient Prediction of Quantum Wasserstein Distance | Changchun Feng et.al. | 2511.12443 | null |
| 2025-11-16 | Real-Time Drivers' Drowsiness Detection and Analysis through Deep Learning | ANK Zaman et.al. | 2511.12438 | null |
| 2025-11-16 | RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation | Xiaoshuai Hao et.al. | 2511.12436 | null |
| 2025-11-16 | Online Adaptive Probabilistic Safety Certificate with Language Guidance | Zhuoyuan Wang et.al. | 2511.12431 | null |
| 2025-11-16 | RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning | Jingqi Xu et.al. | 2511.12428 | null |
| 2025-11-16 | SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs | Shail Desai et.al. | 2511.12404 | null |
| 2025-11-16 | Stochastic Predictive Analytics for Stocks in the Newsvendor Problem | Pedro A. Pury et.al. | 2511.12397 | null |
| 2025-11-15 | Learning Adaptive Neural Teleoperation for Humanoid Robots: From Inverse Kinematics to End-to-End Control | Sanjar Atamuradov et.al. | 2511.12390 | null |
| 2025-11-15 | CEDL: Centre-Enhanced Discriminative Learning for Anomaly Detection | Zahra Zamanzadeh Darban et.al. | 2511.12388 | null |
| 2025-11-14 | Volumetric Ergodic Control | Jueun Kwon et.al. | 2511.11533 | null |
| 2025-11-14 | Scalable Policy Evaluation with Video World Models | Wei-Cheng Tseng et.al. | 2511.11520 | null |
| 2025-11-14 | W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search | Zhenyu Ding et.al. | 2511.11518 | null |
| 2025-11-14 | Discrete Basis Parameterization for the Gauge Theory Bootstrap | Rafael Cordoba et.al. | 2511.11513 | null |
| 2025-11-14 | Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities | Yiyun Zhou et.al. | 2511.11512 | null |
| 2025-11-14 | OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning | Xiaoyu Zheng et.al. | 2511.11510 | null |
| 2025-11-14 | PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision--Language Models | Nhat Hoang-Xuan et.al. | 2511.11502 | null |
| 2025-11-14 | ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation | Kaishen Wang et.al. | 2511.11483 | null |
| 2025-11-14 | Context-aware Adaptive Visualizations for Critical Decision Making | Angela Lopez-Cardona et.al. | 2511.11476 | null |
| 2025-11-14 | Proactive Hearing Assistants that Isolate Egocentric Conversations | Guilin Hu et.al. | 2511.11473 | null |
| 2025-11-14 | MoCap2Radar: A Spatiotemporal Transformer for Synthesizing Micro-Doppler Radar Signatures from Motion Capture | Kevin Chen et.al. | 2511.11462 | null |
| 2025-11-14 | Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification | Qinghao Gao et.al. | 2511.11460 | null |
| 2025-11-14 | DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference | Farhana Amin et.al. | 2511.11446 | null |
| 2025-11-14 | Unsupervised Motion-Compensated Decomposition for Cardiac MRI Reconstruction via Neural Representation | Xuanyu Tian et.al. | 2511.11436 | null |
| 2025-11-14 | The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models | Maria-Teresa De Rosa Palmini et.al. | 2511.11435 | null |
| 2025-11-14 | WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation | Wei Chow et.al. | 2511.11434 | null |
| 2025-11-14 | MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model | Manyu Li et.al. | 2511.11407 | null |
| 2025-11-14 | Bidimensional measurements of photon statistics within a multimodal temporal framework | C. Hainaut et.al. | 2511.11403 | null |
| 2025-11-14 | RadAround: A Field-Expedient Direction Finder for Contested IoT Sensing & EM Situational Awareness | Owen A. Maute et.al. | 2511.11392 | null |
| 2025-11-14 | KarmaTS: A Universal Simulation Platform for Multivariate Time Series with Functional Causal Dynamics | Haixin Li et.al. | 2511.11357 | null |
| 2025-11-13 | Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling | Jiahao Wang et.al. | 2511.10648 | null |
| 2025-11-13 | Emergent spin order and steady-state superradiance in one-dimensional baths | Silvia Cardenas-Lopez et.al. | 2511.10638 | null |
| 2025-11-13 | Robot Crash Course: Learning Soft and Stylized Falling | Pascal Strauch et.al. | 2511.10635 | null |
| 2025-11-13 | Querying Labeled Time Series Data with Scenario Programs | Edward Kim et.al. | 2511.10627 | null |
| 2025-11-13 | Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback | Mohammadsina Almasi et.al. | 2511.10572 | null |
| 2025-11-13 | Oya: Deep Learning for Accurate Global Precipitation Estimation | Emmanuel Asiedu Brempong et.al. | 2511.10562 | null |
| 2025-11-13 | OmniVGGT: Omni-Modality Driven Visual Geometry Grounded | Haosong Peng et.al. | 2511.10560 | null |
| 2025-11-13 | GraphFaaS: Serverless GNN Inference for Burst-Resilient, Real-Time Intrusion Detection | Lingzhi Wang et.al. | 2511.10554 | null |
| 2025-11-13 | URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding | Yongxin Shi et.al. | 2511.10552 | null |
| 2025-11-13 | Edge Machine Learning for Cluster Counting in Next-Generation Drift Chambers | Deniz Yilmaz et.al. | 2511.10540 | null |
| 2025-11-13 | Evaluation of Grid-based Uncertainty Propagation for Collaborative Self-Calibration in Indoor Positioning Systems | Andrea Jung et.al. | 2511.10526 | null |
| 2025-11-13 | A scalable and accurate framework for self-calibrating null depth retrieval using neural posterior estimation | Baoyi Zeng et.al. | 2511.10455 | null |
| 2025-11-13 | Improving dependability in robotized bolting operations | Lorenzo Pagliara et.al. | 2511.10448 | null |
| 2025-11-13 | Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting | Feng Wang et.al. | 2511.10434 | null |
| 2025-11-13 | CityVerse: A Unified Data Platform for Multi-Task Urban Computing with Large Language Models | Yaqiao Zhu et.al. | 2511.10418 | null |
| 2025-11-13 | MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns | Jiarui Zhang et.al. | 2511.10390 | null |
| 2025-11-13 | DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile | Thales Bezerra et.al. | 2511.10367 | null |
| 2025-11-13 | On The Performance of Prefix-Sum Parallel Kalman Filters and Smoothers on GPUs | Simo Särkkä et.al. | 2511.10363 | null |
| 2025-11-13 | Observable sets for free Schrödinger equation on combinatorial graphs | Zhiqiang Wan et.al. | 2511.10358 | null |
| 2025-11-13 | Towards Comprehensive Sampling of SMT Solutions | Shuangyu Lyu et.al. | 2511.10326 | null |
| 2025-11-10 | Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields | Zhao-Heng Yin et.al. | 2511.07418 | null |
| 2025-11-10 | StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation | Tianrui Feng et.al. | 2511.07399 | null |
| 2025-11-10 | Residual Rotation Correction using Tactile Equivariance | Yizhe Zhu et.al. | 2511.07381 | null |
| 2025-11-10 | Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion | June Moh Goo et.al. | 2511.07377 | null |
| 2025-11-10 | Offset-Free Robust Nonlinear Control Using Data-Driven Model: A Nonlinear Multi-Model Computationally Efficient Approach | Carine Menezes Rebello et.al. | 2511.07255 | null |
| 2025-11-10 | Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data | Tianle Song et.al. | 2511.07242 | null |
| 2025-11-10 | Resilient by Design - Active Inference for Distributed Continuum Intelligence | Praveen Kumar Donta et.al. | 2511.07202 | null |
| 2025-11-10 | Dynamic Vaccine Prioritization via Non-Markovian Final-state Optimization | Mi Feng et.al. | 2511.07200 | null |
| 2025-11-10 | Combining digital data streams and epidemic networks for real time outbreak detection | Ruiqi Lyu et.al. | 2511.07163 | null |
| 2025-11-10 | Real-Time Co-Simulation for DC Microgrid Energy Management with Communication Delays | S. Gokul Krishnan et.al. | 2511.07052 | null |
| 2025-11-10 | Raspi | Jin Huang et.al. | 2511.06998 | null |
| 2025-11-10 | Light Focusing through Dynamic Media via Real-Valued Intensity Transmission Matrix | Xuan Liu et.al. | 2511.06993 | null |
| 2025-11-10 | Koopman-Based Dynamic Environment Prediction for Safe UAV Navigation | Vitor Bueno et.al. | 2511.06990 | null |
| 2025-11-10 | Fast Bayesian Updates via Harmonic Representations | Di Zhang et.al. | 2511.06978 | null |
| 2025-11-10 | Ultrafast Topological Transitions Driven by Permittivity Modulation in Non-Hermitian Multilayers | Giuseppina Simone et.al. | 2511.06963 | null |
| 2025-11-10 | DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling | Zhicheng Li et.al. | 2511.06925 | null |
| 2025-11-10 | Real-Time Diverse Fiber Sensing Multi-Event Detection using Phase OTDR Measurements | Konstantinos Alexoudis et.al. | 2511.06922 | null |
| 2025-11-10 | MetricSynth: Framework for Aggregating DORA and KPI Metrics Across Multi-Platform Engineering | Pallav Jain et.al. | 2511.06864 | null |
| 2025-11-10 | Synergistic Antenna-Modulator Integration for Monolithic Photonic RF Receiver | Changlin Liu et.al. | 2511.06825 | null |
| 2025-11-10 | A Study of Cataclysmic Variables from the eFEDS Survey | Rui Wang et.al. | 2511.06814 | null |
| 2025-11-07 | FPGA-Based Real-Time Waveform Classification | Alperen Aksoy et.al. | 2511.05479 | null |
| 2025-11-07 | Precipitation nowcasting of satellite data using physically conditioned neural networks | Antônio Catão et.al. | 2511.05471 | null |
| 2025-11-07 | EventFlow: Real-Time Neuromorphic Event-Driven Classification of Two-Phase Boiling Flow Regimes | Sanghyeon Chang et.al. | 2511.05467 | null |
| 2025-11-07 | Helios: A 98-qubit trapped-ion quantum computer | Anthony Ransford et.al. | 2511.05465 | null |
| 2025-11-07 | Large Language Models for Explainable Threat Intelligence | Tiago Dinis et.al. | 2511.05406 | null |
| 2025-11-07 | AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly | Alexander Htet Kyaw et.al. | 2511.05394 | null |
| 2025-11-07 | Optimal Control of H-Mode Tokamak Plasma Temperature based on Pontryagin's Principle | Slim Jmal et.al. | 2511.05382 | null |
| 2025-11-07 | ETHOS: A Robotic Encountered-Type Haptic Display for Social Interaction in Virtual Reality | Eric Godden et.al. | 2511.05379 | null |
| 2025-11-07 | MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference | Maximilian Kirschner et.al. | 2511.05321 | null |
| 2025-11-07 | Force-Safe Environment Maps and Real-Time Detection for Soft Robot Manipulators | Akua K. Dickson et.al. | 2511.05307 | null |
| 2025-11-07 | psiUnity: A Platform for Multimodal Data-Driven XR | Akhil Ajikumar et.al. | 2511.05304 | null |
| 2025-11-07 | LiveStar: Live Streaming Assistant for Real-World Online Video Understanding | Zhenyu Yang et.al. | 2511.05299 | null |
| 2025-11-07 | Automatic segmentation of colorectal liver metastases for ultrasound-based navigated resection | Tiziano Natali et.al. | 2511.05253 | null |
| 2025-11-07 | Transporter: A 128 | Yang Lin et.al. | 2511.05241 | null |
| 2025-11-07 | Scaling behavior of dissipative systems with imaginary gap closing | Jinghui Pi et.al. | 2511.05220 | null |
| 2025-11-07 | Neural Operators for Power Systems: A Physics-Informed Framework for Modeling Power System Components | Ioannis Karampinis et.al. | 2511.05216 | null |
| 2025-11-07 | SmartSecChain-SDN: A Blockchain-Integrated Intelligent Framework for Secure and Efficient Software-Defined Networks | Azhar Hussain Mozumder et.al. | 2511.05156 | null |
| 2025-11-07 | On the Estimation of Climate Normals and Anomalies | Tommaso Proietti et.al. | 2511.05071 | null |
| 2025-11-07 | Epically Powerful: An open-source software and mechatronics infrastructure for wearable robotic systems | Jennifer K. Leestma et.al. | 2511.05033 | null |
| 2025-11-07 | Multi-agent Coordination via Flow Matching | Dongsu Lee et.al. | 2511.05005 | null |
| 2025-11-06 | Funnel-Based Online Recovery Control for Nonlinear Systems With Unknown Dynamics | Zihao Song et.al. | 2511.04626 | null |
| 2025-11-06 | Optimizing Sensor Placement in Urban Storm Sewers: A Data-Driven Sparse Sensing Approach | Zihang Ding et.al. | 2511.04556 | null |
| 2025-11-06 | Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment | Tao Lin et.al. | 2511.04555 | null |
| 2025-11-06 | Portable, cost-effective UV-vis-NIR microspectrophotometer for absorption and fluorescence microscopy and spectroscopy | Negar Karpourazar et.al. | 2511.04507 | null |
| 2025-11-06 | AI-Driven Phase-Shifted Carrier Optimization for Cascaded Bridge Converters, Modular Multilevel Converters, and Reconfigurable Batteries | Amin Hashemi-Zadeh et.al. | 2511.04470 | null |
| 2025-11-06 | Cutana: A High-Performance Tool for Astronomical Image Cutout Generation at Petabyte Scale | Pablo Gómez et.al. | 2511.04429 | null |
| 2025-11-06 | Mitigating effects of nonlinearities in homodyne quadrature interferometers | Johannes Lehmann et.al. | 2511.04386 | null |
| 2025-11-06 | Self-correcting High-speed Opto-electronic Probabilistic Computer | Ramy Aboushelbaya et.al. | 2511.04300 | null |
| 2025-11-06 | A Parallel Region-Adaptive Differential Privacy Framework for Image Pixelization | Ming Liu et.al. | 2511.04261 | null |
| 2025-11-06 | Accurate humidity and pH synchronized measurement with temperature compensation based on polarization maintaining fiber | Jia Liu et.al. | 2511.04203 | null |
| 2025-11-06 | Deep reinforcement learning based navigation of a jellyfish-like swimmer in flows with obstacles | Yihao Chen et.al. | 2511.04156 | null |
| 2025-11-06 | Infrared Microscopy of Biochemistry and Metabolism in Single Living Eukaryotic Cells | Luca Quaroni et.al. | 2511.04143 | null |
| 2025-11-06 | Automated Tennis Player and Ball Tracking with Court Keypoints Detection (Hawk Eye System) | Venkata Manikanta Desu et.al. | 2511.04126 | null |
| 2025-11-06 | Unified Effective Field Theory for Nonlinear and Quantum Optics | Xiaochen Liu et.al. | 2511.04118 | null |
| 2025-11-06 | Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration | Yunghee Lee et.al. | 2511.04117 | null |
| 2025-11-06 | Automated and Explainable Denial of Service Analysis for AI-Driven Intrusion Detection Systems | Paul Badu Yakubu et.al. | 2511.04114 | null |
| 2025-11-06 | E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce | Ge Zhang et.al. | 2511.04087 | null |
| 2025-11-06 | Enhancing Fault-Tolerant Space Computing: Guidance Navigation and Control (GNC) and Landing Vision System (LVS) Implementations on Next-Gen Multi-Core Processors | Kyongsik Yun et.al. | 2511.04052 | null |
| 2025-11-06 | An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue | Kailun Ji et.al. | 2511.04042 | null |
| 2025-11-06 | Shellular Metamaterial Design via Compact Electric Potential Parametrization | Chang Liu et.al. | 2511.04025 | null |
| 2025-11-06 | Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video | Alexander Htet Kyaw et.al. | 2511.03227 | null |
| 2025-11-05 | LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol | Yu-Erh Pan et.al. | 2511.03706 | null |
| 2025-11-05 | Certified randomness amplification by dynamically probing remote random quantum states | Minzhao Liu et.al. | 2511.03686 | null |
| 2025-11-05 | Simulation-Based Validation of an Integrated 4D/5D Digital-Twin Framework for Predictive Construction Control | Atena Khoshkonesh et.al. | 2511.03684 | null |
| 2025-11-05 | LiveTradeBench: Seeking Real-World Alpha with Large Language Models | Haofei Yu et.al. | 2511.03628 | null |
| 2025-11-05 | Super-resolution Optical Near-field EM for bio- and materials science | Ilia Zykov et.al. | 2511.03597 | null |
| 2025-11-05 | Performance Evaluation of a Position-Sensitive SiPM-based Gamma Camera for Intraoperative Imaging | Aramis Raiola et.al. | 2511.03493 | null |
| 2025-11-05 | A Modified Pulse and Design Framework to Halve the Complexity of OFDM Spectral Shaping Techniques | Javier Giménez et.al. | 2511.03465 | null |
| 2025-11-05 | Formalizing ETLT and ELTL Design Patterns and Proposing Enhanced Variants: A Systematic Framework for Modern Data Engineering | Chiara Rucco et.al. | 2511.03393 | null |
| 2025-11-05 | A Digital Twin of Evaporative Thermo-Fluidic Process in Fixation Unit of DoD Inkjet Printers | Samarth Toolhally et.al. | 2511.03379 | null |
| 2025-11-05 | Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification | Shaghayegh Kolli et.al. | 2511.03217 | null |
| 2025-11-05 | SurgAnt-ViVQA: Learning to Anticipate Surgical Events through GRU-Driven Temporal Cross-Attention | Shreyas C. Dhake et.al. | 2511.03178 | null |
| 2025-11-05 | Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation | Yun-Chen Lin et.al. | 2511.03163 | null |
| 2025-11-05 | A Proprietary Model-Based Safety Response Framework for AI Agents | Qi Li et.al. | 2511.03138 | null |
| 2025-11-05 | NOWS: Neural Operator Warm Starts for Accelerating Iterative Solvers | Mohammad Sadegh Eshaghi et.al. | 2511.02481 | null |
| 2025-11-05 | Ultrafast magnetic moment transfer and bandgap renormalization in monolayer FeCl | Yu-Hui Song et.al. | 2511.02461 | null |
| 2025-11-04 | A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics | Markus Buchholz et.al. | 2511.03075 | null |
| 2025-11-04 | Reading Between the Lines: The One-Sided Conversation Problem | Victoria Ebert et.al. | 2511.03056 | null |
| 2025-11-04 | ROBoto2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment | Anthony Hevia et.al. | 2511.03048 | null |
| 2025-11-04 | Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods | Fatimo Adenike Adeniya et.al. | 2511.03020 | null |
| 2025-11-04 | Establishing Trust in Crowdsourced Data | Iffat Gheyas et.al. | 2511.03016 | null |
| 2025-11-04 | Observer-based neural networks for flow estimation and control | Tarcísio C. Déda e |