O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing

Yuqing Chen1,3*, Junjie Wang1, Lin Liu2✉†, Ruihang Chu1, Xiaopeng Zhang2, Qi Tian2, Yujiu Yang1
1Tsinghua University, 2Huawei Inc, 3Pengcheng Laboratory
*This work was done during an internship at Huawei Inc, †Project Leader, ✉Corresponding authors

Abstract

Diffusion models have recently advanced video editing, yet controllable editing remains challenging because it requires precise manipulation of diverse object properties. Current methods require a different control signal for each editing task, which complicates model design and demands significant training resources. To address this, we propose O-DisCo-Edit, a unified framework built around a novel object distortion control (O-DisCo) signal. Based on random and adaptive noise, this signal flexibly encapsulates a wide range of editing cues within a single representation. Paired with a "copy-form" preservation module that keeps non-edited regions intact, O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm. Extensive experiments and comprehensive human evaluations demonstrate that O-DisCo-Edit surpasses both specialized and multitask state-of-the-art methods across various video editing tasks.

Object Removal

Outpainting

Object Internal Motion Transfer

Lighting Transfer

Color Change

Swap

Addition

Style Transfer

Method Overview

Mixed Video-Image Finetuning

The framework of the proposed O-DisCo-Edit. (a) Reference video. (b) Reference image (the first frame during training, the edited image during inference). (c) Masks. (d) R-O-DisCo. (e) A-O-DisCo. (f) Generated video. (g) Latent of the reference video. (h) Latent of the preserved region. (i) Image latent with zero-padding. (m) Noisy latent. (n) Image latent combined with the latent of the preserved region. α is the contrast, σ is the intensity of the added noise, and k is the size of the Gaussian blur kernel. The random distorter generates R-O-DisCo for training, and the adaptive distorter generates A-O-DisCo for inference. The "copy-form" preservation (CFP) module ensures the preservation of unedited areas, and the IDP maintains object appearance consistency.
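
To make the roles of α, σ, and k concrete, the following is a minimal, dependency-free sketch of what a random distorter could look like on a single grayscale frame stored as nested lists with values in [0, 1]. It is an illustration under stated assumptions, not the authors' implementation: the function names, the sampling ranges, the order of operations (contrast, then blur, then noise), the contrast formula, and the choice to leave pixels outside the object mask unchanged are all hypothetical. The adaptive distorter is not sketched, because this section does not describe how it selects its parameters.

```python
import math
import random


def gaussian_kernel_1d(k):
    """Normalized 1-D Gaussian kernel with k taps (k must be odd)."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    std = k / 6.0  # assumption: spread tied to the kernel size
    r = k // 2
    weights = [math.exp(-((i - r) ** 2) / (2.0 * std * std)) for i in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(frame, k):
    """Separable k x k Gaussian blur with replicated borders."""
    kernel = gaussian_kernel_1d(k)
    r = k // 2
    h, w = len(frame), len(frame[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[j] * row[clamp(x + j - r, w)] for j in range(k))
             for x in range(w)] for row in frame]
    return [[sum(kernel[j] * rows[clamp(y + j - r, h)][x] for j in range(k))
             for x in range(w)] for y in range(h)]


def distort(frame, mask, alpha, sigma, k, rng):
    """Distort pixels where mask is 1; copy every other pixel unchanged."""
    inside = [p for row, mrow in zip(frame, mask)
              for p, m in zip(row, mrow) if m]
    if not inside:
        return [row[:] for row in frame]
    mean = sum(inside) / len(inside)
    # 1) Contrast (assumed form): scale deviations from the object mean by alpha.
    contrasted = [[mean + alpha * (p - mean) for p in row] for row in frame]
    # 2) Blur with a Gaussian kernel of size k.
    blurred = gaussian_blur(contrasted, k)
    # 3) Add zero-mean Gaussian noise with std sigma, clipped to [0, 1].
    out = []
    for row, brow, mrow in zip(frame, blurred, mask):
        out.append([
            min(max(b + sigma * rng.gauss(0.0, 1.0), 0.0), 1.0) if m else p
            for p, b, m in zip(row, brow, mrow)
        ])
    return out


def random_distorter(frame, mask, rng):
    """Sample (alpha, sigma, k) at random; the ranges are hypothetical."""
    alpha = rng.uniform(0.0, 1.0)
    sigma = rng.uniform(0.0, 0.5)
    k = rng.choice([1, 3, 5, 7])
    return distort(frame, mask, alpha, sigma, k, rng), (alpha, sigma, k)


# Usage: distort the central 2 x 2 object of a 4 x 4 frame.
frame = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8],
         [0.9, 0.8, 0.7, 0.6],
         [0.5, 0.4, 0.3, 0.2]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
control, params = random_distorter(frame, mask, random.Random(0))
```

In this sketch, α = 1, σ = 0, and k = 1 leave the object untouched, while a small α, a large σ, or a large k progressively removes its appearance detail. Randomizing the three parameters therefore yields control signals of varying fidelity from the same reference frame.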