O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing

Yuqing Chen¹,³*, Junjie Wang¹✉, Lin Liu²✉†, Ruihang Chu¹, Xiaopeng Zhang², Qi Tian², Yujiu Yang¹

¹Tsinghua University, ²Huawei Inc., ³Pengcheng Laboratory
*This work was done during an internship at Huawei Inc. †Project Leader. ✉Corresponding authors.

Abstract

Diffusion models have recently advanced video editing, yet controllable editing remains challenging because it requires precise manipulation of diverse object properties. Current methods require a different control signal for each editing task, which complicates model design and demands significant training resources. To address this, we propose O-DisCo-Edit, a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. Paired with a "copy-form" preservation module that preserves non-edited regions, O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm. Extensive experiments and comprehensive human evaluations consistently demonstrate that O-DisCo-Edit surpasses both specialized and multitask state-of-the-art methods across various video editing tasks.

Object Removal

Remove the elephant

Remove the car

Remove the train

Remove the sailboard

Remove the donkey

Remove the chicken and dog

Object Internal Motion Transfer

Transfer the motion of milk

Transfer the movement of the alarm clock's second hand

Transfer the planet's rotation

Transfer the movement of bubbles

Lighting Transfer

Transfer the lighting gradient effects of the faucet

Transfer the water's surface light reflection

Transfer the light and shadow effects on the fish in water

Transfer the refractive light gradients through the crystal ball

Swap

Swap the bear with a polar bear

Swap the elephant with a sketch of an elephant

Swap the train with a high-tech train

Swap the race car with a sedan

Addition

Add a helmet to the person's head

Add a necklace to the person's neck

Add some roses on the swan's back

Add a flower to the bear's head

Method Overview

The framework of the proposed O-DisCo-Edit. (a) Reference video. (b) Reference image (the first frame during training, the edited image during inference). (c) Masks. (d) R-O-DisCo. (e) A-O-DisCo. (f) Generated video. (g) Latent of the reference video. (h) Latent of the preserved region. (i) Image latent with zero-padding. (m) Noisy latent. (n) Image latent with the latent of the preserved region. α is the contrast, σ is the intensity of the added noise, and k is the size of the Gaussian blur kernel. The random distorter generates R-O-DisCo for training, and the adaptive distorter generates A-O-DisCo for inference. The copy-form preservation (CFP) module ensures that unedited areas are preserved. The IDP maintains object appearance consistency.
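To make the roles of α, σ, and k concrete, here is a minimal NumPy sketch of how a random distorter could turn one frame into a distorted control signal. It is an illustration under stated assumptions, not the released implementation: the function names (`distort_object`, `random_distorter`), the order of operations (contrast, then noise, then blur), the sampling ranges, and the choice to leave pixels outside the mask unchanged are all hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(k: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of odd size k."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    # Assumption: standard deviation is tied to the kernel size by a simple heuristic.
    std = 0.3 * ((k - 1) * 0.5 - 1) + 0.8
    x = np.arange(k) - (k - 1) / 2
    w = np.exp(-(x ** 2) / (2 * std ** 2))
    return w / w.sum()


def gaussian_blur(img: np.ndarray, k: int) -> np.ndarray:
    """Separable Gaussian blur of an H x W x C image with edge padding."""
    w = gaussian_kernel_1d(k)
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Blur along the height axis, then along the width axis.
    out = np.apply_along_axis(lambda v: np.convolve(v, w, mode="valid"), 0, padded)
    out = np.apply_along_axis(lambda v: np.convolve(v, w, mode="valid"), 1, out)
    return out


def distort_object(frame, mask, alpha, sigma, k, rng):
    """Distort the masked object in one frame.

    frame: float array (H, W, C) in [0, 1]; mask: bool array (H, W).
    alpha: contrast, sigma: noise intensity, k: odd Gaussian kernel size.
    """
    obj_mean = frame[mask].mean() if mask.any() else 0.0
    out = (frame - obj_mean) * alpha + obj_mean           # contrast scaling
    out = out + sigma * rng.standard_normal(frame.shape)  # additive Gaussian noise
    out = gaussian_blur(out, k)                           # blur with a k x k kernel
    out = np.clip(out, 0.0, 1.0)
    # Assumption: only the object region is distorted; the rest is copied as-is.
    return np.where(mask[..., None], out, frame)


def random_distorter(frame, mask, rng):
    """Sample (alpha, sigma, k) at random; the ranges below are hypothetical."""
    alpha = rng.uniform(0.0, 1.0)
    sigma = rng.uniform(0.0, 1.0)
    k = 2 * int(rng.integers(0, 8)) + 1  # odd sizes 1, 3, ..., 15
    return distort_object(frame, mask, alpha, sigma, k, rng), (alpha, sigma, k)


# Usage on a synthetic frame with a square object mask.
rng = np.random.default_rng(0)
frame = rng.random((16, 16, 3))
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
control, params = random_distorter(frame, mask, rng)
```

With α = 1, σ = 0, and k = 1 this sketch returns the frame unchanged, while small α, large σ, or large k progressively erase object detail. An adaptive distorter would pick these three values per editing task at inference time instead of sampling them; how it does so is not specified on this page.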

Outpainting

Outpaint the snow mountains

Outpaint the clouds

Outpaint the mountains and lakes

Outpaint the village

Color Change

Change the boat color to red

Change the bus color to green

Change the bus color to orange

Change the boat color to blue