CVPR 2026

InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space

Jiarui Wu1,2, Yujin Wang1, Ruikang Li1,2, Fan Zhang1, Mingde Yao2, Tianfan Xue2,1,3

1Shanghai AI Laboratory, 2CUHK MMLab, 3CPII under InnoHK

Distilling diffusion priors into one-step bilateral-space retouching for fast, faithful, instruction-aligned edits.

Overview

TL;DR

InstantRetouch combines diffusion-level instruction understanding with bilateral-space rendering, enabling one-step photo retouching that is fast, faithful, and practical at high resolution.

  • Fidelity first: preserves geometry and fine texture while changing color and tone.
  • One-step efficiency: runs in near-real-time with consistent behavior at 720p and 4K.
  • Strong instruction following: aligns edits with prompts without sacrificing visual quality.

Strengths

Why Better

Fidelity

Bilateral affine transforms retain structure and detail while performing meaningful local retouch edits.

Efficiency

One-step inference with bilateral rendering removes expensive iterative denoising at deployment time.

Instruction Following

Distilled teacher priors transfer prompt intent into stable image-space edits with better controllability.

Method

How It Works

Why Bilateral Space

Language-guided retouching mainly changes exposure, contrast, color tone, and local appearance rather than geometry. Bilateral space is a natural fit: it models photometric transforms explicitly while preserving high-frequency structures.
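To make the idea concrete, here is a minimal NumPy sketch of bilateral-space rendering in the general style of bilateral grids. It is an illustration under stated assumptions, not the InstantRetouch implementation: the grid layout `(gh, gw, gd, 3, 4)`, the mean-of-RGB guide, and the nearest-neighbor lookup are all choices made here for brevity, and the function names are hypothetical.

```python
import numpy as np

def slice_grid(grid, guide):
    """Look up one 3x4 affine matrix per pixel from a bilateral grid.

    grid:  (gh, gw, gd, 3, 4) affine coefficients (hypothetical layout).
    guide: (H, W) values in [0, 1] that select the intensity (depth) bin.
    Nearest-neighbor lookup is used for brevity; trilinear interpolation
    is the more common choice in practice.
    """
    gh, gw, gd = grid.shape[:3]
    H, W = guide.shape
    ys = (np.arange(H) * gh) // H                      # pixel row -> grid row
    xs = (np.arange(W) * gw) // W                      # pixel col -> grid col
    zs = np.clip((guide * gd).astype(int), 0, gd - 1)  # intensity -> depth bin
    return grid[ys[:, None], xs[None, :], zs]          # (H, W, 3, 4)

def apply_affine(image, coeffs):
    """Per-pixel affine color transform: out = A @ [r, g, b, 1]."""
    ones = np.ones(image.shape[:2] + (1,), dtype=image.dtype)
    homog = np.concatenate([image, ones], axis=-1)     # (H, W, 4)
    return np.einsum("hwij,hwj->hwi", coeffs, homog)

def render(image, grid):
    """Render an (H, W, 3) image in [0, 1] through a bilateral grid."""
    guide = image.mean(axis=-1)  # simple luminance proxy (assumption)
    return apply_affine(image, slice_grid(grid, guide))
```

Because every output pixel is an affine function of the input pixel at the same location, the transform can change color and tone but cannot move edges or invent texture, which is the structure-preserving property described above.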

Step 1: Pretrain a Multi-step Diffusion Teacher

A multi-step teacher learns robust instruction-aware editing behavior from paired retouch supervision.

Step 2: One-step Bilateral Distillation

The teacher's behavior is distilled into a one-step student through two coordinated branches: a low-resolution one-step diffusion branch and a full-resolution bilateral distillation branch.

Figure: Step 2, the one-step bilateral distillation framework.
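The intuition for pairing a low-resolution branch with full-resolution rendering can be shown with a deliberately simplified stand-in. The sketch below fits a single global 3x4 affine color transform to a low-resolution before/after pair by least squares, then applies it to the full-resolution image. This swaps in a classical least-squares fit for the learned distillation and collapses the bilateral grid to one cell; the function names are hypothetical and it is not the training procedure described on this page.

```python
import numpy as np

def fit_global_affine(src_lo, dst_lo):
    """Least-squares fit of a 3x4 matrix A with dst ~= A @ [r, g, b, 1]."""
    X = src_lo.reshape(-1, 3)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # (N, 4) homogeneous colors
    Y = dst_lo.reshape(-1, 3)                     # (N, 3) edited colors
    A_t, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (4, 3)
    return A_t.T                                  # (3, 4)

def apply_global_affine(image, A):
    """Apply the fitted transform to an image of any resolution."""
    return image @ A[:, :3].T + A[:, 3]
```

A photometric edit estimated on a small proxy transfers to the full-resolution image at the cost of one matrix multiply per pixel, while the pixel layout is left untouched.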

Evaluation

Results

Benchmark Distribution

Dataset tags cover diverse retouch intents including illumination control, color grading, portrait enhancement, and cinematic style transfer.

Quality-Performance Profile

Runtime, fidelity, and editing alignment jointly indicate a favorable operating point for practical deployment.

Metric direction: lower is better for Runtime, GMSD, DISTS, and L1; higher is better for SSIM, SC, and PQ.

Head-to-Head

Comparison

Reference

Citation

@inproceedings{wu2026instantretouch,
  title={InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space},
  author={Wu, Jiarui and Wang, Yujin and Li, Ruikang and Zhang, Fan and Yao, Mingde and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}