
The following is my selected paper list. # means co-first author and * means corresponding author. The full paper list can be found here. The papers marked in red are about image/video composition. The papers marked with are recommended.

• Li Niu#, Wen Li#, Dong Xu, "Exploiting Privileged Information from Web Data for Image Categorization", ECCV, 2014. [pdf]

An SVM-based method which can handle label noise, utilize privileged information and tackle domain distribution mismatch when learning from web data.

• Li Niu, Wen Li, Dong Xu, "Visual Recognition by Learning from Web Data: A Weakly Supervised Domain Generalization Approach", CVPR, 2015. [pdf]

A weakly supervised domain generalization approach which handles label noise and ensures domain generalization ability simultaneously.

• Li Niu, Wen Li, Dong Xu, "Multi-view Domain Generalization for Visual Recognition", ICCV, 2015. [pdf]

A multi-view domain generalization approach which incorporates Exemplar SVM and Low Rank Representation (LRR).

• Li Niu, Wen Li, Dong Xu, "Exploiting Privileged Information from Web Data for Action and Event Recognition", IJCV, 2016. [pdf]

An SVM-based method which can handle label noise, utilize privileged information and tackle domain distribution mismatch when learning from web data.

• Li Niu, Jianfei Cai, Dong Xu, "Domain Adaptive Fisher Vector for Visual Recognition", ECCV, 2016. [pdf]

A domain adaptation method specifically designed for Fisher vectors, which are obtained based on IDT (resp., proposal CNN) for human action recognition (resp., object recognition).

• Li Niu, Xinxing Xu, Lin Chen, Lixin Duan, Dong Xu, "Action and Event Recognition in Videos by Learning from Heterogeneous Web Sources", T-NNLS, 2017. [pdf]

A web-data-driven approach to learn from heterogeneous web sources including images, videos, and surrounding text.

• Li Niu, Wen Li, Dong Xu, Jianfei Cai, "Visual Recognition by Learning from Web Data via Weakly Supervised Domain Generalization", T-NNLS, 2017. [pdf]

A weakly supervised domain generalization approach which can generalize the classifier trained by using noisy web data to any unseen target domain.

• Li Niu, Wen Li, Dong Xu, Jianfei Cai, "An Exemplar-based Multi-view Domain Generalization Framework for Visual Recognition", T-NNLS, 2018. [pdf]

An exemplar-based domain generalization framework built on exemplar SVM classifiers, which exploit the consensus or complementary principle of multiple views.

• Li Niu, Ashok Veeraraghavan, Ashu Sabharwal, "Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-grained Classification", CVPR, 2018. [pdf]

The first work to combine webly supervised learning (free images and semantic information online) and zero-shot learning coherently in one unified formulation for fine-grained classification.

• Li Niu, Qingtao Tang, Ashok Veeraraghavan, Ashu Sabharwal, "Learning from Noisy Web Data with Category-level Supervision", CVPR, 2018. [pdf]

A Variational AutoEncoder (VAE)-based approach to cope with the label noise of web data with the aid of category-level semantic information, which fills the void for webly supervised learning between no strong supervision and instance-level strong supervision.

• Li Niu, Jianfei Cai, Ashok Veeraraghavan, Liqing Zhang, "Zero-Shot Learning via Category-Specific Visual-Semantic Mapping and Label Refinement", T-IP, 2019. [pdf]

A zero-shot learning framework which learns one visual-semantic mapping for each category to address the projection domain shift problem, considering that the visual-semantic mappings of different categories are considerably different.

• Jianfu Zhang, Li Niu*, Dexin Yang, Liwei Kang, Yaoyi Li, Weijie Zhao, Liqing Zhang*, "GAIN: Gradient Augmented Inpainting Network for Irregular Holes", ACM MM, 2019. [pdf]

A novel image inpainting method, which fuses gradient information in a multi-task framework and also uses gradient information to determine the filling priority.

• Yiyi Zhang, Li Niu*, Ziqi Pan, Meichao Luo, Jianfu Zhang, Dawei Cheng, Liqing Zhang*, "Exploiting Motion Information from Unlabeled Videos for Image Action Recognition", AAAI, 2020. [pdf] [arxiv]

A simple yet effective framework using unlabeled videos to facilitate image action recognition, which unifies two related research lines: self-supervised learning and motion information anticipation.

• Yi Tu, Li Niu*, Weijie Zhao, Dawei Cheng, Liqing Zhang*, "Image Cropping with Composition and Saliency Aware Aesthetic Score Map", AAAI, 2020. [pdf] [arxiv]

An aesthetic image cropping model, which produces a composition-aware and saliency-aware aesthetic score map, aiming to unveil the inner mechanism of aesthetic evaluation.

• Ruicong Xu, Li Niu*, Jianfu Zhang, Liqing Zhang*, "A Proposal-based Approach for Activity Image-to-Video Retrieval", AAAI, 2020. [pdf] [arxiv]

The first proposal-based activity image-to-video retrieval method, which introduces Graph Multi-Instance Learning (GMIL) into cross-modal retrieval and proposes a geometry-aware triplet loss.

• Yi Tu, Li Niu*, Junjie Chen, Dawei Cheng, Liqing Zhang*, "Learning from Web Data with Self-organizing Memory Module", CVPR, 2020. [pdf] [arxiv]

The first work on webly supervised learning using a memory module. We point out that web data have two types of noise (label noise and background noise), which can be addressed by our method simultaneously. We also propose a self-organizing memory (SOM) module, which makes the memory module insensitive to initialization.

• Wenyan Cong, Jianfu Zhang, Li Niu*, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang, "DoveNet: Deep Image Harmonization via Domain Verification", CVPR, 2020. [pdf] [arxiv] [dataset&code]

We release the first large-scale image harmonization dataset with four subdatasets: HCOCO, HAdobe5k, HFlickr, and Hday2night. We introduce the concept of domain verification and design a domain verification discriminator for image harmonization.

• Yan Hong, Li Niu*, Jianfu Zhang, Liqing Zhang, "MatchingGAN: Matching-based Few-shot Image Generation", ICME, 2020. [pdf] [arxiv] [code]

A metric-based GAN for few-shot image generation, analogous to matching network for few-shot classification.

• Yan Hong, Li Niu*, Jianfu Zhang, Liqing Zhang, "Beyond without Forgetting: Multi-task Learning for Classification with Disjoint Datasets", ICME, 2020. [pdf] [arxiv]

A multi-task learning method which jointly uses labeled datasets and unlabeled datasets.

• Zhangxuan Gu, Siyuan Zhou, Li Niu*, Zihan Zhao, Liqing Zhang*, "Context-aware Feature Generation For Zero-shot Semantic Segmentation", ACM MM, 2020. [pdf] [arxiv] [code]

Context-aware pixel-wise feature generation for zero-shot semantic segmentation, which can segment unseen objects in the testing stage.

• Yan Hong, Li Niu*, Jianfu Zhang, Weijie Zhao, Chen Fu, Liqing Zhang, "F2GAN: Fusing-and-Filling GAN for Few-shot Image Generation", ACM MM, 2020. [pdf] [arxiv] [code]

An attentional fusion based GAN for few-shot image generation, which fuses the high-level features of conditional images and then fills in the low-level details attended from conditional images.

• Zhangxuan Gu, Li Niu*, Haohua Zhao, Liqing Zhang*, "Hard Pixel Mining for Depth Privileged Semantic Segmentation", T-MM, 2020. [pdf] [arxiv]

Use both RGB and depth data to mine hard pixels to facilitate semantic segmentation.

• Jianfu Zhang, Li Niu*, Liqing Zhang*, "Person Re-Identification with Reinforced Attribute Attention Selection", T-IP, 2020. [pdf]

An attribute-boosted person re-identification method, in which reinforcement learning is adopted to accomplish attribute attention selection to address identity-level attribute annotation noise.

• Ziqi Pan, Li Niu*, Jianfu Zhang, Liqing Zhang*, "Disentangled Information Bottleneck", AAAI, 2021. [pdf] [arxiv] [code]

Rethinking disentangled representation from the perspective of information bottleneck.

• Zhijie Zhang, Yan Liu, Junjie Chen, Li Niu*, Liqing Zhang*, "Depth Privileged Object Detection in Indoor Scenes via Deformation Hallucination", AAAI, 2021. [pdf]

Transfer geometric deformation from depth modality to RGB modality for depth-privileged object detection.

• Liu Liu, Jiangtong Li, Li Niu*, Ruicong Xu, Liqing Zhang, "Activity Image-to-Video Retrieval by Disentangling Appearance and Motion", AAAI, 2021. [pdf]

An image-to-video retrieval method which disentangles video features into appearance features and motion features.

• Wenyan Cong, Li Niu*, Jianfu Zhang, Jing Liang, Liqing Zhang, "BargainNet: Background-Guided Domain Translation for Image Harmonization", ICME, 2021. [pdf] [arxiv] [code]

We formulate image harmonization as a background-guided domain translation task. The extracted domain code can be used to predict the inharmony level of a composite image.

• Jing Liang, Li Niu*, Liqing Zhang, "Inharmonious Region Localization", ICME, 2021. [pdf] [arxiv] [code]

The first work on detecting the inharmonious region due to inconsistent color and lighting statistics in an image.

• Shengyuan Huang, Xing Zhao, Li Niu*, Liqing Zhang, "Static Image Action Recognition with Hallucinated Fine-grained Motion Information", ICME, 2021. [pdf]

We hallucinate fine-grained high-level motion information for static images to facilitate static image action recognition.

• Lu He, Qianyu Zhou, Xiangtai Li, Li Niu*, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang*, "End-to-End Video Object Detection with Spatial-Temporal Transformers", ACM MM, 2021. [pdf] [arxiv] [code]

The first work on video object detection using transformers.

• Jiangtong Li, Wentao Wang, Junjie Chen, Li Niu*, Jianlou Si, Chen Qian, Liqing Zhang*, "Video Semantic Segmentation with Sparse Temporal Transformer", ACM MM, 2021. [pdf]

The first work on video semantic segmentation using transformers.

• Jing Liang, Li Niu*, Fengjun Guo, Teng Long, Liqing Zhang, "Visible Watermark Removal via Self-calibrated Localization and Background Refinement", ACM MM, 2021. [pdf] [arxiv] [code]

A multi-task coarse-to-fine network with self-calibrated mask refinement and mask-guided background enhancement for visible watermark removal in watermarked images.

• Wentao Wang, Jianfu Zhang, Li Niu*, Haoyu Ling, Xue Yang, Liqing Zhang*, "Parallel Multi-Resolution Fusion Network for Image Inpainting", ICCV, 2021. [pdf] [sup]

A parallel multi-resolution network based on HRNet for image inpainting with resolution-specific inpainting priority.

• Junjie Chen, Li Niu*, Liu Liu, Liqing Zhang*, "Weak-shot Fine-grained Classification via Similarity Transfer", NeurIPS, 2021. [pdf] [arxiv] [dataset&code]

A weak-shot classification approach by using transferred similarity across categories in weighted loss and graph regularizer.

• Yan Liu, Zhijie Zhang, Li Niu*, Junjie Chen, Liqing Zhang*, "Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity", NeurIPS, 2021. [pdf] [arxiv] [code]

A weak-shot object detection approach by transferring mask prior and semantic similarity across categories.

• Jiangtong Li, Liu Liu, Li Niu*, Liqing Zhang*, "Memorize, Associate and Match: Embedding Enhancement via Fine-grained Alignment for Image-Text Retrieval", T-IP, 2021. [pdf]

A novel cross-modal retrieval method named Memory-based EMBedding Enhancement for image-text Retrieval (MEMBER), which introduces global memory banks to enable fine-grained alignment and fusion in the embedding learning paradigm.

• Junjie Chen, Li Niu*, Liqing Zhang*, "Depth Privileged Scene Recognition via Dual Attention Hallucination", T-IP, 2021. [pdf]

A depth privileged classification method which uses RGB input to hallucinate two types of depth attention: post-hoc importance weight and trainable spatial transformation.

• Bo Zhang, Li Niu*, Liqing Zhang, "Image Composition Assessment with Saliency-augmented Multi-pattern Pooling", BMVC, 2021. [pdf] [arxiv] [dataset&code]

We release the first image composition assessment dataset CADB and propose a novel saliency-augmented multi-pattern pooling strategy.

• Li Niu*, Shengyuan Huang, Xing Zhao, Liwei Kang, Yiyi Zhang, Liqing Zhang, "Hallucinating Uncertain Motion and Future for Static Image Action Recognition", CVIU, 2022. [pdf]

We hallucinate multiple plausible motion features and future visual features for a static image to facilitate the image action recognition task.

• Jiangtong Li, Zhixin Ling, Li Niu*, Liqing Zhang*, "Zero-shot Sketch-based Image Retrieval with Structure-aware Asymmetric Disentanglement", CVIU, 2022. [pdf] [arxiv]

A structure-aware retrieval method via asymmetric disentanglement, in which image features are disentangled into structure features and appearance features while sketch features are only projected to structure space.

• Yan Hong, Li Niu*, Jianfu Zhang, "Shadow Generation for Composite Image in Real-world Scenes", AAAI, 2022. [pdf] [arxiv] [dataset&code]

This is the first work on generating shadows for the inserted objects in complex real-world scenes. We release the first real-world shadow generation dataset and propose a novel two-stage approach: shadow mask generation and shadow filling.

• Jiangtong Li, Li Niu*, Liqing Zhang*, "Action-Aware Embedding Enhancement for Image-Text Retrieval", AAAI, 2022. [pdf]

A novel image-text retrieval method which emphasizes the motion information and enriches the image/text features with motion-similar text features.

• Jing Liang, Li Niu*, Penghao Wu, Fengjun Guo, Teng Long, "Inharmonious Region Localization by Magnifying Domain Discrepancy", AAAI, 2022. [pdf] [arxiv] [code]

An inharmonious region localization method which learns a color mapping to magnify the domain discrepancy between inharmonious region and background for ease of localization.

• Zhangxuan Gu, Ziyuan Zhou, Li Niu*, Zihan Zhao, Liqing Zhang*, "From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation", T-NNLS, 2022. [pdf] [arxiv] [code]

We extend pixel-wise feature generation (CaGNetv1) to patch-wise feature generation (CaGNetv2) by generating plausible category patches for zero-shot semantic segmentation.

• Wentao Wang, Li Niu*, Jianfu Zhang, Xue Yang, Liqing Zhang*, "Dual-path Image Inpainting with Auxiliary GAN Inversion", CVPR, 2022. [pdf]

A combination of a GAN-inversion-based image inpainting method and a feedforward image inpainting method.

• Jiangtong Li, Li Niu*, Liqing Zhang*, "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR, 2022. [pdf] [arxiv] [dataset&code]

A public dataset for the causal video question-answering task, which includes four types of questions: description, explanation, prediction, and counterfactual.

• Wenyan Cong, Xinhao Tao, Li Niu*, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang, "High-Resolution Image Harmonization via Collaborative Dual Transformations", CVPR, 2022. [pdf] [arxiv] [dataset&code]

We combine pixel-to-pixel transformation and RGB-to-RGB transformation in a coherent framework for high-resolution image harmonization.

• Xinyuan Lu, Shengyuan Huang, Li Niu*, Wenyan Cong, Liqing Zhang*, "Deep Video Harmonization with Color Mapping Consistency", IJCAI, 2022. [pdf] [arxiv] [dataset&code]

We release the first video harmonization dataset HYouTube. We utilize color mapping consistency to alleviate the flickering artifacts for video harmonization.

• Zhixin Ling, Zhen Xing, Jiangtong Li, Li Niu*, "Multi-Level Region Matching for Fine-Grained Sketch-Based Image Retrieval", ACM MM, 2022. [pdf]

A method to establish the fine-grained correspondence between sketches and images for fine-grained sketch-based image retrieval.

• Yan Hong, Li Niu*, Jianfu Zhang, Liqing Zhang, "Few-shot Image Generation Using Discrete Content Representation", ACM MM, 2022. [pdf] [arxiv]

An attempt to unify few-shot image translation and few-shot image generation with discrete content map and style-conditioned content map autoregression.

• Yan Hong, Li Niu*, Jianfu Zhang, Liqing Zhang, "DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta", ECCV, 2022. [pdf] [arxiv] [code]

Learning sample-specific intra-category delta information, which is combined with input image to generate a new image from the same category.

• Bo Zhang, Li Niu*, Xing Zhao, Liqing Zhang, "Human-centric Image Cropping with Partition-aware and Content-preserving Features", ECCV, 2022. [pdf] [arxiv] [code]

Human-centric image cropping with human-centric partition and important content preservation.

• Siyuan Zhou, Liu Liu, Li Niu*, Liqing Zhang, "Learning Object Placement via Dual-path Graph Completion", ECCV, 2022. [pdf] [arxiv] [code]

We reformulate object placement as a graph completion problem. Specifically, background nodes have both content features and placements, while the inserted foreground node only has content feature. Thus, we need to estimate the missing placement for the foreground node to complete the graph.

• Junjie Chen, Li Niu*, Siyuan Zhou, Jianlou Si, Chen Qian, Liqing Zhang*, "Weak-shot Semantic Segmentation via Dual Similarity Transfer", NeurIPS, 2022. [pdf] [arxiv] [code]

By decomposing semantic segmentation task into two subtasks (proposal classification and proposal segmentation), we transfer proposal-pixel similarity and pixel-pixel similarity for weak-shot semantic segmentation.

• Ziqi Pan, Li Niu*, Liqing Zhang*, "UniGAN: Reducing Mode Collapse in GANs using a Uniform Generator", NeurIPS, 2022. [pdf] [code]

We propose a new type of generative diversity named uniform diversity, which relates to a newly proposed type of mode collapse where the generated samples are distributed nonuniformly over the data manifold.

• Junyan Cao, Wenyan Cong, Li Niu*, Jianfu Zhang, Liqing Zhang, "Deep Image Harmonization by Bridging the Reality Gap", BMVC, 2022. [pdf] [arxiv] [dataset&code]

We construct an image harmonization dataset RdHarmony using 3D rendering techniques. We also propose a cross-domain network to leverage the training images from both the rendered domain and the real domain.

• Penghao Wu, Li Niu*, Jing Liang, Liqing Zhang, "Inharmonious Region Localization via Recurrent Self-Reasoning", BMVC, 2022. [pdf] [arxiv]

We formulate inharmonious region localization as a clustering problem and propose a recurrent self-reasoning module to update the inharmonious/background clusters iteratively.

• Penghao Wu, Li Niu*, Liqing Zhang, "Inharmonious Region Localization with Auxiliary Style Feature", BMVC, 2022. [pdf] [arxiv] [code]

We extract discriminative style features to facilitate inharmonious region localization in two aspects: feature aggregation and style voting map.

• Siyuan Zhou, Li Niu*, Jianlou Si, Chen Qian, Liqing Zhang, "Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary", BMVC, 2022. [pdf] [arxiv] [code]

A simple weak-shot semantic segmentation baseline by transferring semantic affinity and boundary from base categories to novel categories.

• Xing Zhao, Li Niu*, Liqing Zhang, "Visible Watermark Removal with Dynamic Kernel and Semantic-aware Propagation", BMVC, 2022. [pdf]

We propose watermark-specific dynamic kernel and semantic-aware propagation to cope with diverse watermarks and complicated semantics.

• Yuxuan Duan, Yan Hong, Li Niu*, Liqing Zhang*, "Few-Shot Defect Image Generation via Defect-Aware Feature Manipulation", AAAI, 2023. [pdf] [arxiv] [code]

A StyleGAN-based few-shot defect image generation method which adds residual features to the defect region.

• Junyan Cao, Yan Hong, Li Niu*, "Painterly Image Harmonization in Dual Domains", AAAI, 2023. [pdf] [arxiv] [code]

A painterly image harmonization method which adjusts the style of composite foreground in both spatial domain and frequency domain.

• Siyuan Zhou, Chunru Zhan, Biao Wang, Tiezheng Ge, Yuning Jiang, Li Niu*, "Video Object of Interest Segmentation", AAAI, 2023. [pdf] [arxiv]

We propose a new task named video object of interest segmentation: given an object of interest shown in an image, we aim to segment this object across frames in a video. We also design a baseline method.

• Junjie Chen, Li Niu*, Jianfu Zhang, Jianlou Si, Chen Qian, Liqing Zhang*, "Amodal Instance Segmentation via Prior-guided Expansion", AAAI, 2023. [pdf]

For amodal instance segmentation which infers the amodal mask (both the visible part and occluded part) of each object instance, we propose box-level (resp., pixel-level) expansion for amodal box (resp., mask) prediction.

• Ziqi Pan, Li Niu*, Liqing Zhang*, "Geometric Inductive Biases for Identifiable Unsupervised Learning of Disentangled Representations", AAAI, 2023. [pdf]

We propose novel geometric inductive biases from the manifold perspective for unsupervised disentangling.

• Ziqi Pan, Jianfu Zhang, Li Niu*, Liqing Zhang*, "Isometric Manifold Learning using Hierarchical Flow", AAAI, 2023. [pdf] [code]

We propose the Hierarchical Flow (HF) model constrained by isometric regularizations for manifold learning that combines manifold learning goals such as dimensionality reduction, inference, sampling, projection and density estimation into one unified framework.

• Wentao Wang, Lu He, Li Niu*, Jianfu Zhang, Yue Liu, Haoyu Ling, Liqing Zhang*, "Diverse image inpainting with disentangled uncertainty", Pattern Recognition, 2023. [pdf]

We disentangle structure uncertainty and appearance uncertainty for diverse image inpainting.

• Chao Wang, Li Niu*, Bo Zhang, Liqing Zhang*, "Image Cropping with Spatial-aware Feature and Rank Consistency", CVPR, 2023. [pdf]

We propose spatial-aware feature to encode the spatial relationship between candidate crops and aesthetic elements. We also transfer pair-wise ranking knowledge in the transductive setting.

• Li Niu*, Xing Zhao, Bo Zhang, Liqing Zhang, "Fine-grained Visible Watermark Removal", ICCV, 2023. [pdf]

We advance the image-specific dynamic network towards a part-specific dynamic network for visible watermark removal.

• Li Niu*, Linfeng Tan, Xinhao Tao, Junyan Cao, Fengjun Guo, Teng Long, Liqing Zhang, "Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation", ICCV, 2023. [pdf] [arxiv] [dataset]

We propose GiftNet with a Globally guIded Feature Transformation (GIFT) module and relation distillation from a reconstruction network. We also contribute the ccHarmony dataset, which approximates illumination variation with a color checker.

• Li Niu*, Junyan Cao, Wenyan Cong, Liqing Zhang, "Deep Image Harmonization with Learnable Augmentation", ICCV, 2023. [pdf] [arxiv] [code]

We propose learnable augmentation to enhance the illumination diversity for a target domain with limited data.

• Jiangtong Li, Li Niu*, Liqing Zhang*, "Knowledge Proxy Intervention for Deconfounded Video Question Answering", ICCV, 2023. [pdf]

We propose a model-agnostic framework which introduces an extra knowledge proxy variable in the causal graph to cut the backdoor path and remove the confounder.

• Bo Zhang, Jiacheng Sui, Li Niu*, "Foreground Object Search by Distilling Composite Image Feature", ICCV, 2023. [pdf] [arxiv] [dataset&code]

We contribute two datasets for Foreground Object Search (FOS) task: S-FOSD dataset with synthetic composite images and R-FOSD dataset with real composite images. We also propose a teacher-student network, which enforces the interaction output between foreground and background features in the student network to match the composite image feature from the teacher network.

• Linfeng Tan, Jiangtong Li, Li Niu*, Liqing Zhang*, "Deep Image Harmonization in Dual Color Spaces", ACM MM, 2023. [pdf] [arxiv] [code]

Image harmonization using both correlated RGB color space and decorrelated Lab color space.

• Jieteng Yao, Junjie Chen, Li Niu*, Bin Sheng*, "Scene-aware Human Pose Generation using Transformer", ACM MM, 2023. [pdf] [arxiv]

A transformer-based approach to generate plausible human poses for a target location in the scene image.

• Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu*, Liqing Zhang*, "Painterly Image Harmonization using Diffusion Model", ACM MM, 2023. [pdf] [arxiv] [code]

A diffusion model approach for painterly image harmonization, which outperforms PHDNet significantly when the background has dense textures or abstract styles.

• Xudong Wang, Li Niu*, Junyan Cao, Yan Hong, Liqing Zhang*, "Painterly Image Harmonization via Adversarial Residual Learning", WACV, 2024. [pdf] [arxiv]

An adversarial painterly image harmonization approach, which applies adversarial learning to encoder feature maps.

• Yan Hong, Li Niu*, Jianfu Zhang, "Arbitrary Style Transfer with Prototype-Based Channel Alignment", ICASSP, 2024. [pdf]

We propose a decoration branch with prototype-based channel alignment module to align the channels between style feature and content feature.

• Li Niu*, Yan Hong, Junyan Cao, Liqing Zhang, "Progressive Painterly Image Harmonization from Low-level Styles to High-level Styles", AAAI, 2024. [pdf] [arxiv] [code]

We consider different levels of styles and perform painterly image harmonization from low-level styles to high-level styles.

• Li Niu*, Junyan Cao, Yan Hong, Liqing Zhang, "Painterly Image Harmonization by Learning from Painterly Objects", AAAI, 2024. [pdf] [arxiv] [dataset&code]

We attempt to hallucinate the target style of foreground object based on background style and foreground object information. We also contribute an Arto Dataset with pairs of similar artistic objects and photorealistic objects.

• Yuxuan Duan, Li Niu*, Yan Hong, Liqing Zhang*, "WeditGAN: Few-Shot Image Generation via Latent Space Relocation", AAAI, 2024. [pdf] [arxiv] [code]

A simple and effective few-shot image generation approach by relocating the distribution of source latent spaces.

• Xinhao Tao, Junyan Cao, Yan Hong, Li Niu*, "Shadow Generation with Decomposed Mask Prediction and Attentive Shadow Filling", AAAI, 2024. [pdf] [arxiv] [dataset]

We contribute a rendered dataset for the shadow generation task. We also propose to decompose shadow mask prediction into box prediction and shape prediction.

• Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu*, "Shadow Generation for Composite Image Using Diffusion Model", CVPR, 2024. [pdf] [arxiv] [dataset&code]

We contribute a real-world dataset DESOBAv2 for the shadow generation task and a diffusion-based shadow generation model.

• Zhaohe Liao, Jiangtong Li, Li Niu*, Liqing Zhang*, "Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering", CVPR, 2024. [pdf] [arxiv]

We propose a VidQA framework with two additional modules: the video aligner and the answer aggregator. The video aligner hierarchically aligns the question with the video clips. The answer aggregator takes in the questions from the question decomposition graph and deduces the answers to the questions.

• Ruodai Cui, Li Niu*, Guosheng Hu, "Unsupervised Exposure Correction", ECCV, 2024. [pdf] [arxiv] [code]

An unsupervised exposure correction method using freely available paired data from an emulated image signal processing pipeline.

• Zhaohe Liao, Jiangtong Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Li Niu*, Liqing Zhang*, "COIN-Matting: Confounder Intervention for Image Matting", ECCV, 2024. [pdf]

We model the image matting task from the perspective of causal inference to address contrast bias and transparency bias.

• Yuxuan Duan, Yan Hong, Bo Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Li Niu*, Liqing Zhang*, "DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning", NeurIPS, 2024. [pdf] [arxiv] [code]

We propose DomainGallery, a few-shot domain-driven image generation method which aims at finetuning pretrained Stable Diffusion on few-shot target datasets in an attribute-centric manner.

• Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao*, Li Niu*, Xinyuan Chen*, Yaohui Wang*, "The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation", CVPR, 2025. [pdf] [arxiv] [code]

A retrieval-augmented prompt optimization framework for text-to-video generation.

• Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu*, Linfeng Zhang*, "Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning", CVPR, 2025. [pdf] [arxiv] [code]

A Decouple-then-Merge framework to resolve the gradient conflicts across different timesteps. Starting from a pretrained model, we finetune separate models tailored to specific timesteps and then merge them into a single model.

• Haonan Zhao, Qingyang Liu, Xinhao Tao, Li Niu*, Guangtao Zhai, "Shadow Generation Using Diffusion Model with Geometry Prior", CVPR, 2025. [pdf] [code]

A diffusion-based shadow generation method by leveraging geometry priors (rotated shadow bounding box and retrieved shadow shapes).

• Bingjie Gao, Bo Zhang, Li Niu*, "Object Placement for Anything", ICME, 2025. [pdf] [arxiv]

A semi-supervised framework which can exploit a large-scale unlabeled dataset to promote the generalization ability of discriminative object placement models.

• Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Anyi Rao, Jiaqi Wang, Li Niu*, "Light-A-Video: Training-free Video Relighting via Progressive Light Fusion", ICCV, 2025. [pdf] [arxiv] [code]

We propose Light-A-Video, a training-free approach to achieve temporally smooth video relighting, which is adapted from image relighting models and equipped with two key modules to enhance lighting consistency.

• Junjie Chen, Zeyu Luo, Wenhui Jiang, Zezheng Liu, Li Niu*, Yuming Fang, "Weak-shot Pose Estimation via Keyness and Correspondence Transfer", NeurIPS, 2025. [pdf]

A weak-shot pose estimation method where multiple novel classes are learned from unlabeled images with the help of labeled base classes.

• Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu*, "CareCom: Generative Image Composition with Calibrated Reference Features", AAAI, 2026. [pdf] [arxiv] [code]

We propose to calibrate the global and local features of foreground reference images to make them compatible with the background information.

• Shuochen Chang, Xiaofeng Zhang, Qingyang Liu, Li Niu*, "D³ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs", AAAI, 2026. [pdf] [arxiv] [code]

A decider-guided dynamic token merging method that dynamically merges redundant visual tokens at different denoising steps to accelerate inference in Diffusion MLLMs.

• Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu*, Guang Chen, Liqing Zhang*, Changjun Jiang, "Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering", T-PAMI, 2026. [pdf] [code]

We propose the Question Parsing, Video Alignment and Answer Aggregation framework (QPVA3), which leverages a compositional graph to drive visual and logical reasoning in VideoQA.

• Qingyang Liu, Jiangtong Li*, Zelin Peng, Shaobo Wang, Zhaohe Liao, Shuochen Chang, Bingjie Gao, Haonan Zhao, Mu Liu, Jidong Jiang, Li Niu*, "Bridging Visual Dynamics and Narrative Reasoning: Multimodal Large Language Models for Short Drama Quality Assessment", WWW, 2026. [pdf]

We define a user-centric quality indicator (hot value) and develop the first interpretable MLLM for short drama quality assessment.

• Qingyang Liu, Bingjie Gao, Weiheng Huang, Jun Zhang, Zhongqian Sun, Wei Yang, Fengrui Liu, Zelin Peng, Qianli Ma, Shuai Yang, Zhaohe Liao, Haonan Zhao, Li Niu*, "AnimateScene: Camera-controllable Animation in Any Scene", ICASSP, 2026. [pdf] [arxiv]

AnimateScene takes in a scene image, a human image, a motion clip, and a user-defined camera path, producing a rendered video.

• Yujie Zhou, Pengyang Ling, Jiazi Bu, Yibin Wang, Yuhang Zang, Jiaqi Wang, Li Niu*, Guangtao Zhai, "Fine-Grained GRPO for Precise Preference Alignment in Flow Models", CVPR, 2026. [pdf] [arxiv] [code]

A novel framework called Granular-GRPO (GRPO) which enables fine-grained and comprehensive evaluation of sampling directions in the RL training of flow models.

• Haonan Zhao, Qingyang Liu, Jiaxuan Chen, Li Niu*, "Reflection Generation for Composite Image Using Diffusion Model", ICME, 2026. [arxiv] [dataset&code]

We release the first large-scale object reflection dataset DEROBA and propose a diffusion-based object reflection generation model.

• Yujie Zhou, Pengyang Ling, Jiazi Bu, Bingjie Gao, Li Niu*, "Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier", ICME, 2026. [arxiv]

We propose a plug-and-play module that leverages video diffusion priors to guide the preceding image generation models, so that the generated images could be aligned with downstream requirements.

• Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma, Qingyang Liu, Zhaohe Liao, Yibo Miao, Li Niu*, "Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention", ACL, 2026. [pdf] [arxiv]

We present a systematic analysis to reveal that latent vectors encode faithful compressed reasoning steps with early latent vectors as key causal hubs, based on which we design training-free decode-time interventions to refine latent reasoning.

• Qingyang Liu, Bingjie Gao, Canmiao Fu, Zhipeng Huang, Chen Li, Feng Wang, Shuochen Chang, Shaobo Wang, Yali Wang, Keming Ye, Jiangtong Li, Li Niu*, "Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners", ICML, 2026. [arxiv] [code]

We propose an interleaved understanding-generation pipeline including direct generation for simple cases, self-reflection for quality refinement, and multi-step planning for decomposing complex scenarios.

• Jiacheng Sui, Tianyu Hao, Bingjie Gao, Li Niu*, Guangtao Zhai, "In-context Region-based Drag: Drag Any Region to Any Shape", ECCV, 2026. [arxiv] [dataset&code]

We propose a novel In-Context Region-based Drag (ICRDrag) method under the in-context learning framework. We also construct the Paired Region Dataset (PRD), a large-scale dataset with paired masks and images.