PersonaMagic, a real-time high-fidelity face editing method, seamlessly generates new character, style, or scene images from portraits

#News ·2025-01-07

This article is reprinted with authorization from the AIGC Studio public account. Please contact the original source for permission to reprint.

Today we introduce PersonaMagic, a high-fidelity, real-time face editing method that improves face customization through staged text conditioning and dynamic embedding learning. The technique uses a time-dynamic cross-attention mechanism to capture facial features at different stages of generation, preserving as much identity information as possible when producing personalized images. In comparative experiments, PersonaMagic outperforms current state-of-the-art methods in both quantitative and qualitative evaluations, and it shows flexibility and robustness across a variety of scenarios and styles.

Figure: PersonaMagic seamlessly generates images of a new character, style, or scene from a portrait provided by the user. By learning stage-regulated embeddings with the Tandem Equilibrium strategy, the approach accurately captures and represents unseen concepts, faithfully creating personas that match the given prompts while minimizing identity distortion.

Related links

  • Paper: http://arxiv.org/abs/2412.15674v1
  • Code: https://github.com/xzhe-Vision/PersonaMagic

Paper introduction

PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium

Abstract

Personalized image generation has made significant progress in adapting content to new concepts. However, a persistent challenge remains: balancing accurate reconstruction of unseen concepts against editability according to the prompt, especially when dealing with the complex nuances of facial features. In this study, we delve into the temporal dynamics of the text-to-image conditioning process, highlighting the critical role of stage partitioning in introducing new concepts. We present PersonaMagic, a stage-regulated generative technique designed for high-fidelity face customization. Using a simple MLP network, our approach learns a series of embeddings within a specific timestep interval to capture facial concepts. In addition, we develop a Tandem Equilibrium mechanism that adjusts the self-attention responses in the text encoder, balancing text description and identity preservation and improving both. Extensive experiments confirm that PersonaMagic outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Its robustness and flexibility are also validated in non-facial domains, and it can serve as a plug-in that enhances the performance of pre-trained personalization models.

Method

Figure: Overview of the pipeline. Given an image, a series of embeddings is learned in the dynamic stage to capture identity information effectively, while a fixed embedding is used in the static stage. The proposed Tandem Equilibrium (TE) strategy is applied to the text encoder so that personalized results align more closely with the text description.
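The stage split described in the overview can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dimensions, the `DYNAMIC_RANGE` interval, the schedule length, the choice to predict an offset on top of a base embedding, and all function names are assumptions made for illustration. The only ideas taken from the article are that a simple MLP produces timestep-dependent embeddings in the dynamic stage and that a fixed embedding is used in the static stage.

```python
import math
import random

EMBED_DIM = 8                 # toy size; real text-encoder embeddings are much larger
HIDDEN_DIM = 16               # hypothetical hidden width
NUM_TIMESTEPS = 1000          # assumed diffusion schedule length
DYNAMIC_RANGE = (400, 1000)   # hypothetical dynamic-stage interval [lo, hi)


def init_mlp(seed=0):
    """Random weights for a two-layer MLP: scalar timestep -> embedding offset."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN_DIM)]
    b1 = [0.0] * HIDDEN_DIM
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(HIDDEN_DIM)]
          for _ in range(EMBED_DIM)]
    b2 = [0.0] * EMBED_DIM
    return w1, b1, w2, b2


def mlp_offset(params, t):
    """Forward pass: normalize t, apply a tanh hidden layer, then a linear layer."""
    w1, b1, w2, b2 = params
    x = t / NUM_TIMESTEPS
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def concept_embedding(params, base, t):
    """Timestep-dependent embedding in the dynamic stage, fixed embedding otherwise."""
    lo, hi = DYNAMIC_RANGE
    if lo <= t < hi:
        offset = mlp_offset(params, t)
        return [b + o for b, o in zip(base, offset)]
    return list(base)  # static stage: the same embedding at every timestep
```

In a training loop, the weights returned by `init_mlp` would be the learned parameters, and `concept_embedding` would supply the vector for the new concept's placeholder token at each denoising step.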

Figure: Neglected semantics result in weak attention. The attention weights are marked in the lower-left corner of each cross-attention map.
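For readers unfamiliar with the quantity shown in such maps, the sketch below computes standard scaled dot-product attention weights between image queries and text-token keys. It is a generic illustration rather than the paper's code: the function names are hypothetical, and averaging over queries to obtain a single per-token number is an assumption, since the article does not say how the printed weights are aggregated.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one image query and each text-token key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_token_weight(queries, keys, token_index):
    """Average attention that a set of image queries assigns to one text token."""
    per_query = [attention_weights(q, keys)[token_index] for q in queries]
    return sum(per_query) / len(per_query)
```

A token whose key aligns poorly with the image queries receives a small average weight, which is the situation the caption describes as neglected semantics.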

Figure: Illustration of the proposed Tandem Equilibrium strategy.

Results

Figure: Qualitative comparison with state-of-the-art methods on celebrity identities.

Figure: Qualitative comparison with state-of-the-art methods on non-celebrity identities.

Figure: Customization results with and without L_TE (the Tandem Equilibrium loss) during training. The attention weights are marked in the lower-left corner of each cross-attention map.

Figure: Qualitative ablation of different model variants.

Figure: The method can be applied to various downstream tasks. From top to bottom: localized customization, presentation modification, and composite generation.

PersonaMagic can be adapted to non-facial domains, demonstrating its versatility beyond facial content.

Integrating PersonaMagic into a pre-trained personalization model can improve the facial details in the results.


Conclusion

PersonaMagic, the method introduced in this paper, is a high-fidelity face customization technique built on a stage-regulated text conditioning strategy that is derived from a comprehensive analysis of the conditioning process. A lightweight network implements this regulation through dynamic word embeddings, capturing identity information effectively while avoiding overfitting. In addition, a Tandem Equilibrium loss is proposed to resolve the trade-off between text alignment and identity preservation. Extensive experiments show that the method outperforms state-of-the-art approaches in both fidelity and editability, and that it is effective in a variety of downstream customization tasks.
