Real-time high-fidelity face editing method PersonaMagic seamlessly generates new character, style, or scene images from portraits - News - Artificial Intelligence Global Cooperation Alliance

#News ·2025-01-07

This article is reprinted with authorization from the AIGC Studio public account; please contact the source for permission to reprint.

Today we introduce PersonaMagic, a high-fidelity, real-time face editing method that improves face customization through stage-wise text conditioning and dynamic embedding learning. The technique uses a time-dynamic cross-attention mechanism to capture facial features at different stages of generation, preserving as much identity information as possible in the personalized images. In comparative experiments, PersonaMagic outperforms current state-of-the-art methods in both quantitative and qualitative evaluations, and it proves flexible and robust across a variety of scenarios and styles.

PersonaMagic seamlessly generates images of a new character, style, or scene from a user-provided portrait. By learning stage-regulated embeddings with a tandem equilibrium strategy, the approach accurately captures and represents unseen concepts, faithfully creating personas that match the given prompts while minimizing identity distortion.

Paper introduction

PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium

Abstract

Personalized image generation has made significant progress in adapting content to new concepts. However, a persistent challenge remains: balancing accurate reconstruction of unseen concepts against prompt-driven editability, especially given the complex nuances of facial features. In this study, we examine the temporal dynamics of the text-to-image conditioning process and highlight the critical role of stage division in introducing new concepts. We present PersonaMagic, a stage-regulated generation technique designed for high-fidelity face customization. Using a simple MLP network, our approach learns a series of embeddings within specific time-step intervals to capture facial concepts. In addition, we develop a Tandem Equilibrium (TE) mechanism that adjusts the self-attention responses in the text encoder, balancing text description and identity preservation and thereby improving both. Extensive experiments confirm that PersonaMagic outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Its robustness and flexibility are further validated in non-facial domains, and it can also serve as a valuable plug-in to enhance the performance of pre-trained personalization models.
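The abstract describes tandem equilibrium only at a high level: it adjusts self-attention responses in the text encoder so that the learned identity token neither dominates nor drowns out the rest of the prompt. The Python sketch below illustrates one way such an imbalance could be measured. The attention-matrix layout, the function names `attention_received` and `balance_penalty`, and the squared-gap penalty are all hypothetical choices for illustration, not the paper's actual loss.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_received(attn):
    """Average attention each token (key) receives across all query rows.

    attn is a square list of lists: attn[q][k] is the weight that query
    token q assigns to key token k, and each row sums to 1.
    """
    n = len(attn)
    return [sum(row[k] for row in attn) / n for k in range(n)]


def balance_penalty(attn, identity_idx):
    """HYPOTHETICAL balance penalty (not the paper's formulation).

    Squared gap between the attention received by the identity token and
    the mean attention received by the other prompt tokens. A large value
    means the identity token dominates (or is neglected by) the prompt.
    """
    received = attention_received(attn)
    others = [a for i, a in enumerate(received) if i != identity_idx]
    gap = received[identity_idx] - sum(others) / len(others)
    return gap * gap
```

For example, `balance_penalty([softmax([4.0, 1.0, 1.0])] * 3, identity_idx=0)` is large because every query attends mostly to token 0, while uniform attention gives a penalty of zero. A real training objective would compute this on the text encoder's attention tensors and add it, with some weight, to the diffusion loss; both of those details are assumptions here.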

Method

Overview of the pipeline. Given an image, we learn a series of embeddings in the dynamic stage to effectively capture identity information, while using a fixed embedding in the static stage. The proposed TE strategy is applied to the text encoder to further align personalized results with the text description.
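The split into a dynamic and a static stage can be pictured as a lookup keyed on the diffusion time step: inside the dynamic interval a small MLP produces a time-dependent word embedding, and outside it a single fixed embedding is reused. The sketch below is a minimal pure-Python illustration under stated assumptions. The class name `StageRegulatedEmbedding`, the dynamic interval `(600, 1000)`, the sinusoidal time features, and the layer sizes are hypothetical, since the article does not give these details.

```python
import math
import random


class StageRegulatedEmbedding:
    """HYPOTHETICAL sketch of stage-regulated word embeddings.

    Inside the dynamic interval [lo, hi) a tiny two-layer MLP maps the time
    step to an embedding; every other time step (the static stage) reuses
    one fixed embedding. Interval and sizes are illustrative assumptions.
    """

    def __init__(self, dim=8, hidden=16, dynamic_range=(600, 1000), seed=0):
        if dim % 2:
            raise ValueError("dim must be even for the sin/cos time features")
        rng = random.Random(seed)
        self.dim = dim
        self.lo, self.hi = dynamic_range
        # MLP weights: time features (dim) -> hidden -> embedding (dim).
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
        self.b2 = [0.0] * dim
        # One embedding shared by all static-stage time steps.
        self.static_embedding = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def time_features(self, t):
        """Sinusoidal encoding of time step t, of length dim."""
        half = self.dim // 2
        freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
        return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

    def __call__(self, t):
        if not self.lo <= t < self.hi:
            # Static stage: return a copy of the fixed embedding.
            return list(self.static_embedding)
        # Dynamic stage: tanh hidden layer, then a linear output layer.
        x = self.time_features(t)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sum(w * hv for w, hv in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]
```

Calling `StageRegulatedEmbedding()(700)` and `(800)` yields different vectors, while `(100)` and `(200)` both return the same static embedding. In training, the MLP weights and the static embedding would be the only optimized parameters while the diffusion model stays frozen; the sketch covers only the forward lookup.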

Neglected semantics lead to weak attention responses. Attention weights are marked in the lower-left corner of each cross-attention map.
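The captions report a single attention weight per cross-attention map. A common way to obtain such a number is to take the softmax-normalized scaled dot-product weights that image positions (queries) assign to prompt tokens (keys), then average the column for one token. The sketch below assumes that definition. The function names are hypothetical, and the paper's exact aggregation may differ.

```python
import math


def cross_attention(queries, keys):
    """Scaled dot-product cross-attention weights.

    queries: list of image-position vectors, each of length d
    keys:    list of text-token vectors, each of length d
    Returns weights[q][k], each row softmax-normalized over text tokens.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def token_attention_weight(weights, token_idx):
    """Mean weight that all image positions assign to one text token."""
    return sum(row[token_idx] for row in weights) / len(weights)
```

A token whose mean weight stays near zero is effectively ignored by the image features, which is the "neglected semantics" failure the caption describes.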

Illustration of the proposed tandem equilibrium.

Results

Qualitative comparison with state-of-the-art methods on celebrities.

Qualitative comparison with state-of-the-art methods on non-celebrities.

Customization results with and without the L_TE loss during training. Attention weights are marked in the lower-left corner of each cross-attention map.

Qualitative ablation of different model variants.

This method can be applied to various downstream tasks. From top to bottom: localized customization, presentation modification, and composite generation.

PersonaMagic can be adapted to non-facial domains, demonstrating its versatility beyond facial content.

Integrating PersonaMagic into a pre-trained personalization model can improve the facial details in the results.

Conclusion

PersonaMagic, introduced in this paper, is a high-fidelity face customization technique built on a stage-regulated text conditioning strategy grounded in a comprehensive analysis of the conditioning process. A lightweight network implements this regulation mechanism through dynamic word embeddings, capturing identity information effectively while avoiding overfitting. In addition, a tandem equilibrium loss is proposed to resolve the trade-off between text alignment and identity preservation. Extensive experiments show that the method outperforms state-of-the-art methods in both fidelity and editability and confirm its effectiveness in various downstream customization tasks.
