Xiamen University and NetEase propose StoryWeaver, a unified model for high-quality story visualization with consistent given characters

#News ·2025-01-07

This article is reprinted with permission from the AIGC Studio public account. Please contact the original source before reprinting.

Xiamen University, together with NetEase, has proposed StoryWeaver, a single unified model that produces high-quality story visualizations for a given set of characters. It generates images that match the story text while keeping each character consistent across scenes. The method consists of three main steps:

  • 1. Character Graph construction: A Character Graph (CG) gives a structured representation of the characters, events, and attributes in the story. Characters are object nodes, and attribute nodes are attached to the characters they describe. Edges encode the relationships between characters, so the graph forms a comprehensive knowledge network.
  • 2. Customized generation: Customization via Character Graph (C-CG) produces detailed scene descriptions that capture each character's details and interactions. A vision-language model (VLM) extracts rich semantic information from the images, and a scene graph parser extracts event-related semantics.
  • 3. Knowledge-enhanced spatial guidance: Knowledge-Enhanced Spatial Guidance (KE-SG) is introduced into the cross-attention mechanism to adjust the attention maps and keep characters consistent during generation. External knowledge is used to refine the positions and relationships of the characters in the image, which improves multi-character generation quality.
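The first step can be pictured as a small data structure. The following is a minimal sketch of a character graph, assuming a plain in-memory representation; the class name `CharacterGraph`, its methods, and the example characters and attributes are illustrative and are not taken from the StoryWeaver code or its benchmark.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterGraph:
    """Toy character graph: object nodes, attribute nodes, relation edges.

    Hypothetical structure for illustration, not the paper's implementation.
    """
    # character name -> attribute strings attached to that character
    attributes: dict = field(default_factory=dict)
    # (subject, relation, object) triples connecting two characters
    relations: list = field(default_factory=list)

    def add_character(self, name, attrs=()):
        # Create the object node if needed and attach its attribute nodes.
        self.attributes.setdefault(name, []).extend(attrs)

    def add_relation(self, subject, relation, obj):
        # Both endpoints must already exist as object nodes.
        for node in (subject, obj):
            if node not in self.attributes:
                raise KeyError(f"unknown character: {node}")
        self.relations.append((subject, relation, obj))

    def neighbors(self, name):
        """Characters that share an edge with `name`, in insertion order."""
        out = []
        for s, _, o in self.relations:
            if s == name and o not in out:
                out.append(o)
            elif o == name and s not in out:
                out.append(s)
        return out


# Illustrative values only.
graph = CharacterGraph()
graph.add_character("Pororo", ["blue penguin", "aviator goggles"])
graph.add_character("Crong", ["green dinosaur"])
graph.add_relation("Pororo", "plays with", "Crong")
```

Keeping attributes attached to their character, rather than in one flat caption, is what lets later stages look up exactly which details belong to which identity.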

Figure: StoryWeaver produces high-quality story visualizations for given characters within a unified model.

Related links

  • Paper: http://arxiv.org/abs/2412.07375v2
  • Home Page: https://github.com/Aria-Zhangjl/StoryWeaver

Paper overview

Figure: StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Abstract

Story visualization is attracting growing attention in artificial intelligence. However, existing approaches still struggle to balance character identity preservation against text-semantic alignment, largely because they lack detailed semantic modeling of story scenes.

To address this challenge, the paper proposes a new knowledge graph, the Character Graph (CG), which comprehensively represents story-related knowledge: the characters, the attributes associated with each character, and the relationships between characters. The authors then introduce StoryWeaver, an image generator that performs Customization via Character Graph (C-CG) and produces consistent story visualizations with rich text semantics. To further improve multi-character generation, they incorporate Knowledge-Enhanced Spatial Guidance (KE-SG) into StoryWeaver to inject character semantics precisely into the generation process.
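To make the idea of graph-driven customization concrete, here is a minimal sketch of turning per-character attributes and relation triples into one detailed scene description. It assumes the attributes and triples have already been extracted upstream; the function `describe_scene` and the placeholder names `char_a`, `char_b`, `char_c` are hypothetical and do not reflect how StoryWeaver actually builds its prompts.

```python
def describe_scene(attributes, relations):
    """Compose one caption from character attributes and relation triples.

    attributes: dict mapping character name -> list of attribute strings
    relations:  list of (subject, relation, object) triples
    """
    def noun_phrase(name):
        attrs = attributes.get(name, [])
        return f"{name} ({', '.join(attrs)})" if attrs else name

    # One clause per relation edge, with attributes inlined next to each name.
    clauses = [
        f"{noun_phrase(s)} {rel} {noun_phrase(o)}" for s, rel, o in relations
    ]
    # Characters without any edge are still mentioned so their identity is kept.
    linked = {n for s, _, o in relations for n in (s, o)}
    clauses += [noun_phrase(n) for n in attributes if n not in linked]
    return "; ".join(clauses)


# Placeholder characters and attributes, for illustration only.
attributes = {"char_a": ["red scarf"], "char_b": [], "char_c": ["tall"]}
relations = [("char_a", "waves at", "char_b")]
caption = describe_scene(attributes, relations)
print(caption)  # char_a (red scarf) waves at char_b; char_c (tall)
```

The point of the sketch is only that a structured graph yields a caption in which every detail stays bound to the right character, which is harder to guarantee with a free-form story sentence.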

To validate the method, the authors run extensive experiments on a new benchmark named TBC-Bench. The results show that StoryWeaver not only creates vivid visual story plots but also conveys character identities accurately across scenes, with considerable storage efficiency. For example, it achieves an average increase of 9.03% in DINO-I and 13.44% in CLIP-T. Ablation experiments further verify the effectiveness of the proposed modules.
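DINO-I and CLIP-T are similarity-based scores: DINO-I compares image embeddings of generated and reference images to measure identity fidelity, and CLIP-T compares image and text embeddings to measure text alignment. The sketch below shows the shared computation, an average cosine similarity, using tiny hand-written vectors in place of real DINO or CLIP embeddings. The function names are hypothetical, and the exact averaging protocol used on TBC-Bench is not stated in this article.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_similarity(generated, references):
    """Average cosine similarity over every (generated, reference) pair."""
    scores = [cosine(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


# Stand-in embeddings; a real evaluation would use encoder outputs.
generated = [[1.0, 0.0], [0.0, 1.0]]
references = [[1.0, 0.0]]
score = mean_pairwise_similarity(generated, references)
print(score)  # 0.5
```

For an image-to-image score the two lists hold image embeddings; for an image-to-text score the second list holds the embedding of the prompt instead.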

Method

Figure: The overall framework of StoryWeaver.

a. A Character Graph is proposed to represent the semantically rich knowledge of the story world.

b. StoryWeaver is augmented with the proposed knowledge-enhanced spatial guidance to further improve multi-character generation.

Figure: Visual examples of the effect of Customization via Character Graph (C-CG) and Knowledge-Enhanced Spatial Guidance (KE-SG).

a. Without C-CG, the generator has difficulty capturing the fine-grained details of a character.

b. Without KE-SG, the generator tends to spread attention evenly across all regions, which causes character identities to blend.
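The attention-spreading failure and the effect of spatial guidance can be illustrated on a toy example. The sketch below assumes guidance is applied as an additive bias on cross-attention logits, raising a character token's score inside its assigned region and lowering it elsewhere. That formulation, the `strength` parameter, and the function names are assumptions made for illustration; the paper's actual KE-SG formulation may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(logits, masks, strength=4.0):
    """Bias cross-attention toward each character's region.

    logits[p][t]: raw score of image patch p for character token t
    masks[t][p]:  1 if patch p lies in the region assigned to token t, else 0
    strength:     hypothetical bias magnitude
    """
    out = []
    for p, row in enumerate(logits):
        biased = [
            score + strength * (1.0 if masks[t][p] else -1.0)
            for t, score in enumerate(row)
        ]
        out.append(softmax(biased))
    return out


# Four patches, two character tokens, and no preference in the raw scores.
logits = [[0.0, 0.0] for _ in range(4)]
# Token 0 owns the left two patches, token 1 the right two.
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]

plain = [softmax(row) for row in logits]   # each patch splits attention evenly
guided = guided_attention(logits, masks)   # each patch favors its own character
```

In the unguided case every patch attends equally to both characters, which mirrors the identity blending described above; with the bias, the left patches attend almost entirely to the first character and the right patches to the second.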

Results

Figure: Visual comparison of different approaches on single-character and multi-character visual storytelling. StoryWeaver excels at character identity customization and achieves well-matched semantic alignment.

Figure: (a) Single-character generation example

Figure: (b) Multi-character generation example

Figure: An example of multi-character story visualization on the Pororo dataset.

Figure: The collected characters and samples are drawn from two animations, Pororo and Frozen. The samples include detailed descriptions of individual characters as well as scenes showing interactions between multiple characters.

Conclusion

The paper proposes StoryWeaver, a unified model for story visualization that supports sophisticated character customization. It first introduces a novel Character Graph, which encapsulates the rich semantic knowledge of the story world and is used to enhance StoryWeaver. It then introduces knowledge-enhanced spatial guidance, which refines the cross-attention maps to achieve accurate multi-character generation. Experimental results show that StoryWeaver achieves higher identity fidelity and better semantic alignment than a range of single-concept and multi-concept customization methods.
