LLM & Agent: PPTAgent, an Agent Framework for Automatic PPT Generation

#News ·2025-01-08

This post looks at one approach to PPT generation: PPTAgent. Traditional PPT generation methods usually follow an end-to-end text generation paradigm that focuses only on text content and ignores layout design and PPT structure. PPTAgent instead uses an edit-based generation paradigm to address the challenges of handling spatial relationships and design style.

Each slide in the traditional method can be represented by the following formula:

Method

Figure: PPTAgent framework

PPTAgent is a framework for automatically generating PPTs. Its edit-based workflow has two stages: PPT analysis and PPT generation.

Stage I: PPT analysis

The main goal is to provide structured and semantic reference information for PPT generation through slide clustering and content schema extraction. The results of this stage will directly affect the quality and efficiency of the subsequent stage.

  • Slide clustering

Slide clustering groups the slides of a reference PPT according to their function and content. Slides are divided into two broad categories, and each category is grouped differently; hierarchical clustering is the algorithm used for content slides.

Figure: Clustering example

a. Structural slides: These slides support the structure of the presentation, such as opening, transition, and closing slides. PPTAgent uses the LLM to infer the functional role of each slide and groups slides by role. Such slides usually have distinctive textual features.

b. Content slides: These slides convey specific information, such as slides containing bullet points, charts, or images. For these, PPTAgent applies hierarchical clustering based on image similarity: it computes the image similarity between slides and groups similar slides together.
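The grouping of content slides can be sketched as follows. This is a minimal illustration, not PPTAgent's implementation: it assumes each slide image has already been turned into a feature vector by some image encoder, and the cosine measure, average linkage, and the `threshold` value are all assumptions chosen for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_slides(features, threshold=0.9):
    """Agglomerative (bottom-up) hierarchical clustering with average linkage.

    features: one vector per slide (assumed output of an image encoder).
    threshold: hypothetical cut-off; merging stops once no pair of clusters
    has an average pairwise similarity at or above it.
    Returns a sorted list of clusters, each a sorted list of slide indices.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        sims = [cosine(features[i], features[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_score, best_pair = float("-inf"), None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break  # the remaining clusters are too dissimilar to merge
        x, y = best_pair
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy vectors standing in for slide-image embeddings.
slides = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.99]]
groups = cluster_slides(slides, threshold=0.9)
```

With these toy vectors, slides 0 and 1 land in one group and slides 2 and 3 in another. Lowering the threshold merges more aggressively, so the cut-off controls how many layout groups the reference PPT yields.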

  • Content schema extraction
    After slide clustering, PPTAgent analyzes the content schema of each cluster so that the editing targets stay consistent. Because real-world slides can be complex and fragmented, PPTAgent relies on the context-awareness of the LLM to extract diverse content schemas. It defines an extraction framework in which each element is represented by its category, modality, and content, and it extracts the schema from each slide using the LLM's instruction-following and structured-output capabilities. The three fields are:

a. Category: Describes the type of element, such as text box, image, etc.

b. Modality: Describes how the element is rendered, such as plain text or text with graphics.

c. Content: Describes the specific content of the element, such as its text or the alt text of an image.
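One way to picture the three fields is as a small record type that the LLM's structured output is parsed into. The sketch below is an assumption for illustration: the JSON layout, the `SchemaElement` and `parse_schema` names, and the sample response are hypothetical and are not taken from PPTAgent.

```python
import json
from dataclasses import dataclass


@dataclass
class SchemaElement:
    category: str  # type of element, e.g. "title" or "main image"
    modality: str  # how it is rendered, e.g. "text" or "image"
    content: str   # text content, or alt text for an image


REQUIRED_FIELDS = {"category", "modality", "content"}


def parse_schema(llm_output: str) -> list:
    """Check a JSON array returned by the LLM and convert it to elements."""
    elements = []
    for item in json.loads(llm_output):
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"schema element is missing fields: {sorted(missing)}")
        elements.append(
            SchemaElement(item["category"], item["modality"], item["content"])
        )
    return elements


# Hypothetical LLM response for one content slide.
response = """[
  {"category": "title", "modality": "text", "content": "Quarterly results"},
  {"category": "main image", "modality": "image", "content": "Bar chart of revenue"}
]"""
schema = parse_schema(response)
```

Rejecting elements with missing fields early keeps a malformed LLM response from silently propagating into the generation stage.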

Stage II: PPT generation


The second stage uses the analysis results of the first stage to generate a new PPT. At its heart is an iterative editing process that produces the target PPT from the reference slides and the input document. The steps are: generate a structured outline that specifies the reference slide and related content for each new slide; use the LLM to iteratively edit the reference slides into new slides; and implement five specialized APIs that let the LLM edit, delete, and copy text elements, as well as edit and delete visual elements.

Outline generation: The LLM is instructed to create a structured outline that follows human preferences. Each entry specifies the reference slide, the index of the relevant document section, and the title and description of the new slide. By combining the planning and summarization capabilities of the LLM with the semantic information extracted from the reference PPT, this step produces a coherent and engaging outline that guides the generation of the new PPT.
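An outline entry as described above can be modeled as a small record plus a sanity check before generation starts. This is only a sketch: the field names, the use of a list of section indices, and the `validate_outline` helper are assumptions for illustration, not PPTAgent's actual data model.

```python
from dataclasses import dataclass


@dataclass
class OutlineEntry:
    title: str             # title of the new slide
    description: str       # what the new slide should convey
    reference_slide: int   # index of the reference slide to edit
    section_indices: list  # indices of the relevant document sections


def validate_outline(outline, num_reference_slides, num_sections):
    """Return the problems found in an outline; an empty list means it is usable."""
    problems = []
    for pos, entry in enumerate(outline):
        if not 0 <= entry.reference_slide < num_reference_slides:
            problems.append(
                f"entry {pos}: unknown reference slide {entry.reference_slide}"
            )
        for idx in entry.section_indices:
            if not 0 <= idx < num_sections:
                problems.append(f"entry {pos}: unknown document section {idx}")
    return problems


# Hypothetical outline produced by the LLM for a document with 3 sections
# and a reference PPT with 5 slides.
outline = [
    OutlineEntry("Introduction", "Motivate the problem", 0, [0]),
    OutlineEntry("Results", "Summarize the key numbers", 7, [2, 4]),
]
issues = validate_outline(outline, num_reference_slides=5, num_sections=3)
```

Here the second entry points at a reference slide and a document section that do not exist, so both are reported and the outline can be regenerated before any slide is edited.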

Slide generation: New slides are generated by iteratively editing reference slides under the guidance of the outline. To enable precise manipulation of slide elements, PPTAgent implements five specialized APIs that allow the LLM to edit, delete, and copy text elements, as well as edit and delete visual elements. In addition, to help the LLM understand the slide structure, PPTAgent converts each slide from its original XML format to an HTML representation that is easier for the LLM to interpret.
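The five editing operations and the HTML view can be illustrated with a toy in-memory slide. Everything here is a simplified stand-in: the `Slide` class, the method names, and the HTML layout are hypothetical and do not reproduce PPTAgent's real API or its XML-to-HTML conversion.

```python
import copy
from dataclasses import dataclass, field
from html import escape


@dataclass
class Slide:
    texts: list = field(default_factory=list)   # text elements
    images: list = field(default_factory=list)  # image sources

    # Text elements: edit, delete, copy
    def replace_text(self, idx, new_text):
        self.texts[idx] = new_text

    def delete_text(self, idx):
        del self.texts[idx]

    def clone_text(self, idx):
        self.texts.insert(idx + 1, self.texts[idx])

    # Visual elements: edit, delete
    def replace_image(self, idx, new_source):
        self.images[idx] = new_source

    def delete_image(self, idx):
        del self.images[idx]

    def to_html(self):
        """Render a compact HTML view that an LLM can read and reference by id."""
        parts = [f'<p id="text-{i}">{escape(t)}</p>' for i, t in enumerate(self.texts)]
        parts += [f'<img id="img-{i}" src="{escape(s)}">' for i, s in enumerate(self.images)]
        return "<div>\n" + "\n".join(parts) + "\n</div>"


ALLOWED = {"replace_text", "delete_text", "clone_text", "replace_image", "delete_image"}


def apply_actions(reference, actions):
    """Apply (method_name, args) pairs to a copy of the reference slide."""
    slide = copy.deepcopy(reference)
    for name, args in actions:
        if name not in ALLOWED:
            raise ValueError(f"unsupported action: {name}")
        getattr(slide, name)(*args)
    return slide


reference = Slide(texts=["Title", "Point A"], images=["old.png"])
# Hypothetical action sequence emitted by the LLM.
actions = [
    ("replace_text", (0, "New title")),
    ("clone_text", (1,)),
    ("replace_text", (2, "Point B")),
    ("delete_image", (0,)),
]
new_slide = apply_actions(reference, actions)
```

Restricting the LLM to a small whitelist of operations on a copy of the reference slide means the original layout is never altered and any unsupported action fails loudly instead of corrupting the slide.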

Experiments

Evaluation metrics. Existing metrics include:

  • Success Rate (SR)
  • Perplexity (PPL)
  • Fréchet Inception Distance (FID)

PPTEval metrics include:

  • Content
  • Design
  • Coherence
  • Average score (Avg.)

These metrics are used to evaluate the quality of the generated PPT in different dimensions.


Reference

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides, https://arxiv.org/pdf/2501.03936v1
