Following the news just got smarter with large models! Alibaba Tongyi Lab proposes a new timeline summarization framework that improves the efficiency of news summarization - News - Artificial Intelligence Global Cooperation Alliance

#News ·2025-01-07

Now a large model can sort out the timeline of a news story for you, so keeping up with the latest online drama will be much easier!

The AI agent wave has reached us casual online news-watchers too.

Researchers from Alibaba's Tongyi Lab and Shanghai Jiao Tong University have proposed CHRONOS, a new agent-based framework for news timeline summarization.

It not only summarizes the important events from a huge volume of news but, more importantly, organizes them into a clear timeline, so that even complex stories become easy to follow when you browse the web.

The name CHRONOS comes from Chronos, the Greek god of time.

By running multiple rounds of self-questioning combined with retrieval-augmented generation (RAG), the framework retrieves relevant event information from the Internet and generates chronological news summaries, offering a new approach to news timeline summarization.

Let's take a look at some examples.

For example, for the news "China's national football team beats Bahrain 1-0", CHRONOS can condense a large volume of coverage into the full story of the event.

For a longer-running story such as "China's lunar exploration program", CHRONOS can likewise focus on the key events and lay out how they developed over time, so users can grasp the whole picture at a glance.

Filling the gaps of TLS in the open domain

The Timeline Summarization (TLS) task is a classic technical challenge in the field of natural language processing, which aims to extract key events from large amounts of textual data and arrange them in chronological order to provide a structured view of the historical development of a topic or field.

In journalism, for example, a timeline summary can help users quickly understand the ins and outs of a news event. This task requires not only identifying significant events, but also understanding temporal relationships and causal connections between events in order to generate a coherent, concise, and informative summary of the timeline.

Depending on where events can be retrieved from, TLS tasks are divided into closed-domain and open-domain settings. In closed-domain TLS, the timeline is built from a predefined set of news articles on a specific topic or domain, whereas open-domain TLS generates the timeline by searching for and retrieving news articles directly from the Internet.

Past work has mostly focused on timeline generation in closed domains. Open-domain TLS, however, requires robust information retrieval and filtering, as well as the ability to identify and connect events without a global view of the corpus, which places new demands on the task.

CHRONOS framework for iterative retrieval

To address these challenges, the team proposed CHRONOS, which uses iterative self-questioning to retrieve relevant events and generate accurate, comprehensive timeline summaries, and which handles TLS in both open-domain and closed-domain settings.

1. Motivation

The core of timeline generation is establishing the temporal and causal relationships between events.

Each news event can be represented as a node. Starting from the node of the target news, the goal is to add edges between related nodes that capture their connections, eventually forming a heterogeneous graph of events.

A retrieval mechanism that fetches related news articles can therefore establish these edges effectively, linking events together.
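The graph view can be illustrated with a minimal Python sketch. It assumes a hypothetical toy lookup table, `TOY_SEARCH_RESULTS`, standing in for a real search engine, and it builds a plain directed graph rather than the heterogeneous graph described above; it only shows how each retrieval step adds edges outward from the target news.

```python
from collections import deque

# Hypothetical toy "search engine": maps an event to the events found when
# searching for it. A real system would call a web search API here.
TOY_SEARCH_RESULTS: dict[str, list[str]] = {
    "target": ["e1", "e2"],
    "e1": ["e3"],
    "e2": ["e1"],
    "e3": [],
}


def build_event_graph(target: str, max_depth: int = 2) -> dict[str, set[str]]:
    """Expand an event graph breadth-first from the target news node.

    Each retrieval adds edges from the queried event to the events returned
    for it; expansion stops max_depth hops away from the target.
    """
    graph: dict[str, set[str]] = {target: set()}
    queue = deque([(target, 0)])
    while queue:
        event, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not retrieve beyond the depth limit
        for related in TOY_SEARCH_RESULTS.get(event, []):
            graph[event].add(related)  # retrieval establishes an edge
            if related not in graph:
                graph[related] = set()
                queue.append((related, depth + 1))
    return graph
```

Breadth-first expansion mirrors the idea that events closest to the target news are linked first; the depth limit plays the role of a retrieval budget.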

2. Overview

CHRONOS uses large models to mimic how people research a topic: it asks questions, asks new questions based on the search results, and ultimately gathers comprehensive information about related events, which it then summarizes into a timeline.

CHRONOS includes the following modules:

Self-Questioning: First searches for coarse-grained background information on the news, then iteratively asks questions to retrieve more relevant news.

Question Rewriting: Decomposes complex or poorly performing questions into more specific queries that are easier to retrieve.

Timeline Generation: Merges the timelines generated in each round of retrieval into a final timeline that highlights the important events.

3. Self-questioning

(1) Coarse-grained background research

In the initial stage of self-questioning, CHRONOS searches using the headline of the target news as a keyword to gather the information most directly related to the target news.

This information forms the News Context and lays a preliminary foundation for self-questioning.

(2) Question exemplar selection

After this coarse-grained background research, CHRONOS relies on the in-context learning ability of the large model, using a small number of example prompts (few-shot exemplars) to guide it to generate questions about the target news.

To evaluate the quality of example questions, the authors introduce Chrono-Informativeness (CI), which measures how well the questions raised by the model retrieve events aligned with the reference timeline: questions with a higher CI are more likely to retrieve articles related to the target news events. Concretely, CI is the F1 score between the dates in the timeline generated from the retrieved results and the dates in the reference timeline.
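Since CI reduces to a date-level F1 score, it can be sketched in a few lines. The function names are hypothetical, and dates are compared by exact match, which is an assumption; the paper's exact matching rules may differ.

```python
from datetime import date


def date_f1(predicted: set[date], reference: set[date]) -> float:
    """F1 between the dates of a generated timeline and a reference timeline.

    Assumes exact date matching (no tolerance window).
    """
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0  # also covers empty predicted or reference sets
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def chrono_informativeness(retrieved_timeline_dates, reference_dates) -> float:
    """Hypothetical CI of a question set: date F1 of the timeline built from what it retrieved."""
    return date_f1(set(retrieved_timeline_dates), set(reference_dates))
```

For instance, a question set whose retrieved timeline hits 2 of 3 predicted dates against a 4-date reference gets precision 2/3, recall 1/2, and CI = 4/7.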

With the goal of maximizing the temporal informativeness of the question set, the team builds a pool of "news-question" exemplars to guide question generation for new target news.

For each new target news item, the most similar exemplars are retrieved dynamically by cosine similarity, which keeps the exemplars contextually relevant and their temporal information accurate.
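Exemplar retrieval by cosine similarity might look like the following sketch. The embedding vectors and the pool layout of (embedding, news, questions) tuples are hypothetical; in practice the vectors would come from a text embedding model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_exemplars(query_vec, pool, k=2):
    """Return the k pool entries whose embeddings are most similar to query_vec.

    pool: hypothetical list of (embedding, news_text, questions) tuples.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return ranked[:k]
```

The selected (news, questions) pairs would then be placed in the prompt as few-shot examples for the new target news.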

(3) Iterative questioning

CHRONOS gradually delves deeper into the details of events through successive iterations of questions.

Each iteration builds on the previous round's search results to uncover new questions and information, until the timeline contains the required number of events or the maximum number of iterations is reached.
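The loop and its two stopping conditions can be sketched as follows. `ask` and `search` are hypothetical callables standing in for the large model and the search engine, and deduplicating events by date is a simplifying assumption, not necessarily what CHRONOS does.

```python
from typing import Callable

Event = dict[str, str]  # hypothetical shape: {"date": ISO date string, "summary": text}


def iterative_questioning(
    target_news: str,
    ask: Callable[[str, list[str]], list[str]],
    search: Callable[[str], list[Event]],
    min_events: int = 10,
    max_rounds: int = 5,
) -> tuple[dict[str, str], int]:
    """Collect dated events until min_events are found or max_rounds is reached."""
    # Round 0: coarse-grained background search using the headline itself.
    context = search(target_news)
    events: dict[str, str] = {}
    for hit in context:
        events.setdefault(hit["date"], hit["summary"])
    rounds = 0
    while len(events) < min_events and rounds < max_rounds:
        rounds += 1
        # New questions are conditioned on the previous round's results.
        questions = ask(target_news, [hit["summary"] for hit in context])
        context = [hit for q in questions for hit in search(q)]
        for hit in context:
            events.setdefault(hit["date"], hit["summary"])  # keep first summary per date
    return dict(sorted(events.items())), rounds


# Hypothetical toy stand-ins for the search engine and the LLM.
TOY_INDEX = {
    "headline": [{"date": "2024-01-01", "summary": "a"}],
    "q1": [{"date": "2024-02-01", "summary": "b"}],
    "q2": [{"date": "2024-03-01", "summary": "c"}],
}


def toy_search(query: str) -> list[Event]:
    return TOY_INDEX.get(query, [])


def toy_ask(news: str, seen: list[str]) -> list[str]:
    return ["q1"] if "a" in seen else ["q2"] if "b" in seen else []
```

With `min_events=3`, the toy run stops after two questioning rounds because three dated events have been found; lowering `max_rounds` to 1 stops it earlier with only two.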

(4) Question rewriting

Query rewriting is a common optimization in retrieval-augmented generation.

In CHRONOS, broad or complex questions generated during the initial questioning phase are rewritten into 2-3 sub-questions that are easier to retrieve, producing more specific and targeted queries and improving the search engine's retrieval effectiveness.

The prompt also includes a few examples that guide the large model to rewrite efficiently, turning complex questions into more specific queries while preserving their original intent.
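A few-shot rewriting prompt could be assembled as below. The instruction wording, the exemplar, and the bullet-list output format are all hypothetical; they are not the prompts used by CHRONOS.

```python
# Hypothetical exemplar: one broad question and its more specific sub-queries.
REWRITE_EXEMPLARS = [
    (
        "How has China's lunar exploration program developed?",
        [
            "When did the Chang'e-4 probe land on the Moon?",
            "When did the Chang'e-5 mission return lunar samples?",
        ],
    ),
]


def build_rewrite_prompt(question: str, exemplars=REWRITE_EXEMPLARS) -> str:
    """Build a few-shot prompt asking the model to split a question into 2-3 queries."""
    lines = ["Rewrite the question into 2-3 specific search queries that keep its original intent.", ""]
    for original, subqueries in exemplars:
        lines.append(f"Question: {original}")
        lines.extend(f"- {q}" for q in subqueries)
        lines.append("")
    lines.append(f"Question: {question}")  # the model continues with "- ..." lines
    return "\n".join(lines)


def parse_subqueries(llm_output: str, max_queries: int = 3) -> list[str]:
    """Keep the bulleted lines of the model's answer, capped at max_queries."""
    queries = [line[2:].strip() for line in llm_output.splitlines() if line.startswith("- ")]
    return queries[:max_queries]
```

Capping the parsed output keeps the rewrite within the 2-3 sub-question budget even if the model produces extra lines.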

(5) Timeline generation

CHRONOS generates a complete timeline summary in two stages: Generation and Merging.

Generation: CHRONOS analyzes the news articles retrieved in each round to identify key events and details. It uses the comprehension and generation abilities of the large model to extract the date and relevant details of each event and to write a concise description of it. These events and descriptions are arranged chronologically into preliminary timelines, which serve as the input to the merging stage.

Merging: The preliminary timelines produced across multiple rounds of retrieval are consolidated into a coherent final summary. This involves aligning events across the different timelines, resolving conflicts in dates or descriptions, and selecting the most representative and significant events.
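In CHRONOS the merging is carried out by the large model. As a rough, non-LLM stand-in, the heuristic below ranks dates by how many rounds reported them and resolves conflicting descriptions for the same date by majority vote; it does not handle conflicting dates for the same event. All names are hypothetical.

```python
from collections import Counter, defaultdict


def merge_timelines(timelines: list[dict[str, str]], max_events: int) -> list[tuple[str, str]]:
    """Merge per-round timelines (ISO date -> description) into one chronological timeline."""
    descriptions: dict[str, list[str]] = defaultdict(list)
    for timeline in timelines:
        for day, text in timeline.items():
            descriptions[day].append(text)
    # Rank dates by how many rounds reported them; break ties by the earlier date.
    ranked = sorted(descriptions, key=lambda day: (-len(descriptions[day]), day))
    selected = sorted(ranked[:max_events])  # restore chronological order
    # Majority vote per date; Counter.most_common keeps first-seen order on ties.
    return [(day, Counter(descriptions[day]).most_common(1)[0][0]) for day in selected]
```

Using report frequency as a proxy for significance is only one possible selection rule; an LLM-based merge can also weigh content.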

New dataset: Open-TLS

To evaluate TLS systems, the team also collected timelines of recent news events written by professional journalists, building a new dataset called Open-TLS.

Compared with previous closed-domain datasets, Open-TLS is richer in both scale and content, covering politics, economics, society, sports, science and technology, and other fields. It is also more up to date, making it a more comprehensive and challenging benchmark for open-domain TLS.

Experimental results

1. Experimental setup

The experiments build CHRONOS on GPT-3.5-Turbo, GPT-4, and Qwen2.5-72B, respectively, and evaluate TLS performance in both open-domain and closed-domain settings. The main evaluation metrics are:

ROUGE-N: Measures the n-gram overlap between the generated timeline and the reference timeline. Variants include:

Concat F1: Computes ROUGE over the concatenation of all daily summaries, assessing overall content consistency.
Agree F1: Computes ROUGE only between summaries whose dates match, assessing accuracy on specific dates.
Align F1: First aligns predicted and reference summaries based on content similarity and date proximity, then computes ROUGE.

Date F1: Measures how well the dates in the generated timeline match the dates in the reference timeline.
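To make the difference between Concat F1 and Agree F1 concrete, here is a simplified ROUGE-1 sketch. It assumes lowercase whitespace tokenization with no stemming, which real ROUGE toolkits do differently, and it omits Align F1 and Date F1. Timelines are hypothetical dicts mapping a date to its summary.

```python
from collections import Counter


def unigrams(text: str) -> Counter:
    """Bag of lowercase whitespace tokens (simplified tokenization)."""
    return Counter(text.lower().split())


def f1(overlap: int, pred_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / pred_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def concat_rouge1(pred: dict[str, str], ref: dict[str, str]) -> float:
    """Concat F1: concatenate all daily summaries and ignore dates."""
    p = unigrams(" ".join(pred.values()))
    r = unigrams(" ".join(ref.values()))
    return f1(sum((p & r).values()), sum(p.values()), sum(r.values()))


def agree_rouge1(pred: dict[str, str], ref: dict[str, str]) -> float:
    """Agree F1: count overlaps only between summaries on the same date."""
    overlap = sum(
        sum((unigrams(pred[d]) & unigrams(ref[d])).values()) for d in pred.keys() & ref.keys()
    )
    pred_total = sum(sum(unigrams(t).values()) for t in pred.values())
    ref_total = sum(sum(unigrams(t).values()) for t in ref.values())
    return f1(overlap, pred_total, ref_total)
```

If a system writes the right summaries but attaches them to swapped dates, Concat F1 still scores 1.0 while Agree F1 drops to 0.0, which is why both are reported.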

2. Open-domain TLS

In the open-domain TLS experiments, CHRONOS was compared with several baselines, including searching with the target news directly (DIRECT) and rewriting the target news into queries for retrieval (REWRITE).

By iteratively questioning itself and retrieving relevant news articles, CHRONOS significantly improved both the quality of event summaries and the accuracy of date alignment, outperforming the baselines on all metrics.

3. Closed-domain TLS

In the closed-domain TLS experiments, CHRONOS was compared with representative prior work: (1) CLUST, an event-clustering method (Gholipour Ghalandari and Ifrim, 2020); (2) EGC, based on event graph compression (Li et al., 2021); and (3) LLM-TLS, which uses large models for event clustering (Hu et al., 2024).

On the two classic datasets, Crisis and T17, CHRONOS performs comparably to these methods and achieves state-of-the-art results on the AR-2 metric (alignment-based ROUGE-2) for both datasets, suggesting it adapts well to different types of events and time spans.

4. Runtime analysis

Another advantage of CHRONOS is efficiency.

LLM-TLS is also based on a large model but must process every article in the news collection, whereas CHRONOS uses retrieval augmentation to focus on the most relevant articles, which significantly reduces processing time.

This efficiency makes CHRONOS more practical for real-world use, especially in scenarios that require a fast response.

Case study: Apple product launch timeline

The team also examined how the model handles specific news events. Using a representative event, a major Apple product launch, they observed how CHRONOS builds a timeline through self-questioning and information retrieval.

In this case study, CHRONOS accurately extracted key events and dates, but it also showed room for improvement in some cases, such as missing certain events or hallucinating dates.

Conclusion

By combining iterative self-questioning with retrieval-augmented generation using large language models, the CHRONOS framework offers a novel and effective solution for timeline summarization.

The core of this approach is to simulate the human information retrieval process, by constantly asking and answering new questions to gradually deepen the understanding of events, and ultimately generate a comprehensive and coherent timeline summary.

The experimental results demonstrate CHRONOS's capabilities in complex event retrieval and timeline construction, as well as its accuracy and potential for real-world news timeline generation.

Whether this iterative, question-driven approach to retrieval and generation generalizes to other tasks is also worth further study.

Paper: https://arxiv.org/abs/2501.00888
Github: https://github.com/Alibaba-NLP/CHRONOS
Demo: https://modelscope.cn/studios/vickywu1022/CHRONOS

Reference:

[1] Demian Gholipour Ghalandari and Georgiana Ifrim. 2020. Examining the state-of-the-art in news timeline summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1322–1334, Online. Association for Computational Linguistics.

[2] Manling Li, Tengfei Ma, Mo Yu, Lingfei Wu, Tian Gao, Heng Ji, and Kathleen McKeown. 2021. Timeline summarization based on event graph compression via time-aware optimal transport. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6443–6456, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

[3] Qisheng Hu, Geonsik Moon, and Hwee Tou Ng. 2024. From moments to milestones: Incremental timeline summarization leveraging large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7232–7246, Bangkok, Thailand. Association for Computational Linguistics.
