Stop Treating Your LLM Like a Database

#News ·2025-01-03

Translator | Bugatti

Reviewer | Chonglou

Imagine driving a car whose display refreshes your view of the road only once every five minutes instead of showing a continuous, live picture of where you are. It wouldn't take long before you crashed.

That kind of batch update would never fly in the physical world, yet it is how many software systems operate today. Batch processing was born of outdated technical constraints that forced applications to rely on static, delayed data. It may have been the only viable approach when compute, memory, and storage were scarce, but it is completely at odds with how we interact with the real world, let alone with how AI should work.

Generative AI has incredible potential, but large language models (LLMs) should not be treated as static databases: reactive systems that wait for an input and return an output. AI thrives on real-time, contextual data. If we cling to the batch mindset, we are crippling its capabilities.

This article discusses why the batch paradigm is obsolete, how it has held back AI applications, and why the future of AI demands real-time, event-streaming platforms.

Why are we stuck in batch mode?

Batch-oriented systems for analytics and machine learning have dominated the technology world for decades. They were created at a time when computer memory was scarce, computing power was limited, and storage was minimal. Yet the same traditional methods are now being carried into the new era of generative AI.

Machine learning operations (MLOps) has largely evolved around a discrete, sequential set of tasks: feature engineering, model training, model testing, model deployment, and bias characterization. This conceptual model suits batch-oriented development and delivery, but it limits how responsive and accurate these applications can be in a changing world. Applications that need to be more reactive inevitably have to work around the standard MLOps infrastructure.

In our view, this is a flawed approach.

No one ever got fired for designing a batch process

At its core, this paradigm aggregates data into a central database that sits passively, waiting for systems or users to query it. What the resulting system can do is determined entirely by the queries it receives. While this approach worked within the constraints of its time, it is fundamentally divorced from how we experience and interact with the world.

Figure 1. General diagram of batch process

Despite advances in technology, this belief remains deeply rooted. Today we have alternatives, such as data streaming platforms that enable real-time, event-driven architectures. Yet batch systems persist, not because they are the best solution, but because they have become the accepted way of doing things.

Just like the old saying "no one gets fired for buying IBM," the same holds for batch systems: no one gets fired for designing a system that aggregates data in one place, as long as acting on that centralized data is efficient and reliable. We are used to thinking of work as a series of tasks, each completed before the next begins. Well-established results from disciplines such as operations research and lean manufacturing show that humans are good at working in batches because we improve with repetition and because context switching is costly. Modern distributed systems need not be constrained by our limitations.

Batch thinking in machine learning

In our daily lives, we do not deal with the world through "batch updates." We are constantly taking in information, reacting, and adapting to changing situations. Yet historical limitations made batch processing the default paradigm.

Traditional machine learning reflects this batch-oriented thinking. Models are built around a strict, linear workflow:

  • Collect training data: Gather a domain-specific, static data set, typically a snapshot in time.
  • Engineer features: Preprocess, refine, and prepare the data for the model.
  • Train the model: Build the model from the prepared data.
  • Test the model: Hold out part of the data from training to evaluate the model against predefined performance thresholds.
  • Deploy the model: Once deployed, the model becomes a fixed artifact used to serve predictions.

Figure 2. Batch flow of traditional machine learning

While this process works well for specific use cases, it is inherently rigid and inflexible.
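To make the batch-oriented workflow concrete, here is a minimal sketch of that linear train-and-deploy loop using scikit-learn. The file name, column names, model choice, and accuracy threshold are illustrative assumptions, not details from the article.

```python
# Minimal sketch of a batch ML workflow: snapshot -> features -> train -> test -> frozen artifact.
# File name, columns, model, and threshold are illustrative assumptions.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Collect training data: a static snapshot taken at one point in time.
df = pd.read_csv("customer_snapshot.csv")

# 2. Feature engineering: preprocess and refine the snapshot.
df["days_since_last_purchase"] = (pd.Timestamp.now() - pd.to_datetime(df["last_purchase"])).dt.days
features = df[["days_since_last_purchase", "total_spend", "num_orders"]]
labels = df["churned"]

# 3./4. Train and test: hold out part of the snapshot and check a predefined threshold.
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
assert accuracy_score(y_test, model.predict(X_test)) > 0.8, "below the predefined threshold"

# 5. Deploy: the model becomes a fixed artifact; it knows nothing newer than the snapshot.
joblib.dump(model, "churn_model.joblib")
```

Everything downstream of this script sees the world as it looked when the snapshot was taken, which is exactly the rigidity described above.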

By contrast, one of the reasons generative AI is so transformative is that the underlying models are inherently reusable, capable of addressing problems across many domains. But for these models to be reusable across domains, the data that grounds them in a specific scenario has to be supplied at prompt-assembly time, and batch processing cannot meet that requirement.

Without contextualized data, LLMs provide no value

Consider a simple example. Imagine we develop an AI-based flight assistant to help customers when their flights are delayed.

Figure 3. Sample interaction between a user and an AI flight assistant

In the two turns of interaction above, a great deal of contextual information is needed to satisfy the customer's request.

The LLM needs to remember that the relevant city is New York. It needs to know the customer's identity and current booking status, current flight information, departure/arrival times, seat layout, seat preferences, pricing information, and airline change policies.

Unlike traditional machine learning, where models are trained on application-specific data, LLMs are not trained on your data; they are trained on general information. Data engineering for a specific application happens during prompt assembly, not during model creation.
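As a rough illustration of data engineering at prompt-assembly time, here is a minimal sketch. The booking and flight fields and the llm_complete helper are hypothetical placeholders, not a specific product's API.

```python
# Minimal sketch of prompt assembly: application-specific context is injected at request
# time, not baked into the model. Field names and the llm_complete helper are hypothetical.
def assemble_prompt(question: str, booking: dict, flight: dict, policy: str) -> str:
    context = (
        f"Customer: {booking['name']} (seat preference: {booking['seat_preference']})\n"
        f"Booking: flight {flight['number']} {flight['origin']} -> {flight['destination']}, "
        f"departs {flight['departure_time']}, status: {flight['status']}\n"
        f"Change policy: {policy}\n"
    )
    return f"Use only the context below to answer.\n\n{context}\nQuestion: {question}"

prompt = assemble_prompt(
    question="My flight to New York is delayed. What are my options?",
    booking={"name": "A. Rivera", "seat_preference": "aisle"},
    flight={"number": "UA123", "origin": "SFO", "destination": "JFK",
            "departure_time": "2025-01-03T18:05", "status": "delayed 2h"},
    policy="Free same-day changes on delays over 90 minutes.",
)
# response = llm_complete(prompt)  # hypothetical call to whichever LLM client you use
```

The same general-purpose model can serve flights today and insurance claims tomorrow; only the assembled context changes, which is why that context must be fresh.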

Figure 4. LLM reusability and customization through prompt assembly

At a time when two medical papers are published every minute and 8,400 legal cases are resolved every hour, static data is not enough. AI systems need data flowing in real time to produce useful answers. Sticking with batch-oriented systems despite better options limits the potential of modern applications, especially in AI. It is time to rethink this outdated approach and embrace architectures that reflect how we live and work in a dynamic, real-time world.

LLMs are outward-looking

As we design the next generation of AI applications, we risk falling into the same batch-oriented trap: treating LLMs like databases, reactive tools that wait for input and respond to specific queries. That perception simply does not match what LLMs can do. AI is not just for storing information; it is for reasoning, generating, and evolving.

Databases are inward-looking: they hold information and only surface it when explicitly asked. LLMs are outward-looking: they are built to participate, synthesize, and actively contribute. They belong in environments where the context is constantly changing and in architectures that can support that dynamic behavior. A batch-oriented approach, where models and data are refreshed periodically but otherwise sit static, stifles the true potential of generative AI.

To truly realize AI's potential, we need to shift our thinking.

AI systems should be active participants in a workflow: contributing ideas, engaging in dynamic conversations, and in some cases operating autonomously. That requires a significant architectural change. Instead of static query-response systems, we need event-driven architectures that enable smooth, real-time interaction and adapt flexibly.

How can stream processing unlock the potential of AI?

Data streaming platforms are built for real-time requirements: continuous, event-driven workflows that keep up with dynamic, fast-moving systems. In fields such as finance, telecommunications, and e-commerce, where milliseconds can make or break an outcome, batch-oriented architectures fall short. Applications need to detect fraud while a transaction is still in progress, update inventory the instant a product sells, and personalize a customer interaction as it happens.
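As a concrete, simplified illustration, the sketch below publishes a transaction event to a stream the moment it occurs, using the confluent-kafka Python client. The broker address, topic name, and event fields are assumptions for the example, not prescribed by the article.

```python
# Minimal sketch: emit an event to a stream as it happens, instead of batching it for later.
# Broker address, topic name, and payload fields are illustrative assumptions.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

event = {
    "type": "transaction",
    "account_id": "acct-42",
    "amount": 1899.00,
    "currency": "USD",
    "timestamp": "2025-01-03T18:05:11Z",
}

# Keyed by account so all events for one account arrive in order on the same partition.
producer.produce("transactions", key=event["account_id"], value=json.dumps(event))
producer.flush()  # downstream consumers (fraud checks, dashboards) see it within milliseconds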

Figure 5. General diagram of stream processing

Stream processing for generative AI

Most practical generative AI use cases rely on real-time, contextual data. Stream processing platforms complement these models by addressing challenges that batch systems cannot.

  • Real-time contextualization: LLMs need up-to-date data to generate meaningful responses. An AI-based flight assistant, for example, needs instant access to flight delays, cancellations, and rebooking options. Stream processing platforms provide this real-time context, ensuring the AI has the information it needs when it needs it (see the sketch after this list).
  • Dynamic decision making: Generative AI systems can do more than answer queries. Stream processing lets AI react dynamically to changing inputs, such as adjusting product recommendations when inventory changes or responding to newly published case law.
  • Scalable, decoupled architecture: LLMs often need to integrate with many systems, from CRMs to analytics platforms. Stream processing platforms support a decoupled architecture in which each component operates independently while consuming the same data streams. This avoids the bottlenecks and rigidity of batch systems and lets AI applications scale efficiently.
  • Reduced latency in AI workflows: In batch systems, the delay between collecting data and processing it leads to stale information. A vector database of customer data that is updated in bulk, for example, might recommend products that are already out of stock. Stream processing removes that delay and keeps AI workflows aligned with reality.
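Here is a minimal sketch of the real-time contextualization point above: a consumer keeps an in-memory view of flight status fresh as events arrive, so prompt assembly (as in the earlier sketch) always reads current data. The broker address, topic name, and event shape are assumptions for illustration.

```python
# Minimal sketch: keep a live context store updated from a stream, so prompts are
# assembled from current data rather than a stale batch snapshot.
# Broker address, topic name, and event fields are illustrative assumptions.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "flight-context-builder",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["flight-status-events"])

latest_flight_status = {}  # flight number -> most recent status event

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        latest_flight_status[event["flight_number"]] = event
        # Prompt assembly can now read latest_flight_status instead of a nightly snapshot.
finally:
    consumer.close()
```

In a production system this view would typically live in a shared store rather than process memory, but the principle is the same: context is updated as events happen, not on a batch schedule.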

Agentic AI: AI that acts instead of waiting

The rise of agentic AI has sparked interest in agents that go beyond simple query-response interactions. Such systems can initiate actions autonomously, make decisions, and adapt to changing circumstances.

Consider a typical AI agent. We can think of agents as automated processes that reason about their environment and take proactive action toward a specified goal. Their decisions can be complex, involving conditional branching logic shaped by intermediate data queries.

An agent may need to pull data from multiple sources, handle prompt engineering and RAG workflows, and interact directly with various tools to carry out both deterministic and non-deterministic workflows. The orchestration required is complex and spans many systems. If agents also need to communicate with other agents, the complexity only grows. Without a flexible architecture, these dependencies make the system nearly impossible to extend or modify.

Figure 6. Overview of agent dependencies

To operate this way, agents need the following (a minimal event-driven agent loop is sketched after the list):

  • Continuous awareness: A real-time stream of events, such as changes in inventory, user behavior, or system state.
  • Situational reasoning: The ability to synthesize dynamic data to infer intent and plan actions.
  • Autonomous decision making: The ability to act without waiting for explicit user instructions, for example rebooking a flight or dynamically adjusting a system configuration.
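To ground those three requirements, here is a minimal sketch of an event-driven agent loop: it consumes a stream of flight events and acts on cancellations without waiting for a user query. The topic, event fields, and the rebook helper are hypothetical placeholders for whatever booking system an agent would actually call.

```python
# Minimal sketch of an event-driven agent: continuously aware of a stream, reasons
# about each event, and acts autonomously. Topic, fields, and helpers are hypothetical.
import json
from confluent_kafka import Consumer

def rebook(booking_id: str) -> str:
    """Hypothetical call into an airline booking system; returns a new flight number."""
    return "UA456"

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "travel-assistant-agent",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["flight-status-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())                  # continuous awareness
    if event.get("status") == "cancelled":           # situational reasoning
        new_flight = rebook(event["booking_id"])     # autonomous decision making
        print(f"Rebooked {event['booking_id']} onto {new_flight}; notifying customer.")
```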

For example, an AI-based travel assistant built on stream processing can monitor flight schedules, spot delays, rebook affected flights, and notify users automatically, all without human intervention. With static data updated in batches, that level of autonomy is impossible.

Stream processing platforms address these needs by providing continuous, low-latency data streams and the infrastructure essential for real-time computing. Without this foundation, autonomous, collaborative AI systems remain a distant dream.

Stream processing is the future of generative AI

Generative AI is a fundamental shift in the way we build and use technology. To realize its full potential, we need systems that align with the way AI processes and gains insights: continuously, dynamically, and in real time. Stream processing platforms provide the basis for this evolution.

If we integrate AI applications with stream processing platforms, we can:

  • Move from passive AI systems to active ones.
  • Support real-time personalization and decision making.
  • Ensure that LLMs operate on the latest, most relevant data.
  • Create scalable, flexible architectures that evolve with advances in AI.

Generative AI isn't just about building smarter systems; it's about building systems that live in a continuous, ever-changing stream of events. Stream processing platforms make that possible, bridging the gap between the static systems of yesteryear and a dynamic, AI-driven future.

Stop Treating Your LLM Like a Database, by Sean Falconer
