It has been claimed that to create the ultimate Retrieval Augmented Generation (RAG) stack, all one needs is a plain vector database, a touch of LangChain, and a smattering of OpenAI. Over the last couple of weeks, I have examined this myth in detail and offered a better approach. Here are the ten essential considerations when constructing a RAG system from a business perspective.
Data Access and Life Cycle Management
- How will you effectively manage the entire information lifecycle, from acquisition to deletion? This includes connecting to different enterprise data sources and collecting extensive, varied data swiftly and accurately.
- The next step is ensuring that every piece of data is efficiently processed, enriched, readily available, and eventually archived or deleted when no longer needed. This involves constant monitoring, updates, and maintenance to meet data integrity, security, business needs, and compliance standards.
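As a rough illustration of the lifecycle point above, the sketch below models a document that moves from ingestion to archival to deletion under a retention policy. The `Document` class, the state names, and the 7-year window are all hypothetical assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Document:
    doc_id: str
    source: str              # e.g. which enterprise system it came from
    ingested_at: datetime
    state: str = "ingested"  # ingested -> enriched -> indexed -> archived -> deleted

# Hypothetical 7-year compliance window; real retention rules vary by domain.
RETENTION = timedelta(days=365 * 7)

def apply_retention(doc: Document, now: datetime) -> Document:
    """Archive documents nearing the end of the retention window,
    and mark fully aged-out documents for deletion."""
    age = now - doc.ingested_at
    if age > RETENTION:
        doc.state = "deleted"
    elif age > RETENTION - timedelta(days=30):
        doc.state = "archived"  # grace period before hard deletion
    return doc
```

In a real system this check would run as a scheduled job over the index, and "deleted" would trigger removal from every derived store (embeddings, caches, backups) to actually satisfy compliance.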
Data Indexing & Hybrid Search
- Data indexing and the operation of hybrid search within a large-scale enterprise context are complex, ongoing endeavors that extend well beyond the initial setup. Indexing means creating structured, searchable representations of vast datasets, allowing efficient and precise information retrieval. Hybrid search amplifies this complexity by combining different search methodologies, typically lexical (keyword) and semantic (vector) retrieval, to deliver more comprehensive and relevant results.
- Maintaining such a large-scale index over time is non-trivial; it necessitates continuous updates and refinement to ensure the accuracy and relevance of the retrieved data, reflecting the most current state of information.
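One common way to combine lexical and vector results, sketched here for illustration, is reciprocal rank fusion (RRF): each result list contributes a score of 1/(k + rank) per document. The example lists and the constant k=60 (a commonly used default) are assumptions; production systems would tune this and fuse at index-engine level.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. one from BM25, one from
    an embedding index) by summing 1/(k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical lexical results
vector_hits  = ["doc_b", "doc_d", "doc_a"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents ranked well by both methods (here `doc_b` and `doc_a`) rise to the top, which is exactly the "more comprehensive and relevant results" the hybrid approach is after.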
Enterprise Security & Access Control at Scale
- Enterprise security, especially respecting complex Access Control Lists (ACL), is a crucial aspect of data management and system interactions in any organization. Proper implementation of ACLs is paramount to ensure that every user gains access only to the resources they are permitted to use, aligning with their role and responsibilities within the organization.
- The task is not just about setting up strict access controls but maintaining and updating them in a dynamically changing index (see previous point) to adapt to evolving organizational structures and roles.
- Any cloud system interaction with an enterprise opens attack vectors. The setup of such a system needs to be well thought through. We'll deal with that in a separate post (and keep it short by pointing to our ISO 27001 certification).
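To make the ACL point concrete, here is a deliberately minimal sketch of query-time permission filtering: each retrieved document carries the groups allowed to see it, mirrored from the source system's ACL. The field names and group names are illustrative assumptions; real deployments push ACL filters down into the index rather than filtering after retrieval.

```python
def filter_by_acl(results, user_groups):
    """Drop retrieved documents whose ACL has no overlap with the
    requesting user's group memberships."""
    return [r for r in results if r["allowed_groups"] & user_groups]

hits = [
    {"doc_id": "handbook", "allowed_groups": {"all-staff"}},
    {"doc_id": "salaries", "allowed_groups": {"hr", "finance"}},
]
visible = filter_by_acl(hits, user_groups={"all-staff", "engineering"})
```

Note that post-retrieval filtering like this is only a last line of defense: if ACLs change in the source system, the mirrored `allowed_groups` in the index must be updated too, which is the maintenance burden described above.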
Chat User Interface
- Building an adaptable chat interface is relatively straightforward. Integrating it with value-added services/agents is where the real challenge lies.
- Recommendations, next-best-action tasks, and (semi) autonomous automation are more difficult to implement and integrate as they require a whole lot more scaffolding behind the scenes.
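A hint of the scaffolding behind such value-added services: the chat front end needs some dispatch layer deciding which backend handles each message. The keyword-based router below is a toy stand-in (all service names are made up); a real system would use an intent classifier plus business rules.

```python
def route(message: str) -> str:
    """Naive dispatcher from a chat message to a value-added service.
    Illustrative only: real routing uses intent classification."""
    message = message.lower()
    if "recommend" in message:
        return "recommendation-service"   # hypothetical service name
    if "schedule" in message or "book" in message:
        return "automation-service"       # hypothetical service name
    return "qa-service"                   # default: retrieval + answer
```

Even this trivial version shows why the chat window is the easy part: each branch implies a full service with its own data, permissions, and failure modes behind it.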
Comprehensive System Interaction
- Developing a system that integrates interactions with indices, Large Language Models (LLM), and performs entailment checks of answers is a multidimensional challenge.
- Building a comprehensive Information Retrieval (IR) stack is an intricate endeavor. It demands meticulous consideration of the types and sources of data to be incorporated, so that the system genuinely understands the information it serves. By accurately accounting for the diversity and nature of the data, the system can significantly enhance the quality and relevance of generated results, providing more precise and contextualized responses.
- In essence, the initial simplicity masks the underlying complexity and sophistication required to orchestrate coherent interactions among various components effectively.
Prompt Service
- Creating an effective prompt service to facilitate interaction with an LLM requires a nuanced approach. It involves crafting prompts that are concise, clear, and contextually rich to elicit accurate and relevant responses from the model.
- The prompts should be designed considering the model's capabilities and limitations, focusing on clarity and specificity to avoid ambiguous or generalized queries that might result in imprecise or irrelevant answers.
- Additionally, integrating adaptive mechanisms can help refine prompts based on real-time interactions and feedback, enhancing the quality of the dialogue between the user and the LLM. Balancing specificity with adaptability is key in optimizing the efficacy of prompt services interacting with LLMs.
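A minimal sketch of what a prompt service does at its core: inject retrieved context into a template under a size budget. The template wording and the character budget are illustrative assumptions; a real service would budget in tokens, track prompt versions, and adapt templates per model.

```python
# Hypothetical template; real services version and A/B-test these.
TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say so instead of guessing.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question, passages, max_context_chars=4000):
    """Concatenate retrieved passages into the template until a crude
    character budget is exhausted, keeping the prompt bounded."""
    context, used = [], 0
    for p in passages:
        if used + len(p) > max_context_chars:
            break  # drop passages that would overflow the budget
        context.append(p)
        used += len(p)
    return TEMPLATE.format(context="\n---\n".join(context), question=question)
```

The explicit "say so instead of guessing" instruction is one concrete way to trade ambiguity for specificity, as the bullet above suggests.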
Chain of Reasoning
- The implementation of a chain of reasoning represents a sophisticated progression in the journey of developing intelligent systems. It transcends the rudimentary interaction levels, enabling systems to engage in continuous, meaningful dialogue by logically connecting multiple pieces of information.
- This involves not just processing individual queries but understanding and relating pieces of information in a coherent, logical sequence, allowing the system to provide more nuanced, contextual, and insightful responses. It represents a shift from isolated retrieval-and-response mechanisms to a more integrated interaction model, in which the system can comprehend, relate, and extrapolate information across multiple interaction points, paving the way for more advanced, context-aware conversational experiences.
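The chain-of-reasoning idea can be sketched as a retrieve-reason loop: the model may issue follow-up queries whose results are added to the accumulated evidence before it commits to an answer. Here `retrieve` and `llm` are stand-in callables, and the two-tuple protocol for the model's reply is an assumption made for the sketch.

```python
def answer_with_reasoning(question, retrieve, llm, max_steps=3):
    """Iterative retrieve-reason loop. `llm(question, evidence)` is
    assumed to return either ("followup", next_query) or ("final", answer)."""
    evidence = []
    query = question
    for _ in range(max_steps):
        evidence.extend(retrieve(query))     # gather new facts for this step
        kind, payload = llm(question, evidence)
        if kind == "final":
            return payload
        query = payload                      # chase the follow-up question
    return llm(question, evidence)[1]        # step budget exhausted: force an answer
```

The `max_steps` cap matters in practice: without it, a model that keeps asking follow-ups burns retrieval and LLM budget without converging.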
Enterprise Integration
- Integrating a RAG system into an existing enterprise setup is the next step, and it is often more intricate than anticipated, especially when striving to avoid 'yet another dashboard' fatigue once the initial novelty wears off.
- Such integration is not about just plugging in a new component; it demands comprehensive Software Development Kits (SDKs) and thorough interoperability assessments to ensure seamless interaction within the existing technological ecosystem.
- While APIs can offer pathways for integration, relying solely on them is insufficient: they are components of the solution, not the complete answer. Achieving seamless integration means harmonizing new capabilities with established systems, requiring meticulous planning, execution, and ongoing refinement.
Continuous Operation
- The continuous operation of advanced systems demands attention to updates, upgrades, and enhancements to sustain optimal performance and adapt to evolving needs. This ongoing endeavor is not only about maintaining the system but also about refining and advancing it continuously.
- A notable point is that the talents who develop such systems are often not the ones who manage them in the long run. The industry is dynamic, and the risk of skilled developers being recruited away is ever-present.
Cost Management
- Cost considerations are paramount when scaling technologies like LLMs within a company. Early trials, while revealing, often expose vast amounts of data to LLMs; once scaled, those costs escalate significantly. LLM operations tend to be 10-20x more expensive than classic retrieval. The key is a system that strikes a good balance between the two.
- Operating sophisticated technologies at scale over time, especially with little prior experience, can lead to painful lessons learned in navigating diverse environments and addressing unforeseen challenges. Furthermore, the financial implications extend beyond operational costs to include maintenance, updates, employee training, and support.
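The cost trade-off above can be made tangible with back-of-the-envelope arithmetic. The per-query prices below are invented for illustration (chosen so an LLM call is ~15x a classic retrieval, within the 10-20x range discussed); only the ratio matters.

```python
# Hypothetical unit costs, in currency units per query.
RETRIEVAL_COST = 0.001
LLM_COST = 0.015  # ~15x retrieval, per the 10-20x range above

def monthly_cost(queries_per_day, llm_fraction):
    """Blended monthly cost when only a fraction of queries, after
    retrieval, are routed on to the LLM for generation."""
    per_query = ((1 - llm_fraction) * RETRIEVAL_COST
                 + llm_fraction * (RETRIEVAL_COST + LLM_COST))
    return queries_per_day * 30 * per_query

# Sending every query to the LLM vs. only the 20% that need generation:
all_llm = monthly_cost(10_000, llm_fraction=1.0)
balanced = monthly_cost(10_000, llm_fraction=0.2)
```

At these assumed prices, routing only a fifth of traffic through the LLM cuts the monthly bill to a quarter, which is the "good balance between both" the bullet calls for.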
Building a perfect RAG stack is not as simple as mixing a few ingredients. It is a meticulous process riddled with complexities and steep learning curves, spanning everything from data management to enterprise integration, continuous operation, costs, and beyond. Use this checklist to track the ten considerations when building your RAG stack.