Retrieval-Augmented Generation (RAG) systems promise to make AI smarter by pulling in relevant data before generating responses. But as Joe Antelmi, Senior Director Analyst at Gartner, points out, it’s not that simple.
The biggest challenges aren’t just about picking the right model; they’re about everything that happens before and after the model does its job.
In a recent talk at the Gartner Data and Analytics Summit, Antelmi broke down the messy reality of RAG systems: outdated documents, conflicting versions, poor data quality, and the constant struggle to retrieve the right information at the right time. He emphasized that AI success isn’t just about technology; it’s about smart system design, careful data handling, and choosing the right problems to solve.
Why It Matters: A well-designed RAG system can transform how businesses access and use knowledge, making AI responses more accurate and context-aware. But a poorly built system can lead to irrelevant answers, security risks, and costly inefficiencies. By focusing on structured data ingestion, retrieval optimization, and robust evaluations, organizations can build AI that actually delivers value.
- Data Quality Is the Real Challenge: The hardest part of RAG isn’t the AI model; it’s dealing with the messy, outdated, and inconsistent documents the model relies on. Antelmi highlights problems like multiple versions of the same document, incomplete drafts, and unstructured formats. Organizations need to focus on better metadata, document structuring, and ingestion pipelines to ensure retrieval accuracy. Some companies try to apply quick fixes, but improving underlying processes, like enforcing better document standards, yields more reliable AI performance.
- Chunking & Retrieval Matter More Than You Think: While long-context models can handle more data, chunking is still essential for cost efficiency and search accuracy. Antelmi discusses various chunking techniques, emphasizing that fixed-size chunking is too simplistic because it cuts off context awkwardly. Semantic chunking and content-aware chunking are better alternatives, helping RAG systems preserve meaning across sections. He also mentions small-to-big retrieval, where AI first retrieves summaries of documents before diving into specific sections, improving speed and accuracy.
- Better Search = Better Answers: Many organizations assume vector databases are the key to better search, but Antelmi cautions that simply switching databases won’t fix retrieval issues. Instead, he recommends hybrid search, which combines keyword-based and vector-based retrieval to improve accuracy. Techniques like query rewriting (where AI reformulates vague questions) and re-ranking (retrieving more documents and filtering the best) help refine results. He also discusses self-querying, where AI evaluates its own retrieval quality and refines search results dynamically.
- Guardrails & Monitoring Are Non-Negotiable: AI doesn’t always behave as expected, and poorly monitored systems can produce hallucinations, sensitive data leaks, or security vulnerabilities. Antelmi highlights the importance of AI gateways, which function like API gateways but manage input validation, secure data handling, and cost controls. Observability tools help track hallucinations, policy violations, and response quality. He warns that AI security is still an evolving space, and businesses must layer multiple safeguards to prevent unexpected failures.
- It’s About the System: The best AI solutions aren’t just about picking the most advanced model. They require thoughtful design, user-friendly interfaces, and problem-solving strategies that align with business needs. Many organizations think AI success is about choosing the best large language model (LLM), but Antelmi argues that models only account for 10-20% of the work. The real effort is in data ingestion, preprocessing, and retrieval tuning. He also stresses the importance of choosing the right AI problem: many businesses fail because they tackle overly broad use cases (e.g., searching all documents in an enterprise). Instead, successful RAG implementations focus on niche, well-defined problems with structured, high-quality data.
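Antelmi's chunking point is easy to see in code. The sketch below contrasts naive fixed-size chunking, which cuts context awkwardly, with a simple sentence-aware alternative of the kind he favors. The chunk sizes and the sentence-splitting heuristic here are arbitrary choices for the demo, not specifics from the talk.

```python
import re

def fixed_size_chunks(text, size=40):
    """Cut the text every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_aware_chunks(text, max_size=80):
    """Greedily pack whole sentences into chunks of at most `max_size` chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("RAG systems retrieve documents before generating. "
        "Chunking decides what a retrievable unit is. "
        "Bad chunks cut context awkwardly.")

print(fixed_size_chunks(text)[0])      # ends mid-word, context lost
print(sentence_aware_chunks(text)[0])  # ends at a sentence boundary
```

Real content-aware chunkers also respect headings, tables, and code blocks, but the principle is the same: chunk boundaries should follow meaning, not byte counts.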
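The hybrid search idea can also be sketched in a few lines. In this toy version, a keyword score is blended with a "vector" similarity and the top results are kept; production systems would use BM25 and learned embeddings, whereas here bag-of-words counts stand in for both, and the 0.5/0.5 blend weight is an illustrative assumption.

```python
from collections import Counter
from math import sqrt

DOCS = [
    "reset your password from the account settings page",
    "the password policy requires twelve characters",
    "contact support to close your account",
]

def keyword_score(query, doc):
    """Fraction of query terms that appear verbatim in the document."""
    terms, doc_terms = query.split(), set(doc.split())
    return sum(t in doc_terms for t in terms) / len(terms)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, docs, alpha=0.5, k=2):
    """Score each doc by alpha * keyword + (1 - alpha) * vector, return top k."""
    query_vec = Counter(query.split())
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * cosine(query_vec, Counter(d.split())), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

print(hybrid_search("reset account password", DOCS))
```

Re-ranking follows the same pattern: retrieve a generous candidate set with a cheap score, then re-score the candidates with a more expensive model before answering.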
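Finally, the input-validation side of an AI gateway can be illustrated with a minimal pre-flight check that runs before a prompt ever reaches a model. The specific patterns and the hard-block decision below are illustrative assumptions; real gateways layer many more safeguards (output filtering, rate limits, cost controls).

```python
import re

# Illustrative patterns only; a production gateway would use far richer detectors.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def validate_prompt(prompt, max_len=2000):
    """Return (allowed, reasons): block oversized prompts and ones that
    appear to carry sensitive data."""
    reasons = []
    if len(prompt) > max_len:
        reasons.append("prompt too long")
    for name, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            reasons.append(f"possible {name} detected")
    return (not reasons, reasons)

ok, why = validate_prompt("My SSN is 123-45-6789, can you file this?")
print(ok, why)  # blocked: SSN-like pattern found
```

The same gateway layer is a natural place to log every decision, which is what makes the observability and policy-violation tracking Antelmi describes possible.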