AI Search

2025-02-10 | By Mariusz Jażdżyk

The Illusion

I remember attending a course where the instructor, with the seriousness of a creator, proclaimed:

“...and in a moment, I will create a data warehouse… oh, I just did.”

Many of us think, “I'll take course X and master it myself!”

In theory, yes. In practice, however, once we enter large organizations, we quickly realize that this “act of creation” doesn’t take five minutes, five weeks, or even five months. It often spans several years. At least, that was my experience.

Even at the Proof of Concept (PoC) stage, key integrations tend to behave differently than the theory suggests.

Is AI Just as Difficult?

If we aim to develop a model or a Retrieval-Augmented Generation (RAG) system based on our own data, then, just as with the data warehouse, we first need to have that data. I’ll skip the discussion on data acquisition strategies for now (though I covered this in my book Chief Data Officer).

However, merely possessing data is just the beginning. It must be structured and accessible in a way that enables the Large Language Model (LLM) to retrieve information swiftly and accurately. We've already encountered this challenge while building product recommendation engines, but working with large text corpora introduces additional complexities.

Meeting User Expectations

User expectations can be highly specific, reflecting the nuances of different industries. What seems straightforward at first often turns out to be just the starting point.

Those who have tackled this challenge know exactly what I mean. Embeddings play a crucial role, but we must not overlook classic search techniques: TF-IDF, Named Entity Recognition, query speed optimization, and heuristics that sometimes improve results significantly. There’s a reason why companies like Allegro and Google employ entire teams dedicated to search and data retrieval.
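To make the "classic" side concrete, here is a minimal sketch of TF-IDF ranking in plain Python. It is an illustration only, not the system described in this article: the function names (tokenize, build_index, tfidf_search), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and a smoothed IDF table."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # count each term once per document
    # Smoothed IDF: terms that occur in fewer documents get a larger weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return term_counts, idf


def tfidf_search(query, docs, top_k=3):
    """Rank docs by the summed TF-IDF weight of the query terms."""
    term_counts, idf = build_index(docs)
    q_terms = tokenize(query)
    scored = []
    for i, counts in enumerate(term_counts):
        length = sum(counts.values()) or 1
        score = sum((counts[t] / length) * idf.get(t, 0.0) for t in q_terms)
        scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Return (document index, score) pairs, dropping non-matching documents.
    return [(i, score) for score, i in scored[:top_k] if score > 0]


docs = [
    "invoice approval procedure for suppliers",
    "holiday leave request procedure",
    "supplier invoice payment terms and deadlines",
]
results = tfidf_search("supplier invoice", docs)
print(results)  # the third document ranks first: it matches both query terms
```

Note that the first document contains "suppliers" rather than "supplier" and therefore matches only one query term. Gaps like this are where stemming, entity recognition, and hand-written heuristics start to matter.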

Achieving Speed and Accuracy

When we reach the point where we can retrieve one document out of 100,000 in just a third of a second, we are almost there.

After several setbacks, we arrived at an interesting combination: leveraging years of search expertise alongside AI-powered semantic search.
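One common way to combine a keyword ranking with a semantic ranking is Reciprocal Rank Fusion (RRF). The sketch below is an assumption about how such a combination could look, not a description of our actual pipeline. The document IDs and both input rankings are made up, and k=60 is simply a conventional default for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword search and an embedding-based search.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF works only on rank positions, so the two retrievers do not need comparable score scales. This is convenient when one side returns TF-IDF weights and the other returns cosine similarities.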

The Power of AI-Powered Agents

This approach allows our AI Agent to access critical information instantly, understand context and industry-specific nuances, and provide relevant answers or suggestions to employees. And it does this not based on OpenAI’s training data or even on openly released Polish models like Bielik, but on something truly unique: hundreds of thousands of internal company documents and other confidential data.

By merging traditional search techniques with AI advancements, we create intelligent systems that go beyond generic knowledge bases and deliver precise, contextual insights for businesses.


Author: Mariusz Jażdżyk

The author is a lecturer at Kozminski University, specializing in building data-driven organizations in startups. He teaches courses based on his book Chief Data Officer, where he explores the practical aspects of implementing data strategies and AI solutions.