2024-12-08 | By Mariusz Jażdżyk
I often hear requests like "Let's build a recommender system" or, more commonly, "Let's implement AI." The appetite for AI solutions is growing rapidly.
For large corporations, this journey can feel like navigating a corporate-technological labyrinth. While startups might seem like an easier playing field, the reality is often more complex.
At the start of such a project, excitement runs high. Building a prototype within two weeks outside a corporate environment is usually feasible, which sparks great hope. However, after this initial phase, the project often hits the notorious "valley of death." If the team can navigate through this challenging phase, the result can be truly valuable.
Is it worth embarking on such a mission? How can we ensure we don't get stuck halfway?
Before diving into action, it’s crucial to set the right conditions from the start. Success in analytical projects often depends on several key factors (source: “Chief Data Officer”, https://books.chiefdataofficer.pl).
Once the groundwork is laid, it's time to move forward. In startups, flexibility is usually higher. Changing the purpose of a solution or pivoting the entire project is natural and often necessary. Without the ability to quickly adjust the tech stack and business approach, a project may fail, and its participants might shift their focus to other tasks.
Good preparation, especially learning from previous failures, is key. By analyzing past mistakes, we can avoid falling into the same traps in future projects.
While it may seem obvious that we have data available, in practice, we often find ourselves returning to the data preparation stage multiple times. Each iteration costs time and resources, limiting the number of possible adjustments. In most projects, data preparation consumes about 80% of the resources, both in corporations and startups. It’s worth investing in methods to speed up this process drastically to enable faster testing of results and provide room for more iterations. The difference can be stark: I’ve seen projects where cycles took either four hours or four months — two completely different scenarios.
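One practical way to shorten that cycle is to checkpoint intermediate results, so each new iteration starts from the last good dataset rather than from the raw sources. Below is a minimal sketch, assuming a pandas-based pipeline with local Parquet checkpoints; the paths, function names, and steps are illustrative, not taken from any specific project.

```python
from pathlib import Path

import pandas as pd

CACHE_DIR = Path("prep_cache")  # illustrative location for intermediate datasets
CACHE_DIR.mkdir(exist_ok=True)

def cached_step(name: str, build, force: bool = False) -> pd.DataFrame:
    """Return an intermediate dataset, rebuilding it only when forced or missing."""
    target = CACHE_DIR / f"{name}.parquet"
    if target.exists() and not force:
        return pd.read_parquet(target)  # reuse the previous run's result
    df = build()                        # the expensive part: extraction, joins, cleaning
    df.to_parquet(target)
    return df

# Illustrative usage: each iteration reuses earlier steps and only recomputes
# the step that actually changed.
# raw = cached_step("raw_products", load_from_sources)
# clean = cached_step("clean_products", lambda: clean_products(raw), force=True)
```

The point is not the specific tooling but the cycle time: if a rerun takes minutes instead of days, the team gets far more iterations out of the same budget.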
Additionally, managing the resulting technical debt is crucial. Like any debt, it can either act as a lever for growth or become a burdensome expense.
A common assumption is that "the data is already there." This can be a major pitfall. Yes, data exists, but it often requires significant cleaning and preparation before it becomes usable. It might take seven iterations of data transformations before reaching a truly effective dataset. Many projects don’t survive long enough to reach this point due to limited budgets or waning patience from stakeholders.
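One way to survive those iterations is to treat each round of cleaning as a named, re-runnable step, so the whole sequence stays reproducible and every new fix simply appends another step. A rough sketch, assuming tabular data in pandas; the column names and rules are invented for illustration.

```python
import pandas as pd

def drop_duplicate_products(df: pd.DataFrame) -> pd.DataFrame:
    # Iteration 1: the same product arrives from several feeds.
    return df.drop_duplicates(subset="product_id")

def normalize_units(df: pd.DataFrame) -> pd.DataFrame:
    # Iteration 2 (invented rule): unify weights reported in grams and kilograms.
    df = df.copy()
    in_grams = df["unit"] == "g"
    df.loc[in_grams, "weight"] = df.loc[in_grams, "weight"] / 1000
    df.loc[in_grams, "unit"] = "kg"
    return df

# Each new round of cleaning appends a step here instead of rewriting the pipeline,
# so the full sequence stays cheap to re-run and easy to audit.
CLEANING_STEPS = [drop_duplicate_products, normalize_units]

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    for step in CLEANING_STEPS:
        df = step(df)
    return df
```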
Today, we have access to powerful tools like Large Language Models (LLMs) operating in the cloud or on local machines. Although using generative AI is tempting, it’s not always the best solution. During training sessions, tasks are often completed swiftly and efficiently, but the real challenge lies in integrating and operationalizing these solutions.
Pilot results obtained within a few days might showcase the potential feasibility of the solution. However, this is far from full implementation. To create a fully functional system, many additional features need to be developed — features that may not be exciting but are essential for the final deployment. Relying solely on pilot results can lead to excessive optimism.
In one such project, a food-tech recommender system, we managed to organize the data, integrate information about hundreds of thousands of products from various sources, and adapt it to the needs of users in the food industry. We made many mistakes along the way but learned valuable lessons. We repeatedly adjusted our approach to the data and rules, gradually increasing the system’s accuracy.
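How the sources were merged is not described here, but a common first pass for this kind of integration is to match records on a shared identifier such as a barcode and fall back to a normalized name for the rest. The sketch below is hypothetical; the field names (ean, name) are assumptions, not the fields of the actual system.

```python
import re

import pandas as pd

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so names from different feeds align."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", name.lower())).strip()

def merge_sources(source_a: pd.DataFrame, source_b: pd.DataFrame) -> pd.DataFrame:
    """Join on the barcode first, then match the leftovers on a normalized product name."""
    matched = source_a.merge(source_b, on="ean", suffixes=("_a", "_b"))
    leftovers = source_a[~source_a["ean"].isin(matched["ean"])].assign(
        name_key=lambda d: d["name"].map(normalize_name)
    )
    candidates = source_b.assign(name_key=lambda d: d["name"].map(normalize_name))
    fallback = leftovers.merge(candidates, on="name_key", suffixes=("_a", "_b"))
    return pd.concat([matched, fallback], ignore_index=True)
```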
By utilizing classical algorithms (not just LLMs), we developed a unique architecture that continuously evolved at all levels. As a result, it now surpasses the capabilities of top market experts in the food industry and may soon help you during your shopping!
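The actual architecture is not disclosed here, but "classical algorithms" in recommender systems often means something like item-to-item similarity computed from product descriptions or purchase histories. A generic, minimal sketch using scikit-learn; the toy catalogue and the TF-IDF approach are illustrative assumptions, not the method described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented toy catalogue; a real system would hold hundreds of thousands of items.
products = [
    "whole grain rye bread",
    "gluten free white bread",
    "natural greek yogurt",
    "vanilla yogurt dessert",
]

# Represent each product by TF-IDF weights over its description and
# recommend the items most similar to the one being viewed.
vectors = TfidfVectorizer().fit_transform(products)
similarity = cosine_similarity(vectors)

def recommend(index: int, top_n: int = 2) -> list[str]:
    ranked = similarity[index].argsort()[::-1]
    return [products[i] for i in ranked if i != index][:top_n]

print(recommend(0))  # items most similar to "whole grain rye bread"
```

Even a baseline like this is cheap to run, easy to explain, and gives the iterative loop described above something concrete to improve on.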
The "valley of death" phase is a common occurrence in many AI projects. This is the period when the algorithm is still underperforming compared to human experts. However, with rapid iterations, patience, advanced technology, and a solid plan, the developed solution can eventually outperform humans and scale effectively.
This is where the real value lies — and it's worth fighting for.
Author: Mariusz Jażdżyk
The author is a lecturer at Kozminski University, specializing in building data-driven organizations in startups. He teaches courses based on his book Chief Data Officer, where he explores the practical aspects of implementing data strategies and AI solutions.