2025-03-13

Upskilling in AI - Our bootcamp approach at Spaceteams

Software Engineering
Learning & Growth

WRITTEN BY

Carola


At Spaceteams, we normally dedicate up to one day per week to Spacetime, our initiative where team members explore new technologies, best practices and industry trends.

Last December, during the year-end slowdown, we took a different approach: two of our colleagues prepared and hosted a two-week bootcamp to dig into AI. But AI is a vast topic that we could easily spend months on.

So we decided to focus on building a RAG pipeline from scratch.


Why focus on RAG?

There are many great tutorials, blog posts and courses about RAG pipelines out there, explaining what they are and how they work. So rather than repeating that, we want to emphasize the benefits for businesses that have to decide how to include AI in their products.

A Retrieval-Augmented Generation (RAG) pipeline lets you leverage the capabilities of a large language model (LLM) for a specific use case, without having to train an entire model on that data set.

Example: You need an AI Assistant for your customer care department.

Maybe you are familiar with pre-trained LLMs such as ChatGPT or Claude, which are capable of “answering” questions based on the information they were trained on. Answering here means the LLM can process a question and formulate an understandable response.

The best public LLMs have (most probably) not been trained on your company-specific internal data. So if you want to use AI to enhance your customer care, you have a decision to make:

  • build and train an LLM for your specific area of expertise,
  • or enhance a pre-trained LLM with your specific domain knowledge and use its language-processing capabilities.

Both options have their pros and cons.

On the one hand, training an LLM requires highly specialized skills, large data sets for training and testing, and substantial computing power. As a result, you get your own LLM, trained to your needs.

On the other hand, enhancing a pre-trained LLM for your domain requires high-quality input data, an understanding of how to protect both your domain knowledge and the LLM from abuse, and optimization of the LLM's responses for your use case. In return, you benefit from all the advantages of a ready, well-trained LLM while providing customer support grounded in your specific internal knowledge base. That is exactly what a RAG pipeline is for.
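The idea can be sketched in a few lines. This is a minimal, self-contained illustration of the retrieve-then-prompt pattern: the `knowledge_base` entries are invented examples, and the bag-of-words `embed` function is a toy stand-in for a real embedding model, which a production pipeline would call via an API or run locally.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. A real pipeline would
    # use an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: embed every knowledge-base chunk once, up front.
knowledge_base = [
    "Refunds are processed within 14 days of the return.",
    "Support is available Monday to Friday, 9am to 5pm.",
]
index = [(chunk, embed(chunk)) for chunk in knowledge_base]

# 2. Retrieval: find the chunks most similar to the user's question.
def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 3. Augmented generation: ground the LLM prompt in the retrieved context,
#    then send it to the pre-trained model of your choice.
def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("How long do refunds take?")` produces a prompt grounded in the refund-policy chunk, so the LLM answers from your data rather than from its training set.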

Why build a RAG pipeline from scratch?

In our experience, few businesses have the skills, the data, or the time and capacity to train their own model. The skills needed to build a RAG pipeline, on the other hand, are similar to those needed for other software engineering tasks, such as connecting APIs. And instead of training an LLM, an existing one can be used, usually for a license fee.

Even though there are services out there that offer to build the entire RAG pipeline for you, there are several tricky parts you should know and understand well in order to use an LLM in the most beneficial way:

  • Embedding and indexing data: This is about how to structure and feed your business-specific information into the pipeline to ensure efficient user interaction later.
  • Performance validation: This is about how to establish benchmarks that measure and compare the effectiveness of your pipeline on the factors relevant to your use case.
  • Iterative optimization: A generic RAG pipeline enhanced with your data will probably already produce decent results. To optimize for your use case, based on the data you have available, this additional step fine-tunes aspects like chunking, query relevance and response quality.

How to create real value

We did not just want to understand what a RAG pipeline is. We wanted to turn theory into practice. For that we needed examples. So the first bootcamp task was to find data sources and an understandable use case for that data. We came up with two projects:

  1. AI Slack Assistant: This tool uses internal Slack data, for example to support the onboarding of new colleagues and to identify subject-matter experts within the team, a practical step toward improving team efficiency.
  2. News Insights Tool: By analyzing public news collections, we explored how AI could surface actionable insights from vast datasets.

During the bootcamp, each project team had to understand their data set and develop the use case to build and optimize their RAG pipeline. The AI Slack Assistant team struggled with the low context of chat messages, while the News Insights team had to deal with long news articles that couldn't be condensed into single key points. Both teams started out with the exact same RAG setup, but in order to create value for their individual use cases, they had to adapt the pipeline in several different ways.

These projects underscored the value of going beyond "out-of-the-box" AI solutions. As in many other cases, the devil is in the details: unless you really understand how a RAG pipeline works and what you want to achieve, you may struggle to generate good results with it.


Our way forward

The bootcamp was an intense but rewarding journey. Armed with deeper AI expertise, we're already applying these insights in our first AI projects, ensuring our clients stay ahead in an AI-powered world. And we continue to learn and expand our AI knowledge in order to leverage it for our customers.

Stay tuned.