RAG vs. Fine-Tuning
Choosing the Right Method for External Knowledge
In AI development, incorporating proprietary data and external knowledge is crucial. Two key methodologies are Retrieval Augmented Generation (RAG) and fine-tuning. Here’s a quick comparison.
Retrieval Augmented Generation (RAG)
RAG combines an LLM’s reasoning with external knowledge through three steps:
1️⃣ Retrieve: Identify related documents from an external knowledge base.
2️⃣ Augment: Enhance the input prompt with these documents.
3️⃣ Generate: Produce the final output using the augmented prompt.
The retrieve step is pivotal, especially when dealing with large knowledge bases. Vector databases are often used to manage and search these extensive datasets efficiently.
Implementing a RAG-Chain with Vector Databases: time to recall the post “AI concept in a Nutshell: LLM series – Embeddings & Vectors” from 1 month ago!
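To make the three steps concrete, here is a minimal, self-contained sketch of a RAG chain. The toy embed function, the in-memory knowledge base, and the llm_generate placeholder are illustrative assumptions; a real setup would use an embedding model, a vector database, and an actual LLM call.

```python
# Minimal RAG sketch: retrieve -> augment -> generate.
# embed() and llm_generate() are placeholders for a real embedding model
# and a real LLM API call (both hypothetical here).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hash words into a small fixed-size vector.
    # A real system would call an embedding model instead.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1) Retrieve: rank documents from the knowledge base by cosine similarity.
knowledge_base = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am-6pm CET.",
    "Premium plans include priority support and an SLA.",
]
doc_vectors = np.stack([embed(doc) for doc in knowledge_base])

def retrieve(question: str, k: int = 2) -> list[str]:
    scores = doc_vectors @ embed(question)
    return [knowledge_base[i] for i in np.argsort(scores)[::-1][:k]]

# 2) Augment: put the retrieved documents into the prompt.
def build_prompt(question: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# 3) Generate: send the augmented prompt to the LLM (placeholder call).
def llm_generate(prompt: str) -> str:
    return f"[LLM answer based on]\n{prompt}"

print(llm_generate(build_prompt("When can I reach support?")))
```

The retrieve step is exactly where a vector database would replace this brute-force similarity search once the knowledge base grows.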
Fine-Tuning
Fine-tuning adjusts the LLM’s weights using proprietary data, extending its capabilities to specific tasks.
Approaches to Fine-Tuning:
1️⃣ Supervised Fine-Tuning (SFT): Uses demonstration data with input-output pairs (sketched below).
2️⃣ Reinforcement Learning from Human Feedback (RLHF): Requires human-labeled preference data and optimizes the model based on quality scores.
Both approaches need careful decision-making and can be complex.
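As a rough sketch of the supervised approach, the loop below nudges a small causal LM toward demonstration pairs. The model name, the toy pairs, and the hyperparameters are assumptions for illustration, not a production recipe.

```python
# Supervised fine-tuning sketch: adjust model weights on input-output pairs.
# The model name and the demonstration data are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Demonstration data: prompt + expected completion pairs.
pairs = [
    ("Translate to French: Hello", " Bonjour"),
    ("Translate to French: Thank you", " Merci"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for prompt, completion in pairs:
        batch = tokenizer(prompt + completion, return_tensors="pt")
        # For causal LMs, passing labels = input_ids yields the next-token loss.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice you would usually mask the prompt tokens out of the loss so only the completion is learned, and rely on a training framework rather than a hand-written loop.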
When to Use RAG or Fine-Tuning?
· RAG: Great for adding factual knowledge without altering the LLM. Easy to implement but adds extra components.
· Fine-Tuning: Best for specializing in new domains. Offers full customizability but requires labeled data and expertise. May cause catastrophic forgetting.
Choose based on your needs and resources. Both methods have their strengths and challenges, making them valuable tools in AI development.
LLM Temperature
I love the analogy shared during one of the break-out sessions at the OVHcloud summit: LLM temperature is like blood alcohol level – the higher it is, the more unexpected the answers!
To be more specific, temperature is a parameter that controls the randomness of the model’s output. A higher temperature encourages more diverse and creative responses, while a lower temperature makes the output more deterministic and predictable.
Key Points:
- High Temperature: More random and creative outputs, useful for brainstorming and generating novel ideas.
- Low Temperature: More predictable and coherent outputs, ideal for factual information and structured content.
Understanding and adjusting the temperature can help tailor LLM outputs to specific needs, whether you’re looking for creativity or precision.
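Under the hood, temperature typically divides the logits before the softmax that turns them into token probabilities. Here is a minimal sketch with made-up logits:

```python
# Temperature scaling sketch: divide logits by T before the softmax.
# The logits below are made-up numbers for illustration.
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])  # scores for four candidate tokens

for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
# Low T concentrates probability on the top token (predictable output);
# high T flattens the distribution (more diverse sampling).
```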
Low-rank adaptation (LoRA): Turning LLMs and other foundation models into specialists
Before a foundation model is ready to take on real-world problems, it's typically fine-tuned on specialized data and its billions, or trillions, of weights are recalculated. This style of conventional fine-tuning is slow and expensive.
LoRA is a quicker solution. With LoRA, you fine-tune a small subset of the base model's weights, creating a plug-in module that gives the model expertise in, for example, biology or mathematical reasoning at inference time. Like custom bits for a multi-head screwdriver, LoRAs can be swapped in and out of the base model to give it specialized capabilities.
By detaching model updates from the model itself, LoRA has become the most popular of the parameter-efficient fine-tuning (PEFT) methods to emerge with generative AI.
The LoRA approach also makes it easier to add new skills and knowledge without overwriting what the model previously learned, a phenomenon known as catastrophic forgetting. LoRA offers a way to inject new information into a model without sacrificing performance.
But perhaps its most powerful benefit comes at inference time. Loading LoRA updates on and off a base model with the help of additional optimization techniques can be much faster than switching out fully tuned models. With LoRA, hundreds of customized models or more can be served to customers in the time it would take to serve one fully fine-tuned model.
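To see why the adapters are so small, here is a minimal numpy sketch of the low-rank update. The hidden size, rank, and scaling factor are illustrative assumptions; in practice you would use a PEFT library rather than hand-rolled matrices.

```python
# LoRA sketch: learn a low-rank update B @ A instead of touching W directly.
# Dimensions and scaling are illustrative, not taken from any specific model.
import numpy as np

d, r = 1024, 8                            # hidden size and LoRA rank (assumptions)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen base weight (~1M parameters)
A = rng.standard_normal((r, d)) * 0.01    # trainable, r x d
B = np.zeros((d, r))                      # trainable, d x r (zero-init so the update starts at 0)

def forward(x: np.ndarray, alpha: float = 16.0) -> np.ndarray:
    # Base path plus scaled low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
y = forward(x)                            # same shape as a normal forward pass
print("base params:", W.size, "| adapter params:", A.size + B.size)
# ~1,048,576 frozen weights vs ~16,384 trainable ones: the adapter is the
# plug-in module that can be swapped in and out of the base model.
```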
Chain of Thought (CoT)
One of the most effective techniques to improve the performance of a prompt is the Chain of Thought (CoT).
The principle is to break a problem down into several intermediate tasks, or to first summarize all the data available upstream.
Examples:
1) Content generation: Rather than directly asking the LLM to generate a summary, we can first ask it to generate a table of contents and then the summary.
The final result therefore has a better chance of covering the full content of the initial document (see the sketch after these examples).
2) Creation of structured entities: CoT lets you control the output format for structured entities (for example, a YAML format that has been detailed in the prompt).
Such examples greatly reduce the chance of hallucinations in the LLM's final response.
3) Agent Workflow: For an LLM agent, it is preferable to ask the LLM to explicitly generate a plan of the actions it must perform to accomplish its task.
“Start by making a plan of the actions you will have to perform to solve your task. Then answer with the tools you want to use in a YAML block, example: […]”
4) RAG Generation: Using CoT can improve the quality of the generated response while reducing hallucinations. Asking the LLM to first summarize the relevant information contained in the retrieved documents strengthens that semantic field before it generates the final response.
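Here is the sketch promised in example 1: a two-step chain that first asks for a table of contents and then for the summary. The llm function is a stand-in for any chat-completion call (hypothetical here), and the document variable is a placeholder.

```python
# Chain-of-thought sketch for content generation:
# first ask for a table of contents, then for the summary.
# llm() is a placeholder for any chat-completion call.

def llm(prompt: str) -> str:
    return f"[model output for]\n{prompt}\n"

document = "..."  # the source document to summarize (placeholder)

# Step 1: intermediate reasoning artifact (table of contents).
toc = llm(f"List the main sections of this document as a table of contents:\n{document}")

# Step 2: final generation, grounded in the intermediate step.
summary = llm(
    "Using this table of contents as a checklist, write a summary that "
    f"covers every section:\n{toc}\nDocument:\n{document}"
)
print(summary)
```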
Conclusion
The chain of thought strengthens the semantic field of the expected result in two different ways:
- Explicit chain of thought: by giving examples of the expected result in the prompt
- Internal chain of thought: by telling the LLM to generate intermediate “reasoning” steps
This increases latency and cost, but in many situations the result is worth the investment.
Fostering and enhancing impactful collaborations with a diverse array of AI partners, spanning Independent Software Vendors (ISVs), Startups, Managed Service Providers (MSPs), Global System Integrators (GSIs), subject matter experts, and more, to deliver added value to our joint customers.
CAREER JOURNEY: Got a Master's degree in IT in 2007 and started my career as an IT Consultant.
In 2010, had the opportunity to develop business in the early days of the new Cloud Computing era.
In 2016, started to witness the POWER of Partnerships & Alliances to fuel business growth, especially as we became the main go-to MSP partner for French healthcare projects from our hyperscaler.
Decided to double down on this approach through various partner-centric positions: as a managed service provider or as a cloud provider, for Channels or for Tech alliances.
Now happy to lead an ECOSYSTEM of AI players, each bringing high value to our joint customers.