Have a look at our previous blog posts:
– Enhance your applications with AI Endpoints
– How to use AI Endpoints and LangChain4j
– LLMs streaming with AI Endpoints and LangChain4j
– How to use AI Endpoints and LangChain to create a chatbot
– How to use AI Endpoints, LangChain and Javascript to create a chatbot
In the world of generative AI with LLMs, LangChain is one of the most popular frameworks used to simplify working with LLMs through API calls.
LangChain’s tools and APIs simplify the process of building LLM-driven applications such as chatbots and virtual agents.
LangChain is designed to be used with Python and JavaScript.
And, of course, we’ll use our AI Endpoints product to access various LLMs 🤩.
ℹ️ All the source code used in this blog post is available on our GitHub repository: public-cloud-examples/tree/main/ai/ai-endpoints/python-langchain-chatbot ℹ️
Retrieval Augmented Generation
Before adding the RAG feature to our chatbot (see our previous blog post on how to create a chatbot with AI Endpoints), let’s explain what Retrieval Augmented Generation (RAG) is.
To sum up: RAG lets you inject your own data into the context of an LLM, to help it give a better answer when you ask it about that data.
To do this, you transform your data into vectors, so that you can search it for similarities with a question (itself transformed into a vector).
Vector transformation is complex and often delegated to an embedding model. AI Endpoints offers this kind of model 😊.
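To make this more concrete, here is a minimal, self-contained sketch (separate from the chatbot below) that vectorizes two sentences and a question with the same embedding model we use later in this post, then compares them with cosine similarity. The example sentences, the question and the helper function are purely illustrative:

from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings

# Cosine similarity: dot product divided by the product of the norms
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

embeddings = OVHCloudEmbeddings(model_name="multilingual-e5-base")

# Two illustrative documents and one question, all transformed into vectors
doc_vectors = embeddings.embed_documents([
    "AI Endpoints gives you access to AI models through simple APIs.",
    "Tours is a city in the Loire Valley, in France.",
])
question_vector = embeddings.embed_query("How can I call an AI model from my application?")

# The document about AI Endpoints should get the higher similarity score
for doc_vector in doc_vectors:
    print(cosine_similarity(question_vector, doc_vector))

This is exactly the kind of similarity search that the vector store performs for us in the full example below.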
Once again, to avoid having to chain all these steps together ourselves (vectorization, search, context, …), we’ll use LangChain to do it in the following example.
How to implement RAG with AI Endpoints (and LangChain)
Be sure to have the correct dependencies in your requirements.txt:
langchain
langchain-mistralai
langchain_community
langchain_chroma
argparse
unstructured
langchainhub
Then install the dependencies:
pip3 install -r requirements.txt
Then you can develop your chatbot with the RAG feature:
import argparse
import time

from langchain import hub
from langchain_mistralai import ChatMistralAI
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Function in charge of calling the LLM model.
# The new_message parameter is the user's question.
# The function prints the LLM's answer.
def chat_completion(new_message: str):
    # No need to use a real token, but the api_key parameter is required
    model = ChatMistralAI(model="Mixtral-8x22B-Instruct-v0.1",
                          api_key="foo",
                          endpoint='https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1',
                          max_tokens=1500,
                          streaming=True)

    # Load documents from a local directory
    loader = DirectoryLoader(
        glob="**/*",
        path="./rag-files/",
        show_progress=True
    )
    docs = loader.load()

    # Split documents into chunks and vectorize them with the embedding model
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OVHCloudEmbeddings(model_name="multilingual-e5-base"))

    # Pull a ready-to-use RAG prompt from the LangChain hub
    prompt = hub.pull("rlm/rag-prompt")

    # Chain: retrieve the chunks similar to the question, inject them as
    # context in the prompt, call the model and parse the output as a string
    rag_chain = (
        {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )

    print("🤖: ")
    # Thanks to RunnablePassthrough, the chain input is the raw question
    for r in rag_chain.stream(new_message):
        print(r, end="", flush=True)
        time.sleep(0.150)

# Main entrypoint
def main():
    # User input
    parser = argparse.ArgumentParser()
    parser.add_argument('--question', type=str, default="What is the meaning of life?")
    args = parser.parse_args()
    chat_completion(args.question)

if __name__ == '__main__':
    main()
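Note that, to keep the example simple, the documents are loaded and vectorized again on every call. For a real application, you would build the vector store once and reuse it (Chroma can persist it to disk, for example).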
You can now add a text file to the rag-files folder. See the public-cloud-examples repository for an example.
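If you don’t have a document at hand, here is an illustrative example of what a file such as rag-files/ai-endpoints.txt could contain (the file name and content are just examples):

AI Endpoints is an OVHcloud product that gives developers access to AI models, such as LLMs and embedding models, through simple APIs.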
You can try your new assistant with the following command:
python3 chat-bot-streaming-rag.py --question "What is AI Endpoints?"
As a reminder, here is what the same question looks like without RAG 😉:
And that’s it!
Don’t hesitate to test our new product, AI Endpoints, and give us your feedback.
You have a dedicated Discord channel (#ai-endpoints) on our Discord server (https://discord.gg/ovhcloud), see you there!
Once a developer, always a developer!
A Java developer for many years, I had the joy of knowing JDK 1.1, JEE, Struts, … and now Spring (Core, Boot, Batch), Quarkus, Angular, Groovy, Golang, …
For more than ten years I was a Software Architect, a job that allowed me to face many of the problems inherent in the complex information systems of large organizations.
I also had other lives, notably in automation and delivery with the implementation of CI/CD chains based on Jenkins pipelines.
I particularly enjoy sharing and building relationships with developers, which led me to become a Developer Advocate at OVHcloud.
This new adventure allows me to keep using technologies I love, such as Kubernetes and AI, while continuing to learn and discover plenty of new things.
All the while keeping in mind one of my main motivations as a Developer Relation: making developers happy.
Always keen on sharing, I am the co-creator of the TADx meetup in Tours, a place for discovering and sharing around various tech topics.