RAG chatbot using AI Endpoints and LangChain


Have a look at our previous blog posts:
Enhance your applications with AI Endpoints
How to use AI Endpoints and LangChain4j
LLMs streaming with AI Endpoints and LangChain4j
How to use AI Endpoints and LangChain to create a chatbot
How to use AI Endpoints, LangChain and Javascript to create a chatbot

In the world of generative AI with LLMs, LangChain is one of the most popular frameworks used to simplify working with LLMs through API calls.

LangChain's tools and APIs simplify the process of building LLM-driven applications like chatbots and virtual agents.

LangChain is designed to be used with Python and JavaScript.

And, of course, we’ll use our AI Endpoints product to access various LLMs 🤩.

ℹ️ All the source code used in this blog post is available in our GitHub repository: public-cloud-examples/tree/main/ai/ai-endpoints/python-langchain-chatbot ℹ️

Retrieval Augmented Generation

Before adding the RAG feature to our chatbot (see the previous blog post on how to create a chatbot with AI Endpoints), let’s explain what Retrieval Augmented Generation (RAG) is.

To sum it up: RAG lets you inject your own data into the context of an LLM, helping it give a better answer when you ask questions about that data.

To do this, you transform your data into vectors, so that you can search for the pieces of data most similar to a question (itself transformed into a vector).

Vector transformation is complex and is usually delegated to an embedding model. AI Endpoints offers this kind of model 😊.
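
To make the idea concrete, here is a minimal sketch of the similarity search at the heart of RAG, using the same multilingual-e5-base embedding model as the full example later in this post (the two sample documents are purely illustrative):

from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings

# Embedding model served by AI Endpoints
embeddings = OVHCloudEmbeddings(model_name="multilingual-e5-base")

# Two illustrative documents: only the first one is relevant to the question
documents = [
    "AI Endpoints is an OVHcloud product that gives access to AI models through simple APIs.",
    "Paris is the capital of France.",
]
doc_vectors = embeddings.embed_documents(documents)
question_vector = embeddings.embed_query("What is AI Endpoints?")

# Cosine similarity between two vectors
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# The document closest to the question is the best candidate for the LLM context
scores = [cosine_similarity(question_vector, v) for v in doc_vectors]
print(documents[scores.index(max(scores))])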

Once again, to avoid having to chain all these steps (vectorization, similarity search, context injection, …) by hand, we’ll use LangChain in the following example.

How to implement RAG with AI Endpoints (and LangChain)

Be sure to have the correct dependencies in your requirements.txt:

langchain
langchain-mistralai
langchain_community
langchain_chroma
argparse
unstructured
langchainhub

Then install the dependencies:

pip3 install -r requirements.txt

Then you can develop your chatbot with the RAG feature:

import argparse
import time

from langchain import hub

from langchain_mistralai import ChatMistralAI

from langchain_chroma import Chroma

from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Function in charge of calling the LLM model.
# The question parameter is the user's question.
# The function prints the LLM answer.
def chat_completion(new_message: str):
  # No need for a real token: the api_key here is just a placeholder
  model = ChatMistralAI(model="Mixtral-8x22B-Instruct-v0.1", 
                        api_key="foo",
                        endpoint='https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1', 
                        max_tokens=1500, 
                        streaming=True)

  # Load documents from a local directory
  loader = DirectoryLoader(
     glob="**/*",
     path="./rag-files/",
     show_progress=True
  )
  docs = loader.load()

  # Split documents into chunks and vectorize them
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
  splits = text_splitter.split_documents(docs)
  vectorstore = Chroma.from_documents(documents=splits, embedding=OVHCloudEmbeddings(model_name="multilingual-e5-base"))

  # Pull a reference RAG prompt from the LangChain Hub
  prompt = hub.pull("rlm/rag-prompt")

  # Chain: retrieve relevant chunks, fill the prompt, call the model, parse the output
  rag_chain = (
    {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
  )

  print("🤖: ")
  for r in rag_chain.stream(new_message):
    print(r, end="", flush=True)
    time.sleep(0.150)

# Main entrypoint
def main():
  # User input
  parser = argparse.ArgumentParser()
  parser.add_argument('--question', type=str, default="What is the meaning of life?")
  args = parser.parse_args()
  chat_completion(args.question)

if __name__ == '__main__':
    main()

You can now add a text file to the rag-files folder; see the public-cloud-examples repository for an example.
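
For instance (the file name and its content below are purely illustrative):

mkdir -p rag-files
echo "AI Endpoints is an OVHcloud product that gives access to AI models through simple APIs." > rag-files/ai-endpoints.txt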

You can try your new assistant with the following command:

python3 chat-bot-streaming-rag.py --question "What is AI Endpoints?"

Just as a reminder, compare with the same question without RAG 😉.
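
Here is a minimal sketch for the comparison, reusing the model configuration from the example above but skipping the retrieval step entirely, so the model can only rely on its training data:

from langchain_mistralai import ChatMistralAI

# Same model configuration as in the RAG example, but with no retriever:
# the model only knows what it learned during training.
model = ChatMistralAI(model="Mixtral-8x22B-Instruct-v0.1",
                      api_key="foo",
                      endpoint='https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1',
                      max_tokens=1500)

response = model.invoke("What is AI Endpoints?")
print(response.content)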

And that’s it!

Don’t hesitate to test our new product, AI Endpoints, and give us your feedback.

You have a dedicated Discord channel (#ai-endpoints) on our Discord server (https://discord.gg/ovhcloud), see you there!

