Have a look at our previous blog posts:
– Enhance your applications with AI Endpoints
– How to use AI Endpoints and LangChain4j
– LLMs streaming with AI Endpoints and LangChain4j
– How to use AI Endpoints and LangChain to create a chatbot
– How to use AI Endpoints, LangChain and Javascript to create a chatbot
In the world of generative AI with LLMs, LangChain is one of the most popular frameworks used to simplify working with LLMs through API calls.
LangChain’s tools and APIs simplify the process of building LLM-driven applications such as chatbots and virtual agents.
LangChain is designed to be used with Python and JavaScript.
And, of course, we’ll use our AI Endpoints product to access various LLMs 🤩.
ℹ️ All the source code used in this blog post is available on our GitHub repository: public-cloud-examples/tree/main/ai/ai-endpoints/python-langchain-chatbot ℹ️
Retrieval Augmented Generation
Before adding the RAG feature to our chatbot (see our previous blog post on how to create a chatbot with AI Endpoints), let’s explain what Retrieval Augmented Generation (RAG) is.
To sum up: RAG lets you inject your own data into the context of an LLM, to help it give a better answer when you ask it about that data.
To do this, you transform your data into vectors, so that you can search it for similarities with a question (itself transformed into a vector).
Vector transformation is complex and often delegated to an embedding model. AI Endpoints offers this kind of model 😊.
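To make this more concrete, here is a minimal, self-contained sketch (separate from the chatbot below) that vectorizes two sentences and a question with the same embedding model we use later in this post, then compares them with cosine similarity. The example sentences, the question and the helper function are purely illustrative:

from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings

# Cosine similarity: dot product divided by the product of the norms
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

embeddings = OVHCloudEmbeddings(model_name="multilingual-e5-base")

# Two illustrative documents and one question, all transformed into vectors
doc_vectors = embeddings.embed_documents([
    "AI Endpoints gives you access to AI models through simple APIs.",
    "Tours is a city in the Loire Valley, in France.",
])
question_vector = embeddings.embed_query("How can I call an AI model from my application?")

# The document about AI Endpoints should get the higher similarity score
for doc_vector in doc_vectors:
    print(cosine_similarity(question_vector, doc_vector))

This is exactly the kind of similarity search that the vector store performs for us in the full example below.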
Once again, to avoid having to chain all these steps together ourselves (vectorization, search, context, …), we’ll use LangChain to do it in the following example.
How to implement RAG with AI Endpoints (and LangChain)
Be sure to have the correct dependencies in your requirements.txt:
langchain
langchain-mistralai
langchain_community
langchain_chroma
argparse
unstructured
langchainhub
Then install the dependencies:
pip3 install -r requirements.txt
Then you can develop your chatbot with the RAG feature:
import argparse
import time

from langchain import hub
from langchain_mistralai import ChatMistralAI
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings.ovhcloud import OVHCloudEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Function in charge of calling the LLM model.
# The new_message parameter is the user's question.
# The function prints the LLM's answer.
def chat_completion(new_message: str):
    # No need to use a real token, but the api_key parameter is required
    model = ChatMistralAI(model="Mixtral-8x22B-Instruct-v0.1",
                          api_key="foo",
                          endpoint='https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1',
                          max_tokens=1500,
                          streaming=True)

    # Load documents from a local directory
    loader = DirectoryLoader(
        glob="**/*",
        path="./rag-files/",
        show_progress=True
    )
    docs = loader.load()

    # Split documents into chunks and vectorize them with the embedding model
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OVHCloudEmbeddings(model_name="multilingual-e5-base"))

    # Pull a ready-to-use RAG prompt from the LangChain hub
    prompt = hub.pull("rlm/rag-prompt")

    # Chain: retrieve the chunks similar to the question, inject them as
    # context in the prompt, call the model and parse the output as a string
    rag_chain = (
        {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )

    print("🤖: ")
    # Thanks to RunnablePassthrough, the chain input is the raw question
    for r in rag_chain.stream(new_message):
        print(r, end="", flush=True)
        time.sleep(0.150)

# Main entrypoint
def main():
    # User input
    parser = argparse.ArgumentParser()
    parser.add_argument('--question', type=str, default="What is the meaning of life?")
    args = parser.parse_args()
    chat_completion(args.question)

if __name__ == '__main__':
    main()
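Note that, to keep the example simple, the documents are loaded and vectorized again on every call. For a real application, you would build the vector store once and reuse it (Chroma can persist it to disk, for example).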
You can now add a text file to the rag-files folder. See the public-cloud-examples repository for an example.
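If you don’t have a document at hand, here is an illustrative example of what a file such as rag-files/ai-endpoints.txt could contain (the file name and content are just examples):

AI Endpoints is an OVHcloud product that gives developers access to AI models, such as LLMs and embedding models, through simple APIs.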
You can try your new assistant with the following command:
python3 chat-bot-streaming-rag.py --question "What is AI Endpoints?"
As a reminder, here is what the same question looks like without RAG 😉:
And that’s it!
Don’t hesitate to test our new product, AI Endpoints, and give us your feedback.
You have a dedicated Discord channel (#ai-endpoints) on our Discord server (https://discord.gg/ovhcloud), see you there!
Once a developer, always a developer!
A Java developer for many years, I had the joy of knowing JDK 1.1, JEE, Struts, … and now Spring (Core, Boot, Batch), Quarkus, Angular, Groovy, Golang, …
For more than ten years I was a Software Architect, a job that allowed me to face many of the problems inherent in the complex information systems of large organizations.
I also had other lives, notably in automation and delivery with the implementation of CI/CD chains based on Jenkins pipelines.
I particularly enjoy sharing and building relationships with developers, which led me to become a Developer Advocate at OVHcloud.
This new adventure allows me to keep using technologies I love, such as Kubernetes and AI, while continuing to learn and discover plenty of new things.
All the while keeping in mind one of my main motivations as a Developer Relation: making developers happy.
Always keen on sharing, I am the co-creator of the TADx meetup in Tours, a place for discovering and sharing around various tech topics.