RAG chatbot using AI Endpoints and LangChain4J


If you want more information on AI Endpoints, please read the following blog post.

You can also have a look at our previous blog posts on how to use AI Endpoints.

Retrieval Augmented Generation

RAG lets you inject your own data into the context of an LLM so that it gives better answers when you ask it about that data.

To do this, you transform your data into vectors, so that you can search them for content similar to a question (itself transformed into a vector).

Vector transformation is complex and often delegated to an embedding model. AI Endpoints offers this kind of model 😊.

Once again, to avoid having to chain all these steps together ourselves (vectorization, search, context, …), we'll use LangChain4j in the following example.
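
To give a concrete intuition of what "searching for similarities" means, here is a minimal, self-contained sketch (not part of the chatbot, and the toy vectors are made up) that compares two vectors with cosine similarity; the closer the score is to 1, the more similar the two pieces of text behind them are:

public class CosineSimilarityExample {

  // Cosine similarity: dot product of the two vectors divided by the product of their norms
  static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Toy vectors standing in for the embeddings of a question and of a chunk of your data
    float[] questionVector = {0.9f, 0.1f, 0.3f};
    float[] chunkVector = {0.8f, 0.2f, 0.4f};

    System.out.println("Similarity: " + cosineSimilarity(questionVector, chunkVector));
  }
}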

How to implement RAG with AI Endpoints (and LangChain4j)?

Setup the environment

Before you begin development, you must obtain a valid token here and set it as an environment variable:

export OVHCLOUD_API_KEY=<your awesome token>
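
If you want to fail fast when the variable is missing, here is a tiny sketch (purely illustrative, not part of the final class) you can drop at the start of your main method:

    // Fail fast if the API key was not exported before running the examples
    String apiKey = System.getenv("OVHCLOUD_API_KEY");
    if (apiKey == null || apiKey.isBlank()) {
      throw new IllegalStateException("OVHCLOUD_API_KEY environment variable is not set");
    }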

Next, add the appropriate dependencies to your pom.xml file.

<?xml version="1.0" encoding="UTF-8"?>

<!-- ... -->
  <properties>
    <!-- ... -->
    <langchain4j.version>0.33.0</langchain4j.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j</artifactId>
      <version>${langchain4j.version}</version>
    </dependency>

    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-ovh-ai</artifactId>
      <version>${langchain4j.version}</version>
    </dependency>

    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-mistral-ai</artifactId>
      <version>${langchain4j.version}</version>
    </dependency>

    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-pgvector</artifactId>
      <version>${langchain4j.version}</version>
    </dependency>

    <!-- ... -->
  </dependencies>

  <!-- ... -->
</project>

See the full pom.xml file here.

ℹ️ As you can see, there is an official dependency for OVHcloud in the LangChain4j project ℹ️
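
For instance, here is a minimal sketch that uses this official module to embed a single sentence and print the size of the resulting vector (the class name EmbeddingQuickCheck is just for illustration, and it assumes the OVHCLOUD_API_KEY variable set above):

package com.ovhcloud.examples.aiendpoints;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ovhai.OvhAiEmbeddingModel;

public class EmbeddingQuickCheck {

  public static void main(String[] args) {
    // The OVHcloud embedding model exposed by AI Endpoints
    EmbeddingModel embeddingModel = OvhAiEmbeddingModel.withApiKey(System.getenv("OVHCLOUD_API_KEY"));

    // Embed a single sentence and check the dimension of the resulting vector
    Embedding embedding = embeddingModel.embed("AI Endpoints is a serverless AI platform.").content();
    System.out.println("Vector dimension: " + embedding.dimension());
  }
}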

Create the RAGStreamingChatbot class

The chatbot uses streaming mode, as explained in the blog post Memory chatbot using AI Endpoints and LangChain4j.

package com.ovhcloud.examples.aiendpoints;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import dev.langchain4j.model.mistralai.MistralAiStreamingChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;


public class RAGStreamingChatbot {
  private static final Logger _LOG = LoggerFactory.getLogger(RAGStreamingChatbot.class);
  private static final String OVHCLOUD_API_KEY = System.getenv("OVHCLOUD_API_KEY");

  interface Assistant {
    TokenStream chat(String userMessage);
  }

  public static void main(String[] args) {
    MistralAiStreamingChatModel streamingChatModel = MistralAiStreamingChatModel.builder()
        .apiKey(OVHCLOUD_API_KEY)
        .modelName("Mistral-7B-Instruct-v0.2")
        .baseUrl(
            "https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1")
        .maxTokens(512)
        .build();

    Assistant assistant = AiServices
        .builder(Assistant.class)
        .streamingChatLanguageModel(streamingChatModel)
        .build();

    _LOG.info("\n💬: What is AI Endpoints?\n");

    TokenStream tokenStream = assistant.chat("Can you explain me what is AI Endpoints?");
    _LOG.info("🤖: ");
    tokenStream
        .onNext(_LOG::info)
        .onError(Throwable::printStackTrace)
        .start();
  }
}

Let’s see this chatbot in action!

As you can see, the LLM gave an answer, but not the expected one 😅.

This is no surprise, as the model was trained before OVHcloud created AI Endpoints.

Add your data to the knowledge of the model

Create a content.txt file with your data in src/main/resources/rag-files:

AI Endpoint is a new cool product designed by OVHcloud.

Designed with simplicity in mind, our platform allows developers of all skill levels to enhance their applications with cutting-edge AI APIs —no AI expertise required.

Designed for Developers

with comprehensive documentation, straightforward APIs, and sample code.
Committed to Privacy

We neither store nor share your data during or after model use.
Curated list of AI models

World-renowned AI models alongside a handpicked selection of Nvidia’s optimized models.
Non-locking technology

Thanks to our transparency about the AI models used, customers can implement these models on their own infrastructure or other cloud services.

Unlock the Future: seamless AI with strong privacy.

These endpoints require no AI expertise or dedicated infrastructure, as the serverless platform provides access to advanced AI models including Large Language Models (LLMs), natural language processing, translation, speech recognition, image recognition, and more. Developers can select from a range of models, including open-source options like Mistral AI, Llama, Whisper, and Stable Diffusion, as well as a variety of optimized models from NVIDIA’s portfolio, creating a versatile testing ground for chosen AI models.

AI Endpoints are now available in a free Alpha version, which initially includes open-source models. We will regularly update the AI Endpoints Alpha with new models, incorporating user feedback to enhance functionality.

Enhance Applications with AI

AI Endpoints equips you with a suite of powerful AI capabilities, enabling you to deliver personalized, intelligent features without the need for extensive AI expertise or infrastructure. By integrating our robust, pre-trained models, you can rapidly innovate and enhance your offerings, driving user engagement and operational excellence.

Increase Productivity, Creativity and Efficiency of your organization

AI Endpoints enables you to deliver next-generation solutions to their customers. These tools leverage cutting-edge AI models to automate routine tasks, derive insights from data, and foster creative problem-solving, thereby propelling businesses towards digital excellence with minimal friction.
 Join the Alpha, Shape the Future

Ready to unlock the future? Join us on this journey. Start experimenting with our APIs from April, 9th, 2024,
and let's redefine what's possible with AI—responsibly, efficiently, and brilliantly.

We're excited to see the incredible applications you'll create and the feedback you'll share.
Together, we're not just users and providers; we're partners in pioneering a smarter, safer, and more seamless digital world.

Transform the text into vectors using OVHcloud AI Endpoints

First, you need to create chunks from your document. A chunk is a part of the document that will be transformed into a vector.

The chunks are then used to perform the similarity search. This is a delicate phase: in this example, the chunking is simply based on the number of characters. In a more complex use case, you would create chunks based on the meaning of the text (see the alternative splitter sketch after the code below).

public class RAGStreamingChatbot {

  // ...

  public static void main(String[] args) {
     // Load the document and split it into chunks
    DocumentParser documentParser = new TextDocumentParser();
    Document document = loadDocument(
        RAGStreamingChatbot.class.getResource("/rag-files/content.txt").getFile(),
        documentParser);
    DocumentSplitter splitter = DocumentSplitters.recursive(300, 0);

    List<TextSegment> segments = splitter.split(document);

    // ...
  }
}
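
If a fixed character count is too naive for your documents, LangChain4j also ships structure-aware splitters (by paragraph, by sentence, …). Here is a minimal sketch of the alternative, to drop in place of the recursive splitter above; the 300/30 sizes are only examples, and it needs an extra import of dev.langchain4j.data.document.splitter.DocumentByParagraphSplitter:

    // Alternative to DocumentSplitters.recursive(300, 0): split on paragraph boundaries,
    // capping each segment at 300 characters with a 30-character overlap
    DocumentSplitter splitter = new DocumentByParagraphSplitter(300, 30);

    List<TextSegment> segments = splitter.split(document);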

Next, you transform the text into vectors and store them.

ℹ️ If you don't have a managed PostgreSQL instance, you can use an in-memory store (for test purposes only). ℹ️

public class RAGStreamingChatbot {
  // ...

  private static final String DATABASE_HOST = System.getenv("DATABASE_HOST");
  private static final String DATABASE_USER = System.getenv("DATABASE_USER");
  private static final String DATABASE_PASSWORD = System.getenv("DATABASE_PASSWORD");


  public static void main(String[] args) {
    // ...

    // Do the embeddings with the OVHcloud embedding model
    EmbeddingModel embeddingModel = OvhAiEmbeddingModel.withApiKey(OVHCLOUD_API_KEY);
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

    EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
                    .host(DATABASE_HOST)
                    .port(20184)
                    .database("rag_demo")
                    .user(DATABASE_USER)
                    .password(DATABASE_PASSWORD)
                    .table("rag_embeddings")
                    .dimension(768)
                    .createTable(false)
                    .build();

    // If you don't have a PostgreSQL database, you can use an in-memory embedding store
    // EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

    embeddingStore.addAll(embeddings, segments);
    ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(5)
        .minScore(0.9)
        .build();
    // ...

  }
}
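
Before wiring the retriever into the chatbot, you can query the store directly to check that your chunks were embedded and stored as expected. Here is a minimal sketch, assuming the embeddingModel, embeddingStore and _LOG fields from the snippets above (the test question and thresholds are only examples, and it needs the dev.langchain4j.store.embedding.EmbeddingSearchRequest and EmbeddingSearchResult imports):

    // Embed a test question and look for the closest chunks in the store
    Embedding questionEmbedding = embeddingModel.embed("What is AI Endpoints?").content();

    EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(
        EmbeddingSearchRequest.builder()
            .queryEmbedding(questionEmbedding)
            .maxResults(3)
            .minScore(0.7)
            .build());

    searchResult.matches().forEach(match ->
        _LOG.info("score {} -> {}", match.score(), match.embedded().text()));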

Use this RAG feature for your chatbot

Adding RAG functionality to the chatbot is easy: just pass the ContentRetriever to the Assistant builder in the RAGStreamingChatbot class:

public class RAGStreamingChatbot {
  // ...

  interface Assistant {
    TokenStream chat(String userMessage);
  }

  public static void main(String[] args) {
    // ...

    Assistant assistant = AiServices
        .builder(Assistant.class)
        .streamingChatLanguageModel(streamingChatModel)
        .contentRetriever(contentRetriever)
        .build();
    // ...
  }
}
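
Under the hood, LangChain4j wraps this ContentRetriever in a RetrievalAugmentor that injects the retrieved chunks into the prompt before it is sent to the model. If you later need more control (query rewriting, custom content injection, …), you can build the augmentor explicitly; here is a minimal sketch of this alternative wiring, with the same streamingChatModel and contentRetriever as above, plus imports of dev.langchain4j.rag.DefaultRetrievalAugmentor and dev.langchain4j.rag.RetrievalAugmentor:

    // Equivalent wiring with an explicit retrieval augmentor instead of contentRetriever(...)
    RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
        .contentRetriever(contentRetriever)
        .build();

    Assistant assistant = AiServices
        .builder(Assistant.class)
        .streamingChatLanguageModel(streamingChatModel)
        .retrievalAugmentor(retrievalAugmentor)
        .build();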

At this point, you have completed the development of the RAGStreamingChatbot class:

package com.ovhcloud.examples.aiendpoints;

import static dev.langchain4j.data.document.loader.FileSystemDocumentLoader.loadDocument;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentParser;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.mistralai.MistralAiStreamingChatModel;
import dev.langchain4j.model.ovhai.OvhAiEmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

public class RAGStreamingChatbot {
  private static final Logger _LOG = LoggerFactory.getLogger(RAGStreamingChatbot.class);
  private static final String OVHCLOUD_API_KEY = System.getenv("OVHCLOUD_API_KEY");
  private static final String DATABASE_HOST = System.getenv("DATABASE_HOST");
  private static final String DATABASE_USER = System.getenv("DATABASE_USER");
  private static final String DATABASE_PASSWORD = System.getenv("DATABASE_PASSWORD");

  interface Assistant {
    TokenStream chat(String userMessage);
  }

  public static void main(String[] args) {
     // Load the document and split it into chunks
    DocumentParser documentParser = new TextDocumentParser();
    Document document = loadDocument(
        RAGStreamingChatbot.class.getResource("/rag-files/content.txt").getFile(),
        documentParser);
    DocumentSplitter splitter = DocumentSplitters.recursive(300, 0);

    List<TextSegment> segments = splitter.split(document);

    // Do the embeddings and store them in an embedding store
    EmbeddingModel embeddingModel = OvhAiEmbeddingModel.withApiKey(OVHCLOUD_API_KEY);
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

    EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
                    .host(DATABASE_HOST)
                    .port(20184)
                    .database("rag_demo")
                    .user(DATABASE_USER)
                    .password(DATABASE_PASSWORD)
                    .table("rag_embeddings")
                    .dimension(768)
                    .createTable(false)
                    .build();

    // If you don't have a PostgreSQL database, you can use an in-memory embedding store
    // EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
    embeddingStore.addAll(embeddings, segments);
    ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(5)
        .minScore(0.9)
        .build();

    MistralAiStreamingChatModel streamingChatModel = MistralAiStreamingChatModel.builder()
        .apiKey(OVHCLOUD_API_KEY)
        .modelName("Mistral-7B-Instruct-v0.2")
        .baseUrl(
            "https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1")
        .maxTokens(512)
        .build();

    Assistant assistant = AiServices
        .builder(Assistant.class)
        .streamingChatLanguageModel(streamingChatModel)
        .contentRetriever(contentRetriever)
        .build();

    _LOG.info("\n💬: What is AI Endpoints?\n");

    TokenStream tokenStream = assistant.chat("Can you explain me what is AI Endpoints?");
    _LOG.info("🤖: ");
    tokenStream
        .onNext(_LOG::info)
        .onError(Throwable::printStackTrace)
        .start();
  }
}

It’s time to see our new chatbot in action!

Much better! 🎉

Don’t hesitate to test our new product, AI Endpoints, and give us your feedback.

You have a dedicated Discord channel (#ai-endpoints) on our Discord server (https://discord.gg/ovhcloud), see you there!

ℹ️ You can find all the source code on our dedicated GitHub repository, public-cloud-examples. ℹ️


Once a developer, always a developer!
Java developer for many years, I have the joy of knowing JDK 1.1, JEE, Struts, ... and now Spring (core, boot, batch), Quarkus, Angular, Groovy, Golang, ...
For more than ten years I was a Software Architect, a job that allowed me to face many problems inherent to the complex information systems in large groups.
I also had other lives, notably in automation and delivery with the implementation of CI/CD chains based on Jenkins pipelines.
I particularly like sharing and relationships with developers and I became a Developer Relation at OVHcloud.
This new adventure allows me to continue to use technologies that I like such as Kubernetes or AI for example but also to continue to learn and discover a lot of new things.
All the while keeping in mind one of my main motivations as a Developer Relation: making developers happy.
Always sharing, I am the co-creator of the TADx Meetup in Tours, allowing discovery and sharing around different tech topics.