<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fine Tuning Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/fine-tuning-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/fine-tuning-2/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Fri, 06 Feb 2026 15:18:43 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Fine Tuning Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/fine-tuning-2/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Fine tune an LLM with Axolotl and OVHcloud Machine Learning Services</title>
		<link>https://blog.ovhcloud.com/fine-tune-an-llm-with-axolotl-and-ovhcloud-machine-learning-services/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Fri, 25 Jul 2025 13:07:40 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[Fine Tuning]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29408</guid>

					<description><![CDATA[There are many ways to train a model 📚: using detailed instructions, system prompts, Retrieval Augmented Generation, or function calling. One way is fine-tuning, which is what this blog is about! ✨ Two years back we posted a blog on fine-tuning Llama models—it’s not nearly as complicated as it used to be 😉. This time we’re using the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img fetchpriority="high" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1.png" alt="A robot with a car tuning style" class="wp-image-29462" style="width:600px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-70x70.png 70w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>There are many ways to train a model 📚: using detailed instructions, system prompts, Retrieval Augmented Generation, or function calling.</p>



<p>One way is fine-tuning, which is what this blog is about! ✨</p>



<p>Two years back we posted a <a href="https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/" data-wpel-link="internal">blog</a> on fine-tuning Llama models—it’s not nearly as complicated as it used to be 😉. This time we’re using the <a href="https://docs.axolotl.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Axolotl</a> framework, so hopefully there’s less to manage.</p>



<h3 class="wp-block-heading">So what’s the plan?</h3>



<p>For this blog, I’d like to fine-tune a small model, <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a>, and then test it out on a few questions about our <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a> product 📝.</p>



<p>Before we fine-tune, let’s try it out! Deploying a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> model is super easy with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a> from <a href="https://www.ovhcloud.com/fr/public-cloud/ai-machine-learning/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Machine Learning Services</a> 🥳.</p>



<p>And thanks to a <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">previous blog post</a>, we know how to use <a href="https://docs.vllm.ai/en/v0.7.3/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> and <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a>.</p>



<pre title="Deploy a model thanks to vLLM and AI Deploy" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">ovhai app run --name $1 \
	--flavor l40s-1-gpu \
	--gpu 2 \
	--default-http-port 8000 \
	--env OUTLINES_CACHE_DIR=/tmp/.outlines \
	--env HF_TOKEN=$MY_HUGGING_FACE_TOKEN \
	--env HF_HOME=/hub \
	--env HF_DATASETS_TRUST_REMOTE_CODE=1 \
	--env HF_HUB_ENABLE_HF_TRANSFER=0 \
	--volume standalone:/hub:rw \
	--volume standalone:/workspace:rw \
	vllm/vllm-openai:v0.8.2 \
	-- bash -c "vllm serve meta-llama/Llama-3.2-1B-Instruct"</code></pre>



<p class="has-text-align-center"><strong>⚠️ Make sure you’ve agreed to the terms of use for the model’s license on Hugging Face ⚠️</strong></p>



<p>Check out the <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog</a> I mentioned earlier for all the details you need on the command and its parameters.</p>
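


<p><em>ℹ️ Before wiring up a UI, you can sanity-check the deployment with a few lines of Python. This is only a sketch: the base URL below is a hypothetical placeholder for the URL AI Deploy gives your app, and it assumes vLLM is serving its OpenAI-compatible API as in the command above.</em></p>



<pre title="Quick test of the deployed model (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Minimal smoke test of the vLLM deployment (sketch, not production code).
# ⚠️ The base_url is a hypothetical placeholder: replace it with the URL
# AI Deploy shows for your own app.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-app-id.app.gra.ai.cloud.ovh.net/v1",  # hypothetical
    api_key="none",  # vLLM ignores the key unless you configured one
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "How many requests per minute can I do with AI Endpoints?"}],
    temperature=0.5,
)
print(response.choices[0].message.content)</code></pre>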



<p>To test our different chatbots, we will use a simple <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/chatbot/chatbot.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gradio application</a>:</p>



<pre title="Chatbot" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Application to compare answers generation from OVHcloud AI Endpoints exposed model and fine tuned model.
# ⚠️ Do not use in production!! ⚠️

import gradio as gr
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# 📜 Prompts templates 📜
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_prompt}"),
        ("human", "{user_prompt}"),
    ]
)

def chat(prompt, system_prompt, temperature, top_p, model_name, model_url, api_key):
    """
    Function to generate a chat response using the provided prompt, system prompt, temperature, top_p, model name, model URL and API key.
    """

    # ⚙️ Initialize the OpenAI model ⚙️
    llm = ChatOpenAI(api_key=api_key, 
                 model=model_name, 
                 base_url=model_url,
                 temperature=temperature,
                 top_p=top_p
                 )

    # 📜 Apply the prompt to the model 📜
    chain = prompt_template | llm
    ai_msg = chain.invoke(
        {
            "system_prompt": system_prompt,
            "user_prompt": prompt
        }
    )

    # 🤖 Return answer in a compatible format for Gradio component.
    return [{"role": "user", "content": prompt}, {"role": "assistant", "content": ai_msg.content}]

# 🖥️ Main application 🖥️
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            system_prompt = gr.Textbox(value="""You are a specialist on OVHcloud products.
If you can't find any sure and relevant information about the product asked, answer with "This product doesn't exist in OVHcloud""", 
                label="🧑‍🏫 System Prompt 🧑‍🏫")
            temperature = gr.Slider(minimum=0.0, maximum=2.0, step=0.01, label="Temperature", value=0.5)
            top_p = gr.Slider(minimum=0.0, maximum=1.0, step=0.01, label="Top P", value=0.0)
            model_name = gr.Textbox(label="🧠 Model Name 🧠", value='Llama-3.1-8B-Instruct')
            model_url = gr.Textbox(label="🔗 Model URL 🔗", value='https://oai.endpoints.kepler.ai.cloud.ovh.net/v1')
            api_key = gr.Textbox(label="🔑 OVH AI Endpoints Access Token 🔑", value=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"), type="password")

        with gr.Column():
            chatbot = gr.Chatbot(type="messages", label="🤖 Response 🤖")
            prompt = gr.Textbox(label="📝 Prompt 📝", value='How many requests per minute can I do with AI Endpoints?')
            submit = gr.Button("Submit")

    submit.click(chat, inputs=[prompt, system_prompt, temperature, top_p, model_name, model_url, api_key], outputs=chatbot)

demo.launch()</code></pre>



<p>ℹ️ You can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/chatbot/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</p>



<p>Let&#8217;s test with a simple question: &#8220;How many requests per minute can I do with AI Endpoints?&#8221;.<br>The first test is with <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a> from <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> deployed with <a href="https://docs.vllm.ai/en/v0.7.3/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> and <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Deploy</a>.</p>



<figure class="wp-block-image aligncenter size-large"><img decoding="async" width="1024" height="474" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1024x474.png" alt="Ask for AI Endpoints rate limit with a Llama-3.2-1B-Instruct model" class="wp-image-29448" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1024x474.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-2048x949.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>The response isn’t exactly what we expected. 😅</p>



<p>FYI, according to the official <a href="https://help.ovhcloud.com/csm/fr-public-cloud-ai-endpoints-capabilities?id=kb_article_view&amp;sysparm_article=KB0065424#limitations" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud guide</a>, the correct answer is:<br> &#8211; <strong>Anonymous</strong>: 2 requests per minute, per IP and per model.<br> &#8211; <strong>Authenticated with an API access key</strong>: 400 requests per minute, per Public Cloud project and per model.</p>



<h3 class="wp-block-heading"><strong>What’s the best way to feed the model fresh data?</strong></h3>



<p>I bet you already know this—you can use some data during the inference step, using Retrieval Augmented Generation (RAG). You can learn how to set up RAG by reading our <a href="https://blog.ovhcloud.com/rag-chatbot-using-ai-endpoints-and-langchain/" data-wpel-link="internal">past blog post</a>. 📗</p>



<p>Another way to feed a model fresh data is fine-tuning. ✨</p>



<p>In a nutshell, fine-tuning is when you take a pre-trained machine learning model and train it further on additional data, so it can do a specific job. It’s quicker and easier than building a model from scratch. 😉</p>



<p>For this, I’m picking <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a> from Hugging Face as the base model.</p>



<p><em>ℹ️ The more parameters your base model has, the more computing power you need. In this case, the model needs between 3GB and 4GB of memory, which is why we’ll be using a <a href="https://www.ovhcloud.com/fr/public-cloud/prices/#5260" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">single L4 GPU</a> (we need an <a href="https://www.nvidia.com/en-us/data-center/ampere-architecture/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ampere-compatible architecture</a>).</em></p>
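


<p><em>A quick back-of-the-envelope check of that figure (a rough sketch; real usage also depends on activations, KV cache, optimizer state and quantization):</em></p>



<pre title="Back-of-the-envelope memory estimate (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Rough VRAM estimate for loading the base model (sketch only).
n_params = 1.24e9        # Llama-3.2-1B has roughly 1.24 billion parameters
bytes_per_param = 2      # bf16/fp16 weights
weights_gb = n_params * bytes_per_param / 1024**3
print(f"Weights alone: ~{weights_gb:.1f} GB")  # ~2.3 GB
# Add working memory on top and you land in the 3-4 GB range quoted above,
# which fits comfortably on a single 24 GB L4.</code></pre>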



<h3 class="wp-block-heading">When data is your gold</h3>



<p>To train a model, you need enough good-quality data.</p>



<p>The first part is easy; I get the official OVHcloud AI Endpoints documentation in markdown format from our <a href="https://github.com/ovh/docs/tree/develop/pages/public_cloud/ai_machine_learning" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">public cloud documentation repository</a> (by the way, would you like to contribute?). 📚</p>
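


<p><em>ℹ️ The dataset script below reads the guides from a local checkout of that repository. A minimal sketch to fetch it, assuming <code>git</code> is available and using the <code>develop</code> branch that the link above points to:</em></p>



<pre title="Fetch the documentation repository (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Sketch: shallow-clone the OVHcloud docs repository into ./docs so that the
# dataset script can read docs/pages/public_cloud/ai_machine_learning.
import subprocess

subprocess.run(
    ["git", "clone", "--depth", "1", "--branch", "develop",
     "https://github.com/ovh/docs.git", "docs"],
    check=True,
)</code></pre>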



<p>First, create a dataset in the right format; Axolotl offers various <a href="https://docs.axolotl.ai/docs/dataset-formats/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dataset formats</a>. I prefer the <a href="https://docs.axolotl.ai/docs/dataset-formats/conversation.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">conversation format</a> because it’s the easiest for my use case, so I’m going with that. 😉</p>



<pre title="Conersation format dataset" class="wp-block-code"><code lang="json" class="language-json line-numbers"><a href="https://docs.axolotl.ai/docs/dataset-formats/conversation.html#cb1-1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>{
   "messages": [
     {"role": "...", "content": "..."}, 
     {"role": "...", "content": "..."}, 
     ...]
}</code></pre>



<p>Rather than creating it manually and adding the relevant information by hand, I use an LLM to convert the markdown data into a well-formed dataset. 🤖</p>



<p>Here we’re using a <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/DatasetCreation.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Python script</a> 🐍:</p>



<pre title="Dataset creation with LLM" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

# 🗺️ Define the JSON schema for the response 🗺️
message_schema = {
    "type": "object",
    "properties": {
        "role": {"type": "string"},
        "content": {"type": "string"}
    },
    "required": ["role", "content"]
}

response_format = {
    "type": "json_object",
    "json_schema": {
        "name": "Messages",
        "description": "A list of messages with role and content",
        "properties": {
            "messages": {
                "type": "array",
                "items": message_schema
            }
        }
    }
}

# ⚙️ Initialize the chat model with AI Endpoints configuration ⚙️
chat_model = ChatOpenAI(
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),
    model_name=os.getenv("OVH_AI_ENDPOINTS_MODEL_NAME"),
    temperature=0.0
)

# 📂 Define the directory path 📂
directory_path = "docs/pages/public_cloud/ai_machine_learning"
directory = Path(directory_path)

# 🗃️ Walk through the directory and its subdirectories 🗃️
for path in directory.rglob("*"):
    # Check if the current path is a directory
    if path.is_dir():
        # Get the name of the subdirectory
        sub_directory = path.name

        # Construct the path to the "guide.en-gb.md" file in the subdirectory
        guide_file_path = path / "guide.en-gb.md"

        # Check if the "guide.en-gb.md" file exists in the subdirectory
        if "endpoints" in sub_directory and guide_file_path.exists():
            print(f"📗 Guide processed: {sub_directory}")
            with open(guide_file_path, 'r', encoding='utf-8') as file:
                raw_data = file.read()

            user_message = HumanMessage(content=f"""
With the following markdown, generate a JSON file composed as follows: a list named "messages" composed of tuples with a key "role" which can have the value "user" when it's the question and "assistant" when it's the response. To split the document, basing it on the markdown chapter titles to create the questions seems like a good idea.
Keep the language English.
I don't need to know the code to do it but I want the JSON result file.
For the "user" field, don't just repeat the title but make a real question, for example "What are the requirements for OVHcloud AI Endpoints?"
Be sure to add OVHcloud with AI Endpoints so that it's clear that OVHcloud creates AI Endpoints.
Generate the entire JSON file.
An example of what it should look like: messages [{{"role":"user", "content":"What is AI Endpoints?"}}]
There must always be a question followed by an answer, never two questions or two answers in a row.
The source markdown file:
{raw_data}
""")
            chat_response = chat_model.invoke([user_message], response_format=response_format)
            
            with open(f"./generated/{sub_directory}.json", 'w', encoding='utf-8') as output_file:
                output_file.write(chat_response.content)
                print(f"✅ Dataset generated: ./generated/{sub_directory}.json")

</code></pre>



<p><em>ℹ️ You can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>



<p>Here’s a sample of the file created as the dataset:</p>



<pre title="Dataset example" class="wp-block-code"><code lang="json" class="language-json line-numbers">[
  {
    "role": "user",
    "content": "What are the requirements for using OVHcloud AI Endpoints?"
  },
  {
    "role": "assistant",
    "content": "To use OVHcloud AI Endpoints, you need the following: \n1. A Public Cloud project in your OVHcloud account \n2. A payment method defined on your Public Cloud project. Access keys created from Public Cloud projects in Discovery mode (without a payment method) cannot use the service."
  },
  {
    "role": "user",
    "content": "What are the rate limits for using OVHcloud AI Endpoints?"
  },
  {
    "role": "assistant",
    "content": "The rate limits for OVHcloud AI Endpoints are as follows:\n- Anonymous: 2 requests per minute, per IP and per model.\n- Authenticated with an API access key: 400 requests per minute, per PCI project and per model."
  }, 
   ...]
}</code></pre>



<p>As for quantity, it’s a bit trickier: how can we generate enough training data without lowering data quality?</p>



<p>To do this, I generate synthetic data, using an LLM to derive it from the original data. The trick is to produce more data on the same topic by rephrasing it differently while keeping the same idea.</p>



<p>Here is the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/DatasetAugmentation.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Python script</a> 🐍 to do the data augmentation:</p>



<pre title="Data augmentation" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
import json
import uuid
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage
from jsonschema import validate, ValidationError

# 🗺️ Define the JSON schema for the response 🗺️
message_schema = {
    "type": "object",
    "properties": {
        "role": {"type": "string"},
        "content": {"type": "string"}
    },
    "required": ["role", "content"]
}

response_format = {
    "type": "json_object",
    "json_schema": {
        "name": "Messages",
        "description": "A list of messages with role and content",
        "properties": {
            "messages": {
                "type": "array",
                "items": message_schema
            }
        }
    }
}

# ✅ JSON validity verification ❌
def is_valid(json_data):
    """
    Test the validity of the JSON data against the schema.
    Argument:
        json_data (dict): The JSON data to validate.  
    Raises:
        ValidationError: If the JSON data does not conform to the specified schema.  
    """
    try:
        validate(instance=json_data, schema=response_format["json_schema"])
        return True
    except ValidationError as e:
        print(f"❌ Validation error: {e}")
        return False

# ⚙️ Initialize the chat model with AI Endpoints configuration ⚙️
chat_model = ChatOpenAI(
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),
    model_name=os.getenv("OVH_AI_ENDPOINTS_MODEL_NAME"),
    temperature=0.0
)

# 📂 Define the directory path 📂
directory_path = "generated"
print(f"📂 Directory path: {directory_path}")
directory = Path(directory_path)

# 🗃️ Walk through the directory and its subdirectories 🗃️
for path in directory.rglob("*"):
    print(f"📜 Processing file: {path}")
    # Check if the current path is a valid file
    if path.is_file() and "endpoints" in path.name:
        # Read the raw data from the file
        with open(path, 'r', encoding='utf-8') as file:
            raw_data = file.read()

        try:
            json_data = json.loads(raw_data)
        except json.JSONDecodeError:
            print(f"❌ Failed to decode JSON from file: {path.name}")
            continue

        if not is_valid(json_data):
            print(f"❌ Invalid dataset: {path.name}")
            continue
        print(f"✅ Valid input dataset: {path.name}")

        user_message = HumanMessage(content=f"""
        Given the following JSON, generate a similar JSON file where you paraphrase each question in the content attribute
        (when the role attribute is user) and also paraphrase the value of the response to the question stored in the content attribute
        when the role attribute is assistant.
        The objective is to create synthetic datasets based on existing datasets.
        I do not need to know the code to do this, but I want the resulting JSON file.
        It is important that the term OVHcloud is present as much as possible, especially when the terms AI Endpoints are mentioned
        either in the question or in the response.
        There must always be a question followed by an answer, never two questions or two answers in a row.
        It is IMPERATIVE to keep the language in English.
        The source JSON file:
        {raw_data}
        """)

        chat_response = chat_model.invoke([user_message], response_format=response_format)

        output = chat_response.content

        # Replace unauthorized characters
        output = output.replace("\\t", " ")

        generated_file_name = f"{uuid.uuid4()}_{path.name}"
        os.makedirs("./generated/synthetic", exist_ok=True)  # make sure the output folder exists
        with open(f"./generated/synthetic/{generated_file_name}", 'w', encoding='utf-8') as output_file:
            output_file.write(output)

        if not is_valid(json.loads(output)):
            print(f"❌ ERROR: File {generated_file_name} is not valid")
        else:
            print(f"✅ Successfully generated file: {generated_file_name}")</code></pre>



<p><em>ℹ️ Again, you can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>
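


<p><em>ℹ️ Before training, I gather everything into the folder the Axolotl config below points to. Here is a hedged sketch of one way to do it: it assumes each generated file holds a top-level <code>messages</code> list (as requested from the LLM) and flattens them into a single JSONL file, one <code>{"messages": [user, assistant]}</code> record per line, a shape Axolotl’s chat_template loader accepts.</em></p>



<pre title="Prepare the training folder (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Sketch: flatten the generated JSON files (original and synthetic) into one
# JSONL training file, one {"messages": [user, assistant]} sample per line.
# Assumes the ./generated layout produced by the scripts above.
import json
from pathlib import Path

out_path = Path("/workspace/ai-endpoints-doc/train.jsonl")
out_path.parent.mkdir(parents=True, exist_ok=True)

with out_path.open("w", encoding="utf-8") as out:
    for src in Path("generated").rglob("*.json"):
        messages = json.loads(src.read_text(encoding="utf-8"))["messages"]
        # Pair each user turn with the assistant turn that follows it.
        for i in range(0, len(messages) - 1, 2):
            pair = messages[i:i + 2]
            if pair[0]["role"] == "user" and pair[1]["role"] == "assistant":
                out.write(json.dumps({"messages": pair}, ensure_ascii=False) + "\n")</code></pre>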



<h3 class="wp-block-heading">Fine-tune the model</h3>



<p>We now have enough training data, so let’s fine-tune!</p>



<p><em>ℹ️ It’s hard to say exactly how much data is needed to train a model properly. It all depends on the model, the data, the topic, and so on.<br>The only option is to test and adapt. 🔁</em></p>



<p>I use a <a href="https://jupyter.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Jupyter notebook</a>, created with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-notebooks/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Notebooks</a>, to fine-tune my models.</p>



<pre title="Jupyter notebook creation" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">ovhai notebook run conda jupyterlab \
	--name axolotl-llm-fine-tune \
	--framework-version 25.3.1-py312-cudadevel128-gpu \
	--flavor l4-1-gpu \
	--gpu 1 \
	--envvar HF_TOKEN=$MY_HF_TOKEN \
	--envvar WANDB_TOKEN=$MY_WANDB_TOKEN \
	--unsecure-http</code></pre>



<p><em>ℹ️ For more details on how to create a Jupyter notebook with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-notebooks/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Notebooks</a>, read the <a href="https://help.ovhcloud.com/csm/fr-documentation-public-cloud-ai-and-machine-learning-ai-notebooks?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=c8441955f49801102d4ca4d466a7fd58&amp;spa=1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</em></p>



<p class="has-text-align-left">⚙️ The <strong>HF_TOKEN</strong> environment variable is used to pull and push the trained model to <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> <br>⚙️ The <strong>WANDB_TOKEN</strong> environment variable helps you track training quality in <a href="https://wandb.ai" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Weight &amp; Biases</a></p>



<p>Once the notebook is set up, you can start coding the model’s training with Axolotl.</p>



<p>To start, install the Axolotl CLI and its dependencies. 🧰</p>



<pre title="Axolot installation" class="wp-block-code"><code lang="bash" class="language-bash"># Axolotl need these dependencies
!pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Axolotl CLI installation
!pip install --no-build-isolation axolotl[flash-attn,deepspeed]

# Verify Axolotl version and installation
!axolotl --version</code></pre>






<p>The next step is to configure the Hugging Face CLI. 🤗</p>



<pre title="Hugging Face configurartion" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!pip install -U "huggingface_hub[cli]"

!huggingface-cli --version</code></pre>



<pre title="Hugging Face hub authentication " class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
from huggingface_hub import login

login(os.getenv("HF_TOKEN"))</code></pre>






<p>Then, configure your Weights &amp; Biases access.</p>



<pre title="Weight &amp; Biases configuration" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">pip install wandb

!wandb login $WANDB_TOKEN</code></pre>






<p>Once all that’s done, it’s time to train the model.</p>



<pre title="Train the model" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!axolotl train /workspace/instruct-lora-1b-ai-endpoints.yml</code></pre>



<p>You only need to type this one line to train it. How cool is that? 😎</p>



<p><em>ℹ️ With one L4 card, 10 epochs, and roughly 2000 questions and answers in the dataset, it ran for about 90 minutes.</em></p>



<p>Basically, the command line needs just one parameter: the Axolotl config file. You can find everything you need to set up Axolotl in the <a href="https://docs.axolotl.ai/docs/config-reference.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official documentation</a>.📜<br>Here’s what the model was trained on:</p>



<pre title="Axolotl configuration" class="wp-block-code"><code lang="yaml" class="language-yaml">base_model: meta-llama/Llama-3.2-1B-Instruct
# optionally might have model_type or tokenizer_type
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

load_in_8bit: true
load_in_4bit: false

datasets:
  - path: /workspace/ai-endpoints-doc/
    type: chat_template
      
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

dataset_prepared_path:
val_set_size: 0.01
output_dir: /workspace/out/llama-3.2-1b-ai-endpoints

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

wandb_project: ai_endpoints_training
wandb_entity: &lt;user id&gt;
wandb_mode: 
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: false

gradient_checkpointing: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
special_tokens:
   pad_token: &lt;|end_of_text|&gt;
</code></pre>



<p>🔎 Some key points (only the fields modified from the <a href="https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-3/instruct-lora-8b.yml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">given templates</a>):<br>&#8211; <strong>base_model: meta-llama/Llama-3.2-1B-Instruct</strong>: before you download the base model from Hugging Face, be sure to accept the licence’s terms of use<br>&#8211; <strong>path: /workspace/ai-endpoints-doc/</strong>: folder where the generated dataset is uploaded<br>&#8211; <strong>wandb_project: ai_endpoints_training</strong> &amp; <strong>wandb_entity: &lt;user id></strong>: to configure Weights &amp; Biases<br>&#8211; <strong>num_epochs: 10</strong>: number of epochs for the training</p>



<p>After the training, you can test the new model 🤖:</p>



<pre title="New model testing" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!echo "What is OVHcloud AI Endpoints and how to use it?" | axolotl inference /workspace/instruct-lora-1b-ai-endpoints.yml --lora-model-dir="/workspace/out/llama-3.2-1b-ai-endpoints" </code></pre>






<p>When you’re satisfied with the result, merge the weights and upload the new model to Hugging Face:</p>



<pre title="Push the model" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!axolotl merge-lora /workspace/instruct-lora-1b-ai-endpoints.yml

%cd /workspace/out/llama-3.2-1b-ai-endpoints/merged

!huggingface-cli upload wildagsx/Llama-3.2-1B-Instruct-AI-Endpoints-v0.6 .</code></pre>



<p>ℹ️ <em>You can find all resources to create and run the notebook in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/notebook/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>
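


<p><em>ℹ️ If you want to try the merged weights locally before deploying, here is a short sketch with recent 🤗 Transformers, pointing at the merged output folder from the config above:</em></p>



<pre title="Try the merged model locally (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Sketch: load the merged model from the output folder and ask it one
# question, without going through a full vLLM deployment.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="/workspace/out/llama-3.2-1b-ai-endpoints/merged",
    device_map="auto",
)

messages = [{"role": "user", "content": "What is OVHcloud AI Endpoints?"}]
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's answer</code></pre>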



<h3 class="wp-block-heading">Test the new model</h3>



<p>Once you have pushed your model to Hugging Face, you can again deploy it with vLLM and AI Deploy to test it ⚡️.</p>
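


<p><em>The deployment command is the same as before, with the fine-tuned Hugging Face model id swapped in. The same smoke test works too, again only a sketch with a hypothetical app URL:</em></p>



<pre title="Quick test of the fine-tuned model (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Same smoke test as earlier, now pointed at the fine-tuned model.
# ⚠️ The base_url is again a hypothetical placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-new-app-id.app.gra.ai.cloud.ovh.net/v1",  # hypothetical
    api_key="none",
)

answer = client.chat.completions.create(
    model="wildagsx/Llama-3.2-1B-Instruct-AI-Endpoints-v0.6",
    messages=[{"role": "user", "content": "How many requests per minute can I do with AI Endpoints?"}],
)
print(answer.choices[0].message.content)</code></pre>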



<figure class="wp-block-image aligncenter size-large"><img decoding="async" width="1024" height="474" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1024x474.png" alt="" class="wp-image-29459" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1024x474.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-2048x949.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Ta-da! 🥳 Our little Llama model is now an OVHcloud AI Endpoints pro!</p>






<p>Feel free to try out OVHcloud Machine Learning products, and share your thoughts on our Discord server (<em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>), see you soon! 👋</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
