AI Notebooks Archives - OVHcloud Blog

Reference Architecture: set up MLflow Remote Tracking Server on OVHcloud

Eléa Petton — Tue, 15 Apr 2025 07:52:46 +0000

Travel through the Data & AI universe of OVHcloud with the MLflow integration.

MLflow Remote Tracking Server on OVHcloud

As Artificial Intelligence (AI) continues to grow in importance, Data Scientists and Machine Learning Engineers need a robust and scalable platform to manage the entire Machine Learning (ML) lifecycle.
MLflow, an open-source platform, provides a comprehensive framework for managing ML experiments, models, and deployments.

MLflow offers many benefits for ML lifecycle management, with features such as:

Experiment tracking and model management
Reproducibility and collaboration
Scalability, flexibility, and integration
Automated ML and model serving capabilities
Improved model accuracy, faster time-to-market, and reduced costs.

In this reference architecture, you will explore how to leverage remote experiment tracking with the MLflow Tracking Server on the OVHcloud Public Cloud infrastructure.
You will build a scalable and efficient ML platform that streamlines your ML workflow and accelerates model development, using OVHcloud AI Notebooks, AI Training, Managed Databases (PostgreSQL), and Object Storage.

The result? A fully remote, production-ready ML experiment tracking pipeline, powered by OVHcloud’s Data & Machine Learning Services (e.g. AI Notebooks and AI Training).

Overview of the MLflow server architecture

Here is how MLflow will be configured:

Development and training environment: create and train models with AI Notebooks
Remote Tracking Server: hosted in an AI Training job (Container as a Service)
Backend Store: benefit from a managed PostgreSQL database (DBaaS).
Artifact Store: use OVHcloud Object Storage (S3-compatible).

MLflow remote server deployment steps

In the following tutorial, all services are deployed within the OVHcloud Public Cloud.

Prerequisites

Before you begin, ensure you have:

An OVHcloud Public Cloud account
An OpenStack user with the following roles:
- Administrator
- AI Training Operator
- Object Storage Operator

🚀 Having all the ingredients for our recipe, it’s time to set up your MLflow remote tracking server!

Architecture guide: MLflow remote tracking server

Let’s set up and deploy your custom MLflow tracking tool!

⚙️ Also consider that all of the following steps can be automated using OVHcloud APIs!

Step 1 – Install `ovhai` CLI

Firstly, start by setting up your CLI environment.

curl https://cli.gra.ai.cloud.ovh.net/install.sh | bash

Secondly, log in using your OpenStack credentials.

ovhai login -u <username> -p <password>

Now, it’s time to create your bucket inside OVHcloud Object Storage!

Step 2 – Provision Object Storage (Artifact Store)

Go to Public Cloud > Storage > Object Storage in the OVHcloud Control Panel.
Create a datastore and a new S3 bucket (e.g., mlflow-s3-bucket).
Register the datastore with the ovhai CLI:

ovhai datastore add s3 <alias> https://s3.gra.io.cloud.ovh.net/ gra <access-key> <secret-key> --store-credentials-locally

Step 3 – Create PostgreSQL Managed DB (Backend Store)

1. Navigate to Databases & Analytics > Databases

2. Create a new PostgreSQL instance with Essential plan

3. Select Location and Node type

4. Reset the user password

5. Take note of the following parameters

Go to your database dashboard:

Then, copy the connection information:

Your Backend Store is now ready to use!

Step 4 – Build your custom MLflow Docker image

1. Develop MLflow launching script

Firstly, you have to write a script in bash to launch the server: mlflow_server.sh

echo "The MLflow server is starting..."

mlflow server \
  --backend-store-uri "postgresql://${POSTGRE_USER}:${POSTGRE_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}?sslmode=${SSL_MODE}" \
  --default-artifact-root ${S3_BUCKET_NAME}/ \
  --host 0.0.0.0 \
  --port 5000
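
The script builds the backend store URI by plain string interpolation, so a database password containing characters such as `@` or `/` would produce a malformed URI. As a minimal sketch (not part of the original setup), the same URI can be assembled in Python with percent-encoding. The host name and password below are hypothetical values used only for illustration.

```python
import os
from urllib.parse import quote


def build_backend_store_uri(env=os.environ):
    """Assemble the PostgreSQL URI from the variables the shell script reads."""
    # Percent-encode the credentials so reserved characters cannot break the URI.
    user = quote(env["POSTGRE_USER"], safe="")
    password = quote(env["POSTGRE_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['PG_HOST']}:{env['PG_PORT']}/{env['PG_DB']}"
        f"?sslmode={env['SSL_MODE']}"
    )


example_env = {
    "POSTGRE_USER": "avnadmin",
    "POSTGRE_PASSWORD": "p@ss/word",   # hypothetical password
    "PG_HOST": "db.example.invalid",   # hypothetical host
    "PG_PORT": "20184",
    "PG_DB": "defaultdb",
    "SSL_MODE": "require",
}
print(build_backend_store_uri(example_env))
```

The resulting string could then be passed to `--backend-store-uri` instead of the interpolated one.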

2. Create Dockerfile

Install the required Python dependency and give the rights on the /mlruns path to the OVHcloud user.

FROM ghcr.io/mlflow/mlflow:latest

# Install Python dependencies
RUN pip install psycopg2-binary

COPY mlflow_server.sh .

# Change the ownership of `mlruns` directory to the OVHcloud user (42420:42420)
RUN mkdir -p /mlruns
RUN chown -R 42420:42420 /mlruns

# Start MLflow server inside container
CMD ["bash", "mlflow_server.sh"]

3. Build your custom MLflow docker image

Build the docker image using the previous Dockerfile.

docker build . -t mlflow-server-ai-training:latest

4. Tag and push the docker image to your registry

Finally, you can push the Docker image to your registry.

docker tag mlflow-server-ai-training:latest <your-registry>/mlflow-server-ai-training:latest

docker push <your-registry>/mlflow-server-ai-training:latest

Congrats! You can now use the Docker image to launch MLflow server.

Step 5 – Start MLflow Tracking Server inside container

You can use AI Training to start MLflow server inside a job.

1. Using the ovhai CLI, run the following command in a terminal

ovhai job run --name mlflow-server \
              --default-http-port 5000 \
              --cpu 4 \
              -v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache \
              -e POSTGRE_USER=avnadmin \
              -e POSTGRE_PASSWORD=<password> \
              -e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/ \
              -e S3_BUCKET_NAME=mlflow-s3-bucket \
              -e PG_HOST=<host> \
              -e PG_DB=defaultdb \
              -e PG_PORT=20184 \
              -e SSL_MODE=require \
              <your-registry>/mlflow-server-ai-training:latest

Full command explained:

ovhai job run

This is the core command to run a job using the OVHcloud AI Training platform.

--name mlflow-server

Sets a custom name for the job. For example, mlflow-server.

--default-http-port 5000

Exposes port 5000 as the default HTTP endpoint. MLflow’s web UI typically runs on port 5000, so this ensures the UI is accessible once the job is running.

--cpu 4

Allocates 4 CPUs for the job. You can adjust this based on how heavy your MLflow workload is.

-v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache

This mounts your OVHcloud Object Storage volume into the job’s file system:
– mlflow-s3-bucket@DEMO/: refers to your S3 bucket volume from the OVHcloud Object Storage
– :/artifacts: mounts the volume into the container under /artifacts
– RW: enables Read/Write permissions
– cache: enables volume caching, improving performance for frequent reads/writes

-e POSTGRE_USER=avnadmin
-e POSTGRE_PASSWORD=<password>
-e PG_HOST=<host>
-e PG_DB=defaultdb
-e PG_PORT=20184
-e SSL_MODE=require

These are environment variables for connecting to the PostgreSQL backend store:
– avnadmin: the default admin user for OVHcloud’s managed PostgreSQL
– POSTGRE_PASSWORD: must be replaced with your actual database password
– PG_HOST: the hostname of your managed PostgreSQL instance
– PG_DB: the name of the database to use (default: defaultdb)
– PG_PORT: the port your PostgreSQL server is listening on
– SSL_MODE: enforce SSL connection to secure DB traffic

-e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/

Tells MLflow where the S3-compatible endpoint is hosted. This is specific to OVHcloud’s GRA (Gravelines) region Object Storage.

-e S3_BUCKET_NAME=mlflow-s3-bucket

Sets the name of the S3 bucket where MLflow should store artifacts (models, files, etc.).

<your-registry>/mlflow-server-ai-training:latest

This is the custom MLflow Docker image you are running inside the job.
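
Since every flag above is plain data, the command can also be generated from a script, which helps when the same job has to be relaunched with different settings. The following is a minimal sketch: it only assembles the argument list from the flags explained above. The `build_job_command` helper, the registry address and the database host are hypothetical names introduced for illustration.

```python
import shlex


def build_job_command(image, env, volume, name="mlflow-server", port=5000, cpu=4):
    """Return the `ovhai job run` argument list described in this step."""
    cmd = [
        "ovhai", "job", "run",
        "--name", name,
        "--default-http-port", str(port),
        "--cpu", str(cpu),
        "-v", volume,
    ]
    # One `-e KEY=VALUE` pair per environment variable.
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)  # the image reference must come last
    return cmd


command = build_job_command(
    image="registry.example.invalid/mlflow-server-ai-training:latest",  # hypothetical registry
    env={
        "POSTGRE_USER": "avnadmin",
        "PG_HOST": "db.example.invalid",  # hypothetical host
        "PG_DB": "defaultdb",
        "PG_PORT": "20184",
        "SSL_MODE": "require",
        "S3_BUCKET_NAME": "mlflow-s3-bucket",
    },
    volume="mlflow-s3-bucket@DEMO/:/artifacts:RW:cache",
)
print(shlex.join(command))
```

To actually launch the job, the list could be handed to `subprocess.run(command, check=True)` on a machine where the ovhai CLI is installed and logged in.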

2. Check if your AI Training job is RUNNING

Replace <job_id> with your own job ID.

ovhai job get <job_id>

You should obtain:

History:
    DATE                 STATE
    04-04-25 09:58:00    QUEUED
    04-04-25 09:58:01    INITIALIZING
    04-04-25 09:58:07    PENDING
    04-04-25 09:58:10    RUNNING
Info:
    Message: Job is running

3. Recover the IP and external IP of your AI Training job

Using your <job_id>, you can retrieve your AI Training job IP.

ovhai job get <job_id> -o json | jq '.status.ip' -r

For example, you can obtain something like this: 10.42.80.176

You also need the External IP:

ovhai job get <job_id> -o json | jq '.status.externalIp' -r

This returns the IP address that you will have to whitelist to be able to connect to your database (e.g. 51.210.38.188).
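
If `jq` is not available, the two lookups can be done in Python. This is a minimal sketch that reads the same `status.ip` and `status.externalIp` fields as the `jq` filters above. The sample payload is trimmed down to those two fields and is only illustrative; the real JSON returned by the CLI contains many more keys.

```python
import json


def extract_job_ips(job_json):
    """Return (internal IP, external IP) from `ovhai job get ... -o json` output."""
    status = json.loads(job_json)["status"]
    return status["ip"], status["externalIp"]


# Illustrative payload containing only the fields used by the jq filters.
sample = '{"status": {"ip": "10.42.80.176", "externalIp": "51.210.38.188"}}'
internal_ip, external_ip = extract_job_ips(sample)
print(internal_ip, external_ip)
```

In practice, `job_json` would be the standard output of the `ovhai job get` command, captured for example with `subprocess.run(..., capture_output=True, text=True)`.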

Step 6 – Whitelist AI Training job IP in PostgreSQL DB

From Databases & Analytics > Databases, edit your DB configuration to allow access from the job External IP.

Then, you can see that the job External IP is now whitelisted.

Well done! Your MLflow server and the backend store are now connected.

Step 7 – Create an AI Notebook

It’s time to train and track your Machine Learning models using MLflow!

To do so, use the OVHcloud ovhai CLI and start a new AI Notebook with GPU.

ovhai notebook run conda jupyterlab \
  --name mlflow-notebook \
  --framework-version conda-py311-cudaDevel11.8 \
  --gpu 1

Full command explained:

ovhai notebook run

This is the core command to run a notebook using the OVHcloud AI Notebooks platform.

--name mlflow-notebook

Sets a custom name for the notebook. In this case, you can name it mlflow-notebook.

--framework-version conda-py311-cudaDevel11.8

Defines the framework and version you want to use in your notebook. Here, you are using Python 3.11 with the Conda framework and CUDA compatibility.

--gpu 1

Allocates 1 GPU for the notebook, by default a Tesla V100S from NVIDIA (ai1-1-gpu). You can select the flavor you want from the OVHcloud GPU range.

Then, check if your AI Notebook is RUNNING.

ovhai notebook get <notebook_id>

Once your notebook is in RUNNING status, you should be able to access it using its URL:

State: RUNNING
Duration: 1411412
Url: https://<notebook_id>.notebook.gra.ai.cloud.ovh.net
Grpc Address: <notebook_id>.nb-grpc.gra.ai.cloud.ovh.net:443
Info Url: https://ui.gra.ai.cloud.ovh.net/notebook/<notebook_id>

You can now start developing your AI model inside the notebook.

Step 8 – Model training inside Jupyter notebook

To begin with, set up your notebook environment.

1. Create the requirements.txt file

numpy==2.2.3
scipy==1.15.2
mlflow==2.20.3
scikit-learn==1.6.1

2. Install dependencies

From a notebook cell, launch the following command.

!pip3 install -r requirements.txt

Perfect! You can start coding…

3. Import Python libraries

Here, you have to import os, mlflow and scikit-learn.

# import dependencies
import os
import mlflow
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

In another notebook cell, set the MLflow tracking URI. Note that you have to replace 10.42.80.176 with your own job IP.

mlflow.set_tracking_uri("http://10.42.80.176:5000")
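
A mistyped IP only shows up later, when the first run fails to reach the server. As a small sketch, the URI can be built from the job IP with a quick validity check first. The `tracking_uri_from_ip` helper is a hypothetical name, and the sketch assumes an IPv4 address and the port 5000 used by the job.

```python
import ipaddress


def tracking_uri_from_ip(job_ip, port=5000):
    """Build the MLflow tracking URI from the AI Training job IP."""
    # Raises ValueError if job_ip is not a well-formed IPv4 address.
    ipaddress.IPv4Address(job_ip)
    return f"http://{job_ip}:{port}"


print(tracking_uri_from_ip("10.42.80.176"))
```

The returned string can be passed directly to `mlflow.set_tracking_uri`.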

Then start training your model!

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

Output:

🏃 View run dashing-foal-850 at: http://10.42.80.176:5000/#/experiments/0/runs/e7dad7c073634ec28675c0defce2b9ec
🧪 View experiment at: http://10.42.80.176:5000/#/experiments/0
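
The predictions computed above are not scored in the snippet. As a minimal sketch, a root mean squared error can be computed in plain Python to compare runs on the held-out data. The `rmse` helper and the toy values are illustrative and not part of the original notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values: one prediction is off by 2, the other is exact.
print(rmse([3.0, 5.0], [1.0, 5.0]))
```

In the notebook, `rmse(y_test, predictions)` would give the test error, which could then be recorded with `mlflow.log_metric` inside a `with mlflow.start_run():` block, under a metric name of your choice.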

Congrats! You can now track your model training from MLflow remote server…

Step 9 – Track and compare models from MLflow remote server

Finally, access the MLflow dashboard using the job URL: https://<job_id>.job.gra.ai.cloud.ovh.net

Then, you can check your model trainings and evaluations:

What a success! You can finally use your MLflow server to evaluate, compare and archive your various trainings.

Step 10 – Monitor everything remotely

You now have a complete Machine Learning pipeline with remote experiment tracking. Access:

Metrics, Parameters, and Tags → PostgreSQL
Artifacts (Models, Files) → S3 bucket

This setup is reusable, automatable, and production-ready!

What’s next?

Automate deployment with OVHcloud APIs
Run different training sessions in parallel and compare them with your remote MLflow tracking server
Use AI Deploy to serve your trained models

Create your solution for Sign Language recognition with OVHcloud AI tools

Eléa Petton — Fri, 01 Sep 2023 09:27:49 +0000

A guide to build a solution for sign language interpretation based on a Computer Vision algorithm: YOLOv7.

Sign Language recognition with OVHcloud AI tools

Introduction

In the field of Artificial Intelligence, we often talk about Computer Vision and Object Detection, but what role do these AI techniques play in the vast field of healthcare? We’ll see that data plays a key role in AI applications for the medical-social sector.

Have you ever wondered if AI could be the solution to understand sign language?

Through this article, you will see that it is possible to use an AI model to detect signed letters. How? Thanks to the power of Computer Vision and Transfer Learning!

The article is organized as follows:

Objectives
American Sign Language Dataset
Fine-Tune YOLOv7 model for Sign Language detection
Deploy custom YOLOv7 model for real time detection

All the code for this blogpost is available in our dedicated GitHub repository. You can Fine-Tune YOLOv7 to detect signs with the AI Notebooks tool and deploy the custom model for real-time detection with AI Deploy.

Objectives

The purpose of this article is to show how it is possible to deploy a solution for Sign Language recognition thanks to AI.

An Object Detection algorithm will be used to detect the various signs and categorize them. Although closely related to image classification, Object Detection performs Image Classification on a more precise scale.

In this article, you will learn how to Fine-Tune YOLOv7 model for Sign Language detection.

Once the model has been trained, what do you think of deploying a web app? Streamlit is the answer to your needs! At the end, AI will enable you to understand Sign Language, with real-time detection and written transcription.

American Sign Language Dataset

First of all, let’s talk data!

American Sign Language Letters Dataset v1 is a public set of alphabet images and their labels created by David Lee.

This dataset is composed of 1728 images and 26 classes with the alphabet letters from A to Z.

ASL dataset

This dataset is composed of images and their corresponding labels, which are in txt format and give information about the location of the object thanks to the x, y coordinates as well as the height and width of the bounding box.

Label components of the ASL dataset for YOLOv7 usage

This data format is ideal for training a YOLO type Object Detection model.
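
To make the label format concrete, here is a small sketch that converts one YOLO label line into a pixel bounding box. It relies on the standard YOLO convention (class id, then x center, y center, width and height, all normalized between 0 and 1). The label line and the image size are made-up values, not taken from the dataset.

```python
def yolo_label_to_pixels(line, img_width, img_height):
    """Convert 'class x_center y_center width height' to a pixel box."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized values back to pixels.
    x_c, w = float(x_c) * img_width, float(w) * img_width
    y_c, h = float(y_c) * img_height, float(h) * img_height
    # Return (x_min, y_min, x_max, y_max) corners.
    box = (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)
    return int(class_id), box


# Made-up label: class 0, centered box covering a quarter of the width.
print(yolo_label_to_pixels("0 0.5 0.5 0.25 0.5", 416, 416))
# → (0, (156.0, 104.0, 260.0, 312.0))
```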

Fine-Tune YOLOv7 model for Sign Language recognition

How can the model YOLOv7 be trained to recognize American Sign Language letters?

Object Detection with YOLOv7

YOLOv7 belongs to the “YOLO family” of algorithms, whose name stands for “You Only Look Once.” Unlike many detection algorithms, YOLO uses a single end-to-end neural network to predict both the position and the class of the objects in an image.

Object Detection

Therefore, YOLO models pass only once on each image to detect the objects. This Object Detection model is particularly known for its speed and accuracy and allows real-time recognition.

But how can the model YOLOv7 be trained to recognize American Sign Language letters? Follow the next steps and let the magic work!

Fine-Tuning of YOLOv7

The full notebook is available on the following GitHub repository.

Import dependencies

Firstly, import the dependencies you need.

import torch
import os
import yaml
import torchvision
from IPython.display import Image, clear_output

Check GPU availability

Then, check the GPU availability. Training a model like YOLOv7 requires a GPU; in this case, a Tesla V100S is used.

print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

Setup complete. Using torch 1.12.1+cu102 _CudaDeviceProperties(name='Tesla V100S-PCIE-32GB', major=7, minor=0, total_memory=32510MB, multi_processor_count=80)

Extract the dataset information

Next, you can access the data.yaml file.

This file contains vital information about the dataset, especially the number of classes. Here there are 26 classes, with the letters from A to Z.

# go to the directory where the data.yaml file is located to extract the number of classes
%cd /workspace/data
with open("data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])

Now, it’s time to train YOLOv7 model!

Recover YOLOv7 weights

In this tutorial, you can use the Transfer Learning method by using YOLOv7 weights pre-trained on the COCO dataset.

How to define Transfer Learning?

For both humans and machines, learning something new takes time and practice. However, it is easier to carry out tasks similar to those already learned. As with humans, AI will be able to identify patterns from previous knowledge and apply them to new learning.

If a model is trained on a database, there is no need to re-train the model from scratch to fit a new set of similar data.

Main advantages of Transfer Learning:

saving resources
improving efficiency
model training facilitation
saving time

At this time, you can download the trained model:

# YOLOv7 path
%cd /workspace/yolov7
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

Saving to: ‘yolov7_training.pt’ yolov7_training.pt 100%[===================>] 72.12M 12.0MB/s in 5.5s

Run YOLOv7 training on ASL Letters Dataset

You can therefore set the following parameters.

workers: maximum number of dataloader workers.
device: cuda device.
batch-size: refers to the batch size (number of training examples used in one iteration).
data: refers to the path to the yaml file.
img: refers to the input images size.
cfg: define the model configuration.
weights: initial weights path.
name: save to project/name.
hyp: hyperparameters path.
epochs: refers to the number of training epochs. An epoch corresponds to one cycle through the full training dataset.

%%time
# time the whole cell; the %%time cell magic must be on the first line of the cell

# train yolov7 on custom data for 100 epochs
!python /workspace/yolov7/train.py \
          --workers 8 \
          --device 0 \
          --batch-size 8 \
          --data '/workspace/data/data.yaml' \
          --img 416 416 \
          --cfg '/workspace/yolov7/cfg/training/yolov7.yaml' \
          --weights '/workspace/yolov7/yolov7_training.pt' \
          --name yolov7-asl \
          --hyp '/workspace/yolov7/data/hyp.scratch.custom.yaml' \
          --epochs 100
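
To get a feel for what `--batch-size` and `--epochs` imply, the number of batches processed during training can be estimated with a little arithmetic. This is only a sketch: the size of the training split is not given in this article, so the 1500 images used below are a hypothetical figure.

```python
import math


def training_batches(n_train_images, batch_size, epochs):
    """Return (batches per epoch, total batches over all epochs)."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    per_epoch = math.ceil(n_train_images / batch_size)
    return per_epoch, per_epoch * epochs


# Hypothetical split of 1500 training images, batch size 8, 100 epochs.
print(training_batches(1500, 8, 100))
# → (188, 18800)
```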

Display results of YOLOv7 training on ASL Letters dataset

Then you can display the results of the training and check the evolution of the metrics.

# display images
Image(filename='/workspace/yolov7/runs/train/yolov7-asl/results.png', width=1000)  # view results

YOLOv7 training overview

Export new weights for future inference

Finally, you can extract the new weights coming from YOLOv7 training on ASL Alphabet dataset. The goal is to save the model weights in a bucket in the cloud for reuse in a dedicated application.

Firstly, rename the PyTorch model with the name you want.

%cd /workspace/yolov7/runs/train/yolov7-asl/weights/
os.rename("best.pt","yolov7.pt")

/workspace/yolov7/runs/train/yolov7-asl/weights

Secondly, copy it into a new folder where you can put all the weights generated during your trainings.

%cp /workspace/yolov7/runs/train/yolov7-asl/weights/yolov7.pt /workspace/asl-volov7-model/yolov7.pt

Your model is ready? It’s now time to deploy a web app to use the model and benefit from real-time detection 🎉 !

Deploy custom YOLOv7 model for real time detection

Once this YOLOv7 model is trained, it can be used for inference. If you want to quickly build an app to serve your AI model, the Streamlit framework may be right for you.

What is Streamlit?

Now, it’s time to discuss the framework used to create a Web App: Streamlit!

Streamlit allows you to transform data scripts into quickly shareable web applications using only the Python language. Moreover, this framework does not require front-end skills.

This is a time-saver for the data scientist who wants to deploy an app around the world of data!

To make this app accessible, you need to containerize it using Docker.

Streamlit web app

By creating an app, you will enable anyone to understand Sign Language, with Real-Time detection and written transcription.

Let’s go for the implementation!

Create the interface with Streamlit

First of all, we must build the web interface to take a photo and the various functions to analyze the signs present on this image.

load_model: this function should be pushed in “cache” so that you only have to load the model once

@st.cache
def load_model():

    custom_yolov7_model = torch.hub.load("WongKinYiu/yolov7", 'custom', '/workspace/asl-volov7-model/yolov7.pt')

    return custom_yolov7_model

get_prediction: the model analyzes the image and returns the result of the prediction

def get_prediction(img_bytes, model):

    img = Image.open(io.BytesIO(img_bytes))
    results = model(img, size=640)

    return results

analyse_image: the image is processed before and after the model analysis

def analyse_image(image, model):

    if image is not None:

        img = Image.open(image)

        bytes_data = image.getvalue()
        img_bytes = np.asarray(bytearray(bytes_data), dtype=np.uint8)
        result = get_prediction(img_bytes, model)
        result.render()

        for img in result.imgs:
            RGB_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            im_arr = cv2.imencode('.jpg', RGB_img)[1]
            st.image(im_arr.tobytes())

        result_list = list((result.pandas().xyxy[0])["name"])

    else:
        st.write("no asl letters were detected!")
        result_list = []

    return result_list

display_letters: the letters are recovered and displayed to form the final word

def display_letters(letters_list):

    word = ''.join(letters_list)
    path_file = "/workspace/word_file.txt"
    with open(path_file, "a") as f:
        f.write(word)

    return path_file

To access the full code of the app, refer to this GitHub repository.

Containerize your app with Docker

Once the app code has been created, it’s time to containerize it!

The containerization is based on the construction of a Docker image, and before this image is usable, several steps must be completed.

What are the containerization steps 🐳 ?

The following steps refer to this documentation where you can find detailed information.

Write the requirements.txt file
Create the Dockerfile
Build the Docker image
Tag and push the Docker image on a registry

Your docker image is created successfully? You are ready to launch your app 🚀 !

Deploy your app and make it accessible

The following command starts a new AI Deploy app running your Streamlit web interface.

ovhai app run \
       --gpu 1 \
       --default-http-port 8501 \
       --volume asl-volov7-model@GRA/:/workspace/asl-volov7-model:RO \
       <your-registry>/yolov7-streamlit-asl-recognition:latest

In this command line, you can set up several parameters:

resources: choose between CPUs or GPUs
default HTTP port: specify the Streamlit default port – 8501
data: link the bucket containing your model
docker image: add your docker image address

When your app is up and running, you can access the following page:

Resulting Streamlit app

Conclusion

Well done 🎉 ! You have learned how to create your own solution for Sign Language recognition with OVHcloud AI tools.

You have been able to Fine-Tune YOLOv7 model thanks to AI Notebooks and deploy a Real-Time recognition app with AI Deploy.

Want to find out more?

Notebook

You want to access the notebook? Refer to the GitHub repository.

To launch this notebook with AI Notebook, please refer to our documentation.

App

You want to access the full code to create the Streamlit app? Refer to the GitHub repository.

To deploy this app with AI Deploy, please refer to our documentation.


Fine-Tuning LLaMA 2 Models using a single GPU, QLoRA and AI Notebooks

Mathieu Busquet — Fri, 21 Jul 2023 15:04:00 +0000

In this tutorial, we will walk you through the process of fine-tuning LLaMA 2 models, providing step-by-step instructions.

All the code related to this article is available in our dedicated GitHub repository. You can reproduce all the experiments with OVHcloud AI Notebooks.

Introduction

On July 18, 2023, Meta released LLaMA 2, the latest version of their Large Language Model (LLM).

Trained between January 2023 and July 2023 on 2 trillion tokens, these new models outperform other LLMs on many benchmarks, including reasoning, coding, proficiency, and knowledge tests. This release comes in different flavors, with parameter sizes of 7B, 13B, and a mind-blowing 70B. The models are free for both commercial and research use in English.

To adapt these models to any text generation need, we will fine-tune them with QLoRA (Efficient Finetuning of Quantized LLMs), a highly efficient fine-tuning technique that involves quantizing a pretrained LLM to just 4 bits and adding small “Low-Rank Adapters”. This approach allows for fine-tuning LLMs using just a single GPU! This technique is supported by the PEFT library.

To fine-tune our model, we will create an OVHcloud AI Notebook with only 1 GPU.

Mandatory requirements

To successfully fine-tune LLaMA 2 models, you will need the following:

Fill in Meta’s form to request access to the next version of Llama. Indeed, the use of Llama 2 is governed by the Meta license, which you must accept in order to download the model weights and tokenizer.
Have a Hugging Face account (with the same email address you entered in Meta’s form).
Have a Hugging Face token.
Visit the page of one of the LLaMA 2 available models (version 7B, 13B or 70B), and accept Hugging Face’s license terms and acceptable use policy.
Log in to the Hugging Face model Hub from your notebook’s terminal by running the huggingface-cli login command, and enter your token. You will not need to add your token as git credential.
Powerful Computing Resources: Fine-tuning the Llama 2 model requires substantial computational power. Ensure you are running code on GPU(s) when using AI Notebooks or AI Training.

Set up your Python environment

Create the following requirements.txt file:

torch
accelerate @ git+https://github.com/huggingface/accelerate.git
bitsandbytes
datasets==2.13.1
transformers @ git+https://github.com/huggingface/transformers.git
peft @ git+https://github.com/huggingface/peft.git
trl @ git+https://github.com/lvwerra/trl.git
scipy

Then install and import the installed libraries:

pip install -r requirements.txt

import argparse
import os
from functools import partial

import bitsandbytes as bnb
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed, Trainer, TrainingArguments, BitsAndBytesConfig, \
    DataCollatorForLanguageModeling

Download LLaMA 2 model

As mentioned before, LLaMA 2 models come in different flavors which are 7B, 13B, and 70B. Your choice can be influenced by your computational resources. Indeed, larger models require more resources, memory, processing power, and training time.

To download the model you have been granted access to, make sure you are logged in to the Hugging Face model hub. As mentioned in the requirements step, you need to use the huggingface-cli login command.

The following function will help us to download the model and its tokenizer. It requires a bitsandbytes configuration that we will define later.

def load_model(model_name, bnb_config):
    n_gpus = torch.cuda.device_count()
    max_memory = f'{40960}MB'

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto", # dispatch the model efficiently on the available resources
        max_memory = {i: max_memory for i in range(n_gpus)},
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)

    # Needed for LLaMA tokenizer
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer

Download a Dataset

There are many datasets that can help you fine-tune your model. You can even use your own dataset!

In this tutorial, we are going to download and use the Databricks Dolly 15k dataset, which contains 15,000 prompt/response pairs. It was crafted by over 5,000 Databricks employees during March and April of 2023.

This dataset is designed specifically for fine-tuning large language models. Released under the CC BY-SA 3.0 license, it can be used, modified, and extended by any individual or company, even for commercial applications. So it’s a perfect fit for our use case!

However, like most datasets, this one has its limitations. Indeed, pay attention to the following points:

It consists of content collected from the public internet, which means it may contain objectionable, incorrect or biased content and typo errors, which could influence the behavior of models fine-tuned using this dataset.
Since the dataset has been created for Databricks by their own employees, it’s worth noting that the dataset reflects the interests and semantic choices of Databricks employees, which may not be representative of the global population at large.
We only have access to the train split of the dataset, which is its largest subset.

# Load the databricks dataset from Hugging Face
from datasets import load_dataset

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

Explore dataset

Once the dataset is downloaded, we can take a look at it to understand what it contains:

print(f'Number of prompts: {len(dataset)}')
print(f'Column names are: {dataset.column_names}')

*** OUTPUT ***
Number of prompts: 15011
Column names are: ['instruction', 'context', 'response', 'category']

As we can see, each sample is a dictionary that contains:

An instruction: What could be entered by the user, such as a question
A context: Help to interpret the sample
A response: Answer to the instruction
A category: Classify the sample between Open Q&A, Closed Q&A, Extract information from Wikipedia, Summarize information from Wikipedia, Brainstorming, Classification, Creative writing

Pre-processing dataset

Instruction fine-tuning is a common technique used to fine-tune a base LLM for a specific downstream use-case.

It will help us to format our prompts as follows:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Sea or Mountain

### Response:
I believe Mountain are more attractive but Ocean has it's own beauty and this tropical weather definitely turn you on! SO 50% 50%

### End

To delimit each prompt part by hashtags, we can use the following function:

def create_prompt_formats(sample):
    """
    Format various fields of the sample ('instruction', 'context', 'response')
    Then concatenate them using two newline characters
    :param sample: Sample dictionary
    """

    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruction:"
    INPUT_KEY = "Input:"
    RESPONSE_KEY = "### Response:"
    END_KEY = "### End"
    
    blurb = f"{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}\n{sample['instruction']}"
    input_context = f"{INPUT_KEY}\n{sample['context']}" if sample["context"] else None
    response = f"{RESPONSE_KEY}\n{sample['response']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    
    sample["text"] = formatted_prompt

    return sample
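
The same markers are useful in the other direction: once a fine-tuned model generates text in this format, the answer sits between `### Response:` and `### End`. The following is a minimal sketch of that extraction. The `extract_response` helper is a hypothetical name and not part of the original code.

```python
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def extract_response(generated_text):
    """Return the text between the response marker and the end marker."""
    _, found, after = generated_text.partition(RESPONSE_KEY)
    if not found:
        return ""  # the model did not produce a response section
    # If the end marker is missing, keep everything after the response marker.
    answer, _, _ = after.partition(END_KEY)
    return answer.strip()


text = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\nSea or Mountain\n\n"
    "### Response:\nMountain\n\n"
    "### End"
)
print(extract_response(text))
```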

Now, we will use our model tokenizer to process these prompts into tokenized ones.

The goal is to create input sequences of uniform length (which suits fine-tuning because it maximizes efficiency and minimizes computational overhead) that do not exceed the model’s maximum token limit.

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    """Read the maximum sequence length from the model config, defaulting to 1024."""
    conf = model.config
    max_length = None
    # The attribute name depends on the model architecture
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )


# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed: int, dataset):
    """Format and tokenize the dataset so it is ready for training
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed (int): Random seed used to shuffle the dataset
    :param dataset: Dataset object to preprocess (it must provide map, filter and shuffle)
    """
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)
    
    # Tokenize each batch and remove the 'instruction', 'context', 'response', 'text', 'category' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=["instruction", "context", "response", "text", "category"],
    )

    # Keep only samples strictly shorter than max_length: since truncation caps
    # sequences at max_length, this drops every prompt that had to be truncated
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset
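
To see how truncation and the final filter interact, here is a toy sketch. It assumes a whitespace "tokenizer" (`toy_tokenize`, a hypothetical stand-in that is far simpler than the real model tokenizer) and a deliberately tiny limit, so the numbers are only illustrative.

```python
# Toy stand-in for a tokenizer: one token per whitespace-separated word.
# (The real model tokenizer splits text into sub-word tokens instead.)
def toy_tokenize(text: str, max_length: int) -> list[str]:
    tokens = text.split()
    return tokens[:max_length]  # same effect as truncation=True


MAX_LENGTH = 5  # deliberately tiny; the tutorial reads this value from the model config

texts = [
    "short prompt",                         # 2 tokens: kept
    "exactly five tokens right here",       # 5 tokens: dropped (reaches the limit)
    "this prompt is clearly far too long",  # 7 tokens: truncated to 5, then dropped
]

tokenized = [toy_tokenize(text, MAX_LENGTH) for text in texts]

# Same condition as the filter in preprocess_dataset: keep strictly shorter sequences
kept = [tokens for tokens in tokenized if len(tokens) < MAX_LENGTH]

print([len(tokens) for tokens in tokenized])
print(kept)
```

Because truncation caps every sequence at the limit, a strict `<` comparison is what removes the truncated prompts. The side effect is that a prompt that lands exactly on the limit is dropped as well.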

With these functions, our dataset will be ready for fine-tuning!

Create a bitsandbytes configuration

This will allow us to load our LLM in 4 bits. Compared with 16-bit weights, this divides the memory used by the weights by about 4, so the model can be loaded on smaller devices. We use the NF4 quantization type with bfloat16 as the compute data type, and enable nested (double) quantization to save additional memory.

def create_bnb_config():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    return bnb_config
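
The "divide by 4" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a nominal 7 billion parameters and counts the weights only. Activations, optimizer state and quantization constants are ignored, so real memory usage will be higher.

```python
# Rough weights-only memory estimate; activations, optimizer state and
# quantization constants are ignored.
NUM_PARAMS = 7_000_000_000  # assumption: nominal size of a "7B" model


def weights_size_gb(num_params: int, bits_per_param: int) -> float:
    """Size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


size_16bit = weights_size_gb(NUM_PARAMS, 16)
size_4bit = weights_size_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {size_16bit:.1f} GB")
print(f"4-bit weights:  {size_4bit:.1f} GB")
print(f"reduction factor: {size_16bit / size_4bit:.0f}x")
```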

To leverage the LoRA method, we need to wrap the model as a PeftModel.

To do this, we need to implement a LoRA configuration:

def create_peft_config(modules):
    """
    Create Parameter-Efficient Fine-Tuning config for your model
    :param modules: Names of the modules to apply LoRA to
    """
    config = LoraConfig(
        r=16,  # dimension of the updated matrices
        lora_alpha=64,  # parameter for scaling
        target_modules=modules,
        lora_dropout=0.1,  # dropout probability for layers
        bias="none",
        task_type="CAUSAL_LM",
    )

    return config
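
To get a feel for what `r` and `lora_alpha` mean, the arithmetic below may help. It is a sketch under one assumption: a square 4096 x 4096 linear layer, chosen only as an example shape. LoRA adds two small matrices per targeted layer, and the low-rank update is scaled by `lora_alpha / r`.

```python
# Values taken from create_peft_config above
R = 16
LORA_ALPHA = 64

# Assumption: a square 4096 x 4096 linear layer, used only as an example shape
D_IN, D_OUT = 4096, 4096

full_params = D_IN * D_OUT        # weights of the frozen base layer
lora_params = R * (D_IN + D_OUT)  # A is (R x D_IN), B is (D_OUT x R)
scaling = LORA_ALPHA / R          # factor applied to the low-rank update

print(f"base layer params: {full_params:,d}")
print(f"LoRA params:       {lora_params:,d}")
print(f"ratio:             {100 * lora_params / full_params:.2f}%")
print(f"scaling factor:    {scaling}")
```

With these values, the adapter for such a layer holds less than 1% of the parameters of the layer it adapts.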

The previous function needs the names of the target modules, that is, the layers whose weight matrices LoRA will adapt. The following function collects them for our model:

# SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py

def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit  # the model is loaded in 4 bits in this tutorial
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if 'lm_head' in lora_module_names:  # needed for 16-bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)
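
The name handling in this function can be illustrated without loading a model. The module names below are hypothetical examples written in the style of `model.named_modules()`. The real function additionally keeps only the modules that are 4-bit linear layers.

```python
# Hypothetical fully qualified module names, in the style of model.named_modules()
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

# Keep only the last component of each name, as find_all_linear_names does;
# the set removes duplicates coming from the different layers
short_names = {name.split(".")[-1] for name in module_names}

# The output head is excluded from the LoRA targets
short_names.discard("lm_head")

target_modules = sorted(short_names)
print(target_modules)
```

The resulting short names are what `target_modules` expects: one entry per kind of layer, matched across all transformer blocks.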

Once everything is set up and the base model is prepared, we can use the print_trainable_parameters() helper function to see how many trainable parameters are in the model.

def print_trainable_parameters(model, use_4bit=False):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        num_params = param.numel()
        # if using DS Zero 3 and the weights are initialized empty
        if num_params == 0 and hasattr(param, "ds_numel"):
            num_params = param.ds_numel

        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params
    if use_4bit:
        # Integer division keeps the count an int, as required by the ',d' format below
        trainable_params //= 2
    print(
        f"all params: {all_param:,d} || trainable params: {trainable_params:,d} || trainable%: {100 * trainable_params / all_param}"
    )

We expect the LoRA model to have far fewer trainable parameters than the original one, since only the small adapter matrices are trained while the base weights stay frozen.

Train

Now that everything is ready, we can pre-process our dataset and load our model using the set configurations:

# Load model from HF with user's token and with bitsandbytes config

model_name = "meta-llama/Llama-2-7b-hf" 

bnb_config = create_bnb_config()

model, tokenizer = load_model(model_name, bnb_config)

## Preprocess dataset

max_length = get_max_length(model)

dataset = preprocess_dataset(tokenizer, max_length, seed, dataset)

Then, we can run our fine-tuning process:

def train(model, tokenizer, dataset, output_dir):
    # Apply preprocessing to the model to prepare it by
    # 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
    model.gradient_checkpointing_enable()

    # 2 - Using the prepare_model_for_kbit_training method from PEFT
    model = prepare_model_for_kbit_training(model)

    # Get lora module names
    modules = find_all_linear_names(model)

    # Create PEFT config for these modules and wrap the model to PEFT
    peft_config = create_peft_config(modules)
    model = get_peft_model(model, peft_config)
    
    # Print information about the percentage of trainable parameters
    print_trainable_parameters(model)
    
    # Training parameters
    trainer = Trainer(
        model=model,
        train_dataset=dataset,
        args=TrainingArguments(
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            max_steps=20,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=1,
            output_dir="outputs",
            optim="paged_adamw_8bit",
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
    )
    
    model.config.use_cache = False  # the KV cache is not needed for training and conflicts with gradient checkpointing; re-enable it for inference
    
    ### SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py
    # Verifying the datatypes before training
    
    dtypes = {}
    for _, p in model.named_parameters():
        dtypes[p.dtype] = dtypes.get(p.dtype, 0) + p.numel()
    total = sum(dtypes.values())
    for dtype, count in dtypes.items():
        print(dtype, count, count / total)
     
    do_train = True
    
    # Launch training
    print("Training...")
    
    if do_train:
        train_result = trainer.train()
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        trainer.save_state()
        print(metrics)    
    
    ###
    
    # Saving model
    print("Saving last checkpoint of the model...")
    os.makedirs(output_dir, exist_ok=True)
    trainer.model.save_pretrained(output_dir)
    
    # Free memory for merging weights
    del model
    del trainer
    torch.cuda.empty_cache()
    
    
output_dir = "results/llama2/final_checkpoint"
train(model, tokenizer, dataset, output_dir)

If you prefer to train for a number of epochs (one epoch is a full pass over the training dataset) instead of a number of training steps (one step is one optimizer update, here after 4 accumulated batches), you can replace the max_steps argument with num_train_epochs.
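
To relate steps and epochs, the short calculation below may help. It assumes a single GPU and a hypothetical dataset size of 15,000 samples after filtering. Both values are assumptions to adapt to your own setup.

```python
import math

# Values from the TrainingArguments above
PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
MAX_STEPS = 20

# Assumptions: a single GPU and a hypothetical dataset size after filtering
NUM_GPUS = 1
NUM_SAMPLES = 15_000

# Samples consumed per optimizer step
effective_batch_size = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_GPUS

samples_seen = MAX_STEPS * effective_batch_size
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen in {MAX_STEPS} steps: {samples_seen}")
print(f"optimizer steps for one epoch: {steps_per_epoch}")
```

Under these assumptions, 20 steps cover only a small fraction of the dataset, which is enough for a quick demonstration but far from a full epoch.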

To later load and use the model for inference, we have used the trainer.model.save_pretrained(output_dir) function. Since the model is wrapped as a PeftModel, this saves the LoRA adapter weights and their configuration, not the full model. The tokenizer is saved separately in the merge step.

Fine-tuning llama2 results on databricks-dolly-15k dataset

Unfortunately, the weights from the last training step are not necessarily the best ones. To address this, you can add an EarlyStoppingCallback, from transformers, to your fine-tuning (it requires load_best_model_at_end=True in the TrainingArguments). It lets you evaluate your model regularly on a validation set, if you have one, stop training when the metric no longer improves, and keep only the best weights.
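
The idea behind early stopping with a patience value can be sketched in plain Python. This is an illustration of the rule only, not the transformers implementation: `best_step_with_patience` is a hypothetical helper, the validation losses are made up, and `patience` plays the role of the callback's `early_stopping_patience` argument.

```python
# Pure-Python sketch of the patience rule; it does not call transformers.
def best_step_with_patience(eval_losses: list[float], patience: int) -> tuple[int, int]:
    """Return (index of the best evaluation, index at which training stops)."""
    best_index = 0
    evals_without_improvement = 0
    for index, loss in enumerate(eval_losses):
        if index == 0 or loss < eval_losses[best_index]:
            # New best validation loss: remember it and reset the counter
            best_index = index
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return best_index, index
    # Patience was never exhausted: training runs to the end
    return best_index, len(eval_losses) - 1


# Made-up validation losses, one per evaluation
losses = [1.90, 1.62, 1.55, 1.58, 1.57, 1.60, 1.40]
best, stopped = best_step_with_patience(losses, patience=3)
print(best, stopped)
```

Here training stops after three evaluations without improvement and the weights from the third evaluation are kept. The later, lower loss is never reached, which is the usual trade-off of a small patience value.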

Merge weights

Once we have our fine-tuned adapter weights, we can merge them into the base model and save the result to a new directory, together with its tokenizer. After these steps, the fine-tuned model and tokenizer can be loaded for inference like any standard checkpoint!

model = AutoPeftModelForCausalLM.from_pretrained(output_dir, device_map="auto", torch_dtype=torch.bfloat16)
model = model.merge_and_unload()

output_merged_dir = "results/llama2/final_merged_checkpoint"
os.makedirs(output_merged_dir, exist_ok=True)
model.save_pretrained(output_merged_dir, safe_serialization=True)

# save tokenizer for easy inference
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(output_merged_dir)
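
Conceptually, merging folds the scaled low-rank update into each adapted weight matrix, so that a single matrix gives the same output as the base weight plus the adapter. The sketch below checks this on made-up 2 x 2 values with a rank-1 adapter. It illustrates the arithmetic only and makes no claim about how the library performs the merge internally.

```python
# Plain-Python matrices (lists of rows) so the example needs no dependencies.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale):
    """Element-wise a + scale * b for two matrices of the same shape."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Made-up toy values: a 2x2 base weight and a rank-1 adapter
W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (d_out x d_in)
B = [[1.0], [2.0]]            # LoRA matrix B (d_out x r)
A = [[0.5, -1.0]]             # LoRA matrix A (r x d_in)
scaling = 4.0                 # plays the role of lora_alpha / r
x = [[1.0], [2.0]]            # input as a column vector

# Adapter kept separate: y = W x + scaling * B (A x)
y_adapter = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Adapter merged into the weight: W' = W + scaling * B A, then y = W' x
W_merged = add_scaled(W, matmul(B, A), scaling)
y_merged = matmul(W_merged, x)

print(W_merged)
print(y_adapter, y_merged)
```

Both paths give the same output, which is why the merged checkpoint no longer needs the adapter at inference time.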

Conclusion

We hope you have enjoyed this article!

You are now able to fine-tune LLaMA 2 models on your own datasets!

In our next tutorial, you will discover how to Deploy your Fine-tuned LLM on OVHcloud AI Deploy for inference!

Step 1 – Install `ovhai` CLI